[Bug] GPU memory is repeatedly reloaded and decoding is very slow #155
Comments
We recommend using English, or English & Chinese, for issues so that we can have a broader discussion.
Thanks for the report; we will improve this experience in a future release. We are considering optimizing the partitioning scheme for small models, or, more fundamentally, launching a persistent process that keeps the model resident, so that each task does not have to repeatedly load and release it.
Great, looking forward to the new version.
One more question: is there any setting I can use to speed up decoding for this model? Right now it looks like evaluating even a small dataset on eight A100s will still take a very long time.
This is generally a concern of the model itself and of huggingface; we focus mainly on the evaluation side. In our experience, though, native llama is faster than the huggingface interface.
I'm not very familiar with huggingface decoding. Is the current setup that each of the eight GPUs loads the full model and decodes its own shard of the dataset? What causes the model to be repeatedly allocated and released? Does the batchsize setting take effect? The decoding speed I observe is very different from what I previously measured with llama-7b on a single GPU.
Yes.
The default task-partitioning parameters are tuned mainly for inference with 100B+ models, so the task shards are too small, which leads to frequent instantiation, loading, inference, and release. You can adjust the shard size via the size parameter. batchsize does take effect, but generation is slow because: 1. for GPT-style models, predicting N tokens requires N forward passes, so generation tasks are inherently slow; 2. the huggingface interface is slower than native llama, and the reason would have to be traced in the huggingface code. If your experiments yield any improvement, please feed it back to us and we will further improve the opencompass user experience.
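As a rough sketch of the advice above, enlarging the shard size is done in the evaluation config's `infer` section. The parameter names below (`SizePartitioner`, `max_task_size`) reflect the OpenCompass API as I understand it at this version; treat the exact names and the value `40000` as assumptions to verify against your installed release:

```python
# Hypothetical sketch of an OpenCompass config fragment: larger task shards
# mean each worker loads the model once and runs many samples, instead of
# repeatedly instantiating / loading / releasing it for tiny shards.
from opencompass.partitioners import SizePartitioner
from opencompass.runners import LocalRunner
from opencompass.tasks import OpenICLInferTask

infer = dict(
    partitioner=dict(
        type=SizePartitioner,
        max_task_size=40000,  # assumed knob: raise this so small models get fewer, bigger shards
    ),
    runner=dict(
        type=LocalRunner,
        max_num_workers=8,  # one worker per A100 in the reporter's setup
        task=dict(type=OpenICLInferTask),
    ),
)
```

With fewer, larger shards, the fixed cost of loading llama-7b is amortized over many more samples per shard.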
OK, thanks.
Feel free to re-open this issue if needed. |
Describe the bug
When running decoding evaluation with the llama_7B_hf model, nvidia-smi shows GPU memory being repeatedly allocated and released. Eight A100s decoded only 300 samples in two hours, which is very slow. Is there a configuration problem?
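For reference, the allocate/release cycle described above can be watched continuously rather than by eyeballing one-off `nvidia-smi` runs; the query flags below are standard `nvidia-smi` options:

```shell
# Poll per-GPU memory usage once per second; repeated drops to near zero
# between tasks indicate the model is being unloaded and reloaded.
nvidia-smi --query-gpu=index,memory.used --format=csv -l 1
```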
Model settings (screenshot below)
GPU usage as observed (screenshot below)
Environment information
{'CUDA available': True,
'CUDA_HOME': '/home/work/cuda-11.3',
'GCC': 'gcc (GCC) 8.2.0',
'GPU 0,1,2,3,4,5,6,7': 'NVIDIA A100-SXM4-40GB',
'MMEngine': '0.8.2',
'NVCC': 'Cuda compilation tools, release 11.3, V11.3.58',
'OpenCV': '4.8.0',
'PyTorch': '2.0.1',
'PyTorch compiling details': 'PyTorch built with:\n'
' - GCC 9.3\n'
' - C++ Version: 201703\n'
' - Intel(R) oneAPI Math Kernel Library Version '
'2023.1-Product Build 20230303 for Intel(R) 64 '
'architecture applications\n'
' - Intel(R) MKL-DNN v2.7.3 (Git Hash '
'6dbeffbae1f23cbbeae17adb7b5b13f1f37c080e)\n'
' - OpenMP 201511 (a.k.a. OpenMP 4.5)\n'
' - LAPACK is enabled (usually provided by '
'MKL)\n'
' - NNPACK is enabled\n'
' - CPU capability usage: AVX2\n'
' - CUDA Runtime 11.8\n'
' - NVCC architecture flags: '
'-gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_90,code=sm_90;-gencode;arch=compute_37,code=compute_37\n'
' - CuDNN 8.7\n'
' - Magma 2.6.1\n'
' - Build settings: BLAS_INFO=mkl, '
'BUILD_TYPE=Release, CUDA_VERSION=11.8, '
'CUDNN_VERSION=8.7.0, '
'CXX_COMPILER=/opt/rh/devtoolset-9/root/usr/bin/c++, '
'CXX_FLAGS= -D_GLIBCXX_USE_CXX11_ABI=0 '
'-fabi-version=11 -Wno-deprecated '
'-fvisibility-inlines-hidden -DUSE_PTHREADPOOL '
'-DNDEBUG -DUSE_KINETO -DLIBKINETO_NOROCTRACER '
'-DUSE_FBGEMM -DUSE_QNNPACK '
'-DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK '
'-DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC '
'-Wall -Wextra -Werror=return-type '
'-Werror=non-virtual-dtor -Werror=bool-operation '
'-Wnarrowing -Wno-missing-field-initializers '
'-Wno-type-limits -Wno-array-bounds '
'-Wno-unknown-pragmas -Wunused-local-typedefs '
'-Wno-unused-parameter -Wno-unused-function '
'-Wno-unused-result -Wno-strict-overflow '
'-Wno-strict-aliasing '
'-Wno-error=deprecated-declarations '
'-Wno-stringop-overflow -Wno-psabi '
'-Wno-error=pedantic -Wno-error=redundant-decls '
'-Wno-error=old-style-cast '
'-fdiagnostics-color=always -faligned-new '
'-Wno-unused-but-set-variable '
'-Wno-maybe-uninitialized -fno-math-errno '
'-fno-trapping-math -Werror=format '
'-Werror=cast-function-type '
'-Wno-stringop-overflow, LAPACK_INFO=mkl, '
'PERF_WITH_AVX=1, PERF_WITH_AVX2=1, '
'PERF_WITH_AVX512=1, '
'TORCH_DISABLE_GPU_ASSERTS=ON, '
'TORCH_VERSION=2.0.1, USE_CUDA=ON, USE_CUDNN=ON, '
'USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, '
'USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, '
'USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, '
'USE_OPENMP=ON, USE_ROCM=OFF, \n',
'Python': '3.10.12 (main, Jul 5 2023, 18:54:27) [GCC 11.2.0]',
'TorchVision': '0.15.2',
'numpy_random_seed': 2147483648,
'opencompass': '0.1.0+4b0aa80',
'sys.platform': 'linux'}
Other information
No response