Hello,
I am running on the following machine:
CPU: 12th Gen Intel(R) Core(TM) i7-12700
RAM: 32 GB, speed: 4400 MT/s
GPU: NVIDIA RTX A2000 12 GB
The model is:
llama-2-7b-chat.Q6_K.gguf
It takes around 2 minutes to start giving a response.
Is that reasonable, or should it be faster?
This is the .bat command I use to start the bot:
"C:\Users\Public\pyenv-win\pyenv-win\bin\.h2o\Scripts\python.exe"^
"generate.py"^
--share=False ^
--auth=[('jon','password')] ^
--auth_access=closed ^
--gradio_offline_level=1 ^
--base_model="llama" ^
--prompt_type=llama2 ^
--model_path_llama=C:\Users\Public\git\h2ogpt\llama-2-7b-chat.Q6_K.gguf^
--score_model=None ^
--langchain_mode="LLLM" ^
--user_path=user_path ^
--load_4bit=True ^
--llamacpp_dict="{'n_gpu_layers':5}"
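As a rough sanity check on the `n_gpu_layers` value in the command above, the share of the model that gets offloaded to the GPU can be estimated as below. This is only a sketch, not h2oGPT or llama.cpp code: the helper name `offload_fraction` is hypothetical, the count of 32 is the number of transformer blocks in Llama-2-7B (llama.cpp may count extra non-repeating layers on top of that), and treating a negative value as "offload everything" is an assumption based on recent llama-cpp-python releases.

```python
# Llama-2-7B has 32 transformer blocks. This ignores any extra
# non-repeating layers that llama.cpp may also count as offloadable.
LLAMA2_7B_LAYERS = 32


def offload_fraction(n_gpu_layers: int, total_layers: int = LLAMA2_7B_LAYERS) -> float:
    """Return the estimated fraction of transformer layers placed on the GPU."""
    if n_gpu_layers < 0:
        # Assumption: a negative value means "offload all layers".
        return 1.0
    # Asking for more layers than the model has cannot exceed 100%.
    return min(n_gpu_layers, total_layers) / total_layers


# Value taken from --llamacpp_dict in the command above.
print(f"{offload_fraction(5):.1%} of layers on GPU")
```

Under these assumptions, a setting of 5 leaves most layers on the CPU, which would be consistent with high CPU load and low GPU load during a query.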
While idle:
It uses 7 GB of GPU memory (unchanged while running a query)
It uses 24.4 GB of RAM (unchanged while running a query)
CPU utilization stays at 2 to 3%
While running a query:
CPU utilization climbs close to 100%
GPU utilization stays at 1% to 2%
It takes around 2 minutes to start giving a response.
It seems the GPU is not being utilized at all.
Could you please look at what I am doing wrong here?
I want to get faster responses.
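The utilization figures above were read by hand. A minimal sketch for capturing the same numbers from `nvidia-smi` while a query runs is shown below. The function names `parse_gpu_sample` and `sample_gpu` are hypothetical, and the parser assumes every queried field is numeric (some GPUs report `[N/A]` for certain fields, which this sketch does not handle).

```python
import subprocess

# Ask nvidia-smi for GPU utilization (%) and used memory (MiB) as bare CSV.
QUERY = [
    "nvidia-smi",
    "--query-gpu=utilization.gpu,memory.used",
    "--format=csv,noheader,nounits",
]


def parse_gpu_sample(line: str) -> tuple[int, int]:
    """Parse one CSV line such as '2, 7168' into (utilization %, memory MiB)."""
    util, mem = (field.strip() for field in line.split(","))
    return int(util), int(mem)


def sample_gpu() -> list[tuple[int, int]]:
    """Run nvidia-smi once and return one (utilization, memory) pair per GPU."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return [parse_gpu_sample(line) for line in result.stdout.splitlines() if line.strip()]
```

Calling `sample_gpu()` once per second during a query, and comparing against an idle sample, would show whether GPU utilization ever rises above the 1% to 2% reported here.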
The CUDA version is:
C:\Windows\System32>nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Sep_21_10:41:10_Pacific_Daylight_Time_2022
Cuda compilation tools, release 11.8, V11.8.89
Build cuda_11.8.r11.8/compiler.31833905_0
Below is my pip list:
Package Version
---------------------------------------- ---------------
absl-py 2.1.0
accelerate 0.32.1
aiofiles 23.2.1
aiohappyeyeballs 2.4.3
aiohttp 3.10.9
aiosignal 1.3.1
altair 5.4.1
annotated-types 0.7.0
anthropic 0.8.1
antlr4-python3-runtime 4.9.3
anyio 4.6.0
appdirs 1.4.4
APScheduler 3.10.4
argcomplete 3.5.1
arxiv 1.4.8
asgiref 3.8.1
async-timeout 4.0.3
attributedict 0.3.0
attrs 24.2.0
audioread 3.0.1
Authlib 1.3.1
auto-gptq 0.6.0
autoawq 0.1.8+cu118
autoawq_kernels 0.0.3+cu118
babel 2.16.0
backoff 2.2.1
backports.tarfile 1.2.0
bcrypt 4.2.0
beautifulsoup4 4.12.3
bioc 2.1
bitsandbytes 0.41.1
blessings 1.7
boto3 1.35.35
botocore 1.35.35
Brotli 1.1.0
bs4 0.0.2
build 1.2.2.post1
cachetools 5.5.0
certifi 2024.8.30
cffi 1.17.1
chardet 5.2.0
charset-normalizer 3.3.2
chroma-bullet 2.2.0
chroma-hnswlib 0.7.3
chroma-migrate 0.0.7
chromadb 0.4.23
chromamigdb 0.3.26
click 8.1.7
clickhouse-connect 0.6.6
codecov 2.1.13
colorama 0.4.6
coloredlogs 15.0.1
colour-runner 0.1.1
contourpy 1.3.0
coverage 7.6.1
cryptography 43.0.1
cssselect2 0.7.0
cutlet 0.3.0
cycler 0.12.1
dacite 1.7.0
dataclasses-json 0.6.7
DataProperty 1.0.1
datasets 2.16.1
dateparser 1.1.8
decorator 5.1.1
deepdiff 8.0.1
defusedxml 0.7.1
Deprecated 1.2.14
diffusers 0.24.0
dill 0.3.7
diskcache 5.6.3
distlib 0.3.8
distro 1.9.0
dnspython 2.7.0
docopt 0.6.2
docutils 0.20.1
duckdb 0.7.1
duckduckgo_search 6.3.0
durationpy 0.9
effdet 0.4.1
einops 0.8.0
emoji 2.14.0
et-xmlfile 1.1.0
eval_type_backport 0.2.0
evaluate 0.4.0
exceptiongroup 1.2.2
execnet 2.1.1
exllama 0.0.18+cu118
fastapi 0.115.0
feedparser 6.0.11
ffmpeg 1.4
ffmpy 0.4.0
fiftyone 1.0.0
fiftyone-brain 0.17.0
fiftyone_db 1.1.6
filelock 3.16.1
filetype 1.2.0
fire 0.5.0
flatbuffers 24.3.25
fonttools 4.54.1
frozenlist 1.4.1
fsspec 2023.10.0
ftfy 6.2.3
fugashi 1.3.2
future 1.0.0
g2pkk 0.1.2
gekko 1.2.1
glob2 0.7
google-ai-generativelanguage 0.4.0
google-api-core 2.20.0
google-auth 2.35.0
google-generativeai 0.3.2
google_search_results 2.4.2
googleapis-common-protos 1.65.0
gpt4all 1.0.5
gradio 3.50.2
gradio_client 0.6.1
gradio_pdf 0.0.15
gradio_tools 0.0.9
graphql-core 3.2.4
greenlet 3.0.3
grpcio 1.66.2
grpcio-health-checking 1.62.3
grpcio-status 1.62.3
grpcio-tools 1.62.3
gruut 2.2.3
gruut-ipa 0.13.0
gruut-lang-de 2.0.1
gruut-lang-en 2.0.1
gruut-lang-es 2.0.1
gruut_lang_fr 2.0.2
h11 0.14.0
h2 4.1.0
h5py 3.12.1
hf_transfer 0.1.8
hnswlib 0.8.0
hnswmiglib 0.7.0
hpack 4.0.0
html2text 2024.2.26
html5lib 1.1
httpcore 1.0.6
httptools 0.6.1
httpx 0.27.0
huggingface-hub 0.25.1
humanfriendly 10.0
humanize 4.11.0
Hypercorn 0.17.3
hyperframe 6.0.1
idna 3.10
imageio 2.35.1
importlib_metadata 8.4.0
importlib_resources 6.4.5
imutils 0.5.4
inflate64 1.0.0
iniconfig 2.0.0
inspecta 0.1.3
InstructorEmbedding 1.0.1
intervaltree 3.1.0
iopath 0.1.10
jaconv 0.4.0
jamo 0.4.1
jaraco.context 6.0.1
jieba 0.42.1
Jinja2 3.1.4
jiter 0.6.1
jmespath 1.0.1
joblib 1.4.2
jsonlines 1.2.0
jsonpatch 1.33
jsonpath-python 1.0.6
jsonpointer 3.0.0
jsonschema 4.23.0
jsonschema-specifications 2024.10.1
kaleido 0.2.1
kiwisolver 1.4.7
kubernetes 31.0.0
langchain 0.0.354
langchain-community 0.0.8
langchain-core 0.1.6
langchain-experimental 0.0.47
langchain-google-genai 0.0.6
langchain-mistralai 0.0.2
langdetect 1.0.9
langid 1.1.6
langsmith 0.0.77
layoutparser 0.3.4
lazy_loader 0.4
librosa 0.10.1
llama_cpp_python 0.2.26+cpuavx2
llama_cpp_python_cuda 0.2.26+cu121avx
llvmlite 0.43.0
lm-dataformat 0.0.20
lm_eval 0.4.4
loralib 0.1.2
lxml 5.3.0
lz4 4.3.3
Markdown 3.7
markdown-it-py 3.0.0
MarkupSafe 2.1.5
marshmallow 3.22.0
matplotlib 3.9.2
mbstrdecoder 1.1.3
mdurl 0.1.2
mistralai 0.0.8
mmh3 5.0.1
mojimoji 0.0.13
mongoengine 0.24.2
monotonic 1.6
more-itertools 10.5.0
motor 3.5.3
mplcursors 0.5.3
mpmath 1.3.0
msg-parser 1.2.0
msgpack 1.1.0
multidict 6.1.0
multiprocess 0.70.15
multivolumefile 0.2.3
mutagen 1.47.0
mypy-extensions 1.0.0
narwhals 1.9.1
nest-asyncio 1.6.0
networkx 2.8.8
nltk 3.9.1
num2words 0.5.13
numba 0.60.0
numexpr 2.10.1
numpy 1.23.4
oauthlib 3.2.2
olefile 0.47
omegaconf 2.3.0
onnx 1.17.0
onnxruntime 1.15.1
onnxruntime-gpu 1.15.0
openai 1.51.2
opencv-python 4.10.0.84
opencv-python-headless 4.10.0.84
openpyxl 3.1.5
opentelemetry-api 1.27.0
opentelemetry-exporter-otlp-proto-common 1.27.0
opentelemetry-exporter-otlp-proto-grpc 1.27.0
opentelemetry-instrumentation 0.48b0
opentelemetry-instrumentation-asgi 0.48b0
opentelemetry-instrumentation-fastapi 0.48b0
opentelemetry-proto 1.27.0
opentelemetry-sdk 1.27.0
opentelemetry-semantic-conventions 0.48b0
opentelemetry-util-http 0.48b0
openvino 2022.3.0
optimum 1.16.1
orderly-set 5.2.2
orjson 3.10.7
outcome 1.3.0.post0
overrides 7.7.0
packaging 24.1
pandas 2.0.2
pathvalidate 3.2.1
pdf2image 1.17.0
pdfminer.six 20221105
pdfplumber 0.10.4
peft 0.13.1
pikepdf 9.3.0
pillow 10.4.0
pillow_heif 0.18.0
pip 23.0.1
pip-licenses 5.0.0
platformdirs 4.3.6
playwright 1.47.0
plotly 5.24.1
pluggy 1.5.0
pooch 1.8.2
portalocker 2.10.1
posthog 3.7.0
pprintpp 0.4.0
prettytable 3.11.0
primp 0.6.3
priority 2.0.0
propcache 0.2.0
proto-plus 1.24.0
protobuf 4.25.5
psutil 6.0.0
pulsar-client 3.5.0
py7zr 0.22.0
pyarrow 17.0.0
pyarrow-hotfix 0.6
pyasn1 0.6.1
pyasn1_modules 0.4.1
pybcj 1.0.2
pybind11 2.13.6
pyclipper 1.3.0.post5
pycocotools 2.0.8
pycparser 2.22
pycryptodomex 3.21.0
pydantic 2.9.2
pydantic_core 2.23.4
pydantic-settings 2.1.0
pydash 8.0.3
pydub 0.25.1
pydyf 0.11.0
pyee 12.0.0
Pygments 2.18.0
pymongo 4.8.0
PyMuPDF 1.24.11
pynvml 11.5.3
pypandoc 1.14
pypandoc_binary 1.14
pyparsing 3.1.4
pypdf 5.0.1
pypdfium2 4.30.0
pyphen 0.16.0
PyPika 0.48.9
pyppmd 1.1.0
pyproject-api 1.8.0
pyproject_hooks 1.2.0
pyreadline3 3.5.4
PySocks 1.7.1
pytablewriter 1.2.0
pytesseract 0.3.13
pytest 8.3.3
pytest-xdist 3.6.1
python-crfsuite 0.9.11
python-dateutil 2.8.2
python-doctr 0.5.4a0
python-docx 1.1.2
python-dotenv 1.0.1
python-iso639 2024.4.27
python-magic 0.4.27
python-magic-bin 0.4.14
python-multipart 0.0.12
python-pptx 0.6.23
pytube 15.0.0
pytz 2024.2
pywin32 307
PyYAML 6.0.2
pyzstd 0.16.1
RapidFuzz 3.10.0
rarfile 4.2
referencing 0.35.1
regex 2024.9.11
replicate 0.20.0
requests 2.32.3
requests-file 2.1.0
requests-oauthlib 2.0.0
requests-toolbelt 1.0.0
responses 0.18.0
retrying 1.3.4
rich 13.9.2
rootpath 0.1.1
rouge 1.0.1
rouge_score 0.1.2
rpds-py 0.20.0
rsa 4.9
ruff 0.6.9
s3transfer 0.10.2
sacrebleu 2.3.1
safetensors 0.4.5
scikit-image 0.24.0
scikit-learn 1.2.2
scipy 1.13.1
selenium 4.25.0
semantic-version 2.10.0
semanticscholar 0.8.4
sentence-transformers 2.2.2
sentencepiece 0.1.99
setuptools 65.5.0
sgmllib3k 1.0.0
Shapely 1.8.5.post1
shellingham 1.5.4
six 1.16.0
sniffio 1.3.1
sortedcontainers 2.4.0
soundfile 0.12.1
soupsieve 2.6
soxr 0.5.0.post1
SQLAlchemy 2.0.35
sqlitedict 2.1.0
sse-starlette 0.10.3
sseclient-py 1.8.0
starlette 0.38.6
strawberry-graphql 0.246.0
sympy 1.13.3
tabledata 1.3.3
tabulate 0.9.0
taskgroup 0.0.0a4
tcolorpy 0.1.6
tenacity 8.5.0
termcolor 2.5.0
text-generation 0.7.0
textstat 0.7.4
texttable 1.7.0
threadpoolctl 3.5.0
tifffile 2024.9.20
tiktoken 0.8.0
timm 1.0.9
tinycss2 1.3.0
tokenizers 0.19.1
toml 0.10.2
tomli 2.0.2
tomlkit 0.12.0
torch 2.1.2+cu118
torchvision 0.16.2+cu118
tox 4.21.2
tqdm 4.66.5
tqdm-multiprocess 0.0.11
transformers 4.40.2
trio 0.26.2
trio-websocket 0.11.1
typepy 1.3.2
typer 0.12.5
typing_extensions 4.12.2
typing-inspect 0.9.0
tzdata 2024.2
tzlocal 5.2
ujson 5.10.0
Unidecode 1.3.8
universal-analytics-python3 1.1.1
unstructured 0.12.5
unstructured-client 0.26.0
unstructured-inference 0.7.23
unstructured.pytesseract 0.3.13
urllib3 2.2.3
uvicorn 0.31.0
validators 0.34.0
virtualenv 20.26.6
voxel51-eta 0.13.0
watchfiles 0.24.0
wavio 0.0.8
wcwidth 0.2.13
weasyprint 62.3
weaviate-client 4.8.1
webencodings 0.5.1
websocket-client 1.8.0
websockets 11.0.3
wikipedia 1.4.0
wolframalpha 5.1.3
word2number 1.1
wrapt 1.16.0
wsproto 1.2.0
xlrd 2.0.1
XlsxWriter 3.2.0
xmltodict 0.13.0
xxhash 3.5.0
yarl 1.14.0
yt-dlp 2023.10.13
zipp 3.20.2
zopfli 0.2.3
zstandard 0.23.0
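The list above contains builds with different local version tags (for example `+cu118`, `+cu121avx`, and `+cpuavx2`), while `nvcc` reports CUDA 11.8. The sketch below pulls those tags out of `pip list` output so they can be compared at a glance. `build_tags` is a hypothetical helper, and the sample lines are copied from the list above. Whether the mixed tags are related to the slow response is not established here.

```python
def build_tags(pip_list_text: str) -> dict[str, str]:
    """Map package name -> local build tag (the text after '+') where one exists."""
    tags = {}
    for line in pip_list_text.splitlines():
        parts = line.split()
        # Expect "name version"; skip headers, separators, and untagged versions.
        if len(parts) == 2 and "+" in parts[1]:
            tags[parts[0]] = parts[1].split("+", 1)[1]
    return tags


# Sample lines copied from the pip list in this report.
sample = """\
Package Version
autoawq 0.1.8+cu118
llama_cpp_python 0.2.26+cpuavx2
llama_cpp_python_cuda 0.2.26+cu121avx
numpy 1.23.4
torch 2.1.2+cu118
"""

tags = build_tags(sample)
for name, tag in tags.items():
    print(f"{name}: {tag}")
```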