Memory used to store models of closed browser sessions persists #1324
Are you referring to CPU or GPU memory? Thanks!

@pseudotensor Both.

Hi @ml-l, thanks for finding this. Some clean-up a while back led to the issue. I pushed a fix for the case where loading a new model or unloading a model left memory still consumed. Does that solve your problem? Sorry for the long delay in the fix.

No worries regarding the delay.

I'm confident the continued GPU use is fixed; I confirmed the leak was there and that I fixed it. As for continued CPU use, I also saw that was fixed. If you give me the specific sequence of steps you are taking, I can take a look.
It could very well be how I've configured things, or maybe where/how I'm checking doesn't align with where you've put in the fix? My sequence of actions, running in GPU mode, is as follows:

1: Removed my current gcr.io/vorvan/h2oai/h2ogpt-runtime:latest Docker image to ensure that the latest one is downloaded.
3: Ran the following two commands to run h2oGPT in GPU mode and to be able to choose models dynamically:

export GRADIO_SERVER_PORT=7860
sudo docker run \
--gpus device=0 \
--runtime=nvidia \
--shm-size=2g \
-p $GRADIO_SERVER_PORT:$GRADIO_SERVER_PORT \
--rm --init \
-v /etc/passwd:/etc/passwd:ro \
-v /etc/group:/etc/group:ro \
-u `id -u`:`id -g` \
-v /mnt/alpha/.cache:/workspace/.cache \
-v /mnt/alpha/h2ogpt_share/save:/workspace/save \
-v /mnt/alpha/h2ogpt_share/user_path:/workspace/user_path \
-v /mnt/alpha/h2ogpt_share/db_dir_UserData:/workspace/db_dir_UserData \
-v /mnt/alpha/h2ogpt_share/users:/workspace/users \
-v /mnt/alpha/h2ogpt_share/db_nonusers:/workspace/db_nonusers \
-v /mnt/alpha/h2ogpt_share/llamacpp_path:/workspace/llamacpp_path \
-v /mnt/alpha/h2ogpt_share/h2ogpt_auth:/workspace/h2ogpt_auth \
-e USER=someone \
gcr.io/vorvan/h2oai/h2ogpt-runtime:latest /workspace/generate.py \
--use_safetensors=True \
--save_dir='/workspace/save/' \
--use_gpu_id=False \
--user_path=/workspace/user_path \
--langchain_mode="LLM" \
--langchain_modes="['UserData', 'LLM']" \
--score_model=None \
--max_max_new_tokens=2048 \
--max_new_tokens=1024

At this point, idle GPU0 usage is …

4: Opened an incognito/private browser (Firefox in my case, but I don't think this should matter) to the hosted h2oGPT instance.
5: Opened …
6: Clicked …
7: Closed the browser that's connected to h2oGPT. GPU0 usage remains …
8: Re-opened a browser in incognito/private mode to go to the hosted h2oGPT instance again. GPU0 usage remains …
9: Repeated step 5 to load in another zephyr model, to see if the fix was preventing multiple copies of the same model being loaded. GPU0 usage is now …
10: Clicked …
11: Closed the browser session and checked again; GPU0 usage remains …

Only once I stop the Docker container that's running h2oGPT does GPU0 memory usage go back to …

And double-checking the hash of the Docker image I'm using, the output of sudo docker inspect --format='{{index .RepoDigests 0}}' gcr.io/vorvan/h2oai/h2ogpt-runtime:latest is the following: …
EDIT: Step 9 should've said to repeat steps 5 and 6 rather than step 3, i.e. load in another zephyr model (to check whether the fix worked by preventing extra memory being allocated for the same model being loaded in), rather than spinning up another Docker container.
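For reference, the way I watch GPU0 memory between these steps is nvidia-smi's query mode (a generic monitoring sketch; any equivalent tool works):

# Print GPU0 memory use once per second; Ctrl-C to stop.
watch -n 1 nvidia-smi --id=0 --query-gpu=memory.used,memory.total --format=csv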
Ah yes, if you do the step:

…

then the server loses track of who you are, and the model stays associated with that prior user. The problem is this: gradio-app/gradio#4016. I'm unsure how to work around it.
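In Gradio terms, each browser tab gets its own session, and per-session values are orphaned server-side once the tab closes, with (at the time of this issue) no callback to clean them up. A tiny illustration of the per-session behavior, assuming stock Gradio rather than h2oGPT code:

import gradio as gr

with gr.Blocks() as demo:
    # gr.State is per-session: each new tab starts with its own copy, and the
    # old tab's copy is orphaned on the server when that tab closes.
    counter = gr.State(0)
    btn = gr.Button("Increment")
    out = gr.Number(label="Clicks this session")

    def bump(n):
        return n + 1, n + 1

    btn.click(bump, counter, [counter, out])

demo.launch()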
Ok, I'm following along. First I have:

…
I modified gradio to be able to do this now.
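For anyone looking for the general shape of such a fix: recent Gradio versions expose an unload event that fires when a tab closes. A minimal sketch of the pattern, assuming a Gradio version with Blocks.unload and gr.Request injection in the handler (SESSION_MODELS and the handler names are illustrative, not h2oGPT's internals):

import gc

import gradio as gr
import torch

SESSION_MODELS = {}  # illustrative per-session registry; h2oGPT's real state handling differs

def load_model(name, request: gr.Request):
    # Key the loaded model by session hash so the unload handler can find it.
    # A string stands in here for a real model object.
    SESSION_MODELS[request.session_hash] = name
    return f"Loaded {name}"

def free_session(request: gr.Request):
    # Drop the departing session's model reference, then flush CUDA's cache
    # so the memory is actually returned to the driver.
    SESSION_MODELS.pop(request.session_hash, None)
    gc.collect()
    if torch.cuda.is_available():
        torch.cuda.empty_cache()

with gr.Blocks() as demo:
    name = gr.Textbox(label="Model name")
    status = gr.Textbox(label="Status")
    name.submit(load_model, name, status)
    demo.unload(free_session)  # fires when the browser tab closes or refreshes

demo.launch()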
Fixes #1324 -- clear memory when browser tab closes
When running h2oGPT through Docker (gcr.io/vorvan/h2oai/h2ogpt-runtime:0.1.0) without pre-selecting a base model, in order to choose models dynamically after connecting to the instance through a browser session, the memory allocated for any models loaded in that session remains allocated and inaccessible after the browser session is closed, unless the models are unloaded before closing.
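For context on what "unloading" has to involve: in PyTorch, GPU memory is returned only after every Python reference to the model is dropped and the CUDA caching allocator is flushed. A minimal sketch under that assumption (illustrative names, not h2oGPT's actual code):

import gc

import torch

def unload_model(state: dict) -> None:
    # Drop the reference held in session state; while any reference survives,
    # nvidia-smi keeps reporting the memory as used.
    state.pop("model", None)
    gc.collect()               # collect reference cycles that may pin CUDA tensors
    torch.cuda.empty_cache()   # return cached blocks to the CUDA driver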