
How do I add a model to run it locally? #851

Closed
gabriead opened this issue Sep 15, 2023 · 12 comments


gabriead commented Sep 15, 2023

Dear community,
I have set the repo up and running on my Windows machine. However, I do not understand how I can add a model to run it locally. I have downloaded the model 'llama-2-7b-chat.ggmlv3.q8_0.bin' and placed it in the h2oGPT root folder. Then in the UI I selected it like so:

[screenshot of the model selected in the UI]

But if I enter anything in the prompt, I get this error on the console:
AssertionError: Please choose a base model with --base_model (CLI) or load in Models Tab (gradio). Then start New Conversation
Can anyone point me in the right direction on how to do this correctly? Thanks a lot!


pseudotensor commented Sep 15, 2023

Did you click the top button, i.e. "Download/Load Model"?

We are aware the coloring is not great. It's a limitation of Gradio that buttons and info labels look the same. We have some work on it: #818


gabriead commented Sep 15, 2023

Hi @pseudotensor, yes, I did that, but I still get the same error as above.

@pseudotensor (Collaborator)

I presume it's not finding the file. I haven't had issues. Are you able to diagnose?

@natlamir

I am having a similar issue, but with the installation from the One Click Windows GPU CUDA Installer. I can't figure out how to load a model. Do I need to place the .bin file at a specific location? When I select something from the dropdown and click the "Load-Unload Model / LORA" button on the right, I get this error in the top right:

[screenshot of the error message]

@pseudotensor (Collaborator)

@natlamir It seems to have trouble writing some files, probably a permissions issue on the disk where it was installed.

One can debug like this: #652 (comment)

i.e. using python instead of pythonw and running in a Windows command-line terminal.

If you use llama as the base_model, then you can provide a GGML link from TheBloke. I give details here: https://github.com/h2oai/h2ogpt/blob/main/docs/FAQ.md#adding-models (a rough sketch of the resulting command is below).

But your issue is some permissions problem; a stack trace from the command-line output would help. Thanks!
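
For reference, the FAQ's approach amounts to roughly the following invocation (a sketch only, not definitive syntax; the flags --model_path_llama and --prompt_type are assumed from the docs of that era and may differ across h2oGPT versions):

# 'llama' tells h2oGPT the architecture family; the GGML file itself
# is passed separately as a URL or local path.
python generate.py --base_model=llama --prompt_type=llama2 --model_path_llama=https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGML/resolve/main/llama-2-7b-chat.ggmlv3.q8_0.bin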

@natlamir

@pseudotensor Thanks for the fast reply. I tried running it through the command line to get the stack trace, and it works just fine that way! (I was using a non-elevated command prompt.) Previously I was trying to run it by clicking the icon in the Start menu on Windows 10, and that is when it was erroring. So now I am able to download and use the model I was trying to.

I tried using the GGML link from TheBloke you mentioned. Let me know if I am missing a step for doing this through the UI.

In the Models tab, on the bottom left textbox titled "New Model name/path/URL" I enter: https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGML/resolve/main/llama-2-7b-chat.ggmlv3.q8_0.bin

Then I click the "Add new Model, Lora, Server url:port" button on the bottom right. This auto-populated the "Choose Base Model" dropdown at the top with the URL I entered in the textbox. Then I click the "Load-Unload Model/LORA" button on the top right, and it downloads the 7 GB file, but then errors. Here is the command-line output / stack trace of the error (the file referenced in the error, in the Temp folder, appears to be the model file without an extension; it is 7 GB):

C:\Users\root>C:\Users\root\AppData\Local\Programs\h2oGPT\Python\python.exe "C:\Users\root\AppData\Local\Programs\h2oGPT\h2oGPT.launch.pyw"
file: C:\Users\root\AppData\Local\Programs\h2oGPT\pkgs\win_run_app.py
path1 C:\Users\root\AppData\Local\Programs\h2oGPT\pkgs
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.3\bin;C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.3\libnvvp;C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.8\bin;C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.8\libnvvp;C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.6\bin;C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.6\libnvvp;C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.7\bin;C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.7\libnvvp;C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.1\bin;C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.1\libnvvp;C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1\bin;C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1\libnvvp;C:\WINDOWS\system32;C:\WINDOWS;C:\WINDOWS\System32\Wbem;C:\WINDOWS\System32\WindowsPowerShell\v1.0;C:\WINDOWS\System32\OpenSSH;C:\Program Files\Git\cmd;C:\Program Files (x86)\WinMerge;C:\Program Files (x86)\NVIDIA Corporation\PhysX\Common;C:\Program Files\NVIDIA Corporation\NVIDIA NvDLISR;C:\Program Files\NVIDIA Corporation\Nsight Compute 2021.1.0;C:\Program Files\dotnet;C:\Users\root\AppData\Local\Programs\Python\Python310\Scripts;C:\Users\root\AppData\Local\Programs\Python\Python310;C:\Users\root\AppData\Local\Microsoft\WindowsApps;C:\tools\ffmpeg\bin;C:\Users\root\AppData\Local\ffmpegio\ffmpeg-downloader\ffmpeg\bin;C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.36.32532\bin\Hostx64\x64;C:\tools;C:\Users\root\AppData\Roaming\Python\Python310\Scripts;C:\Users\root\AppData\Local\Google\Cloud SDK\google-cloud-sdk\bin;C:\Users\root\AppData\Roaming\Python\Python38\Scripts;C:\Users\root.dotnet\tools;C:\MinGW\bin;C:\MinGW\mingw32\bin;;C:\Users\root\AppData\Local\Programs\h2oGPT\poppler/Library/bin/;C:\Users\root\AppData\Local\Programs\h2oGPT\poppler/Library/lib/;C:\Users\root\AppData\Local\Programs\h2oGPT\Tesseract-OCRC:\Users\root\AppData\Local\Programs\h2oGPT\ms-playwrightC:\Users\root\AppData\Local\Programs\h2oGPT\ms-playwright/chromium-1076/chrome-winC:\Users\root\AppData\Local\Programs\h2oGPT\ms-playwright/ffmpeg-1009C:\Users\root\AppData\Local\Programs\h2oGPT\ms-playwright/firefox-1422/firefoxC:\Users\root\AppData\Local\Programs\h2oGPT\ms-playwright/webkit-1883
C:\Users\root\AppData\Local\Programs\h2oGPT..\src
C:\Users\root\AppData\Local\Programs\h2oGPT\pkgs..\src
C:\Users\root\AppData\Local\Programs\h2oGPT..\iterators
C:\Users\root\AppData\Local\Programs\h2oGPT\pkgs..\iterators
C:\Users\root\AppData\Local\Programs\h2oGPT..\gradio_utils
C:\Users\root\AppData\Local\Programs\h2oGPT\pkgs..\gradio_utils
C:\Users\root\AppData\Local\Programs\h2oGPT..\metrics
C:\Users\root\AppData\Local\Programs\h2oGPT\pkgs..\metrics
C:\Users\root\AppData\Local\Programs\h2oGPT..\models
C:\Users\root\AppData\Local\Programs\h2oGPT\pkgs..\models
C:\Users\root\AppData\Local\Programs\h2oGPT...
C:\Users\root\AppData\Local\Programs\h2oGPT\pkgs...
Auto set langchain_mode=LLM. Could use MyData instead. To allow UserData to pull files from disk, set user_path or langchain_mode_paths, and ensure allow_upload_to_user_data=True
Prep: persist_directory=db_dir_UserData exists, using
Prep: persist_directory= does not exist, regenerating
Did not generate db since no sources
Prep: persist_directory= does not exist, regenerating
Did not generate db since no sources
favicon_path1=h2o-logo.svg not found
favicon_path2: h2o-logo.svg not found in C:\Users\root\AppData\Local\Programs\h2oGPT\src
Running on local URL: http://0.0.0.0:7860

To create a public link, set share=True in launch().
Starting get_model: https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGML/resolve/main/llama-2-7b-chat.ggmlv3.q8_0.bin
C:\Users\root\AppData\Local\Programs\h2oGPT\pkgs\transformers\utils\hub.py:575: UserWarning: Using from_pretrained with the url of a file (here https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGML/resolve/main/llama-2-7b-chat.ggmlv3.q8_0.bin) is deprecated and won't be possible anymore in v5 of Transformers. You should host your file on the Hub (hf.co) instead and use the repository ID. Note that this is not compatible with the caching system (your file will be downloaded at each execution) or multiple processes (each process will download the file in a different temporary file).
warnings.warn(
Downloading (…)chat.ggmlv3.q8_0.bin: 100%|█████████████████████████████████████████| 7.16G/7.16G [01:07<00:00, 107MB/s]
Traceback (most recent call last):
File "C:\Users\root\AppData\Local\Programs\h2oGPT\pkgs\transformers\configuration_utils.py", line 659, in _get_config_dict
config_dict = cls._dict_from_json_file(resolved_config_file)
File "C:\Users\root\AppData\Local\Programs\h2oGPT\pkgs\transformers\configuration_utils.py", line 750, in _dict_from_json_file
text = reader.read()
File "codecs.py", line 322, in decode
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 28: invalid start byte

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "C:\Users\root\AppData\Local\Programs\h2oGPT\pkgs\gradio\routes.py", line 488, in run_predict
output = await app.get_blocks().process_api(
File "C:\Users\root\AppData\Local\Programs\h2oGPT\pkgs\gradio\blocks.py", line 1431, in process_api
result = await self.call_function(
File "C:\Users\root\AppData\Local\Programs\h2oGPT\pkgs\gradio\blocks.py", line 1109, in call_function
prediction = await anyio.to_thread.run_sync(
File "C:\Users\root\AppData\Local\Programs\h2oGPT\pkgs\anyio\to_thread.py", line 33, in run_sync
return await get_asynclib().run_sync_in_worker_thread(
File "C:\Users\root\AppData\Local\Programs\h2oGPT\pkgs\anyio_backends_asyncio.py", line 877, in run_sync_in_worker_thread
return await future
File "C:\Users\root\AppData\Local\Programs\h2oGPT\pkgs\anyio_backends_asyncio.py", line 807, in run
result = context.run(func, *args)
File "C:\Users\root\AppData\Local\Programs\h2oGPT\pkgs\gradio\utils.py", line 706, in wrapper
response = f(*args, **kwargs)
File "C:\Users\root\AppData\Local\Programs\h2oGPT\pkgs..\src\gradio_runner.py", line 3279, in load_model
model1, tokenizer1, device1 = get_model(reward_type=False,
File "C:\Users\root\AppData\Local\Programs\h2oGPT\pkgs..\src\gen.py", line 1288, in get_model
config, _, max_seq_len = get_config(base_model, **config_kwargs, raise_exception=False)
File "C:\Users\root\AppData\Local\Programs\h2oGPT\pkgs..\src\gen.py", line 1007, in get_config
config = AutoConfig.from_pretrained(base_model, use_auth_token=use_auth_token,
File "C:\Users\root\AppData\Local\Programs\h2oGPT\pkgs\transformers\models\auto\configuration_auto.py", line 944, in from_pretrained
config_dict, unused_kwargs = PretrainedConfig.get_config_dict(pretrained_model_name_or_path, **kwargs)
File "C:\Users\root\AppData\Local\Programs\h2oGPT\pkgs\transformers\configuration_utils.py", line 574, in get_config_dict
config_dict, kwargs = cls._get_config_dict(pretrained_model_name_or_path, **kwargs)
File "C:\Users\root\AppData\Local\Programs\h2oGPT\pkgs\transformers\configuration_utils.py", line 662, in _get_config_dict
raise EnvironmentError(
OSError: It looks like the config file at 'C:\Users\root\AppData\Local\Temp\tmps1rk02tn' is not a valid JSON file.


pseudotensor commented Sep 20, 2023

You can't pass a GGML model to --base_model. See: https://github.com/h2oai/h2ogpt/blob/main/docs/FAQ.md#adding-models

For GGML, use 'llama' as the base_model; more options will then appear in the UI. Then put the model URL into the llama model path field, as sketched below.
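
To make the distinction concrete (a hypothetical sketch; exact flag names may vary by h2oGPT version):

# Wrong: passing the GGML URL as base_model makes transformers' AutoConfig
# try to parse the binary file as a JSON config, which is exactly the
# UnicodeDecodeError/OSError seen in the trace above.
python generate.py --base_model=https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGML/resolve/main/llama-2-7b-chat.ggmlv3.q8_0.bin

# Right: base_model names the architecture; the GGML URL goes in the
# separate model path option.
python generate.py --base_model=llama --model_path_llama=https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGML/resolve/main/llama-2-7b-chat.ggmlv3.q8_0.bin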


pseudotensor commented Sep 20, 2023

For general offline use, see the updates here: https://github.com/h2oai/h2ogpt/blob/main/docs/README_offline.md from #877

@chengjia604

[quoting the original post] Dear community, I have set up the repo and gotten it running on my Windows machine. However, I do not understand how to add a model to run it locally on my machine. I have downloaded the model 'llama-2-7b-chat.ggmlv3.q8_0.bin' and placed it in the h2oGPT root folder. Then in the UI I selected it as shown: [screenshot]

But if I enter anything in the prompt, I get this error on the console: AssertionError: Please choose a base model with --base_model (CLI) or load in Models Tab (gradio). Then start New Conversation. Can anyone point me in the right direction on how to do this correctly? Thanks a lot!

Hello, has your problem been solved? I also encountered the same problem.

@pseudotensor (Collaborator)

The "base_model" is llama for that model. Once you choose "llama" another view will pop-up to enter the llama model path or url. Then you click on the "Download/Load Models" button at top. We'll try to improve the UX.

@Blue-newai

Hi, I am using an Ubuntu laptop. I have tried many times to install h2oGPT locally through the terminal, but I keep running into issues and can't figure out what is going wrong. Can you help me?

@pseudotensor (Collaborator)

@Blue-newai If you have a problem, you should post a new issue with the details. I've tried to make it easier and easier to install and use.
