
How do I add a model to run it locally? #851

Closed
gabriead opened this issue Sep 15, 2023 · 12 comments


gabriead commented Sep 15, 2023

Dear community,
I have set the repo up and running on my Windows machine. However, I do not understand how I can add a model to run it locally. I have downloaded the model 'llama-2-7b-chat.ggmlv3.q8_0.bin' and placed it in the h2oGPT root folder. Then in the UI I selected it like so:

[screenshot of the model selected in the UI]

But if I enter anything in the prompt, I get this error on the console:
AssertionError: Please choose a base model with --base_model (CLI) or load in Models Tab (gradio). Then start New Conversation
Can anyone point me in the right direction on how to do this correctly? Thanks a lot!


pseudotensor commented Sep 15, 2023

Did you click the top button, i.e. "Download/Load Model"?

We are aware the coloring is not great. It's a limitation of Gradio that buttons and info labels look the same. We have some work on it: #818


gabriead commented Sep 15, 2023

Hi @pseudotensor, yes, I did that, but I still get the same error as above.

@pseudotensor (Collaborator)

I presume it's not finding the file. I haven't had issues. Are you able to diagnose?

@natlamir

I am having a similar issue, but with the installation from the One Click Windows GPU CUDA Installer. I can't figure out how to load a model. Do I need to place the .bin file at a specific location? When I select something from the dropdown and click the "Load-Unload Model / LORA" button on the right, I get this error in the top right:

[screenshot of the error message]

@pseudotensor (Collaborator)

@natlamir It seems to have trouble writing some files, probably a permissions issue on the disk where it was installed.

One can debug like this: #652 (comment)

i.e. using python instead of pythonw and running in a Windows command-line terminal.

If you use llama as the base_model, then you can provide a GGML link from TheBloke. I give details here: https://github.com/h2oai/h2ogpt/blob/main/docs/FAQ.md#adding-models (a rough sketch of the resulting command is below).

But your issue is some permissions problem; a stack trace from the command-line output would help. Thanks!
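
For reference, the FAQ's approach amounts to roughly the following invocation (a sketch only, not definitive syntax; the flags --model_path_llama and --prompt_type are assumed from the docs of that era and may differ across h2oGPT versions):

# 'llama' tells h2oGPT the architecture family; the GGML file itself
# is passed separately as a URL or local path.
python generate.py --base_model=llama --prompt_type=llama2 --model_path_llama=https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGML/resolve/main/llama-2-7b-chat.ggmlv3.q8_0.bin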

@natlamir

@pseudotensor Thanks for the fast reply. I tried running it through the command line to get the stack trace, and it works just fine that way! (I was using a non-elevated command prompt.) Previously I was trying to run it by clicking the icon in the Start menu on Windows 10, and that is when it was erroring. So now I am able to download and use the model I was trying to.

I tried using the GGML link from TheBloke you mentioned. Let me know if I am missing a step for doing this through the UI.

In the Models tab, on the bottom left textbox titled "New Model name/path/URL" I enter: https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGML/resolve/main/llama-2-7b-chat.ggmlv3.q8_0.bin

Then I click the "Add new Model, Lora, Server url:port" button on the bottom right. This auto-populated the "Choose Base Model" dropdown at the top with the URL I entered in the textbox. Then I click the "Load-Unload Model/LORA" button on the top right, and it downloads the 7 GB file, but then errors. Here is the command-line output / stack trace of the error (the file referenced in the error, in the Temp folder, appears to be the model file without an extension; it is 7 GB):

C:\Users\root>C:\Users\root\AppData\Local\Programs\h2oGPT\Python\python.exe "C:\Users\root\AppData\Local\Programs\h2oGPT\h2oGPT.launch.pyw"
file: C:\Users\root\AppData\Local\Programs\h2oGPT\pkgs\win_run_app.py
path1 C:\Users\root\AppData\Local\Programs\h2oGPT\pkgs
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.3\bin;C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.3\libnvvp;C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.8\bin;C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.8\libnvvp;C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.6\bin;C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.6\libnvvp;C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.7\bin;C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.7\libnvvp;C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.1\bin;C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.1\libnvvp;C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1\bin;C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1\libnvvp;C:\WINDOWS\system32;C:\WINDOWS;C:\WINDOWS\System32\Wbem;C:\WINDOWS\System32\WindowsPowerShell\v1.0;C:\WINDOWS\System32\OpenSSH;C:\Program Files\Git\cmd;C:\Program Files (x86)\WinMerge;C:\Program Files (x86)\NVIDIA Corporation\PhysX\Common;C:\Program Files\NVIDIA Corporation\NVIDIA NvDLISR;C:\Program Files\NVIDIA Corporation\Nsight Compute 2021.1.0;C:\Program Files\dotnet;C:\Users\root\AppData\Local\Programs\Python\Python310\Scripts;C:\Users\root\AppData\Local\Programs\Python\Python310;C:\Users\root\AppData\Local\Microsoft\WindowsApps;C:\tools\ffmpeg\bin;C:\Users\root\AppData\Local\ffmpegio\ffmpeg-downloader\ffmpeg\bin;C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.36.32532\bin\Hostx64\x64;C:\tools;C:\Users\root\AppData\Roaming\Python\Python310\Scripts;C:\Users\root\AppData\Local\Google\Cloud SDK\google-cloud-sdk\bin;C:\Users\root\AppData\Roaming\Python\Python38\Scripts;C:\Users\root.dotnet\tools;C:\MinGW\bin;C:\MinGW\mingw32\bin;;C:\Users\root\AppData\Local\Programs\h2oGPT\poppler/Library/bin/;C:\Users\root\AppData\Local\Programs\h2oGPT\poppler/Library/lib/;C:\Users\root\AppData\Local\Programs\h2oGPT\Tesseract-OCRC:\Users\root\AppData\Local\Programs\h2oGPT\ms-playwrightC:\Users\root\AppData\Local\Programs\h2oGPT\ms-playwright/chromium-1076/chrome-winC:\Users\root\AppData\Local\Programs\h2oGPT\ms-playwright/ffmpeg-1009C:\Users\root\AppData\Local\Programs\h2oGPT\ms-playwright/firefox-1422/firefoxC:\Users\root\AppData\Local\Programs\h2oGPT\ms-playwright/webkit-1883
C:\Users\root\AppData\Local\Programs\h2oGPT..\src
C:\Users\root\AppData\Local\Programs\h2oGPT\pkgs..\src
C:\Users\root\AppData\Local\Programs\h2oGPT..\iterators
C:\Users\root\AppData\Local\Programs\h2oGPT\pkgs..\iterators
C:\Users\root\AppData\Local\Programs\h2oGPT..\gradio_utils
C:\Users\root\AppData\Local\Programs\h2oGPT\pkgs..\gradio_utils
C:\Users\root\AppData\Local\Programs\h2oGPT..\metrics
C:\Users\root\AppData\Local\Programs\h2oGPT\pkgs..\metrics
C:\Users\root\AppData\Local\Programs\h2oGPT..\models
C:\Users\root\AppData\Local\Programs\h2oGPT\pkgs..\models
C:\Users\root\AppData\Local\Programs\h2oGPT...
C:\Users\root\AppData\Local\Programs\h2oGPT\pkgs...
Auto set langchain_mode=LLM. Could use MyData instead. To allow UserData to pull files from disk, set user_path or langchain_mode_paths, and ensure allow_upload_to_user_data=True
Prep: persist_directory=db_dir_UserData exists, using
Prep: persist_directory= does not exist, regenerating
Did not generate db since no sources
Prep: persist_directory= does not exist, regenerating
Did not generate db since no sources
favicon_path1=h2o-logo.svg not found
favicon_path2: h2o-logo.svg not found in C:\Users\root\AppData\Local\Programs\h2oGPT\src
Running on local URL: http://0.0.0.0:7860

To create a public link, set share=True in launch().
Starting get_model: https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGML/resolve/main/llama-2-7b-chat.ggmlv3.q8_0.bin
C:\Users\root\AppData\Local\Programs\h2oGPT\pkgs\transformers\utils\hub.py:575: UserWarning: Using from_pretrained with the url of a file (here https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGML/resolve/main/llama-2-7b-chat.ggmlv3.q8_0.bin) is deprecated and won't be possible anymore in v5 of Transformers. You should host your file on the Hub (hf.co) instead and use the repository ID. Note that this is not compatible with the caching system (your file will be downloaded at each execution) or multiple processes (each process will download the file in a different temporary file).
warnings.warn(
Downloading (…)chat.ggmlv3.q8_0.bin: 100%|█████████████████████████████████████████| 7.16G/7.16G [01:07<00:00, 107MB/s]
Traceback (most recent call last):
File "C:\Users\root\AppData\Local\Programs\h2oGPT\pkgs\transformers\configuration_utils.py", line 659, in _get_config_dict
config_dict = cls._dict_from_json_file(resolved_config_file)
File "C:\Users\root\AppData\Local\Programs\h2oGPT\pkgs\transformers\configuration_utils.py", line 750, in _dict_from_json_file
text = reader.read()
File "codecs.py", line 322, in decode
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 28: invalid start byte

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "C:\Users\root\AppData\Local\Programs\h2oGPT\pkgs\gradio\routes.py", line 488, in run_predict
output = await app.get_blocks().process_api(
File "C:\Users\root\AppData\Local\Programs\h2oGPT\pkgs\gradio\blocks.py", line 1431, in process_api
result = await self.call_function(
File "C:\Users\root\AppData\Local\Programs\h2oGPT\pkgs\gradio\blocks.py", line 1109, in call_function
prediction = await anyio.to_thread.run_sync(
File "C:\Users\root\AppData\Local\Programs\h2oGPT\pkgs\anyio\to_thread.py", line 33, in run_sync
return await get_asynclib().run_sync_in_worker_thread(
File "C:\Users\root\AppData\Local\Programs\h2oGPT\pkgs\anyio_backends_asyncio.py", line 877, in run_sync_in_worker_thread
return await future
File "C:\Users\root\AppData\Local\Programs\h2oGPT\pkgs\anyio_backends_asyncio.py", line 807, in run
result = context.run(func, *args)
File "C:\Users\root\AppData\Local\Programs\h2oGPT\pkgs\gradio\utils.py", line 706, in wrapper
response = f(*args, **kwargs)
File "C:\Users\root\AppData\Local\Programs\h2oGPT\pkgs..\src\gradio_runner.py", line 3279, in load_model
model1, tokenizer1, device1 = get_model(reward_type=False,
File "C:\Users\root\AppData\Local\Programs\h2oGPT\pkgs..\src\gen.py", line 1288, in get_model
config, _, max_seq_len = get_config(base_model, **config_kwargs, raise_exception=False)
File "C:\Users\root\AppData\Local\Programs\h2oGPT\pkgs..\src\gen.py", line 1007, in get_config
config = AutoConfig.from_pretrained(base_model, use_auth_token=use_auth_token,
File "C:\Users\root\AppData\Local\Programs\h2oGPT\pkgs\transformers\models\auto\configuration_auto.py", line 944, in from_pretrained
config_dict, unused_kwargs = PretrainedConfig.get_config_dict(pretrained_model_name_or_path, **kwargs)
File "C:\Users\root\AppData\Local\Programs\h2oGPT\pkgs\transformers\configuration_utils.py", line 574, in get_config_dict
config_dict, kwargs = cls._get_config_dict(pretrained_model_name_or_path, **kwargs)
File "C:\Users\root\AppData\Local\Programs\h2oGPT\pkgs\transformers\configuration_utils.py", line 662, in _get_config_dict
raise EnvironmentError(
OSError: It looks like the config file at 'C:\Users\root\AppData\Local\Temp\tmps1rk02tn' is not a valid JSON file.


pseudotensor commented Sep 20, 2023

You can't pass a GGML model to --base_model. See: https://github.com/h2oai/h2ogpt/blob/main/docs/FAQ.md#adding-models

For GGML, use 'llama' as the base_model; more options will then appear in the UI. Then put the model URL into the llama model path field, as sketched below.
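
To make the distinction concrete (a hypothetical sketch; exact flag names may vary by h2oGPT version):

# Wrong: passing the GGML URL as base_model makes transformers' AutoConfig
# try to parse the binary file as a JSON config, which is exactly the
# UnicodeDecodeError/OSError seen in the trace above.
python generate.py --base_model=https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGML/resolve/main/llama-2-7b-chat.ggmlv3.q8_0.bin

# Right: base_model names the architecture; the GGML URL goes in the
# separate model path option.
python generate.py --base_model=llama --model_path_llama=https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGML/resolve/main/llama-2-7b-chat.ggmlv3.q8_0.bin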


pseudotensor commented Sep 20, 2023

For general offline use, see the updates here: https://github.com/h2oai/h2ogpt/blob/main/docs/README_offline.md from #877

@chengjia604

[quoting the original post] Dear community, I have set up the repo and gotten it running on my Windows machine. However, I do not understand how to add a model to run it locally on my machine. I have downloaded the model 'llama-2-7b-chat.ggmlv3.q8_0.bin' and placed it in the h2oGPT root folder. Then in the UI I selected it as shown: [screenshot]

But if I enter anything in the prompt, I get this error on the console: AssertionError: Please choose a base model with --base_model (CLI) or load in Models Tab (gradio). Then start New Conversation. Can anyone point me in the right direction on how to do this correctly? Thanks a lot!

Hello, has your problem been solved? I also encountered the same problem.

@pseudotensor (Collaborator)

The "base_model" is llama for that model. Once you choose "llama" another view will pop-up to enter the llama model path or url. Then you click on the "Download/Load Models" button at top. We'll try to improve the UX.

@Blue-newai

Hi, I am using an Ubuntu laptop. I have tried many times to install h2oGPT locally through the terminal, but I keep running into issues and can't figure out what is going wrong. Can you help me?

@pseudotensor (Collaborator)

@Blue-newai If you have a problem, you should post a new issue with the details. I've tried to make it easier and easier to install and use.
