large embedded file fails on model create #501

Closed
BruceMacD opened this issue Sep 9, 2023 · 5 comments

Labels: bug (Something isn't working)

@BruceMacD
Contributor

Adding a large file to a model with EMBED may cause an unexpected error.

ollama create exampleModel -f Modelfile
...
Error: unexpected end to create model

Modelfile:

FROM codellama

SYSTEM """
You are a DND game master that reviews dice rolls and responds with JSON in the following format: "{\"action\":\"do stuff\"}"
"""

EMBED embeds/*.txt

Progress output before the failure:

 2% || (4367/151236, 31 it/s) [4m59s:1h19m37s]creating model system layer

There shouldn’t be a limit on how much can be embedded; the buffer used while creating the embeddings may be reaching its capacity.
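If the embedding path reads each input file line by line with Go's bufio.Scanner (an assumption, not confirmed from the Ollama source), the scanner's default 64 KB maximum token size would be one way a "buffer reaching its capacity" failure could show up on long lines. A minimal sketch of that hypothesis, with a hypothetical file path:

```go
package main

import (
	"bufio"
	"fmt"
	"os"
)

// readLines reads a file line by line, raising bufio.Scanner's default
// 64 KB maximum token size so that very long lines don't abort the scan
// with bufio.ErrTooLong.
func readLines(path string) ([]string, error) {
	f, err := os.Open(path)
	if err != nil {
		return nil, err
	}
	defer f.Close()

	scanner := bufio.NewScanner(f)
	// Start with a 64 KB buffer but allow lines up to 10 MB.
	scanner.Buffer(make([]byte, 64*1024), 10*1024*1024)

	var lines []string
	for scanner.Scan() {
		lines = append(lines, scanner.Text())
	}
	return lines, scanner.Err()
}

func main() {
	lines, err := readLines("embeds/example.txt") // hypothetical path
	if err != nil {
		fmt.Fprintln(os.Stderr, "read failed:", err)
		os.Exit(1)
	}
	fmt.Println("read", len(lines), "lines")
}
```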

@BruceMacD added the bug label on Sep 9, 2023
@fmackenzie

fmackenzie commented Sep 28, 2023

I'm also experiencing this. I've tried a number of things, including breaking the text files into smaller, overlapping chunks and updating the num_ctx value (to 4096, 8192, and 16384). I've tried this on a VM with 32 GB of RAM and on bare metal with 64 GB of RAM, both on Ubuntu Linux, all with the same outcome, so I am wondering if it is tied to the total amount of content rather than the size of a specific file. The behavior is the same when specifying each file individually instead of specifying the entire folder.

It's interesting. I see that when Ollama is started up, there are 5 handlers for the EmbeddingHandler:

[GIN-debug] POST   /api/embeddings           --> github.com/jmorganca/ollama/server.EmbeddingHandler (5 handlers)

When it is doing the model creation, I can see that it uses the 5 handlers on a specific port, then as it continues it switches to a different port (almost as though it isn't closing the port and has to get a new one for the embedding), and eventually it gets to a port where the ollama serve process just crashes (see below):

{"timestamp":1695985330,"level":"INFO","function":"log_server_request","line":1157,"message":"request","remote_addr":"127.0.0.1","remote_port":50080,"status":200,"method":"POST","path":"/embedding","params":{}}
{"timestamp":1695985335,"level":"INFO","function":"log_server_request","line":1157,"message":"request","remote_addr":"127.0.0.1","remote_port":50080,"status":200,"method":"POST","path":"/embedding","params":{}}
{"timestamp":1695985341,"level":"INFO","function":"log_server_request","line":1157,"message":"request","remote_addr":"127.0.0.1","remote_port":50080,"status":200,"method":"POST","path":"/embedding","params":{}}
{"timestamp":1695985348,"level":"INFO","function":"log_server_request","line":1157,"message":"request","remote_addr":"127.0.0.1","remote_port":50080,"status":200,"method":"POST","path":"/embedding","params":{}}
{"timestamp":1695985354,"level":"INFO","function":"log_server_request","line":1157,"message":"request","remote_addr":"127.0.0.1","remote_port":50080,"status":200,"method":"POST","path":"/embedding","params":{}}
{"timestamp":1695985361,"level":"INFO","function":"log_server_request","line":1157,"message":"request","remote_addr":"127.0.0.1","remote_port":35734,"status":200,"method":"POST","path":"/embedding","params":{}}
{"timestamp":1695985367,"level":"INFO","function":"log_server_request","line":1157,"message":"request","remote_addr":"127.0.0.1","remote_port":35734,"status":200,"method":"POST","path":"/embedding","params":{}}
{"timestamp":1695985374,"level":"INFO","function":"log_server_request","line":1157,"message":"request","remote_addr":"127.0.0.1","remote_port":35734,"status":200,"method":"POST","path":"/embedding","params":{}}
{"timestamp":1695985381,"level":"INFO","function":"log_server_request","line":1157,"message":"request","remote_addr":"127.0.0.1","remote_port":35734,"status":200,"method":"POST","path":"/embedding","params":{}}
{"timestamp":1695985381,"level":"INFO","function":"log_server_request","line":1157,"message":"request","remote_addr":"127.0.0.1","remote_port":35734,"status":200,"method":"POST","path":"/embedding","params":{}}
{"timestamp":1695985388,"level":"INFO","function":"log_server_request","line":1157,"message":"request","remote_addr":"127.0.0.1","remote_port":37356,"status":200,"method":"POST","path":"/embedding","params":{}}
2023/09/29 07:03:13 images.go:662: failed to generate embedding for '/data/git/ollama/data/test1/test_doc_p00004.txt' line 4: POST embedding: Post "http://127.0.0.1:58936/embedding": EOF
2023/09/29 07:03:13 llama.go:320: llama runner exited with error: signal: killed
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x28 pc=0xb1f748]

goroutine 53 [running]:
github.com/jmorganca/ollama/server.embeddingLayers({0xc00033c618, 0x14}, {{0xc000028280, 0x6}, 0xc00009f080, {0xc000076e40, 0x1, 0x1}, 0xc000076b70})
	/data/git/ollama/server/images.go:660 +0xfa8
github.com/jmorganca/ollama/server.CreateModel({0x1070850, 0xc0000921e0}, {0xc00033c618, 0x14}, {0xc0000281e0, 0xc}, {0xc00002e200, 0x38}, 0xc000076b70)
	/data/git/ollama/server/images.go:527 +0x2135
github.com/jmorganca/ollama/server.CreateModelHandler.func1()
	/data/git/ollama/server/routes.go:358 +0x151
created by github.com/jmorganca/ollama/server.CreateModelHandler
	/data/git/ollama/server/routes.go:349 +0x23d
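The trace shows the llama runner exiting ("signal: killed") and embeddingLayers then panicking on a nil pointer at images.go:660, which suggests the embedding response is dereferenced without first checking the error from the failed POST. A minimal sketch of the defensive pattern, using hypothetical types rather than the actual Ollama code:

```go
package main

import (
	"errors"
	"fmt"
)

// EmbeddingResponse and Runner are hypothetical stand-ins for the real
// types; they only exist to make the guard pattern concrete.
type EmbeddingResponse struct {
	Embedding []float64
}

type Runner interface {
	Embedding(prompt string) (*EmbeddingResponse, error)
}

// embedLine propagates a runner failure instead of dereferencing a nil
// response, which is the guard the stack trace suggests is missing.
func embedLine(r Runner, line string) ([]float64, error) {
	resp, err := r.Embedding(line)
	if err != nil {
		return nil, fmt.Errorf("generate embedding: %w", err)
	}
	if resp == nil {
		return nil, errors.New("generate embedding: nil response from runner")
	}
	return resp.Embedding, nil
}

// deadRunner simulates a runner whose server process has already been killed.
type deadRunner struct{}

func (deadRunner) Embedding(string) (*EmbeddingResponse, error) {
	return nil, errors.New(`Post "http://127.0.0.1:58936/embedding": EOF`)
}

func main() {
	if _, err := embedLine(deadRunner{}, "some line of text"); err != nil {
		fmt.Println("handled cleanly instead of panicking:", err)
	}
}
```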

@chronicblondiee

chronicblondiee commented Oct 8, 2023

I am also running into the same issue (Ryzen 5900X, 64 GB RAM). I have even tried splitting the large file into various smaller files, but it still seems to fail at some point.

I am also looking to make something DnD-related and tried to import the rules as a .txt dataset, but I haven't been able to make it work.

Below are two runs with two versions of the DnD rule set, one with newlines and one with all newlines stripped. I thought that would help, but it didn't seem to make a difference.

ollama create dnd-gen -f ./Modelfile
parsing modelfile    
looking for model    
creating model template layer    
creating model system layer    
creating parameter layer    
creating embeddings for file /tmp/llm-model-stuff/mistral/data/DnD_Basic_rules.txt   2% |███                                                                                                                                                                                            | (892/36409, 3 it/s) [5m0s:3h46m51s]creating parameter layer

ollama create dnd-gen -f ./Modelfile
parsing modelfile    
looking for model    
creating model template layer    
creating model system layer    
creating parameter layer    
creating embeddings for file /tmp/llm-model-stuff/mistral/DnD_BasicRules_2018.txt   3% |█████                                                                                                                                                                                           | (591/18119, 2 it/s) [5m0s:2h53m35s]creating parameter layer
Error: unexpected end to create model

@vividfog

vividfog commented Oct 9, 2023

+1 same issue and symptoms as above. M1 Max 32 GB. The only workaround I've found is to not do large embedding runs for now.

Edit: An OK workaround is to use multiple EMBED lines, so that no individual batch is too large to complete.

Ollama detects when embeddings already exist for an EMBED line. It's then possible to run ollama create my_rag_model -f my_rag_model.Modelfile again and again, each time with one more EMBED line pointing to new content as time goes on. Only the new content is processed; the old content is reused.
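For illustration, a Modelfile along these lines (file names are hypothetical) embeds the data in several small batches; a later run that appends one more EMBED line and re-runs ollama create only processes the new file:

```
FROM codellama

EMBED data/rules_part_01.txt
EMBED data/rules_part_02.txt
EMBED data/rules_part_03.txt
# on the next run, append e.g. EMBED data/rules_part_04.txt and re-run ollama create
```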

A real fix would of course be nice! How to troubleshoot further?

@chronicblondiee

chronicblondiee commented Oct 9, 2023

@vividfog I am trying your method with my data. I used split -l 600 mydata.txt to split it by line, and I'm going to run through each of the EMBED layers.

I will note that the maximum number of lines I could input per EMBED file varied in my testing; anything above 800 seemed to be unstable, at least on my system (openSUSE Tumbleweed, 12-core Ryzen 9 5900X, 64 GB RAM).

Edit: I got it working using @vividfog's method. I am going to try to automate this, maybe with some Ansible, until there is a proper fix. If I get a good automated solution working, I will post it here!

Edit 2: https://github.com/jmorganca/ollama/tree/main/examples/langchain-document @vividfog @BruceMacD @fmackenzie That example uses LangChain and a vector store to keep all the embeddings locally, which is a much better way to load large datasets. It has been working for my use case!

@BruceMacD
Contributor Author

Closing this for now as we removed this feature for the time being.
