Fix various issues with subindex reloading #618
Comments
Nothing with the configuration looks out of the ordinary. Can you confirm you can write to the storage using another process/program? For example, a test python script that just writes some files to the external storage mount. |
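A test script along those lines could look like the sketch below. It only uses the standard library; `check_writable` is an illustrative name, and the mount path you pass in (for example `/mnt/data`, the directory mentioned later in this thread) is an assumption about your setup.

```python
import os
import tempfile


def check_writable(mount):
    """Create a directory and a file under mount, read the file back, then clean up."""
    # Directory creation is tested separately from file creation on purpose.
    probe = tempfile.mkdtemp(prefix="probe-", dir=mount)
    path = os.path.join(probe, "test.txt")
    try:
        with open(path, "w", encoding="utf-8") as f:
            f.write("hello")
            f.flush()
            # Force the write through to the underlying storage.
            os.fsync(f.fileno())
        with open(path, encoding="utf-8") as f:
            return f.read() == "hello"
    finally:
        if os.path.exists(path):
            os.remove(path)
        os.rmdir(probe)
```

Calling `check_writable("/mnt/data")` inside the pod should return `True` if basic directory and file operations work on the mount. It does not exercise the more specialized file operations that Faiss or SQLite may perform.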
All seems well on that front. I was able to exec into the pod and create a file through bash, create a directory through bash, create a file through python, and create a directory through python. All files showed up in external storage as expected. Not sure it helps, but in case you want to try to reproduce, this can also be replicated with Azure Container Apps, which is much easier to set up than an AKS cluster. |
Hard to understand what the issue could be. If this is presented as a regular file volume perhaps Faiss or SQLite do some sort of file operation that the filesystem doesn't support. You can try to debug the components directly using methods found in this article: https://neuml.hashnode.dev/embeddings-index-components |
Yeah, I’m wondering the same thing. Right now I’m using Azure File storage, but might try Blob storage next. A few questions though while I continue to troubleshoot on my end:
|
Does the behavior change at all if you set path to |
For 2 above, what would be the file path, if I don’t provide a path config |
A few updates:
|
That is interesting. What happens if you set content to |
Haven't tried duckdb yet, but an update before proceeding...updating to use an external Postgres DB (setting content: client and providing a CLIENT_URL env variable) does allow the app to start up correctly and I can successfully embed/index some data...however when I restart the container, the index isn't returning anything. The data is still in the content db and the embeddings are still in storage as before, but the app doesn't seem to recognize it. Also, I can re-embed/index successfully, but that kind of defeats the purpose of what I'm trying to do. And it looks like the container is running: Python 3.8.10 |
I may have figured out the issue...digging through the logic a bit, it looks like when the app initializes, you check for an embeddings file based on the path provided here: https://github.com/neuml/txtai/blob/b44d5778d87a81662cae563082089dde2661c61e/src/python/txtai/embeddings/base.py#L508C35-L508C35. In our case however, since we're using sub-indexes, the embeddings files are within the indexes sub-directories, not in the top level directory (/mnt/data). Does that make sense? And any thoughts on how to proceed? |
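To check where the embeddings files actually live, a small diagnostic like the one below may help. It is a sketch, not part of txtai: `find_embeddings` is a hypothetical helper, and it assumes the files are literally named `embeddings`, as the paths in the traceback further down this thread suggest.

```python
import os


def find_embeddings(root):
    """Return the relative paths of all files named 'embeddings' under root."""
    found = []
    for current, _, files in os.walk(root):
        if "embeddings" in files:
            found.append(os.path.relpath(os.path.join(current, "embeddings"), root))
    return sorted(found)
```

If `find_embeddings("/mnt/data")` lists only entries under `indexes/` and nothing at the top level, that would match the situation described above.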
Ok, final bit of findings...based on the above, a previous comment you made in another thread, and something I saw in the source code about using archive files, I updated my config to use index compression. After doing that, I was able to verify I only had 1 file in external storage (index.tar.gz) and when I restart the container, the app picks up the index as expected 🎉 ...although that was only after I worked around 1 other error. Essentially my current setup has 2 sub-indexes; for testing I was only indexing data into 1 of those indexes. Upon restart I noticed the following errors:
2023-12-26T22:26:58.358006112Z ERROR: Traceback (most recent call last):
2023-12-26T22:26:58.358011993Z File "/usr/local/lib/python3.8/dist-packages/starlette/routing.py", line 677, in lifespan
2023-12-26T22:26:58.358015800Z async with self.lifespan_context(app) as maybe_state:
2023-12-26T22:26:58.358021461Z File "/usr/local/lib/python3.8/dist-packages/starlette/routing.py", line 538, in __aenter__
2023-12-26T22:26:58.358027672Z return self._cm.__enter__()
2023-12-26T22:26:58.358033092Z File "/usr/lib/python3.8/contextlib.py", line 113, in __enter__
2023-12-26T22:26:58.358038182Z return next(self.gen)
2023-12-26T22:26:58.358043812Z File "/usr/local/lib/python3.8/dist-packages/txtai/api/application.py", line 89, in lifespan
2023-12-26T22:26:58.358049784Z INSTANCE = Factory.create(config, api) if api else API(config)
2023-12-26T22:26:58.358054973Z File "/usr/local/lib/python3.8/dist-packages/txtai/api/base.py", line 18, in __init__
2023-12-26T22:26:58.358059842Z super().__init__(config, loaddata)
2023-12-26T22:26:58.358064792Z File "/usr/local/lib/python3.8/dist-packages/txtai/app/base.py", line 78, in __init__
2023-12-26T22:26:58.358070763Z self.indexes(loaddata)
2023-12-26T22:26:58.358075662Z File "/usr/local/lib/python3.8/dist-packages/txtai/app/base.py", line 209, in indexes
2023-12-26T22:26:58.358080431Z self.embeddings.load(self.config.get("path"), self.config.get("cloud"))
2023-12-26T22:26:58.358085090Z File "/usr/local/lib/python3.8/dist-packages/txtai/embeddings/base.py", line 556, in load
2023-12-26T22:26:58.358089708Z self.indexes.load(f"{path}/indexes")
2023-12-26T22:26:58.358095048Z File "/usr/local/lib/python3.8/dist-packages/txtai/embeddings/index/indexes.py", line 158, in load
2023-12-26T22:26:58.358099186Z index.load(os.path.join(path, name))
2023-12-26T22:26:58.358103584Z File "/usr/local/lib/python3.8/dist-packages/txtai/embeddings/base.py", line 536, in load
2023-12-26T22:26:58.358107962Z self.ann.load(f"{path}/embeddings")
2023-12-26T22:26:58.358112290Z File "/usr/local/lib/python3.8/dist-packages/txtai/ann/faiss.py", line 32, in load
2023-12-26T22:26:58.358117130Z self.backend = readindex(path, IO_FLAG_MMAP if self.setting("mmap") is True else 0)
2023-12-26T22:26:58.358121548Z File "/usr/local/lib/python3.8/dist-packages/faiss/swigfaiss_avx2.py", line 10206, in read_index
2023-12-26T22:26:58.358125856Z return _swigfaiss_avx2.read_index(*args)
2023-12-26T22:26:58.358130975Z RuntimeError: Error in faiss::FileIOReader::FileIOReader(const char*) at /project/faiss/faiss/impl/io.cpp:67: Error: 'f' failed: could not open /tmp/tmp9gjgdja_/indexes/document2/embeddings for reading: No such file or directory
2023-12-26T22:26:58.358136616Z
2023-12-26T22:26:58.358150742Z ERROR: Application startup failed. Exiting.
So, it looks like it's trying to find an embeddings file in the other index (which I hadn't indexed any data into yet). I then indexed some data into that sub-index and it started fine. ...so, 2 things:
TIA! |
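One way to picture a guard against the startup error above: before loading, check which sub-indexes actually have a saved embeddings file and skip the rest. This is only a sketch under that assumption, not txtai's code or the eventual fix; `loadable_subindexes` is a hypothetical helper, and `document1` in the usage note is an assumed name (only `document2` appears in the traceback).

```python
import os


def loadable_subindexes(path, names):
    """Split sub-index names into those with a saved embeddings file and those without."""
    present, missing = [], []
    for name in names:
        target = os.path.join(path, name, "embeddings")
        # An empty sub-index never wrote an embeddings file, so there is nothing to load.
        (present if os.path.isfile(target) else missing).append(name)
    return present, missing
```

With an `indexes` directory that contains data for `document1` only, `loadable_subindexes(indexes_dir, ["document1", "document2"])` would report `document2` as missing instead of failing on open.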
Thank you for the dedication on trying to solve this issue. It sounds like it might be more API related than AKS related, which is good from a reproducibility standpoint. Lot for me to unpack but I'll try to put focused time on this in the next couple of days. |
Yup, finally was able to dedicate some time myself. And definitely a lot of rambling on my part. Please don’t hesitate to reach out if you have any questions or need me to try anything. Thanks again!! |
FYI, DuckDB works. Indexes are created as expected. But I still have the restart issue with sub-indexes. |
Hello! Just wanted to check in on any progress with this. TIA! |
I don't have an answer yet. I have pending work to run on K8s clusters and was hoping to see if anything came up with that. |
I just checked in a change that I believe addresses this issue. If you want to confirm, you can install txtai from GitHub. |
It works! Currently running with index compression, external mounted storage, and external postgres content storage. Restarted the pod and everything seems to have come up fine. Thanks for the help! |
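For anyone reading later and wondering why compression changes the picture: with an archive, the whole index directory (sub-indexes included) is stored as a single file, and the traceback earlier in this thread shows it being unpacked into a temporary directory (`/tmp/tmp9gjgdja_/...`) before loading. The sketch below illustrates that round trip with the standard library only. It is not txtai's implementation, and `pack` and `unpack` are made-up names.

```python
import os
import tarfile
import tempfile


def pack(directory, archive):
    """Write everything under directory into a gzip-compressed tar archive."""
    with tarfile.open(archive, "w:gz") as tar:
        # arcname="." stores paths relative to the index directory.
        tar.add(directory, arcname=".")


def unpack(archive):
    """Extract the archive into a fresh temporary directory and return its path."""
    target = tempfile.mkdtemp()
    with tarfile.open(archive, "r:gz") as tar:
        tar.extractall(target)
    return target
```

The archive should be written outside the directory being packed, otherwise it would end up including itself.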
Great, glad to hear it! I'll go ahead and close this issue. |
I can't seem to get txtai running properly within our AKS (Azure Kubernetes Service) environment, specifically when mapping to external storage. We're creating our txtai instance with a config similar to the below:
I am able to start the txtai instance successfully, but when I call the index endpoint, it essentially just spins and eventually times out. Upon investigation of the pod, I can see that the config and database files are created, but the database file is empty and the indexes directories are not created at all. Below is what I see when inspecting the mapped directory:
And I am not seeing any actual errors in the pod logs when I call index.
Lastly, the above setup does work on a local machine, and it works when not mapping to external storage. Any ideas what might be going on?