[ENHANCEMENT] disk persistent storage for indexes and documents #296

Closed
achao2013 opened this issue Jan 31, 2023 · 14 comments
Labels
enhancement New feature or request

Comments

@achao2013

Is your feature request related to a problem? Please describe.
Documents are cached inside the Docker container. If Docker goes down, you need to run the database again. Can Marqo support disk storage?
Describe the solution you'd like
Provide a config option that sets the database storage path to a location on disk that is mapped into the Docker container.

Describe alternatives you've considered

Additional context

@achao2013 achao2013 added the enhancement New feature or request label Jan 31, 2023
@achao2013
Author

@wanliAlex @pandu-k

@pandu-k
Collaborator

pandu-k commented Jan 31, 2023

Disk storage is supported by default in the form of a Docker volume.

The commands in the quick start guide are for running a Marqo instance for the first time. If you want to access the same Marqo instance after restarting your computer, follow Marqo's starting and stopping guide: https://docs.marqo.ai/0.0.12/starting_and_stopping/

If there are persistence issues because of your cloud computing environment (for example, if you are using SageMaker), you can change the Docker storage location: https://docs.marqo.ai/0.0.12/Advanced-Usage/change_storage_location/
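As a rough illustration of what changing the Docker storage location can involve: the Docker daemon reads a `data-root` key from its `daemon.json` configuration file, and that key controls where images, containers, and volumes are kept. The sketch below only builds the updated configuration as a dictionary. The target path `/mnt/disk1/docker` and the existing `log-driver` setting are hypothetical, and whether the linked guide uses exactly this mechanism is an assumption.

```python
import json


def with_data_root(daemon_config: dict, data_root: str) -> dict:
    """Return a copy of a Docker daemon config with "data-root" set.

    The input dictionary is left unchanged so existing settings are preserved.
    """
    updated = dict(daemon_config)
    updated["data-root"] = data_root
    return updated


# Hypothetical existing settings and a hypothetical target directory.
existing = {"log-driver": "json-file"}
new_config = with_data_root(existing, "/mnt/disk1/docker")

# The result would be written to the daemon's config file (commonly
# /etc/docker/daemon.json on Linux), followed by a Docker daemon restart.
print(json.dumps(new_config, indent=2))
```

Note that this moves everything Docker stores, not only Marqo's data.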

@pandu-k
Collaborator

pandu-k commented Jan 31, 2023

Also, if you want to transfer Marqo's state to a new Marqo container (for example, a version update), follow this guide: https://docs.marqo.ai/0.0.12/Advanced-Usage/transferring_state/

@achao2013
Author

achao2013 commented Jan 31, 2023

Can I configure the disk storage path? I mean the storage path of the documents, not of Docker itself. @pandu-k

@achao2013
Author

In other words, the Docker container can be stored in any location (e.g. /var/lib/docker), while the text or image codes (the so-called documents) are stored at a fixed disk location (e.g. /mnt/disk1).

@achao2013
Author

@pandu-k @wanliAlex

@jn2clark
Contributor

jn2clark commented Feb 1, 2023

Hi @achao2013, could you provide some more details? The guide at https://docs.marqo.ai/0.0.12/Advanced-Usage/change_storage_location/ explains how to change the Docker storage location. Images can live in another location, in which case only the corresponding embeddings are stored in marqo-os. For text, the original is stored in marqo-os along with the embeddings. To summarise: pointers to images can be used, but at the moment the original text is always stored as well, and pointer-only storage for text is not supported. Does that answer your question?

@achao2013
Author

achao2013 commented Feb 1, 2023

> Hi @achao2013, could you provide some more details? The guide at https://docs.marqo.ai/0.0.12/Advanced-Usage/change_storage_location/ explains how to change the Docker storage location. Images can live in another location, in which case only the corresponding embeddings are stored in marqo-os. For text, the original is stored in marqo-os along with the embeddings. To summarise: pointers to images can be used, but at the moment the original text is always stored as well, and pointer-only storage for text is not supported. Does that answer your question?

If I want to store the image or text embeddings on disk rather than in marqo-os inside Docker, does Marqo's design support that? If not, how can I edit the Marqo code to implement it? @jn2clark

@achao2013
Author

@pandu-k @wanliAlex

@jn2clark
Contributor

jn2clark commented Feb 2, 2023

You can run the backend (OpenSearch) outside of the Marqo Docker container. This means the OpenSearch volume can persist independently of the Marqo container. See the developer guide at https://github.com/marqo-ai/marqo/tree/mainline/src/marqo (option C is what you want). Just make sure that OpenSearch is started first.
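A minimal sketch of how a standalone OpenSearch container could keep its data at a chosen disk path, using a Docker bind mount (`-v host_path:container_path`). It only assembles the `docker run` command and does not execute it. The host directory `/mnt/disk1/marqo-os-data` and the container name are hypothetical, and the in-container data directory `/usr/share/opensearch/data` is an assumption based on the upstream OpenSearch image; check it against the `marqo-os` image before relying on it.

```python
import shlex
from pathlib import PurePosixPath

# Assumption: marqo-os keeps its data where the upstream OpenSearch image does.
OPENSEARCH_DATA_DIR = "/usr/share/opensearch/data"


def opensearch_run_command(
    host_data_dir: str,
    image: str = "marqoai/marqo-os:0.0.3",
    name: str = "marqo-os",  # hypothetical container name
) -> list:
    """Build a `docker run` argv that bind-mounts host_data_dir as the data dir."""
    host_path = PurePosixPath(host_data_dir)
    if not host_path.is_absolute():
        # A relative value here would not be interpreted as a host directory.
        raise ValueError("host_data_dir must be an absolute path")
    return [
        "docker", "run",
        "--name", name,
        "-d",                                   # run in the background
        "-p", "9200:9200",                      # OpenSearch HTTP port
        "-e", "discovery.type=single-node",     # single-node cluster
        "-v", f"{host_path}:{OPENSEARCH_DATA_DIR}",  # bind mount for the data
        image,
    ]


command = opensearch_run_command("/mnt/disk1/marqo-os-data")
print(shlex.join(command))
```

With a bind mount like this, the index data lives in the host directory and survives removal of the container. The host directory has to be writable by the user that OpenSearch runs as inside the container.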

@achao2013
Author

achao2013 commented Feb 4, 2023

> You can run the backend (OpenSearch) outside of the Marqo Docker container. This means the OpenSearch volume can persist independently of the Marqo container. See the developer guide at https://github.com/marqo-ai/marqo/tree/mainline/src/marqo (option C is what you want). Just make sure that OpenSearch is started first.

Thanks, that's getting close to what I want. Going one step further, is there a specific place in the code where the disk storage path of the OpenSearch volume can be set? @jn2clark

@achao2013
Author

By the way, what's the difference between "marqoai/marqo-os:0.0.3" and the marqo_docker_0 image built in option C? @jn2clark @pandu-k

@pandu-k
Collaborator

pandu-k commented Feb 14, 2023

> By the way, what's the difference between "marqoai/marqo-os:0.0.3" and the marqo_docker_0 image built in option C? @jn2clark @pandu-k

marqoai/marqo-os:0.0.3 is the version of OpenSearch that Marqo uses. In the future we plan to have some sort of "dump index" functionality. Would this help solve the problem?

@achao2013
Author

If it comes with a configurable disk path, I think that would work.

3 participants