docs: migrate from memmap to sqlite #4348
docs/get-started/migrate.md
Outdated
### DocumentArray: new storage options

Jina 2 used to offer persistence of DocumentArray through `DocumentArrayMemmap`. In Jina 3, this data structure is
deprecated and we introduce different [Document Stores](https://docarray.jina.ai/advanced/document-store/) within the
`DocumentArray` API. Thus, you can enjoy a consistent `DocumentArray` API across different storage backends and leverage
modern databases.

For example, you can use the [SQLite backend](https://docarray.jina.ai/advanced/document-store/sqlite/) as a replacement
for `DocumentArrayMemmap`:

```python
from docarray import Document, DocumentArray

# Open (or create) a SQLite-backed DocumentArray and add ten empty Documents
das = DocumentArray(storage='sqlite', config={'connection': 'my_connection', 'table_name': 'my_table_name'})
das.extend([Document() for _ in range(10)])
```

This persists the Documents to disk using SQLite, so you can find them again in another
session:

```python
from docarray import DocumentArray

# Reconnect with the same connection and table name to load the stored Documents
das = DocumentArray(storage='sqlite', config={'connection': 'my_connection', 'table_name': 'my_table_name'})
das.summary()
```

```text
                Documents Summary

  Length                 10
  Homogenous Documents   True
  Common Attributes      ('id',)

                     Attributes Summary

  Attribute   Data type   #Unique values   Has empty value
 ──────────────────────────────────────────────────────────
  id          ('str',)    10               False

                Storage Summary

  Backend                  SQLite (https://www.sqlite.org)
  Connection               my_connection
  Table Name               my_table_name
  Serialization Protocol
  Class                    DocumentArraySqlite
```

The API is **almost the same** as the deprecated `DocumentArrayMemmap` and is consistent across storage backends and
in-memory storage. Furthermore, some Document Stores offer fast nearest-neighbor search and are more convenient in
production.

````{admonition} See Also
:class: seealso
Read more about [Document Stores](https://docarray.jina.ai/advanced/document-store/) in DocArray
````
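To make the "persist in one session, find it in another" idea concrete without depending on DocArray, here is a minimal sketch that uses only Python's standard `sqlite3` module. `TinySqliteStore` is a hypothetical class invented for this illustration; it is not how DocArray implements its SQLite backend, and its schema (an `id` and a `text` column) is an assumption. It only shows why reopening the same connection and table name gives back the same data.

```python
import os
import sqlite3
import tempfile
import uuid


class TinySqliteStore:
    """Hypothetical list-like store backed by one SQLite table (not DocArray code)."""

    def __init__(self, connection: str, table_name: str):
        # Table names cannot be bound as SQL parameters, so validate before interpolating.
        if not table_name.isidentifier():
            raise ValueError(f"invalid table name: {table_name!r}")
        self._table = table_name
        self._conn = sqlite3.connect(connection)
        self._conn.execute(
            f"CREATE TABLE IF NOT EXISTS {table_name} (id TEXT PRIMARY KEY, text TEXT)"
        )
        self._conn.commit()

    def extend(self, texts):
        # Give every item a random id, then insert all rows in one transaction.
        rows = [(uuid.uuid4().hex, t) for t in texts]
        with self._conn:  # commits on success, rolls back on error
            self._conn.executemany(f"INSERT INTO {self._table} VALUES (?, ?)", rows)

    def __len__(self):
        return self._conn.execute(f"SELECT COUNT(*) FROM {self._table}").fetchone()[0]

    def close(self):
        self._conn.close()


# Use a throwaway directory so the sketch does not touch real data.
db_path = os.path.join(tempfile.mkdtemp(), "my_connection.db")

# First "session": create the store and write ten items.
store = TinySqliteStore(db_path, "my_table_name")
store.extend(f"doc {i}" for i in range(10))
store.close()

# Second "session": reopen with the same connection and table name.
reopened = TinySqliteStore(db_path, "my_table_name")
count_after_reopen = len(reopened)
reopened.close()

print(count_after_reopen)
```

The two constructor arguments play the same role as the `connection` and `table_name` config keys in the snippets above: together they identify where the data lives, so a later process that passes the same pair sees what an earlier one wrote.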
I am worried that this does not really fit the purpose of the migration guide, which really just tells people what they need to do to get their code working with Jina 3, as succinctly as possible. For further explanations it should refer to external resources. As you can see, document stores would be by far the longest item here.
So I suggest the following compromise:
**New storage options**:

Jina 2 used to offer persistence of DocumentArray through `DocumentArrayMemmap`. In Jina 3, this data structure is
deprecated and we introduce different [Document Stores](https://docarray.jina.ai/advanced/document-store/) within the
`DocumentArray` API. Thus, you can enjoy a consistent `DocumentArray` API across different storage backends and leverage
modern databases, while using an API that is **almost the same** as the deprecated `DocumentArrayMemmap`.

For example, you can use the [SQLite backend](https://docarray.jina.ai/advanced/document-store/sqlite/) as a replacement
for `DocumentArrayMemmap`, which lets you persist Documents to disk and load them in another session:

````{tab} Storing to disk
```python
from docarray import Document, DocumentArray

docs = DocumentArray(storage='sqlite', config={'connection': 'my_connection', 'table_name': 'my_table_name'})
docs.extend([Document() for _ in range(10)])
```
````

````{tab} Loading from disk
```python
from docarray import DocumentArray

docs = DocumentArray(storage='sqlite', config={'connection': 'my_connection', 'table_name': 'my_table_name'})
```
````
I removed the info box at the bottom because it just links to storage backends, but those are already linked at the top. I tried to cut as much as possible while still keeping the key information in place.
OK, so the code suggestion got messed up (the box with the second tab should be inside it), but yeah.
📝 Docs are deployed on https://docs-sqlite-migration--jina-docs.netlify.app 🎉
concise enough for me now ;)