docs: migrate from memmap to sqlite #4348

Merged
merged 2 commits on Feb 16, 2022

Conversation

alaeddine-13

docs: migrate from memmap to sqlite

github-actions bot added the size/S and area/docs labels on Feb 15, 2022

github-actions bot commented Feb 15, 2022

Latency summary

Current PR yields:

  • 🐢🐢 index QPS at 860, delta to last 2 avg.: -28%
  • 😶 query QPS at 36, delta to last 2 avg.: -1%
  • 🐎🐎🐎🐎 avg flow time within 2.2014 seconds, delta to last 2 avg.: +46%
  • 😶 import jina within 0.4382 seconds, delta to last 2 avg.: +5%

Breakdown

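| Version | Index QPS | Query QPS | Avg Flow Time (s) | Import Time (s) |
|---------|-----------|-----------|-------------------|-----------------|
| current | 860       | 36        | 2.2014            | 0.4382          |
| 2.7.0   | 1209      | 51        | 1.4446            | 0.3509          |
| 2.6.4   | 1181      | 21        | 1.5656            | 0.4773          |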

Backed by latency-tracking. Further commits will update this comment.
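
For readers who want to check the "delta to last 2 avg." figures, here is a minimal sketch that recomputes them from the breakdown table. It assumes the delta is the percentage change of the current value relative to the mean of the two previous releases; the function name `delta_to_recent_average` is made up for this illustration and is not part of the latency-tracking tooling.

```python
from statistics import mean


def delta_to_recent_average(current, previous):
    """Percentage change of `current` relative to the mean of `previous`."""
    baseline = mean(previous)
    return (current - baseline) / baseline * 100.0


# Values copied from the breakdown table: (current, [2.7.0, 2.6.4]).
metrics = {
    'index QPS': (860, [1209, 1181]),
    'query QPS': (36, [51, 21]),
    'avg flow time (s)': (2.2014, [1.4446, 1.5656]),
    'import time (s)': (0.4382, [0.3509, 0.4773]),
}

deltas = {
    name: delta_to_recent_average(current, previous)
    for name, (current, previous) in metrics.items()
}

for name, delta in deltas.items():
    print(f'{name}: {delta:+.1f}%')
```

With the rounded table values this gives roughly -28% for index QPS, +46% for average flow time and a little under +6% for import time, in line with the summary. Query QPS comes out at exactly 0%, so the reported -1% presumably reflects unrounded measurements.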

JoanFM
JoanFM previously approved these changes Feb 15, 2022
Comment on lines 178 to 233

### DocumentArray: new storage options
Jina 2 used to offer persistence of DocumentArray through `DocumentArrayMemmap`. In Jina 3, this data structure is
deprecated and we introduce different [Document Stores](https://docarray.jina.ai/advanced/document-store/) within the
`DocumentArray` API. Thus, you can enjoy a consistent `DocumentArray` API across different storage backends and leverage
modern databases.

For example, you can use the [SQLite backend](https://docarray.jina.ai/advanced/document-store/sqlite/) as a replacement
for `DocumentArrayMemmap`:

```python
from docarray import Document, DocumentArray
das = DocumentArray(storage='sqlite', config={'connection': 'my_connection', 'table_name': 'my_table_name'})
das.extend([Document() for _ in range(10)])
```

This persists the Documents to disk using SQLite, so you should find them again in another
session:

```python
from docarray import DocumentArray
das = DocumentArray(storage='sqlite', config={'connection': 'my_connection', 'table_name': 'my_table_name'})
das.summary()
```


```text
Documents Summary

  Length                 10
  Homogenous Documents   True
  Common Attributes      ('id',)

Attributes Summary

  Attribute   Data type   #Unique values   Has empty value
  ──────────────────────────────────────────────────────────
  id          ('str',)    10               False

Storage Summary

  Backend                  SQLite (https://www.sqlite.org)
  Connection               my_connection
  Table Name               my_table_name
  Serialization Protocol
  Class                    DocumentArraySqlite
```

The API is **almost the same** as the deprecated `DocumentArrayMemmap` and is consistent across storage backends and
in-memory storage. Furthermore, some Document Stores offer fast Nearest Neighbor algorithms and are more convenient in
production.

````{admonition} See Also
:class: seealso
Read more about [Document Stores](https://docarray.jina.ai/advanced/document-store/) in DocArray
````
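
To make "persist to disk and find the Documents in another session" concrete, the following is a rough sketch using only Python's standard-library `sqlite3` module. It is not how DocArray implements its SQLite backend: the two-column table layout, the JSON payload and the helper names `save_documents` and `load_documents` are all assumptions made for this illustration. Each helper opens and closes its own connection, which stands in for a separate session.

```python
import json
import os
import sqlite3
import tempfile
import uuid
from contextlib import closing


def save_documents(db_path, table_name, documents):
    """Store each document as a JSON string keyed by its id (hypothetical layout)."""
    with closing(sqlite3.connect(db_path)) as conn:
        with conn:  # commits on success, rolls back on error
            # Table names cannot be bound as SQL parameters, so the name is
            # quoted and interpolated; only use trusted names here.
            conn.execute(
                f'CREATE TABLE IF NOT EXISTS "{table_name}" '
                '(id TEXT PRIMARY KEY, payload TEXT NOT NULL)'
            )
            conn.executemany(
                f'INSERT OR REPLACE INTO "{table_name}" VALUES (?, ?)',
                [(doc['id'], json.dumps(doc)) for doc in documents],
            )


def load_documents(db_path, table_name):
    """Read all documents back through a fresh connection, in insertion order."""
    with closing(sqlite3.connect(db_path)) as conn:
        rows = conn.execute(
            f'SELECT payload FROM "{table_name}" ORDER BY rowid'
        ).fetchall()
    return [json.loads(payload) for (payload,) in rows]


# "Session" one: write ten empty documents to a file on disk.
db_path = os.path.join(tempfile.mkdtemp(), 'my_connection.db')
docs = [{'id': uuid.uuid4().hex} for _ in range(10)]
save_documents(db_path, 'my_table_name', docs)

# "Session" two: a new connection sees the same ten documents.
restored = load_documents(db_path, 'my_table_name')
print(len(restored))
```

The point of the sketch is only that the data lives in the database file rather than in the Python object, which is why re-creating a `DocumentArray` with the same `connection` and `table_name` gives the stored Documents back.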

I am worried that this does not really fit the purpose of the migration guide, which really just tells people what they need to do to get their code working with Jina 3, as succinctly as possible. For further explanations it should refer to external resources. As you can see, document stores would be by far the longest item here.

So I suggest the following compromise:

Suggested change
**New storage options**:
Jina 2 used to offer persistence of DocumentArray through `DocumentArrayMemmap`. In Jina 3, this data structure is
deprecated and we introduce different [Document Stores](https://docarray.jina.ai/advanced/document-store/) within the
`DocumentArray` API. Thus, you can enjoy a consistent `DocumentArray` API across different storage backends and leverage
modern databases, such as the [SQLite backend](https://docarray.jina.ai/advanced/document-store/sqlite/), while using an API that is **almost the same** as the deprecated `DocumentArrayMemmap`.
For example, the SQLite backend can serve as a replacement for `DocumentArrayMemmap`: it lets you persist Documents to
disk and load them in another session:
````{tab} Storing to disk
```python
from docarray import Document, DocumentArray
docs = DocumentArray(storage='sqlite', config={'connection': 'my_connection', 'table_name': 'my_table_name'})
docs.extend([Document() for _ in range(10)])
```
````
````{tab} Loading from disk
```python
from docarray import DocumentArray
docs = DocumentArray(storage='sqlite', config={'connection': 'my_connection', 'table_name': 'my_table_name'})
```
````

I removed the info box at the bottom because it just links to storage backends, but those are already linked at the top. I tried to cut as much as possible while still keeping the key information in place.


Ok, so the code suggestion is messed up (the box with the second tab should be inside of it), but yeah

@github-actions

📝 Docs are deployed on https://docs-sqlite-migration--jina-docs.netlify.app 🎉

@JohannesMessner

concise enough for me now ;)

JoanFM merged commit d566e5b into master on Feb 16, 2022
JoanFM deleted the docs-sqlite-migration branch on February 16, 2022, 08:04