docs: enhance DocVec section (#1658)
Signed-off-by: maxwelljin <101249253+maxwelljin@users.noreply.github.com>
maxwelljin committed Jun 19, 2023
1 parent 0c27fef commit e870eb8
Showing 2 changed files with 21 additions and 21 deletions.
24 changes: 11 additions & 13 deletions README.md
@@ -674,41 +674,39 @@ And to seal the deal, let us show you how easily documents slot into your FastAP
```python
import numpy as np
from fastapi import FastAPI
from httpx import AsyncClient

from docarray.base_doc import DocArrayResponse
from docarray import BaseDoc
from docarray.documents import ImageDoc
from docarray.typing import NdArray
from docarray.base_doc import DocArrayResponse


class InputDoc(BaseDoc):
img: ImageDoc
text: str


class OutputDoc(BaseDoc):
embedding_clip: NdArray
embedding_bert: NdArray


input_doc = InputDoc(img=ImageDoc(tensor=np.zeros((3, 224, 224))))

app = FastAPI()

def model_img(img: ImageTensor) -> NdArray:
return np.zeros((100, 1))

def model_text(text: str) -> NdArray:
return np.zeros((100, 1))

@app.post("/doc/", response_model=OutputDoc, response_class=DocArrayResponse)
@app.post("/embed/", response_model=OutputDoc, response_class=DocArrayResponse)
async def create_item(doc: InputDoc) -> OutputDoc:
## call my fancy model to generate the embeddings
doc = OutputDoc(
embedding_clip=np.zeros((100, 1)), embedding_bert=np.zeros((100, 1))
embedding_clip=model_img(doc.img.tensor), embedding_bert=model_text(doc.text)
)
return doc


async with AsyncClient(app=app, base_url="http://test") as ac:
response = await ac.post("/doc/", data=input_doc.json())
resp_doc = await ac.get("/docs")
resp_redoc = await ac.get("/redoc")
response = await ac.post("/embed/", data=input_doc.json())

```

Just like a vanilla Pydantic model!
18 changes: 10 additions & 8 deletions docs/user_guide/representing/array.md
@@ -256,20 +256,20 @@ This is where the custom syntax `DocList[DocType]` comes into play.
!!! note
`DocList[DocType]` creates a custom [`DocList`][docarray.array.doc_list.doc_list.DocList] that can only contain `DocType` Documents.

-This syntax is inspired by more statically typed languages, and even though it might offend Python purists, we believe that it is a good user experience to think of an Array of `BaseDoc`s rather than just an array of non-homogenous `BaseDoc`s.
+This syntax is inspired by more statically typed languages, and even though it might offend Python purists, we believe that it is a good user experience to think of an Array of `BaseDoc`s rather than just an array of heterogeneous `BaseDoc`s.

-That said, `AnyDocArray` can also be used to create a non-homogenous `AnyDocArray`:
+That said, `AnyDocArray` can also be used to create a heterogeneous `AnyDocArray`:

!!! note
-The default `DocList` can be used to create a non-homogenous list of `BaseDoc`.
+The default `DocList` can be used to create a heterogeneous list of `BaseDoc`.

!!! warning
-`DocVec` cannot store non-homogenous `BaseDoc` and always needs the `DocVec[DocType]` syntax.
+`DocVec` cannot store heterogeneous `BaseDoc` and always needs the `DocVec[DocType]` syntax.

-The usage of a non-homogenous `DocList` is similar to a normal Python list but still offers DocArray functionality
+The usage of a heterogeneous `DocList` is similar to a normal Python list but still offers DocArray functionality
like [serialization and sending over the wire](../sending/first_step.md). However, it won't be able to extend the API of your custom schema to the Array level.

-Here is how you can instantiate a non-homogenous `DocList`:
+Here is how you can instantiate a heterogeneous `DocList`:

```python
from docarray import BaseDoc, DocList
@@ -386,10 +386,10 @@ this means that if you call `docs.image` multiple times, under the hood you will
Let's see how it will work with `DocVec`:

```python
-from docarray import DocList
+from docarray import DocVec
import numpy as np

-docs = DocList[ImageDoc](
+docs = DocVec[ImageDoc](
[ImageDoc(image=np.random.rand(3, 224, 224)) for _ in range(10)]
)

@@ -460,6 +460,8 @@ Both [`DocList`][docarray.array.doc_list.doc_list.DocList] and [`DocVec`][docarr
Using nested optional fields differs slightly between DocList and DocVec, so watch out. But in a nutshell:

When accessing a nested BaseDoc:


* DocList will return a list of documents if the field is optional and a DocList if the field is not optional
* DocVec will return a DocVec if all documents are there, or None if all docs are None. No mix of docs and None allowed!
* DocVec will behave the same for a tensor field instead of a BaseDoc
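
The DocVec rule in the list above (everything present, or everything `None`, never a mix) can be sketched in plain Python. This is only a toy model under stated assumptions: `column_or_none` is a hypothetical helper invented for illustration, it is not part of docarray, and a plain list stands in for the `DocVec` that the real library would return.

```python
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def column_or_none(values: Sequence[Optional[T]]) -> Optional[List[T]]:
    """Toy model of the all-or-nothing rule for an optional nested field.

    Hypothetical helper for illustration only; docarray does not expose it.
    """
    missing = [value is None for value in values]
    if all(missing):
        # Every document lacks the field: the whole column is None.
        return None
    if any(missing):
        # Some documents have the field and some do not: rejected.
        raise ValueError("a mix of documents and None is not allowed")
    # Every document has the field: return the full column.
    return list(values)
```

Calling `column_or_none(["a", "b"])` returns the full column, `column_or_none([None, None])` returns `None`, and `column_or_none(["a", None])` raises `ValueError`, mirroring the three cases the bullets describe.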
