docs(document): add fluent interface and make docbot generate it (#3980)
hanxiao committed Nov 23, 2021
1 parent c7bbd30 commit e7822c3
Showing 11 changed files with 185 additions and 6 deletions.
Binary file added docs/fundamentals/document/apple.png
Binary file added docs/fundamentals/document/apple1.png
6 changes: 6 additions & 0 deletions docs/fundamentals/document/document-api.md
@@ -1,5 +1,11 @@
# Document

```{toctree}
:hidden:
fluent-interface
```

{class}`~jina.types.document.Document` is the basic data type in Jina. Whether you're working with text, image, video, audio, or 3D meshes, they are
all `Document`s in Jina.

113 changes: 113 additions & 0 deletions docs/fundamentals/document/fluent-interface.md
@@ -0,0 +1,113 @@
# Fluent Interface

Jina provides a simple fluent interface for `Document` that lets you process (often preprocess) a Document object by chaining methods. For example, to read an image file as a `numpy.ndarray`, resize it, normalize it, and then store it in another file, you can simply do:

```python
from jina import Document

d = (
    Document(uri='apple.png')
    .load_uri_to_image_blob()
    .set_image_blob_shape((64, 64))
    .set_image_blob_normalization()
    .dump_image_blob_to_file('apple1.png')
)
```

```{figure} apple.png
:scale: 20%
Original `apple.png`
```

```{figure} apple1.png
:scale: 50%
Processed `apple1.png`
```

````{important}
Note that chained methods always modify the original Document in place. That means the above example is equivalent to:
```python
from jina import Document
d = Document(uri='apple.png')
(d.load_uri_to_image_blob()
  .set_image_blob_shape((64, 64))
  .set_image_blob_normalization()
  .dump_image_blob_to_file('apple1.png'))
```
````
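
The in-place behaviour follows from how a fluent method is usually written: it mutates the object and then returns `self`. The sketch below illustrates that pattern with a hypothetical `TinyDoc` class; the class and its `scale` and `shift` methods are assumptions made up for this example and are not part of Jina.

```python
from typing import List, TypeVar

T = TypeVar('T', bound='TinyDoc')


class TinyDoc:
    """Hypothetical stand-in for Document; not part of Jina."""

    def __init__(self, values: List[float]) -> None:
        self.values = list(values)

    def scale(self: T, factor: float) -> T:
        # Mutate in place, then return self so the next call can be chained.
        self.values = [v * factor for v in self.values]
        return self

    def shift(self: T, offset: float) -> T:
        # Same pattern: change the object, hand the same object back.
        self.values = [v + offset for v in self.values]
        return self


d = TinyDoc([1.0, 2.0])
result = d.scale(2.0).shift(1.0)

# The chain returns the very same object, so `d` itself has changed.
print(result is d)  # True
print(d.values)     # [3.0, 5.0]
```

Because every step returns the same object, assigning the result of a chain to a new name does not create a copy.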


## Methods

All the following methods can be chained.


<!-- fluent-interface-start -->
### Convert
Provide helper functions for {class}`Document` to support conversion between {attr}`.blob`, {attr}`.text`
and {attr}`.buffer`.
- {meth}`~jina.types.document.mixins.convert.ConvertMixin.convert_blob_to_buffer`
- {meth}`~jina.types.document.mixins.convert.ConvertMixin.convert_buffer_to_blob`
- {meth}`~jina.types.document.mixins.convert.ConvertMixin.convert_uri_to_datauri`


### TextData
Provide helper functions for {class}`Document` to support text data.
- {meth}`~jina.types.document.mixins.text.TextDataMixin.convert_blob_to_text`
- {meth}`~jina.types.document.mixins.text.TextDataMixin.convert_text_to_blob`
- {meth}`~jina.types.document.mixins.text.TextDataMixin.dump_text_to_datauri`
- {meth}`~jina.types.document.mixins.text.TextDataMixin.load_uri_to_text`


### ImageData
Provide helper functions for {class}`Document` to support image data.
- {meth}`~jina.types.document.mixins.image.ImageDataMixin.convert_buffer_to_image_blob`
- {meth}`~jina.types.document.mixins.image.ImageDataMixin.convert_image_blob_to_sliding_windows`
- {meth}`~jina.types.document.mixins.image.ImageDataMixin.convert_image_blob_to_uri`
- {meth}`~jina.types.document.mixins.image.ImageDataMixin.dump_image_blob_to_file`
- {meth}`~jina.types.document.mixins.image.ImageDataMixin.load_uri_to_image_blob`
- {meth}`~jina.types.document.mixins.image.ImageDataMixin.set_image_blob_channel_axis`
- {meth}`~jina.types.document.mixins.image.ImageDataMixin.set_image_blob_normalization`
- {meth}`~jina.types.document.mixins.image.ImageDataMixin.set_image_blob_shape`


### AudioData
Provide helper functions for {class}`Document` to support audio data.
- {meth}`~jina.types.document.mixins.audio.AudioDataMixin.dump_audio_blob_to_file`
- {meth}`~jina.types.document.mixins.audio.AudioDataMixin.load_uri_to_audio_blob`


### BufferData
Provide helper functions for {class}`Document` to handle binary data.
- {meth}`~jina.types.document.mixins.buffer.BufferDataMixin.dump_buffer_to_datauri`
- {meth}`~jina.types.document.mixins.buffer.BufferDataMixin.load_uri_to_buffer`


### DumpFile
Provide helper functions for {class}`Document` to dump content to a file.
- {meth}`~jina.types.document.mixins.dump.DumpFileMixin.dump_buffer_to_file`
- {meth}`~jina.types.document.mixins.dump.DumpFileMixin.dump_uri_to_file`


### ContentProperty
Provide helper functions for {class}`Document` to allow universal content property access.
- {meth}`~jina.types.document.mixins.content.ContentPropertyMixin.dump_content_to_datauri`


### VideoData
Provide helper functions for {class}`Document` to support video data.
- {meth}`~jina.types.document.mixins.video.VideoDataMixin.dump_video_blob_to_file`
- {meth}`~jina.types.document.mixins.video.VideoDataMixin.load_uri_to_video_blob`


### MeshData
Provide helper functions for {class}`Document` to support 3D mesh data and point cloud.
- {meth}`~jina.types.document.mixins.mesh.MeshDataMixin.load_uri_to_point_cloud_blob`


<!-- fluent-interface-end -->
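
The list between the two markers is generated rather than written by hand. Judging from `scripts/update-fluent-interface.py` in this commit, a method is listed when it is public and its return annotation is the type variable `T`, whose string form is `'~T'`. The following is a minimal sketch of that selection rule; the `Demo` class and the `fluent_method_names` helper are hypothetical names used only for illustration.

```python
import inspect
from typing import TypeVar

T = TypeVar('T')


class Demo:
    """Hypothetical class used only for this illustration."""

    def chainable(self: T) -> T:
        return self

    def not_chainable(self) -> None:
        return None

    def _private(self: T) -> T:
        return self


def fluent_method_names(cls) -> list:
    """Collect public callables whose return annotation is the TypeVar T."""
    names = []
    for name, member in inspect.getmembers(cls):
        if not callable(member) or name.startswith('_'):
            continue
        annotations = inspect.getfullargspec(member).annotations
        # An invariant TypeVar named T is rendered as '~T' by str().
        if str(annotations.get('return')) == '~T':
            names.append(name)
    return names


print(fluent_method_names(Demo))  # ['chainable']
```

Under this rule, a method annotated with `-> None` is left out of the generated list, which fits the `-> None` annotations this commit adds to `match` and `_set_attributes`.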
1 change: 1 addition & 0 deletions jina/hubble/helper.py
@@ -287,6 +287,7 @@ def wrapper(*args, **kwargs):
call_hash = f'{func.__name__}({", ".join(map(str, args))})'

pickle_protocol = 4
+ import filelock

cache_db = None
with filelock.FileLock(f'{cache_file}.lock', timeout=-1):
6 changes: 4 additions & 2 deletions jina/types/document/__init__.py
@@ -240,7 +240,9 @@ def _update_doc(d: Dict):
_remainder2 = _remainder.difference(_intersect2)

if _intersect2:
- self.set_attributes(**{p: document[p] for p in _intersect2})
+ self._set_attributes(
+     **{p: document[p] for p in _intersect2}
+ )

if _remainder2:
self._pb_body.tags.update(
@@ -274,7 +276,7 @@ def _update_doc(d: Dict):
raise ValueError(
f'Document content fields are mutually exclusive, please provide only one of {_all_doc_content_keys}'
)
- self.set_attributes(**kwargs)
+ self._set_attributes(**kwargs)

@property
def weight(self) -> float:
2 changes: 1 addition & 1 deletion jina/types/document/mixins/attribute.py
@@ -101,7 +101,7 @@ def get_attributes(self, *fields: str) -> Union[Any, List[Any]]:

return ret

- def set_attributes(self, **kwargs):
+ def _set_attributes(self, **kwargs) -> None:
"""Bulk update Document fields with key-value specified in kwargs
.. seealso::
4 changes: 2 additions & 2 deletions jina/types/document/mixins/match.py
@@ -24,7 +24,7 @@ def match(
exclude_self: bool = False,
only_id: bool = False,
use_scipy: bool = False,
- ):
+ ) -> None:
"""Matching the current Document against a set of Documents.
The result will be stored in :attr:`.matches`.
@@ -52,7 +52,7 @@ def match(
"""
...

- def match(self, *args, **kwargs):
+ def match(self, *args, **kwargs) -> None:
"""
# noqa: D102
# noqa: DAR101
1 change: 1 addition & 0 deletions scripts/devbot.sh
@@ -8,6 +8,7 @@ arr=( $(PYTHONPATH=.. python inject-document-props-as-overload.py) ) && black -S

# update autocomplete info && black it
python update-autocomplete-cli.py && black -S ../cli/autocomplete.py
+ python update-fluent-interface.py

# sync package requirements with resources/ requirements
cp ../extra-requirements.txt ../jina/resources/
56 changes: 56 additions & 0 deletions scripts/update-fluent-interface.py
@@ -0,0 +1,56 @@
import inspect
import re
import sys
from collections import defaultdict

from jina import Document

all_meth = defaultdict(list)
for f in inspect.getmembers(Document):
    if (
        callable(f[1])
        and not f[1].__name__.startswith('_')
        and not f[0].startswith('_')
    ):
        if (
            'return' in inspect.getfullargspec(f[1]).annotations
            and str(inspect.getfullargspec(f[1]).annotations['return']) == '~T'
        ):
            module_name = f[1].__qualname__.split('.')[0].replace('Mixin', '')
            desc = inspect.getdoc(
                vars(sys.modules[f[1].__module__])[f[1].__qualname__.split('.')[0]]
            )

            all_meth[
                (
                    module_name,
                    desc.strip()
                    .replace(':class:', '{class}')
                    .replace(':attr:', '{attr}'),
                )
            ].append(f'{{meth}}`~{f[1].__module__}.{f[1].__qualname__}`')

all_s = []
for k, v in all_meth.items():
    all_s.append(f'### {k[0].strip()}')
    all_s.append(f'{k[1].strip()}')
    for vv in v:
        all_s.append(f'- {vv}')

    all_s.append('\n')


doc_md = '../docs/fundamentals/document/fluent-interface.md'
text = '\n'.join(all_s)

with open(doc_md) as fp:
    _old = fp.read()
    _new = re.sub(
        r'(<!-- fluent-interface-start -->\s*?\n).*(\n\s*?<!-- fluent-interface-end -->)',
        rf'\g<1>{text}\g<2>',
        _old,
        flags=re.DOTALL,
    )

with open(doc_md, 'w') as fp:
    fp.write(_new)
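
The last step of the script replaces whatever sits between the two HTML comment markers and leaves the rest of the Markdown file untouched. The sketch below applies the same regular expression to an in-memory string. The sample text and the `generated` value are made up for illustration, and the callable replacement is a variation of this sketch, not what the script itself does.

```python
import re

# Made-up stand-in for the contents of fluent-interface.md.
old = (
    '## Methods\n'
    '<!-- fluent-interface-start -->\n'
    'stale list\n'
    '<!-- fluent-interface-end -->\n'
    'footer\n'
)
# Made-up stand-in for the text the script generates.
generated = '### Convert\n- method_a\n- method_b'

new = re.sub(
    r'(<!-- fluent-interface-start -->\s*?\n).*(\n\s*?<!-- fluent-interface-end -->)',
    # A callable replacement inserts `generated` verbatim between the markers.
    lambda m: m.group(1) + generated + m.group(2),
    old,
    flags=re.DOTALL,
)
print(new)
```

A replacement string such as `rf'\g<1>{text}\g<2>'` is parsed by `re.sub` for backslash escapes, so a backslash inside the generated text would be treated as an escape sequence. Passing a function avoids that, which is why the sketch uses one.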
2 changes: 1 addition & 1 deletion tests/unit/types/document/test_document.py
@@ -74,7 +74,7 @@ def test_doc_update_fields():
d = [12, 34, 56]
e = 'text-mod'
w = 2.0
- a.set_attributes(embedding=b, tags=c, location=d, modality=e, weight=w)
+ a._set_attributes(embedding=b, tags=c, location=d, modality=e, weight=w)
np.testing.assert_equal(a.embedding, b)
assert list(a.location) == d
assert a.modality == e