docs(fundamentals): more polishing (#3924)
* docs(fundamentals): polish document api

* docs(fundamentals): polish documentarray api

* docs(fundamentals): polish documentmemmaparray api
alexcg1 committed Nov 13, 2021
1 parent 70643b7 commit 62f244f
Showing 3 changed files with 114 additions and 114 deletions.
76 changes: 38 additions & 38 deletions docs/fundamentals/document/document-api.md
# Document

`Document` is Jina's basic data type. Whether you're working with text, image, video, audio, or 3D meshes, they are
all `Document`s in Jina.

## Minimum working example
```python
import numpy
from jina import Document

d3 = Document(blob=numpy.array([1, 2, 3]))
```

### Content

`text`, `blob`, and `buffer` are the three content attributes of a Document. They correspond to string-like data (e.g. for natural language), `ndarray`-like data (e.g. for image/audio/video data), and binary data for general purpose, respectively. Each Document can contain only one type of content.

| Attribute | Accept type | Use case |
| --- | --- | --- |
| `doc.text` | Python string | Contain text |
| `doc.blob` | Numpy `ndarray`, SciPy sparse matrix (`spmatrix`), TensorFlow dense & sparse tensor, PyTorch dense & sparse tensor, PaddlePaddle dense tensor | Contain image/video/audio |
| `doc.buffer` | Binary string | Contain intermediate IO buffer |

````{admonition} Exclusivity of the content
:class: important
Note that one `Document` can only contain one type of `content`: either `text`, `buffer`, or `blob`. If you set one, the others will be cleared.
```python
import numpy as np
from jina import Document

d = Document(text='hello')
d.blob = np.array([1, 2, 3])  # setting `blob` clears `text`
d.text  # <- now it's empty
```
````

````{admonition} Why a Document contains only one data type
:class: question
What if you want to represent more than one kind of information? Say, to fully represent a PDF page you need to store both image and text. In this case, you can use {ref}`nested Document<recursive-nested-document>`s by putting image into one sub-Document, and text into another.
```python
d = Document(chunks=[Document(blob=...), Document(text=...)])
```
The principle is that each Document contains only one modality. This makes the whole logic clearer.
````

```{tip}
There is also a `doc.content` sugar getter/setter for the above non-empty field.
```

#### Loading content from URI

Often you need to load data from a URI instead of assigning it directly in your code. In that case, `.uri` is the attribute you need.

After setting `.uri`, you can load data into `.text`/`.buffer`/`.blob` as follows.

The value of `.uri` can point to a local URI, a remote URI, or a [data URI](https://en.wikipedia.org/wiki/Data_URI_scheme).

````{tab} Local image URI
```text
blob [[[255 255 255]
 ...]]]
```
````

There are more `.convert_uri_to_*` functions that allow you to read {ref}`text<text-type>`, {ref}`image<image-type>`, {ref}`video<video-type>`, {ref}`3D mesh<mesh-type>`, {ref}`audio<audio-type>` and {ref}`tabular<table-type>` data into Jina.

```{admonition} Write to data URI
:class: tip
Inline data URI is helpful when you need a quick visualization in HTML, as it embeds all resources directly into that HTML.
You can convert a URI to a data URI using `doc.convert_uri_to_datauri()`. This will fetch the resource and make it inline.
```
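For intuition, a data URI is just the resource's bytes, base64-encoded and prefixed with its MIME type. A minimal stdlib sketch of building one by hand (not the Jina API; `doc.convert_uri_to_datauri()` does this for you):

```python
import base64

# A data URI embeds the payload inline: scheme, MIME type, encoding, then base64 data.
payload = b'hello'
data_uri = 'data:text/plain;base64,' + base64.b64encode(payload).decode()
print(data_uri)  # data:text/plain;base64,aGVsbG8=
```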

Embedding is a multi-dimensional representation of a `Document` (often a `[1, D]` vector). It plays a central role in neural search.

Document has an attribute `.embedding` to contain the embedding information.

Like `.blob`, you can assign it with Numpy `ndarray`, SciPy sparse matrix (`spmatrix`), TensorFlow dense and sparse tensor, PyTorch dense and sparse tensor, or PaddlePaddle dense tensor.

```python
import numpy as np
import tensorflow as tf
from jina import Document

d5 = Document(embedding=tf.sparse.from_dense(np.array([[1, 2, 3], [4, 5, 6]])))
```
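For intuition, an embedding is just a vector. A minimal numpy sketch (independent of Jina's API) producing a `[1, D]` embedding by mean-pooling token vectors:

```python
import numpy as np

# Mean-pool three 4-dimensional "token" vectors into a single [1, 4] embedding.
tokens = np.array([[1.0, 0.0, 2.0, 0.0],
                   [3.0, 0.0, 0.0, 2.0],
                   [2.0, 0.0, 1.0, 1.0]])
embedding = tokens.mean(axis=0, keepdims=True)
print(embedding.shape)  # (1, 4)
```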

#### Finding nearest-neighbours

Once a Document has `.embedding` filled, it can be "matched". In this example, we build ten Documents and put them into a {ref}`DocumentArray<da-intro>`, and then use another Document to search against them.

```python
import numpy as np
from jina import DocumentArray, Document

# build ten Documents with random embeddings and index them in a DocumentArray
da = DocumentArray([Document(embedding=np.random.random(256)) for _ in range(10)])

# use another Document as the query
q = Document(embedding=np.random.random(256))
q.match(da)

print(q.matches[0])
```
```{admonition} Working on multiple queries
:class: tip
When you want to match a set of Documents (let's call it set `A`) against another set of Documents (set `B`), where you want to find for each element in `A` its nearest neighbours in `B`, then you need `DocumentArray.match()`. It is far more efficient than looping over each Document. You can {ref}`read more about it here<match-documentarray>`.
```
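Under the hood, matching is nearest-neighbour search over embeddings. A self-contained sketch in plain Python (not the Jina API) of what "match" computes, using cosine similarity:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

docs = {'d1': [1.0, 0.0], 'd2': [0.0, 1.0], 'd3': [1.0, 1.0]}
query = [2.0, 1.0]

# Rank documents by descending cosine similarity to the query.
matches = sorted(docs, key=lambda k: cosine(query, docs[k]), reverse=True)
print(matches[0])  # d3 points in the direction closest to the query
```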

(recursive-nested-document)=
### Recursive Documents

`Document` can be recursive both horizontally and vertically. The following graphic illustrates the recursive `Document` structure. Each `Document` can have multiple "Chunks"
and "matches". Chunks and matches are `Document` objects as well.

<img src="https://hanxiao.io/2020/08/28/What-s-New-in-Jina-v0-5/blog-post-v050-protobuf-documents.jpg">

…of {ref}`DocumentArray<documentarray>`. We will introduce `DocumentArray` later.

#### Caveat: order matters

When adding sub-Documents to `Document.chunks`, avoid creating them in one line, otherwise the recursive Document structure will not be correct. This is because `chunks` use `ref_doc` to control their `granularity`. At `chunk` creation time the `chunk` doesn't know anything about its parent, and will get a wrong `granularity` value.

````{tab} ✅ Do
```python
root_document = Document(text='i am root')
# the parent exists first, so each chunk gets the right `granularity`
root_document.chunks = [
    Document(text='i am chunk 1'),
    Document(text='i am chunk 2'),
]
```
````

````{tab} ❌ Don't
```python
root_document = Document(
    text='i am root',
    # chunks created before the parent is fully constructed
    chunks=[
        Document(text='i am chunk 1'),
        Document(text='i am chunk 2'),
    ],
)
```
````
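Why order matters can be pictured with a toy parent-aware node (plain Python, not Jina's classes): a child derives its granularity from its parent at creation time, so a chunk created before its parent is known ends up with the wrong value.

```python
class Node:
    """Toy stand-in for a Document that derives granularity from its parent."""

    def __init__(self, parent=None):
        self.granularity = 0 if parent is None else parent.granularity + 1
        self.chunks = []

    def add_chunk(self):
        # parent is known at creation time -> granularity is correct
        child = Node(parent=self)
        self.chunks.append(child)
        return child

root = Node()
good = root.add_chunk()

orphan = Node()             # created with no parent ...
root.chunks.append(orphan)  # ... and attached afterwards: granularity stays wrong

print(good.granularity, orphan.granularity)  # 1 0
```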
### Tags

`Document` has a `tags` field that holds a map-like structure for arbitrary values.
In practice, you can store meta information in `tags`.

```python
from jina import Document

# tags can be set at construction time
doc = Document(tags={'dimensions': {'weight': 10.0, 'height': 5.0, 'last_modified': 'Monday'}})

doc.tags['dimensions']
```

```text
{'weight': 10.0, 'height': 5.0, 'last_modified': 'Monday'}
```

To provide easy access to nested fields, the `Document` allows you to access attributes by composing the attribute
qualified name with interlaced `__` symbols:

```python
from jina import DocumentArray

da = DocumentArray([doc])
da.get_attributes('tags__dimensions__height', 'tags__dimensions__weight')
```
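The `__` path syntax can be pictured as walking a nested mapping; a plain-Python sketch (not Jina's implementation):

```python
def resolve(mapping, path):
    """Walk a nested dict following a '__'-separated attribute path."""
    node = mapping
    for key in path.split('__'):
        node = node[key]
    return node

tags = {'dimensions': {'height': 5.0, 'weight': 10.0}}
print(resolve(tags, 'dimensions__height'))  # 5.0
```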
As `tags` does not have a fixed schema, it is declared with type `google.protobuf.Struct` in the `DocumentProto`
protobuf declaration. However, `google.protobuf.Struct` follows the JSON specification and does not
differentiate `int` from `float`. So, data of type `int` in `tags` will **always** be cast to `float` when a request is
sent to an Executor.
As a result, users need to be explicit and cast the data to the expected type as follows:
````{tab} ✅ Do
```{code-block} python
# cast explicitly back to the expected type (`some_int` is a hypothetical tag key)
int(doc.tags['some_int'])
```
````
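The effect can be reproduced with the stdlib `json` module by forcing Struct-like number semantics, where every number is parsed as a double (a sketch, not Jina code):

```python
import json

# Simulate google.protobuf.Struct semantics: every number is a double.
tags = json.loads(json.dumps({'an_int': 10}), parse_int=float)
print(type(tags['an_int']))  # the int arrived as the float 10.0

n = int(tags['an_int'])      # cast back to the expected type explicitly
```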

### Set/unset attributes

Set an attribute as you would with any Python object:

```python
from jina import Document

d = Document()
d.text = 'hello world'

# unset attributes by popping them
d.pop('text', 'id', 'mime_type')
```
### Add relevancy

```{tip}
You normally don't need to add those values by yourself. For example, the `.match()` function automatically fills in
`.scores` of each matched Document.
```

```python
# (iterating over evaluation scores; the surrounding example is collapsed in this diff)
for evaluation_key, evaluation_score in d.evaluations.items():
    ...
```
| Attribute | Description |
| --- | --- |
| `doc.id` | A hexdigest that represents a unique Document ID |
| `doc.parent_id` | A hexdigest that represents the Document's parent id |
| `doc.weight` | The weight of the Document |
| `doc.mime_type` | The mime type of the Document |
| `doc.content_type` | The content type of the Document |

#### Construct from JSON, CSV, files

The `jina.types.document.generators` module lets you construct `Document`s from common file types such as JSON,
CSV, `ndarray` and text files. The following functions return a generator of `Document`s, where each `Document` object
corresponds to a line/row in the original format:

| | |
| --- | --- |
| `from_ndjson()` | Yield `Document` from a line-based JSON file. Each line is a `Document` object |
| `from_csv()` | Yield `Document` from a CSV file. Each line is a `Document` object |
| `from_files()` | Yield `Document` from a glob of files. Each file is a `Document` object |
| `from_ndarray()` | Yield `Document` from an `ndarray`. Each row (depending on `axis`) is a `Document` object |
| `from_lines()` | Yield `Document` from lines, JSON and CSV |

Using a generator is sometimes less memory-demanding, as it does not load/build all Document objects in one shot.
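A `from_ndjson()`-style generator can be pictured in a few lines of plain Python (a sketch with dicts standing in for `Document`s, not the Jina implementation):

```python
import json

def from_ndjson(lines):
    """Yield one dict per line of newline-delimited JSON, lazily."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)

ndjson = '{"text": "hello"}\n{"text": "world"}\n'
docs = list(from_ndjson(ndjson.splitlines()))
print(docs)  # [{'text': 'hello'}, {'text': 'world'}]
```

Because the function yields one item at a time, only the current line needs to be parsed and held in memory.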

