docs: polish (#146)
alexcg1 committed Oct 19, 2021
1 parent bc8b36e commit ac2d23d
Showing 9 changed files with 87 additions and 90 deletions.
30 changes: 14 additions & 16 deletions docs/basics/data-format.md
@@ -11,7 +11,7 @@ This chapter introduces how to construct a `Document` in a way that Finetuner wi

## Understand supervision

Finetuner tunes a deep neural network on search tasks. In this context, the supervision comes from if the nearest-neighbour matches are good or bad, where matches are often computed on model's embeddings. You have to label those good matches and bad matches, so Finetuner can learn about your feedback and improve the model. The following graph illustrates the process.
Finetuner tunes a deep neural network on search tasks. In this context, the supervision comes from whether the nearest-neighbour matches are good or bad, where matches are often computed on model's embeddings. You have to label these good matches and bad matches so Finetuner can learn your feedback and improve the model. The following graph illustrates the process:

```{figure} tuner-journey.svg
:align: center
@@ -36,7 +36,7 @@ In summary, you either label the matches on-the-fly or prepare the labeled data

### Matches

Finetuner relies on matching data in `.matches`. To manually add a match to a `Document` object, one can do:
Finetuner relies on matching data in `.matches`. To manually add a match to a `Document` object, you can do:

```python
from jina import Document
@@ -54,7 +54,7 @@ print(d)
```

Note that the match `Document` should share the same content type as its parent `Document`. The following combinations
are not valid to Finetuner:
are not valid in Finetuner:

```python
from jina import Document
@@ -132,17 +132,17 @@ d.matches.extend([m1, m2, m3])
```{admonition} Is it okay to have all matches as 1, or all as -1?
:class: hint
Yes. Labels should reflect the groundtruth as-is. If a Document contains only postive matches or only negative matches, then so be it.
Yes. Labels should reflect the groundtruth as-is. If a Document contains only positive matches or only negative matches, then so be it.
However, if all match labels from all Documents are the same, then Finetuner can not learn anything useful.
However, if all match labels from all Documents are the same, then Finetuner cannot learn anything useful.
```
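
The hint above can be turned into a quick sanity check before training. The sketch below is only an illustration: it uses plain dictionaries as a stand-in for `Document` objects so that it runs without Jina installed, and `labels_are_informative` and `make_match` are hypothetical helpers, not part of Finetuner. It reports whether both `1` and `-1` appear somewhere across all matches.

```python
def labels_are_informative(docs):
    """Return True if both 1 and -1 appear among all match labels."""
    seen = set()
    for doc in docs:
        for match in doc.get("matches", []):
            seen.add(match["tags"]["finetuner"]["label"])
    return {1, -1} <= seen


def make_match(label):
    # Stand-in for a match Document carrying only its label.
    return {"tags": {"finetuner": {"label": label}}}


# One Document with only positive matches: fine on its own,
# but not informative if it is the whole dataset.
all_positive = [{"matches": [make_match(1), make_match(1)]}]

# Across two Documents, both labels occur.
mixed = [
    {"matches": [make_match(1)]},
    {"matches": [make_match(-1)]},
]

print(labels_are_informative(all_positive))
print(labels_are_informative(mixed))
```

A single Document with uniform labels passes as long as the opposite label appears elsewhere in the dataset, which matches the wording of the hint.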

## Data source

After organizing the labeled `Document` into `DocumentArray` or `DocumentArrayMemmap`, you can feed them
into `finetuner.fit()`.

But where are the labels come from? You can use Labeler, which allows one interactively label data and tune the model at
But where do the labels come from? You can use Labeler, which allows you to interactively label data and tune the model at
the same time.

Otherwise, you will need to prepare labeled data on your own.
@@ -163,11 +163,10 @@ grayscale image.
:align: center
```

To convert this dataset into match data, we build each document to contain the following info that are
relevant:
To convert this dataset into match data, we build each Document to contain the following relevant information:

- `.blob`: the image;
- `.matches`: the generated positive & negative matches Document;
- `.matches`: the generated positive and negative matches of the Document;
- `.blob`: the matched Document's image;
- `.tags['finetuner']['label']`: the match label: `1` or `-1`.

@@ -180,14 +179,13 @@ Matches are built with the logic below:
### Covid QA


Covid QA data is a CSV that has 481 rows with columns `question`, `answer` & `wrong_answer`.
Covid QA data is a CSV that has 481 rows with the columns `question`, `answer` & `wrong_answer`.

```{figure} covid-qa-data.png
:align: center
```

To convert this dataset
into match data, we build each document to contain the following info that are relevant:
To convert this dataset into match data, we build each Document to contain the following relevant information:

- `.text`: the original `question` column
- `.blob`: a fixed length `ndarray` tokenized from `.text`
@@ -198,15 +196,15 @@ into match data, we build each document to contain the following info that are r

Matches are built with the logic below:

- only allows 1 positive match per Document, it is taken from the `answer` column;
- always include `wrong_answer` column as the negative match. Then sample other documents' answer as negative matches.
- only allows one positive match per Document, taken from the `answer` column;
- always include `wrong_answer` column as the negative match. Then sample other Documents' answer as negative matches.
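
The match-building logic above can be sketched in plain Python. This is not the code behind `finetuner.toydata.generate_qa_match()`; it is a minimal illustration under stated assumptions: rows are dictionaries with the three CSV columns, Documents are represented as dictionaries rather than Jina objects, and `build_qa_matches` and its `num_extra_neg` parameter are hypothetical names.

```python
import random


def build_qa_matches(rows, num_extra_neg=1, seed=0):
    """Build one dict per row with one positive and several negative matches."""
    rng = random.Random(seed)
    docs = []
    for i, row in enumerate(rows):
        # Exactly one positive match, taken from the `answer` column.
        matches = [{"text": row["answer"], "tags": {"finetuner": {"label": 1}}}]
        # The `wrong_answer` column is always included as a negative match.
        matches.append(
            {"text": row["wrong_answer"], "tags": {"finetuner": {"label": -1}}}
        )
        # Sample answers of other rows as additional negative matches.
        others = [
            r["answer"]
            for j, r in enumerate(rows)
            if j != i and r["answer"] != row["answer"]
        ]
        for text in rng.sample(others, min(num_extra_neg, len(others))):
            matches.append({"text": text, "tags": {"finetuner": {"label": -1}}})
        docs.append({"text": row["question"], "matches": matches})
    return docs


rows = [
    {"question": "q0", "answer": "a0", "wrong_answer": "w0"},
    {"question": "q1", "answer": "a1", "wrong_answer": "w1"},
    {"question": "q2", "answer": "a2", "wrong_answer": "w2"},
]
docs = build_qa_matches(rows, num_extra_neg=1)
```

Fixing the seed keeps the sampled negatives reproducible, which is convenient for demos and debugging.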


```{tip}
Finetuner codebase contains two synthetic matching data generator for demo and debugging purpose:
The Finetuner codebase contains two synthetic matching data generators for demo and debugging purpose:
- `finetuner.toydata.generate_fashion_match()`: the generator of Fashion-MNIST matching data.
- `finetuner.toydata.generate_qa_match()`: the generator of Covid QA matching data.
```
```
10 changes: 5 additions & 5 deletions docs/basics/glossary.md
@@ -8,17 +8,17 @@ Embedding model
A DNN with any shape input (image/text/sequence) and an output `ndarray` in the shape `[B x D]`, where `B` is the batch size same as the input, and `D` is the dimension of the embedding.
Unlabeled data
A `DocumentArray`-like object, filling with `Document` with `.content`.
A `DocumentArray`-like object, filled with `Document`s with `.content`.
Labeled data
A `DocumentArray`-like object, filling with `Document` with `.content` and `.matches`; where each `match` contains `.content` and `.tags['finetuner']['label']`.
A `DocumentArray`-like object, filled with `Document`s with `.content` and `.matches`; where each `match` contains `.content` and `.tags['finetuner']['label']`.
Tuner
A component in Finetuner. Given an {term}`embedding model` and {term}`labeled data`, train the model to fit the data.
A component in Finetuner. Given an {term}`embedding model` and {term}`labeled data`, it trains the model to fit the data.
Tailor
A component in Finetuner. Convert any {term}`general model` into an {term}`embedding model`;
A component in Finetuner. Converts any {term}`general model` into an {term}`embedding model`;
Labeler
A component in Finetuner. Given {term}`unlabeled data` and an {term}`embedding model` or {term}`general model`, labeler asks human for labeling data, trains model and asks better question for labeling.
```
```
31 changes: 15 additions & 16 deletions docs/components/labeler.md
@@ -4,7 +4,7 @@ Labeler is a component in Finetuner. It contains a backend and a frontend UI. Gi

Algorithms such as few-shot learning, negative sampling, active learning are implemented in the Labeler.

Labeler can be also used together with Tailor.
Labeler can also be used together with Tailor.

## Fit method

@@ -69,17 +69,17 @@ UserWarning: ignored unknown argument: ['thread']. (raised from /Users/hanxiao/D
⠴ Working... ━╸━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0:00:00 estimating... JINA@29672[I]:Finetuner is available at http://localhost:61130/finetuner
```

All `UserWarning` can be ignored. After few seconds, your browser will open the Labeler UI. If not (depending on your operating system/browser setup), you can find the URL in the terminal and then open it manually. For example,
All `UserWarning`s can be ignored. After a few seconds, your browser will open the Labeler UI. If not (depending on your operating system/browser setup), you can find the URL in the terminal and then open it manually. For example,

```console
JINA@29672[I]:Finetuner is available at http://localhost:61130/finetuner
```

```{tip}
While the frontend may already show examples to label, you may observe a progressbar at the backend keep showing `Working...`. This is because it is still loading your complete input data into the Labeler. The Labeler is designed in an "async" way that you can directly start labeling without waiting for all data to be loaded.
While the frontend may already show examples to label, you may observe a progress bar on the backend that keeps showing `Working...`. This is because it is still loading your complete input data into the Labeler. The Labeler is designed in an "async" way so that you can directly start labeling without waiting for all data to load.
```

If everything is successful, you should observe the following UI.
If everything is successful, you should observe the following UI:

````{tab} Image
```{figure} labeler-img.png
@@ -104,7 +104,7 @@ Control panel is on the left side of the UI. It collects some configs of the fro

#### View

View section collects the configs determining how frontend renders the question panel.
The view section collects the configs determining how frontend renders the question panel.


````{sidebar} View
@@ -114,13 +114,13 @@ View section collects the configs determining how frontend renders the question
````

- `Field`: represents the field of `Document` your question data come from.
- `Tags Key`: when you select `Field` as `.tags`, this textbox will show up, asking you to further specify which `.tags` key your question data come from.
- `Tags Key`: when you select `Field` as `.tags`, this textbox will show up, asking you to further specify which `.tags` key your question data comes from.
- `Content Type`: you need to select the right content type to have the correct rendering on the the question data.
- `Examples/View`: The maximum number of labeling examples on the frontend.
- `TopK/Examples`: The maximum number of results for each example on the frontend.

````{tip}
If your question panel looks something below, this means rendering is not setup correctly. You need to change `Field`, `Content Type` and `Tags Key` to correct the render setup.
If your question panel looks like the image below, this means rendering is not setup correctly. You need to change `Field`, `Content Type` and `Tags Key` to correct the render setup.
```{figure} bad-config.png
:align: center
@@ -147,11 +147,11 @@ Progress section collects the statistics of the labeling procedure so far.
- `Positve`: the number of labeled positive instances.
- `Negative`: the number of labeled negative instances.
- `Ignore`: the number of ignored instances.
- `Saved`: the times of saving the model
- `Saved`: how many times the model has been saved.

Underneath the stats there is a progressbar, indicating the ratio of positive, negative and ignored instances so far.
Below the stats there is a progress bar, indicating the ratio of positive, negative and ignored instances so far.

Click `Save Model` button to tell the backend store the model weights at any time.
Click `Save Model` button to tell the backend to store the model weights at any time.

#### Advanced

@@ -166,13 +166,13 @@ In the advanced section, you can find some configs that affect the training proc
- `Positive Label`: the value of the label when an instance is considered as positively related to the question.
- `Negative Label`: the value of the label when an instance is considered as negatively related/unrelated to the question.
- `Epochs`: the number of training epochs every time a new example is labeled.
- `Match pool`: the size of the pool for computing nearest neighbours. Note that, a larger pool means more diversity when proposing a labeling question; yet slower on every proposal. A smaller pool means faster question proposal, but you may not have very meaningful questions as all top-K answers are bad.
- `Model save path`: the file path for saving the model, this is used when you click "Save model" button.
- `Match pool`: the size of the pool for computing nearest neighbours. Note that a larger pool means more diversity when proposing a labeling question; yet it's slower on every proposal. A smaller pool means faster question proposal, but you may not have very meaningful questions if all top-K answers are bad.
- `Model save path`: the file path for saving the model, used when you click "Save model" button.

### Question panel


Question panel shows a multi-selection question in a card. The user needs to select the most related answers from the list/grid and submit the results.
Question panel shows a multi-choice question in a card. The user needs to select the most relevant answers from the list/grid and submit the results.

```{figure} labeler-question.gif
:align: center
@@ -184,13 +184,12 @@ Question panel shows a multi-selection question in a card. The user needs to sel
:width: 50%
```


You can use keyboard shortcut to select related answers. The selections are considered as positive, whereas the remains are considered as negative. Use `Invert` or hit `<i>` to invert the selections.
You can use a keyboard shortcut to select related answers. The selections are considered positive, whereas the remains are considered negative. Use `Invert` or hit `<i>` to invert the selection.


Click `Done` or hit `<space>` to submit the result.

Once a submission is done, you will see the backend starts to train based on your submission. A spinner will be showed near the "Progress" section, indicating the backend is working. Afterwards, a new question is proposed based on the newly trained model.
Once a submission is completed, you will see the backend starts to train based on your submission. A spinner will show near the "Progress" section, indicating the backend is working. Afterwards, a new question is proposed based on the newly trained model.



4 changes: 2 additions & 2 deletions docs/components/overview.md
@@ -3,7 +3,7 @@
Finetuner project is composed of three components:
- **Tuner**: to tune any embedding model for better embedding on labeled data;
- **Tailor**: to convert any deep neural network into an embedding model;
- **Labeler**: a UI for interactive labeling and conduct [active learning](https://en.wikipedia.org/wiki/Active_learning_(machine_learning)) via Tuner.
- **Labeler**: a UI for interactive labeling and conducting [active learning](https://en.wikipedia.org/wiki/Active_learning_(machine_learning)) via Tuner.

```{figure} finetuner-composition.svg
:align: center
@@ -12,7 +12,7 @@ Finetuner project is composed of three components:

## Usage

The three components can be used in combinations under different scenarios.
The three components can be used in combination under different scenarios.


```{figure} four-usecases.svg