From ac2d23dec56d17c8aee7b46703e80fe26c227d42 Mon Sep 17 00:00:00 2001 From: Alex Cureton-Griffiths Date: Tue, 19 Oct 2021 16:13:51 +0200 Subject: [PATCH] docs: polish (#146) --- docs/basics/data-format.md | 30 ++++++++--------- docs/basics/glossary.md | 10 +++--- docs/components/labeler.md | 31 +++++++++--------- docs/components/overview.md | 4 +-- docs/components/tailor.md | 54 +++++++++++++++---------------- docs/components/tuner.md | 18 +++++------ docs/get-started/covid-qa.md | 10 +++--- docs/get-started/fashion-mnist.md | 8 ++--- docs/index.md | 12 +++---- 9 files changed, 87 insertions(+), 90 deletions(-) diff --git a/docs/basics/data-format.md b/docs/basics/data-format.md index 28c037cd0..ba07afe4e 100644 --- a/docs/basics/data-format.md +++ b/docs/basics/data-format.md @@ -11,7 +11,7 @@ This chapter introduces how to construct a `Document` in a way that Finetuner wi ## Understand supervision -Finetuner tunes a deep neural network on search tasks. In this context, the supervision comes from if the nearest-neighbour matches are good or bad, where matches are often computed on model's embeddings. You have to label those good matches and bad matches, so Finetuner can learn about your feedback and improve the model. The following graph illustrates the process. +Finetuner tunes a deep neural network on search tasks. In this context, the supervision comes from whether the nearest-neighbour matches are good or bad, where matches are often computed on model's embeddings. You have to label these good matches and bad matches so Finetuner can learn your feedback and improve the model. The following graph illustrates the process: ```{figure} tuner-journey.svg :align: center @@ -36,7 +36,7 @@ In summary, you either label the matches on-the-fly or prepare the labeled data ### Matches -Finetuner relies on matching data in `.matches`. To manually add a match to a `Document` object, one can do: +Finetuner relies on matching data in `.matches`. To manually add a match to a `Document` object, you can do: ```python from jina import Document @@ -54,7 +54,7 @@ print(d) ``` Note that the match `Document` should share the same content type as its parent `Document`. The following combinations -are not valid to Finetuner: +are not valid in Finetuner: ```python from jina import Document @@ -132,9 +132,9 @@ d.matches.extend([m1, m2, m3]) ```{admonition} Is it okay to have all matches as 1, or all as -1? :class: hint -Yes. Labels should reflect the groundtruth as-is. If a Document contains only postive matches or only negative matches, then so be it. +Yes. Labels should reflect the groundtruth as-is. If a Document contains only positive matches or only negative matches, then so be it. -However, if all match labels from all Documents are the same, then Finetuner can not learn anything useful. +However, if all match labels from all Documents are the same, then Finetuner cannot learn anything useful. ``` ## Data source @@ -142,7 +142,7 @@ However, if all match labels from all Documents are the same, then Finetuner can After organizing the labeled `Document` into `DocumentArray` or `DocumentArrayMemmap`, you can feed them into `finetuner.fit()`. -But where are the labels come from? You can use Labeler, which allows one interactively label data and tune the model at +But where do the labels come from? You can use Labeler, which allows you to interactively label data and tune the model at the same time. Otherwise, you will need to prepare labeled data on your own. @@ -163,11 +163,10 @@ grayscale image. 
:align: center ``` -To convert this dataset into match data, we build each document to contain the following info that are -relevant: +To convert this dataset into match data, we build each Document to contain the following relevant information: - `.blob`: the image; -- `.matches`: the generated positive & negative matches Document; +- `.matches`: the generated positive and negative matches of the Document; - `.blob`: the matched Document's image; - `.tags['finetuner']['label']`: the match label: `1` or `-1`. @@ -180,14 +179,13 @@ Matches are built with the logic below: ### Covid QA -Covid QA data is a CSV that has 481 rows with columns `question`, `answer` & `wrong_answer`. +Covid QA data is a CSV that has 481 rows with the columns `question`, `answer` & `wrong_answer`. ```{figure} covid-qa-data.png :align: center ``` -To convert this dataset -into match data, we build each document to contain the following info that are relevant: +To convert this dataset into match data, we build each Document to contain the following relevant information: - `.text`: the original `question` column - `.blob`: a fixed length `ndarray` tokenized from `.text` @@ -198,15 +196,15 @@ into match data, we build each document to contain the following info that are r Matches are built with the logic below: -- only allows 1 positive match per Document, it is taken from the `answer` column; -- always include `wrong_answer` column as the negative match. Then sample other documents' answer as negative matches. +- only allows one positive match per Document, taken from the `answer` column; +- always include `wrong_answer` column as the negative match. Then sample other Documents' answer as negative matches. ```{tip} -Finetuner codebase contains two synthetic matching data generator for demo and debugging purpose: +The Finetuner codebase contains two synthetic matching data generators for demo and debugging purpose: - `finetuner.toydata.generate_fashion_match()`: the generator of Fashion-MNIST matching data. - `finetuner.toydata.generate_qa_match()`: the generator of Covid QA matching data. -``` \ No newline at end of file +``` diff --git a/docs/basics/glossary.md b/docs/basics/glossary.md index 91fc1dd97..cc9334454 100644 --- a/docs/basics/glossary.md +++ b/docs/basics/glossary.md @@ -8,17 +8,17 @@ Embedding model A DNN with any shape input (image/text/sequence) and an output `ndarray` in the shape `[B x D]`, where `B` is the batch size same as the input, and `D` is the dimension of the embedding. Unlabeled data - A `DocumentArray`-like object, filling with `Document` with `.content`. + A `DocumentArray`-like object, filled with `Document`s with `.content`. Labeled data - A `DocumentArray`-like object, filling with `Document` with `.content` and `.matches`; where each `match` contains `.content` and `.tags['finetuner']['label']`. + A `DocumentArray`-like object, filled with `Document`s with `.content` and `.matches`; where each `match` contains `.content` and `.tags['finetuner']['label']`. Tuner - A component in Finetuner. Given an {term}`embedding model` and {term}`labeled data`, train the model to fit the data. + A component in Finetuner. Given an {term}`embedding model` and {term}`labeled data`, it trains the model to fit the data. Tailor - A component in Finetuner. Convert any {term}`general model` into an {term}`embedding model`; + A component in Finetuner. Converts any {term}`general model` into an {term}`embedding model`; Labeler A component in Finetuner. 
Given {term}`unlabeled data` and an {term}`embedding model` or {term}`general model`, labeler asks human for labeling data, trains model and asks better question for labeling. -``` \ No newline at end of file +``` diff --git a/docs/components/labeler.md b/docs/components/labeler.md index cdbea0c13..d5e903938 100644 --- a/docs/components/labeler.md +++ b/docs/components/labeler.md @@ -4,7 +4,7 @@ Labeler is a component in Finetuner. It contains a backend and a frontend UI. Gi Algorithms such as few-shot learning, negative sampling, active learning are implemented in the Labeler. -Labeler can be also used together with Tailor. +Labeler can also be used together with Tailor. ## Fit method @@ -69,17 +69,17 @@ UserWarning: ignored unknown argument: ['thread']. (raised from /Users/hanxiao/D ⠴ Working... ━╸━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0:00:00 estimating... JINA@29672[I]:Finetuner is available at http://localhost:61130/finetuner ``` -All `UserWarning` can be ignored. After few seconds, your browser will open the Labeler UI. If not (depending on your operating system/browser setup), you can find the URL in the terminal and then open it manually. For example, +All `UserWarning`s can be ignored. After a few seconds, your browser will open the Labeler UI. If not (depending on your operating system/browser setup), you can find the URL in the terminal and then open it manually. For example, ```console JINA@29672[I]:Finetuner is available at http://localhost:61130/finetuner ``` ```{tip} -While the frontend may already show examples to label, you may observe a progressbar at the backend keep showing `Working...`. This is because it is still loading your complete input data into the Labeler. The Labeler is designed in an "async" way that you can directly start labeling without waiting for all data to be loaded. +While the frontend may already show examples to label, you may observe a progress bar on the backend that keeps showing `Working...`. This is because it is still loading your complete input data into the Labeler. The Labeler is designed in an "async" way so that you can directly start labeling without waiting for all data to load. ``` -If everything is successful, you should observe the following UI. +If everything is successful, you should observe the following UI: ````{tab} Image ```{figure} labeler-img.png @@ -104,7 +104,7 @@ Control panel is on the left side of the UI. It collects some configs of the fro #### View -View section collects the configs determining how frontend renders the question panel. +The view section collects the configs determining how frontend renders the question panel. ````{sidebar} View @@ -114,13 +114,13 @@ View section collects the configs determining how frontend renders the question ```` - `Field`: represents the field of `Document` your question data come from. - - `Tags Key`: when you select `Field` as `.tags`, this textbox will show up, asking you to further specify which `.tags` key your question data come from. + - `Tags Key`: when you select `Field` as `.tags`, this textbox will show up, asking you to further specify which `.tags` key your question data comes from. - `Content Type`: you need to select the right content type to have the correct rendering on the the question data. - `Examples/View`: The maximum number of labeling examples on the frontend. - `TopK/Examples`: The maximum number of results for each example on the frontend. ````{tip} -If your question panel looks something below, this means rendering is not setup correctly. 
You need to change `Field`, `Content Type` and `Tags Key` to correct the render setup. +If your question panel looks like the image below, this means rendering is not setup correctly. You need to change `Field`, `Content Type` and `Tags Key` to correct the render setup. ```{figure} bad-config.png :align: center @@ -147,11 +147,11 @@ Progress section collects the statistics of the labeling procedure so far. - `Positve`: the number of labeled positive instances. - `Negative`: the number of labeled negative instances. - `Ignore`: the number of ignored instances. -- `Saved`: the times of saving the model +- `Saved`: how many times the model has been saved. -Underneath the stats there is a progressbar, indicating the ratio of positive, negative and ignored instances so far. +Below the stats there is a progress bar, indicating the ratio of positive, negative and ignored instances so far. -Click `Save Model` button to tell the backend store the model weights at any time. +Click `Save Model` button to tell the backend to store the model weights at any time. #### Advanced @@ -166,13 +166,13 @@ In the advanced section, you can find some configs that affect the training proc - `Positive Label`: the value of the label when an instance is considered as positively related to the question. - `Negative Label`: the value of the label when an instance is considered as negatively related/unrelated to the question. - `Epochs`: the number of training epochs every time a new example is labeled. -- `Match pool`: the size of the pool for computing nearest neighbours. Note that, a larger pool means more diversity when proposing a labeling question; yet slower on every proposal. A smaller pool means faster question proposal, but you may not have very meaningful questions as all top-K answers are bad. -- `Model save path`: the file path for saving the model, this is used when you click "Save model" button. +- `Match pool`: the size of the pool for computing nearest neighbours. Note that a larger pool means more diversity when proposing a labeling question; yet it's slower on every proposal. A smaller pool means faster question proposal, but you may not have very meaningful questions if all top-K answers are bad. +- `Model save path`: the file path for saving the model, used when you click "Save model" button. ### Question panel -Question panel shows a multi-selection question in a card. The user needs to select the most related answers from the list/grid and submit the results. +Question panel shows a multi-choice question in a card. The user needs to select the most relevant answers from the list/grid and submit the results. ```{figure} labeler-question.gif :align: center @@ -184,13 +184,12 @@ Question panel shows a multi-selection question in a card. The user needs to sel :width: 50% ``` - -You can use keyboard shortcut to select related answers. The selections are considered as positive, whereas the remains are considered as negative. Use `Invert` or hit `` to invert the selections. +You can use a keyboard shortcut to select related answers. The selections are considered positive, whereas the remains are considered negative. Use `Invert` or hit `` to invert the selection. Click `Done` or hit `` to submit the result. -Once a submission is done, you will see the backend starts to train based on your submission. A spinner will be showed near the "Progress" section, indicating the backend is working. Afterwards, a new question is proposed based on the newly trained model. 
+Once a submission is completed, you will see the backend starts to train based on your submission. A spinner will show near the "Progress" section, indicating the backend is working. Afterwards, a new question is proposed based on the newly trained model. diff --git a/docs/components/overview.md b/docs/components/overview.md index 6ebbfa642..edd9e1995 100644 --- a/docs/components/overview.md +++ b/docs/components/overview.md @@ -3,7 +3,7 @@ Finetuner project is composed of three components: - **Tuner**: to tune any embedding model for better embedding on labeled data; - **Tailor**: to convert any deep neural network into an embedding model; -- **Labeler**: a UI for interactive labeling and conduct [active learning](https://en.wikipedia.org/wiki/Active_learning_(machine_learning)) via Tuner. +- **Labeler**: a UI for interactive labeling and conducting [active learning](https://en.wikipedia.org/wiki/Active_learning_(machine_learning)) via Tuner. ```{figure} finetuner-composition.svg :align: center @@ -12,7 +12,7 @@ Finetuner project is composed of three components: ## Usage -The three components can be used in combinations under different scenarios. +The three components can be used in combination under different scenarios. ```{figure} four-usecases.svg diff --git a/docs/components/tailor.md b/docs/components/tailor.md index e0d86813d..a0f4a459d 100644 --- a/docs/components/tailor.md +++ b/docs/components/tailor.md @@ -1,8 +1,8 @@ # Tailor -Tailor is a component of Finetuner. It converts any {term}`general model` into an {term}`embedding model`. Given a general model (either written from scratch, or from Pytorch/Keras/Huggingface model zoo), Tailor does micro-operations on the model architecture and outputs an embedding model for the {term}`Tuner`. +Tailor is a component of Finetuner. It converts any {term}`general model` into an {term}`embedding model`. Given a general model (either written from scratch, or from PyTorch/Keras/Huggingface model zoo), Tailor performs micro-operations on the model architecture and outputs an embedding model for the {term}`Tuner`. -Given a general model with weights, Tailor *preserves its weights* and does (some of) the following steps: +Given a general model with weights, Tailor *preserves its weights* and performs (some of) the following steps: - finding all dense layers by iterating over layers; - chopping off all layers after a certain dense layer; - freezing the weights on the remaining layers; @@ -12,11 +12,11 @@ Given a general model with weights, Tailor *preserves its weights* and does (som :align: center ``` -In the end, Tailor outputs an embedding model that can be fine-tuned in Tuner. +Finally, Tailor outputs an embedding model that can be fine-tuned in Tuner. ## `to_embedding_model` method -Tailor provides a high-level API `finetuner.tailor.to_embedding_model()`, which can be used as following: +Tailor provides a high-level API `finetuner.tailor.to_embedding_model()`, which can be used as follows: ```python from finetuner.tailor import to_embedding_model @@ -31,15 +31,15 @@ to_embedding_model( ) -> AnyDNN ``` -Here, `model` is the general model with loaded weights; `layer_name` is the selected bottleneck layer; `freeze` defines if to set weights of remaining layers as nontrainable parameters. +Here, `model` is the general model with loaded weights; `layer_name` is the selected bottleneck layer; `freeze` defines whether to set weights of remaining layers as non-trainable parameters. 
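For example, here is a minimal sketch that tailors the small PyTorch MLP from the "Simple MLP" example below at its 128-dimensional dense layer. The layer name `linear_2` is taken from the `display` output shown in that example; always check the names reported for your own model first.

```python
import torch

from finetuner.tailor import display, to_embedding_model

# the same 2-layer MLP used in the "Simple MLP" example below
model = torch.nn.Sequential(
    torch.nn.Flatten(),
    torch.nn.Linear(in_features=28 * 28, out_features=128),
    torch.nn.ReLU(),
    torch.nn.Linear(in_features=128, out_features=32),
)

# inspect the layer names first; for this model they include `linear_2` and `linear_4`
display(model, input_size=(28, 28))

# keep everything up to the 128-dim dense layer and use it as the embedding model;
# `input_size` is needed here because this is a PyTorch model
embed_model = to_embedding_model(
    model,
    layer_name='linear_2',
    input_size=(28, 28),
)
```

The returned `embed_model` keeps the original weights and can be passed straight to the Tuner.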
-`input_size` and `input_dtype` are input type specification required by Pytorch and Paddle models. They are not required for Kersas models. +`input_size` and `input_dtype` are input type specification required by PyTorch and Paddle models. They are not required for Keras models. -In general, you do not need to call `to_embedding_model` manually. You can directly use it via `finetuner.fit(..., to_embedding_model=True)` +In general, you do not need to call `to_embedding_model` manually. You can use it directly via `finetuner.fit(..., to_embedding_model=True)` ## `display` method -Tailor also provides a helper function `finetuner.tailor.display()` that gives a table summary of a Keras/Pytorch/Paddle model. +Tailor also provides a helper function `finetuner.tailor.display()` that gives a table summary of a Keras/PyTorch/Paddle model. Let's see how to use them in action. @@ -47,7 +47,7 @@ Let's see how to use them in action. ### Simple MLP -1. Let's first build a simple 2-layer perceptron with 128 and 32-dim output as layers via Pytorch/Keras/Paddle. +1. Let's first build a simple 2-layer perceptron with 128 and 32-dim output as layers via PyTorch/Keras/Paddle. ````{tab} PyTorch ```python import torch @@ -84,7 +84,7 @@ Let's see how to use them in action. ```` 2. Let's use `display` to look at the layer information. - ````{tab} Pytorch + ````{tab} PyTorch ```python from finetuner.tailor import display @@ -132,7 +132,7 @@ Let's see how to use them in action. linear_4 [32] 4128 True ``` ```` -3. Say we want to get an embedding model that outputs 100-dimensional embeddings. One can simply do +3. Say we want to get an embedding model that outputs 100-dimensional embeddings. You can simply do: ```python from finetuner.tailor import to_embedding_model @@ -146,7 +146,7 @@ Let's see how to use them in action. display(model, input_size=(28, 28)) ``` - ````{tab} Pytorch + ````{tab} PyTorch ```console name output_shape_display nb_params trainable @@ -182,11 +182,11 @@ Let's see how to use them in action. linear_5 [100] 3300 True ``` ```` - One can see that Tailor adds an additional linear layer with 100-dimensional output at the end. + You can see that Tailor adds an additional linear layer with 100-dimensional output at the end. ### Simple Bi-LSTM -1. Let's first build a simple Bi-directional LSTM with Pytorch/Keras/Paddle. +1. Let's first build a simple bi-directional LSTM with PyTorch/Keras/Paddle. ````{tab} PyTorch ```python import torch @@ -231,7 +231,7 @@ Let's see how to use them in action. ```` 2. Let's use `display` to look at the layer information. - ````{tab} Pytorch + ````{tab} PyTorch ```python from finetuner.tailor import display @@ -279,8 +279,8 @@ Let's see how to use them in action. linear_4 [32] 4128 True ``` ```` -3. Say we want to get an embedding model that outputs 100-dimensional embeddings. But this time, we want to directly concat this layer after LSTM, and freeze all previous layers. One can use `layer_name` and `freeze` to solve this problem. In Pytorch and Paddle implementation, the layer name is `lastcell_3`; in Keras the layer name is `bidirectional`. - ````{tab} Pytorch +3. Say we want to get an embedding model that outputs 100-dimensional embeddings. But this time, we want to directly concat this layer after LSTM, and freeze all previous layers. You can use `layer_name` and `freeze` to solve this problem. In PyTorch or Paddle implementations, the layer name is `lastcell_3`; in Keras the layer name is `bidirectional`. 
+ ````{tab} PyTorch ```python from finetuner.tailor import to_embedding_model @@ -339,15 +339,15 @@ Let's see how to use them in action. linear_4 [100] 12900 True ``` ```` - One can observe the last linear layer is replaced from a 32-dimensional output to a 100-dimensional output. Also, the weights of all layers except the last layers are frozen and not trainable. + You can observe the last linear layer is replaced from a 32-dimensional output to a 100-dimensional output. Also, the weights of all layers except the last layers are frozen and not trainable. ### Pretrained VGG16 model -Apart from building model on your own and then tailor it, Tailor can work directly on pretrained models. In this example, we load a pretrained VGG16 model and tailor it into an embedding model. +Apart from building a model on your own and then tailoring it, Tailor can work directly on pretrained models. In this example, we load a pretrained VGG16 model and tailor it into an embedding model. -1. Let's first load a pretrained VGG16 from Pytorch/Keras/Paddle model zoo. +1. Let's first load a pretrained VGG16 from PyTorch/Keras/Paddle model zoo. ````{tab} PyTorch ```python import torchvision.models as models @@ -371,7 +371,7 @@ Apart from building model on your own and then tailor it, Tailor can work direct ```` 2. Let's use `display` to look at the layer information. - ````{tab} Pytorch + ````{tab} PyTorch ```python from finetuner.tailor import display @@ -490,8 +490,8 @@ Apart from building model on your own and then tailor it, Tailor can work direct linear_39 [1000] 4097000 True ``` ```` -3. Say we want to get an embedding model that outputs 100-dimensional embeddings. This time we want to remove all existing dense layers, and freeze all previous layers, then concat a new 100-dimensional dense output to the mode. To achieve that, - ````{tab} Pytorch +3. Say we want to get an embedding model that outputs 100-dimensional embeddings. This time we want to remove all existing dense layers, and freeze all previous layers, then concat a new 100-dimensional dense output to the model. To achieve that you can do: + ````{tab} PyTorch ```python from finetuner.tailor import to_embedding_model @@ -599,10 +599,10 @@ Apart from building model on your own and then tailor it, Tailor can work direct linear_34 [100] 409700 True ``` ```` - One can observe the original last two linear layers are removed, and a new linear layer with 100-dimensional output is added at the end. Also, the weights of all layers except the last layers are frozen and not trainable. + You can observe the original last two linear layers are removed, and a new linear layer with 100-dimensional output has been added at the end. Also, the weights of all layers except the last layers are frozen and not trainable. ## Tips -- For Pytorch/Paddle models, having the correct `input_size` and `input_dtype` is fundamental to use `to_embedding_model` and `display`. -- One can chop-off layers and concat new layer afterward. To get the accurate layer name, one can first use `display` to list all layers. -- Different frameworks may give different layer names. Often, Pytorch and Paddle layer names are consistent. \ No newline at end of file +- For PyTorch/Paddle models, having the correct `input_size` and `input_dtype` is fundamental to use `to_embedding_model` and `display`. +- You can chop off layers and concat new layers afterward. To get the accurate layer name, you can first use `display` to list all layers. +- Different frameworks may give different layer names. 
Often, PyTorch and Paddle layer names are consistent. diff --git a/docs/components/tuner.md b/docs/components/tuner.md index 3b7abeb26..388e7c6f2 100644 --- a/docs/components/tuner.md +++ b/docs/components/tuner.md @@ -7,7 +7,7 @@ Labeled data can be constructed {ref}`by following this` ## Fit method -Tuner can be called via `finetuner.fit()`. Its minimum form looks like the folllowing: +Tuner can be called via `finetuner.fit()`. Its minimum form is as follows: ```python import finetuner @@ -20,13 +20,13 @@ finetuner.fit( ``` -Here, `embed_model` must be {term}`embedding model`; and `train_data` must be {term}`labeled data`. +Here, `embed_model` must be an {term}`embedding model`; and `train_data` must be {term}`labeled data`. ### Loss function -By default, Tuner uses `CosineSiameseLoss` for training. you can also use other built-in losses by `finetuner.fit(..., loss='...')`. +By default, Tuner uses `CosineSiameseLoss` for training. You can also use other built-in losses by `finetuner.fit(..., loss='...')`. -Let $\mathbf{x}_i$ denote the predicted embedding for Document $i$, the built-in losses are summarized as below: +Let $\mathbf{x}_i$ denotes the predicted embedding for Document $i$. The built-in losses are summarized as follows: :::{dropdown} `CosineSiameseLoss` :open: @@ -58,7 +58,7 @@ $$\ell_{i, p, n}=\max(0, \left \|\mathbf{x}_i, \mathbf{x}_p \right \|-\left \|\m ```{tip} -Although siamese and triplet loss work on pair and triplet input respectively, there is **no need** to worry about the data input format. You only need to make sure your data is labeled according to {ref}`data-format`, then you can switch between all losses freely. +Although siamese and triplet loss works on pair and triplet inputs respectively, there is **no need** to worry about the data input format. You only need to make sure your data is labeled according to {ref}`data-format`, then you can switch between all losses freely. ``` @@ -103,10 +103,10 @@ Although siamese and triplet loss work on pair and triplet input respectively, t ```` -2. Build labeled match data {ref}`according to the steps in here`. One can refer +2. Build labeled match data {ref}`according to the steps here`. You can refer to `finetuner.toydata.generate_fashion_match` for an implementation. In this example, for each `Document` we generate 10 positive matches and 10 negative matches. -3. Feed labeled data and the embedding model into Finetuner: +3. Feed the labeled data and embedding model into Finetuner: ```python import finetuner from finetuner.toydata import generate_fashion_match @@ -125,7 +125,7 @@ Although siamese and triplet loss work on pair and triplet input respectively, t ### Tune a bidirectional LSTM on Covid QA -1. Write an embedding model. +1. Write an embedding model: ````{tab} Keras ```python @@ -169,7 +169,7 @@ Although siamese and triplet loss work on pair and triplet input respectively, t ``` ```` -2. Build labeled match data {ref}`according to the steps in here`. One can refer +2. Build labeled match data {ref}`according to the steps here`. You can refer to `finetuner.toydata.generate_qa_match` for an implementation. 3. 
Feed labeled data and the embedding model into Finetuner: diff --git a/docs/get-started/covid-qa.md b/docs/get-started/covid-qa.md index 26623f23c..8a5b4bba2 100644 --- a/docs/get-started/covid-qa.md +++ b/docs/get-started/covid-qa.md @@ -10,13 +10,13 @@ In this example, we want to "tune" the 32-dim embedding vectors from a bidirecti Precisely, "tuning" means: - we set up a Jina search pipeline and will look at the top-K semantically similar questions; - we accept or reject the results based on their quality; -- we let the model to remember our feedback and produces better search result. +- we let the model remember our feedback and produce better search results. -Hopefully the procedure converges after several rounds; and we get a tuned embedding for better search task. +Hopefully the procedure converges after several rounds and we get a tuned embedding for better search tasks. ## Build embedding model -Let's write a 2-layer MLP as our {ref}`embedding model` using any of the following framework. +Let's write a 2-layer MLP as our {ref}`embedding model` using any of the following frameworks: ````{tab} PyTorch @@ -69,7 +69,7 @@ embed_model = paddle.nn.Sequential( ## Prepare data -Now prepare CovidQA data for the Finetuner. Note that Finetuner accepts Jina `DocumentArray`/`DocumentArrayMemmap`, so we first convert them into this format. +Now prepare CovidQA data for the Finetuner. Note that Finetuner accepts Jina `DocumentArray`/`DocumentArrayMemmap`, so we first convert the data into this format. ```python from finetuner.toydata import generate_qa_match @@ -99,7 +99,7 @@ finetuner.fit( From the left bar, select `text` as the view. -In the content, select `.tags` and then fill in `question` to tell the UI renders text from `Document.tags['question']`. +In the content, select `.tags` and then fill in `question` to tell the UI to render text from `Document.tags['question']`. You can now label the data by mouse/keyboard. The model will get trained and improved as you are labeling. diff --git a/docs/get-started/fashion-mnist.md b/docs/get-started/fashion-mnist.md index f85d9f5ce..280231647 100644 --- a/docs/get-started/fashion-mnist.md +++ b/docs/get-started/fashion-mnist.md @@ -9,13 +9,13 @@ In this example, we want to "tune" the 32-dim embedding vectors from a 2-layer M Precisely, "tuning" means: - we set up a Jina search pipeline and will look at the top-K visually similar result; - we accept or reject the results based on their quality; -- we let the model to remember our feedback and produces better search result. +- we let the model remember our feedback and produce better search result. -Hopefully the procedure converges after several rounds; and we get a tuned embedding for better search task. +Hopefully the procedure converges after several rounds and we get a tuned embedding for better search task. ## Build embedding model -Let's write a 2-layer MLP as our {ref}`embedding model` using any of the following framework. +Let's write a 2-layer MLP as our {ref}`embedding model` using any of the following frameworks: ````{tab} PyTorch @@ -93,7 +93,7 @@ You can now label the data by mouse/keyboard. The model will get trained and imp :align: center ``` -From the backend you will see model's training procedure: +From the backend you will see the model's training procedure: ```bash Flow@22900[I]:🎉 Flow is ready to use! diff --git a/docs/index.md b/docs/index.md index 0f143792e..af96dc7b2 100644 --- a/docs/index.md +++ b/docs/index.md @@ -11,7 +11,7 @@ ```bash pip install finetuner ``` -2. 
In this example, we want to tune the 32-dim embedding vectors from a 2-layer MLP on the Fashion-MNIST data. Let's write a model with any of the following framework: +2. In this example, we want to tune the 32-dim embedding vectors from a 2-layer MLP on the Fashion-MNIST dataset. Let's write a model with any of the following frameworks: ````{tab} PyTorch ```python @@ -77,7 +77,7 @@ ``` ```` -Now that you’re set up, let’s dive into more of how Finetuner works and can improve the performance of your neural search apps. +Now that you’re set up, let’s dive into more of how Finetuner works and improves the performance of your neural search apps. ## Next steps @@ -131,7 +131,7 @@ Learn more about {term}`labeled data`. :::{card} Finetuner usage 1 -Perfect! Now `embed_model` and `train_data` are given by you already, simply do: +Perfect! Now `embed_model` and `train_data` are already provided by you, simply do: ```python import finetuner @@ -151,7 +151,7 @@ Learn more about {term}`Tuner`. :::{card} Finetuner usage 2 -You have an `embed_model` to use, but no labeled data for finetuning this model. No worry, you can use Finetuner to interactive label data and train `embed_model` as below: +You have an `embed_model` to use, but no labeled data for fine-tuning this model. No worries, you can use Finetuner to interactively label data and train `embed_model` as follows: ```{code-block} python --- @@ -175,7 +175,7 @@ Learn more about {term}`Tuner` and {term}`Labeler`. :::{card} Finetuner usage 3 -You have a `general_model` but it does not output embeddings. Luckily you provide some `labeled_data` for training. No worry, Finetuner can convert your model into an embedding model and train it via: +You have a `general_model` but it does not output embeddings. Luckily, you've got some `labeled_data` for training. No worries, Finetuner can convert your model into an embedding model and train it via: ```{code-block} python --- @@ -200,7 +200,7 @@ Learn more about {term}`Tailor` and {term}`Tuner`. :::{card} Finetuner usage 4 -You have a `general_model` which is not for embeddings. Meanwhile, you don't have labeled data for training. But no worries, Finetuner can help you train an embedding model with interactive labeling on-the-fly: +You have a `general_model` which is not for embeddings. Meanwhile, you don't have any labeled data for training. But no worries, Finetuner can help you train an embedding model with interactive labeling on-the-fly: ```{code-block} python ---