docs: add finetuner docs (#771)
* docs: add finetuner docs

* docs: add finetuner instruction

* docs: add finetuner instruction

* docs: add finetuner instruction

* docs: add finetuner instruction

* docs: add finetuner instruction

* docs: add finetuner instruction

* docs: improve narratives

* docs: minor revision

* docs: minor revision

* docs: address comment

* docs: table width

* docs: fix table

* docs: minor revision

Co-authored-by: Isabelle Mohr <retrospect@protonmail.com>

* docs: minor revision

Co-authored-by: Isabelle Mohr <retrospect@protonmail.com>

* docs: minor revision

Co-authored-by: Isabelle Mohr <retrospect@protonmail.com>

* docs: address comment

* docs: restructure

* docs: intersphinx

* docs: typo

* docs: add image preview

* docs: fix image link

* docs: fix typo

* docs: add model path example

* docs: add model path example

* docs: improve narratives

Co-authored-by: Isabelle Mohr <retrospect@protonmail.com>
ZiniuYu and violenil committed Jul 20, 2022
1 parent 0ff4e25 commit bc6b72e
Showing 4 changed files with 238 additions and 5 deletions.
2 changes: 2 additions & 0 deletions docs/conf.py
@@ -80,6 +80,8 @@
html_show_sourcelink = False
html_favicon = '_static/favicon.png'

intersphinx_mapping = {'docarray': ('https://docarray.jina.ai/', None), 'finetuner': ('https://finetuner.jina.ai/', None)}

latex_documents = [(master_doc, f'{slug}.tex', project, author, 'manual')]
man_pages = [(master_doc, slug, project, [author], 1)]
texinfo_documents = [
1 change: 0 additions & 1 deletion docs/index.md
@@ -178,7 +178,6 @@ It means the client and the server are now connected. Well done!
user-guides/client
user-guides/server
user-guides/faq
```

```{toctree}
187 changes: 187 additions & 0 deletions docs/user-guides/finetuner.md
@@ -0,0 +1,187 @@
(Finetuner)=
# Fine-tune Models

Although CLIP-as-service provides a list of pre-trained models, you can also fine-tune your own models.
This guide will show you how to use [Finetuner](https://finetuner.jina.ai) to fine-tune models and use them in CLIP-as-service.

For installation and basic usage of Finetuner, please refer to the [Finetuner documentation](https://finetuner.jina.ai).
You can also [learn more details about fine-tuning CLIP](https://finetuner.jina.ai/tasks/text-to-image/).

## Prepare Training Data

Finetuner accepts training data and evaluation data in the form of {class}`~docarray.array.document.DocumentArray`.
The training data for CLIP is a list of (text, image) pairs.
Each pair is stored in a {class}`~docarray.document.Document` which wraps two [`chunks`](https://docarray.jina.ai/fundamentals/document/nested/) with `image` and `text` modalities, respectively.
You can push the resulting {class}`~docarray.array.document.DocumentArray` to the cloud using the {meth}`~docarray.array.document.DocumentArray.push` method.
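
For instance, a single (text, image) pair can be wrapped as follows (a minimal sketch with a placeholder description and image URL; the complete script for this tutorial's dataset appears further below):

```python
from docarray import Document

# One training pair: a text chunk and an image chunk inside a parent Document
pair = Document(
    chunks=[
        Document(content='a description of the product', modality='text'),
        Document(uri='https://example.com/product.jpeg', modality='image'),
    ]
)
```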

We use [fashion captioning dataset](https://github.com/xuewyang/Fashion_Captioning) as a sample dataset in this tutorial.
The following are examples of descriptions and image URLs from the dataset, along with a preview of each image.

| Description | Image URL | Preview |
|---------------------------------------------------------------------------------------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------------|
| subtly futuristic and edgy this liquid metal cuff bracelet is shaped from sculptural rectangular link | [https://n.nordstrommedia.com/id/sr3/<br/>58d1a13f-b6b6-4e68-b2ff-3a3af47c422e.jpeg](https://n.nordstrommedia.com/id/sr3/58d1a13f-b6b6-4e68-b2ff-3a3af47c422e.jpeg) | <img src="https://n.nordstrommedia.com/id/sr3/58d1a13f-b6b6-4e68-b2ff-3a3af47c422e.jpeg?raw=true" width=100px> |
| high quality leather construction defines a hearty boot one-piece on a tough lug sole | [https://n.nordstrommedia.com/id/sr3/<br/>21e7a67c-0a54-4d09-a4a4-6a0e0840540b.jpeg](https://n.nordstrommedia.com/id/sr3/21e7a67c-0a54-4d09-a4a4-6a0e0840540b.jpeg) | <img src="https://n.nordstrommedia.com/id/sr3/21e7a67c-0a54-4d09-a4a4-6a0e0840540b.jpeg?raw=true" width=100px> |
| this shimmering tricot knit tote is traced with decorative whipstitching and diamond cut chain the two hallmark of the falabella line | [https://n.nordstrommedia.com/id/sr3/<br/>1d8dd635-6342-444d-a1d3-4f91a9cf222b.jpeg](https://n.nordstrommedia.com/id/sr3/1d8dd635-6342-444d-a1d3-4f91a9cf222b.jpeg) | <img src="https://n.nordstrommedia.com/id/sr3/1d8dd635-6342-444d-a1d3-4f91a9cf222b.jpeg?raw=true" width=100px> |
| ... | ... | ... |

You can use the following script to transform the first three entries of the dataset to a {class}`~docarray.array.document.DocumentArray` and push it to the cloud using the name `fashion-sample`.

```python
from docarray import Document, DocumentArray

train_da = DocumentArray(
[
Document(
chunks=[
Document(
content='subtly futuristic and edgy this liquid metal cuff bracelet is shaped from sculptural rectangular link',
modality='text',
),
Document(
uri='https://n.nordstrommedia.com/id/sr3/58d1a13f-b6b6-4e68-b2ff-3a3af47c422e.jpeg',
modality='image',
),
],
),
Document(
chunks=[
Document(
content='high quality leather construction defines a hearty boot one-piece on a tough lug sole',
modality='text',
),
Document(
uri='https://n.nordstrommedia.com/id/sr3/21e7a67c-0a54-4d09-a4a4-6a0e0840540b.jpeg',
modality='image',
),
],
),
Document(
chunks=[
Document(
content='this shimmering tricot knit tote is traced with decorative whipstitching and diamond cut chain the two hallmark of the falabella line',
modality='text',
),
Document(
uri='https://n.nordstrommedia.com/id/sr3/1d8dd635-6342-444d-a1d3-4f91a9cf222b.jpeg',
modality='image',
),
],
),
]
)
train_da.push('fashion-sample')
```
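
To double-check that the upload succeeded, you can pull the {class}`~docarray.array.document.DocumentArray` back by name (a quick sanity check, assuming you are logged in to the same account you pushed from):

```python
from docarray import DocumentArray

# Pull the DocumentArray we just pushed and inspect its contents
da = DocumentArray.pull('fashion-sample')
da.summary()
```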

The full dataset has been converted to `clip-fashion-train-data` and `clip-fashion-eval-data` and pushed to the cloud, so it can be used directly in Finetuner.

## Start Finetuner

You may now create and run a fine-tuning job after logging in to the Jina ecosystem.

```python
import finetuner

finetuner.login()
run = finetuner.fit(
model='openai/clip-vit-base-patch32',
run_name='clip-fashion',
train_data='clip-fashion-train-data',
eval_data='clip-fashion-eval-data', # optional
epochs=5,
learning_rate=1e-5,
loss='CLIPLoss',
cpu=False,
)
```

After the job has started, you may use {meth}`~finetuner.run.Run.status` to check its status.

```python
import finetuner

finetuner.login()
run = finetuner.get_run('clip-fashion')
print(run.status())
```

When the status is `FINISHED`, you can download the tuned model to your local machine.

```python
import finetuner

finetuner.login()
run = finetuner.get_run('clip-fashion')
run.save_artifact('clip-model')
```

You should now have a zip file named `clip-fashion.zip`, containing the tuned model, under the folder `clip-model`.
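
You can also unzip the artifact from Python (a small sketch, assuming the zip file sits under `clip-model` as described above):

```python
import zipfile

# Extract the downloaded artifact next to the zip file
with zipfile.ZipFile('clip-model/clip-fashion.zip', 'r') as zf:
    zf.extractall('clip-model')
```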

## Use the Model

After unzipping the model from the previous step, you get a folder with the following structure:

```text
.
└── clip-fashion/
├── config.yml
├── metadata.yml
├── metrics.yml
└── models/
├── clip-text/
│ ├── metadata.yml
│ └── model.onnx
├── clip-vision/
│ ├── metadata.yml
│ └── model.onnx
└── input-map.yml
```

Since the tuned model generated by Finetuner contains richer information such as metadata and config, we now transform it into the simpler structure used by CLIP-as-service.

* Firstly, create a new folder named `clip-fashion-cas` (or a name of your choice). This folder will store the models used by CLIP-as-service.

* Secondly, copy the textual model `clip-fashion/models/clip-text/model.onnx` into the folder `clip-fashion-cas` and rename it to `textual.onnx`.

* Similarly, copy the visual model `clip-fashion/models/clip-vision/model.onnx` into the folder `clip-fashion-cas` and rename it to `visual.onnx`.
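
The copy-and-rename steps above can also be scripted (a small sketch using the paths from this tutorial):

```python
import os
import shutil

# Create the target folder and copy the two ONNX models under their new names
os.makedirs('clip-fashion-cas', exist_ok=True)
shutil.copy('clip-fashion/models/clip-text/model.onnx', 'clip-fashion-cas/textual.onnx')
shutil.copy('clip-fashion/models/clip-vision/model.onnx', 'clip-fashion-cas/visual.onnx')
```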

This is the expected structure of `clip-fashion-cas`:

```text
.
└── clip-fashion-cas/
├── textual.onnx
└── visual.onnx
```

In order to use the fine-tuned model, create a custom YAML file `finetuned_clip.yml` like the one below. Learn more about [Flow YAML configuration](https://docs.jina.ai/fundamentals/flow/yaml-spec/) and [`clip_server` YAML configuration](https://clip-as-service.jina.ai/user-guides/server/#yaml-config).

```yaml
jtype: Flow
version: '1'
with:
port: 51000
executors:
- name: clip_o
uses:
jtype: CLIPEncoder
metas:
py_modules:
- clip_server.executors.clip_onnx
with:
name: ViT-B/32
model_path: 'clip-fashion-cas' # path to clip-fashion-cas
replicas: 1
```

```{warning}
Note that Finetuner currently only supports the ViT-B/32 CLIP model. The model name should match the fine-tuned model, or you will get incorrect output.
```

You can now start the `clip_server` with the fine-tuned model to get a performance boost:

```bash
python -m clip_server finetuned_clip.yml
```
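
Once the server is up, you can query it from the client side to confirm the fine-tuned model is serving (a minimal sketch; assumes the `clip-client` package is installed and the server listens on port 51000 as configured above):

```python
from clip_client import Client

# Connect to the locally running clip_server and encode a sample query
c = Client('grpc://0.0.0.0:51000')
vec = c.encode(['a sleek leather boot on a lug sole'])
print(vec.shape)  # (1, 512) for ViT-B/32
```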

That's it, enjoy 🚀
53 changes: 49 additions & 4 deletions docs/user-guides/server.md
@@ -75,6 +75,23 @@ OpenAI has released 9 models so far. `ViT-B/32` is used as the default model in all
| ViT-L/14 |||| 768 | 933 | 3.66 | 2.04 |
| ViT-L/14@336px |||| 768 | 934 | 3.74 | 2.23 |

### Use custom model

You can also use your own model in the ONNX runtime by specifying the model name and the path to the model directory in the YAML file.
The model directory should have the following structure:

```text
.
└── custom-model/
├── textual.onnx
└── visual.onnx
```
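
Before wiring the directory into the server, you can quickly check that both ONNX files load (an optional sanity check, assuming `onnxruntime` is installed; the paths match the layout above):

```python
import onnxruntime as ort

# Both models should create an inference session without errors
for path in ('custom-model/textual.onnx', 'custom-model/visual.onnx'):
    sess = ort.InferenceSession(path, providers=['CPUExecutionProvider'])
    print(path, '->', [i.name for i in sess.get_inputs()])
```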

You may wonder how to produce such a model.
Fortunately, you can simply use [Finetuner](https://finetuner.jina.ai) to fine-tune your model on a custom dataset.
Finetuner is a cloud service that makes fine-tuning simple and fast.
By moving the process into the cloud, it handles all related complexity and infrastructure, making models performant and production-ready.
{ref}`Click here for detailed instructions<Finetuner>`.

## YAML config

@@ -230,11 +247,11 @@ executors:

For all backends, you can set the following parameters via `with`:

| Parameter               | Description                                                                                                                    |
|-------------------------|--------------------------------------------------------------------------------------------------------------------------------|
| `name`                  | Model weights, default is `ViT-B/32`. Supports all OpenAI released pretrained models.                                           |
| `num_worker_preprocess` | The number of CPU workers for image & text preprocessing, default 4.                                                            |
| `minibatch_size`        | The size of a minibatch for CPU preprocessing and GPU encoding, default 64. Reduce it if you encounter OOM on the GPU.          |

There are also runtime-specific parameters listed below:

@@ -252,6 +269,7 @@
| Parameter    | Description                                                  |
|--------------|--------------------------------------------------------------|
| `device`     | `cuda` or `cpu`. Default is `None`, which means auto-detect. |
| `model_path` | The path to the custom CLIP model, default `None`.           |
````

@@ -278,6 +296,33 @@ executors:
- executors/clip_torch.py
```

To use a custom model in the ONNX runtime, one can do:

```{code-block} yaml
---
emphasize-lines: 9-11
---
jtype: Flow
version: '1'
with:
port: 51000
executors:
- name: clip_o
uses:
jtype: CLIPEncoder
with:
name: ViT-B/32
model_path: 'custom-model'
metas:
py_modules:
- executors/clip_onnx.py
```

```{warning}
The model name should match the fine-tuned model, or you will get incorrect output.
```

### Executor config

The full list of configs for Executor can be found via `jina executor --help`. The most important one is probably `replicas`, which **allows you to run multiple CLIP models in parallel** to achieve horizontal scaling.
