diff --git a/docs/conf.py b/docs/conf.py
index b183f8ac7..6980387e2 100644
--- a/docs/conf.py
+++ b/docs/conf.py
@@ -80,6 +80,8 @@
 html_show_sourcelink = False
 html_favicon = '_static/favicon.png'
 
+intersphinx_mapping = {'docarray': ('https://docarray.jina.ai/', None), 'finetuner': ('https://finetuner.jina.ai/', None)}
+
 latex_documents = [(master_doc, f'{slug}.tex', project, author, 'manual')]
 man_pages = [(master_doc, slug, project, [author], 1)]
 texinfo_documents = [
diff --git a/docs/index.md b/docs/index.md
index 6887fb6f2..7d8183d53 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -178,7 +178,6 @@ It means the client and the server are now connected. Well done!
 user-guides/client
 user-guides/server
 user-guides/faq
-
 ```
 
 ```{toctree}
diff --git a/docs/user-guides/finetuner.md b/docs/user-guides/finetuner.md
new file mode 100644
index 000000000..2962c0730
--- /dev/null
+++ b/docs/user-guides/finetuner.md
@@ -0,0 +1,187 @@
+(Finetuner)=
+# Fine-tune Models
+
+Although CLIP-as-service provides a list of pre-trained models, you can also fine-tune your own.
+This guide shows you how to use [Finetuner](https://finetuner.jina.ai) to fine-tune models and use them in CLIP-as-service.
+
+For installation and basic usage of Finetuner, please refer to the [Finetuner documentation](https://finetuner.jina.ai).
+You can also [learn more about fine-tuning CLIP](https://finetuner.jina.ai/tasks/text-to-image/).
+
+## Prepare Training Data
+
+Finetuner accepts training data and evaluation data in the form of a {class}`~docarray.array.document.DocumentArray`.
+The training data for CLIP is a list of (text, image) pairs.
+Each pair is stored in a {class}`~docarray.document.Document` that wraps two [`chunks`](https://docarray.jina.ai/fundamentals/document/nested/), one with the `text` modality and one with the `image` modality.
+You can push the resulting {class}`~docarray.array.document.DocumentArray` to the cloud using the {meth}`~docarray.array.document.DocumentArray.push` method.
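+
+To make the structure concrete, below is a minimal sketch of a single (text, image) pair; the caption and image URI are placeholders, not entries from the dataset used later in this guide.
+
+```python
+from docarray import Document, DocumentArray
+
+# one training pair: a parent Document wrapping a text chunk and an image chunk
+pair = Document(
+    chunks=[
+        Document(content='a lightweight cotton summer dress', modality='text'),  # placeholder caption
+        Document(uri='https://example.com/dress.jpeg', modality='image'),  # placeholder image URI
+    ]
+)
+
+train_da = DocumentArray([pair])
+train_da.summary()  # inspect the nested structure before pushing
+```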
+
+We use the [fashion captioning dataset](https://github.com/xuewyang/Fashion_Captioning) as the sample dataset in this tutorial.
+The following are examples of descriptions and image URLs from the dataset.
+
+| Description | Image URL |
+|-------------|-----------|
+| subtly futuristic and edgy this liquid metal cuff bracelet is shaped from sculptural rectangular link | https://n.nordstrommedia.com/id/sr3/58d1a13f-b6b6-4e68-b2ff-3a3af47c422e.jpeg |
+| high quality leather construction defines a hearty boot one-piece on a tough lug sole | https://n.nordstrommedia.com/id/sr3/21e7a67c-0a54-4d09-a4a4-6a0e0840540b.jpeg |
+| this shimmering tricot knit tote is traced with decorative whipstitching and diamond cut chain the two hallmark of the falabella line | https://n.nordstrommedia.com/id/sr3/1d8dd635-6342-444d-a1d3-4f91a9cf222b.jpeg |
+| ... | ... |
+
+You can use the following script to transform the first three entries of the dataset into a {class}`~docarray.array.document.DocumentArray` and push it to the cloud under the name `fashion-sample`.
+
+```python
+from docarray import Document, DocumentArray
+
+train_da = DocumentArray(
+    [
+        Document(
+            chunks=[
+                Document(
+                    content='subtly futuristic and edgy this liquid metal cuff bracelet is shaped from sculptural rectangular link',
+                    modality='text',
+                ),
+                Document(
+                    uri='https://n.nordstrommedia.com/id/sr3/58d1a13f-b6b6-4e68-b2ff-3a3af47c422e.jpeg',
+                    modality='image',
+                ),
+            ],
+        ),
+        Document(
+            chunks=[
+                Document(
+                    content='high quality leather construction defines a hearty boot one-piece on a tough lug sole',
+                    modality='text',
+                ),
+                Document(
+                    uri='https://n.nordstrommedia.com/id/sr3/21e7a67c-0a54-4d09-a4a4-6a0e0840540b.jpeg',
+                    modality='image',
+                ),
+            ],
+        ),
+        Document(
+            chunks=[
+                Document(
+                    content='this shimmering tricot knit tote is traced with decorative whipstitching and diamond cut chain the two hallmark of the falabella line',
+                    modality='text',
+                ),
+                Document(
+                    uri='https://n.nordstrommedia.com/id/sr3/1d8dd635-6342-444d-a1d3-4f91a9cf222b.jpeg',
+                    modality='image',
+                ),
+            ],
+        ),
+    ]
+)
+train_da.push('fashion-sample')
+```
+
+The full dataset has been converted into `clip-fashion-train-data` and `clip-fashion-eval-data` and pushed to the cloud, so it can be used directly in Finetuner.
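+
+If you want to inspect the prepared data before training, you can pull it back by name.
+This is a quick sketch, assuming the dataset is accessible from your logged-in account:
+
+```python
+from docarray import DocumentArray
+
+# pull the prepared training data from the cloud and peek at the first pair
+train_da = DocumentArray.pull('clip-fashion-train-data')
+for chunk in train_da[0].chunks:
+    print(chunk.modality, chunk.content if chunk.modality == 'text' else chunk.uri)
+```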
+
+## Start Finetuner
+
+You may now create and run a fine-tuning job after logging in to the Jina ecosystem.
+
+```python
+import finetuner
+
+finetuner.login()
+run = finetuner.fit(
+    model='openai/clip-vit-base-patch32',
+    run_name='clip-fashion',
+    train_data='clip-fashion-train-data',
+    eval_data='clip-fashion-eval-data',  # optional
+    epochs=5,
+    learning_rate=1e-5,
+    loss='CLIPLoss',
+    cpu=False,
+)
+```
+
+After the job starts, you can use {meth}`~finetuner.run.Run.status` to check its status.
+
+```python
+import finetuner
+
+finetuner.login()
+run = finetuner.get_run('clip-fashion')
+print(run.status())
+```
+
+When the status is `FINISHED`, you can download the tuned model to your local machine.
+
+```python
+import finetuner
+
+finetuner.login()
+run = finetuner.get_run('clip-fashion')
+run.save_artifact('clip-model')
+```
+
+You should now have a zip file named `clip-fashion.zip` containing the tuned model under the folder `clip-model`.
+
+## Use the Model
+
+After unzipping the model from the previous step, you will get a folder with the following structure:
+
+```text
+.
+└── clip-fashion/
+    ├── config.yml
+    ├── metadata.yml
+    ├── metrics.yml
+    └── models/
+        ├── clip-text/
+        │   ├── metadata.yml
+        │   └── model.onnx
+        ├── clip-vision/
+        │   ├── metadata.yml
+        │   └── model.onnx
+        └── input-map.yml
+```
+
+Since the tuned model generated by Finetuner contains richer information, such as metadata and config, we now transform it into the simpler structure used by CLIP-as-service.
+You can follow the steps below by hand, or script them as sketched after the expected layout.
+
+* Firstly, create a new folder named `clip-fashion-cas` (or a name of your choice). It will store the models to be used in CLIP-as-service.
+
+* Secondly, copy the textual model `clip-fashion/models/clip-text/model.onnx` into the folder `clip-fashion-cas` and rename it to `textual.onnx`.
+
+* Similarly, copy the visual model `clip-fashion/models/clip-vision/model.onnx` into the folder `clip-fashion-cas` and rename it to `visual.onnx`.
+
+This is the expected structure of `clip-fashion-cas`:
+
+```text
+.
+└── clip-fashion-cas/
+    ├── textual.onnx
+    └── visual.onnx
+```
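+
+The unpacking and copy-and-rename steps above can also be scripted.
+Below is a small sketch using only the Python standard library; it assumes the archive `clip-fashion.zip` still sits under `clip-model` as described earlier and extracts it in place:
+
+```python
+import shutil
+import zipfile
+from pathlib import Path
+
+# unzip the Finetuner artifact downloaded via run.save_artifact('clip-model')
+with zipfile.ZipFile('clip-model/clip-fashion.zip') as zf:
+    zf.extractall('clip-model')
+
+src = Path('clip-model/clip-fashion/models')
+dst = Path('clip-fashion-cas')
+dst.mkdir(exist_ok=True)
+
+# copy and rename the ONNX models into the layout expected by CLIP-as-service
+shutil.copy(src / 'clip-text' / 'model.onnx', dst / 'textual.onnx')
+shutil.copy(src / 'clip-vision' / 'model.onnx', dst / 'visual.onnx')
+```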
+
+In order to use the fine-tuned model, create a custom YAML file `finetuned_clip.yml` like below.
+Learn more about [Flow YAML configuration](https://docs.jina.ai/fundamentals/flow/yaml-spec/) and [`clip_server` YAML configuration](https://clip-as-service.jina.ai/user-guides/server/#yaml-config).
+
+```yaml
+jtype: Flow
+version: '1'
+with:
+  port: 51000
+executors:
+  - name: clip_o
+    uses:
+      jtype: CLIPEncoder
+      metas:
+        py_modules:
+          - clip_server.executors.clip_onnx
+      with:
+        name: ViT-B/32
+        model_path: 'clip-fashion-cas' # path to clip-fashion-cas
+    replicas: 1
+```
+
+```{warning}
+Note that Finetuner currently only supports the ViT-B/32 CLIP model.
+The model `name` should match the fine-tuned model, or you will get incorrect output.
+```
+
+You can now start the `clip_server` with the fine-tuned model to get a performance boost:
+
+```bash
+python -m clip_server finetuned_clip.yml
+```
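+
+Once the server is up, you can query it with the regular `clip-client` API.
+For example, assuming the server runs locally on the port 51000 configured above:
+
+```python
+from clip_client import Client
+
+# connect to the locally running clip_server that loads the fine-tuned model
+c = Client('grpc://0.0.0.0:51000')
+
+# embeddings are now produced by the fine-tuned weights
+r = c.encode(
+    [
+        'subtly futuristic and edgy this liquid metal cuff bracelet is shaped from sculptural rectangular link',
+        'https://n.nordstrommedia.com/id/sr3/58d1a13f-b6b6-4e68-b2ff-3a3af47c422e.jpeg',
+    ]
+)
+print(r.shape)  # (2, 512) for ViT-B/32
+```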
+
+That's it, enjoy 🚀
diff --git a/docs/user-guides/server.md b/docs/user-guides/server.md
index 740ac43b3..b1069e3b1 100644
--- a/docs/user-guides/server.md
+++ b/docs/user-guides/server.md
@@ -75,6 +75,23 @@ Open AI has released 9 models so far. `ViT-B/32` is used as default model in all
 | ViT-L/14 | ✅ | ✅ | ❌ | 768 | 933 | 3.66 | 2.04 |
 | ViT-L/14@336px | ✅ | ✅ | ❌ | 768 | 934 | 3.74 | 2.23 |
 
+### Use custom model
+
+You can also use your own model in the ONNX runtime by specifying the model name and the path to the model directory in the YAML file.
+The model directory should have the following structure:
+
+```text
+.
+└── custom-model/
+    ├── textual.onnx
+    └── visual.onnx
+```
+
+One may wonder how to produce a model in this format.
+Fortunately, you can simply use [Finetuner](https://finetuner.jina.ai) to fine-tune a model on your custom dataset.
+[Finetuner](https://finetuner.jina.ai) is a cloud service that makes fine-tuning simple and fast.
+By moving the process into the cloud, Finetuner handles all related complexity and infrastructure, making models performant and production-ready.
+{ref}`Click here for detailed instructions <Finetuner>`.
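+
+Before pointing the server at this directory, you can sanity-check that both ONNX files load.
+A short sketch, assuming `onnxruntime` is installed:
+
+```python
+import onnxruntime as ort
+
+# load both models on CPU and print the input names they expect
+for part in ('textual', 'visual'):
+    sess = ort.InferenceSession(f'custom-model/{part}.onnx', providers=['CPUExecutionProvider'])
+    print(part, [i.name for i in sess.get_inputs()])
+```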
 
 ## YAML config
 
@@ -230,11 +247,11 @@ executors:
 
 For all backends, you can set the following parameters via `with`:
 
-| Parameter | Description |
-|-----------|--------------------------------------------------------------------------------------------------------------------------------|
-| `name` | Model weights, default is `ViT-B/32`. Support all OpenAI released pretrained models. |
+| Parameter               | Description                                                                                                                    |
+|-------------------------|--------------------------------------------------------------------------------------------------------------------------------|
+| `name`                  | Model weights, default is `ViT-B/32`. Supports all OpenAI released pretrained models.                                          |
 | `num_worker_preprocess` | The number of CPU workers for image & text prerpocessing, default 4. |
-| `minibatch_size` | The size of a minibatch for CPU preprocessing and GPU encoding, default 64. Reduce the size of it if you encounter OOM on GPU. |
+| `minibatch_size`        | The size of a minibatch for CPU preprocessing and GPU encoding, default 64. Reduce the size if you encounter OOM on the GPU.   |
 
 There are also runtime-specific parameters listed below:
 
@@ -252,6 +269,7 @@
 | Parameter | Description |
 |-----------|--------------------------------------------------------------------------------------------------------------------------------|
 | `device` | `cuda` or `cpu`. Default is `None` means auto-detect.
+| `model_path` | The path to a custom CLIP model directory, default `None`. |
 
 ````
 
@@ -278,6 +296,33 @@ executors:
         - executors/clip_torch.py
 ```
 
+To use a custom model in the ONNX runtime, one can do:
+
+```{code-block} yaml
+---
+emphasize-lines: 9-11
+---
+
+jtype: Flow
+version: '1'
+with:
+  port: 51000
+executors:
+  - name: clip_o
+    uses:
+      jtype: CLIPEncoder
+      with:
+        name: ViT-B/32
+        model_path: 'custom-model'
+      metas:
+        py_modules:
+          - executors/clip_onnx.py
+```
+
+```{warning}
+The model `name` should match the fine-tuned model, or you will get incorrect output.
+```
+
 ### Executor config
 
 The full list of configs for Executor can be found via `jina executor --help`. The most important one is probably `replicas`, which **allows you to run multiple CLIP models in parallel** to achieve horizontal scaling.