diff --git a/.gitattributes b/.gitattributes new file mode 100644 index 000000000..0fa2b8fc5 --- /dev/null +++ b/.gitattributes @@ -0,0 +1,2 @@ +# ignore ipynb line counts +*.ipynb linguist-documentation \ No newline at end of file diff --git a/CHANGELOG.md b/CHANGELOG.md index 9ef361063..759011cb4 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -50,6 +50,14 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 - Add `finetuner` namespace to artifact names in the documentation. ([#649](https://github.com/jina-ai/finetuner/pull/649)) +- Rewrite M-CLIP notebook to use German fashion dataset. ([#643](https://github.com/jina-ai/finetuner/pull/643)) + +- New advanced topics section. ([#643](https://github.com/jina-ai/finetuner/pull/643)) + +- Improve developer reference. ([#643](https://github.com/jina-ai/finetuner/pull/643)) + +- Improve walkthrough sections. ([#643](https://github.com/jina-ai/finetuner/pull/643)) + ## [0.6.7] - 2022-11-25 diff --git a/README.md b/README.md index d2fc06ffe..b7dfc089e 100644 --- a/README.md +++ b/README.md @@ -18,20 +18,25 @@ -Fine-tuning is an effective way to improve performance on neural search tasks. However, setting up and performing -fine-tuning can be very time-consuming and resource-intensive. +Fine-tuning is an effective way to improve performance on [neural search](https://jina.ai/news/what-is-neural-search-and-learn-to-build-a-neural-search-engine/) tasks. +However, setting up and performing fine-tuning can be very time-consuming and resource-intensive. -Jina AI's Finetuner makes fine-tuning easier and faster by streamlining the workflow and handling all complexity and -infrastructure in the cloud. With Finetuner, one can easily enhance the performance of pre-trained models, making them -production-ready without buying expensive hardware. +Jina AI's Finetuner makes fine-tuning easier and faster by streamlining the workflow and handling all the complexity and infrastructure in the cloud. +With Finetuner, you can easily enhance the performance of pre-trained models, +making them production-ready [without extensive labeling](https://jina.ai/news/fine-tuning-with-low-budget-and-high-expectations/) or expensive hardware. -📈 **Performance promise**: enhance the performance of pre-trained models and deliver state-of-the-art performance on -domain-specific neural search applications. +🎏 **Better embeddings**: Create high-quality embeddings for semantic search, visual similarity search, cross-modal text<->image search, recommendation systems, +clustering, duplication detection, anomaly detection, or other uses. -🔱 **Simple yet powerful**: easy access to 40+ mainstream loss functions, 10+ optimisers, layer pruning, weight +⏰ **Low budget, high expectations**: Bring considerable improvements to model performance, making the most out of as little as a few hundred training samples, and finish fine-tuning in as little as an hour. + +📈 **Performance promise**: Enhance the performance of pre-trained models so that they deliver state-of-the-art performance on +domain-specific applications. + +🔱 **Simple yet powerful**: Easy access to 40+ mainstream loss functions, 10+ optimisers, layer pruning, weight freezing, dimensionality reduction, hard-negative mining, cross-modal models, and distributed training. 
-☁ **All-in-cloud**: train using our free GPU infrastructure, manage runs, experiments and artifacts on Jina AI Cloud +☁ **All-in-cloud**: Train using our free GPU infrastructure, manage runs, experiments and artifacts on Jina AI Cloud without worrying about resource availability, complex integration, or infrastructure costs. @@ -105,7 +110,7 @@ without worrying about resource availability, complex integration, or infrastruc 0.430 0.648 50.7% -

Open In Colab

+

Open In Colab

Recall @@ -113,11 +118,26 @@ without worrying about resource availability, complex integration, or infrastruc 0.340 37.7% + + PointNet++ + ModelNet40 3D Mesh Search + mRR + 0.791 + 0.891 + 12.7% +

Open In Colab

+ + + Recall + 0.154 + 0.242 + 57.1% + -All metrics were evaluated for k@20 after training for 5 epochs using the Adam optimizer with learning rates of 1e-4 for ResNet, 1e-7 for CLIP and 1e-5 for the BERT models. +All metrics were evaluated for k@20 after training for 5 epochs using the Adam optimizer with learning rates of 1e-4 for ResNet, 1e-7 for CLIP and 1e-5 for the BERT models, 5e-4 for PointNet++ @@ -142,127 +162,6 @@ pip install "finetuner[full]" > ⚠️ Starting with version 0.5.0, Finetuner computing is performed on Jina AI Cloud. The last local version is `0.4.1`. > This version is still available for installation via `pip`. See [Finetuner git tags and releases](https://github.com/jina-ai/finetuner/releases). - - - - -## Get Started - -The following code snippet describes how to fine-tune ResNet50 on the [_Totally Looks Like_ dataset](https://sites.google.com/view/totally-looks-like-dataset). -You can run it as-is. The model and training data are already hosted in Jina AI Cloud and Finetuner will -download them automatically. -(NB: If there is already a run called `resnet50-tll-run`, choose a different run-name in the code below.) - -```python -import finetuner -from finetuner.callback import EvaluationCallback - -finetuner.login() - -run = finetuner.fit( - model='resnet50', - run_name='resnet50-tll-run', - train_data='finetuner/tll-train-data', - callbacks=[ - EvaluationCallback( - query_data='finetuner/tll-test-query-data', - index_data='finetuner/tll-test-index-data', - ) - ], -) -``` -This code snippet describes the following steps: - -1. Log in to Jina AI Cloud. -2. Select backbone model, training and evaluation data for your evaluation callback. -3. Start the cloud run. - -You can also pass data to Finetuner as a CSV file or a `DocumentArray` object, as described [in the Finetuner documentation](https://finetuner.jina.ai/walkthrough/create-training-data/). - -Depending on the data, task, model, hyperparameters, fine-tuning might take some time to finish. You can leave your jobs -to run on the Jina AI Cloud, and later reconnect to them, using code like this below: - -```python -import finetuner - -finetuner.login() - -run = finetuner.get_run('resnet50-tll-run') - -for log_entry in run.stream_logs(): - print(log_entry) - -run.save_artifact('resnet-tll') -``` - -This code logs into Jina AI Cloud, then connects to your run by name. After that, it does the following: - * Monitors the status of the run and prints out the logs. - * Saves the model once fine-tuning is done. - -## Using Finetuner to encode - -Finetuner has interfaces for using models to do encoding: - -```python -import finetuner -from docarray import Document, DocumentArray - -da = DocumentArray([Document(uri='~/Pictures/your_img.png')]) - -model = finetuner.get_model('resnet-tll') -finetuner.encode(model=model, data=da) - -da.summary() -``` - -When encoding, you can provide data either as a DocumentArray or a list. Since the modality of your input data can be inferred from the model being used, there is no need to provide any additional information besides the content you want to encode. 
When providing data as a list, the `finetuner.encode` method will return a `np.ndarray` of embeddings, instead of a `docarray.DocumentArray`: - -```python -import finetuner -from docarray import Document, DocumentArray - -images = ['~/Pictures/your_img.png'] - -model = finetuner.get_model('resnet-tll') -embeddings = finetuner.encode(model=model, data=images) -``` - -## Training on your own data - -If you want to train a model using your own dataset instead of one on the Jina AI Cloud, you can provide labeled data in a CSV file. - -A CSV file is a tab or comma-delimited plain text file. For example: - -```plaintext -This is an apple apple_label -This is a pear pear_label -... -``` -The file should have two columns: The first for the data and the second for the category label. - -You can then provide a path to a CSV file as training data for Finetuner: - -```python -run = finetuner.fit( - model='bert-base-cased', - run_name='bert-my-own-run', - train_data='path/to/some/data.csv', -) -``` -More information on providing your own training data is found in the [Prepare Training Data](https://finetuner.jina.ai/walkthrough/create-training-data/) section of the [Finetuner documentation](https://finetuner.jina.ai/). - - - -### Next steps - -- Take the [walkthrough](https://finetuner.jina.ai/walkthrough/) and submit your first fine-tuning job. -- Try out different search tasks: - - [Text-to-Text Search via BERT](https://finetuner.jina.ai/notebooks/text_to_text/) - - [Image-to-Image Search via ResNet50](https://finetuner.jina.ai/notebooks/image_to_image/) - - [Text-to-Image Search via CLIP](https://finetuner.jina.ai/notebooks/text_to_image/) - -[Read our documentation](https://finetuner.jina.ai/) to learn more about what Finetuner can do. - ## Support diff --git a/docs/advanced-topics/budget.md b/docs/advanced-topics/budget.md new file mode 100644 index 000000000..56472d4bf --- /dev/null +++ b/docs/advanced-topics/budget.md @@ -0,0 +1,92 @@ +(budget)= +# {octicon}`database` How much data? + +```{admonition} Read full blog +:class: hint +Please checkout [Fine-tuning with Low Budget and High Expectations](https://jina.ai/news/fine-tuning-with-low-budget-and-high-expectations/) +to read the full tech blog. +``` + +Fine-tuning takes a pre-trained model, +trained on a related task, and then further trains it for a new task. +Alternately, it can mean taking a model pre-trained for an open domain task, and further training it for a domain-specific one. +Compared to training from scratch, fine-tuning is a much more cost-efficient solution whenever it is feasible. But: + ++ Exactly how much **data** do you need to get a good result? ++ Exactly how much **time** do you need to get good results? + +## Experiments + +We designed two experiments to quantitatively study how labeled data and training time affect fine-tuning performance. +For each experiment, we constructed three search tasks by fine-tuning three models. +We chose seven datasets, two of which are non-domain-specific public datasets, to ensure the generality of our experiment. + +We measured the performance of the fine-tuned models by evaluating their ability to perform search tasks, as measured by Mean Reciprocal Rank (mRR), Recall, and Mean Average Precision (mAP). +These metrics are calculated using the top 20 results of each search in the validation subset held out from each dataset. + +### How much labeled data is needed? 
+
+We gradually increase the amount of labeled data fed to Finetuner from 100 items to 100,000 and see how this affects performance on the metrics described in the previous section.
+
+In the figures below, the X-axis represents the amount of labeled data, and the Y-axis represents the relative improvement over the pre-trained model. The higher, the better.
+
+... | ...
+:-------------------------:|:-------------------------:
+![text-text-quora](https://jina-ai-gmbh.ghost.io/content/images/2022/12/Text-to-text-search-on-QuoraQA--3-.svg) | ![text-text-clinc](https://jina-ai-gmbh.ghost.io/content/images/2022/12/Text-to-text-search-on-Clinc150--3-.svg)
+![image-image-tll](https://jina-ai-gmbh.ghost.io/content/images/2022/12/Image-to-image-search-on-Totally-looks-like.svg) | ![image-image-celeba](https://jina-ai-gmbh.ghost.io/content/images/2022/12/Image-to-image-search-on-Celeba--4-.svg)
+![image-image-flickr30k](https://jina-ai-gmbh.ghost.io/content/images/2022/12/Text-to-image-search-on-Flickr30K--5-.svg) | ![image-image-coco](https://jina-ai-gmbh.ghost.io/content/images/2022/12/Text-to-image-search-on-CoCoCaptions--4-.svg)
+
+These results are promising but not particularly surprising.
+Performance improves with more labeled data on nearly all tasks and all datasets, more for some tasks and datasets than for others.
+However, the only conclusion we can draw from these figures is that Finetuner works as advertised. So far so good.
+
+We further calculate the return on investment (ROI),
+by dividing the relative improvement (a proxy for net profit) by the amount of labeled data (a proxy for investment cost).
+**This is useful because it indicates the point at which adding more data is producing diminishing returns.**
+
+In the figures below, the X-axis represents the amount of labeled data, and the Y-axis represents the ROI per labeled data item. The higher, the better.
+In particular, `ROI=0` means adding new labeled data at that point no longer contributes to any improvement.
+
+... | ...
+:-------------------------:|:-------------------------:
+![text-text-quora](https://jina-ai-gmbh.ghost.io/content/images/2022/12/Text-to-text-search-on-QuoraQA--7-.svg) | ![text-text-clinc](https://jina-ai-gmbh.ghost.io/content/images/2022/12/Text-to-text-search-on-Clinc150--7-.svg)
+![image-image-tll](https://jina-ai-gmbh.ghost.io/content/images/2022/12/Image-to-image-search-on-Totally-looks-like--1-.svg) | ![image-image-celeba](https://jina-ai-gmbh.ghost.io/content/images/2022/12/Image-to-image-search-on-Celeba--5-.svg)
+![image-image-flickr30k](https://jina-ai-gmbh.ghost.io/content/images/2022/12/Text-to-image-search-on-Flickr30K--6-.svg) | ![image-image-coco](https://jina-ai-gmbh.ghost.io/content/images/2022/12/Text-to-image-search-on-CoCoCaptions--5-.svg)
+
+Surprisingly, the ROI per unit of new labeled data starts to drop almost immediately. We expected ROI to decrease eventually, but not this early.
+
+### How much time is needed?
+
+To measure the value of added training time, we fixed the amount of new labeled data at 1000 items, and then we gradually increased the number of training epochs from 1 to 10.
+At each increase, we measured the improvement over the pre-trained model and calculated the ROI.
+For these experiments, the ROI is calculated by dividing the relative improvement by the elapsed time in seconds.
+This means that when `ROI=0`, adding training time no longer improves performance.
+
+... | ...
+:-------------------------:|:-------------------------:
+![text-text-quora](https://jina-ai-gmbh.ghost.io/content/images/2022/12/Text-to-text-search-on-QuoraQA--4-.svg) | ![text-text-clinc](https://jina-ai-gmbh.ghost.io/content/images/2022/12/Text-to-text-search-on-Clinc150--4-.svg)
+![image-image-tll](https://jina-ai-gmbh.ghost.io/content/images/2022/12/Image-to-image-search-on-Totally-look-like--2-.svg) | ![image-image-celeba](https://jina-ai-gmbh.ghost.io/content/images/2022/12/Image-to-image-search-on-Celeba--2-.svg)
+![image-image-flickr30k](https://jina-ai-gmbh.ghost.io/content/images/2022/12/Text-to-image-search-on-Flickr30K--3-.svg) | ![image-image-coco](https://jina-ai-gmbh.ghost.io/content/images/2022/12/Text-to-image-search-on-CocoCaptions--2-.svg)
+
+We knew in advance that adding more time does not guarantee any improvement at all.
+It can, in fact, reduce performance due to the overfitting problem.
+Some models (e.g. CLIP) are more prone to overfitting than others.
+In principle, if we keep training with the same 1000 data points over and over, we are guaranteed to overfit on the data and the overall performance will drop.
+
+Let's look at the ROI curves.
+
+... | ...
+:-------------------------:|:-------------------------:
+![text-text-quora](https://jina-ai-gmbh.ghost.io/content/images/2022/12/Text-to-text-search-on-QuoraQA--5-.svg) | ![text-text-clinc](https://jina-ai-gmbh.ghost.io/content/images/2022/12/Text-to-text-search-on-Clinc150--9-.svg)
+![image-image-tll](https://jina-ai-gmbh.ghost.io/content/images/2022/12/Image-to-image-search-on-Totally-look-like--3-.svg) | ![image-image-celeba](https://jina-ai-gmbh.ghost.io/content/images/2022/12/Image-to-image-search-on-Celeba--3-.svg)
+![image-image-flickr30k](https://jina-ai-gmbh.ghost.io/content/images/2022/12/Text-to-image-search-on-Flickr30K--4-.svg) | ![image-image-coco](https://jina-ai-gmbh.ghost.io/content/images/2022/12/Text-to-image-search-on-CocoCaptions--3-.svg)
+
+The ROI drops immediately after the first epoch of fine-tuning.
+Unlike in the previous experiment, where ROI approached zero but stayed positive as more labeled data was added, here the ROI on added training time can go negative due to the overfitting problem!
+
+## Summary
+
+What does this mean for users looking to maximize gains and minimize costs?
+
++ Many state-of-the-art deep neural networks are capable of few-shot learning. They are quick learners and can make large improvements with only a few hundred items of labeled data and only a few minutes of training time. You might have thought that deep neural network training requires millions of data items and a week of runtime, but we have shown in these examples that this stereotype does not hold up to reality.
++ Because they can learn so much, so fast, from so little data, ROI drops quickly as you put more time and data into fine-tuning. In the experiments above, ROI shrinks by 70% from its highest value after 500 labeled data items or 600 added seconds of GPU training time. Further investment beyond a few hundred items of training data and very minimal training time may not pay off as well as you would like.
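+
+To act on this advice in practice, keep the training budget explicit: a few hundred labeled items, a handful of epochs, and an early-stopping callback so that a run ends once it stops improving.
+The snippet below is only a minimal, illustrative sketch of such a budget-conscious run; it reuses the `EvaluationCallback` and `EarlyStopping` callbacks and the Totally-Looks-Like datasets shown elsewhere in these docs, and the dataset names and hyperparameters are examples rather than recommendations for your own data.
+
+```python
+import finetuner
+from finetuner.callback import EarlyStopping, EvaluationCallback
+
+finetuner.login()
+
+run = finetuner.fit(
+    model='resnet50',
+    train_data='finetuner/tll-train-da',  # a few hundred labeled items already go a long way
+    epochs=5,                             # keep the time budget small
+    learning_rate=1e-4,
+    callbacks=[
+        EvaluationCallback(
+            query_data='finetuner/tll-test-query-da',
+            index_data='finetuner/tll-test-index-da',
+        ),
+        EarlyStopping(),  # end the run early once its monitored value stops improving
+    ],
+)
+```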
\ No newline at end of file diff --git a/docs/walkthrough/integrate-with-jina.md b/docs/advanced-topics/finetuner-executor.md similarity index 52% rename from docs/walkthrough/integrate-with-jina.md rename to docs/advanced-topics/finetuner-executor.md index e0b6d3b01..8ac6f4252 100644 --- a/docs/walkthrough/integrate-with-jina.md +++ b/docs/advanced-topics/finetuner-executor.md @@ -1,91 +1,5 @@ -# Encode Documents - -Once fine-tuning is finished, it's time to actually use the model. -You can use the fine-tuned models directly to encode [DocumentArray](https://docarray.jina.ai/) objects or setting up an encoding service. -When encoding, data can also be provided as a regular list. - -(integrate-with-docarray)= -## Embed DocumentArray - -To embed a [DocumentArray](https://docarray.jina.ai/) with a fine-tuned model, you can get the model of your Run via the {func}`~finetuner.get_model` function and embed it via the {func}`finetuner.encode` function: - -````{tab} Artifact id and token -```python -from docarray import DocumentArray, Document -import finetuner - -finetuner.login() - -token = finetuner.get_token() -run = finetuner.get_run( - experiment_name='YOUR-EXPERIMENT', - run_name='YOUR-RUN' -) - -model = finetuner.get_model( - run.artifact_id, - token=token, - device='cuda', # model will be placed on cpu by default. -) - -da = DocumentArray([Document(text='some text to encode')]) - -finetuner.encode(model=model, data=da) - -for doc in da: - print(f'Text of the returned document: {doc.text}') - print(f'Shape of the embedding: {doc.embedding.shape}') -``` -```` -````{tab} Locally saved artifact -```python -from docarray import DocumentArray, Document -import finetuner - -model = finetuner.get_model('/path/to/YOUR-MODEL.zip') - -da = DocumentArray([Document(text='some text to encode')]) - -finetuner.encode(model=model, data=da) - -for doc in da: - print(f'Text of the returned document: {doc.text}') - print(f'Shape of the embedding: {doc.embedding.shape}') -``` -```` - -```console -Text of the returned document: some text to encode -Shape of the embedding: (768,) -``` - -## Encoding a List -Data that is stored in a regular list can be embedded in the same way you would a [DocumentArray](https://docarray.jina.ai/). Since the modality of your input data can be inferred from the model being used, there is no need to provide any additional information besides the content you want to encode. When providing data as a list, the `finetuner.encode` method will return a `np.ndarray` of embeddings, instead of a `docarray.DocumentArray`: - -```python -from docarray import DocumentArray, Document -import finetuner - -model = finetuner.get_model('/path/to/YOUR-MODEL.zip') - -texts = ['some text to encode'] - -embeddings = finetuner.encode(model=model, data=texts) - -for text, embedding in zip(texts, embeddings): - print(f'Text of the returned document: {text}') - print(f'Shape of the embedding: {embedding.shape}') -``` - - -```{admonition} Inference with ONNX -:class: tip -In case you set `to_onnx=True` when calling `finetuner.fit` function, -please use `model = finetuner.get_model('/path/to/YOUR-MODEL.zip', is_onnx=True)` -``` - -(integrate-with-jina)= -## Fine-tuned model as Executor +(finetuner-executor)= +# {octicon}`gear` Use FinetunerExecutor inside a Jina Flow Finetuner, being part of the Jina AI Cloud, provides a convenient way to use tuned models via [Jina Executors](https://docs.jina.ai/fundamentals/executor/). @@ -190,58 +104,6 @@ into the same vector space. 
To use those models, you have to provide the name of the model via an additional `select_model` parameter to the {func}`~finetuner.get_model` function. - -````{tab} CLIP text model -```python -from docarray import DocumentArray, Document -import finetuner - -finetuner.login() - -token = finetuner.get_token() -run = finetuner.get_run( - experiment_name='YOUR-EXPERIMENT', - run_name='YOUR-RUN' -) - -model = finetuner.get_model( - run.artifact_id, - token=token, - device='cuda', - select_model='clip-text' -) - -da = DocumentArray([Document(text='some text to encode')]) - -finetuner.encode(model=model, data=da) -``` -```` -````{tab} CLIP vision model -```python -from docarray import DocumentArray, Document -import finetuner - -finetuner.login() - -token = finetuner.get_token() -run = finetuner.get_run( - experiment_name='YOUR-EXPERIMENT', - run_name='YOUR-RUN' -) - -model = finetuner.get_model( - run.artifact_id, - token=token, - device='cuda', - select_model='clip-vision' -) - -da = DocumentArray([Document(text='~/Pictures/my_img.png')]) - -finetuner.encode(model=model, data=da) -``` -```` - If you want to host the CLIP models, you also have to provide the name of the model via the `select_model` parameter inside the `uses_with` attribute: @@ -264,4 +126,5 @@ f = Flow().add( }, ) -``` \ No newline at end of file +``` + diff --git a/docs/advanced-topics/linear-probe.md b/docs/advanced-topics/linear-probe.md new file mode 100644 index 000000000..8f355b426 --- /dev/null +++ b/docs/advanced-topics/linear-probe.md @@ -0,0 +1,69 @@ +(projection-head)= +# {octicon}`pin` Projection Head + +## Why freezing? + +Depending on your task and the amount of training data, +it is not always necessary to tune the entire model. +In some cases, +freezing some of the weights of the pre-trained model and just fine-tuning specific layers produces comparable or better results. +Furthermore, freezing weights can reduce the training time dramatically. + +Finetuner allows you to fine-tune a Linear Projection Head easily. + +```{warning} +Currently, we only allow you to freeze layers for image-to-image search tasks. +These models are built on top of Convolutional Neural Networks (CNNs). + +For transformer architectures, +we can only fine-tune the entire neural network. +If you need to freeze weights for transformers, consider submitting a feature request in our [Github Issues page](https://github.com/jina-ai/finetuner/issues) +``` + +```{admonition} Dimensionality reduction +:class: hint +Use a smaller `output_dim` to get compact embeddings. +``` + +## How? + +Finetuner has a built-in module called Tailor. +Given a general model written in Pytorch, +Tailor performs the micro-operations on the model architecture required for fine-tuning and outputs an embedding model. + +Given a general model with weights, Tailor performs some or all of the following steps: + ++ Iterating over all layers to find dense layers. ++ Chopping off all layers after a certain dense layer. ++ Freezing weights on specific layers. ++ Adding new layers on top of the model. + +![tailor](../imgs/tailor.svg) + +For example, just using the arguments `freeze=True` and `output_dim=X` with the `fit` function, as shown below: + +```diff +run = finetuner.fit( + model='resnet50', + ..., ++ freeze=True, ++ output_dim=1024, # default output_dim of ResNet50 is 2048. + ..., +) +``` + +Finetuner will: + +1. Remove the classification head of `ResNet` model, and convert it into an embedding model. +2. Freeze all layers of the embedding model. +3. 
Attach a trainable 3-layer Linear Projection Head on top of the embedding model with `output_dim=1024`.
+
+```{warning}
+Keep in mind that whenever you use `freeze=True`, always set `output_dim`.
+Otherwise, nothing can be tuned since all layers are frozen.
+```
+
+## Summary
+
+If you want to achieve efficient fine-tuning without retraining the entire model,
+tuning a Linear Projection Head could be a good solution.
\ No newline at end of file
diff --git a/docs/advanced-topics/negative-mining.md b/docs/advanced-topics/negative-mining.md
new file mode 100644
index 000000000..90b3268a1
--- /dev/null
+++ b/docs/advanced-topics/negative-mining.md
@@ -0,0 +1,89 @@
+(negative-mining)=
+# {octicon}`telescope` Negative Mining
+
+Negative Mining is an advanced machine learning technique that optimizes the way data is sampled from your training dataset.
+It aims to make the metric learning task harder for the model during training.
+In this way, it can lead to better fine-tuning results.
+
+## Context: Deep Metric Learning
+
+First, let's take a look at how we construct the training data for metric learning tasks.
+
+Metric Learning algorithms attempt to teach neural network models to tell
+which objects are semantically/visually similar and which ones are not.
+
+For uni-modal fine-tuning tasks such as text-to-text, image-to-image, or mesh-to-mesh,
+Finetuner constructs training data in the following way:
+
+![batch-sample](../imgs/batch-sampling.png)
+
+Assume we have a list of Documents belonging to four classes: `1`, `2`, `3`, and `4`.
+Finetuner will evenly sample *X* items per class to make a batch *B*, which is encoded by the model into a set of embeddings.
+
+Afterward, the loss is calculated based on the relations between the embeddings.
+Many of Finetuner's loss functions contrast the embeddings of three items, or a __Triplet__.
+Finetuner creates all possible Triplets *(anchor, pos, neg)* from this batch which satisfy the following condition:
+for each Triplet, the first item is the __anchor__, the second is an item whose embedding ought to be closer to that of the anchor (it has the same label), and the third is one whose embedding should be further from the anchor (it has a different label).
+The objective is to pull the embeddings of items that belong to the same class closer together in the embedding space,
+while pushing the embeddings of items which belong to different classes farther away from each other.
+
+![training](../imgs/metric-train.png)
+
+
+## The Triplet Margin Miner
+
+For some Triplets, the pre-trained model already performs well, i.e.
+the distance between the `anchor` embedding and `pos` is already much smaller than
+the distance between `anchor` and `neg`.
+These Triplets do not contribute to improving the model, since they are already in the desired relation to each other in the embedding space.
+A more effective approach is to use only a subset of all Triplets for model training. We call this subset the **hard** or **semi-hard negative samples**.
+
+![mining](../imgs/mining.png)
+
+Let's say `1₀` is an `anchor`, `1₁` is the `pos` while `2₄` is the `neg`, and `D(x,y)` is the distance between the embeddings of `x` and `y`.
+
+If:
+
++ `D(anchor, neg) < D(anchor, pos)`, then `neg` can be considered as a "hard negative" (`2₄ - H`).
++ `D(anchor, pos) < D(anchor, neg) < D(anchor, pos) + margin`, i.e. `neg` is a little further from the anchor than `pos` but still within the margin, then `neg` can be considered as a "semi-hard negative" (`2₄ - S`).
++ `D(anchor, neg) > D(anchor, pos) + margin`, then `neg` can be considered as an "easy negative" (`2₄ - E`).
+
+Training is more effective when using only **hard** and **semi-hard** negatives, given a reasonable margin value to distinguish them from **easy** Triplets.
+
+## Doing Negative Mining in Finetuner
+
+Finetuner is compatible with the miners provided by the [PyTorch Metric Learning](https://kevinmusgrave.github.io/pytorch-metric-learning) framework.
+To select a specific miner, pass its name to the `fit` function, e.g., `AngularMiner`, `TripletMarginMiner`, ...
+
+Please note that the miner has to be compatible with the loss function you selected.
+For instance, if you choose to train a model with the `TripletMarginLoss`, you can use the `TripletMarginMiner`.
+Without a miner, all possible triplets with an anchor, a positive, and a negative candidate are used to calculate the loss; the miner reduces this set of triplets.
+By default, the miner only selects triplets with hard negatives where the distance between the positive and the negative example is inside a margin of `0.2`.
+To pass additional parameters to configure the miner, use the `miner_options` parameter of the `fit` function.
+For example, to use only hard-negative Triplets and set the margin to `0.3`:
+
+```diff
+run = finetuner.fit(
+    ...,
+    loss='TripletMarginLoss',
++    miner='TripletMarginMiner',
++    miner_options={'margin': 0.3, 'type_of_triplets': 'hard'}
+)
+```
+
+Possible choices for `type_of_triplets` are:
+
++ `easy`: Use all easy triplets - all triplets that do not violate the margin.
++ `semihard`: Use semi-hard triplets, but not hard triplets, i.e. those where the difference in distance is within the specified margin.
++ `hard`: Use only hard triplets - the negative is closer to the anchor than the positive.
++ `all`: Use `hard` and `semihard` triplets - all but the `easy` triplets.
+
+Finetuner takes `TripletMarginLoss` as its default loss function with no negative mining.
+For a detailed description of the miners and their parameters, see the [PyTorch Metric Learning documentation](https://kevinmusgrave.github.io/pytorch-metric-learning/miners/).
+
+## Summary
+
+Metric Learning and Triplets are extremely useful for fine-tuning models for similarity search.
+Easy Triplets have little impact on improving the model.
+Consider using semi-hard/hard Triplets for model tuning.
\ No newline at end of file
diff --git a/docs/walkthrough/using-callbacks.md b/docs/advanced-topics/using-callbacks.md
similarity index 95%
rename from docs/walkthrough/using-callbacks.md
rename to docs/advanced-topics/using-callbacks.md
index 7ff2b109f..ede397ef6 100644
--- a/docs/walkthrough/using-callbacks.md
+++ b/docs/advanced-topics/using-callbacks.md
@@ -1,23 +1,18 @@
 (using-callbacks)=
-# Using Callbacks
+# {octicon}`link` Using Callbacks
 
 Callbacks are a way of adding additional methods to the finetuning process. The methods are executed when certain events occur and there are several callback classes, each serving a different function by providing different methods for different events. A run can be assigned multiple callbacks using the optional `callbacks` parameter when it is created.
-```python +```diff run = finetuner.fit( - model = 'resnet50', - run_name = 'resnet-tll-early-6', - train_data = 'finetuner/tll-train-da', - epochs = 5, - learning_rate = 1e-6, - callbacks=[ - EvaluationCallback( - query_data='finetuner/tll-test-query-da', - index_data='finetuner/tll-test-index-da' - ), - EarlyStopping() - ] + ..., ++ callbacks=[ ++ EvaluationCallback( ++ query_data='finetuner/tll-test-query-da', ++ index_data='finetuner/tll-test-index-da' ++ ), ++ ] ) ``` @@ -54,6 +49,7 @@ On the other hand, the evaluation callback is used to evaluate the quality of th These search metrics can be used by other callbacks if the evaluation callback is first in the list of callbacks when creating a run. ```{admonition} Evaluation callback with two models +:class: hint Usually, you don't need to provide the name of a model to the evalution callback. The callback just takes the model which is fine-tuned. However, if multiple models are involved in the fine-tuning process, like this is the case for CLIP models, it needs to be clear which model is used to encode the documents in `query_data` and `index_data`. diff --git a/docs/api-rst.rst b/docs/api-rst.rst new file mode 100644 index 000000000..faf28ff25 --- /dev/null +++ b/docs/api-rst.rst @@ -0,0 +1,67 @@ +====================== +:fab:`python` Python API +====================== + +This section includes the API documentation from the `Finetuner` codebase, as extracted from the `docstrings `_ in the code. + +:mod:`finetuner.__init__` - Finetuner +-------------------- + +.. currentmodule:: finetuner.__init__ + +.. autosummary:: + :nosignatures: + :template: class.rst + + finetuner.login + finetuner.describe_models + finetuner.fit + finetuner.list_callbacks + finetuner.get_run + finetuner.get_experiment + finetuner.get_token + finetuner.build_model + finetuner.get_model + finetuner.encode + finetuner.list_runs + finetuner.delete_run + finetuner.delete_runs + finetuner.create_experiment + finetuner.list_experiments + finetuner.delete_experiment + finetuner.delete_experiments + +:mod:`finetuner.run.Run` - Run +-------------------- + +.. currentmodule:: finetuner.run.Run + +.. autosummary:: + :nosignatures: + :template: class.rst + + finetuner.run.Run.name + finetuner.run.Run.config + finetuner.run.Run.status + finetuner.run.Run.logs + finetuner.run.Run.stream_logs + finetuner.run.Run.save_artifact + finetuner.run.Run.artifact_id + + +:mod:`finetuner.experiment.Experiment` - Experiment +-------------------- + +.. currentmodule:: finetuner.experiment.Experiment + +.. 
autosummary:: + :nosignatures: + :template: class.rst + + finetuner.experiment.Experiment.name + finetuner.experiment.Experiment.create_run + finetuner.experiment.Experiment.get_run + finetuner.experiment.Experiment.list_runs + finetuner.experiment.Experiment.delete_run + finetuner.experiment.Experiment.delete_runs + diff --git a/docs/conf.py b/docs/conf.py index 0c07fe2ea..1093c3c30 100644 --- a/docs/conf.py +++ b/docs/conf.py @@ -13,6 +13,7 @@ master_doc = 'index' language = 'en' repo_dir = '../' +nitpick_ignore = [('py:class', 'type')] templates_path = ['_templates'] exclude_patterns = [ @@ -77,6 +78,7 @@ extensions = [ 'sphinx.ext.autodoc', + 'sphinx.ext.autosummary', 'sphinx_autodoc_typehints', 'sphinx.ext.viewcode', 'sphinx.ext.coverage', diff --git a/docs/get-started/design-principles.md b/docs/get-started/design-principles.md deleted file mode 100644 index 526d4bdaa..000000000 --- a/docs/get-started/design-principles.md +++ /dev/null @@ -1,45 +0,0 @@ -# Design Principles - -There are several fancy machine learning libraries out there, -so what makes Finetuner unique? - -## Focus on the quality of embeddings - -Finetuner is not designed to tackle classification, -sentiment analysis or object detection task. -Finetuner cares about the quality of the embeddings for neural search, -and this is what the fine-tuned model will produce. - -Given a query {class}`~docarray.document.Document` represented by `embeddings`, -you can compare the similarity/distance of the query Documents against all indexed (embedded) Documents in your storage backend. - - -## Dedicated to optimizing your search task - -Finetuner helps you boost your search system performance on different uses cases: - -+ text-to-text search (or dense vector search). -+ image-to-image search (or content-based image search). -+ text-to-image search (based on [OpenAI CLIP](https://openai.com/blog/clip/)) and [OpenCLIP](https://github.com/mlfoundations/open_clip). -+ more is on the way! - -Search performance depends on a lot of factors. -Internally we have conducted a lot of experiments on various tasks, -such as image-to-image search, -text-to-text search, -cross-modal search. -Across these three tasks, -**Finetuner is able to boost 20%-45% of precision@k and recall@k**. -You can also observe significant performance improvement on other search metrics, -such as mean recipal rank (mRR) or normalized discounted cumulative gain (nDCG). - -## Easy to use - -Finetuner gives the user flexibility to choose machine learning hyper-parameters, -while all these parameters are optional. - -If you do not have a machine learning background, -don't worry about it. -As was stated before, you only need to provide the training data organized as a {class}`~docarray.array.document.DocumentArray` or as a CSV file. -In case you do not know which backbone to choose, -use {meth}`~finetuner.describe_models()` to let Finetuner suggest a backbone model for you. \ No newline at end of file diff --git a/docs/get-started/how-it-works.md b/docs/get-started/how-it-works.md index 90897bcea..c59edaa10 100644 --- a/docs/get-started/how-it-works.md +++ b/docs/get-started/how-it-works.md @@ -1,23 +1,26 @@ -# How Does it Work? +# {octicon}`question` How Does it Work? -## Contrastive metric learning +Finetuner is a framework for using the contrastive learning approach to improve similarity matching with models that encode data into embeddings. 
+This involves three steps:
-From an algorithmic perspective, 
-**Finetuner** leverages a contrastive metric learning approach to improve your model.
-How does it work?
+## Step 1: Build an embedding model
-### Step 1: Convert a model into an embedding model
-Finetuner interprets the backbone model architecture,
-removes the default *head*, applies *pooling* and freezes layers that do not need to be trained.
-For an image classification task (e.g. cats and dogs),
-Finetuner is going to remove the classification head (cat-dog classifier) and turn your model into an *embedding model*.
+Finetuner takes an existing, pre-trained model, typically called the __backbone__, and analyzes its architecture.
+If this model does not already produce embeddings, Finetuner is able to remove the default *head* (the last layers of the network), add new projection layers, apply *pooling*, and freeze layers that do not need to be trained.
-This embedding model does not make predictions or outputs a probability,
-but instead outputs a feature vector to represent your data.
+For instance, Finetuner will turn an image classification model, e.g., for separating cats from dogs, into an *embedding model*
+by removing its last layer - the classification head (cat-dog classifier).
-### Step 2: Triplet construction and training on-the-fly
+This embedding model does not make predictions or output a probability,
+but instead outputs a feature vector (an __embedding__) that represents its input.
+## Step 2: Tuple/Triplet construction
+
+````{tab} Uni-modal (with label)
 Finetuner works on labeled data. It expects either a CSV file or a {class}`~docarray.array.document.DocumentArray` consisting of {class}`~docarray.document.Document`s where each one contains `finetuner_label` corresponding to the class of a specific training example.
 After receiving a CSV file, its contents are parsed and a {class}`~docarray.array.document.DocumentArray` is constructed.
@@ -28,21 +31,30 @@
 Finetuner looks for a `Document` with the same `finetuner_label` (positive),
 and a `Document` with a different `finetuner_label` (negative).
 The objective is to pull `Document`s which belong to the same class together,
 while pushing the `Document`s which belong to a different class away from each other.
+````
+````{tab} Cross-modal (without label)
+Finetuner works on unlabeled text-image pairs.
+You can fine-tune a CLIP-like model for text-to-image search directly, without any labels.
+It expects either a CSV file or a {class}`~docarray.array.document.DocumentArray` consisting of a list of {class}`~docarray.document.Document`s that each contain two chunks: an image chunk and a text chunk.
+During fine-tuning, Finetuner leverages the text-image pairs and jointly optimizes two models (`CLIPTextEncoder` and `CLIPImageEncoder`) with respect to two classification losses: (1) given a text, find the best matching
+image and (2) given an image, find the best matching text. Then it aggregates the two losses into the `CLIPLoss`.
+At the end, the output embeddings of your data from the `CLIPTextEncoder` are directly comparable to those from the `CLIPImageEncoder`.
+```` -## Cloud-based fine-tuning +## Step 3: Tuning in the cloud -From an engineering perspective, +From an operational perspective, we have hidden all the complexity of machine learning algorithms and resource configuration (such as GPUs). All you need to do is decide on your backbone model and prepare your training data. Once you have logged in to the Jina Ecosystem with {meth}`~finetuner.login()`, -Finetuner will push your training data into our *Cloud Artifact Storage* (only visible to you). +Finetuner will push your training data into the *Jina AI Cloud* (only visible to you). At the same time, we will spin-up an isolated computational resource -with proper memory, CPU, GPU dedicated to your fine-tuning job. +with proper memory, CPU, and a GPU dedicated to your fine-tuning job. -Once fine-tuning is done, Finetuner will again push your fine-tuned model to the *Cloud Artifact Storage* -and make it available for you to pull it back to your machine. +Once fine-tuning is done, Finetuner will push your fine-tuned model to the *Jina AI Cloud* +and make it available for you to download. That's it! On the other hand, diff --git a/docs/get-started/installation.md b/docs/get-started/installation.md index 7aed5cc6b..0c60f21b1 100644 --- a/docs/get-started/installation.md +++ b/docs/get-started/installation.md @@ -1,5 +1,5 @@ (install-finetuner)= -# Installation +# {octicon}`desktop-download` Installation ![PyPI](https://img.shields.io/pypi/v/finetuner?color=%23ffffff&label=%20) is the latest version. @@ -9,7 +9,7 @@ Make sure you have `Python 3.7+` installed on Linux/Mac/Windows: pip install -U finetuner ``` -If you want to encode `docarray.DocumentArray` objects with the {meth}`~finetuner.encode` function, you need to install `"finetuner[full]"`. +If you want to encode `docarray.DocumentArray` objects locally with the {meth}`~finetuner.encode` function, you need to install `"finetuner[full]"`. 
In this case, some extra dependencies are installed which are necessary to do the inference, e.g., torch, torchvision, and open clip: ```bash diff --git a/docs/imgs/batch-sampling.png b/docs/imgs/batch-sampling.png new file mode 100644 index 000000000..95204a95a Binary files /dev/null and b/docs/imgs/batch-sampling.png differ diff --git a/docs/imgs/metric-train.png b/docs/imgs/metric-train.png new file mode 100644 index 000000000..a18a9df24 Binary files /dev/null and b/docs/imgs/metric-train.png differ diff --git a/docs/imgs/mining.png b/docs/imgs/mining.png new file mode 100644 index 000000000..289e83f84 Binary files /dev/null and b/docs/imgs/mining.png differ diff --git a/docs/imgs/tailor.svg b/docs/imgs/tailor.svg new file mode 100644 index 000000000..74eddf911 --- /dev/null +++ b/docs/imgs/tailor.svg @@ -0,0 +1 @@ + \ No newline at end of file diff --git a/docs/index.md b/docs/index.md index db41cf40b..4c602655c 100644 --- a/docs/index.md +++ b/docs/index.md @@ -21,10 +21,20 @@ get-started/how-it-works get-started/installation -get-started/design-principles walkthrough/index ``` +```{toctree} +:caption: Advanced Topics +:hidden: + +advanced-topics/budget +advanced-topics/negative-mining +advanced-topics/using-callbacks +advanced-topics/linear-probe +advanced-topics/finetuner-executor +``` + ```{toctree} @@ -43,7 +53,7 @@ notebooks/mesh_to_mesh :hidden: :maxdepth: 1 -api/finetuner +api-rst ``` --- diff --git a/docs/notebooks/image_to_image.ipynb b/docs/notebooks/image_to_image.ipynb index 896dd21bb..8e613b63e 100644 --- a/docs/notebooks/image_to_image.ipynb +++ b/docs/notebooks/image_to_image.ipynb @@ -333,57 +333,16 @@ "id": "irvn0igWdLOf" } }, - { - "cell_type": "markdown", - "source": [ - "```python\n", - "import copy\n", - "from io import BytesIO\n", - "from PIL import Image\n", - "\n", - "query_pt = copy.deepcopy(query_data)\n", - "index_pt = copy.deepcopy(index_data)\n", - "\n", - "query_ft = copy.deepcopy(query_pt)\n", - "index_ft = copy.deepcopy(index_pt)\n", - "\n", - "model_pt = finetuner.build_model('resnet50')\n", - "\n", - "finetuner.encode(model=model, data=query_ft)\n", - "finetuner.encode(model=model, data=index_ft)\n", - "\n", - "finetuner.encode(model=model_pt, data=query_pt)\n", - "finetuner.encode(model=model_pt, data=index_pt)\n", - "\n", - "query_ft.match(index_ft)\n", - "query_pt.match(index_pt)\n", - "\n", - "num_samples = 10\n", - "\n", - "for i, (doc_pt, doc_ft) in enumerate(zip(query_pt, query_ft)):\n", - " if i < num_samples:\n", - " print(f'\\n\\nQuery:')\n", - " display(Image.open(BytesIO(doc_pt.blob)))\n", - " print(f'top match before fine-tuning:')\n", - " display(Image.open(BytesIO(doc_pt.matches[0].blob)))\n", - " print(f'top match after fine-tuning:')\n", - " display(Image.open(BytesIO(doc_ft.matches[0].blob)))\n", - "```" - ], - "metadata": { - "id": "cVVqC_vsdXlK" - } - }, { "cell_type": "markdown", "source": [ "To save you some time, we have plotted some examples where the model's ability to return similar images has clearly improved:\n", "\n", - "![image-image-triplets-good](https://finetuner.jina.ai/_images/image-image-triplets-good.png)\n", + "![image-image-triplets-good](https://user-images.githubusercontent.com/6599259/212634591-03bd93dc-900f-47c5-8ada-77cf1d4f9fe6.png)\n", "\n", "On the other hand, there are also cases where the fine-tuned model performs worse, and fails to correctly match images that it previously could. This case is much rarer than the previous case. 
For this dataset there were 108 occasions where the fine-tuned model returned the correct pair where it couldn't before, and only 33 occasions where the finetuned model returned an incorrect image after fine-tuning but returned a correct one before. Nevertheless it still can happen:\n", "\n", - "![image-image-triplets-bad](https://finetuner.jina.ai/_images/image-image-triplets-bad.png)" + "![image-image-triplets-bad](https://user-images.githubusercontent.com/6599259/212634649-370b643b-63ad-4d46-8a16-bc4988265568.png)" ], "metadata": { "id": "TwL33Jz1datD" diff --git a/docs/notebooks/image_to_image.md b/docs/notebooks/image_to_image.md index 4a0e8c49c..2dd08984c 100644 --- a/docs/notebooks/image_to_image.md +++ b/docs/notebooks/image_to_image.md @@ -209,48 +209,12 @@ query.match(index_data, limit=10, metric='cosine') We can directly compare the results of our fine-tuned model with its zero-shot counterpart to get a better idea of how finetuning affects the results of a search. While the differences between the two models may be subtle for some queries, some of the examples the examples below (such as the second example) show that the model after fine-tuning is able to better match similar images. - -```python -import copy -from io import BytesIO -from PIL import Image - -query_pt = copy.deepcopy(query_data) -index_pt = copy.deepcopy(index_data) - -query_ft = copy.deepcopy(query_pt) -index_ft = copy.deepcopy(index_pt) - -model_pt = finetuner.build_model('resnet50') - -finetuner.encode(model=model, data=query_ft) -finetuner.encode(model=model, data=index_ft) - -finetuner.encode(model=model_pt, data=query_pt) -finetuner.encode(model=model_pt, data=index_pt) - -query_ft.match(index_ft) -query_pt.match(index_pt) - -num_samples = 10 - -for i, (doc_pt, doc_ft) in enumerate(zip(query_pt, query_ft)): - if i < num_samples: - print(f'\n\nQuery:') - display(Image.open(BytesIO(doc_pt.blob))) - print(f'top match before fine-tuning:') - display(Image.open(BytesIO(doc_pt.matches[0].blob))) - print(f'top match after fine-tuning:') - display(Image.open(BytesIO(doc_ft.matches[0].blob))) -``` - - To save you some time, we have plotted some examples where the model's ability to return similar images has clearly improved: -![image-image-triplets-good](https://finetuner.jina.ai/_images/image-image-triplets-good.png) +![image-image-triplets-good](https://user-images.githubusercontent.com/6599259/212634591-03bd93dc-900f-47c5-8ada-77cf1d4f9fe6.png) On the other hand, there are also cases where the fine-tuned model performs worse, and fails to correctly match images that it previously could. This case is much rarer than the previous case. For this dataset there were 108 occasions where the fine-tuned model returned the correct pair where it couldn't before, and only 33 occasions where the finetuned model returned an incorrect image after fine-tuning but returned a correct one before. 
Nevertheless it still can happen: -![image-image-triplets-bad](https://finetuner.jina.ai/_images/image-image-triplets-bad.png) +![image-image-triplets-bad](https://user-images.githubusercontent.com/6599259/212634649-370b643b-63ad-4d46-8a16-bc4988265568.png) diff --git a/docs/notebooks/images/WandB-mclip.png b/docs/notebooks/images/WandB-mclip.png deleted file mode 100644 index bee735636..000000000 Binary files a/docs/notebooks/images/WandB-mclip.png and /dev/null differ diff --git a/docs/notebooks/images/clip-example-ft.png b/docs/notebooks/images/clip-example-ft.png deleted file mode 100644 index 60722137d..000000000 Binary files a/docs/notebooks/images/clip-example-ft.png and /dev/null differ diff --git a/docs/notebooks/images/clip-example-pt.png b/docs/notebooks/images/clip-example-pt.png deleted file mode 100644 index f15f07b67..000000000 Binary files a/docs/notebooks/images/clip-example-pt.png and /dev/null differ diff --git a/docs/notebooks/images/image-image-triplets-bad.png b/docs/notebooks/images/image-image-triplets-bad.png deleted file mode 100644 index 615b58416..000000000 Binary files a/docs/notebooks/images/image-image-triplets-bad.png and /dev/null differ diff --git a/docs/notebooks/images/image-image-triplets-good.png b/docs/notebooks/images/image-image-triplets-good.png deleted file mode 100644 index d18131e27..000000000 Binary files a/docs/notebooks/images/image-image-triplets-good.png and /dev/null differ diff --git a/docs/notebooks/images/mclip-example-ft-1.png b/docs/notebooks/images/mclip-example-ft-1.png deleted file mode 100644 index 0bc65a70d..000000000 Binary files a/docs/notebooks/images/mclip-example-ft-1.png and /dev/null differ diff --git a/docs/notebooks/images/mclip-example-ft-2.png b/docs/notebooks/images/mclip-example-ft-2.png deleted file mode 100644 index 571d1603d..000000000 Binary files a/docs/notebooks/images/mclip-example-ft-2.png and /dev/null differ diff --git a/docs/notebooks/images/mclip-example-pt-1.png b/docs/notebooks/images/mclip-example-pt-1.png deleted file mode 100644 index d33d12d66..000000000 Binary files a/docs/notebooks/images/mclip-example-pt-1.png and /dev/null differ diff --git a/docs/notebooks/images/mclip-example-pt-2.png b/docs/notebooks/images/mclip-example-pt-2.png deleted file mode 100644 index 8713e76ef..000000000 Binary files a/docs/notebooks/images/mclip-example-pt-2.png and /dev/null differ diff --git a/docs/notebooks/multilingual_text_to_image.ipynb b/docs/notebooks/multilingual_text_to_image.ipynb index 7f16c806a..1d96fdef5 100644 --- a/docs/notebooks/multilingual_text_to_image.ipynb +++ b/docs/notebooks/multilingual_text_to_image.ipynb @@ -7,9 +7,9 @@ "id": "72867ba9-6a8c-4b14-acbf-487ea0a61836" }, "source": [ - "# Multilingual Text-to-Image search with MultilingualCLIP\n", + "# Multilingual Text-to-Image Search with MultilingualCLIP\n", "\n", - "\"Open\n" + "\"Open\n" ] }, { @@ -45,7 +45,7 @@ }, "outputs": [], "source": [ - "!pip install \"finetuner[full]\"" + "!pip install 'finetuner[full]'" ] }, { @@ -65,9 +65,11 @@ "id": "ed1f88d4-f140-48d4-9d20-00e628c73e38" }, "source": [ - "We'll be fine-tuning multilingual CLIP on the electronics section of the [German XMarket dataset](https://xmrec.github.io/data/de/), which contains images and descriptions of electronics products in German. 
\n", + "We'll be fine-tuning multilingual CLIP on the electronics section of the [German Fashion12k dataset](https://github.com/Toloka/Fashion12K_german_queries), which contains images and descriptions of fashion products in German.\n", "\n", - "Each product in the dataset contains several attributes, we will be making use of the image and category attributes to create a [`Document`](https://docarray.jina.ai/fundamentals/document/#document) containing two [chunks](https://docarray.jina.ai/fundamentals/document/nested/#nested-structure), one containing the image and another containing the category of the product." + "The images are a subset of the [xthan/fashion-200k dataset](https://github.com/xthan/fashion-200k), and we have commissioned their human annotations via crowdsourcing platform. Annotations were made in two steps. First, we passed the 12,000 images to annotators in their large international user community, who added descriptive captions.\n", + "\n", + "Each product in the dataset contains several attributes, we will be making use of the image and captions to create a [`Document`](https://docarray.jina.ai/fundamentals/document/#document) containing two [chunks](https://docarray.jina.ai/fundamentals/document/nested/#nested-structure), one containing the image and another containing the category of the product." ] }, { @@ -78,7 +80,7 @@ }, "source": [ "## Data\n", - "We will use the `xmarket-de-electronics` dataset, which we have already pre-processed and made available on the Jina AI Cloud. You can access it using `DocArray.pull`:" + "We will use the `DE-Fashion-Image-Text-Multimodal-train` dataset, which we have already pre-processed and made available on the Jina AI Cloud. You can access it using `DocArray.pull`:" ] }, { @@ -105,11 +107,13 @@ }, "outputs": [], "source": [ - "train_data = 'finetuner/xmarket-de-electronics-train-data'\n", - "eval_data = 'finetuner/xmarket-de-electronics-test-data'\n", + "train_data = DocumentArray.pull('finetuner/DE-Fashion-Image-Text-Multimodal-train', show_progress=True)\n", + "eval_data = DocumentArray.pull('finetuner/DE-Fashion-Image-Text-Multimodal-test', show_progress=True)\n", + "\n", + "query_data = DocumentArray.pull('finetuner/DE-Fashion-Image-Text-Multimodal-query', show_progress=True)\n", + "index_data = DocumentArray.pull('finetuner/DE-Fashion-Image-Text-Multimodal-index', show_progress=True)\n", "\n", - "query_data = 'finetuner/xmarket-de-electronics-query-data'\n", - "index_data = 'finetuner/xmarket-de-electronics-index-data'\n" + "train_data.summary()" ] }, { @@ -143,21 +147,19 @@ }, "outputs": [], "source": [ - "import finetuner\n", "from finetuner.callback import EvaluationCallback, WandBLogger\n", "\n", "run = finetuner.fit(\n", " model='xlm-roberta-base-ViT-B-32::laion5b_s13b_b90k',\n", - " train_data=train_data,\n", - " eval_data=eval_data,\n", + " train_data='finetuner/DE-Fashion-Image-Text-Multimodal-train',\n", " epochs=5,\n", " learning_rate=1e-6,\n", " loss='CLIPLoss',\n", " device='cuda',\n", " callbacks=[\n", " EvaluationCallback(\n", - " query_data=query_data,\n", - " index_data=index_data,\n", + " query_data='finetuner/DE-Fashion-Image-Text-Multimodal-query',\n", + " index_data='finetuner/DE-Fashion-Image-Text-Multimodal-index',\n", " model='clip-text',\n", " index_model='clip-vision'\n", " ),\n", @@ -248,12 +250,23 @@ "wandb: Syncing run ancient-galaxy-2\n", "wandb: View project at \n", "wandb: View run at \n", - "\n", + "[07:48:21] INFO Done ✨ __main__.py:195\n", + " DEBUG Finetuning took 0 days, 0 hours 8 minutes 
and 19 seconds __main__.py:197\n", + " DEBUG Metric: 'clip-text-to-clip-vision_precision_at_k' Value: 0.04035 __main__.py:206\n", + " DEBUG Metric: 'clip-text-to-clip-vision_hit_at_k' Value: 0.79200 __main__.py:206\n", + " DEBUG Metric: 'clip-text-to-clip-vision_average_precision' Value: 0.41681 __main__.py:206\n", + " DEBUG Metric: 'clip-text-to-clip-vision_reciprocal_rank' Value: 0.41773 __main__.py:206\n", + " DEBUG Metric: 'clip-text-to-clip-vision_dcg_at_k' Value: 0.57113 __main__.py:206\n", + " INFO Building the artifact ... __main__.py:208\n", + " INFO Pushing artifact to Jina AI Cloud ... __main__.py:234\n", + "[08:02:33] INFO Artifact pushed under ID '63b52b5b3278416c15353bf3' __main__.py:236\n", + " DEBUG Artifact size is 2599.190 MB __main__.py:238\n", + " INFO Finished 🚀 __main__.py:239\n", "```\n", "\n", "The generated plots should look like this:\n", "\n", - "![WandB-mclip](https://finetuner.jina.ai/_images/WandB-mclip.png)\n" + "![WandB-mclip](https://user-images.githubusercontent.com/6599259/212645881-20071aba-8643-4878-bc53-97eb6f766bf0.png)\n" ] }, { @@ -302,7 +315,6 @@ }, "outputs": [], "source": [ - "from docarray import Document, DocumentArray\n", "text_da = DocumentArray([Document(text='setwas Text zum Codieren')])\n", "image_da = DocumentArray([Document(uri='https://upload.wikimedia.org/wikipedia/commons/4/4e/Single_apple.png')])\n", "\n", @@ -359,61 +371,13 @@ "id": "e69fdfb2-6482-45fb-9c4d-41e548ef8f06" }, "source": [ - "```python\n", - "from finetuner import build_model\n", - "\n", - "pt_query = copy.deepcopy(query_data)\n", - "pt_index = copy.deepcopy(index_data)\n", - "\n", - "ft_query = copy.deepcopy(query_data)\n", - "ft_index = copy.deepcopy(index_data)\n", - "\n", - "zero_shot_text_encoder = build_model(\n", - " name='xlm-roberta-base-ViT-B-32::laion5b_s13b_b90k',\n", - " select_model='clip-text',\n", - ")\n", - "zero_shot_image_encoder = build_model(\n", - " name='xlm-roberta-base-ViT-B-32::laion5b_s13b_b90k',\n", - " select_model='clip-vision',\n", - ")\n", - "\n", - "finetuner.encode(model=zero_shot_text_encoder, data=pt_query)\n", - "finetuner.encode(model=zero_shot_image_encoder, data=pt_index)\n", - "\n", - "finetuner.encode(model=mclip_text_encoder, data=ft_query)\n", - "finetuner.encode(model=mclip_image_encoder, data=ft_index)\n", - "\n", - "pt_query.match(pt_index)\n", - "ft_query.match(ft_index)\n", - "\n", - "def plot_matches(num_samples = 10):\n", - " seen = set()\n", - " for i, (pt_q, ft_q) in enumerate(zip(pt_query, ft_query)):\n", - " if i >= num_samples: break\n", - " if pt_q.text in seen:\n", - " i = i - 1\n", - " continue\n", - " seen.add(pt_q.text)\n", - " print((\n", - " f'results for query \"{pt_q.text}\"'\n", - " ' using a zero-shot model (top) and '\n", - " 'the fine-tuned model (bottom):'\n", - " ))\n", - " pt_q.matches[:1].plot_image_sprites(fig_size=(3,3))\n", - " ft_q.matches[:1].plot_image_sprites(fig_size=(3,3))\n", - "```\n", - "```plaintext\n", - "results for query: \"externe mikrofone (external microphone)\" using a zero-shot model (top) and the fine-tuned model (bottom)\n", - "```\n", - "![mclip-example-pt-1](https://finetuner.jina.ai/_images/mclip-example-pt-1.png)\n", - "![mclip-example-ft-1](https://finetuner.jina.ai/_images/mclip-example-ft-1.png)\n", - "\n", "```plaintext\n", - "results for query: \"prozessorlüfter (processor fan)\" using a zero-shot model (top) and the fine-tuned model (bottom)\n", + "results for query: \"Spitzen-Midirock Teilfutter Schwarz\" (Lace midi skirt partial lining black) using a zero-shot 
model and the fine-tuned model\n",
     "```\n",
     "\n",
-    "![mclip-example-pt-2](https://finetuner.jina.ai/_images/mclip-example-pt-2.png)\n",
-    "![mclip-example-ft-2](https://finetuner.jina.ai/_images/mclip-example-ft-2.png)\n",
+    "before | after\n",
+    ":-------------------------:|:-------------------------:\n",
+    "![mclip-example-pt-1](https://jina-ai-gmbh.ghost.io/content/images/2022/12/mclip-before.png) | ![mclip-example-ft-1](https://jina-ai-gmbh.ghost.io/content/images/2022/12/mclip-after.png)\n",
     "\n",
     "\n"
    ]
   },
   {
@@ -435,7 +399,7 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.8.10"
+   "version": "3.9.12"
   },
   "vscode": {
    "interpreter": {
diff --git a/docs/notebooks/multilingual_text_to_image.md b/docs/notebooks/multilingual_text_to_image.md
index b2fb6d673..106df4bdb 100644
--- a/docs/notebooks/multilingual_text_to_image.md
+++ b/docs/notebooks/multilingual_text_to_image.md
@@ -13,9 +13,9 @@ jupyter:
 ---
 
-# Multilingual Text-to-Image search with MultilingualCLIP
+# Multilingual Text-to-Image Search with MultilingualCLIP
 
-Open In Colab
+Open In Colab
 
 
@@ -33,7 +33,7 @@ This guide will show you how to finetune a multilingual CLIP model for a text to
 
 ```python id="9261d0a7-ad6d-461f-bdf7-54e9804cc45d"
-!pip install "finetuner[full]"
+!pip install 'finetuner[full]'
 ```
 
 
-We'll be fine-tuning multilingual CLIP on the electronics section of the [German XMarket dataset](https://xmrec.github.io/data/de/), which contains images and descriptions of electronics products in German.
+We'll be fine-tuning multilingual CLIP on the [German Fashion12k dataset](https://github.com/Toloka/Fashion12K_german_queries), which contains images and descriptions of fashion products in German.
 
-Each product in the dataset contains several attributes, we will be making use of the image and category attributes to create a [`Document`](https://docarray.jina.ai/fundamentals/document/#document) containing two [chunks](https://docarray.jina.ai/fundamentals/document/nested/#nested-structure), one containing the image and another containing the category of the product.
+The images are a subset of the [xthan/fashion-200k dataset](https://github.com/xthan/fashion-200k), for which we commissioned human annotations via a crowdsourcing platform: the 12,000 images were passed to a large international community of annotators, who added descriptive captions in German.
+
+Each product in the dataset contains several attributes; we will be making use of the image and caption to create a [`Document`](https://docarray.jina.ai/fundamentals/document/#document) containing two [chunks](https://docarray.jina.ai/fundamentals/document/nested/#nested-structure), one containing the image and the other containing the caption of the product.
 
 
 ## Data
 
-We will use the `xmarket-de-electronics` dataset, which we have already pre-processed and made available on the Jina AI Cloud. You can access it using `DocArray.pull`:
+We will use the `DE-Fashion-Image-Text-Multimodal-train` dataset, which we have already pre-processed and made available on the Jina AI Cloud. 
You can access it using `DocArray.pull`: ```python id="4420a4ac-531a-4db3-af75-ebb58d8f828b" @@ -59,12 +61,13 @@ finetuner.login(force=True) ``` ```python id="bab5c3fb-ee75-4818-bd18-23c7a5983e1b" -train_data = 'finetuner/xmarket-de-electronics-train-data' -eval_data = 'finetuner/xmarket-de-electronics-test-data' +train_data = DocumentArray.pull('finetuner/DE-Fashion-Image-Text-Multimodal-train', show_progress=True) +eval_data = DocumentArray.pull('finetuner/DE-Fashion-Image-Text-Multimodal-test', show_progress=True) -query_data = 'finetuner/xmarket-de-electronics-query-data' -index_data = 'finetuner/xmarket-de-electronics-index-data' +query_data = DocumentArray.pull('finetuner/DE-Fashion-Image-Text-Multimodal-query', show_progress=True) +index_data = DocumentArray.pull('finetuner/DE-Fashion-Image-Text-Multimodal-index', show_progress=True) +train_data.summary() ``` @@ -78,21 +81,19 @@ Now that our data has been prepared, we can start our fine-tuning run. ```python id="a0cba20d-e335-43e0-8936-d926568034b3" -import finetuner from finetuner.callback import EvaluationCallback, WandBLogger run = finetuner.fit( model='xlm-roberta-base-ViT-B-32::laion5b_s13b_b90k', - train_data=train_data, - eval_data=eval_data, + train_data='finetuner/DE-Fashion-Image-Text-Multimodal-train', epochs=5, learning_rate=1e-6, loss='CLIPLoss', device='cuda', callbacks=[ EvaluationCallback( - query_data=query_data, - index_data=index_data, + query_data='finetuner/DE-Fashion-Image-Text-Multimodal-query', + index_data='finetuner/DE-Fashion-Image-Text-Multimodal-index', model='clip-text', index_model='clip-vision' ), @@ -149,12 +150,23 @@ wandb: Run `wandb offline` to turn off syncing. wandb: Syncing run ancient-galaxy-2 wandb: View project at wandb: View run at - +[07:48:21] INFO Done ✨ __main__.py:195 + DEBUG Finetuning took 0 days, 0 hours 8 minutes and 19 seconds __main__.py:197 + DEBUG Metric: 'clip-text-to-clip-vision_precision_at_k' Value: 0.04035 __main__.py:206 + DEBUG Metric: 'clip-text-to-clip-vision_hit_at_k' Value: 0.79200 __main__.py:206 + DEBUG Metric: 'clip-text-to-clip-vision_average_precision' Value: 0.41681 __main__.py:206 + DEBUG Metric: 'clip-text-to-clip-vision_reciprocal_rank' Value: 0.41773 __main__.py:206 + DEBUG Metric: 'clip-text-to-clip-vision_dcg_at_k' Value: 0.57113 __main__.py:206 + INFO Building the artifact ... __main__.py:208 + INFO Pushing artifact to Jina AI Cloud ... 
__main__.py:234 +[08:02:33] INFO Artifact pushed under ID '63b52b5b3278416c15353bf3' __main__.py:236 + DEBUG Artifact size is 2599.190 MB __main__.py:238 + INFO Finished 🚀 __main__.py:239 ``` The generated plots should look like this: -![WandB-mclip](https://finetuner.jina.ai/_images/WandB-mclip.png) +![WandB-mclip](https://user-images.githubusercontent.com/6599259/212645881-20071aba-8643-4878-bc53-97eb6f766bf0.png) @@ -176,7 +188,6 @@ let's use the fine-tuned model to encode a new `Document`: ```python id="fe43402f-4191-4343-905c-c75c64694662" -from docarray import Document, DocumentArray text_da = DocumentArray([Document(text='setwas Text zum Codieren')]) image_da = DocumentArray([Document(uri='https://upload.wikimedia.org/wikipedia/commons/4/4e/Single_apple.png')]) @@ -215,61 +226,13 @@ We can directly compare the results of our fine-tuned model with an untrained mu -```python -from finetuner import build_model - -pt_query = copy.deepcopy(query_data) -pt_index = copy.deepcopy(index_data) - -ft_query = copy.deepcopy(query_data) -ft_index = copy.deepcopy(index_data) - -zero_shot_text_encoder = build_model( - name='xlm-roberta-base-ViT-B-32::laion5b_s13b_b90k', - select_model='clip-text', -) -zero_shot_image_encoder = build_model( - name='xlm-roberta-base-ViT-B-32::laion5b_s13b_b90k', - select_model='clip-vision', -) - -finetuner.encode(model=zero_shot_text_encoder, data=pt_query) -finetuner.encode(model=zero_shot_image_encoder, data=pt_index) - -finetuner.encode(model=mclip_text_encoder, data=ft_query) -finetuner.encode(model=mclip_image_encoder, data=ft_index) - -pt_query.match(pt_index) -ft_query.match(ft_index) - -def plot_matches(num_samples = 10): - seen = set() - for i, (pt_q, ft_q) in enumerate(zip(pt_query, ft_query)): - if i >= num_samples: break - if pt_q.text in seen: - i = i - 1 - continue - seen.add(pt_q.text) - print(( - f'results for query "{pt_q.text}"' - ' using a zero-shot model (top) and ' - 'the fine-tuned model (bottom):' - )) - pt_q.matches[:1].plot_image_sprites(fig_size=(3,3)) - ft_q.matches[:1].plot_image_sprites(fig_size=(3,3)) -``` -```plaintext -results for query: "externe mikrofone (external microphone)" using a zero-shot model (top) and the fine-tuned model (bottom) -``` -![mclip-example-pt-1](https://finetuner.jina.ai/_images/mclip-example-pt-1.png) -![mclip-example-ft-1](https://finetuner.jina.ai/_images/mclip-example-ft-1.png) - ```plaintext -results for query: "prozessorlüfter (processor fan)" using a zero-shot model (top) and the fine-tuned model (bottom) +results for query: "Spitzen-Midirock Teilfutter Schwarz" (Lace midi skirt partial lining black) using a zero-shot model and the fine-tuned model ``` -![mclip-example-pt-2](https://finetuner.jina.ai/_images/mclip-example-pt-2.png) -![mclip-example-ft-2](https://finetuner.jina.ai/_images/mclip-example-ft-2.png) +before | after +:-------------------------:|:-------------------------: +![mclip-example-pt-1](https://jina-ai-gmbh.ghost.io/content/images/2022/12/mclip-before.png) | ![mclip-example-ft-1](https://jina-ai-gmbh.ghost.io/content/images/2022/12/mclip-after.png) diff --git a/docs/notebooks/text_to_image.ipynb b/docs/notebooks/text_to_image.ipynb index c936ba803..6cb68c1e1 100644 --- a/docs/notebooks/text_to_image.ipynb +++ b/docs/notebooks/text_to_image.ipynb @@ -371,61 +371,13 @@ { "cell_type": "markdown", "source": [ - "```python\n", - "import copy\n", - "from finetuner import build_model\n", - "\n", - "pt_query = copy.deepcopy(query_data)\n", - "pt_index = copy.deepcopy(index_data)\n", - "\n", - 
"ft_query = copy.deepcopy(query_data)\n", - "ft_index = copy.deepcopy(index_data)\n", - "\n", - "zero_shot_text_encoder = build_model(\n", - " name='openai/clip-vit-base-patch32',\n", - " select_model='clip-text',\n", - ")\n", - "zero_shot_image_encoder = build_model(\n", - " name='openai/clip-vit-base-patch32',\n", - " select_model='clip-vision',\n", - ")\n", - "\n", - "finetuner.encode(model=zero_shot_text_encoder, data=pt_query)\n", - "finetuner.encode(model=zero_shot_image_encoder, data=pt_index)\n", - "\n", - "finetuner.encode(model=clip_text_encoder, data=ft_query)\n", - "finetuner.encode(model=clip_image_encoder, data=ft_index)\n", - "\n", - "pt_query.match(pt_index)\n", - "ft_query.match(ft_index)\n", - "\n", - "def plot_matches(num_samples = 5):\n", - " seen = set()\n", - " for i, (pt_q, ft_q) in enumerate(zip(pt_query, ft_query)):\n", - " if i > num_samples: break\n", - " if pt_q.text in seen:\n", - " continue\n", - " seen.add(pt_q.text)\n", - " print((\n", - " f'results for query \"{pt_q.text}\"'\n", - " ' using a zero-shot model (top) and '\n", - " 'the fine-tuned model (bottom):'\n", - " ))\n", - " pt_q.matches[:4].plot_image_sprites(fig_size=(3,3))\n", - " ft_q.matches[:4].plot_image_sprites(fig_size=(3,3))\n", - " \n", - "plot_matches()\n", - "\n", - "\n", - "```\n", - "\n", "\n", "```plaintext\n", "Results for query: \"nightingale tee jacket\" using a zero-shot model (top) and the fine-tuned model (bottom)\n", "```\n", - "![clip-example-pt](https://finetuner.jina.ai/_images/clip-example-pt.png)\n", + "![clip-example-pt](https://user-images.githubusercontent.com/6599259/212634395-6f336d39-cda7-425d-80a2-10facae3b824.png)\n", "\n", - "![clip-example-ft](https://finetuner.jina.ai/_images/clip-example-ft.png)\n" + "![clip-example-ft](https://user-images.githubusercontent.com/6599259/212634112-a44c6c4c-2cc1-4dfb-8e29-0d02b2d6b95c.png)\n" ], "metadata": { "id": "C30UVpHDX4HF" diff --git a/docs/notebooks/text_to_image.md b/docs/notebooks/text_to_image.md index ff7ca0962..d23d4c8fc 100644 --- a/docs/notebooks/text_to_image.md +++ b/docs/notebooks/text_to_image.md @@ -248,60 +248,12 @@ We can directly compare the results of our fine-tuned model with a pre-trained c -```python -import copy -from finetuner import build_model - -pt_query = copy.deepcopy(query_data) -pt_index = copy.deepcopy(index_data) - -ft_query = copy.deepcopy(query_data) -ft_index = copy.deepcopy(index_data) - -zero_shot_text_encoder = build_model( - name='openai/clip-vit-base-patch32', - select_model='clip-text', -) -zero_shot_image_encoder = build_model( - name='openai/clip-vit-base-patch32', - select_model='clip-vision', -) - -finetuner.encode(model=zero_shot_text_encoder, data=pt_query) -finetuner.encode(model=zero_shot_image_encoder, data=pt_index) - -finetuner.encode(model=clip_text_encoder, data=ft_query) -finetuner.encode(model=clip_image_encoder, data=ft_index) - -pt_query.match(pt_index) -ft_query.match(ft_index) - -def plot_matches(num_samples = 5): - seen = set() - for i, (pt_q, ft_q) in enumerate(zip(pt_query, ft_query)): - if i > num_samples: break - if pt_q.text in seen: - continue - seen.add(pt_q.text) - print(( - f'results for query "{pt_q.text}"' - ' using a zero-shot model (top) and ' - 'the fine-tuned model (bottom):' - )) - pt_q.matches[:4].plot_image_sprites(fig_size=(3,3)) - ft_q.matches[:4].plot_image_sprites(fig_size=(3,3)) - -plot_matches() - - -``` - ```plaintext Results for query: "nightingale tee jacket" using a zero-shot model (top) and the fine-tuned model (bottom) ``` 
-![clip-example-pt](https://finetuner.jina.ai/_images/clip-example-pt.png) +![clip-example-pt](https://user-images.githubusercontent.com/6599259/212634395-6f336d39-cda7-425d-80a2-10facae3b824.png) -![clip-example-ft](https://finetuner.jina.ai/_images/clip-example-ft.png) +![clip-example-ft](https://user-images.githubusercontent.com/6599259/212634112-a44c6c4c-2cc1-4dfb-8e29-0d02b2d6b95c.png) diff --git a/docs/notebooks/text_to_text.ipynb b/docs/notebooks/text_to_text.ipynb index 4fbfc92f6..05463294e 100644 --- a/docs/notebooks/text_to_text.ipynb +++ b/docs/notebooks/text_to_text.ipynb @@ -30,17 +30,6 @@ "!pip install 'finetuner[full]'" ] }, - { - "cell_type": "code", - "source": [ - "!pip install git+https://github.com/jina-ai/jina-hubble-sdk.git@fix-post-success-type" - ], - "metadata": { - "id": "cu1HRyCAlH_A" - }, - "execution_count": null, - "outputs": [] - }, { "cell_type": "markdown", "metadata": { @@ -351,42 +340,7 @@ "## Before and after\n", "We can directly compare the results of our fine-tuned model with its zero-shot counterpart to get a better idea of how finetuning affects the results of a search. While the zero-shot model is able to produce results that are very similar to the initial query, it is common for the topic of the question to change, with the structure staying the same. After fine-tuning, the returned questions are consistently relevant to the initial query, even in cases where the structure of the sentence is different.\n", "\n", - "```python\n", - "import copy\n", - "\n", - "query_pt = DocumentArray.pull('finetuner/quora-test-query-da', show_progress=True)\n", - "index_pt = DocumentArray.pull('finetuner/quora-test-index-da', show_progress=True)\n", - "\n", - "query_ft = copy.deepcopy(query_pt)\n", - "index_ft = copy.deepcopy(index_pt)\n", - "\n", - "model_pt = finetuner.build_model('bert-base-cased')\n", - "\n", - "finetuner.encode(model=model, data=query_ft)\n", - "finetuner.encode(model=model, data=index_ft)\n", - "\n", - "finetuner.encode(model=model_pt, data=query_pt)\n", - "finetuner.encode(model=model_pt, data=index_pt)\n", - "\n", - "query_ft.match(index_ft)\n", - "query_py.match(index_pt)\n", - "\n", - "num_samples = 5\n", - "\n", - "for i, (doc_pt, doc_ft) in enumerate(zip(query_pt, query_ft)):\n", - " if i < num_samples:\n", - " print(f'\\nQuery: {doc_ft.text}')\n", - " print(' matches pretrained:')\n", - " for match in doc_pt.matches[:5]:\n", - " print(f' - {match.text}')\n", - " print(' matches finetuned')\n", - " for match in doc_ft.matches[:5]:\n", - " print(f' - {match.text}')\n", - "\n", - "```\n", - "\n", "```plaintext\n", - "\n", "Query: What's the best way to start learning robotics?\n", " matches pretrained:\n", " - What is the best way to start with robotics?\n", diff --git a/docs/notebooks/text_to_text.md b/docs/notebooks/text_to_text.md index fffe98052..081bb247d 100644 --- a/docs/notebooks/text_to_text.md +++ b/docs/notebooks/text_to_text.md @@ -29,10 +29,6 @@ This guide will lead you through an example use-case to show you how Finetuner c !pip install 'finetuner[full]' ``` -```python id="cu1HRyCAlH_A" -!pip install git+https://github.com/jina-ai/jina-hubble-sdk.git@fix-post-success-type -``` - ## Task @@ -234,42 +230,7 @@ query.match(index_data, limit=10, metric='cosine') ## Before and after We can directly compare the results of our fine-tuned model with its zero-shot counterpart to get a better idea of how finetuning affects the results of a search. 
While the zero-shot model is able to produce results that are very similar to the initial query, it is common for the topic of the question to change, with the structure staying the same. After fine-tuning, the returned questions are consistently relevant to the initial query, even in cases where the structure of the sentence is different.
 
-```python
-import copy
-
-query_pt = DocumentArray.pull('finetuner/quora-test-query-da', show_progress=True)
-index_pt = DocumentArray.pull('finetuner/quora-test-index-da', show_progress=True)
-
-query_ft = copy.deepcopy(query_pt)
-index_ft = copy.deepcopy(index_pt)
-
-model_pt = finetuner.build_model('bert-base-cased')
-
-finetuner.encode(model=model, data=query_ft)
-finetuner.encode(model=model, data=index_ft)
-
-finetuner.encode(model=model_pt, data=query_pt)
-finetuner.encode(model=model_pt, data=index_pt)
-
-query_ft.match(index_ft)
-query_py.match(index_pt)
-
-num_samples = 5
-
-for i, (doc_pt, doc_ft) in enumerate(zip(query_pt, query_ft)):
-    if i < num_samples:
-        print(f'\nQuery: {doc_ft.text}')
-        print('  matches pretrained:')
-        for match in doc_pt.matches[:5]:
-            print(f'  - {match.text}')
-        print('  matches finetuned')
-        for match in doc_ft.matches[:5]:
-            print(f'  - {match.text}')
-
-```
-
 ```plaintext
-
 Query: What's the best way to start learning robotics?
   matches pretrained:
   - What is the best way to start with robotics?
diff --git a/docs/walkthrough/index.md b/docs/walkthrough/index.md
index c80b5c104..035bab24e 100644
--- a/docs/walkthrough/index.md
+++ b/docs/walkthrough/index.md
@@ -1,4 +1,4 @@
-# Walkthrough
+# {octicon}`list-ordered` Walkthrough
 
 Why do I need Finetuner?
@@ -72,6 +72,5 @@ create-training-data
 choose-backbone
 run-job
 save-model
-using-callbacks
-integrate-with-jina
+inference
 ```
\ No newline at end of file
diff --git a/docs/walkthrough/inference.md b/docs/walkthrough/inference.md
new file mode 100644
index 000000000..6a42056a7
--- /dev/null
+++ b/docs/walkthrough/inference.md
@@ -0,0 +1,177 @@
+# Inference
+
+Once fine-tuning is finished, it's time to actually use the model.
+You can use the fine-tuned models directly to encode [DocumentArray](https://docarray.jina.ai/) objects or to set up an encoding service.
+When encoding, data can also be provided as a regular list.
+
+```{admonition} Use FinetunerExecutor inside a Jina Flow
+:class: hint
+Finetuner offers the {func}`~finetuner.encode` interface to embed your data locally.
+If you would like to use the fine-tuned model inside a Jina Flow as an Executor, check out
+{doc}`/advanced-topics/finetuner-executor`.
+```
+
+(integrate-with-list)=
+## Encoding a List
+Data that is stored in a regular list can be embedded in the same way as a [DocumentArray](https://docarray.jina.ai/). Since the modality of your input data can be inferred from the model being used, there is no need to provide any additional information besides the content you want to encode. 
When providing data as a list, the `finetuner.encode` method will return a `np.ndarray` of embeddings, instead of a `docarray.DocumentArray`: + +````{tab} Artifact id and token +```python +import finetuner + +finetuner.login() + +token = finetuner.get_token() +run = finetuner.get_run( + experiment_name='YOUR-EXPERIMENT', + run_name='YOUR-RUN' +) + +model = finetuner.get_model( + run.artifact_id, + token=token, +) + +texts = ['some text to encode'] + +embeddings = finetuner.encode(model=model, data=texts) + +for text, embedding in zip(texts, embeddings): + print(f'Text of the returned document: {text}') + print(f'Shape of the embedding: {embedding.shape}') +``` +```` +````{tab} Locally saved artifact +```python +import finetuner + +model = finetuner.get_model('/path/to/YOUR-MODEL.zip') + +texts = ['some text to encode'] + +embeddings = finetuner.encode(model=model, data=texts) + +for text, embedding in zip(texts, embeddings): + print(f'Text of the returned document: {text}') + print(f'Shape of the embedding: {embedding.shape}') +``` +```` +````{tab} (Special case) CLIP inference +```python +import finetuner + +finetuner.login() + +token = finetuner.get_token() +run = finetuner.get_run( + experiment_name='YOUR-EXPERIMENT', + run_name='YOUR-RUN' +) + +model = finetuner.get_model( + run.artifact_id, + token=token, + select_model='clip-text' # use `clip-vision` to encode image. +) + +texts = ['some text to encode'] +embeddings = finetuner.encode(model=model, data=texts) + +for text, embedding in zip(texts, embeddings): + print(f'Text of the returned document: {text}') + print(f'Shape of the embedding: {embedding.shape}') +``` +```` + + +```{admonition} Inference with ONNX +:class: tip +In case you set `to_onnx=True` when calling `finetuner.fit` function, +please use `model = finetuner.get_model('/path/to/YOUR-MODEL.zip', is_onnx=True)` +``` + +```{admonition} Encoding other Modalities +:class: tip +Of course you can not only encode texts. 
+For encoding a list of images, you can provide uris, e.g., +`embeddings = finetuner.encode(model=model, data=['path/to/apple.png'])` +``` + +(integrate-with-docarray)= +## Encoding a DocumentArray + +To embed a [DocumentArray](https://docarray.jina.ai/) with a fine-tuned model, you can get the model of your Run via the {func}`~finetuner.get_model` function and embed it via the {func}`finetuner.encode` function: + +````{tab} Artifact id and token +```python +from docarray import DocumentArray, Document +import finetuner + +finetuner.login() + +token = finetuner.get_token() +run = finetuner.get_run( + experiment_name='YOUR-EXPERIMENT', + run_name='YOUR-RUN' +) + +model = finetuner.get_model( + run.artifact_id, + token=token, +) + +da = DocumentArray([Document(text='some text to encode')]) +finetuner.encode(model=model, data=da) + +for doc in da: + print(f'Text of the returned document: {doc.text}') + print(f'Shape of the embedding: {doc.embedding.shape}') +``` +```` +````{tab} Locally saved artifact +```python +from docarray import DocumentArray, Document +import finetuner + +model = finetuner.get_model('/path/to/YOUR-MODEL.zip') + +da = DocumentArray([Document(text='some text to encode')]) +finetuner.encode(model=model, data=da) + +for doc in da: + print(f'Text of the returned document: {doc.text}') + print(f'Shape of the embedding: {doc.embedding.shape}') +``` +```` +````{tab} (Special case) CLIP inference +```python +from docarray import DocumentArray, Document +import finetuner + +finetuner.login() + +token = finetuner.get_token() +run = finetuner.get_run( + experiment_name='YOUR-EXPERIMENT', + run_name='YOUR-RUN' +) + +model = finetuner.get_model( + run.artifact_id, + token=token, + select_model='clip-text' # use `clip-vision` to encode image. +) + +da = DocumentArray([Document(text='some text to encode')]) +finetuner.encode(model=model, data=da) + +for doc in da: + print(f'Text of the returned document: {doc.text}') + print(f'Shape of the embedding: {doc.embedding.shape}') +``` +```` + +```console +Text of the returned document: some text to encode +Shape of the embedding: (768,) +``` diff --git a/docs/walkthrough/run-job.md b/docs/walkthrough/run-job.md index a0bcceb41..1231862fb 100644 --- a/docs/walkthrough/run-job.md +++ b/docs/walkthrough/run-job.md @@ -111,35 +111,13 @@ Otherwise, it could happen, that your model overfits on the training data and fo Similarly, two or three epochs (number of passes thorough the training data) are often enough for a fine-tuning job. ``` -### Configuration of the miner -To filter the instances in a batch that are used to calculate the loss, you can use miners. -Finetuner allows you to use miners provided by the [PyTorch Metric Learning](https://kevinmusgrave.github.io/pytorch-metric-learning) framework. -To select a specific miner, you can pass its name to the fit function, e.g., `AngularMiner`, `TripletMarginMiner`, ... - -Please note that the miner has to be compatible with the loss function you selected. -For instance, if you choose to train a model with the `TripleMarginLoss`, you can use the `TripletMarginMiner`. -While without this miner, all possible triples with an anchor, a positive, and a negative candidate are constructed, the miner reduces this set of triples. -Usually, only triples with hard negatives are selected where the distance between the positive and the negative example is inside a margin of `0.2`. 
-If you want to pass additional parameters to configure the miner, you can specify the `miner_options` parameter of the fit function. -The example below shows how to apply hard-negative mining: - -```diff -run = finetuner.fit( - ..., - loss='TripleMarginLoss', -+ miner='TripletMarginMiner', -+ miner_options={'margin': 0.3, 'type_of_triplets': 'hard'} -) -``` - -The possible choices `type_of_triplets` are: - -+ `all`: Use all triplets, identical to no mining. -+ `easy`: Use all easy triplets, all triplets that do not violate the margin. -+ `semihard`: Use semi-hard triplets, the negative is further from the anchor than the positive. -+ `hard`: Use hard triplets, the negative is closer to the anchor than the positive. - -Finetuner takes `TripleMarginLoss` as default loss function with no negative mining. -A detailed description of the miners and their parameters is specified in the [PyTorch Metric Learning documentation](https://kevinmusgrave.github.io/pytorch-metric-learning/miners/). +### Construction of Training Batches +The training of your model is done in batches. +The `batch_size` parameter determines the number of items per batch. +Finetuner constructs batches so that each batch contains the same number of classes and +as many items per class as configured via the `num_items_per_class` parameter. +However, if it is not possible, e.g., because `batch_size` is not dividable by +`num_items_per_class` or the training dataset does not contain enough classes, +Finetuner tries to choose a similar value for `num_items_per_class` which is working. \ No newline at end of file diff --git a/finetuner/__init__.py b/finetuner/__init__.py index 327576e49..fce5d4e2e 100644 --- a/finetuner/__init__.py +++ b/finetuner/__init__.py @@ -38,7 +38,8 @@ def login(force: bool = False, interactive: Optional[bool] = None): """ - Login to Jina AI to use cloud-based fine-tuning. Thereby, an authentication token is + Login to Jina AI Cloud to use cloud-based fine-tuning. + Thereby, an authentication token is generated which can be read with the :func:`~finetuner.get_token` function. :param force: If set to true, an existing token will be overwritten. Otherwise, @@ -65,7 +66,7 @@ def _build_name_stub_map() -> Dict[str, model_stub.ModelStubType]: def list_models() -> List[str]: - """List available models for training.""" + """List available models.""" return [name for name in list_model_classes()] @@ -91,7 +92,8 @@ def list_model_options() -> Dict[str, List[Dict[str, Any]]]: def describe_models(task: Optional[str] = None) -> None: - """Describe available models in a table. + """Print model information, such as name, task, output dimension, architecture + and description as a table. :param task: The task for the backbone model, one of `text-to-text`, `text-to-image`, `image-to-image`. If not provided, will print all backbone @@ -130,7 +132,8 @@ def fit( public: bool = False, num_items_per_class: int = 4, ) -> Run: - """Start a finetuner run! + """Create a Finetuner :class:`Run`, calling this function will submit a fine-tuning + job to the Jina AI Cloud. :param model: The name of model to be fine-tuned. Run `finetuner.list_models()` or `finetuner.describe_models()` to see the available model names. @@ -244,14 +247,14 @@ def fit( def get_run(run_name: str, experiment_name: Optional[str] = None) -> Run: - """Get run by its name and (optional) experiment. + """Get a :class:`Run` by its name and (optional) :class:`Experiment` name. If an experiment name is not specified, we'll look for the run in the default experiment. 
- :param run_name: Name of the run. - :param experiment_name: Optional name of the experiment. - :return: A `Run` object. + :param run_name: Name of the :class:`Run`. + :param experiment_name: Optional name of the :class:`Experiment`. + :return: A :class:`Run` object. """ return ft.get_run(run_name=run_name, experiment_name=experiment_name) @@ -259,13 +262,14 @@ def get_run(run_name: str, experiment_name: Optional[str] = None) -> Run: def list_runs( experiment_name: Optional[str] = None, page: int = 1, size: int = 50 ) -> List[Run]: - """List all created runs inside a given experiment. + """List all created :class:`Run` inside a given :class:`Experiment`. - If no experiment is specified, list runs for all available experiments. - :param experiment_name: The name of the experiment. + If no :class:`Experiment` is specified, list :class:`Run` for all available + :class:`Experiment`. + :param experiment_name: The name of the :class:`Experiment`. :param page: The page index. - :param size: Number of runs to retrieve. - :return: List of all runs. + :param size: Number of :class:`Run` to retrieve. + :return: List of all :class:`Run`. ..note:: `page` and `size` works together. For example, page 1 size 50 gives the 50 runs in the first page. To get 50-100, set `page` as 2. @@ -275,7 +279,8 @@ def list_runs( def delete_run(run_name: str, experiment_name: Optional[str] = None) -> None: - """Delete a run. + """Delete a :class:`Run` given a `run_name` and + optional `experiment_name`. If an experiment name is not specified, we'll look for the run in the default experiment. @@ -287,7 +292,7 @@ def delete_run(run_name: str, experiment_name: Optional[str] = None) -> None: def delete_runs(experiment_name: Optional[str] = None) -> None: - """Delete every run. + """Delete all :class:`Run` given an optional `experiment_name`. If an experiment name is not specified, we'll delete every run across all experiments. @@ -299,7 +304,7 @@ def delete_runs(experiment_name: Optional[str] = None) -> None: def create_experiment(name: str = 'default') -> Experiment: - """Create an experiment. + """Create an :class:`Experiment`. :param name: The name of the experiment. If not provided, the experiment is named as `default`. @@ -309,7 +314,7 @@ def create_experiment(name: str = 'default') -> Experiment: def get_experiment(name: str) -> Experiment: - """Get an experiment by its name. + """Get an :class:`Experiment` given a `name`. :param name: Name of the experiment. :return: An `Experiment` object. @@ -318,7 +323,7 @@ def get_experiment(name: str) -> Experiment: def list_experiments(page: int = 1, size: int = 50) -> List[Experiment]: - """List every experiment. + """List all :class:`Experiment`. :param page: The page index. :param size: The number of experiments to retrieve. @@ -332,7 +337,8 @@ def list_experiments(page: int = 1, size: int = 50) -> List[Experiment]: def delete_experiment(name: str) -> Experiment: - """Delete an experiment by its name. + """Delete an :class:`Experiment` given a `name`. + :param name: Name of the experiment. View your experiment names with `list_experiments()`. :return: Deleted experiment. @@ -341,14 +347,14 @@ def delete_experiment(name: str) -> Experiment: def delete_experiments() -> List[Experiment]: - """Delete every experiment. + """Delete all :class:`Experiment`. :return: List of deleted experiments. """ return ft.delete_experiments() def get_token() -> str: - """Get user token of jina ecosystem. + """Get user token from the Jina AI Cloud, :meth:`login` is required. 
:return: user token as string object. """ @@ -364,7 +370,7 @@ def build_model( is_onnx: bool = False, ) -> 'InferenceEngine': """ - Builds a pre-trained model from a given descriptor. + Builds a pre-trained model given a `name`. :param name: Refers to a pre-trained model, see https://finetuner.jina.ai/walkthrough/choose-backbone/ or use the @@ -489,7 +495,8 @@ def encode( data: Union[DocumentArray, List[str]], batch_size: int = 32, ) -> Union[DocumentArray, 'np.ndarray']: - """Preprocess, collate and encode the `DocumentArray` with embeddings. + """Preprocess, collate and encode the `list or :class:`DocumentArray` + with embeddings. :param model: The model to be used to encode `DocumentArray`. In this case an instance of `ONNXRuntimeInferenceEngine` or `TorchInferenceEngine` diff --git a/finetuner/experiment.py b/finetuner/experiment.py index 44e2bd8b0..49e1faeb2 100644 --- a/finetuner/experiment.py +++ b/finetuner/experiment.py @@ -64,14 +64,16 @@ def __init__( @property def name(self) -> str: + """Get the name of the :class:`Experiment`.""" return self._name @property def status(self) -> str: + """Get the status of the :class:`Experiment`.""" return self._status def get_run(self, name: str) -> Run: - """Get a run by its name. + """Get a :class:`Run` given a `name`. :param name: Name of the run. :return: A `Run` object. @@ -88,7 +90,7 @@ def get_run(self, name: str) -> Run: return run def list_runs(self, page: int = 50, size: int = 50) -> List[Run]: - """List every run. + """List all :class:`Run`. :param page: The page index. :param size: The number of runs to retrieve per page. @@ -114,14 +116,14 @@ def list_runs(self, page: int = 50, size: int = 50) -> List[Run]: ] def delete_run(self, name: str): - """Delete a run by its name. + """Delete a :class:`Run` by its name. :param name: Name of the run. """ self._client.delete_run(experiment_name=self._name, run_name=name) def delete_runs(self): - """Delete every run inside the experiment.""" + """Delete all :class:`Run` inside the :class:`Experiment`.""" self._client.delete_runs(experiment_name=self._name) def create_run( @@ -133,7 +135,7 @@ def create_run( csv_options: Optional[Dict[str, Any]] = None, **kwargs, ) -> Run: - """Create a run inside the experiment.""" + """Create a :class:`Run` inside the :class:`Experiment`.""" if not run_name: run_name = get_random_name() @@ -212,7 +214,7 @@ def _create_config_for_run( run_name: str, **kwargs, ) -> Dict[str, Any]: - """Create config for a run. + """Create config for a :class:`Run`. :param model: Name of the model to be fine-tuned. :param train_data: Either a `DocumentArray` for training data or a diff --git a/finetuner/run.py b/finetuner/run.py index 2449fd33e..0e1eed375 100644 --- a/finetuner/run.py +++ b/finetuner/run.py @@ -45,10 +45,12 @@ def __init__( @property def name(self) -> str: + """Get the name of the :class:`Run`.""" return self._name @property def config(self) -> dict: + """Get the config of the :class:`Run`.""" return self._config def _get_run(self) -> dict: @@ -58,16 +60,16 @@ def _get_run(self) -> dict: ) def status(self) -> dict: - """Run status. + """Get :class:`Run` status. - :returns: A string representing the run status. + :returns: A dict representing the :class:`Run` status. """ return self._client.get_run_status( experiment_name=self._experiment_name, run_name=self._name ) def logs(self) -> str: - """Check the run logs. + """Check the :class:`Run` logs. :returns: A string dump of the run logs. 
""" @@ -77,7 +79,7 @@ def logs(self) -> str: ) def stream_logs(self, interval: int = 5) -> Iterator[str]: - """Stream the run logs. + """Stream the :class:`Run` logs lively. :param interval: The time interval to sync the status of finetuner `Run`. :yield: An iterators keep stream the logs from server. @@ -117,7 +119,7 @@ def _check_run_status_started(self): ) def save_artifact(self, directory: str = ARTIFACTS_DIR) -> str: - """Save artifact if the run is finished. + """Save artifact if the :class:`Run` is finished. :param directory: Directory where the artifact will be stored. :returns: A string object that indicates the download path. @@ -132,7 +134,7 @@ def save_artifact(self, directory: str = ARTIFACTS_DIR) -> str: @property def artifact_id(self): - """Get artifact id from the run. + """Get artifact id of the :class:`Run`. An artifact in finetuner contains fine-tuned model and its metadata. Such as preprocessing function, collate function. This id could be useful