From fb9296d5756b1118dd48cf06d22dd8459e379c32 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Michael=20G=C3=BCnther?= Date: Thu, 12 Jan 2023 14:17:41 +0100 Subject: [PATCH] docs: add namespace to artifact names (#649) --- CHANGELOG.md | 2 + README.md | 6 +- docs/notebooks/image_to_image.ipynb | 272 +++--- docs/notebooks/image_to_image.md | 32 +- docs/notebooks/mesh_to_mesh.ipynb | 12 +- docs/notebooks/mesh_to_mesh.md | 12 +- .../multilingual_text_to_image.ipynb | 858 +++++++++--------- docs/notebooks/multilingual_text_to_image.md | 92 +- docs/notebooks/text_to_image.ipynb | 265 +++--- docs/notebooks/text_to_image.md | 33 +- docs/notebooks/text_to_text.ipynb | 49 +- docs/notebooks/text_to_text.md | 28 +- docs/walkthrough/using-callbacks.md | 14 +- 13 files changed, 859 insertions(+), 816 deletions(-) diff --git a/CHANGELOG.md b/CHANGELOG.md index edcf87d03..9ef361063 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -48,6 +48,8 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 - Add documentation for PointNet++ model and handling 3D mesh dataset. ([#638](https://github.com/jina-ai/finetuner/pull/638)) +- Add `finetuner` namespace to artifact names in the documentation. ([#649](https://github.com/jina-ai/finetuner/pull/649)) + ## [0.6.7] - 2022-11-25 diff --git a/README.md b/README.md index 1c346e05e..d2fc06ffe 100644 --- a/README.md +++ b/README.md @@ -162,11 +162,11 @@ finetuner.login() run = finetuner.fit( model='resnet50', run_name='resnet50-tll-run', - train_data='tll-train-data', + train_data='finetuner/tll-train-data', callbacks=[ EvaluationCallback( - query_data='tll-test-query-data', - index_data='tll-test-index-data', + query_data='finetuner/tll-test-query-data', + index_data='finetuner/tll-test-index-data', ) ], ) diff --git a/docs/notebooks/image_to_image.ipynb b/docs/notebooks/image_to_image.ipynb index 15d108e09..896dd21bb 100644 --- a/docs/notebooks/image_to_image.ipynb +++ b/docs/notebooks/image_to_image.ipynb @@ -1,10 +1,22 @@ { + "nbformat": 4, + "nbformat_minor": 0, + "metadata": { + "colab": { + "provenance": [] + }, + "kernelspec": { + "name": "python3", + "display_name": "Python 3" + }, + "language_info": { + "name": "python" + }, + "accelerator": "GPU" + }, "cells": [ { "cell_type": "markdown", - "metadata": { - "id": "p8jc8EyfruKw" - }, "source": [ "# Image-to-Image Search via ResNet50\n", "\n", @@ -17,24 +29,24 @@ "*Note, please consider switching to GPU/TPU Runtime for faster inference.*\n", "\n", "## Install" - ] + ], + "metadata": { + "id": "p8jc8EyfruKw" + } }, { "cell_type": "code", - "execution_count": null, + "source": [ + "!pip install 'finetuner[full]'" + ], "metadata": { "id": "VdKH0S0FrwS3" }, - "outputs": [], - "source": [ - "!pip install 'finetuner[full]'" - ] + "execution_count": null, + "outputs": [] }, { "cell_type": "markdown", - "metadata": { - "id": "7EliQdGCsdL0" - }, "source": [ "## Task\n", "\n", @@ -44,13 +56,13 @@ "The dataset consists of pairs of images, these are the positive pairs. Negative pairs are constructed by taking two different images, i.e. images that are not in the same pair initially. Following this approach, we construct triplets and use the `TripletLoss`. You can find more in the [how Finetuner works](https://finetuner.jina.ai/get-started/how-it-works/#contrastive-metric-learning) section.\n", "\n", "After fine-tuning, the embeddings of positive pairs are expected to be pulled closer, while the embeddings for negative pairs are expected to be pushed away." 
- ] + ], + "metadata": { + "id": "7EliQdGCsdL0" + } }, { "cell_type": "markdown", - "metadata": { - "id": "M1sii3xdtD2y" - }, "source": [ "## Data\n", "\n", @@ -61,92 +73,92 @@ "We don't require you to push data to the Jina AI Cloud by yourself. Instead of a name, you can provide a `DocumentArray` and Finetuner will do the job for you.\n", "When working with documents where images are stored locally, please call `doc.load_uri_to_blob()` to reduce network transmission and speed up training.\n", "```" - ] + ], + "metadata": { + "id": "M1sii3xdtD2y" + } }, { "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "L0NfPGbTkNsc" - }, - "outputs": [], "source": [ "import finetuner\n", "from docarray import DocumentArray, Document\n", "\n", "finetuner.login(force=True)" - ] + ], + "metadata": { + "id": "L0NfPGbTkNsc" + }, + "execution_count": null, + "outputs": [] }, { "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "ONpXDwFBsqQS" - }, - "outputs": [], "source": [ - "train_data = DocumentArray.pull('tll-train-data', show_progress=True)\n", - "query_data = DocumentArray.pull('tll-test-query-data', show_progress=True)\n", - "index_data = DocumentArray.pull('tll-test-index-data', show_progress=True)\n", + "train_data = DocumentArray.pull('finetuner/tll-train-data', show_progress=True)\n", + "query_data = DocumentArray.pull('finetuner/tll-test-query-data', show_progress=True)\n", + "index_data = DocumentArray.pull('finetuner/tll-test-index-data', show_progress=True)\n", "\n", "train_data.summary()" - ] + ], + "metadata": { + "id": "ONpXDwFBsqQS" + }, + "execution_count": null, + "outputs": [] }, { "cell_type": "markdown", - "metadata": { - "id": "mUoY1jq0klwk" - }, "source": [ "## Backbone model\n", "Now let's see which backbone models we can use. You can see available models by calling `finetuner.describe_models()`.\n", "\n", "\n", "For this example, we're gonna go with `resnet50`." - ] + ], + "metadata": { + "id": "mUoY1jq0klwk" + } }, { "cell_type": "markdown", - "metadata": { - "id": "xA7IIhIOk0h0" - }, "source": [ "## Fine-tuning\n", "\n", "Now that we have the training and evaluation datasets loaded as `DocumentArray`s and selected our model, we can start our fine-tuning run." - ] + ], + "metadata": { + "id": "xA7IIhIOk0h0" + } }, { "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "qGrHfz-2kVC7" - }, - "outputs": [], "source": [ "from finetuner.callback import EvaluationCallback\n", "\n", "run = finetuner.fit(\n", " model='resnet50',\n", - " train_data='tll-train-data',\n", + " train_data='finetuner/tll-train-data',\n", " batch_size=128,\n", " epochs=5,\n", " learning_rate=1e-4,\n", " device='cuda',\n", " callbacks=[\n", " EvaluationCallback(\n", - " query_data='tll-test-query-data',\n", - " index_data='tll-test-index-data',\n", + " query_data='finetuner/tll-test-query-data',\n", + " index_data='finetuner/tll-test-index-data',\n", " )\n", " ],\n", ")" - ] + ], + "metadata": { + "id": "qGrHfz-2kVC7" + }, + "execution_count": null, + "outputs": [] }, { "cell_type": "markdown", - "metadata": { - "id": "9gvoWipMlG5P" - }, "source": [ "Let's understand what this piece of code does:\n", "\n", @@ -157,37 +169,37 @@ "* We set `TripletMarginLoss`.\n", "* Additionally, we use `finetuner.callback.EvaluationCallback` for evaluation.\n", "* Lastly, we set the number of `epochs` and provide a `learning_rate`." 
- ] + ], + "metadata": { + "id": "9gvoWipMlG5P" + } }, { "cell_type": "markdown", - "metadata": { - "id": "7ftSOH_olcak" - }, "source": [ "## Monitoring\n", "\n", "Now that we've created a run, let's see its status. You can monitor the run by checking the status - `run.status()` - and the logs - `run.logs()` or `run.stream_logs()`. " - ] + ], + "metadata": { + "id": "7ftSOH_olcak" + } }, { "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "2k3hTskflI7e" - }, - "outputs": [], "source": [ "# note, the fine-tuning might takes 30~ minutes\n", "for entry in run.stream_logs():\n", " print(entry)" - ] + ], + "metadata": { + "id": "2k3hTskflI7e" + }, + "execution_count": null, + "outputs": [] }, { "cell_type": "markdown", - "metadata": { - "id": "N8O-Ms_El-lV" - }, "source": [ "Since some runs might take up to several hours, it's important to know how to reconnect to Finetuner and retrieve your runs.\n", "\n", @@ -199,13 +211,13 @@ "```\n", "\n", "You can continue monitoring the runs by checking the status - `finetuner.run.Run.status()` or the logs - `finetuner.run.Run.logs()`. " - ] + ], + "metadata": { + "id": "N8O-Ms_El-lV" + } }, { "cell_type": "markdown", - "metadata": { - "id": "BMpQxydypeZ3" - }, "source": [ "## Evaluating\n", "Currently, we don't have a user-friendly way to get evaluation metrics from the `finetuner.callback.EvaluationCallback` we initialized previously.\n", @@ -229,35 +241,35 @@ "[16:39:41] INFO Pushed model artifact ID: '62b33cb0037ad91ca7f20530' __main__.py:231\n", " INFO Finished 🚀 __main__.py:233 __main__.py:248\n", "```" - ] + ], + "metadata": { + "id": "BMpQxydypeZ3" + } }, { "cell_type": "markdown", - "metadata": { - "id": "0l4e4GrspilM" - }, "source": [ "## Saving\n", "\n", "After the run has finished successfully, you can download the tuned model on your local machine:\n" - ] + ], + "metadata": { + "id": "0l4e4GrspilM" + } }, { "cell_type": "code", - "execution_count": null, + "source": [ + "artifact = run.save_artifact('resnet-model')" + ], "metadata": { "id": "KzfxhqeCmCa8" }, - "outputs": [], - "source": [ - "artifact = run.save_artifact('resnet-model')" - ] + "execution_count": null, + "outputs": [] }, { "cell_type": "markdown", - "metadata": { - "id": "gkNHTyBkprQ0" - }, "source": [ "## Inference\n", "\n", @@ -268,15 +280,13 @@ "In case you set `to_onnx=True` when calling `finetuner.fit` function,\n", "please use `model = finetuner.get_model(artifact, is_onnx=True)`\n", "```" - ] + ], + "metadata": { + "id": "gkNHTyBkprQ0" + } }, { "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "bOi5qcNLplaI" - }, - "outputs": [], "source": [ "query = DocumentArray([query_data[0]])\n", "\n", @@ -286,41 +296,45 @@ "finetuner.encode(model=model, data=index_data)\n", "\n", "assert query.embeddings.shape == (1, 2048)" - ] + ], + "metadata": { + "id": "bOi5qcNLplaI" + }, + "execution_count": null, + "outputs": [] }, { "cell_type": "markdown", - "metadata": { - "id": "1cC46TQ9pw-H" - }, "source": [ "And finally you can use the embeded `query` to find top-k visually related images within `index_data` as follows:" - ] + ], + "metadata": { + "id": "1cC46TQ9pw-H" + } }, { "cell_type": "code", - "execution_count": null, + "source": [ + "query.match(index_data, limit=10, metric='cosine')" + ], "metadata": { "id": "tBYG9OKrpZ36" }, - "outputs": [], - "source": [ - "query.match(index_data, limit=10, metric='cosine')" - ] + "execution_count": null, + "outputs": [] }, { - "attachments": {}, "cell_type": "markdown", - "metadata": {}, "source": [ 
"## Before and after\n", - "We can directly compare the results of our fine-tuned model with its zero-shot counterpart to get a better idea of how finetuning affects the results of a search. While the differences between the two models may be subtle for some queries, some of the examples the examples below (such as the the second example) show that the model after fine-tuning is able to better match similar images." - ] + "We can directly compare the results of our fine-tuned model with its zero-shot counterpart to get a better idea of how finetuning affects the results of a search. While the differences between the two models may be subtle for some queries, some of the examples the examples below (such as the second example) show that the model after fine-tuning is able to better match similar images." + ], + "metadata": { + "id": "irvn0igWdLOf" + } }, { - "attachments": {}, "cell_type": "markdown", - "metadata": {}, "source": [ "```python\n", "import copy\n", @@ -355,49 +369,25 @@ " print(f'top match after fine-tuning:')\n", " display(Image.open(BytesIO(doc_ft.matches[0].blob)))\n", "```" - ] + ], + "metadata": { + "id": "cVVqC_vsdXlK" + } }, { - "attachments": {}, "cell_type": "markdown", - "metadata": {}, "source": [ "To save you some time, we have plotted some examples where the model's ability to return similar images has clearly improved:\n", "\n", - "![image-image-triplets-good](images/image-image-triplets-good.png)\n", + "![image-image-triplets-good](https://finetuner.jina.ai/_images/image-image-triplets-good.png)\n", "\n", - "On the other hand, there are also cases where the fine-tuned model performs worse, and fails to correctly match images that it previously could. This case is much rarer than the the previous case. For this dataset there were 108 occasions where the fine-tuned model returned the correct pair where it couldn't before, and oly 33 occasions where the the finetuned model returned an incorrect image after fine-tuning but returned a correct one before. Nevertheless it still can happen:\n", + "On the other hand, there are also cases where the fine-tuned model performs worse, and fails to correctly match images that it previously could. This case is much rarer than the previous case. For this dataset there were 108 occasions where the fine-tuned model returned the correct pair where it couldn't before, and only 33 occasions where the finetuned model returned an incorrect image after fine-tuning but returned a correct one before. 
Nevertheless it still can happen:\n", "\n", - "![image-image-triplets-bad](images/image-image-triplets-bad.png)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [] - } - ], - "metadata": { - "accelerator": "GPU", - "colab": { - "collapsed_sections": [], - "provenance": [] - }, - "kernelspec": { - "display_name": "Python 3.7.15 ('.venv': venv)", - "language": "python", - "name": "python3" - }, - "language_info": { - "name": "python", - "version": "3.7.15 (default, Oct 12 2022, 19:14:01) \n[GCC 11.2.0]" - }, - "vscode": { - "interpreter": { - "hash": "9ad9c14fbc5ce15e23594239b0b0bb7cf990b71472055d7d43822c20d61e1cff" + "![image-image-triplets-bad](https://finetuner.jina.ai/_images/image-image-triplets-bad.png)" + ], + "metadata": { + "id": "TwL33Jz1datD" } } - }, - "nbformat": 4, - "nbformat_minor": 0 -} + ] +} \ No newline at end of file diff --git a/docs/notebooks/image_to_image.md b/docs/notebooks/image_to_image.md index 7fb7ab689..4a0e8c49c 100644 --- a/docs/notebooks/image_to_image.md +++ b/docs/notebooks/image_to_image.md @@ -7,8 +7,7 @@ jupyter: format_version: '1.3' jupytext_version: 1.14.1 kernelspec: - display_name: 'Python 3.7.15 (''.venv'': venv)' - language: python + display_name: Python 3 name: python3 --- @@ -61,9 +60,9 @@ finetuner.login(force=True) ``` ```python id="ONpXDwFBsqQS" -train_data = DocumentArray.pull('tll-train-data', show_progress=True) -query_data = DocumentArray.pull('tll-test-query-data', show_progress=True) -index_data = DocumentArray.pull('tll-test-index-data', show_progress=True) +train_data = DocumentArray.pull('finetuner/tll-train-data', show_progress=True) +query_data = DocumentArray.pull('finetuner/tll-test-query-data', show_progress=True) +index_data = DocumentArray.pull('finetuner/tll-test-index-data', show_progress=True) train_data.summary() ``` @@ -87,15 +86,15 @@ from finetuner.callback import EvaluationCallback run = finetuner.fit( model='resnet50', - train_data='tll-train-data', + train_data='finetuner/tll-train-data', batch_size=128, epochs=5, learning_rate=1e-4, device='cuda', callbacks=[ EvaluationCallback( - query_data='tll-test-query-data', - index_data='tll-test-index-data', + query_data='finetuner/tll-test-query-data', + index_data='finetuner/tll-test-index-data', ) ], ) @@ -205,10 +204,12 @@ And finally you can use the embeded `query` to find top-k visually related image query.match(index_data, limit=10, metric='cosine') ``` + ## Before and after -We can directly compare the results of our fine-tuned model with its zero-shot counterpart to get a better idea of how finetuning affects the results of a search. While the differences between the two models may be subtle for some queries, some of the examples the examples below (such as the the second example) show that the model after fine-tuning is able to better match similar images. +We can directly compare the results of our fine-tuned model with its zero-shot counterpart to get a better idea of how finetuning affects the results of a search. While the differences between the two models may be subtle for some queries, some of the examples the examples below (such as the second example) show that the model after fine-tuning is able to better match similar images. 
+ - + ```python import copy from io import BytesIO @@ -244,13 +245,12 @@ for i, (doc_pt, doc_ft) in enumerate(zip(query_pt, query_ft)): ``` + To save you some time, we have plotted some examples where the model's ability to return similar images has clearly improved: -![image-image-triplets-good](images/image-image-triplets-good.png) - -On the other hand, there are also cases where the fine-tuned model performs worse, and fails to correctly match images that it previously could. This case is much rarer than the the previous case. For this dataset there were 108 occasions where the fine-tuned model returned the correct pair where it couldn't before, and oly 33 occasions where the the finetuned model returned an incorrect image after fine-tuning but returned a correct one before. Nevertheless it still can happen: - -![image-image-triplets-bad](images/image-image-triplets-bad.png) - +![image-image-triplets-good](https://finetuner.jina.ai/_images/image-image-triplets-good.png) +On the other hand, there are also cases where the fine-tuned model performs worse, and fails to correctly match images that it previously could. This case is much rarer than the previous case. For this dataset there were 108 occasions where the fine-tuned model returned the correct pair where it couldn't before, and only 33 occasions where the finetuned model returned an incorrect image after fine-tuning but returned a correct one before. Nevertheless it still can happen: +![image-image-triplets-bad](https://finetuner.jina.ai/_images/image-image-triplets-bad.png) + diff --git a/docs/notebooks/mesh_to_mesh.ipynb b/docs/notebooks/mesh_to_mesh.ipynb index 6724aecae..5a755e95d 100644 --- a/docs/notebooks/mesh_to_mesh.ipynb +++ b/docs/notebooks/mesh_to_mesh.ipynb @@ -102,9 +102,9 @@ { "cell_type": "code", "source": [ - "train_data = DocumentArray.pull('modelnet40-train', show_progress=True)\n", - "query_data = DocumentArray.pull('modelnet40-queries', show_progress=True)\n", - "index_data = DocumentArray.pull('modelnet40-index', show_progress=True)\n", + "train_data = DocumentArray.pull('finetuner/modelnet40-train', show_progress=True)\n", + "query_data = DocumentArray.pull('finetuner/modelnet40-queries', show_progress=True)\n", + "index_data = DocumentArray.pull('finetuner/modelnet40-index', show_progress=True)\n", "\n", "train_data.summary()" ], @@ -172,7 +172,7 @@ "\n", "run = finetuner.fit(\n", " model='pointnet++',\n", - " train_data='modelnet40-train',\n", + " train_data='finetuner/modelnet40-train',\n", " epochs=10,\n", " batch_size=64,\n", " learning_rate= 5e-4,\n", @@ -180,8 +180,8 @@ " device='cuda',\n", " callbacks=[\n", " EvaluationCallback(\n", - " query_data='modelnet40-queries',\n", - " index_data='modelnet40-index',\n", + " query_data='finetuner/modelnet40-queries',\n", + " index_data='finetuner/modelnet40-index',\n", " batch_size=64,\n", " )\n", " ],\n", diff --git a/docs/notebooks/mesh_to_mesh.md b/docs/notebooks/mesh_to_mesh.md index bf8b15b03..0593b5310 100644 --- a/docs/notebooks/mesh_to_mesh.md +++ b/docs/notebooks/mesh_to_mesh.md @@ -64,9 +64,9 @@ finetuner.login(force=True) ``` ```python id="Y-Um5gE8IORv" -train_data = DocumentArray.pull('modelnet40-train', show_progress=True) -query_data = DocumentArray.pull('modelnet40-queries', show_progress=True) -index_data = DocumentArray.pull('modelnet40-index', show_progress=True) +train_data = DocumentArray.pull('finetuner/modelnet40-train', show_progress=True) +query_data = DocumentArray.pull('finetuner/modelnet40-queries', show_progress=True) +index_data = 
DocumentArray.pull('finetuner/modelnet40-index', show_progress=True) train_data.summary() ``` @@ -100,7 +100,7 @@ from finetuner.callback import EvaluationCallback run = finetuner.fit( model='pointnet++', - train_data='modelnet40-train', + train_data='finetuner/modelnet40-train', epochs=10, batch_size=64, learning_rate= 5e-4, @@ -108,8 +108,8 @@ run = finetuner.fit( device='cuda', callbacks=[ EvaluationCallback( - query_data='modelnet40-queries', - index_data='modelnet40-index', + query_data='finetuner/modelnet40-queries', + index_data='finetuner/modelnet40-index', batch_size=64, ) ], diff --git a/docs/notebooks/multilingual_text_to_image.ipynb b/docs/notebooks/multilingual_text_to_image.ipynb index bf4ceac29..7f16c806a 100644 --- a/docs/notebooks/multilingual_text_to_image.ipynb +++ b/docs/notebooks/multilingual_text_to_image.ipynb @@ -1,411 +1,451 @@ { - "cells": [ - { - "cell_type": "markdown", - "id": "72867ba9-6a8c-4b14-acbf-487ea0a61836", - "metadata": {}, - "source": [ - "# Multilingual Text-to-Image Search with MultilingualCLIP\n", - "\n", - "\"Open\n" - ] + "cells": [ + { + "cell_type": "markdown", + "id": "72867ba9-6a8c-4b14-acbf-487ea0a61836", + "metadata": { + "id": "72867ba9-6a8c-4b14-acbf-487ea0a61836" + }, + "source": [ + "# Multilingual Text-to-Image search with MultilingualCLIP\n", + "\n", + "\"Open\n" + ] + }, + { + "cell_type": "markdown", + "id": "f576573b-a48f-4790-817d-e99f8bd28fd0", + "metadata": { + "id": "f576573b-a48f-4790-817d-e99f8bd28fd0" + }, + "source": [ + "Most text-image models are only able to provide embeddings for text in a single language, typically English. Multilingual CLIP models, however, are models that have been trained on multiple different languages. This allows the model to produce similar embeddings for the same sentence in multiple different languages. \n", + "\n", + "This guide will show you how to finetune a multilingual CLIP model for a text to image retrieval task in non-English languages.\n", + "\n", + "*Note, Check the runtime menu to be sure you are using a GPU/TPU instance, or this code will run very slowly.*\n" + ] + }, + { + "cell_type": "markdown", + "id": "ed1e7d55-a458-4dfd-8f4c-eeb02521c221", + "metadata": { + "id": "ed1e7d55-a458-4dfd-8f4c-eeb02521c221" + }, + "source": [ + "## Install" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "9261d0a7-ad6d-461f-bdf7-54e9804cc45d", + "metadata": { + "id": "9261d0a7-ad6d-461f-bdf7-54e9804cc45d" + }, + "outputs": [], + "source": [ + "!pip install \"finetuner[full]\"" + ] + }, + { + "cell_type": "markdown", + "id": "11f13ad8-e0a7-4ba6-b52b-f85dd221db0f", + "metadata": { + "id": "11f13ad8-e0a7-4ba6-b52b-f85dd221db0f" + }, + "source": [ + "## Task" + ] + }, + { + "cell_type": "markdown", + "id": "ed1f88d4-f140-48d4-9d20-00e628c73e38", + "metadata": { + "id": "ed1f88d4-f140-48d4-9d20-00e628c73e38" + }, + "source": [ + "We'll be fine-tuning multilingual CLIP on the electronics section of the [German XMarket dataset](https://xmrec.github.io/data/de/), which contains images and descriptions of electronics products in German. \n", + "\n", + "Each product in the dataset contains several attributes, we will be making use of the image and category attributes to create a [`Document`](https://docarray.jina.ai/fundamentals/document/#document) containing two [chunks](https://docarray.jina.ai/fundamentals/document/nested/#nested-structure), one containing the image and another containing the category of the product." 
+ ] + }, + { + "cell_type": "markdown", + "id": "2a40f0b1-7272-4ae6-9d0a-f5c8d6d534d8", + "metadata": { + "id": "2a40f0b1-7272-4ae6-9d0a-f5c8d6d534d8" + }, + "source": [ + "## Data\n", + "We will use the `xmarket-de-electronics` dataset, which we have already pre-processed and made available on the Jina AI Cloud. You can access it using `DocArray.pull`:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "4420a4ac-531a-4db3-af75-ebb58d8f828b", + "metadata": { + "id": "4420a4ac-531a-4db3-af75-ebb58d8f828b" + }, + "outputs": [], + "source": [ + "import finetuner\n", + "from docarray import DocumentArray, Document\n", + "\n", + "finetuner.login(force=True)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "bab5c3fb-ee75-4818-bd18-23c7a5983e1b", + "metadata": { + "id": "bab5c3fb-ee75-4818-bd18-23c7a5983e1b" + }, + "outputs": [], + "source": [ + "train_data = 'finetuner/xmarket-de-electronics-train-data'\n", + "eval_data = 'finetuner/xmarket-de-electronics-test-data'\n", + "\n", + "query_data = 'finetuner/xmarket-de-electronics-query-data'\n", + "index_data = 'finetuner/xmarket-de-electronics-index-data'\n" + ] + }, + { + "cell_type": "markdown", + "id": "3b859e9c-99e0-484b-98d5-643ad51de8f0", + "metadata": { + "id": "3b859e9c-99e0-484b-98d5-643ad51de8f0" + }, + "source": [ + "## Backbone Model\n", + "Currently, we only support one multilingual CLIP model. This model is the `xlm-roberta-base-ViT-B-32` from [open-clip](https://github.com/mlfoundations/open_clip), which has been trained on the [`laion5b` dataset](https://github.com/LAION-AI/laion5B-paper)." + ] + }, + { + "cell_type": "markdown", + "id": "0b57559c-aa55-40ff-9d05-f061dfb01354", + "metadata": { + "id": "0b57559c-aa55-40ff-9d05-f061dfb01354" + }, + "source": [ + "## Fine-tuning\n", + "Now that our data has been prepared, we can start our fine-tuning run." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "a0cba20d-e335-43e0-8936-d926568034b3", + "metadata": { + "id": "a0cba20d-e335-43e0-8936-d926568034b3" + }, + "outputs": [], + "source": [ + "import finetuner\n", + "from finetuner.callback import EvaluationCallback, WandBLogger\n", + "\n", + "run = finetuner.fit(\n", + " model='xlm-roberta-base-ViT-B-32::laion5b_s13b_b90k',\n", + " train_data=train_data,\n", + " eval_data=eval_data,\n", + " epochs=5,\n", + " learning_rate=1e-6,\n", + " loss='CLIPLoss',\n", + " device='cuda',\n", + " callbacks=[\n", + " EvaluationCallback(\n", + " query_data=query_data,\n", + " index_data=index_data,\n", + " model='clip-text',\n", + " index_model='clip-vision'\n", + " ),\n", + " WandBLogger(),\n", + " ]\n", + ")" + ] + }, + { + "cell_type": "markdown", + "id": "6be36da7-452b-4450-a5d5-6cae84522bb5", + "metadata": { + "id": "6be36da7-452b-4450-a5d5-6cae84522bb5" + }, + "source": [ + "Let's understand what this piece of code does:\n", + "\n", + "* We start with providing `model`, names of training and evaluation data.\n", + "* We also provide some hyper-parameters such as number of `epochs` and a `learning_rate`.\n", + "* We use `CLIPLoss` to optimize the CLIP model.\n", + "* We use `finetuner.callback.EvaluationCallback` for evaluation.\n", + "* We then use the `finetuner.callback.WandBLogger` to display our results." + ] + }, + { + "cell_type": "markdown", + "id": "923e4206-ac60-4a75-bb3d-4acfc4218cea", + "metadata": { + "id": "923e4206-ac60-4a75-bb3d-4acfc4218cea" + }, + "source": [ + "## Monitoring\n", + "\n", + "Now that we've created a run, let's see its status. 
You can monitor the run by checking the status - `run.status()` - and the logs - `run.logs()` or `run.stream_logs()`. " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "56d020bf-8095-4a83-a532-9b6c296e985a", + "metadata": { + "scrolled": true, + "tags": [], + "id": "56d020bf-8095-4a83-a532-9b6c296e985a" + }, + "outputs": [], + "source": [ + "# note, the fine-tuning might takes 20~ minutes\n", + "for entry in run.stream_logs():\n", + " print(entry)" + ] + }, + { + "cell_type": "markdown", + "id": "b58930f1-d9f5-43d3-b852-5cbaa04cb1aa", + "metadata": { + "id": "b58930f1-d9f5-43d3-b852-5cbaa04cb1aa" + }, + "source": [ + "Since some runs might take up to several hours/days, it's important to know how to reconnect to Finetuner and retrieve your run.\n", + "\n", + "```python\n", + "import finetuner\n", + "\n", + "finetuner.login()\n", + "run = finetuner.get_run(run.name)\n", + "```\n", + "\n", + "You can continue monitoring the run by checking the status - `finetuner.run.Run.status()` or the logs `finetuner.run.Run.logs()`." + ] + }, + { + "cell_type": "markdown", + "id": "f0b81ec1-2e02-472f-b2f4-27085bb041cc", + "metadata": { + "id": "f0b81ec1-2e02-472f-b2f4-27085bb041cc" + }, + "source": [ + "## Evaluating\n", + "Once the run is finished, the metrics calculated by the {class}`~finetuner.callback.EvaluationCallback` are plotted using the {class}`~finetuner.callback.WandBLogger` callback. These plots can be accessed using the link provided in the logs once finetuning starts:\n", + "\n", + "```bash\n", + " INFO Finetuning ... \n", + "wandb: Currently logged in as: anony-mouse-448424. Use `wandb login --relogin` to force relogin\n", + "wandb: Tracking run with wandb version 0.13.5\n", + "wandb: Run data is saved locally in \n", + "wandb: Run `wandb offline` to turn off syncing.\n", + "wandb: Syncing run ancient-galaxy-2\n", + "wandb: View project at \n", + "wandb: View run at \n", + "\n", + "```\n", + "\n", + "The generated plots should look like this:\n", + "\n", + "![WandB-mclip](https://finetuner.jina.ai/_images/WandB-mclip.png)\n" + ] + }, + { + "cell_type": "markdown", + "id": "2b8da34d-4c14-424a-bae5-6770f40a0721", + "metadata": { + "id": "2b8da34d-4c14-424a-bae5-6770f40a0721" + }, + "source": [ + "## Saving\n", + "\n", + "After the run has finished successfully, you can download the tuned model on your local machine:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "0476c03f-838a-4589-835c-60d1b7f3f893", + "metadata": { + "id": "0476c03f-838a-4589-835c-60d1b7f3f893" + }, + "outputs": [], + "source": [ + "artifact = run.save_artifact('mclip-model')" + ] + }, + { + "cell_type": "markdown", + "id": "baabd6be-8660-47cc-a48d-feb43d0a507b", + "metadata": { + "id": "baabd6be-8660-47cc-a48d-feb43d0a507b" + }, + "source": [ + "## Inference\n", + "\n", + "Now you saved the `artifact` into your host machine,\n", + "let's use the fine-tuned model to encode a new `Document`:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "fe43402f-4191-4343-905c-c75c64694662", + "metadata": { + "id": "fe43402f-4191-4343-905c-c75c64694662" + }, + "outputs": [], + "source": [ + "from docarray import Document, DocumentArray\n", + "text_da = DocumentArray([Document(text='setwas Text zum Codieren')])\n", + "image_da = DocumentArray([Document(uri='https://upload.wikimedia.org/wikipedia/commons/4/4e/Single_apple.png')])\n", + "\n", + "mclip_text_encoder = finetuner.get_model(artifact=artifact, select_model='clip-text')\n", + "mclip_image_encoder = 
finetuner.get_model(artifact=artifact, select_model='clip-vision')\n", + "\n", + "finetuner.encode(model=mclip_text_encoder, data=text_da)\n", + "finetuner.encode(model=mclip_image_encoder, data=image_da)\n", + "\n", + "print(text_da.embeddings.shape)\n", + "print(image_da.embeddings.shape)" + ] + }, + { + "cell_type": "markdown", + "id": "ff2e7818-bf11-4179-a34d-d7b790b0db12", + "metadata": { + "id": "ff2e7818-bf11-4179-a34d-d7b790b0db12" + }, + "source": [ + "```bash\n", + "(1, 512)\n", + "(1, 512)\n", + "```\n", + "\n", + "```{admonition} what is select_model?\n", + "When fine-tuning CLIP, we are fine-tuning the CLIPVisionEncoder and CLIPTextEncoder in parallel.\n", + "The artifact contains two models: `clip-vision` and `clip-text`.\n", + "The parameter `select_model` tells finetuner which model to use for inference, in the above example,\n", + "we use `clip-text` to encode a Document with text content.\n", + "```\n", + "\n", + "```{admonition} Inference with ONNX\n", + "In case you set `to_onnx=True` when calling `finetuner.fit` function,\n", + "please use `model = finetuner.get_model(artifact, is_onnx=True)`\n", + "```" + ] + }, + { + "cell_type": "markdown", + "id": "38bc9069-0f0e-47c6-8560-bf77ad200774", + "metadata": { + "id": "38bc9069-0f0e-47c6-8560-bf77ad200774" + }, + "source": [ + "## Before and after\n", + "We can directly compare the results of our fine-tuned model with an untrained multilingual clip model by displaying the matches each model has for the same query, while the differences between the results of the two models are quite subtle for some queries, the examples below clearly show that finetuning increses the quality of the search results:" + ] + }, + { + "cell_type": "markdown", + "id": "e69fdfb2-6482-45fb-9c4d-41e548ef8f06", + "metadata": { + "id": "e69fdfb2-6482-45fb-9c4d-41e548ef8f06" + }, + "source": [ + "```python\n", + "from finetuner import build_model\n", + "\n", + "pt_query = copy.deepcopy(query_data)\n", + "pt_index = copy.deepcopy(index_data)\n", + "\n", + "ft_query = copy.deepcopy(query_data)\n", + "ft_index = copy.deepcopy(index_data)\n", + "\n", + "zero_shot_text_encoder = build_model(\n", + " name='xlm-roberta-base-ViT-B-32::laion5b_s13b_b90k',\n", + " select_model='clip-text',\n", + ")\n", + "zero_shot_image_encoder = build_model(\n", + " name='xlm-roberta-base-ViT-B-32::laion5b_s13b_b90k',\n", + " select_model='clip-vision',\n", + ")\n", + "\n", + "finetuner.encode(model=zero_shot_text_encoder, data=pt_query)\n", + "finetuner.encode(model=zero_shot_image_encoder, data=pt_index)\n", + "\n", + "finetuner.encode(model=mclip_text_encoder, data=ft_query)\n", + "finetuner.encode(model=mclip_image_encoder, data=ft_index)\n", + "\n", + "pt_query.match(pt_index)\n", + "ft_query.match(ft_index)\n", + "\n", + "def plot_matches(num_samples = 10):\n", + " seen = set()\n", + " for i, (pt_q, ft_q) in enumerate(zip(pt_query, ft_query)):\n", + " if i >= num_samples: break\n", + " if pt_q.text in seen:\n", + " i = i - 1\n", + " continue\n", + " seen.add(pt_q.text)\n", + " print((\n", + " f'results for query \"{pt_q.text}\"'\n", + " ' using a zero-shot model (top) and '\n", + " 'the fine-tuned model (bottom):'\n", + " ))\n", + " pt_q.matches[:1].plot_image_sprites(fig_size=(3,3))\n", + " ft_q.matches[:1].plot_image_sprites(fig_size=(3,3))\n", + "```\n", + "```plaintext\n", + "results for query: \"externe mikrofone (external microphone)\" using a zero-shot model (top) and the fine-tuned model (bottom)\n", + "```\n", + 
"![mclip-example-pt-1](https://finetuner.jina.ai/_images/mclip-example-pt-1.png)\n", + "![mclip-example-ft-1](https://finetuner.jina.ai/_images/mclip-example-ft-1.png)\n", + "\n", + "```plaintext\n", + "results for query: \"prozessorlüfter (processor fan)\" using a zero-shot model (top) and the fine-tuned model (bottom)\n", + "```\n", + "\n", + "![mclip-example-pt-2](https://finetuner.jina.ai/_images/mclip-example-pt-2.png)\n", + "![mclip-example-ft-2](https://finetuner.jina.ai/_images/mclip-example-ft-2.png)\n", + "\n", + "\n" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.8.10" + }, + "vscode": { + "interpreter": { + "hash": "9ad9c14fbc5ce15e23594239b0b0bb7cf990b71472055d7d43822c20d61e1cff" + } + }, + "colab": { + "provenance": [] + } }, - { - "cell_type": "markdown", - "id": "f576573b-a48f-4790-817d-e99f8bd28fd0", - "metadata": {}, - "source": [ - "Most text-image models are only able to provide embeddings for text in a single language, typically English. Multilingual CLIP models, however, are models that have been trained on multiple different languages. This allows the model to produce similar embeddings for the same sentence in multiple different languages. \n", - "\n", - "This guide will show you how to finetune a multilingual CLIP model for a text to image retrieval task in non-English languages.\n", - "\n", - "*Note, Check the runtime menu to be sure you are using a GPU/TPU instance, or this code will run very slowly.*\n" - ] - }, - { - "cell_type": "markdown", - "id": "ed1e7d55-a458-4dfd-8f4c-eeb02521c221", - "metadata": {}, - "source": [ - "## Install" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "9261d0a7-ad6d-461f-bdf7-54e9804cc45d", - "metadata": {}, - "outputs": [], - "source": [ - "!pip install 'finetuner[full]'" - ] - }, - { - "cell_type": "markdown", - "id": "11f13ad8-e0a7-4ba6-b52b-f85dd221db0f", - "metadata": {}, - "source": [ - "## Task" - ] - }, - { - "cell_type": "markdown", - "id": "ed1f88d4-f140-48d4-9d20-00e628c73e38", - "metadata": {}, - "source": [ - "We'll be fine-tuning multilingual CLIP on the electronics section of the [German XMarket dataset](https://xmrec.github.io/data/de/), which contains images and descriptions of electronics products in German. \n", - "\n", - "Each product in the dataset contains several attributes, we will be making use of the image and category attributes to create a [`Document`](https://docarray.jina.ai/fundamentals/document/#document) containing two [chunks](https://docarray.jina.ai/fundamentals/document/nested/#nested-structure), one containing the image and another containing the category of the product." - ] - }, - { - "cell_type": "markdown", - "id": "2a40f0b1-7272-4ae6-9d0a-f5c8d6d534d8", - "metadata": {}, - "source": [ - "## Data\n", - "We will use the `xmarket-de-electronics` dataset, which we have already pre-processed and made available on the Jina AI Cloud. 
You can access it using `DocArray.pull`:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "4420a4ac-531a-4db3-af75-ebb58d8f828b", - "metadata": {}, - "outputs": [], - "source": [ - "import finetuner\n", - "from docarray import DocumentArray, Document\n", - "\n", - "finetuner.login(force=True)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "bab5c3fb-ee75-4818-bd18-23c7a5983e1b", - "metadata": {}, - "outputs": [], - "source": [ - "train_data = DocumentArray.pull('xmarket-de-electronics-train-data', show_progress=True)\n", - "eval_data = DocumentArray.pull('xmarket-de-electronics-test-data', show_progress=True)\n", - "\n", - "query_data = DocumentArray.pull('xmarket-de-electronics-query-data', show_progress=True)\n", - "index_data = DocumentArray.pull('xmarket-de-electronics-index-data', show_progress=True)\n", - "\n", - "train_data.summary()" - ] - }, - { - "cell_type": "markdown", - "id": "3b859e9c-99e0-484b-98d5-643ad51de8f0", - "metadata": {}, - "source": [ - "## Backbone Model\n", - "Currently, we only support one multilingual CLIP model. This model is the `xlm-roberta-base-ViT-B-32` from [open-clip](https://github.com/mlfoundations/open_clip), which has been trained on the [`laion5b` dataset](https://github.com/LAION-AI/laion5B-paper)." - ] - }, - { - "cell_type": "markdown", - "id": "0b57559c-aa55-40ff-9d05-f061dfb01354", - "metadata": {}, - "source": [ - "## Fine-tuning\n", - "Now that our data has been prepared, we can start our fine-tuning run." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "a0cba20d-e335-43e0-8936-d926568034b3", - "metadata": {}, - "outputs": [], - "source": [ - "import finetuner\n", - "from finetuner.callback import EvaluationCallback, WandBLogger\n", - "\n", - "run = finetuner.fit(\n", - " model='xlm-roberta-base-ViT-B-32::laion5b_s13b_b90k',\n", - " train_data=train_data,\n", - " eval_data=eval_data,\n", - " epochs=5,\n", - " learning_rate=1e-6,\n", - " loss='CLIPLoss',\n", - " device='cpu',\n", - " callbacks=[\n", - " EvaluationCallback(\n", - " query_data='xmarket-de-electronics-query-data',\n", - " index_data='xmarket-de-electronics-index-data',\n", - " model='clip-text',\n", - " index_model='clip-vision'\n", - " ),\n", - " WandBLogger(),\n", - " ]\n", - ")" - ] - }, - { - "cell_type": "markdown", - "id": "6be36da7-452b-4450-a5d5-6cae84522bb5", - "metadata": {}, - "source": [ - "Let's understand what this piece of code does:\n", - "\n", - "* We start with providing `model`, names of training and evaluation data.\n", - "* We also provide some hyper-parameters such as number of `epochs` and a `learning_rate`.\n", - "* We use `CLIPLoss` to optimize the CLIP model.\n", - "* We use `finetuner.callback.EvaluationCallback` for evaluation.\n", - "* We then use the `finetuner.callback.WandBLogger` to display our results." - ] - }, - { - "cell_type": "markdown", - "id": "923e4206-ac60-4a75-bb3d-4acfc4218cea", - "metadata": {}, - "source": [ - "## Monitoring\n", - "\n", - "Now that we've created a run, let's see its status. You can monitor the run by checking the status - `run.status()` - and the logs - `run.logs()` or `run.stream_logs()`. 
" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "56d020bf-8095-4a83-a532-9b6c296e985a", - "metadata": { - "scrolled": true, - "tags": [] - }, - "outputs": [], - "source": [ - "# note, the fine-tuning might takes 20~ minutes\n", - "for entry in run.stream_logs():\n", - " print(entry)" - ] - }, - { - "cell_type": "markdown", - "id": "b58930f1-d9f5-43d3-b852-5cbaa04cb1aa", - "metadata": {}, - "source": [ - "Since some runs might take up to several hours/days, it's important to know how to reconnect to Finetuner and retrieve your run.\n", - "\n", - "```python\n", - "import finetuner\n", - "\n", - "finetuner.login()\n", - "run = finetuner.get_run(run.name)\n", - "```\n", - "\n", - "You can continue monitoring the run by checking the status - `finetuner.run.Run.status()` or the logs `finetuner.run.Run.logs()`." - ] - }, - { - "cell_type": "markdown", - "id": "f0b81ec1-2e02-472f-b2f4-27085bb041cc", - "metadata": {}, - "source": [ - "## Evaluating\n", - "Once the run is finished, the metrics calculated by the {class}`~finetuner.callback.EvaluationCallback` are plotted using the {class}`~finetuner.callback.WandBLogger` callback. These plots can be accessed using the link provided in the logs once finetuning starts:\n", - "\n", - "```bash\n", - " INFO Finetuning ... \n", - "wandb: Currently logged in as: anony-mouse-448424. Use `wandb login --relogin` to force relogin\n", - "wandb: Tracking run with wandb version 0.13.5\n", - "wandb: Run data is saved locally in \n", - "wandb: Run `wandb offline` to turn off syncing.\n", - "wandb: Syncing run ancient-galaxy-2\n", - "wandb: View project at \n", - "wandb: View run at \n", - "\n", - "```\n", - "\n", - "The generated plots should look like this:\n", - "\n", - "![WandB-mclip](images/WandB-mclip.png)\n" - ] - }, - { - "cell_type": "markdown", - "id": "2b8da34d-4c14-424a-bae5-6770f40a0721", - "metadata": {}, - "source": [ - "## Saving\n", - "\n", - "After the run has finished successfully, you can download the tuned model on your local machine:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "0476c03f-838a-4589-835c-60d1b7f3f893", - "metadata": {}, - "outputs": [], - "source": [ - "artifact = run.save_artifact('mclip-model')" - ] - }, - { - "cell_type": "markdown", - "id": "baabd6be-8660-47cc-a48d-feb43d0a507b", - "metadata": {}, - "source": [ - "## Inference\n", - "\n", - "Now you saved the `artifact` into your host machine,\n", - "let's use the fine-tuned model to encode a new `Document`:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "fe43402f-4191-4343-905c-c75c64694662", - "metadata": {}, - "outputs": [], - "source": [ - "text_da = DocumentArray([Document(text='setwas Text zum Codieren')])\n", - "image_da = DocumentArray([Document(uri='https://upload.wikimedia.org/wikipedia/commons/4/4e/Single_apple.png')])\n", - "\n", - "mclip_text_encoder = finetuner.get_model(artifact=artifact, select_model='clip-text')\n", - "mclip_image_encoder = finetuner.get_model(artifact=artifact, select_model='clip-vision')\n", - "\n", - "finetuner.encode(model=mclip_text_encoder, data=text_da)\n", - "finetuner.encode(model=mclip_image_encoder, data=image_da)\n", - "\n", - "print(text_da.embeddings.shape)\n", - "print(image_da.embeddings.shape)" - ] - }, - { - "cell_type": "markdown", - "id": "ff2e7818-bf11-4179-a34d-d7b790b0db12", - "metadata": {}, - "source": [ - "```bash\n", - "(1, 512)\n", - "(1, 512)\n", - "```\n", - "\n", - "```{admonition} what is select_model?\n", - "When fine-tuning CLIP, we 
are fine-tuning the CLIPVisionEncoder and CLIPTextEncoder in parallel.\n", - "The artifact contains two models: `clip-vision` and `clip-text`.\n", - "The parameter `select_model` tells finetuner which model to use for inference, in the above example,\n", - "we use `clip-text` to encode a Document with text content.\n", - "```\n", - "\n", - "```{admonition} Inference with ONNX\n", - "In case you set `to_onnx=True` when calling `finetuner.fit` function,\n", - "please use `model = finetuner.get_model(artifact, is_onnx=True)`\n", - "```" - ] - }, - { - "cell_type": "markdown", - "id": "38bc9069-0f0e-47c6-8560-bf77ad200774", - "metadata": {}, - "source": [ - "## Before and after\n", - "We can directly compare the results of our fine-tuned model with an untrained multilingual clip model by displaying the matches each model has for the same query, while the differences between the results of the two models are quite subtle for some queries, the examples below clearly show that finetuning increses the quality of the search results:" - ] - }, - { - "attachments": {}, - "cell_type": "markdown", - "id": "e69fdfb2-6482-45fb-9c4d-41e548ef8f06", - "metadata": {}, - "source": [ - "```python\n", - "from finetuner import build_model\n", - "\n", - "pt_query = copy.deepcopy(query_data)\n", - "pt_index = copy.deepcopy(index_data)\n", - "\n", - "ft_query = copy.deepcopy(query_data)\n", - "ft_index = copy.deepcopy(index_data)\n", - "\n", - "zero_shot_text_encoder = build_model(\n", - " name='xlm-roberta-base-ViT-B-32::laion5b_s13b_b90k',\n", - " select_model='clip-text',\n", - ")\n", - "zero_shot_image_encoder = build_model(\n", - " name='xlm-roberta-base-ViT-B-32::laion5b_s13b_b90k',\n", - " select_model='clip-vision',\n", - ")\n", - "\n", - "finetuner.encode(model=zero_shot_text_encoder, data=pt_query)\n", - "finetuner.encode(model=zero_shot_image_encoder, data=pt_index)\n", - "\n", - "finetuner.encode(model=mclip_text_encoder, data=ft_query)\n", - "finetuner.encode(model=mclip_image_encoder, data=ft_index)\n", - "\n", - "pt_query.match(pt_index)\n", - "ft_query.match(ft_index)\n", - "\n", - "def plot_matches(num_samples = 10):\n", - " seen = set()\n", - " for i, (pt_q, ft_q) in enumerate(zip(pt_query, ft_query)):\n", - " if i >= num_samples: break\n", - " if pt_q.text in seen:\n", - " i = i - 1\n", - " continue\n", - " seen.add(pt_q.text)\n", - " print((\n", - " f'results for query \"{pt_q.text}\"'\n", - " ' using a zero-shot model (top) and '\n", - " 'the fine-tuned model (bottom):'\n", - " ))\n", - " pt_q.matches[:1].plot_image_sprites(fig_size=(3,3))\n", - " ft_q.matches[:1].plot_image_sprites(fig_size=(3,3))\n", - "```\n", - "```plaintext\n", - "results for query: \"externe mikrofone\" (external microphone) using a zero-shot model (top) and the fine-tuned model (bottom)\n", - "```\n", - "![mclip-example-pt-1](images/mclip-example-pt-1.png)\n", - "\n", - "![mclip-example-ft-1](images/mclip-example-ft-1.png)\n", - "\n", - "```plaintext\n", - "results for query: \"prozessorlüfter\" (processor fan) using a zero-shot model (top) and the fine-tuned model (bottom)\n", - "```\n", - "\n", - "![mclip-example-pt-2](images/mclip-example-pt-2.png)\n", - "\n", - "![mclip-example-ft-2](images/mclip-example-ft-2.png)\n", - "\n", - "\n" - ] - }, - { - "cell_type": "markdown", - "id": "7d3f5a5e", - "metadata": {}, - "source": [] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3 (ipykernel)", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": 
"ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.7.15 (default, Oct 12 2022, 19:14:01) \n[GCC 11.2.0]" - }, - "vscode": { - "interpreter": { - "hash": "9ad9c14fbc5ce15e23594239b0b0bb7cf990b71472055d7d43822c20d61e1cff" - } - } - }, - "nbformat": 4, - "nbformat_minor": 5 -} + "nbformat": 4, + "nbformat_minor": 5 +} \ No newline at end of file diff --git a/docs/notebooks/multilingual_text_to_image.md b/docs/notebooks/multilingual_text_to_image.md index 83027f3d0..b2fb6d673 100644 --- a/docs/notebooks/multilingual_text_to_image.md +++ b/docs/notebooks/multilingual_text_to_image.md @@ -12,62 +12,72 @@ jupyter: name: python3 --- -# Multilingual Text-to-Image Search with MultilingualCLIP + +# Multilingual Text-to-Image search with MultilingualCLIP Open In Colab + - + Most text-image models are only able to provide embeddings for text in a single language, typically English. Multilingual CLIP models, however, are models that have been trained on multiple different languages. This allows the model to produce similar embeddings for the same sentence in multiple different languages. This guide will show you how to finetune a multilingual CLIP model for a text to image retrieval task in non-English languages. *Note, Check the runtime menu to be sure you are using a GPU/TPU instance, or this code will run very slowly.* + - + ## Install + -```python -!pip install 'finetuner[full]' +```python id="9261d0a7-ad6d-461f-bdf7-54e9804cc45d" +!pip install "finetuner[full]" ``` + ## Task + - + We'll be fine-tuning multilingual CLIP on the electronics section of the [German XMarket dataset](https://xmrec.github.io/data/de/), which contains images and descriptions of electronics products in German. Each product in the dataset contains several attributes, we will be making use of the image and category attributes to create a [`Document`](https://docarray.jina.ai/fundamentals/document/#document) containing two [chunks](https://docarray.jina.ai/fundamentals/document/nested/#nested-structure), one containing the image and another containing the category of the product. + - + ## Data We will use the `xmarket-de-electronics` dataset, which we have already pre-processed and made available on the Jina AI Cloud. You can access it using `DocArray.pull`: + -```python +```python id="4420a4ac-531a-4db3-af75-ebb58d8f828b" import finetuner from docarray import DocumentArray, Document finetuner.login(force=True) ``` -```python -train_data = DocumentArray.pull('xmarket-de-electronics-train-data', show_progress=True) -eval_data = DocumentArray.pull('xmarket-de-electronics-test-data', show_progress=True) +```python id="bab5c3fb-ee75-4818-bd18-23c7a5983e1b" +train_data = 'finetuner/xmarket-de-electronics-train-data' +eval_data = 'finetuner/xmarket-de-electronics-test-data' -query_data = DocumentArray.pull('xmarket-de-electronics-query-data', show_progress=True) -index_data = DocumentArray.pull('xmarket-de-electronics-index-data', show_progress=True) +query_data = 'finetuner/xmarket-de-electronics-query-data' +index_data = 'finetuner/xmarket-de-electronics-index-data' -train_data.summary() ``` + ## Backbone Model Currently, we only support one multilingual CLIP model. This model is the `xlm-roberta-base-ViT-B-32` from [open-clip](https://github.com/mlfoundations/open_clip), which has been trained on the [`laion5b` dataset](https://github.com/LAION-AI/laion5B-paper). 
+ - + ## Fine-tuning Now that our data has been prepared, we can start our fine-tuning run. + -```python +```python id="a0cba20d-e335-43e0-8936-d926568034b3" import finetuner from finetuner.callback import EvaluationCallback, WandBLogger @@ -78,11 +88,11 @@ run = finetuner.fit( epochs=5, learning_rate=1e-6, loss='CLIPLoss', - device='cpu', + device='cuda', callbacks=[ EvaluationCallback( - query_data='xmarket-de-electronics-query-data', - index_data='xmarket-de-electronics-index-data', + query_data=query_data, + index_data=index_data, model='clip-text', index_model='clip-vision' ), @@ -91,6 +101,7 @@ run = finetuner.fit( ) ``` + Let's understand what this piece of code does: * We start with providing `model`, names of training and evaluation data. @@ -98,19 +109,21 @@ Let's understand what this piece of code does: * We use `CLIPLoss` to optimize the CLIP model. * We use `finetuner.callback.EvaluationCallback` for evaluation. * We then use the `finetuner.callback.WandBLogger` to display our results. + - + ## Monitoring Now that we've created a run, let's see its status. You can monitor the run by checking the status - `run.status()` - and the logs - `run.logs()` or `run.stream_logs()`. + -```python tags=[] +```python tags=[] id="56d020bf-8095-4a83-a532-9b6c296e985a" # note, the fine-tuning might takes 20~ minutes for entry in run.stream_logs(): print(entry) ``` - + Since some runs might take up to several hours/days, it's important to know how to reconnect to Finetuner and retrieve your run. ```python @@ -123,7 +136,7 @@ run = finetuner.get_run(run.name) You can continue monitoring the run by checking the status - `finetuner.run.Run.status()` or the logs `finetuner.run.Run.logs()`. - + ## Evaluating Once the run is finished, the metrics calculated by the {class}`~finetuner.callback.EvaluationCallback` are plotted using the {class}`~finetuner.callback.WandBLogger` callback. 
These plots can be accessed using the link provided in the logs once finetuning starts: @@ -141,24 +154,29 @@ wandb: View run at The generated plots should look like this: -![WandB-mclip](images/WandB-mclip.png) +![WandB-mclip](https://finetuner.jina.ai/_images/WandB-mclip.png) + ## Saving After the run has finished successfully, you can download the tuned model on your local machine: + -```python +```python id="0476c03f-838a-4589-835c-60d1b7f3f893" artifact = run.save_artifact('mclip-model') ``` + ## Inference Now you saved the `artifact` into your host machine, let's use the fine-tuned model to encode a new `Document`: + -```python +```python id="fe43402f-4191-4343-905c-c75c64694662" +from docarray import Document, DocumentArray text_da = DocumentArray([Document(text='setwas Text zum Codieren')]) image_da = DocumentArray([Document(uri='https://upload.wikimedia.org/wikipedia/commons/4/4e/Single_apple.png')]) @@ -172,7 +190,7 @@ print(text_da.embeddings.shape) print(image_da.embeddings.shape) ``` - + ```bash (1, 512) (1, 512) @@ -191,10 +209,12 @@ please use `model = finetuner.get_model(artifact, is_onnx=True)` ``` + ## Before and after We can directly compare the results of our fine-tuned model with an untrained multilingual clip model by displaying the matches each model has for the same query, while the differences between the results of the two models are quite subtle for some queries, the examples below clearly show that finetuning increses the quality of the search results: + - + ```python from finetuner import build_model @@ -239,22 +259,18 @@ def plot_matches(num_samples = 10): ft_q.matches[:1].plot_image_sprites(fig_size=(3,3)) ``` ```plaintext -results for query: "externe mikrofone" (external microphone) using a zero-shot model (top) and the fine-tuned model (bottom) +results for query: "externe mikrofone (external microphone)" using a zero-shot model (top) and the fine-tuned model (bottom) ``` -![mclip-example-pt-1](images/mclip-example-pt-1.png) - -![mclip-example-ft-1](images/mclip-example-ft-1.png) +![mclip-example-pt-1](https://finetuner.jina.ai/_images/mclip-example-pt-1.png) +![mclip-example-ft-1](https://finetuner.jina.ai/_images/mclip-example-ft-1.png) ```plaintext -results for query: "prozessorlüfter" (processor fan) using a zero-shot model (top) and the fine-tuned model (bottom) +results for query: "prozessorlüfter (processor fan)" using a zero-shot model (top) and the fine-tuned model (bottom) ``` -![mclip-example-pt-2](images/mclip-example-pt-2.png) - -![mclip-example-ft-2](images/mclip-example-ft-2.png) +![mclip-example-pt-2](https://finetuner.jina.ai/_images/mclip-example-pt-2.png) +![mclip-example-ft-2](https://finetuner.jina.ai/_images/mclip-example-ft-2.png) - - diff --git a/docs/notebooks/text_to_image.ipynb b/docs/notebooks/text_to_image.ipynb index ab05bfad5..c936ba803 100644 --- a/docs/notebooks/text_to_image.ipynb +++ b/docs/notebooks/text_to_image.ipynb @@ -1,10 +1,22 @@ { + "nbformat": 4, + "nbformat_minor": 0, + "metadata": { + "colab": { + "provenance": [] + }, + "kernelspec": { + "name": "python3", + "display_name": "Python 3" + }, + "language_info": { + "name": "python" + }, + "accelerator": "GPU" + }, "cells": [ { "cell_type": "markdown", - "metadata": { - "id": "3UCyCMPcvLGw" - }, "source": [ "# Text-to-Image Search via CLIP\n", "\n", @@ -19,36 +31,36 @@ "*Note, please consider switching to GPU/TPU Runtime for faster inference.*\n", "\n", "## Install" - ] + ], + "metadata": { + "id": "3UCyCMPcvLGw" + } }, { "cell_type": "code", - "execution_count": 
null, + "source": [ + "!pip install 'finetuner[full]'" + ], "metadata": { "id": "vglobi-vvqCd" }, - "outputs": [], - "source": [ - "!pip install 'finetuner[full]'" - ] + "execution_count": null, + "outputs": [] }, { "cell_type": "markdown", - "metadata": { - "id": "GXddluSIwCGW" - }, "source": [ "## Task\n", "We'll be fine-tuning CLIP on the [fashion captioning dataset](https://github.com/xuewyang/Fashion_Captioning) which contains information about fashion products.\n", "\n", "For each product the dataset contains a title and images of multiple variants of the product. We constructed a parent [`Document`](https://docarray.jina.ai/fundamentals/document/#document) for each picture, which contains two [chunks](https://docarray.jina.ai/fundamentals/document/nested/#nested-structure): an image document and a text document holding the description of the product." - ] + ], + "metadata": { + "id": "GXddluSIwCGW" + } }, { "cell_type": "markdown", - "metadata": { - "id": "EVBez7dHwIye" - }, "source": [ "## Data\n", "Our journey starts locally. We have to [prepare the data and push it to the Jina AI Cloud](https://finetuner.jina.ai/walkthrough/create-training-data/) and Finetuner will be able to get the dataset by its name. For this example,\n", @@ -60,75 +72,73 @@ "We don't require you to push data to the Jina AI Cloud by yourself. Instead of a name, you can provide a `DocumentArray` and Finetuner will do the job for you.\n", "When working with documents where images are stored locally, please call `doc.load_uri_to_blob()` to reduce network transmission and speed up training.\n", "```" - ] + ], + "metadata": { + "id": "EVBez7dHwIye" + } }, { "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "vfPZBQVxxEHm" - }, - "outputs": [], "source": [ "import finetuner\n", "from docarray import DocumentArray, Document\n", "\n", "finetuner.login(force=True)" - ] + ], + "metadata": { + "id": "vfPZBQVxxEHm" + }, + "execution_count": null, + "outputs": [] }, { "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "cpIj7viExFti" - }, - "outputs": [], "source": [ - "train_data = DocumentArray.pull('fashion-train-data-clip', show_progress=True)\n", - "eval_data = DocumentArray.pull('fashion-eval-data-clip', show_progress=True)\n", - "query_data = DocumentArray.pull('fashion-eval-data-queries', show_progress=True)\n", - "index_data = DocumentArray.pull('fashion-eval-data-index', show_progress=True)\n", + "train_data = DocumentArray.pull('finetuner/fashion-train-data-clip', show_progress=True)\n", + "eval_data = DocumentArray.pull('finetuner/fashion-eval-data-clip', show_progress=True)\n", + "query_data = DocumentArray.pull('finetuner/fashion-eval-data-queries', show_progress=True)\n", + "index_data = DocumentArray.pull('finetuner/fashion-eval-data-index', show_progress=True)\n", "\n", "train_data.summary()" - ] + ], + "metadata": { + "id": "cpIj7viExFti" + }, + "execution_count": null, + "outputs": [] }, { "cell_type": "markdown", - "metadata": { - "id": "AE87a5Nvwd7q" - }, "source": [ "## Backbone model\n", "Currently, we support several CLIP variations from [open-clip](https://github.com/mlfoundations/open_clip) for text to image retrieval tasks.\n", "\n", "However, you can see all available models either in [choose backbone](https://finetuner.jina.ai/walkthrough/choose-backbone/) section or by calling `finetuner.describe_models()`." 
- ] + ], + "metadata": { + "id": "AE87a5Nvwd7q" + } }, { "cell_type": "markdown", - "metadata": { - "id": "81fh900Bxgkn" - }, "source": [ "## Fine-tuning\n", "\n", "Now that we have the training and evaluation datasets loaded as `DocumentArray`s and selected our model, we can start our fine-tuning run." - ] + ], + "metadata": { + "id": "81fh900Bxgkn" + } }, { "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "UDcpfybOv1dh" - }, - "outputs": [], "source": [ "from finetuner.callback import EvaluationCallback\n", "\n", "run = finetuner.fit(\n", " model='openai/clip-vit-base-patch32',\n", - " train_data='fashion-train-data-clip',\n", - " eval_data='fashion-eval-data-clip',\n", + " train_data='finetuner/fashion-train-data-clip',\n", + " eval_data='finetuner/fashion-eval-data-clip',\n", " epochs=5,\n", " learning_rate= 1e-7,\n", " loss='CLIPLoss',\n", @@ -137,18 +147,20 @@ " EvaluationCallback(\n", " model='clip-text',\n", " index_model='clip-vision',\n", - " query_data='fashion-eval-data-queries',\n", - " index_data='fashion-eval-data-index',\n", + " query_data='finetuner/fashion-eval-data-queries',\n", + " index_data='finetuner/fashion-eval-data-index',\n", " )\n", " ],\n", ")" - ] + ], + "metadata": { + "id": "UDcpfybOv1dh" + }, + "execution_count": null, + "outputs": [] }, { "cell_type": "markdown", - "metadata": { - "id": "QPDmFdubxzUE" - }, "source": [ "Let's understand what this piece of code does:\n", "\n", @@ -156,37 +168,37 @@ "* We also provide some hyper-parameters such as number of `epochs` and a `learning_rate`.\n", "* We use `CLIPLoss` to optimize the CLIP model.\n", "* We use an evaluation callback, which uses the `'clip-text'` model for encoding the text queries and the `'clip-vision'` model for encoding the images in `'fashion-eval-data-index'`.\n" - ] + ], + "metadata": { + "id": "QPDmFdubxzUE" + } }, { "cell_type": "markdown", - "metadata": { - "id": "qKv3VcMKyG8d" - }, "source": [ "## Monitoring\n", "\n", "Now that we've created a run, let's see its status. You can monitor the run by checking the status with `run.status()` and the logs with `run.logs()` or `run.stream_logs()`. " - ] + ], + "metadata": { + "id": "qKv3VcMKyG8d" + } }, { "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "JX45y-2fxs4L" - }, - "outputs": [], "source": [ "# note, the fine-tuning might take ~20 minutes\n", "for entry in run.stream_logs():\n", " print(entry)" - ] + ], + "metadata": { + "id": "JX45y-2fxs4L" + }, + "execution_count": null, + "outputs": [] }, { "cell_type": "markdown", - "metadata": { - "id": "xi49YlQsyXbi" - }, "source": [ "Since some runs might take up to several hours/days, it's important to know how to reconnect to Finetuner and retrieve your run.\n", "\n", @@ -198,13 +210,13 @@ "```\n", "\n", "You can continue monitoring the run by checking the status with `finetuner.run.Run.status()` or the logs with `finetuner.run.Run.logs()`."
- ] + ], + "metadata": { + "id": "xi49YlQsyXbi" + } }, { "cell_type": "markdown", - "metadata": { - "id": "Xeq_aVRxyqlW" - }, "source": [ "## Evaluating\n", "\n", @@ -221,54 +233,50 @@ " DEBUG Metric: 'clip-text-to-clip-vision_dcg_at_k' Value: 2.71247 \n", "...\n", "```\n" - ] + ], + "metadata": { + "id": "Xeq_aVRxyqlW" + } }, { "cell_type": "markdown", - "metadata": { - "id": "h3qC3yAcy-Es" - }, "source": [ "## Saving\n", "\n", "After the run has finished successfully, you can download the tuned model on your local machine:" - ] + ], + "metadata": { + "id": "h3qC3yAcy-Es" + } }, { "cell_type": "code", - "execution_count": null, + "source": [ + "artifact = run.save_artifact('clip-model')" + ], "metadata": { "id": "sucF7touyKo0" }, - "outputs": [], - "source": [ - "artifact = run.save_artifact('clip-model')" - ] + "execution_count": null, + "outputs": [] }, { "cell_type": "markdown", - "metadata": { - "id": "8_VGjKq3zDx9" - }, "source": [ "## Inference\n", "\n", "Now you saved the `artifact` into your host machine,\n", "let's use the fine-tuned model to encode a new `Document`:" - ] + ], + "metadata": { + "id": "8_VGjKq3zDx9" + } }, { "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "v95QsuEyzE-B" - }, - "outputs": [], "source": [ "text_da = DocumentArray([Document(text='some text to encode')])\n", - "image_da = DocumentArray([Document(\n", - " uri='https://upload.wikimedia.org/wikipedia/commons/4/4e/Single_apple.png'\n", - " )])\n", + "image_da = DocumentArray([Document(uri='https://upload.wikimedia.org/wikipedia/commons/4/4e/Single_apple.png')])\n", "\n", "clip_text_encoder = finetuner.get_model(artifact=artifact, select_model='clip-text')\n", "clip_image_encoder = finetuner.get_model(artifact=artifact, select_model='clip-vision')\n", @@ -278,13 +286,15 @@ "\n", "print(text_da.embeddings.shape)\n", "print(image_da.embeddings.shape)" - ] + ], + "metadata": { + "id": "v95QsuEyzE-B" + }, + "execution_count": null, + "outputs": [] }, { "cell_type": "markdown", - "metadata": { - "id": "LzMbR7VgzXtA" - }, "source": [ "```bash\n", "(1, 512)\n", @@ -302,13 +312,13 @@ "In case you set `to_onnx=True` when calling `finetuner.fit` function,\n", "please use `model = finetuner.get_model(artifact, is_onnx=True)`\n", "```" - ] + ], + "metadata": { + "id": "LzMbR7VgzXtA" + } }, { "cell_type": "markdown", - "metadata": { - "id": "LHyMm_M1zxdt" - }, "source": [ "## Advanced: WiSE-FT \n", "\n", @@ -343,20 +353,23 @@ "\n", "\n", "That's it! Check out [clip-as-service](https://clip-as-service.jina.ai/user-guides/finetuner/?highlight=finetuner#fine-tune-models) to learn how to plug-in a fine-tuned CLIP model to our CLIP specific service." - ] + ], + "metadata": { + "id": "LHyMm_M1zxdt" + } }, { "cell_type": "markdown", - "metadata": {}, "source": [ "## Before and after\n", "We can directly compare the results of our fine-tuned model with a pre-trained clip model by displaying the matches each model has for the same query. 
While the differences between the results of the two models are quite subtle for some queries, the examples below clearly show that finetuning increases the quality of the search results:" - ] + ], + "metadata": { + "id": "tpm8eVRFX20B" + } }, { - "attachments": {}, "cell_type": "markdown", - "metadata": {}, "source": [ "```python\n", "import copy\n", @@ -410,37 +423,13 @@ "```plaintext\n", "Results for query: \"nightingale tee jacket\" using a zero-shot model (top) and the fine-tuned model (bottom)\n", "```\n", - "![clip-example-pt](images/clip-example-pt.png)\n", + "![clip-example-pt](https://finetuner.jina.ai/_images/clip-example-pt.png)\n", "\n", - "![clip-example-ft](images/clip-example-ft.png)\n" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [] - } - ], - "metadata": { - "accelerator": "GPU", - "colab": { - "provenance": [] - }, - "kernelspec": { - "display_name": "Python 3.7.15 ('.venv': venv)", - "language": "python", - "name": "python3" - }, - "language_info": { - "name": "python", - "version": "3.7.15 (default, Oct 12 2022, 19:14:01) \n[GCC 11.2.0]" - }, - "vscode": { - "interpreter": { - "hash": "9ad9c14fbc5ce15e23594239b0b0bb7cf990b71472055d7d43822c20d61e1cff" + "![clip-example-ft](https://finetuner.jina.ai/_images/clip-example-ft.png)\n" + ], + "metadata": { + "id": "C30UVpHDX4HF" } } - }, - "nbformat": 4, - "nbformat_minor": 0 -} + ] +} \ No newline at end of file diff --git a/docs/notebooks/text_to_image.md b/docs/notebooks/text_to_image.md index 37b27ef2e..ff7ca0962 100644 --- a/docs/notebooks/text_to_image.md +++ b/docs/notebooks/text_to_image.md @@ -7,8 +7,7 @@ jupyter: format_version: '1.3' jupytext_version: 1.14.1 kernelspec: - display_name: 'Python 3.7.15 (''.venv'': venv)' - language: python + display_name: Python 3 name: python3 --- @@ -60,10 +59,10 @@ finetuner.login(force=True) ``` ```python id="cpIj7viExFti" -train_data = DocumentArray.pull('fashion-train-data-clip', show_progress=True) -eval_data = DocumentArray.pull('fashion-eval-data-clip', show_progress=True) -query_data = DocumentArray.pull('fashion-eval-data-queries', show_progress=True) -index_data = DocumentArray.pull('fashion-eval-data-index', show_progress=True) +train_data = DocumentArray.pull('finetuner/fashion-train-data-clip', show_progress=True) +eval_data = DocumentArray.pull('finetuner/fashion-eval-data-clip', show_progress=True) +query_data = DocumentArray.pull('finetuner/fashion-eval-data-queries', show_progress=True) +index_data = DocumentArray.pull('finetuner/fashion-eval-data-index', show_progress=True) train_data.summary() ``` @@ -86,8 +85,8 @@ from finetuner.callback import EvaluationCallback run = finetuner.fit( model='openai/clip-vit-base-patch32', - train_data='fashion-train-data-clip', - eval_data='fashion-eval-data-clip', + train_data='finetuner/fashion-train-data-clip', + eval_data='finetuner/fashion-eval-data-clip', epochs=5, learning_rate= 1e-7, loss='CLIPLoss', @@ -96,8 +95,8 @@ run = finetuner.fit( EvaluationCallback( model='clip-text', index_model='clip-vision', - query_data='fashion-eval-data-queries', - index_data='fashion-eval-data-index', + query_data='finetuner/fashion-eval-data-queries', + index_data='finetuner/fashion-eval-data-index', ) ], ) @@ -176,9 +175,7 @@ let's use the fine-tuned model to encode a new `Document`: ```python id="v95QsuEyzE-B" text_da = DocumentArray([Document(text='some text to encode')]) -image_da = DocumentArray([Document( - uri='https://upload.wikimedia.org/wikipedia/commons/4/4e/Single_apple.png' - )]) +image_da = 
DocumentArray([Document(uri='https://upload.wikimedia.org/wikipedia/commons/4/4e/Single_apple.png')]) clip_text_encoder = finetuner.get_model(artifact=artifact, select_model='clip-text') clip_image_encoder = finetuner.get_model(artifact=artifact, select_model='clip-vision') @@ -245,10 +242,12 @@ The value you set to `alpha` should be greater equal than 0 and less equal than That's it! Check out [clip-as-service](https://clip-as-service.jina.ai/user-guides/finetuner/?highlight=finetuner#fine-tune-models) to learn how to plug-in a fine-tuned CLIP model to our CLIP specific service. + ## Before and after We can directly compare the results of our fine-tuned model with a pre-trained clip model by displaying the matches each model has for the same query. While the differences between the results of the two models are quite subtle for some queries, the examples below clearly show that finetuning increases the quality of the search results: + - + ```python import copy from finetuner import build_model @@ -301,10 +300,8 @@ plot_matches() ```plaintext Results for query: "nightingale tee jacket" using a zero-shot model (top) and the fine-tuned model (bottom) ``` -![clip-example-pt](images/clip-example-pt.png) +![clip-example-pt](https://finetuner.jina.ai/_images/clip-example-pt.png) -![clip-example-ft](images/clip-example-ft.png) +![clip-example-ft](https://finetuner.jina.ai/_images/clip-example-ft.png) - - diff --git a/docs/notebooks/text_to_text.ipynb b/docs/notebooks/text_to_text.ipynb index d51cbba04..4fbfc92f6 100644 --- a/docs/notebooks/text_to_text.ipynb +++ b/docs/notebooks/text_to_text.ipynb @@ -30,6 +30,17 @@ "!pip install 'finetuner[full]'" ] }, + { + "cell_type": "code", + "source": [ + "!pip install git+https://github.com/jina-ai/jina-hubble-sdk.git@fix-post-success-type" + ], + "metadata": { + "id": "cu1HRyCAlH_A" + }, + "execution_count": null, + "outputs": [] + }, { "cell_type": "markdown", "metadata": { @@ -82,7 +93,7 @@ "import finetuner\n", "from docarray import DocumentArray, Document\n", "\n", - "finetuner.login()" + "finetuner.login(force=True)\n" ] }, { @@ -93,9 +104,9 @@ }, "outputs": [], "source": [ - "train_data = DocumentArray.pull('quora_train.da', show_progress=True)\n", - "query_data = DocumentArray.pull('quora_query_dev.da', show_progress=True)\n", - "index_data = DocumentArray.pull('quora_index_dev.da', show_progress=True)\n", + "train_data = DocumentArray.pull('finetuner/quora-train-da', show_progress=True)\n", + "query_data = DocumentArray.pull('finetuner/quora-test-query-da', show_progress=True)\n", + "index_data = DocumentArray.pull('finetuner/quora-test-index-da', show_progress=True)\n", "\n", "train_data.summary()" ] @@ -153,7 +164,7 @@ "\n", "run = finetuner.fit(\n", " model='bert-base-cased',\n", - " train_data='quora_train.da',\n", + " train_data='finetuner/quora-train-da',\n", " loss='TripletMarginLoss',\n", " optimizer='Adam',\n", " learning_rate = 1e-5,\n", @@ -162,8 +173,8 @@ " device='cuda',\n", " callbacks=[\n", " EvaluationCallback(\n", - " query_data='quora_query_dev.da',\n", - " index_data='quora_index_dev.da',\n", + " query_data='finetuner/quora-test-query-da',\n", + " index_data='finetuner/quora-test-index-da',\n", " batch_size=32\n", " )\n", " ]\n", @@ -336,7 +347,6 @@ }, { "cell_type": "markdown", - "metadata": {}, "source": [ "## Before and after\n", "We can directly compare the results of our fine-tuned model with its zero-shot counterpart to get a better idea of how finetuning affects the results of a search. 
While the zero-shot model is able to produce results that are very similar to the initial query, it is common for the topic of the question to change, with the structure staying the same. After fine-tuning, the returned questions are consistently relevant to the initial query, even in cases where the structure of the sentence is different.\n", @@ -344,8 +354,8 @@ "```python\n", "import copy\n", "\n", - "query_pt = DocumentArray.pull('quora_query_dev.da', show_progress=True)\n", - "index_pt = DocumentArray.pull('quora_index_dev.da', show_progress=True)\n", + "query_pt = DocumentArray.pull('finetuner/quora-test-query-da', show_progress=True)\n", + "index_pt = DocumentArray.pull('finetuner/quora-test-index-da', show_progress=True)\n", "\n", "query_ft = copy.deepcopy(query_pt)\n", "index_ft = copy.deepcopy(index_pt)\n", @@ -406,7 +416,10 @@ " - What are the legitimate ways to earn money online?\n", "\n", "```\n" - ] + ], + "metadata": { + "id": "53Xtm0hidrjs" + } }, { "cell_type": "markdown", @@ -421,25 +434,17 @@ "metadata": { "accelerator": "GPU", "colab": { - "collapsed_sections": [], "provenance": [] }, "gpuClass": "standard", "kernelspec": { - "display_name": "Python 3.7.15 ('.venv': venv)", - "language": "python", + "display_name": "Python 3", "name": "python3" }, "language_info": { - "name": "python", - "version": "3.7.15" - }, - "vscode": { - "interpreter": { - "hash": "9ad9c14fbc5ce15e23594239b0b0bb7cf990b71472055d7d43822c20d61e1cff" - } + "name": "python" } }, "nbformat": 4, "nbformat_minor": 0 -} +} \ No newline at end of file diff --git a/docs/notebooks/text_to_text.md b/docs/notebooks/text_to_text.md index de16e5fce..fffe98052 100644 --- a/docs/notebooks/text_to_text.md +++ b/docs/notebooks/text_to_text.md @@ -7,8 +7,7 @@ jupyter: format_version: '1.3' jupytext_version: 1.14.1 kernelspec: - display_name: 'Python 3.7.15 (''.venv'': venv)' - language: python + display_name: Python 3 name: python3 --- @@ -30,6 +29,10 @@ This guide will lead you through an example use-case to show you how Finetuner c !pip install 'finetuner[full]' ``` +```python id="cu1HRyCAlH_A" +!pip install git+https://github.com/jina-ai/jina-hubble-sdk.git@fix-post-success-type +``` + ## Task @@ -65,13 +68,14 @@ We will use the [Quora Question Pairs](https://www.sbert.net/examples/training/q import finetuner from docarray import DocumentArray, Document -finetuner.login() +finetuner.login(force=True) + ``` ```python id="8PIO5T--p4tR" -train_data = DocumentArray.pull('quora_train.da', show_progress=True) -query_data = DocumentArray.pull('quora_query_dev.da', show_progress=True) -index_data = DocumentArray.pull('quora_index_dev.da', show_progress=True) +train_data = DocumentArray.pull('finetuner/quora-train-da', show_progress=True) +query_data = DocumentArray.pull('finetuner/quora-test-query-da', show_progress=True) +index_data = DocumentArray.pull('finetuner/quora-test-index-da', show_progress=True) train_data.summary() ``` @@ -108,7 +112,7 @@ from finetuner.callback import EvaluationCallback run = finetuner.fit( model='bert-base-cased', - train_data='quora_train.da', + train_data='finetuner/quora-train-da', loss='TripletMarginLoss', optimizer='Adam', learning_rate = 1e-5, @@ -117,8 +121,8 @@ run = finetuner.fit( device='cuda', callbacks=[ EvaluationCallback( - query_data='quora_query_dev.da', - index_data='quora_index_dev.da', + query_data='finetuner/quora-test-query-da', + index_data='finetuner/quora-test-index-da', batch_size=32 ) ] @@ -226,15 +230,15 @@ And finally you can use the embeded `query` to find top-k 
semantically related t query.match(index_data, limit=10, metric='cosine') ``` - + ## Before and after We can directly compare the results of our fine-tuned model with its zero-shot counterpart to get a better idea of how finetuning affects the results of a search. While the zero-shot model is able to produce results that are very similar to the initial query, it is common for the topic of the question to change, with the structure staying the same. After fine-tuning, the returned questions are consistently relevant to the initial query, even in cases where the structure of the sentence is different. ```python import copy -query_pt = DocumentArray.pull('quora_query_dev.da', show_progress=True) -index_pt = DocumentArray.pull('quora_index_dev.da', show_progress=True) +query_pt = DocumentArray.pull('finetuner/quora-test-query-da', show_progress=True) +index_pt = DocumentArray.pull('finetuner/quora-test-index-da', show_progress=True) query_ft = copy.deepcopy(query_pt) index_ft = copy.deepcopy(index_pt) diff --git a/docs/walkthrough/using-callbacks.md b/docs/walkthrough/using-callbacks.md index 492113e77..7ff2b109f 100644 --- a/docs/walkthrough/using-callbacks.md +++ b/docs/walkthrough/using-callbacks.md @@ -8,13 +8,13 @@ A run can be assigned multiple callbacks using the optional `callbacks` paramete run = finetuner.fit( model = 'resnet50', run_name = 'resnet-tll-early-6', - train_data = 'tll-train-da', + train_data = 'finetuner/tll-train-da', epochs = 5, learning_rate = 1e-6, callbacks=[ EvaluationCallback( - query_data='tll-test-query-da', - index_data='tll-test-index-da' + query_data='finetuner/tll-test-query-da', + index_data='finetuner/tll-test-index-da' ), EarlyStopping() ] @@ -121,7 +121,7 @@ from finetuner.callback import EarlyStopping, EvaluationCallback run = finetuner.fit( model='openai/clip-vit-base-patch32', run_name='clip-fashion-early', - train_data='clip-fashion-train-data', + train_data='finetuner/fashion-train-data-clip', epochs=10, learning_rate= 1e-5, loss='CLIPLoss', @@ -179,13 +179,13 @@ from finetuner.callback import WandBLogger, EvaluationCallback run = finetuner.fit( model='resnet50', run_name = 'resnet-tll-early-6', - train_data = 'tll-train-da', + train_data = 'finetuner/tll-train-da', epochs = 5, learning_rate = 1e-6, callbacks=[ EvaluationCallback( - query_data='tll-test-query-da', - index_data='tll-test-index-da' + query_data='finetuner/tll-test-query-da', + index_data='finetuner/tll-test-index-da' ), WandBLogger(), ]