docs: rephrasing
lmmilliken committed Nov 23, 2022
1 parent bb651c9 commit 82c62d7
Showing 2 changed files with 8 additions and 8 deletions.
8 changes: 4 additions & 4 deletions docs/notebooks/using_mclip.ipynb
@@ -17,7 +17,7 @@
"\n",
"This guide will show you how to finetune a multilingual CLIP model for a text to image retrieval in non-English languages.\n",
"\n",
"*Note, please consider switching to GPU/TPU Runtime for faster inference.*\n"
"*Note, Check the runtime menu to me sure you are using a GPU./TPU instance, or this code will run very slowly.*\n"
]
},
{
@@ -51,7 +51,7 @@
"id": "ed1f88d4-f140-48d4-9d20-00e628c73e38",
"metadata": {},
"source": [
"We'll be finetuning multilingual CLIP on the `toloka-fashion` dataset, which contains information about fashion products, with all descriptions being in German. \n",
"We'll be fine-tuning multilingual CLIP on the `toloka-fashion` dataset, which contains images and descriptions of fashion products in German. \n",
"\n",
"Each product in the dataset contains several attributes, we will be making use of the image and category attributes to create a [`Document`](https://docarray.jina.ai/fundamentals/document/#document) containing two [chunks](https://docarray.jina.ai/fundamentals/document/nested/#nested-structure), one containing the image and another containing the category of the product."
]
@@ -62,7 +62,7 @@
"metadata": {},
"source": [
"## Data\n",
"We will use the `toloka-fashion` dataset, which we have already pre-processed and made available on the Jina AI Cloud. You can access it by like so:"
"We will use the `toloka-fashion` dataset, which we have already pre-processed and made available on the Jina AI Cloud. You can access it using `DocArray.pull`:"
]
},
{
@@ -97,7 +97,7 @@
"metadata": {},
"source": [
"## Backbone Model\n",
"Currently, we only support one multilingual CLIP model, which has been made available by [open-clip](https://github.com/mlfoundations/open_clip). This model is the `xlm-roberta-base-ViT-B-32`, which has been trained on the `laion5b` dataset."
"Currently, we only support one multilingual CLIP model. This model is the `xlm-roberta-base-ViT-B-32` from [open-clip](https://github.com/mlfoundations/open_clip), which has been trained on the [`laion5b` dataset](https://github.com/LAION-AI/laion5B-paper)."
]
},
{
8 changes: 4 additions & 4 deletions docs/notebooks/using_mclip.md
@@ -19,7 +19,7 @@ Most text-image models are only able to provide embeddings for text in a single

This guide will show you how to fine-tune a multilingual CLIP model for text-to-image retrieval in non-English languages.

*Note, please consider switching to GPU/TPU Runtime for faster inference.*
*Note, check the runtime menu to make sure you are using a GPU/TPU instance, or this code will run very slowly.*
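
If you want to verify this programmatically rather than through the runtime menu, a quick check (assuming PyTorch is installed, as Finetuner's CLIP backbones require) is:

```python
import torch

# True only if the runtime exposes a CUDA-capable GPU.
print(torch.cuda.is_available())
```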



@@ -32,13 +32,13 @@ This guide will show you how to finetune a multilingual CLIP model for a text to
## Task


We'll be finetuning multilingual CLIP on the `toloka-fashion` dataset, which contains information about fashion products, with all descriptions being in German.
We'll be fine-tuning multilingual CLIP on the `toloka-fashion` dataset, which contains images and descriptions of fashion products in German.

Each product in the dataset contains several attributes; we will use the image and category attributes to create a [`Document`](https://docarray.jina.ai/fundamentals/document/#document) containing two [chunks](https://docarray.jina.ai/fundamentals/document/nested/#nested-structure): one containing the image and the other containing the category of the product.
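
As a rough sketch of this structure, a single product could be assembled like so; the URI and category value here are invented placeholders, since the pre-processed dataset below already arrives in this shape:

```python
from docarray import Document

# A parent Document holding two chunks: one for the product image,
# one for its (German) category label. Both values are placeholders.
product = Document(
    chunks=[
        Document(uri='https://example.com/jacket.jpg', modality='image'),
        Document(text='Jacke', modality='text'),
    ]
)
product.summary()
```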


## Data
We will use the `toloka-fashion` dataset, which we have already pre-processed and made available on the Jina AI Cloud. You can access it by like so:
We will use the `toloka-fashion` dataset, which we have already pre-processed and made available on the Jina AI Cloud. You can access it using `DocumentArray.pull`:

```python
import finetuner
@@ -55,7 +55,7 @@ train_data.summary()
```
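
Because the middle of the snippet above is collapsed in this diff, here is a minimal, self-contained sketch of what pulling the data typically looks like; the artifact name is an assumption for illustration, not taken from the original:

```python
import finetuner
from docarray import DocumentArray

# Authenticate against the Jina AI Cloud so pull() can resolve the artifact.
finetuner.login()

# The artifact name below is a guess for illustration only.
train_data = DocumentArray.pull('finetuner/toloka-fashion-train-data')
train_data.summary()
```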

## Backbone Model
Currently, we only support one multilingual CLIP model, which has been made available by [open-clip](https://github.com/mlfoundations/open_clip). This model is the `xlm-roberta-base-ViT-B-32`, which has been trained on the `laion5b` dataset.
Currently, we only support one multilingual CLIP model. This model is the `xlm-roberta-base-ViT-B-32` from [open-clip](https://github.com/mlfoundations/open_clip), which has been trained on the [`laion5b` dataset](https://github.com/LAION-AI/laion5B-paper).
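
To confirm the backbone name before configuring a run, you can list the models Finetuner supports (a minimal sketch, assuming `finetuner.describe_models` is available in your installed version):

```python
import finetuner

# Prints a table of supported backbones; look for 'xlm-roberta-base-ViT-B-32'
# among the CLIP entries.
finetuner.describe_models()
```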


## Fine-tuning
