
docs: refine documentation for 0.7 #643

Merged 70 commits on Jan 16, 2023
Changes from 17 commits
49cd801
docs: refine icons
bwanglzu Dec 29, 2022
b5d43fa
docs: add icons
bwanglzu Dec 29, 2022
c77aebe
docs: add advanced section
bwanglzu Dec 29, 2022
9a0f694
docs: restructure advanced topics
bwanglzu Dec 29, 2022
20c3169
docs: compact readme
bwanglzu Dec 30, 2022
7faa254
docs: improve value proposition
bwanglzu Dec 30, 2022
2f201eb
docs: improve value proposition
bwanglzu Dec 30, 2022
d36df78
docs: add pointnet to readme
bwanglzu Dec 30, 2022
053fed9
docs: rewrite how it works
bwanglzu Dec 30, 2022
03abcb2
docs: refine inference section
bwanglzu Dec 30, 2022
592ec5b
docs: create self-hosting page
bwanglzu Dec 30, 2022
7244027
docs: improve index inference delete principle
bwanglzu Jan 2, 2023
f03b745
docs: add clip inference
bwanglzu Jan 2, 2023
28d0124
docs: finish mining
bwanglzu Jan 2, 2023
01dd176
docs: improve callbacks
bwanglzu Jan 2, 2023
88a79f2
docs: add linear probe section
bwanglzu Jan 2, 2023
665097a
docs: remove unused code example
bwanglzu Jan 2, 2023
8b3e779
docs: improve api doc with autosummary
bwanglzu Jan 2, 2023
f373c0a
docs: fix grammar
bwanglzu Jan 2, 2023
f4663b8
docs: fix grammar
bwanglzu Jan 2, 2023
7bf0de9
docs: fix grammar
bwanglzu Jan 2, 2023
1cc833c
docs: fix grammar
bwanglzu Jan 2, 2023
343f002
docs: improve wording
bwanglzu Jan 2, 2023
20b469b
docs: fix grammar
bwanglzu Jan 2, 2023
f08de7c
docs: fix grammar
bwanglzu Jan 2, 2023
3b8dda4
docs: improve wording
bwanglzu Jan 2, 2023
02923ff
docs: fix grammar
bwanglzu Jan 2, 2023
5619c92
docs: improve wording
bwanglzu Jan 2, 2023
76ef462
chore: ignore line counts from ipynb
bwanglzu Jan 2, 2023
5be3549
docs: improve docstring
bwanglzu Jan 3, 2023
e65b9fd
docs: improve wording
bwanglzu Jan 3, 2023
3de8086
docs: create budget page
bwanglzu Jan 4, 2023
07f1071
docs: mimimize header
bwanglzu Jan 4, 2023
177cbbc
docs: rewrite mclip example
bwanglzu Jan 4, 2023
24f42d7
docs: improve wording
bwanglzu Jan 4, 2023
9318f70
docs: improve wording
bwanglzu Jan 4, 2023
549b794
docs: improve wording
bwanglzu Jan 4, 2023
b6f08fb
docs: improve wording
bwanglzu Jan 4, 2023
c8aa0e4
docs: fix grammar
bwanglzu Jan 4, 2023
3f3e163
docs: improve wording
bwanglzu Jan 4, 2023
93b1a58
docs: improve wording
bwanglzu Jan 4, 2023
f59ba5d
docs: improve wording
bwanglzu Jan 4, 2023
c5b698c
docs: improve wording
bwanglzu Jan 4, 2023
0163528
docs: improve wording
bwanglzu Jan 4, 2023
dc5f584
docs: improve wording
bwanglzu Jan 5, 2023
4b5cd3b
docs: improve wording
bwanglzu Jan 5, 2023
906563f
docs: improve wording
bwanglzu Jan 5, 2023
29a35fe
docs: improve wording
bwanglzu Jan 5, 2023
1136dee
docs: improve wording
bwanglzu Jan 5, 2023
35f10dc
docs: improve wording
bwanglzu Jan 5, 2023
9713314
docs: improve wording
bwanglzu Jan 5, 2023
f249a85
docs: improve wording
bwanglzu Jan 5, 2023
af39881
docs: improve wording
bwanglzu Jan 5, 2023
a2965d8
docs: improve wording
bwanglzu Jan 5, 2023
3a6bab8
docs: improve wording
bwanglzu Jan 5, 2023
c72f6b4
docs: improve wording
bwanglzu Jan 5, 2023
661bd34
docs: rename linear probe to projection head
bwanglzu Jan 5, 2023
5d4361c
refactor: solve merge conflict
guenthermi Jan 12, 2023
5e57b91
refactor: implement review comments
guenthermi Jan 12, 2023
68cddfb
chore: remove self-hosting section
guenthermi Jan 12, 2023
7a5aa20
docs: add hint and batch construction
guenthermi Jan 13, 2023
9466c52
refactor: Scotts review comments
guenthermi Jan 13, 2023
72a2349
refactor: further adjustments
guenthermi Jan 13, 2023
be373d2
refactor: Update docs/walkthrough/inference.md
guenthermi Jan 13, 2023
3dbb911
refactor: Update docs/walkthrough/inference.md
guenthermi Jan 13, 2023
bf57e8b
refactor: update notebooks
guenthermi Jan 16, 2023
0e2bc02
docs: update links
guenthermi Jan 16, 2023
d17f583
chore: update changelog
guenthermi Jan 16, 2023
86b4a5f
refactor: apply suggestions from code review
guenthermi Jan 16, 2023
9f69ab6
docs: add -
guenthermi Jan 16, 2023
153 changes: 26 additions & 127 deletions README.md
@@ -18,12 +18,17 @@

<!-- start elevator-pitch -->

Fine-tuning is an effective way to improve performance on neural search tasks. However, setting up and performing
fine-tuning can be very time-consuming and resource-intensive.
Fine-tuning is an effective way to improve performance on [neural search](https://jina.ai/news/what-is-neural-search-and-learn-to-build-a-neural-search-engine/) tasks.
However, setting up and performing fine-tuning can be very time-consuming and resource-intensive.

Jina AI's Finetuner makes fine-tuning easier and faster by streamlining the workflow and handling all complexity and
infrastructure in the cloud. With Finetuner, one can easily enhance the performance of pre-trained models, making them
production-ready without buying expensive hardware.
Jina AI's Finetuner makes fine-tuning easier and faster by streamlining the workflow and handling all complexity and infrastructure in the cloud.
With Finetuner, one can easily enhance the performance of pre-trained models,
making them production-ready [without extensive labeling](https://jina.ai/news/fine-tuning-with-low-budget-and-high-expectations/) and maintaining hardware.

🎏 **Better embeddings**: create high-quality embeddings for semantic search, visual similarity search, cross-modal text image search, recommendation,
clustering, duplication detection, anomaly detection etc.

⏰ **Low budget, high expectation**: effectively use a few hundred training samples and finish tuning within an hour while bringing considerable improvements.

📈 **Performance promise**: enhance the performance of pre-trained models and deliver state-of-the-art performance on
domain-specific neural search applications.
@@ -113,11 +118,26 @@ without worrying about resource availability, complex integration, or infrastructure
<td>0.340</td>
<td><span style="color:green">37.7%</span></td>
</tr>
<tr>
<td rowspan="2">PointNet++</td>
<td rowspan="2"><a href="https://modelnet.cs.princeton.edu/">ModelNet40</a> 3D Mesh Search</td>
<td>mRR</td>
<td>0.791</td>
<td>0.891</td>
<td><span style="color:green">12.7%</span></td>
<td rowspan="2"><p align=center><a href="https://colab.research.google.com/drive/1lIMDFkUVsWMshU-akJ_hwzBfJ37zLFzU?usp=sharing"><img alt="Open In Colab" src="https://colab.research.google.com/assets/colab-badge.svg"></a></p></td>
</tr>
<tr>
<td>Recall</td>
<td>0.154</td>
<td>0.242</td>
<td><span style="color:green">57.1%</span></td>
</tr>

</tbody>
</table>

<sub><sup>All metrics were evaluated for k@20 after training for 5 epochs using the Adam optimizer with learning rates of 1e-4 for ResNet, 1e-7 for CLIP and 1e-5 for the BERT models.</sup></sub>
<sub><sup>All metrics were evaluated for k@20 after training for 5 epochs using the Adam optimizer with learning rates of 1e-4 for ResNet, 1e-7 for CLIP, 1e-5 for the BERT models, and 5e-4 for PointNet++.</sup></sub>
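The percentage gains in the table are simple relative improvements over the pre-trained baseline; for instance, the PointNet++ Recall gain can be reproduced with a one-liner (the table's mRR figures may be rounded from unrounded metrics, so this is shown for the Recall row only):

```python
# Relative improvement of a fine-tuned metric over the pre-trained baseline.
def relative_gain(before: float, after: float) -> float:
    return (after - before) / before * 100

# PointNet++ Recall@20 on ModelNet40, values from the table above.
print(f"{relative_gain(0.154, 0.242):.1f}%")  # 57.1%
```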

<!-- start install-instruction -->

@@ -142,127 +162,6 @@
> ⚠️ Starting with version 0.5.0, Finetuner computing is performed on Jina AI Cloud. The last local version is `0.4.1`.
> This version is still available for installation via `pip`. See [Finetuner git tags and releases](https://github.com/jina-ai/finetuner/releases).





## Get Started

The following code snippet describes how to fine-tune ResNet50 on the [_Totally Looks Like_ dataset](https://sites.google.com/view/totally-looks-like-dataset).
You can run it as-is. The model and training data are already hosted in Jina AI Cloud and Finetuner will
download them automatically.
(NB: If there is already a run called `resnet50-tll-run`, choose a different run-name in the code below.)

```python
import finetuner
from finetuner.callback import EvaluationCallback

finetuner.login()

run = finetuner.fit(
    model='resnet50',
    run_name='resnet50-tll-run',
    train_data='tll-train-data',
    callbacks=[
        EvaluationCallback(
            query_data='tll-test-query-data',
            index_data='tll-test-index-data',
        )
    ],
)
```
This code snippet describes the following steps:

1. Log in to Jina AI Cloud.
2. Select the backbone model and the training and evaluation data for your evaluation callback.
3. Start the cloud run.

You can also pass data to Finetuner as a CSV file or a `DocumentArray` object, as described [in the Finetuner documentation](https://finetuner.jina.ai/walkthrough/create-training-data/).

Depending on the data, task, model, and hyperparameters, fine-tuning might take some time to finish. You can leave your jobs
running on the Jina AI Cloud and reconnect to them later using code like this:

```python
import finetuner

finetuner.login()

run = finetuner.get_run('resnet50-tll-run')

for log_entry in run.stream_logs():
    print(log_entry)

run.save_artifact('resnet-tll')
```

This code logs into Jina AI Cloud, then connects to your run by name. After that, it does the following:
* Monitors the status of the run and prints out the logs.
* Saves the model once fine-tuning is done.

## Using Finetuner to encode

Finetuner has interfaces for using models to do encoding:

```python
import finetuner
from docarray import Document, DocumentArray

da = DocumentArray([Document(uri='~/Pictures/your_img.png')])

model = finetuner.get_model('resnet-tll')
finetuner.encode(model=model, data=da)

da.summary()
```

When encoding, you can provide data either as a DocumentArray or a list. Since the modality of your input data can be inferred from the model being used, there is no need to provide any additional information besides the content you want to encode. When providing data as a list, the `finetuner.encode` method will return a `np.ndarray` of embeddings, instead of a `docarray.DocumentArray`:

```python
import finetuner
from docarray import Document, DocumentArray

images = ['~/Pictures/your_img.png']

model = finetuner.get_model('resnet-tll')
embeddings = finetuner.encode(model=model, data=images)
```

## Training on your own data

If you want to train a model using your own dataset instead of one on the Jina AI Cloud, you can provide labeled data in a CSV file.

A CSV file is a tab or comma-delimited plain text file. For example:

```plaintext
This is an apple apple_label
This is a pear pear_label
...
```
The file should have two columns: the first for the data and the second for the category label.
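As a minimal sketch, such a tab-delimited file can be produced with Python's `csv` module (the file name and labels here are illustrative, not required by Finetuner):

```python
import csv

# Write a tab-delimited training file: first column data, second column label.
rows = [
    ("This is an apple", "apple_label"),
    ("This is a pear", "pear_label"),
]
with open("data.csv", "w", newline="") as f:
    writer = csv.writer(f, delimiter="\t")
    writer.writerows(rows)
```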

You can then provide a path to a CSV file as training data for Finetuner:

```python
run = finetuner.fit(
    model='bert-base-cased',
    run_name='bert-my-own-run',
    train_data='path/to/some/data.csv',
)
```
More information on providing your own training data can be found in the [Prepare Training Data](https://finetuner.jina.ai/walkthrough/create-training-data/) section of the [Finetuner documentation](https://finetuner.jina.ai/).



### Next steps

- Take the [walkthrough](https://finetuner.jina.ai/walkthrough/) and submit your first fine-tuning job.
- Try out different search tasks:
- [Text-to-Text Search via BERT](https://finetuner.jina.ai/notebooks/text_to_text/)
- [Image-to-Image Search via ResNet50](https://finetuner.jina.ai/notebooks/image_to_image/)
- [Text-to-Image Search via CLIP](https://finetuner.jina.ai/notebooks/text_to_image/)

[Read our documentation](https://finetuner.jina.ai/) to learn more about what Finetuner can do.

<!-- start support-pitch -->
## Support

34 changes: 34 additions & 0 deletions docs/advanced-topics/budget.md
@@ -0,0 +1,34 @@
(budget)=
# {octicon}`database` How much data?

## Motivation

Fine-tuning is a transfer learning technique developed as part of the Deep Learning revolution in artificial intelligence.
Instead of learning a new task from scratch,
fine-tuning takes a pre-trained model,
trained on a related task, and then further trains it for the new task.
Alternatively, it can mean taking a model pre-trained for an open-domain task and further training it for a domain-specific one.
Compared to training from scratch, fine-tuning is a much more cost-efficient solution whenever it is feasible. It requires:

+ **less labeled data**: as there is no need to learn everything all over again. All the training is devoted to acquiring domain-specific knowledge.
+ **less time to train**: since the number of trainable parameters is much smaller and most layers of the deep neural network are frozen during fine-tuning.
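A minimal PyTorch sketch of this layer freezing (a toy model for illustration, not Finetuner's actual internals):

```python
import torch.nn as nn

# Toy backbone: freeze everything except the final projection layer.
model = nn.Sequential(
    nn.Linear(128, 64),
    nn.ReLU(),
    nn.Linear(64, 32),  # only this layer will be fine-tuned
)

for param in model.parameters():
    param.requires_grad = False
for param in model[-1].parameters():
    param.requires_grad = True

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"training {trainable}/{total} parameters")
```

Only the unfrozen parameters receive gradient updates, which is why fine-tuning converges in far fewer steps than training from scratch.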

But:

+ **Exactly how much data do you need to get a good result?** One labeled data point? Ten? One thousand? Ten thousand?
+ **Exactly how much time do you need to get good results?** One minute of fine-tuning? An hour? A day? A week?

## Experiments

We designed two experiments to quantitatively study how labeled data and training time affect fine-tuning performance.
For each experiment, we construct three search tasks by fine-tuning three deep neural networks.
We chose seven datasets, two of which are non-domain-specific public datasets, to ensure the generality of our experiment.

We measure the performance of fine-tuned models by evaluating their search results using Mean Reciprocal Rank (mRR), Recall, and Mean Average Precision (mAP).
These metrics are calculated using the top 20 results of each search in the validation subset held out from each dataset.
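As a minimal sketch of how such top-k metrics are computed (assuming, per query, a ranked result list and a set of relevant items; this is illustrative, not Finetuner's evaluation code):

```python
def mrr_at_k(results, relevant, k=20):
    """Mean Reciprocal Rank: average of 1/rank of the first relevant hit."""
    scores = []
    for ranked, rel in zip(results, relevant):
        score = 0.0
        for rank, item in enumerate(ranked[:k], start=1):
            if item in rel:
                score = 1.0 / rank
                break
        scores.append(score)
    return sum(scores) / len(scores)


def recall_at_k(results, relevant, k=20):
    """Fraction of relevant items retrieved in the top k, averaged over queries."""
    scores = []
    for ranked, rel in zip(results, relevant):
        hits = len(set(ranked[:k]) & rel)
        scores.append(hits / len(rel))
    return sum(scores) / len(scores)


# Two toy queries: ranked result ids and the set of truly relevant ids.
results = [['a', 'b', 'c'], ['x', 'y', 'z']]
relevant = [{'b'}, {'x', 'z'}]
print(mrr_at_k(results, relevant))     # (1/2 + 1/1) / 2 = 0.75
print(recall_at_k(results, relevant))  # (1/1 + 2/2) / 2 = 1.0
```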

### How much labeled data is needed?

### How much time is needed?

## Summary
@@ -1,91 +1,5 @@
# Encode Documents

Once fine-tuning is finished, it's time to actually use the model.
You can use the fine-tuned models directly to encode [DocumentArray](https://docarray.jina.ai/) objects or to set up an encoding service.
When encoding, data can also be provided as a regular list.

(integrate-with-docarray)=
## Embed DocumentArray

To embed a [DocumentArray](https://docarray.jina.ai/) with a fine-tuned model, you can get the model of your Run via the {func}`~finetuner.get_model` function and embed it via the {func}`finetuner.encode` function:

````{tab} Artifact id and token
```python
from docarray import DocumentArray, Document
import finetuner

finetuner.login()

token = finetuner.get_token()
run = finetuner.get_run(
    experiment_name='YOUR-EXPERIMENT',
    run_name='YOUR-RUN'
)

model = finetuner.get_model(
    run.artifact_id,
    token=token,
    device='cuda',  # model will be placed on cpu by default
)

da = DocumentArray([Document(text='some text to encode')])

finetuner.encode(model=model, data=da)

for doc in da:
    print(f'Text of the returned document: {doc.text}')
    print(f'Shape of the embedding: {doc.embedding.shape}')
```
````
````{tab} Locally saved artifact
```python
from docarray import DocumentArray, Document
import finetuner

model = finetuner.get_model('/path/to/YOUR-MODEL.zip')

da = DocumentArray([Document(text='some text to encode')])

finetuner.encode(model=model, data=da)

for doc in da:
    print(f'Text of the returned document: {doc.text}')
    print(f'Shape of the embedding: {doc.embedding.shape}')
```
````

```console
Text of the returned document: some text to encode
Shape of the embedding: (768,)
```

## Encoding a List
Data that is stored in a regular list can be embedded in the same way you would a [DocumentArray](https://docarray.jina.ai/). Since the modality of your input data can be inferred from the model being used, there is no need to provide any additional information besides the content you want to encode. When providing data as a list, the `finetuner.encode` method will return a `np.ndarray` of embeddings, instead of a `docarray.DocumentArray`:

```python
from docarray import DocumentArray, Document
import finetuner

model = finetuner.get_model('/path/to/YOUR-MODEL.zip')

texts = ['some text to encode']

embeddings = finetuner.encode(model=model, data=texts)

for text, embedding in zip(texts, embeddings):
    print(f'Text of the returned document: {text}')
    print(f'Shape of the embedding: {embedding.shape}')
```


```{admonition} Inference with ONNX
:class: tip
If you set `to_onnx=True` when calling the `finetuner.fit` function,
please use `model = finetuner.get_model('/path/to/YOUR-MODEL.zip', is_onnx=True)`
```

(integrate-with-jina)=
## Fine-tuned model as Executor
(finetuner-executor)=
# {octicon}`gear` Use FinetunerExecutor inside a Jina Flow

Finetuner, being part of the Jina AI Cloud, provides a convenient way to use tuned models via [Jina Executors](https://docs.jina.ai/fundamentals/executor/).

@@ -190,58 +104,6 @@ into the same vector space.
To use those models, you have to provide the name of the model via an additional
`select_model` parameter to the {func}`~finetuner.get_model` function.


````{tab} CLIP text model
```python
from docarray import DocumentArray, Document
import finetuner

finetuner.login()

token = finetuner.get_token()
run = finetuner.get_run(
    experiment_name='YOUR-EXPERIMENT',
    run_name='YOUR-RUN'
)

model = finetuner.get_model(
    run.artifact_id,
    token=token,
    device='cuda',
    select_model='clip-text'
)

da = DocumentArray([Document(text='some text to encode')])

finetuner.encode(model=model, data=da)
```
````
````{tab} CLIP vision model
```python
from docarray import DocumentArray, Document
import finetuner

finetuner.login()

token = finetuner.get_token()
run = finetuner.get_run(
    experiment_name='YOUR-EXPERIMENT',
    run_name='YOUR-RUN'
)

model = finetuner.get_model(
    run.artifact_id,
    token=token,
    device='cuda',
    select_model='clip-vision'
)

da = DocumentArray([Document(uri='~/Pictures/my_img.png')])

finetuner.encode(model=model, data=da)
```
````

If you want to host the CLIP models, you also have to provide the name of the model via the
`select_model` parameter inside the `uses_with` attribute:

@@ -264,4 +126,5 @@ f = Flow().add(
},
)

```