feat: add csv parsing for meshes and tutorial #638

Merged: 12 commits, Dec 21, 2022
6 changes: 6 additions & 0 deletions CHANGELOG.md
@@ -14,6 +14,8 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

- Add `core-ci` workflow to remotely run the ci of finetuner-core. ([#628](https://github.com/jina-ai/finetuner/pull/628))

- Add support for 3D meshes to `build_finetuning_dataset`. ([#638](https://github.com/jina-ai/finetuner/pull/638))

Contributor comment: typo

### Removed

- Remove `cpu` parameter from `create_run` function. ([#631](https://github.com/jina-ai/finetuner/pull/631))
@@ -42,6 +44,10 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

- Add multilingual clip colab to readme. ([#620](https://github.com/jina-ai/finetuner/pull/620))

- Add tutorial for mesh-to-mesh search. ([#638](https://github.com/jina-ai/finetuner/pull/638))

- Add documentation for the PointNet++ model and for handling 3D mesh datasets. ([#638](https://github.com/jina-ai/finetuner/pull/638))


## [0.6.7] - 2022-11-25

1 change: 1 addition & 0 deletions docs/index.md
@@ -35,6 +35,7 @@ notebooks/text_to_text
notebooks/image_to_image
notebooks/text_to_image
notebooks/multilingual_text_to_image
notebooks/mesh_to_mesh
```

```{toctree}
381 changes: 381 additions & 0 deletions docs/notebooks/mesh_to_mesh.ipynb

Large diffs are not rendered by default.

230 changes: 230 additions & 0 deletions docs/notebooks/mesh_to_mesh.md
@@ -0,0 +1,230 @@
---
jupyter:
jupytext:
text_representation:
extension: .md
format_name: markdown
format_version: '1.3'
jupytext_version: 1.14.1
kernelspec:
display_name: Python 3
name: python3
---

<!-- #region id="C0RxIJmLkTGk" -->
# 3D Mesh-to-3D Mesh Search via PointNet++

<a href="https://colab.research.google.com/drive/1lIMDFkUVsWMshU-akJ_hwzBfJ37zLFzU?usp=sharing"><img alt="Open In Colab" src="https://colab.research.google.com/assets/colab-badge.svg"></a>

Finding similar 3D meshes can be very time-consuming. To support this task, one can build a search system. To search directly on the 3D meshes without relying on metadata, one can use an encoder model that creates a point cloud from each mesh and encodes it into a dense vector representation; those vectors can then be compared to each other. To enable such models to capture the relevant attributes of a 3D mesh, this tutorial shows you how to use Finetuner to train and use a model for a 3D mesh search system.
<!-- #endregion -->

<!-- #region id="mk4gxLZnYJry" -->
## Install
<!-- #endregion -->

```python id="vDVkw65kkQcn"
!pip install 'finetuner[full]'
!pip install 'docarray[full]'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's remove this from the tutorial

```

<!-- #region id="q7Bb9o5ZHSZ3" -->
## Task

Finetuner supports an embedding model which is based on the PyTorch [implementation](https://github.com/yanx27/Pointnet_Pointnet2_pytorch) of the [PointNet++ model](https://proceedings.neurips.cc/paper/2017/file/d8bf84be3800d12f74d8b05e9b89836f-Paper.pdf). This tutorial will show you how to train and use this model for 3D mesh search.

We demonstrate this on the [ModelNet40](https://modelnet.cs.princeton.edu/) dataset, which consists of more than 12k 3D meshes of objects from 40 classes.
Specifically, we want to build a search system which receives a 3D mesh and retrieves meshes of the same class.
The following sections show how to prepare the data for this task and how to fine-tune and use the model.

<!-- #endregion -->

<!-- #region id="H1Yo3NuGP1Oi" -->
## Data

ModelNet40 consists of 9843 meshes provided for training and 2468 meshes for testing. Usually, you would have to download the [dataset](https://modelnet.cs.princeton.edu/), unzip it, and [prepare it and upload it to the Jina AI Cloud](https://finetuner.jina.ai/walkthrough/create-training-data/). After that, you can provide Finetuner with the name of the dataset used for the upload.

For this tutorial, we have already prepared the data and uploaded it. Specifically, the training data is uploaded as `modelnet40-train`. For evaluating the model, we split the test set of the original dataset into 300 meshes, which serve as queries (`modelnet40-queries`), and 2168 meshes, which serve as the mesh collection to be searched (`modelnet40-index`).

Each 3D mesh in the dataset is represented by a [DocArray](https://github.com/docarray/docarray) Document object. It contains the URI (local file path) of the original file and a tensor that holds a point cloud with 2048 3D points sampled from the mesh, as explained in (TODO add link to documentation).

```{admonition} Push data to the cloud
We don't require you to push data to the Jina AI Cloud by yourself. Instead of a name, you can provide a `DocumentArray` or a path to a CSV file.
In those cases, Finetuner will do the job for you.
When you construct a DocumentArray dataset with documents of 3D meshes, please call `doc.load_uri_to_point_cloud_tensor(2048)` to create point clouds from your local mesh files before pushing the data to the cloud, since Finetuner has no access to your local files.
```
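
For illustration, here is a minimal sketch of such a preparation step; the file paths and the dataset name `my-mesh-train-data` are hypothetical:

```python
from docarray import Document, DocumentArray

# hypothetical local mesh files; replace them with your own data
train_da = DocumentArray([
    Document(uri='meshes/desk-001.off', tags={'finetuner_label': 'desk'}),
    Document(uri='meshes/table-001.off', tags={'finetuner_label': 'table'}),
])

# create point clouds from the local files before uploading,
# since Finetuner has no access to your local file system
for doc in train_da:
    doc.load_uri_to_point_cloud_tensor(2048)

train_da.push('my-mesh-train-data')  # upload to the Jina AI Cloud
```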

The code below loads the data and prints a summary of the training datasets:
<!-- #endregion -->

```python id="uTDreSwfYGOR"
import finetuner
from docarray import DocumentArray, Document

finetuner.login(force=True)
```

```python id="Y-Um5gE8IORv"
train_data = DocumentArray.pull('modelnet40-train', show_progress=True)
query_data = DocumentArray.pull('modelnet40-queries', show_progress=True)
index_data = DocumentArray.pull('modelnet40-index', show_progress=True)

train_data.summary()
```
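
You can also inspect a single Document. As described above, its point cloud tensor should contain 2048 points with 3 coordinates each:

```python
doc = train_data[0]
print(doc.uri)           # path of the original mesh file
print(doc.tensor.shape)  # expected: (2048, 3), i.e. 2048 sampled 3D points
```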

<!-- #region id="r4cP95RzLybw" -->
Now we want to take a look at the point clouds of some of the meshes:
<!-- #endregion -->

```python id="kCv455NPMD1O"
index_data[0].display()
```

<!-- #region id="XlttkaD5Omhk" -->
![A point cloud example](https://user-images.githubusercontent.com/6599259/208113813-bcf498d9-edf7-4496-a087-03bb783f3b70.png)
<!-- #endregion -->

<!-- #region id="B3I_QUeFT_V0" -->
## Backbone model

The model we provide for 3D mesh encoding is called `pointnet++`. In the following, we show you how to train it on the ModelNet40 training dataset.
<!-- #endregion -->
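
If you want to inspect the properties of this backbone (output dimension, architecture) yourself, you can list it with `describe_models`, which also accepts a `task` filter:

```python
import finetuner

finetuner.describe_models(task='mesh-to-mesh')
```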

<!-- #region id="lqg0eY9oknLL" -->
## Fine-tuning

Now that we have data for training and evaluation, as well as the name of the model we want to train, we can configure and submit a fine-tuning run:
<!-- #endregion -->

```python id="rR22MbgITp8M"
from finetuner.callback import EvaluationCallback

run = finetuner.fit(
model='pointnet++',
train_data='modelnet40-train',
epochs=10,
batch_size=64,
learning_rate= 5e-4,
loss='TripletMarginLoss',
device='cuda',
callbacks=[
EvaluationCallback(
query_data='modelnet40-queries',
index_data='modelnet40-index',
batch_size=64,
)
],
)
```

<!-- #region id="ossT9LH1oh6K" -->
Let's understand what this piece of code does:

* We start by providing the `model` name, in our case "pointnet++".
* Via the `train_data` parameter, we inform Finetuner about the name of the dataset in the Jina AI Cloud.
* We also provide some hyper-parameters, such as the number of `epochs`, the `batch_size`, and the `learning_rate`.
* We use `TripletMarginLoss` to optimize the PointNet++ model.
* We use an evaluation callback, which uses the fine-tuned model to encode the queries and the meshes in the index data collection. It also accepts a `batch_size` attribute: by encoding 64 meshes at once, the evaluation gets faster.

<!-- #endregion -->

<!-- #region id="AsHsMJP6p7Co" -->
## Monitoring

Now that we've created a run, let's check its progress. You can monitor the run by checking the status via `run.status()` and view the logs with `run.logs()`. To stream the logs, call `run.stream_logs()`:
<!-- #endregion -->

```python id="PCCRZ6PalsK3"
# note: the fine-tuning might take ~20 minutes
for entry in run.stream_logs():
print(entry)
```
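
Alternatively, you can poll the run from time to time, for example:

```python
# prints the current state of the run, e.g. CREATED, STARTED or FINISHED
print(run.status())
```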

<!-- #region id="zG7Uci-qqkzM" -->
Since some runs might take up to several hours/days, it's important to know how to reconnect to Finetuner and retrieve your run.

```python
import finetuner

finetuner.login()
run = finetuner.get_run(run.name)
```

You can continue monitoring the run by checking the status - `finetuner.run.Run.status()` - or the logs - `finetuner.run.Run.logs()`.
<!-- #endregion -->

<!-- #region id="WgTrq9D5q0zc" -->
## Evaluating

The `EvaluationCallback` we passed to fine-tuning ensures that the model is evaluated after each epoch. We can access the results of the last evaluation in the logs via `print(run.logs())`:

```bash
Training [10/10] ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 154/154 0:00:00 0:00:26 • loss: 0.001
INFO Done ✨ __main__.py:195
DEBUG Finetuning took 0 days, 0 hours 5 minutes and 39 seconds __main__.py:197
INFO Metric: 'pointnet++_precision_at_k' before fine-tuning: 0.56533 after fine-tuning: 0.81100 __main__.py:210
INFO Metric: 'pointnet++_recall_at_k' before fine-tuning: 0.15467 after fine-tuning: 0.24175 __main__.py:210
INFO Metric: 'pointnet++_f1_score_at_k' before fine-tuning: 0.23209 after fine-tuning: 0.34774 __main__.py:210
INFO Metric: 'pointnet++_hit_at_k' before fine-tuning: 0.95667 after fine-tuning: 0.95333 __main__.py:210
INFO Metric: 'pointnet++_average_precision' before fine-tuning: 0.71027 after fine-tuning: 0.85515 __main__.py:210
INFO Metric: 'pointnet++_reciprocal_rank' before fine-tuning: 0.79103 after fine-tuning: 0.89103 __main__.py:210
INFO Metric: 'pointnet++_dcg_at_k' before fine-tuning: 4.71826 after fine-tuning: 6.41999 __main__.py:210
INFO Building the artifact ... __main__.py:215
INFO Saving artifact locally ... __main__.py:237
[15:46:55] INFO Artifact saved in artifacts/ __main__.py:239
DEBUG Artifact size is 27.379 MB __main__.py:245
INFO Finished 🚀 __main__.py:246

```

<!-- #endregion -->

<!-- #region id="W4ZCKUOfq9oC" -->

After the run has finished successfully, you can download the tuned model to your local machine:
<!-- #endregion -->

```python id="K5UdKleiqd8m"
artifact = run.save_artifact('pointnet_model')
```

<!-- #region id="JU3uUVyirTE1" -->
## Inference

Now that you have saved the `artifact` to your host machine,
let's use the fine-tuned model to encode a new `Document`:

```{admonition} Inference with ONNX
In case you set `to_onnx=True` when calling the `finetuner.fit` function,
please use `model = finetuner.get_model(artifact, is_onnx=True)`.
```
<!-- #endregion -->

```python id="rDGxi7kVq_sH"
query = DocumentArray([query_data[0]])

model = finetuner.get_model(artifact=artifact, device='cuda')

finetuner.encode(model=model, data=query)
finetuner.encode(model=model, data=index_data)

assert query.embeddings.shape == (1, 512)
```

<!-- #region id="pfoc4YG4rrkI" -->
And finally, you can use the embedded `query` to find the top-k similar meshes within `index_data` as follows:
<!-- #endregion -->

```python id="_jGsSyedrsJp"
query.match(index_data, limit=10, metric='cosine')
```
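
To inspect what was retrieved, you can, for instance, print the labels and similarity scores of the top matches. This small sketch assumes that the index Documents carry a `finetuner_label` tag, as the training data does:

```python
for match in query[0].matches:
    # class label of the match and its cosine distance to the query
    print(match.tags['finetuner_label'], match.scores['cosine'].value)
```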

<!-- #region id="CgZHPInNWWHn" -->
When investigating the matches, we can see that the model is able to identify similar meshes. However, this does not necessarily mean that all results are correct. For example, our first query (a mesh of a desk) returns results of which some are actual desks. Nevertheless, some results are tables, which look similar to the desk but have a different label:
![picture of query mesh and its matches](https://user-images.githubusercontent.com/6599259/208120667-c6633178-154c-40ab-a88c-0955b18d304b.png)
<!-- #endregion -->

```python id="JsV87_rrW4dT"

```
2 changes: 1 addition & 1 deletion docs/notebooks/multilingual_text_to_image.ipynb
@@ -5,7 +5,7 @@
"id": "72867ba9-6a8c-4b14-acbf-487ea0a61836",
"metadata": {},
"source": [
"# Multilingual Text-to-Image search with MultilingualCLIP\n",
"# Multilingual Text-to-Image Search with MultilingualCLIP\n",
"\n",
"<a href=\"https://colab.research.google.com/drive/1N7iWZV0OunFZSLtsQxoazS808MPXhCwq?usp=sharing\"><img alt=\"Open In Colab\" src=\"https://colab.research.google.com/assets/colab-badge.svg\"></a>\n"
]
2 changes: 1 addition & 1 deletion docs/notebooks/multilingual_text_to_image.md
@@ -12,7 +12,7 @@ jupyter:
name: python3
---

# Multilingual Text-to-Image Search with MultilingualCLIP

<a href="https://colab.research.google.com/drive/1N7iWZV0OunFZSLtsQxoazS808MPXhCwq?usp=sharing"><img alt="Open In Colab" src="https://colab.research.google.com/assets/colab-badge.svg"></a>

20 changes: 19 additions & 1 deletion docs/walkthrough/choose-backbone.md
@@ -32,7 +32,13 @@ import finetuner

finetuner.describe_models(task='text-to-image')
```
````
````{tab} mesh-to-mesh
```python
import finetuner

finetuner.describe_models(task='mesh-to-mesh')
```
````

To get a list of supported models:

@@ -104,12 +110,24 @@ To get a list of supported models:
└──────────────────────────────────────────────┴───────────────┴────────────┴──────────────┴──────────────────────────────────────────────────────────┘
```
````
````{tab} mesh-to-mesh
```bash
Finetuner backbones: mesh-to-mesh
┏━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ name ┃ task ┃ output_dim ┃ architecture ┃ description ┃
┡━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ pointnet++ │ mesh-to-mesh │ 512 │ pointnet │ PointNet++ embedding model for 3D mesh point clouds │
└────────────┴──────────────┴────────────┴──────────────┴─────────────────────────────────────────────────────┘
```
````

+ ResNets are suitable for image-to-image search tasks with high performance requirements, where `resnet152` is bigger and requires more computational resources than `resnet50`.
+ EfficientNets are suitable for image-to-image search tasks with low training and inference times. They are more lightweight than ResNets. Here, `efficientnet_b4` is the bigger and more complex model.
+ CLIP is the model of choice for text-to-image search, where the images do not need to have any text descriptors.
+ BERT is generally suitable for text-to-text search tasks.
+ Msmarco-distilbert-base-v3 is designed for matching web search queries to short text passages and is a suitable backbone for similar text-to-text search tasks.
+ PointNet++ is an embedding model which we derived from the popular [PointNet++ model](https://proceedings.neurips.cc/paper/2017/file/d8bf84be3800d12f74d8b05e9b89836f-Paper.pdf).
The original model is designed for classifying 3D meshes; our derived model can be used to encode meshes into vectors for search (see the sketch below).
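
For example, selecting a backbone comes down to passing its name to `finetuner.fit`. A minimal sketch, with a hypothetical dataset name:

```python
import finetuner

run = finetuner.fit(
    model='pointnet++',               # backbone name from the table above
    train_data='my-mesh-train-data',  # hypothetical dataset in the Jina AI Cloud
    loss='TripletMarginLoss',
)
```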

It should be noted that:

28 changes: 25 additions & 3 deletions docs/walkthrough/create-training-data.md
@@ -25,7 +25,7 @@ I'm sorry to have… apologize-english
Please, forgive me! apologize-english
```

When using image-to-image or mesh-to-mesh retrieval models, images and meshes can be represented as a URI or a path to a file:

```markdown
/Users/images/apples/green_apple.jpg picture of apple
```

@@ -49,8 +49,10 @@ run = finetuner.fit(

```{important}
If paths to local images are provided,
they can be loaded into memory by setting `convert_to_blob = True` (default) in the {class}`~finetuner.data.CSVOptions` object.
It is important to note that this setting does not cause Internet URLs to be loaded into memory.
For 3D meshes, the option `create_point_clouds` (`True` by default) creates point cloud tensors, which are used as input by the mesh encoding models.
Please note that local files cannot be processed by Finetuner if you deactivate `convert_to_blob` or `create_point_clouds`.
```
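
As a sketch of how these options are passed, assuming a hypothetical `meshes.csv` that lists local mesh files and their labels:

```python
import finetuner
from finetuner.data import CSVOptions

run = finetuner.fit(
    model='pointnet++',
    train_data='meshes.csv',  # hypothetical CSV of local mesh files and labels
    csv_options=CSVOptions(create_point_clouds=True),  # the default; builds point cloud tensors
)
```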

````
@@ -103,8 +105,11 @@ Please remove/replace comma in your data fields if you are using a comma `,` as

## Preparing a DocumentArray
When providing training data in a DocumentArray, each element is represented as a {class}`~docarray.document.Document`. You should assign a label to each {class}`~docarray.document.Document` inside your {class}`~docarray.array.document.DocumentArray`.
For most of the models, this is done by adding a `finetuner_label` tag to each document.
Only for cross-modality (text-to-image) fine-tuning with CLIP is this not necessary, as explained at the bottom of this section.
{class}`~docarray.document.Document`s containing uris that point to local images can load these images into memory using the {meth}`docarray.document.Document.load_uri_to_blob` function of that {class}`~docarray.document.Document`.
Similarly, {class}`~docarray.document.Document`s with URIs of local 3D meshes can be converted into point clouds, which are stored in the Document, by calling {meth}`docarray.document.Document.load_uri_to_point_cloud_tensor`.
The function requires the number of points to sample, which we recommend setting to 2048.
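
For example, with a hypothetical local file:

```python
from docarray import Document

doc = Document(uri='meshes/desk-001.off', tags={'finetuner_label': 'desk'})
doc.load_uri_to_point_cloud_tensor(2048)  # samples 2048 points from the mesh
```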


````{tab} text-to-text search
@@ -141,6 +146,23 @@ train_da = DocumentArray([
])
```
````
````{tab} mesh-to-mesh search
```python
from docarray import Document, DocumentArray

train_da = DocumentArray([
Document(
uri='https://...desk-001.off',
tags={'finetuner_label': 'desk'},
),
Document(
uri='https://...table-001.off',
tags={'finetuner_label': 'table'},
),
...,
])
```
````
````{tab} text-to-image search on CLIP
```python
from docarray import Document, DocumentArray