feat: add csv parsing for meshes and tutorial #638

Merged: 12 commits, Dec 21, 2022
6 changes: 6 additions & 0 deletions CHANGELOG.md
@@ -14,6 +14,8 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

- Add `core-ci` workflow to remotely run the ci of finetuner-core. ([#628](https://github.com/jina-ai/finetuner/pull/628))

- Add support for 3D meshes to `build_finetuning_dataset`. ([#638](https://github.com/jina-ai/finetuner/pull/638))

Contributor comment: typo

### Removed

- Remove `cpu` parameter from `create_run` function. ([#631](https://github.com/jina-ai/finetuner/pull/631))
@@ -42,6 +44,10 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

- Add multilingual clip colab to readme. ([#620](https://github.com/jina-ai/finetuner/pull/620))

- Add tutorial for mesh-to-mesh search. ([#638](https://github.com/jina-ai/finetuner/pull/638))

- Add documentation for the PointNet++ model and for handling 3D mesh datasets. ([#638](https://github.com/jina-ai/finetuner/pull/638))


## [0.6.7] - 2022-11-25

1 change: 1 addition & 0 deletions docs/index.md
@@ -35,6 +35,7 @@ notebooks/text_to_text
notebooks/image_to_image
notebooks/text_to_image
notebooks/multilingual_text_to_image
notebooks/mesh_to_mesh
```

```{toctree}
381 changes: 381 additions & 0 deletions docs/notebooks/mesh_to_mesh.ipynb

Large diffs are not rendered by default.

230 changes: 230 additions & 0 deletions docs/notebooks/mesh_to_mesh.md
@@ -0,0 +1,230 @@
---
jupyter:
jupytext:
text_representation:
extension: .md
format_name: markdown
format_version: '1.3'
jupytext_version: 1.14.1
kernelspec:
display_name: Python 3
name: python3
---

<!-- #region id="C0RxIJmLkTGk" -->
# 3D Mesh-to-3D Mesh Search via PointNet++

<a href="https://colab.research.google.com/drive/1lIMDFkUVsWMshU-akJ_hwzBfJ37zLFzU?usp=sharing"><img alt="Open In Colab" src="https://colab.research.google.com/assets/colab-badge.svg"></a>

Finding similar 3D meshes can be very time-consuming. To support this task, one can build a search system. To search directly on the 3D meshes without relying on metadata, one can use an encoder model that creates a point cloud from each mesh and encodes it into a dense vector representation; those vectors can then be compared to each other. To enable such models to capture the relevant attributes of a 3D mesh, this tutorial shows you how to use Finetuner to train and use a model for a 3D mesh search system.
<!-- #endregion -->

<!-- #region id="mk4gxLZnYJry" -->
## Install
<!-- #endregion -->

```python id="vDVkw65kkQcn"
!pip install 'finetuner[full]'
!pip install 'docarray[full]'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's remove this from the tutorial

```

<!-- #region id="q7Bb9o5ZHSZ3" -->
## Task

Finetuner supports an embedding model which is based on the PyTorch [implementation](https://github.com/yanx27/Pointnet_Pointnet2_pytorch) of the [PointNet++ model](https://proceedings.neurips.cc/paper/2017/file/d8bf84be3800d12f74d8b05e9b89836f-Paper.pdf). This tutorial will show you how to train and use this model for 3D mesh search.

We demonstrate this on the [ModelNet40](https://modelnet.cs.princeton.edu/) dataset, which consists of more than 12k 3D meshes of objects from 40 classes.
Specifically, we want to build a search system which receives a 3D mesh and retrieves meshes of the same class.
The following sections show how to prepare the data for this task and how to fine-tune and use the model.

<!-- #endregion -->

<!-- #region id="H1Yo3NuGP1Oi" -->
## Data

ModelNet40 consists of 9843 meshes provided for training and 2468 meshes for testing. Usually, you would have to download the [dataset](https://modelnet.cs.princeton.edu/), unzip it, and [prepare it and upload it to the Jina AI Cloud](https://finetuner.jina.ai/walkthrough/create-training-data/). After that, you can provide Finetuner with the name of the dataset used for the upload.

For this tutorial, we have already prepared the data and uploaded it. Specifically, the training data is uploaded as `modelnet40-train`. For evaluating the model, we split the test set of the original dataset into 300 meshes, which serve as queries (`modelnet40-queries`), and 2168 meshes, which serve as the mesh collection to be searched (`modelnet40-index`).

Each 3D mesh in the dataset is represented by a [DocArray](https://github.com/docarray/docarray) Document object. It contains the URI (local file path) of the original file and a tensor that holds a point cloud with 2048 3D points sampled from the mesh, as explained in (TODO add link to documentation).

```{admonition} Push data to the cloud
We don't require you to push data to the Jina AI Cloud by yourself. Instead of a name, you can provide a `DocumentArray` or a path to a CSV file.
In those cases, Finetuner will do the job for you.
When you construct a DocumentArray dataset with documents of 3D meshes, please call `doc.load_uri_to_point_cloud_tensor(2048)` to create point clouds from your local mesh files before pushing the data to the cloud, since Finetuner has no access to your local files.
```
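
For illustration, here is a minimal sketch of such a preparation step; the file paths and the dataset name `my-mesh-train-data` are hypothetical:

```python
from docarray import Document, DocumentArray

# hypothetical local mesh files; replace them with your own data
train_da = DocumentArray([
    Document(uri='meshes/desk-001.off', tags={'finetuner_label': 'desk'}),
    Document(uri='meshes/table-001.off', tags={'finetuner_label': 'table'}),
])

# create point clouds from the local files before uploading,
# since Finetuner has no access to your local file system
for doc in train_da:
    doc.load_uri_to_point_cloud_tensor(2048)

train_da.push('my-mesh-train-data')  # upload to the Jina AI Cloud
```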

The code below loads the data and prints a summary of the training datasets:
<!-- #endregion -->

```python id="uTDreSwfYGOR"
import finetuner
from docarray import DocumentArray, Document

finetuner.login(force=True)
```

```python id="Y-Um5gE8IORv"
train_data = DocumentArray.pull('modelnet40-train', show_progress=True)
query_data = DocumentArray.pull('modelnet40-queries', show_progress=True)
index_data = DocumentArray.pull('modelnet40-index', show_progress=True)

train_data.summary()
```
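
You can also inspect a single Document. As described above, its point cloud tensor should contain 2048 points with 3 coordinates each:

```python
doc = train_data[0]
print(doc.uri)           # path of the original mesh file
print(doc.tensor.shape)  # expected: (2048, 3), i.e. 2048 sampled 3D points
```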

<!-- #region id="r4cP95RzLybw" -->
Now we want to take a look at the point clouds of some of the meshes:
<!-- #endregion -->

```python id="kCv455NPMD1O"
index_data[0].display()
```

<!-- #region id="XlttkaD5Omhk" -->
![A point cloud example](https://user-images.githubusercontent.com/6599259/208113813-bcf498d9-edf7-4496-a087-03bb783f3b70.png)
<!-- #endregion -->

<!-- #region id="B3I_QUeFT_V0" -->
## Backbone model

The model we provide for 3D mesh encoding is called `pointnet++`. In the following, we show you how to train it on the ModelNet40 training dataset.
<!-- #endregion -->
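
If you want to inspect the properties of this backbone (output dimension, architecture) yourself, you can list it with `describe_models`, which also accepts a `task` filter:

```python
import finetuner

finetuner.describe_models(task='mesh-to-mesh')
```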

<!-- #region id="lqg0eY9oknLL" -->
## Fine-tuning

Now that we have data for training and evaluation, as well as the name of the model we want to train, we can configure and submit a fine-tuning run:
<!-- #endregion -->

```python id="rR22MbgITp8M"
from finetuner.callback import EvaluationCallback

run = finetuner.fit(
model='pointnet++',
train_data='modelnet40-train',
epochs=10,
batch_size=64,
learning_rate= 5e-4,
loss='TripletMarginLoss',
device='cuda',
callbacks=[
EvaluationCallback(
query_data='modelnet40-queries',
index_data='modelnet40-index',
batch_size=64,
)
],
)
```

<!-- #region id="ossT9LH1oh6K" -->
Let's understand what this piece of code does:

* We start by providing the `model` name, in our case "pointnet++".
* Via the `train_data` parameter, we inform Finetuner about the name of the dataset in the Jina AI Cloud.
* We also provide some hyper-parameters, such as the number of `epochs`, the `batch_size`, and the `learning_rate`.
* We use `TripletMarginLoss` to optimize the PointNet++ model.
* We use an evaluation callback, which uses the fine-tuned model to encode the queries and the meshes in the index data collection. It also accepts a `batch_size` attribute: by encoding 64 meshes at once, the evaluation gets faster.

<!-- #endregion -->

<!-- #region id="AsHsMJP6p7Co" -->
## Monitoring

Now that we've created a run, let's check its progress. You can monitor the run by checking the status via `run.status()` and view the logs with `run.logs()`. To stream the logs, call `run.stream_logs()`:
<!-- #endregion -->

```python id="PCCRZ6PalsK3"
# note: the fine-tuning might take ~20 minutes
for entry in run.stream_logs():
print(entry)
```
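
Alternatively, you can poll the run from time to time, for example:

```python
# prints the current state of the run, e.g. CREATED, STARTED or FINISHED
print(run.status())
```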

<!-- #region id="zG7Uci-qqkzM" -->
Since some runs might take up to several hours/days, it's important to know how to reconnect to Finetuner and retrieve your run.

```python
import finetuner

finetuner.login()
run = finetuner.get_run(run.name)
```

You can continue monitoring the run by checking the status - `finetuner.run.Run.status()` - or the logs - `finetuner.run.Run.logs()`.
<!-- #endregion -->

<!-- #region id="WgTrq9D5q0zc" -->
## Evaluating

The `EvaluationCallback` we passed to fine-tuning ensures that the model is evaluated after each epoch. We can access the results of the last evaluation in the logs via `print(run.logs())`:

```bash
Training [10/10] ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 154/154 0:00:00 0:00:26 • loss: 0.001
INFO Done ✨ __main__.py:195
DEBUG Finetuning took 0 days, 0 hours 5 minutes and 39 seconds __main__.py:197
INFO Metric: 'pointnet++_precision_at_k' before fine-tuning: 0.56533 after fine-tuning: 0.81100 __main__.py:210
INFO Metric: 'pointnet++_recall_at_k' before fine-tuning: 0.15467 after fine-tuning: 0.24175 __main__.py:210
INFO Metric: 'pointnet++_f1_score_at_k' before fine-tuning: 0.23209 after fine-tuning: 0.34774 __main__.py:210
INFO Metric: 'pointnet++_hit_at_k' before fine-tuning: 0.95667 after fine-tuning: 0.95333 __main__.py:210
INFO Metric: 'pointnet++_average_precision' before fine-tuning: 0.71027 after fine-tuning: 0.85515 __main__.py:210
INFO Metric: 'pointnet++_reciprocal_rank' before fine-tuning: 0.79103 after fine-tuning: 0.89103 __main__.py:210
INFO Metric: 'pointnet++_dcg_at_k' before fine-tuning: 4.71826 after fine-tuning: 6.41999 __main__.py:210
INFO Building the artifact ... __main__.py:215
INFO Saving artifact locally ... __main__.py:237
[15:46:55] INFO Artifact saved in artifacts/ __main__.py:239
DEBUG Artifact size is 27.379 MB __main__.py:245
INFO Finished 🚀 __main__.py:246

```

<!-- #endregion -->

<!-- #region id="W4ZCKUOfq9oC" -->

After the run has finished successfully, you can download the tuned model to your local machine:
<!-- #endregion -->

```python id="K5UdKleiqd8m"
artifact = run.save_artifact('pointnet_model')
```

<!-- #region id="JU3uUVyirTE1" -->
## Inference

Now that you have saved the `artifact` to your host machine,
let's use the fine-tuned model to encode a new `Document`:

```{admonition} Inference with ONNX
In case you set `to_onnx=True` when calling the `finetuner.fit` function,
please use `model = finetuner.get_model(artifact, is_onnx=True)`.
```
<!-- #endregion -->

```python id="rDGxi7kVq_sH"
query = DocumentArray([query_data[0]])

model = finetuner.get_model(artifact=artifact, device='cuda')

finetuner.encode(model=model, data=query)
finetuner.encode(model=model, data=index_data)

assert query.embeddings.shape == (1, 512)
```

<!-- #region id="pfoc4YG4rrkI" -->
And finally, you can use the embedded `query` to find the top-k similar meshes within `index_data` as follows:
<!-- #endregion -->

```python id="_jGsSyedrsJp"
query.match(index_data, limit=10, metric='cosine')
```
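
To inspect what was retrieved, you can, for instance, print the labels and similarity scores of the top matches. This small sketch assumes that the index Documents carry a `finetuner_label` tag, as the training data does:

```python
for match in query[0].matches:
    # class label of the match and its cosine distance to the query
    print(match.tags['finetuner_label'], match.scores['cosine'].value)
```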

<!-- #region id="CgZHPInNWWHn" -->
When investigating the matches, we can see that the model is able to identify similar meshes. However, this does not necessarily mean that all results are correct. For example, our first query (a mesh of a desk) returns results of which some are actual desks. Nevertheless, some results are tables, which look similar to the desk but have a different label:
![picture of query mesh and its matches](https://user-images.githubusercontent.com/6599259/208120667-c6633178-154c-40ab-a88c-0955b18d304b.png)
<!-- #endregion -->

```python id="JsV87_rrW4dT"

```
2 changes: 1 addition & 1 deletion docs/notebooks/multilingual_text_to_image.ipynb
@@ -5,7 +5,7 @@
"id": "72867ba9-6a8c-4b14-acbf-487ea0a61836",
"metadata": {},
"source": [
"# Multilingual Text-to-Image search with MultilingualCLIP\n",
"# Multilingual Text-to-Image Search with MultilingualCLIP\n",
"\n",
"<a href=\"https://colab.research.google.com/drive/1N7iWZV0OunFZSLtsQxoazS808MPXhCwq?usp=sharing\"><img alt=\"Open In Colab\" src=\"https://colab.research.google.com/assets/colab-badge.svg\"></a>\n"
]
2 changes: 1 addition & 1 deletion docs/notebooks/multilingual_text_to_image.md
@@ -12,7 +12,7 @@ jupyter:
name: python3
---

# Multilingual Text-to-Image Search with MultilingualCLIP

<a href="https://colab.research.google.com/drive/1N7iWZV0OunFZSLtsQxoazS808MPXhCwq?usp=sharing"><img alt="Open In Colab" src="https://colab.research.google.com/assets/colab-badge.svg"></a>

20 changes: 19 additions & 1 deletion docs/walkthrough/choose-backbone.md
@@ -32,7 +32,13 @@ import finetuner

finetuner.describe_models(task='text-to-image')
```
````
````{tab} mesh-to-mesh
```python
import finetuner

finetuner.describe_models(task='mesh-to-mesh')
```
````

To get a list of supported models:

@@ -104,12 +110,24 @@ To get a list of supported models:
└──────────────────────────────────────────────┴───────────────┴────────────┴──────────────┴──────────────────────────────────────────────────────────┘
```
````
````{tab} mesh-to-mesh
```bash
Finetuner backbones: mesh-to-mesh
┏━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ name ┃ task ┃ output_dim ┃ architecture ┃ description ┃
┡━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ pointnet++ │ mesh-to-mesh │ 512 │ pointnet │ PointNet++ embedding model for 3D mesh point clouds │
└────────────┴──────────────┴────────────┴──────────────┴─────────────────────────────────────────────────────┘
```
````

+ ResNets are suitable for image-to-image search tasks with high performance requirements, where `resnet152` is bigger and requires more computational resources than `resnet50`.
+ EfficientNets are suitable for image-to-image search tasks with low training and inference times. They are more lightweight than ResNets. Here, `efficientnet_b4` is the bigger and more complex model.
+ CLIP is the model of choice for text-to-image search, where the images do not need to have any text descriptors.
+ BERT is generally suitable for text-to-text search tasks.
+ Msmarco-distilbert-base-v3 is designed for matching web search queries to short text passages and is a suitable backbone for similar text-to-text search tasks.
+ PointNet++ is an embedding model which we derived from the popular [PointNet++ model](https://proceedings.neurips.cc/paper/2017/file/d8bf84be3800d12f74d8b05e9b89836f-Paper.pdf).
The original model is designed for classifying 3D meshes; our derived model can be used to encode meshes into vectors for search (see the sketch below).
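
For example, selecting a backbone comes down to passing its name to `finetuner.fit`. A minimal sketch, with a hypothetical dataset name:

```python
import finetuner

run = finetuner.fit(
    model='pointnet++',               # backbone name from the table above
    train_data='my-mesh-train-data',  # hypothetical dataset in the Jina AI Cloud
    loss='TripletMarginLoss',
)
```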

It should be noted that:

28 changes: 25 additions & 3 deletions docs/walkthrough/create-training-data.md
@@ -25,7 +25,7 @@ I'm sorry to have… apologize-english
Please, forgive me! apologize-english
```

When using image-to-image or mesh-to-mesh retrieval models, images and meshes can be represented as a URI or a path to a file:

```markdown
/Users/images/apples/green_apple.jpg picture of apple
```

@@ -49,8 +49,10 @@ run = finetuner.fit(

```{important}
If paths to local images are provided,
they can be loaded into memory by setting `convert_to_blob = True` (default) in the {class}`~finetuner.data.CSVOptions` object.
It is important to note that this setting does not cause Internet URLs to be loaded into memory.
For 3D meshes, the option `create_point_clouds` (`True` by default) creates point cloud tensors, which are used as input by the mesh encoding models.
Please note that local files cannot be processed by Finetuner if you deactivate `convert_to_blob` or `create_point_clouds`.
```
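
As a sketch of how these options are passed, assuming a hypothetical `meshes.csv` that lists local mesh files and their labels:

```python
import finetuner
from finetuner.data import CSVOptions

run = finetuner.fit(
    model='pointnet++',
    train_data='meshes.csv',  # hypothetical CSV of local mesh files and labels
    csv_options=CSVOptions(create_point_clouds=True),  # the default; builds point cloud tensors
)
```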

````
@@ -103,8 +105,11 @@ Please remove/replace comma in your data fields if you are using a comma `,` as

## Preparing a DocumentArray
When providing training data in a DocumentArray, each element is represented as a {class}`~docarray.document.Document`. You should assign a label to each {class}`~docarray.document.Document` inside your {class}`~docarray.array.document.DocumentArray`.
For most of the models, this is done by adding a `finetuner_label` tag to each document.
Only for cross-modality (text-to-image) fine-tuning with CLIP is this not necessary, as explained at the bottom of this section.
{class}`~docarray.document.Document`s containing uris that point to local images can load these images into memory using the {meth}`docarray.document.Document.load_uri_to_blob` function of that {class}`~docarray.document.Document`.
Similarly, {class}`~docarray.document.Document`s with URIs of local 3D meshes can be converted into point clouds, which are stored in the Document, by calling {meth}`docarray.document.Document.load_uri_to_point_cloud_tensor`.
The function requires the number of points to sample, which we recommend setting to 2048.
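
For example, with a hypothetical local file:

```python
from docarray import Document

doc = Document(uri='meshes/desk-001.off', tags={'finetuner_label': 'desk'})
doc.load_uri_to_point_cloud_tensor(2048)  # samples 2048 points from the mesh
```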


````{tab} text-to-text search
@@ -141,6 +146,23 @@ train_da = DocumentArray([
])
```
````
````{tab} mesh-to-mesh search
```python
from docarray import Document, DocumentArray

train_da = DocumentArray([
Document(
uri='https://...desk-001.off',
tags={'finetuner_label': 'desk'},
),
Document(
uri='https://...table-001.off',
tags={'finetuner_label': 'table'},
),
...,
])
```
````
````{tab} text-to-image search on CLIP
```python
from docarray import Document, DocumentArray