
docs: add colab column #583

Merged

merged 19 commits into main from docs-colab on Oct 27, 2022

Conversation

@bwanglzu (Member) commented Oct 19, 2022

Removed all three examples and replaced them with three Google Colabs (links above). The Colabs are embedded into the documentation page so that we only maintain a single notebook per task. How to use?

  1. Update the Google Colab.
  2. Export the Colab as `.ipynb` and download it to the `docs/notebooks` folder.
  3. Run `make notebook` in the `docs` folder; this generates user-friendly Markdown from the notebook using `jupytext` (see the sketch after this list).
  4. Run `make dirhtml` locally to see the generated notebooks.
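
For reference, a minimal sketch of what the Markdown conversion step might do, using the `jupytext` Python API; the notebook filename is hypothetical and the actual Makefile target may invoke the `jupytext` CLI instead:

```python
import jupytext

# Read the notebook exported from Google Colab (hypothetical filename).
notebook = jupytext.read('docs/notebooks/image_to_image.ipynb')

# Write it out as MyST Markdown, which the Sphinx docs build can render.
jupytext.write(notebook, 'docs/notebooks/image_to_image.md', fmt='md:myst')
```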

This allows us to potentially integration-test all the Colabs end-to-end periodically using nbsphinx (if we can log in).

review it here

in docs:

[screenshot]

in readme:

[screenshot]


  • This PR references an open issue
  • I have added a line about this change to CHANGELOG

@github-actions bot added size/xl and removed size/xs labels Oct 19, 2022
@github-actions

This PR exceeds the recommended size of 1000 lines. Please make sure you are NOT addressing multiple issues with one PR. Note this PR might be rejected due to its size.


@bwanglzu marked this pull request as ready for review October 19, 2022 20:30
setup.py Outdated
@@ -27,7 +27,7 @@
zip_safe=False,
setup_requires=['setuptools>=18.0', 'wheel'],
install_requires=[
-        'docarray[common]>=0.13.31',
+        'docarray[common]>=0.17.1.dev40',
Member

why are we using a dev req here?


@guenthermi (Member)

I very much like making the examples runnable; however, I was wondering if it wouldn't be a cleaner solution to implement this via Binder, as presented by @ZiniuYu: https://docs.google.com/presentation/d/1Yxwl4z7wBnOo9QFQ0LrpvU7vx-_Z6mSdmqbdncpi5tI/edit#slide=id.ge5ae5cb924_2_23

"Finetuner will trigger the callback when fine-tuning job finished and merge the weights between the pre-trained model and the fine-tuned model:\n",
"\n",
"```diff\n",
"from finetuner.callbakcs import WiSEFTCallback\n",
Member

This import is wrong and has to be fixed. It should be:
from finetuner.callback import WiSEFTCallback
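
For context, a minimal sketch of how the corrected import might be used, assuming `WiSEFTCallback` accepts an `alpha` parameter for blending weights, and with hypothetical model and dataset names:

```python
import finetuner
from finetuner.callback import WiSEFTCallback

run = finetuner.fit(
    model='openai/clip-vit-base-patch32',  # hypothetical CLIP model name
    train_data='my-clip-train-data',       # hypothetical dataset name
    loss='CLIPLoss',
    # alpha is assumed to control how the pre-trained and fine-tuned
    # weights are merged; check the finetuner docs for the exact signature.
    callbacks=[WiSEFTCallback(alpha=0.5)],
)
```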


@guenthermi (Member)

Maybe add a small description of the notebook action in https://github.com/jina-ai/finetuner/blob/main/CONTRIBUTING.md


@gmastrapas (Member) left a comment

Initial comments before going through the actual content

docs/Makefile (resolved)
docs/requirements.txt (outdated, resolved)
@@ -178,7 +178,7 @@ all you need to do is use the `WiSEFTCallback`.
Finetuner will trigger the callback when fine-tuning job finished and merge the weights between the pre-trained model and the fine-tuned model:

```diff
- from finetuner.callbakcs import WiSEFTCallback
+ from finetuner.callback import WiSEFTCallback
Member

why are we keeping the tasks section?

Member Author

I have removed tasks; weird, it seems not reflected in the PR.



<a href="https://colab.research.google.com/drive/1QuUTy3iVR-kTPljkwplKYaJ-NTCgPEc_?usp=sharing"><img alt="Open In Colab" src="https://colab.research.google.com/assets/colab-badge.svg"></a>

Searching visually similar images with image queries is a very popular use-case. However, using pre-trained models don’t deliver the best results – the models are trained on general data that lacks the particularities of your specific task. Here's where Finetuner comes! It enables you to accomplish this easily.
Member

Suggested change
Searching visually similar images with image queries is a very popular use-case. However, using pre-trained models don’t deliver the best results – the models are trained on general data that lacks the particularities of your specific task. Here's where Finetuner comes! It enables you to accomplish this easily.
Searching visually similar images with image queries is a very popular use-case. However, using pre-trained models does not deliver the best results – the models are trained on general data that lack the particularities of your specific task. Here's where Finetuner comes in! It enables you to accomplish this easily.


This guide will demonstrate how to fine-tune a ResNet model for image to image retrieval.

*Note, please consider switch to GPU/TPU Runtime for faster inference.*
Member

Suggested change
*Note, please consider switch to GPU/TPU Runtime for faster inference.*
*Note, please consider switching to GPU/TPU Runtime for faster inference.*


<!-- #region id="mUoY1jq0klwk" -->
## Backbone model
Now let's see which backbone models we can use. You can see available models either in by calling `finetuner.describe_models()`.
Member

Suggested change
Now let's see which backbone models we can use. You can see available models either in by calling `finetuner.describe_models()`.
Now let's see which backbone models we can use. You can see available models by calling `finetuner.describe_models()`.
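
A quick sketch of that call; `login()` authenticates against Jina AI Cloud first:

```python
import finetuner

finetuner.login()

# Prints a table of all backbone models available for fine-tuning.
finetuner.describe_models()
```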

* Furthermore, we had to provide names of the `train_data`.
* We set `TripletMarginLoss`.
* Additionally, we use {class}`~finetuner.callback.EvaluationCallback` for evaluation.
* Lastly, we set number of `epochs` and provide a `learning_rate`.
Member

Suggested change
* Lastly, we set number of `epochs` and provide a `learning_rate`.
* Lastly, we set the number of `epochs` and provide a `learning_rate`.
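
Taken together, the bullet points above correspond to a `finetuner.fit` call along these lines; a minimal sketch with hypothetical dataset names:

```python
import finetuner
from finetuner.callback import EvaluationCallback

run = finetuner.fit(
    model='resnet50',
    train_data='tll-train-data',  # hypothetical dataset name
    loss='TripletMarginLoss',
    callbacks=[
        # Evaluate retrieval quality against hypothetical query/index sets.
        EvaluationCallback(query_data='tll-query-data', index_data='tll-index-data'),
    ],
    epochs=5,
    learning_rate=1e-5,
)
```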

<!-- #region id="7ftSOH_olcak" -->
## Monitoring

Now that we've created a run, let's see its status. You can monitor the run by checking the status - `run.status()`, the logs - `run.logs()` or `run.stream_logs()`.
Member

Suggested change
Now that we've created a run, let's see its status. You can monitor the run by checking the status - `run.status()`, the logs - `run.logs()` or `run.stream_logs()`.
Now that we've created a run, let's see its status. You can monitor the run by checking the status - `run.status()` - and the logs - `run.logs()` or `run.stream_logs()`.
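
A short sketch of what that monitoring looks like in practice:

```python
# Check the run's current state, e.g. CREATED, STARTED or FINISHED.
print(run.status())

# Stream the logs until the run finishes.
for entry in run.stream_logs():
    print(entry)
```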

<!-- #region id="SfR6g0E_8fOz" -->
## Data

We will use the [Quora Question Pairs](https://www.sbert.net/examples/training/quora_duplicate_questions/README.html?highlight=quora#dataset) dataset to show-case Finetuner for text to text search. We have already pre-processed this dataset and made it available for you to pull from hubble. Do this as follows:
Member

Suggested change
We will use the [Quora Question Pairs](https://www.sbert.net/examples/training/quora_duplicate_questions/README.html?highlight=quora#dataset) dataset to show-case Finetuner for text to text search. We have already pre-processed this dataset and made it available for you to pull from hubble. Do this as follows:
We will use the [Quora Question Pairs](https://www.sbert.net/examples/training/quora_duplicate_questions/README.html?highlight=quora#dataset) dataset to show-case Finetuner for text to text search. We have already pre-processed this dataset and made it available for you to pull from Jina AI Cloud. Do this as follows:
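
For illustration, pulling the dataset might look like this; a sketch assuming the docarray 0.x API and a hypothetical dataset name:

```python
from docarray import DocumentArray

# Pull the pre-processed Quora dataset from Jina AI Cloud (hypothetical name).
train_data = DocumentArray.pull('quora-train-data')
train_data.summary()
```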

<!-- #region id="r_IlEIp59g9v" -->
So we have 104598 training `Document`s. Each `Document` consists of a text field that contains the question, as well as a `finetuner_label` which indicates the label to which the question belongs. If multiple questions have the same label, they are duplicates of one another. If they have different `finetuner_label`s, they are not duplicates of each other.

As for the evaluation dataset, we load `query_data` and `index_data` separately. The `query_data` has the same structure as the `train_data`, consisting of labelled documents. The `index_data` is the data against which the queries will be matched, and contains many documents, some of which may be irrelevant to the queries (ie. they have no duplicated in the `query_data`).
Member

Suggested change
As for the evaluation dataset, we load `query_data` and `index_data` separately. The `query_data` has the same structure as the `train_data`, consisting of labelled documents. The `index_data` is the data against which the queries will be matched, and contains many documents, some of which may be irrelevant to the queries (ie. they have no duplicated in the `query_data`).
As for the evaluation dataset, we load `query_data` and `index_data` separately. The `query_data` have the same structure as the `train_data`, consisting of labeled documents. The `index_data` are the data against which the queries will be matched, and contain many documents, some of which may be irrelevant to the queries (i.e. they have no duplicates in the `query_data`).
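
To make the labelling concrete, a small sketch of inspecting a single training `Document`, assuming the label is stored under the `finetuner_label` tag:

```python
doc = train_data[0]
print(doc.text)                     # the question text
print(doc.tags['finetuner_label'])  # questions sharing this label are duplicates
```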

If you look at the summaries for the `query_data` and `index_data`, you will find that they have the following number of instances:
Member

Suggested change
If you look at the summaries for the `query_data` and `index_data`, you will find that they have the following number of instances:
If you look at the summaries for the `query_data` and `index_data`, you will find that they have the following number of samples:

<!-- #region id="h0DGNRo8-lZD" -->
## Monitoring

Now that we've created a run, let's see its status. You can monitor the run by checking the status - `run.status()`, the logs - `run.logs()` or `run.stream_logs()`.
Member

Suggested change
Now that we've created a run, let's see its status. You can monitor the run by checking the status - `run.status()`, the logs - `run.logs()` or `run.stream_logs()`.
Now that we've created a run, let's see its status. You can monitor the run by checking the status - `run.status()` - and the logs - `run.logs()` or `run.stream_logs()`.


<!-- #region id="7AuB0IWC_CSt" -->
Dependends on the size of the training data, some runs might take up to several hours, you can reconnect to your run very easily to monitor its status.
Member

Suggested change
Dependends on the size of the training data, some runs might take up to several hours, you can reconnect to your run very easily to monitor its status.
Depending on the size of the training data, some runs might take up to several hours. You can later reconnect to your run easily to monitor its status.
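
Reconnecting later might look like this; a sketch with a hypothetical run name (check the finetuner docs for the exact `get_run` signature):

```python
import finetuner

finetuner.login()
run = finetuner.get_run('my-text-to-text-run')  # hypothetical run name
print(run.status())
```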

CONTRIBUTING.md (outdated, resolved)
CONTRIBUTING.md (outdated, resolved)
README.md (outdated, resolved)

@github-actions

📝 Docs are deployed on https://ft-docs-colab--jina-docs.netlify.app 🎉

@bwanglzu merged commit 9dcc4c3 into main Oct 27, 2022
@bwanglzu deleted the docs-colab branch October 27, 2022 12:34