
docs: add colab column #583

Merged

merged 19 commits into main from docs-colab on Oct 27, 2022

Conversation

@bwanglzu (Member) commented Oct 19, 2022

Removed all three examples and replaced them with three Google Colabs (links above). The Colabs are embedded into the documentation page so that we only maintain a single notebook per task. How to use?

  1. Update the Google Colab.
  2. Export the Colab as `.ipynb` and download it to the `docs/notebooks` folder.
  3. Run `make notebook` in the `docs` folder; this generates user-friendly Markdown from the notebook using `jupytext` (see the sketch after this list).
  4. Run `make dirhtml` locally to see the generated notebooks.
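
For reference, a minimal sketch of what the Markdown conversion step might do, using the `jupytext` Python API; the notebook filename is hypothetical and the actual Makefile target may invoke the `jupytext` CLI instead:

```python
import jupytext

# Read the notebook exported from Google Colab (hypothetical filename).
notebook = jupytext.read('docs/notebooks/image_to_image.ipynb')

# Write it out as MyST Markdown, which the Sphinx docs build can render.
jupytext.write(notebook, 'docs/notebooks/image_to_image.md', fmt='md:myst')
```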

This allows us to potentially integration-test all the Colabs end-to-end periodically using nbsphinx (if we can log in).

review it here

in docs:

[screenshot]

in readme:

[screenshot]


  • This PR references an open issue
  • I have added a line about this change to CHANGELOG

@github-actions bot added size/xl and removed size/xs labels Oct 19, 2022
@github-actions

This PR exceeds the recommended size of 1000 lines. Please make sure you are NOT addressing multiple issues with one PR. Note this PR might be rejected due to its size.


@bwanglzu marked this pull request as ready for review October 19, 2022 20:30
setup.py Outdated
@@ -27,7 +27,7 @@
zip_safe=False,
setup_requires=['setuptools>=18.0', 'wheel'],
install_requires=[
-        'docarray[common]>=0.13.31',
+        'docarray[common]>=0.17.1.dev40',
Member

why are we using a dev req here?


@guenthermi (Member)

I very much like making the examples runnable; however, I was wondering if it wouldn't be a cleaner solution to implement this via Binder, as presented by @ZiniuYu: https://docs.google.com/presentation/d/1Yxwl4z7wBnOo9QFQ0LrpvU7vx-_Z6mSdmqbdncpi5tI/edit#slide=id.ge5ae5cb924_2_23

"Finetuner will trigger the callback when fine-tuning job finished and merge the weights between the pre-trained model and the fine-tuned model:\n",
"\n",
"```diff\n",
"from finetuner.callbakcs import WiSEFTCallback\n",
Member

This import is wrong and has to be fixed. It should be:
from finetuner.callback import WiSEFTCallback
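
For context, a minimal sketch of how the corrected import might be used, assuming `WiSEFTCallback` accepts an `alpha` parameter for blending weights, and with hypothetical model and dataset names:

```python
import finetuner
from finetuner.callback import WiSEFTCallback

run = finetuner.fit(
    model='openai/clip-vit-base-patch32',  # hypothetical CLIP model name
    train_data='my-clip-train-data',       # hypothetical dataset name
    loss='CLIPLoss',
    # alpha is assumed to control how the pre-trained and fine-tuned
    # weights are merged; check the finetuner docs for the exact signature.
    callbacks=[WiSEFTCallback(alpha=0.5)],
)
```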


@guenthermi (Member)

Maybe add a small description of the notebook action in https://github.com/jina-ai/finetuner/blob/main/CONTRIBUTING.md


@gmastrapas (Member) left a comment

Initial comments before going through the actual content

docs/Makefile (resolved)
docs/requirements.txt (outdated, resolved)
@@ -178,7 +178,7 @@ all you need to do is use the `WiSEFTCallback`.
Finetuner will trigger the callback when fine-tuning job finished and merge the weights between the pre-trained model and the fine-tuned model:

```diff
- from finetuner.callbakcs import WiSEFTCallback
+ from finetuner.callback import WiSEFTCallback
Member

why are we keeping the tasks section?

Member Author

I have removed tasks; weird, it seems not reflected in the PR.



<a href="https://colab.research.google.com/drive/1QuUTy3iVR-kTPljkwplKYaJ-NTCgPEc_?usp=sharing"><img alt="Open In Colab" src="https://colab.research.google.com/assets/colab-badge.svg"></a>

Searching visually similar images with image queries is a very popular use-case. However, using pre-trained models don’t deliver the best results – the models are trained on general data that lacks the particularities of your specific task. Here's where Finetuner comes! It enables you to accomplish this easily.
Member

Suggested change
Searching visually similar images with image queries is a very popular use-case. However, using pre-trained models don’t deliver the best results – the models are trained on general data that lacks the particularities of your specific task. Here's where Finetuner comes! It enables you to accomplish this easily.
Searching visually similar images with image queries is a very popular use-case. However, using pre-trained models does not deliver the best results – the models are trained on general data that lack the particularities of your specific task. Here's where Finetuner comes in! It enables you to accomplish this easily.


This guide will demonstrate how to fine-tune a ResNet model for image to image retrieval.

*Note, please consider switch to GPU/TPU Runtime for faster inference.*
Member

Suggested change
*Note, please consider switch to GPU/TPU Runtime for faster inference.*
*Note, please consider switching to GPU/TPU Runtime for faster inference.*


<!-- #region id="mUoY1jq0klwk" -->
## Backbone model
Now let's see which backbone models we can use. You can see available models either in by calling `finetuner.describe_models()`.
Member

Suggested change
Now let's see which backbone models we can use. You can see available models either in by calling `finetuner.describe_models()`.
Now let's see which backbone models we can use. You can see available models by calling `finetuner.describe_models()`.
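
A quick sketch of that call; `login()` authenticates against Jina AI Cloud first:

```python
import finetuner

finetuner.login()

# Prints a table of all backbone models available for fine-tuning.
finetuner.describe_models()
```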

* Furthermore, we had to provide names of the `train_data`.
* We set `TripletMarginLoss`.
* Additionally, we use {class}`~finetuner.callback.EvaluationCallback` for evaluation.
* Lastly, we set number of `epochs` and provide a `learning_rate`.
Member

Suggested change
* Lastly, we set number of `epochs` and provide a `learning_rate`.
* Lastly, we set the number of `epochs` and provide a `learning_rate`.
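
Taken together, the bullet points above correspond to a `finetuner.fit` call along these lines; a minimal sketch with hypothetical dataset names:

```python
import finetuner
from finetuner.callback import EvaluationCallback

run = finetuner.fit(
    model='resnet50',
    train_data='tll-train-data',  # hypothetical dataset name
    loss='TripletMarginLoss',
    callbacks=[
        # Evaluate retrieval quality against hypothetical query/index sets.
        EvaluationCallback(query_data='tll-query-data', index_data='tll-index-data'),
    ],
    epochs=5,
    learning_rate=1e-5,
)
```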

<!-- #region id="7ftSOH_olcak" -->
## Monitoring

Now that we've created a run, let's see its status. You can monitor the run by checking the status - `run.status()`, the logs - `run.logs()` or `run.stream_logs()`.
Member

Suggested change
Now that we've created a run, let's see its status. You can monitor the run by checking the status - `run.status()`, the logs - `run.logs()` or `run.stream_logs()`.
Now that we've created a run, let's see its status. You can monitor the run by checking the status - `run.status()` - and the logs - `run.logs()` or `run.stream_logs()`.
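
A short sketch of what that monitoring looks like in practice:

```python
# Check the run's current state, e.g. CREATED, STARTED or FINISHED.
print(run.status())

# Stream the logs until the run finishes.
for entry in run.stream_logs():
    print(entry)
```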

<!-- #region id="SfR6g0E_8fOz" -->
## Data

We will use the [Quora Question Pairs](https://www.sbert.net/examples/training/quora_duplicate_questions/README.html?highlight=quora#dataset) dataset to show-case Finetuner for text to text search. We have already pre-processed this dataset and made it available for you to pull from hubble. Do this as follows:
Member

Suggested change
We will use the [Quora Question Pairs](https://www.sbert.net/examples/training/quora_duplicate_questions/README.html?highlight=quora#dataset) dataset to show-case Finetuner for text to text search. We have already pre-processed this dataset and made it available for you to pull from hubble. Do this as follows:
We will use the [Quora Question Pairs](https://www.sbert.net/examples/training/quora_duplicate_questions/README.html?highlight=quora#dataset) dataset to show-case Finetuner for text to text search. We have already pre-processed this dataset and made it available for you to pull from Jina AI Cloud. Do this as follows:
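
For illustration, pulling the dataset might look like this; a sketch assuming the docarray 0.x API and a hypothetical dataset name:

```python
from docarray import DocumentArray

# Pull the pre-processed Quora dataset from Jina AI Cloud (hypothetical name).
train_data = DocumentArray.pull('quora-train-data')
train_data.summary()
```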

<!-- #region id="r_IlEIp59g9v" -->
So we have 104598 training `Document`s. Each `Document` consists of a text field that contains the question, as well as a `finetuner_label` which indicates the label to which the question belongs. If multiple questions have the same label, they are duplicates of one another. If they have different `finetuner_label`s, they are not duplicates of each other.

As for the evaluation dataset, we load `query_data` and `index_data` separately. The `query_data` has the same structure as the `train_data`, consisting of labelled documents. The `index_data` is the data against which the queries will be matched, and contains many documents, some of which may be irrelevant to the queries (ie. they have no duplicated in the `query_data`).
Member

Suggested change
As for the evaluation dataset, we load `query_data` and `index_data` separately. The `query_data` has the same structure as the `train_data`, consisting of labelled documents. The `index_data` is the data against which the queries will be matched, and contains many documents, some of which may be irrelevant to the queries (ie. they have no duplicated in the `query_data`).
As for the evaluation dataset, we load `query_data` and `index_data` separately. The `query_data` have the same structure as the `train_data`, consisting of labeled documents. The `index_data` are the data against which the queries will be matched, and contain many documents, some of which may be irrelevant to the queries (i.e. they have no duplicates in the `query_data`).
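
To make the labelling concrete, a small sketch of inspecting a single training `Document`, assuming the label is stored under the `finetuner_label` tag:

```python
doc = train_data[0]
print(doc.text)                     # the question text
print(doc.tags['finetuner_label'])  # questions sharing this label are duplicates
```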

If you look at the summaries for the `query_data` and `index_data`, you will find that they have the following number of instances:
Member

Suggested change
If you look at the summaries for the `query_data` and `index_data`, you will find that they have the following number of instances:
If you look at the summaries for the `query_data` and `index_data`, you will find that they have the following number of samples:

<!-- #region id="h0DGNRo8-lZD" -->
## Monitoring

Now that we've created a run, let's see its status. You can monitor the run by checking the status - `run.status()`, the logs - `run.logs()` or `run.stream_logs()`.
Member

Suggested change
Now that we've created a run, let's see its status. You can monitor the run by checking the status - `run.status()`, the logs - `run.logs()` or `run.stream_logs()`.
Now that we've created a run, let's see its status. You can monitor the run by checking the status - `run.status()` - and the logs - `run.logs()` or `run.stream_logs()`.


<!-- #region id="7AuB0IWC_CSt" -->
Dependends on the size of the training data, some runs might take up to several hours, you can reconnect to your run very easily to monitor its status.
Member

Suggested change
Dependends on the size of the training data, some runs might take up to several hours, you can reconnect to your run very easily to monitor its status.
Depending on the size of the training data, some runs might take up to several hours. You can later reconnect to your run easily to monitor its status.
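
Reconnecting later might look like this; a sketch with a hypothetical run name (check the finetuner docs for the exact `get_run` signature):

```python
import finetuner

finetuner.login()
run = finetuner.get_run('my-text-to-text-run')  # hypothetical run name
print(run.status())
```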

CONTRIBUTING.md (outdated, resolved)
CONTRIBUTING.md (outdated, resolved)
README.md (outdated, resolved)

@github-actions

📝 Docs are deployed on https://ft-docs-colab--jina-docs.netlify.app 🎉

@bwanglzu merged commit 9dcc4c3 into main Oct 27, 2022
@bwanglzu deleted the docs-colab branch October 27, 2022 12:34