docs: add colab column #583
Conversation
This PR exceeds the recommended size of 1000 lines. Please make sure you are NOT addressing multiple issues with one PR. Note this PR might be rejected due to its size.
setup.py
Outdated
@@ -27,7 +27,7 @@
     zip_safe=False,
     setup_requires=['setuptools>=18.0', 'wheel'],
     install_requires=[
-        'docarray[common]>=0.13.31',
+        'docarray[common]>=0.17.1.dev40',
why are we using a dev req here?
I like it very much for making the examples runnable; however, I was wondering if it wouldn't be a cleaner solution to implement this via Binder, as presented by @ZiniuYu: https://docs.google.com/presentation/d/1Yxwl4z7wBnOo9QFQ0LrpvU7vx-_Z6mSdmqbdncpi5tI/edit#slide=id.ge5ae5cb924_2_23
docs/notebooks/text_to_image.ipynb
Outdated
"Finetuner will trigger the callback when fine-tuning job finished and merge the weights between the pre-trained model and the fine-tuned model:\n",
"\n",
"```diff\n",
"from finetuner.callbakcs import WiSEFTCallback\n",
This import is wrong and has to be fixed; it should be:
from finetuner.callback import WiSEFTCallback
Maybe add a small description of the notebook action in https://github.com/jina-ai/finetuner/blob/main/CONTRIBUTING.md
Initial comments before going through the actual content
@@ -178,7 +178,7 @@ all you need to do is use the `WiSEFTCallback`.
 Finetuner will trigger the callback when fine-tuning job finished and merge the weights between the pre-trained model and the fine-tuned model:

 ```diff
-from finetuner.callbakcs import WiSEFTCallback
+from finetuner.callback import WiSEFTCallback
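For context on what this callback does: WiSE-FT merges the two models by linearly interpolating their weights. A minimal plain-Python sketch of the idea, where the `alpha` blending factor and the dict-of-floats weight format are illustrative assumptions, not Finetuner's actual `WiSEFTCallback` API:

```python
# Illustrative sketch of WiSE-FT weight-space ensembling: blend each parameter
# between the pre-trained and the fine-tuned model.
# NOTE: `alpha` and the plain-dict weight format are assumptions for
# illustration only; this is not the real Finetuner/WiSEFTCallback interface.

def wise_ft_merge(pretrained, finetuned, alpha=0.5):
    """Return weights blended as (1 - alpha) * pretrained + alpha * finetuned."""
    assert pretrained.keys() == finetuned.keys()
    return {
        name: (1 - alpha) * pretrained[name] + alpha * finetuned[name]
        for name in pretrained
    }

# Toy example with scalar "weights":
pre = {'layer.w': 1.0, 'layer.b': 0.0}
ft = {'layer.w': 3.0, 'layer.b': 2.0}
merged = wise_ft_merge(pre, ft, alpha=0.5)
# merged == {'layer.w': 2.0, 'layer.b': 1.0}
```

With `alpha=0` you keep the pre-trained weights, with `alpha=1` the fine-tuned ones; intermediate values trade off robustness against task performance.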
why are we keeping the tasks section?
I have removed the tasks; weird, it seems not reflected in the PR.
docs/notebooks/image_to_image.md
Outdated
<a href="https://colab.research.google.com/drive/1QuUTy3iVR-kTPljkwplKYaJ-NTCgPEc_?usp=sharing"><img alt="Open In Colab" src="https://colab.research.google.com/assets/colab-badge.svg"></a>

Searching visually similar images with image queries is a very popular use-case. However, using pre-trained models don’t deliver the best results – the models are trained on general data that lacks the particularities of your specific task. Here's where Finetuner comes! It enables you to accomplish this easily.
Suggested change:
- Searching visually similar images with image queries is a very popular use-case. However, using pre-trained models don’t deliver the best results – the models are trained on general data that lacks the particularities of your specific task. Here's where Finetuner comes! It enables you to accomplish this easily.
+ Searching visually similar images with image queries is a very popular use-case. However, using pre-trained models does not deliver the best results – the models are trained on general data that lack the particularities of your specific task. Here's where Finetuner comes in! It enables you to accomplish this easily.
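For reference, the Colab badge this PR adds to each notebook follows the standard pattern below (non-executable markup; the notebook URL is the one from the quoted context above):

```html
<!-- Standard "Open in Colab" badge; the link target is the shared notebook URL -->
<a href="https://colab.research.google.com/drive/1QuUTy3iVR-kTPljkwplKYaJ-NTCgPEc_?usp=sharing">
  <img alt="Open In Colab" src="https://colab.research.google.com/assets/colab-badge.svg">
</a>
```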
docs/notebooks/image_to_image.md
Outdated
This guide will demonstrate how to fine-tune a ResNet model for image to image retrieval.

*Note, please consider switch to GPU/TPU Runtime for faster inference.*
Suggested change:
- *Note, please consider switch to GPU/TPU Runtime for faster inference.*
+ *Note, please consider switching to GPU/TPU Runtime for faster inference.*
docs/notebooks/image_to_image.md
Outdated
<!-- #region id="mUoY1jq0klwk" -->
## Backbone model
Now let's see which backbone models we can use. You can see available models either in by calling `finetuner.describe_models()`.
Suggested change:
- Now let's see which backbone models we can use. You can see available models either in by calling `finetuner.describe_models()`.
+ Now let's see which backbone models we can use. You can see available models by calling `finetuner.describe_models()`.
docs/notebooks/image_to_image.md
Outdated
* Furthermore, we had to provide names of the `train_data`.
* We set `TripletMarginLoss`.
* Additionally, we use {class}`~finetuner.callback.EvaluationCallback` for evaluation.
* Lastly, we set number of `epochs` and provide a `learning_rate`.
Suggested change:
- * Lastly, we set number of `epochs` and provide a `learning_rate`.
+ * Lastly, we set the number of `epochs` and provide a `learning_rate`.
docs/notebooks/image_to_image.md
Outdated
<!-- #region id="7ftSOH_olcak" -->
## Monitoring

Now that we've created a run, let's see its status. You can monitor the run by checking the status - `run.status()`, the logs - `run.logs()` or `run.stream_logs()`.
Suggested change:
- Now that we've created a run, let's see its status. You can monitor the run by checking the status - `run.status()`, the logs - `run.logs()` or `run.stream_logs()`.
+ Now that we've created a run, let's see its status. You can monitor the run by checking the status - `run.status()` - and the logs - `run.logs()` or `run.stream_logs()`.
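To illustrate the monitoring loop described above, here is a hedged plain-Python sketch. The `Run` stub stands in for Finetuner's run object; only the method names `status()` and `logs()` come from the text, while the status values, return shapes, and polling interval are assumptions for illustration:

```python
import time

# Stub standing in for Finetuner's run object (the real one is returned by
# `finetuner.fit(...)`). The status values below are assumed for illustration.
class Run:
    def __init__(self):
        self._states = ['CREATED', 'STARTED', 'FINISHED']
        self._i = 0

    def status(self):
        state = {'status': self._states[self._i]}
        self._i = min(self._i + 1, len(self._states) - 1)
        return state

    def logs(self):
        return 'epoch 1/1 ... done'

def wait_until_done(run, poll_seconds=0.0):
    """Poll run.status() until the run reaches a terminal state."""
    while True:
        state = run.status()['status']
        if state in ('FINISHED', 'FAILED'):
            return state
        time.sleep(poll_seconds)

run = Run()
final = wait_until_done(run)
# final == 'FINISHED'; run.logs() would then show the full training log
```

In practice one would use a non-zero polling interval, or stream the logs instead of polling.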
docs/notebooks/text_to_text.md
Outdated
<!-- #region id="SfR6g0E_8fOz" -->
## Data

We will use the [Quora Question Pairs](https://www.sbert.net/examples/training/quora_duplicate_questions/README.html?highlight=quora#dataset) dataset to show-case Finetuner for text to text search. We have already pre-processed this dataset and made it available for you to pull from hubble. Do this as follows:
Suggested change:
- We will use the [Quora Question Pairs](https://www.sbert.net/examples/training/quora_duplicate_questions/README.html?highlight=quora#dataset) dataset to show-case Finetuner for text to text search. We have already pre-processed this dataset and made it available for you to pull from hubble. Do this as follows:
+ We will use the [Quora Question Pairs](https://www.sbert.net/examples/training/quora_duplicate_questions/README.html?highlight=quora#dataset) dataset to show-case Finetuner for text to text search. We have already pre-processed this dataset and made it available for you to pull from Jina AI Cloud. Do this as follows:
docs/notebooks/text_to_text.md
Outdated
<!-- #region id="r_IlEIp59g9v" -->
So we have 104598 training `Document`s. Each `Document` consists of a text field that contains the question, as well as a `finetuner_label` which indicates the label to which the question belongs. If multiple questions have the same label, they are duplicates of one another. If they have different `finetuner_label`s, they are not duplicates of each other.

As for the evaluation dataset, we load `query_data` and `index_data` separately. The `query_data` has the same structure as the `train_data`, consisting of labelled documents. The `index_data` is the data against which the queries will be matched, and contains many documents, some of which may be irrelevant to the queries (ie. they have no duplicated in the `query_data`).
Suggested change:
- As for the evaluation dataset, we load `query_data` and `index_data` separately. The `query_data` has the same structure as the `train_data`, consisting of labelled documents. The `index_data` is the data against which the queries will be matched, and contains many documents, some of which may be irrelevant to the queries (ie. they have no duplicated in the `query_data`).
+ As for the evaluation dataset, we load `query_data` and `index_data` separately. The `query_data` have the same structure as the `train_data`, consisting of labeled documents. The `index_data` are the data against which the queries will be matched, and contain many documents, some of which may be irrelevant to the queries (i.e. they have no duplicates in the `query_data`).
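The label semantics described above (same `finetuner_label` means duplicate questions, and the index may contain questions with no duplicate among the queries) can be sketched in plain Python. The toy data and helper below are illustrative assumptions, not the actual Quora dataset or Finetuner API:

```python
# Toy documents as (text, finetuner_label) pairs.
# Same label => the questions are duplicates of one another.
train_data = [
    ('How do I learn Python?', 0),
    ('What is the best way to learn Python?', 0),  # duplicate of the above
    ('How tall is Mount Everest?', 1),
]

def are_duplicates(doc_a, doc_b):
    """Two questions are duplicates iff they share a finetuner_label."""
    return doc_a[1] == doc_b[1]

# Queries are matched against the index; the index may also contain
# questions that are irrelevant to every query (no shared label).
query_data = [('How can I learn Python?', 0)]
index_data = train_data + [('Why is the sky blue?', 2)]  # label 2: irrelevant

# For one query, the relevant index documents are those sharing its label.
relevant = [doc for doc in index_data if are_duplicates(doc, query_data[0])]
# relevant contains exactly the two label-0 Python questions
```

This is the structure the `EvaluationCallback` relies on when computing retrieval metrics against the index.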
docs/notebooks/text_to_text.md
Outdated
So we have 104598 training `Document`s. Each `Document` consists of a text field that contains the question, as well as a `finetuner_label` which indicates the label to which the question belongs. If multiple questions have the same label, they are duplicates of one another. If they have different `finetuner_label`s, they are not duplicates of each other.

As for the evaluation dataset, we load `query_data` and `index_data` separately. The `query_data` has the same structure as the `train_data`, consisting of labelled documents. The `index_data` is the data against which the queries will be matched, and contains many documents, some of which may be irrelevant to the queries (ie. they have no duplicated in the `query_data`).
If you look at the summaries for the `query_data` and `index_data`, you will find that they have the following number of instances:
Suggested change:
- If you look at the summaries for the `query_data` and `index_data`, you will find that they have the following number of instances:
+ If you look at the summaries for the `query_data` and `index_data`, you will find that they have the following number of samples:
docs/notebooks/text_to_text.md
Outdated
<!-- #region id="h0DGNRo8-lZD" -->
## Monitoring

Now that we've created a run, let's see its status. You can monitor the run by checking the status - `run.status()`, the logs - `run.logs()` or `run.stream_logs()`.
Suggested change:
- Now that we've created a run, let's see its status. You can monitor the run by checking the status - `run.status()`, the logs - `run.logs()` or `run.stream_logs()`.
+ Now that we've created a run, let's see its status. You can monitor the run by checking the status - `run.status()` - and the logs - `run.logs()` or `run.stream_logs()`.
docs/notebooks/text_to_text.md
Outdated
```

<!-- #region id="7AuB0IWC_CSt" -->
Dependends on the size of the training data, some runs might take up to several hours, you can reconnect to your run very easily to monitor its status.
Suggested change:
- Dependends on the size of the training data, some runs might take up to several hours, you can reconnect to your run very easily to monitor its status.
+ Depending on the size of the training data, some runs might take up to several hours. You can later reconnect to your run easily to monitor its status.
📝 Docs are deployed on https://ft-docs-colab--jina-docs.netlify.app 🎉 |
Removed all three examples and replaced them with three Google Colabs (links above). Embedded the three Google Colabs into the documentation page in order to make sure we only maintain a single notebook per task.

How to use?

1. Export as `ipynb` and download to the `docs/notebooks` folder.
2. Run `make notebook` in the `docs` folder; this will generate user-friendly markdown from the notebook using jupytext.
3. Run `make dirhtml` locally to see the generated notebooks.

This allows us to potentially integration-test all the Colabs (if we can log in) end-to-end periodically using nbsphinx.
Review it here:
- in docs:
- in readme: