Commit

Merge branch 'main' into docs-before-after
lmmilliken committed Dec 14, 2022
2 parents 89015a6 + c81dcff commit ca6b443
Showing 10 changed files with 226 additions and 140 deletions.
6 changes: 6 additions & 0 deletions CHANGELOG.md
@@ -18,6 +18,10 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

### Changed

+ - Adjust Finetuner based on API changes for Jina AI Cloud. ([#637](https://github.com/jina-ai/finetuner/pull/637))

+ - Change default `experiment_name` from current working dir to `default`. ([#637](https://github.com/jina-ai/finetuner/pull/637))

### Fixed

- Correctly infer the type of models created using `get_model` in the `build_encoding_dataset` function. ([#623](https://github.com/jina-ai/finetuner/pull/623))
@@ -30,6 +34,8 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

- Change hint in notebooks to use `load_uri_to_blob` instead of `load_uri_to_image_tensor`. ([#625](https://github.com/jina-ai/finetuner/pull/625))

+ - Copyedit `README.md`, changes to language but not contents. ([#635](https://github.com/jina-ai/finetuner/pull/635))


## [0.6.7] - 2022-11-25

2 changes: 1 addition & 1 deletion Makefile
@@ -85,7 +85,7 @@ build-sdist:

# ---------------------------------------------------------------- Test related targets

- PYTEST_ARGS = --show-capture no --full-trace --verbose --cov finetuner/ --cov-report term-missing --cov-report html
+ PYTEST_ARGS = --show-capture no --verbose --cov finetuner/ --cov-report term-missing --cov-report html

## Run tests
test:
77 changes: 48 additions & 29 deletions README.md
@@ -18,22 +18,27 @@

<!-- start elevator-pitch -->

- Fine-tuning is an effective way to improve the performance on neural search tasks. However, it is non-trivial for many deep learning engineers.
+ Fine-tuning is an effective way to improve performance on neural search tasks. However, setting up and performing
+ fine-tuning can be very time-consuming and resource-intensive.

- Finetuner makes fine-tuning easier, faster and performant by streamlining the workflow and handling all complexity and infrastructure on the cloud.
- With Finetuner, one can easily uplift pre-trained models to be more performant and production ready.
+ Jina AI's Finetuner makes fine-tuning easier and faster by streamlining the workflow and handling all complexity and
+ infrastructure in the cloud. With Finetuner, one can easily enhance the performance of pre-trained models, making them
+ production-ready without buying expensive hardware.

- 📈 **Performance promise**: uplift pretrained model and deliver SOTA performance on domain-specific neural search applications.
+ 📈 **Performance promise**: enhance the performance of pre-trained models and deliver state-of-the-art performance on
+ domain-specific neural search applications.

- 🔱 **Simple yet powerful**: easy access to 40+ mainstream losses, 10+ optimisers, layer pruning, weights freezing, dimensionality reduction, hard-negative mining, cross-modal model, distributed training.
+ 🔱 **Simple yet powerful**: easy access to 40+ mainstream loss functions, 10+ optimisers, layer pruning, weight
+ freezing, dimensionality reduction, hard-negative mining, cross-modal models, and distributed training.

- ☁ **All-in-cloud**: instant training with our free GPU; manage runs, experiments and artifacts on Jina AI Cloud without worrying about provisioning resources, integration complexity and infrastructure.
+ ☁ **All-in-cloud**: train using our free GPU infrastructure, manage runs, experiments and artifacts on Jina AI Cloud
+ without worrying about resource availability, complex integration, or infrastructure costs.

<!-- end elevator-pitch -->

## [Documentation](https://finetuner.jina.ai/)

- ## Benchmark
+ ## Benchmarks

<table>
<thead>
@@ -97,37 +102,41 @@ With Finetuner, one can easily uplift pre-trained models to be more performant a
</tbody>
</table>

- <sub><sup>All metrics are evaluated on k@20 after training for 5 epochs using Adam optimizer with learning rates of 1e-4 for ResNet, 1e-7 for CLIP and 1e-5 for the BERT models.</sup></sub>
+ <sub><sup>All metrics were evaluated for k@20 after training for 5 epochs using the Adam optimizer with learning rates of 1e-4 for ResNet, 1e-7 for CLIP and 1e-5 for the BERT models.</sup></sub>

<!-- start install-instruction -->

## Install

- Make sure you have Python 3.7+ installed.
- Finetuner can be installed via pip by executing:
+ Make sure you have Python 3.7+ installed. Finetuner can be installed via `pip` by executing:

```bash
pip install -U finetuner
```

- If you want to encode `docarray.DocumentArray` objects with the `finetuner.encode` function, you need to install `"finetuner[full]"`.
- In this case, some extra dependencies are installed which are necessary to do the inference, e.g., torch, torchvision, and open clip:
+ If you want to encode `docarray.DocumentArray` objects with the `finetuner.encode` function, you need to install
+ `"finetuner[full]"`. This includes a number of additional dependencies, which are necessary for encoding: Torch,
+ Torchvision and OpenCLIP:

```bash
pip install "finetuner[full]"
```

<!-- end install-instruction -->

- > From 0.5.0, Finetuner computing is hosted on Jina AI Cloud. THe last local version is `0.4.1`, one can install it via pip or check out [git tags/releases here](https://github.com/jina-ai/finetuner/releases).
+ > ⚠️ Starting with version 0.5.0, Finetuner computing is performed on Jina AI Cloud. The last local version is `0.4.1`.
+ > This version is still available for installation via `pip`. See [Finetuner git tags and releases](https://github.com/jina-ai/finetuner/releases).




## Get Started

- The following code snippet describes how to fine-tune ResNet50 on [Totally Looks Like dataset](https://sites.google.com/view/totally-looks-like-dataset), it can be run as-is (If there is already a run called `resnet50-tll-run`, choose a different name):
+ The following code snippet describes how to fine-tune ResNet50 on the [_Totally Looks Like_ dataset](https://sites.google.com/view/totally-looks-like-dataset).
+ You can run it as-is. The model and training data are already hosted in Jina AI Cloud and Finetuner will
+ download them automatically.
+ (NB: If there is already a run called `resnet50-tll-run`, choose a different run-name in the code below.)

```python
import finetuner
@@ -147,9 +156,16 @@ run = finetuner.fit(
],
)
```
+ This code snippet describes the following steps:

- Here, the training data used is gathered from the Jina AI Cloud, however data can also be passed as a CSV file or DocumentArray, as described [here](https://finetuner.jina.ai/walkthrough/create-training-data/).
- Fine-tuning might take 5 minutes to finish. You can later re-connect your run with:
+ 1. Log in to Jina AI Cloud.
+ 2. Select backbone model, training and evaluation data for your evaluation callback.
+ 3. Start the cloud run.

+ You can also pass data to Finetuner as a CSV file or a `DocumentArray` object, as described [in the Finetuner documentation](https://finetuner.jina.ai/walkthrough/create-training-data/).
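
For illustration, here is a minimal sketch of building such a labeled `DocumentArray`, assuming the `finetuner_label` tag convention from the Finetuner documentation and hypothetical image paths:

```python
from docarray import Document, DocumentArray

# Documents that share a `finetuner_label` value are treated as
# matching items during training.
train_data = DocumentArray(
    [
        Document(uri='left/001.jpg', tags={'finetuner_label': 'pair-001'}),
        Document(uri='right/001.jpg', tags={'finetuner_label': 'pair-001'}),
        Document(uri='left/002.jpg', tags={'finetuner_label': 'pair-002'}),
        Document(uri='right/002.jpg', tags={'finetuner_label': 'pair-002'}),
    ]
)
```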

+ Depending on the data, task, model, and hyperparameters, fine-tuning might take some time to finish. You can leave your jobs
+ to run on the Jina AI Cloud and reconnect to them later, using code like the following:

```python
import finetuner
@@ -164,16 +180,13 @@ for log_entry in run.stream_logs():
run.save_artifact('resnet-tll')
```

- Specifically, the code snippet describes the following steps:

- * Login to Jina AI Cloud.
- * Select backbone model, training and evaluation data for your evaluation callback.
- * Start the cloud run.
- * Monitor the status: check the status and logs of the run.
- * Save model for further use and integration.
+ This code logs into Jina AI Cloud, then connects to your run by name. After that, it does the following:
+ * Monitors the status of the run and prints out the logs.
+ * Saves the model once fine-tuning is done.
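
A quick sketch of checking a reconnected run before saving, assuming `run.status()` returns a dict with a `status` field:

```python
import finetuner

finetuner.login()
run = finetuner.get_run('resnet50-tll-run')

# Inspect the run state, e.g. CREATED, STARTED, FINISHED or FAILED,
# and only save the artifact once fine-tuning has completed.
if run.status()['status'] == 'FINISHED':
    run.save_artifact('resnet-tll')
```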

## Using Finetuner to encode

- Finally, you can use the model to encode images:
+ Finetuner has interfaces for using models to do encoding:

```python
import finetuner
@@ -201,15 +214,18 @@ embeddings = finetuner.encode(model=model, data=images)
```

## Training on your own data

- If you want to train a model using your own dataset instead of one on the Jina AI Cloud, you can provide labeled data in a CSV file in the following way:
+ If you want to train a model using your own dataset instead of one on the Jina AI Cloud, you can provide labeled data in a CSV file.

+ A CSV file is a tab or comma-delimited plain text file. For example:

```plaintext
This is an apple apple_label
This is a pear pear_label
...
```
+ The file should have two columns: the first for the data and the second for the category label.
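
As an illustration, a minimal sketch that writes such a file with Python's standard `csv` module (the file name and rows are hypothetical):

```python
import csv

rows = [
    ('This is an apple', 'apple_label'),
    ('This is a pear', 'pear_label'),
]

# Tab-delimited to match the example above; a comma delimiter works too.
with open('data.csv', 'w', newline='') as f:
    csv.writer(f, delimiter='\t').writerows(rows)
```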

- You can then provide the path to your CSV file as your training data:
+ You can then provide a path to a CSV file as training data for Finetuner:

```python
run = finetuner.fit(
@@ -218,7 +234,7 @@ run = finetuner.fit(
train_data='path/to/some/data.csv',
)
```
- More information on providing your own training data is found in the [Prepare Training Data](https://finetuner.jina.ai/walkthrough/create-training-data/) section of the [walkthrough](https://finetuner.jina.ai/walkthrough/).
+ More information on providing your own training data is found in the [Prepare Training Data](https://finetuner.jina.ai/walkthrough/create-training-data/) section of the [Finetuner documentation](https://finetuner.jina.ai/).



@@ -230,7 +246,7 @@ More information on providing your own training data is found in the [Prepare Tr
- [Image-to-Image Search via ResNet50](https://finetuner.jina.ai/notebooks/image_to_image/)
- [Text-to-Image Search via CLIP](https://finetuner.jina.ai/notebooks/text_to_image/)

- Intrigued? That's only scratching the surface of what Finetuner is capable of. [Read our docs to learn more](https://finetuner.jina.ai/).
+ [Read our documentation](https://finetuner.jina.ai/) to learn more about what Finetuner can do.

<!-- start support-pitch -->
## Support
@@ -247,6 +263,9 @@ Intrigued? That's only scratching the surface of what Finetuner is capable of. [

## Join Us

- Finetuner is backed by [Jina AI](https://jina.ai) and licensed under [Apache-2.0](./LICENSE). [We are actively hiring](https://jobs.jina.ai) AI engineers, solution engineers to build the next neural search ecosystem in opensource.
+ Finetuner is backed by [Jina AI](https://jina.ai) and licensed under [Apache-2.0](./LICENSE).

+ [We are actively hiring](https://jobs.jina.ai) AI engineers and solution engineers to build the next generation of
+ open-source AI ecosystems.

<!-- end support-pitch -->
45 changes: 30 additions & 15 deletions finetuner/__init__.py
@@ -267,16 +267,22 @@ def get_run(run_name: str, experiment_name: Optional[str] = None) -> Run:
    return ft.get_run(run_name=run_name, experiment_name=experiment_name)


- def list_runs(experiment_name: Optional[str] = None) -> List[Run]:
-     """List every run.
-     If an experiment name is not specified, we'll list every run across all
-     experiments.
-     :param experiment_name: Optional name of the experiment.
-     :return: A list of `Run` objects.
+ def list_runs(
+     experiment_name: Optional[str] = None, page: int = 1, size: int = 50
+ ) -> List[Run]:
+     """List all created runs inside a given experiment.
+     If no experiment is specified, list runs for all available experiments.
+     :param experiment_name: The name of the experiment.
+     :param page: The page index.
+     :param size: Number of runs to retrieve.
+     :return: List of all runs.
+     .. note:: `page` and `size` work together. For example, page 1 size 50 gives
+         the 50 runs in the first page. To get 50-100, set `page` to 2.
+     .. note:: The maximum number for `size` per page is 100.
      """
-     return ft.list_runs(experiment_name=experiment_name)
+     return ft.list_runs(experiment_name=experiment_name, page=page, size=size)
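
A sketch of how the new pagination parameters might be used, following the `page`/`size` semantics described in the docstring above:

```python
import finetuner

finetuner.login()

# Runs 1-50 of the 'default' experiment.
first_page = finetuner.list_runs(experiment_name='default', page=1, size=50)

# Runs 51-100, if any.
second_page = finetuner.list_runs(experiment_name='default', page=2, size=50)
```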


def delete_run(run_name: str, experiment_name: Optional[str] = None) -> None:
@@ -303,11 +309,11 @@ def delete_runs(experiment_name: Optional[str] = None) -> None:
    ft.delete_runs(experiment_name=experiment_name)


- def create_experiment(name: Optional[str] = None) -> Experiment:
+ def create_experiment(name: str = 'default') -> Experiment:
      """Create an experiment.
-     :param name: Optional name of the experiment. If `None`,
-         the experiment is named after the current directory.
+     :param name: The name of the experiment. If not provided,
+         the experiment is named `default`.
      :return: An `Experiment` object.
      """
      return ft.create_experiment(name=name)
@@ -322,9 +328,18 @@ def get_experiment(name: str) -> Experiment:
    return ft.get_experiment(name=name)


- def list_experiments() -> List[Experiment]:
-     """List every experiment."""
-     return ft.list_experiments()
+ def list_experiments(page: int = 1, size: int = 50) -> List[Experiment]:
+     """List every experiment.
+     :param page: The page index.
+     :param size: The number of experiments to retrieve.
+     :return: A list of :class:`Experiment` instances.
+     .. note:: `page` and `size` work together. For example, page 1 size 50 gives
+         the 50 experiments in the first page. To get 50-100, set `page` to 2.
+     .. note:: The maximum number for `size` per page is 100.
+     """
+     return ft.list_experiments(page=page, size=size)
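
A sketch of collecting every experiment by paging until a short page is returned, assuming the 100-item-per-page cap stated in the note:

```python
import finetuner

finetuner.login()

experiments, page = [], 1
while True:
    batch = finetuner.list_experiments(page=page, size=100)
    experiments.extend(batch)
    # A page smaller than `size` means there are no further pages.
    if len(batch) < 100:
        break
    page += 1
```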


def delete_experiment(name: str) -> Experiment:
50 changes: 32 additions & 18 deletions finetuner/client/client.py
@@ -1,4 +1,4 @@
- from typing import Iterator, List, Optional
+ from typing import Any, Dict, Iterator, List, Optional

import pkg_resources

@@ -32,7 +32,9 @@ class FinetunerV1Client(_BaseClient):

""" Experiment API """

-     def create_experiment(self, name: str, description: Optional[str] = '') -> dict:
+     def create_experiment(
+         self, name: str = 'default', description: Optional[str] = ''
+     ) -> Dict[str, Any]:
          """Create a new experiment.
          :param name: The name of the experiment.
@@ -53,15 +55,23 @@ def get_experiment(self, name: str) -> dict:
        url = self._construct_url(self._base_url, API_VERSION, EXPERIMENTS, name)
        return self._handle_request(url=url, method=GET)

-     def list_experiments(self) -> List[dict]:
-         """List all available experiments.
-         :return: List of all experiments.
+     def list_experiments(self, page: int = 1, size: int = 50) -> Dict[str, Any]:
+         """List every experiment.
+         :param page: The page index.
+         :param size: The number of experiments to retrieve.
+         :return: Paginated results as a dict, where `items` are the `Experiment`s being
+             retrieved.
+         .. note:: `page` and `size` work together. For example, page 1 size 50 gives
+             the 50 experiments in the first page. To get 50-100, set `page` to 2.
+         .. note:: The maximum number for `size` per page is 100.
          """
+         params = {'page': page, 'size': size}
          url = self._construct_url(self._base_url, API_VERSION, EXPERIMENTS)
-         return self._handle_request(url=url, method=GET)
+         return self._handle_request(url=url, method=GET, params=params)

-     def delete_experiment(self, name: str) -> dict:
+     def delete_experiment(self, name: str) -> Dict[str, Any]:
          """Delete an experiment given its name.
          :param name: The name of the experiment.
@@ -92,28 +102,32 @@ def get_run(self, experiment_name: str, run_name: str) -> dict:
        )
        return self._handle_request(url=url, method=GET)

-     def list_runs(self, experiment_name: Optional[str] = None) -> List[dict]:
+     def list_runs(
+         self, experiment_name: Optional[str] = None, page: int = 1, size: int = 50
+     ) -> Dict[str, Any]:
"""List all created runs inside a given experiment.
If no experiment is specified, list runs for all available experiments.
:param experiment_name: The name of the experiment.
:return: List of all runs.
:param page: The page index.
:param size: Number of runs to retrieve.
:return: Paginated results as a dict, where `items` are the `Runs` being
retrieved.
..note:: `page` and `size` works together. For example, page 1 size 50 gives
the 50 runs in the first page. To get 50-100, set `page` as 2.
..note:: The maximum number for `size` per page is 100.
"""
          if not experiment_name:
-             target_experiments = [
-                 experiment[NAME] for experiment in self.list_experiments()
-             ]
+             url = self._construct_url(self._base_url, API_VERSION, RUNS, RUNS)
          else:
-             target_experiments = [experiment_name]
-         response = []
-         for experiment_name in target_experiments:
              url = self._construct_url(
                  self._base_url, API_VERSION, EXPERIMENTS, experiment_name, RUNS
              )
-             response.extend(self._handle_request(url=url, method=GET))
-         return response
+         params = {'page': page, 'size': size}
+         return self._handle_request(url=url, method=GET, params=params)
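
A sketch of consuming the paginated response at the client level, assuming the returned dict carries the `items` key described in the docstring and that each item exposes a `name` field (as the old code's `experiment[NAME]` lookup suggests):

```python
# `client` is assumed to be an authenticated FinetunerV1Client instance.
response = client.list_runs(experiment_name='default', page=1, size=50)
for run in response['items']:
    print(run['name'])
```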

-     def delete_run(self, experiment_name: str, run_name: str) -> dict:
+     def delete_run(self, experiment_name: str, run_name: str) -> Dict[str, Any]:
          """Delete a run by its name and experiment.
          :param experiment_name: The name of the experiment.