Commit

Merge branch 'main' into docs-before-after
lmmilliken committed Dec 14, 2022
2 parents 89015a6 + c81dcff commit ca6b443
Showing 10 changed files with 226 additions and 140 deletions.
6 changes: 6 additions & 0 deletions CHANGELOG.md
@@ -18,6 +18,10 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

### Changed

+ - Adjust Finetuner based on API changes for Jina AI Cloud. ([#637](https://github.com/jina-ai/finetuner/pull/637))

+ - Change default `experiment_name` from current working dir to `default`. ([#637](https://github.com/jina-ai/finetuner/pull/637))

### Fixed

- Correctly infer the type of models created using `get_model` in the `build_encoding_dataset` function. ([#623](https://github.com/jina-ai/finetuner/pull/623))
@@ -30,6 +34,8 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

- Change hint in notebooks to use `load_uri_to_blob` instead of `load_uri_to_image_tensor`. ([#625](https://github.com/jina-ai/finetuner/pull/625))

+ - Copyedit `README.md`, changes to language but not contents. ([#635](https://github.com/jina-ai/finetuner/pull/635))


## [0.6.7] - 2022-11-25

2 changes: 1 addition & 1 deletion Makefile
@@ -85,7 +85,7 @@ build-sdist:

# ---------------------------------------------------------------- Test related targets

- PYTEST_ARGS = --show-capture no --full-trace --verbose --cov finetuner/ --cov-report term-missing --cov-report html
+ PYTEST_ARGS = --show-capture no --verbose --cov finetuner/ --cov-report term-missing --cov-report html

## Run tests
test:
77 changes: 48 additions & 29 deletions README.md
@@ -18,22 +18,27 @@

<!-- start elevator-pitch -->

- Fine-tuning is an effective way to improve the performance on neural search tasks. However, it is non-trivial for many deep learning engineers.
+ Fine-tuning is an effective way to improve performance on neural search tasks. However, setting up and performing
+ fine-tuning can be very time-consuming and resource-intensive.

- Finetuner makes fine-tuning easier, faster and performant by streamlining the workflow and handling all complexity and infrastructure on the cloud.
- With Finetuner, one can easily uplift pre-trained models to be more performant and production ready.
+ Jina AI's Finetuner makes fine-tuning easier and faster by streamlining the workflow and handling all complexity and
+ infrastructure in the cloud. With Finetuner, one can easily enhance the performance of pre-trained models, making them
+ production-ready without buying expensive hardware.

- 📈 **Performance promise**: uplift pretrained model and deliver SOTA performance on domain-specific neural search applications.
+ 📈 **Performance promise**: enhance the performance of pre-trained models and deliver state-of-the-art performance on
+ domain-specific neural search applications.

- 🔱 **Simple yet powerful**: easy access to 40+ mainstream losses, 10+ optimisers, layer pruning, weights freezing, dimensionality reduction, hard-negative mining, cross-modal model, distributed training.
+ 🔱 **Simple yet powerful**: easy access to 40+ mainstream loss functions, 10+ optimisers, layer pruning, weight
+ freezing, dimensionality reduction, hard-negative mining, cross-modal models, and distributed training.

- ☁ **All-in-cloud**: instant training with our free GPU; manage runs, experiments and artifacts on Jina AI Cloud without worrying about provisioning resources, integration complexity and infrastructure.
+ ☁ **All-in-cloud**: train using our free GPU infrastructure, manage runs, experiments and artifacts on Jina AI Cloud
+ without worrying about resource availability, complex integration, or infrastructure costs.

<!-- end elevator-pitch -->

## [Documentation](https://finetuner.jina.ai/)

- ## Benchmark
+ ## Benchmarks

<table>
<thead>
@@ -97,37 +102,41 @@ With Finetuner, one can easily uplift pre-trained models to be more performant a
</tbody>
</table>

- <sub><sup>All metrics are evaluated on k@20 after training for 5 epochs using Adam optimizer with learning rates of 1e-4 for ResNet, 1e-7 for CLIP and 1e-5 for the BERT models.</sup></sub>
+ <sub><sup>All metrics were evaluated for k@20 after training for 5 epochs using the Adam optimizer with learning rates of 1e-4 for ResNet, 1e-7 for CLIP and 1e-5 for the BERT models.</sup></sub>

<!-- start install-instruction -->

## Install

- Make sure you have Python 3.7+ installed.
- Finetuner can be installed via pip by executing:
+ Make sure you have Python 3.7+ installed. Finetuner can be installed via `pip` by executing:

```bash
pip install -U finetuner
```

- If you want to encode `docarray.DocumentArray` objects with the `finetuner.encode` function, you need to install `"finetuner[full]"`.
- In this case, some extra dependencies are installed which are necessary to do the inference, e.g., torch, torchvision, and open clip:
+ If you want to encode `docarray.DocumentArray` objects with the `finetuner.encode` function, you need to install
+ `"finetuner[full]"`. This includes a number of additional dependencies, which are necessary for encoding: Torch,
+ Torchvision and OpenCLIP:

```bash
pip install "finetuner[full]"
```

<!-- end install-instruction -->

- > From 0.5.0, Finetuner computing is hosted on Jina AI Cloud. THe last local version is `0.4.1`, one can install it via pip or check out [git tags/releases here](https://github.com/jina-ai/finetuner/releases).
+ > ⚠️ Starting with version 0.5.0, Finetuner computing is performed on Jina AI Cloud. The last local version is `0.4.1`.
+ > This version is still available for installation via `pip`. See [Finetuner git tags and releases](https://github.com/jina-ai/finetuner/releases).




## Get Started

- The following code snippet describes how to fine-tune ResNet50 on [Totally Looks Like dataset](https://sites.google.com/view/totally-looks-like-dataset), it can be run as-is (If there is already a run called `resnet50-tll-run`, choose a different name):
+ The following code snippet describes how to fine-tune ResNet50 on the [_Totally Looks Like_ dataset](https://sites.google.com/view/totally-looks-like-dataset).
+ You can run it as-is. The model and training data are already hosted in Jina AI Cloud and Finetuner will
+ download them automatically.
+ (NB: If there is already a run called `resnet50-tll-run`, choose a different run-name in the code below.)

```python
import finetuner
@@ -147,9 +156,16 @@ run = finetuner.fit(
],
)
```
+ This code snippet describes the following steps:

- Here, the training data used is gathered from the Jina AI Cloud, however data can also be passed as a CSV file or DocumentArray, as described [here](https://finetuner.jina.ai/walkthrough/create-training-data/).
- Fine-tuning might take 5 minutes to finish. You can later re-connect your run with:
+ 1. Log in to Jina AI Cloud.
+ 2. Select backbone model, training and evaluation data for your evaluation callback.
+ 3. Start the cloud run.

+ You can also pass data to Finetuner as a CSV file or a `DocumentArray` object, as described [in the Finetuner documentation](https://finetuner.jina.ai/walkthrough/create-training-data/).
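
For illustration, here is a minimal sketch of building such a labeled `DocumentArray`, assuming the `finetuner_label` tag convention from the Finetuner documentation and hypothetical image paths:

```python
from docarray import Document, DocumentArray

# Documents that share a `finetuner_label` value are treated as
# matching items during training.
train_data = DocumentArray(
    [
        Document(uri='left/001.jpg', tags={'finetuner_label': 'pair-001'}),
        Document(uri='right/001.jpg', tags={'finetuner_label': 'pair-001'}),
        Document(uri='left/002.jpg', tags={'finetuner_label': 'pair-002'}),
        Document(uri='right/002.jpg', tags={'finetuner_label': 'pair-002'}),
    ]
)
```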

+ Depending on the data, task, model, and hyperparameters, fine-tuning might take some time to finish. You can leave your jobs
+ to run on the Jina AI Cloud and reconnect to them later, using code like the following:

```python
import finetuner
@@ -164,16 +180,13 @@ for log_entry in run.stream_logs():
run.save_artifact('resnet-tll')
```

- Specifically, the code snippet describes the following steps:

- * Login to Jina AI Cloud.
- * Select backbone model, training and evaluation data for your evaluation callback.
- * Start the cloud run.
- * Monitor the status: check the status and logs of the run.
- * Save model for further use and integration.
+ This code logs into Jina AI Cloud, then connects to your run by name. After that, it does the following:
+ * Monitors the status of the run and prints out the logs.
+ * Saves the model once fine-tuning is done.
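
A quick sketch of checking a reconnected run before saving, assuming `run.status()` returns a dict with a `status` field:

```python
import finetuner

finetuner.login()
run = finetuner.get_run('resnet50-tll-run')

# Inspect the run state, e.g. CREATED, STARTED, FINISHED or FAILED,
# and only save the artifact once fine-tuning has completed.
if run.status()['status'] == 'FINISHED':
    run.save_artifact('resnet-tll')
```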

## Using Finetuner to encode

- Finally, you can use the model to encode images:
+ Finetuner has interfaces for using models to do encoding:

```python
import finetuner
@@ -201,15 +214,18 @@ embeddings = finetuner.encode(model=model, data=images)
```

## Training on your own data

- If you want to train a model using your own dataset instead of one on the Jina AI Cloud, you can provide labeled data in a CSV file in the following way:
+ If you want to train a model using your own dataset instead of one on the Jina AI Cloud, you can provide labeled data in a CSV file.

+ A CSV file is a tab or comma-delimited plain text file. For example:

```plaintext
This is an apple apple_label
This is a pear pear_label
...
```
+ The file should have two columns: the first for the data and the second for the category label.
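
As an illustration, a minimal sketch that writes such a file with Python's standard `csv` module (the file name and rows are hypothetical):

```python
import csv

rows = [
    ('This is an apple', 'apple_label'),
    ('This is a pear', 'pear_label'),
]

# Tab-delimited to match the example above; a comma delimiter works too.
with open('data.csv', 'w', newline='') as f:
    csv.writer(f, delimiter='\t').writerows(rows)
```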

- You can then provide the path to your CSV file as your training data:
+ You can then provide a path to a CSV file as training data for Finetuner:

```python
run = finetuner.fit(
@@ -218,7 +234,7 @@ run = finetuner.fit(
train_data='path/to/some/data.csv',
)
```
- More information on providing your own training data is found in the [Prepare Training Data](https://finetuner.jina.ai/walkthrough/create-training-data/) section of the [walkthrough](https://finetuner.jina.ai/walkthrough/).
+ More information on providing your own training data is found in the [Prepare Training Data](https://finetuner.jina.ai/walkthrough/create-training-data/) section of the [Finetuner documentation](https://finetuner.jina.ai/).



@@ -230,7 +246,7 @@ More information on providing your own training data is found in the [Prepare Tr
- [Image-to-Image Search via ResNet50](https://finetuner.jina.ai/notebooks/image_to_image/)
- [Text-to-Image Search via CLIP](https://finetuner.jina.ai/notebooks/text_to_image/)

- Intrigued? That's only scratching the surface of what Finetuner is capable of. [Read our docs to learn more](https://finetuner.jina.ai/).
+ [Read our documentation](https://finetuner.jina.ai/) to learn more about what Finetuner can do.

<!-- start support-pitch -->
## Support
@@ -247,6 +263,9 @@ Intrigued? That's only scratching the surface of what Finetuner is capable of. [

## Join Us

- Finetuner is backed by [Jina AI](https://jina.ai) and licensed under [Apache-2.0](./LICENSE). [We are actively hiring](https://jobs.jina.ai) AI engineers, solution engineers to build the next neural search ecosystem in opensource.
+ Finetuner is backed by [Jina AI](https://jina.ai) and licensed under [Apache-2.0](./LICENSE).

+ [We are actively hiring](https://jobs.jina.ai) AI engineers and solution engineers to build the next generation of
+ open-source AI ecosystems.

<!-- end support-pitch -->
45 changes: 30 additions & 15 deletions finetuner/__init__.py
@@ -267,16 +267,22 @@ def get_run(run_name: str, experiment_name: Optional[str] = None) -> Run:
    return ft.get_run(run_name=run_name, experiment_name=experiment_name)


- def list_runs(experiment_name: Optional[str] = None) -> List[Run]:
-     """List every run.
-     If an experiment name is not specified, we'll list every run across all
-     experiments.
-     :param experiment_name: Optional name of the experiment.
-     :return: A list of `Run` objects.
+ def list_runs(
+     experiment_name: Optional[str] = None, page: int = 1, size: int = 50
+ ) -> List[Run]:
+     """List all created runs inside a given experiment.
+     If no experiment is specified, list runs for all available experiments.
+     :param experiment_name: The name of the experiment.
+     :param page: The page index.
+     :param size: Number of runs to retrieve.
+     :return: List of all runs.
+     .. note:: `page` and `size` work together. For example, page 1 size 50 gives
+         the 50 runs in the first page. To get 50-100, set `page` to 2.
+     .. note:: The maximum number for `size` per page is 100.
      """
-     return ft.list_runs(experiment_name=experiment_name)
+     return ft.list_runs(experiment_name=experiment_name, page=page, size=size)
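
A sketch of how the new pagination parameters might be used, following the `page`/`size` semantics described in the docstring above:

```python
import finetuner

finetuner.login()

# Runs 1-50 of the 'default' experiment.
first_page = finetuner.list_runs(experiment_name='default', page=1, size=50)

# Runs 51-100, if any.
second_page = finetuner.list_runs(experiment_name='default', page=2, size=50)
```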


def delete_run(run_name: str, experiment_name: Optional[str] = None) -> None:
@@ -303,11 +309,11 @@ def delete_runs(experiment_name: Optional[str] = None) -> None:
    ft.delete_runs(experiment_name=experiment_name)


- def create_experiment(name: Optional[str] = None) -> Experiment:
+ def create_experiment(name: str = 'default') -> Experiment:
      """Create an experiment.
-     :param name: Optional name of the experiment. If `None`,
-         the experiment is named after the current directory.
+     :param name: The name of the experiment. If not provided,
+         the experiment is named `default`.
      :return: An `Experiment` object.
      """
      return ft.create_experiment(name=name)
@@ -322,9 +328,18 @@ def get_experiment(name: str) -> Experiment:
    return ft.get_experiment(name=name)


- def list_experiments() -> List[Experiment]:
-     """List every experiment."""
-     return ft.list_experiments()
+ def list_experiments(page: int = 1, size: int = 50) -> List[Experiment]:
+     """List every experiment.
+     :param page: The page index.
+     :param size: The number of experiments to retrieve.
+     :return: A list of :class:`Experiment` instances.
+     .. note:: `page` and `size` work together. For example, page 1 size 50 gives
+         the 50 experiments in the first page. To get 50-100, set `page` to 2.
+     .. note:: The maximum number for `size` per page is 100.
+     """
+     return ft.list_experiments(page=page, size=size)
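
A sketch of collecting every experiment by paging until a short page is returned, assuming the 100-item-per-page cap stated in the note:

```python
import finetuner

finetuner.login()

experiments, page = [], 1
while True:
    batch = finetuner.list_experiments(page=page, size=100)
    experiments.extend(batch)
    # A page smaller than `size` means there are no further pages.
    if len(batch) < 100:
        break
    page += 1
```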


def delete_experiment(name: str) -> Experiment:
50 changes: 32 additions & 18 deletions finetuner/client/client.py
@@ -1,4 +1,4 @@
- from typing import Iterator, List, Optional
+ from typing import Any, Dict, Iterator, List, Optional

import pkg_resources

@@ -32,7 +32,9 @@ class FinetunerV1Client(_BaseClient):

""" Experiment API """

-     def create_experiment(self, name: str, description: Optional[str] = '') -> dict:
+     def create_experiment(
+         self, name: str = 'default', description: Optional[str] = ''
+     ) -> Dict[str, Any]:
          """Create a new experiment.
          :param name: The name of the experiment.
@@ -53,15 +55,23 @@ def get_experiment(self, name: str) -> dict:
        url = self._construct_url(self._base_url, API_VERSION, EXPERIMENTS, name)
        return self._handle_request(url=url, method=GET)

-     def list_experiments(self) -> List[dict]:
-         """List all available experiments.
-         :return: List of all experiments.
+     def list_experiments(self, page: int = 1, size: int = 50) -> Dict[str, Any]:
+         """List every experiment.
+         :param page: The page index.
+         :param size: The number of experiments to retrieve.
+         :return: Paginated results as a dict, where `items` are the `Experiment`s being
+             retrieved.
+         .. note:: `page` and `size` work together. For example, page 1 size 50 gives
+             the 50 experiments in the first page. To get 50-100, set `page` to 2.
+         .. note:: The maximum number for `size` per page is 100.
          """
+         params = {'page': page, 'size': size}
          url = self._construct_url(self._base_url, API_VERSION, EXPERIMENTS)
-         return self._handle_request(url=url, method=GET)
+         return self._handle_request(url=url, method=GET, params=params)

-     def delete_experiment(self, name: str) -> dict:
+     def delete_experiment(self, name: str) -> Dict[str, Any]:
          """Delete an experiment given its name.
          :param name: The name of the experiment.
@@ -92,28 +102,32 @@ def get_run(self, experiment_name: str, run_name: str) -> dict:
        )
        return self._handle_request(url=url, method=GET)

-     def list_runs(self, experiment_name: Optional[str] = None) -> List[dict]:
+     def list_runs(
+         self, experiment_name: Optional[str] = None, page: int = 1, size: int = 50
+     ) -> Dict[str, Any]:
"""List all created runs inside a given experiment.
If no experiment is specified, list runs for all available experiments.
:param experiment_name: The name of the experiment.
:return: List of all runs.
:param page: The page index.
:param size: Number of runs to retrieve.
:return: Paginated results as a dict, where `items` are the `Runs` being
retrieved.
..note:: `page` and `size` works together. For example, page 1 size 50 gives
the 50 runs in the first page. To get 50-100, set `page` as 2.
..note:: The maximum number for `size` per page is 100.
"""
          if not experiment_name:
-             target_experiments = [
-                 experiment[NAME] for experiment in self.list_experiments()
-             ]
+             url = self._construct_url(self._base_url, API_VERSION, RUNS, RUNS)
          else:
-             target_experiments = [experiment_name]
-         response = []
-         for experiment_name in target_experiments:
              url = self._construct_url(
                  self._base_url, API_VERSION, EXPERIMENTS, experiment_name, RUNS
              )
-             response.extend(self._handle_request(url=url, method=GET))
-         return response
+         params = {'page': page, 'size': size}
+         return self._handle_request(url=url, method=GET, params=params)
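
A sketch of consuming the paginated response at the client level, assuming the returned dict carries the `items` key described in the docstring and that each item exposes a `name` field (as the old code's `experiment[NAME]` lookup suggests):

```python
# `client` is assumed to be an authenticated FinetunerV1Client instance.
response = client.list_runs(experiment_name='default', page=1, size=50)
for run in response['items']:
    print(run['name'])
```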

-     def delete_run(self, experiment_name: str, run_name: str) -> dict:
+     def delete_run(self, experiment_name: str, run_name: str) -> Dict[str, Any]:
          """Delete a run by its name and experiment.
          :param experiment_name: The name of the experiment.