MXNet example as a plugin to OpenFL (#349)

* example how to add framework support added mxnet tutorial and adapter
* add readme
* Remove 'os' library from the shared descriptor
* Minor change
* changed the way set and get optimizer state in mxnet_adapter edited README add cuda monitor plugin edited list libraries in sd_requirements and requirements edited shard_descriptor
* edited README edited list libraries requirements Minor fixes shard descriptor and mxnet adapter
* shard descriptor minor changes
* Minor changes
* Update openfl-tutorials/interactive_api/MXNet_landmarks/README.md

Co-authored-by: Igor Davidyuk <igor.davidyuk@intel.com>
securefederatedai · Mar 1, 2022 · df22c83
1 parent 156d0ca
Showing 11 changed files with 1,025 additions and 0 deletions.
diff --git a/openfl-tutorials/interactive_api/MXNet_landmarks/README.md b/openfl-tutorials/interactive_api/MXNet_landmarks/README.md
@@ -0,0 +1,122 @@
+# MXNet Facial Keypoints Detection tutorial
+---
+**Note:**
+
+This task uses a dataset from Kaggle. To get the dataset you
+will need a Kaggle account and must accept the "Facial Keypoints Detection" [competition rules](https://www.kaggle.com/c/facial-keypoints-detection/rules).
+
+---
+
+This tutorial shows how to use OpenFL with a framework other than the already supported PyTorch and TensorFlow, using MXNet as the example.
+
+## Installation of Kaggle API credentials
+
+**Before you start, make sure that you have installed the packages from `sd_requirements.txt` in the virtual
+environment on each envoy machine.**
+
+To use the [Kaggle API](https://github.com/Kaggle/kaggle-api), sign up for
+a [Kaggle account](https://www.kaggle.com). Then go to the `'Account'` tab of your user
+profile `(https://www.kaggle.com/<username>/account)` and select `'Create API Token'`. This will
+trigger the download of `kaggle.json`, a file containing your API credentials. Place this file in
+the location `~/.kaggle/kaggle.json`
+
+For your security, ensure that other users of your computer do not have read access to your
+credentials. On Unix-based systems you can do this with the following command:
+
+`chmod 600 ~/.kaggle/kaggle.json`
+
+If you need a proxy, add `"proxy": "http://<ip_addr:port>"` to `kaggle.json`. It should look like
+this: `{"username":"your_username","key":"token", "proxy": "ip_addr:port"}`
+
+*Information about Kaggle API settings has been taken from the kaggle-api [readme](https://github.com/Kaggle/kaggle-api).*
+
+*Useful [link](https://github.com/Kaggle/kaggle-api/issues/6) for a problem with proxy settings.*
+
+### 1. About dataset
+
+All information about the dataset can be found
+on the [competition data page](https://www.kaggle.com/c/facial-keypoints-detection/data).
+
+### 2. Adding support for a third-party framework
+
+Write your own adapter class based on the `FrameworkAdapterPluginInterface` [class](https://github.com/intel/openfl/blob/develop/openfl/plugins/frameworks_adapters/framework_adapter_interface.py). This class should contain at least two methods:
+
+ - `get_tensor_dict(model, optimizer=None)` - extracts a tensor dict from a model and, optionally[^1], an optimizer. The resulting tensors must be converted to **dict{str: numpy.array}** for forwarding and aggregation.
+
+ - `set_tensor_dict(model, tensor_dict, optimizer=None, device=None)` - loads aggregated numpy arrays into the model, or into the model and optimizer. It receives `tensor_dict` as **dict{str: numpy.array}**, converts the arrays into tensors suitable for your model (and optimizer), and then loads the prepared parameters into them.
+
+ Your adapter should be placed in the workspace directory. When you create a `ModelInterface` object in the `'***.ipynb'` notebook, pass the name of your adapter as the `framework_plugin` parameter. Example:
+ ```py
+ framework_adapter = 'mxnet_adapter.FrameworkAdapterPlugin'
+
+ MI = ModelInterface(model=model, optimizer=optimizer,
+                    framework_plugin=framework_adapter)
+```
+
+[^1]: Whether or not to forward the optimizer parameters is set in the `start` method (FLExperiment [class](https://github.com/intel/openfl/blob/develop/openfl/interface/interactive_api/experiment.py) object, parameter `opt_treatment`).
+
+### Run experiment
+
+1. Create a folder for each `envoy`.
+2. Put a relevant envoy_config in each of the n folders (n is the number of envoys you would like
+   to use; this tutorial uses two, but you may use any number of envoys) and copy
+   the other files from the `envoy` folder there as well.
+3. Modify each `envoy` accordingly:
+
+    - In `start_envoy.sh`, change env_one to env_two (or any unique `envoy` names you like)
+
+    - Put a relevant envoy_config `envoy_config_one.yaml` or `envoy_config_two.yaml` (or any other
+      config file name consistent with the configuration file that is called in `start_envoy.sh`).
+4. Make sure that you have installed the requirements for each `envoy` in your virtual
+   environment: `pip install -r sd_requirements.txt`
+5. Run the `director`: 
+    ```sh
+    cd director_folder
+    ./start_director.sh
+    ```
+
+6. Run the `envoys`: 
+    ```sh
+    cd envoy_folder
+    ./start_envoy.sh env_one envoy_config_one.yaml
+    ```
+    If the Kaggle API settings are
+    correct, the dataset download will start. If this is not the first `envoy` launch,
+    the dataset will be re-downloaded only if some part of the data is missing.
+
+7. Run the [MXNet_landmarks.ipynb](workspace/MXNet_landmarks.ipynb) notebook using
+   Jupyter lab in a prepared virtual environment. For more information about preparing
+   the virtual environment, see
+   **[Preparation virtual environment](#preparation-virtual-environment)**.
+
+    * Install [MXNet 1.9.0](https://pypi.org/project/mxnet/1.9.0/) framework with CPU or GPU (preferred) support and [verify](https://mxnet.apache.org/versions/1.4.1/install/validate_mxnet.html) it:
+    ```bash
+    pip install mxnet-cuXXX==1.9.0
+    ```
+
+    * Run jupyter-lab:
+    ```bash
+    cd workspace
+    jupyter-lab
+    ```
+
+### Preparation virtual environment
+
+* Create virtual environment
+
+```sh
+    python3 -m venv venv
+```
+
+* To activate virtual environment
+
+```sh
+    source venv/bin/activate
+```
+
+* To deactivate virtual environment
+
+```sh
+    deactivate
+```
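A note on section 2 of the README above: the two-method adapter contract can be sketched without MXNet. The snippet below is a minimal, hypothetical illustration only. `ToyModel` and its `params` attribute are invented stand-ins for a real framework model, the class does not inherit from the real `FrameworkAdapterPluginInterface` so that it stays self-contained, and declaring the methods as static is an assumption. It shows only the shape of the conversion: framework tensors out to `dict{str: numpy.array}`, aggregated arrays back in.

```python
import numpy as np


class ToyModel:
    """Hypothetical stand-in for a framework model: named parameters as lists."""

    def __init__(self):
        self.params = {
            'dense0_weight': [[0.0, 0.0], [0.0, 0.0]],
            'dense0_bias': [0.0, 0.0],
        }


class FrameworkAdapterPlugin:
    """Toy adapter exposing the two methods the README asks for."""

    @staticmethod
    def get_tensor_dict(model, optimizer=None):
        # Convert every framework tensor to a numpy array keyed by its name.
        # Optimizer state is ignored in this sketch.
        return {name: np.asarray(value, dtype='float32')
                for name, value in model.params.items()}

    @staticmethod
    def set_tensor_dict(model, tensor_dict, optimizer=None, device=None):
        # Convert the aggregated numpy arrays back to the model's own type
        # (plain lists here) and load them into the model.
        for name, array in tensor_dict.items():
            model.params[name] = array.tolist()


model = ToyModel()
tensors = FrameworkAdapterPlugin.get_tensor_dict(model)
# Pretend the aggregator returned an updated bias
tensors['dense0_bias'] = tensors['dense0_bias'] + 1.0
FrameworkAdapterPlugin.set_tensor_dict(model, tensors)
print(model.params['dense0_bias'])  # [1.0, 1.0]
```

A real adapter would do the same round trip with the framework's own tensor type and, depending on `opt_treatment`, would also flatten the optimizer state into the same dictionary.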
diff --git a/openfl-tutorials/interactive_api/MXNet_landmarks/director/director_config.yaml b/openfl-tutorials/interactive_api/MXNet_landmarks/director/director_config.yaml
@@ -0,0 +1,5 @@
+settings:
+  listen_host: localhost
+  listen_port: 50051
+  sample_shape: ['96', '96']
+  target_shape: ['1']
diff --git a/openfl-tutorials/interactive_api/MXNet_landmarks/director/start_director.sh b/openfl-tutorials/interactive_api/MXNet_landmarks/director/start_director.sh
@@ -0,0 +1,4 @@
+#!/bin/bash
+set -e
+
+fx director start --disable-tls -c director_config.yaml
diff --git a/openfl-tutorials/interactive_api/MXNet_landmarks/envoy/envoy_config_one.yaml b/openfl-tutorials/interactive_api/MXNet_landmarks/envoy/envoy_config_one.yaml
@@ -0,0 +1,12 @@
+params:
+  cuda_devices: [0]
+
+optional_plugin_components:
+ cuda_device_monitor:
+   template: openfl.plugins.processing_units_monitor.pynvml_monitor.PynvmlCUDADeviceMonitor
+   settings: []
+
+shard_descriptor:
+  template: landmark_shard_descriptor.LandmarkShardDescriptor
+  params:
+    rank_worldsize: 1, 2
diff --git a/openfl-tutorials/interactive_api/MXNet_landmarks/envoy/envoy_config_two.yaml b/openfl-tutorials/interactive_api/MXNet_landmarks/envoy/envoy_config_two.yaml
@@ -0,0 +1,12 @@
+params:
+  cuda_devices: [1]
+
+optional_plugin_components:
+ cuda_device_monitor:
+   template: openfl.plugins.processing_units_monitor.pynvml_monitor.PynvmlCUDADeviceMonitor
+   settings: []
+
+shard_descriptor:
+  template: landmark_shard_descriptor.LandmarkShardDescriptor
+  params:
+    rank_worldsize: 2, 2
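A note on `rank_worldsize` in the two envoy configs above: the shard descriptor in this commit parses the value as a `rank, worldsize` pair and keeps every `worldsize`-th file starting at index `rank - 1`. The following is a small sketch of that slicing on made-up file names; the helper names `parse_rank_worldsize` and `shard` are hypothetical and exist only for this illustration.

```python
def parse_rank_worldsize(rank_worldsize: str):
    """Split a 'rank, worldsize' string into two integers."""
    rank, worldsize = map(int, rank_worldsize.split(','))
    return rank, worldsize


def shard(items, rank, worldsize):
    """Keep every worldsize-th item, starting at index rank - 1."""
    return items[rank - 1::worldsize]


# Made-up file names standing in for the processed dataset
names = [f'img_{i}.npy' for i in range(7)]

shard_one = shard(names, *parse_rank_worldsize('1, 2'))
shard_two = shard(names, *parse_rank_worldsize('2, 2'))

print(shard_one)  # ['img_0.npy', 'img_2.npy', 'img_4.npy', 'img_6.npy']
print(shard_two)  # ['img_1.npy', 'img_3.npy', 'img_5.npy']
```

The two shards are disjoint and together cover the whole list, but only when every envoy slices the same ordering of file names.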
diff --git a/openfl-tutorials/interactive_api/MXNet_landmarks/envoy/landmark_shard_descriptor.py b/openfl-tutorials/interactive_api/MXNet_landmarks/envoy/landmark_shard_descriptor.py
@@ -0,0 +1,170 @@
+# Copyright (C) 2021-2022 Intel Corporation
+# SPDX-License-Identifier: Apache-2.0
+
+"""Landmarks Shard Descriptor."""
+
+import json
+import shutil
+from hashlib import md5
+from logging import getLogger
+from pathlib import Path
+from random import shuffle
+from typing import Dict
+from typing import List
+from zipfile import ZipFile
+
+import numpy as np
+import pandas as pd
+from kaggle.api.kaggle_api_extended import KaggleApi
+
+from openfl.interface.interactive_api.shard_descriptor import ShardDataset
+from openfl.interface.interactive_api.shard_descriptor import ShardDescriptor
+
+logger = getLogger(__name__)
+
+
+class LandmarkShardDataset(ShardDataset):
+    """Landmark Shard dataset class."""
+
+    def __init__(self, dataset_dir: Path,
+                 rank: int = 1, worldsize: int = 1) -> None:
+        """Initialize LandmarkShardDataset."""
+        self.rank = rank
+        self.worldsize = worldsize
+        self.dataset_dir = dataset_dir
+        self.img_names = list(self.dataset_dir.glob('img_*.npy'))
+
+        # Sharding
+        self.img_names = self.img_names[self.rank - 1::self.worldsize]
+        # Shuffle the images that belong to this shard
+        shuffle(self.img_names)
+
+    def __getitem__(self, index) -> tuple:
+        """Return an (image, key points) pair by the index."""
+        # Get the name of the key points file,
+        # e.g. image name: 'img_123.npy', corresponding key points name: 'keypoints_123.npy'
+        kp_name = self.img_names[index].name.replace('img', 'keypoints')
+        return np.load(self.img_names[index]), np.load(self.dataset_dir / kp_name)
+
+    def __len__(self) -> int:
+        """Return the len of the dataset."""
+        return len(self.img_names)
+
+
+class LandmarkShardDescriptor(ShardDescriptor):
+    """Landmark Shard descriptor class."""
+
+    def __init__(self, data_folder: str = 'data',
+                 rank_worldsize: str = '1, 1',
+                 **kwargs) -> None:
+        """Initialize LandmarkShardDescriptor."""
+        super().__init__()
+        # Settings for sharding the dataset
+        self.rank, self.worldsize = map(int, rank_worldsize.split(','))
+
+        self.data_folder = Path.cwd() / data_folder
+        self.download_data()
+
+        # Calculating data and target shapes
+        ds = self.get_dataset()
+        sample, target = ds[0]
+        self._sample_shape = [str(dim) for dim in sample.shape]
+        self._target_shape = [str(len(target.shape))]
+
+        if self._target_shape[0] != '1':
+            raise ValueError('Target has a wrong shape')
+
+    def process_data(self, name_csv_file) -> None:
+        """Process data from csv to numpy format and save it in the same folder."""
+        data_df = pd.read_csv(self.data_folder / name_csv_file)
+        data_df.ffill(inplace=True)
+        keypoints = data_df.drop('Image', axis=1)
+        cur_folder = self.data_folder.relative_to(Path.cwd())
+
+        for i in range(data_df.shape[0]):
+            img = data_df['Image'][i].split(' ')
+            img = np.array(['0' if x == '' else x for x in img], dtype='float32').reshape(96, 96)
+            np.save(str(cur_folder / f'img_{i}.npy'), img)
+            y = np.array(keypoints.iloc[i, :], dtype='float32')
+            np.save(str(cur_folder / f'keypoints_{i}.npy'), y)
+
+    def download_data(self) -> None:
+        """Download dataset from Kaggle."""
+        if self.is_dataset_complete():
+            return
+
+        logger.info('Your dataset is absent or damaged. Downloading ... ')
+        api = KaggleApi()
+        api.authenticate()
+
+        # Remove stale or partial data, then recreate an empty data folder
+        if self.data_folder.exists():
+            shutil.rmtree(self.data_folder)
+        self.data_folder.mkdir(parents=True, exist_ok=True)
+
+        api.competition_download_file(
+            'facial-keypoints-detection',
+            'training.zip', path=self.data_folder
+        )
+
+        with ZipFile(self.data_folder / 'training.zip', 'r') as zipobj:
+            zipobj.extractall(self.data_folder)
+
+        (self.data_folder / 'training.zip').unlink()
+
+        self.process_data('training.csv')
+        (self.data_folder / 'training.csv').unlink()
+        self.save_all_md5()
+
+    def get_dataset(self, dataset_type='train') -> LandmarkShardDataset:
+        """Return a shard dataset by type."""
+        return LandmarkShardDataset(
+            dataset_dir=self.data_folder,
+            rank=self.rank,
+            worldsize=self.worldsize
+        )
+
+    def calc_all_md5(self) -> Dict[str, str]:
+        """Calculate hash of all dataset."""
+        md5_dict = {}
+        for root in self.data_folder.glob('*.npy'):
+            md5_calc = md5()
+            rel_file = root.relative_to(self.data_folder)
+
+            with open(self.data_folder / rel_file, 'rb') as f:
+                for chunk in iter(lambda: f.read(4096), b''):
+                    md5_calc.update(chunk)
+                md5_dict[str(rel_file)] = md5_calc.hexdigest()
+        return md5_dict
+
+    def save_all_md5(self) -> None:
+        """Save dataset hash."""
+        all_md5 = self.calc_all_md5()
+        with open(self.data_folder / 'dataset.json', 'w') as f:
+            json.dump(all_md5, f)
+
+    def is_dataset_complete(self) -> bool:
+        """Check dataset integrity."""
+        dataset_md5_path = self.data_folder / 'dataset.json'
+        if dataset_md5_path.exists():
+            with open(dataset_md5_path, 'r') as f:
+                old_md5 = json.load(f)
+            new_md5 = self.calc_all_md5()
+            return new_md5 == old_md5
+        return False
+
+    @property
+    def sample_shape(self) -> List[str]:
+        """Return the sample shape info."""
+        return self._sample_shape
+
+    @property
+    def target_shape(self) -> List[str]:
+        """Return the target shape info."""
+        return self._target_shape
+
+    @property
+    def dataset_description(self) -> str:
+        """Return the dataset description."""
+        return (f'Facial Keypoints Detection dataset, shard number {self.rank} '
+                f'out of {self.worldsize}')
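A note on the integrity check in the shard descriptor above: `save_all_md5` writes a manifest of MD5 digests to `dataset.json`, and `is_dataset_complete` recomputes the digests and compares them to decide whether to download again. The sketch below reproduces that idea with the standard library only, in a temporary folder with fake file contents. The function names are simplified stand-ins for this illustration, not the class methods themselves.

```python
import json
import tempfile
from hashlib import md5
from pathlib import Path


def calc_all_md5(folder: Path) -> dict:
    """Map each .npy file name to the MD5 hex digest of its bytes."""
    result = {}
    for path in sorted(folder.glob('*.npy')):
        digest = md5()
        with open(path, 'rb') as f:
            # Read in chunks so large files do not have to fit in memory
            for chunk in iter(lambda: f.read(4096), b''):
                digest.update(chunk)
        result[path.name] = digest.hexdigest()
    return result


def is_complete(folder: Path) -> bool:
    """Compare the saved manifest with freshly computed digests."""
    manifest = folder / 'dataset.json'
    if not manifest.exists():
        return False
    return json.loads(manifest.read_text()) == calc_all_md5(folder)


with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    (folder / 'img_0.npy').write_bytes(b'fake image bytes')
    before_manifest = is_complete(folder)  # no manifest yet
    (folder / 'dataset.json').write_text(json.dumps(calc_all_md5(folder)))
    after_manifest = is_complete(folder)  # manifest matches the files
    (folder / 'img_0.npy').write_bytes(b'corrupted')
    after_corruption = is_complete(folder)  # digest no longer matches

print(before_manifest, after_manifest, after_corruption)  # False True False
```

Because the comparison is between whole dictionaries, a missing or extra `.npy` file fails the check in the same way a modified file does.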
diff --git a/openfl-tutorials/interactive_api/MXNet_landmarks/envoy/sd_requirements.txt b/openfl-tutorials/interactive_api/MXNet_landmarks/envoy/sd_requirements.txt
@@ -0,0 +1,2 @@
+pynvml
+kaggle
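A note on the `kaggle` requirement above: as the README in this commit describes, the Kaggle API reads its credentials from `~/.kaggle/kaggle.json`, and the file should not be readable by other users. The sketch below writes a file of that shape with `600` permissions on a Unix-like system. It writes to a temporary directory rather than the real home directory, the helper `write_credentials` is hypothetical, and the username, key and proxy values are the README's placeholders, not working settings.

```python
import json
import os
import stat
import tempfile
from pathlib import Path
from typing import Optional


def write_credentials(path: Path, username: str, key: str,
                      proxy: Optional[str] = None) -> None:
    """Write a kaggle.json-style file readable only by its owner."""
    creds = {'username': username, 'key': key}
    if proxy:
        creds['proxy'] = proxy
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(creds))
    os.chmod(path, 0o600)  # same effect as `chmod 600` on Unix-based systems


with tempfile.TemporaryDirectory() as tmp:
    # Temporary stand-in for ~/.kaggle/kaggle.json
    cred_path = Path(tmp) / '.kaggle' / 'kaggle.json'
    write_credentials(cred_path, 'your_username', 'token',
                      proxy='http://<ip_addr:port>')
    mode = stat.S_IMODE(cred_path.stat().st_mode)
    loaded = json.loads(cred_path.read_text())

print(oct(mode), sorted(loaded))
```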
diff --git a/openfl-tutorials/interactive_api/MXNet_landmarks/envoy/start_envoy.sh b/openfl-tutorials/interactive_api/MXNet_landmarks/envoy/start_envoy.sh
@@ -0,0 +1,6 @@
+#!/bin/bash
+set -e
+ENVOY_NAME=$1
+SHARD_CONF=$2
+
+fx envoy start -n "$ENVOY_NAME" --disable-tls --envoy-config-path "$SHARD_CONF" -dh localhost -dp 50051