MXNet example as a plugin to OpenFL (#349)
* example how to add framework support
 added mxnet tutorial and adapter

* add readme

* Remove 'os' library from the shared descriptor

* Minor change

* changed the way set and get optimizer state in mxnet_adapter
edited README
add cuda monitor plugin
edited list libraries in sd_requirements and requirements
edited shard_descriptor

* edited README
edited list libraries requirements
Minor fixes shard descriptor and mxnet adapter

* shard descriptor minor changes

* Minor changes

* Update openfl-tutorials/interactive_api/MXNet_landmarks/README.md

Co-authored-by: Igor Davidyuk <igor.davidyuk@intel.com>

ViktoriiaRomanova and igor-davidyuk committed Mar 1, 2022
1 parent 156d0ca commit df22c83
Showing 11 changed files with 1,025 additions and 0 deletions.
122 changes: 122 additions & 0 deletions openfl-tutorials/interactive_api/MXNet_landmarks/README.md
@@ -0,0 +1,122 @@
# MXNet Facial Keypoints Detection tutorial
---
**Note:**

Note that this task uses a dataset from Kaggle. To get the dataset, you need a Kaggle account and must
accept the "Facial Keypoints Detection" [competition rules](https://www.kaggle.com/c/facial-keypoints-detection/rules).

---

This tutorial shows how to use a framework other than the already supported PyTorch and TensorFlow (in this case, MXNet) together with OpenFL.

## Installation of Kaggle API credentials

**Before you start, make sure that you have installed `sd_requirements.txt` in your virtual
environment on the envoy machine.**

To use the [Kaggle API](https://github.com/Kaggle/kaggle-api), sign up for
a [Kaggle account](https://www.kaggle.com). Then go to the `'Account'` tab of your user
profile `(https://www.kaggle.com/<username>/account)` and select `'Create API Token'`. This will
trigger the download of `kaggle.json`, a file containing your API credentials. Place this file at
`~/.kaggle/kaggle.json`.

For your security, ensure that other users of your computer do not have read access to your
credentials. On Unix-based systems you can do this with the following command:

`chmod 600 ~/.kaggle/kaggle.json`

If you need a proxy, add `"proxy": "http://<ip_addr:port>"` to `kaggle.json`. It should look like
this: `{"username":"your_username","key":"token","proxy":"http://<ip_addr:port>"}`
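The credential setup above can also be scripted. The following is a minimal sketch rather than part of the tutorial: the helper name `write_kaggle_credentials` is hypothetical. It writes the JSON keys described above (`username`, `key`, and an optional `proxy`) and restricts the file to owner read/write, which is the equivalent of `chmod 600`.

```python
import json
import os
import stat
from pathlib import Path
from typing import Optional


def write_kaggle_credentials(username: str, key: str,
                             proxy: Optional[str] = None,
                             path: Path = Path.home() / '.kaggle' / 'kaggle.json') -> Path:
    """Write Kaggle API credentials readable only by the owner (hypothetical helper)."""
    credentials = {'username': username, 'key': key}
    if proxy is not None:
        # Optional proxy entry, e.g. "http://<ip_addr:port>"
        credentials['proxy'] = proxy

    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(credentials))
    # Equivalent of `chmod 600`: owner read/write, no access for group or others
    os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)
    return path
```

Calling `write_kaggle_credentials('your_username', 'token')` produces the same file you would otherwise download and move by hand.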

*The information about Kaggle API settings is taken from the kaggle-api [README](https://github.com/Kaggle/kaggle-api).*

*See this [issue](https://github.com/Kaggle/kaggle-api/issues/6) if you run into problems with proxy settings.*

### 1. About the dataset

You can find all information about the dataset on its
[Kaggle data page](https://www.kaggle.com/c/facial-keypoints-detection/data).
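The shard descriptor in this commit reads `training.csv`, where the `Image` column stores each 96 x 96 grayscale picture as one space-separated string of pixel values and the remaining columns hold key point coordinates. The sketch below mirrors the per-row conversion done in `process_data`, using a tiny synthetic row instead of the real data; the function name `parse_row` is hypothetical.

```python
import numpy as np
import pandas as pd


def parse_row(row: pd.Series, side: int = 96):
    """Convert one CSV row into (image, keypoints) float32 arrays (hypothetical helper)."""
    pixels = row['Image'].split(' ')
    # Empty tokens are treated as '0', as in process_data
    image = np.array(['0' if p == '' else p for p in pixels],
                     dtype='float32').reshape(side, side)
    # Every column except 'Image' is a key point coordinate
    keypoints = np.array(row.drop('Image'), dtype='float32')
    return image, keypoints


# Synthetic example: a 2 x 2 "image" with two key point coordinates
df = pd.DataFrame({'left_eye_center_x': [1.5], 'left_eye_center_y': [2.5],
                   'Image': ['0 255 128 64']})
image, keypoints = parse_row(df.iloc[0], side=2)
```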

### 2. Adding support for a third-party framework

You need to write your own adapter class based on the `FrameworkAdapterPluginInterface` [class](https://github.com/intel/openfl/blob/develop/openfl/plugins/frameworks_adapters/framework_adapter_interface.py). This class should implement at least two methods:

- `get_tensor_dict(model, optimizer=None)` - extracts tensor dict from a model and optionally[^1] an optimizer. The resulting tensors must be converted to **dict{str: numpy.array}** for forwarding and aggregation.

- `set_tensor_dict(model, tensor_dict, optimizer=None, device=None)` - loads the aggregated numpy arrays into the model (and, optionally, the optimizer). It receives `tensor_dict` as **dict{str: numpy.array}**, converts the arrays into tensors suitable for your model (and optimizer), and then loads these parameters into the model (and optimizer).
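A minimal sketch of the two methods follows. To keep it runnable without MXNet or OpenFL, it uses a hypothetical `ToyModel` that stores its weights as a plain dict of NumPy arrays, treats the optimizer state as a dict as well, and does not subclass `FrameworkAdapterPluginInterface`. The `__opt_state_` key prefix is also an assumption made for illustration. In an MXNet Gluon adapter the same loops would typically go through the model's parameters (for example `model.collect_params()`, `param.data().asnumpy()` and `param.set_data(...)`).

```python
from typing import Dict, Optional

import numpy as np


class ToyModel:
    """Hypothetical stand-in for a framework model: named float32 weights."""

    def __init__(self) -> None:
        self.params = {'dense.weight': np.zeros((2, 3), dtype='float32'),
                       'dense.bias': np.zeros(3, dtype='float32')}


class FrameworkAdapterPlugin:
    """Sketch of an adapter following the get/set tensor dict contract."""

    @staticmethod
    def get_tensor_dict(model: ToyModel, optimizer=None) -> Dict[str, np.ndarray]:
        # Copy every parameter into a plain numpy array keyed by its name
        tensor_dict = {name: np.array(value, copy=True)
                       for name, value in model.params.items()}
        if optimizer is not None:
            # Hypothetical: optimizer state is a dict of numpy arrays
            for name, value in optimizer.items():
                tensor_dict[f'__opt_state_{name}'] = np.array(value, copy=True)
        return tensor_dict

    @staticmethod
    def set_tensor_dict(model: ToyModel, tensor_dict: Dict[str, np.ndarray],
                        optimizer=None, device: Optional[str] = None) -> None:
        # `device` is unused here; a real adapter would place tensors on it
        for name in model.params:
            # Convert back to the model's own dtype before loading
            model.params[name] = tensor_dict[name].astype(model.params[name].dtype)
        if optimizer is not None:
            for name in optimizer:
                key = f'__opt_state_{name}'
                # Optimizer state may be absent, depending on opt_treatment
                if key in tensor_dict:
                    optimizer[name] = tensor_dict[key].copy()
```

A round trip looks like: `tensors = FrameworkAdapterPlugin.get_tensor_dict(model)`, aggregate or modify the arrays, then `FrameworkAdapterPlugin.set_tensor_dict(model, tensors)`.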

Place your adapter in the workspace directory. When you create the `ModelInterface` object in the `'***.ipynb'` notebook, pass the name of your adapter to the `framework_plugin` parameter. Example:
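```py
from openfl.interface.interactive_api.experiment import ModelInterface

# 'mxnet_adapter' is the adapter module in the workspace directory,
# 'FrameworkAdapterPlugin' is the adapter class defined in it
framework_adapter = 'mxnet_adapter.FrameworkAdapterPlugin'

MI = ModelInterface(model=model, optimizer=optimizer,
                    framework_plugin=framework_adapter)
```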

[^1]: Whether to forward the optimizer parameters is controlled by the `opt_treatment` parameter of the `start` method of the FLExperiment [class](https://github.com/intel/openfl/blob/develop/openfl/interface/interactive_api/experiment.py) object.

### 3. Run the experiment

1. Create a folder for each `envoy`.
2. Put a relevant envoy config in each of the n folders (n is the number of envoys you would like
to use; this tutorial uses two, but you may use any number of envoys) and copy the
other files from the `envoy` folder there as well.
3. Modify each `envoy` accordingly:

- Choose a unique name for each `envoy` (for example, `env_one` and `env_two`). The name is passed to `start_envoy.sh` as its first argument.

- Put a relevant envoy config, `envoy_config_one.yaml` or `envoy_config_two.yaml` (or any other
config file name, as long as it matches the configuration file passed to `start_envoy.sh` as its second argument). Each config sets its own `cuda_devices` and `rank_worldsize` (`1, 2` for the first envoy and `2, 2` for the second).
4. Make sure that you have installed the requirements for each `envoy` in its virtual
environment: `pip install -r sd_requirements.txt`
5. Run the `director`:
```sh
cd director_folder
./start_director.sh
```

6. Run the `envoys`:
```sh
cd envoy_folder
./start_envoy.sh env_one envoy_config_one.yaml
```
If the Kaggle API settings are correct, the dataset download starts. On subsequent `envoy`
launches, the dataset is re-downloaded only if some part of the data is missing or damaged.
Start the second envoy the same way from its own folder, e.g. `./start_envoy.sh env_two envoy_config_two.yaml`.

7. Run the [MXNet_landmarks.ipynb](workspace/MXNet_landmarks.ipynb) notebook using
Jupyter lab in a prepared virtual environment. For more information about preparing the virtual
environment, see **[Preparation virtual environment](#preparation-virtual-environment)**.

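* Install the [MXNet 1.9.0](https://pypi.org/project/mxnet/1.9.0/) framework with CPU or GPU (preferred) support and [verify](https://mxnet.apache.org/versions/1.4.1/install/validate_mxnet.html) the installation:
```bash
# GPU build: replace XXX with the suffix for your CUDA version
pip install mxnet-cuXXX==1.9.0
# or, for a CPU-only build: pip install mxnet==1.9.0
```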

* Run jupyter-lab:
```bash
cd workspace
jupyter-lab
```
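Steps 1-3 can also be scripted. The sketch below is hypothetical: it assumes the envoy files live in a folder named `envoy`, copies that folder once per envoy (naming each copy after the envoy), and returns the `start_envoy.sh` command to run in each copy, using the config names from step 3.

```python
import shutil
from pathlib import Path
from typing import Dict, List


def prepare_envoy_folders(template: Path, envoys: Dict[str, str]) -> List[str]:
    """Copy the envoy template folder once per envoy and build start commands.

    `envoys` maps a unique envoy name to its config file name (hypothetical helper).
    """
    commands = []
    for name, config in envoys.items():
        target = template.parent / name
        # dirs_exist_ok requires Python 3.8+
        shutil.copytree(template, target, dirs_exist_ok=True)
        if not (target / config).exists():
            raise FileNotFoundError(f'{config} not found in {target}')
        commands.append(f'cd {target} && ./start_envoy.sh {name} {config}')
    return commands
```

For this tutorial the mapping would be `{'env_one': 'envoy_config_one.yaml', 'env_two': 'envoy_config_two.yaml'}`.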

### Preparation virtual environment

* Create a virtual environment

```sh
python3 -m venv venv
```

* Activate the virtual environment

```sh
source venv/bin/activate
```

* Deactivate the virtual environment

```sh
deactivate
```
@@ -0,0 +1,5 @@
settings:
  listen_host: localhost
  listen_port: 50051
  sample_shape: ['96', '96']
  target_shape: ['1']
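The `sample_shape` and `target_shape` above have to agree with what each envoy's shard descriptor reports; the director is expected to accept only shards whose shapes match. The sketch below reproduces how `LandmarkShardDescriptor` derives these strings from one (image, key points) pair and compares them with the director settings. The settings dict is copied by hand rather than parsed from YAML, and `descriptor_shapes` and `shapes_match` are hypothetical helpers. Note that `target_shape` is the number of target dimensions, not the number of key point values.

```python
from typing import Dict, List, Tuple

import numpy as np


def descriptor_shapes(sample: np.ndarray, target: np.ndarray) -> Tuple[List[str], List[str]]:
    """Mirror LandmarkShardDescriptor: shapes are reported as lists of strings."""
    sample_shape = [str(dim) for dim in sample.shape]
    # The descriptor reports the number of target dimensions, not its length
    target_shape = [str(len(target.shape))]
    return sample_shape, target_shape


def shapes_match(director_settings: Dict[str, List[str]],
                 sample: np.ndarray, target: np.ndarray) -> bool:
    sample_shape, target_shape = descriptor_shapes(sample, target)
    return (sample_shape == director_settings['sample_shape']
            and target_shape == director_settings['target_shape'])


# Values copied from the director config above
director_settings = {'sample_shape': ['96', '96'], 'target_shape': ['1']}
image = np.zeros((96, 96), dtype='float32')
keypoints = np.zeros(30, dtype='float32')  # synthetic 1-D key point vector
print(shapes_match(director_settings, image, keypoints))
```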
@@ -0,0 +1,4 @@
#!/bin/bash
set -e

fx director start --disable-tls -c director_config.yaml
@@ -0,0 +1,12 @@
params:
  cuda_devices: [0]

optional_plugin_components:
  cuda_device_monitor:
    template: openfl.plugins.processing_units_monitor.pynvml_monitor.PynvmlCUDADeviceMonitor
    settings: []

shard_descriptor:
  template: landmark_shard_descriptor.LandmarkShardDescriptor
  params:
    rank_worldsize: 1, 2
@@ -0,0 +1,12 @@
params:
  cuda_devices: [1]

optional_plugin_components:
  cuda_device_monitor:
    template: openfl.plugins.processing_units_monitor.pynvml_monitor.PynvmlCUDADeviceMonitor
    settings: []

shard_descriptor:
  template: landmark_shard_descriptor.LandmarkShardDescriptor
  params:
    rank_worldsize: 2, 2
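With `rank_worldsize: 1, 2` and `2, 2`, each envoy keeps every second image, starting from a different offset, via the slice `img_names[rank - 1::worldsize]` in `LandmarkShardDataset`. The sketch below checks that this slicing yields disjoint shards that together cover the whole dataset. It uses hypothetical file names and sorts them first, on the assumption that the shards are only guaranteed to be disjoint if every envoy lists the files in the same order.

```python
from typing import List


def shard(names: List[str], rank_worldsize: str) -> List[str]:
    """Select this envoy's part of the dataset, as LandmarkShardDataset does."""
    rank, worldsize = map(int, rank_worldsize.split(','))
    # rank is 1-based: rank 1 takes items 0, worldsize, 2 * worldsize, ...
    return sorted(names)[rank - 1::worldsize]


names = [f'img_{i}.npy' for i in range(7)]
shard_one = shard(names, '1, 2')
shard_two = shard(names, '2, 2')
```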
@@ -0,0 +1,170 @@
# Copyright (C) 2021-2022 Intel Corporation
# SPDX-License-Identifier: Apache-2.0

"""Landmarks Shard Descriptor."""

import json
import shutil
from hashlib import md5
from logging import getLogger
from pathlib import Path
from random import shuffle
from typing import Dict
from typing import List
from typing import Tuple
from zipfile import ZipFile

import numpy as np
import pandas as pd
from kaggle.api.kaggle_api_extended import KaggleApi

from openfl.interface.interactive_api.shard_descriptor import ShardDataset
from openfl.interface.interactive_api.shard_descriptor import ShardDescriptor

logger = getLogger(__name__)


class LandmarkShardDataset(ShardDataset):
    """Landmark Shard dataset class."""

    def __init__(self, dataset_dir: Path,
                 rank: int = 1, worldsize: int = 1) -> None:
        """Initialize LandmarkShardDataset."""
        self.rank = rank
        self.worldsize = worldsize
        self.dataset_dir = dataset_dir
        # Sort so that every envoy sees the files in the same order before sharding
        self.img_names = sorted(self.dataset_dir.glob('img_*.npy'))

        # Sharding: take every `worldsize`-th image starting from index `rank - 1`
        self.img_names = self.img_names[self.rank - 1::self.worldsize]
        # Shuffle the images within this shard
        shuffle(self.img_names)

    def __getitem__(self, index: int) -> Tuple[np.ndarray, np.ndarray]:
        """Return an (image, key points) pair by index."""
        # Derive the key points file name from the image file name,
        # e.g. image 'img_123.npy' -> key points 'keypoints_123.npy'
        img_path = self.img_names[index]
        kp_name = img_path.name.replace('img_', 'keypoints_')
        return np.load(img_path), np.load(self.dataset_dir / kp_name)

    def __len__(self) -> int:
        """Return the length of the dataset."""
        return len(self.img_names)


class LandmarkShardDescriptor(ShardDescriptor):
    """Landmark Shard descriptor class."""

    def __init__(self, data_folder: str = 'data',
                 rank_worldsize: str = '1, 1',
                 **kwargs) -> None:
        """Initialize LandmarkShardDescriptor."""
        super().__init__()
        # Settings for sharding the dataset
        self.rank, self.worldsize = map(int, rank_worldsize.split(','))

        self.data_folder = Path.cwd() / data_folder
        self.download_data()

        # Calculating data and target shapes
        ds = self.get_dataset()
        sample, target = ds[0]
        self._sample_shape = [str(dim) for dim in sample.shape]
        self._target_shape = [str(len(target.shape))]

        if self._target_shape[0] != '1':
            raise ValueError('Target has a wrong shape')

    def process_data(self, name_csv_file) -> None:
        """Process data from csv to numpy format and save it in the same folder."""
        data_df = pd.read_csv(self.data_folder / name_csv_file)
        # Fill missing key point coordinates with the values from the previous row
        data_df = data_df.ffill()
        keypoints = data_df.drop('Image', axis=1)

        for i in range(data_df.shape[0]):
            # Each image is a space-separated string of 96 * 96 pixel values
            img = data_df['Image'][i].split(' ')
            img = np.array(['0' if x == '' else x for x in img], dtype='float32').reshape(96, 96)
            np.save(str(self.data_folder / f'img_{i}.npy'), img)
            y = np.array(keypoints.iloc[i, :], dtype='float32')
            np.save(str(self.data_folder / f'keypoints_{i}.npy'), y)

    def download_data(self) -> None:
        """Download dataset from Kaggle."""
        if self.is_dataset_complete():
            return

        logger.info('Your dataset is absent or damaged. Downloading ... ')
        api = KaggleApi()
        api.authenticate()

        # Remove partial or damaged data before downloading a fresh copy
        if self.data_folder.exists():
            shutil.rmtree(self.data_folder)
        self.data_folder.mkdir(parents=True, exist_ok=True)

        api.competition_download_file(
            'facial-keypoints-detection',
            'training.zip', path=self.data_folder
        )

        with ZipFile(self.data_folder / 'training.zip', 'r') as zipobj:
            zipobj.extractall(self.data_folder)

        (self.data_folder / 'training.zip').unlink()

        self.process_data('training.csv')
        (self.data_folder / 'training.csv').unlink()
        self.save_all_md5()

    def get_dataset(self, dataset_type='train') -> LandmarkShardDataset:
        """Return a shard dataset by type."""
        # Only the training split is prepared, so `dataset_type` is not used
        return LandmarkShardDataset(
            dataset_dir=self.data_folder,
            rank=self.rank,
            worldsize=self.worldsize
        )

    def calc_all_md5(self) -> Dict[str, str]:
        """Calculate the MD5 hash of every dataset file."""
        md5_dict = {}
        for file_path in self.data_folder.glob('*.npy'):
            md5_calc = md5()
            rel_file = file_path.relative_to(self.data_folder)

            # Read the file in 4 KiB chunks to keep memory usage low
            with open(file_path, 'rb') as f:
                for chunk in iter(lambda: f.read(4096), b''):
                    md5_calc.update(chunk)
            md5_dict[str(rel_file)] = md5_calc.hexdigest()
        return md5_dict

    def save_all_md5(self) -> None:
        """Save dataset hash."""
        all_md5 = self.calc_all_md5()
        with open(self.data_folder / 'dataset.json', 'w') as f:
            json.dump(all_md5, f)

    def is_dataset_complete(self) -> bool:
        """Check dataset integrity."""
        dataset_md5_path = self.data_folder / 'dataset.json'
        if dataset_md5_path.exists():
            with open(dataset_md5_path, 'r') as f:
                old_md5 = json.load(f)
            new_md5 = self.calc_all_md5()
            return new_md5 == old_md5
        return False

    @property
    def sample_shape(self) -> List[str]:
        """Return the sample shape info."""
        return self._sample_shape

    @property
    def target_shape(self) -> List[str]:
        """Return the target shape info."""
        return self._target_shape

    @property
    def dataset_description(self) -> str:
        """Return the dataset description."""
        return (f'Facial Keypoints dataset, shard number {self.rank} '
                f'out of {self.worldsize}')
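`is_dataset_complete` decides whether an envoy re-downloads the data by comparing a stored `{file name: MD5}` map in `dataset.json` with a freshly computed one. The standalone sketch below reproduces that check on a folder of `.npy` files without the Kaggle or OpenFL dependencies; the free functions are hypothetical counterparts of the descriptor's methods.

```python
import json
from hashlib import md5
from pathlib import Path
from typing import Dict


def calc_all_md5(folder: Path) -> Dict[str, str]:
    """Hash every .npy file in `folder`, reading it in 4 KiB chunks."""
    hashes = {}
    for file_path in folder.glob('*.npy'):
        digest = md5()
        with open(file_path, 'rb') as f:
            for chunk in iter(lambda: f.read(4096), b''):
                digest.update(chunk)
        hashes[file_path.name] = digest.hexdigest()
    return hashes


def save_all_md5(folder: Path) -> None:
    """Store the current hashes as the reference in dataset.json."""
    (folder / 'dataset.json').write_text(json.dumps(calc_all_md5(folder)))


def is_dataset_complete(folder: Path) -> bool:
    """True only if the stored and freshly computed hash maps are identical."""
    index = folder / 'dataset.json'
    if not index.exists():
        return False
    return json.loads(index.read_text()) == calc_all_md5(folder)
```

Deleting, adding or modifying any `.npy` file makes the two maps differ, which is what triggers a fresh download in `download_data`.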
@@ -0,0 +1,2 @@
pynvml
kaggle
@@ -0,0 +1,6 @@
#!/bin/bash
set -e
ENVOY_NAME=$1
SHARD_CONF=$2

fx envoy start -n "$ENVOY_NAME" --disable-tls --envoy-config-path "$SHARD_CONF" -dh localhost -dp 50051
