MXNet example as a plugin to OpenFL #349

Merged
Changes from 5 commits
118 changes: 118 additions & 0 deletions openfl-tutorials/interactive_api/MXNet_landmarks/README.md
@@ -0,0 +1,118 @@
# MXNet Facial Keypoints Detection tutorial
---
**Note:**

This task uses a dataset from Kaggle. To download it you need a Kaggle account and must
accept the rules of the "Facial Keypoints Detection" competition.

---

This tutorial shows how to use a framework other than the already supported PyTorch and TensorFlow together with OpenFL.

## Installation of Kaggle API credentials

**Before you start, make sure that you have installed the packages from `sd_requirements.txt` in the virtual
environment on each envoy machine.**

To use the [Kaggle API](https://github.com/Kaggle/kaggle-api), sign up for
a [Kaggle account](https://www.kaggle.com). Then go to the `'Account'` tab of your user
profile `(https://www.kaggle.com/<username>/account)` and select `'Create API Token'`. This will
trigger the download of `kaggle.json`, a file containing your API credentials. Place this file at
`~/.kaggle/kaggle.json`.

---
**Note:**

You will need to accept [competition rules](https://www.kaggle.com/c/facial-keypoints-detection/rules).

---

For your security, ensure that other users of your computer do not have read access to your
credentials. On Unix-based systems you can do this with the following command:

`chmod 600 ~/.kaggle/kaggle.json`
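
The effect of `chmod 600` can also be checked from Python. The following is a minimal sketch using only the standard library; the helper name `is_private` is hypothetical, and the demo works on a temporary file rather than on your real `kaggle.json`. It assumes a Unix-based system, as above.

```python
import os
import stat
import tempfile
from pathlib import Path


def is_private(path: Path) -> bool:
    """Return True if group and others have no permissions on the file."""
    mode = stat.S_IMODE(path.stat().st_mode)
    return mode & (stat.S_IRWXG | stat.S_IRWXO) == 0


# Demo on a throwaway file standing in for ~/.kaggle/kaggle.json.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / 'kaggle.json'
    demo.write_text('{}')
    os.chmod(demo, 0o644)   # readable by everyone
    before = is_private(demo)
    os.chmod(demo, 0o600)   # same effect as `chmod 600`
    after = is_private(demo)

print(before, after)
```

To check your own credentials, you could call `is_private(Path.home() / '.kaggle' / 'kaggle.json')`.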

If you need a proxy, add `"proxy": "http://<ip_addr:port>"` to `kaggle.json`. The file should then look like
this: `{"username":"your_username","key":"token","proxy":"http://<ip_addr:port>"}`
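
If you prefer not to edit the JSON by hand, the same structure can be produced with the standard `json` module. This is only a sketch: the values are the placeholders used above, not real credentials, and writing the result to `~/.kaggle/kaggle.json` is left to you.

```python
import json

# Placeholder values, to be replaced with your own credentials and proxy.
credentials = {
    'username': 'your_username',
    'key': 'token',
    'proxy': 'http://<ip_addr:port>',
}

text = json.dumps(credentials)
# Round-trip to confirm that the text is valid JSON with the expected keys.
parsed = json.loads(text)
print(sorted(parsed))
```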

*Information about Kaggle API settings has been taken from the kaggle-api [readme](https://github.com/Kaggle/kaggle-api).*

*Useful [link](https://github.com/Kaggle/kaggle-api/issues/6) for a problem with proxy settings.*

### 1. About dataset

All information about the dataset is available on
the [competition data page](https://www.kaggle.com/c/facial-keypoints-detection/data).

### 2. Adding support for a third-party framework

You need to write your own adapter class based on the `FrameworkAdapterPluginInterface` [class](https://github.com/intel/openfl/blob/develop/openfl/plugins/frameworks_adapters/framework_adapter_interface.py). This class should contain at least two methods:

- `get_tensor_dict(model, optimizer=None)` - extracts a tensor dict from a model and optionally[^1] an optimizer. The resulting tensors must be converted to **dict{str: numpy.array}** for forwarding and aggregation.

- `set_tensor_dict(model, tensor_dict, optimizer=None, device=None)` - loads aggregated numpy arrays into the model, or into the model and optimizer. It receives `tensor_dict` as **dict{str: numpy.array}**, converts the arrays into tensors suitable for your model (and optimizer), and then loads the prepared parameters into the model (and optimizer).

Your adapter should be placed in the workspace directory. When you create a `ModelInterface` object in the `'***.ipynb'` notebook, pass the name of your adapter as the `framework_plugin` parameter. Example:
```py
framework_adapter = 'mxnet_adapter.FrameworkAdapterPlugin'

MI = ModelInterface(model=model, optimizer=optimizer,
                    framework_plugin=framework_adapter)
```

[^1]: Whether or not to forward the optimizer parameters is set in the `start` method (FLExperiment [class](https://github.com/intel/openfl/blob/develop/openfl/interface/interactive_api/experiment.py) object, parameter `opt_treatment`).
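
To make the two adapter methods more concrete, here is a minimal, framework-free sketch. `ToyModel` is a hypothetical stand-in for a real framework model, and the adapter below does not subclass `FrameworkAdapterPluginInterface` so that the example runs without OpenFL installed; a real adapter would subclass it and convert the framework's own tensor type instead of nested lists. Writing the methods as static methods is an assumption of this sketch.

```python
import numpy as np


class ToyModel:
    """Hypothetical stand-in for a framework model: named weights as nested lists."""

    def __init__(self):
        self.weights = {'dense.weight': [[1.0, 2.0], [3.0, 4.0]],
                        'dense.bias': [0.5, -0.5]}


class FrameworkAdapterPlugin:
    """Sketch of an adapter; a real one would subclass FrameworkAdapterPluginInterface."""

    @staticmethod
    def get_tensor_dict(model, optimizer=None):
        # Convert every model tensor to a numpy array; optimizer state is skipped here.
        return {name: np.array(value, dtype='float32')
                for name, value in model.weights.items()}

    @staticmethod
    def set_tensor_dict(model, tensor_dict, optimizer=None, device=None):
        # Convert the aggregated numpy arrays back to the model's native format.
        for name, array in tensor_dict.items():
            model.weights[name] = array.tolist()


model = ToyModel()
tensors = FrameworkAdapterPlugin.get_tensor_dict(model)
# Pretend the aggregator averaged these weights with an all-zero model.
averaged = {name: array / 2 for name, array in tensors.items()}
FrameworkAdapterPlugin.set_tensor_dict(model, averaged)
print(model.weights['dense.bias'])
```

The round trip (model to numpy dict, aggregation on numpy arrays, numpy dict back to model) is the only contract the two methods need to satisfy.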

### Run experiment

1. Create a folder for each `envoy`.
2. Put a relevant envoy_config in each of the n folders (n is the number of envoys you would like
   to use; this tutorial uses two, but you may use any number of envoys) and copy the
   other files from the `envoy` folder there as well.
3. Modify each `envoy` accordingly:

    - In `start_envoy.sh`, change env_one to env_two (or any unique `envoy` names you like).

    - Put a relevant envoy_config, `envoy_config_one.yaml` or `envoy_config_two.yaml` (or any other
      config file name consistent with the configuration file that is passed to `start_envoy.sh`).
4. Make sure that you have installed the requirements for each `envoy` in your virtual
   environment: `pip install -r sd_requirements.txt`
5. Run the `director`:
```sh
cd director_folder
./start_director.sh
```

6. Run the `envoys`:
```sh
cd envoy_folder
./start_envoy.sh env_one envoy_config_one.yaml
```
If the Kaggle API settings are
correct, the dataset download will start. If this is not the first `envoy` launch,
the dataset is downloaded again only if part of the data is missing or damaged.

7. Run the [MXNet_landmarks.ipynb](workspace/MXNet_landmarks.ipynb) notebook using
Jupyter lab in a prepared virtual environment. For more information about preparing the virtual
environment, see **[Preparation virtual environment](#preparation-virtual-environment)**.

### Preparation virtual environment

* Create virtual environment

```sh
python3 -m venv venv
```

* To activate virtual environment

```sh
source venv/bin/activate
```

* To deactivate virtual environment

```sh
deactivate
```
@@ -0,0 +1,5 @@
settings:
  listen_host: localhost
  listen_port: 50051
  sample_shape: ['96', '96']
  target_shape: ['1']
@@ -0,0 +1,4 @@
#!/bin/bash
set -e

fx director start --disable-tls -c director_config.yaml
@@ -0,0 +1,12 @@
params:
  cuda_devices: [0]

optional_plugin_components:
  cuda_device_monitor:
    template: openfl.plugins.processing_units_monitor.pynvml_monitor.PynvmlCUDADeviceMonitor
    settings: []

shard_descriptor:
  template: landmark_shard_descriptor.LandmarkShardDescriptor
  params:
    rank_worldsize: 1, 2
@@ -0,0 +1,12 @@
params:
  cuda_devices: [1]

optional_plugin_components:
  cuda_device_monitor:
    template: openfl.plugins.processing_units_monitor.pynvml_monitor.PynvmlCUDADeviceMonitor
    settings: []

shard_descriptor:
  template: landmark_shard_descriptor.LandmarkShardDescriptor
  params:
    rank_worldsize: 2, 2
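
The `rank_worldsize` values in the two envoy configs above are passed to the shard descriptor as a `'rank, worldsize'` string, and each envoy then keeps every `worldsize`-th file starting at its 1-based rank. The sketch below illustrates that slicing on a made-up list of file names; the helper names `parse_rank_worldsize` and `shard` are hypothetical and exist only for this illustration.

```python
def parse_rank_worldsize(value: str):
    """Split a 'rank, worldsize' string into two integers."""
    rank, worldsize = map(int, value.split(','))
    return rank, worldsize


def shard(items, rank, worldsize):
    """Take every worldsize-th item, starting at the 1-based rank."""
    return items[rank - 1::worldsize]


# Made-up file names standing in for the processed dataset.
files = [f'img_{i}.npy' for i in range(7)]
first = shard(files, *parse_rank_worldsize('1, 2'))
second = shard(files, *parse_rank_worldsize('2, 2'))
print(first)
print(second)
```

With this scheme the shards are disjoint and together cover the whole list, provided every envoy sees the files in the same order.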
@@ -0,0 +1,169 @@
# Copyright (C) 2020-2021 Intel Corporation
# SPDX-License-Identifier: Apache-2.0

"""Landmarks Shard Descriptor."""

import json
import shutil
from hashlib import md5
from logging import getLogger
from pathlib import Path
from random import shuffle
from typing import Dict
from typing import List
from zipfile import ZipFile

import numpy as np
import pandas as pd
from kaggle.api.kaggle_api_extended import KaggleApi

from openfl.interface.interactive_api.shard_descriptor import ShardDataset
from openfl.interface.interactive_api.shard_descriptor import ShardDescriptor

logger = getLogger(__name__)


class LandmarkShardDataset(ShardDataset):
    """Landmark Shard dataset class."""

    def __init__(self, dataset_dir: Path,
                 rank: int = 1, worldsize: int = 1) -> None:
        """Initialize LandmarkShardDataset."""
        self.rank = rank
        self.worldsize = worldsize
        self.dataset_dir = dataset_dir
        self.img_names = list(self.dataset_dir.glob('img_*.npy'))

        # Sharding
        self.img_names = self.img_names[self.rank - 1::self.worldsize]
        # Shuffle the images that belong to this shard
        shuffle(self.img_names)

    def __getitem__(self, index) -> np.ndarray:
        """Return an item by the index."""
        # Get the name of the key points file,
        # e.g. image name: 'img_123.npy', corresponding key points name: 'keypoints_123.npy'.
        # Only the file name is rewritten, so an 'img' substring in the folder path is untouched.
        kp_name = self.img_names[index].name.replace('img', 'keypoints')
        return np.load(self.img_names[index]), np.load(self.dataset_dir / kp_name)

    def __len__(self) -> int:
        """Return the len of the dataset."""
        return len(self.img_names)


class LandmarkShardDescriptor(ShardDescriptor):
    """Landmark Shard descriptor class."""

    def __init__(self, data_folder: str = 'data',
                 rank_worldsize: str = '1, 1',
                 **kwargs) -> None:
        """Initialize LandmarkShardDescriptor."""
        super().__init__()
        # Settings for sharding the dataset
        self.rank, self.worldsize = map(int, rank_worldsize.split(','))

        self.data_folder = Path.cwd() / data_folder
        self.download_data()

        # Calculating data and target shapes
        ds = self.get_dataset()
        sample, target = ds[0]
        self._sample_shape = [str(dim) for dim in sample.shape]
        self._target_shape = [str(len(target.shape))]

        if self._target_shape[0] != '1':
            raise ValueError('Target has a wrong shape')

    def process_data(self, name_csv_file) -> None:
Review comment (Contributor): Can we do the processing at runtime, without saving additional files?

Reply (Contributor Author): Yes, we can, but then we have to process the csv file every time we start the experiment and keep all images in RAM. I guess processing them just once is better.

Review comment (Contributor): If I understand the situation correctly, there is a single csv file with all the labels, and you read and split it at Envoy start time. In this situation, you do not need to save separate labels to disk, just keep them in memory.

Reply (Contributor Author): Yes, you are right, we can do it. I just thought that it would be easier to process them the same way as a picture.

        """Process data from csv to numpy format and save it in the same folder."""
        data_df = pd.read_csv(self.data_folder / name_csv_file)
        # Forward-fill missing key points (replaces the deprecated fillna(method='ffill'))
        data_df.ffill(inplace=True)
        keypoints = data_df.drop('Image', axis=1)
        cur_folder = self.data_folder.relative_to(Path.cwd())

        for i in range(data_df.shape[0]):
            # Each image is stored as a space-separated string of 96 * 96 pixel values
            img = data_df['Image'][i].split(' ')
            img = np.array(['0' if x == '' else x for x in img], dtype='float32').reshape(96, 96)
            np.save(str(cur_folder / f'img_{i}.npy'), img)
            y = np.array(keypoints.iloc[i, :], dtype='float32')
            np.save(str(cur_folder / f'keypoints_{i}.npy'), y)

    def download_data(self) -> None:
        """Download dataset from Kaggle."""
        self.data_folder.mkdir(parents=True, exist_ok=True)

        if not self.is_dataset_complete():
            logger.info('Your dataset is absent or damaged. Downloading ... ')
            api = KaggleApi()
            api.authenticate()

            # Remove partial data from the configured folder (not a hard-coded 'data' path)
            if self.data_folder.exists():
                shutil.rmtree(self.data_folder)
            self.data_folder.mkdir(parents=True, exist_ok=True)

            api.competition_download_file(
                'facial-keypoints-detection',
                'training.zip', path=self.data_folder
            )

            with ZipFile(self.data_folder / 'training.zip', 'r') as zipobj:
                zipobj.extractall(self.data_folder)

            (self.data_folder / 'training.zip').unlink()

            self.process_data('training.csv')
            (self.data_folder / 'training.csv').unlink()
            self.save_all_md5()

    def get_dataset(self, dataset_type='train') -> LandmarkShardDataset:
        """Return a shard dataset by type."""
        return LandmarkShardDataset(
            dataset_dir=self.data_folder,
            rank=self.rank,
            worldsize=self.worldsize
        )

    def calc_all_md5(self) -> Dict[str, str]:
        """Calculate hash of all dataset."""
        md5_dict = {}
        for root in self.data_folder.glob('*.npy'):
            md5_calc = md5()
            rel_file = root.relative_to(self.data_folder)

            with open(self.data_folder / rel_file, 'rb') as f:
                for chunk in iter(lambda: f.read(4096), b''):
                    md5_calc.update(chunk)
                md5_dict[str(rel_file)] = md5_calc.hexdigest()
        return md5_dict

    def save_all_md5(self) -> None:
        """Save dataset hash."""
        all_md5 = self.calc_all_md5()
        with open(self.data_folder / 'dataset.json', 'w') as f:
            json.dump(all_md5, f)

    def is_dataset_complete(self) -> bool:
        """Check dataset integrity."""
        new_md5 = self.calc_all_md5()
        if (self.data_folder / 'dataset.json').exists():
            with open(self.data_folder / 'dataset.json', 'r') as f:
                old_md5 = json.load(f)
        else:
            return False

        return new_md5 == old_md5

    @property
    def sample_shape(self) -> List[str]:
        """Return the sample shape info."""
        return self._sample_shape

    @property
    def target_shape(self) -> List[str]:
        """Return the target shape info."""
        return self._target_shape

    @property
    def dataset_description(self) -> str:
        """Return the dataset description."""
        return (f'Facial Keypoints Detection dataset, shard number {self.rank} '
                f'out of {self.worldsize}')
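
The integrity check in the shard descriptor above compares freshly computed MD5 hashes of the `.npy` files with the hashes saved in `dataset.json`. The following standalone sketch reproduces that idea on two dummy files in a temporary folder; the helper names `file_md5` and `manifest` are hypothetical, and the file contents are arbitrary bytes rather than real numpy data.

```python
import json
import tempfile
from hashlib import md5
from pathlib import Path


def file_md5(path: Path) -> str:
    """Hash a file in 4 KiB chunks so large files are not read at once."""
    digest = md5()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(4096), b''):
            digest.update(chunk)
    return digest.hexdigest()


def manifest(folder: Path) -> dict:
    """Map each .npy file name in the folder to its MD5 hash."""
    return {p.name: file_md5(p) for p in sorted(folder.glob('*.npy'))}


with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    (folder / 'img_0.npy').write_bytes(b'pixels')
    (folder / 'keypoints_0.npy').write_bytes(b'points')
    (folder / 'dataset.json').write_text(json.dumps(manifest(folder)))

    saved = json.loads((folder / 'dataset.json').read_text())
    intact = manifest(folder) == saved      # nothing changed yet

    (folder / 'img_0.npy').write_bytes(b'corrupted')
    damaged = manifest(folder) == saved     # one file was modified

print(intact, damaged)
```

A missing file is detected the same way, because its key disappears from the freshly computed manifest.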
@@ -0,0 +1,2 @@
pynvml
kaggle
@@ -0,0 +1,6 @@
#!/bin/bash
set -e
ENVOY_NAME=$1
SHARD_CONF=$2

fx envoy start -n "$ENVOY_NAME" --disable-tls --envoy-config-path "$SHARD_CONF" -dh localhost -dp 50051