Tutorial

In this tutorial, a data owner (also referred to as a data provider) publishes private data to the fitchain network and provides a description of the data science problem that needs to be solved. A data scientist (also referred to as a model provider) can apply to the project made available by the data provider, inspect the data source(s) connected to the project, and provide a solution in the form of a machine learning model (built with libraries such as sklearn or tensorflow).

Data provider

The following preliminary steps are required of the data provider before submitting a project:

  1. Run the IPFS daemon

     $ ipfs daemon

  2. Run Geth and synchronize it with the Ethereum blockchain (the command below connects to the Rinkeby testnet, which allows testing without real ETH)

     $ geth --rinkeby --ws --wsapi admin,eth,web3,personal,miner,rpc --wsaddr localhost --wsorigins="*"

  3. Run the fitchain pod

     $ cd fitchain-pod
     $ node index

  4. Run the fitchain dashboard, then point the web browser to http://localhost:9090

     $ cd fitchain-pod-ui
     $ npm run dev
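Before moving on, it can help to confirm that each local service is actually listening. The sketch below assumes default ports: 5001 for the IPFS API, 8546 for Geth's WebSocket endpoint, 9400 for the pod (the CLI's default `--podUri`) and 9090 for the dashboard. The port map and function names are illustrative assumptions; adjust the ports if you changed any defaults.

```python
import socket

# Assumed default ports; change them if your setup differs.
DEFAULT_SERVICES = {
    "ipfs-api": ("localhost", 5001),
    "geth-ws": ("localhost", 8546),
    "fitchain-pod": ("localhost", 9400),
    "fitchain-dashboard": ("localhost", 9090),
}


def is_listening(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable or timed out
        return False


def check_services(services=DEFAULT_SERVICES) -> dict:
    """Map each service name to whether its port accepts TCP connections."""
    return {name: is_listening(host, port) for name, (host, port) in services.items()}


if __name__ == "__main__":
    for name, ok in check_services().items():
        print(f"{name:20} {'up' if ok else 'DOWN'}")
```

A successful TCP connection only shows that something is listening on the port, not that Geth has finished synchronizing.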

Use the dashboard to add a data provider, add a data source with a description, create a project and submit it.

Model provider

A model provider needs to execute steps 1, 2 and 3 above (IPFS, Geth and the fitchain pod) in order to connect to the fitchain network with an Ethereum account. They can then use the fitchain command line interface to inspect the schema of the private data published by the data provider and provide a solution to the data science problem of their choice.

How to use the fitchain command line interface

The command line interface is an executable that allows one to apply to projects, deploy machine learning models and inspect logs and metrics of submitted jobs.

To list all commands, type

$ fitchain help
The fitchain commandline client can be used to communicate with a pod from the commandline. 
This might be preferable while designing models.

Usage:
  fitchain [command]

Available Commands:
  accounts    List the accounts in the pod
  help        Help about any command
  identity    Get the pod identity details
  jobs        Get a list of all the jobs for the given workspace
  lock        Lock the given account
  logs        Get the logs for the given job
  metrics     Get the metrics for the given job
  project     show information about the given project
  projects    search for projects matching the given search string
  register    Create a new account with the provided password
  status      Get the status of the provided account
  unlock      Unlock the given account
  workspace   Workspace commands
  workspaces  Get a list of all the workspaces on the pod

Flags:
  -h, --help            help for fitchain
  -n, --name string     The name of the workspace
  -u, --podUri string   The uri of the pod (default "http://localhost:9400/v1")
  -w, --wsUri string    The websocket uri of the pod (default "ws://localhost:9400/v1")

Use "fitchain [command] --help" for more information about a command.
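When scripting the CLI (for example from a notebook), it can be convenient to build argument lists programmatically. The sketch below only uses the global flags shown in the help output (`--podUri`, `--wsUri`, `--name`) and assumes they are accepted after the subcommand, as the `workspace init --name` example later in this tutorial suggests. The helper names `fitchain_argv` and `run_fitchain` are hypothetical, and the `fitchain` executable is assumed to be on `PATH`.

```python
import subprocess
from typing import List, Optional


def fitchain_argv(*args: str, pod_uri: Optional[str] = None,
                  ws_uri: Optional[str] = None,
                  name: Optional[str] = None) -> List[str]:
    """Build an argument list for the fitchain CLI.

    Global flags are only added when a value is given, so the CLI's own
    defaults (http://localhost:9400/v1, ws://localhost:9400/v1) still apply.
    """
    argv = ["fitchain", *args]
    if pod_uri is not None:
        argv += ["--podUri", pod_uri]
    if ws_uri is not None:
        argv += ["--wsUri", ws_uri]
    if name is not None:
        argv += ["--name", name]
    return argv


def run_fitchain(*args: str, **flags: Optional[str]) -> str:
    """Run a fitchain command and return its standard output (raises on failure)."""
    result = subprocess.run(fitchain_argv(*args, **flags),
                            capture_output=True, text=True, check=True)
    return result.stdout
```

For example, `run_fitchain("projects")` lists all projects, and `run_fitchain("project", "project_id", pod_uri="http://localhost:9400/v1")` targets an explicit pod.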

To list all projects

$ fitchain projects

To show information about a specific project

$ fitchain project project_id

Create a folder for the workspace

$ mkdir ~/my_folder

From the newly created folder, initialize the workspace with the corresponding project_id

$ cd ~/my_folder
$ fitchain workspace init --name my_workspace project_id
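The two shell steps above (create the folder, then initialize the workspace from inside it) can be combined into one call. This is a minimal sketch: `init_workspace` is a hypothetical helper, and the `run` parameter defaults to `subprocess.run` so that the command can be swapped out, for instance for a dry run.

```python
import subprocess
from pathlib import Path
from typing import Callable, List


def init_workspace(folder: str, workspace: str, project_id: str,
                   run: Callable = subprocess.run) -> List[str]:
    """Create `folder` if needed and run `fitchain workspace init` inside it.

    Returns the argument list that was passed to `run`.
    """
    path = Path(folder).expanduser()
    path.mkdir(parents=True, exist_ok=True)  # like `mkdir -p ~/my_folder`
    argv = ["fitchain", "workspace", "init", "--name", workspace, project_id]
    run(argv, cwd=path, check=True)  # cwd plays the role of `cd ~/my_folder`
    return argv
```

Usage: `init_workspace("~/my_folder", "my_workspace", "project_id")`.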

At this point a new Python model can be written and saved to disk (e.g. as my_model.py). When done with the code, save it to the workspace.

The machine learning model should be written inside the boilerplate code generated by the command line interface. For the Keras neural network library, this code looks like the following:

from fitchain import Runtime

# initialize the runtime
runtime = Runtime()

# Load the dataset from the project
data = runtime.resolve("datasource_id")

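# ###########################################
# WRITE KERAS/TENSORFLOW/SKLEARN MODEL HERE
#
# The Keras steps below expect this section to define:
#   model                     - the Keras model to train
#   train_data, train_labels  - training inputs and targets
#   valid_data, valid_labels  - validation inputs and targets
#   epochs                    - number of training epochs
# ############################################

# For Keras, perform the following steps:
from fitchain import keras as ker
model_id = 'my_model_id'

# Store the training and validation data for this model
ker.store_train_params(model_id, x=train_data, y=train_labels)
ker.store_validate_params(model_id, x=valid_data, y=valid_labels)

# Fit the model on the entire data batch (can be prohibitive with many images)
ker.fit(model_id, model, x_train=train_data, y_train=train_labels, epochs=epochs)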
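The boilerplate expects the model section to define `train_data`, `train_labels`, `valid_data` and `valid_labels`. What `runtime.resolve` returns is not described in this tutorial, so the sketch below assumes, purely for illustration, that the data can be iterated as `(features, label)` pairs. `split_rows` is a hypothetical helper, not part of the fitchain package; it shows one way to produce the four variables with a reproducible shuffle.

```python
import random
from typing import Sequence, Tuple


def split_rows(rows: Sequence[Tuple[Sequence[float], int]],
               valid_fraction: float = 0.2,
               seed: int = 0):
    """Shuffle (features, label) pairs and split them into train/validation sets.

    Returns (train_data, train_labels, valid_data, valid_labels) as plain lists;
    convert them with numpy.asarray before passing them to Keras.
    """
    if not 0.0 < valid_fraction < 1.0:
        raise ValueError("valid_fraction must be between 0 and 1")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # local RNG keeps the split reproducible
    n_valid = max(1, int(len(shuffled) * valid_fraction))
    valid, train = shuffled[:n_valid], shuffled[n_valid:]
    train_data = [list(x) for x, _ in train]
    train_labels = [y for _, y in train]
    valid_data = [list(x) for x, _ in valid]
    valid_labels = [y for _, y in valid]
    return train_data, train_labels, valid_data, valid_labels
```

Inside the boilerplate this would be called as `train_data, train_labels, valid_data, valid_labels = split_rows(data)`, followed by the definition of `model` and `epochs`.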

After inspecting the synthetic data and writing the machine learning model, it is time to save it to the current workspace. This can be done with

$ fitchain workspace save my_model.py

The model provider can then ask the data provider to start training the model with

$ fitchain workspace run
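Saving and running are usually done back to back. The sketch below chains the two commands shown above, run from inside the workspace folder. `submit_model` is a hypothetical helper, and the check that the model file exists locally is an added safeguard, not something the CLI is documented to require.

```python
import subprocess
from pathlib import Path
from typing import Callable, List


def submit_model(model_file: str, workspace_dir: str = ".",
                 run: Callable = subprocess.run) -> List[List[str]]:
    """Save a model file to the workspace, then ask the data provider to train it.

    Returns the argument lists in the order they were executed.
    """
    workspace = Path(workspace_dir).expanduser()
    if not (workspace / model_file).is_file():
        raise FileNotFoundError(workspace / model_file)
    commands = [
        ["fitchain", "workspace", "save", model_file],
        ["fitchain", "workspace", "run"],
    ]
    for argv in commands:
        # With subprocess.run, check=True raises if saving fails,
        # so `workspace run` is never reached with an unsaved model.
        run(argv, cwd=workspace, check=True)
    return commands
```

Usage: `submit_model("my_model.py", "~/my_folder")`.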

Once the model has been deployed, the model provider can view the jobs attached to the current workspace with

$ fitchain jobs my_workspace

This returns the job_id of the submitted job.

The model provider can inspect the logs of that job with

$ fitchain logs job_id

and the collected metrics (if available) with

$ fitchain metrics job_id
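To keep a record of a run, the logs and metrics of a job can be collected in one place. The sketch below returns the raw text of `fitchain logs` and `fitchain metrics` unparsed, since their output format is not described here. `job_report` is a hypothetical helper; because metrics may not be available yet, a failing command is recorded (its stderr) rather than raised.

```python
import subprocess
from typing import Callable, Dict


def job_report(job_id: str, run: Callable = subprocess.run) -> Dict[str, str]:
    """Collect the raw output of `fitchain logs` and `fitchain metrics` for a job."""
    report = {}
    for command in ("logs", "metrics"):
        result = run(["fitchain", command, job_id],
                     capture_output=True, text=True, check=False)
        # Keep stdout on success; keep stderr when the command fails
        # (for example, when no metrics have been collected yet).
        report[command] = result.stdout if result.returncode == 0 else result.stderr
    return report
```

Usage: `print(job_report("job_id")["logs"])`.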

Happy modeling!