# FloydHub SDK Demo

This notebook shows how to use the Floyd SDK to automate your FloydHub workflow. Every operation you perform with the CLI can also be performed programmatically through the Python SDK. In fact, the CLI itself uses the SDK to communicate with the FloydHub server. The SDK is installed with pip as part of the `floyd-cli` package.

The best way to run this notebook is to create a new directory and copy the notebook into it. Then populate that directory with some files; the first cell creates two small test files for this purpose.

In [1]:
# Install sdk
!pip install -q floyd-cli

# Create some files for testing purposes
!echo "hello" > ./hello.txt
!echo "print (\"Hello world\")" > ./hello_world.py

# Authentication

The first step is to authenticate yourself against the FloydHub server. You can exchange your username and password for an access token. In the future, this will be replaced by an API key that you can create on the FloydHub website.

The token is saved by `AuthConfigManager` and used automatically in subsequent SDK calls. It is stored at `~/.floydconfig`.

In [2]:
from floyd.client.auth import AuthClient
from floyd.log import configure_logger
from floyd.model.access_token import AccessToken
from floyd.model.credentials import Credentials
from floyd.manager.auth_config import AuthConfigManager

# Initialize logger
configure_logger(verbose=False)

# Login using credentials (replace with your credentials)
login_credentials = Credentials(username="username", password="password123")
access_code = AuthClient().login(login_credentials)
user = AuthClient().get_user(access_code)
access_token = AccessToken(username=user.username,
                           token=access_code)

# Auth token is stored and automatically used in subsequent sdk calls
AuthConfigManager.set_access_token(access_token)
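
Hard-coding a password in a notebook makes it easy to leak by accident. As a minimal sketch, the credentials could be read from environment variables instead. `FLOYD_USERNAME`, `FLOYD_PASSWORD` and `credentials_from_env` are hypothetical names chosen for this example; they are not part of the SDK.

```python
import os


def credentials_from_env(environ=None):
    """Return (username, password) read from environment variables.

    FLOYD_USERNAME and FLOYD_PASSWORD are hypothetical variable names
    chosen for this sketch; the SDK does not read them itself.
    """
    environ = os.environ if environ is None else environ
    required = ("FLOYD_USERNAME", "FLOYD_PASSWORD")
    missing = [key for key in required if not environ.get(key)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return environ["FLOYD_USERNAME"], environ["FLOYD_PASSWORD"]


# Example with an explicit mapping, so the sketch runs without real credentials
username, password = credentials_from_env(
    {"FLOYD_USERNAME": "username", "FLOYD_PASSWORD": "password123"}
)
print(username)
```

The returned values can then be passed to `Credentials(username=username, password=password)` in place of the literals used in the cell above.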

# Data

FloydHub manages data separately from code. First create a dataset on the [website](https://www.floydhub.com/datasets/create). Then use that dataset's name in the cell below to upload the contents of the current directory to FloydHub as a dataset. You will mount this data into a job later.

In [3]:
from floyd.client.data import DataClient
from floyd.client.dataset import DatasetClient
from floyd.manager.auth_config import AuthConfigManager
from floyd.manager.data_config import DataConfig
from floyd.cli.data_upload_utils import initialize_new_upload, complete_upload
from floyd.cli.utils import get_namespace_from_name

# Get access token from the stored config file
# Or re-authenticate from the previous step
access_token = AuthConfigManager.get_access_token()

# Replace with your dataset name
dataset_name = "floydlabs/test11"
dataset = DatasetClient().get_by_name(dataset_name)

namespace, name = get_namespace_from_name(dataset_name)
data_config = DataConfig(name=name,
                         namespace=namespace,
                         family_id=dataset.id)

# This is the actual upload step
initialize_new_upload(data_config, access_token, "new upload")
complete_upload(data_config)

Compressing data...
Making create request to server...
Initializing upload...
Uploading compressed data. Total upload size: 4.5KiB
Removing compressed data...
Upload finished.
Waiting for server to unpack data.
You can exit at any time and come back to check the status with:
	floyd data upload -r


Waiting for unpack....



NAME
---------------------------
floydlabs/datasets/test11/6


In [4]:
from floyd.manager.data_config import DataConfigManager
from floyd.cli.utils import normalize_data_name

# Get the uploaded data name
data_config = DataConfigManager.get_config()
data_name = normalize_data_name(data_config.data_name)
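
The name printed after the upload, `floydlabs/datasets/test11/6`, appears to follow the pattern `<namespace>/datasets/<name>/<version>`. Assuming that pattern holds, a small helper can split such a name into its parts, for example to log which dataset version a job used. `split_data_name` is a hypothetical helper written for this notebook, not an SDK function.

```python
def split_data_name(full_name):
    """Split '<namespace>/datasets/<name>/<version>' into its three parts.

    The pattern is inferred from the upload output above and is an assumption.
    """
    parts = full_name.strip("/").split("/")
    if len(parts) != 4 or parts[1] != "datasets":
        raise ValueError("Unexpected data name: {!r}".format(full_name))
    namespace, _, name, version = parts
    return namespace, name, int(version)


print(split_data_name("floydlabs/datasets/test11/6"))
# ('floydlabs', 'test11', 6)
```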

# Job

You can kick off a training job, monitor it, and download its output, all through the SDK. The next cell shows how to run a job under a specific project. Create the project on the FloydHub [website](https://www.floydhub.com/projects/create) and use its name in the next cell.

In [6]:
from floyd.client.project import ProjectClient
from floyd.manager.experiment_config import ExperimentConfigManager
from floyd.manager.floyd_ignore import FloydIgnoreManager
from floyd.model.experiment_config import ExperimentConfig
from floyd.cli.utils import get_namespace_from_name

# Replace with your project name
project_name = "floydlabs/private-proj"
project = ProjectClient().get_by_name(project_name)

namespace, name = get_namespace_from_name(project_name)
experiment_config = ExperimentConfig(name=name,
                                     namespace=namespace,
                                     family_id=project.id)
ExperimentConfigManager.set_config(experiment_config)
FloydIgnoreManager.init()

# Mounting Data

You can mount any data on FloydHub that you have access to into your job at a path you specify. Here we mount the dataset uploaded above at `/training`. You also need to specify the FloydHub instance type and the [environment](https://docs.floydhub.com/guides/environments/) you want to use.

Running a job is currently a two-step process: you first upload the code by creating a module, and then run the experiment (or job) that uses it.

In [7]:
from floyd.client.experiment import ExperimentClient
from floyd.client.module import ModuleClient
from floyd.constants import INSTANCE_ARCH_MAP
from floyd.model.experiment import ExperimentRequest
from floyd.model.module import Module

# Run a job
# Get the data mount id (data_name comes from the previous step)
data_obj = DataClient().get(normalize_data_name(data_name))
data_ids = ["{}:{}".format(data_obj.id, "/training")]

# Define the data mount point for data
module_inputs = {
    "name": "/training",
    "type": "dir" # Always use dir here
}
    
# First create a module, then use it in the experiment create step
experiment_name = project_name
instance_type = "c1"  # c1 = cpu, c2 = cpu2, g1 = gpu, g2 = gpu2
project_id = project.id

# Get env value
arch = INSTANCE_ARCH_MAP[instance_type]
env = "tensorflow-1.5"  # Choose env that you need

module = Module(name=experiment_name,
                description='foo',
                command="ls /training",
                mode='command',
                family_id=project_id,
                inputs=module_inputs,
                env=env,
                arch=arch)

module_id = ModuleClient().create(module)
    
experiment_request = ExperimentRequest(name=experiment_name,
                                       description='foo',
                                       full_command='ls /training',
                                       module_id=module_id,
                                       env=env,
                                       data_ids=data_ids,
                                       family_id=project_id,
                                       instance_type=instance_type)
expt_info = ExperimentClient().create(experiment_request)

Creating project run. Total upload size: 26.1KiB
Syncing code ...
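
The `Module` and `ExperimentRequest` calls above repeat the same name, description, command and environment. One way to keep them in sync is to collect the shared values once and derive both sets of keyword arguments from them. This is only a sketch: `JobSpec`, `module_kwargs` and `experiment_kwargs` are hypothetical helpers, the keyword names are copied from the cell above, and the IDs in the example are placeholders.

```python
from collections import namedtuple

# Values shared by the module and the experiment request
JobSpec = namedtuple(
    "JobSpec",
    ["name", "description", "command", "env", "instance_type", "family_id"],
)


def module_kwargs(spec, inputs, arch):
    """Keyword arguments for Module(...), mirroring the cell above."""
    return {
        "name": spec.name,
        "description": spec.description,
        "command": spec.command,
        "mode": "command",
        "family_id": spec.family_id,
        "inputs": inputs,
        "env": spec.env,
        "arch": arch,
    }


def experiment_kwargs(spec, module_id, data_ids):
    """Keyword arguments for ExperimentRequest(...), mirroring the cell above."""
    return {
        "name": spec.name,
        "description": spec.description,
        "full_command": spec.command,
        "module_id": module_id,
        "env": spec.env,
        "data_ids": data_ids,
        "family_id": spec.family_id,
        "instance_type": spec.instance_type,
    }


# Placeholder IDs; real values come from the project, data and module objects
spec = JobSpec(
    name="floydlabs/private-proj",
    description="foo",
    command="ls /training",
    env="tensorflow-1.5",
    instance_type="c1",
    family_id="PROJECT_ID",
)
mod = module_kwargs(spec, inputs={"name": "/training", "type": "dir"}, arch="ARCH")
exp = experiment_kwargs(spec, module_id="MODULE_ID", data_ids=["DATA_ID:/training"])
print(mod["command"] == exp["full_command"])  # True
```

With the SDK imported, the two dictionaries could be passed on as `Module(**mod)` and `ExperimentRequest(**exp)`.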


# Tracking an experiment

You can poll an experiment periodically and wait for it to finish. Alternatively, you can set up a [notification webhook](https://docs.floydhub.com/guides/notifications/) to be notified when jobs finish. You can also programmatically download the output of your training job.

In [9]:
from floyd.client.experiment import ExperimentClient
from floyd.client.resource import ResourceClient

# Track experiment
job_id = expt_info['id']
experiment = ExperimentClient().get(job_id)
print(experiment.state)

# Stop running job (works only if the job is queued or running)
# ExperimentClient().stop(job_id)

# Get logs
log_resource_id = experiment.instance_log_id
logs = ResourceClient().get_content(log_resource_id)
print(logs)

# Download the job output as a tar archive and extract it
output_id = experiment.output_id
data_url = "https://www.floydhub.com/api/v1/resources/{}?content=true&download=true".format(output_id)
DataClient().download_tar(url=data_url,
                          untar=True,
                          delete_after_untar=True)

success
2018-04-18 23:17:03,004 INFO - Preparing to run TaskInstance <TaskInstance: floydlabs/projects/private-proj/62 (id: NcZ99H8BGmuaocPQgmSD7f)
2018-04-18 23:17:03,015 INFO - Starting attempt 1 at 2018-04-19 06:17:03.008571
2018-04-18 23:17:03,028 INFO - Downloading and setting up data sources
2018-04-18 23:17:03,036 INFO - Downloading and mounting dataset. ETA: 2 seconds
2018-04-18 23:17:03,270 INFO - Pulling Docker image: floydhub/tensorflow:1.5.0-py3_aws.22
2018-04-18 23:17:04,423 INFO - Starting container...
2018-04-18 23:17:04,642 INFO - 
################################################################################

2018-04-18 23:17:04,642 INFO - Run Output:
2018-04-18 23:17:05,827 INFO - Starting supervisor: supervisord.
2018-04-18 23:17:05,829 INFO - floydhub_sdk_demo.ipynb
2018-04-18 23:17:05,830 INFO - hello.txt
2018-04-18 23:17:05,830 INFO - hello_world.py
2018-04-18 23:17:05,877 INFO - 
################################################################################



'output.tar'
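
To wait for a job to finish rather than checking its state once, the state can be polled in a loop. The sketch below is independent of the SDK: `wait_for_state` is a hypothetical helper that accepts any function returning the current state. `success` is the state printed above; `failed` is an assumed name for an unsuccessful terminal state and should be checked against what the server actually reports.

```python
import time


def wait_for_state(get_state, terminal_states, interval=30.0, timeout=3600.0,
                   sleep=time.sleep):
    """Call get_state() until it returns a terminal state or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        state = get_state()
        if state in terminal_states:
            return state
        # Give up if the next poll would fall after the deadline
        if time.monotonic() + interval > deadline:
            raise TimeoutError(
                "Job still in state {!r} after {} seconds".format(state, timeout)
            )
        sleep(interval)


# Stand-in for the server: reports "running" twice, then "success"
fake_states = iter(["running", "running", "success"])
final_state = wait_for_state(
    lambda: next(fake_states),
    terminal_states={"success", "failed"},  # "failed" is an assumed state name
    interval=0.01,
    timeout=5.0,
)
print(final_state)  # success
```

For a real job, `get_state` could be `lambda: ExperimentClient().get(job_id).state`, with an interval of 30 seconds or more to avoid putting unnecessary load on the server.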

# Support

The SDK is in beta. If you have questions or are interested in adopting it for your workflow, please contact us at support@floydhub.com. We are happy to work with you on automating your training.