# Resources

http://cloudacademy.com/blog/google-prediction-api/
http://cloudacademy.com/blog/google-vision-api-image-analysis/
https://www.youtube.com/watch?v=O3mfuc-syTI

Video from IO:
https://cloud.google.com/prediction/docs

# Steps to first test

These steps follow the first cloudacademy link above, which is a very good reference.

* Downloaded the data from https://archive.ics.uci.edu/ml/datasets/Human+Activity+Recognition+Using+Smartphones to the git_repos folder.

* Installed the Google API client for Python:
pip install --upgrade google-api-python-client

It looks like the free version of the Prediction API allows up to 5MB of training data per day, so I need to split the files:
* Used split to cut the file (60MB) into files of size 1MB.
split -b 1m X_train.txt

This creates lots of 1MB files named xaa, xab, etc.
Next, count the number of lines in one 1MB file (x_train1.txt here): wc -l x_train1.txt   The result is 116 lines.
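
The same count can be had without creating the chunk files at all. A minimal sketch, assuming the goal is simply "how many whole lines fit in a given byte budget"; the function name `count_lines_within` is made up for illustration:

```python
def count_lines_within(path, max_bytes):
    """Count the complete lines of `path` that fit in the first `max_bytes` bytes."""
    used = 0
    count = 0
    with open(path, "rb") as f:  # binary mode so sizes are measured in bytes
        for line in f:
            if used + len(line) > max_bytes:
                break  # this line would cross the budget, so stop here
            used += len(line)
            count += 1
    return count
```

For example, `count_lines_within('X_train.txt', 1024 * 1024)` would give the number of whole records in the first 1MB of the training file.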

Ditch the new file and just take the first 116 lines of both training files using head. A smaller sample file is all that is needed for now. Ideally the files should be shuffled first.

head -116 X_train.txt > x_train1.txt
head -116 y_train.txt > y_train1.txt
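
The shuffle mentioned above was not done here. A rough sketch of how it could look, assuming both files are small enough to read into memory and that line i of the X file belongs to line i of the y file; `sample_paired_lines` is a hypothetical helper, not part of the cloudacademy script:

```python
import random

def sample_paired_lines(x_lines, y_lines, n, seed=0):
    """Shuffle feature and label lines together and return the first n of each."""
    if len(x_lines) != len(y_lines):
        raise ValueError("feature and label files must have the same number of lines")
    pairs = list(zip(x_lines, y_lines))   # keep each X line attached to its label
    random.Random(seed).shuffle(pairs)    # seeded so the sample is reproducible
    sample = pairs[:n]
    return [x for x, _ in sample], [y for _, y in sample]
```

Usage would be along the lines of reading `X_train.txt` and `y_train.txt` with `readlines()`, calling `sample_paired_lines(x, y, 116)`, and writing the two returned lists to `x_train1.txt` and `y_train1.txt` with `writelines()`.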

So the training files are done. Now get the test files, using roughly 1/4 of that record count.
head -30 X_test.txt > x_test1.txt
head -30 y_test.txt > y_test1.txt

Copy these files into our repo.

Ran the script from cloudacademy, which concatenates the data into one file and adds the y data as the first column.

In [1]:
import re

output_file = 'dataset.csv'
#maps each feature (X) file to its label (y) file
input_files = {
    'human_activities/x_train1.txt': 'human_activities/y_train1.txt', 
    'human_activities/x_test1.txt': 'human_activities/y_test1.txt'
}

def getOutputLines(filenames):
    for X,y in filenames.items(): #Python 3: items() replaces iteritems()
        with open(X) as Xf, open(y) as yf:
            for Xline, yline in zip(Xf, yf): #Python 3: built-in zip is lazy and replaces izip
                Xline = re.sub(' +', ' ', Xline).strip() #remove multiple white spaces and strip
                yield ','.join([yline.strip()] + Xline.split(' ')) + "\n" #concat in csv format


with open(output_file, 'w+') as f:
    for newline in getOutputLines(input_files):
        f.write(newline) #each item is one complete line

Next, followed the instructions on the cloudacademy blog to create a project and a service account.
All worked fine and the json file was downloaded.
Check the git repo for the details of the files.
* new project created
* prediction api activated for project
* new service account created for project
* json credential file for the service account downloaded locally 
* had to amend part of the code:
    get_api referenced "credentials = client.SignedJwtAssertionCredentials(email, key, scope=scope)",
    which had to be replaced with:
        from oauth2client.service_account import ServiceAccountCredentials
        ....
        credentials = ServiceAccountCredentials.from_json_keyfile_name('Human Interact-672e0199b3c7.json', scope)
* after that it worked, although I ran it from the command line.
* note that the prediction is based on a single record in a record.csv file.
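
A possible way to produce that record.csv from one line of a test file. This is only a sketch and assumes record.csv should hold the feature values of a single observation, comma separated, with no label column; both function names are made up for illustration:

```python
def features_to_csv_record(feature_line):
    """Turn one whitespace-separated feature line into a comma-separated record."""
    return ",".join(feature_line.split())  # split() also drops repeated spaces and the newline

def write_record(source_path, record_path="record.csv", line_number=0):
    """Write line `line_number` (0-based) of `source_path` to `record_path` as csv."""
    with open(source_path) as src:
        for i, line in enumerate(src):
            if i == line_number:
                with open(record_path, "w") as out:
                    out.write(features_to_csv_record(line) + "\n")
                return
    raise IndexError("line %d not found in %s" % (line_number, source_path))
```

For example, `write_record('human_activities/x_test1.txt')` would turn the first test observation into record.csv.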

In [22]:
import httplib2, argparse, os, sys, json
from oauth2client import tools, file, client
from oauth2client.service_account import ServiceAccountCredentials
from googleapiclient import discovery
from googleapiclient.errors import HttpError

#Project and model configuration
project_id = 'human-interact'
#model_id is something we decide. 
model_id = 'HAR-model'

#activity labels
labels = {
	'1': 'walking', '2': 'walking upstairs', 
	'3': 'walking downstairs', '4': 'sitting', 
	'5': 'standing', '6': 'laying'
}


def main():
	""" Simple logic: train and make prediction.
	This caters for the async nature of the call/process.
	On the very first run train_model is called and training starts asynchronously.
	Once the model has been created, running this code again lets
	make_prediction use the existing model.
	"""
	try:
		make_prediction()
	except HttpError as e: 
		if e.resp.status == 404: #model does not exist
			print("Model does not exist yet.")
			train_model()
			make_prediction()
		else: #real error
			print(e)

def make_prediction():
	""" Use trained model to generate a new prediction """

	api = get_prediction_api()
	
	print("Fetching model.")

	model = api.trainedmodels().get(project=project_id, id=model_id).execute()

	if model.get('trainingStatus') != 'DONE':
		print("Model is (still) training. \nPlease wait and run me again!") #no polling
		sys.exit()

	print("Model is ready.")
	
	"""
	#Optionally analyze model stats (big json!)
	analysis = api.trainedmodels().analyze(project=project_id, id=model_id).execute()
	print(analysis)
	exit()
	"""

	#read new record from local file
	with open('record.csv') as f:
		record = f.readline().strip().split(',') #csv; strip the trailing newline first

	#obtain new prediction
	prediction = api.trainedmodels().predict(project=project_id, id=model_id, body={
		'input': {
			'csvInstance': record
		},
	}).execute()

	#retrieve classified label and reliability measures for each class
	label = prediction.get('outputLabel')
	stats = prediction.get('outputMulti')

	#show results
	print("You are currently %s (class %s)." % (labels[label], label) ) 
	print(stats)
            
            
def train_model():
	""" Create new classification model """

	api = get_prediction_api()

	print("Creating new Model.")

	api.trainedmodels().insert(project=project_id, body={
		'id': model_id,
		'storageDataLocation': 'human-interact-dataset/dataset.csv',
		'modelType': 'CLASSIFICATION'
	}).execute()

def get_prediction_api(service_account=True):
    scope = [
        'https://www.googleapis.com/auth/prediction',
        'https://www.googleapis.com/auth/devstorage.read_only'
    ]
    return get_api('prediction', scope, service_account)

def get_api(api, scope, service_account=True):
	""" Build API client based on oAuth2 authentication """
	STORAGE = file.Storage('oAuth2.json') #local storage of oAuth tokens
	credentials = STORAGE.get()
	if credentials is None or credentials.invalid: #check if new oAuth flow is needed
		if service_account: #server 2 server flow
			#the json key file already holds the client email and private key,
			#so it is passed straight to the credentials helper
			credentials = ServiceAccountCredentials.from_json_keyfile_name('Human Interact-672e0199b3c7.json', scope)
			STORAGE.put(credentials)
		else: #normal oAuth2 flow
			CLIENT_SECRETS = os.path.join(os.path.dirname(__file__), 'client_secrets.json')
			FLOW = client.flow_from_clientsecrets(CLIENT_SECRETS, scope=scope)
			PARSER = argparse.ArgumentParser(description=__doc__, formatter_class=argparse.RawDescriptionHelpFormatter, parents=[tools.argparser])
			FLAGS = PARSER.parse_args(sys.argv[1:])
			credentials = tools.run_flow(FLOW, STORAGE, FLAGS)
		
	#wrap http with credentials
	http = credentials.authorize(httplib2.Http())
	return discovery.build(api, "v1.6", http=http)

if __name__ == '__main__':
	main()
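
The script above does no polling and simply asks to be run again. A minimal polling sketch, assuming the only status of interest is the 'DONE' value checked in make_prediction; failure statuses are not handled, and `wait_for_training` and its parameters are hypothetical names. The status lookup is passed in as a function so the loop can be tried without touching the API:

```python
import time

def wait_for_training(fetch_status, interval=30, timeout=1800, sleep=time.sleep):
    """Call fetch_status() every `interval` seconds until it returns 'DONE'.

    Raises TimeoutError once roughly `timeout` seconds have been spent waiting.
    """
    waited = 0
    while True:
        status = fetch_status()
        if status == 'DONE':
            return status
        if waited >= timeout:
            raise TimeoutError("model still not trained, last status: %r" % (status,))
        sleep(interval)
        waited += interval
```

With the client from the script, fetch_status could be something like `lambda: api.trainedmodels().get(project=project_id, id=model_id).execute().get('trainingStatus')`.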


## Observations and comments

* Simple example using data uploaded to a cloud storage bucket, which the Prediction API can access easily.
Training runs async, so local code needs to poll the API (or be run again, as in the script above) until the model is ready.

* The model HAR-model is created by the training step; any prediction requests after that are based on this model and should be faster. To update the model you need to call the update API.
The Python API methods are listed here:
https://developers.google.com/resources/api-libraries/documentation/prediction/v1.6/python/latest/prediction_v1.6.trainedmodels.html

* analyze - the article makes special mention of this API because it returns a confusion matrix, which is quite useful.

* When submitting the training data, the y values are prepended to the X matrix as column 1, and the test and train data are consolidated into one file.

* Google provides some performance enhancement suggestions, including the minimum number of features and amount of data that should be provided.
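
On the confusion matrix from the analyze bullet above: one simple thing to do with it is compute, per activity, the share of records that were classified correctly. The sketch below assumes the matrix has already been pulled out of the analyze response as a nested mapping of actual label to predicted label to count; where it sits in the real response and how the counts are typed should be checked against the API reference. `per_class_recall` is a made-up name:

```python
def per_class_recall(confusion):
    """confusion: {actual_label: {predicted_label: count}}; counts may be numbers or strings."""
    recall = {}
    for actual, predicted_counts in confusion.items():
        total = sum(float(c) for c in predicted_counts.values())
        correct = float(predicted_counts.get(actual, 0))
        # None marks a class with no records, avoiding a division by zero
        recall[actual] = correct / total if total else None
    return recall
```

Classes with a low value are the ones the model confuses most often, which ties in with the suggestions about providing enough data per class.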