# Loading and using a prediction function

Run this notebook in a separate conda environment, `environment2`, rather than the one used for the first notebook. This environment must not have scikit-learn installed before you run the cells below.

In [None]:
import cPickle as pickle
import pip
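
Before going further, it can help to check whether the environment really lacks scikit-learn. The following is a minimal sketch, not part of the original workflow: the helper name `is_importable` is hypothetical, and it simply attempts the import (scikit-learn is imported under the module name `sklearn`).

```python
import importlib


def is_importable(module_name):
    """Return True if `module_name` can be imported in this environment."""
    try:
        importlib.import_module(module_name)
    except ImportError:
        return False
    return True


# In a correctly prepared environment this is expected to report False.
print("sklearn importable: {}".format(is_importable("sklearn")))
```

If this reports that `sklearn` is importable, the notebook is probably running in the wrong environment.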

## Try loading prediction function without necessary packages

Without installing the necessary packages, we won't be able to use the prediction function. Confirm that this is true by attempting to load in the prediction function now. If you *don't* run into any problems here, go back and ensure that you're in an environment that doesn't have scikit-learn (or some other required package) installed.

In [None]:
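def load_prediction_func(location):
    """Read a pickled prediction function from `location` and deserialize it."""
    # Open in binary mode: pickle data is bytes, not text
    with open(location, 'rb') as serialized_func_file:
        serialized_func = serialized_func_file.read()

    # Unpickling imports the modules the function refers to, so this step
    # fails if a required package is missing
    return pickle.loads(serialized_func)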

In [None]:
# This should fail
load_prediction_func('serialized_function.txt')
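
The failure comes from how pickle works: objects are generally stored as references to the modules and classes that define them, and those modules are imported at load time. The sketch below illustrates this with a hand-written pickle payload; the module name `missing_package_for_demo` and the class name `SomeModel` are made up for the demonstration and have nothing to do with the real serialized function.

```python
import pickle

# A protocol-0 pickle that refers to `SomeModel` in a module that is not
# installed. Both names are hypothetical.
payload = b"cmissing_package_for_demo\nSomeModel\n."


def try_load(data):
    """Return the unpickled object, or the ImportError raised while loading."""
    try:
        return pickle.loads(data)
    except ImportError as error:
        return error


result = try_load(payload)
print(type(result).__name__, result)
```

The same kind of import error is what the cell above should produce while scikit-learn is absent.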

## Install required packages

Identify the packages listed in the serialized dependencies file, `dependencies.txt`.

In [None]:
with open('dependencies.txt', 'r') as dependencies_file:
    dependencies_list = dependencies_file.readlines()

# Strip whitespace and skip blank lines so no empty requirement is passed to pip
required_packages = [line.strip() for line in dependencies_list if line.strip()]

Identify packages already installed and confirm that scikit-learn is not among them.

In [None]:
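import pkg_resources  # part of setuptools

# pip.get_installed_distributions() was removed in pip 10; working_set provides the installed distributions
installed_packages = list(pkg_resources.working_set)
installed_packages_list = ["{name}=={version}".format(name=m.key, version=m.version) for m in installed_packages]
print("Scikit-learn exists in installed_packages: {}".format('scikit-learn==0.18' in installed_packages_list))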

Identify the packages that still need to be installed and confirm that scikit-learn is among them.

In [None]:
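needed_packages = set(required_packages) - set(installed_packages_list)
print("Scikit-learn exists in needed_packages: {}".format('scikit-learn==0.18' in needed_packages))

Install all needed packages.

In [None]:
import subprocess
import sys

# pip.main() is not a supported API; run pip with the current interpreter instead
for package in needed_packages:
    subprocess.check_call([sys.executable, '-m', 'pip', 'install', package])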
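
The comparison that decides which packages to install matches exact `name==version` strings, so a package installed at a different version still counts as needed. The sketch below shows that behaviour on toy data; the helpers `parse_requirement` and `find_needed` and the sample package lists are hypothetical and do not reflect the real contents of `dependencies.txt`.

```python
def parse_requirement(line):
    """Split a 'name==version' line into a (lowercased name, version) pair."""
    name, _, version = line.strip().partition("==")
    return name.lower(), version


def find_needed(required_lines, installed_lines):
    """Return the required entries with no exact name and version match installed."""
    installed = {parse_requirement(line) for line in installed_lines}
    return sorted(
        line.strip()
        for line in required_lines
        if line.strip() and parse_requirement(line) not in installed
    )


# Hypothetical inputs for illustration only
required = ["scikit-learn==0.18\n", "numpy==1.11.2\n"]
installed = ["numpy==1.11.2", "pip==9.0.1"]
needed = find_needed(required, installed)
print(needed)
```

Lowercasing the name is one small normalization step; a fuller comparison would also have to treat hyphens and underscores in package names as equivalent.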

## Load serialized predict function

With the required packages installed, loading the serialized function should now succeed.

In [None]:
prediction_func = load_prediction_func('serialized_function.txt')

## Run the prediction function on some data

Load the test data.

In [None]:
import os

import pandas as pd

# Taken from https://gist.github.com/dcrankshaw/f851ea2fee582f544288d36ae97ef86d
def load_digits(digits_location, digits_filename):
    # os.path.join also handles an empty digits_location (current directory)
    digits_path = os.path.join(digits_location, digits_filename)
    print("Source file: {}".format(digits_path))
    df = pd.read_csv(digits_path, sep=",", header=None)
    data = df.values
    print("Number of image files: {}".format(len(data)))
    y = data[:, 0]   # first column holds the label
    X = data[:, 1:]  # remaining columns hold the features
    return (X, y)

digits_location = "" # Set this to path of the folder enclosing the .data files
test_data_fname = "test-mnist-dense-with-labels.data"
test_x, test_y = load_digits(digits_location, test_data_fname)

In [None]:
import numpy as np

def get_prediction_func_score(prediction_func, test_x, test_y):
    """Return the fraction of rows in test_x whose prediction equals the label in test_y."""
    predictions = np.asarray(prediction_func(test_x))
    return float(np.mean(predictions == test_y))

Compute the predictions and report the accuracy. Check that the prediction function normalizes the data itself, since `test_x` is passed in exactly as loaded from the file.

In [None]:
accuracy = get_prediction_func_score(prediction_func, test_x, test_y)
print("Reconstructed prediction function has a {}% accuracy".format(accuracy * 100))
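
If the reported accuracy looks wrong, one way to rule out the scoring step is to run the same kind of calculation on toy data where the answer can be worked out by hand. This is a sketch under stated assumptions: `accuracy`, `toy_prediction_func`, and the toy arrays are hypothetical stand-ins, not the reconstructed prediction function or the MNIST test set.

```python
import numpy as np


def accuracy(prediction_func, inputs, labels):
    """Fraction of inputs for which prediction_func returns the matching label."""
    predictions = np.asarray(prediction_func(inputs))
    return float(np.mean(predictions == np.asarray(labels)))


def toy_prediction_func(inputs):
    """Hypothetical stand-in model: predict 1 when the first feature exceeds 0.5."""
    return (inputs[:, 0] > 0.5).astype(int)


toy_x = np.array([[0.9, 0.1], [0.2, 0.3], [0.7, 0.8], [0.1, 0.4]])
toy_y = np.array([1, 0, 0, 0])

# Predictions are [1, 0, 1, 0], so three of the four labels match.
toy_accuracy = accuracy(toy_prediction_func, toy_x, toy_y)
print("Toy accuracy: {}%".format(toy_accuracy * 100))
```

If the toy score comes out as expected, a low score on the real data more likely points to the inputs, for example data that is normalized twice or not at all.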