# Overview

## Functional Overview

### Flow 1 :: Build an ML Model
1. Retrieve data for a provided category (drawing)
2. Preprocess the data
3. Train an ML model to identify (or classify) the drawing
4. Evaluate the model (testing)
5. Use this model to predict (or classify) the pictures submitted by users

### Flow 2 :: Retrain (transfer) a model
1. User requests a new type of category
2. Retrieve data for that category
3. Uplift or retrain the **existing** model to identify the new category
4. Evaluate the model
5. Use the new model to predict drawings submitted by users

### Stretch goal :: Deploy a model on mobile

# Showback
1. Number of containers spun up in the background?
2. 

# Basic Setup

## Install `kubeflow` components

In [None]:
# Install Jupyter autocompletion
!pip install jupyter_contrib_nbextensions
!jupyter contrib nbextension install --user
from jedi import settings
settings.case_insensitive_completion = True

USER_FLAG = "--user"
# Install ai platform and kfp
!pip3 install {USER_FLAG} google-cloud-aiplatform==1.3.0 --upgrade
!pip3 install {USER_FLAG} kfp --upgrade
!pip install google_cloud_pipeline_components

## Import kubeflow and Google AI libraries

In [None]:
from typing import NamedTuple
from kfp.v2 import dsl
from kfp.v2.dsl import (Artifact,
                        Dataset,
                        Input,
                        Model,
                        Output,
                        Metrics,
                        ClassificationMetrics,
                        component, 
                        OutputPath, 
                        InputPath)

from kfp.v2 import compiler
from google.cloud import bigquery
from google.cloud import aiplatform
from google.cloud.aiplatform import pipeline_jobs
from google_cloud_pipeline_components import aiplatform as gcc_aip

## Enable APIs

In [None]:
!gcloud services enable compute.googleapis.com         \
                       containerregistry.googleapis.com  \
                       aiplatform.googleapis.com  \
                       cloudbuild.googleapis.com \
                       cloudfunctions.googleapis.com

## Setup global variables

In [None]:
PATH=%env PATH
%env PATH={PATH}:/home/jupyter/.local/bin
REGION="us-central1"

from datetime import datetime
TIMESTAMP = datetime.now().strftime("%Y%m%d%H%M%S")

# Get project ID
shell_output=!gcloud config get-value project 2> /dev/null
PROJECT_ID=shell_output[0]

# Set bucket name
BUCKET_NAME="gs://" + PROJECT_ID + "-quickdraw-" + TIMESTAMP

BASE_URL_QUICK_DRAW_NUMPY_DS = "gs://quickdraw_dataset/full/numpy_bitmap/"
# Pipeline root inside the bucket (the bucket itself is not created in this cell)
PIPELINE_ROOT = f"{BUCKET_NAME}/pipeline_root_wine/"
PIPELINE_ROOT

USER_FLAG = "--user"
#!gcloud auth login if needed
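
The cell above only composes the bucket name; it never creates the bucket. The sketch below rebuilds the name the same way and runs a simplified sanity check before creation. The helper names `make_bucket_uri` and `is_valid_bucket_name` and the project ID `my-project` are hypothetical, and the check covers only a subset of the Cloud Storage naming rules (lowercase letters, digits, dashes and underscores, 3 to 63 characters, alphanumeric at both ends).

```python
import re


def make_bucket_uri(project_id: str, timestamp: str) -> str:
    """Compose the bucket URI the same way BUCKET_NAME is built above."""
    return "gs://" + project_id + "-quickdraw-" + timestamp


def is_valid_bucket_name(uri: str) -> bool:
    """Simplified check (assumption: names without dots only)."""
    name = uri[len("gs://"):] if uri.startswith("gs://") else uri
    return re.fullmatch(r"[a-z0-9][a-z0-9_-]{1,61}[a-z0-9]", name) is not None


# Hypothetical values, standing in for PROJECT_ID and TIMESTAMP
bucket_uri = make_bucket_uri("my-project", "20240101120000")
pipeline_root = f"{bucket_uri}/pipeline_root_wine/"
print(bucket_uri, is_valid_bucket_name(bucket_uri))

# In the notebook, the bucket could then be created with:
# !gsutil mb -l {REGION} {BUCKET_NAME}
```

Checking the name first is useful because `PROJECT_ID` comes from shell output and may be empty or contain characters that are not allowed in bucket names.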

# Functional Components

## Retrieve the data
The Quick Draw Dataset is a collection of 50 million drawings across [345 categories](https://github.com/googlecreativelab/quickdraw-dataset/blob/master/categories.txt).  The characteristics of the data are explained [here](https://github.com/googlecreativelab/quickdraw-dataset).
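
Category names in that list can contain spaces (for example `alarm clock`), which matters when a name is turned into a label or a file URL. The sketch below shows one way to handle both. `SAMPLE_CATEGORIES` is a hypothetical three-entry excerpt, so its indices are not the real labels, and the `storage.googleapis.com` URL is assumed to be the public HTTPS form of the `gs://quickdraw_dataset` path.

```python
from urllib.parse import quote

# Hypothetical excerpt; the real list has 345 entries and comes from categories.txt
SAMPLE_CATEGORIES = ["airplane", "alarm clock", "ambulance"]

# Assumption: public HTTPS equivalent of gs://quickdraw_dataset/full/numpy_bitmap/
BASE_HTTPS_URL = "https://storage.googleapis.com/quickdraw_dataset/full/numpy_bitmap/"


def label_for(category: str, categories: list) -> int:
    """Use the position of the category in the list as its numeric label."""
    return categories.index(category)  # raises ValueError for unknown names


def url_for(category: str) -> str:
    """Percent-encode the name so that spaces survive in the URL."""
    return BASE_HTTPS_URL + quote(category) + ".npy"


print(label_for("alarm clock", SAMPLE_CATEGORIES))
print(url_for("alarm clock"))
```

Failing with a `ValueError` on an unknown category is a deliberate choice here: it surfaces a typo before any download starts.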

In [None]:
# TODO: Copy over the dataset to local bucket only once

In [None]:
# Category param is the name of the category.  

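@component(packages_to_install=["numpy"])
def get_category_data(
    category: str,
    dataset_train: Output[Dataset],
    dataset_test: Output[Dataset],
    vfold_ratio: float = 0.2,  # assumed default: share of rows held out for testing
):
    import io
    import urllib.parse
    import urllib.request

    import numpy as np

    # Component code cannot see notebook globals, and np.load cannot open gs://
    # or http(s) URLs, so both files are fetched over HTTPS first.
    # Assumption: the public bucket is reachable through storage.googleapis.com.
    categories_url = "https://raw.githubusercontent.com/googlecreativelab/quickdraw-dataset/master/categories.txt"
    data_url = ("https://storage.googleapis.com/quickdraw_dataset/full/numpy_bitmap/"
                + urllib.parse.quote(category) + ".npy")

    # The label is the position of the category in categories.txt
    with urllib.request.urlopen(categories_url) as response:
        all_categories = response.read().decode("utf-8").splitlines()
    selected_category = all_categories.index(category)

    # One row per drawing, 784 values per row
    with urllib.request.urlopen(data_url) as response:
        x = np.load(io.BytesIO(response.read()))
    # max_items_per_class = 4000
    # x = x[0:max_items_per_class, :]
    y = np.full(x.shape[0], selected_category)

    # randomize the dataset
    permutation = np.random.permutation(y.shape[0])
    x = x[permutation, :]
    y = y[permutation]

    # separate into training and testing
    vfold_size = int(x.shape[0] * vfold_ratio)

    x_test = x[0:vfold_size, :]
    y_test = y[0:vfold_size]

    x_train = x[vfold_size:, :]
    y_train = y[vfold_size:]

    # A component hands data on through its output artifacts, not a return value
    with open(dataset_train.path, "wb") as f:
        np.savez(f, x=x_train, y=y_train)
    with open(dataset_test.path, "wb") as f:
        np.savez(f, x=x_test, y=y_test)
    dataset_train.metadata["class_name"] = category
    dataset_test.metadata["class_name"] = category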
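
The shuffle-and-split step of the function above can be tried on synthetic data without downloading anything. This is a minimal sketch: `shuffle_and_split` is a hypothetical helper, the `0.2` ratio is an assumed value, and it uses a seeded `np.random.default_rng` instead of the global `np.random.permutation` so that the result is repeatable. Each row has 784 values, matching the flattened 28x28 bitmaps of the dataset.

```python
import numpy as np


def shuffle_and_split(x, y, vfold_ratio=0.2, seed=0):
    """Shuffle rows and labels together, then hold out the first share for testing."""
    rng = np.random.default_rng(seed)
    permutation = rng.permutation(y.shape[0])
    x, y = x[permutation], y[permutation]
    vfold_size = int(x.shape[0] * vfold_ratio)
    return x[vfold_size:], y[vfold_size:], x[:vfold_size], y[:vfold_size]


# Synthetic stand-in for one category: 10 "drawings" of 784 values each.
# Row i starts with the value i * 784 and carries label i, so alignment can be checked.
x = np.arange(10 * 784).reshape(10, 784)
y = np.arange(10)

x_train, y_train, x_test, y_test = shuffle_and_split(x, y)
print(x_train.shape, x_test.shape)  # (8, 784) (2, 784)

# Rows and labels must still match after shuffling
print(np.array_equal(x_train[:, 0] // 784, y_train))  # True

# A single row can be viewed as an image again
image = x_train[0].reshape(28, 28)
```

Applying the same permutation to `x` and `y` is what keeps each drawing paired with its label; shuffling them separately would silently corrupt the dataset.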