This notebook contains functions relevant to the machine learning case study.

<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Imports" data-toc-modified-id="Imports-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Imports</a></span></li><li><span><a href="#Define-all-functions-within-notebook" data-toc-modified-id="Define-all-functions-within-notebook-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>Define all functions within notebook</a></span></li><li><span><a href="#Define-all-functions" data-toc-modified-id="Define-all-functions-3"><span class="toc-item-num">3&nbsp;&nbsp;</span>Define all functions</a></span><ul class="toc-item"><li><span><a href="#FUNCTION---linear_reg_model_creation" data-toc-modified-id="FUNCTION---linear_reg_model_creation-3.1"><span class="toc-item-num">3.1&nbsp;&nbsp;</span>FUNCTION - linear_reg_model_creation</a></span></li><li><span><a href="#FUNCTION---prediction_using_model" data-toc-modified-id="FUNCTION---prediction_using_model-3.2"><span class="toc-item-num">3.2&nbsp;&nbsp;</span>FUNCTION - prediction_using_model</a></span></li><li><span><a href="#FUNCTION---single_step_create_predict" data-toc-modified-id="FUNCTION---single_step_create_predict-3.3"><span class="toc-item-num">3.3&nbsp;&nbsp;</span>FUNCTION - single_step_create_predict</a></span></li></ul></li><li><span><a href="#Help-with-Markdown-cells" data-toc-modified-id="Help-with-Markdown-cells-4"><span class="toc-item-num">4&nbsp;&nbsp;</span>Help with Markdown cells</a></span></li><li><span><a href="#Remember-you-can-use-latex-equations" data-toc-modified-id="Remember-you-can-use-latex-equations-5"><span class="toc-item-num">5&nbsp;&nbsp;</span>Remember you can use latex equations</a></span></li></ul></div>

# Imports

In [1]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn import preprocessing
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Define all functions within notebook
The website reads this function to make important functional decisions, so be careful!


Some rules:
1. **Must** be a function called **in_out_def()**
2. **Must return a dictionary in the form:**  
        {  
        "func_1_name":  
        {  
        'inputs': {"param_1": "param_type", "param_2": "param_type"},  
        'outputs': {"output_param_1": "output_param_type"}  
        },  
        "func_2_name":  
        {  
        'inputs': {"param_1": "param_type", "param_2": "param_type"},  
        'outputs': {"output_param_1": "output_param_type"}  
        },  
        }  
3. Input parameter types include:
    - "variable" (for passing a variable from a previous step)
    - "text_input" (for text input by user)
    - "file_browse" (for user providing a file)
    - "files_browse" (for user providing multiple files)
    - "float_input" (for user providing a float value)
    - "int_input" (for user providing an integer value)
    - "boolean" (for providing a true or false value)
    - "choice[options]" (for providing specific options to choose from)
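
The rules above can be checked mechanically before the website reads the notebook. The following is a minimal sketch, not part of the website's own tooling: `check_in_out_spec` is a hypothetical helper, and it assumes that a `choice[...]` type only needs some non-empty text between the brackets, since the exact option syntax is not specified here. Output types are not validated because the rules only list input types.

```python
import re

# Input types listed in the rules above; "choice[...]" is matched separately.
ALLOWED_INPUT_TYPES = {
    "variable", "text_input", "file_browse", "files_browse",
    "float_input", "int_input", "boolean",
}
# Assumption: any non-empty text between the brackets counts as valid options.
CHOICE_PATTERN = re.compile(r"choice\[.+\]")


def check_in_out_spec(spec):
    """Return a list of problems found in a dictionary shaped like in_out_def()'s result."""
    if not isinstance(spec, dict):
        return ["the spec must be a dictionary"]
    problems = []
    for func_name, entry in spec.items():
        if not isinstance(entry, dict) or set(entry) != {"inputs", "outputs"}:
            problems.append(f"{func_name}: expected exactly the keys 'inputs' and 'outputs'")
            continue
        if not isinstance(entry["inputs"], dict) or not isinstance(entry["outputs"], dict):
            problems.append(f"{func_name}: 'inputs' and 'outputs' must be dictionaries")
            continue
        for param, param_type in entry["inputs"].items():
            valid = isinstance(param_type, str) and (
                param_type in ALLOWED_INPUT_TYPES
                or CHOICE_PATTERN.fullmatch(param_type) is not None
            )
            if not valid:
                problems.append(f"{func_name}: input '{param}' has unknown type {param_type!r}")
    return problems


# A well-formed entry and a malformed one (both made up for illustration).
good = {"func_1_name": {"inputs": {"param_1": "variable", "param_2": "choice[a,b]"},
                        "outputs": {"output_param_1": "variable"}}}
bad = {"func_2_name": {"inputs": {"param_1": "dropdown"}, "outputs": {}}}

good_problems = check_in_out_spec(good)
bad_problems = check_in_out_spec(bad)
print(good_problems)
print(bad_problems)
```

Calling `check_in_out_spec(in_out_def())` in a scratch cell is one way to catch a misspelled type before the notebook is uploaded.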

In [2]:
def in_out_def():
    return {
        'linear_reg_model_creation': {'inputs': {"df": "variable", "output_column": "text_input",
                                     "columns_not_required": "text_input"},
                      'outputs': {"reg": "variable",'scaler_used':'variable',
                                  "mean_squared_error":"variable","regression_cols":"variable"}},
        'prediction_using_model':{'inputs':{"model":"variable",'scaler_used':'variable',
                                            'df':"variable","regression_cols":'variable'},
                      'outputs':{'df':'variable'}},
        'single_step_create_predict':{'inputs':{'training_csv':'file_browse','predict_csv':'file_browse',
                                                'output_column': "text_input", 'columns_not_required': "text_input"},
                                     'outputs':{'df':'variable',"fig":"graph"}}
        
        
    }

# Define all functions
Some rules:
* Add a markdown cell heading, beginning with **FUNCTION -**, to introduce the function
* Ensure a docstring (contained within '''Some Text''') is the first line of the function
* Ensure all required parameters are set with default values
* Ensure something is always returned
    - Return variables within a dictionary where the key provides a name
    - Return the same dictionary but with **None** values if there is an error
* Ensure the function is correctly described within **in_out_def**
* Ensure some parameter validation occurs (**isinstance** is a good check)
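
As an illustration of these rules, here is a minimal sketch of a function that follows them. The function `scale_values` and the `EXAMPLE_ENTRY` dictionary are hypothetical and are not part of this notebook; they only show the shape a conforming function and its **in_out_def** entry might take.

```python
def scale_values(values=None, factor=None):
    '''Multiply every number in a list by a factor'''
    def is_number(x):
        # bool is a subclass of int, so exclude it explicitly
        return isinstance(x, (int, float)) and not isinstance(x, bool)

    # Parameter validation with isinstance, as the rules suggest
    if isinstance(values, list) and is_number(factor) and all(is_number(v) for v in values):
        return {"scaled": [v * factor for v in values]}
    # Same dictionary keys, but None values, when the inputs are invalid
    return {"scaled": None}


# The matching entry that in_out_def() would need to contain (hypothetical)
EXAMPLE_ENTRY = {
    "scale_values": {
        "inputs": {"values": "variable", "factor": "float_input"},
        "outputs": {"scaled": "variable"},
    }
}

ok_result = scale_values(values=[1, 2, 3], factor=2.0)
error_result = scale_values(values="not a list", factor=2.0)
print(ok_result)
print(error_result)
```

Both calls return a dictionary with the same key, so downstream steps can always look up `"scaled"` and only need to check for **None**.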

## FUNCTION - linear_reg_model_creation

In [3]:
def linear_reg_model_creation(df=None, output_column=None, columns_not_required=None):
    '''Take df, a target column, and columns to discard and create a linear regression model relating the inputs and outputs'''
    if isinstance(df, pd.DataFrame) and isinstance(output_column, str) and isinstance(columns_not_required, str):
        # Work on a copy so the caller's DataFrame is not modified
        df = df.copy()
        # Replace spaces in column names with underscores
        df.columns = [a.replace(" ", "_") for a in df.columns]
        # Strip all whitespace, then split the comma-separated column names
        columns_not_required = "".join(columns_not_required.split()).split(',')
        if output_column in df.columns and all(cc in df.columns for cc in columns_not_required):
            df = df.drop(columns_not_required, axis=1)
            Y = df[output_column]
            X = df.drop([output_column], axis=1)
            # Scale every input column to the range [0, 1]
            min_max_scaler = preprocessing.MinMaxScaler()
            X_scaled = pd.DataFrame(min_max_scaler.fit_transform(X), columns=X.columns)
            # Hold out 30% of the rows to estimate the model error
            X_train, X_test, Y_train, Y_test = train_test_split(X_scaled, Y, test_size=0.3, random_state=42)
            reg = LinearRegression()
            reg.fit(X_train, Y_train)
            predictions = reg.predict(X_test)
            error = mean_squared_error(Y_test, predictions)
            return {"reg": reg, "scaler_used": min_max_scaler, "mean_squared_error": error, "regression_cols": [*X_test.columns]}
    # Invalid inputs or unknown column names: same keys, None values
    return {"reg": None, "scaler_used": None, "mean_squared_error": None, "regression_cols": None}
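
The way `columns_not_required` is parsed has a few consequences worth seeing in isolation. The sketch below wraps the same expression in a hypothetical helper, `parse_column_list`, which is not defined elsewhere in this notebook. All whitespace is removed before splitting, so column names should be typed with underscores rather than spaces, and an empty string produces a single empty name, which is not a column and therefore leads to the **None** result.

```python
def parse_column_list(text):
    """Hypothetical helper: the same expression linear_reg_model_creation uses."""
    return "".join(text.split()).split(",")


# Whitespace around the commas is ignored
tidy = parse_column_list("date, Ore_Pulp_pH ,%_Iron_Concentrate")

# Whitespace inside a name is removed, not replaced with underscores
spaced = parse_column_list("Ore Pulp pH")

# An empty string gives one empty name rather than an empty list
empty = parse_column_list("")

print(tidy)
print(spaced)
print(empty)
```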

## FUNCTION - prediction_using_model

In [4]:
def prediction_using_model(model=None,scaler_used=None,df=None,regression_cols=None):
    '''Use a sklearn ML model to predict outputs for a provided data set'''
    # Accept the column list as comma-separated text as well as a list
    if isinstance(regression_cols, str):
        regression_cols = "".join(regression_cols.split()).split(',')
    if isinstance(model, LinearRegression) and isinstance(scaler_used, preprocessing.MinMaxScaler) and isinstance(df, pd.DataFrame) and isinstance(regression_cols, list):
        try:
            # Copy the selected columns so the caller's DataFrame is not modified
            X = df[regression_cols].copy()
            # Reuse the scaler fitted during training and keep the original row index
            X_scaled = pd.DataFrame(scaler_used.transform(X), columns=X.columns, index=X.index)
            # Assign the array directly so predictions line up by position, whatever the index
            X["Predicted_%_Silica_Concentrate"] = model.predict(X_scaled)
            return {"df": X}
        except Exception:
            # Any failure (for example a missing column) falls through to the None result
            pass
    return {"df": None}
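
`prediction_using_model` applies the scaler that was fitted on the training data instead of fitting a new one. The plain-Python sketch below shows why that matters for min-max scaling: new values are scaled against the training minimum and maximum, so they can fall outside [0, 1]. The helper names `fit_min_max` and `apply_min_max` are hypothetical, and the handling of a constant column is a choice made for this sketch only, not a description of scikit-learn's behaviour.

```python
def fit_min_max(column):
    """Record the minimum and maximum of a training column."""
    return min(column), max(column)


def apply_min_max(column, lo, hi):
    """Scale values relative to a previously recorded training range."""
    span = hi - lo
    if span == 0:
        # Constant training column: avoid dividing by zero (sketch-only choice)
        return [0.0 for _ in column]
    return [(value - lo) / span for value in column]


train = [10.0, 20.0, 30.0]
new = [15.0, 40.0]

# Fit once on the training data, then reuse the same range for new data
lo, hi = fit_min_max(train)
train_scaled = apply_min_max(train, lo, hi)
new_scaled = apply_min_max(new, lo, hi)

print(train_scaled)
print(new_scaled)
```

The value 40.0 lies above the training maximum, so its scaled value is greater than 1. Refitting on the new data would instead map it to 1 and give the model inputs on a different scale from the one it was trained on.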

## FUNCTION - single_step_create_predict

In [5]:
from importnb import Notebook
# Import df_to_hist from the df_to_fig notebook
with Notebook():
    from df_to_fig import df_to_hist
def single_step_create_predict(training_csv=None,predict_csv=None, output_column=None, columns_not_required=None):
    '''Single step model creation and prediction'''
    try:
        df = pd.read_csv(training_csv)
        predict_df = pd.read_csv(predict_csv)
        # (columns, multiplier) pairs applied to the prediction data before it is used
        conversions = [(['Starch_Flow','Amina_Flow','Ore_Pulp_Flow'],101.941),
                       (['Ore_Pulp_Density'],0.0000160185),
                       (['Flotation_Column_01_Level','Flotation_Column_02_Level'],25.4)]
        for cols, factor in conversions:
            for col in cols:
                # Non-numeric entries become NaN before the multiplier is applied
                predict_df[col] = pd.to_numeric(predict_df[col], errors='coerce') * factor
        if isinstance(df, pd.DataFrame) and isinstance(predict_df, pd.DataFrame) and isinstance(output_column, str) and isinstance(columns_not_required, str):
            dict_a = linear_reg_model_creation(df=df, output_column=output_column, columns_not_required=columns_not_required)
            dict_b = prediction_using_model(model=dict_a["reg"], scaler_used=dict_a["scaler_used"], df=predict_df, regression_cols=dict_a["regression_cols"])
            # Only plot when the prediction step succeeded
            if dict_b['df'] is not None:
                dict_c = df_to_hist(df=dict_b['df'], data='Predicted_%_Silica_Concentrate', group_by='None', bins=5)
                dict_b['fig'] = dict_c['fig']
                return dict_b
    except Exception:
        # Unreadable files or missing columns fall through to the None result
        pass
    return {"df": None, "fig": None}

In [6]:
#single_step_create_predict(training_csv="mining_training.csv",predict_csv="Day_Input.csv", output_column='%_Silica_Concentrate', columns_not_required='date,Ore_Pulp_pH,%_Iron_Concentrate')

# Help with Markdown cells

For help with Markdown, visit the [Markdown Guide](https://markdownguide.org/basic-syntax/).

# Remember you can use latex equations

* To write equations first start a math environment using double dollar signs    
* Then write your equations and close the environment with double dollar signs  
* For in-line equations use single dollar signs $e=mc^2$  
* Remember LaTeX uses \\ to introduce many equation objects and {} to surround information that belongs to a specific object.  
* \_ indicates subscript and \^ indicates superscript (you will need to use {} if the subscript or superscript is more than a single character, e.g. {-i, j}). 
$$
X=\frac{A_{-i, j}}{B^2}
$$


See [MathJax](https://math.meta.stackexchange.com/questions/5020/mathjax-basic-tutorial-and-quick-reference) for more help.