## Building and Deploying a Streamlit App for Machine Learning
In this tutorial, we walk through how to build and deploy a visual front-end for your machine learning applications. 
We will use [Streamlit](https://docs.streamlit.io/library/get-started/main-concepts), which is an easy-to-use framework for creating interactive visualizations and applications in Python (in contrast to [Plotly Dash](https://dash.plotly.com), which, although technically Python, relies heavily on HTML and CSS knowledge).

### Installing Streamlit and Making a Virtual Environment 
When we make a new front-end application, we want to use a virtual environment. [Pipenv](https://pipenv-fork.readthedocs.io/en/latest/) will be our tool of choice. Pipenv makes it easy to combine both requirements management and virtual environments at the same time.

First, move into the project directory:

`cd my_project`

Then, run:

`pipenv shell` 

to create the virtual environment (on the first run) and activate it.

That's all you have to do! 

After that, when you install packages via `pipenv install [package_name]`, they are automatically recorded in your `Pipfile` and pinned in your `Pipfile.lock`. No need to update a requirements file all the time!

NOTE: This step is *very* important for deploying your app successfully later on, so make sure you don't skip it!

Note: if you are using PyCharm, it can be helpful to follow [these steps](https://stackoverflow.com/questions/46251411/how-do-i-properly-setup-pipenv-in-pycharm) to make sure you're using the virtual environment when running your code in PyCharm. 
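
If you are unsure whether your code actually runs inside the virtual environment, a quick check from Python can help. The snippet below is a minimal sketch and not part of the project files; the helper name `in_virtual_environment` is made up for illustration. It relies on the fact that inside a virtual environment `sys.prefix` differs from `sys.base_prefix`.

```python
import sys


def in_virtual_environment() -> bool:
    """Return True when the interpreter runs inside a virtual environment."""
    # Inside a virtual environment, sys.prefix points at the environment,
    # while sys.base_prefix still points at the base Python installation.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


if __name__ == "__main__":
    print("Interpreter:", sys.executable)
    print("Virtual environment active:", in_virtual_environment())
```

Running this from your IDE and from the terminal where you called `pipenv shell` should report the same interpreter path if both use the same environment.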

### Starting up the App
Open the file entitled `streamlit_app.py`.

Here you have a basic skeleton of an application. This code will simply display a table with 2 columns.


<img src="./images/code_snippet_1.png" width="300"/>
Code of the basic app.


We run the app using the following command:

`streamlit run streamlit_app.py`

This command opens a new browser window with our streamlit app running. The great thing about streamlit is that the app will be automatically updated whenever we save the file. 

Running the command, we see the following output in the browser:
<img src="./images/app_screenshot_1.png" width="300"/>


### Working with real data 
Here, we will import the actual Telco churn data and use it as an example for making a streamlit application to deploy. 

In [None]:
import streamlit as st
import pandas as pd

# importing customer churn data
data = pd.read_csv("./data/training_data.csv", index_col=0)
st.write("Churn Data")
st.write(data)

Note: look at what happens when you replace the line
`st.write(data)` in the code above with `st.table(data)`.

Now, our streamlit app looks like this:

<img src="./images/app_screenshot_2.png" width="300"/>

Note that it shows the entire dataframe, and we can even move around to explore the columns and rows.

### Adding Elements
Now, we will explore adding more elements to our app and making some visualizations. 

We'll start how we should always start in a new data science project: by visualizing the target.

Add the code in the next cell to your streamlit app, underneath the dataset visualization. You'll now see a nice bar chart. 

<img src="./images/app_screenshot_3.png" width="300"/>

In [None]:
st.write("How many customers in the dataset churned?")
target_bins = data.loc[:, 'Churn'].value_counts()
st.bar_chart(target_bins)

#### Exercise 
Add some new elements to your application. Look in the [streamlit documentation](https://docs.streamlit.io/library/api-reference/widgets/st.selectbox).

You can use the cells below to save your code. 

### Making the Prediction App
Here, we will look at integrating machine learning models into the application and making predictions. We will update the code in our file `streamlit_app.py` so that it no longer just shows analysis, but also applies the model. 

First, we will train a model. The code below does this.

In [None]:
import pickle
import pandas as pd
from sklearn.preprocessing import LabelEncoder
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_recall_fscore_support, accuracy_score

In [None]:
# get training data
train = pd.read_csv("./data/training_data.csv")
# drop customer ID: not a feature for training 
train.drop("customerID", axis=1, inplace=True)

# getting validation data
val = pd.read_csv("./data/validation_data.csv")

"""
Data Pre-Processing

The following columns need to be fixed in order to train the model:
1. All yes/no categories category encoded
2. All category columns category encoded
3. "Churn" column category encoded
"""

categorical_columns = ['gender', 'SeniorCitizen', 'Partner', 'Dependents',
       'PhoneService', 'MultipleLines', 'InternetService',
       'OnlineSecurity', 'OnlineBackup', 'DeviceProtection', 'TechSupport',
       'StreamingTV', 'StreamingMovies', 'Contract', 'PaperlessBilling',
       'PaymentMethod', 'Churn']
# converting all the categorical columns to numeric
col_mapper = {}
for col in categorical_columns:
    le = LabelEncoder()
    # plain column assignment replaces the column, so it gets an integer dtype
    train[col] = le.fit_transform(train[col])
    # saving encoder for each column to be able to apply it to new data later
    col_mapper[col] = le

# blank "TotalCharges" entries are treated as zero
train.replace(" ", "0", inplace=True)

# converting "TotalCharges" to numeric
train["TotalCharges"] = pd.to_numeric(train["TotalCharges"])

# converting data pre-processing steps to a function to apply to new data
def pre_process_data(df, label_encoder_dict):
    # drop returns a new dataframe, so the caller's data is left untouched
    df = df.drop("customerID", axis=1)
    for col in df.columns:
        if col in label_encoder_dict:
            df[col] = label_encoder_dict[col].transform(df[col])
    # same "TotalCharges" clean-up as for the training data
    df = df.replace(" ", "0")
    df["TotalCharges"] = pd.to_numeric(df["TotalCharges"])
    return df

# splitting into X and Y
x_train = train.drop("Churn", axis=1)
y_train = train.loc[:, "Churn"]

# fitting model
model = LogisticRegression(max_iter=1000)
model.fit(x_train, y_train)

# pre-processing validation data
val = pre_process_data(val, col_mapper)

# split validation set 
x_val = val.drop("Churn", axis=1)
y_val = val.loc[:, "Churn"]

# predicting on validation
predictions = model.predict(x_val)
precision, recall, fscore, support = precision_recall_fscore_support(y_val, predictions)
accuracy = accuracy_score(y_val, predictions)
print(f"Validation accuracy is: {round(accuracy, 3)}")

#### Preparing the code for use in the app
We need to update the code used for making a prediction. 

For this, we also need anything that must be applied to the model to make a prediction: like label encoders, and of course the trained model itself. 
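
To see why the fitted encoders have to travel with the model, it helps to look at what a label encoder does. The class below is a simplified, hypothetical stand-in for scikit-learn's `LabelEncoder` (it is not the library's implementation): it assigns integers to the sorted unique labels seen during fitting, and refuses labels it has never seen.

```python
class SimpleLabelEncoder:
    """Hypothetical, simplified stand-in for sklearn's LabelEncoder."""

    def fit(self, values):
        # Sorted unique labels, mirroring how LabelEncoder orders classes_
        self.classes_ = sorted(set(values))
        self._index = {label: i for i, label in enumerate(self.classes_)}
        return self

    def transform(self, values):
        unseen = [v for v in values if v not in self._index]
        if unseen:
            raise ValueError(f"unseen labels: {unseen}")
        return [self._index[v] for v in values]


training_contracts = ["Two year", "Month-to-month", "One year", "Month-to-month"]
encoder = SimpleLabelEncoder().fit(training_contracts)

encoded = encoder.transform(["One year", "Month-to-month"])
print(encoded)  # [1, 0]
```

The mapping depends entirely on the data used for fitting. Re-fitting an encoder on new data could assign different integers to the same category, so the model would receive inputs that mean something else than during training. That is why the encoders fitted on the training data are saved and reused.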

Steps:
1. Save the trained model and fitted label encoder after training. They can be saved as a pickle. These will be used later to apply the necessary transformations to new, incoming data. 
    - How to: add code at the bottom of your training code to save the model and encoder as a pickle

2. Update the prediction code to take in new data and apply the saved label encoders and trained model to it. 
    - How to: make a file with prediction code that loads the pickled model and label encoders, and applies them to new data. It should return a prediction (or multiple predictions, if the input is a batch).

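Step 1: Code for pickling the model and label encoders. Add it to the end of the model training code from above. Note that the prediction code further down loads both files from a `churn_model` directory, so either move the pickles there or adjust the paths.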

In [None]:
# pickling the model
with open("churn_prediction_model.pkl", "wb") as pickler:
    pickle.dump(model, pickler)

In [None]:
# pickling the label encoder dict
with open("churn_prediction_label_encoders.pkl", "wb") as pickler:
    pickle.dump(col_mapper, pickler)
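
Before wiring the pickles into the app, it can be worth confirming that what you load back is what you saved. The following is a small round-trip sketch using only the standard library. The `artifacts` dictionary and the file name are hypothetical stand-ins for the real `col_mapper` and pickle files, so that the snippet runs on its own.

```python
import pickle
import tempfile
from pathlib import Path

# Hypothetical stand-in for the `col_mapper` dict of fitted encoders
artifacts = {"Contract": ["Month-to-month", "One year", "Two year"]}

with tempfile.TemporaryDirectory() as tmp_dir:
    pickle_path = Path(tmp_dir) / "example_label_encoders.pkl"

    # `with` closes the file even if pickle.dump raises
    with open(pickle_path, "wb") as f:
        pickle.dump(artifacts, f)

    # load the object back from disk
    with open(pickle_path, "rb") as f:
        restored = pickle.load(f)

print(restored == artifacts)  # True
```

Keep in mind that unpickling can execute arbitrary code, so only load pickle files that you created yourself or otherwise trust.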

Step 2: Creating a new prediction file to get predictions for the new data 

The code for this step is shown in the next cell:

In [None]:
import pickle

import pandas as pd
import streamlit as st
from sklearn.preprocessing import LabelEncoder
from sklearn.linear_model import LogisticRegression


def load_pickles(model_pickle_path, label_encoder_pickle_path):
    # context managers close the files once the objects are loaded
    with open(model_pickle_path, "rb") as model_pickle_opener:
        model = pickle.load(model_pickle_opener)

    with open(label_encoder_pickle_path, "rb") as label_encoder_pickle_opener:
        label_encoder_dict = pickle.load(label_encoder_pickle_opener)

    return model, label_encoder_dict


def pre_process_data(df, label_encoder_dict):
    # df.drop("customerID", axis=1, inplace=True)
    for col in df.columns:
        if col in list(label_encoder_dict.keys()):
            column_le = label_encoder_dict[col]
            df.loc[:, col] = column_le.transform(df.loc[:, col])
        else:
            continue
    return df


def make_predictions(processed_df, model):
    prediction = model.predict(processed_df)
    return prediction


def generate_predictions(test_df):
    model_pickle_path = "./churn_model/churn_prediction_model.pkl"
    label_encoder_pickle_path = "./churn_model/churn_prediction_label_encoders.pkl"

    model, label_encoder_dict = load_pickles(model_pickle_path,
                                             label_encoder_pickle_path)

    processed_df = pre_process_data(test_df, label_encoder_dict)
    prediction = make_predictions(processed_df, model)
    return prediction

### Updating the Prediction App to Make Predictions
Up to now, our application only showed some analysis of the training data. It didn't interact with the prediction model in any way. 

Here, we update the application to apply the model, and show the outcome. 

First, we will do this with a single row of holdout test data. 

Later, we will update the application further to allow for making predictions in real time and interacting with the data.

#### Making predictions on the one-row test dataset 
The code below shows how the updated model prediction code from above can be combined with Streamlit to make an interactive front-end with a prediction button. On startup, the app loads and displays the data of the one hold-out customer to predict. Pushing the button then triggers the prediction.

By updating your `streamlit_app.py` file with the following code, you should see the following picture in your front-end app: 

<img src="./images/app_screenshot_4.png" width="400"/>

NOTE: This is what is shown in the file `simple_prediction_streamlit_app.py`, with a small extension to test on any customer in the holdout dataset. 

In [None]:
import pickle

import pandas as pd
import streamlit as st
from sklearn.preprocessing import LabelEncoder
from sklearn.linear_model import LogisticRegression


def load_pickles(model_pickle_path, label_encoder_pickle_path):
    # context managers close the files once the objects are loaded
    with open(model_pickle_path, "rb") as model_pickle_opener:
        model = pickle.load(model_pickle_opener)

    with open(label_encoder_pickle_path, "rb") as label_encoder_pickle_opener:
        label_encoder_dict = pickle.load(label_encoder_pickle_opener)

    return model, label_encoder_dict


def pre_process_data(df, label_encoder_dict):
    df.drop("customerID", axis=1, inplace=True)
    for col in df.columns:
        if col in list(label_encoder_dict.keys()):
            column_le = label_encoder_dict[col]
            df.loc[:, col] = column_le.transform(df.loc[:, col])
        else:
            continue
    return df


def make_predictions(processed_df, model):
    prediction = model.predict(processed_df)
    return prediction


def generate_predictions(test_df):
    model_pickle_path = "./churn_model/churn_prediction_model.pkl"
    label_encoder_pickle_path = "./churn_model/churn_prediction_label_encoders.pkl"

    model, label_encoder_dict = load_pickles(model_pickle_path,
                                             label_encoder_pickle_path)

    processed_df = pre_process_data(test_df, label_encoder_dict)
    prediction = make_predictions(processed_df, model)
    return prediction


if __name__ == '__main__':
    # make the application
    st.title("Customer Churn Prediction")
    customer_data = pd.read_csv("./data/single_row_to_check.csv")
    
    # visualizing customer's data
    st.table(customer_data)

    # generate the prediction for the customer
    if st.button("Predict Churn"):
        pred = generate_predictions(customer_data)
        # pred is an array holding a single prediction (1 = churn)
        if bool(pred[0]):
            st.text("Customer will churn!")
        else:
            st.text("Customer not predicted to churn")


The following code shows an extension for predicting on any customer in the holdout dataset.

In [None]:
import pickle

import pandas as pd
import streamlit as st
from sklearn.preprocessing import LabelEncoder
from sklearn.linear_model import LogisticRegression


def load_pickles(model_pickle_path, label_encoder_pickle_path):
    # context managers close the files once the objects are loaded
    with open(model_pickle_path, "rb") as model_pickle_opener:
        model = pickle.load(model_pickle_opener)

    with open(label_encoder_pickle_path, "rb") as label_encoder_pickle_opener:
        label_encoder_dict = pickle.load(label_encoder_pickle_opener)

    return model, label_encoder_dict


def pre_process_data(df, label_encoder_dict):
    df.drop("customerID", axis=1, inplace=True)
    for col in df.columns:
        if col in list(label_encoder_dict.keys()):
            column_le = label_encoder_dict[col]
            df.loc[:, col] = column_le.transform(df.loc[:, col])
        else:
            continue
    return df


def make_predictions(processed_df, model):
    prediction = model.predict(processed_df)
    return prediction


def generate_predictions(test_df):
    model_pickle_path = "./churn_model/churn_prediction_model.pkl"
    label_encoder_pickle_path = "./churn_model/churn_prediction_label_encoders.pkl"

    model, label_encoder_dict = load_pickles(model_pickle_path,
                                             label_encoder_pickle_path)

    processed_df = pre_process_data(test_df, label_encoder_dict)
    prediction = make_predictions(processed_df, model)
    return prediction


if __name__ == '__main__':
    # make the application
    st.title("Customer Churn Prediction")
    st.text("Enter customer data.")
    all_customers_training_data = pd.read_csv("./data/holdout_data.csv")
    all_customers_data = all_customers_training_data.drop(columns="Churn")
    chosen_customer = st.selectbox("Select the customer you are speaking to:", all_customers_training_data.loc[:, "customerID"])
    # copy the selected row so that pre-processing does not modify a slice of the full dataframe
    chosen_customer_data = all_customers_data.loc[all_customers_data.loc[:, 'customerID']==chosen_customer].copy()
    # visualizing customer's data
    st.table(chosen_customer_data)

    # generate the prediction for the customer
    if st.button("Predict Churn"):
        pred = generate_predictions(chosen_customer_data)
        # pred is an array holding a single prediction (1 = churn)
        if bool(pred[0]):
            st.text("Customer will churn!")
        else:
            st.text("Customer not predicted to churn")


### Updating the App for Online Predictions 
Now that we've seen that the application can apply the model and make predictions, we can update it with Streamlit's user input widgets, so that users can play with the various inputs and see how they change the model output. 

The updated code for this is shown below:

In [None]:
import pickle

import pandas as pd
import streamlit as st
from sklearn.preprocessing import LabelEncoder
from sklearn.linear_model import LogisticRegression


def load_pickles(model_pickle_path, label_encoder_pickle_path):
    # context managers close the files once the objects are loaded
    with open(model_pickle_path, "rb") as model_pickle_opener:
        model = pickle.load(model_pickle_opener)

    with open(label_encoder_pickle_path, "rb") as label_encoder_pickle_opener:
        label_encoder_dict = pickle.load(label_encoder_pickle_opener)

    return model, label_encoder_dict


def pre_process_data(df, label_encoder_dict):
    # df.drop("customerID", axis=1, inplace=True)
    for col in df.columns:
        if col in list(label_encoder_dict.keys()):
            column_le = label_encoder_dict[col]
            df.loc[:, col] = column_le.transform(df.loc[:, col])
        else:
            continue
    return df


def make_predictions(processed_df, model):
    prediction = model.predict(processed_df)
    return prediction


def generate_predictions(test_df):
    model_pickle_path = "./churn_model/churn_prediction_model.pkl"
    label_encoder_pickle_path = "./churn_model/churn_prediction_label_encoders.pkl"

    model, label_encoder_dict = load_pickles(model_pickle_path,
                                             label_encoder_pickle_path)

    processed_df = pre_process_data(test_df, label_encoder_dict)
    prediction = make_predictions(processed_df, model)
    return prediction


if __name__ == '__main__':
    # make the application
    st.title("Customer Churn Prediction")
    st.text("Enter customer data.")

    # making customer data input
    gender = st.selectbox("Select customer's gender :",
                         ['Female', 'Male'])
    senior_citizen_input = st.selectbox('Is customer a senior citizen? :',
                                     ["No", "Yes"])
    if senior_citizen_input == "Yes":
        senior_citizen = 1
    else:
        senior_citizen = 0
    partner = st.selectbox('Does the customer have a partner? :',
                             ["No", "Yes"])
    dependents = st.selectbox('Does the customer have dependents? :',
                              ["Yes", "No"])
    tenure = st.slider('How many months has the customer been with the company? :',
                       min_value=0, max_value=72, value=24)
    phone_service = st.selectbox('Does the customer have phone service? :',
                             ["No", "Yes"])
    multiple_lines = st.selectbox('Does the customer have multiple lines? :',
                                 ["No", "Yes", "No phone service"])
    internet_service = st.selectbox('What type of internet service does the customer have? :',
                                  ["No", "DSL", "Fiber optic"])
    online_security = st.selectbox('Does the customer have online security? :',
                                  ["No", "Yes", "No internet service"])
    online_backup = st.selectbox('Does the customer have online backup? :',
                                  ["No", "Yes", "No internet service"])
    device_protection = st.selectbox('Does the customer have device protection? :',
                                  ["No", "Yes", "No internet service"])
    tech_support = st.selectbox('Does the customer have tech support? :',
                                  ["No", "Yes", "No internet service"])
    streaming_tv = st.selectbox('Does the customer have streaming TV? :',
                                  ["No", "Yes", "No internet service"])
    streaming_movies = st.selectbox('Does the customer have streaming movies? :',
                                  ["No", "Yes", "No internet service"])
    contract = st.selectbox('What kind of contract does the customer have? :',
                                  ["Month-to-month", "Two year", "One year"])
    paperless_billing = st.selectbox('Does the customer have paperless billing? :',
                                  ["No", "Yes"])
    payment_method = st.selectbox("What is the customer's payment method? :",
                                     ["Mailed check", "Credit card (automatic)", "Bank transfer (automatic)",
                                      "Electronic check"])
    monthly_charges = st.slider("What is the customer's monthly charge? :", min_value=0, max_value=118, value=50)
    total_charges = st.slider('What is the total charge of the customer? :', min_value=0, max_value=8600, value=2000)
    input_dict = {'gender': gender,
                  'SeniorCitizen': senior_citizen,
                  'Partner': partner,
                  'Dependents': dependents,
                  'tenure': tenure,
                  'PhoneService': phone_service,
                  'MultipleLines': multiple_lines,
                  'InternetService': internet_service,
                  'OnlineSecurity': online_security,
                  'OnlineBackup': online_backup,
                  'DeviceProtection': device_protection,
                  'TechSupport': tech_support,
                  'StreamingTV': streaming_tv,
                  'StreamingMovies': streaming_movies,
                  'Contract': contract,
                  'PaperlessBilling': paperless_billing,
                  'PaymentMethod': payment_method,
                  'MonthlyCharges': monthly_charges,
                  'TotalCharges': total_charges,
                  }
    input_data = pd.DataFrame([input_dict])

    # generate the prediction for the customer
    if st.button("Predict Churn"):
        pred = generate_predictions(input_data)
        # pred is an array holding a single prediction (1 = churn)
        if bool(pred[0]):
            # making text red if customer will churn
            churn_text = '<p style="font-family:Courier; color:Red; font-size: 16px;">Customer will churn!</p>'
            st.markdown(churn_text, unsafe_allow_html=True)
        else:
            # making text green if customer will not churn
            not_churn_text = '<p style="font-family:Courier; color:Green; font-size: 16px;">Customer not predicted to churn</p>'
            st.markdown(not_churn_text, unsafe_allow_html=True)

Note that there has to be an input for every feature that the model needs to be able to predict. 
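
One way to guard against a forgotten input is to compare the collected inputs against the list of features before predicting. The helper below is a hypothetical sketch and not part of the project files. The feature list is copied from the keys of `input_dict` in the app code, on the assumption that these match the columns the model was trained on.

```python
# Feature names in the order used by `input_dict` in the app code
EXPECTED_FEATURES = [
    "gender", "SeniorCitizen", "Partner", "Dependents", "tenure",
    "PhoneService", "MultipleLines", "InternetService", "OnlineSecurity",
    "OnlineBackup", "DeviceProtection", "TechSupport", "StreamingTV",
    "StreamingMovies", "Contract", "PaperlessBilling", "PaymentMethod",
    "MonthlyCharges", "TotalCharges",
]


def check_features(input_dict, expected=EXPECTED_FEATURES):
    """Return (missing, unexpected) feature names for one customer's input."""
    missing = [name for name in expected if name not in input_dict]
    unexpected = [name for name in input_dict if name not in expected]
    return missing, unexpected


# Example: one feature is forgotten and one extra column sneaks in
example_input = {name: 0 for name in EXPECTED_FEATURES}
del example_input["tenure"]
example_input["customerID"] = "hypothetical-id"

print(check_features(example_input))  # (['tenure'], ['customerID'])
```

In the app, such a check could run right before the call to `generate_predictions`, showing an error message instead of a prediction when either list is non-empty.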

Updating the code in your file `streamlit_app.py` should give you the following output:

<img src="./images/app_screenshot_5.png" width="200"/>

Note the small change in code at the bottom, to add a small bit of HTML to change the color of the output.

This code is shown in the file `prediction_streamlit_app.py`. 
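
The two HTML strings at the bottom of the app differ only in color and message, so they could also be produced by one small function. This is an optional sketch; `prediction_html` is a made-up helper name, and the markup is the same as in the app code above.

```python
def prediction_html(will_churn: bool) -> str:
    """Build the colored paragraph shown after a prediction."""
    color = "Red" if will_churn else "Green"
    message = (
        "Customer will churn!" if will_churn
        else "Customer not predicted to churn"
    )
    return (
        f'<p style="font-family:Courier; color:{color}; font-size: 16px;">'
        f"{message}</p>"
    )


print(prediction_html(True))
print(prediction_html(False))
```

Inside the app, the result would be passed to `st.markdown(..., unsafe_allow_html=True)`, exactly as the hard-coded strings are.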


### Final Step: Deploying your Streamlit app
Deploying your app with streamlit cloud is incredibly easy. Simply follow the steps in the [documentation](https://docs.streamlit.io/streamlit-cloud/get-started#sign-up-for-streamlit-cloud). 

First, you have to make a free Streamlit Cloud account, and connect it with your GitHub account.

Next, make a GitHub repository for deploying your application. It must contain the `Pipfile` and the `Pipfile.lock` that you created by using `pipenv`. With these two files in place, and provided you installed all packages through `pipenv` and ran your application locally with the `pipenv` virtual environment activated, deployment should be a breeze. 

Once you have set up your Streamlit Cloud account, click on "New Project" in the top right corner. There, with your GitHub account linked, you will simply need to specify:

- The name of the repository that contains the app 
- The branch in your GitHub repository that the deployed app should be linked to (the great thing about Streamlit Cloud is that your deployed app is automatically updated whenever you push to this branch!)
- The name of the file with the Streamlit app you want to deploy (mine, in this case, would be `prediction_streamlit_app.py`).

The image below shows these steps.

<img src="./images/deploy-an-app.png" width="500"/>

Then you simply wait, and your app is deployed!! 

Check out [this link](https://rachelkberryman-churn-predictor-prediction-streamlit-app-coqysw.streamlitapp.com/) to see my deployed application from this project, and the [linked GitHub repository](https://github.com/rachelkberryman/churn_predictor_application).