## Week 1: Assignment 

Jupyter Notebook (formerly IPython Notebook) is a web-based interactive computing environment for creating notebook documents. Project Jupyter's operating philosophy is to support interactive data science and scientific computing across all programming languages through open-source software. Notebooks support languages such as Python, R, and Julia, and you can install additional kernels to run other languages such as JavaScript.

A notebook kernel is a “computational engine” that executes the code contained in a notebook document. The IPython kernel, referenced in this guide, executes Python code. Kernels for many other languages exist.
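
Because the kernel is the process that actually runs each cell, it can help to check which Python interpreter it is using. The following is a minimal sketch using only the standard library; the helper name `describe_interpreter` is made up for illustration, and the values it reports depend on your own installation.

```python
import platform
import sys


def describe_interpreter():
    """Return basic facts about the Python interpreter running this code."""
    return {
        "executable": sys.executable,  # path to the interpreter binary
        "version": platform.python_version(),  # e.g. "3.x.y"
        "implementation": platform.python_implementation(),  # e.g. "CPython"
    }


info = describe_interpreter()
for key, value in info.items():
    print(f"{key}: {value}")
```

Running this in a notebook cell reports the interpreter behind the active kernel, which may differ from the one your terminal uses.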

https://jupyter-notebook-beginner-guide.readthedocs.io/en/latest/what_is_jupyter.html
https://github.com/jupyter/jupyter/wiki/Jupyter-kernels
https://jupyter-notebook.readthedocs.io/en/stable/examples/Notebook/Notebook%20Basics.html
https://realpython.com/jupyter-notebook-introduction/

### Data Science Model Example

In [None]:
# import the libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

import warnings
warnings.simplefilter(action='ignore')

%matplotlib inline

In [None]:
# get the data
df = pd.read_csv('Advertising.csv', usecols=['TV', 'radio', 'newspaper', 'sales'])
df.head()

In [None]:
# shape of data
df.shape

In [None]:
# look at info
df.info()

In [None]:
# describe data
df.describe()

In [None]:
# plot TV histogram using matplotlib
plt.hist(df['TV'])

In [None]:
# plot radio histogram using matplotlib
plt.hist(df['radio'], bins=5)
plt.xlabel('Radio advertising spend')
plt.ylabel('Count')
plt.title('Distribution of Radio Advertising Spend')
plt.show();

In [None]:
# side-by-side histograms with matplotlib subplots; see https://matplotlib.org/stable/gallery/statistics/hist.html
fig, axs = plt.subplots(1, 3, sharey=False, tight_layout=True)
axs[0].hist(df['TV'])
axs[0].set_xlabel('TV')

axs[1].hist(df['radio'])
axs[1].set_xlabel('radio')

axs[2].hist(df['newspaper'])
axs[2].set_xlabel('newspaper')

plt.show();

In [None]:
# using pandas and matplotlib
df.hist()
plt.tight_layout()

In [None]:
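# scatterplots with regression lines using seaborn (sns)
# note: `height` replaces the older `size` argument, which was renamed in seaborn 0.9
sns.pairplot(df, x_vars=['TV', 'radio', 'newspaper'], y_vars='sales',
             kind='reg',
             height=5,
             aspect=0.8,
             plot_kws={'line_kws': {'color': 'red'}, 'scatter_kws': {'alpha': 0.2}})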

In [None]:
# separate X, y
X = df.drop(['sales'], axis=1)
y = df.sales
print(X.head())
print()
print(y.head())

In [None]:
# train test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=42)
print(X_train.shape)
print(y_train.shape)

In [None]:
# another split example
X_train, X_test, y_train, y_test = train_test_split(df.drop(['sales'], axis=1), 
                                                    df.sales, 
                                                    test_size=0.20, 
                                                    random_state=42)
print(X_train.shape)
print(y_train.shape)
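
To make the idea behind the two split cells above concrete, here is a minimal pure-Python sketch of a random 80/20 split. It is not how scikit-learn implements `train_test_split`; the function `simple_train_test_split` and the 200 stand-in rows are assumptions for illustration only.

```python
import random


def simple_train_test_split(rows, test_size=0.2, seed=42):
    """Shuffle a copy of `rows` and split it into (train, test) lists."""
    shuffled = list(rows)  # copy so the caller's data is left untouched
    random.Random(seed).shuffle(shuffled)  # a fixed seed makes the split repeatable
    n_test = round(len(shuffled) * test_size)
    return shuffled[n_test:], shuffled[:n_test]


rows = list(range(200))  # stand-in for 200 hypothetical rows of data
train, test = simple_train_test_split(rows, test_size=0.2)
print(len(train), len(test))  # 160 40
```

Fixing the seed plays the same role as `random_state=42` in the cells above: rerunning the cell gives the same partition each time.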

In [None]:
# build, train (fit), predict, evaluate the model
model = LinearRegression()
model.fit(X_train, y_train)
predictions = model.predict(X_test)
print(f'MSE: {mean_squared_error(y_true=y_test, y_pred=predictions)}')
print(f'R-Squared: {r2_score(y_test, predictions)}')
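
The two metrics printed above can also be computed by hand, which shows what they measure. This is a sketch with made-up numbers, not output from the advertising model; the helper names `mse` and `r_squared` are hypothetical.

```python
def mse(y_true, y_pred):
    """Mean squared error: the average of the squared residuals."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """R-squared: 1 minus (residual sum of squares / total sum of squares)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# made-up values for illustration only
y_true = [10.0, 12.0, 14.0, 16.0]
y_pred = [11.0, 12.0, 13.0, 16.0]

print(f"MSE: {mse(y_true, y_pred):.2f}")  # MSE: 0.50
print(f"R-Squared: {r_squared(y_true, y_pred):.2f}")  # R-Squared: 0.90
```

A lower MSE means predictions sit closer to the actual values, while an R-squared near 1 means the model explains most of the variation in the target.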

In [None]:
# make a prediction
tv = 232.1
radio = 8.6
newspaper = 8.7
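# pass a DataFrame with the training column names so the input matches what the model was fit on
model.predict(pd.DataFrame([[tv, radio, newspaper]], columns=X_train.columns))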

In [None]:
# what are our coefficients?
list(zip(X.columns, model.coef_))
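
A fitted linear regression predicts by adding the intercept to the sum of each coefficient times its feature value. The sketch below uses hypothetical coefficients, not the values learned above; in scikit-learn the real ones are in `model.coef_` and `model.intercept_`. The function `predict_linear` is made up for illustration.

```python
def predict_linear(intercept, coefficients, features):
    """Return intercept + sum(coefficient * feature value) over all features."""
    return intercept + sum(coefficients[name] * features[name] for name in coefficients)


# hypothetical numbers chosen for easy arithmetic, not fitted values
intercept = 3.0
coefficients = {"TV": 0.05, "radio": 0.2, "newspaper": 0.0}
features = {"TV": 100.0, "radio": 10.0, "newspaper": 5.0}

prediction = predict_linear(intercept, coefficients, features)
print(round(prediction, 2))  # 3.0 + 5.0 + 2.0 + 0.0 = 10.0
```

Read this way, each coefficient is the change in predicted sales for a one-unit increase in that channel's spend, holding the other channels fixed.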

### More on Using the Tools

In [None]:
# python examples
# string
# number
# list (mutable)
# tuple (immutable)
# set
# dictionary
# comprehensions
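
One possible way to fill in the topics listed above is sketched below. The variable names are arbitrary, and the three numbers are borrowed from the prediction cell earlier in this notebook purely as sample values.

```python
# string
greeting = "data science"
title = greeting.title()  # "Data Science"

# number
total = 7 + 3.5  # int + float gives a float: 10.5

# list (mutable): can be changed in place
channels = ["TV", "radio"]
channels.append("newspaper")

# tuple (immutable): item assignment raises TypeError
point = (232.1, 8.6)
try:
    point[0] = 0
except TypeError as err:
    tuple_error = str(err)

# set: keeps only unique values
unique = set([1, 2, 2, 3])  # {1, 2, 3}

# dictionary: maps keys to values
budget = {"TV": 232.1, "radio": 8.6}
budget["newspaper"] = 8.7

# comprehensions: build a list, set, or dict in one expression
squares = [n ** 2 for n in range(5)]  # [0, 1, 4, 9, 16]
evens = {n for n in range(10) if n % 2 == 0}  # {0, 2, 4, 6, 8}
lengths = {name: len(name) for name in channels}

print(title, total, channels, unique, budget, squares, evens, lengths)
```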

In [None]:
# numpy examples
# https://www.learndatasci.com/tutorials/applied-introduction-to-numpy-python-tutorial/
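
A few basic NumPy operations are sketched below as a starting point for this cell. It assumes NumPy is installed, as in the import cell at the top of the notebook; the array values are arbitrary.

```python
import numpy as np

# create arrays
a = np.array([1.0, 2.0, 3.0])
b = np.arange(3)  # array([0, 1, 2])

# element-wise arithmetic (no explicit loop needed)
added = a + b  # [1., 3., 5.]
scaled = a * 10  # [10., 20., 30.]

# reshape and aggregate along an axis
matrix = np.arange(6).reshape(2, 3)  # [[0, 1, 2], [3, 4, 5]]
col_means = matrix.mean(axis=0)  # [1.5, 2.5, 3.5]

# boolean masking
mask = a > 1.5
selected = a[mask]  # [2., 3.]

# reshape(1, -1) turns a flat array into a single row: one sample, many features
row = np.array([232.1, 8.6, 8.7]).reshape(1, -1)
print(row.shape)  # (1, 3)
```

The `reshape(1, -1)` pattern yields a 2-D input of one row with three columns, the shape scikit-learn estimators expect for a single sample.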