# Predictive Modeling Example

## Initial setup

Let's import the libraries that we expect to use.

In [None]:
import matplotlib.pyplot as plt
import numpy as np

np.random.seed(1) # set this to ensure the results are repeatable. 

# this is a notebook 'magic' command that enables interactive plots
%matplotlib widget

sample_size = 1000

### Let's define a hidden polynomial relationship/model

First, let's define our relationship. Normally it is hidden, but since we are creating the data ourselves, we need to specify it.

In [None]:
# we will define a model that is quadratic in X (but still linear in its parameters) with the following parameter values
b2 = 1.5 # coefficient for x^2
b1 = 3.5 # coefficient for x
b0 = 1 # intercept

In [None]:
X = np.round(np.random.normal(10, 10.0, sample_size),2)
y = b0 + b1 * X + b2 * X**2 
#y = b0 * X**0 + b1 * X**1 + b2 * X**2 # NOTE: It's more useful to think of a polynomial like this... it's the same as the one above, but says more
e = np.round(np.random.normal(0, 100.0, sample_size), 2)
y = y + e
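
Since we created the data ourselves, we can sanity-check it: a degree-2 least-squares fit should return coefficients reasonably close to `b2`, `b1` and `b0`. The sketch below regenerates the same data so that it runs on its own. Using `np.polyfit` here is an assumption made for illustration, not necessarily the modeling approach used later, and the `*_hat` names are hypothetical. With noise this large, expect the intercept estimate in particular to be imprecise.

```python
import numpy as np

np.random.seed(1)  # same seed as above, so the same sample is drawn
sample_size = 1000
b2, b1, b0 = 1.5, 3.5, 1

# regenerate the sample in the same order as the cells above (X first, then the noise)
X = np.round(np.random.normal(10, 10.0, sample_size), 2)
e = np.round(np.random.normal(0, 100.0, sample_size), 2)
y = b0 + b1 * X + b2 * X**2 + e

# np.polyfit returns coefficients from the highest degree to the lowest
b2_hat, b1_hat, b0_hat = np.polyfit(X, y, deg=2)

print(f"estimated: b2={b2_hat:.3f}, b1={b1_hat:.3f}, b0={b0_hat:.3f}")
print(f"true:      b2={b2}, b1={b1}, b0={b0}")
```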

Let's plot the sample data as a scatter plot.

In [None]:
plt.scatter(X, y, color='red')
plt.show()

In [None]:
import pandas as pd

df = pd.DataFrame({'X': X, 'y': y})
df

In [None]:
df.to_csv('./data/c02_dataset_4.csv', index=False)
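
Note that `to_csv` does not create missing directories, so the cell above fails if `./data` does not exist. A minimal sketch of creating the folder first and then reading the file back to confirm the round trip is shown below. It writes a tiny stand-in DataFrame to a temporary directory, and the file name `example.csv` is a placeholder rather than the dataset path used above.

```python
import tempfile
from pathlib import Path

import numpy as np
import pandas as pd

# small stand-in for the real DataFrame, just to demonstrate the pattern
df = pd.DataFrame({'X': np.array([1.0, 2.5]), 'y': np.array([3.25, -4.0])})

with tempfile.TemporaryDirectory() as tmp:
    out_path = Path(tmp) / 'data' / 'example.csv'  # placeholder path
    # create the target folder (and any parents) if it is missing
    out_path.parent.mkdir(parents=True, exist_ok=True)
    df.to_csv(out_path, index=False)
    # read the file back to check that the columns and values survived
    restored = pd.read_csv(out_path)

print(restored)
```

In the notebook itself, the same idea amounts to calling `Path('./data').mkdir(parents=True, exist_ok=True)` before `df.to_csv(...)`.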