# Classification & Regression Challenge

In this exercise you will fill in the blanks (____) in each section between

**### YOUR CODE HERE**

and

**### END CODE**

The purpose of this exercise is to reaffirm your understanding of the difference between classification and regression tasks, and to familiarize you with some of the tools available to carry them out.
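The practical difference shows up in what each model predicts: a regressor returns a continuous number, while a classifier returns a discrete label (and, for logistic regression, a class probability). A minimal sketch on synthetic data (an assumption; `X_demo`, `reg` and `clf` are illustrative names, not part of the exercise, chosen so they do not overwrite the notebook's variables):

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

rng = np.random.default_rng(0)

# Synthetic single-feature data (assumption: made up purely for illustration)
X_demo = rng.uniform(0, 10, size=(50, 1))

# Regression target: a continuous number (a noisy straight line y = 2x + 1)
y_cont = 2.0 * X_demo[:, 0] + 1.0 + rng.normal(0, 0.5, size=50)

# Classification target: a discrete label (1 if x > 5, else 0)
y_label = (X_demo[:, 0] > 5).astype(int)

reg = LinearRegression().fit(X_demo, y_cont)
clf = LogisticRegression(max_iter=1000).fit(X_demo, y_label)

x_new = np.array([[2.0], [8.0]])
print(reg.predict(x_new))        # continuous values, roughly 5 and 17
print(clf.predict(x_new))        # class labels drawn from {0, 1}
print(clf.predict_proba(x_new))  # class probabilities; each row sums to 1
```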

Do not alter code outside the designated areas; if you want to experiment, create a duplicate cell for exploration instead.

In [None]:
# Packages required
import matplotlib.pyplot as plt
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression
import warnings

warnings.filterwarnings(action="ignore", module="scipy", message="^internal gelsd")

%matplotlib inline

np.random.seed(42)

In [None]:
# The two models to be used in this exercise (keep note of the variable names)
linear_regression = LinearRegression()
logistic_regression = LogisticRegression()

## Problem A

Model profit as a function of population.

In [None]:
# Profit data: 
# population,profit
filename = 'data/profit.csv'

### YOUR CODE HERE
data = np.loadtxt(fname=____, delimiter=____) # fill in the fname and delimiter arguments
### END CODE

X, y = np.hsplit(data, 2) # Separate features (x) from target (y)
print(X[1],y[1])

**Expected output:** 

\[ 5.5277\] \[ 9.1302\]
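`np.hsplit(data, 2)` splits the columns into two equal parts, and each part stays 2-D. That is why `X[1]` and `y[1]` print as one-element arrays, and it also matches the `(n_samples, n_features)` shape scikit-learn expects for `X`. A minimal sketch on a small made-up array (`data_demo` is hypothetical, not rows from `profit.csv`):

```python
import numpy as np

# Hypothetical two-column array standing in for population,profit rows
data_demo = np.array([[1.0, 10.0],
                      [2.0, 20.0],
                      [3.0, 30.0]])

# Split the columns into two equal halves: features on the left, target on the right
X_demo, y_demo = np.hsplit(data_demo, 2)

print(X_demo.shape, y_demo.shape)  # (3, 1) (3, 1): both stay 2-D column arrays
print(X_demo[1], y_demo[1])        # [2.] [20.]
```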

In [None]:
# Plot the data
plt.plot(X, y, 'bx', label='Data')
plt.xlabel('Population')
plt.ylabel('Profit')
plt.legend()
plt.show()

In [None]:
# Determine the number of training examples

### YOUR CODE HERE
m = ____
print(m)
### END CODE


**Expected output:**

97

In [None]:
# Select the correct model and fit
### YOUR CODE HERE
model = ____           # select the appropriate model (linear_regression or logistic_regression)
model.fit(____, ____)  # pass the correct arguments to the fit() method
### END CODE

# Obtain coefficients theta0 and theta1 from model
theta0, theta1 = model.intercept_, model.coef_[0]
print(theta0,theta1)

**Expected output:**

\[-3.89578088\] \[1.19303364\]
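Linear regression fits theta0 and theta1 by ordinary least squares. As a sanity check, the sketch below (on synthetic data, not `profit.csv`; the names `X_demo`, `lin` and `theta_ols` are illustrative) solves the same least-squares problem with `np.linalg.lstsq` and compares the result with `LinearRegression`. It also shows why the expected output is wrapped in brackets: when `y` is a 2-D column, `intercept_` has shape `(1,)` and `coef_` has shape `(1, 1)`, so `coef_[0]` is still a one-element array.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)

# Synthetic single-feature data (assumption: not the profit.csv file)
X_demo = rng.uniform(5, 20, size=(100, 1))
# y is a (100, 1) column, like the y produced by np.hsplit
y_demo = 1.2 * X_demo - 4.0 + rng.normal(0, 1.0, size=(100, 1))

lin = LinearRegression().fit(X_demo, y_demo)

# Ordinary least squares by hand: prepend a column of ones so theta0 acts as the intercept
X_design = np.hstack([np.ones((X_demo.shape[0], 1)), X_demo])
theta_ols, *_ = np.linalg.lstsq(X_design, y_demo, rcond=None)

print(lin.intercept_, lin.coef_)  # shapes (1,) and (1, 1) because y is 2-D
print(theta_ols.ravel())          # [theta0, theta1], matching LinearRegression
```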

In [None]:
# Plot data with trained regression line
plt.plot(X, y, 'bx', label='Data')
plt.plot(X, model.predict(X), 'r-', label='Regression')
plt.xlabel('Population')
plt.ylabel('Profit')
plt.legend()
plt.show()

## Problem B

Model college admission as a function of two exam scores.

In [None]:
# Admission data: 
# exam_score_1,exam_score_2,admission
filename = 'data/acceptance.csv'
data = np.loadtxt(filename, delimiter=',')
# Separate features (x1, x2) from target (y)
X, y = np.hsplit(data, np.array([2]))
y = y.ravel()
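Here `np.hsplit` receives an array of split indices rather than a section count: `np.array([2])` cuts before column 2, giving a two-column feature matrix and a one-column target. `y.ravel()` then flattens the target to 1-D; scikit-learn classifiers expect a 1-D `y` and emit a `DataConversionWarning` when given a column vector. A sketch with a hypothetical three-row array (`data_demo` is made up, not rows from `acceptance.csv`):

```python
import numpy as np

# Hypothetical 3-column array: exam_score_1, exam_score_2, admission
data_demo = np.array([[10.0, 20.0, 0.0],
                      [30.0, 40.0, 1.0],
                      [50.0, 60.0, 1.0]])

# Split before column index 2: columns 0-1 become features, column 2 the target
X_demo, y_demo = np.hsplit(data_demo, np.array([2]))
print(X_demo.shape, y_demo.shape)  # (3, 2) (3, 1)

# Flatten the (n, 1) column into the 1-D label vector classifiers expect
y_demo = y_demo.ravel()
print(y_demo.shape, y_demo)        # (3,) [0. 1. 1.]
```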

In [None]:
fig, ax = plt.subplots()

# Plot data
y_pos = y == 1
y_neg = y == 0
ax.plot(X[y_pos,0], X[y_pos,1], 'g+', label='Admitted')
ax.plot(X[y_neg,0], X[y_neg,1], 'ro', label='Not admitted')
ax.set_xlabel('Exam 1 score')
ax.set_ylabel('Exam 2 score')
ax.legend(loc='upper right')
plt.show()

In [None]:
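# Select the correct model and fit
### YOUR CODE HERE
model_b = ____            # select the appropriate model (linear_regression or logistic_regression)
model_b.fit(____, ____)   # pass the correct arguments to the fit() method
### END CODE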

# Obtain coefficients theta0, theta1, theta2
theta0 = model_b.intercept_[0]
theta1 = model_b.coef_[0,0]
theta2 = model_b.coef_[0,1]
print(theta0, theta1, theta2)

**Expected output:**

-3.8997779447047662 0.038444815554882487 0.031018545562908596
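Logistic regression models the probability of admission as the sigmoid of a linear score, P(y=1 | x) = 1 / (1 + exp(-(theta0 + theta1*x1 + theta2*x2))). The sketch below, on synthetic exam-like data (an assumption; `b0`, `w1`, `w2` and `clf` are illustrative names chosen so they do not overwrite the notebook's `theta` variables), checks that applying the sigmoid to the fitted coefficients by hand reproduces `predict_proba`.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)

# Synthetic exam-like data (assumption): label is 1 when the noisy score sum exceeds 130
X_demo = rng.uniform(30, 100, size=(200, 2))
y_demo = (X_demo.sum(axis=1) + rng.normal(0, 10, size=200) > 130).astype(int)

clf = LogisticRegression(max_iter=1000).fit(X_demo, y_demo)
b0 = clf.intercept_[0]  # intercept (plays the role of theta0)
w1, w2 = clf.coef_[0]   # feature weights (theta1, theta2)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Linear score for every example, then squash it into a probability
z = b0 + w1 * X_demo[:, 0] + w2 * X_demo[:, 1]
p_manual = sigmoid(z)
p_sklearn = clf.predict_proba(X_demo)[:, 1]  # column 1 = probability of class 1

print(np.allclose(p_manual, p_sklearn))  # True
```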

In [None]:
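# Decision boundary: P(y=1) = sigmoid(theta0 + theta1*x1 + theta2*x2) equals 0.5
# exactly when the linear score is 0, so solve theta0 + theta1*x1 + theta2*x2 = 0 for x2
def x2(x1):
    return -(theta0 + theta1*x1) / theta2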
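The decision boundary is where the predicted probability equals 0.5. Since sigmoid(z) = 0.5 exactly when z = 0, the boundary is the line theta0 + theta1*x1 + theta2*x2 = 0, i.e. x2 = -(theta0 + theta1*x1) / theta2. A minimal check on synthetic data (an assumption; `boundary_x2`, `b0`, `w1` and `w2` are hypothetical names) evaluates `predict_proba` at points on that line:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(3)

# Synthetic exam-like data (assumption, not acceptance.csv)
X_demo = rng.uniform(30, 100, size=(200, 2))
y_demo = (X_demo.sum(axis=1) + rng.normal(0, 10, size=200) > 130).astype(int)

clf = LogisticRegression(max_iter=1000).fit(X_demo, y_demo)
b0 = clf.intercept_[0]
w1, w2 = clf.coef_[0]

def boundary_x2(x1):
    # sigmoid(z) = 0.5 exactly when z = b0 + w1*x1 + w2*x2 = 0; solve for x2
    return -(b0 + w1 * x1) / w2

x1_line = np.array([40.0, 60.0, 80.0])
points = np.column_stack([x1_line, boundary_x2(x1_line)])
print(clf.predict_proba(points)[:, 1])  # each value is 0.5 up to floating-point error
```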

In [None]:
x1_min = X[:,0].min()
x1_max = X[:,0].max()

In [None]:
# x1 and x2 data of linear decision boundary
x1_plot = np.array([x1_min, x1_max])
x2_plot = x2(x1_plot)

In [None]:
fig, ax = plt.subplots()

# Plot examples and decision boundary
y_pos = y == 1
y_neg = y == 0
ax.plot(X[y_pos,0], X[y_pos,1], 'g+', label='Admitted')
ax.plot(X[y_neg,0], X[y_neg,1], 'ro', label='Not admitted')
ax.set_xlabel('Exam 1 score')
ax.set_ylabel('Exam 2 score')
ax.legend(loc='upper right')

# Plot decision boundary
ax.plot(x1_plot, x2_plot)
plt.show()

## BONUS

Consider potential issues with how we have trained these models:
- Can we trust them to generalize well to new, unseen data?
- Are there ways we can check this?
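One common way to check generalization is to hold out data the model never sees during training, or to repeat that idea across several splits with k-fold cross-validation. A sketch on synthetic data (an assumption, not `acceptance.csv`; `X_demo`, `clf` and `scores` are illustrative names), using scikit-learn's `train_test_split` and `cross_val_score`:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split, cross_val_score

rng = np.random.default_rng(4)

# Synthetic two-feature classification data (assumption, not acceptance.csv)
X_demo = rng.uniform(30, 100, size=(200, 2))
y_demo = (X_demo.sum(axis=1) + rng.normal(0, 10, size=200) > 130).astype(int)

# 1) Hold-out check: fit on 80% of the rows, score on the 20% the model never saw
X_train, X_test, y_train, y_test = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=0, stratify=y_demo)
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("train accuracy:", clf.score(X_train, y_train))
print("test accuracy: ", clf.score(X_test, y_test))

# 2) 5-fold cross-validation: repeat the hold-out idea on five different splits
scores = cross_val_score(LogisticRegression(max_iter=1000), X_demo, y_demo, cv=5)
print("CV accuracy: %.3f +/- %.3f" % (scores.mean(), scores.std()))
```

A test (or cross-validated) score much lower than the training score suggests overfitting. The same functions work with `LinearRegression`, where the default score is R². Note that a random split only tests inputs drawn from the same range as the training data; it says little about predictions far outside that range, such as populations much larger than any in `profit.csv`.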