# Chapter 1

- Machine Learning : Ability of machines to learn to make decisions from data
- Supervised Learning : Machine Learning on labeled data
- Unsupervised Learning : Machine Learning on unlabeled data


### Supervised Learning

- Machine Learning on labeled data
- Predict target values of unseen data, given the features
- Feature = predictor variable = independent variable
- Target variable = dependent variable = response variable
- 2 types:
    - Classification : Target is a category
    - Regression : Target is a continuous value
- Requirements for supervised learning (as expected by scikit-learn)
    - Data must have no missing values
    - Data must be numeric
    - X should be a 2D array (samples x features), y should be a 1D array
    - Data must be in an array (in Python, a numpy array)
    - Perform EDA to check that the data is formatted correctly
- Generic Workflow:
    1. Create a model
    2. Fit the model to the training data
    3. Use the fitted model to predict on the test data
- Some popular algorithms:
    - KNN (k-nearest neighbors)
    - Linear Regression
    - Ridge Regression (linear regression with an L2 penalty on large coefficients)
    - Lasso Regression (linear regression with an L1 penalty on large coefficients)
- Metrics for measuring model performance
    - Accuracy, precision, recall and F1 score for classification
    - r-squared (proportion of variance explained) and RMSE (typical size of the prediction error) for regression
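
The metrics listed above can be computed with `sklearn.metrics`. The sketch below uses small made-up label and value arrays (they are assumptions for illustration, not data from these notes) so that each number can be checked by hand.

```python
import numpy as np
from sklearn.metrics import (accuracy_score, f1_score, mean_squared_error,
                             precision_score, r2_score, recall_score)

# Hypothetical classification labels (1 = positive class)
y_true_cls = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_pred_cls = np.array([1, 0, 0, 1, 0, 1, 1, 0])

accuracy = accuracy_score(y_true_cls, y_pred_cls)    # share of correct predictions
precision = precision_score(y_true_cls, y_pred_cls)  # TP / (TP + FP)
recall = recall_score(y_true_cls, y_pred_cls)        # TP / (TP + FN)
f1 = f1_score(y_true_cls, y_pred_cls)                # harmonic mean of precision and recall
print(accuracy, precision, recall, f1)

# Hypothetical regression targets and predictions
y_true_reg = np.array([3.0, 5.0, 7.0, 9.0])
y_pred_reg = np.array([2.0, 5.0, 8.0, 9.0])

r_squared = r2_score(y_true_reg, y_pred_reg)
rmse = np.sqrt(mean_squared_error(y_true_reg, y_pred_reg))
print(r_squared, rmse)
```

Here 6 of 8 labels match, with 3 true positives, 1 false positive and 1 false negative, so all four classification metrics come out at 0.75.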

### Python Supervised Learning Workflow

```
# Template: replace `module` and `Model` with a real scikit-learn module and estimator
from sklearn.module import Model
model = Model()
model.fit(X, y)
predictions = model.predict(X_new)
print(predictions)
```
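
As a concrete instance of the template above, here is a minimal sketch that uses `KNeighborsClassifier` in place of `Model`. The toy arrays are made up for illustration, and the checks before fitting mirror the requirements for supervised learning (numeric data, no missing values, 2D `X`, 1D `y`).

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical toy data: two features per sample, binary target
X = np.array([[1.0, 1.1], [1.2, 0.9], [0.9, 1.0],
              [5.0, 5.2], [5.1, 4.9], [4.8, 5.0]])
y = np.array([0, 0, 0, 1, 1, 1])

# Check the requirements before fitting
assert X.ndim == 2 and y.ndim == 1        # X is 2D, y is 1D
assert X.shape[0] == y.shape[0]           # one target per sample
assert np.issubdtype(X.dtype, np.number)  # data is numeric
assert not np.isnan(X).any()              # no missing values

# Create, fit, predict
model = KNeighborsClassifier(n_neighbors=3)
model.fit(X, y)
X_new = np.array([[1.0, 1.0], [5.0, 5.0]])
predictions = model.predict(X_new)
print(predictions)
```

Each new point sits inside one of the two clusters, so its three nearest neighbors all share one label.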

### KNN 

- Classifies a data point by majority vote among its k nearest neighbors in the training set
- Large k = simpler model = risk of underfitting = less able to detect relationships in the data
- Small k = more complex model = risk of overfitting = more sensitive to noise
- Complexity graph : k on X-axis, model accuracy on the train and test sets on Y-axis [for different values of k]
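
The majority vote can be sketched in a few lines of NumPy. This is an illustrative implementation, not how scikit-learn does it internally; the function name `knn_predict` and the toy points are made up. One point near cluster "a" is deliberately mislabeled "b" to show why a small k is sensitive to noise.

```python
from collections import Counter

import numpy as np


def knn_predict(X_train, y_train, x_new, k=3):
    """Return the majority label among the k training points closest to x_new."""
    distances = np.linalg.norm(X_train - x_new, axis=1)  # Euclidean distances
    nearest = np.argsort(distances)[:k]                  # indices of the k closest points
    votes = Counter(y_train[nearest].tolist())
    return votes.most_common(1)[0][0]


# Two clusters, plus one noisy point at (0.4, 0.4) labeled "b"
X_train = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.4, 0.4],
                    [5.0, 5.0], [5.0, 6.0], [6.0, 5.0]])
y_train = np.array(["a", "a", "a", "b", "b", "b", "b"])

query = np.array([0.5, 0.5])
label_k1 = knn_predict(X_train, y_train, query, k=1)  # follows the noisy point
label_k3 = knn_predict(X_train, y_train, query, k=3)  # noisy point is outvoted
print(label_k1, label_k3)
```

With k=1 the prediction copies the single noisy neighbor; with k=3 the two correctly labeled neighbors outvote it.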


```
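import matplotlib.pyplot as plt
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler
# First convert dataset to numpy since sklearn uses numpy
# (df is assumed to be a DataFrame with a 'target' column)
y = df['target'].values
X = df.drop('target', axis=1).values.astype(float)
# Split dataset first so that the test set does not leak into the scaler
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
# Normalize: fit the scaler on the training set only, then transform both sets
scaler = StandardScaler().fit(X_train)
X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)
# Initialize and train model
knn = KNeighborsClassifier(n_neighbors=5, weights='uniform', metric='minkowski')
knn.fit(X_train, y_train)
# Predict the test set class with the trained model
predicted_y = knn.predict(X_test)
# Estimate class probabilities for the test set with the trained model
predicted_y_prob = knn.predict_proba(X_test)
# Measure accuracy (in percent) on testing set
print(accuracy_score(y_test, predicted_y) * 100)
# Repeat the steps above for different values of k and record the accuracies
neighbors = list(range(1, 11))
train_accuracies, test_accuracies = {}, {}
std_acc = np.zeros(len(neighbors))
for i, k in enumerate(neighbors):
    knn_k = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
    correct = knn_k.predict(X_test) == y_test
    train_accuracies[k] = knn_k.score(X_train, y_train)
    test_accuracies[k] = correct.mean()
    # Standard error of the test accuracy for this k
    std_acc[i] = correct.std() / np.sqrt(len(y_test))
mean_acc = np.array(list(test_accuracies.values()))
# Visualize test accuracy for different k with +/- 1 and +/- 3 std bands
plt.plot(neighbors, mean_acc, 'g')
plt.fill_between(neighbors, mean_acc - 1 * std_acc, mean_acc + 1 * std_acc, alpha=0.10)
plt.fill_between(neighbors, mean_acc - 3 * std_acc, mean_acc + 3 * std_acc, alpha=0.10, color="green")
plt.legend(('Accuracy', '+/- 1xstd', '+/- 3xstd'))
plt.ylabel('Accuracy')
plt.xlabel('Number of Neighbors (K)')
plt.tight_layout()
plt.show()
# Plot complexity graph with train and test accuracies
plt.plot(neighbors, list(train_accuracies.values()), label="Training Accuracy")
plt.plot(neighbors, list(test_accuracies.values()), label="Testing Accuracy")
plt.xlabel('Number of Neighbors (K)')
plt.ylabel('Accuracy')
plt.legend()
plt.show()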
```
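
One way to keep scaling and model fitting together, so that the scaler only ever sees the training data, is scikit-learn's `make_pipeline`. This is an alternative arrangement of the same steps, not something these notes prescribe. The bundled iris dataset is used here only as a stand-in for `df`.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Stand-in dataset (an assumption for illustration)
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, random_state=42, stratify=y
)

# fit() scales using training statistics only; score() reuses them on the test set
model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
model.fit(X_train, y_train)
test_accuracy = model.score(X_test, y_test)
print(f"Test accuracy: {test_accuracy:.2f}")
```

Because the pipeline behaves like a single estimator, it can also be passed to `cross_val_score`, in which case the scaler is refitted inside every fold.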

# Chapter 2

### Linear Regression

- Linear regression is a method that lets us understand the relationship between dependent and independent variables.
- It predicts a continuous value. Example: y = mx + c
- y is the target, x is the independent feature, m is the slope, c is the intercept
- By training on many datapoints, the model estimates the values of c and m. Then, given any value of x, the model can estimate the value of y.
- Simple linear regression = Use 1 independent variable for predicting 1 dependent variable
- Multiple linear regression = Use multiple independent variables for predicting 1 dependent variable
- Noise : 
    - The error in prediction that the line cannot explain.
    - Linear regression assumes this error is gaussian noise, so the residuals should be approximately normally distributed.
    - The larger the error, the more spread out the normal distribution (larger sigma, the standard deviation of the normal distribution)
    - The best fit line is the one with the smallest total error.
- We define an error function of m and c, then choose the values of m and c that minimize it.
- Error function = loss function = cost function
- Residual = distance between datapoint and the fitted line
- Our loss function can be RSS (Residual Sum of Squares), the sum of the squared residuals, and our goal is to minimize this value.
- metrics 
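    - r-squared : proportion of the variance in target values explained by the features. Range is 0 to 1 on the training data; on unseen data it can be negative when the model does worse than predicting the mean.
    - RMSE : Root Mean Squared Error (typical size of the prediction error, in the same units as the target)
- cross validation : repeat the train-test process over multiple folds and average the scores, so that r-squared does not depend on one particular split.
- Regularization : penalizes large coefficients to reduce overfitting. Some regressions that use regularization: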
    - Lasso Regression
    - Ridge Regression
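- Hyperparameter : a setting chosen before fitting (for example k in KNN or alpha in Ridge and Lasso) that controls how the model parameters are learned. It is not learned from the data.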
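
For simple linear regression, the m and c that minimize RSS have a closed form. The sketch below computes them by hand on made-up data (roughly y = 2x + 1, an assumption for illustration) and then derives RSS, r-squared and RMSE from the residuals.

```python
import numpy as np

# Hypothetical data roughly following y = 2x + 1
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([3.1, 4.9, 7.2, 8.8, 11.0])

# Closed-form least-squares estimates of slope m and intercept c
m = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
c = y.mean() - m * x.mean()

# Residual = observed value minus the value on the fitted line
residuals = y - (m * x + c)
rss = np.sum(residuals ** 2)                       # loss being minimized
r_squared = 1 - rss / np.sum((y - y.mean()) ** 2)  # share of variance explained
rmse = np.sqrt(rss / len(y))                       # typical error, in units of y

# RSS of a different line, for comparison (should not be smaller)
rss_other = np.sum((y - (2.0 * x + 1.0)) ** 2)
print(m, c, rss, r_squared, rmse, rss_other)
```

On this data the estimates are m = 1.97 and c = 1.09, and no other line, including y = 2x + 1, gives a smaller RSS.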


```
import matplotlib.pyplot as plt
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import KFold, cross_val_score, train_test_split
# X (2D features) and y (1D target) are assumed to be numpy arrays taken from df
# Split the dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
# Construct model
lm=LinearRegression()
# Simple linear regression uses 1 column with eqn: y = mx + c
# Multiple linear regression uses multiple columns with eqn: z = mx + ny +c
# Fit the model
lm.fit(X_train, y_train)
# Predicted estimation
y_pred = lm.predict(X_test)
# Intercept (c) of the line (also known as the bias coefficient)
intercept = lm.intercept_
# Slope (m) of the line y=mx+c; one coefficient per feature
slope = lm.coef_
# Proportion of the variance in target values explained by the features
rsquared = lm.score(X_test, y_test)
# RMSE : typical size of the prediction error, in the units of the target
# (np.sqrt works across scikit-learn versions, unlike the deprecated squared=False argument)
rmse = np.sqrt(mean_squared_error(y_test, y_pred))
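# Prediction for a specific range of x values
# (the shape (-1, 1) only works when the model was trained on a single feature)
new_x = np.arange(1, 101, 1).reshape(-1, 1)  # Or you can make it a dataframe
new_pred_y = lm.predict(new_x)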

# Do k-fold cross validation
kf = KFold(n_splits=6, shuffle=True, random_state=42)
cv_results = cross_val_score(lm, X_train, y_train, cv=kf)
# Mean, std and 95% interval (2.5th to 97.5th percentile) of the r-squared scores across folds
print(np.mean(cv_results), np.std(cv_results), np.quantile(cv_results, [0.025, 0.975]))

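# Visualize feature importance via the coefficients
# (coefficients are only comparable when the features are on the same scale)
names = df.drop("target", axis=1).columns
# Note: this refits lm on the full dataset
importance = lm.fit(X, y).coef_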
plt.bar(names, importance)
plt.xticks(rotation=45)
plt.show()
```

### Ridge Regression and Lasso Regression

- Ridge and Lasso regression have a regularization parameter alpha, which plays a role similar to k in KNN.
- alpha controls model complexity.
- large alpha = stronger penalty = simpler model = risk of underfitting
- small alpha = weaker penalty = more complex model = risk of overfitting
- Complexity graph : alpha on X-axis and r-squared on y-axis
- Lasso regression can select the important features of a dataset (it shrinks the coefficients of less important features to exactly 0)
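
The difference between the two penalties can be seen on synthetic data. In the sketch below, the data, the seed and the alpha values are assumptions chosen for illustration: only the first two of five features influence the target, so a sufficiently strong Lasso penalty should set the other three coefficients to exactly 0, while Ridge only shrinks them.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

# Synthetic data: only features 0 and 1 carry signal
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=200)

lasso = Lasso(alpha=0.5).fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)
print("Lasso:", np.round(lasso.coef_, 3))
print("Ridge:", np.round(ridge.coef_, 3))

# Larger alpha shrinks the coefficients more (simpler model)
norm_small_alpha = np.linalg.norm(Ridge(alpha=0.1).fit(X, y).coef_)
norm_large_alpha = np.linalg.norm(Ridge(alpha=1000.0).fit(X, y).coef_)
print(norm_small_alpha, norm_large_alpha)
```

The zeroed Lasso coefficients are what makes it usable for feature selection; the kept coefficients are also shrunk toward 0, so they underestimate the true effects.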

```
from sklearn.linear_model import Ridge
ridge_scores = []
for alpha in [0.1, 1.0, 10.0, 100.0, 1000.0]:
    ridge = Ridge(alpha=alpha)
    ridge.fit(X_train, y_train)
    y_pred = ridge.predict(X_test)
    ridge_scores.append(ridge.score(X_test, y_test))


from sklearn.linear_model import Lasso
lasso_scores = []
for alpha in [0.01, 1.0, 10.0, 20.0, 50.0]:
    lasso = Lasso(alpha=alpha)
    lasso.fit(X_train, y_train)
    y_pred = lasso.predict(X_test)
    lasso_scores.append(lasso.score(X_test, y_test))

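# Visualize feature importance
# (uses the last alpha tried in each loop and refits both models on the full dataset)
import matplotlib.pyplot as plt
names = df.drop("target", axis=1).columns
lasso_coef = lasso.fit(X, y).coef_
ridge_coef = ridge.fit(X, y).coef_
# Semi-transparent bars so the overlapping Lasso and Ridge bars stay visible
plt.bar(names, lasso_coef, alpha=0.5, label="Lasso coefficients")
plt.bar(names, ridge_coef, alpha=0.5, label="Ridge coefficients")
plt.xticks(rotation=45)
plt.legend()
plt.show()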
```