# Intro to ML with scikit-learn
*Meghana Bhimasani |  May 1 2020*

Machine learning is broken down into 3 main categories:

## Supervised Learning:

Supervised machine learning models are trained on data whose potential outcomes are known in advance (e.g., a category or a numeric range), and those known outcomes are used to correct the model's predictions. With supervised ML, both the input features and the output labels are defined.

Example 1: Using data such as credit score, credit history, income, etc., we are trying to predict whether an individual is a credit-risk or not. (Known Category: “Credit-Risk” vs. “Not Credit-Risk”)

Example 2: Using features such as Number of Bedrooms, Square Feet, etc., we are trying to predict the market value of a house. (Numeric Range: 50K – 500K)
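
The two examples above correspond to classification and regression. Here is a minimal sketch of both in scikit-learn. The data is synthetic (generated with `make_classification` and `make_regression`), so it only stands in for real credit or housing data, and the choice of `LogisticRegression` and `LinearRegression` is an assumption made for illustration.

```python
from sklearn.datasets import make_classification, make_regression
from sklearn.linear_model import LinearRegression, LogisticRegression

# Classification: synthetic stand-in for "Credit-Risk" vs. "Not Credit-Risk".
X_cls, y_cls = make_classification(n_samples=200, n_features=4, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X_cls, y_cls)
predicted_classes = clf.predict(X_cls[:5])  # each prediction is a known label: 0 or 1

# Regression: synthetic stand-in for predicting the market value of a house.
X_reg, y_reg = make_regression(n_samples=200, n_features=3, noise=10.0, random_state=0)
reg = LinearRegression().fit(X_reg, y_reg)
predicted_values = reg.predict(X_reg[:5])  # each prediction is a continuous number

print(predicted_classes)
print(predicted_values)
```

In both cases the labels (`y_cls`, `y_reg`) are passed to `fit`, which is what makes the problem supervised.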

## Unsupervised Learning: 

Unsupervised machine learning models utilize algorithms for which the potential outcomes are unlabeled. Inferences are made directly from the data without feedback from known outcomes or labels. Since the dataset in unsupervised ML is unlabeled, the goal is to discover hidden relationships or groupings in the data.

Example: An advertising platform segments the U.S. population into smaller groups with similar demographics and purchasing habits so that advertisers can reach their target market with relevant ads.
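
A segmentation task like the one above is often approached with clustering. The sketch below uses `KMeans` on synthetic two-dimensional data. The data, the choice of three clusters, and the variable names are assumptions for illustration and are not taken from a real advertising dataset.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic "customers" described by two numeric features.
# make_blobs also returns labels, but we discard them: the model never sees labels.
X, _ = make_blobs(n_samples=300, centers=3, n_features=2, random_state=0)

# Ask for three segments; in practice the number of clusters is a choice to explore.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
segments = kmeans.fit_predict(X)  # one cluster index (0, 1, or 2) per sample

print(segments[:10])
print(kmeans.cluster_centers_)
```

Note that `fit_predict` receives only `X`. The cluster indices it returns are arbitrary group IDs, not predefined categories.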

## Reinforcement Learning:
Reinforcement learning is a technique in which an agent takes actions and interacts with its environment so as to maximize its total reward. It essentially models decision-making processes. An example of this would be a computer AI for a chess game.
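
Chess is far too large for a short example, so here is a much smaller stand-in: a three-armed bandit solved with an epsilon-greedy agent. This is a minimal sketch of the "act, observe reward, update" loop written with NumPy only (scikit-learn does not provide reinforcement learning). The reward probabilities, `epsilon`, and the number of steps are all made-up values.

```python
import numpy as np

rng = np.random.default_rng(0)

# Environment: three actions, each paying a reward of 1 with a fixed probability.
# These probabilities are hidden from the agent (hypothetical values).
true_reward_prob = np.array([0.2, 0.5, 0.8])
n_actions = len(true_reward_prob)

value_estimates = np.zeros(n_actions)     # agent's running estimate of each action's reward
pull_counts = np.zeros(n_actions, dtype=int)
epsilon = 0.1                             # fraction of steps spent exploring at random
total_reward = 0.0

for step in range(2000):
    if rng.random() < epsilon:
        action = int(rng.integers(n_actions))      # explore: pick a random action
    else:
        action = int(np.argmax(value_estimates))   # exploit: pick the best-looking action
    reward = float(rng.random() < true_reward_prob[action])
    pull_counts[action] += 1
    # Incremental mean: move the estimate toward the observed reward.
    value_estimates[action] += (reward - value_estimates[action]) / pull_counts[action]
    total_reward += reward

best_action = int(np.argmax(value_estimates))
print(best_action, value_estimates, total_reward)
```

The agent is never told which action is best. It learns that only from the rewards it collects while interacting with the environment.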

Here is a breakdown of the main machine learning categories and some real-world applications of each:

<img src="img/MLcategories.png" width=75% height=75% />

Here are some applications of supervised and unsupervised machine learning used specifically with resting-state fMRI data. (Image taken from Arefeh Sherafati's Theory presentation on "Machine learning in resting-state fMRI analysis" [https://www.sciencedirect.com/science/article/abs/pii/S0730725X18306854])

<img src="img/ML_applications_in_rsfMRI.png" width=75% height=75% />

### Okay, it's really cool that we now know where machine learning is used in the real world, but which models are used within each category?

<img src="img/machinelearningtypes.jpg" width=60% height=60% />

### But regardless of the problem type, in Machine Learning we follow a familiar paradigm:
**1. Model**

**2. Fit (Train)**

**3. Predict**
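
In scikit-learn these three steps map directly onto code. A minimal sketch, assuming a `LinearRegression` model and a toy dataset that follows y = 2x + 1 (both chosen here purely for illustration):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy data that follows y = 2x + 1 exactly.
X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([3.0, 5.0, 7.0, 9.0])

model = LinearRegression()                      # 1. Model: choose and create an estimator
model.fit(X, y)                                 # 2. Fit (Train): learn parameters from data
prediction = model.predict(np.array([[5.0]]))   # 3. Predict: apply the model to new input

print(prediction)
```

Because the toy data is perfectly linear, the prediction for x = 5 comes out at approximately 11. Swapping in a different estimator usually changes only the first step; `fit` and `predict` are called the same way.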

<img src="img/MLparadigm.png" width=60% height=60% />

### There are so many models though, how do we choose the right model for our problem type?

Below is a flowchart from scikit-learn's site that gives a rough guide on how to find potential algorithms/models/estimators to try based on the dataset and the problem type to be solved.

<img src="img/scikitlearncheatsheet.png" width=100% height=100% />

### Refresher on Common Scoring Metrics
After selecting and fitting the model, we need to determine how well it performs. There are many ways to score a model, but two common metrics for regression problems are:
1. R2 (R-Squared): This is the baseline metric that many ML tools report as the score. Higher R2 values signify that the model explains more of the variation in the target. An R2 value of 0.90 means that our model accounts for roughly 90% of the variability of the data.

2. MSE (Mean Squared Error): This measures the average of the squares of the errors or deviations.


**A "good" MSE score will be close to zero, while a "good" R2 score will be close to 1.**

Note: R2 is the default metric returned by the `score` method of scikit-learn regression models; classification models default to accuracy.
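
To make the two metrics concrete, the sketch below computes them with `sklearn.metrics` and again by hand from their definitions. The `y_true` and `y_pred` values are made-up numbers chosen so the arithmetic is easy to follow.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

y_true = np.array([3.0, 5.0, 7.0, 9.0])  # made-up observed values
y_pred = np.array([2.5, 5.0, 7.5, 9.0])  # made-up model predictions

mse = mean_squared_error(y_true, y_pred)
r2 = r2_score(y_true, y_pred)

# The same quantities computed by hand from their definitions.
ss_res = np.sum((y_true - y_pred) ** 2)         # residual sum of squares
ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares around the mean
mse_manual = ss_res / len(y_true)               # average squared error
r2_manual = 1.0 - ss_res / ss_tot               # share of variation explained

print(mse, r2)
```

Here the squared errors sum to 0.5 over four samples, so the MSE is 0.125. The total variation around the mean is 20, so R2 is 1 - 0.5/20 = 0.975.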

### Use of Training and Testing Data
In order to quantify how well our model handles new input values, we often split the data into training and testing sets. The model is then fit to the training data and scored and validated on the test data. The `train_test_split` function in scikit-learn's `model_selection` module splits the data into training and testing sets automatically.
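
A minimal sketch of this workflow, assuming synthetic regression data, a 75/25 split, and a `LinearRegression` model (all three are illustrative choices):

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic data: 100 samples, 3 features.
X, y = make_regression(n_samples=100, n_features=3, noise=5.0, random_state=0)

# Hold out 25% of the samples for testing; random_state makes the split repeatable.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

model = LinearRegression().fit(X_train, y_train)  # fit on the training data only
train_r2 = model.score(X_train, y_train)          # R2 on data the model has seen
test_r2 = model.score(X_test, y_test)             # R2 on data the model has not seen

print(X_train.shape, X_test.shape)
print(train_r2, test_r2)
```

The test score is the more honest estimate of how the model will behave on new data. A training score far above the test score is a common sign of overfitting.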

#### References

https://medium.com/technology-nineleaps/popular-machine-learning-algorithms-a574e3835ebb

https://towardsdatascience.com/applications-of-reinforcement-learning-in-real-world-1a94955bcd12

https://medium.com/machine-learning-for-humans/unsupervised-learning-f45587588294


https://scikit-learn.org/stable/tutorial/machine_learning_map/index.html


**Below are some links explaining the differences between various ML libraries (namely scikit-learn, TensorFlow, Keras, and PyTorch)**

https://stackoverflow.com/questions/54527439/differences-in-scikit-learn-keras-or-pytorch

https://www.edureka.co/blog/keras-vs-tensorflow-vs-pytorch/

https://towardsdatascience.com/from-scikit-learn-to-tensorflow-part-1-9ee0b96d4c85


**Below are some more links on different machine learning methods**

Ensemble Methods: https://blog.statsbot.co/ensemble-learning-d1dcd548e936
