# Notebook for the exercises from the Machine Learning A-Z™ course
### Hands-On Python & R In Data Science
https://www.udemy.com/machinelearning/

**Part 1** - Data Preprocessing  
**Part 2** - Regression: Simple Linear Regression, Multiple Linear Regression, Polynomial Regression, SVR, Decision Tree Regression, Random Forest Regression  
**Part 3** - Classification: Logistic Regression, K-NN, SVM, Kernel SVM, Naive Bayes, Decision Tree Classification, Random Forest Classification  
**Part 4** - Clustering: K-Means, Hierarchical Clustering  
**Part 5** - Association Rule Learning: Apriori, Eclat  
**Part 6** - Reinforcement Learning: Upper Confidence Bound, Thompson Sampling  
**Part 7** - Natural Language Processing: Bag-of-words model and algorithms for NLP  
**Part 8** - Deep Learning: Artificial Neural Networks, Convolutional Neural Networks  
**Part 9** - Dimensionality Reduction: PCA, LDA, Kernel PCA  
**Part 10** - Model Selection & Boosting: k-fold Cross Validation, Parameter Tuning, Grid Search, XGBoost  

## Section 1 - Welcome to the course!
### Class 1 - Applications for Machine Learning

1. <strong>Facebook facial recognition</strong>: algorithms tag people automatically
2. <strong>Kinect</strong>: used to play games (uses Random Forest)
3. <strong>Virtual reality headsets</strong>: ML monitors your actions, so when you turn your head the picture moves
4. <strong>Voice recognition or speech-to-text</strong>
5. <strong>Robodog</strong>: reinforcement learning lets the dogs learn to walk on their own
6. <strong>Ads</strong>: Facebook
7. <strong>Recommender systems</strong>: Amazon, Netflix
8. <strong>Medicine</strong>: to save lives
9. <strong>Space</strong>: to recognize certain areas of the world
10. <strong>Mars</strong>: explore new territories 

## Section 2 - Part 1 - Data Pre-processing

Data preprocessing is the **preparation of the dataset** for any machine learning model.  

**It is a crucial step in building an ML model**: without data preprocessing, the model won't work properly.

In [1]:
# Data Preprocessing Template
# Importing the essential libraries

# NumPy contains mathematical tools
import numpy as np
# Library to plot nice charts
import matplotlib.pyplot as plt
# Library to import and manage datasets
import pandas as pd

# Importing the dataset
dataset = pd.read_csv('Data.csv')
# X is the matrix of independent variables (Country, Age, and Salary) -- all rows, all columns except the last one
X = dataset.iloc[:, :-1].values
# y is the dependent variable vector (Purchased) -- all rows, only the last column
y = dataset.iloc[:, -1].values

# # Splitting the dataset into the Training set and Test set
# from sklearn.model_selection import train_test_split
# X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 0)


In [2]:
dataset

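|   | Country | Age  | Salary  | Purchased |
|---|---------|------|---------|-----------|
| 0 | France  | 44.0 | 72000.0 | No        |
| 1 | Spain   | 27.0 | 48000.0 | Yes       |
| 2 | Germany | 30.0 | 54000.0 | No        |
| 3 | Spain   | 38.0 | 61000.0 | No        |
| 4 | Germany | 40.0 | NaN     | Yes       |
| 5 | France  | 35.0 | 58000.0 | Yes       |
| 6 | Spain   | NaN  | 52000.0 | No        |
| 7 | France  | 48.0 | 79000.0 | Yes       |
| 8 | Germany | 50.0 | 83000.0 | No        |
| 9 | France  | 37.0 | 67000.0 | Yes       |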


In [3]:
X

array([['France', 44.0, 72000.0],
       ['Spain', 27.0, 48000.0],
       ['Germany', 30.0, 54000.0],
       ['Spain', 38.0, 61000.0],
       ['Germany', 40.0, nan],
       ['France', 35.0, 58000.0],
       ['Spain', nan, 52000.0],
       ['France', 48.0, 79000.0],
       ['Germany', 50.0, 83000.0],
       ['France', 37.0, 67000.0]], dtype=object)

In [4]:
y

array(['No', 'Yes', 'No', 'No', 'Yes', 'Yes', 'No', 'Yes', 'No', 'Yes'],
      dtype=object)
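
The split into `X` and `y` can be checked without `Data.csv`. The sketch below rebuilds the table shown above as an inline DataFrame (an assumption made only so the example is self-contained) and confirms that the features end up in a 10 x 3 matrix and the target in a vector of length 10.

```python
import numpy as np
import pandas as pd

# Inline copy of the table displayed above; it stands in for Data.csv.
dataset = pd.DataFrame({
    "Country": ["France", "Spain", "Germany", "Spain", "Germany",
                "France", "Spain", "France", "Germany", "France"],
    "Age": [44, 27, 30, 38, 40, 35, np.nan, 48, 50, 37],
    "Salary": [72000, 48000, 54000, 61000, np.nan,
               58000, 52000, 79000, 83000, 67000],
    "Purchased": ["No", "Yes", "No", "No", "Yes",
                  "Yes", "No", "Yes", "No", "Yes"],
})

# All rows, every column except the last: the independent variables.
X = dataset.iloc[:, :-1].values
# All rows, last column only: the dependent variable.
y = dataset.iloc[:, -1].values

print(X.shape)  # (10, 3)
print(y.shape)  # (10,)
```

Using `-1` for the last column keeps the slice valid even if more feature columns are added later.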

## Missing data

First option: remove the observations that contain missing data. This is dangerous, because those rows may hold important information.  
**Better option: replace each missing value with the mean of its column.** Other strategies, such as the median or the most frequent value, can also be used.
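
To make the trade-off concrete, here is a minimal sketch that compares the two options on the numeric columns of the table above. It uses plain pandas (`dropna` and `fillna`) rather than scikit-learn, and the inline DataFrame is an assumption standing in for `Data.csv`.

```python
import numpy as np
import pandas as pd

# Numeric columns of the table above, typed in by hand for this sketch.
df = pd.DataFrame({
    "Age": [44, 27, 30, 38, 40, 35, np.nan, 48, 50, 37],
    "Salary": [72000, 48000, 54000, 61000, np.nan,
               58000, 52000, 79000, 83000, 67000],
})

# Option 1: drop every row that has at least one missing value.
dropped = df.dropna()
print(len(df), len(dropped))  # 10 8

# Option 2: fill each gap with the mean of its column (NaN is skipped
# when the mean is computed).
filled = df.fillna(df.mean())
print(round(filled.loc[6, "Age"], 2))     # 38.78
print(round(filled.loc[4, "Salary"], 2))  # 63777.78
```

Dropping rows discards 2 of the 10 observations here, while mean imputation keeps all of them.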

In [19]:
# Taking care of missing data
# scikit-learn is an ML library that also provides preprocessing tools
# SimpleImputer replaces the older Imputer class, which recent scikit-learn versions no longer ship
from sklearn.impute import SimpleImputer
imputer = SimpleImputer(missing_values = np.nan, strategy = 'mean') # replace each NaN with its column mean

# Fitting the imputer to the numeric columns (Age, Salary) of our matrix of features X
imputer = imputer.fit(X[:, 1:3])
# Applying the transform method to replace the missing data
X[:, 1:3] = imputer.transform(X[:, 1:3])

X



array([['France', 44.0, 72000.0],
       ['Spain', 27.0, 48000.0],
       ['Germany', 30.0, 54000.0],
       ['Spain', 38.0, 61000.0],
       ['Germany', 40.0, 63777.77777777778],
       ['France', 35.0, 58000.0],
       ['Spain', 38.77777777777778, 52000.0],
       ['France', 48.0, 79000.0],
       ['Germany', 50.0, 83000.0],
       ['France', 37.0, 67000.0]], dtype=object)
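
As a variation on the mean strategy, the sketch below applies the median instead. It assumes a recent scikit-learn release, where the imputer is `SimpleImputer` in `sklearn.impute`, and it uses a hand-typed copy of the Age and Salary columns rather than `Data.csv`.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Age and Salary columns from the table above (assumed copy of the data).
numeric = np.array([
    [44, 72000], [27, 48000], [30, 54000], [38, 61000], [40, np.nan],
    [35, 58000], [np.nan, 52000], [48, 79000], [50, 83000], [37, 67000],
])

# Replace each NaN with the median of its column.
median_imputer = SimpleImputer(missing_values=np.nan, strategy="median")
filled = median_imputer.fit_transform(numeric)

print(filled[6, 0])  # 38.0     (median age)
print(filled[4, 1])  # 61000.0  (median salary)
```

The median is less sensitive to outliers than the mean, which can matter for skewed columns such as salaries.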