### What Is Machine Learning? 
**Machine Learning is the science (and art) of programming computers so they can learn from data.**

Here is a slightly more general definition:

> _Machine Learning is the field of study that gives computers the ability to learn without being explicitly programmed._
***—Arthur Samuel, 1959***

And a more engineering-oriented one:

> _A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E._
***—Tom Mitchell, 1997***
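
To make Mitchell's definition concrete, here is a minimal sketch, assuming scikit-learn is installed. The task T is classifying the 8x8 handwritten digits that ship with scikit-learn (`load_digits`). The performance measure P is accuracy on a held-out test set. The experience E is the number of labelled training examples seen. The dataset, the `LogisticRegression` model and the training-set sizes are illustrative choices, not part of the definition.

```python
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Task T: classify 8x8 images of handwritten digits (0-9)
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y)

# Experience E: how many labelled training examples the program gets to see
scores = {}
for n_examples in (50, 200, len(X_train)):
    model = LogisticRegression(max_iter=5000)
    model.fit(X_train[:n_examples], y_train[:n_examples])
    # Performance measure P: accuracy on data the model never saw
    scores[n_examples] = model.score(X_test, y_test)

for n_examples, accuracy in scores.items():
    print(f"E = {n_examples:4d} examples -> P = {accuracy:.3f}")
```

Test accuracy should generally rise as E grows. That is what "learning from experience" means in Mitchell's sense.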



### Types of Machine Learning Systems
There are so many different types of Machine Learning systems that it is useful to classify them in broad categories based on:
- Whether or not they are trained with human supervision (***supervised, unsupervised, semi-supervised, and Reinforcement Learning***)
- Whether or not they can learn incrementally on the fly (***online versus batch learning***)
- Whether they work by simply comparing new data points to known data points, or instead detect patterns in the training data and build a predictive model, much like scientists do (***instance-based versus model-based learning***)
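
The three criteria can be illustrated side by side in scikit-learn. This is a minimal sketch, assuming scikit-learn and NumPy, on a synthetic dataset (y ≈ 3x + 2 plus noise) made up for the example. The estimators are just convenient representatives of each category.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LinearRegression, SGDRegressor
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(42)
X = rng.uniform(0, 1, size=(200, 1))                  # synthetic inputs
y = 3 * X[:, 0] + 2 + rng.normal(0, 0.1, size=200)    # synthetic labels

# Supervised: the algorithm is given the labels y
supervised = LinearRegression().fit(X, y)

# Unsupervised: the algorithm only sees X and must find structure on its own
unsupervised = KMeans(n_clusters=2, n_init=10, random_state=42).fit(X)

# Batch learning: fit() trains on the whole dataset at once (as above).
# Online learning: partial_fit() updates the model one mini-batch at a time.
online = SGDRegressor(random_state=42)
for start in range(0, len(X), 20):
    online.partial_fit(X[start:start + 20], y[start:start + 20])

# Instance-based: predictions come from comparing to stored training points
instance_based = KNeighborsRegressor(n_neighbors=5).fit(X, y)
# Model-based: predictions come from learned parameters (slope and intercept)
model_based = supervised

x_new = [[0.5]]
print("k-NN prediction:  ", instance_based.predict(x_new)[0])
print("linear prediction:", model_based.predict(x_new)[0])
print("learned slope, intercept:", model_based.coef_[0], model_based.intercept_)
```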

#### Some of the most important supervised learning algorithms:
- k-Nearest Neighbours
- Linear Regression
- Logistic Regression
- Support Vector Machines (SVMs)
- Decision Trees and Random Forests
- Neural networks
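
In scikit-learn most of these algorithms share the same `fit`/`score` interface, so swapping one for another is a one-line change. This is a minimal sketch, assuming scikit-learn, with the bundled iris dataset as a stand-in classification problem. Linear Regression is left out because iris is not a regression task. Hyperparameters are mostly defaults and purely illustrative.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y)

classifiers = {
    "k-Nearest Neighbours": KNeighborsClassifier(n_neighbors=5),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "SVM": SVC(kernel="rbf"),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "Random Forest": RandomForestClassifier(random_state=42),
    # Neural networks train more reliably on standardized inputs
    "Neural network (MLP)": make_pipeline(
        StandardScaler(), MLPClassifier(max_iter=2000, random_state=42)),
}

accuracies = {}
for name, clf in classifiers.items():
    clf.fit(X_train, y_train)                     # same training call for every model
    accuracies[name] = clf.score(X_test, y_test)  # mean accuracy on the test set
    print(f"{name:22s} accuracy = {accuracies[name]:.3f}")
```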

#### Some of the most important unsupervised learning algorithms:
- Clustering
    - k-Means
    - Hierarchical Cluster Analysis (HCA)
    - Expectation Maximization
- Visualization and dimensionality reduction
    - Principal Component Analysis (PCA)
    - Kernel PCA
    - Locally-Linear Embedding (LLE)
    - t-distributed Stochastic Neighbour Embedding (t-SNE)
- Association rule learning
    - Apriori
    - Eclat
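
This is a minimal sketch of three of these families, assuming scikit-learn. k-Means and PCA run on the iris measurements with the labels discarded. scikit-learn has no Apriori implementation, so the third part hand-rolls the support of item pairs on a tiny hypothetical list of shopping baskets. Support is the quantity that Apriori and Eclat compare against a minimum threshold.

```python
from collections import Counter
from itertools import combinations

from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X, _ = load_iris(return_X_y=True)   # discard the labels: unsupervised setting

# Clustering: group the 150 flowers into 3 clusters without using labels
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
cluster_ids = kmeans.fit_predict(X)

# Visualization / dimensionality reduction: project 4 features down to 2
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X)
print("variance kept by 2 components:", pca.explained_variance_ratio_.sum())

# Association rule learning on hypothetical baskets (illustration only):
# support of a pair = fraction of baskets containing both items
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"beer", "chips"},
    {"bread", "butter"},
    {"beer", "chips", "milk"},
]
pair_counts = Counter(
    pair for basket in baskets for pair in combinations(sorted(basket), 2))
min_support = 2 / len(baskets)   # keep pairs present in at least 40% of baskets
frequent_pairs = {pair: count / len(baskets)
                  for pair, count in pair_counts.items()
                  if count / len(baskets) >= min_support}
print(frequent_pairs)
```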


### Main Challenges of Machine Learning:

- Insufficient Quantity of Training Data
- Non-representative Training Data (It is crucial to use a training set that is representative of the cases you want to generalize to.)
- Poor-Quality Data
- Irrelevant Features
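
One common guard against a non-representative test set is stratified sampling. It keeps the proportion of each category in the split the same as in the full data. This is a minimal sketch, assuming scikit-learn and NumPy. The labels are synthetic (10% positive) and the split sizes are arbitrary.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced labels: 900 negatives and 100 positives (10%)
y = np.array([0] * 900 + [1] * 100)
X = np.arange(len(y)).reshape(-1, 1)   # placeholder feature

# Stratified split: the test set keeps the 10% positive rate of the full data
_, _, _, y_test_strat = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)
# Purely random split: the positive rate in the test set can drift
_, _, _, y_test_rand = train_test_split(
    X, y, test_size=0.2, random_state=0)

print("overall positive rate:        ", y.mean())
print("stratified test positive rate:", y_test_strat.mean())
print("random test positive rate:    ", y_test_rand.mean())
```

With `stratify=y` the 200-row test set should contain exactly 20 positives. The purely random split only matches that rate on average.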

A critical part of the success of a Machine Learning project is coming up with a good set of features to train on. This process, called ***`feature engineering`***, involves:

- **Feature selection**: selecting the most useful features to train on among existing features.
- **Feature extraction**: combining existing features to produce a more useful one (_dimensionality reduction algorithms can help_).
- Creating new features by gathering new data.
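
This is a minimal sketch of feature selection and of combining existing features, assuming scikit-learn and NumPy. The data is synthetic: by construction only the first two of five columns influence the target. The `total_rooms`/`households` arrays used for the combined ratio feature are hypothetical numbers.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_regression

rng = np.random.default_rng(0)

# Feature selection (synthetic data): 5 candidate features,
# but only columns 0 and 1 drive the target by construction
X = rng.normal(size=(300, 5))
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.5, size=300)

selector = SelectKBest(score_func=f_regression, k=2).fit(X, y)
kept = selector.get_support(indices=True)   # indices of the retained columns
X_selected = selector.transform(X)          # shape (300, 2)
print("kept columns:", kept)

# Feature extraction by combination (hypothetical values):
# total rooms and households per district merged into one ratio feature
total_rooms = np.array([1000.0, 6000.0, 1500.0])
households = np.array([200.0, 1000.0, 300.0])
rooms_per_household = total_rooms / households
print("rooms per household:", rooms_per_household)
```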

> ***Dimensionality reduction*** is a task in which the goal is to simplify the data without losing too much information. One way to do this is to merge several correlated features into one. For example, a car’s mileage may be strongly correlated with its age, so a dimensionality reduction algorithm can merge them into one feature that represents the car’s wear and tear. This is called feature extraction.
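
The car example can be sketched with PCA, assuming scikit-learn and NumPy. The ages and mileages are synthetic: mileage is generated as roughly 12,000 miles per year plus noise, a made-up rate. Both features are standardized first so that PCA is not dominated by mileage's much larger scale.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
age = rng.uniform(0, 15, size=500)                        # years (synthetic)
mileage = 12_000 * age + rng.normal(0, 5_000, size=500)   # miles (synthetic)
X = np.column_stack([age, mileage])

# Standardize, then keep a single principal component
reducer = make_pipeline(StandardScaler(), PCA(n_components=1))
wear_and_tear = reducer.fit_transform(X)   # shape (500, 1): one merged feature

pca = reducer.named_steps["pca"]
print("share of variance kept:", pca.explained_variance_ratio_[0])
print("correlation age/mileage:", np.corrcoef(age, mileage)[0, 1])
```

Because the two features are so strongly correlated, a single component should retain almost all of their (standardized) variance.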

**Overfitting means that the model performs well on the training data, but it does not generalize
well.**

*Overfitting happens when the model is too complex relative to the amount and noisiness of the training data.* The possible solutions are:
- To simplify the model by selecting one with fewer parameters (e.g., a linear model rather than a high-degree polynomial model), by reducing the number of attributes in the training data or by constraining the model
- To gather more training data
- To reduce the noise in the training data (e.g., fix data errors and remove outliers)
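
This is a minimal sketch of overfitting, assuming scikit-learn and NumPy. It uses 15 noisy points from a synthetic linear relationship. The points are fitted once with a straight line and once with a degree-12 polynomial (13 parameters for 15 points). The degree and sample size are arbitrary illustrative choices.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(1)

def make_data(n):
    """Synthetic data: y = 2x + 1 plus Gaussian noise."""
    x = rng.uniform(0, 1, size=(n, 1))
    return x, 2 * x[:, 0] + 1 + rng.normal(0, 0.3, size=n)

X_train, y_train = make_data(15)    # small training set
X_test, y_test = make_data(1000)    # fresh data the model never saw

results = {}
for degree in (1, 12):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_train, y_train)
    results[degree] = (
        mean_squared_error(y_train, model.predict(X_train)),
        mean_squared_error(y_test, model.predict(X_test)),
    )
    print(f"degree {degree:2d}: train MSE = {results[degree][0]:.3f}, "
          f"test MSE = {results[degree][1]:.3f}")
```

The high-degree model should fit the training points more closely, but its error on new data should be noticeably worse than on the training data. That gap is the failure to generalize described above.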


Constraining a model to make it simpler and reduce the risk of overfitting is called ***`regularization`***.
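
Regularization can be sketched with ridge regression, which penalizes large weights; the penalty strength is the `alpha` hyperparameter. This is a minimal sketch, assuming scikit-learn and NumPy, with a degree-12 polynomial on 15 synthetic points. `alpha=0.01` is an arbitrary illustrative value.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(1)
x = rng.uniform(0, 1, size=(15, 1))
y = 2 * x[:, 0] + 1 + rng.normal(0, 0.3, size=15)   # synthetic linear data

# Same high-degree features for both models; only the constraint differs
X_poly = PolynomialFeatures(degree=12, include_bias=False).fit_transform(x)

unconstrained = LinearRegression().fit(X_poly, y)
regularized = Ridge(alpha=0.01).fit(X_poly, y)   # alpha: strength of the L2 penalty

# The penalty keeps the learned weights small, i.e. the model simpler
print("weight norm without regularization:", np.linalg.norm(unconstrained.coef_))
print("weight norm with ridge (alpha=0.01):", np.linalg.norm(regularized.coef_))
```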

**Underfitting is the opposite of overfitting**: it occurs when your
model is too simple to learn the underlying structure of the data.

The main options to fix this problem are:
- Selecting a more powerful model, with more parameters
- Feeding better features to the learning algorithm (feature engineering)
- Reducing the constraints on the model (e.g., reducing the regularization hyperparameter)
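
This is a minimal sketch of underfitting and of the "more powerful model" fix, assuming scikit-learn and NumPy. A straight line cannot capture a synthetic quadratic relationship. Adding squared features, which gives a polynomial model with more parameters, can. The data and degree are illustrative choices.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
X = np.linspace(-3, 3, 200).reshape(-1, 1)
y = X[:, 0] ** 2 + rng.normal(0, 0.5, size=200)   # synthetic quadratic data

too_simple = LinearRegression().fit(X, y)
more_powerful = make_pipeline(PolynomialFeatures(degree=2), LinearRegression()).fit(X, y)

# R^2 on the training data: a value near 0 means the model misses the structure
print("linear model R^2:   ", too_simple.score(X, y))
print("quadratic model R^2:", more_powerful.score(X, y))
```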

### Main steps involved in an ML project:
1. Look at the big picture.
2. Get the data.
3. Discover and visualize the data to gain insights.
4. Prepare the data for Machine Learning algorithms.
5. Select a model and train it.
6. Fine-tune your model.
7. Present your solution.
8. Launch, monitor, and maintain your system.
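
Several of these steps map onto scikit-learn objects. This is a minimal sketch, assuming scikit-learn, that covers only steps 2 and 4 to 6. `make_regression` stands in for real data. A `Pipeline` chains data preparation and the model. `GridSearchCV` fine-tunes `alpha` over a small grid chosen arbitrarily for illustration.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Step 2 (stand-in): synthetic data instead of a real dataset
X, y = make_regression(n_samples=300, n_features=5, noise=10.0, random_state=42)
# Set a test set aside before exploring the data any further
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# Steps 4-5: data preparation and model chained in one pipeline
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("model", Ridge()),
])

# Step 6: fine-tune a hyperparameter with 5-fold cross-validation
search = GridSearchCV(pipeline, param_grid={"model__alpha": [0.1, 1.0, 10.0]}, cv=5)
search.fit(X_train, y_train)

print("best alpha:", search.best_params_["model__alpha"])
# Final check on the held-out test set (R^2 for regressors)
print("test R^2:", search.score(X_test, y_test))
```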

```python
from sklearn.datasets import fetch_california_housing
import numpy as np
import pandas as pd  # needed below to build the DataFrame
```

```python
# Downloads the dataset on first use, then loads it from the local cache
housing = fetch_california_housing()
```

```python
housing  # a dict-like Bunch holding the data, target, feature names and description
```

```
{'data': array([[   8.3252    ,   41.        ,    6.98412698, ...,    2.55555556,
           37.88      , -122.23      ],
        [   8.3014    ,   21.        ,    6.23813708, ...,    2.10984183,
           37.86      , -122.22      ],
        [   7.2574    ,   52.        ,    8.28813559, ...,    2.80225989,
           37.85      , -122.24      ],
        ...,
        [   1.7       ,   17.        ,    5.20554273, ...,    2.3256351 ,
           39.43      , -121.22      ],
        [   1.8672    ,   18.        ,    5.32951289, ...,    2.12320917,
           39.43      , -121.32      ],
        [   2.3886    ,   16.        ,    5.25471698, ...,    2.61698113,
           39.37      , -121.24      ]]),
 'target': array([4.526, 3.585, 3.521, ..., 0.923, 0.847, 0.894]),
 'feature_names': ['MedInc',
  'HouseAge',
  'AveRooms',
  'AveBedrms',
  'Population',
  'AveOccup',
  'Latitude',
  'Longitude'],
 'DESCR': '.. _california_housing_dataset:\n\nCalifornia Housing dataset\n--------------------
```

```python
# Put the feature matrix and the target side by side, then label the columns
df = pd.DataFrame(data=np.c_[housing['data'], housing['target']],
                  columns=housing['feature_names'] + ['target'])
```

```python
df.shape
```

```
(20640, 9)
```

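```python
df.head()
```

|   | MedInc | HouseAge | AveRooms | AveBedrms | Population | AveOccup | Latitude | Longitude | target |
|---|--------|----------|----------|-----------|------------|----------|----------|-----------|--------|
| 0 | 8.3252 | 41.0 | 6.984127 | 1.02381 | 322.0 | 2.555556 | 37.88 | -122.23 | 4.526 |
| 1 | 8.3014 | 21.0 | 6.238137 | 0.97188 | 2401.0 | 2.109842 | 37.86 | -122.22 | 3.585 |
| 2 | 7.2574 | 52.0 | 8.288136 | 1.073446 | 496.0 | 2.80226 | 37.85 | -122.24 | 3.521 |
| 3 | 5.6431 | 52.0 | 5.817352 | 1.073059 | 558.0 | 2.547945 | 37.85 | -122.25 | 3.413 |
| 4 | 3.8462 | 52.0 | 6.281853 | 1.081081 | 565.0 | 2.181467 | 37.85 | -122.25 | 3.422 |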
