## What is Machine Learning

Machine learning (ML) is the process of building models that learn from data. An ML model is an algorithm that learns patterns from its training data and uses them to make predictions on new data.

Feed-forward neural networks, often just called neural networks (NN), are a type of ML algorithm in which multiple layers, each with many neurons, process information and pass it on to the next layer. The final layer produces a prediction as output.
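
The layer-by-layer flow described above can be sketched with plain NumPy. This is only an illustration under assumptions that are not in these notes: the layer sizes, the random (untrained) weights, and the ReLU activation are all hypothetical choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    """Element-wise activation applied by the hidden layer (an assumed choice)."""
    return np.maximum(0.0, x)

# Hypothetical architecture: 4 input features -> 8 hidden neurons -> 1 output.
layer_sizes = [4, 8, 1]
weights = [
    rng.normal(size=(n_in, n_out))
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])
]
biases = [np.zeros(n_out) for n_out in layer_sizes[1:]]

def forward(x):
    """Send a batch of instances through every layer in order."""
    activation = x
    for i, (w, b) in enumerate(zip(weights, biases)):
        z = activation @ w + b
        is_last_layer = i == len(weights) - 1
        # The final layer emits the raw prediction; hidden layers apply ReLU.
        activation = z if is_last_layer else relu(z)
    return activation

batch = rng.normal(size=(3, 4))  # 3 instances with 4 features each
predictions = forward(batch)
print(predictions.shape)
```

The weights here are random, so the predictions are meaningless until the network is trained; the point is only how each layer's output becomes the next layer's input.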

#### Data and Feature Engineering
- A dataset is the data used for training, validating, and testing an ML model.
- Training data: the data fed to the model during the training process.
- Validation data: data held out from the training set and used to evaluate how the model is performing after each training epoch. The model's performance on the validation data is used to decide when to stop the training run and to choose hyperparameters.
- Test data: data that is not used in the training process at all and is used to evaluate how the trained model performs. Performance reports must be computed on the test data. The error rate on new cases is called the 'generalization error' (or out-of-sample error); it tells how well the model will perform on instances it has never seen before. If the training error is low (the model makes few mistakes on the training set) but the generalization error is high, the model is overfitting the training data.
- Structured data is numerical and categorical data:
  + Numerical data includes integer and float values.
  + Categorical data includes data that can be divided into a finite set of groups, like type of car or education level.
- Unstructured data is data that cannot be represented as neatly. This typically includes free-form text, images, video, and audio.
- Data preprocessing, also called feature engineering, typically includes scaling numerical values or converting nonnumerical values into a numerical format that the model can understand.
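
The training, validation, and test roles from the list above can be sketched with Scikit-Learn. The synthetic dataset, the 60/20/20 split, and the decision tree with its candidate depths are assumptions made for illustration only.

```python
from sklearn.datasets import make_regression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic data (an assumption; any tabular dataset would do).
X, y = make_regression(n_samples=500, n_features=5, noise=20.0, random_state=0)

# Hold out 20% as test data, then 25% of the remainder as validation data,
# which gives a 60/20/20 split.
X_trainval, X_test, y_trainval, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)
X_train, X_val, y_train, y_val = train_test_split(
    X_trainval, y_trainval, test_size=0.25, random_state=0
)

# Use the validation set to choose a hyperparameter (the tree depth).
# max_depth=None lets the tree grow until it memorizes the training set.
candidate_depths = [2, 4, 8, None]
results = {}
for depth in candidate_depths:
    model = DecisionTreeRegressor(max_depth=depth, random_state=0)
    model.fit(X_train, y_train)
    results[depth] = {
        "train": mean_squared_error(y_train, model.predict(X_train)),
        "val": mean_squared_error(y_val, model.predict(X_val)),
    }

best_depth = min(results, key=lambda d: results[d]["val"])

# Only the final, chosen model is scored on the test set.
final_model = DecisionTreeRegressor(max_depth=best_depth, random_state=0)
final_model.fit(X_train, y_train)
test_error = mean_squared_error(y_test, final_model.predict(X_test))

print("unconstrained tree:", results[None])
print("chosen depth:", best_depth, "test MSE:", test_error)
```

The unconstrained tree shows the overfitting pattern from the list: a very low training error next to a much higher error on held-out data.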

- An input is a single column in the dataset before it has been processed.
- A feature is a single column after it has been processed.
- An instance is an item sent to the model for prediction. It could be a row in the dataset (without the label column), an image to classify, or a text document.
- A label can be either the target column in the dataset (called the 'ground truth label') or the output given by the model (called a 'prediction').
- Prediction is the process of sending new data to the model and making use of its output.
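
The difference between inputs and features can be illustrated with a small preprocessing sketch. The column names and values are made up, and combining StandardScaler with OneHotEncoder through a ColumnTransformer is just one common way to do it.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Raw inputs: one numerical column and one categorical column (made-up values).
raw = pd.DataFrame({
    "age": [25, 40, 58],
    "education": ["bachelor", "master", "bachelor"],
})

# Scale the numerical input and one-hot encode the categorical input.
# sparse_threshold=0 asks ColumnTransformer to always return a dense array.
preprocess = ColumnTransformer(
    [
        ("num", StandardScaler(), ["age"]),
        ("cat", OneHotEncoder(), ["education"]),
    ],
    sparse_threshold=0,
)

# Features: what the model would actually see after processing.
features = preprocess.fit_transform(raw)
print(features)

# A new instance (no label column) goes through the same fitted preprocessing.
instance = pd.DataFrame({"age": [33], "education": ["master"]})
instance_features = preprocess.transform(instance)
print(instance_features)
```

Two inputs become three features here: one scaled age column plus one column per education category.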

#### Machine Learning process/workflow

- First step, training: the process of passing training data to the model so that it can learn to identify patterns.
- Next step, model evaluation: testing how the model performs on data outside the training set. Training and evaluation might be run multiple times, with additional feature engineering and adjustments to the model architecture in between.
- Final step, serving: deploying the model so that it accepts incoming requests and sends back predictions.
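
The three steps can be sketched end to end as follows. The iris dataset, the logistic regression model, and the `handle_request` function are assumptions for illustration; a real serving setup would sit behind a web service or similar rather than a plain function call.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# 1. Training: the model learns patterns from the training data.
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# 2. Model evaluation: measure performance on data outside the training set.
accuracy = model.score(X_test, y_test)
print("test accuracy:", accuracy)

# 3. Serving: accept incoming requests and send back predictions.
def handle_request(instances):
    """Hypothetical request handler: feature rows in, predicted class ids out."""
    return model.predict(instances).tolist()

response = handle_request([[5.1, 3.5, 1.4, 0.2]])
print("prediction:", response)
```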

#### Data Drift

Data can change significantly over time. Data drift refers to the challenge of ensuring that ML models stay relevant and that model predictions accurately reflect the environment in which they are being used.
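
One simple way to watch for drift is to compare a feature's distribution at training time with what the model sees in production. The sketch below is only a heuristic, not a standard method from these notes: the `mean_shift_in_std_units` helper, the synthetic data, and the 0.5 threshold are all hypothetical.

```python
import numpy as np

def mean_shift_in_std_units(train_values, live_values):
    """How far the live mean has moved, measured in training standard deviations."""
    train_values = np.asarray(train_values, dtype=float)
    live_values = np.asarray(live_values, dtype=float)
    return abs(live_values.mean() - train_values.mean()) / train_values.std()

rng = np.random.default_rng(0)
train_feature = rng.normal(loc=50.0, scale=5.0, size=1000)  # seen during training
live_same = rng.normal(loc=50.0, scale=5.0, size=1000)      # environment unchanged
live_drifted = rng.normal(loc=60.0, scale=5.0, size=1000)   # environment has shifted

DRIFT_THRESHOLD = 0.5  # hypothetical cut-off; tune it for the application

shift_same = mean_shift_in_std_units(train_feature, live_same)
shift_drifted = mean_shift_in_std_units(train_feature, live_drifted)

for name, shift in [("same", shift_same), ("drifted", shift_drifted)]:
    status = "possible drift" if shift > DRIFT_THRESHOLD else "ok"
    print(f"{name}: shift={shift:.2f} -> {status}")
```

When a check like this fires, the usual response is to retrain the model on more recent data.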

#### Poor-Quality Data
If some instances are clearly errors, outliers, or noise, it may help to simply discard them or to fix the errors manually.

If some instances are missing a few features/values, there are three options:

- Ignore them: drop the whole feature (column) with drop(axis=1), or drop the affected instances (rows) with dropna().
- Fill in the missing values (e.g., with the median) using median() and fillna(), by asking for an expert's opinion, or with a Scikit-Learn imputer class such as SimpleImputer.
- Train one model with these features and one model without them.
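
The first two options can be sketched with pandas and Scikit-Learn. The tiny DataFrame and its column names are made up for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Made-up data with one missing age value.
df = pd.DataFrame({
    "age": [25.0, np.nan, 40.0, 58.0],
    "income": [30.0, 45.0, 52.0, 80.0],
})

# Option 1a: ignore the feature by dropping the whole column.
without_feature = df.drop("age", axis=1)

# Option 1b: ignore the affected instances by dropping rows with a missing age.
without_rows = df.dropna(subset=["age"])

# Option 2: fill in the missing values with the median of the known values.
median_age = df["age"].median()  # median of 25, 40, 58 is 40.0
filled = df.assign(age=df["age"].fillna(median_age))

# Same idea with Scikit-Learn: the imputer learns the medians in fit()
# and can later apply them to new data as well.
imputer = SimpleImputer(strategy="median")
imputed = imputer.fit_transform(df)

print(filled)
print(imputed)
```

Whichever value is used for filling should be computed on the training set and then reused to fill gaps in the test set and in new data.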

#### No Free Lunch (NFL) Theorem

'If you make no assumptions about the data, then there is no reason to prefer one model over any other.' No model is guaranteed a priori to work better. For example, for some datasets the best model is a linear model, while for other datasets it is a neural network. The only way to know for sure which model is best is to evaluate them all. Since that is not possible, in practice we make some reasonable assumptions about the data and evaluate only a few reasonable models.
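
Evaluating a few reasonable candidates can be sketched with cross-validation. The synthetic, linearly generated dataset and the two candidate models are assumptions; the decision tree stands in for any more flexible model.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeRegressor

# Synthetic data with a linear relationship plus noise (an assumption).
X, y = make_regression(n_samples=300, n_features=5, noise=10.0, random_state=0)

candidates = {
    "linear": LinearRegression(),
    "tree": DecisionTreeRegressor(random_state=0),
}

# Mean R^2 over 5 cross-validation folds for each candidate.
scores = {
    name: cross_val_score(model, X, y, cv=5).mean()
    for name, model in candidates.items()
}
best = max(scores, key=scores.get)

print(scores)
print("best on this dataset:", best)
```

The linear model wins here only because the data was generated linearly; on a dataset with a different structure the ranking could flip, which is the point of the theorem.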

## Scikit-Learn

### Design
#### Estimators
Any object that can estimate some parameters based on a dataset is called an "estimator". The estimation is performed by the fit() method, which takes only a dataset as a parameter (or two for supervised learning algorithms, where the second dataset contains the labels). Any other parameter needed to guide the estimation process is considered a hyperparameter (such as an imputer's strategy), and it must be set as an instance variable (generally via a constructor parameter).
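
As a small sketch of the estimator interface, here is SimpleImputer on a made-up array: the strategy is a hyperparameter set in the constructor, and the medians are the parameters learned by fit().

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Made-up dataset with missing values in both columns.
X = np.array([
    [1.0, 10.0],
    [np.nan, 20.0],
    [3.0, np.nan],
    [5.0, 60.0],
])

# Hyperparameter: set via the constructor and stored as an instance variable.
imputer = SimpleImputer(strategy="median")
print(imputer.strategy)

# Estimation: fit() takes only the dataset and learns the per-column medians.
imputer.fit(X)

# Learned parameters are exposed with a trailing underscore.
print(imputer.statistics_)  # column medians: 3.0 and 20.0
```
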
#### Transformers
Some estimators can also transform a dataset; they are called "transformers". The transformation is performed by the transform() method, which takes the dataset to transform as a parameter and returns the transformed dataset. This transformation generally relies on the learned parameters. All transformers also have a convenience method called fit_transform() that is equivalent to calling fit() and then transform().
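
A short sketch with StandardScaler (chosen here only as a familiar transformer) shows that fit() followed by transform() gives the same result as fit_transform().

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

X = np.array([[1.0], [2.0], [3.0]])

# Two steps: learn the mean and scale, then apply them.
scaler = StandardScaler()
scaler.fit(X)
two_steps = scaler.transform(X)

# One step: fit_transform() does both at once.
one_step = StandardScaler().fit_transform(X)

print(scaler.mean_)  # learned parameter used by transform()
print(np.allclose(two_steps, one_step))
```
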
#### Predictors
Finally, some estimators, given a dataset, are capable of making predictions; they are called "predictors". A predictor has a predict() method that takes a dataset of new instances and returns a dataset of corresponding predictions. It also has a score() method that measures the quality of the predictions, given a test set (and the corresponding labels, in the case of supervised learning algorithms).
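
A minimal predictor sketch, using LinearRegression on made-up data that follows y = 2x + 1 exactly:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up training data following y = 2x + 1.
X_train = np.array([[1.0], [2.0], [3.0], [4.0]])
y_train = np.array([3.0, 5.0, 7.0, 9.0])

model = LinearRegression()
model.fit(X_train, y_train)

# predict(): new instances in, corresponding predictions out.
X_new = np.array([[5.0], [6.0]])
predictions = model.predict(X_new)
print(predictions)  # approximately 11 and 13

# score(): quality of the predictions on a test set with known labels.
# For regressors this is the R^2 coefficient of determination.
X_test = np.array([[7.0], [8.0]])
y_test = np.array([15.0, 17.0])
r2 = model.score(X_test, y_test)
print(r2)
```
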

#### Pipeline

The Pipeline constructor takes a list of name/estimator pairs defining a sequence of steps. All but the last estimator must be transformers. When we call the pipeline's fit() method, it calls fit_transform() sequentially on all transformers, passing the output of each call as the parameter to the next call, until it reaches the final estimator, for which it calls the fit() method.
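
The sequence described above can be sketched as follows. The step names, the choice of steps, and the data are assumptions for illustration.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Made-up data following y = 2x + 1, with one missing input value.
X = np.array([[1.0], [2.0], [np.nan], [4.0], [5.0]])
y = np.array([3.0, 5.0, 7.0, 9.0, 11.0])

# A list of (name, estimator) pairs: two transformers, then a final predictor.
pipeline = Pipeline([
    ("imputer", SimpleImputer(strategy="median")),
    ("scaler", StandardScaler()),
    ("regressor", LinearRegression()),
])

# fit() runs fit_transform() on the imputer and the scaler in order,
# then fit() on the regressor with the fully transformed data.
pipeline.fit(X, y)

# Each fitted step remains accessible by name.
print(pipeline.named_steps["imputer"].statistics_)  # median of 1, 2, 4, 5 is 3.0

# predict() applies transform() on each transformer, then the final predict().
predictions = pipeline.predict(np.array([[6.0]]))
print(predictions)  # approximately 13
```

Because the pipeline exposes the same methods as its final estimator, it can be used anywhere a single predictor is expected.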