# What is Machine Learning?

**Definition of learning:**
A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P if its performance at tasks in T, as measured by P, improves with experience E.

**Examples:**
1. Handwriting recognition learning problem:
   
   1. Task T: Recognizing and classifying handwritten words within images
   
   2. Performance measure P: Percent of words correctly classified
   
   3. Training experience E: A dataset of handwritten words with given classifications

2. A robot driving learning problem:

   1. Task T: Driving on highways using vision sensors
   
   2. Performance measure P: Average distance traveled before an error
   
   3. Training experience E: A sequence of images and steering commands recorded while observing a human driver
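
The definition can be made concrete with a toy learner. The sketch below is only an illustration: the task, the threshold learner, and every data value are made up for this purpose. It shows the performance measure P rising as the program sees more of the experience E.

```python
# Toy illustration of "learning from experience" (all data here is made up).
# Task T: label a number from 0 to 100 as 1 if it is at least 50, else 0.
# Experience E: labelled examples (value, label).
# Performance P: fraction of test values labelled correctly.

def learn_threshold(examples):
    """Place the decision threshold midway between the largest value
    labelled 0 and the smallest value labelled 1 seen so far."""
    largest_negative = max(x for x, label in examples if label == 0)
    smallest_positive = min(x for x, label in examples if label == 1)
    return (largest_negative + smallest_positive) / 2

def performance(threshold, test_values):
    """P: share of test values whose predicted label matches the true label."""
    correct = sum((x >= threshold) == (x >= 50) for x in test_values)
    return correct / len(test_values)

experience = [(10, 0), (60, 1), (30, 0), (90, 1), (48, 0), (52, 1)]
test_values = list(range(5, 100, 10))  # 5, 15, ..., 95

scores = []
for n in (2, 4, 6):  # learn from the first 2, 4, then 6 examples
    threshold = learn_threshold(experience[:n])
    scores.append(performance(threshold, test_values))

print(scores)  # [0.8, 0.9, 1.0]
```

With 2, 4, and 6 examples the learned threshold moves from 35 to 45 to 50, so accuracy on the test values improves with experience.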

**Definition of Machine learning:** Machine Learning is the field of study that gives computers the capability to learn without being explicitly programmed.

Machine Learning is a branch of artificial intelligence that develops algorithms which learn the hidden patterns in datasets and use them to make predictions on new, similar data, without being explicitly programmed for each task.

![image.png](attachment:image.png)

**Example of Machine Learning:**

Suppose a real estate firm wants to offer its customers a utility that automatically predicts the price of a given property. To build this solution, first feed the machine property-sales data for a given time period (for example, the last 3 years). The machine then learns patterns in the data and uses them to predict the price.
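
A minimal sketch of this idea, assuming a single feature (floor area) and a straight-line relationship between area and price. The sales figures are invented for illustration, and `fit_line` and `predict_price` are hypothetical helper names; a real utility would use many more features and a library model.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept for one feature."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        / sum((x - mean_x) ** 2 for x in xs)
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Made-up past sales: floor area in square metres, price in thousands.
areas = [50, 80, 100, 120]
prices = [110, 170, 210, 250]

slope, intercept = fit_line(areas, prices)  # the "learning" step

def predict_price(area):
    """Predict the price of a property that was not in the sales data."""
    return slope * area + intercept

print(predict_price(90))  # 190.0
```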

# Types of Machine Learning

![image.png](attachment:image.png)

**Supervised Machine Learning**:

Supervised learning works on labelled data: each input has a corresponding labelled output. The goal of supervised machine learning is to learn a mapping from the input to the output. The input variables are called attributes, features, or predictors. The output variable is also called the response variable or target variable.

**Examples**:

1. Building a utility for predicting the selling price of a car.

2. Given an email defined by its collection of phrases (X), predict whether the email is spam (Y).

3. Given a medical brain scan image (X), predict whether the patient has a tumour (Y).

Supervised learning can be further categorized as **Regression** or **Classification**.

1. **Regression** -- When the output variable can take continuous numerical values, e.g. price of a car, delivery time, credit limit.

2. **Classification** -- When the output variable takes categorical or discrete (non-continuous) values, e.g. whether an email is spam, whether a transaction is fraudulent, etc.
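
To make the distinction concrete, here is a small sketch that uses the same nearest-neighbour idea for both kinds of output. The technique (k-nearest neighbours) is chosen only for illustration, and all data values are made up. For regression the prediction is the average of the neighbours' numeric targets; for classification it is the most common label among them.

```python
from collections import Counter

def nearest_neighbours(train_x, query, k):
    """Indices of the k training inputs closest to the query value."""
    order = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - query))
    return order[:k]

def knn_regress(train_x, train_y, query, k=3):
    """Regression: average the continuous targets of the k neighbours."""
    idx = nearest_neighbours(train_x, query, k)
    return sum(train_y[i] for i in idx) / k

def knn_classify(train_x, train_y, query, k=3):
    """Classification: majority vote over the labels of the k neighbours."""
    idx = nearest_neighbours(train_x, query, k)
    return Counter(train_y[i] for i in idx).most_common(1)[0][0]

# Regression: car age in years -> price in thousands (continuous output).
car_age = [1, 2, 3, 5, 8, 10]
car_price = [20.0, 18.0, 15.0, 12.0, 7.0, 5.0]
print(knn_regress(car_age, car_price, query=4))  # 15.0

# Classification: transaction amount -> label (categorical output).
amount = [20, 35, 50, 900, 1200, 1500]
label = ["legit", "legit", "legit", "fraud", "fraud", "fraud"]
print(knn_classify(amount, label, query=1000))  # fraud
```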

![image.png](attachment:image.png)

Both figures above show labelled datasets:

1. **Figure A**: A shopping-store dataset used to predict whether a customer will purchase a particular product, based on the customer's gender, age, and salary.

    **Input**: Gender, Age, Salary

    **Output**: Purchased, i.e. 0 or 1; 1 means the customer will purchase the product and 0 means the customer will not.

2. **Figure B**: A meteorological dataset used to predict wind speed from several parameters.

    **Input**: Dew Point, Temperature, Pressure, Relative Humidity, Wind Direction

    **Output**: Wind Speed 

**Unsupervised Learning:**

Unsupervised machine learning has no explicitly defined output. The idea is to discover knowledge or structure in the data. 
![image.png](attachment:image.png)

In this example, an algorithm groups animals into cats and dogs based on their features. This is known as **Clustering**.
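
As a rough sketch of clustering, the code below runs a basic k-means procedure on made-up two-feature points. k-means is one common clustering technique and is an assumption here; the text does not name a specific algorithm. Note that no labels are given to the algorithm: the groups emerge from the features alone.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, centroids, iterations=10):
    """Basic k-means: assign each point to its nearest centroid, then move
    each centroid to the mean of its assigned points, until nothing changes."""
    for _ in range(iterations):
        labels = [
            min(range(len(centroids)),
                key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        new_centroids = []
        for c in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(coord) / len(members) for coord in zip(*members))
                )
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster in place
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels, centroids

# Made-up feature pairs for six animals; no labels are provided.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]

# Start from two of the points as initial centroids (a simplifying assumption).
labels, centroids = kmeans(points, centroids=[points[0], points[3]])
print(labels)  # [0, 0, 0, 1, 1, 1]
```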

# Machine Learning Lifecycle

![image.png](attachment:image.png)

The lifecycle of a machine learning project involves a series of steps that include: 

1. **Study the Problem**: The first step is to understand the business problem and define the objectives of the model.


2. **Data Collection**: When the problem is well-defined, we can collect the relevant data required for the model. The data could come from various sources such as databases, APIs, or web scraping.


3. **Data Preparation**: Once the problem-related data is collected, it is a good idea to check it carefully and convert it into the desired format so that the model can use it to find hidden patterns. This can be done in the following steps:

    1. Data cleaning
    
    2. Data transformation
    
    3. Exploratory data analysis and feature engineering
    
    4. Splitting the dataset for training and testing
    
    
4. **Model Selection**: The next step is to select the appropriate machine learning algorithm that is suitable for our problem. This step requires knowledge of the strengths and weaknesses of different algorithms. Sometimes we use multiple models and compare their results and select the best model as per our requirements.


5. **Model building and Training**: After selecting the algorithm, we have to build the model.

    1. In traditional machine learning, building the model is relatively easy: it usually comes down to setting a few hyperparameters.
    
    2. In deep learning, we have to define the layer-wise architecture along with the input and output sizes, the number of nodes in each layer, the loss function, the gradient descent optimizer, etc.
    
    3. The model is then trained using the preprocessed dataset.
    
    
6. **Model Evaluation**: Once the model is trained, it can be evaluated on the test dataset to determine its performance using measures such as a classification report, precision, recall, F1 score, ROC curve, mean squared error, mean absolute error, etc.


7. **Model Tuning**: Based on the evaluation results, the model may need to be tuned or optimized to improve its performance. This involves tweaking the hyperparameters of the model.


8. **Deployment**: Once the model is trained and tuned, it can be deployed in a production environment to make predictions on new data. This step requires integrating the model into an existing software system or creating a new system for the model.


9. **Monitoring and Maintenance**: Finally, it is essential to monitor the model’s performance in the production environment and perform maintenance tasks as required. This involves monitoring for data drift, retraining the model as needed, and updating the model as new data becomes available.
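
Two of the steps above, splitting the dataset and evaluating predictions, can be sketched in a few lines of plain Python. This is a simplified illustration, not a full pipeline: the function names are hypothetical, the labels and predictions are made up, and in practice a library such as scikit-learn would normally provide these utilities.

```python
import random

def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of the rows and hold out a fraction for testing."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def precision_recall_f1(y_true, y_pred, positive=1):
    """Classification metrics for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1

def mean_squared_error(y_true, y_pred):
    """Regression metric: average squared difference."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

rows = list(range(20))                 # stand-in for 20 data rows
train_rows, test_rows = train_test_split(rows)
print(len(train_rows), len(test_rows))  # 15 5

# Made-up true labels and model predictions on a test set.
print(precision_recall_f1([1, 1, 1, 0, 0, 1], [1, 0, 1, 0, 1, 1]))
print(mean_squared_error([3.0, 5.0], [2.0, 7.0]))  # 2.5
```

Keeping the test rows out of training is what makes the evaluation in step 6 an honest estimate of performance on new data.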

# Reinforcement Learning

![image.png](attachment:image.png)

In this technique, the model keeps improving its performance by using reward feedback to learn a behavior or pattern. These algorithms are tailored to a particular problem, e.g. Google's self-driving car, or AlphaGo, where a bot competes with humans and even against itself to become a better and better player of the game of Go. Each time we feed in data, the model learns from it and adds it to its knowledge, which serves as training data. So, the more it learns, the better trained and more experienced it becomes.

1. The agent observes an input.
2. The agent performs an action by making a decision.
3. After the action, the agent receives a reward, reinforces its behavior accordingly, and the model stores the information as a state-action pair.

Example algorithms:

1. Temporal Difference (TD)
2. Q-Learning
3. Deep Adversarial Networks
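
The observe, act, and reward loop can be sketched with tabular Q-learning on a tiny made-up environment: five cells in a row, where the agent starts in cell 0 and receives a reward of 1 for reaching cell 4. The environment, the learning rate, the discount factor, and the purely random exploration are all simplifying assumptions for illustration.

```python
import random

N_STATES = 5       # cells 0..4 in a row; cell 4 is the goal
ACTIONS = (0, 1)   # 0 = move left, 1 = move right

def step(state, action):
    """Environment: move one cell (staying in bounds); reward 1 at the goal."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done

def q_learning(episodes=500, alpha=0.5, gamma=0.9, seed=0, max_steps=100):
    """Learn a table of state-action values from reward feedback."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        for _ in range(max_steps):
            action = rng.choice(ACTIONS)                     # act (random exploration)
            next_state, reward, done = step(state, action)   # observe and get reward
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])  # reinforce
            state = next_state
            if done:
                break
    return q

q_table = q_learning()
# Greedy policy: in each non-goal cell, pick the action with the higher value.
policy = [max(ACTIONS, key=lambda a: q_table[s][a]) for s in range(N_STATES - 1)]
print(policy)  # [1, 1, 1, 1], i.e. always move right toward the goal
```

Q-learning is off-policy, so it can learn the value of the best action even while the agent explores at random; a practical agent would usually mix exploration with acting on what it has learned so far.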

# Association Analysis

Association analysis or Association rule learning is a method for discovering interesting associations between variables in datasets. Association analysis deals with finding the degree and direction of the relationship between two variables. The degree/strength indicates the magnitude with which two variables may be associated. The direction indicates if the association between them is directly or inversely proportional.

Consider an outcome variable y and a predictor variable x. The relationship between y and x can be represented as y = f(x). If y and x have a linear relationship, then f(x) may be represented as ax + b, and thus y = ax + b.
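
The text does not say how a and b are obtained. One common choice, given here as an assumption, is ordinary least squares: for n observed pairs $(x_i, y_i)$ with means $\bar{x}$ and $\bar{y}$, the line that minimizes the sum of squared errors has

$$
a = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2},
\qquad
b = \bar{y} - a\,\bar{x}
$$

The sign of a gives the direction of the linear relationship: positive when y tends to rise with x, negative when it tends to fall.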

For a variable y which can be predicted based on x, it can be said that a change in x influences y, whereas it cannot be guaranteed that a change in y indicates a change in x. For example, if rainfall is high there may be a greater sale of umbrellas, but having a greater sale of umbrellas need not indicate higher rainfall.

Such a study is termed association analysis. There are two types of analysis: multivariate and bivariate.

Multivariate analysis is the statistical process of simultaneously analyzing multiple variables. When only two variables are being analyzed, the analysis is termed bivariate analysis.

The variables being analyzed may either be dependent or independent in nature. A dependent variable can be called outcome or criterion, while the variable(s) on which it depends can be called predictor(s). In bivariate analysis, there is one predictor and one outcome variable.

**Application of Association Analysis:**

Some scenarios where association analysis is being used are:

1. Credit card purchases can provide insight into the type of products a customer is likely to purchase. Using the credit card statements, one can determine whether the customer spends more on household items, jewellery, etc.

2. Supermarkets can rearrange their shelves by understanding the combinations of frequently bought items. For example, if the supermarket observes that bread and butter are frequently purchased together, then the supermarket may choose to place these items close to each other.

3. Telecommunication agencies can structure product bundles based on commonly associated options (internet packs, SMS services and other value added services) to maximize revenue.

4. Click stream analysis on websites is used to observe patterns in users' browsing behavior in order to deliver content accordingly. For example, the click stream analysis may suggest that visitors who land on a webpage X click on links A, B and C more often than on links D, E and F. Such observations provide insight into how to personalize and recommend content to website visitors.
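
The supermarket scenario can be quantified with two standard association-rule measures, support and confidence. The sketch below uses a handful of made-up shopping baskets; the function names are hypothetical.

```python
def count_containing(transactions, items):
    """Number of transactions that contain every item in `items`."""
    items = set(items)
    return sum(1 for basket in transactions if items <= basket)

def support(transactions, items):
    """Fraction of all transactions that contain the item set."""
    return count_containing(transactions, items) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Of the baskets containing the antecedent, the fraction that also
    contain the consequent."""
    both = count_containing(transactions, set(antecedent) | set(consequent))
    return both / count_containing(transactions, antecedent)

# Made-up shopping baskets.
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "eggs"},
    {"bread", "butter", "eggs"},
]

print(support(baskets, {"bread", "butter"}))       # 0.6
print(confidence(baskets, {"bread"}, {"butter"}))  # 0.75
```

Here bread and butter appear together in 3 of 5 baskets, and 3 of the 4 baskets with bread also contain butter, which is the kind of evidence that would justify shelving them together.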

# Scatter Plot

A scatter plot can be used to get an insight into the nature of the relationship between two numeric variables. Using a scatter plot, one can visually determine whether there exists a linear association between the two variables.

# Covariance

Covariance is a measure of how much two random variables change together. It helps measure the direction and strength (degree) of the linear association between a pair of numeric variables.

![image.png](attachment:image.png)

Covariance helps to understand the association between the variables X and Y. It helps to notice the effect of the change in the value of X on Y, i.e. if the value of X increases, does the value of Y increase or decrease.

If the covariance value is positive, it indicates that as the value of X increases, the value of Y is likely to increase. Similarly, if the covariance value is negative, it indicates that as the value of X increases, the value of Y is likely to decrease.

While the sign of the covariance gives the direction of the association, its magnitude depends on the units of X and Y, so the strength of the association is not easy to interpret from it.

In Python, the covariance of two pandas Series X and Y can be calculated using `X.cov(Y)`.
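
For reference, here is a plain-Python sketch of the calculation. It assumes the sample covariance with an n - 1 denominator, which is also the default used by pandas' `Series.cov`; the formula in the figure above may use a different denominator. The data values are made up.

```python
def covariance(xs, ys, ddof=1):
    """Covariance of two equal-length sequences.
    ddof=1 gives the sample covariance (divide by n - 1);
    ddof=0 gives the population covariance (divide by n)."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - ddof)

X = [1, 2, 3, 4, 5]
Y_up = [2, 4, 6, 8, 10]     # rises with X
Y_down = [10, 8, 6, 4, 2]   # falls as X rises

print(covariance(X, Y_up))    # 5.0  (positive: Y increases with X)
print(covariance(X, Y_down))  # -5.0 (negative: Y decreases as X increases)

# Rescaling Y (e.g. changing its units) rescales the covariance,
# which is why the magnitude alone is hard to interpret.
print(covariance(X, [y * 100 for y in Y_up]))  # 500.0
```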