### **Machine Learning Basics**

**Machine Learning (ML)** is a subfield of artificial intelligence (AI) that allows computers to learn from data and make predictions or decisions without being explicitly programmed for specific tasks. In traditional programming, we define rules and logic that computers follow to perform a task. However, in machine learning, we provide data, and the model automatically learns **patterns**, **relationships**, or **trends** within the data to perform tasks such as classification, regression, clustering, etc.

- **Data**: The primary input for a machine learning model is data, which can be in various forms such as numbers, text, images, etc.
- **Model**: A machine learning model is a mathematical representation that learns from the input data. Examples include linear regression models, decision trees, and neural networks.
- **Training**: The process where the model learns from the data is called **training**. The model adjusts its parameters based on the patterns it finds in the data.
- **Prediction**: Once trained, the model can make predictions or decisions based on new, unseen data.
- **Evaluation**: We assess how well the model is performing by evaluating it on test data using metrics like accuracy, precision, recall, etc.
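To make the evaluation metrics concrete, here is a small sketch using scikit-learn's `sklearn.metrics` functions. The true and predicted binary labels are invented for illustration.

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score

# Hypothetical ground-truth labels and model predictions (1 = positive class)
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 0, 1, 0]
# Counts: TP = 2, TN = 3, FP = 1, FN = 2

# Accuracy: fraction of all predictions that are correct -> (TP + TN) / total
accuracy = accuracy_score(y_true, y_pred)
# Precision: of the samples predicted positive, how many are truly positive -> TP / (TP + FP)
precision = precision_score(y_true, y_pred)
# Recall: of the truly positive samples, how many were found -> TP / (TP + FN)
recall = recall_score(y_true, y_pred)

print(f"Accuracy:  {accuracy:.3f}")   # 0.625
print(f"Precision: {precision:.3f}")  # 0.667
print(f"Recall:    {recall:.3f}")     # 0.500
```

The three numbers differ because each metric answers a different question. A model can have high precision but low recall if it predicts the positive class only when it is very sure.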

#### **Machine Learning Workflow**
1. **Data Collection**: Gather and preprocess data.
2. **Model Selection**: Choose an appropriate machine learning model.
3. **Training**: Train the model using the training dataset.
4. **Evaluation**: Evaluate the model using test data.
5. **Deployment**: Deploy the model to make predictions in real-world applications.
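The five workflow steps can be sketched end to end in scikit-learn. This is a minimal illustration that rests on several assumptions. The Iris dataset stands in for collected data, and a depth-limited decision tree is the chosen model. "Deployment" is reduced to saving the fitted model with `joblib` and loading it back. The file name `iris_tree.joblib` is hypothetical.

```python
import os
import tempfile

import joblib
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# 1. Data collection: load a small labeled dataset and hold out 25% for testing
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# 2. Model selection: a decision tree with a capped depth
model = DecisionTreeClassifier(max_depth=3, random_state=0)

# 3. Training: fit the model's parameters on the training split only
model.fit(X_train, y_train)

# 4. Evaluation: measure accuracy on data the model has not seen
accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"Test accuracy: {accuracy:.3f}")

# 5. Deployment (simplified): persist the trained model, reload it as a
#    serving process would, and predict on a new measurement
path = os.path.join(tempfile.gettempdir(), "iris_tree.joblib")  # hypothetical file name
joblib.dump(model, path)
loaded = joblib.load(path)
new_flower = [[5.1, 3.5, 1.4, 0.2]]  # sepal length/width, petal length/width in cm
print("Predicted class:", load_iris().target_names[loaded.predict(new_flower)[0]])
```

Real deployments usually add input validation, monitoring, and an API or batch job around the saved model. The save/load round trip is only the core of that step.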


### **Types of Machine Learning**

Machine learning can be broadly categorized into three main types based on how the algorithm learns and makes predictions:

1. **Supervised Learning**
2. **Unsupervised Learning**
3. **Reinforcement Learning**

Let's explore each type in detail:

---

#### **1. Supervised Learning**

**Supervised learning** is the most widely used type of machine learning, in which the model is trained on labeled data. Each training example is paired with an output label (known as the target), and the goal is for the model to learn the mapping from input to output.

- **Input**: Features (X), such as numerical or categorical values.
- **Output**: Labels or target values (y), such as class labels or numerical values.
- **Objective**: Learn a function that maps input features to output labels.

##### **Types of Supervised Learning**
1. **Regression**: When the output variable is continuous. For example, predicting house prices based on features like size, location, etc.
   - **Example**: Linear Regression, Polynomial Regression.
2. **Classification**: When the output variable is categorical. For example, classifying emails as spam or not spam.
   - **Example**: Logistic Regression, Decision Trees, Random Forest, Support Vector Machines (SVM), K-Nearest Neighbors (KNN).

##### **Supervised Learning Example:**
Here’s an example of **Logistic Regression** for a classification task:
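The following is a minimal sketch that uses scikit-learn's bundled breast-cancer dataset as the labeled data. The dataset, the 80/20 split, and the feature-scaling step are choices made for illustration.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Labeled data: 30 numeric features per tumor; label 0 = malignant, 1 = benign
data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.2, random_state=42, stratify=data.target
)

# Scale features first so the solver converges, then fit logistic regression
clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
clf.fit(X_train, y_train)

# Predicted class labels and class probabilities for the test set
y_pred = clf.predict(X_test)
y_proba = clf.predict_proba(X_test)[:, 1]  # probability of class 1 (benign)

print(f"Accuracy: {accuracy_score(y_test, y_pred):.3f}")
print(classification_report(y_test, y_pred, target_names=data.target_names))
```

Logistic regression outputs a probability for each class. `predict` applies a 0.5 threshold to it, and `predict_proba` exposes the raw probabilities if you want a different cut-off.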

#### **2. Unsupervised Learning**

**Unsupervised learning** involves training a model on data that does not have labeled outcomes. The goal of unsupervised learning is to find hidden patterns, structures, or relationships in the data.

- **Input**: Features (X) without labels.
- **Objective**: Discover patterns or structures in the data, such as clusters or associations.

##### **Types of Unsupervised Learning**
1. **Clustering**: Grouping data points into clusters based on similarity. For example, customer segmentation.
   - **Example**: K-Means Clustering, Hierarchical Clustering.
2. **Association**: Discovering rules that describe which items or attributes frequently occur together, as in market basket analysis.
   - **Example**: Apriori, Eclat.
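Clustering can be sketched with scikit-learn's `KMeans`. The data here is synthetic, generated with `make_blobs`. The true group labels it returns are discarded so the model sees only unlabeled features. The choice of three clusters is an assumption that matches how the data was generated.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic 2-D data drawn around 3 centers; the true labels are thrown away (_)
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.8, random_state=42)

# Fit K-Means with k = 3; n_init restarts guard against a poor initialization
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
labels = kmeans.fit_predict(X)  # cluster index (0, 1, or 2) for every point

print("Cluster centers:\n", kmeans.cluster_centers_)
print("Points per cluster:", [int((labels == k).sum()) for k in range(3)])

# Assign a new, unseen point to the nearest learned center
print("New point goes to cluster:", kmeans.predict([[0.0, 0.0]])[0])
```

On real data the number of clusters is usually unknown. It is typically chosen by comparing several values of `n_clusters`, for example with the elbow method or silhouette scores.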

#### **3. Reinforcement Learning**

**Reinforcement learning (RL)** is a type of machine learning where an agent learns how to make decisions by performing actions in an environment to maximize a cumulative reward. The agent receives feedback in the form of rewards or punishments based on its actions.

- **Input**: The current state of the environment.
- **Output**: Actions to take to maximize rewards.
- **Objective**: Learn the optimal policy (a strategy of actions) that maximizes long-term rewards.

##### **Reinforcement Learning Example:**
Reinforcement learning often involves complex environments and usually relies on specialized simulation libraries such as OpenAI’s Gym (now maintained as Gymnasium).
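The agent-environment loop can be illustrated without any external library. The sketch below runs tabular Q-learning on a tiny, hypothetical corridor environment. `CorridorEnv`, its reward values, and the hyperparameters are all assumptions made for illustration. Its `reset()`/`step()` methods loosely mimic the Gym style but are not the Gym API. In Gymnasium, `step()` returns five values (`obs, reward, terminated, truncated, info`).

```python
import random


class CorridorEnv:
    """Hypothetical toy environment: positions 0..4 in a corridor.

    The agent starts at 0. Reaching the last position ends the episode with
    reward +1; every other step costs -0.01. Actions: 0 = left, 1 = right.
    """

    def __init__(self, length=5):
        self.length = length
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        move = 1 if action == 1 else -1
        self.state = min(max(self.state + move, 0), self.length - 1)
        done = self.state == self.length - 1
        reward = 1.0 if done else -0.01
        return self.state, reward, done


def q_learning(env, episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    # Q-table: estimated long-term reward for each (state, action) pair
    q = [[0.0, 0.0] for _ in range(env.length)]
    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            # Epsilon-greedy: explore at random sometimes (or on ties), else exploit Q
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.randrange(2)
            else:
                action = 0 if q[state][0] > q[state][1] else 1
            next_state, reward, done = env.step(action)
            # Q-learning update toward reward + discounted best future value
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


env = CorridorEnv()
q_table = q_learning(env)
# Learned policy: the greedy action in each non-terminal state
policy = ["right" if q[1] > q[0] else "left" for q in q_table[:-1]]
print("Greedy policy:", policy)
```

The discount factor `gamma` makes rewards that arrive sooner worth more. As a result, the learned policy prefers the shortest route to the goal, which here means always moving right.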


### **Linear Regression**
- Linear regression is a fundamental supervised learning algorithm used to predict a continuous output variable. It models the relationship between an input variable (X) and an output variable (Y) as a straight line.
- It aims to find the best-fitting line through the data, usually by ordinary least squares (OLS), which chooses the coefficients that minimize the sum of squared differences between the actual and predicted values of Y.

### The Linear Regression Equation

**Y = β₀ + β₁X + ε**

- Y: The dependent variable you're trying to predict.
- β₀ (beta-nought): The y-intercept of the regression line. This is the predicted value of Y when X is zero.
- β₁ (beta-one): The slope of the regression line. It tells you how much Y changes on average for every one-unit increase in X.
- X: The independent variable you're using for prediction.
- ε (epsilon): The random error term. This represents the difference between the actual Y value and the value predicted by the equation (β₀ + β₁X).
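For a single input variable, the least-squares estimates have a closed form: β̂₁ = Σ(xᵢ − x̄)(yᵢ − ȳ) / Σ(xᵢ − x̄)² and β̂₀ = ȳ − β̂₁x̄. The following NumPy sketch computes them on the same hours-of-study data used in the example below. The variable names are illustrative.

```python
import numpy as np

# Hours of study (x) and marks scored (y)
x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], dtype=float)
y = np.array([2, 4, 5, 4, 5, 6, 7, 8, 8, 10], dtype=float)

x_mean, y_mean = x.mean(), y.mean()

# Ordinary least squares estimates for Y = β₀ + β₁X + ε
beta1 = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
beta0 = y_mean - beta1 * x_mean

# Residuals (actual minus fitted) are the sample estimates of the error term ε
residuals = y - (beta0 + beta1 * x)

print(f"β₀ = {beta0:.4f}, β₁ = {beta1:.4f}")       # β₀ = 1.7333, β₁ = 0.7576
print(f"Sum of residuals: {residuals.sum():.2e}")  # ~0 for OLS with an intercept
```

Worked by hand: x̄ = 5.5, ȳ = 5.9, Σ(xᵢ − x̄)(yᵢ − ȳ) = 62.5 and Σ(xᵢ − x̄)² = 82.5. This gives β̂₁ = 62.5 / 82.5 ≈ 0.7576 and β̂₀ = 5.9 − 0.7576 × 5.5 ≈ 1.7333. This fit uses all ten points, so it differs from the scikit-learn model in the example, which is trained on a 70% subset.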

### Example: sample data on hours of study and scores

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
import matplotlib.pyplot as plt

# Sample data: hours of study vs. marks scored
X = np.array([[1], [2], [3], [4], [5], [6], [7], [8], [9], [10]])  # Hours of study (2-D: samples x features)
y = np.array([2, 4, 5, 4, 5, 6, 7, 8, 8, 10])  # Marks scored

# Split data into training (70%) and test (30%) sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
print("X_train:", X_train.ravel())
print("X_test: ", X_test.ravel())
print("y_train:", y_train)
print("y_test: ", y_test)

# Initialize the model
model = LinearRegression()

# Train the model (estimates the intercept β₀ and the slope β₁)
model.fit(X_train, y_train)
print(f"Intercept (β₀): {model.intercept_:.4f}, slope (β₁): {model.coef_[0]:.4f}")

# Predict marks for the unseen test hours and measure the error
y_pred = model.predict(X_test)
print(f"Predictions: {y_pred}")
print(f"Test MSE: {mean_squared_error(y_test, y_pred):.4f}")

# Plot all data points and the fitted line over the full range of X
plt.scatter(X, y, color='blue', label='Data')
plt.plot(X, model.predict(X), color='red', label='Fitted line')
plt.xlabel('Hours of Study')
plt.ylabel('Marks Scored')
plt.legend()
plt.show()
```


### Arrange Test Data, Predictions, and Actual Values in a Table

```python
import pandas as pd

# y_pred and y_test are already 1-D; flatten the 2-D X_test so all three columns match
data = {'Hours Studied': X_test.flatten(), 'Predicted Marks': y_pred, 'Actual Marks': y_test}
df = pd.DataFrame(data)

# Write the table to an Excel file (needs an Excel writer such as openpyxl installed)
df.to_excel("actual_predicted.xlsx", index=False)
print("\nTest Data, Predictions, and Actual Values:")
print(df.to_string(index=False))
```


```
Test Data, Predictions, and Actual Values:
 Hours Studied  Predicted Marks  Actual Marks
             9         8.853960             8
             2         2.980198             4
             6         6.336634             6
```
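With only three test points, the error metrics can be checked by hand from the table. The following NumPy sketch recomputes the mean squared error (what `mean_squared_error` returns), its square root (RMSE), and R² from the tabulated values. The predictions are the rounded values shown in the table.

```python
import numpy as np

# Values copied from the table (predictions rounded to 6 decimals)
actual = np.array([8, 4, 6], dtype=float)
predicted = np.array([8.853960, 2.980198, 6.336634])

residuals = actual - predicted   # actual minus predicted, per test point
mse = np.mean(residuals ** 2)    # mean squared error
rmse = np.sqrt(mse)              # error in the same units as the marks
# R²: 1 - (residual sum of squares / total sum of squares around the mean)
r2 = 1 - np.sum(residuals ** 2) / np.sum((actual - actual.mean()) ** 2)

print(f"Residuals: {residuals}")
print(f"MSE = {mse:.4f}, RMSE = {rmse:.4f}, R² = {r2:.4f}")
```

This gives MSE ≈ 0.628, RMSE ≈ 0.79 marks, and R² ≈ 0.765. On a test set this small, a single point changes these numbers a lot, so they should be read as rough indications.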


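### Example: Built-in Diabetes Dataset

```python
from sklearn.datasets import load_diabetes
import pandas as pd

# Load the dataset: 442 patients, 10 baseline features (already mean-centered and scaled)
diabetes = load_diabetes()

# Convert to a DataFrame and add the target column
df = pd.DataFrame(data=diabetes.data, columns=diabetes.feature_names)
df['target'] = diabetes.target  # quantitative measure of disease progression one year after baseline

# In a notebook, evaluating df as the last expression displays its first and last rows
# (in a plain script, use print(df.head()) instead)
df
```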


| | age | sex | bmi | bp | s1 | s2 | s3 | s4 | s5 | s6 | target |
|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| 0 | 0.038076 | 0.050680 | 0.061696 | 0.021872 | -0.044223 | -0.034821 | -0.043401 | -0.002592 | 0.019907 | -0.017646 | 151.0 |
| 1 | -0.001882 | -0.044642 | -0.051474 | -0.026328 | -0.008449 | -0.019163 | 0.074412 | -0.039493 | -0.068332 | -0.092204 | 75.0 |
| 2 | 0.085299 | 0.050680 | 0.044451 | -0.005670 | -0.045599 | -0.034194 | -0.032356 | -0.002592 | 0.002861 | -0.025930 | 141.0 |
| 3 | -0.089063 | -0.044642 | -0.011595 | -0.036656 | 0.012191 | 0.024991 | -0.036038 | 0.034309 | 0.022688 | -0.009362 | 206.0 |
| 4 | 0.005383 | -0.044642 | -0.036385 | 0.021872 | 0.003935 | 0.015596 | 0.008142 | -0.002592 | -0.031988 | -0.046641 | 135.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 437 | 0.041708 | 0.050680 | 0.019662 | 0.059744 | -0.005697 | -0.002566 | -0.028674 | -0.002592 | 0.031193 | 0.007207 | 178.0 |
| 438 | -0.005515 | 0.050680 | -0.015906 | -0.067642 | 0.049341 | 0.079165 | -0.028674 | 0.034309 | -0.018114 | 0.044485 | 104.0 |
| 439 | 0.041708 | 0.050680 | -0.015906 | 0.017293 | -0.037344 | -0.013840 | -0.024993 | -0.011080 | -0.046883 | 0.015491 | 132.0 |
| 440 | -0.045472 | -0.044642 | 0.039062 | 0.001215 | 0.016318 | 0.015283 | -0.028674 | 0.026560 | 0.044529 | -0.025930 | 220.0 |
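The diabetes data has ten input features, so it extends the single-variable equation to multiple linear regression: Y = β₀ + β₁X₁ + … + β₁₀X₁₀ + ε. The following sketch fits scikit-learn's `LinearRegression` to it. The 80/20 split and `random_state=42` are arbitrary choices made for illustration.

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Features (10 columns) and target (disease progression) as NumPy arrays
X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Multiple linear regression: one coefficient per feature plus an intercept
model = LinearRegression()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

print("Intercept:", round(model.intercept_, 2))
for name, coef in zip(load_diabetes().feature_names, model.coef_):
    print(f"  {name:>4}: {coef:8.2f}")
print(f"Test MSE: {mean_squared_error(y_test, y_pred):.1f}")
print(f"Test R²:  {r2_score(y_test, y_pred):.3f}")
```

Because the features are already on a common scale, the coefficient magnitudes give a rough sense of each feature's influence. A test R² well below 1 indicates that a linear model explains only part of the variation in disease progression.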


### Example: Built-in Longley Dataset

```python
import statsmodels.api as sm

# Load the Longley macroeconomic dataset (16 annual observations, 1947-1962)
data = sm.datasets.longley.load_pandas().data

# Display the first few rows
print(data.head())
```

```
    TOTEMP  GNPDEFL       GNP   UNEMP   ARMED       POP    YEAR
0  60323.0     83.0  234289.0  2356.0  1590.0  107608.0  1947.0
1  61122.0     88.5  259426.0  2325.0  1456.0  108632.0  1948.0
2  60171.0     88.2  258054.0  3682.0  1616.0  109773.0  1949.0
3  61187.0     89.5  284599.0  3351.0  1650.0  110929.0  1950.0
4  63221.0     96.2  328975.0  2099.0  3099.0  112075.0  1951.0
```


### **Summary of Machine Learning Types**
1. **Supervised Learning**: Learn from labeled data (e.g., regression, classification).
2. **Unsupervised Learning**: Learn from unlabeled data to identify patterns or groupings (e.g., clustering, association).
3. **Reinforcement Learning**: Learn by interacting with an environment to maximize cumulative reward.

---

### **Conclusion**
Machine learning enables systems to learn patterns from data and use them to make predictions and decisions. The choice of learning method (supervised, unsupervised, or reinforcement learning) depends on the type of data available and the problem you're trying to solve. Python libraries such as `scikit-learn`, `pandas`, `statsmodels`, and `matplotlib` make it straightforward to implement these algorithms and apply them to real-world problems.

These basics provide a foundation for diving deeper into specific algorithms and use cases in machine learning. You can experiment with more complex models and fine-tune them for real-world applications as you gain more experience.