## **freeCodeCamp-Machine-Learning-For-Everybody-Kylie-Ying**

https://www.youtube.com/watch?v=i_LwzRVP7bg

## Machine Learning Basics:

- Definition: Subset of AI that enables systems to learn and improve from experience.
- Types: Supervised, Unsupervised, Reinforcement.
- Components: Model, Features, Labels.
- Process: Training, Validation, Testing.
  
**Supervised Learning:**

- Goal: Learn a mapping from inputs to outputs using labeled examples (input-label pairs), then predict outputs for new inputs.
- Examples: Regression, Classification.
- Algorithms: Linear Regression, Decision Trees, Neural Networks.

**Unsupervised Learning:**

- Goal: Extract patterns from data without labeled outputs.
- Examples: Clustering, Association.
- Algorithms: K-Means, Hierarchical Clustering, Apriori.

**Reinforcement Learning:**

- Goal: Learn optimal actions in an environment.
- Components: Agent, Environment, Actions, Rewards.
- Algorithms: Q-Learning, Deep Q Network (DQN).

**Neural Networks:**

- Loosely inspired by how neurons connect in the human brain.
- Layers: Input, Hidden, Output.
- Types: Feedforward, Recurrent, Convolutional.

**Training a Model:**

- Loss Function: Measures error.
- Optimization: Minimize loss using algorithms (Gradient Descent).
- Epochs: Number of full passes over the training dataset.
- Batch Size: Number of samples processed per parameter update (one iteration).
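
The loop of computing a loss and nudging parameters against its gradient can be sketched as below. This is a minimal illustration, assuming a one-feature linear model, mean squared error as the loss, and full-batch updates (batch size equals the dataset size, so each epoch is one update). The function name `gradient_descent` and the toy data are hypothetical.

```python
def gradient_descent(xs, ys, learning_rate=0.05, epochs=500):
    """Fit y ~ w * x + b by minimizing mean squared error (assumed loss)."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradients of MSE = (1/n) * sum((w*x + b - y)^2) w.r.t. w and b.
        grad_w = (2 / n) * sum((w * x + b - y) * x for x, y in zip(xs, ys))
        grad_b = (2 / n) * sum((w * x + b - y) for x, y in zip(xs, ys))
        # Step against the gradient to reduce the loss.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Toy data generated from y = 2x + 1 (illustrative only).
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
w, b = gradient_descent(xs, ys)
print(round(w, 3), round(b, 3))
```

With a smaller batch size, each epoch would contain several such updates, one per batch.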

**Evaluation Metrics:**

- Accuracy: Proportion of correctly classified instances.
- Precision: Proportion of true positives among predicted positives.
- Recall: Proportion of true positives among actual positives.
- F1 Score: Harmonic mean of precision and recall, $F_1 = 2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$.
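
As a rough sketch, these four metrics can be computed from true and predicted labels for a binary problem. The helper name `classification_metrics` and the sample labels are made up for illustration; the F1 score here is the standard harmonic mean of precision and recall.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall, and F1 for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs)
    # Guard against division by zero when a class is never predicted/present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Illustrative labels: 3 true positives, 1 false positive, 1 false negative.
y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 0, 0, 1, 1, 1]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```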

**Overfitting and Underfitting:**

- Overfitting: Model learns training data too well, performs poorly on new data.
- Underfitting: Model is too simple, fails to capture patterns.

**Feature Engineering:**

- Selecting and transforming features for better model performance.
- Examples: Scaling, One-Hot Encoding, Feature Creation.

**Cross-Validation:**

- Technique to assess model generalization.
- Splits data into training and validation sets multiple times.
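
One common way to do the repeated splitting is k-fold cross-validation; the notes do not name a specific scheme, so treat this as one possible sketch. The generator `k_fold_indices` is a hypothetical helper that only produces index lists; training and scoring a model on each fold is left out.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, val_indices) for each of k contiguous folds."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        val = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, val
        start += size


folds = list(k_fold_indices(n_samples=10, k=5))
for train_idx, val_idx in folds:
    print("train:", train_idx, "validation:", val_idx)
```

Each sample lands in the validation set exactly once, so averaging the k validation scores gives a steadier estimate than a single split. In practice the data is usually shuffled first.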

**Bias and Variance:**

- Bias: Error due to overly simplistic assumptions.
- Variance: Error due to sensitivity to small fluctuations in the training data, typical of overly complex models.
- Balancing: Aim for optimal trade-off.

**Ensemble Learning:**

- Combining multiple models for better performance.
- Examples: Random Forest, Gradient Boosting.

## AI vs ML vs data science

- **AI:**
  - Enable machines for human-like tasks.

- **ML:**
  - Subset of AI.
  - Solves a specific problem by learning from data.
  - Predicts from data.

- **Data Science:**
  - Finds patterns.
  - Draws data insights.

## Types of learning

- **Supervised Learning:**
  - Trains on labeled data.
  - Predicts output.

- **Unsupervised Learning:**
  - Uses unlabeled data.
  - Learns data patterns.

- **Reinforcement Learning:**
  - Interactive environment.
  - Rewards and penalties.

## Machine Learning Flow Chart

1. **Input: Feature Vector**
   - Feature 1
   - Feature 2
   - Feature 3

2. **Model**
   - Processes the feature vector.

3. **Output: Output Label**
   - Prediction or classification result.

## Features
  - Qualitative:
    - Nominal: Categories with no order (e.g., colors: red, blue).
    - Ordinal: Categories with a meaningful order (e.g., education levels: high school, college).

  - Quantitative:
    - Discrete: Countable and separate values (e.g., number of students in a class).
    - Continuous: Infinite possible values within a range (e.g., height, weight).

## One-Hot Encoding
  - Encoding categorical data.
  - Example: Countries - U.S., India, Canada, France.

| U.S. | India | Canada | France |
|------|-------|--------|--------|
| 1    | 0     | 0      | 0      |
| 0    | 1     | 0      | 0      |
| 0    | 0     | 1      | 0      |
| 0    | 0     | 0      | 1      |
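
The table above can be reproduced with a few lines of plain Python. This is a sketch only; `one_hot_encode` is a hypothetical helper, and libraries such as pandas or scikit-learn provide their own encoders.

```python
def one_hot_encode(values, categories):
    """Map each value to a 0/1 row with a single 1 at its category index."""
    index = {category: i for i, category in enumerate(categories)}
    encoded = []
    for value in values:
        row = [0] * len(categories)
        row[index[value]] = 1
        encoded.append(row)
    return encoded


categories = ["U.S.", "India", "Canada", "France"]
encoded = one_hot_encode(["India", "France", "U.S."], categories)
for row in encoded:
    print(row)
```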

## Supervised Learning Tasks
  - **Classification:**
    - Predicts discrete classes.
      - Binary: e.g., Pizza or not, Spam or not, Cat vs. Dog.
      - Multiclass: e.g., Pictures into Cat, Dog, Ice Cream.

  - **Regression:**
    - Predicts continuous values.
      - e.g., Real estate prices.

## Dataset Representation
- Row: Represents a data sample.
- Column: Represents a unique feature.
- Feature Vector: The feature values of one sample (one row), without the output label.
- Target: Output for the feature vector.

- **Notation:**
  - **Capital X:**
    - List of all feature vectors (Feature Matrix).
  - **Small y:**
    - Label vector or target vector.

## **Dataset:** Heart Attack Prediction

| Height (cm) | Weight (kg) | Glucose Level | Blood Pressure | Exercise (hours/week) | Heart Attack Before 60 |
|-------------|-------------|---------------|-----------------|------------------------|-------------------------|
| 175         | 70          | 90            | 120/80          | 3                      | No                      |
| 160         | 65          | 95            | 130/85          | 2                      | No                      |
| 180         | 80          | 105           | 140/90          | 1                      | Yes                     |
| 165         | 55          | 85            | 110/75          | 4                      | No                      |

- **Features:**
  - Height (cm)
  - Weight (kg)
  - Glucose Level
  - Blood Pressure
  - Exercise (hours/week)

- **Target:** Heart Attack Before 60 (Yes/No)

- **Feature Matrix (Capital X):**
  
| Height (cm) | Weight (kg) | Glucose Level | Blood Pressure | Exercise (hours/week) |
|-------------|-------------|---------------|-----------------|------------------------|
| 175         | 70          | 90            | 120/80          | 3                      |
| 160         | 65          | 95            | 130/85          | 2                      |
| 180         | 80          | 105           | 140/90          | 1                      |
| 165         | 55          | 85            | 110/75          | 4                      |

- **Output vector (Small y):**

| Heart Attack Before 60 |
|------------------------|
| No                     |
| No                     |
| Yes                    |
| No                     |

👉 Each row in the feature matrix corresponds to the entry at the same position in the output vector.
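
A small sketch of building `X` and `y` from the table above. Two choices here are assumptions rather than part of the notes: blood pressure is split into separate systolic and diastolic numbers so every feature is numeric, and the Yes/No target is encoded as 1/0.

```python
# Rows copied from the heart attack table above.
rows = [
    (175, 70, 90, "120/80", 3, "No"),
    (160, 65, 95, "130/85", 2, "No"),
    (180, 80, 105, "140/90", 1, "Yes"),
    (165, 55, 85, "110/75", 4, "No"),
]

X = []  # feature matrix: one feature vector per sample
y = []  # target vector: one label per sample

for height, weight, glucose, blood_pressure, exercise, label in rows:
    # Assumption: split "120/80" into two numeric features.
    systolic, diastolic = (int(part) for part in blood_pressure.split("/"))
    X.append([height, weight, glucose, systolic, diastolic, exercise])
    # Assumption: encode the target as 1 (Yes) or 0 (No).
    y.append(1 if label == "Yes" else 0)

print(X[0], y[0])
```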

## Data Splitting
  - Training: 80%
  - Validation: 10%
  - Testing: 10%

- **Validation Set:**
  - Used during/after training.
  - Reality check for model's handling of unseen data.

- **Test Set:**
  - Assesses final reported model performance.
  - Measures generalizability.
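
The 80/10/10 split can be sketched as follows. Shuffling with a fixed seed before slicing is an assumption added here for reproducibility, and `train_val_test_split` is a hypothetical helper name.

```python
import random


def train_val_test_split(samples, train_frac=0.8, val_frac=0.1, seed=0):
    """Shuffle the samples, then slice them into train/validation/test."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_frac)
    n_val = round(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # whatever remains
    return train, val, test


train, val, test = train_val_test_split(range(100))
print(len(train), len(val), len(test))
```

The test slice stays untouched until the model and its settings are final, which is what makes it a fair measure of generalization.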

## Measuring model performance

- **Loss:**
  - Difference between predictions and actual values.
  - Goal: Minimize during training.

- **L1 Loss:**
  - $L1 = \sum \lvert \text{Predicted} - \text{True} \rvert$
  - Linear loss: the penalty grows in proportion to the size of the error.

- **L2 Loss:**
  - $L2 = \sum (\text{Predicted} - \text{True})^2$
  - Quadratic loss: small errors get a small penalty, large errors get a disproportionately large penalty.

- **Accuracy:**
  - Metric for model performance.
  - Example:

| Actual |  Predicted  |
|-----------|----------|
| Apple     | ✅ Apple    |
| Orange    | ✅ Orange   |
| Apple     | ❌ Orange   |
| Apple     | ✅ Apple    |
| Orange    | ✅ Orange   |

Model accuracy: 80% (4 out of 5 correct predictions).
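
The loss formulas and the accuracy example above can be checked with a short sketch. The function names are illustrative, and the numeric predictions used for the losses are made up to show how the two penalties differ.

```python
def l1_loss(predicted, true):
    """Sum of absolute differences."""
    return sum(abs(p - t) for p, t in zip(predicted, true))


def l2_loss(predicted, true):
    """Sum of squared differences."""
    return sum((p - t) ** 2 for p, t in zip(predicted, true))


def accuracy(predicted, actual):
    """Fraction of predictions that match the actual labels."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


# The fruit example from the table: 4 of 5 predictions are correct.
actual = ["Apple", "Orange", "Apple", "Apple", "Orange"]
predicted = ["Apple", "Orange", "Orange", "Apple", "Orange"]
fruit_accuracy = accuracy(predicted, actual)

# A small error (0.5) versus a large error (3.0), made-up values.
small = (l1_loss([1.5], [1.0]), l2_loss([1.5], [1.0]))
large = (l1_loss([4.0], [1.0]), l2_loss([4.0], [1.0]))
print(fruit_accuracy, small, large)
```

The error grows by a factor of 6 between the two cases; the L1 penalty grows by the same factor, while the L2 penalty grows by a factor of 36.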

## Linear Regression

- **Assumptions:**
  - Linearity: Relationship between variables is linear.
  - Independence: Residuals are independent.
  - Homoscedasticity: Residuals have constant variance.
  - Normality: Residuals follow a normal distribution.

- **Sum of Squared Errors (SSE):**
  - Measures the sum of squared differences between predicted and actual values.
  - $ SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 $
  - Objective: Minimize SSE to obtain the best-fitting line.

- **Simple Linear Regression:**
  - One independent variable.
  - Equation: $ y = b_0 + b_1 \cdot x + \varepsilon $
  - Graph: Represents a straight line.

- **Multiple Linear Regression:**
  - Multiple independent variables.
  - Equation: $ y = b_0 + b_1 \cdot x_1 + b_2 \cdot x_2 + \ldots + b_n \cdot x_n + \varepsilon $
  - Graph: Represents a hyperplane in higher dimensions.

- **ε (Epsilon):**
  - Error term in regression equations.
  - Represents the unobserved factors affecting the dependent variable.
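
For simple linear regression, the line that minimizes SSE has a well-known closed form: $b_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}$ and $b_0 = \bar{y} - b_1 \bar{x}$. The notes do not state this formula, so the sketch below is offered as a standard result rather than the course's method; the function names and data are illustrative.

```python
def fit_simple_linear_regression(xs, ys):
    """Return (b0, b1) minimizing the sum of squared errors."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = y_mean - b1 * x_mean
    return b0, b1


def sse(xs, ys, b0, b1):
    """Sum of squared errors for the line y = b0 + b1 * x."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))


# Made-up data lying roughly on y = 2x.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.2, 4.1, 6.0, 8.1, 9.9]
b0, b1 = fit_simple_linear_regression(xs, ys)
print(round(b0, 4), round(b1, 4), round(sse(xs, ys, b0, b1), 4))
```

Any other choice of intercept or slope gives a larger SSE on the same data, which is what "best-fitting" means here.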

## Evaluation Metrics for Linear Regression

👉 $ \hat{y}_i $ represents the model's predictions.

👉 $ \bar{y} $ is the average of the actual $y$ values.
  
  - **Mean Absolute Error (MAE):**
    - $ MAE = \frac{1}{n} \sum_{i=1}^{n} |y_i - \hat{y}_i| $
  - **Mean Squared Error (MSE):**
    - $ MSE = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 $
  - **Root Mean Squared Error (RMSE):**
    - $ RMSE = \sqrt{MSE} $
  - **R-squared ($ R^2 $): Coefficient of determination**
    - Measures the proportion of variance in the dependent variable explained by the independent variables.
    - $ R^2 = 1 - \frac{RSS}{TSS} $
    - $ RSS $: Residual Sum of Squares ($ \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 $)
    - $ TSS $: Total Sum of Squares ($ \sum_{i=1}^{n} (y_i - \bar{y})^2 $)
    - Typically ranges from 0 to 1; it can be negative when the model fits worse than always predicting $ \bar{y} $.
    - Higher $R^2$ indicates better fit.
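
The formulas above translate directly into code. The sketch below uses made-up values, and `regression_metrics` is a hypothetical helper name.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MAE, MSE, RMSE, and R^2 for paired actual/predicted values."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e ** 2 for e in errors) / n
    rmse = math.sqrt(mse)
    y_mean = sum(y_true) / n
    rss = sum(e ** 2 for e in errors)             # residual sum of squares
    tss = sum((t - y_mean) ** 2 for t in y_true)  # total sum of squares
    r2 = 1 - rss / tss
    return {"mae": mae, "mse": mse, "rmse": rmse, "r2": r2}


y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.5, 5.5, 7.0, 8.0]
scores = regression_metrics(y_true, y_pred)
print(scores)
```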

# K-Means Clustering

![image.png](attachment:image.png)

👉 Unsupervised machine learning algorithm for partitioning data into 'K' clusters.

- **Process:**
  1. **Initialization:** Randomly select 'K' centroids.
  2. **Assignment:** Assign each point to the nearest centroid.
  3. **Update:** Recalculate centroids based on assigned points.
  4. **Repeat:** Repeat the assignment and update steps until the assignments stop changing.
- **Objective:** Minimize intra-cluster variance (the sum of squared distances from each point to its centroid).
- **Applications:** Image segmentation, customer segmentation, anomaly detection.
- **Limitations:** Sensitive to initial centroids, assumes clusters are spherical.
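
The four steps can be sketched in plain Python as below. This is a minimal version under stated assumptions: initial centroids are sampled from the data points with a fixed seed, distance is squared Euclidean, and an empty cluster keeps its previous centroid. The helper names and the sample points are illustrative.

```python
import random


def squared_distance(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def mean_point(points):
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def k_means(points, k, iterations=100, seed=0):
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # 1. Initialization
    for _ in range(iterations):
        # 2. Assignment: each point joins its nearest centroid's cluster.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k),
                          key=lambda i: squared_distance(p, centroids[i]))
            clusters[nearest].append(p)
        # 3. Update: move each centroid to the mean of its cluster.
        new_centroids = [mean_point(c) if c else centroids[i]
                         for i, c in enumerate(clusters)]
        # 4. Repeat until the centroids stop moving.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters


# Two well-separated groups of made-up 2D points.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5),
          (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, clusters = k_means(points, k=2)
print(centroids)
```

On data this cleanly separated the result does not depend on the starting centroids; on harder data it can, which is the sensitivity listed under limitations.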

![image-2.png](attachment:image-2.png)

**🚨 Expectation maximization:** the alternating assignment and update steps of K-Means follow this pattern.
![image-3.png](attachment:image-3.png)

# Principal Component:
  - Direction of maximum variance in data.
  - **Key Points:**
    - Each principal component is orthogonal to others.
    - Used to simplify complex data structures.
  - **Significance:**
    - First component captures the most variance.
    - Each subsequent component captures the most remaining variance, so the variance captured decreases in order.
  - **Application:**
    - Basis for dimensionality reduction in PCA.
    - Represents the dominant patterns in the data.

# Principal Component Analysis (PCA):

![image.png](attachment:image.png)

  - PCA is sensitive to scale, so standardizing is crucial.
  - **Purpose:**
    - Simplify data while retaining key info.
    - Focuses on key features.
  - **Process:**
    - Find axes (principal components) of max variance.
    - Project data onto these components.
  - **Result:**
    - New uncorrelated features (the principal components).
    - Keeping only the first few components reduces dimensionality.
    - Those leading components retain most of the data's variance.
  - **Applications:** Dimensionality reduction, noise reduction.
  - **Useful for:**
    - Visualizing high-dimensional data.
    - Speeding up machine learning algorithms.
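
To make the process concrete, here is a sketch of PCA restricted to two features, where the first principal component of the 2x2 covariance matrix has a closed form. The restriction to 2D, the function name `pca_2d`, and the sample points are all assumptions for illustration. The data is only centered, not standardized, on the assumption that both features share a comparable scale; real use typically relies on a library routine and standardized inputs.

```python
import math


def pca_2d(points):
    """Return the first principal direction, its share of variance,
    and the 1D projection of the centered points onto it."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    centered = [(x - mean_x, y - mean_y) for x, y in points]

    # Sample covariance matrix entries [[cxx, cxy], [cxy, cyy]].
    cxx = sum(x * x for x, _ in centered) / (n - 1)
    cyy = sum(y * y for _, y in centered) / (n - 1)
    cxy = sum(x * y for x, y in centered) / (n - 1)

    # Angle of the direction that maximizes the projected variance.
    theta = 0.5 * math.atan2(2 * cxy, cxx - cyy)
    direction = (math.cos(theta), math.sin(theta))

    # Largest eigenvalue = variance captured along that direction.
    largest = (cxx + cyy) / 2 + math.hypot((cxx - cyy) / 2, cxy)
    explained_ratio = largest / (cxx + cyy)

    # Project: 2 features become 1 (dimensionality reduction).
    projected = [x * direction[0] + y * direction[1] for x, y in centered]
    return direction, explained_ratio, projected


# Made-up, strongly correlated points lying near the line y = 2x.
points = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8), (5.0, 10.1)]
direction, explained_ratio, projected = pca_2d(points)
print(direction, explained_ratio)
```

Because the points lie close to a line, a single component keeps almost all of the variance, so dropping the second one loses little information.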