# **Introduction to Machine Learning and Artificial Intelligence (August - September 2024)**
<br>

<br>

![alt text](image.png)

**Lecturer:** Dr. Darshan Ingle

**Modules Covered:**
Matplotlib (matplotlib), WordCloud (wordcloud), HuggingFace Transformers (transformers), FastText (fasttext), NumPy (numpy), SMOTE (imblearn.over_sampling.SMOTE), GloVe (glove-python), Keras API (tensorflow.keras), NLTK (nltk), Seaborn (seaborn), TQDM (tqdm), TensorFlow (tensorflow), Pandas (pandas), Scikit-learn (sklearn)

<br>
<br>

# Day 4: Artificial Neural Networks (ANN) and Deep Learning. Convolutional Neural Networks (CNN)
## Key Concepts:
**1. Artificial Neural Networks:**
* **Mimic Human Brain:** ANN models aim to simulate how the human brain processes information.
* **Basic Flow:** Input → Processing → Output

**2. Training Process:**
* **Epoch:** One complete pass over the entire training dataset. Within an epoch, each batch goes through one forward pass (propagation) and one backward pass (propagation) through the network.

**3. Frameworks and Libraries:**
* **TensorFlow:** A popular open-source library for building and training neural networks.
* **Keras API:** A high-level API for building and training neural networks that runs on top of TensorFlow (bundled as `tf.keras` since TF 2.0).

**4. Artificial Neuron (Perceptron):**
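* The first mathematical model of an artificial neuron dates to 1943 (McCulloch and Pitts); the perceptron followed in 1957-58 (Rosenblatt). It is a basic unit of neural networks, inspired by biological neurons.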

**5. Example:**
* Predicting whether to attend a Punjabi food festival based on weather, spouse’s preference, and availability of train/metro.

**6. Mathematics of Neurons:**
* **Thresholding:** An artificial neuron fires if the weighted sum of its inputs reaches or exceeds a threshold:
$$ \sum_{i} (X_i \cdot w_i) \geq \text{threshold} $$
$$ \sum_{i} (X_i \cdot w_i) - \text{threshold} \geq 0 $$
* **Bias:** Moving the threshold to the left-hand side and renaming it gives the bias:
$$b = - \text{threshold}$$
* **Activation Decision:** With $w \cdot x = \sum_{i} (X_i \cdot w_i)$:
$$\begin{align*}
\text{If }(w \cdot x) + b &< 0, \quad \text{the output is \textbf{false.}} \\
\text{If }(w \cdot x) + b &\geq 0, \quad \text{the output is \textbf{true.}}
\end{align*}$$
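
The thresholding rule can be sketched in a few lines of NumPy, using the food festival example from above. The weights and the threshold are hypothetical values chosen only for illustration, and `perceptron_fires` is an illustrative helper name, not part of any library.

```python
import numpy as np

def perceptron_fires(x, w, threshold):
    """Return True when the weighted sum of inputs reaches the threshold."""
    b = -threshold                       # bias: b = -threshold
    return bool(np.dot(w, x) + b >= 0)   # fire when sum(X_i * w_i) - threshold >= 0

# Inputs: [good weather, spouse wants to go, train/metro available] (1 = yes, 0 = no)
w = np.array([3.0, 2.0, 2.0])   # hypothetical importance of each factor
threshold = 4.0                 # hypothetical decision threshold

# Good weather and transit available: 3 + 2 = 5, which is above the threshold
go_weather_and_transit = perceptron_fires(np.array([1, 0, 1]), w, threshold)

# Only the spouse wants to go: 2, which is below the threshold
go_only_spouse = perceptron_fires(np.array([0, 1, 0]), w, threshold)

print(go_weather_and_transit, go_only_spouse)
```

Raising a weight makes that factor more decisive, while raising the threshold (a more negative bias) makes the neuron harder to activate.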

**7. Activation Functions:**
* **Early Activation:** Step activation function (discrete, obsolete).
* **Current Activation Functions:**
    * **Binary Classification:** sigmoid
    * **Multiclass Classification:** softmax
    * **Regression:** linear
    * **Hidden Layers:**
        * **ReLU** (Rectified Linear Unit)
        * **Leaky ReLU**
        * **ELU** (Exponential Linear Unit): Allows negative outputs, which pushes mean activations closer to zero and reduces bias shift.
        * **GELU** (Gaussian Error Linear Unit)
        * **SELU** (Scaled Exponential Linear Unit)
        * **SiLU** (Sigmoid Linear Unit)
    * **Tanh:** Hyperbolic tangent function with outputs in the range (-1, 1), useful for NLP tasks.
    * **Mish:** Effective for computer vision tasks.
* [TensorFlow Activation Functions](https://www.tensorflow.org/api_docs/python/tf/keras/activations)
* [Keras Activation Functions](https://keras.io/api/layers/activations/#tanh-function)
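
To make a few of these functions concrete, here is a minimal NumPy sketch of sigmoid, softmax, ReLU, and Leaky ReLU. In practice the Keras implementations linked above would be used; the `alpha=0.01` slope for Leaky ReLU is an illustrative assumption, not a library default.

```python
import numpy as np

def sigmoid(z):
    """Squash any real value into (0, 1); used for binary classification outputs."""
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    """Turn a vector of logits into probabilities that sum to 1."""
    shifted = z - np.max(z, axis=-1, keepdims=True)  # subtract the max for numerical stability
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=-1, keepdims=True)

def relu(z):
    """Pass positive values through unchanged and clamp negatives to 0."""
    return np.maximum(0.0, z)

def leaky_relu(z, alpha=0.01):
    """Like ReLU, but negative inputs keep a small slope (alpha is an assumed value)."""
    return np.where(z > 0, z, alpha * z)

z = np.array([-2.0, 0.0, 3.0])
print("sigmoid:   ", sigmoid(z))
print("softmax:   ", softmax(z))
print("relu:      ", relu(z))
print("leaky_relu:", leaky_relu(z))
print("tanh:      ", np.tanh(z))
```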

**8. Model Building:**
* **Single Perceptron Neural Network Example:** Basic ANN with one neuron.
* **Activation Function:** Sigmoid for binary classification. Softmax for multiclass.
* **Dense Layers:** Fully connected layers where each neuron is connected to every neuron in the previous layer.
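
A single-perceptron network along these lines might look as follows in Keras. This is a sketch, not the lecture's exact notebook: the three input features are an assumption borrowed from the food festival example, and the optimizer and loss are one common choice for binary classification.

```python
import tensorflow as tf

n_features = 3  # hypothetical: weather, spouse's preference, transit availability

model = tf.keras.Sequential([
    tf.keras.Input(shape=(n_features,)),
    # One Dense layer with one neuron: connected to every input, sigmoid output
    tf.keras.layers.Dense(1, activation="sigmoid"),
])

model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()
```

The summary should report one weight per input feature plus one bias. For a multiclass problem, the last layer would instead have one neuron per class with `activation="softmax"`.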

**9. Optimization:**
* **Optimizers:**
    * **Adam:** Adaptive Moment Estimation.
    * **SGD (Stochastic Gradient Descent):** Updates the weights in the direction that reduces the loss, using gradients computed on small batches of the training data.
* **Loss Functions:**
    * **Binary Crossentropy:** Used for binary classification.
    * **Sparse Categorical Crossentropy:** Used for multi-class classification where labels are integers.
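
The two loss functions can be written out by hand to see what they measure. The NumPy sketch below uses made-up labels and predictions; the helper names and the `eps` clipping constant are illustrative assumptions, and Keras provides its own implementations.

```python
import numpy as np

def binary_crossentropy(y_true, p, eps=1e-7):
    """Mean of -[y*log(p) + (1-y)*log(1-p)] over the samples."""
    p = np.clip(p, eps, 1 - eps)  # avoid log(0)
    return float(np.mean(-(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))))

def sparse_categorical_crossentropy(labels, probs, eps=1e-7):
    """Mean of -log(probability assigned to the true class); labels are integers."""
    probs = np.clip(probs, eps, 1.0)
    picked = probs[np.arange(len(labels)), labels]  # probability of the true class per row
    return float(np.mean(-np.log(picked)))

# Binary case: true labels and predicted probabilities of class 1
y = np.array([1, 0])
p = np.array([0.9, 0.2])
bce = binary_crossentropy(y, p)

# Multiclass case: integer labels and one row of class probabilities per sample
labels = np.array([2, 0])
probs = np.array([[0.1, 0.2, 0.7],
                  [0.8, 0.1, 0.1]])
scce = sparse_categorical_crossentropy(labels, probs)

print(bce, scce)
```

Both losses shrink toward 0 as the probability assigned to the correct class approaches 1.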

**10. Training Process:**
* **Forward Pass:** Calculating the output from the input data.
* **Backpropagation:** Computing the gradient of the error from the forward pass and adjusting the weights accordingly.
* **1 Epoch:** One complete pass over the training dataset, with one forward pass and one backward pass for every batch.
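
The relationship between forward pass, backward pass, batch, and epoch can be sketched with a hand-written training loop for a single sigmoid neuron. Everything here is an assumption made for illustration: the synthetic data, the learning rate, and the use of plain mini-batch gradient descent. Keras performs the equivalent steps inside `model.fit()`.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))                 # synthetic inputs
y = (X[:, 0] + X[:, 1] > 0).astype(float)     # synthetic binary labels

def predict(X, w, b):
    return 1.0 / (1.0 + np.exp(-(X @ w + b)))  # forward pass of one sigmoid neuron

def loss_fn(X, y, w, b):
    p = np.clip(predict(X, w, b), 1e-7, 1 - 1e-7)
    return float(np.mean(-(y * np.log(p) + (1 - y) * np.log(1 - p))))

w, b = np.zeros(2), 0.0
lr, batch_size, epochs = 0.5, 32, 5           # hypothetical hyperparameters
updates = 0
initial_loss = loss_fn(X, y, w, b)
epoch_losses = []

for epoch in range(epochs):                   # one epoch = one sweep over all samples
    order = rng.permutation(len(X))           # shuffle the samples each epoch
    for start in range(0, len(X), batch_size):
        idx = order[start:start + batch_size]
        xb, yb = X[idx], y[idx]
        p = predict(xb, w, b)                 # forward pass on this batch
        grad_z = (p - yb) / len(xb)           # backward pass: gradient of the loss w.r.t. w.x + b
        w -= lr * (xb.T @ grad_z)             # weight update
        b -= lr * grad_z.sum()                # bias update
        updates += 1
    epoch_losses.append(loss_fn(X, y, w, b))

print("updates:", updates)
print("loss before:", initial_loss, "loss after:", epoch_losses[-1])
```

With 100 samples and a batch size of 32 there are 4 batches per epoch (the last one holds only 4 samples), so 5 epochs perform 20 weight updates.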

**11. Evaluation Metrics:**
* **Accuracy:** Measure of how often the model's predictions are correct.
* **Graphs:**
    * **Loss/Validation Loss Graph:** Shows how the loss changes over epochs.
    * **Accuracy/Validation Accuracy Graph:** Shows how the accuracy changes over epochs.
* **r.history:** Dictionary of per-epoch training metrics, stored on the `History` object `r` returned by `model.fit()` and used for plotting.
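
A plotting helper for these graphs could look like the sketch below. It assumes the model was compiled with `metrics=["accuracy"]` and trained with validation data, so that the keys `loss`, `val_loss`, `accuracy`, and `val_accuracy` exist. The numbers in `example_history` are invented stand-ins for a real `r.history`, and `plot_history` is a hypothetical helper name.

```python
import matplotlib.pyplot as plt

def plot_history(history):
    """Plot loss and accuracy curves from a dict shaped like r.history."""
    fig, (ax_loss, ax_acc) = plt.subplots(1, 2, figsize=(10, 4))

    ax_loss.plot(history["loss"], label="loss")
    ax_loss.plot(history["val_loss"], label="val_loss")
    ax_loss.set_xlabel("epoch")
    ax_loss.set_title("Loss")
    ax_loss.legend()

    ax_acc.plot(history["accuracy"], label="accuracy")
    ax_acc.plot(history["val_accuracy"], label="val_accuracy")
    ax_acc.set_xlabel("epoch")
    ax_acc.set_title("Accuracy")
    ax_acc.legend()

    return fig

# Invented values standing in for r.history after three epochs
example_history = {
    "loss": [0.69, 0.52, 0.41],
    "val_loss": [0.70, 0.55, 0.47],
    "accuracy": [0.55, 0.74, 0.83],
    "val_accuracy": [0.53, 0.71, 0.79],
}
fig = plot_history(example_history)
```

After real training this would be called as `plot_history(r.history)`. A validation loss that rises while the training loss keeps falling is a common sign of overfitting.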

**12. Output Handling:**
* **Probability Output:** Neural networks often output probabilities (e.g., a value in the range [0, 1] for binary classification with a sigmoid output).
* **Rounding and Flattening:** Round the probabilities to 0 or 1 and flatten the output array to convert it to class labels if necessary.
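
As a small illustration, the array below stands in for the output of `model.predict()` on four samples with a sigmoid output layer; the probabilities are invented for the example.

```python
import numpy as np

# Hypothetical predictions: one probability per sample, shape (4, 1)
probs = np.array([[0.92], [0.13], [0.58], [0.41]])

# Round to the nearest class (0 or 1), then flatten (4, 1) into (4,)
labels = np.round(probs).flatten().astype(int)

print(labels)        # [1 0 1 0]
print(labels.shape)  # (4,)
```

The flattened shape matches a one-dimensional array of true labels, which is what most scikit-learn metric functions expect.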

**13. Saving and Defining Models:**
* Saving: Save the model using a '.keras' extension.
* Define SGD: Custom implementation if needed.
* Pass model for Deployment.
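
One way these steps might fit together is sketched below. It interprets "Define SGD" as configuring the Keras `SGD` optimizer explicitly, which is an assumption; the learning rate, momentum, model shape, and file name are all hypothetical.

```python
import os
import tempfile

import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(3,)),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])

# Explicitly configured SGD instead of the string shortcut "sgd" (values are assumptions)
sgd = tf.keras.optimizers.SGD(learning_rate=0.01, momentum=0.9)
model.compile(optimizer=sgd, loss="binary_crossentropy", metrics=["accuracy"])

# Save in the native Keras format; a temporary directory keeps the sketch self-contained
path = os.path.join(tempfile.mkdtemp(), "festival_model.keras")
model.save(path)

# The saved file can later be loaded, for example by a deployment service
restored = tf.keras.models.load_model(path)
print(restored.count_params())
```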

**14. MNIST Dataset:**
* A classic dataset for digit recognition tasks.

**15. Batch Size:**
* Default is 32 in Keras `model.fit()`.
* Refers to the number of samples processed before the model's internal parameters are updated.
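
The batch size determines how many parameter updates happen in one epoch. A quick calculation, assuming the standard MNIST training split of 60,000 images (`updates_per_epoch` is an illustrative helper, not a library function):

```python
import math

def updates_per_epoch(n_samples, batch_size=32):
    """Number of batches (and therefore weight updates) in one epoch."""
    return math.ceil(n_samples / batch_size)  # the last batch may be smaller

mnist_updates = updates_per_epoch(60_000)      # default batch size of 32
small_updates = updates_per_epoch(100, 32)     # 3 full batches + 1 partial batch

print(mnist_updates)  # 1875
print(small_updates)  # 4
```

This is the step count Keras shows in its progress bar for each epoch.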

**16. Cluster Analysis:**
* **Number of Neurons:** A common heuristic (a pragmatic rule of thumb that is not guaranteed to be optimal) is $2^n$ neurons for hidden layers, where $n$ is the number of input features (perceptron count).

**17. Softmax Activation:**
* Used for multiclass classification to convert logits into probabilities that sum to 1.

**18. Dropout:** 
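* Randomly drops a fraction of neurons during training so the network cannot rely on any single unit; introduced in the paper "Dropout: A Simple Way to Prevent Neural Networks from Overfitting".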
* [Paper](https://www.cs.toronto.edu/~rsalakhu/papers/srivastava14a.pdf)
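
The mechanism can be sketched in NumPy. This is the "inverted dropout" variant, which rescales the surviving activations during training so that nothing changes at inference time; the paper itself describes scaling the weights at test time instead. The function name and the rate of 0.5 are illustrative assumptions.

```python
import numpy as np

def dropout(activations, rate, training, rng):
    """Zero out a random fraction `rate` of activations during training."""
    if not training or rate == 0.0:
        return activations                      # inference: pass everything through
    keep_prob = 1.0 - rate
    mask = rng.random(activations.shape) < keep_prob  # True for units that survive
    return activations * mask / keep_prob       # rescale so the expected value is unchanged

rng = np.random.default_rng(42)
a = np.ones((1000, 10))                         # stand-in for a layer's activations

train_out = dropout(a, rate=0.5, training=True, rng=rng)
infer_out = dropout(a, rate=0.5, training=False, rng=rng)

print("fraction dropped:", np.mean(train_out == 0))
print("mean during training:", train_out.mean())
```

In Keras the same idea is available as `tf.keras.layers.Dropout(rate)`, placed between layers.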

**19. Transfer Learning:**
* **Concept of Transfer Learning:** A pretrained model is reused or adapted for a different but related task.
* Image handling using Transfer Learning with pretrained models
* **Example:** Image recognition using VGG16
* **Preprocessing for VGG16:**
    * Typical preprocessing steps to adapt data for the VGG16 model involve resizing and normalization.
* [Pretrained Model Suggestions ChatGPT](https://chatgpt.com/share/1f3d3db9-5181-40c1-a29a-c981d77cf8ad)
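
The preprocessing step for VGG16 might look like the sketch below. The random array is a stand-in for a real image batch, and its size is an arbitrary assumption. Loading the pretrained network itself (for example `tf.keras.applications.VGG16(weights="imagenet", include_top=False)`) is left out because it downloads the ImageNet weights.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras.applications.vgg16 import preprocess_input

# Hypothetical batch: one RGB image of 300x400 pixels with values in [0, 255]
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(1, 300, 400, 3)).astype("float32")

# Resize to 224x224, the default input size of VGG16
resized = tf.image.resize(images, (224, 224))

# VGG16-specific normalization: RGB -> BGR and per-channel ImageNet mean subtraction
prepared = preprocess_input(resized.numpy())

print(prepared.shape)  # (1, 224, 224, 3)
```

The prepared batch can then be passed to the pretrained VGG16 base, with a small new classifier trained on top for the related task.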