# Machine Learning Terminologies

## Basic Concepts

### Algorithm

A set of rules or steps used to solve a problem. In machine learning, algorithms process data to make predictions or decisions.

### Model

A mathematical representation created by a machine learning algorithm. It makes predictions or decisions based on input data.

### Training

The process of teaching a model to make predictions by feeding it data and adjusting its parameters.

### Dataset

A collection of data used to train, validate, and test machine learning models.

### Feature

An individual measurable property or characteristic of a phenomenon being observed. Features are the input variables used in predictions.

### Label

The output variable or the target that the model is trying to predict.

## Types of Learning

### Supervised Learning

A type of machine learning where the model is trained on labeled data, meaning the input comes with the correct output.

### Unsupervised Learning

A type of machine learning where the model is trained on data without labels and must find patterns or structure in the data.

### Reinforcement Learning

A type of machine learning where an agent learns to make decisions by receiving rewards or penalties based on its actions.

## Model Evaluation

### Classification

A type of supervised learning where the model predicts a discrete label, such as "spam" or "not spam".

### Regression

A type of supervised learning where the model predicts a continuous value, such as house prices or stock prices.

### Clustering

A type of unsupervised learning where the model groups similar data points together, such as customer segmentation.

### Overfitting

When a model fits the training data too closely, effectively memorizing it along with its noise and outliers, and as a result performs poorly on new data.

### Underfitting

When a model is too simple to capture the underlying patterns in the data, leading to poor performance.

### Validation Set

A subset of the dataset, held out from training, used to tune hyperparameters and assess the model's performance during training.

### Test Set

A subset of the dataset used to evaluate the final model performance and generalization to new data.

### Cross-Validation

A technique for assessing how well a model generalizes by splitting the dataset into multiple training and validation sets.
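As a rough illustration, k-fold cross-validation can be sketched in plain Python. The helper name `k_fold_indices` is hypothetical, and the sketch assumes the data is already shuffled; libraries normally offer shuffling and stratification on top of this.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, val_indices) pairs for k roughly equal folds."""
    indices = list(range(n_samples))
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, val
        start += size


folds = list(k_fold_indices(n_samples=10, k=3))
for train, val in folds:
    print("train:", train, "val:", val)
```

Each sample lands in exactly one validation fold, so every data point is used for both training and validation across the k rounds.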

## Errors and Regularization

### Bias

Error introduced by approximating a complex real-world problem with a simplified model. High bias can lead to underfitting.

### Variance

The model's sensitivity to changes in the training data. High variance can lead to overfitting.

### Regularization

Techniques used to prevent overfitting by adding a penalty to the loss function for large model parameters, such as L1 or L2 regularization.
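A minimal sketch of how a penalty term enters the loss, assuming a linear model and mean squared error as the base loss. The function names and the strength `lam` are illustrative choices, not a library API.

```python
def predict(weights, x):
    # Linear model without a bias term: a dot product of weights and features.
    return sum(w * xi for w, xi in zip(weights, x))


def mse(predictions, targets):
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)


def l1_penalty(weights):
    return sum(abs(w) for w in weights)


def l2_penalty(weights):
    return sum(w ** 2 for w in weights)


def regularized_loss(weights, inputs, targets, lam, penalty):
    predictions = [predict(weights, x) for x in inputs]
    # Total loss = data-fit term + lam * size-of-weights term.
    return mse(predictions, targets) + lam * penalty(weights)


inputs = [[1.0, 2.0], [2.0, 0.0], [0.0, 1.0]]
targets = [-1.0, 1.5, -0.5]
weights = [0.5, -1.0]

loss_l1 = regularized_loss(weights, inputs, targets, lam=0.1, penalty=l1_penalty)
loss_l2 = regularized_loss(weights, inputs, targets, lam=0.1, penalty=l2_penalty)
print(loss_l1, loss_l2)
```

Larger values of `lam` push the optimizer toward smaller weights at the cost of a worse fit to the training data.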

## Optimization

### Loss Function

A function that measures the difference between the model's predictions and the actual labels. It guides the optimization process.

### Optimization

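The process of adjusting the model's parameters to minimize the loss function. In PyTorch, for example, an optimizer such as stochastic gradient descent is constructed from the model's parameters and a learning rate (`model` and `learning_rate` are assumed to be defined elsewhere):

`optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)`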

### Gradient Descent

An optimization algorithm that minimizes the loss function by iteratively updating the model's parameters in the direction of steepest descent, that is, along the negative gradient.
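The update rule can be sketched on a one-parameter toy problem. The loss `(w - 3)^2`, the starting point, and the learning rate are all assumptions chosen for illustration.

```python
def loss(w):
    return (w - 3.0) ** 2


def grad(w):
    # Derivative of (w - 3)^2 with respect to w.
    return 2.0 * (w - 3.0)


def gradient_descent(grad_fn, w0, learning_rate, n_steps):
    w = w0
    for _ in range(n_steps):
        # Step against the gradient, scaled by the learning rate.
        w = w - learning_rate * grad_fn(w)
    return w


w_final = gradient_descent(grad, w0=0.0, learning_rate=0.1, n_steps=100)
print(w_final, loss(w_final))
```

With this learning rate the distance to the minimum at `w = 3` shrinks by a constant factor each step; a rate that is too large would instead overshoot and diverge.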

### Epoch

One complete pass through the entire training dataset.

### Batch Size

The number of training examples used in one iteration of the optimization process.
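How epochs, batches, and iterations relate can be shown with a small counting sketch. The dataset here is just a list of ten integers, and `iterate_batches` is a hypothetical helper, not a library function.

```python
import math


def iterate_batches(dataset, batch_size):
    # The last batch may be smaller when the sizes do not divide evenly.
    for start in range(0, len(dataset), batch_size):
        yield dataset[start:start + batch_size]


dataset = list(range(10))
batch_size = 4
n_epochs = 3

iterations = 0
for epoch in range(n_epochs):
    for batch in iterate_batches(dataset, batch_size):
        iterations += 1  # one parameter update would happen here

iterations_per_epoch = math.ceil(len(dataset) / batch_size)
print(iterations_per_epoch, iterations)
```

Ten examples with a batch size of four give three iterations per epoch, so three epochs amount to nine parameter updates.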

### Learning Rate

A hyperparameter that controls the step size during gradient descent. It determines how quickly or slowly the model learns.

### Hyperparameter

A setting chosen before training that controls the training process and is not learned from the data, such as the learning rate or batch size.

## Neural Networks and Deep Learning

### Neural Network

A computational model inspired by the human brain, consisting of layers of interconnected nodes (neurons) that learn to recognize patterns in data.

### Deep Learning

A subset of machine learning involving neural networks with many layers (deep neural networks) that can learn complex patterns in large datasets.

### Activation Function

A function applied to a neuron's output in a neural network, introducing non-linearity and allowing the network to learn complex patterns.

### Convolutional Neural Network (CNN)

A type of neural network designed for processing grid-structured data, such as images. CNNs are used primarily for image recognition and processing because their convolutional layers detect local patterns wherever they appear in the input.

### Recurrent Neural Network (RNN)

A type of neural network designed for sequential data, such as time series or natural language.

### Transfer Learning

A technique where a pre-trained model is used as the starting point for a new task, leveraging learned knowledge from the previous task.

### Feature Engineering

The process of selecting, modifying, and creating new features from raw data to improve model performance.

### Dimensionality Reduction

Techniques used to reduce the number of features in a dataset while preserving important information, such as PCA (Principal Component Analysis).

## Advanced Concepts

### Computation Graph

A visual and mathematical way to represent the sequence of operations that are performed to compute a function. It is used in deep learning to represent the operations that transform input data through various layers to produce the output.

### Automatic Differentiation

A technique used to automatically compute the gradients of functions. In PyTorch, this is handled by `torch.autograd`, which tracks operations on tensors to enable gradient computation during backpropagation.

### Gradient

A measure of how a function changes as its input changes. In machine learning, the gradient represents the rate of change of the loss function with respect to the model's parameters.
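One way to make the definition concrete is to estimate a gradient numerically with central differences. The single-example squared-error loss below is an assumed toy setup, and real frameworks compute gradients analytically rather than this way.

```python
def numerical_gradient(f, params, eps=1e-6):
    """Estimate the partial derivative of f with respect to each parameter."""
    grads = []
    for i in range(len(params)):
        up = list(params)
        down = list(params)
        up[i] += eps
        down[i] -= eps
        grads.append((f(up) - f(down)) / (2 * eps))
    return grads


def loss(params):
    w, b = params
    # Squared error of the prediction w * x + b for one example (x=2, y=7).
    return (w * 2.0 + b - 7.0) ** 2


g = numerical_gradient(loss, [1.0, 0.5])
print(g)  # approximately [-18.0, -9.0]
```

Both entries are negative, which says that increasing either `w` or `b` would reduce the loss at this point.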

### Backpropagation

A method used to compute the gradient of the loss function with respect to each parameter by applying the chain rule of calculus. It allows the optimizer to update the parameters in a way that reduces the loss.
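The chain rule can be traced by hand on a single sigmoid neuron with a squared-error loss. This is a sketch under those assumptions, with illustrative values for `w`, `b`, `x`, and `y`; it is not how any particular framework implements backpropagation.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(w, b, x, y):
    z = w * x + b          # pre-activation
    a = sigmoid(z)         # activation
    loss = (a - y) ** 2    # squared error
    return z, a, loss


def backward(w, b, x, y):
    _, a, _ = forward(w, b, x, y)
    dloss_da = 2.0 * (a - y)        # d(loss)/d(a)
    da_dz = a * (1.0 - a)           # derivative of the sigmoid
    dloss_dz = dloss_da * da_dz     # chain rule
    return dloss_dz * x, dloss_dz   # d(loss)/d(w), d(loss)/d(b)


w, b, x, y = 0.5, -0.2, 1.5, 1.0
dw, db = backward(w, b, x, y)

# Sanity check against a central finite difference.
eps = 1e-6
dw_numeric = (forward(w + eps, b, x, y)[2] - forward(w - eps, b, x, y)[2]) / (2 * eps)
db_numeric = (forward(w, b + eps, x, y)[2] - forward(w, b - eps, x, y)[2]) / (2 * eps)
print(dw, dw_numeric)
print(db, db_numeric)
```

The analytic and numerical values should agree to several decimal places, which is a common way to test a hand-written backward pass.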

## Common Loss Functions

Common loss functions in PyTorch include `nn.MSELoss` (mean squared error) for regression tasks and `nn.NLLLoss` (negative log likelihood) for classification. `nn.CrossEntropyLoss` combines `nn.LogSoftmax` and `nn.NLLLoss` in a single module.
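The relationship between log-softmax, negative log likelihood, and cross-entropy can be checked with a pure-Python sketch for a single example. The functions below are illustrative stand-ins, not PyTorch's implementation, and the logits are made up.

```python
import math


def log_softmax(logits):
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(v - m) for v in logits))
    return [v - log_sum for v in logits]


def nll_loss(log_probs, target):
    # Negative log probability assigned to the correct class.
    return -log_probs[target]


def cross_entropy(logits, target):
    return nll_loss(log_softmax(logits), target)


logits = [2.0, 1.0, 0.1]
target = 0
ce = cross_entropy(logits, target)
print(ce)
```

Because the log-softmax step is built in, a cross-entropy loss of this kind takes raw scores (logits) as input rather than probabilities.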

## Additional Machine Learning Terminologies

### Ensemble Learning

A machine learning technique that combines multiple models to improve performance. Examples include Random Forests and Gradient Boosting Machines (GBMs).

### Bagging

A technique in ensemble learning where multiple models are trained independently on different subsets of the data, and their predictions are aggregated to make a final prediction.

### Boosting

A technique in ensemble learning where models are trained sequentially, with each model trying to correct the errors of its predecessors.

### Dropout

A regularization technique used in neural networks to prevent overfitting by randomly deactivating a fraction of neurons during training.
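A minimal sketch of one common variant, often called inverted dropout, in which surviving activations are scaled up during training so that no rescaling is needed at inference time. The function signature and the seeded generator are assumptions made for reproducibility.

```python
import random


def dropout(activations, p, training, rng):
    """Zero each activation with probability p and rescale the survivors."""
    if not training or p == 0.0:
        return list(activations)
    keep = 1.0 - p
    # Dividing by the keep probability preserves the expected activation.
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


rng = random.Random(0)
activations = [1.0] * 10000

train_out = dropout(activations, p=0.5, training=True, rng=rng)
eval_out = dropout(activations, p=0.5, training=False, rng=rng)

print(sum(train_out) / len(train_out))  # close to 1.0 on average
print(eval_out[:3])
```

At evaluation time the layer is an identity function, which is why frameworks distinguish between training and evaluation modes.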

### Batch Normalization

A technique used in neural networks to normalize the activations of a layer across each mini-batch, which can speed up training and improve performance.

### Word Embedding

A representation of words as dense vectors in a continuous vector space. Word embeddings capture semantic relationships between words and are commonly used in natural language processing tasks.

### Attention Mechanism

A mechanism used in neural networks, particularly in sequence-to-sequence models, to focus on relevant parts of the input sequence when making predictions.
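One widely used formulation is scaled dot-product attention, sketched below for a single query in plain Python. The query, keys, and values are made-up vectors, and other attention variants score relevance differently.

```python
import math


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(query, keys, values):
    d = len(query)
    # Similarity of the query to each key, scaled by sqrt(dimension).
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = softmax(scores)
    # Output is the weighted average of the value vectors.
    output = [sum(w * v[j] for w, v in zip(weights, values))
              for j in range(len(values[0]))]
    return weights, output


query = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
values = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]

weights, output = attention(query, keys, values)
print(weights)
print(output)
```

The key most aligned with the query receives the largest weight, so its value vector contributes most to the output.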

### Generative Adversarial Networks (GANs)

A type of neural network architecture consisting of two networks, a generator and a discriminator, that are trained together in a competitive setting. GANs are used to generate new data samples that are similar to a given dataset.

### Autoencoder

A neural network architecture used for unsupervised learning, where the model learns to encode input data into a lower-dimensional representation and then decode it back to the original input.

### Hyperparameter Tuning

The process of finding the optimal values for hyperparameters to improve the performance of a machine learning model. Techniques include grid search, random search, and Bayesian optimization.

### Precision and Recall

Metrics used to evaluate the performance of classification models. Precision measures the proportion of true positives among all predicted positives, while recall measures the proportion of true positives among all actual positives.

### F1 Score

The harmonic mean of precision and recall, used as a single metric to evaluate the overall performance of a classification model.
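Precision, recall, and the F1 score can be computed directly from true and predicted labels. The labels below are invented for illustration, and returning 0.0 when a denominator is zero is one convention among several.

```python
def confusion_counts(y_true, y_pred, positive=1):
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    return tp, fp, fn


def precision_recall_f1(y_true, y_pred, positive=1):
    tp, fp, fn = confusion_counts(y_true, y_pred, positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 1, 1, 0]

precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(precision, recall, f1)
```

Here the model finds three of the four actual positives (recall 0.75) but two of its five positive predictions are wrong (precision 0.6); the F1 score sits between the two, closer to the lower value.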

### ROC Curve and AUC

Tools used to evaluate the performance of binary classification models. The ROC curve plots the true positive rate against the false positive rate at various thresholds, and the AUC (Area Under the Curve) measures the area under the ROC curve.

### K-Means Clustering

A popular clustering algorithm that partitions data into K clusters based on similarity. It aims to minimize the within-cluster variance.
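The usual iterative procedure (assign each point to its nearest centroid, then move each centroid to the mean of its points) can be sketched for one-dimensional data. The data, the initial centroids, and the fixed iteration count are assumptions; practical implementations also handle initialization and convergence checks.

```python
def k_means_1d(points, centroids, n_iters=10):
    clusters = []
    for _ in range(n_iters):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: (p - centroids[i]) ** 2)
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    # clusters holds the assignment made in the final iteration.
    return centroids, clusters


points = [1.0, 1.5, 2.0, 9.0, 10.0, 11.0]
centroids, clusters = k_means_1d(points, centroids=[0.0, 5.0])
print(centroids)  # [1.5, 10.0]
print(clusters)
```

The result depends on the initial centroids, which is why the algorithm is often run several times with different starting points.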

### Support Vector Machine (SVM)

A supervised learning algorithm used for classification and regression tasks. For classification, an SVM finds the hyperplane that separates the data points of different classes with the largest margin, optionally after mapping the data into a higher-dimensional space.

### One-Hot Encoding

A technique used to convert categorical variables into a numerical representation. Each category is represented by a binary vector, where only one element is 1 (hot) and the rest are 0 (cold).
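A short sketch of the encoding, assuming categories are ordered alphabetically. The helper `one_hot_encode` is hypothetical and not taken from any library.

```python
def one_hot_encode(values):
    # Fix a stable column order for the categories.
    categories = sorted(set(values))
    index = {category: i for i, category in enumerate(categories)}
    encoded = []
    for value in values:
        vector = [0] * len(categories)
        vector[index[value]] = 1
        encoded.append(vector)
    return categories, encoded


categories, encoded = one_hot_encode(["red", "green", "blue", "green"])
print(categories)  # ['blue', 'green', 'red']
print(encoded)     # [[0, 0, 1], [0, 1, 0], [1, 0, 0], [0, 1, 0]]
```

The vector length equals the number of distinct categories, so this representation grows quickly for high-cardinality variables.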

### Cross-Entropy Loss

A loss function commonly used in classification tasks, particularly when the output of the model is a probability distribution over multiple classes.

### Word2Vec

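A popular word embedding technique that learns vector representations of words from a text corpus, either by predicting the surrounding words from a target word (skip-gram) or by predicting a word from its surrounding context (CBOW).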

### Long Short-Term Memory (LSTM)

A type of recurrent neural network architecture designed to capture long-term dependencies in sequential data, such as time series or natural language.

### Transformer

A deep learning architecture built around self-attention rather than recurrence or convolution, used primarily in natural language processing tasks such as machine translation and language modeling.

### Reinforcement Learning Terms

-   **Policy**: A strategy or rule that an agent uses to make decisions in a reinforcement learning environment.
-   **Value Function**: A function that estimates the expected return (total reward) of following a particular policy.
-   **Q-Learning**: A model-free reinforcement learning algorithm that learns the value of taking an action in a particular state.
-   **Exploration vs. Exploitation**: The trade-off between trying out new actions (exploration) and selecting actions with the highest known rewards (exploitation) in reinforcement learning.
-   **Markov Decision Process (MDP)**: A mathematical framework used to model decision-making processes in reinforcement learning, consisting of states, actions, transition probabilities, and rewards.
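Several of the terms above meet in the tabular Q-learning update, sketched here with an epsilon-greedy policy. The two-state table, the reward, and the values of `alpha` (learning rate) and `gamma` (discount factor) are assumptions for illustration.

```python
import random


def q_update(q_table, state, action, reward, next_state, alpha=0.5, gamma=0.9):
    # Target: immediate reward plus discounted value of the best next action.
    best_next = max(q_table[next_state].values())
    td_target = reward + gamma * best_next
    q_table[state][action] += alpha * (td_target - q_table[state][action])
    return q_table[state][action]


def epsilon_greedy(q_table, state, epsilon, rng):
    # Explore with probability epsilon, otherwise exploit the best known action.
    if rng.random() < epsilon:
        return rng.choice(list(q_table[state]))
    return max(q_table[state], key=q_table[state].get)


q_table = {
    "s0": {"left": 0.0, "right": 0.0},
    "s1": {"left": 0.0, "right": 2.0},
}

new_value = q_update(q_table, "s0", "right", reward=1.0, next_state="s1")
greedy_action = epsilon_greedy(q_table, "s0", epsilon=0.0, rng=random.Random(0))
print(new_value, greedy_action)
```

With `epsilon=0.0` the policy is purely greedy; raising it makes the agent try actions whose value it has not yet estimated well.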

### Natural Language Processing (NLP) Terms

-   **Tokenization**: The process of breaking down a text into smaller units, such as words or subwords, for further analysis.
-   **N-gram**: A contiguous sequence of N items (typically words or characters) in a text.
-   **TF-IDF (Term Frequency-Inverse Document Frequency)**: A numerical statistic that reflects the importance of a word in a document relative to a collection of documents.
-   **Named Entity Recognition (NER)**: A task in NLP that involves identifying and classifying named entities (such as names of people, organizations, and locations) in a text.
-   **Word Sense Disambiguation**: The task of determining the correct meaning (sense) of a word in context, especially when the word has multiple meanings.
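Tokenization, n-grams, and TF-IDF from the list above can be sketched in a few lines. Whitespace tokenization and the unsmoothed `log(N / df)` form of IDF are simplifying assumptions; libraries typically use smoothed variants and subword tokenizers.

```python
import math


def tokenize(text):
    # Naive tokenizer: lowercase and split on whitespace.
    return text.lower().split()


def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def tf_idf(term, doc_tokens, corpus_tokens):
    tf = doc_tokens.count(term) / len(doc_tokens)
    docs_with_term = sum(1 for doc in corpus_tokens if term in doc)
    idf = math.log(len(corpus_tokens) / docs_with_term) if docs_with_term else 0.0
    return tf * idf


corpus = ["The cat sat", "The dog sat down", "The cat ran"]
docs = [tokenize(text) for text in corpus]

print(ngrams(docs[0], 2))             # [('the', 'cat'), ('cat', 'sat')]
print(tf_idf("the", docs[0], docs))   # 0.0, since "the" appears in every document
print(tf_idf("dog", docs[1], docs))
```

A word that occurs in every document gets a score of zero, while a word concentrated in one document scores highest there.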

### Layer

A group of individual units, called neurons, that operate at the same depth in a neural network. An artificial neuron is loosely modeled on a biological neuron: it receives input from other neurons, performs some processing, and produces an output.

### ReLU (Rectified Linear Unit)

Also known as the rectifier activation function, defined as `ReLU(x) = max(0, x)`.

Unlike the sigmoid and tanh functions, ReLU does not saturate for positive inputs: its output keeps growing linearly instead of flattening out, so its gradient there stays at 1. For negative inputs both the output and the gradient are 0.
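The difference in saturation can be seen by comparing slopes at a large input. This sketch estimates slopes with a finite difference; the helper `slope` and the test point `x = 10` are illustrative choices.

```python
import math


def relu(x):
    return max(0.0, x)


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def slope(f, x, eps=1e-4):
    # Central finite-difference estimate of the derivative at x.
    return (f(x + eps) - f(x - eps)) / (2 * eps)


for name, f in [("relu", relu), ("sigmoid", sigmoid), ("tanh", math.tanh)]:
    print(name, "slope at x=10:", slope(f, 10.0))

print("relu slope at x=-10:", slope(relu, -10.0))
```

At `x = 10` the ReLU slope is about 1 while the sigmoid and tanh slopes are close to 0, which is the flattening that slows learning in saturated units. ReLU is flat only on the negative side, where its slope is 0.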
