## Stratification

- **Definition**: Stratification is the process of dividing members of the population into homogeneous subgroups (strata) before sampling. The strata must form a partition of the population: they should be collectively exhaustive and mutually exclusive, so that every element in the population is assigned to exactly one stratum.

- **Use case and Intuition**: Stratification is used when a researcher wants the sample to represent certain characteristics of the population in their correct proportions. The strata are formed from attributes that members share, such as income level or education level.

- **5 Common Usages**:
    1. Stratified Random Sampling: In statistical surveys, the population is divided into strata and a simple random sample is drawn from each stratum. Under proportional allocation, each stratum's sample size is proportional to that stratum's share of the population. The per-stratum samples are then pooled to form the overall sample.
    2. Stratified Shuffle Split: A combination of stratified k-fold and shuffle split (scikit-learn's `StratifiedShuffleSplit`) that returns stratified, randomized train/test splits. Each split preserves the percentage of samples for each class. Unlike k-fold, the test sets of different splits are not guaranteed to be disjoint.
    3. Stratified Cross-Validation: The folds are selected so that the mean response value is approximately equal across all folds. For binary classification, this means each fold contains roughly the same proportions of the two class labels.
    4. Stratified Train/Test Split: The data is split so that the training and test sets keep the class proportions observed in the original dataset.
    5. Stratified Sampling for Imbalanced Datasets: When one class is rare, stratification ensures that the training, validation, and test sets each contain that class in the same proportion as the original dataset. It preserves the imbalance rather than correcting it.

- **Assumptions and Cautions**: Stratification assumes that the population can be cleanly divided into discrete subgroups on the stratifying variable. If the strata are defined incorrectly, or do not reflect the actual structure of the population, the sample can suffer from selection bias and the resulting estimates become less reliable. Each stratum also needs enough members to be represented in every sample or split.

- **Interpretation**: In machine learning, stratification ensures that each subset of the dataset has approximately the same target-class proportions as the original dataset. This is particularly useful in classification problems with imbalanced classes. In those problems, a purely random split could leave a rare class under-represented, or absent entirely, in some subsets.
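
The proportional allocation behind stratified sampling and stratified train/test splits can be sketched in plain Python. This is a minimal illustration that assumes the stratifying variable is a list of class labels. The function `stratified_split` is a hypothetical helper, not a library API. In practice, scikit-learn's `train_test_split` with its `stratify` argument serves the same purpose.

```python
import random
from collections import Counter, defaultdict


def stratified_split(labels, test_fraction, seed=0):
    """Split row indices into (train, test) while preserving class proportions.

    Proportional allocation: each stratum (class) sends
    round(test_fraction * stratum_size) of its rows to the test set.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)  # class label -> row indices in that stratum
    for i, y in enumerate(labels):
        strata[y].append(i)

    train_idx, test_idx = [], []
    for members in strata.values():
        rng.shuffle(members)  # simple random sampling within the stratum
        n_test = round(test_fraction * len(members))
        test_idx.extend(members[:n_test])
        train_idx.extend(members[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Imbalanced toy labels: 90 negatives and 10 positives (10% positive).
labels = [0] * 90 + [1] * 10
train_idx, test_idx = stratified_split(labels, test_fraction=0.2)

print(Counter(labels[i] for i in train_idx))  # Counter({0: 72, 1: 8})
print(Counter(labels[i] for i in test_idx))   # Counter({0: 18, 1: 2})
```

Both subsets keep the original 10% positive rate. An unstratified random 20-row test set drawn from the same data can end up with no positives at all.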

## Cross-Validation

- **Definition**: Cross-validation is a resampling procedure used to evaluate machine learning models on a limited data sample. In its most common form, k-fold cross-validation, the procedure has a single parameter k: the number of groups (folds) the data sample is split into.

- **Use case and Intuition**: Cross-validation is primarily used in applied machine learning to estimate how well a model will perform on unseen data. It uses a limited sample to estimate how the model is expected to perform in general when making predictions on data that was not used to train it.

- **5 Common Usages**:
    1. K-Fold Cross-Validation: The dataset is divided into k groups, or folds. The model is trained on k-1 folds and tested on the remaining fold. This is repeated k times so that each fold serves as the test set exactly once and yields one performance score. The k scores are usually summarized by their mean. With k set to 5 or 10, k-fold is a widely used standard for estimating how a machine learning algorithm will perform on unseen data.
    2. Stratified K-Fold Cross-Validation: A variation of k-fold that returns stratified folds: each fold contains approximately the same percentage of samples of each target class as the complete set. It is used when the target classes are imbalanced, so that every fold contains examples of each class.
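
The fold logic for k-fold and stratified k-fold can be sketched with the standard library only. The helpers `kfold_indices`, `stratified_kfold_indices`, `cross_validate`, `fit_majority`, and `accuracy` are hypothetical names. The majority-class "model" is a stand-in so that the loop runs without a learning library. scikit-learn's `KFold`, `StratifiedKFold`, and `cross_val_score` are the usual tools in practice. Dealing each class round-robin across the folds is one simple way to stratify, not necessarily how any particular library does it.

```python
import random
from collections import Counter, defaultdict


def kfold_indices(n, k, seed=0):
    """Plain k-fold: shuffle row indices 0..n-1 and deal them into k folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[f::k] for f in range(k)]


def stratified_kfold_indices(labels, k, seed=0):
    """Stratified k-fold: deal each class's shuffled indices round-robin over k folds."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for i, y in enumerate(labels):
        strata[y].append(i)
    folds = [[] for _ in range(k)]
    offset = 0
    for members in strata.values():
        rng.shuffle(members)
        for j, i in enumerate(members):
            folds[(offset + j) % k].append(i)
        offset += len(members)  # keep dealing where the previous class stopped
    return folds


def cross_validate(X, y, folds, fit, score):
    """Train on k-1 folds, score on the held-out fold; return one score per fold."""
    scores = []
    for f, test_idx in enumerate(folds):
        train_idx = [i for g, fold in enumerate(folds) if g != f for i in fold]
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        scores.append(score(model, [X[i] for i in test_idx], [y[i] for i in test_idx]))
    return scores


def fit_majority(X_train, y_train):
    """Stand-in 'model': remember the most frequent training label."""
    return Counter(y_train).most_common(1)[0][0]


def accuracy(model, X_test, y_test):
    """Fraction of test rows whose label equals the predicted majority label."""
    return sum(label == model for label in y_test) / len(y_test)


labels = [0] * 90 + [1] * 10
X = list(range(100))  # placeholder features; the stand-in model ignores them

print([len(fold) for fold in kfold_indices(100, k=5)])  # [20, 20, 20, 20, 20]

folds = stratified_kfold_indices(labels, k=5)
print([Counter(labels[i] for i in fold)[1] for fold in folds])  # [2, 2, 2, 2, 2]

scores = cross_validate(X, labels, folds, fit_majority, accuracy)
print(sum(scores) / len(scores))  # 0.9
```

Each stratified fold keeps the 2-in-20 positive rate of the full data, so no fold lacks the minority class. The 0.9 mean accuracy also shows why cross-validated scores are best read against a baseline: always predicting the majority class already achieves it here.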

## Feature Engineering

- **Definition**: Feature engineering is the process of using domain knowledge to extract features from raw data via data mining techniques. These features can be used to improve the performance of machine learning algorithms.

- **Use case and Intuition**: Feature engineering applies to almost every data science project, because it extracts the most relevant information from the raw data. This often improves the performance of machine learning models.

- **5 Common Usages**:
    1. One-Hot Encoding: Converts a categorical variable with m categories into m binary indicator columns, one per category. This lets algorithms that require numeric input use categorical data without implying an order between the categories. It is one of the most common feature engineering steps.
    2. Binning: The process of transforming continuous numerical variables into discrete categories for grouped analysis.
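    3. Polynomial Features: Creates powers of features and products between them (interaction terms), which lets linear models capture nonlinear relationships and interactions among features.
    4. Custom Transformations: Applying logarithms, square roots, or reciprocals to reduce the skewness of the data.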
    5. Date/Time Features: Extracting information like 'month of the year', 'day of the week', 'hour of the day', etc.

- **Assumptions and Cautions**: Feature engineering is more of an art than a science, and it heavily depends on the dataset and the problem at hand. It's always important to understand the underlying data and the business problem before deciding on the most appropriate feature engineering techniques.

- **Interpretation**: Feature engineering can significantly improve the performance of machine learning models by creating meaningful features from the data.
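
Several of the feature engineering techniques above reduce to a few lines of Python. The sketch below is illustrative only. The helper names are hypothetical, and the bin edges (18 and 65) are arbitrary example values. scikit-learn's `OneHotEncoder`, `KBinsDiscretizer`, and `PolynomialFeatures` are the usual library equivalents.

```python
import bisect
import math
from datetime import datetime
from itertools import combinations_with_replacement


def one_hot(values):
    """One indicator column per category; exactly one 1 per row."""
    categories = sorted(set(values))
    return categories, [[int(v == c) for c in categories] for v in values]


def bin_value(x, edges):
    """Bin index for x: bin 0 is below edges[0]; bin i covers [edges[i-1], edges[i])."""
    return bisect.bisect_right(edges, x)


def degree2_features(row):
    """Original columns, then all squares and pairwise products (degree-2 terms)."""
    pairs = combinations_with_replacement(range(len(row)), 2)
    return list(row) + [row[i] * row[j] for i, j in pairs]


def date_parts(ts):
    """Calendar features from a timestamp (weekday: Monday=0 through Sunday=6)."""
    return {"month": ts.month, "weekday": ts.weekday(), "hour": ts.hour}


cats, encoded = one_hot(["red", "green", "red", "blue"])
print(cats)     # ['blue', 'green', 'red']
print(encoded)  # [[0, 0, 1], [0, 1, 0], [0, 0, 1], [1, 0, 0]]

print([bin_value(age, [18, 65]) for age in [5, 30, 70]])  # [0, 1, 2]
print(degree2_features([2, 3]))  # [2, 3, 4, 6, 9]  (x1, x2, x1^2, x1*x2, x2^2)
print(round(math.log1p(999), 3))  # 6.908, a log(1 + x) transform of a skewed count
print(date_parts(datetime(2024, 3, 15, 14, 30)))  # {'month': 3, 'weekday': 4, 'hour': 14}
```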

## Transformation

- **Definition**: Transformation in the context of data processing is the replacement of a variable by a function of that variable: for example, replacing a variable x by the square root of x or the logarithm of x. In a stronger sense, a transformation is a replacement that changes the shape of a distribution or relationship.

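- **Use case and Intuition**: Transformations are used to reduce skewness, stabilize variance, or make relationships closer to linear, so that the data better fit the assumptions of a model or statistical test. The term is also used more broadly in data management. There, data transformation means converting data from one format or structure into another, as in data wrangling, data warehousing, data integration, and application integration. That sense is distinct from the statistical one defined above.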

- **5 Common Usages**:
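    1. Log Transformation: Compresses large values more than small ones, which reduces right skew in highly skewed data. It requires strictly positive values; log(1 + x) is a common variant when the data contain zeros.
    2. Square Root Transformation: A milder transformation than the log. It reduces right skew less aggressively, is defined at zero, and is often applied to count data.
    3. Box-Cox Transformation: A family of power transformations indexed by a parameter lambda (λ): y = (x^λ - 1)/λ for λ ≠ 0, and y = ln(x) for λ = 0. It is defined only for strictly positive values. When lambda is zero, the Box-Cox transformation equals the log transformation.
    4. Yeo-Johnson Transformation: Similar to Box-Cox, but it can also be applied to datasets containing zero and negative values.
    5. Quantile Transformation: Maps each feature through its empirical cumulative distribution function so that it follows a uniform or a normal distribution. As a result, for a given feature, it tends to spread out the most frequent values and reduces the influence of outliers. Because it is nonlinear, it can distort linear relationships between features.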

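- **Assumptions and Cautions**: The choice of transformation depends on the data and the goal. Common reasons to transform include improving the interpretability or appearance of graphs and helping the data meet the assumptions of inferential procedures. Check each transformation's domain: log and Box-Cox require strictly positive values. Model coefficients and predictions are then on the transformed scale and may need to be back-transformed for interpretation.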

- **Interpretation**: Transformations can make patterns in the data easier to see and interpret, and can help the data meet the assumptions of statistical tests.
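
The transformations above can be sketched directly from their definitions. This minimal sketch assumes lambda is supplied by hand, and the function names are hypothetical. Libraries work differently: scikit-learn's `PowerTransformer` (with `method="box-cox"` or `method="yeo-johnson"`) estimates lambda from the data by maximum likelihood. Its `QuantileTransformer` interpolates between estimated quantiles rather than using the raw ranks used here.

```python
import bisect
import math


def box_cox(x, lam):
    """Box-Cox: (x**lam - 1) / lam, or log(x) when lam == 0. Requires x > 0."""
    if x <= 0:
        raise ValueError("Box-Cox is only defined for strictly positive values")
    return math.log(x) if lam == 0 else (x ** lam - 1) / lam


def yeo_johnson(x, lam):
    """Yeo-Johnson: a Box-Cox-style power transform that also accepts x <= 0."""
    if x >= 0:
        return math.log1p(x) if lam == 0 else ((x + 1) ** lam - 1) / lam
    if lam == 2:
        return -math.log1p(-x)
    return -((1 - x) ** (2 - lam) - 1) / (2 - lam)


def quantile_uniform(xs):
    """Map each value to its empirical-CDF position in [0, 1] (rank / (n - 1))."""
    ordered = sorted(xs)
    return [(bisect.bisect_right(ordered, x) - 1) / (len(xs) - 1) for x in xs]


def skewness(xs):
    """Moment-based (biased) skewness: third central moment / std**3."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5


skewed = [1, 10, 100, 1000, 10000]
print(skewness(skewed) > 1)  # True: strongly right-skewed
print(abs(skewness([math.log(x) for x in skewed])) < 1e-9)  # True: symmetric after log

print(round(box_cox(8.0, 0.0), 4))  # 2.0794, same as log(8)
print(box_cox(4.0, 0.5))            # 2.0, i.e. (sqrt(4) - 1) / 0.5
print(yeo_johnson(-3.0, 0.0))       # -7.5, i.e. -((1 + 3)**2 - 1) / 2
print(quantile_uniform([10, 1, 1000, 100]))  # [0.3333333333333333, 0.0, 1.0, 0.6666666666666666]
```

`quantile_uniform` spaces the four values evenly in [0, 1] even though the raw values span three orders of magnitude. This is the spreading-out and outlier damping described for the quantile transformation.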
