## 75 terms every data scientist should know

---

# A
## Accuracy
The proportion of correctly predicted instances out of the total instances.  
*Example*: If a model predicts 90 out of 100 test cases correctly, its accuracy is 90%.

## Area Under Curve (AUC)
A metric used to evaluate the performance of a binary classifier by calculating the area under the ROC curve.  
*Example*: An AUC of 0.9 means the model ranks a randomly chosen positive above a randomly chosen negative 90% of the time; 0.5 is no better than chance.
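
As a rough sketch, AUC can be computed directly from its ranking interpretation by comparing every positive score with every negative score. The function name `auc_from_scores` and the toy labels and scores are illustrative assumptions, not part of any library.

```python
def auc_from_scores(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy data (made up for illustration): 3 positives, 3 negatives.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
auc = auc_from_scores(labels, scores)  # 8 of 9 pairs are ordered correctly
```

This pairwise version is quadratic in the number of examples, so it is only meant to show the definition, not to replace a library routine.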

## ARIMA
A time-series forecasting model (AutoRegressive Integrated Moving Average) that combines autoregressive terms, differencing to remove trends, and moving-average terms.  
*Example*: Used to predict future stock prices based on past trends.

---

# B
## Bias
The error introduced by approximating a real-world problem with a simplified model.  
*Example*: A model that predicts all outcomes as positive might have high bias.

## Bayes' Theorem
A formula for updating a conditional probability in light of new evidence: \( P(A \mid B) = P(B \mid A)\,P(A) / P(B) \).  
*Example*: Helps update the probability of having a disease based on new test results.
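
The disease-test update can be sketched in a few lines. The prevalence, sensitivity, and false-positive rate below are hypothetical numbers chosen only to make the arithmetic visible.

```python
def posterior(prior, sensitivity, false_positive_rate):
    """P(disease | positive test) via Bayes' theorem."""
    # P(positive) = P(positive | disease) P(disease) + P(positive | healthy) P(healthy)
    evidence = sensitivity * prior + false_positive_rate * (1 - prior)
    return sensitivity * prior / evidence


# Hypothetical inputs: 1% prevalence, 95% sensitivity, 5% false-positive rate.
p_disease = posterior(prior=0.01, sensitivity=0.95, false_positive_rate=0.05)
# Roughly 0.16: even after a positive test, the disease remains unlikely
# because the condition is rare.
```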

## Binomial Distribution
The probability distribution of the number of successes in a fixed number of independent trials, each with the same probability of success.  
*Example*: The probability of flipping a coin 5 times and getting heads exactly 3 times.
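
The coin example can be checked with the binomial probability mass function. This is a minimal sketch; `binomial_pmf` is a name chosen here for illustration.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Exactly 3 heads in 5 flips of a fair coin: C(5, 3) * 0.5**5 = 10 / 32.
prob = binomial_pmf(k=3, n=5, p=0.5)  # 0.3125
```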

---

# C
## Clustering
Grouping similar data points together.  
*Example*: Grouping customers into segments based on purchasing behavior.

## Confusion Matrix
A table that shows the performance of a classification model by comparing actual and predicted classes.  
*Example*: It shows true positives, false negatives, etc., for a binary classifier.
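
For a binary classifier, the four cells can be tallied with a simple loop. The sketch below assumes labels are encoded as 1 (positive) and 0 (negative); the data is made up.

```python
def confusion_counts(actual, predicted):
    """Count true/false positives and negatives for 0/1 labels."""
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for a, p in zip(actual, predicted):
        if a == 1 and p == 1:
            counts["tp"] += 1
        elif a == 0 and p == 1:
            counts["fp"] += 1
        elif a == 1 and p == 0:
            counts["fn"] += 1
        else:
            counts["tn"] += 1
    return counts


actual = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 1, 0, 1, 0]
counts = confusion_counts(actual, predicted)
# {'tp': 3, 'fp': 1, 'fn': 1, 'tn': 3}
```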

## Cross-validation
A technique for assessing how well a model generalizes by training and testing it on different subsets of the data.  
*Example*: Using 5-fold cross-validation to evaluate a model's performance.
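
One way to picture k-fold cross-validation is as a generator of train/test index splits. This sketch uses contiguous folds without shuffling, which is a simplifying assumption; in practice the data is usually shuffled first.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for k contiguous folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


# 5-fold split of 10 samples: each sample is used for testing exactly once.
folds = list(k_fold_indices(n_samples=10, k=5))
```

A model would be trained on each `train` split and scored on the matching `test` split, and the five scores averaged.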

---

# D
## Decision Trees
A flowchart-like tree structure used for decision making and classification.  
*Example*: A tree that helps decide whether a customer will buy a product or not based on age and income.

## Dimensionality Reduction
Reducing the number of input variables in a dataset while retaining important information.  
*Example*: Using PCA to reduce a 100-dimensional dataset to 10 dimensions.

## Discriminative Model
A model that learns the mapping from inputs to outputs directly, typically the conditional probability \( P(y \mid x) \), without modeling how the inputs themselves are distributed.  
*Example*: Logistic regression is a discriminative model.

---

# E
## Ensemble
Combining multiple models to improve overall prediction accuracy.  
*Example*: Using Random Forest (an ensemble of decision trees) for better predictions.

## EDA (Exploratory Data Analysis)
Analyzing data sets to summarize their main characteristics.  
*Example*: Plotting histograms and scatter plots to understand the distribution of data.

## Entropy
A measure of the uncertainty or randomness in data.  
*Example*: Higher entropy means more disorder; flipping a fair coin has higher entropy than flipping a biased coin.
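
The coin comparison can be verified with Shannon entropy measured in bits. The 90/10 bias is an arbitrary choice for illustration.

```python
from math import log2


def shannon_entropy(probabilities):
    """Entropy in bits of a discrete distribution; zero-probability outcomes are skipped."""
    return -sum(p * log2(p) for p in probabilities if p > 0)


fair = shannon_entropy([0.5, 0.5])    # 1.0 bit
biased = shannon_entropy([0.9, 0.1])  # about 0.47 bits
```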

---

# F
## Feature Engineering
The process of selecting, modifying, and creating variables to improve model performance.  
*Example*: Converting the date of birth into age for a predictive model.

## F-score
A metric that combines precision and recall into a single number; the F1 score is their harmonic mean.  
*Example*: An F1 score of 1 indicates perfect precision and recall.
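
A minimal sketch of the F1 score as the harmonic mean of precision and recall. Returning 0.0 when both inputs are zero is a convention assumed here to avoid dividing by zero.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0  # assumed convention when both are zero
    return 2 * precision * recall / (precision + recall)


perfect = f1_score(1.0, 1.0)   # 1.0
lopsided = f1_score(0.5, 1.0)  # about 0.667, pulled toward the lower value
```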

## Feature Extraction
The process of transforming raw data into features that better represent the underlying problem to the model.  
*Example*: Extracting edges from images for a machine learning model.

---

# G
## Gradient Descent
An optimization algorithm used to minimize the loss function in machine learning models.  
*Example*: Adjusting the slope \( m \) and intercept \( c \) in linear regression to find the best fit.
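
The slope-and-intercept example might look like the sketch below, which assumes mean squared error as the loss. The learning rate, step count, and data are illustrative choices, not recommended defaults.

```python
def fit_line(xs, ys, learning_rate=0.05, steps=2000):
    """Fit y = m*x + c by gradient descent on mean squared error."""
    m, c = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        errors = [m * x + c - y for x, y in zip(xs, ys)]
        # Partial derivatives of the mean squared error with respect to m and c.
        grad_m = 2 / n * sum(e * x for e, x in zip(errors, xs))
        grad_c = 2 / n * sum(errors)
        # Step against the gradient to reduce the loss.
        m -= learning_rate * grad_m
        c -= learning_rate * grad_c
    return m, c


xs = [0, 1, 2, 3, 4]
ys = [1, 3, 5, 7, 9]  # generated from y = 2x + 1
m, c = fit_line(xs, ys)  # converges close to m = 2, c = 1
```

If the learning rate is set too high, the updates overshoot and the parameters diverge instead of converging.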

## Gaussian Distribution
A bell-shaped probability distribution where most values cluster around the mean.  
*Example*: Heights of people in a population often follow a Gaussian distribution.

## Gradient Boosting
An ensemble technique that builds models sequentially, with each new model correcting the errors of the previous ones.  
*Example*: XGBoost is a popular gradient boosting algorithm.

---

# H
## Hypothesis
A testable statement about a population or process that is evaluated using sample data.  
*Example*: "Increasing study hours leads to better test scores" is a hypothesis.

## Hierarchical Clustering
A method of clustering where nested clusters are created by successively merging or splitting them.  
*Example*: Organizing animals into hierarchical clusters based on their species.

## Heteroscedasticity
A situation in which the variance of errors or residuals is not constant across observations.  
*Example*: In stock price prediction, errors increase as stock price increases.

---

# I
## Information Gain
A measure used to decide which feature to split on in a decision tree by determining how much information is gained from a feature.  
*Example*: Choosing "Income Level" over "Age" for the first split in a decision tree.
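
Information gain is commonly computed as the drop in entropy after splitting on a feature. The sketch below uses a tiny invented dataset in which income level separates buyers perfectly and age tells us nothing.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(feature_values, labels):
    """Entropy of the labels minus the weighted entropy after splitting on the feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Invented data for illustration.
income = ["high", "high", "low", "low"]
age = ["young", "old", "young", "old"]
bought = ["yes", "yes", "no", "no"]

gain_income = information_gain(income, bought)  # 1.0 bit: a perfect split
gain_age = information_gain(age, bought)        # 0.0 bits: no information
```

A tree builder following this criterion would split on `income` first because it has the larger gain.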

## Independent Variable
An input feature or predictor used to predict the target variable in a model.  
*Example*: Age, income, and education level are independent variables used to predict spending behavior.

## Imbalance
A situation where the number of observations for each class in a classification task is unequal.  
*Example*: A dataset with 95% non-spam emails and 5% spam emails is imbalanced.
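
Imbalance matters because a trivial model can look good on accuracy alone. The sketch below scores a hypothetical "always predict non-spam" baseline on the 95/5 split from the example.

```python
def accuracy(actual, predicted):
    """Fraction of predictions that match the actual labels."""
    correct = sum(a == p for a, p in zip(actual, predicted))
    return correct / len(actual)


def recall(actual, predicted, positive=1):
    """Fraction of actual positives that were predicted as positive."""
    true_pos = sum(a == positive and p == positive for a, p in zip(actual, predicted))
    actual_pos = sum(a == positive for a in actual)
    return true_pos / actual_pos if actual_pos else 0.0


# 95 non-spam (0) and 5 spam (1) emails.
actual = [0] * 95 + [1] * 5
always_non_spam = [0] * 100  # baseline that never flags anything

acc = accuracy(actual, always_non_spam)        # 0.95
spam_recall = recall(actual, always_non_spam)  # 0.0: every spam email is missed
```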

---

# J
## Jupyter
An open-source tool for creating and sharing documents that contain live code, equations, visualizations, and text.  
*Example*: Jupyter notebooks are widely used for data science tasks.

## Joint Probability
The probability of two events occurring together.  
*Example*: The probability of it raining and being a weekday.

## Jaccard Index
A similarity measure for two sets, defined as the size of their intersection divided by the size of their union.  
*Example*: Used to measure the overlap between two clusters in clustering analysis.
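
With Python sets, the Jaccard index is a one-line ratio. Treating two empty sets as identical (index 1.0) is a convention assumed in this sketch.

```python
def jaccard_index(a, b):
    """Size of the intersection divided by the size of the union."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # assumed convention for two empty sets
    return len(a & b) / len(a | b)


cluster_a = {1, 2, 3, 4}
cluster_b = {3, 4, 5, 6}
similarity = jaccard_index(cluster_a, cluster_b)  # 2 shared / 6 total
```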

---

# K
## Kernel Density Estimation
A non-parametric way to estimate the probability density function of a random variable.  
*Example*: Estimating the distribution of data points using a smooth curve instead of a histogram.

## KS Test (Kolmogorov-Smirnov Test)
A statistical test used to compare a sample with a reference probability distribution.  
*Example*: Checking if a dataset follows a normal distribution.

## KMeans Clustering
A popular unsupervised learning algorithm that divides data into \( K \) clusters.  
*Example*: Grouping customers into 3 segments based on purchase history.
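
The core loop of k-means (assign each point to its nearest centroid, then move each centroid to the mean of its points) can be sketched in one dimension. The starting centroids are supplied by hand here to keep the result deterministic; real implementations usually pick them randomly or with a seeding heuristic.

```python
def k_means_1d(points, centroids, iterations=10):
    """Very small 1-D k-means: returns final centroids and cluster members."""
    centroids = list(centroids)
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [
            sum(members) / len(members) if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
    return centroids, clusters


# Invented purchase counts that fall into three obvious groups.
points = [1, 2, 3, 10, 11, 12, 20, 21, 22]
centroids, clusters = k_means_1d(points, centroids=[0, 10, 25])
```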

---

# L
## Likelihood
The probability of the observed data under given parameter values of a statistical model, viewed as a function of those parameters.  
*Example*: For a fair coin (\( p = 0.5 \)), the likelihood of observing exactly 3 heads in 5 flips is 0.3125.

## Linear Regression
A model that predicts a continuous outcome based on a linear relationship between the input and the output.  
*Example*: Predicting house prices based on square footage.

## L1/L2 Regularization
Techniques used to prevent overfitting by adding a penalty to the model's loss function.  
*Example*: L2 regularization is used in ridge regression; L1 regularization is used in lasso regression.
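
The two penalties differ only in how they measure the size of the weights. In this sketch, `strength` stands for the regularization coefficient (often written as lambda or alpha), and the weights are arbitrary.

```python
def l1_penalty(weights, strength):
    """L1 penalty: strength times the sum of absolute weights."""
    return strength * sum(abs(w) for w in weights)


def l2_penalty(weights, strength):
    """L2 penalty: strength times the sum of squared weights."""
    return strength * sum(w * w for w in weights)


weights = [0.5, -2.0, 0.0]
l1 = l1_penalty(weights, strength=0.1)  # 0.1 * 2.5
l2 = l2_penalty(weights, strength=0.1)  # 0.1 * 4.25
# The penalized loss would be: data_loss + l1 (or + l2).
```

Because L2 squares each weight, it penalizes the large weight (-2.0) far more heavily than the small one.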

---

# M
## Maximum Likelihood Estimation (MLE)
A method for estimating the parameters of a model by maximizing the likelihood function.  
*Example*: MLE is used to estimate the parameters of logistic regression models.

## Multicollinearity
A situation where independent variables in a regression model are highly correlated.  
*Example*: Including both height in inches and height in centimeters as features would lead to multicollinearity.

## Mutual Information
A measure of the mutual dependence between two variables.  
*Example*: Used to identify the strength of the relationship between two variables in feature selection.

---

# N
## Naive Bayes
A classification algorithm based on Bayes' Theorem with an assumption of independence between features.  
*Example*: Used for spam filtering in emails.

## Normalization
The process of scaling data so that it fits within a specific range, often between 0 and 1.  
*Example*: Scaling student scores between 0 and 1 for comparison.
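
Min-max scaling is one common way to map values into the range 0 to 1. Mapping a constant column to all zeros is a choice made in this sketch to avoid dividing by zero.

```python
def min_max_scale(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # assumed handling of a constant column
    return [(v - lo) / (hi - lo) for v in values]


scaled = min_max_scale([50, 75, 100])  # [0.0, 0.5, 1.0]
```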

## Null Hypothesis
The default hypothesis that there is no effect or no relationship between the variables being studied.  
*Example*: "Study hours have no effect on test scores" is a null hypothesis.

---

# O
## Overfitting
A situation in which a model performs well on training data but poorly on unseen data, typically because it is too complex and has fit noise.  
*Example*: A decision tree that perfectly fits the training data but fails on new data.

## Outliers
Data points that are significantly different from the rest of the data.  
*Example*: A test score of 0 in a class where most students scored between 70 and 90 is an outlier.

## One-hot encoding
A method of converting categorical variables into a numerical format by creating binary columns for each category.  
*Example*: Converting "red", "blue", "green" categories into [1,0,0], [0,1,0], and [0,0,1].
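
The colour example can be reproduced with a small helper. The category order is passed in explicitly, since the column order is otherwise arbitrary.

```python
def one_hot(values, categories):
    """Encode each value as a list with a single 1 at its category's position."""
    index = {category: i for i, category in enumerate(categories)}
    rows = []
    for value in values:
        row = [0] * len(categories)
        row[index[value]] = 1
        rows.append(row)
    return rows


encoded = one_hot(["red", "blue", "green"], categories=["red", "blue", "green"])
# [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
```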

---

# P
## PCA (Principal Component Analysis)
A technique for reducing the dimensionality of data by projecting it onto new, uncorrelated variables (principal components) ordered by how much variance they capture.  
*Example*: Reducing 100 features to 2 principal components in a dataset.

## Precision
The proportion of true positives among all predicted positives.  
*Example*: In spam detection, precision measures how many of the emails predicted as spam are actually spam.

## P-value
The probability, assuming the null hypothesis is true, of observing a result at least as extreme as the one actually observed.  
*Example*: A p-value below 0.05 is conventionally treated as statistically significant, meaning the null hypothesis is rejected at the 5% level.

---

# Q
## QQ-Plot
A plot to compare the quantiles of a dataset to a theoretical distribution to assess if the data follows that distribution.  
*Example*: Checking if data is normally distributed using a QQ-plot.

## QR Decomposition
A matrix factorization technique that decomposes a matrix into an orthogonal matrix \( Q \) and an upper triangular matrix \( R \).  
*Example*: Used to solve linear least-squares problems in a numerically stable way.

---

# R
## Random Forest
An ensemble learning method that combines multiple decision trees to improve the accuracy of predictions.  
*Example*: Used in classification tasks like determining whether a patient has a disease based on multiple health indicators.

## Recall
The proportion of true positives identified out of all actual positives.  
*Example*: In a cancer detection model, recall measures how many actual cancer cases are correctly identified.
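
Given counts of true positives, false positives, and false negatives, precision and recall are simple ratios. The counts below are hypothetical.

```python
def precision_and_recall(tp, fp, fn):
    """Return (precision, recall) from confusion-matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical screening results: 80 cases caught, 20 false alarms, 40 cases missed.
precision, recall = precision_and_recall(tp=80, fp=20, fn=40)
# precision = 80 / 100 = 0.8, recall = 80 / 120, about 0.667
```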

## ROC Curve
A plot that shows the performance of a classification model at all classification thresholds by plotting the True Positive Rate against the False Positive Rate.  
*Example*: A model whose ROC curve lies closer to the top-left corner performs better.

---

# S
## SVM (Support Vector Machine)
A classification algorithm that finds the hyperplane separating the classes with the largest margin.  
*Example*: Used to classify emails into spam or non-spam categories.

## Standardization
The process of scaling data so that it has a mean of 0 and a standard deviation of 1.  
*Example*: Standardizing student scores to compare performance across different subjects.

## Sampling
The technique of selecting a subset of data points from a larger dataset to analyze and model.  
*Example*: Taking a sample of 100 students from a school to estimate average test scores.

---

# T
## t-SNE (t-distributed Stochastic Neighbor Embedding)
A technique for reducing the dimensionality of data, often used for visualizing high-dimensional data.  
*Example*: Visualizing clusters of customers based on purchasing behavior.

## T-distribution
A probability distribution used when estimating a population mean from a small sample whose population standard deviation is unknown.  
*Example*: Used to calculate confidence intervals when the sample size is small.

## Type I/Type II Error
Type I error occurs when a true null hypothesis is rejected, while Type II error occurs when a false null hypothesis is not rejected.  
*Example*: Type I error is like convicting an innocent person; Type II error is like letting a guilty person go free.

---

# U
## Underfitting
A situation in which a model is too simple to capture the underlying patterns in the data, so it performs poorly on both training and unseen data.  
*Example*: A linear model trying to fit non-linear data might underfit and produce inaccurate predictions.

## UMAP (Uniform Manifold Approximation and Projection)
A technique for reducing the dimensionality of data, similar to t-SNE but often faster and more scalable.  
*Example*: Used to visualize high-dimensional datasets like gene expression data.

## Uniform Distribution
A probability distribution where all outcomes are equally likely.  
*Example*: Rolling a fair six-sided die produces a uniform distribution because each number has an equal chance of being rolled.

---

# V
## Variance
A measure of how spread out the values in a dataset are from the mean.  
*Example*: In predicting house prices, variance indicates how much house prices vary in a given neighborhood.

## Validation Curve
A plot used to show how a model's performance changes with respect to changes in a hyperparameter.  
*Example*: A validation curve shows the impact of the regularization parameter on model accuracy.

## Vanishing Gradient
A problem in deep learning where gradients become too small during backpropagation, making it difficult for the model to learn.  
*Example*: Occurs in very deep neural networks, making it hard for earlier layers to learn.

---

# W
## Word Embedding
A technique used to represent words as vectors in a continuous vector space, allowing models to understand relationships between words.  
*Example*: The word "king" might be closer to "queen" than "apple" in vector space.

## Word Cloud
A visual representation of text data where the size of each word indicates its frequency or importance.  
*Example*: A word cloud can show the most frequently used words in a collection of emails.

## Weights
The parameters in a machine learning model that are adjusted during training to minimize the loss function.  
*Example*: In linear regression, the weights determine the slope and intercept of the line.

---

# X
## XGBoost
An optimized implementation of gradient boosting used for supervised learning tasks like classification and regression.  
*Example*: XGBoost is commonly used in data science competitions for its speed and accuracy.

## XLNet
A transformer-based language model that uses permutation-based autoregressive pretraining and outperformed BERT on a range of NLP benchmarks when it was introduced.  
*Example*: XLNet is used for tasks like text classification and question answering.

---

# Y
## YOLO (You Only Look Once)
A real-time object detection system that predicts bounding boxes and class probabilities for images in a single evaluation.  
*Example*: YOLO can detect and classify objects in an image, like cars and pedestrians, in real time.

## Yellowbrick
A Python library that provides visualizations for evaluating machine learning models.  
*Example*: Yellowbrick can be used to plot the ROC curve or visualize the learning curve of a model.

---

# Z
## Z-score
A measure of how many standard deviations a data point is from the mean.  
*Example*: A Z-score of 2 means the data point is two standard deviations above the mean.
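
Z-scores can be computed with the standard library. This sketch uses the population standard deviation (`statistics.pstdev`); using the sample standard deviation instead is equally common and gives slightly different values.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Number of population standard deviations each value lies from the mean."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        raise ValueError("z-scores are undefined when all values are equal")
    return [(v - mu) / sigma for v in values]


scores = [2, 4, 4, 4, 5, 5, 7, 9]  # mean 5, population standard deviation 2
z = z_scores(scores)
# The value 9 gets a z-score of 2.0: two standard deviations above the mean.
```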

## Z-test
A statistical test used to determine whether there is a significant difference between sample and population means when the population variance is known.  
*Example*: A Z-test could be used to compare the average test scores of students to a national average.
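
A one-sample, two-sided Z-test might be sketched as follows. The sample mean, national mean, standard deviation, and sample size are hypothetical numbers.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_z_test(sample_mean, population_mean, population_sd, n):
    """Return the z statistic and two-sided p-value for a one-sample Z-test."""
    standard_error = population_sd / sqrt(n)
    z = (sample_mean - population_mean) / standard_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical: 100 students average 78 against a national mean of 75 (sd 10).
z_stat, p_value = one_sample_z_test(
    sample_mean=78, population_mean=75, population_sd=10, n=100
)
# z_stat is 3.0 and p_value is about 0.0027
```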

## Zero-shot learning
A machine learning paradigm where the model learns to recognize objects it has never seen before based on descriptions or relationships with known objects.  
*Example*: A zero-shot learning model might recognize a zebra based on its description, even though it has only seen horses during training.
