## What are the different types of sampling techniques?

Sampling techniques are used in statistics to select a subset (sample) from a larger population for the purpose of making inferences about that population. 

The main sampling techniques are described below, each with an example and guidance on when to use it:

#### 1. **Simple Random Sampling (SRS):**
   
   
   - **Description:** In SRS, every member of the population has an equal chance of being selected; more precisely, every possible sample of size *n* is equally likely to be drawn.
   
   
   - **Example:** To study the average income of citizens in a city, you assign a unique number to each citizen and use a random number generator to select a sample.
   
   
   - **When to Use:** When the population is relatively homogeneous and a complete list of its members (a sampling frame) is available.



#### 2. **Stratified Sampling:**
   
   
   - **Description:** The population is divided into distinct subgroups or strata, and then samples are randomly selected from each stratum.
   
   
   - **Example:** When studying educational achievement in a country, you might divide the population into strata based on grade levels (e.g., elementary, middle, high school) and sample from each stratum.
   
   
   - **When to Use:** When the population has clear subgroups, and you want to ensure representation from each subgroup.



#### 3. **Systematic Sampling:**
   
   
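   - **Description:** After choosing a random starting point, every *k*-th item from an ordered list or sequence is selected, where *k* is roughly the population size divided by the desired sample size.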
   
   
   - **Example:** In a factory, you might select every 20th product off the assembly line for quality control checks.
   
   
   - **When to Use:** When there's a natural order or sequence to the population and you want a simple, efficient sampling method. Avoid it if the list has a periodic pattern that could line up with the sampling interval *k*.



#### 4. **Cluster Sampling:**
   
   
   - **Description:** The population is divided into clusters or groups, and a random sample of clusters is selected. Then, all members within the selected clusters are included in the sample.
   
   
   - **Example:** In a nationwide health survey, you randomly select a few counties, and then survey all households within those counties.
   
   
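   - **When to Use:** When a complete list of individuals is unavailable or too costly to sample from directly, but natural groups (clusters) can be listed and reached more easily. Cluster samples are usually less precise than simple random samples of the same size, because members of the same cluster tend to be similar.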







#### 5. **Random Sampling with Replacement vs. Without Replacement:**

   
   - **Description:** In random sampling with replacement, each selected item is returned to the population before the next selection; without replacement means items are not returned.
    
   
   - **Example:** Drawing cards from a deck with or without replacement.
    
   
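   - **When to Use:** Sampling with replacement is used when draws must be independent and identically distributed (as in bootstrapping) or when the population is so large that replacement makes no practical difference; sampling without replacement is used when each item can be selected only once, as in most surveys.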
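The five designs above can be contrasted in a few lines of NumPy. This is a minimal sketch on a made-up population of 1,000 units: the stratum labels, the 50 clusters of 20 units, the 10% stratum fraction, and the interval k = 20 are all illustrative assumptions, not recommendations.

```python
import numpy as np

rng = np.random.default_rng(seed=42)

# Hypothetical population of 1,000 units with a stratum label (0, 1, 2)
# and a cluster id (50 clusters of 20 consecutive units each).
population = np.arange(1000)
strata = population % 3
clusters = population // 20

# Simple random sampling: every subset of 50 units is equally likely.
srs = rng.choice(population, size=50, replace=False)

# Stratified sampling: draw 10% of each stratum independently.
stratified = np.concatenate([
    rng.choice(population[strata == s], size=int(0.1 * np.sum(strata == s)), replace=False)
    for s in np.unique(strata)
])

# Systematic sampling: random start in [0, k), then every k-th unit.
k = 20
start = rng.integers(0, k)
systematic = population[start::k]

# Cluster sampling: choose 3 clusters at random and keep all of their members.
chosen_clusters = rng.choice(np.unique(clusters), size=3, replace=False)
cluster_sample = population[np.isin(clusters, chosen_clusters)]

# With replacement the same unit may be drawn more than once; without, it cannot.
with_replacement = rng.choice(population, size=50, replace=True)
without_replacement = rng.choice(population, size=50, replace=False)

print(len(srs), len(stratified), len(systematic), len(cluster_sample))  # 50 99 50 60
```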


Here are 50 commonly asked statistics questions for data scientist and machine learning engineer interviews, along with brief answers:

**1. What is statistics?**
   - **Answer:** Statistics is the study of collecting, organizing, analyzing, interpreting, and presenting data to make informed decisions.

**2. Explain the difference between population and sample.**
   - **Answer:** The population is the entire group of interest, while a sample is a subset of the population used for analysis.

**3. What are descriptive and inferential statistics?**
   - **Answer:** Descriptive statistics summarize and describe data, while inferential statistics make predictions or inferences about populations based on sample data.

**4. Define mean, median, and mode.**
   - **Answer:** The mean is the arithmetic average of a dataset; the median is the middle value when the data is sorted (the average of the two middle values when the count is even); and the mode is the most frequently occurring value.

**5. What is standard deviation?**
   - **Answer:** Standard deviation measures the spread or variability of data points from the mean.

**6. Explain the concept of variance.**
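   - **Answer:** Variance quantifies how far data points spread from the mean, calculated as the average of squared deviations from the mean. The population variance divides by *n*; the sample variance divides by *n* − 1 (Bessel's correction) so that it is an unbiased estimator of the population variance.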
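A small worked example of these descriptive statistics, using Python's standard `statistics` module. The dataset is hypothetical and chosen so the arithmetic can be checked by hand; the code contrasts the population and sample versions of the variance.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical dataset

mean = statistics.mean(data)          # 40 / 8 = 5
median = statistics.median(data)      # even count: average of 4 and 5 = 4.5
mode = statistics.mode(data)          # 4 occurs most often

# Sum of squared deviations from the mean is 32.
pop_var = statistics.pvariance(data)  # divide by n:     32 / 8 = 4
samp_var = statistics.variance(data)  # divide by n - 1: 32 / 7, about 4.571
pop_sd = statistics.pstdev(data)      # sqrt(4) = 2

print(mean, median, mode, pop_var, round(samp_var, 3), pop_sd)
```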

**7. What is a normal distribution?**
   - **Answer:** A normal distribution is a symmetric, bell-shaped continuous probability distribution that is fully determined by its mean (μ) and standard deviation (σ).

**8. What is the Central Limit Theorem?**
   - **Answer:** The Central Limit Theorem states that, for independent samples from a population with finite variance, the sampling distribution of the sample mean approaches a normal distribution as the sample size increases, regardless of the shape of the population's distribution.

**9. Describe Type I and Type II errors in hypothesis testing.**
   - **Answer:** Type I error occurs when you reject a true null hypothesis, and Type II error occurs when you fail to reject a false null hypothesis.

**10. What is p-value in hypothesis testing?**
    - **Answer:** The p-value is the probability of observing a test statistic as extreme as or more extreme than what was observed, assuming the null hypothesis is true. A smaller p-value suggests stronger evidence against the null hypothesis.

**11. What is the difference between correlation and causation?**
    - **Answer:** Correlation indicates a statistical relationship between two variables, while causation implies that one variable directly affects the other.

**12. Define statistical power.**
    - **Answer:** Statistical power is the probability of correctly rejecting a false null hypothesis, i.e., 1 − β, where β is the probability of a Type II error. It measures the test's ability to detect a true effect.

**13. What is the purpose of a confidence interval?**
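    - **Answer:** A confidence interval gives a range of plausible values for a population parameter. A 95% confidence interval is produced by a procedure that, over repeated sampling, captures the true parameter 95% of the time; it does not mean there is a 95% probability that the parameter lies in one particular computed interval.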
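As a sketch of how a confidence interval is computed in practice, the snippet below builds a 95% t-interval for a mean with SciPy. The measurements are hypothetical, and the t-interval assumes the data are roughly normal or the sample is large enough for the CLT to apply.

```python
import numpy as np
from scipy import stats

# Hypothetical measurements
sample = np.array([12.1, 11.8, 12.4, 12.0, 11.9, 12.3, 12.2, 11.7])
n = len(sample)
mean = sample.mean()
se = sample.std(ddof=1) / np.sqrt(n)      # standard error using the sample SD
t_crit = stats.t.ppf(0.975, df=n - 1)     # two-sided 95% critical value
ci = (mean - t_crit * se, mean + t_crit * se)

print(f"mean = {mean:.3f}, 95% CI = ({ci[0]:.3f}, {ci[1]:.3f})")
```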

**14. Explain overfitting in machine learning.**
    - **Answer:** Overfitting occurs when a model learns the noise in the training data rather than the underlying patterns, resulting in poor performance on new data.

**15. What is cross-validation, and why is it important?**
    - **Answer:** Cross-validation is a technique for assessing a model's generalization performance by partitioning the data into training and testing sets multiple times. It helps detect overfitting and provides a more robust evaluation of the model.

**16. What is the bias-variance trade-off in machine learning?**
    - **Answer:** The bias-variance trade-off refers to the balance between a model's ability to fit the training data (low bias) and its ability to generalize to new data (low variance). Increasing model complexity typically reduces bias but increases variance.

**17. Explain the ROC curve and AUC in binary classification.**
    - **Answer:** The ROC (Receiver Operating Characteristic) curve plots the true positive rate against the false positive rate at various thresholds. AUC (Area Under the Curve) measures the area under the ROC curve and quantifies a model's discrimination ability.

**18. What is the purpose of regularization in machine learning?**
    - **Answer:** Regularization techniques (e.g., L1 and L2 regularization) are used to prevent overfitting by adding a penalty term to the model's loss function, discouraging large coefficients.

**19. What are precision and recall in classification?**
    - **Answer:** Precision is the ratio of true positives to all predicted positives, TP / (TP + FP), while recall is the ratio of true positives to all actual positives, TP / (TP + FN). They are especially useful on imbalanced datasets, where accuracy alone can be misleading.

**20. Explain the bias-variance decomposition of mean squared error (MSE).**
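    - **Answer:** The expected MSE of a prediction decomposes into three components: $\text{MSE} = \text{Bias}^2 + \text{Variance} + \sigma^2$, where $\sigma^2$ is the irreducible error. Bias comes from overly simple model assumptions, variance from the model's sensitivity to the particular training sample (often driven by model complexity), and irreducible error from inherent noise in the data.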

**21. What is the curse of dimensionality?**
    - **Answer:** The curse of dimensionality refers to the challenges that arise when working with high-dimensional data, such as increased computational complexity and the need for more data to maintain model generalization.

**22. What is feature engineering in machine learning?**
    - **Answer:** Feature engineering involves creating, selecting, or transforming input variables (features) to improve a model's performance.

**23. Describe k-fold cross-validation.**
    - **Answer:** K-fold cross-validation partitions the dataset into k subsets (folds). The model is trained on k-1 folds and tested on the remaining fold, repeated k times. It provides a more reliable estimate of model performance than a single train-test split.
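The splitting logic of k-fold cross-validation can be sketched with NumPy alone. `k_fold_indices` is a hypothetical helper written for illustration; in practice, scikit-learn's `KFold` class provides the same behavior.

```python
import numpy as np

def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    indices = rng.permutation(n_samples)   # shuffle once
    folds = np.array_split(indices, k)     # k nearly equal folds
    for i in range(k):
        test_idx = folds[i]                # fold i is held out
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, test_idx

for train_idx, test_idx in k_fold_indices(10, k=5):
    print(len(train_idx), len(test_idx))   # 8 2 on each of the 5 iterations
```

Every observation appears in exactly one test fold, so each data point is used for evaluation once and for training k − 1 times.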

**24. What is a confusion matrix, and how is it used?**
    - **Answer:** A confusion matrix is a table that summarizes the performance of a classification model, showing true positives, true negatives, false positives, and false negatives. It's used to calculate various evaluation metrics like accuracy, precision, recall, and F1-score.
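A minimal sketch that computes the four confusion-matrix counts and the derived metrics for binary labels (1 = positive). The labels are made up for illustration, and the helper name `classification_metrics` is hypothetical.

```python
def classification_metrics(y_true, y_pred):
    """Confusion-matrix counts and derived metrics for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0   # guard against division by zero
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"tp": tp, "tn": tn, "fp": fp, "fn": fn,
            "accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]
print(classification_metrics(y_true, y_pred))  # tp=3, tn=3, fp=1, fn=1; all metrics 0.75
```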

**25. What is the bias of an estimator?**
    - **Answer:** The bias of an estimator measures the systematic error between the expected value of the estimator and the true population parameter it is estimating. An estimator is unbiased if its expected value equals the true parameter value.

**26. Explain the law of large numbers.**
    - **Answer:** The law of large numbers states that as the sample size increases, the sample mean approaches the population mean. In other words, with a sufficiently large sample, sample statistics become more reliable estimates of population parameters.

**27. What is the difference between a hypothesis test and a confidence interval?**
    - **Answer:** A hypothesis test assesses the significance of a specific hypothesis about a population parameter, while a confidence interval provides a range of values for the parameter without making a specific hypothesis.

**28. What is multicollinearity in regression analysis?**
    - **Answer:** Multicollinearity occurs when two or more independent variables in a regression model are highly correlated, making it challenging to distinguish their individual effects on the dependent variable.

**29. Define A/B testing.**
    - **Answer:** A/B testing, also known as split testing, is an experimental method where two or more versions (A and B) of a web page, app, or product are compared to determine which one performs better in terms of user engagement or other metrics.

**30. What is the purpose of a t-test?**
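    - **Answer:** A t-test compares means when the population standard deviation is unknown: a one-sample t-test compares a sample mean to a hypothesized value, and a two-sample t-test determines whether the means of two groups differ significantly.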

**31. Explain the concept of outliers.**
    - **Answer:** Outliers are data points that significantly differ from the rest of the data. They can skew statistical analysis and should be carefully examined for validity.

**32. What is the difference between correlation and covariance?**
    - **Answer:** Correlation is a standardized measure of the strength and direction of a linear relationship between two variables, while covariance measures the degree to which two variables change together.

**33. What is a box plot, and what information does it provide?**
    - **Answer:** A box plot displays the distribution of a dataset by showing the median, quartiles, and potential outliers. It provides a visual summary of the data's central tendency and spread.

**34. Explain the purpose of feature scaling in machine learning.**
    - **Answer:** Feature scaling standardizes or normalizes input features to ensure that they have similar scales. It helps algorithms converge faster and perform better, especially for methods sensitive to feature scales (e.g., gradient descent).

**35. What is Bayes' theorem, and how is it used in statistics?**
    - **Answer:** Bayes' theorem updates the probability of a hypothesis given new evidence: $P(H \mid E) = \frac{P(E \mid H)\,P(H)}{P(E)}$. In statistics, it is the foundation of Bayesian inference and Bayesian machine learning.
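Bayes' theorem is easiest to appreciate with a diagnostic-test example. The prevalence, sensitivity, and false-positive rate below are hypothetical; the point is that a positive result from an accurate test can still imply a low posterior probability when the condition is rare.

```python
def posterior(prior, sensitivity, false_positive_rate):
    """P(condition | positive test) = P(positive | condition) * P(condition) / P(positive)."""
    # P(positive) via the law of total probability
    p_positive = sensitivity * prior + false_positive_rate * (1 - prior)
    return sensitivity * prior / p_positive

# Hypothetical: 1% prevalence, 99% sensitivity, 5% false-positive rate
print(round(posterior(0.01, 0.99, 0.05), 3))  # 0.167
```

Only about one in six positives is a true case here, because false positives from the 99% healthy majority outnumber true positives.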

**36. What is bootstrapping in statistics?**
    - **Answer:** Bootstrapping is a resampling technique that repeatedly samples from a dataset with replacement to estimate population parameters or assess the sampling distribution of a statistic.
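A minimal percentile-bootstrap sketch in NumPy, estimating a 95% confidence interval for the median of a small hypothetical sample. The number of resamples (`n_boot`) and the percentile method are illustrative choices; other bootstrap intervals (e.g., BCa) exist.

```python
import numpy as np

def bootstrap_ci(data, stat=np.mean, n_boot=5000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for `stat`."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data)
    # Each resample has the same size as the data and is drawn with replacement
    boot_stats = np.array([
        stat(rng.choice(data, size=len(data), replace=True)) for _ in range(n_boot)
    ])
    lower, upper = np.percentile(boot_stats, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return lower, upper

sample = [3.1, 2.8, 4.0, 3.6, 2.9, 5.2, 3.3, 3.8, 4.1, 2.7]  # hypothetical data
print(bootstrap_ci(sample, stat=np.median))
```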

**37. Define cross-correlation and autocorrelation.**
    - **Answer:** Cross-correlation measures the similarity between two different time series, while autocorrelation measures the similarity between a time series and a lagged version of itself.

**38. Explain the concept of skewness in a probability distribution.**
    - **Answer:** Skewness quantifies the asymmetry of a probability distribution. A positive skew indicates a longer tail on the right, while a negative skew indicates a longer tail on the left.

**39. What is the Kolmogorov-Smirnov test used for?**
    - **Answer:** The Kolmogorov-Smirnov test is a non-parametric test used to compare the distribution of a sample to a known distribution or to compare two sample distributions for similarity.

**40. What is the purpose of outlier detection techniques like the Z-score and IQR method?**
    - **Answer:** Outlier detection techniques help identify and remove or handle extreme values in data that can adversely affect statistical analysis or machine learning models.
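The two rules named in the question can be sketched as follows. The data and thresholds are illustrative assumptions: a |z| cutoff of 3 and Tukey's 1.5 × IQR fences are common conventions, not fixed rules.

```python
import numpy as np

def zscore_outliers(x, threshold=3.0):
    """Flag points whose |z-score| exceeds `threshold`."""
    x = np.asarray(x, dtype=float)
    z = (x - x.mean()) / x.std()
    return np.abs(z) > threshold

def iqr_outliers(x, k=1.5):
    """Flag points outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    x = np.asarray(x, dtype=float)
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    return (x < q1 - k * iqr) | (x > q3 + k * iqr)

data = [10, 12, 11, 13, 12, 11, 10, 12, 95]
print(zscore_outliers(data, threshold=2.5))  # only 95 is flagged
print(iqr_outliers(data))                    # only 95 is flagged
```

With the default `threshold=3.0`, the z-score rule would not flag 95 here, because the outlier itself inflates the standard deviation (its z-score is about 2.83); the IQR rule is less sensitive to this masking effect.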

**41. What is the difference between time-series data and cross-sectional data?**
    - **Answer:** Time-series data is collected over time at specific intervals, while cross-sectional data is collected from multiple subjects or entities at a single point in time.

**42. Explain the difference between parametric and non-parametric statistics.**
    - **Answer:** Parametric statistics assume specific properties of the underlying data distribution, while non-parametric statistics make fewer assumptions about the distribution.

**43. What is the purpose of hypothesis testing in statistics?**
    - **Answer:** Hypothesis testing is used to make decisions about population parameters based on sample data, helping to assess the significance of relationships or differences.

**44. What are the key assumptions of linear regression?**
    - **Answer:** Linear regression assumes a linear relationship between the dependent variable and the independent variables, independent errors, constant error variance (homoscedasticity), normally distributed errors (mainly needed for valid inference in small samples), and no perfect multicollinearity among the predictors.

**45. What is a correlation matrix, and how is it useful?**
    - **Answer:** A correlation matrix displays pairwise correlations between variables. It is useful for identifying relationships between variables and identifying multicollinearity in regression analysis.

**46. Explain the concept of statistical power and its importance.**
    - **Answer:** Statistical power is the probability of correctly rejecting a false null hypothesis. It's important because it helps ensure that a study can detect true effects when they exist.

**47. What is the purpose of a chi-squared test, and when is it used?**
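    - **Answer:** A chi-squared test of independence determines whether there is a significant association between two categorical variables in a contingency table. A chi-squared goodness-of-fit test checks whether observed category counts match an expected distribution.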
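A sketch of a chi-squared test of independence with SciPy's `chi2_contingency`, on a hypothetical 2×2 table of group versus outcome counts. For 2×2 tables, SciPy applies Yates' continuity correction by default (`correction=True`).

```python
from scipy.stats import chi2_contingency

# Hypothetical counts: rows = group (A, B), columns = (clicked, did not click)
table = [[30, 70],
         [45, 55]]

chi2, p_value, dof, expected = chi2_contingency(table)
print(f"chi2 = {chi2:.3f}, p = {p_value:.4f}, dof = {dof}")
print("expected counts under independence:\n", expected)
```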

**48. Describe the concept of entropy in information theory.**
    - **Answer:** Entropy measures the uncertainty or disorder in a random variable. In information theory, it quantifies the average amount of information contained in a message or dataset.
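Shannon entropy is $H(X) = -\sum_i p_i \log_2 p_i$, measured in bits when the logarithm is base 2. A minimal sketch with the standard library; the probability vectors are illustrative.

```python
import math

def entropy(probs, base=2):
    """Shannon entropy -sum(p * log p), skipping zero-probability outcomes."""
    return -sum(p * math.log(p, base) for p in probs if p > 0)

print(entropy([0.5, 0.5]))   # 1 bit: a fair coin
print(entropy([0.9, 0.1]))   # about 0.469 bits: a biased coin is more predictable
print(entropy([0.25] * 4))   # 2 bits: four equally likely outcomes
```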

**49. What is the difference between supervised and unsupervised learning in machine learning?**
    - **Answer:** In supervised learning, the model is trained on labeled data, while in unsupervised learning, the model finds patterns or structures in unlabeled data.

**50. Explain the concept of bias in machine learning models.**
    - **Answer:** In the statistical sense, bias is systematic error caused by overly simple or incorrect model assumptions; it leads to underfitting and consistently wrong predictions. The term is also used for systematic unfairness, where biased data or modeling choices lead to discriminatory outcomes for certain groups.

These questions cover a range of statistical concepts commonly encountered in data science and machine learning roles. Be prepared to discuss these topics in-depth and provide examples and practical applications during your interview.

## Difference between Z-statistic, Z-score, and Z-test

### NOTE: Z-statistic, Z-test statistic, and Z-value are the same thing

Let's break down the differences between Z-statistics, Z-scores, and Z-tests in simple terms, with examples:

**1. Z-Statistic:**
- **What it is:** A Z-statistic is a single number that expresses how far an observed value (a data point or a sample statistic) is from a reference mean, measured in standard deviations (or, for a sample mean, standard errors).


- **Example:** Suppose you have a test score of 85, and the average (mean) test score is 70, with a standard deviation of 10. The Z-statistic for your score would be:
$$Z=\frac{x-\mu}{\sigma}$$
   Z = (85 - 70) / 10 = 1.5

   This means your score is 1.5 standard deviations above the mean.


- __Z-statistics are often used in hypothesis testing to make decisions about population parameters or sample statistics. They help assess whether observed differences are statistically significant.__



**2. Z-Score:**


- **What it is:** A Z-score is also a number that tells you how far a data point is from the mean, but it's typically used to standardize data from different distributions to a common scale (mean of 0 and standard deviation of 1).


- **Example:** Imagine you have two classes with different grading systems, and you want to compare the performance of students in both classes. In Class A, the mean score is 75, and the standard deviation is 5. In Class B, the mean score is 85, and the standard deviation is 10. To compare the performance, you can calculate Z-scores for a score of 80 in both classes:

   $$Z_A = \frac{80 - 75}{5} = 1, \qquad Z_B = \frac{80 - 85}{10} = -0.5$$

   Now, you have standardized the scores, and you can easily see that the student in Class A scored 1 standard deviation above the class average, while the student in Class B scored 0.5 standard deviations below the class average.
   
   
- __Z-scores are used for standardization and comparison of data points from different distributions. They tell you how many standard deviations a data point is away from the mean of its own distribution.__



**3. Z-Test:**
- **What it is:** A Z-test is a statistical hypothesis test that uses Z-statistics to determine if there is a significant difference between a sample statistic and a population parameter or between two sample statistics. It's used to make decisions based on data.

$$Z = \frac{\bar X - \mu}{\frac{\sigma}{\sqrt n}}$$
- **Example:** Let's say you work at a chocolate factory, and the company claims that the average weight of a chocolate bar is 50 grams. You randomly sample 30 chocolate bars, weigh them, and find that the average weight of your sample is 48 grams with a standard deviation of 3 grams. To test if this difference is significant, you can perform a Z-test:

   Z = (48 - 50) / (3/√30) ≈ -3.65

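   Compare this Z-statistic to a critical value (±1.96 for a two-tailed test at α = 0.05) or compute its p-value. If |Z| exceeds the critical value, the difference is statistically significant, and you may conclude that the sample data do not support the company's claim of a 50-gram average weight. Strictly speaking, because the standard deviation here is estimated from the sample, a t-test is the textbook choice; with n = 30 the two tests usually lead to the same conclusion.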
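The chocolate-bar calculation can be reproduced with the standard library's `statistics.NormalDist`. This sketch treats σ = 3 as known, as the example does, and uses a two-tailed test at α = 0.05.

```python
import math
from statistics import NormalDist

# Values from the chocolate-bar example; sigma is treated as known
sample_mean, mu0, sigma, n = 48, 50, 3, 30

z = (sample_mean - mu0) / (sigma / math.sqrt(n))    # about -3.65
p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))     # two-tailed p-value
z_crit = NormalDist().inv_cdf(0.975)                 # about 1.96 for alpha = 0.05

print(f"z = {z:.2f}, p = {p_two_sided:.5f}, reject H0: {abs(z) > z_crit}")
```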

In summary, a Z-statistic is a measure of how far a data point is from a mean in terms of standard deviations, a Z-score standardizes data for comparison, and a Z-test is a statistical test that uses Z-statistics to make decisions about population parameters or sample statistics.

# CHATGPT



## **Probability and Distributions**
### **1. What is the difference between a Probability Mass Function (PMF) and a Probability Density Function (PDF)?**  
- **PMF**: Used for **discrete** random variables. It gives the probability of a specific outcome.  
- **PDF**: Used for **continuous** random variables. The probability of any specific value is **zero**; instead, probability is calculated over an interval using integration.

---

### **2. What is the Central Limit Theorem (CLT) and why is it important?**  
- The **CLT states** that the sampling distribution of the mean of a large number of independent, identically distributed (i.i.d.) random variables with finite variance approaches a **normal distribution**, regardless of the original distribution.  
- **Importance**:  
  - Enables **hypothesis testing** and **confidence intervals**.  
  - Justifies **normal approximations** for sample means and many other estimators, even when the underlying data are not normal.  
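A quick simulation illustrates the CLT. This sketch draws many samples of size n = 50 from a strongly skewed exponential population (mean 1, SD 1; the parameters are arbitrary) and checks that the sample means are centered at 1, have a spread close to σ/√n, and are far less skewed than the population.

```python
import numpy as np

rng = np.random.default_rng(0)

# Population: exponential with mean 1 and SD 1 (skewness 2)
n, n_reps = 50, 20_000
sample_means = rng.exponential(scale=1.0, size=(n_reps, n)).mean(axis=1)

def skewness(x):
    """Third standardized moment."""
    x = np.asarray(x)
    return np.mean((x - x.mean()) ** 3) / x.std() ** 3

print(f"mean of sample means: {sample_means.mean():.3f}  (population mean = 1)")
print(f"SD of sample means:   {sample_means.std():.3f}  (sigma / sqrt(n) = {1 / np.sqrt(n):.3f})")
print(f"skewness: population = 2, sample means = {skewness(sample_means):.2f}")
```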

---

### **3. What is the difference between Parametric and Non-Parametric tests?**  
| Feature  | Parametric  | Non-Parametric  |
|----------|------------|----------------|
| Assumptions | Data follow a specific distribution (e.g., normal) | Few or no distributional assumptions |
| Examples | t-test, ANOVA | Wilcoxon test, Kruskal-Wallis |
| Data Type | Interval, ratio | Ordinal, nominal, or interval/ratio data that violate parametric assumptions |

---

### **4. What is the Kolmogorov-Smirnov test?**  
- A **non-parametric** test used to compare:  
  1. A sample with a **known distribution**.  
  2. Two independent samples to check if they come from the same distribution.  
- **Uses:**  
  - Checking **normality** of data.  
  - Comparing observed vs. expected distributions.
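Both uses map onto SciPy functions: `scipy.stats.kstest` for the one-sample case and `scipy.stats.ks_2samp` for two samples. The data below are simulated for illustration. One caveat: the standard KS p-value assumes the reference distribution is fully specified in advance; if its parameters are estimated from the same data, a corrected test such as Lilliefors' should be used.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
normal_data = rng.normal(loc=0.0, scale=1.0, size=500)
skewed_data = rng.exponential(scale=1.0, size=500)

# One-sample test against a fully specified N(0, 1) reference distribution
one_sample = stats.kstest(normal_data, "norm")

# Two-sample test: could both samples come from the same distribution?
two_sample = stats.ks_2samp(normal_data, skewed_data)

print(f"one-sample: D = {one_sample.statistic:.3f}, p = {one_sample.pvalue:.3f}")
print(f"two-sample: D = {two_sample.statistic:.3f}, p = {two_sample.pvalue:.2e}")
```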

---

### **5. What is the relationship between Mean, Median, and Mode in skewed distributions?**  
- **Right (positive) skew**: typically Mean > Median > Mode  
- **Left (negative) skew**: typically Mode > Median > Mean  
- **Symmetric (unimodal)**: Mean = Median = Mode  

---

### **6. What is the difference between Bayesian and Frequentist statistics?**  
| Approach  | Frequentist  | Bayesian  |
|-----------|-------------|-----------|
| Definition | Probability is **frequency of events** | Probability is **degree of belief** |
| Prior Knowledge | No prior distribution used | Uses prior and updates belief using data |
| Example | A/B testing | Spam detection (prior spam likelihood) |

---

## **Inferential Statistics**
### **7. What is the difference between Type I and Type II errors?**  
- **Type I error (False Positive)**: Rejecting a true null hypothesis (α).  
- **Type II error (False Negative)**: Failing to reject a false null hypothesis (β).  
- **Trade-off**: For a fixed sample size, reducing one typically increases the other; a larger sample size can reduce both.

---

### **8. What is the difference between T-test and Z-test?**  
| Feature | T-test | Z-test |
|---------|-------|--------|
| Sample Size | Typically small (n < 30, rule of thumb) | Typically large (n ≥ 30) |
| Population Variance | Unknown (estimated from the sample) | Known |
| Reference Distribution | Student's t | Standard normal |

---

### **9. What is the difference between One-tailed and Two-tailed tests?**  
- **One-tailed test**: Tests for effect in one direction only (e.g., **"μ > μ₀" or "μ < μ₀"**).  
- **Two-tailed test**: Tests for effect in both directions (e.g., **"μ ≠ μ₀"**).  

---

### **10. What is an F-test? When do we use it?**  
- Used to compare **variances** of two or more groups.  
- **Examples**:  
  - **ANOVA (Analysis of Variance)**: Tests if means of multiple groups are different.  
  - **Regression**: Tests whether the model as a whole explains significantly more variance than an intercept-only model.

---

### **11. What is the Bonferroni correction?**  
- A method to **control false positives** when multiple hypothesis tests are conducted.  
- Adjusts **p-value threshold**:  
  $$
  \alpha_{corrected} = \frac{\alpha}{m}
  $$
  where $m$ is the number of tests.

---

### **12. Explain Bootstrapping and its advantages.**  
- **Bootstrapping**: Resampling with replacement to estimate **sampling distributions**.  
- **Advantages**:  
  - Works without normality assumptions.  
  - Often applied to **small datasets** where normal-theory formulas are doubtful, although very small samples limit its accuracy.  
  - Estimates **confidence intervals**.  

---

## **Regression Analysis**
### **13. What is Heteroscedasticity and how do you detect it?**  
- **Unequal variance** in residuals of regression models.  
- **Detection**:  
  - **Breusch-Pagan test**  
  - **White’s test**  
  - Plot residuals vs. fitted values.  
- **Fixes**:  
  - Log transformation  
  - Weighted least squares  

---

### **14. What is Multicollinearity? How do you fix it?**  
- **Highly correlated independent variables** cause instability in regression coefficients.  
- **Detection**:  
  - **Variance Inflation Factor (VIF)**: values above 10 (some practitioners use 5) are commonly treated as problematic.  
- **Fixes**:  
  - Remove highly correlated features.  
  - Use **Principal Component Analysis (PCA)**.  
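VIF can be computed directly from its definition, $\text{VIF}_j = 1/(1 - R_j^2)$, where $R_j^2$ comes from regressing feature j on the other features. This NumPy sketch uses simulated data in which `x2` is nearly a copy of `x1`; statsmodels also provides a `variance_inflation_factor` function for the same purpose.

```python
import numpy as np

def vif(X):
    """Variance Inflation Factor for each column of X (n_samples x n_features)."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    vifs = []
    for j in range(p):
        y = X[:, j]
        # Regress feature j on an intercept plus all other features
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(others, y, rcond=None)
        resid = y - others @ beta
        r2 = 1 - resid @ resid / np.sum((y - y.mean()) ** 2)
        vifs.append(1 / (1 - r2))
    return np.array(vifs)

rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = x1 + rng.normal(scale=0.1, size=200)   # nearly a copy of x1
x3 = rng.normal(size=200)                   # independent feature
print(vif(np.column_stack([x1, x2, x3])).round(1))  # large for x1 and x2, near 1 for x3
```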

---

### **15. What is the difference between R² and Adjusted R²?**  
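- **R²**: Proportion of variance explained; it never decreases when features are added, even irrelevant ones.  
- **Adjusted R²**: Penalizes extra features, so it increases only when a new feature improves the fit more than would be expected by chance.  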
  $$
  R^2_{adj} = 1 - \frac{(1 - R^2) (n - 1)}{n - k - 1}
  $$
  where $n$ is the sample size and $k$ is the number of predictors.
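A one-line implementation of the formula, with hypothetical values showing how the same R² yields a lower adjusted value when it is achieved with many more predictors (k).

```python
def adjusted_r2(r2, n, k):
    """Adjusted R^2 for n observations and k predictors (intercept not counted in k)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Hypothetical: the same R^2 = 0.80 on 50 observations with few vs. many predictors
print(round(adjusted_r2(0.80, n=50, k=3), 3))    # about 0.787
print(round(adjusted_r2(0.80, n=50, k=20), 3))   # about 0.662
```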

---

### **16. What is an Odds Ratio in Logistic Regression?**  
- **Odds Ratio (OR)**: Measures effect size of a predictor.  
  $$
  OR = e^{\beta}
  $$
  - If **OR > 1**, a one-unit increase in the predictor increases the odds of the outcome (multiplies them by OR).  
  - If **OR < 1**, a one-unit increase in the predictor decreases the odds of the outcome.

---

## **Time Series Analysis**
### **17. What is Stationarity in Time Series? How do you test for it?**  
- A time series is **stationary** if its statistical properties (**mean, variance, autocorrelation**) remain constant over time.  
- **Tests**:  
  - **Augmented Dickey-Fuller (ADF) test**  
  - **KPSS test**  

---

### **18. What is Autocorrelation Function (ACF) and Partial Autocorrelation Function (PACF)?**  
- **ACF**: Measures correlation of time series with lagged versions.  
- **PACF**: Measures direct correlation after removing indirect correlations.  
- **Use Case**: Identifying AR (PACF) and MA (ACF) terms in ARIMA.
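A minimal NumPy sketch of the sample ACF, applied to a simulated AR(1) series with coefficient 0.7 (an arbitrary choice), whose theoretical autocorrelation at lag k is $0.7^k$. For the PACF, `statsmodels.tsa.stattools.pacf` is a common choice; it is not reimplemented here.

```python
import numpy as np

def acf(x, max_lag):
    """Sample autocorrelation for lags 0..max_lag (standard biased estimator)."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    denom = np.sum(x ** 2)
    return np.array([np.sum(x[:len(x) - k] * x[k:]) / denom for k in range(max_lag + 1)])

# Simulate an AR(1) process: x_t = 0.7 * x_{t-1} + noise
rng = np.random.default_rng(0)
x = np.zeros(2000)
for t in range(1, len(x)):
    x[t] = 0.7 * x[t - 1] + rng.normal()

print(acf(x, max_lag=3).round(2))  # roughly 1, 0.7, 0.49, 0.34
```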

---

### **19. What is an ARIMA model?**  
- **Autoregressive (AR)**: Uses past values.  
- **Integrated (I)**: Differencing to remove trends.  
- **Moving Average (MA)**: Uses past errors.  

---

## **Bayesian Statistics**
### **20. What is Markov Chain Monte Carlo (MCMC)?**  
- A method to **approximate probability distributions** by generating samples.  
- Used in **Bayesian inference** when exact solutions are intractable.
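The simplest MCMC algorithm is random-walk Metropolis. This sketch samples from a target density known only up to a normalizing constant; the target (a normal with mean 3 and SD 2), the step size, and the burn-in length are all illustrative assumptions, chosen so the result is easy to check.

```python
import numpy as np

def metropolis_hastings(log_target, x0, n_samples, step=1.0, seed=0):
    """Random-walk Metropolis sampler for a 1-D density known up to a constant."""
    rng = np.random.default_rng(seed)
    samples = np.empty(n_samples)
    x, log_p = x0, log_target(x0)
    for i in range(n_samples):
        proposal = x + step * rng.normal()       # symmetric Gaussian proposal
        log_p_new = log_target(proposal)
        # Accept with probability min(1, p(proposal) / p(x)), compared on the log scale
        if np.log(rng.uniform()) < log_p_new - log_p:
            x, log_p = proposal, log_p_new
        samples[i] = x                           # on rejection, the current x is repeated
    return samples

def log_target(x):
    """Unnormalized log-density of N(3, 2^2); the normalizing constant is omitted."""
    return -0.5 * ((x - 3.0) / 2.0) ** 2

draws = metropolis_hastings(log_target, x0=0.0, n_samples=50_000, step=2.0)
draws = draws[5_000:]                            # discard burn-in
print(f"mean = {draws.mean():.2f}, sd = {draws.std():.2f}")  # close to 3 and 2
```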

---

## **Miscellaneous**
### **21. What is Jensen’s Inequality?**  
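- If $f$ is a **convex function**, then:
  $$
  f(E[X]) \leq E[f(X)]
  $$
  - For a **concave** function such as $\log$, the inequality reverses: $\log E[X] \geq E[\log X]$.  
  - **Application**: Bounding the log-likelihood in Maximum Likelihood Estimation (MLE); for example, the EM algorithm maximizes a lower bound on the log-likelihood obtained from Jensen's inequality.  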

---

### **22. Explain the Dirichlet Distribution.**  
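- A distribution over probability vectors (non-negative components that sum to 1), commonly used as the conjugate **prior distribution** for **categorical** or multinomial probabilities in Bayesian models.  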

---

### **23. What is a Wald Test?**  
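- Tests whether a parameter, such as **a regression coefficient**, differs significantly from a hypothesized value (often zero), using $W = \frac{\hat{\beta} - \beta_0}{SE(\hat{\beta})}$, which is approximately standard normal under the null hypothesis.  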

---

### **24. What is the likelihood ratio test?**  
- Compares nested models:  
  $$
  \Lambda = \frac{L_0}{L_1}
  $$
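  where $L_0$ is the maximized likelihood of the null (restricted) model and $L_1$ that of the full model. Under the null hypothesis, $-2 \ln \Lambda$ approximately follows a $\chi^2$ distribution with degrees of freedom equal to the difference in the number of parameters.  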
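A minimal sketch of the test, assuming you already have the maximized log-likelihoods of both fitted models; the numbers below are hypothetical. The helper name `likelihood_ratio_test` is illustrative.

```python
from scipy.stats import chi2

def likelihood_ratio_test(loglik_null, loglik_full, df):
    """LRT for nested models: -2 ln(L0 / L1) compared with chi^2(df)."""
    stat = -2 * (loglik_null - loglik_full)
    p_value = chi2.sf(stat, df)      # upper-tail probability
    return stat, p_value

# Hypothetical log-likelihoods; the full model has 2 extra parameters
stat, p = likelihood_ratio_test(loglik_null=-120.4, loglik_full=-115.1, df=2)
print(f"LR statistic = {stat:.2f}, p = {p:.4f}")
```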



# GEMINI


Here are 30 challenging statistics questions tailored for a data scientist/ML engineer with 5 years of experience, along with detailed answers:

**1. Hypothesis Testing & Power:**

* **Question:** Explain the concept of statistical power and how it relates to Type II errors. How would you design an A/B test to maximize power while controlling for Type I errors?
    * **Answer:**
        * Statistical power is the probability of correctly rejecting a false null hypothesis. Type II error (beta) is failing to reject a false null hypothesis. Power = 1 - beta.
        * To maximize power:
            * Increase sample size.
            * Target a larger minimum detectable effect (the true effect size itself is usually outside your control).
            * Reduce variability in the data.
            * Set a higher alpha level (but increases Type I error).
        * A/B test design: Use appropriate sample size calculations, minimize variance through stratified sampling, and carefully select the significance level.
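Sample size calculation is the core of power-aware A/B test design. A common normal-approximation formula for comparing two conversion rates gives the per-group sample size $n = \frac{(z_{1-\alpha/2} + z_{1-\beta})^2\,[p_1(1-p_1) + p_2(1-p_2)]}{(p_1 - p_2)^2}$. The sketch below implements it with the standard library; the baseline and target rates are hypothetical.

```python
import math
from statistics import NormalDist

def sample_size_two_proportions(p1, p2, alpha=0.05, power=0.8):
    """Per-group n for a two-sided two-proportion z-test (normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # controls Type I error
    z_beta = NormalDist().inv_cdf(power)            # power = 1 - beta
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2)

# Hypothetical: baseline conversion 10%, want to detect a lift to 12%
print(sample_size_two_proportions(0.10, 0.12))
```

Raising the power or lowering α increases the required n, which makes the trade-offs listed above concrete.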

**2. Bayesian Inference:**

* **Question:** Describe a scenario where Bayesian inference would be preferred over frequentist methods. Explain how you would construct a Bayesian model for that scenario.
    * **Answer:**
        * When prior knowledge is available or when dealing with small sample sizes.
        * Example: Predicting conversion rates for a new product with limited initial data.
        * Model: Define a prior distribution for the conversion rate, collect data, and update the prior to a posterior distribution using Bayes' theorem.
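For a conversion rate, the standard conjugate choice is a Beta prior with a binomial likelihood, which yields a Beta posterior in closed form. The prior parameters and the observed counts below are hypothetical.

```python
from scipy import stats

# Hypothetical prior: conversion rate around 5%, encoded as Beta(2, 38) (prior mean 2/40)
prior_a, prior_b = 2, 38

# Hypothetical early data: 7 conversions out of 100 visitors
conversions, visitors = 7, 100

# Beta prior + binomial likelihood -> Beta posterior (conjugate update)
post_a = prior_a + conversions
post_b = prior_b + (visitors - conversions)
posterior = stats.beta(post_a, post_b)

low, high = posterior.interval(0.95)
print(f"posterior mean = {posterior.mean():.4f}, 95% credible interval = ({low:.4f}, {high:.4f})")
```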

**3. Time Series Analysis:**

* **Question:** How would you handle non-stationary time series data? Explain the steps involved in building an ARIMA model and how you would evaluate its performance.
    * **Answer:**
        * Non-stationary data: Use differencing, detrending, or seasonal decomposition.
        * ARIMA: Identify p, d, q parameters using ACF and PACF plots.
        * Evaluation: Use metrics like AIC, BIC, RMSE, and perform residual analysis.

**4. Model Evaluation:**

* **Question:** Beyond accuracy, precision, recall, and F1-score, what other metrics are crucial for evaluating models in imbalanced datasets? Explain their significance.
    * **Answer:**
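        * AUC-ROC, AUC-PR, specificity, and the Matthews correlation coefficient (MCC); sensitivity is another name for recall.
        * AUC-ROC: Area under the Receiver Operating Characteristic curve. It measures how well the model ranks positives above negatives across all thresholds, but it can look overly optimistic when negatives vastly outnumber positives.
        * AUC-PR: Area under the precision-recall curve. It focuses on the positive class and is usually more informative than AUC-ROC when positives are rare.
        * Specificity: TN / (TN + FP), the true negative rate; important when false positives are costly.
        * MCC: Uses all four confusion-matrix counts (TP, TN, FP, FN) and is high only when the model performs well on both classes, which makes it robust to class imbalance.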

**5. Dimensionality Reduction:**

* **Question:** Explain the differences between PCA and t-SNE. When would you use each technique, and what are their limitations?
    * **Answer:**
        * PCA: Linear dimensionality reduction, maximizes variance.
        * t-SNE: Non-linear, preserves local structure, good for visualization.
        * PCA: Use for linear data, feature compression. Limitations: Only linear transformations.
        * t-SNE: Use for visualization of high-dimensional data. Limitations: Computationally expensive, difficult to interpret global structure.
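PCA can be computed from the singular value decomposition of the centered data matrix. This NumPy sketch uses simulated data in which three features are driven mostly by a single latent variable, so the first component should capture nearly all of the variance. For t-SNE, scikit-learn's `sklearn.manifold.TSNE` is a commonly used implementation; it is not reimplemented here.

```python
import numpy as np

def pca(X, n_components):
    """PCA via SVD of the centered data; returns projections and explained-variance ratios."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                          # center each feature
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained_var = S ** 2 / (len(X) - 1)            # eigenvalues of the covariance matrix
    ratio = explained_var / explained_var.sum()
    return Xc @ Vt[:n_components].T, ratio[:n_components]

rng = np.random.default_rng(0)
t = rng.normal(size=300)
# Three features driven mostly by one latent variable t, plus small noise
X = np.column_stack([t, 2 * t, -t]) + rng.normal(scale=0.1, size=(300, 3))
Z, ratio = pca(X, n_components=2)
print(Z.shape, ratio.round(3))  # first component explains almost all the variance
```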

**6. Causal Inference:**

* **Question:** Explain the concept of confounding variables and how they can affect causal inference. What are some methods for mitigating their impact?
    * **Answer:**
        * Confounding variables: Influence both independent and dependent variables, creating spurious associations.
        * Mitigation:
            * Randomized controlled trials (RCTs).
            * Statistical methods like propensity score matching, instrumental variables, and regression adjustment.

**7. Statistical Distributions:**

* **Question:** Describe the properties of the exponential distribution and its relationship to the Poisson distribution. Provide a real-world example where the exponential distribution is applicable.
    * **Answer:**
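        * Exponential distribution: Models the time between events in a Poisson process with rate λ. Its mean is 1/λ, and it is memoryless: $P(T > s + t \mid T > s) = P(T > t)$.
        * Poisson distribution: Models the number of events in a fixed interval. If inter-arrival times are exponential with rate λ, the number of events in an interval of length t is Poisson with mean λt.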
        * Example: Time between customer arrivals at a service center.

**8. Resampling Techniques:**

* **Question:** Explain the differences between bootstrapping and cross-validation. When would you use each technique, and what are their advantages and disadvantages?
    * **Answer:**
        * Bootstrapping: Sampling with replacement to estimate statistics.
        * Cross-validation: Partitioning data to evaluate model performance.
        * Bootstrapping: Use for estimating confidence intervals, standard errors. Advantages: Robust to non-normality. Disadvantages: Computationally intensive.
        * Cross-validation: Use for model selection, hyperparameter tuning. Advantages: Gives a more reliable estimate of generalization performance than a single split and helps detect overfitting. Disadvantages: Can be computationally expensive.

**9. Handling Missing Data:**

* **Question:** Describe different methods for handling missing data and explain the potential biases associated with each method.
    * **Answer:**
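        * Methods: deletion (listwise or pairwise) and imputation (mean, median, KNN, MICE).
        * Biases: Deletion can introduce bias if data are not missing completely at random (MCAR); simple mean or median imputation shrinks variance and distorts distributions and correlations; model-based imputation (KNN, MICE) can propagate errors if its modeling assumptions are wrong.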

**10. Outlier Detection:**

* **Question:** What are some robust methods for outlier detection, and when would you use them?
    * **Answer:**
        * Methods: IQR, Isolation Forest, Local Outlier Factor (LOF).
        * Use robust methods when data is not normally distributed or contains high levels of noise.

**11. Feature Selection:**

* **Question:** Explain the differences between filter, wrapper, and embedded feature selection methods.
    * **Answer:**
        * Filter: Uses statistical measures.
        * Wrapper: Uses model performance.
        * Embedded: Integrated into the model training process.

**12. Central Limit Theorem:**

* **Question:** Explain the Central Limit Theorem and its importance in statistical inference.
    * **Answer:**
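        * For independent, identically distributed observations from a population with finite variance, the distribution of sample means approaches a normal distribution as the sample size increases, regardless of the shape of the population distribution.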
        * Importance: Allows us to use normal distribution for hypothesis testing and confidence intervals.

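A quick simulation shows the theorem at work. This sketch (standard library only; the sample size of 50 and 5,000 repetitions are arbitrary choices) draws samples from a strongly right-skewed Exponential(1) population, which has mean 1 and standard deviation 1. The sample means should center on 1, have a spread close to σ/√n, and fall within ±1.96 standard errors about 95% of the time, as a normal approximation predicts.

```python
import random
from statistics import mean, stdev

rng = random.Random(42)
n, reps = 50, 5_000

# Exponential(1) population: strongly right-skewed, mean 1, standard deviation 1.
sample_means = [mean([rng.expovariate(1.0) for _ in range(n)]) for _ in range(reps)]

se_theory = 1.0 / n ** 0.5  # sigma / sqrt(n)
within = sum(abs(m - 1.0) <= 1.96 * se_theory for m in sample_means) / reps
print(mean(sample_means), stdev(sample_means), se_theory)
print(within)  # close to 0.95
```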
**13. Statistical significance.**

* **Question:** Explain what a p-value is, and what are the limitations to relying on p-values alone.
    * **Answer:**
        * A p-value is the probability of obtaining results at least as extreme as the observed results of a statistical hypothesis test, assuming that the null hypothesis is correct.
        * Limitations: P-values do not indicate the size of an effect, and can be easily misinterpreted.

**14. Multicollinearity.**

* **Question:** What is multicollinearity, and how does it impact regression models? How would you detect and address it?
    * **Answer:**
        * Multicollinearity occurs when independent variables in a regression model are highly correlated.
        * Impacts: Unstable coefficient estimates, reduced statistical significance.
        * Detection: Variance inflation factor (VIF).
        * Address: Remove redundant variables, or use regularization.

**15. Non-Parametric tests.**

* **Question:** When would you use non-parametric statistical tests, and what are some examples?
    * **Answer:**
        * When data does not meet assumptions of parametric tests (non-normal, small sample size).
        * Examples: Mann-Whitney U test, Wilcoxon signed-rank test, Kruskal-Wallis test.

**16. Logistic Regression.**

* **Question:** Describe the assumptions of logistic regression, and how to check to see if those assumptions are being met.
    * **Answer:**
        * Linearity between continuous predictors and the log odds, independence of observations, and lack of strong multicollinearity.
        * Checking: residual plots, VIF scores, and examining relationships between independent variables and log odds.

**17. Confidence Intervals.**

* **Question:** Explain the meaning of a confidence interval, and how it differs from a prediction interval.
    * **Answer:**
        * Confidence interval: Range of plausible values for a population parameter.
        * Prediction interval: Range of plausible values for a single observation.
        * Prediction intervals are wider than confidence intervals.

**18. Bias Variance Tradeoff.**

* **Question:** Explain the bias-variance tradeoff in machine learning. How does it relate to model complexity?
    * **Answer:**
        * Bias: Error from incorrect assumptions.
        * Variance: Error from sensitivity to small fluctuations in training data.
        * High model complexity reduces bias but increases variance.

**19. Law of Large Numbers.**

* **Question:** Describe the law of large numbers, and how it is used in statistical inference.
    * **Answer:**
        * As the sample size increases, the sample mean converges to the population mean.
        * Used to justify using sample statistics to estimate population parameters.

**20. Sampling Methods.**

* **Question:** What are the various types of sampling methods and when is each appropriate?
    * **Answer:**
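        * Simple random, stratified, cluster, and systematic sampling. Simple random sampling suits homogeneous populations with a complete list of members; stratified sampling guarantees representation of distinct subgroups; cluster sampling reduces cost when the population is naturally grouped (e.g. by region or school); systematic sampling is efficient when members are listed in an order unrelated to the variable being studied.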

**21. Feature Engineering.**

* **Question:** How do you determine what feature engineering techniques to use on a given dataset?
    * **Answer:**
        

# MAIN

## What is the difference between qualitative and quantitative data?

Answer:

Quantitative data: Numeric, measurable, and analyzed statistically (e.g., age, income). Types include discrete and continuous data.


Qualitative data: Descriptive, non-numeric, and analyzed by grouping (e.g., gender, marital status). Types include nominal and ordinal data.

## How do you calculate the coefficient of variation (CV)?

![Screenshot%202025-04-08%20192358.png](attachment:Screenshot%202025-04-08%20192358.png)

It’s useful for comparing variability between datasets with different units or means.

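As a sketch in standard-library Python, assuming the usual definition CV = standard deviation / mean (often multiplied by 100 to express it as a percentage): the data below are hypothetical. Converting heights from centimeters to meters leaves the CV unchanged, which is what makes it useful for comparing variability across units.

```python
from statistics import mean, stdev

def coefficient_of_variation(values):
    """CV = sample standard deviation / mean (multiply by 100 for a percentage).

    Only meaningful for ratio-scale data with a positive mean.
    """
    return stdev(values) / mean(values)

heights_cm = [160, 165, 170, 175, 180]  # hypothetical data
heights_m = [h / 100 for h in heights_cm]
weights_kg = [55, 60, 72, 80, 95]

print(coefficient_of_variation(heights_cm), coefficient_of_variation(heights_m))  # equal: CV is unit-free
print(coefficient_of_variation(weights_kg))
```

Here the weights vary more relative to their mean than the heights do, even though the two variables are measured in different units.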
## Explain the relationship between confidence level and significance level.

Answer:



Confidence Level (1 − α): The proportion of confidence intervals, across repeated samples, that would contain the true population parameter.



Significance Level (α): Probability of rejecting a true null hypothesis (Type I error).


Example: At a 95% confidence level, the significance level is 0.05.

## What is multicollinearity, and how do you detect it?

Answer:

Multicollinearity: Occurs when independent variables in a regression model are highly correlated, making it difficult to isolate their individual effects.

__Detection:__


Variance Inflation Factor (VIF): VIF > 5 or 10 indicates high multicollinearity.


Correlation Matrix: High pairwise correlations (e.g., > 0.8) among independent variables.


Condition Index: Values > 30 indicate severe multicollinearity.

## What are some common methods to address multicollinearity?

Answer:

Remove highly correlated predictors.


Use dimensionality reduction techniques like PCA.


Apply regularization techniques (e.g., Lasso, Ridge regression).

## Question 1: What is a p-value?

 
Answer: Given that the null hypothesis is true, a p-value is the probability that you would see a result at least as extreme as the one observed.

P-values are typically calculated to determine whether the result of a statistical test is significant. In practice, the p-value is compared with a significance level α chosen in advance (e.g. 0.05): if p ≤ α, there is enough evidence to reject the null hypothesis.

## Question 2: Explain the concept of statistical power

 
Answer: If you were to run a statistical test to detect whether an effect is present, statistical power is the probability that the test will accurately detect the effect.

Here is a simple example to explain this:

Let’s say we run an ad for a test group of 100 people and get 80 conversions.

The null hypothesis is that the ad had no effect on the number of conversions. In reality, however, the ad did have a real impact on the number of conversions.

Statistical power is the probability that you would accurately reject the null hypothesis and actually detect the effect. A higher statistical power indicates that the test is better able to detect an effect if there is one.

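Power can be computed directly for simple tests. The sketch below assumes a two-sided one-sample z-test for a mean with known population standard deviation (a simplification; the conversion example above would normally use a test for proportions). The function name and the effect size, σ, and sample sizes are hypothetical. It shows that power grows with sample size, and that with no true effect the "power" collapses to α.

```python
from statistics import NormalDist

def z_test_power(effect, sigma, n, alpha=0.05):
    """Power of a two-sided one-sample z-test for a mean (population sigma known).

    effect: true difference between the population mean and the null value.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    shift = effect * n ** 0.5 / sigma  # mean of the test statistic when the effect is real
    # Reject when |Z| > z_crit, where Z ~ Normal(shift, 1) under the alternative.
    return (1 - std_normal.cdf(z_crit - shift)) + std_normal.cdf(-z_crit - shift)

for n in (25, 100, 400):
    print(n, round(z_test_power(effect=0.5, sigma=2.0, n=n), 3))
```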
## Question 3: How would you describe confidence intervals to a non-technical stakeholder?

 
Let’s use the same example as before, in which an ad is run for a sample size of 100 people and 80 conversions are obtained.

Instead of saying that the conversion rate is 80%, we would provide a range, since we don’t know how the whole population would behave. A single sample of 100 people gives only an estimate, and a different sample would give a slightly different conversion rate.

Here is an example of what we might say solely based on the data obtained from our sample:

“If we were to run this ad for a larger group of people, we are 95% confident that the true conversion rate would fall between roughly 72% and 88%.”

We use this range because we don’t know how the total population will react, and can only generate an estimate based on our test group, which is just a sample.

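For the 80-out-of-100 example, a simple normal-approximation (Wald) interval can be computed with the standard library. This is a sketch of one common interval method, not the only one; for small samples or rates near 0% or 100%, the Wilson interval is usually preferred.

```python
from statistics import NormalDist

def wald_interval(successes, n, confidence=0.95):
    """Normal-approximation (Wald) confidence interval for a proportion."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(0.5 + confidence / 2)   # about 1.96 for 95%
    margin = z * (p_hat * (1 - p_hat) / n) ** 0.5    # critical value x standard error
    return p_hat - margin, p_hat + margin

low, high = wald_interval(successes=80, n=100)
print(f"{low:.1%} to {high:.1%}")  # 72.2% to 87.8%
```

The interval is 0.80 ± 1.96 × 0.04, because the standard error of the sample proportion is √(0.8 × 0.2 / 100) = 0.04.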
## Question 4: What is the difference between a parametric and non-parametric test?

Parametric and non-parametric tests are two broad categories of statistical tests used to analyze data, depending on the characteristics and assumptions about the data.

---

### 🔹 **Parametric Tests**
**Definition**: Parametric tests assume that the data follows a certain distribution (typically normal distribution) and rely on parameters such as the mean and standard deviation.

**Key Assumptions**:
- Data is normally distributed.
- Homogeneity of variance (equal variances in groups).
- Data is measured on an interval or ratio scale.
- Sample size is reasonably large.

**Examples of Parametric Tests**:

| Test | Purpose |
|------|---------|
| **t-test (one-sample, two-sample, paired)** | Compare means between groups |
| **ANOVA (Analysis of Variance)** | Compare means of 3 or more groups |
| **Pearson correlation** | Measure linear relationship between two variables |
| **Linear regression** | Model relationship between dependent and independent variables |

---

### 🔹 **Non-Parametric Tests**
**Definition**: Non-parametric tests **do not assume** a specific distribution for the data. They are more flexible and can be used when parametric assumptions are violated.

**When to Use**:


- Data is ordinal, nominal, or not normally distributed.


- Small sample size.


- Outliers are present.

**Examples of Non-Parametric Tests**:


| Test | Purpose |
|------|---------|
| **Mann-Whitney U test** | Compare medians between two independent groups (non-parametric alternative to two-sample t-test) |
| **Wilcoxon signed-rank test** | Compare medians of two related samples (non-parametric alternative to paired t-test) |
| **Kruskal-Wallis test** | Compare medians of 3 or more groups (non-parametric alternative to ANOVA) |
| **Spearman’s rank correlation** | Assess monotonic relationship between two variables |
| **Chi-square test** | Test relationships between categorical variables |

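Many non-parametric tests work on ranks or pairwise orderings rather than raw values. As an illustration, the Mann-Whitney U statistic for sample a counts how many (a, b) pairs have the a-value larger, with ties counting one half; the two U statistics always sum to n_a × n_b. This standard-library sketch computes only the statistic, with hypothetical scores; for a p-value, a library routine such as `scipy.stats.mannwhitneyu` can be used.

```python
def mann_whitney_u(a, b):
    """U statistic for sample a: number of (x in a, y in b) pairs with x > y, ties counting 0.5."""
    u = 0.0
    for x in a:
        for y in b:
            if x > y:
                u += 1.0
            elif x == y:
                u += 0.5
    return u

group_a = [7, 8, 9, 10]   # hypothetical scores
group_b = [1, 3, 8, 4]
u_a, u_b = mann_whitney_u(group_a, group_b), mann_whitney_u(group_b, group_a)
print(u_a, u_b, u_a + u_b == len(group_a) * len(group_b))
```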
---

### 🔄 Summary Table:

| Criteria | Parametric | Non-Parametric |
|----------|------------|----------------|
| Assumes normal distribution | ✅ Yes | ❌ No |
| Based on parameters (mean, SD) | ✅ Yes | ❌ No |
| Uses raw data | ✅ Yes | 🔁 Often uses ranks |
| Robust to outliers | ❌ No | ✅ Yes |
| More powerful (if assumptions met) | ✅ Yes | ❌ Less powerful |

---


## Question 5: What is the difference between covariance and correlation?

 
Covariance measures the direction of the linear relationship between variables. Correlation measures the strength and direction of this relationship.

While both correlation and covariance give you similar information about feature relationship, the main difference between them is scale.

Correlation ranges between -1 and +1. It is standardized, and easily allows you to understand whether there is a positive or negative relationship between features and how strong this effect is. On the other hand, covariance is expressed in the product of the two variables' units (e.g. meter-kilograms) and changes if either variable is rescaled, which makes its magnitude harder to interpret.

## Question 6: How would you analyze and handle outliers in a dataset?

 
There are a few ways to detect outliers in the dataset.

**Visual methods:** Outliers can be visually identified using charts like boxplots and scatterplots. Points that are outside the whiskers of a boxplot are typically outliers. When using scatterplots, outliers can be detected as points that are far away from other data points in the visualization.


**Non-visual methods (Z-score):** One non-visual technique to detect outliers is the Z-score. A Z-score is computed by subtracting the mean from a value and dividing the result by the standard deviation. This tells us how many standard deviations away from the mean a value is. Values more than 3 standard deviations above or below the mean are commonly treated as outliers.


**Interquartile range (IQR):** Calculate the IQR = Q3 − Q1, which spans the middle 50% of the dataset, and flag data points below Q1 − 1.5 × IQR or above Q3 + 1.5 × IQR.

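Both non-visual rules fit in a few lines of standard-library Python. The sketch below uses `statistics.quantiles` with its default method for the quartiles (other quartile conventions give slightly different fences); the measurements are hypothetical.

```python
from statistics import mean, stdev, quantiles

def zscore_outliers(values, threshold=3.0):
    """Flag values more than `threshold` standard deviations from the mean."""
    m, s = mean(values), stdev(values)
    return [v for v in values if abs(v - m) / s > threshold]

def iqr_outliers(values, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

data = [10, 12, 11, 13, 12, 11, 10, 95]  # hypothetical measurements
print(zscore_outliers(data))  # []
print(iqr_outliers(data))     # [95]
```

In this small sample the Z-score rule misses 95 because the outlier itself inflates the standard deviation: with 8 values, no Z-score (using the sample standard deviation) can exceed 7/√8 ≈ 2.47. The IQR rule, based on quartiles, is not affected in the same way.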
## Question 7: Differentiate between a one-tailed and two-tailed test.

 
A one-tailed test checks whether there is a relationship or effect in a single direction. For example, after running an ad, you can use a one-tailed test to check for a positive impact, i.e. an increase in sales. This is a right-tailed test.

A two-tailed test examines the possibility of a relationship in both directions. For instance, if a new teaching style has been implemented in all public schools, a two-tailed test would assess whether there is a significant increase or decrease in scores.

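The practical difference shows up in the p-value. For a test statistic that is standard normal under the null hypothesis, the two-tailed p-value is twice the one-tailed one, so the same result can be significant in a right-tailed test but not in a two-tailed test. The observed z-value of 1.8 below is hypothetical.

```python
from statistics import NormalDist

z_obs = 1.8  # hypothetical observed z-statistic
std_normal = NormalDist()

p_right = 1 - std_normal.cdf(z_obs)               # right-tailed: H1 says "increase"
p_two = 2 * (1 - std_normal.cdf(abs(z_obs)))      # two-tailed: H1 says "increase or decrease"
print(round(p_right, 4), round(p_two, 4))         # 0.0359 0.0719
```

At α = 0.05, the right-tailed test rejects the null hypothesis while the two-tailed test does not.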
## Question 8: Given the following scenario, which statistical test would you choose to implement?

 
An online retailer wants to evaluate the effectiveness of a new ad campaign. They collect daily sales data for 30 days before and after the ad was launched. The company wants to determine if the ad contributed to a significant difference in daily sales.

Options:

A) Chi-squared test


B) Paired t-test


C) One-way ANOVA


D) Independent samples t-test

Answer: To evaluate the effectiveness of the new ad campaign, we should use a paired t-test.
A paired t-test compares two related sets of measurements by testing whether the mean of their pairwise differences is zero.
In this case, we are comparing sales for the same retailer before and after the ad was run, treating each day before the launch as paired with a day after, which is why we use a paired t-test instead of an independent samples t-test.
 

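The paired t-statistic is the mean of the pairwise differences divided by its standard error, with n − 1 degrees of freedom. The standard-library sketch below computes only the statistic; the five days of sales are hypothetical and assume each "after" day is matched with a "before" day. A library routine such as `scipy.stats.ttest_rel` computes the same statistic together with a p-value.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t(before, after):
    """Paired t-statistic: mean of the pairwise differences divided by its standard error."""
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1  # statistic and degrees of freedom

# Hypothetical daily sales, each "after" value paired with a matching "before" day.
before = [10, 12, 11, 13, 12]
after = [12, 13, 13, 14, 14]
t, df = paired_t(before, after)
print(round(t, 2), df)
```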
## Question 9: What is a Chi-Square test of independence?

 
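A Chi-Square test of independence is used to examine whether two categorical variables are associated, by comparing the observed counts in a contingency table with the counts expected if the variables were independent. The null hypothesis (H0) of this test is that the two variables are independent, so any difference between observed and expected counts is purely due to chance.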

In simple terms, this test can help us identify if the relationship between two categorical variables is due to chance, or whether there is a statistically significant association between them.

For example, if you wanted to test whether there was a relationship between gender (Male vs Female) and ice cream flavor preference (Vanilla vs Chocolate), you can use a Chi-Square test of independence.

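The test statistic sums (observed − expected)² / expected over all cells, where each expected count is row total × column total / grand total. This standard-library sketch uses hypothetical counts for the gender and flavor example; it computes the uncorrected Pearson statistic, whereas `scipy.stats.chi2_contingency` applies Yates' continuity correction to 2 × 2 tables by default.

```python
def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total  # count expected under independence
            chi2 += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, dof

# Hypothetical counts: rows = Male, Female; columns = Vanilla, Chocolate.
counts = [[10, 20],
          [20, 10]]
print(chi_square_independence(counts))
```

Here the statistic is about 6.67 with 1 degree of freedom, above the 5% critical value of about 3.84, so these made-up counts would suggest an association.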
## Question 10: Explain the concept of regularization in regression models.

 
Regularization is a technique used to reduce overfitting by adding a penalty on the size of the model's coefficients to the loss function, allowing models to generalize better to data they haven't been trained on.

In regression, there are two commonly used regularization techniques: ridge and lasso regression.

Both slightly change the error equation of the regression model by adding a penalty term to it.

In ridge regression (L2 regularization), a penalty parameter λ multiplies the sum of squared coefficients, so models with larger coefficients are penalized more. In lasso regression (L1 regularization), λ multiplies the sum of absolute coefficients.

While the primary objective of both methods is to shrink the size of coefficients while minimizing model error, ridge regression penalizes large coefficients more heavily because its penalty grows quadratically.

On the other hand, the lasso penalty grows linearly, applying the same marginal penalty to every coefficient regardless of its size, which means that some coefficients can shrink to exactly zero.

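With a single predictor and no intercept, both penalized fits have closed forms, which makes the difference visible. This is a simplified sketch, not a general solver: ridge minimizes Σ(y − bx)² + λb², giving b = Σxy / (Σx² + λ), while lasso minimizes Σ(y − bx)² + λ|b|, giving a soft-thresholded coefficient. The data are hypothetical, and library implementations may scale the loss differently, so their λ values are not directly comparable.

```python
def ridge_coef(x, y, lam):
    """Minimise sum((y - b*x)^2) + lam * b^2 for one predictor, no intercept."""
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    return sxy / (sxx + lam)

def lasso_coef(x, y, lam):
    """Minimise sum((y - b*x)^2) + lam * |b|: soft-thresholding of sum(x*y)."""
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    shrunk = max(abs(sxy) - lam / 2, 0.0)
    return (shrunk if sxy >= 0 else -shrunk) / sxx

x = [1, 2, 3]   # hypothetical data; unpenalized slope is 2
y = [2, 4, 6]
for lam in (0, 14, 56):
    print(lam, ridge_coef(x, y, lam), lasso_coef(x, y, lam))
```

With λ = 56 the lasso coefficient is exactly 0, while the ridge coefficient has only shrunk (to 0.4).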
## What is the Pareto principle?

The Pareto principle, also known as the 80/20 rule, suggests that roughly 80 percent of the effects or results in a given situation are generated by 20 percent of the causes. For example, in business, about 80 percent of sales often come from 20 percent of customers.

## What is the Law of Large Numbers in statistics?

The Law of Large Numbers in statistics states that as the number of trials or observations in an experiment increases, the sample average of the results approaches the true expected value. This principle demonstrates the convergence of sample statistics to population parameters with a larger sample size.


As an example, consider rolling a fair six-sided die. If we roll it only three times, the average of the results can be far from the expected value of 3.5. If we roll it a large number of times, the average of the results gets closer and closer to 3.5.

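The die example is easy to simulate with the standard library. This sketch (the seed and the sample sizes are arbitrary choices) averages simulated rolls of a fair die; the averages should drift toward the expected value of 3.5 as the number of rolls grows.

```python
import random
from statistics import mean

def average_of_rolls(n, seed=1):
    """Average of n simulated rolls of a fair six-sided die."""
    rng = random.Random(seed)
    return mean([rng.randint(1, 6) for _ in range(n)])

for n in (3, 100, 10_000, 100_000):
    print(n, round(average_of_rolls(n), 3))
```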
## What is the assumption of normality?

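It refers to the assumption that the data (or, in regression, the residuals) come from a normal (bell-shaped) distribution. Many tests rely on this directly, or on the related requirement that the sampling distribution of the mean is approximately normal. This assumption is essential for many parametric statistical tests and models.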

## What is the meaning of Six Sigma in statistics?

In statistics, Six Sigma refers to a quality control methodology aimed at producing a data set or process that is nearly error-free. It is typically measured in terms of standard deviations (sigma), and a process is considered at the six sigma level when it is 99.99966% error-free, indicating high reliability.

## What is the meaning of KPI in statistics?

KPI stands for key performance indicator. It is a quantifiable metric used to assess whether specific goals or objectives are being achieved. KPIs are crucial for measuring performance in various contexts, such as organizations, projects, or individuals.

## What are some of the properties of a normal distribution?

The normal distribution is also known as the Gaussian distribution. Its key properties include symmetry, unimodality (a single peak), and a mean, median, and mode that are all equal and located at the center. It forms a bell-shaped curve when graphed.

## How would you describe a ‘p-value’?

A p-value is a statistical measure calculated during hypothesis testing. It represents the probability of observing data at least as extreme as what was obtained in the experiment if the null hypothesis were true. A smaller p-value indicates stronger evidence against the null hypothesis, suggesting that the results are statistically significant.

## Give an example of a dataset with a non-Gaussian distribution.

Bacterial growth is an example of a dataset with a non-Gaussian distribution: bacteria multiply exponentially, so measurements such as population counts over time are strongly skewed to one side of the graph, unlike the symmetrical bell curve of a Gaussian (normal) distribution. Non-Gaussian distributions are common in various real-world processes and phenomena.

## What are the key assumptions necessary for linear regression?

Linear regression relies on several key assumptions:

1. Linearity: The relationship between predictor variables and the outcome variable is linear.


2. Normality: The errors (residuals) are normally distributed.


3. Independence: Residuals are independent of each other, meaning one observation’s error does not affect another’s.


4. Homoscedasticity: The variance of residuals is constant across all levels of predictor variables.


Violations of these assumptions can affect the model’s accuracy and reliability.

## When should you opt for a t-test instead of a z-test in statistical hypothesis testing?

You should choose a t-test when the population standard deviation is unknown and must be estimated from the sample, which matters most for small samples (a common rule of thumb is n < 30).


A z-test is appropriate when the population standard deviation is known, or for larger samples (n ≥ 30), where the t-distribution is very close to the normal distribution. The t-test uses the t-distribution, which accounts for the greater uncertainty in smaller samples.

## Describe the difference between low and high-bias Machine Learning algorithms.

Low-bias machine learning algorithms, such as decision trees and k-nearest neighbors, have the flexibility to capture complex patterns in data. They are less constrained by prior assumptions and can fit the data closely.



In contrast, high-bias algorithms like Linear Regression and Logistic Regression have simpler models and make stronger assumptions. They may not fit the data as closely but are less prone to overfitting small variations in the data.

## What is cherry-picking, P-hacking, and the practice of significance chasing in statistics?

Cherry-picking is the selective presentation of data that supports a specific claim while ignoring contradictory data.


P-hacking involves manipulating data analysis to find statistically significant patterns even when no real effect exists.


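Significance chasing involves presenting non-significant results as if they were almost significant (for example, describing p = 0.07 as "approaching significance"), potentially leading to misleading conclusions. It is closely related to data dredging (or data snooping), which runs many analyses until something appears significant.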

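A short calculation shows why p-hacking and data dredging are dangerous. If every null hypothesis is true and 20 independent tests are each run at α = 0.05, the chance that at least one comes out "significant" is 1 − 0.95²⁰. The sketch below computes this and checks it by simulation, using the fact that for a continuous test statistic under a true null hypothesis, p-values are uniform on [0, 1]. The number of tests and trials are illustrative choices.

```python
import random

alpha, tests, trials = 0.05, 20, 10_000

# Probability of at least one "significant" result among 20 independent tests
# when every null hypothesis is true.
analytic = 1 - (1 - alpha) ** tests

# Simulation: under a true null (continuous test statistic) p-values are uniform on [0, 1].
rng = random.Random(7)
hits = sum(any(rng.random() < alpha for _ in range(tests)) for _ in range(trials))
simulated = hits / trials
print(round(analytic, 3), simulated)
```

Roughly two times in three, a researcher who tries 20 analyses on pure noise will find something to report.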
## Explain the distinction between type I and type II errors in hypothesis testing.

Type I error occurs when the null hypothesis is incorrectly rejected, suggesting an effect exists when it doesn’t (false positive). Type II error occurs when we fail to reject a null hypothesis that is actually false, missing a real effect (false negative). These errors affect the accuracy of statistical tests and decision-making in hypothesis testing.

## What are the differences between Type II error and statistical power?

Answer:

Type II Error (β): Probability of failing to reject a false null hypothesis.


Statistical Power (1−β): Probability of correctly rejecting a false null hypothesis. Higher power reduces the likelihood of Type II errors.

## Explain the concept of degrees of freedom (DF) in statistics.

Degrees of freedom (DF) in statistics represent the number of independent values that are free to vary when estimating a statistic. For example, the sample variance has n − 1 degrees of freedom, because once the sample mean is known, the n deviations from it must sum to zero. DF determine the shape of the t-distribution (and of the chi-square and F distributions); the standard normal (z) distribution has no degrees-of-freedom parameter.

As degrees of freedom increase, the t-distribution approximates the normal distribution more closely. When DF exceeds 30, the t-distribution closely resembles a normal distribution.

## What are some of the characteristics of a normal distribution?

A normal distribution, often called a bell-shaped curve, possesses several key properties:

- Unimodal: It has only one mode or peak.
- Symmetrical: The left and right halves mirror each other.
- Central tendency: The mean, median, and mode are all equal and located at the midpoint of the distribution.

## Define sensitivity in the context of statistics.

Sensitivity (also called recall or the true positive rate), often used in the context of classification models such as logistic regression or random forests, measures the proportion of actual positive cases that a model correctly identifies. It is calculated as the number of true positives divided by the total number of actual positives (true positives plus false negatives). Sensitivity helps assess a model’s ability to identify positive cases correctly, which is crucial in various fields like healthcare for disease diagnosis.

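As a minimal standard-library sketch, sensitivity is TP / (TP + FN) computed from true and predicted labels. The function name and the diagnostic labels below are hypothetical.

```python
def sensitivity(y_true, y_pred, positive=1):
    """Recall / true positive rate: TP / (TP + FN)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp / (tp + fn)

# Hypothetical labels: 1 = has the disease, 0 = healthy.
y_true = [1, 1, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0]
print(sensitivity(y_true, y_pred))  # 0.75: 3 of 4 actual positives were caught
```

The false positive at the fifth position does not affect sensitivity; it only lowers specificity.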
## What’s the advantage of using box plots?

Box plots concisely represent the 5-number summary (minimum, 1st quartile, median, 3rd quartile, maximum). They also make it easy to compare data groups or distributions side by side, enhancing data analysis and visualization.

## List some examples of low and high-bias machine learning algorithms.

Low-bias machine learning algorithms have greater flexibility to capture complex patterns; examples include decision trees, k-nearest neighbors, and support vector machines. High-bias algorithms, like Linear Regression and Logistic Regression, make stronger assumptions and have simpler models, making them less prone to overfitting but potentially missing nuanced relationships in data.

## When would the middle value be better than the average value?

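The median (middle value) is a better summary than the mean (average) when the data are skewed or contain outliers. A few very high or very low values pull the mean toward them, while the median barely changes, so it describes a typical value more accurately.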

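A one-line comparison makes the point. In this hypothetical list of salaries (in thousands), a single extreme value triples the mean while the median stays near the typical salary.

```python
from statistics import mean, median

salaries_k = [30, 32, 35, 38, 40, 400]  # hypothetical salaries (thousands), one extreme value
print(round(mean(salaries_k), 2), median(salaries_k))  # 95.83 36.5
```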
## How can you use root cause analysis in real life?

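Root cause analysis is a way of finding the main cause of a problem by repeatedly asking why it happened. A key part of it is separating causation from correlation. For example, you might see that more crimes happen in a city when more red shirts are sold, but this does not mean that one causes the other. Before naming a root cause, check whether the relationship is causal, for example with a controlled experiment or by looking for a confounding factor that drives both.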

## How are confidence intervals and hypothesis tests similar? How are they different?

Confidence intervals and hypothesis tests both form the foundation of statistical inference.

The confidence interval holds importance in research because it offers a strong basis for estimates, especially in medical research. A confidence interval provides a range of values that is likely to capture the unknown parameter.

Hypothesis testing is used to test an experiment or observation and determine whether the results are unlikely to have occurred purely by chance.

Confidence intervals and hypothesis tests are both inferential techniques that use a sample of data to either estimate a population parameter or test a hypothesis about it. A confidence interval provides a range of plausible values for the parameter, which also conveys the precision of the estimate, while a hypothesis test tells us whether the sample provides enough evidence to reject a specific claim about the parameter. Both can be used in tandem to draw conclusions about population parameters.

If a confidence interval for a difference (for example, between two group means) includes 0, the data are consistent with there being no difference. Correspondingly, a two-sided hypothesis test at the matching significance level will give a p-value higher than alpha, meaning that we fail to reject the null hypothesis.

## What is the relationship between mean and median in normal distribution?

In a normal distribution, the mean and the median are equal.

## What is the relationship between standard error and margin of error?

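For a sample mean:

$$\text{Margin of error} = \text{critical value} \times \frac{\sigma}{\sqrt{n}} \quad \text{(population standard deviation } \sigma \text{ known)}$$

and

$$\text{Margin of error} = \text{critical value} \times \frac{s}{\sqrt{n}} \quad \text{(standard error estimated from the sample standard deviation } s\text{)}$$

In both cases the margin of error is the critical value multiplied by the standard error, so the margin of error increases in proportion to the standard error.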

## What is the Kolmogorov-Smirnov test?

Answer:

A non-parametric test that compares a sample distribution with a reference distribution or two sample distributions to check for differences.


Used to test for normality or distribution equality.

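The one-sample KS statistic D is the largest vertical gap between the sample's empirical CDF and the reference CDF. This standard-library sketch computes D against a standard normal using `statistics.NormalDist`; the sample values are hypothetical. Turning D into a p-value needs the Kolmogorov distribution, which a library routine such as `scipy.stats.kstest` provides.

```python
from statistics import NormalDist

def ks_statistic(sample, cdf):
    """One-sample Kolmogorov-Smirnov D: largest gap between the empirical CDF and `cdf`."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        # The empirical CDF jumps from (i-1)/n to i/n at x, so check both sides of the step.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d

sample = [-1.2, -0.4, 0.1, 0.3, 0.9, 1.5]  # hypothetical data
print(ks_statistic(sample, NormalDist(0, 1).cdf))
```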
## What does it mean if a model is heteroscedastic?

A model is said to be heteroscedastic when the variance of its errors (residuals) is not constant across the levels of the predictors or fitted values. Heteroscedasticity is often described in two forms: conditional and unconditional.

## What is selection bias and why is it important?

Selection bias is a term in statistics used to denote the situation when the individuals or groups selected for a study differ systematically from the population of interest, which introduces systematic error into the outcome.

Selection bias can typically be identified using bivariate tests, as well as multiple regression methods such as logistic regression.

It is crucial to understand and identify selection bias to avoid skewing results in a study. Selection bias can lead to false insights about a particular population group in a study.

Different types of selection bias include –

1. Sampling bias – It is often caused by non-random sampling. The best way to overcome this is by drawing from a sample that is not self-selecting.


2. Participant attrition – The dropout rate of participants from a study constitutes participant attrition. It can be avoided by following up with the participants who dropped off to determine if the attrition is due to the presence of a common factor between participants or something else.


3. Exposure – It occurs due to the incorrect assessment or the lack of internal validity between exposure and effect in a population.


4. Data – It includes dredging of data and cherry-picking and occurs when a large number of variables are present in the data causing even bogus results to appear significant. 


5. Time-interval – It is a sampling error that occurs when observations are selected from a certain time period only. For example, analyzing sales during the Christmas season.


6. Observer selection – It is a kind of discrepancy or detection bias that occurs during the observation of a process and dictates that for the data to be observable, it must be compatible with the life that observes it.

## What does autocorrelation mean?

Autocorrelation is the degree of correlation between a time series and a lagged copy of itself. It means that the data is correlated in a way that future outcomes are linked to past outcomes. Autocorrelation in a model's residuals violates the independence assumption of models such as linear regression, because the errors follow a sequential pattern; this makes standard errors and significance tests unreliable.

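The sample autocorrelation at lag k compares each value with the value k steps later, both measured from the series mean. A minimal standard-library sketch, with hypothetical series: an alternating series is negatively autocorrelated at lag 1 and positively at lag 2, while a steady trend is positively autocorrelated.

```python
from statistics import mean

def autocorrelation(series, lag=1):
    """Sample autocorrelation at `lag`: correlation of the series with a shifted copy of itself."""
    m = mean(series)
    num = sum((series[t] - m) * (series[t + lag] - m) for t in range(len(series) - lag))
    den = sum((x - m) ** 2 for x in series)
    return num / den

alternating = [1, -1, 1, -1]
trend = [1, 2, 3, 4, 5, 6]
print(autocorrelation(alternating, 1), autocorrelation(alternating, 2))  # -0.75 0.5
print(autocorrelation(trend, 1))
```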
## What is Bessel’s correction?

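Bessel’s correction is the use of n − 1 instead of n in the denominator of the sample variance (and hence the sample standard deviation). Dividing by n underestimates the population variance on average, because deviations are measured from the sample mean rather than the unknown population mean; dividing by n − 1 makes the sample variance an unbiased estimator of the population variance.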

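A simulation shows the bias that the correction removes. This standard-library sketch draws many small samples (n = 5, an arbitrary choice) from a normal population with variance 1. On average, `statistics.pvariance` (divide by n) comes out near (n − 1)/n = 0.8, while `statistics.variance` (divide by n − 1) comes out near 1.

```python
import random
from statistics import pvariance, variance, mean

rng = random.Random(3)
n, reps = 5, 20_000  # small samples from a population with variance 1

divide_by_n = []
divide_by_n_minus_1 = []
for _ in range(reps):
    sample = [rng.gauss(0, 1) for _ in range(n)]
    divide_by_n.append(pvariance(sample))         # sum of squares / n
    divide_by_n_minus_1.append(variance(sample))  # sum of squares / (n - 1), Bessel's correction

print(mean(divide_by_n), mean(divide_by_n_minus_1))
```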
## Does symmetric distribution need to be unimodal?

A symmetric distribution does not need to be unimodal. It can be bimodal with two peaks or multimodal with multiple peaks, as long as its left and right halves mirror each other; what it cannot be is skewed.

## What is the benefit of using box plots?

A box plot is a visually effective representation of one or more data sets. Placing several box plots side by side makes comparing their medians, spreads, and outliers quicker than comparing a group of histograms.

## What is the meaning of sensitivity in statistics?

Sensitivity measures how well a classifier identifies actual positive cases. It can be calculated using the formula:

Sensitivity = True Positives / (True Positives + False Negatives)

## What is the F-statistic in ANOVA?

Answer:

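Measures the ratio of the variance between group means (mean square between groups) to the variance within groups (mean square within groups).


A higher F-statistic indicates stronger evidence of a difference among group means; whether it is significant is judged by comparing it with the F distribution (via its p-value).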

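The F-statistic can be computed from its definition in a few lines of standard-library Python: the between-group mean square uses k − 1 degrees of freedom and the within-group mean square uses n − k. The groups below are hypothetical; `scipy.stats.f_oneway` computes the same statistic together with a p-value.

```python
from statistics import mean

def one_way_anova_f(groups):
    """F = (between-group mean square) / (within-group mean square), with its degrees of freedom."""
    all_values = [x for g in groups for x in g]
    grand = mean(all_values)
    k, n = len(groups), len(all_values)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ms_between / ms_within, (k - 1, n - k)

groups = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]  # hypothetical measurements
print(one_way_anova_f(groups))  # (27.0, (2, 6))
```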
## What is kurtosis?

Kurtosis measures how heavy the tails of a distribution are, that is, how prone it is to producing extreme values, compared with a normal distribution. It is also often described in terms of how peaked the distribution is.


The normal distribution has a kurtosis of 3 (an excess kurtosis of 0). As a common rule of thumb, skewness and excess kurtosis values between -2 and +2 are considered acceptable for treating data as approximately normal. Data sets with a high level of kurtosis imply the presence of outliers.


One may need to add data or remove outliers to overcome this problem.


Data sets with low kurtosis levels have light tails and lack outliers.

Types:


1. Mesokurtic: Normal distribution (moderate tails).


2. Leptokurtic: Heavy tails and sharp peak.


3. Platykurtic: Light tails and flat peak.

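A moment-based sketch in standard-library Python: kurtosis is the fourth central moment divided by the squared second central moment, and excess kurtosis subtracts 3 so that a normal distribution scores 0. This is the simple population-moment version; libraries also offer bias-corrected variants, and `scipy.stats.kurtosis` reports excess kurtosis by default. The data sets are hypothetical.

```python
from statistics import mean

def kurtosis(values, excess=True):
    """Moment-based (population) kurtosis m4 / m2^2; subtract 3 for excess kurtosis."""
    m = mean(values)
    n = len(values)
    m2 = sum((x - m) ** 2 for x in values) / n
    m4 = sum((x - m) ** 4 for x in values) / n
    k = m4 / m2 ** 2
    return k - 3 if excess else k

flat = [1, 2, 3, 4, 5]                      # evenly spread: platykurtic
heavy = [0, 0, 0, 0, 0, 0, 0, 0, -10, 10]   # mostly zeros plus two extreme values: leptokurtic
print(kurtosis(flat), kurtosis(heavy))
```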
## What is the Durbin-Watson test?

Answer:

Used in regression to detect autocorrelation in residuals.


Values close to 2 indicate no autocorrelation; values closer to 0 or 4 indicate positive or negative autocorrelation, respectively.

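The statistic is the sum of squared differences between consecutive residuals divided by the sum of squared residuals, which is why it ranges from 0 to 4. A minimal standard-library sketch with hypothetical residual series: residuals that flip sign every step push the value toward 4, while residuals that drift slowly push it toward 0.

```python
def durbin_watson(residuals):
    """DW = sum of squared successive differences / sum of squared residuals (range 0 to 4)."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den

print(durbin_watson([1, -1, 1, -1]))                   # 3.0: sign flips, negative autocorrelation
print(durbin_watson([-2, -1.5, -1, 0.5, 1, 2, 1]))     # well below 2: positive autocorrelation
```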
## What is the difference between A/B testing and hypothesis testing?

Answer:

A/B Testing: A specific type of hypothesis testing used to compare two versions (A and B) of a product, webpage, or process to determine which performs better.


Hypothesis Testing: A general statistical framework to test assumptions about a population parameter.

## How do you interpret a QQ-plot?

Answer: A QQ-plot compares the distribution of a dataset with a theoretical distribution (e.g., normal). Points along the diagonal indicate a good fit, while deviations suggest discrepancies.

## More good questions (advanced level): https://github.com/youssefHosni/Data-Science-Interview-Questions-Answers/blob/main/Statistics%20Interview%20Questions%20%26%20Answers%20for%20Data%20Scientists.md