## **1. What is Statistics?**

**Statistics** is the science of **collecting, analyzing, interpreting, presenting, and organizing data**. It helps us make decisions under uncertainty by summarizing large volumes of data in meaningful ways.

* **Example:** A company wants to understand customer satisfaction. It collects survey responses and uses statistical methods to summarize, analyze, and interpret the feedback.

**Two main branches of statistics:**

1. **Descriptive Statistics:** Summarizes and describes the features of a dataset.
2. **Inferential Statistics:** Makes predictions or inferences about a population based on a sample.

---

## **2. Types of Data**

Understanding the type of data is crucial for choosing the right statistical method.

1. **Qualitative (Categorical) Data:** Describes attributes or categories.

   * **Nominal:** Categories with no order.
     *Example:* Eye color (blue, brown, green)
   * **Ordinal:** Categories with a meaningful order, but no fixed numeric difference between them.
     *Example:* Ratings (poor, average, good, excellent)

2. **Quantitative (Numerical) Data:** Numeric values.

   * **Discrete:** Countable values.
     *Example:* Number of students in a class
   * **Continuous:** Any value within a range.
     *Example:* Height, weight, temperature

---

## **3. Descriptive Statistics**

Descriptive statistics help us **summarize and describe data**.

### **A. Measures of Central Tendency**

These tell us where the “center” of data lies.

1. **Mean (Average):** Sum of all observations divided by the number of observations.

   $$
   \text{Mean} = \frac{\sum x_i}{n}
   $$

   *Example:* Scores = [80, 90, 70], Mean = (80+90+70)/3 = 80

2. **Median:** Middle value when data is ordered. Useful for skewed data.
   *Example:* Scores = [70, 80, 90], Median = 80

3. **Mode:** Most frequent value. Useful for categorical data.
   *Example:* Colors = [Red, Blue, Blue, Green], Mode = Blue
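
The three measures above can be reproduced with Python's standard `statistics` module. This is a minimal sketch using the example values from the text; the `skewed` list and its extreme value are hypothetical, added only to illustrate why the median is preferred for skewed data.

```python
import statistics

scores = [80, 90, 70]
colors = ["Red", "Blue", "Blue", "Green"]

mean_score = statistics.mean(scores)      # (80 + 90 + 70) / 3
median_score = statistics.median(scores)  # middle value after sorting
mode_color = statistics.mode(colors)      # most frequent value

# Hypothetical skewed data: one extreme value pulls the mean upward,
# while the median stays close to the typical scores.
skewed = [70, 80, 90, 1000]
skewed_mean = statistics.mean(skewed)
skewed_median = statistics.median(skewed)

print(mean_score, median_score, mode_color)
print(skewed_mean, skewed_median)
```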

### **B. Measures of Dispersion**

These tell us how spread out the data is.

1. **Range:** Difference between the largest and smallest values.
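2. **Variance ($\sigma^2$):** Average squared deviation from the mean.
3. **Standard Deviation ($\sigma$):** Square root of the variance. Expresses dispersion in the original units of the data.

   $$
   \sigma = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n}}
   $$

   Here $\bar{x}$ is the mean and $n$ is the number of observations. Dividing by $n$ gives the population form; when estimating from a sample, the sum is usually divided by $n - 1$ instead.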

*Example:* If exam scores vary a lot among students, standard deviation will be high.
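
As a rough illustration, the dispersion measures can be computed with the standard `statistics` module. The scores reuse the earlier example; `pvariance` and `pstdev` divide by $n$ as in the formula above, while `variance` is shown for comparison because it divides by $n - 1$.

```python
import statistics

scores = [80, 90, 70]

value_range = max(scores) - min(scores)          # largest minus smallest
population_var = statistics.pvariance(scores)    # divides by n
population_sd = statistics.pstdev(scores)        # square root of the above
sample_var = statistics.variance(scores)         # divides by n - 1

print(f"range = {value_range}")
print(f"population variance = {population_var:.2f}")
print(f"population std dev  = {population_sd:.2f}")
print(f"sample variance     = {sample_var:.2f}")
```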

### **C. Other Descriptive Measures**

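* **Percentiles & Quartiles:** Divide ordered data into parts (e.g., the median is the 50th percentile and the second quartile).
* **Skewness:** Measures the asymmetry of a distribution. Positive skew means a longer right tail; negative skew means a longer left tail.
* **Kurtosis:** Measures how heavy the tails of a distribution are, often loosely described as how “peaked” it is.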
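
A small sketch of these measures, assuming the simple moment-based definitions of skewness and excess kurtosis (statistical packages may apply bias corrections that give slightly different numbers). The `data` list and the helper names `central_moment`, `skewness`, and `excess_kurtosis` are hypothetical.

```python
import statistics

def central_moment(data, k):
    """Average of (x - mean)^k over the data."""
    m = statistics.mean(data)
    return sum((x - m) ** k for x in data) / len(data)

def skewness(data):
    """Third standardized moment: 0 for perfectly symmetric data."""
    return central_moment(data, 3) / central_moment(data, 2) ** 1.5

def excess_kurtosis(data):
    """Fourth standardized moment minus 3 (the value for a normal distribution)."""
    return central_moment(data, 4) / central_moment(data, 2) ** 2 - 3

data = [2, 4, 4, 5, 7, 9, 25]  # hypothetical values with one large observation

# Three cut points that split the ordered data into four parts;
# the middle one is the median.
quartiles = statistics.quantiles(data, n=4)

print("quartiles:", quartiles)
print("skewness:", round(skewness(data), 3))        # positive: long right tail
print("excess kurtosis:", round(excess_kurtosis(data), 3))
```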

---

## **4. Probability and Probability Distributions**

Statistics is rooted in **probability**, which quantifies uncertainty.

* **Probability:** Likelihood of an event occurring, expressed as a number $P$ with $0 \leq P \leq 1$.
  *Example:* Probability of rolling a 3 on a fair six-sided die = 1/6.
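
One way to build intuition for the die example is a quick simulation: over many rolls, the observed share of 3s should settle near 1/6. This is only a sketch; the seed and the number of rolls are arbitrary choices.

```python
import random

rng = random.Random(0)  # fixed seed so the run is repeatable
n_rolls = 100_000

# Simulate rolling a fair six-sided die n_rolls times.
rolls = [rng.randint(1, 6) for _ in range(n_rolls)]

# Relative frequency of the event "rolled a 3".
estimate = rolls.count(3) / n_rolls

print(f"simulated P(roll = 3) = {estimate:.4f}, exact = {1 / 6:.4f}")
```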

### **A. Common Probability Distributions**

1. **Discrete Distributions:**

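   * **Binomial:** Number of successes in a fixed number of independent success/failure trials.
   * **Poisson:** Number of events occurring in a fixed interval of time or space, often used for rare events.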

2. **Continuous Distributions:**

   * **Normal Distribution (Bell curve):** Symmetrical; mean = median = mode.
   * **Uniform, Exponential, etc.**

**Use in statistics:** Understanding distributions allows us to calculate probabilities and apply inferential methods.
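
To make "calculate probabilities" concrete, the sketch below evaluates the binomial and Poisson probability mass functions from their textbook formulas and uses `statistics.NormalDist` for the normal distribution. The helper names `binomial_pmf` and `poisson_pmf` and the scenario values are hypothetical.

```python
import math
from statistics import NormalDist

def binomial_pmf(k, n, p):
    """P(exactly k successes in n independent trials with success probability p)."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

def poisson_pmf(k, lam):
    """P(exactly k events in an interval where lam events occur on average)."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

# Exactly two 3s in ten rolls of a fair die.
p_two_threes = binomial_pmf(2, 10, 1 / 6)

# No events in an interval that averages two events.
p_no_events = poisson_pmf(0, 2)

# Share of a normal distribution within one standard deviation of the mean.
standard_normal = NormalDist(mu=0, sigma=1)
within_one_sd = standard_normal.cdf(1) - standard_normal.cdf(-1)

print(f"binomial: {p_two_threes:.4f}")
print(f"poisson:  {p_no_events:.4f}")
print(f"normal within 1 sd: {within_one_sd:.4f}")
```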

---

## **5. Inferential Statistics**

While descriptive statistics summarizes data, **inferential statistics** lets us **make decisions or predictions**.

### **A. Sampling**

We often study a **sample** instead of the entire population.

* **Population:** Entire group we want to study.
* **Sample:** Subset of the population.
* **Sampling Methods:** Simple random, stratified, cluster, systematic.
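
The four sampling methods can be sketched with the standard `random` module. Everything below is illustrative: the population of 1,000 customer IDs, the two regions used as strata, and the cluster size of 50 are assumptions, not part of the original material.

```python
import random

rng = random.Random(42)            # fixed seed for repeatability
population = list(range(1, 1001))  # hypothetical customer IDs
sample_size = 100

# Simple random: every member has the same chance of selection.
simple = rng.sample(population, sample_size)

# Systematic: every k-th member, starting from a random offset.
k = len(population) // sample_size
systematic = population[rng.randrange(k)::k]

# Stratified: sample each stratum in proportion to its size.
strata = {"north": population[:600], "south": population[600:]}
stratified = []
for members in strata.values():
    share = round(sample_size * len(members) / len(population))
    stratified.extend(rng.sample(members, share))

# Cluster: pick whole groups at random and keep everyone in them.
clusters = [population[i:i + 50] for i in range(0, len(population), 50)]
cluster_sample = [member for group in rng.sample(clusters, 2) for member in group]

print(len(simple), len(systematic), len(stratified), len(cluster_sample))
```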

### **B. Estimation**

* **Point Estimate:** Single value (e.g., sample mean).
* **Interval Estimate (Confidence Interval):** A range of values that, at a stated confidence level (e.g., 95%), is expected to contain the population parameter.
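
As a sketch, a confidence interval for a mean can be built as the sample mean plus or minus a margin of error. The version below assumes a normal approximation, which is reasonable for larger samples; for small samples a t-distribution critical value is the usual choice. The function name `confidence_interval` and the simulated satisfaction scores are hypothetical.

```python
import math
import random
from statistics import NormalDist, mean, stdev

def confidence_interval(sample, confidence=0.95):
    """Normal-approximation confidence interval for the population mean."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    center = mean(sample)                            # the point estimate
    margin = z * stdev(sample) / math.sqrt(len(sample))
    return center - margin, center + margin

# Hypothetical sample: 200 simulated satisfaction scores.
rng = random.Random(1)
sample = [rng.gauss(7.5, 1.2) for _ in range(200)]

low, high = confidence_interval(sample)
print(f"point estimate = {mean(sample):.2f}")
print(f"95% interval   = ({low:.2f}, {high:.2f})")
```

Raising the confidence level (for example to 0.99) widens the interval, because a larger critical value is needed to cover more of the sampling distribution.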

### **C. Hypothesis Testing**

* **Null Hypothesis ($H_0$):** No effect or difference.
* **Alternative Hypothesis ($H_1$):** There is an effect or difference.
* **Steps:**

  1. Formulate the hypotheses.
  2. Choose a significance level ($\alpha$, often 0.05).
  3. Compute the test statistic (e.g., from a t-test or z-test) and its p-value.
  4. Make a decision: reject $H_0$ if the p-value is below $\alpha$; otherwise fail to reject $H_0$.

**Example:** Testing if a new teaching method improves scores.
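
The steps can be walked through for the teaching example with a two-sample z-test. This is a sketch under stated assumptions: the scores are invented, the function name `two_sample_z_test` is hypothetical, and the normal approximation is used only to keep the example free of external libraries. With ten scores per group, a t-test would normally be the better choice.

```python
import math
from statistics import NormalDist, mean, variance

def two_sample_z_test(control, treatment):
    """One-sided test of H1: mean(treatment) > mean(control).

    Uses a normal approximation to the difference in sample means.
    """
    se = math.sqrt(variance(control) / len(control)
                   + variance(treatment) / len(treatment))
    z = (mean(treatment) - mean(control)) / se
    p_value = 1 - NormalDist().cdf(z)  # chance of a z this large if H0 is true
    return z, p_value

# Step 1: H0 = no improvement, H1 = the new method raises scores.
old_method = [70, 72, 68, 75, 71, 69, 73, 74, 70, 72]  # hypothetical scores
new_method = [78, 80, 77, 82, 79, 81, 76, 83, 80, 79]  # hypothetical scores

# Step 2: significance level.
alpha = 0.05

# Step 3: test statistic and p-value.
z, p_value = two_sample_z_test(old_method, new_method)

# Step 4: decision.
decision = "reject H0" if p_value < alpha else "fail to reject H0"
print(f"z = {z:.2f}, p = {p_value:.4g}, decision: {decision}")
```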

### **D. Correlation & Regression**

* **Correlation:** Measures the strength and direction of the linear relationship between two variables, on a scale from $-1$ to $1$.
* **Regression:** Predicts one variable from another; simple linear regression does this by fitting a straight line.
  *Example:* Predict sales based on advertising spend.
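
Both ideas can be sketched from their textbook formulas: the Pearson correlation coefficient and the least-squares line. The advertising and sales figures are hypothetical, as are the helper names `pearson_r` and `fit_line`.

```python
import math

def _sums(x, y):
    """Sums of squares and cross-products around the means."""
    x_bar = sum(x) / len(x)
    y_bar = sum(y) / len(y)
    sxx = sum((a - x_bar) ** 2 for a in x)
    syy = sum((b - y_bar) ** 2 for b in y)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    return x_bar, y_bar, sxx, syy, sxy

def pearson_r(x, y):
    """Pearson correlation coefficient, between -1 and 1."""
    _, _, sxx, syy, sxy = _sums(x, y)
    return sxy / math.sqrt(sxx * syy)

def fit_line(x, y):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    x_bar, y_bar, sxx, _, sxy = _sums(x, y)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

ad_spend = [10, 20, 30, 40, 50]  # hypothetical advertising spend
sales = [25, 41, 62, 79, 103]    # hypothetical sales

r = pearson_r(ad_spend, sales)
slope, intercept = fit_line(ad_spend, sales)
predicted = intercept + slope * 60  # predicted sales at a spend of 60

print(f"r = {r:.3f}, slope = {slope:.2f}, intercept = {intercept:.2f}")
print(f"predicted sales at spend 60: {predicted:.1f}")
```

A high correlation here says only that the two series move together in a straight-line pattern; it does not by itself show that advertising causes the sales.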

---

## **6. Common Mistakes in Statistics**

* Confusing **correlation with causation**.
* Ignoring **outliers** or skewed data.
* Using the **wrong statistical test** for the data type.
* Overgeneralizing **sample results to the population** without checking assumptions.

---

## **7. Why Statistics is Important in Data Analytics**

* Summarizes and interprets large datasets.
* Helps make evidence-based decisions.
* Provides a foundation for **machine learning** and predictive modeling.
* Quantifies uncertainty, risk, and variability.

---

## **8. Quick Visual Summary**

| Topic                    | Purpose                           | Example                       |
| ------------------------ | --------------------------------- | ----------------------------- |
| Descriptive Stats        | Summarize data                    | Mean, Median, Std Dev         |
| Probability              | Quantify uncertainty              | Dice rolls                    |
| Inferential Stats        | Make predictions about population | Confidence intervals          |
| Hypothesis Testing       | Test a claim about a population   | Does teaching improve scores? |
| Correlation & Regression | Find relationships/predictions    | Sales vs Ads                  |
| Sampling                 | Study subset of population        | Survey 100 customers          |

---


