## Day 1: Introduction to Statistics
##### What is Statistics?


##### Statistics is the science of collecting, analyzing, interpreting, and presenting data. It helps us make sense of large amounts of information by summarizing and drawing conclusions from it.

##### - Statistics is used in virtually every field—science, business, health, and daily life—to understand patterns and make informed decisions.

##### Example
##### If you survey your classmates about favorite subjects, statistics helps you summarize the results and look for trends.
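
A minimal sketch of that survey idea, using only the Python standard library. The responses below are made up for illustration; they are not real survey data.

```python
from collections import Counter

# Hypothetical survey responses (invented for this example)
responses = ["Math", "Art", "Math", "Science", "Art", "Math", "History"]

counts = Counter(responses)   # tally each subject
total = len(responses)

# Print subjects from most to least popular, with their share of the votes
for subject, count in counts.most_common():
    print(f"{subject}: {count} ({count / total:.0%})")

top_subject, top_count = counts.most_common(1)[0]
print("Most popular:", top_subject)
```

Counting and ranking like this is descriptive: it only summarizes the answers that were actually collected.
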
##### Descriptive vs. Inferential Statistics
| Type | Description | Example |
|-----------------|-----------------|-----------------|
| Descriptive     | Summarizes or describes features of a dataset     | Mean, median, mode, range, graphs     |
| Inferential     | Makes predictions or generalizations about a population based on a sample     | Estimating average height of all students from a small group     |



##### - Descriptive statistics: Answers, "What does my data show?"

##### - Inferential statistics: Answers, "What can I predict or infer about a larger group from my data?"

## Day 2: Data & Measurement Scales

##### Data Types: Qualitative vs. Quantitative


| Type | Definition | Examples |
|-----------------|-----------------|-----------------|
| Qualitative/Categorical    | Describes qualities, categories, or labels    | Colors, gender, letter grades     |
| Quantitative/Numerical     | Based on numbers or measurable quantities     | Height, weight, age     |



##### - Qualitative data is grouped by name or label.

##### - Quantitative data is measured numerically.


##### Scales of Measurement
1. Nominal:

    - Labels or names only

    - Example: Types of fruits (apple, orange, banana)

2. Ordinal:

    - Ordered categories, but differences between values aren’t meaningful

    - Example: Survey rankings (happy, neutral, sad)

3. Interval:

    - Order matters, differences are meaningful, but no true zero

    - Example: Temperature in Celsius (20°C, 30°C)

4. Ratio:

    - Like interval, plus a true zero exists

    - Example: Height (0 cm means no height)



| **Scale**   | **Can Be Sorted?** | **Differences Meaningful?** | **True Zero?** | **Example**             |
|-------------|--------------------|-----------------------------|----------------|-------------------------|
| Nominal     | No                 | No                          | No             | Eye color               |
| Ordinal     | Yes                | No                          | No             | Movie ratings (1–5 stars) |
| Interval    | Yes                | Yes                         | No             | Calendar years          |
| Ratio       | Yes                | Yes                         | Yes            | Salary                  |


##### Quick Review:

- Nominal: Just labels

- Ordinal: Labels with order

- Interval: Numbers with meaningful differences, no true zero

- Ratio: Numbers with order, meaningful differences, and a true zero
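
The following sketch illustrates two of these distinctions, reusing the survey and temperature examples from above. The variable names are illustrative only. It shows that an ordinal scale needs its order stated explicitly, and that ratios on an interval scale (Celsius) are not meaningful, while ratios on a ratio scale (Kelvin) are.

```python
# Ordinal: categories have an order, but it has to be declared explicitly.
order = ["sad", "neutral", "happy"]           # lowest to highest
responses = ["happy", "sad", "neutral", "happy"]
sorted_responses = sorted(responses, key=order.index)
print("Sorted responses:", sorted_responses)

# Interval: differences are meaningful, ratios are not.
c_low, c_high = 20.0, 30.0
print("Difference in Celsius:", c_high - c_low)    # a 10-degree gap is meaningful

naive_ratio = c_high / c_low                       # 1.5, but 30°C is not "1.5x as hot"
kelvin_ratio = (c_high + 273.15) / (c_low + 273.15)  # Kelvin has a true zero
print("Naive Celsius ratio:", naive_ratio)
print("Ratio on the Kelvin scale:", round(kelvin_ratio, 3))
```

The Celsius ratio changes if you switch units, which is a sign that the scale has no true zero.
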

## Day 3: Population & Sampling

#### 1. Population vs. Sample

##### Population:
        The entire group of individuals or items you want to study.

##### Sample:
        A subset of the population, selected for actual analysis. We use samples to make inferences about populations.

##### Example:

        If you're interested in the average height of all students at a university (the population), but you only measure 100 students (the sample), you’ll use the sample to estimate the characteristics of the whole group.
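
As a rough sketch of this example, the code below simulates a population of student heights and compares the sample mean with the population mean. The population is generated from a normal distribution with an assumed mean of 170 cm and standard deviation of 10 cm; these numbers are illustrative assumptions, not real measurements.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

# Simulated population: heights (cm) of 10,000 students (assumed distribution)
population = [random.gauss(170, 10) for _ in range(10_000)]

# Measure only 100 students, chosen at random without replacement
sample = random.sample(population, 100)

pop_mean = statistics.mean(population)
sample_mean = statistics.mean(sample)

print(f"Population mean: {pop_mean:.2f} cm")
print(f"Sample mean:     {sample_mean:.2f} cm")
```

In practice the population mean is unknown; the simulation makes it available only so the two values can be compared.
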

#### 2. Sampling Methods
##### a. Simple Random Sampling
- Definition: Every member of the population has an equal chance of being selected.

##### Example:



In [1]:
import numpy as np
students = np.arange(1, 1001)  # Student IDs 1 to 1000
np.random.seed(42)
sample = np.random.choice(students, size=100, replace=False)
print("Sampled student IDs:", sample[:10])

Sampled student IDs: [522 738 741 661 412 679 627 514 860 137]


##### b. Stratified Sampling
- Definition: The population is divided into subgroups (strata) and random samples are taken from each stratum. This ensures representation of each stratum.

##### Example:



In [2]:
import pandas as pd
import numpy as np
# Suppose we have 1000 students with gender labels (M/F)
students = pd.DataFrame({
    'id': np.arange(1, 1001),
    'gender': np.random.choice(['M','F'], size=1000)
})
# Take 10 males and 10 females
male_sample = students[students['gender'] == 'M'].sample(10, random_state=42)
female_sample = students[students['gender'] == 'F'].sample(10, random_state=42)
stratified_sample = pd.concat([male_sample, female_sample])
print(stratified_sample.head())

      id gender
588  589      M
829  830      M
11    12      M
347  348      M
142  143      M


##### c. Cluster Sampling
- Definition: The population is divided into clusters (often based on geography or another natural grouping), a few clusters are randomly selected, and all members of those clusters are included.

##### Example:

In [3]:
# Suppose students are grouped in 10 dorms
students['dorm'] = np.random.choice(['Dorm_A','Dorm_B','Dorm_C','Dorm_D','Dorm_E','Dorm_F','Dorm_G','Dorm_H','Dorm_I','Dorm_J'], size=1000)
# Randomly pick 2 dorms
selected_dorms = np.random.choice(students['dorm'].unique(), size=2, replace=False)
cluster_sample = students[students['dorm'].isin(selected_dorms)]
print('Selected dorms:', selected_dorms)
print(cluster_sample.head())

Selected dorms: ['Dorm_C' 'Dorm_G']
    id gender    dorm
0    1      F  Dorm_C
2    3      M  Dorm_G
8    9      M  Dorm_C
10  11      M  Dorm_G
11  12      M  Dorm_G


## Day 4: Central Tendency & Dispersion

#### 1. Measures of Central Tendency

&nbsp;&nbsp; These tell us where the "center" or typical value of a dataset lies.




##### Mean
        The arithmetic average: sum all values, divide by the count.



##### Example:



In [4]:
import numpy as np
scores = np.array([55, 72, 89, 65, 83]).astype(float)
mean_score = scores.mean()
print("Mean:", mean_score)

Mean: 72.8


##### Median
        The middle value of the sorted data (or the average of the two middle values when the number of values is even).

##### Example:



In [5]:
median_score = np.median(scores)
print("Median:", median_score)

Median: 72.0


##### Mode
        The value that occurs most often.
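
A short sketch of the mode using the standard library's `statistics` module. The `scores` array used above has no repeated values, so this example uses a separate, made-up list.

```python
import statistics

# Hypothetical quiz scores with repeated values (invented for this example)
quiz_scores = [55, 72, 72, 89, 65, 72, 89]
mode_score = statistics.mode(quiz_scores)
print("Mode:", mode_score)

# When several values tie for most frequent, multimode returns all of them
tied = [1, 1, 2, 2, 3]
all_modes = statistics.multimode(tied)
print("Modes:", all_modes)
```
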

#### 2. Measures of Spread (Dispersion)
&nbsp;&nbsp;These describe how much the values vary.



##### Range
        The difference between the largest and smallest value.

In [6]:
range_score = scores.max() - scores.min()
print("Range:", range_score)

Range: 34.0


##### Variance
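        Average of the squared differences from the mean. NumPy's var() divides by n by default (population variance); pass ddof=1 to divide by n - 1 (sample variance).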

##### Example:

In [7]:
variance_score = scores.var()
print("Variance:", variance_score)

Variance: 148.96
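
The value above is the population variance. As a sketch of the difference between the two conventions, the standard library's `statistics` module exposes both; the same scores are repeated here so the block runs on its own.

```python
import statistics

scores = [55, 72, 89, 65, 83]

# Population variance: divide the sum of squared deviations by n
pop_var = statistics.pvariance(scores)

# Sample variance: divide by n - 1 (equivalent to ddof=1 in NumPy)
sample_var = statistics.variance(scores)

print("Population variance:", pop_var)
print("Sample variance:", sample_var)
```

The sample version is the usual choice when the data are a sample and the goal is to estimate the variance of a larger population.
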


##### Standard Deviation
        The square root of the variance; it measures the typical distance of values from the mean, in the same units as the data.

In [None]:
std_score = scores.std()
print("Standard Deviation:", std_score)
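
One way to read a standard deviation is to count how many values fall within one standard deviation of the mean. The sketch below does this with the standard library, using the population convention (dividing by n) to match NumPy's default.

```python
import math
import statistics

scores = [55, 72, 89, 65, 83]

mean = statistics.mean(scores)
std = statistics.pstdev(scores)  # population standard deviation

# The standard deviation is the square root of the variance
assert math.isclose(std, math.sqrt(statistics.pvariance(scores)))

# Values no further than one standard deviation from the mean
within_one_std = [s for s in scores if abs(s - mean) <= std]

print(f"Mean: {mean}, Standard deviation: {std:.2f}")
print("Within one standard deviation:", within_one_std)
```
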

## Day 5: Variables & Probability Basics

### 1. Variables in Statistics
&nbsp;Variables represent characteristics or properties that can be measured or categorized.

#### Independent Variable
        The variable you manipulate or categorize to see its effect on another variable.

##### Example:
        In a study of how study hours affect exam scores, “study hours” is the independent variable.


#### Dependent Variable
        The outcome variable that you measure, which is affected by independent variables.

##### Example:
        In the same study, “exam score” is the dependent variable.




#### Continuous Variable
        Can take on any value within a range—values aren’t fixed to distinct steps.

##### Example:
        Height, weight, temperature.


#### Discrete Variable
        Can only take specific, separate values (often counts).

##### Example:
        Number of students in a class, number of cars in a parking lot.



### 2. Basics of Probability

&nbsp;Probability quantifies how likely an event is to occur. Probabilities range from 0 (impossible) to 1 (certain).

##### Key Terms
- Experiment: An action with an uncertain outcome (e.g., tossing a coin).

- Sample Space: All possible outcomes (e.g., {Heads, Tails}).

- Event: A subset of the sample space (e.g., “getting Heads”).

![image.png](attachment:image.png)

##### Example:

- Tossing a fair coin:

    - Probability of Heads, P(Heads)=1/2
 
- Rolling a fair die:

    - Probability of getting a 4, P(4)=1/6
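
For equally likely outcomes, these probabilities are the number of favorable outcomes divided by the size of the sample space. A small sketch follows; the `probability` helper is a hypothetical function written for this example, not a library call.

```python
from fractions import Fraction

def probability(event, sample_space):
    """P(event) when every outcome in sample_space is equally likely."""
    favorable = [outcome for outcome in sample_space if outcome in event]
    return Fraction(len(favorable), len(sample_space))

coin = ["Heads", "Tails"]
die = [1, 2, 3, 4, 5, 6]

p_heads = probability({"Heads"}, coin)
p_four = probability({4}, die)
p_even = probability({2, 4, 6}, die)

print("P(Heads) =", p_heads)
print("P(4) =", p_four)
print("P(even) =", p_even)
```

Using `Fraction` keeps the results exact instead of rounding them to decimals.
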

##### Demo in Python:


In [8]:
# Simulate tossing a coin 1000 times
import numpy as np
coin_flips = np.random.choice(['Heads', 'Tails'], size=1000)
prob_heads = (coin_flips == 'Heads').mean()
print("Estimated P(Heads):", prob_heads)

Estimated P(Heads): 0.511
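
The estimate is close to, but not exactly, 0.5. As a sketch of why, the code below repeats the simulation with different numbers of flips using the standard library's `random` module; the estimate tends to settle near 0.5 as the number of flips grows. The seed and flip counts are arbitrary choices.

```python
import random

random.seed(1)  # fixed seed so the run is repeatable

def estimate_p_heads(n_flips):
    """Fraction of heads in n_flips simulated fair coin tosses."""
    heads = sum(random.random() < 0.5 for _ in range(n_flips))
    return heads / n_flips

estimates = {n: estimate_p_heads(n) for n in (10, 100, 1_000, 100_000)}

for n, estimate in estimates.items():
    print(f"{n:>7} flips: estimated P(Heads) = {estimate:.4f}")
```
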


## Day 6: Hypothesis Testing
Let’s review the foundation of hypothesis testing so you can interpret results and make data-driven decisions.

### 1. Null & Alternative Hypotheses
#### Null Hypothesis H0
    The default claim: typically, no effect, no difference, or the status quo.
#### Alternative Hypothesis H1 
    Contrasts H0, claiming there is an effect or a difference.
##### Example:
- H0: The two classes have the same mean exam score.
- H1: The two classes have different mean exam scores.


### 2. p-value & Significance Level
##### p-value
    The probability of observing data at least as extreme as your sample, assuming H0 is true.

##### Significance Level α
A threshold (commonly 0.05) for how much evidence you require to reject H0. If the p-value < α, you reject H0; otherwise, you fail to reject H0.

In [9]:
from scipy import stats
sample_a = [82, 79, 90, 75, 88]
sample_b = [85, 87, 92, 80, 90]
t_stat, p_value = stats.ttest_ind(sample_a, sample_b)
print("p-value:", p_value)

p-value: 0.28302029650333277
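
To connect this output to a decision, the sketch below recomputes the t statistic by hand and applies the α = 0.05 rule. It assumes the pooled (equal-variance) form of the two-sample t-test, which is what `stats.ttest_ind` uses by default. The p-value is copied from the output above rather than recomputed, because the standard library has no t-distribution.

```python
import math
import statistics

sample_a = [82, 79, 90, 75, 88]
sample_b = [85, 87, 92, 80, 90]
n_a, n_b = len(sample_a), len(sample_b)

# Pooled variance: weighted average of the two sample variances
pooled_var = (
    (n_a - 1) * statistics.variance(sample_a)
    + (n_b - 1) * statistics.variance(sample_b)
) / (n_a + n_b - 2)

# Standard error of the difference between the two means
std_err = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
t_stat = (statistics.mean(sample_a) - statistics.mean(sample_b)) / std_err

p_value = 0.28302029650333277  # copied from the scipy output above
alpha = 0.05
decision = "reject H0" if p_value < alpha else "fail to reject H0"

print(f"t statistic: {t_stat:.4f}")
print(f"p-value {p_value:.3f} vs alpha {alpha}: {decision}")
```

Failing to reject H0 does not show that the class means are equal; it only means these samples do not provide enough evidence of a difference.
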


### 3. Type I and Type II Errors

- Type I Error: Rejecting H0 when it is actually true ("false positive").

- Type II Error: Not rejecting H0 when it is actually false ("false negative").

|               | H₀ True               | H₀ False              |
|----------------|-----------------------|-----------------------|
| Reject H₀      | Type I Error          | Correct Decision      |
| Do Not Reject H₀ | Correct Decision     | Type II Error         |
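
A rough simulation can make the Type I error rate concrete: when H0 is true, a test at α = 0.05 should reject in about 5% of experiments. For simplicity, this sketch uses a two-sided z-test with a known standard deviation of 1 rather than the t-test shown earlier. The sample size, number of trials, and seed are arbitrary assumptions.

```python
import random
from statistics import NormalDist, mean

random.seed(0)  # fixed seed so the run is repeatable

alpha = 0.05
n, trials = 30, 2000
std_normal = NormalDist()  # standard normal distribution

rejections = 0
for _ in range(trials):
    # H0 is true here: the data really do come from a mean-0 distribution
    sample = [random.gauss(0, 1) for _ in range(n)]
    z = mean(sample) * n ** 0.5                  # z statistic with sigma = 1
    p_value = 2 * (1 - std_normal.cdf(abs(z)))   # two-sided p-value
    if p_value < alpha:
        rejections += 1                          # a Type I error

type_i_rate = rejections / trials
print(f"Rejected a true H0 in {type_i_rate:.1%} of {trials} experiments")
```

Lowering α reduces Type I errors but, for a fixed sample size, makes Type II errors more likely.
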
