# Statistics

* Statistics is the science of **collecting, organizing, analyzing, interpreting,** and **presenting data** to make **informed decisions**.
* Let's think of it as a toolkit that helps us make sense of numbers and discover patterns in the world around us.

**There are two main branches**
1.  **Descriptive Statistics**
*   Deals with summarizing, organizing, and presenting data we already have.
*   Uses measures like mean (average), median, mode, range, variance, and standard deviation to describe a dataset.
*   Also involves graphs, charts, and tables for easy visualization.
*   For example, calculating the average coding hours of AI engineering students.

2.  **Inferential Statistics**
*   Uses data from a sample to make conclusions or predictions about a larger population.
*   Involves methods like correlation, regression, hypothesis testing, chi-square tests, t-tests, and ANOVA.
*   Helps in data-driven decision making by testing ideas and estimating outcomes with a level of certainty.
*   Foundation of predictive analysis, guiding policymakers and organizations in making strategic decisions.
*   For example, using a survey of 1,000 people to predict how millions will vote in a presidential election.
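
To make the two branches concrete, here is a minimal sketch. The coding hours and the 52% survey share are made-up values for illustration only, and the interval uses the normal approximation, which assumes a simple random sample.

```python
import numpy as np
from scipy import stats

# --- Descriptive statistics: summarize data we already have ---
# Hypothetical weekly coding hours for ten students (made-up values)
coding_hours = np.array([20, 22, 25, 25, 27, 28, 30, 31, 25, 40])

mean_hours = coding_hours.mean()
median_hours = np.median(coding_hours)

# Mode: the most frequent value
values, counts = np.unique(coding_hours, return_counts=True)
mode_hours = values[counts.argmax()]

range_hours = coding_hours.max() - coding_hours.min()
variance_hours = coding_hours.var(ddof=1)  # sample variance
std_hours = coding_hours.std(ddof=1)       # sample standard deviation

print(f"mean={mean_hours:.1f}, median={median_hours:.1f}, mode={mode_hours}")
print(f"range={range_hours}, variance={variance_hours:.2f}, std={std_hours:.2f}")

# --- Inferential statistics: generalize from a sample to a population ---
# Hypothetical survey: 1,000 respondents, 52% favor candidate A
n = 1000
p_hat = 0.52

# 95% confidence interval for the population share (normal approximation)
standard_error = np.sqrt(p_hat * (1 - p_hat) / n)
z = stats.norm.ppf(0.975)
margin = z * standard_error
ci_low, ci_high = p_hat - margin, p_hat + margin

print(f"95% CI for the share: {ci_low:.3f} to {ci_high:.3f}")
```

The first half only describes the ten students we measured. The second half says something about millions of voters from 1,000 responses, which is why it comes with a margin of error.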

**Why Statistics**
*   Beyond the confusing formulas and outrageous numbers, statistics will help us answer questions like:
    -   What's typical? (measures of center)
    -   How much variation is there? (measures of spread)
    -   Is this pattern real or just coincidence? (hypothesis testing)
    -   Can we predict future outcomes? (regression)

### Let's try to understand these concepts with code

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import scipy.stats as stats

In [None]:
# let's set random seed for reproducibility
np.random.seed(42)

In [None]:
# let's simulate a dataset for AI engineering students

# 1.    Traditional learning - classroom based
# let's draw 100 students with a mean of 25 hours/week and a standard deviation of 5 hours
traditional_study_hours = np.random.normal(25, 5, 100)

# 2.    Accelerated learning (project-based and hands-on style)

# this one will have a mean of 35 hours/week and a standard deviation of 8 hours
accelerated_study_hours = np.random.normal(35, 8, 100)

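# let's generate corresponding performance scores on a 0-100 scale

# note: these scores are drawn independently of study hours, so any
# correlation between hours and scores in this sample is due to chance;
# np.clip keeps the values inside the 0-100 range
traditional_scores = np.clip(np.random.normal(75, 12, 100), 0, 100)
accelerated_scores = np.clip(np.random.normal(82, 15, 100), 0, 100)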

# let's generate project completion counts (Poisson-distributed, averaging 8 and 12 projects)

traditional_projects = np.random.poisson(8, 100)
accelerated_projects = np.random.poisson(12, 100)
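
The scores in the cell above are drawn without reference to study hours. If we want performance to actually track study hours, one option is to build each score from the hours plus random noise. The sketch below is only an illustration: the slope of 1 point per extra hour and the noise level of 8 points are assumed values, and it uses its own generator (`rng`) so the notebook's `np.random` state is left untouched.

```python
import numpy as np
from scipy import stats

# Separate generator, so the seeded np.random stream above is not disturbed
rng = np.random.default_rng(42)

# Same shape as the traditional group: mean 25 hours/week, std 5, 100 students
study_hours = rng.normal(25, 5, 100)

# Assumed model: 75 points at 25 hours, +1 point per extra hour, plus noise
noise = rng.normal(0, 8, 100)
scores = np.clip(75 + 1.0 * (study_hours - 25) + noise, 0, 100)

# Check that hours and scores are now positively correlated
r, p_value = stats.pearsonr(study_hours, scores)
print(f"Pearson r = {r:.2f}, p-value = {p_value:.4f}")
```

A larger noise standard deviation weakens the correlation, and a larger slope strengthens it.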