### Explaining Inferential Statistics

First, what is `inference`? It is the process of drawing conclusions from data.

Second, what does `inferential` mean in statistics? It refers to drawing conclusions about a population based on a sample.

Third, what is a `population`? It is the entire group of items from which the sample is drawn. The `sample` is a part of the `population`, like `the selected few`.

So `inferential statistics` is the process of drawing conclusions about a `population` based on a `sample`.
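To make the population/sample distinction concrete, here is a minimal sketch. The population is simulated, and its size, mean, and spread are assumptions chosen only for illustration; in practice the population is usually too large to observe in full.

```python
import numpy as np

# Hypothetical population: 10,000 simulated daily temperatures (°F).
# The normal distribution and its parameters are assumptions for illustration.
rng = np.random.default_rng(seed=0)
population = rng.normal(loc=75.0, scale=3.0, size=10_000)

# Draw a random sample of 30 values without replacement ("the selected few").
sample = rng.choice(population, size=30, replace=False)

# The sample mean serves as an estimate of the population mean.
print("population mean:", round(population.mean(), 2))
print("sample mean:    ", round(sample.mean(), 2))
```

The two means will be close but not identical; inferential statistics is about quantifying how much trust to place in the sample-based estimate.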


### One Technique of Inferential Statistics: Hypothesis Testing

A hypothesis is a statement that you make about the world.
Hypothesis testing is a statistical technique used to make inferences about a `population` based on a `sample` of data. It involves testing a hypothesis about a population parameter, such as a mean or proportion, to decide whether the sample data are consistent with it.


Imagine you're a meteorologist, and you want to predict the temperature for a specific region over the next few days. You have a dataset of daily temperatures for the past month, and you want to make a prediction about the future temperatures.

Population: All daily temperatures for the region, including the days you have not observed yet, make up the population you're interested in.

Sample: The past month's data is the sample you have.

Inferential Statistics: You use the sample data to make an educated guess about the population, which includes the future temperatures.

### Example

Let's say you want to predict the average temperature for the next 5 days. You have data from the past 30 days, and the average temperature for those days is 75°F. You want to know if the average temperature for the next 5 days will be higher or lower than 75°F.

Null Hypothesis (H0): The average temperature for the next 5 days will be equal to 75°F.

Alternative Hypothesis (H1): The average temperature for the next 5 days will be different from 75°F.

Test Statistic: You calculate a test statistic from the sample data. It measures how far the sample mean is from the value stated in the null hypothesis (75°F), relative to the variation you would expect from sampling alone.

P-Value: You calculate the p-value, which is the probability of observing a test statistic at least as extreme as the one you computed, assuming that the null hypothesis is true.

Decision: If the p-value is below a chosen significance level (commonly 0.05), you reject the null hypothesis and treat the data as evidence in favor of the alternative hypothesis. If it is not, you fail to reject the null hypothesis, which is not the same as proving it true.
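The steps above can be sketched in code. This is a minimal illustration of a two-sided one-sample t-test, not a definitive implementation: the function name `one_sample_t_test`, the `alpha` parameter, and the `readings` values are all hypothetical and introduced here only for the example.

```python
import numpy as np
from scipy import stats

def one_sample_t_test(sample, hypothesized_mean, alpha=0.05):
    """Two-sided one-sample t-test; returns (t_stat, p_value, reject_h0)."""
    sample = np.asarray(sample, dtype=float)
    n = sample.size
    # Standard error of the mean, using the sample standard deviation (ddof=1).
    standard_error = sample.std(ddof=1) / np.sqrt(n)
    # Test statistic: distance from the hypothesized mean in standard errors.
    t_stat = (sample.mean() - hypothesized_mean) / standard_error
    # Two-sided p-value from the t distribution with n - 1 degrees of freedom.
    p_value = 2 * stats.t.sf(abs(t_stat), df=n - 1)
    # Decision: reject H0 when the p-value falls below the significance level.
    return t_stat, p_value, p_value < alpha

# Hypothetical readings (°F), made up for illustration.
readings = [76.1, 74.3, 77.8, 75.2, 78.4, 73.9, 76.7, 77.0]
t_stat, p_value, reject_h0 = one_sample_t_test(readings, hypothesized_mean=75.0)
print(f"t = {t_stat:.3f}, p = {p_value:.3f}, reject H0: {reject_h0}")
```

Writing the calculation out by hand makes each step visible; `scipy.stats.ttest_1samp` performs the same computation in a single call.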

Let's say you conduct a hypothesis test to determine if the average temperature for the next 5 days will be different from 75°F. You calculate the test statistic and p-value, and you get:

Test Statistic: 2.1

P-Value: 0.01

Since the p-value is below 0.05, you reject the null hypothesis. The data provide evidence that the average temperature for the next 5 days differs from 75°F; they do not prove it.

### Interpretation

The test suggests that the average temperature for the next 5 days is unlikely to be exactly 75°F. As a meteorologist, you can use this result, together with the sample mean itself, to inform the weather forecast for the next few days.

By using inferential statistics, you've made a data-driven decision that can help you predict the weather more accurately.

### Illustration

```python
import numpy as np
from scipy.stats import ttest_1samp

# Dataset of 30 daily temperatures (°F)
dataset = np.array([75, 78, 72, 80, 76, 74, 77, 79, 73, 81, 75, 78, 72, 80, 76, 74, 77, 79, 73, 81, 75, 78, 72, 80, 76, 74, 77, 79, 73, 81])

# Hypothesized population mean under the null hypothesis
population_mean = 75

# Calculate the sample mean
sample_mean = dataset.mean()

# Calculate the sample standard deviation
# (ddof=1 divides by n - 1, which matches what ttest_1samp uses)
sample_std = dataset.std(ddof=1)

# Calculate the t-statistic: distance from the hypothesized mean in standard errors
t_stat = (sample_mean - population_mean) / (sample_std / np.sqrt(len(dataset)))

# Calculate the two-sided p-value
p_value = ttest_1samp(dataset, population_mean)[1]

print(f"t-statistic: {t_stat:.4f}")
print(f"p-value: {p_value:.4f}")
```

```
t-statistic: 2.8123
p-value: 0.0087
```


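The t-statistic is not a temperature. It expresses the gap between the sample mean (`76.5°F`) and the hypothesized mean (`75°F`) in units of the standard error. Here the sample mean is `1.5°F` above `75°F`, which is almost three standard errors.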
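As a worked check, the t-statistic can be computed by hand. The symbols $\bar{x}$ (sample mean), $s$ (sample standard deviation), $\mu_0$ (hypothesized mean), and $n$ (sample size) are notation introduced here for the illustration. This assumes $s$ uses the $n - 1$ denominator, which is what `ttest_1samp` uses; the squared deviations from the mean in this dataset sum to 247.5.

$$
\bar{x} = 76.5, \qquad s = \sqrt{\frac{247.5}{29}} \approx 2.921, \qquad n = 30, \qquad \mu_0 = 75
$$

$$
t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}} = \frac{76.5 - 75}{2.921 / \sqrt{30}} \approx \frac{1.5}{0.533} \approx 2.81
$$

Dividing by $n$ instead, which is the default for NumPy's `std()`, gives $s \approx 2.872$ and $t \approx 2.86$. That slightly larger value does not correspond to the p-value reported by `ttest_1samp`.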

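Since the p-value is below 0.05, you reject the null hypothesis at the 5% significance level. The data provide evidence that the underlying average temperature differs from `75°F`. Strictly, this is a statement about the process that generated the 30 observed days; it carries over to the next 5 days only if they follow the same pattern.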