# Probability and Statistics for Machine Learning: Statistical Inference and Estimation

## 9. Statistical Inference and Estimation


### What is Statistical Inference?

Statistical inference is the process of drawing conclusions about a population based on a sample of data. It involves using probability theory to estimate population parameters and make predictions.

There are two main types of statistical inference:
1. **Point Estimation**: Provides a single value (estimate) of a population parameter (e.g., mean or proportion).
2. **Interval Estimation**: Provides a range of values (a confidence interval) constructed to contain the population parameter at a stated confidence level.

### What is Estimation?

Estimation is the process of inferring the value of an unknown population parameter based on sample data. There are two types of estimators:
- **Point Estimator**: Estimates a single value for the parameter (e.g., sample mean as an estimate of the population mean).
- **Interval Estimator**: Estimates a range of values (e.g., confidence intervals).

### Example: Point Estimation

Suppose we want to estimate the population mean based on a sample of data.
    

In [None]:

# Example: Point Estimation of Population Mean
import numpy as np

sample_data = np.array([168, 172, 169, 171, 170, 173, 174])
# The sample mean is the point estimate of the population mean
sample_mean = np.mean(sample_data)
sample_mean
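
The same idea extends to other parameters. The following is a minimal sketch, not part of the original example, that computes point estimates of the variance and of a proportion from the same sample, along with the standard error of the mean. The threshold of 170 used for the proportion is a hypothetical value chosen only for illustration.

```python
import numpy as np

sample_data = np.array([168, 172, 169, 171, 170, 173, 174])
n = len(sample_data)

# Point estimate of the population mean
mean_hat = sample_data.mean()

# Point estimate of the population variance (ddof=1 gives the unbiased estimator)
var_hat = sample_data.var(ddof=1)

# Point estimate of a proportion: share of observations above a
# hypothetical threshold of 170 (an assumption made for this sketch)
prop_hat = np.mean(sample_data > 170)

# Standard error of the mean: how much mean_hat would vary across samples
std_error = sample_data.std(ddof=1) / np.sqrt(n)

print(mean_hat, var_hat, prop_hat, std_error)
```

A point estimate on its own says nothing about its precision; the standard error is the quantity that interval estimates build on.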
    


### Confidence Intervals

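A confidence interval is a range of values, computed from the sample, that is constructed to contain the population parameter at a stated confidence level. A 95% confidence level means that if the sampling were repeated many times, about 95% of the intervals built this way would contain the true parameter.

For a population mean with a known population standard deviation, the confidence interval is given by: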

\[
CI = ar{x} \pm Z_{lpha/2} \cdot rac{\sigma}{\sqrt{n}}
\]

Where:
- \( \bar{x} \) is the sample mean.
- \( Z_{\alpha/2} \) is the critical value from the standard normal distribution (about 1.96 for a 95% confidence level).
- \( \sigma \) is the population standard deviation, assumed known.
- \( n \) is the sample size.
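
As a rough sketch of how this formula translates to code, the snippet below computes a Z-based interval for the sample used earlier. The value `sigma = 2.0` is a hypothetical known population standard deviation, assumed here only so the formula can be applied directly.

```python
import numpy as np
from scipy import stats

sample_data = np.array([168, 172, 169, 171, 170, 173, 174])

sigma = 2.0   # hypothetical known population standard deviation (assumption)
alpha = 0.05  # 1 - confidence level, i.e. a 95% interval

x_bar = sample_data.mean()
n = len(sample_data)

# Critical value Z_{alpha/2} from the standard normal distribution
z_crit = stats.norm.ppf(1 - alpha / 2)

# Margin of error: Z_{alpha/2} * sigma / sqrt(n)
margin = z_crit * sigma / np.sqrt(n)

ci_lower, ci_upper = x_bar - margin, x_bar + margin
print(ci_lower, ci_upper)
```

In practice the population standard deviation is rarely known, which is why a t-based interval using the sample standard deviation is the more common choice for small samples.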

### Example: Confidence Interval

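We can calculate a 95% confidence interval for the population mean based on the sample data. Because the population standard deviation \( \sigma \) is unknown here and the sample is small, the code below estimates it with the sample standard deviation and uses the critical value from Student's t distribution with \( n - 1 \) degrees of freedom in place of \( Z_{\alpha/2} \).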
    

In [None]:

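# Example: Calculating 95% Confidence Interval
import numpy as np
import scipy.stats as stats

# Reuses sample_data and sample_mean from the point estimation example
n = len(sample_data)
confidence_level = 0.95
degrees_freedom = n - 1
sample_std = np.std(sample_data, ddof=1)  # sample standard deviation (n - 1 denominator)
standard_error = sample_std / np.sqrt(n)

# t-based interval, since the population standard deviation is unknown
confidence_interval = stats.t.interval(confidence_level, degrees_freedom, loc=sample_mean, scale=standard_error)
confidence_interval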
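
To illustrate what the 95% confidence level refers to, here is a small simulation sketch. It assumes a normal population with a hypothetical true mean of 171 and standard deviation of 2 (values chosen for illustration, not taken from real data), draws many samples of size 7, and counts how often the t-based interval contains the true mean.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical population parameters (assumptions for this simulation)
true_mean, true_std = 171.0, 2.0
n, n_trials, confidence_level = 7, 2000, 0.95

# Each row is one simulated sample of size n
samples = rng.normal(true_mean, true_std, size=(n_trials, n))

means = samples.mean(axis=1)
std_errors = samples.std(axis=1, ddof=1) / np.sqrt(n)

# Two-sided critical value from Student's t with n - 1 degrees of freedom
t_crit = stats.t.ppf(1 - (1 - confidence_level) / 2, df=n - 1)

lower = means - t_crit * std_errors
upper = means + t_crit * std_errors

# Fraction of intervals that contain the true mean
coverage = np.mean((lower <= true_mean) & (true_mean <= upper))
print(coverage)
```

The printed coverage should land close to 0.95. The confidence level describes the long-run behavior of the procedure rather than the probability that any single computed interval contains the parameter.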
    


### Applications in Machine Learning

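- **Point Estimation** is used to estimate population parameters such as means, proportions, and variances. A model's accuracy on a test set, for example, is a point estimate of its accuracy on the full data distribution.
- **Confidence Intervals** provide a range of values for population parameters and are used to assess the reliability of estimates, such as how far a reported metric might be from its true value.
- Statistical inference is used to generalize results from a sample (the data at hand) to a larger population and to assess the uncertainty of predictions.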
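
As one possible illustration, the sketch below builds a 95% confidence interval for a classifier's accuracy, treating accuracy as a proportion. The counts (870 correct out of 1000 test examples) are hypothetical, and the interval uses the normal approximation (Wald interval), which is a simplifying assumption that works best with large test sets and accuracies not too close to 0 or 1.

```python
import numpy as np
from scipy import stats

# Hypothetical evaluation results (assumptions for illustration)
n_correct = 870
n_test = 1000
confidence_level = 0.95

# Point estimate of the true accuracy
accuracy_hat = n_correct / n_test

# Standard error of a proportion: sqrt(p * (1 - p) / n)
std_error = np.sqrt(accuracy_hat * (1 - accuracy_hat) / n_test)

# Normal-approximation (Wald) interval
z_crit = stats.norm.ppf(1 - (1 - confidence_level) / 2)
margin = z_crit * std_error
acc_lower, acc_upper = accuracy_hat - margin, accuracy_hat + margin

print(accuracy_hat, acc_lower, acc_upper)
```

With these numbers the margin of error is roughly two percentage points, so two models whose test accuracies differ by less than that may not be meaningfully different.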

    