# Chapter 1: Descriptive Statistics and Inferential Statistics

Descriptive and inferential statistics are the two main areas of statistics. They can be defined as follows:

- Descriptive statistics provides tools to summarize and describe a sample, giving a clear picture of the data at hand.
- Inferential statistics allows us to draw broader conclusions and make predictions about an entire population based on insights drawn from the sample.

The goal of statistics is to make a statement about a population. In most cases, however, it is not possible to collect data on the whole population, so a representative sample is taken. This sample is then analyzed using descriptive statistics, which summarize key characteristics such as the mean and the variability within the sample. Describing the sample alone may not be sufficient to make a statement about the population as a whole. In these cases, inferential statistics is used to draw conclusions about the population based on the sample.
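The relationship between a population, a sample, and the two kinds of statistics can be illustrated with a minimal sketch. The population here is synthetic (hypothetical heights drawn from a normal distribution), and the seed, sizes, and variable names are assumptions chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)  # fixed seed, chosen arbitrarily for reproducibility

# Hypothetical population: 100,000 heights in cm (synthetic data, not from the text)
population = rng.normal(loc=170, scale=10, size=100_000)

# Draw a random sample of 500 individuals without replacement
sample = rng.choice(population, size=500, replace=False)

# Descriptive statistics: summarize the sample itself
sample_mean = sample.mean()
sample_std = sample.std(ddof=1)

# Inference: the sample mean serves as an estimate of the population mean,
# which in practice would be unknown
population_mean = population.mean()

print(f"sample mean:     {sample_mean:.2f}")
print(f"sample std:      {sample_std:.2f}")
print(f"population mean: {population_mean:.2f}")
```

In a real study the last line could not be computed, which is exactly why the sample estimate is needed.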



## Descriptive Statistic

Once we have the data, we can plot it and compute various descriptive statistics to understand it. These statistics fall into two groups:

- Location parameters:
    - **Mean:** The average value of the sample;
    - **Median:** The middle value of the sample;
    - **Mode:** The most common value in the sample;
    - **Sum:** The sum of all values in the sample;
- Dispersion parameters:
    - **Range:** The difference between the maximum and minimum values in the sample;
    - **Variance:** The average of the squared differences between each value in the sample and the mean;
    - **Standard deviation:** The square root of the variance;
    - **Coefficient of variation:** The standard deviation divided by the mean.
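The parameters listed above can be computed with NumPy, as in the following sketch. The sample values are hypothetical. The variance uses `ddof=0` so that it matches the definition given above (the plain average of the squared differences); passing `ddof=1` would give the unbiased sample variance instead.

```python
from collections import Counter

import numpy as np

# Hypothetical sample, chosen only for illustration
sample = np.array([2, 4, 4, 5, 7, 9, 11])

# Location parameters
mean = sample.mean()
median = np.median(sample)
mode = Counter(sample.tolist()).most_common(1)[0][0]  # most frequent value
total = sample.sum()

# Dispersion parameters
value_range = sample.max() - sample.min()
variance = sample.var(ddof=0)   # average squared deviation from the mean
std_dev = sample.std(ddof=0)    # square root of the variance
coef_var = std_dev / mean       # standard deviation relative to the mean

print(f"mean={mean}, median={median}, mode={mode}, sum={total}")
print(f"range={value_range}, variance={variance:.3f}, "
      f"std={std_dev:.3f}, cv={coef_var:.3f}")
```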



In [1]:
# Notebook configuration and plot settings
from IPython.display import Latex
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
def jupyter_settings():
    plt.style.use('bmh')
    plt.rcParams['figure.figsize'] = [25, 12]
    plt.rcParams['font.size'] = 24
    # display(HTML('<style>.container { width:100% !important; }</style>'))
    sns.set()

jupyter_settings()


## Probability sampling

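In probability sampling, members of the population are selected at random. The simplest form is to draw individuals at random from the population without replacement, as in the following example, which picks 3 individuals out of a population of 10.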

In [1]:
import numpy as np

# setup generator for reproducibility
random_generator = np.random.default_rng(2020)

population = np.arange(1, 10 + 1)
sample = random_generator.choice(
    population,     # sample from population
    size=3,         # number of samples to take
    replace=False   # only allow to sample individuals once
)

print(sample)
# array([1, 8, 5])

[1 8 5]


Four types of probability sampling can be used:

 - **Simple random sampling:** Every member of the population has an equal chance of being selected. We can use this method when all members of the population have similar properties. The example above is a simple random sample.

 - **Systematic sampling:** Members of the population are selected at a fixed sampling interval, starting from a randomly chosen point. For example, consider a class of 50 students with only 10 books to hand out. The sampling interval is the number of students divided by the number of books, in this case 50/10 = 5. If the random starting point is student 18, the students selected to receive the books are 18, 23, 28, 33, 38, 43, 48, 3, 8, 13 (wrapping around after student 50). The weakness of systematic sampling lies in the ordering of the population. If the initial order is related to a property of interest, for example if the students happen to be listed by their performance in math, the fixed interval can introduce a bias into the sample.

 - **Stratified sampling:** In this method, we divide the population into homogeneous subpopulations called **strata**. Each stratum groups members that share a distinct property, such as gender, age, or color. Within each stratum, every member has an equal chance of being selected.

In [2]:
# Stratified sampling example

population_Strata = [1, "A", 3, 4, 5, 2, "D", 8, "C", 7, 6, "B"]

# group the population into strata by type (numbers vs. strings)

strata = {'number': [],
          'string': []}

for item in population_Strata:
    if isinstance(item, int):
        strata['number'].append(item)
    else:
        strata['string'].append(item)

# fraction of population to sample
sample_fraction = 0.5

# random sample from strata
sample_strata = {}

for group in strata:
    sample_size = int(sample_fraction * len(strata[group]))

    sample_strata[group] = random_generator.choice(
        strata[group],
        size=sample_size,
        replace=False
    )

print(sample_strata)

{'number': array([2, 8, 5, 1]), 'string': array(['D', 'C'], dtype='<U1')}
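The systematic sampling scheme described in the list above (50 students, 10 books) can be sketched in a similar way. The helper `systematic_sample` is a hypothetical function written for this illustration, and numbering the students from 1 to 50 is an assumption.

```python
import numpy as np


def systematic_sample(population_size, sample_size, start):
    """Return 1-based member IDs picked at a fixed interval from `start`,
    wrapping around to the beginning once the end of the list is reached."""
    interval = population_size // sample_size
    return [
        (start - 1 + k * interval) % population_size + 1
        for k in range(sample_size)
    ]


# Random starting point between 1 and 50 (the upper bound of integers() is exclusive)
rng = np.random.default_rng(2020)
random_start = int(rng.integers(1, 50 + 1))
print(systematic_sample(50, 10, random_start))

# Reproducing the example from the text, which starts at student 18
print(systematic_sample(50, 10, start=18))
# [18, 23, 28, 33, 38, 43, 48, 3, 8, 13]
```

Because the interval is fixed, only the starting point is random: once it is chosen, the whole sample is determined, which is why the ordering of the population matters.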
