# **Statistics with Python**

## **Sampling**

### **Importing the Libraries**

In [None]:
import pandas as pd
import random
import numpy as np

### **Importing the Dataset**

In [None]:
df = pd.read_csv('census.csv')

In [None]:
df.shape

(32561, 15)

In [None]:
df.head()

Unnamed: 0,age,workclass,final-weight,education,education-num,marital-status,occupation,relationship,race,sex,capital-gain,capital-loos,hour-per-week,native-country,income
0,39,State-gov,77516,Bachelors,13,Never-married,Adm-clerical,Not-in-family,White,Male,2174,0,40,United-States,<=50K
1,50,Self-emp-not-inc,83311,Bachelors,13,Married-civ-spouse,Exec-managerial,Husband,White,Male,0,0,13,United-States,<=50K
2,38,Private,215646,HS-grad,9,Divorced,Handlers-cleaners,Not-in-family,White,Male,0,0,40,United-States,<=50K
3,53,Private,234721,11th,7,Married-civ-spouse,Handlers-cleaners,Husband,Black,Male,0,0,40,United-States,<=50K
4,28,Private,338409,Bachelors,13,Married-civ-spouse,Prof-specialty,Wife,Black,Female,0,0,40,Cuba,<=50K


In [None]:
df.tail()

Unnamed: 0,age,workclass,final-weight,education,education-num,marital-status,occupation,relationship,race,sex,capital-gain,capital-loos,hour-per-week,native-country,income
32556,27,Private,257302,Assoc-acdm,12,Married-civ-spouse,Tech-support,Wife,White,Female,0,0,38,United-States,<=50K
32557,40,Private,154374,HS-grad,9,Married-civ-spouse,Machine-op-inspct,Husband,White,Male,0,0,40,United-States,>50K
32558,58,Private,151910,HS-grad,9,Widowed,Adm-clerical,Unmarried,White,Female,0,0,40,United-States,<=50K
32559,22,Private,201490,HS-grad,9,Never-married,Adm-clerical,Own-child,White,Male,0,0,20,United-States,<=50K
32560,52,Self-emp-inc,287927,HS-grad,9,Married-civ-spouse,Exec-managerial,Wife,White,Female,15024,0,40,United-States,>50K


##### **NOTE:**

*   **`read_csv()`**: Loads the data from a CSV file into a DataFrame.
*   **`shape`**: Reports the number of rows and columns in the dataset.
*   **`head()`** and **`tail()`**: Show the first **5** and the last **5** records of the dataset, respectively. This default can be changed by passing the number of records to display as an argument. For example:

> **`df.head(10)`**

> **`df.tail(10)`**
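
The attributes and methods described in this note can be tried without the census file. The following is a minimal sketch on a small made-up DataFrame; the name `toy` and its columns are assumptions for illustration only and are not part of the census data.

```python
import pandas as pd

# Hypothetical toy data standing in for census.csv (20 rows, 2 columns).
toy = pd.DataFrame({"age": range(20, 40), "hours": range(30, 50)})

# shape is an attribute (no parentheses): (number of rows, number of columns).
n_rows, n_cols = toy.shape

# head(n) and tail(n) return the first and last n rows instead of the default 5.
first_three = toy.head(3)
last_three = toy.tail(3)

print(n_rows, n_cols)
print(first_three)
print(last_three)
```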

### **Simple Random Sample**

In [None]:
df_amostra_aleatoria_simples = df.sample(n = 100, random_state = 1)

##### **NOTE:**

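*   **`sample()`**: Returns a random sample of the dataset.

The **`sample()`** function accepts, among others, the following parameters:

*   **`n`**: Number of records to include in the sample.
*   **`random_state`**: Seed for the random number generator. Fixing it makes the sample reproducible: the same seed always returns the same records, while a different seed generally returns different ones. For example: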

> **`df_amostra_aleatoria_simples_1 = df.sample(n = 100, random_state = 1)`**

> **`df_amostra_aleatoria_simples_2 = df.sample(n = 100, random_state = 2)`**
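
The effect of `random_state` can be checked directly. The sketch below uses a made-up DataFrame (`toy`, an assumption for illustration) rather than the census data, and compares samples drawn with equal and different seeds.

```python
import pandas as pd

# Hypothetical toy data: 1000 rows identified by an "id" column.
toy = pd.DataFrame({"id": range(1000)})

# Same seed twice, then a different seed.
sample_a = toy.sample(n=100, random_state=1)
sample_b = toy.sample(n=100, random_state=1)
sample_c = toy.sample(n=100, random_state=2)

# Samples drawn with the same seed contain the same rows in the same order.
same_seed_equal = sample_a.equals(sample_b)

# With a different seed the selected rows will generally differ.
different_seed_equal = sample_a.equals(sample_c)

print(same_seed_equal, different_seed_equal)
```

By default `sample()` draws without replacement, so no row appears twice in a sample.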

In [None]:
df_amostra_aleatoria_simples.shape

(100, 15)

In [None]:
df_amostra_aleatoria_simples.head()

Unnamed: 0,age,workclass,final-weight,education,education-num,marital-status,occupation,relationship,race,sex,capital-gain,capital-loos,hour-per-week,native-country,income
9646,62,Self-emp-not-inc,26911,7th-8th,4,Widowed,Other-service,Not-in-family,White,Female,0,0,66,United-States,<=50K
709,18,Private,208103,11th,7,Never-married,Other-service,Other-relative,White,Male,0,0,25,United-States,<=50K
7385,25,Private,102476,Bachelors,13,Never-married,Farming-fishing,Own-child,White,Male,27828,0,50,United-States,>50K
16671,33,Private,511517,HS-grad,9,Married-civ-spouse,Prof-specialty,Husband,White,Male,0,0,40,United-States,<=50K
21932,36,Private,292570,11th,7,Never-married,Machine-op-inspct,Unmarried,White,Female,0,0,40,United-States,<=50K


In [None]:
def amostra_aleatoria_simples(df, amostra):
    # Draw `amostra` rows at random; the fixed seed keeps the result reproducible.
    return df.sample(n = amostra, random_state = 1)

In [None]:
amostra_aleatoria_simples(df, 100)

Unnamed: 0,age,workclass,final-weight,education,education-num,marital-status,occupation,relationship,race,sex,capital-gain,capital-loos,hour-per-week,native-country,income
9646,62,Self-emp-not-inc,26911,7th-8th,4,Widowed,Other-service,Not-in-family,White,Female,0,0,66,United-States,<=50K
709,18,Private,208103,11th,7,Never-married,Other-service,Other-relative,White,Male,0,0,25,United-States,<=50K
7385,25,Private,102476,Bachelors,13,Never-married,Farming-fishing,Own-child,White,Male,27828,0,50,United-States,>50K
16671,33,Private,511517,HS-grad,9,Married-civ-spouse,Prof-specialty,Husband,White,Male,0,0,40,United-States,<=50K
21932,36,Private,292570,11th,7,Never-married,Machine-op-inspct,Unmarried,White,Female,0,0,40,United-States,<=50K
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
27578,64,?,200017,HS-grad,9,Widowed,?,Not-in-family,White,Female,0,0,20,United-States,<=50K
2544,19,?,192773,Some-college,10,Never-married,?,Own-child,White,Female,0,0,35,United-States,<=50K
2486,75,?,164849,9th,5,Married-civ-spouse,?,Husband,Black,Male,1409,0,5,United-States,<=50K
13143,28,Private,154863,Bachelors,13,Never-married,Adm-clerical,Own-child,Black,Male,0,0,35,United-States,<=50K


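##### **NOTE:** We define a function that returns a sample with as many records as we want. Because `random_state` is fixed at 1 inside the function, repeated calls with the same arguments return the same records.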
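
One possible variation, shown here only as a sketch, is to expose the seed as an optional argument so the caller decides whether the sample should be reproducible. The function name `simple_random_sample`, its `seed` parameter, and the `toy` DataFrame are hypothetical and do not come from the notebook.

```python
import pandas as pd

def simple_random_sample(df, n, seed=None):
    """Return n rows drawn at random without replacement.

    seed=None gives a different sample on each call; an integer seed
    makes the result reproducible. (Hypothetical helper, for illustration.)
    """
    if n > len(df):
        raise ValueError(f"cannot draw {n} rows from a DataFrame with {len(df)} rows")
    return df.sample(n=n, random_state=seed)

# Hypothetical toy data standing in for the census DataFrame.
toy = pd.DataFrame({"id": range(50)})

picked = simple_random_sample(toy, 10, seed=1)
print(picked.shape)
```

Checking `n` up front gives a clearer message than relying on the error that `sample()` itself raises when asked for more rows than exist.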