# Descriptive statistics
- 1. Reading the data
- 2. Statistical methods
    - Maximum
    - Minimum
    - Mean
    - Median
    - Mode
    - Standard deviation
    - Range
    - Count
    - Unique values
    - Count of unique values
    - Sum
    - Quartiles
    - describe()
    - Covariance
    - Correlation
   
<img src='https://pandas.pydata.org/docs/_images/05_newcolumn_1.svg'>

In [1]:
import pandas as pd
import numpy as np
from numpy.random import randn
np.random.seed(101)

## 1. Creating a DataFrame

In [4]:
df = pd.read_csv('data/sales_clear.csv')

In [5]:
df.head(5)

| | Start_Date | Customer Number | Customer Name | 2016 | 2017 | Percent Growth | Jan Units | Active |
|---|---|---|---|---|---|---|---|---|
| 0 | 2015-01-10 | 10002 | Quest Industries | 125000.0 | 162500.0 | 0.3 | 500.0 | True |
| 1 | 2014-06-15 | 552278 | Smith Plumbing | 920000.0 | 1012000.0 | 0.1 | 700.0 | True |
| 2 | 2016-03-29 | 23477 | ACME Industrial | 50000.0 | 62500.0 | 0.25 | 125.0 | True |
| 3 | 2015-10-27 | 24900 | Brekke LTD | 350000.0 | 490000.0 | 0.04 | 75.0 | True |
| 4 | 2014-02-02 | 651029 | Harbor Co | 15000.0 | 12750.0 | -0.15 | NaN | False |


## 2. Statistical methods

**Maximum value**

In [6]:
df.max()

Start_Date             2016-03-29
Customer Number            651029
Customer Name      Smith Plumbing
2016                     920000.0
2017                    1012000.0
Percent Growth                0.3
Jan Units                   700.0
Active                       True
dtype: object

**Minimum value**

In [7]:
df.min()

Start_Date              2014-02-02
Customer Number              10002
Customer Name      ACME Industrial
2016                       15000.0
2017                       12750.0
Percent Growth               -0.15
Jan Units                     75.0
Active                       False
dtype: object
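
The outline lists the range (amplitude), but no cell computes it. As a minimal sketch, it can be derived as maximum minus minimum. The `sales` frame below is rebuilt by hand from the rows shown by `df.head(5)` (an assumption that stands in for the CSV file), and `value_range` is a name chosen only for illustration.

```python
import pandas as pd

# Hypothetical reconstruction of the two yearly columns shown by df.head(5)
sales = pd.DataFrame({
    "2016": [125000.0, 920000.0, 50000.0, 350000.0, 15000.0],
    "2017": [162500.0, 1012000.0, 62500.0, 490000.0, 12750.0],
})

# Range = maximum - minimum, computed column by column
value_range = sales.max() - sales.min()
print(value_range)
```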

**Standard deviation**

In [8]:
df.std(numeric_only=True)

Customer Number    320838.999788
2016               374476.300986
2017               415142.460488
Percent Growth          0.179081
Jan Units             300.693643
Active                  0.447214
dtype: float64
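
pandas computes the sample standard deviation by default (`ddof=1`, dividing by N-1) and skips missing values, while NumPy defaults to the population version (`ddof=0`). The sketch below illustrates the difference on the `Jan Units` values shown earlier; the Series is typed in by hand as an assumption rather than read from the file.

```python
import numpy as np
import pandas as pd

# Jan Units values from the five rows above; the last one is missing
jan_units = pd.Series([500.0, 700.0, 125.0, 75.0, np.nan], name="Jan Units")

sample_std = jan_units.std()            # ddof=1 (divides by N-1), NaN skipped
population_std = jan_units.std(ddof=0)  # divides by N instead
numpy_std = np.nanstd(jan_units.to_numpy())  # NumPy default is ddof=0

print(sample_std, population_std, numpy_std)
```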

**Count (non-missing values)**

In [9]:
df.count()

Start_Date         5
Customer Number    5
Customer Name      5
2016               5
2017               5
Percent Growth     5
Jan Units          4
Active             5
dtype: int64
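
`count()` reports only non-missing values, which is why `Jan Units` shows 4 instead of 5. A small sketch of how the missing entries could be counted directly, using a hand-typed copy of that column (an assumption, not the original file):

```python
import numpy as np
import pandas as pd

# Jan Units values from the five rows above; the last one is missing
jan_units = pd.Series([500.0, 700.0, 125.0, 75.0, np.nan], name="Jan Units")

total_rows = len(jan_units)            # counts every row, including NaN
non_missing = jan_units.count()        # counts only non-NaN values
missing = jan_units.isna().sum()       # counts only NaN values

print(total_rows, non_missing, missing)
```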

**Sum**

In [10]:
df.sum(numeric_only=True)

Customer Number    1261686.00
2016               1460000.00
2017               1739750.00
Percent Growth           0.54
Jan Units             1400.00
Active                   4.00
dtype: float64

**Count the unique elements**

In [11]:
df.nunique()

Start_Date         5
Customer Number    5
Customer Name      5
2016               5
2017               5
Percent Growth     5
Jan Units          4
Active             2
dtype: int64

**List the unique elements**

In [12]:
df['Customer Name'].unique()

array(['Quest Industries', 'Smith Plumbing', 'ACME Industrial',
       'Brekke LTD', 'Harbor Co'], dtype=object)

**Count the occurrences of each class**

In [13]:
df.value_counts(subset=['Customer Name'])

Customer Name   
ACME Industrial     1
Brekke LTD          1
Harbor Co           1
Quest Industries    1
Smith Plumbing      1
dtype: int64

In [14]:
df['Customer Name'].value_counts()

Quest Industries    1
Smith Plumbing      1
ACME Industrial     1
Brekke LTD          1
Harbor Co           1
Name: Customer Name, dtype: int64
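
Because every customer name appears once, the counts above are all 1. A column with repeated values makes `value_counts()` more informative; the sketch below uses the `Active` flags from the five rows (typed in by hand as an assumption) and also shows `normalize=True`, which returns proportions instead of counts.

```python
import pandas as pd

# Active flags from the five rows above
active = pd.Series([True, True, True, True, False], name="Active")

counts = active.value_counts()                # absolute frequency per class
shares = active.value_counts(normalize=True)  # relative frequency per class

print(counts)
print(shares)
```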

**Median**

In [15]:
df.median(numeric_only=True)

Customer Number     24900.0
2016               125000.0
2017               162500.0
Percent Growth          0.1
Jan Units             312.5
Active                  1.0
dtype: float64
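
The outline also lists the mean, which this notebook only reports inside `describe()`. As a rough sketch, comparing it with the median on the 2016 column shows how a single large value pulls the mean upward. The Series is rebuilt by hand from the rows shown by `df.head(5)`, which is an assumption.

```python
import pandas as pd

# 2016 values from the five rows above
sales_2016 = pd.Series([125000.0, 920000.0, 50000.0, 350000.0, 15000.0], name="2016")

mean_2016 = sales_2016.mean()      # sensitive to the 920000.0 outlier
median_2016 = sales_2016.median()  # middle value, robust to outliers

print(mean_2016, median_2016)
```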

**Mode**

In [16]:
df.mode()

| | Start_Date | Customer Number | Customer Name | 2016 | 2017 | Percent Growth | Jan Units | Active |
|---|---|---|---|---|---|---|---|---|
| 0 | 2014-02-02 | 10002 | ACME Industrial | 15000.0 | 12750.0 | -0.15 | 75.0 | True |
| 1 | 2014-06-15 | 23477 | Brekke LTD | 50000.0 | 62500.0 | 0.04 | 125.0 | NaN |
| 2 | 2015-01-10 | 24900 | Harbor Co | 125000.0 | 162500.0 | 0.1 | 500.0 | NaN |
| 3 | 2015-10-27 | 552278 | Quest Industries | 350000.0 | 490000.0 | 0.25 | 700.0 | NaN |
| 4 | 2016-03-29 | 651029 | Smith Plumbing | 920000.0 | 1012000.0 | 0.3 | NaN | NaN |


In [17]:
df['Customer Name'].mode()

0     ACME Industrial
1          Brekke LTD
2           Harbor Co
3    Quest Industries
4      Smith Plumbing
dtype: object
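
When every value is unique, each one ties for the highest frequency, so `mode()` returns all of them in sorted order, as seen above. A brief sketch contrasting that with a column that has a clear mode; both Series are typed in by hand from the rows shown earlier (an assumption):

```python
import pandas as pd

names = pd.Series(["Quest Industries", "Smith Plumbing", "ACME Industrial",
                   "Brekke LTD", "Harbor Co"], name="Customer Name")
active = pd.Series([True, True, True, True, False], name="Active")

name_modes = names.mode()    # all five names tie, returned sorted
active_mode = active.mode()  # True appears 4 times, so it is the only mode

print(name_modes.tolist())
print(active_mode.tolist())
```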

**Quartiles**

In [18]:
df.quantile(q=0.5, numeric_only=True)

Customer Number     24900.0
2016               125000.0
2017               162500.0
Percent Growth          0.1
Jan Units             312.5
Active                  1.0
Name: 0.5, dtype: float64
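
The call above returns only the second quartile (the median). Passing a list to `q` returns all three quartiles at once, and the interquartile range follows as Q3 minus Q1. This is a sketch on a hand-typed copy of the 2016 column (an assumption); `iqr` is a name chosen for illustration.

```python
import pandas as pd

# 2016 values from the five rows above
sales_2016 = pd.Series([125000.0, 920000.0, 50000.0, 350000.0, 15000.0], name="2016")

# Q1, Q2 (median) and Q3 in a single call
quartiles = sales_2016.quantile([0.25, 0.5, 0.75])

# Interquartile range: spread of the middle 50% of the data
iqr = quartiles.loc[0.75] - quartiles.loc[0.25]

print(quartiles)
print(iqr)
```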

**Skewness (unbiased estimate, normalized by N-1)**

In [19]:
df.skew(numeric_only=True)

Customer Number    0.660770
2016               1.648651
2017               1.330057
Percent Growth    -0.547559
Jan Units          0.344825
Active            -2.236068
dtype: float64
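
`skew()` measures the asymmetry of a distribution: positive values indicate a longer right tail, negative values a longer left tail. The sketch below reproduces the 2016 result by hand using the adjusted Fisher-Pearson coefficient, which is assumed here to be the bias-corrected estimator behind `skew()`; the data is typed in from the rows shown earlier.

```python
import numpy as np
import pandas as pd

# 2016 values from the five rows above
sales_2016 = pd.Series([125000.0, 920000.0, 50000.0, 350000.0, 15000.0], name="2016")

n = len(sales_2016)
deviations = sales_2016 - sales_2016.mean()
m2 = (deviations ** 2).mean()  # second central moment
m3 = (deviations ** 3).mean()  # third central moment

# Adjusted Fisher-Pearson coefficient (bias-corrected sample skewness)
manual_skew = np.sqrt(n * (n - 1)) / (n - 2) * m3 / m2 ** 1.5
pandas_skew = sales_2016.skew()

print(manual_skew, pandas_skew)
```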

**Return a statistical summary of the data**

In [20]:
df.describe()

| | Customer Number | 2016 | 2017 | Percent Growth | Jan Units |
|---|---|---|---|---|---|
| count | 5.0 | 5.0 | 5.0 | 5.0 | 4.0 |
| mean | 252337.2 | 292000.0 | 347950.0 | 0.108 | 350.0 |
| std | 320838.999788 | 374476.300986 | 415142.5 | 0.179081 | 300.693643 |
| min | 10002.0 | 15000.0 | 12750.0 | -0.15 | 75.0 |
| 25% | 23477.0 | 50000.0 | 62500.0 | 0.04 | 112.5 |
| 50% | 24900.0 | 125000.0 | 162500.0 | 0.1 | 312.5 |
| 75% | 552278.0 | 350000.0 | 490000.0 | 0.25 | 550.0 |
| max | 651029.0 | 920000.0 | 1012000.0 | 0.3 | 700.0 |
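
By default `describe()` summarizes only the numeric columns. For text columns, `include="object"` reports the count, the number of unique values, the most frequent value and its frequency. A minimal sketch on a hand-typed copy of the `Customer Name` column (an assumption); `text_summary` is an illustrative name.

```python
import pandas as pd

customers = pd.DataFrame({
    "Customer Name": ["Quest Industries", "Smith Plumbing", "ACME Industrial",
                      "Brekke LTD", "Harbor Co"],
})

# Summary for object (string) columns: count, unique, top, freq
text_summary = customers.describe(include="object")
print(text_summary)
```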


**Return several custom statistics using ``agg()``**

In [21]:
df.agg({'2016':['mean','max','min'], 
       '2017':['mean','max','min']})

| | 2016 | 2017 |
|---|---|---|
| mean | 292000.0 | 347950.0 |
| max | 920000.0 | 1012000.0 |
| min | 15000.0 | 12750.0 |
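
When every column needs the same statistics, a plain list can be passed to `agg()` instead of repeating it in a dictionary. The sketch below assumes a hand-typed copy of the two yearly columns; `summary` is an illustrative name.

```python
import pandas as pd

sales = pd.DataFrame({
    "2016": [125000.0, 920000.0, 50000.0, 350000.0, 15000.0],
    "2017": [162500.0, 1012000.0, 62500.0, 490000.0, 12750.0],
})

# The same list of statistics is applied to every column
summary = sales.agg(["mean", "median", "std"])
print(summary)
```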


**Covariance**

In [22]:
df[['2016','2017']].cov()

| | 2016 | 2017 |
|---|---|---|
| 2016 | 140232500000.0 | 154540400000.0 |
| 2017 | 154540400000.0 | 172343300000.0 |
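
The off-diagonal entry is the sample covariance between the two years, and the diagonal holds each column's variance. As a sketch, the covariance can be reproduced by averaging the products of the deviations with an N-1 denominator. The data is typed in from the rows shown earlier (an assumption).

```python
import pandas as pd

sales = pd.DataFrame({
    "2016": [125000.0, 920000.0, 50000.0, 350000.0, 15000.0],
    "2017": [162500.0, 1012000.0, 62500.0, 490000.0, 12750.0],
})

x, y = sales["2016"], sales["2017"]
n = len(sales)

# Sample covariance: sum of deviation products divided by N-1
manual_cov = ((x - x.mean()) * (y - y.mean())).sum() / (n - 1)
pandas_cov = sales.cov().loc["2016", "2017"]

print(manual_cov, pandas_cov)
```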


**Measuring the correlation between variables**

In [23]:
df.corr(numeric_only=True)

| | Customer Number | 2016 | 2017 | Percent Growth | Jan Units | Active |
|---|---|---|---|---|---|---|
| Customer Number | 1.0 | 0.33415 | 0.269337 | -0.734572 | 0.759853 | -0.694665 |
| 2016 | 0.33415 | 1.0 | 0.994078 | -0.014651 | 0.656752 | 0.413505 |
| 2017 | 0.269337 | 0.994078 | 1.0 | -0.015571 | 0.59544 | 0.451369 |
| Percent Growth | -0.734572 | -0.014651 | -0.015571 | 1.0 | 0.115307 | 0.80537 |
| Jan Units | 0.759853 | 0.656752 | 0.59544 | 0.115307 | 1.0 | NaN |
| Active | -0.694665 | 0.413505 | 0.451369 | 0.80537 | NaN | 1.0 |


In [24]:
df[['2016','2017']].corr()

| | 2016 | 2017 |
|---|---|---|
| 2016 | 1.0 | 0.994078 |
| 2017 | 0.994078 | 1.0 |


In [25]:
df[['2016','2017']].corr(method='spearman')

| | 2016 | 2017 |
|---|---|---|
| 2016 | 1.0 | 1.0 |
| 2017 | 1.0 | 1.0 |
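
The Spearman coefficient is exactly 1.0 because it compares ranks rather than raw values: the customers are ordered identically in both years, even though the amounts are not perfectly linear. A sketch showing that Spearman equals the Pearson correlation of the ranked data, using a hand-typed copy of the two columns (an assumption):

```python
import pandas as pd

sales = pd.DataFrame({
    "2016": [125000.0, 920000.0, 50000.0, 350000.0, 15000.0],
    "2017": [162500.0, 1012000.0, 62500.0, 490000.0, 12750.0],
})

pearson = sales.corr(method="pearson").loc["2016", "2017"]
spearman = sales.corr(method="spearman").loc["2016", "2017"]

# Spearman is the Pearson correlation computed on the ranks
rank_pearson = sales.rank().corr(method="pearson").loc["2016", "2017"]

print(pearson, spearman, rank_pearson)
```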
