# Descriptive Analysis - US Medical Insurance Data


### *"How do demographic and behavioral factors affect the price of medical insurance in the US?"*

First, let us look at how the dataset is structured:

In [2]:
import matplotlib.pyplot as plt
import pandas as pd

medical_insurance = pd.read_csv('insurance.csv')
print(medical_insurance.head(10))

   age     sex     bmi  children smoker     region      charges
0   19  female  27.900         0    yes  southwest  16884.92400
1   18    male  33.770         1     no  southeast   1725.55230
2   28    male  33.000         3     no  southeast   4449.46200
3   33    male  22.705         0     no  northwest  21984.47061
4   32    male  28.880         0     no  northwest   3866.85520
5   31  female  25.740         0     no  southeast   3756.62160
6   46  female  33.440         1     no  southeast   8240.58960
7   37  female  27.740         3     no  northwest   7281.50560
8   37    male  29.830         2     no  northeast   6406.41070
9   60  female  25.840         0     no  northwest  28923.13692


The dataset has 1338 records (rows) and 7 columns.
It has 4 numeric (quantitative) variables, 2 of them discrete (`age`, `children`) and 2 continuous (`bmi`, `charges`), and 3 categorical (qualitative) variables (`sex`, `smoker`, `region`).
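
These counts can be checked directly from the DataFrame. The sketch below is a minimal illustration that assumes a small hypothetical `sample` frame rebuilt from the first five rows printed above, not the full `insurance.csv`. On the full data, the same calls would be applied to `medical_insurance`.

```python
import pandas as pd

# Hypothetical stand-in for medical_insurance: the first five preview rows.
sample = pd.DataFrame({
    "age": [19, 18, 28, 33, 32],
    "sex": ["female", "male", "male", "male", "male"],
    "bmi": [27.9, 33.77, 33.0, 22.705, 28.88],
    "children": [0, 1, 3, 0, 0],
    "smoker": ["yes", "no", "no", "no", "no"],
    "region": ["southwest", "southeast", "southeast", "northwest", "northwest"],
    "charges": [16884.924, 1725.5523, 4449.462, 21984.47061, 3866.8552],
})

n_rows, n_cols = sample.shape  # (rows, columns)

# Numeric columns are the quantitative variables; the rest are categorical.
numeric_cols = sample.select_dtypes(include="number").columns.tolist()
categorical_cols = sample.select_dtypes(exclude="number").columns.tolist()

print(n_rows, n_cols)
print("numeric:", numeric_cols)
print("categorical:", categorical_cols)
```

Splitting on dtype separates numeric from categorical columns only. Whether a numeric column is discrete or continuous still has to be judged from its meaning.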

## Descriptive Analysis - Metrics

### Measures of Central Tendency

#### Mean

The arithmetic mean of $n$ values $x_1, \ldots, x_n$ is given by:

$$\overline{x}=\frac{1}{n} \sum_{i=1}^{n} x_{i}=\frac{1}{n}\left(x_{1}+\cdots+x_{n}\right)$$
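
As a quick worked example of this formula, the sketch below averages the ten `age` values from the dataset preview. It uses an explicit sum first and then the standard-library `statistics.mean` as a cross-check. The variable names are illustrative and not part of the original notebook.

```python
from statistics import mean

# Ages of the ten rows shown in the dataset preview.
ages = [19, 18, 28, 33, 32, 31, 46, 37, 37, 60]

n = len(ages)
manual_mean = sum(ages) / n  # (1/n) * (x_1 + ... + x_n)

print(manual_mean)  # 34.1
print(mean(ages))   # 34.1
```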

Computing the mean of each quantitative variable:

In [3]:
avg_age = round(medical_insurance.age.mean(), 1)
avg_bmi = round(medical_insurance.bmi.mean(), 2)
avg_n_of_children = round(medical_insurance.children.mean())
avg_charges = round(medical_insurance.charges.mean())

print(f"""
The means of the quantitative/numeric variables are:
- Age: {avg_age} years
- BMI: {avg_bmi}
- Number of children: {avg_n_of_children} child
- Charges: ${avg_charges} (rounded to whole dollars)
""")


The means of the quantitative/numeric variables are:
- Age: 39.2 years
- BMI: 30.66
- Number of children: 1 child
- Charges: $13270 (rounded to whole dollars)



In [18]:
# Computing the means for one specific region

avg_kids_southeast = medical_insurance.loc[medical_insurance.region == 'southeast'].children.mean()
southeast = medical_insurance.loc[medical_insurance.region == 'southeast'].sex.count()
avg_price_southeast = round(medical_insurance.loc[medical_insurance.region == 'southeast'].charges.mean(), 2)
print(f"Southeast region: mean children = {avg_kids_southeast}, records = {southeast}, mean charges = ${avg_price_southeast}")

Southeast region: mean children = 1.0494505494505495, records = 364, mean charges = $14735.41


In [19]:
avg_kids_southwest = medical_insurance.loc[medical_insurance.region == 'southwest'].children.mean()
avg_price_southwest = round(medical_insurance.loc[medical_insurance.region == 'southwest'].charges.mean(), 2)
print(f"Southwest region: mean children = {avg_kids_southwest}, mean charges = ${avg_price_southwest}")

Southwest region: mean children = 1.1415384615384616, mean charges = $12346.94


In [20]:
avg_kids_northeast = medical_insurance.loc[medical_insurance.region == 'northeast'].children.mean()
northeast = medical_insurance.loc[medical_insurance.region == 'northeast'].children.count()
avg_price_northeast = round(medical_insurance.loc[medical_insurance.region == 'northeast'].charges.mean(), 2)
print(f"Northeast region: mean children = {avg_kids_northeast}, records = {northeast}, mean charges = ${avg_price_northeast}")

Northeast region: mean children = 1.0462962962962963, records = 324, mean charges = $13406.38


In [22]:
avg_kids_northwest = medical_insurance.loc[medical_insurance.region == 'northwest'].children.mean()
avg_price_northwest = round(medical_insurance.loc[medical_insurance.region == 'northwest'].charges.mean(), 2)
print(f"Northwest region: mean children = {avg_kids_northwest}, mean charges = ${avg_price_northwest}")

Northwest region: mean children = 1.1476923076923078, mean charges = $12417.58
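
The four per-region cells repeat the same filter. One possible alternative, shown as a sketch and not as the notebook's own method, is a single `groupby` with named aggregations. It assumes a small hypothetical `sample` frame built from the ten preview rows. On the full data, `medical_insurance` would take its place.

```python
import pandas as pd

# Hypothetical stand-in for medical_insurance: preview rows, relevant columns only.
sample = pd.DataFrame({
    "region": ["southwest", "southeast", "southeast", "northwest", "northwest",
               "southeast", "southeast", "northwest", "northeast", "northwest"],
    "children": [0, 1, 3, 0, 0, 0, 1, 3, 2, 0],
    "charges": [16884.924, 1725.5523, 4449.462, 21984.47061, 3866.8552,
                3756.6216, 8240.5896, 7281.5056, 6406.4107, 28923.13692],
})

# One row per region: record count, mean number of children, mean charges.
by_region = sample.groupby("region").agg(
    records=("children", "count"),
    avg_children=("children", "mean"),
    avg_charges=("charges", "mean"),
).round(2)

print(by_region)
```

Grouping once keeps the regions side by side in one table, which makes them easier to compare than four separate printed lines.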


In [23]:
smoker_charges = medical_insurance.loc[medical_insurance.smoker == 'yes'].charges.mean()
smoker_charges

32050.23183153284

In [24]:
non_smoker_charges = medical_insurance.loc[medical_insurance.smoker == 'no'].charges.mean()
non_smoker_charges

8434.268297856204

In [28]:
underweight = medical_insurance.loc[medical_insurance.bmi < 18.5].charges.mean()
underweight

8852.200585

In [34]:
normal = medical_insurance.loc[(medical_insurance.bmi >= 18.5) & (medical_insurance.bmi < 25)].charges.mean()
normal

10409.337708977777

In [35]:
overweight = medical_insurance.loc[(medical_insurance.bmi >= 25) & (medical_insurance.bmi < 30)].charges.mean()
overweight

10987.509891318654

In [36]:
obese = medical_insurance.loc[(medical_insurance.bmi >= 30) & (medical_insurance.bmi < 35)].charges.mean()
obese

14419.674969693097

In [37]:
xtrm_obese = medical_insurance.loc[(medical_insurance.bmi >= 35)].charges.mean()
xtrm_obese

16953.82361816456
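
The five BMI cells above can also be expressed with `pd.cut`, which assigns each row to a bin before averaging. This is a sketch that reuses the same cut points (18.5, 25, 30, 35). The `sample` frame and the label names are hypothetical stand-ins taken from the preview rows and the variable names above.

```python
import pandas as pd

# Hypothetical stand-in for medical_insurance: preview rows, relevant columns only.
sample = pd.DataFrame({
    "bmi": [27.9, 33.77, 33.0, 22.705, 28.88, 25.74, 33.44, 27.74, 29.83, 25.84],
    "charges": [16884.924, 1725.5523, 4449.462, 21984.47061, 3866.8552,
                3756.6216, 8240.5896, 7281.5056, 6406.4107, 28923.13692],
})

# right=False makes each bin closed on the left: [18.5, 25), [25, 30), ...
bins = [0, 18.5, 25, 30, 35, float("inf")]
labels = ["underweight", "normal", "overweight", "obese", "xtrm_obese"]
sample["bmi_class"] = pd.cut(sample["bmi"], bins=bins, labels=labels, right=False)

# observed=True drops classes that have no rows in this small sample.
avg_by_class = sample.groupby("bmi_class", observed=True)["charges"].mean()
print(avg_by_class)
```

Because the bins share their edges, every row lands in exactly one class, so no boundary value is skipped or counted twice.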

In [41]:
max_price = medical_insurance.charges.max()
id_max = medical_insurance.loc[(medical_insurance.charges == max_price)]
id_max

     age     sex    bmi  children smoker     region      charges
543   54  female  47.41         0    yes  southeast  63770.42801


In [42]:
charges_std_dev = medical_insurance.charges.std()
charges_std_dev

12110.011236693994

In [43]:
smoker_charges_std_dev = medical_insurance.loc[medical_insurance.smoker == 'yes'].charges.std()
smoker_charges_std_dev

11541.547175589121

In [44]:
non_smoker_charges_std_dev = medical_insurance.loc[medical_insurance.smoker == 'no'].charges.std()
non_smoker_charges_std_dev

5993.781819194933

In [45]:
low_bmi_smoker_charges_mean = medical_insurance.loc[(medical_insurance.smoker == 'yes') &
                                                    (medical_insurance.bmi < 25)].charges.mean()
low_bmi_smoker_charges_mean

19839.27830854546

In [46]:
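# Note: the previous cell uses bmi < 25 and this one bmi > 25,
# so rows with bmi exactly equal to 25 fall in neither group.
high_bmi_smoker_charges_mean = medical_insurance.loc[(medical_insurance.smoker == 'yes') &
                                                    (medical_insurance.bmi > 25)].charges.mean()
high_bmi_smoker_charges_mean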

35116.90965694064

In [48]:
low_bmi_non_smoker_charges_mean = medical_insurance.loc[(medical_insurance.smoker == 'no') &
                                                    (medical_insurance.bmi < 25)].charges.mean()
low_bmi_non_smoker_charges_mean

7515.708890789475

In [49]:
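# Note: as with the smoker cells, bmi < 25 and bmi > 25 leave out
# rows with bmi exactly equal to 25.
high_bmi_non_smoker_charges_mean = medical_insurance.loc[(medical_insurance.smoker == 'no') &
                                                    (medical_insurance.bmi > 25)].charges.mean()
high_bmi_non_smoker_charges_mean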

8629.589609712157

In [54]:
smoker_children_mean = medical_insurance.loc[(medical_insurance.smoker == 'yes') & 
                                             (medical_insurance.age > 24)].children.mean()
smoker_children_mean

1.2429906542056075

In [55]:
non_smoker_children_mean = medical_insurance.loc[(medical_insurance.smoker == 'no') & 
                                                 (medical_insurance.age > 24)].children.mean()
non_smoker_children_mean

1.218676122931442

In [56]:
male_charges = medical_insurance.loc[medical_insurance.sex == 'male'].charges.mean()
male_charges

13956.751177721893

In [57]:
female_charges = medical_insurance.loc[medical_insurance.sex == 'female'].charges.mean()
female_charges

12569.578843835347

In [60]:
male_smoker_count = medical_insurance.loc[(medical_insurance.sex == 'male') & 
                                    (medical_insurance.smoker == 'yes')].children.count()
male_count = medical_insurance.loc[(medical_insurance.sex == 'male')].children.count()

male_smoker_ratio = round((male_smoker_count / male_count) * 100)

male_smoker_ratio

24

In [61]:
female_smoker_count = medical_insurance.loc[(medical_insurance.sex == 'female') & 
                                    (medical_insurance.smoker == 'yes')].children.count()
female_count = medical_insurance.loc[(medical_insurance.sex == 'female')].children.count()

female_smoker_ratio = round((female_smoker_count / female_count) * 100)

female_smoker_ratio

17
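
The share of smokers within each sex can also be obtained in one step, because the mean of a boolean column is the fraction of `True` values. The following is a sketch with a hypothetical `sample` frame taken from the ten preview rows. It is an alternative formulation, not the notebook's own code.

```python
import pandas as pd

# Hypothetical stand-in for medical_insurance: preview rows, relevant columns only.
sample = pd.DataFrame({
    "sex": ["female", "male", "male", "male", "male",
            "female", "female", "female", "male", "female"],
    "smoker": ["yes", "no", "no", "no", "no", "no", "no", "no", "no", "no"],
})

# Mean of a True/False column = fraction of True, so * 100 gives a percentage.
smoker_pct = (
    sample["smoker"].eq("yes")
    .groupby(sample["sex"])
    .mean()
    .mul(100)
    .round()
)
print(smoker_pct)
```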

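#### Median

The median is found by sorting the values in ascending order, $x_{(1)} \le \cdots \le x_{(n)}$, and then:

- If the number of values $n$ is odd, it is the middle value:
$$\tilde{x} = x_{\left(\frac{n+1}{2}\right)}$$

- If the number of values $n$ is even, it is the mean of the two central values:
$$\tilde{x} = \frac{1}{2}\left(x_{\left(\frac{n}{2}\right)} + x_{\left(\frac{n}{2}+1\right)}\right)$$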

### Measures of Dispersion

In [2]:
# Code

### Measures of Position

In [3]:
# Code

## Descriptive Analysis - Visualizations