## Exploratory Data Analysis - Ifood

The dataset consists of customers of the company Ifood, with data on:

- Customer profiles  
- Product preferences  
- Campaign successes/failures  
- Channel performance  

---

### 🎯 Objective

Today's objective is to perform an exploratory analysis of this data. Answer the questions below using the tool of your choice:

1. **How much data do we have?**  
   - Check the number of rows and columns.

2. **Which columns are numeric?**  
   - List the columns with a numeric data type.

3. **Are there duplicates in our dataset?**  
   - If there are, remove them.

4. **Are there null values in this dataset?**  
   - Do they indicate anything? What should we do with them?

5. **Descriptive statistics for the numeric columns:**  
   - Compute:
     - Mean
     - Median
     - 25th percentile
     - 75th percentile
     - Minimum value
     - Maximum value


In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

In [3]:
df = pd.read_csv("/content/mkt_data.csv")

In [4]:
df

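| Unnamed: 0.1 | Unnamed: 0 | Income | Kidhome | Teenhome | Recency | MntWines | MntFruits | MntMeatProducts | MntFishProducts | MntSweetProducts | ... | education_Graduation | education_Master | education_PhD | MntTotal | MntRegularProds | AcceptedCmpOverall | marital_status | education_level | kids | expenses |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 58138.0 | 0 | 0 | 58 | 635 | 88 | 546 | 172 | 88 | ... | 3.0 | NaN | NaN | 1529 | 1441 | 0 | Single | Graduation | 0 | 1529 |
| 1 | 1 | 46344.0 | 1 | 1 | 38 | 11 | 1 | 6 | 2 | 1 | ... | 3.0 | NaN | NaN | 21 | 15 | 0 | Single | Graduation | 2 | 21 |
| 2 | 2 | 71613.0 | 0 | 0 | 26 | 426 | 49 | 127 | 111 | 21 | ... | 3.0 | NaN | NaN | 734 | 692 | 0 | Together | Graduation | 0 | 734 |
| 3 | 3 | 26646.0 | 1 | 0 | 26 | 11 | 4 | 20 | 10 | 3 | ... | 3.0 | NaN | NaN | 48 | 43 | 0 | Together | Graduation | 1 | 48 |
| 4 | 4 | 58293.0 | 1 | 0 | 94 | 173 | 43 | 118 | 46 | 27 | ... | NaN | NaN | 5.0 | 407 | 392 | 0 | Married | PhD | 1 | 407 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2200 | 2200 | 61223.0 | 0 | 1 | 46 | 709 | 43 | 182 | 42 | 118 | ... | 3.0 | NaN | NaN | 1094 | 847 | 0 | Married | Graduation | 1 | 1094 |
| 2201 | 2201 | 64014.0 | 2 | 1 | 56 | 406 | 0 | 30 | 0 | 0 | ... | NaN | NaN | 5.0 | 436 | 428 | 1 | Together | PhD | 3 | 436 |
| 2202 | 2202 | 56981.0 | 0 | 0 | 91 | 908 | 48 | 217 | 32 | 12 | ... | 3.0 | NaN | NaN | 1217 | 1193 | 1 | Divorced | Graduation | 0 | 1217 |
| 2203 | 2203 | 69245.0 | 0 | 1 | 8 | 428 | 30 | 214 | 80 | 30 | ... | NaN | 4.0 | NaN | 782 | 721 | 0 | Together | Master | 1 | 782 |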
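
Question 1 (how many rows and columns) can be read directly from `DataFrame.shape`. The sketch below uses a small stand-in frame named `toy` (a hypothetical name) built from three of the rows displayed above, so it runs without the CSV file; on the real data the same call would be `df.shape`.

```python
import pandas as pd

# Stand-in for df: three rows and three columns copied from the display above
toy = pd.DataFrame({
    "Income": [58138.0, 46344.0, 71613.0],
    "Kidhome": [0, 1, 0],
    "marital_status": ["Single", "Single", "Together"],
})

# shape is a (rows, columns) tuple
n_rows, n_cols = toy.shape
print(f"{n_rows} rows x {n_cols} columns")
```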


In [5]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2205 entries, 0 to 2204
Data columns (total 44 columns):
 #   Column                Non-Null Count  Dtype  
---  ------                --------------  -----  
 0   Unnamed: 0            2205 non-null   int64  
 1   Income                2205 non-null   float64
 2   Kidhome               2205 non-null   int64  
 3   Teenhome              2205 non-null   int64  
 4   Recency               2205 non-null   int64  
 5   MntWines              2205 non-null   int64  
 6   MntFruits             2205 non-null   int64  
 7   MntMeatProducts       2205 non-null   int64  
 8   MntFishProducts       2205 non-null   int64  
 9   MntSweetProducts      2205 non-null   int64  
 10  MntGoldProds          2205 non-null   int64  
 11  NumDealsPurchases     2205 non-null   int64  
 12  NumWebPurchases       2205 non-null   int64  
 13  NumCatalogPurchases   2205 non-null   int64  
 14  NumStorePurchases     2205 non-null   int64  
 15  NumWebVisitsMonth    

In [6]:
df.isnull().sum()

Unnamed: 0,0
Unnamed: 0,0
Income,0
Kidhome,0
Teenhome,0
Recency,0
MntWines,0
MntFruits,0
MntMeatProducts,0
MntFishProducts,0
MntSweetProducts,0


In [7]:
duplicados = df.duplicated()

# Show how many duplicated rows exist
print("Number of duplicated rows:", duplicados.sum())

Number of duplicated rows: 0
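
One caveat about this check: in the rows displayed earlier, `Unnamed: 0` looks like a row counter, and a column like that makes every row unique, so `df.duplicated()` can return 0 even when the remaining columns repeat. The sketch below assumes that reading is right and compares rows on every other column before dropping repeats. The frame `toy` and its values are hypothetical, chosen only to show the effect.

```python
import pandas as pd

# Hypothetical toy frame: rows 1 and 2 differ only in the id-like column
toy = pd.DataFrame({
    "Unnamed: 0": [0, 1, 2],
    "Income": [58138.0, 46344.0, 46344.0],
    "Kidhome": [0, 1, 1],
})

# Compare rows on every column except the id-like ones
check_cols = [c for c in toy.columns if not c.startswith("Unnamed")]

n_dup_all = toy.duplicated().sum()                      # id column included
n_dup_subset = toy.duplicated(subset=check_cols).sum()  # id column ignored

# Keep the first occurrence of each repeated row
deduped = toy.drop_duplicates(subset=check_cols).reset_index(drop=True)
print(n_dup_all, n_dup_subset, len(deduped))
```

Whether rows that match on everything except an id are true duplicates or two distinct customers is a judgment call that depends on how the data was collected.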


In [13]:
# Select only the numeric columns
colunas_numericas = df.select_dtypes(include=['number']).columns

# Print the numeric columns as a list, one per line
print("Numeric columns:")
for col in colunas_numericas:
    print(col)


Numeric columns:
Unnamed: 0
Income
Kidhome
Teenhome
Recency
MntWines
MntFruits
MntMeatProducts
MntFishProducts
MntSweetProducts
MntGoldProds
NumDealsPurchases
NumWebPurchases
NumCatalogPurchases
NumStorePurchases
NumWebVisitsMonth
AcceptedCmp3
AcceptedCmp4
AcceptedCmp5
AcceptedCmp1
AcceptedCmp2
Complain
Z_CostContact
Z_Revenue
Response
Age
Customer_Days
marital_Divorced
marital_Married
marital_Single
marital_Together
marital_Widow
education_2n Cycle
education_Basic
education_Graduation
education_Master
education_PhD
MntTotal
MntRegularProds
AcceptedCmpOverall
kids
expenses


In [11]:
# Check for null values
nulos = df.isnull().sum()
nulos_presentes = nulos[nulos > 0]
print("Columns with null values:\n", nulos_presentes)

Columns with null values:
 marital_Divorced        1975
marital_Married         1351
marital_Single          1728
marital_Together        1637
marital_Widow           2129
education_2n Cycle      2007
education_Basic         2151
education_Graduation    1092
education_Master        1841
education_PhD           1729
dtype: int64
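
On question 4: the null counts add up to 8820 within each group (`marital_*` and `education_*`), which is 4 × 2205. That is consistent with each row having exactly one non-null value among the five columns of a group, so a null here would mean "this category does not apply" rather than missing information. The rows displayed earlier point the same way: `education_Graduation` is 3.0 where `education_level` is Graduation and null otherwise. A minimal sketch of one way to handle this, assuming that reading is correct, is to turn the columns into 0/1 flags. The frame `toy` is a hypothetical stand-in that mirrors the pattern.

```python
import numpy as np
import pandas as pd

# Stand-in mirroring the pattern seen in the displayed rows
toy = pd.DataFrame({
    "education_Graduation": [3.0, 3.0, np.nan],
    "education_Master": [np.nan, np.nan, np.nan],
    "education_PhD": [np.nan, np.nan, 5.0],
    "education_level": ["Graduation", "Graduation", "PhD"],
})

# Dummy columns: numeric columns whose name starts with the group prefix
dummy_cols = [c for c in toy.select_dtypes(include="number").columns
              if c.startswith("education_")]

# 1 where a value is present, 0 where it is null
flags = toy[dummy_cols].notna().astype(int)

# Sanity check: how many categories are set in each row
per_row = flags.sum(axis=1)
print(per_row.tolist())

# Replace the original columns with the 0/1 flags
cleaned = toy.copy()
cleaned[dummy_cols] = flags
print(cleaned.isnull().sum().sum())
```

If `per_row` is not 1 for every row of the real data, the assumption does not hold and the nulls need a closer look before any replacement.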


In [None]:
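# 'mkd' is not a column of df; compute the mean of each numeric column instead
mean = df.select_dtypes(include='number').mean()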
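
The six statistics requested in question 5 can be collected into a single table. The sketch below is one possible layout, not the only one. It uses a stand-in frame `toy` (a hypothetical name) holding the first five displayed values of `Income` and `Recency`, so it runs without the CSV file. The column labels `p25` and `p75` are also just illustrative.

```python
import pandas as pd

# Stand-in for df: first five displayed rows of two numeric columns
toy = pd.DataFrame({
    "Income": [58138.0, 46344.0, 71613.0, 26646.0, 58293.0],
    "Recency": [58, 38, 26, 26, 94],
    "marital_status": ["Single", "Single", "Together", "Together", "Married"],
})

# Keep only numeric columns; text columns such as marital_status are dropped
numeric = toy.select_dtypes(include="number")

# One row per column, one column per statistic
stats = pd.DataFrame({
    "mean": numeric.mean(),
    "median": numeric.median(),
    "p25": numeric.quantile(0.25),
    "p75": numeric.quantile(0.75),
    "min": numeric.min(),
    "max": numeric.max(),
})
print(stats)
```

`numeric.describe()` returns the same quantities (the median appears as the `50%` row) along with the count and standard deviation, so it is a quicker alternative when the exact layout does not matter.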