## Data Preprocessing: One Piece

- API used: https://api-onepiece.com/en/documentation
- Goal: clean and prepare the data so it can be used to train a Machine Learning algorithm

### 1. Importing the Libraries

In [None]:
import pandas as pd
import numpy as np
import requests

### 2. Loading the Dataset from an API

In [None]:
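# Fetch the full character list (the cleaning steps below work on every character)
response = requests.get("https://api.api-onepiece.com/v2/characters/en", timeout=30)

if response.status_code == 200:
    data = response.json()
    # Flatten the nested "crew" and "fruit" objects into columns such as "crew.name"
    df = pd.json_normalize(data)
else:
    # Stop here: every following cell depends on df
    raise RuntimeError(f"Error: {response.status_code} {response.text}")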
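`pd.json_normalize` is what turns the nested `crew` and `fruit` objects of each character into flat, dot-separated columns like the ones listed by `df.info()` below. A minimal sketch with two hand-made records (hypothetical values, not real API output) shows the effect, including what happens when a character has no `fruit` object at all:

```python
import pandas as pd

# Two hypothetical records shaped like the API response:
# "crew" and "fruit" are nested objects; the second record has no "fruit" key.
records = [
    {"id": 1, "name": "Character A",
     "crew": {"id": 10, "name": "Crew X"},
     "fruit": {"id": 5, "type": "Paramecia"}},
    {"id": 2, "name": "Character B",
     "crew": {"id": 10, "name": "Crew X"}},
]

# Nested keys become "parent.child" column names; missing keys become NaN
flat = pd.json_normalize(records)
print(sorted(flat.columns))
# -> ['crew.id', 'crew.name', 'fruit.id', 'fruit.type', 'id', 'name']
print(flat["fruit.type"].isna().tolist())
# -> [False, True]
```

This is why characters without a Devil Fruit show up as NaN in every `fruit.*` column, which section 4.2 later fills in.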


### 3. Data Analysis

In [226]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 773 entries, 0 to 772
Data columns (total 22 columns):
 #   Column               Non-Null Count  Dtype  
---  ------               --------------  -----  
 0   id                   773 non-null    int64  
 1   name                 773 non-null    object 
 2   size                 761 non-null    object 
 3   age                  761 non-null    object 
 4   bounty               756 non-null    object 
 5   job                  772 non-null    object 
 6   status               772 non-null    object 
 7   crew.id              738 non-null    float64
 8   crew.name            738 non-null    object 
 9   crew.description     0 non-null      float64
 10  crew.status          738 non-null    object 
 11  crew.number          393 non-null    object 
 12  crew.roman_name      449 non-null    object 
 13  crew.total_prime     393 non-null    object 
 14  crew.is_yonko        738 non-null    object 
 15  fruit.id             178 non-null    flo
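The `Non-Null Count` column above is what drives the cleaning decisions that follow (for example, `crew.description` has no values at all). The same information is often easier to read as a percentage of missing values per column. A minimal sketch on a small hypothetical frame (column names borrowed from the output above, values made up):

```python
import numpy as np
import pandas as pd

# Hypothetical frame with the kind of gaps seen in df.info()
sample = pd.DataFrame({
    "name": ["A", "B", "C", "D"],
    "bounty": ["1.000", None, "500", None],
    "crew.description": [np.nan] * 4,
})

# Share of missing values per column (in %), highest first
missing_pct = sample.isna().mean().mul(100).sort_values(ascending=False)
print(missing_pct)
```

On the real data the same expression, `df.isna().mean().mul(100)`, would list the fully empty and mostly empty columns first.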

### 4. Data Cleaning

#### 4.1 Dropping unnecessary columns

In [227]:
# Drop identifiers, descriptions, file references and other columns not kept for training
df = df.drop(columns=['id', 'status',
                      'crew.id', 'crew.name', 'crew.description', 'crew.status', 'crew.number', 'crew.total_prime', 'crew.is_yonko',
                      'fruit.id', 'fruit.name', 'fruit.description', 'fruit.filename', 'fruit.technicalFile'])
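Because the list of dropped columns is long, an equivalent approach is to list the eight columns that are kept (taken from the final `df.info()` output of this notebook) and select them directly. A sketch on a hypothetical frame; the name `KEEP` is illustrative:

```python
import pandas as pd

# Columns that survive the cleaning, as listed in the final df.info() output
KEEP = ["name", "size", "age", "bounty", "job",
        "crew.roman_name", "fruit.type", "fruit.roman_name"]

# Hypothetical empty frame with a few extra columns that should disappear
raw = pd.DataFrame(columns=KEEP + ["id", "status", "crew.id", "fruit.filename"])

# Selecting the kept columns gives the same result as dropping all the others,
# and raises a KeyError if the API ever stops returning one of them
slim = raw[KEEP]
print(list(slim.columns))
```

The trade-off: `drop` keeps any new column the API might add later, while selection silently ignores it.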

#### 4.2 Standardizing missing values

In [None]:
# Characters without a Devil Fruit or a crew get an explicit "Don't have" label instead of NaN
df["fruit.type"] = df["fruit.type"].fillna("Don't have")
df["fruit.roman_name"] = df["fruit.roman_name"].fillna("Don't have")
df["crew.roman_name"] = df["crew.roman_name"].fillna("Don't have")
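The three `fillna` calls can also be written as a single call that takes a `{column: fill value}` mapping; columns not in the mapping are left untouched. A minimal sketch on a hypothetical two-row frame:

```python
import numpy as np
import pandas as pd

# Hypothetical rows: one character with a fruit but no crew, one the other way round
sample = pd.DataFrame({
    "fruit.type": ["Paramecia", np.nan],
    "fruit.roman_name": ["Gomu Gomu no Mi", np.nan],
    "crew.roman_name": [np.nan, "Mugiwara no Ichimi"],
})

placeholder = "Don't have"
columns_to_fill = ["fruit.type", "fruit.roman_name", "crew.roman_name"]

# One fillna call with a column -> value mapping
filled = sample.fillna({col: placeholder for col in columns_to_fill})
print(filled)
```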

#### 4.3 Converting the bounty to a numeric type

In [None]:
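# Remove the "." thousands separators before parsing
df['bounty'] = df['bounty'].str.replace('.', '', regex=False)
# Values that cannot be parsed become NaN...
df['bounty'] = pd.to_numeric(df['bounty'], errors='coerce')
# ...and missing bounties are then set to 0
df['bounty'] = df['bounty'].fillna(0)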
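To see what each of the three bounty steps does, here they are applied to a few hypothetical raw strings (assuming, as the code does, that "." is used as a thousands separator):

```python
import pandas as pd

# Hypothetical raw bounties, plus an empty string and a missing value
raw = pd.Series(["3.000.000.000", "66.000.000", "", None])

bounty = raw.str.replace(".", "", regex=False)    # "3.000.000.000" -> "3000000000"
bounty = pd.to_numeric(bounty, errors="coerce")   # "" and None -> NaN
bounty = bounty.fillna(0)                         # unknown bounty -> 0
print(bounty.tolist())
# -> [3000000000.0, 66000000.0, 0.0, 0.0]
```

One consequence of the last step: a character whose bounty is unknown becomes indistinguishable from one whose bounty is genuinely 0. If that difference matters for the model, a separate boolean column (for example `raw.isna()`) computed before `fillna` would preserve it.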

#### 4.4 Converting character height to a numeric type

In [None]:
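# Strip the "cm" unit and any spaces from the raw height strings
df['size'] = df['size'].str.replace('cm', '', regex=False)
df['size'] = df['size'].str.replace(' ', '', regex=False)
# Anything that is still not a number (including empty strings) becomes NaN
df['size'] = pd.to_numeric(df['size'], errors='coerce')
# Keep only characters with a known height
df = df.dropna(subset=['size'])

#### 4.5 Converting character age to a numeric type

In [None]:
# Strip the "ans" suffix and any spaces from the raw age strings
df['age'] = df['age'].str.replace('ans', '', regex=False)
df['age'] = df['age'].str.replace(' ', '', regex=False)
# Anything that is still not a number (including empty strings) becomes NaN
df['age'] = pd.to_numeric(df['age'], errors='coerce')
# Keep only characters with a known age, then store ages as integers
df = df.dropna(subset=['age'])
df['age'] = df['age'].astype('int64')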
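The height and age conversions follow the same pattern: strip a unit, remove spaces, parse with `errors='coerce'`, then drop what could not be parsed. A small helper can make that pattern reusable. The function name `to_number` and the sample values are hypothetical, chosen only to illustrate the idea:

```python
import pandas as pd

def to_number(series: pd.Series, unit: str) -> pd.Series:
    """Hypothetical helper: strip a unit suffix and spaces, then parse as a number.

    Values that cannot be parsed (empty strings, None, free text) become NaN.
    """
    cleaned = (
        series.str.replace(unit, "", regex=False)
              .str.replace(" ", "", regex=False)
    )
    return pd.to_numeric(cleaned, errors="coerce")

# Hypothetical raw strings for both columns
sample = pd.DataFrame({"size": ["174 cm", "", "181cm"],
                       "age": ["19 ans", "21 ans", None]})
sample["size"] = to_number(sample["size"], "cm")
sample["age"] = to_number(sample["age"], "ans")

# Drop rows missing either value, then store ages as integers
sample = sample.dropna(subset=["size", "age"]).copy()
sample["age"] = sample["age"].astype("int64")
print(sample)
```

Only the first row survives here: the second has no height and the third has no age.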

#### 4.6 Removing outliers

In [None]:
# Drop three rows treated as outliers, selected by their index labels
df = df.drop([760, 602, 255])
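Removing rows by hard-coded index labels only works for this exact snapshot of the API. A data-driven alternative, swapped in here and not the method used in this notebook, is the interquartile-range (IQR) rule: flag values more than 1.5 × IQR below the first quartile or above the third. A minimal sketch on hypothetical heights (the helper name `iqr_outliers` is illustrative):

```python
import pandas as pd

def iqr_outliers(series: pd.Series, k: float = 1.5) -> pd.Series:
    """Return a boolean mask marking values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = series.quantile(0.25), series.quantile(0.75)
    iqr = q3 - q1
    return (series < q1 - k * iqr) | (series > q3 + k * iqr)

# Hypothetical heights in cm; 6000 is an obvious outlier
heights = pd.Series([170.0, 174.0, 176.0, 180.0, 181.0, 6000.0])
mask = iqr_outliers(heights)
print(heights[mask].tolist())
# -> [6000.0]
```

In this universe some extreme values are legitimate (giants such as Bartholomew Kuma, listed at 689 cm in the table below), so a statistical rule like this can also flag real characters, and its output would still need a manual review before dropping anything.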

In [247]:
df

| | name | size | age | bounty | job | crew.roman_name | fruit.type | fruit.roman_name |
|---|---|---|---|---|---|---|---|---|
| 0 | Monkey D Luffy | 174.0 | 19 | 3.000000e+09 | Captain | Mugiwara no Ichimi | Paramecia | Gomu Gomu no Mi |
| 1 | Roronoa Zoro | 181.0 | 21 | 3.200000e+08 | Right-hand man | Mugiwara no Ichimi | Don't have | Don't have |
| 2 | Nami | 170.0 | 20 | 6.600000e+07 | Navigator | Mugiwara no Ichimi | Don't have | Don't have |
| 3 | Usopp | 176.0 | 19 | 2.000000e+08 | Sniper | Mugiwara no Ichimi | Don't have | Don't have |
| 4 | Sanji | 180.0 | 21 | 3.300000e+08 | Cook | Mugiwara no Ichimi | Don't have | Don't have |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 689 | Bartholomew Kuma | 689.0 | 47 | 2.960000e+08 | Lieutenant | Don't have | Paramecia | Nikyu Nikyu no Mi |
| 690 | César Clown | 309.0 | 40 | 3.000000e+08 | | Don't have | Logia | Gasu Gasu no Mi |
| 691 | Morgans | 305.0 | 53 | 0.000000e+00 | World Economic Journal (boss) | Don't have | Zoan | Tori Tori no Mi Moderu Arubatorosu |
| 734 | Zéphyr | 348.0 | 72 | 0.000000e+00 | Chef | Don't have | Don't have | Don't have |


In [248]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 354 entries, 0 to 753
Data columns (total 8 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   name              354 non-null    object 
 1   size              354 non-null    float64
 2   age               354 non-null    int64  
 3   bounty            354 non-null    float64
 4   job               354 non-null    object 
 5   crew.roman_name   354 non-null    object 
 6   fruit.type        354 non-null    object 
 7   fruit.roman_name  354 non-null    object 
dtypes: float64(2), int64(1), object(5)
memory usage: 24.9+ KB


### 5. Saving the Dataset to a CSV File

In [None]:
# index=False keeps the DataFrame index out of the file
df.to_csv("one_piece_dataset.csv", index=False)
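Calling `to_csv` without `index=False` also writes the DataFrame index, which comes back as an extra `Unnamed: 0` column when the file is read again (the index here is non-contiguous after the dropped rows, so it carries no useful information). A quick round-trip check, using a temporary directory so it does not touch the real dataset; the two sample rows reuse values from the table above:

```python
import tempfile
from pathlib import Path

import pandas as pd

# Non-contiguous index, as after dropping rows
sample = pd.DataFrame({"name": ["Monkey D Luffy", "Nami"],
                       "bounty": [3.0e9, 6.6e7]},
                      index=[0, 2])

with tempfile.TemporaryDirectory() as tmp:
    with_index = Path(tmp) / "with_index.csv"
    without_index = Path(tmp) / "without_index.csv"
    sample.to_csv(with_index)                  # index written as an unnamed first column
    sample.to_csv(without_index, index=False)  # only the data columns

    cols_with = list(pd.read_csv(with_index).columns)
    cols_without = list(pd.read_csv(without_index).columns)

print(cols_with)     # -> ['Unnamed: 0', 'name', 'bounty']
print(cols_without)  # -> ['name', 'bounty']
```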