<p style ="text-align:center">
    <img src="http://epecora.com.br/DataFiles/BannerUFPR.png" width="700" alt="PPGOLD/PPGMNE Python:INTRO"  />
</p>

## Prof. Eduardo Pécora, Ph.D.

# **Pandas: Manipulating the DataFrame**

This notebook covers the following topics:

1. Creating new columns
2. Modifying specific values
3. Sorting a DataFrame

<a id="data"></a>
# **Reading Data**

Reading data is the first step in any data analysis workflow. Pandas makes it easy to import data from a variety of sources and formats into a DataFrame, the main data structure used throughout the analysis.

**CSV files**

CSV (Comma-Separated Values) is one of the most common data formats and is widely used to store tabular data.

In [45]:
import pandas as pd

nba_file = "https://raw.githubusercontent.com/EduPekUfpr/PythonProject/refs/heads/main/Dados/nba.csv"  # file path

df_nba = pd.read_csv(nba_file)
print(df_nba.head())  # Show the first 5 rows of the DataFrame

            Name            Team  Number Position   Age Height  Weight  \
0  Avery Bradley  Boston Celtics     0.0       PG  25.0    6-2   180.0   
1    Jae Crowder  Boston Celtics    99.0       SF  25.0    6-6   235.0   
2   John Holland  Boston Celtics    30.0       SG  27.0    6-5   205.0   
3    R.J. Hunter  Boston Celtics    28.0       SG  22.0    6-5   185.0   
4  Jonas Jerebko  Boston Celtics     8.0       PF  29.0   6-10   231.0   

             College     Salary  
0              Texas  7730337.0  
1          Marquette  6796117.0  
2  Boston University        NaN  
3      Georgia State  1148640.0  
4                NaN  5000000.0  
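
The same function reads any CSV source, not only a URL. As a minimal sketch, the snippet below parses a small in-memory CSV (made-up rows, not the NBA file) and illustrates two commonly used `read_csv` parameters, `sep` and `usecols`. The names `csv_text` and `df_sample` are hypothetical and exist only for this example.

```python
import io
import pandas as pd

# Hypothetical sample data, used only for illustration (semicolon-separated)
csv_text = "Name;Team;Age\nAna;Curitiba;25\nBruno;Londrina;31\n"

# sep sets the field delimiter; usecols restricts which columns are loaded
df_sample = pd.read_csv(io.StringIO(csv_text), sep=";", usecols=["Name", "Age"])
print(df_sample)
```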


## Modifying and Creating New Columns

Manipulating the columns of a DataFrame is a common task, and there are several ways to do it.

In [46]:
df_nba['Name_Pos'] = df_nba['Name'] + df_nba['Position']  # Add a new column computed from existing columns

print(df_nba.head())

            Name            Team  Number Position   Age Height  Weight  \
0  Avery Bradley  Boston Celtics     0.0       PG  25.0    6-2   180.0   
1    Jae Crowder  Boston Celtics    99.0       SF  25.0    6-6   235.0   
2   John Holland  Boston Celtics    30.0       SG  27.0    6-5   205.0   
3    R.J. Hunter  Boston Celtics    28.0       SG  22.0    6-5   185.0   
4  Jonas Jerebko  Boston Celtics     8.0       PF  29.0   6-10   231.0   

             College     Salary         Name_Pos  
0              Texas  7730337.0  Avery BradleyPG  
1          Marquette  6796117.0    Jae CrowderSF  
2  Boston University        NaN   John HollandSG  
3      Georgia State  1148640.0    R.J. HunterSG  
4                NaN  5000000.0  Jonas JerebkoPF  


In [47]:
df_nba['Name_Pos'] = df_nba['Name'] + " " + df_nba['Position']  # The expression can also include constant values

print(df_nba.head())

            Name            Team  Number Position   Age Height  Weight  \
0  Avery Bradley  Boston Celtics     0.0       PG  25.0    6-2   180.0   
1    Jae Crowder  Boston Celtics    99.0       SF  25.0    6-6   235.0   
2   John Holland  Boston Celtics    30.0       SG  27.0    6-5   205.0   
3    R.J. Hunter  Boston Celtics    28.0       SG  22.0    6-5   185.0   
4  Jonas Jerebko  Boston Celtics     8.0       PF  29.0   6-10   231.0   

             College     Salary          Name_Pos  
0              Texas  7730337.0  Avery Bradley PG  
1          Marquette  6796117.0    Jae Crowder SF  
2  Boston University        NaN   John Holland SG  
3      Georgia State  1148640.0    R.J. Hunter SG  
4                NaN  5000000.0  Jonas Jerebko PF  
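
One detail worth keeping in mind: when either operand of `+` is missing, the concatenated result for that row is also missing. The sketch below uses a made-up two-row DataFrame (`df_demo` is a hypothetical name) to show this, along with one possible workaround based on `fillna`.

```python
import pandas as pd

# Hypothetical rows for illustration; the second player has no position
df_demo = pd.DataFrame({"Name": ["Ana", "Bruno"], "Position": ["PG", None]})

# With +, a missing value on either side makes the whole result missing
plus_result = df_demo["Name"] + " " + df_demo["Position"]

# One alternative: fill the gaps with an empty string before concatenating
filled_result = df_demo["Name"] + " " + df_demo["Position"].fillna("")

print(plus_result)
print(filled_result)
```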


In [48]:
df_nba = df_nba.rename(columns={'Name_Pos': 'Name and Position'})  # rename columns
print(df_nba.head())

            Name            Team  Number Position   Age Height  Weight  \
0  Avery Bradley  Boston Celtics     0.0       PG  25.0    6-2   180.0   
1    Jae Crowder  Boston Celtics    99.0       SF  25.0    6-6   235.0   
2   John Holland  Boston Celtics    30.0       SG  27.0    6-5   205.0   
3    R.J. Hunter  Boston Celtics    28.0       SG  22.0    6-5   185.0   
4  Jonas Jerebko  Boston Celtics     8.0       PF  29.0   6-10   231.0   

             College     Salary Name and Position  
0              Texas  7730337.0  Avery Bradley PG  
1          Marquette  6796117.0    Jae Crowder SF  
2  Boston University        NaN   John Holland SG  
3      Georgia State  1148640.0    R.J. Hunter SG  
4                NaN  5000000.0  Jonas Jerebko PF  


In [49]:
df_nba = df_nba.drop(columns=['Name and Position'])  # drop columns
print(df_nba.head())

            Name            Team  Number Position   Age Height  Weight  \
0  Avery Bradley  Boston Celtics     0.0       PG  25.0    6-2   180.0   
1    Jae Crowder  Boston Celtics    99.0       SF  25.0    6-6   235.0   
2   John Holland  Boston Celtics    30.0       SG  27.0    6-5   205.0   
3    R.J. Hunter  Boston Celtics    28.0       SG  22.0    6-5   185.0   
4  Jonas Jerebko  Boston Celtics     8.0       PF  29.0   6-10   231.0   

             College     Salary  
0              Texas  7730337.0  
1          Marquette  6796117.0  
2  Boston University        NaN  
3      Georgia State  1148640.0  
4                NaN  5000000.0  


In [76]:
df_nba['Weight_Kg'] = df_nba['Weight'].apply(lambda x: x * 0.453592)  # Apply a function to a column (pounds to kilograms)
print(df_nba['Weight_Kg'].head())

0     81.646560
1    106.594120
2     92.986360
3     83.914520
4    104.779752
Name: Weight_Kg, dtype: float64


In [51]:
df_nba['Weight_Kg'] = df_nba['Weight'] * 0.453592  # Same conversion without apply (vectorized arithmetic)
print(df_nba['Weight_Kg'].head())

0     81.646560
1    106.594120
2     92.986360
3     83.914520
4    104.779752
Name: Weight_Kg, dtype: float64


In [52]:
import time

t = time.time()
for i in range(1000):
    df_nba['Weight_Kg'] = df_nba['Weight'].apply(lambda x: x * 0.453592)
print("Time with apply:", time.time() - t)

t = time.time()
for i in range(1000):
    df_nba['Weight_Kg'] = df_nba['Weight'] * 0.453592
print("Time without apply:", time.time() - t)

# Execution times with and without apply

Time with apply: 0.44330430030822754
Time without apply: 0.26224660873413086
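
The comparison above can also be sketched with the standard library's `timeit` module, which handles the repetition loop. This is only an illustrative variant: the Series `weights` is synthetic (random values, not the NBA data), the sizes and repetition counts are arbitrary, and actual timings vary by machine.

```python
import timeit

import numpy as np
import pandas as pd

# Synthetic weights in pounds (made-up data, only for timing purposes)
weights = pd.Series(np.random.default_rng(0).uniform(150, 300, size=10_000))

# timeit runs each callable `number` times and returns the total time in seconds
t_apply = timeit.timeit(lambda: weights.apply(lambda x: x * 0.453592), number=20)
t_vector = timeit.timeit(lambda: weights * 0.453592, number=20)

print(f"apply:      {t_apply:.4f} s")
print(f"vectorized: {t_vector:.4f} s")
```

Both expressions produce the same values; the vectorized form avoids calling a Python function once per element.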


In [53]:
# Split 'Height' (feet-inches, e.g. '6-2') into two integer columns and convert to centimeters

df_nba[['feet', 'inches']] = df_nba['Height'].str.split('-', expand=True).astype(int)
df_nba['feet_cm'] = df_nba['feet'] * 30.48
df_nba['inches_cm'] = df_nba['inches'] * 2.54
df_nba['height_cm'] = df_nba['feet_cm'] + df_nba['inches_cm']

df_nba.head()

|   | Name | Team | Number | Position | Age | Height | Weight | College | Salary | Weight_Kg | feet | inches | feet_cm | inches_cm | height_cm |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | Avery Bradley | Boston Celtics | 0.0 | PG | 25.0 | 6-2 | 180.0 | Texas | 7730337.0 | 81.64656 | 6 | 2 | 182.88 | 5.08 | 187.96 |
| 1 | Jae Crowder | Boston Celtics | 99.0 | SF | 25.0 | 6-6 | 235.0 | Marquette | 6796117.0 | 106.59412 | 6 | 6 | 182.88 | 15.24 | 198.12 |
| 2 | John Holland | Boston Celtics | 30.0 | SG | 27.0 | 6-5 | 205.0 | Boston University | NaN | 92.98636 | 6 | 5 | 182.88 | 12.7 | 195.58 |
| 3 | R.J. Hunter | Boston Celtics | 28.0 | SG | 22.0 | 6-5 | 185.0 | Georgia State | 1148640.0 | 83.91452 | 6 | 5 | 182.88 | 12.7 | 195.58 |
| 4 | Jonas Jerebko | Boston Celtics | 8.0 | PF | 29.0 | 6-10 | 231.0 | NaN | 5000000.0 | 104.779752 | 6 | 10 | 182.88 | 25.4 | 208.28 |
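
Note that `astype(int)` raises an error if any `Height` entry is missing. A minimal sketch of a more tolerant variant follows, assuming the same "feet-inches" text format; the Series `heights` is made up for illustration, and `pd.to_numeric` with `errors="coerce"` is one possible choice, not necessarily the approach used in this course.

```python
import pandas as pd

# Hypothetical heights in the same "feet-inches" format, with one missing value
heights = pd.Series(["6-2", "6-10", None])

# Split into two columns; the missing row stays missing in both
parts = heights.str.split("-", expand=True)

# to_numeric returns floats here, so NaN is preserved instead of raising an error
feet = pd.to_numeric(parts[0], errors="coerce")
inches = pd.to_numeric(parts[1], errors="coerce")

# 1 foot = 30.48 cm, 1 inch = 2.54 cm
height_cm = feet * 30.48 + inches * 2.54
print(height_cm)
```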


In [54]:
df_nba.drop(columns=['feet', 'inches', 'feet_cm', 'inches_cm'], inplace=True)  # drop the helper columns
df_nba.head()

|   | Name | Team | Number | Position | Age | Height | Weight | College | Salary | Weight_Kg | height_cm |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | Avery Bradley | Boston Celtics | 0.0 | PG | 25.0 | 6-2 | 180.0 | Texas | 7730337.0 | 81.64656 | 187.96 |
| 1 | Jae Crowder | Boston Celtics | 99.0 | SF | 25.0 | 6-6 | 235.0 | Marquette | 6796117.0 | 106.59412 | 198.12 |
| 2 | John Holland | Boston Celtics | 30.0 | SG | 27.0 | 6-5 | 205.0 | Boston University | NaN | 92.98636 | 195.58 |
| 3 | R.J. Hunter | Boston Celtics | 28.0 | SG | 22.0 | 6-5 | 185.0 | Georgia State | 1148640.0 | 83.91452 | 195.58 |
| 4 | Jonas Jerebko | Boston Celtics | 8.0 | PF | 29.0 | 6-10 | 231.0 | NaN | 5000000.0 | 104.779752 | 208.28 |


### Modifying specific values

In [78]:
df_nba.at[3,"Age"]

22.0

In [80]:
df_nba.at[3,"Age"] = 25
df_nba.head()

|   | Name | Team | Number | Position | Age | Height | Weight | College | Salary | Weight_Kg |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | Avery Bradley | Boston Celtics | 0.0 | PG | 25.0 | 6-2 | 180.0 | Texas | 7730337.0 | 81.64656 |
| 1 | Jae Crowder | Boston Celtics | 99.0 | SF | 25.0 | 6-6 | 235.0 | Marquette | 6796117.0 | 106.59412 |
| 2 | John Holland | Boston Celtics | 30.0 | SG | 27.0 | 6-5 | 205.0 | Boston University | 4842684.0 | 92.98636 |
| 3 | R.J. Hunter | Boston Celtics | 28.0 | SG | 25.0 | 6-5 | 185.0 | Georgia State | 1148640.0 | 83.91452 |
| 4 | Jonas Jerebko | Boston Celtics | 8.0 | PF | 29.0 | 6-10 | 231.0 | UFPR | 5000000.0 | 104.779752 |


In [84]:
# Add 2 to the 'Age' column for rows where the team is the Boston Celtics
df_nba.loc[df_nba['Team'] == "Boston Celtics", 'Age'] += 2
df_nba.head()

|   | Name | Team | Number | Position | Age | Height | Weight | College | Salary | Weight_Kg |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | Avery Bradley | Boston Celtics | 0.0 | PG | 27.0 | 6-2 | 180.0 | Texas | 7730337.0 | 81.64656 |
| 1 | Jae Crowder | Boston Celtics | 99.0 | SF | 27.0 | 6-6 | 235.0 | Marquette | 6796117.0 | 106.59412 |
| 2 | John Holland | Boston Celtics | 30.0 | SG | 29.0 | 6-5 | 205.0 | Boston University | 4842684.0 | 92.98636 |
| 3 | R.J. Hunter | Boston Celtics | 28.0 | SG | 27.0 | 6-5 | 185.0 | Georgia State | 1148640.0 | 83.91452 |
| 4 | Jonas Jerebko | Boston Celtics | 8.0 | PF | 31.0 | 6-10 | 231.0 | UFPR | 5000000.0 | 104.779752 |
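
The two accessors used in this section serve different purposes: `.at` addresses exactly one cell, while `.loc` with a boolean mask updates every row that matches a condition. A minimal sketch on a made-up roster (the DataFrame `df_demo` and its values are hypothetical):

```python
import pandas as pd

# Hypothetical mini-roster for illustration
df_demo = pd.DataFrame({
    "Name": ["Ana", "Bruno", "Carla"],
    "Team": ["Red", "Blue", "Red"],
    "Age": [25, 31, 28],
})

# .at: read or write one cell, addressed by row label and column label
df_demo.at[1, "Age"] = 32

# .loc with a boolean mask: update every row that satisfies the condition
is_red = df_demo["Team"] == "Red"
df_demo.loc[is_red, "Age"] += 2

print(df_demo)
```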


## Reorganizing Data

Reorganizing and sorting the data may be necessary to make analysis or visualization easier.

In [55]:
df_sorted = df_nba.sort_values(by='Position', ascending=True)  # Ascending order; use ascending=False for descending
df_sorted.head()

|   | Name | Team | Number | Position | Age | Height | Weight | College | Salary | Weight_Kg | height_cm |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 456 | Jeff Withey | Utah Jazz | 24.0 | C | 26.0 | 7-0 | 231.0 | Kansas | 947276.0 | 104.779752 | 213.36 |
| 135 | Alan Williams | Phoenix Suns | 15.0 | C | 23.0 | 6-8 | 260.0 | UC Santa Barbara | 83397.0 | 117.93392 | 203.2 |
| 321 | Tiago Splitter | Atlanta Hawks | 11.0 | C | 31.0 | 6-11 | 245.0 | NaN | 9756250.0 | 111.13004 | 210.82 |
| 128 | Alex Len | Phoenix Suns | 21.0 | C | 22.0 | 7-1 | 260.0 | Maryland | 3807120.0 | 117.93392 | 215.9 |
| 322 | Walter Tavares | Atlanta Hawks | 22.0 | C | 24.0 | 7-3 | 260.0 | NaN | 1000000.0 | 117.93392 | 220.98 |


In [56]:
df_sorted = df_nba.sort_values(by=['Position', 'Age'], ascending=[True, False])  # Mixed order: sorts by the first column, then breaks ties with the second
df_sorted.head()

|   | Name | Team | Number | Position | Age | Height | Weight | College | Salary | Weight_Kg | height_cm |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 298 | Tim Duncan | San Antonio Spurs | 21.0 | C | 40.0 | 6-11 | 250.0 | Wake Forest | 5250000.0 | 113.398 | 210.82 |
| 420 | Nazr Mohammed | Oklahoma City Thunder | 13.0 | C | 38.0 | 6-10 | 250.0 | Kentucky | 222888.0 | 113.398 | 208.28 |
| 296 | Matt Bonner | San Antonio Spurs | 15.0 | C | 36.0 | 6-10 | 235.0 | Florida | 947276.0 | 106.59412 | 208.28 |
| 156 | Pau Gasol | Chicago Bulls | 16.0 | C | 35.0 | 7-0 | 250.0 | NaN | 7448760.0 | 113.398 | 213.36 |
| 297 | Boris Diaw | San Antonio Spurs | 33.0 | C | 34.0 | 6-8 | 250.0 | NaN | 7500000.0 | 113.398 | 203.2 |


In [57]:
df_reset = df_sorted.reset_index(drop=True)  # reset the index to 0..n-1, discarding the old labels
df_reset.head()

|   | Name | Team | Number | Position | Age | Height | Weight | College | Salary | Weight_Kg | height_cm |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | Tim Duncan | San Antonio Spurs | 21.0 | C | 40.0 | 6-11 | 250.0 | Wake Forest | 5250000.0 | 113.398 | 210.82 |
| 1 | Nazr Mohammed | Oklahoma City Thunder | 13.0 | C | 38.0 | 6-10 | 250.0 | Kentucky | 222888.0 | 113.398 | 208.28 |
| 2 | Matt Bonner | San Antonio Spurs | 15.0 | C | 36.0 | 6-10 | 235.0 | Florida | 947276.0 | 106.59412 | 208.28 |
| 3 | Pau Gasol | Chicago Bulls | 16.0 | C | 35.0 | 7-0 | 250.0 | NaN | 7448760.0 | 113.398 | 213.36 |
| 4 | Boris Diaw | San Antonio Spurs | 33.0 | C | 34.0 | 6-8 | 250.0 | NaN | 7500000.0 | 113.398 | 203.2 |
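
The interaction between sorting and the index is easier to see on a tiny example. In the sketch below (made-up data; `df_demo` and the derived names are hypothetical), `sort_values` returns a new DataFrame whose rows keep their original labels, and `reset_index(drop=True)` renumbers them.

```python
import pandas as pd

# Hypothetical mini-roster for illustration
df_demo = pd.DataFrame({
    "Name": ["Ana", "Bruno", "Carla", "Davi"],
    "Position": ["PG", "C", "PG", "C"],
    "Age": [25, 31, 28, 22],
})

# Position ascending, then Age descending within each position
df_sorted_demo = df_demo.sort_values(by=["Position", "Age"], ascending=[True, False])
print(df_sorted_demo.index.tolist())  # the original row labels travel with the rows

# drop=True discards the old labels instead of keeping them as a new column
df_reset_demo = df_sorted_demo.reset_index(drop=True)
print(df_reset_demo)
```

The original `df_demo` is left unchanged, since neither call modifies it in place.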


In [58]:
df_reindexed = df_nba.set_index('Name')  # use the chosen column as the index
df_reindexed.head()

| Name | Team | Number | Position | Age | Height | Weight | College | Salary | Weight_Kg | height_cm |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Avery Bradley | Boston Celtics | 0.0 | PG | 25.0 | 6-2 | 180.0 | Texas | 7730337.0 | 81.64656 | 187.96 |
| Jae Crowder | Boston Celtics | 99.0 | SF | 25.0 | 6-6 | 235.0 | Marquette | 6796117.0 | 106.59412 | 198.12 |
| John Holland | Boston Celtics | 30.0 | SG | 27.0 | 6-5 | 205.0 | Boston University | NaN | 92.98636 | 195.58 |
| R.J. Hunter | Boston Celtics | 28.0 | SG | 22.0 | 6-5 | 185.0 | Georgia State | 1148640.0 | 83.91452 | 195.58 |
| Jonas Jerebko | Boston Celtics | 8.0 | PF | 29.0 | 6-10 | 231.0 | NaN | 5000000.0 | 104.779752 | 208.28 |


In [59]:
df_team_pos = df_reindexed[['Team', 'Position']]  # keep only these two columns
df_team_pos.head()

| Name | Team | Position |
| --- | --- | --- |
| Avery Bradley | Boston Celtics | PG |
| Jae Crowder | Boston Celtics | SF |
| John Holland | Boston Celtics | SG |
| R.J. Hunter | Boston Celtics | SG |
| Jonas Jerebko | Boston Celtics | PF |
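
Once a column becomes the index, rows can be looked up by label instead of by position. A minimal sketch on made-up data (the names `df_demo`, `by_name`, and `team_of_carla` are hypothetical):

```python
import pandas as pd

# Hypothetical mini-roster for illustration
df_demo = pd.DataFrame({
    "Name": ["Ana", "Bruno", "Carla"],
    "Team": ["Red", "Blue", "Red"],
    "Position": ["PG", "C", "SF"],
})

# 'Name' moves out of the columns and becomes the row labels
by_name = df_demo.set_index("Name")

# Look up a whole row by its label
print(by_name.loc["Bruno"])

# Look up one cell by row label and column label
team_of_carla = by_name.at["Carla", "Team"]
print(team_of_carla)
```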


### Transposing the DataFrame

In [60]:
df_transposta = df_team_pos.T  # turns rows into columns and columns into rows
print(df_transposta)

Name       Avery Bradley     Jae Crowder    John Holland     R.J. Hunter  \
Team      Boston Celtics  Boston Celtics  Boston Celtics  Boston Celtics   
Position              PG              SF              SG              SG   

Name       Jonas Jerebko    Amir Johnson   Jordan Mickey    Kelly Olynyk  \
Team      Boston Celtics  Boston Celtics  Boston Celtics  Boston Celtics   
Position              PF              PF              PF               C   

Name        Terry Rozier    Marcus Smart  ... Rudy Gobert Gordon Hayward  \
Team      Boston Celtics  Boston Celtics  ...   Utah Jazz      Utah Jazz   
Position              PG              PG  ...           C             SF   

Name     Rodney Hood Joe Ingles Chris Johnson Trey Lyles Shelvin Mack  \
Team       Utah Jazz  Utah Jazz     Utah Jazz  Utah Jazz    Utah Jazz   
Position          SG         SF            SF         PF           PG   

Name      Raul Neto Tibor Pleiss Jeff Withey  
Team      Utah Jazz    Utah Jazz   Utah Jazz  
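
Because the full roster produces a very wide transposed table, the effect of `.T` is easier to follow on a small example. The sketch below uses a made-up two-player DataFrame (`df_demo` and `df_t` are hypothetical names) and prints the shape before and after.

```python
import pandas as pd

# Hypothetical two-player table indexed by name, for illustration only
df_demo = pd.DataFrame(
    {"Team": ["Red", "Blue"], "Position": ["PG", "C"]},
    index=pd.Index(["Ana", "Bruno"], name="Name"),
)

df_t = df_demo.T  # rows become columns and columns become rows
print(df_t)
print(df_demo.shape, "->", df_t.shape)
```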

## Stay Connected

- [![YouTube](https://img.icons8.com/?size=40&id=19318&format=png&color=000000)](https://www.youtube.com/@LigaDataScience/videos)  
  Explore our educational videos and webinars on data science, machine learning, and artificial intelligence. Subscribe so you don't miss any updates!

- [![LinkedIn](https://img.icons8.com/?size=40&id=13930&format=png&color=000000)](https://www.linkedin.com/company/liga-data-science-ufpr/)  
  Follow us on LinkedIn for the latest news, career opportunities, and professional networking in the field of data science.

- [![Instagram](https://img.icons8.com/?size=40&id=32323&format=png&color=000000)](https://www.instagram.com/ligadatascience/)  
  Check out our Instagram for behind-the-scenes content, event highlights, and day-to-day life at Liga Data Science. Join us on our journey!
  
## Authors

<a href="https://www.linkedin.com/in/eduardopecora/" target="_blank">Eduardo Pecora</a>

<a href="https://www.linkedin.com/in/jo%C3%A3o-gabriel-santin-botelho-618244222/" target="_blank">João Gabriel Santin Botelho</a>

## Change Log

| Date | Version | Modified by | Description |
| ----------------- | ------- | ---------- | ---------------------------------- |
| 29-08-2024       | 1.0     | Eduardo Pecora & João Gabriel| Initial               |

<hr>

<h3 align="center"> (c) Liga Data Science / UFPR 2024. All rights reserved. </h3>