# Pandas

- Pandas es una librería para manipular y analizar datos.
- Es una interfaz de alto nivel para procesar datos tabulares en varias formatos.
- Proporciona una amplia gama de herramientas para leer y procesar datos.
- Está diseñado para ser rápido, eficiente y fácil de usar.

In [1]:
import numpy as np
import pandas as pd
from datetime import date

## Cargar datos des un archivo
### cargar un archivo csv

In [154]:
df = pd.read_csv('nba_players.csv')
df
# Tambien se pueden cargar solo cietos campos
#df = pd.read_csv('nba_players.csv', usecols=['Name', 'Age'])
#df.head()


Unnamed: 0,Name,Team,Number,Position,Age,Height,Weight,College,Salary
0,Avery Bradley,Boston Celtics,0.0,PG,25.0,6-2,180.0,Texas,7730337.0
1,Jae Crowder,Boston Celtics,99.0,SF,25.0,6-6,235.0,Marquette,6796117.0
2,John Holland,Boston Celtics,30.0,SG,27.0,6-5,205.0,Boston University,
3,R.J. Hunter,Boston Celtics,28.0,SG,22.0,6-5,185.0,Georgia State,1148640.0
4,Jonas Jerebko,Boston Celtics,8.0,PF,29.0,6-10,231.0,,5000000.0
...,...,...,...,...,...,...,...,...,...
452,Trey Lyles,Utah Jazz,41.0,PF,20.0,6-10,234.0,Kentucky,2239800.0
453,Shelvin Mack,Utah Jazz,8.0,PG,26.0,6-3,203.0,Butler,2433333.0
454,Raul Neto,Utah Jazz,25.0,PG,24.0,6-1,179.0,,900000.0
455,Tibor Pleiss,Utah Jazz,21.0,C,26.0,7-3,256.0,,2900000.0


### Lectura de datos de un dataframe

In [13]:
'''
Muestra las cabeceres del dataframe df
'''
print(df.columns)

Index(['Name', 'Team', 'Number', 'Position', 'Age', 'Height', 'Weight',
       'College', 'Salary'],
      dtype='object')


In [22]:
'''
Muestra las columnas Name y Age del dataframe df
'''
df[['Name','Age']]

Unnamed: 0,Name,Age
0,Avery Bradley,25.0
1,Jae Crowder,25.0
2,John Holland,27.0
3,R.J. Hunter,22.0
4,Jonas Jerebko,29.0
...,...,...
452,Trey Lyles,20.0
453,Shelvin Mack,26.0
454,Raul Neto,24.0
455,Tibor Pleiss,26.0


In [26]:
'''
Muestra las primeras 5 filas del dataframe df
'''
df.head()

Unnamed: 0,Name,Team,Number,Position,Age,Height,Weight,College,Salary
0,Avery Bradley,Boston Celtics,0.0,PG,25.0,6-2,180.0,Texas,7730337.0
1,Jae Crowder,Boston Celtics,99.0,SF,25.0,6-6,235.0,Marquette,6796117.0
2,John Holland,Boston Celtics,30.0,SG,27.0,6-5,205.0,Boston University,
3,R.J. Hunter,Boston Celtics,28.0,SG,22.0,6-5,185.0,Georgia State,1148640.0
4,Jonas Jerebko,Boston Celtics,8.0,PF,29.0,6-10,231.0,,5000000.0


In [27]:
'''
Muestra 10 filas del dataframe df
'''
df.iloc[10:20]

Unnamed: 0,Name,Team,Number,Position,Age,Height,Weight,College,Salary
10,Jared Sullinger,Boston Celtics,7.0,C,24.0,6-9,260.0,Ohio State,2569260.0
11,Isaiah Thomas,Boston Celtics,4.0,PG,27.0,5-9,185.0,Washington,6912869.0
12,Evan Turner,Boston Celtics,11.0,SG,27.0,6-7,220.0,Ohio State,3425510.0
13,James Young,Boston Celtics,13.0,SG,20.0,6-6,215.0,Kentucky,1749840.0
14,Tyler Zeller,Boston Celtics,44.0,C,26.0,7-0,253.0,North Carolina,2616975.0
15,Bojan Bogdanovic,Brooklyn Nets,44.0,SG,27.0,6-8,216.0,,3425510.0
16,Markel Brown,Brooklyn Nets,22.0,SG,24.0,6-3,190.0,Oklahoma State,845059.0
17,Wayne Ellington,Brooklyn Nets,21.0,SG,28.0,6-4,200.0,North Carolina,1500000.0
18,Rondae Hollis-Jefferson,Brooklyn Nets,24.0,SG,21.0,6-7,220.0,Arizona,1335480.0
19,Jarrett Jack,Brooklyn Nets,2.0,PG,32.0,6-3,200.0,Georgia Tech,6300000.0


### Mostrar las filas en función de una condición

In [43]:
'''
Muestra las primeras los jugadores que tengan una edad igual o mayor a 39
'''
df[df['Age'] >= 39]

Unnamed: 0,Name,Team,Number,Position,Age,Height,Weight,College,Salary
102,Pablo Prigioni,Los Angeles Clippers,9.0,PG,39.0,6-3,185.0,,947726.0
261,Vince Carter,Memphis Grizzlies,15.0,SG,39.0,6-6,220.0,North Carolina,4088019.0
298,Tim Duncan,San Antonio Spurs,21.0,C,40.0,6-11,250.0,Wake Forest,5250000.0
304,Andre Miller,San Antonio Spurs,24.0,PG,40.0,6-3,200.0,Utah,250750.0
400,Kevin Garnett,Minnesota Timberwolves,21.0,PF,40.0,6-11,240.0,,8500000.0


In [46]:
'''
Muestra los jugadores con un salario mayor a 900000.0
'''
df[df['Salary'] > 900000]

Unnamed: 0,Name,Team,Number,Position,Age,Height,Weight,College,Salary
0,Avery Bradley,Boston Celtics,0.0,PG,25.0,6-2,180.0,Texas,7730337.0
1,Jae Crowder,Boston Celtics,99.0,SF,25.0,6-6,235.0,Marquette,6796117.0
3,R.J. Hunter,Boston Celtics,28.0,SG,22.0,6-5,185.0,Georgia State,1148640.0
4,Jonas Jerebko,Boston Celtics,8.0,PF,29.0,6-10,231.0,,5000000.0
5,Amir Johnson,Boston Celtics,90.0,PF,29.0,6-9,240.0,,12000000.0
...,...,...,...,...,...,...,...,...,...
451,Chris Johnson,Utah Jazz,23.0,SF,26.0,6-6,206.0,Dayton,981348.0
452,Trey Lyles,Utah Jazz,41.0,PF,20.0,6-10,234.0,Kentucky,2239800.0
453,Shelvin Mack,Utah Jazz,8.0,PG,26.0,6-3,203.0,Butler,2433333.0
455,Tibor Pleiss,Utah Jazz,21.0,C,26.0,7-3,256.0,,2900000.0


In [50]:
'''
Muestra los jugadores de los Minnesota Timberwolves
'''
df[ df['Team'] == 'Minnesota Timberwolves']

Unnamed: 0,Name,Team,Number,Position,Age,Height,Weight,College,Salary
398,Nemanja Bjelica,Minnesota Timberwolves,88.0,PF,28.0,6-10,240.0,,3950001.0
399,Gorgui Dieng,Minnesota Timberwolves,5.0,C,26.0,6-11,241.0,Louisville,1474440.0
400,Kevin Garnett,Minnesota Timberwolves,21.0,PF,40.0,6-11,240.0,,8500000.0
401,Tyus Jones,Minnesota Timberwolves,1.0,PG,20.0,6-2,195.0,Duke,1282080.0
402,Zach LaVine,Minnesota Timberwolves,8.0,PG,21.0,6-5,189.0,UCLA,2148360.0
403,Shabazz Muhammad,Minnesota Timberwolves,15.0,SF,23.0,6-6,223.0,UCLA,2056920.0
404,Adreian Payne,Minnesota Timberwolves,33.0,PF,25.0,6-10,237.0,Michigan State,1938840.0
405,Nikola Pekovic,Minnesota Timberwolves,14.0,C,30.0,6-11,307.0,,12100000.0
406,Tayshaun Prince,Minnesota Timberwolves,12.0,SF,36.0,6-9,212.0,Kentucky,947276.0
407,Ricky Rubio,Minnesota Timberwolves,9.0,PG,25.0,6-4,194.0,,12700000.0


#### Operaciones con los datos

In [51]:
df.describe()

Unnamed: 0,Number,Age,Weight,Salary
count,457.0,457.0,457.0,446.0
mean,17.678337,26.938731,221.522976,4842684.0
std,15.96609,4.404016,26.368343,5229238.0
min,0.0,19.0,161.0,30888.0
25%,5.0,24.0,200.0,1044792.0
50%,13.0,26.0,220.0,2839073.0
75%,25.0,30.0,240.0,6500000.0
max,99.0,40.0,307.0,25000000.0


In [53]:
'''
Muestra el dataframe ordenado por el atributo Name
'''
df.sort_values('Name')

Unnamed: 0,Name,Team,Number,Position,Age,Height,Weight,College,Salary
152,Aaron Brooks,Chicago Bulls,0.0,PG,31.0,6-0,161.0,Oregon,2250000.0
356,Aaron Gordon,Orlando Magic,0.0,PF,20.0,6-9,220.0,Arizona,4171680.0
328,Aaron Harrison,Charlotte Hornets,9.0,SG,21.0,6-6,210.0,Kentucky,525093.0
404,Adreian Payne,Minnesota Timberwolves,33.0,PF,25.0,6-10,237.0,Michigan State,1938840.0
312,Al Horford,Atlanta Hawks,15.0,C,30.0,6-10,245.0,Florida,12000000.0
...,...,...,...,...,...,...,...,...,...
386,Wilson Chandler,Denver Nuggets,21.0,SF,29.0,6-8,225.0,DePaul,10449438.0
270,Xavier Munford,Memphis Grizzlies,14.0,PG,24.0,6-3,180.0,Rhode Island,
402,Zach LaVine,Minnesota Timberwolves,8.0,PG,21.0,6-5,189.0,UCLA,2148360.0
271,Zach Randolph,Memphis Grizzlies,50.0,PF,34.0,6-9,260.0,Michigan State,9638555.0


In [54]:
df.sort_values('Name', ascending=False)

Unnamed: 0,Name,Team,Number,Position,Age,Height,Weight,College,Salary
237,Zaza Pachulia,Dallas Mavericks,27.0,C,32.0,6-11,275.0,,5200000.0
271,Zach Randolph,Memphis Grizzlies,50.0,PF,34.0,6-9,260.0,Michigan State,9638555.0
402,Zach LaVine,Minnesota Timberwolves,8.0,PG,21.0,6-5,189.0,UCLA,2148360.0
270,Xavier Munford,Memphis Grizzlies,14.0,PG,24.0,6-3,180.0,Rhode Island,
386,Wilson Chandler,Denver Nuggets,21.0,SF,29.0,6-8,225.0,DePaul,10449438.0
...,...,...,...,...,...,...,...,...,...
312,Al Horford,Atlanta Hawks,15.0,C,30.0,6-10,245.0,Florida,12000000.0
404,Adreian Payne,Minnesota Timberwolves,33.0,PF,25.0,6-10,237.0,Michigan State,1938840.0
328,Aaron Harrison,Charlotte Hornets,9.0,SG,21.0,6-6,210.0,Kentucky,525093.0
356,Aaron Gordon,Orlando Magic,0.0,PF,20.0,6-9,220.0,Arizona,4171680.0


In [56]:
'''
Muestra el dataframe ordenado por el atributo Name y Team
'''
df.sort_values(['Name', 'Team'])

Unnamed: 0,Name,Team,Number,Position,Age,Height,Weight,College,Salary
152,Aaron Brooks,Chicago Bulls,0.0,PG,31.0,6-0,161.0,Oregon,2250000.0
356,Aaron Gordon,Orlando Magic,0.0,PF,20.0,6-9,220.0,Arizona,4171680.0
328,Aaron Harrison,Charlotte Hornets,9.0,SG,21.0,6-6,210.0,Kentucky,525093.0
404,Adreian Payne,Minnesota Timberwolves,33.0,PF,25.0,6-10,237.0,Michigan State,1938840.0
312,Al Horford,Atlanta Hawks,15.0,C,30.0,6-10,245.0,Florida,12000000.0
...,...,...,...,...,...,...,...,...,...
386,Wilson Chandler,Denver Nuggets,21.0,SF,29.0,6-8,225.0,DePaul,10449438.0
270,Xavier Munford,Memphis Grizzlies,14.0,PG,24.0,6-3,180.0,Rhode Island,
402,Zach LaVine,Minnesota Timberwolves,8.0,PG,21.0,6-5,189.0,UCLA,2148360.0
271,Zach Randolph,Memphis Grizzlies,50.0,PF,34.0,6-9,260.0,Michigan State,9638555.0


In [58]:
df.sort_values(['Team', 'Name'])

Unnamed: 0,Name,Team,Number,Position,Age,Height,Weight,College,Salary
312,Al Horford,Atlanta Hawks,15.0,C,30.0,6-10,245.0,Florida,12000000.0
318,Dennis Schroder,Atlanta Hawks,17.0,PG,22.0,6-1,172.0,,1763400.0
323,Jeff Teague,Atlanta Hawks,0.0,PG,27.0,6-2,186.0,Wake Forest,8000000.0
309,Kent Bazemore,Atlanta Hawks,24.0,SF,26.0,6-5,201.0,Old Dominion,2000000.0
311,Kirk Hinrich,Atlanta Hawks,12.0,SG,35.0,6-4,190.0,Kansas,2854940.0
...,...,...,...,...,...,...,...,...,...
381,Marcus Thornton,Washington Wizards,15.0,SF,29.0,6-4,205.0,LSU,200600.0
376,Markieff Morris,Washington Wizards,5.0,PF,26.0,6-10,245.0,Kansas,8000000.0
375,Nene Hilario,Washington Wizards,42.0,C,33.0,6-11,250.0,,13000000.0
378,Otto Porter Jr.,Washington Wizards,22.0,SF,23.0,6-8,198.0,Georgetown,4662960.0


In [62]:
'''
Ordena alfabetiticamente los equipos y dentro de cada equipo ordena alfabeticamente por nombre (descendente)
'''
df.sort_values(['Team', 'Name'], ascending=[True,False])

Unnamed: 0,Name,Team,Number,Position,Age,Height,Weight,College,Salary
322,Walter Tavares,Atlanta Hawks,22.0,C,24.0,7-3,260.0,,1000000.0
310,Tim Hardaway Jr.,Atlanta Hawks,10.0,SG,24.0,6-6,205.0,Michigan,1304520.0
321,Tiago Splitter,Atlanta Hawks,11.0,C,31.0,6-11,245.0,,9756250.0
320,Thabo Sefolosha,Atlanta Hawks,25.0,SF,32.0,6-7,220.0,,4000000.0
315,Paul Millsap,Atlanta Hawks,4.0,PF,31.0,6-8,246.0,Louisiana Tech,18671659.0
...,...,...,...,...,...,...,...,...,...
374,JJ Hickson,Washington Wizards,21.0,C,27.0,6-9,242.0,North Carolina State,273038.0
380,Garrett Temple,Washington Wizards,17.0,SG,30.0,6-6,195.0,LSU,1100602.0
372,Drew Gooden,Washington Wizards,90.0,PF,34.0,6-10,250.0,Kansas,3300000.0
369,Bradley Beal,Washington Wizards,3.0,SG,22.0,6-5,207.0,Florida,5694674.0


#### Modificar el dataframe

In [68]:
'''
Añade una columna al dataframe df con el año de nacimiento del jugador
'''


df['Nacimiento'] = date.today().year - df['Age']

df['Nacimiento'] = df['Nacimiento'].astype(int)
df.head()

Unnamed: 0,Name,Team,Number,Position,Age,Height,Weight,College,Salary,Nacimiento
0,Avery Bradley,Boston Celtics,0.0,PG,25.0,6-2,180.0,Texas,7730337.0,1997
1,Jae Crowder,Boston Celtics,99.0,SF,25.0,6-6,235.0,Marquette,6796117.0,1997
2,John Holland,Boston Celtics,30.0,SG,27.0,6-5,205.0,Boston University,,1995
3,R.J. Hunter,Boston Celtics,28.0,SG,22.0,6-5,185.0,Georgia State,1148640.0,2000
4,Jonas Jerebko,Boston Celtics,8.0,PF,29.0,6-10,231.0,,5000000.0,1993


In [82]:
'''
Elimina la columna Nacimiento del dataframe df
'''
df.drop('Nacimiento', axis=1, inplace=True)
# el inplace es lo mismo que la sentencia de a continuación
#df = df.drop('Nacimiento', axis=1)

#### Reordenar columnas

In [71]:
# Ejemplo de reordenamiento de columnas
df1 = df[['Name', 'Position', 'Team', 'Salary','Number', 'Age','Weight', 'Height', 'College']]
df1.head()

Unnamed: 0,Name,Position,Team,Salary,Number,Age,Weight,Height,College
0,Avery Bradley,PG,Boston Celtics,7730337.0,0.0,25.0,180.0,6-2,Texas
1,Jae Crowder,SF,Boston Celtics,6796117.0,99.0,25.0,235.0,6-6,Marquette
2,John Holland,SG,Boston Celtics,,30.0,27.0,205.0,6-5,Boston University
3,R.J. Hunter,SG,Boston Celtics,1148640.0,28.0,22.0,185.0,6-5,Georgia State
4,Jonas Jerebko,PF,Boston Celtics,5000000.0,8.0,29.0,231.0,6-10,


In [77]:
'''
Reordena las columnas del dataframe df2 (elige tú el orden que quieras)
'''
df2 = df[['Name', 'Position', 'Team', 'Salary','Number', 'Age','Weight', 'Height', 'College']]
df2.head()

Unnamed: 0,Name,Position,Team,Salary,Number,Age,Weight,Height,College
0,Avery Bradley,PG,Boston Celtics,7730337.0,0.0,25.0,180.0,6-2,Texas
1,Jae Crowder,SF,Boston Celtics,6796117.0,99.0,25.0,235.0,6-6,Marquette
2,John Holland,SG,Boston Celtics,,30.0,27.0,205.0,6-5,Boston University
3,R.J. Hunter,SG,Boston Celtics,1148640.0,28.0,22.0,185.0,6-5,Georgia State
4,Jonas Jerebko,PF,Boston Celtics,5000000.0,8.0,29.0,231.0,6-10,


In [83]:
df.head()

Unnamed: 0,Name,Team,Number,Position,Age,Height,Weight,College,Salary
0,Avery Bradley,Boston Celtics,0.0,PG,25.0,6-2,180.0,Texas,7730337.0
1,Jae Crowder,Boston Celtics,99.0,SF,25.0,6-6,235.0,Marquette,6796117.0
2,John Holland,Boston Celtics,30.0,SG,27.0,6-5,205.0,Boston University,
3,R.J. Hunter,Boston Celtics,28.0,SG,22.0,6-5,185.0,Georgia State,1148640.0
4,Jonas Jerebko,Boston Celtics,8.0,PF,29.0,6-10,231.0,,5000000.0


#### Guardar los datos en un archivo csv

In [89]:
# guarda el dataframe df2 en un archivo csv
df2.to_csv('nba_players2.csv', sep=';', index=False)

#### Filtrando datos

In [90]:
'''
Carga el dataframe de jugadores en df
filtra los jugadores que tengan de la universidad de Minnesota
'''
df = pd.read_csv('nba_players2.csv', sep=';')
df.head()

Unnamed: 0,Name,Position,Team,Salary,Number,Age,Weight,Height,College
0,Avery Bradley,PG,Boston Celtics,7730337.0,0.0,25.0,180.0,6-2,Texas
1,Jae Crowder,SF,Boston Celtics,6796117.0,99.0,25.0,235.0,6-6,Marquette
2,John Holland,SG,Boston Celtics,,30.0,27.0,205.0,6-5,Boston University
3,R.J. Hunter,SG,Boston Celtics,1148640.0,28.0,22.0,185.0,6-5,Georgia State
4,Jonas Jerebko,PF,Boston Celtics,5000000.0,8.0,29.0,231.0,6-10,


In [95]:
'''
Filtra de los Atlanta Hawks los jugadores con mas de 30 años
'''
df[(df['Team'] == 'Atlanta Hawks') & (df['Age'] > 30)]

Unnamed: 0,Name,Position,Team,Salary,Number,Age,Weight,Height,College
311,Kirk Hinrich,SG,Atlanta Hawks,2854940.0,12.0,35.0,190.0,6-4,Kansas
313,Kris Humphries,PF,Atlanta Hawks,1000000.0,43.0,31.0,235.0,6-9,Minnesota
314,Kyle Korver,SG,Atlanta Hawks,5746479.0,26.0,35.0,212.0,6-7,Creighton
315,Paul Millsap,PF,Atlanta Hawks,18671659.0,4.0,31.0,246.0,6-8,Louisiana Tech
320,Thabo Sefolosha,SF,Atlanta Hawks,4000000.0,25.0,32.0,220.0,6-7,
321,Tiago Splitter,C,Atlanta Hawks,9756250.0,11.0,31.0,245.0,6-11,


## Expresiones regulares


Las expresiones regulares son una forma de escribir un patrón de caracteres que se puede usar para buscar una cadena de texto.


In [98]:
'''
Busca los jugadores que se llamen James
'''
df[df['Name'].str.contains('James')]

Unnamed: 0,Name,Position,Team,Salary,Number,Age,Weight,Height,College
13,James Young,SG,Boston Celtics,1749840.0,13.0,20.0,215.0,6-6,Kentucky
65,James Johnson,PF,Toronto Raptors,2500000.0,3.0,29.0,250.0,6-9,Wake Forest
86,James Michael McAdoo,SF,Golden State Warriors,845059.0,20.0,23.0,240.0,6-9,North Carolina
137,James Anderson,SG,Sacramento Kings,1015421.0,5.0,27.0,213.0,6-6,Oklahoma State
169,LeBron James,SF,Cleveland Cavaliers,22970500.0,23.0,31.0,250.0,6-8,
172,James Jones,SG,Cleveland Cavaliers,947276.0,1.0,35.0,218.0,6-8,Miami (FL)
249,James Harden,SG,Houston Rockets,15756438.0,13.0,26.0,220.0,6-5,Arizona State
284,James Ennis,SF,New Orleans Pelicans,845059.0,4.0,25.0,210.0,6-7,Long Beach State


In [102]:
'''
Busca un nombre sin tener en cuenta la capitalización
'''
nombre = 'james Anderson'
df[df['Name'].str.lower() == nombre.lower()]

Unnamed: 0,Name,Position,Team,Salary,Number,Age,Weight,Height,College
137,James Anderson,SG,Sacramento Kings,1015421.0,5.0,27.0,213.0,6-6,Oklahoma State


In [116]:
'''
Busca los jugadores que se llamen James como primer nombre
'''
import re
df[df['Name'].str.contains(r'James\s', regex=True)]

Unnamed: 0,Name,Position,Team,Salary,Number,Age,Weight,Height,College
13,James Young,SG,Boston Celtics,1749840.0,13.0,20.0,215.0,6-6,Kentucky
65,James Johnson,PF,Toronto Raptors,2500000.0,3.0,29.0,250.0,6-9,Wake Forest
86,James Michael McAdoo,SF,Golden State Warriors,845059.0,20.0,23.0,240.0,6-9,North Carolina
137,James Anderson,SG,Sacramento Kings,1015421.0,5.0,27.0,213.0,6-6,Oklahoma State
172,James Jones,SG,Cleveland Cavaliers,947276.0,1.0,35.0,218.0,6-8,Miami (FL)
249,James Harden,SG,Houston Rockets,15756438.0,13.0,26.0,220.0,6-5,Arizona State
284,James Ennis,SF,New Orleans Pelicans,845059.0,4.0,25.0,210.0,6-7,Long Beach State


In [121]:
'''
Busca los jugadores que se llamen James como primer nombre (ignora mayusculas)
'''
df[df['Name'].str.contains(r'james\s', regex=True, flags=re.IGNORECASE)]

Unnamed: 0,Name,Position,Team,Salary,Number,Age,Weight,Height,College
13,James Young,SG,Boston Celtics,1749840.0,13.0,20.0,215.0,6-6,Kentucky
65,James Johnson,PF,Toronto Raptors,2500000.0,3.0,29.0,250.0,6-9,Wake Forest
86,James Michael McAdoo,SF,Golden State Warriors,845059.0,20.0,23.0,240.0,6-9,North Carolina
137,James Anderson,SG,Sacramento Kings,1015421.0,5.0,27.0,213.0,6-6,Oklahoma State
172,James Jones,SG,Cleveland Cavaliers,947276.0,1.0,35.0,218.0,6-8,Miami (FL)
249,James Harden,SG,Houston Rockets,15756438.0,13.0,26.0,220.0,6-5,Arizona State
284,James Ennis,SF,New Orleans Pelicans,845059.0,4.0,25.0,210.0,6-7,Long Beach State


7

## Cambios condicionales

In [135]:
'''
Busca los jugadores del Boston Celtics y cambialo por Boston Celtics CocaCola
'''
df.loc[df['Team'] == 'Boston Celtics', 'Team' ] = 'Boston Celtics CocaCola'
df[df['Team'].str.contains(r'Celtics\s', regex=True)]

df.loc[df['Team'] == 'Boston Celtics CocaCola', 'Team' ] = 'Boston Celtics'
df[df['Team'].str.contains(r'^Boston', regex=True)]

Unnamed: 0,Name,Position,Team,Salary,Number,Age,Weight,Height,College
0,Avery Bradley,PG,Boston Celtics,7730337.0,0.0,25.0,180.0,6-2,Texas
1,Jae Crowder,SF,Boston Celtics,6796117.0,99.0,25.0,235.0,6-6,Marquette
2,John Holland,SG,Boston Celtics,,30.0,27.0,205.0,6-5,Boston University
3,R.J. Hunter,SG,Boston Celtics,1148640.0,28.0,22.0,185.0,6-5,Georgia State
4,Jonas Jerebko,PF,Boston Celtics,5000000.0,8.0,29.0,231.0,6-10,
5,Amir Johnson,PF,Boston Celtics,12000000.0,90.0,29.0,240.0,6-9,
6,Jordan Mickey,PF,Boston Celtics,1170960.0,55.0,21.0,235.0,6-8,LSU
7,Kelly Olynyk,C,Boston Celtics,2165160.0,41.0,25.0,238.0,7-0,Gonzaga
8,Terry Rozier,PG,Boston Celtics,1824360.0,12.0,22.0,190.0,6-2,Louisville
9,Marcus Smart,PG,Boston Celtics,3431040.0,36.0,22.0,220.0,6-4,Oklahoma State


## Funciones de agregación

In [140]:
'''
Agrupa los jugadores por universidad y suma su salario
'''
resultado = df.groupby('College')['Salary'].sum()
resultado

College
Alabama              4265059.0
Arizona             43237322.0
Arizona State       15867882.0
Arkansas             8139540.0
Baylor                981348.0
                       ...    
Western Michigan      845059.0
Wichita State         845059.0
Wisconsin            9872459.0
Wyoming              1155600.0
Xavier               1499187.0
Name: Salary, Length: 118, dtype: float64

In [143]:
'''
Agrupa los jugadores por universidad y suma su salario y ordenalos por salario
'''
resultado = df.groupby('College')['Salary'].sum().sort_values(ascending=False)
resultado


College
Kentucky             137706556.0
UCLA                  81259810.0
Texas                 80712010.0
Duke                  79979176.0
Florida               65335712.0
                        ...     
Iowa State              169883.0
UC Santa Barbara        139119.0
Boston University            0.0
Detroit                      0.0
Rhode Island                 0.0
Name: Salary, Length: 118, dtype: float64

In [152]:
'''
Agrupamos los jugadores por universidad y contabilizamos cuantos jugadores hay en cada una
'''

resultado = df.groupby('College').Name.count().sort_values(ascending=False)
resultado


College
Kentucky          22
Duke              20
Kansas            18
North Carolina    16
UCLA              15
                  ..
Utah Valley        1
IUPUI              1
Houston            1
Harvard            1
Xavier             1
Name: Name, Length: 118, dtype: int64

### Trabajando con cantidades grandes de datos

In [3]:
import pandas as pd
for df in pd.read_csv('nba_players.csv', chunksize=5):
    print(df)
    


            Name            Team  Number Position   Age Height  Weight  \
0  Avery Bradley  Boston Celtics     0.0       PG  25.0    6-2   180.0   
1    Jae Crowder  Boston Celtics    99.0       SF  25.0    6-6   235.0   
2   John Holland  Boston Celtics    30.0       SG  27.0    6-5   205.0   
3    R.J. Hunter  Boston Celtics    28.0       SG  22.0    6-5   185.0   
4  Jonas Jerebko  Boston Celtics     8.0       PF  29.0   6-10   231.0   

             College     Salary  
0              Texas  7730337.0  
1          Marquette  6796117.0  
2  Boston University        NaN  
3      Georgia State  1148640.0  
4                NaN  5000000.0  
            Name            Team  Number Position   Age Height  Weight  \
5   Amir Johnson  Boston Celtics    90.0       PF  29.0    6-9   240.0   
6  Jordan Mickey  Boston Celtics    55.0       PF  21.0    6-8   235.0   
7   Kelly Olynyk  Boston Celtics    41.0        C  25.0    7-0   238.0   
8   Terry Rozier  Boston Celtics    12.0       PG  22.0

                    Name              Team  Number Position   Age Height  \
140          Omri Casspi  Sacramento Kings    18.0       SF  27.0    6-9   
141  Willie Cauley-Stein  Sacramento Kings     0.0        C  22.0    7-0   
142      Darren Collison  Sacramento Kings     7.0       PG  28.0    6-0   
143     DeMarcus Cousins  Sacramento Kings    15.0        C  25.0   6-11   
144           Seth Curry  Sacramento Kings    30.0       SG  25.0    6-2   

     Weight   College      Salary  
140   225.0       NaN   2836186.0  
141   240.0  Kentucky   3398280.0  
142   175.0      UCLA   5013559.0  
143   270.0  Kentucky  15851950.0  
144   185.0      Duke    947276.0  
              Name              Team  Number Position   Age Height  Weight  \
145     Duje Dukan  Sacramento Kings    26.0       PF  24.0    6-9   220.0   
146       Rudy Gay  Sacramento Kings     8.0       SF  29.0    6-8   230.0   
147   Kosta Koufos  Sacramento Kings    41.0        C  27.0    7-0   265.0   
148   Ben McLem

294   240.0             Texas  19689000.0  
              Name               Team  Number Position   Age Height  Weight  \
295  Kyle Anderson  San Antonio Spurs     1.0       SF  22.0    6-9   230.0   
296    Matt Bonner  San Antonio Spurs    15.0        C  36.0   6-10   235.0   
297     Boris Diaw  San Antonio Spurs    33.0        C  34.0    6-8   250.0   
298     Tim Duncan  San Antonio Spurs    21.0        C  40.0   6-11   250.0   
299  Manu Ginobili  San Antonio Spurs    20.0       SG  38.0    6-6   205.0   

         College     Salary  
295         UCLA  1142880.0  
296      Florida   947276.0  
297          NaN  7500000.0  
298  Wake Forest  5250000.0  
299          NaN  2814000.0  
                 Name               Team  Number Position   Age Height  \
300       Danny Green  San Antonio Spurs    14.0       SG  28.0    6-6   
301     Kawhi Leonard  San Antonio Spurs     2.0       SF  24.0    6-7   
302  Boban Marjanovic  San Antonio Spurs    40.0        C  27.0    7-3   
303  

In [8]:
for df in pd.read_csv('nba_players.csv', chunksize=5):
    df['Name'] = df['Name'].str.upper()
    print(df)

            Name            Team  Number Position   Age Height  Weight  \
0  AVERY BRADLEY  Boston Celtics     0.0       PG  25.0    6-2   180.0   
1    JAE CROWDER  Boston Celtics    99.0       SF  25.0    6-6   235.0   
2   JOHN HOLLAND  Boston Celtics    30.0       SG  27.0    6-5   205.0   
3    R.J. HUNTER  Boston Celtics    28.0       SG  22.0    6-5   185.0   
4  JONAS JEREBKO  Boston Celtics     8.0       PF  29.0   6-10   231.0   

             College     Salary  
0              Texas  7730337.0  
1          Marquette  6796117.0  
2  Boston University        NaN  
3      Georgia State  1148640.0  
4                NaN  5000000.0  
            Name            Team  Number Position   Age Height  Weight  \
5   AMIR JOHNSON  Boston Celtics    90.0       PF  29.0    6-9   240.0   
6  JORDAN MICKEY  Boston Celtics    55.0       PF  21.0    6-8   235.0   
7   KELLY OLYNYK  Boston Celtics    41.0        C  25.0    7-0   238.0   
8   TERRY ROZIER  Boston Celtics    12.0       PG  22.0

                    Name              Team  Number Position   Age Height  \
140          OMRI CASSPI  Sacramento Kings    18.0       SF  27.0    6-9   
141  WILLIE CAULEY-STEIN  Sacramento Kings     0.0        C  22.0    7-0   
142      DARREN COLLISON  Sacramento Kings     7.0       PG  28.0    6-0   
143     DEMARCUS COUSINS  Sacramento Kings    15.0        C  25.0   6-11   
144           SETH CURRY  Sacramento Kings    30.0       SG  25.0    6-2   

     Weight   College      Salary  
140   225.0       NaN   2836186.0  
141   240.0  Kentucky   3398280.0  
142   175.0      UCLA   5013559.0  
143   270.0  Kentucky  15851950.0  
144   185.0      Duke    947276.0  
              Name              Team  Number Position   Age Height  Weight  \
145     DUJE DUKAN  Sacramento Kings    26.0       PF  24.0    6-9   220.0   
146       RUDY GAY  Sacramento Kings     8.0       SF  29.0    6-8   230.0   
147   KOSTA KOUFOS  Sacramento Kings    41.0        C  27.0    7-0   265.0   
148   BEN MCLEM

                  Name                  Team  Number Position   Age Height  \
290       JRUE HOLIDAY  New Orleans Pelicans    11.0       PG  25.0    6-4   
291    ORLANDO JOHNSON  New Orleans Pelicans     0.0       SG  27.0    6-5   
292   KENDRICK PERKINS  New Orleans Pelicans     5.0        C  31.0   6-10   
293   QUINCY PONDEXTER  New Orleans Pelicans    20.0       SF  28.0    6-7   
294  LAMARCUS ALDRIDGE     San Antonio Spurs    12.0       PF  30.0   6-11   

     Weight           College      Salary  
290   205.0              UCLA  10595507.0  
291   220.0  UC Santa Barbara     55722.0  
292   270.0               NaN    947276.0  
293   220.0        Washington   3382023.0  
294   240.0             Texas  19689000.0  
              Name               Team  Number Position   Age Height  Weight  \
295  KYLE ANDERSON  San Antonio Spurs     1.0       SF  22.0    6-9   230.0   
296    MATT BONNER  San Antonio Spurs    15.0        C  36.0   6-10   235.0   
297     BORIS DIAW  San Antoni

             Name       Team  Number Position   Age Height  Weight College  \
455  TIBOR PLEISS  Utah Jazz    21.0        C  26.0    7-3   256.0     NaN   
456   JEFF WITHEY  Utah Jazz    24.0        C  26.0    7-0   231.0  Kansas   

        Salary  
455  2900000.0  
456   947276.0  


## Creación de objetos
### Series
- Una serie es un array unidimensional de datos que puede contener cualquier tipo de datos.
- Las etiquetas también son conocidas como indices.

In [21]:
'''
Define una serie de pandas con los valores [1,3,5,7,6,8]  y los indices [a,b,c,d,e,f]
'''
s = pd.Series([1,3,5,7,6,8], index=['a','b','c','d','e','f'])
s

a    1
b    3
c    5
d    7
e    6
f    8
dtype: int64

In [17]:
'''
Define una serie con indices int y valores string
'''
s = pd.Series(['inicio', 'b', 'c', 'd', 'final'],index=[1,2,3,4,5])
s

1    inicio
2         b
3         c
4         d
5     final
dtype: object

In [22]:
'''
Define el mismo array que anterior pero ahora no indiques los indices.
'''
s = pd.Series([1,3,5,7,6,8])
s

0    1
1    3
2    5
3    7
4    6
5    8
dtype: int64

Como se puede ver, la serie es un array de datos y al no indicar una etiqueta, se asigna una etiqueta por defecto.

Ejemplo:
s = pd.Series(data, index=index)
data puede ser del tipo:
- Diccionario python
- ndarray
- un escalar




In [23]:
'''
Define una serie a pasando por parámetro un diccionario con los valores y los indices.
'''
data = {'a': 1, 'b': 3, 'c': 5, 'd': 7, 'e': 6, 'f': 8}
s = pd.Series(data)
s

a    1
b    3
c    5
d    7
e    6
f    8
dtype: int64

In [25]:
'''
Define una serie a1 pasando por parámetro un ndarray con los valores y opcionalmente los indices.
'''
a1 = pd.Series(np.array([1,2,3,4,5,6]))
print(a1)

0    1
1    2
2    3
3    4
4    5
5    6
dtype: int64


In [33]:
'''
Define una serie a2 pasando por parámetro un escalar.
'''
a2 = pd.Series(0, index=[1,2,3,4,5,6,7,8,9])
a2

# Si no pasamos los valores del index, solo rellena el primer valor
a2 = pd.Series(5)
a2

0    5
dtype: int64

#### Series como Arrays

Las Series actuan muy parecidas a un ndarray, y son un argumento válido para muchas funciones de NumPy. Sin embargo, operaciones como slicing también seleccionarán el índice.

In [37]:
'''
Dada la serie b, obtén el valor de la posición 'b'.
'''
b = pd.Series([1, 3, 5, 7, 6, 8], index=['a', 'b', 'c', 'd', 'e', 'f'])
print(b)
print('valor posición b:',b['b'])

a    1
b    3
c    5
d    7
e    6
f    8
dtype: int64
valor posición b: 3


In [58]:
'''
Dada la serie b3, obtén el valor de las tres últimas posiciones 
'''
b3 = pd.Series([1, 3, 5, 4, 7, 6, 8], index=['a', 'b', 'c', 'd', 'e', 'f', 'g'])
print(b3)

# b3[-3:]
# b3[-3:7]
# b3[4:]
b3[4:7]

a    1
b    3
c    5
d    4
e    7
f    6
g    8
dtype: int64


e    7
f    6
g    8
dtype: int64

In [61]:
'''
Dada la serie b4, obten los elementos que sean mayores o iguales que 5.
'''
b4 = pd.Series([1, 3, 5, 4, 7, 6, 8], index=['a', 'b', 'c', 'd', 'e', 'f', 'g'])
print(b)
b4[b4 >= 5]

a    1
b    3
c    5
d    4
e    7
f    6
g    8
dtype: int64


c    5
e    7
f    6
g    8
dtype: int64

In [63]:
'''
Dada la serie b5, obten los elementos que sean mayores que 5 y menores que 7.
'''
b5 = pd.Series([1, 3, 5, 4, 7, 6, 8], index=['a', 'b', 'c', 'd', 'e', 'f', 'g'])
print(b5)
b5[(b5>5) & (b5<7) ]

a    1
b    3
c    5
d    4
e    7
f    6
g    8
dtype: int64


f    6
dtype: int64

In [65]:
'''
Dada la serie b6, obten los elementos correspondientes a los índices 'b' y 'c'.
'''
b6 = pd.Series([1, 3, 5, 4, 7, 6, 8], index=['a', 'b', 'c', 'd', 'e', 'f', 'g'])
print(b6)
b6[['b','c']]

a    1
b    3
c    5
d    4
e    7
f    6
g    8
dtype: int64


b    3
c    5
dtype: int64

In [73]:
'''
Dada la serie b7, mapea los elementos de la serie con una función. (mulpliplicar por 2, por ejemplo)
'''
b7 = pd.Series([1, 3, 5, 4, 7, 6, 8], index=['a', 'b', 'c', 'd', 'e', 'f', 'g'])
print(b7)

b7*2

a    1
b    3
c    5
d    4
e    7
f    6
g    8
dtype: int64


a     2
b     6
c    10
d     8
e    14
f    12
g    16
dtype: int64

In [75]:
pd.Series(map(lambda x: str(x)+'a', b7))

0    1a
1    3a
2    5a
3    4a
4    7a
5    6a
6    8a
dtype: object

Podemos consultar el tipo de datos de una serie con dtype, tal como se hacía en numpy.

In [80]:
'''
Define una serie c de tipo float64 y muestra el tipo de dato de los elementos.
'''
c = pd.Series([1, 3, 5, 4, 7, 6, 8], index=['a', 'b', 'c', 'd', 'e', 'f', 'g'], dtype='float64')
print(c)
print(c.dtype)

a    1.0
b    3.0
c    5.0
d    4.0
e    7.0
f    6.0
g    8.0
dtype: float64
float64


Si lo que queremos es obtener el array tal cual, podemos usar el método array.

In [81]:
'''
Define una serie c1 de tipo float64 y muestra el array de datos.
'''
c = pd.Series([1, 3, 5, 4, 7, 6, 8], index=['a', 'b', 'c', 'd', 'e', 'f', 'g'], dtype='float64')
print(c.array)


<PandasArray>
[1.0, 3.0, 5.0, 4.0, 7.0, 6.0, 8.0]
Length: 7, dtype: float64


#### Series como diccionarios

También podemos entender una serie como un diccionario.

In [82]:
'''
Dado una serie d, obtén el valor de la posición 'f'.
'''
d = pd.Series([1, 3, 5, 4, 7, 6, 8], index=['a', 'b', 'c', 'd', 'e', 'f', 'g'], dtype='float64')
d['f']


6.0

In [85]:
# Si el índice no existe, se devuelve error
#print(d['h'])

### Head and tail

- head: devuelve los primeros n elementos de la serie.
- tail: devuelve los últimos n elementos de la serie.

In [89]:
'''
Genera un dataset con 25 valores aleatorios. Los valores se generan entre 0 y 1.
'''
data = np.random.rand(25)
dataset = pd.DataFrame(data)
dataset

Unnamed: 0,0
0,0.380752
1,0.115498
2,0.871651
3,0.727528
4,0.472588
5,0.598135
6,0.47614
7,0.119649
8,0.101537
9,0.11414


In [91]:
'''
muestra los primers 5 valores de la serie.
'''
dataset.head()

Unnamed: 0,0
0,0.380752
1,0.115498
2,0.871651
3,0.727528
4,0.472588


In [92]:
'''
Muestra los 10 primeros valores
'''
dataset.head(10)

Unnamed: 0,0
0,0.380752
1,0.115498
2,0.871651
3,0.727528
4,0.472588
5,0.598135
6,0.47614
7,0.119649
8,0.101537
9,0.11414


In [95]:
'''
Muestra los últimos 5 valores de la serie.
'''
dataset.tail()

Unnamed: 0,0
20,0.913408
21,0.383859
22,0.396968
23,0.446241
24,0.954386


In [96]:
'''
Muestra los 10 ultimos valores
'''
dataset.tail(10)

Unnamed: 0,0
15,0.573854
16,0.711948
17,0.005005
18,0.516346
19,0.683834
20,0.913408
21,0.383859
22,0.396968
23,0.446241
24,0.954386


### DataFrame

In [97]:
'''
Define un dataframe con dos columnas, 5 filas y valores aleatorios.
'''
d2 = pd.DataFrame(np.random.rand(5, 2))
print(d2)

          0         1
0  0.070317  0.209103
1  0.220159  0.371677
2  0.337703  0.076578
3  0.033993  0.333889
4  0.427952  0.745819


In [99]:
'''
Define un dataframe partiendo de un diccionario.
'''
data = {'A': [1, 2, 3, 4, 5], 'B': [6, 7, 8, 9, 10]}
d3 = pd.DataFrame(data)
print(d3)


   A   B
0  1   6
1  2   7
2  3   8
3  4   9
4  5  10


In [100]:
'''
Define un dataframe partiendo de un numpy ndarray.
'''
data = np.array([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]])
d4 = pd.DataFrame(data, columns=['A', 'B', 'C', 'D', 'E'])
print(d4)

   A  B  C  D   E
0  1  2  3  4   5
1  6  7  8  9  10


In [101]:
'''
Muestra las columnas del dataframe del ejercicio anterior.
'''
print(d4.columns)

Index(['A', 'B', 'C', 'D', 'E'], dtype='object')


##### Seleccion por posición

En un array o un dataframe, podemos seleccionar uno o varios elementos por su posición.

In [102]:
'''
Define un dataframe de valores enteros con 4 columnas y 5 filas. A continuación seleciona los elementos de la posición 3.
'''
d5 = pd.DataFrame(np.random.randint(0, 10, (5, 4)))
print(d5)
print(d5.iloc[3])


   0  1  2  3
0  2  9  6  2
1  9  6  2  5
2  5  8  2  9
3  4  8  6  5
4  5  1  5  8
0    4
1    8
2    6
3    5
Name: 3, dtype: int64


Como hemos venido viendo hasta ahora, podemos emplear slices con iloc.

In [103]:
'''
Define un dataframe de valores enteros con 4 columnas y 5 filas. A continuación seleciona los elementos de la posición 3 y 4.
'''
d5 = pd.DataFrame(np.random.randint(0, 10, (5, 4)))
print(d5)
print('\n')
print(d5.iloc[[3, 4]])

   0  1  2  3
0  4  4  7  3
1  1  4  7  7
2  4  6  6  1
3  8  1  4  6
4  4  0  3  5


   0  1  2  3
3  8  1  4  6
4  4  0  3  5


## Guia de expresiones regulares

![image.png](attachment:image.png)

![image.png](attachment:image.png)

![image.png](attachment:image.png)