# Modifying the index of a DataFrame

In the previous lesson we looked at different ways to sort a *DataFrame*.

In this lesson we will see how to modify the index of a *DataFrame*. In particular, we will see how to:

- Set the index of a *DataFrame* while reading the data
- Set the index of a *DataFrame* after reading the data
- Reset the index of a *DataFrame*

## 1. Setting the index of a *DataFrame* while reading the data

The reads we have done so far with `read_csv` assign a default index; in other words, the index values are **NOT** actually stored in the original CSV file:

In [1]:
# Read the original dataset
import pandas as pd

df = pd.read_csv('peliculas.csv')
df

Unnamed: 0,color,director_name,num_critic_for_reviews,duration,director_facebook_likes,actor_3_facebook_likes,actor_2_name,actor_1_facebook_likes,gross,genres,...,num_user_for_reviews,language,country,content_rating,budget,title_year,actor_2_facebook_likes,imdb_score,aspect_ratio,movie_facebook_likes
0,Color,James Cameron,723.0,178.0,0.0,855.0,Joel David Moore,1000.0,760505847.0,Action|Adventure|Fantasy|Sci-Fi,...,3054.0,English,USA,PG-13,237000000.0,2009.0,936.0,7.9,1.78,33000
1,Color,Gore Verbinski,302.0,169.0,563.0,1000.0,Orlando Bloom,40000.0,309404152.0,Action|Adventure|Fantasy,...,1238.0,English,USA,PG-13,300000000.0,2007.0,5000.0,7.1,2.35,0
2,Color,Sam Mendes,602.0,148.0,0.0,161.0,Rory Kinnear,11000.0,200074175.0,Action|Adventure|Thriller,...,994.0,English,UK,PG-13,245000000.0,2015.0,393.0,6.8,2.35,85000
3,Color,Christopher Nolan,813.0,164.0,22000.0,23000.0,Christian Bale,27000.0,448130642.0,Action|Thriller,...,2701.0,English,USA,PG-13,250000000.0,2012.0,23000.0,8.5,2.35,164000
4,,Doug Walker,,,131.0,,Rob Walker,131.0,,Documentary,...,,,,,,,12.0,7.1,,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
4911,Color,Scott Smith,1.0,87.0,2.0,318.0,Daphne Zuniga,637.0,,Comedy|Drama,...,6.0,English,Canada,,,2013.0,470.0,7.7,,84
4912,Color,,43.0,43.0,,319.0,Valorie Curry,841.0,,Crime|Drama|Mystery|Thriller,...,359.0,English,USA,TV-14,,,593.0,7.5,16.00,32000
4913,Color,Benjamin Roberds,13.0,76.0,0.0,0.0,Maxwell Moody,0.0,,Drama|Horror|Thriller,...,3.0,English,USA,,1400.0,2013.0,0.0,6.3,,16
4914,Color,Daniel Hsia,14.0,100.0,0.0,489.0,Daniel Henney,946.0,10443.0,Comedy|Drama|Romance,...,9.0,English,USA,PG-13,,2012.0,719.0,6.3,2.35,660
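
As a minimal sketch of this default behavior, the block below reads a tiny CSV held in memory instead of `peliculas.csv`. The three rows, the `csv_text` variable and the `df_demo` name are illustrative assumptions, not part of the lesson's dataset file.

```python
import io

import pandas as pd

# Hypothetical three-row CSV kept in memory; it stands in for 'peliculas.csv'
csv_text = """movie_title,director_name,title_year
Avatar,James Cameron,2009
Spectre,Sam Mendes,2015
The Dark Knight Rises,Christopher Nolan,2012
"""

df_demo = pd.read_csv(io.StringIO(csv_text))

# The text has no index column, so pandas builds a default integer index
print(df_demo.index)

# Every field of the CSV is still an ordinary column
print(list(df_demo.columns))
```

The index printed here is generated by pandas at read time; nothing in the CSV text corresponds to it.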


However, the default index that pandas assigns sometimes carries no information relevant to the analysis of the data.

It is often useful to set the index from one of the columns of the original dataset. This can be done at read time with the `index_col` parameter:

In [2]:
# Read the original dataset and set index_col = 'movie_title'

df = pd.read_csv('peliculas.csv', index_col = 'movie_title')
df

movie_title,color,director_name,num_critic_for_reviews,duration,director_facebook_likes,actor_3_facebook_likes,actor_2_name,actor_1_facebook_likes,gross,genres,...,num_user_for_reviews,language,country,content_rating,budget,title_year,actor_2_facebook_likes,imdb_score,aspect_ratio,movie_facebook_likes
Avatar,Color,James Cameron,723.0,178.0,0.0,855.0,Joel David Moore,1000.0,760505847.0,Action|Adventure|Fantasy|Sci-Fi,...,3054.0,English,USA,PG-13,237000000.0,2009.0,936.0,7.9,1.78,33000
Pirates of the Caribbean: At World's End,Color,Gore Verbinski,302.0,169.0,563.0,1000.0,Orlando Bloom,40000.0,309404152.0,Action|Adventure|Fantasy,...,1238.0,English,USA,PG-13,300000000.0,2007.0,5000.0,7.1,2.35,0
Spectre,Color,Sam Mendes,602.0,148.0,0.0,161.0,Rory Kinnear,11000.0,200074175.0,Action|Adventure|Thriller,...,994.0,English,UK,PG-13,245000000.0,2015.0,393.0,6.8,2.35,85000
The Dark Knight Rises,Color,Christopher Nolan,813.0,164.0,22000.0,23000.0,Christian Bale,27000.0,448130642.0,Action|Thriller,...,2701.0,English,USA,PG-13,250000000.0,2012.0,23000.0,8.5,2.35,164000
Star Wars: Episode VII - The Force Awakens,,Doug Walker,,,131.0,,Rob Walker,131.0,,Documentary,...,,,,,,,12.0,7.1,,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
Signed Sealed Delivered,Color,Scott Smith,1.0,87.0,2.0,318.0,Daphne Zuniga,637.0,,Comedy|Drama,...,6.0,English,Canada,,,2013.0,470.0,7.7,,84
The Following,Color,,43.0,43.0,,319.0,Valorie Curry,841.0,,Crime|Drama|Mystery|Thriller,...,359.0,English,USA,TV-14,,,593.0,7.5,16.00,32000
A Plague So Pleasant,Color,Benjamin Roberds,13.0,76.0,0.0,0.0,Maxwell Moody,0.0,,Drama|Horror|Thriller,...,3.0,English,USA,,1400.0,2013.0,0.0,6.3,,16
Shanghai Calling,Color,Daniel Hsia,14.0,100.0,0.0,489.0,Daniel Henney,946.0,10443.0,Comedy|Drama|Romance,...,9.0,English,USA,PG-13,,2012.0,719.0,6.3,2.35,660
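
One practical consequence of `index_col` is that rows can be selected by label. The following sketch assumes a two-row in-memory CSV (the `csv_text` and `df_demo` names are hypothetical) and uses `.loc`, which this lesson has not introduced, purely as an illustration.

```python
import io

import pandas as pd

# Hypothetical two-row CSV kept in memory; it stands in for 'peliculas.csv'
csv_text = """movie_title,director_name,title_year
Avatar,James Cameron,2009
Spectre,Sam Mendes,2015
"""

# Use the 'movie_title' column as the index while reading
df_demo = pd.read_csv(io.StringIO(csv_text), index_col='movie_title')

# Rows can now be looked up by title through .loc
print(df_demo.loc['Spectre', 'director_name'])

# 'movie_title' has moved to the index and is no longer a regular column
print('movie_title' in df_demo.columns)
```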


## 2. Setting the index of a *DataFrame* after reading the data

If we do not know the dataset and are only beginning to explore it, we may not know in advance which column we will use as a reference.

In these cases we can set the index **AFTER** reading the data:

In [3]:
# Read the original dataset
df = pd.read_csv('peliculas.csv')
df

Unnamed: 0,color,director_name,num_critic_for_reviews,duration,director_facebook_likes,actor_3_facebook_likes,actor_2_name,actor_1_facebook_likes,gross,genres,...,num_user_for_reviews,language,country,content_rating,budget,title_year,actor_2_facebook_likes,imdb_score,aspect_ratio,movie_facebook_likes
0,Color,James Cameron,723.0,178.0,0.0,855.0,Joel David Moore,1000.0,760505847.0,Action|Adventure|Fantasy|Sci-Fi,...,3054.0,English,USA,PG-13,237000000.0,2009.0,936.0,7.9,1.78,33000
1,Color,Gore Verbinski,302.0,169.0,563.0,1000.0,Orlando Bloom,40000.0,309404152.0,Action|Adventure|Fantasy,...,1238.0,English,USA,PG-13,300000000.0,2007.0,5000.0,7.1,2.35,0
2,Color,Sam Mendes,602.0,148.0,0.0,161.0,Rory Kinnear,11000.0,200074175.0,Action|Adventure|Thriller,...,994.0,English,UK,PG-13,245000000.0,2015.0,393.0,6.8,2.35,85000
3,Color,Christopher Nolan,813.0,164.0,22000.0,23000.0,Christian Bale,27000.0,448130642.0,Action|Thriller,...,2701.0,English,USA,PG-13,250000000.0,2012.0,23000.0,8.5,2.35,164000
4,,Doug Walker,,,131.0,,Rob Walker,131.0,,Documentary,...,,,,,,,12.0,7.1,,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
4911,Color,Scott Smith,1.0,87.0,2.0,318.0,Daphne Zuniga,637.0,,Comedy|Drama,...,6.0,English,Canada,,,2013.0,470.0,7.7,,84
4912,Color,,43.0,43.0,,319.0,Valorie Curry,841.0,,Crime|Drama|Mystery|Thriller,...,359.0,English,USA,TV-14,,,593.0,7.5,16.00,32000
4913,Color,Benjamin Roberds,13.0,76.0,0.0,0.0,Maxwell Moody,0.0,,Drama|Horror|Thriller,...,3.0,English,USA,,1400.0,2013.0,0.0,6.3,,16
4914,Color,Daniel Hsia,14.0,100.0,0.0,489.0,Daniel Henney,946.0,10443.0,Comedy|Drama|Romance,...,9.0,English,USA,PG-13,,2012.0,719.0,6.3,2.35,660


In [4]:
# And now set the index with the set_index method
df = df.set_index('movie_title')
df

movie_title,color,director_name,num_critic_for_reviews,duration,director_facebook_likes,actor_3_facebook_likes,actor_2_name,actor_1_facebook_likes,gross,genres,...,num_user_for_reviews,language,country,content_rating,budget,title_year,actor_2_facebook_likes,imdb_score,aspect_ratio,movie_facebook_likes
Avatar,Color,James Cameron,723.0,178.0,0.0,855.0,Joel David Moore,1000.0,760505847.0,Action|Adventure|Fantasy|Sci-Fi,...,3054.0,English,USA,PG-13,237000000.0,2009.0,936.0,7.9,1.78,33000
Pirates of the Caribbean: At World's End,Color,Gore Verbinski,302.0,169.0,563.0,1000.0,Orlando Bloom,40000.0,309404152.0,Action|Adventure|Fantasy,...,1238.0,English,USA,PG-13,300000000.0,2007.0,5000.0,7.1,2.35,0
Spectre,Color,Sam Mendes,602.0,148.0,0.0,161.0,Rory Kinnear,11000.0,200074175.0,Action|Adventure|Thriller,...,994.0,English,UK,PG-13,245000000.0,2015.0,393.0,6.8,2.35,85000
The Dark Knight Rises,Color,Christopher Nolan,813.0,164.0,22000.0,23000.0,Christian Bale,27000.0,448130642.0,Action|Thriller,...,2701.0,English,USA,PG-13,250000000.0,2012.0,23000.0,8.5,2.35,164000
Star Wars: Episode VII - The Force Awakens,,Doug Walker,,,131.0,,Rob Walker,131.0,,Documentary,...,,,,,,,12.0,7.1,,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
Signed Sealed Delivered,Color,Scott Smith,1.0,87.0,2.0,318.0,Daphne Zuniga,637.0,,Comedy|Drama,...,6.0,English,Canada,,,2013.0,470.0,7.7,,84
The Following,Color,,43.0,43.0,,319.0,Valorie Curry,841.0,,Crime|Drama|Mystery|Thriller,...,359.0,English,USA,TV-14,,,593.0,7.5,16.00,32000
A Plague So Pleasant,Color,Benjamin Roberds,13.0,76.0,0.0,0.0,Maxwell Moody,0.0,,Drama|Horror|Thriller,...,3.0,English,USA,,1400.0,2013.0,0.0,6.3,,16
Shanghai Calling,Color,Daniel Hsia,14.0,100.0,0.0,489.0,Daniel Henney,946.0,10443.0,Comedy|Drama|Romance,...,9.0,English,USA,PG-13,,2012.0,719.0,6.3,2.35,660
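
The cell above reassigns `df` because `set_index` returns a new *DataFrame* by default rather than changing the original. The sketch below illustrates this on a small hand-built *DataFrame* (`df_demo`, `indexed` and `kept` are hypothetical names) and also shows the `drop=False` option, which keeps the column alongside the new index.

```python
import pandas as pd

# Small hand-built DataFrame used only for illustration
df_demo = pd.DataFrame({
    'movie_title': ['Avatar', 'Spectre'],
    'director_name': ['James Cameron', 'Sam Mendes'],
})

# set_index returns a new DataFrame; df_demo itself is left untouched
indexed = df_demo.set_index('movie_title')

# With drop=False the column is kept in addition to becoming the index
kept = df_demo.set_index('movie_title', drop=False)

print(list(df_demo.columns))  # original columns, unchanged
print(list(indexed.columns))  # 'movie_title' moved to the index
print(list(kept.columns))     # 'movie_title' still present as a column
```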


## 3. Resetting the index

Sometimes, when we chain several operations on a *DataFrame*, the index is modified repeatedly and ends up being of little use.

In these cases it is best to reset the index once all the operations are done, using the `reset_index` method:

In [5]:
# Read the original dataset with the default index assigned by pandas
df = pd.read_csv('peliculas.csv')
df

Unnamed: 0,color,director_name,num_critic_for_reviews,duration,director_facebook_likes,actor_3_facebook_likes,actor_2_name,actor_1_facebook_likes,gross,genres,...,num_user_for_reviews,language,country,content_rating,budget,title_year,actor_2_facebook_likes,imdb_score,aspect_ratio,movie_facebook_likes
0,Color,James Cameron,723.0,178.0,0.0,855.0,Joel David Moore,1000.0,760505847.0,Action|Adventure|Fantasy|Sci-Fi,...,3054.0,English,USA,PG-13,237000000.0,2009.0,936.0,7.9,1.78,33000
1,Color,Gore Verbinski,302.0,169.0,563.0,1000.0,Orlando Bloom,40000.0,309404152.0,Action|Adventure|Fantasy,...,1238.0,English,USA,PG-13,300000000.0,2007.0,5000.0,7.1,2.35,0
2,Color,Sam Mendes,602.0,148.0,0.0,161.0,Rory Kinnear,11000.0,200074175.0,Action|Adventure|Thriller,...,994.0,English,UK,PG-13,245000000.0,2015.0,393.0,6.8,2.35,85000
3,Color,Christopher Nolan,813.0,164.0,22000.0,23000.0,Christian Bale,27000.0,448130642.0,Action|Thriller,...,2701.0,English,USA,PG-13,250000000.0,2012.0,23000.0,8.5,2.35,164000
4,,Doug Walker,,,131.0,,Rob Walker,131.0,,Documentary,...,,,,,,,12.0,7.1,,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
4911,Color,Scott Smith,1.0,87.0,2.0,318.0,Daphne Zuniga,637.0,,Comedy|Drama,...,6.0,English,Canada,,,2013.0,470.0,7.7,,84
4912,Color,,43.0,43.0,,319.0,Valorie Curry,841.0,,Crime|Drama|Mystery|Thriller,...,359.0,English,USA,TV-14,,,593.0,7.5,16.00,32000
4913,Color,Benjamin Roberds,13.0,76.0,0.0,0.0,Maxwell Moody,0.0,,Drama|Horror|Thriller,...,3.0,English,USA,,1400.0,2013.0,0.0,6.3,,16
4914,Color,Daniel Hsia,14.0,100.0,0.0,489.0,Daniel Henney,946.0,10443.0,Comedy|Drama|Romance,...,9.0,English,USA,PG-13,,2012.0,719.0,6.3,2.35,660


In [6]:
# Now sort the rows of the DataFrame using the "gross" and "budget" columns
# as reference. Notice what happens to the index
df = df.sort_values(['gross', 'budget'])
df

Unnamed: 0,color,director_name,num_critic_for_reviews,duration,director_facebook_likes,actor_3_facebook_likes,actor_2_name,actor_1_facebook_likes,gross,genres,...,num_user_for_reviews,language,country,content_rating,budget,title_year,actor_2_facebook_likes,imdb_score,aspect_ratio,movie_facebook_likes
3272,Color,Ekachai Uekrongtham,66.0,96.0,3.0,305.0,Mike Dopud,2000.0,162.0,Action|Crime|Thriller,...,38.0,English,Thailand,R,9000000.0,2014.0,368.0,5.7,2.35,0
4500,Color,Frank Whaley,9.0,96.0,436.0,4.0,Frank Whaley,474.0,703.0,Comedy|Drama,...,21.0,English,USA,R,1500000.0,2001.0,436.0,5.4,1.85,47
4499,Color,Brian Trenchard-Smith,8.0,88.0,53.0,176.0,Mariel Hemingway,563.0,721.0,Action|Drama,...,12.0,English,Germany,R,1000000.0,2006.0,288.0,4.1,1.85,42
3485,Color,Ian Fitzgibbon,54.0,88.0,11.0,415.0,Brendan Coyle,1000.0,828.0,Action|Comedy|Crime|Drama|Romance|Thriller,...,31.0,English,Ireland,R,,2009.0,418.0,6.4,2.35,663
4795,Color,Ricki Stern,11.0,106.0,15.0,0.0,Evelyn Jefferson,2.0,1111.0,Crime|Documentary,...,10.0,English,USA,PG-13,200000.0,2006.0,0.0,7.7,1.66,246
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
4892,Color,Marcus Nispel,43.0,91.0,158.0,265.0,Brittany Curran,630.0,,Horror|Mystery|Thriller,...,33.0,English,USA,R,,2015.0,512.0,4.6,1.85,0
4903,Color,Tadeo Garcia,,84.0,5.0,12.0,Michael Cortez,21.0,,Drama,...,3.0,English,USA,,,2004.0,20.0,6.1,,22
4905,Color,Ash Baron-Cohen,10.0,98.0,3.0,152.0,Stanley B. Herman,789.0,,Crime|Drama,...,14.0,English,USA,,,1995.0,194.0,6.4,,20
4911,Color,Scott Smith,1.0,87.0,2.0,318.0,Daphne Zuniga,637.0,,Comedy|Drama,...,6.0,English,Canada,,,2013.0,470.0,7.7,,84


In [7]:
# At this point we can reset the index so that its values are consecutive, starting at 0:
df = df.reset_index()
df

Unnamed: 0,index,color,director_name,num_critic_for_reviews,duration,director_facebook_likes,actor_3_facebook_likes,actor_2_name,actor_1_facebook_likes,gross,...,num_user_for_reviews,language,country,content_rating,budget,title_year,actor_2_facebook_likes,imdb_score,aspect_ratio,movie_facebook_likes
0,3272,Color,Ekachai Uekrongtham,66.0,96.0,3.0,305.0,Mike Dopud,2000.0,162.0,...,38.0,English,Thailand,R,9000000.0,2014.0,368.0,5.7,2.35,0
1,4500,Color,Frank Whaley,9.0,96.0,436.0,4.0,Frank Whaley,474.0,703.0,...,21.0,English,USA,R,1500000.0,2001.0,436.0,5.4,1.85,47
2,4499,Color,Brian Trenchard-Smith,8.0,88.0,53.0,176.0,Mariel Hemingway,563.0,721.0,...,12.0,English,Germany,R,1000000.0,2006.0,288.0,4.1,1.85,42
3,3485,Color,Ian Fitzgibbon,54.0,88.0,11.0,415.0,Brendan Coyle,1000.0,828.0,...,31.0,English,Ireland,R,,2009.0,418.0,6.4,2.35,663
4,4795,Color,Ricki Stern,11.0,106.0,15.0,0.0,Evelyn Jefferson,2.0,1111.0,...,10.0,English,USA,PG-13,200000.0,2006.0,0.0,7.7,1.66,246
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
4911,4892,Color,Marcus Nispel,43.0,91.0,158.0,265.0,Brittany Curran,630.0,,...,33.0,English,USA,R,,2015.0,512.0,4.6,1.85,0
4912,4903,Color,Tadeo Garcia,,84.0,5.0,12.0,Michael Cortez,21.0,,...,3.0,English,USA,,,2004.0,20.0,6.1,,22
4913,4905,Color,Ash Baron-Cohen,10.0,98.0,3.0,152.0,Stanley B. Herman,789.0,,...,14.0,English,USA,,,1995.0,194.0,6.4,,20
4914,4911,Color,Scott Smith,1.0,87.0,2.0,318.0,Daphne Zuniga,637.0,,...,6.0,English,Canada,,,2013.0,470.0,7.7,,84


This works, but only partly: `reset_index` has turned the previous index into a new column of the *DataFrame* (the "index" column).

To avoid this, pass `drop=True` when calling `reset_index`:

In [8]:
# Read the original dataset
df = pd.read_csv('peliculas.csv')

# Sort using "gross" and "budget" as reference
df = df.sort_values(['gross', 'budget'])

# Reset the index using drop=True
df = df.reset_index(drop=True)

df

Unnamed: 0,color,director_name,num_critic_for_reviews,duration,director_facebook_likes,actor_3_facebook_likes,actor_2_name,actor_1_facebook_likes,gross,genres,...,num_user_for_reviews,language,country,content_rating,budget,title_year,actor_2_facebook_likes,imdb_score,aspect_ratio,movie_facebook_likes
0,Color,Ekachai Uekrongtham,66.0,96.0,3.0,305.0,Mike Dopud,2000.0,162.0,Action|Crime|Thriller,...,38.0,English,Thailand,R,9000000.0,2014.0,368.0,5.7,2.35,0
1,Color,Frank Whaley,9.0,96.0,436.0,4.0,Frank Whaley,474.0,703.0,Comedy|Drama,...,21.0,English,USA,R,1500000.0,2001.0,436.0,5.4,1.85,47
2,Color,Brian Trenchard-Smith,8.0,88.0,53.0,176.0,Mariel Hemingway,563.0,721.0,Action|Drama,...,12.0,English,Germany,R,1000000.0,2006.0,288.0,4.1,1.85,42
3,Color,Ian Fitzgibbon,54.0,88.0,11.0,415.0,Brendan Coyle,1000.0,828.0,Action|Comedy|Crime|Drama|Romance|Thriller,...,31.0,English,Ireland,R,,2009.0,418.0,6.4,2.35,663
4,Color,Ricki Stern,11.0,106.0,15.0,0.0,Evelyn Jefferson,2.0,1111.0,Crime|Documentary,...,10.0,English,USA,PG-13,200000.0,2006.0,0.0,7.7,1.66,246
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
4911,Color,Marcus Nispel,43.0,91.0,158.0,265.0,Brittany Curran,630.0,,Horror|Mystery|Thriller,...,33.0,English,USA,R,,2015.0,512.0,4.6,1.85,0
4912,Color,Tadeo Garcia,,84.0,5.0,12.0,Michael Cortez,21.0,,Drama,...,3.0,English,USA,,,2004.0,20.0,6.1,,22
4913,Color,Ash Baron-Cohen,10.0,98.0,3.0,152.0,Stanley B. Herman,789.0,,Crime|Drama,...,14.0,English,USA,,,1995.0,194.0,6.4,,20
4914,Color,Scott Smith,1.0,87.0,2.0,318.0,Daphne Zuniga,637.0,,Comedy|Drama,...,6.0,English,Canada,,,2013.0,470.0,7.7,,84


In [9]:
# And remember that we can chain the previous operations to write everything in a single
# line of code

# Read
df = pd.read_csv('peliculas.csv')

# Chain operations: sort and reset the index
df = df.sort_values(['gross', 'budget']).reset_index(drop = True)
df

Unnamed: 0,color,director_name,num_critic_for_reviews,duration,director_facebook_likes,actor_3_facebook_likes,actor_2_name,actor_1_facebook_likes,gross,genres,...,num_user_for_reviews,language,country,content_rating,budget,title_year,actor_2_facebook_likes,imdb_score,aspect_ratio,movie_facebook_likes
0,Color,Ekachai Uekrongtham,66.0,96.0,3.0,305.0,Mike Dopud,2000.0,162.0,Action|Crime|Thriller,...,38.0,English,Thailand,R,9000000.0,2014.0,368.0,5.7,2.35,0
1,Color,Frank Whaley,9.0,96.0,436.0,4.0,Frank Whaley,474.0,703.0,Comedy|Drama,...,21.0,English,USA,R,1500000.0,2001.0,436.0,5.4,1.85,47
2,Color,Brian Trenchard-Smith,8.0,88.0,53.0,176.0,Mariel Hemingway,563.0,721.0,Action|Drama,...,12.0,English,Germany,R,1000000.0,2006.0,288.0,4.1,1.85,42
3,Color,Ian Fitzgibbon,54.0,88.0,11.0,415.0,Brendan Coyle,1000.0,828.0,Action|Comedy|Crime|Drama|Romance|Thriller,...,31.0,English,Ireland,R,,2009.0,418.0,6.4,2.35,663
4,Color,Ricki Stern,11.0,106.0,15.0,0.0,Evelyn Jefferson,2.0,1111.0,Crime|Documentary,...,10.0,English,USA,PG-13,200000.0,2006.0,0.0,7.7,1.66,246
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
4911,Color,Marcus Nispel,43.0,91.0,158.0,265.0,Brittany Curran,630.0,,Horror|Mystery|Thriller,...,33.0,English,USA,R,,2015.0,512.0,4.6,1.85,0
4912,Color,Tadeo Garcia,,84.0,5.0,12.0,Michael Cortez,21.0,,Drama,...,3.0,English,USA,,,2004.0,20.0,6.1,,22
4913,Color,Ash Baron-Cohen,10.0,98.0,3.0,152.0,Stanley B. Herman,789.0,,Crime|Drama,...,14.0,English,USA,,,1995.0,194.0,6.4,,20
4914,Color,Scott Smith,1.0,87.0,2.0,318.0,Daphne Zuniga,637.0,,Comedy|Drama,...,6.0,English,Canada,,,2013.0,470.0,7.7,,84
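
As an alternative to chaining, `sort_values` accepts an `ignore_index` parameter (available since pandas 1.0) that renumbers the rows as part of the sort. The sketch below compares both approaches on a small made-up *DataFrame*; the values and the names `df_demo`, `chained` and `direct` are assumptions for illustration only.

```python
import pandas as pd

# Made-up values, used only to compare the two approaches
df_demo = pd.DataFrame({
    'gross': [300.0, 100.0, 200.0],
    'budget': [30.0, 10.0, 20.0],
})

# Approach used in this lesson: sort, then reset the index
chained = df_demo.sort_values(['gross', 'budget']).reset_index(drop=True)

# Alternative: let sort_values renumber the rows itself
direct = df_demo.sort_values(['gross', 'budget'], ignore_index=True)

# Both results should hold the same rows with the same 0..n-1 index
print(chained.equals(direct))
print(list(direct.index))
```

Which form to use is mostly a matter of taste; `reset_index(drop=True)` has the advantage of working after any operation, not only after a sort.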


## 4. Practical example

Using the ideas above, plus a few Python programming tools, we can index the *DataFrame* in a more sophisticated way.

For example, suppose we want to encode the following information in the index:

- The director's first surname
- The release year

So, for example, if the director is "Gore Verbinski" and the release year is 2007, the index should be "Verbinski-2007".

Let's see how to achieve this:

In [10]:
# First, create a Series containing each director's surname
# NOTE: WE ARE ASSUMING THAT EACH DIRECTOR IS WRITTEN AS FIRST NAME + SURNAME
serie_apellidos = df['director_name'].str.split().str[1]
serie_apellidos

0           Uekrongtham
1                Whaley
2       Trenchard-Smith
3            Fitzgibbon
4                 Stern
             ...       
4911             Nispel
4912             Garcia
4913        Baron-Cohen
4914              Smith
4915                NaN
Name: director_name, Length: 4916, dtype: object
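
The "first name + surname" assumption matters because `.str[1]` picks the second word of each name, whatever it is. The sketch below shows what happens with a three-word name and with a missing value; the `names` Series is built by hand for illustration, and taking the last word with `.str[-1]` is shown only as a possible alternative, not as the method used in this lesson.

```python
import pandas as pd

# Hand-built Series: a two-word name, a three-word name and a missing value
names = pd.Series(['Gore Verbinski', 'Alex Craig Mann', None])

# Second word of each name (the approach used in this lesson)
second_word = names.str.split().str[1]

# Last word of each name (a possible alternative for longer names)
last_word = names.str.split().str[-1]

print(second_word.tolist())
print(last_word.tolist())
```

For "Alex Craig Mann" the two approaches give different results, and the missing name stays missing in both.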

In [11]:
# And extract the Series with the release year
serie_years = df['title_year'].astype('str')
serie_years

0       2014.0
1       2001.0
2       2006.0
3       2009.0
4       2006.0
         ...  
4911    2015.0
4912    2004.0
4913    1995.0
4914    2013.0
4915       nan
Name: title_year, Length: 4916, dtype: object

The years are originally stored as *float*, so converting them to *str* keeps the decimal part. To avoid this we can convert to *int* before converting to *str*:

In [12]:
# Convert to "int" before converting to "str"
serie_years = df['title_year'].astype('int').astype('str')
serie_years

IntCastingNaNError: Cannot convert non-finite values (NA or inf) to integer

But this raises an error telling us that the Series contains *NaN* values, which cannot be converted to integers.

So before any of this we must remove the *NaN* values from the *DataFrame*. Note that `dropna`, called without arguments, drops every row that has a *NaN* in any column:

In [13]:
# Remove NaN
df = df.dropna()

# Series with the surnames
serie_apellidos = df['director_name'].str.split().str[1]

# And now the Series with the years (integer first, then string):
serie_years = df['title_year'].astype('int').astype('str')
serie_years

0       2014
1       2001
2       2006
4       2006
5       2012
        ... 
4049    2008
4050    2012
4051    2015
4052    1997
4053    2009
Name: title_year, Length: 3654, dtype: object
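
Calling `dropna()` on the whole *DataFrame* reduces it from 4916 to 3654 rows, because a missing value in any column is enough to discard a row. If only the two columns used for the index matter, one option is the `subset` parameter of `dropna`. The sketch below uses invented data (the names, years and `gross` values are made up, as are `df_demo`, `all_cols` and `subset_only`) to show the difference.

```python
import numpy as np
import pandas as pd

# Invented rows: each one is missing a value in a different column
df_demo = pd.DataFrame({
    'director_name': ['Ana Ruiz', 'Luis Soto', None],
    'title_year': [2007.0, np.nan, 2012.0],
    'gross': [np.nan, 1500.0, 2500.0],
})

# Default behavior: a NaN in any column discards the row
all_cols = df_demo.dropna()

# Only look at the columns needed to build the index
subset_only = df_demo.dropna(subset=['director_name', 'title_year'])

# The float -> int -> str conversion now works on the remaining rows
years = subset_only['title_year'].astype('int').astype('str')

print(len(all_cols), len(subset_only))
print(years.tolist())
```

Whether rows with missing values in other columns should be kept depends on the analysis that follows.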

Perfect! Now let's join the two Series with a hyphen in between:

In [14]:
indice = serie_apellidos + '-' + serie_years
indice

0           Uekrongtham-2014
1                Whaley-2001
2       Trenchard-Smith-2006
4                 Stern-2006
5                 Craig-2012
                ...         
4049              Nolan-2008
4050             Whedon-2012
4051          Trevorrow-2015
4052            Cameron-1997
4053            Cameron-2009
Length: 3654, dtype: object

In [15]:
# And finally, set this Series as the new index of the DataFrame
df = df.set_index(indice)
df

Unnamed: 0,color,director_name,num_critic_for_reviews,duration,director_facebook_likes,actor_3_facebook_likes,actor_2_name,actor_1_facebook_likes,gross,genres,...,num_user_for_reviews,language,country,content_rating,budget,title_year,actor_2_facebook_likes,imdb_score,aspect_ratio,movie_facebook_likes
Uekrongtham-2014,Color,Ekachai Uekrongtham,66.0,96.0,3.0,305.0,Mike Dopud,2000.0,162.0,Action|Crime|Thriller,...,38.0,English,Thailand,R,9000000.0,2014.0,368.0,5.7,2.35,0
Whaley-2001,Color,Frank Whaley,9.0,96.0,436.0,4.0,Frank Whaley,474.0,703.0,Comedy|Drama,...,21.0,English,USA,R,1500000.0,2001.0,436.0,5.4,1.85,47
Trenchard-Smith-2006,Color,Brian Trenchard-Smith,8.0,88.0,53.0,176.0,Mariel Hemingway,563.0,721.0,Action|Drama,...,12.0,English,Germany,R,1000000.0,2006.0,288.0,4.1,1.85,42
Stern-2006,Color,Ricki Stern,11.0,106.0,15.0,0.0,Evelyn Jefferson,2.0,1111.0,Crime|Documentary,...,10.0,English,USA,PG-13,200000.0,2006.0,0.0,7.7,1.66,246
Craig-2012,Color,Alex Craig Mann,29.0,87.0,38.0,445.0,Christa B. Allen,552.0,1332.0,Comedy|Horror,...,20.0,English,USA,Not Rated,500000.0,2012.0,533.0,4.6,2.35,898
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
Nolan-2008,Color,Christopher Nolan,645.0,152.0,22000.0,11000.0,Heath Ledger,23000.0,533316061.0,Action|Crime|Drama|Thriller,...,4667.0,English,USA,PG-13,185000000.0,2008.0,13000.0,9.0,2.35,37000
Whedon-2012,Color,Joss Whedon,703.0,173.0,0.0,19000.0,Robert Downey Jr.,26000.0,623279547.0,Action|Adventure|Sci-Fi,...,1722.0,English,USA,PG-13,220000000.0,2012.0,21000.0,8.1,1.85,123000
Trevorrow-2015,Color,Colin Trevorrow,644.0,124.0,365.0,1000.0,Judy Greer,3000.0,652177271.0,Action|Adventure|Sci-Fi|Thriller,...,1290.0,English,USA,PG-13,150000000.0,2015.0,2000.0,7.0,2.00,150000
Cameron-1997,Color,James Cameron,315.0,194.0,0.0,794.0,Kate Winslet,29000.0,658672302.0,Drama|Romance,...,2528.0,English,USA,PG-13,200000000.0,1997.0,14000.0,7.7,2.35,26000
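
An index built as "surname-year" is not guaranteed to be unique, since a director may release more than one film in the same year. The sketch below shows how this could be checked with `Index.is_unique` and `Index.duplicated`; the three-row *DataFrame* and its repeated label are invented for illustration and say nothing about the actual contents of `peliculas.csv`.

```python
import pandas as pd

# Invented example: the label 'Cameron-2009' appears twice on purpose
df_demo = pd.DataFrame(
    {'imdb_score': [7.7, 7.9, 8.5]},
    index=['Cameron-1997', 'Cameron-2009', 'Cameron-2009'],
)

# True only when every label appears exactly once
print(df_demo.index.is_unique)

# Marks the second and later occurrences of each repeated label
print(df_demo.index.duplicated().tolist())

# Selecting a repeated label returns all matching rows
print(df_demo.loc['Cameron-2009'])
```

pandas allows duplicate labels, so a check like this is optional; it is mainly useful when each label is expected to identify a single row.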
