# Ways to sort a Series

In this lesson we look at the two ways to sort a Series: by its values and by its index.

The two methods involved, `sort_values` and `sort_index`, work just like the *DataFrame* methods of the same name that we covered earlier; the main difference is that `Series.sort_values` needs no `by` argument, because a Series holds a single column of data.

## 1. Sorting a Series by its values

Here we use the `sort_values` method, which by default sorts the Series in ascending order (smallest to largest for numbers, alphabetical order for *strings*). Missing values (`NaN`) are placed at the end in either direction:

In [1]:
# Import libraries
import pandas as pd

# Read the DataFrame
df = pd.read_csv('peliculas.csv')

# Extract two Series: one numeric, one of strings
serie_num = df['num_critic_for_reviews']
serie_str = df['director_name']

In [2]:
# Sort the numeric Series in ascending order
serie_num.sort_values()  # ascending=True is the default

4196    1.0
4860    1.0
3894    1.0
3933    1.0
4655    1.0
       ... 
4880    NaN
4889    NaN
4893    NaN
4903    NaN
4909    NaN
Name: num_critic_for_reviews, Length: 4916, dtype: float64

In [3]:
# And sort it in descending order
serie_num.sort_values(ascending=False)

3       813.0
224     775.0
293     765.0
30      750.0
128     739.0
        ...  
4880      NaN
4889      NaN
4893      NaN
4903      NaN
4909      NaN
Name: num_critic_for_reviews, Length: 4916, dtype: float64
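In both outputs above the `NaN` entries end up last. The following is a minimal sketch of how that placement can be changed with the `na_position` parameter of `sort_values`. The small Series `demo` is made up for illustration and is not taken from `peliculas.csv`:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for a numeric column with a gap
demo = pd.Series([302.0, np.nan, 813.0, 1.0], name="demo")

# Default behaviour: NaN goes last, whatever the sort direction
asc = demo.sort_values()
desc = demo.sort_values(ascending=False)

# na_position="first" moves the missing values to the top instead
nan_first = demo.sort_values(na_position="first")

print(asc.index.tolist())
print(desc.index.tolist())
print(nan_first.index.tolist())
```

Printing only the index labels makes it easy to see where the row holding `NaN` (label `1`) lands in each case.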

In [4]:
# Sort the Series of strings in ascending order
serie_str.sort_values()

4498      A. Raven Cruz
4221         Aaron Hann
3430    Aaron Schneider
2156      Aaron Seltzer
2862       Abel Ferrara
             ...       
4683                NaN
4688                NaN
4704                NaN
4752                NaN
4912                NaN
Name: director_name, Length: 4916, dtype: object

In [5]:
# And now in descending order
serie_str.sort_values(ascending=False)

4731         Étienne Faure
4103          Éric Tessier
3609      Émile Gaudreault
3185    Álex de la Iglesia
4769         Zoran Lisinac
               ...        
4683                   NaN
4688                   NaN
4704                   NaN
4752                   NaN
4912                   NaN
Name: director_name, Length: 4916, dtype: object
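Note that in the descending output the accented names ("Étienne", "Álex") come before "Zoran". Strings are compared by Unicode code point, so accented capitals sort after "Z" rather than next to their unaccented letter. If accent-insensitive ordering is wanted, one possible approach is the `key` parameter of `sort_values` (available from pandas 1.1). The sketch below uses a made-up list of names, and `strip_accents` is a hypothetical helper, not a pandas function:

```python
import pandas as pd

# Hypothetical sample of names; not read from peliculas.csv
names = pd.Series(
    ["Zoran Lisinac", "Étienne Faure", "Aaron Hann", "Álex de la Iglesia"]
)

# Default sort compares code points, so "Á" and "É" land after "Z"
default_order = names.sort_values()


def strip_accents(col: pd.Series) -> pd.Series:
    """Return a lowercase, accent-free version of a string Series.

    Used only as a sort key; the values in the result keep their accents.
    """
    return (
        col.str.normalize("NFKD")                  # split letters from accents
           .str.encode("ascii", errors="ignore")   # drop the accent marks
           .str.decode("ascii")
           .str.lower()
    )


# The key function is applied to the whole Series before comparing
accent_aware = names.sort_values(key=strip_accents)

print(default_order.tolist())
print(accent_aware.tolist())
```

The key only decides the order; the names themselves are returned unchanged.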

## 2. Sorting a Series by its index

As with *DataFrames*, we can use `sort_index`, which orders the Series by its index labels instead of by its values:

In [6]:
serie_num.sort_index(ascending = False)

4915     43.0
4914     14.0
4913     13.0
4912     43.0
4911      1.0
        ...  
4         NaN
3       813.0
2       602.0
1       302.0
0       723.0
Name: num_critic_for_reviews, Length: 4916, dtype: float64

In [7]:
# Set a new index for the DataFrame
df = df.set_index('director_name')
df

director_name,color,num_critic_for_reviews,duration,director_facebook_likes,actor_3_facebook_likes,actor_2_name,actor_1_facebook_likes,gross,genres,actor_1_name,...,num_user_for_reviews,language,country,content_rating,budget,title_year,actor_2_facebook_likes,imdb_score,aspect_ratio,movie_facebook_likes
James Cameron,Color,723.0,178.0,0.0,855.0,Joel David Moore,1000.0,760505847.0,Action|Adventure|Fantasy|Sci-Fi,CCH Pounder,...,3054.0,English,USA,PG-13,237000000.0,2009.0,936.0,7.9,1.78,33000
Gore Verbinski,Color,302.0,169.0,563.0,1000.0,Orlando Bloom,40000.0,309404152.0,Action|Adventure|Fantasy,Johnny Depp,...,1238.0,English,USA,PG-13,300000000.0,2007.0,5000.0,7.1,2.35,0
Sam Mendes,Color,602.0,148.0,0.0,161.0,Rory Kinnear,11000.0,200074175.0,Action|Adventure|Thriller,Christoph Waltz,...,994.0,English,UK,PG-13,245000000.0,2015.0,393.0,6.8,2.35,85000
Christopher Nolan,Color,813.0,164.0,22000.0,23000.0,Christian Bale,27000.0,448130642.0,Action|Thriller,Tom Hardy,...,2701.0,English,USA,PG-13,250000000.0,2012.0,23000.0,8.5,2.35,164000
Doug Walker,,,,131.0,,Rob Walker,131.0,,Documentary,Doug Walker,...,,,,,,,12.0,7.1,,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
Scott Smith,Color,1.0,87.0,2.0,318.0,Daphne Zuniga,637.0,,Comedy|Drama,Eric Mabius,...,6.0,English,Canada,,,2013.0,470.0,7.7,,84
,Color,43.0,43.0,,319.0,Valorie Curry,841.0,,Crime|Drama|Mystery|Thriller,Natalie Zea,...,359.0,English,USA,TV-14,,,593.0,7.5,16.00,32000
Benjamin Roberds,Color,13.0,76.0,0.0,0.0,Maxwell Moody,0.0,,Drama|Horror|Thriller,Eva Boehnke,...,3.0,English,USA,,1400.0,2013.0,0.0,6.3,,16
Daniel Hsia,Color,14.0,100.0,0.0,489.0,Daniel Henney,946.0,10443.0,Comedy|Drama|Romance,Alan Ruck,...,9.0,English,USA,PG-13,,2012.0,719.0,6.3,2.35,660


In [8]:
# And extract the "movie_title" column
serie_movies = df['movie_title']
serie_movies

director_name
James Cameron                                            Avatar
Gore Verbinski         Pirates of the Caribbean: At World's End
Sam Mendes                                              Spectre
Christopher Nolan                         The Dark Knight Rises
Doug Walker          Star Wars: Episode VII - The Force Awakens
                                        ...                    
Scott Smith                             Signed Sealed Delivered
NaN                                               The Following
Benjamin Roberds                           A Plague So Pleasant
Daniel Hsia                                    Shanghai Calling
Jon Gunn                                      My Date with Drew
Name: movie_title, Length: 4916, dtype: object

In [9]:
# Now sort the Series alphabetically, in ascending order, by its index
serie_movies.sort_index()

director_name
A. Raven Cruz      The Helix... Loaded
Aaron Hann                      Circle
Aaron Schneider                Get Low
Aaron Seltzer               Date Movie
Abel Ferrara               The Funeral
                          ...         
NaN                             Heroes
NaN                        Home Movies
NaN                         Revolution
NaN                       Happy Valley
NaN                      The Following
Name: movie_title, Length: 4916, dtype: object

In [10]:
# And now sort it in reverse alphabetical order (again by its index)
serie_movies.sort_index(ascending=False)

director_name
Étienne Faure                    Bizarre
Éric Tessier                Sur le seuil
Émile Gaudreault          Mambo Italiano
Álex de la Iglesia    The Oxford Murders
Zoran Lisinac         Along the Roadside
                             ...        
NaN                               Heroes
NaN                          Home Movies
NaN                           Revolution
NaN                         Happy Valley
NaN                        The Following
Name: movie_title, Length: 4916, dtype: object

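Remember that, as with *DataFrames*, these operations do **NOT** persist unless we overwrite the Series or pass the parameter `inplace=True`.

One caveat: `serie_movies` was extracted from `df`, and in some pandas versions sorting it in place also reorders the `movie_title` column of `df` while leaving every other column untouched. That is what happened in the run shown here, which is why, from In [15] onward, the titles in `df` appear in alphabetical order instead of next to their original rows. Overwriting the variable, or sorting a `.copy()` of the column, avoids this.

The following cells show the in-place variant: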

In [11]:
serie_movies

director_name
James Cameron                                            Avatar
Gore Verbinski         Pirates of the Caribbean: At World's End
Sam Mendes                                              Spectre
Christopher Nolan                         The Dark Knight Rises
Doug Walker          Star Wars: Episode VII - The Force Awakens
                                        ...                    
Scott Smith                             Signed Sealed Delivered
NaN                                               The Following
Benjamin Roberds                           A Plague So Pleasant
Daniel Hsia                                    Shanghai Calling
Jon Gunn                                      My Date with Drew
Name: movie_title, Length: 4916, dtype: object

In [12]:
serie_movies.sort_index(ascending=True, inplace=True)

In [13]:
serie_movies

director_name
A. Raven Cruz      The Helix... Loaded
Aaron Hann                      Circle
Aaron Schneider                Get Low
Aaron Seltzer               Date Movie
Abel Ferrara               The Funeral
                          ...         
NaN                             Heroes
NaN                        Home Movies
NaN                         Revolution
NaN                       Happy Valley
NaN                      The Following
Name: movie_title, Length: 4916, dtype: object
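A minimal sketch of the two ways to make a sort persist, using a made-up three-row DataFrame. Taking an explicit `.copy()` before sorting in place is a precaution added here (it is not part of the lesson code) so that the column in the parent DataFrame is left alone:

```python
import pandas as pd

# Hypothetical three-row frame, indexed by director as in the lesson
df_demo = pd.DataFrame(
    {"movie_title": ["Spectre", "Avatar", "Circle"]},
    index=pd.Index(
        ["Sam Mendes", "James Cameron", "Aaron Hann"], name="director_name"
    ),
)

# Work on an explicit copy so the DataFrame column cannot be affected
titles = df_demo["movie_title"].copy()

# Option 1: keep the sorted result by assigning it to a name
titles_sorted = titles.sort_index()

# Option 2: sort in place; the call itself returns None
result = titles.sort_index(inplace=True)

print(titles_sorted.tolist())
print(result)
print(df_demo["movie_title"].tolist())  # original column order is unchanged
```

Option 1 is usually easier to reason about, since the original Series stays available under its old name.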

## 3. Practical exercise

Using Series, find the 5 worst movies, that is, the ones with the worst return, measured here as net income (gross minus budget).

In [14]:
# Create the "net_income" column
df['net_income'] = df['gross'] - df['budget']
df

director_name,color,num_critic_for_reviews,duration,director_facebook_likes,actor_3_facebook_likes,actor_2_name,actor_1_facebook_likes,gross,genres,actor_1_name,...,language,country,content_rating,budget,title_year,actor_2_facebook_likes,imdb_score,aspect_ratio,movie_facebook_likes,net_income
James Cameron,Color,723.0,178.0,0.0,855.0,Joel David Moore,1000.0,760505847.0,Action|Adventure|Fantasy|Sci-Fi,CCH Pounder,...,English,USA,PG-13,237000000.0,2009.0,936.0,7.9,1.78,33000,523505847.0
Gore Verbinski,Color,302.0,169.0,563.0,1000.0,Orlando Bloom,40000.0,309404152.0,Action|Adventure|Fantasy,Johnny Depp,...,English,USA,PG-13,300000000.0,2007.0,5000.0,7.1,2.35,0,9404152.0
Sam Mendes,Color,602.0,148.0,0.0,161.0,Rory Kinnear,11000.0,200074175.0,Action|Adventure|Thriller,Christoph Waltz,...,English,UK,PG-13,245000000.0,2015.0,393.0,6.8,2.35,85000,-44925825.0
Christopher Nolan,Color,813.0,164.0,22000.0,23000.0,Christian Bale,27000.0,448130642.0,Action|Thriller,Tom Hardy,...,English,USA,PG-13,250000000.0,2012.0,23000.0,8.5,2.35,164000,198130642.0
Doug Walker,,,,131.0,,Rob Walker,131.0,,Documentary,Doug Walker,...,,,,,,12.0,7.1,,0,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
Scott Smith,Color,1.0,87.0,2.0,318.0,Daphne Zuniga,637.0,,Comedy|Drama,Eric Mabius,...,English,Canada,,,2013.0,470.0,7.7,,84,
,Color,43.0,43.0,,319.0,Valorie Curry,841.0,,Crime|Drama|Mystery|Thriller,Natalie Zea,...,English,USA,TV-14,,,593.0,7.5,16.00,32000,
Benjamin Roberds,Color,13.0,76.0,0.0,0.0,Maxwell Moody,0.0,,Drama|Horror|Thriller,Eva Boehnke,...,English,USA,,1400.0,2013.0,0.0,6.3,,16,
Daniel Hsia,Color,14.0,100.0,0.0,489.0,Daniel Henney,946.0,10443.0,Comedy|Drama|Romance,Alan Ruck,...,English,USA,PG-13,,2012.0,719.0,6.3,2.35,660,


In [15]:
# Set the "movie_title" column as the DataFrame index
df = df.set_index('movie_title')
df

movie_title,color,num_critic_for_reviews,duration,director_facebook_likes,actor_3_facebook_likes,actor_2_name,actor_1_facebook_likes,gross,genres,actor_1_name,...,language,country,content_rating,budget,title_year,actor_2_facebook_likes,imdb_score,aspect_ratio,movie_facebook_likes,net_income
The Helix... Loaded,Color,723.0,178.0,0.0,855.0,Joel David Moore,1000.0,760505847.0,Action|Adventure|Fantasy|Sci-Fi,CCH Pounder,...,English,USA,PG-13,237000000.0,2009.0,936.0,7.9,1.78,33000,523505847.0
Circle,Color,302.0,169.0,563.0,1000.0,Orlando Bloom,40000.0,309404152.0,Action|Adventure|Fantasy,Johnny Depp,...,English,USA,PG-13,300000000.0,2007.0,5000.0,7.1,2.35,0,9404152.0
Get Low,Color,602.0,148.0,0.0,161.0,Rory Kinnear,11000.0,200074175.0,Action|Adventure|Thriller,Christoph Waltz,...,English,UK,PG-13,245000000.0,2015.0,393.0,6.8,2.35,85000,-44925825.0
Date Movie,Color,813.0,164.0,22000.0,23000.0,Christian Bale,27000.0,448130642.0,Action|Thriller,Tom Hardy,...,English,USA,PG-13,250000000.0,2012.0,23000.0,8.5,2.35,164000,198130642.0
The Funeral,,,,131.0,,Rob Walker,131.0,,Documentary,Doug Walker,...,,,,,,12.0,7.1,,0,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
Heroes,Color,1.0,87.0,2.0,318.0,Daphne Zuniga,637.0,,Comedy|Drama,Eric Mabius,...,English,Canada,,,2013.0,470.0,7.7,,84,
Home Movies,Color,43.0,43.0,,319.0,Valorie Curry,841.0,,Crime|Drama|Mystery|Thriller,Natalie Zea,...,English,USA,TV-14,,,593.0,7.5,16.00,32000,
Revolution,Color,13.0,76.0,0.0,0.0,Maxwell Moody,0.0,,Drama|Horror|Thriller,Eva Boehnke,...,English,USA,,1400.0,2013.0,0.0,6.3,,16,
Happy Valley,Color,14.0,100.0,0.0,489.0,Daniel Henney,946.0,10443.0,Comedy|Drama|Romance,Alan Ruck,...,English,USA,PG-13,,2012.0,719.0,6.3,2.35,660,


In [16]:
# Extract the Series held in the "net_income" column
serie_ni = df['net_income']
serie_ni

movie_title
The Helix... Loaded    523505847.0
Circle                   9404152.0
Get Low                -44925825.0
Date Movie             198130642.0
The Funeral                    NaN
                          ...     
Heroes                         NaN
Home Movies                    NaN
Revolution                     NaN
Happy Valley                   NaN
The Following              84122.0
Name: net_income, Length: 4916, dtype: float64

We can find the 5 worst movies in two ways.

The first is to sort the Series in ascending order and take the first 5 entries (much as we did in an earlier example with *DataFrames*):

In [17]:
# Method 1: sort_values, then keep the first 5 rows
bottom_5 = serie_ni.sort_values().head(5)
bottom_5

movie_title
The Ugly Truth         -4.199788e+09
Amour                  -2.499804e+09
The Hebrew Hammer      -2.397702e+09
The Whole Nine Yards   -2.127110e+09
Black Book             -1.099561e+09
Name: net_income, dtype: float64

Alternatively, we can use the `nsmallest` method ("the n smallest") to find the 5 lowest returns. **This method is also available on *DataFrames***, where it additionally takes the name of the column to rank by:

In [18]:
bottom_5 = serie_ni.nsmallest(5)
bottom_5

movie_title
The Ugly Truth         -4.199788e+09
Amour                  -2.499804e+09
The Hebrew Hammer      -2.397702e+09
The Whole Nine Yards   -2.127110e+09
Black Book             -1.099561e+09
Name: net_income, dtype: float64
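The two methods above return the same rows. A small sketch that checks this on made-up figures (the Series `net` is hypothetical and unrelated to `peliculas.csv`), including a `NaN` to show that missing values do not get in the way:

```python
import numpy as np
import pandas as pd

# Hypothetical net-income figures, one of them missing
net = pd.Series(
    [5.0, -3.0, np.nan, 12.0, -8.0, 0.5],
    index=["A", "B", "C", "D", "E", "F"],
    name="net_income",
)

# Method 1: full sort, then keep the first 3 rows
via_sort = net.sort_values().head(3)

# Method 2: ask directly for the 3 smallest values
via_nsmallest = net.nsmallest(3)

print(via_sort.index.tolist())
print(via_sort.equals(via_nsmallest))
```

`nsmallest` states the intent more directly and does not need to sort the whole Series first.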

To find the top 5 movies by return, we can use `nlargest` (again, it also works on *DataFrames*):

In [19]:
top_5 = serie_ni.nlargest(5)
top_5

movie_title
The Helix... Loaded     523505847.0
Flashdance              502177271.0
The Wedding Planner     458672302.0
The Insider             449935665.0
The Face of an Angel    424449459.0
Name: net_income, dtype: float64
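For completeness, here is a sketch of the *DataFrame* versions of `nlargest` and `nsmallest`, which take the number of rows and the column to rank by. The `films` frame and its values are invented for the example:

```python
import pandas as pd

# Hypothetical data: four films with gross and budget figures
films = pd.DataFrame(
    {"gross": [100.0, 40.0, 250.0, 10.0], "budget": [60.0, 90.0, 50.0, 5.0]},
    index=["Film A", "Film B", "Film C", "Film D"],
)
films["net_income"] = films["gross"] - films["budget"]

# Whole rows are returned, ordered by the chosen column
top_2 = films.nlargest(2, "net_income")
bottom_2 = films.nsmallest(2, "net_income")

print(top_2.index.tolist())
print(bottom_2.index.tolist())
```

Unlike the Series versions, the result keeps every column, which is handy when the title, budget and gross are all needed alongside the ranking.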