# 🐍 **Introduction to Python for Data Analysis**

### 👨‍💻 Jorge Gómez Galván
* LinkedIn: [linkedin.com/in/jorgeggalvan/](https://www.linkedin.com/in/jorgeggalvan/) 
* E-mail: gomezgalvanjorge@gmail.com

## **Chapter 2: DataFrames - Exercise Solutions**
---

This notebook contains the solutions to the exercises in [Chapter 2: DataFrames](https://github.com/jorgeggalvan/Data-Analysis-Fundamentals-with-Python/blob/main/Python_Data_Analysis_2.1_DataFrames.ipynb) of the Introduction to Python for Data Analysis.

### 2. Solved exercises

#### Exercise 2.1
---

> Dataset to use: `star_wars.csv`

**2.1A:** Import the Star Wars characters dataset and select the 'Name', 'Homeworld' and 'Species' columns, showing the first 10 rows of the DataFrame.

**2.1B:** Select only the characters that meet both of these conditions:

* They do not belong to the 'Human' species.
* Their home planet ('Homeworld') is 'Naboo', 'Kashyyyk' or 'Endor'.

##### Exercise 2.1A

In [1]:
# Import pandas
import pandas as pd

# Load the Star Wars dataset into a DataFrame
df_starwars = pd.read_csv('./data/star_wars.csv')
df_starwars.head()

Unnamed: 0,Name,Birth Year,Homeworld,Species,Gender,Height,Mass,Skin Color,Hair Color,Eye Color,Films
0,Luke Skywalker,19.0,Tatooine,Human,male,172.0,77.0,fair,blond,blue,"A New Hope, The Empire Strikes Back, Return of..."
1,C-3PO,112.0,Tatooine,Droid,none,167.0,75.0,gold,none,yellow,"A New Hope, The Empire Strikes Back, Return of..."
2,R2-D2,33.0,Naboo,Droid,none,96.0,32.0,"white, blue",none,red,"A New Hope, The Empire Strikes Back, Return of..."
3,Darth Vader,41.9,Tatooine,Human,male,202.0,136.0,white,none,yellow,"A New Hope, The Empire Strikes Back, Return of..."
4,Leia Organa,19.0,Alderaan,Human,female,150.0,49.0,light,brown,brown,"A New Hope, The Empire Strikes Back, Return of..."


In [2]:
# Select the 'Name', 'Homeworld' and 'Species' columns
df_starwars = df_starwars[['Name', 'Homeworld', 'Species']]

# Show the first 10 rows of the DataFrame
df_starwars.head(10)

Unnamed: 0,Name,Homeworld,Species
0,Luke Skywalker,Tatooine,Human
1,C-3PO,Tatooine,Droid
2,R2-D2,Naboo,Droid
3,Darth Vader,Tatooine,Human
4,Leia Organa,Alderaan,Human
5,Owen Lars,Tatooine,Human
6,R5-D4,Tatooine,Droid
7,Biggs Darklighter,Tatooine,Human
8,Obi-Wan Kenobi,Stewjon,Human
9,Anakin Skywalker,Tatooine,Human


In [3]:
# The first 10 rows can also be selected by position with .iloc[]
df_starwars.iloc[:10] 

Unnamed: 0,Name,Homeworld,Species
0,Luke Skywalker,Tatooine,Human
1,C-3PO,Tatooine,Droid
2,R2-D2,Naboo,Droid
3,Darth Vader,Tatooine,Human
4,Leia Organa,Alderaan,Human
5,Owen Lars,Tatooine,Human
6,R5-D4,Tatooine,Droid
7,Biggs Darklighter,Tatooine,Human
8,Obi-Wan Kenobi,Stewjon,Human
9,Anakin Skywalker,Tatooine,Human
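
The column selection above can also happen at import time. The following is a minimal sketch, assuming a small in-memory CSV that stands in for `star_wars.csv` (the three rows are copied from the output above; `csv_text` and `df_subset` are illustrative names). It uses the `usecols` parameter of `pd.read_csv`, which keeps only the listed columns while parsing.

```python
import io

import pandas as pd

# Small in-memory stand-in for star_wars.csv (illustrative, not the full file)
csv_text = """Name,Birth Year,Homeworld,Species
Luke Skywalker,19.0,Tatooine,Human
C-3PO,112.0,Tatooine,Droid
R2-D2,33.0,Naboo,Droid
"""

# usecols tells read_csv to parse only these columns
df_subset = pd.read_csv(
    io.StringIO(csv_text),
    usecols=['Name', 'Homeworld', 'Species'],
)

print(df_subset.head(10))
```

With a large file this avoids loading columns that are never used, although selecting after import, as in the solution, is just as valid for a dataset of this size.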


##### Exercise 2.1B

In [4]:
# Build the conditions that select only the characters meeting the stated criteria
cond_1 = df_starwars['Species'] != 'Human'
cond_2 = df_starwars['Homeworld'] == 'Naboo'
cond_3 = df_starwars['Homeworld'] == 'Endor'
cond_4 = df_starwars['Homeworld'] == 'Kashyyyk'

# Apply the conditions to filter; the parentheses group the three planet conditions
df_starwars[cond_1 & (cond_2 | cond_3 | cond_4)]

Unnamed: 0,Name,Homeworld,Species
2,R2-D2,Naboo,Droid
11,Chewbacca,Kashyyyk,Wookiee
27,Wicket Systri Warrick,Endor,Ewok
33,Jar Jar Binks,Naboo,Gungan
34,Roos Tarpals,Naboo,Gungan
35,Rugor Nass,Naboo,Gungan
76,Tarfful,Kashyyyk,Wookiee
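
The parentheses around the three planet conditions matter because `&` binds more tightly than `|` in Python. The sketch below illustrates this on a tiny hand-made DataFrame (three rows taken from the outputs above plus one clearly hypothetical row); the variable names are illustrative.

```python
import pandas as pd

# Tiny illustrative DataFrame; the last row is hypothetical
df = pd.DataFrame({
    'Name': ['R2-D2', 'Chewbacca', 'Luke Skywalker', 'Hypothetical Human'],
    'Homeworld': ['Naboo', 'Kashyyyk', 'Tatooine', 'Endor'],
    'Species': ['Droid', 'Wookiee', 'Human', 'Human'],
})

non_human = df['Species'] != 'Human'
naboo = df['Homeworld'] == 'Naboo'
endor = df['Homeworld'] == 'Endor'
kashyyyk = df['Homeworld'] == 'Kashyyyk'

# Intended logic: non-human AND (any of the three planets)
correct = df[non_human & (naboo | endor | kashyyyk)]

# Without parentheses this is read as (non_human & naboo) | endor | kashyyyk,
# so the human from Endor slips through
wrong = df[non_human & naboo | endor | kashyyyk]

print(correct['Name'].tolist())
print(wrong['Name'].tolist())
```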


In [5]:
# Do the same filtering with the .isin() method
cond_1 = df_starwars['Species'] != 'Human'
cond_2 = df_starwars['Homeworld'].isin(['Naboo', 'Endor', 'Kashyyyk'])

df_starwars[cond_1 & cond_2]

Unnamed: 0,Name,Homeworld,Species
2,R2-D2,Naboo,Droid
11,Chewbacca,Kashyyyk,Wookiee
27,Wicket Systri Warrick,Endor,Ewok
33,Jar Jar Binks,Naboo,Gungan
34,Roos Tarpals,Naboo,Gungan
35,Rugor Nass,Naboo,Gungan
76,Tarfful,Kashyyyk,Wookiee


#### Exercise 2.2
---

> Dataset to use: `fortune_1000.csv`

**2.2A:** Import the dataset of the largest US companies and select the 'Company', 'Location', 'Employees' and 'Profits' columns.

**2.2B:** Create a new column called 'Profits per Employee', calculated by dividing the 'Profits' column by the 'Employees' column.

**2.2C:** Create a new column called 'Company HQ' that combines the values of the 'Company' and 'Location' columns.

##### Exercise 2.2A

In [6]:
# Load the US companies dataset into a DataFrame
df_fortune = pd.read_csv('./data/fortune_1000.csv')
df_fortune.head()

Unnamed: 0,Rank,Company,Sector,Industry,Location,Employees,Revenue,Profits,Change in Rank
0,1,Walmart,Retailing,General Merchandisers,"Bentonville, AR",2100000,648125.0,15511.0,0.0
1,2,Amazon,Retailing,Internet Services and Retailing,"Seattle, WA",1525000,574785.0,30425.0,0.0
2,3,Apple,Technology,"Computers, Office Equipment","Cupertino, CA",161000,383285.0,96995.0,1.0
3,4,UnitedHealth Group,Health Care,Health Care: Insurance and Managed Care,"Minnetonka, MN",440000,371622.0,22381.0,1.0
4,5,Berkshire Hathaway,Financials,Insurance: Property and Casualty (Stock),"Omaha, NE",396500,364482.0,96223.0,2.0


In [7]:
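# Select the 'Company', 'Location', 'Employees' and 'Profits' columns
# (.copy() makes the result an independent DataFrame, so new columns can be added to it later without ambiguity)
df_fortune = df_fortune[['Company', 'Location', 'Employees', 'Profits']].copy()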
df_fortune.head()

Unnamed: 0,Company,Location,Employees,Profits
0,Walmart,"Bentonville, AR",2100000,15511.0
1,Amazon,"Seattle, WA",1525000,30425.0
2,Apple,"Cupertino, CA",161000,96995.0
3,UnitedHealth Group,"Minnetonka, MN",440000,22381.0
4,Berkshire Hathaway,"Omaha, NE",396500,96223.0


##### Exercise 2.2B

In [8]:
# Create the 'Profits per Employee' column
df_fortune['Profits per Employee'] = df_fortune['Profits'] / df_fortune['Employees']
df_fortune.head()

Unnamed: 0,Company,Location,Employees,Profits,Profits per Employee
0,Walmart,"Bentonville, AR",2100000,15511.0,0.007386
1,Amazon,"Seattle, WA",1525000,30425.0,0.019951
2,Apple,"Cupertino, CA",161000,96995.0,0.602453
3,UnitedHealth Group,"Minnetonka, MN",440000,22381.0,0.050866
4,Berkshire Hathaway,"Omaha, NE",396500,96223.0,0.242681


##### Exercise 2.2C

In [9]:
# Create the 'Company HQ' column by concatenating the company name and its location
df_fortune['Company HQ'] = df_fortune['Company'] + ', ' + df_fortune['Location']
df_fortune.head()

Unnamed: 0,Company,Location,Employees,Profits,Profits per Employee,Company HQ
0,Walmart,"Bentonville, AR",2100000,15511.0,0.007386,"Walmart, Bentonville, AR"
1,Amazon,"Seattle, WA",1525000,30425.0,0.019951,"Amazon, Seattle, WA"
2,Apple,"Cupertino, CA",161000,96995.0,0.602453,"Apple, Cupertino, CA"
3,UnitedHealth Group,"Minnetonka, MN",440000,22381.0,0.050866,"UnitedHealth Group, Minnetonka, MN"
4,Berkshire Hathaway,"Omaha, NE",396500,96223.0,0.242681,"Berkshire Hathaway, Omaha, NE"


#### Exercise 2.3
---

> Dataset to use: `imdb_movies.csv`

**2.3A:** Import the movies dataset and check its dimensions (number of rows and columns).

**2.3B:** List all the unique content ratings ('content_rating') present in the movies.

##### Exercise 2.3A

In [10]:
# Load the movies dataset into a DataFrame
df_movies = pd.read_csv('./data/imdb_movies.csv')
df_movies.head()

Unnamed: 0,movie_title,title_year,duration,country,language,genres,content_rating,color,aspect_ratio,gross,...,num_critic_for_reviews,movie_facebook_likes,director_facebook_likes,actor_1_facebook_likes,actor_2_facebook_likes,actor_3_facebook_likes,cast_total_facebook_likes,facenumber_in_poster,plot_keywords,movie_imdb_link
0,Avatar,2009.0,178.0,USA,English,Action|Adventure|Fantasy|Sci-Fi,PG-13,Color,1.78,760505847.0,...,723.0,33000,0.0,1000.0,936.0,855.0,4834,0.0,avatar|future|marine|native|paraplegic,http://www.imdb.com/title/tt0499549/?ref_=fn_t...
1,Pirates of the Caribbean: At World's End,2007.0,169.0,USA,English,Action|Adventure|Fantasy,PG-13,Color,2.35,309404152.0,...,302.0,0,563.0,40000.0,5000.0,1000.0,48350,0.0,goddess|marriage ceremony|marriage proposal|pi...,http://www.imdb.com/title/tt0449088/?ref_=fn_t...
2,Spectre,2015.0,148.0,UK,English,Action|Adventure|Thriller,PG-13,Color,2.35,200074175.0,...,602.0,85000,0.0,11000.0,393.0,161.0,11700,1.0,bomb|espionage|sequel|spy|terrorist,http://www.imdb.com/title/tt2379713/?ref_=fn_t...
3,The Dark Knight Rises,2012.0,164.0,USA,English,Action|Thriller,PG-13,Color,2.35,448130642.0,...,813.0,164000,22000.0,27000.0,23000.0,23000.0,106759,0.0,deception|imprisonment|lawlessness|police offi...,http://www.imdb.com/title/tt1345836/?ref_=fn_t...
4,Star Wars: Episode VII - The Force Awakens ...,,,,,Documentary,,,,,...,,0,131.0,131.0,12.0,,143,0.0,,http://www.imdb.com/title/tt5289954/?ref_=fn_t...


In [11]:
# Check the dimensions of the DataFrame as (rows, columns)
df_movies.shape

(5043, 28)

##### Exercise 2.3B

In [12]:
# Get all the unique values of 'content_rating'
df_movies['content_rating'].unique()

array(['PG-13', nan, 'PG', 'G', 'R', 'TV-14', 'TV-PG', 'TV-MA', 'TV-G',
       'Not Rated', 'Unrated', 'Approved', 'TV-Y', 'NC-17', 'X', 'TV-Y7',
       'GP', 'Passed', 'M'], dtype=object)
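
Note that the result above includes `nan`, because `.unique()` treats missing values as a value of their own. The sketch below, which uses a short hand-made Series rather than the real column, shows two ways to leave them out: `.dropna()` before `.unique()`, and `.nunique()`, which skips missing values by default.

```python
import numpy as np
import pandas as pd

# Short illustrative Series standing in for df_movies['content_rating']
ratings = pd.Series(['PG-13', np.nan, 'PG', 'PG-13', 'R'], name='content_rating')

# .unique() keeps NaN as one of the distinct values
unique_with_nan = ratings.unique()

# Dropping missing values first leaves only real ratings
unique_clean = ratings.dropna().unique()

# .nunique() counts distinct values and ignores NaN by default
n_unique = ratings.nunique()

print(unique_clean, n_unique)
```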

#### Exercise 2.4
---

> Dataset to use: `imdb_movies.csv`

Import the movies dataset, selecting the 'movie_title', 'country', 'director_name' and 'imdb_score' columns, and filter the movies by the following criteria:

* They were produced outside the USA or have an IMDb score above 8.5.
* They were directed by one of the following directors: Peter Jackson, Tim Burton or Steven Spielberg.

👉 To display all the columns of a DataFrame, you can use this pandas setting: `pd.set_option('display.max_columns', None)`

In [13]:
# Set the option to display all columns
pd.set_option('display.max_columns', None)

In [14]:
# Create the three conditions from the stated criteria
cond_1_1 = df_movies['country'] != 'USA'
cond_1_2 = df_movies['imdb_score'] > 8.5
cond_2 = df_movies['director_name'].isin(['Peter Jackson', 'Tim Burton', 'Steven Spielberg'])

# Apply the conditions, selecting only the four columns of interest
df_movies[(cond_1_1 | cond_1_2) & cond_2][['movie_title','country', 'director_name', 'imdb_score']].head(100)

Unnamed: 0,movie_title,country,director_name,imdb_score
20,The Hobbit: The Battle of the Five Armies,New Zealand,Peter Jackson,7.5
25,King Kong,New Zealand,Peter Jackson,7.2
178,The BFG,UK,Steven Spielberg,6.8
270,The Lord of the Rings: The Fellowship of the R...,New Zealand,Peter Jackson,8.8
339,The Lord of the Rings: The Return of the King,USA,Peter Jackson,8.9
340,The Lord of the Rings: The Two Towers,USA,Peter Jackson,8.7
545,Munich,France,Steven Spielberg,7.6
648,Saving Private Ryan,USA,Steven Spielberg,8.6
1874,Schindler's List,USA,Steven Spielberg,8.9
2049,King Kong,New Zealand,Peter Jackson,7.2
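
The row filter and the column selection can also be written as a single `.loc[rows, columns]` call instead of two chained brackets. This is a minimal sketch on a hand-made DataFrame: three rows are copied from the output above and two are clearly hypothetical, added only so that each condition excludes something.

```python
import pandas as pd

# Illustrative data; the two 'Hypothetical' rows are invented for the example
df = pd.DataFrame({
    'movie_title': ['The BFG', 'Saving Private Ryan', 'Munich',
                    'Hypothetical US Film', 'Hypothetical Foreign Film'],
    'country': ['UK', 'USA', 'France', 'USA', 'UK'],
    'director_name': ['Steven Spielberg', 'Steven Spielberg', 'Steven Spielberg',
                      'Steven Spielberg', 'Some Director'],
    'imdb_score': [6.8, 8.6, 7.6, 7.0, 9.0],
})

directors = ['Peter Jackson', 'Tim Burton', 'Steven Spielberg']

# (outside the USA OR score above 8.5) AND directed by one of the three
mask = ((df['country'] != 'USA') | (df['imdb_score'] > 8.5)) & df['director_name'].isin(directors)

# Rows and columns selected in one step
result = df.loc[mask, ['movie_title', 'country', 'director_name', 'imdb_score']]

print(result)
```

Both forms return the same rows; the `.loc` version simply makes the row and column parts of the selection explicit.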


#### Exercise 2.5
---

> Dataset to use: `fortune_1000.csv`

**2.5A:** Import the dataset of the largest US companies and extract the 10 technology-sector companies with the highest revenue ('Revenue').

**2.5B:** Rename the 'Company' column to 'Tech Company', and drop the 'Sector' and 'Rank' columns.

##### Exercise 2.5A

In [15]:
# Load the US companies dataset into a DataFrame
df_fortune = pd.read_csv('./data/fortune_1000.csv')

In [16]:
# Select only the technology companies and sort them by revenue, highest first
tech_fortune = df_fortune[df_fortune['Sector'] == 'Technology'].sort_values(by='Revenue', ascending=False)
tech_fortune.head(10)

Unnamed: 0,Rank,Company,Sector,Industry,Location,Employees,Revenue,Profits,Change in Rank
2,3,Apple,Technology,"Computers, Office Equipment","Cupertino, CA",161000,383285.0,96995.0,1.0
7,8,Alphabet,Technology,Internet Services and Retailing,"Mountain View, CA",182502,307394.0,73795.0,0.0
12,13,Microsoft,Technology,Computer Software,"Redmond, WA",221000,211915.0,72361.0,0.0
29,30,Meta Platforms,Technology,Internet Services and Retailing,"Menlo Park, CA",67317,134902.0,39098.0,1.0
47,48,Dell Technologies,Technology,"Computers, Office Equipment","Round Rock, TX",120000,88425.0,3211.0,-14.0
62,63,IBM,Technology,Information Technology Services,"Armonk, NY",296600,61860.0,7502.0,2.0
64,65,Nvidia,Technology,Semiconductors and Other Electronic Components,"Santa Clara, CA",29600,60922.0,29760.0,87.0
73,74,Cisco Systems,Technology,Network and Other Communications Equipment,"San Jose, CA",84900,56998.0,12613.0,8.0
78,79,Intel,Technology,Semiconductors and Other Electronic Components,"Santa Clara, CA",124800,54228.0,1689.0,-17.0
81,82,HP,Technology,"Computers, Office Equipment","Palo Alto, CA",58000,53718.0,3263.0,-19.0
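
As an alternative to sorting and then taking the head, pandas offers `DataFrame.nlargest`, which returns the top rows by a given column directly. A minimal sketch, assuming a four-row DataFrame built from values shown in the outputs above:

```python
import pandas as pd

# Four illustrative rows taken from the fortune_1000 outputs above
df = pd.DataFrame({
    'Company': ['Walmart', 'Apple', 'Alphabet', 'Microsoft'],
    'Sector': ['Retailing', 'Technology', 'Technology', 'Technology'],
    'Revenue': [648125.0, 383285.0, 307394.0, 211915.0],
})

# Filter to the technology sector, then keep the 2 rows with the highest revenue
top_tech = df[df['Sector'] == 'Technology'].nlargest(2, 'Revenue')

print(top_tech)
```

On the full dataset the call would use `10` instead of `2`.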


##### Exercise 2.5B

In [17]:
# Rename the 'Company' column and drop the 'Sector' and 'Rank' columns
tech_fortune = tech_fortune.rename(columns = {'Company':'Tech Company'}).drop(columns=['Sector', 'Rank'])
tech_fortune.head(10)

Unnamed: 0,Tech Company,Industry,Location,Employees,Revenue,Profits,Change in Rank
2,Apple,"Computers, Office Equipment","Cupertino, CA",161000,383285.0,96995.0,1.0
7,Alphabet,Internet Services and Retailing,"Mountain View, CA",182502,307394.0,73795.0,0.0
12,Microsoft,Computer Software,"Redmond, WA",221000,211915.0,72361.0,0.0
29,Meta Platforms,Internet Services and Retailing,"Menlo Park, CA",67317,134902.0,39098.0,1.0
47,Dell Technologies,"Computers, Office Equipment","Round Rock, TX",120000,88425.0,3211.0,-14.0
62,IBM,Information Technology Services,"Armonk, NY",296600,61860.0,7502.0,2.0
64,Nvidia,Semiconductors and Other Electronic Components,"Santa Clara, CA",29600,60922.0,29760.0,87.0
73,Cisco Systems,Network and Other Communications Equipment,"San Jose, CA",84900,56998.0,12613.0,8.0
78,Intel,Semiconductors and Other Electronic Components,"Santa Clara, CA",124800,54228.0,1689.0,-17.0
81,HP,"Computers, Office Equipment","Palo Alto, CA",58000,53718.0,3263.0,-19.0


#### Exercise 2.6
---

> Dataset to use: `imdb_movies.csv`

Import the movies dataset and extract the 10 animation movies with the highest 'imdb_score', selecting only the 'movie_title', 'title_year' and 'imdb_score' fields.

In [18]:
# Load the movies dataset into a DataFrame
df_movies = pd.read_csv('./data/imdb_movies.csv')

In [19]:
# Filter the movies whose genre field contains 'Animation'
cond = df_movies['genres'].str.contains('Animation', regex=False, case=False)

animation_movies = df_movies[cond]
animation_movies.head()

Unnamed: 0,movie_title,title_year,duration,country,language,genres,content_rating,color,aspect_ratio,gross,budget,director_name,actor_1_name,actor_2_name,actor_3_name,imdb_score,num_voted_users,num_user_for_reviews,num_critic_for_reviews,movie_facebook_likes,director_facebook_likes,actor_1_facebook_likes,actor_2_facebook_likes,actor_3_facebook_likes,cast_total_facebook_likes,facenumber_in_poster,plot_keywords,movie_imdb_link
7,Tangled,2010.0,100.0,USA,English,Adventure|Animation|Comedy|Family|Fantasy|Musi...,PG,Color,1.85,200807262.0,260000000.0,Nathan Greno,Brad Garrett,Donna Murphy,M.C. Gainey,7.8,294810,387.0,324.0,29000,15.0,799.0,553.0,284.0,2036,1.0,17th century|based on fairy tale|disney|flower...,http://www.imdb.com/title/tt0398286/?ref_=fn_t...
35,Monsters University,2013.0,104.0,USA,English,Adventure|Animation|Comedy|Family|Fantasy,G,Color,1.85,268488329.0,200000000.0,Dan Scanlon,Steve Buscemi,Tyler Labine,Sean Hayes,7.3,235025,265.0,376.0,44000,37.0,12000.0,779.0,760.0,14863,0.0,cheating|fraternity|monster|singing in a car|u...,http://www.imdb.com/title/tt1453405/?ref_=fn_t...
41,Cars 2,2011.0,106.0,USA,English,Adventure|Animation|Comedy|Family|Sport,G,Color,2.35,191450875.0,200000000.0,John Lasseter,Joe Mantegna,Thomas Kretschmann,Eddie Izzard,6.3,101178,283.0,304.0,10000,487.0,1000.0,919.0,776.0,4482,0.0,best friend|car race|conspiracy|gadget car|spy,http://www.imdb.com/title/tt1216475/?ref_=fn_t...
43,Toy Story 3,2010.0,103.0,USA,English,Adventure|Animation|Comedy|Family|Fantasy,G,Color,1.85,414984497.0,200000000.0,Lee Unkrich,Tom Hanks,John Ratzenberger,Don Rickles,8.3,544884,733.0,453.0,30000,125.0,15000.0,1000.0,721.0,19085,3.0,college|day care|escape|teddy bear|toy,http://www.imdb.com/title/tt0435761/?ref_=fn_t...
55,The Good Dinosaur,2015.0,93.0,USA,English,Adventure|Animation|Comedy|Family|Fantasy,PG,Color,2.35,123070338.0,,Peter Sohn,A.J. Buckley,Jack McGraw,Peter Sohn,6.8,62836,345.0,298.0,20000,113.0,275.0,150.0,113.0,696,0.0,apatosaurus|asteroid|dinosaur|fear|river,http://www.imdb.com/title/tt1979388/?ref_=fn_t...
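
The filter above relies on every row having a value in 'genres'. If the column could contain missing values, passing `na=False` to `.str.contains()` makes those rows count as non-matches, so the mask stays purely boolean. A minimal sketch on a hand-made Series (the missing entry is an assumption added for illustration):

```python
import numpy as np
import pandas as pd

# Illustrative genres column with one missing value
genres = pd.Series(['Adventure|Animation|Comedy', 'Action|Thriller', np.nan])

# na=False treats missing values as "does not contain"
mask = genres.str.contains('Animation', regex=False, case=False, na=False)

print(mask.tolist())
```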


In [20]:
# Select the 'movie_title', 'title_year' and 'imdb_score' columns from the filtered DataFrame, sorting by 'imdb_score' in descending order
animation_movies[['movie_title', 'title_year', 'imdb_score']].sort_values(by='imdb_score', ascending=False).head(10)

Unnamed: 0,movie_title,title_year,imdb_score
2373,Spirited Away,2001.0,8.6
509,The Lion King,1994.0,8.5
4937,A Charlie Brown Christmas,1965.0,8.4
58,WALL·E,2008.0,8.4
2323,Princess Mononoke,1997.0,8.4
4017,"Batman: The Dark Knight Returns, Part 2",2013.0,8.4
43,Toy Story 3,2010.0,8.3
1588,Toy Story,1995.0,8.3
1947,Shaun the Sheep,,8.3
67,Up,2009.0,8.3


#### Exercise 2.7
---

> Dataset to use: `imdb_movies.csv`

**2.7A:** Import the movies dataset and count the number of Spanish movies.

**2.7B:** Extract the 10 Spanish movies with the highest gross earnings ('gross'), provided they have at least 100 user reviews ('num_user_for_reviews'). Extract only the 'movie_title', 'title_year', 'actor_1_name' and 'gross' columns.

**2.7C:** In the original DataFrame, change the value of 'actor_1_name' for the movie 'The Legend of Zorro' to 'Antonio Banderas'.

##### Exercise 2.7A

In [21]:
# Load the movies dataset into a DataFrame
df_movies = pd.read_csv('./data/imdb_movies.csv')

In [22]:
# Filter to select only the Spanish movies (here, those whose 'language' is 'Spanish')
spanish_movies = df_movies[df_movies['language'] == 'Spanish']

# Print the number of rows, i.e. the number of Spanish movies
print(f"Number of Spanish movies: {spanish_movies.shape[0]}")

Number of Spanish movies: 40


##### Exercise 2.7B

In [23]:
# Create the condition that selects the movies with 100 or more user reviews
cond = spanish_movies['num_user_for_reviews'] >= 100

# Apply the condition and select only the 'movie_title', 'title_year', 'actor_1_name' and 'gross' columns
spanish_movies_filtered = spanish_movies[cond][['movie_title', 'title_year', 'actor_1_name', 'gross']]

# Sort the movies by gross earnings, highest first, to get the top 10
spanish_movies_filtered.sort_values(by='gross', ascending=False).head(10)

Unnamed: 0,movie_title,title_year,actor_1_name,gross
484,The Legend of Zorro,2005.0,Michael Emerson,45356386.0
2551,Pan's Labyrinth,2006.0,Ivana Baquero,37623143.0
4000,The Secret in Their Eyes,2009.0,Ricardo Darín,20167424.0
3740,Y Tu Mamá También,2001.0,Maribel Verdú,13622333.0
3268,Volver,2006.0,Carmen Maura,12899702.0


##### Exercise 2.7C

In [24]:
# Update 'actor_1_name' for the movie 'The Legend of Zorro' (row label 484)
df_movies.loc[484, 'actor_1_name'] = 'Antonio Banderas'

In [25]:
# Check that the change has been made
df_movies.loc[484, 'actor_1_name']

'Antonio Banderas'

In [26]:
# Filter the original DataFrame to show the row for 'The Legend of Zorro'
df_movies[df_movies['movie_title'].str.contains("The Legend of Zorro", case=False, na=False)]

Unnamed: 0,movie_title,title_year,duration,country,language,genres,content_rating,color,aspect_ratio,gross,budget,director_name,actor_1_name,actor_2_name,actor_3_name,imdb_score,num_voted_users,num_user_for_reviews,num_critic_for_reviews,movie_facebook_likes,director_facebook_likes,actor_1_facebook_likes,actor_2_facebook_likes,actor_3_facebook_likes,cast_total_facebook_likes,facenumber_in_poster,plot_keywords,movie_imdb_link
484,The Legend of Zorro,2005.0,129.0,USA,Spanish,Action|Adventure|Western,PG,Color,2.35,45356386.0,75000000.0,Martin Campbell,Antonio Banderas,Nick Chinlund,Adrian Alonso,5.9,71574,244.0,137.0,951,258.0,2000.0,277.0,163.0,2864,1.0,california|fight|hero|mask|zorro,http://www.imdb.com/title/tt0386140/?ref_=fn_t...
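
The update above uses the row label 484, read off the earlier output. A variant that does not depend on knowing the label is to pass a boolean mask to `.loc`. The sketch below uses a two-row DataFrame built from values in the outputs above; the `.str.strip()` call is an assumption, included in case titles carry stray whitespace.

```python
import pandas as pd

# Two illustrative rows, keeping the original row labels
df = pd.DataFrame(
    {
        'movie_title': ['The Legend of Zorro', 'Volver'],
        'actor_1_name': ['Michael Emerson', 'Carmen Maura'],
    },
    index=[484, 3268],
)

# Find the row by title instead of by a hard-coded label
is_zorro = df['movie_title'].str.strip() == 'The Legend of Zorro'

# .loc with a boolean mask updates every matching row in place
df.loc[is_zorro, 'actor_1_name'] = 'Antonio Banderas'

print(df)
```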


#### Exercise 2.8
---

> Dataset to use: `star_wars.csv`

**2.8A:** Import the Star Wars characters dataset and create a new column in the DataFrame that classifies the movie titles ('Films') according to whether they belong to the original trilogy. Assign the value 'Original trilogy' to those that include 'A New Hope', 'The Empire Strikes Back' or 'Return of the Jedi', and 'New films' to the rest.

**2.8B:** Export the modified DataFrame to an Excel file.

##### Exercise 2.8A

In [27]:
# Load the Star Wars dataset into a DataFrame
df_starwars = pd.read_csv('./data/star_wars.csv')
df_starwars.head()

Unnamed: 0,Name,Birth Year,Homeworld,Species,Gender,Height,Mass,Skin Color,Hair Color,Eye Color,Films
0,Luke Skywalker,19.0,Tatooine,Human,male,172.0,77.0,fair,blond,blue,"A New Hope, The Empire Strikes Back, Return of..."
1,C-3PO,112.0,Tatooine,Droid,none,167.0,75.0,gold,none,yellow,"A New Hope, The Empire Strikes Back, Return of..."
2,R2-D2,33.0,Naboo,Droid,none,96.0,32.0,"white, blue",none,red,"A New Hope, The Empire Strikes Back, Return of..."
3,Darth Vader,41.9,Tatooine,Human,male,202.0,136.0,white,none,yellow,"A New Hope, The Empire Strikes Back, Return of..."
4,Leia Organa,19.0,Alderaan,Human,female,150.0,49.0,light,brown,brown,"A New Hope, The Empire Strikes Back, Return of..."


In [28]:
# Solve it with the '.apply()' method and a 'lambda' function

# Create the 'Triology' column, assigning 'Original trilogy' to rows whose 'Films' value contains 'Hope', 'Empire' or 'Return', and 'New films' to the rest
df_starwars['Triology'] = df_starwars['Films'].apply(lambda x: 'Original trilogy' if ('Hope' in x or 'Empire' in x or 'Return' in x) else 'New films')
df_starwars

Unnamed: 0,Name,Birth Year,Homeworld,Species,Gender,Height,Mass,Skin Color,Hair Color,Eye Color,Films,Triology
0,Luke Skywalker,19.0,Tatooine,Human,male,172.0,77.0,fair,blond,blue,"A New Hope, The Empire Strikes Back, Return of...",Original trilogy
1,C-3PO,112.0,Tatooine,Droid,none,167.0,75.0,gold,none,yellow,"A New Hope, The Empire Strikes Back, Return of...",Original trilogy
2,R2-D2,33.0,Naboo,Droid,none,96.0,32.0,"white, blue",none,red,"A New Hope, The Empire Strikes Back, Return of...",Original trilogy
3,Darth Vader,41.9,Tatooine,Human,male,202.0,136.0,white,none,yellow,"A New Hope, The Empire Strikes Back, Return of...",Original trilogy
4,Leia Organa,19.0,Alderaan,Human,female,150.0,49.0,light,brown,brown,"A New Hope, The Empire Strikes Back, Return of...",Original trilogy
...,...,...,...,...,...,...,...,...,...,...,...,...
80,Finn,,,Human,male,,,dark,black,dark,The Force Awakens,New films
81,Rey,,,Human,female,,,light,brown,hazel,The Force Awakens,New films
82,Poe Dameron,,,Human,male,,,light,brown,brown,The Force Awakens,New films
83,BB8,,,Droid,none,,,none,none,black,The Force Awakens,New films


In [29]:
# Solve it with the 'np.where()' function

# Import NumPy
import numpy as np

# Create conditions that check whether the titles contain keywords from the original trilogy
cond_IV = df_starwars['Films'].str.contains('Hope', regex=False, case=False)
cond_V = df_starwars['Films'].str.contains('Empire', regex=False, case=False)
cond_VI = df_starwars['Films'].str.contains('Return', regex=False, case=False)

# Classify the films as 'Original trilogy' or 'New films' in a new column with 'np.where()'
df_starwars['Triology'] = np.where(cond_IV | cond_V | cond_VI, 'Original trilogy', 'New films')
df_starwars

Unnamed: 0,Name,Birth Year,Homeworld,Species,Gender,Height,Mass,Skin Color,Hair Color,Eye Color,Films,Triology
0,Luke Skywalker,19.0,Tatooine,Human,male,172.0,77.0,fair,blond,blue,"A New Hope, The Empire Strikes Back, Return of...",Original trilogy
1,C-3PO,112.0,Tatooine,Droid,none,167.0,75.0,gold,none,yellow,"A New Hope, The Empire Strikes Back, Return of...",Original trilogy
2,R2-D2,33.0,Naboo,Droid,none,96.0,32.0,"white, blue",none,red,"A New Hope, The Empire Strikes Back, Return of...",Original trilogy
3,Darth Vader,41.9,Tatooine,Human,male,202.0,136.0,white,none,yellow,"A New Hope, The Empire Strikes Back, Return of...",Original trilogy
4,Leia Organa,19.0,Alderaan,Human,female,150.0,49.0,light,brown,brown,"A New Hope, The Empire Strikes Back, Return of...",Original trilogy
...,...,...,...,...,...,...,...,...,...,...,...,...
80,Finn,,,Human,male,,,dark,black,dark,The Force Awakens,New films
81,Rey,,,Human,female,,,light,brown,hazel,The Force Awakens,New films
82,Poe Dameron,,,Human,male,,,light,brown,brown,The Force Awakens,New films
83,BB8,,,Droid,none,,,none,none,black,The Force Awakens,New films
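
The three keyword checks can also be folded into one `.str.contains()` call with a regular-expression alternation. The sketch below uses a short hand-made Series in place of the real 'Films' column (the strings and the missing entry are illustrative assumptions); `na=False` makes a missing value fall into 'New films' rather than propagate as missing.

```python
import numpy as np
import pandas as pd

# Illustrative stand-in for df_starwars['Films']
films = pd.Series(
    ['A New Hope, The Empire Strikes Back', 'The Force Awakens', 'Return of the Jedi', np.nan],
    name='Films',
)

# 'Hope|Empire|Return' matches if any of the three keywords appears
is_original = films.str.contains('Hope|Empire|Return', regex=True, na=False)

# np.where picks the first label where the mask is True, the second elsewhere
labels = np.where(is_original, 'Original trilogy', 'New films')

print(labels.tolist())
```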


##### Exercise 2.8B

In [30]:
# Export the DataFrame to an Excel file, naming the sheet 'Star Wars Characters' and omitting the index
df_starwars.to_excel('star_wars_modified.xlsx', sheet_name='Star Wars Characters', index=False)