# Introduction to Programming for Data Science
## The R programming language
_Rocío Romero Zaliz_ - rocio@decsai.ugr.es

# Tidyverse
A collection of packages that share a common grammar, philosophy and structure (https://tidyverse.tidyverse.org)

<i>Wickham H, Averick M, Bryan J, Chang W, McGowan LD, François R, Grolemund G, Hayes A, Henry L, Hester J, Kuhn M, Pedersen TL, Miller E, Bache SM, Müller K, Ooms J, Robinson D, Seidel DP, Spinu V, Takahashi K, Vaughan D, Wilke C, Woo K, Yutani H (2019). “Welcome to the tidyverse.” Journal of Open Source Software, 4(43), 1686. doi:10.21105/joss.01686.</i>

In [1]:
# Load the package
library(tidyverse)

── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.2     ✔ tibble    3.3.0
✔ lubridate 1.9.4     ✔ tidyr     1.3.1
✔ purrr     1.0.4     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors


## Cheatsheets
https://posit.co/resources/cheatsheets/

In [2]:
class(mtcars)

In [3]:
mtcars

,mpg,cyl,disp,hp,drat,wt,qsec,vs,am,gear,carb
,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>
Mazda RX4,21.0,6,160.0,110,3.9,2.62,16.46,0,1,4,4
Mazda RX4 Wag,21.0,6,160.0,110,3.9,2.875,17.02,0,1,4,4
Datsun 710,22.8,4,108.0,93,3.85,2.32,18.61,1,1,4,1
Hornet 4 Drive,21.4,6,258.0,110,3.08,3.215,19.44,1,0,3,1
Hornet Sportabout,18.7,8,360.0,175,3.15,3.44,17.02,0,0,3,2
Valiant,18.1,6,225.0,105,2.76,3.46,20.22,1,0,3,1
Duster 360,14.3,8,360.0,245,3.21,3.57,15.84,0,0,3,4
Merc 240D,24.4,4,146.7,62,3.69,3.19,20.0,1,0,4,2
Merc 230,22.8,4,140.8,95,3.92,3.15,22.9,1,0,4,2
Merc 280,19.2,6,167.6,123,3.92,3.44,18.3,1,0,4,4


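## The pipe (%>%) - magrittr package
An operator for chaining several operations in sequence, without nesting parentheses or repeatedly overwriting intermediate data sets. The expression `x %>% f(y)` is evaluated as `f(x, y)`: the left-hand side becomes the first argument of the function on the right.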

In [4]:
# Without magrittr: nested calls have to be read from the inside out
x <- c(1, 4, 6, 8)
y <- round(mean(sqrt(log(x))), 2)
y

In [5]:
# With magrittr (%>% is already available after library(tidyverse))
# library(magrittr)
x <- c(1, 4, 6, 8)
y <- x %>% log() %>% sqrt() %>% mean() %>% round(2)
y

In [6]:
# With magrittr: the parentheses can be omitted when no extra arguments are passed
x <- c(1, 4, 6, 8)
y <- x %>% log %>% sqrt %>% mean %>% round(2)
y
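In both versions, `x %>% f(y)` is evaluated as `f(x, y)`. A short sketch of that rule, of magrittr's `.` placeholder (which sends the left-hand side to a different argument), and of the native pipe `|>`, which assumes R 4.1 or later:

```r
library(magrittr)

x <- c(1, 4, 6, 8)

# x %>% f(y) is evaluated as f(x, y)
piped  <- x %>% round(1)
nested <- round(x, 1)

# The dot placeholder sends the left-hand side to another argument instead
odds <- 2 %>% seq(1, 9, by = .)   # same as seq(1, 9, by = 2): 1 3 5 7 9

# Native pipe (R >= 4.1): also inserts the left-hand side as the first argument
native <- x |> log() |> sqrt() |> mean() |> round(2)
print(native)   # 0.99, same result as the magrittr chain above
```

The native pipe needs no package, but it has no `.` placeholder, so the magrittr pipe is used in the rest of this notebook.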

In [7]:
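# A variable called log does not hide the log() function:
# when a name is used in a call, R only looks for objects that are functions
log <- 2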
log

In [8]:
log(2)

## The dplyr package
dplyr is an R package for manipulating, cleaning and summarising tabular data (data frames). It makes exploring and transforming data in R easier and faster.
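The verbs covered below are designed to be chained with the pipe. A minimal sketch using the `starwars` data set that ships with dplyr; the thresholds and the `height_m` column are illustrative choices, not part of the original notebook:

```r
library(dplyr)

tall_humans <- starwars %>%
  filter(species == "Human", height > 190) %>%  # keep the rows that match
  mutate(height_m = height / 100) %>%           # add a column (cm -> m)
  arrange(desc(height_m)) %>%                   # sort rows, tallest first
  select(name, height_m)                        # keep only two columns

print(tall_humans)
```

Each verb takes a data frame as its first argument and returns a new data frame, which is what makes the chain possible.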

In [12]:
# We will work with a data set that ships with dplyr
# library(dplyr)
starwars %>% print

[90m# A tibble: 87 × 14[39m
   name     height  mass hair_color skin_color eye_color birth_year sex   gender
   [3m[90m<chr>[39m[23m     [3m[90m<int>[39m[23m [3m[90m<dbl>[39m[23m [3m[90m<chr>[39m[23m      [3m[90m<chr>[39m[23m      [3m[90m<chr>[39m[23m          [3m[90m<dbl>[39m[23m [3m[90m<chr>[39m[23m [3m[90m<chr>[39m[23m 
[90m 1[39m Luke Sk…    172    77 blond      fair       blue            19   male  mascu…
[90m 2[39m C-3PO       167    75 [31mNA[39m         gold       yellow         112   none  mascu…
[90m 3[39m R2-D2        96    32 [31mNA[39m         white, bl… red             33   none  mascu…
[90m 4[39m Darth V…    202   136 none       white      yellow          41.9 male  mascu…
[90m 5[39m Leia Or…    150    49 brown      light      brown           19   fema… femin…
[90m 6[39m Owen La…    178   120 brown, gr… light      blue            52   male  mascu…
[90m 7[39m Beru Wh…    165    75 brown      light      blue          

In [19]:
print(class(mtcars))
mtcars$pepe  # data.frame: a column that does not exist silently returns NULL

[1] "data.frame"


NULL

In [None]:
# Tibble: accessing a column that does not exist triggers a warning...
starwars$pepe

“Unknown or uninitialised column: `pepe`.”


NULL

In [21]:
starwars %>% str

tibble [87 × 14] (S3: tbl_df/tbl/data.frame)
 $ name      : chr [1:87] "Luke Skywalker" "C-3PO" "R2-D2" "Darth Vader" ...
 $ height    : int [1:87] 172 167 96 202 150 178 165 97 183 182 ...
 $ mass      : num [1:87] 77 75 32 136 49 120 75 32 84 77 ...
 $ hair_color: chr [1:87] "blond" NA NA "none" ...
 $ skin_color: chr [1:87] "fair" "gold" "white, blue" "white" ...
 $ eye_color : chr [1:87] "blue" "yellow" "red" "yellow" ...
 $ birth_year: num [1:87] 19 112 33 41.9 19 52 47 NA 24 57 ...
 $ sex       : chr [1:87] "male" "none" "none" "male" ...
 $ gender    : chr [1:87] "masculine" "masculine" "masculine" "masculine" ...
 $ homeworld : chr [1:87] "Tatooine" "Tatooine" "Naboo" "Tatooine" ...
 $ species   : chr [1:87] "Human" "Droid" "Droid" "Human" ...
 $ films     :List of 87
  ..$ : chr [1:5] "A New Hope" "The Empire Strikes Back" "Return of the Jedi" "Revenge of the Sith" ...
  ..$ : chr [1:6] "A New Hope" "The Empire Strikes Back" "Return of the Jedi" "The Phantom Menace" ...
  ..$ 

In [13]:
# Fails: starwars$films has 87 elements, but mtcars has only 32 rows
mtcars$pepe <- starwars$films

ERROR: Error in `$<-.data.frame`(`*tmp*`, pepe, value = list(c("A New Hope", : replacement has 87 rows, data has 32


In [14]:
class(starwars)
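`starwars` is a tibble (class `tbl_df`), a subclass of data frame with stricter behaviour. Besides warning about unknown columns, a tibble never simplifies a one-column selection into a vector. A small sketch with made-up values:

```r
library(tibble)

df <- data.frame(height = c(172, 167, 96), mass = c(77, 75, 32))
tb <- as_tibble(df)

# data.frame: selecting a single column with [ , ] silently drops to a vector
class(df[, "height"])   # "numeric"

# tibble: [ , ] never drops, the result is still a tibble
class(tb[, "height"])   # "tbl_df" "tbl" "data.frame"
```

Because the tibble classes extend `data.frame`, functions written for data frames keep working on tibbles.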

### filter()
Selects the rows of a data frame that satisfy one or more conditions; rows where a condition evaluates to `NA` are dropped.

In [15]:
mtcars[mtcars$hp > 100,]

,mpg,cyl,disp,hp,drat,wt,qsec,vs,am,gear,carb
,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>
Mazda RX4,21.0,6,160.0,110,3.9,2.62,16.46,0,1,4,4
Mazda RX4 Wag,21.0,6,160.0,110,3.9,2.875,17.02,0,1,4,4
Hornet 4 Drive,21.4,6,258.0,110,3.08,3.215,19.44,1,0,3,1
Hornet Sportabout,18.7,8,360.0,175,3.15,3.44,17.02,0,0,3,2
Valiant,18.1,6,225.0,105,2.76,3.46,20.22,1,0,3,1
Duster 360,14.3,8,360.0,245,3.21,3.57,15.84,0,0,3,4
Merc 280,19.2,6,167.6,123,3.92,3.44,18.3,1,0,4,4
Merc 280C,17.8,6,167.6,123,3.92,3.44,18.9,1,0,4,4
Merc 450SE,16.4,8,275.8,180,3.07,4.07,17.4,0,0,3,3
Merc 450SL,17.3,8,275.8,180,3.07,3.73,17.6,0,0,3,3


In [16]:
# Looking for the droids...
# Outside the tidyverse: rows with a missing species have to be excluded explicitly
print(starwars[!is.na(starwars$species) & starwars$species == "Droid",])

# A tibble: 6 × 14
  name   height  mass hair_color skin_color  eye_color birth_year sex   gender  
  <chr>   <int> <dbl> <chr>      <chr>       <chr>          <dbl> <chr> <chr>   
1 C-3PO     167    75 NA         gold        yellow           112 none  masculi…
2 R2-D2      96    32 NA         white, blue red               33 none  masculi…
3 R5-D4      97    32 NA         white, red  red               NA none  masculi…
4 IG-88     200   140 none       metal       red               15 none  masculi…
5 R4-P17     96    NA none       silver, red red, blue         NA none  feminine
6 BB8        NA    NA none       none        black             NA none  masculi…

In [17]:
# Inside the tidyverse: filter() already drops rows where the condition is NA
starwars %>% filter(species == "Droid") %>% print

# A tibble: 6 × 14
  name   height  mass hair_color skin_color  eye_color birth_year sex   gender  
  <chr>   <int> <dbl> <chr>      <chr>       <chr>          <dbl> <chr> <chr>   
1 C-3PO     167    75 NA         gold        yellow           112 none  masculi…
2 R2-D2      96    32 NA         white, blue red               33 none  masculi…
3 R5-D4      97    32 NA         white, red  red               NA none  masculi…
4 IG-88     200   140 none       metal       red               15 none  masculi…
5 R4-P17     96    NA none       silver, red red, blue         NA none  feminine
6 BB8        NA    NA none       none        black             NA none  masculi…

In [18]:
# The same call without the pipe: the data frame is passed as the first argument
filter(starwars, species == "Droid") %>% print

# A tibble: 6 × 14
  name   height  mass hair_color skin_color  eye_color birth_year sex   gender  
  <chr>   <int> <dbl> <chr>      <chr>       <chr>          <dbl> <chr> <chr>   
1 C-3PO     167    75 NA         gold        yellow           112 none  masculi…
2 R2-D2      96    32 NA         white, blue red               33 none  masculi…
3 R5-D4      97    32 NA         white, red  red               NA none  masculi…
4 IG-88     200   140 none       metal       red               15 none  masculi…
5 R4-P17     96    NA none       silver, red red, blue         NA none  feminine
6 BB8        NA    NA none       none        black             NA none  masculi…

In [19]:
starwars %>% filter(species == "Droid") %>% filter(homeworld == "Naboo") %>% print

[90m# A tibble: 1 × 14[39m
  name  height  mass hair_color skin_color  eye_color birth_year sex   gender   
  [3m[90m<chr>[39m[23m  [3m[90m<int>[39m[23m [3m[90m<dbl>[39m[23m [3m[90m<chr>[39m[23m      [3m[90m<chr>[39m[23m       [3m[90m<chr>[39m[23m          [3m[90m<dbl>[39m[23m [3m[90m<chr>[39m[23m [3m[90m<chr>[39m[23m    
[90m1[39m R2-D2     96    32 [31mNA[39m         white, blue red               33 none  masculine
[90m# ℹ 5 more variables: homeworld <chr>, species <chr>, films <list>,[39m
[90m#   vehicles <list>, starships <list>[39m


In [20]:
starwars %>% filter(species == "Droid") %>% filter(height < 100) %>% print

[90m# A tibble: 3 × 14[39m
  name   height  mass hair_color skin_color  eye_color birth_year sex   gender  
  [3m[90m<chr>[39m[23m   [3m[90m<int>[39m[23m [3m[90m<dbl>[39m[23m [3m[90m<chr>[39m[23m      [3m[90m<chr>[39m[23m       [3m[90m<chr>[39m[23m          [3m[90m<dbl>[39m[23m [3m[90m<chr>[39m[23m [3m[90m<chr>[39m[23m   
[90m1[39m R2-D2      96    32 [31mNA[39m         white, blue red               33 none  masculi…
[90m2[39m R5-D4      97    32 [31mNA[39m         white, red  red               [31mNA[39m none  masculi…
[90m3[39m R4-P17     96    [31mNA[39m none       silver, red red, blue         [31mNA[39m none  feminine
[90m# ℹ 5 more variables: homeworld <chr>, species <chr>, films <list>,[39m
[90m#   vehicles <list>, starships <list>[39m


In [21]:
starwars %>% filter(species == "Droid") %>%
    filter(height < 100 | is.na(height)) %>% print

[90m# A tibble: 4 × 14[39m
  name   height  mass hair_color skin_color  eye_color birth_year sex   gender  
  [3m[90m<chr>[39m[23m   [3m[90m<int>[39m[23m [3m[90m<dbl>[39m[23m [3m[90m<chr>[39m[23m      [3m[90m<chr>[39m[23m       [3m[90m<chr>[39m[23m          [3m[90m<dbl>[39m[23m [3m[90m<chr>[39m[23m [3m[90m<chr>[39m[23m   
[90m1[39m R2-D2      96    32 [31mNA[39m         white, blue red               33 none  masculi…
[90m2[39m R5-D4      97    32 [31mNA[39m         white, red  red               [31mNA[39m none  masculi…
[90m3[39m R4-P17     96    [31mNA[39m none       silver, red red, blue         [31mNA[39m none  feminine
[90m4[39m BB8        [31mNA[39m    [31mNA[39m none       none        black             [31mNA[39m none  masculi…
[90m# ℹ 5 more variables: homeworld <chr>, species <chr>, films <list>,[39m
[90m#   vehicles <list>, starships <list>[39m


In [22]:
starwars %>% filter(species == "Droid") %>%
    filter(height >= 96 & height < 200) %>% print

[90m# A tibble: 4 × 14[39m
  name   height  mass hair_color skin_color  eye_color birth_year sex   gender  
  [3m[90m<chr>[39m[23m   [3m[90m<int>[39m[23m [3m[90m<dbl>[39m[23m [3m[90m<chr>[39m[23m      [3m[90m<chr>[39m[23m       [3m[90m<chr>[39m[23m          [3m[90m<dbl>[39m[23m [3m[90m<chr>[39m[23m [3m[90m<chr>[39m[23m   
[90m1[39m C-3PO     167    75 [31mNA[39m         gold        yellow           112 none  masculi…
[90m2[39m R2-D2      96    32 [31mNA[39m         white, blue red               33 none  masculi…
[90m3[39m R5-D4      97    32 [31mNA[39m         white, red  red               [31mNA[39m none  masculi…
[90m4[39m R4-P17     96    [31mNA[39m none       silver, red red, blue         [31mNA[39m none  feminine
[90m# ℹ 5 more variables: homeworld <chr>, species <chr>, films <list>,[39m
[90m#   vehicles <list>, starships <list>[39m


In [23]:
# Conditions separated by commas are combined with AND (&)
starwars %>% filter(species == "Droid", height >= 96 & height < 200) %>% print

# A tibble: 4 × 14
  name   height  mass hair_color skin_color  eye_color birth_year sex   gender  
  <chr>   <int> <dbl> <chr>      <chr>       <chr>          <dbl> <chr> <chr>   
1 C-3PO     167    75 NA         gold        yellow           112 none  masculi…
2 R2-D2      96    32 NA         white, blue red               33 none  masculi…
3 R5-D4      97    32 NA         white, red  red               NA none  masculi…
4 R4-P17     96    NA none       silver, red red, blue         NA none  feminine
# ℹ 5 more variables: homeworld <chr>, species <chr>, films <list>,
#   vehicles <list>, starships <list>
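A range condition such as `height >= 96 & height < 200` can also be written with dplyr's `between()`, but note that `between(x, left, right)` includes both bounds, so the upper limit has to be adjusted. A sketch with illustrative values:

```r
library(dplyr)

h <- c(90L, 96L, 150L, 200L)

between(h, 96, 200)   # FALSE TRUE TRUE TRUE  (both bounds inclusive)
h >= 96 & h < 200     # FALSE TRUE TRUE FALSE

# Same 4 droids as in the previous cell: heights are integers, so < 200 is <= 199;
# IG-88 (height 200) is excluded and BB8 (height NA) is dropped by filter()
droids <- starwars %>% filter(species == "Droid", between(height, 96, 199))
print(droids)
```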


In [24]:
starwars %>% filter(species == "Droid") %>% 
    filter(height >= 96 & homeworld %in% c("Naboo", "Tatooine")) %>% print

[90m# A tibble: 3 × 14[39m
  name  height  mass hair_color skin_color  eye_color birth_year sex   gender   
  [3m[90m<chr>[39m[23m  [3m[90m<int>[39m[23m [3m[90m<dbl>[39m[23m [3m[90m<chr>[39m[23m      [3m[90m<chr>[39m[23m       [3m[90m<chr>[39m[23m          [3m[90m<dbl>[39m[23m [3m[90m<chr>[39m[23m [3m[90m<chr>[39m[23m    
[90m1[39m C-3PO    167    75 [31mNA[39m         gold        yellow           112 none  masculine
[90m2[39m R2-D2     96    32 [31mNA[39m         white, blue red               33 none  masculine
[90m3[39m R5-D4     97    32 [31mNA[39m         white, red  red               [31mNA[39m none  masculine
[90m# ℹ 5 more variables: homeworld <chr>, species <chr>, films <list>,[39m
[90m#   vehicles <list>, starships <list>[39m


In [27]:
print(starwars[10:15,])  # base R: rows 10 to 15, selected by position

# A tibble: 6 × 14
  name      height  mass hair_color skin_color eye_color birth_year sex   gender
  <chr>      <int> <dbl> <chr>      <chr>      <chr>          <dbl> <chr> <chr> 
1 Obi-Wan …    182    77 auburn, w… fair       blue-gray       57   male  mascu…
2 Anakin S…    188    84 blond      fair       blue            41.9 male  mascu…
3 Wilhuff …    180    NA auburn, g… fair       blue            64   male  mascu…
4 Chewbacca    228   112 brown      unknown    blue           200   male  mascu…
5 Han Solo     180    80 brown      fair       brown           29   male  mascu…
6 Greedo       173    74 NA         green      black           44   male  mascu…
# ℹ 5 more variables: homeworld <chr>, species <chr>, films <list>,

In [25]:
starwars %>% slice(10:15) %>% print  # dplyr: the same rows, selected with slice()

# A tibble: 6 × 14
  name      height  mass hair_color skin_color eye_color birth_year sex   gender
  <chr>      <int> <dbl> <chr>      <chr>      <chr>          <dbl> <chr> <chr> 
1 Obi-Wan …    182    77 auburn, w… fair       blue-gray       57   male  mascu…
2 Anakin S…    188    84 blond      fair       blue            41.9 male  mascu…
3 Wilhuff …    180    NA auburn, g… fair       blue            64   male  mascu…
4 Chewbacca    228   112 brown      unknown    blue           200   male  mascu…
5 Han Solo     180    80 brown      fair       brown           29   male  mascu…
6 Greedo       173    74 NA         green      black           44   male  mascu…
# ℹ 5 more variables: homeworld <chr>, species <chr>, films <list>,

In [26]:
starwars %>% slice_max(height, n = 5) %>% print  # the 5 tallest characters

# A tibble: 5 × 14
  name      height  mass hair_color skin_color eye_color birth_year sex   gender
  <chr>      <int> <dbl> <chr>      <chr>      <chr>          <dbl> <chr> <chr> 
1 Yarael P…    264    NA none       white      yellow            NA male  mascu…
2 Tarfful      234   136 brown      brown      blue              NA male  mascu…
3 Lama Su      229    88 none       grey       black             NA male  mascu…
4 Chewbacca    228   112 brown      unknown    blue             200 male  mascu…
5 Roos Tar…    224    82 none       grey       orange            NA male  mascu…
# ℹ 5 more variables: homeworld <chr>, species <chr>, films <list>,
#   vehicles <list>, starships <list>


In [31]:
# The 5 shortest characters: 6 rows are returned because R2-D2 and R4-P17
# tie at height 96 and ties are kept by default (with_ties = TRUE)
starwars %>% slice_min(height, n = 5) %>% print

# A tibble: 6 × 14
  name      height  mass hair_color skin_color eye_color birth_year sex   gender
  <chr>      <int> <dbl> <chr>      <chr>      <chr>          <dbl> <chr> <chr> 
1 Yoda          66    17 white      green      brown            896 male  mascu…
2 Ratts Ty…     79    15 none       grey, blue unknown           NA male  mascu…
3 Wicket S…     88    20 brown      brown      brown              8 male  mascu…
4 Dud Bolt      94    45 none       blue, grey yellow            NA male  mascu…
5 R2-D2         96    32 NA         white, bl… red               33 none  mascu…
6 R4-P17        96    NA none       silver, r… red, blue         NA none  femin…
# ℹ 5 more variables: homeworld <chr>
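To get exactly `n` rows even when there are ties, `slice_min()` and `slice_max()` accept `with_ties = FALSE`. A small sketch with made-up heights:

```r
library(dplyr)

heights <- tibble(name = c("a", "b", "c", "d"), height = c(96, 66, 96, 79))

# Default (with_ties = TRUE): the tie at 96 is kept, so n = 3 yields 4 rows
nrow(slice_min(heights, height, n = 3))                     # 4

# with_ties = FALSE returns exactly n rows
nrow(slice_min(heights, height, n = 3, with_ties = FALSE))  # 3
```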

### select()
Selects variables (columns) of a data frame

In [32]:
print(starwars[,c("name","homeworld")])

[90m# A tibble: 87 × 2[39m
   name               homeworld
   [3m[90m<chr>[39m[23m              [3m[90m<chr>[39m[23m    
[90m 1[39m Luke Skywalker     Tatooine 
[90m 2[39m C-3PO              Tatooine 
[90m 3[39m R2-D2              Naboo    
[90m 4[39m Darth Vader        Tatooine 
[90m 5[39m Leia Organa        Alderaan 
[90m 6[39m Owen Lars          Tatooine 
[90m 7[39m Beru Whitesun Lars Tatooine 
[90m 8[39m R5-D4              Tatooine 
[90m 9[39m Biggs Darklighter  Tatooine 
[90m10[39m Obi-Wan Kenobi     Stewjon  
[90m# ℹ 77 more rows[39m


In [33]:
starwars %>% select(name, homeworld) %>% print

[90m# A tibble: 87 × 2[39m
   name               homeworld
   [3m[90m<chr>[39m[23m              [3m[90m<chr>[39m[23m    
[90m 1[39m Luke Skywalker     Tatooine 
[90m 2[39m C-3PO              Tatooine 
[90m 3[39m R2-D2              Naboo    
[90m 4[39m Darth Vader        Tatooine 
[90m 5[39m Leia Organa        Alderaan 
[90m 6[39m Owen Lars          Tatooine 
[90m 7[39m Beru Whitesun Lars Tatooine 
[90m 8[39m R5-D4              Tatooine 
[90m 9[39m Biggs Darklighter  Tatooine 
[90m10[39m Obi-Wan Kenobi     Stewjon  
[90m# ℹ 77 more rows[39m


In [34]:
starwars %>% select(-name, -homeworld) %>% print

[90m# A tibble: 87 × 12[39m
   height  mass hair_color  skin_color eye_color birth_year sex   gender species
    [3m[90m<int>[39m[23m [3m[90m<dbl>[39m[23m [3m[90m<chr>[39m[23m       [3m[90m<chr>[39m[23m      [3m[90m<chr>[39m[23m          [3m[90m<dbl>[39m[23m [3m[90m<chr>[39m[23m [3m[90m<chr>[39m[23m  [3m[90m<chr>[39m[23m  
[90m 1[39m    172    77 blond       fair       blue            19   male  mascu… Human  
[90m 2[39m    167    75 [31mNA[39m          gold       yellow         112   none  mascu… Droid  
[90m 3[39m     96    32 [31mNA[39m          white, bl… red             33   none  mascu… Droid  
[90m 4[39m    202   136 none        white      yellow          41.9 male  mascu… Human  
[90m 5[39m    150    49 brown       light      brown           19   fema… femin… Human  
[90m 6[39m    178   120 brown, grey light      blue            52   male  mascu… Human  
[90m 7[39m    165    75 brown       light      blue            47   f

In [35]:
starwars %>% select(starts_with("s")) %>% head(5) %>% print

[90m# A tibble: 5 × 4[39m
  skin_color  sex    species starships
  [3m[90m<chr>[39m[23m       [3m[90m<chr>[39m[23m  [3m[90m<chr>[39m[23m   [3m[90m<list>[39m[23m   
[90m1[39m fair        male   Human   [90m<chr [2]>[39m
[90m2[39m gold        none   Droid   [90m<chr [0]>[39m
[90m3[39m white, blue none   Droid   [90m<chr [0]>[39m
[90m4[39m white       male   Human   [90m<chr [1]>[39m
[90m5[39m light       female Human   [90m<chr [0]>[39m


In [36]:
starwars %>% select(ends_with("es")) %>% head(5) %>% print

[90m# A tibble: 5 × 2[39m
  species vehicles 
  [3m[90m<chr>[39m[23m   [3m[90m<list>[39m[23m   
[90m1[39m Human   [90m<chr [2]>[39m
[90m2[39m Droid   [90m<chr [0]>[39m
[90m3[39m Droid   [90m<chr [0]>[39m
[90m4[39m Human   [90m<chr [0]>[39m
[90m5[39m Human   [90m<chr [1]>[39m


In [42]:
starwars %>% select(contains("a")) %>% head(5) %>% print

[90m# A tibble: 5 × 5[39m
  name            mass hair_color birth_year starships
  [3m[90m<chr>[39m[23m          [3m[90m<dbl>[39m[23m [3m[90m<chr>[39m[23m           [3m[90m<dbl>[39m[23m [3m[90m<list>[39m[23m   
[90m1[39m Luke Skywalker    77 blond            19   [90m<chr [2]>[39m
[90m2[39m C-3PO             75 [31mNA[39m              112   [90m<chr [0]>[39m
[90m3[39m R2-D2             32 [31mNA[39m               33   [90m<chr [0]>[39m
[90m4[39m Darth Vader      136 none             41.9 [90m<chr [1]>[39m
[90m5[39m Leia Organa       49 brown            19   [90m<chr [0]>[39m


In [40]:
starwars %>% select(name)

name
<chr>
Luke Skywalker
C-3PO
R2-D2
Darth Vader
Leia Organa
Owen Lars
Beru Whitesun Lars
R5-D4
Biggs Darklighter
Obi-Wan Kenobi


In [39]:
starwars %>% pull(name)  # pull() extracts a single column as a plain vector

In [43]:
class(starwars %>% pull(name))
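The difference between `select()` and `pull()` is the type of the result: `select()` always returns a tibble, while `pull()` returns a vector. A small sketch with made-up values:

```r
library(dplyr)

tb <- tibble(name = c("Luke", "Leia"), height = c(172, 150))

sel  <- tb %>% select(name)  # a tibble with a single column
vec  <- tb %>% pull(name)    # a character vector: "Luke" "Leia"
last <- tb %>% pull()        # with no argument, pull() returns the last column

class(vec)   # "character"
```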

### arrange()
Sorts the rows of a data frame by one or more columns (ascending by default; use `desc()` for descending order)

In [46]:
starwars %>% arrange(height) %>% print

[90m# A tibble: 87 × 14[39m
   name     height  mass hair_color skin_color eye_color birth_year sex   gender
   [3m[90m<chr>[39m[23m     [3m[90m<int>[39m[23m [3m[90m<dbl>[39m[23m [3m[90m<chr>[39m[23m      [3m[90m<chr>[39m[23m      [3m[90m<chr>[39m[23m          [3m[90m<dbl>[39m[23m [3m[90m<chr>[39m[23m [3m[90m<chr>[39m[23m 
[90m 1[39m Yoda         66    17 white      green      brown            896 male  mascu…
[90m 2[39m Ratts T…     79    15 none       grey, blue unknown           [31mNA[39m male  mascu…
[90m 3[39m Wicket …     88    20 brown      brown      brown              8 male  mascu…
[90m 4[39m Dud Bolt     94    45 none       blue, grey yellow            [31mNA[39m male  mascu…
[90m 5[39m R2-D2        96    32 [31mNA[39m         white, bl… red               33 none  mascu…
[90m 6[39m R4-P17       96    [31mNA[39m none       silver, r… red, blue         [31mNA[39m none  femin…
[90m 7[39m R5-D4        97    32 [31mN

In [47]:
starwars %>% arrange(desc(height)) %>% print

[90m# A tibble: 87 × 14[39m
   name     height  mass hair_color skin_color eye_color birth_year sex   gender
   [3m[90m<chr>[39m[23m     [3m[90m<int>[39m[23m [3m[90m<dbl>[39m[23m [3m[90m<chr>[39m[23m      [3m[90m<chr>[39m[23m      [3m[90m<chr>[39m[23m          [3m[90m<dbl>[39m[23m [3m[90m<chr>[39m[23m [3m[90m<chr>[39m[23m 
[90m 1[39m Yarael …    264    [31mNA[39m none       white      yellow          [31mNA[39m   male  mascu…
[90m 2[39m Tarfful     234   136 brown      brown      blue            [31mNA[39m   male  mascu…
[90m 3[39m Lama Su     229    88 none       grey       black           [31mNA[39m   male  mascu…
[90m 4[39m Chewbac…    228   112 brown      unknown    blue           200   male  mascu…
[90m 5[39m Roos Ta…    224    82 none       grey       orange          [31mNA[39m   male  mascu…
[90m 6[39m Grievous    216   159 none       brown, wh… green, y…       [31mNA[39m   male  mascu…
[90m 7[39m Taun We     213   

In [48]:
# Ties in height are broken by birth_year, in descending order
starwars %>% arrange(height, desc(birth_year)) %>% print

# A tibble: 87 × 14
   name     height  mass hair_color skin_color eye_color birth_year sex   gender
   <chr>     <int> <dbl> <chr>      <chr>      <chr>          <dbl> <chr> <chr> 
 1 Yoda         66    17 white      green      brown            896 male  mascu…
 2 Ratts T…     79    15 none       grey, blue unknown           NA male  mascu…
 3 Wicket …     88    20 brown      brown      brown              8 male  mascu…
 4 Dud Bolt     94    45 none       blue, grey yellow            NA male  mascu…
 5 R2-D2        96    32 NA         white, bl… red               33 none  mascu…
 6 R4-P17       96    NA none       silver, r… red, blue         NA none  femin…
 7 R5-D4        97    32

### rename()
Renames columns of a data frame, using the syntax `new_name = old_name`

In [49]:
starwars %>% rename(hair = hair_color) %>% print

[90m# A tibble: 87 × 14[39m
   name          height  mass hair  skin_color eye_color birth_year sex   gender
   [3m[90m<chr>[39m[23m          [3m[90m<int>[39m[23m [3m[90m<dbl>[39m[23m [3m[90m<chr>[39m[23m [3m[90m<chr>[39m[23m      [3m[90m<chr>[39m[23m          [3m[90m<dbl>[39m[23m [3m[90m<chr>[39m[23m [3m[90m<chr>[39m[23m 
[90m 1[39m Luke Skywalk…    172    77 blond fair       blue            19   male  mascu…
[90m 2[39m C-3PO            167    75 [31mNA[39m    gold       yellow         112   none  mascu…
[90m 3[39m R2-D2             96    32 [31mNA[39m    white, bl… red             33   none  mascu…
[90m 4[39m Darth Vader      202   136 none  white      yellow          41.9 male  mascu…
[90m 5[39m Leia Organa      150    49 brown light      brown           19   fema… femin…
[90m 6[39m Owen Lars        178   120 brow… light      blue            52   male  mascu…
[90m 7[39m Beru Whitesu…    165    75 brown light      blue          

### mutate()
Creates new columns in a data frame or modifies existing ones

In [None]:
# height is in centimetres; multiplying by 0.393701 converts it to inches
starwars %>% 
    mutate(height_in = height * 0.393701) %>% 
    select(starts_with("he")) %>% head(5)

height,height_in
<int>,<dbl>
172,67.71657
167,65.74807
96,37.7953
202,79.5276
150,59.05515


In [56]:
starwars %>% select(sex) %>% head(5)

sex
<chr>
male
none
none
male
female


In [58]:
# Convert sex to a factor and overwrite every height with the constant 1
starwars %>% 
    mutate(sex = as.factor(sex), height = 1) %>% print

# A tibble: 87 × 14
   name     height  mass hair_color skin_color eye_color birth_year sex   gender
   <chr>     <dbl> <dbl> <chr>      <chr>      <chr>          <dbl> <fct> <chr> 
 1 Luke Sk…      1    77 blond      fair       blue            19   male  mascu…
 2 C-3PO         1    75 NA         gold       yellow         112   none  mascu…
 3 R2-D2         1    32 NA         white, bl… red             33   none  mascu…
 4 Darth V…      1   136 none       white      yellow          41.9 male  mascu…
 5 Leia Or…      1    49 brown      light      brown           19   fema… femin…
 6 Owen La…      1   120 brown, gr… light      blue            52   male  mascu…
 7 Beru Wh…      1    75 brown      light      blue          

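The variants `mutate_if()` and `mutate_at()` also exist: they apply a function to every column that satisfies a predicate, or to a given set of columns, respectively.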

In [59]:
starwars %>% mutate_if(is.character, as.factor) %>% print

[90m# A tibble: 87 × 14[39m
   name     height  mass hair_color skin_color eye_color birth_year sex   gender
   [3m[90m<fct>[39m[23m     [3m[90m<int>[39m[23m [3m[90m<dbl>[39m[23m [3m[90m<fct>[39m[23m      [3m[90m<fct>[39m[23m      [3m[90m<fct>[39m[23m          [3m[90m<dbl>[39m[23m [3m[90m<fct>[39m[23m [3m[90m<fct>[39m[23m 
[90m 1[39m Luke Sk…    172    77 blond      fair       blue            19   male  mascu…
[90m 2[39m C-3PO       167    75 [31mNA[39m         gold       yellow         112   none  mascu…
[90m 3[39m R2-D2        96    32 [31mNA[39m         white, bl… red             33   none  mascu…
[90m 4[39m Darth V…    202   136 none       white      yellow          41.9 male  mascu…
[90m 5[39m Leia Or…    150    49 brown      light      brown           19   fema… femin…
[90m 6[39m Owen La…    178   120 brown, gr… light      blue            52   male  mascu…
[90m 7[39m Beru Wh…    165    75 brown      light      blue          

In [60]:
starwars %>% mutate_at(c("name","sex"), as.factor) %>% print

[90m# A tibble: 87 × 14[39m
   name     height  mass hair_color skin_color eye_color birth_year sex   gender
   [3m[90m<fct>[39m[23m     [3m[90m<int>[39m[23m [3m[90m<dbl>[39m[23m [3m[90m<chr>[39m[23m      [3m[90m<chr>[39m[23m      [3m[90m<chr>[39m[23m          [3m[90m<dbl>[39m[23m [3m[90m<fct>[39m[23m [3m[90m<chr>[39m[23m 
[90m 1[39m Luke Sk…    172    77 blond      fair       blue            19   male  mascu…
[90m 2[39m C-3PO       167    75 [31mNA[39m         gold       yellow         112   none  mascu…
[90m 3[39m R2-D2        96    32 [31mNA[39m         white, bl… red             33   none  mascu…
[90m 4[39m Darth V…    202   136 none       white      yellow          41.9 male  mascu…
[90m 5[39m Leia Or…    150    49 brown      light      brown           19   fema… femin…
[90m 6[39m Owen La…    178   120 brown, gr… light      blue            52   male  mascu…
[90m 7[39m Beru Wh…    165    75 brown      light      blue          

### summarise()/summarize()
Collapses (summarises) the rows of a data frame into summary values

In [61]:
summary(starwars)

     name               height           mass          hair_color       
 Length:87          Min.   : 66.0   Min.   :  15.00   Length:87         
 Class :character   1st Qu.:167.0   1st Qu.:  55.60   Class :character  
 Mode  :character   Median :180.0   Median :  79.00   Mode  :character  
                    Mean   :174.6   Mean   :  97.31                     
                    3rd Qu.:191.0   3rd Qu.:  84.50                     
                    Max.   :264.0   Max.   :1358.00                     
                    NA's   :6       NA's   :28                          

In [62]:
starwars$height

In [63]:
starwars$height %>% mean(na.rm=TRUE)

In [64]:
starwars %>% summarise(promedio = mean(height, na.rm=TRUE), NN = n(), desv=sd(mass))

promedio,NN,desv
<dbl>,<int>,<dbl>
174.6049,87,
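
In the output above `desv` is empty, i.e. `NA`: `mass` contains missing values and, unlike the `mean()` call, `sd(mass)` was not given `na.rm = TRUE`, so the `NA` propagates to the result. A minimal side-by-side comparison (the column names are illustrative):

```r
library(dplyr)

res <- starwars %>%
  summarise(
    desv_without_narm = sd(mass),               # NA: missing values propagate
    desv_with_narm    = sd(mass, na.rm = TRUE)  # computed on non-missing masses only
  )
res
```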


In [None]:
starwars %>% filter(species == "Droid") %>% summarise(NN = n()) # SELECT COUNT(*) FROM starwars WHERE species='Droid'

NN
<int>
6


In [69]:
starwars %>% filter(species == "Droid") %>% count

n
<int>
6
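
`count()` is roughly shorthand for `group_by()` followed by `summarise(n = n())`; called without columns it counts all rows, as above. A sketch counting rows per species both ways (object names are illustrative):

```r
library(dplyr)

# Rows per species with count(); sort = TRUE puts the most frequent first
by_species <- starwars %>% count(species, sort = TRUE)

# The same counts written out with group_by() + summarise()
by_species_2 <- starwars %>%
  group_by(species) %>%
  summarise(n = n()) %>%
  arrange(desc(n))

head(by_species)
```

`sort = TRUE` orders by `n` in descending order; the hand-written version needs an explicit `arrange(desc(n))` to match.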


In [70]:
starwars %>% summarise(desviacion_tipica = sd(height, na.rm=TRUE))

desviacion_tipica
<dbl>
34.77416


In [71]:
starwars %>% summarise(max(mass, na.rm=TRUE))

"max(mass, na.rm = TRUE)"
<dbl>
1358
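
When a summary is not named, the column takes the text of the expression as its name, as in `"max(mass, na.rm = TRUE)"` above (and in the `n()` call further down), which is awkward to reference afterwards. A sketch that names each result (the names are illustrative):

```r
library(dplyr)

# Named summaries produce clean, easy-to-use column names
masses <- starwars %>%
  summarise(max_mass = max(mass, na.rm = TRUE),
            min_mass = min(mass, na.rm = TRUE))
masses
```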


In [72]:
starwars %>% summarise(mean = mean(height, na.rm=TRUE), sd = sd(height, na.rm=TRUE))

mean,sd
<dbl>,<dbl>
174.6049,34.77416


In [74]:
starwars$species %>% unique %>% length
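
dplyr's `n_distinct()` gives the same count in a single call. Like `unique()`, it counts `NA` as one more value unless `na.rm = TRUE` is set, and `species` does have missing values. A quick check (variable names are illustrative):

```r
library(dplyr)

n_unique   <- starwars$species %>% unique() %>% length()
n_dist     <- n_distinct(starwars$species)                # NA counts as one value
n_dist_nna <- n_distinct(starwars$species, na.rm = TRUE)  # NA excluded
c(n_unique, n_dist, n_dist_nna)
```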

In [75]:
starwars %>% summarise(n(), mean(mass, na.rm = TRUE))

n(),"mean(mass, na.rm = TRUE)"
<int>,<dbl>
87,97.31186


In [76]:
# Heads up: this is where it gets tricky...
starwars %>%
        group_by(species) %>%
        summarise(n = n(), mass = mean(mass, na.rm = TRUE))

species,n,mass
<chr>,<int>,<dbl>
Aleena,1,15.0
Besalisk,1,102.0
Cerean,1,82.0
Chagrian,1,
Clawdite,1,55.0
Droid,6,69.75
Dug,1,40.0
Ewok,1,20.0
Geonosian,1,80.0
Gungan,3,74.0
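
Only the first rows of the result are shown. The empty `mass` for `Chagrian` is `NaN`, not a bug: that group's only mass is missing, so once `na.rm = TRUE` removes it, `mean()` is computed over an empty vector. The sketch below checks that in isolation and also groups by two variables; with more than one grouping variable, `summarise()` by default keeps the result grouped by all but the last one, and `.groups = "drop"` returns a plain tibble instead (object names are illustrative):

```r
library(dplyr)

# mean() of a vector whose only value is NA, once the NA is removed
empty_mean <- mean(c(NA_real_), na.rm = TRUE)   # NaN

# Grouping by two variables; .groups = "drop" returns an ungrouped tibble
by_species_sex <- starwars %>%
  group_by(species, sex) %>%
  summarise(n = n(), mass = mean(mass, na.rm = TRUE), .groups = "drop")

head(by_species_sex)
```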


In [78]:
starwars %>% 
    group_by(species) %>% 
    summarise(n = n(), mass = mean(mass, na.rm = TRUE)) %>% 
    filter(n > 2, mass > 50)

species,n,mass
<chr>,<int>,<dbl>
Droid,6,69.75
Gungan,3,74.0
Human,35,81.31
,4,81.0
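
Filtering after `summarise()` plays the role of SQL's `HAVING` (in the spirit of the SQL comment used earlier for `Droid`), while a `filter()` before grouping acts as the `WHERE`. The last row above, with an empty species, is the `NA` group; a sketch that drops it before grouping:

```r
library(dplyr)

# SELECT species, COUNT(*) AS n, AVG(mass) AS mass FROM starwars
# WHERE species IS NOT NULL GROUP BY species
# HAVING COUNT(*) > 2 AND AVG(mass) > 50
known_species <- starwars %>%
  filter(!is.na(species)) %>%    # WHERE: drop rows with unknown species
  group_by(species) %>%
  summarise(n = n(), mass = mean(mass, na.rm = TRUE)) %>%
  filter(n > 2, mass > 50)       # HAVING: condition on the summaries
known_species
```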


More useful functions to use with `summarise()`:
* Center: mean(), median()
* Spread: sd(), IQR(), mad()
* Range: min(), max(), quantile()
* Position: first(), last(), nth()
* Count: n(), n_distinct()
* Logical: any(), all()
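
A sketch combining one function from several of these families, computed on `height` per `gender` (the choice of columns and the output names are arbitrary):

```r
library(dplyr)

height_summary <- starwars %>%
  filter(!is.na(height)) %>%                # drop missing heights once, up front
  group_by(gender) %>%
  summarise(
    median_h    = median(height),           # center
    iqr_h       = IQR(height),              # spread
    p90_h       = quantile(height, 0.9),    # range
    first_name  = first(name),              # position: first row of each group
    n_worlds    = n_distinct(homeworld),    # count
    any_over_2m = any(height > 200)         # logical
  )
height_summary
```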

In [93]:
starwars %>% summarise_if(is.numeric, mean, na.rm = TRUE)

height,mass,birth_year
<dbl>,<dbl>,<dbl>
174.6049,97.31186,87.56512


In [94]:
starwars %>% summarise_if(is.numeric, list(prom=mean,sd=sd), na.rm = TRUE)

height_prom,mass_prom,birth_year_prom,height_sd,mass_sd,birth_year_sd
<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>
174.6049,97.31186,87.56512,34.77416,169.4572,154.6914


In [96]:
starwars %>% 
    summarise_at(vars(height,mass), mean, na.rm = TRUE)

height,mass
<dbl>,<dbl>
174.6049,97.31186


In [99]:
starwars %>% 
    summarise_at(vars(height,mass), list(mean,sd), na.rm = TRUE)

height_fn1,mass_fn1,height_fn2,mass_fn2
<dbl>,<dbl>,<dbl>,<dbl>
174.6049,97.31186,34.77416,169.4572


In [83]:
starwars %>% select(height, mass, birth_year) %>%
    summarise_all(list(minimo = min, maximo = max), na.rm = TRUE)

height_minimo,mass_minimo,birth_year_minimo,height_maximo,mass_maximo,birth_year_maximo
<int>,<dbl>,<dbl>,<int>,<dbl>,<dbl>
66,15,8,264,1358,896


### across()
Applies the same operation to several columns of a data frame at once. It is the modern, more readable replacement for the `_if()`/`_at()`/`_all()` variants used above.
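
The superseded calls in the previous cells translate to `across()` by writing the column selection first (`where(is.numeric)`, `c(height, mass)`, `everything()`) and the function, or named list of functions, second. Since dplyr 1.1.0, passing extra arguments such as `na.rm` through the `...` of `across()` is deprecated, so they go inside an anonymous function. A sketch of the equivalents (object names are illustrative):

```r
library(dplyr)

# summarise_if(is.numeric, mean, na.rm = TRUE)
res_if <- starwars %>%
  summarise(across(where(is.numeric), ~ mean(.x, na.rm = TRUE)))

# summarise_if(is.numeric, list(prom = mean, sd = sd), na.rm = TRUE)
res_if_list <- starwars %>%
  summarise(across(where(is.numeric),
                   list(prom = ~ mean(.x, na.rm = TRUE),
                        sd   = ~ sd(.x, na.rm = TRUE))))

# summarise_at(vars(height, mass), mean, na.rm = TRUE)
res_at <- starwars %>%
  summarise(across(c(height, mass), ~ mean(.x, na.rm = TRUE)))

# select(...) %>% summarise_all(list(minimo = min, maximo = max), na.rm = TRUE)
res_all <- starwars %>%
  select(height, mass, birth_year) %>%
  summarise(across(everything(),
                   list(minimo = ~ min(.x, na.rm = TRUE),
                        maximo = ~ max(.x, na.rm = TRUE))))
```

One visible difference: with a list of functions, `across()` orders the output by column (`height_prom`, `height_sd`, `mass_prom`, ...), whereas `summarise_if()` grouped it by function.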

In [122]:
starwars %>% 
  group_by(species) %>% 
  filter(n() > 1) %>% # Careful: n() needs its parentheses, it is a function, not a column
  summarise(across(c(sex, gender, homeworld), n_distinct))

species,sex,gender,homeworld
<chr>,<int>,<int>,<int>
Droid,1,2,3
Gungan,1,1,1
Human,2,2,15
Kaminoan,2,2,1
Mirialan,1,1,1
Twi'lek,2,2,1
Wookiee,1,1,1
Zabrak,1,1,2
,1,1,3


In [None]:
starwars %>% 
  group_by(species) %>% 
  filter(n() > 1) %>% 
  summarise(across(c(sex, gender, homeworld), list(n_distinct,length))) # WTF? length vs. n

species,sex_1,sex_2,gender_1,gender_2,homeworld_1,homeworld_2
<chr>,<int>,<int>,<int>,<int>,<int>,<int>
Droid,1,6,2,6,3,6
Gungan,1,3,1,3,1,3
Human,2,35,2,35,15,35
Kaminoan,2,2,2,2,1,2
Mirialan,1,2,1,2,1,2
Twi'lek,2,2,2,2,1,2
Wookiee,1,2,1,2,1,2
Zabrak,1,2,1,2,2,2
,1,4,1,4,3,4
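
Inside a grouped `summarise()`, `length(x)` returns the number of rows in the group, duplicates and `NA`s included, so every `_2` column just repeats the group size `n()`; `n_distinct()` is what counts different values. The `_1`/`_2` suffixes come from the unnamed list, and naming its elements gives readable columns. A sketch (the names `distinct` and `total` are illustrative):

```r
library(dplyr)

species_counts <- starwars %>%
  group_by(species) %>%
  filter(n() > 1) %>%
  summarise(
    n = n(),
    across(c(sex, gender, homeworld),
           list(distinct = n_distinct, total = length),
           .names = "{.col}_{.fn}")
  )
species_counts
```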


In [127]:
starwars %>% n_distinct

In [128]:
starwars %>% n

ERROR: Error in n(.): unused argument (.)


In [129]:
starwars %>% length

In [130]:
dim(starwars)
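
These cells apply vector-style functions to the whole data frame. `n_distinct(starwars)` counts distinct rows; `length()` of a data frame is its number of columns, because a data frame is a list of columns; `dim()` returns rows and columns. `n()` fails because it takes no arguments and only works inside dplyr verbs such as `summarise()`, `mutate()` or `filter()`. A sketch of the equivalents (names are illustrative):

```r
library(dplyr)

distinct_rows <- n_distinct(starwars)   # distinct rows, all columns considered
n_cols        <- length(starwars)       # a data frame is a list of columns
dims          <- dim(starwars)          # c(rows, columns)
c(nrow = nrow(starwars), ncol = ncol(starwars))
```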

## Extra

In [131]:
df <- data.frame(period=c("Q1_y2019","Q2_y2019", "Q3_y2019","Q4_y2019"), revenue=c(23,24,27,29))
df

period,revenue
<chr>,<dbl>
Q1_y2019,23
Q2_y2019,24
Q3_y2019,27
Q4_y2019,29


In [None]:
df %>% separate(period, c("Quarter","Year"), sep="_y") # The opposite of separate() is unite()

Quarter,Year,revenue
<chr>,<chr>,<dbl>
Q1,2019,23
Q2,2019,24
Q3,2019,27
Q4,2019,29
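
Splitting can be undone with `unite()`, which pastes the columns back together with the given separator. Since tidyr 1.3.0, `separate()` is superseded by `separate_wider_delim()` for fixed delimiters. A sketch of both (object names are illustrative):

```r
library(dplyr)
library(tidyr)

df <- data.frame(period = c("Q1_y2019", "Q2_y2019", "Q3_y2019", "Q4_y2019"),
                 revenue = c(23, 24, 27, 29))

# Split as in the cell above
split_df <- df %>% separate(period, c("Quarter", "Year"), sep = "_y")

# unite() glues the columns back together with the same separator
united_df <- split_df %>% unite("period", Quarter, Year, sep = "_y")

# tidyr >= 1.3.0 alternative to separate() for a fixed delimiter
split_df_2 <- df %>%
  separate_wider_delim(period, delim = "_y", names = c("Quarter", "Year"))
```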


In [139]:
ndf <- df %>% extract(period, c("Quarter","Year"), "Q(.*)_y(.*)") # Regular expressions... we will cover these later
ndf

Quarter,Year,revenue
<chr>,<chr>,<dbl>
1,2019,23
2,2019,24
3,2019,27
4,2019,29
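
`extract()` keeps each capture group as text, hence the `<chr>` columns. Its `convert = TRUE` argument applies `type.convert()` to the new columns; the sketch below also tightens the pattern to digits (`\\d+`, our choice, not the original pattern):

```r
library(tidyr)

df <- data.frame(period = c("Q1_y2019", "Q2_y2019", "Q3_y2019", "Q4_y2019"),
                 revenue = c(23, 24, 27, 29))

# Each (...) capture group fills one new column; convert = TRUE turns
# the digit strings into integers
ndf_num <- tidyr::extract(df, period, c("Quarter", "Year"), "Q(\\d+)_y(\\d+)",
                          convert = TRUE)
str(ndf_num)
```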


In [152]:
df <- tibble(
  a = c(1, 5, 3),
  b = c(4, 2, 6),
  c = c(7, 3, 1)
)

# Without rowwise()...
df %>% 
    mutate(max_val = max(a, b, c))  # Returns the same value for every row


a,b,c,max_val
<dbl>,<dbl>,<dbl>,<dbl>
1,4,7,7
5,2,3,7
3,6,1,7


In [151]:
# With rowwise(): computes the maximum of each row
df %>%
  rowwise() %>%
  mutate(max_val = max(a, b, c))


a,b,c,max_val
<dbl>,<dbl>,<dbl>,<dbl>
1,4,7,7
5,2,3,5
3,6,1,6
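
`max()` reduces all the values it receives to one number, which is why the version without `rowwise()` repeats 7 in every row. A vectorised alternative is base R's `pmax()` (parallel maximum), which compares its arguments element by element; note also that `rowwise()` leaves the tibble in row-wise mode until `ungroup()`. A sketch (object names are illustrative):

```r
library(dplyr)

df <- tibble(a = c(1, 5, 3), b = c(4, 2, 6), c = c(7, 3, 1))

# Vectorised: pmax() compares a, b and c position by position
with_pmax <- df %>% mutate(max_val = pmax(a, b, c))

# rowwise() version, turned back into a plain tibble with ungroup()
with_rowwise <- df %>%
  rowwise() %>%
  mutate(max_val = max(a, b, c)) %>%
  ungroup()
```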


### Tidyverse exercises
1) Use the `mtcars` dataset and the `tidyverse` package to:
    * Show the first 5 rows of the dataset
    * Convert the variables "cyl", "gear" and "carb" to factors and the variables "vs" and "am" to logicals, and show the structure of the dataset (use this transformed dataset from here on)
    * Show only the cars with horsepower ("hp") greater than 100
    * Select only the columns "mpg", "cyl", "hp" and "qsec" of the dataset
    * Compute the total number of cars for each unique value in the column "cyl" (number of cylinders)
    * Find the car model with the highest horsepower ("hp") and show its complete information
    * Compute the average horsepower of the cars with 8 cylinders
2) Suppose you have a data frame called `lego_sets` containing information about different Lego sets, including the set name, the number of pieces, the theme and the release year:
    * Filter and show only the Lego sets released in 2020
    * Find and show the name and number of pieces of the largest Lego set
    * Compute the total number of pieces for each theme and show the themes in descending order of pieces
    * Compute how many Lego sets were released each year and show the result sorted by year in ascending order
    * Find the 3 most popular themes (the ones with the most sets) and show the number of sets and the total number of pieces for each of them

<pre>
# Generate synthetic data
nombres <- paste("Conjunto", 1:80)
piezas <- sample(50:1000, 80, replace = TRUE)
temas <- sample(c("Ciudad", "Espacio", "Arquitectura", "Granja", "Dinosaurios", "Aventuras", "Piratas"), 80, replace = TRUE)
años <- sample(2000:2023, 80, replace = TRUE)

# Build the data frame
lego_sets <- data.frame(
  Set_Name = nombres,
  Piece_Count = piezas,
  Theme = temas,
  Year = años
)
</pre>
3) Given a data frame called `bebidas` containing the age, sex, type of drink and number of drinks consumed by a group of people, filter and show only the women older than 20. For this subset, compute the mean, the maximum, the total number of drinks and the standard deviation of age for each type of drink. Finally, add a new column giving how many people are in each drink-type group.

<pre>
# Generate synthetic data
num_elementos <- 100
edades <- sample(18:70, num_elementos, replace = TRUE)
sexos <- sample(c("Hombre", "Mujer"), num_elementos, replace = TRUE)
tipos_bebida <- sample(c("Cerveza", "Vino", "Refresco", "Cóctel", "Agua"), num_elementos, replace = TRUE)
cantidad_copas <- sample(1:5, num_elementos, replace = TRUE)

# Build the bebidas data frame
bebidas <- data.frame(
  Edad = edades,
  Sexo = sexos,
  Bebida = tipos_bebida,
  Copas = cantidad_copas
)
</pre>
4) Given a data frame called `peliculas`, report the 2 genres (`Temática`) with the highest mean score per country, considering only films rated PG-13 or above (PG-13, R, NC-17). For each genre include the number of films, the mean profit (Ganancia - Presupuesto), the standard deviation of the score and the mean score. Sort the results in descending order of mean score within each country.

<pre>
# Build a data frame with 150 rows
num_filas <- 150

# Generate synthetic data
set.seed(123)  # For reproducibility
paises <- sample(c("EE. UU.", "Reino Unido", "Francia", "España", "Italia"), num_filas, replace = TRUE)
nombres <- paste("Película", 1:num_filas)
años <- sample(1980:2023, num_filas, replace = TRUE)
puntuaciones <- round(runif(num_filas, 1, 10), 1)
tematicas <- sample(c("Acción", "Drama", "Comedia", "Ciencia Ficción", "Animación"), num_filas, replace = TRUE)
directores <- paste("Director", 1:num_filas)
companias <- sample(c("Warner Bros.", "Universal Pictures", "Disney", "Sony Pictures", "Paramount Pictures"), num_filas, replace = TRUE)
presupuestos <- round(runif(num_filas, 1000000, 50000000), 2)
ganancias <- round(presupuestos * runif(num_filas, 0.5, 2.5), 2)
ratings <- sample(c("G", "PG", "PG-13", "R", "NC-17"), num_filas, replace = TRUE)  # Synthetic rating column

# Build the data frame
peliculas <- data.frame(
  País = paises,
  Nombre = nombres,
  Año = años,
  Puntuación = puntuaciones,
  Temática = tematicas,
  Director = directores,
  Compañía = companias,
  Presupuesto = presupuestos,
  Ganancia = ganancias,
  Rating = ratings
)
</pre>
5) Given a dataset called `notas` with the students' marks in several subjects, add a column with each student's average mark. First replace the missing values with 0 (HINT: use `replace_na`).

<pre>
# Generate synthetic data
estudiantes <- paste("Estudiante", 1:50)
notas_matematicas <- sample(c(NA, 5, 6, 7, 8, 9, 10), 50, replace = TRUE)
notas_historia <- sample(c(NA, 4, 5, 6, 7, 8), 50, replace = TRUE)
notas_ciencias <- sample(c(NA, 6, 7, 8, 9, 10), 50, replace = TRUE)

# Build the data frame
notas <- data.frame(
  Estudiante = estudiantes,
  Nota_Matemáticas = notas_matematicas,
  Nota_Historia = notas_historia,
  Nota_Ciencias = notas_ciencias
)
</pre>