Datasets from: Disney+ Hotstar
เลือกข้อมูล Disney-Hotstar มาเนื่องจากเป็นแหล่งข้อมูลที่ใหม่ และยังไม่ค่อยมีการเผยแพร่ออกมามากนัก และภายในชุดข้อมูลมีรายละเอียดชื่อหนัง , แนวหนัง และความเหมาะสมแก้ทุกวัย โดยจะอิงตามประวัติการใช้งานของทุกคนในช่วงไม่กี่เดือนที่ผ่านมา ซึ่งกลุ่มเราได้นำชุดข้อมูลนี้มา ปรับแต่งและจัดระเบียบข้อมูลบางส่วนให้ง่ายต่อการใช้งานมากขึ้น
- disney_plus_beforeclean: Original Dataset
- disney_plus_aftercleaned: Clean Dataset
- R-Script: R-Script used in this assignment
- Import datasets and libraries.
- Define a question
- Data Cleaning
- List a film that is a episode
- Find average rumtime of movie
- Find a movies that is directed by Anthony Russo and Joe Russo
- Find total, max, min, average of metascore.
- Find total, max, min, average of imdb_vote.
- Find total, max, min, average of imdb rating.
- List a film that is rated at PG-13.
- List a film that has runtime greater than 140 minutes.
Use read.csv to import dataset
#Import Data
data <- read.csv("https://raw.githubusercontent.com/sit-2021-int214/005-Disney-Hotstar/main/disney_plus_shows.csv")
#Library
library(dplyr)
library(assertive)
library(readr)
library(stringr)
- How many episodes in this dataset ?
- Find average runtime of movie
- Name a movie that is directed by Anthony Russo and Joe Russo
- Find total, max, min, average of metascore.
- Find total, max, min, average of imdb_vote.
- Find total, max, min, average of imdb_rating.
- List a film that is rated at PG-13.
- List a film that has runtime greather than 140 minutes.
data <- data %>% na_if(.,"") %>% na_if(.,"N/A")
data$type <- data$type %>% as.factor()
data$rated <- data$rated %>% str_trim() %>% as.factor()
data$rated <- data$rated %>% str_replace("APPROVED", "Approved")
data$rated <- data$rated %>% str_replace("NOT RATED" , "Not Rated")
data$rated <- data$rated %>% str_replace("PASSED" , "Passed")
data$rated <- data$rated %>% str_replace("UNRATED" , "Unrated")
2.8 Delete extra character out and trim the space out, then change type to numeric and rename to "runtime (min)"
data$runtime <- data$runtime %>% str_remove("min") %>% str_trim() %>% as.numeric()
names(data)[8] <- 'runtime_min'
data$metascore <- data$metascore %>% as.numeric()
data %>% select('Film ID' = imdb_id,'Film title' = title,type) %>% filter(type == "episode")
Film ID Film title type
1 tt1378121 El Materdor episode
2 tt1788785 Moon Mater episode
3 tt1378123 Rescue Squad Mater episode
4 tt3755836 Spinning episode
5 tt2166364 Time Travel Mater episode
6 tt1511164 Unidentified Flying Mater episode
7 tt3755824 Bugged episode
8 tt0090803 Casebusters episode
9 tt3067144 Phineas and Ferb: Star Wars episode
10 tt7642482 Holiday Magic episode
11 tt0923160 Disneyland Around the Seasons episode
12 tt1455430 Easter Island Underworld episode
13 tt11194650 What is Cheese? episode
14 tt11194652 What is Reading? episode
15 tt4664808 Incredible! The Story of Dr. Pol episode
16 tt2278028 Episode #2.7 episode
17 tt0091566 Mr. Boogedy episode
18 tt0064705 My Dog, the Thief: Part 1 episode
19 tt2283584 Phineas and Ferb: Mission Marvel episode
20 tt0056441 Sammy, the Way-Out Seal: Part 1 episode
21 tt0166829 The Sultan and the Rock Star episode
22 tt6001956 Blue Ribbon Kids episode
23 tt0561332 The Plausible Impossible episode
There are 23 episodes in this dataset.
movies <- data %>% select(imdb_id,title,type,runtime_min) %>% filter(type == "movie")
movies$runtime_min %>% mean(.,na.rm = TRUE)
[1] 78.38671
An average of movies in this dataset is 78.38671
5.1 Use filter() to filter out others than movie and assigned to a new param called movies_director.
movies_director <- data %>% select(imdb_id,title,type,director) %>% filter(type == "movie")
5.2 Use filter() to filter a director that are named Anthony Russo and Joe Russo, then use glimpse() to display a data
movies_director %>% filter(director == "Anthony Russo, Joe Russo") %>% glimpse()
Rows: 3
Columns: 4
$ imdb_id <chr> "tt4154796", "tt3498820", "tt1843866"
$ title <chr> "Avengers: Endgame", "Captain America: Civil War", "Captain America: The Winter Soldier"
$ type <fct> movie, movie, movie
$ director <chr> "Anthony Russo, Joe Russo", "Anthony Russo, Joe Russo", "Anthony Russo, Joe Russo"
There are 3 moives that is directed by Anthony Russo and Joe Russ
data$metascore %>% sum(.,na.rm = T)
[1] 18122
Total of metascore is 18122
data$metascore %>% max(.,na.rm = T)
[1] 99
The highest metascore is 99
data$metascore %>% min(.,na.rm = T)
[1] 19
The lowest metascore is 19
data$metascore %>% mean(.,na.rm = T)
[1] 62.06164
The average metascore is 62.06
data$imdb_votes %>% sum(.,na.rm = T)
[1] 89242
Total of imdb vote is 89242
data$imdb_votes %>% max(.,na.rm = T)
[1] 1173
The highest vote score is 1173
data$imdb_votes %>% min(.,na.rm = T)
[1] 5
The lowest imdb vote score is 5
data$imdb_votes %>% mean(.,na.rm = T)
[1] 347.2451
The average imdb vote is 347.2451
data$imdb_rating %>% sum(.,na.rm = T)
[1] 5851
Total of imdb rating is 5851
data$imdb_rating %>% max(.,na.rm = T)
[1] 9.7
The highest imdb rating is 9.7
data$imdb_rating %>% min(.,na.rm = T)
[1] 1.5
The lowest imdb rating is 1.5
data$imdb_rating %>% mean(.,na.rm = T)
[1] 6.656428
The average imdb rating is 6.66
data %>% filter(rated == "PG-13") %>% glimpse()
Rows: 37
Columns: 19
$ imdb_id <chr> "tt0147800", "tt0499549", "tt0118998"~
$ title <chr> "10 Things I Hate About You", "Avatar~
$ plot <chr> "A pretty, popular teenager can't go ~
$ type <fct> movie, movie, movie, movie, movie, mo~
$ rated <chr> "PG-13", "PG-13", "PG-13", "PG-13", "~
$ year <chr> "1999", "2009", "1998", "2018", "1996~
$ released_at <chr> "31 Mar 1999", "18 Dec 2009", "26 Jun~
$ added_at <chr> "November 12, 2019", "November 12, 20~
$ runtime_min <dbl> 97, 162, 85, 100, 113, 132, 117, 141,~
$ genre <chr> "Comedy, Drama, Romance", "Action, Ad~
$ director <chr> "Gil Junger", "James Cameron", "Betty~
$ writer <chr> "Karen McCullah, Kirsten Smith", "Jam~
$ actors <chr> "Heath Ledger, Julia Stiles, Joseph G~
$ language <chr> "English, French", "English, Spanish"~
$ country <chr> "USA", "USA", "USA", "USA", "USA", "U~
$ awards <chr> "2 wins & 13 nominations.", "Won 3 Os~
$ metascore <dbl> 70, 83, 46, 83, 31, 51, 64, 66, 78, 8~
$ imdb_rating <dbl> 7.3, 7.8, 5.4, 8.2, 5.8, 6.6, 7.3, 7.~
$ imdb_votes <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, N~
There are 37 films that are PG-13 rate
data %>% filter(runtime_min > 140) %>% glimpse()
Rows: 15
Columns: 19
$ imdb_id <chr> "tt0499549", "tt0416716", "tt2395427"~
$ title <chr> "Avatar", "Empire of Dreams: The Stor~
$ plot <chr> "A paraplegic Marine dispatched to th~
$ type <fct> movie, movie, movie, movie, movie, mo~
$ rated <chr> "PG-13", NA, "PG-13", "PG-13", "PG-13~
$ year <chr> "2009", "2004", "2015", "2019", "2016~
$ released_at <chr> "18 Dec 2009", "20 Sep 2004", "01 May~
$ added_at <chr> "November 12, 2019", "November 12, 20~
$ runtime_min <dbl> 162, 151, 141, 181, 147, 143, 149, 16~
$ genre <chr> "Action, Adventure, Fantasy, Sci-Fi",~
$ director <chr> "James Cameron", "Edith Becker, Kevin~
$ writer <chr> "James Cameron", "Ed Singer", "Joss W~
$ actors <chr> "Sam Worthington, Zoe Saldana, Sigour~
$ language <chr> "English, Spanish", "English", "Engli~
$ country <chr> "USA", "USA", "USA", "USA", "USA", "U~
$ awards <chr> "Won 3 Oscars. Another 86 wins & 129 ~
$ metascore <dbl> 83, NA, 66, 78, 75, 69, NA, 50, 53, 6~
$ imdb_rating <dbl> 7.8, 8.3, 7.3, 8.4, 7.8, 8.0, 8.5, 7.~
$ imdb_votes <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, N~
There are 15 films that has runtime greater than 140 minutes.
งานนี้เป็นส่วนของวิชา INT214 Statistics for Information technology
ภาคเรียนที่ 1 ปีการศึกษา 2564 คณะเทคโนโลยีสารสนเทศ มหาวิทยาลัยเทคโนโลยีพระจอมเกล้าธนบุรี
No. | Name | Student ID |
---|---|---|
1 | นายกรวิชญ์ วัฒนธนกุล | 63130500001 |
2 | นางสาวชญาดา อินทรสอน | 63130500020 |
3 | นายโชติวิทย์ เสือยันต์ | 6313050026 |
4 | นางสาวนาตาเซีย ยุสุวพันธ์ | 63130500070 |
- ATCHARA TRAN-U-RAIKUL
- JATAWAT XIE (Git: safesit23)