Downloading YouTube Subtitle Transcription in a Tidy Tibble Data_Frame in R

youtubecaption

License: GPL v3

Motivation

Although some R packages target the YouTube API (e.g., ‘tuber’), downloading a YouTube video's subtitles (i.e., captions) in a tidy form has not been straightforward. Using the ‘youtube-transcript-api’ Python package under the hood, this R package gives users a convenient way to parse a YouTube caption and convert it into a tibble. Users can also save the caption data as a tidy Excel file without an advanced programming background.

Installation

Python Dependencies

youtubecaption requires an Anaconda Python environment on your system PATH.

If Conda is not installed on your system, please download and install Anaconda (Python 3.6 or later is recommended).

This package calls the youtube-transcript-api Python module from R via reticulate.
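Before installing, it may help to confirm that R can see the Python side. The following is a minimal sketch, not part of youtubecaption itself: `check_caption_deps()` is a hypothetical helper name, and it assumes the Python module is importable as `youtube_transcript_api`. It uses only `reticulate::conda_list()` and `reticulate::py_module_available()`, and reports `FALSE` instead of failing when something is missing.

```r
# Hypothetical helper (not exported by youtubecaption): reports which of the
# Python-side prerequisites R can currently find.
check_caption_deps <- function() {
  # Without reticulate nothing else can be checked
  if (!requireNamespace("reticulate", quietly = TRUE)) {
    return(c(reticulate = FALSE, conda = FALSE, transcript_api = FALSE))
  }

  # conda_list() raises an error when no Conda installation is found
  conda_found <- tryCatch(
    nrow(reticulate::conda_list()) > 0,
    error = function(e) FALSE
  )

  # Assumption: the Python module's import name is "youtube_transcript_api"
  api_found <- tryCatch(
    isTRUE(reticulate::py_module_available("youtube_transcript_api")),
    error = function(e) FALSE
  )

  c(reticulate = TRUE, conda = conda_found, transcript_api = api_found)
}

deps <- check_caption_deps()
print(deps)
```

A `FALSE` for `conda` suggests Anaconda is not installed or not on the PATH. A `FALSE` for `transcript_api` only means the module is not visible to the Python that reticulate picked up.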

R Package Installation

Development Version

You can install the latest development version as follows:

if (!require(remotes)) {
  install.packages("remotes")
}

remotes::install_github("jooyoungseo/youtubecaption")

Stable Version

You can install the released version of youtubecaption from CRAN with:

install.packages('youtubecaption')

Usage

After loading the youtubecaption package, call get_caption() as shown below:

library(youtubecaption)

# Let's get the video caption out of Hadley Wickham's "You can't do data science in a GUI":
url <- "https://www.youtube.com/watch?v=cpbtcsGE0OA"
caption <- get_caption(url)
caption

#> # A tibble: 1,420 x 5
#>    segment_id text                                start duration vid       
#>         <int> <chr>                               <dbl>    <dbl> <chr>     
#>  1          1 thank you for coming to a meeting ~  7.13     8.32 cpbtcsGE0~
#>  2          2 in regards to data science GUI with 10.7      8.44 cpbtcsGE0~
#>  3          3 happy with chief data scientist in~ 15.4      7.11 cpbtcsGE0~
#>  4          4 studio as well as the member of th~ 19.1      7.23 cpbtcsGE0~
#>  5          5 Foundation and an attempt professo~ 22.6      6    cpbtcsGE0~
#>  6          6 Stanford and at the University of   26.4      6.48 cpbtcsGE0~
#>  7          7 Auckland he builds both computatio~ 28.6      7.17 cpbtcsGE0~
#>  8          8 and cognitive tools to make data s~ 32.8      7.5  cpbtcsGE0~
#>  9          9 easier faster and more times his w~ 35.7      7.01 cpbtcsGE0~
#> 10         10 includes various packages as well ~ 40.4      6.21 cpbtcsGE0~
#> # ... with 1,410 more rows

# Save the caption as an Excel file and open it right away:
get_caption(url = url, savexl = TRUE, openxl = TRUE)
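Because the result is an ordinary data frame, common follow-up steps need only base R. The sketch below uses a small made-up stand-in with the same column names as the output above, so it runs without network access. The text values and the `vid` value are invented for illustration, and it assumes `start` and `duration` are measured in seconds.

```r
# Toy stand-in for the tibble returned by get_caption(). Only the column
# names mirror the real output; every value here is made up.
caption <- data.frame(
  segment_id = 1:3,
  text = c("hello and welcome",
           "today we talk about data",
           "thank you for watching"),
  start = c(0.0, 2.5, 6.0),
  duration = c(2.5, 3.5, 2.0),
  vid = "example_id",
  stringsAsFactors = FALSE
)

# Time at which each segment leaves the screen
# (assumption: start and duration are both in seconds)
caption$end <- caption$start + caption$duration

# Collapse all segments into a single transcript string
transcript <- paste(caption$text, collapse = " ")

# Locate segments that mention a keyword, ignoring case
hits <- caption[grepl("data", caption$text, ignore.case = TRUE),
                c("segment_id", "start", "text")]

print(transcript)
print(hits)
```

The same lines should apply unchanged to a real `caption` object, since a tibble supports `$` and `[` indexing like a data frame.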