R package for exploring samples of text generated by OpenAI's new GPT-2 language model

gpt2samples

The goal of gpt2samples is to help users explore the sample texts generated by OpenAI’s new GPT-2 transformer-based language model.

An original implementation of a smaller version of GPT-2 can be found here, and the original sample text files can be found here.

Data

This package contains the following data, stored as tibbles:

tibble description
conditional Conditionally generated samples, with context prompts from WebText test corpus, default settings (temperature 1 and no truncation).
conditional-t07 Conditionally generated samples, with context prompts from WebText test corpus, with temperature 0.7.
conditional-topk40 Conditionally generated samples, with context prompts from WebText test corpus, with truncation and top_k 40.
unconditional Unconditionally generated samples, default settings (temperature 1 and no truncation).
unconditional-t07 Unconditionally generated samples, with temperature 0.7.
unconditional-topk40 Unconditionally generated samples, with truncation and top_k 40.

Additionally, all the generated samples (conditional and unconditional) can be explored by calling all_samples().

Installation

You can install the development version of gpt2samples from GitHub with:

# install.packages("gpt2samples")
# install.packages("devtools")
devtools::install_github("kanishkamisra/gpt2samples")

Example

This is a basic example of exploring the data with dplyr verbs:

library(dplyr)
library(gpt2samples)

conditional %>%
  filter(id == 100)
#> # A tibble: 2 x 4
#>   file         id type     text                                            
#>   <chr>     <int> <chr>    <chr>                                           
#> 1 conditio…   100 sample   the waterbody that you are managing, getting pr…
#> 2 conditio…   100 complet… Permit, WDFW ensures that nonconventional child…

unconditional_t07 %>%
  filter(id == 250)
#> # A tibble: 213 x 3
#>    file              id text                                               
#>    <chr>          <int> <chr>                                              
#>  1 unconditional…   250 This question already has an answer here: How do I…
#>  2 unconditional…   250 ""                                                 
#>  3 unconditional…   250 This is a basic question regarding text editing. T…
#>  4 unconditional…   250 ""                                                 
#>  5 unconditional…   250 (A)                                                
#>  6 unconditional…   250 ""                                                 
#>  7 unconditional…   250 (B)                                                
#>  8 unconditional…   250 ""                                                 
#>  9 unconditional…   250 (A)                                                
#> 10 unconditional…   250 ""                                                 
#> # … with 203 more rows

all_samples() %>%
  filter(file == "conditional") %>%
  tail()
#> # A tibble: 6 x 4
#>   file         id type     text                                            
#>   <chr>     <int> <chr>    <chr>                                           
#> 1 conditio…   500 complet… "BOP will be remembered for it's technically in…
#> 2 conditio…   500 complet… ""                                              
#> 3 conditio…   500 complet… There were literal lap times in running the wat…
#> 4 conditio…   500 complet… ""                                              
#> 5 conditio…   500 complet… ""                                              
#> 6 conditio…   500 complet… I was voiced by legendary actor turns down play…

all_samples() %>%
  group_by(file) %>%
  summarise(total_lines = n())
#> # A tibble: 6 x 2
#>   file                 total_lines
#>   <chr>                      <int>
#> 1 conditional                18067
#> 2 conditional-t07            24081
#> 3 conditional-topk40         20405
#> 4 unconditional              19469
#> 5 unconditional-t07          28841
#> 6 unconditional-topk40       21188
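
Because each row holds a single line of a sample (sample 250 above spans 213 rows), it can be handy to collapse the lines back into one string per sample. The following is a minimal sketch of that idea, not part of the package: `toy_samples` and `full_texts` are hypothetical names, and the toy tibble only mimics the `file`, `id`, and `text` columns shown above so that the snippet runs without gpt2samples installed.

```r
library(dplyr)

# Hypothetical stand-in for one of the package's tibbles:
# one row per line of generated text, blank lines included.
toy_samples <- tibble(
  file = "unconditional-t07",
  id   = c(1L, 1L, 1L, 2L),
  text = c("First line.", "", "Second paragraph.", "Another sample.")
)

# Collapse the lines of each sample into a single string,
# keeping the original line breaks.
full_texts <- toy_samples %>%
  group_by(file, id) %>%
  summarise(text = paste(text, collapse = "\n"), .groups = "drop")

full_texts
```

With the package loaded, the same pipeline should apply to a tibble such as `unconditional_t07` in place of `toy_samples`.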

For further exploration, packages such as Julia Silge and David Robinson’s tidytext can be used to analyze the text generated by GPT-2.
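
As a rough illustration of that kind of analysis, the sketch below counts the most frequent non-stop-words using tidytext's `unnest_tokens()` and its `stop_words` data. It is an assumption-laden example rather than anything shipped with gpt2samples: `toy_samples` and `word_counts` are hypothetical names, and the toy tibble stands in for a real one so the snippet is self-contained.

```r
library(dplyr)
library(tidytext)

# Hypothetical stand-in for a gpt2samples tibble
# (one row per line of generated text).
toy_samples <- tibble(
  file = "unconditional-t07",
  id   = c(1L, 1L, 2L),
  text = c("The model writes text.", "", "The text reads like the web.")
)

word_counts <- toy_samples %>%
  filter(text != "") %>%                   # drop blank lines
  unnest_tokens(word, text) %>%            # one lowercased word per row
  anti_join(stop_words, by = "word") %>%   # remove common English stop words
  count(word, sort = TRUE)                 # most frequent words first

word_counts
```

Swapping `toy_samples` for `all_samples()` and adding `file` to the `count()` call would, under the same assumptions, give word frequencies per sample file.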

Contributor Code of Conduct

Please note that the ‘gpt2samples’ project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.
