gpt2samples

The goal of gpt2samples is to help users explore the various sample texts as generated by Open AI’s new GPT2 transformer based language model.

An original implementation of a smaller version of GPT-2 can be found here, and the original sample text files can be found here.

Data

This package contains the following data, stored as tibbles:

tibble	description
conditional-t07	Conditionally generated samples, with context prompts from `WebText` test corpus, default settings (temperature 1 and no truncation).
conditional-topk40	Conditionally generated samples, with context prompts from `WebText` test corpus, with temperature 0.7
conditional	Conditionally generated samples, with context prompts from `WebText` test corpus, with truncation and top_k 40.
unconditional	Unconditionally generated samples, default settings.
unconditional-t07	Unconditionally generated samples, with temperature 0.7
unconditional-topk40	Unconditionally generated samples, with truncation and top_k 40.

Additionally, all the generated samples (conditional and unconditional) can be explored by calling all_samples().

Installation

You can install the released version of gpt2samples from GitHub with:

# install.packages("gpt2samples")
# install.packages("devtools")
devtools::install_github("kanishkamisra/gpt2samples")

Example

This is a basic example to explore the data using dplyr verbs

library(dplyr)
library(gpt2samples)

conditional %>%
  filter(id == 100)
#> # A tibble: 2 x 4
#>   file         id type     text                                            
#>   <chr>     <int> <chr>    <chr>                                           
#> 1 conditio…   100 sample   the waterbody that you are managing, getting pr…
#> 2 conditio…   100 complet… Permit, WDFW ensures that nonconventional child…

unconditional_t07 %>%
  filter(id == 250)
#> # A tibble: 213 x 3
#>    file              id text                                               
#>    <chr>          <int> <chr>                                              
#>  1 unconditional…   250 This question already has an answer here: How do I…
#>  2 unconditional…   250 ""                                                 
#>  3 unconditional…   250 This is a basic question regarding text editing. T…
#>  4 unconditional…   250 ""                                                 
#>  5 unconditional…   250 (A)                                                
#>  6 unconditional…   250 ""                                                 
#>  7 unconditional…   250 (B)                                                
#>  8 unconditional…   250 ""                                                 
#>  9 unconditional…   250 (A)                                                
#> 10 unconditional…   250 ""                                                 
#> # … with 203 more rows

all_samples() %>%
  filter(file == "conditional") %>%
  tail()
#> # A tibble: 6 x 4
#>   file         id type     text                                            
#>   <chr>     <int> <chr>    <chr>                                           
#> 1 conditio…   500 complet… "BOP will be remembered for it's technically in…
#> 2 conditio…   500 complet… ""                                              
#> 3 conditio…   500 complet… There were literal lap times in running the wat…
#> 4 conditio…   500 complet… ""                                              
#> 5 conditio…   500 complet… ""                                              
#> 6 conditio…   500 complet… I was voiced by legendary actor turns down play…

all_samples() %>%
  group_by(file) %>%
  summarise(total_lines = n())
#> # A tibble: 6 x 2
#>   file                 total_lines
#>   <chr>                      <int>
#> 1 conditional                18067
#> 2 conditional-t07            24081
#> 3 conditional-topk40         20405
#> 4 unconditional              19469
#> 5 unconditional-t07          28841
#> 6 unconditional-topk40       21188

Additional exploration can use Julia Silge and David Robinson’s tidytext package, among others to analyze the generated text as produced by GPT-2.

Contributor Code of Conduct

Please note that the ‘gpt2samples’ project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.

Name		Name	Last commit message	Last commit date
Latest commit History 5 Commits
R		R
data-raw		data-raw
data		data
man		man
revdep		revdep
tests		tests
.Rbuildignore		.Rbuildignore
.gitignore		.gitignore
CODE_OF_CONDUCT.md		CODE_OF_CONDUCT.md
DESCRIPTION		DESCRIPTION
LICENSE		LICENSE
NAMESPACE		NAMESPACE
README.Rmd		README.Rmd
README.md		README.md
gpt2samples.Rproj		gpt2samples.Rproj

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

gpt2samples

Data

Installation

Example

Contributor Code of Conduct

About

Releases

Packages

Languages

License

kanishkamisra/gpt2samples

Folders and files

Latest commit

History

Repository files navigation

gpt2samples

Data

Installation

Example

Contributor Code of Conduct

About

Resources

License

Code of conduct

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages