Data package containing Freud's complete works in a tidy format, tokenized by page, by sentence, and by word

License

MIT (LICENSE.md); the repository also contains a LICENSE file of unrecognized type.
gygl/tidyfreud

tidyfreud

tidyfreud contains the complete works of Sigmund Freud in a tidy format, ready for tasks such as NLP.

Installation

You can install the development version of tidyfreud from GitHub with:

# install.packages("devtools")
devtools::install_github("gygl/tidyfreud")
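
This README does not name the datasets the package exports, so the sketch below discovers them at run time instead of assuming a name. It uses only base R (`requireNamespace()` and `data()`); the variable `available` is introduced here for illustration.

```r
# List the datasets shipped with tidyfreud, if the package is installed.
# Dataset names are discovered rather than assumed, because the README
# does not document them.
if (requireNamespace("tidyfreud", quietly = TRUE)) {
  # data(package = ...) returns a list whose "results" element is a
  # character matrix with one row per dataset
  available <- data(package = "tidyfreud")$results[, "Item"]
  print(available)
} else {
  available <- character(0)
  message("tidyfreud is not installed; run the installation command first.")
}
```

Any name printed here can then be loaded with `data()` or accessed as `tidyfreud::<name>`.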

Reproduction of text preparation

The source data is a PDF of Freud’s complete works, downloaded from https://www.valas.fr/?lang=fr. To reproduce the data preparation, clone the repository locally:

git clone git@github.com:gygl/tidyfreud.git

and open the folder as an RStudio project. Then download the file Freud_Complete_Works.pdf and move it to ./data. You can then run the entire preprocessing pipeline with the following command:

targets::tar_make()

All preprocessing steps are performed by the function:

create_sfreud_complete_work_tibble(path_pdf = "./data/Freud_Complete_Works.pdf")

which takes the path to the PDF file as its path_pdf argument.

The main processing steps are:

  • import of the text and table of contents from the PDF into R
  • tokenization of the text into sentences and words
  • detection of book/article titles and subtitles
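
The tokenization step can be illustrated with a minimal sketch. It is not the package's actual implementation: it assumes the tibble and tidytext packages, and it uses a small hand-written tibble (`pages`, a hypothetical name) in place of the per-page text that would come from the PDF, for example via `pdftools::pdf_text()`.

```r
library(tibble)
library(tidytext)

# Toy stand-in for the per-page text extracted from the PDF
# (hypothetical data, for illustration only)
pages <- tibble(
  page = 1:2,
  text = c(
    "Dreams are wish fulfilments. They have a meaning.",
    "The ego is not master in its own house."
  )
)

# One row per sentence; recent tidytext versions keep the page column
sentences <- unnest_tokens(pages, sentence, text, token = "sentences")

# One row per word; unnest_tokens() lowercases tokens by default
words <- unnest_tokens(pages, word, text, token = "words")

print(sentences)
print(words)
```

Keeping the page number alongside each token is what makes the result "tidy": every row is one token, and the remaining columns say where it came from.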
