A corpus of Spanish political speeches from 1937 to 2021

The corpus consists of 77 speeches (206,937 tokens) written in European Spanish and delivered at Christmas by the successive heads of state of Spain from 1937 to 2021: dictator Francisco Franco, King Juan Carlos I and King Felipe VI.

This repo contains the corpus files, the corpus interface and the visualization scripts.

For the main page of the project (with visualizations, etc.), please visit the project website (in Spanish).
The description of this corpus can be found in the LREC paper A Corpus of Spanish Political Speeches from 1937 to 2019.
For information about the Christmas speech, see Wikipedia's page about the Christmas Eve National Speech.
A previous version of this project (with speeches only from 1975 on) can be found here.
This project was featured in the Spanish newspaper eldiario.es in December 2019. See the article (in Spanish).

This repo includes:

the texts of the speeches
a Python interface using NLTK and spaCy to query the corpus
a set of HTML visualizations built with the scattertext library (see HTML visualizations [in Spanish])

The repo contains the following files:

The data folder contains the files with the Christmas speeches from 1937 to 2021 in txt format and the metadata associated with every speech (year, speaker, URL where it was retrieved).
The Speech class creates a Speech object holding the information for a given speech.
The Corpus class creates the corpus object that contains the speeches to be analyzed. This class uses the files inside the speeches folder. This file also contains the methods to perform the lexical analysis of the created corpus.
The visualize.py script creates interactive HTML visualizations from TF-IDF measures using the scattertext library. The visualization files are stored inside the visualization folder.
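
The relationship between a speech object and a corpus object described above can be sketched roughly as follows. This is only an illustration, not the code in this repo: the field names, the between and word_counts methods, the regex tokenizer and the toy texts are all assumptions made for the example (the actual interface relies on NLTK and spaCy).

```python
from collections import Counter
from dataclasses import dataclass
import re


@dataclass
class Speech:
    # Hypothetical fields mirroring the metadata listed above
    # (year, speaker, URL where the text was retrieved).
    year: int
    speaker: str
    url: str
    text: str

    def tokens(self):
        # Naive lowercase word tokenizer, used here only to keep the
        # sketch dependency-free.
        return re.findall(r"\w+", self.text.lower())


class Corpus:
    def __init__(self, speeches):
        self.speeches = list(speeches)

    def between(self, start, end):
        # Sub-corpus with the speeches delivered from start to end, inclusive.
        return Corpus(s for s in self.speeches if start <= s.year <= end)

    def word_counts(self):
        # Word frequencies over every speech in this corpus.
        counts = Counter()
        for speech in self.speeches:
            counts.update(speech.tokens())
        return counts


# Toy placeholder texts and URLs, not taken from the corpus.
toy = Corpus([
    Speech(1980, "Juan Carlos I", "https://example.org/1980", "paz y libertad y paz"),
    Speech(2015, "Felipe VI", "https://example.org/2015", "futuro y convivencia"),
])
recent = toy.between(2014, 2021)
print(len(recent.speeches), recent.word_counts().most_common(3))
```

Keeping the period filter on the corpus object means the same lexical counts can be computed for any time span without reloading the files.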

The files that have a main method that can be executed are:

corpus.py (creates several corpus objects for different time periods and calls the radiography method in order to get their lexical analysis)
visualize.py (creates an instance of the corpus class and generates several visualization files with TF-IDF measures using the scattertext library)
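
For readers unfamiliar with the TF-IDF measures mentioned above, here is a minimal sketch of one common textbook formulation (relative term frequency times the log of inverse document frequency). It is written in plain Python for illustration; the tf_idf function and the toy token lists are assumptions, and the scoring that scattertext applies in visualize.py may differ.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: score} dict per document (each a list of tokens)."""
    n_docs = len(documents)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in documents for term in set(doc))
    scores = []
    for doc in documents:
        counts = Counter(doc)
        total = len(doc)
        scores.append({
            # Relative frequency in this document, weighted by rarity
            # across the whole collection.
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return scores


# Toy token lists, not taken from the corpus.
docs = [
    ["paz", "y", "libertad"],
    ["futuro", "y", "convivencia"],
]
scores = tf_idf(docs)
print(scores[0])
```

A term that occurs in every document, such as "y" in the toy data, gets a score of zero, while terms specific to one document score higher. That contrast is what makes the measure useful for comparing speakers or periods.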

The file requirements.txt lists the libraries required to run the program. After cloning the repo, install them with the following command: pip install -r requirements.txt

When citing this work please use the following reference to my LREC paper:

@inproceedings{alvarez-mellado-2020-corpus,
    title = "A Corpus of {S}panish Political Speeches from 1937 to 2019",
    author = "{\'A}lvarez-Mellado, Elena",
    booktitle = "Proceedings of The 12th Language Resources and Evaluation Conference",
    month = may,
    year = "2020",
    address = "Marseille, France",
    publisher = "European Language Resources Association",
    url = "https://www.aclweb.org/anthology/2020.lrec-1.116",
    pages = "928--932",
    ISBN = "979-10-95546-34-4",
}
