
Corpora downloader and reader for annotated poetic sources



linhd-postdata/averell


Averell


Averell is a Python library and command-line interface that makes it easier to work with existing repositories of annotated poetry. It can download an annotated corpus and reconcile its different TEI entities into a unified JSON output at the desired granularity. Depending on their research, some researchers may need entire poems, poems split line by line, or even poems split word by word when that level of annotation is available. Averell lets you specify the granularity of the final generated dataset, which is a combined JSON with all the entities in the selected corpora. Each corpus in the catalog must specify the parser that produces the expected data format.
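The following sketch shows what reconciling TEI entities at a chosen granularity involves. It is a minimal illustration and is not Averell's actual parser or output schema. It assumes a TEI document in which a poem is a `<div>` with a `<head>` title, stanzas are `<lg>` (line group) elements, and verse lines are `<l>` elements. The `flatten` function, the sample poem, and the JSON field names are all hypothetical. Words are split naively on whitespace.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical sketch, not Averell's parser: flattens TEI verse into JSON records.
TEI_NS = "{http://www.tei-c.org/ns/1.0}"
LEVELS = ("stanza", "line", "word")

# Made-up sample poem using standard TEI verse elements (<lg> stanzas, <l> lines).
SAMPLE_TEI = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><body>
    <div type="poem">
      <head>Sample poem</head>
      <lg type="stanza">
        <l>first line here</l>
        <l>second line</l>
      </lg>
      <lg type="stanza">
        <l>third line</l>
      </lg>
    </div>
  </body></text>
</TEI>"""


def flatten(tei_xml: str, granularity: str) -> list[dict]:
    """Return one record per stanza, line or word (hypothetical field names)."""
    if granularity not in LEVELS:
        raise ValueError(f"granularity must be one of {LEVELS}")
    root = ET.fromstring(tei_xml)
    records = []
    # Assumes each poem is a single, non-nested <div>.
    for poem in root.iter(f"{TEI_NS}div"):
        title = poem.findtext(f"{TEI_NS}head", default="")
        for s_num, stanza in enumerate(poem.iter(f"{TEI_NS}lg"), start=1):
            # Join nested text inside each <l> and trim surrounding whitespace.
            lines = ["".join(l.itertext()).strip() for l in stanza.iter(f"{TEI_NS}l")]
            base = {"poem_title": title, "stanza_number": s_num}
            if granularity == "stanza":
                records.append({**base, "stanza_text": "\n".join(lines)})
                continue
            for l_num, line in enumerate(lines, start=1):
                if granularity == "line":
                    records.append({**base, "line_number": l_num, "line_text": line})
                    continue
                # Naive whitespace tokenization; real corpora annotate words explicitly.
                for w_num, word in enumerate(line.split(), start=1):
                    records.append({**base, "line_number": l_num,
                                    "word_number": w_num, "word_text": word})
    return records


print(json.dumps(flatten(SAMPLE_TEI, "line"), ensure_ascii=False, indent=2))
```

Passing `"stanza"` or `"word"` instead of `"line"` gives coarser or finer records from the same source. A corpus whose TEI lacks word-level markup can only support the coarser levels.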

  • Free software: Apache Software License 2.0

Available corpora (version 1.1.0)

id  name                                         lang  size   docs   words     granularity                   license
1   Disco V2.1 (disco2_1)                        es    22M    4088   381539    stanza, line                  CC-BY
2   Disco V3 (disco3)                            es    28M    4080   377978    stanza, line                  CC-BY
3   Sonetos Siglo de Oro (adso)                  es    6.8M   5078   466012    stanza, line                  CC-BY-NC 4.0
4   ADSO 100 poems corpus (adso100)              es    128K   100    9208      stanza, line                  CC-BY-NC 4.0
5   Poesía Lírica Castellana Siglo de Oro (plc)  es    3.8M   475    299402    stanza, line, word, syllable  CC-BY-NC 4.0
6   Gongocorpus (gongo)                          es    9.2M   481    99079     stanza, line, word, syllable  CC-BY-NC-ND 3.0 FR
7   Eighteenth Century Poetry Archive (ecpa)     en    2400M  3084   2063668   stanza, line, word            CC-BY-SA 4.0
8   For Better For Verse (4b4v)                  en    39.5M  103    41749     stanza, line                  Unknown
9   Métrique en Ligne (mel)                      fr    183M   5081   1850222   stanza, line                  Unknown
10  Biblioteca Italiana (bibit)                  it    242M   25341  7121246   stanza, line, word            Unknown
11  Corpus of Czech Verse (czverse)              cs    4100M  66428  12636867  stanza, line, word            CC-BY-SA
12  Stichotheque (stichopt)                      pt    11.8M  1702   168411    stanza, line                  Unknown
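Records can presumably only be produced at the levels a corpus actually annotates. This means the granularity column determines which corpora can be combined into, for example, a word-level dataset. The sketch below encodes that column by hand and filters it. `CORPORA` and `corpora_supporting` are hypothetical names and are not part of Averell.

```python
# Hypothetical helper, not part of Averell: granularity levels per corpus id,
# transcribed by hand from the "Available corpora" table (version 1.1.0).
CORPORA: dict[int, tuple[str, set[str]]] = {
    1: ("disco2_1", {"stanza", "line"}),
    2: ("disco3", {"stanza", "line"}),
    3: ("adso", {"stanza", "line"}),
    4: ("adso100", {"stanza", "line"}),
    5: ("plc", {"stanza", "line", "word", "syllable"}),
    6: ("gongo", {"stanza", "line", "word", "syllable"}),
    7: ("ecpa", {"stanza", "line", "word"}),
    8: ("4b4v", {"stanza", "line"}),
    9: ("mel", {"stanza", "line"}),
    10: ("bibit", {"stanza", "line", "word"}),
    11: ("czverse", {"stanza", "line", "word"}),
    12: ("stichopt", {"stanza", "line"}),
}


def corpora_supporting(level: str) -> list[int]:
    """Return the ids of corpora annotated down to the given level, sorted."""
    return sorted(cid for cid, (_, levels) in CORPORA.items() if level in levels)


for level in ("line", "word", "syllable"):
    print(level, corpora_supporting(level))
```

Running the script prints all twelve ids for `line`, `[5, 6, 7, 10, 11]` for `word`, and `[5, 6]` for `syllable`.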

Documentation

https://averell.readthedocs.io/

Installation

To install averell, run this command in your terminal:

pip install averell

This is the preferred method to install averell, as it will always install the most recent stable release.

If you don't have pip installed, the Python installation guide can walk you through the process.

Usage

Check the usage page of the documentation.