@stevenbird
The initial release of the corpus is available at: https://doi.org/10.7488/ds/1411
Suggested NLTK name: ARCOSG
I have updated and corrected the corpus for inclusion in NLTK. (The version at the link above is older and should not be used.)
Corpus reader code verified:
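```python
from nltk.corpus.reader import CategorizedTaggedCorpusReader
from nltk.corpus.util import LazyCorpusLoader

# Lazily load every .txt file under nltk_data/corpora/arcosg,
# with categories read from cats.prn and tags declared as PAROLE.
arcosg = LazyCorpusLoader(
    'arcosg', CategorizedTaggedCorpusReader, r'.*\.txt',
    cat_file='cats.prn', tagset='parole', encoding='utf-8',
)
```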
Categories file (cats.prn) and a mapping from the parole tagset to the Universal tagset: created and verified.
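For a quick local check of how these reader arguments behave, here is a minimal sketch that builds the same CategorizedTaggedCorpusReader directly, not through LazyCorpusLoader, so it does not need the corpus installed in nltk_data. The file names, words, tags and category names below are made up for illustration and are not taken from ARCOSG; only the reader arguments (fileid pattern, cat_file, tagset, encoding) come from the loader entry above.

```python
import os
import tempfile

from nltk.corpus.reader import CategorizedTaggedCorpusReader

# Hypothetical toy data in word/TAG format; NOT real ARCOSG files or tags.
FILES = {
    "conv01.txt": "Hello/INT there/ADV ./FE\n",
    "news01.txt": "Rain/NC fell/V today/ADV ./FE\n",
}
# Categories file: one "fileid category" pair per line (space-delimited).
CATS = "conv01.txt conversation\nnews01.txt news\n"


def build_reader(root):
    """Write the toy corpus to root and open it with the loader's arguments."""
    for name, text in FILES.items():
        with open(os.path.join(root, name), "w", encoding="utf-8") as f:
            f.write(text)
    with open(os.path.join(root, "cats.prn"), "w", encoding="utf-8") as f:
        f.write(CATS)
    # cats.prn does not match r".*\.txt", so it is not treated as a corpus file.
    return CategorizedTaggedCorpusReader(
        root, r".*\.txt", cat_file="cats.prn", tagset="parole", encoding="utf-8"
    )


def demo():
    """Return the category list and the tagged words of the 'news' category."""
    with tempfile.TemporaryDirectory() as root:
        reader = build_reader(root)
        cats = reader.categories()
        # Materialise the lazy corpus view before the temp dir is deleted.
        news = list(reader.tagged_words(categories="news"))
        return cats, news


cats, news = demo()
print(cats)  # ['conversation', 'news']
print(news)  # [('Rain', 'NC'), ('fell', 'V'), ('today', 'ADV'), ('.', 'FE')]
```

Once the real corpus is installed, the same calls (categories(), tagged_words(categories=...)) should work on the arcosg object defined by the loader entry.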
Licensed under Creative Commons. See: https://doi.org/10.7488/ds/1411