Prevert iterator

To use the prevert parser, copy the file prevert.py into your working directory.

Use

# import libraries
from prevert import dataset
import pandas as pd

If you are using the MaCoCu corpora in the XML format, dataset() needs only the path to the file as its argument:

# Open the dataset with the prevert parser 
dset = dataset("/data/monolingual/mk.xml")

dset is an iterable of documents; the metadata of each document is available as doc.meta['attribute_name']. Each document is in turn an iterable of paragraphs, whose metadata is available as par.meta['attribute_name'].
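Because documents nest paragraphs, it is often convenient to flatten a dataset into one record per paragraph, for example before loading it into pandas. The following is a minimal sketch, not part of prevert: the helper paragraphs_to_rows and the FakeDoc and FakePar stand-ins are hypothetical and mimic only the interface described above (iteration, a dict-like .meta, and str()). The document-level 'id' attribute is assumed for illustration.

```python
from typing import Any, Dict, Iterable, List


def paragraphs_to_rows(docs: Iterable[Any]) -> List[Dict[str, Any]]:
    """Flatten documents into one dict per paragraph (hypothetical helper)."""
    rows = []
    for doc in docs:
        for par in doc:
            rows.append({
                "doc_meta": dict(doc.meta),  # assumes .meta is dict-like
                "par_meta": dict(par.meta),
                "text": str(par),
            })
    return rows


# Hypothetical stand-ins so the sketch runs without a corpus file.
class FakePar:
    def __init__(self, meta, text):
        self.meta = meta
        self._text = text

    def __str__(self):
        return self._text


class FakeDoc:
    def __init__(self, meta, pars):
        self.meta = meta
        self._pars = pars

    def __iter__(self):
        return iter(self._pars)


sample = [
    FakeDoc({"id": "doc1"}, [
        FakePar({"id": "p1"}, "First."),
        FakePar({"id": "p2"}, "Second."),
    ])
]
rows = paragraphs_to_rows(sample)
```

With a real corpus, the same function should accept the object returned by dataset() in place of sample, and the resulting list can be passed to pd.DataFrame(rows).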

Basic use:

for doc in dset: # iterating through documents of a dataset
    print(doc.meta) # all attributes
    print(eval(doc.meta['lang_distr'])[0][0]) # most prominent language in the document
    print(str(doc)) # whole document text
    for par in doc: # iterating through paragraphs of a document
        print(par.meta['id']) # specific attribute
        print(str(par)) # whole paragraph text
    print(doc.to_prevert()) # obtaining the original format
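
The eval() call above executes whatever expression the attribute contains, which is risky for corpora from untrusted sources. A more defensive sketch uses ast.literal_eval from the standard library, which accepts only Python literals. The helper name most_prominent_language is hypothetical, and the example string assumes that lang_distr holds a list of (language, share) pairs, as the [0][0] indexing above suggests.

```python
import ast


def most_prominent_language(lang_distr: str) -> str:
    """Return the first language code from a lang_distr attribute string."""
    # literal_eval parses literals only and raises ValueError on anything else
    pairs = ast.literal_eval(lang_distr)
    return pairs[0][0]


# Assumed attribute format, for illustration only
example = "[('mk', 0.92), ('en', 0.08)]"
print(most_prominent_language(example))
```

Inside the loop, this would be called as most_prominent_language(doc.meta['lang_distr']).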
