We need a script to generate a dataset for the experiment. Our current dataset is the ALTA-2010 Shared Task. If we ever need more languages, different annotation, shorter texts, or anything else, we must be able to generate a similar dataset ourselves.
Steps needed:
Download Wikipedia dumps. Dumps are named by language code in the format xxwiki (e.g. enwiki, frwiki). All we need here are the "current versions only" dumps, not the full revision history.
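As a starting point, a download step could look like the sketch below. The function names are hypothetical, and the URL pattern assumes the standard dumps.wikimedia.org layout for "pages-articles" (current versions only) dumps:

```python
import requests

# Build the dump URL for a language code, e.g. "en" -> enwiki.
# Assumption: the standard dumps.wikimedia.org "latest" layout, where the
# current-versions-only dump is the pages-articles file.
def dump_url(lang_code):
    wiki = f"{lang_code}wiki"
    return (f"https://dumps.wikimedia.org/{wiki}/latest/"
            f"{wiki}-latest-pages-articles.xml.bz2")

# Stream the (large) dump to disk in chunks rather than loading it
# into memory at once.
def download_dump(lang_code, dest_path, chunk_size=1 << 20):
    with requests.get(dump_url(lang_code), stream=True) as resp:
        resp.raise_for_status()
        with open(dest_path, "wb") as f:
            for chunk in resp.iter_content(chunk_size):
                f.write(chunk)
```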
Apply the methodology of the paper. Interlanguage links can be extracted from the corresponding wiki page: parse it with BeautifulSoup4 and find the tags with the class "interlanguage-link".
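The extraction step above could be sketched as follows. The source confirms BeautifulSoup4 and the "interlanguage-link" class; the function name, the assumption that each matching element wraps an `<a>` tag carrying `lang` and `href` attributes, and the sample markup are mine:

```python
from bs4 import BeautifulSoup

# Extract interlanguage links from a rendered wiki page.
# Assumption: each sidebar language entry is an element with the class
# "interlanguage-link" containing an <a> with lang and href attributes.
def interlanguage_links(html):
    soup = BeautifulSoup(html, "html.parser")
    links = {}
    for item in soup.find_all(class_="interlanguage-link"):
        a = item.find("a")
        if a and a.get("href"):
            links[a.get("lang", "")] = a["href"]
    return links

# Hypothetical sample markup mimicking the sidebar structure:
sample = """
<ul>
  <li class="interlanguage-link"><a lang="fr" href="https://fr.wikipedia.org/wiki/Chat">Fran\u00e7ais</a></li>
  <li class="interlanguage-link"><a lang="de" href="https://de.wikipedia.org/wiki/Katze">Deutsch</a></li>
</ul>
"""
```

This returns a dict keyed by language code, which makes it easy to filter for the languages the dataset needs.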
I committed a utility script to help download Wikipedia dump files. It can also be used for other purposes. You'll need BeautifulSoup4 and requests to run it.