Biografix is a multilingual NLP tool that removes parenthetical biographical structures from sentences and creates new sentences out of them.
Biografix was developed for Basque and has since been adapted to other languages.
At the moment Biografix works for these languages:
- Basque
- Catalan
- Galician
- German
- Italian
- French
- Portuguese
- Spanish
The input file for Biografix should be a CSV file containing the title of the Wikipedia article (or the name of the person) in the first column and the sentence containing the biographical information in brackets in the second column. The file input.example.csv is an example input file for Biografix. However, the input format can be easily adapted.
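To make the two-column layout concrete, here is a minimal Python sketch that reads such a file and picks out the bracketed part of each sentence. It is not part of Biografix and does not reproduce its patterns. The sample row, the comma delimiter, the reading of "brackets" as round parentheses, and the names `SAMPLE`, `PAREN` and `read_rows` are all assumptions made for illustration.

```python
import csv
import io
import re

# Hypothetical sample row in the layout described above:
# first column = title or name, second column = sentence.
SAMPLE = 'Ada Lovelace,"Ada Lovelace (1815-1852) was an English mathematician."\n'

# Assumption: "brackets" means round parentheses without nesting.
PAREN = re.compile(r"\(([^()]*)\)")


def read_rows(handle):
    """Yield (title, sentence, parenthetical) for each two-column row."""
    for row in csv.reader(handle):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        title, sentence = row[0].strip(), row[1].strip()
        match = PAREN.search(sentence)
        # The parenthetical is None when the sentence has no bracketed part.
        yield title, sentence, match.group(1) if match else None


rows = list(read_rows(io.StringIO(SAMPLE)))
for title, sentence, parenthetical in rows:
    print(title, "|", parenthetical)
```

To try this on a real file, replace the `io.StringIO(SAMPLE)` argument with `open("input.example.csv", newline="", encoding="utf-8")`, adjusting the delimiter and encoding if the file differs from the assumptions above.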
Please note that every version except the Basque one is an adaptation, and no dedicated development has been done for them. All versions except the Portuguese and the Italian have been evaluated.
If you want more information, or if you use this tool, please cite:
http://ixa.si.ehu.es/Ixa/Produktuak/1403535629
Gonzalez-Dios, I., Aranzabe, M.J., Díaz de Ilarraza, A. (2014) Making Biographical Data in Wikipedia Readable: A pattern-based Multilingual Approach. Proceedings of the Workshop on Automatic Text Simplification - Methods and Applications in the Multilingual Society (ATS-MA 2014). Workshop at Coling 2014. pp. 11--20.
Contact: itziar dot gonzalezd at ehu dot es