
itziargd/Biografix

Biografix

Biografix is a multilingual NLP tool that removes the parenthetical biographical structures and creates new sentences out of them.

Biografix was developed for Basque and has since been adapted to other languages.

Biografix currently works for the following languages:

  • Basque
  • Catalan
  • Galician
  • German
  • Italian
  • French
  • Portuguese
  • Spanish

The input file for Biografix should be a CSV file containing the title of the Wikipedia article (or the name of the person) in the first column and the sentence containing the biographical information in parentheses in the second column. The file input.example.csv is an example input file for Biografix. However, the input format can be easily adapted.
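To illustrate the two-column layout described above, here is a minimal Python sketch that reads such a file and separates the parenthetical from the rest of the sentence. It is not Biografix's own code: the function names `read_rows` and `split_parenthetical`, the comma delimiter, the quoting style, and the sample row are all assumptions made for illustration, and the sketch does not generate the new sentences that Biografix produces.

```python
import csv
import io
import re

# Assumption: the biographical data is the first parenthetical in the sentence.
PAREN = re.compile(r"\(([^()]*)\)")


def read_rows(handle):
    """Yield (title, sentence) pairs from a two-column CSV file object."""
    for row in csv.reader(handle):
        if len(row) < 2:
            continue  # skip blank or incomplete lines
        yield row[0].strip(), row[1].strip()


def split_parenthetical(sentence):
    """Return (sentence without its first parenthetical, parenthetical text or None)."""
    match = PAREN.search(sentence)
    if match is None:
        return sentence, None
    stripped = sentence[:match.start()].rstrip() + sentence[match.end():]
    return stripped, match.group(1).strip()


# Invented sample row, used only to show the expected column layout.
sample = io.StringIO(
    '"Wolfgang Amadeus Mozart",'
    '"Wolfgang Amadeus Mozart (Salzburgo, 1756 - Viena, 1791) fue un compositor."\n'
)

for title, sentence in read_rows(sample):
    stripped, bio = split_parenthetical(sentence)
    print(title, "|", stripped, "|", bio)
```

To run the same loop on a real file, replace `sample` with something like `open("input.example.csv", newline="", encoding="utf-8")`; the UTF-8 encoding is also an assumption and may need adjusting.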

Please note that all versions except the Basque one are adaptations, and no further development has been done for them. All versions except the Portuguese and the Italian ones have been evaluated.

If you want more information, or if you use this tool, please cite:

http://ixa.si.ehu.es/Ixa/Produktuak/1403535629

Gonzalez-Dios, I., Aranzabe, M.J., Díaz de Ilarraza, A. (2014). Making Biographical Data in Wikipedia Readable: A Pattern-based Multilingual Approach. Proceedings of the Workshop on Automatic Text Simplification - Methods and Applications in the Multilingual Society (ATS-MA 2014). Workshop at Coling 2014. pp. 11-20.

Contact: itziar dot gonzalezd at ehu dot es
