How to use an updated AGROVOC thesaurus

Fabrizio Celli edited this page Aug 13, 2014 · 6 revisions

AgroTagger works with AGROVOC and, in particular, with the English version of the thesaurus. If you want to update AGROVOC (for instance, because the thesaurus itself has been updated) and/or re-train MAUI to use AgroTagger in a different language, you need:

  • the SKOS file of the new AGROVOC version/language, in RDF/XML serialization
  • if you are using AGROVOC in a language other than English, Spanish, French, Italian, or Portuguese, a mapping file that links each AGROVOC string to a URI
  • the vocabulary model built by MAUI during training

Then, when you run AgroTagger, you have to point it to the new AGROVOC file and model through command line parameters (see Java Applications to learn more).


The SKOS file is an RDF/XML file containing only the following elements and properties: rdf:Description, skos:prefLabel, skos:altLabel, rdf:type, skos:narrower, skos:broader. SKOS-XL is not supported. The following SPARQL query is an example of how to extract it (for English):

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
CONSTRUCT { ?s ?p ?o }
WHERE {
  ?s ?p ?o .
  FILTER (
    ((?p = skos:prefLabel) && (lang(?o) = "en")) || ((?p = skos:altLabel) && (lang(?o) = "en")) ||
    (?p = skos:broader) || (?p = skos:narrower) || (?p = skos:related) ||
    ((?p = rdf:type) && (?o = skos:Concept))
  )
}

The file should be saved with the extension rdf.gz (that is, it has to be compressed with the gzip command). You can place the file in the data/vocabularies directory.
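
If the gzip command is not at hand, the compression step can also be scripted. The following is a minimal sketch in Python, not part of AgroTagger: the function name compress_vocabulary and the input file name agrovoc_en.rdf are hypothetical, and the only things taken from this page are the rdf.gz extension and the data/vocabularies directory.

```python
import gzip
import shutil
from pathlib import Path


def compress_vocabulary(rdf_path, vocab_dir="data/vocabularies"):
    """Gzip an RDF/XML file into vocab_dir as <name>.rdf.gz and return the new path.

    Hypothetical helper: AgroTagger itself only expects to find the
    compressed file; how you produce it is up to you.
    """
    source = Path(rdf_path)
    target_dir = Path(vocab_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    # source.stem drops the last extension, e.g. "agrovoc_en.rdf" -> "agrovoc_en"
    target = target_dir / (source.stem + ".rdf.gz")
    with open(source, "rb") as src, gzip.open(target, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return target


# Example call (the input file name is an assumption):
# compress_vocabulary("agrovoc_en.rdf")
```

The original uncompressed file is left in place, so the result can be compared against it before the old vocabulary is replaced.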


If you are using AGROVOC in a language other than English, Spanish, French, Italian, or Portuguese, you need to generate a mapping file that links each AGROVOC string to a URI. The mapping file is located in data/vocabularies and is named agrovocURILabelMappings.txt. The Java application ProduceMappingTable uses it to generate an RDF file as output, since MAUI generates only labels. The file provided in the package contains mappings for English, Spanish, French, Italian, and Portuguese. To use another language, update this file or replace it with a new one (the filename must remain agrovocURILabelMappings.txt). The file is tab-separated: the first column is an AGROVOC URI, and the other columns are AGROVOC labels in different languages. As an example:
http://aims.fao.org/aos/agrovoc/c_4788|methods|Metodi|Métodos|Método|Méthode|
http://aims.fao.org/aos/agrovoc/c_2208|design|Progetto|Diseño|Design||
You can add another column, or create a new file (even one with only two columns: URI and label).
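
A new mapping file can be written by hand or generated. The sketch below is one possible way to do it in Python and is not part of the AgroTagger package. It assumes the separator is a real TAB character, as the description above states (the example rows display the columns with |), and that the file is UTF-8 encoded; both points should be checked against the file shipped in data/vocabularies. The function names write_mapping and read_mapping are hypothetical.

```python
def write_mapping(rows, path):
    """Write (uri, label, ...) rows as one tab-separated line each.

    Assumption: TAB separator and UTF-8 encoding, matching the
    description of agrovocURILabelMappings.txt on this page.
    """
    with open(path, "w", encoding="utf-8") as out:
        for uri, *labels in rows:
            fields = [uri, *labels]
            # A tab or newline inside a field would corrupt the column layout.
            if any("\t" in field or "\n" in field for field in fields):
                raise ValueError("fields must not contain tabs or newlines: %r" % (fields,))
            out.write("\t".join(fields) + "\n")


def read_mapping(path):
    """Read the file back into a dict: URI -> list of labels."""
    mapping = {}
    with open(path, encoding="utf-8") as src:
        for line in src:
            line = line.rstrip("\n")
            if not line:
                continue  # skip blank lines
            uri, *labels = line.split("\t")
            mapping[uri] = labels
    return mapping


# Two-column example using the URIs from the rows above; replace the
# English labels with the labels of your target language.
# write_mapping(
#     [("http://aims.fao.org/aos/agrovoc/c_4788", "methods"),
#      ("http://aims.fao.org/aos/agrovoc/c_2208", "design")],
#     "data/vocabularies/agrovocURILabelMappings.txt",
# )
```

Reading the file back with read_mapping is a quick way to confirm that every line has the expected number of columns before running ProduceMappingTable.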


If you want to re-train MAUI and build a new model, what you basically need is:

  • A set of documents as .txt files
  • For each document provided, a .key file with the same name as the document, containing AGROVOC keywords already assigned to that document (probably manually)
  • Run the MAUI model builder on these files

The Java code to run MAUI and its documentation are available on Google Code. The generated model should be placed at the root level of the application (in the same location as the file fao780).
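
Before running the model builder, it can help to verify that every training document has its keyword file. This is a small sketch in Python, independent of MAUI: the function name missing_key_files and the directory name in the example call are hypothetical, and the only rule it encodes is the one stated above, namely that each .txt document needs a .key file with the same name.

```python
from pathlib import Path


def missing_key_files(training_dir):
    """Return the sorted names of .txt documents that lack a matching .key file."""
    folder = Path(training_dir)
    return sorted(
        doc.name
        for doc in folder.glob("*.txt")
        if not doc.with_suffix(".key").exists()
    )


# Example call (the directory name is an assumption):
# for name in missing_key_files("training_docs"):
#     print("no .key file for", name)
```

An empty result means that all documents in the directory are paired with a keyword file.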
