Hierarchical evaluation measures
Implementation of the hierarchical F-measure (hF), hierarchical precision (hP), hierarchical recall (hR), and exact precision. The script was developed for evaluating the quality of types in DBpedia.
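As a rough illustration of what these measures capture, here is a minimal sketch based on the commonly used definition of hierarchical precision and recall, in which the gold and the predicted type are each extended with all of their superclasses before the overlap is counted. It is not the code of computeHmeasures.py. The toy class hierarchy, the entity names, the function names, and the treatment of entities without a prediction (counted for recall only) are all assumptions made for this example.

```python
def ancestors(cls, parent):
    """Return cls together with all of its superclasses (hypothetical helper)."""
    out = set()
    while cls is not None:
        out.add(cls)
        cls = parent.get(cls)  # None once the root has been passed
    return out


def h_measures(gold, pred, parent):
    """Compute (hP, hR, hF) over ancestor-extended type sets."""
    overlap = pred_total = gold_total = 0
    for entity, gold_type in gold.items():
        g = ancestors(gold_type, parent)
        # Assumption: an entity without a prediction gets an empty set,
        # so it lowers recall but does not affect precision.
        p = ancestors(pred[entity], parent) if entity in pred else set()
        overlap += len(g & p)
        pred_total += len(p)
        gold_total += len(g)
    hp = overlap / pred_total if pred_total else 0.0
    hr = overlap / gold_total if gold_total else 0.0
    hf = 2 * hp * hr / (hp + hr) if hp + hr else 0.0
    return hp, hr, hf


# Toy hierarchy (child -> parent), invented for illustration only.
parent = {"SoccerPlayer": "Athlete", "Athlete": "Person", "Politician": "Person"}
gold = {"e1": "SoccerPlayer", "e2": "Politician"}
pred = {"e1": "Athlete", "e2": "Politician"}

hp, hr, hf = h_measures(gold, pred, parent)
print(hp, hr, hf)
```

In this toy case the prediction for e1 is too general rather than wrong, so it costs recall but not precision. An exact-match measure would count it as a plain error.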
python computeHmeasures.py "en.lhd.core.2014.nt" "dbpedia_2014.owl" "gs3-toDBpedia2014.nt" "en.lhd.core.gs3.log"
Input files - Gold standard datasets
The datasets are described in our JWS paper.
Additional details can be found at http://ner.vse.cz/datasets/linkedhypernyms/
reading gs
reading predicted
finished reading input datasets
total instances in groundtruth:1033.0
total instances in intersection of groundtruth and prediction:402.0
hP:0.864357864358
hR:0.370553665326
hF:0.518726997186
Precision (exact):0.654228855721
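The reported hF is consistent with the harmonic mean of hP and hR, which is the usual definition of an F-measure. The short check below recomputes it from the logged values. The variable names are chosen for this example, and the formula is the standard one rather than a quotation from the script.

```python
# Values copied from the log output above.
hP = 0.864357864358
hR = 0.370553665326
hF_logged = 0.518726997186

# Harmonic mean of hierarchical precision and recall.
hF = 2 * hP * hR / (hP + hR)

print(round(hF, 6))
print(abs(hF - hF_logged) < 1e-6)
```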