
mnn-001-p/LODlit


How Contentious Terms About People and Cultures are Used in Linked Open Data

The repository of the research paper

The online Appendix is available at https://mnn-001-p.github.io/LODlit/

Data

  • We reuse the previously developed knowledge graph of contentious terminology;

  • From the knowledge graph, we extract culturally sensitive terms to inspect them in LOD-datasets; the process is in the notebook getting_query_terms.ipynb, the resulting file is query_terms.json; there are 75 EN and 82 NL canonical forms of terms, which are linked to their inflected forms (for example, "aboriginal" and "aboriginals"); with both canonical and inflected forms, there are 154 EN and 242 NL terms;

  • We query terms in four LOD datasets:

    • Wikidata (EN and NL);
    • The Getty Art & Architecture Thesaurus (AAT) (EN and NL);
    • Princeton WordNet (version 3.1) (only EN);
    • Open Dutch WordNet (version 1.3) (only NL);
  • For details on querying each dataset, see the README in the corresponding directory.
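The relation between canonical and inflected forms described above can be illustrated with a minimal sketch. The layout of query_terms.json is not documented here, so the nested dictionary below (language, then canonical form, then a list of inflected forms) is an assumption made for illustration, and `count_terms` and the placeholder term "exampleterm" are hypothetical.

```python
# Assumed layout (NOT confirmed by the repository):
# {"en": {"aboriginal": ["aboriginals"], ...}, "nl": {...}}

def count_terms(query_terms):
    """Return {lang: (number of canonical forms, number of all forms)}."""
    counts = {}
    for lang, lemmas in query_terms.items():
        all_forms = set()
        for canonical, inflected in lemmas.items():
            all_forms.add(canonical)      # the canonical form is itself a query term
            all_forms.update(inflected)   # plus every inflected form linked to it
        counts[lang] = (len(lemmas), len(all_forms))
    return counts


# Toy data standing in for the real file; with the real file one would
# load it via json.load() and adapt the traversal to its actual layout.
example = {
    "en": {"aboriginal": ["aboriginals"], "exampleterm": []},
    "nl": {"exampleterm": ["exampletermen"]},
}
counts = count_terms(example)
print(counts)
```

Under this assumed layout, the same function applied to the real data would be expected to reproduce the counts reported above (75 and 154 for EN, 82 and 242 for NL), which makes it a simple consistency check.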

Sets constructed for analysis

Set 1: literals of resources from the knowledge graph

Set 2: all retrieved literals

Set 3: disambiguated literals

  • samples contains (1) samples for annotation by dataset and language, (2) background information for each term presented to annotators, and (3) annotated samples with the prefix "ann_" and the IDs of annotators (1 and 3); the notebook samples.ipynb generates 6 csv files with samples and calculates inter-annotator agreement for each annotated sample; the mean of these agreement scores (0.8) is reported in Section 4.2;
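As a rough illustration of averaging per-sample agreement scores, the sketch below computes Cohen's kappa for two annotators and takes the mean over samples. The agreement metric is not named in this README, so the use of Cohen's kappa is an assumption; the function names and toy labels are hypothetical, and samples.ipynb remains the reference for the actual calculation.

```python
from collections import Counter
from statistics import mean


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: share of items with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from each annotator's label distribution.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label throughout
    return (observed - expected) / (1 - expected)


def mean_agreement(samples):
    """Mean kappa over (labels_a, labels_b) pairs, one pair per annotated sample."""
    return mean(cohen_kappa(a, b) for a, b in samples)


# Toy labels only; they are not taken from the annotated samples.
toy_samples = [
    (["x", "x", "y", "y"], ["x", "x", "y", "x"]),
    (["x", "y"], ["x", "y"]),
]
print(mean_agreement(toy_samples))
```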

LODlit package

LODlit_package allows querying terms in Wikidata, AAT, PWN, and ODWN. The package can be used to both reproduce our research results and retrieve literals from the LOD datasets for other purposes. Read more in the package documentation.

Paper footnotes

Other directories and files

  • n_hits contains 36 csv files with the number of hits per term in the three sets, by property values; the code to generate these files is in the notebook n_hits.ipynb;
