Realfagstermer project space

This is an open project space for Realfagstermer, a controlled subject heading vocabulary developed and used by the University of Oslo Library and the University of Bergen Library. The vocabulary has terms in Bokmål, Nynorsk and English, and has mappings to Tekord, Dewey and Wikidata.

Search the vocabulary via Emnesøk, Skosmos, or visually with LodLive.

We use GitHub's issue tracker to discuss and process new terms and changes.

→ Go to the term discussions

Issues are organized with labels. See CONTRIBUTING.md for an explanation of the different labels.

GitHub tips for contributors:

- You are notified whenever someone mentions you. You are also notified of new posts in discussions you have created or commented on. By default, notifications are sent by email, but you can change this in your settings.
- If you want feedback from particular people, for example the subject specialists for particular fields, it can be a good idea to mention them. An overview of who is who is given below.
- The watch feature sends notifications about every update in every issue and can quickly lead to notification fatigue.

Contributors

(sorted alphabetically by username) (To notify everyone: @realfagstermer/alle)

Classification/subject heading editorial team:

@grosyn : Gro Synnøve Nesland
@vibekelundetrae : Vibeke Stockinger Lundetræ

Subject specialists, University of Oslo Library:

@BioHeidi : Heidi Konestabo: biosciences
@danmichaelo : Dan Michael O. Heggø: physics and materials science
@edinab : Edina Pozer Bue: geosciences and meteorology
@jessiclo : Jessica Lönn-Stensrud: microbiology
@haraldse : Kirsten Borse Haraldsen: biosciences
@kyrretl : Kyrre Traavik Låberg: behavioural biology
@Quether : Trine Høyås: informatics
@superLine : Line Nybakk Akerholt: astrophysics
@TorgunnKarolineMoe : Karoline Moe: mathematics
Tone Charlotte Gadmar: chemistry

Subject specialists, University of Bergen Library:

@beatekh : Beate Krøvel Humberset, classifies mathematics, informatics and physics.
@bubir : Ingunn Rødland, classifies chemistry.
@ken075 : Kjersti Enerstvedt, classifies geosciences and technology.
@Hypsibius : Hege Folkestad, classifies biosciences.

Former contributors:

@bibliomari : Mari Lundevall, former editorial lead (UiO, –2016).
@jw-geo : Johannes Wiest, student in a project position classifying geology (UiB, 2016–2017).
@knuthe : Knut Hegna, classified informatics, developed Roald and Sonja (UiO, –2017).
@kristinran : Kristin Rangnes, classified geosciences (UiO).
@mittinatten : Simon Mitternacht, classified mathematics, informatics and (theoretical) physics (UiB).
@mzyg : Marta Zygmuntowska, classified geosciences, (applied) physics and technology (UiB).
@iloveyellow : Solveig Isis Sørbø, taxonomist.
@starseekr : Trude Westby, classified informatics (UiO, 2017–2019).
Bente Kathrine Rasch, classifies pharmacy.
@Caroline-A : Caroline Susanne Armitage, classifies marine biology, partly substituting for @Hypsibius.
@violabibaluba : Viola Kuldvere

Project Realfagstermar in Nynorsk

The translation into Nynorsk was completed around the turn of the year 2015/2016 by:

Maria Svendsen (@mariaksv): biology/chemistry
Jørgen Eriksson Midtbø (@jorgenem): physics/mathematics
Vebjørn Sture (@totlevase): proofreading/Nynorsk

See the guidelines/discussion and the separate project page.

Project Kinderegg

From 2017 to 2019, Realfagstermer was translated into English, categorized and mapped to Wikidata with the help of a number of students:

Jeanette Viken (@jeanetvi)
Olav Bjerke (@olavbje)
Anja Maria Aardal (@anjamaa)
Gunvor Evenrud (@gunvorev)
Eirill Strand Hauge (@eirillsh)
Jenny Marie Skytte Af Sätra (@jennynnej)
Eirik Bager Sundmark (@ebsundmark)
Jakob Lindtorp (@JakobUiO)
Marit Sandberg (@maritsan)
Ole Andreas Hoel
Mirna Porobic (@mirnap)
Islam Hadjilah
Trine Høyås (@Quether)

A side effect has been a general improvement in the quality of the vocabulary, since a number of errors and merge candidates were discovered along the way. See Prosjekt «Kinderegg» for more about the project.

Project Mapping to Norsk WebDewey

From 2015 to 2019, Humord and Realfagstermer were mapped to Norsk WebDewey. As with the Kinderegg project, a side effect has been a general improvement in the quality of both vocabularies. Special thanks to Hege Nenseth (@HegeNenseth), Kristine Aalrust Kristoffersen (@kaalrust) and Vibeke Stockinger Lundetræ (@vibekelundetrae), who have contributed a great deal on the Realfagstermer side.

Conversion

Authority data is currently maintained in Sonja and converted to JSON (RoaldIII data model) using RoaldIII. RoaldIII is also used to mix in mappings and translations before exporting RDF/SKOS and MARC21.

The conversion is done by running python publish.py, which only runs a conversion if any of the source files have changed. You can run python publish.py -f to force a conversion even if no source files have changed (useful during development).
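The "only convert when a source file has changed" behaviour can be illustrated with a small hash-based check. This is a minimal sketch and not the actual logic of publish.py: the function names, the idea of a JSON state file and the use of SHA-256 digests are all assumptions made for illustration.

```python
import hashlib
import json
from pathlib import Path


def file_digest(path):
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def needs_conversion(sources, state_file, force=False):
    """Return True if any source file changed since the last recorded run.

    `sources` and `state_file` are hypothetical names for this sketch;
    `force=True` mimics the effect of `python publish.py -f`.
    """
    current = {str(p): file_digest(p) for p in sources}
    state_path = Path(state_file)
    previous = json.loads(state_path.read_text()) if state_path.exists() else {}
    changed = force or current != previous
    if changed:
        # Remember the digests so the next run can skip unchanged input.
        state_path.write_text(json.dumps(current, indent=2))
    return changed
```

A real implementation would probably record the new state only after the conversion has succeeded, so that a failed run is retried the next time.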

Please see the RoaldIII repo for more details on the conversion.

The RoaldIII JSON data is found in realfagstermer.json. This does not currently include data from the Nynorsk translation project. Complete, distributable RDF/SKOS and MARC21 files are found in the dist folder. These include mappings and all translations.

Data model

Example concept in JSON:

{
  "id": "REAL004162",
  "type": [
    "Topic"
  ],
  "created": "2014-08-25T00:00:00Z",
  "modified": "2014-12-17T00:00:00Z",
  "prefLabel": {
    "nb": {
      "value": "Cellekommunikasjon"
    },
    "en": {
      "value": "Cell signalling"
    }
  },
  "altLabel": {
    "nb": [
      {
        "value": "Cellesignalisering"
      }
    ],
    "en": [
      {
        "value": "Cell signaling"
      }
    ]
  },
  "hiddenLabel": {}
}

Characteristics:

Realfagstermer contains only concepts, no facets, arrays or other thesaurus constructs.
Concept properties:
- id (string): a unique identifier.
- type (array): at least one type (Topic, Geographic, Temporal, GenreForm, CompoundHeading or VirtualCompoundHeading).
- created (datetime string): a creation date.
- modified (datetime string): a modification date.
- prefLabel (language map): one preferred term per language. A preferred term for the language code nb is required, while others are optional.
- altLabel (language map array): any number of alternative terms per language.
- hiddenLabel (language map array): any number of hidden terms (not yet implemented).
- editorialNote (array): any number of editorial notes (in Bokmål only).
- definition (language map): one definition per language (currently only in Bokmål).
- ddc (string): a DDC number (these should eventually be moved to the mapping project).
- msc (string): an MSC number.
- elementSymbol (string): a chemical element symbol.
- related (array): any number of IDs for related concepts.
- broader (array): any number of IDs for broader concepts.
- narrower (array): any number of IDs for narrower concepts.
- component (array): any number of IDs for the components that make up a CompoundHeading or VirtualCompoundHeading (emnestreng, i.e. subject string). Note that concepts of these types do not have their own terms, since the compound terms are generated from the component concepts. Example:
```
{
  "id": "REAL014060",
  "type": [
    "CompoundHeading"
  ],
  "created": "2015-03-10T10:07:34Z",
  "component": [
    "REAL002845",
    "REAL007608"
  ]
}
```
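The constraints in the property list above can be checked mechanically. The following is a minimal sketch, not part of RoaldIII: the function name and the wording of the messages are hypothetical, and the rule that a compound heading needs at least one component is an assumption, since the list only says "any number".

```python
ALLOWED_TYPES = {
    "Topic", "Geographic", "Temporal", "GenreForm",
    "CompoundHeading", "VirtualCompoundHeading",
}
COMPOUND_TYPES = {"CompoundHeading", "VirtualCompoundHeading"}


def validate_concept(concept):
    """Return a list of problems found in one concept dict (empty if none)."""
    problems = []
    if not isinstance(concept.get("id"), str):
        problems.append("id must be a string")

    types = concept.get("type") or []
    if not types:
        problems.append("at least one type is required")
    unknown = sorted(set(types) - ALLOWED_TYPES)
    if unknown:
        problems.append("unknown type(s): " + ", ".join(unknown))

    if COMPOUND_TYPES & set(types):
        # Compound headings carry components instead of their own terms.
        # Assumption: at least one component is expected.
        if not concept.get("component"):
            problems.append("compound heading without components")
    elif "value" not in concept.get("prefLabel", {}).get("nb", {}):
        problems.append("missing preferred term for language code nb")
    return problems


topic = {"id": "REAL004162", "type": ["Topic"],
         "prefLabel": {"nb": {"value": "Cellekommunikasjon"}}}
compound = {"id": "REAL014060", "type": ["CompoundHeading"],
            "component": ["REAL002845", "REAL007608"]}
print(validate_concept(topic))     # []
print(validate_concept(compound))  # []
```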
Term properties:
- value (string): The term value.
- hasAcronym (string): An acronym (used only if value is the full form).
- acronymFor (string): The full form (used only if value is an acronym).
Note that terms do not have their own IDs, so the relationship between acronyms and their full form is represented by embedding. Example:
```
{
  "prefLabel": {
    "nb": {
      "hasAcronym": "DCOM",
      "value": "Distributed component object model"
    }
  }
}
```
When converting to SKOS core, we simplify the model by removing term-term relationships.
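
The effect of this simplification can be sketched as a function that flattens the label maps into plain (property, language, value) tuples and keeps only the value of each term. This is an illustration and not the RoaldIII conversion itself: the function name and the tuple representation are hypothetical, and whether the real conversion also emits acronyms as separate labels is not stated here.

```python
def skos_labels(concept):
    """Flatten the label maps of one concept into (property, language, value) tuples."""
    labels = []
    # prefLabel maps each language code to a single term object.
    for lang, term in concept.get("prefLabel", {}).items():
        labels.append(("skos:prefLabel", lang, term["value"]))
    # altLabel and hiddenLabel map each language code to a list of term objects.
    for prop in ("altLabel", "hiddenLabel"):
        for lang, terms in concept.get(prop, {}).items():
            for term in terms:
                labels.append(("skos:" + prop, lang, term["value"]))
    return labels


concept = {
    "prefLabel": {
        "nb": {"hasAcronym": "DCOM", "value": "Distributed component object model"}
    }
}
print(skos_labels(concept))
# [('skos:prefLabel', 'nb', 'Distributed component object model')]
```

In this sketch the hasAcronym and acronymFor keys are simply ignored, so the term-term relationship disappears from the output.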

Mapping to Wikidata

A large part of the vocabulary (about 8,000 concepts as of November 2018) has been mapped to Wikidata. The following query lists ten of these mappings:

PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX real: <http://data.ub.uio.no/realfagstermer/>

SELECT ?concept ?term ?wd_item
FROM <http://data.ub.uio.no/realfagstermer>
WHERE
{
  ?concept a skos:Concept ;
           skos:prefLabel ?term ;
           skos:mappingRelation ?wd_item .

  FILTER(LANG(?term) = "nb")
  FILTER(STRSTARTS(STR(?wd_item), "http://www.wikidata.org"))
  FILTER NOT EXISTS { ?concept owl:deprecated true . }
}
LIMIT 10

concept	term	wd_item
http://data.ub.uio.no/realfagstermer/c006328	Strukturgeologi	http://www.wikidata.org/entity/Q334823
http://data.ub.uio.no/realfagstermer/c030694	Rogaland	http://www.wikidata.org/entity/Q50624
http://data.ub.uio.no/realfagstermer/c030446	Luxemburg	http://www.wikidata.org/entity/Q32
http://data.ub.uio.no/realfagstermer/c013836	Stjernerotasjon	http://www.wikidata.org/entity/Q6464
http://data.ub.uio.no/realfagstermer/c006404	Patologi	http://www.wikidata.org/entity/Q7208
http://data.ub.uio.no/realfagstermer/c004699	Utmarksbeiter	http://www.wikidata.org/entity/Q30121
http://data.ub.uio.no/realfagstermer/c005522	Tungmetaller	http://www.wikidata.org/entity/Q105789
http://data.ub.uio.no/realfagstermer/c030118	Bibliografier	http://www.wikidata.org/entity/Q134995
http://data.ub.uio.no/realfagstermer/c030681	USA	http://www.wikidata.org/entity/Q30
http://data.ub.uio.no/realfagstermer/c000237	Tidsmåling	http://www.wikidata.org/entity/Q11471

(Try the query)

This opens the possibility for federated queries that combine data from Realfagstermer with data from Wikidata. We can for instance list all concepts that are identified as species on Wikidata (wdt:P105 wd:Q7432) and have an ITIS TSN (wdt:P815) and a scientific name (wdt:P225):

PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>

SELECT ?rtTerm ?wdItem ?itisUrl ?sciName
FROM <http://data.ub.uio.no/realfagstermer>
WHERE {
  SERVICE <https://query.wikidata.org/sparql>
  {
    ?wdItem wdt:P31 wd:Q16521 ;  # is a taxon
            wdt:P105 wd:Q7432 ;  # rank: species
            wdt:P815 ?itisTsn ;  # has an ITIS TSN
            wdt:P225 ?sciName .  # has a scientific name
  }

  ?rtConcept skos:closeMatch ?wdItem ;
             skos:prefLabel ?rtTerm .

  FILTER(LANG(?rtTerm) = 'nb')
  BIND(IRI(CONCAT("http://www.itis.gov/servlet/SingleRpt/SingleRpt?search_topic=TSN&search_value=", ?itisTsn)) AS ?itisUrl)
}
LIMIT 100

rtTerm	wdItem	itisUrl	sciName
Løver	http://www.wikidata.org/entity/Q140	http://www.itis.gov/servlet/SingleRpt/SingleRpt?search_topic=TSN&search_value=183803	Panthera leo
Escherichia coli	http://www.wikidata.org/entity/Q25419	http://www.itis.gov/servlet/SingleRpt/SingleRpt?search_topic=TSN&search_value=285	Escherichia coli
Ekorn	http://www.wikidata.org/entity/Q4388	http://www.itis.gov/servlet/SingleRpt/SingleRpt?search_topic=TSN&search_value=632439	Sciurus vulgaris
Ål	http://www.wikidata.org/entity/Q26387	http://www.itis.gov/servlet/SingleRpt/SingleRpt?search_topic=TSN&search_value=161128	Anguilla anguilla
Blodigle	http://www.wikidata.org/entity/Q30041	http://www.itis.gov/servlet/SingleRpt/SingleRpt?search_topic=TSN&search_value=69417	Hirudo medicinalis
Jerv	http://www.wikidata.org/entity/Q14334	http://www.itis.gov/servlet/SingleRpt/SingleRpt?search_topic=TSN&search_value=180551	Gulo gulo
Tårnugler	http://www.wikidata.org/entity/Q25317	http://www.itis.gov/servlet/SingleRpt/SingleRpt?search_topic=TSN&search_value=177851	Tyto alba
Fasaner	http://www.wikidata.org/entity/Q25432	http://www.itis.gov/servlet/SingleRpt/SingleRpt?search_topic=TSN&search_value=175905	Phasianus colchicus
Kongeørn	http://www.wikidata.org/entity/Q41181	http://www.itis.gov/servlet/SingleRpt/SingleRpt?search_topic=TSN&search_value=175407	Aquila chrysaetos
Sitronmelisse	http://www.wikidata.org/entity/Q148396	http://www.itis.gov/servlet/SingleRpt/SingleRpt?search_topic=TSN&search_value=32565	Melissa officinalis

(Try the query)

In practice, the software behind the SPARQL endpoint at https://data.ub.uio.no (Fuseki) is not good at optimizing federated queries. In the example above, it would be best to swap the remote and local parts so that the local part is evaluated first. In that case, however, Fuseki makes one request to Wikidata per item rather than querying all items in a single request (using VALUES), which makes the query terribly slow. At least with our current stack, it is therefore better to use a script that first queries our local endpoint, then the Wikidata endpoint, and finally combines the data.

Although slow, federated queries are good for demonstrating the possibilities. Here's a query that compares the preferred terms in our vocabulary with the labels of the mapped Wikidata items:

PREFIX schema: <http://schema.org/>
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>

SELECT ?realTerm ?wikidataItem ?wikidataTerm
FROM <http://data.ub.uio.no/realfagstermer>
WHERE {
  ?realConcept skos:mappingRelation ?wikidataItem ;
             skos:prefLabel ?realTerm .

  FILTER(LANG(?realTerm) = "nb")
  FILTER(STRSTARTS(STR(?wikidataItem), "http://www.wikidata.org")) 

  SERVICE <https://query.wikidata.org/sparql>
  {
    ?wikidataItem schema:version ?o .
    OPTIONAL {
      ?wikidataItem rdfs:label ?wikidataTerm .
      FILTER(LANG(?wikidataTerm) = "nb")
    }
  }
}
LIMIT 10

realTerm	wikidataItem	wikidataTerm
Luxemburg	http://www.wikidata.org/entity/Q32	Luxembourg
Stjernerotasjon	http://www.wikidata.org/entity/Q6464	stjernerotasjon
Tungmetaller	http://www.wikidata.org/entity/Q105789	tungmetall
Aktiv læring	http://www.wikidata.org/entity/Q1542052
Elektronspektra	http://www.wikidata.org/entity/Q905243
Elektronspektroskopi	http://www.wikidata.org/entity/Q905243
Dislokasjoner	http://www.wikidata.org/entity/Q737571	dislokasjon
Darmstadtium	http://www.wikidata.org/entity/Q1266	darmstadtium
Metadata	http://www.wikidata.org/entity/Q180160	metadata
Balkan	http://www.wikidata.org/entity/Q23522	Balkan

(Try it)

This makes it easy to identify cases where a Norwegian term is missing on Wikidata, or where the two terms differ, in which case we might want to change our own term, change the Wikidata term, or keep them different. Automatic comparison is still complicated by the fact that we use the plural form for our terms, while Wikidata uses the singular.
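
The script-based alternative mentioned earlier (query the local endpoint first, then fetch all Wikidata labels in a single request using VALUES, then combine) could look roughly like the sketch below. It is not the script actually used in this project: the function names and the User-Agent string are hypothetical, and the URL of the local SPARQL endpoint is deliberately left as a parameter because it is not given here.

```python
import json
import urllib.parse
import urllib.request

WIKIDATA_ENDPOINT = "https://query.wikidata.org/sparql"


def run_sparql(endpoint, query):
    """Send a SPARQL query over HTTP GET and return the JSON result bindings."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    request = urllib.request.Request(url, headers={
        "Accept": "application/sparql-results+json",
        "User-Agent": "realfagstermer-example/0.1",  # hypothetical agent string
    })
    with urllib.request.urlopen(request) as response:
        return json.load(response)["results"]["bindings"]


def wikidata_label_query(item_uris, lang="nb"):
    """Build one query that fetches labels for all items at once via VALUES."""
    values = " ".join("<%s>" % uri for uri in item_uris)
    return (
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
        "SELECT ?item ?label WHERE {\n"
        "  VALUES ?item { %s }\n"
        "  ?item rdfs:label ?label .\n"
        "  FILTER(LANG(?label) = \"%s\")\n"
        "}" % (values, lang)
    )


def combine(local_rows, wikidata_rows):
    """Pair each local term with the Wikidata label (None if missing)."""
    labels = {r["item"]["value"]: r["label"]["value"] for r in wikidata_rows}
    return [
        (r["realTerm"]["value"], r["wikidataItem"]["value"],
         labels.get(r["wikidataItem"]["value"]))
        for r in local_rows
    ]
```

Here `local_rows` would come from running the local part of the query above (without the SERVICE clause) through `run_sparql` against the local endpoint, and `wikidata_rows` from `run_sparql(WIKIDATA_ENDPOINT, wikidata_label_query(uris))`. For large result sets the item URIs would likely need to be sent in batches to keep each request reasonably short.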
