Merge pull request #122 from maxlath/split-simplify-claims-and-simpli…
…fy-snaks

Split simplifyClaim and simplifySnak
maxlath committed Mar 3, 2024
2 parents ad69e34 + 740a737 commit 2aeed2c
Showing 13 changed files with 8,456 additions and 4,145 deletions.
178 changes: 104 additions & 74 deletions docs/simplify_claims.md
@@ -1,24 +1,24 @@
# Simplify claims
*associated Wikibase doc: [DataModel](https://www.mediawiki.org/wiki/Wikibase/DataModel)*

`simplify.claims` functions are part of the larger [`simplify.entity` functions family](simplify_entities_data.md)
`simplifyClaims` functions are part of the larger [`simplifyEntity` functions family](simplify_entities_data.md)

## Summary

<!-- START doctoc generated TOC please keep comment here to allow auto update -->
<!-- DON'T EDIT THIS SECTION, INSTEAD RE-RUN doctoc TO UPDATE -->

- [Intro](#intro)
- [simplify.claims](#simplifyclaims)
- [simplify.propertyClaims](#simplifypropertyclaims)
- [simplify.claim](#simplifyclaim)
- [simplify.qualifiers](#simplifyqualifiers)
- [simplify.propertyQualifiers](#simplifypropertyqualifiers)
- [simplify.qualifier](#simplifyqualifier)
- [simplify.references](#simplifyreferences)
- [simplify.snaks](#simplifysnaks)
- [simplify.propertySnaks](#simplifypropertysnaks)
- [simplify.snak](#simplifysnak)
- [simplifyClaims](#simplifyclaims)
- [simplifyPropertyClaims](#simplifypropertyclaims)
- [simplifyClaim](#simplifyclaim)
- [simplifyQualifiers](#simplifyqualifiers)
- [simplifyPropertyQualifiers](#simplifypropertyqualifiers)
- [simplifyQualifier](#simplifyqualifier)
- [simplifyReferences](#simplifyreferences)
- [simplifySnaks](#simplifysnaks)
- [simplifyPropertySnaks](#simplifypropertysnaks)
- [simplifySnak](#simplifysnak)
- [Options](#options)
- [Add prefixes to entities and properties ids](#add-prefixes-to-entities-and-properties-ids)
- [Keep rich values](#keep-rich-values)
@@ -105,12 +105,13 @@ we could have
"P279": [ "Q340169", "Q2342494", "Q386724" ]
```

That's what `simplify.claims`, `simplify.propertyClaims`, `simplify.claim` do, each at their own level.
That's what `simplifyClaims`, `simplifyPropertyClaims`, `simplifyClaim` do, each at their own level.

## simplify.claims
you just need to pass your entity' claims object to simplify.claims as such:
## simplifyClaims
you just need to pass your entity's claims object to `simplifyClaims` as such:
```js
const simplifiedClaims = wbk.simplify.claims(entity.claims)
import { simplifyClaims } from 'wikibase-sdk'
const simplifiedClaims = simplifyClaims(entity.claims)
```

in your workflow, that could give something like:
@@ -119,45 +120,74 @@ in your workflow, that could give something like:
const url = wbk.getEntities('Q535')
const { entities } = await fetch(url).then(res => res.json())
const entity = entities.Q535
const simplifiedClaims = wbk.simplify.claims(entity.claims)
const simplifiedClaims = simplifyClaims(entity.claims)
```

To keep things simple, "weird" values are removed (for instance, statements of datatype `wikibase-item` whose snaktype is `somevalue` instead of carrying the expected Q id)

Note that you don't need to instantiate a `wbk` object to access those `simplify` functions, as they can directly imported: `import { simplify } from 'wikibase-sdk'`
Note that those functions are also available on the `wbk.simplify` object: `wbk.simplify.claims`, etc.

## simplify.propertyClaims
Same as simplify.claims but expects an array of claims, typically the array of claims of a specific property:
## simplifyPropertyClaims
Simplify an array of claims, typically the array of claims of a specific property:
```js
const simplifiedP31Claims = wbk.simplify.propertyClaims(entity.claims.P31)
import { simplifyPropertyClaims } from 'wikibase-sdk'
const simplifiedP31Claims = simplifyPropertyClaims(entity.claims.P31, options)
```

## simplify.claim
Same as simplify.claims but expects a unique claim
## simplifyClaim
Simplify a single claim
```js
const simplifiedP31Claim = wbk.simplify.claim(entity.claims.P31[0])
import { simplifyClaim } from 'wikibase-sdk'
const simplifiedP31Claim = simplifyClaim(entity.claims.P31[0], options)
```

## simplify.qualifiers
Same interface as [simplify.claims](#simplifyclaims) but taking a qualifiers object

## simplify.propertyQualifiers
Same interface as [simplify.propertyClaims](#simplifypropertyclaims) but taking an array of qualifiers
## simplifyQualifiers
Simplify a qualifiers object
```js
import { simplifyQualifiers } from 'wikibase-sdk'
const claim = entity.claims.P31[0]
const simplifiedQualifiers = simplifyQualifiers(claim.qualifiers, options)
```

## simplify.qualifier
Same interface as [simplify.claim](#simplifyclaim) but taking a qualifier object
## simplifyPropertyQualifiers
Simplify an array of qualifiers
```js
import { simplifyPropertyQualifiers } from 'wikibase-sdk'
const claim = entity.claims.P31[0]
const simplifiedP580Qualifiers = simplifyPropertyQualifiers(claim.qualifiers.P580, options)
```

## simplify.references
Same interface as [simplify.claims](#simplifyclaims) but taking an array of reference records
## simplifyQualifier
Simplify a qualifier
```js
import { simplifyQualifier } from 'wikibase-sdk'
const claim = entity.claims.P31[0]
const simplifiedQualifier = simplifyQualifier(claim.qualifiers.P580[0], options)
```

## simplify.snaks
Same interface as [simplify.claims](#simplifyclaims), but with a name that hints that it could also accept qualifiers or reference records.
## simplifyReferences
Simplify an array of references
```js
import { simplifyReferences } from 'wikibase-sdk'
const claim = entity.claims.P31[0]
const simplifiedReferences = simplifyReferences(claim.references, options)
```

## simplify.propertySnaks
Same interface as [simplify.propertyClaims](#simplifypropertyclaims), but with a name that hints that it could also accept an array of qualifiers snaks or an array of reference snaks.
## simplifyReference
Simplify a reference
```js
import { simplifyReference } from 'wikibase-sdk'
const claim = entity.claims.P31[0]
const simplifiedReference = simplifyReference(claim.references[0], options)
```

## simplify.snak
Same interface as [simplify.claim](#simplifyclaim), but with a name that hints that it could also accept a qualifier or reference record [snak](https://www.wikidata.org/wiki/Wikidata:Glossary/en#Snak).
## simplifySnak
Simplify a [snak](https://www.wikidata.org/wiki/Wikidata:Glossary/en#Snak), be it a claim `mainsnak`, a qualifier snak, or a reference snak
```js
import { simplifySnak } from 'wikibase-sdk'
const claim = entity.claims.P31[0]
const simplifiedSnak = simplifySnak(claim.mainsnak, options)
```

## Options

@@ -167,9 +197,9 @@ Same interface as [simplify.claim](#simplifyclaim), but with a name that hints t
It may be useful to prefix entity and property ids in case you work with data from several domains/sources. This can be done by setting an entity prefix and/or a property prefix in the options:
```js
const options = { entityPrefix: 'wd', propertyPrefix: 'wdt' }
wbk.simplify.claims(entity.claims, options)
wbk.simplify.propertyClaims(entity.claims.P31, options)
wbk.simplify.claim(entity.claims.P31[0], options)
simplifyClaims(entity.claims, options)
simplifyPropertyClaims(entity.claims.P31, options)
simplifyClaim(entity.claims.P31[0], options)
```
Results would then look something like
```json
@@ -181,7 +211,7 @@ Results would then look something like
### Keep rich values
> `keepRichValues`
By default, `simplify.claims` returns only the simpliest values, so just a string for `monolingualtext` values and just a number for `quantity` values.
By default, `simplifyClaims` returns only the simplest values, so just a string for `monolingualtext` values and just a number for `quantity` values.
By setting `keepRichValues=true`,
- `monolingualtext` values will be objects on the pattern `{ text, language }`
- `quantity` values will be objects on the pattern `{ amount, unit, upperBound, lowerBound }`
@@ -191,9 +221,9 @@ By setting `keepRichValues=true`,
You can keep the value's types by passing `keepTypes: true` in the options:
```js
wbk.simplify.claims(entity.claims, { keepTypes: true })
wbk.simplify.propertyClaims(entity.claims.P50, { keepTypes: true })
wbk.simplify.claim(entity.claims.P50[0], { keepTypes: true })
simplifyClaims(entity.claims, { keepTypes: true })
simplifyPropertyClaims(entity.claims.P50, { keepTypes: true })
simplifyClaim(entity.claims.P50[0], { keepTypes: true })
```
Results would then look something like
```json
@@ -233,9 +263,9 @@ If one is missing from this list (probably because it's new) you are welcome to
You can keep qualifiers by passing `keepQualifiers: true` in the options:
```js
wbk.simplify.claims(entity.claims, { keepQualifiers: true })
wbk.simplify.propertyClaims(entity.claims.P50, { keepQualifiers: true })
wbk.simplify.claim(entity.claims.P50[0], { keepQualifiers: true })
simplifyClaims(entity.claims, { keepQualifiers: true })
simplifyPropertyClaims(entity.claims.P50, { keepQualifiers: true })
simplifyClaim(entity.claims.P50[0], { keepQualifiers: true })
```
Results would then look something like
```json
@@ -266,9 +296,9 @@ Results would then look something like
You can keep references by passing `keepReferences: true` in the options:
```js
wbk.simplify.claims(entity.claims, { keepReferences: true })
wbk.simplify.propertyClaims(entity.claims.P50, { keepReferences: true })
wbk.simplify.claim(entity.claims.P50[0], { keepReferences: true })
simplifyClaims(entity.claims, { keepReferences: true })
simplifyPropertyClaims(entity.claims.P50, { keepReferences: true })
simplifyClaim(entity.claims.P50[0], { keepReferences: true })
```
Results would then look something like
```json
@@ -297,9 +327,9 @@ Results would then look something like
You can keep claim ids (a.k.a. `guid`), references and qualifiers hashes by passing `keepIds: true` in the options:

```js
wbk.simplify.claims(entity.claims, { keepIds: true })
wbk.simplify.propertyClaims(entity.claims.P50, { keepIds: true })
wbk.simplify.claim(entity.claims.P50[0], { keepIds: true })
simplifyClaims(entity.claims, { keepIds: true })
simplifyPropertyClaims(entity.claims.P50, { keepIds: true })
simplifyClaim(entity.claims.P50[0], { keepIds: true })
```
Results would then look something like
```json
@@ -318,9 +348,9 @@ Results would then look something like
You can keep references and qualifiers hashes by passing `keepHashes: true` in the options:

```js
wbk.simplify.claims(entity.claims, { keepHashes: true })
wbk.simplify.propertyClaims(entity.claims.P50, { keepHashes: true })
wbk.simplify.claim(entity.claims.P50[0], { keepHashes: true })
simplifyClaims(entity.claims, { keepHashes: true })
simplifyPropertyClaims(entity.claims.P50, { keepHashes: true })
simplifyClaim(entity.claims.P50[0], { keepHashes: true })
```

This option has no effect if neither `keepQualifiers` nor `keepReferences` is `true`.
@@ -351,26 +381,26 @@ Results would then look something like
By default, [non-truthy statements](https://www.mediawiki.org/wiki/Wikibase/Indexing/RDF_Dump_Format#Truthy_statements) are filtered out (keeping only claims of rank `preferred` if any, otherwise only claims of rank `normal`). This can be disabled with this option.
```js
wbk.simplify.claims(entity.claims, { keepNonTruthy: true })
wbk.simplify.propertyClaims(entity.claims.P1082, { keepNonTruthy: true })
simplifyClaims(entity.claims, { keepNonTruthy: true })
simplifyPropertyClaims(entity.claims.P1082, { keepNonTruthy: true })
```
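
The truthy filtering rule described above can be sketched in a few lines. This is only an illustration of the rule as stated (keep `preferred` claims if any, otherwise `normal` ones), not the library's actual implementation: `keepTruthyClaims` and the sample claims are hypothetical names and data.

```js
// Illustrative sketch only: not the wikibase-sdk implementation
function keepTruthyClaims (propertyClaims) {
  // Claims of rank 'preferred' win whenever there is at least one
  const preferred = propertyClaims.filter(claim => claim.rank === 'preferred')
  if (preferred.length > 0) return preferred
  // Otherwise fall back on claims of rank 'normal'; 'deprecated' claims are never kept
  return propertyClaims.filter(claim => claim.rank === 'normal')
}

// Hypothetical claims, reduced to the fields this sketch needs
const populationClaims = [
  { id: 'a', rank: 'preferred' },
  { id: 'b', rank: 'normal' },
  { id: 'c', rank: 'deprecated' },
]

console.log(keepTruthyClaims(populationClaims).map(claim => claim.id))
// => [ 'a' ]
```

Passing `keepNonTruthy: true` would correspond to skipping this filtering step altogether.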

#### Keep ranks
> `keepRanks`
```js
wbk.simplify.claims(entity.claims, { keepRanks: true })
wbk.simplify.propertyClaims(entity.claims.P1082, { keepRanks: true })
wbk.simplify.claim(entity.claims.P1082[0], { keepRanks: true })
simplifyClaims(entity.claims, { keepRanks: true })
simplifyPropertyClaims(entity.claims.P1082, { keepRanks: true })
simplifyClaim(entity.claims.P1082[0], { keepRanks: true })
```
This is mostly useful in combination with `keepNonTruthy`. Example: a city might have several population claims, with only the most recent having a `preferred` rank.

```js
// By default, the simplification only keeps the claim of rank 'preferred'
wbk.simplify.propertyClaims(city.claims.P1082, { keepRanks: true })
simplifyPropertyClaims(city.claims.P1082, { keepRanks: true })
// => [ { value: 100000, rank: 'preferred' } ]

// But the other claims can also be returned thanks to 'keepNonTruthy'
wbk.simplify.propertyClaims(city.claims.P1082, { keepRanks: true, keepNonTruthy: true })
simplifyPropertyClaims(city.claims.P1082, { keepRanks: true, keepNonTruthy: true })
// => [
// { value: 100000, rank: 'preferred' },
// { value: 90000, rank: 'normal' },
@@ -383,47 +413,47 @@ wbk.simplify.propertyClaims(city.claims.P1082, { keepRanks: true, keepNonTruthy:
#### Customize novalue value
> `novalueValue`
```js
wbk.simplify.claims(claimWithNoValue, { novalueValue: '-' })
simplifyClaims(claimWithNoValue, { novalueValue: '-' })
// => '-'
```

#### Customize somevalue value
> `somevalueValue`
```js
wbk.simplify.claims(claimWithSomeValue, { somevalueValue: '?' })
simplifyClaims(claimWithSomeValue, { somevalueValue: '?' })
// => '?'
```

#### Keep snaktypes
> `keepSnaktypes`
```js
wbk.simplify.claims(claimWithSomeValue, { keepSnaktypes: true })
simplifyClaims(claimWithSomeValue, { keepSnaktypes: true })
// => { value: undefined, snaktype: 'somevalue' }
wbk.simplify.claims(claimWithSomeValue, { keepSnaktypes: true, somevalueValue: '?' })
simplifyClaims(claimWithSomeValue, { keepSnaktypes: true, somevalueValue: '?' })
// => { value: '?', snaktype: 'somevalue' }
```

### Keep all
> `keepAll`
Activates all the `keep` options detailed above:
```js
wbk.simplify.claims(claims, { keepAll: true })
simplifyClaims(claims, { keepAll: true })
// Is equivalent to
wbk.simplify.claims(claims, { keepQualifiers: true, keepReferences: true, keepIds: true, keepHashes: true, keepTypes: true, keepSnaktypes: true, keepRanks: true })
simplifyClaims(claims, { keepQualifiers: true, keepReferences: true, keepIds: true, keepHashes: true, keepTypes: true, keepSnaktypes: true, keepRanks: true })
```
Those options can then be disabled one by one
```js
wbk.simplify.claims(claims, { keepAll: true, keepTypes: false })
simplifyClaims(claims, { keepAll: true, keepTypes: false })
```
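
One way to picture how `keepAll` combines with individual flags is an options-expansion step where explicitly passed flags take precedence. The helper below is a hypothetical sketch under that assumption (`expandKeepAll` and `keepFlags` are made-up names); the library may resolve these options differently internally.

```js
// Hypothetical helper, for illustration only
const keepFlags = [
  'keepQualifiers', 'keepReferences', 'keepIds', 'keepHashes',
  'keepTypes', 'keepSnaktypes', 'keepRanks',
]

function expandKeepAll (options = {}) {
  const { keepAll, ...rest } = options
  if (!keepAll) return rest
  const expanded = {}
  for (const flag of keepFlags) expanded[flag] = true
  // Explicitly passed flags override the defaults set by keepAll
  return { ...expanded, ...rest }
}

console.log(expandKeepAll({ keepAll: true, keepTypes: false }).keepTypes)
// => false
```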

### Change time parser

By default, `simplify.claims` functions use [`wikidataTimeToISOString`](general_helpers.md#wikidataTimeToISOString) to parse [Wikidata time values](https://www.mediawiki.org/wiki/Wikibase/DataModel#Dates_and_times).
By default, `simplifyClaims` functions use [`wikidataTimeToISOString`](general_helpers.md#wikidataTimeToISOString) to parse [Wikidata time values](https://www.mediawiki.org/wiki/Wikibase/DataModel#Dates_and_times).

You can nevertheless request to use a different converter by setting the option `timeConverter`:

```js
wbk.simplify.claims(claims, { timeConverter: 'iso' })
simplifyClaims(claims, { timeConverter: 'iso' })
```

Possible modes:
@@ -446,5 +476,5 @@ If none of those format fits your needs, you can pass a custom time converter fu
```
```js
const timeConverterFn = ({ time, precision }) => `foo/${time}/${precision}/bar`
wbk.simplify.claims(claims, { timeConverter })
simplifyClaims(claims, { timeConverter: timeConverterFn })
```
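
As a slightly more realistic sketch of a custom converter, the function below truncates the time string according to its precision. It assumes the converter receives an object with `time` (a Wikibase time string such as `+1989-06-04T00:00:00Z`) and `precision` (9 for year, 10 for month, 11 for day), as in the example above; `yearMonthDayConverter` is a hypothetical name.

```js
// Hypothetical custom converter: returns a year, a year-month, or a full day string
// depending on the Wikibase precision (9 or coarser: year, 10: month, 11 and finer: day)
const yearMonthDayConverter = ({ time, precision }) => {
  const sign = time.startsWith('-') ? '-' : ''
  // Drop the leading sign, keep the date part, then split it into its components
  const [ year, month, day ] = time.replace(/^[+-]/, '').split('T')[0].split('-')
  if (precision <= 9) return `${sign}${year}`
  if (precision === 10) return `${sign}${year}-${month}`
  return `${sign}${year}-${month}-${day}`
}

console.log(yearMonthDayConverter({ time: '+1989-06-04T00:00:00Z', precision: 9 }))
// => '1989'
```

Such a function would then be passed as the `timeConverter` option in place of a mode name.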
2 changes: 1 addition & 1 deletion scripts/compare_datatypes.ts
@@ -1,7 +1,7 @@
#!/usr/bin/env ts-node
import { kebabCase } from 'lodash-es'
import { red, green } from 'tiny-chalk'
import { parsers } from '../src/helpers/parse_claim.js'
import { parsers } from '../src/helpers/parse_snak.js'
import { readJsonFile } from '../tests/lib/utils.js'

const supportedTypes = Object.keys(parsers)
27 changes: 12 additions & 15 deletions src/helpers/parse_claim.ts → src/helpers/parse_snak.ts
@@ -1,5 +1,8 @@
import { wikibaseTimeToEpochTime, wikibaseTimeToISOString, wikibaseTimeToSimpleDay } from './time.js'
import type { TimeInputValue } from './time.js'
import type { DataType } from '../types/claim.js'
import type { SimplifySnakOptions } from '../types/simplify_claims.js'
import type { SnakValue } from '../types/snakvalue.js'

const simple = datavalue => datavalue.value

@@ -105,22 +108,16 @@ for (const [ datatype, parser ] of Object.entries(parsers)) {
normalizedParsers[normalizeDatatype(datatype)] = parser
}

export function parseClaim (datatype, datavalue, options, claimId) {
// Known case of missing datatype: form.claims, sense.claims, mediainfo.statements
export function parseSnak (datatype: DataType | undefined, datavalue: SnakValue, options: SimplifySnakOptions) {
// @ts-expect-error Known case of missing datatype: form.claims, sense.claims, mediainfo.statements
datatype = datatype || datavalue.type

try {
// Known case requiring normalization
// - legacy "muscial notation" datatype
// - mediainfo won't have datatype="globe-coordinate", but datavalue.type="globecoordinate"
const parser = normalizedParsers[normalizeDatatype(datatype)]
return parser(datavalue, options)
} catch (err) {
if (err.message === 'parsers[datatype] is not a function') {
err.message = `${datatype} claim parser isn't implemented
Claim id: ${claimId}
Please report to https://github.com/maxlath/wikibase-sdk/issues`
}
throw err
// Known case requiring normalization
// - legacy "musical notation" datatype
// - mediainfo won't have datatype="globe-coordinate", but datavalue.type="globecoordinate"
const parser = normalizedParsers[normalizeDatatype(datatype)]
if (!parser) {
throw new Error(`${normalizeDatatype(datatype)} claim parser isn't implemented. Please report to https://github.com/maxlath/wikibase-sdk/issues`)
}
return parser(datavalue, options)
}
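
The lookup performed by the new `parseSnak` can be sketched in plain JavaScript as follows. This is a reduced illustration, not the actual module: the `parsers` table holds only two stand-in entries, `parseSnakSketch` is a hypothetical name, and the body of `normalizeDatatype` is an assumption, since the hunk above does not show it.

```js
// Simplified stand-ins for the real `parsers` table: illustration only
const parsers = {
  string: datavalue => datavalue.value,
  'globe-coordinate': datavalue => [ datavalue.value.latitude, datavalue.value.longitude ],
}

// Assumption: normalization lowercases and strips anything but letters,
// so that 'globe-coordinate' and 'globecoordinate' share one key
const normalizeDatatype = datatype => datatype.toLowerCase().replace(/[^a-z]/g, '')

const normalizedParsers = {}
for (const [ datatype, parser ] of Object.entries(parsers)) {
  normalizedParsers[normalizeDatatype(datatype)] = parser
}

function parseSnakSketch (datatype, datavalue) {
  // Fall back on the datavalue type when the datatype is missing
  datatype = datatype || datavalue.type
  const parser = normalizedParsers[normalizeDatatype(datatype)]
  // Fail early with an explicit message instead of matching on a TypeError message
  if (!parser) throw new Error(`${normalizeDatatype(datatype)} claim parser isn't implemented`)
  return parser(datavalue)
}

console.log(parseSnakSketch(undefined, { type: 'globecoordinate', value: { latitude: 1, longitude: 2 } }))
// => [ 1, 2 ]
```

Checking for a missing parser before calling it avoids relying on the wording of an engine-specific error message, which is what the removed `try`/`catch` did.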