Proposal for representing Aggregate Statistical Data #2291

Closed
danbri opened this issue Jun 25, 2019 · 8 comments

@danbri
Contributor

danbri commented Jun 25, 2019

Proposal circulated to various lists by @rvguha,

Vocabulary

  • new type: StatisticalPopulation
  • new properties: populationType, numConstraints, constrainingProperties
  • new type: Observation
  • new properties: observedNode, measuredProperty, measuredValue, observationDate, marginOfError, measurementMethod

Related work

@RichardWallis
Contributor

Implemented in V3.9

@VladimirAlexiev

VladimirAlexiev commented Oct 30, 2020

constrainingProperties Bug

Compared to the proposal https://lists.w3.org/Archives/Public/public-schemaorg/2019Jun/0021.html, this task is not implemented well and should be reopened.

https://schema.org/StatisticalPopulation says:

The properties numConstraints and constrainingProperties are used to specify which of the populations properties are used to specify the population.

But:

  • there is constrainingProperty not constrainingProperties
  • constrainingProperty has range integer. Instead it should have range s:Property (and should be multivalued)
  • The examples from the proposal are not implemented
  • One example uses the prop median, which is not explained (other statistical concepts to cover are average, min, max, variance, stdDev, quartiles...)
  • Many examples use measuredProperty: count. Please give a real example of how to express count, because it's the most common measure.

One of the examples goes like this:

:pop a s:StatisticalPopulation;
  s:populationType s:Person;
  s:numConstraints 1;
  s:constrainingProperty s:homeLocation;
  s:homeLocation :East_Podunk_California.

:obs a s:Observation;
  s:observedNode :pop;
  s:measuredProperty :POPULATION; # HOW to express this?
    # This is SDMX population, or some GS1 prop, or maybe I can omit it since it's just a count?
  s:measuredValue 10000;
  s:observationDate "2020-10-30"^^xsd:date.
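
The range complaint above can be made concrete with a small consistency check. This is only a minimal Python sketch, not part of schema.org or of the proposal: the `StatisticalPopulation` dataclass and the `check_constraints` helper are hypothetical names invented here. It assumes constrainingProperty is multivalued and property-valued, as the comment argues it should be.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class StatisticalPopulation:
    """Hypothetical in-memory stand-in for s:StatisticalPopulation."""
    population_type: str
    num_constraints: int
    # ASSUMPTION: multivalued, with properties (not integers) as values.
    constraining_properties: List[str] = field(default_factory=list)
    # Values of the constraining properties, e.g. homeLocation -> a place.
    values: Dict[str, str] = field(default_factory=dict)


def check_constraints(pop: StatisticalPopulation) -> List[str]:
    """Return a list of problems; an empty list means the node is consistent."""
    problems = []
    if pop.num_constraints != len(pop.constraining_properties):
        problems.append("numConstraints does not match constrainingProperty count")
    for prop in pop.constraining_properties:
        if prop not in pop.values:
            problems.append(f"no value given for constraining property {prop}")
    return problems


# The East Podunk population from the example above.
pop = StatisticalPopulation(
    population_type="s:Person",
    num_constraints=1,
    constraining_properties=["s:homeLocation"],
    values={"s:homeLocation": ":East_Podunk_California"},
)
print(check_constraints(pop))
```

With an integer range, as currently published, the second check could not be written at all, because there would be no property name to look up.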

Direct Observations

If I have a prop :POPULATION at my disposal, why can't I use it directly like this?

:obs a s:Observation;
  s:observedNode :East_Podunk_California;
  s:measuredProperty :POPULATION;
  s:measuredValue 10000;
  s:observationDate "2020-10-30"^^xsd:date.

This covers the most common case, where you observe some characteristic of an entity without subdividing it further. E.g., Wikidata has Place.population, Company.turnover, Company.revenue, etc.

How can you express "Google had a turnover of 100M USD in 2019" with StatisticalPopulation? (See the example below.)

Also, observedNode is a terrible name because it's technology-bound. (It's an IT term not an economics or statistics term).

PROPOSAL: rename observedNode to observedThing and change its range to Thing

observationDate vs reference period

Consider these observations about Google:

:obs1 a s:Observation;
  s:observedNode wd:Q95;
  s:measuredProperty wd:P2403; # total assets
  s:measuredValue 2000000000;
  s:observationDate "2019"^^xsd:gYear.

:obs2 a s:Observation;
  s:observedNode wd:Q95;
  s:measuredProperty wd:P2139; # total revenue
  s:measuredValue 1000000000;
  s:observationDate "2019"^^xsd:gYear. ####

  • obs1 is ok, it says the assets as of that point in time were that much.
  • But obs2 is not quite ok because sales (revenue) are not measured at a point, they occur over a period (year, quarter, etc). So we need something like this:

# yearly revenue 2019
:obs2 a s:Observation;
  s:observedNode wd:Q95;
  s:measuredProperty wd:P2139; # total revenue
  s:measuredValue 1000000000;
  s:observationPeriod "2019"^^xsd:gYear.

# quarterly revenue 2019Q1
:obs3 a s:Observation;
  s:observedNode wd:Q95;
  s:measuredProperty wd:P2139; # total revenue
  s:measuredValue 250000000;
  s:observationPeriodStart "2019-01"^^xsd:gYearMonth;
  s:observationPeriodEnd   "2019-03"^^xsd:gYearMonth.

Actually it's better to use precise dates and to always do it with two props:

# yearly revenue 2019
:obs2 a s:Observation;
  s:observedNode wd:Q95;
  s:measuredProperty wd:P2139; # total revenue
  s:measuredValue 1000000000;
  s:observationPeriodStart "2019-01-01"^^xsd:date;
  s:observationPeriodEnd   "2019-12-31"^^xsd:date.

# quarterly revenue 2019Q1
:obs3 a s:Observation;
  s:observedNode wd:Q95;
  s:measuredProperty wd:P2139; # total revenue
  s:measuredValue 250000000;
  s:observationPeriodStart "2019-01-01"^^xsd:date;
  s:observationPeriodEnd   "2019-03-31"^^xsd:date.
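
As a rough illustration of the "always two precise dates" approach, the sketch below derives the start and end dates for a calendar year or quarter. It is a minimal Python example under stated assumptions: `period_bounds` and `as_turtle` are hypothetical helpers, and observationPeriodStart / observationPeriodEnd are the property names suggested in this comment, not existing schema.org terms.

```python
import calendar
from datetime import date
from typing import Optional, Tuple


def period_bounds(year: int, quarter: Optional[int] = None) -> Tuple[date, date]:
    """Return (first day, last day) of a calendar year or of one of its quarters."""
    if quarter is None:
        return date(year, 1, 1), date(year, 12, 31)
    if quarter not in (1, 2, 3, 4):
        raise ValueError("quarter must be 1, 2, 3 or 4")
    first_month = 3 * (quarter - 1) + 1
    last_month = first_month + 2
    # monthrange returns (weekday of first day, number of days in month).
    last_day = calendar.monthrange(year, last_month)[1]
    return date(year, first_month, 1), date(year, last_month, last_day)


def as_turtle(start: date, end: date) -> str:
    """Render the two bounds as the (proposed, hypothetical) period properties."""
    return (
        f'  s:observationPeriodStart "{start.isoformat()}"^^xsd:date;\n'
        f'  s:observationPeriodEnd   "{end.isoformat()}"^^xsd:date.'
    )


print(as_turtle(*period_bounds(2019, quarter=1)))
```

Normalizing every period to two xsd:date values means a consumer never has to interpret gYear or gYearMonth literals differently depending on the measure.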

Superclass of Observation; Units

Observation is not related to PropertyValue or QuantitativeValue, although they have a lot in common. (I've checked PropertyValue; similar considerations probably apply to QuantitativeValue.)

  • PropertyValue has measurementTechnique whereas Observation has marginOfError, but both props are applicable to both. E.g., for population the technique could be Census, Estimation, etc.
  • PropertyValue has unitText and unitCode, and Observation needs them. E.g., the examples about revenue and assets above are incomplete, since they don't state the currency.

PROPOSAL:

  • Make https://schema.org/Observation a subclass of PropertyValue and/or QuantitativeValue
  • Remove measuredProperty in favor of propertyID
  • Move marginOfError up to the superclass
  • See if you need to refactor any other props of Observation
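
To show what the units point would mean in practice, here is a minimal Python sketch that builds a JSON-LD-style dictionary for the yearly revenue observation with a currency attached. It is an assumption-laden illustration: `observation_with_unit` is a hypothetical helper, and putting unitText on an Observation reflects the proposal above rather than anything this thread confirms about the published vocabulary.

```python
import json
from typing import Any, Dict, Optional


def observation_with_unit(
    node: str,
    prop: str,
    value: float,
    unit_text: str,
    observation_date: str,
    margin_of_error: Optional[float] = None,
) -> Dict[str, Any]:
    """Build a JSON-LD-like dict for an Observation that states its unit."""
    obs: Dict[str, Any] = {
        "@type": "Observation",
        "observedNode": node,
        "measuredProperty": prop,
        "measuredValue": value,
        # ASSUMPTION: unitText is borrowed from PropertyValue, per the proposal.
        "unitText": unit_text,
        "observationDate": observation_date,
    }
    if margin_of_error is not None:
        obs["marginOfError"] = margin_of_error
    return obs


# The revenue example from above, now with its currency stated.
obs = observation_with_unit("wd:Q95", "wd:P2139", 1000000000, "USD", "2019")
print(json.dumps(obs, indent=2))
```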

@VladimirAlexiev

VladimirAlexiev commented Dec 15, 2020

Can't Record Multiple Measures of Same Point

How would you represent this situation: an air balloon is rising, and every 10s it records altitude, pressure and temperature.
A new s:StatisticalPopulation for each instant, and 3 separate s:Observations?
That seems inferior to the W3C RDF Data Cube option of using multiple Measures in one Observation.

Compared to the W3C RDF Data Cube, this is ill-conceived and badly executed.
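
The difference between the two shapes can be sketched in a few lines of Python. This is only an illustration: the readings are made-up values, and `group_by_instant` is a hypothetical helper that turns one-record-per-measure data into one record per instant carrying several measures.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical balloon readings: (seconds since launch, measure, value).
# One tuple per measure mirrors the "3 separate Observations" shape.
readings: List[Tuple[int, str, float]] = [
    (0, "altitude", 0.0),
    (0, "pressure", 1013.0),
    (0, "temperature", 15.0),
    (10, "altitude", 50.0),
    (10, "pressure", 1007.0),
    (10, "temperature", 14.5),
]


def group_by_instant(
    rows: List[Tuple[int, str, float]],
) -> Dict[int, Dict[str, float]]:
    """Collapse per-measure rows into one multi-measure record per instant."""
    grouped: Dict[int, Dict[str, float]] = defaultdict(dict)
    for instant, measure, value in rows:
        grouped[instant][measure] = value
    return dict(grouped)


multi_measure = group_by_instant(readings)
print(len(readings), "single-measure records ->", len(multi_measure), "multi-measure records")
```

In the multi-measure shape the instant is stated once per record, whereas the single-measure shape repeats it (and the observed node) for every measure.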

@dr-shorthair

@VladimirAlexiev suggest you create a new issue? (linked back to this one, which is already closed).

@VladimirAlexiev

@dr-shorthair You think it's worth it? Don't want to waste my time:
Google's faling the schema.org community #2573

@smrgeoinfo

The term 'observation' has a much broader meaning in the physical science community (see https://www.w3.org/TR/vocab-ssn/#SOSAObservation), so limiting the rangeIncludes of the schema:observedNode property to schema:StatisticalPopulation is unnecessarily constraining. This is based on the correlation of schema:observedNode with sosa:hasFeatureOfInterest.

Alignment with the SOSA model would be really useful for those of us working on Science on Schema.org. To that end, the schema:Observation model would need to include properties like schema:measurementTechnique (analogous to sosa:usedProcedure), phenomenonTime, resultTime, and madeBySensor to specify the instrumentation used.
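
The correspondences named above can be written down as a simple lookup. The following Python sketch is hedged accordingly: it contains only the two pairs mentioned in this comment, `to_sosa` is a hypothetical helper, and it does not reproduce any published mapping file.

```python
from typing import Any, Dict

# Only the correspondences named in this comment; anything else is left as-is.
SCHEMA_TO_SOSA: Dict[str, str] = {
    "schema:observedNode": "sosa:hasFeatureOfInterest",
    "schema:measurementTechnique": "sosa:usedProcedure",
}


def to_sosa(observation: Dict[str, Any]) -> Dict[str, Any]:
    """Rename keys that have a known SOSA counterpart; keep the rest unchanged."""
    return {SCHEMA_TO_SOSA.get(key, key): value for key, value in observation.items()}


# Hypothetical observation record, keyed by prefixed property names.
example = {
    "schema:observedNode": ":East_Podunk_California",
    "schema:measurementTechnique": "Census",
    "schema:measuredValue": 10000,
}
print(to_sosa(example))
```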

One niggling question: what use cases distinguish schema:Observation from schema:PropertyValue, i.e. what is the domain of schema:Observation if not schema:Dataset? (SpecialAnnouncement???)

see also ESIPFed/science-on-schema.org#143

@dr-shorthair

I did a mapping SOSA <--> schema.org here: https://github.com/w3c/sdw/blob/gh-pages/ssn/rdf/sosa-sdo-mapping.ttl
