Design considerations

Brendan Quinn edited this page Nov 23, 2021 · 4 revisions

IPTC Sports Ontology - design considerations

While working on the RDF model for sports, we have been considering whether to design the model as a "generic model" with few defined properties and liberal use of controlled vocabularies and literals to handle the needs of different sports, versus a "domain-specific model" with defined properties for each aspect of each sport.
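
To make the two options concrete, here is a minimal Turtle sketch of the same fact (a player scoring two goals in a match) in each style. The `sport:` namespace IRI, the resource names, and the properties `sport:goalsScored`, `sport:statisticProperty` and `sport:value` are placeholders chosen for illustration, not agreed terms.

```turtle
@base <http://example.org/data/> .
@prefix sport: <http://example.org/sport#> .   # placeholder namespace, an assumption
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

# Domain-specific: one defined, typed property per stat.
<performance-1> a sport:AthletePerformance ;
    sport:performedBy <player-1> ;
    sport:goalsScored "2"^^xsd:integer .

# Generic: a handful of properties; the stat name is a term
# from a controlled vocabulary and the value is a literal.
<stat-1> a sport:Statistic ;
    sport:performedBy <player-1> ;
    sport:statisticProperty <vocab/goalsScored> ;
    sport:value "2" .
```

In the first form the ontology carries the meaning and datatype of each stat; in the second, a consumer has to look up the vocabulary term to know what the value means.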

Conclusion of brainstorm:

If we think we can manage the authority of the models without getting drawn into an ontology world of pain, then the domain-specific option has more weighted benefits, assuming users want to do more than just render stats on a page.

This also depends on us being able to abstract properties up to high-level groupings: for example sport, team sport, ball sport, then football. I think we can. The other caveat is that we need to find a nice way to handle units, because this aspect could be really ugly in the domain-specific version.
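
One way the grouping could be expressed is with plain RDFS subclass and subproperty statements. This is only a sketch: the class and property names below (`sport:TeamSport`, `sport:score`, `sport:goalsScored` and so on) and the namespace IRI are hypothetical.

```turtle
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix sport: <http://example.org/sport#> .   # placeholder namespace, an assumption

# Groupings, most general first.
sport:TeamSport rdfs:subClassOf sport:Sport .
sport:BallSport rdfs:subClassOf sport:Sport .
sport:Football rdfs:subClassOf sport:TeamSport , sport:BallSport .

# A football-specific property rolls up to a general one.
sport:score a rdf:Property .
sport:goalsScored a rdf:Property ;
    rdfs:subPropertyOf sport:score .
```

Under RDFS entailment, a consumer that only understands `sport:score` would still see values published as `sport:goalsScored`.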

Notes on pros and cons follow. Note that the considerations are not equally weighted.

Generic

Pros

  • Changes are cheap
  • If I don’t care about the semantics, I can treat stats as data points. It is easy to anticipate new stats, and I don’t have to get to grips with a new model for each sport. How much use is purely rendering stats onto a page, as opposed to querying into the semantics?
    • There are the use cases implied by Jim's diagram which suggest the need for some semantic correlation between systems. Other requirements (Reuters) are lining up statistical or action descriptions to photos or moments in videos. Audience profiling (CBC) linked to events and in-game actions. These are about more than throwing up stat reports but is it the kind of semantic querying you refer to?
    • The above requirements are also very B2B. On the consumer schema.org side what are the interesting semantic applications?
  • Contextual relationships are easy to apply
    • Could this be incorporated in the other model?
    • The scope of a stat, i.e. goals scored in a season as opposed to a game, or in all competitions as opposed to just the World Cup, or wickets against a specific batsman.
  • Consistency of representation
  • Ease of understanding and documentation
    • Napkin-size model without loads of plugins
  • Fewer domain owners - fewer ontologists 😉
  • Less design authority needed (domain owner management). A weakness of schema.org is its lack of consistency around quality, patterns and approaches.
  • Single namespace
    • Right, so that means all those stat names are bunched together rather than being in a hundred vocabs, each with its own URI, like we have here now? Yes, fewer models to refer to. Fewer prefixes to manage.
  • Handles units well, if units are a big deal. For example, power in cycling (220 watts) - see the units examples at the bottom of this page
    • In my experience units have never been a big deal; they are always implied or assumed, e.g. yards gained in American football, mph for baseball pitch velocity. But we could ask around. It would be interesting to hear what the OpenTrack people have to say about it.
    • https://www.w3.org/community/opentrack/ - They are active again.
  • Easier to handle updates
    • If we had a system/approach for minting a stat URI, it would provide a mechanism for identifying an individual stat and allow updating of micro-graphs within the data.
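
The scope and URI-minting points above can be sketched together in Turtle. The URI pattern (`stat/{player}/{stat}/{scope}`), the `sport:scope` property, the namespace IRI and all values are assumptions made up for this example.

```turtle
@base <http://example.org/data/> .
@prefix sport: <http://example.org/sport#> .   # placeholder namespace, an assumption

# Same stat, two scopes: each gets its own minted URI,
# so either one can be addressed and replaced on its own.
<stat/player-1/goalsScored/season-2021> a sport:Statistic ;
    sport:performedBy <player-1> ;
    sport:statisticProperty <vocab/goalsScored> ;
    sport:scope <season-2021> ;
    sport:value "14" .

<stat/player-1/goalsScored/match-42> a sport:Statistic ;
    sport:performedBy <player-1> ;
    sport:statisticProperty <vocab/goalsScored> ;
    sport:scope <match-42> ;
    sport:value "2" .
```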

Cons

  • Developers still need to conform to the instance data
  • Nasty for data typing (really nasty)
  • Not very JSON friendly
  • Also IDs, which are ready-made in domain-specific, but have to be minted for generic and assured of global uniqueness. This seems an issue.
    • But this could also be a benefit when dealing with changes to data: this stat has changed and is replaced with this data under the same URI.

Domain specific

Pros

  • More JSON developer friendly
  • Works with data typing patterns
  • Ontology can be used to document semantics - self documenting
  • More concise - instance data concise (ontology more verbose)
  • Control over inference over lower level semantics

Cons

  • Stat context might be more painful
  • More ontologists
  • Lots of plugins/domain specific models
  • Design authority
  • Potential to get more and more complex:
    • do we have a property for a score at half time? or do we model the concept of halves?
      • Halftime score in SportsML is a combination of event-status (intermission) and period-value (2) and current score. I suppose that state can also be conveyed in this format?
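
The half-time question can be illustrated with two alternative shapes for the same data. Both are sketches only; `sport:halfTimeScore`, `sport:periodScore`, `sport:period`, `sport:score`, `sport:Period`, `sport:TeamPerformance` and the namespace IRI are hypothetical names.

```turtle
@base <http://example.org/data/> .
@prefix sport: <http://example.org/sport#> .   # placeholder namespace, an assumption
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

# Option 1: a dedicated property for the half-time score.
<performance-a> a sport:TeamPerformance ;
    sport:halfTimeScore "1"^^xsd:integer .

# Option 2: model periods, and attach a score to a period.
<performance-b> a sport:TeamPerformance ;
    sport:periodScore [
        sport:period <match-42/period-1> ;
        sport:score "1"^^xsd:integer
    ] .

<match-42/period-1> a sport:Period .
```

Option 1 is shorter but needs a new property for every such moment; option 2 generalises to quarters, sets or innings at the cost of an extra node.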

Clarification of units and measurement system via a data sidecar that describes the semantic stats

<EFBO1059986-TFBB13> a sport:AthletePerformance ;
    sport:performedBy <TFBB13> ;
    sport:powerToWeightRatio "4"^^xsd:integer ;
    sport:cyclistPeakPowerOutput "230.6"^^xsd:decimal ;
    sport:cyclistAveragePowerOutput "205.8"^^xsd:decimal ;
    sport:unitsDefinition <units-definition-1> .

# Sidecar: one blank node per property, pairing the property name with its units.
<units-definition-1> a sport:UnitsDefinition ;
    sport:measurementSystem <metric> ;
    sport:propertyUnits [
        sport:propertyKey "powerToWeightRatio" ;
        sport:propertyUnits <ratio>
    ] , [
        sport:propertyKey "cyclistPeakPowerOutput" ;
        sport:propertyUnits <watts>
    ] , [
        sport:propertyKey "cyclistAveragePowerOutput" ;
        sport:propertyUnits <watts>
    ] .

Clarification of units via a rule-of-thumb property naming convention

<EFBO1059986-TFBB13> a sport:AthletePerformance ;
    sport:performedBy <TFBB13> ;
    sport:powerToWeightRatio "4"^^xsd:integer ;
    sport:cyclistPeakPowerOutput "230.6"^^xsd:decimal ;
    sport:cyclistPeakPowerOutputUnits <watts> ;
    sport:cyclistAveragePowerOutput "205.8"^^xsd:decimal ;
    sport:cyclistAveragePowerOutputUnits <watts> .

Units provided in the already reified Stat

<stat-3> rdf:type sport:Statistic ;
    sport:statisticProperty <cyclistPeakPowerOutput> ;
    sport:value "230.6" ;
    sport:datatype "decimal" ;
    sport:units <watts> .

Patterns for updating RDF

This heading should really read "sending updated or modified stats package payloads". The serialisation of the data should make little difference.

This is not about updating RDF (or XML, or JSON) in some target data store (by some consumer), but about how we model the ability for the Stats payload to be resent with modified data (in the case of a correction), or for an updated stat to be sent over the wire (e.g. distance covered by a player in a football match).
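
As a rough illustration, a resent payload could reuse the stat's URI so a consumer can tell which earlier statement it supersedes. The URI pattern, the `sport:` terms, the namespace IRI and the values below are assumptions for this sketch; `dcterms:modified` is the Dublin Core term, used here as one possible way to mark the revision time.

```turtle
@base <http://example.org/data/> .
@prefix sport: <http://example.org/sport#> .   # placeholder namespace, an assumption
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

# Later payload for a stat already sent once with the value "5.2".
# Same URI, new value, plus a timestamp so consumers can order the versions.
<stat/player-1/distanceCovered/match-42> a sport:Statistic ;
    sport:performedBy <player-1> ;
    sport:statisticProperty <vocab/distanceCovered> ;
    sport:value "7.9" ;
    sport:units <km> ;
    dcterms:modified "2021-11-23T20:15:00Z"^^xsd:dateTime .
```

Whether the consumer replaces or appends on receipt is its own business; the payload only needs to make the identity and ordering of the stat unambiguous.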

Memberships - how to relate Teams to Players over time

Dedicate 30 Nov Sports WG meeting to this subject. Collect examples.

Different ways that people can be members of teams/squads at the same time, over time etc.
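
As a starting point for collecting examples, one common pattern is to reify the membership as its own node with start and end dates. The sketch below is not a proposal; `sport:Membership`, `sport:member`, `sport:team`, `sport:startDate`, `sport:endDate`, the namespace IRI and the dates are all hypothetical.

```turtle
@base <http://example.org/data/> .
@prefix sport: <http://example.org/sport#> .   # placeholder namespace, an assumption
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

# A club membership that has ended.
<membership-1> a sport:Membership ;
    sport:member <player-1> ;
    sport:team <club-a> ;
    sport:startDate "2019-07-01"^^xsd:date ;
    sport:endDate "2021-06-30"^^xsd:date .

# An overlapping, still open membership of a national squad.
<membership-2> a sport:Membership ;
    sport:member <player-1> ;
    sport:team <national-team-x> ;
    sport:startDate "2020-09-01"^^xsd:date .
```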

Player memberships - design considerations