OCD Division IDs in Wikidata #213

Open
epaulson opened this issue Oct 10, 2020 · 5 comments

@epaulson

This is more of an FYI issue. I posted this to the Google Group, but James McKinney recommended also creating an issue and tagging @opencivicdata/division-id-curators. (@jpmckinney, I don't seem to be able to tag that team; maybe you can only do that if you're on the team?) At any rate, I am reproducing that post here.


As you might have seen in an earlier note to the list, there is now a property in Wikidata that connects entities to an Open Civic Data Division ID. I was the one who wrote the property proposal and fielded questions there, so I wanted to do some introductions here now that it’s been enabled.

Wikidata, if you don’t know it, is a sibling project to Wikipedia that aims to create a free and open structured knowledge base that’s both machine-readable and easy for humans to read too. Part of it aims to put data that’s found in the Wikipedia infoboxes (facts like city populations, area, etc) into a structured format so that it can be reused across wikis, but it goes much beyond that. Wikidata includes data about many more “things” than Wikipedia does - there are about 90 million entities in Wikidata right now, and growing.

Wikidata is a knowledge graph built on a MediaWiki extension called ‘Wikibase’. For purposes of this discussion, you can treat Wikibase as a triplestore graph database that captures facts in the form:

subject predicate object

i.e. each fact makes a statement about how “subject” and “object” are related via “predicate”. For example:

“madison” isA “City”
“madison” isCapitalOf “Wisconsin”
“madison” hasPopulation 223209
“madison” hasOpenCivicDataDivisionID ocd-division/country:us/state:wi/place:madison

(In actuality Wikidata uses its own set of identifiers for everything in its facts. So, instead of saying “madison” isCapitalOf “Wisconsin”, Wikidata writes

Q43788 P1376 Q1537

and for OCD-ID, it says

Q43788 P8651 ‘ocd-division/country:us/state:wi/place:madison’ )
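
To make the triple model concrete, here is a minimal in-memory sketch in Python. It is only an illustration, not how Wikibase actually stores data; the `match` helper is a hypothetical name, and the facts are limited to the identifiers quoted in this post.

```python
# A toy triple store: each fact is a (subject, predicate, object) tuple.
# Only identifiers quoted in the post are used here.
TRIPLES = [
    ("Q43788", "P1376", "Q1537"),  # Madison, "capital of", Wisconsin
    ("Q43788", "P8651", "ocd-division/country:us/state:wi/place:madison"),
]


def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return [
        (s, p, o)
        for (s, p, o) in triples
        if (subject is None or s == subject)
        and (predicate is None or p == predicate)
        and (obj is None or o == obj)
    ]


# Which entity carries this OCD-ID? This is the kind of question a
# graph query pattern asks: fix predicate and object, leave subject open.
hits = match(
    TRIPLES,
    predicate="P8651",
    obj="ocd-division/country:us/state:wi/place:madison",
)
print(hits[0][0])  # the subject of the matching triple
```

Leaving a position as `None` plays the role of a query variable: the pattern fixes what is known and returns every fact that fills in the rest.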

The interesting recent change is that property P8651 ( https://www.wikidata.org/wiki/Property:P8651 ) was created, which lets folks add OCD-IDs to entities in Wikidata. (They’re added as literals.)

Wikidata offers a query service, usable via a nice UI or over an HTTP endpoint ( https://query.wikidata.org/ ), that speaks the SPARQL query language. Here’s how to look up the Wikidata entity for Madison:

SELECT ?item ?itemLabel 
WHERE 
{
  ?item wdt:P8651 "ocd-division/country:us/state:wi/place:madison" .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}

(See it in action here - click on the blue run triangle to execute it - https://query.wikidata.org/#SELECT%20%3Fitem%20%3FitemLabel%20%0AWHERE%20%0A%7B%0A%20%20%3Fitem%20wdt%3AP8651%20%22ocd-division%2Fcountry%3Aus%2Fstate%3Awi%2Fplace%3Amadison%22%20.%0A%20%20SERVICE%20wikibase%3Alabel%20%7B%20bd%3AserviceParam%20wikibase%3Alanguage%20%22%5BAUTO_LANGUAGE%5D%2Cen%22.%20%7D%0A%7D )
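
For the HTTP route, the same lookup can be sketched in Python with only the standard library. This is a sketch under assumptions rather than an official client: it assumes the endpoint path is `/sparql` and that `format=json` selects JSON output, the function names and the User-Agent string are hypothetical, and the label language is fixed to `en` because `[AUTO_LANGUAGE]` is filled in by the query UI.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the SPARQL endpoint behind the query UI lives at /sparql.
ENDPOINT = "https://query.wikidata.org/sparql"

# Same query as in the post, with the OCD-ID left as a placeholder.
QUERY_TEMPLATE = """
SELECT ?item ?itemLabel
WHERE
{
  ?item wdt:P8651 "%s" .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
"""


def build_request(ocd_id):
    """Build (but do not send) a GET request for the lookup query."""
    # Refuse characters that would break out of the quoted SPARQL literal.
    if '"' in ocd_id or "\\" in ocd_id:
        raise ValueError("unexpected character in OCD-ID")
    params = urllib.parse.urlencode(
        {"query": QUERY_TEMPLATE % ocd_id, "format": "json"}
    )
    return urllib.request.Request(
        ENDPOINT + "?" + params,
        # Hypothetical User-Agent; replace it with your own contact details.
        headers={"User-Agent": "ocd-id-example/0.1 (you@example.org)"},
    )


def lookup(ocd_id):
    """Send the request and return the decoded JSON response."""
    with urllib.request.urlopen(build_request(ocd_id), timeout=30) as resp:
        return json.load(resp)
```

Calling `lookup("ocd-division/country:us/state:wi/place:madison")` performs the network request; `build_request` alone is enough to inspect the URL that would be sent.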

I’ve labeled just a handful of US congressional districts with OCD-IDs; you can see them with this query:

SELECT ?item ?itemLabel ?ocd 
WHERE 
{
  ?item wdt:P8651 ?ocd.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}

https://query.wikidata.org/#SELECT%20%3Fitem%20%3FitemLabel%20%3Focd%20%0AWHERE%20%0A%7B%0A%20%20%3Fitem%20wdt%3AP8651%20%3Focd.%0A%20%20SERVICE%20wikibase%3Alabel%20%7B%20bd%3AserviceParam%20wikibase%3Alanguage%20%22%5BAUTO_LANGUAGE%5D%2Cen%22.%20%7D%0A%7D
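
If the query above is fetched as JSON instead of viewed in the UI, turning the result into an OCD-ID to Wikidata ID lookup table takes only a few lines. The sketch below assumes the standard SPARQL JSON results layout (`results` / `bindings`, each variable carrying a `value`); the function name is hypothetical and `SAMPLE` is a hand-written stand-in, not real output.

```python
def ocd_to_qid(response):
    """Map each OCD-ID to the Wikidata Q-number of its item."""
    mapping = {}
    for row in response["results"]["bindings"]:
        # ?item comes back as a full entity URI; keep the trailing Q-number.
        qid = row["item"]["value"].rsplit("/", 1)[-1]
        mapping[row["ocd"]["value"]] = qid
    return mapping


# Hand-written sample in the shape of a SPARQL JSON response (not real output).
SAMPLE = {
    "head": {"vars": ["item", "itemLabel", "ocd"]},
    "results": {"bindings": [{
        "item": {"type": "uri",
                 "value": "http://www.wikidata.org/entity/Q43788"},
        "itemLabel": {"type": "literal", "xml:lang": "en", "value": "Madison"},
        "ocd": {"type": "literal",
                "value": "ocd-division/country:us/state:wi/place:madison"},
    }]},
}

print(ocd_to_qid(SAMPLE))
```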

So, a couple of things:

First, I think this is really exciting because Wikidata can be the database that OCD lacks. Virtually everything that gets an OCD Division Identifier is notable enough to include in Wikidata, so it can be a database to find out more data about any OCD-ID.

Second, what makes OCD-IDs cool is that they’re shared keys across multiple databases, so now you can join through Wikidata to many more databases and facts. The Google Civic Info API can tell you who the current representatives are for a district, but you can combine that with Wikidata and get the shapefile for the congressional district, follow the link to the subreddit for that district, find the population of the state, etc.
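
The "shared join key" idea can be sketched as a plain inner join on OCD-ID. Everything here is illustrative: `join_on_ocd_id` is a hypothetical helper, and both input dicts are made-up placeholders standing in for a civic API response and a Wikidata query result.

```python
def join_on_ocd_id(left, right):
    """Inner-join two dicts keyed by OCD-ID, merging their records."""
    return {
        ocd_id: {**left[ocd_id], **right[ocd_id]}
        for ocd_id in left.keys() & right.keys()
    }


# Both datasets below are made-up placeholders, not real API output.
officials = {
    "ocd-division/country:us/state:wi/place:madison": {
        "official": "<name from a civic API>",
    },
    "ocd-division/country:us/state:wi": {
        "official": "<name from a civic API>",
    },
}
wikidata = {
    "ocd-division/country:us/state:wi/place:madison": {"qid": "Q43788"},
}

# Only divisions present in both sources survive the join.
joined = join_on_ocd_id(officials, wikidata)
print(joined)
```

Because neither source has to know about the other, any dataset that carries OCD-IDs can be joined the same way.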

Third, Wikidata can be the source of additional entities that probably need OCD-IDs. It’s also possible that there are OCD-IDs for things that aren’t in Wikidata, so both projects can complete each other.
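
One way to see where the two projects could complete each other is a set difference between the governed OCD-ID list and the OCD-IDs recorded in Wikidata. This is a sketch under assumptions: it assumes the OCD identifiers are available as CSV with a header row containing an `id` column, the function name is hypothetical, and the inputs are tiny illustrative values rather than real extracts.

```python
import csv
import io


def coverage_gaps(ocd_csv_text, wikidata_ocd_ids):
    """Return (missing_from_wikidata, unknown_to_ocd) as sorted lists."""
    # Assumption: the CSV has a header row with an "id" column of OCD-IDs.
    reader = csv.DictReader(io.StringIO(ocd_csv_text))
    ocd_ids = {row["id"] for row in reader}
    wd_ids = set(wikidata_ocd_ids)
    return sorted(ocd_ids - wd_ids), sorted(wd_ids - ocd_ids)


# Tiny illustrative input, not a real extract from either project.
CSV_TEXT = (
    "id,name\n"
    "ocd-division/country:us/state:wi,Wisconsin\n"
    "ocd-division/country:us/state:wi/place:madison,Madison\n"
)
missing, unknown = coverage_gaps(
    CSV_TEXT, ["ocd-division/country:us/state:wi/place:madison"]
)
print(missing)  # OCD-IDs with no Wikidata item carrying them yet
print(unknown)  # values in Wikidata that are not in the governed list
```

The second list is worth checking by hand: a value that appears in Wikidata but not in the governed list is either a typo or an identifier that still needs to be proposed to OCD.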

And some caveats:

Wikidata is not going to mint new OCD-IDs. Nothing changes for OCD-ID: if someone finds something in Wikidata that should have an OCD-ID, they should still come to the GitHub repository and propose a change with the new identifier. The value of OCD-ID remains the same: everyone who uses OCD-IDs agrees to use them as identifiers against their local database, and because the set of identifiers is governed, there is a shared join key.

For now, this property is only for Division identifiers, again because they’re governed. There are formats specified for other identifiers like People and Jurisdictions, but as near as I can tell anyone using these types is just minting them for local data use, and no one is committing to share them between data sources. (Wikidata can support that use case, but I think it would treat each identifier as a property tied to a specific data source, like an OpenStates ID vs an OpenElections ID, etc.)

There may be some weirdness between the OCD and Wikidata data models. For example, Wikidata has decided to give some constituencies their own items even when they cover an entire existing division: for Australia they’ve got Queensland the senate constituency ( https://www.wikidata.org/wiki/Q56649111 ) and Queensland the state ( https://www.wikidata.org/wiki/Q36074 ), and they think at some point they may do something similar for the US.

This list is pretty low-traffic so I’m not sure how many people are going to see this. (I’m actually hoping that maybe some folks on the list are interested in working together to put OCD-IDs into Wikidata and maybe that would lead to some more traffic on the list). Looking forward to hearing from folks with questions or ideas!

@jpmckinney
Member

Tagging @opencivicdata/division-id-curators for discussion. Integration with Wikidata seems like a good idea.

@djbridges

djbridges commented Oct 13, 2020 via email

@derenrich

So there are now about 2000 mappings between OCD-IDs and Wikidata items. I think there's full coverage of US federal house/senate/states. The state legislature district coverage is choppy, and there are modeling issues that need to be considered (where we might need many-to-one mappings between things).

Unrelatedly, do you know if anyone has the mapping between OCD-IDs and Ballotpedia pages? I know it exists but I'm guessing Ballotpedia is guarding it.

@epaulson
Author

This is awesome, if you were the one who added them, thanks!

I'm not aware of anyone with OCD-ID to Ballotpedia mappings. It looks like you (or whoever has been editing Wikidata) have been making progress on that: there are now ~1300 things that have both OCD-IDs and Ballotpedia IDs.

Is the modeling issue on wikidata or in OCD?

@derenrich

So the main modeling issue is things like https://www.wikidata.org/wiki/Q4632627 , where Wikidata has one entry but you have two. Wikidata now has https://www.wikidata.org/wiki/Q104235619 and https://www.wikidata.org/wiki/Q104235541 , but it's unclear if that's the best way to do this. This issue extends to all other "coterminous" districts.
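
A rough way to spot where this many-to-one situation shows up is to group the (item, OCD-ID) pairs by item and keep the items that carry more than one OCD-ID. This is only a sketch: the function name is hypothetical and the identifiers in `PAIRS` are invented placeholders, not real Wikidata items or OCD-IDs.

```python
from collections import defaultdict


def items_with_multiple_ocd_ids(pairs):
    """Group (qid, ocd_id) pairs; keep items carrying more than one OCD-ID."""
    by_item = defaultdict(set)
    for qid, ocd_id in pairs:
        by_item[qid].add(ocd_id)
    return {qid: sorted(ids) for qid, ids in by_item.items() if len(ids) > 1}


# Placeholder pairs; the identifiers are invented for illustration.
PAIRS = [
    ("Q_A", "ocd-division/country:xx/example:upper-1"),
    ("Q_A", "ocd-division/country:xx/example:lower-1"),
    ("Q_B", "ocd-division/country:xx/example:lower-2"),
]
print(items_with_multiple_ocd_ids(PAIRS))
```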
