This notebook works through one method of building out the initial world country items, which I'll have to improve on later. The source is simply the ISO 3166-1 set of codes and country names. As we work out what information about countries the GeoKB needs to contain, we can use these codes to pull additional information from Wikidata or other sources. As an expedient, I get the ISO country list from the pycountry package, which draws on the Debian ISO codes database.

As we dig deeper into what we need to link with in terms of political boundaries in other countries, we may want to revisit this from the standpoint of subdivisions identified with ISO 3166-2 codes. The challenge there will be classifying those in some useful way, since each country uses its own classification system (e.g., London boroughs, unitary areas, etc. in GB). For now, we'll leave those out and use the U.S. Census sources for subdivisions in the U.S.
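
To get a feel for that classification problem, one option is to tally subdivisions by their type label before deciding how to model them. The sketch below is only an illustration: `count_subdivision_types` is a hypothetical helper, and the sample records are made-up stand-ins (not real ISO 3166-2 entries) shaped like pycountry's subdivision objects, which expose `code`, `name`, and `type` attributes.

```python
from collections import Counter
from types import SimpleNamespace


def count_subdivision_types(subdivisions):
    """Count how many subdivisions carry each type label."""
    return Counter(s.type for s in subdivisions)


# Made-up stand-in records, not real ISO 3166-2 entries.
sample = [
    SimpleNamespace(code="ZZ-001", name="Example A", type="London borough"),
    SimpleNamespace(code="ZZ-002", name="Example B", type="London borough"),
    SimpleNamespace(code="ZZ-003", name="Example C", type="Unitary area"),
]

type_counts = count_subdivision_types(sample)
for type_label, n in type_counts.most_common():
    print(f"{type_label}: {n}")
```

With pycountry installed, the same helper should accept the list returned by `pycountry.subdivisions.get(country_code="GB")`, which would show how many distinct type labels a single country brings with it.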

In [5]:
import pycountry
from wbmaker import WikibaseConnection

In [6]:
geokb = WikibaseConnection("GEOKB_CLOUD")

In [9]:
def lookup_country(iso_alpha):
    """Query the GeoKB for the item carrying the given ISO 3166-1 alpha-2 code."""
    q = """
    %(namespaces)s

    SELECT ?st ?iso_alpha
    WHERE {
    ?st wdt:%(p_iso_alpha)s "%(v_iso_alpha)s" .
    ?st wdt:%(p_iso_alpha)s ?iso_alpha .
    SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
    }
    """ % {
        "namespaces": geokb.sparql_namespaces(),
        "v_iso_alpha": iso_alpha,
        "p_iso_alpha": geokb.prop_lookup['ISO 3166-1 alpha-2 code']
    }

    return geokb.sparql_query(query=q, output="lookup")
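
Because `lookup_country` interpolates its argument straight into the SPARQL text, it may be worth checking the shape of the code before building the query. The following is a sketch only: `build_country_query` is a hypothetical stand-in for the query assembly above, `P0` is a placeholder rather than a real GeoKB property ID, and the namespace prefixes are left out.

```python
import re

# ISO 3166-1 alpha-2 codes are exactly two uppercase ASCII letters.
ALPHA2_PATTERN = re.compile(r"[A-Z]{2}")


def build_country_query(prop_id, iso_alpha):
    """Return a SPARQL query string for an ISO 3166-1 alpha-2 lookup.

    Raises ValueError if iso_alpha is not two uppercase ASCII letters.
    """
    if not ALPHA2_PATTERN.fullmatch(iso_alpha):
        raise ValueError(f"not an alpha-2 code: {iso_alpha!r}")
    return (
        "SELECT ?st ?iso_alpha\n"
        "WHERE {\n"
        f'  ?st wdt:{prop_id} "{iso_alpha}" .\n'
        f"  ?st wdt:{prop_id} ?iso_alpha .\n"
        "}"
    )


# "P0" is a placeholder, not a real GeoKB property ID.
query = build_country_query("P0", "AW")
print(query)
```

Values coming from pycountry are already well formed, so the check mostly matters if the function is later reused with codes from less controlled sources.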


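The loop in the next cell prefers a country's official name as the label and keeps the common name as an alias alongside the two alphabetic codes. A minimal sketch of that choice in isolation, assuming records that may or may not carry an `official_name` attribute: `choose_label_and_aliases` is a hypothetical helper, and the records are `SimpleNamespace` stand-ins for pycountry entries rather than live data.

```python
from types import SimpleNamespace


def choose_label_and_aliases(country):
    """Return (label, aliases) for a country-like record.

    Uses official_name as the label when present, otherwise name.
    The common name becomes an alias only if it differs from the label.
    """
    label = getattr(country, "official_name", None) or country.name
    aliases = []
    if country.name != label:
        aliases.append(country.name)
    aliases.extend([country.alpha_2, country.alpha_3])
    return label, aliases


# Stand-in records for illustration; not read from pycountry.
with_official = SimpleNamespace(
    name="Bolivia, Plurinational State of",
    official_name="Plurinational State of Bolivia",
    alpha_2="BO",
    alpha_3="BOL",
)
without_official = SimpleNamespace(name="Aruba", alpha_2="AW", alpha_3="ABW")

print(choose_label_and_aliases(with_official))
print(choose_label_and_aliases(without_official))
```

Pulling the decision into a small function like this would make it testable without touching the Wikibase instance.
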
In [20]:
for c in pycountry.countries:
    c_lookup = lookup_country(c.alpha_2)
    if c_lookup:
        item = geokb.wbi.item.get(c_lookup[c.alpha_2])
    else:
        item = geokb.wbi.item.new()

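    # Prefer the official name as the label. Not every pycountry record has
    # one, so fall back to the common name instead of using a bare except.
    label = getattr(c, 'official_name', None) or c.name
    item.labels.set('en', label)
    # Keep the common name as an alias only when it differs from the label.
    aliases = [c.name] if c.name != label else []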

    aliases.extend([c.alpha_2, c.alpha_3])
    item.aliases.set('en', aliases)

    item.descriptions.set('en', 'a world country item')

    claims = geokb.models.Claims()
    
    claims.add(
        geokb.datatypes.Item(
            prop_nr=geokb.prop_lookup['instance of'],
            value=geokb.class_lookup['country']
        )
    )

    claims.add(
        geokb.datatypes.ExternalID(
            prop_nr=geokb.prop_lookup['ISO 3166-1 alpha-2 code'],
            value=c.alpha_2
        )
    )
    
    claims.add(
        geokb.datatypes.ExternalID(
            prop_nr=geokb.prop_lookup['ISO 3166-1 alpha-3 code'],
            value=c.alpha_3
        )
    )

    claims.add(
        geokb.datatypes.ExternalID(
            prop_nr=geokb.prop_lookup['ISO 3166-1 numeric code'],
            value=c.numeric
        )
    )

    item.claims.add(claims)

    response = item.write(
        summary="Updated item from pycountry",
        clear=True
    )
    print("UPDATED:", c.name, response.id)

UPDATED: Aruba Q153
UPDATED: Afghanistan Q103
UPDATED: Angola Q106
UPDATED: Anguilla Q26702
UPDATED: Åland Islands Q26703
UPDATED: Albania Q196
UPDATED: Andorra Q199
UPDATED: United Arab Emirates Q100
UPDATED: Argentina Q37
UPDATED: Armenia Q35
UPDATED: American Samoa Q26704
UPDATED: Antarctica Q26705
UPDATED: French Southern Territories Q26706
UPDATED: Antigua and Barbuda Q72
UPDATED: Australia Q36
UPDATED: Austria Q171
UPDATED: Azerbaijan Q197
UPDATED: Burundi Q119
UPDATED: Belgium Q26707
UPDATED: Benin Q117
UPDATED: Bonaire, Sint Eustatius and Saba Q26708
UPDATED: Burkina Faso Q120
UPDATED: Bangladesh Q104
UPDATED: Bulgaria Q195
UPDATED: Bahrain Q34
UPDATED: Bahamas Q70
UPDATED: Bosnia and Herzegovina Q26709
UPDATED: Saint Barthélemy Q26710
UPDATED: Belarus Q187
UPDATED: Belize Q26711
UPDATED: Bermuda Q26712
UPDATED: Bolivia, Plurinational State of Q62
UPDATED: Brazil Q184
UPDATED: Barbados Q26713
UPDATED: Brunei Darussalam Q108
UPDATED: Bhutan Q107
UPDATED: Bouvet Island Q26714
UPD