# Tribal Representation in the Global Knowledge Commons
I often refer to the combination of Wikipedia, Wikidata, and the other projects of the Wikimedia Foundation as the Global Knowledge Commons. It is a completely open online space where people from many cultures and languages around the world come together to build a foundation of the world's knowledge about everything. It's a messy experiment in fully democratized knowledge management, but nevertheless incredibly compelling to be a part of.

From the perspective of indigenous peoples, it represents an opportunity to set some of the record straight about how things are and how they've come to be through the long history of colonialism and the terrible results of the doctrine of discovery. The open nature of the platform also provides a way that indigenous systems of knowledge can be encoded and represented alongside other systems of knowledge, providing opportunities for increased understanding and communication between different ways that we look at and try to understand the world.

This notebook takes a dive through the basic identification of Native American Tribes in the United States as they exist today. It will touch on Alaska Native and Native Hawaiian groups to some extent, but those will require other work as well for their particular context. My main focus is to track down how the Tribes and their sovereign nation governments represent themselves in name, mainly their English language names as used in relationships with the U.S. Federal government. I'm focusing here because many of my research objectives have to do with the land areas and other resources that the Tribes are currently stewarding. The data sources I need from the BIA, the U.S. Census, and other government organizations are keyed to the names of the Federally recognized Tribes. If I hope to contribute what I can develop through these sources into the Global Knowledge Commons, I need to nail down how Tribes are currently represented and clean some things up along the way.

## Overall Objective
The most important initial thrust of my work here is to have labeled entities in Wikidata that represent the full group of Sovereign Nations that the U.S. Government recognizes, pursuant to the Federally Recognized Indian Tribe List Act of 1994 (Pub. L. 103-454, 108 Stat. 4791-4792). I have to determine whether existing entities in Wikidata that seem to match based on labels and other identifying claims line up well enough, and whether there are any issues. Where I find that some person or bot introduced something that is inaccurate or needs clarification, I'll consider correcting it (e.g., see this [modification](https://www.wikidata.org/wiki/Q5092177)). I'll also need to decide things like whether to change labels around (e.g., Diné (Navajo) vs. Navajo (Diné)) and what to do with claims that lack evidence or merit.

I want to make sure to record statements pointing to official web sites for the Tribes wherever I have those from sources or can track them down. This is an important first step in having the items represent the Tribes as they represent themselves online. The URLs are an important lead to more detailed, accurate, and timely information.

Messing around with what's in Wikipedia is more involved. I can already see where there are issues in how things have been done with redirects and how some of the articles are written. But Wikipedia articles are stories that I don't feel qualified to contribute to. What I'll try to do is connect the dots and point out issues I see, but it should be members from Tribes themselves who contribute there.

## Why this matters
Beyond the points above, the Global Knowledge Commons increasingly factors into the development of AI, with implications for everything we put AI models to work on. I see this as an indigenous data sovereignty issue in that Tribes, and allies working in concert with Tribes, need to create the most accurate and current digital representation in the Commons. If the reasoning of Large Language Models rests even partially on inaccurate, incomplete, out-of-date, or systemically skewed information about Indian Country, then we are going to continue perpetuating, at an accelerated pace, systemic injustice and the effects of colonialism.

In [24]:
import requests
from nested_lookup import nested_lookup
import pandas as pd
import wd

In [2]:
wd_cnxn = wd.Wikidata()

# Wikipedia List of Federally Recognized Tribes for CONUS

The [listing of Federally recognized Tribes for CONUS](https://en.wikipedia.org/wiki/List_of_federally_recognized_tribes_in_the_contiguous_United_States) is a somewhat reasonable list. It is certainly more comprehensive than what a [Wikidata query](https://w.wiki/77Ms) currently returns. The Wikidata query uses the semantics of classification as a "federally recognized Native American tribe in the United States," so it returns some of the Alaska Native villages as well. Indigenous Hawaiian peoples are often a separate classification in things like the U.S. Census, and I'll work through those as well at some point.

Ideally, we should be able to have a clear pathway into all Federally recognized Tribes as maintained in datasets from the BIA within the "Global Knowledge Commons." Right now, this is not the case, and I'm working through how to bring about alignment.

In [25]:
fed_tribes_page = "List_of_federally_recognized_tribes_in_the_contiguous_United_States"
fed_tribes_sections_api = f"https://en.wikipedia.org/w/api.php?action=parse&prop=sections&page={fed_tribes_page}&format=json"

fed_tribes_sections = requests.get(fed_tribes_sections_api).json()
tribal_list_section_index = next((i["index"] for i in fed_tribes_sections["parse"]["sections"] if i["line"] == "Alphabetical list of federally recognized tribes"), None)

tribal_list_api = f"https://en.wikipedia.org/w/api.php?action=parse&prop=links&page={fed_tribes_page}&section={tribal_list_section_index}&format=json"
tribal_list_links = requests.get(tribal_list_api).json()

tribe_pages = [i['*'] for i in tribal_list_links["parse"]["links"]]

wikipedia_tribes = pd.DataFrame(tribe_pages, columns=["tribe_page_link"])
wikipedia_tribes["site_link"] = wikipedia_tribes['tribe_page_link'].apply(lambda x: x.replace(' ', '_'))
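
The `links` list that `action=parse` returns includes links from every namespace, not just articles, which is why `Help:` pages need filtering out later on. A minimal sketch of filtering at the source, assuming the API's default JSON shape in which each link is a dict with `ns` and `*` keys (main-namespace articles have `ns` equal to 0). The function name `extract_article_titles` and the sample response are hypothetical, for illustration only.

```python
def extract_article_titles(parse_response):
    """Return titles of main-namespace (ns == 0) links from an action=parse response."""
    links = parse_response.get("parse", {}).get("links", [])
    return [link["*"] for link in links if link.get("ns") == 0]


# Hypothetical sample shaped like the API's default JSON output
sample_response = {
    "parse": {
        "links": [
            {"ns": 0, "exists": "", "*": "Ak-Chin Indian Community"},
            {"ns": 12, "exists": "", "*": "Help:Contents"},  # Help namespace
            {"ns": 0, "exists": "", "*": "Cherokee Nation"},
        ]
    }
}

article_titles = extract_article_titles(sample_response)
```

Filtering on the namespace number rather than on a title prefix avoids having to enumerate prefixes such as `Help:` one at a time.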

## Redirects/Aliases

The tribal listing page shows a number of cases of names that a Tribe was "previously listed as." In digging a bit further, I found that these are not all of the names that are now redirected to a current name, and many of the listed tribe names actually redirect to something else that is not listed on that page. Redirects are built into the MediaWiki instance of EN-Wikipedia. To get at this dynamic, I wrote a couple of functions to operate against the Wikipedia API and pull back the full web of redirects in play. This gives us all the Wikipedia links that go to some page that presumably describes the tribe.

This is where we start to see some of the issues in semantic confusion represented by the Tribes not representing themselves in the global knowledge commons.
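
To make the shape of the redirect data concrete, here is a small sketch of how a response from `action=query` with the `redirects` flag can be turned into a lookup table and then inverted into alias lists. It assumes the response carries a `query.redirects` list of `from`/`to` pairs; the helper names and the sample response are hypothetical, though the titles are taken from the alias output in this notebook. The API also accepts several titles in one call, separated by `|`, which this structure would handle without changes.

```python
def build_redirect_map(query_response):
    """Map each redirected title to its target from an action=query&redirects response."""
    redirects = query_response.get("query", {}).get("redirects", [])
    return {item["from"]: item["to"] for item in redirects}


def group_aliases(redirect_map):
    """Invert a {source: target} redirect map into {target: sorted list of sources}."""
    aliases = {}
    for source, target in redirect_map.items():
        aliases.setdefault(target, []).append(source)
    return {target: sorted(sources) for target, sources in aliases.items()}


# Hypothetical response for a batched lookup of three titles
sample_response = {
    "query": {
        "redirects": [
            {"from": "Absentee-Shawnee Tribe", "to": "Absentee Shawnee Tribe of Indians"},
            {"from": "Alabama-Coushatta", "to": "Alabama–Coushatta Tribe of Texas"},
            {"from": "Alabama-Coushatta Tribe", "to": "Alabama–Coushatta Tribe of Texas"},
        ]
    }
}

redirect_map = build_redirect_map(sample_response)
aliases_by_target = group_aliases(redirect_map)
```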

In [38]:
user_agent_header = {"User-Agent": "https://github.com/skybristol/indian_country_data"}

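def get_redirects_to(page_title):
    """Return the title that page_title redirects to, or None if it is not a redirect."""
    api = "https://en.wikipedia.org/w/api.php"
    # Passing params lets requests URL-encode titles containing characters such as "&"
    params = {"action": "query", "titles": page_title, "redirects": 1, "format": "json"}
    r = requests.get(api, params=params, headers=user_agent_header).json()
    if "query" in r and "redirects" in r["query"]:
        return r["query"]["redirects"][0]["to"]
    return None

def get_redirects_from(page_title):
    """Return the titles of pages that redirect to page_title, or None if there are none."""
    if page_title is None:
        return None
    api = "https://en.wikipedia.org/w/api.php"
    # rdlimit defaults to 10, so ask for the maximum to avoid silently truncating aliases
    params = {"action": "query", "titles": page_title, "prop": "redirects", "rdlimit": "max", "format": "json"}
    r = requests.get(api, params=params, headers=user_agent_header).json()
    redirects = nested_lookup('redirects', r)
    if redirects:
        return nested_lookup('title', redirects)
    return None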

In [29]:
wikipedia_tribes['redirects_to'] = wikipedia_tribes.tribe_page_link.apply(get_redirects_to)

In [34]:
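# Unwrap list results; leave strings and None untouched so string results from get_redirects_to are not overwritten with None
wikipedia_tribes["redirects_to"] = wikipedia_tribes.redirects_to.apply(lambda x: x[0] if isinstance(x, list) else x)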

In [36]:
wikipedia_tribes["aliases"] = wikipedia_tribes.redirects_to.apply(get_redirects_from)

In [55]:
wikipedia_tribes['wp_tribe_name'] = wikipedia_tribes.apply(lambda x: x['redirects_to'] if x['redirects_to'] is not None else x['tribe_page_link'], axis=1)

In [58]:
wp_tribes_and_aliases = wikipedia_tribes[~wikipedia_tribes.wp_tribe_name.str.startswith("Help:")][['wp_tribe_name','aliases']].reset_index(drop=True)
wp_tribes_and_aliases.head()

| | wp_tribe_name | aliases |
|---|---|---|
| 0 | Absentee Shawnee Tribe of Indians | [Absentee-Shawnee Tribe, Absentee-Shawnee Trib... |
| 1 | Delaware Nation | [Delaware Nation, Oklahoma, Delaware Tribe of ... |
| 2 | Agua Caliente Band of Cahuilla Indians | [Agua Caliente Band of Cahuilla Indians of the... |
| 3 | Ak-Chin Indian Community | |
| 4 | Alabama–Coushatta Tribe of Texas | [Alabama-Coushatta, Alabama-Coushatta Tribe, A... |


# Tribes in Wikidata

In [3]:
existing_tribes_sparql_url = "https://query.wikidata.org/sparql?query=SELECT%20%3Ftribe%20%3FtribeLabel%20%3Farticle%0AWHERE%20%7B%0A%20%20%3Ftribe%20wdt%3AP31%20wd%3AQ7840353.%0A%20%20OPTIONAL%20%7B%0A%20%20%20%20%20%20%3Farticle%20schema%3Aabout%20%3Ftribe%20.%0A%20%20%20%20%20%20%3Farticle%20schema%3AinLanguage%20%22en%22%20.%0A%20%20%20%20%20%20FILTER%20(SUBSTR(str(%3Farticle)%2C%201%2C%2025)%20%3D%20%22https%3A%2F%2Fen.wikipedia.org%2F%22)%0A%20%20%7D%0A%20%20SERVICE%20wikibase%3Alabel%20%7B%20bd%3AserviceParam%20wikibase%3Alanguage%20%22en%22.%20%7D%0A%7D%0A"
existing_tribes = wd_cnxn.url_sparql_query(
    sparql_url=existing_tribes_sparql_url,
    output_format="dataframe"
)
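
The percent-encoded URL above is hard to read, so here is the same query decoded, along with a sketch of building the URL from readable text. The query text comes from decoding the URL; `build_sparql_url` is a hypothetical helper, and I am assuming `url_sparql_query` simply fetches whatever URL it is given, so the result could be passed as `sparql_url`.

```python
from urllib.parse import quote, unquote

SPARQL_ENDPOINT = "https://query.wikidata.org/sparql"

# Decoded form of the query embedded in the URL above:
# items that are an instance of (P31) Q7840353, with an optional English Wikipedia article
TRIBES_QUERY = """SELECT ?tribe ?tribeLabel ?article
WHERE {
  ?tribe wdt:P31 wd:Q7840353.
  OPTIONAL {
      ?article schema:about ?tribe .
      ?article schema:inLanguage "en" .
      FILTER (SUBSTR(str(?article), 1, 25) = "https://en.wikipedia.org/")
  }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
"""


def build_sparql_url(query, endpoint=SPARQL_ENDPOINT):
    """Percent-encode a SPARQL query into a GET URL for the endpoint."""
    return f"{endpoint}?query={quote(query, safe='')}"


readable_sparql_url = build_sparql_url(TRIBES_QUERY)
```

Keeping the query as readable text makes it easier to review and adjust, for example to add a clause for an official website statement later.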

In [15]:
# Rows without an English article may hold None or NaN, so only split actual strings
existing_tribes['site_link'] = existing_tribes.article.apply(lambda x: x.split("/")[-1] if isinstance(x, str) else None)

In [17]:
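# Wikidata items whose English sitelink is not among the pages linked from the Wikipedia list
existing_tribes[~existing_tribes['site_link'].isin(wikipedia_tribes['site_link'])]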

| | tribe | tribeLabel | article | site_link |
|---|---|---|---|---|
| 0 | http://www.wikidata.org/entity/Q79507 | Holy Cross Village | https://en.wikipedia.org/wiki/Holy_Cross,_Alaska | Holy_Cross,_Alaska |
| 1 | http://www.wikidata.org/entity/Q79540 | Village of Aniak | https://en.wikipedia.org/wiki/Aniak,_Alaska | Aniak,_Alaska |
| 2 | http://www.wikidata.org/entity/Q79541 | New Stuyahok Village | https://en.wikipedia.org/wiki/New_Stuyahok,_Al... | New_Stuyahok,_Alaska |
| 3 | http://www.wikidata.org/entity/Q79553 | Emmonak Village | https://en.wikipedia.org/wiki/Emmonak,_Alaska | Emmonak,_Alaska |
| 4 | http://www.wikidata.org/entity/Q79677 | Native Village of Gambell | https://en.wikipedia.org/wiki/Gambell,_Alaska | Gambell,_Alaska |
| ... | ... | ... | ... | ... |
| 247 | http://www.wikidata.org/entity/Q113344872 | Coyote Valley Band of Pomo Indians | | |
| 248 | http://www.wikidata.org/entity/Q118958746 | Native Village of Kotzebue | | |
| 249 | http://www.wikidata.org/entity/Q118958752 | Qawalangin Tribe of Unalaska | | |
| 250 | http://www.wikidata.org/entity/Q118959010 | Village of Alakanuk | | |
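
One thing to watch in the comparison above is that article URLs coming back from the query service may be percent-encoded when a title contains non-ASCII characters (an en dash, for example), while titles from the Wikipedia API are not. A minimal sketch of a more forgiving extraction, with `sitelink_from_article_url` as a hypothetical helper name and the second URL as an assumed example of an encoded title:

```python
from urllib.parse import unquote, urlparse


def sitelink_from_article_url(url):
    """Return the decoded page-title segment of a Wikipedia article URL, or None if missing."""
    if not isinstance(url, str) or not url:
        return None  # covers None and NaN from rows without an English article
    path = urlparse(url).path
    title = path.split("/wiki/", 1)[-1]
    return unquote(title)


plain = sitelink_from_article_url("https://en.wikipedia.org/wiki/Holy_Cross,_Alaska")
encoded = sitelink_from_article_url(
    "https://en.wikipedia.org/wiki/Alabama%E2%80%93Coushatta_Tribe_of_Texas"
)
missing = sitelink_from_article_url(float("nan"))
```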


# BIA Tribal Leaders
Perhaps the best way to start with BIA's data is to work with their listing of Tribal Leaders. This is one of only a small number of datasets that BIA has online and public. It's a questionable source in that its disclaimer indicates it is not really an official listing. Nevertheless, it does provide some useful details such as web sites for some Tribes, names and titles of tribal leaders, locations of Tribal Government offices, and the connections to the other DOI Bureaus on the landscape.

There are also fields for the date a tribal leader was elected and when the next election will be held. These might actually provide the best clue as to how current some records are, which can belie the currency date stated in something like the [data.gov representation](https://catalog.data.gov/dataset/tribal-leader-directory) of the dataset. However, looking at the record for my own Tribe, the Cherokee Nation, I see that their data is out of date and incorrect. Chuck Hoskin, Jr. was reelected for a second term on June 3, 2023. The latest date in the tribal leaders table is 2019, and it incorrectly gives the "next election" as 6/22/2023. So, chances are good we can't trust this aspect of the data model as accurate or actionable.

The name of the Tribe as recognized by the U.S. Government and BIA is the most important connecting point as it will provide us the linkage to official BIA geospatial data on reservation boundaries and trust lands, which we will use in other work. We can also use the web sites listed for some Tribes as a starting point toward seeing how the Tribes reference themselves. They will each need to be visited and examined further.

In [46]:
tribal_leaders = pd.read_csv("https://www.bia.gov/tribal-leaders-csv")


In [51]:
tribal_leaders.columns

Index(['TribeFullName', 'Tribe', 'TribeAlternateName', 'TribalComponent',
       'Salutation', 'FirstName', 'MiddleName', 'LastName', 'Suffix', 'Aka',
       'JobTitle', 'Organization', 'BIARegion', 'BIAAgency', 'PhysicalAddress',
       'City', 'State', 'ZIPCode', 'Alaska', 'Phone', 'Fax', 'Email',
       'WebSite', 'MailingAddress', 'MailingAddressCity',
       'MailingAddressState', 'MailingAddressZIPCode', 'DateElected',
       'NextElection', 'Directory', 'Notes', 'ANCSARegion', 'BLMRegion',
       'BORRegion', 'FWSRegion', 'LCC', 'NPSRegion', 'USGSRegion',
       'AlaskaSubsistenceRegion', 'Latitude', 'Longtitude'],
      dtype='object')

In [52]:
tribal_leaders[tribal_leaders.NextElection.notnull()].NextElection

0       4/12/2026
1      11/30/2023
2       3/16/2024
3       3/16/2024
4        3/1/2024
          ...    
580     11/1/2022
581     July 2023
584      4/1/2024
585    12/31/2025
586      6/1/2019
Name: NextElection, Length: 415, dtype: object
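
The `NextElection` values are free text with mixed formats ("July 2023" alongside "11/30/2023"), so any currency check has to tolerate values that don't parse. A minimal sketch, assuming month/day/year is the dominant format and picking an arbitrary reference date; `flag_stale_elections` and the `as_of` value are hypothetical, and the sample values are copied from the output above.

```python
import pandas as pd


def flag_stale_elections(next_election, as_of):
    """Parse M/D/YYYY strings and flag those that fall before the as_of date."""
    # errors="coerce" turns anything that doesn't match the format into NaT
    parsed = pd.to_datetime(next_election, format="%m/%d/%Y", errors="coerce")
    return pd.DataFrame({
        "raw": next_election,
        "parsed": parsed,
        "unparsed": parsed.isna() & next_election.notna(),
        "stale": parsed < pd.Timestamp(as_of),  # NaT compares as False
    })


sample = pd.Series(["4/12/2026", "11/30/2023", "July 2023", "6/1/2019"])
election_check = flag_stale_elections(sample, as_of="2023-10-01")
```

Records flagged as `stale` list a "next election" that has already passed as of the reference date, and `unparsed` marks values that need a manual look.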

In [54]:
tribal_leaders[tribal_leaders.TribeFullName == 'Cherokee Nation'].iloc[0].to_dict()

{'TribeFullName': 'Cherokee Nation',
 'Tribe': 'Cherokee',
 'TribeAlternateName': 'Cherokee Nation',
 'TribalComponent': 'Tribe',
 'Salutation': nan,
 'FirstName': 'Chuck',
 'MiddleName': nan,
 'LastName': 'Hoskin',
 'Suffix': 'Jr.',
 'Aka': nan,
 'JobTitle': 'Principal Chief',
 'Organization': nan,
 'BIARegion': 'Eastern Oklahoma',
 'BIAAgency': 'Eastern Oklahoma Regional Office',
 'PhysicalAddress': '17675 South Muskogee Avenue',
 'City': 'Tahlequah',
 'State': 'OK',
 'ZIPCode': '74464',
 'Alaska': 'No',
 'Phone': '(800) 256-0671',
 'Fax': '(918) 458-5580',
 'Email': 'chuck-hoskin@cherokee.org',
 'WebSite': 'http://www.cherokee.org',
 'MailingAddress': 'P.O. Box 948',
 'MailingAddressCity': 'Tahlequah',
 'MailingAddressState': 'Oklahoma',
 'MailingAddressZIPCode': '74465',
 'DateElected': '6/22/2019',
 'NextElection': '6/22/2023',
 'Directory': nan,
 'Notes': nan,
 'ANCSARegion': nan,
 'BLMRegion': 'New Mexico',
 'BORRegion': 'Missouri Basin',
 'FWSRegion': 'Region 2 - Southwest',
 ...

# Identifiers
By and large, we are dealing with names as identifiers for Tribes. In the BIA Tribal Leaders dataset, we have three fields representing names (TribeFullName, TribeAlternateName, and Tribe). In Wikipedia, we have the names of the tribes from the listing pointing to Wikipedia pages, many of which have redirects that provide alternate names. If we connect the dots to Wikidata, we have additional possibilities for names and alternate names, some of which provide a rich source of the names Tribes call themselves in some circumstances (e.g., Diné vs. Navajo).

Interestingly, we have at least one example in the Wikidata item for the [Cherokee Nation](https://www.wikidata.org/wiki/Q14708404) where there are a number of additional identifiers for the entity, including one from the [Research Organization Registry](https://ror.org/00p23dy23), itself another interesting source for identifier connection. The Cherokee Nation Wikidata item is particularly well developed compared to many others. This may be due to the broad reach of [Cherokee Nation Businesses](https://www.wikidata.org/wiki/Q5092178) and their role in government contracting, including government R&D work across a number of agencies.

Once I nail down which existing Wikidata entities align with Wikipedia pages and BIA Tribes, I can critically examine all the statements that have been made, including statements for external identifiers. I can already see some cases where external ID claims are not really accurate until they are qualified in some way as to their significance. For instance, the ISNI reference for the item representing the [Diné](https://www.wikidata.org/wiki/Q5092178) points to something that is for a "[musical group or band](https://isni.org/isni/0000000472000739)." In whatever circumstance (I haven't tracked that down), that claim made sense at some level to whoever made it. But if the person or bot making the claim didn't record the reasoning through references and qualifiers, we can't know what that context was.

In [64]:
# Flatten the alias lists once rather than re-exploding them for every comparison
wp_names = wp_tribes_and_aliases['wp_tribe_name']
wp_aliases = wp_tribes_and_aliases['aliases'].explode()

# Keep BIA records where any of the three name fields matches a Wikipedia page name or alias
tribal_leaders[
    tribal_leaders['TribeFullName'].isin(wp_names)
    | tribal_leaders['TribeAlternateName'].isin(wp_names)
    | tribal_leaders['Tribe'].isin(wp_names)
    | tribal_leaders['TribeFullName'].isin(wp_aliases)
    | tribal_leaders['TribeAlternateName'].isin(wp_aliases)
    | tribal_leaders['Tribe'].isin(wp_aliases)
]

| | TribeFullName | Tribe | TribeAlternateName | TribalComponent | Salutation | FirstName | MiddleName | LastName | Suffix | Aka | ... | ANCSARegion | BLMRegion | BORRegion | FWSRegion | LCC | NPSRegion | USGSRegion | AlaskaSubsistenceRegion | Latitude | Longtitude |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Confederated Tribes of the Coos, Lower Umpqua ... | Confederated Coos | | Tribe | | Brad | | Kneaper | | | ... | | Oregon/Washington | Columbia-Pacific Northwest | Region 1 - Pacific | | Pacific West | Northwest | | 43.383287 | -124.264684 |
| 1 | Confederated Tribes and Bands of the Yakama Na... | Confederated Yakama | | Tribe | | Gerald | | Lewis | | | ... | | Oregon/Washington | Columbia-Pacific Northwest | Region 1 - Pacific | | Pacific West | Northwest | | 46.377351 | -120.308667 |
| 2 | Suquamish Indian Tribe of the Port Madison Res... | Suquamish | | Tribe | | Leonard | | Forsman | | | ... | | Oregon/Washington | Columbia-Pacific Northwest | Region 1 - Pacific | | Pacific West | Northwest | | 47.730754 | -122.561621 |
| 3 | Tulalip Tribes of Washington | Tulalip | | Tribe | | Teri | | Gobin | | | ... | | Oregon/Washington | Columbia-Pacific Northwest | Region 1 - Pacific | | Pacific West | Northwest | | 48.054661 | -122.258542 |
| 5 | Quinault Indian Nation | Quinault | | Tribe | | Guy | | Capoeman | | | ... | | Oregon/Washington | Columbia-Pacific Northwest | Region 1 - Pacific | | Pacific West | Northwest | | 47.347305 | -124.293240 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 582 | Cayuga Nation | Cayuga Nation of New York | | Tribe | | Clint | | Halftown | | | ... | | Eastern States | | Region 5 - Northeast | | Northeast | Northeast | | 42.910622 | -76.796622 |
| 583 | Cedarville Rancheria, California | Cedarville | | Tribe | | Melissa | | Daniello | | | ... | | California | California-Great Basin | Region 8 - Pacific Southwest | | Pacific West | Pacific | | 41.483971 | -120.545339 |
| 584 | Cher-Ae Heights Indian Community of the Trinid... | Cher-Ae Heights | Trinidad Rancheria | Tribe | | Garth | | Sundberg | Sr. | | ... | | California | California-Great Basin | Region 8 - Pacific Southwest | | Pacific West | Pacific | | 41.059291 | -124.143125 |
| 585 | Cheyenne and Arapaho Tribes, Oklahoma | Cheyenne River | | Tribe | | Reggie | | Wassana | | | ... | | New Mexico | Missouri Basin | Region 2 - Southwest | | Intermountain | Southwest | | 35.614724 | -97.992524 |
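
The exact-match filter above treats names that differ only in punctuation, spacing, or case as different names. The Wikipedia page title "Alabama–Coushatta Tribe of Texas" uses an en dash, for instance, while its redirect aliases use a plain hyphen. A minimal sketch of a comparison key that irons out those differences; `normalize_name` is a hypothetical helper, and it deliberately leaves suffixes such as ", California" alone, since deciding which parts of a name are significant needs more care than string cleanup.

```python
import re
import unicodedata

# Map Unicode hyphen and dash variants (U+2010 through U+2014) to an ASCII hyphen
_DASHES = dict.fromkeys(map(ord, "\u2010\u2011\u2012\u2013\u2014"), "-")


def normalize_name(name):
    """Return a comparison key for a tribe name, or None for missing values."""
    if not isinstance(name, str):
        return None  # covers None and NaN
    text = unicodedata.normalize("NFKC", name).translate(_DASHES)
    text = re.sub(r"\s+", " ", text).strip()  # collapse runs of whitespace
    return text.casefold()


key_en_dash = normalize_name("Alabama–Coushatta Tribe of Texas")
key_hyphen = normalize_name("Alabama-Coushatta  Tribe of Texas")
```

Applying the same key to the BIA name fields and to the Wikipedia names and aliases before calling `isin` would be one way to catch these near-matches while keeping the original names untouched for display.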
