# Identifying Ethnicity in OpenSAFELY-TPP
This short report describes how ethnicity can be identified in the OpenSAFELY-TPP database, and the strengths and weaknesses of each method. This is a living document that will be updated to reflect changes to the OpenSAFELY-TPP database and to the patient records it contains.

## OpenSAFELY
OpenSAFELY is an analytics platform for running analyses on electronic health records (EHRs) inside the secure environment where the records are held. This has multiple benefits:

* We don't transport large volumes of potentially disclosive pseudonymised patient data outside of the secure environments for analysis
* Analyses can run in near real time, because records are ready for analysis as soon as they appear in the secure environment
* All infrastructure and analysis code is stored in GitHub repositories, which are open for security review, scientific review, and re-use

A key feature of OpenSAFELY is the use of study definitions, which are formal specifications of the datasets to be generated from the OpenSAFELY database. This takes care of much of the complex EHR data wrangling required to create a dataset in an analysis-ready format. It also creates a library of standardised and validated variable definitions that can be deployed consistently across multiple projects. 

The purpose of this report is to describe all such variables that relate to ethnicity, their relative strengths and weaknesses, and the scenarios in which they are best deployed. It will also describe potential future definitions that have not yet been implemented.

## Available Records
OpenSAFELY-TPP runs inside TPP’s data centre, which contains the primary care records for all patients registered at practices using TPP’s SystmOne Clinical Information System. This data centre also imports external datasets from other sources, including A&E attendances and hospital admissions from NHS Digital’s Secondary Uses Service (SUS), and death registrations from the Office for National Statistics (ONS). More information on available data sources can be found in the OpenSAFELY documentation.


```python
import os
import pandas as pd
import numpy as np
from itertools import product
from IPython.display import display, Markdown, Image

pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', 500)
pd.options.mode.chained_assignment = None
pd.options.display.float_format = '{:,.0f}'.format
```

```python
def local_patient_counts(
    definitions, output_path, code_dict=None, categories=False, missing=False,
):
    """Display patient counts, with percentages of the population, for each
    ethnicity definition, broken down by group and subgroup."""
    # Counts are read from "_filled" columns; missing counts are derived below
    suffix = "_missing" if missing else "_filled"
    overlap = "all_missing" if missing else "all_filled"

    if categories:
        df_append = pd.read_csv(
            f"../output/{output_path}/simple_patient_counts_categories.csv"
        ).set_index(["group", "subgroup"])
        # One column per category/definition pair, e.g. "White_ethnicity_5".
        # All definitions share the same labels, so take them from the first one.
        definitions = [
            f"{category}_{definition}"
            for category, definition in product(
                code_dict[definitions[0]].values(), definitions
            )
        ]
    else:
        df_append = pd.read_csv(
            f"../output/{output_path}/simple_patient_counts.csv"
        ).set_index(["group", "subgroup"])

    def count_pct(count, pct):
        # Combine a count column and a percentage column as "1,234 (5.6)"
        return count.apply(lambda x: "{:,.0f}".format(x)) + " (" + pct.astype(str) + ")"

    for definition in definitions:
        if missing:
            df_append[definition + suffix] = (
                df_append["population"] - df_append[definition + "_filled"]
            )
        pct = round(df_append[definition + suffix].div(df_append["population"]) * 100, 1)
        df_append[definition] = count_pct(df_append[definition + suffix], pct)
        df_append = df_append.drop(columns=[definition + suffix])

    # The overlap column only needs formatting once, outside the loop
    overlap_pct = round(df_append[overlap].div(df_append["population"]) * 100, 1)
    df_append[overlap] = count_pct(df_append[overlap], overlap_pct)

    df_patient_counts = df_append[definitions + [overlap, "population"]]
    # Final redaction step: show suppressed values as "-"
    df_patient_counts = df_patient_counts.replace(np.nan, "-")
    df_patient_counts = df_patient_counts.replace("nan (nan)", "- (-)")
    df_patient_counts.columns = df_patient_counts.columns.str.replace("_", " ")
    display(df_patient_counts)
```
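A minimal sketch of the `count (pct)` formatting that `local_patient_counts` applies to each column, run on a hypothetical two-row frame instead of the released CSV. The row labels and the `NaN` standing in for a suppressed count are assumptions for illustration; the first row uses the `all` figures from the results below.

```python
import numpy as np
import pandas as pd

# Hypothetical input in the layout local_patient_counts expects:
# a "<definition>_filled" column plus a "population" column.
# The NaN stands in for a suppressed (redacted) count.
df = pd.DataFrame(
    {"ethnicity_5_filled": [18853950, np.nan], "population": [24772720, 5]},
    index=pd.Index(["with records", "missing"], name="subgroup"),
)

pct = round(df["ethnicity_5_filled"].div(df["population"]) * 100, 1)
df["ethnicity_5"] = (
    df["ethnicity_5_filled"].apply(lambda x: "{:,.0f}".format(x))
    + " ("
    + pct.astype(str)
    + ")"
)
# Suppressed cells come out as "nan (nan)"; the report shows them as "- (-)"
df["ethnicity_5"] = df["ethnicity_5"].replace("nan (nan)", "- (-)")
print(df["ethnicity_5"].tolist())
```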


```python
### CONFIGURE ###
definitions = ['ethnicity_5', 'ethnicity_new_5', 'ethnicity_primis_5']
covariates = ['_age_band','_sex','_region','_imd','_dementia','_diabetes','_hypertension','_learning_disability']
output_path = 'released/output'
suffixes = ['','_missing']

code_dict = {
    "imd": {
        0: "Unknown",
        1: "1 Most deprived",
        2: "2",
        3: "3",
        4: "4",
        5: "5 Least deprived",
    },
    "ethnicity_5": {1: "White", 2: "Mixed", 3: "Asian", 4: "Black", 5: "Other"},
    "ethnicity_new_5": {1: "White", 2: "Mixed", 3: "Asian", 4: "Black", 5: "Other"},
    "ethnicity_primis_5": {1: "White", 2: "Mixed", 3: "Asian", 4: "Black", 5: "Other"},
}
```
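Each `code_dict` entry maps numeric category codes to display labels. A sketch of how such a mapping could label hypothetical patient-level values with `pandas.Series.map`, together with the lower-cased label list that the later cells use to order table columns. Treating `0` as "no ethnicity recorded" is an assumption made here for illustration.

```python
import pandas as pd

# Labels as configured for the 5-group definitions above
ethnicity_5_labels = {1: "White", 2: "Mixed", 3: "Asian", 4: "Black", 5: "Other"}

# Hypothetical patient-level codes; 0 is assumed to mean "no ethnicity recorded"
codes = pd.Series([1, 3, 4, 0, 2, 5])
labels = codes.map(ethnicity_5_labels)  # codes without a label become NaN
print(labels.tolist())

# Lower-cased labels in display order, used as the column order of later tables
lowerlist = [x.lower() for x in ethnicity_5_labels.values()]
print(lowerlist)
```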



## Results

### Count of Patients

```python
local_patient_counts(definitions, output_path)
```

| group | subgroup | ethnicity 5 | ethnicity new 5 | ethnicity primis 5 | all filled | population |
|---|---|---|---|---|---|---|
| all | with records | 18,853,950 (76.1) | 18,592,275 (75.1) | 14,750,720 (59.5) | 14,581,555 (58.9) | 24772720 |
| age_band | 0-19 | 3,390,325 (62.3) | 3,343,035 (61.4) | 2,549,525 (46.8) | 2,522,430 (46.3) | 5443105 |
| age_band | 20-29 | 2,205,355 (71.7) | 2,174,105 (70.7) | 1,743,815 (56.7) | 1,724,100 (56.0) | 3076605 |
| age_band | 30-39 | 2,901,825 (81.6) | 2,855,865 (80.3) | 2,324,640 (65.3) | 2,293,945 (64.5) | 3557485 |
| age_band | 40-49 | 2,641,115 (82.8) | 2,598,215 (81.4) | 2,106,740 (66.0) | 2,078,585 (65.1) | 3191130 |
| age_band | 50-59 | 2,762,175 (81.4) | 2,725,415 (80.3) | 2,182,860 (64.3) | 2,158,330 (63.6) | 3395235 |
| age_band | 60-69 | 2,213,925 (81.9) | 2,187,485 (80.9) | 1,738,795 (64.3) | 1,720,795 (63.7) | 2702765 |
| age_band | 70-79 | 1,785,160 (82.3) | 1,764,745 (81.4) | 1,391,405 (64.1) | 1,377,315 (63.5) | 2169175 |
| age_band | 80+ | 954,065 (77.1) | 943,410 (76.3) | 712,940 (57.6) | 706,050 (57.1) | 1237215 |
| age_band | missing | - (-) | - (-) | - (-) | - (-) | 5 |
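The bracketed figures are percentages of each row's `population`. Recomputing the `all` row from the counts in the table above, as a quick check of the arithmetic:

```python
# Counts and population copied from the "all, with records" row above
population = 24772720
filled = {
    "ethnicity_5": 18853950,
    "ethnicity_new_5": 18592275,
    "ethnicity_primis_5": 14750720,
    "all_filled": 14581555,
}

# Percentage of the population, rounded to one decimal place as in the table
pct = {name: round(count / population * 100, 1) for name, count in filled.items()}
print(pct)

# Percentage-point gap between the most and least complete single definitions
gap = round(pct["ethnicity_5"] - pct["ethnicity_primis_5"], 1)
print(gap)
```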


### Count of Missings

```python
local_patient_counts(definitions, output_path, missing=True)
```

| group | subgroup | ethnicity 5 | ethnicity new 5 | ethnicity primis 5 | all missing | population |
|---|---|---|---|---|---|---|
| all | with records | 5,918,770 (23.9) | 6,180,445 (24.9) | 10,022,000 (40.5) | 5,918,770 (23.9) | 24772720 |
| age_band | 0-19 | 2,052,780 (37.7) | 2,100,070 (38.6) | 2,893,580 (53.2) | 2,052,785 (37.7) | 5443105 |
| age_band | 20-29 | 871,250 (28.3) | 902,500 (29.3) | 1,332,790 (43.3) | 871,250 (28.3) | 3076605 |
| age_band | 30-39 | 655,660 (18.4) | 701,620 (19.7) | 1,232,845 (34.7) | 655,655 (18.4) | 3557485 |
| age_band | 40-49 | 550,015 (17.2) | 592,915 (18.6) | 1,084,390 (34.0) | 550,015 (17.2) | 3191130 |
| age_band | 50-59 | 633,060 (18.6) | 669,820 (19.7) | 1,212,375 (35.7) | 633,060 (18.6) | 3395235 |
| age_band | 60-69 | 488,840 (18.1) | 515,280 (19.1) | 963,970 (35.7) | 488,840 (18.1) | 2702765 |
| age_band | 70-79 | 384,015 (17.7) | 404,430 (18.6) | 777,770 (35.9) | 384,020 (17.7) | 2169175 |
| age_band | 80+ | 283,150 (22.9) | 293,805 (23.7) | 524,275 (42.4) | 283,145 (22.9) | 1237215 |
| age_band | missing | - (-) | - (-) | - (-) | - (-) | 5 |
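With `missing=True`, `local_patient_counts` derives each missing count as `population` minus the matching `_filled` count. The same subtraction on a one-row frame holding the `all` figures from the Count of Patients table reproduces the `all` row of this table:

```python
import pandas as pd

# "all, with records" counts from the Count of Patients table
df = pd.DataFrame(
    {
        "ethnicity_5_filled": [18853950],
        "ethnicity_new_5_filled": [18592275],
        "ethnicity_primis_5_filled": [14750720],
        "population": [24772720],
    }
)

for definition in ["ethnicity_5", "ethnicity_new_5", "ethnicity_primis_5"]:
    # Patients with no value recorded under this definition
    df[definition + "_missing"] = df["population"] - df[definition + "_filled"]
    df[definition + "_pct"] = round(
        df[definition + "_missing"] / df["population"] * 100, 1
    )

missing_counts = df.filter(like="_missing").iloc[0].tolist()
missing_pct = df.filter(like="_pct").iloc[0].tolist()
print(missing_counts, missing_pct)
```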


### Count by Category

```python
local_patient_counts(definitions, output_path, code_dict, categories=True, missing=False)
```

| group | subgroup | White ethnicity 5 | White ethnicity new 5 | White ethnicity primis 5 | Mixed ethnicity 5 | Mixed ethnicity new 5 | Mixed ethnicity primis 5 | Asian ethnicity 5 | Asian ethnicity new 5 | Asian ethnicity primis 5 | Black ethnicity 5 | Black ethnicity new 5 | Black ethnicity primis 5 | Other ethnicity 5 | Other ethnicity new 5 | Other ethnicity primis 5 | all filled | population |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| all | with records | 15,744,455 (63.6) | 15,582,325 (62.9) | 12,115,495 (48.9) | 352,710 (1.4) | 348,070 (1.4) | 319,045 (1.3) | 1,639,170 (6.6) | 1,647,140 (6.6) | 1,426,990 (5.8) | 567,540 (2.3) | 560,600 (2.3) | 444,565 (1.8) | 550,080 (2.2) | 454,140 (1.8) | 444,625 (1.8) | 14,581,555 (58.9) | 24772720 |
| age_band | 0-19 | 2,648,150 (48.7) | 2,622,960 (48.2) | 1,936,315 (35.6) | 130,995 (2.4) | 129,435 (2.4) | 112,240 (2.1) | 369,240 (6.8) | 370,135 (6.8) | 315,835 (5.8) | 134,465 (2.5) | 132,920 (2.4) | 100,560 (1.8) | 107,475 (2.0) | 87,585 (1.6) | 84,575 (1.6) | 2,522,430 (46.3) | 5443105 |
| age_band | 20-29 | 1,693,715 (55.1) | 1,678,160 (54.5) | 1,310,935 (42.6) | 59,925 (1.9) | 59,230 (1.9) | 54,115 (1.8) | 247,195 (8.0) | 248,555 (8.1) | 210,825 (6.9) | 84,140 (2.7) | 83,145 (2.7) | 65,945 (2.1) | 120,380 (3.9) | 105,020 (3.4) | 101,995 (3.3) | 1,724,100 (56.0) | 3076605 |
| age_band | 30-39 | 2,271,350 (63.8) | 2,247,770 (63.2) | 1,787,685 (50.3) | 61,795 (1.7) | 60,925 (1.7) | 57,270 (1.6) | 349,755 (9.8) | 351,135 (9.9) | 303,130 (8.5) | 99,725 (2.8) | 98,500 (2.8) | 78,665 (2.2) | 119,195 (3.4) | 97,530 (2.7) | 97,885 (2.8) | 2,293,945 (64.5) | 3557485 |
| age_band | 40-49 | 2,095,685 (65.7) | 2,071,690 (64.9) | 1,637,130 (51.3) | 45,080 (1.4) | 44,365 (1.4) | 42,875 (1.3) | 302,910 (9.5) | 304,805 (9.6) | 268,320 (8.4) | 102,135 (3.2) | 100,755 (3.2) | 81,895 (2.6) | 95,305 (3.0) | 76,600 (2.4) | 76,520 (2.4) | 2,078,585 (65.1) | 3191130 |
| age_band | 50-59 | 2,417,855 (71.2) | 2,391,820 (70.4) | 1,887,020 (55.6) | 30,950 (0.9) | 30,505 (0.9) | 29,485 (0.9) | 173,880 (5.1) | 175,655 (5.2) | 155,570 (4.6) | 83,025 (2.4) | 82,015 (2.4) | 66,690 (2.0) | 56,460 (1.7) | 45,415 (1.3) | 44,095 (1.3) | 2,158,330 (63.6) | 3395235 |
| age_band | 60-69 | 2,017,065 (74.6) | 1,996,360 (73.9) | 1,570,050 (58.1) | 14,845 (0.5) | 14,630 (0.5) | 14,305 (0.5) | 112,360 (4.2) | 112,770 (4.2) | 99,470 (3.7) | 39,020 (1.4) | 38,535 (1.4) | 31,110 (1.2) | 30,640 (1.1) | 25,185 (0.9) | 23,865 (0.9) | 1,720,795 (63.7) | 2702765 |
| age_band | 70-79 | 1,694,105 (78.1) | 1,676,415 (77.3) | 1,313,550 (60.6) | 6,060 (0.3) | 5,965 (0.3) | 5,795 (0.3) | 56,020 (2.6) | 56,325 (2.6) | 49,565 (2.3) | 14,340 (0.7) | 14,170 (0.7) | 11,370 (0.5) | 14,635 (0.7) | 11,870 (0.5) | 11,125 (0.5) | 1,377,315 (63.5) | 2169175 |
| age_band | 80+ | 906,520 (73.3) | 897,150 (72.5) | 672,815 (54.4) | 3,055 (0.2) | 3,010 (0.2) | 2,965 (0.2) | 27,810 (2.2) | 27,760 (2.2) | 24,275 (2.0) | 10,685 (0.9) | 10,550 (0.9) | 8,325 (0.7) | 5,990 (0.5) | 4,940 (0.4) | 4,565 (0.4) | 706,050 (57.1) | 1237215 |
| age_band | missing | - (-) | - (-) | - (-) | - (-) | - (-) | - (-) | - (-) | - (-) | - (-) | - (-) | - (-) | - (-) | - (-) | - (-) | - (-) | - (-) | 5 |
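Within each definition, the five category counts should add up to the number of patients with that definition recorded in the Count of Patients table. Every count in these tables ends in 0 or 5, which suggests they are rounded to the nearest 5, so a small discrepancy is possible. A quick check on the `all` row, with values copied from the two tables:

```python
# White, Mixed, Asian, Black, Other counts from the "all, with records" row above
category_counts = {
    "ethnicity_5": [15744455, 352710, 1639170, 567540, 550080],
    "ethnicity_new_5": [15582325, 348070, 1647140, 560600, 454140],
    "ethnicity_primis_5": [12115495, 319045, 1426990, 444565, 444625],
}
# Patients with each definition recorded, from the Count of Patients table
filled_totals = {
    "ethnicity_5": 18853950,
    "ethnicity_new_5": 18592275,
    "ethnicity_primis_5": 14750720,
}

differences = {}
for name, counts in category_counts.items():
    differences[name] = sum(counts) - filled_totals[name]
    print(f"{name}: categories sum to {sum(counts):,} (difference {differences[name]:+,})")
```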


### Overlapping Definitions
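Planned: show the overlap between the three definitions with an UpSet plot. The cell below, which would display a heatmap figure, is currently commented out.

```python
# display(Image(f"../output/{output_path}/../figures/heatmap.png"))
```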
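The bars of an UpSet plot are the sizes of each exact combination of definitions under which a patient has a value recorded. A minimal pandas sketch of those intersection sizes, computed on a hypothetical patient-level frame of boolean flags (the flag columns are an assumption; the released outputs above are already aggregated). A dedicated package such as `upsetplot` could then draw the result.

```python
import pandas as pd

# Hypothetical flags: True if the patient has a value under that definition
df = pd.DataFrame(
    {
        "ethnicity_5": [True, True, True, False, True, False],
        "ethnicity_new_5": [True, True, False, False, True, False],
        "ethnicity_primis_5": [True, False, False, False, True, False],
    }
)

# Number of patients in each exact combination of definitions
intersections = df.value_counts().sort_index(ascending=False)
print(intersections)

counts = {combo: int(n) for combo, n in intersections.items()}
```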

### Latest vs. Most Common

```python
for definition in definitions:
    # Category labels in display order, plus lower-cased versions for columns
    labels = list(code_dict[definition].values())
    lowerlist = [x.lower() for x in labels]
    for suffix in suffixes:
        df_sum = pd.read_csv(
            f"../output/{output_path}/simple_latest_common_{definition}{suffix}.csv"
        ).set_index(definition)
        df_sum.columns = df_sum.columns.str.replace(definition + "_", "", regex=False).str.lower()
        # Sort rows AND columns into the same category order before taking the
        # diagonal, so that diagonal cells are the same-category (matching) counts
        df_sum = df_sum.reindex(labels)[lowerlist]

        # Matching = diagonal; not matching = rest of the row (NaNs ignored)
        values = df_sum.to_numpy(dtype=float)
        matching = np.diagonal(values).copy()
        off_diagonal = values.copy()
        np.fill_diagonal(off_diagonal, 0)
        df_out = pd.DataFrame(
            {"matching": matching, "not matching": np.nansum(off_diagonal, axis=1)},
            index=df_sum.index,
        )
        pct = round(df_out.sum() / df_out.sum().sum() * 100, 1)
        df_out.columns = [
            f"matching ({pct['matching']}%)",
            f"not matching ({pct['not matching']}%)",
        ]
        display(df_out)

        # Combine counts with row percentages, e.g. "15,715,175 (99.2)"
        df_sum["population"] = df_sum[lowerlist].sum(axis=1)
        for item in lowerlist:
            item_pct = round(df_sum[item].div(df_sum["population"]) * 100, 1)
            df_sum[item] = (
                df_sum[item].apply(lambda x: "{:,.0f}".format(x))
                + " ("
                + item_pct.astype(str)
                + ")"
            )
        display(df_sum[lowerlist])
    # df_expanded = pd.read_csv(f'../output/{output_path}/tables/latest_common_expanded_{definition}.csv').set_index(definition)
    # display(df_expanded)
```

| ethnicity_5 | matching (97.7%) | not matching (2.3%) |
|---|---|---|
| White | 15715175 | 120825 |
| Mixed | 321470 | 97355 |
| Asian | 1624315 | 67770 |
| Black | 555610 | 50225 |
| Other | 512885 | 113930 |


| ethnicity_5 | white | mixed | asian | black | other |
|---|---|---|---|---|---|
| White | 15,715,175 (99.2) | 33,505 (0.2) | 22,835 (0.1) | 16,475 (0.1) | 48,010 (0.3) |
| Mixed | 44,600 (10.6) | 321,470 (76.8) | 13,805 (3.3) | 25,375 (6.1) | 13,575 (3.2) |
| Asian | 20,355 (1.2) | 11,710 (0.7) | 1,624,315 (96.0) | 4,315 (0.3) | 31,390 (1.9) |
| Black | 18,820 (3.1) | 19,625 (3.2) | 4,135 (0.7) | 555,610 (91.7) | 7,645 (1.3) |
| Other | 61,085 (9.7) | 12,765 (2.0) | 31,875 (5.1) | 8,205 (1.3) | 512,885 (81.8) |


| ethnicity_5 | matching (96.8%) | not matching (3.2%) |
|---|---|---|
| White | 970885 | 10660 |
| Mixed | 16145 | 8425 |
| Asian | 123900 | 5740 |
| Black | 38935 | 4555 |
| Other | 19730 | 9160 |


| ethnicity_5 | white | mixed | asian | black | other |
|---|---|---|---|---|---|
| White | 970,885 (98.9) | 3,040 (0.3) | 2,155 (0.2) | 1,400 (0.1) | 4,065 (0.4) |
| Mixed | 4,090 (16.6) | 16,145 (65.7) | 1,270 (5.2) | 2,185 (8.9) | 880 (3.6) |
| Asian | 2,290 (1.8) | 805 (0.6) | 123,900 (95.6) | 510 (0.4) | 2,135 (1.6) |
| Black | 2,090 (4.8) | 1,310 (3.0) | 495 (1.1) | 38,935 (89.5) | 660 (1.5) |
| Other | 4,580 (15.9) | 910 (3.1) | 3,025 (10.5) | 645 (2.2) | 19,730 (68.3) |


| ethnicity_new_5 | matching (97.9%) | not matching (2.1%) |
|---|---|---|
| White | 15557865 | 109605 |
| Mixed | 317765 | 95425 |
| Asian | 1635455 | 60285 |
| Black | 550055 | 48125 |
| Other | 426920 | 86965 |


| ethnicity_new_5 | white | mixed | asian | black | other |
|---|---|---|---|---|---|
| White | 15,557,865 (99.3) | 32,935 (0.2) | 23,020 (0.1) | 16,375 (0.1) | 37,275 (0.2) |
| Mixed | 44,800 (10.8) | 317,765 (76.9) | 14,285 (3.5) | 25,565 (6.2) | 10,775 (2.6) |
| Asian | 20,695 (1.2) | 11,915 (0.7) | 1,635,455 (96.4) | 4,330 (0.3) | 23,345 (1.4) |
| Black | 18,710 (3.1) | 19,100 (3.2) | 4,155 (0.7) | 550,055 (92.0) | 6,160 (1.0) |
| Other | 48,475 (9.4) | 9,825 (1.9) | 22,085 (4.3) | 6,580 (1.3) | 426,920 (83.1) |


| ethnicity_new_5 | matching (97.1%) | not matching (2.9%) |
|---|---|---|
| White | 962340 | 8935 |
| Mixed | 16000 | 8250 |
| Asian | 123845 | 5210 |
| Black | 38755 | 4290 |
| Other | 17965 | 8260 |


| ethnicity_new_5 | white | mixed | asian | black | other |
|---|---|---|---|---|---|
| White | 962,340 (99.1) | 2,985 (0.3) | 2,185 (0.2) | 1,410 (0.1) | 2,355 (0.2) |
| Mixed | 4,145 (17.1) | 16,000 (66.0) | 1,300 (5.4) | 2,195 (9.1) | 610 (2.5) |
| Asian | 2,340 (1.8) | 835 (0.6) | 123,845 (96.0) | 520 (0.4) | 1,515 (1.2) |
| Black | 2,085 (4.8) | 1,265 (2.9) | 500 (1.2) | 38,755 (90.0) | 440 (1.0) |
| Other | 4,310 (16.4) | 825 (3.1) | 2,485 (9.5) | 640 (2.4) | 17,965 (68.5) |


| ethnicity_primis_5 | matching (98.0%) | not matching (2.0%) |
|---|---|---|
| White | 12098680 | 86070 |
| Mixed | 299750 | 70385 |
| Asian | 1419295 | 43705 |
| Black | 437190 | 36055 |
| Other | 425400 | 67605 |


| ethnicity_primis_5 | white | mixed | asian | black | other |
|---|---|---|---|---|---|
| White | 12,098,680 (99.3) | 25,215 (0.2) | 17,645 (0.1) | 11,535 (0.1) | 31,675 (0.3) |
| Mixed | 32,760 (8.9) | 299,750 (81.0) | 10,765 (2.9) | 17,695 (4.8) | 9,165 (2.5) |
| Asian | 16,245 (1.1) | 8,785 (0.6) | 1,419,295 (97.0) | 2,560 (0.2) | 16,115 (1.1) |
| Black | 13,630 (2.9) | 15,065 (3.2) | 2,630 (0.6) | 437,190 (92.4) | 4,730 (1.0) |
| Other | 35,640 (7.2) | 8,775 (1.8) | 18,040 (3.7) | 5,150 (1.0) | 425,400 (86.3) |


| ethnicity_primis_5 | matching (97.6%) | not matching (2.4%) |
|---|---|---|
| White | 875420 | 5930 |
| Mixed | 16705 | 6345 |
| Asian | 121565 | 3320 |
| Black | 36050 | 2985 |
| Other | 3165 | 6945 |


| ethnicity_primis_5 | white | mixed | asian | black | other |
|---|---|---|---|---|---|
| White | 875,420 (99.3) | 2,575 (0.3) | 1,900 (0.2) | 1,230 (0.1) | 225 (0.0) |
| Mixed | 3,295 (14.3) | 16,705 (72.5) | 1,045 (4.5) | 1,905 (8.3) | 100 (0.4) |
| Asian | 2,020 (1.6) | 690 (0.6) | 121,565 (97.3) | 415 (0.3) | 195 (0.2) |
| Black | 1,605 (4.1) | 975 (2.5) | 350 (0.9) | 36,050 (92.4) | 55 (0.1) |
| Other | 3,650 (36.1) | 680 (6.7) | 2,080 (20.6) | 535 (5.3) | 3,165 (31.3) |
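In each cross-tabulation the rows and columns list the categories in the same order. Assuming rows and columns carry the two assignments being compared (latest and most common), the diagonal counts patients whose assignments agree and the rest of each row counts those whose assignments differ; the header percentages are each column total over the grand total. A NumPy sketch reproducing the first `ethnicity_5` summary from its cross-tabulation:

```python
import numpy as np

categories = ["White", "Mixed", "Asian", "Black", "Other"]
# Counts copied from the first ethnicity_5 cross-tabulation above
crosstab = np.array(
    [
        [15715175, 33505, 22835, 16475, 48010],
        [44600, 321470, 13805, 25375, 13575],
        [20355, 11710, 1624315, 4315, 31390],
        [18820, 19625, 4135, 555610, 7645],
        [61085, 12765, 31875, 8205, 512885],
    ]
)

matching = np.diagonal(crosstab)                # same category both ways
not_matching = crosstab.sum(axis=1) - matching  # row total minus the diagonal
total = crosstab.sum()

matching_pct = round(matching.sum() / total * 100, 1)
not_matching_pct = round(not_matching.sum() / total * 100, 1)
row_pct = np.round(matching / crosstab.sum(axis=1) * 100, 1)

print(matching_pct, not_matching_pct)
print(dict(zip(categories, row_pct.tolist())))
```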


### State Change

```python
for definition in definitions:
    # Define the column order here rather than relying on a variable
    # left over from the previous cell
    lowerlist = [x.lower() for x in code_dict[definition].values()]
    for suffix in suffixes:
        df_state_change = pd.read_csv(
            f"../output/{output_path}/simple_state_change_{definition}{suffix}.csv"
        ).set_index(definition)
        df_state_change.columns = df_state_change.columns.str.replace(definition + "_", "", regex=False)
        # Sort rows into category order
        df_state_change = df_state_change.reindex(list(code_dict[definition].values()))
        df_state_change = df_state_change.reset_index()
        # Label each row with its total, e.g. "White: 15,744,455"
        df_state_change[definition] = (
            df_state_change[definition]
            + ": "
            + df_state_change["n"].apply(lambda x: "{:,.0f}".format(x))
        )
        df_state_change = df_state_change.set_index(definition)
        # Express each cell as "count (% of the row total n)"
        for item in lowerlist:
            item_pct = round(df_state_change[item].div(df_state_change["n"]) * 100, 1)
            df_state_change[item] = (
                df_state_change[item].apply(lambda x: "{:,.0f}".format(x))
                + " ("
                + item_pct.astype(str)
                + ")"
            )
        df_state_change = df_state_change[lowerlist]
        display(df_state_change)
```

| ethnicity_5 | white | mixed | asian | black | other |
|---|---|---|---|---|---|
| White: 15,744,455 | 15,744,455 (100.0) | 59,000 (0.4) | 30,170 (0.2) | 24,255 (0.2) | 96,905 (0.6) |
| Mixed: 352,710 | 59,045 (16.7) | 352,710 (100.0) | 17,065 (4.8) | 32,075 (9.1) | 18,840 (5.3) |
| Asian: 1,639,170 | 39,805 (2.4) | 23,815 (1.5) | 1,639,170 (100.0) | 8,260 (0.5) | 53,430 (3.3) |
| Black: 567,540 | 31,275 (5.5) | 39,040 (6.9) | 6,365 (1.1) | 567,540 (100.0) | 13,500 (2.4) |
| Other: 550,080 | 73,060 (13.3) | 20,330 (3.7) | 43,700 (7.9) | 11,075 (2.0) | 550,080 (100.0) |


| ethnicity_5 | white | mixed | asian | black | other |
|---|---|---|---|---|---|
| White: 977,745 | 977,745 (100.0) | 6,610 (0.7) | 3,590 (0.4) | 2,840 (0.3) | 11,520 (1.2) |
| Mixed: 22,545 | 5,990 (26.6) | 22,545 (100.0) | 1,765 (7.8) | 3,130 (13.9) | 1,725 (7.7) |
| Asian: 127,645 | 5,150 (4.0) | 2,750 (2.2) | 127,645 (100.0) | 1,295 (1.0) | 5,590 (4.4) |
| Black: 41,990 | 3,740 (8.9) | 3,755 (8.9) | 910 (2.2) | 41,990 (100.0) | 1,525 (3.6) |
| Other: 27,015 | 6,155 (22.8) | 1,830 (6.8) | 4,300 (15.9) | 1,090 (4.0) | 27,015 (100.0) |


| ethnicity_new_5 | white | mixed | asian | black | other |
|---|---|---|---|---|---|
| White: 15,582,325 | 15,582,325 (100.0) | 58,365 (0.4) | 30,245 (0.2) | 23,950 (0.2) | 76,590 (0.5) |
| Mixed: 348,070 | 58,435 (16.8) | 348,070 (100.0) | 17,300 (5.0) | 31,765 (9.1) | 14,745 (4.2) |
| Asian: 1,647,140 | 40,245 (2.4) | 24,330 (1.5) | 1,647,140 (100.0) | 8,250 (0.5) | 38,360 (2.3) |
| Black: 560,600 | 30,940 (5.5) | 38,735 (6.9) | 6,340 (1.1) | 560,600 (100.0) | 10,785 (1.9) |
| Other: 454,140 | 56,640 (12.5) | 15,260 (3.4) | 30,175 (6.6) | 8,700 (1.9) | 454,140 (100.0) |


| ethnicity_new_5 | white | mixed | asian | black | other |
|---|---|---|---|---|---|
| White: 967,210 | 967,210 (100.0) | 6,535 (0.7) | 3,580 (0.4) | 2,805 (0.3) | 8,920 (0.9) |
| Mixed: 22,125 | 5,920 (26.8) | 22,125 (100.0) | 1,765 (8.0) | 3,095 (14.0) | 1,360 (6.1) |
| Asian: 126,775 | 5,150 (4.1) | 2,770 (2.2) | 126,775 (100.0) | 1,295 (1.0) | 4,495 (3.5) |
| Black: 41,430 | 3,690 (8.9) | 3,720 (9.0) | 905 (2.2) | 41,430 (100.0) | 1,265 (3.1) |
| Other: 23,860 | 5,120 (21.5) | 1,455 (6.1) | 3,335 (14.0) | 910 (3.8) | 23,860 (100.0) |


| ethnicity_primis_5 | white | mixed | asian | black | other |
|---|---|---|---|---|---|
| White: 12,115,495 | 12,115,495 (100.0) | 39,275 (0.3) | 22,145 (0.2) | 15,425 (0.1) | 55,380 (0.5) |
| Mixed: 319,045 | 41,430 (13.0) | 319,045 (100.0) | 12,685 (4.0) | 21,250 (6.7) | 11,665 (3.7) |
| Asian: 1,426,990 | 28,540 (2.0) | 15,680 (1.1) | 1,426,990 (100.0) | 4,215 (0.3) | 24,545 (1.7) |
| Black: 444,565 | 19,915 (4.5) | 25,435 (5.7) | 3,835 (0.9) | 444,565 (100.0) | 7,435 (1.7) |
| Other: 444,625 | 41,685 (9.4) | 12,580 (2.8) | 23,455 (5.3) | 6,630 (1.5) | 444,625 (100.0) |


| ethnicity_primis_5 | white | mixed | asian | black | other |
|---|---|---|---|---|---|
| White: 879,355 | 879,355 (100.0) | 4,760 (0.5) | 2,920 (0.3) | 2,010 (0.2) | 4,135 (0.5) |
| Mixed: 21,405 | 4,610 (21.5) | 21,405 (100.0) | 1,380 (6.4) | 2,415 (11.3) | 525 (2.5) |
| Asian: 123,645 | 3,965 (3.2) | 1,935 (1.6) | 123,645 (100.0) | 810 (0.7) | 1,635 (1.3) |
| Black: 38,130 | 2,605 (6.8) | 2,560 (6.7) | 620 (1.6) | 38,130 (100.0) | 430 (1.1) |
| Other: 8,265 | 4,010 (48.5) | 990 (12.0) | 2,400 (29.0) | 670 (8.1) | 8,265 (100.0) |
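Each state-change cell is a count divided by the row total `n` shown in the row label. The cells in a row add up to more than `n`, so a patient can be counted in more than one column. Reproducing the `Mixed` row of the first `ethnicity_5` table:

```python
import pandas as pd

# "Mixed" row of the first ethnicity_5 state-change table above
n = 352710
row = pd.Series(
    {"white": 59045, "mixed": 352710, "asian": 17065, "black": 32075, "other": 18840}
)

# Each cell as a percentage of the row total n
row_pct = (row / n * 100).round(1)
print(row_pct.to_dict())

# Row cells exceed n, so the columns are not mutually exclusive
print(row.sum(), row.sum() > n)
```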
