# Calculating Mutual Information Index by Race


In [None]:
%pip install nycschools
import pandas as pd

In [2]:
from nycschools import schools

In [3]:
district_data = pd.read_csv('district_data.csv')  # adjust the path to wherever your copy of the file lives


This tutorial walks through Python code that calculates a Mutual Information Index (MII) for each racial group in a dataset of school district enrollment. The MII measures the degree of association between two categorical variables, here race and district of enrollment. It helps show how evenly or unevenly each racial group is distributed across the districts.

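<b>Dataset Requirements</b>: The dataset (district_data) should be a pandas DataFrame with one row per district and the following columns:

- 'total_enrollment': total enrollment in the district
- 'white_n', 'black_n', 'asian_n', 'hispanic_n', 'multi_racial_n', 'native_american_n': number of students belonging to each racial group in the district
- 'white_n_prop', 'black_n_prop', 'asian_n_prop', 'hispanic_n_prop', 'multi_racial_n_prop', 'native_american_n_prop': proportion for each racial group in the district; the function in Step 2 reads these columns by appending '_prop' to each count column name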
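
A quick check before running the steps below can catch missing columns early. The sketch uses a small made-up DataFrame (toy_districts is hypothetical, not the tutorial's data). It also assumes that the race + '_prop' columns read in Step 2 hold each group's share of the district's enrollment, that is, the count divided by total_enrollment. The tutorial does not state how those columns were built, so treat that derivation as an assumption.

```python
import pandas as pd

RACE_COLS = ['white_n', 'black_n', 'asian_n', 'hispanic_n', 'multi_racial_n', 'native_american_n']

# Hypothetical toy data standing in for district_data (three made-up districts).
toy_districts = pd.DataFrame({
    'total_enrollment': [100, 200, 100],
    'white_n': [50, 20, 10],
    'black_n': [10, 100, 30],
    'asian_n': [20, 20, 20],
    'hispanic_n': [15, 50, 30],
    'multi_racial_n': [4, 8, 8],
    'native_american_n': [1, 2, 2],
})

# Fail early if any required column is absent.
required = ['total_enrollment'] + RACE_COLS
missing = [col for col in required if col not in toy_districts.columns]
if missing:
    raise KeyError(f"Missing required columns: {missing}")

# ASSUMPTION: '<race>_prop' is the group's share of the district's enrollment.
for race in RACE_COLS:
    toy_districts[race + '_prop'] = toy_districts[race] / toy_districts['total_enrollment']

print(toy_districts[['white_n_prop', 'black_n_prop']])
```

If your data already contains the '_prop' columns, only the column check is needed.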

### Step 1: Calculate the total enrollment and total race proportions

First, we calculate the total enrollment across all districts and the proportion of each racial group in the dataset.

In [None]:
total_enrollment = district_data['total_enrollment'].sum()
total_race_props = {}

for race in ['white_n', 'black_n', 'asian_n', 'hispanic_n', 'multi_racial_n', 'native_american_n']:
    total_race_props[race] = district_data[race].sum() / total_enrollment
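
The same totals can also be computed without an explicit loop. This sketch uses hypothetical toy data (toy_districts, with only two race columns for brevity) and shows that dividing the column sums by the total enrollment gives the same proportions as the loop above.

```python
import pandas as pd

# Hypothetical toy data; only two groups are included to keep the example short.
toy_districts = pd.DataFrame({
    'total_enrollment': [100, 300],
    'white_n': [60, 60],
    'black_n': [40, 240],
})
race_cols = ['white_n', 'black_n']

total_enrollment = toy_districts['total_enrollment'].sum()

# Loop version, as in the cell above.
loop_props = {}
for race in race_cols:
    loop_props[race] = toy_districts[race].sum() / total_enrollment

# Vectorized version: column sums divided by the grand total, returned as a Series.
vector_props = toy_districts[race_cols].sum() / total_enrollment

print(vector_props.to_dict())
# If the race columns cover every student, the proportions sum to 1.
print(vector_props.sum())
```

The Series version can be indexed by column name just like the dictionary, so either form works with the later steps.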

### Step 2: Define the Mutual Information Index function

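Next, we create a function called mutual_information_index that takes a district row, a race column name, and that group's overall proportion as input arguments. It returns the group's MII term for that district: the ratio of the district proportion to the overall proportion, multiplied by the natural log of that ratio. When either proportion is 0, the function returns 0 instead, which avoids taking the log of zero or dividing by zero.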
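
Written as a formula, the value the function returns for one group in one district is shown below. The symbols are introduced here for illustration and do not appear in the code: $p_{r,i}$ stands for prop_i (the proportion of group $r$ in district $i$) and $p_r$ stands for total_race_prop (the proportion of group $r$ across all districts).

$$
m_{r,i} =
\begin{cases}
0, & \text{if } p_{r,i} = 0 \text{ or } p_r = 0 \\[4pt]
\dfrac{p_{r,i}}{p_r} \, \ln\!\left(\dfrac{p_{r,i}}{p_r}\right), & \text{otherwise}
\end{cases}
\qquad\qquad
\mathrm{MII}_r = \sum_i m_{r,i}
$$

The sum on the right is what Step 3 computes for each group.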

In [None]:
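import numpy as np

def mutual_information_index(district, race, total_race_prop):
    # District-level proportion, read from the column named race + '_prop' (e.g. 'white_n_prop')
    prop_i = district[race + '_prop']
    # Overall proportion of the same group across all districts
    prop = total_race_prop
    # Return 0 when either proportion is 0 to avoid log(0) and division by zero
    if prop_i == 0 or prop == 0:
        return 0
    return (prop_i / prop) * np.log(prop_i / prop)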
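
To see how the function behaves, it helps to call it on a few hand-picked values. The rows below are hypothetical (they are not taken from district_data), and the function body is repeated only so that the example runs on its own.

```python
import numpy as np
import pandas as pd

def mutual_information_index(district, race, total_race_prop):
    # Same logic as the tutorial's function, repeated so this example is self-contained.
    prop_i = district[race + '_prop']
    prop = total_race_prop
    if prop_i == 0 or prop == 0:
        return 0
    return (prop_i / prop) * np.log(prop_i / prop)

# Hypothetical district rows; only the column the function reads is included.
over = pd.Series({'white_n_prop': 0.50})    # twice the overall share
equal = pd.Series({'white_n_prop': 0.25})   # same as the overall share
under = pd.Series({'white_n_prop': 0.125})  # half the overall share
absent = pd.Series({'white_n_prop': 0.0})   # group not present in the district

overall = 0.25  # hypothetical overall proportion
for label, row in [('over', over), ('equal', equal), ('under', under), ('absent', absent)]:
    value = mutual_information_index(row, 'white_n', overall)
    print(label, round(float(value), 4))
# over 1.3863
# equal 0.0
# under -0.3466
# absent 0.0
```

Note that the term is positive where the group is over-represented, zero where it matches the overall share, and negative where it is under-represented.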

### Step 3: Calculate the MII for each race

Now, we iterate over the racial groups and apply the mutual_information_index function to every district row using the pandas apply method with axis=1. For each race, we sum the per-district values and store the total in a dictionary called mii_by_race.

In [None]:
mii_by_race = {}

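for race in ['white_n', 'black_n', 'asian_n', 'hispanic_n', 'multi_racial_n', 'native_american_n']:
    # axis=1 passes one district (row) at a time; the keyword arguments are forwarded to the function
    per_district = district_data.apply(
        mutual_information_index, axis=1, race=race, total_race_prop=total_race_props[race]
    )
    mii_by_race[race] = per_district.sum()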
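
On larger datasets, a row-by-row apply can be slow. The sketch below shows one possible vectorized equivalent with NumPy and checks it against the apply version. The data and the overall proportion are hypothetical, and the vectorized path assumes the overall proportion is greater than 0.

```python
import numpy as np
import pandas as pd

def mutual_information_index(district, race, total_race_prop):
    # Same logic as the tutorial's function, repeated so this example is self-contained.
    prop_i = district[race + '_prop']
    prop = total_race_prop
    if prop_i == 0 or prop == 0:
        return 0
    return (prop_i / prop) * np.log(prop_i / prop)

# Hypothetical toy data: per-district proportions for a single group.
toy_districts = pd.DataFrame({'black_n_prop': [0.0, 0.2, 0.4, 0.8]})
total_prop = 0.4  # hypothetical overall proportion; assumed > 0

# Row-by-row version, as in the tutorial.
apply_total = toy_districts.apply(
    mutual_information_index, axis=1, race='black_n', total_race_prop=total_prop
).sum()

# Vectorized version: compute every ratio at once.
ratio = toy_districts['black_n_prop'].to_numpy() / total_prop
# Replace zero ratios with 1 before taking the log; ln(1) = 0 and the ratio
# itself is 0 there, so districts without the group contribute 0.
safe_ratio = np.where(ratio > 0, ratio, 1.0)
vector_total = float(np.sum(ratio * np.log(safe_ratio)))

print(round(float(apply_total), 4), round(vector_total, 4))
```

Both totals should agree up to floating-point rounding.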


### Step 4: Display the results

Finally, we display the MII by race using a simple loop that prints the results.

In [8]:
print("Mutual Information Index by Race:")
for race, mii in mii_by_race.items():
    race_formatted = race[:-2].replace('_', ' ').title()
    print(f"{race_formatted}: {mii:.3f}")



Mutual Information Index by Race:
White: 6.307
Black: 15.712
Asian: 8.258
Hispanic: 5.032
Multi Racial: 7.730
Native American: 8.426
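
For comparison across groups, it can help to sort the values. This sketch copies the numbers from the printed output above into a pandas Series and ranks them from highest to lowest; mii_printed and ranked are names chosen here for illustration.

```python
import pandas as pd

# Values copied from the printed output above.
mii_printed = {
    'White': 6.307,
    'Black': 15.712,
    'Asian': 8.258,
    'Hispanic': 5.032,
    'Multi Racial': 7.730,
    'Native American': 8.426,
}

# Sort from the strongest to the weakest association.
ranked = pd.Series(mii_printed, name='mii').sort_values(ascending=False)
print(ranked.to_string())
```

In a live notebook, the same sort can be applied to pd.Series(mii_by_race) directly.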


### Step 5: Interpreting the results

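The Mutual Information Index (MII) provides a way to quantify the association between race and district enrollment, with larger values indicating a stronger association. As computed here, each value is an unweighted sum of per-district terms, so it has no fixed upper bound and depends on the number of districts. A district's term is negative when the group's share there is below its overall share.

- MII = 0: No association between race and district enrollment (i.e., the racial group makes up the same share of every district, so every per-district term is 0).

- MII > 0: Some association between race and district enrollment (i.e., the racial group is over-represented in some districts relative to its overall share).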
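
A small numeric check can make these cases concrete. The helper below, summed_index, is a hypothetical name, and all the numbers are made up. It applies the same per-district formula as Step 2 to counts rather than to a DataFrame and assumes the group is present in at least one district. It also illustrates a caveat: the terms are summed without weighting by district size, so the total can fall below zero when many small districts under-represent a group.

```python
import numpy as np

def summed_index(group_counts, enrollments):
    """Sum of (p_i / p) * ln(p_i / p) over districts; zero-share districts contribute 0."""
    group_counts = np.asarray(group_counts, dtype=float)
    enrollments = np.asarray(enrollments, dtype=float)
    overall = group_counts.sum() / enrollments.sum()   # assumed > 0
    ratio = (group_counts / enrollments) / overall
    safe_ratio = np.where(ratio > 0, ratio, 1.0)        # ln(1) = 0 for empty districts
    return float(np.sum(ratio * np.log(safe_ratio)))

# Same share (25%) in every district: no association.
even = summed_index([25, 50, 25], [100, 200, 100])

# Group concentrated in one of three equal-sized districts.
concentrated = summed_index([90, 5, 5], [100, 100, 100])

# One large district slightly above the overall share, three small ones well below it.
lopsided = summed_index([500, 2, 2, 2], [1000, 10, 10, 10])

print(even, concentrated > 0, lopsided < 0)
```

The third case is a reason to read the values comparatively rather than as points on a fixed 0-to-infinity scale.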

To interpret the results, consider the following guidelines:

1. <b>Compare MII values across races</b>: By comparing the MII values of different racial groups, you can determine which groups show stronger associations with district enrollment. Higher MII values suggest that a racial group is more concentrated in specific districts, indicating possible patterns or segregation. In the output above, the Black group has the highest value (15.712) and the Hispanic group the lowest (5.032).

2. <b>Evaluate MII values in the context of your data</b>: A single MII value may not be meaningful by itself. Instead, consider the relative values of MII for different racial groups and interpret them in the context of your data, such as the geographical distribution of districts, historical factors, or socioeconomic factors that may influence enrollment patterns.

3. <b>Use MII as a starting point for further analysis</b>: MII can help identify patterns and potential areas of concern, but it is not a definitive measure. Use the MII results as a starting point for further analysis, such as investigating the underlying causes of the observed associations or exploring the impact of policy changes on the distribution of racial groups across districts.
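
One possible first step for such follow-up analysis is to look at the per-district terms before they are summed, to see which districts drive a group's value. The sketch below is illustrative only: the data is made up, and the 'district' label column and the 'contribution' column are hypothetical names that the tutorial's dataset is not known to contain.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data for a single group in four equal-sized districts.
toy = pd.DataFrame({
    'district': ['A', 'B', 'C', 'D'],
    'total_enrollment': [100, 100, 100, 100],
    'black_n': [80, 10, 5, 5],
})

# Overall share of the group across all districts (100 / 400 = 0.25).
overall = toy['black_n'].sum() / toy['total_enrollment'].sum()

# Ratio of each district's share to the overall share.
ratio = (toy['black_n'] / toy['total_enrollment']) / overall

# Per-district term; zero ratios are replaced by 1 inside the log so they contribute 0.
toy['contribution'] = ratio * np.log(ratio.where(ratio > 0, 1.0))

# Districts sorted from the largest to the smallest contribution.
top = toy.sort_values('contribution', ascending=False)
print(top[['district', 'contribution']])
```

In this toy example, district A accounts for nearly all of the total, which would make it the natural place to start looking for causes.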