Internet Access in Orleans Parish
===========================

## Introduction

This webpage \([https://jnmasur.github.io](https://jnmasur.github.io)\) was created by Jacob Masur and Daniel Cohen to investigate the distribution of internet access in Orleans Parish across geographic regions. We would like to answer the question: how accessible is wifi in different areas of Orleans Parish? Additionally, we want to find correlations between wifi access and demographic information such as income level, school zone, and ethnicity. It will also be interesting to investigate whether locations that provide free internet access \(for example, public libraries\) are distributed so that they best serve those without access in their households. We predict that income level will have a significant impact on internet access, and since income is generally not equally distributed among ethnic groups, there will likely be a dependence on race as well.

## Collaboration Plan

We plan on meeting at least weekly to ensure that we collaborate enough on this project. We have set up a Jupyter notebook whose changes are tracked and updated regularly in a public GitHub repository located at [https://github.com/jnmasur/jnmasur.github.io](https://github.com/jnmasur/jnmasur.github.io). The raw data, contained in a CSV file, and the text file that provides a full explanation of the data set are located in a subdirectory on our GitHub page. In general, we will meet in person to work on the project together. However, if the need arises, we will temporarily switch to Zoom meetings. Regardless of meeting schedule, every few days we discuss what work needs to be accomplished, what is working well, and what we might need to rethink, so that we maintain consistent progress.

## NHGIS Computer and Internet Data

The first step for this project is determining what data on internet access by region is available to us. Valentina Martinez Pabon from the economics department guided us towards two datasets provided by NHGIS containing information from the years 2013-2017. One dataset provides information on the presence and types of computers in households, and the other contains information on the presence and types of internet subscriptions in households. Both datasets contain census tract level data for the entire country; however, we have already filtered out all data not related to Orleans Parish in our notebook. The resulting data frame (which contains the information from both datasets) consists of 177 rows (each representing a census tract in Orleans Parish) and 84 columns. NHGIS provides a detailed data dictionary that includes explanations for each variable, so we will have to determine which variables provide valuable information. This data will allow us to gain insight into how wifi is distributed throughout Orleans Parish. It will also be of interest to compare how the presence of computing devices in households depends on other variables with how the presence of internet subscriptions depends on the same variables. Once we assess how to answer our question further, we will combine our current information with more datasets. We may use 911 datasets to test whether areas with more crime are the ones with less wifi access. After we assess the results from the first dataset, it will be easier to understand what other data we can bring in for this project.

The first data set we consider is sourced from NHGIS. It contains census tract data on the types of computers and the presence and types of internet subscriptions in households for the entire country, so the data frame is extremely large.

In [2]:
import pandas as pd
internet = pd.read_csv("./data/internet_data.csv")
internet.head()

Unnamed: 0,GISJOIN,YEAR,REGIONA,DIVISIONA,STATE,STATEA,COUNTY,COUNTYA,COUSUBA,PLACEA,...,AIQBM004,AIQBM005,AIQBM006,AIQBM007,AIQBM008,AIQBM009,AIQBM010,AIQBM011,AIQBM012,AIQBM013
0,G0100010020100,2013-2017,,,Alabama,1,Autauga County,1,,,...,71,69,36,75,61,30,8,11,8,50
1,G0100010020200,2013-2017,,,Alabama,1,Autauga County,1,,,...,85,80,52,85,51,20,10,11,24,82
2,G0100010020300,2013-2017,,,Alabama,1,Autauga County,1,,,...,129,107,81,129,86,57,17,11,14,108
3,G0100010020400,2013-2017,,,Alabama,1,Autauga County,1,,,...,160,142,78,158,107,61,11,11,25,126
4,G0100010020500,2013-2017,,,Alabama,1,Autauga County,1,,,...,372,362,174,434,200,92,45,16,148,183


Source:  
Steven Manson, Jonathan Schroeder, David Van Riper, Tracy Kugler, and Steven Ruggles.  
IPUMS National Historical Geographic Information System: Version 16.0  
2017 American Community Survey: 5-Year Data \[2013-2017, Tracts & Larger Areas\]. Minneapolis, MN: IPUMS. 2021.  
[http://doi.org/10.18128/D050.V16.0](http://doi.org/10.18128/D050.V16.0)
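Because the national file is large, one option is to load only the columns that are needed. The sketch below is not what the notebook does; it uses a small in-memory stand-in for the CSV (the Alabama row holds made-up values) and the `usecols` parameter of `pd.read_csv`.

```python
import io
import pandas as pd

# Toy stand-in for the national CSV. The Orleans Parish counts come from the
# tables in this notebook; the Alabama row uses made-up illustrative values.
csv_text = """GISJOIN,STATE,COUNTY,TRACTA,AIQAE001,AIQAE002
G0100010020100,Alabama,Autauga County,20100,765,694
G2200710000100,Louisiana,Orleans Parish,100,1404,1229
G2200710000200,Louisiana,Orleans Parish,200,487,355
"""

# Only these columns are parsed; everything else in the file is skipped.
wanted = ["STATE", "COUNTY", "TRACTA", "AIQAE001", "AIQAE002"]
subset = pd.read_csv(io.StringIO(csv_text), usecols=wanted)

# Same Orleans Parish filter as in the notebook, applied to the smaller frame.
orleans = subset[(subset["STATE"] == "Louisiana") & (subset["COUNTY"] == "Orleans Parish")]
print(orleans)
```

With the real file, `io.StringIO(csv_text)` would be replaced by the path to the CSV.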

## Cleaning the Data Set

The data set at this point contains a lot of data that is irrelevant to our investigation; we will therefore need to clean the data set, so that it becomes usable. Since we are only interested in Orleans Parish data, we filter the dataset accordingly.

In [3]:
internet = internet[(internet["STATE"] == "Louisiana") & (internet["COUNTY"] == "Orleans Parish")]
internet.head()

Unnamed: 0,GISJOIN,YEAR,REGIONA,DIVISIONA,STATE,STATEA,COUNTY,COUNTYA,COUSUBA,PLACEA,...,AIQBM004,AIQBM005,AIQBM006,AIQBM007,AIQBM008,AIQBM009,AIQBM010,AIQBM011,AIQBM012,AIQBM013
28917,G2200710000100,2013-2017,,,Louisiana,22,Orleans Parish,71,,,...,155,144,75,145,129,79,12,12,34,87
28918,G2200710000200,2013-2017,,,Louisiana,22,Orleans Parish,71,,,...,79,42,11,79,72,6,12,12,45,47
28919,G2200710000300,2013-2017,,,Louisiana,22,Orleans Parish,71,,,...,56,51,41,46,33,15,9,12,10,68
28920,G2200710000400,2013-2017,,,Louisiana,22,Orleans Parish,71,,,...,80,61,43,62,61,34,29,12,38,84
28921,G2200710000601,2013-2017,,,Louisiana,22,Orleans Parish,71,,,...,43,34,9,43,31,7,12,12,14,71
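The output above shows that Orleans Parish rows carry the numeric codes `STATEA` = 22 and `COUNTYA` = 71. As a cross-check, filtering on those codes should select the same rows as filtering on the names. The sketch below illustrates this on a toy frame; the Jefferson Parish row is an assumption added only to give the filter something to exclude.

```python
import pandas as pd

# Toy frame: the Orleans Parish codes (22, 71) are taken from the output above;
# the other rows are illustrative only.
toy = pd.DataFrame({
    "STATE": ["Alabama", "Louisiana", "Louisiana"],
    "STATEA": [1, 22, 22],
    "COUNTY": ["Autauga County", "Orleans Parish", "Jefferson Parish"],
    "COUNTYA": [1, 71, 51],
})

# Filter by name (as in the notebook) and by numeric code.
by_name = toy[(toy["STATE"] == "Louisiana") & (toy["COUNTY"] == "Orleans Parish")]
by_code = toy[(toy["STATEA"] == 22) & (toy["COUNTYA"] == 71)]

# Both filters should agree if names and codes are consistent.
print(by_name.equals(by_code))
```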


There is still much extraneous data contained in this table, and we certainly will not be able to investigate 84 variables. In the next cell, we reduce our consideration to 17 variables that might be useful to our investigation. We will rename them appropriately after reducing the data frame to the variables we want.

In [74]:
# keep the variables that might be important; we check them afterwards
# (explanations for these variables are in the data dictionary)
df_internet = internet[["TRACTA", "RES_ONLYA", "ZCTA5A", "SDELMA", "SDSECA", "SDUNIA", "AIQAE001", "AIQAE002", 
                        "AIQAE006", "AIQAE011", "AIQBE001", "AIQBE002", "AIQBE004", "AIQBE006", "AIQBE010", 
                        "AIQBE012", "AIQBE013"]]

Below we check whether five of these columns contain any data by looking at the unique values in each series.

In [75]:
pd.unique(df_internet["RES_ONLYA"]), pd.unique(df_internet["ZCTA5A"])

(array([nan]), array([nan]))

In [76]:
pd.unique(df_internet["SDELMA"]), pd.unique(df_internet["SDSECA"]), pd.unique(df_internet["SDUNIA"])

(array([nan]), array([nan]), array([nan]))

Clearly, these columns do not contain any useful information, so we drop them.

In [77]:
df_internet = df_internet.drop(columns=["RES_ONLYA", "ZCTA5A", "SDELMA", "SDSECA", "SDUNIA"])
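An alternative to naming the empty columns by hand is to let pandas find them. The sketch below runs on a toy frame, not on the notebook's data, and uses `isna().all()` to list columns that are entirely `NaN` and `dropna(axis=1, how="all")` to remove them.

```python
import numpy as np
import pandas as pd

# Toy frame with two all-NaN columns, mimicking RES_ONLYA and ZCTA5A.
toy = pd.DataFrame({
    "TRACTA": [100, 200, 300],
    "RES_ONLYA": [np.nan, np.nan, np.nan],
    "ZCTA5A": [np.nan, np.nan, np.nan],
    "AIQAE001": [1404, 487, 525],
})

# List the columns in which every value is missing.
empty_cols = toy.columns[toy.isna().all()].tolist()
print(empty_cols)

# Drop every column that is entirely NaN in one call.
cleaned = toy.dropna(axis=1, how="all")
print(cleaned.columns.tolist())
```

Dropping by explicit name, as the notebook does, makes the removed columns visible to the reader; the automatic version is convenient when many columns need to be checked.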

In [78]:
df_internet = df_internet.rename(columns={"TRACTA": "Census Tract", "AIQAE001":"Total_c", 
                                          "AIQAE002":">=1 Computer", "AIQAE006":"Smartphone Only", 
                                          "AIQAE011":"No Computer", "AIQBE001":"Total_i", 
                                          "AIQBE002":"Has Internet", "AIQBE004":"Has Broadband",
                                          "AIQBE006":"Only Celular","AIQBE010":"Only Satellite",
                                          "AIQBE012":"Internet no Subscription", "AIQBE013":"No Internet"})

Since each observation in the data frame is a census tract, the tract number is unique and works well as an index.

In [79]:
df_internet.set_index("Census Tract", inplace=True)
df_internet

Census Tract,Total_c,>=1 Computer,Smartphone Only,No Computer,Total_i,Has Internet,Has Broadband,Only Celular,Only Satellite,Internet no Subscription,No Internet
100,1404,1229,34,175,1404,1160,1160,112,0,40,204
200,487,355,17,132,487,231,228,15,0,64,192
300,525,346,43,179,525,203,203,60,9,7,315
400,854,488,10,366,854,394,394,58,24,52,408
601,384,139,10,245,384,70,70,6,0,9,305
...,...,...,...,...,...,...,...,...,...,...,...
14400,750,602,19,148,750,521,521,53,0,34,195
14500,0,0,0,0,0,0,0,0,0,0,0
980000,0,0,0,0,0,0,0,0,0,0,0
980100,0,0,0,0,0,0,0,0,0,0,0
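The uniqueness of the census tract values can be verified rather than assumed. The sketch below uses toy data: it checks `Series.is_unique` and shows that `set_index(..., verify_integrity=True)` raises a `ValueError` when the chosen column contains duplicates.

```python
import pandas as pd

# Toy frame with unique tract numbers.
toy = pd.DataFrame({"Census Tract": [100, 200, 300], "Total_c": [1404, 487, 525]})
is_unique = toy["Census Tract"].is_unique
indexed = toy.set_index("Census Tract", verify_integrity=True)

# Toy frame with a repeated tract number: verify_integrity makes set_index fail.
dupes = pd.DataFrame({"Census Tract": [100, 100], "Total_c": [1, 2]})
try:
    dupes.set_index("Census Tract", verify_integrity=True)
    raised = False
except ValueError:
    raised = True

print(is_unique, raised)
```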


After renaming our columns, it appears that the total households surveyed for presence of computers and for presence of internet subscriptions are the same. If this is the case, we will drop one of the two columns.

In [80]:
df_internet["Total_c"].equals(df_internet["Total_i"])

True

In [81]:
df_internet.drop(columns=["Total_i"], inplace=True)

In [82]:
df_internet = df_internet.rename(columns={"Total_c":"Total"})

Some census tracts contain no data. We remove these by filtering out any row in which the total number of households surveyed is 0.

In [83]:
df_internet = df_internet[df_internet["Total"] > 0]

In [84]:
df_internet

Census Tract,Total,>=1 Computer,Smartphone Only,No Computer,Has Internet,Has Broadband,Only Celular,Only Satellite,Internet no Subscription,No Internet
100,1404,1229,34,175,1160,1160,112,0,40,204
200,487,355,17,132,231,228,15,0,64,192
300,525,346,43,179,203,203,60,9,7,315
400,854,488,10,366,394,394,58,24,52,408
601,384,139,10,245,70,70,6,0,9,305
...,...,...,...,...,...,...,...,...,...,...
14000,935,399,90,536,276,271,52,20,49,610
14100,1008,789,103,219,610,568,84,14,102,296
14200,909,730,23,179,649,649,90,0,42,218
14300,772,496,91,276,297,297,49,20,90,385


At this point, data across census tracts is difficult to compare since the total households vary significantly. Therefore, we find the proportion of households with the property described by each variable.

In [85]:
internet_proportions = df_internet.drop(columns=["Total"]).divide(df_internet["Total"], axis=0)

In [86]:
internet_proportions

Census Tract,>=1 Computer,Smartphone Only,No Computer,Has Internet,Has Broadband,Only Celular,Only Satellite,Internet no Subscription,No Internet
100,0.875356,0.024217,0.124644,0.826211,0.826211,0.079772,0.000000,0.028490,0.145299
200,0.728953,0.034908,0.271047,0.474333,0.468172,0.030801,0.000000,0.131417,0.394251
300,0.659048,0.081905,0.340952,0.386667,0.386667,0.114286,0.017143,0.013333,0.600000
400,0.571429,0.011710,0.428571,0.461358,0.461358,0.067916,0.028103,0.060890,0.477752
601,0.361979,0.026042,0.638021,0.182292,0.182292,0.015625,0.000000,0.023438,0.794271
...,...,...,...,...,...,...,...,...,...
14000,0.426738,0.096257,0.573262,0.295187,0.289840,0.055615,0.021390,0.052406,0.652406
14100,0.782738,0.102183,0.217262,0.605159,0.563492,0.083333,0.013889,0.101190,0.293651
14200,0.803080,0.025303,0.196920,0.713971,0.713971,0.099010,0.000000,0.046205,0.239824
14300,0.642487,0.117876,0.357513,0.384715,0.384715,0.063472,0.025907,0.116580,0.498705
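In the rows displayed above, `>=1 Computer` and `No Computer` sum to 1, as do `Has Internet`, `Internet no Subscription`, and `No Internet`. This is an observation from the displayed rows, not a statement taken from the data dictionary. The sketch below rebuilds the proportions for the first three tracts and checks both sums.

```python
import numpy as np
import pandas as pd

# Household counts for the first three tracts, copied from the table above.
counts = pd.DataFrame(
    {
        "Total": [1404, 487, 525],
        ">=1 Computer": [1229, 355, 346],
        "No Computer": [175, 132, 179],
        "Has Internet": [1160, 231, 203],
        "Internet no Subscription": [40, 64, 7],
        "No Internet": [204, 192, 315],
    },
    index=pd.Index([100, 200, 300], name="Census Tract"),
)

# Same row-wise division as in the notebook.
props = counts.drop(columns=["Total"]).divide(counts["Total"], axis=0)

# Each group of categories should cover all surveyed households.
computer_sum = props[">=1 Computer"] + props["No Computer"]
internet_sum = props[["Has Internet", "Internet no Subscription", "No Internet"]].sum(axis=1)

print(np.allclose(computer_sum, 1.0), np.allclose(internet_sum, 1.0))
```

Running the same check on the full `internet_proportions` frame would flag any tract where the categories do not add up.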


## Race, Income, and Education Data

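The goal for our project is to compare internet access and the presence of computers in households to the racial makeup, income, and educational attainment of each census tract. In order to do this, we obtain a second NHGIS data set covering race, education, and income, which again contains census tract data for the entire country.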

In [89]:
rci = pd.read_csv("./data/race_education_income.csv", encoding="ISO-8859-1")
rci.head()

Unnamed: 0,GISJOIN,YEAR,REGIONA,DIVISIONA,STATE,STATEA,COUNTY,COUNTYA,COUSUBA,PLACEA,...,AH04E017,AH04E018,AH04E019,AH04E020,AH04E021,AH04E022,AH04E023,AH04E024,AH04E025,AH1PE001
0,G0100010020100,2013-2017,,,Alabama,1,Autauga County,1,,,...,282,88,34,184,77,261,142,45,28,67826.0
1,G0100010020200,2013-2017,,,Alabama,1,Autauga County,1,,,...,495,90,91,182,106,166,51,16,5,41287.0
2,G0100010020300,2013-2017,,,Alabama,1,Autauga County,1,,,...,724,114,161,446,146,232,150,20,19,46806.0
3,G0100010020400,2013-2017,,,Alabama,1,Autauga County,1,,,...,873,208,249,291,250,540,156,38,57,55895.0
4,G0100010020500,2013-2017,,,Alabama,1,Autauga County,1,,,...,1331,264,439,1033,587,1446,1037,197,87,68143.0


Source:  
Steven Manson, Jonathan Schroeder, David Van Riper, Tracy Kugler, and Steven Ruggles.  
IPUMS National Historical Geographic Information System: Version 16.0  
2017 American Community Survey: 5-Year Data \[2013-2017, Tracts & Larger Areas\]. Minneapolis, MN: IPUMS. 2021.  
[http://doi.org/10.18128/D050.V16.0](http://doi.org/10.18128/D050.V16.0)

In [99]:
# keep only the rows for Orleans Parish
rci = rci[(rci["STATE"] == "Louisiana") & (rci["COUNTY"] == "Orleans Parish")]

We would like to separate this large data set into three different data sets for race, income, and educational attainment.

In [100]:
race = rci[["TRACTA", "AHY2E001", "AHY2E002", "AHY2E003", "AHY2E004", "AHY2E005", "AHY2E006", "AHY2E007", "AHY2E008"]]
race = race.rename(columns={"TRACTA": "Census Tract", "AHY2E001": "Total", "AHY2E002": "White",
                            "AHY2E003": "Black", "AHY2E004": "Native American", "AHY2E005": "Asian",
                            "AHY2E006": "Pacific Islander", "AHY2E007": "Other",
                            "AHY2E008": ">= 2 Races"})
race.set_index("Census Tract", inplace=True)
race = race[race["Total"] > 0]
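Since the same select, rename, index, and filter steps could be repeated for the income and education tables, they could be wrapped in a small helper. The function `extract_table` below is hypothetical (it is not part of the notebook), and the counts in the toy frame are made up for illustration; only the column codes come from the cell above.

```python
import pandas as pd

def extract_table(df, column_names, total_col="Total"):
    """Hypothetical helper: select and rename columns, index by census tract,
    and drop tracts whose total is zero."""
    out = df[list(column_names)].rename(columns=column_names)
    out = out.set_index("Census Tract")
    return out[out[total_col] > 0]

# Toy frame with made-up counts; the last tract is empty and should be dropped.
toy = pd.DataFrame({
    "TRACTA": [100, 200, 980000],
    "AHY2E001": [1000, 500, 0],
    "AHY2E002": [900, 100, 0],
    "AHY2E003": [100, 400, 0],
})

race_names = {"TRACTA": "Census Tract", "AHY2E001": "Total",
              "AHY2E002": "White", "AHY2E003": "Black"}
race_toy = extract_table(toy, race_names)
print(race_toy)
```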

In [102]:
race_proportions = race.drop(columns=["Total"]).divide(race["Total"], axis=0)
race_proportions

Census Tract,White,Black,Native American,Asian,Pacific Islander,Other,>= 2 Races
100,0.915688,0.070438,0.000000,0.009249,0.0,0.000000,0.004625
200,0.136153,0.840714,0.000000,0.000000,0.0,0.008592,0.014541
300,0.103420,0.896580,0.000000,0.000000,0.0,0.000000,0.000000
400,0.152901,0.814159,0.000000,0.014749,0.0,0.010816,0.007375
601,0.027001,0.972999,0.000000,0.000000,0.0,0.000000,0.000000
...,...,...,...,...,...,...,...
14200,0.727590,0.243306,0.001746,0.027357,0.0,0.000000,0.000000
14300,0.140055,0.833029,0.002281,0.000000,0.0,0.000000,0.024635
14400,0.675108,0.300185,0.007412,0.002471,0.0,0.008030,0.006794
14500,0.185538,0.806375,0.000000,0.003330,0.0,0.001427,0.003330
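To compare internet access with racial makeup, the two proportion tables can be joined on their shared census tract index. The sketch below is one possible approach, not the notebook's final analysis; it uses only a handful of rows copied from the tables above. Tract 14500 appears in the race table but reports zero households in the internet table, so an inner join leaves it out.

```python
import pandas as pd

# A few rows copied from internet_proportions (tracts with households surveyed).
internet_toy = pd.DataFrame(
    {"No Internet": [0.145299, 0.394251, 0.600000, 0.477752, 0.794271]},
    index=pd.Index([100, 200, 300, 400, 601], name="Census Tract"),
)

# Matching rows from race_proportions, plus tract 14500, which has no internet data.
race_toy = pd.DataFrame(
    {
        "White": [0.915688, 0.136153, 0.103420, 0.152901, 0.027001, 0.185538],
        "Black": [0.070438, 0.840714, 0.896580, 0.814159, 0.972999, 0.806375],
    },
    index=pd.Index([100, 200, 300, 400, 601, 14500], name="Census Tract"),
)

# Inner join keeps only tracts present in both tables.
combined = internet_toy.join(race_toy, how="inner")

# Pairwise Pearson correlations between the columns.
corr = combined.corr()
print(corr)
```

Five tracts are far too few to draw conclusions; the same join applied to the full `internet_proportions` and `race_proportions` frames would give the parish-wide picture.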
