# Assessing the characteristics of interstate relationships in the USA

The State Networks dataset is a compilation of many state-to-state relational variables, including measures of shared borders, travel and trade between states, and demographic characteristics of state populations. The 2,550 units in the dataset are dyadic state-pairs (e.g., Alabama–Alaska, Alabama–Arizona, Alabama–Arkansas, and so on, for each state plus the District of Columbia). The data were collected from multiple sources and incorporate measures of similarity drawn from data in the Correlates of State Policy Project.

In [None]:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline

Opening Database

In [None]:
sn=pd.read_csv('/Users/josephthomas/Documents/Projects/State Networks/Data Cleaning/Raw Data/statenetworks.csv')

In [None]:
pd.options.display.max_rows =300

In [None]:
sn.columns

**Snapshot of data and datatypes**

In [None]:
sn.describe()

In [None]:
sn.info()

## Cleaning the Data
`Income`, `IRS_migration`, and `IRS_migration_2010` are not stored as numeric types in the original dataset. In addition, `Income` values contain dollar signs and commas.

In [None]:
sn.dtypes
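The conversion itself is not shown in the notebook. One possible approach is sketched below, assuming the raw values look like `$52,000` or `1,204`. The toy frame `demo`, its sample values, and the helper `to_numeric_clean` are hypothetical stand-ins, not part of the dataset.

```python
import pandas as pd

# Toy stand-in for the real columns; the raw formatting shown here is an assumption.
demo = pd.DataFrame({
    "Income": ["$52,000", "$48,500", None],
    "IRS_migration": ["1,204", "87", "n/a"],
})

def to_numeric_clean(series):
    """Strip '$' and ',' then coerce to float; unparseable entries become NaN."""
    stripped = series.astype(str).str.replace(r"[$,]", "", regex=True)
    return pd.to_numeric(stripped, errors="coerce")

for col in ["Income", "IRS_migration"]:
    demo[col] = to_numeric_clean(demo[col])

print(demo.dtypes)
```

Using `errors="coerce"` turns anything that still fails to parse into NaN, so those entries show up in the later `isnull().sum()` check instead of raising an error.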

**Converting some variables to categorical variables**

In [None]:
sn['S1region']=sn['S1region'].astype('category')
sn['S2region']=sn['S2region'].astype('category')

In [None]:
sn['S1division']=sn['S1division'].astype('category')
sn['S2division']=sn['S2division'].astype('category')

In [None]:
sn.dtypes

**Dealing with missing values**

In [None]:
sn.isnull().sum()

The District of Columbia (DC) has many missing values, so all dyads involving DC are dropped.

In [None]:
sn[sn["State1"]=="District of Columbia"].isnull().sum()

In [None]:
sn=sn.drop(sn.index[sn["State1"]=="District of Columbia"])
sn=sn.drop(sn.index[sn["State2"]=="District of Columbia"])

In [None]:
sn[sn["S1SenDemProp"].isnull()]["State1"].value_counts()

In [None]:
sn[sn["S2SenDemProp"].isnull()]["State2"].value_counts()

The missing values in the State Senate Democratic proportions are due to the unicameral structure of the Nebraska Legislature.

_source: https://ballotpedia.org/Nebraska_State_Senate_elections,_2016_

In [None]:
sn["S1SenDemProp"]=sn["S1SenDemProp"].replace([np.nan], 0.244898)

In [None]:
sn["S2SenDemProp"]=sn["S2SenDemProp"].replace([np.nan], 0.244898)

House Proportions

In [None]:
sn[sn["S1HSDemProp"].isnull()]["State1"].value_counts()

In [None]:
sn[sn["S2HSDemProp"].isnull()]["State2"].value_counts()

Since Nebraska has a unicameral legislature, its single chamber acts as both the House and the Senate. Hence, the proportion of Democrats is the same for both chambers.

In [None]:
sn["S1HSDemProp"]=sn["S1HSDemProp"].replace([np.nan], 0.244898)

In [None]:
sn["S2HSDemProp"]=sn["S2HSDemProp"].replace([np.nan], 0.244898)
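Replacing NaN across a whole column is only correct if Nebraska is the sole source of missing values, which is what the value counts above are meant to confirm. A more defensive sketch restricts the fill to Nebraska rows. The toy frame `demo` and the constant name `NEBRASKA_DEM_SHARE` are hypothetical; the fill value is the one used in the notebook.

```python
import numpy as np
import pandas as pd

NEBRASKA_DEM_SHARE = 0.244898  # value used in the notebook (hypothetical constant name)

# Toy stand-in for the dyadic data; only the column names come from the notebook.
demo = pd.DataFrame({
    "State1": ["Nebraska", "Iowa", "Nebraska"],
    "S1SenDemProp": [np.nan, 0.40, np.nan],
    "S1HSDemProp": [np.nan, 0.41, np.nan],
})

# Fill only rows where State1 is Nebraska, leaving any other NaN untouched.
is_nebraska = demo["State1"] == "Nebraska"
for col in ["S1SenDemProp", "S1HSDemProp"]:
    demo.loc[is_nebraska, col] = demo.loc[is_nebraska, col].fillna(NEBRASKA_DEM_SHARE)

print(demo)
```

The same pattern would apply to the `State2` columns. Any NaN belonging to another state stays visible rather than silently receiving Nebraska's value.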

Recomputing the derived political variables using the filled-in values

In [None]:
sn["S1AvgDem"]=(sn["S1SenDemProp"]+sn["S1HSDemProp"])/2
sn["S2AvgDem"]=(sn["S2SenDemProp"]+sn["S2HSDemProp"])/2
sn["DemDif"]=sn["S1AvgDem"]-sn["S2AvgDem"]

In [None]:
sn[sn["State1PolSocLib"].isnull()]["State1"].value_counts()

In [None]:
sn["S1EconomicLiberalism S1SocialLiberalism State1PolSocLib State1PolEconLib State1MassSocLib State1MassEconLib".split()].corr()

In [None]:
sns.heatmap(sn["S1EconomicLiberalism S1SocialLiberalism State1PolSocLib State1PolEconLib State1MassSocLib State1MassEconLib".split()].corr())

In [None]:
sn["S2EconomicLiberalism S2SocialLiberalism State2PolSocLib State2PolEconLib State2MassSocLib State2MassEconLib".split()].corr()

In [None]:
sns.heatmap(sn["S2EconomicLiberalism S2SocialLiberalism State2PolSocLib State2PolEconLib State2MassSocLib State2MassEconLib".split()].corr())

The aggregated and disaggregated liberalism indices are strongly correlated, and the aggregated indices have a disproportionately large number of missing values. This is sufficient reason to drop the aggregated indices and fix the missing values in the disaggregated scores instead.
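The two pieces of evidence behind that decision, correlation and missingness, can be placed side by side in one table. The sketch below uses synthetic data: the frame `demo` and its values are invented for illustration, and only the column names are borrowed from the dataset.

```python
import numpy as np
import pandas as pd

# Synthetic data: two noisy measurements of the same underlying score.
rng = np.random.default_rng(0)
base = rng.normal(size=200)
demo = pd.DataFrame({
    "S1SocialLiberalism": base + rng.normal(scale=0.1, size=200),
    "State1PolSocLib": base + rng.normal(scale=0.1, size=200),
})
# Blank out the first 50 rows of the aggregated index (.loc slicing is inclusive).
demo.loc[:49, "S1SocialLiberalism"] = np.nan

# One row per column: missing count and correlation with the disaggregated index.
summary = pd.DataFrame({
    "missing": demo.isnull().sum(),
    "corr_with_State1PolSocLib": demo.corr()["State1PolSocLib"],
})
print(summary)
```

`DataFrame.corr` uses pairwise-complete observations, so the correlation is computed only on rows where both columns are present.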

In [None]:
delpsl=sn[sn["State1"]=="Delaware"]["State1PolSocLib"].head(1).item()
delpel=sn[sn["State1"]=="Delaware"]["State1PolEconLib"].head(1).item()
delmsl=sn[sn["State1"]=="Delaware"]["State1MassSocLib"].head(1).item()
delmel=sn[sn["State1"]=="Delaware"]["State1MassEconLib"].head(1).item()

In [None]:
sn["State2PolSocLib"]=sn["State2PolSocLib"].replace(np.nan,delpsl)
sn["State2PolEconLib"]=sn["State2PolEconLib"].replace(np.nan,delpel)
sn["State2MassSocLib"]=sn["State2MassSocLib"].replace(np.nan,delmsl)
sn["State2MassEconLib"]=sn["State2MassEconLib"].replace(np.nan,delmel)
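The cells above fill every missing `State2` score with Delaware's value, which assumes Delaware is the only state with missing `State2` scores. A per-state lookup avoids that assumption. This is a sketch on a toy frame: `demo`, its values, and the name `lookup` are hypothetical.

```python
import numpy as np
import pandas as pd

# Toy dyads; only the column names come from the notebook.
demo = pd.DataFrame({
    "State1": ["Delaware", "Ohio", "Ohio"],
    "State2": ["Ohio", "Delaware", "Texas"],
    "State1PolSocLib": [0.5, -0.2, -0.2],
    "State2PolSocLib": [-0.2, np.nan, -0.9],
})

# Build a state -> score lookup from the State1 side of the dyads.
lookup = (
    demo.dropna(subset=["State1PolSocLib"])
        .drop_duplicates(subset="State1")
        .set_index("State1")["State1PolSocLib"]
)

# Fill each missing State2 score with the score recorded for that same state.
demo["State2PolSocLib"] = demo["State2PolSocLib"].fillna(demo["State2"].map(lookup))
print(demo)
```

If a state is missing from the lookup, its value stays NaN and still shows up in a later `isnull().sum()` check.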

In [None]:
sn["PolSocLibDif"]=sn["State1PolSocLib"]-sn["State2PolSocLib"]
sn["PolEconLibDif"]=sn["State1PolEconLib"]-sn["State2PolEconLib"]
sn["MassSocLibDif"]=sn["State1MassSocLib"]-sn["State2MassSocLib"]
sn["MassEconLibDif"]=sn["State1MassEconLib"]-sn["State2MassEconLib"]

In [None]:
to_drop = "State1Abbr State2Abbr State1 State2 LibDif ELibDif SLibDif S1EconomicLiberalism S1SocialLiberalism S2EconomicLiberalism S2SocialLiberalism".split()

In [None]:
sn=sn.drop(to_drop, axis=1)

In [None]:
sn.isnull().sum()

In [None]:
sn["SameRegion"]= sn["S1region"]==sn["S2region"]
sn["SameDivision"]= sn["S1division"]==sn["S2division"]

In [None]:
varlist="dyadid S1region S2region S1division S2division Border Distance PopDif ACS_Migration State1_Pop State2_Pop IncomingFlights IRS_migration IRS_migration_2010 Income Income_2010 Imports GSPDif S1GSP S2GSP DemDif S1AvgDem S2AvgDem S1SenDemProp S1HSDemProp S2SenDemProp S2HSDemProp IdeologyDif PIDDif S1Ideology S1PID S2Ideology S2PID policy_diffusion_tie policy_diffusion_2015 policy_diffusion_2000.2015 MassSocLibDif MassEconLibDif PolSocLibDif PolEconLibDif State1PolSocLib State1PolEconLib State1MassSocLib State1MassEconLib State2PolSocLib State2PolEconLib State2MassSocLib State2MassEconLib perceived_similarity fb_friend_index RaceDif LatinoDif WhiteDif BlackDif AsianDif NativeDif S1Latino S1White S1Black S1Asian S1Native S2Latino S2White S2Black S2Asian S2Native ReligDif ChristianDif JewishDif MuslimDif BuddhistDif HinduDif NonesDif NPDif S1Christian S1Jewish S1Muslim S1Buddhist S1Hindu S1Nones S1NothingParticular S1HighlyReligious S2Christian S2Jewish S2Muslim S2Buddhist S2Hindu S2Nones S2NothingParticular S2HighlyReligious".split()

In [None]:
econvar= "Border Distance policy_diffusion_tie policy_diffusion_2015 policy_diffusion_2000.2015 perceived_similarity fb_friend_index PopDif ACS_Migration State1_Pop State2_Pop IncomingFlights IRS_migration IRS_migration_2010 Income Income_2010 Imports GSPDif S1GSP S2GSP".split()

In [None]:
snecon=sn.loc[:,econvar]

In [None]:
snecon.info()

## Data Analysis and Visualization

In [None]:
sns.set_context("notebook")
color = sns.color_palette("twilight")

In [None]:
f1 = plt.figure(figsize=(3, 4))
f1=sns.barplot(x='Border', y='Border',data=sn,estimator=lambda x: len(x) / len(sn) * 100,  palette=color)
f1.set(ylabel="Percentage")
f1.set(title="Do they share a border?")
f1.set(xlim=(0, 1.5))
f1.set(ylim=(0, 100))
sns.despine(offset=10);

In [None]:
f1=f1.get_figure()
f1.savefig('/Users/josephthomas/Documents/Projects/State Networks/Data Cleaning/Out/f1BorderBar.png')

**Inference:** Less than 10 percent of state pairs share a border.
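The percentage read off the bar chart can also be computed directly. The sketch below uses a toy `Border` column with invented values, so the share it prints is illustrative and is not the dataset's actual figure.

```python
import pandas as pd

# Toy Border indicator: 19 non-bordering dyads and 1 bordering dyad (invented values).
demo = pd.DataFrame({"Border": [0] * 19 + [1]})

# Share of dyads in each Border category, expressed as a percentage.
share = demo["Border"].value_counts(normalize=True).mul(100)
print(share)
```

On the real data, the equivalent call on `sn["Border"]` would give the exact percentages behind the bar heights.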

In [None]:
f2=sns.pairplot(snecon["Distance policy_diffusion_tie policy_diffusion_2015 policy_diffusion_2000.2015 perceived_similarity fb_friend_index ACS_Migration State1_Pop State2_Pop IncomingFlights IRS_migration Income Imports GSPDif".split()])

In [None]:
f2.savefig('/Users/josephthomas/Documents/Projects/State Networks/Data Cleaning/Out/f2econpairplot.png')

Selecting the variables that show some kind of relationship

In [None]:
# sns.pairplot creates its own figure, so no separate plt.figure call is needed
f2_2=sns.pairplot(snecon["Distance policy_diffusion_tie policy_diffusion_2000.2015 perceived_similarity fb_friend_index ACS_Migration PopDif IRS_migration Income Imports GSPDif".split()])
f2_2.savefig('/Users/josephthomas/Documents/Projects/State Networks/Data Cleaning/Out/f22econpairplot.png')

TODO: describe the relationships observed above and note which ones warrant further investigation.

In [None]:
sn.loc[sn["fb_friend_index"]>100][["dyadid", "Border","fb_friend_index"]].sort_values("fb_friend_index", ascending=False)

In [None]:
sn.loc[sn["Border"]==0][["dyadid", "Border","fb_friend_index", "Distance"]].sort_values("fb_friend_index", ascending=False).head(20)

Ranked by the Facebook friendship index, New England states paired with other New England states take 8 of the top 10 spots. The other two spots are Maryland paired with Delaware and vice versa. Of the top 20 dyads, 18 share a border.

Among pairs that do not share a border, MA and CA, NV and HI, and HI and CA have strong Facebook friendship ties despite being more than 4000 miles apart.
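Counts such as "how many of the top-ranked dyads share a border" can be computed instead of read off the sorted table. The sketch below uses a toy frame with invented values; on the real data the same calls would run against `sn` with a top-20 cut.

```python
import pandas as pd

# Toy dyads with invented values; column names follow the notebook.
demo = pd.DataFrame({
    "dyadid": [1, 2, 3, 4, 5, 6],
    "Border": [1, 1, 0, 1, 0, 0],
    "fb_friend_index": [900, 750, 600, 500, 40, 10],
})

# Take the top-k dyads by friendship index and count how many share a border.
top = demo.nlargest(4, "fb_friend_index")
n_border = int(top["Border"].sum())
print(f"{n_border} of the top {len(top)} dyads share a border")
```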

## Income relations

In [None]:
sns.jointplot(x=sn['Income'],y=sn['Imports'],kind='scatter')
sns.lmplot(x='Income',y='Imports', data=sn)


In [None]:
sns.jointplot(x=sn['Income'],y=sn['ACS_Migration'],kind='scatter')
sns.lmplot(x='Income',y='ACS_Migration', data=sn)

In [None]:
sns.jointplot(x=sn['Income'],y=sn['fb_friend_index'],kind='scatter')
sns.lmplot(x='Income',y='fb_friend_index', data=sn)

In [None]:
sns.jointplot(x=sn['Income'],y=sn['IRS_migration'],kind='scatter')
sns.lmplot(x='Income',y='IRS_migration', data=sn)
