# County Connectivity Map

Explore building a [US county adjacency map based on US Census data](https://www.census.gov/geographies/reference-files/2010/geo/county-adjacency.html).

Download the data manually into the `data/` directory, where the notebook expects to find it.

wget -P data https://www2.census.gov/geo/docs/reference/county_adjacency.txt

The census data uses [FIPS codes](https://en.wikipedia.org/wiki/FIPS_county_code) for states and counties.  We can use these numeric IDs directly as raw index values for our grid.
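A county FIPS code is five digits: a two-digit state code followed by a three-digit county code. The sketch below shows how such a code could be split and recombined as an integer index. The helper names `split_fips` and `join_fips` are hypothetical and are not part of the notebook.

```python
def split_fips(code):
    """Split a county FIPS code into (state, county) string parts."""
    # Restore leading zeros that are lost when the code is parsed as an int.
    text = str(code).zfill(5)
    return text[:2], text[2:]


def join_fips(state, county):
    """Combine state and county parts into one integer usable as an index."""
    return int(state) * 1000 + int(county)


print(split_fips(1001))
print(join_fips("01", "001"))
```

Treating the code as an integer drops leading zeros, so `01001` becomes `1001`. That is harmless for indexing but matters when the code is displayed.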

Set up the library imports

In [None]:
import pandas as pd
import geopandas
import matplotlib.pyplot as plt
import matplotlib.pylab as pylab
import seaborn as sns
import scipy.sparse as sparse

## Load the county adjacency data

Use the Python parsing engine to work around encoding issues in the file. See https://stackoverflow.com/a/56053794/8928529

In [None]:
connector_df = pd.read_csv("data/county_adjacency.txt", sep='\t',  header=None, engine='python')

In [None]:
connector_df

## Build a sparse grid of just FIPS data

In [None]:
# maps each county FIPS code to the list of adjacent FIPS codes
sparse_grid = {}

In [None]:
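for _, row in connector_df.iterrows():
    if isinstance(row[0], str):
        # A county name in column 0 starts a new record; column 1 holds its FIPS code.
        # Cast to int because blank cells can make pandas read this column as float.
        curr_county = int(row[1])
        sparse_grid[curr_county] = []
    # Column 3 holds a neighbor's FIPS code. The first neighbor shares the
    # county's own line and is often the county itself.
    sparse_grid[curr_county].append(int(row[3]))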

In [None]:
sparse_grid
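As a cross-check on the loop above, the same dictionary could be built with only the standard library. This is a minimal sketch, assuming the file layout is four tab-separated fields (county name, county FIPS, neighbor name, neighbor FIPS) with the first two left blank on continuation lines. The function name `parse_adjacency` and the inline sample are illustrative.

```python
import csv
import io


def parse_adjacency(lines):
    """Build {county FIPS: [neighbor FIPS, ...]} from tab-separated lines."""
    grid = {}
    current = None
    for row in csv.reader(lines, delimiter="\t"):
        if len(row) < 4:
            continue  # skip blank or malformed lines
        if row[1]:
            # A filled-in FIPS field starts a new county record.
            current = int(row[1])
            grid[current] = []
        if current is None:
            continue  # continuation line before any county was seen
        grid[current].append(int(row[3]))
    return grid


# Small sample in the assumed layout.
sample = io.StringIO(
    '"Autauga County, AL"\t01001\t"Autauga County, AL"\t01001\n'
    '\t\t"Chilton County, AL"\t01021\n'
    '\t\t"Dallas County, AL"\t01047\n'
    '"Baldwin County, AL"\t01003\t"Baldwin County, AL"\t01003\n'
    '\t\t"Clarke County, AL"\t01025\n'
)
grid = parse_adjacency(sample)
print(grid)
```

For the real file, an open file handle can be passed in place of the sample. Opening it with `encoding="latin-1"` is an assumption about the file's encoding and is worth verifying.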

## Build and visualize sparse matrix 

Use the above data to [build a sparse matrix for the connectivity so it can be visualized](https://cmdlinetips.com/2019/02/how-to-visualize-sparse-matrix-in-python/). The matplotlib.pylab.spy function is like imshow but for sparse matrices. It gives quick insight into the connectivity patterns in a matrix.

Take the largest FIPS code and add one so that it is a valid index in the zero-indexed sparse matrix.

In [None]:
FIPS=list(sparse_grid.keys())

M=int(max(FIPS))+1
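Sizing the matrix by the largest FIPS code leaves most rows and columns unused, because FIPS codes are not consecutive. One alternative, sketched here as an assumption rather than the notebook's method, is to map each code to a compact position first. The name `compact_index` and the toy data are hypothetical.

```python
def compact_index(grid):
    """Map each FIPS key to a dense 0..n-1 position and list the edges."""
    index_of = {fips: i for i, fips in enumerate(sorted(grid))}
    pairs = [
        (index_of[county], index_of[neighbor])
        for county in sorted(grid)
        for neighbor in grid[county]
        if neighbor in index_of  # ignore neighbors that are not keys
    ]
    return index_of, pairs


toy_grid = {1001: [1001, 1003], 1003: [1003, 1001], 56045: [56045]}
index_of, pairs = compact_index(toy_grid)

print(max(toy_grid) + 1)  # side length when raw FIPS codes are the indices
print(len(index_of))      # side length with compact indices
```

The raw FIPS layout keeps each code readable directly off the axes, while the compact layout gives a much smaller and denser picture. Which one is preferable depends on how the plot will be read.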

Create a sparse matrix using the list-of-lists constructor with dimensions to support every FIPS id.

In [None]:
county_adjmat = sparse.lil_matrix((M,M))

Load the processed data from above, setting each adjacent entry to 1. All non-adjacent entries remain zero. A [lil_matrix can be filled by simple index assignment](https://stackoverflow.com/q/40352616/8928529). See also http://scipy-lectures.org/advanced/scipy_sparse/lil_matrix.html

In [None]:
for county in sparse_grid.keys():
    for adjacent in sparse_grid[county]:
        # cast to int: FIPS codes parsed alongside blank cells may be floats
        county_adjmat[int(county), int(adjacent)] = 1
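Adjacency is mutual, so one would expect the resulting matrix to be symmetric. The sketch below checks that property on the dictionary form before any plotting. The function name `asymmetric_pairs` and the toy data are hypothetical.

```python
def asymmetric_pairs(grid):
    """Return (county, neighbor) pairs whose reverse link is missing."""
    missing = []
    for county, adjacent in grid.items():
        for other in adjacent:
            # The neighbor should list this county in return.
            if county not in grid.get(other, []):
                missing.append((county, other))
    return missing


# County 2 lists 3 as a neighbor, but 3 does not list 2.
toy_grid = {1: [1, 2], 2: [2, 1, 3], 3: [3]}
print(asymmetric_pairs(toy_grid))
```

An empty result on the real data would suggest the file was parsed consistently. Any pairs it reports are worth inspecting by hand.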

[Adjust the default image size](https://stackoverflow.com/a/36368418/8928529) to get a visualization we can see.

In [None]:
from matplotlib.pyplot import figure
figure(num=None, figsize=(20, 20), dpi=80, facecolor='w', edgecolor='k')

In [None]:
plt.rcParams['figure.figsize'] = [10, 10]

In [None]:
pylab.spy(county_adjmat, markersize=1)

## Visualize Connectivity Counts

Show distribution of connectivity so we can understand what issues we might face when mapping to a [Moore neighborhood](https://en.wikipedia.org/wiki/Moore_neighborhood).
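For reference, a Moore neighborhood is the set of up to eight cells surrounding a cell on a square grid, diagonals included. A minimal sketch follows; the function name `moore_neighbors` is illustrative.

```python
def moore_neighbors(row, col, n_rows, n_cols):
    """Return the in-bounds Moore neighbors of (row, col) on a grid."""
    neighbors = []
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if dr == 0 and dc == 0:
                continue  # skip the cell itself
            r, c = row + dr, col + dc
            if 0 <= r < n_rows and 0 <= c < n_cols:
                neighbors.append((r, c))
    return neighbors


print(len(moore_neighbors(1, 1, 3, 3)))  # interior cell
print(len(moore_neighbors(0, 0, 3, 3)))  # corner cell
```

An interior cell has eight neighbors, so any county with more than eight adjacent counties cannot be represented by a single grid cell.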

#### Build dictionaries of counts

In [None]:
neighbors=dict()
biguns=dict()

for county in sparse_grid:
    # count the neighbors not including self
    count = len(sparse_grid[county]) - 1
    if count in neighbors:
        neighbors[count] += 1
    else:
        neighbors[count] = 1
    if count > 8:
        # remember ones that exceed the moore neighborhood
        biguns[county] = count
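The same tallies could also be built with `collections.Counter`. This sketch counts only the neighbors that differ from the county itself, so it does not rely on every county listing itself. The function name `neighbor_histogram` and the toy data are hypothetical.

```python
from collections import Counter


def neighbor_histogram(grid, limit=8):
    """Tally neighbor counts and collect counties above the limit."""
    counts = {
        county: sum(1 for n in adjacent if n != county)
        for county, adjacent in grid.items()
    }
    histogram = Counter(counts.values())
    over_limit = {c: n for c, n in counts.items() if n > limit}
    return histogram, over_limit


toy_grid = {1: [1, 2, 3], 2: [2, 1], 3: [3, 1]}
histogram, over_limit = neighbor_histogram(toy_grid, limit=1)
print(histogram)
print(over_limit)
```

With the default `limit=8`, `over_limit` plays the role of `biguns` and `histogram` plays the role of `neighbors`.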

Understand the range of connectivity

In [None]:
connections=list(sorted(neighbors.keys()))
connections

In [None]:
connectcounts = pd.DataFrame(0, index=range(max(connections) + 1), columns=["count"])

In [None]:
for count in neighbors.keys():
    # set the row for this neighbor count; plain [] assignment would add a column
    connectcounts.at[count, "count"] = neighbors[count]

In [None]:
len(sparse_grid.keys())

#### Build dataframe of counts for easy plotting

Take the simple approach of iterating through sparse_grid again, this time recording one neighbor count per county. [Use .at to avoid the default of creating new dataframes](https://stackoverflow.com/a/13842286/8928529).

In [None]:
connectcounts = pd.DataFrame(0, index=range(len(sparse_grid.keys())), columns=["count"])

In [None]:
for index, county in enumerate(sorted(sparse_grid.keys())):
    # count the neighbors not including self
    connectcounts.at[index, "count"] = len(sparse_grid[county]) - 1

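Use a [simple seaborn histogram to visualize the distribution](https://seaborn.pydata.org/tutorial/distributions.html). `distplot()` is deprecated as of seaborn 0.11, and `histplot()` is its replacement for histograms.

In [None]:
sns.histplot(connectcounts["count"], discrete=True)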

Show the counties that exceed the Moore neighborhood of 8.

In [None]:
for county, count in biguns.items():
    print(f"{county}: {count}")

## Reflections on Connectivity

Interestingly, the [county with the most connections (14) is San Juan County in Utah](https://www.familysearch.org/wiki/en/San_Juan_County,_Utah_Genealogy#Maps). You might expect the crowded eastern half of the US to have the larger counts, but this county is a relatively tall vertical rectangle in a neighborhood of relatively short horizontal rectangles.

These highly connected counties give us something to think about when building the connectivity matrix. Should counties simply have a squared-off shape but be allowed to span multiple grid entries? That is, do we take geographic size into consideration? Would this be a more natural representation of inter-cell relationships? It could give our images a behavior in which multiple cells act with identical statistics.