In [1]:
%matplotlib inline
import matplotlib
import seaborn as sns
sns.set()
matplotlib.rcParams['figure.dpi'] = 144

# Industry Cluster Bubble Charts for the State of Ohio and Adams County

After reviewing the existing literature on industry cluster classification and searching for data on Adams County and the State of Ohio, I chose the methodology described by Primont and Domazlicky (2008).

I use the cluster definitions provided by StatsAmerica (www.statsamerica.org/innovation/about.html).

To analyze regional comparative advantages, I construct bubble charts that evaluate each cluster by means of a location quotient (LQ). 
The LQ compares the fraction of the region’s employment in a particular industry cluster to the fraction of the nation’s employment in the same industry cluster. 
The location quotient for industry cluster $i$ in region $r$ is calculated as:


$LQ_{ri} = \frac{E_{ri} / E_{r}}{E_{ni} / E_{n}}$,


where $E_{ri}$ is the region’s employment in industry cluster $i$, $E_r$ is total regional employment, $E_{ni}$ is
the nation’s employment in industry cluster $i$, and $E_n$ is total national employment.
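
The formula above can be sketched as a small helper. This is only an illustration: the function name `location_quotient` and the employment figures are made up for the example and do not come from the StatsAmerica data.

```python
def location_quotient(region_cluster_jobs, region_total_jobs,
                      nation_cluster_jobs, nation_total_jobs):
    """Return LQ = (E_ri / E_r) / (E_ni / E_n)."""
    if region_total_jobs <= 0 or nation_total_jobs <= 0:
        raise ValueError("total employment must be positive")
    if nation_cluster_jobs <= 0:
        raise ValueError("national cluster employment must be positive")
    region_share = region_cluster_jobs / region_total_jobs   # E_ri / E_r
    nation_share = nation_cluster_jobs / nation_total_jobs   # E_ni / E_n
    return region_share / nation_share


# Hypothetical numbers: 15% of regional jobs versus 10% of national jobs.
lq = location_quotient(150, 1_000, 10_000, 100_000)
print(round(lq, 2))
```

With these made-up inputs the region's share is one and a half times the national share, so the LQ comes out above 1.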

An LQ of 1 is the reference value: it indicates that the region employs the same fraction of its workforce in the industry cluster as does the nation as a whole. 

An LQ greater than 1 indicates that the region employs a larger fraction of its workforce in the industry cluster than does the nation; an LQ less than 1 indicates a smaller fraction. 

When the LQ exceeds 1, the region “specializes” in the industry cluster; that is, its employment is concentrated in that cluster relative to the nation.

## Finding the data at statsamerica.org

To load the data, go to http://www.statsamerica.org/innovation/anydata/. 

In section "1. Select Data", choose "Employment and Wages", "Annual QCEW Jobs", and "Major Industries/Sectors".

Then, in section "2. Select Geography", choose "Ohio" and either "State Total" or "Adams" to load the data for the whole state or for the county.

Since 2018 is the most recent year available, in section "3. Select a time period" we load data for 2014 and 2018 at both the state and county levels.

This gives four small xls spreadsheets, which can easily be combined into a single "clean" file to work with.

When combining the data, we keep the employment figures for 2018 and drop those for 2014.

Also, since the Adams County data has no "Unallocated" sector and that sector is of no interest for industry clustering, I delete these rows from the State of Ohio data.
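
The combining step described above could look roughly like the sketch below. It is an assumption-laden illustration: the input column names (`Jobs`, `LQ`), the helper `tidy`, the 2014 job counts, and the "Unallocated" figures are all invented, and the real StatsAmerica exports may be laid out differently. Only the output column names follow the clean file used in this notebook.

```python
import pandas as pd

# Hypothetical stand-ins for the four downloaded spreadsheets.
# 2014 job counts and the "Unallocated" row are made-up placeholders.
sectors = ["Utilities", "Construction"]
ohio_2018 = pd.DataFrame({"Description": sectors + ["Unallocated"],
                          "Jobs": [28693, 220709, 1000],
                          "LQ": [0.96, 0.81, 0.0]})
ohio_2014 = pd.DataFrame({"Description": sectors + ["Unallocated"],
                          "Jobs": [1, 1, 1],
                          "LQ": [0.92, 0.82, 0.0]})
adams_2018 = pd.DataFrame({"Description": sectors,
                           "Jobs": [139, 264],
                           "LQ": [4.12, 0.86]})
adams_2014 = pd.DataFrame({"Description": sectors,
                           "Jobs": [1, 1],
                           "LQ": [4.87, 0.50]})


def tidy(frame, region, year, keep_jobs):
    """Rename columns to the Jobs_<region> / <Region>_LQ_<year> pattern."""
    out = frame.rename(columns={"Jobs": f"Jobs_{region}",
                                "LQ": f"{region.capitalize()}_LQ_{year}"})
    if not keep_jobs:
        # Employment is kept for 2018 only
        out = out.drop(columns=f"Jobs_{region}")
    return out


parts = [
    tidy(ohio_2018, "ohio", 2018, keep_jobs=True),
    tidy(ohio_2014, "ohio", 2014, keep_jobs=False),
    tidy(adams_2018, "adams", 2018, keep_jobs=True),
    tidy(adams_2014, "adams", 2014, keep_jobs=False),
]

# Join everything on the sector name, then drop the "Unallocated" row.
combined = parts[0]
for part in parts[1:]:
    combined = combined.merge(part, on="Description", how="left")
combined = combined[combined["Description"] != "Unallocated"].reset_index(drop=True)

print(combined)
```

A left join starting from the state table keeps every state sector, so sectors missing from the county table would show up as NaN rather than silently disappearing.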



In [40]:
# Import pandas
import pandas as pd
 
# Read file into a DataFrame and print its head
df = pd.read_csv('ohio_adams_2014_2018.csv', sep=',', thousands=',', header=0)
print(df.head())

                                  Description  Jobs_ohio  Ohio_LQ_2018  \
0  Agriculture, Forestry, Fishing and Hunting      16662          0.36   
1                                      Mining      11859          0.48   
2                                   Utilities      28693          0.96   
3                                Construction     220709          0.81   
4                               Manufacturing     698960          1.49   

   Ohio_LQ_2014  Jobs_adams  Adams_LQ_2018  Adams_LQ_2014  
0          0.33           0           0.00           0.34  
1          0.45           0           0.00           1.40  
2          0.92         139           4.12           4.87  
3          0.82         264           0.86           0.50  
4          1.46         856           1.62           1.32  


Industry cluster bubble charts have three dimensions: the LQ, the change in the LQ over time, and industry employment. 

Here the change in LQ from 2014 to 2018 is computed as a percentage of the 2018 LQ (the charts label it "Percentage-Point Change in LQ"):

In [86]:
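# Change in LQ from 2014 to 2018, expressed as a percentage of the 2018 LQ.
# Rows whose 2018 LQ is 0 (the first two Adams County rows above) divide by zero and come out as -inf.
df['Ohio_LQ_growth'] = (100*(df['Ohio_LQ_2018']-df['Ohio_LQ_2014'])/df['Ohio_LQ_2018']).round(2)

df['Adams_LQ_growth'] = (100*(df['Adams_LQ_2018']-df['Adams_LQ_2014'])/df['Adams_LQ_2018']).round(2)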
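
The change in LQ can be expressed in more than one way, and the choice of denominator matters. The sketch below is a hypothetical helper (`lq_changes` is not part of the notebook) that compares the plain difference, the change as a percentage of the end-year LQ (the denominator used in this notebook), and the change as a percentage of the start-year LQ. It returns `None` where the denominator is zero. The inputs are LQ values from the table printed above.

```python
def lq_changes(lq_start, lq_end):
    """Return three ways of expressing the change from lq_start to lq_end.

    A percent change is None when its denominator is zero.
    """
    diff = lq_end - lq_start
    pct_of_end = round(100 * diff / lq_end, 2) if lq_end != 0 else None
    pct_of_start = round(100 * diff / lq_start, 2) if lq_start != 0 else None
    return {
        "difference": round(diff, 2),
        "percent_of_end": pct_of_end,      # same denominator as the notebook
        "percent_of_start": pct_of_start,  # change relative to the base year
    }


# Ohio Manufacturing: LQ 1.46 in 2014, 1.49 in 2018
ohio_manufacturing = lq_changes(1.46, 1.49)
# Adams County Mining: LQ 1.40 in 2014, 0.00 in 2018
adams_mining = lq_changes(1.40, 0.00)

print(ohio_manufacturing)
print(adams_mining)
```

For small changes the two percentages are close, but they diverge for large swings, and only the base-year version stays defined when a cluster's LQ drops to zero.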

Now, let's plot the State of Ohio industry cluster bubble chart.


In [259]:
import plotly.graph_objects as go
import plotly.express as px
import math

hover_text = []
bubble_size = []

for index, row in df.iterrows():
    hover_text.append(('Industry Cluster: {Description}<br>'+
                      'Sector Employment: {Jobs_ohio}<br>'+
                      'Location Quotient: {Ohio_LQ_2018}<br>'+
                      'Percentage-Point Change in LQ: {Ohio_LQ_growth}').format(Description=row['Description'],
                                            Jobs_ohio=row['Jobs_ohio'],
                                            Ohio_LQ_2018=row['Ohio_LQ_2018'],
                                            Ohio_LQ_growth=row['Ohio_LQ_growth']))
    bubble_size.append(math.sqrt(row['Jobs_ohio']))

df['text_ohio'] = hover_text
df['size_ohio'] = bubble_size
sizeref = 5.*max(df['size_ohio'])/(100**2)

# Create figure
fig = go.Figure()
fig.add_trace(go.Scatter(
        x=df['Ohio_LQ_growth'], y=df['Ohio_LQ_2018'],
        # A single trace holds every cluster, so give it one short name
        name='Ohio industry clusters', text=df['text_ohio'],
        marker_size=df['size_ohio']))

# Tune marker appearance and layout
fig.update_traces(mode='markers', marker=dict(size=df['size_ohio'],
                                              sizemode='area', 
                                              # One color value per row, however many rows there are
                                              color=list(range(10, 10*len(df)+10, 10)),
                                              sizeref=sizeref, line_width=2,))
# Label only clusters whose LQ changed by more than 1% or whose LQ differs from 1 by more than 1
for index, row in df.iterrows():
    if math.fabs(row['Ohio_LQ_growth'])>1 or math.fabs(row['Ohio_LQ_2018']-1)>1:
        fig.add_annotation(
            x=row['Ohio_LQ_growth'],
            y=row['Ohio_LQ_2018'],
            text=row['Description'],
        )

fig.update_layout(
    title='Industry Cluster Bubble Charts, Ohio State, 2014-2018',
    xaxis=dict(
        title='Percentage-Point Change in LQ, 2014-2018',
        gridcolor='white',
        #type='log',
        gridwidth=2
    ),
    yaxis=dict(
        title='Location Quotient in 2018',
        gridcolor='white',
        gridwidth=2,
    ),
    paper_bgcolor='rgb(243, 243, 243)',
    plot_bgcolor='rgb(243, 243, 243)'
)


fig.update_layout(shapes=[
    dict(
      type= 'line',
      yref= 'paper', y0= 0, y1= 1,
      xref= 'x', x0= 0, x1= 0
    ),
    dict(
      type= 'line',
      yref= 'y', y0= 1, y1= 1,
      xref= 'paper', x0= 0, x1= 1
    )
])    


fig.show()

Bubbles above the horizontal line at LQ = 1 represent clusters in which the region employs a larger fraction of its workforce than does the nation, while bubbles below that line represent clusters in which it employs a smaller fraction. 

The horizontal axis shows the percentage change in the value of the LQ from 2014 to 2018. 
Bubbles lying to the right of the vertical line at zero represent industry clusters that have increased their fraction of employment relative to the nation, while those to the left have decreased it. 
The size of the bubble represents regional employment in the cluster: the larger the bubble, the larger the industry cluster’s employment.
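
The reading rules above can be written down as a small classifier. This is a sketch only: the function name `classify_cluster` and the wording of the labels are my own, not terminology from Primont and Domazlicky. The example values are Ohio LQs from the table printed earlier.

```python
def classify_cluster(lq, lq_change):
    """Describe where a bubble falls relative to the LQ = 1 and change = 0 lines."""
    if lq > 1:
        concentration = "above national share"
    else:
        concentration = "at or below national share"
    if lq_change > 0:
        trend = "LQ rising"
    elif lq_change < 0:
        trend = "LQ falling"
    else:
        trend = "LQ unchanged"
    return f"{concentration}, {trend}"


# (LQ in 2018, LQ in 2018 minus LQ in 2014)
examples = {
    "Manufacturing": (1.49, 1.49 - 1.46),
    "Construction": (0.81, 0.81 - 0.82),
}
for name, (lq, change) in examples.items():
    print(f"{name}: {classify_cluster(lq, change)}")
```

Only the sign of the change matters for the left/right position, so the classifier works with either the raw difference or a percent change.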

As the bubble chart above shows, it is hard to make all the information readable, even with hover text that shows the industry cluster name and its data when the user moves the mouse over a bubble.

So, I chose categorical dot plots as an alternative data visualization technique.
So-called Cleveland dot plots show several data points for each category; here there are three: the LQ, the change in LQ, and employment. 
The categories are plotted on the vertical axis for readability.
Compared to a bubble chart, dot plots are less cluttered and allow for an easier comparison between conditions.

In [258]:
import plotly.graph_objects as go

y = df['Description'].values.tolist()
fig = go.Figure()

fig.add_trace(go.Scatter(
    x=df['Ohio_LQ_2018'], y=y,
    name='Location Quotient in 2018',
    marker=dict(
        color='rgba(156, 165, 196, 0.95)',
        line_color='rgba(156, 165, 196, 1.0)',
    )
))
fig.add_trace(go.Scatter(
    x=df['Ohio_LQ_growth'], y=y,
    name='Percentage-Point Change in LQ, 2014-2018',
    marker=dict(
        color='rgba(204, 204, 204, 0.95)',
        line_color='rgba(217, 217, 217, 1.0)'
    )
))

fig.add_trace(go.Scatter(
    x=df['Jobs_ohio']/100000, y=y,
    name='Sector Employment in 2018 (100k jobs)',
    marker=dict(
        color='rgba(144, 244, 244, 0.9)',
        line_color='rgba(217, 217, 217, 1.0)'
    )
))

fig.update_traces(mode='markers', marker=dict(line_width=1, symbol='circle', size=16))

fig.update_layout(
    title="Industry Cluster Categorical Dot Plot, Ohio State, 2014-2018",
    xaxis=dict(
        showgrid=True,
        showline=True,
        linecolor='rgb(102, 102, 102)',
        tickfont_color='rgb(102, 102, 102)',
        showticklabels=True,
        dtick=1,
        ticks='outside',
        tickcolor='rgb(102, 102, 102)',
    ),
    margin=dict(l=140, r=40, b=50, t=80),
    legend=dict(
        font_size=10,
        yanchor='middle',
        xanchor='right',
    ),
    width=800,
    height=600,
    paper_bgcolor='white',
    plot_bgcolor='white',
    hovermode='closest',
)
fig.show()
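
One optional refinement, not used in this notebook, is to sort the rows before plotting so the dots form an ordered ladder, which is a common convention for Cleveland dot plots. The sketch below uses four LQ values from the table printed earlier; the variable names are illustrative.

```python
import pandas as pd

# A few Ohio rows from the table above
sample = pd.DataFrame({
    "Description": ["Mining", "Manufacturing", "Utilities", "Construction"],
    "Ohio_LQ_2018": [0.48, 1.49, 0.96, 0.81],
})

# Sort ascending by LQ; a categorical y axis typically draws the first
# category at the bottom, so the highest LQ should end up at the top.
ordered = sample.sort_values("Ohio_LQ_2018", ascending=True)
y_categories = ordered["Description"].tolist()
x_values = ordered["Ohio_LQ_2018"].tolist()

print(y_categories)
```

Passing `x_values` and `y_categories` to the scatter traces in place of the unsorted columns would keep every trace aligned on the same category order.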


The Ohio categorical dot plot shows that the location quotient of the "Agriculture, Forestry, Fishing and Hunting" cluster grew the fastest (by more than 8%).

Another cluster with solid growth is "Mining", whose LQ grew by more than 6% over the period.
By contrast, the LQs of sectors such as "Admin, Support, Waste" and "Management of Companies and Enterprises" declined over these years.

Now, let's plot the industry cluster bubble chart and categorical dot plot for Adams County, Ohio.

In [264]:
import plotly.graph_objects as go
import plotly.express as px
import math

hover_text = []
bubble_size = []

for index, row in df.iterrows():
    hover_text.append(('Industry Cluster: {Description}<br>'+
                      'Sector Employment: {Jobs_adams}<br>'+
                      'Location Quotient: {Adams_LQ_2018}<br>'+
                      'Percentage-Point Change in LQ: {Adams_LQ_growth}').format(Description=row['Description'],
                                            Jobs_adams=row['Jobs_adams'],
                                            Adams_LQ_2018=row['Adams_LQ_2018'],
                                            Adams_LQ_growth=row['Adams_LQ_growth']))
    bubble_size.append(math.sqrt(row['Jobs_adams']))

df['text_adams'] = hover_text
df['size_adams'] = bubble_size
sizeref = 5.*max(df['size_adams'])/(100**2)

# Create figure
fig = go.Figure()
fig.add_trace(go.Scatter(
        x=df['Adams_LQ_growth'], y=df['Adams_LQ_2018'],
        # A single trace holds every cluster, so give it one short name
        name='Adams County industry clusters', text=df['text_adams'],
        marker_size=df['size_adams']))

# Tune marker appearance and layout
fig.update_traces(mode='markers', marker=dict(size=df['size_adams'],
                                              sizemode='area', 
                                              # One color value per row, however many rows there are
                                              color=list(range(10, 10*len(df)+10, 10)),
                                              sizeref=sizeref, line_width=2,))
# Label only clusters whose LQ changed by more than 1% or whose LQ differs from 1 by more than 1
for index, row in df.iterrows():
    if math.fabs(row['Adams_LQ_growth'])>1 or math.fabs(row['Adams_LQ_2018']-1)>1:
        fig.add_annotation(
            x=row['Adams_LQ_growth'],
            y=row['Adams_LQ_2018'],
            text=row['Description'],
        )

fig.update_layout(
    title='Industry Cluster Bubble Charts, Adams County, Ohio, 2014-2018',
    xaxis=dict(
        title='Percentage-Point Change in LQ, 2014-2018',
        gridcolor='white',
        #type='log',
        gridwidth=2
    ),
    yaxis=dict(
        title='Location Quotient in 2018',
        gridcolor='white',
        gridwidth=2,
    ),
    paper_bgcolor='rgb(243, 243, 243)',
    plot_bgcolor='rgb(243, 243, 243)'
)


fig.update_layout(shapes=[
    dict(
      type= 'line',
      yref= 'paper', y0= 0, y1= 1,
      xref= 'x', x0= 0, x1= 0
    ),
    dict(
      type= 'line',
      yref= 'y', y0= 1, y1= 1,
      xref= 'paper', x0= 0, x1= 1
    )
])    


fig.show()

In [268]:
import plotly.graph_objects as go

y = df['Description'].values.tolist()
fig = go.Figure()

fig.add_trace(go.Scatter(
    x=df['Adams_LQ_2018'], y=y,
    name='Location Quotient in 2018',
    marker=dict(
        color='rgba(156, 165, 196, 0.95)',
        line_color='rgba(156, 165, 196, 1.0)',
    )
))
fig.add_trace(go.Scatter(
    x=df['Adams_LQ_growth'], y=y,
    name='Percentage-Point Change in LQ, 2014-2018',
    marker=dict(
        color='rgba(204, 204, 204, 0.95)',
        line_color='rgba(217, 217, 217, 1.0)'
    )
))

fig.add_trace(go.Scatter(
    x=df['Jobs_adams']/10, y=y,
    name='Sector Employment in 2018 (tens of jobs)',
    marker=dict(
        color='rgba(144, 244, 244, 0.9)',
        line_color='rgba(217, 217, 217, 1.0)'
    )
))

fig.update_traces(mode='markers', marker=dict(line_width=1, symbol='circle', size=16))

fig.update_layout(
    title="Industry Cluster Categorical Dot Plot, Adams County, Ohio, 2014-2018",
    xaxis=dict(
        showgrid=True,
        showline=True,
        linecolor='rgb(102, 102, 102)',
        tickfont_color='rgb(102, 102, 102)',
        showticklabels=True,
        dtick=10,
        ticks='outside',
        tickcolor='rgb(102, 102, 102)',
    ),
    margin=dict(l=140, r=40, b=50, t=80),
    legend=dict(
        font_size=10,
        yanchor='middle',
        xanchor='right',
    ),
    width=800,
    height=600,
    paper_bgcolor='white',
    plot_bgcolor='white',
    hovermode='closest',
)
fig.show()


The Adams County categorical dot plot shows a change of 100% in the location quotient of the "Admin, Support, Waste" cluster over these years. Because the change is expressed as a percentage of the 2018 LQ, a value of 100% corresponds to a 2014 LQ of zero rather than to a doubling.

Another fast-growing cluster is "Construction", whose LQ grew by more than 40% over the period.
By contrast, the LQs of sectors such as "Information" and "Transportation & Warehousing" declined over these years.

# References

1. Diane F. Primont and Bruce Domazlicky, "Industry Cluster Analysis for the Southeast Missouri Region," Center for Economic and Business Research, September 2008.
2. For cluster definitions, refer to www.statsamerica.org/innovation/about.html
3. For data, refer to http://www.statsamerica.org/innovation/anydata/
