# Rational Inattention Data Exploration

# Setup

## Objectives

1) Data cleaning: connect labels and data

2) Get all of the input output matrices

3) Transform them into a tensor (3d matrix) that has the entries with time

4) Join that with labels so that it's easy to access

5) Make a plot for two or three years in our sample for us to see how it changed over time!
- https://economics.mit.edu/files/8135

##### Figure 1
<img src="images/figure3.png" alt="fishy" class="bg-primary mb-1" width="500px">

6) Try doing the same thing for the BLS data for PPI (producer price index)
- Weight on inflation for PPI is different

7) Organize Python code for accessing IO tables across time and merge it with the label document. Check whether the data can be accessed through an API; otherwise download it.
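
A rough sketch of objective 3: yearly IO matrices can be stacked along a new leading axis with `np.stack`, giving a 3d array indexed by (year, row, column). The tiny 3 x 3 matrices and the `matrices_by_year` dictionary are made-up placeholders, not BLS data, and the row/column meaning assumes a standard use-table layout.

```python
import numpy as np

# Hypothetical stand-ins for yearly IO matrices (the real ones would be 206 x 206)
matrices_by_year = {
    2018: np.array([[1.0, 2.0, 0.0], [0.0, 3.0, 1.0], [4.0, 0.0, 2.0]]),
    2019: np.array([[1.5, 2.0, 0.0], [0.0, 2.5, 1.0], [4.0, 0.5, 2.0]]),
    2020: np.array([[2.0, 1.0, 0.0], [0.0, 2.0, 1.5], [3.0, 0.5, 2.5]]),
}

years = sorted(matrices_by_year)
# axis 0 = year, axis 1 = matrix row, axis 2 = matrix column
io_tensor = np.stack([matrices_by_year[y] for y in years], axis=0)

# Look up one cell across time: row 0, column 0 for every year
cell_over_time = io_tensor[:, 0, 0]
```

Keeping `years` next to the tensor makes it easy to translate a calendar year into a position along axis 0 with `years.index(2019)`.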

## Information on Data

`OUTPUT_IND9720`
- 1997-2020 historical industry output time series (columns 1 through 24 respectively)

`OUTPUT_COM9720`
- 1997-2020 historical commodity output time series (columns 1 through 24 respectively)

`OUTPUT_2030`
- Projected 2030 commodity output in column 1 and industry output in column 2

`USE` 
- 206 rows and 206 columns of data (row 206 is value added, column 206 is final demand, the rest are the intermediate cells)

`MAKE` 
- 205 rows and 205 columns of data

`FD` 
- 205 commodity rows and 153 detailed final demand sectors of data

`FDAGG`
- 205 commodity rows and 11 final demand categories of data aggregated from the 153 sectors mentioned above
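
The `USE` layout described above (last row is value added, last column is final demand) suggests a simple split into three pieces. This is a minimal sketch on a made-up 4 x 4 array; `split_use` is a hypothetical helper and not part of the BLS files.

```python
import numpy as np

def split_use(use):
    """Split a use table whose last row is value added and last column is final demand."""
    use = np.asarray(use, dtype=float)
    intermediate = use[:-1, :-1]   # intermediate cells
    value_added = use[-1, :-1]     # last row, excluding the corner cell
    final_demand = use[:-1, -1]    # last column, excluding the corner cell
    return intermediate, value_added, final_demand

# Toy 4 x 4 example standing in for the 206 x 206 table
toy_use = np.arange(16).reshape(4, 4)
intermediate, value_added, final_demand = split_use(toy_use)
```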

In [2]:
# Importing useful libraries
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import requests

# 1) Data Cleaning

In [3]:
# Reading in data

sect_final_demand = pd.read_excel('sector_data/sect_final_demand_x153.xlsx')
sect_plan = pd.read_excel('sector_data/SectorPlan312.xlsx')
IO_industry_ts = pd.read_csv('REAL_OUTPUT_IND9720.csv')
IO_commodity_ts = pd.read_csv('REAL_OUTPUT_COM9720.csv')

## 1) Cleaning Final Demand so that we can map sector to a Final Demand Category

In [4]:
# reformatting raw sect_final_demand df so that we can map sector to a Final Demand Category
sect_final_demand_categories = sect_final_demand.iloc[7:18, :]
sect_final_demand_categories = sect_final_demand_categories.rename(columns={"Unnamed: 1": "Sector" , "Unnamed: 2": "First Column", 
                                             "Unnamed: 3": "Last Column", "Unnamed: 5": "Final Demand Category"})
sect_final_demand_categories = sect_final_demand_categories[["Sector", "First Column", "Last Column", "Final Demand Category"]].set_index("Sector")

In [5]:
# reformatting the raw sect_final_demand df
sect_final_demand = sect_final_demand.rename(columns={"Unnamed: 1": "Sector" , "Unnamed: 2": "Final Demand Category"})
sect_final_demand = sect_final_demand[["Sector", "Final Demand Category"]].dropna().loc[20:].set_index(np.arange(1, 154))

In [6]:
sect_final_demand

Unnamed: 0,Sector,Final Demand Category
1,1,New motor vehicles
2,2,Net purchases of used motor vehicles
3,3,Motor vehicle parts and accessories
4,4,Furniture and furnishings
5,5,Household appliances
...,...,...
149,149,State and local government consumption
150,150,State and local government gross investment
151,151,Margins on state and local government expendit...
152,152,Margins on state and local government expendit...


In [7]:
sect_final_demand_categories

Unnamed: 0_level_0,First Column,Last Column,Final Demand Category
Sector,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
1,1,81,Personal consumption expenditures (PCE)
2,82,113,Private investment in equipment (PEQ)
3,114,119,Private Investment in Intellectual Property Pr...
4,120,123,Private investment in nonresidential structures
5,124,127,Private investment in residential structures
6,128,131,Change in private inventories
7,132,135,Exports of goods and services
8,136,138,Imports of goods and services
9,139,143,Federal Government defense consumption and inv...
10,144,148,Federal Government non-defense consumption and...
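
The First Column / Last Column pairs in this table can be used to look up which aggregate category a detailed final demand sector belongs to. A small sketch, using only the first two rows of the table; `category_for` is a hypothetical helper name.

```python
# (first column, last column, category) triples; the two rows are copied from the table above
ranges = [
    (1, 81, "Personal consumption expenditures (PCE)"),
    (82, 113, "Private investment in equipment (PEQ)"),
]

def category_for(sector, ranges):
    """Return the category whose [first, last] range contains `sector`, or None."""
    for first, last, name in ranges:
        if first <= sector <= last:
            return name
    return None

pce_example = category_for(1, ranges)     # sector 1 is "New motor vehicles"
peq_example = category_for(100, ranges)
```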


## 2) Cleaning Sector Plan Data Set to map Sector to Industry/Commodity Description

In [8]:
IO_industry_ts = IO_industry_ts.rename(columns = {"SECTORNUMBER": "Sector"})

In [9]:
sect_plan = sect_plan.rename(columns = {"Bureau of Labor Statistics 205 Order Industry Sectoring Plan": "Sector"
                                        , "Unnamed: 1": "Industry/Commodity Description"})[["Sector", "Industry/Commodity Description"]]
sect_plan = sect_plan.dropna().iloc[1:].set_index(np.arange(1, 207))

In [10]:
sect_plan

Unnamed: 0,Sector,Industry/Commodity Description
1,1,Crop production
2,2,Animal production and aquaculture
3,3,Forestry
4,4,Logging
5,5,"Fishing, hunting and trapping"
...,...,...
202,202,Noncomparable imports
203,203,Scrap
204,204,Used and secondhand goods
205,205,Rest of the world adjustment


In [11]:
# merges the industry time series with the industry names
IO_industry_ts = sect_plan.merge(IO_industry_ts, how = "inner", on = "Sector").set_index(np.arange(1, 206))
IO_industry_ts.head()

Unnamed: 0,Sector,Industry/Commodity Description,1997,1998,1999,2000,2001,2002,2003,2004,...,2011,2012,2013,2014,2015,2016,2017,2018,2019,2020
1,1,Crop production,202907.436,201448.843,207878.027,209351.486,205414.721,184805.32,201401.365,222607.426,...,222366.747,212973.309,236889.249,273385.07,270024.362,275179.121,278046.444,286076.975,301995.274,306286.955
2,2,Animal production and aquaculture,156586.946,153113.064,150649.245,155989.154,154027.864,170046.213,168793.154,165378.82,...,187034.475,192759.691,196314.93,171397.064,176127.985,197174.793,195149.565,193796.708,216011.501,222623.275
3,3,Forestry,5447.094,6582.076,7870.489,7482.575,8469.191,9240.114,9022.517,8721.551,...,4290.095,3549.681,3669.264,4053.104,3930.171,3925.115,4095.352,3893.746,3917.817,3768.085
4,4,Logging,14349.754,15708.266,17277.067,15202.156,15801.997,16040.069,15987.916,16370.555,...,13687.112,12346.719,12499.021,13915.862,13403.532,13438.668,14018.022,13282.207,13850.782,12590.164
5,5,"Fishing, hunting and trapping",9625.304,10532.185,10174.913,8148.947,9048.228,9215.99,9170.647,9025.255,...,8561.737,8492.931,8538.985,9315.955,9607.551,9214.639,9248.486,9158.06,8948.812,8621.897


In [12]:
categories = np.array(["Agriculture, forestry, fishing and hunting", "Mining", "Utilities", "Construction", 
          "Manufacturing", "Wholesale trade", "Retail trade", "Transportation and warehousing", "Information", 
          "Finance and insurance", "Real estate and rental and leasing", "Professional, scientific, and technical services",
          "Management of companies and enterprises",
          "Administrative and support and waste management and remediation services", "Educational services",
          "Health care and social assistance", "Arts, entertainment, and recreation", "Accommodation and food services",
          "Other services  (except public administration)", "Government", "Special industries"])


In [13]:
full_categories = categories.repeat([6, 5, 3, 1, 76, 1, 4, 9, 10, 4, 5, 9, 1, 9, 3, 12, 8, 2, 12, 20, 5])

Now we have successfully merged the sector index with the Industry/Commodity Description. 
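
`repeat` with a list of counts is what expands the 21 category names into one label per sector. A small sketch with made-up names and counts, plus a check that the counts used in this notebook add up to the 205 sector rows:

```python
import numpy as np

# Hypothetical short example: 3 categories covering 6 sectors
toy_categories = np.array(["Agriculture", "Mining", "Utilities"])
toy_counts = [3, 2, 1]

# Each name is repeated as many times as its count, in order
toy_full = toy_categories.repeat(toy_counts)

# The counts used for full_categories, summed to confirm one label per sector
notebook_counts = [6, 5, 3, 1, 76, 1, 4, 9, 10, 4, 5, 9, 1, 9, 3, 12, 8, 2, 12, 20, 5]
total_sectors = sum(notebook_counts)
```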

# 2) Data Visualization

**If you haven't installed pyvis yet (a network visualization library), run the line of code below.**

In [14]:
pip install pyvis;

Collecting pyvis
  Using cached pyvis-0.1.9-py3-none-any.whl (23 kB)
Collecting jsonpickle>=1.4.1
  Using cached jsonpickle-2.0.0-py2.py3-none-any.whl (37 kB)
Installing collected packages: jsonpickle, pyvis
Successfully installed jsonpickle-2.0.0 pyvis-0.1.9
Note: you may need to restart the kernel to use updated packages.


In [15]:
import networkx as nx
from pyvis.network import Network 

In [16]:
# read in input-output matrices for a given time frame; let's say 2018-2020
REAL_USE_2018 = pd.read_csv("REAL_USE/REAL_USE_2018.csv")
REAL_USE_2019 = pd.read_csv("REAL_USE/REAL_USE_2019.csv")
REAL_USE_2020 = pd.read_csv("REAL_USE/REAL_USE_2020.csv")


In [17]:
# resetting column titles and indices to improve legibility
REAL_USE_2018 = REAL_USE_2018.set_index("SECTORNUMBER")
REAL_USE_2018.columns = sect_plan["Industry/Commodity Description"]

## 1) Network Visualization

### We want to achieve something of this sort 
<img src="images/figure3.png" alt="fishy" class="bg-primary mb-1" width="600px">

### 1) Exploring how to visualize networks

To figure out how to visualize these IO matrices, I opted to scale down the IO matrix to a 55 x 55 block (for now). This lets me check that my node-connecting algorithm is working properly. It also helps because pyvis only handles larger networks well once proper filtering mechanisms are in place.

In [18]:
# Modifies/scales down the IO matrix (will modify later to include the whole 206 x 206 matrix)
modified_REAL_USE_2018 = REAL_USE_2018.iloc[0:55]
modified_REAL_USE_2018 = modified_REAL_USE_2018.iloc[:, 0:55]
modified_REAL_USE_2018.head(10)

Industry/Commodity Description,Crop production,Animal production and aquaculture,Forestry,Logging,"Fishing, hunting and trapping",Support activities for agriculture and forestry,Oil and gas extraction,Coal mining,Metal ore mining,Nonmetallic mineral mining and quarrying,...,Glass and glass product manufacturing,Cement and concrete product manufacturing,"Lime, gypsum and other nonmetallic mineral product manufacturing",Iron and steel mills and ferroalloy manufacturing,Steel product manufacturing from purchased steel,Alumina and aluminum production and processing,Nonferrous metal (except aluminum) production and processing,Foundries,Forging and stamping,Cutlery and handtool manufacturing
SECTORNUMBER,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
1.0,27150.515,15224.821,0.0,0.0,0.0,180.734,0.0,8.756,25.015,12.209,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2.0,741.142,40982.823,3.975,0.0,0.0,222.967,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3.0,0.0,0.0,121.803,2842.618,0.0,8.109,0.0,0.0,0.0,0.0,...,0.0,0.0,0.98,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4.0,0.0,0.0,0.0,855.545,0.0,0.0,0.0,78.678,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6.0,18949.082,2363.17,2743.25,308.677,16.402,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7.0,0.0,0.0,0.0,0.0,0.0,0.0,35107.31,0.007,0.03,0.29,...,0.128,0.048,0.131,0.208,0.0,12.846,0.0,0.253,0.027,0.0
8.0,0.0,619.819,0.0,0.694,0.0,0.0,127.792,3228.336,56.133,56.98,...,415.267,604.41,520.646,2509.529,22.178,124.041,60.441,82.523,1.024,0.373
9.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2084.465,0.0,...,14.239,7.354,295.355,6346.105,152.73,680.666,5248.793,0.0,0.0,0.0
10.0,1768.965,129.216,0.0,0.0,0.0,0.225,0.0,324.216,6.372,578.83,...,336.87,3544.656,1514.961,27.957,0.0,0.0,0.0,114.235,0.0,0.521


In [19]:
# initializes the network
IO_network = Network('1200px', '1200px')

# initializes nodes and titles, respectively
nodes = np.arange(0, 55)
titles = modified_REAL_USE_2018.columns

# adds nodes to our network corresponding to the index in modified_REAL_USE_2018
IO_network.add_nodes(nodes, title = titles)

In [20]:
# adds edges to the nodes where a certain good/industry plays a role in the production of another
i = 0
scalar = .5
for x in modified_REAL_USE_2018:
    j = 0
    total = sum(modified_REAL_USE_2018[x])
    for value in modified_REAL_USE_2018[x]:
        if value > 0: 
            size_index = (value / total) * scalar
            IO_network.add_edge(i, j, x = size_index, weight= size_index)
        j = j + 1
    i = i + 1


In [21]:
# sets up the options of visualization 
IO_network.set_options(
"""
var options = {
  "physics": {
    "barnesHut": {
      "gravitationalConstant": -21500,
      "centralGravity": 0,
      "springLength": 75,
      "damping": 0.96,
      "avoidOverlap": 0.94
    },
    "maxVelocity": 15,
    "minVelocity": 1,
    "timestep": 0.4
  },
    "nodes": {
    "borderWidth": 2,
    "color": {
      "border": "rgba(77,106,233,1)",
      "highlight": {
        "border": "rgba(233,68,50,1)",
        "background": "rgba(255,124,137,1)"
      },
      "hover": {
        "border": "rgba(81,198,233,1)",
        "background": "rgba(161,244,255,1)"
      }
    },
    "font": {
      "size": 20,
      "face": "verdana",
      "strokeWidth": 4
    },
    "scaling": {
      "min": 6,
      "max": 72
    },
    "shapeProperties": {
      "borderRadius": 3
    },
    "size": 21
  }
}
""")


IO_network.show("IO_network_55.html") # visualizes the networks in X year (in this case 2018)

#### 1) Network visualization with 55 nodes.

We currently can't go to a large number of nodes (n > 70) without really slow loading times. Also the visualization is **really cluttered**.
##### Figure 2
<img src="images/REAL_USE_2018_55nodes.png" alt="fishy" class="bg-primary mb-1" width="800px">

**The visualization above may have too many edges, so let's filter edges using the MIT paper's rule: a connection must be greater than 5% of total input.**
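
A minimal sketch of that 5% rule, written with NumPy instead of nested loops. `filtered_edges` and the 3 x 3 `toy` matrix are hypothetical; the (column, row) ordering mirrors the `add_edge(i, j)` calls in this notebook, where `i` is the column position and `j` is the row position.

```python
import numpy as np

def filtered_edges(matrix, threshold=0.05):
    """Return (column, row, share) triples where a cell exceeds `threshold` of its column total."""
    matrix = np.asarray(matrix, dtype=float)
    totals = matrix.sum(axis=0)                        # total input used by each column
    safe_totals = np.where(totals == 0, 1.0, totals)   # avoid dividing by zero
    shares = matrix / safe_totals                      # each cell as a share of its column total
    rows, cols = np.nonzero(shares > threshold)
    return [(int(c), int(r), float(shares[r, c])) for r, c in zip(rows, cols)]

# Made-up matrix: column 0 gets 96% of its input from row 0 and only 4% from row 1
toy = np.array([[96.0,  0.0, 5.0],
                [ 4.0, 10.0, 5.0],
                [ 0.0,  0.0, 0.0]])
edges = filtered_edges(toy)
pairs = [(c, r) for c, r, _ in edges]
```

The 4% link from row 1 into column 0 falls under the threshold and is dropped, while the other nonzero cells survive.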

Define a color map that assigns a color to each industry category.

In [22]:
colors = pd.Series(['#042333ff', '#0c2a50ff', '#13306dff', '#253582ff',
                 '#403891ff', '#593d9cff', '#6b4596ff', '#7e4e90ff',
                 '#90548bff', '#a65c85ff', '#b8627dff', '#cc6a70ff',
                 '#de7065ff', '#eb8066ff', '#f68f46ff', '#f9a242ff',
                 '#f9b641ff', '#f7cb44ff', '#c0369d', '#d44292', '#a3319f'])

In [23]:
colors = list(colors.repeat([6, 5, 3, 1, 76, 1, 4, 9, 10, 4, 5, 9, 1, 9, 3, 12, 8, 2, 12, 20, 5]))

Let's try visualizing with a scaled-down IO matrix.

In [24]:
# Modifies/scales down the IO matrix (will modify later to include the whole 206 x 206 matrix)
modified_REAL_USE_2018_1 = REAL_USE_2018.iloc[0:55]
modified_REAL_USE_2018_1 = modified_REAL_USE_2018_1.iloc[:, 0:55]

In [25]:
# initializes the network
IO_network_modified = Network('1200px', '1200px')

# initializes nodes and titles, respectively
nodes = np.arange(0, 55)
titles = modified_REAL_USE_2018_1.columns

# adds nodes to our network corresponding to the index in modified_REAL_USE_2018
IO_network_modified.add_nodes(nodes, title = titles, )

In [26]:
# make a dict that maps category to color
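
One possible way to build that dictionary is to zip the category names with the color codes. The short lists below are hypothetical stand-ins for the 21-entry `categories` array and the 21 color codes defined earlier.

```python
# Hypothetical short lists standing in for the 21 categories and 21 colors
toy_categories = ["Agriculture", "Mining", "Utilities"]
toy_colors = ["#042333ff", "#0c2a50ff", "#13306dff"]

# Pair each category with the color in the same position
category_to_color = dict(zip(toy_categories, toy_colors))

# Per-sector category labels can then be turned into per-sector colors
toy_sector_labels = ["Agriculture", "Agriculture", "Mining", "Utilities"]
sector_colors = [category_to_color[label] for label in toy_sector_labels]
```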


In [27]:
# adds edges to the nodes where a certain good/industry plays a role in the production of another
i = 0
scalar = .5
for x in modified_REAL_USE_2018_1:
    j = 0
    total = sum(modified_REAL_USE_2018_1[x])
    for value in modified_REAL_USE_2018_1[x]:
        if value > .05 * total: 
            size_index = (value / total) * scalar
            IO_network_modified.add_edge(i, j, x = size_index, weight= size_index)
        j = j + 1
    i = i + 1


In [28]:
# sets up the options of visualization 
IO_network_modified.set_options(
"""
var options = {
  "physics": {
    "barnesHut": {
      "gravitationalConstant": -21500,
      "centralGravity": 0,
      "springLength": 75,
      "damping": 0.96,
      "avoidOverlap": 0.94
    },
    "maxVelocity": 15,
    "minVelocity": 1,
    "timestep": 0.4
  },
    "nodes": {
    "borderWidth": 2,
    "color": {
      "border": "rgba(77,106,233,1)",
      "highlight": {
        "border": "rgba(233,68,50,1)",
        "background": "rgba(255,124,137,1)"
      },
      "hover": {
        "border": "rgba(81,198,233,1)",
        "background": "rgba(161,244,255,1)"
      }
    },
    "font": {
      "size": 20,
      "face": "verdana",
      "strokeWidth": 4
    },
    "scaling": {
      "min": 6,
      "max": 72
    },
    "shapeProperties": {
      "borderRadius": 3
    },
    "size": 21
  }
}
""")


IO_network_modified.show("IO_network_modified_55.html") # visualizes the networks in X year (in this case 2018)

#### 2) Modified visualization with 55 nodes.

This visualization is much less cluttered and easier to read! Below I test whether it can visualize the full IO matrix effectively.


**Key observation** 
- The network reads "input to output", as IO matrices are conventionally interpreted. When you click on a node, red lines connect it to other nodes; this means that the clicked node is an INPUT for the nodes it is connected to (the OUTPUT nodes in this case).

##### Figure 3
<img src="images/REAL_USE_2018_55nodes_modified.png" alt="fishy" class="bg-primary mb-1" width="600px">





In [29]:
# Keeps the 205 x 205 intermediate block of the 206 x 206 matrix (drops the value added row and the final demand column)
modified_REAL_USE_2018_2 = REAL_USE_2018.iloc[0:205]
modified_REAL_USE_2018_2 = modified_REAL_USE_2018_2.iloc[:, 0:205]

In [30]:
# initializes the network
IO_network_modified1 = Network('1400px', '1400px')

# initializes nodes and titles, respectively
nodes = np.arange(0, 205)
titles = modified_REAL_USE_2018_2.columns

# adds nodes to our network corresponding to the index in modified_REAL_USE_2018
IO_network_modified1.add_nodes(nodes, title = titles)

In [31]:
# adds edges to the nodes where a certain good/industry plays a role in the production of another
i = 0
scalar = .5
for x in modified_REAL_USE_2018_2:
    j = 0
    total = sum(modified_REAL_USE_2018_2[x])
    for value in modified_REAL_USE_2018_2[x]:
        if value > .05 * total: 
            size_index = (value / total) * scalar
            IO_network_modified1.add_edge(i, j, x = size_index, weight= size_index)
        j = j + 1
    i = i + 1


In [32]:
# sets up the options of visualization 
IO_network_modified1.set_options(
"""
var options = {
  "physics": {
    "barnesHut": {
      "gravitationalConstant": -21500,
      "centralGravity": 0,
      "springLength": 75,
      "damping": 0.96,
      "avoidOverlap": 0.94
    },
    "maxVelocity": 15,
    "minVelocity": 1,
    "timestep": 0.4
  },
    "nodes": {
    "borderWidth": 2,
    "color": {
      "border": "rgba(77,106,233,1)",
      "highlight": {
        "border": "rgba(233,68,50,1)",
        "background": "rgba(255,124,137,1)"
      },
      "hover": {
        "border": "rgba(81,198,233,1)",
        "background": "rgba(161,244,255,1)"
      }
    },
    "font": {
      "size": 20,
      "face": "verdana",
      "strokeWidth": 4
    },
    "scaling": {
      "min": 6,
      "max": 72
    },
    "shapeProperties": {
      "borderRadius": 3
    },
    "size": 25
  }
}
""")


IO_network_modified1.show("IO_network_modified_all.html") # visualizes the networks in X year (in this case 2018)

#### 3) Modified visualization with all nodes (The Full IO Matrix).

Amazing, we can visualize the network for the entire year. This visualization is much less cluttered and easier to read!

##### Figure 4
<img src="images/REAL_USE_2018_all.png" alt="fishy" class="bg-primary mb-1" width="600px">

##### Figure 5
<img src="images/REAL_USE_2018_all_nooutliers.png" alt="fishy" class="bg-primary mb-1" width="600px">

##### Figure 6 (with color mapping to industry)
<img src="images/figure6.png" alt="fishy" class="bg-primary mb-1" width="600px">

Figure 6 is a little too "noisy". We can try to reduce the number of links by increasing the threshold and see how it looks! To do so, I'll define a function that takes in data (a BLS-style I/O matrix) and a threshold (a fraction between 0 and 1). Even then, I would highly discourage using color mapping to industry. While it provides useful information, **the data is FAR too noisy.**

Just an example to demonstrate the noise... 

##### Figure 7 (with color mapping to industry, threshold = .1)
<img src="images/figure7.png" alt="fishy" class="bg-primary mb-1" width="600px">

**Color mapping with larger thresholds seems like a good idea though**

**To aid the visualization, let's create a function that takes an IO matrix and visualizes it as a network.**

In [33]:
def create_network(data, threshold = .05, scalar = .5):
    if threshold > 1 or threshold < 0:
        print("Threshold must be a decimal between 0 and 1") 
        return 
        
        
    # initializes the network
    IO_network = Network('1400px', '1400px')

    # initializes nodes (one per column of data) and titles, respectively
    nodes = np.arange(0, len(data.columns))
    titles = data.columns

    # adds nodes to our network corresponding to the index in data
    IO_network.add_nodes(nodes, title = titles) # optional arg: color = viridis_magma
    
    # adds edges to the nodes where a certain good/industry plays a role in the production of another
    # scalar rescales the edge weights (passed in so the function does not depend on a global)
    i = 0
    for x in data:
        j = 0
        total = sum(data[x])
        for value in data[x]:
            if value > threshold * total: 
                size_index = (value / total) * scalar
                IO_network.add_edge(i, j, x = size_index, weight= size_index)
            j = j + 1
        i = i + 1
        
    # sets up the options of visualization 
    IO_network.set_options(
    """
    var options = {
      "physics": {
        "barnesHut": {
          "gravitationalConstant": -21500,
          "centralGravity": 0,
          "springLength": 75,
          "damping": 0.96,
          "avoidOverlap": 0.94
        },
        "maxVelocity": 15,
        "minVelocity": 1,
        "timestep": 0.4
      },
        "nodes": {
        "borderWidth": 2,
        "color": {
          "border": "rgba(77,106,233,1)",
          "highlight": {
            "border": "rgba(233,68,50,1)",
            "background": "rgba(255,124,137,1)"
          },
          "hover": {
            "border": "rgba(81,198,233,1)",
            "background": "rgba(161,244,255,1)"
          }
        },
        "font": {
          "size": 20,
          "face": "verdana",
          "strokeWidth": 4
        },
        "scaling": {
          "min": 6,
          "max": 72
        },
        "shapeProperties": {
          "borderRadius": 3
        },
        "size": 25
      }
    }
    """)


    IO_network.show("IO_network.html") # visualizes the networks in X year 

In [35]:
create_network(modified_REAL_USE_2018_2, threshold = .15)

In [37]:
# creating the function that returns the table of top connections for an industry

# industry input should be all lowercase

def topconnections(data, industry, topx = 10, magnitude = False, show_sector = False):
    # work on a copy so the caller's DataFrame keeps its original labels
    data = data.copy()
    data.index = [x.lower() for x in data.columns]
    data.columns = [x.lower() for x in data.columns]
    
    # stop early instead of failing later with a KeyError
    if industry not in data.columns:
        print("Industry is Spelled Wrong or Not Lowercase")
        return
    
    top = data.loc[:, [industry]].copy()
    if magnitude:
        # share of the column total contributed by each row
        top['magnitude'] = top[industry] / top[industry].sum()
    if show_sector:
        # full_categories holds one category label per row of the 205 x 205 matrix
        top['sector'] = full_categories
        
    return top.sort_values(by = industry, ascending = False).head(topx)
    

In [40]:
# let's test out our function on the wholesale trade industry
topconnections(modified_REAL_USE_2018_2, 'wholesale trade', topx = 15, magnitude = True, show_sector = True)

Unnamed: 0,wholesale trade,magnitude,sector
real estate,88046.531,0.116703,Real estate and rental and leasing
management of companies and enterprises,69125.304,0.091624,Management of companies and enterprises
"advertising, public relations, and related services",57004.54,0.075558,"Professional, scientific, and technical services"
wholesale trade,50385.358,0.066784,Wholesale trade
"management, scientific, and technical consulting services",26240.112,0.034781,"Professional, scientific, and technical services"
insurance carriers,24496.634,0.03247,Finance and insurance
couriers and messengers,23898.671,0.031677,Transportation and warehousing
employment services,21911.157,0.029043,Administrative and support and waste managemen...
business support services,21904.09,0.029033,Administrative and support and waste managemen...
computer systems design and related services,16370.015,0.021698,"Professional, scientific, and technical services"


## 2) Tensorflow Visualization

## 3) Time Series Visualization/Analysis

In [36]:
# def plot_IO_matrix(data, industry, start, end):
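
As a possible starting point for this stub, here is a sketch that pulls one industry's values for a range of years out of a frame shaped like `IO_industry_ts`. The helper name `industry_series` is hypothetical, the toy values are rounded from the industry output table earlier in the notebook, and the code assumes the year columns are strings such as `"2018"`.

```python
import pandas as pd

def industry_series(data, industry, start, end):
    """Return one industry's values for the year columns from `start` to `end` inclusive."""
    years = [str(y) for y in range(start, end + 1)]
    row = data.loc[data["Industry/Commodity Description"] == industry, years]
    if row.empty:
        raise ValueError(f"Unknown industry: {industry}")
    series = row.iloc[0].astype(float)
    series.index = series.index.astype(int)   # use integer years on the x axis
    return series

# Tiny made-up frame shaped like IO_industry_ts
toy_ts = pd.DataFrame({
    "Sector": [1, 3],
    "Industry/Commodity Description": ["Crop production", "Forestry"],
    "2018": [286077.0, 3893.7],
    "2019": [301995.3, 3917.8],
    "2020": [306287.0, 3768.1],
})
forestry = industry_series(toy_ts, "Forestry", 2018, 2020)
```

Calling `forestry.plot()` would then draw the line chart when matplotlib is installed.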
    

# 3) Next Steps

I am looking at ways to 
1) optimize my visualization technique so that it is "cleaner" and easier to read.
- I'm thinking of not counting "insubstantial" connections within the IO matrix. For example, if the input of wood used to produce, let's say, electronics is less than X amount, I will not count it as a **strong enough connection** to create an edge between those two nodes. **Considering working with some sort of ratio relative to the total amount in each column (good/product).**
- Ways to make this network visualization more effective 
    - adjusting node size by the proportion of good x used to produce good y
    - perhaps making node color correspond to each specific industry category as we saw in the Final Demand/Sect Plan datasets
    - etc.
- Completed 11/9/2021 (mapping to sector portion)

2) find a way to make it easy to plug in IO matrix data for X year and immediately visualize it such that it is easy to compare between a range of years.
- Completed 11/5/2021

3) figure out tensorflow visualization (still wrapping my head around it).

4) find a way to add a legend to colors. Also find a color mapping that is easier to read. Distinct colors but not TOO distinct.

5) create functions to help aid the network visualization
- top 10 industries that x industry feeds into
    - Completed 11/13/2021
- network initializer/creator function 
    - Completed 11/13/2021
    
6) API work: find an API that provides the data that I have downloaded
- not available from BLS
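
For item 2, one way to make the year pluggable is to build the file path from the year. This sketch follows the `REAL_USE/REAL_USE_<year>.csv` naming used when reading the 2018-2020 files; `use_table_path` is a hypothetical helper, and the 1997-2020 bounds are an assumption borrowed from the output time series.

```python
from pathlib import Path

def use_table_path(year, folder="REAL_USE", first_year=1997, last_year=2020):
    """Build the CSV path for one year's real use table."""
    if not first_year <= year <= last_year:
        raise ValueError(f"year must be between {first_year} and {last_year}")
    return Path(folder) / f"REAL_USE_{year}.csv"

# One path per year in the range to compare
paths = {year: use_table_path(year) for year in range(2018, 2021)}
```

Each path could then be passed to `pd.read_csv(...)` followed by `.set_index("SECTORNUMBER")`, as done for 2018 earlier.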



