# Contents

1. Publish files to the knowledge graph.
2. Download files from the knowledge graph.
3. Union the downloaded files and save the result.
4. Publish the result to the knowledge graph.
5. Download the result and analyze it.

Each chunk of steps could be done by a different researcher (an analyst or a journalist): (1), (2-4), (5).

# 1. Publish files

Publish `Govt_Units_2017_Final_General_Purpose.csv` and `Govt_Units_2017_Final_Special_District.csv` to the knowledge graph.

In [33]:
import kgpl_client as kgpl

In [12]:
path1 = '/home/jennyvo/KNP/files_to_publish/Govt_Units_2017_Final_General_Purpose.csv'
file = kgpl.File(path1)
value = kgpl.value(file, "General_Purpose", "Jenny")
kgpl.variable(value, "General_Purpose from Jenny", "Jenny")

path2 = '/home/jennyvo/KNP/files_to_publish/Govt_Units_2017_Final_Special_District.csv'
file = kgpl.File(path2)
value = kgpl.value(file, "Special_District", "Jenny")
kgpl.variable(value, "Special_District from Jenny", "Jenny")

### Success!

Command-line output:
```
(env) jennyvo@lasagna:~/KNP/client$ python3 pub.py
Created: KGPLValue with ID http://lasagna.eecs.umich.edu:8080/val/0
Created: KGPLVariable with ID http://lasagna.eecs.umich.edu:8080/var/0
Created: KGPLValue with ID http://lasagna.eecs.umich.edu:8080/val/1
Created: KGPLVariable with ID http://lasagna.eecs.umich.edu:8080/var/1
```

<img src="./screenshots/visualization_var0_var1.png"/>
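
The publish pattern used in this notebook is: wrap the path in `kgpl.File`, pass that to `kgpl.value`, then pass the value to `kgpl.variable`. A minimal sketch of a helper that captures the pattern follows. `publish_file` is a hypothetical name, not part of `kgpl_client`; it relies only on the calls that appear in this notebook and takes the client module as an argument, so it can be tried without a server.

```python
def publish_file(client, path, label, user):
    """Publish one file and return whatever client.variable returns.

    `client` is assumed to expose File, value and variable the way
    kgpl_client is called in this notebook.
    """
    file = client.File(path)                  # wrap the local path
    value = client.value(file, label, user)   # create the value
    # Bind the value to a named variable, e.g. "General_Purpose from Jenny".
    return client.variable(value, f"{label} from {user}", user)
```

With the real client this would be called as `publish_file(kgpl, path1, "General_Purpose", "Jenny")`.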

# 2. Download files

Download files from the knowledge graph.

In [13]:
VAR_ID_GIVEN_BY_USER1 = 'http://lasagna.eecs.umich.edu:8080/var/0'

needed_var_id = kgpl.load_var(VAR_ID_GIVEN_BY_USER1)
val_kgpl = kgpl.load_val(needed_var_id.val_id)
data_source = val_kgpl.val

In [14]:
data_source

<kgpl_client.kgpl.File at 0x7f710444f750>

In [16]:
path_to_var0 = data_source.filename
path_to_var0

'/home/jennyvo/KNP/files_to_publish/Govt_Units_2017_Final_General_Purpose.csv'

In [17]:
VAR_ID_GIVEN_BY_USER1 = 'http://lasagna.eecs.umich.edu:8080/var/1'

needed_var_id = kgpl.load_var(VAR_ID_GIVEN_BY_USER1)
val_kgpl = kgpl.load_val(needed_var_id.val_id)
data_source = val_kgpl.val
path_to_var1 = data_source.filename
path_to_var1

'/home/jennyvo/KNP/files_to_publish/Govt_Units_2017_Final_Special_District.csv'
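
Both downloads above follow the same chain: variable ID, then the value it points to, then the `File` stored in that value. A minimal sketch of a helper for that chain is below. `resolve_file_path` is a hypothetical name; it uses only the calls and attributes seen in this notebook (`load_var`, `load_val`, `val_id`, `val`, `filename`) and takes the client module as an argument.

```python
def resolve_file_path(client, var_id):
    """Follow variable -> value -> File and return the File's filename."""
    variable = client.load_var(var_id)          # look up the variable by its ID
    value = client.load_val(variable.val_id)    # fetch the value it points to
    return value.val.filename                   # path stored in the File object
```

With the real client, `resolve_file_path(kgpl, 'http://lasagna.eecs.umich.edu:8080/var/0')` would be expected to return the same path as `path_to_var0`.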

# 3. Union downloaded files

1. Load the dataframes.
    - Govt_Units_2017_Final_General_Purpose.csv
    - Govt_Units_2017_Final_Special_District.csv
2. Add a column with the name of the source file to each dataframe.
3. Combine the dataframes.
4. Save the resultant dataframe.
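
The four steps above can be sketched end to end on tiny in-memory data. The rows and the short file names below are made up for illustration; only the column names come from the real files.

```python
import io
import pandas as pd

# Tiny stand-ins for the two CSV files (made-up rows, not Census data).
general_csv = io.StringIO("CENSUS_ID,NAME,UNIT_TYPE\n1,COUNTY A,1 - COUNTY\n")
special_csv = io.StringIO(
    "CENSUS_ID,NAME,FUNCTION_NAME\n2,WATER DISTRICT B,91 - Water Supply Utility\n"
)

# 1. Load the dataframes.
df_a = pd.read_csv(general_csv)
df_b = pd.read_csv(special_csv)

# 2. Tag each row with its source (hypothetical file names).
df_a["ORIGIN_FILE"] = "general.csv"
df_b["ORIGIN_FILE"] = "special.csv"

# 3. Stack the rows; columns are matched by name.
df_all = pd.concat([df_a, df_b], axis=0, ignore_index=True)

# 4. Save without the row index (to a string here instead of a file).
csv_text = df_all.to_csv(index=False)
print(csv_text)
```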

In [18]:
import numpy as np
import pandas as pd

### 1. Load dataframes


In [19]:
df_general = pd.read_csv(path_to_var0)
df_special = pd.read_csv(path_to_var1, encoding="ISO-8859-1")
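
The second file is read with an explicit `ISO-8859-1` encoding. The notebook does not say why; a plausible reason (an assumption) is that the file contains bytes that are not valid UTF-8, which is the default encoding for `pd.read_csv`. The sketch below shows the difference on a made-up byte string.

```python
raw = b"CA\xd1ON CITY"   # made-up bytes; 0xD1 is "Ñ" in ISO-8859-1

try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    # 0xD1 announces a two-byte UTF-8 sequence, but "O" is not a continuation byte.
    utf8_ok = False

latin1_text = raw.decode("iso-8859-1")   # every byte maps to exactly one character
print(utf8_ok, latin1_text)
```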

### 2. Add a column with the name of the source file to each dataframe

In [20]:
# Assigning a scalar broadcasts it to every row.
df_general['ORIGIN_FILE'] = path_to_var0
df_special['ORIGIN_FILE'] = path_to_var1
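
The cell above stores the full path in `ORIGIN_FILE`. If only the file name is wanted, as the heading suggests, one option is to strip the directory part first. This is a sketch of an alternative, not what the notebook does; `PurePosixPath` is used so the result does not depend on the operating system running the code.

```python
from pathlib import PurePosixPath

path = "/home/jennyvo/KNP/files_to_publish/Govt_Units_2017_Final_General_Purpose.csv"
file_name = PurePosixPath(path).name   # drop the directory part
print(file_name)
```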

In [21]:
df_general.head()

Unnamed: 0,CENSUS_ID,NAME,UNIT_TYPE,TITLE,ADDRESS1,ADDRESS2,CITY,STATE_AB,ZIP,ZIP4,WEB_ADDRESS,POPULATION,POPULATION_YEAR,FIPS_STATE,FIPS_COUNTY,FIPS_PLACE,COUNTY_AREA_NAME,ORIGIN_FILE
0,1100100100000,COUNTY OF AUTAUGA,1 - COUNTY,ADMINSTRATOR,135 N COURT ST,STE B,PRATTVILLE,AL,36067.0,3049.0,http://www.autaugaco.org,55504,2017,1,1,99001.0,AUTAUGA,/home/jennyvo/KNP/files_to_publish/Govt_Units_...
1,1100200200000,COUNTY OF BALDWIN,1 - COUNTY,COUNTY ADMINISTRATOR,312 COURTHOUSE SQ,STE 12,BAY MINETTE,AL,36507.0,4809.0,http://www.baldwincountyal.gov,212628,2017,1,3,99003.0,BALDWIN,/home/jennyvo/KNP/files_to_publish/Govt_Units_...
2,1100300300000,COUNTY OF BARBOUR,1 - COUNTY,CHAIRMAN,PO BOX 398,,CLAYTON,AL,36016.0,398.0,,25270,2017,1,5,99005.0,BARBOUR,/home/jennyvo/KNP/files_to_publish/Govt_Units_...
3,1100400400000,COUNTY OF BIBB,1 - COUNTY,ADMINISTRATOR,157 SW DAVIDSON DR,,CENTREVILLE,AL,35042.0,2277.0,,22668,2017,1,7,99007.0,BIBB,/home/jennyvo/KNP/files_to_publish/Govt_Units_...
4,1100500500000,COUNTY OF BLOUNT,1 - COUNTY,COUNTY ADMINISTRATOR,220 2ND AVE E,STE 106,ONEONTA,AL,35121.0,1702.0,http://www.co.blount.al.us/,58013,2017,1,9,99009.0,BLOUNT,/home/jennyvo/KNP/files_to_publish/Govt_Units_...


In [22]:
df_special.head()

Unnamed: 0,CENSUS_ID,NAME,FUNCTION_NAME,TITLE,ADDRESS1,ADDRESS2,CITY,STATE_AB,ZIP,ZIP4,WEB_ADDRESS,FIPS_STATE,FIPS_COUNTY,COUNTY_AREA_NAME,ORIGIN_FILE
0,1400111000000,FIVE STAR WATER SUPPLY DISTRICT,91 - Water Supply Utility,ADMINISTRATOR,PO BOX 680870,,PRATTVILLE,AL,36068,870.0,,1,1,AUTAUGA,/home/jennyvo/KNP/files_to_publish/Govt_Units_...
1,1400118300000,AUTAUGA COUNTY SEWER AUTHORITY,80 - Sewerage,AUTAUGA COUNTY COMMISION,135 N COURT STREET,SUITE B,PRATTVILLE,AL,36067,,,1,1,AUTAUGA,/home/jennyvo/KNP/files_to_publish/Govt_Units_...
2,1400120100000,PRATTVILLE AIRPORT AUTHORITY,01 - Air Transportation,CHAIRMAN,1450 AVIATION WAY,,PRATTVILLE,AL,36067,7336.0,http://prattvilleairport.8k.com/,1,1,AUTAUGA,/home/jennyvo/KNP/files_to_publish/Govt_Units_...
3,1400120300000,WEST AUTAUGA WATER AUTHORITY,91 - Water Supply Utility,GENERAL MANAGER,PO BOX 400,,AUTAUGAVILLE,AL,36003,400.0,http://www.westautaugawater.org,1,1,AUTAUGA,/home/jennyvo/KNP/files_to_publish/Govt_Units_...
4,1400120400000,NORTH DALLAS WATER AUTHORITY,91 - Water Supply Utility,CLERK,7590 AL HIGHWAY 22 N,,VALLEY GRANDE,AL,36701,9303.0,,1,1,AUTAUGA,/home/jennyvo/KNP/files_to_publish/Govt_Units_...


### 3. Combine the dataframes

In [23]:
df_general.columns

Index(['CENSUS_ID', 'NAME', 'UNIT_TYPE', 'TITLE', 'ADDRESS1', 'ADDRESS2',
       'CITY', 'STATE_AB', 'ZIP', 'ZIP4', 'WEB_ADDRESS', 'POPULATION',
       'POPULATION_YEAR', 'FIPS_STATE', 'FIPS_COUNTY', 'FIPS_PLACE',
       'COUNTY_AREA_NAME', 'ORIGIN_FILE'],
      dtype='object')

In [24]:
df_special.columns

Index(['CENSUS_ID', 'NAME', 'FUNCTION_NAME', 'TITLE', 'ADDRESS1', 'ADDRESS2',
       'CITY', 'STATE_AB', 'ZIP', 'ZIP4', 'WEB_ADDRESS', 'FIPS_STATE',
       'FIPS_COUNTY', 'COUNTY_AREA_NAME', 'ORIGIN_FILE'],
      dtype='object')

In [26]:
df_general_special = pd.concat([df_general,df_special], axis=0, ignore_index=True)
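
`pd.concat` with `axis=0` matches columns by name and fills any column that one input lacks with `NaN`. A side effect is that an integer column can come back as float, because `NaN` is a float; this is likely why `POPULATION_YEAR` appears as `2017.0` in the combined dataframe. A small sketch with made-up rows:

```python
import pandas as pd

left = pd.DataFrame({"NAME": ["COUNTY A"], "POPULATION_YEAR": [2017]})
right = pd.DataFrame({"NAME": ["DISTRICT B"], "FUNCTION_NAME": ["80 - Sewerage"]})

both = pd.concat([left, right], axis=0, ignore_index=True)

print(both.dtypes)         # POPULATION_YEAR is float64 after the concat
print(both.isna().sum())   # one missing value in each non-shared column
```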

In [27]:
df_general_special

Unnamed: 0,CENSUS_ID,NAME,UNIT_TYPE,TITLE,ADDRESS1,ADDRESS2,CITY,STATE_AB,ZIP,ZIP4,WEB_ADDRESS,POPULATION,POPULATION_YEAR,FIPS_STATE,FIPS_COUNTY,FIPS_PLACE,COUNTY_AREA_NAME,ORIGIN_FILE,FUNCTION_NAME
0,1100100100000,COUNTY OF AUTAUGA,1 - COUNTY,ADMINSTRATOR,135 N COURT ST,STE B,PRATTVILLE,AL,36067,3049.0,http://www.autaugaco.org,55504,2017.0,1,1,99001.0,AUTAUGA,/home/jennyvo/KNP/files_to_publish/Govt_Units_...,
1,1100200200000,COUNTY OF BALDWIN,1 - COUNTY,COUNTY ADMINISTRATOR,312 COURTHOUSE SQ,STE 12,BAY MINETTE,AL,36507,4809.0,http://www.baldwincountyal.gov,212628,2017.0,1,3,99003.0,BALDWIN,/home/jennyvo/KNP/files_to_publish/Govt_Units_...,
2,1100300300000,COUNTY OF BARBOUR,1 - COUNTY,CHAIRMAN,PO BOX 398,,CLAYTON,AL,36016,398.0,,25270,2017.0,1,5,99005.0,BARBOUR,/home/jennyvo/KNP/files_to_publish/Govt_Units_...,
3,1100400400000,COUNTY OF BIBB,1 - COUNTY,ADMINISTRATOR,157 SW DAVIDSON DR,,CENTREVILLE,AL,35042,2277.0,,22668,2017.0,1,7,99007.0,BIBB,/home/jennyvo/KNP/files_to_publish/Govt_Units_...,
4,1100500500000,COUNTY OF BLOUNT,1 - COUNTY,COUNTY ADMINISTRATOR,220 2ND AVE E,STE 106,ONEONTA,AL,35121,1702.0,http://www.co.blount.al.us/,58013,2017.0,1,9,99009.0,BLOUNT,/home/jennyvo/KNP/files_to_publish/Govt_Units_...,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
77316,51402340100000,CENTRAL WESTON COUNTY SOLID WASTE DISPOSAL DIS...,,TREASURER SEC,PO BOX 443,,OSAGE,WY,82723,443.0,,,,56,45,,WESTON,/home/jennyvo/KNP/files_to_publish/Govt_Units_...,81 - Solid Waste Management
77317,51402340200000,CAMBRIA IMPROVEMENT AND SERVICE DISTRICT,,SHARRON ACKERMAN,806 SALT CREEK RD,,NEWCASTLE,WY,82701,,,,,56,45,,WESTON,/home/jennyvo/KNP/files_to_publish/Govt_Units_...,91 - Water Supply Utility
77318,51402360100000,WESTON COUNTY NATURAL RESOURCE DISTRICT,,DISTRICT MANAGER,1225 WASHINGTON BLVD,STE 3,NEWCASTLE,WY,82701,2982.0,,,,56,45,,WESTON,/home/jennyvo/KNP/files_to_publish/Govt_Units_...,88 - Soil and Water Conservation
77319,51402370200000,WESTON COUNTY PREDATORY ANIMAL DISTRICT,,SECRETARY,PO BOX 358,,UPTON,WY,82730,358.0,,,,56,45,,WESTON,/home/jennyvo/KNP/files_to_publish/Govt_Units_...,59 - Other Natural Resources


### 4. Save the resultant dataframe

In [31]:
path_to_union = '/home/jennyvo/KNP/files_to_publish/Govt_Units_2017_Final_Union.csv'
df_general_special.to_csv(path_to_union, sep=',', index=False)
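
Saving without the row index matters for the round trip: if the index is written, it comes back as an extra `Unnamed: 0` column when the file is read again. A small sketch with made-up rows:

```python
import io
import pandas as pd

df = pd.DataFrame({"NAME": ["COUNTY A", "DISTRICT B"]})

# Write to CSV text and read it straight back, with and without the index.
with_index = pd.read_csv(io.StringIO(df.to_csv()))                # index written (default)
without_index = pd.read_csv(io.StringIO(df.to_csv(index=False)))  # index omitted

print(list(with_index.columns))
print(list(without_index.columns))
```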

# 4. Publish the unioned result to the knowledge graph.

In [34]:
import kgpl_client as kgpl

In [36]:
path_to_union = '/home/jennyvo/KNP/files_to_publish/Govt_Units_2017_Final_Union.csv'
file = kgpl.File(path_to_union)
value = kgpl.value(file,"General_Special_Union", "Jenny")
kgpl.variable(value, "General_Special_Union from Jenny", "Jenny")

Created: KGPLValue with ID http://lasagna.eecs.umich.edu:8080/val/3
Created: KGPLVariable with ID http://lasagna.eecs.umich.edu:8080/var/2


<kgpl_client.kgpl.KGPLVariable at 0x7f6fcd7f3710>

# 5. Download the result and analyze it.

### Download the result

In [37]:
VAR_ID_GIVEN_BY_USER2 = 'http://lasagna.eecs.umich.edu:8080/var/2'

needed_var_id = kgpl.load_var(VAR_ID_GIVEN_BY_USER2)
val_kgpl = kgpl.load_val(needed_var_id.val_id)
data_source = val_kgpl.val
path_to_var2 = data_source.filename
path_to_var2

'/home/jennyvo/KNP/files_to_publish/Govt_Units_2017_Final_Union.csv'

In [39]:
df_union = pd.read_csv(path_to_var2)
df_union


Unnamed: 0,CENSUS_ID,NAME,UNIT_TYPE,TITLE,ADDRESS1,ADDRESS2,CITY,STATE_AB,ZIP,ZIP4,WEB_ADDRESS,POPULATION,POPULATION_YEAR,FIPS_STATE,FIPS_COUNTY,FIPS_PLACE,COUNTY_AREA_NAME,ORIGIN_FILE,FUNCTION_NAME
0,1100100100000,COUNTY OF AUTAUGA,1 - COUNTY,ADMINSTRATOR,135 N COURT ST,STE B,PRATTVILLE,AL,36067,3049.0,http://www.autaugaco.org,55504,2017.0,1,1,99001.0,AUTAUGA,/home/jennyvo/KNP/files_to_publish/Govt_Units_...,
1,1100200200000,COUNTY OF BALDWIN,1 - COUNTY,COUNTY ADMINISTRATOR,312 COURTHOUSE SQ,STE 12,BAY MINETTE,AL,36507,4809.0,http://www.baldwincountyal.gov,212628,2017.0,1,3,99003.0,BALDWIN,/home/jennyvo/KNP/files_to_publish/Govt_Units_...,
2,1100300300000,COUNTY OF BARBOUR,1 - COUNTY,CHAIRMAN,PO BOX 398,,CLAYTON,AL,36016,398.0,,25270,2017.0,1,5,99005.0,BARBOUR,/home/jennyvo/KNP/files_to_publish/Govt_Units_...,
3,1100400400000,COUNTY OF BIBB,1 - COUNTY,ADMINISTRATOR,157 SW DAVIDSON DR,,CENTREVILLE,AL,35042,2277.0,,22668,2017.0,1,7,99007.0,BIBB,/home/jennyvo/KNP/files_to_publish/Govt_Units_...,
4,1100500500000,COUNTY OF BLOUNT,1 - COUNTY,COUNTY ADMINISTRATOR,220 2ND AVE E,STE 106,ONEONTA,AL,35121,1702.0,http://www.co.blount.al.us/,58013,2017.0,1,9,99009.0,BLOUNT,/home/jennyvo/KNP/files_to_publish/Govt_Units_...,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
77316,51402340100000,CENTRAL WESTON COUNTY SOLID WASTE DISPOSAL DIS...,,TREASURER SEC,PO BOX 443,,OSAGE,WY,82723,443.0,,,,56,45,,WESTON,/home/jennyvo/KNP/files_to_publish/Govt_Units_...,81 - Solid Waste Management
77317,51402340200000,CAMBRIA IMPROVEMENT AND SERVICE DISTRICT,,SHARRON ACKERMAN,806 SALT CREEK RD,,NEWCASTLE,WY,82701,,,,,56,45,,WESTON,/home/jennyvo/KNP/files_to_publish/Govt_Units_...,91 - Water Supply Utility
77318,51402360100000,WESTON COUNTY NATURAL RESOURCE DISTRICT,,DISTRICT MANAGER,1225 WASHINGTON BLVD,STE 3,NEWCASTLE,WY,82701,2982.0,,,,56,45,,WESTON,/home/jennyvo/KNP/files_to_publish/Govt_Units_...,88 - Soil and Water Conservation
77319,51402370200000,WESTON COUNTY PREDATORY ANIMAL DISTRICT,,SECRETARY,PO BOX 358,,UPTON,WY,82730,358.0,,,,56,45,,WESTON,/home/jennyvo/KNP/files_to_publish/Govt_Units_...,59 - Other Natural Resources
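
`pd.read_csv` infers a type for each column, which can turn identifier-like fields such as `ZIP` or `CENSUS_ID` into numbers. If they should stay as text, one option is to pass `dtype` when reading. The sketch below uses made-up rows; whether the real file needs this is an assumption.

```python
import io
import pandas as pd

csv_text = "CENSUS_ID,ZIP\n1100100100000,36067\n9900000000001,02134\n"  # made-up rows

inferred = pd.read_csv(io.StringIO(csv_text))
as_text = pd.read_csv(io.StringIO(csv_text), dtype={"CENSUS_ID": str, "ZIP": str})

print(inferred["ZIP"].tolist())   # parsed as integers, leading zero dropped
print(as_text["ZIP"].tolist())    # kept as the original strings
```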


### Analyze the downloaded data

In [40]:
from collections import Counter

In [42]:
titles = list(df_union.TITLE)
x = Counter(titles)

In [51]:
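# Sort (title, count) pairs by count, then flip to get descending order.
y = sorted(x.items(), key=lambda item: item[1])
y.reverse()
# Dicts keep insertion order, so the descending order is preserved.
x_sorted = {k: v for k, v in y}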
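
An alternative to sorting the items by hand is `Counter.most_common`, which returns `(item, count)` pairs from highest to lowest count. One difference to be aware of: items with equal counts keep first-seen order there, whereas the sort-then-reverse approach above lists ties in the opposite order. A sketch on a made-up sample:

```python
from collections import Counter

titles = ["MAYOR", "CLERK", "MAYOR", "CHAIRMAN", "CLERK", "MAYOR"]  # made-up sample
counts = Counter(titles)

top = counts.most_common(2)   # the two most frequent (title, count) pairs
print(top)
```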

In [52]:
x_sorted

{'MAYOR': 5787,
 'CHAIRMAN': 3121,
 'CLERK': 2890,
 'ADMINISTRATOR': 2890,
 'TREASURER': 2752,
 'EXECUTIVE DIRECTOR': 2395,
 'CITY CLERK': 2225,
 'SUPERVISOR': 2129,
 'DIRECTOR': 2110,
 'SECRETARY': 1914,
 'TRUSTEE': 1658,
 'PRESIDENT': 1430,
 'TOWN CLERK': 986,
 'MANAGER': 973,
 'OFFICE MANAGER': 917,
 'FISCAL OFFICER': 898,
 nan: 842,
 'CHAIR': 806,
 'CHAIRPERSON': 792,
 'DISTRICT MANAGER': 772,
 'CLERK TREASURER': 702,
 'TOWNSHIP CHAIRMAN': 676,
 'FIRE CHIEF': 673,
 'CITY MANAGER': 638,
 'SECRETARY TREASURER': 589,
 'GENERAL MANAGER': 548,
 'ATTORNEY': 541,
 'FINANCE DIRECTOR': 538,
 'CITY ADMINISTRATOR': 524,
 'TOWNSHIP SUPERVISOR': 522,
 'BOOKKEEPER': 516,
 'SECRETARY/TREASURER': 492,
 'VILLAGE CLERK': 449,
 'ADMINISTRATIVE ASSISTANT': 420,
 'ACCOUNTANT': 407,
 'TOWN SUPERVISOR': 400,
 'AUDITOR-CONTROLLER': 367,
 'TOWN MANAGER': 361,
 'COMMISSIONER': 352,
 'SEC TREAS': 347,
 'CITY SECRETARY': 345,
 'TOWNSHIP TRUSTEE': 333,
 'COUNTY CLERK': 303,
 'LIBRARY DIRECTOR': 302,
 'CLERK/TR