In [1]:
from main import init_system
from api.apiutils import Relation
api, reporting = init_system("/home/raulcf/datadiscovery/test/datagov/", reporting=False)

Loading: */home/raulcf/datadiscovery/test/datagov/*

### Help Menu

You can use the system through an **API** object. API objects are returned by the *init_system* function, so you can get one by doing:

***your_api_object = init_system('path_to_stored_model')***

Once you have an API object, a few concepts are useful for working with the API. **content** refers to the actual values of a given field. For example, if you have a table with an attribute called __Name__ and values *Olu, Mike, Sam*, content refers to those values: Olu, Mike, Sam.

**schema** refers to the name of a given field. In the previous example, schema refers to the word __Name__, as that is what the field is called.

Finally, **entity** refers to the *semantic type* of the content. This feature is experimental. For the previous example it would return *'person'*, as that is what those names refer to.

Certain functions require a *field* as input. In general, a field is specified by the source name (e.g. table name) and the field name (e.g. attribute name). For example, if we are interested in finding content similar to that of the attribute *year* in the table *Employee*, we can provide the field in the following way:

field = ('Employee', 'year') # field = (<source_name>, <field_name>)
Took 483.11504220962524 to load all data
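
The three concepts from the help text (schema, content, field) can be illustrated with a toy table held in a plain dictionary. This is only a sketch: the `Field` tuple, the `employee` dictionary, and the year values are hypothetical stand-ins and are not part of the system's API.

```python
from collections import namedtuple

# Hypothetical stand-in for a field reference: (source name, field name).
Field = namedtuple("Field", ["source_name", "field_name"])

# Toy table mirroring the Name example from the help text.
# The year values are made up for illustration.
employee = {
    "Name": ["Olu", "Mike", "Sam"],
    "year": [2012, 2014, 2015],
}

schema = list(employee.keys())   # schema: the names of the fields
content = employee["Name"]       # content: the actual values of one field
field = Field("Employee", "year")

print("schema:", schema)
print("content of Name:", content)
print("field:", tuple(field))
```

Schema-level searches look at names such as `Name`, while content-level searches look at the values stored under them.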


In [2]:
%matplotlib inline

## Reporting

In [2]:
reporting.num_tables

AttributeError: 'bool' object has no attribute 'num_tables'

In [3]:
reporting.num_columns

AttributeError: 'bool' object has no attribute 'num_columns'

In [10]:
reporting.num_schema_sim_relations

659449.0

In [11]:
reporting.num_content_sim_relations

78055223.0

In [12]:
reporting.num_pkfk_relations

8875018.0

## Sustainability Queries

Are there city, region, state, or federal data that we can use to compare to our performance figures/metrics?

### Building energy use intensity (energy use divided by square footage)
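
As a point of reference for the metric named in this heading, the calculation can be sketched as below. The function name and the sample numbers are hypothetical; real datasets may report source or site energy and may use different units.

```python
def energy_use_intensity(energy_kbtu, floor_area_sqft):
    """Return energy use intensity in kBtu per square foot."""
    if floor_area_sqft <= 0:
        raise ValueError("floor area must be positive")
    return energy_kbtu / floor_area_sqft


# Hypothetical building: 1,200,000 kBtu per year over 15,000 sq ft.
eui = energy_use_intensity(1_200_000, 15_000)
print(eui)  # 80.0
```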

**We start searching for tables that contain some of the terms of interest (in the table name)**

In [2]:
tables_with_energy = api.table_name_search("energy", max_results=100)
tables_with_building = api.table_name_search("building", max_results=100)
tables_with_energy.print_tables()
tables_with_building.print_tables()

total-energy-usage-937aa.csv
energy-efficiency-programs-and-estimated-energy-savings.csv
energy-employment.csv
total-energy-usage.csv
energy-efficiency-projects-95319.csv
energy-star-certified-furnaces.csv
energy-exports.csv
energy-star-certified-boilers.csv
energy-star-certification.csv
energy-exports-89cef.csv
energy-star-certified-dehumidifiers.csv
energy-employment-0f0af.csv
energy-star-certified-displays.csv
chicago-energy-benchmarking.csv
energy-usage-2010-24a67.csv
building-footprints.csv
building-footprint.csv
building-permits.csv
building-inventory.csv


In [3]:
table_drs = api.drs_from_table("energy-usage-2010-24a67.csv")
table_drs.print_columns()

Hit(nid='4117259332', db_name='datagov', source_name='energy-usage-2010-24a67.csv', field_name='KWH MARCH 2010', score=0)
Hit(nid='1597800726', db_name='datagov', source_name='energy-usage-2010-24a67.csv', field_name='KWH JULY 2010', score=0)
Hit(nid='3679520977', db_name='datagov', source_name='energy-usage-2010-24a67.csv', field_name='TOTAL KWH', score=0)
Hit(nid='1578789443', db_name='datagov', source_name='energy-usage-2010-24a67.csv', field_name='TERM APRIL 2010', score=0)
Hit(nid='3379888415', db_name='datagov', source_name='energy-usage-2010-24a67.csv', field_name='THERM JULY 2010', score=0)
Hit(nid='2667518991', db_name='datagov', source_name='energy-usage-2010-24a67.csv', field_name='THERM SEPTEMBER 2010', score=0)
Hit(nid='2640891580', db_name='datagov', source_name='energy-usage-2010-24a67.csv', field_name='TOTAL THERMS', score=0)
Hit(nid='121626585', db_name='datagov', source_name='energy-usage-2010-24a67.csv', field_name='KWH 2ND QUARTILE 2010', score=0)
Hit(nid='204338968

**From that output, I learn that footage is also referred to as "sqft", so I include that term in the next search**

In [4]:
schema_with_energy = api.schema_name_search("energy", max_results=1000)
schema_with_footage = api.schema_name_search("footage", max_results=1000)
schema_with_area = api.schema_name_search("area", max_results=1000)
schema_with_surface = api.schema_name_search("surface", max_results=1000)
schema_with_sqft = api.schema_name_search("sqft", max_results=1000)
schema_union1 = api.union(schema_with_footage, schema_with_area)
schema_union2 = api.union(schema_with_surface, schema_with_sqft)
schema_all = api.union(schema_union1, schema_union2)
#schema_with_energy.print_tables()
#schema_with_footage.print_tables()
print("Intersection")
intersection = api.intersection(api.table(schema_with_energy), api.table(schema_all))
intersection.print_tables()

Intersection
rsbs-mom-part-1-of-2-new-york-state-residential-statewide-baseline-study-survey-of-multifa.csv
energy-star-certified-commercial-griddles.csv
rsbs-mom-part-2-of-2-new-york-state-residential-statewide-baseline-study-survey-of-multifa.csv
existing-commercial-buildings-energy-performance-ordinance-report-00be5.csv
energy-and-water-data-disclosure-for-local-law-84-2011-f4180.csv
geothermal-case-studies-on-openei-74824.csv
rsbs-smo-part-2-of-2-new-york-state-residential-statewide-baseline-study-single-and-multif.csv
energy-star-certified-displays.csv
chicago-energy-benchmarking.csv
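
The set logic behind the cell above can be sketched with plain Python sets. This assumes each search result behaves like a set of (table, field) pairs and that `api.table(...)` compares results at table granularity; both are assumptions about the system, and the table and field names below are invented for illustration.

```python
def tables(hits):
    """Project (table, field) hits down to the set of table names."""
    return {table for table, _field in hits}


# Hypothetical search results, one set per search term.
energy = {("buildings.csv", "energy_use_kbtu"), ("appliances.csv", "energy_rating")}
footage = {("parcels.csv", "footage")}
area = {("buildings.csv", "floor_area")}
sqft = {("homes.csv", "sqft")}

# Union of all size-related searches.
size_hits = footage | area | sqft

# At field granularity nothing matches: no single field is about both.
field_level = energy & size_hits

# At table granularity we find tables that have both kinds of column.
table_level = tables(energy) & tables(size_hits)

print(field_level)  # set()
print(table_level)  # {'buildings.csv'}
```

Under these assumptions, intersecting at table granularity is what surfaces tables that contain both an energy column and a size column.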


**A few promising results pop up**

In [5]:
table_drs = api.drs_from_table("existing-commercial-buildings-energy-performance-ordinance-report-00be5.csv")
table_drs.print_columns()

Hit(nid='617023847', db_name='datagov', source_name='existing-commercial-buildings-energy-performance-ordinance-report-00be5.csv', field_name='Floor Area', score=0)
Hit(nid='4288324814', db_name='datagov', source_name='existing-commercial-buildings-energy-performance-ordinance-report-00be5.csv', field_name='2011 Percent Better than National Median Source EUI', score=0)
Hit(nid='2817665803', db_name='datagov', source_name='existing-commercial-buildings-energy-performance-ordinance-report-00be5.csv', field_name='2012 Source EUI (kBtu/ft2)', score=0)
Hit(nid='3359214650', db_name='datagov', source_name='existing-commercial-buildings-energy-performance-ordinance-report-00be5.csv', field_name='2012 Total GHG Emissions Intensity (kgCO2e/ft2)', score=0)
Hit(nid='2077923954', db_name='datagov', source_name='existing-commercial-buildings-energy-performance-ordinance-report-00be5.csv', field_name='2013 Percent Better than National Median Site EUI', score=0)
Hit(nid='3077611711', db_name='datagov

In [6]:
table_drs = api.drs_from_table("energy-and-water-data-disclosure-for-local-law-84-2011-f4180.csv")
table_drs.print_columns()

Hit(nid='3257142652', db_name='datagov', source_name='energy-and-water-data-disclosure-for-local-law-84-2011-f4180.csv', field_name='Benchmarking Submission', score=0)
Hit(nid='2181167645', db_name='datagov', source_name='energy-and-water-data-disclosure-for-local-law-84-2011-f4180.csv', field_name='Water per Square Foot', score=0)
Hit(nid='3809414393', db_name='datagov', source_name='energy-and-water-data-disclosure-for-local-law-84-2011-f4180.csv', field_name='ENERGY STAR Score', score=0)
Hit(nid='1041161596', db_name='datagov', source_name='energy-and-water-data-disclosure-for-local-law-84-2011-f4180.csv', field_name='Number of Buildings', score=0)
Hit(nid='1338252548', db_name='datagov', source_name='energy-and-water-data-disclosure-for-local-law-84-2011-f4180.csv', field_name='BBL', score=0)
Hit(nid='245661640', db_name='datagov', source_name='energy-and-water-data-disclosure-for-local-law-84-2011-f4180.csv', field_name='Borough', score=0)
Hit(nid='82734626', db_name='datagov', so

In [7]:
table_drs = api.drs_from_table("chicago-energy-benchmarking.csv")
table_drs.print_columns()

Hit(nid='1670104371', db_name='datagov', source_name='chicago-energy-benchmarking.csv', field_name='ZIP Code', score=0)
Hit(nid='3170998686', db_name='datagov', source_name='chicago-energy-benchmarking.csv', field_name='Community Area', score=0)
Hit(nid='620099007', db_name='datagov', source_name='chicago-energy-benchmarking.csv', field_name='Primary Property Type', score=0)
Hit(nid='1031998861', db_name='datagov', source_name='chicago-energy-benchmarking.csv', field_name='# of Buildings', score=0)
Hit(nid='3957286182', db_name='datagov', source_name='chicago-energy-benchmarking.csv', field_name='Site EUI (kBtu/sq ft)', score=0)
Hit(nid='1379744581', db_name='datagov', source_name='chicago-energy-benchmarking.csv', field_name='Longitude', score=0)
Hit(nid='211891419', db_name='datagov', source_name='chicago-energy-benchmarking.csv', field_name='Data Year', score=0)
Hit(nid='97388107', db_name='datagov', source_name='chicago-energy-benchmarking.csv', field_name='ID', score=0)
Hit(nid='2

**I learn about a metric that seems relevant: EUI (kBtu/sq ft)**

In [8]:
kbtu = api.schema_name_search("kbtu", max_results=1000)
eui = api.schema_name_search("eui", max_results=1000)
res = api.union(kbtu, eui)
res.print_tables()

existing-commercial-buildings-energy-performance-ordinance-report-00be5.csv
energy-and-water-data-disclosure-for-local-law-84-2012-f60cd.csv
energy-and-water-data-disclosure-for-local-law-84-2011-f4180.csv
energy-star-certified-water-heaters.csv
energy-star-certified-light-commercial-hvac.csv
energy-star-certified-commercial-water-heaters.csv
doe-buildings-performance-database-sample-residential-data-fa45c.csv
new-york-state-government-building-energy-use-intensity-data-beginning-state-fiscal-year-2.csv
2010-2011-nyc-municipal-building-energy-benchmarking-results-6d21a.csv
chicago-energy-benchmarking.csv


**New, previously unseen results appear**

In [9]:
table_drs = api.drs_from_table("2010-2011-nyc-municipal-building-energy-benchmarking-results-6d21a.csv")
table_drs.print_columns()

Hit(nid='4292061037', db_name='datagov', source_name='2010-2011-nyc-municipal-building-energy-benchmarking-results-6d21a.csv', field_name='Campus Name', score=0)
Hit(nid='2599702945', db_name='datagov', source_name='2010-2011-nyc-municipal-building-energy-benchmarking-results-6d21a.csv', field_name='Borough', score=0)
Hit(nid='2043262897', db_name='datagov', source_name='2010-2011-nyc-municipal-building-energy-benchmarking-results-6d21a.csv', field_name='BIN', score=0)
Hit(nid='3640120069', db_name='datagov', source_name='2010-2011-nyc-municipal-building-energy-benchmarking-results-6d21a.csv', field_name='Agency', score=0)
Hit(nid='2184723470', db_name='datagov', source_name='2010-2011-nyc-municipal-building-energy-benchmarking-results-6d21a.csv', field_name='2011 Rating * ', score=0)
Hit(nid='3566535763', db_name='datagov', source_name='2010-2011-nyc-municipal-building-energy-benchmarking-results-6d21a.csv', field_name='2012 Rating *', score=0)
Hit(nid='4237206031', db_name='datagov',

In [10]:
table_drs = api.drs_from_table("new-york-state-government-building-energy-use-intensity-data-beginning-state-fiscal-year-2.csv")
table_drs.print_columns()

Hit(nid='317633124', db_name='datagov', source_name='new-york-state-government-building-energy-use-intensity-data-beginning-state-fiscal-year-2.csv', field_name=' Gross Floor Area  ', score=0)
Hit(nid='1669570837', db_name='datagov', source_name='new-york-state-government-building-energy-use-intensity-data-beginning-state-fiscal-year-2.csv', field_name=' Adjusted Source kBtu ', score=0)
Hit(nid='1002668164', db_name='datagov', source_name='new-york-state-government-building-energy-use-intensity-data-beginning-state-fiscal-year-2.csv', field_name=' Adjusted Source EUI  ', score=0)
Hit(nid='2781684408', db_name='datagov', source_name='new-york-state-government-building-energy-use-intensity-data-beginning-state-fiscal-year-2.csv', field_name='Agency Name', score=0)
Hit(nid='1066810792', db_name='datagov', source_name='new-york-state-government-building-energy-use-intensity-data-beginning-state-fiscal-year-2.csv', field_name='Building/Facility Name', score=0)
Hit(nid='1966847426', db_name=

In [11]:
table_drs = api.drs_from_table("doe-buildings-performance-database-sample-residential-data-fa45c.csv")
table_drs.print_columns()

Hit(nid='1288620036', db_name='datagov', source_name='doe-buildings-performance-database-sample-residential-data-fa45c.csv', field_name='AnnualGas_kBtu', score=0)
Hit(nid='2424221598', db_name='datagov', source_name='doe-buildings-performance-database-sample-residential-data-fa45c.csv', field_name='DataJamID', score=0)
Hit(nid='2984542959', db_name='datagov', source_name='doe-buildings-performance-database-sample-residential-data-fa45c.csv', field_name='Year', score=0)


**I have a handful of relevant datasets. Let's find more**

**Summary:**
    - energy-usage-2010-24a67.csv
    - existing-commercial-buildings-energy-performance-ordinance-report-00be5.csv
    - energy-and-water-data-disclosure-for-local-law-84-2011-f4180.csv
    - chicago-energy-benchmarking.csv
    - 2010-2011-nyc-municipal-building-energy-benchmarking-results-6d21a.csv
    - new-york-state-government-building-energy-use-intensity-data-beginning-state-fiscal-year-2.csv
    - doe-buildings-performance-database-sample-residential-data-fa45c.csv

In [12]:
table_drs = api.drs_from_table("existing-commercial-buildings-energy-performance-ordinance-report-00be5.csv")
res = api.similar_content_to(table_drs)
res.print_tables()

the-omnibus-surveys-omnibus-monthly-survey-2002-feb-csv-data.csv
assertive-community-treatment-act-outcomes.csv
public-health-statistics-screening-for-elevated-blood-lead-levels-in-children-aged-0-6-ye--fd65a.csv
childhood-lead-poisoning-surveillance-report-by-county-1994-b97e1.csv
ss08pusc.csv
building-permits-count-2004-2010-fed1b.csv
dob-cellular-antenna-filings-72123.csv
ipis-integrated-property-information-system-a5b7f.csv
the-omnibus-surveys-omnibus-monthly-survey-2009-oct-csv-data.csv
number-of-selected-inpatient-medical-procedures-california-hospitals-2005-2013.csv
local-law-48-of-2011-report-f66bf.csv
ss09pusc.csv
tgr2006se_curr_trtcu.csv
dcf-children-in-placement-annual-point-in-time-trend-by-race-ethnicity-group.csv
public-health-activities-and-services-2014-3009d.csv
brownfield-cleanup-program-certificates-of-completion.csv
spending-and-revenue-8cc3d.csv
illinois-population-changes-by-decade-1980-2010-bdb30.csv
residential-energy-consumption-survey-recs-files-energy-consump

In [17]:
rank_coverage = res.rank_coverage()

In [18]:
rank_coverage.print_tables_with_scores()

('existing-commercial-buildings-energy-performance-ordinance-report-00be5.csv', 0.24)
('ss08pusa.csv', 0.12)
('health-hospitals-system-outpatient-registrations-by-zip-fiscal-year-2010-ab403.csv', 0.08)
('boundaries-zoning-districts.csv', 0.06)
('the-omnibus-surveys-omnibus-monthly-survey-2002-feb-csv-data.csv', 0.06)
('assertive-community-treatment-act-outcomes.csv', 0.06)
('the-omnibus-surveys-omnibus-monthly-survey-2002-may-csv-data.csv', 0.06)
('mental-health-wait-times-d5beb.csv', 0.06)
('ss08pusc.csv', 0.06)
('building-permits-count-2004-2010-fed1b.csv', 0.06)
('dob-cellular-antenna-filings-72123.csv', 0.06)
('the-omnibus-surveys-omnibus-monthly-survey-2002-mar-csv-data.csv', 0.06)
('the-omnibus-surveys-omnibus-monthly-survey-2002-jan-csv-data.csv', 0.06)
('ipis-integrated-property-information-system-a5b7f.csv', 0.06)
('structural-pest-control-seminar-schedule-e1bf6.csv', 0.06)
('nmswcd.csv', 0.06)
('the-omnibus-surveys-omnibus-monthly-survey-2002-apr-csv-data.csv', 0.06)
('new-li

In [19]:
rank_certainty = res.rank_certainty()

In [20]:
rank_certainty.print_tables_with_scores()

('health-hospitals-system-outpatient-registrations-by-zip-fiscal-year-2010-ab403.csv', 23.0)
('building-permits-count-2004-2010-fed1b.csv', 4.0)
('the-omnibus-surveys-omnibus-monthly-survey-2002-dec-csv-data.csv', 4.0)
('the-omnibus-surveys-omnibus-monthly-survey-2002-mar-csv-data.csv', 4.0)
('the-omnibus-surveys-omnibus-monthly-survey-2002-july-csv-data.csv', 4.0)
('the-omnibus-surveys-omnibus-monthly-survey-2001-nov-csv-data.csv', 4.0)
('national-survey-of-pedestrian-and-bicyclist-attitudes-knowledge-and-behaviors-2012-survey.csv', 3.0)
('residential-energy-consumption-survey-recs-files-energy-consumption-2009.csv', 3.0)
('the-omnibus-surveys-omnibus-monthly-survey-2003-april-csv-data.csv', 3.0)
('mini-survey-data-for-the-advancing-national-integration-in-georgia-activity-mid-term-perfo.csv', 3.0)
('the-omnibus-surveys-omnibus-monthly-survey-2001-oct-csv-data.csv', 3.0)
('existing-commercial-buildings-energy-performance-ordinance-report-00be5.csv', 2.0)
('assertive-community-treatmen

**Found another potentially relevant dataset that was unknown before:** *residential-energy-consumption-survey-recs-files-energy-consumption-2009.csv*
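
Scanning a long ranking by eye is error-prone, so the same check can be sketched as a small filter over (table, score) pairs. The pairs below are copied from the certainty ranking printed above; the helper name, the keyword list, and the `known` set are illustrative choices rather than part of the API.

```python
# A few (table, score) pairs copied from the certainty ranking above.
ranked = [
    ("health-hospitals-system-outpatient-registrations-by-zip-fiscal-year-2010-ab403.csv", 23.0),
    ("building-permits-count-2004-2010-fed1b.csv", 4.0),
    ("residential-energy-consumption-survey-recs-files-energy-consumption-2009.csv", 3.0),
    ("existing-commercial-buildings-energy-performance-ordinance-report-00be5.csv", 2.0),
]

# Tables already collected earlier in the search.
known = {"existing-commercial-buildings-energy-performance-ordinance-report-00be5.csv"}


def new_candidates(ranked, keywords, known):
    """Keep ranked tables whose name mentions a keyword and is not yet known."""
    return [
        (name, score)
        for name, score in ranked
        if name not in known and any(k in name.lower() for k in keywords)
    ]


candidates = new_candidates(ranked, keywords=("energy", "kbtu"), known=known)
for name, score in candidates:
    print(score, name)
```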

In [21]:
table_drs = api.drs_from_table("energy-and-water-data-disclosure-for-local-law-84-2011-f4180.csv")
res = api.similar_content_to(table_drs)
res.print_tables()

solar-monthly-and-annual-average-latitude-tilt-gis-data-at-40km-resolution-for-mexico-cent-9a594.csv
salaries-esd-clackamas-fy-2013-46dd7.csv
ss09husb.csv
2013-campaign-expenditures-e8c6d.csv
energy-star-certified-uninterruptible-power-supplies.csv
tgr2006se_torr_urbcu.csv
tgr2006se_hard_grpcu.csv
solar-monthly-direct-normal-dni-gis-data-at-40km-resolution-for-bangladesh-from-nrel-f5201.csv
tgr2006se_soco_cdcu.csv
doe-high-school-directory-2013-2014-dbbff.csv
impaired-driving-death-rate-by-age-and-gender-2012-all-states-587fd.csv
vegetable-intake-ebe6c.csv
ss11husa.csv
impaired-driving-death-rate-by-age-and-gender-2012-region-10-seattle.csv
census-tract10-d6893.csv
tgr2006se_mora_zcta300.csv
nyc-clean-heat-dataset-d5995.csv
tgr2006se_deba_placeec.csv
ipis-integrated-property-information-system-a5b7f.csv
bid-data-on-projects-for-the-past-10-years-23396.csv
dob-job-application-filings-05eff.csv
the-omnibus-surveys-omnibus-monthly-survey-2009-oct-csv-data.csv
county-spending-f463a.csv
tgr

In [22]:
rank_coverage = res.rank_coverage()

In [25]:
rank_coverage.print_tables_with_scores()

('cip-project-phase-list-c4102.csv', 0.0625)
('salaries-esd-clackamas-fy-2013-46dd7.csv', 0.0625)
('ss09husb.csv', 0.0625)
('tgr2006se_oter_taz.csv', 0.0625)
('energy-star-certified-uninterruptible-power-supplies.csv', 0.0625)
('tgr2006se_hard_grpcu.csv', 0.0625)
('solar-monthly-and-annual-average-direct-normal-dni-gis-data-at-40km-resolution-for-mexico--6c105.csv', 0.0625)
('tgr2006se_mora_placecu.csv', 0.0625)
('annual-taxpayer-location-address-list-2014.csv', 0.0625)
('ccgisdata-parcel-2013.csv', 0.0625)
('doe-high-school-directory-2013-2014-dbbff.csv', 0.0625)
('2013-campaign-expenditures-e8c6d.csv', 0.0625)
('vegetable-intake-ebe6c.csv', 0.0625)
('ss11husa.csv', 0.0625)
('census-tract10-d6893.csv', 0.0625)
('tgr2006se_dona_puma1.csv', 0.0625)
('tgr2006se_mora_zcta300.csv', 0.0625)
('tgr2006se_torr_urbcu.csv', 0.0625)
('tgr2006se_bern_ctyec.csv', 0.0625)
('nyc-clean-heat-dataset-d5995.csv', 0.0625)
('tgr2006se_deba_placeec.csv', 0.0625)
('percentage-of-adults-who-report-driving-aft

In [26]:
rank_certainty = res.rank_certainty()

In [27]:
rank_certainty.print_tables_with_scores()

('cos-do-it-phone-and-online-survey-data-2013-9d52c.csv', 54.0)
('zambia-communications-support-for-health-safe-love-campaign-outcome-evaluation.csv', 43.0)
('car-allowance-rebate-system-cars-trade-in-vehicles-consumer-survey-csv-file.csv', 38.0)
('national-survey-of-pedestrian-and-bicyclist-attitudes-knowledge-and-behaviors-2012-survey.csv', 31.0)
('2000-school-survey-on-crime-and-safety.csv', 29.0)
('brec-parks.csv', 29.0)
('location-affordability-index-all-core-based-statistical-areas-cbsas.csv', 25.0)
('location-affordability-index-all-census-places.csv', 22.0)
('location-affordability-index-all-census-counties.csv', 22.0)
('residential-energy-consumption-survey-recs-files-energy-consumption-2009.csv', 20.0)
('national-beneficiary-survey-nbs-round-3.csv', 19.0)
('national-beneficiary-survey-nbs-round-2.csv', 19.0)
('ss13husa.csv', 17.0)
('national-beneficiary-survey-nbs-round-4.csv', 17.0)
('inpatient-psychiatric-facility-quality-measure-data-by-facility.csv', 16.0)
('ss11husa.csv'

In [38]:
table_drs = api.drs_from_table("chicago-energy-benchmarking.csv")
res = api.similar_content_to(table_drs)
res.print_tables()

LAP December 2014.csv
speed-camera-violations-997eb.csv
grocery-stores-2013-1836b.csv
311-service-requests-graffiti-removal-5072e.csv
tgr2006se_sier_urbcu.csv
sbir-awards-in-hawaii-phase-1-2000-to-2013-ad9af.csv
chicago-public-schools-high-school-progress-report-card-2012-2013-04a7d.csv
k00236data810031880_cnty_view.csv
tgr2006se_roos_lkh.csv
311-service-requests-garbage-carts-7e8e7.csv
ss07husd.csv
tgr2006se_colf_ctycu.csv
tgr2006se_sier_sdunicu.csv
home-owner-associations.csv
ss09husb.csv
dob-permit-issuance-36530.csv
tgr2006se_colf_urb00.csv
city-trees-full-dataset.csv
budget-2012-budget-recommendations-positions-and-salaries-ebc4f.csv
energy-star-certified-commercial-ovens.csv
department-of-commerce-central-business-licensing-system-report-e5556.csv
tgr2006se_linc_ctyec.csv
tgr2006se_lea_cbsacu.csv
salaries-oregon-lottery-payroll-report-as-of-june-30-2014-8018f.csv
tl_2010_35041_tabblock10.csv
ss13husa.csv
tgr2006se_deba_wat.csv
cabinet-breakdown-fy2013.csv
2013-campaign-expenditur

In [39]:
table_drs = api.drs_from_table("2010-2011-nyc-municipal-building-energy-benchmarking-results-6d21a.csv")
res = api.similar_content_to(table_drs)
res.print_tables()

chicago-public-schools-high-school-progress-report-2013-2014-4ed0c.csv
ss12pusa.csv
cdc-state-system-tobacco-legislation-smokefree-indoor-air-summary-3c01f.csv
table-17-solid-waste-recycled-in-tons-851c9.csv
ksd297data871466944_schd_view.csv
ksd316data278037142_schd_view.csv
local-development-corporations-bonds.csv
self-reported-incidents-by-all-health-facilities-in-colorado-from-2003-to-2013.csv
sfo-2011-customer-survey-data-set-887b0.csv
washington-state-criminal-justice-data-book.csv
processing-time-for-initial-disability-cases-involving-the-processing-centers.csv
k00271data295351758_cnty_view.csv
cumulative-canvass-7e59a.csv
idot-multi-year-programming-2012-2017-spot-improvements-73430.csv
medical-service-study-areas-2010.csv
2013-sfo-customer-survey-d3541.csv
tgr2006se_sanj_aianhhcu.csv
tgr2006se_taos_grpcu.csv
table-18-leaking-underground-storage-tanks-03601.csv
hood-canal-complete-29127.csv
performance-metrics-streets-sanitation-tree-debris-fea94.csv
tgr2006se_deba_zcta5cu.csv
c

In [42]:
table_drs = api.drs_from_table("doe-buildings-performance-database-sample-residential-data-fa45c.csv")
res = api.similar_content_to(table_drs)
res.print_tables()

animal-control-species-by-city-and-gender-fiscal-year-2010-60e6b.csv
ihda-illinois-housing-dev-auth-fy2010-governors-report-closings-home-4dfeb.csv
tax-credits-fiscal-year-2011.csv
ihda-illinois-housing-dev-auth-fy2010-governors-report-closings-lihtc-03660.csv
LAP_April_2010.csv
ihda-illinois-housing-dev-auth-fy2010-governors-report-closings-sf-trust-fund-bf75d.csv
ihda-illinois-housing-dev-auth-fy2010-governors-report-applications-lihtc-05fb0.csv
full-time-employees-by-job-classification-a0538.csv
child-support-enforcement-annual-data-report-form-157-yr-2010.csv
total-housing-units-by-occupancy-status-and-tenure-by-borough-d7fdc.csv
cdc-pramstat-data-for-2010-41a5a.csv
LAP December 2010.csv
ides-statewide-current-employment-statistics-d0003.csv
population-of-selected-asian-race-subgroups-in-new-york-city-by-borough-6e1f2.csv
maryland-funding-fy10-payments-data-7193d.csv
LAP_September_2010.csv
ihda-illinois-housing-dev-auth-fy2010-governors-report-applications-sf-trust-fund-3d541.csv
i

### Building water use intensity (water use divided by square footage)

**I can follow the same pattern as before**

In [46]:
tables_with_water = api.table_name_search("water", max_results=1000)
tables_with_building = api.table_name_search("building", max_results=1000)
tables_with_water.print_tables()
tables_with_building.print_tables()

watershed-water-quality-data.csv
water-contaminent-levels.csv
water-withdrawals-by-facility-beginning-2009.csv
beach-water-quality-automated-sensors-66a4b.csv
ccgisdata-metropolitan-water-reclamation-tax-dist-2012.csv
water-and-electric-usage-from-2005-2013-83298.csv
water-use-average-by-zipcode-8dbe0.csv
missouri-river-water-trail-access-points-e5bd8.csv
oregon-ag-water-quality-management-areas-map.csv
e-coli-water-contaminent-levels.csv
energy-star-certified-water-coolers.csv
water-fluoridation-statistics-percent-of-pws-population-receiving-fluoridated-water.csv
idph-community-water-fluoridation-2010-compliance-award-list-4c64f.csv
table-9-percentage-of-population-served-safe-drinking-water-daba6.csv
shellfish-harvesting-waters-sampling-stations-b5c72.csv
foia-request-log-water-management-4bb3b.csv
water-point-source-sampling-locations-23f9b.csv
water-contaminent-levels-63239.csv
residential-water-usage-by-month-2005-to-2013-e5c58.csv
water-well-and-closed-loop-well-examination-dates

**New, relevant datasets appear, including some that are relevant to energy but refer to it as "electric"**

In [47]:
schema_with_water = api.schema_name_search("water", max_results=1000)
schema_with_footage = api.schema_name_search("footage", max_results=1000)
schema_with_area = api.schema_name_search("area", max_results=1000)
schema_with_surface = api.schema_name_search("surface", max_results=1000)
schema_with_sqft = api.schema_name_search("sqft", max_results=1000)
schema_union1 = api.union(schema_with_footage, schema_with_area)
schema_union2 = api.union(schema_with_surface, schema_with_sqft)
schema_all = api.union(schema_union1, schema_union2)
#schema_with_water.print_tables()
#schema_with_footage.print_tables()
print("Intersection")
intersection = api.intersection(api.table(schema_with_water), api.table(schema_all))
intersection.print_tables()

Intersection
dpd-urban-villages-uvmfg-polygon-a4e49.csv
watershed-statistics-11b03.csv
reservoir-and-dam-statistics.csv
citizen-statewide-lake-monitoring-assessment-program-cslap-lakes.csv
census-blkgrp00-6916d.csv
find-a-missouri-utility-a32ff.csv
census-block10-b7af7.csv
city-of-albany-vacant-building-inventory-2013.csv
rsbs-smo-part-2-of-2-new-york-state-residential-statewide-baseline-study-single-and-multif.csv
census-tract00-b96e2.csv
public-fishing-rights-parking-areas.csv
parks-locations-10f58.csv
census-block00-b6fc8.csv
waofm-census-population-and-housing-2000-and-2010-309cc.csv
energy-and-water-data-disclosure-for-local-law-84-2011-f4180.csv
census-blkgrp10-9dd75.csv
rsbs-mom-part-2-of-2-new-york-state-residential-statewide-baseline-study-survey-of-multifa.csv
dpd-urban-villages-uvmfg-region-uc-f571c.csv
rsbs-mom-part-1-of-2-new-york-state-residential-statewide-baseline-study-survey-of-multifa.csv
waofm-legislative-districts-table-1-census-2010-population-and-housing-41d41.cs

### Density or number of LEED buildings (per square mile)

In [54]:
res = api.keyword_search("leed", max_results=10)
res.print_tables()

campaign-finance-filings-submitted-to-the-new-york-state-board-of-elections-beginning-1999.csv
existing-commercial-buildings-energy-performance-ordinance-report-00be5.csv
energy-star-portfolio-state-managed-buildings-d1251.csv
green-tour.csv
business-energy-tax-credit-program-fiscal-year-2014-0e49c.csv
driver-license-permit-and-non-driver-identification-cards-issued.csv
energy-star-portfolio-state-managed-buildings.csv


**Some relevant datasets that I'll use to continue the search**

In [58]:
table_drs = api.drs_from_table("energy-star-portfolio-state-managed-buildings-d1251.csv")
res = api.similar_content_to(table_drs)
res.print_tables()

maryland-ozone-exceedance-days-in-2010-1c221.csv
capt-school-performance-2013.csv
inpatient-psychiatric-facility-quality-measure-data-by-state.csv
capt-district-performance-2010-2012.csv
where-incidents-that-led-to-a-complaint-took-place-by-precinct-manhattan-2005-2009-93b55.csv
energy-star-portfolio-state-managed-buildings.csv
capt-school-performance-2010-2012.csv


### Power plant CO2 emissions figures

In [62]:
res = api.table_name_search("emissions", max_results=1000)
res.print_tables()

greenhouse-gas-emissions-estimates.csv
greenhouse-gas-emissions-from-fuel-combustion-million-metric-tons-beginning-1990.csv
annual-carbon-dioxide-emissions-2005-2009-915fc.csv
pola-emissions-reduction-percentage-e7b51.csv
port-of-los-angeles-emission-from-port-operations-4912c.csv
greenhouse-gas-emissions-from-fuel-combustion-by-fuel-type-million-metric-tons.csv
emissions-stations-by-town.csv
pola-emissions-from-port-operations-nox-sox-dpm-a9072.csv


In [63]:
res = api.schema_name_search("emissions")
res.print_tables()

greenhouse-gas-emissions-estimates.csv
energy-star-certified-roof-products.csv
outdoor-wood-boilers-certified-for-sale-in-nys.csv
energy-and-water-data-disclosure-for-local-law-84-2012-f60cd.csv
energy-efficiency-completed-projects-beginning-1987.csv
greenhouse-gas-emissions-from-fuel-combustion-by-fuel-type-million-metric-tons.csv
table-14-toxic-release-inventory-tri-in-pounds.csv
table-14-toxic-release-inventory-tri-in-pounds-7518f.csv


**Many relevant datasets here**

In [65]:
table_drs = api.drs_from_table("energy-star-certified-roof-products.csv")
res = api.similar_content_to(table_drs)
res.print_tables()

dof-cooperative-comparable-rental-income-queens-fy-2011-2012-149d0.csv
idph-std-illinois-by-county-by-sex-by-age-group-chlamydia.csv
tl_2010_35029_vtd10.csv
energy-star-certified-displays.csv
washington-state-criminal-justice-data-book.csv
dcf-children-in-placement-annual-point-in-time-trend-by-age-group.csv
maryland-total-acres-outside-pfa-for-residential-development-2006-2010-67278.csv
tgr2006se_sier_urbcu.csv
pet-data-2016-animal-control-officer-service-calls.csv
tgr2006se_colf_nodes.csv
tgr2006se_curr_lkb.csv
tgr2006se_roos_lkh.csv
nndss-table-ii-chlamydia-to-coccidioidomycosis-85042.csv
disposition-of-offensive-language-allegations-2009-2df68.csv
dof-condominium-comparable-rental-income-manhattan-fy-2008-2009-000a6.csv
311-information-requests-by-month-61cfb.csv
energy-usage-2010-24a67.csv
ss07husd.csv
energy-star-certified-imaging-equipment.csv
energy-star-certified-ventilating-fans.csv
tgr2006se_losa_urbcu.csv
contracts-esd-lane-esd-fiscal-year-2013-279e7.csv
ss09husb.csv
chapte

### Waste diversion rates (what is diverted from landfill and incinerators)
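
For reference, one common way to express a diversion rate is the share of total waste that avoids landfill and incineration. The sketch below assumes that definition; the function name and tonnages are hypothetical, and individual datasets may define the rate differently.

```python
def diversion_rate(diverted_tons, disposed_tons):
    """Fraction of total waste diverted from landfill and incineration."""
    total = diverted_tons + disposed_tons
    if total <= 0:
        raise ValueError("total waste must be positive")
    return diverted_tons / total


# Hypothetical city: 300 tons recycled or composted, 700 tons disposed.
rate = diversion_rate(diverted_tons=300.0, disposed_tons=700.0)
print(f"{rate:.0%}")  # 30%
```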

In [66]:
res = api.keyword_search("waste")
res.print_tables()

mc311-service-requests-c9504.csv
household-hazardous-waste-facilities-ab6d0.csv
green-tour.csv


In [80]:
res = api.table_name_search("waste", max_results=100)
res.print_tables()

table-19-hazardous-waste-generated-eecb9.csv
2016-solid-waste-permit-fee-rulemaking.csv
table-17-solid-waste-recycled-in-tons-851c9.csv
household-hazardous-waste-facilities-ab6d0.csv
table-19-hazardous-waste-generated.csv
approved-licensees-and-registrants-for-trade-waste-15675.csv
solid-waste-management-facilities.csv
duplicate-human-waste-3e1e7.csv
table-17-solid-waste-recycled-in-tons.csv


In [79]:
res = api.schema_name_search("landfill", max_results=100)
res.print_tables()

renewable-energy-generation-capacity-e7ed5.csv
renewable-energy-generated-in-maryland-8f51e.csv
dpd-eca-landfill-4d799.csv


In [72]:
table_drs = api.drs_from_table("dpd-eca-landfill-4d799.csv")
res = api.similar_content_to(table_drs)
res.print_tables()

dof-cooperative-comparable-rental-income-queens-fy-2011-2012-149d0.csv
idph-std-illinois-by-county-by-sex-by-age-group-chlamydia.csv
tl_2010_35029_vtd10.csv
tl_2010_35007_areawater.csv
energy-star-certified-displays.csv
washington-state-criminal-justice-data-book.csv
maryland-total-acres-outside-pfa-for-residential-development-2006-2010-67278.csv
dcf-children-in-placement-annual-point-in-time-trend-by-age-group.csv
tgr2006se_gran_vtd00.csv
tgr2006se_sier_urbcu.csv
pet-data-2016-animal-control-officer-service-calls.csv
tgr2006se_colf_nodes.csv
tgr2006se_curr_lkb.csv
tgr2006se_roos_lkh.csv
nndss-table-ii-chlamydia-to-coccidioidomycosis-85042.csv
disposition-of-offensive-language-allegations-2009-2df68.csv
311-information-requests-by-month-61cfb.csv
dof-condominium-comparable-rental-income-manhattan-fy-2008-2009-000a6.csv
energy-usage-2010-24a67.csv
ss07husd.csv
energy-star-certified-imaging-equipment.csv
energy-star-certified-ventilating-fans.csv
tgr2006se_losa_urbcu.csv
contracts-esd-la

In [82]:
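# Fields whose name matches "landfill"
sc_landfill = api.schema_name_search("landfill", max_results=1000)
# Tables whose name matches "waste"
sc_waste = api.table_name_search("waste", max_results=1000)
# Keep only the results that appear in both searches
res = api.intersection(sc_landfill, sc_waste)
res.print_tables()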
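The cell above prints nothing. One plausible reading is that the two searches have no table in common. The sketch below checks this with plain Python sets, using the table names printed by the earlier `schema_name_search("landfill")` and `table_name_search("waste")` cells. It is only an approximation: those cells used `max_results=100`, and `api.intersection` may compare results at the field level rather than the table level, which is an assumption not confirmed here.

```python
# Table names copied from the schema_name_search("landfill") output above
landfill_tables = {
    "renewable-energy-generation-capacity-e7ed5.csv",
    "renewable-energy-generated-in-maryland-8f51e.csv",
    "dpd-eca-landfill-4d799.csv",
}

# Table names copied from the table_name_search("waste") output above
waste_tables = {
    "table-19-hazardous-waste-generated-eecb9.csv",
    "2016-solid-waste-permit-fee-rulemaking.csv",
    "table-17-solid-waste-recycled-in-tons-851c9.csv",
    "household-hazardous-waste-facilities-ab6d0.csv",
    "table-19-hazardous-waste-generated.csv",
    "approved-licensees-and-registrants-for-trade-waste-15675.csv",
    "solid-waste-management-facilities.csv",
    "duplicate-human-waste-3e1e7.csv",
    "table-17-solid-waste-recycled-in-tons.csv",
}

# Set intersection: tables returned by both searches
common = landfill_tables & waste_tables
print(sorted(common))  # prints []
```

Under these assumptions the empty result is expected: no table has both "waste" in its name and a field named after "landfill".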

**Let's dig deeper into the datasets found so far that seem relevant**

In [98]:
table_drs = api.drs_from_table("renewable-energy-generation-capacity-e7ed5.csv")
table_drs.pretty_print_columns()

SOURCE: renewable-energy-generation-capacity-e7ed5.csv			 FIELD: Year
SOURCE: renewable-energy-generation-capacity-e7ed5.csv			 FIELD: Offshore Wind (MW)
SOURCE: renewable-energy-generation-capacity-e7ed5.csv			 FIELD: Landfill Gas (MW)
SOURCE: renewable-energy-generation-capacity-e7ed5.csv			 FIELD: Waste-to-Energy (MW)
SOURCE: renewable-energy-generation-capacity-e7ed5.csv			 FIELD: Land-Based Wind (MW)
SOURCE: renewable-energy-generation-capacity-e7ed5.csv			 FIELD: Black Liquor (MW)
SOURCE: renewable-energy-generation-capacity-e7ed5.csv			 FIELD: Hydro (MW)
SOURCE: renewable-energy-generation-capacity-e7ed5.csv			 FIELD: Solar (MW)
SOURCE: renewable-energy-generation-capacity-e7ed5.csv			 FIELD: Animal Litter (MW)


In [100]:
table_drs = api.drs_from_table("usepa-environmental-quality-index-eqi-air-water-land-built-and-sociodemographic-domains-non-tr.csv")
table_drs.pretty_print_columns()

SOURCE: usepa-environmental-quality-index-eqi-air-water-land-built-and-sociodemographic-domains-non-tr.csv			 FIELD: STATE
SOURCE: usepa-environmental-quality-index-eqi-air-water-land-built-and-sociodemographic-domains-non-tr.csv			 FIELD: A_Benzyl_Cl
SOURCE: usepa-environmental-quality-index-eqi-air-water-land-built-and-sociodemographic-domains-non-tr.csv			 FIELD: A_chloroform
SOURCE: usepa-environmental-quality-index-eqi-air-water-land-built-and-sociodemographic-domains-non-tr.csv			 FIELD: A_EtCl
SOURCE: usepa-environmental-quality-index-eqi-air-water-land-built-and-sociodemographic-domains-non-tr.csv			 FIELD: A_HCBD
SOURCE: usepa-environmental-quality-index-eqi-air-water-land-built-and-sociodemographic-domains-non-tr.csv			 FIELD: A_MIBK
SOURCE: usepa-environmental-quality-index-eqi-air-water-land-built-and-sociodemographic-domains-non-tr.csv			 FIELD: A_MMA
SOURCE: usepa-environmental-quality-index-eqi-air-water-land-built-and-sociodemographic-domains-non-tr.csv			 FIELD: A_MeCl

In [102]:
table_drs = api.drs_from_table("facilities-management-closed-cells-as-of-october-28th-2011-6c095.csv")
table_drs.pretty_print_columns()

SOURCE: facilities-management-closed-cells-as-of-october-28th-2011-6c095.csv			 FIELD: Miscellaneous
SOURCE: facilities-management-closed-cells-as-of-october-28th-2011-6c095.csv			 FIELD: Engineer
SOURCE: facilities-management-closed-cells-as-of-october-28th-2011-6c095.csv			 FIELD: Ironworker
SOURCE: facilities-management-closed-cells-as-of-october-28th-2011-6c095.csv			 FIELD: Cells
SOURCE: facilities-management-closed-cells-as-of-october-28th-2011-6c095.csv			 FIELD: Plumbing
SOURCE: facilities-management-closed-cells-as-of-october-28th-2011-6c095.csv			 FIELD: Brickworker
SOURCE: facilities-management-closed-cells-as-of-october-28th-2011-6c095.csv			 FIELD: Glazier
SOURCE: facilities-management-closed-cells-as-of-october-28th-2011-6c095.csv			 FIELD: Division Facility
SOURCE: facilities-management-closed-cells-as-of-october-28th-2011-6c095.csv			 FIELD: Electrician Tech
SOURCE: facilities-management-closed-cells-as-of-october-28th-2011-6c095.csv			 FIELD: Tinsmith
SOURCE: facilitie

In [1]:
# Requires the `api` object returned by init_system; re-run the first cell after a kernel restart
field = ('datagov', 'solid-waste-management-facilities.csv', 'Waste Types')
drs_field = api.drs_from_raw_field(field)
res = api.similar_content_to(drs_field)
res.print_tables()

NameError: name 'api' is not defined