### Using the Socrata API to access the Delaware Data Portal

#### Why?
- Programmatically access metadata about datasets
- Bulk retrieval of a collection of datasets

#### Resources
- http://dev.socrata.com
- http://data.delaware.gov

In [6]:
import requests

#### Let's retrieve the portal data catalog

In [7]:
# Retrieve the data.delaware.gov (DDG) catalog from Socrata
# The response body is JSON
resp = requests.get("http://api.us.socrata.com/api/catalog/v1?domains=data.delaware.gov")
catalog = resp.json()

# inspect the first result element
catalog['results'][0]

{'resource': {'name': 'Delaware Business Licenses',
  'id': '5zy2-grhr',
  'parent_fxf': None,
  'description': 'Information for businesses currently licensed in Delaware.',
  'attribution': 'Department of Finance, Division of Revenue',
  'type': 'dataset',
  'updatedAt': '2019-02-15T12:45:59.000Z',
  'createdAt': '2016-11-02T18:20:04.000Z',
  'page_views': {'page_views_last_week': 1892,
   'page_views_last_month': 9787,
   'page_views_total': 166270,
   'page_views_last_week_log': 10.886458695703706,
   'page_views_last_month_log': 13.25679838607939,
   'page_views_total_log': 17.34317703870508},
  'columns_name': ['License number',
   'Country',
   'Geocoded Location',
   'City',
   'Address 2',
   'Zip',
   'State',
   'Current license valid from',
   'Address 1',
   'Current license valid to',
   'Business name',
   'Business Activity',
   'Trade name'],
  'columns_field_name': ['license_number',
   'country',
   'geocoded_location',
   'city',
   'address_2',
   'zip',
   'state',

#### We can easily walk the catalog to get a list of dataset ids, names, and descriptions

In [8]:
datasets = [(res['id'],res['name'],res['description']) 
            for res in [result['resource'] for result in catalog['results']]
            if res['type'] == 'dataset']

# inspect the first element
datasets[0]

('5zy2-grhr',
 'Delaware Business Licenses',
 'Information for businesses currently licensed in Delaware.')
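
The `res['type'] == 'dataset'` filter above suggests the catalog can hold other kinds of resources too. Here is a minimal sketch for tallying the types present, assuming the same `results` / `resource` / `type` layout seen in the catalog output. The helper name `count_resource_types` and the `sample_catalog` literal (including its ids and the `'map'` type) are made up for illustration; in the notebook you would pass the real `catalog` instead.

```python
from collections import Counter

def count_resource_types(catalog):
    """Tally catalog entries by their resource type."""
    return Counter(result['resource']['type'] for result in catalog['results'])

# Hypothetical stand-in for the real catalog response
sample_catalog = {
    'results': [
        {'resource': {'id': 'aaaa-0001', 'type': 'dataset'}},
        {'resource': {'id': 'aaaa-0002', 'type': 'map'}},
        {'resource': {'id': 'aaaa-0003', 'type': 'dataset'}},
    ]
}

type_counts = count_resource_types(sample_catalog)
print(type_counts.most_common())
```

Running `count_resource_types(catalog)` before filtering is a quick way to see how much of the catalog the `'dataset'` filter keeps.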

#### We can also grab a dataset, as JSON or CSV

Here we grab it as JSON.

Note: results are paginated, with a limit of 50,000 records per page.

In [9]:
offset = 0
limit = 1000
resp = requests.get("https://data.delaware.gov/resource/{}.json?$limit={}&$offset={}".format('v6xy-7sgx', limit, offset))
data = resp.json()

print("{} records".format(len(data)))
print(data[0])

1000 records
{'districtname': 'State of Delaware', 'schoolyear': '2018', 'grade': 'Adult Education', 'districtcode': '0', 'schoolcode': '0', 'schoolname': 'State of Delaware', 'demographic': 'All Students'}
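
The heading mentions CSV, but only the JSON request is shown. This is a rough sketch of the CSV route, assuming that swapping the `.json` extension for `.csv` in the resource URL returns the same records as CSV text with a header row. The helpers `resource_url` and `parse_csv` are hypothetical names, and `sample_text` is a made-up stand-in for `resp.text` so the parsing step can be tried without a network call.

```python
import csv
import io

def resource_url(dataset_id, fmt="json", limit=1000, offset=0):
    """Build a resource URL in the same shape as the request above."""
    return "https://data.delaware.gov/resource/{}.{}?$limit={}&$offset={}".format(
        dataset_id, fmt, limit, offset)

def parse_csv(text):
    """Turn CSV text with a header row into a list of dicts, one per record."""
    return list(csv.DictReader(io.StringIO(text)))

csv_url = resource_url('v6xy-7sgx', fmt='csv')
print(csv_url)

# Made-up sample standing in for requests.get(csv_url).text
sample_text = "districtname,schoolyear,grade\nState of Delaware,2018,Adult Education\n"
rows = parse_csv(sample_text)
print(rows[0]['grade'])
```

Parsing into dicts keeps the CSV records in the same shape as the JSON ones, so later cells can treat both sources alike.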


#### We need a loop to fetch an entire dataset

In [10]:
dataset = []
limit = 25000  # page size, well under the 50K per-page cap
offset = 0
while True:
    print("working...")
    resp = requests.get("https://data.delaware.gov/resource/{}.json?$limit={}&$offset={}".format('v6xy-7sgx', limit, offset))
    data = resp.json()
    if not data:
        # an empty page means we have read past the last record
        break
    dataset += data
    offset += limit

print("{} records".format(len(dataset)))
print(dataset[0])

working...
working...
working...
working...
working...
working...
working...
working...
working...
working...
working...
225763 records
{'districtname': 'State of Delaware', 'schoolyear': '2018', 'grade': 'Adult Education', 'districtcode': '0', 'schoolcode': '0', 'schoolname': 'State of Delaware', 'demographic': 'All Students'}
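
The loop above only learns it is done when a page comes back empty, which costs one extra request. A possible variant, sketched here under the assumption that a page shorter than `limit` is always the last one, stops as soon as it sees a short page. The names `iter_records`, `fake_fetch`, and `fake_rows` are hypothetical; the fetch function is passed in so the paging logic can be exercised against an in-memory list rather than the portal.

```python
def iter_records(fetch_page, limit):
    """Yield records page by page until a short or empty page signals the end."""
    offset = 0
    while True:
        page = fetch_page(limit, offset)
        yield from page
        if len(page) < limit:
            break
        offset += limit

# Fake backend: 7 records served from a list instead of the portal
fake_rows = [{'n': i} for i in range(7)]
calls = []

def fake_fetch(limit, offset):
    """Record the requested offset and return the matching slice."""
    calls.append(offset)
    return fake_rows[offset:offset + limit]

records = list(iter_records(fake_fetch, limit=3))
print("{} records in {} requests".format(len(records), len(calls)))
```

In the notebook, `fetch_page` would wrap the `requests.get(...).json()` call from the cell above. Socrata's paging guidance also suggests adding an `$order` clause so that pages stay consistent between requests; that is not shown here.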
