# Socrata API Basics 
## Socrata Open Data API (SODA) Tutorial with Python and NYC Open Data
Author: Mark Bauer

Table of Contents
=================

   1. Introduction
   2. Sodapy
   3. Importing Libraries
   4. Sodapy Socrata Class
   5. Socrata APIs
       - 5.1 Socrata API
       - 5.2 Discovery API
       - 5.3 Metadata API

# 1. Introduction  
This notebook demonstrates how to interact with the [Socrata Open Data API](https://dev.socrata.com/) in Python and explores different methods for retrieving data from Socrata-based open data portals. Additionally, we'll explore the library [sodapy](https://github.com/xmunoz/sodapy), a Python client for the Socrata Open Data API.

Not only will we learn how to fetch data, but we'll also explore Socrata's available metadata using the [Discovery](https://dev.socrata.com/docs/other/discovery#?route=overview) and [Metadata](https://dev.socrata.com/docs/other/metadata#?route=overview) APIs. These endpoints are often underutilized, but they offer valuable insights, such as dataset download counts, page views, and more.

While the main focus is on understanding how to interact with the Socrata API, we'll use the sodapy library throughout the examples, as it provides a straightforward interface for working with the Socrata API in Python. Each example provides both the Socrata API endpoint and the equivalent sodapy code snippet, giving you flexibility in how you approach your data retrieval. For this tutorial, the code is written in Python and will use data from NYC Open Data as an example.

Finally, I encourage you to read the [Socrata Documentation](https://dev.socrata.com/), as well as the [API Docs](https://dev.socrata.com/docs/endpoints) for a comprehensive understanding. This project highlights some of the most popular methods for working with the API, but it is by no means exhaustive.

# 2. Sodapy

[Sodapy](https://github.com/xmunoz/sodapy) is a Python client for the Socrata Open Data API. In order to use sodapy, a source domain (i.e. the Socrata Open Data source you are trying to connect to) needs to be passed to the Socrata class. Additionally, if a user wants to query a specific dataset on Socrata Open Data, then the dataset identifier (i.e. the dataset id on the given source domain) needs to be passed as well. Below, we identify NYC Open Data's source domain: `data.cityofnewyork.us` and the dataset identifier for the NYC 311 dataset: `erm2-nwe9`. The screenshot below displays where we retrieve this information.

This is my preferred method for fetching data from NYC Open Data. However, please note that the Sodapy project is now archived on GitHub and is read-only.

![nyc-311-api-docs](images/nyc-311-api-docs.png)  

Source: https://dev.socrata.com/foundry/data.cityofnewyork.us/erm2-nwe9

We will focus on three popular Sodapy methods:
- `.get()`: reads data from the requested resource (the Socrata API endpoint)
- `.datasets()`: returns the list of datasets associated with a particular domain (Discovery API)
- `.get_metadata()`: retrieves the metadata for a particular dataset (Metadata API)

For a comprehensive list of other available APIs provided by Socrata, check out the [Other APIs](https://dev.socrata.com/docs/other/) page in the documentation.

### Attention
When querying all records, set the `limit` parameter to a value larger than the total number of records in your dataset. If the number of records returned is exactly equal to the `limit` value, you likely haven’t retrieved all the data: the request was probably cut off at the limit.
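When the dataset's size isn't known in advance, an alternative to guessing a large `limit` is to page through the records with `limit` and `offset` until a page comes back short. The sketch below is a minimal illustration under stated assumptions: `fetch_all` and `fake_fetch` are hypothetical names, and the helper accepts any callable that takes `limit` and `offset` keywords. With sodapy, that callable could wrap `client.get`, which accepts `limit`, `offset`, and `order` keyword arguments.

```python
from typing import Callable, Dict, List


def fetch_all(fetch_page: Callable[..., List[Dict]], page_size: int = 1000) -> List[Dict]:
    """Collect every record by requesting pages until one comes back short."""
    records: List[Dict] = []
    offset = 0
    while True:
        page = fetch_page(limit=page_size, offset=offset)
        records.extend(page)
        # a page shorter than page_size means there is nothing left to fetch
        if len(page) < page_size:
            break
        offset += page_size
    return records


# stand-in for the API: 2,500 fake records served in slices
fake_rows = [{"row": i} for i in range(2500)]


def fake_fetch(limit: int, offset: int) -> List[Dict]:
    return fake_rows[offset:offset + limit]


all_rows = fetch_all(fake_fetch, page_size=1000)
print(len(all_rows))

# with sodapy, the callable could look like this (not executed here):
# fetch_all(lambda limit, offset: client.get('bs59-f3nu', limit=limit, offset=offset, order=':id'))
```

Passing a stable `order` (such as the `:id` system field) keeps the pages consistent between requests, so records are neither skipped nor repeated.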

# 3. Importing Libraries

In [1]:
# importing libraries
import pandas as pd
import requests
from sodapy import Socrata

In [2]:
# documentation for installing watermark: https://github.com/rasbt/watermark, performed for reproducibility
%reload_ext watermark
%watermark -u -t -d -v -p pandas,sodapy

Last updated: 2024-12-02 21:35:35

Python implementation: CPython
Python version       : 3.11.0
IPython version      : 8.6.0

pandas: 1.5.1
sodapy: 2.2.0



# 4. Sodapy Socrata Class 

Note: creating a client without an app token logs the following warning:  
`WARNING:root:Requests made without an app_token will be subject to strict throttling limits.`  

To avoid these limits, it's recommended to use an [app token](https://dev.socrata.com/docs/app-tokens.html) when making API requests.
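One common way to supply an app token without hard-coding it in the notebook is to read it from an environment variable. The sketch below assumes a hypothetical variable name, `SOCRATA_APP_TOKEN`, and a hypothetical helper, `load_app_token`; neither is defined by Socrata or sodapy.

```python
import os
from typing import Optional


def load_app_token(env_var: str = "SOCRATA_APP_TOKEN") -> Optional[str]:
    """Return the token stored in env_var, or None if it is unset or blank."""
    token = os.environ.get(env_var, "").strip()
    return token or None


app_token = load_app_token()

# the token is then passed to the client (not executed here):
# client = Socrata('data.cityofnewyork.us', app_token=app_token, timeout=100)
```

If the variable is not set, the helper returns `None`, which matches the unauthenticated, throttled behavior used in the rest of this notebook.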

In [3]:
# implementation in sodapy
# source domain for NYC Open Data on Socrata
socrata_domain = 'data.cityofnewyork.us'

# Socrata(): The main class that interacts with the SODA API.

# the required arguments are:
#     domain: the domain you wish to access
#     app_token: your Socrata application token
# simple requests are possible without an app_token, though these
# requests will be rate-limited.

# initialize client
client = Socrata(
    socrata_domain,
    app_token=None,
    timeout=100
)

# no app token is passed, and the timeout limit is 100 seconds

# examine object
print(client)



<sodapy.socrata.Socrata object at 0x15f16c490>


In [4]:
# print information about the Socrata object
print(f'type: {type(client)}')

type: <class 'sodapy.socrata.Socrata'>


In [5]:
# print attributes of object
for key, value in client.__dict__.items():
    print(f'{key}: {value}')

domain: data.cityofnewyork.us
session: <requests.sessions.Session object at 0x15f16c510>
uri_prefix: https://
timeout: 100


# 5. Socrata APIs

##  5.1 Socrata Open Data API
[Socrata's Open Data API](https://dev.socrata.com/docs/endpoints).

From [the docs](https://dev.socrata.com/docs/endpoints):
>The “endpoint” of a SODA API is simply a unique URL that represents an object or collection of objects. Every Socrata dataset, and even every individual data record, has its own endpoint. The endpoint is what you’ll point your HTTP client at to interact with data resources.
>
>All resources are accessed through a common base path of /resource/ along with their dataset identifier.

In Sodapy, we use the `.get()` method to read data from the requested resource. Options for `content_type` are JSON, CSV, and XML. This method performs a GET request on URLs of this type: https://data.cityofnewyork.us/resource/erm2-nwe9.json?$limit=5.
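To make the endpoint structure concrete, here is a minimal sketch that assembles a `/resource/` URL from a domain, a dataset identifier, and a content type. The helper name `build_resource_url` is a hypothetical one chosen for illustration; keyword arguments are turned into `$`-prefixed query parameters such as `$limit`.

```python
from urllib.parse import urlencode


def build_resource_url(domain: str, dataset_id: str, content_type: str = "json", **soql) -> str:
    """Build a /resource/ endpoint URL; keyword arguments become $-prefixed parameters."""
    url = f"https://{domain}/resource/{dataset_id}.{content_type}"
    if soql:
        params = {f"${name}": value for name, value in soql.items()}
        # keep '$' readable instead of percent-encoding it as %24
        url += "?" + urlencode(params, safe="$")
    return url


print(build_resource_url("data.cityofnewyork.us", "erm2-nwe9", limit=5))
```

The resulting string can be passed to `pd.read_json` or `requests.get`, in the same way as the hand-written URLs in this section.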

### Using Socrata API URL

In [6]:
# dataset: DEP Green Infrastructure https://data.cityofnewyork.us/Environment/DEP-Green-Infrastructure/spjh-pz7h
url = 'https://data.cityofnewyork.us/resource/bs59-f3nu.json'
df = pd.read_json(url)

# preview data
df.head()

Unnamed: 0,the_geom,asset_id,gi_id,dep_contra,dep_cont_1,row_onsite,project_na,asset_type,status,asset_x_co,...,asset_leng,asset_widt,asset_area,gi_feature,tree_latin,tree_commo,constructi,construc_1,program_ar,status_gro
0,"{'type': 'Point', 'coordinates': [-73.81167623...",94002,1A,GQJA03-02,2,ROW,DDC JAM-003 Phase 2,ROWB,Constructed (Full Maintenance),1036475.0,...,17.0,5.0,85.0,Standard,Chionanthus retusus,Chinese Fringetree,GCJA03-2A,Package-1,Right of Way (ROW),Constructed
1,"{'type': 'Point', 'coordinates': [-73.81228577...",94012,GS6A,GQJA03-02,2,ROW,DDC JAM-003 Phase 2,ROWGS,Constructed (Full Maintenance),1036305.0,...,13.0,3.5,45.5,,No Tree,,GCJA03-2A,Package-1,Right of Way (ROW),Constructed
2,"{'type': 'Point', 'coordinates': [-73.81223444...",94017,GS8C,GQJA03-02,2,ROW,DDC JAM-003 Phase 2,ROWGS,Constructed (Full Maintenance),1036319.0,...,20.0,3.5,70.0,,No Tree,,GCJA03-2A,Package-1,Right of Way (ROW),Constructed
3,"{'type': 'Point', 'coordinates': [-73.81205974...",94019,GS8E,GQJA03-02,2,ROW,DDC JAM-003 Phase 2,ROWGS,Constructed (Full Maintenance),1036368.0,...,20.0,3.5,70.0,,No Tree,,GCJA03-2A,Package-1,Right of Way (ROW),Constructed
4,"{'type': 'Point', 'coordinates': [-73.81310191...",94021,10A,GQJA03-02,2,ROW,DDC JAM-003 Phase 2,ROWB,Constructed (Full Maintenance),1036079.0,...,13.0,4.0,52.0,Standard,Quercus palustris,Pin Oak,GCJA03-2A,Package-1,Right of Way (ROW),Constructed


In [7]:
# preview columns
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 30 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   the_geom    1000 non-null   object 
 1   asset_id    1000 non-null   int64  
 2   gi_id       1000 non-null   object 
 3   dep_contra  1000 non-null   object 
 4   dep_cont_1  1000 non-null   int64  
 5   row_onsite  1000 non-null   object 
 6   project_na  1000 non-null   object 
 7   asset_type  1000 non-null   object 
 8   status      1000 non-null   object 
 9   asset_x_co  1000 non-null   float64
 10  asset_y_co  1000 non-null   float64
 11  borough     1000 non-null   object 
 12  sewer_type  1000 non-null   object 
 13  outfall     1000 non-null   object 
 14  nyc_waters  1000 non-null   object 
 15  bbl         1000 non-null   int64  
 16  secondary_  1000 non-null   int64  
 17  community_  1000 non-null   int64  
 18  city_counc  1000 non-null   int64  
 19  assembly_d  1000 non-null   

### Using Sodapy

In [8]:
# source domain for NYC Open Data on Socrata
socrata_domain = 'data.cityofnewyork.us'

# green infrastructure dataset identifier
socrata_dataset_identifier = 'bs59-f3nu'

# initialize client
client = Socrata(
    socrata_domain,
    app_token=None,
    timeout=1000
)

# get data for the green infrastructure dataset
# no limit parameter is passed, so the API returns its default page of 1,000 rows
data = client.get(socrata_dataset_identifier)

# identify type of object returned
print(type(data))



<class 'list'>


In [9]:
# preview first element in list, this is one record
data[0]

{'the_geom': {'type': 'Point',
  'coordinates': [-73.81167623024226, 40.69138622900597]},
 'asset_id': '94002.0',
 'gi_id': '1A',
 'dep_contra': 'GQJA03-02',
 'dep_cont_1': '2',
 'row_onsite': 'ROW',
 'project_na': 'DDC JAM-003 Phase 2',
 'asset_type': 'ROWB',
 'status': 'Constructed (Full Maintenance)',
 'asset_x_co': '1036475.27735',
 'asset_y_co': '191223.227',
 'borough': 'Queens',
 'sewer_type': 'Combined',
 'outfall': 'JAM-003',
 'nyc_waters': 'Jamaica Bay and Tributaries',
 'bbl': '4095890001.0',
 'secondary_': '0.0',
 'community_': '410.0',
 'city_counc': '28.0',
 'assembly_d': '32.0',
 'asset_leng': '17.0',
 'asset_widt': '5.0',
 'asset_area': '85.0',
 'gi_feature': 'Standard',
 'tree_latin': 'Chionanthus retusus',
 'tree_commo': 'Chinese Fringetree',
 'constructi': 'GCJA03-2A',
 'construc_1': 'Package-1',
 'program_ar': 'Right of Way (ROW)',
 'status_gro': 'Constructed'}

In [10]:
# convert list to a df
df = pd.DataFrame(data)

# sanity check
print(df.shape)
df.head()

(1000, 30)


Unnamed: 0,the_geom,asset_id,gi_id,dep_contra,dep_cont_1,row_onsite,project_na,asset_type,status,asset_x_co,...,asset_leng,asset_widt,asset_area,gi_feature,tree_latin,tree_commo,constructi,construc_1,program_ar,status_gro
0,"{'type': 'Point', 'coordinates': [-73.81167623...",94002.0,1A,GQJA03-02,2,ROW,DDC JAM-003 Phase 2,ROWB,Constructed (Full Maintenance),1036475.27735,...,17.0,5.0,85.0,Standard,Chionanthus retusus,Chinese Fringetree,GCJA03-2A,Package-1,Right of Way (ROW),Constructed
1,"{'type': 'Point', 'coordinates': [-73.81228577...",94012.0,GS6A,GQJA03-02,2,ROW,DDC JAM-003 Phase 2,ROWGS,Constructed (Full Maintenance),1036305.46107,...,13.0,3.5,45.5,,No Tree,,GCJA03-2A,Package-1,Right of Way (ROW),Constructed
2,"{'type': 'Point', 'coordinates': [-73.81223444...",94017.0,GS8C,GQJA03-02,2,ROW,DDC JAM-003 Phase 2,ROWGS,Constructed (Full Maintenance),1036319.11813,...,20.0,3.5,70.0,,No Tree,,GCJA03-2A,Package-1,Right of Way (ROW),Constructed
3,"{'type': 'Point', 'coordinates': [-73.81205974...",94019.0,GS8E,GQJA03-02,2,ROW,DDC JAM-003 Phase 2,ROWGS,Constructed (Full Maintenance),1036367.52667,...,20.0,3.5,70.0,,No Tree,,GCJA03-2A,Package-1,Right of Way (ROW),Constructed
4,"{'type': 'Point', 'coordinates': [-73.81310191...",94021.0,10A,GQJA03-02,2,ROW,DDC JAM-003 Phase 2,ROWB,Constructed (Full Maintenance),1036078.81888,...,13.0,4.0,52.0,Standard,Quercus palustris,Pin Oak,GCJA03-2A,Package-1,Right of Way (ROW),Constructed


In [11]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 30 columns):
 #   Column      Non-Null Count  Dtype 
---  ------      --------------  ----- 
 0   the_geom    1000 non-null   object
 1   asset_id    1000 non-null   object
 2   gi_id       1000 non-null   object
 3   dep_contra  1000 non-null   object
 4   dep_cont_1  1000 non-null   object
 5   row_onsite  1000 non-null   object
 6   project_na  1000 non-null   object
 7   asset_type  1000 non-null   object
 8   status      1000 non-null   object
 9   asset_x_co  1000 non-null   object
 10  asset_y_co  1000 non-null   object
 11  borough     1000 non-null   object
 12  sewer_type  1000 non-null   object
 13  outfall     1000 non-null   object
 14  nyc_waters  1000 non-null   object
 15  bbl         1000 non-null   object
 16  secondary_  1000 non-null   object
 17  community_  1000 non-null   object
 18  city_counc  1000 non-null   object
 19  assembly_d  1000 non-null   object
 20  asset_len

Example with the `query` parameter.

As noted earlier, when querying all records, choose a `limit` that is larger than the dataset’s total size. If the number of records returned is exactly equal to the `limit` value, you likely haven’t retrieved all the data.

In [12]:
# SoQL implementation with sodapy
# SoQL query string below:
# retrieve all columns and limit our records to 100

# green infrastructure dataset identifier
socrata_dataset_identifier = 'bs59-f3nu'

query = """
    SELECT *
    LIMIT 100
"""

# returned as JSON from API / converted to Python list of dictionaries by sodapy
# notice the query parameter
results = client.get(socrata_dataset_identifier, query=query)

# convert to pandas DataFrame
results_df = pd.DataFrame.from_records(results)

# sanity check
print(f'shape of data: {results_df.shape}')
results_df.head()

shape of data: (100, 30)


Unnamed: 0,the_geom,asset_id,gi_id,dep_contra,dep_cont_1,row_onsite,project_na,asset_type,status,asset_x_co,...,asset_leng,asset_widt,asset_area,gi_feature,tree_latin,tree_commo,constructi,construc_1,program_ar,status_gro
0,"{'type': 'Point', 'coordinates': [-73.81167623...",94002.0,1A,GQJA03-02,2,ROW,DDC JAM-003 Phase 2,ROWB,Constructed (Full Maintenance),1036475.27735,...,17.0,5.0,85.0,Standard,Chionanthus retusus,Chinese Fringetree,GCJA03-2A,Package-1,Right of Way (ROW),Constructed
1,"{'type': 'Point', 'coordinates': [-73.81228577...",94012.0,GS6A,GQJA03-02,2,ROW,DDC JAM-003 Phase 2,ROWGS,Constructed (Full Maintenance),1036305.46107,...,13.0,3.5,45.5,,No Tree,,GCJA03-2A,Package-1,Right of Way (ROW),Constructed
2,"{'type': 'Point', 'coordinates': [-73.81223444...",94017.0,GS8C,GQJA03-02,2,ROW,DDC JAM-003 Phase 2,ROWGS,Constructed (Full Maintenance),1036319.11813,...,20.0,3.5,70.0,,No Tree,,GCJA03-2A,Package-1,Right of Way (ROW),Constructed
3,"{'type': 'Point', 'coordinates': [-73.81205974...",94019.0,GS8E,GQJA03-02,2,ROW,DDC JAM-003 Phase 2,ROWGS,Constructed (Full Maintenance),1036367.52667,...,20.0,3.5,70.0,,No Tree,,GCJA03-2A,Package-1,Right of Way (ROW),Constructed
4,"{'type': 'Point', 'coordinates': [-73.81310191...",94021.0,10A,GQJA03-02,2,ROW,DDC JAM-003 Phase 2,ROWB,Constructed (Full Maintenance),1036078.81888,...,13.0,4.0,52.0,Standard,Quercus palustris,Pin Oak,GCJA03-2A,Package-1,Right of Way (ROW),Constructed
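The query above only selects and limits. As a sketch of how further SoQL clauses could be combined, the hypothetical helper `build_soql` below assembles a query string from optional `SELECT`, `WHERE`, `ORDER BY`, and `LIMIT` parts. The column names come from the green infrastructure records shown earlier; the filter value is just an example.

```python
from typing import Optional


def build_soql(select: str = "*", where: Optional[str] = None,
               order: Optional[str] = None, limit: Optional[int] = None) -> str:
    """Assemble a SoQL query string from optional clauses."""
    clauses = [f"SELECT {select}"]
    if where:
        clauses.append(f"WHERE {where}")
    if order:
        clauses.append(f"ORDER BY {order}")
    if limit is not None:
        clauses.append(f"LIMIT {limit}")
    return "\n".join(clauses)


query = build_soql(
    select="asset_id, borough, asset_type",
    where="borough = 'Queens'",
    order="asset_id",
    limit=100,
)
print(query)

# the string can then be passed to sodapy (not executed here):
# results = client.get('bs59-f3nu', query=query)
```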


## 5.2  Discovery API
Socrata's [Discovery API](https://dev.socrata.com/docs/other/discovery#?route=overview).

In Sodapy, we use the `.datasets()` method, which returns the list of datasets associated with a particular domain.
WARNING: Large limits (>1000) will return megabytes of data, which can be slow on low-bandwidth networks and is also a lot of data to hold in memory. This method performs a GET request on URLs of this type: https://data.cityofnewyork.us/api/catalog/v1.
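Catalog requests can be narrowed with query parameters. The sketch below builds a Discovery API URL that asks for one domain's assets in pages; it assumes the `domains`, `limit`, and `offset` parameters described in the Discovery API documentation, and `build_catalog_url` is a hypothetical helper name.

```python
from urllib.parse import urlencode

CATALOG_URL = "https://api.us.socrata.com/api/catalog/v1"


def build_catalog_url(domain: str, limit: int = 100, offset: int = 0) -> str:
    """Build a Discovery API URL restricted to one domain, with paging parameters."""
    params = {"domains": domain, "limit": limit, "offset": offset}
    return f"{CATALOG_URL}?{urlencode(params)}"


print(build_catalog_url("data.cityofnewyork.us", limit=50))
```

Keeping `limit` small and advancing `offset` is one way to stay clear of the large-response warning above.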

### Using Socrata API URL

In [13]:
# Discovery API
url = 'https://api.us.socrata.com/api/catalog/v1?search_context=data.ny.gov'

# fetch the JSON data from the web
response = requests.get(url)

# parse the JSON response
data_dict = response.json() 

# preview keys    
data_dict.keys()    



In [14]:
# preview results key, first element
data_dict['results'][0]

{'resource': {'name': 'Lottery Cash 4 Life Winning Numbers: Beginning 2014',
  'id': 'kwxv-fwze',
  'resource_name': None,
  'parent_fxf': [],
  'description': 'Go to http://on.ny.gov/1xRIvPz on the New York Lottery website for past Cash 4 Life results and payouts.',
  'attribution': 'New York State Gaming Commission',
  'attribution_link': 'http://nylottery.ny.gov/wps/portal/Home/Lottery/home/your+lottery/drawing+results/drawingresults_cash4life',
  'contact_email': 'opendata@its.ny.gov',
  'type': 'dataset',
  'updatedAt': '2024-12-02T11:05:05.000Z',
  'createdAt': '2014-06-17T19:47:54.000Z',
  'metadata_updated_at': '2024-12-02T11:05:04.000Z',
  'data_updated_at': '2024-12-02T11:05:05.000Z',
  'page_views': {'page_views_last_week': 482,
   'page_views_last_month': 2482,
   'page_views_total': 5932845,
   'page_views_last_week_log': 8.915879378835772,
   'page_views_last_month_log': 11.27786854617684,
   'page_views_total_log': 22.50029290430145},
  'columns_name': ['Draw Date', 'Cas

In [15]:
# convert into df
df = pd.DataFrame.from_records(data_dict['results'])

# sanity check
print(df.shape)
df.head()

(100, 8)


Unnamed: 0,resource,classification,metadata,permalink,link,owner,creator,preview_image_url
0,{'name': 'Lottery Cash 4 Life Winning Numbers:...,"{'categories': [], 'tags': [], 'domain_categor...",{'domain': 'data.ny.gov'},https://data.ny.gov/d/kwxv-fwze,https://data.ny.gov/Government-Finance/Lottery...,"{'id': 'xzik-pf59', 'user_type': 'interactive'...","{'id': 'xzik-pf59', 'user_type': 'interactive'...",
1,"{'name': 'For Hire Vehicles (FHV) - Active', '...","{'categories': [], 'tags': [], 'domain_categor...",{'domain': 'data.cityofnewyork.us'},https://data.cityofnewyork.us/d/8wbx-tsch,https://data.cityofnewyork.us/Transportation/F...,"{'id': '5fuc-pqz2', 'user_type': 'interactive'...","{'id': '5fuc-pqz2', 'user_type': 'interactive'...",
2,"{'name': 'Civil Service List (Active)', 'id': ...","{'categories': [], 'tags': [], 'domain_categor...",{'domain': 'data.cityofnewyork.us'},https://data.cityofnewyork.us/d/vx8i-nprf,https://data.cityofnewyork.us/City-Government/...,"{'id': '5fuc-pqz2', 'user_type': 'interactive'...","{'id': '5fuc-pqz2', 'user_type': 'interactive'...",
3,"{'name': 'DOB Job Application Filings', 'id': ...","{'categories': [], 'tags': [], 'domain_categor...",{'domain': 'data.cityofnewyork.us'},https://data.cityofnewyork.us/d/ic3t-wcy2,https://data.cityofnewyork.us/Housing-Developm...,"{'id': '5fuc-pqz2', 'user_type': 'interactive'...","{'id': '5fuc-pqz2', 'user_type': 'interactive'...",
4,"{'name': 'Medicaid Enrolled Provider Listing',...","{'categories': [], 'tags': [], 'domain_categor...",{'domain': 'health.data.ny.gov'},https://health.data.ny.gov/d/keti-qx5t,https://health.data.ny.gov/Health/Medicaid-Enr...,"{'id': 's9j2-nqmr', 'user_type': 'interactive'...","{'id': 's9j2-nqmr', 'user_type': 'interactive'...",


This is not yet our final DataFrame: each column still holds nested dictionaries rather than flat, tabular values. The information we want is most likely located in the `resource` column, but let's confirm.

Briefly review the other keys.

In [16]:
# first element in our results list, preview resource key
data_dict['results'][0]['resource']

{'name': 'Lottery Cash 4 Life Winning Numbers: Beginning 2014',
 'id': 'kwxv-fwze',
 'resource_name': None,
 'parent_fxf': [],
 'description': 'Go to http://on.ny.gov/1xRIvPz on the New York Lottery website for past Cash 4 Life results and payouts.',
 'attribution': 'New York State Gaming Commission',
 'attribution_link': 'http://nylottery.ny.gov/wps/portal/Home/Lottery/home/your+lottery/drawing+results/drawingresults_cash4life',
 'contact_email': 'opendata@its.ny.gov',
 'type': 'dataset',
 'updatedAt': '2024-12-02T11:05:05.000Z',
 'createdAt': '2014-06-17T19:47:54.000Z',
 'metadata_updated_at': '2024-12-02T11:05:04.000Z',
 'data_updated_at': '2024-12-02T11:05:05.000Z',
 'page_views': {'page_views_last_week': 482,
  'page_views_last_month': 2482,
  'page_views_total': 5932845,
  'page_views_last_week_log': 8.915879378835772,
  'page_views_last_month_log': 11.27786854617684,
  'page_views_total_log': 22.50029290430145},
 'columns_name': ['Draw Date', 'Cash Ball', 'Winning Numbers'],
 'c

In [17]:
# first element in our results list, preview classification key
data_dict['results'][0]['classification']

{'categories': [],
 'tags': [],
 'domain_category': 'Government & Finance',
 'domain_tags': ['cash 4 life', 'new york lottery', 'results', 'winning'],
 'domain_metadata': [{'key': 'Common-Core_Publisher',
   'value': 'State of New York'},
  {'key': 'Common-Core_Contact-Name', 'value': 'Open Data NY'},
  {'key': 'Common-Core_Contact-Email', 'value': 'opendata@its.ny.gov'},
  {'key': 'Additional-Resources_See-Also',
   'value': 'http://www.gaming.ny.gov/'},
  {'key': 'Dataset-Summary_Dataset-Owner',
   'value': 'New York State Gaming Commission'},
  {'key': 'Dataset-Summary_Contact-Information',
   'value': 'Info@gaming.ny.gov'},
  {'key': 'Dataset-Summary_Granularity', 'value': 'By draw'},
  {'key': 'Dataset-Summary_Coverage', 'value': 'Statewide'},
  {'key': 'Dataset-Summary_Data-Frequency',
   'value': 'Daily beginning 7/1/19; twice weekly previously'},
  {'key': 'Dataset-Summary_Posting-Frequency', 'value': 'Daily'},
  {'key': 'Dataset-Summary_Organization', 'value': 'The New York Lo

In [18]:
# first element in our results list, preview metadata key
data_dict['results'][0]['metadata']

{'domain': 'data.ny.gov'}

In [19]:
# first element in our results list, preview permalink key
data_dict['results'][0]['permalink']

'https://data.ny.gov/d/kwxv-fwze'

In [20]:
# first element in our results list, preview link key
data_dict['results'][0]['link']

'https://data.ny.gov/Government-Finance/Lottery-Cash-4-Life-Winning-Numbers-Beginning-2014/kwxv-fwze'

In [21]:
# first element in our results list, preview owner key
data_dict['results'][0]['owner']

{'id': 'xzik-pf59', 'user_type': 'interactive', 'display_name': 'NY Open Data'}

In [22]:
# first element in our results list, preview creator key
data_dict['results'][0]['creator']

{'id': 'xzik-pf59', 'user_type': 'interactive', 'display_name': 'NY Open Data'}

After confirming that the data is in the `resource` key, let's unnest these values.

In [23]:
# retrieve information in the resource key
df['resource']

0     {'name': 'Lottery Cash 4 Life Winning Numbers:...
1     {'name': 'For Hire Vehicles (FHV) - Active', '...
2     {'name': 'Civil Service List (Active)', 'id': ...
3     {'name': 'DOB Job Application Filings', 'id': ...
4     {'name': 'Medicaid Enrolled Provider Listing',...
                            ...                        
95    {'name': 'Lottery Pick 10 Winning Numbers: Beg...
96    {'name': 'Street Hail Livery (SHL) Permits', '...
97    {'name': 'NYPD Complaint Data Current (Year To...
98    {'name': 'Index Crimes by County and Agency: B...
99    {'name': 'TLC Approved LabCorp Patient Service...
Name: resource, Length: 100, dtype: object

In [24]:
# convert to a dataframe
df = pd.DataFrame.from_records(df['resource'])

# sanity check
print(df.shape)
df.head()

(100, 27)


Unnamed: 0,name,id,resource_name,parent_fxf,description,attribution,attribution_link,contact_email,type,updatedAt,...,columns_description,columns_format,download_count,provenance,lens_view_type,lens_display_type,locked,blob_mime_type,hide_from_data_json,publication_date
0,Lottery Cash 4 Life Winning Numbers: Beginning...,kwxv-fwze,,[],Go to http://on.ny.gov/1xRIvPz on the New York...,New York State Gaming Commission,http://nylottery.ny.gov/wps/portal/Home/Lotter...,opendata@its.ny.gov,dataset,2024-12-02T11:05:05.000Z,...,"[Draw date, Cash ball, Winning numbers]","[{'view': 'date', 'align': 'center'}, {'align'...",230847,official,tabular,table,False,,False,2021-04-27T14:13:45.000Z
1,For Hire Vehicles (FHV) - Active,8wbx-tsch,,[],"<b>PLEASE NOTE:</b> This dataset, which includ...",Taxi and Limousine Commission (TLC),,,dataset,2024-12-02T19:55:58.000Z,...,"[Vehicle VIN Number, Base Website, Base Teleph...","[{'displayStyle': 'plain', 'align': 'left'}, {...",0,official,tabular,table,False,,False,2021-04-05T13:20:47.000Z
2,Civil Service List (Active),vx8i-nprf,,[],A Civil Service List consists of all candidate...,Department of Citywide Administrative Services...,,,dataset,2024-12-02T14:16:52.000Z,...,[A candidate’s last name as it appears on thei...,"[{'displayStyle': 'plain', 'align': 'left'}, {...",0,official,tabular,table,False,,False,2024-01-12T16:15:05.000Z
3,DOB Job Application Filings,ic3t-wcy2,,[],This dataset contains all job applications sub...,Department of Buildings (DOB),,,dataset,2024-12-02T21:02:15.000Z,...,"[Street Name where Property is located, Longit...","[{'align': 'right'}, {'align': 'right'}, {'vie...",0,official,tabular,table,False,,False,2020-06-22T18:23:35.000Z
4,Medicaid Enrolled Provider Listing,keti-qx5t,,[],<b>Revalidation disclaimer</b>: The next anti...,New York State Department of Health,https://www.emedny.org/info/ProviderEnrollment...,,dataset,2024-12-02T19:12:09.000Z,...,[Longitude related to the service address for ...,"[{}, {}, {'align': 'left'}, {}, {}, {'view': '...",0,official,tabular,table,False,,False,2020-12-28T16:03:15.000Z


In [25]:
# preview columns
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 100 entries, 0 to 99
Data columns (total 27 columns):
 #   Column               Non-Null Count  Dtype 
---  ------               --------------  ----- 
 0   name                 100 non-null    object
 1   id                   100 non-null    object
 2   resource_name        1 non-null      object
 3   parent_fxf           100 non-null    object
 4   description          100 non-null    object
 5   attribution          89 non-null     object
 6   attribution_link     51 non-null     object
 7   contact_email        20 non-null     object
 8   type                 100 non-null    object
 9   updatedAt            100 non-null    object
 10  createdAt            100 non-null    object
 11  metadata_updated_at  100 non-null    object
 12  data_updated_at      100 non-null    object
 13  page_views           100 non-null    object
 14  columns_name         100 non-null    object
 15  columns_field_name   100 non-null    object
 16  columns_d

In [26]:
# preview values
df.head(3).T

Unnamed: 0,0,1,2
name,Lottery Cash 4 Life Winning Numbers: Beginning...,For Hire Vehicles (FHV) - Active,Civil Service List (Active)
id,kwxv-fwze,8wbx-tsch,vx8i-nprf
resource_name,,,
parent_fxf,[],[],[]
description,Go to http://on.ny.gov/1xRIvPz on the New York...,"<b>PLEASE NOTE:</b> This dataset, which includ...",A Civil Service List consists of all candidate...
attribution,New York State Gaming Commission,Taxi and Limousine Commission (TLC),Department of Citywide Administrative Services...
attribution_link,http://nylottery.ny.gov/wps/portal/Home/Lotter...,,
contact_email,opendata@its.ny.gov,,
type,dataset,dataset,dataset
updatedAt,2024-12-02T11:05:05.000Z,2024-12-02T19:55:58.000Z,2024-12-02T14:16:52.000Z
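After this conversion, nested values such as `page_views` are still packed inside a single column. As a minimal sketch, the hypothetical helper `summarize_results` below pulls a handful of fields, including nested ones, into flat records. The choice of fields is an assumption made for illustration, and the sample record copies values shown in the outputs above.

```python
from typing import Dict, List


def summarize_results(results: List[Dict]) -> List[Dict]:
    """Flatten selected fields of each Discovery API result into one flat record."""
    rows = []
    for result in results:
        resource = result.get("resource", {})
        page_views = resource.get("page_views") or {}
        rows.append({
            "id": resource.get("id"),
            "name": resource.get("name"),
            "domain": result.get("metadata", {}).get("domain"),
            "download_count": resource.get("download_count"),
            "page_views_total": page_views.get("page_views_total"),
        })
    return rows


# trimmed sample record, using values shown in the outputs above
sample = [{
    "resource": {
        "name": "Lottery Cash 4 Life Winning Numbers: Beginning 2014",
        "id": "kwxv-fwze",
        "download_count": 230847,
        "page_views": {"page_views_total": 5932845},
    },
    "metadata": {"domain": "data.ny.gov"},
}]

rows = summarize_results(sample)
print(rows[0])

# pd.DataFrame(rows) would turn these records into a table
```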


### Using Sodapy

In [27]:
# source domain for NYC Open Data on Socrata
socrata_domain = 'data.cityofnewyork.us'

# initialize client
client = Socrata(
    socrata_domain,
    app_token=None,
    timeout=100
)

# call sodapy's datasets method
datasets = client.datasets()

# sanity checks
print(f'object type: {type(datasets)}')
print(f'Number of datasets on NYC Open Data: {len(datasets):,}.')



object type: <class 'list'>
Number of datasets on NYC Open Data: 3,237.


In [28]:
# review type of first dataset
print(type(datasets[0]))

<class 'dict'>


In [29]:
# review information about keys
datasets[0].keys()

dict_keys(['resource', 'classification', 'metadata', 'permalink', 'link', 'owner', 'creator'])

In [30]:
# review information about resource key
resources = datasets[0]['resource'].items()

for key, value in resources:
    print(f'{key}: {value}\n')      

name: For Hire Vehicles (FHV) - Active

id: 8wbx-tsch

resource_name: None

parent_fxf: []

description: <b>PLEASE NOTE:</b> This dataset, which includes all TLC licensed for-hire vehicles which are in good standing and able to drive, is updated every day in the evening between 4-7pm. Please check the 'Last Update Date' field to make sure the list has updated successfully. 'Last Update Date'  should show either today or yesterday's date, depending on the time of day. If the list is outdated, please download the most recent list from the link below. 
http://www1.nyc.gov/assets/tlc/downloads/datasets/tlc_for_hire_vehicle_active_and_inactive.csv

TLC authorized For-Hire vehicles that are active. This list is accurate to the date and time represented in the Last Date Updated and Last Time Updated fields. For inquiries about the contents of this dataset, please email licensinginquiries@tlc.nyc.gov.

attribution: Taxi and Limousine Commission (TLC)

attribution_link: None

contact_email: 

In [31]:
# green infrastructure id on website URL
id_on_website = 'spjh-pz7h'

# get the list of datasets once
datasets = client.datasets()

# loop through the datasets to find the one with the matching identifier
for idx, dataset in enumerate(datasets):
    if dataset['resource']['id'] == id_on_website:
        print('We found the Green Infrastructure dataset!')
        print(f'Index is: {idx}')
        
        dataset_index = idx
        break

We found the Green Infrastructure dataset!
Index is: 196
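The loop above scans the list once for a single identifier. When several lookups are needed, one option is to build a dictionary keyed by dataset identifier. The sketch below uses a hypothetical helper, `index_by_id`, and two trimmed sample entries whose names and ids appear elsewhere in this notebook.

```python
from typing import Dict, List


def index_by_id(datasets: List[Dict]) -> Dict[str, Dict]:
    """Map each dataset identifier to its catalog entry for direct lookups."""
    return {entry["resource"]["id"]: entry for entry in datasets}


# two trimmed catalog entries, using names and ids shown in this notebook
sample_datasets = [
    {"resource": {"id": "8wbx-tsch", "name": "For Hire Vehicles (FHV) - Active"}},
    {"resource": {"id": "spjh-pz7h", "name": "DEP Green Infrastructure"}},
]

lookup = index_by_id(sample_datasets)
match = lookup.get("spjh-pz7h")  # None when the identifier is absent
print(match["resource"]["name"] if match else "not found")
```

Using `.get()` also covers the case where the identifier is missing, which the loop leaves unhandled because `dataset_index` is only assigned on a match.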


In [32]:
data = datasets[dataset_index]['resource']
df = pd.DataFrame([data])

df.head()

Unnamed: 0,name,id,resource_name,parent_fxf,description,attribution,attribution_link,contact_email,type,updatedAt,...,columns_description,columns_format,download_count,provenance,lens_view_type,lens_display_type,locked,blob_mime_type,hide_from_data_json,publication_date
0,DEP Green Infrastructure,spjh-pz7h,,[],NYC Green Infrastructure Program initiatives. ...,Department of Environmental Protection (DEP),,,map,2024-11-06T14:58:19.000Z,...,[],[],27796,official,geo,map,False,application/zip,False,2017-08-31T20:33:51.000Z


In [33]:
# preview columns
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1 entries, 0 to 0
Data columns (total 27 columns):
 #   Column               Non-Null Count  Dtype 
---  ------               --------------  ----- 
 0   name                 1 non-null      object
 1   id                   1 non-null      object
 2   resource_name        0 non-null      object
 3   parent_fxf           1 non-null      object
 4   description          1 non-null      object
 5   attribution          1 non-null      object
 6   attribution_link     0 non-null      object
 7   contact_email        0 non-null      object
 8   type                 1 non-null      object
 9   updatedAt            1 non-null      object
 10  createdAt            1 non-null      object
 11  metadata_updated_at  1 non-null      object
 12  data_updated_at      1 non-null      object
 13  page_views           1 non-null      object
 14  columns_name         1 non-null      object
 15  columns_field_name   1 non-null      object
 16  columns_data

In [34]:
# alternatively: normalize JSON, unnest values
df = pd.json_normalize(datasets[dataset_index]['resource'])

# sanity check
print(df.shape)
df.head()

(1, 32)


Unnamed: 0,name,id,resource_name,parent_fxf,description,attribution,attribution_link,contact_email,type,updatedAt,...,locked,blob_mime_type,hide_from_data_json,publication_date,page_views.page_views_last_week,page_views.page_views_last_month,page_views.page_views_total,page_views.page_views_last_week_log,page_views.page_views_last_month_log,page_views.page_views_total_log
0,DEP Green Infrastructure,spjh-pz7h,,[],NYC Green Infrastructure Program initiatives. ...,Department of Environmental Protection (DEP),,,map,2024-11-06T14:58:19.000Z,...,False,application/zip,False,2017-08-31T20:33:51.000Z,112,512,19241,6.820179,9.002815,14.231971


In [35]:
# preview columns
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1 entries, 0 to 0
Data columns (total 32 columns):
 #   Column                                Non-Null Count  Dtype  
---  ------                                --------------  -----  
 0   name                                  1 non-null      object 
 1   id                                    1 non-null      object 
 2   resource_name                         0 non-null      object 
 3   parent_fxf                            1 non-null      object 
 4   description                           1 non-null      object 
 5   attribution                           1 non-null      object 
 6   attribution_link                      0 non-null      object 
 7   contact_email                         0 non-null      object 
 8   type                                  1 non-null      object 
 9   updatedAt                             1 non-null      object 
 10  createdAt                             1 non-null      object 
 11  metadata_updated_at    

In [36]:
# preview values
df.T

Unnamed: 0,0
name,DEP Green Infrastructure
id,spjh-pz7h
resource_name,
parent_fxf,[]
description,NYC Green Infrastructure Program initiatives. ...
attribution,Department of Environmental Protection (DEP)
attribution_link,
contact_email,
type,map
updatedAt,2024-11-06T14:58:19.000Z


## 5.3  Metadata API
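Socrata's [Metadata API](https://dev.socrata.com/docs/other/metadata#?route=overview) returns descriptive information about an asset, such as its name, description, category, tags, and update timestamps, rather than the rows of data it contains.

In sodapy, the `.get_metadata()` method retrieves the metadata for a particular dataset.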

### Using Socrata API URL
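
The cells in this section call two URL patterns: the views endpoint (`/api/views/`) and the Metadata API (`/api/views/metadata/v1/`), each with or without a dataset identifier. As a rough sketch, the URLs could be assembled as follows. The helper names `views_url` and `metadata_v1_url` are hypothetical and are not part of sodapy or Socrata.

```python
# Hypothetical helpers (not part of sodapy) that build the URL patterns
# used in this section.

def views_url(domain, dataset_id=None):
    """Views endpoint: all assets if dataset_id is None, else one asset."""
    base = f"https://{domain}/api/views/"
    return base if dataset_id is None else f"{base}{dataset_id}"


def metadata_v1_url(domain, dataset_id=None):
    """Metadata API (v1): all assets if dataset_id is None, else one asset."""
    base = f"https://{domain}/api/views/metadata/v1/"
    return base if dataset_id is None else f"{base}{dataset_id}/"


domain = "data.cityofnewyork.us"
dataset_id = "erm2-nwe9"

print(views_url(domain))
print(views_url(domain, dataset_id))
print(metadata_v1_url(domain))
print(metadata_v1_url(domain, dataset_id))
```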

In [37]:
# views endpoint: metadata for every asset on NYC Open Data
# see also: https://dev.socrata.com/docs/other/metadata#?route=overview

# all datasets on NYC Open Data, one row per asset
url = 'https://data.cityofnewyork.us/api/views/'
df = pd.read_json(url)

# preview data
df.head()

Unnamed: 0,id,name,assetType,averageRating,category,createdAt,description,displayType,downloadCount,hideFromCatalog,...,blobId,blobMimeType,rowIdentifierColumnId,queryString,ratings,indexUpdatedAt,childViews,iconUrl,previewImageId,disabledFeatureFlags
0,6xyb-j5pk,NYC Address Points (Map),map,0,City Government,1732652029,Address points were developed to supplement th...,visualization_canvas_map,0,False,...,,,,,,,,,,
1,uf93-f8nk,NYC Address Points,dataset,0,City Government,1732641922,Address points were developed to supplement th...,table,28,False,...,,,,,,,,,,
2,b7aj-ck5a,NYC Greenhouse Gas Emissions Municipal Inventory,dataset,0,Environment,1732116009,The Inventory of New York City Greenhouse Gas ...,table,40,False,...,,,,,,,,,,
3,3g6p-4u5s,Building Footprints (Map),map,0,City Government,1731516162,Shapefile of footprint outlines of buildings i...,visualization_canvas_map,1,False,...,,,,,,,,,,
4,u9wf-3gbt,Building Footprints (P Layer),dataset,0,City Government,1731513726,Shapefile of footprint outlines of buildings i...,table,22,False,...,,,,,,,,,,


In [38]:
# preview columns
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3237 entries, 0 to 3236
Data columns (total 51 columns):
 #   Column                    Non-Null Count  Dtype  
---  ------                    --------------  -----  
 0   id                        3237 non-null   object 
 1   name                      3237 non-null   object 
 2   assetType                 3237 non-null   object 
 3   averageRating             3237 non-null   int64  
 4   category                  3137 non-null   object 
 5   createdAt                 3237 non-null   int64  
 6   description               3172 non-null   object 
 7   displayType               3237 non-null   object 
 8   downloadCount             3237 non-null   int64  
 9   hideFromCatalog           3237 non-null   bool   
 10  hideFromDataJson          3237 non-null   bool   
 11  locked                    3237 non-null   bool   
 12  modifyingViewUid          273 non-null    object 
 13  newBackend                3237 non-null   bool   
 14  numberOf

In [39]:
# example of the views endpoint with the 311 dataset
# instead of expanding nested columns, let's keep only top-level columns
url = 'https://data.cityofnewyork.us/api/views/erm2-nwe9'

# fetch the JSON data from the web
response = requests.get(url)

# parse the JSON response
data_dict = response.json()  

# preview keys    
data_dict.keys()  

dict_keys(['id', 'name', 'assetType', 'attribution', 'averageRating', 'category', 'createdAt', 'description', 'displayType', 'downloadCount', 'hideFromCatalog', 'hideFromDataJson', 'locked', 'newBackend', 'numberOfComments', 'oid', 'provenance', 'publicationAppendEnabled', 'publicationDate', 'publicationGroup', 'publicationStage', 'rowClass', 'rowIdentifierColumnId', 'rowsUpdatedAt', 'rowsUpdatedBy', 'tableId', 'totalTimesRated', 'viewCount', 'viewLastModified', 'viewType', 'approvals', 'clientContext', 'columns', 'grants', 'metadata', 'owner', 'query', 'rights', 'tableAuthor', 'tags', 'flags'])

In [40]:
# convert to df
df = pd.DataFrame([data_dict])

df.head()

Unnamed: 0,id,name,assetType,attribution,averageRating,category,createdAt,description,displayType,downloadCount,...,clientContext,columns,grants,metadata,owner,query,rights,tableAuthor,tags,flags
0,erm2-nwe9,311 Service Requests from 2010 to Present,dataset,311,0,Social Services,1318225937,<b>NOTE:</b> The 311 dataset is currently show...,table,444317,...,"{'clientContextVariables': [], 'inheritedVaria...","[{'id': 585605889, 'name': 'Unique Key', 'data...","[{'inherited': False, 'type': 'viewer', 'flags...","{'rdfSubject': '0', 'rdfClass': '', 'jsonQuery...","{'id': '5fuc-pqz2', 'displayName': 'NYC OpenDa...","{'orderBys': [{'ascending': False, 'expression...",[read],"{'id': '5fuc-pqz2', 'displayName': 'NYC OpenDa...","[311, 311 service requests, city government, s...","[default, ownerMayBeContacted, restorable, res..."
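
Several of these columns (`owner`, `rights`, `tags`, and others) still hold dictionaries or lists. One way to see which keys are nested before building a DataFrame is a simple type check. This is only a sketch: `sample` is a hypothetical, heavily trimmed stand-in for `data_dict`, with values copied from the output above.

```python
# Hypothetical, heavily trimmed stand-in for data_dict; values are copied
# from the output above.
sample = {
    "id": "erm2-nwe9",
    "name": "311 Service Requests from 2010 to Present",
    "downloadCount": 444317,
    "rights": ["read"],
    "owner": {"id": "5fuc-pqz2"},
}

# keys whose values are containers (dict or list) rather than scalars
nested_keys = [k for k, v in sample.items() if isinstance(v, (dict, list))]

# keep only the scalar, top-level fields
scalar_only = {k: v for k, v in sample.items() if k not in nested_keys}

print(nested_keys)
print(scalar_only)
```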


In [41]:
# view columns; note there are fewer columns (41) than in the all-datasets example (51)
# some values are still nested
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1 entries, 0 to 0
Data columns (total 41 columns):
 #   Column                    Non-Null Count  Dtype 
---  ------                    --------------  ----- 
 0   id                        1 non-null      object
 1   name                      1 non-null      object
 2   assetType                 1 non-null      object
 3   attribution               1 non-null      object
 4   averageRating             1 non-null      int64 
 5   category                  1 non-null      object
 6   createdAt                 1 non-null      int64 
 7   description               1 non-null      object
 8   displayType               1 non-null      object
 9   downloadCount             1 non-null      int64 
 10  hideFromCatalog           1 non-null      bool  
 11  hideFromDataJson          1 non-null      bool  
 12  locked                    1 non-null      bool  
 13  newBackend                1 non-null      bool  
 14  numberOfComments          1 no
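
Note that `createdAt` is stored as an `int64` here. The value looks like a Unix timestamp in seconds, which is an assumption based on its magnitude. Under that assumption, a minimal sketch of converting it to a readable datetime uses a one-row frame built from the values shown above (`views_df` is a hypothetical name):

```python
import pandas as pd

# One-row frame holding the id and createdAt values from the output above
views_df = pd.DataFrame([{"id": "erm2-nwe9", "createdAt": 1318225937}])

# Assumption: createdAt is seconds since the Unix epoch (UTC)
views_df["createdAt_utc"] = pd.to_datetime(views_df["createdAt"], unit="s", utc=True)

print(views_df[["id", "createdAt", "createdAt_utc"]])
```

The converted value falls on 2011-10-10, which is consistent with the dataset having been on the portal for well over a decade.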

### Using Sodapy

In [42]:
# source domain for NYC Open Data on Socrata
socrata_domain = 'data.cityofnewyork.us'

# dataset id for NYC 311 on NYC Open Data on Socrata
socrata_dataset_identifier = 'erm2-nwe9'

# initialize client
client = Socrata(
    socrata_domain,
    app_token=None,
    timeout=100
)

# similar to: 'https://data.cityofnewyork.us/api/views/erm2-nwe9' (the keys returned match that endpoint)
metadata = client.get_metadata(socrata_dataset_identifier)

metadata.keys()



dict_keys(['id', 'name', 'assetType', 'attribution', 'averageRating', 'category', 'createdAt', 'description', 'displayType', 'downloadCount', 'hideFromCatalog', 'hideFromDataJson', 'locked', 'newBackend', 'numberOfComments', 'oid', 'provenance', 'publicationAppendEnabled', 'publicationDate', 'publicationGroup', 'publicationStage', 'rowClass', 'rowIdentifierColumnId', 'rowsUpdatedAt', 'rowsUpdatedBy', 'tableId', 'totalTimesRated', 'viewCount', 'viewLastModified', 'viewType', 'approvals', 'clientContext', 'columns', 'grants', 'metadata', 'owner', 'query', 'rights', 'tableAuthor', 'tags', 'flags'])

In [43]:
# convert to df
df = pd.DataFrame([metadata])

# preview data
df.head()

Unnamed: 0,id,name,assetType,attribution,averageRating,category,createdAt,description,displayType,downloadCount,...,clientContext,columns,grants,metadata,owner,query,rights,tableAuthor,tags,flags
0,erm2-nwe9,311 Service Requests from 2010 to Present,dataset,311,0,Social Services,1318225937,<b>NOTE:</b> The 311 dataset is currently show...,table,444317,...,"{'clientContextVariables': [], 'inheritedVaria...","[{'id': 585605889, 'name': 'Unique Key', 'data...","[{'inherited': False, 'type': 'viewer', 'flags...","{'rdfSubject': '0', 'rdfClass': '', 'jsonQuery...","{'id': '5fuc-pqz2', 'displayName': 'NYC OpenDa...","{'orderBys': [{'ascending': False, 'expression...",[read],"{'id': '5fuc-pqz2', 'displayName': 'NYC OpenDa...","[311, 311 service requests, city government, s...","[default, ownerMayBeContacted, restorable, res..."
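
The `columns` entry is a list of dictionaries, one per dataset column, so it can be turned into its own DataFrame. The sketch below uses a hypothetical `metadata_sample` dict in place of the real `metadata` object. Only the first column's `id` and `name` and the first two tags come from the output above; the second column entry is made up for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the dict returned by client.get_metadata(...).
# The second column entry is made up; real entries carry more keys.
metadata_sample = {
    "id": "erm2-nwe9",
    "name": "311 Service Requests from 2010 to Present",
    "columns": [
        {"id": 585605889, "name": "Unique Key"},
        {"id": 1, "name": "Example Column"},  # made-up entry
    ],
    "tags": ["311", "311 service requests"],
}

# one row per column definition
columns_df = pd.DataFrame(metadata_sample["columns"])
print(columns_df)

# lists of plain strings can simply be joined for display
tags_text = ", ".join(metadata_sample["tags"])
print(tags_text)
```

With the real object, `pd.DataFrame(metadata['columns'])` should work the same way, though the exact set of keys per column is not shown in this notebook.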


In [44]:
# preview columns
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1 entries, 0 to 0
Data columns (total 41 columns):
 #   Column                    Non-Null Count  Dtype 
---  ------                    --------------  ----- 
 0   id                        1 non-null      object
 1   name                      1 non-null      object
 2   assetType                 1 non-null      object
 3   attribution               1 non-null      object
 4   averageRating             1 non-null      int64 
 5   category                  1 non-null      object
 6   createdAt                 1 non-null      int64 
 7   description               1 non-null      object
 8   displayType               1 non-null      object
 9   downloadCount             1 non-null      int64 
 10  hideFromCatalog           1 non-null      bool  
 11  hideFromDataJson          1 non-null      bool  
 12  locked                    1 non-null      bool  
 13  newBackend                1 non-null      bool  
 14  numberOfComments          1 no

In [45]:
# example of the Metadata API
# https://dev.socrata.com/docs/other/metadata#?route=overview

# all datasets on NYC Open Data; note the 'metadata/v1' segment in the URL
url = 'https://data.cityofnewyork.us/api/views/metadata/v1/'
df = pd.read_json(url)

# preview data
df.head()

Unnamed: 0,id,name,attribution,attributionLink,category,createdAt,dataUpdatedAt,dataUri,description,domain,...,hideFromCatalog,hideFromDataJson,license,metadataUpdatedAt,provenance,updatedAt,webUri,approvals,customFields,tags
0,6xyb-j5pk,NYC Address Points (Map),Office of Technology and Innovation (OTI),,City Government,2024-11-26T20:13:49+0000,2024-11-26T18:02:43+0000,https://data.cityofnewyork.us/resource/6xyb-j5pk,Address points were developed to supplement th...,data.cityofnewyork.us,...,False,False,,2024-11-26T20:14:56+0000,OFFICIAL,2024-11-26T20:14:56+0000,https://data.cityofnewyork.us/d/6xyb-j5pk,"[{'reviewedAt': 1732652072, 'reviewedAutomatic...","{'Update': {'Automation': 'No', 'Date Made Pub...",
1,uf93-f8nk,NYC Address Points,Office of Technology and Innovation (OTI),,City Government,2024-11-26T17:25:22+0000,2024-11-26T18:02:43+0000,https://data.cityofnewyork.us/resource/uf93-f8nk,Address points were developed to supplement th...,data.cityofnewyork.us,...,False,False,,2024-11-26T20:13:07+0000,OFFICIAL,2024-11-26T20:13:07+0000,https://data.cityofnewyork.us/d/uf93-f8nk,"[{'reviewedAt': 1732651869, 'reviewedAutomatic...","{'Update': {'Automation': 'No', 'Date Made Pub...",[address point]
2,b7aj-ck5a,NYC Greenhouse Gas Emissions Municipal Inventory,Mayor's Office of Climate and Environmental Ju...,https://climate.cityofnewyork.us/initiatives/n...,Environment,2024-11-20T15:20:09+0000,2024-11-20T15:48:44+0000,https://data.cityofnewyork.us/resource/b7aj-ck5a,The Inventory of New York City Greenhouse Gas ...,data.cityofnewyork.us,...,False,False,,2024-11-20T15:49:27+0000,OFFICIAL,2024-11-20T15:49:49+0000,https://data.cityofnewyork.us/d/b7aj-ck5a,"[{'reviewedAt': 1732117789, 'reviewedAutomatic...","{'Update': {'Automation': 'No', 'Date Made Pub...","[greenhouse gas emissions, greenhouse, fuel, e..."
3,3g6p-4u5s,Building Footprints (Map),Office of Technology and Innovation (OTI),,City Government,2024-11-13T16:42:42+0000,2024-11-26T17:51:19+0000,https://data.cityofnewyork.us/resource/3g6p-4u5s,Shapefile of footprint outlines of buildings i...,data.cityofnewyork.us,...,False,False,,2024-11-13T16:47:48+0000,OFFICIAL,2024-11-13T16:47:48+0000,https://data.cityofnewyork.us/d/3g6p-4u5s,"[{'reviewedAt': 1731516188, 'reviewedAutomatic...","{'Update': {'Automation': 'No', 'Date Made Pub...","[footprints, buildings]"
4,u9wf-3gbt,Building Footprints (P Layer),Office of Technology and Innovation (OTI),,City Government,2024-11-13T16:02:06+0000,2024-11-26T17:33:50+0000,https://data.cityofnewyork.us/resource/u9wf-3gbt,Shapefile of footprint outlines of buildings i...,data.cityofnewyork.us,...,False,False,,2024-11-26T17:32:07+0000,OFFICIAL,2024-11-26T17:32:07+0000,https://data.cityofnewyork.us/d/u9wf-3gbt,"[{'reviewedAt': 1731516092, 'reviewedAutomatic...","{'Update': {'Automation': 'No', 'Date Made Pub...","[building, footprint, footprints]"


In [46]:
# preview columns
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3237 entries, 0 to 3236
Data columns (total 21 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   id                 3237 non-null   object 
 1   name               3237 non-null   object 
 2   attribution        3170 non-null   object 
 3   attributionLink    449 non-null    object 
 4   category           3137 non-null   object 
 5   createdAt          3237 non-null   object 
 6   dataUpdatedAt      3060 non-null   object 
 7   dataUri            3237 non-null   object 
 8   description        3172 non-null   object 
 9   domain             3237 non-null   object 
 10  externalId         0 non-null      float64
 11  hideFromCatalog    3237 non-null   bool   
 12  hideFromDataJson   3237 non-null   bool   
 13  license            79 non-null     object 
 14  metadataUpdatedAt  3237 non-null   object 
 15  provenance         3237 non-null   object 
 16  updatedAt          3237 

In [47]:
# example of Metadata API for 311 dataset
url = 'https://data.cityofnewyork.us/api/views/metadata/v1/erm2-nwe9/'

# fetch the JSON data from the web
response = requests.get(url)

# parse the JSON response
data_dict = response.json()

# convert to a df
df = pd.DataFrame([data_dict])

# preview data
df.head()

Unnamed: 0,id,name,attribution,attributionLink,category,createdAt,dataUpdatedAt,dataUri,description,domain,...,hideFromCatalog,hideFromDataJson,license,metadataUpdatedAt,provenance,updatedAt,webUri,approvals,customFields,tags
0,erm2-nwe9,311 Service Requests from 2010 to Present,311,,Social Services,2011-10-10T05:52:17+0000,2024-12-03T02:35:30+0000,https://data.cityofnewyork.us/resource/erm2-nwe9,<b>NOTE:</b> The 311 dataset is currently show...,data.cityofnewyork.us,...,False,False,,2024-05-28T20:32:16+0000,OFFICIAL,2024-05-28T20:32:16+0000,https://data.cityofnewyork.us/d/erm2-nwe9,"[{'reviewedAt': 1524193398, 'reviewedAutomatic...","{'Update': {'Automation': 'Yes', 'Date Made Pu...","[311, 311 service requests, city government, s..."
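
Unlike the views endpoint, the Metadata API returns timestamps as ISO 8601 strings, which pandas reads in as `object` columns. Here is a minimal sketch of parsing them into timezone-aware datetimes, using the two values shown above (`df_dates` is a hypothetical name):

```python
import pandas as pd

# Timestamp strings copied from the Metadata API output above
record = {
    "id": "erm2-nwe9",
    "createdAt": "2011-10-10T05:52:17+0000",
    "dataUpdatedAt": "2024-12-03T02:35:30+0000",
}
df_dates = pd.DataFrame([record])

# parse the ISO 8601 strings into timezone-aware (UTC) datetimes
for col in ["createdAt", "dataUpdatedAt"]:
    df_dates[col] = pd.to_datetime(df_dates[col], utc=True)

# time elapsed between creation and the latest data update
age = df_dates["dataUpdatedAt"] - df_dates["createdAt"]

print(df_dates.dtypes)
print(age)
```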


In [48]:
# preview columns
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1 entries, 0 to 0
Data columns (total 21 columns):
 #   Column             Non-Null Count  Dtype 
---  ------             --------------  ----- 
 0   id                 1 non-null      object
 1   name               1 non-null      object
 2   attribution        1 non-null      object
 3   attributionLink    0 non-null      object
 4   category           1 non-null      object
 5   createdAt          1 non-null      object
 6   dataUpdatedAt      1 non-null      object
 7   dataUri            1 non-null      object
 8   description        1 non-null      object
 9   domain             1 non-null      object
 10  externalId         0 non-null      object
 11  hideFromCatalog    1 non-null      bool  
 12  hideFromDataJson   1 non-null      bool  
 13  license            0 non-null      object
 14  metadataUpdatedAt  1 non-null      object
 15  provenance         1 non-null      object
 16  updatedAt          1 non-null      object
 17  w

In [49]:
# close connection
client.close()