# Inspecting KOMU's Files

This notebook includes my initial attempts at using the FCC's [Online Public Inspection File (OPIF) APIs](https://publicfiles.fcc.gov/developer/). My goal is to figure out how to access political files for the 2015 election cycle.

These APIs are open to the public (no authentication required) and share a common base URL.

In [1]:
api_url = "https://publicfiles.fcc.gov/api/"

Responses can be formatted as either JSON or XML. I'll use JSON.

## Dependencies


In [2]:
import requests
import requests_cache
from time import sleep

## Set up caching

To limit my requests to the FCC's servers, I'll cache any responses I receive. I'll use [requests-cache's](https://requests-cache.readthedocs.io/en/latest/) default storage backend, which is a local SQLite database, and set cached responses to expire after 12 hours (43,200 seconds).

In [3]:
requests_cache.install_cache(expire_after=43200)
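
Caching avoids repeat requests, but the first pass over many folders still hits the FCC's servers back to back. A minimal sketch of pausing only after responses that actually came over the network, assuming the `from_cache` attribute that requests-cache sets on responses. `polite_get` and its parameters are hypothetical names, not part of either library.

```python
import time


def polite_get(session, url, delay=1.0, sleep=time.sleep, **kwargs):
    """Fetch url via session.get, pausing after uncached responses.

    `session` is anything with a requests-style .get method, such as the
    `requests` module itself once install_cache has been called.
    """
    response = session.get(url, **kwargs)
    # Assumption: requests-cache marks cached responses with from_cache=True.
    # Responses without the attribute are treated as fresh network hits.
    if not getattr(response, "from_cache", False):
        sleep(delay)
    return response
```

In this notebook the call would look like `polite_get(requests, parent_folders_endpoint, params=payload)`.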

## KOMU's Details

For starters, I'm going to focus on KOMU-TV, which is located here at the University of Missouri.

The OPIF Service Data API has a Facility Search endpoint. This endpoint accepts a single keyword parameter.

In [4]:
facility_search_endpoint = api_url + "service/facility/search/{keyword}.json"
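
Because the keyword is substituted directly into the URL path, a keyword containing spaces or slashes could produce a malformed URL. A minimal sketch of escaping it first with the standard library. `facility_search_url` is a hypothetical helper, and whether the API matches percent-encoded keywords as expected is an assumption I haven't verified.

```python
from urllib.parse import quote

API_URL = "https://publicfiles.fcc.gov/api/"
FACILITY_SEARCH = API_URL + "service/facility/search/{keyword}.json"


def facility_search_url(keyword):
    """Return the Facility Search URL with the keyword percent-encoded."""
    # safe='' also escapes '/', which would otherwise add a path segment.
    return FACILITY_SEARCH.format(keyword=quote(keyword, safe=""))
```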

In [5]:
r = requests.get(
    facility_search_endpoint.format(keyword='komu')
)

The results we're looking for are a couple of layers down. They include both counts and lists of the entity types found.

In [6]:
r.json()['results']['globalSearchResults']

{'tvResultsCount': 1,
 'fmResultsCount': 0,
 'amResultsCount': 0,
 'cableResultsCount': 0,
 'cableZipCodeResultsCount': 0,
 'sdarsResultsCount': 0,
 'dbsResultsCount': 0,
 'tvFacilityList': [{'id': '65583',
   'callSign': 'KOMU-TV',
   'service': 'Digital TV',
   'rfChannel': '8',
   'virtualChannel': '8',
   'licenseExpirationDate': '02/01/2022',
   'statusDate': '06/16/2009',
   'status': 'LICENSED',
   'communityCity': 'COLUMBIA',
   'communityState': 'MO',
   'facilityType': 'CDT',
   'frequency': '180.0',
   'activeInd': 'Y',
   'scannedLetterIds': '',
   'partyName': 'THE CURATORS OF THE UNIVERSITY OF MISSOURI',
   'partyAddress1': '1105 CARRIE FRANCKE DRIVE',
   'partyAddress2': '',
   'partyCity': 'COLUMBIA',
   'partyZip1': '65211',
   'partyZip2': '',
   'partyState': 'MO',
   'partyPhone': '(573)882-5768',
   'nielsenDma': 'COLUMBIA-JEFFERSON CITY',
   'networkAfil': 'NBC',
   'band': 'High V',
   'authAppId': '1316807'}],
 'fmFacilityList': [],
 'amFacilityList': [],
 'cabl

In [7]:
komu = r.json()['results']['globalSearchResults']['tvFacilityList'][0]

An important detail we'll need to hold onto is KOMU-TV's entity ID, which the search results store under the `id` key (later responses call it `entity_id`).

In [8]:
komu_id = komu['id']

In [9]:
komu_id

'65583'
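
Indexing `tvFacilityList[0]` works here because the search returned exactly one station. For keywords that match several facilities, a lookup by call sign is safer. A minimal sketch, with `find_facility` as a hypothetical helper and the sample data trimmed from the response shown earlier:

```python
def find_facility(search_results, call_sign):
    """Return the first facility whose callSign matches, or None."""
    for key, value in search_results.items():
        # Facility lists use keys such as 'tvFacilityList' and 'fmFacilityList'.
        if key.endswith("FacilityList") and isinstance(value, list):
            for facility in value:
                if facility.get("callSign") == call_sign:
                    return facility
    return None


# Trimmed copy of the globalSearchResults structure from the search response.
sample_results = {
    "tvResultsCount": 1,
    "tvFacilityList": [
        {"id": "65583", "callSign": "KOMU-TV", "communityState": "MO"}
    ],
    "fmFacilityList": [],
    "amFacilityList": [],
}

komu_sample = find_facility(sample_results, "KOMU-TV")
print(komu_sample["id"])
```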

## KOMU's Folders

Each facility has a set of folders that group together similar files, much like the directory structure on any computer's operating system. To interact with the public inspection files, the FCC provides an OPIF Manager API.

We can get all the folders in KOMU-TV's root directory by calling the `parentFolders` endpoint with the `entityId` and `sourceService` parameters.

In [10]:
parent_folders_endpoint = api_url + 'manager/folder/parentFolders.json'

In [11]:
payload = {'entityId': komu_id, 'sourceService': 'tv'}

In [12]:
r = requests.get(parent_folders_endpoint, params=payload)

Let's take a closer look at KOMU's folders.

In [13]:
for f in r.json()['folders']:
    print(f['folder_name'])
    print(f['entity_folder_id'])
    print('------------')

Applications and Related Materials
5289ec26-db91-3a1d-5513-867712eacce5
------------
Basic Info
d5d74fdb-4f3a-28b4-93e0-5ce82db90757
------------
Childrens TV Programming Reports
1ce1d408-cf73-44c5-bd86-684d4f2b0758
------------
Contour Maps
0dcd08f2-800e-84e9-be74-6da952e01f71
------------
Equal Employment Opportunity Records
b3878ebc-d31b-0eef-11dc-4ebc339f3498
------------
FCC Authorizations
03fd3934-5b1f-6511-1690-657cd398e253
------------
Ownership Reports
41c80621-9a6c-279d-49c8-23beafd16ebe
------------
Political Files
64651c6c-e5ae-a929-b58e-31be2ba0d26b
------------
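
Rather than reading IDs off the printed list, the same response can be turned into a name-to-ID lookup. A minimal sketch, assuming each folder record carries the `folder_name` and `entity_folder_id` keys printed above. `folder_ids_by_name` is a hypothetical helper and the sample holds two of the folders from the output.

```python
def folder_ids_by_name(folders):
    """Map each folder_name to its entity_folder_id."""
    return {f["folder_name"]: f["entity_folder_id"] for f in folders}


# Two records copied from the parentFolders output.
sample_folders = [
    {"folder_name": "Ownership Reports",
     "entity_folder_id": "41c80621-9a6c-279d-49c8-23beafd16ebe"},
    {"folder_name": "Political Files",
     "entity_folder_id": "64651c6c-e5ae-a929-b58e-31be2ba0d26b"},
]

folder_ids = folder_ids_by_name(sample_folders)
print(folder_ids["Political Files"])
```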


We can also just go straight to getting the "Political Files" folder info by calling the `folder/path` endpoint.

In [14]:
folder_path_endpoint = api_url + 'manager/folder/path.json'

In [15]:
payload = {
    'folderPath': 'Political Files',
    'entityId': komu_id,
    'sourceService': 'tv'
}

In [16]:
r = requests.get(folder_path_endpoint, params=payload)

In [17]:
r.json()

{'status': 'success',
 'statusCode': 200,
 'statusMessage': 'Folder found.',
 'folder': [{'entity_folder_id': '64651c6c-e5ae-a929-b58e-31be2ba0d26b',
   'entity_id': '65583',
   'folder_name': 'Political Files',
   'folder_path': 'Political Files',
   'allow_rename_ind': 'N',
   'allow_subfolder_ind': 'N',
   'allow_upload_ind': 'N',
   'allow_delete_ind': 'N',
   'more_public_files_ind': 'N',
   'parent_folder_id': None,
   'file_count': None,
   'create_ts': '2016-02-29T14:51:41-05:00',
   'last_update_ts': '2016-02-29T14:51:41-05:00'}]}

Let's hold onto the id for the political folder.

In [18]:
komu_political_folder_id = r.json()['folder'][0]['entity_folder_id']

In [19]:
komu_political_folder_id

'64651c6c-e5ae-a929-b58e-31be2ba0d26b'

## KOMU's Political Files

We can get the contents of this folder using the `folder` endpoint, which requires both the `folderId` and the `entityId`.

In [20]:
folder_endpoint = api_url + 'manager/folder/id/{folderId}.json'

In [21]:
payload = {'entityId': komu_id}

In [22]:
r = requests.get(
    folder_endpoint.format(folderId=komu_political_folder_id),
    params=payload
)

Every folder response includes both subfolders:

In [23]:
subfolders = r.json()['folder']['subfolders']

And files:

In [24]:
files = r.json()['folder']['files']

Both of these are lists, which might be empty.

In [25]:
len(subfolders)

9

In [26]:
len(files)

0

Let's take a closer look at these subfolders.

In [27]:
for sf in subfolders:
    print(sf['folder_path'])
    print(sf['entity_folder_id'])
    print('file_count: {file_count}'.format(**sf))
    print('-----------------------------------------------')

Political Files/2012
a53428b0-31a8-4e9c-2704-6c74d8d0298b
file_count: 0
-----------------------------------------------
Political Files/2013
351c1159-7157-4def-b654-633265f6b90f
file_count: 0
-----------------------------------------------
Political Files/2014
be0022c5-b604-2769-1fd2-adc62db7d660
file_count: 83
-----------------------------------------------
Political Files/2015
8529294f-b895-ca7a-31a4-b792d30c40c2
file_count: 11
-----------------------------------------------
Political Files/2016
00897497-f06a-b4b4-c1b4-df630e88a3d8
file_count: 373
-----------------------------------------------
Political Files/2017
55aaa1b7-7049-2ffd-85be-381bd27cd00b
file_count: 30
-----------------------------------------------
Political Files/2018
c52ea73c-7e18-d81b-8baa-fcd9f46fff7b
file_count: 249
-----------------------------------------------
Political Files/2019
b662c0a3-0dcb-cbc9-4e61-67144f7c5812
file_count: 2
-----------------------------------------------
Political Files/2020
9e5c8718-ef4

Now let's take a closer look at the contents of one of these folders: `Political Files/2016`.

In [28]:
r = requests.get(
    folder_endpoint.format(folderId="00897497-f06a-b4b4-c1b4-df630e88a3d8"),
    params=payload
)

In [29]:
komu_2016_political_folder = r.json()['folder']

Looks like it doesn't have any files.

In [30]:
len(komu_2016_political_folder['files'])

0

Just more subfolders.

In [31]:
len(komu_2016_political_folder['subfolders'])

5

In [32]:
for sf in komu_2016_political_folder['subfolders']:
    print(sf['folder_path'])
    print(sf['entity_folder_id'])
    print('file_count: {file_count}'.format(**sf))
    print('-----------------------------------------------')

Political Files/2016/Federal
e5ecf909-3252-147d-962d-1773e7344aa0
file_count: 48
-----------------------------------------------
Political Files/2016/Local
2f814da8-e1c6-e981-059f-c270ea4c2835
file_count: 23
-----------------------------------------------
Political Files/2016/Non-Candidate Issue Ads
52a3207f-661d-63e8-3289-3bac17b7ed50
file_count: 120
-----------------------------------------------
Political Files/2016/State
40d16c28-845b-9278-c3c4-5bf715f5a4f8
file_count: 182
-----------------------------------------------
Political Files/2016/Terms and Disclosures
919a0c82-c792-bef6-5285-640458ebc916
file_count: 0
-----------------------------------------------
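
The `Political Files/2016` folder holds no files directly, yet its `file_count` was 373, which suggests the count covers everything nested beneath a folder. A quick check of that reading, using the counts copied from the output above (`total_file_count` is a hypothetical helper):

```python
def total_file_count(subfolders):
    """Sum file_count across subfolder records, treating None as zero."""
    return sum(int(sf["file_count"] or 0) for sf in subfolders)


# file_count values printed for the subfolders of Political Files/2016.
subfolders_2016 = [
    {"folder_path": "Political Files/2016/Federal", "file_count": 48},
    {"folder_path": "Political Files/2016/Local", "file_count": 23},
    {"folder_path": "Political Files/2016/Non-Candidate Issue Ads", "file_count": 120},
    {"folder_path": "Political Files/2016/State", "file_count": 182},
    {"folder_path": "Political Files/2016/Terms and Disclosures", "file_count": 0},
]

print(total_file_count(subfolders_2016))  # 373, matching the parent folder
```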


So now let's look in `Political Files/2016/Federal`.

In [33]:
r = requests.get(
    folder_endpoint.format(folderId="e5ecf909-3252-147d-962d-1773e7344aa0"),
    params=payload
)

In [34]:
komu_2016_federal_political_folder = r.json()['folder']

Again, no files.

In [35]:
len(komu_2016_federal_political_folder['files'])

0

Just more subfolders.

In [36]:
len(komu_2016_federal_political_folder['subfolders'])

3

In [37]:
for sf in komu_2016_federal_political_folder['subfolders']:
    print(sf['folder_path'])
    print(sf['entity_folder_id'])
    print('file_count: {file_count}'.format(**sf))
    print('-----------------------------------------------')

Political Files/2016/Federal/President
119a0bad-c466-acd1-6965-2d07431a827f
file_count: 17
-----------------------------------------------
Political Files/2016/Federal/US House
57933127-b03c-bca2-5ed5-bfaefb339541
file_count: 0
-----------------------------------------------
Political Files/2016/Federal/US Senate
73eb960e-2665-5aa3-95c4-12333931eebd
file_count: 31
-----------------------------------------------


So let's go another level down to `Political Files/2016/Federal/President`.

In [38]:
r = requests.get(
    folder_endpoint.format(folderId="119a0bad-c466-acd1-6965-2d07431a827f"),
    params=payload
)

In [39]:
komu_2016_presidential_folder = r.json()['folder']

Again, no files.

In [40]:
len(komu_2016_presidential_folder['files'])

0

Just more subfolders.

In [41]:
len(komu_2016_presidential_folder['subfolders'])

4

In [42]:
for sf in komu_2016_presidential_folder['subfolders']:
    print(sf['folder_path'])
    print(sf['entity_folder_id'])
    print('-----------------------------------------------')

Political Files/2016/Federal/President/Bernie Sanders
ee229891-bf04-09bd-6ff5-9239fc974441
-----------------------------------------------
Political Files/2016/Federal/President/DONALD TRUMP
9dc7bceb-dd9a-d9b7-8269-c09d0811cda0
-----------------------------------------------
Political Files/2016/Federal/President/HILLARY FOR AMERICA
148f8d5c-af0a-c02e-41a7-2664f7fa3e6e
-----------------------------------------------
Political Files/2016/Federal/President/TED CRUZ 2016
08a1a421-2608-ed95-405e-b8ff310b61be
-----------------------------------------------


Let's see what's in the 2016 Donald Trump folder.

In [43]:
r = requests.get(
    folder_endpoint.format(folderId="9dc7bceb-dd9a-d9b7-8269-c09d0811cda0"),
    params=payload
)

In [44]:
komu_2016_trump_folder = r.json()['folder']

In [45]:
len(komu_2016_trump_folder['files'])

3

In [46]:
len(komu_2016_trump_folder['subfolders'])

0

Three files and no subfolders.
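
Stepping down one folder at a time works for exploring, but collecting every file calls for a recursive walk. A minimal sketch, not a definitive implementation: `walk_files` and `fetch_folder` are hypothetical names, and skipping subfolders whose `file_count` is 0 assumes that count includes nested files.

```python
def walk_files(folder_id, fetch_folder):
    """Yield every file record found at or below folder_id.

    fetch_folder is a callable that takes a folder ID and returns the
    'folder' dict from the folder endpoint's JSON response.
    """
    folder = fetch_folder(folder_id)
    for file_record in folder.get("files") or []:
        yield file_record
    for sub in folder.get("subfolders") or []:
        # Assumption: a file_count of 0 means nothing is nested below.
        # A missing or None count is treated as unknown, so we descend.
        if sub.get("file_count") in (0, "0"):
            continue
        yield from walk_files(sub["entity_folder_id"], fetch_folder)
```

In this notebook, `fetch_folder` could be `lambda fid: requests.get(folder_endpoint.format(folderId=fid), params=payload).json()['folder']`, with the walk starting from `komu_political_folder_id`.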

We can download files using the OPIF file download endpoint. Here's the URL format for this endpoint:

In [47]:
download_url = api_url + 'manager/download/{folder_id}/{file_manager_id}.pdf'

In [48]:
for f in komu_2016_trump_folder['files']:
    print('''-------------------------
file_id: {file_id}
file_name: {file_name}.{file_extension}'''.format(**f)
    )
    print(download_url.format(**f))

-------------------------
file_id: 3576c738-57bc-cabf-6671-4651a4a4e76d
file_name: 38329--1.pdf
https://publicfiles.fcc.gov/api/manager/download/9dc7bceb-dd9a-d9b7-8269-c09d0811cda0/8c68fcc4-4eed-5abf-813e-f87eb123740c.pdf
-------------------------
file_id: 2078cd14-0c72-bf68-6ab3-bcb2edd9aad3
file_name: 38330--1.pdf
https://publicfiles.fcc.gov/api/manager/download/9dc7bceb-dd9a-d9b7-8269-c09d0811cda0/e4f42749-f453-4122-0253-c0afd50c199b.pdf
-------------------------
file_id: c71cb9ce-7643-056d-5b5f-54fd7f2970c1
file_name: NAB FORM.pdf
https://publicfiles.fcc.gov/api/manager/download/9dc7bceb-dd9a-d9b7-8269-c09d0811cda0/cb412313-7757-b626-46bf-8041fc7811ae.pdf
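
To save these files locally, each record needs a download URL and a filename that is safe on disk. A minimal sketch, assuming the record keys used in the cell above (`folder_id`, `file_manager_id`, `file_id`, `file_name`, `file_extension`). `download_target` and its naming scheme are hypothetical, and the URL keeps the hard-coded `.pdf` suffix from the format string above.

```python
import re

DOWNLOAD_URL = (
    "https://publicfiles.fcc.gov/api/"
    "manager/download/{folder_id}/{file_manager_id}.pdf"
)


def download_target(file_record):
    """Return (url, local_filename) for one file record."""
    url = DOWNLOAD_URL.format(**file_record)
    name = "{file_name}.{file_extension}".format(**file_record)
    # Replace spaces and other awkward characters in the uploaded name.
    safe_name = re.sub(r"[^A-Za-z0-9._-]+", "_", name)
    # Prefix with file_id so files with identical names do not collide.
    return url, "{}_{}".format(file_record["file_id"], safe_name)


# Values copied from the NAB form record in the output above.
nab_form = {
    "folder_id": "9dc7bceb-dd9a-d9b7-8269-c09d0811cda0",
    "file_manager_id": "cb412313-7757-b626-46bf-8041fc7811ae",
    "file_id": "c71cb9ce-7643-056d-5b5f-54fd7f2970c1",
    "file_name": "NAB FORM",
    "file_extension": "pdf",
}

url, local_name = download_target(nab_form)
print(local_name)
```

The content itself could then be written with `open(local_name, 'wb').write(requests.get(url).content)`.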


The first two files are contracts between KOMU-TV and Strategic Media Services, on behalf of the advertiser Donald Trump.

The third file is a National Association of Broadcasters (NAB) Political Advertising Agreement Form (PB-18). A blank version of this form is available [here](https://gab.org/wp-content/uploads/2016/06/pb18-form-final-c1.pdf).

## Getting all the service providers in Missouri

We can get all U.S. service providers of a given type from the `facility/getall` endpoint of the OPIF Service Data API.

In [49]:
facility_getall_endpoint = api_url + 'service/{serviceType}/facility/getall.json'

In [50]:
r = requests.get(facility_getall_endpoint.format(serviceType='tv'))

In [51]:
r.json()['message']

'2201 Facilities Returned'

In [52]:
r.json()['results']['facilityList']

[{'id': '-12', 'callSign': 'TV123', 'frequency': '1000.0', 'activeInd': 'N'},
 {'id': '146', 'callSign': 'WXIN', 'frequency': '656.0', 'activeInd': 'Y'},
 {'id': '148', 'callSign': 'KAKW-DT', 'frequency': '210.0', 'activeInd': 'Y'},
 {'id': '414', 'callSign': 'WXLV-TV', 'frequency': '560.0', 'activeInd': 'Y'},
 {'id': '710', 'callSign': 'WGIQ', 'frequency': '566.0', 'activeInd': 'Y'},
 {'id': '714', 'callSign': 'WDIQ', 'frequency': '192.0', 'activeInd': 'Y'},
 {'id': '1151', 'callSign': 'KAZQ', 'frequency': '488.0', 'activeInd': 'Y'},
 {'id': '2506', 'callSign': 'KAPP', 'frequency': '470.0', 'activeInd': 'Y'},
 {'id': '2770', 'callSign': 'KETS', 'frequency': '174.0', 'activeInd': 'Y'},
 {'id': '9629', 'callSign': 'WCCO-TV', 'frequency': '578.0', 'activeInd': 'Y'},
 {'id': '9913', 'callSign': 'WCMW', 'frequency': '506.0', 'activeInd': 'Y'},
 {'id': '9939', 'callSign': 'WOCB-CD', 'frequency': '620.0', 'activeInd': 'Y'},
 {'id': '417', 'callSign': 'WVAH-TV', 'frequency': '530.0', 'activeI

Sadly, the response doesn't include any location info for each station.
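
The Facility Search results shown earlier do include `communityState`, so one workaround is to look up each facility individually and filter on that field. A minimal sketch under stated assumptions: `facilities_in_state` and `lookup_details` are hypothetical names, and skipping records whose `activeInd` is not `'Y'` is my own choice. Against the real API this would mean roughly one request per station, so the cache matters.

```python
def facilities_in_state(facility_list, lookup_details, state):
    """Return detail records for active facilities licensed in `state`.

    lookup_details is a callable that takes a getall record and returns a
    detail dict with a 'communityState' key, or None if nothing matched.
    """
    matches = []
    for facility in facility_list:
        # Assumption: inactive entries (such as the 'TV123' test record)
        # are not worth a lookup.
        if facility.get("activeInd") != "Y":
            continue
        details = lookup_details(facility)
        if details and details.get("communityState") == state:
            matches.append(details)
    return matches
```

Here `lookup_details` might wrap the Facility Search endpoint, searching on each record's `callSign` and matching the result by `id`.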

## Glossary

**OPIF:** Online Public Inspection File

**SDARS Data:** Satellite Digital Audio Radio Service

**DBS Data:** Direct Broadcast Satellite

**frn:** FCC Registration Number

## Questions

1. Does every TV station, radio station, or other service provider follow the same directory structure?
2. Are there other important types of files to collect (besides the contract agreements and the NAB forms)?
3. How similar are the contract agreements uploaded by other stations?