# Prepare NEH grant products data

<b>Author:</b> Jaren Haber, PhD <br/>
<b>Date:</b> September 16, 2023 <br/>
<b>Description:</b> This notebook downloads NEH grant product data from the web, combines the files for the different product types into a single large DataFrame, and saves the result to disk. <br/>

## Initialize

In [1]:
# Import packages
import pandas as pd
from os.path import join

# Import local function(s)
from utils import get_unzip

## Import data from web

In [2]:
# Define URL of zipped file and the local directory to extract it into
data_url = 'https://apps.neh.gov/open/data/NEH_GrantProducts.zip'
fpath = '../data'

# Download zipped file and extract
get_unzip(data_url, fpath)
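
The local `get_unzip` helper is imported from `utils` and is not shown in this notebook. As a rough illustration only, a function with the same call shape could be sketched with the standard library as below. The name `get_unzip_sketch`, its return value, and its behavior are assumptions for illustration, not the actual contents of `utils.py`.

```python
import io
import zipfile
from pathlib import Path
from urllib.request import urlopen


def get_unzip_sketch(url, dest_dir):
    """Download a zip archive from `url` and extract it into `dest_dir`.

    Hypothetical stand-in for the notebook's `get_unzip`; the real helper
    may differ in behavior and return value.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)  # create the target folder if needed

    # Read the whole archive into memory (fine for modestly sized files)
    with urlopen(url) as response:
        payload = response.read()

    # Extract every member and report what the archive contained
    with zipfile.ZipFile(io.BytesIO(payload)) as archive:
        archive.extractall(dest)
        return archive.namelist()
```

Called as `get_unzip_sketch(data_url, fpath)`, this would extract the archive's contents under `../data`.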

## Combine data for all product types

In [3]:
# Define file suffixes for grant product types
product_types = [
    'Collections', 
    'Articles', 
    'BlogPosts', 
    'Books', 
    'BookSections', 
    'Buildings', 
    'Catalogs', 
    'Centers', 
    'ComputerPrograms', 
    'ConferencePresentations', 
    'ConferenceInstituteSeminars', 
    'CourseMaterials', 
    'DatabaseEditions', 
    'Equipment', 
    'Exhibitions', 
    'FilmBroadcasts', 
    'Games', 
    'OpenAccessItems', 
    'Presentations', 
    'RadioBroadcasts', 
    'Reports', 
    'Scripts', 
    'Positions', 
    'WebResources'
]
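
Because the loading step below builds one file name per entry in this list (`NEH_{product}.xml`), a misspelled or missing product type would only surface as an error partway through the loop. A minimal sketch of an up-front check follows. The function name `missing_product_files` is hypothetical and not part of the original notebook.

```python
from pathlib import Path


def missing_product_files(data_dir, product_types):
    """Return the product types whose NEH_<type>.xml file is absent from data_dir."""
    folder = Path(data_dir)
    return [
        product
        for product in product_types
        if not (folder / f"NEH_{product}.xml").is_file()
    ]
```

For example, `missing_product_files('../data/NEH_GrantProducts', product_types)` returns an empty list when every expected file has been extracted.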

In [4]:
# Load each product type and combine into one large DataFrame
dfs = [] # Create empty list for grant product DataFrames

# Loop over product types, tag each row with its type, and collect the DFs
for product in product_types:
    product_df = pd.read_xml(join(fpath, 'NEH_GrantProducts', f'NEH_{product}.xml'))
    product_df['ProductType'] = product
    dfs.append(product_df)

df = pd.concat(dfs) # Combine DFs; columns missing from a product type are filled with NaN

# Inspect data
print("Count of rows, columns:", df.shape)
print(f"These are the {len(df.columns)} column names:")
print(", ".join(df.columns))
print()
df.sample(10).iloc[:,:20] # Show first 20 columns of 10 random rows

Count of rows, columns: (22609, 44)
These are the 44 column names:
ID, ApplicationNumber, Abstract, Address, Director, Name, PrimaryURL, PrimaryURLDescription, SecondaryURL, SecondaryURLDescription, Year, ProductType, AccessModel, Author, Format, PeriodicalTitle, Publisher, Title, BlogTitle, Date, Website, Editor, ISBN, Translator, Type, BookTitle, CatalogType, ProgrammingLanguage_Platform, SourceAvailable, ConferenceName, DateRange, Location, Audience, Description, Curator, Producer, Writer, PublicationType, URL3, URL3Description, URL4, URL4Description, URL5, URL5Description



Unnamed: 0,ID,ApplicationNumber,Abstract,Address,Director,Name,PrimaryURL,PrimaryURLDescription,SecondaryURL,SecondaryURLDescription,Year,ProductType,AccessModel,Author,Format,PeriodicalTitle,Publisher,Title,BlogTitle,Date
1265,6814,FA-28562-89,,,,,https://www.worldcat.org/search?q=9780691047904,WorldCat entry,,,1991.0,Books,,"Grob, Gerald N.",,,Princeton: Princeton University Press,From Asylum to Community: Mental Health Policy...,,
1925,1868,CH-50421-07,This book shows how recent work in cognitive s...,,,,,,,,2002.0,Books,,Lawrence Zbikowski,,,Oxford University Press,"Conceptualizing Music: Cognitive Structure, Th...",,
480,23354,FT-264906-19,"invited lecture in course on ""Mujeres y letras...",,,,http://http://www.uimp.es/agenda-link.html?id ...,webpage of the Universidad Internacional Menén...,,,,ConferenceInstituteSeminars,,"Jaffe, Catherine M.",,,,The Women Writers of the Junta de Damas of the...,,
152,19205,RZ-230579-15,Public presentation in Oxford's Classical Arch...,,,,https://www.classics.ox.ac.uk/sites/default/fi...,,,,,Presentations,,B. D. Wescoat,,,,Shaping and Negotiating Sacred Terrain in the ...,,10/29/2018
210,19628,AKA-270241-20,Learning outcomes and course modules for Bodie...,,,,http://sarahdparrish.squarespace.com/new-page-5,Applied Visual Literacy website page for mater...,,,2020.0,CourseMaterials,,"John Christ, Suzanne Gaulocher",,,,Bodies of Art Course Materials,,
5735,516,CH-50421-07,"THELONIOUS MONK is the critically acclaimed, g...",,,,http://books.simonandschuster.com/Thelonious-M...,Publisher web site,,,2009.0,Books,,Robin D. G. Kelley,,,Simon &amp; Schuster,Thelonius Monk: The Life and Times of an Ameri...,,
3922,8791,FT-51304-03,,,,,https://www.worldcat.org/search?q=9780852556283,WorldCat entry,,,1997.0,Books,,"Scully, Pamela",,,"Portsmouth, NH: Heinemann",Liberating the Family? Gender and British Slav...,,
455,23846,ZRE-283698-22,Announcement on updates of the First American ...,,,,https://filsonhistorical.org/wp-content/upload...,"The Filson newsmagazine Volume 22, Number 4",,,2022.0,Articles,,Patrick Lewis,Journal,The Filson,The Filson Historical Society,National Endowment for the Humanities Project ...,,
1042,23549,ZPP-283625-22,Boston Public Library staff share their tips f...,,,,https://programminglibrarian.org/articles/brea...,Breaking It Down blog on Programming Librarian,,,,BlogPosts,,Hannah Arata,,,,Breaking It Down: Logistics of a Hybrid Program,,2022-06-23
2773,7923,EH-22282-00,,,,,https://www.worldcat.org/search?q=9780852550403,WorldCat entry,,,1988.0,Books,,"Miller, Joseph C.",,,Madison: University of Wisconsin Press,Way of Death: Merchant Capitalism and the Ango...,,
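
Many cells in the output above are empty because each product type has its own set of fields, and `pd.concat` keeps the union of all columns. The toy example below illustrates that behavior with two made-up frames; the data values are invented for illustration. It also passes `ignore_index=True`, an option the notebook does not use, to show how a fresh row index could be obtained instead of the repeated per-file index.

```python
import pandas as pd

# Two toy "product type" tables with partly different columns (made-up values)
articles = pd.DataFrame({
    'ID': [1, 2],
    'Title': ['A', 'B'],
    'PeriodicalTitle': ['J1', 'J2'],
})
books = pd.DataFrame({
    'ID': [3],
    'Title': ['C'],
    'ISBN': ['978-0'],
})

frames = []
for name, frame in {'Articles': articles, 'Books': books}.items():
    frame = frame.copy()          # avoid modifying the originals
    frame['ProductType'] = name   # tag each row with its source table
    frames.append(frame)

# Union of columns; values absent from a source table become NaN
combined = pd.concat(frames, ignore_index=True)
print(combined.shape)
print(list(combined.columns))
```

The combined frame has one column for every field seen in any input, which is why the real data ends up with 44 columns, most of them sparse for any given product type.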


## Save combined data to disk

In [5]:
# Write the combined DataFrame to a single XML file
df.to_xml(join('../data/', 'NEH_GrantProducts_Combined.xml'))