# Scrape Earmarks PDFs From Appropriations Site

Site with all 2022 appropriations requests can be found here: https://appropriations.house.gov/transparency/fiscal-year-2022

This data was collected by the Bipartisan Policy Center's Congress Project. The script was written by Rachel Orey, Senior Policy Analyst at BPC.

In [43]:
import ctypes
from ctypes.util import find_library
find_library("".join(("gsdll", str(ctypes.sizeof(ctypes.c_voidp) * 8), ".dll")))

import camelot
import pandas as pd

def clean(string):
    import re
    if isinstance(string, str):
        string = string.replace("\\n","")    
        string = string.replace("\\xa0","")
        string = string.replace("â€“22B"," ")
        string = string.replace("â€”"," ")
        string = string.replace("'","")
        string = string.replace("\"","")
        string = re.sub('\s+',' ',string)
        string = string.replace('$',"")
        string = string.strip()
        string = string.replace("- ","")
    return(string)

In [2]:
links = [["Agriculture, Rural Development, Food and Drug Administration, and Related Agencies","https://appropriations.house.gov/sites/democrats.appropriations.house.gov/files/AG_CDS_002.pdf"],
    ["Commerce, Justice, Science, and Related Agencies","https://appropriations.house.gov/sites/democrats.appropriations.house.gov/files/CJS_CDS_V6.pdf"],
    ["Defense","https://appropriations.house.gov/sites/democrats.appropriations.house.gov/files/Defense_CDS.pdf"],
    ["Energy and Water Development, and Related Agencies","https://appropriations.house.gov/sites/democrats.appropriations.house.gov/files/EW_CDSV5.pdf"],
    ["Financial Services and General Government","https://appropriations.house.gov/sites/democrats.appropriations.house.gov/files/FSGG_CDS_3-4-327PM.pdf"],
    ["Homeland Security","https://appropriations.house.gov/sites/democrats.appropriations.house.gov/files/HOMELAND_CDS.PDF"],
    ["Interior, Environment, and Related Agencies","https://appropriations.house.gov/sites/democrats.appropriations.house.gov/files/INT_CDS_V4.PDF"],
    ["Labor, Health and Human Services, Education, and Related Agencies","https://appropriations.house.gov/sites/democrats.appropriations.house.gov/files/LHHS_CDS_V3.PDF"],
    ["Military Construction, Veterans Affairs, and Related Agencies","https://appropriations.house.gov/sites/democrats.appropriations.house.gov/files/MilCon_CDS.pdf"],
    ["Transportation, and Housing and Urban Development, and Related Agencies","https://appropriations.house.gov/sites/democrats.appropriations.house.gov/files/THUD_CDS_V5.PDF"]]

links = pd.DataFrame(links,columns=["Category","Link"])

### Prep for Parsing Data

While largely similar, some of the nine pdf tables from the approps website have different columns than the others. The below section seperates the categories according to which columns the "requestors" are in. (Other columns are just read from headers).

In [3]:
#seperate categories based on where the requestors columns are
five = ["Defense"]
fivesix = ["Financial Services and General Government","Interior, Environment, and Related Agencies","Military Construction, Veterans Affairs, and Related Agencies"]
threefour = ["Labor, Health and Human Services, Education, and Related Agencies"]
transp = ["Transportation, and Housing and Urban Development, and Related Agencies"]
cats = fivesix+threefour+transp

### Function to Scrape Data from PDFs

Depending on which category the PDF is, the scraper will collect the data and parse according to which columns are included and in which order.

In [4]:
def getdata(url,category):
    
    abc = camelot.read_pdf(url,pages='all')   #address of file location

    #adjust criteria based on first two columns
    if category != 'Military Construction, Veterans Affairs, and Related Agencies': #All the categories BUT military start with Agency and Account. This string is used to remove the header row from the dataset when it is included in subsequent pages. 
        columnteststrg = 'AgencyAccount'
    else:
        columnteststrg = 'AgencyState' #Military starts with Agency & State, so that string is used to remove header row

    #set index based on where the two Requestor columns are
    if category not in cats:
        indexone = 6
        indextwo = 7
    elif category in fivesix:
        indexone = 5
        indextwo = 6
    elif category in threefour:
        indexone = 3
        indextwo = 4

    if (category not in five) and (category not in transp):

        results = abc[0].df.copy()
        results.columns = results.iloc[0]
        results.columns.values[indexone] = "Requestors: House"
        results.columns.values[indextwo] = "Requestors: Senate"
        results.drop([0,1],inplace=True)

        for table in range(1,len(abc)):
            try:
                if columnteststrg in ''.join(abc[table].df.iloc[0]):
                    res = abc[table].df.copy()
                    res.columns = res.iloc[0]
                    res.columns.values[indexone] = "Requestors: House"
                    res.columns.values[indextwo] = "Requestors: Senate"
                    res.drop([0,1],inplace=True)
                    results = results.append(res)
                else:
                    res = abc[table].df.copy()
                    res.columns = results.columns.values
                    results = results.append(res)
            except:
                print(table)

    elif category in five: #just has one requestor column

        results = abc[0].df.copy()
        results.columns = results.iloc[0]
        results.drop([0],inplace=True)

        for table in range(1,len(abc)):
            if columnteststrg in ''.join(abc[table].df.iloc[0]):
                res = abc[table].df.copy()
                res.columns = res.iloc[0]
                res.drop([0],inplace=True)
                results = results.append(res)
            else:
                res = abc[table].df.copy()
                res.columns = results.columns.values
                results = results.append(res)
                
    elif category in transp: ##transp had variations within the sheet on columns, required special treatment

        for table in range(0,len(abc)):

            try: 
                if columnteststrg in ''.join(abc[table].df.iloc[0]):
                    res = abc[table].df.copy()
                    res.columns = res.iloc[0]
                    res.columns.values[6] = "Requestors: House"
                    res.columns.values[7] = "Requestors: Senate"
                    res.drop([0,1],inplace=True)
                    if table == 0:
                        results = res
                    else:
                        results = results.append(res)
                else:
                    res = abc[table].df.copy()
                    res.columns = results.columns.values
                    results = results.append(res)

            except:
                try: 
                    if columnteststrg in ''.join(abc[table].df.iloc[0]):
                        res = abc[table].df.copy()
                        res.columns = res.iloc[0]
                        res.columns.values[6] = "Requestors: House"
                        res.columns.values[7] = "Requestors: Senate"
                        res.drop([0,1],inplace=True)
                        if table == 0:
                            results = res
                        else:
                            results = results.append(res)
                    else:
                        res = abc[table].df.copy()
                        res.columns = ["Agency","Account","Project","State","Amount","Requestors: House","Requestors: Senate","Origination"]
                        results = results.append(res)
                except:
                    try:
                        if columnteststrg in ''.join(abc[table].df.iloc[0]):
                            res = abc[table].df.copy()
                            res.columns = res.iloc[0]
                            res.columns.values[6] = "Requestors: House"
                            res.columns.values[7] = "Requestors: Senate"
                            res.drop([0,1],inplace=True)
                            if table == 0:
                                results = res
                            else:
                                results = results.append(res)
                        else:
                            res = abc[table].df.copy()
                            res.columns = ["Agency","Account","Project","Recipient","State","Amount","Requestors: House","Requestors: Senate","Origination"]
                            results = results.append(res)
                    except:
                        display(table)
                        display(abc[table].df)

    results.reset_index(inplace=True,drop=True)
    
    try:
        #data cleaning
        results.columns = [clean(c) for c in results.columns]
        for c in results.columns:
            results[c] = results[c].map(lambda x: clean(x))
    
    except:
        print("failure cleaning",category)
        
    #display, return, save results
    results.to_csv("C:\\Users\\rorey\\OneDrive - Bipartisan Policy Center\\Congress\\Earmarks Data\\"+str(category)+".csv")
    display(results)
    return(results)


In [5]:
#for each link in the prefilled table with categories and links to pdf, scrape data with above function
for index,row in links.iterrows():
    results = getdata(row["Link"],row["Category"])

Unnamed: 0,Agency,Account,Project,Recipient,Location,Amount Provided,Requesters: House,Requesters: Senate,Origination
0,Animal and Plant Health Inspection Service,APHIS S&E,Statewide Pest Surveys,Alaska Division of Agriculture,AK,100000,,Murkowski,S
1,Animal and Plant Health Inspection Service,APHIS S&E,Feral Swine Management,Arkansas Department of Agriculture,AR,650000,,Boozman,S
2,Animal and Plant Health Inspection Service,APHIS S&E,Invasive Species Surveillance,Hawaii Department of Land and Natural Resources,HI,600000,,Hirono; Schatz,S
3,Animal and Plant Health Inspection Service,APHIS S&E,Kula Agricultural Fencing,Maui Office of Economic Development,HI,600000,,Schatz,S
4,Animal and Plant Health Inspection Service,APHIS S&E,O’Hare Federal Inspection Station,City of Chicago,IL,250000,,Durbin,S
...,...,...,...,...,...,...,...,...,...
234,Natural Resources Conservation Service,Watershed and Flood Prevention Operations,Mississippi Flood mitigation,Mississippi Watershed Operations,MS,8400000,,Hyde-Smith,S
235,Natural Resources Conservation Service,Watershed and Flood Prevention Operations,East Fork Irrigation Modernization,East Fork Irrigation District,OR,2500000,,Merkley; Wyden,S
236,Natural Resources Conservation Service,Watershed and Flood Prevention Operations,Ochoco Irrigation District Watershed Projects,Ochoco Irrigation District,OR,4875000,,Merkley; Wyden,S
237,Natural Resources Conservation Service,Watershed and Flood Prevention Operations,Wallowa Lake Dam Rehabilitation Project,Wallowa Lake Irrigation District,OR,2000000,,Merkley; Wyden,S


Unnamed: 0,Agency,Account,Recipient,Location,Project,Amount,Requesters: House,Requesters: Senate,Origination
0,DOC,NIST—Construction,Missouri State University,"Springfield, MO",Ozarks Health and Life Science Center,20000000,,Blunt,S
1,DOC,NIST—Construction,University of Maine,"Orono, ME",Green Engineering and Materials Research Facto...,10000000,,Collins,S
2,DOC,NIST—Construction,Burlington Technical Center,"Burlington, VT",Burlington Aviation Technology Center Facility,10000000,,Leahy,S
3,DOC,NIST—Construction,Fort Hays State University,"Hays, KS",Renovation of Forsyth Library,17000000,,Moran,S
4,DOC,NIST—Construction,Kansas State University Salina Aerospace and T...,"Salina, KS",Acquisition and Renovation of Aerospace Simula...,4750000,,Moran,S
...,...,...,...,...,...,...,...,...,...
491,NASA,SSMS,Springfield Museums Corporation,"Springfield, MA",Springfield Science Museum Upgrades,750000,,Markey; Warren,S
492,NASA,SSMS,Atchison Amelia Earhart Foundation,"Atchison, KS",Development of New Programs at the Amelia Earh...,1000000,,Moran,S
493,NASA,SSMS,McAuliffe-Shepard Discovery Center,"Concord, NH",McAuliffe-Shepard Discovery Center Planetarium...,348000,,Shaheen,S
494,NASA,SSMS,University of New Hampshire,"Durham, NH",University of New Hampshire Magnetometer Resea...,501000,,Shaheen,S


Unnamed: 0,Agency,Account,Recipient,Project Name,Amount,Requestor(s),Origination
0,Air Force,"RDTE,AF","Texas A&M University—Central Texas, Killeen, TX",Development of Cybersecurity Methodologies,2990000,Carter (TX),H
1,Air Force,"RDTE,AF","Central New York Defense Alliance, Rome, NY",Skydome: Trusted Smart-X Experimentation Envir...,200000,Tenney,H
2,Army,"RDTE,A","Georgia Southern University, Statesboro, GA",Soldier Athlete Human Performance Optimization,1500000,Carter (GA),H
3,Army,"RDTE,A","Pennington Biomedical Research Center, Baton R...",Center for Excellence in Military Health and P...,3566666,Graves (LA),H
4,Army,"RDTE,A","Coalition for National Trauma Research, San An...",National Trauma Research Repository Data Popul...,1900000,Ruppersberger,H
5,Army,"RDTE,A","APG Centennial Celebration Association, Belcam...",The Discovery Center at Water’s Edge,250000,Ruppersberger,H
6,Defense-Wide,"RDTE,DW","Kansas City Kansas Community College, Kansas C...",Automation Engineering Technology Program,1981000,Davids (KS),H
7,Defense-Wide,"RDTE,DW",National Center for Defense Manufacturing and ...,El Paso Makes Contract Support for El Paso Man...,964000,Escobar,H
8,Defense-Wide,"RDTE,DW","VA Tech University, Blacksburg, VA",Next Generation Explosives and Propellants,1000000,Griffith,H
9,Defense-Wide,"RDTE,DW","American Museum of Natural History, New York, NY",Novel Analytical and Empirical Approaches to t...,1500000,Nadler,H


Unnamed: 0,Agency,Account,Project Name; Recipient,Budget Request,Additional Amount,Total Amount Provided,Requesters: House,Requesters: Senate,Origination
0,Army Corps of Engineers (Civil),Construction,"Acequias Environmental Infrastructure, NM; U.S...",,1500000,1500000,,Heinrich; Luja´ n,S
1,Army Corps of Engineers (Civil),Construction,Beneficial Use of Dredged Material Pilot Progr...,,1775000,1775000,,Feinstein; Padilla,S
2,Army Corps of Engineers (Civil),Construction,"Calaveras County, Section 219, CA; U.S. Army C...",,1000000,1000000,,Feinstein; Padilla,S
3,Army Corps of Engineers (Civil),Construction,"Calumet Region, IN; U.S. Army Corps of Engineers",,10000000,10000000,Mrvan,,H
4,Army Corps of Engineers (Civil),Construction,"Carolina Beach and Vicinity, NC; U.S. Army Cor...",,2000000,2000000,Rouzer,Burr; Tillis,H/S
...,...,...,...,...,...,...,...,...,...
242,Department of Energy,Fossil Energy and Carbon Management,Coal Communities Regional Innovation Cluster; ...,,4000000,4000000,,Manchin,S
243,Department of Energy,Fossil Energy and Carbon Management,Coal Mine Methane Solutions; Community Office ...,,1200000,1200000,,Bennet; Hickenlooper,S
244,Department of Energy,Fossil Energy and Carbon Management,Emergency Backup Generator; Melakatla Indian C...,,540000,540000,,Murkowski,S
245,Department of Energy,Fossil Energy and Carbon Management,Enhanced Outcrop Methane Capture; Southern Ute...,,2500000,2500000,,Bennet; Hickenlooper,S


Unnamed: 0,Agency,Account,Project Name,Recipient,Amount,Requesters: House,Requesters: Senate,Origination
0,General Services Administration,Federal Buildings Fund,South State Street Properties,Everett McKinley Dirksen United States Courtho...,52000000,,Durbin,S
1,General Services Administration,Federal Buildings Fund,Santa Teresa Land Port of Entry Feasibility Study,"New Mexico Border Authority, Santa Teresa, NM",500000,,"Heinrich, Luja´ n",S
2,General Services Administration,Federal Buildings Fund,Dennis DiConcini Land Port of Entry Feasibilit...,General Services Administration,500000,,"Kelly, Sinema",S
3,General Services Administration,Federal Buildings Fund,Chamblee Campus Feasibility Study,General Services Administration,500000,,Warnock,S
4,National Archives and Records Administration,National Historical Publications and Records C...,Wisconsin Historical Society,"Wisconsin Historical Society, Madison, WI",500000,,Baldwin,S
...,...,...,...,...,...,...,...,...
140,Small Business Administration,Salaries and Expenses,Atlantic City Small Business Assistance Initia...,Atlantic City Office of the Business Administr...,800000,Van Drew,,H
141,Small Business Administration,Salaries and Expenses,RGV Small Business Innovation Research and Tec...,"Texas A&M Engineering Experiment Station, Coll...",500000,Vela,,H
142,Small Business Administration,Salaries and Expenses,Small Business Accelerator Program in the Atla...,"Urban League of Greater Atlanta, Inc., Decatur...",150000,Williams (GA),,H
143,Small Business Administration,Salaries and Expenses,Black and Diverse Business Wealth Initiative,"Louisville Metro Government, Louisville, KY",250000,Yarmuth,,H


Unnamed: 0,Agency,Account,Project,Recipient,State,Amount,Requesters: House,Requesters: Senate,Origination
0,FEMA,Emergency Operations Center,Tsunami Shelter for the Alutiiq Tribe of Old H...,Alutiiq Tribe of Old Harbor,,1500000,,Murkowski,S
1,FEMA,Pre-Disaster Mitigation,Lake Lenape Dam Flood Mitigation,Atlantic County Improvement Authority,NJ,4600000,Van Drew,,H
2,FEMA,Emergency Operations Center,Emergency Operations Center,Baker County Sheriff’s Office,OR,2000000,,Wyden,S
3,FEMA,Emergency Operations Center,Beaver County Emergency Operations Center,Beaver County Emergency Services,PA,320000,,Casey,S
4,FEMA,Emergency Operations Center,Benton County Emergency Operations Center,Benton County,OR,1000000,,"Merkley, Wyden",S
...,...,...,...,...,...,...,...,...,...
117,FEMA,Emergency Operations Center,Emergency Operations Center Facility Project,WV Division of Emergency Management,WV,955000,,Capito,S
118,FEMA,Pre-Disaster Mitigation,WV Water Treatment Plant Auxiliary Power Project,WV Division of Emergency Management,WV,708000,,Capito,S
119,FEMA,Emergency Operations Center,Yancey County—EOC,Yancey County,NC,150000,Cawthorn,,H
120,FEMA,Emergency Operations Center,York County emergency operations center and re...,York County Emergency Management Agency,ME,850000,,Collins,S


Unnamed: 0,Agency,Account,State,Project Recipient and Name,Amount,Requesters: House,Requesters: Senate,Origination
0,Bureau of Indian Affairs,Special Initiatives,AK,Alaska Native Justice Center for Alaska Tribal...,1000000,,Murkowski,S
1,Bureau of Indian Affairs,Special Initiatives,AK,Alaska Native Women’s Resource Center for Dome...,250000,,Murkowski,S
2,Bureau of Land Management,Land Acquisition,NM,Rio Grande del Norte National Monument,3000000,,Heinrich,S
3,Environmental Protection Agency,Science and Technology,AK,Kodiak Area Native Association for Kodiak Regi...,50000,,Murkowski,S
4,Environmental Protection Agency,Science and Technology,AK,University of Alaska for Alaska PFAS Remediati...,2000000,,Murkowski,S
...,...,...,...,...,...,...,...,...
603,National Park Service,Land Acquisition,MO,Gateway Arch National Park,2600000,,Blunt,S
604,National Park Service,Land Acquisition,NC,Guilford Courthouse National Military Park,200000,,Burr,S
605,National Park Service,Statutory and Contractual Aid,MD,City of Annapolis for Elktonia and Carr’s Beac...,2000000,,Cardin,S
606,National Park Service,Statutory and Contractual Aid,WV,New River Gorge Regional Development Authority...,1500000,,Manchin,S


Unnamed: 0,Agency,Account,Project Description,Requesters: House,Requesters: Senate,Amount,Origination
0,Department of Labor,Employment and Training Administration [ETA],"AIDS Service Center of Lower Manhattan, Inc. d...","Maloney, Carolyn B.",Schumer,1000000,H
1,Department of Labor,Employment and Training Administration [ETA],"American Indian OIC, MN, for job training prog...",,Smith,350000,S
2,Department of Labor,Employment and Training Administration [ETA],"Anne Arundel County Government, Annapolis, MD ...",Brown,,500000,H
3,Department of Labor,Employment and Training Administration [ETA],"Applied Behavioral Rehabilitation Institute, I...",,"Blumenthal, Murphy",25000,S
4,Department of Labor,Employment and Training Administration [ETA],Arizona Opportunities Industrialization Center...,Gallego,,1200000,H
...,...,...,...,...,...,...,...
1533,Department of Education,Higher Education,"Western New Mexico University, NM, for an outd...",,"Heinrich, Luja´ n",343000,S
1534,Department of Education,Higher Education,"William Jewell College, MO, for technology upg...",,Blunt,5000000,S
1535,Department of Education,Higher Education,"Worcester State University, Worcester, MA for ...",McGovern,,1000000,H
1536,Department of Education,Higher Education,"York College, CUNY, Jamaica, NY for health dis...",Meeks,,2000000,H


Unnamed: 0,Agency,State,Location,Project,Amount,Requesters: House,Requesters: Senate,Origination
0,Army,Alabama,Anniston Army Depot,Welding Facility,25010000,Rogers (AL),,H
1,Army,Alaska,Fort Wainwright,ERDC–CRREL Permafrost Tunnel Research Facility...,5400000,,Murkowski,S
2,Navy,Arizona,MCAS Yuma,Combat Training Tank Complex,29300000,,"Kelly, Sinema",S
3,Navy,California,NB Ventura County,Combat Vehicle Maintenance Facility,48700000,,Feinstein,S
4,Navy,California,NB Coronado,CMV–22B Aircraft Maintenance Hangar,63600000,,Feinstein,S
...,...,...,...,...,...,...,...,...
67,Army NG,Vermont,Ethan Allen AFB,Family Readiness Center: Unspecified Minor Con...,4665000,,"Leahy, Sanders",S
68,Army NG,Virginia,Sandston,Aircraft Maintenance Hangar: Planning and Design,5805000,,Warner,S
69,Air NG,Washington,Camp Murray ANGS,Air Support Operations Group Complex,27000000,,Murray,S
70,Air NG,Wisconsin,Volk Combat Readiness Training Center,Replace Aircraft Maintenance Hangar/Shops: Pla...,2280000,,Baldwin,S


failure cleaning Transportation, and Housing and Urban Development, and Related Agencies


Unnamed: 0,Agency,Account,Project,Recipient,State,Amount,Requesters: House,Requesters: Senate,Origination,Requestor(s)
0,Department of Transportation,"Transportation Planning, Research, and Develop...",EV ferry pilot program,Southeast Conference,AK,"$2,000,000",,Murkowski,S,
1,Department of Transportation,"Transportation Planning, Research, and Develop...",West Santa Ana Branch Transit Corridor,Los Angeles County Metropolitan \nTransport...,CA,1000000,,Feinstein,S,
2,Department of Transportation,"Transportation Planning, Research, and Develop...",America’s Volunteer Driver Center,ITNAmerica,ME,1000000,,Collins,S,
3,Department of Transportation,"Transportation Planning, Research, and Develop...",Sayreville Waterfront Multimodal Transportatio...,Sayreville Economic and Rede-\nvelopment A...,NJ,1316000,,Menendez,S,
4,Department of Transportation,"Transportation Planning, Research, and Develop...",Study to Reestablish Passenger Rail Between Re...,Berks County,PA,750000,"Houlahan, \nMeuser",Casey,H/S,
...,...,...,...,...,...,...,...,...,...,...
1487,Department of Housing and Urban Development,Community Development Fund,Shepherd University East Loop: Environmental R...,Shepherd University,WV,1475000,,Capito,S,
1488,Department of Housing and Urban Development,Community Development Fund,Mountaineer Recovery Village— Phase 1,Semper Liberi,WV,1500000,,Capito,S,
1489,Department of Housing and Urban Development,Community Development Fund,Crites Industrial Park,Hardy County Rural Development \nAuthority,WV,2268000,,"Capito, \nManchin",S,
1490,Department of Housing and Urban Development,Community Development Fund,Mount Hope Facilities upgrade,City of Mount Hope,WV,2393000,,Capito,S,


## Clean Files and Join Individual Category CSV Files into One Dataframe

In [65]:
allcategories = pd.DataFrame()

for index,row in links.iterrows():
    if index == 0:
        res = pd.read_csv("C:\\Users\\rorey\\OneDrive - Bipartisan Policy Center\\Congress\\Earmarks Data\\"+str(row["Category"])+".csv")

        ## clean and resave files
        
        res.columns = [clean(c) for c in res.columns]
        for c in res.columns:
            res[c] = res[c].map(lambda x: clean(x))
            
        try:
            res.drop(["Unnamed: 0"],axis=1,inplace=True)
        except:
            continue

        res.to_csv("C:\\Users\\rorey\\OneDrive - Bipartisan Policy Center\\Congress\\Earmarks Data\\"+str(row["Category"])+".csv",index=False)

        ## rename columns for consistency
        
        res.rename(columns={"Project Name":"Project","Amount Provided":"Amount","Total Amount Provided":"Amount","Project Description":"Project","Project Recipient and Name":"Project"},inplace=True)

        if "Project Name; Recipient" in res.columns:
            res["Project"] = res["Project Name; Recipient"].dropna().map(lambda x: x[:x.find(";")])
            res["Recipient"] = res["Project Name; Recipient"].dropna().map(lambda x: x[x.find(";")+1:])
            res.drop(["Project Name; Recipient"], axis=1,inplace=True)
    
        res["Category"] = [row["Category"]]*len(res["Agency"])
        
        allcategories = res
        
    else: # use first category to create results ("res") dataframe upon which to append others
        res = pd.read_csv("C:\\Users\\rorey\\OneDrive - Bipartisan Policy Center\\Congress\\Earmarks Data\\"+str(row["Category"])+".csv")
       
        res.columns = [clean(c) for c in res.columns]
        for c in res.columns:
            res[c] = res[c].map(lambda x: clean(x))
        try:
            res.drop(["Unnamed: 0"],axis=1,inplace=True)
        except:
            continue

        res.to_csv("C:\\Users\\rorey\\OneDrive - Bipartisan Policy Center\\Congress\\Earmarks Data\\"+str(row["Category"])+".csv",index=False)
        
        res.rename(columns={"Project Name":"Project","Amount Provided":"Amount","Total Amount Provided":"Amount","Project Description":"Project","Project Recipient and Name":"Project"},inplace=True)

    
        if "Project Name; Recipient" in res.columns:
            res["Project"] = res["Project Name; Recipient"].dropna().map(lambda x: x[:x.find(";")])
            res["Recipient"] = res["Project Name; Recipient"].dropna().map(lambda x: x[x.find(";")+1:])
            res.drop(["Project Name; Recipient"], axis=1,inplace=True)
    
        res["Category"] = [row["Category"]]*len(res["Agency"])


        allcategories = pd.concat([allcategories, res], ignore_index=True, sort=False)

try:
    allcategories.drop(["Unnamed: 0"],axis=1,inplace=True)
except:
    continue
    
allcategories = allcategories[['Category','Agency','Account','Project', 'Recipient','Location','State','Budget Request','Additional Amount','Amount','Requestors: House','Requestors: Senate','Requestor(s)','Origination']]

allcategories

Unnamed: 0,Agency,Account,Project,Recipient,Location,Amount,Requesters: House,Requesters: Senate,Origination,Category,Requestor(s),Budget Request,Additional Amount,State
0,Animal and Plant Health Inspection Service,APHIS S&E,Statewide Pest Surveys,Alaska Division of Agriculture,AK,100000,,Murkowski,S,"Agriculture, Rural Development, Food and Drug ...",,,,
1,Animal and Plant Health Inspection Service,APHIS S&E,Feral Swine Management,Arkansas Department of Agriculture,AR,650000,,Boozman,S,"Agriculture, Rural Development, Food and Drug ...",,,,
2,Animal and Plant Health Inspection Service,APHIS S&E,Invasive Species Surveillance,Hawaii Department of Land and Natural Resources,HI,600000,,Hirono; Schatz,S,"Agriculture, Rural Development, Food and Drug ...",,,,
3,Animal and Plant Health Inspection Service,APHIS S&E,Kula Agricultural Fencing,Maui Office of Economic Development,HI,600000,,Schatz,S,"Agriculture, Rural Development, Food and Drug ...",,,,
4,Animal and Plant Health Inspection Service,APHIS S&E,O’Hare Federal Inspection Station,City of Chicago,IL,250000,,Durbin,S,"Agriculture, Rural Development, Food and Drug ...",,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
4969,Department of Housing and Urban Development,Community Development Fund,Shepherd University East Loop: Environmental R...,Shepherd University,,1475000,,Capito,S,"Transportation, and Housing and Urban Developm...",,,,WV
4970,Department of Housing and Urban Development,Community Development Fund,Mountaineer Recovery Village— Phase 1,Semper Liberi,,1500000,,Capito,S,"Transportation, and Housing and Urban Developm...",,,,WV
4971,Department of Housing and Urban Development,Community Development Fund,Crites Industrial Park,Hardy County Rural Development Authority,,2268000,,"Capito, Manchin",S,"Transportation, and Housing and Urban Developm...",,,,WV
4972,Department of Housing and Urban Development,Community Development Fund,Mount Hope Facilities upgrade,City of Mount Hope,,2393000,,Capito,S,"Transportation, and Housing and Urban Developm...",,,,WV


In [71]:
allcategories.to_excel("C:\\Users\\rorey\\OneDrive - Bipartisan Policy Center\\Congress\\Earmarks Data\\AllFullyFundedCDSProjects.xlsx",index=False)