# Who Owns the Large Buildings in Seattle?

## Problem

The GHGE dataset does not include buildings' owners. Scraping the eRealProperty website for building owners has two limitations:

1. The data quality is poor and many buildings don't have an owner listed.
1. Many corporations with multiple properties set up a separate LLC for each building. There is no straightforward way to trace a child corporation to its parent corporation. This obscures the portfolio size of each company.

We will use the Corporations and Charities Filings System from the Secretary of State to figure this out. The basic process is:
    
1. Start with a company name.
1. Find that company's official name in CCFS.
1. Collect the names of that company's principals/governors.
1. Collect all the businesses with those same governors.
1. Have a human review which companies are connected, based on the number of overlapping governors, ID number, address, name, etc.
1. Profit?
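
As a rough illustration of the "overlapping governors" signal used in the review step, the sketch below counts shared governors between every pair of companies. The function name and the input shape (a dict mapping company name to a set of governor names) are assumptions made for this example; they are not part of the notebook's pipeline, and the names in the usage note are made up.

```python
from itertools import combinations

def shared_governors(company_governors):
    """Return {(company_a, company_b): shared_names} for pairs with at least one shared governor.

    company_governors: dict mapping company name -> iterable of governor names (hypothetical shape).
    """
    overlaps = {}
    # Sort so that each pair is always reported in the same (alphabetical) order
    for (a, govs_a), (b, govs_b) in combinations(sorted(company_governors.items()), 2):
        common = set(govs_a) & set(govs_b)
        if common:
            overlaps[(a, b)] = common
    return overlaps
```

For example, `shared_governors({"A LLC": {"P1", "P2"}, "B LLC": {"P2"}, "C LLC": {"P3"}})` reports only the pair `("A LLC", "B LLC")`, sharing `"P2"`. A reviewer could sort pairs by the size of the shared set to look at the most connected companies first.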

The CCFS does not have an official public API. The endpoints we call are defined in the utility functions in each step.

Our most up-to-date list of building owners, with owner names normalized (e.g., "City of Seattle" and "Seattle City" are both normalized to "City of Seattle"), can be found [here](https://github.com/linnealovespie/BPS/tree/dig_into_owners/experiments/worst_offenders#:~:text=updated_owners_2_15_23.csv).

In [1]:
import pandas as pd
import numpy as np
import requests
from fuzzywuzzy import fuzz
import json
import os

## Step 1

Finding the CCFS business records for the building owners we already have.

In [60]:
# Utils for finding principals

search_for_business_url = 'https://cfda.sos.wa.gov/api/BusinessSearch/GetBusinessSearchList'

def get_business_search_payload(business_name):
    return {
        'Type': 'BusinessName',
        'SearchType': 'BusinessName',
        'SearchEntityName': business_name,
        'SortType': 'ASC',
        'SortBy': 'Entity Name',
        'SearchValue': business_name,
        'SearchCriteria': 'Contains',
        'IsSearch': 'true',
        'PageID': 1,
        'PageCount': 25,
    }

def get_business_search_results(business_name):
    r = requests.post(search_for_business_url, get_business_search_payload(business_name))
    return json.loads(r.text)

def extract_search_results(search_term, search_req_response):
    # One row per search result, tagged with the term we searched for
    res_list = [
        [
            search_term,
            res['BusinessName'],
            res['UBINumber'],
            res['BusinessID'],
            res['PrincipalOffice']['PrincipalStreetAddress']['FullAddress'],
        ]
        for res in search_req_response
    ]
    res_df = pd.DataFrame(res_list, columns=['SearchTerm', 'BusinessName', 'UBINumber', 'BusinessId', 'Address'])
    # If a result matches the search term exactly, move it to the top
    exact_match = res_df.index[res_df['BusinessName'] == search_term].tolist()
    if exact_match:
        res_df = pd.concat([res_df.iloc[[exact_match[0]],:], res_df.drop(exact_match[0], axis=0)], axis=0)
    return res_df
    

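# Flag rows as potential matches: rows that share an Address, UBI number, or BusinessId with another row.
# Note: this adds the flag columns to search_results_df in place and returns nothing.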
def determine_search_matches(search_results_df):
    search_results_df['address_match'] = search_results_df.duplicated(subset=['Address'], keep=False) 
    search_results_df['ubi_match'] = search_results_df.duplicated(subset=['UBINumber'], keep=False)
    search_results_df['id_match'] = search_results_df.duplicated(subset=['BusinessId'], keep=False)

def get_business_details(business_id):
    url = 'https://cfda.sos.wa.gov/api/BusinessSearch/BusinessInformation?businessID={business_id}'.format(business_id=business_id)
    r = requests.get(url)
    return json.loads(r.text)
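
`extract_search_results` indexes `res['PrincipalOffice']['PrincipalStreetAddress']['FullAddress']` directly, which raises if any level is missing or `None`. Whether the API ever omits these levels is not confirmed here, so treat the following as a defensive sketch only. `nested_get` is a hypothetical helper, not part of the notebook.

```python
def nested_get(record, keys, default=None):
    """Walk `keys` through nested dicts; return `default` if any level is missing or None."""
    current = record
    for key in keys:
        # Stop as soon as we hit something that is not a dict or lacks a usable value
        if not isinstance(current, dict) or current.get(key) is None:
            return default
        current = current[key]
    return current
```

It could replace the chained lookup with `nested_get(res, ['PrincipalOffice', 'PrincipalStreetAddress', 'FullAddress'])`, leaving the address empty instead of failing the whole search.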

In [3]:
buildings_and_landlords_df = pd.read_csv('../../experiments/worst_offenders/landlords_with_total_energy_use_2_16_23.csv')

In [4]:
buildings_and_landlords_df.head()

Unnamed: 0.1,Unnamed: 0,Owner,BuildingsOwned,TotalSquareFootage,TotalGHGEmissions__metric_tons_,TotalElectricity_kBtu_,TotalSteamUse_kBtu,TotalNaturalGas_kBtu,TotalOtherFuelUse_kBtu,AverageGHGEmissionsIntensity,AverageSiteEUI_kBtu_sf,AverageSourceEUI_kBtu_sf,AverageENERGYSTARScore
0,0,CITY OF SEATTLE,175,17038324,16357,501287744,6826192,247129120,0,1.172674,49.970175,108.376608,72.747899
1,1,UNIVERSITY OF WASHINGTON,43,18445870,39712,679730875,366595771,106707401,0,4.756098,147.265,321.9325,61.818182
2,2,STATE OF WASHINGTON,17,2768746,3614,88426130,0,59208493,0,2.1125,71.1625,134.625,55.375
3,3,KING COUNTY,16,4355059,22535,255701627,16515088,372904431,0,1.75,66.94375,137.29375,68.083333
4,4,OVERLOOK MAGNOLIA APARTMENTS LLC,11,388815,52,9911319,0,0,0,0.127273,25.418182,71.145455,74.545455


In [9]:
owner_search_list = buildings_and_landlords_df['Owner'].unique()

owner_search_list[:5]

array(['CITY OF SEATTLE', 'UNIVERSITY OF WASHINGTON',
       'STATE OF WASHINGTON', 'KING COUNTY',
       'OVERLOOK MAGNOLIA APARTMENTS LLC'], dtype=object)

In [11]:
n = 200
owner_search_chunks = [owner_search_list[i * n:(i + 1) * n] for i in range((len(owner_search_list) + n - 1) // n )]

[array(['CITY OF SEATTLE', 'UNIVERSITY OF WASHINGTON',
        'STATE OF WASHINGTON', 'KING COUNTY',
        'OVERLOOK MAGNOLIA APARTMENTS LLC',
        'CATHOLIC ARCHDIOCESE OF SEATTLE', 'CSHV NWCP SEATTLE LLC',
        'QWEST CORPORATION', 'PUBLIC STORAGE',
        'SELIG HOLDINGS COMPANY LLC', 'ACORN DEVELOPMENT LLC',
        'BREIER SCHEETZ PROPERTIES LLC', 'UNIVERSITY VILLAGE LIMITED PS',
        'SREH 2014 LLC', 'SEATTLE UNIVERSITY', 'ORCAS BUSINESS PARK L.L.C',
        'HARSCH INVESTMENT PROPERTIES II LLC',
        'SEATTLE PACIFIC UNIVERSITY',
        'CENTRAL PUGET SOUND REGIONAL TRANSIT AUTHORITY',
        'MEPT WESTWOOD VILLAGE LLC', 'ABSAROKA HOLDINGS LLC',
        'PLYMOUTH HOUSING GROUP', 'DOWNTOWN EMERGENCY SERVICE CENTER',
        'MERIDIAN ASSOCIATES APARTMENTS L L C', 'ESSEX PORTFOLIO LP',
        'FRED HUTCHISON CANCER RESEARCH CENTER', 'TRIPLE B VENTURES LLC',
        'SPEAR INVESTMENTS LLC', 'SAND POINT COMMUNITY HOUSING ASSOC',
        'SOUND', 'GEORGETOWN VENTURE

In [14]:
def get_potential_company_name_matches(owner_name):
    all_search_results = get_business_search_results(owner_name)
    extracted_results = extract_search_results(owner_name, all_search_results)
    determine_search_matches(extracted_results)
    return extracted_results

In [22]:
get_potential_company_name_matches('MURPHY BALLARD LLC')

Unnamed: 0,SearchTerm,BusinessName,UBINumber,BusinessId,Address,IsMatch
0,MURPHY BALLARD LLC,"MURPHY BALLARD, L.L.C.",602 509 455,860889,"1901 NW MARKET ST, SEATTLE, WA, 98107, UNITED ...",True
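
The exact-match check in `extract_search_results` compares `BusinessName` to the search term character by character, so `MURPHY BALLARD LLC` does not equal `MURPHY BALLARD, L.L.C.` even though they are clearly the same company. One possible way to compare such names is to normalize both sides first. The sketch below is an assumption about what normalization would help, based only on the variants visible in this notebook (`L.L.C.`, `L L C`, `L P`, trailing punctuation); `normalize_company_name` is a hypothetical helper.

```python
import re

def normalize_company_name(name):
    """Uppercase, drop periods and commas, collapse whitespace, and join spaced-out suffixes."""
    name = name.upper()
    name = re.sub(r"[.,]", "", name)           # "L.L.C." -> "LLC", "INC." -> "INC"
    name = re.sub(r"\s+", " ", name).strip()   # collapse repeated whitespace
    name = re.sub(r"\bL L C\b", "LLC", name)   # "L L C" -> "LLC"
    name = re.sub(r"\bL P\b", "LP", name)      # "L P" -> "LP"
    return name
```

Comparing `normalize_company_name(res['BusinessName'])` with `normalize_company_name(search_term)` would then treat these spelling variants as equal. It will not catch abbreviations such as `ASSOC` for `ASSOCIATES`.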


In [32]:
def get_company_list_name_matches(owner_list):
    matches = pd.DataFrame([], columns = ['SearchTerm', 'BusinessName', 'UBINumber', 'BusinessId', 'Address', 'IsMatch'])
    
    for owner in owner_list:
        matches = pd.concat([get_potential_company_name_matches(owner), matches], ignore_index=True)
    
    return matches
    

In [50]:
test_owners = ['BALLARD ASSOC LLC', 'MJWKING LLC', 'MARKET HOLDINGS COMPANY LLC', 'CAR WASH ENTERPRISES INC']

get_company_list_name_matches(test_owners)

Unnamed: 0,SearchTerm,BusinessName,UBINumber,BusinessId,Address,name_match,address_match,ubi_match,id_match,IsMatch
0,CAR WASH ENTERPRISES INC,"CAR WASH ENTERPRISES, INC.",578 073 701,603606,"3977 LEARY WAY NW, SEATTLE, WA, 98107-5041, UN...",True,False,False,False,
1,MARKET HOLDINGS COMPANY LLC,MARKET HOLDINGS COMPANY LLC,603 371 173,829753,"1000 2ND AVE STE 1800, SEATTLE, WA, 98104-3619...",True,False,False,False,
2,MARKET HOLDINGS COMPANY LLC,"BATS GLOBAL MARKETS HOLDINGS, INC.",604 953 022,1584354,,False,True,True,True,
3,MARKET HOLDINGS COMPANY LLC,CBOE SERVICES COMPANY,604 953 022,1584354,"8050 MARSHALL DR STE 120, LENEXA, KS, 66214-15...",False,False,True,True,
4,MARKET HOLDINGS COMPANY LLC,"EVERGREEN MARKET GROUP HOLDINGS, LLC",604 515 369,1318882,"4242 E VALLEY RD, RENTON, WA, 98057-4903, UNIT...",False,False,False,False,
5,MARKET HOLDINGS COMPANY LLC,EVERGREEN MARKET PROPERTY HOLDINGS LLC,604 513 076,1317864,"4242 E VALLEY RD, RENTON, CA, 98057-4903, UNIT...",False,False,False,False,
6,MARKET HOLDINGS COMPANY LLC,"FM MARKET HOLDINGS, LLC",602 751 027,699651,"111 N POST ST STE 300, SPOKANE, WA, 99201-4911...",False,False,False,False,
7,MARKET HOLDINGS COMPANY LLC,"GLOBAL MARKET INSITE INTERNATIONAL HOLDING, INC.",602 498 163,725208,"1100 112TH AVE NE #200, BELLEVUE, WA, 98004, U...",False,False,False,False,
8,MARKET HOLDINGS COMPANY LLC,"INFORMA MARKETS HOLDINGS, INC.",602 280 823,1070757,"1983 MARCUS AVE STE 250, NEW HYDE PARK, NY, 11...",False,False,False,False,
9,MARKET HOLDINGS COMPANY LLC,"INLAND NORTHWEST MARKET HOLDINGS, LLC",603 092 077,770143,"505 W RIVERSIDE AVE STE 500, SPOKANE, WA, 9920...",False,False,False,False,


In [61]:
# Trying this with our first 200 

owner_search_chunk_1 = get_company_list_name_matches(owner_search_chunks[0])

owner_search_chunk_1.head()

Unnamed: 0,SearchTerm,BusinessName,UBINumber,BusinessId,Address,address_match,ubi_match,id_match,IsMatch
0,ASHFORD SEATTLE WATERFRONT L P,ASHFORD SEATTLE WATERFRONT GP LLC,604 268 117,1193505,"14185 DALLAS PKWY STE 1100, DALLAS, TX, 75254-...",True,False,False,
1,ASHFORD SEATTLE WATERFRONT L P,ASHFORD SEATTLE WATERFRONT LEASING LLC,604 417 033,1268461,"14185 DALLAS PKWY STE 1100, DALLAS, TX, 75254-...",True,False,False,
2,ASHFORD SEATTLE WATERFRONT L P,ASHFORD SEATTLE WATERFRONT LP,602 712 212,549855,"14185 DALLAS PKWY STE 1100, DALLAS, TX, 75254,...",False,False,False,
3,ASHFORD SEATTLE WATERFRONT L P,ASHFORD TRS SEATTLE WATERFRONT LLC,604 268 118,1193506,"14185 DALLAS PKWY STE 1100, DALLAS, TX, 75254-...",True,False,False,
4,CRE WINSTON LLC,"CRE WINSTON, LLC",603 154 997,124091,"2930 WESTLAKE AVE N #100, SEATTLE, WA, 98109, ...",False,False,False,


In [63]:
owner_search_chunk_1.to_csv('owner_search_chunk_1.csv')

In [64]:
# 2nd chunk

owner_search_chunk_2 = get_company_list_name_matches(owner_search_chunks[1])

owner_search_chunk_2.head()

owner_search_chunk_2.to_csv('owner_search_chunk_2.csv')

In [66]:
# 3rd chunk
owner_search_chunk_3 = get_company_list_name_matches(owner_search_chunks[2])

owner_search_chunk_3.to_csv('owner_search_chunk_3.csv')

In [67]:
# 4th chunk

owner_search_chunk_4 = get_company_list_name_matches(owner_search_chunks[3])

owner_search_chunk_4.to_csv('owner_search_chunk_4.csv')

In [68]:
# 5th chunk

owner_search_chunk_5 = get_company_list_name_matches(owner_search_chunks[4])

owner_search_chunk_5.to_csv('owner_search_chunk_5.csv')

In [69]:
# 6th chunk

owner_search_chunk_6 = get_company_list_name_matches(owner_search_chunks[5])

owner_search_chunk_6.to_csv('owner_search_chunk_6.csv')

In [70]:
# 7th chunk

owner_search_chunk_7 = get_company_list_name_matches(owner_search_chunks[6])

owner_search_chunk_7.to_csv('owner_search_chunk_7.csv')

In [71]:
# 8th chunk

owner_search_chunk_8 = get_company_list_name_matches(owner_search_chunks[7])

owner_search_chunk_8.to_csv('owner_search_chunk_8.csv')

In [72]:
# 9th chunk

owner_search_chunk_9 = get_company_list_name_matches(owner_search_chunks[8])

owner_search_chunk_9.to_csv('owner_search_chunk_9.csv')

In [73]:
# 10th chunk

owner_search_chunk_10 = get_company_list_name_matches(owner_search_chunks[9])

owner_search_chunk_10.to_csv('owner_search_chunk_10.csv')

In [77]:
# 11th chunk

owner_search_chunk_11 = get_company_list_name_matches(owner_search_chunks[10])

owner_search_chunk_11.to_csv('owner_search_chunk_11.csv')

KeyboardInterrupt: 

In [79]:
all_building_owner_chunks = pd.concat([
    owner_search_chunk_1,
    owner_search_chunk_2,
    owner_search_chunk_3,
    owner_search_chunk_4,
    owner_search_chunk_5,
    owner_search_chunk_6,
    owner_search_chunk_7,
    owner_search_chunk_8,
    owner_search_chunk_9,
    owner_search_chunk_10,
    owner_search_chunk_11
])

In [86]:
all_building_owner_chunks.to_csv('../chunks.csv')
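
The chunk cells above repeat the same two lines eleven times, and one run ended in a `KeyboardInterrupt`. A loop that writes one CSV per chunk and skips chunks whose file already exists would make the run resumable. This is only a sketch: `run_in_chunks` and its parameters are hypothetical, and it assumes the fetch function returns an object with a `to_csv(path)` method, as a pandas DataFrame does.

```python
import os

def run_in_chunks(chunks, fetch_chunk, out_dir, prefix="owner_search_chunk"):
    """Fetch each chunk and save it to `<out_dir>/<prefix>_<i>.csv`, skipping files that already exist.

    fetch_chunk(chunk) must return an object with a .to_csv(path) method (e.g. a DataFrame).
    Returns the list of paths written during this call.
    """
    written = []
    for i, chunk in enumerate(chunks, start=1):
        path = os.path.join(out_dir, f"{prefix}_{i}.csv")
        if os.path.exists(path):
            continue  # already fetched in an earlier (possibly interrupted) run
        fetch_chunk(chunk).to_csv(path)
        written.append(path)
    return written
```

In this notebook the call would look like `run_in_chunks(owner_search_chunks, get_company_list_name_matches, '.')`. The saved files can then be read back with `pd.read_csv` and combined with `pd.concat`.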

## (Optional) Step 1b

You can ask annotators to check the possible matches found by the scraping script.

Proposed categories (hat tip Alice): Match, possible match (will be reviewed by another annotator), not a match

Then you can filter out the "not a match" companies and use the resulting list as input for Step 2.
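
A minimal sketch of that filter, assuming the reviewed rows are loaded as dicts (for example with `csv.DictReader`) and that annotators record their label in a column called `Annotation`. Both the column name and the helper name are hypothetical.

```python
def keep_for_step_2(rows, label_column="Annotation"):
    """Drop rows labeled 'not a match'; keep matches, possible matches, and unlabeled rows."""
    kept = []
    for row in rows:
        label = (row.get(label_column) or "").strip().lower()
        if label != "not a match":
            kept.append(row)
    return kept
```

With a DataFrame, the equivalent would be a boolean filter on the same column.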

## Step 2

Fetch the principals for each company found in Step 1.

In [122]:
def get_business_details(business_id):
    url = 'https://cfda.sos.wa.gov/api/BusinessSearch/BusinessInformation?businessID={business_id}'.format(business_id=business_id)
    r = requests.get(url)
    return json.loads(r.text)

def extract_principals(business_res, business_id):
    agent = business_res['Agent']['EntityName']
    rows = [[
        # name of company?
        business_id,
        agent,
        'Entity' if principal['TypeID'] == 'E' else 'Individual',
        principal['PrincipalID'],
        principal['Name'] if principal['TypeID'] == 'E' else principal['FirstName'] + ' ' + principal['LastName']
    ] for principal in business_res['PrincipalsList']]
    return pd.DataFrame(rows, columns=['BusinessId', 'Agent', 'EntityType', 'PrincipalID', 'PrincipalName'])

def get_companies_principals(business_names_df):
    '''
    Takes a DF of companies with BusinessId and returns a DF of each company's principals, 
    with one row for each principal.
    '''
    principals = pd.DataFrame([], columns=['BusinessId', 'Agent', 'EntityType', 'PrincipalID', 'PrincipalName'])
    for business in business_names_df['BusinessId']:
        business_res = get_business_details(business)
        principals = pd.concat([extract_principals(business_res, business), principals], ignore_index=True)
    
    merged_principals = pd.merge(business_names_df, principals, on='BusinessId', how='left')
    
    return merged_principals

In [123]:
#create a table with principals and the business id, then left join on existing table to get business info + principal

principals_search_chunk_1 = get_companies_principals(owner_search_chunk_1)

In [124]:
principals_search_chunk_1.head()

Unnamed: 0,SearchTerm,BusinessName,UBINumber,BusinessId,Address,address_match,ubi_match,id_match,IsMatch,Agent,EntityType,PrincipalID,PrincipalName
0,ASHFORD SEATTLE WATERFRONT L P,ASHFORD SEATTLE WATERFRONT GP LLC,604 268 117,1193505,"14185 DALLAS PKWY STE 1100, DALLAS, TX, 75254-...",True,False,False,,CORPORATION SERVICE COMPANY,Entity,2826137,ASHFORD CHICAGO SENIOR MEZZ LLC
1,ASHFORD SEATTLE WATERFRONT L P,ASHFORD SEATTLE WATERFRONT GP LLC,604 268 117,1193505,"14185 DALLAS PKWY STE 1100, DALLAS, TX, 75254-...",True,False,False,,CORPORATION SERVICE COMPANY,Individual,3132878,J. ROBINSON HAYS III
2,ASHFORD SEATTLE WATERFRONT L P,ASHFORD SEATTLE WATERFRONT GP LLC,604 268 117,1193505,"14185 DALLAS PKWY STE 1100, DALLAS, TX, 75254-...",True,False,False,,CORPORATION SERVICE COMPANY,Individual,3132879,ROBERT G. HAIMAN
3,ASHFORD SEATTLE WATERFRONT L P,ASHFORD SEATTLE WATERFRONT LEASING LLC,604 417 033,1268461,"14185 DALLAS PKWY STE 1100, DALLAS, TX, 75254-...",True,False,False,,CORPORATION SERVICE COMPANY,Individual,2820892,ROBERT G HAIMAN
4,ASHFORD SEATTLE WATERFRONT L P,ASHFORD SEATTLE WATERFRONT LEASING LLC,604 417 033,1268461,"14185 DALLAS PKWY STE 1100, DALLAS, TX, 75254-...",True,False,False,,CORPORATION SERVICE COMPANY,Individual,3139500,J. ROBINSON HAYS III


In [128]:
# keep getting an error if creating a csv in this directory, so using a redundant file path
principals_search_chunk_1.to_csv('../building_owners/principals/principals_search_chunk_1.csv')

In [129]:
principals_search_chunk_2 = get_companies_principals(owner_search_chunk_2)
principals_search_chunk_2.to_csv('../building_owners/principals/principals_search_chunk_2.csv')

In [130]:
principals_search_chunk_3 = get_companies_principals(owner_search_chunk_3)
principals_search_chunk_3.to_csv('../building_owners/principals/principals_search_chunk_3.csv')

In [131]:
principals_search_chunk_4 = get_companies_principals(owner_search_chunk_4)
principals_search_chunk_4.to_csv('../building_owners/principals/principals_search_chunk_4.csv')

In [132]:
principals_search_chunk_5 = get_companies_principals(owner_search_chunk_5)
principals_search_chunk_5.to_csv('../building_owners/principals/principals_search_chunk_5.csv')

In [133]:
principals_search_chunk_6 = get_companies_principals(owner_search_chunk_6)
principals_search_chunk_6.to_csv('../building_owners/principals/principals_search_chunk_6.csv')

In [134]:
principals_search_chunk_7 = get_companies_principals(owner_search_chunk_7)
principals_search_chunk_7.to_csv('../building_owners/principals/principals_search_chunk_7.csv')

In [135]:
principals_search_chunk_8 = get_companies_principals(owner_search_chunk_8)
principals_search_chunk_8.to_csv('../building_owners/principals/principals_search_chunk_8.csv')

In [136]:
principals_search_chunk_9 = get_companies_principals(owner_search_chunk_9)
principals_search_chunk_9.to_csv('../building_owners/principals/principals_search_chunk_9.csv')

In [137]:
principals_search_chunk_10 = get_companies_principals(owner_search_chunk_10)
principals_search_chunk_10.to_csv('../building_owners/principals/principals_search_chunk_10.csv')

In [138]:
principals_search_chunk_11 = get_companies_principals(owner_search_chunk_11)
principals_search_chunk_11.to_csv('../building_owners/principals/principals_search_chunk_11.csv')

Now we need to find the companies these people are principals for.

In outline:

1. Search for the person using the advanced search.
1. Go through all the search results, hit the API for each listed business, and look at the principals listed. If one of them matches the person we're looking for, download the relevant info. Otherwise, skip. (NB: the search results don't show the principals' names in either the UI or the API response, so you have to look at the full business listing.)


## Step 3

Find every company associated with the governors found in Step 2.

This is slightly convoluted because of the API. The process is:

a) Search for the principal's name using the advanced search API. 

b) Get a paginated list of results. This will *not* include the principal's name because... reasons?

c) For each result, send another request to the business information endpoint to fetch the business details.

d) If the company's principals include the original principal we were looking for, save the business' information.
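
The check in (d) is sensitive to spelling: the Step 2 output lists the same person as both `ROBERT G. HAIMAN` and `ROBERT G HAIMAN`, under different `PrincipalID`s, so an exact string comparison would miss one of them. The notebook imports `fuzz` from fuzzywuzzy; the sketch below uses the standard library's `difflib` instead so it runs without extra dependencies. The helper name and the 0.9 threshold are assumptions and would need tuning against reviewed data.

```python
from difflib import SequenceMatcher

def is_probably_same_principal(name_a, name_b, threshold=0.9):
    """Return True when two principal names are nearly identical after basic cleanup."""
    # Uppercase and collapse whitespace so casing and spacing do not affect the score
    a = " ".join(name_a.upper().split())
    b = " ".join(name_b.upper().split())
    return SequenceMatcher(None, a, b).ratio() >= threshold
```

A similarity threshold can also produce false positives for people with similar names, which is one more reason to keep the human review step.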

In [141]:
def get_governor_payload(governor_name, page_num):
    return "Type=Principal&BusinessStatusID=0&SearchEntityName=&SearchType=&BusinessTypeID=0&AgentName=&PrincipalName={governor_name}&StartDateOfIncorporation=&EndDateOfIncorporation=&ExpirationDate=&IsSearch=true&IsShowAdvanceSearch=true&&&AgentAddress%5BIsAddressSame%5D=false&AgentAddress%5BIsValidAddress%5D=false&AgentAddress%5BisUserNonCommercialRegisteredAgent%5D=false&AgentAddress%5BIsInvalidState%5D=false&AgentAddress%5BbaseEntity%5D%5BFilerID%5D=0&AgentAddress%5BbaseEntity%5D%5BUserID%5D=0&AgentAddress%5BbaseEntity%5D%5BCreatedBy%5D=0&&AgentAddress%5BbaseEntity%5D%5BModifiedBy%5D=0&&AgentAddress%5BFullAddress%5D=%2C%20WA%2C%20USA&AgentAddress%5BID%5D=0&&&&AgentAddress%5BState%5D=WA&&AgentAddress%5BCountry%5D=USA&&&&&&&&PrincipalAddress%5BIsAddressSame%5D=false&PrincipalAddress%5BIsValidAddress%5D=false&PrincipalAddress%5BisUserNonCommercialRegisteredAgent%5D=false&PrincipalAddress%5BIsInvalidState%5D=false&PrincipalAddress%5BbaseEntity%5D%5BFilerID%5D=0&PrincipalAddress%5BbaseEntity%5D%5BUserID%5D=0&PrincipalAddress%5BbaseEntity%5D%5BCreatedBy%5D=0&&PrincipalAddress%5BbaseEntity%5D%5BModifiedBy%5D=0&&PrincipalAddress%5BFullAddress%5D=%2C%20WA%2C%20USA&PrincipalAddress%5BID%5D=0&&&&PrincipalAddress%5BState%5D=&&PrincipalAddress%5BCountry%5D=USA&&&&&&IsHostHomeSearch=&IsPublicBenefitNonProfitSearch=&IsCharitableNonProfitSearch=&IsGrossRevenueNonProfitSearch=&IsHasMembersSearch=&IsHasFEINSearch=&NonProfit%5BIsNonProfitEnabled%5D=false&NonProfit%5BchkSearchByIsHostHome%5D=false&NonProfit%5BchkSearchByIsPublicBenefitNonProfit%5D=false&NonProfit%5BchkSearchByIsCharitableNonProfit%5D=false&NonProfit%5BchkSearchByIsGrossRevenueNonProfit%5D=false&NonProfit%5BchkSearchByIsHasMembers%5D=false&NonProfit%5BchkSearchByIsHasFEIN%5D=false&NonProfit%5BFEINNoSearch%5D=&NonProfit%5BchkIsHostHome%5D%5Bnone%5D=false&NonProfit%5BchkIsHostHome%5D%5Byes%5D=false&NonProfit%5BchkIsHostHome%5D%5Bno%5D=false&NonProfit%5BchkIsPublicBenefitNonProfit%5D%5Bnone%5D=false&NonProfit%5BchkIsPublicBenefitNonProfit%5D%5Byes%5D=false&NonProfit%5BchkIsPublicBenefitNonProfit%5D%5Bno%5D=false&NonProfit%5BchkIsCharitableNonProfit%5D%5Bnone%5D=false&NonProfit%5BchkIsCharitableNonProfit%5D%5Byes%5D=false&NonProfit%5BchkIsCharitableNonProfit%5D%5Bno%5D=false&NonProfit%5BchkIsGrossRevenueNonProfit%5D%5Bnone%5D=false&NonProfit%5BchkIsGrossRevenueNonProfit%5D%5Byes%5D=false&NonProfit%5BchkIsGrossRevenueNonProfit%5D%5Bno%5D=false&NonProfit%5BchkIsGrossRevenueNonProfit%5D%5Bover500k%5D=false&NonProfit%5BchkIsGrossRevenueNonProfit%5D%5Bunder500k%5D=false&NonProfit%5BchkIsHasMembers%5D%5Bnone%5D=false&NonProfit%5BchkIsHasMembers%5D%5Byes%5D=false&NonProfit%5BchkIsHasMembers%5D%5Bno%5D=false&NonProfit%5BchkIsHasFEIN%5D%5Byes%5D=false&NonProfit%5BchkIsHasFEIN%5D%5Bno%5D=false&PageID={page_num}&PageCount=100".format(governor_name=governor_name, page_num=page_num)

governor_headers = {
    'Accept': 'application/json, text/plain, */*',
    'Accept-Encoding': 'gzip, deflate, br',
    'Accept-Language': 'en-US,en;q=0.9',
    'Connection': 'keep-alive',
    # Content-Length is left out: requests computes it from the payload,
    # whose length changes with the governor's name
    'Content-Type': 'application/x-www-form-urlencoded; charset=UTF-8',
    'DNT': '1',
    'Host': 'cfda.sos.wa.gov',
    'Origin': 'https://ccfs.sos.wa.gov',
    'Referer': 'https://ccfs.sos.wa.gov/',
    'Sec-Fetch-Dest': 'empty',
    'Sec-Fetch-Mode': 'cors',
    'Sec-Fetch-Site': 'same-site',
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/111.0.0.0 Safari/537.36',
    'sec-ch-ua': '"Google Chrome";v="111", "Not(A:Brand";v="8", "Chromium";v="111"',
    'sec-ch-ua-mobile': '?0',
    'sec-ch-ua-platform': "macOS"
}

def get_governor_results_json(governor_name, page_num):
    r = requests.post('https://cfda.sos.wa.gov/api/BusinessSearch/GetAdvanceBusinessSearchList', data=get_governor_payload(governor_name, page_num), headers=governor_headers)
    return json.loads(r.text)

def get_all_governor_search_results(governor_name):
    page_num = 1
    page_size = 100  # must match PageCount in get_governor_payload
    res_length = page_size
    search_results = []
    
    # A full page means there may be more results; a short page is the last one
    while res_length == page_size:
        res = get_governor_results_json(governor_name, page_num)
        search_results.extend(res)  # flatten all pages into one list of results
        page_num += 1
        res_length = len(res)
    
    return search_results

def get_governors_from_all_results_pages(governor_name):
    search_results = get_all_governor_search_results(governor_name)
    # Assumes the advanced search uses the same 'BusinessID' key as the basic search in Step 1
    business_ids = [res['BusinessID'] for res in search_results]
    
    # should include company name, see below
    principals = pd.DataFrame([], columns=['BusinessId', 'Agent', 'EntityType', 'PrincipalID', 'PrincipalName'])
    
    for business_id in business_ids:
        business_json = get_business_details(business_id)
        principals_df = extract_principals(business_json, business_id) # we should add the company name--write a variant of extract_principals
        # keep this business only if the principal we searched for is listed (exact name match)
        if (principals_df['PrincipalName'] == governor_name).any():
            principals = pd.concat([principals_df, principals], ignore_index=True)
    
    return principals
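
`get_governor_payload` pastes the governor's name into the form-encoded body as-is. Names containing spaces, ampersands, or other reserved characters may be misread by the server unless they are percent-encoded first. Whether the CCFS endpoint tolerates raw spaces has not been checked here, so the following is a precautionary sketch. `encode_form_value` is a hypothetical helper built on the standard library's `urllib.parse.quote`.

```python
from urllib.parse import quote

def encode_form_value(value):
    """Percent-encode a value for use inside an application/x-www-form-urlencoded body.

    Spaces become %20, matching the style of the existing payload (e.g. '%2C%20WA%2C%20USA').
    """
    return quote(value, safe="")
```

The encoded name would be passed to the payload builder in place of the raw one, for example `get_governor_payload(encode_form_value(governor_name), page_num)`.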

In [None]:
NB: I have not tested these functions. You will have to:
- add the business name to the results returned by get_governors_from_all_results_pages
- test the methods to make sure they work

You should be able to run `get_governors_from_all_results_pages` for each row of the CSV generated in Step 2 to get a full list of principals.

Then you can have humans review the outcomes.

If you want to find out what a particular person is involved in, you can run this process starting from Step 3.