### Linking Cities with PUMA Codes

(A PUMA, or Public Use Microdata Area, is a US Census geographic unit. The file created in this notebook will be used to link cities to Census demographics in NB4.)

---

In [1]:
import pandas as pd

In [2]:
rent_and_activity_data = pd.read_csv('rent_and_activity_data.csv')  # Only includes So Cal cities in LA & Orange counties (from NB1)

In [3]:
len(rent_and_activity_data)

113

### Examining PUMA Data

In [4]:
# Source: https://usa.ipums.org/usa/volii/cpuma0010.shtml
# PUMA description: https://usa.ipums.org/usa-action/variables/PUMA#description_section
pumas_df = pd.read_csv('PUMA2000_PUMA2010_crosswalk.csv')

In [5]:
pumas_df.head(1)

Unnamed: 0,State00,PUMA00,GEOID00,GISJOIN00,State10,PUMA10,GEOID10,GISJOIN10,State10_Name,PUMA10_Name,...,PUMA00_Pop10,PUMA10_Pop10,Part_Pop10,pPUMA00_Pop10,pPUMA10_Pop10,PUMA00_Land,PUMA10_Land,Part_Land,pPUMA00_Land,pPUMA10_Land
0,1,100,100100,G01000100,1,100,100100,G01000100,Alabama,"Lauderdale, Colbert, Franklin & Marion (Northe...",...,147137,186695,147137,100.0,78.81,3264203919,5400949424,3264203919,100.0,60.44


In [6]:
pumas_df.columns

Index(['State00', 'PUMA00', 'GEOID00', 'GISJOIN00', 'State10', 'PUMA10',
       'GEOID10', 'GISJOIN10', 'State10_Name', 'PUMA10_Name', 'CPUMA00',
       'CPUMA10', 'PUMA00_Pop00', 'PUMA10_Pop00', 'Part_Pop00',
       'pPUMA00_Pop00', 'pPUMA10_Pop00', 'PUMA00_Pop10', 'PUMA10_Pop10',
       'Part_Pop10', 'pPUMA00_Pop10', 'pPUMA10_Pop10', 'PUMA00_Land',
       'PUMA10_Land', 'Part_Land', 'pPUMA00_Land', 'pPUMA10_Land'],
      dtype='object')

In [7]:
# For the four-digit LA & OC codes (e.g. 3701, 5901), the first two digits identify the county (used to filter the data below)
pumas_df['County'] = pumas_df.PUMA10.astype(str).str[:2].astype(int)
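The string slice is only reliable when every code has four digits, as the LA and Orange County codes do. A minimal sketch of an alternative follows, assuming California 2010 PUMA codes are built as the county FIPS code followed by a two-digit sequence number (an assumption, not something stated in the crosswalk file). The sample codes and the `county_slice` / `county_div` column names are illustrative only.

```python
import pandas as pd

# Illustrative PUMA10 codes stored as integers (leading zeros already dropped)
sample = pd.DataFrame({'PUMA10': [3701, 5901, 11101, 101]})

# String slicing: takes the first two characters regardless of code length
sample['county_slice'] = sample.PUMA10.astype(str).str[:2].astype(int)

# Integer division: drops the last two digits, whatever the code length
sample['county_div'] = sample.PUMA10 // 100

print(sample)
```

Both approaches agree for codes such as 3701 and 5901, so the LA & OC filter is unaffected; they differ only for three- and five-digit codes.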

In [8]:
puma_cols = ['State10', 'State10_Name', 'PUMA10', 'County', 'PUMA10_Name', 'PUMA10_Pop10', 'PUMA10_Land']

In [9]:
# Including only LA & OC PUMAs (a single .loc call avoids chained indexing)
pumas_df = pumas_df.loc[(pumas_df.State10 == 6) & pumas_df.County.isin([37, 59]), puma_cols].sort_values('PUMA10')

In [10]:
# To reveal all data in each cell (instead of cutting off at default max character length)
# Note: None replaces the deprecated -1 value in pandas 1.0 and later
pd.set_option('display.max_colwidth', None)

In [11]:
pumas_df.head()

Unnamed: 0,State10,State10_Name,PUMA10,County,PUMA10_Name,PUMA10_Pop10,PUMA10_Land
588,6,California,3701,37,Los Angeles County (North/Unincorporated)--Castaic,139801,5446091332
855,6,California,3701,37,Los Angeles County (North/Unincorporated)--Castaic,139801,5446091332
690,6,California,3701,37,Los Angeles County (North/Unincorporated)--Castaic,139801,5446091332
993,6,California,3701,37,Los Angeles County (North/Unincorporated)--Castaic,139801,5446091332
999,6,California,3701,37,Los Angeles County (North/Unincorporated)--Castaic,139801,5446091332


In [12]:
pumas_df = pumas_df.drop_duplicates().reset_index(drop=True)
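The repeated rows seen before this step presumably arise because the crosswalk holds one row per overlap between a 2000 PUMA and a 2010 PUMA, so keeping only the 2010 columns leaves identical rows. A small sketch with made-up values (the `toy_crosswalk` frame and its codes are hypothetical) shows the effect:

```python
import pandas as pd

# Toy crosswalk: one row per (PUMA00, PUMA10) overlap; all values are illustrative
toy_crosswalk = pd.DataFrame({
    'PUMA00': [6101, 6102, 6103],
    'PUMA10': [3701, 3701, 3702],
    'PUMA10_Name': ['Area A', 'Area A', 'Area B'],
})

# Keeping only the 2010 columns leaves one identical row per overlap
puma10_only = toy_crosswalk[['PUMA10', 'PUMA10_Name']]

# drop_duplicates collapses them to one row per 2010 PUMA
deduplicated = puma10_only.drop_duplicates().reset_index(drop=True)
print(len(puma10_only), len(deduplicated))
```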

In [13]:
pumas_df.head()

Unnamed: 0,State10,State10_Name,PUMA10,County,PUMA10_Name,PUMA10_Pop10,PUMA10_Land
0,6,California,3701,37,Los Angeles County (North/Unincorporated)--Castaic,139801,5446091332
1,6,California,3702,37,Los Angeles County (Northwest)--Santa Clarita City,188102,161374066
2,6,California,3703,37,Los Angeles County (North Central)--Lancaster City,163632,368397749
3,6,California,3704,37,Los Angeles County (North Central)--Palmdale City,164268,330295975
4,6,California,3705,37,Los Angeles County (North)--LA City (Northwest/Chatsworth & Porter Ranch),166932,95429064


---

Examining city naming format:

In [14]:
for city_name in pumas_df.PUMA10_Name.str.split('--').str[1]:
    print(city_name)

Castaic
Santa Clarita City
Lancaster City
Palmdale City
LA City (Northwest/Chatsworth & Porter Ranch)
LA City (North Central/Granada Hills & Sylmar)
LA (North Central/Arleta & Pacoima) & San Fernando Cities
LA City (Northeast/Sunland, Sun Valley & Tujunga)
San Gabriel Valley Region (North)
Baldwin Park, Azusa, Duarte & Irwindale Cities
Glendora, Claremont, San Dimas & La Verne Cities
Pomona City
Covina & Walnut Cities
Diamond Bar, La Habra Heights (East) Cities & Rowland Heights
West Covina City
La Puente & Industry Cities
Arcadia, San Gabriel & Temple City Cities
Pasadena City
Glendale City
Burbank City
LA City (Northeast/North Hollywood & Valley Village)
LA City (North Central/Van Nuys & North Sherman Oaks)
LA City (North Central/Mission Hills & Panorama City)
LA City (Northwest/Encino & Tarzana)
LA City (Northwest/Canoga Park, Winnetka & Woodland Hills)
Calabasas, Agoura Hills, Malibu & Westlake Village Cities
LA City (Central/Pacific Palisades)
Santa Monica City
LA City (West Centr

---

Observations:
- Some cities are located in multiple PUMAs, and some PUMAs cover several cities
- Los Angeles City is listed as "LA City"

---

Linking PUMA Codes to Corresponding City Names (in rent_and_activity_data):

In [15]:
len(pumas_df)

87

In [16]:
len(rent_and_activity_data.City)

113

In [17]:
# Extracting PUMA(s) corresponding to each city in 'rent_and_activity_data'
city_pumas_dict = {}
missing_pumas = []

for city in rent_and_activity_data.City:
    if city == 'Los Angeles':
        df = pumas_df[pumas_df.PUMA10_Name.str.contains('LA City')]  # To distinguish it from LA County
    elif city == 'Orange':
        df = pumas_df[pumas_df.PUMA10_Name.str.contains('Orange & Villa Park')]  # To distinguish it from Orange County
    else:
        df = pumas_df[pumas_df.PUMA10_Name.str.contains(city, regex=False)]  # Plain substring match
    
    if len(df) == 0:  # i.e. no PUMA name in pumas_df contains the city name
        missing_pumas.append(city)
        city_pumas_dict[city] = []
    else:
        city_pumas_dict[city] = df.PUMA10.tolist()  # tolist() yields plain Python ints
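Because the lookup is a substring match, a city name can also match a longer name that contains it. In the dictionary output, 'La Habra' picks up PUMA 3714, whose name lists La Habra Heights. One possible way to tighten this, sketched under stated assumptions, is to split each PUMA name into individual place names and compare them exactly. The helpers `place_names` and `exact_pumas` are hypothetical, and the toy rows use placeholder county prefixes rather than names copied from the crosswalk.

```python
import re
import pandas as pd

def place_names(puma_name):
    """Hypothetical helper: split a PUMA10_Name into a set of place names."""
    area = puma_name.split('--')[-1]         # text after the county prefix
    area = re.sub(r'\([^)]*\)', '', area)    # drop qualifiers such as "(East)"
    names = set()
    for part in re.split(r',|&', area):      # places are separated by ',' or '&'
        name = re.sub(r'\s+(City|Cities)$', '', part.strip())
        if name:
            names.add(name)
    return names

def exact_pumas(city, df):
    """Hypothetical helper: PUMAs whose name lists `city` as a whole place name."""
    mask = df.PUMA10_Name.apply(lambda n: city in place_names(n))
    return df.PUMA10[mask].tolist()

# Toy rows for illustration only
toy = pd.DataFrame({
    'PUMA10': [3714, 5906],
    'PUMA10_Name': [
        'County A--Diamond Bar, La Habra Heights (East) Cities & Rowland Heights',
        'County B--Brea, La Habra & Yorba Linda Cities',
    ],
})

substring_match = toy.PUMA10[toy.PUMA10_Name.str.contains('La Habra', regex=False)].tolist()
exact_match = exact_pumas('La Habra', toy)
print(substring_match)
print(exact_match)
```

This sketch does not handle every naming pattern. Entries such as "LA City (...)" or a place whose own name ends in "City" would still need special cases like the ones already used for Los Angeles and Orange.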

In [18]:
len(city_pumas_dict)

113

In [19]:
city_pumas_dict

{'Aliso Viejo': [5903],
 'Anaheim': [5909, 5910],
 'Brea': [5906],
 'Buena Park': [5908],
 'Costa Mesa': [5918],
 'Cypress': [5908],
 'Dana Point': [],
 'Fountain Valley': [5918],
 'Fullerton': [5907],
 'Garden Grove': [5912, 5913],
 'Huntington Beach': [5914],
 'Irvine': [5904, 5905],
 'La Habra': [3714, 5906],
 'La Palma': [],
 'Laguna Beach': [],
 'Laguna Hills': [5903],
 'Laguna Niguel': [5901],
 'Laguna Woods': [],
 'Lake Forest': [5905],
 'Los Alamitos': [],
 'Mission Viejo': [5902],
 'Newport Beach': [5903],
 'Orange': [5911],
 'Placentia': [5907],
 'Rancho Santa Margarita': [5902, 5915],
 'San Clemente': [5901],
 'San Juan Capistrano': [5901],
 'Santa Ana': [5916, 5917],
 'Seal Beach': [5908],
 'Stanton': [5912],
 'Tustin': [],
 'Westminster': [5912],
 'Yorba Linda': [5906],
 'Agoura Hills': [3726],
 'Alhambra': [3736],
 'Arcadia': [3717],
 'Artesia': [3764],
 'Avalon': [],
 'Azusa': [3710],
 'Baldwin Park': [3710],
 'Bell Gardens': [3741],
 'Bellflower': [3756],
 'Beverly Hill

In [20]:
# Cities w/ no corresponding PUMA
missing_pumas

['Dana Point',
 'La Palma',
 'Laguna Beach',
 'Laguna Woods',
 'Los Alamitos',
 'Tustin',
 'Avalon',
 'El Segundo',
 'La Canada Flintridge',
 'Lomita',
 'Monrovia',
 'San Marino']

In [21]:
# Manually assign each missing city the PUMA(s) of a comparable nearby city that is included in pumas_df
missing_pumas_dict = {
    'Dana Point':['Laguna Niguel'],
    'La Palma':['Cerritos'],
    'Laguna Beach':['Laguna Niguel'],
    'Laguna Woods':['Laguna Niguel'],
    'Los Alamitos':['Cypress'],
    'Tustin':['Irvine'],
    'Avalon':['Newport Beach'],
    'El Segundo':['Hawthorne'],
    'La Canada Flintridge':['Glendale'],
    'Lomita':['Torrance'],
    'Monrovia':['Arcadia'],
    'San Marino':['San Gabriel']
}

---

Assigning PUMA values to cities w/ missing PUMAs:

In [22]:
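for key in missing_pumas_dict:
    pumas = []
    for city in missing_pumas_dict[key]:
        # Extend rather than overwrite, so a key with several proxy cities keeps all their PUMAs
        pumas.extend(pumas_df.PUMA10[pumas_df.PUMA10_Name.str.contains(city, regex=False)].tolist())
    city_pumas_dict[key] = pumas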

In [23]:
city_pumas_dict

{'Aliso Viejo': [5903],
 'Anaheim': [5909, 5910],
 'Brea': [5906],
 'Buena Park': [5908],
 'Costa Mesa': [5918],
 'Cypress': [5908],
 'Dana Point': [5901],
 'Fountain Valley': [5918],
 'Fullerton': [5907],
 'Garden Grove': [5912, 5913],
 'Huntington Beach': [5914],
 'Irvine': [5904, 5905],
 'La Habra': [3714, 5906],
 'La Palma': [3764],
 'Laguna Beach': [5901],
 'Laguna Hills': [5903],
 'Laguna Niguel': [5901],
 'Laguna Woods': [5901],
 'Lake Forest': [5905],
 'Los Alamitos': [5908],
 'Mission Viejo': [5902],
 'Newport Beach': [5903],
 'Orange': [5911],
 'Placentia': [5907],
 'Rancho Santa Margarita': [5902, 5915],
 'San Clemente': [5901],
 'San Juan Capistrano': [5901],
 'Santa Ana': [5916, 5917],
 'Seal Beach': [5908],
 'Stanton': [5912],
 'Tustin': [5904, 5905],
 'Westminster': [5912],
 'Yorba Linda': [5906],
 'Agoura Hills': [3726],
 'Alhambra': [3736],
 'Arcadia': [3717],
 'Artesia': [3764],
 'Avalon': [5903],
 'Azusa': [3710],
 'Baldwin Park': [3710],
 'Bell Gardens': [3741],
 'B

In [24]:
city_pumas = pd.DataFrame(list(city_pumas_dict.items()), columns=['City', 'PUMAs'])

In [25]:
city_pumas

Unnamed: 0,City,PUMAs
0,Aliso Viejo,[5903]
1,Anaheim,"[5909, 5910]"
2,Brea,[5906]
3,Buena Park,[5908]
4,Costa Mesa,[5918]
5,Cypress,[5908]
6,Dana Point,[5901]
7,Fountain Valley,[5918]
8,Fullerton,[5907]
9,Garden Grove,"[5912, 5913]"


In [26]:
len(city_pumas)

113

In [27]:
city_pumas.to_csv('city_pumas.csv', index=False)
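Note that `to_csv` writes each list in the `PUMAs` column as its string form (for example `"[5909, 5910]"`), so a plain `read_csv` returns strings rather than lists. A minimal sketch of one way to read the file back, using an in-memory buffer instead of the real `city_pumas.csv` and a two-row stand-in for the data:

```python
import ast
import io
import pandas as pd

# Two-row stand-in for city_pumas (values taken from the output above)
city_pumas = pd.DataFrame({'City': ['Aliso Viejo', 'Anaheim'],
                           'PUMAs': [[5903], [5909, 5910]]})

# Write to an in-memory buffer rather than to disk
buffer = io.StringIO()
city_pumas.to_csv(buffer, index=False)
csv_text = buffer.getvalue()

# A plain read returns the lists as strings
raw = pd.read_csv(io.StringIO(csv_text))

# A converter parses each string back into a Python list
# (this assumes the lists hold plain Python ints; NumPy 2 integer scalars are
# written as "np.int64(...)", which literal_eval cannot parse)
parsed = pd.read_csv(io.StringIO(csv_text), converters={'PUMAs': ast.literal_eval})

# Optional: one row per (City, PUMA) pair, which is convenient for merging
long_format = parsed.explode('PUMAs').reset_index(drop=True)
print(long_format)
```

The long format may be easier to join against PUMA-level Census tables, since each row then carries a single PUMA code.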

---