<h2><center>Predicting Home Values in Los Angeles’ South Bay</center></h2>
<h3><center>Springboard | Capstone 1:   Data Wrangling Code</center></h3>
<h4><center>By: Lauren Broussard</center></h4>

In [1]:
# import necessary modules
import glob
import pandas as pd
import os

#### IMPORT DATA AND MERGE TABLES

In [2]:
def concat_redfin(pathway, collection_num):
    '''Function to merge all csv files together into one dataframe
    and create additional columns to track data'''

    # get list of filenames and verify length
    filenames = glob.glob(pathway)
    print("# of Files in Collection {collection}: ".format(collection=collection_num),len(filenames))

    # create empty list to hold dataframes
    redfin_lst = []

    # create list of dataframes, redfin_lst
    for file in filenames:
        df = pd.read_csv(file)
        redfin_lst.append(df)

    # add filename and collection number to each dataframe, and parse SOLD DATE as a datetime
    for dataframe, filename in zip(redfin_lst, filenames):
        dataframe['FILENAME'] = os.path.basename(filename)
        dataframe['COLLECTION'] = collection_num
        dataframe['SOLD DATE'] = pd.to_datetime(dataframe['SOLD DATE'], format='%B-%d-%Y')
        
    # stack dataframes together
    redfin = pd.concat(redfin_lst)

     
    return redfin
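
The `format='%B-%d-%Y'` argument in `concat_redfin` tells pandas what the raw date strings look like. As a minimal sketch, the same format codes can be tried with the standard library. The sample string below is an assumption for illustration, since the raw `SOLD DATE` values are not shown in this notebook.

```python
from datetime import datetime

# Assumed example value; the raw 'SOLD DATE' strings are not shown in the notebook
raw_sold_date = "January-24-2020"

# %B = full month name, %d = day of the month, %Y = four-digit year
parsed = datetime.strptime(raw_sold_date, "%B-%d-%Y")
print(parsed.date())  # 2020-01-24
```

A string that does not follow this pattern raises a `ValueError`, so a quick check like this can help confirm the format before running the full import.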

In [3]:
# run concat_redfin function on each collection of files, and assign collection number
redfin1 = concat_redfin('../data/raw/Redfin Files/Redfin-1/*.csv', '1')
redfin2 = concat_redfin('../data/raw/Redfin Files/Redfin-2/*.csv', '2')

# of Files in Collection 1:  53
# of Files in Collection 2:  31


#### View initial information about the DataFrames

In [4]:
# print basic info about Collection #1 dataframe
print("Collection 1: Rows - {rows}; Columns - {columns}".format(rows=len(redfin1.index), \
                                                                columns=len(redfin1.columns)))

Collection 1: Rows - 11688; Columns - 29


In [5]:
# print basic info about Collection #2 dataframe
print("Collection 2: Rows - {rows}; Columns - {columns}".format(rows=len(redfin2.index), \
                                                                columns=len(redfin2.columns)))

Collection 2: Rows - 7839; Columns - 29


#### Final Merge 

In [6]:
# merge collections 1 and 2 together and reset index for new dataframe
south_bay_orig = pd.concat([redfin1,redfin2]).reset_index(drop=True)
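
A small sketch of why the index is reset after stacking: `pd.concat` keeps each input's original index labels, so labels repeat until `reset_index(drop=True)` renumbers them. The two toy frames below are hypothetical stand-ins for the per-file dataframes.

```python
import pandas as pd

# Two toy frames standing in for per-file dataframes (hypothetical data)
a = pd.DataFrame({'PRICE': [100, 200]})
b = pd.DataFrame({'PRICE': [300, 400]})

stacked = pd.concat([a, b])             # index labels repeat: 0, 1, 0, 1
clean = stacked.reset_index(drop=True)  # index labels become 0, 1, 2, 3

print(stacked.index.tolist())
print(clean.index.tolist())
```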

In [7]:
# drop the 'URL' column and inspect the new dataframe
south_bay_orig.drop('URL (SEE http://www.redfin.com/buy-a-home/comparative-market-analysis FOR INFO ON PRICING)'\
                    ,axis=1,inplace=True)
south_bay_orig.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 19527 entries, 0 to 19526
Data columns (total 28 columns):
 #   Column                      Non-Null Count  Dtype         
---  ------                      --------------  -----         
 0   SALE TYPE                   19527 non-null  object        
 1   SOLD DATE                   17133 non-null  datetime64[ns]
 2   PROPERTY TYPE               19527 non-null  object        
 3   ADDRESS                     19509 non-null  object        
 4   CITY                        19519 non-null  object        
 5   STATE OR PROVINCE           19527 non-null  object        
 6   ZIP OR POSTAL CODE          19507 non-null  float64       
 7   PRICE                       19527 non-null  int64         
 8   BEDS                        19349 non-null  float64       
 9   BATHS                       19294 non-null  float64       
 10  LOCATION                    17113 non-null  object        
 11  SQUARE FEET                 19072 non-null  float64   

#### DROPPING COLUMNS

Let's continue to inspect the data, and remove any additional unnecessary columns.

In [8]:
# Inspect Sale Type, Status columns
group_cols = south_bay_orig.groupby(['SALE TYPE', 'STATUS'])['SOLD DATE'].count()
print(group_cols)

SALE TYPE  STATUS
PAST SALE  Sold      17133
Name: SOLD DATE, dtype: int64


In [9]:
#create list of other columns to drop
cols_to_drop = ['SALE TYPE', 'NEXT OPEN HOUSE START TIME', 'NEXT OPEN HOUSE END TIME',\
                'STATUS','FAVORITE', 'INTERESTED', 'SOURCE']

# create new south_bay dataframe and drop columns
south_bay = south_bay_orig.drop(cols_to_drop, axis=1)

In [10]:
# verify remaining columns
print(south_bay.columns, len(south_bay.columns))

Index(['SOLD DATE', 'PROPERTY TYPE', 'ADDRESS', 'CITY', 'STATE OR PROVINCE',
       'ZIP OR POSTAL CODE', 'PRICE', 'BEDS', 'BATHS', 'LOCATION',
       'SQUARE FEET', 'LOT SIZE', 'YEAR BUILT', 'DAYS ON MARKET',
       '$/SQUARE FEET', 'HOA/MONTH', 'MLS#', 'LATITUDE', 'LONGITUDE',
       'FILENAME', 'COLLECTION'],
      dtype='object') 21


#### LOOK FOR MISSING/INCORRECT VALUES (BY FEATURE)

SOLD DATE:

In [11]:
# remove rows with no sold date
south_bay.dropna(subset=['SOLD DATE'], axis=0, inplace=True)

In [12]:
# find min and max sold dates for each collection 
coll1 = south_bay['COLLECTION'] == '1'
coll2 = south_bay['COLLECTION'] == '2'

print("Collection 1: Min Date - ", south_bay[coll1]['SOLD DATE'].min(), \
      "Max Date - ", south_bay[coll1]['SOLD DATE'].max())
print("Collection 2: Min Date - ", south_bay[coll2]['SOLD DATE'].min(), "Max Date - ", \
      south_bay[coll2]['SOLD DATE'].max())

Collection 1: Min Date -  2018-01-22 00:00:00 Max Date -  2020-01-24 00:00:00
Collection 2: Min Date -  2018-02-06 00:00:00 Max Date -  2020-02-07 00:00:00


In [13]:
# reduce dataframe to keep SOLD DATES from 2018-02-06 to 2020-01-24
south_bay = south_bay[(south_bay['SOLD DATE'] >= '2018-02-06') & (south_bay['SOLD DATE'] <= '2020-01-24')]
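
The two bounds used above appear to be the window that both collections cover: the later of the two minimum dates and the earlier of the two maximum dates. A minimal sketch of that reasoning, using the dates printed in the previous output:

```python
from datetime import date

# Min and max sold dates per collection, copied from the output above
coll1_min, coll1_max = date(2018, 1, 22), date(2020, 1, 24)
coll2_min, coll2_max = date(2018, 2, 6), date(2020, 2, 7)

# The shared window starts at the later minimum and ends at the earlier maximum
window_start = max(coll1_min, coll2_min)
window_end = min(coll1_max, coll2_max)

print(window_start, window_end)  # 2018-02-06 2020-01-24
```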

PROPERTY TYPE:

In [14]:
# look at values for property type column
group_type = south_bay.groupby(['PROPERTY TYPE'])['SOLD DATE'].count()
print(group_type)

PROPERTY TYPE
Condo/Co-op                  2610
Mobile/Manufactured Home      351
Multi-Family (2-4 Unit)      1069
Multi-Family (5+ Unit)        273
Single Family Residential    9950
Townhouse                    2606
Vacant Land                    56
Name: SOLD DATE, dtype: int64


In [15]:
# further inspect 'Condo/Co-op' 
south_bay[south_bay['PROPERTY TYPE'] == 'Condo/Co-op'].describe()

Unnamed: 0,ZIP OR POSTAL CODE,PRICE,BEDS,BATHS,SQUARE FEET,LOT SIZE,YEAR BUILT,DAYS ON MARKET,$/SQUARE FEET,HOA/MONTH,LATITUDE,LONGITUDE
count,2610.0,2610.0,2610.0,2610.0,2610.0,2492.0,2608.0,2610.0,2610.0,2573.0,2610.0,2610.0
mean,90420.4,537465.4,2.063218,1.893103,1147.47318,1159523.0,1981.980061,367.028352,477.033333,343.550719,33.849764,-118.345543
std,198.762475,253235.2,0.755856,0.624512,380.734403,31510840.0,14.498535,206.966548,157.521229,121.119032,0.072976,0.048119
min,90043.0,183000.0,0.0,0.75,397.0,4.0,1944.0,2.0,105.0,10.0,33.708504,-118.453205
25%,90275.0,375000.0,2.0,1.5,875.0,24724.0,1973.0,189.0,368.0,275.0,33.798982,-118.380794
50%,90302.0,469250.0,2.0,2.0,1098.0,65510.5,1980.0,359.5,441.0,338.0,33.832205,-118.341884
75%,90505.0,640000.0,2.0,2.0,1351.75,176982.0,1988.0,554.0,549.0,408.0,33.900681,-118.302446
max,90746.0,3599000.0,5.0,4.0,3150.0,1148808000.0,2019.0,730.0,1477.0,2346.0,33.980126,-118.246078


In [16]:
# further inspect 'Mobile/Manufactured Home' 
south_bay[south_bay['PROPERTY TYPE'] == 'Mobile/Manufactured Home'].describe()

Unnamed: 0,ZIP OR POSTAL CODE,PRICE,BEDS,BATHS,SQUARE FEET,LOT SIZE,YEAR BUILT,DAYS ON MARKET,$/SQUARE FEET,HOA/MONTH,LATITUDE,LONGITUDE
count,351.0,351.0,351.0,351.0,299.0,40.0,344.0,351.0,299.0,31.0,351.0,351.0
mean,90582.982906,135404.792023,2.136752,1.695157,1167.250836,486752.8,1981.781977,366.763533,119.187291,685.387097,33.818503,-118.30852
std,167.340576,108172.99229,0.644617,0.426805,381.750956,858894.6,16.269294,212.340665,88.758429,553.406763,0.040502,0.031317
min,90247.0,25.0,1.0,1.0,370.0,1302.0,1955.0,1.0,0.0,0.0,33.723242,-118.396473
25%,90502.0,65250.0,2.0,1.25,890.0,3737.5,1971.0,176.5,65.5,424.0,33.799656,-118.329708
50%,90710.0,105000.0,2.0,2.0,1160.0,79801.5,1975.0,385.0,101.0,550.0,33.809912,-118.306607
75%,90732.0,180000.0,3.0,2.0,1440.0,369689.0,2000.0,547.5,144.0,778.0,33.859199,-118.294537
max,90748.0,660000.0,4.0,2.75,2311.0,3113589.0,2018.0,722.0,688.0,2697.0,33.916221,-118.242632


In [17]:
# further inspect 'Multi-Family (2-4 Unit)'
south_bay[south_bay['PROPERTY TYPE'] == 'Multi-Family (2-4 Unit)'].describe()

Unnamed: 0,ZIP OR POSTAL CODE,PRICE,BEDS,BATHS,SQUARE FEET,LOT SIZE,YEAR BUILT,DAYS ON MARKET,$/SQUARE FEET,HOA/MONTH,LATITUDE,LONGITUDE
count,1069.0,1069.0,1065.0,1010.0,1016.0,1069.0,1068.0,1069.0,1016.0,5.0,1069.0,1069.0
mean,90330.855005,941030.4,5.784038,3.833168,2447.697835,6536.525725,1952.057116,349.572498,2755.102362,503.2,33.882303,-118.318409
std,256.994827,930930.6,3.156675,2.177917,1020.367559,5438.739009,24.63544,209.666254,34756.900603,192.363458,0.080528,0.048224
min,90001.0,51000.0,0.0,1.0,1.0,1345.0,1895.0,1.0,35.0,395.0,33.709403,-118.452421
25%,90059.0,589900.0,4.0,2.0,1679.75,5101.0,1932.5,167.0,274.0,396.0,33.829304,-118.356675
50%,90277.0,735000.0,5.0,3.0,2173.5,5888.0,1952.0,323.0,334.0,440.0,33.898502,-118.307881
75%,90502.0,1040000.0,7.0,4.0,3148.5,7123.0,1963.0,544.0,420.0,440.0,33.952489,-118.279133
max,90810.0,24250000.0,40.0,24.0,6039.0,144956.0,2019.0,729.0,635000.0,845.0,33.987501,-118.214689


In [18]:
# further inspect 'Multi-Family (5+ Unit)' 
south_bay[south_bay['PROPERTY TYPE'] == 'Multi-Family (5+ Unit)'].describe()

Unnamed: 0,ZIP OR POSTAL CODE,PRICE,BEDS,BATHS,SQUARE FEET,LOT SIZE,YEAR BUILT,DAYS ON MARKET,$/SQUARE FEET,HOA/MONTH,LATITUDE,LONGITUDE
count,273.0,273.0,269.0,271.0,273.0,273.0,272.0,273.0,273.0,1.0,273.0,273.0
mean,90350.095238,2722932.0,14.137546,15.274908,7662.974359,2308953.0,1960.672794,373.479853,978.150183,188.0,33.884127,-118.333385
std,196.19419,2636131.0,9.899855,49.988398,6866.522381,26797810.0,19.540088,218.84362,10067.124792,,0.075679,0.041168
min,90001.0,300000.0,0.0,1.0,3.0,2259.0,1912.0,2.0,106.0,188.0,33.712579,-118.442873
25%,90250.0,1275000.0,8.0,6.0,4297.0,6802.0,1953.0,159.0,254.0,188.0,33.823017,-118.361604
50%,90301.0,1699000.0,12.0,8.0,5918.0,8693.0,1960.0,434.0,301.0,188.0,33.889938,-118.339825
75%,90501.0,3200000.0,16.0,14.0,8910.0,12013.0,1965.0,562.0,388.0,188.0,33.953739,-118.299207
max,90745.0,24250000.0,73.0,784.0,81243.0,313675600.0,2018.0,721.0,166667.0,188.0,33.989008,-118.242705


In [19]:
# inspect 'Vacant Land' Property Type
south_bay[south_bay['PROPERTY TYPE'] == 'Vacant Land'].describe()

Unnamed: 0,ZIP OR POSTAL CODE,PRICE,BEDS,BATHS,SQUARE FEET,LOT SIZE,YEAR BUILT,DAYS ON MARKET,$/SQUARE FEET,HOA/MONTH,LATITUDE,LONGITUDE
count,46.0,56.0,3.0,0.0,0.0,56.0,0.0,56.0,0.0,0.0,56.0,56.0
mean,90459.108696,802399.8,0.0,,,16307.410714,,395.803571,,,33.813729,-118.331107
std,226.549796,723659.4,0.0,,,31739.495162,,205.788454,,,0.082748,0.044567
min,90247.0,134000.0,0.0,,,2120.0,,100.0,,,33.709628,-118.447281
25%,90275.0,380000.0,0.0,,,5552.5,,198.0,,,33.740054,-118.358161
50%,90297.0,525000.0,0.0,,,9539.5,,382.0,,,33.803782,-118.326383
75%,90731.0,966250.0,0.0,,,17197.75,,598.25,,,33.880477,-118.29353
max,90745.0,3125000.0,0.0,,,240608.0,,714.0,,,33.975688,-118.2787


In [20]:
# keep Property Type with values: Single Family Residential, Townhouse, Condo/Co-op, Mobile/Manufactured Home.
prop_types = ['Single Family Residential', 'Townhouse', 'Condo/Co-op', 'Mobile/Manufactured Home']

south_bay = south_bay[south_bay['PROPERTY TYPE'].isin(prop_types)]

In [21]:
# verify result
south_bay.groupby(['PROPERTY TYPE'])['SOLD DATE'].count()

PROPERTY TYPE
Condo/Co-op                  2610
Mobile/Manufactured Home      351
Single Family Residential    9950
Townhouse                    2606
Name: SOLD DATE, dtype: int64

ADDRESS: 

In [22]:
# view records with empty Address
south_bay[south_bay['ADDRESS'].isnull()]

Unnamed: 0,SOLD DATE,PROPERTY TYPE,ADDRESS,CITY,STATE OR PROVINCE,ZIP OR POSTAL CODE,PRICE,BEDS,BATHS,LOCATION,...,LOT SIZE,YEAR BUILT,DAYS ON MARKET,$/SQUARE FEET,HOA/MONTH,MLS#,LATITUDE,LONGITUDE,FILENAME,COLLECTION
1068,2018-12-21,Single Family Residential,,Manhattan Beach,CA,90266.0,3300000,4.0,3.5,143 - Manhattan Bch Tree,...,5060.0,1990.0,400.0,799.0,,SB18269897,33.891926,-118.401617,manhattan-beach_2.5M_plus.csv,1
3555,2019-09-04,Townhouse,,Carson,CA,90745.0,465000,3.0,2.5,139 - South Carson,...,,1980.0,143.0,329.0,285.0,SB19140889,33.82792,-118.282972,carson_nosfh_nocondo.csv,1
3926,2019-02-25,Single Family Residential,,Gardena,CA,90247.0,593750,3.0,2.0,,...,5452.0,1961.0,331.0,433.0,,19423232,33.874884,-118.288451,harbor-gateway-north.csv,1
5677,2019-02-25,Single Family Residential,,Gardena,CA,90247.0,593750,3.0,2.0,,...,5452.0,1961.0,334.0,433.0,,19423232,33.874884,-118.288451,gardena_sfh_south_of_red_bch_blvd.csv,1
10626,2018-06-22,Condo/Co-op,,Torrance,CA,90502.0,450000,3.0,1.5,123 - County Strip,...,132433.0,1979.0,582.0,350.0,,SB18111298,33.823876,-118.289422,carson_condos.csv,1
16655,2018-06-22,Condo/Co-op,,Torrance,CA,90502.0,450000,3.0,1.5,123 - County Strip,...,132433.0,1979.0,594.0,350.0,,SB18111298,33.823876,-118.289422,torrance_condos-0_to_500K.csv,2
16680,2018-04-13,Condo/Co-op,,Torrance,CA,90505.0,458500,3.0,2.0,,...,984919.0,1963.0,664.0,410.0,352.0,18309548,33.820458,-118.341884,torrance_condos-0_to_500K.csv,2


In [23]:
# drop rows with no address data
south_bay.dropna(subset=['ADDRESS'], axis=0, inplace=True)

In [24]:
# update addresses to mixed case 
south_bay['ADDRESS'] = south_bay['ADDRESS'].apply(lambda x: x.title())

CITY:

In [25]:
# change cities to mixed case
south_bay['CITY'] = south_bay['CITY'].astype(str).apply(lambda x: x.title()) 

In [26]:
# view rows with no CITY data
south_bay[south_bay['CITY'].isnull()]

Unnamed: 0,SOLD DATE,PROPERTY TYPE,ADDRESS,CITY,STATE OR PROVINCE,ZIP OR POSTAL CODE,PRICE,BEDS,BATHS,LOCATION,...,LOT SIZE,YEAR BUILT,DAYS ON MARKET,$/SQUARE FEET,HOA/MONTH,MLS#,LATITUDE,LONGITUDE,FILENAME,COLLECTION
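
One possible reason the check above returns no rows: casting a missing value to text produces the literal string `'nan'`, which `.title()` turns into `'Nan'`, so `isnull()` no longer flags it. The `Nan` entry in the city counts further down is consistent with this. The sketch below uses a hypothetical toy series and shows the `.str` accessor as an alternative that leaves missing values in place.

```python
import numpy as np
import pandas as pd

# Toy series with one missing city (hypothetical values)
cities = pd.Series(['TORRANCE', np.nan, 'harbor city'])

# Casting a missing value to text yields 'nan', which .title() turns into 'Nan'
as_text = str(np.nan).title()

# The .str accessor skips missing values, so isnull() can still find them
titled = cities.str.title()

print(as_text)
print(titled.isnull().tolist())
```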


In [27]:
# update missing City data
south_bay.at[3695, 'CITY'] = 'Harbor City'
south_bay.at[8764,'CITY'] = 'Palos Verdes Estates'

In [28]:
# group by City
south_bay.groupby(['CITY'])['SOLD DATE'].count()

CITY
Carson                     998
Compton                      3
County - Los Angeles         4
El Segundo                 250
Gardena                    833
Harbor                       1
Harbor City                448
Hawthorne                  840
Hermosa Beach              429
Inglewood                  667
Ladera Heights               2
Lawndale                   312
Lennox                      19
Lomita                     430
Long Beach                   2
Los Angeles               1155
Los Feliz                    1
Manhattan Beach            770
Nan                          2
Palos Verdes Estates       375
Palos Verdes Peninsula      13
Park Hills Heights           2
Playa Del Rey              347
Rancho Palos Verdes       1032
Redondo Beach             1820
Rolling Hills               39
Rolling Hills Estates      270
San Bernardino               1
San Pedro                 1104
Torrance                  3006
Venice                       1
View Park                    1
Wes

In [29]:
# check cities with 5 or fewer sales
small_city_count = ['Compton','County - Los Angeles', 'Harbor', 'Ladera Heights', 'Long Beach', 'Los Feliz',\
                'Park Hills Heights', 'San Bernardino', 'Venice', 'View Park']

city_check = south_bay[south_bay['CITY'].isin(small_city_count)]
city_check.to_csv('../data/interim/city_check.csv')

In [30]:
# create mapping for city names to update

small_city_map = {'County - Los Angeles': 'Los Angeles', 
                  'Los Feliz': 'Los Angeles', 'Harbor': 'Harbor City', 
                  'Ladera Heights': 'Los Angeles', 
                  'Park Hills Heights': 'Los Angeles', 
                  'View Park': 'Los Angeles', 'Venice': 'Playa Del Rey'}

In [31]:
# replace city names with mapping above 
south_bay.replace({'CITY' : small_city_map}, inplace=True)
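
As a small illustration of the nested-dictionary form used above, the outer key names the column and the inner dictionary maps old values to new ones, so matching values in other columns are left alone. The toy frame and its `NOTE` column are hypothetical.

```python
import pandas as pd

# Toy frame (hypothetical values); 'Harbor' appears in both columns
toy = pd.DataFrame({'CITY': ['Harbor', 'Venice', 'Torrance'],
                    'NOTE': ['Harbor', 'n/a', 'n/a']})

city_map = {'Harbor': 'Harbor City', 'Venice': 'Playa Del Rey'}

# Only the 'CITY' column is rewritten; 'NOTE' keeps its original values
updated = toy.replace({'CITY': city_map})

print(updated)
```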

In [32]:
# drop the row where the city is San Bernardino
south_bay.drop(south_bay[south_bay['CITY'] == 'San Bernardino'].index, inplace=True)

In [33]:
# verify results
south_bay.groupby(['CITY'])['SOLD DATE'].count()

CITY
Carson                     998
Compton                      3
El Segundo                 250
Gardena                    833
Harbor City                449
Hawthorne                  840
Hermosa Beach              429
Inglewood                  667
Lawndale                   312
Lennox                      19
Lomita                     430
Long Beach                   2
Los Angeles               1165
Manhattan Beach            770
Nan                          2
Palos Verdes Estates       375
Palos Verdes Peninsula      13
Playa Del Rey              348
Rancho Palos Verdes       1032
Redondo Beach             1820
Rolling Hills               39
Rolling Hills Estates      270
San Pedro                 1104
Torrance                  3006
Westchester                 99
Wilmington                 234
Name: SOLD DATE, dtype: int64

ZIP/POSTAL CODES:

In [34]:
# view rows with no zip code data
south_bay[south_bay['ZIP OR POSTAL CODE'].isnull()]

Unnamed: 0,SOLD DATE,PROPERTY TYPE,ADDRESS,CITY,STATE OR PROVINCE,ZIP OR POSTAL CODE,PRICE,BEDS,BATHS,LOCATION,...,LOT SIZE,YEAR BUILT,DAYS ON MARKET,$/SQUARE FEET,HOA/MONTH,MLS#,LATITUDE,LONGITUDE,FILENAME,COLLECTION


In [35]:
# check for zip code errors
zip_check = (south_bay['ZIP OR POSTAL CODE'] < 90000) | (south_bay['ZIP OR POSTAL CODE'] >= 99999)
print(south_bay.loc[:,['ADDRESS', 'PROPERTY TYPE', 'CITY', 'ZIP OR POSTAL CODE']][zip_check])

                       ADDRESS PROPERTY TYPE    CITY  ZIP OR POSTAL CODE
3340  1984 #7 Rolling Vista Dr     Townhouse  Lomita             70717.0
4616  1984 #7 Rolling Vista Dr     Townhouse  Lomita             70717.0


In [36]:
# update zip codes from 70717 to 90717
south_bay.at[7501, 'ZIP OR POSTAL CODE'] = 90717.0
south_bay.at[10126,'ZIP OR POSTAL CODE'] = 90717.0
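
The rows printed by the zip code check carry index labels 3340 and 4616, while the update above targets labels 7501 and 10126, which suggests the labels shifted between runs. A sketch of an alternative, assuming the goal is simply to correct every 70717 value, selects the rows by value instead of by label. The toy frame below is hypothetical.

```python
import pandas as pd

# Toy frame (hypothetical) with two mistyped zip codes and arbitrary index labels
toy = pd.DataFrame({'ZIP OR POSTAL CODE': [70717.0, 90717.0, 70717.0]},
                   index=[3340, 10, 4616])

# Select rows by value, so the fix does not depend on specific index labels
bad_zip = toy['ZIP OR POSTAL CODE'] == 70717
toy.loc[bad_zip, 'ZIP OR POSTAL CODE'] = 90717.0

print(toy)
```

Unlike `.at` with a label that does not exist, a boolean mask never adds new rows to the dataframe.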

PRICE: 

In [37]:
# check for min and max prices
south_bay[(south_bay['PRICE'] == south_bay['PRICE'].min()) | (south_bay['PRICE'] == south_bay['PRICE'].max())]

Unnamed: 0,SOLD DATE,PROPERTY TYPE,ADDRESS,CITY,STATE OR PROVINCE,ZIP OR POSTAL CODE,PRICE,BEDS,BATHS,LOCATION,...,LOT SIZE,YEAR BUILT,DAYS ON MARKET,$/SQUARE FEET,HOA/MONTH,MLS#,LATITUDE,LONGITUDE,FILENAME,COLLECTION
3302,2020-01-08,Mobile/Manufactured Home,24200 Walnut St #5,Torrance,CA,90501.0,25,2.0,1.0,129 - South Torrance,...,,1963.0,17.0,0.0,,CV19264617,33.806188,-118.310948,lomita_01_nosfh.csv,1
15817,2018-02-27,Single Family Residential,417 Paseo De La Playa,Redondo Beach,CA,90277.0,22650000,7.0,8.75,128 - Hollywood Riviera,...,62800.0,2005.0,709.0,2221.0,,PV17206016,33.810415,-118.39107,redondo_sfh-4_plus_beds.csv,2
16867,2020-01-08,Mobile/Manufactured Home,24200 Walnut St #5,Torrance,CA,90501.0,25,2.0,1.0,129 - South Torrance,...,,1963.0,29.0,0.0,,CV19264617,33.806188,-118.310948,torrance_nosfh-notownhome-nocondo.csv,2
18031,2018-02-27,Single Family Residential,417 Paseo De La Playa,Redondo Beach,CA,90277.0,22650000,7.0,8.75,128 - Hollywood Riviera,...,62800.0,2005.0,709.0,2221.0,,PV17206016,33.810415,-118.39107,torrance_sfh-5-plus-beds.csv,2


In [38]:
# view price info
south_bay['PRICE'].describe()

count    1.550900e+04
mean     9.846371e+05
std      9.211453e+05
min      2.500000e+01
25%      5.350000e+05
50%      7.250000e+05
75%      1.165000e+06
max      2.265000e+07
Name: PRICE, dtype: float64

In [39]:
# view low prices
south_bay[south_bay['PRICE'] < 10000] 

Unnamed: 0,SOLD DATE,PROPERTY TYPE,ADDRESS,CITY,STATE OR PROVINCE,ZIP OR POSTAL CODE,PRICE,BEDS,BATHS,LOCATION,...,LOT SIZE,YEAR BUILT,DAYS ON MARKET,$/SQUARE FEET,HOA/MONTH,MLS#,LATITUDE,LONGITUDE,FILENAME,COLLECTION
886,2018-06-16,Single Family Residential,120 E Hardy St,Inglewood,CA,90301.0,2100,2.0,1.0,102 - South Inglewood,...,3574.0,1926.0,585.0,2.0,,IN18136584,33.948858,-118.353954,inglewood_03_sfh_2_beds_or_less.csv,1
993,2019-03-26,Single Family Residential,9910 S Village Dr #1,Inglewood,CA,90305.0,1775,1.0,1.0,North Inglewood,...,13667.0,1958.0,302.0,0.0,,19-446172,33.946075,-118.32886,inglewood_03_sfh_2_beds_or_less.csv,1
3302,2020-01-08,Mobile/Manufactured Home,24200 Walnut St #5,Torrance,CA,90501.0,25,2.0,1.0,129 - South Torrance,...,,1963.0,17.0,0.0,,CV19264617,33.806188,-118.310948,lomita_01_nosfh.csv,1
16867,2020-01-08,Mobile/Manufactured Home,24200 Walnut St #5,Torrance,CA,90501.0,25,2.0,1.0,129 - South Torrance,...,,1963.0,29.0,0.0,,CV19264617,33.806188,-118.310948,torrance_nosfh-notownhome-nocondo.csv,2


In [40]:
# drop records where price < 10000
south_bay.drop(south_bay[south_bay['PRICE'] < 10000].index,axis=0,inplace=True)

BEDS:

In [41]:
# view empty data for Beds column 
south_bay[south_bay['BEDS'].isnull()]

Unnamed: 0,SOLD DATE,PROPERTY TYPE,ADDRESS,CITY,STATE OR PROVINCE,ZIP OR POSTAL CODE,PRICE,BEDS,BATHS,LOCATION,...,LOT SIZE,YEAR BUILT,DAYS ON MARKET,$/SQUARE FEET,HOA/MONTH,MLS#,LATITUDE,LONGITUDE,FILENAME,COLLECTION
8361,2018-03-27,Townhouse,3557 W 132Nd St,Hawthorne,CA,90250.0,595000,,,110 - East Hawthorne,...,7260.0,1940.0,669.0,426.0,,RS17252089,33.912505,-118.33494,hawthorne_nosfh_nocondo.csv,1
18378,2019-01-08,Townhouse,2518 Gates Ave,Redondo Beach,CA,90278.0,1550000,,,151 - N Redondo Bch/Villas North,...,7500.0,1951.0,394.0,572.0,,SB18283550,33.878096,-118.3648,redondo_townhome-0-2-beds.csv,2


In [42]:
#drop all rows with missing beds data
south_bay.dropna(subset=['BEDS'], axis=0, inplace=True)

In [43]:
# view range for num of beds
south_bay['BEDS'].describe()

count    15503.000000
mean         3.086499
std          1.094285
min          0.000000
25%          2.000000
50%          3.000000
75%          4.000000
max         32.000000
Name: BEDS, dtype: float64

In [44]:
# display beds that are 0
south_bay[(south_bay['BEDS'] == 0)]

Unnamed: 0,SOLD DATE,PROPERTY TYPE,ADDRESS,CITY,STATE OR PROVINCE,ZIP OR POSTAL CODE,PRICE,BEDS,BATHS,LOCATION,...,LOT SIZE,YEAR BUILT,DAYS ON MARKET,$/SQUARE FEET,HOA/MONTH,MLS#,LATITUDE,LONGITUDE,FILENAME,COLLECTION
1288,2018-11-06,Single Family Residential,2613 Crest Dr,Manhattan Beach,CA,90266.0,4530040,0.0,,142 - Manhattan Bch Sand,...,3512.0,1935.0,445.0,45300.0,,SB18024122,33.894668,-118.414787,manhattan-beach_2.5M_plus.csv,1
1308,2019-08-09,Single Family Residential,461 26Th St,Manhattan Beach,CA,90266.0,2700000,0.0,,142 - Manhattan Bch Sand,...,2698.0,1953.0,169.0,27000.0,,SB19039219,33.895254,-118.412626,manhattan-beach_2.5M_plus.csv,1
1684,2019-09-20,Single Family Residential,13905 Inglewood Ave,Hawthorne,CA,90250.0,470000,0.0,2.0,110 - East Hawthorne,...,3763.0,1954.0,124.0,285.0,,DW19224759,33.905094,-118.361558,del-aire.csv,1
4134,2018-04-24,Single Family Residential,10605 Compton Ave,Los Angeles,CA,90002.0,165000,0.0,,699 - Not Defined,...,3600.0,1913.0,641.0,125.0,,DW17279455,33.939427,-118.246381,watts_0_to_3_beds.csv,1
4192,2018-12-01,Single Family Residential,9702 Wilmington Ave,Los Angeles,CA,90002.0,193000,0.0,1.5,C37 - Metropolitan South,...,4493.0,1958.0,420.0,268.0,,DW18253758,33.948125,-118.239008,watts_0_to_3_beds.csv,1
4450,2018-02-16,Condo/Co-op,3601 W Hidden Ln #114,Rolling Hills Estates,CA,90274.0,295000,0.0,1.0,165 - PV Dr North,...,433780.0,1973.0,708.0,670.0,325.0,PV18009090,33.785169,-118.34164,rolling-hills-estates_no_sfh.csv,1
4494,2018-08-14,Condo/Co-op,6526 Ocean Crest Dr Unit A206,Rancho Palos Verdes,CA,90275.0,340000,0.0,1.0,171 - Country Club,...,167165.0,1973.0,529.0,789.0,282.0,PV18148707,33.763503,-118.393794,rolling-hills-estates_no_sfh.csv,1
4503,2018-11-09,Condo/Co-op,3602 W Estates Ln #114,Rolling Hills Estates,CA,90274.0,295000,0.0,1.0,165 - PV Dr North,...,433780.0,1973.0,442.0,670.0,338.0,SB18222834,33.785169,-118.34164,rolling-hills-estates_no_sfh.csv,1
4515,2018-12-11,Condo/Co-op,3605 W Hidden Ln #212,Rolling Hills Estates,CA,90274.0,307000,0.0,1.0,165 - PV Dr North,...,433780.0,1973.0,410.0,698.0,338.0,SB18233933,33.785169,-118.34164,rolling-hills-estates_no_sfh.csv,1
4521,2019-04-19,Condo/Co-op,6526 Ocean Crest Dr Unit A308,Rancho Palos Verdes,CA,90275.0,370000,0.0,1.0,171 - Country Club,...,167165.0,1973.0,281.0,858.0,291.0,CV19049069,33.763503,-118.393794,rolling-hills-estates_no_sfh.csv,1


BATHS:

In [45]:
# view empty data for Baths column 
south_bay[south_bay['BATHS'].isnull()]

Unnamed: 0,SOLD DATE,PROPERTY TYPE,ADDRESS,CITY,STATE OR PROVINCE,ZIP OR POSTAL CODE,PRICE,BEDS,BATHS,LOCATION,...,LOT SIZE,YEAR BUILT,DAYS ON MARKET,$/SQUARE FEET,HOA/MONTH,MLS#,LATITUDE,LONGITUDE,FILENAME,COLLECTION
1288,2018-11-06,Single Family Residential,2613 Crest Dr,Manhattan Beach,CA,90266.0,4530040,0.0,,142 - Manhattan Bch Sand,...,3512.0,1935.0,445.0,45300.0,,SB18024122,33.894668,-118.414787,manhattan-beach_2.5M_plus.csv,1
1308,2019-08-09,Single Family Residential,461 26Th St,Manhattan Beach,CA,90266.0,2700000,0.0,,142 - Manhattan Bch Sand,...,2698.0,1953.0,169.0,27000.0,,SB19039219,33.895254,-118.412626,manhattan-beach_2.5M_plus.csv,1
4134,2018-04-24,Single Family Residential,10605 Compton Ave,Los Angeles,CA,90002.0,165000,0.0,,699 - Not Defined,...,3600.0,1913.0,641.0,125.0,,DW17279455,33.939427,-118.246381,watts_0_to_3_beds.csv,1
7369,2018-10-25,Single Family Residential,3529 Pine Ave,Manhattan Beach,CA,90266.0,2149000,4.0,,143 - Manhattan Bch Tree,...,4640.0,1972.0,457.0,711.0,,SB18230875,33.900901,-118.399192,manhattan-beach_2.5M_max.csv,1
8806,2019-12-20,Townhouse,546 W Kelso St,Inglewood,CA,90301.0,1208888,10.0,,101 - North Inglewood,...,6000.0,2019.0,33.0,336.0,,SB19239438,33.95854,-118.365872,inglewood_01_noSFH_nocondo.csv,1
10189,2018-04-02,Single Family Residential,22003 Meyler St,Torrance,CA,90502.0,742000,3.0,,123 - County Strip,...,4970.0,2018.0,663.0,328.0,,SB18010337,33.827959,-118.294971,carson_sfh_600K_plus.csv,1
11158,2018-03-23,Single Family Residential,1122 E Anaheim St,Wilmington,CA,90744.0,297000,0.0,,East Wilmington,...,4800.0,1922.0,673.0,197.0,,17-248776,33.780515,-118.248528,wilmington_0_to_3_beds.csv,1
11314,2018-09-07,Single Family Residential,820 N Wilmington Blvd,Wilmington,CA,90744.0,360000,0.0,,West Wilmington,...,7584.0,1950.0,505.0,,,18-311382,33.779996,-118.274241,wilmington_0_to_3_beds.csv,1
11315,2018-09-07,Single Family Residential,814 N Wilmington Blvd,Wilmington,CA,90744.0,360000,0.0,,West Wilmington,...,7584.0,1950.0,505.0,250.0,,18-311388,33.779934,-118.273972,wilmington_0_to_3_beds.csv,1
14818,2018-04-02,Single Family Residential,22003 Meyler St,Torrance,CA,90502.0,742000,3.0,,123 - County Strip,...,4970.0,2018.0,675.0,328.0,,SB18010337,33.827959,-118.294971,torrance_sfh-700-800k.csv,2


In [46]:
#drop all rows with missing baths data
south_bay.dropna(subset=['BATHS'], axis=0, inplace=True)

In [47]:
# view range for num of baths
south_bay['BATHS'].describe()

count    15493.000000
mean         2.274963
std          0.990112
min          0.500000
25%          1.750000
50%          2.000000
75%          2.750000
max         25.000000
Name: BATHS, dtype: float64

BED/BATHS:

In [48]:
# view records with a large number of bedrooms or bathrooms
south_bay[(south_bay['BEDS'] > 10) | (south_bay['BATHS'] > 10)]

Unnamed: 0,SOLD DATE,PROPERTY TYPE,ADDRESS,CITY,STATE OR PROVINCE,ZIP OR POSTAL CODE,PRICE,BEDS,BATHS,LOCATION,...,LOT SIZE,YEAR BUILT,DAYS ON MARKET,$/SQUARE FEET,HOA/MONTH,MLS#,LATITUDE,LONGITUDE,FILENAME,COLLECTION
3882,2018-05-16,Townhouse,530 W 168Th St,Gardena,CA,90248.0,995000,12.0,6.0,116 - North Gateway,...,11648.0,1915.0,616.0,291.0,,SB18077856,33.879114,-118.283701,harbor-gateway-north.csv,1
6417,2018-05-16,Townhouse,530 W 168Th St,Gardena,CA,90248.0,995000,12.0,6.0,116 - North Gateway,...,11648.0,1915.0,619.0,291.0,,SB18077856,33.879114,-118.283701,gardena_01_nosfh_nocondo.csv,1
8890,2018-10-22,Townhouse,3533 W 108Th St,Inglewood,CA,90303.0,1000000,22.0,11.0,102 - South Inglewood,...,9067.0,1927.0,457.0,283.0,,SB18046396,33.93801,-118.333944,inglewood_01_noSFH_nocondo.csv,1
8989,2018-06-22,Townhouse,9725 Crenshaw Blvd,Inglewood,CA,90305.0,1299999,32.0,16.0,699 - Not Defined,...,7192.0,1955.0,579.0,,,RS18001633,33.947687,-118.327046,inglewood_01_noSFH_nocondo.csv,1
9498,2018-10-30,Single Family Residential,1 Buggy Whip Dr,Rolling Hills,CA,90274.0,22400000,9.0,25.0,Rolling Hills,...,322344.0,2001.0,449.0,439.0,6500.0,18-311552,33.760302,-118.356741,rolling-hills.csv,1
9801,2018-04-16,Townhouse,232 E 79Th St,Los Angeles,CA,90003.0,600000,21.0,9.0,C37 - Metropolitan South,...,5555.0,1959.0,649.0,251.0,,DW17199679,33.967653,-118.270592,florence_4_plus_beds.csv,1
10003,2018-02-27,Townhouse,637 E 83Rd St,Los Angeles,CA,90001.0,750000,20.0,12.0,C37 - Metropolitan South,...,5104.0,2015.0,697.0,208.0,,CV17175097,33.963935,-118.263649,florence_4_plus_beds.csv,1
12702,2018-07-06,Townhouse,2413 W Vanderbilt Ln W,Redondo Beach,CA,90278.0,1425000,20.0,16.0,151 - N Redondo Bch/Villas North,...,7491.0,1946.0,580.0,651.0,,SB18135934,33.87203,-118.367361,redondo_townhome-4plus_beds.csv,2
15575,2018-04-11,Townhouse,740 5Th St,San Pedro,CA,90731.0,675000,18.0,9.0,183 - Vista Del Oro,...,5002.0,1902.0,667.0,289.0,,PV17270018,33.739883,-118.293155,san-pedro_townhomes.csv,2
18040,2018-10-22,Single Family Residential,3642 Garnet St,Torrance,CA,90503.0,800000,11.0,7.0,131 - West Torrance,...,9682.0,1973.0,472.0,148.0,,PW18198352,33.842953,-118.350244,torrance_sfh-5-plus-beds.csv,2


In [49]:
# drop the single family residences at 1 Buggy Whip Dr and 3642 Garnet St (matched by address)
south_bay.drop(south_bay[(south_bay['ADDRESS'] == '1 Buggy Whip Dr')|\
                         (south_bay['ADDRESS'] == '3642 Garnet St')].index,axis=0,inplace=True)

LOCATION/NEIGHBORHOOD:

In [50]:
# view location data
south_bay.groupby(['LOCATION'])['SOLD DATE'].count()

LOCATION
101 - North Inglewood        193
102 - South Inglewood        130
103 - Ladera Heights           2
105 - Lennox                  46
107 - Holly Glen/Del Aire    373
                            ... 
West Torrance                 12
West Wilmington                6
Westchester                  515
Westside                       1
Wilmington                     1
Name: SOLD DATE, Length: 207, dtype: int64

In [51]:
# create new "neighborhood" column based on filename
# change string from filename like 'alondra-park_condo.csv' to 'Alondra Park'
south_bay['NEIGHBORHOOD'] = south_bay['FILENAME'].apply(lambda x: x.split("_")[0]\
                                                        .split(".")[0].title().replace('-',' '))
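
To make the string steps easier to follow, here is the same chain as a standalone function applied to filenames of the kind seen in this data. The function name `neighborhood_from_filename` is hypothetical and used only for this sketch.

```python
def neighborhood_from_filename(filename):
    """Mirror the lambda above: keep the text before the first underscore,
    drop any extension, title-case it, and swap hyphens for spaces."""
    return filename.split("_")[0].split(".")[0].title().replace("-", " ")

examples = ["alondra-park_condo.csv", "del-aire.csv", "redondo_sfh-4_plus_beds.csv"]
for name in examples:
    print(name, "->", neighborhood_from_filename(name))
```

Because only the text before the first underscore is kept, a file that starts with `redondo_` yields `Redondo` rather than `Redondo Beach`, so some names may need a follow-up correction.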

In [52]:
# verify new column information
south_bay.groupby(['NEIGHBORHOOD'])['SOLD DATE'].count()

NEIGHBORHOOD
Alondra Park               71
Carson                   1463
Del Aire                  145
El Segundo                252
Florence                  158
Gardena                   642
Harbor City               348
Harbor Gateway North      117
Harbor Gateway South      117
Hawthorne                 756
Hermosa Beach             382
Inglewood                 792
Lawndale                  230
Lennox                     40
Lomita                    353
Manhattan                 166
Manhattan Beach           572
Palos Verdes Estates      346
Playa Del Rey             411
Rancho Palos Verdes       893
Redondo                  1725
Rolling Hills              37
Rolling Hills Estates     486
San Pedro                1118
Torrance                 2751
Watts                     297
Westchester               585
Wilmington                238
Name: SOLD DATE, dtype: int64

In [53]:
# update Redondo to Redondo Beach and Manhattan to Manhattan Beach
south_bay.replace({'NEIGHBORHOOD' : {'Redondo': 'Redondo Beach', 'Manhattan': 'Manhattan Beach'}}, inplace=True)

In [54]:
# drop original location column
south_bay.drop('LOCATION', axis=1, inplace=True)

In [55]:
south_bay.columns

Index(['SOLD DATE', 'PROPERTY TYPE', 'ADDRESS', 'CITY', 'STATE OR PROVINCE',
       'ZIP OR POSTAL CODE', 'PRICE', 'BEDS', 'BATHS', 'SQUARE FEET',
       'LOT SIZE', 'YEAR BUILT', 'DAYS ON MARKET', '$/SQUARE FEET',
       'HOA/MONTH', 'MLS#', 'LATITUDE', 'LONGITUDE', 'FILENAME', 'COLLECTION',
       'NEIGHBORHOOD'],
      dtype='object')

SQUARE FEET:

In [56]:
# view empty data for Square Feet column 
south_bay[south_bay['SQUARE FEET'].isnull()]

Unnamed: 0,SOLD DATE,PROPERTY TYPE,ADDRESS,CITY,STATE OR PROVINCE,ZIP OR POSTAL CODE,PRICE,BEDS,BATHS,SQUARE FEET,...,YEAR BUILT,DAYS ON MARKET,$/SQUARE FEET,HOA/MONTH,MLS#,LATITUDE,LONGITUDE,FILENAME,COLLECTION,NEIGHBORHOOD
2342,2018-06-15,Single Family Residential,1132 Levinson St,Los Angeles,CA,90502.0,557000,5.0,2.0,,...,1959.0,589.0,,,21803605,33.83472,-118.295804,carson_sfh_550_to_600K.csv,1,Carson
3398,2018-10-01,Mobile/Manufactured Home,2350 250Th #58,Lomita,CA,90717.0,45000,1.0,1.0,,...,1980.0,481.0,,,PV18014851,33.797926,-118.324125,lomita_01_nosfh.csv,1,Lomita
3459,2019-03-26,Mobile/Manufactured Home,17701 Avalon Blvd #68,Carson,CA,90746.0,180000,2.0,1.75,,...,,305.0,,730.0,18-402926,33.86934,-118.267047,carson_nosfh_nocondo.csv,1,Carson
3480,2019-07-31,Mobile/Manufactured Home,17700 Avalon Blvd #383,Carson,CA,90746.0,245000,3.0,2.0,,...,2013.0,178.0,,,PV19055877,33.869201,-118.263954,carson_nosfh_nocondo.csv,1,Carson
3481,2019-09-24,Mobile/Manufactured Home,17700 Avalon Blvd #36,Carson,CA,90746.0,150000,2.0,2.0,,...,1975.0,123.0,,,OC19190455,33.869201,-118.263954,carson_nosfh_nocondo.csv,1,Carson
3489,2018-03-05,Mobile/Manufactured Home,1502 E Carson St St #103,Carson,CA,90745.0,175000,1.0,2.0,,...,2004.0,691.0,,,DW18005930,33.831092,-118.24547,carson_nosfh_nocondo.csv,1,Carson
3548,2018-04-17,Mobile/Manufactured Home,1065 Lomita,Harbor City,CA,90710.0,160000,2.0,2.0,,...,1972.0,648.0,,,18-314280,33.80006,-118.29447,carson_nosfh_nocondo.csv,1,Carson
3699,2019-08-29,Mobile/Manufactured Home,437 Carson #57,Carson,CA,90745.0,47500,1.0,1.0,,...,1965.0,149.0,,550.0,PW19157430,33.832616,-118.284374,carson_nosfh_nocondo.csv,1,Carson
3700,2019-09-02,Mobile/Manufactured Home,21207 Avalon Blvd #83,Carson,CA,90746.0,200000,2.0,2.0,,...,1974.0,145.0,,,CV19154513,33.838516,-118.266713,carson_nosfh_nocondo.csv,1,Carson
3707,2019-11-18,Mobile/Manufactured Home,715 W 220Th St #39,Torrance,CA,90502.0,30000,1.0,1.0,,...,1982.0,68.0,,,OC19235767,33.828448,-118.288134,carson_nosfh_nocondo.csv,1,Carson


In [57]:
# view empty square feet column by property type
south_bay[south_bay['SQUARE FEET'].isnull()].groupby(['PROPERTY TYPE'])['SOLD DATE'].count()

PROPERTY TYPE
Mobile/Manufactured Home     52
Single Family Residential     5
Townhouse                     1
Name: SOLD DATE, dtype: int64

In [58]:
# view properties with null square feet and are not mobile homes 
null_sqf = south_bay[(south_bay['SQUARE FEET'].isnull()) & (south_bay['PROPERTY TYPE'] != 'Mobile/Manufactured Home')]
null_sqf

Unnamed: 0,SOLD DATE,PROPERTY TYPE,ADDRESS,CITY,STATE OR PROVINCE,ZIP OR POSTAL CODE,PRICE,BEDS,BATHS,SQUARE FEET,...,YEAR BUILT,DAYS ON MARKET,$/SQUARE FEET,HOA/MONTH,MLS#,LATITUDE,LONGITUDE,FILENAME,COLLECTION,NEIGHBORHOOD
2342,2018-06-15,Single Family Residential,1132 Levinson St,Los Angeles,CA,90502.0,557000,5.0,2.0,,...,1959.0,589.0,,,21803605,33.83472,-118.295804,carson_sfh_550_to_600K.csv,1,Carson
5709,2019-11-04,Single Family Residential,16908 Normandie Ave,Gardena,CA,90247.0,800000,5.0,4.0,,...,2019.0,82.0,,185.0,19-446904,33.878597,-118.299295,gardena_sfh_south_of_red_bch_blvd.csv,1,Gardena
5859,2019-09-13,Single Family Residential,312 W Olive St,Los Angeles,CA,90301.0,610000,3.0,2.0,,...,1940.0,131.0,,,21905072,33.960182,-118.360188,inglewood_04_sfh_3_beds.csv,1,Inglewood
8989,2018-06-22,Townhouse,9725 Crenshaw Blvd,Inglewood,CA,90305.0,1299999,32.0,16.0,,...,1955.0,579.0,,,RS18001633,33.947687,-118.327046,inglewood_01_noSFH_nocondo.csv,1,Inglewood
11381,2019-09-09,Single Family Residential,6015 W 83Rd Pl,Los Angeles,CA,90045.0,1038487,3.0,2.0,,...,1943.0,135.0,,,19-483520,33.963142,-118.389632,westchester_02_SFH_0_to_3_beds.csv,1,Westchester
18050,2018-06-15,Single Family Residential,1132 Levinson St,Los Angeles,CA,90502.0,557000,5.0,2.0,,...,1959.0,601.0,,,21803605,33.83472,-118.295804,torrance_sfh-5-plus-beds.csv,2,Torrance


In [59]:
# drop rows
south_bay.drop(null_sqf.index,inplace=True)

In [60]:
# view square feet for mobile homes that are not null
south_bay[~(south_bay['SQUARE FEET'].isnull()) & (south_bay['PROPERTY TYPE'] == 'Mobile/Manufactured Home')].head()

Unnamed: 0,SOLD DATE,PROPERTY TYPE,ADDRESS,CITY,STATE OR PROVINCE,ZIP OR POSTAL CODE,PRICE,BEDS,BATHS,SQUARE FEET,...,YEAR BUILT,DAYS ON MARKET,$/SQUARE FEET,HOA/MONTH,MLS#,LATITUDE,LONGITUDE,FILENAME,COLLECTION,NEIGHBORHOOD
3234,2019-06-14,Mobile/Manufactured Home,2436 Lomita Blvd #19,Lomita,CA,90717.0,69500,2.0,1.5,800.0,...,1973.0,225.0,87.0,,SB18282786,33.804584,-118.326531,lomita_01_nosfh.csv,1,Lomita
3251,2019-08-09,Mobile/Manufactured Home,24200 Walnut #10,Torrance,CA,90501.0,170000,3.0,2.0,1620.0,...,2002.0,169.0,105.0,,SB19088581,33.806212,-118.310937,lomita_01_nosfh.csv,1,Lomita
3254,2019-02-07,Mobile/Manufactured Home,24100 S Pennsylvania Ave #92,Lomita,CA,90717.0,250000,2.0,1.75,1040.0,...,2005.0,352.0,240.0,,SB18194443,33.805923,-118.325014,lomita_01_nosfh.csv,1,Lomita
3300,2019-12-20,Mobile/Manufactured Home,24200 Walnut #29,Torrance,CA,90501.0,25000,2.0,2.0,1040.0,...,1977.0,36.0,24.0,,CV19262840,33.806188,-118.310948,lomita_01_nosfh.csv,1,Lomita
3301,2020-01-08,Mobile/Manufactured Home,24200 Walnut St #53,Torrance,CA,90501.0,13000,1.0,1.0,448.0,...,1963.0,17.0,29.0,,CV19262056,33.806188,-118.310948,lomita_01_nosfh.csv,1,Lomita


In [61]:
# fill missing mobile home square feet with the mobile home average

# find average square feet for mobile homes (NaN values are skipped by default)
is_mobile = south_bay['PROPERTY TYPE'] == 'Mobile/Manufactured Home'
mh_sqft_mean = south_bay.loc[is_mobile, 'SQUARE FEET'].mean()

# fill na with mean (the only remaining nulls are mobile homes,
# since the other null rows were dropped above)
south_bay['SQUARE FEET'] = south_bay['SQUARE FEET'].fillna(mh_sqft_mean)
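The fill above relies on every remaining null belonging to a mobile home. A slightly more defensive variant, sketched here on toy data (the values are made up and not taken from the Redfin files), restricts the fill to mobile-home rows so that a null in any other property type is left untouched.

```python
import pandas as pd

# toy data, not taken from the Redfin files
toy = pd.DataFrame({
    "PROPERTY TYPE": ["Mobile/Manufactured Home", "Mobile/Manufactured Home",
                      "Mobile/Manufactured Home", "Townhouse"],
    "SQUARE FEET": [800.0, 1200.0, None, None],
})

is_mobile = toy["PROPERTY TYPE"] == "Mobile/Manufactured Home"

# mean of the known mobile-home values (NaN is skipped by default)
mobile_mean = toy.loc[is_mobile, "SQUARE FEET"].mean()

# fill only the mobile-home rows that are missing a value
to_fill = is_mobile & toy["SQUARE FEET"].isna()
toy.loc[to_fill, "SQUARE FEET"] = mobile_mean

print(toy)
```

In this sketch the Townhouse row keeps its missing value, which makes it easy to spot if rows that were meant to be dropped are still present.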

In [62]:
# view Square Feet of 0
south_bay[south_bay['SQUARE FEET'] == 0]

Unnamed: 0,SOLD DATE,PROPERTY TYPE,ADDRESS,CITY,STATE OR PROVINCE,ZIP OR POSTAL CODE,PRICE,BEDS,BATHS,SQUARE FEET,...,YEAR BUILT,DAYS ON MARKET,$/SQUARE FEET,HOA/MONTH,MLS#,LATITUDE,LONGITUDE,FILENAME,COLLECTION,NEIGHBORHOOD
5204,2018-12-13,Single Family Residential,2302 Ozone Ct,Hermosa Beach,CA,90254.0,919000,2.0,2.0,0.0,...,1925.0,408.0,,,SB18268749,33.870561,-118.402754,hermosa-beach_sfh.csv,1,Hermosa Beach
11373,2018-06-27,Single Family Residential,8323 Gonzaga Ave,Los Angeles,CA,90045.0,1200000,3.0,2.0,0.0,...,1938.0,574.0,,,18-344092,33.962519,-118.418026,westchester_02_SFH_0_to_3_beds.csv,1,Westchester
19506,2018-03-01,Single Family Residential,19507 Anza Ave,Torrance,CA,90503.0,935000,1.0,1.0,0.0,...,1974.0,707.0,,,PV18015102,33.853255,-118.363681,torrance_sfh-0-2-beds.csv,2,Torrance


In [63]:
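# update property at 19507 Anza Ave: square feet 1762
# (select by address rather than a hard-coded index label; the row is shown at index 19506 above)
anza = (south_bay['ADDRESS'] == '19507 Anza Ave') & (south_bay['SQUARE FEET'] == 0)
south_bay.loc[anza, 'SQUARE FEET'] = 1762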

In [64]:
#drop other two properties
south_bay.drop(south_bay[south_bay['SQUARE FEET'] == 0].index, axis=0, inplace=True)

LOT SIZE:

In [65]:
# view length of empty data for LOT SIZE column 
len(south_bay[south_bay['LOT SIZE'].isnull()])

555

In [66]:
# view empty lot size by property type
south_bay[south_bay['LOT SIZE'].isnull()].groupby(['PROPERTY TYPE'])['SOLD DATE'].count()

PROPERTY TYPE
Condo/Co-op                  118
Mobile/Manufactured Home     309
Single Family Residential     12
Townhouse                    116
Name: SOLD DATE, dtype: int64

In [67]:
# create list of property types
prop_type = ['Condo/Co-op', 'Mobile/Manufactured Home', 'Single Family Residential', 'Townhouse']

#create list of columns to display
cols_list = ['PROPERTY TYPE','PRICE','BEDS','BATHS','SQUARE FEET','LOT SIZE','YEAR BUILT', \
             'DAYS ON MARKET', '$/SQUARE FEET']

- Condo/Co-op

In [68]:
# look at Condo/Co-op lots
condos = south_bay[south_bay['PROPERTY TYPE'] == prop_type[0]]
condos[cols_list].head(10)

Unnamed: 0,PROPERTY TYPE,PRICE,BEDS,BATHS,SQUARE FEET,LOT SIZE,YEAR BUILT,DAYS ON MARKET,$/SQUARE FEET
415,Condo/Co-op,540000,3.0,2.0,1190.0,102784.0,1968.0,558.0,454.0
416,Condo/Co-op,465000,2.0,2.0,905.0,102784.0,1968.0,208.0,514.0
421,Condo/Co-op,540000,3.0,2.0,1167.0,102539.0,1968.0,194.0,463.0
422,Condo/Co-op,363500,1.0,1.0,695.0,102539.0,1968.0,149.0,523.0
423,Condo/Co-op,454000,2.0,2.0,873.0,102539.0,1968.0,131.0,520.0
424,Condo/Co-op,508500,2.0,2.0,1114.0,102539.0,1968.0,117.0,456.0
425,Condo/Co-op,449000,2.0,2.0,873.0,102539.0,1968.0,90.0,514.0
427,Condo/Co-op,829000,4.0,4.0,1595.0,66543.0,2007.0,49.0,520.0
429,Condo/Co-op,349000,1.0,1.0,595.0,102539.0,1968.0,301.0,587.0
430,Condo/Co-op,505000,2.0,2.0,929.0,102784.0,1968.0,409.0,544.0


In [69]:
condos['LOT SIZE'].describe()

count    2.489000e+03
mean     1.160418e+06
std      3.152981e+07
min      4.000000e+00
25%      2.472400e+04
50%      6.525400e+04
75%      1.769820e+05
max      1.148808e+09
Name: LOT SIZE, dtype: float64

- Mobile/Manufactured Homes

In [70]:
# look at Mobile Home lots that do have lot size data
mobile = south_bay[(south_bay['PROPERTY TYPE'] == prop_type[1]) & (~south_bay['LOT SIZE'].isnull())]
mobile[cols_list].head(10)

Unnamed: 0,PROPERTY TYPE,PRICE,BEDS,BATHS,SQUARE FEET,LOT SIZE,YEAR BUILT,DAYS ON MARKET,$/SQUARE FEET
3324,Mobile/Manufactured Home,165000,2.0,2.0,800.0,80889.0,2013.0,180.0,206.0
3327,Mobile/Manufactured Home,70000,2.0,1.0,800.0,58237.0,1955.0,715.0,88.0
3391,Mobile/Manufactured Home,73000,3.0,2.0,1200.0,268225.0,2003.0,267.0,61.0
3448,Mobile/Manufactured Home,109900,2.0,2.0,944.0,44256.0,2005.0,318.0,116.0
3455,Mobile/Manufactured Home,178000,2.0,1.75,1000.0,2257.0,1991.0,149.0,178.0
3458,Mobile/Manufactured Home,183500,2.0,1.75,1784.0,3113589.0,1979.0,339.0,103.0
3478,Mobile/Manufactured Home,139000,2.0,2.0,1400.0,3000.0,1979.0,227.0,99.0
3498,Mobile/Manufactured Home,230000,3.0,2.0,1231.0,2400.0,2005.0,712.0,187.0
3503,Mobile/Manufactured Home,173000,2.0,2.0,1200.0,72791.0,1976.0,367.0,144.0
3519,Mobile/Manufactured Home,192000,2.0,2.0,1250.0,2284648.0,2007.0,486.0,154.0


In [71]:
mobile['LOT SIZE'].describe()

count    4.000000e+01
mean     4.867528e+05
std      8.588946e+05
min      1.302000e+03
25%      3.737500e+03
50%      7.980150e+04
75%      3.696890e+05
max      3.113589e+06
Name: LOT SIZE, dtype: float64

- Single Family Residential

In [72]:
# look at Single Family Residential lots
sfr = south_bay[south_bay['PROPERTY TYPE'] == prop_type[2]]
sfr[cols_list].head(10)

Unnamed: 0,PROPERTY TYPE,PRICE,BEDS,BATHS,SQUARE FEET,LOT SIZE,YEAR BUILT,DAYS ON MARKET,$/SQUARE FEET
1,Single Family Residential,730000,7.0,5.0,3401.0,6651.0,2008.0,358.0,215.0
9,Single Family Residential,547000,4.0,2.0,1948.0,5399.0,1962.0,604.0,281.0
10,Single Family Residential,774000,5.0,3.5,2900.0,5857.0,1940.0,86.0,267.0
14,Single Family Residential,537500,4.0,2.0,1431.0,5238.0,1953.0,312.0,376.0
15,Single Family Residential,525000,4.0,3.0,1312.0,6246.0,1913.0,246.0,400.0
17,Single Family Residential,515000,5.0,2.0,1822.0,6701.0,1905.0,128.0,283.0
22,Single Family Residential,480000,4.0,2.0,1245.0,4800.0,1952.0,2.0,386.0
23,Single Family Residential,500000,4.0,2.0,1476.0,4806.0,1950.0,243.0,339.0
25,Single Family Residential,490000,4.0,2.0,1589.0,4806.0,1948.0,30.0,308.0
26,Single Family Residential,485000,4.0,2.0,1595.0,5570.0,1962.0,333.0,304.0


In [73]:
sfr['LOT SIZE'].describe()

count    9.913000e+03
mean     8.636242e+04
std      4.021049e+06
min      6.500000e+02
25%      5.074000e+03
50%      5.825000e+03
75%      7.206000e+03
max      2.426292e+08
Name: LOT SIZE, dtype: float64

- Townhome

In [74]:
# look at Townhome lots
townhome = south_bay[south_bay['PROPERTY TYPE'] == prop_type[3]]
townhome[cols_list].head(10)

Unnamed: 0,PROPERTY TYPE,PRICE,BEDS,BATHS,SQUARE FEET,LOT SIZE,YEAR BUILT,DAYS ON MARKET,$/SQUARE FEET
12,Townhouse,619900,4.0,2.0,1332.0,6605.0,1941.0,285.0,465.0
13,Townhouse,460000,5.0,4.0,1700.0,6402.0,1926.0,712.0,271.0
24,Townhouse,468000,4.0,3.0,1434.0,4816.0,1924.0,393.0,326.0
38,Townhouse,455000,4.0,2.0,1642.0,5205.0,1923.0,565.0,277.0
51,Townhouse,485000,4.0,2.0,1470.0,5020.0,1922.0,527.0,330.0
54,Townhouse,550000,4.0,4.0,2600.0,5999.0,1911.0,437.0,212.0
70,Townhouse,440000,4.0,2.0,1340.0,4759.0,1928.0,584.0,328.0
94,Townhouse,569000,4.0,3.0,1960.0,7514.0,1927.0,491.0,290.0
419,Townhouse,695000,3.0,3.0,1374.0,66632.0,2007.0,645.0,506.0
432,Townhouse,925000,4.0,2.0,1612.0,6704.0,1948.0,651.0,574.0


In [75]:
townhome['LOT SIZE'].describe()

count    2.485000e+03
mean     7.945422e+05
std      2.002350e+07
min      1.000000e+02
25%      6.465000e+03
50%      1.201900e+04
75%      5.873000e+04
max      6.758334e+08
Name: LOT SIZE, dtype: float64

- Fill NA

In [76]:
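# fill NAs with median value for each property type
# (transform returns a Series aligned to south_bay's index, so the fill lines up row by row)
type_median = south_bay.groupby('PROPERTY TYPE')['LOT SIZE'].transform('median')
south_bay['LOT SIZE'] = south_bay['LOT SIZE'].fillna(type_median)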
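To make the per-type median fill concrete, here is a minimal sketch on toy data (the numbers are invented for illustration). It uses `groupby(...).transform('median')`, which returns one value per original row and so aligns directly with the column being filled.

```python
import pandas as pd

toy = pd.DataFrame({
    "PROPERTY TYPE": ["Condo/Co-op", "Condo/Co-op", "Condo/Co-op",
                      "Townhouse", "Townhouse", "Townhouse"],
    "LOT SIZE": [100.0, 300.0, None, 10.0, None, 30.0],
})

# for each row, the median lot size of that row's property type
group_median = toy.groupby("PROPERTY TYPE")["LOT SIZE"].transform("median")

# missing values take the median of their own group
toy["LOT SIZE"] = toy["LOT SIZE"].fillna(group_median)
print(toy)
```

The median is a reasonable choice here because the `describe()` outputs above show a few very large lot sizes pulling each mean far above the 50% value.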

YEAR BUILT:

In [77]:
# find empty years
no_year = south_bay[south_bay['YEAR BUILT'].isnull()]
no_year.to_csv('../data/interim/no year.csv')

In [78]:
# drop empty years
south_bay.dropna(subset=['YEAR BUILT'], axis=0, inplace=True)

In [79]:
# check for year built outside of 1818 to 2020 range
year_check = (south_bay['YEAR BUILT'] < 1818) | (south_bay['YEAR BUILT'] > 2020)
print(south_bay.loc[:,['SOLD DATE', 'ADDRESS', 'PROPERTY TYPE', 'YEAR BUILT','FILENAME']][year_check])

      SOLD DATE             ADDRESS              PROPERTY TYPE  YEAR BUILT  \
5373 2019-11-06  701 Longfellow Ave  Single Family Residential      2021.0   
7246 2019-11-06  701 Longfellow Ave  Single Family Residential      2021.0   

                          FILENAME  
5373         hermosa-beach_sfh.csv  
7246  manhattan-beach_2.5M_max.csv  


In [80]:
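# drop property built in 2021
# (use the year_check mask rather than hard-coded labels; the rows are shown at index 5373 and 7246 above)
south_bay.drop(index=south_bay[year_check].index, inplace=True)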

$/SQUARE FEET:

In [81]:
# pull empty data about $/SQUARE FEET
ppsqf_nulls = south_bay[south_bay['$/SQUARE FEET'].isnull()]

In [82]:
# create formula to fill $/SQUARE FEET
ppsqf_fill = south_bay['PRICE'] / south_bay['SQUARE FEET']

In [83]:
# fill na with ppsqf results (assign back instead of calling fillna in place on a column selection)
south_bay['$/SQUARE FEET'] = south_bay['$/SQUARE FEET'].fillna(ppsqf_fill)
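The fill works because `fillna` accepts a Series and aligns it on the index, so only rows that are missing a value pick up the computed one. A small sketch on toy data (values invented for illustration):

```python
import pandas as pd

toy = pd.DataFrame({
    "PRICE": [500000, 300000, 450000],
    "SQUARE FEET": [1000.0, 1500.0, 900.0],
    "$/SQUARE FEET": [500.0, None, None],
})

# candidate value for every row
computed = toy["PRICE"] / toy["SQUARE FEET"]

# fillna with a Series aligns on the index, so only the missing rows change
toy["$/SQUARE FEET"] = toy["$/SQUARE FEET"].fillna(computed)
print(toy)
```

The existing values in this notebook look rounded to whole dollars (730000 / 3401 is about 214.6 and is shown as 215.0), so applying `.round()` to the computed values may keep the column consistent. That is a suggestion, not something the original cells do.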

HOA/MONTH:

In [84]:
# view empty data for HOA/MONTH column 
south_bay[south_bay['HOA/MONTH'].isnull()]

Unnamed: 0,SOLD DATE,PROPERTY TYPE,ADDRESS,CITY,STATE OR PROVINCE,ZIP OR POSTAL CODE,PRICE,BEDS,BATHS,SQUARE FEET,...,YEAR BUILT,DAYS ON MARKET,$/SQUARE FEET,HOA/MONTH,MLS#,LATITUDE,LONGITUDE,FILENAME,COLLECTION,NEIGHBORHOOD
1,2019-02-01,Single Family Residential,1641 Bay View Ave,Wilmington,CA,90744.0,730000,7.0,5.0,3401.0,...,2008.0,358.0,215.0,,SB18278853,33.796254,-118.271532,wilmington_4_plus_beds.csv,1,Wilmington
9,2018-05-31,Single Family Residential,1410 W Sandison St,Wilmington,CA,90744.0,547000,4.0,2.0,1948.0,...,1962.0,604.0,281.0,,SB18091442,33.792195,-118.280823,wilmington_4_plus_beds.csv,1,Wilmington
10,2019-10-31,Single Family Residential,1703 N Marine Ave,Wilmington,CA,90744.0,774000,5.0,3.5,2900.0,...,1940.0,86.0,267.0,,PW19223929,33.797547,-118.265430,wilmington_4_plus_beds.csv,1,Wilmington
12,2019-04-15,Townhouse,1702 N Neptune Ave,Wilmington,CA,90744.0,619900,4.0,2.0,1332.0,...,1941.0,285.0,465.0,,CV18258904,33.797256,-118.270032,wilmington_4_plus_beds.csv,1,Wilmington
13,2018-02-12,Townhouse,721 Pioneer Ave,Wilmington,CA,90744.0,460000,5.0,4.0,1700.0,...,1926.0,712.0,271.0,,DW18004410,33.779983,-118.248288,wilmington_4_plus_beds.csv,1,Wilmington
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
19521,2018-11-09,Single Family Residential,2445 251St St,Lomita,CA,90717.0,800000,2.0,1.0,1030.0,...,1915.0,454.0,777.0,,18-387308,33.797001,-118.326949,torrance_sfh-0-2-beds.csv,2,Torrance
19523,2019-12-13,Single Family Residential,3112 Winlock Rd,Torrance,CA,90505.0,915000,3.0,2.0,1576.0,...,1950.0,55.0,581.0,,19-519406,33.794174,-118.341834,torrance_sfh-0-2-beds.csv,2,Torrance
19524,2018-10-19,Single Family Residential,18334 Falda Ave,Torrance,CA,90504.0,573500,2.0,1.0,873.0,...,1953.0,475.0,657.0,,18-362202,33.863952,-118.329735,torrance_sfh-0-2-beds.csv,2,Torrance
19525,2020-01-02,Single Family Residential,1615 Juniper Ave,Torrance,CA,90503.0,839000,2.0,1.0,1230.0,...,1949.0,35.0,682.0,,SB19261266,33.832390,-118.337177,torrance_sfh-0-2-beds.csv,2,Torrance


In [85]:
#view HOA/MONTH column with 0 values
south_bay[south_bay['HOA/MONTH'] == 0]

Unnamed: 0,SOLD DATE,PROPERTY TYPE,ADDRESS,CITY,STATE OR PROVINCE,ZIP OR POSTAL CODE,PRICE,BEDS,BATHS,SQUARE FEET,...,YEAR BUILT,DAYS ON MARKET,$/SQUARE FEET,HOA/MONTH,MLS#,LATITUDE,LONGITUDE,FILENAME,COLLECTION,NEIGHBORHOOD
3607,2018-11-02,Mobile/Manufactured Home,17701 S Avalon Blvd #334,Carson,CA,90746.0,295000,3.0,2.0,1814.0,...,1979.0,449.0,163.0,0.0,SB18233065,33.86934,-118.267047,carson_nosfh_nocondo.csv,1,Carson
9504,2019-10-29,Single Family Residential,31 Chuckwagon Rd,Rolling Hills,CA,90274.0,2500000,4.0,3.0,3106.0,...,1959.0,85.0,805.0,0.0,PV19135556,33.761317,-118.339666,rolling-hills.csv,1,Rolling Hills
9508,2019-03-21,Single Family Residential,23 Georgeff Rd,Rolling Hills,CA,90274.0,1990000,3.0,2.5,2559.0,...,1957.0,307.0,778.0,0.0,PV18260415,33.759767,-118.344597,rolling-hills.csv,1,Rolling Hills
9522,2019-08-27,Single Family Residential,70 Portuguese Bend Rd,Rolling Hills,CA,90274.0,1335000,4.0,3.0,3122.0,...,1949.0,148.0,428.0,0.0,PV17256560,33.747823,-118.351628,rolling-hills.csv,1,Rolling Hills
9524,2019-10-21,Single Family Residential,8 Pine Tree Ln,Rolling Hills,CA,90274.0,2725000,3.0,2.25,2812.0,...,1952.0,93.0,969.0,0.0,PV18062471,33.767945,-118.348729,rolling-hills.csv,1,Rolling Hills
9528,2018-04-12,Single Family Residential,2316 Via Carrillo,Palos Verdes Estates,CA,90274.0,1675000,4.0,3.0,2399.0,...,1964.0,653.0,698.0,0.0,PV18043994,33.77556,-118.414781,palos-verdes-estates_4_plus_beds.csv,1,Palos Verdes Estates
10821,2019-03-19,Single Family Residential,19 Buckskin Ln,Rolling Hills Estates,CA,90274.0,1900000,4.0,2.0,2016.0,...,1947.0,312.0,942.0,0.0,PV18287807,33.773739,-118.332885,rolling-hills-estates_01_sfh.csv,1,Rolling Hills Estates


In [86]:
# fill NA HOA/MONTH values with 0 (assumes a missing value means no HOA fee)
south_bay['HOA/MONTH'] = south_bay['HOA/MONTH'].fillna(0)

#### DUPLICATE VALUES

- We'll first check for duplicates on all columns except MLS# and the newly created ones (FILENAME, COLLECTION, NEIGHBORHOOD)

In [87]:
# create subset columns to check for duplicates 
subset_cols = ['SOLD DATE', 'PROPERTY TYPE', 'ADDRESS', 'CITY', 'STATE OR PROVINCE',\
               'ZIP OR POSTAL CODE', 'PRICE', 'BEDS', 'BATHS', 'SQUARE FEET',\
               'LOT SIZE', 'YEAR BUILT', 'DAYS ON MARKET', '$/SQUARE FEET',\
               'HOA/MONTH', 'LATITUDE', 'LONGITUDE']

In [88]:
# display duplicates and save to file to inspect
sb_dupes = south_bay[south_bay.duplicated(subset=subset_cols , keep=False)]
sb_dupes = sb_dupes.sort_values(by=['ADDRESS'])
sb_dupes.to_csv('../data/interim/sb_dupes1.csv')

In [89]:
# create subset with duplicated values & update neighborhood value with name of city
dupe_bool = south_bay.duplicated(subset = subset_cols, keep=False)
south_bay.loc[dupe_bool,'NEIGHBORHOOD'] = south_bay['CITY']

In [90]:
# verify one address from original duplicate file to check neighborhood change
south_bay[south_bay['ADDRESS'] == '10019 S Burl Ave Unit N']

Unnamed: 0,SOLD DATE,PROPERTY TYPE,ADDRESS,CITY,STATE OR PROVINCE,ZIP OR POSTAL CODE,PRICE,BEDS,BATHS,SQUARE FEET,...,YEAR BUILT,DAYS ON MARKET,$/SQUARE FEET,HOA/MONTH,MLS#,LATITUDE,LONGITUDE,FILENAME,COLLECTION,NEIGHBORHOOD
1024,2018-09-21,Single Family Residential,10019 S Burl Ave Unit N,Inglewood,CA,90304.0,530000,2.0,1.0,832.0,...,1942.0,488.0,637.0,0.0,218010629,33.944791,-118.362654,inglewood_03_sfh_2_beds_or_less.csv,1,Inglewood
5024,2018-09-21,Single Family Residential,10019 S Burl Ave Unit N,Inglewood,CA,90304.0,530000,2.0,1.0,832.0,...,1942.0,488.0,637.0,0.0,218010629,33.944791,-118.362654,lennox.csv,1,Inglewood


In [91]:
# drop duplicates
south_bay.drop_duplicates(subset=subset_cols,keep='first',inplace=True)
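The two `keep` settings used above do different jobs: `keep=False` flags every member of a duplicate group (useful for relabeling), while `keep='first'` retains one row per group. A minimal sketch on toy data, with invented addresses and a shortened subset list:

```python
import pandas as pd

toy = pd.DataFrame({
    "ADDRESS": ["1 Main St", "1 Main St", "2 Oak Ave"],
    "CITY": ["Inglewood", "Inglewood", "Carson"],
    "FILENAME": ["inglewood.csv", "lennox.csv", "carson.csv"],
    "NEIGHBORHOOD": ["Inglewood", "Lennox", "Carson"],
})
subset = ["ADDRESS", "CITY"]

# keep=False marks every member of a duplicate group
all_dupes = toy.duplicated(subset=subset, keep=False)

# relabel ambiguous rows with their city before one copy is dropped
toy.loc[all_dupes, "NEIGHBORHOOD"] = toy.loc[all_dupes, "CITY"]

# keep='first' retains one row per duplicate group
deduped = toy.drop_duplicates(subset=subset, keep="first")
print(deduped)
```

Relabeling before dropping means the surviving row no longer depends on which search file happened to be read first.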

- Next, we'll look for duplicates based on MLS#. The MLS# should be a unique value. 

In [92]:
# look at MLS# duplicates
sb_dupes2 = south_bay[south_bay.duplicated(subset= 'MLS#', keep=False)]
sb_dupes2.sort_values(by=['ADDRESS']).to_csv('../data/interim/sb_dupes2.csv')

Upon inspecting the CSV output, the values that differ between the duplicate rows appear to be the neighborhood and the days on market columns. We will keep the row with the maximum days on market, update the neighborhood values as before, and drop the other rows.

In [93]:
# create subset with duplicated values & update neighborhood value as before
dupe_bool = south_bay.duplicated(subset = 'MLS#', keep=False)
south_bay.loc[dupe_bool,'NEIGHBORHOOD'] = south_bay['CITY']

In [94]:
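# sort values then drop duplicates, keeping the row with the max days on market
# (sort_values returns a new frame, so the result has to be used; sort_index restores the row order)
south_bay = south_bay.sort_values(['ADDRESS','DAYS ON MARKET'])\
                     .drop_duplicates('MLS#', keep='last').sort_index()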
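The "keep the maximum days on market" rule depends on the rows actually being sorted when `drop_duplicates` runs. `sort_values` returns a new frame rather than sorting in place, so the calls need to be chained or the result assigned. A minimal sketch on toy data (values invented for illustration):

```python
import pandas as pd

toy = pd.DataFrame({
    "MLS#": ["A1", "B2", "A1"],
    "ADDRESS": ["1 Main St", "2 Oak Ave", "1 Main St"],
    "DAYS ON MARKET": [608.0, 100.0, 596.0],
})

deduped = (
    toy.sort_values(["ADDRESS", "DAYS ON MARKET"])  # returns a new, sorted frame
       .drop_duplicates("MLS#", keep="last")        # last row per MLS# has the max
       .sort_index()                                # restore the original row order
)
print(deduped)
```

Without the sort, `keep='last'` would retain the row with 596 days here, simply because it appears later in the frame.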

In [95]:
# verify one address from original MLS duplicate file to see Days on Market change
# days on market should be 608
south_bay[south_bay['ADDRESS'] == '1001 Park Circle Dr']

Unnamed: 0,SOLD DATE,PROPERTY TYPE,ADDRESS,CITY,STATE OR PROVINCE,ZIP OR POSTAL CODE,PRICE,BEDS,BATHS,SQUARE FEET,...,YEAR BUILT,DAYS ON MARKET,$/SQUARE FEET,HOA/MONTH,MLS#,LATITUDE,LONGITUDE,FILENAME,COLLECTION,NEIGHBORHOOD
11706,2018-06-08,Condo/Co-op,1001 Park Circle Dr,Torrance,CA,90502.0,570000,3.0,2.5,1624.0,...,1984.0,608.0,351.0,315.0,SB18086197,33.814929,-118.293785,torrance_condos-500K_plus.csv,2,Torrance


In [96]:
# re-run the subset duplicate check after the MLS# pass
south_bay.drop_duplicates(subset=subset_cols,keep='first',inplace=True)

#### VARIABLE TYPES

In [97]:
# change variable types for: ZIP OR POSTAL CODE, YEAR BUILT
south_bay['ZIP OR POSTAL CODE'] = south_bay['ZIP OR POSTAL CODE'].astype(int)
south_bay['YEAR BUILT'] = south_bay['YEAR BUILT'].astype(int)
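`astype(int)` raises an error if the column still contains a missing value, so it can help to count nulls first. The sketch below, on toy data, also shows pandas' nullable `'Int64'` dtype as one possible alternative when a missing entry has to be kept. Whether the ZIP column has any nulls at this point is not shown in this notebook, so treat this as a precaution rather than a required step.

```python
import pandas as pd

zips = pd.Series([90744.0, 90502.0, None])

# plain int cannot represent NaN, so count missing values first
n_missing = int(zips.isna().sum())

# the nullable 'Int64' dtype keeps missing entries as <NA>
zips_nullable = zips.astype("Int64")
print(n_missing, zips_nullable.tolist())
```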

#### INSPECT FINAL DATA FRAME

In [98]:
# drop filename, state and collection columns
south_bay.drop(['STATE OR PROVINCE','FILENAME','COLLECTION'], axis=1,inplace=True)

In [99]:
# reorder columns to move PRICE ahead of ZIP OR POSTAL CODE (MLS# is kept)

cols = ['SOLD DATE', 'PROPERTY TYPE', 'ADDRESS', 'CITY', 'PRICE',
       'ZIP OR POSTAL CODE', 'BEDS', 'BATHS', 'SQUARE FEET',
       'LOT SIZE', 'YEAR BUILT', 'DAYS ON MARKET', '$/SQUARE FEET',
       'HOA/MONTH', 'MLS#', 'LATITUDE', 'LONGITUDE', 'NEIGHBORHOOD']

south_bay = south_bay[cols]

In [100]:
# reset index
south_bay = south_bay.reset_index(drop=True)

In [101]:
south_bay.head()

Unnamed: 0,SOLD DATE,PROPERTY TYPE,ADDRESS,CITY,PRICE,ZIP OR POSTAL CODE,BEDS,BATHS,SQUARE FEET,LOT SIZE,YEAR BUILT,DAYS ON MARKET,$/SQUARE FEET,HOA/MONTH,MLS#,LATITUDE,LONGITUDE,NEIGHBORHOOD
0,2019-02-01,Single Family Residential,1641 Bay View Ave,Wilmington,730000,90744,7.0,5.0,3401.0,6651.0,2008,358.0,215.0,0.0,SB18278853,33.796254,-118.271532,Wilmington
1,2018-05-31,Single Family Residential,1410 W Sandison St,Wilmington,547000,90744,4.0,2.0,1948.0,5399.0,1962,604.0,281.0,0.0,SB18091442,33.792195,-118.280823,Wilmington
2,2019-10-31,Single Family Residential,1703 N Marine Ave,Wilmington,774000,90744,5.0,3.5,2900.0,5857.0,1940,86.0,267.0,0.0,PW19223929,33.797547,-118.26543,Wilmington
3,2019-04-15,Townhouse,1702 N Neptune Ave,Wilmington,619900,90744,4.0,2.0,1332.0,6605.0,1941,285.0,465.0,0.0,CV18258904,33.797256,-118.270032,Wilmington
4,2018-02-12,Townhouse,721 Pioneer Ave,Wilmington,460000,90744,5.0,4.0,1700.0,6402.0,1926,712.0,271.0,0.0,DW18004410,33.779983,-118.248288,Wilmington


In [102]:
south_bay.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 13631 entries, 0 to 13630
Data columns (total 18 columns):
 #   Column              Non-Null Count  Dtype         
---  ------              --------------  -----         
 0   SOLD DATE           13631 non-null  datetime64[ns]
 1   PROPERTY TYPE       13631 non-null  object        
 2   ADDRESS             13631 non-null  object        
 3   CITY                13631 non-null  object        
 4   PRICE               13631 non-null  int64         
 5   ZIP OR POSTAL CODE  13631 non-null  int64         
 6   BEDS                13631 non-null  float64       
 7   BATHS               13631 non-null  float64       
 8   SQUARE FEET         13631 non-null  float64       
 9   LOT SIZE            13631 non-null  float64       
 10  YEAR BUILT          13631 non-null  int64         
 11  DAYS ON MARKET      13631 non-null  float64       
 12  $/SQUARE FEET       13631 non-null  float64       
 13  HOA/MONTH           13631 non-null  float64   

In [103]:
south_bay.to_csv('../data/processed/south_bay_cleaned.csv', index=False)
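CSV files do not store dtypes, so the `SOLD DATE` column will come back as plain strings unless it is parsed on load. The sketch below shows the round trip on toy data, using an in-memory buffer as a stand-in for the processed file.

```python
import io
import pandas as pd

toy = pd.DataFrame({
    "SOLD DATE": pd.to_datetime(["2019-02-01", "2018-05-31"]),
    "PRICE": [730000, 547000],
})

buffer = io.StringIO()  # in-memory stand-in for the CSV file
toy.to_csv(buffer, index=False)
buffer.seek(0)

# without parse_dates the date column would come back as plain strings
reloaded = pd.read_csv(buffer, parse_dates=["SOLD DATE"])
print(reloaded.dtypes)
```

Any later notebook that reads the cleaned file could pass the same `parse_dates` argument to `pd.read_csv`.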