# Locations Information

In this notebook, we use the Google Maps API, which requires a Google Maps API key. Replicating this work requires a Google Cloud Platform account with the Google Maps API enabled. Because that is likely impractical for most readers, this notebook is primarily informational. The resulting data is exported to a CSV file with location coding information.

In [1]:
import pandas as pd
import numpy as np
import googlemaps
from datetime import datetime
from haversine import haversine
import os

#### First, we load in the Craigslist data that was pulled in the "Craigslist_Data_Pull" file. 

We first perform a conditional check that decides which data to use. If the data does not yet exist, the notebook falls back to a one-time data pull that was done for EDA, so that those results can be replicated (since commentary was made on them).

The fallback is only meant to show that this data was originally used for EDA. In practice the data **does** exist, and we update it with the latest Craigslist refresh to get daily data.
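The check described above can be sketched as a small helper. This is a minimal sketch rather than the notebook's own code: the function name `choose_input_path` and its three parameter names are hypothetical, and the caller supplies every path.

```python
import os

def choose_input_path(marker_path, refresh_path, fallback_path):
    """Return refresh_path if marker_path exists on disk, else fallback_path.

    All names here are hypothetical and chosen only for illustration.
    """
    if os.path.exists(marker_path):
        return refresh_path
    return fallback_path
```

Keeping the decision in one function means the same rule could be reused when choosing the export path at the end of the notebook.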

In [2]:
# File paths
original_file_path = '../Data/Locations_Data_Added_For_EDA.csv'
daily_refresh_file_path = '../Data/refreshed_craigslist_data.csv'

In [3]:
# Check whether the EDA export already exists
if os.path.exists(original_file_path):
    # If it exists, load the latest daily refresh
    df = pd.read_csv(daily_refresh_file_path)
    print(f'File used: {daily_refresh_file_path}')
else:
    # Otherwise, fall back to the one-time data pull used for EDA.
    # NOTE: existing_file_path is not defined anywhere in this notebook; it must be
    # set to the path of that one-time pull before this branch can run.
    df = pd.read_csv(existing_file_path)
    print(f'File used: {existing_file_path}')

File used: ../Data/refreshed_craigslist_data.csv


#### Next, we set up and load in the API information.

In [4]:
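# Read the API key associated with my account from an environment variable
# rather than hardcoding it (the variable name GOOGLE_MAPS_API_KEY is an assumption)
google_api_key = os.environ['GOOGLE_MAPS_API_KEY']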

In [5]:
# Set up the API key in Google Maps
gmaps = googlemaps.Client(key=google_api_key)

#### Below, we write a function that geocodes an address into a latitude and longitude.

When we call the function, we map it over every address in the Craigslist data, which produces a sequence of (latitude, longitude) tuples. zip(*) then unpacks that sequence into two parallel tuples, one of latitudes and one of longitudes, which we assign to the latitude and longitude columns. This is more concise than building the two columns in an explicit for loop.
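As a small illustration of what zip(*) does here, the sketch below uses a hand-written list of pairs in place of real geocoding results. The variable names are illustrative only, and the `None` pair stands in for an address that could not be geocoded.

```python
# Each element is a (latitude, longitude) pair, shaped like the tuples
# that geocode_address returns.
pairs = [(33.99809, -118.475972), (None, None), (34.029804, -118.448669)]

# zip(*pairs) transposes the list of pairs into one tuple per coordinate.
latitudes, longitudes = zip(*pairs)

print(latitudes)
print(longitudes)
```

Each resulting tuple has one entry per address, in the original order, which is why the two can be assigned directly to DataFrame columns.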

In [6]:
# Geocode a single address and return its (latitude, longitude)
def geocode_address(address):
    # Geocode the address using Google Maps API
    geocode_result = gmaps.geocode(address)

    if geocode_result:
        lat = geocode_result[0]['geometry']['location']['lat']
        long = geocode_result[0]['geometry']['location']['lng']
        return lat, long
    else:
        return None, None  # Return None for both values if geocoding fails

In [7]:
df['latitude'], df['longitude'] = zip(*df['Full Address'].map(geocode_address))

#### Let's confirm we got latitude and longitude information for each available address.

In [8]:
# Checking the dataframe to view a sample of what we pulled
df[['Full Address', 'latitude', 'longitude']].head()

Unnamed: 0,Full Address,latitude,longitude
0,None listed,,
1,"237 Fourth Avenue, Venice, CA 90291",33.99809,-118.475972
2,"11916 West Pico Boulevard, Los Angeles, CA 90064",34.029804,-118.448669
3,"972 Hilgard Ave, Los Angeles, CA 90024",34.061724,-118.441019
4,"1720 Pacific Ave, Venice, CA 90291",33.987065,-118.470748


#### Next, we begin the process of pulling store information to compare with our addresses.

We start by writing a function that converts miles to meters. We need to do this in order to use the distance as the default radius in the upcoming "find_stores" function.

In [9]:
# Convert the radius from miles to meters
def miles_to_meters(miles):
    return miles * 1609.34

#### Set up Santa Monica's distance specifications.

The code below uses the same area as the Craigslist map we used. The center point is in Santa Monica, and the search area is a circle with a 3.6-mile radius around it.

In [10]:
# Santa Monica's latitude and longitude
santa_monica_lat = 34.0259
santa_monica_lng = -118.4965
santa_monica_location = (santa_monica_lat, santa_monica_lng)
default_radius = miles_to_meters(3.6)  # 3.6-mile radius around Santa Monica, in meters

#### Function to find stores using the store's name as input.

Below, we write a function that takes a list of store names, such as "Whole Foods" and "Erewhon", and appends every matching store within the radius we have set up to a list that we pass in.

In [11]:
# Define the function to find stores
def find_stores(store_list, store_address_info):

    for the_store in store_list:
    
        # Perform a nearby search for stores around Santa Monica within a 3.6-mile radius
        results = gmaps.places_nearby(location=santa_monica_location, 
                                      keyword=the_store, 
                                      radius=default_radius)
        
        # Extract the name, address, and coordinates of each result
        for place in results['results']:
            store_info = {
                'Name': place['name'],
                'Address': place.get('vicinity', 'Address not provided'),
                'Latitude': place['geometry']['location']['lat'],
                'Longitude': place['geometry']['location']['lng']
            }
            
            # Append store information to the list
            store_address_info.append(store_info)

#### The next three cells will use this function to find premium, mid-tier, and budget grocery stores.

We will use all three types of stores in our analysis. There is some debate about how to categorize each store. Some stores, like Trader Joe's, do not fit into a traditional cost "tier": given its unique brand, some people might prefer Trader Joe's over premium options even if money is not an issue. Store selection could therefore be done more systematically. For simplicity and as a proof of concept, however, this analysis relies on the author's "tribal" knowledge of California grocery store options. An expanded version of this project would categorize stores with a more systematic approach.

In [12]:
# Initialize an empty list to store grocery store information
premium_grocery_stores_address_info = []

# List of grocery stores to be included
search_premium_grocery_stores = ['Whole Foods Market', 'Erewhon', 'Bristol Farms']

# Call the function 
find_stores(search_premium_grocery_stores, premium_grocery_stores_address_info)

# Print the list of stores to verify
for store in premium_grocery_stores_address_info:
    print(store)

{'Name': 'Whole Foods Market', 'Address': '11666 National Blvd, Los Angeles', 'Latitude': 34.0224904, 'Longitude': -118.4379608}
{'Name': 'Whole Foods Market', 'Address': '1050 Gayley Ave, Los Angeles', 'Latitude': 34.0611873, 'Longitude': -118.4469309}
{'Name': 'Whole Foods Market', 'Address': '225 Lincoln Blvd, Venice', 'Latitude': 34.0011613, 'Longitude': -118.4698813}
{'Name': 'Whole Foods Market', 'Address': '2201 Wilshire Blvd, Santa Monica', 'Latitude': 34.0332265, 'Longitude': -118.4812706}
{'Name': 'Whole Foods Market', 'Address': '11737 San Vicente Blvd, Los Angeles', 'Latitude': 34.0536284, 'Longitude': -118.4673501}
{'Name': 'Whole Foods Market', 'Address': '2121 Cloverfield Blvd, Santa Monica', 'Latitude': 34.0220438, 'Longitude': -118.4656809}
{'Name': 'Whole Foods Market', 'Address': '1425 Montana Ave, Santa Monica', 'Latitude': 34.0328379, 'Longitude': -118.4946332}
{'Name': 'Erewhon', 'Address': '585 Venice Blvd., Venice', 'Latitude': 33.9897008, 'Longitude': -118.4620

In [13]:
# Initialize an empty list to store grocery store information
midTier_grocery_stores_address_info = []

# List of grocery stores to be included
search_midTier_grocery_stores = ['Ralphs Fresh Fare', 'Vons', 'Trader Joe\'s']

# Call the function 
find_stores(search_midTier_grocery_stores, midTier_grocery_stores_address_info)

# Print the list of stores to verify
for store in midTier_grocery_stores_address_info:
    print(store)

{'Name': 'Ralphs Fresh Fare', 'Address': '4700 Admiralty Way, Marina Del Rey', 'Latitude': 33.9798932, 'Longitude': -118.4389794}
{'Name': 'Ralphs Fresh Fare', 'Address': '1644 Cloverfield Blvd, Santa Monica', 'Latitude': 34.0270416, 'Longitude': -118.4738669}
{'Name': 'Ralphs Fresh Fare', 'Address': '11727 W Olympic Blvd, Los Angeles', 'Latitude': 34.0349195, 'Longitude': -118.4490712}
{'Name': 'Ralphs Fresh Fare', 'Address': '4311 Lincoln Blvd, Marina Del Rey', 'Latitude': 33.9842533, 'Longitude': -118.4432258}
{'Name': 'Ralphs Fresh Fare', 'Address': '12057 Wilshire Blvd, Los Angeles', 'Latitude': 34.04497, 'Longitude': -118.4670728}
{'Name': 'Ralphs Fresh Fare', 'Address': '10861 Weyburn Ave, Los Angeles', 'Latitude': 34.0630182, 'Longitude': -118.4444677}
{'Name': 'Ralphs Fresh Fare', 'Address': '15120 Sunset Blvd, Pacific Palisades', 'Latitude': 34.0450033, 'Longitude': -118.5241476}
{'Name': 'Ralphs', 'Address': '11361 National Blvd, Los Angeles', 'Latitude': 34.0260615, 'Longit

In [14]:
# Initialize an empty list to store grocery store information
budget_grocery_stores_address_info = []

# List of grocery stores to be included
search_budget_grocery_stores = ['Costco Wholesale', 'Smart and Final']

# Call the function 
find_stores(search_budget_grocery_stores, budget_grocery_stores_address_info)

# Print the list of stores to verify
for store in budget_grocery_stores_address_info:
    print(store)

{'Name': 'Costco Wholesale', 'Address': '13463 Washington Blvd, Marina Del Rey', 'Latitude': 33.9927494, 'Longitude': -118.4464242}
{'Name': 'Costco Pharmacy', 'Address': '13463 Washington Blvd, Marina Del Rey', 'Latitude': 33.9930324, 'Longitude': -118.4469763}
{'Name': 'Costco Bakery', 'Address': '13463 Washington Blvd, Marina Del Rey', 'Latitude': 33.9927736, 'Longitude': -118.4469117}
{'Name': 'Smart & Final Extra!', 'Address': '11221 W Pico Blvd, Los Angeles', 'Latitude': 34.0366391, 'Longitude': -118.4375995}
{'Name': 'Smart & Final', 'Address': '12210 Santa Monica Blvd W, Los Angeles', 'Latitude': 34.039804, 'Longitude': -118.4644063}
{'Name': 'Smart & Final Extra!', 'Address': '604 Lincoln Blvd, Venice', 'Latitude': 34.0006368, 'Longitude': -118.4642158}


#### Now that we have the store information, we can calculate distances and put them in the DataFrame.

Our goal is to record the nearest store in each category for every listing, so for each row we loop through the stores in a category and keep the minimum distance. Units are miles. We use haversine distance, which calculates the great-circle distance between two points on a sphere given their longitudes and latitudes.

Haversine distance accounts for the Earth's curvature and is computed independently of the Google Maps API. It is chosen for its simplicity, but it is a straight-line measure and does not account for real-world road networks or travel conditions.
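For reference, the haversine formula itself can be sketched with only the standard library. This is an illustrative sketch, not what the notebook runs (the notebook uses the `haversine` package): the function name `haversine_miles` is hypothetical, and the Earth radius constant is an approximate mean value assumed for this example.

```python
from math import asin, cos, radians, sin, sqrt

# Approximate mean Earth radius in miles (an assumption of this sketch).
EARTH_RADIUS_MILES = 3958.8

def haversine_miles(coord1, coord2):
    """Great-circle distance in miles between two (lat, lng) pairs in degrees."""
    lat1, lng1 = map(radians, coord1)
    lat2, lng2 = map(radians, coord2)
    # Haversine of the central angle between the two points.
    a = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lng2 - lng1) / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * asin(sqrt(a))

# A listing in Venice and the Whole Foods Market on Lincoln Blvd,
# using coordinates that appear in this notebook's outputs.
listing = (33.99809, -118.475972)
store = (34.0011613, -118.4698813)
print(haversine_miles(listing, store))
```

For these two points the result should come out at roughly 0.41 miles, a little under half a mile as the crow flies.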

In [15]:
# Define a function to calculate haversine distance
def haversine_distance(coord1, coord2):
    return haversine(coord1, coord2, unit='mi')  # Returns distance in miles

In [16]:
# Find the nearest grocery store in a list for a listing and calculate the distance
def find_nearest_grocery_store(listing_lat, listing_lng, grocery_store_list):

    # Check if coordinates are missing before continuing
    if pd.isna(listing_lat) or pd.isna(listing_lng):
        return None, "N/A"

    # Initialize the minimum distance and the nearest store
    min_distance = None
    nearest_grocery_store = None
    listing_coord = (listing_lat, listing_lng)

    # Loop over each grocery store in the list
    for store in grocery_store_list:
        store_coord = (store['Latitude'], store['Longitude'])

        # Calculate the distance
        distance = haversine_distance(listing_coord, store_coord)

        # Update the minimum distance if this store is closer, and
        # collect the store's information to put into its own column
        if min_distance is None or distance < min_distance:
            min_distance = distance
            nearest_grocery_store = f"{store['Name']} - {store['Address']}"

    # Return the minimum distance and the nearest store
    return min_distance, nearest_grocery_store
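The search in find_nearest_grocery_store could alternatively be written with Python's built-in min(). The sketch below is one possible variant, not the notebook's method: the name `nearest_store`, the `distance_fn` parameter, and the `flat_distance` stand-in are all hypothetical. In the notebook, the haversine-based function would be passed as `distance_fn`.

```python
def nearest_store(listing_coord, stores, distance_fn):
    """Return (distance, label) for the closest store, or (None, None) if the list is empty."""
    candidates = [
        (distance_fn(listing_coord, (s['Latitude'], s['Longitude'])),
         f"{s['Name']} - {s['Address']}")
        for s in stores
    ]
    # Compare on distance only; default covers an empty store list.
    return min(candidates, key=lambda pair: pair[0], default=(None, None))

# Crude stand-in distance for demonstration only (not a real-world distance).
def flat_distance(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

stores = [
    {'Name': 'Whole Foods Market', 'Address': '11666 National Blvd, Los Angeles',
     'Latitude': 34.0224904, 'Longitude': -118.4379608},
    {'Name': 'Whole Foods Market', 'Address': '225 Lincoln Blvd, Venice',
     'Latitude': 34.0011613, 'Longitude': -118.4698813},
]
print(nearest_store((33.99809, -118.475972), stores, flat_distance))
```

Passing the distance function as an argument keeps the search logic separate from the choice of distance measure.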

#### Adding store information to the DataFrame

Now that we can find the minimum store distance for each listing, we can add these distances to the DataFrame with the function below.

In [17]:
def add_store_distances_to_dataframe(df):

    # Dictionary to hold the types of stores and their respective info lists
    store_types = {
        'budget': budget_grocery_stores_address_info,
        'midTier': midTier_grocery_stores_address_info,
        'premium': premium_grocery_stores_address_info
    }

    # Loop through each store type and calculate the nearest store and distance
    for store_type, stores_info in store_types.items():
        distance_col_name = f'nearest_{store_type}_grocery_store_distance'
        store_col_name = f'nearest_{store_type}_grocery_store'
        # Apply the find_nearest_grocery_store function and assign the results
        df[[distance_col_name, store_col_name]] = pd.DataFrame(
            df.apply(
                lambda row: find_nearest_grocery_store(row['latitude'], row['longitude'], stores_info), 
                axis=1).tolist(), index=df.index)
    return df

In [18]:
df = add_store_distances_to_dataframe(df)

#### Checking out the final result.

Let's look at a sample of the results. Further exploration will be done in the EDA portion of this project.

In [19]:
df.head()

Unnamed: 0,Title,Price,Bedrooms,Square Feet,Full Address,monthly,apartment,cats are OK - purrr,dogs are OK - wooof,laundry on site,...,w/d hookups,date_added,latitude,longitude,nearest_budget_grocery_store_distance,nearest_budget_grocery_store,nearest_midTier_grocery_store_distance,nearest_midTier_grocery_store,nearest_premium_grocery_store_distance,nearest_premium_grocery_store
0,Title Not Found,Price Not Found,Bedrooms Info Not Found,Square Feet Not Found,None listed,0,0,0,0,0,...,0,3/20/24,,,,,,,,
1,1 Bedroom in the Heart of Venice* Plank Floors...,"$2,895",1br,750,"237 Fourth Avenue, Venice, CA 90291",1,1,0,0,0,...,0,3/20/24,33.99809,-118.475972,0.69602,"Smart & Final Extra! - 604 Lincoln Blvd, Venice",0.843992,"Ralphs - 910 Lincoln Blvd, Venice",0.408345,"Whole Foods Market - 225 Lincoln Blvd, Venice"
2,"SPECIALS, Rooftop Sky Deck, Brand New 1+1 Bren...","$3,438",1br,711,"11916 West Pico Boulevard, Los Angeles, CA 90064",1,1,1,1,0,...,0,3/20/24,34.029804,-118.448669,0.790421,"Smart & Final Extra! - 11221 W Pico Blvd, Los ...",0.267695,"Trader Joe's - 11755 W Olympic Blvd, Los Angeles",0.794601,"Whole Foods Market - 11666 National Blvd, Los ..."
3,1 Bedroom 1 Bath Westwood Apartment in Westwoo...,"$3,175",1br,,"972 Hilgard Ave, Los Angeles, CA 90024",1,1,0,0,1,...,0,3/20/24,34.061724,-118.441019,1.744201,"Smart & Final Extra! - 11221 W Pico Blvd, Los ...",0.162153,"Trader Joe's - 1000 Glendon Ave, Los Angeles",0.340418,"Whole Foods Market - 1050 Gayley Ave, Los Angeles"
4,"Dishwasher, Efficient Appliances, 1 Bed","$2,895",1br,575,"1720 Pacific Ave, Venice, CA 90291",1,1,0,0,1,...,0,3/20/24,33.987065,-118.470748,1.009629,"Smart & Final Extra! - 604 Lincoln Blvd, Venice",0.997125,"Ralphs - 910 Lincoln Blvd, Venice",0.528287,"Erewhon - 585 Venice Blvd., Venice"


In [20]:
original_export_path = '../Data/Locations_Data_Added_For_EDA.csv'
refresh_export_path = '../Data/Daily_Refresh_With_Locations_Data.csv'

In [21]:
# Check whether the EDA export already exists
if os.path.exists(original_file_path):
    # If it exists, export the data to the refresh path
    df.to_csv(refresh_export_path, index=False)
    print(f'File exported: {refresh_export_path}')
else:
    # If it does not exist, export the data to the EDA file
    df.to_csv(original_export_path, index=False)
    print(f'File exported: {original_export_path}')

File exported: ../Data/Daily_Refresh_With_Locations_Data.csv
