# Scraping for Publix

**Note: this file will need to be modified based on the website you scrape from.**

We are currently focusing only on Publix stores within Polk County. The list of all Publix stores comes from this link (http://www.city-data.com/locations/PublixSuperMarkets/Polk-County-FL.html).

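To process this information within AnyLogic, the store addresses are first scraped and saved to a spreadsheet that can be added to AnyLogic as a database.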

### This script scrapes the city-data website and saves an Excel workbook (`publix_addresses.xlsx`) containing the address, city, state, zip code, latitude and longitude.

In [3]:
import openpyxl
import requests
from bs4 import BeautifulSoup

# Send a GET request to the URL and stop early if the request failed
url = 'http://www.city-data.com/locations/PublixSuperMarkets/Polk-County-FL.html'
response = requests.get(url, timeout=30)
response.raise_for_status()

# Create a BeautifulSoup object
soup = BeautifulSoup(response.content, 'html.parser')

# Find all the address elements on the page
address_elems = soup.find_all('div', {'itemprop': 'address'})

# Extract the address information from each element and store it in a list of dictionaries
addresses = []
for address_elem in address_elems:
    street_address = address_elem.find('span', {'itemprop': 'streetAddress'}).text.strip()
    city = address_elem.find('span', {'itemprop': 'addressLocality'}).text.strip()
    state = address_elem.find('span', {'itemprop': 'addressRegion'}).text.strip()
    zip_code = address_elem.find('span', {'itemprop': 'postalCode'}).text.strip()

    address_dict = {
        'Street Address': street_address,
        'City': city,
        'State': state,
        'Zip Code': zip_code
    }

    addresses.append(address_dict)


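# Hard-coded coordinates, assumed to be in the same order as the scraped addresses
lats = [28.052180, 27.902750, 28.311190, 28.178010, 28.257800, 28.105435, 27.9585908, 27.8927587, 27.9367233]
longs = [-81.775290, -81.841770, -81.669660, -81.638610, -81.591570, -81.6385006, -81.6207432, -81.5891238, -81.9883637]
if len(addresses) != len(lats):
    raise ValueError('Number of scraped addresses does not match the hard-coded coordinates')
for address, lat, lng in zip(addresses, lats, longs):
    address['Latitude'] = lat
    address['Longitude'] = lng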
    
# Write the address information to an Excel workbook
workbook = openpyxl.Workbook()
worksheet = workbook.active

fieldnames = ['Street Address', 'City', 'State', 'Zip Code', 'Latitude', 'Longitude']
worksheet.append(fieldnames)

for address in addresses:
    worksheet.append([address[fieldname] for fieldname in fieldnames])

workbook.save('publix_addresses.xlsx')
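
The cell above saves an Excel workbook. If a plain CSV file is preferred instead, a minimal sketch using the standard-library `csv` module could look like the following. The helper name `write_addresses_csv` and the output file name in the usage comment are assumptions for illustration and are not part of the original notebook.

```python
import csv

# Same column order as the workbook written above
FIELDNAMES = ['Street Address', 'City', 'State', 'Zip Code', 'Latitude', 'Longitude']


def write_addresses_csv(addresses, file_obj):
    """Write a list of address dictionaries to an open text file as CSV.

    Hypothetical helper: keys missing from a dictionary are written as
    empty cells instead of raising an error.
    """
    writer = csv.DictWriter(file_obj, fieldnames=FIELDNAMES, restval='')
    writer.writeheader()
    writer.writerows(addresses)


# Example usage (the file name is an assumption):
# with open('publix_addresses.csv', 'w', newline='', encoding='utf-8') as f:
#     write_addresses_csv(addresses, f)
```

Opening the file with `newline=''` lets the `csv` module control line endings, which avoids blank rows on Windows.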


### Adding to AnyLogic

You will need to add this workbook as a database in AnyLogic. Format the latitude and longitude columns as doubles. Then, in the Publix agent, select 'loaded from database' and choose the sheet name. Map the parameter 'GIS Location Latitude' to the Latitude column and 'GIS Location Longitude' to the Longitude column.

#### This will need to be done with citrus packers and citrus-producing counties in the future

# Future Work

It would be useful to include all supermarkets, or whichever retailers you find necessary. This is up to future discussion.

List of all Publix stores: https://malls.fandom.com/wiki/Publix/Locations

### When a larger dataset of supermarkets is scraped, we will need to pull longitude and latitude using the Google Geocoding API

The API setup is described here; you will need to generate an API key to put in 'api_key': https://developers.google.com/maps/documentation/geocoding/overview

You will also need to create a cloud project with billing information, following this link: https://developers.google.com/maps/documentation/elevation/cloud-setup

The function below geocodes a single address. The loop after it iterates through the list of address dictionaries and adds the longitude and latitude to each one.

In [None]:
import requests

def geocode_address(address):
    """Return (lat, lng) for an address, or None if geocoding fails."""
    api_key = 'YOUR_API_KEY' # replace with your actual API key
    url = 'https://maps.googleapis.com/maps/api/geocode/json'
    # Pass the address via params so requests URL-encodes spaces, commas and '&'
    response = requests.get(url, params={'address': address, 'key': api_key}, timeout=30).json()
    if response['status'] == 'OK':
        location = response['results'][0]['geometry']['location']
        return location['lat'], location['lng']
    return None


for address_dict in addresses:
    # Build a single-line address string for the geocoder
    full_address = f"{address_dict['Street Address']}, {address_dict['City']}, {address_dict['State']} {address_dict['Zip Code']}"
    coordinates = geocode_address(full_address)
    if coordinates:
        address_dict['Latitude'] = coordinates[0]
        address_dict['Longitude'] = coordinates[1]
    else:
        address_dict['Latitude'] = None
        address_dict['Longitude'] = None
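
For a larger dataset it may help to separate the response parsing and the update loop from the network call, so that both can be checked without spending API quota. The following is a minimal sketch under that assumption. The helper names `format_full_address`, `parse_geocode_response` and `add_coordinates` are hypothetical, and the parser only relies on the `status`, `results`, `geometry` and `location` fields already used by the function above.

```python
def format_full_address(address_dict):
    """Join the scraped fields into one address string for the geocoder."""
    return (f"{address_dict['Street Address']}, {address_dict['City']}, "
            f"{address_dict['State']} {address_dict['Zip Code']}")


def parse_geocode_response(payload):
    """Return (lat, lng) from a decoded geocoding JSON payload, or None.

    Any status other than 'OK', or an empty result list, yields None.
    """
    if payload.get('status') != 'OK' or not payload.get('results'):
        return None
    location = payload['results'][0]['geometry']['location']
    return location['lat'], location['lng']


def add_coordinates(addresses, geocode):
    """Add 'Latitude' and 'Longitude' to every address dictionary in place.

    `geocode` is any callable that takes an address string and returns
    (lat, lng) or None (an assumption made so the loop can be tested with
    a stand-in). Returns how many addresses could not be geocoded.
    """
    failed = 0
    for address_dict in addresses:
        coordinates = geocode(format_full_address(address_dict))
        if coordinates is None:
            address_dict['Latitude'] = None
            address_dict['Longitude'] = None
            failed += 1
        else:
            address_dict['Latitude'], address_dict['Longitude'] = coordinates
    return failed
```

With the function defined earlier, the call would be `add_coordinates(addresses, geocode_address)`; the returned count gives a quick indication of how many rows still need coordinates before loading the data into AnyLogic.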
