<img src='img/cover.png'>

## Introduction

#### Topic

For the theme of this project, I selected the opiate overdose crisis. We've come a long way toward recognizing addiction as a mental health issue rather than a criminal one, but there is still a lot of stigma and misunderstanding. I believe in shining a light into that darkness. And the overdose crisis is a personal issue to me; I lost my cousin, Vince, and I lost my uncle, Kevin. 

#### Structure

This project, Learn Understand Help, is broken out into three components.

<img src='img/LUH-overview.png'>

<img src="img/learn.png">

In [10]:
# NOTE: Please start Jupyter Notebook from a command prompt while 
# inside the folder of the Learn_Understand_Help repository, 
# so relative links in the notebook will work no matter where in your 
# file system your repository lives.

<img src='img/understand.png'>

<img src="img/help.png">

# HELP: Get Naloxone, reverse overdoses

## Installations

In [2]:
# !conda create --name shopify flask matplotlib numpy pandas requests bs4

# From inside the environment:
# !conda install -c conda-forge folium
# !conda install -c gusdunn pdfplumber


## Imports

In [1]:
import folium
import math
import numpy as np
import pandas as pd
# import pdfplumber
import requests
import time

from flask import Flask, render_template
from folium.plugins import MarkerCluster

%matplotlib inline
# %xmode Minimal
# %xmode Plain
# %xmode Context
%xmode Verbose 
# Verbose exception mode: This is for me for testing; feel free to turn it off!

Exception reporting mode: Verbose


## Functions

In [2]:
# This function uses the GeoJS API to identify or approximate 
# a user's location NOTE: This is based on IP address, which
# reduces accuracy

def identify_user_location():
    
    get_ip = requests.get('https://get.geojs.io/v1/ip.json') # request user's IP address
    ip_address = get_ip.json()['ip']                         # parse via json
    location_request = requests.get('https://get.geojs.io/v1/ip/geo/' + ip_address + '.json') 
    # request additional params
    location_parameters = location_request.json()            # parse via json 
    user_longitude = location_parameters['longitude']
    user_latitude = location_parameters['latitude']
    return (user_longitude, user_latitude)

# This function uses the GeoCoder.ca API to get a pair of 
# lat/long coordinates by looking up the postal code

def postal_code_to_lat_long(postcode):
    location_request = requests.get('https://geocoder.ca/?postal=' + postcode + '&geoit=XML&json=1')
    location_parameters = location_request.json()
    latitude = location_parameters['latt']
    longitude = location_parameters['longt']
    return latitude, longitude

# The Haversine formula is a mathematical equation for finding the distance 
# between two points on the globe given that it is spherical and not flat.
# The original author of this implementation is Wayne Dyck. I edited it to better suit
# my objectives.


def distance(origin_lat, origin_long, dest_lat, dest_long):

    earths_radius = 6371 # km

    dlat = math.radians(dest_lat - origin_lat)
    dlon = math.radians(dest_long - origin_long)
    a = math.sin(dlat/2) * math.sin(dlat/2) + math.cos(math.radians(origin_lat)) \
        * math.cos(math.radians(dest_lat)) * math.sin(dlon/2) * math.sin(dlon/2)
    c = 2 * math.atan2(math.sqrt(a), math.sqrt(1-a))
    d = earths_radius * c

    return d


# This simple function takes a pandas series representing the contents 
# of a dataframe column, and isolates the first value. It strips out some 
# extraneous values that are part of the series.

def isolate_the_variable(pandas_series):
    return pandas_series.tolist()[0]

## Part 1: Generating the list of pharmacies

A list of Ontario pharmacies can be sourced online here: 
https://www.manionwilkins.com/wp-content/uploads/2018/07/Quarterly-Disp-Fee-Rpt_with-DA-COB-claims-ESC-BOB-Q2-2018-Ontario-only-wdc-EN.pdf

Note that the Toronto pharmacy database that I have built from it is strictly for proof of concept, and a production model will need to be validated for accuracy. 

In [14]:
# Launch PDFplumber to access the table data
pdf = pdfplumber.open("pdf/Ontario_pharmacy_list.pdf")

# I feel like there must be a more efficient way to do this, 
# but I will leave it for now to keep moving forward on this project

# Access the Toronto pages of the PDF
p37 = pdf.pages[37]
p38 = pdf.pages[38]
p39 = pdf.pages[39]
p40 = pdf.pages[40]
p41 = pdf.pages[41]
p42 = pdf.pages[42]
p43 = pdf.pages[43]

# Extract the text from them
text37 = p37.extract_text()
text38 = p38.extract_text()
text39 = p39.extract_text()
text40 = p40.extract_text()
text41 = p41.extract_text()
text42 = p42.extract_text()
text43 = p43.extract_text()

# concatenate them
text37 = text37 + text38
text37 = text37 + text39
text37 = text37 + text40
text37 = text37 + text41
text37 = text37 + text42
text37 = text37 + text43
all_text = text37


**MPORTANT:** Certain aspects of reviewing and correcting the output of the above could not be automated, so they were undertaken manually, applying domain knowledge. The final output can be found in the file **csv/toronto_pharmacies_initial.csv** in this Github repository.

### Reading the pharmacies into the dataframe

In [15]:
pharmacies_to_process = pd.read_csv('csv/toronto_pharmacies_initial.csv')

### Exploratory data analysis

Let's have a look at our (comparatively simple) dataframe.

In [16]:
pharmacies_to_process.columns  # they are all strings, this is fine

Index(['Name', 'Address', 'City', 'Postcode'], dtype='object')

In [17]:
pharmacies_to_process.describe()

Unnamed: 0,Name,Address,City,Postcode
count,559,559,559,559
unique,347,556,1,531
top,SHOPPERS DRUG MART,939 O'CONNOR DRIVE,Toronto,M4B2S7
freq,85,2,559,3


In [18]:
pharmacies_to_process.head()

Unnamed: 0,Name,Address,City,Postcode
0,99 PHARMACY,436 DUNDAS STREET W.,Toronto,M5T1G7
1,ACTION PHARMACY,101-2425 BLOOR STREET WEST,Toronto,M6S4W4
2,ALBION FINCH PHARMACY,"6230 FINCH AVE WEST, UNIT A102",Toronto,M9V0A1
3,ALLCURES PHARMACY,31 ST. DENNIS DRIVE. SUITE #1,Toronto,M3C1G7
4,APEX DRUG MART,90 EGLINTON AVE. EAST,Toronto,M4P2Y3


In [19]:
pharmacies_to_process.tail()

Unnamed: 0,Name,Address,City,Postcode
554,YONGE DRUG MART,104-2399 YONGE ST,Toronto,M4P2E7
555,YONGE ELMWOOD PHARMACY,201-5025 YONGE STREET,Toronto,M2N5P2
556,YORK MEDICAL PHARMACY,"491 CHURCH ST, UPSTAIRS",Toronto,M4Y2C6
557,YOURS PHARMACY,796 QUEEN STREET EAST,Toronto,M4M1H4
558,ZARA'S PHARMACY,"1908 GERRARD STREET EAST, UNIT B",Toronto,M4L2C1


In [20]:
# Check for duplicates. There is one.

duplicateDFRow = pharmacies_to_process[pharmacies_to_process.duplicated()]
print(duplicateDFRow)

                   Name              Address     City Postcode
394  SHOPPERS DRUG MART  2528 BAYVIEW AVENUE  Toronto   M2L1A9


### Data cleaning and transformation

In [21]:
# change to Upper/lower case; use a list comprehension
for columns in pharmacies_to_process.columns:
    pharmacies_to_process[columns] = pharmacies_to_process[columns].str.title() 

# get rid of any pharmacies with identical addresses
pharmacies_to_process.drop_duplicates(subset ="Address", keep = False, inplace = True)

# establish proper formatting for postal code 
pharmacies_to_process['Postcode'] = [str(i[:3] + ' ' + i[3:]) for i in pharmacies_to_process['Postcode']]

# Confirm case change and postal code
pharmacies_to_process.head(1)

Unnamed: 0,Name,Address,City,Postcode
0,99 Pharmacy,436 Dundas Street W.,Toronto,M5T 1G7


In [22]:
# Confirm re duplicates, empty dataframe is expected result
duplicateDFRow = pharmacies_to_process[pharmacies_to_process.duplicated()]
print(duplicateDFRow)

Empty DataFrame
Columns: [Name, Address, City, Postcode]
Index: []


In [23]:
# Add latitude and longitude columns to the dataframe
# Notice the issue with Zara's Pharmacy, can this be corrected via regex?

pharmacies_to_process = pharmacies_to_process.assign(Latitude=np.nan,Longitude=np.nan)
pharmacies_to_process.tail()

Unnamed: 0,Name,Address,City,Postcode,Latitude,Longitude
554,Yonge Drug Mart,104-2399 Yonge St,Toronto,M4P 2E7,,
555,Yonge Elmwood Pharmacy,201-5025 Yonge Street,Toronto,M2N 5P2,,
556,York Medical Pharmacy,"491 Church St, Upstairs",Toronto,M4Y 2C6,,
557,Yours Pharmacy,796 Queen Street East,Toronto,M4M 1H4,,
558,Zara'S Pharmacy,"1908 Gerrard Street East, Unit B",Toronto,M4L 2C1,,


In [24]:
# Iterate through the list and query for the longitude and latitude using the Geocoder API

# PLEASE NOTE that this geocoder server can throttle requests at any time, resulting in
# the inability to get the necessary information, so it may take a day or two to get 
# all the longitudes/latitudes. If you have set your exception mode
# to verbose as I have above, the error message will indicate the throttling. A csv with 
# the longitude/latitudes that I pulled using this code is included as part of the Github
# repository.

# Unhash the code below (and the subsequent cell) if you wish to run the process to gather the latitude
# and longitude coordinates. If you don't get throttled, it should complete in just under an hour

# for i in range(len(pharmacies_to_process)):  # for each observation
    
#     # format a request in the manner that the Geocoder API expects
#     location_request = requests.get('https://geocoder.ca/?postal=' + pharmacies_to_process['Postcode'][i] + '&geoit=XML&json=1')
#     location_parameters = location_request.json() # Get details in json format
#     latitude = location_parameters['latt']
#     longitude = location_parameters['longt']
    
#     try:
#         pharmacies_to_process['latitude'][i] = latitude
            
#     except:
#         pharmacies_to_process['latitude'][i] = np.nan
                    
#     try:
#         pharmacies_to_process['longitude'][i] = longitude
            
#     except:
#         pharmacies_to_process['longitude'][i] = np.nan
            
#     time.sleep(5)

In [25]:
# Save the transformed data
# pharmacies_to_process.to_csv('csv/toronto_pharmacies_saved.csv', index = False)

## Part 2: Using spatial data to locate the nearest pharmacy

### Convert to geopandas dataframe and add geometry info

In [3]:
user_name = input('Welcome to the Naloxone pharmacy locator. May I please have your first name? ')

user_location = identify_user_location()
user_longitude = user_location[0]
user_latitude = user_location[1]
    
# Set up a dataframe with their information
user_details = {'Name': [user_name],
            'Address': ['Current User Location'],
            'City': ['Current city'],
            'Postcode': ["Current location postal code"],
            'Latitude': [user_latitude],
            'Longitude': [user_longitude]}
    
user_data = pd.DataFrame(user_details)

# Read in the pharmacies list and append the user's information to that dataframe
pharmacy_locator = pd.read_csv('csv/toronto_pharmacies_final.csv', dtype = {'Longitude' : 'float64', 'Latitude' : 'float64'})
pharmacy_locator = pharmacy_locator.append(user_data)

# Convert to a geopandas geodataframe
# pharmacy_locator = gpd.GeoDataFrame(pharmacy_locator, geometry=gpd.points_from_xy(pharmacy_locator.Longitude, pharmacy_locator.Latitude))

# Ensure that all the lat/long data is datatype float not str 
pharmacy_locator = pharmacy_locator.astype({'Longitude' : 'float', 'Latitude' : 'float'})

# Sort by location using pandas built-in sort_values function, 
# default algorithm is quicksort (average time commplexity is Î¸(n log(n))) 
pharmacy_locator.sort_values(['Longitude', 'Latitude'], ascending=[False, False], inplace=True)
pharmacy_locator.reset_index(drop=True, inplace=True)

# Now that the list (including the user's location) is sorted by location,
# identify the two closest locations, which should be directly above and below it
user_index = pharmacy_locator[pharmacy_locator['Name']==user_name].index.values
user_observation = pharmacy_locator.loc[user_index]
user_latitude = user_observation['Latitude'].tolist()[0] 
user_longitude = user_observation['Longitude'].tolist()[0]
user_coordinates = user_latitude, user_longitude

previous_observation = pharmacy_locator.loc[user_index - 1]
subsequent_observation = pharmacy_locator.loc[user_index + 1]

# Create distance-from-user column
pharmacy_locator.insert(6, 'Distance', 0.0)

# Assign to variables the long and lat of the user's location and the
# two locations closest to it. 

origin_lat = isolate_the_variable(user_observation['Latitude'])
origin_long = isolate_the_variable(user_observation['Longitude'])

previous_lat = isolate_the_variable(previous_observation['Latitude'])
previous_long = isolate_the_variable(previous_observation['Longitude'])

subsequent_lat = isolate_the_variable(subsequent_observation['Latitude'])
subsequent_long = isolate_the_variable(subsequent_observation['Longitude'])

previous_distance = distance(origin_lat, origin_long, previous_lat, previous_long)
subsequent_distance = distance(origin_lat, origin_long, subsequent_lat, subsequent_long)

if previous_distance >= subsequent_distance:
    closest_pharmacy = previous_observation
else:
    closest_pharmacy = subsequent_observation

closest_pharmacy_latitude = isolate_the_variable(closest_pharmacy['Latitude']) 
closest_pharmacy_longitude = isolate_the_variable(closest_pharmacy['Longitude'])
closest_pharmacy_coordinates = closest_pharmacy_latitude, closest_pharmacy_longitude
    
closest_PN = closest_pharmacy['Name']
closest_pharmacy_name = isolate_the_variable(closest_PN)

closest_PA = closest_pharmacy['Address']
closest_pharmacy_address = isolate_the_variable(closest_PA)

closest_PLat = closest_pharmacy['Latitude']
closest_pharmacy_latitude = isolate_the_variable(closest_PLat)

closest_PLong = closest_pharmacy['Longitude']
closest_pharmacy_latitude = isolate_the_variable(closest_PLong)

print('Identifying your location, ' + user_name + '. Your closest pharmacy is ' + closest_pharmacy_name + ', located at ' + closest_pharmacy_address + '.')


Welcome to the Naloxone pharmacy locator. May I please have your first name? Shawn
Identifying your location, Shawn. Your closest pharmacy is Rexall, located at 474 Spadina Ave.


In [4]:
print('Here is a map showing your nearest pharmacy, ' + user_name + '.')

Here is a map showing your nearest pharmacy, Shawn.


In [10]:
#Create the map
pharmacy_map = folium.Map(location = user_coordinates, zoom_start = 13)
                         
folium.Marker(user_coordinates, popup = 'You are here').add_to(pharmacy_map)
folium.Marker(closest_pharmacy_coordinates, popup = closest_pharmacy_name).add_to(pharmacy_map)

#Display the map
pharmacy_map


### Building a webapp version with Flask

In [13]:
# Instantiate flask

app = Flask(__name__)

@app.route('/')

def home():
    return "<h1> Hello World </h1>"

if __name__ =="__main__":
    app.run(debug=True, use_reloader=False)

 * Serving Flask app "__main__" (lazy loading)
 * Environment: production
   Use a production WSGI server instead.
 * Debug mode: on


 * Running on http://127.0.0.1:5000/ (Press CTRL+C to quit)
127.0.0.1 - - [02/Sep/2020 05:22:46] "[37mGET / HTTP/1.1[0m" 200 -
127.0.0.1 - - [02/Sep/2020 05:22:47] "[33mGET /favicon.ico HTTP/1.1[0m" 404 -
