In [2]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
import requests
import datetime
import json

from utils import get_data

<h1 style="text-align: center;">Data gathering</h1>

<hr style="border: 0.5px dashed;">
<h2>1. Detroit 911 calls</h2>

### Main database 
The data can be viewed and downloaded here: https://hub.arcgis.com/datasets/detroitmi::911-calls-for-service?geometry=-85.963%2C42.028%2C-80.880%2C42.738

As noted on the data website, these records go back to September 20, 2016.

The data can also be accessed through an API at this endpoint: https://opengis.detroitmi.gov/opengis/rest/services/PublicSafety/911CallsForService/FeatureServer/0/query

Documentation of this API can be found here:
- https://developers.arcgis.com/rest/services-reference/query-feature-service-layer-.htm
- https://opengis.detroitmi.gov/opengis/sdk/rest/index.html#/Query_Feature_Service_Layer/02ss0000002r000000/
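
As a rough illustration of how a request to this endpoint can be put together, the sketch below builds the query parameters for a half-open date window and pages through the results with `resultOffset` and `resultRecordCount`. It is not the code in `utils/get_data.py`, which may work differently. The field name `call_timestamp`, the page size, and the helper names `build_params` and `fetch_all` are assumptions made for illustration; the sketch uses only the standard library.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

URL = ("https://opengis.detroitmi.gov/opengis/rest/services/"
       "PublicSafety/911CallsForService/FeatureServer/0/query")


def build_params(date_start, date_end, offset=0, page_size=1000,
                 date_field="call_timestamp"):
    """Build the query parameters for one page of results.

    date_start and date_end must already be wrapped in single quotes,
    e.g. "'2020-01-01 00:00:00'". date_field is an ASSUMED column name.
    """
    # Half-open window: start is included, end is excluded.
    where = f"{date_field} >= {date_start} AND {date_field} < {date_end}"
    return {
        "where": where,
        "outFields": "*",
        "f": "json",
        "resultOffset": offset,
        "resultRecordCount": page_size,
    }


def fetch_all(date_start, date_end, page_size=1000):
    """Request pages until the server reports that nothing is left."""
    records, offset = [], 0
    while True:
        params = build_params(date_start, date_end, offset, page_size)
        with urlopen(f"{URL}?{urlencode(params)}", timeout=60) as resp:
            payload = json.load(resp)
        features = payload.get("features", [])
        records.extend(f["attributes"] for f in features)
        # The service sets exceededTransferLimit when more rows remain.
        if not features or not payload.get("exceededTransferLimit", False):
            break
        offset += len(features)
    return records
```

Calling `fetch_all("'2020-01-01 00:00:00'", "'2021-01-01 00:00:00'")` would return a list of attribute dictionaries, one per call record, which can then be passed to `pd.DataFrame`.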

In [244]:
# GET DATA
# NOTE THE USE OF QUOTATION MARKS FOR DATE
# The time here is in local time of the sender.
years = [2016, 2017, 2018, 2019, 2020, 2021]
for year in years:
    date_start = f"'{year}-01-01 00:00:00'" # 911 calls from this date forward (inclusive)
    date_end = f"'{year+1}-01-01 00:00:00'" # 911 calls up to but NOT including this date
    last_30_days = False
    # Get data
    df = get_data.get_911_records_Detroit(date_start, date_end, last_30_days=last_30_days)
    # write dataframe to file
    file_name = f'911_Calls_{year}_file0.csv'
    df.to_csv(f"data/raw/Detroit_911_calls/{file_name}", mode="w", header=True, index=False)

--------------------------------------------------
2021-04-08 17:04:48 Sending request ...
2021-04-08 17:05:01 Received.
--------------------------------------------------
2021-04-08 17:05:03 Sending request ...
2021-04-08 17:05:17 Received.
--------------------------------------------------
2021-04-08 17:05:20 Sending request ...
2021-04-08 17:05:35 Received.
--------------------------------------------------
2021-04-08 17:05:38 Sending request ...
2021-04-08 17:05:52 Received.
--------------------------------------------------
2021-04-08 17:05:55 Sending request ...
2021-04-08 17:05:58 Received.
Number of records:  130482
Number of pages:  5
--------------------------------------------------
2021-04-08 17:06:01 Sending request ...
2021-04-08 17:06:20 Received.
--------------------------------------------------
2021-04-08 17:06:22 Sending request ...
2021-04-08 17:06:40 Received.
--------------------------------------------------
2021-04-08 17:06:43 Sending request ...
2021-04-08 17:0

### Data last 30 days
**Notes**: 

For 911 call records from the last 30 days, it is best to query the 'last-30-days' database here: https://hub.arcgis.com/datasets/detroitmi::911-calls-for-service-last-30-days-1?geometry=-84.781%2C42.126%2C-82.239%2C42.481 
**This database seems more responsive (it picks up the most recent calls faster) than the main database.**

More exploration can be done here: https://www.arcgis.com/home/webmap/viewer.html?layers=2901fec24266445588b4a3bf67098886

The API endpoint for this database is here: https://services2.arcgis.com/qvkbeam7Wirps6zC/arcgis/rest/services/911_Calls_for_Service_(Last_30_Days)/FeatureServer/0/query

In [262]:
now = datetime.datetime.now()
now_str = now.strftime('%Y-%m-%d %H:%M:%S')
prev_30_days = now - datetime.timedelta(days=30)
prev_30_days_str = prev_30_days.strftime('%Y-%m-%d %H:%M:%S')
print('Now: ', now_str)
print('Prev 30 days: ', prev_30_days_str)
date_start = f"'{prev_30_days_str}'" # 911 calls from this date forward (inclusive)
date_end = f"'{now_str}'" # 911 calls up to but NOT including this date
last_30_days = True
# Get data
df = get_data.get_911_records_Detroit(date_start, date_end, last_30_days=last_30_days)
# write dataframe to file
file_name = '911_Calls_2021_file1.csv'
df.to_csv(f"data/raw/Detroit_911_calls/{file_name}", mode="w", header=True, index=False)

Now:  2021-04-08 18:04:19
Prev 30 days:  2021-03-09 18:04:19
--------------------------------------------------
2021-04-08 18:04:19 Sending request ...
2021-04-08 18:04:30 Received.
--------------------------------------------------
2021-04-08 18:04:33 Sending request ...
2021-04-08 18:04:39 Received.
--------------------------------------------------
2021-04-08 18:04:42 Sending request ...
2021-04-08 18:04:46 Received.
Number of records:  80745
Number of pages:  3


---
## 2. Detroit neighborhoods info and polygons

The data can be viewed and downloaded here: https://hub.arcgis.com/datasets/detroitmi::current-city-of-detroit-neighborhoods

The API endpoint is here: https://services2.arcgis.com/qvkbeam7Wirps6zC/arcgis/rest/services/Current_City_of_Detroit_Neighborhoods/FeatureServer/0/query. The API documentation is the same as for the 911 calls API above.

In [3]:
# Base name for the output files. Do NOT include an extension.
file_name = 'neighborhoods_info_2021-04-09'
# Get boundary data from city
data = get_data.get_neighborhood_boundary_Detroit()
# Write json to file to be used later when making geo-heatmap
with open(f'data/raw/{file_name}.json', 'w') as outfile:
    json.dump(data, outfile)
# Convert json to dataframe
df = get_data.boundary_json_to_df(data)
# write neighborhood information to csv file
df.to_csv(f"data/raw/{file_name}.csv", mode="w", header=True, index=False)
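
The conversion from the boundary JSON to a table can be pictured roughly as follows. This is only a sketch of the general idea, not the actual `get_data.boundary_json_to_df` implementation. It assumes the response is a feature collection whose per-neighborhood fields sit under `properties` (GeoJSON) or `attributes` (Esri JSON); the helper name `features_to_rows` and the field names in the sample are made up for illustration.

```python
def features_to_rows(data):
    """Flatten a feature collection into one dict of fields per feature."""
    rows = []
    for feature in data.get("features", []):
        # GeoJSON keeps fields under "properties"; Esri JSON uses "attributes".
        fields = feature.get("properties") or feature.get("attributes") or {}
        rows.append(dict(fields))
    return rows


# Tiny hand-made sample; the field names are hypothetical.
sample = {
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "properties": {"nhood_num": 1, "nhood_name": "Example A"},
            "geometry": {"type": "Polygon", "coordinates": []},
        },
        {
            "type": "Feature",
            "properties": {"nhood_num": 2, "nhood_name": "Example B"},
            "geometry": {"type": "Polygon", "coordinates": []},
        },
    ],
}

rows = features_to_rows(sample)
print(rows)
```

Passing `rows` to `pd.DataFrame` gives one row per neighborhood, while the polygons stay in the JSON file for the geo-heatmap.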

<hr style="border: 0.5px dashed;">
<h2>3. Gather tweets</h2>

GPS-tagged tweets could serve as an indicator of people's movement within the city and hence could be useful for forecasting the volume and location of emergency calls. These data are optional for the AI models presented here. Unfortunately, it is shown later that the number of GPS-tagged tweets is too small to be useful.

<b>NOTE: Twitter API</b>
<ul>
    <li>Time stamp: Twitter time stamp in GMT</li>
    <li>Radius: "mi" or "km". Maximum 25mi.</li>
</ul>
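
Both notes can be made concrete with a small sketch. It assumes the `created_at` string format of the Twitter v1.1 API and the `"lat,lon,radius"` form of the search `geocode` parameter. The helper names `to_detroit_time` and `geocode` are hypothetical and are not part of `utils`.

```python
from datetime import datetime
from zoneinfo import ZoneInfo

# created_at format of the v1.1 API, e.g. "Wed Oct 10 20:19:24 +0000 2018"
TWITTER_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def to_detroit_time(created_at):
    """Convert a GMT tweet timestamp to Detroit local time."""
    gmt_time = datetime.strptime(created_at, TWITTER_FORMAT)
    return gmt_time.astimezone(ZoneInfo("America/Detroit"))


def geocode(lat, lon, radius="1mi"):
    """Format a search circle; radius must end in "mi" or "km"."""
    return f"{lat},{lon},{radius}"


local = to_detroit_time("Wed Oct 10 20:19:24 +0000 2018")
print(local.strftime("%Y-%m-%d %H:%M:%S"))
print(geocode(42.437298, -82.951111))
```

Converting to local time matters when tweets are lined up against the 911 records, which are reported in local time.
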
<b>Current issues:</b>
<ul>
    <li>2020-09-23</li>
    <ul>
        <li>Tweets can be searched by proximity to a geo-coordinate. Twitter falls back to the user's profile location if a tweet's geotag is missing or not enabled. Strangely, however, a query at (42.437298,-82.951111) in Detroit returned tweets by user_id=1308885595586953222, whose profile location is in England.</li>
    </ul>
</ul>
<b>Tweets processing</b>
<ul>
    <li>Select only geo-tagged tweets</li>
    <li>Remove tweets related to advertising (e.g. jobs, traffic updates, internet bots). It is not yet clear how to do this.</li>
</ul>
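
One possible starting point for these two steps is sketched below. It assumes tweets are available as v1.1-style dictionaries in which `coordinates` is `None` unless the tweet carries an exact geotag. The keyword list is a naive, hypothetical heuristic for advertising-like content, not a method used elsewhere in this project, and it will miss many cases.

```python
# Hypothetical keyword list; a real filter would need tuning.
AD_KEYWORDS = ("hiring", "job", "apply now", "traffic")


def is_geotagged(tweet):
    """True when the tweet carries an exact point location."""
    return tweet.get("coordinates") is not None


def looks_like_ad(tweet, keywords=AD_KEYWORDS):
    """Naive check: does the text contain any advertising keyword?"""
    text = tweet.get("text", "").lower()
    return any(keyword in text for keyword in keywords)


def clean(tweets):
    """Keep geo-tagged tweets that do not look like advertising."""
    return [t for t in tweets if is_geotagged(t) and not looks_like_ad(t)]


# Hand-made sample tweets for illustration only.
sample = [
    {"id": 1, "text": "Coffee downtown",
     "coordinates": {"type": "Point", "coordinates": [-83.05, 42.33]}},
    {"id": 2, "text": "We're hiring! Apply now",
     "coordinates": {"type": "Point", "coordinates": [-83.04, 42.35]}},
    {"id": 3, "text": "No location here", "coordinates": None},
]

kept = clean(sample)
print([t["id"] for t in kept])
```
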

In [2]:
# CREATE THE QUERY OBJECT
# Import custom utility package
import utils
# Import personal Twitter API secrets
from keys import my_api_secrets
# List of gps coordinates (radius=1 mile) covering Detroit
# This is used for Twitter REST API
coords = utils.Detroit_gps.coords_1mile
r = "1mi"
# Detroit bounding box. Used for Twitter stream API
box = utils.Detroit_gps.box
# **CSV FILE INCREMENT**
num = 6

# Create tweet query object
# NOTE: Twitter_query object will write csv files into
# directory "data/raw/tweets/"
detroit_tweets = utils.Twitter_query.Twq(my_api_secrets.twitter_secrets, coords, r, box, num)

In [None]:
# If the utils package needs to be re-imported
#import importlib
#importlib.reload(utils)

In [4]:
# METHOD 1: REST API, search (pull) request.
# Run search once
detroit_tweets.search()
# Schedule search for automatic run in future
#detroit_tweets.repeated_search(interval=1) # every 1 hour

Local time: 2020-12-24	0:1
Tweets: 2241	Users: 2241	Places: 1718
--------------------------------------------------


In [None]:
# METHOD 2: LIVE STREAM API, twitter push request.
# start stream
detroit_tweets.start_stream()

In [None]:
# stop stream
detroit_tweets.stop_stream()

In [None]:
detroit_tweets.search_api.rate_limit_status()
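
The status returned above is a nested dictionary. Assuming it follows the Twitter v1.1 `application/rate_limit_status` layout, with search limits under `resources` > `search` > `/search/tweets`, the remaining quota can be read as in the sketch below. The helper name `search_quota` and the sample values are hypothetical.

```python
from datetime import datetime, timezone


def search_quota(status):
    """Return (remaining, limit, reset time in UTC) for the search endpoint."""
    info = status["resources"]["search"]["/search/tweets"]
    reset_at = datetime.fromtimestamp(info["reset"], tz=timezone.utc)
    return info["remaining"], info["limit"], reset_at


# Hand-made sample in the assumed layout, not real output.
sample_status = {
    "resources": {
        "search": {
            "/search/tweets": {
                "limit": 180,
                "remaining": 175,
                "reset": 1608786000,
            }
        }
    }
}

remaining, limit, reset_at = search_quota(sample_status)
print(f"{remaining}/{limit} requests left, resets at {reset_at.isoformat()}")
```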