### Today's agenda

* Location analytics
    - Real-time geographic data analytics
    - Historical geographic data analytics
* Important fields where location analytics is used
* Python packages for geospatial visualization
* Earthquake data preparation
* Earthquake data visualization

<center><h1>Location Analytics - Geospatial Visualization</h1></center>

* Location analytics is the ability to gain insights from the location (geographic) component of business data.
* Its key ingredient is the location data itself.
* GIS - Geographic Information System


<img src="https://external-content.duckduckgo.com/iu/?u=https%3A%2F%2Fwebassets.tomtom.com%2Fm%2F4b2c2dbb5560493c%2Foriginal%2Flocation-intelligence-hero_desktop_loc-intel_2000x1200.jpg&f=1&nofb=1">

<br>

**Credits** - Image from Internet

### Real-time Geographic Data Analytics

Real-time geographic data analytics means drawing insights from data as it arrives in the system and relating it to a particular location.

* Getting route navigation in Google Maps
* Courier and postal services
* Military services
    - Tracking the exact location of enemy movements on the map as they happen

<br>

<img src="https://external-content.duckduckgo.com/iu/?u=https%3A%2F%2Flh5.ggpht.com%2F8v_bu98_4M0bxKmmlnLaSlfkRG8EYRjqrw4W4zCUCN4lRYr-lD0EsTUaC8VZXg_Yfq3_%3Dw300&f=1&nofb=1">

**Credits** - Image from Internet

### Historical Geographic Data Analytics

Historical geographic data analytics means drawing insights from past, location-tagged data to estimate where an event or disaster is likely to occur in the future. Such predictions are probabilistic, not certain.

* Predict an occurrence from past data and take preventive steps
* Disaster prevention efforts
    - Floods
    - Volcanoes
    - Earthquakes

* Chain retailers use historical, location-based spending habits to increase sales

<img src="https://external-content.duckduckgo.com/iu/?u=https%3A%2F%2Fsites.google.com%2Fsite%2Fnaturaldisasterpreventiongin%2F_%2Frsrc%2F1358266713367%2Fprevention-prediction%2Fimgres.jpeg%3Fheight%3D240%26width%3D320&f=1&nofb=1">

**Credits** - Image from Internet

### Important fields/topics

Location analytics (geospatial science) applies to any field of study where the data is tied to a location.

<img src="https://www.cdc.gov/gis/flexslider/images/slide1_data.jpg">

**Credits** - Image from Internet

* Demographic study - statistical study of the population of an area, broken down by location
    - Analysis based on education details
    - Analysis based on ethnicity
    - Analysis based on religion
    - Analysis based on births and deaths
    - ...
* Measurement of **natural occurrences** that happen periodically
* Tracking the rise or fall of pollution in a particular city or country
* Geographic crime analysis
    - Analyse the crime data
    - Identify the crime types and hotspots
    - Help law enforcement plan for crime prevention
* etc.

### Python packages for Geospatial Visualization

* GeoViews      → explore and visualize geographical, meteorological, and oceanographic datasets
* Folium        → widely used geospatial visualization library built on the Leaflet.js framework
* Plotly        → interactive map visualization; its `mapbox` traces are built on Mapbox
* KeplerGL      → Jupyter Notebook widget for visualizing large geospatial datasets
* GeoPandas     → workhorse for working with geospatial data in a pandas-like way
* geonamescache → retrieves location datasets as Python dictionaries

### `import` packages

In [1]:
import pandas as pd
import plotly.graph_objects as go

If you do not have the above packages, you can install them by running these commands in `Command Prompt` (CMD):

* `pip install pandas --user`
* `pip install plotly --user`

### Dataset description

**Earthquake data (past day)** - The data comes from the `USGS` earthquake feed, which is updated every minute. This example uses a saved snapshot (`all_day.csv`) rather than the live stream. The fields used here are:

* time
* latitude
* longitude
* mag (magnitude)
* place

In [2]:
eqdf = pd.read_csv('all_day.csv')
eqdf.shape

(355, 22)
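
As an alternative to loading all 22 columns and subsetting later, the needed fields can be selected while reading the file. The sketch below is only an illustration: it reads three sample rows (copied from the `head()` output in this notebook) from an in-memory string instead of `all_day.csv`, and the names `sample_csv`, `wanted`, and `sample_df` are made up for this example.

```python
import io

import pandas as pd

# Three sample rows copied from the head() output in this notebook.
# `depth` is kept in the sample only to show that usecols drops it.
sample_csv = io.StringIO(
    'time,latitude,longitude,depth,mag,place\n'
    '2020-12-16T10:15:32.270Z,38.802502,-122.818337,1.14,0.55,"6km WNW of The Geysers, CA"\n'
    '2020-12-16T10:13:43.276Z,35.867401,-96.679527,3.884372,1.8,"9 km WSW of Shamrock, Oklahoma"\n'
    '2020-12-16T09:50:35.827Z,17.7737,-65.7058,10.6,3.5,"31 km SE of Emajagua, Puerto Rico"\n'
)

wanted = ['time', 'latitude', 'longitude', 'mag', 'place']

# usecols loads only the listed columns; parse_dates converts `time`
# from plain text into timestamps.
sample_df = pd.read_csv(sample_csv, usecols=wanted, parse_dates=['time'])

print(sample_df.shape)
print(sample_df.dtypes)
```

With the real file, the same two keyword arguments could be passed to `pd.read_csv('all_day.csv', ...)`.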

In [3]:
eqdf.head()

Unnamed: 0,time,latitude,longitude,depth,mag,magType,nst,gap,dmin,rms,...,updated,place,type,horizontalError,depthError,magError,magNst,status,locationSource,magSource
0,2020-12-16T10:15:32.270Z,38.802502,-122.818337,1.14,0.55,md,7.0,127.0,0.01697,0.04,...,2020-12-16T10:17:09.175Z,"6km WNW of The Geysers, CA",earthquake,0.7,2.13,,1.0,automatic,nc,nc
1,2020-12-16T10:13:43.276Z,35.867401,-96.679527,3.884372,1.8,ml,24.0,53.922234,0.079339,0.159045,...,2020-12-16T10:15:37.030Z,"9 km WSW of Shamrock, Oklahoma",earthquake,2.4485,2.067139,,18.0,automatic,ok,ok
2,2020-12-16T09:55:30.640Z,19.504999,-155.652328,4.92,1.86,md,20.0,56.0,,0.33,...,2020-12-16T09:58:29.890Z,"22 km ENE of Honaunau-Napoopoo, Hawaii",earthquake,0.7,1.37,1.92,4.0,automatic,hv,hv
3,2020-12-16T09:50:35.827Z,17.7737,-65.7058,10.6,3.5,ml,,150.0,0.387,0.34,...,2020-12-16T10:13:48.013Z,"31 km SE of Emajagua, Puerto Rico",earthquake,3.6,10.8,0.057,41.0,reviewed,us,us
4,2020-12-16T09:35:49.950Z,37.635834,-118.879669,4.65,1.31,md,12.0,90.0,0.01608,0.02,...,2020-12-16T09:50:05.053Z,"9km E of Mammoth Lakes, CA",earthquake,0.39,0.79,0.22,13.0,automatic,nc,nc


In [4]:
print(eqdf.columns)

Index(['time', 'latitude', 'longitude', 'depth', 'mag', 'magType', 'nst',
       'gap', 'dmin', 'rms', 'net', 'id', 'updated', 'place', 'type',
       'horizontalError', 'depthError', 'magError', 'magNst', 'status',
       'locationSource', 'magSource'],
      dtype='object')


In [5]:
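# take subset data (time, latitude, longitude, mag, place)
# .copy() gives an independent frame, which avoids a possible
# SettingWithCopyWarning when columns are added later
eqdf = eqdf[['time', 'latitude', 'longitude', 'mag', 'place']].copy()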

In [6]:
# head()
eqdf.head()

Unnamed: 0,time,latitude,longitude,mag,place
0,2020-12-16T10:15:32.270Z,38.802502,-122.818337,0.55,"6km WNW of The Geysers, CA"
1,2020-12-16T10:13:43.276Z,35.867401,-96.679527,1.8,"9 km WSW of Shamrock, Oklahoma"
2,2020-12-16T09:55:30.640Z,19.504999,-155.652328,1.86,"22 km ENE of Honaunau-Napoopoo, Hawaii"
3,2020-12-16T09:50:35.827Z,17.7737,-65.7058,3.5,"31 km SE of Emajagua, Puerto Rico"
4,2020-12-16T09:35:49.950Z,37.635834,-118.879669,1.31,"9km E of Mammoth Lakes, CA"


In [7]:
# shape
eqdf.shape

(355, 5)

In [9]:
# take "place" separately and display head
eqdf['place'].head(10)

0                6km WNW of The Geysers, CA
1            9 km WSW of Shamrock, Oklahoma
2    22 km ENE of Honaunau-Napoopoo, Hawaii
3         31 km SE of Emajagua, Puerto Rico
4                9km E of Mammoth Lakes, CA
5                   24 km S of Mina, Nevada
6                   30 km S of Mina, Nevada
7                   24 km S of Mina, Nevada
8                 35 km SSW of Mina, Nevada
9                   25 km S of Mina, Nevada
Name: place, dtype: object

### Data Preparation

In [10]:
# example
dp = '6km WNW of The Geysers, CA'
sa, a = dp.split(', ')
print(sa)
print(a)

6km WNW of The Geysers
CA


Split `place` on the separator → **`, `** (a comma followed by a space)

In [11]:
# area_list → split
area_list = eqdf['place'].str.split(', ').to_list()

Print first `5` from `area_list`

In [12]:
area_list[:5]

[['6km WNW of The Geysers', 'CA'],
 ['9 km WSW of Shamrock', 'Oklahoma'],
 ['22 km ENE of Honaunau-Napoopoo', 'Hawaii'],
 ['31 km SE of Emajagua', 'Puerto Rico'],
 ['9km E of Mammoth Lakes', 'CA']]

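Build two lists, `area` and `sub_area`, from `area_list`. When a place has exactly two parts, the second part is the area; otherwise the first part is used for both.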

In [13]:
area = []
sub_area = []

for p in area_list:
    if len(p) == 2:
        area.append(p[1])
        sub_area.append(p[0])
    else:
        area.append(p[0])
        sub_area.append(p[0])
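
The loop above uses the first part for both values whenever a place does not have exactly two parts. A possible variation, shown here only as a sketch, splits on the last `, ` so that a place with extra commas still gets its final part as the area. Note that this differs from the loop's behaviour for such places. The helper name `split_place` and the third example string are hypothetical.

```python
def split_place(place):
    """Return (sub_area, area) for a place string."""
    # rpartition splits on the LAST ', ', so any earlier commas
    # stay inside sub_area.
    sub_area, sep, area = place.rpartition(', ')
    if not sep:
        # No ', ' at all (e.g. 'Banda Sea'): use the whole text for both,
        # the same result the loop above produces for this case.
        return place, place
    return sub_area, area


print(split_place('6km WNW of The Geysers, CA'))
print(split_place('Banda Sea'))
# Hypothetical place with two commas, for illustration only.
print(split_place('5 km N of Town, Region, Country'))
```

Applied to the whole column, this could be used as `[split_place(p) for p in eqdf['place']]`.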

Print first `5` from `area`

In [16]:
area[:5]

['CA', 'Oklahoma', 'Hawaii', 'Puerto Rico', 'CA']

Print first `5` from `sub_area`

In [17]:
sub_area[:5]

['6km WNW of The Geysers',
 '9 km WSW of Shamrock',
 '22 km ENE of Honaunau-Napoopoo',
 '31 km SE of Emajagua',
 '9km E of Mammoth Lakes']

Add `area` and `sub_area` as columns in `eqdf`

In [18]:
eqdf['sub_area'] = sub_area
eqdf['area'] = area

In [19]:
# head()
eqdf.head()

Unnamed: 0,time,latitude,longitude,mag,place,sub_area,area
0,2020-12-16T10:15:32.270Z,38.802502,-122.818337,0.55,"6km WNW of The Geysers, CA",6km WNW of The Geysers,CA
1,2020-12-16T10:13:43.276Z,35.867401,-96.679527,1.8,"9 km WSW of Shamrock, Oklahoma",9 km WSW of Shamrock,Oklahoma
2,2020-12-16T09:55:30.640Z,19.504999,-155.652328,1.86,"22 km ENE of Honaunau-Napoopoo, Hawaii",22 km ENE of Honaunau-Napoopoo,Hawaii
3,2020-12-16T09:50:35.827Z,17.7737,-65.7058,3.5,"31 km SE of Emajagua, Puerto Rico",31 km SE of Emajagua,Puerto Rico
4,2020-12-16T09:35:49.950Z,37.635834,-118.879669,1.31,"9km E of Mammoth Lakes, CA",9km E of Mammoth Lakes,CA


In [20]:
# shape
eqdf.shape

(355, 7)

Remove the column `place` from `eqdf`

In [21]:
# prep_df: eqdf without the `place` column
prep_df = eqdf.drop(columns=['place'])

In [22]:
# head()
prep_df.head()

Unnamed: 0,time,latitude,longitude,mag,sub_area,area
0,2020-12-16T10:15:32.270Z,38.802502,-122.818337,0.55,6km WNW of The Geysers,CA
1,2020-12-16T10:13:43.276Z,35.867401,-96.679527,1.8,9 km WSW of Shamrock,Oklahoma
2,2020-12-16T09:55:30.640Z,19.504999,-155.652328,1.86,22 km ENE of Honaunau-Napoopoo,Hawaii
3,2020-12-16T09:50:35.827Z,17.7737,-65.7058,3.5,31 km SE of Emajagua,Puerto Rico
4,2020-12-16T09:35:49.950Z,37.635834,-118.879669,1.31,9km E of Mammoth Lakes,CA


Consider the data where magnitude is greater than or equal to `2`

In [23]:
prep_df.shape

(355, 6)

In [24]:
# prep_df
prep_df = prep_df[prep_df['mag'] >= 2]

In [25]:
# shape
prep_df.shape

(70, 6)

In [26]:
prep_df.head()

Unnamed: 0,time,latitude,longitude,mag,sub_area,area
3,2020-12-16T09:50:35.827Z,17.7737,-65.7058,3.5,31 km SE of Emajagua,Puerto Rico
15,2020-12-16T08:56:57.370Z,39.0285,-123.065002,2.45,13km W of Lakeport,CA
22,2020-12-16T07:58:55.461Z,-32.8577,-72.0384,4.5,43 km WNW of Valparaíso,Chile
24,2020-12-16T07:40:08.790Z,38.1397,-118.055,2.1,28 km S of Mina,Nevada
29,2020-12-16T07:06:49.237Z,61.5395,-146.439,2.4,45 km N of Valdez,Alaska


### Data Visualization

Check the frequency of occurrence by `area`

In [27]:
# occ_freq → value_counts().to_frame()
occ_freq = prep_df['area'].value_counts().to_frame()

In [28]:
occ_freq

Unnamed: 0,area
Puerto Rico,15
CA,13
Alaska,11
Nevada,8
Hawaii,5
Idaho,3
Chile,3
Oklahoma,2
Japan,2
Dominican Republic,2
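
The name of the counts column produced by `value_counts().to_frame()` depends on the pandas version: older versions name it after the source column (`area`, as in the output above), while pandas 2.0 and later name it `count`. A minimal sketch that sets both names explicitly, so later code does not depend on the version, could look like this. The five sample values are the first five entries of `area` shown earlier, and the names `areas` and `occ` are made up for this example.

```python
import pandas as pd

# First five entries of `area`, as printed earlier in this notebook.
areas = pd.Series(['CA', 'Oklahoma', 'Hawaii', 'Puerto Rico', 'CA'], name='area')

occ = (
    areas.value_counts()           # counts per distinct value, largest first
    .rename_axis('area')           # name the index explicitly
    .reset_index(name='count')     # index -> 'area' column, counts -> 'count'
)

print(occ)
```

A pie chart could then use `labels=occ['area']` and `values=occ['count']`.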


The default pandas plot is a static image rather than an interactive chart, so it is left commented out here.

In [37]:
# occ_freq.plot(kind='pie', subplots=True)

In [30]:
# index
occ_freq.index

Index(['Puerto Rico', 'CA', 'Alaska', 'Nevada', 'Hawaii', 'Idaho', 'Chile',
       'Oklahoma', 'Japan', 'Dominican Republic', 'Philippines', 'Banda Sea',
       'Vanuatu', 'Washington', 'Honduras'],
      dtype='object')

`Pie` chart with `Plotly`

* labels → index
* values → area
* margins (l, r, t, b) → 0

refer → https://plotly.com/python/pie-charts/

In [49]:
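# pie chart
trace = go.Pie(
    labels=occ_freq.index,
    # the only column holds the counts; it is named 'area' in older pandas
    # and 'count' in pandas 2.0+, so select it by position
    values=occ_freq.iloc[:, 0]
)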

layout = go.Layout(
    height=400,
    width=600,
    margin=dict(l=0, r=0, t=0, b=0)
)

fig = go.Figure(data=[trace], layout=layout)
fig.show()

`Bar` chart of magnitude based on area with `Plotly`

* x → sub_area
* y → mag
* margins (l, r, t, b) → 0

refer → https://plotly.com/python/bar-charts/

In [45]:
region = 'Alaska'
# rdf from prep_df
rdf = prep_df[prep_df['area'] == region]

In [46]:
# all rows for the selected region
rdf

Unnamed: 0,time,latitude,longitude,mag,sub_area,area
29,2020-12-16T07:06:49.237Z,61.5395,-146.439,2.4,45 km N of Valdez,Alaska
32,2020-12-16T06:58:29.763Z,60.5611,-152.0898,2.0,42 km W of Salamatof,Alaska
50,2020-12-16T05:49:24.834Z,53.6285,-160.1968,5.3,191 km S of Sand Point,Alaska
70,2020-12-16T04:24:53.513Z,60.5795,-142.638,3.6,96 km S of McCarthy,Alaska
79,2020-12-16T03:46:19.322Z,60.9246,-149.0159,2.2,8 km ESE of Girdwood,Alaska
99,2020-12-16T02:34:49.938Z,54.6518,-159.9375,3.1,84 km SSE of Sand Point,Alaska
276,2020-12-15T15:53:31.414Z,63.1054,-148.9871,2.0,31 km S of Cantwell,Alaska
293,2020-12-15T14:39:56.160Z,60.5026,-151.9571,2.6,37 km WSW of Salamatof,Alaska
298,2020-12-15T14:31:53.551Z,61.0575,-152.35,2.2,65 km W of Tyonek,Alaska
343,2020-12-15T11:36:39.304Z,62.8413,-150.5035,3.7,40 km NNE of Petersville,Alaska


In [48]:
# bar chart
trace = go.Bar(
    x=rdf['sub_area'],
    y=rdf['mag']
)

layout = go.Layout(
    height=400,
    width=600,
    margin=dict(l=0, r=0, b=0, t=0)
)

fig = go.Figure(data=[trace], layout=layout)
fig.show()
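
The bars above appear in row order. Sorting the rows by magnitude first can make the chart easier to read. This is only a sketch: it uses five Alaska rows copied from the `rdf` output above, and the names `sample_rdf` and `sorted_rdf` are made up for this example.

```python
import pandas as pd

# Five Alaska rows copied from the rdf output above.
sample_rdf = pd.DataFrame({
    'sub_area': [
        '45 km N of Valdez',
        '42 km W of Salamatof',
        '191 km S of Sand Point',
        '96 km S of McCarthy',
        '8 km ESE of Girdwood',
    ],
    'mag': [2.4, 2.0, 5.3, 3.6, 2.2],
})

# Largest magnitude first, so the tallest bar is drawn on the left.
sorted_rdf = sample_rdf.sort_values('mag', ascending=False)

print(sorted_rdf)
```

The sorted columns `sorted_rdf['sub_area']` and `sorted_rdf['mag']` could then be passed as `x` and `y` to `go.Bar`.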

### Map Visualization

* lat → latitudes
* lon → longitudes
* mode → markers
    - size → 10
    - color → red
* text → sub_area

refer → https://plotly.com/python/scattermapbox/

In [53]:
region = 'Alaska'

rdf = prep_df[prep_df['area'] == region]
rdf.head()

Unnamed: 0,time,latitude,longitude,mag,sub_area,area
29,2020-12-16T07:06:49.237Z,61.5395,-146.439,2.4,45 km N of Valdez,Alaska
32,2020-12-16T06:58:29.763Z,60.5611,-152.0898,2.0,42 km W of Salamatof,Alaska
50,2020-12-16T05:49:24.834Z,53.6285,-160.1968,5.3,191 km S of Sand Point,Alaska
70,2020-12-16T04:24:53.513Z,60.5795,-142.638,3.6,96 km S of McCarthy,Alaska
79,2020-12-16T03:46:19.322Z,60.9246,-149.0159,2.2,8 km ESE of Girdwood,Alaska


In [54]:
region_lats = rdf['latitude'].to_list()
region_lons = rdf['longitude'].to_list()
region_text = rdf['sub_area'].to_list()

In [55]:
trace = go.Scattermapbox(
    lat=region_lats,
    lon=region_lons,
    mode='markers',
    marker=dict(
        size=10,
        color='red'
    ),
    text=region_text,
    hoverinfo='text',
    showlegend=False
)

layout = go.Layout(
    height=400,
    width=600,
    margin=dict(l=0, r=0, t=0, b=0),
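    # 'stamen-terrain' (the original choice) now requires a Stadia Maps
    # account; 'open-street-map' works without any token
    mapbox_style='open-street-map',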
    mapbox=dict(
        center=dict(
            lat=region_lats[0],
            lon=region_lons[0]
        ),
        zoom=3
    )
)

fig = go.Figure(data=[trace], layout=layout)
fig.show()
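
The map above is centered on the first point in the region. Another option, sketched below, is to center it on the average of all the points. The helper name `map_center` is hypothetical. A plain average of longitudes is an assumption that only holds when the points do not straddle the ±180° meridian, which is the case for the Alaska rows shown here.

```python
from statistics import fmean


def map_center(lats, lons):
    """Return a center dict holding the mean latitude and longitude."""
    # Simple arithmetic means; not suitable for points that cross
    # the +/-180 degree meridian.
    return {'lat': fmean(lats), 'lon': fmean(lons)}


# Coordinates of the five Alaska rows shown in rdf.head() above.
sample_lats = [61.5395, 60.5611, 53.6285, 60.5795, 60.9246]
sample_lons = [-146.439, -152.0898, -160.1968, -142.638, -149.0159]

print(map_center(sample_lats, sample_lons))
```

The result could be passed as `center=map_center(region_lats, region_lons)` inside the `mapbox` settings of the layout.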

In [None]:
# from IPython.display import Image
# Image(filename='newplot.png')

### What did we learn?

* Location Analytics
* Where it is used and why it matters to businesses and government agencies
* Python modules that support geospatial visualization
* Earthquake data - using pandas
    - Data preparation
    - Data filtering
* Earthquake data visualization

In [56]:
"BYE"

'BYE'