<h1>HASS & IRDC Summer School - Intro to GIS Analysis Workshop</h1>

<h3>Setup</h3>

Run the following block of code before continuing.

In [1]:
import warnings
import pandas as pd
import folium
warnings.filterwarnings("ignore")



<h2>Part 1: Point Distribution Analysis</h2>

Point distribution analysis is a method used to describe the shape, centre, and spread of geospatial data. This typically involves visualising the data, calculating the geo midpoint, and calculating statistics related to the distances between points in the dataset.
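
To make these ideas concrete, here is a minimal sketch of two of the quantities mentioned above, written with only the Python standard library. It is not GeoJikuu's implementation, and its results may differ from the library's. The helper names (`haversine_km`, `geo_midpoint`, `mean_pairwise_distance`) are hypothetical, the Earth is assumed to be a sphere of radius 6371 km, and the midpoint is assumed to be the average of the points' 3D unit vectors converted back to latitude and longitude.

```python
import math
from itertools import combinations

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius (spherical model)

def haversine_km(p, q):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def geo_midpoint(points):
    """Average the points as 3D unit vectors, then convert back to (lat, lon)."""
    x = y = z = 0.0
    for lat, lon in points:
        lat, lon = math.radians(lat), math.radians(lon)
        x += math.cos(lat) * math.cos(lon)
        y += math.cos(lat) * math.sin(lon)
        z += math.sin(lat)
    n = len(points)
    x, y, z = x / n, y / n, z / n
    lat = math.atan2(z, math.hypot(x, y))
    lon = math.atan2(y, x)
    return math.degrees(lat), math.degrees(lon)

def mean_pairwise_distance(points):
    """Mean great-circle distance over every unordered pair of points."""
    distances = [haversine_km(p, q) for p, q in combinations(points, 2)]
    return sum(distances) / len(distances)

# First three coordinates from the POW camps dataset used in this part
sample = [(14.546, 121.022), (14.64, 121.078), (14.657, 120.983)]
print(geo_midpoint(sample))
print(mean_pairwise_distance(sample))
```

Averaging unit vectors rather than raw latitudes and longitudes avoids distortion when points are far apart or straddle the 180° meridian, which is why it is used in this sketch.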

Complete each of the exercises below by running the code and answering any corresponding questions.

<b>Exercise (a):</b> Import the required module from GeoJikuu to perform point distribution analysis.

In [2]:
from geojikuu.descriptives.spatial_distribution import PointDistribution

<b>Exercise (b):</b> Load the dataset titled <i>'TLCMLayer_154 (POW camps).csv'</i> from file and store it as a DataFrame. Then, call the DataFrame's head() function to preview the first 5 observations in the dataset.

In [3]:
df = pd.read_csv("TLCMLayer_154 (POW camps).csv")
df.head()

Unnamed: 0,ghap_id,layer_id,title,record_type,latitude,longitude,datestart,dateend,linkback,created_at,updated_at,dataset_order,Country,Region,Description,HowManyPOWs,HowManyPOWsNotes,Coordinates notes
0,t7100,154,Assumption Convent,Other,14.546,121.022,1941,1945,https://en.wikipedia.org/wiki/List_of_Japanese...,2021-12-01 17:36:21,2023-12-11 17:48:22,0,Phillipine Islands,Manila Area: 31,"WWII Japanese Prisoner Of War camps, primarily...",,,Now a college. Burned down during the Liberati...
1,t7101,154,Ateneo de Manila,Other,14.64,121.078,1941,1945,https://en.wikipedia.org/wiki/List_of_Japanese...,2021-12-01 17:36:21,2023-12-11 17:48:22,1,Phillipine Islands,Manila Area: 31,"WWII Japanese Prisoner Of War camps, primarily...",,,
2,t7102,154,Bachrach Garage,Other,14.657,120.983,1941,1945,https://en.wikipedia.org/wiki/List_of_Japanese...,2021-12-01 17:36:21,2023-12-11 17:48:22,2,Phillipine Islands,Manila Area: 31,"WWII Japanese Prisoner Of War camps, primarily...",,,Estimated coordinates. POW used to repair Japa...
3,t7103,154,Bilibid,Other,14.382,121.029,1941,1945,https://en.wikipedia.org/wiki/List_of_Japanese...,2021-12-01 17:36:21,2023-12-11 17:48:22,3,Phillipine Islands,Manila Area: 31,"WWII Japanese Prisoner Of War camps, primarily...",,,http://philippinephilatelist.net/Collections/J...
4,t7104,154,Camp Murphy,Other,14.609,121.064,1941,1945,https://en.wikipedia.org/wiki/List_of_Japanese...,2021-12-01 17:36:21,2023-12-11 17:48:22,4,Phillipine Islands,Manila Area: 31,"WWII Japanese Prisoner Of War camps, primarily...",,,


Answer the following questions:
* What does this dataset represent?
* What does each row represent?
* What does each column represent?

<b>Exercise (c):</b> The following block of code performs an operation on the DataFrame and then calls the head() function to display the first five observations (just like the previous exercise). Run the code, and then explain what that operation has done.

In [4]:
df.drop(df.columns.difference(['title','latitude', 'longitude']), axis=1, inplace=True)
df.head()

Unnamed: 0,title,latitude,longitude
0,Assumption Convent,14.546,121.022
1,Ateneo de Manila,14.64,121.078
2,Bachrach Garage,14.657,120.983
3,Bilibid,14.382,121.029
4,Camp Murphy,14.609,121.064


<b>Exercise (d):</b> It's time to visualise the dataset as points on a map. Modify the colours, opacity, and size of the points to your liking, then run the following block of code.

In [5]:
research_area = [19.8407136, 111.2045519]
init_map = folium.Map(location=research_area, zoom_start=3)

init_coords = list(zip(df['latitude'], df['longitude']))

place_name = df["title"].values.tolist()

# Draw one filled circle per camp; folium's styling keywords use snake_case
for coords, name in zip(init_coords, place_name):
    popup = "<b>Place: </b>" + name
    folium.Circle(coords, color='orange', fill=True, fill_color='orange', fill_opacity=0.5, radius=1000, popup=popup).add_to(init_map)

folium.TileLayer(
    tiles = 'https://server.arcgisonline.com/ArcGIS/rest/services/World_Imagery/MapServer/tile/{z}/{y}/{x}',
    attr = 'Esri',
    name = 'Esri Satellite',
    overlay = False,
    control = True
    ).add_to(init_map)

init_map

<b>Exercise (e):</b> Given the map above, describe the visual distribution of the dataset. Can you see any interesting shapes or patterns? Where do the majority of observations seem to be?

<b>Exercise (f):</b> Run the following line of code to create a PointDistribution object. This will be used to calculate our point distribution metrics.

In [6]:
pow_pd = PointDistribution(list(zip(df['latitude'], df['longitude'])))

<b>Exercise (g):</b> Next, we will calculate the geo midpoint and place it on our map. Modify the visual properties of the geo midpoint, and then run the next block of code.

In [7]:
midpoint = pow_pd.geo_midpoint()
folium.Circle(midpoint, color='red', fill=True, fill_color='red', fill_opacity=0.5, radius=1000).add_to(init_map)
init_map

<b>Exercise (h):</b> In reference to the map above, where does the geo midpoint reside? Which observation is closest to the geo midpoint?

<b>Exercise (i):</b> Run the following code and explain what the output means. Keep in mind that, by default, these functions calculate the displacement between each point and every other point in the dataset.

In [8]:
pow_pd.mean_displacement()

2709.6464240151295

In [9]:
pow_pd.displacement_quartiles()

{'MIN': 0.0,
 'Q1': 1533.7053087238141,
 'MEDIAN': 2588.7100084978656,
 'Q3': 3810.4378937826814,
 'MAX': 8043.807931612851,
 'IQR': 2276.7325850588672,
 'RANGE': 8043.807931612851}
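
As a rough guide to reading a summary like this, the sketch below computes the same kind of statistics for a small list of made-up distances using Python's `statistics` module. The function name `five_number_summary` and the sample values are illustrative assumptions. GeoJikuu may use a different quartile method, so this is not expected to reproduce the library's numbers exactly.

```python
import statistics

def five_number_summary(values):
    """Return min, quartiles, max, IQR, and range for a list of numbers."""
    # method="inclusive" treats the data as the full population
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "MIN": min(values),
        "Q1": q1,
        "MEDIAN": median,
        "Q3": q3,
        "MAX": max(values),
        "IQR": q3 - q1,  # spread of the middle 50% of values
        "RANGE": max(values) - min(values),
    }

# Made-up pairwise distances, for illustration only
distances = [0.0, 10.0, 20.0, 30.0, 40.0]
print(five_number_summary(distances))
```

A quarter of the values fall at or below Q1 and three quarters at or below Q3, so the IQR describes the typical spread while the range is sensitive to the single most distant pair.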

<b>Exercise (j):</b> Use the information you have discovered to describe the dataset in a paragraph.

<b>Exercise (k):</b> Ponder the real-world implications of what you have found. Do you have any hypotheses or explanations regarding the results of the analysis? Feel free to share your thoughts and ideas with others.

In [10]:
init_map

<h2>Part 2: Spacetime Point Distribution Analysis</h2>

Spacetime distribution analysis is a method used to describe the shape, centre, and spread of spatiotemporal data. It is similar to point distribution analysis, but we instead calculate the spatiotemporal midpoint, and calculate statistics related to the spatiotemporal distances between points.
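
The temporal side of this analysis can be sketched with the standard library alone. The example below assumes dates are day-first strings (`dd/mm/yyyy`), measures temporal displacement in days, and takes the temporal midpoint as the mean of the dates. The helper names are hypothetical, and GeoJikuu's own definitions may differ.

```python
from datetime import datetime, timedelta
from itertools import combinations

DATE_FORMAT = "%d/%m/%Y"  # assumed day-first date format

def parse_date(text):
    """Parse a dd/mm/yyyy string into a datetime."""
    return datetime.strptime(text, DATE_FORMAT)

def mean_temporal_displacement(dates):
    """Mean absolute gap in days over every unordered pair of dates."""
    parsed = [parse_date(d) for d in dates]
    gaps = [abs((a - b).days) for a, b in combinations(parsed, 2)]
    return sum(gaps) / len(gaps)

def temporal_midpoint(dates):
    """Mean date, computed as the average offset in days from the earliest date."""
    parsed = [parse_date(d) for d in dates]
    start = min(parsed)
    mean_offset = sum((d - start).days for d in parsed) / len(parsed)
    return (start + timedelta(days=mean_offset)).strftime(DATE_FORMAT)

# Illustrative dates only
dates = ["27/11/2018", "26/11/2018", "25/11/2018"]
print(mean_temporal_displacement(dates))
print(temporal_midpoint(dates))
```

Spatial and temporal displacements are kept as separate numbers here because kilometres and days have no common unit.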

As before, complete each of the exercises below by running the code and answering any corresponding questions.

<b>Exercise (a):</b> Import the required module from GeoJikuu to perform spacetime point distribution analysis.

In [11]:
from geojikuu.descriptives.spacetime_distribution import SpacetimePointDistribution

<b>Exercise (b):</b> Load the dataset titled <i>'Japan earthquakes 2018.csv'</i> from file and store it as a DataFrame. Then, call the DataFrame's head() function to preview the first 5 observations in the dataset.

In [12]:
df = pd.read_csv("Japan earthquakes 2018.csv")
df.head()

Unnamed: 0,date,latitude,longitude,depth,mag,magType,nst,gap,dmin,rms,...,id,place,type,horizontalError,depthError,magError,magNst,status,locationSource,magSource
0,27/11/2018,48.378,154.962,35.0,4.9,mb,,92,5.044,0.63,...,us1000hx1b,"269km SSW of Severo-Kuril'sk, Russia",earthquake,7.6,1.7,0.036,248,reviewed,us,us
1,26/11/2018,36.0733,139.783,48.82,4.8,mww,,113,1.359,1.13,...,us1000hwvf,"3km SSW of Sakai, Japan",earthquake,6.0,6.1,0.071,19,reviewed,us,us
2,26/11/2018,50.0727,156.142,66.34,4.6,mb,,128,3.191,0.62,...,us1000hwkm,"67km S of Severo-Kuril'sk, Russia",earthquake,9.7,7.8,0.045,151,reviewed,us,us
3,26/11/2018,38.8576,141.8384,50.56,4.5,mb,,145,1.286,0.84,...,us1000hwmi,"26km SSE of Ofunato, Japan",earthquake,8.4,9.5,0.156,12,reviewed,us,us
4,25/11/2018,33.95,134.4942,38.19,4.6,mb,,104,0.558,0.61,...,us1000hwig,"9km SW of Komatsushima, Japan",earthquake,3.4,10.1,0.132,17,reviewed,us,us


Answer the following questions:
* What does this dataset represent?
* What does each row represent?
* What does each column represent?

<b>Exercise (c):</b> Run the following line of code to drop the unwanted variables. Then, preview the results using head().

In [13]:
df.drop(df.columns.difference(['latitude', 'longitude', 'date', 'place']), axis=1, inplace=True)
df.head()

Unnamed: 0,date,latitude,longitude,place
0,27/11/2018,48.378,154.962,"269km SSW of Severo-Kuril'sk, Russia"
1,26/11/2018,36.0733,139.783,"3km SSW of Sakai, Japan"
2,26/11/2018,50.0727,156.142,"67km S of Severo-Kuril'sk, Russia"
3,26/11/2018,38.8576,141.8384,"26km SSE of Ofunato, Japan"
4,25/11/2018,33.95,134.4942,"9km SW of Komatsushima, Japan"


<b>Exercise (d):</b> Run the following line of code to create a SpacetimePointDistribution object. This will be used to calculate our spacetime point distribution metrics.

In [14]:
eq_sd = SpacetimePointDistribution(list(zip(df['latitude'], df['longitude'], df["date"])))

<b>Exercise (e):</b> Run the following line of code and explain what the output means.

In [15]:
eq_sd.mean_displacement()

{'MEAN SPATIAL DISPLACEMENT': 1301.3662294401524,
 'MEAN TEMPORAL DISPLACEMENT': 106.00779134071527}

<b>Exercise (f):</b> Again, run the following line of code and explain what the output means.

In [16]:
eq_sd.displacement_quartiles()

{'MIN': {'DIS': 0.3493184190352504, 'TEMP': 0},
 'Q1': {'DIS': 652.7609312758815, 'TEMP': 37.0},
 'MEDIAN': {'DIS': 1230.5673335507308, 'TEMP': 88.0},
 'Q3': {'DIS': 1812.10748919843, 'TEMP': 165.0},
 'MAX': {'DIS': 3999.6194348761337, 'TEMP': 329},
 'IQR': {'DIS': 1159.3465579225485, 'TEMP': 128.0},
 'RANGE': {'DIS': 3999.2701164570985, 'TEMP': 329}}

<b>Exercise (g):</b> Finally, run the following line of code and explain what the output means.

In [17]:
eq_sd.geo_temporal_midpoint()

(36.811573171324596, 140.76995788198764, '11/07/2018')

<b>Exercise (h):</b> Use the information you have discovered to describe the dataset in a paragraph.

<b>Exercise (i):</b> Ponder the real-world implications of what you have found. Do you have any hypotheses or explanations regarding the results of the analysis? Feel free to share your thoughts and ideas with others.

<b>Exercise (j) (CHALLENGE):</b> Repurpose the appropriate code from Part 1 to map the spatial coordinates of the <i>'Japan earthquakes 2018.csv'</i> dataset. Add the 'place' and 'date' variables of each earthquake to each point's popup window.