In [3]:
import requests
from IPython.display import Markdown

url = 'https://kata.geosci.ai/challenge/birthquakes'
r = requests.get(url)
print(r.status_code)
Markdown(r.text)

200


# Birthquakes

We are going to look at earthquakes, on your birthdate. Birthquakes!

We will also be implementing the haversine formula for determining the distance between two points on the earth's surface.

This challenge is a bit different from the previous ones. You can use any old string for your key, as usual, but if you use a date, you'll get data for that date. For example:

      url = 'https://kata.geosci.ai/challenge/birthquakes'
      params = {'key': '1980-06-30'}  # <-- The key can be a date.
      r = requests.get(url, params)

Your challenge input is now `r.text`. There is a header row containing the names of the columns, plus a number of data rows or 'records'. Each row has 13 columns, and represents the data for a single earthquake.
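
As a rough illustration of the header-plus-records layout, the sketch below parses a tiny made-up sample with the standard library. The `|` delimiter is taken from the `pd.read_csv(..., sep='|')` call used later in this notebook; the sample rows, the reduced set of columns, and the names `sample` and `records` are assumptions for illustration only, not real challenge data.

```python
import csv
from io import StringIO

# Made-up, abbreviated sample: the real input has 13 columns per row.
sample = (
    "#EventID|Time|Latitude|Longitude|Depth/km|Magnitude\n"
    "ev1|2000-01-01T00:00:00|10.0|20.0|5.0|3.1\n"
    "ev2|2000-01-01T01:00:00|11.0|21.0|12.5|4.2\n"
)

# DictReader uses the header row for field names; each later row is one record.
records = list(csv.DictReader(StringIO(sample), delimiter="|"))

print(len(records))             # number of records, header excluded
print(records[1]["Magnitude"])  # values arrive as strings until converted
```

With the real data, `r.text` would take the place of `sample`.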

You need to answer the following questions:

1. How many records (i.e. earthquakes) are there?
2. What is the depth **in metres** of the earthquake with the largest **Magnitude**? (If there's more than one, give the deepest.)
3. What is the great circle distance **to the nearest km**, as given by the haversine formula, between the epicentres of the two **largest** earthquakes, as measured by magnitude? (Again, if two earthquakes are equal in magnitude, choose the deepest first.)
4. Consider all pairs of events. How many pairs are within 100 km of each other? (The events must be less than 100 km from each other. A pair that is exactly 100 km apart would **not** be included.)

Note that because we're asking about epicentres, you don't need to worry about depth when calculating great circle distances.

For Question 4, only count unique pairs. For example, the diagram below contains 15 pairs of points altogether, of which 7 are less than 100 km apart (1 pair on the left and 6 on the right):

      
      x                  x
                            x
         x              x  x
            ==========
              100 km
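
The counting in the diagram can be reproduced with `itertools.combinations`, which yields each unordered pair exactly once. This is only a sketch: the coordinates are invented planar positions in km chosen to resemble the picture, and plain Euclidean distance stands in for the haversine distance used in the challenge itself.

```python
from itertools import combinations
from math import dist

# Hypothetical planar positions in km: two points on the left, four on the right.
points = [(0, 0), (30, 20), (190, 0), (220, 10), (180, 20), (210, 20)]

# combinations(points, 2) yields every unordered pair once, with no self-pairs.
pairs = list(combinations(points, 2))

# Strictly less than 100 km, so a pair exactly 100 km apart would not count.
close = [pair for pair in pairs if dist(*pair) < 100]

print(len(pairs), len(close))
```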


## Haversine formula

There are several formulas for computing [great circle distance](https://en.wikipedia.org/wiki/Great-circle_distance) on a sphere. The simplest accurate one is the haversine formula, which is described here.

Given two points with (_latitude_, _longitude_), we'll denote point 1 with $(\varphi_1, \lambda_1)$ and point 2 with $(\varphi_2, \lambda_2)$. Then distance _d_ is related to radius _r_ by:

$$   d  = 2r \arcsin\left(\sqrt{\sin^2\left(\frac{\varphi_2 - \varphi_1}{2}\right) + \cos(\varphi_1) \cos(\varphi_2)\sin^2\left(\frac{\lambda_2 - \lambda_1}{2}\right)}\right)$$
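
As a quick sanity check of the formula, take two points on the equator one degree of longitude apart, so $\varphi_1 = \varphi_2 = 0$ and $\Delta\lambda = \lambda_2 - \lambda_1 = \pi/180$ radians. The latitude term vanishes and $\cos(0) = 1$, so the expression should reduce to the arc length $r\,\Delta\lambda$ (taking $r = 6371$ km, the radius this challenge specifies):

$$ d = 2r \arcsin\left(\sqrt{\sin^2\left(\frac{\Delta\lambda}{2}\right)}\right) = 2r \cdot \frac{\Delta\lambda}{2} = r\,\Delta\lambda = 6371 \times \frac{\pi}{180} \approx 111.19\ \mathrm{km} $$

which rounds to 111 km.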

Some hints about implementing this in Python:

- Use $r = 6371\ \mathrm{km}$ for the radius of the earth.
- $\sin^2(x)$ means $\sin(x) \times \sin(x)$.
- Both the `math` module and NumPy have the functions `sin()`, `cos()`; these functions expect radians, so an angle in degrees must be converted to radians with `radians()` before giving it to the function.
- The arcsine function in `math` is called `asin()`; in NumPy it's `arcsin()`.
- The function should return distances **to the nearest km**.
- You should get the following results from your function:
  - The distance from (0, 0) to (0, 1) is 111 km.
  - The distance from (0, 2.35) to (90, 2.35) is 10008 km. [(Why?)](https://en.wikipedia.org/wiki/History_of_the_metre)
  - The distance from (44.65, -63.58) to (53.73, -1.86) is 4448 km.
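
The hints above could be turned into a NumPy function along the following lines. This is a minimal sketch rather than a reference solution: the name `haversine_km` and its four-argument signature are assumptions, and it accepts scalars or arrays of degrees.

```python
import numpy as np

def haversine_km(lat1, lon1, lat2, lon2, r=6371.0):
    """Great circle distance in km (rounded) between points given in degrees."""
    # NumPy's trig functions expect radians, so convert every angle first.
    phi1, lam1, phi2, lam2 = map(np.radians, (lat1, lon1, lat2, lon2))
    # sin^2(x) is written as sin(x) ** 2.
    a = (np.sin((phi2 - phi1) / 2) ** 2
         + np.cos(phi1) * np.cos(phi2) * np.sin((lam2 - lam1) / 2) ** 2)
    return np.round(2 * r * np.arcsin(np.sqrt(a)))

# The three check values quoted in the hints.
print(haversine_km(0, 0, 0, 1))
print(haversine_km(0, 2.35, 90, 2.35))
print(haversine_km(44.65, -63.58, 53.73, -1.86))
```

Because every operation is element-wise, the same function should also work on whole columns of latitudes and longitudes at once.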


## A quick reminder how this works

This document is formatted in Markdown.

You can retrieve your data, which is always a string, by choosing any Python string as a **`<KEY>`** and substituting here:
    
    https://kata.geosci.ai/challenge/birthquakes?key=<KEY>
                                                     ^^^^^
                                                     use your own string here

To answer question 1, make a request like:

    https://kata.geosci.ai/challenge/birthquakes?key=<KEY>&question=1&answer=1234
                                                     ^^^^^          ^        ^^^^
                                                     your key       Q        your answer

To get a hint for question 1, do this (a key is not needed):

    https://kata.geosci.ai/challenge/birthquakes?question=1
                                                          ^
                                                          Q
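
The three URL shapes above differ only in their query parameters, so they can be assembled programmatically. The sketch below uses only the standard library and makes no network request; `build_url` is a hypothetical helper name, and the key and answer values are placeholders.

```python
from urllib.parse import urlencode

BASE_URL = "https://kata.geosci.ai/challenge/birthquakes"

def build_url(**params):
    """Append the given query parameters to the challenge URL."""
    query = urlencode(params)
    return f"{BASE_URL}?{query}" if query else BASE_URL

data_url = build_url(key="1992-07-03")                            # fetch the data
answer_url = build_url(key="1992-07-03", question=1, answer=1234)  # submit an answer
hint_url = build_url(question=1)                                  # hint, no key needed

print(data_url)
print(answer_url)
print(hint_url)
```

Passing a `params` dictionary to `requests.get`, as the notebook cells here do, builds the same kind of query string.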

[Complete instructions at kata.geosci.ai](https://kata.geosci.ai/challenge)

[An example notebook to get you started](https://gist.github.com/kwinkunks/50f11dac6ab7ff8c3e6c7b34536501a2)

----

© 2021 Agile Scientific, licensed CC-BY


In [4]:
my_key = "1992-07-03"
params = {'key': my_key}
r = requests.get(url, params)
print(r)
source = r.text

<Response [200]>


In [6]:
import pandas as pd
import numpy as np
from math import sin, cos, asin, sqrt, radians
from scipy.spatial.distance import cdist
from io import StringIO

output = StringIO(source)
quakes = pd.read_csv(output, sep='|', parse_dates=['Time'])

quakes.head()
    

Unnamed: 0,#EventID,Time,Latitude,Longitude,Depth/km,Author,Catalog,Contributor,ContributorID,MagType,Magnitude,MagAuthor,EventLocationName
0,ci3034289,1992-07-03 23:59:50.670,34.078,-116.371,2.837,ci,ci,ci,ci3034289,mc,1.83,ci,"7km SE of Yucca Valley, California"
1,nc302573,1992-07-03 23:55:39.860,41.405167,-122.138167,12.087,nc,nc,nc,nc302573,md,2.59,nc,"16 km N of McCloud, California"
2,ci3096175,1992-07-03 23:55:09.560,34.553,-116.53,-0.076,ci,ci,ci,ci3096175,mc,2.39,ci,"39km WSW of Ludlow, California"
3,ci3034288,1992-07-03 23:54:05.800,34.111,-116.993,3.129,ci,ci,ci,ci3034288,mc,2.2,ci,"10km NNE of Yucaipa, California"
4,ci3034287,1992-07-03 23:52:41.060,34.961,-116.923,-0.785,ci,ci,ci,ci3034287,mc,2.42,ci,"11km NE of Barstow, California"


In [8]:
# How many records (i.e. earthquakes) are there?
answer1 = quakes.shape[0]

# What is the depth in metres of the earthquake with the largest Magnitude? (If there's more than one, give the deepest.)
max_magnitude_filter = quakes['Magnitude'] == quakes['Magnitude'].max()
answer2 = quakes.loc[max_magnitude_filter, 'Depth/km'].max() * 1000

# What is the great circle distance to the nearest km, as given by the haversine formula, between the epicentres of the two largest earthquakes
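def haversine(pt1, pt2):
    """Great circle distance between two (lat, lon) points in degrees, to the nearest km."""
    r = 6371  # Earth radius in km
    lat1, lon1 = radians(pt1[0]), radians(pt1[1])
    lat2, lon2 = radians(pt2[0]), radians(pt2[1])
    # Term under the square root in the haversine formula
    a = sin((lat2 - lat1) / 2)**2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2)**2
    return round(2 * r * asin(sqrt(a)))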

assert haversine((0, 0), (0, 1)) == 111
assert haversine((0, 2.35), (90, 2.35)) == 10008
assert haversine((44.65, -63.58), (53.73, -1.86)) == 4448

# Sort by magnitude, then depth, both descending, so ties go to the deepest event.
largest_quakes = quakes.sort_values(['Magnitude', 'Depth/km'], ascending=False).head(2).reset_index(drop=True)
answer3 = haversine(
    (largest_quakes.loc[0, 'Latitude'], largest_quakes.loc[0, 'Longitude']),
    (largest_quakes.loc[1, 'Latitude'], largest_quakes.loc[1, 'Longitude'])
)

# Consider all pairs of events. How many pairs are within 100 km of each other?
coords = quakes[['Latitude', 'Longitude']]
# haversine() rounds to the nearest km, so the comparison below uses rounded distances.
dist_matrix = cdist(coords, coords, haversine)

# np.tril keeps the lower triangle of an array. The -1 argument starts one
# diagonal below the main diagonal, so self-pairs are excluded and each
# pair is counted only once.
answer4 = np.tril(dist_matrix < 100, -1).sum()


print(answer1, answer2, answer3, answer4)

617 33000.0 102 83559
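
Calling a Python function once per pair through `cdist` can become slow as the number of events grows. One possible alternative, sketched below under the assumption that a full N x N matrix fits in memory, computes every distance at once with NumPy broadcasting. The names `pairwise_haversine` and `count_close_pairs` are hypothetical, the four test points are made up, and distances are rounded before the comparison to mirror the `haversine` function in this notebook.

```python
import numpy as np

R_KM = 6371  # Earth radius specified by the challenge

def pairwise_haversine(lat_deg, lon_deg):
    """Matrix of haversine distances (km, rounded) between all pairs of points."""
    lat = np.radians(np.asarray(lat_deg, dtype=float))
    lon = np.radians(np.asarray(lon_deg, dtype=float))
    # Broadcasting a column against a row gives every pairwise difference.
    dlat = lat[:, None] - lat[None, :]
    dlon = lon[:, None] - lon[None, :]
    a = (np.sin(dlat / 2) ** 2
         + np.cos(lat)[:, None] * np.cos(lat)[None, :] * np.sin(dlon / 2) ** 2)
    # Guard against tiny floating-point excursions outside [0, 1].
    a = np.clip(a, 0.0, 1.0)
    return np.round(2 * R_KM * np.arcsin(np.sqrt(a)))

def count_close_pairs(lat_deg, lon_deg, limit_km=100):
    """Count unique pairs whose rounded distance is strictly below limit_km."""
    d = pairwise_haversine(lat_deg, lon_deg)
    i, j = np.triu_indices(len(d), k=1)  # strict upper triangle: each pair once
    return int((d[i, j] < limit_km).sum())

# Made-up points: three on the equator half a degree apart, one far away.
lats = [0.0, 0.0, 0.0, 10.0]
lons = [0.0, 0.5, 1.0, 10.0]
n_close = count_close_pairs(lats, lons)
print(n_close)
```

With the notebook's DataFrame, the call would presumably be `count_close_pairs(quakes['Latitude'], quakes['Longitude'])`.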


In [84]:
answers = [answer1, answer2, answer3, answer4]

for n, answer in enumerate(answers, start=1):
    params = {'key': my_key, 'question': n, 'answer': answer}
    r = requests.get(url, params)
    print(f'Answer: {n}')
    print(r.text)

Answer: 1
Correct!
Answer: 2
Correct!
Answer: 3
Correct!
Answer: 4
Correct! The next challenge is https://kata.geosci.ai/challenge/fossil-hunting - good luck!
