# Logistic Regression: Predicting How an Earthquake Would Be Felt

### Predicts the magnitude band of a hypothetical earthquake at a given US zip code

- US earthquake data is from http://earthquake.usgs.gov: 77,161 earthquakes recorded from 1990 to 2018.
- US zip codes are from https://www.census.gov/geo/maps-data/data/gazetteer2018.html

In [1]:
import pandas as pd
import csv

from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

## Create an intermediate file based on the magnitude
Each earthquake is assigned a "felt" value from 0 (imperceptible) to 6 (catastrophic) according to its magnitude band:

Felt value | Magnitude | Feeling | Damage
---: | :---: | :--- | :---
0 | (0, 2.5) | imperceptible | impossible
1 | \[2.5, 4) | indistinguishable | very unlikely
2 | \[4, 5) | noticeable | unlikely
3 | \[5, 6) | alarming | possible
4 | \[6, 8) | disturbing | likely
5 | \[8, 10) | severe | very likely
6 | \[10, 14\] | catastrophic | certain
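Because the bands are half-open and sorted, the magnitude-to-felt mapping can be sketched as a binary search over the upper bounds. This is an illustrative alternative, not the notebook's own code; `UPPER_BOUNDS` and `felt_from_magnitude` are hypothetical names.

```python
import bisect

# Upper bounds (exclusive) of felt values 0-5, taken from the table above.
# Any magnitude >= 10 lands in felt value 6.
UPPER_BOUNDS = [2.5, 4.0, 5.0, 6.0, 8.0, 10.0]


def felt_from_magnitude(magnitude: float) -> int:
    """Map a magnitude to its felt value using half-open bins [lower, upper)."""
    # bisect_right counts the bounds that are <= magnitude, which equals the
    # index of the bin containing it (a value equal to a bound moves up a bin).
    return bisect.bisect_right(UPPER_BOUNDS, magnitude)


if __name__ == "__main__":
    for m in (1.0, 2.5, 3.03, 4.99, 7.1, 10.0, 12.0):
        print(m, felt_from_magnitude(m))
```

Keeping the bounds in one list means a change to the table only needs to be made in one place.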

In [2]:
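# ref[felt] = [upper magnitude bound, feeling, damage likelihood]; a band's
# lower bound is the previous entry's upper bound (0 for the first band)
ref = [[2.5, "IMPERCEPTIBLE", "IMPOSSIBLE"],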
       [4, "INDISTINGUISHABLE", "VERY UNLIKELY"],
       [5, "NOTICEABLE", "UNLIKELY"],
       [6, "ALARMING", "POSSIBLE"],
       [8, "DISTURBING", "LIKELY"],
       [10, "SEVERE", "VERY LIKELY"],
       [14, "CATASTROPHIC", "CERTAIN"]]

assert ref[0][0] > 0 and ref[-1][0] <= 14

In [3]:
EARTHQUAKES_FILE = "USearthquakes_felt.csv"


def felt_value(magnitude):
    """Return the felt value (index into ref) whose band contains magnitude."""
    for felt, (upper, _, _) in enumerate(ref):
        if magnitude < upper:
            return felt
    return len(ref) - 1  # anything beyond the last bound is still "catastrophic"


with open('USearthquakes1990to2018.csv', 'r', newline='') as csvinput, \
        open(EARTHQUAKES_FILE, 'w', newline='') as csvoutput:
    reader = csv.reader(csvinput)
    writer = csv.writer(csvoutput, lineterminator='\n')

    # Copy the header and append the new 'felt' column
    writer.writerow(next(reader) + ['felt'])

    # Column 4 is the magnitude; label every row with its felt value
    for row in reader:
        writer.writerow(row + [felt_value(float(row[4]))])

## Diagnostic dataset information

In [4]:
# Load the dataset
dataset = pd.read_csv(EARTHQUAKES_FILE)
print("Number of observations =", len(dataset))
print("Data set headers = {h}".format(h=list(dataset.columns.values)))
print("First few rows...\n", dataset.head())

Number of observations = 77161
Data set headers = ['datetime', 'latitude', 'longitude', 'depth', 'magnitude', 'felt']
First few rows...
                    datetime   latitude   longitude  depth  magnitude  felt
0  2018-12-18T04:21:40.550Z  40.934334 -124.629837  14.82       3.03     1
1  2018-12-17T18:02:39.570Z  36.094000 -117.882000   4.09       2.62     1
2  2018-12-17T11:26:24.210Z  38.057667 -118.875168  10.70       2.69     1
3  2018-12-17T07:42:48.530Z  36.462900  -98.773700   7.92       3.00     1
4  2018-12-17T02:52:01.740Z  35.956833 -116.734500  -0.34       3.62     1
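Accuracy is only meaningful relative to how the `felt` labels are distributed. A minimal sketch of that diagnostic, assuming a DataFrame shaped like `dataset`; the toy frame below is illustrative only, and `felt_distribution` is a hypothetical helper.

```python
import pandas as pd


def felt_distribution(df: pd.DataFrame, column: str = "felt") -> pd.Series:
    """Return the fraction of rows in each felt class, sorted by class."""
    return df[column].value_counts(normalize=True).sort_index()


if __name__ == "__main__":
    # Toy labels for illustration only; not taken from the USGS data.
    toy = pd.DataFrame({"felt": [1, 1, 1, 2, 1, 3, 1, 2]})
    print(felt_distribution(toy))
```

In the notebook, `felt_distribution(dataset)` would report the share of each felt value in the full dataset.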


In [5]:
def train_logistic_regression(train_x, train_y):
    """Fit a logistic regression classifier with scikit-learn's default settings."""
    return LogisticRegression().fit(train_x, train_y)
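A sketch on synthetic data (not the earthquake set) of what the fitted model does at prediction time: each class gets a linear score in the features, scikit-learn turns the scores into class probabilities (softmax for the multinomial model; normalized one-vs-rest sigmoids under older versions' defaults), and `predict()` returns the most probable class.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic two-feature points with three classes, for illustration only.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 2))
y = np.digitize(X[:, 0] + 0.5 * X[:, 1], bins=[-0.5, 0.5])  # labels 0, 1, 2

model = LogisticRegression().fit(X, y)

# Each row of predict_proba sums to 1; predict() picks the largest entry.
proba = model.predict_proba(X[:5])
print(proba.round(3))
print(model.classes_[proba.argmax(axis=1)], model.predict(X[:5]))
```

Because the scores are linear in the inputs, the boundaries between felt classes in this notebook are straight lines in (latitude, longitude) space.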

In [6]:
def model_accuracy(trained_model, features, targets):
    """Return the mean accuracy of the model on the given features and targets."""
    return trained_model.score(features, targets)
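`model_accuracy` relies on `score()`, which for scikit-learn classifiers is the mean accuracy: the fraction of predictions that exactly match the true label. A small synthetic check of that equivalence:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Small synthetic binary problem, for illustration only.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))
y = (X[:, 0] > 0).astype(int)

model = LogisticRegression().fit(X, y)

# Three ways of computing the same number.
manual = np.mean(model.predict(X) == y)
print(model.score(X, y), accuracy_score(y, model.predict(X)), manual)
```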

In [7]:
training_features = ['latitude', 'longitude']
target = 'felt'

# Split 70% train / 30% test; older scikit-learn versions may emit
# FutureWarnings here about changing defaults, which can be ignored
train_x, test_x, train_y, test_y = train_test_split(dataset[training_features], dataset[target], train_size=0.7)
trained_logistic_regression_model = train_logistic_regression(train_x, train_y)

print("train_x size =", train_x.shape)
print("train_y size =", train_y.shape)
print("test_x size =", test_x.shape)
print("test_y size =", test_y.shape)



train_x size = (54012, 2)
train_y size = (54012,)
test_x size = (23149, 2)
test_y size = (23149,)
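The split above is drawn afresh on every run (no `random_state`) and is not stratified, so rare felt classes can end up unevenly divided. A sketch of a repeatable, stratified split on toy imbalanced labels; the data and the seed value are illustrative assumptions.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Imbalanced toy labels: 90 rows of class 1, 10 rows of class 2 (illustrative only).
X = np.arange(200).reshape(100, 2)
y = np.array([1] * 90 + [2] * 10)

# stratify=y keeps the class proportions the same in both splits;
# random_state makes the split identical between runs.
train_x, test_x, train_y, test_y = train_test_split(
    X, y, train_size=0.7, stratify=y, random_state=42
)
print(np.bincount(train_y), np.bincount(test_y))
```

In the notebook this would mean passing `stratify=dataset[target]` and a fixed `random_state`; stratification requires at least two samples of every class that appears.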


In [8]:
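def findMatch(inpt):
    """
    Find the latitude and longitude for the corresponding zip code.

    Scans the global csv reader `lines` (opened in the user-input cell) and
    returns (found, lat, lon): found is 1 on a match, otherwise 0 with
    lat = lon = 0.
    """
    found, lat, lon = 0, 0, 0
    inpt = str(inpt).strip()
    # Reject inputs shorter than 3 or longer than 5 characters
    if len(inpt) > 5 or len(inpt) < 3:
        print("Invalid input")
    else:
        for col in lines:
            # col = [zip code, latitude, longitude]
            if inpt == col[0]:
                found, lat, lon = 1, col[1], col[2]
                break
    return found, lat, lon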
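`findMatch` rescans the zip file on every call and reads a module-level `lines` reader, which is exhausted after one pass. A sketch that loads the file once into a dictionary for constant-time lookups, assuming (as `findMatch` does) that the first three columns are zip code, latitude, and longitude. `load_zip_index` and `find_match` are hypothetical names, and padding codes to 5 digits is a design assumption for files that drop leading zeros.

```python
import csv
import io


def load_zip_index(csvfile, skip_header=False):
    """Build {zip code: (lat, lon)} from rows laid out as zip, latitude, longitude."""
    reader = csv.reader(csvfile)
    if skip_header:
        next(reader, None)
    index = {}
    for row in reader:
        # Pad to 5 digits so "601" and "00601" refer to the same zip code.
        index[row[0].strip().zfill(5)] = (float(row[1]), float(row[2]))
    return index


def find_match(index, zipcode):
    """Return (lat, lon) for the zip code, or None if it is not in the index."""
    return index.get(str(zipcode).strip().zfill(5))


# In-memory stand-in for USzipcodes.csv; the second row uses made-up coordinates.
sample = io.StringIO("92119,32.817888,-117.031956\n601,1.5,-2.5\n")
zips = load_zip_index(sample)
print(find_match(zips, "92119"), find_match(zips, "00601"), find_match(zips, "99999"))
```

Returning `None` for unknown codes makes it hard to accidentally predict at a placeholder location such as (0, 0).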

In [9]:
print("\nTrain Accuracy =", model_accuracy(trained_logistic_regression_model, train_x, train_y))
print("Test Accuracy =", model_accuracy(trained_logistic_regression_model, test_x, test_y))


Train Accuracy = 0.964119084647856
Test Accuracy = 0.9632381528359756
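A test accuracy of about 0.96 is only informative next to a trivial baseline. The five rows shown earlier all have felt = 1, which hints (but does not establish) that one class dominates. A sketch comparing logistic regression with scikit-learn's `DummyClassifier`, which always predicts the most frequent class, on synthetic imbalanced data:

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression

# Synthetic imbalanced data whose labels are independent of the features.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 2))
y = np.where(rng.random(1000) < 0.95, 1, 2)

baseline = DummyClassifier(strategy="most_frequent").fit(X, y)
model = LogisticRegression().fit(X, y)

# If the model barely beats always predicting the majority class,
# its accuracy mostly reflects the class balance, not the features.
print("baseline:", baseline.score(X, y), "logistic:", model.score(X, y))
```

In the notebook the equivalent check would be `DummyClassifier(strategy="most_frequent").fit(train_x, train_y).score(test_x, test_y)`.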


## User input and result
Prompt for a zip code, look up its coordinates, and predict the felt value there.

In [10]:
with open('USzipcodes.csv', 'r') as csvfile:
    # findMatch() scans this global reader for the zip code
    lines = csv.reader(csvfile)

    zipcode = input("Enter 5-digit zip code: ")
    found, lat, lon = findMatch(zipcode)

# Only predict when the zip code was found; otherwise lat/lon are placeholders
if found:
    print("Latitude and longitude at zip code:", lat, lon)
    print("\nGenerating result for the given user input...")
    # Use the same column names as during training
    point = pd.DataFrame([[float(lat), float(lon)]], columns=training_features)
    feltval = int(trained_logistic_regression_model.predict(point)[0])
    lower = 0 if feltval < 1 else ref[feltval - 1][0]
    print("A hypothetical earthquake, right here, right now, would be between", lower,
          "and", ref[feltval][0], "on the Richter scale; it would feel", ref[feltval][1],
          "and it is", ref[feltval][2], "to cause damage.")
else:
    print("Zip code not found")

Enter 5-digit zip code: 92119
Latitude and longitude at zip code: 32.817888 -117.031956

Generating result for the given user input...
A hypothetical earthquake, right here, right now, would be between 2.5 and 4 on the Richter scale; it would feel INDISTINGUISHABLE and it is VERY UNLIKELY to cause damage.
