## Challenge: LANL Earthquake Prediction

This is a Kaggle challenge, available [here](https://www.kaggle.com/c/LANL-Earthquake-Prediction).

### Introduction
Forecasting earthquakes is one of the most important problems in Earth science because of their devastating consequences. Current scientific studies related to earthquake forecasting focus on three key points: when the event will occur, where it will occur, and how large it will be.

In this competition, you will address when the earthquake will take place. Specifically, you’ll predict the time remaining before laboratory earthquakes occur from real-time seismic data.

If this challenge is solved and the physics are ultimately shown to scale from the laboratory to the field, researchers will have the potential to improve earthquake hazard assessments that could save lives and billions of dollars in infrastructure.

This challenge is hosted by Los Alamos National Laboratory, which enhances national security by ensuring the safety of the U.S. nuclear stockpile, developing technologies to reduce threats from weapons of mass destruction, and solving problems related to energy, environment, infrastructure, health, and global security concerns.


### Data
First, we need to understand the data. The training dataset has two columns, `acoustic_data` and `time_to_failure`:
* acoustic_data: the seismic signal recorded during the experiment, which serves as **X** when building the model later on.
* time_to_failure: the time remaining, in seconds, until the next lab earthquake, which serves as **Y**.
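
To get a feel for the two columns without loading the whole file, one option is to read only the first few rows. The sketch below is illustrative: `load_head` is a hypothetical helper, and the in-memory CSV is a made-up stand-in for `train.csv` so that the snippet runs on its own.

```python
import io

import numpy as np
import pandas as pd


def load_head(source, nrows=1000):
    """Read only the first `nrows` rows, using compact dtypes for both columns."""
    return pd.read_csv(
        source,
        nrows=nrows,
        dtype={"acoustic_data": np.int16, "time_to_failure": np.float64},
    )


# Tiny in-memory stand-in for train.csv (made-up values, for illustration only).
sample_csv = io.StringIO(
    "acoustic_data,time_to_failure\n"
    "12,1.4691\n"
    "6,1.4691\n"
    "8,1.4691\n"
    "5,1.4690\n"
)
head = load_head(sample_csv, nrows=3)
print(head.dtypes)
print(head)
```

With the real data, passing the path to `train.csv` instead of `sample_csv` should work the same way.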

The test set is a list of `acoustic_data` segments, from each of which we predict the next `time_to_failure`. It is not clear whether some of these segments come from the same quake or each comes from a separate quake.

```python
# loading all the tools!
import numpy as np  # linear algebra
import pandas as pd  # data processing, CSV file I/O (e.g. pd.read_csv)
import matplotlib.pyplot as plt
import sys  # system specific
import keras
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation, LSTM
from keras.optimizers import SGD
from keras.models import model_from_json
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import mean_squared_error
import math
import datetime
```

Output:

```
Using TensorFlow backend.
```


#### Training Data Exploration
Take a look at the training data, `train.csv`, which is over 9 GB in size. We need to figure out how many lab earthquakes there are in this dataset.

The code below loads the file and scans `time_to_failure` for upward jumps, each of which marks a lab earthquake. Expect a long run time on the full file.

```python
ftrain = "train.csv"  # assumed path to the training file; adjust to your setup

train = pd.read_csv(ftrain, dtype={"acoustic_data": np.int16, "time_to_failure": np.float64})
train.rename(columns={"acoustic_data": "X", "time_to_failure": "Y"}, inplace=True)
y = train.Y.values  # extract the array once instead of on every loop iteration
nall = len(y)
print("Start calculation, total entries: ", nall)

# Strides used to skip ahead while time_to_failure keeps decreasing.
jumps = (int(1e6), int(1e4), int(1e2))
idx = 1
while idx < nall:
    val = y[idx]
    for jump in jumps:
        # A smaller value `jump` rows ahead means no quake in between,
        # assuming each cycle is much longer than the stride.
        if idx + jump < nall and y[idx + jump] < val:
            idx += jump
            break
    else:
        # No safe stride left: check row by row for an upward jump,
        # which marks the start of a new cycle.
        if val - y[idx - 1] > 0.01:
            print("index: ", idx - 1, "value: ", y[idx - 1], " next value: ", val)
        idx += 1
```
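
The same scan can also be sketched in vectorized form, reading the file in chunks so that the 9 GB file never has to sit in memory at once. This is only one possible approach, not the method used above: `find_resets`, the chunk size, and the synthetic labels are all assumptions made for illustration, and the `0.01` threshold is borrowed from the loop above.

```python
import io

import numpy as np
import pandas as pd


def find_resets(source, chunksize=1_000_000, threshold=0.01):
    """Return row indices where time_to_failure jumps up by more than `threshold`."""
    resets = []
    prev_last = None  # last label of the previous chunk
    offset = 0        # global index of the first row in the current chunk
    reader = pd.read_csv(
        source,
        usecols=["time_to_failure"],
        dtype={"time_to_failure": np.float64},
        chunksize=chunksize,
    )
    for chunk in reader:
        y = chunk["time_to_failure"].to_numpy()
        # A reset can fall exactly on a chunk boundary.
        if prev_last is not None and y[0] - prev_last > threshold:
            resets.append(offset)
        # np.diff(y)[i] = y[i + 1] - y[i]; a large positive value is a reset at row i + 1.
        inside = np.flatnonzero(np.diff(y) > threshold) + 1 + offset
        resets.extend(inside.tolist())
        prev_last = y[-1]
        offset += len(y)
    return resets


# Synthetic stand-in for train.csv: three decreasing cycles, hence two resets.
labels = np.concatenate(
    [np.linspace(1.5, 0.0, 5), np.linspace(2.0, 0.0, 4), np.linspace(1.0, 0.0, 3)]
)
csv_text = "acoustic_data,time_to_failure\n" + "\n".join(f"0,{v}" for v in labels)
print(find_resets(io.StringIO(csv_text), chunksize=5))  # prints [5, 9]
```

Note that this sketch reports the first row of each new cycle, whereas the loop above prints the last row before the jump (one index earlier). The length of the returned list is the number of resets found in the file.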