# INDUCED SEISMICITY IN CALIFORNIA
**Wastewater injection and earthquake activity in California (1980-2017)**

## Background and Literature Review

#### Wastewater Injection

* Oil and gas extraction both uses water and produces it as a byproduct. In hydraulic fracturing (fracking), for example, fluid is pumped into a well to raise pore pressure and create fractures that aid in the extraction of natural gas.

<img src="images/ellsworth_wells.jpg">

* The water rises to the surface and must be disposed of. Typically, it is taken off-site and injected underground through an injection well.

* Why might this be a concern?
    

#### Potential Effects

* Studies have found that wastewater injection may be inducing earthquakes in the eastern and central U.S.

* (Shirzaei 2016) found that in eastern Texas, wastewater injection has caused an increase in seismic activity. They attribute the 4.8 magnitude ($M_w$) earthquake in 2012 to wastewater injection. 

    * They argue that wastewater injection increases pore pressure in the surrounding geology which can cause faults to move closer to failure.

* (Hough 2015) shows a temporal and spatial relationship between wastewater disposal wells and earthquake activity in Oklahoma in the last century. 

* (Ellsworth 2013) reviews the literature of induced seismicity.
    * $M_w$ 4.0 in Ohio, December 2011 [1]
    * $M_w$ 4.7 in Arkansas, February 2011 [2]
    * $M_w$ 4.4 in Texas, September 2011 [3]
    * $M_w$ 4.8 in Texas, October 2011 [4]

[1]: W.-Y. Kim, Induced seismicity associated with fluid injection into a deep well in Youngstown, Ohio. J. Geophys. Res. 10.1002/jgrb.50247 (2013).

[2]: S. Horton, Disposal of hydrofracking waste fluid by injection into subsurface aquifers triggers earthquake swarm in central Arkansas with potential for damaging earthquake. Seismol. Res. Lett. 83, 250–260 (2012). doi: 10.1785/gssrl.83.2.250

[3]: S. D. Davis, W. D. Pennington, Induced seismic deformation in the Cogdell oil field of west Texas. Bull. Seismol. Soc. Am. 79, 1477–1495 (1989).

[4]: W. D. Pennington, S. D. Davis, The evolution of seismic barriers and asperities caused by the depressuring of fault planes in oil and gas fields of south Texas. Bull. Seismol. Soc. Am. 76, 939–948 (1986).

<img src="images/ellsworth_eq_time.jpg">

#### Statement from OGS

The Oklahoma Geological Survey released a statement in 2015 that acknowledges the increase of seismic activity due to increases in wastewater injection.

* "Based on observed seismicity rates and geographical trends...the rates and trends in seismicity are very unlikely to represent a naturally occurring process." 

* "The seismicity rate is now about 600 times greater than the background seismicity rate"


#### Conclusion

Given what has been observed in Oklahoma and other states in the central/eastern U.S., what can we say about the relationship between wastewater injection and seismicity? Can we say that wastewater injection causes earthquakes?

#### Conclusion

Probably not.

The mechanism for seismicity induced by wastewater is complicated and many questions remain.

* How does the effect of wastewater injection depend on geology?
* What are the expected time horizons? 
* What is the "threshold" of wastewater needed?
* Does rate of injection matter? (data not available)
* How does existing background rate affect potential for induced seismicity?

Especially due to differences in geology and background rate, the results in Oklahoma are not automatically externally valid.


#### Why California?

(Goebel)

* Higher injection volumes
* But also higher background activity

<img src="images/goebel.jpg">

#### Why California?

Up for debate!

There is no consensus on whether there is a causal relationship between wastewater injection and seismic activity in California.

* (Goebel 2015) and (Goebel 2016) show no relationship in general for California and some relationship for an earthquake swarm in the Central Valley.

* (McClure 2017) shows a strong relationship in Oklahoma and a weak relationship in California.


#### Geography - Injection and Earthquake Activity

[insert plot of California with wells, earthquakes, grid]

#### Discussion of McClure (2017)

**Methodology**

* Divides California into a grid of approximately 0.2 x 0.2 degrees longitude/latitude.
* Tests associations between water injections and earthquakes within each block.
* Defines a model for natural and induced seismicity: 

$$ y_{ij} = \text{Poi}(e^{\mathcal{N}(0, \sigma_i)}(\mu_i + \beta_ix_{ij}) + ay_{i,j-1}e^{\mathcal{N}(0,\sigma_{II})}) $$ 

where $y_{ij}$ is the number of earthquakes in block $i$ and year $j$; $x_{ij}$ is the cumulative water injected; $\sigma_i$ is a measure of the variance of seismicity; $\mu_i$ is the rate of natural seismicity; $\beta_i$ is the dependency on water volume; $a$ is the degree to which earthquakes cluster; and $\sigma_{II}$ is the variability of the clustering.


* For each block, maximum likelihood estimates of the model parameters are fitted for two models: one with $\beta_i = 0$ and one where $\beta_i$ is estimated. 

* The likelihood ratio of the two models is then taken. The null distribution of the likelihood ratio is computed non-parametrically by permuting the water injections between blocks and introducing a random time offset. 

* P-values are calculated and aggregated across blocks using Fisher's method (Fisher's combined test). 
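
The aggregation step can be illustrated with a minimal sketch of Fisher's method. This is not McClure's code: the function name `fisher_combine` and the per-block p-values are made up for illustration, and the method assumes the per-block p-values are independent.

```python
import numpy as np
from scipy import stats

def fisher_combine(p_values):
    """Combine independent p-values with Fisher's method (illustrative helper)."""
    p = np.asarray(p_values, dtype=float)
    # Under H0, -2 * sum(log p_i) follows a chi-squared distribution
    # with 2k degrees of freedom, where k is the number of p-values.
    statistic = -2.0 * np.sum(np.log(p))
    combined_p = stats.chi2.sf(statistic, df=2 * p.size)
    return statistic, combined_p

# Hypothetical per-block p-values, not taken from any study.
block_p_values = [0.04, 0.20, 0.51, 0.08]
stat, p_comb = fisher_combine(block_p_values)
print(f"Fisher statistic = {stat:.3f}, combined p-value = {p_comb:.4f}")
```

SciPy also provides `scipy.stats.combine_pvalues(..., method="fisher")`, which should give the same combined p-value.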


#### Findings

(McClure 2017) concluded 

## Methodology

### Test statistic: Lag-Adjusted Spearman Rank Correlation Test


#### Motivation
If wastewater injection induces seismicity, the effect of injection on seismicity will occur at some lag $\geq$ 0 months. If the lag is > 0 (e.g. if wastewater injection causes increased seismicity at least 1 month in the future), then we expect seismicity to depend on past water injection. The purpose of adjusting the Spearman rank correlation by a particular lag is to quantify correlation that occurs at lag > 0. 

### Spearman's Rank Correlation
Spearman's rank correlation, $r_s$, of two processes, $X$ and $Y$, is defined as follows: 

$$ r_s = 1 - \frac{6D}{N^3-N} $$ 

where $r_s$ is the rank correlation, $N$ is the total number of observations in the sample and $D$ is defined as follows: 

$$ D = \sum{[\text{rank}(X_i) - \text{rank}(Y_i)]^2} $$

Equivalently, we can write $r_s$ in terms of the moments of the ranked data (this is Pearson's correlation coefficient computed on ranked data):

$$ r_s = \frac{\text{cov}(\text{rank}(X),\text{rank}(Y))}{\sigma_{\text{rank}(X)}\sigma_{\text{rank}(Y)}} $$
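
As a quick numerical check, the two expressions can be compared on synthetic data. This is a sketch with made-up inputs and hypothetical helper names; the two forms agree exactly only when there are no ties, which holds for the continuous data below but may not hold for earthquake counts.

```python
import numpy as np
from scipy.stats import rankdata

def spearman_from_d(x, y):
    """r_s = 1 - 6D / (N^3 - N), with D the sum of squared rank differences."""
    rx, ry = rankdata(x), rankdata(y)
    n = len(rx)
    d = np.sum((rx - ry) ** 2)
    return 1.0 - 6.0 * d / (n**3 - n)

def spearman_from_moments(x, y):
    """Pearson correlation computed on the ranked data."""
    rx, ry = rankdata(x), rankdata(y)
    return np.corrcoef(rx, ry)[0, 1]

# Synthetic continuous data (no ties), for illustration only.
rng = np.random.default_rng(0)
x = rng.normal(size=50)
y = 0.5 * x + rng.normal(size=50)

r_d = spearman_from_d(x, y)
r_m = spearman_from_moments(x, y)
print(r_d, r_m)
```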

A statistical test could be defined where the null hypothesis $H_0$ is that $X$ and $Y$ are not associated, so that the rank correlation is 0. If at least one of the series is exchangeable, then under the null hypothesis all permutations of that series are equally likely to occur, and each resulting $r_s$ is equally likely. Therefore a p-value could be generated by computing 1 minus the percentile of the original $r_s$ in relation to the distribution of $r_s$ generated by the permutations.  

#### The Test
In the absence of ties, Spearman's rank correlation can be written as follows: 

$$r_s = 1 - \frac{6D}{N^3-N}$$ 

where $r_s$ is the rank correlation, $N$ is the total number of observations in the sample and $D$ is defined as follows: 

$$ D = \frac{1}{3}N(N+1)(2N+1) - 2\sum{iT_i} $$ 

where the observations are ordered so that $i$ is the rank of an observation in one of the lists and $T_i$ is the rank of that same observation in the other list. 
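
This expression for $D$ can be recovered by expanding the squared rank differences. The sketch below assumes no ties, so that both lists of ranks are permutations of $1, \dots, N$, and assumes the observations are sorted so that the rank in the first list equals the index $i$:

$$
\begin{aligned}
D &= \sum_{i=1}^{N} (i - T_i)^2 = \sum_{i=1}^{N} i^2 + \sum_{i=1}^{N} T_i^2 - 2\sum_{i=1}^{N} iT_i \\
  &= 2 \cdot \frac{N(N+1)(2N+1)}{6} - 2\sum_{i=1}^{N} iT_i = \frac{1}{3}N(N+1)(2N+1) - 2\sum_{i=1}^{N} iT_i
\end{aligned}
$$

The second line uses the fact that $\sum T_i^2 = \sum i^2$, because the $T_i$ are a rearrangement of $1, \dots, N$.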

Therefore, a statistical test could be defined where the null hypothesis $H_0$ is that the two lists are not associated, so that the rank correlation is 0. Since the only non-constant term in the equation is $\sum{iT_i}$, and $r_s$ increases with it, it would be sufficient to reject $H_0$ for large values of $\sum{iT_i}$. To generate a p-value for this test, the original $\sum{iT_i}$ can be compared to its values under all permutations of $T_i$ (since under the null hypothesis any rank $T_i$ is equally likely to be paired with any $i$). 

Adjusting this test for different lags would simply mean shifting the two lists of ranks by some lag $k$ and running the above method with $N-k$ observations. However, since N changes with the lag, it is easiest to simply compute $r_s$, the Spearman rank correlation, as the test statistic. 
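
A minimal sketch of the lag adjustment follows. The helper name `lagged_spearman` and the synthetic series are assumptions for illustration. Note that `scipy.stats.spearmanr` re-ranks the two truncated series, which can differ slightly from shifting ranks computed on the full series.

```python
import numpy as np
from scipy.stats import spearmanr

def lagged_spearman(water, quakes, lag):
    """Spearman correlation between water[t] and quakes[t + lag] (hypothetical helper)."""
    water = np.asarray(water, dtype=float)
    quakes = np.asarray(quakes, dtype=float)
    if lag < 0 or lag > len(water) - 3:
        raise ValueError("lag must be non-negative and leave at least 3 observations")
    # Drop the last `lag` injections and the first `lag` earthquake counts,
    # leaving N - lag paired observations.
    w = water[: len(water) - lag]
    q = quakes[lag:]
    rho, _ = spearmanr(w, q)
    return rho

# Synthetic monthly series in which "earthquakes" follow "injection" by 3 months.
rng = np.random.default_rng(1)
water = rng.normal(size=120)
quakes = np.roll(water, 3) + 0.1 * rng.normal(size=120)

correlations = [lagged_spearman(water, quakes, k) for k in range(13)]
best_lag = int(np.argmax(correlations))
print(best_lag, correlations[best_lag])
```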

#### Lags and P-norm

Initially, in order to account for the effect of lags, we look at the correlations of the data at different lags and choose the maximum correlation among them. However, doing so ignores the information provided by the other lags, which may also contribute to the overall relationship between water injection and earthquakes.

We can generalize this with a p-norm. 

The p-norm of a vector $x = (x_1,x_2,...x_n)$ is

$$ \|x\|_p=(\sum_{i=1}^n |x_i|^p)^{1/p}.$$

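As $p$ approaches $\infty$, the p-norm converges to the maximum absolute entry, $\|x\|_\infty=\max(|x_1|, |x_2|,...,|x_n|)$, so choosing the largest absolute correlation across lags corresponds to $p = \infty$.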
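
The limiting behaviour can be seen numerically with a small sketch. The helper `p_norm` and the vector of correlations are hypothetical, chosen only to illustrate the formula.

```python
import numpy as np

def p_norm(values, p):
    """p-norm of a vector; p = np.inf gives the largest absolute entry."""
    v = np.abs(np.asarray(values, dtype=float))
    if np.isinf(p):
        return v.max()
    return np.sum(v ** p) ** (1.0 / p)

# Made-up correlations at four lags.
corrs = np.array([0.10, -0.25, 0.40, 0.05])
for p in (1, 2, 8, 64, np.inf):
    print(p, p_norm(corrs, p))
```

Small values of $p$ weight all lags, while large values are dominated by the single largest absolute correlation.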

#### Permutation Test

We define our test statistic as:

$$ T(rk_w, rk_e) = \|(r_i(rk_w, rk_e) \mid i \in \{0, 1, \dots, 12\})\|_p, $$

where $r_i(rk_w, rk_e)$ is the Pearson correlation of the water injection ranks $rk_w$ and the earthquake ranks $rk_e$ at lag $i$.

$H_0$ implies that the distribution of $T$ is invariant under permutations. That is, let $\mathcal G$ be a finite group of transformations with $\# \mathcal G = G$.

For each permutation $g \in \mathcal{G}$, $T(rk_w, rk_e) \sim gT(rk_w, rk_e) = T(rk_w, g*rk_e)$.

Let $G' = \#\{g \in \mathcal G : gT \geq T\}$, the number of permutations whose statistic is at least as large as the observed one.

We reject $H_0$ at level $\alpha$ if $\frac{G'}{G} \leq \alpha$.
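
A minimal sketch of this test is given below. It makes several assumptions that are not in the text above: it samples random permutations instead of enumerating the whole group, adds 1 to the numerator and denominator (a common convention that counts the identity permutation), uses $p = 2$, and runs on synthetic data. The function names are hypothetical.

```python
import numpy as np
from scipy.stats import rankdata

def lag_statistic(rk_w, rk_e, max_lag=12, p=2.0):
    """p-norm of the Pearson correlations of the rank series at lags 0..max_lag."""
    n = len(rk_w)
    corrs = []
    for lag in range(max_lag + 1):
        # Pair water rank at time t with earthquake rank at time t + lag.
        corrs.append(np.corrcoef(rk_w[: n - lag], rk_e[lag:])[0, 1])
    return np.sum(np.abs(corrs) ** p) ** (1.0 / p)

def permutation_p_value(water, quakes, n_perm=199, max_lag=12, p=2.0, seed=0):
    """Monte Carlo permutation p-value: share of permutations with gT >= T."""
    rng = np.random.default_rng(seed)
    rk_w, rk_e = rankdata(water), rankdata(quakes)
    observed = lag_statistic(rk_w, rk_e, max_lag, p)
    count = 0
    for _ in range(n_perm):
        permuted = lag_statistic(rk_w, rng.permutation(rk_e), max_lag, p)
        if permuted >= observed:
            count += 1
    return observed, (count + 1) / (n_perm + 1)

# Synthetic example: earthquake series follows injection with a 3-month lag.
rng = np.random.default_rng(1)
water = rng.normal(size=120)
quakes = np.roll(water, 3) + 0.1 * rng.normal(size=120)

t_obs, p_value = permutation_p_value(water, quakes)
print(t_obs, p_value)
```

Permuting individual months treats them as exchangeable, which may not hold for clustered earthquake data.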

#### Permutation and Clustering

* Earthquakes tend to cluster, so permuting individual months of data would destroy this clustering pattern.
* To preserve the clusters in the earthquake data, we permute by blocks.
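
Block permutation can be sketched as follows. The block length of 6 months and the helper name `block_permute` are assumptions for illustration; the text does not specify a block length.

```python
import numpy as np

def block_permute(series, block_len, rng):
    """Shuffle contiguous blocks of length block_len, keeping order within each block."""
    series = np.asarray(series)
    n_blocks = int(np.ceil(len(series) / block_len))
    # Split into consecutive blocks; the last block may be shorter.
    blocks = [series[i * block_len:(i + 1) * block_len] for i in range(n_blocks)]
    order = rng.permutation(n_blocks)
    return np.concatenate([blocks[i] for i in order])

rng = np.random.default_rng(0)
months = np.arange(24)              # stand-in for 24 monthly earthquake counts
shuffled = block_permute(months, 6, rng)
print(shuffled)
```

Within each block the original month-to-month ordering survives, so short-range clustering is preserved while the alignment with the injection series is broken.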

### Simulation

#### Simulating Earthquake Data

Motivation: McClure's Model

$$ y_{ij} = \text{Poi}(e^{\mathcal{N}(0, \sigma_i)}(\mu_i + \beta_ix_{ij}) + ay_{i,j-1}e^{\mathcal{N}(0,\sigma_{II})}). $$ 

where $y_{ij}$ is the number of earthquakes in block $i$ and year $j$; $x_{ij}$ is the cumulative water injected; $\sigma_i$ is a measure of the variance of seismicity; $\mu_i$ is the rate of natural seismicity; $\beta_i$ is the dependency on water volume; $a$ is the degree to which earthquakes cluster; and $\sigma_{II}$ is the variability of the clustering.

We constructed the simulated earthquake data based on McClure's model, with some adjustments.

$$ y_{ij} = \text{Poi}(e^{\mathcal{N}(0, 1)}[\mu_i + \sum_{k=0}^5 \text{Unif}(0,1)\beta_{ik}x_{ij}] + e^{\mathcal{N}(0,1)}ay_{i,j-1}). $$ 
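
One possible reading of this adjusted model, for a single block $i$, is sketched below. It assumes that the normal and uniform terms are drawn fresh in every period and that $y_{i,-1} = 0$. The parameter values and the function name `simulate_block` are hypothetical.

```python
import numpy as np

def simulate_block(x, mu, beta, a, seed=0):
    """Simulate earthquake counts for one block under the adjusted model (sketch)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)        # cumulative water injected, x_{ij}
    beta = np.asarray(beta, dtype=float)  # beta_{i0}, ..., beta_{i5}
    y = np.zeros(len(x), dtype=np.int64)
    prev = 0                              # assumed starting value y_{i,-1} = 0
    for j in range(len(x)):
        u = rng.uniform(0.0, 1.0, size=beta.size)
        induced = np.sum(u * beta) * x[j]            # sum_k Unif(0,1) * beta_ik * x_ij
        natural_noise = np.exp(rng.normal(0.0, 1.0)) # e^{N(0,1)} on the natural + induced rate
        cluster_noise = np.exp(rng.normal(0.0, 1.0)) # e^{N(0,1)} on the clustering term
        rate = natural_noise * (mu + induced) + cluster_noise * a * prev
        y[j] = rng.poisson(rate)
        prev = y[j]
    return y

# Made-up inputs: 60 periods of cumulative injection and small coefficients.
x = np.cumsum(np.random.default_rng(42).gamma(2.0, 1.0, size=60))
y = simulate_block(x, mu=2.0, beta=np.full(6, 0.01), a=0.1, seed=1)
print(y[:12])
```

Setting all $\beta_{ik} = 0$ gives a series with no induced seismicity, which can serve as a null case when checking a test's false positive rate.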
