## Predicting fractures by applying neural networks to conventional well-logging data  
Joshua Poirier  
Geoscientist  
NEOS  
April 2017  

### Abstract  

This case study comes from Guangren Shi's chapter on artificial neural networks in the book "Data Mining and Knowledge Discovery for Geoscientists." You can
purchase the book from [Amazon](https://www.amazon.com/Data-Mining-Knowledge-Discovery-Geoscientists/dp/0124104371/ref=sr_1_1?ie=UTF8&qid=1490908644&sr=8-1&keywords=data+mining+and+knowledge+discovery+for+geoscientists+%2B+guangren+shi) or directly from the publisher, [Elsevier](https://www.elsevier.com/books/data-mining-and-knowledge-discovery-for-geoscientists/shi/978-0-12-410437-2). The objective is to predict fractures using conventional well-logging data. This approach has practical value when imaging-log and core-sample data are limited.  

Shi describes the scenario as follows:  

> Located southeast of the Biyang Sag in Nanxiang Basin in central China, the Anpeng Oil-field covers an area of about 17.5 square kilometers, close to Tanghe-zaoyuan in the northwest-west, striking a large boundary fault in the south, and close to a deep sag in the east. As an inherited nose structure plunging from northwest to southeast, this oilfield is a simple structure without faults, where commercial oil and gas flows have been discovered (Ming et al., 2005; Wang et al., 2006). One of its favorable pool-forming conditions is that the fractures are found to be well developed at formations as deep as 2800 m or more. These fractures provide favorable oil-gas migration pathways and enlarged the accumulation space.

Rather than writing the neural network code from scratch, I'll be using TFLearn, a high-level library built on top of TensorFlow. TensorFlow was developed by Google and is open source (free!).  

### Introduction  

The data was transcribed from Shi's book and includes 33 samples from Wells An1 and An2. Shi used 29 of them as learning samples and held out 4 as a test set. The available features are summarized below. Units are not given for the log values because each log has been normalized over the interval [0, 1].  

| Variable name | Description                                                  |
| ------------- | ------------------------------------------------------------ |
| Sample        | Sample number                                                |
| Well          | Well number                                                  |
| Depth         | Measured depth in meters                                     |
| DT            | Acoustic time                                                |
| RHO           | Compensated neutron density                                  |
| PHIN          | Compensated neutron porosity                                 |
| R_XO          | Microspherically focused resistivity                         |
| R_LLD         | Deep laterolog resistivity                                   |
| R_LLS         | Shallow laterolog resistivity                                |
| R_DS          | Absolute difference between R_LLD and R_LLS                  |
| IL            | Fracture identification determined by imaging log (1=fracture, 2=nonfracture) |

I'll get started by loading in the Python libraries I'll be using!

```python
import numpy as np
import pandas as pd
import tensorflow as tf
import tflearn
```

We need some data before we can get started with neural networks, so let's load it!

```python
fname = 'data/fracture_data.csv'
data = pd.read_csv(fname)
data.head()
```

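|     | Sample | Well | Depth   | DT     | RHO    | PHIN   | R_XO   | R_LLD  | R_LLS  | R_DS   | IL  |
| --- | ------ | ---- | ------- | ------ | ------ | ------ | ------ | ------ | ------ | ------ | --- |
| 0   | 1      | An1  | 3065.13 | 0.5557 | 0.2516 | 0.8795 | 0.3548 | 0.6857 | 0.6688 | 0.0169 | 1   |
| 1   | 2      | An1  | 3089.68 | 0.9908 | 0.011  | 0.8999 | 0.6792 | 0.5421 | 0.4071 | 0.135  | 1   |
| 2   | 3      | An1  | 3098.21 | 0.4444 | 0.1961 | 0.5211 | 0.716  | 0.7304 | 0.6879 | 0.0425 | 1   |
| 3   | 4      | An1  | 3102.33 | 0.4028 | 0.3506 | 0.5875 | 0.6218 | 0.6127 | 0.584  | 0.0287 | 1   |
| 4   | 5      | An1  | 3173.25 | 0.3995 | 0.3853 | 0.0845 | 0.5074 | 0.892  | 0.841  | 0.051  | 1   |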
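
As a quick sanity check on the transcription, the sketch below re-types the five rows shown above and confirms two things stated in the variable table: every log value lies in [0, 1], and `R_DS` equals the absolute difference between `R_LLD` and `R_LLS`. The names `head_rows`, `in_unit_interval`, and `r_ds_matches` are mine, and the tolerance of 1e-4 is an assumption based on the four-decimal precision of the printed values.

```python
import numpy as np
import pandas as pd

# The five rows displayed above, re-typed by hand (log columns only).
head_rows = pd.DataFrame({
    "DT":    [0.5557, 0.9908, 0.4444, 0.4028, 0.3995],
    "RHO":   [0.2516, 0.0110, 0.1961, 0.3506, 0.3853],
    "PHIN":  [0.8795, 0.8999, 0.5211, 0.5875, 0.0845],
    "R_XO":  [0.3548, 0.6792, 0.7160, 0.6218, 0.5074],
    "R_LLD": [0.6857, 0.5421, 0.7304, 0.6127, 0.8920],
    "R_LLS": [0.6688, 0.4071, 0.6879, 0.5840, 0.8410],
    "R_DS":  [0.0169, 0.1350, 0.0425, 0.0287, 0.0510],
})

# Check 1: every normalized log value falls inside [0, 1].
in_unit_interval = bool(((head_rows >= 0) & (head_rows <= 1)).all().all())

# Check 2: R_DS should equal |R_LLD - R_LLS| up to rounding
# (tolerance is an assumption matching four printed decimals).
recomputed_r_ds = (head_rows["R_LLD"] - head_rows["R_LLS"]).abs()
r_ds_matches = bool(np.allclose(recomputed_r_ds, head_rows["R_DS"], atol=1e-4))

print("All logs in [0, 1]:", in_unit_interval)
print("R_DS == |R_LLD - R_LLS|:", r_ds_matches)
```

The same two checks could be run on the full `data` frame once it is loaded, which would catch transcription slips in the remaining rows.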


I'll take Shi's data splitting a step further by dividing the learning samples into training and validation subsets. Monitoring performance on the validation subset during training helps detect overfitting without touching the test set.
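
To illustrate the idea, here is a minimal sketch that shuffles the positions of the 29 learning samples and sets some of them aside for validation. The helper name `split_train_validation`, the 20% validation fraction, and the random seed are assumptions chosen for illustration; none of them come from Shi's book.

```python
import numpy as np

def split_train_validation(n_samples, val_fraction=0.2, seed=42):
    """Shuffle sample positions and split them into training and validation sets.

    val_fraction and seed are illustrative assumptions, not values from the source.
    """
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(n_samples)
    # Keep at least one validation sample, even for very small datasets.
    n_val = max(1, int(round(n_samples * val_fraction)))
    val_idx = shuffled[:n_val]
    train_idx = shuffled[n_val:]
    return train_idx, val_idx

# Shi's 29 learning samples; the 4 test samples are left untouched.
train_idx, val_idx = split_train_validation(29)
print(len(train_idx), "training samples,", len(val_idx), "validation samples")
```

The returned positions can be passed to `DataFrame.iloc` to select rows. With so few samples, a stratified split that preserves the fracture/nonfracture ratio in each subset may be worth considering instead of a purely random one.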