## Kernel: `Fund-d21`

# Day 2 - Tutorial 1

In this tutorial, we will learn how to import and manipulate well log data for reservoir evaluation. The tutorial is subdivided into four sections:

1. Exploring well log data (LAS file)
2. Importing well tops 
3. Defining facies using well logs
4. Plotting well log data

## Exploring Well Log Data 

Log ASCII Standard (LAS) files are the most common oil & gas industry format used for storing well log data.

In this portion of the tutorial, we will import a .las file and explore and manipulate its contents.
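For orientation, a LAS 2.0 file is plain text split into sections whose titles start with `~` (for example `~Version`, `~Well`, `~Curve`, and `~ASCII` for the data block). The sketch below splits a tiny, made-up LAS snippet into those sections using only the standard library. The sample text, the well name in it, and the helper `split_las_sections` are illustrative assumptions, not part of lasio; the tutorial itself relies on lasio for real files.

```python
# Hypothetical, hand-written LAS 2.0 snippet (not taken from Diamond-14.las)
SAMPLE_LAS = """~Version Information
 VERS.   2.0 : CWLS LOG ASCII STANDARD - VERSION 2.0
 WRAP.   NO  : ONE LINE PER DEPTH STEP
~Well Information
 STRT.M  1000.0 : START DEPTH
 STOP.M  1001.0 : STOP DEPTH
 STEP.M  0.5    : STEP
 NULL.   -999.25 : NULL VALUE
 WELL.   EXAMPLE-1 : WELL
~Curve Information
 DEPT.M     : DEPTH
 GR  .GAPI  : GAMMA RAY
 RHOB.G/C3  : BULK DENSITY
~ASCII
 1000.0  45.2  2.31
 1000.5  60.1  2.45
 1001.0  -999.25  2.40
"""


def split_las_sections(text):
    """Group the lines of a LAS file by the section they belong to.

    Illustrative helper: sections are keyed by the first letter after '~'
    (V, W, C, A, ...), and blank or '#' comment lines are skipped.
    """
    sections = {}
    current = None
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        if stripped.startswith("~"):
            current = stripped[1].upper()
            sections[current] = []
        elif current is not None:
            sections[current].append(stripped)
    return sections


sections = split_las_sections(SAMPLE_LAS)

# Curve mnemonics come from the ~Curve section (text before the first '.')
curves = [line.split(".", 1)[0].strip() for line in sections["C"]]

# Data rows come from the ~ASCII section, one row per depth step
rows = [[float(v) for v in line.split()] for line in sections["A"]]

print(curves)
print(rows[0])
```

lasio performs this kind of parsing for us (and also handles details such as null values and wrapped lines), which is why the rest of the tutorial uses it.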

### Step 1: Install Required Packages

In [0]:
# !pip install lasio
# !pip install plotly

# If the libraries are already installed in the current environment, the output message will be "Requirement already satisfied"

### Step 2: Import Libraries

In [0]:
# Import required libraries

import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import plotly.express as px
import lasio

### Step 3: Import the .las file 

Use the lasio library to import the .las file "Diamond-14.las".


In [0]:
# Read the .LAS file

import dataiku, os

folder_path = dataiku.Folder('data').get_path()
file_path_D14 = os.path.join(folder_path, "Diamond-14.las")

D14 = lasio.read(file_path_D14)

### Step 4: Display the Data in the .LAS file

Now that the file has been loaded, we can display its contents: the header, the curve descriptions, and the log data.

In [0]:
# Display the header of the .LAS file



In [0]:
# Display the description of the log curves present in the .LAS file



In [0]:
# Display a summary of the log data in the .LAS file



### Step 5: Create a Dataframe 

In this step we will convert the data loaded with lasio ('D14') into a pandas dataframe, which makes data manipulation and plotting easier.
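lasio's `LASFile.df()` method returns the curves as a pandas dataframe indexed by depth. Because the sketch below has to run without the tutorial's data file, it builds a small synthetic dataframe instead (the name `logs`, the depths, and the log values are invented) and shows the kind of inspection calls used in the cells that follow.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for a depth-indexed log dataframe (values are invented)
depth = np.arange(1000.0, 1003.0, 0.5)
logs = pd.DataFrame(
    {
        "GR": [45.0, 60.0, 75.0, 30.0, 55.0, 90.0],
        # One negative placeholder value, to show how describe() exposes it
        "RHOB": [2.30, 2.45, -999.25, 2.20, 2.40, 2.55],
    },
    index=pd.Index(depth, name="DEPT"),
)

logs.info()              # column names, dtypes and non-null counts
print(logs.head(3))      # first 3 rows; the argument sets how many
stats = logs.describe()  # count, mean, std, min, quartiles and max per column
print(stats.T)           # .T is an attribute (no parentheses) that transposes
```

Scanning the `min` and `max` columns of the transposed statistics is a quick way to spot physically implausible values before any calculation.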

In [0]:
# Create the dataframe using the "D14" dataset, call it D14_logs



In [0]:
# Print a summary of the newly created dataframe



In [0]:
# Display the first few rows of the dataframe



# The number inside the parentheses sets how many rows are displayed

In [0]:
# Generate descriptive statistics of the log data



In [0]:
# We can also use the .T attribute (no parentheses) to transpose the dataframe (swap index and columns)




### Step 6: Replace Negative Values by NaN

From the step above, we can see that some logs (RHOB, DT, etc.) contain negative numbers. Such values are most likely associated with tool errors and should therefore be replaced with NaN.
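As a minimal sketch of this replacement, the example below applies `Series.mask` to a small synthetic dataframe. The name `logs` and all values are invented for illustration; only the column names match the tutorial.

```python
import numpy as np
import pandas as pd

# Synthetic logs with a few physically impossible negative readings
logs = pd.DataFrame(
    {"RHOB": [2.30, -1.00, 2.45], "DT": [95.0, 110.0, -5.0]},
    index=[1000.0, 1000.5, 1001.0],
)

# Series.mask replaces values where the condition is True (here with NaN)
for col in ["RHOB", "DT"]:
    logs[col] = logs[col].mask(logs[col] < 0, np.nan)

print(logs)
```

Using NaN rather than deleting rows keeps the depth index intact, and pandas skips NaN by default when computing statistics.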


In [0]:
# Replace negative values in the 'RHOB' and 'DT' logs with NaN using the .mask() function



In [0]:
# Check the statistics for the well logs that we replaced the negative values for



### Step 7: Compute a New Well Log 

After removing the negative values from our log data, we can use it to estimate geological properties. As an example, we will calculate an Acoustic Impedance (AI) log from the density (RHOB) and sonic (DT) logs.

AI = Bulk density x Velocity

Note that the sonic log measures transit time rather than velocity, so we first need to convert it. Assuming DT is recorded in microseconds per unit length, the factor of 1,000,000 converts microseconds to seconds:

Velocity = 1000000 / DT
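As a quick sanity check of the two formulas above, here is a worked example for a single sample. The input values and the units (DT in microseconds per foot, RHOB in g/cm3) are assumptions for illustration; check the curve units in the header of your own LAS file.

```python
# Assumed sample values (illustrative, not read from Diamond-14)
rhob = 2.5   # bulk density, assumed g/cm3
dt = 100.0   # sonic transit time, assumed microseconds per foot

# 1,000,000 converts microseconds to seconds, so velocity is in ft/s here
velocity = 1_000_000 / dt

# Acoustic impedance = bulk density x velocity
ai = rhob * velocity

print(velocity, ai)
```

With these assumed inputs the velocity is 10,000 and the impedance is 25,000, in mixed units that follow directly from the input units.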

In [0]:
# Calculate acoustic impedance (AI): AI= density x velocity

D14_logs['AI'] = D14_logs['RHOB'] * 1000000 / D14_logs["DT"]

In [0]:
# Validate the resulting AI log by looking at its stats



### Step 8: Replace Infinity Values

Note that the min and max values of the AI log are -inf and inf.

In this case, the infinite (inf) values most likely arise where the denominator is 0 (DT = 0).

We can use the .mask() function to replace the inf values with NaN.
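The sketch below reproduces the effect on synthetic data: one transit-time sample is exactly zero, so the division yields `inf`. The series names and values are invented, and `np.isinf` is used here as a compact alternative to comparing against `np.inf` and `-np.inf` separately.

```python
import numpy as np
import pandas as pd

# Synthetic density and sonic samples; one DT reading is exactly zero
rhob = pd.Series([2.30, 2.40, 2.50])
dt = pd.Series([100.0, 0.0, 80.0])

# pandas follows floating-point rules: a non-zero value divided by 0 gives inf
ai = rhob * 1_000_000 / dt

# np.isinf flags both +inf and -inf; mask then replaces them with NaN
ai_clean = ai.mask(np.isinf(ai), np.nan)

print(ai.tolist())
print(ai_clean.tolist())
```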

In [0]:
# Replace the inf and -inf data in the AI log by NaN

D14_logs.AI = D14_logs.AI.mask((D14_logs.AI == np.inf) | (D14_logs.AI == -np.inf), np.nan)

In [0]:
# Validate the results by looking at the stats of the AI log again



### Step 9: Identify and Handle Outliers

We will create a box plot to detect potential outliers in the data, using the AI log as an example.
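A box plot flags points that fall beyond its whiskers, which matplotlib places by default at 1.5 times the interquartile range (IQR) beyond the quartiles. The sketch below computes those limits numerically on synthetic values; the series `ai` and its numbers are invented stand-ins for the AI log, and the 1.5 factor is the default rather than a value chosen by the tutorial.

```python
import pandas as pd

# Synthetic AI-like values with one obviously extreme sample
ai = pd.Series([9000.0, 9500.0, 10000.0, 10500.0, 11000.0, 60000.0])

q1 = ai.quantile(0.25)
q3 = ai.quantile(0.75)
iqr = q3 - q1

# Limits implied by the default box-plot whisker rule (1.5 x IQR)
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

outliers = ai[(ai < lower_fence) | (ai > upper_fence)]
print(lower_fence, upper_fence)
print(outliers)
```

These limits are only a statistical guide; a fixed cut-off chosen from the plot and from geological judgement, as in the cells below, is an equally valid approach.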

In [0]:
# Evaluate the range of the data from a boxplot and identify potential outliers in the AI

D14_logs[["AI"]].plot(kind="box", title = "AI Log - Raw Data")

plt.show()

In [0]:
# First, let's print the number of rows in the AI column before dropping outliers



In [0]:
# Keep only the rows with AI < 25000 (feel free to play with this number). Rows where AI is NaN are dropped as well

D14_logs = D14_logs[D14_logs.AI < 25000]

In [0]:
# Print the number of rows after dropping the outliers



In [0]:
# Display the content of the edited AI column



In [0]:
# Create a boxplot of the AI data after dropping the outliers



In [0]:
# Create a histogram of the AI log to validate the data range after dropping outliers 

D14_logs.AI.plot(kind="hist", title = "Acoustic Impedance Log")
plt.show()

### Step 10: Create a Sub-set of Data

To facilitate the visualization and manipulation of the log data, we can create a sub-dataframe that keeps only the well logs required for the rest of the tutorial.

Among the 38 columns in the original .LAS file (plus the AI log), we will create a sub-set containing the following curves: CALI, GR, DT, RHOB, NPHI, PHIT, RT, VCL, and AI.
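Column selection in pandas depends on the brackets used. The short sketch below contrasts the two forms on a synthetic dataframe; the name `logs`, the extra `SP` column, and all values are invented for illustration.

```python
import pandas as pd

# Synthetic dataframe with more columns than we need (values are invented)
logs = pd.DataFrame(
    {
        "GR": [45.0, 60.0],
        "RHOB": [2.30, 2.45],
        "SP": [-20.0, -35.0],
    },
    index=[1000.0, 1000.5],
)

one_curve = logs["GR"]         # single brackets: a pandas Series
subset = logs[["GR", "RHOB"]]  # double brackets (a list of names): a DataFrame

print(type(one_curve).__name__, type(subset).__name__)
print(list(subset.columns))
```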


In [0]:
# Create a sub-set selecting the following columns: 'CALI', 'GR','DT','RHOB','NPHI','PHIT','RT','VCL','AI'



# Double brackets return a dataframe

In [0]:
# Display the first 10 rows of the resulting sub-set

D14_final.head(10)

## Import Well Tops 

Import well tops to use them as depth filters to limit the calculations to the zone of interest 

- Import Well Tops 
- Define area of Interest between two well tops 

### Step 1: Load Well Tops Data From a .csv File

In [0]:
# Read well tops from a .csv file. Note that the .csv file was previously imported into Dataiku

top_file_path = os.path.join(folder_path, 'Tops.csv')
mydataset = dataiku.Dataset("Tops")
tops = mydataset.get_dataframe()

# Display the content of the loaded well tops file

tops

### Step 2: Run Basic Operations on the Well Tops

In [0]:
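# Compute the average depth of each formation top across the three wells (Diamond-14, Diamond-10 and Diamond-03)
# numeric_only=True skips text columns such as 'Well name', which recent pandas versions no longer drop automatically

tops.groupby("Surface").mean(numeric_only=True)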

### Step 3: Re-Arrange the Well Tops Dataframe

In [0]:
# Re-arrange the dataframe using the .pivot_table() function with columns= Surface and index= Well name

tops.pivot_table(columns="Surface", index="Well name")

### Step 4: Sort Well Tops

In [0]:
# Sort the tops in each well based on depth ('MD')

tops.groupby("Well name").apply(lambda df_: df_.sort_values(by="MD"))

### Step 5: Create a Sub-Set of Well Logs Over a Specific Zone

Now, we will create a sub-set of our log dataframe that keeps only the values between the "HOUSTON" and "HOUSTON_BASE" well tops.
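The sketch below walks through the same idea on synthetic data: index a tops table by well and surface, look up two depths, and filter a depth-indexed log between them. The well names, surface names, depths, and variable names (`tops_demo`, `logs`, `zone`) are all invented so the example can run without the tutorial's files.

```python
import pandas as pd

# Synthetic well tops (well names, surfaces and depths are invented)
tops_demo = (
    pd.DataFrame(
        {
            "Well name": ["Well-A", "Well-A", "Well-B"],
            "Surface": ["TOP_X", "BASE_X", "TOP_X"],
            "MD": [1000.5, 1001.5, 990.0],
        }
    )
    .set_index(["Well name", "Surface"])
    .sort_index()  # a sorted two-level index keeps lookups efficient
)

# With a two-level index, .loc takes a (well, surface) tuple plus a column name
top = tops_demo.loc[("Well-A", "TOP_X"), "MD"]
base = tops_demo.loc[("Well-A", "BASE_X"), "MD"]

# Synthetic depth-indexed log
logs = pd.DataFrame(
    {"GR": [40.0, 55.0, 70.0, 65.0, 35.0]},
    index=[1000.0, 1000.5, 1001.0, 1001.5, 1002.0],
)

# Strict inequalities exclude samples exactly at the top and base depths
zone = logs.loc[(logs.index > top) & (logs.index < base)].copy()
print(top, base)
print(zone)
```

Switching to `>=` and `<=` would include the samples at the two bounding depths; which choice is appropriate depends on how the tops were picked.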


In [0]:
# Index the 'tops' dataframe based on well ("Well name") and well top ("Surface")

tops.set_index(["Well name", "Surface"], inplace=True)

# Display the resulting dataframe



# As shown in the output table, the data now has two index levels: Well name and Surface

In [0]:
# Define two variables to store the top= HOUSTON and base= HOUSTON_BASE of our zone of interest

top = tops.loc[("Diamond-14", "HOUSTON"), "MD"]



In [0]:
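# Use the .loc indexer to retrieve the data values in the zone of interest (between 'top' and 'base')
# .copy() makes D14_ZOI an independent dataframe, so adding columns later does not trigger a SettingWithCopyWarning

D14_ZOI = D14_final.loc[(D14_final.index > top) & (D14_final.index < base)].copy()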

# Display the resulting dataframe



## Facies Classification Using Well Logs

In this portion of the tutorial, we will create a facies classification based on a Gamma Ray (GR) cut-off.
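A cut-off rule like this can also be applied to a whole column at once with `numpy.where`, without a Python-level function. The sketch below uses the same threshold of 50 as the cells that follow, on invented GR values; the names `gr` and `facies` are illustrative. Note that a missing GR value (NaN) fails the `< 50` test, so it is labelled "Shale" under either approach.

```python
import numpy as np
import pandas as pd

# Synthetic gamma-ray readings, including one missing value
gr = pd.Series([30.0, 50.0, 80.0, np.nan], name="GR")

# Vectorised rule: GR < 50 -> "Sand", everything else -> "Shale"
facies = pd.Series(np.where(gr < 50, "Sand", "Shale"), index=gr.index)

print(facies.tolist())
print(facies.value_counts())
```

If missing readings should not be classified at all, one option is to drop or mask the NaN rows before applying the rule.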

In [0]:
# Create a function to define the facies classes -> GR < 50: Sand, GR >= 50: Shale

def GR_Facies(x):
    if x < 50:
        return "Sand"
    else:
        return "Shale"

In [0]:
# Create a column named 'Facies Type' by applying the 'GR_Facies' function to the 'GR' column

D14_ZOI["Facies Type"] = D14_ZOI['GR'].apply(GR_Facies)


# Display a summary of the dataset. It should include the 'Facies Type' column



## Plotting Well Log Data

The analysis of well log data relies on a variety of plots: line plots (data vs. depth), histograms, crossplots, etc.

In this exercise we will explore how to use several Python libraries, including matplotlib, seaborn and plotly, to create the plots most frequently used for well log evaluation:

- Line plot
- 2D and 3D scatter plot
- Box plots
- Histogram 
- Correlation matrix

In [0]:
# Using Matplotlib create a vertical plot of the GR log within the ZOI

plt.plot(D14_ZOI['GR'],D14_ZOI.index)

plt.show()


In [0]:
# We can improve the plot above by adjusting the size, adding a title, axis labels and a grid

plt.figure(figsize=(2, 8))

plt.title("GR D14 Well")

plt.ylabel("Depth")

plt.xlabel("GR")

plt.grid(True)

plt.plot(D14_ZOI["GR"],D14_ZOI.index, color='brown', marker='.')
plt.show()

In [0]:
# Use pandas' Matplotlib-based .plot() to create a scatter plot of Gamma Ray vs. Bulk Density, with markers colored by Transit Time

D14_final.plot(x="GR", y="RHOB", kind="scatter",
               figsize=(8,8),
               c="DT",cmap="plasma", 
               vmin=90,vmax=150,
               xlim=(10,110),ylim=(1.8,2.5), sharex=False)

plt.show()

# sharex=False is added to force the display of the x-axis label; try removing it from the .plot() call to see the difference

In [0]:
# Use plotly to create a 3D plot of Neutron porosity, Bulk density and Gamma Ray with markers colored by DT

D_Scatt = px.scatter_3d(data_frame=D14_ZOI, 
                        x='NPHI', 
                        y='RHOB',
                        z='GR', 
                        color='DT')

D_Scatt.show()

# Make sure to explore the icons on the top right of the plot!

In [0]:
# Use plotly to create a histogram of the Bulk density log. Customize the bin number ('nbins')! 

hist1 = px.histogram(data_frame=D14_ZOI, 
                     x=['RHOB'], nbins=20,
                     width= 600, height=600,
                     title='Bulk Density distribution within the Houston ZOI')

hist1.show()

In [0]:
# We can also create a histogram of the bulk density grouped/colored by facies

fig = px.histogram(data_frame=D14_ZOI,
                   x='RHOB', nbins=20,
                   color="Facies Type",
                   width= 500, height=500,
                   title='Bulk Density by Facies')

fig.show()

In [0]:
# Use plotly to create a boxplot of the distribution of the gamma ray by facies

box_plot = px.box(data_frame=D14_ZOI,
                  x='Facies Type', y='GR',
                  color='Facies Type',
                  width= 500, height=500,
                  title='Gamma Ray by Facies')

box_plot.show()

# Make sure to hover over the boxplot to read the statistics!

In [0]:
# Use plotly to create a strip plot of the Neutron porosity, grouped by facies

fig = px.strip(data_frame=D14_ZOI,
               y='NPHI', x= 'Facies Type',
               color="Facies Type",
               width= 500, height=500,
               title='Distribution of Neutron Porosity log by facies')
fig.show()

In [0]:
# Use Plotly to create a scatterplot of Bulk density versus Neutron Porosity and compute a trend line for the data

fig = px.scatter(data_frame= D14_ZOI,
                 x='NPHI',y="RHOB",
                 color="Facies Type",
                 trendline='ols',
                 width = 500, height=500,
                 title='Bulk Density versus Neutron Porosity by facies')
fig.show()

# OLS stands for Ordinary Least Squares regression; plotly needs the statsmodels package installed to compute this trend line

### Calculate and Visualize a Correlation Matrix 
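The Pearson coefficient measures the strength of the linear relationship between two logs, from -1 (perfect inverse) to 1 (perfect direct). The sketch below computes it on a small synthetic dataframe both with `.corr()` and by hand, so the two can be compared. The values are invented, and `numeric_only=True` assumes pandas 1.5 or later.

```python
import numpy as np
import pandas as pd

# Synthetic logs (invented values): RHOB falls as NPHI rises
logs = pd.DataFrame(
    {
        "NPHI": [0.10, 0.15, 0.20, 0.25, 0.30],
        "RHOB": [2.60, 2.52, 2.45, 2.36, 2.30],
        "Facies Type": ["Sand", "Sand", "Shale", "Shale", "Shale"],
    }
)

# numeric_only=True skips the text column when building the matrix
matrix = logs.corr(method="pearson", numeric_only=True)

# Pearson r by hand: sum of products of deviations from the mean,
# divided by the root of the product of the summed squared deviations
x = logs["NPHI"] - logs["NPHI"].mean()
y = logs["RHOB"] - logs["RHOB"].mean()
r_manual = (x * y).sum() / np.sqrt((x ** 2).sum() * (y ** 2).sum())

print(matrix.round(2))
print(round(r_manual, 4))
```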

In [0]:
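# Use the .corr() function to compute the Pearson correlation coefficient for all numeric columns in the D14_ZOI dataframe
# numeric_only=True (pandas 1.5+) excludes the text column 'Facies Type', which would otherwise raise an error in pandas 2.x

Matrix_Full = D14_ZOI.corr(method='pearson', numeric_only=True).round(2)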

Matrix_Full

In [0]:
# Now let's select a subset of well logs to facilitate the visualization of the correlation matrix

Matrix_Small = D14_ZOI[['DT', 'GR', 'RHOB', 'NPHI']].corr().round(2)

Matrix_Small

In [0]:
# Use seaborn to create a heat map of the correlation matrix ('Matrix_Small')

sns.heatmap(Matrix_Small, 
            annot=True, 
            vmax=1, 
            vmin=-1, 
            center=0,
            cmap='vlag')
plt.show()

In [0]:
# We can use Plotly to create a scatterplot matrix to display the graphical correlation between some logs ['RHOB','NPHI','DT','GR']

fig = px.scatter_matrix(data_frame= D14_ZOI, 
                        dimensions= ['RHOB','NPHI','DT','GR'], 
                        title= 'Correlation of Bulk Density, Neutron Porosity, Sonic and Gamma Ray logs by facies',
                        color= "Facies Type",
                        width= 700,height= 650)
fig.show()

In [0]:
# We can also use plotly to create a parallel coordinates plot to visualize the correlation between some logs ['RHOB','NPHI','DT','GR']

fig = px.parallel_coordinates(data_frame=D14_ZOI, 
                              dimensions=['RHOB',"NPHI","DT","GR"],
                              color="GR")
fig.show()

# Each line represents a row in the data frame!