# <b> <font color='#154d70'> Term Project </font> </b>: <font color='orange'>Deep Neural Networks for Automated XRMI Image log Evaluataion</font>

### <font color='#ce9169'>Introduction</font>
In petroleum engineering, understanding the dip and azimuth direction of layers in subsurface formations is crucial for optimizing many hydrocarbon recovery and production processes. One of the techniques used to acquire this information is analysis of subsurface image logs. However, manually interpreting these images to determine dip and azimuth can be time-consuming and prone to human error.
In this final project, you will develop a deep learning approach to automate the prediction of dip and azimuth of the layering from XRMI image data. By leveraging the power of deep learning algorithms and image processing techniques, you will create a model that can accurately identify and characterize layering orientations in XRMI images, potentially saving significant time and resources in image log analysis.

### <font color='#ce9169'>Project Description</font>

#### <font color='#315d84'> Data Preparation:</font>
> You will receive a set of XRMI data representing image logs by depth.
> Additionally, you will receive an Excel file containing the corresponding dip and azimuth values for some depths, which will serve as the ground truth for training and evaluating your deep learning model.

#### <font color='#315d84'> Data Preprocessing:</font>
> Extract the XRMI images from the files and convert them to a suitable format for the next steps.
> Preprocess the images by applying appropriate techniques such as splitting,  resizing, or normalization to enhance the model's  performance.
> Split the dataset into training, validation, and testing sets.

#### <font color='#315d84'>Deep Learning Model Development:</font>
> Design and implement a deep learning model architecture suitable for image classification or regression tasks, depending on the nature of the dip and azimuth data.
> Explore different convolutional neural network (CNN) architectures, recursive networks, transfer learning techniques, or other deep learning approaches that can effectively capture patterns and features from the XRMI images.
> Train your model using the preprocessed XRMI images and the corresponding dip and azimuth values.

#### <font color='#315d84'>Model Evaluation and Optimization:</font>
> Evaluate the performance of your trained model on the test set using appropriate metrics such as accuracy, precision, recall, or mean squared error, depending on the task (classification or regression).
> Analyze the model's performance and identify areas for improvement, such as adjusting hyperparameters, modifying the model architecture, or incorporating additional data preprocessing techniques.
> Iterate and refine your model until you achieve satisfactory performance.

#### <font color='#315d84'>Deployment and Visualization:</font>
> Implement visualization techniques to display the predicted orientations, enhancing the interpretability of your results. Plot any needed figures to show the performance of your model. 

#### <font color='#315d84'>Documentation and Presentation:</font>
> Document your project thoroughly, including the data preprocessing steps, model architecture, training process, evaluation metrics, and any insights or challenges encountered during the project.
> Prepare a presentation and report to showcase your work.





This project will challenge you to apply your knowledge of deep learning, image processing, and petroleum engineering principles. By completing this project, you will gain valuable experience developing practical solutions for real-world problems and demonstrate your ability to leverage cutting-edge techniques in image log analysis.


In [None]:
!pip install PyMuPDF

In [6]:
import fitz  # PyMuPDF
import os

# Function to convert PDF to images
def pdf_to_images(pdf_path, output_folder, zoom_x=2.0, zoom_y=2.0):
    pdf_document = fitz.open(pdf_path)
    image_paths = []
    for page_num in range(len(pdf_document)):
        page = pdf_document.load_page(page_num)
        # Set zoom factors. 2.0 means 200% zoom
        mat = fitz.Matrix(zoom_x, zoom_y)
        pix = page.get_pixmap(matrix=mat)
        image_path = os.path.join(output_folder, f'page_{page_num + 1}.png')
        pix.save(image_path)
        image_paths.append(image_path)
    return image_paths

# Paths
pdf_path = 'XRMI_raw_HiRes.pdf'  # Replace with your PDF path
output_folder = 'output_images'
os.makedirs(output_folder, exist_ok=True)

# Convert PDF to images
image_paths = pdf_to_images(pdf_path, output_folder, zoom_x=2.0, zoom_y=2.0)

# Verify conversion
print(f"Converted images saved at: {output_folder}")
for image_path in image_paths:
    print(image_path)


Converted images saved at: output_images
output_images\page_1.png


In [28]:
from PIL import Image, ImageFile
import os

# Allow PIL to load truncated images
ImageFile.LOAD_TRUNCATED_IMAGES = True

# Increase the pixel limit
Image.MAX_IMAGE_PIXELS = None

# Load the image
input_image_path = 'raw.png'
output_folder = 'cropped_images'

# Create the output folder if it doesn't exist
os.makedirs(output_folder, exist_ok=True)

try:
    # Open the image
    image = Image.open(input_image_path)
    image.verify()  # Verify the image to ensure it is not corrupted
    image = Image.open(input_image_path)  # Reopen the image after verification

    # Initial coordinates and step size
    left, top = 214, 747.25
    right, bottom = 889, 889
    step = 141.75
    num_pictures = 1192 * 2
    counter = 0
    # Loop to crop and save images
    for i in range(num_pictures):
        # Define the cropping box
        crop_box = (left, top, right, bottom)
        
        # Crop the image
        cropped_image = image.crop(crop_box)

        # Save the cropped image
        output_image_path = os.path.join(output_folder, f'{counter+26930}.jpg')
        counter += 5

        cropped_image.save(output_image_path)
        
        # Update the coordinates for the next crop
        top += step
        bottom += step

    print(f'Successfully cropped and saved {num_pictures} images to {output_folder}')

except OSError as e:
    print(f'Error: {e}')
except ValueError as ve:
    print(f'Value Error: {ve}')


Successfully cropped and saved 2384 images to cropped_images


select data

In [20]:
import pandas as pd

# Load the CSV file
csv_file_path = 'XRMI_analysis.csv'  # Replace with your CSV file path

# Read the CSV file into a DataFrame
df = pd.read_csv(csv_file_path)

# Drop rows that are completely empty
df = df.dropna(how='all')

# Display the cleaned DataFrame
df.head(5)


Unnamed: 0,No.,Depth,type,Dip,Direction
0,1.0,2700.691,20.0,3.74417,111.039
1,2.0,2710.806,20.0,1.84363,42.7792
2,3.0,2716.606,20.0,1.86042,54.0
3,4.0,2717.759,20.0,6.64042,357.896
4,5.0,2717.967,20.0,3.73382,114.779


In [21]:
df = df[df['Dip'] < 90]
df = df[df['Dip'] > 0]
df = df[df['Direction'] < 360]
df = df[df['Direction'] > 0]
df = df.reset_index()


In [22]:
df

Unnamed: 0,index,No.,Depth,type,Dip,Direction
0,0,1.0,2700.691,20.0,3.74417,111.0390
1,1,2.0,2710.806,20.0,1.84363,42.7792
2,2,3.0,2716.606,20.0,1.86042,54.0000
3,3,4.0,2717.759,20.0,6.64042,357.8960
4,4,5.0,2717.967,20.0,3.73382,114.7790
...,...,...,...,...,...,...
859,883,866.0,3868.894,20.0,5.63643,63.3507
860,884,867.0,3869.381,20.0,7.20498,67.0909
861,885,868.0,3873.694,20.0,4.19508,70.8312
862,886,869.0,3882.423,20.0,7.44927,37.1688


In [23]:
depth = df.Depth.tolist()

In [24]:
depth = [int(x * 100) for x in depth]

df.Depth = depth
df

Unnamed: 0,index,No.,Depth,type,Dip,Direction
0,0,1.0,270069,20.0,3.74417,111.0390
1,1,2.0,271080,20.0,1.84363,42.7792
2,2,3.0,271660,20.0,1.86042,54.0000
3,3,4.0,271775,20.0,6.64042,357.8960
4,4,5.0,271796,20.0,3.73382,114.7790
...,...,...,...,...,...,...
859,883,866.0,386889,20.0,5.63643,63.3507
860,884,867.0,386938,20.0,7.20498,67.0909
861,885,868.0,387369,20.0,4.19508,70.8312
862,886,869.0,388242,20.0,7.44927,37.1688


In [26]:
import pandas as pd

# Sample data
data = {
    'depth': [2703, 2704, 2706, 2706],
    'direction': [100, 200, 300, 400],
    'dip': [400, 300, 200, 100]
}

# Create DataFrame
df = pd.DataFrame(data)

# Create a new column to group by every 50 centimeters
df['group'] = (df['depth'] // 5) * 5

# Group by the new column and calculate the mean
grouped = df.groupby('group').agg({
    'depth': 'first',
    'direction': 'mean',
    'dip': 'mean'
}).reset_index(drop=True)

# Adjust the depth to represent the midpoint of each group
grouped['depth'] = grouped['depth'] // 5 * 5

# Display the result
print(grouped)


   depth  direction    dip
0   2700      150.0  350.0
1   2705      350.0  150.0


In [8]:
import os
import shutil

# Source directory containing the images
source_folder = 'rows'  # Replace with the path to your source folder

# Destination folder where the matching images will be copied
destination_folder = 'data'  # Replace with the path to your destination folder

# List of indexes (as integers)
index_list = depth
# Create the destination folder if it doesn't exist
os.makedirs(destination_folder, exist_ok=True)

# List all files in the source folder
all_files = os.listdir(source_folder)

# Iterate over the list of indexes
for index in index_list:
    # Convert the index to a string and add the file extension
    filename = str(index) + '.jpg'
    
    # Check if the file exists in the source folder
    if filename in all_files:
        # Construct the source and destination paths
        source_path = os.path.join(source_folder, filename)
        destination_path = os.path.join(destination_folder, filename)
        
        # Copy the file to the destination folder
        shutil.copyfile(source_path, destination_path)
        print(f'Copied {filename} to {destination_folder}')
    else:
        print(f'File {filename} not found in {source_folder}')


Copied 27006.jpg to data
Copied 27108.jpg to data
Copied 27166.jpg to data
Copied 27177.jpg to data
Copied 27179.jpg to data
Copied 27182.jpg to data
Copied 27183.jpg to data
Copied 27184.jpg to data
Copied 27209.jpg to data
Copied 27211.jpg to data
Copied 27213.jpg to data
Copied 27228.jpg to data
Copied 27231.jpg to data
Copied 27233.jpg to data
Copied 27234.jpg to data
Copied 27235.jpg to data
Copied 27237.jpg to data
Copied 27239.jpg to data
Copied 27240.jpg to data
Copied 27242.jpg to data
Copied 27244.jpg to data
Copied 27245.jpg to data
Copied 27250.jpg to data
Copied 27251.jpg to data
Copied 27260.jpg to data
Copied 27269.jpg to data
Copied 27294.jpg to data
Copied 27307.jpg to data
Copied 27343.jpg to data
Copied 27350.jpg to data
Copied 27397.jpg to data
Copied 27408.jpg to data
Copied 27436.jpg to data
Copied 27439.jpg to data
Copied 27443.jpg to data
Copied 27447.jpg to data
Copied 27450.jpg to data
Copied 27459.jpg to data
Copied 27466.jpg to data
Copied 27473.jpg to data


In [9]:
import pandas as pd
import os
import numpy as np
from sklearn.model_selection import train_test_split
from tensorflow.keras.preprocessing.image import load_img, img_to_array

# Load the dataset

# Define the function to load images
def load_images(image_paths, target_size=(128, 128)):
    images = []
    for img_path in image_paths:
        img = load_img(img_path, target_size=target_size)
        img = img_to_array(img) / 255.0  # Normalize to [0, 1]
        images.append(img)
    return np.array(images)

# Load images from the paths in the dataframe

df2 = df.copy()
df2['Depth'] = depth
depths = df2['Depth'].values

df2['Depth'] = df2['Depth'].astype(str)


image_paths = df2['Depth'].apply(lambda x: os.path.join('data', f'{x}.jpg')).tolist()
images = load_images(image_paths)
images = images / 255.0

# Extract the labels
dips = df['Dip'].values
directions = df['Direction'].values

# Combine dips and directions into a single array
labels = np.column_stack((dips, directions))


In [15]:
time_steps = 5  # Define your time step size
num_samples = len(images) - time_steps + 1

X = np.array([images[i:i+time_steps] for i in range(num_samples)])
y = np.array([labels[i+time_steps-1] for i in range(num_samples)])

# Split the data into training and testing sets
X_train, X_test, y_train, y_test, depths_train, depths_test = train_test_split(images, labels, depths, test_size=0.2, random_state=42)


In [18]:
import tensorflow as tf
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, Conv2D, MaxPooling2D, Flatten, Dense, Dropout, TimeDistributed, LSTM, BatchNormalization

def build_cnn_lstm_model(input_shape):
    cnn_input = Input(shape=input_shape)
    
    # TimeDistributed wrapper for the CNN part
    x = TimeDistributed(Conv2D(32, (3, 3), activation='relu'))(cnn_input)
    x = TimeDistributed(BatchNormalization())(x)
    x = TimeDistributed(MaxPooling2D((2, 2)))(x)
    x = TimeDistributed(Conv2D(64, (3, 3), activation='relu'))(x)
    x = TimeDistributed(BatchNormalization())(x)
    x = TimeDistributed(MaxPooling2D((2, 2)))(x)
    x = TimeDistributed(Conv2D(128, (3, 3), activation='relu'))(x)
    x = TimeDistributed(BatchNormalization())(x)
    x = TimeDistributed(MaxPooling2D((2, 2)))(x)
    x = TimeDistributed(Flatten())(x)
    x = TimeDistributed(Dense(128, activation='relu'))(x)
    x = Dropout(0.5)(x)
    
    # LSTM part
    x = LSTM(128, return_sequences=False)(x)
    
    dip_output = Dense(1, name='dip_output')(x)
    direction_output = Dense(1, name='direction_output')(x)
    
    model = Model(inputs=cnn_input, outputs=[dip_output, direction_output])
    return model

# Define the input shape
input_shape = (time_steps, 128, 128, 3)
model = build_cnn_lstm_model(input_shape)

model.compile(optimizer='adam', loss={'dip_output': 'mse', 'direction_output': 'mse'}, metrics={'dip_output': 'mae', 'direction_output': 'mae'})

model.summary()


ValueError: Kernel shape must have the same length as input, but received kernel of shape (3, 3, 3, 32) and input of shape (None, 128, 3).

In [17]:
# Train the model
history = model.fit(
    X_train, 
    [y_train[:, 0], y_train[:, 1]], 
    validation_split=0.2, 
    epochs=50, 
    batch_size=32
)


Epoch 1/50


ValueError: Input 0 of layer "functional_5" is incompatible with the layer: expected shape=(None, 5, 128, 128, 3), found shape=(None, 128, 128, 3)

In [26]:
import pandas as pd
import os
import numpy as np
from sklearn.model_selection import train_test_split
from tensorflow.keras.preprocessing.image import load_img, img_to_array
from tensorflow.keras.utils import to_categorical

# Load the dataset (assuming 'df' and 'depth' are already defined)

# Define the function to load images
def load_images(image_paths, target_size=(128, 128)):
    images = []
    for img_path in image_paths:
        img = load_img(img_path, target_size=target_size)
        img = img_to_array(img) / 255.0  # Normalize to [0, 1]
        images.append(img)
    return np.array(images)

# Load images from the paths in the dataframe
df2 = df.copy()
df2['Depth'] = depth
depths = df2['Depth'].values

df2['Depth'] = df2['Depth'].astype(str)
image_paths = df2['Depth'].apply(lambda x: os.path.join('data', f'{x}.jpg')).tolist()
images = load_images(image_paths)

# Extract the labels
dips = df['Dip'].values
directions = df['Direction'].values

# Combine dips and directions into a single array
labels = np.column_stack((dips, directions))

# Define the time steps and create sequences
time_steps = 5
num_samples = len(images) - time_steps + 1

X = np.array([images[i:i + time_steps] for i in range(num_samples)])
y = np.array([labels[i + time_steps - 1] for i in range(num_samples)])

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Split the labels into separate arrays for dips and directions
y_train_dip = y_train[:, 0]
y_train_direction = y_train[:, 1]
y_test_dip = y_test[:, 0]
y_test_direction = y_test[:, 1]

import tensorflow as tf
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, Conv2D, MaxPooling2D, Flatten, Dense, Dropout, TimeDistributed, LSTM, BatchNormalization

def build_cnn_lstm_model(input_shape):
    cnn_input = Input(shape=input_shape)

    # TimeDistributed wrapper for the CNN part
    x = TimeDistributed(Conv2D(32, (3, 3), activation='relu'))(cnn_input)
    x = TimeDistributed(BatchNormalization())(x)
    x = TimeDistributed(MaxPooling2D((2, 2)))(x)
    x = TimeDistributed(Conv2D(64, (3, 3), activation='relu'))(x)
    x = TimeDistributed(BatchNormalization())(x)
    x = TimeDistributed(MaxPooling2D((2, 2)))(x)
    x = TimeDistributed(Conv2D(128, (3, 3), activation='relu'))(x)
    x = TimeDistributed(BatchNormalization())(x)
    x = TimeDistributed(MaxPooling2D((2, 2)))(x)
    x = TimeDistributed(Flatten())(x)
    x = TimeDistributed(Dense(128, activation='relu'))(x)
    x = Dropout(0.5)(x)

    # LSTM part
    x = LSTM(128, return_sequences=False)(x)

    dip_output = Dense(1, name='dip_output')(x)
    direction_output = Dense(1, name='direction_output')(x)

    model = Model(inputs=cnn_input, outputs=[dip_output, direction_output])
    return model

# Define the input shape
input_shape = (time_steps, 128, 128, 3)
model = build_cnn_lstm_model(input_shape)

# Compile the model
model.compile(optimizer='adam', loss={'dip_output': 'mse', 'direction_output': 'mse'}, metrics={'dip_output': 'mae', 'direction_output': 'mae'})

model.summary()

# Train the model
history = model.fit(
    X_train, 
    {'dip_output': y_train_dip, 'direction_output': y_train_direction}, 
    validation_split=0.2, 
    epochs=50, 
    batch_size=32
)

# Evaluate the model
loss, dip_loss, direction_loss, dip_mae, direction_mae = model.evaluate(X_test, [y_test_dip, y_test_direction])
print(f'Test Dip MAE: {dip_mae}')
print(f'Test Direction MAE: {direction_mae}')


Epoch 1/50
[1m18/18[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m46s[0m 2s/step - dip_output_mae: 3.7206 - direction_output_mae: 77.0481 - loss: 8121.6133 - val_dip_output_mae: 2.5385 - val_direction_output_mae: 72.1385 - val_loss: 6879.7280
Epoch 2/50
[1m18/18[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m40s[0m 2s/step - dip_output_mae: 2.1004 - direction_output_mae: 73.0518 - loss: 8041.8828 - val_dip_output_mae: 1.7096 - val_direction_output_mae: 66.6941 - val_loss: 6077.9683
Epoch 3/50
[1m18/18[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m40s[0m 2s/step - dip_output_mae: 2.2008 - direction_output_mae: 69.5867 - loss: 7595.0396 - val_dip_output_mae: 2.0155 - val_direction_output_mae: 56.9729 - val_loss: 4893.2397
Epoch 4/50
[1m18/18[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m40s[0m 2s/step - dip_output_mae: 2.2761 - direction_output_mae: 66.4614 - loss: 7226.4473 - val_dip_output_mae: 1.8327 - val_direction_output_mae: 54.0063 - val_loss: 4569.4634
Epoch 5/50
[1m1

KeyboardInterrupt: 

In [20]:
# Evaluate the model
results = model.evaluate(X_test, [y_test[:, 0], y_test[:, 1]])

# Print evaluation results
print(f'Total Loss: {results[0]}')
print(f'Dip Loss (MSE): {results[1]}')
print(f'Direction Loss (MSE): {results[2]}')


[1m6/6[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m2s[0m 229ms/step - dip_output_mae: 2.8652 - direction_output_mae: 26.9920 - loss: 2264.4551
Total Loss: 1986.1986083984375
Dip Loss (MSE): 2.4861974716186523
Direction Loss (MSE): 26.280580520629883


In [None]:
y_pred_dip, y_pred_direction = model.predict(X_test)

# Calculate R-squared for each output
r2_dip = r2_score(y_test_dip, y_pred_dip)
r2_direction = r2_score(y_test_direction, y_pred_direction)

# Print R-squared values
print(f'Test Dip R²: {r2_dip}')
print(f'Test Direction R²: {r2_direction}')