## Image Processing: Extracting Resistivity Colors as HEX Codes from Geophysics Images

### 📌 Why This Notebook?
In geophysics, **terrameter readings** provide valuable insight into subsurface resistivity. In this case, however, the raw numerical data is unavailable: I only have **images** of these readings, extracted from presentation slides.

### 🔍 The Manual Process
Normally, I would:
1. **Manually extract color values** from the images.
2. **Cross-check each color** against a resistivity scale/legend.
3. **Manually enter resistivity values** into a GIS attribute table.
4. **Generate depth-based resistivity rasters** in GIS using multiple cross-sections.

This process can be **slow, repetitive, and tedious**, especially when working with multiple images.

### 🚀 What This Notebook Does
To **automate** the process, this script:
✔️ Reads the **terrameter image** and overlays a structured **grid**.  
✔️ Extracts the **most common color** in each grid cell.  
✔️ Converts colors to **HEX values** for easy processing.  
✔️ Saves the extracted color data in a structured format (**CSV & Excel heatmap**).  

### ⏭️ Next Steps
- Use **coordinate data (X, Y, Z)** to georeference the extracted values.  
- Replace **HEX colors** with actual **resistivity values** using the legend from each terrameter reading, together with interpretations of what each value range means.  
- Integrate the processed data into **GIS** to generate resistivity-based depth rasters from multiple cross-sections.  
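As a rough sketch of the second step, each extracted HEX color can be matched to the nearest legend color by Euclidean distance in RGB space. The `LEGEND` colors and resistivity values below are hypothetical placeholders, and `hex_to_rgb` / `hex_to_resistivity` are illustrative helper names, not part of this notebook; the legend would have to be replaced with the colors and values read from each section's own scale.

```python
import numpy as np

# HYPOTHETICAL legend: HEX color -> resistivity (Ω·m).
# Replace these entries with the colors and values read from your own legend.
LEGEND = {
    "#0000ff": 10.0,
    "#00ff00": 50.0,
    "#ffff00": 100.0,
    "#ff0000": 500.0,
}

def hex_to_rgb(hex_color):
    """Convert '#rrggbb' to an (r, g, b) tuple of ints."""
    h = hex_color.lstrip("#")
    return tuple(int(h[i:i + 2], 16) for i in (0, 2, 4))

def hex_to_resistivity(hex_color, legend=LEGEND):
    """Return the resistivity of the legend color closest to hex_color.

    Closeness is the Euclidean distance between RGB triples.
    """
    keys = list(legend)
    legend_rgb = np.array([hex_to_rgb(k) for k in keys], dtype=float)
    target = np.array(hex_to_rgb(hex_color), dtype=float)
    distances = np.linalg.norm(legend_rgb - target, axis=1)
    return legend[keys[int(np.argmin(distances))]]
```

Applied to the Step 1 output, this could look like `df["Resistivity"] = df["Hex_Color"].map(hex_to_resistivity)`. Nearest-color matching snaps every cell to one legend class, so blended or anti-aliased colors (for example, at class boundaries) are assigned to whichever legend entry they happen to be closest to; that is a simplification, not a geophysical interpretation.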

By automating this **first step**, I can **significantly speed up** the process of converting image-based terrameter readings into **usable geospatial data**.
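For the georeferencing step, one simple approach is to interpolate cell-center coordinates linearly between the surveyed start and end points of the profile. This sketch assumes a straight survey line, a flat top and bottom for the imaged section, and an image cropped to the section itself (no axes or margins); `georeference_grid` and all of its parameters are hypothetical. A section with topography would need a surface elevation per column instead of a single `top_z`.

```python
import numpy as np
import pandas as pd

def georeference_grid(df, start_xy, end_xy, top_z, bottom_z, num_cols, num_rows):
    """Attach real-world X, Y, Z (cell centers) to Grid_X / Grid_Z indices.

    HYPOTHETICAL assumptions: a straight survey line from start_xy to end_xy,
    a flat top at elevation top_z and a flat bottom at elevation bottom_z.
    """
    # Fraction along the line (t) and down the section (s) for each cell center, in 0..1
    t = (df["Grid_X"].to_numpy() + 0.5) / num_cols
    s = (df["Grid_Z"].to_numpy() + 0.5) / num_rows
    out = df.copy()  # leave the input DataFrame unchanged
    out["X"] = start_xy[0] + t * (end_xy[0] - start_xy[0])
    out["Y"] = start_xy[1] + t * (end_xy[1] - start_xy[1])
    # Row 0 is the top of the image, so elevation decreases as Grid_Z increases
    out["Z"] = top_z + s * (bottom_z - top_z)
    return out
```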

```python
import numpy as np
import pandas as pd
import openpyxl
from openpyxl.styles import PatternFill
from PIL import Image
from collections import Counter
```

```python
# --------------------------------------
# Step 1: Load Image and Extract Grid Data
# --------------------------------------

'''
The sample image (grid_overlayed_s2.png) contains manually drawn grid lines.
These lines barely affect color extraction: they occupy only a few pixels per cell,
and the code keeps the most common color in each cell, so their effect is negligible.

I drew the grid beforehand to visualize the structure and to check whether the
resolution, although coarse, was sufficient for my needs. Based on the image size
in pixels, I chose a 40 x 9 grid; adjust num_cols and num_rows for your own image.

The script uses PIL to divide the image into a mosaic of grid cells defined by the
number of columns and rows. The most common color in each cell is recorded, and is
later replaced by the corresponding value from that section's resistivity legend.
'''

image_path = "image.png"  # set to the image you want to process, e.g. grid_overlayed_s2.png
output_csv = "grid_color_data_fixed.csv"

# Load the image and convert it to RGB (drops any alpha channel or palette)
image = Image.open(image_path)
pixels = image.convert('RGB')
width, height = image.size

# Define grid dimensions (40 columns x 9 rows)
num_cols, num_rows = 40, 9
grid_width_px = width / num_cols    # cell width along X (distance along the section)
grid_height_px = height / num_rows  # cell height along Z (vertical axis; row 0 is the top)

# Prepare to store extracted data
grid_data = []

# Return the most common RGB tuple among the pixels of one grid cell
def extract_most_common_color(x_start, z_start):
    grid_pixels = [pixels.getpixel((x, z))
                   for z in range(int(z_start), int(z_start + grid_height_px))
                   for x in range(int(x_start), int(x_start + grid_width_px))]
    return Counter(grid_pixels).most_common(1)[0][0]

# Loop through each grid cell: rows top to bottom, columns left to right
for row in range(num_rows):
    for col in range(num_cols):
        x_start = col * grid_width_px
        z_start = row * grid_height_px
        most_common_color = extract_most_common_color(x_start, z_start)
        hex_color = '#{:02x}{:02x}{:02x}'.format(*most_common_color)
        grid_data.append([col, row, most_common_color, hex_color])

# Create DataFrame and save to CSV.
# The image is a longitudinal cross-section, so X runs along the width and Z runs down with depth.
df = pd.DataFrame(grid_data, columns=['Grid_X', 'Grid_Z', 'RGB_Color', 'Hex_Color'])
df.to_csv(output_csv, index=False)
print(f"Grid data saved to {output_csv}")
```

```text
Grid data saved to grid_color_data_fixed.csv
```
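Calling `getpixel` once per pixel is slow on large images. Below is a sketch of the same per-cell mode using NumPy array slicing instead; it uses the same `int(start)` to `int(start + size)` cell boundaries as `extract_most_common_color`, and optionally skips colors listed in `ignore` (for example, a hypothetical black grid-line color `#000000`) unless a cell contains nothing else. `grid_mode_colors` is an illustrative name, not part of PIL or NumPy.

```python
import numpy as np
from PIL import Image

def grid_mode_colors(image, num_cols, num_rows, ignore=()):
    """Return a num_rows x num_cols list of '#rrggbb' strings, one per grid cell.

    Colors in `ignore` (e.g. a grid-line color) are excluded from the count,
    unless a cell contains only ignored colors.
    """
    rgb = np.asarray(image.convert("RGB"), dtype=np.uint32)  # shape (height, width, 3)
    height, width = rgb.shape[:2]
    # Pack each pixel into one integer 0xRRGGBB so colors can be counted with np.unique
    packed = (rgb[..., 0] << 16) | (rgb[..., 1] << 8) | rgb[..., 2]
    ignore_packed = np.array([int(h.lstrip("#"), 16) for h in ignore], dtype=np.uint32)
    cell_w, cell_h = width / num_cols, height / num_rows
    result = []
    for row in range(num_rows):
        z0, z1 = int(row * cell_h), int(row * cell_h + cell_h)
        row_hex = []
        for col in range(num_cols):
            x0, x1 = int(col * cell_w), int(col * cell_w + cell_w)
            cell = packed[z0:z1, x0:x1].ravel()
            kept = cell[~np.isin(cell, ignore_packed)]
            if kept.size:  # fall back to all pixels if everything was ignored
                cell = kept
            values, counts = np.unique(cell, return_counts=True)
            row_hex.append("#{:06x}".format(int(values[np.argmax(counts)])))
        result.append(row_hex)
    return result
```

For example, `grid_mode_colors(Image.open(image_path), num_cols, num_rows)` should reproduce the HEX grid from the loop above, except possibly in cells with tied counts: `Counter.most_common` keeps the first color encountered, while `np.argmax` over `np.unique` picks the tied color with the smallest packed value.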


```python
'''
The following two steps are optional. They only visualize the resolution after
extraction, so you can skip them entirely if you don't need a visualization.

For the geospatial analysis I needed to perform, the CSV file from Step 1
was sufficient to proceed with the GIS tasks.
'''

# --------------------------------------
# Step 2: Reshape Data for Heatmap
# --------------------------------------
def reshape_csv(input_csv, output_csv):
    df = pd.read_csv(input_csv)
    # One row per Grid_Z (depth), one column per Grid_X (distance), HEX code in each cell
    reshaped_df = df.pivot(index='Grid_Z', columns='Grid_X', values='Hex_Color')
    reshaped_df.to_csv(output_csv, index=True)
    print(f"Reshaped data saved to {output_csv}")
    return reshaped_df

# --------------------------------------
# Step 3: Create Heatmap in Excel
# --------------------------------------
def create_excel_heatmap(reshaped_df, output_excel):
    workbook = openpyxl.Workbook()
    sheet = workbook.active

    # Fill one Excel cell per grid cell; openpyxl expects the hex code without '#'
    for row in range(reshaped_df.shape[0]):
        for col in range(reshaped_df.shape[1]):
            hex_color = reshaped_df.iloc[row, col].lstrip('#')
            fill_color = PatternFill(start_color=hex_color,
                                     end_color=hex_color,
                                     fill_type='solid')
            # openpyxl rows and columns are 1-indexed
            sheet.cell(row=row + 1, column=col + 1).fill = fill_color

    workbook.save(output_excel)
    print(f"Heatmap saved to {output_excel}")
```



```python
# --------------------------------------
# Step 4: Run the Full Process
# --------------------------------------
csv_output = "grid_color_data_fixed.csv"
reshaped_csv = "reshaped_grid_data.csv"
excel_output = "heatmap.xlsx"

reshaped_df = reshape_csv(csv_output, reshaped_csv)
create_excel_heatmap(reshaped_df, excel_output)
```

```text
Reshaped data saved to reshaped_grid_data.csv
Heatmap saved to heatmap.xlsx
```
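If Excel is not needed, a lighter alternative is to render the reshaped HEX grid back into an image with PIL and compare it with the original slide. This is a minimal sketch: `hex_grid_to_image` and the `cell_px` block size are hypothetical choices, and it assumes every entry in the DataFrame is a `#rrggbb` string.

```python
import pandas as pd
from PIL import Image

def hex_grid_to_image(reshaped_df, cell_px=20):
    """Render a DataFrame of '#rrggbb' strings (rows = Grid_Z, columns = Grid_X) as an image.

    Each grid cell becomes a cell_px x cell_px block of solid color.
    """
    n_rows, n_cols = reshaped_df.shape
    img = Image.new("RGB", (n_cols * cell_px, n_rows * cell_px))
    for r in range(n_rows):
        for c in range(n_cols):
            h = str(reshaped_df.iloc[r, c]).lstrip("#")
            rgb = tuple(int(h[i:i + 2], 16) for i in (0, 2, 4))
            # Fill the box (left, upper, right, lower) with a solid color
            img.paste(rgb, (c * cell_px, r * cell_px, (c + 1) * cell_px, (r + 1) * cell_px))
    return img
```

Usage might look like `hex_grid_to_image(reshaped_df).save("heatmap_preview.png")`, where the output filename is only an example.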
