# Python for Drilling Engineers - Module 1
## Introduction
Welcome to the Python for Drilling Engineers course! In this module, we'll cover the basics of Python and data manipulation for drilling-related applications.

**Why Python?**
- Open-source and widely used in data science
- Great for automating repetitive tasks
- Strong ecosystem for data analysis and visualization

**What's in it for a drilling engineer?**
- Automate drilling reports and calculations
- Analyze well logs and real-time drilling data
- Improve decision-making with data-driven insights

## Python Basics: Built-in Data Structures
Before working with datasets, let's cover fundamental Python data structures.

### Summary Table: Python Data Structures

| Data Structure | Ordered? | Mutable? | Duplicates Allowed? | Best Use Case |
|---------------|---------|----------|------------------|--------------|
| **List** (`list`) | ✅ Yes | ✅ Yes | ✅ Yes | General-purpose, ordered data storage |
| **Dictionary** (`dict`) | ✅ Yes (insertion order, Python 3.7+) | ✅ Yes | ❌ No (keys must be unique) | Key-value lookups, structured data |
| **Tuple** (`tuple`) | ✅ Yes | ❌ No | ✅ Yes | Immutable, fixed collections |
| **Set** (`set`) | ❌ No | ✅ Yes (elements can be added/removed) | ❌ No | Unique element storage, set operations |



### Lists
**Definition:**
A list is an ordered, mutable (modifiable) collection that allows duplicate elements.

**Key Features:**
- Ordered: Elements maintain the order in which they were added.
- Mutable: Elements can be changed, added, or removed.
- Allows Duplicates: Multiple elements with the same value are allowed.

**When to Use?**
- When you need an ordered collection of items.
- When frequent updates (insertion/deletion/modification) are needed.
- When you want to store heterogeneous data (e.g., ["Drill Bit", 10, 45.7]).

In [None]:
# Lists
drilling_tools = ['Bit', 'Mud Motor', 'MWD', 'Rotary Table']
print(drilling_tools[0])  # Access first item
print(len(drilling_tools))  # Number of elements
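
The features listed above (ordered, mutable, duplicates allowed) can be seen in a short sketch. The tool names here are hypothetical and chosen only for illustration.

```python
# Hypothetical tool list for illustration
bha_tools = ['Bit', 'Mud Motor', 'MWD']

bha_tools.append('Rotary Table')    # Add to the end
bha_tools.insert(1, 'Stabilizer')   # Insert at index 1
bha_tools[0] = 'PDC Bit'            # Modify an element in place
bha_tools.remove('MWD')             # Remove the first matching value

print(bha_tools)       # ['PDC Bit', 'Stabilizer', 'Mud Motor', 'Rotary Table']
print(bha_tools[-1])   # Last item: Rotary Table
print(bha_tools[1:3])  # Slice: ['Stabilizer', 'Mud Motor']
```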

### Dictionaries
**Definition:** A dictionary is a mutable collection of key-value pairs. Keys must be unique and hashable (for example, strings, numbers, or tuples), and since Python 3.7 dictionaries preserve insertion order.

**Key Features:**
- Key-value pairs: Allows efficient lookups.
- Keys must be unique: No duplicate keys are allowed.
- Mutable: You can update values or add new key-value pairs.

**When to Use?**
- When you need fast lookups based on unique keys.
- When you need to store related attributes (e.g., drilling parameters per well).
- When you need flexible and structured data storage.

In [None]:
# Dictionaries
drilling_data = {
    'Depth': 5000,
    'ROP': 50,
    'Mud Weight': 10.5
}
print(drilling_data['Depth'])  # Accessing dictionary value
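
Updating values, adding keys, and looking up a key that may be missing are common next steps. The sketch below uses hypothetical parameter values.

```python
# Hypothetical parameter values for illustration
well_params = {'Depth': 5000, 'ROP': 50, 'Mud Weight': 10.5}

well_params['ROP'] = 65   # Update an existing value
well_params['WOB'] = 25   # Add a new key-value pair

# .get() returns a default instead of raising KeyError for a missing key
torque = well_params.get('Torque', 'not recorded')
print(torque)                     # not recorded

print('WOB' in well_params)       # True (membership tests check keys)
print(list(well_params.keys()))   # ['Depth', 'ROP', 'Mud Weight', 'WOB']
```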

### Tuples
**Definition:** A tuple is an ordered, immutable collection that allows duplicate elements.

**Key Features:**
- Ordered: Elements maintain their order.
- Immutable: Cannot be changed after creation.
- Allows Duplicates: Multiple identical elements are allowed.

**When to Use?**
- When you need a fixed collection that should not change.
- When a small speed or memory advantage matters (tuples are typically slightly lighter than lists).
- When you need a dictionary key (tuples are hashable as long as all of their elements are).

In [None]:
# Tuples
drilling_parameters = (5000, 50, 10.5)  # A tuple: an immutable sequence (not a list)
print(drilling_parameters[0])  # Access first item
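
A short sketch of unpacking, immutability, and using a tuple as a dictionary key. The survey values and well names are hypothetical.

```python
# Hypothetical survey station: (depth, inclination, azimuth)
survey_station = (5000, 12.5, 145.0)

# Unpack the tuple into named variables
depth, inc, azi = survey_station
print(depth, inc, azi)   # 5000 12.5 145.0

# Attempting to modify a tuple raises TypeError
try:
    survey_station[0] = 5100
except TypeError as err:
    print(f'Cannot modify a tuple: {err}')

# Tuples of hashable items can serve as dictionary keys
rop_by_well_section = {('Well A', 'Lateral'): 85, ('Well A', 'Vertical'): 120}
print(rop_by_well_section[('Well A', 'Lateral')])   # 85
```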

### Sets
**Definition:** A set is an unordered, mutable collection that only stores unique elements.

**Key Features:**
- Unordered: No guaranteed element order.
- Mutable (but only for adding/removing elements).
- No Duplicates: Automatically removes duplicates.

**When to Use?**
- When you need to store unique values only (e.g., unique well names).
- When you need fast membership testing (the `in` operator is fast on sets).
- When performing set operations (union, intersection, difference).

In [None]:
# Sets
drilling_tools = {'Bit', 'Mud Motor', 'MWD', 'Rotary Table'}
print(drilling_tools)  # Unique elements
# Step 1: Adding an element to a set
drilling_tools.add('Casing')
print('Step 1 Result:')
print(drilling_tools)
# Step 2: Removing an element from a set
drilling_tools.remove('Bit')
print('Step 2 Result:')
print(drilling_tools)
# Step 3: Check if an element exists in a set
print('Step 3 Result:')
print('Bit' in drilling_tools)  # Returns False
# Iterating through a set
for tool in drilling_tools:
    print(tool)
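
The set operations mentioned above (union, intersection, difference) can be sketched as follows, using two hypothetical bottom-hole assembly tool sets.

```python
# Hypothetical tool sets for two bottom-hole assemblies
bha_1 = {'Bit', 'Mud Motor', 'MWD'}
bha_2 = {'Bit', 'RSS', 'MWD', 'LWD'}

common = bha_1 & bha_2       # Intersection: tools in both
all_tools = bha_1 | bha_2    # Union: tools in either
only_bha_1 = bha_1 - bha_2   # Difference: tools in bha_1 but not in bha_2

# sorted() gives a stable display order, since sets are unordered
print(sorted(common))        # ['Bit', 'MWD']
print(sorted(all_tools))     # ['Bit', 'LWD', 'MWD', 'Mud Motor', 'RSS']
print(sorted(only_bha_1))    # ['Mud Motor']
```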

### List Comprehension

In [None]:
# List Comprehensions
# Create a list of drilling tools with 'Drill' prefix
drilling_tools = ['Bit', 'Mud Motor', 'MWD', 'Rotary Table']
drilling_tools_with_prefix = [f'Drill {tool}' for tool in drilling_tools]
print(drilling_tools_with_prefix)  # ['Drill Bit', 'Drill Mud Motor', 'Drill MWD', 'Drill Rotary Table']
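
A comprehension can also filter items with an `if` clause. The sketch below assumes hypothetical ROP readings in ft/hr, where 0 means the bit was not drilling.

```python
# Hypothetical ROP readings in ft/hr (0 = not drilling)
rop_readings = [45, 120, 0, 88, 150, 0, 95]

# Keep only readings where the bit was actually drilling
drilling_rop = [rop for rop in rop_readings if rop > 0]
print(drilling_rop)   # [45, 120, 88, 150, 95]

# Filter and transform in one expression: convert ft/hr to m/hr
rop_m_per_hr = [round(rop * 0.3048, 1) for rop in rop_readings if rop > 0]
print(rop_m_per_hr)   # [13.7, 36.6, 26.8, 45.7, 29.0]
```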

### Dictionary Comprehension

In [None]:
# Dictionary Comprehensions
# Define the drilling tools and their outer diameters (ODs)
drilling_tools = ['Bit', 'Mud Motor', 'MWD', 'stabilizer']
tool_od_list = [12.25, 8.5, 8.5, 11.75]  # Outer diameter list
# tool_length_list = [1.75, 26.3, 28.5, 7.45]  # Length list
# Create a dictionary with drilling tools and their ODs
drilling_tool_dict = {tool: od for tool, od in zip(drilling_tools, tool_od_list)}
print(drilling_tool_dict)  # {'Bit': 12.25, 'Mud Motor': 8.5, 'MWD': 8.5, 'stabilizer': 11.75}

# Get the OD of the MWD from the dictionary:
mwd_od = drilling_tool_dict.get('MWD')
print(mwd_od)  # 8.5


In [None]:
# F-strings & for loops:
tool_type = 'Bit'
print(f'The {tool_type} has an outer diameter of {drilling_tool_dict[tool_type]} inches.')

# Iterate through the dictionary and print each tool's name and outer diameter
for name, value in drilling_tool_dict.items():
    print(f'The {name} has an outer diameter of {value} inches.')
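
F-strings also accept format specifiers after a colon, which helps when numbers need a fixed number of decimals or aligned columns. This is a small sketch with hypothetical values.

```python
# Hypothetical tool ODs in inches
tool_ods = {'Bit': 12.25, 'Mud Motor': 8.5}

for name, od in tool_ods.items():
    # <10 left-aligns the name in 10 characters; 6.3f shows 3 decimals in a width of 6
    print(f'{name:<10} OD: {od:6.3f} in')

depth = 12345.678
depth_label = f'Depth: {depth:,.1f} ft'   # Thousands separator, 1 decimal
print(depth_label)                        # Depth: 12,345.7 ft
```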

## Working with DataFrames
We'll use pandas to create and manipulate DataFrames.

In [None]:
import pandas as pd

# Creating a simple DataFrame
data = {'Depth': [1050, 1100, 1150, 1200], 'ROP': [323, 350, 355, 385], 'WOB': [42, 43, 48, 50], 'RPM': [120, 120, 120, 120], 
        'DIFF': [458, 473, 491, 526]}
df = pd.DataFrame(data)
print(df)
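
Once a DataFrame exists, typical first steps are selecting columns and filtering rows. The sketch below rebuilds a smaller copy of the example data under the assumed name `df_demo` so it runs on its own.

```python
import pandas as pd

# Same style of data as above, under a separate (assumed) name
df_demo = pd.DataFrame({'Depth': [1050, 1100, 1150, 1200],
                        'ROP': [323, 350, 355, 385],
                        'WOB': [42, 43, 48, 50]})

print(df_demo['ROP'])              # Select one column (returns a Series)
print(df_demo[['Depth', 'ROP']])   # Select several columns (returns a DataFrame)

# Boolean filter: keep rows where ROP exceeds 350
fast_rows = df_demo[df_demo['ROP'] > 350]
print(fast_rows)

print(df_demo['ROP'].mean())       # 353.25
```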

## Uploading Data Files
We'll demonstrate how to upload CSV, Excel, and LAS files.

In [None]:
# Get my current path:
import os
import pandas as pd

# Get the current working directory (where the notebook or script is running)
current_path = os.getcwd()
print(current_path)  # Print the current path

# Build the file path with os.path.join so it works on any operating system
file_path = os.path.join(current_path, '16A_78-32_time_data_10s_intervals_standard.csv')
print(file_path)

# Load the CSV into a DataFrame
forge_16A_df = pd.read_csv(file_path)


## Rapid Dataset Reviews
Let's start by taking a look at the format of the data pulled in from the CSV.

In [None]:
print(forge_16A_df.shape)  # Display the shape of the DataFrame
print(f'Row Count: {forge_16A_df.shape[0]}; Column Count: {forge_16A_df.shape[1]}')  # Display row and column count
print(forge_16A_df.columns)  # Display the column names
print(forge_16A_df.head(15))  # Display the first 15 rows


Now let's take a look at the data types and non-null counts for each column.

In [None]:
forge_16A_df.info(max_cols=None)  # Display DataFrame info (info() prints directly and returns None)

Several columns are entirely (or almost entirely) null. Let's remove them to focus our efforts.

In [None]:
# Drop columns with fewer than 2 non-null values (thresh=2 keeps columns with at least 2)
forge_16A_df = forge_16A_df.dropna(axis=1, thresh=2)
forge_16A_df.info(max_cols=None)
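
To see what `thresh` does, here is a minimal sketch on a toy DataFrame. The column names are hypothetical and are not taken from the FORGE file.

```python
import numpy as np
import pandas as pd

# Toy data with hypothetical column names
demo = pd.DataFrame({
    'bit_depth': [100.0, 110.0, 120.0],
    'empty_channel': [np.nan, np.nan, np.nan],
    'mostly_empty': [np.nan, 5.0, np.nan],
})

# thresh=2 keeps only columns with at least 2 non-null values
cleaned = demo.dropna(axis=1, thresh=2)
print(list(cleaned.columns))     # ['bit_depth']

# how='all' drops only columns where every value is null
all_null_dropped = demo.dropna(axis=1, how='all')
print(list(all_null_dropped.columns))   # ['bit_depth', 'mostly_empty']
```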

## Run Pandas Profiling Report to Explore the Data Further
First, let's define the data type in each column.

In [None]:
# Set the columns with 'Date' in the header to datetime
for col in forge_16A_df.columns:
    if 'Date' in col:
        forge_16A_df[col] = pd.to_datetime(forge_16A_df[col], errors='coerce')

# Set all other columns to float
for col in forge_16A_df.columns:
    if 'Date' not in col:
        forge_16A_df[col] = pd.to_numeric(forge_16A_df[col], errors='coerce')
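
With `errors='coerce'`, values that cannot be parsed become `NaN` (or `NaT` for dates) instead of raising an error, so it is worth counting how many values were lost. A minimal sketch with made-up strings:

```python
import pandas as pd

# Made-up raw values, as they might arrive from a text file
raw_values = pd.Series(['120.5', '98', 'N/A', 'sensor error'])
numeric = pd.to_numeric(raw_values, errors='coerce')
print(numeric)
print(numeric.isna().sum())   # 2 values could not be parsed

raw_dates = pd.Series(['2021-01-01 00:00:10', 'not a date'])
dates = pd.to_datetime(raw_dates, errors='coerce')
print(dates.isna().sum())     # 1 value could not be parsed
```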

Now, let's generate a profile report using the ydata-profiling library (formerly pandas-profiling).

In [None]:
from ydata_profiling import ProfileReport
profile = ProfileReport(forge_16A_df, title="Forge 16A Data Analysis", explorative=True)
profile.to_notebook_iframe()
# Save the profile report to an HTML file
profile.to_file(output_file="forge_16A_report.html")

We're ready to start our analysis.

Let's wrap our heads around the dataset by visualizing a common days vs. depth (DvD) curve using Matplotlib.

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns

print(forge_16A_df.columns)  # Check the columns in the DataFrame
print(f'Row count: {forge_16A_df.shape[0]}')  # Check the number of rows

# Rename column headers to short snake_case names
forge_16A_df.rename(columns={
    'Date': 'rig_time',
    'Bit Depth': 'bit_depth',
    'Bit Diameter': 'bit_size',
    'Top Drive Revolutions per Minute': 'surf_rpm',
    'Bit Revolutions per Minute': 'bit_rpm',
    'Mechanical Specific Energy - Surface': 'mse_surface',
    'Mechanical Specific Energy - Downhole': 'mse_downhole',
    'Weight on Bit': 'wob',
    'Differential Pressure': 'diff_press',
    'Block Position': 'block_height',
    'Rate of Penetration (Depth/Hour)': 'rop',
    'Depth Hole Total Vertical Depth': 'tvd',
    'Inclination': 'inc',
    'Azimuth': 'azi',
}, inplace=True)
df = forge_16A_df[['rig_time', 'tvd', 'bit_depth', 'rop', 'wob', 'diff_press', 'surf_rpm', 'bit_rpm', 'mse_surface', 'mse_downhole']].copy()
print(df.head(10))

# Drop the first row, then keep every 12th row to thin the data
df = df.drop(index=0)
df = df.iloc[::12, :].copy()  # .copy() avoids SettingWithCopyWarning on later edits

# Keep rig_time as datetime (so Matplotlib draws a proper time axis) and bit_depth as float
df['rig_time'] = pd.to_datetime(df['rig_time'], errors='coerce')
df['bit_depth'] = df['bit_depth'].astype(float)

# Drop rows where rig_time or bit_depth is null, then sort by time
df = df.dropna(subset=['rig_time', 'bit_depth']).sort_values(by='rig_time')

print(f'Reduced row count: {df.shape[0]}')  # Check the number of rows after reduction

# Plot bit depth against rig time, with depth increasing downward
plt.figure(figsize=(10, 6))
plt.plot(df['rig_time'], df['bit_depth'], label='Bit Depth', color='blue')
plt.gca().invert_yaxis()  # Invert the y-axis
plt.xlabel('Rig Time')
plt.ylabel('Bit Depth')
plt.title('DvD Curve')
plt.legend()
plt.show()

In [None]:
# Uploading an Excel file
df_excel = pd.read_excel('sample_data.xlsx')  # Replace with your file
df_excel.head()

**Handling LAS Files:**
To work with LAS files, we'll use the `lasio` library.

In [None]:
import lasio

# Reading a LAS file
las = lasio.read('sample.las')  # Replace with your LAS file
print(las.keys())

## Data Transformation & KPI Calculation
Now, let's transform data and calculate key performance indicators (KPIs).

In [None]:
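# Creating a calculated column (placeholder "MSE" example)
df = pd.DataFrame(data)  # Rebuild the small example DataFrame; df was reassigned to the FORGE subset above
df['MSE'] = df['Depth'] * df['ROP'] * 1.2  # Placeholder arithmetic, not a physical MSE formula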
df.head()
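
The column arithmetic above is only illustrative. For comparison, here is a hedged sketch of the widely used Teale form of mechanical specific energy in field units, MSE = WOB/A + 120·π·N·T/(A·ROP). It needs a torque value, which the small example DataFrame does not contain, so the function name `mse_teale` and all input values below are assumptions for illustration only.

```python
import math

def mse_teale(wob_lbf, rpm, torque_ftlbf, rop_ft_hr, bit_diameter_in):
    """Teale mechanical specific energy in psi (field units).

    Illustrative helper, not part of the course dataset workflow.
    Assumes rop_ft_hr > 0.
    """
    bit_area = math.pi * bit_diameter_in ** 2 / 4   # Bit area in square inches
    thrust_term = wob_lbf / bit_area
    rotary_term = (120 * math.pi * rpm * torque_ftlbf) / (bit_area * rop_ft_hr)
    return thrust_term + rotary_term

# Hypothetical inputs: 42,000 lbf WOB, 120 rpm, 8,000 ft-lbf torque, 323 ft/hr, 8.75 in bit
mse = mse_teale(wob_lbf=42_000, rpm=120, torque_ftlbf=8_000,
                rop_ft_hr=323, bit_diameter_in=8.75)
print(f'MSE: {mse:,.0f} psi')
```

In practice the rotary term usually dominates, which is why torque quality matters more than WOB accuracy when comparing MSE trends.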

In [None]:
# Grouping Data
df_grouped = df.groupby('Depth').mean()
df_grouped
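
Grouping on raw depth values gives one group per depth, which is rarely useful on real data. A common alternative is to bin depths into intervals first. The sketch below uses hypothetical 100 ft bin edges and the assumed name `demo`.

```python
import pandas as pd

demo = pd.DataFrame({'Depth': [1050, 1100, 1150, 1200],
                     'ROP': [323, 350, 355, 385]})

# Label each row with a depth interval (hypothetical bin edges, right edge inclusive)
demo['interval'] = pd.cut(demo['Depth'], bins=[1000, 1100, 1200])

# Several aggregations per interval in one call
summary = demo.groupby('interval', observed=True)['ROP'].agg(['mean', 'max', 'count'])
print(summary)
```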

## Merging and Concatenating DataFrames
We often need to merge multiple datasets.

In [None]:
# Merging two DataFrames
# Depth values must match those in df, because the default inner merge keeps only matching keys
df2 = pd.DataFrame({'Depth': [1050, 1100, 1150], 'Mud Weight': [10.5, 10.8, 11]})
df_merged = pd.merge(df, df2, on='Depth')
df_merged.head()
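
The merge type controls which rows survive, and concatenation stacks DataFrames rather than joining them on a key. A minimal sketch with hypothetical data and assumed names (`drilling`, `mud`, `more`):

```python
import pandas as pd

drilling = pd.DataFrame({'Depth': [1050, 1100, 1150, 1200],
                         'ROP': [323, 350, 355, 385]})
mud = pd.DataFrame({'Depth': [1050, 1100, 1300],
                    'Mud Weight': [10.5, 10.8, 11.0]})

# Inner merge (the default): only depths present in both DataFrames
inner = pd.merge(drilling, mud, on='Depth')
print(len(inner))   # 2

# Left merge: keep every drilling row; unmatched mud weights become NaN.
# indicator=True adds a _merge column showing where each row came from.
left = pd.merge(drilling, mud, on='Depth', how='left', indicator=True)
print(left)

# Concatenation: stack rows from DataFrames that share the same columns
more = pd.DataFrame({'Depth': [1250], 'ROP': [390]})
combined = pd.concat([drilling, more], ignore_index=True)
print(len(combined))   # 5
```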

## Data Visualization with Matplotlib & Seaborn

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns

# Bar Chart Example
plt.figure(figsize=(8,4))
sns.barplot(x='Depth', y='ROP', data=df)
plt.title('ROP vs Depth')
plt.show()

In [None]:
# Scatterplot Example
plt.figure(figsize=(8,4))
sns.scatterplot(x='Depth', y='MSE', data=df)
plt.title('MSE vs Depth')
plt.show()

## Final Exercise
Try the following:
1. Create a new DataFrame with Well Name, Depth, and ROP.
2. Upload a CSV file and explore the data.
3. Merge two DataFrames with a common column.
4. Create a scatterplot of Depth vs ROP.

**Congratulations on completing Module 1!** 🎉