<img src="https://raw.githubusercontent.com/johnryan417/python-for-drilling-engineers/main/main/assets/animo_signature_hallow.png" alt="Animo Logo" width="400">

# Welcome to Python for Drilling Engineers!
## Today's Agenda
- Intros & Course Overview
- Jupyter Notebooks & Python
- Understand Python's Data Structures
- Introduction to DataFrames
- First look at FORGE 16A Dataset

## Introduction
Welcome to the Python for Drilling Engineers course! In this module, we'll cover the basics of Python and data manipulation for drilling-related applications.

**Why Python?**
- Open-source and widely used in data science
- Easy to learn - clean and readable syntax
- Great for automating repetitive tasks
- Strong ecosystem for data analysis, ML/AI, and visualization

**Why does this course matter?**
- Reclaim your time: automate data QC, KPIs, reports, and custom calcs
- Bring your ideas to life!
- Build a personal brand as a forward-thinking operational leader
- Make data-driven decisions

# Course Overview

- Session 1: Jupyter Notebooks, Python Basics & Data Structures, Intro to Pandas
- Session 2: Pandas Deep-Dive
- Session 3: Data QC Methods
- **Mid-Course Break**
- Session 4: Unsupervised Learning
- Session 5: Supervised Learning
- Session 6: Advanced ML Models

**Key to Success** - Practice! Practice! Practice!

**What's one current challenge you're facing that you want to be able to tackle by the end of this course?**

-----------------------------------------------------------------------------------------------------------------------------------------------

**Identify Unique BHA Runs on DVD Curves**

<img src="https://raw.githubusercontent.com/johnryan417/python-for-drilling-engineers/main/main/assets/DVD_run_numbers.png" alt="DVD Runs" width="600">

**Rapid Review of Drilling Parameters**

<img src="https://raw.githubusercontent.com/johnryan417/python-for-drilling-engineers/main/main/assets/param_plots_forge_16_a.png" alt="Param Plots" width="600">


<img src="https://raw.githubusercontent.com/johnryan417/python-for-drilling-engineers/main/main/assets/wob_doc_crossplot.png" alt="Cross Plots" width="600">


**Calculate KPIs**

<img src="https://raw.githubusercontent.com/johnryan417/python-for-drilling-engineers/main/main/assets/param_kpis.png" alt="KPIs" width="600">


**Optimize Params Automatically**

<img src="https://raw.githubusercontent.com/johnryan417/python-for-drilling-engineers/main/main/assets/rop_wob_heat.png" alt="heat" width="600">


**Generate Parameter Road Maps**

<img src="https://raw.githubusercontent.com/johnryan417/python-for-drilling-engineers/main/main/assets/drl_roadmap.jpg" alt="DRM" width="600">


**Unsupervised Learning - Common Trend Identification**

<img src="https://raw.githubusercontent.com/johnryan417/python-for-drilling-engineers/main/main/assets/k_means_clusters.png" alt="KMeans" width="800">


**Supervised Learning - Rig States**

<img src="https://raw.githubusercontent.com/johnryan417/python-for-drilling-engineers/main/main/assets/slide_rot_supervised.png" alt="Rig States" width="600">

----------------------------------------------------------------------------------------------------------------------------------------------------------------

# Python for Drilling Engineers - Module 1

## 1 - Comfort with Jupyter Notebooks

**How to access Jupyter Notebooks in Google Colab environment:**
1. Navigate to https://github.com/johnryan417/python-for-drilling-engineers
2. Find the Jupyter notebook you want to load and open the file.
3. In the URL, type `tocolab` after `github` and before `.com`.
   1. Ex. https://githubtocolab.com/johnryan417/python-for-drilling-engineers/blob/main/module_1/module_1_cohort_2.ipynb
4. In Colab you can also create new Jupyter notebooks and save directly in Google Drive.

## 2 - Some Python Pre-Reqs
Let's get some basics out of the way first.

### 2.1 - Python is an "Interpreted Language"
- This means Python executes code in the order it is given.
- In a regular `.py` file, it runs **line-by-line** from top to bottom.
- In a **Jupyter Notebook**, each code block can be run independently - but it still depends on the order in which you run them.

In [None]:
# Code Block 1
mud_weight = 10.5

In [None]:
# Code Block 2
print(mud_weight)
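To see why run order matters, here is a minimal sketch that simulates running a "use" cell before its "define" cell in a fresh session. The variable name `pore_pressure` and its value are hypothetical and only there for illustration.

```python
# Simulate running "Code Block 2" before "Code Block 1" in a fresh session.
try:
    print(pore_pressure)  # hypothetical variable, not defined yet
except NameError as err:
    error_message = str(err)
    print(f"NameError: {error_message}")

# Now run the "definition" step. After this, the variable is available.
pore_pressure = 9.2  # illustrative value
print(pore_pressure)
```

If you ever hit a `NameError` in a notebook, the first thing to check is whether the cell that defines the variable has actually been run.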

### 2.2 - Pip Install & Imports

`pip` is Python’s package manager — it helps you install external tools and libraries that aren’t built into Python by default.

**Example**
```bash
pip install pandas
```
**Note:** You only need to `pip install` once per environment or machine.

Once a package is installed, you still need to import it into your Python script or notebook.

**Example**

```python
import pandas as pd
```

This line of code:
- Loads the pandas library into your script
- Gives it the name `pd` (a common convention)
- Lets you access its functions, for example `pd.read_csv()` to load a CSV file

**Note:** You must import in every notebook/script where you want to use the package.

In [79]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import sklearn

df = pd.DataFrame({'Depth': [1000, 2000, 3000], 'ROP': [50, 45, 60]})
print(df)

   Depth  ROP
0   1000   50
1   2000   45
2   3000   60
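The `import ... as ...` pattern is not specific to pandas. As a minimal sketch, the block below applies it to Python's built-in `statistics` module, so it runs without installing anything. The alias `st` and the ROP values are assumptions chosen for this example. In a notebook cell, installs are typically run as `%pip install pandas` (or `!pip install pandas`).

```python
import statistics as st  # "st" is an arbitrary alias chosen for this example

rop_values = [50, 45, 60]  # illustrative ROP readings

# Functions are accessed through the alias, just like pd.read_csv()
avg_rop = st.mean(rop_values)
print(avg_rop)
```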


## 3 - Python Basics: Built-in Data Structures
Before working with datasets, let's cover fundamental Python data structures.

**Summary Table: Python Data Structures**

| Data Structure | Ordered? | Mutable? | Duplicates Allowed? | Best Use Case |
|---------------|---------|----------|------------------|--------------|
| **List** (`list`) | ✅ Yes | ✅ Yes | ✅ Yes | General-purpose, ordered data storage |
| **Dictionary** (`dict`) | ✅ Yes (insertion order, Python 3.7+) | ✅ Yes | ❌ No (keys must be unique) | Key-value lookups, structured data |
| **Tuple** (`tuple`) | ✅ Yes | ❌ No | ✅ Yes | Immutable, fixed collections |
| **Set** (`set`) | ❌ No | ✅ Yes (elements can be added/removed) | ❌ No | Unique element storage, set operations |

### 3.1 - Lists
**Definition:**
A list is an ordered, mutable (modifiable) collection that allows duplicate elements.

**Key Features:**
- Ordered: Elements maintain the order in which they were added.
- Mutable: Elements can be changed, added, or removed.
- Allows Duplicates: Multiple elements with the same value are allowed.

**When to Use?**
- When you need an ordered collection of items.
- When frequent updates (insertion/deletion/modification) are needed.
- When you want to store heterogeneous data (e.g., ["Drill Bit", 10, 45.7]).

In [None]:
# Lists
bha_components_list = ['Bit', 'Mud Motor', 'MWD', 'Stabilizer']
print(bha_components_list[0])  # Access first item
print(len(bha_components_list))  # Number of elements

Bit
4
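The "mutable" and "heterogeneous" points above can be sketched as follows. The casing sizes are hypothetical values used only to have something to modify.

```python
# Hypothetical casing string sizes (inches), used only for illustration
casing_sizes = [20, 13.375, 9.625, 7]

casing_sizes[0] = 18.625        # Mutable: replace an element in place
casing_sizes.insert(1, 16)      # Insert a value at index 1
casing_sizes.remove(7)          # Remove the first occurrence of a value

print(casing_sizes)             # [18.625, 16, 13.375, 9.625]
print(casing_sizes[-1])         # Negative index: last item
print(13.375 in casing_sizes)   # Membership test

mixed = ['Drill Bit', 10, 45.7]  # Lists can hold mixed types
print([type(item).__name__ for item in mixed])
```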


**Now You Try**

1. Return the 3rd and 4th items from `bha_components_list`.
2. Add 'Drill Collar' to the list using the `.append()` method.

In [None]:
# Your code here

MWD
['Bit', 'Mud Motor', 'MWD', 'Stabilizer', 'Drill Collar']


### 3.2 - Dictionaries
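**Definition:** A dictionary is a mutable collection of key-value pairs in which keys must be unique and hashable (for example, strings, numbers, or tuples). Since Python 3.7, dictionaries preserve insertion order.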

**Key Features:**
- Key-value pairs: Allows efficient lookups.
- Keys must be unique: No duplicate keys are allowed.
- Mutable: You can update values or add new key-value pairs.

**When to Use?**
- When you need fast lookups based on unique keys.
- When you need to store related attributes (e.g., drilling parameters per well).
- When you need flexible and structured data storage.

In [86]:
# Dictionaries
motor_specs_dict = {
    'model': 'A675XP',
    'lobes': '7:8',
    'stages': 5.0,
    'bit_bend': 6.03,  # ft
    'speed': 0.28,  # rpg
    'max_diff': 1050,  # psi
    'max_torque': 9200,  # ft-lb
}
print(motor_specs_dict['model'])  # Accessing dictionary value

A675XP
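Beyond reading a value, the most common dictionary operations are adding, updating, and safely looking up keys. Here is a minimal sketch using a hypothetical `bit_specs` record (the keys and values are made up for illustration):

```python
# Hypothetical bit record, for illustration only
bit_specs = {'type': 'PDC', 'size_in': 8.5}

bit_specs['blades'] = 6        # Add a new key-value pair
bit_specs['size_in'] = 8.75    # Update an existing value

# .get() returns a default instead of raising KeyError for a missing key
tfa = bit_specs.get('tfa', 'not recorded')

print(list(bit_specs.keys()))  # Keys come back in insertion order
print('blades' in bit_specs)   # Membership test checks the keys
print(tfa)
```

Using `.get()` with a default is handy when some records are missing a field and you don't want the whole script to stop.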


**Now You Try**

1. Use `motor_specs_dict` to return the motor speed.
2. Calculate the TQ/Diff ratio using the dictionary.

In [87]:
# Your code here
tq_diff_ratio = motor_specs_dict['max_torque'] / motor_specs_dict['max_diff']
print(tq_diff_ratio)  # Torque to differential pressure ratio

8.761904761904763


### 3.3 - Tuples
**Definition:** A tuple is an ordered, immutable collection that allows duplicate elements.

**Key Features:**
- Ordered: Elements maintain their order.
- Immutable: Cannot be changed after creation.
- Allows Duplicates: Multiple identical elements are allowed.

**When to Use?**
- When you need a fixed collection that should not change.
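- When you want a lightweight container (tuples use slightly less memory than lists and are typically a little faster to create).
- When you need a dictionary key made of several values (tuples of hashable items are themselves hashable; lists are not).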

In [None]:
# Tuples
drilling_parameters = (5000, 50, 10.5)  # A tuple: ordered like a list, but immutable
print(drilling_parameters[0])  # Access first item
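The three "When to Use?" points can be sketched in a few lines. The survey values and the well/formation names below are hypothetical.

```python
survey_station = (5000, 12.5, 270.0)  # hypothetical (md, inc, azi) values

# Unpacking assigns each element to its own variable
md, inc, azi = survey_station
print(md, inc, azi)

# Attempting to modify a tuple raises a TypeError
try:
    survey_station[0] = 5100
except TypeError as err:
    tuple_error = str(err)
    print(f"TypeError: {tuple_error}")

# Tuples of hashable values can be used as dictionary keys
formation_tops = {('Well A', 'Granite'): 4800}  # hypothetical names and depth
print(formation_tops[('Well A', 'Granite')])
```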

### 3.4 - Sets
**Definition:** A set is an unordered, mutable collection that only stores unique elements.

**Key Features:**
- Unordered: No guaranteed element order.
- Mutable: Elements can be added or removed, but not changed in place.
- No Duplicates: Automatically removes duplicates.

**When to Use?**
- When you need to store unique values only (e.g., unique well names).
- When you need fast membership testing (the `in` operator is fast on sets).
- When performing set operations (union, intersection, difference).
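The set operations listed above (union, intersection, difference) can be sketched as follows. The two tool sets and the well names are hypothetical.

```python
# Hypothetical tool sets from two BHA runs
run_1_tools = {'Bit', 'Mud Motor', 'MWD'}
run_2_tools = {'Bit', 'RSS', 'MWD', 'Stabilizer'}

all_tools = run_1_tools | run_2_tools      # Union: in either run
common_tools = run_1_tools & run_2_tools   # Intersection: in both runs
only_run_1 = run_1_tools - run_2_tools     # Difference: in run 1 but not run 2

# sorted() returns a list in a predictable order for printing
print(sorted(all_tools))
print(sorted(common_tools))
print(sorted(only_run_1))

# Converting a list to a set drops duplicates
well_names = ['Well A', 'Well B', 'Well A']  # hypothetical names
unique_wells = set(well_names)
print(len(unique_wells))
```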

In [None]:
# Sets
drilling_tools = {'Bit', 'Mud Motor', 'MWD', 'Rotary Table'}
print(drilling_tools)  # Unique elements
# Step 1: Adding an element to a set
drilling_tools.add('Casing')
print('Step 1 Result:')
print(drilling_tools)
# Step 2: Removing an element from a set
drilling_tools.remove('Bit')
print('Step 2 Result:')
print(drilling_tools)
# Step 3: Check if an element exists in a set
print('Step 3 Result:')
print('Bit' in drilling_tools)  # Returns False
# Iterating through a set
for tool in drilling_tools:
    print(tool)

# List Comprehensions
# Create a list of drilling tools with 'Drill' prefix
drilling_tools = ['Bit', 'Mud Motor', 'MWD', 'Rotary Table']
drilling_tools_with_prefix = [f'Drill {tool}' for tool in drilling_tools]
print(drilling_tools_with_prefix)  # ['Drill Bit', 'Drill Mud Motor', 'Drill MWD', 'Drill Rotary Table']
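A list comprehension can also filter while it builds the new list. Here is a minimal sketch with illustrative ROP values, shown next to the equivalent `for` loop so you can compare the two forms:

```python
rop_readings = [323, 350, 355, 385]  # illustrative values

# Comprehension with a condition: keep only readings above a threshold
fast_rop = [rop for rop in rop_readings if rop > 340]

# The equivalent for loop
fast_rop_loop = []
for rop in rop_readings:
    if rop > 340:
        fast_rop_loop.append(rop)

print(fast_rop)
print(fast_rop == fast_rop_loop)
```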

### 3.5 - Dictionary Comprehension

In [90]:
drilling_tools = ['Bit', 'Mud Motor', 'MWD', 'stabilizer']  # list of drilling tools
tool_od_list = [12.25, 9.625, 8.5, 8.5]  # Outer diameter list

# Create a dictionary with drilling tools and their ODs
drilling_tool_dict = {tool: od for tool, od in zip(drilling_tools, tool_od_list)}
print(drilling_tool_dict)

# Get the OD of the MWD from the dictionary:
mwd_od = drilling_tool_dict.get('MWD')
print(mwd_od)  # 8.5


{'Bit': 12.25, 'Mud Motor': 9.625, 'MWD': 8.5, 'stabilizer': 8.5}
8.5
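Dictionary comprehensions can transform values or filter entries as well. A minimal sketch, assuming a small tool/OD dictionary like the one above (the 9-inch cutoff is arbitrary):

```python
tool_od_in = {'Bit': 12.25, 'Mud Motor': 9.625, 'MWD': 8.5}  # ODs in inches

# Transform the values: inches to millimetres (1 in = 25.4 mm)
tool_od_mm = {tool: od * 25.4 for tool, od in tool_od_in.items()}

# Filter the entries: keep only tools with an OD above 9 inches
large_tools = {tool: od for tool, od in tool_od_in.items() if od > 9}

print(tool_od_mm)
print(large_tools)
```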


**f-strings**

f-strings let you embed variables and expressions directly inside a string using `{}`. They make printed output easier to read and are handy for debugging.

In [91]:
# F-strings:
tool_type = 'Bit'
print(f'The {tool_type} has an OD of {drilling_tool_dict[tool_type]} inches.')

The Bit has an OD of 12.25 inches.
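f-strings also accept format specifiers after a colon, which helps when printing long decimals or large numbers. A short sketch with illustrative values:

```python
depth_ft = 12345.678   # illustrative value
ratio = 8.761904761904763

depth_text = f'Depth: {depth_ft:,.1f} ft'    # Thousands separator, 1 decimal
ratio_text = f'TQ/Diff ratio: {ratio:.2f}'   # 2 decimals

print(depth_text)
print(ratio_text)
print(f'{depth_ft = }')  # Python 3.8+: prints the variable name and its value
```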


**for loops**

`for` loops let you step through the items in a dictionary, list, DataFrame, or other collection and run the same actions or calculations on each one.

In [92]:
# For Loop with Dictionary:
# Iterate through the dictionary and print each tool's name and outer diameter
for key, value in drilling_tool_dict.items():
    print(f'The {key} has an OD of {value} inches.')

The Bit has an OD of 12.25 inches.
The Mud Motor has an OD of 9.625 inches.
The MWD has an OD of 8.5 inches.
The stabilizer has an OD of 8.5 inches.
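Loops are also useful for running totals. The sketch below uses `enumerate()` to number each item while accumulating a cumulative length. The component lengths are hypothetical.

```python
# Hypothetical component lengths in feet
component_lengths = {'Bit': 1.0, 'Mud Motor': 30.5, 'MWD': 28.0}

total_length = 0.0
for position, (component, length) in enumerate(component_lengths.items(), start=1):
    total_length += length  # Running total of BHA length
    print(f'{position}. {component}: {length} ft (cumulative: {total_length} ft)')

print(f'Total length: {total_length} ft')
```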


## 3.6 - Intro to DataFrames
We'll use Pandas to create and manipulate dataframes.

In [93]:
import pandas as pd

# Creating a simple DataFrame
data = {'Depth': [1050, 1100, 1150, 1200], 'ROP': [323, 350, 355, 385], 'WOB': [42, 43, 48, 50], 'RPM': [120, 120, 120, 120], 
        'DIFF': [458, 473, 491, 526]}
df = pd.DataFrame(data)
print(df)

   Depth  ROP  WOB  RPM  DIFF
0   1050  323   42  120   458
1   1100  350   43  120   473
2   1150  355   48  120   491
3   1200  385   50  120   526


**What is a DataFrame?**

Think of it as a Python-powered spreadsheet!

DataFrames are useful because you can:
- Filter data efficiently
- Compute statistics
- Clean messy data
- Import/export from CSV, Excel, databases
- Control everything with Python
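A few of the capabilities listed above can be sketched on a small DataFrame similar to the one created earlier. The 340 threshold and the `ROP_per_WOB` column name are assumptions made for this example.

```python
import pandas as pd

df_demo = pd.DataFrame({'Depth': [1050, 1100, 1150, 1200],
                        'ROP': [323, 350, 355, 385],
                        'WOB': [42, 43, 48, 50]})

# Filter: keep rows where ROP is above a threshold
fast_rows = df_demo[df_demo['ROP'] > 340]
print(fast_rows)

# Compute statistics on a single column
mean_wob = df_demo['WOB'].mean()
print(mean_wob)

# Create a new column from existing ones (hypothetical column name)
df_demo['ROP_per_WOB'] = df_demo['ROP'] / df_demo['WOB']
print(df_demo.describe())
```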

## 3.7 - Uploading Data Sets from CSV

### Uploading **.csv** Data Files from Google Drive

Run the snippet below to import the FORGE data when working in Google Colab.

In [None]:
from google.colab import drive
import pandas as pd
drive.mount('/content/drive')

file_path = '/content/drive/My Drive/python-for-drilling-engineers/module_1/16A_78-32_time_data_10s_intervals_standard.csv'

forge_16A_df = pd.read_csv(file_path)

## 3.8 - Rapid Dataset Reviews

Let's start by taking a look at the format of the data pulled in from the CSV.

In [None]:
print(forge_16A_df.shape)  # Display the shape of the DataFrame (Rows, Columns)
print(f'\n\nRow Count: {forge_16A_df.shape[0]} \nColumn Count: {forge_16A_df.shape[1]}')  # Display row and column count
print(f'\n\nColumn names: \n{list(forge_16A_df.columns)}')  # Display the column names
# print the first 10 rows of the first 6 columns
print('\n\nFirst look at the dataframe:')
forge_16A_df.iloc[:10, :6]


The first row contains unit information. Let's save it as a dictionary for reference, then remove the row from the dataframe.

In [None]:
# Save the first row as a dictionary with the key as the column name and the value as the first row value.
unit_dict = forge_16A_df.iloc[0].to_dict()
print(unit_dict)

print(f'ROP units: {unit_dict["Rate of Penetration (Depth/Hour)"]}')  # Access the ROP units
print(unit_dict['Rate of Penetration (Minute/Depth)'])

# drop first row (units)
forge_16A_df.drop(index=0, inplace=True)  # Drop the first row
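If you want to try the units-row pattern without the full dataset, here is a minimal sketch on a tiny in-memory CSV. The two-column layout and its values are hypothetical stand-ins, not the real FORGE file.

```python
import io
import pandas as pd

# Hypothetical two-column file whose first data row holds the units
csv_text = """Depth,ROP
ft,ft/hr
1050,323
1100,350
"""
demo_df = pd.read_csv(io.StringIO(csv_text))

# Save the units row as a dictionary, then drop it and renumber the rows
units = demo_df.iloc[0].to_dict()
demo_df = demo_df.drop(index=0).reset_index(drop=True)

print(units)
print(demo_df)
print(demo_df.dtypes)  # Columns are still stored as text because of the units row
```

Because the units row mixes text with numbers, the remaining columns still need to be converted to numeric types before any calculations.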

Now let's take a look at the data types and non-null counts for each column using the DataFrame's `.info()` method.

In [None]:
forge_16A_df.info()  # Display DataFrame info

Several columns are empty or nearly empty. Let's remove them to focus our efforts.

In [None]:
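# Keep only columns with at least 2 non-null values (drops empty and near-empty columns)
forge_16A_df = forge_16A_df.dropna(axis=1, thresh=2)
forge_16A_df.info(max_cols=None)  # .info() prints its summary directly and returns None, so no print() is needed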
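To see exactly what `thresh=2` does, here is a minimal sketch on a three-column DataFrame with made-up column names: one full column, one with a single value, and one that is entirely empty.

```python
import numpy as np
import pandas as pd

demo = pd.DataFrame({
    'full': [1.0, 2.0, 3.0],
    'one_value': [np.nan, 5.0, np.nan],
    'empty': [np.nan, np.nan, np.nan],
})

# thresh=2 keeps a column only if it has at least 2 non-null values
kept = demo.dropna(axis=1, thresh=2)
print(list(kept.columns))
```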

In [None]:
# Set the columns with 'Date' in the header to datetime
for col in forge_16A_df.columns:
    if 'Date' in col:
        forge_16A_df[col] = pd.to_datetime(forge_16A_df[col], errors='coerce')

# Set all other columns to float
for col in forge_16A_df.columns:
    if 'Date' not in col:
        forge_16A_df[col] = pd.to_numeric(forge_16A_df[col], errors='coerce')

print(forge_16A_df.head(10))
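The `errors='coerce'` argument is what keeps these conversions from failing on bad values: anything that cannot be parsed becomes `NaN` (or `NaT` for dates). A small sketch with made-up inputs:

```python
import pandas as pd

raw = pd.Series(['120', '95.5', 'N/A', 'bad'])
numeric = pd.to_numeric(raw, errors='coerce')  # Unparseable entries become NaN
print(numeric)
print(numeric.isna().sum())  # Count of values that could not be converted

raw_dates = pd.Series(['2021-01-05 08:00:00', 'not a date'])
dates = pd.to_datetime(raw_dates, errors='coerce')  # Unparseable entries become NaT
print(dates)
```

Counting the nulls before and after a conversion is a quick way to check how many values were silently coerced.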

## 3.9 - Mapping Channels
Let's make a copy of our DataFrame so the original dataset stays intact while we manipulate the data.

In [None]:
df = forge_16A_df.copy()  # Create a copy of the DataFrame
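Why `.copy()` and not just `df = forge_16A_df`? Plain assignment only creates a second name for the same object. A minimal sketch with a made-up one-column DataFrame:

```python
import pandas as pd

original = pd.DataFrame({'wob': [42, 43]})

alias = original           # Same object under a second name, not a copy
backup = original.copy()   # Independent copy of the data

original.loc[0, 'wob'] = 0  # Modify the original

print(alias is original)      # The alias points at the very same object
print(alias.loc[0, 'wob'])    # So it sees the change
print(backup.loc[0, 'wob'])   # The copy is unaffected
```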

Let's clean up the DataFrame a bit:
1. Rename our columns to have a more code-friendly title.
2. Reduce the columns to only what we care about.
3. Set the rig_time column type to datetime.
4. Sort the dataframe by 'rig_time'.

In [None]:
print('Original Columns:')
print(forge_16A_df.columns)  # Check the columns in the DataFrame

### 3.9.1 - Channel Mapper Creation

Use a dictionary to create a channel mapper.

In [None]:
# Define the channel mapper dictionary
channel_mapper_dict = {
    'Date': 'rig_time',
    'Bit Diameter': 'bit_size',
    'Top Drive Revolutions per Minute': 'td_rpm',
    'Bit Revolutions per Minute': 'bit_rpm',
    'Weight on Bit': 'wob',
    'Differential Pressure': 'diff_press',
    'Block Position': 'block_height',
    'Rate of Penetration (Depth/Hour)': 'rop',
    'Depth Hole Total Vertical Depth': 'md',
    'Inclination': 'inc',
    'Azimuth': 'azi',
    'Hookload': 'hookload',
    'Pump Pressure': 'pump_press',
    'Return Flow': 'flow_out',
    'Flow In': 'flow_in',
    'Top Drive Torque': 'td_torque',
    'Gamma Measured while Drilling': 'gamma',
    'Rig Mode': 'rig_mode',
    'On Bottom': 'on_bottom_status',
    'Total Strokes per Minute': 'total_spm'
}

# Rename column headers using the dictionary
df.rename(columns=channel_mapper_dict, inplace=True)

print('Renamed Columns:')
print(df.columns)  # Check the renamed columns
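One thing to watch for: `rename()` silently ignores mapper keys that don't match any column, so a typo in a header name goes unnoticed. A quick check is sketched below using a hypothetical two-column DataFrame and a mapper with a deliberate mismatch.

```python
import pandas as pd

demo = pd.DataFrame({'Weight on Bit': [42], 'Hookload': [180]})  # hypothetical data
mapper = {'Weight on Bit': 'wob', 'Hook Load': 'hookload'}  # 'Hook Load' is a deliberate mismatch

# List mapper keys that are not present in the DataFrame
missing = [name for name in mapper if name not in demo.columns]
print(f'Mapper keys not found: {missing}')

renamed = demo.rename(columns=mapper)  # Unmatched keys are ignored by default
print(list(renamed.columns))
```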

### 3.9.2 - Limit the columns

In [None]:
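# Rearrange and limit the columns in the dataframe
# .copy() makes df an independent DataFrame, which avoids SettingWithCopyWarning on later column assignments
df = df[['rig_time', 'md', 'rop', 'wob', 'diff_press', 'td_rpm', 'td_torque',
         'bit_rpm', 'block_height', 'inc', 'azi', 'bit_size', 'on_bottom_status']].copy()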

In [None]:
# Set the type of rig_time to datetime
df['rig_time'] = pd.to_datetime(df['rig_time'], errors='coerce')  # Convert to datetime
for col in df.columns:
    if col != 'rig_time':
        df[col] = pd.to_numeric(df[col], errors='coerce')  # Convert to numeric

df.sort_values(by='rig_time', inplace=True)  # Sort by rig_time
print('\nFinal Columns:')
print(df.columns)  # Check the columns in the DataFrame

## Plotting Basics

### DVD Plot
We're ready to start our analysis.

Let's wrap our heads around the dataset by visualizing a common DVD curve using **Matplotlib**.

In [None]:
# Import the Matplotlib library for plotting.
import matplotlib.pyplot as plt

# Reduce the frequency of the dataframe to make the plot less heavy
plot_df = df.copy()
plot_df = plot_df.iloc[::240, :]  # Take every 240th row using python slice notation --> start:stop:step
# drop rows where rig_time or md is null
plot_df.dropna(subset=['rig_time', 'md'], inplace=True)
# set rig_time as datetime
# plot_df['rig_time'] = pd.to_datetime(plot_df['rig_time'], errors='coerce')  # Convert to datetime
plot_df['rig_time'] = plot_df['rig_time'].dt.strftime('%Y-%m-%d %H:%M:%S')  # Format datetime

# convert md to numeric
plot_df['md'] = pd.to_numeric(plot_df['md'], errors='coerce')

print(f'Reduced row count: {plot_df.shape[0]}')  # Check the number of rows after reduction

# Ensure plots are displayed in Jupyter Notebook
%matplotlib inline 

# plot line graph x axis = rig_time, y axis = md (measured depth), then invert the y-axis
plt.figure(figsize=(10, 6))
plt.plot(plot_df['rig_time'], plot_df['md'], label='Bit Depth', color='blue')
# Reduce the x-axis ticks to only show every 200th tick
plt.xticks(plot_df['rig_time'][::200], rotation=45)  # Rotate x-axis labels for better readability
plt.gca().invert_yaxis()  # Invert the y-axis
plt.xlabel('Rig Time')  # Set x-label
plt.ylabel('Depth (ft)')  # Set y-label
plt.title('DvD Curve')
plt.show()
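The `start:stop:step` slice notation used to thin out the DataFrame works the same way on plain lists. A minimal sketch with illustrative depths:

```python
depths = list(range(0, 100, 10))  # [0, 10, 20, ..., 90]

every_third = depths[::3]     # Start at the beginning, take every 3rd item
middle_pairs = depths[2:8:2]  # From index 2 up to (not including) 8, step 2
reversed_depths = depths[::-1]  # A negative step walks backwards

print(every_third)       # [0, 30, 60, 90]
print(middle_pairs)      # [20, 40, 60]
print(reversed_depths)
```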

## Practice With Your Dataset
Use the code block below to try the following:
1. Save a .csv file of your data to the python-for-drilling-engineers Google Drive.
2. Upload a CSV file containing drilling parameters for one of your wells (or a single bit run) to a DataFrame.
   - Include Parameters: 'rig_time', 'md', 'bit_depth', 'rop', 'wob', 'td_rpm', 'td_torque', 'diff_press', 'block_height'
3. Prep the DataFrame for analysis.
   1. Create a custom mapper dictionary
   2. Map the column headers to your chosen channel mnemonics & limit the columns in the dataframe.
4. Export and save the revised dataframe as a csv or excel file.
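For the export step, here is a minimal sketch of writing a DataFrame to CSV and reading it back. The temporary folder stands in for your Google Drive folder, and the file name `cleaned_run.csv` is hypothetical.

```python
import os
import tempfile
import pandas as pd

demo = pd.DataFrame({'md': [1050, 1100], 'rop': [323, 350]})  # illustrative data

# A temporary folder stands in for your Google Drive folder here
with tempfile.TemporaryDirectory() as tmp_dir:
    out_path = os.path.join(tmp_dir, 'cleaned_run.csv')  # hypothetical file name
    demo.to_csv(out_path, index=False)  # index=False leaves out the row-number column
    reloaded = pd.read_csv(out_path)    # Read it back to confirm the round trip

print(reloaded)
```

For Excel output, `to_excel()` follows the same pattern but needs an Excel writer package such as `openpyxl` installed.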


**Congratulations on completing Module 1!** 🎉

## Module 2 Next Friday!

Next week we will explore DataFrames in depth and start the data QC process. A good, clean dataset is a **MUST** before you dive into ML.

Garbage in, garbage out: if we don't want garbage outputs, let's not feed our models garbage.