
# Understanding Supervised Learning
### Driving Smarter Decisions in Oil & Gas

**Audience:** Executives at ABC Oil & Gas  
**Duration:** 3 Hours (with blended follow-up resources)

This notebook supports the live workshop by combining:
- Explanations (executive-friendly, no heavy math)
- Case studies from Oil & Gas
- Interactive activities
- Mini assessments (polls, quizzes, reflections)


<img src="module1.png" alt="Module 1 Poster" style="display:block; margin-left:auto; margin-right:auto;" width="500">




## Module 1: Demystifying Supervised Learning

Supervised learning is when we train an algorithm using **labeled data**.
- **Classification:** Predicts a category (e.g., "safe" vs. "unsafe" equipment); used when the outcome is a label
- **Regression:** Predicts a number (e.g., production forecast in barrels/day); used when the outcome is a quantity

**Analogy:** Teaching a child with flashcards: each card shows an example on the front and the correct answer on the back.


**Practical Hands-On Demonstration**

In [16]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report

**What do these code lines mean?** <br> These code lines load the tools we need to build and test a simple machine learning model. `pandas` organizes data in spreadsheet-like tables. `train_test_split` divides the data so the model can learn from one part and be tested on the other. `LogisticRegression` is the method we use to predict yes/no outcomes (like whether equipment will fail). `classification_report` produces a report card showing how well the model performed. Together, they set up the foundation for applying supervised learning to real business problems.

In [None]:
# Simulated sensor dataset
data = {
    "Temperature": [70, 85, 90, 60, 75, 95, 100, 55],
    "Pressure": [30, 45, 50, 25, 35, 55, 60, 20],
    "Failure": [0, 1, 1, 0, 0, 1, 1, 0]  # 1=Failure, 0=No Failure
}

**What do these code lines mean?** <br> This small dataset represents machine readings that combine Temperature and Pressure to check whether a machine fails or not. Each row is like a daily observation — for example, when the temperature is 85 and pressure is 45, the machine failed (Failure = 1). The Failure column is the answer key (1 = failure, 0 = no failure) that the supervised learning model will learn from, so later it can predict failures for new readings.

In [None]:
df = pd.DataFrame(data)

**What does this code line mean?** <br> This line takes the raw dataset we created (with temperature, pressure, and failure values) and turns it into a DataFrame, which is like a clean, Excel-style table inside Python. This makes it easier to view, manage, and analyze the data in an organized way.
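A quick look at the table can confirm it was built as intended. The sketch below rebuilds the same simulated dataset so it runs on its own, then reports the table size and how many rows carry each label. The names `n_rows`, `n_cols`, and `failure_counts` are illustrative and not part of the original notebook.

```python
import pandas as pd

# Same simulated sensor dataset as above, repeated so this cell runs on its own
data = {
    "Temperature": [70, 85, 90, 60, 75, 95, 100, 55],
    "Pressure": [30, 45, 50, 25, 35, 55, 60, 20],
    "Failure": [0, 1, 1, 0, 0, 1, 1, 0],  # 1=Failure, 0=No Failure
}
df = pd.DataFrame(data)

# Size of the table as (rows, columns)
n_rows, n_cols = df.shape

# Number of rows for each label (illustrative name)
failure_counts = df["Failure"].value_counts().to_dict()

print(df.head())  # first five rows of the table
print(f"{n_rows} rows x {n_cols} columns")
print("Rows per label:", failure_counts)
```

An even mix of failure and no-failure rows, as in this simulated data, makes the later report card easier to interpret. Real maintenance data is often far less balanced.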

In [None]:
X = df[["Temperature", "Pressure"]]
y = df["Failure"]

**What do these code lines mean?** <br> These lines split the table into two parts:
- `X` contains the inputs (Temperature and Pressure), which are the conditions we use to make predictions.
- `y` contains the output/answer (Failure), which tells us whether the machine failed or not.

In simple terms, we’re saying: “Let’s use temperature and pressure readings (X) to predict if a machine will fail (y).”

In [None]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

**What does this code line mean?** <br> This line splits the data into two groups: one for training (teaching the model how to recognize patterns) and one for testing (checking if the model actually learned correctly). About 70% of the data is used to train, and about 30% is set aside to test. The `random_state=42` setting just makes sure the split is the same every time, so results are consistent.
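With only eight rows, the percentages translate into whole rows. The sketch below repeats the dataset and the same split so it runs on its own, then counts the rows in each group.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Same simulated sensor dataset, repeated so this cell runs on its own
df = pd.DataFrame({
    "Temperature": [70, 85, 90, 60, 75, 95, 100, 55],
    "Pressure": [30, 45, 50, 25, 35, 55, 60, 20],
    "Failure": [0, 1, 1, 0, 0, 1, 1, 0],  # 1=Failure, 0=No Failure
})
X = df[["Temperature", "Pressure"]]
y = df["Failure"]

# Same split settings as in the workshop cell
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

print("Training rows:", len(X_train))
print("Testing rows:", len(X_test))
```

Here the split works out to five training rows and three test rows, since a row cannot be divided. A test set this small is fine for a demonstration but too small to judge a real model.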

In [None]:
model = LogisticRegression()

**What does this code line mean?** <br> This line creates a Logistic Regression model, which is like a smart decision tool. It will learn patterns from the training data (temperature and pressure) to predict a simple yes/no outcome — in this case, whether the machine will fail or not. Think of it as setting up a digital assistant that specializes in answering “failure vs. no failure” questions.

In [None]:
model.fit(X_train, y_train)

**What does this code line mean?** <br> This line is where the model actually learns from the training data. By looking at past examples of temperature, pressure, and whether failure happened, the model figures out the patterns that link inputs (X) to the outcome (y). In simple terms, it’s like “teaching” the model using historical data so it can make predictions on new situations.

In [None]:
y_pred = model.predict(X_test)

**What does this code line mean?** <br> This line asks the trained model to make predictions on the test data it has never seen before. Using only temperature and pressure from the test set, the model tries to guess whether each machine will fail (yes/no). It’s like giving a student a quiz after studying — we see if they can apply what they learned to new questions.
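The same `predict` call also works on brand-new readings. As a minimal sketch, the example below fits the model on all eight rows (an assumption made to keep the example short; it differs from the train/test flow above) and then scores two hypothetical readings. The values in `new_readings` are invented for illustration, and `max_iter=1000` is a precaution added here, not a setting from the workshop.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Same simulated sensor dataset, repeated so this cell runs on its own
df = pd.DataFrame({
    "Temperature": [70, 85, 90, 60, 75, 95, 100, 55],
    "Pressure": [30, 45, 50, 25, 35, 55, 60, 20],
    "Failure": [0, 1, 1, 0, 0, 1, 1, 0],  # 1=Failure, 0=No Failure
})
X = df[["Temperature", "Pressure"]]
y = df["Failure"]

# Assumption: fit on all eight rows to keep the sketch short
model = LogisticRegression(max_iter=1000)
model.fit(X, y)

# Two hypothetical new sensor readings (values invented for illustration)
new_readings = pd.DataFrame({"Temperature": [98, 58], "Pressure": [58, 22]})

# Yes/no prediction: 1 = failure, 0 = no failure
predicted_labels = model.predict(new_readings)

# Estimated probability of failure for each reading
failure_probability = model.predict_proba(new_readings)[:, 1]

for temp, pressure, label, prob in zip(
    new_readings["Temperature"], new_readings["Pressure"],
    predicted_labels, failure_probability,
):
    print(f"Temp {temp}, Pressure {pressure}: "
          f"predicted={label}, estimated failure probability={prob:.2f}")
```

The probability is often more useful to a decision-maker than the bare yes/no, because it shows how strongly the model leans one way.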

In [None]:
print(classification_report(y_test, y_pred))

**What does this code line mean?** <br> This line prints out a report card for the model. It shows how well the model did on the test set using measures like accuracy (how often it was right), precision (of the failures it predicted, how many were real), recall (of the actual failures, how many it caught), and F1-score (a single grade that balances precision and recall). In simple terms, it helps us judge whether the model is reliable enough to inform real-world decisions. With a test set as small as the one in this demonstration, treat the numbers as an illustration, not as evidence.
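A small worked example can make these measures concrete. The sketch below uses ten hypothetical machines whose actual and predicted outcomes are invented for illustration; they are not results from the model above.

```python
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

# Hypothetical outcomes for ten machines (invented for illustration)
# 1 = failure, 0 = no failure
actual    = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 1, 1, 0, 0, 0, 0]

# The model caught 3 of 4 real failures, raised 2 false alarms,
# and was right on 7 of 10 machines overall.
accuracy = accuracy_score(actual, predicted)    # 7 right out of 10 = 0.70
precision = precision_score(actual, predicted)  # 3 real out of 5 alarms = 0.60
recall = recall_score(actual, predicted)        # 3 caught out of 4 failures = 0.75
f1 = f1_score(actual, predicted)                # balance of the two, about 0.67

print(f"Accuracy:  {accuracy:.2f}")
print(f"Precision: {precision:.2f}")
print(f"Recall:    {recall:.2f}")
print(f"F1-score:  {f1:.2f}")
```

Low precision means wasted inspections from false alarms, while low recall means missed failures. Which matters more is a business decision, not a technical one.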


👉 **Activity 1.1** <br>
**Self-Reflection:** Which decisions in your role could benefit from “predicting outcomes” like this?


<img src="module2.png" alt="Module 2 Poster" style="display:block; margin-left:auto; margin-right:auto;" width="500">



## Module 2: Applications in Oil & Gas

Common applications of supervised learning in oil & gas include:

1. **Predictive Maintenance** – Prevent equipment breakdowns.  
2. **Safety & Risk Management** – Predict accident likelihood.  
3. **Production Optimization** – Forecast oil production rates.  


**Practical Hands-On Demonstration**

In [None]:
import numpy as np
from sklearn.linear_model import LinearRegression
import matplotlib.pyplot as plt

**What do these code lines mean?** <br> These lines bring in the tools we need to predict numbers instead of yes/no answers.
- NumPy (`np`) helps us handle lists of numbers easily, like days or production values.
- `LinearRegression` is the model that draws the best straight line through data points, helping us forecast continuous values (like daily oil production).
- Matplotlib (`plt`) is used to create simple charts, so we can see the trend visually.

In short: these tools let us analyze numbers, build a forecasting model, and show the results in a graph.

In [None]:
# Simulated well production dataset
days = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10]).reshape(-1,1)
output = np.array([100, 98, 96, 92, 90, 88, 85, 83, 81, 79])

**What do these code lines mean?** <br> This code sets up a small dataset to show how production changes over time.
- The first line (`days`) lists days 1 through 10. The `reshape(-1,1)` step arranges them as a single column, because the model expects its inputs as a table with one row per observation.
- The second line (`output`) lists the oil production values for each of those days, which are slowly decreasing.

In simple terms, we’re telling the model: “Here are 10 days of production data. Can you learn the trend so we can forecast future output?”

In [None]:
model = LinearRegression()

**What does this code line mean?** <br> This line creates a Linear Regression model, which is like drawing the best straight line through the data points. The model will use that line to understand the overall trend — in this case, how production decreases day by day — and then use it to predict future values.

In [None]:
model.fit(days, output)

**What does this code line mean?** <br> This line tells the Linear Regression model to learn from the data. It looks at the days (input) and the production numbers (output) and figures out the best straight line that explains the relationship between them. In simple terms, it’s like teaching the model the trend: “As days increase, production steadily decreases.”

In [None]:
predictions = model.predict(days)

**What does this code line mean?** <br> This line asks the trained model to calculate production values for the same 10 days it learned from. The results are the points on the best-fit line through the data; they are not yet a forecast of future days. In plain terms, it’s like saying: “Given the trend I’ve learned, here’s what production should look like on each of these days.”
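The same model can also be asked about days it has not seen. The sketch below repeats the dataset so it runs on its own, reads off the learned daily change, and projects three further days. The name `future_days` and the choice of days 11 to 13 are assumptions made for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Same simulated well production dataset, repeated so this cell runs on its own
days = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10]).reshape(-1, 1)
output = np.array([100, 98, 96, 92, 90, 88, 85, 83, 81, 79])

model = LinearRegression()
model.fit(days, output)

# The slope of the line: learned change in production per day
daily_change = model.coef_[0]

# Hypothetical future days (not part of the original data)
future_days = np.array([11, 12, 13]).reshape(-1, 1)
forecast = model.predict(future_days)

print(f"Learned trend: {daily_change:.1f} barrels per day")
for day, barrels in zip(future_days.ravel(), forecast):
    print(f"Day {day}: about {barrels:.1f} barrels")
```

On this data the line falls by about 2.4 barrels per day, which puts day 11 at roughly 76 barrels. A straight-line projection assumes the recent trend simply continues, so it becomes less trustworthy the further ahead it reaches.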

In [None]:
plt.scatter(days, output, label="Actual Production")
plt.plot(days, predictions, label="Predicted Trend", linestyle="--")
plt.xlabel("Days")
plt.ylabel("Barrels of Oil")
plt.legend()
plt.show()

**What do these code lines mean?** <br> These lines create a chart to compare the real production numbers with the model’s predictions.
- The scatter plot (dots) shows the actual oil production for each day.
- The dashed line shows the trend the model learned, which can be extended to forecast future days.
- Labels on the axes and a legend make the chart clear and professional.

In simple terms, this graph lets executives see both reality and prediction side by side, showing how supervised learning turns raw data into insights.
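A single number can complement the chart. As a minimal sketch, the example below uses scikit-learn's `mean_absolute_error` to measure how far the trend line is from the actual values on average; the name `average_miss` is illustrative and not part of the original notebook.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error

# Same simulated well production dataset, repeated so this cell runs on its own
days = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10]).reshape(-1, 1)
output = np.array([100, 98, 96, 92, 90, 88, 85, 83, 81, 79])

model = LinearRegression()
model.fit(days, output)
predictions = model.predict(days)

# Average gap between actual and predicted production, in barrels
average_miss = mean_absolute_error(output, predictions)

print(f"Average miss: {average_miss:.1f} barrels per day")
```

On this data the line misses by about 0.4 barrels per day on average. Because the error is measured on the same days the model learned from, it is an optimistic figure; a fair test would use days held back from training.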


👉 **Activity 2.1:** Match classification/regression to scenarios  
- Equipment Failure (Safe/Unsafe) → ?  
- Forecasting Production Output → ?  
- Predicting Accident Severity → ?  



👉 **Answers to Activity 2.1:**  
- Equipment Failure (Safe/Unsafe) → Classification  
- Forecasting Production Output → Regression  
- Predicting Accident Severity → Classification (when severity is a category such as minor/major; Regression if it is a numeric score)  


<img src="module3.png" alt="Module 3 Poster" style="display:block; margin-left:auto; margin-right:auto;" width="500">



## Module 3: Responsible Leadership

Executives must:  
- Champion **AI as a partner, not a replacement**  
- Ensure **data quality & governance**  
- Evaluate **bias & trustworthiness**  
- Drive adoption through **pilot projects & scaling**  


👉 **Activity 3.1** <br>
❓ **Poll:** Would you trust AI to recommend shutting down a drilling rig?  
- Yes  
- No  
- Depends (on human oversight)  

👉 **Activity 3.2** <br> **Summative Assessment (Action Plan):** Write down:  
1. One supervised learning use case in your division  
2. One risk you want to mitigate  
3. One action you can take in the next 90 days  
