# Real Bakken Well Analysis

This notebook demonstrates decline curve analysis using **real production data** from a Bakken shale oil well in North Dakota.

## What You'll Learn
- Work with real-world production data
- Handle data quality issues
- Compare multiple forecasting models on actual well data
- Analyze production trends and decline behavior
- Generate realistic economic forecasts

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from decline_curve import dca
from decline_curve.models import ArpsParams

# Configure logging
import logging
from decline_curve.logging_config import configure_logging, get_logger

configure_logging(level=logging.INFO)
logger = get_logger(__name__)

plt.style.use('seaborn-v0_8-darkgrid')
%matplotlib inline


## 1. Load Real Bakken Well Data

This data comes from the NYSTUEN 14B-35HS well operated by SM Energy Company in the Bakken formation.

In [None]:
# Load the data
df = pd.read_csv('data/bakken_well_production.csv')

# Display basic information
logger.info("Dataset Overview:")
logger.info(f"Records: {len(df)}")
logger.info(f"Columns: {', '.join(df.columns.tolist())}")
logger.info("Well Information:")
logger.info(f"  Well Name: {df['WellName'].iloc[0]}")
logger.info(f"  Operator: {df['Company'].iloc[0]}")
logger.info(f"  Field: {df['FieldName'].iloc[0]}")
logger.info(f"  Formation: {df['Pool'].iloc[0]}")
logger.info(f"  Location: {df['County'].iloc[0]} County, ND")
logger.info(f"  Online Date: {df['Online_Date'].iloc[0]}")

df.head()

## 2. Data Preparation and Cleaning

In [None]:
# Parse the report date and sort chronologically (it becomes the index below)
df['ReportDate'] = pd.to_datetime(df['ReportDate'])
df = df.sort_values('ReportDate')

# Create clean production series
production = df[['ReportDate', 'Oil', 'Wtr', 'Gas', 'Days']].copy()
production = production.set_index('ReportDate')

# Calculate daily rates (volumes are monthly totals); mask months with zero
# producing days so the division yields NaN instead of inf
days_on = production['Days'].where(production['Days'] > 0)
production['oil_rate'] = production['Oil'] / days_on
production['water_rate'] = production['Wtr'] / days_on
production['gas_rate'] = production['Gas'] / days_on

logger.info("Production Statistics:\n%s",
            production[['oil_rate', 'water_rate', 'gas_rate']].describe().round(1))

# Create monthly oil production series for DCA
oil_series = production['Oil'].copy()
oil_series.name = 'oil_bbl'

logger.info(f"Production Period: {oil_series.index[0].strftime('%Y-%m')} to {oil_series.index[-1].strftime('%Y-%m')}")
logger.info(f"Total Months: {len(oil_series)}")
logger.info(f"Cumulative Oil: {oil_series.sum():,.0f} bbl")
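
A common data quality issue in monthly production reports is a shut-in month, where the producing-day count is zero and a naive rate calculation divides by zero. The sketch below shows one way to mask such months. It assumes column names matching this dataset (`Oil`, `Days`); `daily_rate_masked` is a hypothetical helper, not part of `decline_curve`, and the demo values are synthetic.

```python
import numpy as np
import pandas as pd


def daily_rate_masked(frame, volume_col="Oil", days_col="Days"):
    """Return volume / days, with NaN wherever days is zero or negative.

    Hypothetical helper for illustration only.
    """
    days = frame[days_col].where(frame[days_col] > 0)  # invalid day counts -> NaN
    return frame[volume_col] / days


# Synthetic three-month example with a shut-in month in the middle
demo = pd.DataFrame(
    {"Oil": [3000.0, 0.0, 2400.0], "Days": [30, 0, 30]},
    index=pd.to_datetime(["2020-01-01", "2020-02-01", "2020-03-01"]),
)
demo_rate = daily_rate_masked(demo)
shut_in_months = int(demo_rate.isna().sum())
```

Masked months can then be dropped or interpolated before fitting, depending on whether the shut-in should influence the decline estimate.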

## 3. Visualize Production History

In [None]:
# Create comprehensive production plot
fig, axes = plt.subplots(3, 1, figsize=(14, 12))

# Oil production
axes[0].plot(production.index, production['oil_rate'], 'o-', 
             color='green', linewidth=2, markersize=4, label='Daily Oil Rate')
axes[0].set_ylabel('Oil Rate (bbl/day)', fontsize=12)
axes[0].set_title('NYSTUEN 14B-35HS - Bakken Production History', 
                  fontsize=14, fontweight='bold')
axes[0].legend()
axes[0].grid(True, alpha=0.3)

# Water production
axes[1].plot(production.index, production['water_rate'], 'o-', 
             color='blue', linewidth=2, markersize=4, label='Daily Water Rate')
axes[1].set_ylabel('Water Rate (bbl/day)', fontsize=12)
axes[1].legend()
axes[1].grid(True, alpha=0.3)

# Gas production
axes[2].plot(production.index, production['gas_rate'], 'o-', 
             color='red', linewidth=2, markersize=4, label='Daily Gas Rate')
axes[2].set_xlabel('Date', fontsize=12)
axes[2].set_ylabel('Gas Rate (mcf/day)', fontsize=12)
axes[2].legend()
axes[2].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

# Calculate water cut
water_cut = (production['Wtr'] / (production['Oil'] + production['Wtr']) * 100)
logger.info("Water Cut Trend:")
logger.info(f"  Initial: {water_cut.iloc[0]:.1f}%")
logger.info(f"  Current: {water_cut.iloc[-1]:.1f}%")
logger.info(f"  Average: {water_cut.mean():.1f}%")

## 4. Decline Curve Analysis

Let's fit Arps models to the production data and generate forecasts.
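
For reference, the three Arps forms differ only in the exponent `b`: exponential (`b = 0`), hyperbolic (`0 < b < 1`), and harmonic (`b = 1`). The sketch below writes out the standard rate equations with NumPy. It is independent of the `decline_curve` package, whose internal parameterization may differ; `arps_rate` is a hypothetical name used only for illustration.

```python
import numpy as np


def arps_rate(t, qi, di, b):
    """Arps rate at time t for initial rate qi, initial decline di, exponent b.

    b == 0 -> exponential: q = qi * exp(-di * t)
    b  > 0 -> hyperbolic:  q = qi / (1 + b * di * t) ** (1 / b)
              (b == 1 is the harmonic special case)
    """
    t = np.asarray(t, dtype=float)
    if b == 0:
        return qi * np.exp(-di * t)
    return qi / (1.0 + b * di * t) ** (1.0 / b)


# Same qi and di, evaluated at t = 10 months for each model type
q_exp = arps_rate(10.0, qi=1000.0, di=0.1, b=0.0)
q_hyp = arps_rate(10.0, qi=1000.0, di=0.1, b=0.5)
q_har = arps_rate(10.0, qi=1000.0, di=0.1, b=1.0)
```

For the same `qi` and `di`, a larger `b` gives a slower late-time decline, which is why the three models can diverge widely over a long forecast.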

In [None]:
# Compare different Arps models
models = {
    'Exponential': 'exponential',
    'Harmonic': 'harmonic',
    'Hyperbolic': 'hyperbolic'
}

forecasts = {}
metrics = {}

logger.info("Model Performance on Real Bakken Data:")

for name, kind in models.items():
    try:
        # Generate 24-month forecast
        forecast = dca.forecast(oil_series, model='arps', kind=kind, horizon=24)
        forecasts[name] = forecast
        
        # Evaluate on historical data
        metric = dca.evaluate(oil_series, forecast)
        metrics[name] = metric
        
        logger.info(f"{name} Model:")
        logger.info(f"  RMSE: {metric['rmse']:.0f} bbl/month")
        logger.info(f"  MAE: {metric['mae']:.0f} bbl/month")
        logger.info(f"  SMAPE: {metric['smape']:.1f}%")
    except Exception as e:
        logger.warning(f"{name} Model: Failed - {e}")

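# Find best model (later cells depend on best_model, so stop if nothing fit)
if not metrics:
    raise RuntimeError("No Arps model could be fitted; check the input data.")
best_model = min(metrics, key=lambda name: metrics[name]['rmse'])
logger.info(f"✓ Best Model: {best_model} (lowest RMSE)")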
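
The three scores logged above are common forecast error measures. The sketch below shows conventional definitions, assuming `dca.evaluate` follows them; the package's exact formulas (for example, how SMAPE treats zeros) are not confirmed here, and the function names are hypothetical.

```python
import numpy as np


def rmse(actual, predicted):
    """Root mean squared error, in the units of the data."""
    a, p = np.asarray(actual, dtype=float), np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((a - p) ** 2)))


def mae(actual, predicted):
    """Mean absolute error, in the units of the data."""
    a, p = np.asarray(actual, dtype=float), np.asarray(predicted, dtype=float)
    return float(np.mean(np.abs(a - p)))


def smape(actual, predicted):
    """Symmetric mean absolute percentage error, in percent.

    Points where both values are zero are skipped to avoid 0/0.
    """
    a, p = np.asarray(actual, dtype=float), np.asarray(predicted, dtype=float)
    denom = (np.abs(a) + np.abs(p)) / 2.0
    mask = denom > 0
    return float(100.0 * np.mean(np.abs(a - p)[mask] / denom[mask]))


demo_actual = [100.0, 200.0]
demo_predicted = [110.0, 180.0]
demo_scores = {
    "rmse": rmse(demo_actual, demo_predicted),
    "mae": mae(demo_actual, demo_predicted),
    "smape": smape(demo_actual, demo_predicted),
}
```

RMSE penalizes large misses more heavily than MAE, so ranking models by RMSE favors fits that avoid big errors in the high-rate early months.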

## 5. Visualize Forecasts

In [None]:
# Plot historical data and forecasts
fig, ax = plt.subplots(figsize=(14, 7))

# Historical production
ax.plot(oil_series.index, oil_series.values, 'o-', 
        color='black', linewidth=2, markersize=5, label='Historical Production', zorder=5)

# Forecasts
colors = ['red', 'green', 'orange']
for (name, forecast), color in zip(forecasts.items(), colors):
    forecast_part = forecast.iloc[len(oil_series):]
    ax.plot(forecast_part.index, forecast_part.values, '--', 
            color=color, linewidth=2, label=f'{name} Forecast', alpha=0.8)

# Forecast start line
ax.axvline(x=oil_series.index[-1], color='gray', linestyle=':', 
           linewidth=2, label='Forecast Start')

ax.set_xlabel('Date', fontsize=12)
ax.set_ylabel('Oil Production (bbl/month)', fontsize=12)
ax.set_title('Bakken Well Production Forecast - NYSTUEN 14B-35HS', 
             fontsize=14, fontweight='bold')
ax.legend(loc='best')
ax.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

## 6. Economic Analysis

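Estimate recovery and NPV for the well using the best-fit model. Because the forecast covers only 24 months, the total reported below is cumulative recovery through the forecast horizon, which is a lower bound on the true EUR.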
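
As a rough illustration of the NPV arithmetic, the sketch below discounts monthly net cash flow (volume times price minus operating cost) at a monthly rate derived from the annual discount rate. This is one common convention and an assumption here; `dca.economics` may discount differently (for example, mid-period or with a simple annual/12 rate). `npv_monthly` is a hypothetical helper.

```python
import numpy as np


def npv_monthly(volumes, price, opex, annual_rate):
    """NPV of monthly volumes, discounted at end of each month.

    volumes     : monthly production (bbl), first entry is month 1
    price, opex : $/bbl
    annual_rate : annual discount rate, e.g. 0.10 for 10%
    """
    volumes = np.asarray(volumes, dtype=float)
    # Convert the annual rate to an equivalent compounding monthly rate
    monthly_rate = (1.0 + annual_rate) ** (1.0 / 12.0) - 1.0
    months = np.arange(1, len(volumes) + 1)
    cash_flow = volumes * (price - opex)
    return float(np.sum(cash_flow / (1.0 + monthly_rate) ** months))


# 1,000 bbl produced in month 12 only: discounted by exactly one year
demo_volumes = [0.0] * 11 + [1000.0]
demo_npv = npv_monthly(demo_volumes, price=70.0, opex=25.0, annual_rate=0.10)
```

With a zero discount rate the result reduces to undiscounted net revenue, which makes a quick sanity check against the figures logged in the next cell.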

In [None]:
# Economic parameters (typical for Bakken)
oil_price = 70.0      # $/bbl
opex = 25.0           # $/bbl (higher for mature well)
discount_rate = 0.10  # 10% annual

# Get best forecast
best_forecast = forecasts[best_model]
forecast_only = best_forecast.iloc[len(oil_series):]

# Calculate economics on forecasted production
economics = dca.economics(
    production=forecast_only,
    price=oil_price,
    opex=opex,
    discount_rate=discount_rate
)

# Calculate total EUR (historical + 24-month forecast; truncated at the forecast horizon)
historical_production = oil_series.sum()
forecasted_production = forecast_only.sum()
total_eur = historical_production + forecasted_production

logger.info("Economic Analysis Results:")
logger.info(f"Historical Production: {historical_production:,.0f} bbl")
logger.info(f"Forecasted Production (24 months): {forecasted_production:,.0f} bbl")
logger.info(f"Total EUR: {total_eur:,.0f} bbl")
logger.info("Forecasted Economics (next 24 months):")
logger.info(f"  NPV: ${economics['npv']:,.0f}")
logger.info(f"  Gross Revenue: ${forecasted_production * oil_price:,.0f}")
logger.info(f"  Operating Costs: ${forecasted_production * opex:,.0f}")
logger.info(f"  Net Revenue: ${forecasted_production * (oil_price - opex):,.0f}")

if economics['payback_month'] is not None:
    logger.info(f"  Payback Period: {economics['payback_month']} months")

## 7. Production Decline Analysis

In [None]:
# Calculate decline rates
monthly_decline = oil_series.pct_change() * -100  # Convert to positive decline %

# Plot decline rate over time
fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(14, 10))

# Production on log scale
ax1.semilogy(oil_series.index, oil_series.values, 'o-', 
             color='blue', linewidth=2, markersize=4)
ax1.set_ylabel('Oil Production (bbl/month) - Log Scale', fontsize=12)
ax1.set_title('Production Decline - Log Scale', fontsize=14, fontweight='bold')
ax1.grid(True, alpha=0.3, which='both')

# Monthly decline rate
ax2.plot(monthly_decline.index[1:], monthly_decline.values[1:], 'o-', 
         color='red', linewidth=2, markersize=4)
ax2.axhline(y=0, color='gray', linestyle='--', linewidth=1)
ax2.set_xlabel('Date', fontsize=12)
ax2.set_ylabel('Monthly Decline Rate (%)', fontsize=12)
ax2.set_title('Month-over-Month Decline Rate', fontsize=14, fontweight='bold')
ax2.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

# Decline statistics (skip the first value, which is NaN by construction)
decline = monthly_decline.iloc[1:]
logger.info("Decline Rate Statistics:")
logger.info(f"  Average Monthly Decline: {decline.mean():.1f}%")
logger.info(f"  Median Monthly Decline: {decline.median():.1f}%")
logger.info(f"  First 6 months avg: {decline.iloc[:6].mean():.1f}%")
logger.info(f"  Last 6 months avg: {decline.iloc[-6:].mean():.1f}%")
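
Month-over-month percentage changes are noisy, so a single summary decline is often easier to read from a straight-line fit on the log scale. The sketch below estimates a nominal monthly decline by least squares on `ln(q)` versus time, which corresponds to an exponential decline assumption. It uses synthetic data, and `loglinear_decline` is a hypothetical helper rather than anything provided by `decline_curve`.

```python
import numpy as np


def loglinear_decline(volumes):
    """Fit ln(q) = ln(qi) - D * t by least squares.

    Returns (qi, D) where D is the nominal decline per time step.
    Non-positive volumes are skipped because their log is undefined.
    """
    q = np.asarray(volumes, dtype=float)
    t = np.arange(len(q), dtype=float)
    keep = q > 0
    slope, intercept = np.polyfit(t[keep], np.log(q[keep]), 1)
    return float(np.exp(intercept)), float(-slope)


# Synthetic series declining at a nominal 5% per month
demo_q = 1000.0 * np.exp(-0.05 * np.arange(12))
qi_fit, d_fit = loglinear_decline(demo_q)

# Effective (period-over-period) decline implied by the nominal rate
effective_pct = (1.0 - np.exp(-d_fit)) * 100.0
```

On real data the fitted line will not match exactly; a curved trend on the log-scale plot is a sign that a hyperbolic model is more appropriate than an exponential one.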

## 8. Compare with ARIMA Model

Let's see how a statistical model performs on this real data.

In [None]:
try:
    # Generate ARIMA forecast
    arima_forecast = dca.forecast(oil_series, model='arima', horizon=24)
    arima_metrics = dca.evaluate(oil_series, arima_forecast)
    
    logger.info("ARIMA Model Performance:")
    logger.info(f"  RMSE: {arima_metrics['rmse']:.0f} bbl/month")
    logger.info(f"  MAE: {arima_metrics['mae']:.0f} bbl/month")
    logger.info(f"  SMAPE: {arima_metrics['smape']:.1f}%")
    
    # Compare with best Arps model
    logger.info(f"Comparison with {best_model} Arps:")
    logger.info(f"  ARIMA RMSE: {arima_metrics['rmse']:.0f}")
    logger.info(f"  {best_model} RMSE: {metrics[best_model]['rmse']:.0f}")
    
    if arima_metrics['rmse'] < metrics[best_model]['rmse']:
        logger.info(f"  ✓ ARIMA performs better by {metrics[best_model]['rmse'] - arima_metrics['rmse']:.0f} bbl/month")
    else:
        logger.info(f"  ✓ {best_model} Arps performs better by {arima_metrics['rmse'] - metrics[best_model]['rmse']:.0f} bbl/month")
        
except Exception as e:
    logger.warning(f"ARIMA model failed: {e}")

## Summary

In this notebook, we:
1. ✓ Loaded and cleaned real Bakken well production data
2. ✓ Visualized oil, water, and gas production trends
3. ✓ Fitted multiple Arps decline models
4. ✓ Generated 24-month production forecasts
5. ✓ Calculated EUR and economic metrics
6. ✓ Analyzed production decline behavior
7. ✓ Compared empirical (Arps) and statistical (ARIMA) models

## Key Insights

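- Bakken wells typically show high initial production followed by rapid decline
- Water cut often rises as a well matures; the initial and current values logged in Section 3 show whether this well follows that pattern
- Hyperbolic decline models often fit unconventional wells best because the decline rate itself slows over time
- Real data is noisier than synthetic data: shut-ins, partial months, and workovers all distort monthly volumes
- A mature well can retain economic value as long as price exceeds operating cost; the NPV in Section 6 quantifies this for the assumed inputs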

## Next Steps

- Try different economic scenarios (price sensitivity)
- Analyze multiple wells from the same field
- Compare with type curves for the Bakken formation
- Investigate operational events (shutdowns, workovers)