## Predicting Peak Oil Production Rate

In this project, we build a series of models to predict the peak oil production rate of an oil well. This quantity offers valuable
insight into a well's overall production profile, which is essential for making informed decisions later in the well's life.

## Outline

The workflow of our approach is as follows:
- Data Exploration and Pre-processing: Visualize the raw input data and observe patterns in correlation. Then, pre-process the data in a reasonable manner.
- Model Building: Construct baseline linear and non-linear models to predict the peak oil production rate.
- Evaluation: Use R-squared and RMSE to evaluate the models and compare their performance.
- Conclusions: Summarize how well each model performs, based on the evaluation results.
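
The two evaluation metrics named above can be sketched in a few lines of NumPy. This is a minimal illustration, not necessarily the implementation used later in the project. The function names `rmse` and `r_squared` and the toy values are assumptions made for the example.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root mean squared error: typical error size, in the target's units."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)          # unexplained variation
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total variation
    return float(1.0 - ss_res / ss_tot)


# Toy values for illustration only (not taken from the dataset)
y_true = [100.0, 150.0, 200.0, 250.0]
y_pred = [110.0, 140.0, 210.0, 240.0]

print(rmse(y_true, y_pred))
print(r_squared(y_true, y_pred))
```

RMSE is expressed in the same units as `OilPeakRate`, while R-squared is unitless, so the two are best read together when comparing models.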

## Data Exploration

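Our dataset contains numerous descriptive variables. These include:
- Well location coordinates: `surface_x`, `surface_y`, `bh_x`, `bh_y`
- Completion design: `gross_perforated_length`, `number_of_stages`, `total_proppant`, `total_fluid`, `proppant_intensity`, `frac_fluid_intensity`
- Depth: `true_vertical_depth`
- Categorical descriptors: `ffs_frac_type`, `relative_well_position`, `batch_frac_classification`, `well_family_relationship`, `frac_type`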

In [3]:
# Import standard libraries
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import os
import warnings
from typing import *

# Ignore warnings
warnings.simplefilter("ignore")

# Path to the training data
training_path = "./training.csv"

Start by reading in the training data and dropping any irrelevant features. Also, drop any rows that do not have an OilPeakRate value.

In [7]:
raw_df = pd.read_csv(training_path)
raw_df = raw_df.drop(columns=["Unnamed: 0", "pad_id"])
raw_df = raw_df.dropna(subset=["OilPeakRate"])
raw_df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 19306 entries, 0 to 29436
Data columns (total 29 columns):
 #   Column                        Non-Null Count  Dtype  
---  ------                        --------------  -----  
 0   surface_x                     19306 non-null  float64
 1   surface_y                     19306 non-null  float64
 2   bh_x                          17882 non-null  float64
 3   bh_y                          17882 non-null  float64
 4   standardized_operator_name    19306 non-null  int64  
 5   gross_perforated_length       19148 non-null  float64
 6   number_of_stages              2643 non-null   float64
 7   total_proppant                17912 non-null  float64
 8   total_fluid                   17866 non-null  float64
 9   true_vertical_depth           19201 non-null  float64
 10  ffs_frac_type                 14310 non-null  object 
 11  proppant_intensity            17872 non-null  float64
 12  frac_fluid_intensity          17821 non-null  float64
 13  averag
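
The non-null counts above show that missingness varies widely by column. For example, `number_of_stages` has only 2,643 non-null values out of 19,306 rows, roughly 14%. The sketch below shows one way to quantify this per column. The helper name `missing_fraction`, the toy frame, and the 0.5 cutoff are all assumptions made for illustration, not choices taken from the original analysis.

```python
import numpy as np
import pandas as pd


def missing_fraction(df: pd.DataFrame) -> pd.Series:
    """Fraction of missing values per column, highest first."""
    return df.isna().mean().sort_values(ascending=False)


# Tiny hypothetical frame that reuses a few column names from the dataset
toy_df = pd.DataFrame({
    "number_of_stages": [14.0, np.nan, np.nan, np.nan],
    "total_proppant": [451393.6, 448267.0, np.nan, 274373.6],
    "OilPeakRate": [184.6, 223.9, 125.7, 80.5],
})

frac = missing_fraction(toy_df)
print(frac)

# Hypothetical cutoff: drop columns that are more than half empty
sparse_cols = frac[frac > 0.5].index.tolist()
reduced_df = toy_df.drop(columns=sparse_cols)
print(reduced_df.columns.tolist())
```

Applied to the real data, the same call would be `missing_fraction(raw_df)`. Checking these fractions before calling `dropna()` on the full frame is useful, because a single sparse column can remove most of the rows.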

## Preliminary Visualization

In [9]:
all_variables_df = raw_df.dropna()
all_variables_df.head()

| | surface_x | surface_y | bh_x | bh_y | standardized_operator_name | gross_perforated_length | number_of_stages | total_proppant | total_fluid | true_vertical_depth | ... | relative_well_position | batch_frac_classification | well_family_relationship | frac_type | frac_seasoning | horizontal_midpoint_x | horizontal_midpoint_y | horizontal_toe_x | horizontal_toe_y | OilPeakRate |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 344 | 1032836.972 | 698078.5752 | 1028268.464 | 698847.5209 | 1213 | 3923.228346 | 14.0 | 451393.6429 | 1919215.0 | 8500.984252 | ... | Standalone Well | Non-Batch Frac | Standalone Well | Primary Frac | 9.0 | 1030244.11 | 698733.6581 | 1028335.428 | 699174.1302 | 184.642886 |
| 347 | 1028372.818 | 705664.9173 | 1023864.056 | 706260.3561 | 1213 | 3899.606299 | 14.0 | 448267.0 | 1895077.0 | 8627.952756 | ... | Standalone Well | Non-Batch Frac | Standalone Well | Primary Frac | 6.0 | 1025805.159 | 706187.211 | 1023920.524 | 706583.3042 | 223.857178 |
| 512 | 1329694.118 | 605041.4532 | 1330347.965 | 609686.9108 | 1025 | 3875.0 | 13.0 | 128765.8571 | 631501.1 | 6552.165354 | ... | Outer Well | Batch-Sequential Frac | Sibling Well | Primary Frac | 32.0 | 1330267.421 | 607708.1822 | 1330683.948 | 609624.2091 | 125.714305 |
| 540 | 1070311.801 | 767633.0778 | 1077924.241 | 766559.2843 | 1030 | 6994.094488 | 16.0 | 274373.6429 | 1519341.0 | 8995.07874 | ... | Outer Well | Non-Batch Frac | Infill Child Well | Primary Frac | 65.0 | 1074398.954 | 766764.2836 | 1077826.715 | 766010.6038 | 80.476997 |
| 547 | 1070326.117 | 767681.0442 | 1078025.255 | 767204.4358 | 1030 | 6958.661417 | 21.0 | 192160.8571 | 1925162.0 | 8603.346457 | ... | Outer Well | Batch-Sequential Frac | Sibling Well | Primary Frac | 90.0 | 1074480.955 | 767384.8049 | 1077967.682 | 766649.795 | 103.739302 |
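
The outline calls for observing correlation patterns in the data. One simple way to start is to rank the numeric features by their Pearson correlation with `OilPeakRate`. The sketch below does that. The helper name `rank_by_target_correlation` and the toy values are assumptions made for illustration, and the ranking says nothing about non-linear relationships.

```python
import pandas as pd


def rank_by_target_correlation(df: pd.DataFrame, target: str = "OilPeakRate") -> pd.Series:
    """Pearson correlation of each numeric column with the target,
    ordered by absolute strength (strongest first)."""
    numeric_df = df.select_dtypes(include="number")  # skip categorical columns
    corr = numeric_df.corr()[target].drop(target)
    order = corr.abs().sort_values(ascending=False).index
    return corr.reindex(order)


# Hypothetical toy values that reuse column names from the dataset
toy_df = pd.DataFrame({
    "total_proppant": [1.0, 2.0, 3.0, 4.0, 5.0],
    "true_vertical_depth": [5.0, 3.0, 4.0, 1.0, 2.0],
    "frac_type": ["Primary Frac"] * 5,
    "OilPeakRate": [10.0, 20.0, 30.0, 40.0, 50.0],
})

ranked = rank_by_target_correlation(toy_df)
print(ranked)
```

On the real data the call would be `rank_by_target_correlation(all_variables_df)`. Integer-coded columns such as `standardized_operator_name` look like identifiers rather than measurements, so their correlation values are probably best read with caution.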
