To start, we import all of the packages that we will need in this notebook. 

In [1]:
import glob
import pandas as pd
import numpy as np
from scipy.integrate import quad
from scipy.optimize import curve_fit

# Problem 1 

### Adding the features we will need to optimize to a DataFrame and then exporting to a csv

Here we read in the well production.csv file 

In [2]:
well_productions = pd.read_csv("well productions/well production.csv")

Here we read in the csv file for each well 

In [3]:
datas = []
for file in glob.glob("well productions/*"):
    # Skip the summary file; every other CSV holds the data for a single well
    if "well production.csv" not in file:
        frame = pd.read_csv(file)
        # Strip the "well productions/" prefix (17 characters) and the ".csv" suffix
        frame["Name"] = file[17:-4]
        datas.append(frame)

The following function calculates the length of a well from its easting coordinates and adds it to the well's DataFrame.

In [4]:
def well_length(dataframe: pd.DataFrame):
    # Well length is the easting of the last row minus the easting of the first row
    dataframe["well length"] = dataframe["easting"].iloc[-1] - dataframe["easting"].iloc[0]

The following function calculates the number of frac stages for each well and adds the calculated value to the DataFrame

In [5]:
def frac_stages(dataframe: pd.DataFrame):
    # Each row with a recorded proppant weight counts as one frac stage
    dataframe["frac stages"] = dataframe["proppant weight (lbs)"].notna().sum()
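As a quick sanity check, the sketch below applies the same two calculations to a small made-up well. The column names match the ones used above, but the values are invented purely for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical well: four rows, two of which record a frac stage
toy_well = pd.DataFrame({
    "easting": [1000.0, 1500.0, 2000.0, 2600.0],
    "proppant weight (lbs)": [np.nan, 250000.0, np.nan, 300000.0],
})

# Well length: last easting minus first easting
toy_length = toy_well["easting"].iloc[-1] - toy_well["easting"].iloc[0]

# Frac stages: rows where a proppant weight was recorded
toy_stages = int(toy_well["proppant weight (lbs)"].notna().sum())

print(toy_length, toy_stages)
```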

The following function stores the proppant weight per stage in a "ppf" column. By default it uses the maximum weight across the well's stages; pass `find_min=True` to use the minimum instead.

In [6]:
def propant_per_stage(dataframe: pd.DataFrame, find_min=False):
    # Series.min()/max() skip NaN rows, unlike the built-in min()/max()
    if find_min:
        val = dataframe["proppant weight (lbs)"].min()
    else:
        val = dataframe["proppant weight (lbs)"].max()
    dataframe["ppf"] = val

The following function stores the pump rate in a "pr" column. By default it uses the maximum rate; pass `find_min=True` to use the minimum instead.

In [7]:
def pump_rate(dataframe: pd.DataFrame, find_min=False):
    # Series.min()/max() skip NaN rows, unlike the built-in min()/max()
    if find_min:
        val = dataframe["pump rate (cubic feet/min)"].min()
    else:
        val = dataframe["pump rate (cubic feet/min)"].max()
    dataframe["pr"] = val
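One thing to watch for with these two functions: the proppant and pump-rate columns can contain missing values, and Python's built-in `min` and `max` do not skip NaN the way the pandas methods do. The sketch below uses made-up weights to show the difference.

```python
import numpy as np
import pandas as pd

# Hypothetical proppant weights with a missing first entry
weights = pd.Series([np.nan, 250000.0, 300000.0])

builtin_result = max(weights)   # comparisons with NaN are always False
pandas_result = weights.max()   # skips NaN by default (skipna=True)

print(builtin_result, pandas_result)
```

With the built-in, the result depends on where the NaN sits in the column, so the pandas method is the safer choice here.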

**TODO: the function below is an unimplemented placeholder, and its intended purpose is still unclear.**

In [8]:
def find_prodcution(dataframe: pd.DataFrame):
    pass

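The function below estimates how long the well will economically produce oil. We assume production stays economical until the rate falls to 93 barrels; solving the exponential decline curve for that rate gives a lifetime of ln(qi/93)/D, measured in the time units of the fit (months in this notebook). If qi is already below 93, the lifetime is 0. **TODO: the 93-barrel threshold still needs to be justified.**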

In [9]:
def life_of_res(qi, D):
    lifetime = 1/D*np.log(qi/93)
    if lifetime < 0:
        return 0
    else: 
        return lifetime
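To make the formula concrete, here is a minimal sketch that exposes the economic limit as a parameter and checks that the decline curve really does hit the limit at the computed time. The helper name `life_with_limit` and the sample values are assumptions for illustration only.

```python
import numpy as np

def life_with_limit(qi, D, q_limit=93.0):
    """Time at which qi * exp(-D * t) falls to q_limit (hypothetical helper)."""
    # Solve qi * exp(-D * t) = q_limit for t
    lifetime = np.log(qi / q_limit) / D
    # A well that starts below the limit is never economical
    return max(lifetime, 0.0)

t_end = life_with_limit(1000.0, 0.1)
rate_at_end = 1000.0 * np.exp(-0.1 * t_end)
print(t_end, rate_at_end)
```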

Exponential decline curve equation

    Arguments:
        t:  Number of months the well has been producing oil 
        qi: Float. Initial production rate when well first came online.
        D:  Float. Nominal decline rate (constant)
        
    Output: 
        Returns q, or the expected production rate at time t. Float.

In [10]:
def exponential_loss(t, qi, D): 
    return qi*np.exp(-D*t)

Hyperbolic decline curve equation

    Arguments:
        t:  Number of months the well has been producing oil
        qi: Float. Initial production rate when well first came online.
        b:  Float. Hyperbolic decline constant
        di: Float. Nominal decline rate at time t=0

    Output: 
        Returns q, or the expected production rate at time t. Float.

In [11]:
def hyperbolic_equation(t, qi, b, di):
    return qi/((1.0+b*di*t)**(1.0/b))
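The two curves are closely related: as b approaches 0 the hyperbolic curve approaches the exponential one, and larger b gives a slower-declining tail. The sketch below illustrates this with made-up parameters; the function names `exp_rate` and `hyp_rate` are local stand-ins for the two equations above.

```python
import numpy as np

def exp_rate(t, qi, D):
    return qi * np.exp(-D * t)

def hyp_rate(t, qi, b, di):
    return qi / ((1.0 + b * di * t) ** (1.0 / b))

t = np.arange(12)  # months 0..11
q_exp = exp_rate(t, 1000.0, 0.1)
q_hyp_small_b = hyp_rate(t, 1000.0, 1e-6, 0.1)  # nearly exponential
q_hyp_b1 = hyp_rate(t, 1000.0, 1.0, 0.1)        # b = 1, slower tail

print(np.allclose(q_exp, q_hyp_small_b, rtol=1e-5))
print(np.all(q_hyp_b1[1:] > q_exp[1:]))
```

Note that b = 0 itself cannot be passed to the hyperbolic equation, because the exponent 1/b is undefined there.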

This function allows you to look at the first X months of production, and selects the highest production month as max initial production. It returns the max initial production in the first X months along with a series that contains the values of oil production for the first 12 months. 
    
    Arguments:
        number_first_months: int. Number of months from the point the well comes online
        to compare to get the max initial production rate qi (this looks at multiple months
        in case there is a production ramp-up)
        
        well_name: String. Name of the well whose max initial production
        we want to find.

In [12]:
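def get_max_initial_production(number_first_months, well_name):
    # Select this well's row and keep only the monthly oil production columns
    row = well_productions.loc[well_productions["well name"] == well_name]
    row = row.filter(regex='oil')
    # Transpose and squeeze so the months become the index of a Series
    row = row.T.squeeze()

    # Highest production in the first months, floored at 0
    val = max(row.iloc[:number_first_months].max(), 0)

    return val, row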
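The sketch below walks through the same steps on a tiny made-up table. The column names `oil 1`, `oil 2`, and so on are assumptions (the real file's names may differ), and the table is passed in as an argument rather than read from the global `well_productions`.

```python
import pandas as pd

# Hypothetical production table; the real file's column names may differ
toy_productions = pd.DataFrame({
    "well name": ["Well A", "Well B"],
    "oil 1": [800.0, 400.0],
    "oil 2": [950.0, 380.0],
    "oil 3": [900.0, 350.0],
    "water 1": [10.0, 20.0],
})

def max_initial_production(productions, number_first_months, well_name):
    """Same idea as get_max_initial_production, with the table passed in."""
    row = productions.loc[productions["well name"] == well_name]
    row = row.filter(regex="oil").T.squeeze()  # Series of monthly oil volumes
    return max(row.iloc[:number_first_months].max(), 0), row

qi_a, series_a = max_initial_production(toy_productions, 2, "Well A")
print(qi_a, len(series_a))
```

Looking at two months rather than one picks up the ramp-up in "Well A", whose second month is higher than its first.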

The two functions below compute the definite integral of the fitted exponential and hyperbolic equations from 0 to the calculated life of the reservoir. Our team found it easier to use a numerical integral than to implement the equation found in the "Hinge Basin" notebook.

In [13]:
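# quad returns a tuple: (integral value, estimate of the absolute integration error)
def get_cumulative_exponential(qi, D):
    return quad(exponential_loss, 0, life_of_res(qi, D), args=(qi,D))

# The upper limit reuses life_of_res, which is derived from the exponential curve
def get_cumulative_hyperbolic(qi, b, di):
    return quad(hyperbolic_equation, 0, life_of_res(qi, di), args=(qi, b, di))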
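As a check on the numerical approach, the exponential integral has a simple closed form, (qi - q_limit)/D, when the upper limit is the time at which the rate reaches q_limit. The sketch below compares the two using made-up parameters, and also shows the time at which a hyperbolic curve would reach the same limit. The names `exp_rate`, `hyp_rate`, `T_exp`, and `T_hyp` are local to this example.

```python
import numpy as np
from scipy.integrate import quad

def exp_rate(t, qi, D):
    return qi * np.exp(-D * t)

def hyp_rate(t, qi, b, di):
    return qi / ((1.0 + b * di * t) ** (1.0 / b))

qi, D, q_limit = 1000.0, 0.1, 93.0      # illustrative values
T_exp = np.log(qi / q_limit) / D         # exponential curve reaches q_limit here

numeric, abs_err = quad(exp_rate, 0, T_exp, args=(qi, D))
closed_form = (qi - q_limit) / D         # integral of qi*exp(-D*t) from 0 to T_exp

b = 0.8                                  # illustrative hyperbolic constant
T_hyp = ((qi / q_limit) ** b - 1.0) / (b * D)  # hyperbolic curve reaches q_limit here

print(numeric, closed_form, abs_err)
print(hyp_rate(T_exp, qi, b, D), hyp_rate(T_hyp, qi, b, D))
```

The second value returned by `quad` estimates the error of the numerical integration only; it says nothing about how well the curve matches the production data. For b > 0 the hyperbolic rate at `T_exp` is still above the limit, so integrating the hyperbolic curve only up to the exponential lifetime stops before the curve reaches 93.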

The following function fits both the exponential and hyperbolic decline curves to a well's production and integrates each fitted curve. It keeps the cumulative production whose integral has the smaller error estimate, then adds the expected lifetime under each curve and the selected cumulative production to the DataFrame.

In [14]:
time_series = pd.Series(list(range(12)))

def cum_production(dataframe: pd.DataFrame, find_min=False):
    # Find the name of the current well being examined
    name = dataframe["Name"][0]
    # Set qi to the maximum production in the first 5 months of this well's lifespan,
    # and row to the series of the first 12 months' production values
    qi, row = get_max_initial_production(5, name)

    # Use the scipy curve_fit function to get the best possible exponential and hyperbolic curves
    popt_exp, pcov_exp = curve_fit(exponential_loss, time_series, row, bounds=(0, [qi, 20]))
    popt_hyp, pcov_hyp = curve_fit(hyperbolic_equation, time_series, row, bounds=(0, [qi, 2, 20]))

    # Get the definite integral's value and error estimate for both curves
    cp_exp = get_cumulative_exponential(*popt_exp)
    cp_hyp = get_cumulative_hyperbolic(*popt_hyp)

    # Keep the cumulative production whose integral has the smaller error estimate
    if cp_exp[1] < cp_hyp[1]:
        best_cum_production = cp_exp[0]
    else:
        best_cum_production = cp_hyp[0]

    # Add the respective lifetimes for the exponential and hyperbolic curves to the dataframe,
    # then add the selected cumulative production
    dataframe["lifetime_exp"] = life_of_res(*popt_exp)
    dataframe["lifetime_hyp"] = life_of_res(popt_hyp[0], popt_hyp[2])
    dataframe["cum_production"] = best_cum_production
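The selection above compares the integration error reported by `quad`. An alternative worth considering, shown here only as a sketch and not as the method used in this notebook, is to compare how closely each fitted curve matches the observed production using the sum of squared residuals. The synthetic data, starting points, and bounds below are all assumptions chosen for the example.

```python
import numpy as np
from scipy.optimize import curve_fit

def exp_rate(t, qi, D):
    return qi * np.exp(-D * t)

def hyp_rate(t, qi, b, di):
    return qi / ((1.0 + b * di * t) ** (1.0 / b))

# Synthetic 12-month history generated from a hyperbolic curve (made-up values)
months = np.arange(12, dtype=float)
observed = hyp_rate(months, 1000.0, 0.8, 0.3)
q_max = observed.max()

# Bounds and starting points here are assumptions chosen for the sketch
p_exp, _ = curve_fit(exp_rate, months, observed, p0=[q_max, 0.1],
                     bounds=([0, 0], [2 * q_max, 20]))
p_hyp, _ = curve_fit(hyp_rate, months, observed, p0=[q_max, 0.5, 0.1],
                     bounds=([0, 1e-6, 0], [2 * q_max, 2, 20]))

# Sum of squared residuals measures how well each curve matches the data
sse_exp = float(np.sum((observed - exp_rate(months, *p_exp)) ** 2))
sse_hyp = float(np.sum((observed - hyp_rate(months, *p_hyp)) ** 2))
better_fit = "exponential" if sse_exp < sse_hyp else "hyperbolic"
print(better_fit, sse_exp, sse_hyp)
```

The lower bound of 1e-6 on b keeps the exponent 1/b finite during the fit.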


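Here we call the functions that add cumulative production, well length, and the number of frac stages to each well's DataFrame. Note that `propant_per_stage` and `pump_rate` are not called here, so the "ppf" and "pr" columns are not added.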

In [15]:
%%capture
list(map(cum_production, datas))
list(map(well_length, datas))
list(map(frac_stages, datas))

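Here we concatenate the per-well DataFrames and keep one row per well. Each feature added above is written as the same value on every row of a well, so the first row of each well carries all of the new features.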

In [16]:
%%capture
big_df = pd.concat(datas)
big_df.drop_duplicates(subset=['Name'], inplace=True)
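The sketch below shows the effect of this step on two tiny made-up wells. The frames and values are hypothetical; only the `Name` and `well length` column names come from the code above.

```python
import pandas as pd

# Two hypothetical per-well frames; "well length" repeats on every row of a well
well_a = pd.DataFrame({"easting": [0.0, 500.0], "well length": 500.0, "Name": "Well A"})
well_b = pd.DataFrame({"easting": [100.0, 900.0], "well length": 800.0, "Name": "Well B"})

combined = pd.concat([well_a, well_b])
one_row_per_well = combined.drop_duplicates(subset=["Name"])
print(one_row_per_well[["Name", "well length"]])
```

Per-row columns such as `easting` keep only the value from each well's first row, so they should not be treated as well-level features after this step.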

Here we export our DataFrame to a new csv file 

In [17]:
big_df.to_csv("bigPoppa.csv", index=False)

### Creating a model that uses the features we added to the dataframe to predict cumulative output

**We should finally be able to get to the fun stuff now 😁**