Anomaly detection techniques range from statistical methods to machine learning models. For this analysis, I'll use the Interquartile Range (IQR) method, a simple statistical approach that works well in many cases.

#### Interquartile Range (IQR) Method:
* The IQR is the range between the first quartile (Q1, the 25th percentile) and the third quartile (Q3, the 75th percentile) of the data, so it measures the spread of the middle 50% of the values. Anomalies are typically defined as observations that fall below Q1 - 1.5 × IQR or above Q3 + 1.5 × IQR.
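
To make the rule concrete before applying it to the well data, here is a minimal sketch on a toy series. The values are made up for illustration and are not taken from the dataset; the variable names (`rates`, `lower`, `upper`) are likewise only illustrative.

```python
import pandas as pd

# Toy series (made-up values, not from the well dataset); 100 is an obvious outlier.
rates = pd.Series([10, 12, 11, 13, 12, 11, 100])

# Quartiles and their difference (pandas interpolates linearly by default).
q1 = rates.quantile(0.25)   # 11.0
q3 = rates.quantile(0.75)   # 12.5
iqr = q3 - q1               # 1.5

# Fences at 1.5 * IQR beyond each quartile.
lower = q1 - 1.5 * iqr      # 8.75
upper = q3 + 1.5 * iqr      # 14.75

# Anything outside the fences is flagged.
outliers = rates[(rates < lower) | (rates > upper)]
print(lower, upper)
print(outliers.tolist())    # [100]
```

The cells below follow the same three steps (quartiles, fences, filter) on each column of the real data.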

In [1]:
import pandas as pd

# Load the data from the CSV file
df = pd.read_csv('C:\\Users\\ihsankoo\\Downloads\\Liang_Cleaned.csv')

In [2]:
# Calculate IQR for Oil Production Rate
Q1_oil = df['Oil Production Rate'].quantile(0.25)
Q3_oil = df['Oil Production Rate'].quantile(0.75)
IQR_oil = Q3_oil - Q1_oil

# Define bounds for the anomalies
lower_bound_oil = Q1_oil - 1.5 * IQR_oil
upper_bound_oil = Q3_oil + 1.5 * IQR_oil

# Identify anomalies
anomalies_oil = df[(df['Oil Production Rate'] < lower_bound_oil) | (df['Oil Production Rate'] > upper_bound_oil)]

anomalies_oil[['Well Name', 'Date', 'Oil Production Rate']]


| | Well Name | Date | Oil Production Rate |
|---|---|---|---|
| 0 | P-1 | 1/1/2000 | 709.09 |
| 1 | P-1 | 2/1/2000 | 1024.24 |
| 2 | P-1 | 3/1/2000 | 957.58 |
| 3 | P-1 | 4/1/2000 | 481.82 |
| 4 | P-1 | 5/1/2000 | 448.48 |
| ... | ... | ... | ... |
| 354 | P-4 | 7/1/2004 | 431.52 |
| 355 | P-4 | 8/1/2004 | 431.39 |
| 359 | P-4 | 12/1/2004 | 537.23 |
| 360 | P-4 | 1/1/2005 | 518.49 |


* For the Oil Production Rate, the IQR method flags 103 anomalies: records whose rate falls below `lower_bound_oil` or above `upper_bound_oil`.

* The table above shows the well name, date, and oil production rate of the flagged records. The display is truncated, so only the first and last rows appear.

Next, I'll proceed to detect anomalies in the Water Production Rate.

In [3]:
# Calculate IQR for Water Production Rate
Q1_water = df['Water Production Rate'].quantile(0.25)
Q3_water = df['Water Production Rate'].quantile(0.75)
IQR_water = Q3_water - Q1_water

# Define bounds for the anomalies
lower_bound_water = Q1_water - 1.5 * IQR_water
upper_bound_water = Q3_water + 1.5 * IQR_water

# Identify anomalies
anomalies_water = df[(df['Water Production Rate'] < lower_bound_water) | (df['Water Production Rate'] > upper_bound_water)]

anomalies_water[['Well Name', 'Date', 'Water Production Rate']]


| | Well Name | Date | Water Production Rate |
|---|---|---|---|
| 2 | P-1 | 3/1/2000 | 2221.21 |
| 3 | P-1 | 4/1/2000 | 3478.79 |
| 5 | P-1 | 6/1/2000 | 1251.52 |
| 6 | P-1 | 7/1/2000 | 1366.67 |
| 7 | P-1 | 8/1/2000 | 1984.85 |
| ... | ... | ... | ... |
| 395 | P-4 | 12/1/2007 | 3056.36 |
| 396 | P-4 | 1/1/2008 | 3117.41 |
| 397 | P-4 | 2/1/2008 | 3143.88 |
| 398 | P-4 | 3/1/2008 | 2856.53 |


* For the Water Production Rate, the IQR method flags 189 anomalies: records whose rate falls below `lower_bound_water` or above `upper_bound_water`.

* The table above shows the well name, date, and water production rate of the flagged records. The display is truncated, so only the first and last rows appear.

<hr>

Because gas production is consistently zero, its IQR is zero and the rule has nothing to flag, so we'll skip anomaly detection for gas production.
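
As a quick sanity check of that decision, here is a minimal sketch on a synthetic all-zero column. The `gas` series is a hypothetical stand-in, not the real gas column from the CSV.

```python
import pandas as pd

# Synthetic stand-in for a column that is always zero (hypothetical data).
gas = pd.Series([0.0] * 12)

q1, q3 = gas.quantile(0.25), gas.quantile(0.75)
iqr = q3 - q1                                   # 0.0 for a constant column
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr   # both fences collapse to 0.0

# No value is strictly below or strictly above 0.0, so nothing is flagged.
flagged = gas[(gas < lower) | (gas > upper)]
print(iqr, len(flagged))
```

With both fences equal to the constant value, the strict comparisons can never be true, so running the IQR cell on such a column would return an empty table.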

Finally, let's detect anomalies in the Water Injection Rate.

In [4]:
# Calculate IQR for Water Injection Rate
Q1_injection = df['Water Injection Rate'].quantile(0.25)
Q3_injection = df['Water Injection Rate'].quantile(0.75)
IQR_injection = Q3_injection - Q1_injection

# Define bounds for the anomalies
lower_bound_injection = Q1_injection - 1.5 * IQR_injection
upper_bound_injection = Q3_injection + 1.5 * IQR_injection

# Identify anomalies
anomalies_injection = df[(df['Water Injection Rate'] < lower_bound_injection) | (df['Water Injection Rate'] > upper_bound_injection)]

anomalies_injection[['Well Name', 'Date', 'Water Injection Rate']]


| | Well Name | Date | Water Injection Rate |
|---|---|---|---|
| 403 | I-1 | 4/1/2000 | 3541.35 |
| 450 | I-1 | 3/1/2004 | 2827.29 |
| 451 | I-1 | 4/1/2004 | 2927.15 |
| 452 | I-1 | 5/1/2004 | 3126.97 |
| 499 | I-1 | 4/1/2008 | 2970.98 |


* For the Water Injection Rate, the IQR method flags 5 anomalies: records whose rate falls below `lower_bound_injection` or above `upper_bound_injection`.

* The table above lists the well name, date, and water injection rate of all five flagged records, every one of them from well I-1.
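
The three cells above repeat the same steps for each column. One way to factor them out is sketched below. The helper name `iqr_outliers`, its `k` parameter, and the `demo` frame are all assumptions introduced for illustration; the data is synthetic and only the column names mirror the notebook's.

```python
import pandas as pd

def iqr_outliers(frame: pd.DataFrame, column: str, k: float = 1.5) -> pd.DataFrame:
    """Return the rows of `frame` whose `column` value lies outside the IQR fences."""
    q1 = frame[column].quantile(0.25)
    q3 = frame[column].quantile(0.75)
    iqr = q3 - q1
    outside = (frame[column] < q1 - k * iqr) | (frame[column] > q3 + k * iqr)
    return frame[outside]

# Synthetic stand-in data (made-up values); column names mirror the notebook's.
demo = pd.DataFrame({
    "Well Name": ["P-1"] * 8,
    "Date": pd.date_range("2000-01-01", periods=8, freq="MS"),
    "Oil Production Rate": [100, 102, 98, 101, 99, 103, 97, 400],
    "Water Production Rate": [50, 52, 49, 51, 50, 48, 53, 51],
})

# Count flagged rows per column with a single loop instead of one cell each.
counts = {
    col: len(iqr_outliers(demo, col))
    for col in ["Oil Production Rate", "Water Production Rate"]
}
print(counts)  # {'Oil Production Rate': 1, 'Water Production Rate': 0}
```

On the real data, calling `iqr_outliers(df, 'Oil Production Rate')` should select the same rows as the corresponding cell above, since the logic is identical.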

### Summary:
* Using the IQR method, we detected 103 anomalies in the oil production rate, 189 in the water production rate, and 5 in the water injection rate.
* These anomalies could stem from equipment failures, operational changes, or other external factors affecting well performance. The IQR rule only flags unusual values; it does not identify their cause.
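
Note that the cells above compute a single set of fences per column across all wells pooled together. If wells have very different baseline rates, one possible refinement is to compute the fences per well. The sketch below is that variant, not the method used in this notebook; the `demo` frame and its values are synthetic and purely illustrative.

```python
import pandas as pd

# Synthetic two-well frame (made-up values) with very different baseline rates.
demo = pd.DataFrame({
    "Well Name": ["P-1"] * 6 + ["P-2"] * 6,
    "Oil Production Rate": [100, 101, 99, 102, 98, 300,
                            1000, 1010, 990, 1020, 980, 1005],
})

# Per-well quartiles, broadcast back to one value per row.
grouped = demo.groupby("Well Name")["Oil Production Rate"]
q1 = grouped.transform(lambda s: s.quantile(0.25))
q3 = grouped.transform(lambda s: s.quantile(0.75))
iqr = q3 - q1

# Each row is compared against the fences of its own well.
is_outlier = (demo["Oil Production Rate"] < q1 - 1.5 * iqr) | (
    demo["Oil Production Rate"] > q3 + 1.5 * iqr
)
print(demo[is_outlier])
```

In this toy frame only the 300 reading from P-1 is flagged, because each reading is judged against its own well's spread rather than the pooled one.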