## Background Information on Your Task

You are a **quantitative researcher** working with a commodity trading desk. Alex, a VP on the desk, wants to start trading **natural gas storage contracts**. However, the available market data must be of higher quality to enable the instrument to be priced accurately.

Alex has emailed asking you to **extrapolate the data** available from external feeds to provide more **granularity**, accounting for **seasonal trends** in the price across the **months of the year**.

To price the contract, we will need **historical data** and an **estimate of the future gas price** at any date.

---

### About Commodity Storage Contracts

Commodity storage contracts represent deals between **warehouse (storage) owners** and participants in the **supply chain** (refineries, transporters, distributors, etc.). 

The deal is typically an agreement to **store an agreed quantity** of a physical commodity (oil, natural gas, agricultural products) in a warehouse for a specified period of time.

#### Key Terms of Such Contracts:
- Periodic fees for storage
- Limits on withdrawals/injections of a commodity
- Injection date: when the commodity is purchased and stored
- Withdrawal date: when the commodity is withdrawn from storage and sold

A client could be anyone within the commodities supply chain:
- **Producers**
- **Refiners**
- **Transporters**
- **Distributors**
- **Firms (e.g., commodity trading firms, hedge funds)**

These clients aim to take advantage of **seasonal or intra-day price differentials** in physical commodities.

📌 **Example**: A firm may choose to **buy natural gas in summer** and **sell it in winter**, using underground storage to hold the inventory and profit from the seasonal price differences.
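The arithmetic behind such a trade can be sketched as below. This is only an illustration: the function name `storage_trade_profit` and every figure (prices, volume, fee, duration) are hypothetical, and a real contract would also account for the injection and withdrawal limits listed above.

```python
def storage_trade_profit(buy_price, sell_price, volume, monthly_fee, months_stored):
    """Gross margin of a buy-store-sell trade minus flat storage fees.

    Prices are per unit of gas, volume is in the same units, and
    monthly_fee is charged once for each month the gas stays in storage.
    """
    gross_margin = (sell_price - buy_price) * volume
    storage_cost = monthly_fee * months_stored
    return gross_margin - storage_cost


# Hypothetical figures: buy at 10.0 in summer, sell at 12.0 in winter,
# 1,000,000 units held for 6 months at a flat fee of 100,000 per month.
profit = storage_trade_profit(10.0, 12.0, 1_000_000, 100_000, 6)
print(profit)
```

The trade is only worthwhile when the seasonal price spread exceeds the cost of holding the inventory, which is why an accurate price estimate at both dates matters.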

---

## Your Task

After asking around for the source of the existing data, you learn the current process is:
- A **monthly snapshot** of prices from a market data provider
- Represents the **market price of natural gas delivered at the end of each calendar month**
- Data is available for roughly the **next 18 months**
- Combined with historical prices in a **time series database**

You gain access and **download the data in a CSV file**.

### What You Need to Do:
- Download the monthly natural gas price data
- Each point corresponds to the purchase price at the end of a month, from **31st October 2020** to **30th September 2024**
- **Analyze the data** to:
  - Estimate the **purchase price of gas** at any **past** date
  - **Extrapolate** it for **one year into the future**

🧠 Your code should:
- Take a **date as input**
- Return a **price estimate**
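
A minimal sketch of that interface, assuming simple linear interpolation between month-end points is acceptable for past dates. The function name `estimate_price` is hypothetical, and the five sample points are the first rows of `Nat_Gas.csv` as displayed by `df.head()`; the full series would be used in practice.

```python
from datetime import date

import numpy as np

# First five month-end prices from Nat_Gas.csv (see the df.head() output).
SAMPLE_DATES = [date(2020, 10, 31), date(2020, 11, 30), date(2020, 12, 31),
                date(2021, 1, 31), date(2021, 2, 28)]
SAMPLE_PRICES = [10.1, 10.3, 11.0, 10.9, 10.9]


def estimate_price(query, dates=SAMPLE_DATES, prices=SAMPLE_PRICES):
    """Linearly interpolate a price for `query` between known month-end points."""
    if not dates[0] <= query <= dates[-1]:
        raise ValueError("date outside the sampled range; extrapolation needs a model")
    # Convert dates to day counts so np.interp can work on plain numbers.
    x = [d.toordinal() for d in dates]
    return float(np.interp(query.toordinal(), x, prices))


print(estimate_price(date(2020, 11, 15)))
```

Interpolation only covers dates inside the observed range, so the sketch raises an error elsewhere rather than silently returning an edge value. Future dates need a separate trend or seasonal model.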

---

### Additional Guidance

Try to:
- **Visualize the data** to find patterns
- Consider factors causing price variation (e.g., **seasonal trends**)
- Ignore market holidays, weekends, and bank holidays
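
One possible way to capture a seasonal pattern is to fit a linear trend plus an annual sine/cosine term by least squares and evaluate it at future dates. The sketch below is an assumption about how this could be done, not the method prescribed by the task. The helper names `fit_trend_and_season` and `predict` are hypothetical, and the series is synthetic (generated from known parameters), not the real gas prices.

```python
import numpy as np


def fit_trend_and_season(t_years, prices):
    """Least-squares fit of price = a + b*t + c*sin(2*pi*t) + d*cos(2*pi*t)."""
    t = np.asarray(t_years, dtype=float)
    design = np.column_stack([np.ones_like(t), t,
                              np.sin(2 * np.pi * t), np.cos(2 * np.pi * t)])
    coeffs, *_ = np.linalg.lstsq(design, np.asarray(prices, dtype=float), rcond=None)
    return coeffs


def predict(coeffs, t_years):
    """Evaluate the fitted model at times given in years since the first sample."""
    t = np.asarray(t_years, dtype=float)
    a, b, c, d = coeffs
    return a + b * t + c * np.sin(2 * np.pi * t) + d * np.cos(2 * np.pi * t)


# SYNTHETIC monthly series (not the real gas prices): 48 points, known parameters.
t_obs = np.arange(48) / 12.0
synthetic = 10.0 + 0.5 * t_obs + 0.7 * np.cos(2 * np.pi * t_obs)

coeffs = fit_trend_and_season(t_obs, synthetic)
# Extrapolate 12 months past the last observation.
t_future = (47 + np.arange(1, 13)) / 12.0
forecast = predict(coeffs, t_future)
print(np.round(coeffs, 3))
```

Because the model is linear in its coefficients, an ordinary least-squares solve is enough and no iterative optimizer is needed. Whether a single annual harmonic fits the real data well should be checked against a plot of the series.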

📌 **Note**: This role often requires knowledge of:
- **Data analysis**
- **Machine learning**

## ✅ Step 1: Load & Inspect the Data

#### 🔹1.1. Import the necessary libraries

In [3]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

#### 🔹1.2. Load the data

In [4]:
# Load the CSV file
df = pd.read_csv("Nat_Gas.csv")

# Show the first few rows
df.head()

|   | Dates | Prices |
|---|----------|------|
| 0 | 10/31/20 | 10.1 |
| 1 | 11/30/20 | 10.3 |
| 2 | 12/31/20 | 11.0 |
| 3 | 1/31/21 | 10.9 |
| 4 | 2/28/21 | 10.9 |


#### 🔹1.3. Check data info and types

In [6]:
# Check the shape and column names
print("Shape:", df.shape)
print('-------------')
print("Columns:", df.columns)
print('-------------')
# Check data types and missing values
print(df.info())
print('-------------')
print(df.isnull().sum())


Shape: (48, 2)
-------------
Columns: Index(['Dates', 'Prices'], dtype='object')
-------------
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 48 entries, 0 to 47
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   Dates   48 non-null     object 
 1   Prices  48 non-null     float64
dtypes: float64(1), object(1)
memory usage: 896.0+ bytes
None
-------------
Dates     0
Prices    0
dtype: int64


#### 🔹1.4. Convert 'Dates' column to datetime

In [8]:
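# Convert Dates to datetime; the CSV uses month/day/two-digit-year (e.g. 10/31/20),
# so pass the format explicitly instead of letting pandas infer it
df['Dates'] = pd.to_datetime(df['Dates'], format='%m/%d/%y')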

# Sort by date just in case
df = df.sort_values('Dates').reset_index(drop=True)

# Quick summary stats
df.describe()



|   | Dates | Prices |
|-------|---------------------|-----------|
| count | 48 | 48.0 |
| mean | 2022-10-15 08:00:00 | 11.207083 |
| min | 2020-10-31 00:00:00 | 9.84 |
| 25% | 2021-10-23 06:00:00 | 10.65 |
| 50% | 2022-10-15 12:00:00 | 11.3 |
| 75% | 2023-10-07 18:00:00 | 11.625 |
| max | 2024-09-30 00:00:00 | 12.8 |
| std |  | 0.757897 |


In [None]:
# check data types again
df.dtypes

Dates     datetime64[ns]
Prices           float64
dtype: object
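
With the dates parsed, it may be worth confirming that every observation really falls on a month end and that no month is missing. The sketch below shows one way to do this. The helper `check_month_end_series` is a hypothetical name, and the five-row `sample` frame stands in for the full `df`, using the rows shown by `df.head()`.

```python
import pandas as pd

# First five rows of the CSV as shown by df.head(), used here as a stand-in for df.
sample = pd.DataFrame({
    "Dates": pd.to_datetime(["10/31/20", "11/30/20", "12/31/20", "1/31/21", "2/28/21"],
                            format="%m/%d/%y"),
    "Prices": [10.1, 10.3, 11.0, 10.9, 10.9],
})


def check_month_end_series(frame, date_col="Dates"):
    """Return (all_month_end, missing_months) for a datetime column."""
    dates = frame[date_col]
    all_month_end = bool(dates.dt.is_month_end.all())
    # Month-end calendar spanning the data; an offset object avoids string aliases.
    expected = pd.date_range(dates.min(), dates.max(), freq=pd.offsets.MonthEnd())
    missing = expected.difference(pd.DatetimeIndex(dates))
    return all_month_end, missing


ok, missing = check_month_end_series(sample)
print(ok, len(missing))
```

Calling `check_month_end_series(df)` on the full frame would run the same check over all 48 rows.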