# Natural Gas Usage Forecasting

### Problem Statement
Using the provided natural gas usage dataset (columns include year, month, area-name, process-name, series, series-description, and value in MMCF), develop a forecasting model that predicts future natural gas consumption. The model uses historical consumption patterns, broken down by region and process, to estimate how much gas will be used in a given month and year, region (area-name), and process (process-name).

### Importing Required Libraries


In [None]:
import pandas as pd
import numpy as np
from sklearn.model_selection import TimeSeriesSplit
from sklearn.preprocessing import OneHotEncoder
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error
from math import sqrt
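
The imports above suggest a pipeline that one-hot encodes the categorical columns, fits a random forest, and evaluates it on time-ordered splits. The following is a minimal sketch of how those pieces could fit together, not the notebook's actual model. It runs on synthetic data: the area names, the seasonal curve, and all values are made up for illustration, and the hyperparameters are arbitrary.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.model_selection import TimeSeriesSplit
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Synthetic stand-in for the real dataset (illustrative values only).
rng = np.random.default_rng(0)
dates = pd.date_range("2015-01-01", periods=48, freq="MS")
rows = []
for area in ["Area A", "Area B"]:
    for process in ["Residential", "Industrial"]:
        for d in dates:
            seasonal = 100 + 50 * np.cos(2 * np.pi * (d.month - 1) / 12)
            rows.append({"date": d, "year": d.year, "month": d.month,
                         "area-name": area, "process-name": process,
                         "value": seasonal + rng.normal(0, 5)})

# TimeSeriesSplit cuts by row position, so rows must be in date order.
demo = pd.DataFrame(rows).sort_values("date").reset_index(drop=True)
X = demo[["year", "month", "area-name", "process-name"]]
y = demo["value"]

# One-hot encode the categorical columns; pass year and month through as-is.
preprocess = ColumnTransformer(
    transformers=[("cat", OneHotEncoder(handle_unknown="ignore"),
                   ["area-name", "process-name"])],
    remainder="passthrough",
)
model = Pipeline([
    ("prep", preprocess),
    ("rf", RandomForestRegressor(n_estimators=50, random_state=0)),
])

# Each fold trains on earlier rows and tests on the rows that follow them.
fold_mae, fold_rmse = [], []
for train_idx, test_idx in TimeSeriesSplit(n_splits=3).split(X):
    model.fit(X.iloc[train_idx], y.iloc[train_idx])
    pred = model.predict(X.iloc[test_idx])
    fold_mae.append(mean_absolute_error(y.iloc[test_idx], pred))
    fold_rmse.append(np.sqrt(mean_squared_error(y.iloc[test_idx], pred)))

print("MAE per fold:", np.round(fold_mae, 2))
print("RMSE per fold:", np.round(fold_rmse, 2))
```

One caveat with this design: tree ensembles cannot extrapolate beyond the target range seen in training, so a raw `year` feature will not capture a continuing upward or downward trend.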

## Dataset Overview

### Dataset Details -1

Dataset Name : US Natural Gas Consumption Data

Source : https://www.kaggle.com/datasets/alistairking/natural-gas-usage (Kaggle, based on U.S. EIA data)

File Format : .csv

#### Dataset Description

This dataset contains monthly records of natural gas consumption across U.S. states and regions, categorized by sector such as Residential, Commercial, Industrial, and Electric Power. Each row represents the measured consumption value for a given month, region, and process type. The dataset is useful for analyzing long-term energy usage patterns, comparing regional demand, and building forecasting models for future consumption.

#### Feature Description

- year : Year of the record

- month : Month of the record

- duoarea : Short region/state identifier code

- area-name : Full name of the state/region

- product : Energy product code

- product-name : Product name (e.g., Natural Gas)

- process : Sector code for type of usage

- process-name : Consumption sector (Residential, Commercial, Industrial, Electric Power)

- series : Unique identifier for the data series

- series-description : Description of the data series (e.g., Natural Gas Deliveries to Residential Consumers)

- value : Monthly natural gas consumption in MMCF (million cubic feet)

- units : Units of measurement (MMCF)


### Dataset Details -2

Dataset Name : Natural Gas Pricing & Weather Data (Supplementary)

Source : Kaggle.com / U.S. EIA / NOAA (public energy & weather databases)

File Format : .csv

#### Dataset Description

This supplementary dataset combines average natural gas prices (per state/region) with weather indicators such as heating degree days (HDD) and cooling degree days (CDD). These external variables influence demand and are highly relevant for forecasting. For example, colder winters increase Residential & Commercial consumption, while higher prices can reduce usage.

#### Feature Description

- date : Month and year of observation

- state/region : Geographic area of record

- avg_price : Average natural gas price for the region (USD per MMCF)

- heating_degree_days (HDD) : Indicator of heating demand based on temperature below 65°F

- cooling_degree_days (CDD) : Indicator of cooling demand based on temperature above 65°F

- population_estimate : Optional demographic factor influencing demand
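
The degree-day features above follow a simple rule: each day contributes the number of degrees its mean temperature falls below (HDD) or rises above (CDD) the 65°F base, and the daily values are summed over the month. Here is a minimal sketch of that calculation, assuming daily mean temperatures are available. The `daily` frame, its column names, and its temperatures are hypothetical.

```python
import pandas as pd

BASE_TEMP_F = 65.0  # base temperature named in the feature description

# Hypothetical daily mean temperatures (°F), for illustration only.
daily = pd.DataFrame({
    "date": pd.to_datetime(["2024-01-01", "2024-01-02",
                            "2024-07-01", "2024-07-02"]),
    "mean_temp_f": [30.0, 50.0, 80.0, 65.0],
})

# Degrees below the base count toward heating, degrees above toward cooling.
daily["hdd"] = (BASE_TEMP_F - daily["mean_temp_f"]).clip(lower=0)
daily["cdd"] = (daily["mean_temp_f"] - BASE_TEMP_F).clip(lower=0)

# Sum the daily values into monthly totals to match the monthly usage data.
monthly = daily.groupby(daily["date"].dt.to_period("M"))[["hdd", "cdd"]].sum()
print(monthly)
```

A day at exactly 65°F contributes zero to both totals.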




Dataset 1 = Core consumption data (the forecasting target)

Dataset 2 = Supplementary drivers (weather, price, demographics)
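
To use both datasets together, the supplementary drivers have to be joined to the consumption records on month and region. The sketch below shows one way to do this, assuming the `state/region` values in Dataset 2 match the `area-name` values in Dataset 1 and that `date` holds month-start timestamps. Both frames contain made-up values.

```python
import pandas as pd

# Hypothetical slice of Dataset 1 (consumption).
usage = pd.DataFrame({
    "year": [2020, 2020],
    "month": [1, 2],
    "area-name": ["Area A", "Area A"],
    "process-name": ["Residential", "Residential"],
    "value": [120.0, 110.0],
})
usage["date"] = pd.to_datetime(usage[["year", "month"]].assign(day=1))

# Hypothetical slice of Dataset 2 (price and weather drivers).
drivers = pd.DataFrame({
    "date": pd.to_datetime(["2020-01-01", "2020-02-01"]),
    "state/region": ["Area A", "Area A"],
    "avg_price": [9.5, 9.8],
    "heating_degree_days": [900.0, 750.0],
    "cooling_degree_days": [0.0, 0.0],
})

# Align the region key, then left-join so every usage row is kept.
drivers = drivers.rename(columns={"state/region": "area-name"})
merged = usage.merge(drivers, on=["date", "area-name"],
                     how="left", validate="many_to_one")
print(merged)
```

The `validate="many_to_one"` argument raises an error if Dataset 2 has more than one row per month and region, which would otherwise silently duplicate consumption rows.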

### Loading the dataset

In [None]:
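# Load the consumption data
df = pd.read_csv("data.csv")
# Build a month-start timestamp from the year and month columns
df['date'] = pd.to_datetime(df['year'].astype(str) + '-' + df['month'].astype(str) + '-01')
# Order rows chronologically within each (region, process) series
df = df.sort_values(['area-name','process-name','date'])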
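
Sorting by region, process, and date puts each series in chronological order, which is what lag features need. The source does not show a feature-engineering step, so the following is only a sketch of one common approach on a small made-up frame. The column names `value_lag_1` and `value_roll_3` are hypothetical.

```python
import pandas as pd

# Made-up data: two regions, one process, four months each.
demo = pd.DataFrame({
    "area-name": ["Area A"] * 4 + ["Area B"] * 4,
    "process-name": ["Residential"] * 8,
    "date": list(pd.date_range("2020-01-01", periods=4, freq="MS")) * 2,
    "value": [10.0, 12.0, 9.0, 11.0, 20.0, 22.0, 19.0, 21.0],
}).sort_values(["area-name", "process-name", "date"])

# Group per series so lags never leak across regions or processes.
group = demo.groupby(["area-name", "process-name"])["value"]

# Previous month's consumption within the same series.
demo["value_lag_1"] = group.shift(1)

# Mean of the three preceding months; shifting first excludes the current month.
demo["value_roll_3"] = group.transform(lambda s: s.shift(1).rolling(3).mean())

print(demo)
```

The first rows of each series have no history, so their lag values are `NaN` and need to be dropped or imputed before model fitting.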


#### Explore and Understand the Data


In [None]:
print("----- INFO -----")
df.info()

print("\n----- MISSING VALUES -----")
print(df.isnull().sum())

print("\n----- DESCRIBE -----")
print(df.describe(include='all'))