# Analysing Natural Gas Production Data by Municipality in Alberta, Canada

## Loading the CSV file into a dataframe


In [46]:
import pandas as pd

df = pd.read_csv("gas_prod.csv")
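
Before going further, it can help to confirm that the file has the columns and types the rest of the notebook relies on. The sketch below is not part of the original notebook: it reads a small in-memory stand-in (two Drumheller rows copied from the `head()` output in the next section) instead of `gas_prod.csv`, and the names `sample_csv`, `expected_columns`, and `sample_df` are assumptions made for illustration.

```python
import io

import pandas as pd

# Hypothetical stand-in for gas_prod.csv: two rows copied from the head() output.
sample_csv = io.StringIO(
    "CSDUID,CSD,Period,IndicatorSummaryDescription,UnitOfMeasure,OriginalValue\n"
    "4805026,Drumheller,2003,Natural Gas Production,m3,104493.2\n"
    "4805026,Drumheller,2004,Natural Gas Production,m3,105486.4\n"
)

# Columns the later cells depend on, as reported by df.info().
expected_columns = [
    "CSDUID",
    "CSD",
    "Period",
    "IndicatorSummaryDescription",
    "UnitOfMeasure",
    "OriginalValue",
]

# Declaring dtypes up front makes a malformed file fail at load time.
sample_df = pd.read_csv(
    sample_csv,
    dtype={"CSDUID": "int64", "Period": "int64", "OriginalValue": "float64"},
)

missing = set(expected_columns) - set(sample_df.columns)
if missing:
    raise ValueError(f"Missing expected columns: {sorted(missing)}")

print(sample_df.dtypes)
```

With the real file, the same check would be applied to `df` right after `pd.read_csv("gas_prod.csv")`.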

## Taking a first look at the dataframe

In [47]:
# Getting the first 5 records in the data frame using the head method
print(df.head())

# Getting the last 5 records in the data frame using the tail method
print(df.tail())

    CSDUID         CSD  Period IndicatorSummaryDescription UnitOfMeasure  \
0  4805026  Drumheller    2003      Natural Gas Production            m3   
1  4805026  Drumheller    2004      Natural Gas Production            m3   
2  4805026  Drumheller    2005      Natural Gas Production            m3   
3  4805026  Drumheller    2006      Natural Gas Production            m3   
4  4805026  Drumheller    2007      Natural Gas Production            m3   

   OriginalValue  
0       104493.2  
1       105486.4  
2       130930.0  
3       128564.0  
4       124354.0  
       CSDUID                CSD  Period IndicatorSummaryDescription  \
1642  4814003  Yellowhead County    2019      Natural Gas Production   
1643  4814003  Yellowhead County    2020      Natural Gas Production   
1644  4814003  Yellowhead County    2021      Natural Gas Production   
1645  4814003  Yellowhead County    2022      Natural Gas Production   
1646  4814003  Yellowhead County    2023      Natural Gas Production 
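
The printed frames above are split into two blocks because the six columns do not fit within pandas' default display width. A minimal sketch of one way to keep each row on a single line, assuming a temporary display setting is acceptable; `sample_df` is a hypothetical stand-in built from the first two rows shown above, and the width of 200 is an arbitrary choice.

```python
import pandas as pd

# Hypothetical stand-in for df: the first two rows from the head() output.
sample_df = pd.DataFrame(
    {
        "CSDUID": [4805026, 4805026],
        "CSD": ["Drumheller", "Drumheller"],
        "Period": [2003, 2004],
        "IndicatorSummaryDescription": ["Natural Gas Production"] * 2,
        "UnitOfMeasure": ["m3", "m3"],
        "OriginalValue": [104493.2, 105486.4],
    }
)

# option_context changes the display settings only inside the with-block.
with pd.option_context("display.width", 200, "display.max_columns", None):
    wide_text = repr(sample_df)

print(wide_text)
```

Because the settings are scoped to the `with` block, the rest of the notebook keeps its default display behaviour.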

## Understanding the dataset

In [48]:
# Getting the shape of the dataset
df_shape = df.shape
print(f"The dataset has {df_shape[0]} rows and {df_shape[1]} columns")

print()

# Getting the information about each column in the dataset
# (info() prints its report directly and returns None, so the output ends with "None")
print(df.info())

print()

# Getting the statistical summary for the numerical columns in the dataset
print(df.describe())

The dataset has 1647 rows and 6 columns

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1647 entries, 0 to 1646
Data columns (total 6 columns):
 #   Column                       Non-Null Count  Dtype  
---  ------                       --------------  -----  
 0   CSDUID                       1647 non-null   int64  
 1   CSD                          1647 non-null   object 
 2   Period                       1647 non-null   int64  
 3   IndicatorSummaryDescription  1647 non-null   object 
 4   UnitOfMeasure                1647 non-null   object 
 5   OriginalValue                1647 non-null   float64
dtypes: float64(1), int64(2), object(3)
memory usage: 77.3+ KB
None

             CSDUID       Period  OriginalValue
count  1.647000e+03  1647.000000   1.647000e+03
mean   4.810299e+06  2012.839709   1.808997e+06
std    5.317912e+03     6.094421   4.217402e+06
min    4.801003e+06  2003.000000   0.000000e+00
25%    4.806001e+06  2007.000000   1.579735e+05
50%    4.811001e+06  2013.000000
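
`describe()` summarises only the numeric columns by default, so it does not show how many municipalities the data covers or which units are used. A small sketch of complementary checks follows. It uses a hypothetical miniature frame, `sample_df`, in which the Drumheller rows come from the `head()` output and the "Example County" rows are invented placeholders.

```python
import pandas as pd

# Hypothetical miniature of the dataset; "Example County" values are made up.
sample_df = pd.DataFrame(
    {
        "CSD": ["Drumheller", "Drumheller", "Example County", "Example County"],
        "Period": [2003, 2004, 2003, 2004],
        "UnitOfMeasure": ["m3"] * 4,
        "OriginalValue": [104493.2, 105486.4, 0.0, 60000.0],
    }
)

# Number of distinct municipalities and the range of years covered.
n_municipalities = sample_df["CSD"].nunique()
first_year = sample_df["Period"].min()
last_year = sample_df["Period"].max()

# Distinct units, to confirm that values can be summed directly.
units = sample_df["UnitOfMeasure"].unique().tolist()

# Rows reporting zero production, which pull totals and means down.
zero_rows = int((sample_df["OriginalValue"] == 0).sum())

print(f"{n_municipalities} municipalities, {first_year} to {last_year}")
print(f"Units: {units}; rows with zero production: {zero_rows}")
```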

## Analysing the dataset

In [49]:
# Grouping the dataset by the CSD column and summing gas production across all years
df_grouped_csd = df.groupby("CSD")["OriginalValue"].sum()
print(df_grouped_csd)

# Sorting the municipalities by total production, highest first
print(df_grouped_csd.sort_values(ascending=False))

print("")

# Getting the total production of Drumheller
# (single quotes inside the f-string keep this valid on Python versions before 3.12)
print(f"The total production of Drumheller is {df_grouped_csd['Drumheller']}")

print("")


# Filtering the data for Edmonton using boolean indexing
edmonton_df = df[df["CSD"] == "Edmonton"]
print(edmonton_df)

print("")

# Getting the maximum annual production recorded for Edmonton
edmonton_df["OriginalValue"].max()


CSD
Acadia No. 34                884337.1
Athabasca County            5720533.9
Barrhead County No. 11      5690824.4
Beaver County              11734793.8
Big Lakes County           10507574.2
                             ...     
Wheatland County           78249042.6
Willow Creek No. 26        12169996.7
Wood Buffalo               26404812.0
Woodlands County           31260534.9
Yellowhead County         455328557.5
Name: OriginalValue, Length: 83, dtype: float64
CSD
Greenview No. 16                                   543320505.1
Yellowhead County                                  455328557.5
Clearwater County                                  276229168.8
Saddle Hills County                                146473522.5
Grande Prairie County No. 1                        125766928.2
                                                      ...     
Fort Saskatchewan                                          0.0
Improvement District No. 12 Jasper Park                    0.0
Improvement District N

16594.0
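
Sorting the totals and taking a column maximum give the values, but not the municipality name or the year they belong to. A minimal sketch of how `idxmax` and `nlargest` could recover both, assuming the same column names as the notebook. The frame `sample_df` is a hypothetical stand-in: the Drumheller rows are copied from the `head()` output, while the "Example County" rows are invented placeholders.

```python
import pandas as pd

# Hypothetical stand-in for df; only the Drumheller values come from the data.
sample_df = pd.DataFrame(
    {
        "CSD": ["Drumheller"] * 5 + ["Example County"] * 2,
        "Period": [2003, 2004, 2005, 2006, 2007, 2003, 2004],
        "OriginalValue": [
            104493.2, 105486.4, 130930.0, 128564.0, 124354.0,  # Drumheller
            50000.0, 60000.0,  # placeholder values
        ],
    }
)

totals = sample_df.groupby("CSD")["OriginalValue"].sum()

# idxmax returns the index label (here the municipality name) of the largest total.
top_name = totals.idxmax()

# nlargest returns the n largest totals, already sorted in descending order.
top_two = totals.nlargest(2)

# For a single municipality, idxmax on the filtered column gives the row label
# of its peak year, which .loc then turns back into the full record.
drumheller_df = sample_df[sample_df["CSD"] == "Drumheller"]
peak_row = sample_df.loc[drumheller_df["OriginalValue"].idxmax()]

print(f"Top municipality: {top_name}")
print(top_two)
print(f"Drumheller peak: {peak_row['OriginalValue']} in {peak_row['Period']}")
```

Applied to the full dataset, the same pattern with `"Edmonton"` in place of `"Drumheller"` would show which year the maximum printed above belongs to.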

## Summary

In this analysis, we explored natural gas production data by municipality in Alberta, Canada. The dataset holds 1,647 annual records for 83 municipalities, starting in 2003, with no missing values in any of its six columns. We loaded the data, inspected its structure and summary statistics, and grouped it by municipality (CSD) to rank total production. Greenview No. 16 (543,320,505.1) and Yellowhead County (455,328,557.5) have the highest totals in the OriginalValue column, while some municipalities, such as Fort Saskatchewan, total 0.0. We also looked at two specific cases: the total for Drumheller and the maximum annual value for Edmonton (16,594.0). Together, these steps show how unevenly production is spread across municipalities and identify the high-output areas that merit closer study.

## Natural Gas Production by Municipality (Alberta Open Data)

This project uses data from the [Natural Gas Production by Municipality dataset](https://open.alberta.ca/opendata/natural-gas-production-by-municipality#detailed), published by the Government of Alberta through its Open Government program.

---

## License: Open Government Licence – Alberta (Version 2.1)

You are encouraged to use the data available under this licence, which grants you a **worldwide, royalty-free, perpetual, non-exclusive licence** to use the Information, including for commercial purposes, with only a few conditions.

### You are free to:
- Copy, modify, publish, translate, adapt, distribute, or otherwise use the information in any medium or format for any lawful purpose.

### You must:
- Acknowledge the source of the information by including the attribution statement specified by the Information Provider.
- If no specific statement is provided, use the following attribution:  
  **"Contains information licensed under the Open Government Licence – Alberta."**
- Where possible, provide a link to the [licence page](https://open.alberta.ca/licence).

### This licence **does not** grant rights to:
- Personal information;
- Records not accessible under applicable laws;
- Third-party rights not licensed by the Information Provider;
- Official symbols (e.g., names, crests, logos);
- Information under other IP rights (e.g., patents, trademarks).

### Other conditions:
- No endorsement: Do not imply any official status or endorsement by the Information Provider.
- No warranty: The data is provided "as is" without warranty of any kind.
- Governing law: This licence is governed by the laws of Alberta and Canada. Legal proceedings may only be brought in Alberta.

For complete legal terms, visit the official licence page:  
👉 [Open Government Licence – Alberta](https://open.alberta.ca/licence)

---
**Attribution Statement:**  
*Contains information licensed under the Open Government Licence – Alberta.*
