# Pandas for time series

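`pandas` has strong built-in support for time series: datetime indexes, date-based slicing, and resampling. This notebook walks through each of these using real production data.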

## Read online dataset

Let's read some oil and gas production data. I'm using: https://factpages.sodir.no/en/field/TableView/Production/Saleable/Monthly

In [None]:
url = "https://factpages.sodir.no/public?/Factpages/external/tableview/field_production_monthly&rs:Command=Render&rc:Toolbar=false&rc:Parameters=f&IpAddress=not_used&CultureCode=en&rs:Format=CSV&Top100=false"

In [None]:
import pandas as pd

df = pd.read_csv(url)

df.head()

<div style="background: #e0ffe0; border: solid 2px #d0f0d0; border-radius:3px; padding: 1em; color: darkgreen">

<h3>Exercise</h3>

- How many rows are there in this dataframe?
- How many fields are represented? (Look at the column called `'prfInformationCarrier'`)
- How many years of data are there?
- What is the total production? (Look at the column called `'prfPrdOeNetMillSm3'`)
</div>
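If you get stuck, here is a minimal sketch of the kind of calls that answer these questions. It runs on a tiny made-up frame called `toy`: only the column names come from the real dataset, and all the values are invented.

```python
import pandas as pd

# Made-up miniature of the production table; only the column names
# match the real dataset, the values are invented for illustration.
toy = pd.DataFrame({
    "prfInformationCarrier": ["ALPHA", "ALPHA", "BETA", "BETA"],
    "prfYear": [2001, 2002, 2001, 2001],
    "prfMonth": [12, 1, 11, 12],
    "prfPrdOeNetMillSm3": [0.5, 0.25, 1.0, 0.25],
})

n_rows = len(toy)                                  # number of rows
n_fields = toy["prfInformationCarrier"].nunique()  # distinct field names
n_years = toy["prfYear"].nunique()                 # distinct calendar years
total_oe = toy["prfPrdOeNetMillSm3"].sum()         # sum of the OE column

print(n_rows, n_fields, n_years, total_oe)
```

The same calls should work on `df` once it has loaded.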

## Rename some columns

<div style="background: #e0ffe0; border: solid 2px #d0f0d0; border-radius:3px; padding: 1em; color: darkgreen">

<h3>Exercise</h3>

Rename some of the columns of the dataframe as follows:

    'prfYear': 'year'
    'prfMonth': 'month'
    'prfInformationCarrier': 'field'
    'prfPrdOilNetMillSm3': 'oil'
    'prfPrdOeNetMillSm3': 'OE'
    'prfPrdProducedWaterInFieldMillSm3': 'water'
</div>
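One way to do this is `DataFrame.rename()` with a dictionary mapping old names to new ones. The sketch below applies it to a small made-up frame (`toy`, with invented values) rather than the real data.

```python
import pandas as pd

# Made-up frame with the original column names; values are invented.
toy = pd.DataFrame({
    "prfInformationCarrier": ["ALPHA", "BETA"],
    "prfYear": [2001, 2001],
    "prfMonth": [1, 1],
    "prfPrdOilNetMillSm3": [0.5, 1.0],
    "prfPrdOeNetMillSm3": [0.75, 1.5],
    "prfPrdProducedWaterInFieldMillSm3": [0.1, 0.2],
})

# Map each old column name to its new, shorter name.
new_names = {
    "prfYear": "year",
    "prfMonth": "month",
    "prfInformationCarrier": "field",
    "prfPrdOilNetMillSm3": "oil",
    "prfPrdOeNetMillSm3": "OE",
    "prfPrdProducedWaterInFieldMillSm3": "water",
}

# rename() returns a new dataframe, so assign the result back.
toy = toy.rename(columns=new_names)

print(toy.columns.tolist())
```

Columns that are not in the dictionary keep their original names.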

## Add a datetime

We'd like to give this dataframe a **datetime** index with `pandas` datetimes. To do this easily, we need:

- EITHER columns named like `'year'`, `'month'`, `'day'`
- OR a column with a datetime string like `2019-06-30`.

In this dataframe, we have the former, so let's work with that.

<div style="background: #e0ffe0; border: solid 2px #d0f0d0; border-radius:3px; padding: 1em; color: darkgreen">

<h3>Exercise</h3>

- Make a column for the **day**, using a constant like 1.
- Make a datetime column called `'ds'` (for 'date stamp') using `pd.to_datetime()`, passing in a dataframe consisting of the three columns for year, month and the day you just made.
- Finally, to turn the new column into an index, give its name (`'ds'`) to `df.set_index()`.
</div>

You should end up with a new dataframe with the `'ds'` column as an index.

In [None]:
df.head()
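For reference, the three steps in the exercise might look like the sketch below. It uses a small made-up frame (`toy`, with invented values) that already has `'year'` and `'month'` columns.

```python
import pandas as pd

# Made-up frame with year and month columns; values are invented.
toy = pd.DataFrame({
    "year": [2019, 2019, 2020],
    "month": [11, 12, 1],
    "oil": [0.5, 0.25, 1.0],
})

# Step 1: a constant day, so every record lands on the 1st of its month.
toy["day"] = 1

# Step 2: to_datetime() assembles datetimes from year, month and day columns.
toy["ds"] = pd.to_datetime(toy[["year", "month", "day"]])

# Step 3: promote the new column to the row index.
toy = toy.set_index("ds")

print(toy.index)
```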

## Simplify the dataframe

Before we carry on, let's simplify the dataframe a bit, reducing it to a few columns: **field**, **water**, **other**, and **oil** (the order is a slightly cheaty way to get the colours I want on the charts, without having to fiddle with them).

In [None]:
df['other'] = df.OE - df.oil
df = df.drop('OE', axis=1)
df = df[['field', 'water', 'other', 'oil']]

In [None]:
df.head()

A datetime index is very useful, and now it does make sense to slice directly on the row index, using date strings. Even the "includes end" rule of label-based slicing helps here: a slice ending at `'1971'` runs to the end of 1971, not the start.

In [None]:
df[:'1971']
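To see the end-inclusive behaviour in isolation, here is a small sketch on a made-up monthly frame (`toy` is hypothetical). It uses `.loc`, the explicit spelling of label-based slicing. The toy index is sorted; on an unsorted datetime index this kind of slice can fail, and calling `sort_index()` first is one option.

```python
import pandas as pd

# Made-up monthly index covering three years, 1970 to 1972.
index = pd.date_range("1970-01-01", "1972-12-01", freq="MS")
toy = pd.DataFrame({"oil": range(len(index))}, index=index)

# A partial date string selects on the index. The end label is included,
# so this keeps every month of 1970 and 1971.
early = toy.loc[:"1971"]

print(len(early), early.index[-1])
```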

## Time series with `pandas`

`pandas` knows all about time series. With a datetime index in place, `.plot()` puts time on the x-axis automatically, so we can easily make a time series plot:

In [None]:
df.loc[df['field']=='TROLL', 'oil'].plot()

Let's make a dataframe of only the TROLL field.

In [None]:
troll = df[df.field=='TROLL']

We can easily stretch the plot out with `figsize`, and plotting the whole dataframe draws one line per numeric column:

In [None]:
troll.plot(figsize=(15, 3))

<div style="background: #e0ffe0; border: solid 2px #d0f0d0; border-radius:3px; padding: 1em; color: darkgreen">

<h3>Exercise</h3>

- Plot the production for Troll in the period 2005 to 2010 inclusive.
- Narrow the time range to June 2005 to June 2008.
</div>

## Resampling

Let's get the summed annual production for the Troll field:

In [None]:
troll.drop(columns=['field']).loc['2010':'2018'].resample('YE').sum()

# Try omitting the `drop` if you're curious why it's there.

Throw `.plot()` on the end:

In [None]:
troll.loc['1995':'2018'].resample('YE').sum().plot()

Or we can get totals for *ALL* fields in the database:

In [None]:
df.loc['2000':'2018'].resample('YE').sum().plot()

<div style="background: #e0ffe0; border: solid 2px #d0f0d0; border-radius:3px; padding: 1em; color: darkgreen">

<h3>Exercise</h3>

Let's look at the contribution Troll has made to production on the Norwegian continental shelf (NCS) since 1993.

- Use `df.loc` with `...plot()` to plot the gross annual production from 1990 to 2024.
- Add another line plotting the gross production _but without the Troll field_ over the same period.
</div>

----

&copy; 2025 Matt Hall / Equinor, licensed CC-BY