## Chapter 6: Plotting DataFrames

DataFrames in Pandas come with built-in plotting capabilities that make it easy to visualize data directly. Under the hood, the `DataFrame.plot()` method is a lightweight wrapper around Matplotlib, so you can quickly create a variety of plots, such as line graphs, bar charts, and area plots, with simple method calls. This makes it convenient to explore and present data without manually setting up a full Matplotlib workflow.
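As a minimal sketch of this wrapper relationship, the snippet below plots a small made-up DataFrame (the values and column names are illustrative assumptions, not Factpages data) and checks that `plot()` hands back an ordinary Matplotlib `Axes` object, which can then be adjusted with the usual Matplotlib methods:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs outside a notebook
import matplotlib.axes
import pandas as pd

# Made-up monthly values, for illustration only
demo = pd.DataFrame(
    {"oil": [1.0, 1.2, 0.9, 1.1], "gas": [0.4, 0.5, 0.45, 0.6]},
    index=pd.date_range("2020-01-01", periods=4, freq="MS"),
)

# DataFrame.plot() delegates to Matplotlib and returns a Matplotlib Axes
ax = demo.plot(kind="line", title="Demo production")

# Because it is a regular Axes, the usual Matplotlib methods apply
ax.set_ylabel("Volume")
ax.set_ylim(0, 1.5)

print(isinstance(ax, matplotlib.axes.Axes))
```

In a notebook the backend line can be dropped, since the figure is rendered inline.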

Let’s look at an example using the Factpages data on monthly field production from the NCS. We’ll start by reading the data with our `factpages` package:

In [None]:
# import libraries required for the notebook
import pandas as pd # data analysis library
import hvplot.pandas # adds the .hvplot accessor to DataFrames
import factpages as fp # import our Factpages package

In [None]:
# read monthly production data of ConocoPhillips fields
fields = ["EKOFISK", "ELDFISK", "TOMMELITEN GAMMA", 
          "TOR", "VEST EKOFISK", "ALBUSKJELL", "VALHALL", 
          "HOD", "TOMMELITEN A"]

field = fp.Field() # create field object

df = field.monthly_production(fields) # read data

df.head() # show first 5 rows

Before we can plot the data, we need to do some preparation. The code below:

1. Renames the columns to more meaningful names, including year and month.

2. Creates a `Date` column by combining the `year` and `month` columns.

3. Drops unnecessary columns.

4. Sets the `Date` column as the index. This is important for plotting the production data over time.

5. Checks the result.
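To see the five steps in isolation, here is a minimal sketch on a tiny made-up table. The column names `field`, `yr`, `mo`, `oil`, and `id` are hypothetical stand-ins and do not match the real Factpages columns:

```python
import pandas as pd

# Made-up rows standing in for the Factpages table (hypothetical column names)
raw = pd.DataFrame({
    "field": ["A", "A", "A"],
    "yr": [2020, 2020, 2020],
    "mo": [1, 2, 3],
    "oil": [1.0, 1.2, 0.9],
    "id": [101, 101, 101],
})

# 1. Rename columns; pd.to_datetime expects columns called "year" and "month"
prepared = raw.rename(columns={"yr": "year", "mo": "month", "oil": "Oil MSm3"})

# 2. Build a Date column; assign(day=1) supplies the day that to_datetime requires
prepared["Date"] = pd.to_datetime(prepared[["year", "month"]].assign(day=1))

# 3. Drop columns that are no longer needed
prepared = prepared.drop(columns=["year", "month", "id"])

# 4. Use Date as the index so that plots get a time axis
prepared = prepared.set_index("Date")

# 5. Check the result
print(prepared.head())
```

This sketch reassigns the result of each step instead of using `inplace=True`; both styles give the same table.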

In [None]:
# Make a full copy first to avoid SettingWithCopyWarning
df = df.copy()

# get the column names
columns = df.columns.to_list()

# Rename columns
df.rename(columns={columns[1]:"year", columns[2]:"month",
                   columns[3]:"Oil MSm3", columns[4]:"Gas BSm3",
                   columns[5]:"NGL MSm3", 
                   columns[6]:"Condensate MSm3",
                   columns[7]:"Oil eq. MSm3", 
                   columns[8]:"Water MSm3"}, 
                   inplace=True)

# Make a Date column from the year and month columns
df["Date"] = pd.to_datetime(df[["year", "month"]].assign(day=1))

# Drop the unnecessary columns 
df.drop(columns=["year", "month", columns[9]], inplace=True)

# Set the Date column as index
df.set_index("Date", inplace=True)

# Check the result
df.head()

We now have the data in the form we need. Let’s plot the production from the Ekofisk field. Just a few lines of code will do the job:

In [None]:
# extract data for a specific field
field = "EKOFISK" 
df_field = df[df["prfInformationCarrier"] == field] 

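# plot the production data, one area subplot per column
axs = df_field.plot(kind="area", subplots=True, figsize=(12, 9))

# use the same y-axis range in every subplot,
# from 0 to the maximum of the oil equivalent column
max_value = df_field["Oil eq. MSm3"].max()
for ax in axs:
    ax.set_ylim(0, max_value)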

While the basic `plot()` method gives us quick and useful static plots, we can create richer, interactive visualizations using [hvplot](https://hvplot.holoviz.org), a library that extends DataFrame plotting with interactive features such as hover tooltips, zooming, and panning. Let’s plot the Ekofisk production data using hvplot:

In [None]:
# line plot with hvplot

# names of columns to plot
columns = df_field.columns.to_list()
columns.pop(0) # remove the first element (the field name column)

df_field.hvplot.line(x="Date", y=columns, 
                 title=f"Production data for {field}", 
                 width=800, height=400).opts(ylim=(0, max_value))

Notice how the data values are displayed when you hover the cursor over the curves. You can also toggle the curves on and off by clicking the legend entries.

Creating an area plot is just as straightforward:

In [None]:
# area plot
is_stacked = True # stacked area or not

if is_stacked:
    y_max = max_value * 2
    alpha = 1.0
else:
    y_max = max_value
    alpha = 0.4 

df_field.hvplot.area(x="Date", y=columns, 
                 stacked=is_stacked, alpha=alpha,
                 title=f"Production data for {field}", 
                 width=800, height=400).opts(ylim=(0, y_max))

With `is_stacked = True`, this code generates a stacked area plot. Setting `is_stacked` to `False` produces a non-stacked area plot instead, in which the semi-transparent areas overlap.
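The factor of 2 applied to `max_value` in the stacked case is a rough allowance for the extra height of the stack. One possible alternative, shown here as a sketch on made-up numbers and not as the method used in this chapter, is to derive the limit from the data: the top of a stacked plot is the largest row-wise sum, while the top of an overlaid plot is the largest single value.

```python
import pandas as pd

# Made-up monthly values, for illustration only
demo = pd.DataFrame(
    {
        "Oil": [1.0, 1.5, 1.25],
        "Gas": [0.5, 0.75, 1.0],
        "Water": [0.25, 0.5, 1.0],
    },
    index=pd.date_range("2020-01-01", periods=3, freq="MS"),
)

# Overlaid areas: the tallest single value sets the upper limit
y_max_overlaid = demo.max().max()

# Stacked areas: the tallest row-wise sum sets the upper limit
y_max_stacked = demo.sum(axis=1).max()

print(y_max_overlaid, y_max_stacked)  # 1.5 3.25
```

The same two expressions could be applied to the numeric columns of `df_field`, provided only the columns that are actually plotted are included in the sum.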

## Exercise 1

Modify the `plot_logs()` function in the [utilities](plot_utilities/utilities.py) module to:

- Allow the function to accept a list of two log names (e.g. `["PEF", "CALI"]`) to be plotted in the first two subplots. This makes it possible to visualize any logs of interest in these subplots.

- Accept a list of x-axis limits for each log.

## Exercise 2

Modify the `update_plot()` function in notebook [chapter6_3.ipynb](chapter6_3.ipynb) to include four subplots. The fourth subplot should display the trace at the selected inline and xline.

## Exercise 3

The file [xeek_train_subset.csv](../data/xeek_train_subset.csv) contains the east coordinates (`X_LOC`) and north coordinates (`Y_LOC`) of the 12 wells included in the file. Plot the wells on a map, showing both their locations and names.

In [None]:
# Do Exercise 3 here

## Exercise 4

Seismic sections often exaggerate the vertical scale relative to the horizontal scale. This exaggeration is quantified by the vertical exaggeration factor, $V$. This factor affects bedding dip ($\delta$) and thickness ($t$) as follows:

$$\tan\delta'=V\tan\delta$$

$$\frac{t'}{t}=\frac{\sin\delta'}{\sin\delta}$$

where $\delta'$ and $t'$ are the exaggerated bedding dip and thickness, respectively.
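As a quick numerical check of the two formulas before plotting, the sketch below evaluates them at a single point ($\delta = 30°$, $V = 2$). The function names `exaggerated_dip` and `thickness_ratio` are hypothetical helpers introduced here for illustration, and the sketch assumes a dip strictly between 0° and 90°:

```python
import math

def exaggerated_dip(delta_deg, v):
    """Exaggerated dip in degrees, from tan(delta') = V * tan(delta)."""
    return math.degrees(math.atan(v * math.tan(math.radians(delta_deg))))

def thickness_ratio(delta_deg, v):
    """Normalised thickness t'/t = sin(delta') / sin(delta); needs a non-zero dip."""
    delta_exag = exaggerated_dip(delta_deg, v)
    return math.sin(math.radians(delta_exag)) / math.sin(math.radians(delta_deg))

dip = 30.0  # unexaggerated dip in degrees
v = 2.0     # vertical exaggeration factor

print(round(exaggerated_dip(dip, v), 1))  # 49.1
print(round(thickness_ratio(dip, v), 2))  # 1.51
```

With $V = 1$ both functions leave the bed unchanged: the dip stays the same and the thickness ratio is 1.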

1. Plot unexaggerated dip ($\delta$, in degrees) versus exaggerated dip ($\delta'$, in degrees) for $V$ = 0.5, 1, 2, 3, 4, 6 and 10.

2. Plot unexaggerated dip ($\delta$, in degrees) versus normalised exaggerated thickness ($t'/t$) for $V$ = 0.5, 1, 2, 3, 4, 6 and 10.

In [None]:
# Do Exercise 4 here