# **Capstone Project - LNG Trading**

## Objectives

The objective of this notebook is to analyse data from the International Energy Agency (IEA) and US Energy Information Administration (EIA) to inform trading strategies for a US-based investment firm with a mandate to invest in energy and commodities, and a stated focus on liquefied natural gas (LNG).

## Inputs

* The IEA gas trade flows export `Export_GTF_IEA_202412 - GTF_data.csv`
* The Python libraries NumPy and pandas

## Outputs

* Write here which files, code or artefacts you generate by the end of the notebook 

## Additional Comments

* If you have any additional comments that don't fit in the previous bullets, please state them here. 



---

# Change working directory

* We are assuming you will store the notebooks in a subfolder, therefore when running the notebook in the editor, you will need to change the working directory

We need to change the working directory from its current folder to its parent folder
* We access the current directory with os.getcwd()

In [None]:
import os
current_dir = os.getcwd()
current_dir

We want to make the parent of the current directory the new current directory
* os.path.dirname() gets the parent directory
* os.chdir() sets the new current directory

In [None]:
os.chdir(os.path.dirname(current_dir))
print("You set a new current directory")

Confirm the new current directory

In [None]:
current_dir = os.getcwd()
current_dir
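The same step can also be written with `pathlib`. The following is a minimal sketch rather than part of the original template, and the helper name `move_to_parent` is hypothetical.

```python
import os
from pathlib import Path


def move_to_parent() -> Path:
    """Make the parent of the current working directory the new working directory."""
    parent = Path.cwd().parent  # equivalent to os.path.dirname(os.getcwd())
    os.chdir(parent)
    return Path.cwd()


if __name__ == "__main__":
    print(f"New working directory: {move_to_parent()}")
```

Note that running it more than once keeps moving up one level each time, which is the same behaviour as the cells above.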

# Section 1

To analyse the international gas flows dataset from the IEA, we import NumPy and pandas, read the CSV file into a DataFrame, and print it to confirm that it loaded correctly.

In [5]:
import numpy as np
import pandas as pd
df = pd.read_csv('/Users/saad/Documents/vscode-projects/capstone/Export_GTF_IEA_202412 - GTF_data.csv')
print(df)

         Borderpoint                   Exit        Entry  MAXFLOW (Mm3/h)  \
0       Adriatic LNG  Liquefied Natural Gas        Italy              1.1   
1            Almeria                Algeria        Spain              1.3   
2         Alveringem                Belgium       France              1.1   
3         Alveringem                 France      Belgium              1.1   
4            Badajoz               Portugal        Spain              0.3   
..               ...                    ...          ...              ...   
257    Zelzate (GTS)                Belgium  Netherlands              NaN   
258    Zelzate (GTS)            Netherlands      Belgium              NaN   
259  Zelzate (Zebra)                Belgium  Netherlands              NaN   
260         Zevenaar                Germany  Netherlands              NaN   
261         Zevenaar            Netherlands      Germany              1.9   

     Oct-08  Nov-08  Dec-08  Jan-09  Feb-09  Mar-09  ...  Mar-24  Apr-24  \
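Because the working directory was changed earlier, the file could also be read through a relative path instead of an absolute, machine-specific one. The sketch below shows one way to do that under stated assumptions: the helper `load_gas_flows` and the file name `sample_gtf.csv` are hypothetical, and the demonstration uses only the first two rows and four columns visible in the output above.

```python
import tempfile
from pathlib import Path

import pandas as pd


def load_gas_flows(csv_path) -> pd.DataFrame:
    """Read the gas flows CSV, failing early with a clear message if it is missing."""
    csv_path = Path(csv_path)
    if not csv_path.is_file():
        raise FileNotFoundError(f"Expected the dataset at {csv_path.resolve()}")
    return pd.read_csv(csv_path)


# Demonstration on a throwaway file holding two rows taken from the printed output.
sample = (
    "Borderpoint,Exit,Entry,MAXFLOW (Mm3/h)\n"
    "Adriatic LNG,Liquefied Natural Gas,Italy,1.1\n"
    "Almeria,Algeria,Spain,1.3\n"
)
with tempfile.TemporaryDirectory() as tmp:
    sample_path = Path(tmp) / "sample_gtf.csv"  # hypothetical file name
    sample_path.write_text(sample)
    sample_df = load_gas_flows(sample_path)

print(sample_df.head())
```

If the real CSV sits in the project root (an assumption), it could be loaded as `load_gas_flows('Export_GTF_IEA_202412 - GTF_data.csv')`.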

Now we check the attributes of the newly created DataFrame, such as its size and whether it has any missing or duplicate values.

In [15]:
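df.describe()      # summary statistics; not displayed because it is not the last expression in the cell
df.info()          # prints the index range, column count, dtypes and memory usage
df.isnull().sum()  # missing values per column, displayed as the cell result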


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 262 entries, 0 to 261
Columns: 199 entries, Borderpoint to Dec-24
dtypes: float64(196), object(3)
memory usage: 407.5+ KB


Borderpoint         0
Exit                0
Entry               0
MAXFLOW (Mm3/h)    59
Oct-08             37
                   ..
Aug-24             41
Sep-24             42
Oct-24             38
Nov-24             36
Dec-24             47
Length: 199, dtype: int64

The dataset has 262 rows (a RangeIndex from 0 to 261) and 199 columns, running from Borderpoint to Dec-24. The figure of 8 rows and 196 columns is the shape of the `df.describe()` summary table, not of the dataset itself. Of the 199 columns, 196 are floats and 3 are objects. The three categorical columns, Borderpoint, Exit and Entry, contain no null values. The numerical columns do contain missing values, which is to be expected: the monthly columns shown have between 36 and 47 each, and MAXFLOW (Mm3/h) has 59. These can be investigated later with visualisation.
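The checks discussed here, including a duplicate check, can be sketched on a small invented frame. The values below are made up for illustration and are not taken from the IEA file. The sketch also shows why `describe()` reports 8 rows regardless of how many rows the data has.

```python
import numpy as np
import pandas as pd

# Made-up miniature of the dataset: similar column layout, invented values.
toy = pd.DataFrame(
    {
        "Exit": ["Algeria", "Portugal", "Belgium", "Belgium"],
        "Entry": ["Spain", "Spain", "France", "France"],
        "Jan-24": [10.0, np.nan, 4.0, 4.0],
        "Feb-24": [12.0, 3.0, 5.0, 5.0],
    }
)

n_rows, n_cols = toy.shape              # size of the DataFrame itself
summary_shape = toy.describe().shape    # 8 statistics x number of numeric columns
n_duplicates = toy.duplicated().sum()   # rows identical to an earlier row
missing_per_column = toy.isnull().sum() # missing values in each column

print(f"{n_rows} rows x {n_cols} columns")
print(f"describe() table shape: {summary_shape}")
print(f"{n_duplicates} duplicate rows")
print(missing_per_column[missing_per_column > 0])
```

On the real data, the same three attributes (`shape`, `duplicated().sum()` and `isnull().sum()`) would be read from `df`.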

For the purposes of this analysis, I will not analyse Borderpoint, focusing on Exit and Entry to denote countries of origin and destination. I will also not examine MAXFLOW for now as it is absolute volume that is relevant.
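One way to set those two columns aside is to drop them into a new frame, as in the sketch below. The frame name `flows` is hypothetical and the `Jan-24` values are invented. The other values come from the rows printed earlier.

```python
import pandas as pd

# Two rows modelled on the printed output; the Jan-24 volumes are invented.
toy = pd.DataFrame(
    {
        "Borderpoint": ["Almeria", "Badajoz"],
        "Exit": ["Algeria", "Portugal"],
        "Entry": ["Spain", "Spain"],
        "MAXFLOW (Mm3/h)": [1.3, 0.3],
        "Jan-24": [10.0, 2.0],
    }
)

# Keep origin, destination and the monthly volumes; the original frame is left untouched.
flows = toy.drop(columns=["Borderpoint", "MAXFLOW (Mm3/h)"])
print(flows.columns.tolist())
```

Dropping into a new frame rather than modifying `toy` in place keeps MAXFLOW available if it is needed later.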

---

# Section 2

Now I will do some basic visualisations with this dataset in pandas, starting with a pie chart of total exports in 2024. To do this, I add a column that sums the twelve monthly values for that year for each row, then group by the Exit variable and sum again.

In [29]:
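import matplotlib.pyplot as plt  # pandas plotting needs matplotlib installed in the environment
df = pd.read_csv('/Users/saad/Documents/vscode-projects/capstone/Export_GTF_IEA_202412 - GTF_data.csv')

# Sum the twelve monthly columns for 2024 into one total per row
months_2024 = ['Jan-24', 'Feb-24', 'Mar-24', 'Apr-24', 'May-24', 'Jun-24',
               'Jul-24', 'Aug-24', 'Sep-24', 'Oct-24', 'Nov-24', 'Dec-24']
df['total_exports_2024'] = df[months_2024].sum(axis=1)

# Aggregate the row totals by exporting country and plot each country's share
exports_2024 = df.groupby('Exit')['total_exports_2024'].sum()
exports_2024.plot(kind='pie', title='Total Gas Exports by Country in 2024', autopct='%1.1f%%')
plt.show()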


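ModuleNotFoundError: No module named 'matplotlib'

The cell fails because matplotlib is not installed in the active environment. pandas relies on matplotlib for `.plot()`, so it must be installed (for example with `pip install matplotlib`) before the cell is rerun.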
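The aggregation behind the chart can be checked without any plotting library. The sketch below runs it on a small invented frame and selects the 2024 columns by their `-24` suffix instead of listing them by hand. The values are made up and the suffix-based selection is an assumption about the column naming, not the notebook's own method.

```python
import pandas as pd

# Invented values; only the column naming pattern mirrors the real data.
toy = pd.DataFrame(
    {
        "Exit": ["Algeria", "Belgium", "Belgium"],
        "Entry": ["Spain", "France", "Netherlands"],
        "Dec-23": [9.0, 9.0, 9.0],
        "Jan-24": [1.0, 2.0, 3.0],
        "Feb-24": [1.0, None, 3.0],
    }
)

# Pick every column whose label ends in "-24" instead of typing each name.
cols_2024 = [c for c in toy.columns if c.endswith("-24")]

# sum(axis=1) skips missing values by default, so a missing month contributes nothing.
toy["total_exports_2024"] = toy[cols_2024].sum(axis=1)
exports_2024 = toy.groupby("Exit")["total_exports_2024"].sum()
print(exports_2024)
```

Because missing months are skipped rather than propagated, a country with many gaps will show a lower total than it would with complete data, which is worth keeping in mind when reading the pie chart.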

---

NOTE

* You may add as many sections as you want, as long as it supports your project workflow.
* All of the notebook's cells should run top-down. Do not create a workflow in which you need to go back to an earlier cell to execute a task, such as refreshing the content of a variable.

---

# Push files to Repo

* In cases where you don't need to push files to Repo, you may replace this section with "Conclusions and Next Steps" and state your conclusions and next steps.

In [None]:
import os
try:
  # create your folder here
  # os.makedirs(name='')
  pass  # placeholder so the try block is valid until a folder name is supplied
except Exception as e:
  print(e)
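As a hedged sketch of what the folder creation step might look like once a name is chosen: the `outputs/datasets` path below is hypothetical, and a temporary directory stands in for the project root so that nothing is left on disk.

```python
import os
import tempfile

# A temporary directory stands in for the project root in this illustration.
with tempfile.TemporaryDirectory() as project_root:
    output_dir = os.path.join(project_root, "outputs", "datasets")  # hypothetical folder name
    os.makedirs(output_dir, exist_ok=True)  # creates intermediate folders as needed
    os.makedirs(output_dir, exist_ok=True)  # a repeat call is a no-op rather than an error
    created = os.path.isdir(output_dir)

print(f"Folder created: {created}")
```

Passing `exist_ok=True` means the cell can be rerun top-down without raising `FileExistsError`.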
