# Southern Oscillation Index
This notebook prepares data from the Australian Bureau of Meteorology (BoM) on the monthly values of the Southern Oscillation Index (SOI).

In [1]:
import pandas as pd

# read the csv file (no header row: a YYYYMM date column and an SOI value column) into a dataframe
df = pd.read_csv("/kaggle/input/australian-bom-monthly-soi-values-jun-2023/monthly_soi.csv", header=None, names=["date", "soi"])

df

|  | date | soi |
|---|---|---|
| 0 | 187601 | 11.3 |
| 1 | 187602 | 11.0 |
| 2 | 187603 | 0.2 |
| 3 | 187604 | 9.4 |
| 4 | 187605 | 6.8 |
| ... | ... | ... |
| 1764 | 202301 | 11.8 |
| 1765 | 202302 | 10.5 |
| 1766 | 202303 | -2.0 |
| 1767 | 202304 | 0.3 |
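The `date` column packs the year and month into a single integer in YYYYMM form (e.g. 187601 for January 1876). The notebook keeps it as-is; if later steps need real dates, the sketch below shows one way to convert it. The sample values are copied from the table above, and the `month_start`, `year` and `month` columns are illustrative additions, not part of the original notebook.

```python
import pandas as pd

# Small sample using values from the table above (date stored as YYYYMM integers).
sample = pd.DataFrame({"date": [187601, 187602, 202304], "soi": [11.3, 11.0, 0.3]})

# Parse the YYYYMM integer into a Timestamp at the start of each month.
sample["month_start"] = pd.to_datetime(sample["date"].astype(str), format="%Y%m")

# Integer-arithmetic alternative: split year and month without string parsing.
sample["year"] = sample["date"] // 100
sample["month"] = sample["date"] % 100

print(sample)
```

The integer split avoids string conversion entirely and is enough if the model only needs the year and month as numeric features.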


Because the AI model we will train on this data is intended for forecasting, we cannot rely on precise SOI measurements being available. We therefore classify the SOI into 3 discrete values for the model to use.

- `-1` stands for an SOI below -6. While the classification of El Nino and La Nina events is more complex, we judged this threshold a suitable approximation for identifying El Nino events.
- `0` stands for an SOI from -6 to 6 (inclusive), which approximates neutral conditions.
- `1` stands for an SOI above 6, which approximates La Nina events.

In [2]:
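def classify_soi(x):
    """Map a monthly SOI value to -1 (El Nino-like), 0 (neutral) or 1 (La Nina-like)."""
    if x < -6:
        return -1
    elif x > 6:
        return 1
    else:
        # -6 <= x <= 6, so both thresholds themselves count as neutral
        return 0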
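The `apply` call runs `classify_soi` once per row, which is fine for a table of under 1,800 monthly values. As an optional alternative, here is a vectorized sketch using `numpy.select`; `classify_soi_vectorized` is a hypothetical helper, not part of the original notebook. It keeps the same strict thresholds, so exactly -6 and 6 stay neutral, and like `classify_soi` it maps a missing (NaN) value to 0 because every comparison with NaN is false.

```python
import numpy as np
import pandas as pd


def classify_soi_vectorized(soi: pd.Series) -> pd.Series:
    """Hypothetical vectorized counterpart of classify_soi with the same -6/6 thresholds."""
    conditions = [soi < -6, soi > 6]
    choices = [-1, 1]
    # Values matching neither condition (including exactly -6 and 6) default to 0 (neutral).
    return pd.Series(np.select(conditions, choices, default=0), index=soi.index)


values = pd.Series([-6.5, -6.0, 0.2, 6.0, 6.8, 11.3])
print(classify_soi_vectorized(values).tolist())  # [-1, 0, 0, 0, 1, 1]
```

Applied to the notebook's data, `classify_soi_vectorized(df["soi"])` should give the same column as `df["soi"].apply(classify_soi)`.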

In [3]:
df["soi_obsc"] = df["soi"].apply(classify_soi)

df

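|  | date | soi | soi_obsc |
|---|---|---|---|
| 0 | 187601 | 11.3 | 1 |
| 1 | 187602 | 11.0 | 1 |
| 2 | 187603 | 0.2 | 0 |
| 3 | 187604 | 9.4 | 1 |
| 4 | 187605 | 6.8 | 1 |
| ... | ... | ... | ... |
| 1764 | 202301 | 11.8 | 1 |
| 1765 | 202302 | 10.5 | 1 |
| 1766 | 202303 | -2.0 | 0 |
| 1767 | 202304 | 0.3 | 0 |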


In [4]:
# write without the row index so it is not read back as an extra "Unnamed: 0" column
df.to_csv("/kaggle/working/monthly_soi.csv", index=False)
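By default, `DataFrame.to_csv` writes the row index as an extra first column with an empty header, which `pandas.read_csv` then names `Unnamed: 0`. The in-memory check below illustrates the difference that `index=False` makes; `df_demo` is a small made-up frame with the same columns as `df`, and `cols_with` / `cols_without` are illustrative names only.

```python
import io

import pandas as pd

# Made-up frame mirroring the columns of the prepared SOI data.
df_demo = pd.DataFrame({"date": [187601, 187602], "soi": [11.3, 11.0], "soi_obsc": [1, 1]})

# Default export includes the index, which comes back as an unnamed column.
cols_with = pd.read_csv(io.StringIO(df_demo.to_csv())).columns.tolist()
print(cols_with)  # ['Unnamed: 0', 'date', 'soi', 'soi_obsc']

# index=False writes only the data columns.
cols_without = pd.read_csv(io.StringIO(df_demo.to_csv(index=False))).columns.tolist()
print(cols_without)  # ['date', 'soi', 'soi_obsc']
```

If the index is kept in the file instead, reading it back with `pd.read_csv(path, index_col=0)` restores it as the index rather than as a data column.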