<a href="https://colab.research.google.com/github/STASYA00/IAAC2024_tutorials/blob/main/quickstarts/03_optimization.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a> - Stasja's notebook

In [None]:
# Quote the version specifier so the shell does not treat ">" as an output redirect
!pip install ipykernel plotly "nbformat>4.2.0"

In [None]:
# !pip install --upgrade nbformat

* [Ortools package](https://developers.google.com/optimization)
* [What is optimization](https://en.wikipedia.org/wiki/Mathematical_optimization)
* [Video on what optimization is](https://youtu.be/AM6BY4btj-M?t=170&si=FUAW-bzml27y61zq) (watch only 2:50 to 6:00)

## 💨 Wind data

### ECAD data

One possible source is [ECAD data](https://www.ecad.eu/dailydata/customquery.php).

Steps:
* Go to the custom query page on ECAD
* Select __Wind direction__ in the _third_ dropdown
* Choose a country with available wind direction data
* Choose one of the available stations in that country
* Download the file and unzip it
* Repeat the query for __Wind speed__ to get the second file used below
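
The unzipping step can also be done from Python. Below is a minimal sketch, assuming the download is a single zip archive with the station data stored as `.txt` files; the helper `extract_ecad_txt` and the paths you pass to it are hypothetical, not part of ECAD or of this notebook.

```python
import zipfile
from pathlib import Path


def extract_ecad_txt(zip_path, out_dir):
    """Extract every .txt file from a downloaded ECAD zip archive and return their paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    extracted = []
    with zipfile.ZipFile(zip_path) as zf:
        for name in zf.namelist():
            # Keep only plain-text files; anything else in the archive is skipped
            if name.lower().endswith(".txt"):
                extracted.append(Path(zf.extract(name, out_dir)))
    return sorted(extracted)
```

For example, `extract_ecad_txt("my_ecad_download.zip", "../.assets")` (hypothetical file name) would place the text files next to the ones loaded below.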

Climate data is also available from other sources, for example [YR](https://developer.yr.no/doc/GettingStarted/).

In [1]:
import pandas as pd

FILEPATH = "../.assets/wind_jan_mayen.csv"
# skipinitialspace=True drops the padding spaces that follow each comma in ECAD files
df = pd.read_csv(FILEPATH, skipinitialspace=True)
df

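|      | STAID | SOUID  | DATE     | DD    | Q_DD |
|------|-------|--------|----------|-------|------|
| 0    | 189   | 114683 | 20090901 | 348   | 0    |
| 1    | 189   | 114683 | 20090902 | 347   | 0    |
| 2    | 189   | 114683 | 20090903 | 310   | 0    |
| 3    | 189   | 114683 | 20090904 | 331   | 0    |
| 4    | 189   | 114683 | 20090905 | 330   | 0    |
| ...  | ...   | ...    | ...      | ...   | ...  |
| 5321 | 189   | 114683 | 20240327 | -9999 | 9    |
| 5322 | 189   | 114683 | 20240328 | -9999 | 9    |
| 5323 | 189   | 114683 | 20240329 | -9999 | 9    |
| 5324 | 189   | 114683 | 20240330 | -9999 | 9    |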
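
The last rows show `DD = -9999` with `Q_DD = 9`, i.e. days without an observation. Before filtering, it can help to see how much of the record is affected. A minimal sketch, assuming the flag meanings 0 = valid, 1 = suspect, 9 = missing (check them against the header of your downloaded file); `quality_summary` is a hypothetical helper.

```python
import pandas as pd

# Assumed ECAD quality-flag meanings; verify against the file header
QUALITY_LABELS = {0: "valid", 1: "suspect", 9: "missing"}


def quality_summary(df, flag_col):
    """Count rows per quality flag and attach a readable label."""
    counts = df[flag_col].value_counts().sort_index()
    return pd.DataFrame({
        "flag": counts.index,
        "label": [QUALITY_LABELS.get(f, "unknown") for f in counts.index],
        "rows": counts.to_numpy(),
    })
```

Calling `quality_summary(df, "Q_DD")` on the dataframe above gives one row per flag value.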


In [2]:
FILEPATH = "../.assets/wind_speed_jan_mayen.txt"
# skiprows=18 skips the descriptive text header at the top of the ECAD file
df_speed = pd.read_csv(FILEPATH, skipinitialspace=True, skiprows=18)
df_speed.head()

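|   | SOUID  | DATE     | FG  | Q_FG |
|---|--------|----------|-----|------|
| 0 | 114682 | 19790101 | 15  | 1    |
| 1 | 114682 | 19790102 | 110 | 1    |
| 2 | 114682 | 19790103 | 163 | 1    |
| 3 | 114682 | 19790104 | 164 | 1    |
| 4 | 114682 | 19790105 | 70  | 1    |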
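
`skiprows=18` assumes the column header sits on line 19 of the file, which may differ between downloads. A more defensive sketch locates the header line itself. It assumes the header line starts with `STAID` or `SOUID` and contains `DATE` (as in both files above) and that the file decodes as Latin-1; `find_header_row`, `parse_ecad_text` and `read_ecad_file` are hypothetical helpers.

```python
import io
from pathlib import Path

import pandas as pd


def find_header_row(lines, prefixes=("STAID", "SOUID")):
    """Return the index of the first line that looks like an ECAD column header."""
    for i, line in enumerate(lines):
        stripped = line.strip()
        if stripped.startswith(prefixes) and "DATE" in stripped:
            return i
    raise ValueError("No ECAD column header found")


def parse_ecad_text(text):
    """Parse the data table from the full text of an ECAD station file."""
    lines = text.splitlines()
    start = find_header_row(lines)
    # Feed only the header line and the data rows to pandas
    return pd.read_csv(io.StringIO("\n".join(lines[start:])), skipinitialspace=True)


def read_ecad_file(path, encoding="latin-1"):
    """Read an ECAD station file from disk (the Latin-1 encoding is an assumption)."""
    return parse_ecad_text(Path(path).read_text(encoding=encoding))
```

With these helpers, `read_ecad_file(FILEPATH)` would replace the hard-coded `skiprows`.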


Let's merge the two dataframes on date and remove missing values. According to the datasets' descriptions, a missing value is marked with 9 in the corresponding quality column (`Q_DD` or `Q_FG`), so we filter those rows out.

In [3]:
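# Left merge: keep every wind-direction day and attach the speed measured on the same date
full_df = df.merge(df_speed, how="left", on=["DATE"])

# Remove missing observations (quality flag 9) in either variable
full_df = full_df[(full_df["Q_DD"] != 9) & (full_df["Q_FG"] != 9)]
# Days without a matching speed row have NaN flags, which pass the test above, so drop them explicitly
full_df = full_df.dropna(subset=["FG"])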
full_df.head()

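|   | STAID | SOUID_x | DATE     | DD  | Q_DD | SOUID_y  | FG    | Q_FG |
|---|-------|---------|----------|-----|------|----------|-------|------|
| 0 | 189   | 114683  | 20090901 | 348 | 0    | 114682.0 | 88.0  | 0.0  |
| 1 | 189   | 114683  | 20090902 | 347 | 0    | 114682.0 | 112.0 | 0.0  |
| 2 | 189   | 114683  | 20090903 | 310 | 0    | 114682.0 | 70.0  | 0.0  |
| 3 | 189   | 114683  | 20090904 | 331 | 0    | 114682.0 | 59.0  | 0.0  |
| 4 | 189   | 114683  | 20090905 | 330 | 0    | 114682.0 | 33.0  | 0.0  |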
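
Note the floats in `SOUID_y`, `FG` and `Q_FG`: a left merge fills dates that are missing from the speed file with `NaN`, which turns those columns into floats. Such rows can also slip through a `!= 9` filter, because `NaN != 9` evaluates to `True`. The toy example below (hypothetical values) shows the effect and why an explicit `dropna` is worth adding.

```python
import pandas as pd

# Hypothetical toy data: the third day has a direction reading but no speed row at all
directions = pd.DataFrame({"DATE": [20240101, 20240102, 20240103],
                           "DD": [10, 20, 30], "Q_DD": [0, 0, 0]})
speeds = pd.DataFrame({"DATE": [20240101, 20240102],
                       "FG": [50, 60], "Q_FG": [0, 9]})

merged = directions.merge(speeds, how="left", on="DATE")
# The unmatched day gets NaN in FG and Q_FG, so both columns become float
flag_filtered = merged[(merged["Q_DD"] != 9) & (merged["Q_FG"] != 9)]
# NaN != 9 is True, so the unmatched day survives the flag filter;
# dropping rows without a speed value removes it
clean = flag_filtered.dropna(subset=["FG"])
```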


In [4]:
full_df = full_df[["DATE", "DD", "FG"]].copy()  # keep only relevant columns; .copy() avoids a SettingWithCopyWarning below

full_df["DATE"] = pd.to_datetime(full_df["DATE"], format="%Y%m%d")  # convert DATE column to datetime format

full_df.head()

|   | DATE       | DD  | FG    |
|---|------------|-----|-------|
| 0 | 2009-09-01 | 348 | 88.0  |
| 1 | 2009-09-02 | 347 | 112.0 |
| 2 | 2009-09-03 | 310 | 70.0  |
| 3 | 2009-09-04 | 331 | 59.0  |
| 4 | 2009-09-05 | 330 | 33.0  |
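
With `DATE` stored as a datetime, pandas time-series tools become available. As an illustration, here is a minimal sketch of monthly mean wind speed. It assumes ECAD stores `FG` in units of 0.1 m/s (verify against your file's header), and `monthly_mean_speed` is a hypothetical helper. Only `FG` is averaged: direction is circular, so an arithmetic mean of `DD` would be misleading (350° and 10° would average to 180° instead of 0°).

```python
import pandas as pd


def monthly_mean_speed(df, date_col="DATE", speed_col="FG", scale=0.1):
    """Monthly mean wind speed in m/s; scale=0.1 assumes FG is stored in 0.1 m/s."""
    speeds = df.set_index(date_col)[speed_col] * scale
    # "MS" labels each month by its first day
    return speeds.resample("MS").mean()
```

For example, `monthly_mean_speed(full_df).plot()` would draw the monthly series.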


In [21]:
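# Meteorological seasons: 0 = Dec-Feb, 1 = Mar-May, 2 = Jun-Aug, 3 = Sep-Nov
# (month % 12 puts December together with January and February)
full_df["season"] = (full_df["DATE"].dt.month % 12) // 3
full_df["month"] = full_df["DATE"].dt.month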
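
Meteorological seasons group December with January and February. With plain `month // 3`, December ends up in a group of its own (`12 // 3 == 4`), whereas `(month % 12) // 3` maps every month to 0 to 3. A small sketch with hypothetical readable labels for the four groups:

```python
# Hypothetical labels for the season index (Northern Hemisphere meteorological seasons)
SEASON_NAMES = {0: "winter (DJF)", 1: "spring (MAM)", 2: "summer (JJA)", 3: "autumn (SON)"}


def season_index(month):
    """Season index for a month number 1-12: Dec/Jan/Feb -> 0, ..., Sep/Oct/Nov -> 3."""
    return (month % 12) // 3
```

A label column could then be added with `full_df["DATE"].dt.month.map(lambda m: SEASON_NAMES[season_index(m)])`.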

### Visualizing wind data

In [36]:
import plotly.express as px

# One bar per day: radius = wind speed (FG), angle = wind direction (DD, in degrees)
fig = px.bar_polar(full_df, r="FG", theta="DD",
                   # color="FG",
                   template="plotly_dark",
                   color_discrete_sequence=px.colors.sequential.Plasma_r)
fig.show()
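
Plotting one bar per day draws thousands of overlapping bars. A conventional wind rose instead bins directions into compass sectors and speeds into classes, then shows how often each combination occurs. Below is a minimal pandas sketch, assuming 16 sectors of 22.5° and speed classes of 0-2, 2-5, 5-10 and 10+ m/s (expressed in FG's assumed 0.1 m/s units). The helper names and bin edges are our own choices; the plotting call follows the column layout of Plotly's wind rose example (`r`, `theta`, `color`).

```python
import numpy as np
import pandas as pd

# 16 compass sectors, each 22.5 degrees wide and centred on its label (N = 0 degrees)
SECTORS = ["N", "NNE", "NE", "ENE", "E", "ESE", "SE", "SSE",
           "S", "SSW", "SW", "WSW", "W", "WNW", "NW", "NNW"]


def wind_rose_table(df, dir_col="DD", speed_col="FG",
                    speed_bins=(0, 20, 50, 100, np.inf)):
    """Count observations per (direction sector, speed class) and add a percentage.

    Expects missing values already removed. speed_bins use the units of speed_col;
    the defaults assume FG in 0.1 m/s, i.e. 0-2, 2-5, 5-10 and 10+ m/s.
    """
    # Shift by half a sector so that e.g. 350 and 10 degrees both fall into "N"
    sector_idx = ((df[dir_col] % 360 + 11.25) // 22.5).astype(int) % 16
    direction = sector_idx.map(dict(enumerate(SECTORS))).astype(
        pd.CategoricalDtype(SECTORS, ordered=True))
    # Left-closed speed classes: [0, 20), [20, 50), ...
    strength = pd.cut(df[speed_col], bins=list(speed_bins), right=False)

    table = (
        pd.DataFrame({"direction": direction, "strength": strength})
        .groupby(["direction", "strength"], observed=False)  # keep empty combinations
        .size()
        .reset_index(name="count")
    )
    table["frequency"] = 100 * table["count"] / table["count"].sum()
    # Plain strings plot cleanly; rows are already in sector and class order
    table["direction"] = table["direction"].astype(str)
    table["strength"] = table["strength"].astype(str)
    return table


def plot_wind_rose(table):
    """Stacked polar bar chart of a table produced by wind_rose_table."""
    import plotly.express as px  # imported here so the binning works without plotly

    return px.bar_polar(table, r="frequency", theta="direction", color="strength",
                        template="plotly_dark",
                        color_discrete_sequence=px.colors.sequential.Plasma_r)
```

Usage: `plot_wind_rose(wind_rose_table(full_df)).show()`. Binning first keeps the chart readable regardless of how many days the record covers, and the same table can be built per season by filtering `full_df` beforehand.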