# Predicting wind shifts with observations

Here, we combine our lake-breeze front detection with downloading observations and training a Random Forest model.

The goals are:
* Download the last year of observations for two airports and other regional sites
* Compute a "lake-breeze frontal passage" column by detecting a sudden increase in the easterly component of the wind; this is what we want to predict
* Use a Random Forest to measure how much of the variance in lake-breeze frontal passage the regional observations explain
* In turn, this model could then be used to predict wind shifts by swapping the observations for numerical-guidance values (this may be outside the scope of our work)
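As a rough illustration of the second goal, the sketch below computes an easterly wind component from speed and direction and flags a sudden increase. It assumes wind direction is available as the meteorological direction the wind blows from; the function names and the threshold of 3.0 are hypothetical, and the request in Step 1 does not download wind direction, so the cells below work with wind speed alone.

```python
import math

def easterly_component(speed, direction_deg):
    """Wind component blowing from the east (positive means from the east).

    direction_deg is the meteorological direction the wind blows FROM,
    in degrees clockwise from north.
    """
    return speed * math.sin(math.radians(direction_deg))

def has_sudden_easterly_increase(obs, threshold=3.0):
    """True if the easterly component rises by more than threshold
    between two consecutive observations.

    obs is a list of (speed, direction_deg) pairs; threshold is a
    hypothetical value chosen only for this sketch.
    """
    easterly = [easterly_component(s, d) for s, d in obs]
    jumps = [later - earlier for earlier, later in zip(easterly, easterly[1:])]
    return any(jump > threshold for jump in jumps)

# Made-up example: westerly flow that swings round to easterly
demo_obs = [(5.0, 270.0), (5.0, 260.0), (4.0, 90.0), (5.0, 80.0)]
print(has_sudden_easterly_increase(demo_obs))
```

Working with a signed component rather than raw direction avoids the wrap-around at 0/360 degrees.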

It is paramount that students work together to identify their strengths and weaknesses in getting the paper written. Part of this is making sure effort is split in a 3:1 ratio, in line with credit hours.

### Step 1: Get the observational data (see `rf_obs_tutorial`)


In [1]:
from synoptic.services import stations_timeseries
import datetime
import pandas as pd
import numpy as np

stid = "CHII2"
df = stations_timeseries(stid=stid, vars=['air_temp', 'wind_speed', 'dew_point_temperature'],
                         start=datetime.datetime(2021,10,20),
                         end=datetime.datetime(2022,10,20))

 🚚💨 Speedy Delivery from Synoptic API [timeseries]: https://api.synopticdata.com/v2/stations/timeseries?stid=CHII2&vars=air_temp,wind_speed,dew_point_temperature&start=202110200000&end=202210200000&token=🙈HIDDEN



In [2]:
df

                           air_temp  dew_point_temperature  wind_speed
date_time                                                             
2021-10-20 00:00:00+00:00    19.511                   7.96       7.696
2021-10-20 00:10:00+00:00    19.411                   6.76       8.195
2021-10-20 00:20:00+00:00    19.411                   6.86       7.197
2021-10-20 00:30:00+00:00    19.211                   7.36       7.696
2021-10-20 00:40:00+00:00    19.211                   7.56       7.197
...                             ...                    ...         ...
2022-10-19 23:20:00+00:00     8.211                  -5.02       7.696
2022-10-19 23:30:00+00:00     7.911                  -6.92       9.795
2022-10-19 23:40:00+00:00     7.811                  -6.62       7.197
2022-10-19 23:50:00+00:00     7.811                  -6.12       7.696


### Step 2: Organise the data with pandas


In [3]:
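# Drop rows with any missing values
df.dropna(inplace=True)

# Keep only the observations stamped exactly on the hour
# (asfreq() selects values at the new timestamps; it does not average)
df = df.resample("60min").asfreq()
print(df)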

                           air_temp  dew_point_temperature  wind_speed
date_time                                                             
2021-10-20 00:00:00+00:00    19.511                   7.96       7.696
2021-10-20 01:00:00+00:00    18.411                   8.56       6.194
2021-10-20 02:00:00+00:00    18.211                   8.26       7.696
2021-10-20 03:00:00+00:00    17.711                   7.86       7.197
2021-10-20 04:00:00+00:00    17.211                   8.46       8.195
...                             ...                    ...         ...
2022-10-19 20:00:00+00:00     7.811                  -4.71       5.695
2022-10-19 21:00:00+00:00     7.811                  -6.92       8.792
2022-10-19 22:00:00+00:00     8.311                  -6.42       8.195
2022-10-19 23:00:00+00:00     8.111                  -6.52       9.795
2022-10-20 00:00:00+00:00     7.711                  -6.22       8.792

[8761 rows x 3 columns]
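To see what this resampling step keeps and what it discards, here is a minimal sketch on made-up 10-minute data. The series values and variable names are invented for illustration; the comparison with `mean()` is included only to show the alternative.

```python
import pandas as pd

# Hypothetical 10-minute series: values 0..12 from 00:00 to 02:00 UTC
idx = pd.date_range("2021-10-20 00:00", periods=13, freq="10min", tz="UTC")
ten_minute = pd.Series(range(13), index=idx, name="demo_value")

# asfreq() keeps the value stamped exactly on each hour (no averaging)
on_the_hour = ten_minute.resample("60min").asfreq()

# For comparison, mean() averages all observations within each hour
hourly_mean = ten_minute.resample("60min").mean()

print(on_the_hour)
print(hourly_mean)
```

Note that `asfreq()` leaves a missing value wherever no observation falls exactly on the hour, so stations that report at irregular minutes may need a different approach.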


### Step 3: Compute a column that gives "1" for a lake-breeze passage that day, or a "0" otherwise
This lets us train a model that relates the observations to whether that (local-time) day had a lake breeze. For examples of detection functions, see `detect_lbf`.

In [4]:
# Let's run each day through our detection
dates = np.unique(df.index.date)
print(f"There are {len(dates)} dates in the dataset.")

# Just to have a look at the first 06Z--06Z period
for date in dates[0:1]:
    start_dt = pd.Timestamp(ts_input=datetime.datetime(date.year,date.month,date.day,6,0,0),tz="UTC")
    # This date 6Z and the next 6Z
    end_dt = start_dt + pd.Timedelta(days=1)
    # Subset
    sub_df = df[(start_dt <= df.index) & (df.index <= end_dt)]

sub_df

There are 366 dates in the dataset.


                           air_temp  dew_point_temperature  wind_speed
date_time                                                             
2021-10-20 06:00:00+00:00    16.411                   7.25      11.791
2021-10-20 07:00:00+00:00    16.011                   7.85      11.292
2021-10-20 08:00:00+00:00    15.211                   8.46       9.795
2021-10-20 09:00:00+00:00    15.011                   8.56       9.795
2021-10-20 10:00:00+00:00    14.911                   8.76       9.795
2021-10-20 11:00:00+00:00    14.911                   9.07       8.195
2021-10-20 12:00:00+00:00    14.511                   8.86       6.693
2021-10-20 13:00:00+00:00    14.511                   9.27       7.197
2021-10-20 14:00:00+00:00    15.111                   9.06       7.197
2021-10-20 15:00:00+00:00    16.211                   9.06       6.693


In [5]:
timeseries = sub_df.wind_speed

# We take 25 hourly points (06Z to 06Z inclusive) so that diffs are available from 07Z
# Still to decide: which day the boundary points belong to (tomorrow? yesterday?)

def was_there_a_lbf(timeseries,threshold=-3):
    """Return 1.0 if any hour-to-hour change in wind speed is below threshold, else 0.0."""
    # Hour-to-hour change in wind speed; the first value is NaN, so drop it
    diffs = timeseries.diff(periods=1).dropna()
    print(diffs)
    # Most negative change, i.e. the sharpest drop in wind speed
    largest_neg_num = diffs.values.min()
    if largest_neg_num < threshold:
        # Returning a float makes it easier for the machine learning (other variables are floats)
        return 1.0
    else:
        return 0.0
    
yesno = was_there_a_lbf(timeseries,threshold=-3)
print(yesno)

date_time
2021-10-20 07:00:00+00:00   -0.499
2021-10-20 08:00:00+00:00   -1.497
2021-10-20 09:00:00+00:00    0.000
2021-10-20 10:00:00+00:00    0.000
2021-10-20 11:00:00+00:00   -1.600
2021-10-20 12:00:00+00:00   -1.502
2021-10-20 13:00:00+00:00    0.504
2021-10-20 14:00:00+00:00    0.000
2021-10-20 15:00:00+00:00   -0.504
2021-10-20 16:00:00+00:00   -2.094
2021-10-20 17:00:00+00:00    0.499
2021-10-20 18:00:00+00:00    1.096
2021-10-20 19:00:00+00:00    4.100
2021-10-20 20:00:00+00:00   -3.097
2021-10-20 21:00:00+00:00    0.000
2021-10-20 22:00:00+00:00   -0.504
2021-10-20 23:00:00+00:00   -0.499
2021-10-21 00:00:00+00:00    1.003
2021-10-21 01:00:00+00:00    2.598
2021-10-21 02:00:00+00:00    0.499
2021-10-21 03:00:00+00:00    0.000
2021-10-21 04:00:00+00:00    0.499
2021-10-21 05:00:00+00:00   -2.598
2021-10-21 06:00:00+00:00   -0.499
Freq: 60T, Name: wind_speed, dtype: float64
1.0


In [6]:
# For making lists of the same value
noahlist = ["noah",]*10

yesno_thisday = [yesno,]*24
print(yesno_thisday)

# We want to add this 24-row column to "df"
# Options in pandas: join, concatenate, merge

just_yesno = pd.DataFrame({"lbf_yesno":yesno_thisday},index=sub_df.index[1:])
print(just_yesno)

[1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
                           lbf_yesno
date_time                           
2021-10-20 07:00:00+00:00        1.0
2021-10-20 08:00:00+00:00        1.0
2021-10-20 09:00:00+00:00        1.0
2021-10-20 10:00:00+00:00        1.0
2021-10-20 11:00:00+00:00        1.0
2021-10-20 12:00:00+00:00        1.0
2021-10-20 13:00:00+00:00        1.0
2021-10-20 14:00:00+00:00        1.0
2021-10-20 15:00:00+00:00        1.0
2021-10-20 16:00:00+00:00        1.0
2021-10-20 17:00:00+00:00        1.0
2021-10-20 18:00:00+00:00        1.0
2021-10-20 19:00:00+00:00        1.0
2021-10-20 20:00:00+00:00        1.0
2021-10-20 21:00:00+00:00        1.0
2021-10-20 22:00:00+00:00        1.0
2021-10-20 23:00:00+00:00        1.0
2021-10-21 00:00:00+00:00        1.0
2021-10-21 01:00:00+00:00        1.0
2021-10-21 02:00:00+00:00        1.0
2021-10-21 03:00:00+00:00        1.0
2021-10-21 04:00:00+00:00   

In [7]:
new_df = pd.concat([sub_df,just_yesno],axis=1).dropna()
print(new_df)

                           air_temp  dew_point_temperature  wind_speed  \
date_time                                                                
2021-10-20 07:00:00+00:00    16.011                   7.85      11.292   
2021-10-20 08:00:00+00:00    15.211                   8.46       9.795   
2021-10-20 09:00:00+00:00    15.011                   8.56       9.795   
2021-10-20 10:00:00+00:00    14.911                   8.76       9.795   
2021-10-20 11:00:00+00:00    14.911                   9.07       8.195   
2021-10-20 12:00:00+00:00    14.511                   8.86       6.693   
2021-10-20 13:00:00+00:00    14.511                   9.27       7.197   
2021-10-20 14:00:00+00:00    15.111                   9.06       7.197   
2021-10-20 15:00:00+00:00    16.211                   9.06       6.693   
2021-10-20 16:00:00+00:00    17.611                   9.97       4.599   
2021-10-20 17:00:00+00:00    18.311                  11.98       5.098   
2021-10-20 18:00:00+00:00    18.911   
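
The cells above label only the first 06Z to 06Z window. A minimal sketch of extending the same idea to every day in the dataset might look like the following. The helper name `label_lbf_days`, its `start_hour` argument, and the synthetic wind-speed series are assumptions made for illustration; the real `df.wind_speed` would be passed in instead.

```python
import numpy as np
import pandas as pd

def label_lbf_days(wind_speed, threshold=-3.0, start_hour=6):
    """Label each hourly timestamp with 1.0 or 0.0 for its 06Z-to-06Z window.

    The name, start_hour and default threshold are illustrative choices.
    """
    labels = pd.Series(np.nan, index=wind_speed.index, name="lbf_yesno")
    for date in np.unique(wind_speed.index.date):
        start_dt = pd.Timestamp(date).tz_localize("UTC") + pd.Timedelta(hours=start_hour)
        end_dt = start_dt + pd.Timedelta(days=1)
        window = wind_speed[(wind_speed.index >= start_dt) & (wind_speed.index <= end_dt)]
        diffs = window.diff().dropna()
        if diffs.empty:
            continue  # no usable data in this window
        flag = 1.0 if diffs.min() < threshold else 0.0
        # Same convention as above: the label covers 07Z through the next 06Z
        labels.loc[diffs.index] = flag
    return labels

# Synthetic hourly wind speed over three days (values are made up)
idx = pd.date_range("2021-10-20 00:00", periods=73, freq="60min", tz="UTC")
speed = pd.Series(5.0, index=idx, name="wind_speed")
speed.iloc[19] = 9.0  # spike at 19Z, so the speed drops by 4 at 20Z on 20 October

labels = label_lbf_days(speed)
print(labels.value_counts(dropna=False))
```

Timestamps before the first 06Z stay as NaN, so a `dropna()` after joining the labels to the observations removes them, much as in the single-day example.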

### Step 4: Create a Random Forest model (see `rf_obs_tutorial`)
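
As a placeholder until the tutorial code is brought in, here is a minimal sketch of fitting a Random Forest with scikit-learn. It assumes scikit-learn is installed and uses a synthetic table with the same column names as above; the labelling rule for `lbf_yesno` is invented purely so the example runs, and `rf_obs_tutorial` may do this differently.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the hourly observations (values are made up)
rng = np.random.default_rng(0)
n = 500
demo = pd.DataFrame({
    "air_temp": rng.normal(15.0, 8.0, n),
    "dew_point_temperature": rng.normal(5.0, 6.0, n),
    "wind_speed": rng.gamma(2.0, 3.0, n),
})
# Invented rule standing in for the detection output: warm with light wind
demo["lbf_yesno"] = ((demo["air_temp"] > 18.0) & (demo["wind_speed"] < 6.0)).astype(float)

X = demo[["air_temp", "dew_point_temperature", "wind_speed"]]
y = demo["lbf_yesno"].astype(int)  # the classifier expects discrete classes

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(X_train, y_train)

# Fraction of held-out rows classified correctly
accuracy = clf.score(X_test, y_test)
print(f"Held-out accuracy: {accuracy:.2f}")
```

With the real data, every hour of a day shares the same label, so a random split can put hours from one day in both the training and test sets. Splitting by day is likely to give a more honest score.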

### Step 5: Play with the data; see what is important for predicting LBF passages.
Find a conclusion, which might be something like "We can predict 75% of the variance of whether an LBF passes using a given set of archived observations". This would provide experimental support for the paper, which would go on to explain how to implement such a system to forecast lake breezes.
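
One way to see which variables matter is to rank the fitted forest's feature importances. The sketch below assumes scikit-learn and uses synthetic data in which, by construction, only `air_temp` determines the label; the numbers carry no meteorological meaning.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Synthetic predictors with the same column names as the observations
rng = np.random.default_rng(1)
n = 400
X = pd.DataFrame({
    "air_temp": rng.normal(15.0, 8.0, n),
    "dew_point_temperature": rng.normal(5.0, 6.0, n),
    "wind_speed": rng.gamma(2.0, 3.0, n),
})
# Invented label that depends on air_temp only
y = (X["air_temp"] > 18.0).astype(int)

clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

# Impurity-based importances sum to 1; higher means the forest relied on it more
ranked = pd.Series(clf.feature_importances_, index=X.columns).sort_values(ascending=False)
print(ranked)
```

Impurity-based importances can be misleading when predictors are strongly correlated (as temperature and dew point often are), so treat the ranking as a starting point rather than a result.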

You might want to plot some images for a specific case to show the physical depiction of one day's random-forest output. Don't forget there is code in `lbf_casestudy_assignment` to help you use HRRR data as a proxy (i.e., next-best thing after having weather stations every 50 metres!).