# Directional Offset Detection for Wind Vane Anemometers 

<p>
    <img src='../images/vane_device.jpg' width=300>
    <p>
        <center>
            <strong>Figure:</strong> Example of a wind vane sensor
        </center>
    </p>
</p>

<p>
    The figure above shows a wind vane sensor, which measures the wind direction. The sensor is mounted on the met mast with a predefined reference direction facing true north. It is shaped so that the wind flow rotates the fin until it is aligned with the wind. The position of the fin translates to the wind direction reading.
</p>

<p>
    When scouting a location for a potential wind park, accurate wind measurements are vital. Hence, it is critical to eliminate directional offsets. A directional offset is a constant error in the direction reading that persists for a period of time. The main cause is a sensor that is mounted incorrectly and therefore does not face true north. For example, a newly installed wind vane sensor is sometimes set up with a 180° offset, facing in the opposite direction of true north.
</p> 
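
<p>
    As a minimal sketch of what "constant offset" means for angles, the snippet below computes a wrapped difference between two direction readings. The helper name <code>angular_delta</code> and the readings are hypothetical and only serve as an illustration.
</p>

```python
import numpy as np


def angular_delta(deg_a, deg_b):
    """Signed difference deg_a - deg_b, wrapped into the range -180 to 180 degrees."""
    diff = np.deg2rad(np.asarray(deg_a, dtype=float) - np.asarray(deg_b, dtype=float))
    return np.rad2deg(np.arctan2(np.sin(diff), np.cos(diff)))


# Hypothetical readings: a correctly mounted reference vane and a second vane
# that was installed rotated by 180 degrees.
reference = np.array([10.0, 95.0, 200.0, 350.0])
misaligned = (reference + 180.0) % 360.0

# The magnitude of the delta is the same for every reading, which is what
# makes a mounting error show up as a constant offset.
delta = angular_delta(misaligned, reference)
```

<p>
    A plain subtraction would report 340° between readings of 350° and 10°, whereas the wrapped difference is 20° in magnitude.
</p>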

<p>
    The figure below shows, on the left, the wind vane readings for three met masts within one region. On the right are the wind direction measurements of a nearby weather station as a reference. Both plots represent the distribution of the wind direction over 5 years for each sensor. The weather station data is pulled via an API call to WorldWeatherOnline. Although the weather station data has a lower resolution (1h) than the met masts' measurements (10min), it is still a good enough approximation to serve as a last checkpoint for detecting directional offsets. In this example, both sensors of met mast M3 show an offset of about 90°.
</p>

<p>
    <img width="900" height="500" src="../images/diroffset_overview.png">
    <a href="../plots/diroffset_overview.html">Download Plotly File</a>
</p>

In [1]:
from wwo_hist import retrieve_hist_data
import numpy as np
import pandas as pd
import ruptures as rpt


def get_weather_data(date_start, date_end, location):
    """Retrieves the historic wind direction data for a location.

    Args:
        date_start (str): Start date of the data set.
        date_end (str): End date of the data set.
        location (str): Latitude/Longitude (decimal degree) or city name

    Returns:
        weather (DataFrame): Historic wind direction data for the location.
    """
    
    # Hourly data (highest resolution)
    frequency = 1
    # API key for WorldWeatherOnline
    api_key = '########'
    location_list = [location]
    hist_weather_data = retrieve_hist_data(api_key,
                                location_list,
                                date_start,
                                date_end,
                                frequency,
                                location_label = False,
                                export_csv = False,
                                store_df = True)[0]
    
    # Keeps only the wind direction column and renames it
    weather = hist_weather_data[['winddirDegree']].rename(
        columns={'winddirDegree': 'Weather_DIR'})
    
    return weather

<p>
    Offsets can be quite hard to detect, especially because it is not always obvious which direction is the correct one. In addition, the further a reference sensor is physically from the sensor under test, the less it can be trusted as a comparison. Therefore, the offset detection follows the process depicted in the figure below:
</p>

<p>
    <img src='../images/dir_offset_flow.png' width=600>
    <p>
        <center>
            <strong>Figure:</strong> Directional Offset Detection Flow
        </center>
    </p>
</p>

<p>
    First, each sensor is checked for alignment within itself, that is, across its own time intervals. Then the whole met mast is checked for alignment. If a nearby Lidar is available, its data may be used in a next step to calibrate the wind vane sensors. After that, all masts that are near each other are checked for offsets. In a final check the weather data is used as a reference. The further to the right a check sits in the process flow, the more tolerance is allowed before a deviation is flagged as an offset. The last check against the weather station data, for example, only flags a significant offset of more than 45°.
</p>
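
<p>
    The increasing tolerance can be summarized in a small lookup, sketched below. The threshold values are the ones used in the code cells of this notebook (10°, 25° and 45°); the stage labels and the helper <code>is_offset</code> are hypothetical, and the Lidar stage is left out because no threshold is stated for it.
</p>

```python
# Tolerances in degrees per check, ordered from left to right in the process flow.
# The stage names are illustrative labels, not identifiers from the notebook.
TOLERANCE_DEG = {
    "sensor_and_mast": 10.0,
    "project": 25.0,
    "weather_station": 45.0,
}


def is_offset(stage, delta_deg):
    """Return True if the absolute delta exceeds the tolerance of the given stage."""
    return abs(delta_deg) > TOLERANCE_DEG[stage]


# A 30 degree deviation is flagged within a mast but tolerated against the
# more distant weather station reference.
flag_mast = is_offset("sensor_and_mast", 30.0)
flag_weather = is_offset("weather_station", 30.0)
```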

<p>
    The figure below shows the delta between the wind vane sensors of each mast as a time series (top: Mast M1, middle: Mast M2, bottom: Mast M3). A deviation from zero indicates an offset at the sensor level and within a mast. The figure shows that offsets are present in Mast M1: the first starts in June 2017, the second in July 2018.
</p>

<p>
    <img width="900" height="500" src="../images/diroffset_sensor_delta.png">
    <a href="../plots/diroffset_sensor_delta.html">Download Plotly File</a>
</p>
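
<p>
    Such start dates can be located automatically in the delta time series. The notebook itself uses the PELT algorithm from <code>ruptures</code> for this; the sketch below is a deliberately simplified stand-in that brute-forces a single split with the same L2 cost, applied to synthetic data. The function name <code>best_single_split</code> and the synthetic series are assumptions for illustration only.
</p>

```python
import numpy as np


def best_single_split(signal, min_size=5):
    """Index that splits `signal` into two segments with minimal total L2 cost."""
    signal = np.asarray(signal, dtype=float)
    best_idx, best_cost = None, np.inf
    for idx in range(min_size, len(signal) - min_size + 1):
        left, right = signal[:idx], signal[idx:]
        # L2 cost: squared deviation of each segment from its own mean
        cost = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if cost < best_cost:
            best_idx, best_cost = idx, cost
    return best_idx


# Synthetic daily delta: around 0 degrees for 60 days, then around 30 degrees
# for 40 days, with a small deterministic ripple on top.
daily_delta = np.concatenate([np.zeros(60), np.full(40, 30.0)])
daily_delta = daily_delta + 0.5 * np.sin(np.arange(100))

split = best_single_split(daily_delta)

# The jump between the segment means is the most likely offset
step = daily_delta[split:].mean() - daily_delta[:split].mean()
```

<p>
    Unlike this sketch, PELT can return several change points at once, which is needed for a series with more than one offset such as Mast M1.
</p>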

<p>
    In order to find out which sensor is causing which offset, the <i>tower shadow</i> plot is calculated for each wind vane sensor and for each respective time interval. The figure below shows that, before correction, the shadow plots within each sensor are inconsistent across the time intervals.
</p>

<p>
    <img width="900" height="500" src="../images/diroffset_sensor_shadow.png">
    <a href="../plots/diroffset_sensor_shadow.html">Download Plotly File</a>
</p>
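
<p>
    A tower shadow curve can be sketched as the median difference between two anemometers per wind direction, keeping only readings above a minimum wind speed (4 m/s in this notebook). The function name <code>shadow_curve</code>, the one-degree binning and the synthetic data below are assumptions made for this illustration.
</p>

```python
import numpy as np


def shadow_curve(speed_a, speed_b, direction, min_speed=4.0):
    """Median speed difference (a - b) for each integer degree of wind direction."""
    speed_a, speed_b, direction = map(np.asarray, (speed_a, speed_b, direction))

    # Low wind speeds are excluded from the curve
    keep = (speed_a > min_speed) & (speed_b > min_speed)
    diff = speed_a[keep] - speed_b[keep]
    deg = np.floor(direction[keep]).astype(int) % 360

    curve = np.full(360, np.nan)
    for d in np.unique(deg):
        curve[d] = np.median(diff[deg == d])
    return curve


# Synthetic example: anemometer A reads up to 2 m/s less than anemometer B
# when the wind comes from roughly 120 degrees.
direction = np.tile(np.arange(0.0, 360.0, 0.5), 3)
speed_b = np.full(direction.shape, 8.0)
speed_a = 8.0 - 2.0 * np.exp(-((direction - 120.0) / 10.0) ** 2)

curve = shadow_curve(speed_a, speed_b, direction)
dip_direction = int(np.nanargmin(curve))
```

<p>
    If the wind vane is offset, the dip appears at a different direction in the curve, which is what makes the shadow plot usable as a reference for each sensor.
</p>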

<p>
    The algorithm works as follows:
</p>

<ol type="1">
    <li>
        The delta between the two sensors on each mast is calculated as a time series. Note that the readings are in degrees and hence in the range of 0.0° to 359.9°. To calculate the delta between two angles correctly across the 0°/360° boundary, the difference is broken into its sine and cosine parts.
    </li><br>
    <li>
        Change point analysis is then used to find each timestamp at which a significant change in the delta time series occurs. Additionally, the difference between the delta before and after the timestamp is taken as the most likely offset.
    </li><br>
    <li>
        In the next step, the algorithm iterates over the timestamps at which a potential offset has been identified. In each iteration the script uses the time interval before the timestamp as the reference, and the interval between the timestamp and the next timestamp (if one exists) as the interval under test. This happens in parallel for each sensor.
    </li><br>
    <ol>
        <li>In each iteration the shadow plot is calculated for the reference interval and for the interval under test.
        </li><br>
        <li>The shadow plot of the interval under test is shifted in both directions by the most likely offset, and the cross-correlation between each shifted version and the reference is calculated.
        </li><br>
        <li>The script returns the pair and shift with the highest cross-correlation, and the data is corrected for that pair.
        </li><br>
        <li>The algorithm then restarts from the beginning and repeats until no more shifts are detected.
        </li>
    </ol>
</ol>

<p>
    This iterative process is needed to account for different shifts potentially adding up.
</p>
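
<p>
    Each pass of the iteration relies on the shift search described in step 3. A minimal sketch of that search is shown below, assuming shadow curves with exactly one value per degree so that a circular shift can be done with <code>np.roll</code>. The function name <code>best_shift</code> and the synthetic curves are hypothetical.
</p>

```python
import numpy as np


def best_shift(reference, test, shifts=range(-180, 181, 10)):
    """Shift in degrees that maximizes the correlation of `test` with `reference`.

    Both inputs are curves with one value per degree (length 360).
    """
    best, best_corr = 0, -np.inf
    for shift in shifts:
        shifted = np.roll(test, shift)  # circular shift, wraps around at 360
        corr = np.corrcoef(reference, shifted)[0, 1]
        if corr > best_corr:
            best, best_corr = shift, corr
    return best


deg = np.arange(360)

# Reference interval: shadow dip at 120 degrees
reference = -2.0 * np.exp(-((deg - 120.0) / 10.0) ** 2)

# Interval under test: the same dip is reported at 90 degrees
test = np.roll(reference, -30)

# Adding the returned shift to the tested directions moves the dip back to 120
shift = best_shift(reference, test)
```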

<p>
    The figure below shows that, after the process has finished, each sensor is aligned within itself. At the same time, each mast is aligned within itself.
</p>

<p>
    <img width="900" height="500" src="../images/diroffset_sensor_delta_after.png">
    <a href="../plots/diroffset_sensor_delta_after.html">Download Plotly File</a>
</p>

<p>
    The figure below shows that each sensor is aligned within itself. For example, when the wind direction is 90°, all three shadow plots (one for each time interval) are perfectly aligned.
</p>

<p>
    <img width="900" height="500" src="../images/diroffset_sensor_shadow_after.png">
    <a href="../plots/diroffset_sensor_shadow_after.html">Download Plotly File</a>
</p>

In [2]:
def _find_closest_value(value, X): 
    """The algorithm finds the closest value in an array to a specified value.
    
    Args:
        value (float): Specified value.
        X (array): Array to search for closest value in.

    Returns:
        value_closest (float): Closest value of array.
    """
    
    value_closest = X[np.unravel_index(np.argmin(np.abs(X-value)), X.shape)]
    
    return value_closest

def _calculate_delta_DIR(degree1, degree2):
    """The algorithm calculates the delta between two angles by breaking degrees
    into their sine and cosine parts.
    
    Args:
        degree1 (float): Angle 1 in degrees.
        degree2 (float): Angle 2 in degrees.

    Returns:
        degree_delta (float): Delta between Angle 1 and 2.
    """
    
    # Converting both angles to radians
    degree1 = degree1*np.pi/180
    degree2 = degree2*np.pi/180
    
    # Wrapping the delta into the range -180 to 180 via arctan2 and converting
    # it to whole degrees
    degree_delta = round(np.arctan2(np.sin(degree1-degree2), 
                                    np.cos(degree1-degree2))*180/np.pi)

    return degree_delta

def _group_DIR(data, pair):
    """The algorithm calculates the tower shadow function for a pair of two
    anemometers and a wind vane sensor.
    
    Args:
        data (DataFrame): DataFrame that includes the measurements of two 
                            wind speed sensors (at a similar height) and one 
                            wind direction sensor.
        pair (list): List that includes two wind speed and one wind direction 
                            sensors

    Returns:
        data_ratio (DataFrame): Median delta between both wind speed sensors as 
                            function of wind direction
    """
    
    # Keeps only the two wind speed columns and the wind direction column,
    # in that order
    data = data[pair].copy()
    
    # Filtering out low wind speeds
    data = data.loc[data.iloc[:, 0] > 4]
    data = data.loc[data.iloc[:, 1] > 4]
    
    # Calculating the median difference between both wind speed sensors as a
    # function of wind direction (the column keeps its original name 'ratio')
    data['ratio'] = data.iloc[:, 0]-data.iloc[:, 1]
    data_grouped = data.groupby(pair[2]).median()
    
    data_ratio = data_grouped[['ratio']]
    
    return data_ratio

def _find_best_corr(data_base, data_test, offsets = np.arange(-180, 190, 10)):
    """The algorithm finds the maximum correlation in regards to offsets.

    Args:
        data_base (DataFrame): DataFrame of base sensor.
        data_test (DataFrame): DataFrame of sensor in question.
        offsets (array): Array of possible offsets.

    Returns:
        offset_max (int): Offset that maximizes correlation.
    """

    corr_coeff = []

    # Calculating correlation coefficient for each shift which is 
    # specified by "offsets"
    for off in offsets:

        data_offset = _extend_data(data_test)
        data_offset.index += off

        corr_coeff.append(data_base.corr(data_offset[0:360]))

    # Find max correlation coefficient
    corr_coeff = np.array(corr_coeff)
    corr_coeff = np.nan_to_num(corr_coeff)

    offset_max = offsets[np.argmax(corr_coeff)]

    return offset_max

def _extend_data(data):
    """Helper function to extend a DataFrame to below 0 degrees and above 360 
    degrees.

    Args:
        data (DataFrame): DataFrame to be extended.

    Returns:
        data_extended (DataFrame): extended DataFrame.
    """

    data = data.copy()
    data_sub = data.copy()
    data_add = data.copy()
    
    # Extends data from 0-360 to -360-720
    data_sub.index -= 360
    data_add.index += 360

    data_extended = pd.concat([data_sub[:0], data, data_add[360.1:]], axis=0)
    data_extended = data_extended.interpolate('linear')

    return data_extended

def _detect_changepoint_DIR(data, col1, col2):
    """Detects all significant changes in delta between two directional sensors.

    Args:
        data (DataFrame): DataFrame that includes two wind direction sensors.
        col1 (str): Wind Direction Sensor 1.
        col2 (str): Wind Direction Sensor 2.

    Returns:
        data_bkpts (list): List of potential change points.
    """
    
    data = data[[col1, col2]].copy()

    # Calculates delta between both directional sensors on a daily level
    data_delta = _calculate_delta_DIR(data[col1], data[col2]).resample('D').median()
    data_delta = data_delta.interpolate('linear')
    
    # Finds significant changes within delta
    data_bkpts = _find_breakpoint(data_delta.dropna())[:-1]

    data_bkpts = [data_delta.dropna().index[i] for i in data_bkpts]
    
    return data_bkpts

def _find_breakpoint(data):
    """Detects structural change points in a data set.

    Args:
        data (DataFrame): Data that will be checked for change points.

    Returns:
        bkpts (list): List with indices of detected change points.
    """
    
    model = 'l2'
    algo = rpt.Pelt(model=model, min_size=300, jump=5).fit(np.array(data))

    try:
        bkpts = algo.predict(pen=1000)
    except Exception:
        # Falls back to "no change point" if the search fails
        bkpts = [len(data)]

    return bkpts

def _detect_delta_sensor_DIR(data, col1, col2):
    """Detects and calibrates offsets on a mast level for wind direction sensors.

    Args:
        data (DataFrame): DataFrame that includes two wind direction sensors.
        col1 (str): Wind Direction Sensor 1.
        col2 (str): Wind Direction Sensor 2.
    """

    # Finds significant changes in delta of both directional sensors
    bkpts = _detect_changepoint_DIR(data, col1, col2)
    bkpts.insert(0, data.index.min())
    bkpts.append(data.index.max())
    
    # Finds wind speed pair that maximizes data availability
    spd_pair = _find_SPD_availability(data)
    
    spd_pair_a = spd_pair.copy()
    spd_pair_a.append(col1)

    spd_pair_b = spd_pair.copy()
    spd_pair_b.append(col2)
    
    i = 1
    
    while i < len(bkpts) - 1:
        
        try:
            
            # Calculates delta of directional offsets between pre and post change point
            base = _calculate_delta_DIR(data.loc[bkpts[i-1]:bkpts[i], col1],
                    data.loc[bkpts[i-1]:bkpts[i], col2]).resample('D').median().median()
            
            test = _calculate_delta_DIR(data.loc[bkpts[i]:bkpts[i+1], col1],
                    data.loc[bkpts[i]:bkpts[i+1], col2]).resample('D').median().median()
                                    
            if abs(test-base) > 10:
                
                # Detects offset in both directional sensors and calibrates them 
                # accordingly
                a1 = _group_DIR(data[bkpts[i-1]:bkpts[i]], spd_pair_a)
                a2 = _group_DIR(data[bkpts[i]:bkpts[i+1]], spd_pair_a)
                
                b1 = _group_DIR(data[bkpts[i-1]:bkpts[i]], spd_pair_b)
                b2 = _group_DIR(data[bkpts[i]:bkpts[i+1]], spd_pair_b)
            
                a_delta = _find_best_corr(a1.iloc[:,0], a2.iloc[:,0])
                b_delta = _find_best_corr(b1.iloc[:,0], b2.iloc[:,0])
                
                data.loc[bkpts[i]:bkpts[i+1], col1] += a_delta
                data.loc[bkpts[i]:bkpts[i+1], col2] += b_delta
                
                data.loc[data[col1] >= 360, col1] -= 360
                data.loc[data[col2] >= 360, col2] -= 360
                data.loc[data[col1] < 0, col1] += 360
                data.loc[data[col2] < 0, col2] += 360
                
                # Loops back to the beginning of the dataset and reiterates procedure
                i = 1
            
            else:
                
                i +=1
        
        except Exception:
            
            # Skips change points whose intervals cannot be evaluated
            i += 1
            
def _calculate_availability(data, col):
    """Calculates data availability of a given column.

    Args:
        data (DataFrame): DataFrame that includes the to be checked column.
        col (str): Column that will be checked for data availability.
        
    Returns:
        availability (float): Data availability of column.
    """
    
    availability = 1 - data[col].isna().sum()/data[col].shape[0]
    
    return availability

def _find_SPD_availability(data):
    """Returns a wind speed pair that maximizes data availability.

    Args:
        data (DataFrame): DataFrame that includes the wind speed pairs.
        
    Returns:
        spd_pair (list): Wind speed pair that maximizes data availability.
    """
    
    # Finds wind speed columns and pairs them up by the height token in the
    # column name (third '_'-separated field)
    spd_cols = [i for i in data.columns if 'SPD' in i]
    spd_pairs = {height.split('_')[2]: [col for col in spd_cols
                                        if height.split('_')[2] in col]
                 for height in spd_cols}
    
    availability_dict = {}
    
    # Calculates data availability and finds maximum
    for height in spd_pairs.keys():
        
        availability_dict[height] = np.array([_calculate_availability(data, col)
                                              for col in spd_pairs[height]]).mean()

    availability_max = max(availability_dict, key=lambda k: availability_dict[k])
    
    spd_pair = spd_pairs[availability_max]
    
    return spd_pair

<p>
    In the next step, all met masts in the same location are compared with each other. The algorithm calculates the frequency distribution of the wind direction for each sensor, divided into 16 bins of 22.5° with the first bin boundary at 11.25°. The bins are chosen this way so that there is a bin centered on true north (348.75° to 11.25°). For each sensor, the mid point of its most frequent bin is determined, and the median of these mid points is calculated across all sensors. Over a long period of time the frequency distribution of the wind direction is expected to be similar within one location. Hence the calculated median represents the most likely prevailing wind direction. Each sensor is then adjusted by the difference between the mid point of its most frequent bin and this median, but only if that difference is larger than 25°.
</p>
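
<p>
    The binning and the median across sensors can be sketched as follows. This is an illustration under assumptions, not the notebook's implementation: it uses fixed sector boundaries rather than <code>pd.cut</code>, and the function names and readings are hypothetical.
</p>

```python
import numpy as np

SECTOR_WIDTH = 360.0 / 16  # 22.5 degrees


def modal_sector_midpoint(directions):
    """Mid point (degrees) of the most populated of 16 north-centered sectors."""
    directions = np.asarray(directions, dtype=float) % 360.0
    # Shifting by half a sector puts 348.75 to 11.25 degrees into sector 0
    sector = np.floor((directions + SECTOR_WIDTH / 2) / SECTOR_WIDTH).astype(int) % 16
    counts = np.bincount(sector, minlength=16)
    return float(np.argmax(counts) * SECTOR_WIDTH)


def circular_median(angles_deg):
    """Median of the sine and cosine components, mapped back to 0 to 360 degrees."""
    rad = np.deg2rad(np.asarray(angles_deg, dtype=float))
    angle = np.arctan2(np.median(np.sin(rad)), np.median(np.cos(rad)))
    return float(np.rad2deg(angle) % 360.0)


# Hypothetical readings of three sensors: two see mostly westerly wind,
# the third reports the same wind as coming from the north.
mode_a = modal_sector_midpoint([265.0, 270.0, 275.0, 268.0, 100.0])
mode_b = modal_sector_midpoint([272.0, 266.0, 280.0, 90.0])
mode_c = modal_sector_midpoint([355.0, 2.0, 8.0, 350.0, 180.0])

prevailing = circular_median([mode_a, mode_b, mode_c])
```

<p>
    In this example the third sensor deviates from the prevailing direction by 90°, which exceeds the 25° tolerance and would therefore be corrected.
</p>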

<p>
    The figure below shows, on the left, the met masts' wind direction readings after they have been aligned within themselves and among all three masts. On the right, the weather data used as a reference shows a very similar distribution.
</p>

<p>
    <img width="900" height="500" src="../images/diroffset_overview_after_project.png">
    <a href="../plots/diroffset_overview_after_project.html">Download Plotly File</a>
</p>

In [3]:
def _find_median_DIR(degrees):
    """Calculates the median of multiple angles.

    Args:
        degrees (array): Array with multiple angles.

    Returns:
        degree_median (float): Median angle of angle arrays.
    """

    # Converts the angles to Cartesian components (via radians) and takes the
    # median of each component
    radiant_sin = np.median(np.sin(degrees*np.pi/180))
    radiant_cos = np.median(np.cos(degrees*np.pi/180))

    # arctan2 resolves the quadrant; the modulo maps the result to 0-360 degrees
    degree_median = (np.arctan2(radiant_sin, radiant_cos)*180/np.pi) % 360
    
    return degree_median
    
    
def _detect_DIR_offset_project(project_dict):
    """Detects and calibrates wind direction offsets on a project level.

    Args:
        project_dict (dict): Dictionary with wind direction sensors for the
                        whole project.
    """
    
    DIR_max = []
    channels = []

    # Finds most frequent wind direction bin for each wind direction sensor
    for mast in project_dict.keys():

        data = project_dict[mast][0].copy()

        for col in project_dict[mast][1:]:

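            # Shifts readings of the true north bin below 0 so that the bin is
            # not split at 360 degrees (only the current column is modified)
            data.loc[data[col] > 360-22.5/2, col] -= 360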
            
            DIR_max.append(pd.cut(data[col], bins=16).value_counts().idxmax().mid)
            channels.append(col)
            
    # Calculates delta of each sensors' most frequent bin to all sensors' median
    DIR_median = _find_median_DIR(np.array(DIR_max))
    
    multiples_dec = np.arange(-360, 370, 5)

    DIR_max = np.array(DIR_max) - DIR_median

    for i in range(len(DIR_max)):

        DIR_max[i] = _find_closest_value(DIR_max[i], multiples_dec)
        
    offset_dict = {channels[i]:DIR_max[i] for i in range(len(channels))}
    
    # Calibrates each wind direction sensor
    for mast in project_dict.keys():

        data = project_dict[mast][0]
        
        for channel in project_dict[mast][1:]:
            
            if abs(offset_dict[channel]) > 25:
            
                data.loc[:, channel] -= offset_dict[channel]
                data.loc[data[channel] >= 360, channel] -= 360
                data.loc[data[channel] < 0, channel] += 360

<p>
    In a last step each sensor is tested against the weather data. Similarly to the previous check, the wind direction is binned into 16 sections. The mid point of the sensor's most frequent bin is compared to the mid point of the weather data's most frequent bin. If the difference is larger than 45°, the sensor is adjusted accordingly.
</p>
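
<p>
    The decision itself can be sketched in a few lines, given the mid points of the two most frequent bins. The function name <code>correct_against_reference</code> and the sample values are hypothetical. Note that this sketch wraps the difference into the range -180° to 180° before comparing it to the tolerance, which is an assumption of the sketch and not necessarily how the notebook code below handles it.
</p>

```python
def correct_against_reference(directions, sensor_mode, reference_mode, tolerance=45.0):
    """Shift `directions` onto the reference if the most frequent bins disagree.

    Returns the (possibly corrected) directions and the applied offset.
    """
    # Wrapped difference in the range -180 (inclusive) to 180 (exclusive)
    delta = (sensor_mode - reference_mode + 180.0) % 360.0 - 180.0

    if abs(delta) <= tolerance:
        return list(directions), 0.0

    # Subtracts the offset and wraps the result back into 0 to 360 degrees
    return [(d - delta) % 360.0 for d in directions], delta


# The sensor's most frequent bin is centered on 0 degrees, the weather data's on 270
corrected, offset = correct_against_reference([355.0, 2.0, 8.0], 0.0, 270.0)

# A deviation of one bin width (22.5 degrees) stays within the tolerance
unchanged, no_offset = correct_against_reference([300.0, 290.0], 292.5, 270.0)
```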

<p>
    The figure below shows the plot for each mast's wind vane sensor on the left after offset calibration. The reference on the right shows that all met masts are also aligned with the wind direction of the weather data.
</p>

<p>
    <img width="900" height="500" src="../images/diroffset_overview_after_vmm.png">
    <a href="../plots/diroffset_overview_after_vmm.html">Download Plotly File</a>
</p>

In [4]:
def _detect_DIR_offset_weather(data, col, weather):
    """Detects and calibrates offsets for wind direction sensors by comparing to
    a nearby weather station.

    Args:
        data (DataFrame): DataFrame that includes the to be checked wind direction
                        sensor.
        col (str): Wind direction sensor.
        weather (DataFrame): DataFrame with wind direction of nearby weather station.
    """

    data_mast = data.copy()
    data_mast.loc[data_mast[col] > 360-22.5/2, col] -= 360
    
    data_weather = weather.copy()
    data_weather.loc[data_weather['Weather_DIR'] > 360-22.5/2, 'Weather_DIR'] -= 360

    # Finds most frequent wind direction bin for wind direction sensor and 
    # directional data of nearby weather station
    mast_max = pd.cut(data_mast[col], bins=16).value_counts().idxmax().mid
    weather_max = pd.cut(data_weather['Weather_DIR'], bins=16).value_counts().idxmax().mid

    # Calculates delta of wind direction sensor and directional data of 
    # nearby weather station
    mast_max = mast_max - weather_max
    multiples_dec = np.arange(-360, 361, 25)
    
    mast_max = _find_closest_value(mast_max, multiples_dec)

    # Calibrates wind direction sensor
    if abs(mast_max) > 45:
        
        data.loc[:, col] -= mast_max
        data.loc[data[col] >= 360, col] -= 360
        data.loc[data[col] < 0, col] += 360