# Unit Detection for Temperature Sensors

<p>
    <img src='../images/temperature_device.jpg' width=300>
    <p>
        <center>
            <strong>Figure:</strong> Example of a temperature sensor
        </center>
    </p>
</p>

<p>
    The figure above shows a temperature sensor. A data logger is attached to the sensor and records its readings over time. To make sure the sensor and its logger keep working properly, a maintenance technician is sent to each met mast at semi-regular intervals to check whether the readings are still accurate. The predefined process requires the technician to manually compare the sensor's reading with that of a portable temperature sensor.
</p>

<p>
    The standard unit for temperature is supposed to be Celsius. However, especially in the US, the maintenance technician sometimes switches the reading to Fahrenheit in order to compare it with the portable temperature sensor. If this unit change is not detected and converted back to Celsius, the temperature data becomes unusable for further analysis.
</p>

<p>
    The figure below shows a temperature time series covering about two months, measured at a Canadian location. There is a clear difference between the temperatures measured in January and those measured in February. In addition, temperatures above 60°C are implausible in Canada in January. It is likely that the first month was recorded in Fahrenheit and the second month in Celsius.
</p>

<p>
    <img width="900" height="500" src="../images/temperature_timeseries_example.png">
    <a href="../plots/temperature_timeseries_example.html">Download Plotly File</a>
</p>

<p>
    The coordinates of the met mast can be extracted from the master database. These coordinates are then used to find the physically closest weather station, and the historic weather data is pulled from WorldWeatherOnline via an API call. The highest available resolution of the weather data is hourly, while the met mast data has a 10-minute frequency. The weather data can, however, be resampled to a 10-minute frequency by linear interpolation, which gives a good enough approximation. After interpolation, the historic weather data serves as a reference to automatically detect the sensor's unit and convert it to Celsius if needed.
</p>
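
<p>
    The resampling step described above can be sketched as follows. This is a minimal illustration on made-up hourly values, not the code used to produce the figures; the variable names <code>hourly</code> and <code>weather_10min</code> are hypothetical.
</p>

```python
import pandas as pd

# Hypothetical hourly reference temperatures in Celsius (made-up values)
hourly = pd.DataFrame(
    {"tempC": [0.0, 6.0, 3.0]},
    index=pd.to_datetime([
        "2020-01-01 00:00",
        "2020-01-01 01:00",
        "2020-01-01 02:00",
    ]),
)

# Upsample to a 10-minute grid: the new time stamps start out as NaN
# and are then filled by linear interpolation between the hourly values
weather_10min = hourly.resample("10min").asfreq().interpolate("linear")

print(weather_10min)
```

<p>
    Linear interpolation assumes that the temperature changes at a constant rate within each hour, which is usually acceptable for a reference signal but smooths out short-lived fluctuations.
</p>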

<p>
    The figure below shows the temperature data of the sensor in blue (in Fahrenheit and Celsius) and the temperature data retrieved from the nearby weather station (in Celsius). As expected, the temperature readings differ significantly in January and are quite similar in February.
</p>

<p>
    <img width="900" height="500" src="../images/temperature_timeseries_example_weather.png">
    <a href="../plots/temperature_timeseries_example_weather.html">Download Plotly File</a>
</p>

In [1]:
from wwo_hist import retrieve_hist_data
import numpy as np
import pandas as pd

def get_weather_data(date_start, date_end, location):
    """Retrieves the historic temperature data for a location.

    Args:
        date_start (str): Start date of the data set.
        date_end (str): End date of the data set.
        location (str): Latitude/Longitude (decimal degree) or city name

    Returns:
        weather (DataFrame): Historic temperature data for the location.
    """
    
    # Hourly data (highest resolution)
    frequency = 1
    # API key for WorldWeatherOnline
    api_key = '########'
    location_list = [location]
    hist_weather_data = retrieve_hist_data(api_key,
                                location_list,
                                date_start,
                                date_end,
                                frequency,
                                location_label = False,
                                export_csv = False,
                                store_df = True)[0]
    
    # Only keeps pulled temperature data 
    weather = hist_weather_data[['tempC']]
    
    return weather

<p>
    The figure below shows the temperature readings of the sensor and the weather station at the top, and the delta between the two at the bottom. As pointed out above, the delta between the two sources is high when their units differ and low when their units match.
</p>

<p>
    <img width="900" height="500" src="../images/temperature_timeseries_delta_celsius.png">
    <a href="../plots/temperature_timeseries_delta_celsius.html">Download Plotly File</a>
</p>

<p>
    As noted above, the difference between two sensors with different units is larger than between two sensors with the same unit. To exploit this, the weather station's temperature data is converted into Fahrenheit and Kelvin. The figure below shows the delta between the met mast's sensor and the weather station's temperature expressed in three different units (top: Celsius, middle: Fahrenheit, bottom: Kelvin). As expected, the delta is lowest for Fahrenheit in January and lowest for Celsius in February.
</p>
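
<p>
    The idea of comparing the sensor against the reference in all three units can be illustrated with a small sketch on made-up values. The sensor here is assumed to report Fahrenheit with a small offset; <code>deltas</code> and <code>best_unit</code> are hypothetical names used only for this example.
</p>

```python
import pandas as pd

# Hypothetical reference temperatures from the weather station (Celsius)
weather = pd.DataFrame({"tempC": [-10.0, -5.0, 0.0, 5.0]})
weather["tempF"] = weather["tempC"] * 9 / 5 + 32
weather["tempK"] = weather["tempC"] + 273.15

# Hypothetical sensor that reports the same temperatures in Fahrenheit,
# with a small measurement offset of 0.5 degrees
sensor = weather["tempF"] + 0.5

# Mean absolute delta between the sensor and each candidate unit
deltas = {
    unit: (weather[unit] - sensor).abs().mean()
    for unit in ["tempC", "tempF", "tempK"]
}

# The candidate unit with the smallest delta is the most likely sensor unit
best_unit = min(deltas, key=deltas.get)

print(deltas)
print(best_unit)
```

<p>
    In this example the Fahrenheit delta is far smaller than the Celsius and Kelvin deltas, so Fahrenheit is selected.
</p>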

<center>
    <img width="900" height="500" src="../images/temperature_timeseries_delta_all.png">
    <a href="../plots/temperature_timeseries_delta_all.html">Download Plotly File</a>
</center>

<p>
    The figure below shows the time series of the met mast temperature sensor, colored by detected unit: blue represents Fahrenheit and red represents Celsius.
</p>

<p>
    <img width="900" height="500" src="../images/temperature_timeseries_detected_unit.png">
    <a href="../plots/temperature_timeseries_detected_unit.html">Download Plotly File</a>
</p>
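
<p>
    The detection function that follows smooths the deltas with a rolling median over 6*24 samples, which corresponds to one day of 10-minute data. The sketch below, on made-up values, illustrates why a median is a reasonable choice here: isolated spikes barely affect it, whereas they shift a rolling mean. It also shows that the first values of the rolling result are missing, which is why the function fills the unit labels afterwards.
</p>

```python
import pandas as pd

# Hypothetical delta series: two days of 10-minute samples with a constant
# delta of 1.0 and three isolated spikes (made-up values)
index = pd.date_range("2020-01-01", periods=2 * 144, freq="10min")
delta = pd.Series(1.0, index=index)
delta.iloc[[150, 200, 250]] = 50.0

window = 6 * 24  # 144 samples = one day at 10-minute resolution
rolling_median = delta.rolling(window).median()
rolling_mean = delta.rolling(window).mean()

# The first (window - 1) values are NaN because the window is not yet full
n_missing = int(rolling_median.isna().sum())

print(n_missing)
print(rolling_median.iloc[-1])  # unaffected by the three spikes
print(rolling_mean.iloc[-1])    # pulled upwards by the three spikes
```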

In [2]:
def _detect_unit(data, weather):
    """Detects the temperature unit for each time stamp.

    Args:
        data (DataFrame): Sensor data whose temperature unit needs to be \
                            checked for each time stamp.
        weather (DataFrame): Temperature data in Celsius from a close by \
                            weather station.

    Returns:
        data_unit (Series): Detected unit code for each time stamp of the \
                            sensor data (0: Celsius, 1: Fahrenheit, 2: Kelvin).
    """
    
    data = data.copy()
    weather = weather.copy()
    
    # Converting weather station temperature to Kelvin and Fahrenheit
    weather['tempK'] = weather['tempC'] + 273.15
    weather['tempF'] = weather['tempC'] * (9/5) + 32
    
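    # Align the hourly weather data with the sensor time stamps and fill the
    # gaps between hourly values by linear interpolation
    data_merged = data.merge(weather, how='left', left_index=True, right_index=True)
    temp_cols = ['tempC', 'tempK', 'tempF']
    data_merged[temp_cols] = data_merged[temp_cols].interpolate('linear')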
    
    # Calculating delta between sensor and weather station measurements (C, F, K)
    data_merged['delta_tempC'] = abs(data_merged.tempC - data_merged.T_AVG)
    data_merged['delta_tempF'] = abs(data_merged.tempF - data_merged.T_AVG)
    data_merged['delta_tempK'] = abs(data_merged.tempK - data_merged.T_AVG)
    
    # Rolling median to counter outliers
    # (6*24 samples = one day at 10-minute resolution)
    data_merged['roll_tempC'] = data_merged.delta_tempC.rolling(6*24).median()
    data_merged['roll_tempF'] = data_merged.delta_tempF.rolling(6*24).median()
    data_merged['roll_tempK'] = data_merged.delta_tempK.rolling(6*24).median()
    
    # Find the temperature unit that minimizes the delta
    # (0: Celsius, 1: Fahrenheit, 2: Kelvin)
    data_merged['min_temp'] = data_merged[['roll_tempC', 'roll_tempF', \
                                           'roll_tempK']].idxmin(axis=1)
    data_merged.loc[data_merged.min_temp == 'roll_tempC', 'unit'] = 0
    data_merged.loc[data_merged.min_temp == 'roll_tempF', 'unit'] = 1
    data_merged.loc[data_merged.min_temp == 'roll_tempK', 'unit'] = 2
    
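    # Fill time stamps without a detected unit, e.g. the start of the series
    # where the rolling window is not yet full (fillna(method=...) is deprecated)
    data_merged['unit'] = data_merged['unit'].bfill()
    data_merged['unit'] = data_merged['unit'].ffill()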
    
    data_merged = data_merged.loc[~data_merged.index.duplicated()]
    
    data_unit = data_merged['unit']
    
    return data_unit

<p>
    Once the unit for each timestamp has been identified, that information is used to convert any non-Celsius entry to Celsius.
</p>
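
<p>
    A minimal sketch of this conversion on made-up values is shown below. It assumes a column <code>T_AVG</code> with the sensor readings and a column <code>unit</code> with the detected unit code (0: Celsius, 1: Fahrenheit, 2: Kelvin); the readings are chosen so that the expected Celsius values are easy to verify by hand.
</p>

```python
import pandas as pd

# Hypothetical sensor readings with their detected unit codes
# (0: Celsius, 1: Fahrenheit, 2: Kelvin)
data = pd.DataFrame({
    "T_AVG": [0.0, 32.0, 273.15, -40.0],
    "unit": [0, 1, 2, 1],
})

is_fahrenheit = data["unit"] == 1
is_kelvin = data["unit"] == 2

# Convert only the rows flagged as Fahrenheit or Kelvin; Celsius rows stay as-is
data.loc[is_fahrenheit, "T_AVG"] = (data.loc[is_fahrenheit, "T_AVG"] - 32) * 5 / 9
data.loc[is_kelvin, "T_AVG"] = data.loc[is_kelvin, "T_AVG"] - 273.15

converted = data["T_AVG"].tolist()
print(converted)
```

<p>
    The first three readings all describe the freezing point of water and end up at 0 in Celsius, while -40 is the one value at which Fahrenheit and Celsius coincide.
</p>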

<p>
    <img width="900" height="500" src="../images/temperature_timeseries_calibrated_unit.png">
    <a href="../plots/temperature_timeseries_calibrated_unit.html">Download Plotly File</a>
</p>

In [3]:
def convert_F(data):
    """Converts Fahrenheit to Celsius."""

    return (data-32)*5/9

def convert_K(data):
    """Converts Kelvin to Celsius."""
    
    return data-273.15

def _calibrate_unit(data, weather):
    """Converts the sensor reading at each time stamp to Celsius.

    Args:
        data (DataFrame): Sensor data whose temperature unit needs to be \
                            checked for each time stamp.
        weather (DataFrame): Temperature data in Celsius from a close by \
                            weather station.

    Returns:
        data_temp (DataFrame): Sensor temperature data converted to Celsius.
    """

    data = data.copy()
    
    # Labels each time stamp with corresponding temperature unit
    data['unit'] = _detect_unit(data, weather)
    
    # Converts Fahrenheit (unit 1) and Kelvin (unit 2) readings to Celsius
    data.loc[data.unit == 1, 'T_AVG'] = convert_F(data.loc[data.unit == 1, 'T_AVG'])
    data.loc[data.unit == 2, 'T_AVG'] = convert_K(data.loc[data.unit == 2, 'T_AVG'])
    
    data_temp = data[['T_AVG']]
    
    return data_temp