# Interpolating weather data from the Vacker web service to create a dataset with uniform time intervals

The weather records retrieved from the VackerVader website ([1](./VackerWeatherLog.ipynb)) are not spaced at regular time intervals.

The 'zoo' library has been used to estimate values for the missing data points by linear interpolation, producing one record every 10 minutes.
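For reference, the linear interpolation performed by `na.approx()` estimates a value at time $t$ between two neighbouring records $(t_0, v_0)$ and $(t_1, v_1)$ as

$$
v(t) = v_0 + \frac{t - t_0}{t_1 - t_0}\,(v_1 - v_0), \qquad t_0 \le t \le t_1 .
$$

For example, with WindSpeed 2.6 at 00:20 and 3.1 at 00:50, the 00:30 estimate is $2.6 + \tfrac{10}{30}(3.1 - 2.6) \approx 2.767$, which matches the `approxWindSpeed` value produced later in the notebook.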


In [8]:
# read the CSV file built from the weather records retrieved from vakervader.se
data <- read.csv("./database/vackerWeather.csv", header=TRUE, sep=",", dec=".")

In [9]:
head(data)

X,timestamp,Temp,Pressure,WindSpeed,Precipitation,Humidity,WindDirection,Visibility,CloudCover,CloudHeight
<int>,<int>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>
0,1545438000,4.0,997.0,2.6,,100,230,10000,0.0,
1,1545439800,5.0,997.0,3.1,,93,220,10000,0.0,
2,1545441600,5.0,997.0,3.6,,93,230,10000,0.0,
3,1545444000,5.2,995.0,3.6,,97,210,50000,25.0,
4,1545445200,5.0,998.0,3.1,,93,210,10000,75.0,975.0
5,1545447600,4.6,995.6,3.1,,97,200,50000,12.5,


In [14]:
# add a POSIXct (R date-time) column converted from each row's Unix timestamp;
# tz = "GMT" makes the displayed times independent of the machine's local time zone
data$dt <- as.POSIXct(data$timestamp, origin = "1970-01-01", tz = "GMT")
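`as.POSIXct()` stores the same instant whichever time zone is chosen, but without `tz` the values are displayed in the session's local time zone, so printed times can differ between machines. A small sketch using the first timestamp from `head(data)`; the choice of `"GMT"` is an assumption that matches the times printed further down.

```r
# first Unix timestamp in the dataset (seconds since 1970-01-01 00:00:00 UTC)
ts <- 1545438000

local_dt <- as.POSIXct(ts, origin = "1970-01-01")             # session time zone
gmt_dt   <- as.POSIXct(ts, origin = "1970-01-01", tz = "GMT") # fixed time zone

# both objects hold the same instant; only the display time zone differs
as.numeric(local_dt) == as.numeric(gmt_dt)   # TRUE
format(gmt_dt, "%Y-%m-%d %H:%M:%S")          # "2018-12-22 00:20:00"
```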

In [18]:
summary(data)

       X           timestamp              Temp           Pressure     
 Min.   :    0   Min.   :1.545e+09   Min.   :-1.400   Min.   : 958.7  
 1st Qu.: 3280   1st Qu.:1.551e+09   1st Qu.: 6.000   1st Qu.:1002.0  
 Median : 6560   Median :1.556e+09   Median : 8.000   Median :1011.0  
 Mean   : 6560   Mean   :1.556e+09   Mean   : 8.276   Mean   :1010.3  
 3rd Qu.: 9840   3rd Qu.:1.561e+09   3rd Qu.:11.000   3rd Qu.:1020.1  
 Max.   :13120   Max.   :1.573e+09   Max.   :22.800   Max.   :1044.0  
                                                                      
   WindSpeed      Precipitation      Humidity      WindDirection  
 Min.   : 0.000   Min.   :11.30   Min.   : 33.00   Min.   : 10.0  
 1st Qu.: 3.600   1st Qu.:15.90   1st Qu.: 77.00   1st Qu.:140.0  
 Median : 5.700   Median :18.50   Median : 86.00   Median :200.0  
 Mean   : 6.066   Mean   :19.07   Mean   : 83.97   Mean   :202.2  
 3rd Qu.: 8.200   3rd Qu.:21.60   3rd Qu.: 93.00   3rd Qu.:270.0  
 Max.   :22.600   Max.   :32.4

In [58]:
# display the first 40 datetimes to see the pattern of record times
data$dt[1:40]

 [1] "2018-12-22 00:20:00 GMT" "2018-12-22 00:50:00 GMT"
 [3] "2018-12-22 01:20:00 GMT" "2018-12-22 02:00:00 GMT"
 [5] "2018-12-22 02:20:00 GMT" "2018-12-22 03:00:00 GMT"
 [7] "2018-12-22 03:20:00 GMT" "2018-12-22 03:50:00 GMT"
 [9] "2018-12-22 04:20:00 GMT" "2018-12-22 05:00:00 GMT"
[11] "2018-12-22 05:20:00 GMT" "2018-12-22 05:50:00 GMT"
[13] "2018-12-22 06:20:00 GMT" "2018-12-22 06:50:00 GMT"
[15] "2018-12-22 07:00:00 GMT" "2018-12-22 07:20:00 GMT"
[17] "2018-12-22 07:50:00 GMT" "2018-12-22 08:20:00 GMT"
[19] "2018-12-22 08:50:00 GMT" "2018-12-22 09:00:00 GMT"
[21] "2018-12-22 09:20:00 GMT" "2018-12-22 10:00:00 GMT"
[23] "2018-12-22 10:20:00 GMT" "2018-12-22 10:50:00 GMT"
[25] "2018-12-22 11:00:00 GMT" "2018-12-22 11:20:00 GMT"
[27] "2018-12-22 11:50:00 GMT" "2018-12-22 12:00:00 GMT"
[29] "2018-12-22 12:20:00 GMT" "2018-12-22 13:00:00 GMT"
[31] "2018-12-22 13:20:00 GMT" "2018-12-22 13:50:00 GMT"
[33] "2018-12-22 14:00:00 GMT" "2018-12-22 14:20:00 GMT"
[35] "2018-12-22 15:00:00 GMT" 

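The interval between records is clearly not uniform: gaps of 10, 20, 30 and 40 minutes all occur in the data above. Every timestamp shown does, however, fall on a whole 10-minute mark, so a regular 10-minute grid can contain all of these original records.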
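The gap lengths can also be counted directly rather than read off the printout. A minimal sketch using only the six timestamps shown by `head(data)`; on the full dataset the same idea would be applied to `data$timestamp`.

```r
# first six Unix timestamps from head(data)
ts <- c(1545438000, 1545439800, 1545441600, 1545444000, 1545445200, 1545447600)

# gap between consecutive records, in minutes
gaps_min <- diff(ts) / 60
gaps_min               # 30 30 40 20 40

# how often each gap length occurs
table(gaps_min)

# TRUE if every record lies a whole number of 10-minute steps after the first,
# which is what a 600-second grid starting at the first record relies on
all((ts - ts[1]) %% 600 == 0)
```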

The interpolation method was adapted from an answer on [StackOverflow](https://stackoverflow.com/questions/33186316/linear-interpolate-missing-values-in-time-series). It uses the 'zoo' library, which is not part of base R and had to be installed separately.

In [35]:
# command to install 'zoo' package
#install.packages("zoo", repos='http://cran.us.r-project.org')



In [43]:
# https://stackoverflow.com/questions/33186316/linear-interpolate-missing-values-in-time-series

# generate a new dataframe containing every 10-minute timeslot, merge in the existing data
# and create new, complete columns by linear interpolation.

library(dplyr)
library(zoo)

# create a dataframe with a datetime column from the first to the last record, every 10 mins (600 seconds)
filleddata <- data.frame(dt = seq(data$dt[1], data$dt[nrow(data)], by = 600))

# map existing data into the new dataframe; arrange() keeps the rows in time order
# even if a record does not fall exactly on the 10-minute grid
filleddata <- full_join(filleddata, data, by = "dt") %>% arrange(dt)

# generate interpolated columns; na.approx() interpolates by row position, which matches
# interpolation in time as long as every row is exactly 10 minutes apart.
# na.rm = FALSE keeps any leading/trailing NAs so each new column has the same length
filleddata <- mutate(filleddata,
                     approxWindSpeed     = na.approx(WindSpeed, na.rm = FALSE),
                     approxPressure      = na.approx(Pressure, na.rm = FALSE),
                     approxTemp          = na.approx(Temp, na.rm = FALSE),
                     approxWindDirection = na.approx(WindDirection, na.rm = FALSE))

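As a cross-check that needs neither dplyr nor zoo, the same regridding can be sketched in base R with `approx()`, which interpolates against the timestamps themselves rather than against row positions. This is a minimal sketch, not the method used in this notebook: `regrid_linear()` is a hypothetical helper, and the sample values are the first six timestamps and WindSpeed readings shown by `head(data)`.

```r
# Hypothetical helper (not part of zoo or dplyr): resample irregular readings
# onto a regular grid by linear interpolation in time, using base R only.
regrid_linear <- function(ts, values, step = 600) {
  # regular grid of Unix timestamps from the first to the last reading
  grid <- seq(min(ts), max(ts), by = step)
  # approx() interpolates linearly between the surrounding readings;
  # grid points that coincide with a reading get that reading's value
  interpolated <- approx(x = ts, y = values, xout = grid)$y
  data.frame(timestamp = grid, value = interpolated)
}

# first six timestamps and WindSpeed values from head(data)
ts <- c(1545438000, 1545439800, 1545441600, 1545444000, 1545445200, 1545447600)
wind_speed <- c(2.6, 3.1, 3.6, 3.6, 3.1, 3.1)

out <- regrid_linear(ts, wind_speed)
head(out, 4)
```

The first four interpolated values (2.6, 2.766667, 2.933333, 3.1) agree with the `approxWindSpeed` column in the output of the next cell.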

In [59]:
filleddata[1:24,c(1,2,3,4,5,6,9,13,14,15,16)]

dt,X,timestamp,Temp,Pressure,WindSpeed,WindDirection,approxWindSpeed,approxPressure,approxTemp,approxWindDirection
<dttm>,<int>,<int>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>
2018-12-22 00:20:00,0.0,1545438000.0,4.0,997.0,2.6,230.0,2.6,997.0,4.0,230.0
2018-12-22 00:30:00,,,,,,,2.766667,997.0,4.333333,226.6667
2018-12-22 00:40:00,,,,,,,2.933333,997.0,4.666667,223.3333
2018-12-22 00:50:00,1.0,1545439800.0,5.0,997.0,3.1,220.0,3.1,997.0,5.0,220.0
2018-12-22 01:00:00,,,,,,,3.266667,997.0,5.0,223.3333
2018-12-22 01:10:00,,,,,,,3.433333,997.0,5.0,226.6667
2018-12-22 01:20:00,2.0,1545441600.0,5.0,997.0,3.6,230.0,3.6,997.0,5.0,230.0
2018-12-22 01:30:00,,,,,,,3.6,996.5,5.05,225.0
2018-12-22 01:40:00,,,,,,,3.6,996.0,5.1,220.0
2018-12-22 01:50:00,,,,,,,3.6,995.5,5.15,215.0
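Linear interpolation treats wind direction as an ordinary number. That works for the 220° to 230° changes above, but fails when the direction crosses north: halfway between 350° and 10° it gives 180° (due south) instead of 0°. One common alternative, not used in this notebook, is to interpolate the sine and cosine of the angle separately and convert back with `atan2()`. The sketch below assumes directions in degrees; `interp_direction()` is a hypothetical helper built on base R's `approx()`.

```r
# Hypothetical helper: interpolate compass directions (degrees) on the circle
# by interpolating their sine and cosine components separately.
interp_direction <- function(ts, deg, xout) {
  rad <- deg * pi / 180
  sin_i <- approx(ts, sin(rad), xout = xout)$y
  cos_i <- approx(ts, cos(rad), xout = xout)$y
  # atan2() returns an angle in (-180, 180] degrees; %% 360 maps it into [0, 360)
  (atan2(sin_i, cos_i) * 180 / pi) %% 360
}

# naive linear interpolation across north points due south
approx(c(0, 600), c(350, 10), xout = 300)$y   # 180

# component-wise interpolation stays near north (0 / 360 degrees)
interp_direction(c(0, 600), c(350, 10), xout = 300)
```

For small changes away from north, such as 220° to 230°, both approaches give almost the same result.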


In [54]:
# create dataframe for export
exportdata <- filleddata[, c("dt", "approxWindSpeed", "approxWindDirection", "approxPressure", "approxTemp")]

In [55]:
head(exportdata)

dt,approxWindSpeed,approxWindDirection,approxPressure,approxTemp
<dttm>,<dbl>,<dbl>,<dbl>,<dbl>
2018-12-22 00:20:00,2.6,230.0,997,4.0
2018-12-22 00:30:00,2.766667,226.6667,997,4.333333
2018-12-22 00:40:00,2.933333,223.3333,997,4.666667
2018-12-22 00:50:00,3.1,220.0,997,5.0
2018-12-22 01:00:00,3.266667,223.3333,997,5.0
2018-12-22 01:10:00,3.433333,226.6667,997,5.0
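Before relying on the exported file, it may be worth confirming that the `dt` column really is evenly spaced. A minimal sketch, assuming a 600-second step; `is_uniform()` is a hypothetical helper, and in this notebook it would be called as `is_uniform(exportdata$dt)`.

```r
# Hypothetical helper: TRUE if a date-time vector advances by exactly
# `step_seconds` between every pair of consecutive values.
is_uniform <- function(dt, step_seconds = 600) {
  gaps <- diff(as.numeric(dt))
  length(gaps) > 0 && all(gaps == step_seconds)
}

start <- as.POSIXct("2018-12-22 00:20:00", tz = "GMT")
grid <- seq(start, by = 600, length.out = 6)

is_uniform(grid)        # TRUE
is_uniform(grid[-3])    # FALSE: the 00:40 slot is missing
```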


In [56]:
# export the interpolated data as a comma-separated values (CSV) file
write.csv(exportdata, './database/InterpolatedVackerWeather.csv')