# Table of Contents
 <p><div class="lev1 toc-item"><a href="#Data-Import-and-Preprocessing" data-toc-modified-id="Data-Import-and-Preprocessing-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Data Import and Preprocessing</a></div><div class="lev2 toc-item"><a href="#Distances" data-toc-modified-id="Distances-1.1"><span class="toc-item-num">1.1&nbsp;&nbsp;</span>Distances</a></div><div class="lev1 toc-item"><a href="#Kernel-Specification" data-toc-modified-id="Kernel-Specification-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>Kernel Specification</a></div><div class="lev1 toc-item"><a href="#Using-Masked-kernels" data-toc-modified-id="Using-Masked-kernels-3"><span class="toc-item-num">3&nbsp;&nbsp;</span>Using Masked kernels</a></div><div class="lev1 toc-item"><a href="#Fixed-parameter-Kernel" data-toc-modified-id="Fixed-parameter-Kernel-4"><span class="toc-item-num">4&nbsp;&nbsp;</span>Fixed parameter Kernel</a></div>

In [2]:
using TimeSeries
using DataFrames
using GaussianProcesses
using GaussianProcesses: Mean, Kernel, evaluate, metric, IsotropicData, VecF64
using GaussianProcesses: Stationary, KernelData, MatF64
import GaussianProcesses: optimize!, get_optim_target, cov, grad_slice!
import GaussianProcesses: num_params, set_params!, get_params, update_mll!, update_mll_and_dmll!
import GaussianProcesses: get_param_names, cov!, addcov!, multcov!
import Proj4
using Optim
using Distances
;

In [3]:
import PyPlot; plt=PyPlot
using LaTeXStrings
plt.rc("figure", dpi=300.0)
# plt.rc("figure", figsize=(6,4))
plt.rc("savefig", dpi=300.0)
plt.rc("text", usetex=true)
plt.rc("font", family="serif")
plt.rc("font", serif="Palatino")
;

# Data Import and Preprocessing

In [4]:
include("src/preprocessing.jl")

test_data (generic function with 1 method)

In [5]:
isdList=read_isdList()
isdList[1:5,:]

Unnamed: 0,USAF,WBAN,NAME,CTRY,STATE,ICAO,LAT,LON,ELEV,BEGIN,END,X_PRJ,Y_PRJ
1,10010,99999,JAN MAYEN(NOR-NAVY),NO,,ENJA,70.933,-8.667,9.0,1931,2015,4554500.0,6113440.0
2,10060,99999,EDGEOYA,NO,,,78.25,22.817,14.0,1973,2015,4049820.0,7556400.0
3,10070,99999,NY-ALESUND,SV,,,78.917,11.933,7.7,1973,2015,3867800.0,7265490.0
4,10080,99999,LONGYEAR,SV,,ENSB,78.246,15.466,26.8,1975,2015,3997050.0,7336690.0
5,10090,99999,KARL XII OYA,SV,,,80.65,25.0,5.0,1955,2015,3692590.0,7685450.0


In [6]:
isdSubset=isdList[[(usaf in (725450,725460,725480,725485)) for usaf in isdList[:USAF].values],:]
isdSubset

Unnamed: 0,USAF,WBAN,NAME,CTRY,STATE,ICAO,LAT,LON,ELEV,BEGIN,END,X_PRJ,Y_PRJ
1,725450,14990,THE EASTERN IOWA AIRPORT,US,IA,KCID,41.883,-91.717,264.6,1973,2015,1647990.0,1044100.0
2,725460,14933,DES MOINES INTERNATIONAL AIRPORT,US,IA,KDSM,41.534,-93.653,291.7,1973,2015,1487230.0,1003790.0
3,725480,94910,WATERLOO MUNICIPAL AIRPORT,US,IA,KALO,42.554,-92.401,264.6,1960,2015,1590250.0,1117660.0
4,725485,14940,MASON CITY MUNICIPAL ARPT,US,IA,KMCW,43.154,-93.327,373.4,1973,2015,1514070.0,1183740.0


In [7]:
hourly_cat=read_Stations(isdSubset)
hourly_cat[1:5,:]

Unnamed: 0,year,month,day,hour,min,seconds,temp,ts,station,ts_hours
1,2015,1,1,0,52,0,-7.8,2015-01-01T00:52:00,1,0.866667
2,2015,1,1,1,52,0,-8.3,2015-01-01T01:52:00,1,1.86667
3,2015,1,1,2,52,0,-8.3,2015-01-01T02:52:00,1,2.86667
4,2015,1,1,3,52,0,-9.4,2015-01-01T03:52:00,1,3.86667
5,2015,1,1,4,52,0,-9.4,2015-01-01T04:52:00,1,4.86667


## Distances

To get distances between stations, we can either compute great-circle distances on a sphere, or first project the coordinates onto a Euclidean plane and then compute ordinary Euclidean distances. I'll do it both ways to check that they're consistent (equal up to a multiplicative constant), and then use Euclidean distances for convenience.

In [8]:
# http://www.johndcook.com/blog/python_longitude_latitude/
function distance_on_unit_sphere(lat1, long1, lat2, long2)
 
    # Convert latitude and longitude to 
    # spherical coordinates in radians.
    degrees_to_radians = π/180.0
         
    # phi = 90 - latitude
    phi1 = (90.0 - lat1)*degrees_to_radians
    phi2 = (90.0 - lat2)*degrees_to_radians
         
    # theta = longitude
    theta1 = long1*degrees_to_radians
    theta2 = long2*degrees_to_radians
         
    # Compute spherical distance from spherical coordinates.
         
    # For two locations in spherical coordinates 
    # (1, theta, phi) and (1, theta', phi')
    # cosine( arc length ) = 
    #    sin phi sin phi' cos(theta-theta') + cos phi cos phi'
    # distance = rho * arc length
     
    cosangle = (sin(phi1)*sin(phi2)*cos(theta1 - theta2) +
           cos(phi1)*cos(phi2))
    # Clamp so that rounding error cannot push the argument outside [-1, 1].
    arc = acos( clamp(cosangle, -1.0, 1.0) )
 
    # Remember to multiply arc by the radius of the earth 
    # in your favorite set of units to get length.
    return arc
end

distance_on_unit_sphere (generic function with 1 method)
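
The law-of-cosines form above can lose precision for points that are very close together, because `acos` is ill-conditioned near 1. A minimal sketch of the haversine form, a standard alternative, is given here as a cross-check. The name `haversine_unit_sphere` is my own and does not come from the notebook's sources.

```julia
# Haversine form of the great-circle arc on the unit sphere.
# Inputs are in degrees; the result is an angle in radians.
function haversine_unit_sphere(lat1, long1, lat2, long2)
    φ1, φ2 = deg2rad(lat1), deg2rad(lat2)
    Δφ = φ2 - φ1
    Δλ = deg2rad(long2 - long1)
    a = sin(Δφ/2)^2 + cos(φ1)*cos(φ2)*sin(Δλ/2)^2
    # min guards against rounding pushing sqrt(a) slightly above 1
    return 2*asin(min(1.0, sqrt(a)))
end

# Cedar Rapids (KCID) to Des Moines (KDSM), coordinates from the station table
arc = haversine_unit_sphere(41.883, -91.717, 41.534, -93.653)
```

As with `distance_on_unit_sphere`, the result still has to be multiplied by the Earth's radius to get a length.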

In [9]:
numstations = nrow(isdSubset)
pairwiseSphere = zeros(numstations, numstations)
for i in 1:numstations
    for j in 1:i
        if i==j
            continue
        end
        station1 = isdSubset[i,:]
        station2 = isdSubset[j,:]
        lat1= get(station1[1,:LAT])
        lon1 = get(station1[1,:LON])
        lat2 = get(station2[1,:LAT])
        lon2 = get(station2[1,:LON])
        pairwiseSphere[i,j] = distance_on_unit_sphere(lat1, lon1, lat2, lon2)
        pairwiseSphere[j,i] = pairwiseSphere[i,j]
    end
end
pairwiseSphere

4×4 Array{Float64,2}:
 0.0        0.0259496  0.0146736  0.0303475
 0.0259496  0.0        0.024088   0.0285853
 0.0146736  0.024088   0.0        0.0158124
 0.0303475  0.0285853  0.0158124  0.0      

In [10]:
pairwiseEuclid=pairwise(Euclidean(), Matrix(isdSubset[[:X_PRJ,:Y_PRJ]])')

4×4 Array{Float64,2}:
      0.0        165736.0        93510.4        1.93474e5
 165736.0             0.0            1.53559e5  1.81942e5
  93510.4             1.53559e5      0.0        1.00846e5
      1.93474e5       1.81942e5      1.00846e5  0.0      

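Ratio of the two distance matrices (the diagonal is NaN because it is 0/0): the off-diagonal entries are all close to 6.37e6, roughly the Earth's radius in meters, so the two distances agree up to a constant.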

In [11]:
pairwiseEuclid ./ pairwiseSphere

4×4 Array{Float64,2}:
 NaN            6.38684e6    6.37271e6    6.37527e6
   6.38684e6  NaN            6.37493e6    6.36489e6
   6.37271e6    6.37493e6  NaN            6.37765e6
   6.37527e6    6.36489e6    6.37765e6  NaN        
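
To put a number on "close enough", here is a small sketch that measures the relative spread of the off-diagonal ratios. The values are copied from the output above; the reference radius of 6.371e6 m is an assumption I am adding for comparison, not something the notebook uses.

```julia
# Off-diagonal ratios copied from the output above (meters per radian)
ratios = [6.38684e6, 6.37271e6, 6.37527e6, 6.37493e6, 6.36489e6, 6.37765e6]
mean_ratio = sum(ratios) / length(ratios)

# Relative spread of the ratios around their mean
rel_spread = (maximum(ratios) - minimum(ratios)) / mean_ratio

# Assumed reference value: mean Earth radius of about 6.371e6 m
earth_radius = 6.371e6
rel_gap = abs(mean_ratio - earth_radius) / earth_radius
```

Both `rel_spread` and `rel_gap` come out well under one percent, which is small compared with the uncertainty we should expect in a fitted spatial lengthscale.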

# Kernel Specification

Use the time series kernel fitted in `JuliaGP_timeseries_chunks.ipynb`, with the hyperparameters fitted there.

In [12]:
k1 = fix(Periodic(0.0,0.0,log(24.0)), :lp)
k2 = RQIso(0.0,0.0,0.0)
k3 = SEIso(0.0,0.0)
k4 = RQIso(0.0,0.0,0.0)
k5 = RQIso(0.0,0.0,0.0)
k6 = SE(0.0,0.0)
k_time=k1+k2+k3+k4+k5+k6
# hyperparameters fitted in JuliaGP_timeseries_chunks.ipynb
hyp=[-1.4693,-0.0806483,1.0449,1.50786,1.10795,-1.38548,-1.22736,-1.05138,3.09723,1.28737,2.84127,3.64666,0.469691,3.00962,7.70695,-5.39838]
set_params!(k_time, hyp[2:end])

The spatial kernel is just a squared exponential kernel. I don't think we have enough stations to do anything fancier than that.

In [13]:
k_spatial = SEIso(log(2*10^5), log(1.0))

Type: GaussianProcesses.SEIso, Params: [12.2061,0.0]
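
For intuition about what this initial lengthscale implies, here is a plain-Julia sketch of the squared exponential covariance as a function of distance, evaluated at the Euclidean distances from station 1 computed earlier. The helper `se_cov` is hypothetical and only mirrors the usual formula $\sigma^2 \exp(-d^2 / 2\ell^2)$; it is not a call into GaussianProcesses.jl.

```julia
# Squared exponential covariance as a function of distance d.
# ℓ is the lengthscale and σ the signal standard deviation.
se_cov(d, ℓ, σ) = σ^2 * exp(-d^2 / (2 * ℓ^2))

ℓ = 2e5   # meters, matches log(2*10^5) above
σ = 1.0   # matches log(1.0) above

# Distances between station 1 and all four stations, from pairwiseEuclid
d = [0.0, 165736.0, 93510.4, 1.93474e5]
correlations = [se_cov(di, ℓ, σ) for di in d]
```

With $\sigma = 1$ the covariances are also correlations, so they show directly how strongly the prior ties each station to station 1 before any fitting.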


In [17]:
k_longrange = k1+k3+k4+k5+k6;

In [19]:
k_spatiotemporal = fix(Masked(k_longrange, [1])) * Masked(k_spatial, [2,3]) + fix(Masked(k4, [1]))

Type: GaussianProcesses.SumKernel
  Type: GaussianProcesses.ProdKernel
    Type: GaussianProcesses.FixedKern, Params: Float64[]
    Type: GaussianProcesses.Masked{GaussianProcesses.SEIso}, Params: [12.2061,0.0]
  Type: GaussianProcesses.FixedKern, Params: Float64[]
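
The structure of this kernel is a temporal kernel multiplied by a spatial kernel, plus a second temporal term, with each factor only seeing some of the input columns. The sketch below imitates that structure in plain Julia. The helpers `se` and `masked` and all hyperparameter values are made up for illustration; this is not how GaussianProcesses.jl implements `Masked` or `fix`.

```julia
# Plain-Julia illustration only; these helpers are hypothetical and
# are not the GaussianProcesses.jl implementation.
se(x, y, ℓ, σ) = σ^2 * exp(-sum((x .- y).^2) / (2 * ℓ^2))

# Restrict a kernel to the given input dimensions, like a mask.
masked(k, dims) = (x, y) -> k(x[dims], y[dims])

# Illustrative hyperparameters, not the fitted ones
k_t     = masked((x, y) -> se(x, y, 48.0, 2.0), [1])     # time only
k_s     = masked((x, y) -> se(x, y, 2e5, 1.0), [2, 3])   # space only
k_short = masked((x, y) -> se(x, y, 3.0, 0.5), [1])      # time only

# Separable space-time term plus a purely temporal term
k_st(x, y) = k_t(x, y) * k_s(x, y) + k_short(x, y)

a = [10.0, 1647990.0, 1044100.0]   # (hours, X_PRJ, Y_PRJ)
b = [10.0, 1487230.0, 1003790.0]   # same time, different station
```

Evaluating `k_st(a, b)` against `k_st(a, a)` shows the effect of the product: at equal times, only the spatial factor reduces the covariance between two stations.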


In [26]:
chunks=GP[]
chunk_width=24*10
tstart=0.0
tend=tstart+chunk_width
nobsv=0
while tstart < get(maximum(hourly_cat[:ts_hours]))
    in_chunk=(tstart .<= hourly_cat[:ts_hours].values) & (hourly_cat[:ts_hours].values .< tend)
    hourly_chunk = hourly_cat[in_chunk,:]
    nobsv_chunk = sum(in_chunk)
    nobsv += nobsv_chunk
    
    chunk_X_PRJ = isdSubset[:X_PRJ].values[hourly_chunk[:station].values]
    chunk_Y_PRJ = isdSubset[:Y_PRJ].values[hourly_chunk[:station].values]
    chunk_X = [hourly_chunk[:ts_hours].values chunk_X_PRJ chunk_Y_PRJ]
    
    y = hourly_chunk[:temp].values
    chunk = GP(chunk_X', y, MeanConst(mean(y)), k_spatiotemporal, 0.0)
    push!(chunks, chunk)
    
    tstart=tend
    tend+=chunk_width
end
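
The loop above cuts the time axis into consecutive half-open windows `[tstart, tend)` of ten days each. Here is a stripped-down sketch of just that partitioning step on toy data. The function `chunk_indices` is hypothetical, and unlike the loop above it uses `<=` in the loop condition, so that an observation falling exactly on the last window boundary still gets a chunk.

```julia
# Split observation times into half-open windows [tstart, tend).
# Returns one vector of indices per window.
function chunk_indices(ts, width)
    chunks = Vector{Int}[]
    tstart = 0.0
    while tstart <= maximum(ts)
        tend = tstart + width
        idx = [i for i in 1:length(ts) if tstart <= ts[i] < tend]
        push!(chunks, idx)
        tstart = tend
    end
    return chunks
end

# Toy timestamps in hours, with 10-day (240 h) windows
ts = [0.5, 1.5, 239.9, 240.0, 300.0, 480.0]
toy_chunks = chunk_indices(ts, 240.0)
```

Every observation lands in exactly one window, which is what makes it reasonable to count `nobsv` by summing over chunks.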

In [27]:
include("src/TempModel.jl")



TempModel

In [28]:
reals = TempModel.GPRealisations(chunks);

In [29]:
@time update_mll_and_dmll!(reals; mean=false)

 48.968031 seconds (661.74 M allocations: 16.332 GB, 16.71% gc time)


4-element Array{Float64,1}:
 11951.9   
    -3.2778
  1421.33  
 13882.1   
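
The notebook does not show `src/TempModel.jl`, so the following is an assumption about what is being optimized rather than a description of the actual code. If the chunks are treated as independent realisations that share the kernel hyperparameters $\theta$ and the noise $\sigma_y$, each with its own constant mean $m_c$, then the objective would be the sum of the per-chunk log marginal likelihoods:

$$
\log p(y \mid \theta, \sigma_y) = \sum_{c=1}^{C} \left[ -\tfrac{1}{2} (y_c - m_c)^\top \Sigma_c^{-1} (y_c - m_c) - \tfrac{1}{2} \log\det \Sigma_c - \tfrac{n_c}{2} \log 2\pi \right],
\qquad \Sigma_c = K_\theta(X_c) + \sigma_y^2 I
$$

Here $X_c$, $y_c$ and $n_c$ are the inputs, temperatures and number of observations in chunk $c$. Under this reading, the cost of each evaluation is dominated by one Cholesky factorization per chunk rather than one factorization of the full dataset.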

In [30]:
@time opt_out=optimize!(reals, mean=false, show_trace=true, x_tol=1e-5, f_tol=1e-5);

Iter     Function value   Gradient norm 
     0     9.031502e+04     1.388209e+04
Base.LinAlg.PosDefException(422)
     1     8.943582e+04     2.046723e+04
     2     7.972682e+04     1.566036e+04
     3     7.620220e+04     3.213852e+03
     4     7.577909e+04     1.778986e+03
     5     7.488669e+04     4.218337e+03
     6     7.466434e+04     4.926334e+03
     7     7.394336e+04     6.257254e+03
     8     7.332750e+04     4.134967e+02
     9     7.332461e+04     2.289585e+02
    10     7.332315e+04     2.293577e+01
    11     7.332314e+04     1.571761e+01
1522.435704 seconds (22.23 G allocations: 526.426 GB, 17.01% gc time)


In [31]:
print(Optim.minimizer(opt_out))

[-1.65876,11.5213,1.24961]
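
These values are on the log scale. Assuming the order is log noise, log spatial lengthscale, log spatial signal standard deviation (which is what matching them against the values printed further down suggests), a short sketch of the conversion back to natural units:

```julia
# Minimizer copied from the output above; entries are on the log scale.
# Assumed order: [log σy, log ℓ, log σ]
θ = [-1.65876, 11.5213, 1.24961]
σy, ℓ_spatial, σ_spatial = map(exp, θ)

# Lengthscale in other units (1 mile = 1609.344 m)
ℓ_km = ℓ_spatial / 1000
ℓ_miles = ℓ_spatial / 1609.344
```

The lengthscale comes out at roughly 101 km, or about 63 miles.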

In [36]:
reals.reals[1].m

Type: GaussianProcesses.MeanConst, Params: [-12.0992]


In [35]:
reals.reals[2].m

Type: GaussianProcesses.MeanConst, Params: [-4.06234]


In [33]:
print("\nk: SEIso \n=================\n")
@printf("σ: %5.3f\n", √k_spatial.σ2)
@printf("l: %5.3f\n", √k_spatial.ℓ2)
print("\nk: RQIso \n=================\n")
@printf("σ: %5.3f\n", √k4.σ2)
@printf("l: %5.3f\n", √k4.ℓ2)
print("\n=================\n")
@printf("σy: %5.3f\n", exp(reals.logNoise))


k: SEIso 
σ: 3.489
l: 100845.105

k: RQIso 
σ: 3.623
l: 22.137

σy: 0.190


This gives us a pretty convincing fit for the spatial component of the covariance. Note that the lengthscale is in meters, and 100 845 m $\approx$ 63 miles. The noise standard deviation $\sigma_y$ has gone down a little bit.

In [29]:
cov(k_spatial, 0.0)

1.30407992075188
