We load three datasets that we want to merge into a single dataset. `cpt` contains all waypoints of chimpanzee travel, with one point recorded every 15 m on average; it also records whether each point can be considered a change point. `wp` contains only the points where chimpanzees showed a particular activity, such as feeding or resting. Finally, `tr` is a GeoJSON file that contains cleaned paths. Because the order and number of waypoints in these cleaned routes may differ from those in the other datasets, we preserve the structure of `tr` and add to it the information we need from the other datasets. In the process, we also create the segments that we will use for modelling.

We start with the `cpt` dataset and parse the datetime information:

In [1]:
using DataFrames, CSV, Dates, JSON, Query

cpt = CSV.read("data/cpt.csv", DataFrame)
cpt.date = Date.(cpt.xdate, "yyyy/mm/dd")
cpt.datetime = cpt.date .+ cpt.stime  # Date + Time gives a DateTime
cpt.cpt = cpt.q5                      # q5 holds the change-point flag
sort!(cpt, :datetime)                 # chronological order

│ Row │ OBJECTID │ xdate      │ prg   │ code    │ xname  │ xtime      │ focal  │ lat     │ lon     │
│     │ Int64    │ String     │ Int64 │ Missing │ String │ String     │ String │ Float64 │ Float64 │
├─────┼──────────┼────────────┼───────┼─────────┼────────┼────────────┼────────┼─────────┼─────────┤
│ 1   │ 1        │ 2007/10/04 │ 0     │ missing │ s0     │ 1899/12/30 │ T3     │ 0.57843 │ 30.3669 │
│ 2   │ 2        │ 2007/10/04 │ 1     │ missing │ s0     │ 1899/12/30 │ T3     │ 0.57837 │ 30.3671 │
│ 3   │ 91       │ 2007/10/04 │ 2     │ missing │ s0     │ 1899/12/30 │ T3     │ 0.57818 │ 30.3671 │
│ 4   │ 92       │ 2007/10/04 │ 3     │ missing │ s0     │ 1899/12/30 │ T3     │ 0.57798 │ 30.3671 │
│ 5   │ 93       │ 2007/10/04 │ 0     │ missing │ s1     │ 1899/12/30 │ A3     │ 0.57778 │ 30.3664 │
│ 6   │ 94       │ 2007/10/04 │ 1     │ missing │ s1     │ 1899/12/30 │ A3     │ 0.57782 │ 30.3668 │
│ 7   │ 95       │ 2007/10/04 │ 2     │ missing │ s1     │ 1899/12/30 │ A3     │ 0.57783 │ 30.3669 │
│ 8   │ 96       │ 2007/10/04 │ 3     │ missing │ s1     │ 1899/12/30 │ A3     │ 0.57788 │ 30.367  │
│ 9   │ 97       │ 2007/10/04 │ 4     │ missing │ s1     │ 1899/12/30 │ A3     │ 0.57792 │ 30.3671 │
│ 10  │ 98       │ 2007/10/04 │ 5     │ missing │ s1     │ 1899/12/30 │ A3     │ 0.57793 │ 30.3672 │
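A minimal sketch of the datetime construction on hypothetical values (the vectors below only stand in for the `xdate` and `stime` columns of `cpt.csv`): adding a `Time` to a `Date` yields a `DateTime`, and sorting these values gives the chronological row order that `sort!` produces.

```julia
using Dates

# Hypothetical stand-ins for cpt.xdate (yyyy/mm/dd strings) and cpt.stime (Time values)
xdate = ["2007/10/04", "2007/10/04", "2007/10/04"]
stime = [Time(8, 15, 0), Time(7, 50, 30), Time(8, 2, 10)]

dates = Date.(xdate, "yyyy/mm/dd")   # parse the date strings
datetime = dates .+ stime            # Date + Time gives a DateTime
order = sortperm(datetime)           # the row order that sorting by :datetime yields
sorted = datetime[order]
```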


Next, we process the `wp` data: we parse the datetimes and create a column holding the stationary time spent at each waypoint.

In [2]:
wp = CSV.read("data/wp.csv", DataFrame)
wp.datetime = Date.(wp.xdate, "dd/mm/yyyy") .+ wp.xtIN  # wp dates are day-first
wp.stattime = wp.xtOUT .- wp.xtIN;                        # stationary time at waypoint
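The same steps applied to `wp`, again as a sketch with hypothetical values standing in for `xdate`, `xtIN` and `xtOUT`. Note the day-first date format, and that subtracting two `Time` values gives a `Nanosecond` period; the conversion to minutes is only for readability and is not part of the notebook.

```julia
using Dates

# Hypothetical stand-ins for wp.xdate (dd/mm/yyyy strings), wp.xtIN and wp.xtOUT
xdate = ["04/10/2007", "04/10/2007"]
xtIN  = [Time(8, 15), Time(9, 40)]
xtOUT = [Time(8, 47), Time(10, 5, 30)]

datetime = Date.(xdate, "dd/mm/yyyy") .+ xtIN   # day-first layout, unlike cpt.csv
stattime = xtOUT .- xtIN                        # Time - Time gives a Nanosecond period
minutes  = Dates.value.(stattime) ./ 60e9       # nanoseconds -> minutes (Float64)
```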

We then dummy-code the activity information in the `actcode` column. Every row of `wp` has recorded activity, and a code whose second character is `F` marks feeding.

In [3]:
wp.act = ones(Int, nrow(wp))                     # wp only includes points with activity
wp.feeding = [x[2] == 'F' for x in wp.actcode];  # feeding if the second character is 'F'
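The comprehension above assumes that every `actcode` is present and has at least two characters. A slightly more defensive variant, sketched here on hypothetical codes (the actual code vocabulary is not shown in this notebook), treats missing or one-character codes as non-feeding; `is_feeding` is an illustrative helper, not part of the original pipeline.

```julia
# Hypothetical activity codes; only the second character is inspected
actcode = ["XF", "XR", missing, "F"]

# true only if the code exists, has at least two characters and the second is 'F'
is_feeding(x) = !ismissing(x) && length(x) >= 2 && x[nextind(x, 1)] == 'F'

feeding = is_feeding.(actcode)
```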

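After that, we merge both datasets into a single DataFrame, `cpt_wp`, using a left join on `datetime`: every `cpt` point is kept, and the `wp` information is attached wherever a waypoint has the same timestamp.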

In [4]:
# create cpt_wp with all the points; if a point is a waypoint, include the additional wp info
cpt_wp = leftjoin(cpt[:, [:focal, :prg, :datetime, :date, :lat, :lon, :tparty, :altitude, :cpt]],
                  wp[:, [:datetime, :stattime, :act, :feeding]],
                  on = :datetime)
sort!(cpt_wp, :datetime);  # join output order is not guaranteed; later steps rely on chronological order
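The join matches rows on exact `datetime` equality, so a waypoint whose timestamp differs from every `cpt` timestamp, even by one second, attaches to no point. The sketch below mimics the left-join semantics with a plain `Dict` on hypothetical timestamps instead of DataFrames, to show where the `missing` values handled in the next cell come from.

```julia
using Dates

# Hypothetical timestamps of three cpt points and one wp waypoint
cpt_datetime = [DateTime(2007, 10, 4, 8, 0), DateTime(2007, 10, 4, 8, 5), DateTime(2007, 10, 4, 8, 10)]
wp_feeding = Dict(DateTime(2007, 10, 4, 8, 5) => true)   # datetime => feeding flag

# Left join on datetime: every cpt point is kept; wp values attach only on an exact match
act     = [haskey(wp_feeding, t) for t in cpt_datetime]
feeding = [get(wp_feeding, t, missing) for t in cpt_datetime]

# Unmatched rows carry `missing`, which is then read as "no feeding"
feeding_filled = coalesce.(feeding, false)
```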

In [5]:
# points without a wp match have no activity or feeding; a missing change-point flag means "no change point"
replace!(cpt_wp.act, missing => 0)
replace!(cpt_wp.cpt, missing => 0)
replace!(cpt_wp.feeding, missing => false)
cpt_wp.act = Bool.(cpt_wp.act)  # Int => Bool
cpt_wp.cpt = Bool.(cpt_wp.cpt)

# drop a point if it has the same coordinates as the point directly before it
dup = vcat(false, [cpt_wp.lat[i] == cpt_wp.lat[i - 1] && cpt_wp.lon[i] == cpt_wp.lon[i - 1] for i in 2:nrow(cpt_wp)])
cpt_wp = cpt_wp[.!dup, :]

print(cpt_wp[1:15, [:focal, :prg, :lat, :lon, :cpt, :act, :feeding]])

15×7 DataFrame
│ Row │ focal  │ prg   │ lat     │ lon     │ cpt  │ act  │ feeding │
│     │ String │ Int64 │ Float64 │ Float64 │ Bool │ Bool │ Bool?   │
├─────┼────────┼───────┼─────────┼─────────┼──────┼──────┼─────────┤
│ 1   │ T3     │ 0     │ 0.57843 │ 30.3669 │ 0    │ 0    │ 0       │
│ 2   │ T3     │ 1     │ 0.57837 │ 30.3671 │ 0    │ 0    │ 0       │
│ 3   │ T3     │ 2     │ 0.57818 │ 30.3671 │ 0    │ 0    │ 0       │
│ 4   │ T3     │ 3     │ 0.57798 │ 30.3671 │ 0    │ 0    │ 0       │
│ 5   │ A3     │ 0     │ 0.57778 │ 30.3664 │ 0    │ 0    │ 0       │
│ 6   │ A3     │ 1     │ 0.57782 │ 30.3668 │ 0    │ 0    │ 0       │
│ 7   │ A3     │ 2     │ 0.57783 │ 30.3669 │ 0    │ 0    │ 0       │
│ 8   │ A3     │ 3     │ 0.57788 │ 30.367  │ 0    │ 0    │ 0       │
│ 9   │ A3     │ 4     │ 0.57792 │ 30.3671 │ 0    │ 0    │ 0       │
│ 10  │ A3     │ 5     │ 0.57793 │ 30.3672 │ 0    │ 0    │ 0       │
│ 11  │ A3     │ 6
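The duplicate filter compares each point only with the one directly before it, so it removes consecutive repeats but keeps a point that later returns to an earlier location. A small sketch on hypothetical coordinates, using plain vectors instead of the `cpt_wp` columns:

```julia
# Hypothetical coordinates: point 2 repeats point 1, point 4 repeats point 3,
# and point 5 returns to the location of point 1
lat = [0.57843, 0.57843, 0.57818, 0.57818, 0.57843]
lon = [30.3669, 30.3669, 30.3671, 30.3671, 30.3669]

# a point is a duplicate only if it equals the point directly before it
dup = vcat(false, [lat[i] == lat[i - 1] && lon[i] == lon[i - 1] for i in 2:length(lat)])
keep = findall(.!dup)
```

Because the later point of each repeated pair is the one dropped, any `wp` information attached only to that later point does not survive the filter.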

We are interested in travel that is potentially intentional. We therefore split each day's travel into segments that run from one point of activity to another, where the Change Point Test suggests that the point of activity is an ultimate destination of travel rather than just a proximate beacon. The function `count` iterates over the points and increments the segment count by one after each co-occurrence of a change point and recorded activity; a change point counts as co-occurring if it lies within two points of the activity point and the window belongs to the same focal individual and day. Requiring recorded activity guards against change points that arise only from topological features (a road system, ridge or river that forces a turn). We create two datasets: in A, a change point ends a segment if it coincides with recorded activity of any kind; in F, only if it coincides with feeding.

We use the function `count` to create a column `segment` which takes the form `<focal individual>_<date>_<count>`.
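The segmentation rule, and the way the export loops further down duplicate segment-ending points, can be sketched on plain vectors. `segment_numbers` and `shared_endpoint_rows` are hypothetical helpers written for illustration, not the notebook's `count`; they assume the input is already in chronological order and clamp the two-point window at the ends of the data.

```julia
using Dates

# A point ends a segment if it has activity and a change point lies within two points
# of it (checked only when the window stays within one focal individual and day);
# numbering restarts for every new focal individual or day.
function segment_numbers(focal::Vector{String}, days::Vector{Date}, act::Vector{Bool}, cpt::Vector{Bool})
    n = length(focal)
    seg = zeros(Int, n)
    ends = Int[]
    num = 1
    for i in 1:n
        if i > 1 && (focal[i] != focal[i - 1] || days[i] != days[i - 1])
            num = 1                                   # new focal individual or new day
        end
        seg[i] = num
        lo, hi = max(i - 2, 1), min(i + 2, n)         # up to two points either side
        same_track = focal[lo] == focal[hi] && days[lo] == days[hi]
        if act[i] && (same_track ? any(cpt[lo:hi]) : cpt[i])
            num += 1                                  # the next point starts a new segment
            push!(ends, i)
        end
    end
    return seg, ends
end

# Each segment-ending point is emitted twice: once for its own segment and once,
# relabelled, as the first point of the following segment.
function shared_endpoint_rows(labels::Vector{String}, ends::Vector{Int})
    rows = Tuple{Int,String}[]                        # (source row, segment label)
    for i in eachindex(labels)
        push!(rows, (i, labels[i]))
        if i in ends && i < length(labels)
            push!(rows, (i, labels[i + 1]))
        end
    end
    return rows
end

# Hypothetical track: one day of focal T3 followed by focal A3
focal = ["T3", "T3", "T3", "T3", "T3", "A3", "A3"]
days  = fill(Date(2007, 10, 4), 7)
act   = [false, false, true, false, false, false, false]
cpt   = [false, false, false, true, false, false, false]

seg, ends = segment_numbers(focal, days, act, cpt)
labels = ["$(focal[i])_$(days[i])_$(seg[i])" for i in eachindex(seg)]
rows = shared_endpoint_rows(labels, ends)
```

Here the activity at point 3 has a change point one step later, so point 3 closes segment 1 and also appears as the first row of segment 2, while the switch to A3 restarts the numbering.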

In [6]:
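# i: index of the row processed last; num: current segment number
mutable struct Counter
    i::Int
    num::Int
end
counter1 = Counter(1, 1)
counter2 = Counter(1, 1)

# Return the segment number of row i and advance it after a segment-ending point:
# a point with activity in `col` and a change point within two points either side
function count(df::DataFrame, counter::Counter, i::Int, col::Symbol, indices::Vector{Int})
    # restart numbering for a new focal individual or a new day
    if df.focal[counter.i] != df.focal[i] || Date(df.datetime[counter.i]) != Date(df.datetime[i])
        counter.num = 1
    end
    counter.i = i
    ret = counter.num
    lo = max(i - 2, 1)         # up to two points before ...
    hi = min(i + 2, nrow(df))  # ... and after point i
    same_track = df.focal[lo] == df.focal[hi] && Date(df.datetime[lo]) == Date(df.datetime[hi])
    # if the window spans two focals or days, only the point itself is checked
    if df[i, col] && (same_track ? any(df.cpt[lo:hi]) : df.cpt[i])
        counter.num += 1       # the next point starts a new segment
        push!(indices, i)      # remember where the segment ends
    end
    return ret
end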

indices_a = Int[]    # rows that end a segment, any activity considered
indices_f = Int[]    # rows that end a segment, only feeding considered
df_a = cpt_wp        # no copy: cpt_wp also receives the segment column
df_f = copy(cpt_wp)
# segment labels of the form <focal individual>_<date>_<count>
df_a.segment = ["$(df_a.focal[i])_$(Date(df_a.datetime[i]))_$(count(df_a, counter1, i, :act, indices_a))" for i in 1:nrow(df_a)]
df_f.segment = ["$(df_f.focal[i])_$(Date(df_f.datetime[i]))_$(count(df_f, counter2, i, :feeding, indices_f))" for i in 1:nrow(df_f)]

# empty output tables with the columns exported below
df_a_new = DataFrame(:segment => String[], :focal => String[], :datetime => DateTime[], :lat => Float64[], :lon => Float64[], :tparty => Union{Missing, String}[], :altitude => Int64[])
df_f_new = similar(df_a_new, 0)

# Copy every point into the output table. A point that ends a segment is pushed a
# second time, relabelled with the next point's segment, focal and party, so that
# consecutive segments share that endpoint. `copy` turns the DataFrameRow into a
# NamedTuple, so the source tables are not modified while relabelling.
cols = [:segment, :focal, :datetime, :lat, :lon, :tparty, :altitude]
for (src, ends, dst) in ((df_a, indices_a, df_a_new), (df_f, indices_f, df_f_new))
    for i in 1:nrow(src)
        row = copy(src[i, cols])
        push!(dst, row)
        if i in ends && i < nrow(src)
            push!(dst, merge(row, (segment = src[i + 1, :segment],
                                   focal = src[i + 1, :focal],
                                   tparty = src[i + 1, :tparty])))
        end
    end
end

CSV.write("data/points_a.csv", df_a_new)
CSV.write("data/points_f.csv", df_f_new)

"data/points_f.csv"