This script builds a working spatiotemporal visualization of the following data:
- Acoustic (real dates, real locations)
- Aerial surveys (real dates, two fake locations)
- Zooplankton surveys (real dates, one fake location)

### This is an important script to document well:

#### Data workflow to incorporate new pieces goes like this:
- bring in raw data from Rob
- reformat to be integrated into Concat3 form
- once everything's in Concat3 form, that df will be fed into this 'flip' script which will:
    - flip from long to wide (one row per date, with separate columns for acoustic/prey/aerial)
    - create column that flags overlaps
        - start with flagging when all three datasets are present
        - more intricate: identify dates where only two datasets overlap, and specify which two
    - flip back to long
- join back to concat3 form for plotting, this time with 'overlap' column that can be visualized on map
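The flip steps above can be sketched on a toy table. This is only an illustration under assumptions: `long_df` and its dates are made up, and the column names simply mirror the ones used in this notebook.

```python
import pandas as pd

# Toy long-format table standing in for concat4 (values are made up).
long_df = pd.DataFrame({
    "between_days": pd.to_datetime(
        ["2018-04-13", "2018-04-13", "2018-04-13", "2018-04-14", "2018-04-14"]
    ),
    "DataType": ["acoustic", "whale", "zooplankton", "acoustic", "whale"],
})
data_cols = ["acoustic", "whale", "zooplankton"]

# 1. Long -> wide: one row per date, one column per data type.
wide = long_df.pivot(index="between_days", columns="DataType", values="DataType")

# 2. Flag dates on which all three datasets are present.
wide["overlap"] = wide[data_cols].notna().all(axis=1)

# 3. Wide -> long again, dropping date/type pairs that have no data.
back_to_long = (
    wide.reset_index()
    .melt(id_vars=["between_days", "overlap"], value_vars=data_cols,
          var_name="DataType")
    .dropna(subset=["value"])
    .drop(columns="value")
    .sort_values(["between_days", "DataType"])
    .reset_index(drop=True)
)
print(back_to_long)
```

Dropping the empty date/type pairs after the melt returns the table to one record per date per available data type, now with the flag attached.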

In [1]:
import altair as alt
import pandas as pd
import geopandas as gpd

In [2]:
# concat3 was created in the SpatioTemporal_March15.ipynb notebook
    # (combines acoustic data, plus TL and zooplankton survey dates Rob sent me on 3/8 and 3/9)
        # 3/9 email in '2019 Zooplankton Data' thread
        # 3/8 email in 'survey tracts and times' thread

concat3 = pd.read_csv('../data/concat3.csv',
                              parse_dates = ['between_days'])

concat3 # each record is a date on which a given data type is available
        # (each data type has its own record for every day it's available)

Unnamed: 0.1,Unnamed: 0,between_days,depYear,c_uniqueUnitID,latitudeDeployed_DecDeg,longitudeDeployed_DecDeg,DataType
0,0,2011-02-17,2011,2011_BRP_CCB_S1016_Dep20_20110217_PU0205_FD020...,41.9412,-70.288,Acoustic
1,0,2011-02-18,2011,2011_BRP_CCB_S1016_Dep20_20110217_PU0205_FD020...,41.9412,-70.288,Acoustic
2,0,2011-02-19,2011,2011_BRP_CCB_S1016_Dep20_20110217_PU0205_FD020...,41.9412,-70.288,Acoustic
3,0,2011-02-20,2011,2011_BRP_CCB_S1016_Dep20_20110217_PU0205_FD020...,41.9412,-70.288,Acoustic
4,0,2011-02-21,2011,2011_BRP_CCB_S1016_Dep20_20110217_PU0205_FD020...,41.9412,-70.288,Acoustic
...,...,...,...,...,...,...,...
3735,64,2018-04-13,2018,,41.9700,-70.430,Zooplankton
3736,65,2018-04-22,2018,,41.9700,-70.430,Zooplankton
3737,66,2018-04-27,2018,,41.9700,-70.430,Zooplankton
3738,67,2018-04-30,2018,,41.9700,-70.430,Zooplankton


### Flipping the data
- bring in concat4 (dataset with one acoustic record per hydrophone array) 
    - condensed acoustic df created in 'DataReformat_AcousticRange.ipynb'
    - concat4 df created in 'SpatioTemporal_March15.ipynb' (condensed acoustic concatenated with aerial and prey)
- pivot concat4 from long format to wide
- create column that flags overlap days

In [3]:
# concat4 = condensed acoustic df concatenated with whale (aerial) and zooplankton survey dates
concat4 = pd.read_csv('../data/concat4.csv',
                              parse_dates = ['between_days'])

concat4

Unnamed: 0.2,Unnamed: 0,Unnamed: 0.1,c_uniqueUnitID,between_days,DataType
0,0,0.0,2011_BRP_CCB_S1016_Dep20_20110217_PU0205_FD020...,2011-02-17,acoustic
1,1,0.0,2011_BRP_CCB_S1016_Dep20_20110217_PU0205_FD020...,2011-02-18,acoustic
2,2,0.0,2011_BRP_CCB_S1016_Dep20_20110217_PU0205_FD020...,2011-02-19,acoustic
3,3,0.0,2011_BRP_CCB_S1016_Dep20_20110217_PU0205_FD020...,2011-02-20,acoustic
4,4,0.0,2011_BRP_CCB_S1016_Dep20_20110217_PU0205_FD020...,2011-02-21,acoustic
...,...,...,...,...,...
887,64,,,2018-04-13,zooplankton
888,65,,,2018-04-22,zooplankton
889,66,,,2018-04-27,zooplankton
890,67,,,2018-04-30,zooplankton


`pivot concat4`

In [22]:
# pivot from long to wide 
concat_pivot = concat4.pivot(index = 'between_days', columns='DataType', values='DataType')
concat_pivot

between_days,acoustic,whale,zooplankton
2011-02-17,acoustic,whale,zooplankton
2011-02-18,acoustic,,
2011-02-19,acoustic,,
2011-02-20,acoustic,,
2011-02-21,acoustic,,
...,...,...,...
2018-05-26,acoustic,,
2018-05-27,acoustic,,
2018-05-28,acoustic,,
2018-05-29,acoustic,,


`create overlap column`

In [25]:
# Flags with True/False whether all three datasets (acoustic, whale, zooplankton) overlap on a date
    # True = all three columns have a value
    # False = at least one column is NaN (i.e., a data gap)

def my_function(row):
    """Return True if this date has acoustic, whale, and zooplankton data."""
    return all(row[['acoustic', 'whale', 'zooplankton']].notna())

In [24]:
concat_pivot['overlap'] = concat_pivot.apply(my_function, axis = 1)
concat_pivot

between_days,acoustic,whale,zooplankton,overlap
2011-02-17,acoustic,whale,zooplankton,True
2011-02-18,acoustic,,,False
2011-02-19,acoustic,,,False
2011-02-20,acoustic,,,False
2011-02-21,acoustic,,,False
...,...,...,...,...
2018-05-26,acoustic,,,False
2018-05-27,acoustic,,,False
2018-05-28,acoustic,,,False
2018-05-29,acoustic,,,False
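For the more intricate case (dates where only two datasets overlap, and which two), one possible approach is to count and name the datasets present on each date. This is a sketch on made-up data; the `n_datasets` and `overlap_type` columns are hypothetical names, not part of the existing workflow.

```python
import pandas as pd

# Toy wide table shaped like concat_pivot (values are made up).
wide = pd.DataFrame(
    {
        "acoustic": ["acoustic", "acoustic", "acoustic", None],
        "whale": ["whale", "whale", None, None],
        "zooplankton": ["zooplankton", None, None, "zooplankton"],
    },
    index=pd.to_datetime(["2018-04-13", "2018-04-14", "2018-04-15", "2018-04-16"]),
)

data_cols = ["acoustic", "whale", "zooplankton"]
present = wide[data_cols].notna()

# Number of datasets available on each date (0 to 3).
wide["n_datasets"] = present.sum(axis=1)

# Hypothetical label column: names of the datasets present, joined with "+".
wide["overlap_type"] = present.apply(
    lambda row: "+".join(col for col in data_cols if row[col]), axis=1
)

print(wide[["n_datasets", "overlap_type"]])
print(wide["overlap_type"].value_counts())  # how many days fall in each combination
```

Filtering on `n_datasets == 2` would then isolate the two-dataset days, and `value_counts()` gives a quick reported number of days per combination.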


In [49]:
#concat_pivot.to_csv('../data/concat_pivot.csv')

Is it safe to join concat_pivot back to concat3?

In other words:

**will the True/False values carry over correctly?**

`test a rejoin`

In [27]:
# short sample
concat_short = concat_pivot.head()
concat_short

between_days,acoustic,whale,zooplankton,overlap
2011-02-17,acoustic,whale,zooplankton,True
2011-02-18,acoustic,,,False
2011-02-19,acoustic,,,False
2011-02-20,acoustic,,,False
2011-02-21,acoustic,,,False


In [39]:
concat_short = concat_short.reset_index()

In [48]:
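# flip back from wide to long, keeping the 'overlap' column
# (ignore_index=False keeps the original row labels, so 0-4 repeat for each data type)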
concat_melt = concat_short.melt(id_vars=['between_days', 'overlap'],
                                ignore_index=False)

concat_melt

Unnamed: 0,between_days,overlap,DataType,value
0,2011-02-17,True,acoustic,acoustic
1,2011-02-18,False,acoustic,acoustic
2,2011-02-19,False,acoustic,acoustic
3,2011-02-20,False,acoustic,acoustic
4,2011-02-21,False,acoustic,acoustic
0,2011-02-17,True,whale,whale
1,2011-02-18,False,whale,
2,2011-02-19,False,whale,
3,2011-02-20,False,whale,
4,2011-02-21,False,whale,
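One way to test whether the flags carry over is to merge a date-to-flag lookup onto the long table and then check the result. The sketch below uses made-up stand-ins (`concat3_toy`, `pivot_toy`) rather than the real files, and assumes the join key is `between_days`.

```python
import pandas as pd

# Toy stand-ins (made-up values): a long table like concat3 and a
# date-indexed flag table like concat_pivot.
concat3_toy = pd.DataFrame({
    "between_days": pd.to_datetime(
        ["2011-02-17", "2011-02-17", "2011-02-18", "2011-02-19"]
    ),
    "DataType": ["Acoustic", "Zooplankton", "Acoustic", "Acoustic"],
})
pivot_toy = pd.DataFrame(
    {"overlap": [True, False]},
    index=pd.DatetimeIndex(["2011-02-17", "2011-02-18"], name="between_days"),
)

# Left join keeps every concat3 record; validate raises an error if a date
# appears more than once in the lookup (which would duplicate records).
joined = concat3_toy.merge(
    pivot_toy["overlap"].reset_index(),
    on="between_days",
    how="left",
    validate="many_to_one",
)

# Dates that exist in the long table but not in the pivot come through as NaN.
n_unmatched = joined["overlap"].isna().sum()
print(joined)
print("records without a flag:", n_unmatched)
```

If the row count is unchanged and `n_unmatched` is zero on the real data, every record received exactly one flag for its date.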


### Plotting

In [51]:
# clipped shapefile
clipped_shp = '/Users/cristiana/Documents/Duke/MP/Python/Scripting/scratch/data/newengland_clipped (1)/newengland_clipped.shp'
clip = gpd.read_file(clipped_shp).to_crs('epsg:4326')
clip.head()

Unnamed: 0,FIPS,NAME,ACRES,Shape_Leng,Shape_Area,geometry
0,25,MASSACHUSETTS,5104241.5,609872.80019,2503760000.0,"POLYGON ((-70.82491 42.26034, -70.78642 42.234..."


In [52]:
# Massachusetts plotted with altair
alt.Chart(clip).mark_geoshape(
    fill='#2a1d0c', stroke='#706545', strokeWidth=0.5
).project('mercator')

In [53]:
# full interactive visual -- acoustic + aerial + zooplankton

# brush selection: click and drag along the x axis (dates) of the overview timeline
interval = alt.selection_interval(encodings=['x'])
# the selection sets the x range of the detail timeline and filters the map points

timeline_base = alt.Chart(concat3).mark_rect().encode(
    y = alt.Y('DataType:O', axis=alt.Axis(title='Data Type')),
    color = 'DataType:N'
).properties(
    width = 600
)

timeline_overview = timeline_base.encode(
    x = alt.X(
        'between_days:T', 
        timeUnit = 'yearmonthdate', 
        axis = alt.Axis(title='Date')
    )
).add_selection( # adding interactivity
    interval
).properties(
    height = 40
)

timeline_detail = timeline_base.encode(
    x = alt.X(
        'between_days:T', 
        timeUnit='yearmonthdate',
        axis = alt.Axis(title=''),
        scale = alt.Scale(domain=interval) # using the interactive selection to show X range
    )
).properties(
    height = 100
)

basemap = alt.Chart(clip).mark_geoshape(
    fill = 'lightgray', stroke='#706545', strokeWidth=0.5
).project('mercator').properties(
    width = 600,
    height = 300
)

points = alt.Chart(concat3).mark_point().encode(
    longitude = 'longitudeDeployed_DecDeg:Q',
    latitude = 'latitudeDeployed_DecDeg:Q',
    color = 'DataType:N'
).transform_filter(
    interval
).project("mercator").properties( # project() also accepts a scale argument
    width = 600,
    height = 300
)

March19 = alt.vconcat((basemap + points), timeline_detail, timeline_overview)
March19

#March19.save('CCB_SpatioTemporal_March19.html')

what do we need to see?
- the full temporal extent of each dataset
- which days the datasets overlap
- locations of detections

what outputs do we need?
- overlap days (reported number? csv list of dates? visual heatmap?)
- spatial proximity (how close is detection A to detection B?)
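For the spatial proximity question, a great-circle (haversine) distance between two latitude/longitude points is one simple option. The sketch below assumes a spherical Earth with a mean radius of 6371 km; the function name is hypothetical. The example coordinates are the acoustic and zooplankton locations shown in concat3 above (the zooplankton one being a placeholder location).

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; a spherical approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlambda = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


# Acoustic deployment vs. zooplankton location from concat3.
distance = haversine_km(41.9412, -70.288, 41.9700, -70.430)
print(f"{distance:.1f} km")
```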

Check this out for interaction: https://altair-viz.github.io/gallery/interactive_brush.html