### This notebook shows how to build a pandas DataFrame from WarpScript Geo Time Series (GTS)

### 1. From a single GTS to a DataFrame

In [1]:
%load_ext warpscript
%alias_magic w warpscript

Created `%w` as an alias for `%warpscript`.
Created `%%w` as an alias for `%%warpscript`.


We will need the pandas and pickle libraries.

In [2]:
from __future__ import print_function
import pandas as pd
import pickle as pkl

We first create a random GTS.

In [3]:
%%w -s s -l
NEWGTS 'randGTS' RENAME 1 10 <% h RAND RAND NaN RAND ADDVALUE %> FOR

Local gateway launched on port 33909
Creating a new WarpScript stack accessible under variable "s".


top: 	randGTS{}<DOUBLE, 10 values>

To make a GTS usable from a Python interpreter, we store its content in a map of lists and pickle that map, which Python then loads as a dict.<br/>
The macro `GTStoPickledDict` does this. To load it, either place the file `macros/GTStoPickledDict.mc2`<br/>
in the macros folder of the Warp 10 platform you are sending requests to, or execute the following cell.

In [4]:
%%w -s s
<%
    # Documenting the macro
    'GTS BOOLEAN @GTStoPickledDict, more doc in macros/GTStoPickledDict.mc2' DOC

    # Check there are two arguments on the stack
    <% DEPTH 2 < %> <% 'Macro takes two arguments' MSGFAIL %> IFT
        
    # Check that top is a boolean indicating whether to use GTS classname or selector
    <% 1 PICK TYPEOF 'BOOLEAN' != %> <% 'First argument must be a boolean indicating whether to use GTS selector (true) or classname (false)' MSGFAIL %> IFT
    
    # Check that second argument is a GTS
    <% 2 PICK TYPEOF 'GTS' != %> <% 'Second argument must be a GTS' MSGFAIL %> IFT
    
    # Store the arguments
    'withSelector' STORE
    'gts' STORE
    
    # Make name
    $gts <% $withSelector %> <% TOSELECTOR %> <% NAME %> IFTE
    'name' STORE
    
    # macro: check not all NaN (for locations and elevations)
    <% UNIQUE DUP SIZE 1 == SWAP 0 GET ISNaN && %> 'isAllNaN' STORE
        
    # Return pickled dict for pandas
    {
        # ticks
        'timestamps' $gts TICKLIST
        
        # locations
        $gts LOCATIONS 'lon' STORE 'lat' STORE
        <% $lat @isAllNaN ! %> <% $name '.lat' + $lat %> IFT
        <% $lon @isAllNaN ! %> <% $name '.lon' + $lon %> IFT
        
        # elevations
        $gts ELEVATIONS 'elev' STORE
        <% $elev @isAllNaN ! %> <% $name '.elev' + $elev %> IFT
        
        # values        
        $name $gts VALUES
    }
    ->PICKLE
%>
'GTStoPickledDict' STORE

top: 	randGTS{}<DOUBLE, 10 values>
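
For readers less familiar with WarpScript, the logic of the macro can be sketched in plain Python. This is only an illustration of the map it builds before pickling, not the macro's actual implementation: the function name `gts_to_dict` and its arguments are hypothetical, and the GTS is assumed to be given as five parallel lists.

```python
import math


def gts_to_dict(name, ticks, lats, lons, elevs, values):
    """Hypothetical Python mirror of the map built by GTStoPickledDict."""

    def is_all_nan(column):
        # Same role as the isAllNaN macro: True when every entry is NaN
        return all(isinstance(x, float) and math.isnan(x) for x in column)

    result = {'timestamps': list(ticks)}
    # Geo columns are only kept when they carry at least one real value
    if not is_all_nan(lats):
        result[name + '.lat'] = list(lats)
    if not is_all_nan(lons):
        result[name + '.lon'] = list(lons)
    if not is_all_nan(elevs):
        result[name + '.elev'] = list(elevs)
    result[name] = list(values)
    return result


nan = float('nan')
no_geo = gts_to_dict('randTS', [1, 2], [nan, nan], [nan, nan], [nan, nan], [0.5, 0.7])
with_geo = gts_to_dict('randGTS', [1, 2], [0.1, 0.2], [0.3, 0.4], [nan, nan], [0.5, 0.7])
print(sorted(no_geo))
print(sorted(with_geo))
```

The key naming (`<name>.lat`, `<name>.lon`, `<name>.elev`) follows the macro above; everything else is an assumption made for the sketch.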

We evaluate the macro on the random GTS that was left on the stack.<br/>
Passing `false` as the boolean argument names the columns after the GTS classname only, so the labels are dropped from the pickled representation.

In [5]:
%%w -s s
false
@GTStoPickledDict # use instead '@./GTStoPickledDict' if GTStoPickledDict.mc2 is in the macros folder

top: 	b'\x80\x02}q\x00(X\n\x00\x00\x00timestampsq\x01]q\x02(I3600000000\nI7200000000\nI10800000000\nI14400000000\nI18000000000\nI21600000000\nI25200000000\nI28800000000\nI32400000000\nI36000000000\neX\x0b\x00\x00\x00randGTS.latq\x03]q\x04(G?\xc7\xe0\\B\x00\x00\x00G?\xa7\x99k\x98\x00\x00\x00G?\xe3\x15\xbb\xaa\x80\x00\x00G?\xee(r\x8a\x80\x00\x00G?\xeb%k[\x80\x00\x00G?\xb5\x96,(\x00\x00\x00G?\xd3\xeb\xf5\xec\x00\x00\x00G?\xb6\xe8\x99\xd0\x00\x00\x00G?\xed>\xa4\xac\x80\x00\x00G?\xc6\x1e\xf5\x8a\x00\x00\x00eX\x0b\x00\x00\x00randGTS.lonq\x05]q\x06(G?\xd5\x02\xeb\xde\x00\x00\x00G?\xc1\xaa\xcc\xa4\x00\x00\x00G?\xd1\xd1u\xd4\x00\x00\x00G?\xe1\x12\xde\x94\x00\x00\x00G?\xd7K\xa7P\x00\x00\x00G?\xe5\xe2\xf8i\x00\x00\x00G?\xe2k\x1aJ\x00\x00\x00G?\xbf\x9f\xfe\xd8\x00\x00\x00G?\xe0s\xd0\xf6\x00\x00\x00G?\xc9\x15N8\x00\x00\x00eX\x07\x00\x00\x00randGTSq\x07]q\x08(G?\xb1\x96\xf8\x01\x82\xf6\x90G?\xec\x8a\x91\xd4*+wG?\xe5\xfdy;U\xab\xc3G?\xeb\xac\x85g.\x8d\xa5G?\xec\xe0\xed\xa7\xf3\x9d)G?\xd7\xdfzI\x8c\x8
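
The leading bytes `\x80\x02` in the output above are the header of pickle protocol 2, which suggests that `->PICKLE` emits that protocol. A minimal sketch of the same round trip on the Python side, using a made-up dict of lists shaped like the macro's output:

```python
import pickle

# A dict of lists shaped like the macro's output (the values are made up)
columns = {'timestamps': [3600000000, 7200000000], 'randGTS': [0.25, 0.75]}

# Serialize with protocol 2, then look at the two header bytes
payload = pickle.dumps(columns, protocol=2)
header = payload[:2]
print(header)

# Loading the bytes gives back an equal dict
restored = pickle.loads(payload)
print(restored == columns)
```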

We then load the dict from its pickled representation and create a pandas DataFrame from it.

In [6]:
gts1 = s.pop()
df1 = pd.DataFrame.from_dict(pkl.loads(gts1))
df1

Unnamed: 0,timestamps,randGTS.lat,randGTS.lon,randGTS
0,3600000000,0.186534,0.328303,0.06871
1,7200000000,0.046092,0.138025,0.891915
2,10800000000,0.596403,0.278409,0.687192
3,14400000000,0.942437,0.533553,0.86481
4,18000000000,0.848318,0.363993,0.902457
5,21600000000,0.084323,0.683956,0.373015
6,25200000000,0.311277,0.575574,0.881603
7,28800000000,0.089487,0.123535,0.800017
8,32400000000,0.913897,0.514138,0.378057
9,36000000000,0.17282,0.195963,0.401335
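
The `timestamps` column holds raw integer ticks. Since one hour (`h`) equals 3600000000 in this example, the platform time unit appears to be microseconds, so the ticks can be turned into a datetime index. The sketch below rebuilds a small DataFrame from the first rows above (rounded) so that it runs on its own; in the notebook you would apply the same two lines to `df1`. The microsecond unit is an assumption and must match your platform's configuration.

```python
import pandas as pd

# First three rows of the example above, rounded
df = pd.DataFrame({
    'timestamps': [3600000000, 7200000000, 10800000000],
    'randGTS': [0.06871, 0.891915, 0.687192],
})

# Interpret the ticks as microseconds since the Unix epoch (assumed time unit)
df.index = pd.to_datetime(df['timestamps'], unit='us')
df = df.drop(columns='timestamps')
print(df)
```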


In the following example, we pass `true` to keep the label information in the column names.

In [7]:
%%w -s s
NEWGTS 'randGTS' RENAME 1 10 <% h RAND RAND NaN RAND ADDVALUE %> FOR
{ 'key1' 'info1' 'key2' 'info2' } RELABEL
true
@GTStoPickledDict # use instead '@./GTStoPickledDict' if GTStoPickledDict.mc2 is in the macros folder

top: 	b'\x80\x02}q\x00(X\n\x00\x00\x00timestampsq\x01]q\x02(I3600000000\nI7200000000\nI10800000000\nI14400000000\nI18000000000\nI21600000000\nI25200000000\nI28800000000\nI32400000000\nI36000000000\neX"\x00\x00\x00randGTS{key1=info1,key2=info2}.latq\x03]q\x04(G?\xa8lE\x18\x00\x00\x00G?\xe8\xa5d)\x00\x00\x00G?\xe9{S\xe2\x80\x00\x00G?\xae\xf4\x90\x98\x00\x00\x00G?\xe3\xfe\xa3,\x00\x00\x00G?\xd2\xa3"\x91\x00\x00\x00G?\xd2\x1b\xab\xbd\x00\x00\x00G?\xe7)\x94\xb2\x00\x00\x00G?\xedFbX\x00\x00\x00G?\xc9\xf4\x84|\x00\x00\x00eX"\x00\x00\x00randGTS{key1=info1,key2=info2}.lonq\x05]q\x06(G?\xe2\xa3Vl\x00\x00\x00G?\xd5@\xbf\x90\x00\x00\x00G?\x9b\\\x88\x00\x00\x00\x00G?\xda\xdc\xc8@\x00\x00\x00G?\xeb\xe5u\xc5\x00\x00\x00G?\xefiI\x1f\x00\x00\x00G?\xecgU\x90\x00\x00\x00G?\xde\xc99H\x00\x00\x00G?\xba!`\xa0\x00\x00\x00G?\xc6\x9e\x0eL\x00\x00\x00eX\x1e\x00\x00\x00randGTS{key1=info1,key2=info2}q\x07]q\x08(G?\xef\x18\x92R\xcc\x8e\x1eG?\xd6\xb5\xce\xea\x1c\xe1\xc0G?\xd7\x95I\xdc\xbb\xd9\x9cG?\xe3\x8d*\x85\xa9

In [8]:
gts2 = s.pop()
df2 = pd.DataFrame.from_dict(pkl.loads(gts2))
df2

Unnamed: 0,timestamps,"randGTS{key1=info1,key2=info2}.lat","randGTS{key1=info1,key2=info2}.lon","randGTS{key1=info1,key2=info2}"
0,3600000000,0.047701,0.582439,0.971749
1,7200000000,0.770189,0.332077,0.354847
2,10800000000,0.796305,0.02672,0.368487
3,14400000000,0.06046,0.419725,0.610982
4,18000000000,0.624834,0.87176,0.915982
5,21600000000,0.291207,0.981602,0.569737
6,25200000000,0.282939,0.887614,0.659928
7,28800000000,0.723826,0.481032,0.460731
8,32400000000,0.914842,0.102072,0.862312
9,36000000000,0.202775,0.176698,0.00056


We can also omit geo information: when every latitude, longitude and elevation is NaN, the macro leaves those columns out.

In [9]:
%%w -s s
NEWGTS 'randTS' RENAME 2 11 <% h NaN NaN NaN RAND ADDVALUE %> FOR
false
@GTStoPickledDict # use instead '@./GTStoPickledDict' if GTStoPickledDict.mc2 is in the macros folder

top: 	b'\x80\x02}q\x00(X\n\x00\x00\x00timestampsq\x01]q\x02(I7200000000\nI10800000000\nI14400000000\nI18000000000\nI21600000000\nI25200000000\nI28800000000\nI32400000000\nI36000000000\nI39600000000\neX\x06\x00\x00\x00randTSq\x03]q\x04(G?\xec5\xe2\x97)D[G?\xdaf\x1c\x0b=\x00^G?\xef\xcc\xd5\xd5\xde\xa09G?\xd5\xdc\x02.\xf9\x82`G?\xef\xbc\xd4\xed"|JG?\xea{\x93\xb5|E\xd1G?\xe2$r\xf7Ao\nG?\xe2\x11\xa0\xf0\xd3\xe7\xb0G?\xc2\x1b\xaa\x81?\x12\xb4G?\xe07Wx\xb5\xadJeu.'

In [10]:
gts3 = s.pop()
df3 = pd.DataFrame.from_dict(pkl.loads(gts3))
df3

Unnamed: 0,timestamps,randTS
0,7200000000,0.881578
1,10800000000,0.412482
2,14400000000,0.993754
3,18000000000,0.341553
4,21600000000,0.991801
5,25200000000,0.827585
6,28800000000,0.566949
7,32400000000,0.564652
8,36000000000,0.141469
9,39600000000,0.506756


### 2. Revert a DataFrame to a GTS

To revert a DataFrame to a GTS, we first need to convert the DataFrame into a dict.

In [11]:
gts1b = df1.to_dict('list')
gts1b

{'timestamps': [3600000000,
  7200000000,
  10800000000,
  14400000000,
  18000000000,
  21600000000,
  25200000000,
  28800000000,
  32400000000,
  36000000000],
 'randGTS.lat': [0.18653443548828363,
  0.04609237890690565,
  0.5964029626920819,
  0.9424374299123883,
  0.8483177935704589,
  0.08432270027697086,
  0.3112768940627575,
  0.08948670700192451,
  0.9138968819752336,
  0.1728197978809476],
 'randGTS.lon': [0.32830330543220043,
  0.13802488334476948,
  0.2784094400703907,
  0.5335533991456032,
  0.36399252712726593,
  0.683956341817975,
  0.5755740590393543,
  0.12353508733212948,
  0.514137726277113,
  0.19596269354224205],
 'randGTS': [0.06870985066322022,
  0.8919152397005367,
  0.6871915968780445,
  0.8648097052832112,
  0.9024570732407585,
  0.37301499540813365,
  0.8816031369874782,
  0.8000174484071105,
  0.37805728046251363,
  0.4013346069487659]}

We can push this dict directly onto the stack, since it is automatically converted on the JVM side.

In [12]:
s.push(gts1b)
s

top: 	{'randGTS.lat': [0.18653443548828363, 0.04609237890690565, 0.5964029626920819, 0.9424374299123883, 0.8483177935704589, 0.08432270027697086, 0.3112768940627575, 0.08948670700192451, 0.9138968819752336, 0.1728197978809476], 'timestamps': [3600000000, 7200000000, 10800000000, 14400000000, 18000000000, 21600000000, 25200000000, 28800000000, 32400000000, 36000000000], 'randGTS': [0.06870985066322022, 0.8919152397005367, 0.6871915968780445, 0.8648097052832112, 0.9024570732407585, 0.37301499540813365, 0.8816031369874782, 0.8000174484071105, 0.37805728046251363, 0.4013346069487659], 'randGTS.lon': [0.32830330543220043, 0.13802488334476948, 0.2784094400703907, 0.5335533991456032, 0.36399252712726593, 0.683956341817975, 0.5755740590393543, 0.12353508733212948, 0.514137726277113, 0.19596269354224205]}

Now we can use the lists contained in this map to populate a GTS.

In [13]:
%%w -s s
'dict' STORE
$dict 'timestamps' GET
$dict 'randGTS.lat' GET
$dict 'randGTS.lon' GET
[] // no elevation
$dict 'randGTS' GET
MAKEGTS 'randGTS' RENAME

top: 	randGTS{}<DOUBLE, 10 values>

In [14]:
print(s.pop().toString())

randGTS{}
=3600000000/0.18653443548828363:0.32830330543220043/ 0.06870985066322022
=7200000000/0.04609237890690565:0.13802488334476948/ 0.8919152397005367
=10800000000/0.5964029626920819:0.2784094400703907/ 0.6871915968780445
=14400000000/0.9424374299123883:0.5335533991456032/ 0.8648097052832112
=18000000000/0.8483177935704589:0.36399252712726593/ 0.9024570732407585
=21600000000/0.08432270027697086:0.683956341817975/ 0.37301499540813365
=25200000000/0.3112768940627575:0.5755740590393543/ 0.8816031369874782
=28800000000/0.08948670700192451:0.12353508733212948/ 0.8000174484071105
=32400000000/0.9138968819752336:0.514137726277113/ 0.37805728046251363
=36000000000/0.1728197978809476:0.19596269354224205/ 0.4013346069487659
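
The WarpScript cell above hard-codes the column names. As a sketch, the same ordering could be derived on the Python side from the naming convention used by the macros (`<name>.lat`, `<name>.lon`, `<name>.elev`). The helper `split_columns` is hypothetical and is not part of the warpscript extension; it returns the five lists in the order `MAKEGTS` consumes them in the cell above, with an empty list for any missing geo column.

```python
def split_columns(columns, name):
    """Hypothetical helper: order a column dict as ticks, lat, lon, elev, values."""
    return (
        columns['timestamps'],
        columns.get(name + '.lat', []),   # empty list when there is no latitude
        columns.get(name + '.lon', []),   # empty list when there is no longitude
        columns.get(name + '.elev', []),  # empty list when there is no elevation
        columns[name],
    )


# Made-up two-point example shaped like gts1b
example = {
    'timestamps': [3600000000, 7200000000],
    'randGTS.lat': [0.19, 0.05],
    'randGTS.lon': [0.33, 0.14],
    'randGTS': [0.07, 0.89],
}
ticks, lats, lons, elevs, values = split_columns(example, 'randGTS')
print(ticks, lats, lons, elevs, values)
```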



### 3. From a list of GTS to a DataFrame

We want to put every GTS of a list in the same DataFrame, with a single `timestamps` column.<br/>
Since the GTS do not all have values at the same timestamps, we need to handle missing values,<br/>
and we need to assume that each GTS has at most one value per timestamp.<br/>
It is more efficient to do this in WarpScript, as done by the macro `ListGTStoPickledDict`.

If there are many unaligned ticks, consider converting to a list of single-column DataFrames or Series instead.
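
As a rough illustration of that alternative, each GTS can be kept as its own pandas Series indexed by its own ticks, and aligned only when a joint view is needed. The series names, ticks and values below are made up for the example.

```python
import pandas as pd

# Two hypothetical unaligned series, each indexed by its own ticks
rand_gts = pd.Series([0.1, 0.2, 0.3], index=[1, 2, 3], name='randGTS')
rand_ts = pd.Series([0.5, 0.6], index=[3, 4], name='randTS')

# Keeping them in a list stores no filler values for missing ticks
series_list = [rand_gts, rand_ts]

# Align on demand: the indexes are joined and missing ticks become NaN
joined = pd.concat(series_list, axis=1)
joined.index.name = 'timestamps'
print(joined)
```

With many unaligned ticks, the list form avoids materializing a mostly-NaN table until it is actually required.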

In [15]:
%%w -s s -o -l
<%
    # Documenting the macro
    '[GTS] BOOLEAN @ListGTStoPickledDict , more doc in macros/ListGTStoPickledDict.mc2' DOC

    # Check there are two arguments on the stack
    <% DEPTH 2 < %> <% 'Macro takes two arguments' MSGFAIL %> IFT
        
    # Check that top is a boolean indicating whether to use GTS classname or selector
    <% 1 PICK TYPEOF 'BOOLEAN' != %> <% 'First argument must be a boolean indicating whether to use GTS selector (true) or classname (false)' MSGFAIL %> IFT
    
    # Check that second argument is a list of GTS
    <% 2 PICK TYPEOF 'LIST' != %> <% 'Second argument must be a List of GTS' MSGFAIL %> IFT
    2 PICK <% <% TYPEOF 'GTS' != %> <% 'Second argument is a list that has an element that is not a GTS' MSGFAIL %> IFT %> FOREACH
    
    # Store the arguments
    'withSelector' STORE
    'gtsList' STORE
    
    # make tickbase of all GTS
    $gtsList TICKS 'ticks' STORE
    $ticks [] [] [] $ticks MAKEGTS 'baseGTS' STORE
    
    # macro: check not all NaN (for locations and elevations)
    <% UNIQUE DUP SIZE 1 == SWAP 0 GET ISNaN && %> 'isAllNaN' STORE
        
    # Return pickled dict for pandas
    {
        # ticks
        'timestamps' $ticks
        
        # loop over list of GTS
        $gtsList
        <%
            'gts' STORE
            
            # Make name
            $gts <% $withSelector %> <% TOSELECTOR %> <% NAME %> IFTE
            'name' STORE
        
            # Put on the same tick base and fill missing values with NaN
            [ $gts true mapper.replace 0 0 0 ] MAP
            'mask' STORE
            [ $mask [ $baseGTS ] [] op.negmask ] APPLY
            [ SWAP NaN mapper.replace 0 0 0 ] MAP
            0 GET 'residualSeries' STORE
            [ $gts $residualSeries ] MERGE SORT
            'gts' STORE
        
            # locations
            $gts LOCATIONS 'lon' STORE 'lat' STORE
            <% $lat @isAllNaN ! %> <% $name '.lat' + $lat %> IFT
            <% $lon @isAllNaN ! %> <% $name '.lon' + $lon %> IFT
        
            # elevations
            $gts ELEVATIONS 'elev' STORE
            <% $elev @isAllNaN ! %> <% $name '.elev' + $elev %> IFT
        
            # values        
            $name $gts VALUES
        %>
        FOREACH
    }
    ->PICKLE
%>
'ListGTStoPickledDict' STORE

Creating a new WarpScript stack accessible under variable "s".
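
The alignment step of the macro (a common tick base, with NaN where a GTS has no value) can be sketched in plain Python. This is a simplified illustration under the same one-value-per-timestamp assumption, ignoring locations and elevations; the function `align_on_ticks` and its input format are hypothetical.

```python
def align_on_ticks(series_by_name):
    """series_by_name maps a name to a {tick: value} dict (one value per tick)."""
    # Sorted union of all ticks, playing the role of the tick base in the macro
    ticks = sorted(set().union(*(s.keys() for s in series_by_name.values())))
    columns = {'timestamps': ticks}
    for name, series in series_by_name.items():
        # Fill ticks that this series does not have with NaN
        columns[name] = [series.get(t, float('nan')) for t in ticks]
    return columns


columns = align_on_ticks({
    'a': {1: 0.1, 2: 0.2},
    'b': {2: 'x', 3: 'y'},
})
print(columns)
```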




We apply the macro `ListGTStoPickledDict` in the same way as `GTStoPickledDict`,<br/>
except that it takes a list of GTS instead of a single GTS as its second argument.

In [16]:
%%w -s s
[ NEWGTS 'randGTS' RENAME 1 10 <% h RAND RAND NaN RAND ADDVALUE %> FOR
  NEWGTS 'randTS' RENAME 2 11 <% h NaN NaN NaN RAND ADDVALUE %> FOR
  NEWGTS 'stringTS' RENAME 5 8 <% h NaN NaN NaN 'a string' ADDVALUE %> FOR ]
false
@ListGTStoPickledDict  # use instead '@./ListGTStoPickledDict' if ListGTStoPickledDict.mc2 is in the macros folder

top: 	b'\x80\x02}q\x00(X\n\x00\x00\x00timestampsq\x01]q\x02(I3600000000\nI7200000000\nI10800000000\nI14400000000\nI18000000000\nI21600000000\nI25200000000\nI28800000000\nI32400000000\nI36000000000\nI39600000000\neX\x0b\x00\x00\x00randGTS.latq\x03]q\x04(G?\xe0\xd3\xe2m\x80\x00\x00G?\xc1\xad\xbdV\x00\x00\x00G?\xedH\x1d\x93\x80\x00\x00G?\xe3\x11\xab{\x00\x00\x00G?\xe9f0\x8c\x00\x00\x00G?n\x9c\xae\x00\x00\x00\x00G?\xd4\xe2\xf0\xae\x00\x00\x00G?\xec\xa5(J\x80\x00\x00G?\xeeRv\xf2\x80\x00\x00G?\xc80\x9c\x84\x00\x00\x00G\x7f\xf8\x00\x00\x00\x00\x00\x00eX\x0b\x00\x00\x00randGTS.lonq\x05]q\x06(G?\xed^\x11\xbe\x00\x00\x00G?\xb9$\\\xc0\x00\x00\x00G?\xeeE{\x8a\x00\x00\x00G?\xb1\x01\xd9P\x00\x00\x00G?\xcaZ\x1b\xac\x00\x00\x00G?\xda\xd1]\\\x00\x00\x00G?\xe0\x0f\xd3\xdf\x00\x00\x00G?\xd5\xf9\xc7V\x00\x00\x00G?\xab\xd4\xcf@\x00\x00\x00G?\xef\x0csR\x00\x00\x00G\x7f\xf8\x00\x00\x00\x00\x00\x00eX\x07\x00\x00\x00randGTSq\x07]q\x08(G?\xca\x84\xf7\x8f\xb5\xbeHG?\xeb$@\xd1\xb2\n\x0cG?\xe0X\x84:p+\xbaG?\xbaL\x

Unlike our first example with a single GTS, the following cell raises<br/>
an error if a GTS in the list has several values for the same timestamp.

In [17]:
listGts = s.pop()
df4 = pd.DataFrame.from_dict(pkl.loads(listGts))
df4

Unnamed: 0,timestamps,randGTS.lat,randGTS.lon,randGTS,randTS,stringTS
0,3600000000,0.525865,0.917733,0.207183,,
1,7200000000,0.138115,0.098211,0.848175,0.881303,
2,10800000000,0.915053,0.945982,0.510805,0.402287,
3,14400000000,0.595907,0.066434,0.102736,0.710865,
4,18000000000,0.793724,0.205875,0.665382,0.986074,a string
5,21600000000,0.003737,0.419029,0.950145,0.668054,a string
6,25200000000,0.326351,0.501932,0.913157,0.189466,a string
7,28800000000,0.895161,0.34337,0.999936,0.069121,a string
8,32400000000,0.947566,0.054358,0.875606,0.484998,
9,36000000000,0.188984,0.97027,0.78319,0.503395,
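
When a GTS has several values for one timestamp, its list of values presumably ends up longer than the `timestamps` list, which is why the DataFrame cannot be built. A hypothetical pre-check on the unpickled dict can report this more explicitly before handing it to pandas; `check_equal_lengths` is an illustrative name, not an existing function.

```python
def check_equal_lengths(columns):
    """Hypothetical pre-check: raise if the lists in `columns` differ in length."""
    lengths = {name: len(values) for name, values in columns.items()}
    if len(set(lengths.values())) > 1:
        raise ValueError('Columns have different lengths: %r' % lengths)
    return columns


# Passes: every column has two entries
ok = check_equal_lengths({'timestamps': [1, 2], 'a': [0.1, 0.2]})
print(sorted(ok))
```

In the notebook, this could wrap the result of `pkl.loads(listGts)` before the call to `pd.DataFrame.from_dict`.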
