### This notebook shows how to build a pandas DataFrame from a WarpScript GTS

### 1. From a single GTS to a DataFrame

In [1]:
%load_ext warpscript
%alias_magic w warpscript

Created `%w` as an alias for `%warpscript`.
Created `%%w` as an alias for `%%warpscript`.


We will need the pandas and pickle libraries.

In [2]:
from __future__ import print_function
import pandas as pd
import pickle as pkl

We first create a random GTS.

In [3]:
%%w -s s -l
NEWGTS 'randGTS' RENAME 1 10 <% h RAND RAND NaN RAND ADDVALUE %> FOR

Local gateway launched on port 38481
Creating a new WarpScript stack accessible under variable "s".


top: 	randGTS{}<DOUBLE, 10 values>

To make a GTS readable by a Python interpreter, we store its content in a map of lists and pickle it as a dict.<br/>
The macro `GTStoPickledDict` does this. To load it, either place the file `macros/GTStoPickledDict.mc2`<br/>
in the macros folder of the Warp 10 platform you are sending requests to, or execute the following cell.

In [4]:
%%w -s s
<%
    # Documenting the macro
    'GTS BOOLEAN @GTStoPickledDict, more doc in macros/GTStoPickledDict.mc2' DOC

    # Check there are two arguments on the stack
    <% DEPTH 2 < %> <% 'Macro takes two arguments' MSGFAIL %> IFT
        
    # Check that top is a boolean indicating whether to use GTS classname or selector
    <% 1 PICK TYPEOF 'BOOLEAN' != %> <% 'First argument must be a boolean indicating whether to use GTS selector (true) or classname (false)' MSGFAIL %> IFT
    
    # Check that second argument is a GTS
    <% 2 PICK TYPEOF 'GTS' != %> <% 'Second argument must be a GTS' MSGFAIL %> IFT
    
    # Store the arguments
    'withSelector' STORE
    'gts' STORE
    
    # Make name
    $gts <% $withSelector %> <% TOSELECTOR %> <% NAME %> IFTE
    'name' STORE
    
    # macro: check not all NaN (for locations and elevations)
    <% UNIQUE DUP SIZE 1 == SWAP 0 GET ISNaN && %> 'isAllNaN' STORE
        
    # Return pickled dict for pandas
    {
        # ticks
        'timestamps' $gts TICKLIST
        
        # locations
        $gts LOCATIONS 'lon' STORE 'lat' STORE
        <% $lat @isAllNaN ! %> <% $name '.lat' + $lat %> IFT
        <% $lon @isAllNaN ! %> <% $name '.lon' + $lon %> IFT
        
        # elevations
        $gts ELEVATIONS 'elev' STORE
        <% $elev @isAllNaN ! %> <% $name '.elev' + $elev %> IFT
        
        # values        
        $name $gts VALUES
    }
    ->PICKLE
%>
'GTStoPickledDict' STORE

top: 	randGTS{}<DOUBLE, 10 values>
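
For orientation, the map that the macro pickles can be pictured on the Python side as a dict of equally long lists. The sketch below builds such a dict by hand, using the key names the macro produces for a GTS called `randGTS` (the numbers are made up), and round-trips it through pickle. Protocol 2 is chosen here only to match the `\x80\x02` header visible in the macro's output further down.

```python
import pickle

# Hand-built stand-in for the map produced by GTStoPickledDict.
# Keys follow the macro's naming; the numbers are illustrative only.
gts_as_dict = {
    "timestamps": [3600000000, 7200000000, 10800000000],
    "randGTS.lat": [0.25, 0.5, 0.75],
    "randGTS.lon": [0.1, 0.2, 0.3],
    "randGTS": [1.0, 2.0, 3.0],
}

payload = pickle.dumps(gts_as_dict, protocol=2)
restored = pickle.loads(payload)

# Every column is a plain list with one entry per tick.
column_lengths = {name: len(values) for name, values in restored.items()}
print(payload[:2])
print(column_lengths)
```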

We evaluate the macro on the random GTS that was left on the stack.<br/>
Passing `false` tells the macro to name the columns after the GTS class name only, so the labels are dropped from the pickled representation.

In [5]:
%%w -s s
false
@GTStoPickledDict # use instead '@./GTStoPickledDict' if GTStoPickledDict.mc2 is in the macros folder

top: 	b'\x80\x02}q\x00(X\n\x00\x00\x00timestampsq\x01]q\x02(I3600000000\nI7200000000\nI10800000000\nI14400000000\nI18000000000\nI21600000000\nI25200000000\nI28800000000\nI32400000000\nI36000000000\neX\x0b\x00\x00\x00randGTS.latq\x03]q\x04(G?\xd1\x17`;\x00\x00\x00G?\xd6\x14 \x97\x00\x00\x00G?\xe4\x1c\xa7\\\x80\x00\x00G?\xd0\xaf\xbe\xcc\x00\x00\x00G?\xe4z\xb9!\x00\x00\x00G?\xeaDwO\x80\x00\x00G?\xe2\xadD\x0b\x00\x00\x00G?\xc3\x13\xfcd\x00\x00\x00G?\xc6\x8b\xfb\x08\x00\x00\x00G?\xd2\r\xcc\x9b\x00\x00\x00eX\x0b\x00\x00\x00randGTS.lonq\x05]q\x06(G?\xe8\xab6\x9c\x00\x00\x00G?\xed\x14\x87\x92\x00\x00\x00G?\xe8\xcd\xd0:\x00\x00\x00G?\xd9\xda\x0b\xc4\x00\x00\x00G?\xe8;\x12\xb8\x00\x00\x00G?\xde(\xb3\xd8\x00\x00\x00G?\xefD<~\x00\x00\x00G?\xe0\x8b\xceh\x00\x00\x00G?\xe6\xc6\xb8=\x00\x00\x00G?\xe9\xa2\xf9;\x00\x00\x00eX\x07\x00\x00\x00randGTSq\x07]q\x08(G?\xd1W\'\x86h\txG?\xea\r\xed\x99\xa0\x9b\xdfG?\xd5\xaf"v\xc35\xd8G?\xe2OqAr\x08\x0fG?\xd7\x19<\xef\x1a\x97\xdeG?\xa7\x0c\t$\xac\xf9\xf0G?\xecR;\x0

We then load the dict from its pickled representation and build a pandas DataFrame from it.

In [6]:
gts1 = s.pop()
df1 = pd.DataFrame.from_dict(pkl.loads(gts1))
df1

| | timestamps | randGTS.lat | randGTS.lon | randGTS |
|---|---|---|---|---|
| 0 | 3600000000 | 0.267052 | 0.7709 | 0.270944 |
| 1 | 7200000000 | 0.344978 | 0.908756 | 0.8142 |
| 2 | 10800000000 | 0.628498 | 0.775124 | 0.338814 |
| 3 | 14400000000 | 0.260727 | 0.403933 | 0.572198 |
| 4 | 18000000000 | 0.639981 | 0.757211 | 0.360915 |
| 5 | 21600000000 | 0.820858 | 0.471234 | 0.045014 |
| 6 | 25200000000 | 0.583651 | 0.97708 | 0.885038 |
| 7 | 28800000000 | 0.149047 | 0.517066 | 0.54808 |
| 8 | 32400000000 | 0.176147 | 0.711758 | 0.817855 |
| 9 | 36000000000 | 0.282092 | 0.801144 | 0.469961 |
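
The `timestamps` column holds raw ticks. A minimal sketch of turning them into a pandas datetime index follows, assuming the platform uses microseconds as its time unit (the ticks above are consistent with `h` being 3600000000 µs, but this depends on the Warp 10 configuration). The small DataFrame is a hand-made stand-in for `df1`.

```python
import pandas as pd

# Stand-in for df1; values are illustrative.
df = pd.DataFrame({
    "timestamps": [3600000000, 7200000000, 10800000000],
    "randGTS": [0.27, 0.81, 0.34],
})

# Assumption: ticks are microseconds since the Unix epoch.
df_indexed = df.drop(columns="timestamps")
df_indexed.index = pd.to_datetime(df["timestamps"], unit="us")
df_indexed.index.name = "time"
print(df_indexed)
```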


In the following example, we choose to keep label information.

In [7]:
%%w -s s
NEWGTS 'randGTS' RENAME 1 10 <% h RAND RAND NaN RAND ADDVALUE %> FOR
{ 'key1' 'info1' 'key2' 'info2' } RELABEL
true
@GTStoPickledDict # use instead '@./GTStoPickledDict' if GTStoPickledDict.mc2 is in the macros folder

top: 	b'\x80\x02}q\x00(X\n\x00\x00\x00timestampsq\x01]q\x02(I3600000000\nI7200000000\nI10800000000\nI14400000000\nI18000000000\nI21600000000\nI25200000000\nI28800000000\nI32400000000\nI36000000000\neX"\x00\x00\x00randGTS{key1=info1,key2=info2}.latq\x03]q\x04(G?\xea\x8b9\n\x80\x00\x00G?\xea\x1f\xea\x1f\x80\x00\x00G?\xe8,\x9fg\x00\x00\x00G?\xe2\xdfi\x86\x00\x00\x00G?\xe1\xd3\xe5\xf0\x80\x00\x00G?\xe0\xc9 \xe7\x00\x00\x00G?\xdbk@\x0c\x00\x00\x00G?\xef\xdc>\xc5\x00\x00\x00G?\xb4\x0b\x98\x84\x00\x00\x00G?\xefb\xe3\xce\x80\x00\x00eX"\x00\x00\x00randGTS{key1=info1,key2=info2}.lonq\x05]q\x06(G?\xe9\x0eu\x81\x00\x00\x00G?\xdc\xfeh\x9a\x00\x00\x00G?\xbasj\xf8\x00\x00\x00G?\xe8\x19\x7f1\x00\x00\x00G?\xe8\x13\xb5\x01\x00\x00\x00G?\xe6\x91\x02t\x00\x00\x00G?\xe8[/!\x00\x00\x00G?\xc0h7P\x00\x00\x00G?\xe3\xa4\x95I\x00\x00\x00G?\x9a\x07\x08@\x00\x00\x00eX\x1e\x00\x00\x00randGTS{key1=info1,key2=info2}q\x07]q\x08(G?\xe5\x170\xc5\xa1n8G?\xe0/\xfe)@KpG?\xe7\xaa\x1bA\x9at\x1aG?\xcb\x89\x0bx\x12\xcf(G?\xcf!

In [8]:
gts2 = s.pop()
df2 = pd.DataFrame.from_dict(pkl.loads(gts2))
df2

| | timestamps | randGTS{key1=info1,key2=info2}.lat | randGTS{key1=info1,key2=info2}.lon | randGTS{key1=info1,key2=info2} |
|---|---|---|---|---|
| 0 | 3600000000 | 0.829495 | 0.783015 | 0.659081 |
| 1 | 7200000000 | 0.816396 | 0.453028 | 0.505858 |
| 2 | 10800000000 | 0.755447 | 0.103324 | 0.739515 |
| 3 | 14400000000 | 0.589772 | 0.753112 | 0.21512 |
| 4 | 18000000000 | 0.557116 | 0.752406 | 0.243219 |
| 5 | 21600000000 | 0.524552 | 0.705201 | 0.437826 |
| 6 | 25200000000 | 0.428421 | 0.761131 | 0.738199 |
| 7 | 28800000000 | 0.995635 | 0.12818 | 0.206785 |
| 8 | 32400000000 | 0.078302 | 0.613841 | 0.720892 |
| 9 | 36000000000 | 0.980822 | 0.025417 | 0.284827 |
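
With selectors as column names, the class name and labels can be recovered on the Python side. A small sketch, assuming column names of the form `class{k=v,...}` optionally followed by `.lat`, `.lon` or `.elev`, as in the output above. `parse_column` is a hypothetical helper, and it does not handle labels whose names or values contain commas, braces or encoded characters.

```python
import re

# Hypothetical helper: split 'class{k=v,...}[.lat|.lon|.elev]' into its parts.
COLUMN_RE = re.compile(
    r"^(?P<cls>[^{]+)\{(?P<labels>[^}]*)\}(?:\.(?P<kind>lat|lon|elev))?$"
)

def parse_column(name):
    match = COLUMN_RE.match(name)
    if match is None:
        raise ValueError("not a selector-style column name: %r" % name)
    labels = {}
    if match.group("labels"):
        for pair in match.group("labels").split(","):
            key, _, value = pair.partition("=")
            labels[key] = value
    # 'value' marks the column holding the GTS values themselves.
    return match.group("cls"), labels, match.group("kind") or "value"

print(parse_column("randGTS{key1=info1,key2=info2}.lat"))
print(parse_column("randGTS{key1=info1,key2=info2}"))
```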


We can also leave out geo information: columns that would contain only NaN are not emitted.

In [9]:
%%w -s s
NEWGTS 'randTS' RENAME 2 11 <% h NaN NaN NaN RAND ADDVALUE %> FOR
false
@GTStoPickledDict # use instead '@./GTStoPickledDict' if GTStoPickledDict.mc2 is in the macros folder

top: 	b'\x80\x02}q\x00(X\n\x00\x00\x00timestampsq\x01]q\x02(I7200000000\nI10800000000\nI14400000000\nI18000000000\nI21600000000\nI25200000000\nI28800000000\nI32400000000\nI36000000000\nI39600000000\neX\x06\x00\x00\x00randTSq\x03]q\x04(G?\xc4\xf6\x0b\xb7\xe7\x11\xecG?\x89\x80/P~\x10\xc0G?\xe51\xd42\xa8\xc3SG?\xe8T*F\xd0\xe3XG?\xeb\xc7%u\xca\x1b\x93G?\xd1\xb2\xf8\x10/\xb8:G?\xd2\xe1D\x9eD\xf6,G?\xe6\xa7\tD\xf8GSG?\xc0\x13V$\xf5+,G?\xbd&\xac\xcd\xa6\xbbpeu.'

In [10]:
gts3 = s.pop()
df3 = pd.DataFrame.from_dict(pkl.loads(gts3))
df3

| | timestamps | randTS |
|---|---|---|
| 0 | 7200000000 | 0.163759 |
| 1 | 10800000000 | 0.012452 |
| 2 | 14400000000 | 0.662333 |
| 3 | 18000000000 | 0.760274 |
| 4 | 21600000000 | 0.86806 |
| 5 | 25200000000 | 0.276548 |
| 6 | 28800000000 | 0.294999 |
| 7 | 32400000000 | 0.70789 |
| 8 | 36000000000 | 0.12559 |
| 9 | 39600000000 | 0.113871 |
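
No `.lat`, `.lon` or `.elev` columns appear here because the macro skips any of them that contains only NaN (its `isAllNaN` helper). The same test could be sketched in Python as follows. `is_all_nan` is a hypothetical name, and treating an empty list as all-NaN is a choice made for this sketch. An explicit `math.isnan` is used because a Python `set` does not reliably collapse distinct NaN objects the way the macro's `UNIQUE` step is used.

```python
import math

def is_all_nan(values):
    """Return True when every entry is a float NaN (an empty list also counts)."""
    return all(isinstance(v, float) and math.isnan(v) for v in values)

nan = float("nan")
print(is_all_nan([nan, nan, nan]))
print(is_all_nan([nan, 0.5, nan]))
```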


### 2. Revert a DataFrame to a GTS

To revert a DataFrame to a GTS, we first need to convert the DataFrame into a dict.

In [11]:
gts1b = df1.to_dict('list')
gts1b

{'timestamps': [3600000000,
  7200000000,
  10800000000,
  14400000000,
  18000000000,
  21600000000,
  25200000000,
  28800000000,
  32400000000,
  36000000000],
 'randGTS.lat': [0.26705175172537565,
  0.34497847314924,
  0.6284977728500962,
  0.2607266418635845,
  0.6399808544665575,
  0.8208576729521155,
  0.5836506094783545,
  0.14904742129147053,
  0.17614686861634254,
  0.2820922387763858],
 'randGTS.lon': [0.7709000632166862,
  0.9087560512125492,
  0.7751237042248249,
  0.40393346920609474,
  0.7572110742330551,
  0.4712342843413353,
  0.977079626172781,
  0.5170661956071854,
  0.7117577735334635,
  0.8011442329734564],
 'randGTS': [0.2709444820940443,
  0.8142002106222853,
  0.3388143691816743,
  0.5721975591879999,
  0.36091540670383704,
  0.0450137002248886,
  0.8850379097698539,
  0.5480798980462472,
  0.8178545814337475,
  0.46996075349765365]}
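
The `'list'` orientation matters here: it yields one plain list per column, which matches the map-of-lists layout used throughout this notebook, whereas the default orientation nests a per-row dict under each column. A short illustration on a stand-in DataFrame with made-up values:

```python
import pandas as pd

# Stand-in DataFrame; values are illustrative.
df = pd.DataFrame({"timestamps": [1, 2], "randGTS": [0.5, 0.25]})

as_lists = df.to_dict("list")   # {column: [values...]}
as_nested = df.to_dict()        # {column: {row_index: value}}

print(as_lists)
print(as_nested)
```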

We can push this dict directly onto the stack; it is converted automatically on the JVM side.

In [12]:
s.push(gts1b)
s

top: 	{'randGTS.lat': [0.26705175172537565, 0.34497847314924, 0.6284977728500962, 0.2607266418635845, 0.6399808544665575, 0.8208576729521155, 0.5836506094783545, 0.14904742129147053, 0.17614686861634254, 0.2820922387763858], 'timestamps': [3600000000, 7200000000, 10800000000, 14400000000, 18000000000, 21600000000, 25200000000, 28800000000, 32400000000, 36000000000], 'randGTS': [0.2709444820940443, 0.8142002106222853, 0.3388143691816743, 0.5721975591879999, 0.36091540670383704, 0.0450137002248886, 0.8850379097698539, 0.5480798980462472, 0.8178545814337475, 0.46996075349765365], 'randGTS.lon': [0.7709000632166862, 0.9087560512125492, 0.7751237042248249, 0.40393346920609474, 0.7572110742330551, 0.4712342843413353, 0.977079626172781, 0.5170661956071854, 0.7117577735334635, 0.8011442329734564]}

Now we can use the lists contained in this map to populate a GTS.

In [13]:
%%w -s s
'dict' STORE
$dict 'timestamps' GET
$dict 'randGTS.lat' GET
$dict 'randGTS.lon' GET
[] // no elevation
$dict 'randGTS' GET
MAKEGTS 'randGTS' RENAME

top: 	randGTS{}<DOUBLE, 10 values>

In [14]:
print(s.pop().toString())

randGTS{}
=3600000000/0.26705175172537565:0.7709000632166862/ 0.2709444820940443
=7200000000/0.34497847314924:0.9087560512125492/ 0.8142002106222853
=10800000000/0.6284977728500962:0.7751237042248249/ 0.3388143691816743
=14400000000/0.2607266418635845:0.40393346920609474/ 0.5721975591879999
=18000000000/0.6399808544665575:0.7572110742330551/ 0.36091540670383704
=21600000000/0.8208576729521155:0.4712342843413353/ 0.0450137002248886
=25200000000/0.5836506094783545:0.977079626172781/ 0.8850379097698539
=28800000000/0.14904742129147053:0.5170661956071854/ 0.5480798980462472
=32400000000/0.17614686861634254:0.7117577735334635/ 0.8178545814337475
=36000000000/0.2820922387763858:0.8011442329734564/ 0.46996075349765365
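
Before handing lists to `MAKEGTS` as above, it can be worth checking on the Python side that they line up. A minimal sketch, assuming (as in the cell above) that the latitude, longitude and elevation lists are either empty or as long as the tick list. `check_gts_lists` is a hypothetical helper, not part of any library.

```python
def check_gts_lists(ticks, lats, lons, elevs, values):
    """Raise ValueError if the lists cannot describe one GTS, else return its size."""
    n = len(ticks)
    if len(values) != n:
        raise ValueError("values and ticks differ in length")
    for label, column in (("lat", lats), ("lon", lons), ("elev", elevs)):
        # Optional columns may be left empty, like the elevation list above.
        if column and len(column) != n:
            raise ValueError("%s has %d entries, expected %d" % (label, len(column), n))
    if any(not isinstance(t, int) for t in ticks):
        raise ValueError("ticks must be integers")
    return n

d = {
    "timestamps": [3600000000, 7200000000],
    "randGTS.lat": [0.25, 0.5],
    "randGTS.lon": [0.1, 0.2],
    "randGTS": [1.0, 2.0],
}
print(check_gts_lists(d["timestamps"], d["randGTS.lat"], d["randGTS.lon"], [], d["randGTS"]))
```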



### 3. From a list of GTS to a DataFrame

We want to put every GTS of a list in the same DataFrame, with a single `timestamps` column.<br/>
Since the GTS do not all have values at the same timestamps, we need to handle missing values,<br/>
and we must assume that each GTS has at most one value per timestamp.<br/>
It is more efficient to do this in WarpScript, as the macro `ListGTStoPickledDict` does.
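
The alignment idea can be pictured in plain Python: take the sorted union of all ticks, then give each series one slot per tick, filled with NaN where it has no value. This only illustrates the logic and is not a translation of the WarpScript. `align_series` and the `{tick: value}` input layout are assumptions of the sketch.

```python
def align_series(series_by_name):
    """Align {name: {tick: value}} on the union of ticks, padding gaps with NaN."""
    ticks = sorted(set().union(*(s.keys() for s in series_by_name.values())))
    aligned = {"timestamps": ticks}
    for name, series in series_by_name.items():
        aligned[name] = [series.get(t, float("nan")) for t in ticks]
    return aligned

aligned = align_series({
    "a": {1: 0.5, 2: 0.25},
    "b": {2: "x", 3: "y"},
})
print(aligned)
```

Keying each series by tick makes the one-value-per-timestamp assumption explicit: a second value at the same tick would simply overwrite the first.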

If there are many unaligned ticks, consider converting to a list of single-column DataFrames or Series instead.
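
One way to follow that advice on the pandas side is to keep each GTS as its own Series indexed by tick and let pandas align them only when needed. A sketch with made-up values, where `pd.concat(..., axis=1)` performs an outer join on the index:

```python
import pandas as pd

# One Series per GTS, indexed by tick; values are illustrative.
rand_ts = pd.Series([0.1, 0.2, 0.3], index=[1, 2, 3], name="randTS")
string_ts = pd.Series(["a", "b"], index=[2, 4], name="stringTS")

# Outer join on the tick index; missing entries become NaN.
combined = pd.concat([rand_ts, string_ts], axis=1).sort_index()
combined.index.name = "timestamps"
print(combined)
```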

In [15]:
%%w -s s -o -l
<%
    # Documenting the macro
    '[GTS] BOOLEAN @ListGTStoPickledDict , more doc in macros/ListGTStoPickledDict.mc2' DOC

    # Check there are two arguments on the stack
    <% DEPTH 2 < %> <% 'Macro takes two arguments' MSGFAIL %> IFT
        
    # Check that top is a boolean indicating whether to use GTS classname or selector
    <% 1 PICK TYPEOF 'BOOLEAN' != %> <% 'First argument must be a boolean indicating whether to use GTS selector (true) or classname (false)' MSGFAIL %> IFT
    
    # Check that second argument is a list of GTS
    <% 2 PICK TYPEOF 'LIST' != %> <% 'Second argument must be a List of GTS' MSGFAIL %> IFT
    2 PICK <% <% TYPEOF 'GTS' != %> <% 'Second argument is a list that has an element that is not a GTS' MSGFAIL %> IFT %> FOREACH
    
    # Store the arguments
    'withSelector' STORE
    'gtsList' STORE
    
    # make tickbase of all GTS
    $gtsList TICKS 'ticks' STORE
    $ticks [] [] [] $ticks MAKEGTS 'baseGTS' STORE
    
    # macro: check not all NaN (for locations and elevations)
    <% UNIQUE DUP SIZE 1 == SWAP 0 GET ISNaN && %> 'isAllNaN' STORE
        
    # Return pickled dict for pandas
    {
        # ticks
        'timestamps' $ticks
        
        # loop over list of GTS
        $gtsList
        <%
            'gts' STORE
            
            # Make name
            $gts <% $withSelector %> <% TOSELECTOR %> <% NAME %> IFTE
            'name' STORE
        
            # Put on the same tick base and fill missing values with NaN
            [ $gts true mapper.replace 0 0 0 ] MAP
            'mask' STORE
            [ $mask [ $baseGTS ] [] op.negmask ] APPLY
            [ SWAP NaN mapper.replace 0 0 0 ] MAP
            0 GET 'residualSeries' STORE
            [ $gts $residualSeries ] MERGE SORT
            'gts' STORE
        
            # locations
            $gts LOCATIONS 'lon' STORE 'lat' STORE
            <% $lat @isAllNaN ! %> <% $name '.lat' + $lat %> IFT
            <% $lon @isAllNaN ! %> <% $name '.lon' + $lon %> IFT
        
            # elevations
            $gts ELEVATIONS 'elev' STORE
            <% $elev @isAllNaN ! %> <% $name '.elev' + $elev %> IFT
        
            # values        
            $name $gts VALUES
        %>
        FOREACH
    }
    ->PICKLE
%>
'ListGTStoPickledDict' STORE

Creating a new WarpScript stack accessible under variable "s".




We apply the macro `ListGTStoPickledDict` in the same way as `GTStoPickledDict`,<br/>
except that its second argument is a list of GTS instead of a single GTS.

In [16]:
%%w -s s
[ NEWGTS 'randGTS' RENAME 1 10 <% h RAND RAND NaN RAND ADDVALUE %> FOR
  NEWGTS 'randTS' RENAME 2 11 <% h NaN NaN NaN RAND ADDVALUE %> FOR
  NEWGTS 'stringTS' RENAME 5 8 <% h NaN NaN NaN 'a string' ADDVALUE %> FOR ]
false
@ListGTStoPickledDict  # use instead '@./ListGTStoPickledDict' if ListGTStoPickledDict.mc2 is in the macros folder

top: 	b"\x80\x02}q\x00(X\n\x00\x00\x00timestampsq\x01]q\x02(I3600000000\nI7200000000\nI10800000000\nI14400000000\nI18000000000\nI21600000000\nI25200000000\nI28800000000\nI32400000000\nI36000000000\nI39600000000\neX\x0b\x00\x00\x00randGTS.latq\x03]q\x04(G?\xd4\x15\xc4\x8a\x00\x00\x00G?\xe3w\x83\xb5\x00\x00\x00G?\xe0\x84\x03\x8d\x80\x00\x00G?\xc0l\x1c\xb2\x00\x00\x00G?\xd4\x89~\x89\x00\x00\x00G?\xd4Hb\xc8\x00\x00\x00G?\xe4X06\x00\x00\x00G?\xe8\xe2\\\xeb\x80\x00\x00G?\xdd2\xa1\x1e\x00\x00\x00G?\xe2\x86-+\x00\x00\x00G\x7f\xf8\x00\x00\x00\x00\x00\x00eX\x0b\x00\x00\x00randGTS.lonq\x05]q\x06(G?\xea*\xe7\x97\x00\x00\x00G?\xe8\x8d\x98\x1c\x00\x00\x00G?\xef\xde\xe4;\x00\x00\x00G?\xd2\xfaY\xc6\x00\x00\x00G?\xef}\xb82\x00\x00\x00G?\xd2\xe8\x95\xf6\x00\x00\x00G?\xeb\xe3D\x80\x00\x00\x00G?\xc3*K,\x00\x00\x00G?\xe5d+=\x00\x00\x00G?\xe1\xcf%E\x00\x00\x00G\x7f\xf8\x00\x00\x00\x00\x00\x00eX\x07\x00\x00\x00randGTSq\x07]q\x08(G?\xa5\x87Vq\x9b\x120G?\xeew\xc2P\xad\xc1\xd1G?\x83\x1d\xfd\xc3\xea\xfe\xc0G?\xc

Unlike our first example with a single GTS, the following cell raises<br/>
an error if a GTS in the list has several values at the same timestamp.

In [17]:
listGts = s.pop()
df4 = pd.DataFrame.from_dict(pkl.loads(listGts))
df4

| | timestamps | randGTS.lat | randGTS.lon | randGTS | randTS | stringTS |
|---|---|---|---|---|---|---|
| 0 | 3600000000 | 0.313829 | 0.817737 | 0.042048 | | |
| 1 | 7200000000 | 0.608339 | 0.767284 | 0.952119 | 0.416859 | |
| 2 | 10800000000 | 0.516115 | 0.995958 | 0.009335 | 0.094262 | |
| 3 | 14400000000 | 0.128299 | 0.29653 | 0.152992 | 0.180889 | |
| 4 | 18000000000 | 0.320892 | 0.984097 | 0.417149 | 0.63813 | a string |
| 5 | 21600000000 | 0.316918 | 0.295446 | 0.141646 | 0.766729 | a string |
| 6 | 25200000000 | 0.635765 | 0.871493 | 0.955917 | 0.505213 | a string |
| 7 | 28800000000 | 0.777632 | 0.149728 | 0.872746 | 0.490997 | a string |
| 8 | 32400000000 | 0.456215 | 0.668478 | 0.650703 | 0.324468 | |
| 9 | 36000000000 | 0.578879 | 0.556536 | 0.327971 | 0.523338 | |
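
A likely reason for the error mentioned above, stated here as an assumption rather than taken from the macro's documentation: a GTS with duplicate ticks contributes more values than there are entries in `timestamps`, and pandas refuses to build a DataFrame from columns of unequal length. The sketch below reproduces that failure with hand-made lists.

```python
import pandas as pd

# 'dupTS' has one value too many, as if one tick carried two values.
unaligned = {
    "timestamps": [1, 2, 3],
    "dupTS": [0.1, 0.2, 0.25, 0.3],
}

try:
    pd.DataFrame.from_dict(unaligned)
    failed = False
except ValueError as err:
    failed = True
    print("pandas rejected the dict:", err)
```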
