### This notebook shows how to make a Pandas DataFrame from WarpScript GTS

### 1. From a single GTS to a DataFrame

In [1]:
%load_ext warpscript
%alias_magic w warpscript

Created `%%w` as an alias for `%%warpscript`.


We will need the pandas and pickle libraries.

In [2]:
import pandas as pd
import pickle as pkl

We first create a random GTS.

In [3]:
%%w -s s
NEWGTS 'randGTS' RENAME 1 10 <% h RAND RAND NaN RAND ADDVALUE %> FOR

Starting connection with 127.0.0.1:25333.
Creating a new WarpScript stack accessible under variable "s".
top: 	randGTS{}<DOUBLE, 10 values>



For a Python interpreter to understand a GTS, we store its content in a map of lists and pickle that map, which Python then loads as a dict.<br/>
The macro `GTStoPickledDict` does this. To load it, either place the file `macros/GTStoPickledDict.mc2`<br/>
in the macros folder of the Warp 10 platform you are sending requests to, or execute the following cell.

In [4]:
%%w -s s
<%
    # Documenting the macro
    'GTS BOOLEAN @GTStoPickledDict, more doc in macros/GTStoPickledDict.mc2' DOC

    # Check there are two arguments on the stack
    <% DEPTH 2 < %> <% 'Macro takes two arguments' MSGFAIL %> IFT
        
    # Check that top is a boolean indicating whether to use GTS classname or selector
    <% 1 PICK TYPEOF 'BOOLEAN' != %> <% 'First argument must be a boolean indicating whether to use GTS selector (true) or classname (false)' MSGFAIL %> IFT
    
    # Check that second argument is a GTS
    <% 2 PICK TYPEOF 'GTS' != %> <% 'Second argument must be a GTS' MSGFAIL %> IFT
    
    # Store the arguments
    'withSelector' STORE
    'gts' STORE
    
    # Make name
    $gts <% $withSelector %> <% TOSELECTOR %> <% NAME %> IFTE
    'name' STORE
    
    # macro: check not all NaN (for locations and elevations)
    <% UNIQUE DUP SIZE 1 == SWAP 0 GET ISNaN && %> 'isAllNaN' STORE
        
    # Return pickled dict for pandas
    {
        # ticks
        'timestamps' $gts TICKLIST
        
        # locations
        $gts LOCATIONS 'lon' STORE 'lat' STORE
        <% $lat @isAllNaN ! %> <% $name '.lat' + $lat %> IFT
        <% $lon @isAllNaN ! %> <% $name '.lon' + $lon %> IFT
        
        # elevations
        $gts ELEVATIONS 'elev' STORE
        <% $elev @isAllNaN ! %> <% $name '.elev' + $elev %> IFT
        
        # values        
        $name $gts VALUES
    }
    ->PICKLE
%>
'GTStoPickledDict' STORE

top: 	randGTS{}<DOUBLE, 10 values>
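
For readers less familiar with WarpScript, the logic of the macro can be paraphrased in Python. This is only a sketch: `is_all_nan` and `gts_to_pickled_dict` are hypothetical names, not part of any Warp 10 client, and the pickle protocol is an assumption chosen to match the header bytes visible in this notebook's outputs.

```python
import math
import pickle as pkl

def is_all_nan(values):
    """Return True when every entry is a float NaN (the intent of isAllNaN)."""
    return all(isinstance(v, float) and math.isnan(v) for v in values)

def gts_to_pickled_dict(name, ticks, lats, lons, elevs, values):
    """Pickle a map of lists, skipping geo columns that contain only NaN."""
    columns = {'timestamps': list(ticks)}
    for suffix, data in (('.lat', lats), ('.lon', lons), ('.elev', elevs)):
        if not is_all_nan(data):
            columns[name + suffix] = list(data)
    columns[name] = list(values)
    # Protocol 2 is an assumption, not something the macro documents.
    return pkl.dumps(columns, protocol=2)

# Illustrative data: two points with locations but no elevation.
nan = float('nan')
payload = gts_to_pickled_dict('demo', [1, 2], [0.1, 0.2], [0.3, 0.4], [nan, nan], [5.0, 6.0])
restored = pkl.loads(payload)  # keys: timestamps, demo.lat, demo.lon, demo
```

The elevation column is absent from `restored` because it holds only NaN, which is the same rule the macro applies to latitudes, longitudes and elevations.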



We evaluate the macro on the random GTS that was left on the stack.<br/>
Passing `false` as the boolean argument makes the macro use the GTS classname only, so the labels are dropped from the column names.

In [5]:
%%w -s s
false
@GTStoPickledDict # use instead '@./GTStoPickledDict' if GTStoPickledDict.mc2 is in the macros folder

top: 	b'\x80\x02}q\x00(X\n\x00\x00\x00timestampsq\x01]q\x02(I3600000000\nI7200000000\nI10800000000\nI14400000000\nI18000000000\nI21600000000\nI25200000000\nI28800000000\nI32400000000\nI36000000000\neX\x0b\x00\x00\x00randGTS.latq\x03]q\x04(G?\xebO\xa1\xf3\x00\x00\x00G?\xa9\x12\x02x\x00\x00\x00G?\xebG\xbe\x93\x00\x00\x00G?\xd5\x8a\x0c\xbd\x00\x00\x00G?\xe4\xa1\xa0^\x00\x00\x00G?\xd5\x02\x87R\x00\x00\x00G?\xc8\xfc\xd3\x9e\x00\x00\x00G?\x8fI\xd8\xa0\x00\x00\x00G?\xa9\x9f\xa6\xa0\x00\x00\x00G?\xe8!b\xbe\x00\x00\x00eX\x0b\x00\x00\x00randGTS.lonq\x05]q\x06(G?\xd1\xdd\x12\xfe\x00\x00\x00G?\xb17^h\x00\x00\x00G?\xea!\r\xd5\x00\x00\x00G?\xd3]@\xf4\x00\x00\x00G?\xc5v\xe0\x0c\x00\x00\x00G?\xee\x85\xe9\xf9\x00\x00\x00G?\xde\x1f=:\x00\x00\x00G?\xd5\xa0\x06D\x00\x00\x00G?\xe2u\x14d\x00\x00\x00G?\xef\xd1\xfa#\x00\x00\x00eX\x07\x00\x00\x00randGTSq\x07]q\x08(G?\xd4q77\xfa\x90bG?\xcc\xd8v\xbc\xfe\xd00G?\xc8\x07\xd1<(\xf0\x0cG?\xedT\xc03=\x0b\xe0G?\xed\x0f{\x8c@\xe6[G?\xe8\xc5\xf2\x95(\x0f\x1eG?\xef\x8d\x8

We then load the dict from its pickled representation and create a pandas DataFrame from it.

In [6]:
gts1 = s.pop()
df1 = pd.DataFrame.from_dict(pkl.loads(gts1))
df1

,timestamps,randGTS.lat,randGTS.lon,randGTS
0,3600000000,0.853471,0.279118,0.31941
1,7200000000,0.048966,0.067251,0.225356
2,10800000000,0.852508,0.816535,0.187739
3,14400000000,0.336551,0.302567,0.916596
4,18000000000,0.64473,0.16769,0.90814
5,21600000000,0.328279,0.953847,0.774164
6,25200000000,0.195216,0.470657,0.986025
7,28800000000,0.015278,0.337892,0.199682
8,32400000000,0.050046,0.576792,0.700311
9,36000000000,0.754075,0.994382,0.046004
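
The `timestamps` column holds raw platform ticks. Assuming the platform counts microseconds since the Unix epoch (consistent with one hour being 3600000000 in the output above), a sketch of turning the ticks into a datetime index could look like this. The small frame is a stand-in for `df1`, with values copied from its first rows.

```python
import pandas as pd

# Stand-in for df1: ticks in microseconds since the Unix epoch (assumed time unit).
demo = pd.DataFrame({
    'timestamps': [3600000000, 7200000000, 10800000000],
    'randGTS': [0.31941, 0.225356, 0.187739],
})

# Convert the ticks to timezone-aware timestamps and use them as the index.
indexed = demo.assign(
    timestamps=pd.to_datetime(demo['timestamps'], unit='us', utc=True)
).set_index('timestamps')
```

If the platform is configured with a different time unit, the `unit` argument would need to change accordingly.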


In the following example, we choose to keep label information.

In [7]:
%%w -s s
NEWGTS 'randGTS' RENAME 1 10 <% h RAND RAND NaN RAND ADDVALUE %> FOR
{ 'key1' 'info1' 'key2' 'info2' } RELABEL
true
@GTStoPickledDict # use instead '@./GTStoPickledDict' if GTStoPickledDict.mc2 is in the macros folder

top: 	b'\x80\x02}q\x00(X\n\x00\x00\x00timestampsq\x01]q\x02(I3600000000\nI7200000000\nI10800000000\nI14400000000\nI18000000000\nI21600000000\nI25200000000\nI28800000000\nI32400000000\nI36000000000\neX"\x00\x00\x00randGTS{key1=info1,key2=info2}.latq\x03]q\x04(G?\xe0\x1b0\xb3\x00\x00\x00G?\xea\xab\x818\x80\x00\x00G?\xdb\xf2i.\x00\x00\x00G?\xe9\x17_\x7f\x00\x00\x00G?\xecc\xc5\x9c\x00\x00\x00G?\xc5\xa0\xdf.\x00\x00\x00G?\xde\x18T\xb1\x00\x00\x00G?\xef\x11\x10\xd5\x80\x00\x00G?\xe6&\xdfk\x00\x00\x00G?\xdct\xdfq\x00\x00\x00eX"\x00\x00\x00randGTS{key1=info1,key2=info2}.lonq\x05]q\x06(G?\xe5\x82\x8ae\x00\x00\x00G?\xe4\x878Y\x00\x00\x00G?\xef\xdb\xee\xca\x00\x00\x00G?\xeb\x0e\xff"\x00\x00\x00G?\xe51k\xc6\x00\x00\x00G?\xdbEbj\x00\x00\x00G?\xd2\xbc\xb5\xe4\x00\x00\x00G?\xe7\x0b\xa4\x1b\x00\x00\x00G?\xefg{\xb2\x00\x00\x00G?\xe5d\xde\xe3\x00\x00\x00eX\x1e\x00\x00\x00randGTS{key1=info1,key2=info2}q\x07]q\x08(G?\xe9\xe2{\xf6\xfeK\xabG?\xee\x02\x99\xaciY\xe5G?\xda\x82Y.\xccG\x92G?\xd77\xf1\xd5!\xb4\xc

In [8]:
gts2 = s.pop()
df2 = pd.DataFrame.from_dict(pkl.loads(gts2))
df2

,timestamps,"randGTS{key1=info1,key2=info2}.lat","randGTS{key1=info1,key2=info2}.lon","randGTS{key1=info1,key2=info2}"
0,3600000000,0.503319,0.672185,0.808897
1,7200000000,0.833436,0.641506,0.937817
2,10800000000,0.436671,0.995597,0.414206
3,14400000000,0.784103,0.845581,0.36279
4,18000000000,0.887179,0.662283,0.890288
5,21600000000,0.168972,0.42611,0.062235
6,25200000000,0.470235,0.292768,0.796807
7,28800000000,0.970833,0.720171,0.817986
8,32400000000,0.692245,0.981382,0.655206
9,36000000000,0.444633,0.668563,0.816574
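
When selectors are used as column names, the classname, labels and geo suffix can be recovered on the Python side. The sketch below uses a regular expression and the hypothetical names `COLUMN_RE` and `parse_column`. It assumes that classnames and label values contain no raw `{`, `}`, `,` or `=` characters, which may not hold for every selector.

```python
import re

# classname{labels} followed by an optional .lat / .lon / .elev suffix
COLUMN_RE = re.compile(
    r'^(?P<cls>[^{]+)\{(?P<labels>[^}]*)\}(?:\.(?P<field>lat|lon|elev))?$'
)

def parse_column(col):
    """Split a selector-style column name; return None if it does not match."""
    m = COLUMN_RE.match(col)
    if m is None:
        return None
    pairs = [p for p in m.group('labels').split(',') if p]
    labels = dict(p.split('=', 1) for p in pairs)
    return m.group('cls'), labels, m.group('field') or 'value'

parsed = parse_column('randGTS{key1=info1,key2=info2}.lat')
# parsed == ('randGTS', {'key1': 'info1', 'key2': 'info2'}, 'lat')
```

Columns that are not selectors, such as `timestamps`, yield `None`.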


A GTS without geo information works too: the macro leaves out the location and elevation columns when they contain only NaN.

In [9]:
%%w -s s
NEWGTS 'randTS' RENAME 2 11 <% h NaN NaN NaN RAND ADDVALUE %> FOR
false
@GTStoPickledDict # use instead '@./GTStoPickledDict' if GTStoPickledDict.mc2 is in the macros folder

top: 	b'\x80\x02}q\x00(X\n\x00\x00\x00timestampsq\x01]q\x02(I7200000000\nI10800000000\nI14400000000\nI18000000000\nI21600000000\nI25200000000\nI28800000000\nI32400000000\nI36000000000\nI39600000000\neX\x06\x00\x00\x00randTSq\x03]q\x04(G?\xde\x99\xa9\x07\xde\xc1DG?\xefpT.\x08\xf9\xbcG?\xeaOt)w#UG?\xda\x92^~\xe1|`G?\xeaT\xce \x96\x97DG?\xe2w\xb4\x06\x83zmG?\xeb\xcd\xa6\xfc\xe4\xde\x80G?\xe0!\xa8T\xb4\xbe\x19G?\xd1\xea\x07\xc3\x95\x86LG?\xe7\x9a\xe0d=W3eu.'



In [10]:
gts3 = s.pop()
df3 = pd.DataFrame.from_dict(pkl.loads(gts3))
df3

,timestamps,randTS
0,7200000000,0.478129
1,10800000000,0.982462
2,14400000000,0.822199
3,18000000000,0.415184
4,21600000000,0.822852
5,25200000000,0.577112
6,28800000000,0.868854
7,32400000000,0.504109
8,36000000000,0.279909
9,39600000000,0.737656


### 2. Revert a DataFrame to a GTS

To revert a DataFrame to a GTS, we first need to convert the DataFrame into a dict.

In [11]:
gts1b = df1.to_dict('list')
gts1b

{'timestamps': [3600000000,
  7200000000,
  10800000000,
  14400000000,
  18000000000,
  21600000000,
  25200000000,
  28800000000,
  32400000000,
  36000000000],
 'randGTS.lat': [0.853470778092742,
  0.048965527676045895,
  0.8525078650563955,
  0.3365508886054158,
  0.6447297893464565,
  0.32827933318912983,
  0.1952156564220786,
  0.015277569182217121,
  0.0500461645424366,
  0.7540754042565823],
 'randGTS.lon': [0.2791182976216078,
  0.06725111044943333,
  0.8165349159389734,
  0.3025667555630207,
  0.16769028268754482,
  0.9538469184190035,
  0.4706566873937845,
  0.3378921188414097,
  0.576791949570179,
  0.994381969794631],
 'randGTS': [0.3194101378123887,
  0.22535595157899158,
  0.18773856580253356,
  0.9165955544234485,
  0.9081399669004883,
  0.7741635239803804,
  0.9860249967304233,
  0.19968162529983224,
  0.7003113756299111,
  0.04600353243964228]}
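
The `'list'` orientation matters here, because the default orientation nests each column as a dict keyed by row index rather than as a flat list. A minimal illustration on a made-up two-row frame:

```python
import pandas as pd

demo = pd.DataFrame({'timestamps': [1, 2], 'v': [0.5, 0.25]})

# One flat list per column: the shape needed to rebuild a GTS.
by_column = demo.to_dict(orient='list')
# by_column == {'timestamps': [1, 2], 'v': [0.5, 0.25]}

# Default orientation: one {row index: value} dict per column.
by_index = demo.to_dict()
# by_index == {'timestamps': {0: 1, 1: 2}, 'v': {0: 0.5, 1: 0.25}}
```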

We can push this dict directly onto the stack, since it is automatically converted to a map on the JVM side.

In [12]:
s.push(gts1b)
s

top: 	{'randGTS.lat': [0.853470778092742, 0.048965527676045895, 0.8525078650563955, 0.3365508886054158, 0.6447297893464565, 0.32827933318912983, 0.1952156564220786, 0.015277569182217121, 0.0500461645424366, 0.7540754042565823], 'timestamps': [3600000000, 7200000000, 10800000000, 14400000000, 18000000000, 21600000000, 25200000000, 28800000000, 32400000000, 36000000000], 'randGTS': [0.3194101378123887, 0.22535595157899158, 0.18773856580253356, 0.9165955544234485, 0.9081399669004883, 0.7741635239803804, 0.9860249967304233, 0.19968162529983224, 0.7003113756299111, 0.04600353243964228], 'randGTS.lon': [0.2791182976216078, 0.06725111044943333, 0.8165349159389734, 0.3025667555630207, 0.16769028268754482, 0.9538469184190035, 0.4706566873937845, 0.3378921188414097, 0.576791949570179, 0.994381969794631]}

Now we can use the lists contained in this map to populate a GTS.

In [13]:
%%w -s s
'dict' STORE
$dict 'timestamps' GET
$dict 'randGTS.lat' GET
$dict 'randGTS.lon' GET
[] // no elevation
$dict 'randGTS' GET
MAKEGTS 'randGTS' RENAME

top: 	randGTS{}<DOUBLE, 10 values>
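
The cell above hard-codes the key names. As a sketch, a hypothetical Python helper (`split_for_makegts` is not part of any library) could select the five lists in the order used above, that is ticks, latitudes, longitudes, elevations and values, substituting an empty list for any geo column that is absent, as the cell does for elevations.

```python
def split_for_makegts(columns, name):
    """Return (ticks, lats, lons, elevs, values) for the series called `name`."""
    return (
        columns['timestamps'],
        columns.get(name + '.lat', []),   # empty list when there is no latitude
        columns.get(name + '.lon', []),   # empty list when there is no longitude
        columns.get(name + '.elev', []),  # empty list when there is no elevation
        columns[name],
    )

# Illustrative input shaped like the output of df.to_dict('list').
demo = {
    'timestamps': [1, 2],
    'randGTS.lat': [0.1, 0.2],
    'randGTS.lon': [0.3, 0.4],
    'randGTS': [5.0, 6.0],
}
ticks, lats, lons, elevs, values = split_for_makegts(demo, 'randGTS')
```

Each of the five lists could then be pushed with `s.push(...)` before calling `MAKEGTS`.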



In [14]:
print(s.pop().toString())

randGTS{}
=3600000000/0.853470778092742:0.2791182976216078/ 0.3194101378123887
=7200000000/0.048965527676045895:0.06725111044943333/ 0.22535595157899158
=10800000000/0.8525078650563955:0.8165349159389734/ 0.18773856580253356
=14400000000/0.3365508886054158:0.3025667555630207/ 0.9165955544234485
=18000000000/0.6447297893464565:0.16769028268754482/ 0.9081399669004883
=21600000000/0.32827933318912983:0.9538469184190035/ 0.7741635239803804
=25200000000/0.1952156564220786:0.4706566873937845/ 0.9860249967304233
=28800000000/0.015277569182217121:0.3378921188414097/ 0.19968162529983224
=32400000000/0.0500461645424366:0.576791949570179/ 0.7003113756299111
=36000000000/0.7540754042565823:0.994381969794631/ 0.04600353243964228



### 3. From a list of GTS to a DataFrame

We want to put every GTS of a list into the same DataFrame, with a single `timestamps` column.<br/>
Since the GTS do not all have values at the same timestamps, we need to handle missing values,<br/>
and we need to assume that each GTS has at most one value per timestamp.<br/>
It is more efficient to do this in WarpScript, which is what the macro `ListGTStoPickledDict` does.

In [15]:
%%w -s s -o
<%
    # Documenting the macro
    '[GTS] BOOLEAN @ListGTStoPickledDict , more doc in macros/ListGTStoPickledDict.mc2' DOC

    # Check there are two arguments on the stack
    <% DEPTH 2 < %> <% 'Macro takes two arguments' MSGFAIL %> IFT
        
    # Check that top is a boolean indicating whether to use GTS classname or selector
    <% 1 PICK TYPEOF 'BOOLEAN' != %> <% 'First argument must be a boolean indicating whether to use GTS selector (true) or classname (false)' MSGFAIL %> IFT
    
    # Check that second argument is a list of GTS
    <% 2 PICK TYPEOF 'LIST' != %> <% 'Second argument must be a List of GTS' MSGFAIL %> IFT
    2 PICK <% <% TYPEOF 'GTS' != %> <% 'Second argument is a list that has an element that is not a GTS' MSGFAIL %> IFT %> FOREACH
    
    # Store the arguments
    'withSelector' STORE
    'gtsList' STORE
    
    # make tickbase of all GTS
    $gtsList TICKS 'ticks' STORE
    $ticks [] [] [] $ticks MAKEGTS 'baseGTS' STORE
    
    # macro: check not all NaN (for locations and elevations)
    <% UNIQUE DUP SIZE 1 == SWAP 0 GET ISNaN && %> 'isAllNaN' STORE
        
    # Return pickled dict for pandas
    {
        # ticks
        'timestamps' $ticks
        
        # loop over list of GTS
        $gtsList
        <%
            'gts' STORE
            
            # Make name
            $gts <% $withSelector %> <% TOSELECTOR %> <% NAME %> IFTE
            'name' STORE
        
            # Put on the same tick base and fill missing values with NaN
            [ $gts true mapper.replace 0 0 0 ] MAP
            'mask' STORE
            [ $mask [ $baseGTS ] [] op.negmask ] APPLY
            [ SWAP NaN mapper.replace 0 0 0 ] MAP
            0 GET 'residualSeries' STORE
            [ $gts $residualSeries ] MERGE SORT
            'gts' STORE
        
            # locations
            $gts LOCATIONS 'lon' STORE 'lat' STORE
            <% $lat @isAllNaN ! %> <% $name '.lat' + $lat %> IFT
            <% $lon @isAllNaN ! %> <% $name '.lon' + $lon %> IFT
        
            # elevations
            $gts ELEVATIONS 'elev' STORE
            <% $elev @isAllNaN ! %> <% $name '.elev' + $elev %> IFT
        
            # values        
            $name $gts VALUES
        %>
        FOREACH
    }
    ->PICKLE
%>
'ListGTStoPickledDict' STORE

Creating a new WarpScript stack accessible under variable "s".
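
To picture what "same tick base, missing values filled with NaN" means, here is the equivalent alignment sketched with pandas on two made-up series. It is only an illustration of the expected result, not what the macro runs.

```python
import pandas as pd

# Two series that only partly share their ticks.
a = pd.Series([1.0, 2.0, 3.0], index=[1, 2, 3], name='a')
b = pd.Series([20.0, 30.0, 40.0], index=[2, 3, 4], name='b')

# Outer-join on the union of ticks; missing entries become NaN.
aligned = pd.concat([a, b], axis=1).sort_index()
aligned.index.name = 'timestamps'
aligned = aligned.reset_index()
```

Here `aligned` has one row per tick from 1 to 4, with NaN in column `a` at tick 4 and in column `b` at tick 1. This kind of alignment relies on each series having at most one value per tick.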



We apply the macro `ListGTStoPickledDict` in the same way as `GTStoPickledDict`,<br/>
except that its second argument is a list of GTS instead of a single GTS.

In [16]:
%%w -s s
[ NEWGTS 'randGTS' RENAME 1 10 <% h RAND RAND NaN RAND ADDVALUE %> FOR
  NEWGTS 'randTS' RENAME 2 11 <% h NaN NaN NaN RAND ADDVALUE %> FOR
  NEWGTS 'stringTS' RENAME 5 8 <% h NaN NaN NaN 'a string' ADDVALUE %> FOR ]
false
@ListGTStoPickledDict  # use instead '@./ListGTStoPickledDict' if ListGTStoPickledDict.mc2 is in the macros folder

top: 	b'\x80\x02}q\x00(X\n\x00\x00\x00timestampsq\x01]q\x02(I3600000000\nI7200000000\nI10800000000\nI14400000000\nI18000000000\nI21600000000\nI25200000000\nI28800000000\nI32400000000\nI36000000000\nI39600000000\neX\x0b\x00\x00\x00randGTS.latq\x03]q\x04(G?\xd9\xc5\xea\xea\x00\x00\x00G?\xe6\x86f\x1b\x00\x00\x00G?\xe8\xbd<\x13\x80\x00\x00G?\xebu\xb3\x9d\x00\x00\x00G?\xc0~\xe5^\x00\x00\x00G?\x8cW\x91\xa0\x00\x00\x00G?\xcf\xea\x8c\x1e\x00\x00\x00G?\xd6\xa9\x08V\x00\x00\x00G?\xcdZ\xabF\x00\x00\x00G?\xd3nMm\x00\x00\x00G\x7f\xf8\x00\x00\x00\x00\x00\x00eX\x0b\x00\x00\x00randGTS.lonq\x05]q\x06(G?\xdc\xb8\xcd\xa8\x00\x00\x00G?\xdb\xc1\x91\x82\x00\x00\x00G?\xa0\xc5i\xc0\x00\x00\x00G?\xe1o\xd5@\x00\x00\x00G?\xda\xca\xd5T\x00\x00\x00G?\xe2p\xaf\xbe\x00\x00\x00G?\xe7\x18N\x91\x00\x00\x00G?\xdb\xbc\xaan\x00\x00\x00G?\xe9;\xae\xcf\x00\x00\x00G?\xe3p\xe7I\x00\x00\x00G\x7f\xf8\x00\x00\x00\x00\x00\x00eX\x07\x00\x00\x00randGTSq\x07]q\x08(G?\xe33h2\xef\xb7pG?\xd8\xab\xd2B\x9b\x8f4G?\xe0\x15\xd1\x80\xc2\x14\

Unlike the first example with a single GTS, the following cell raises<br/>
an error if a GTS in the list has several values for the same timestamp.

In [17]:
listGts = s.pop()
df4 = pd.DataFrame.from_dict(pkl.loads(listGts))
df4

,timestamps,randGTS.lat,randGTS.lon,randGTS,randTS,stringTS
0,3600000000,0.402705,0.44878,0.600025,,
1,7200000000,0.703906,0.433689,0.385487,0.293542,
2,10800000000,0.7731,0.032756,0.502663,0.143137,
3,14400000000,0.858118,0.544901,0.896276,0.944003,
4,18000000000,0.128873,0.41863,0.27631,0.49465,a string
5,21600000000,0.013839,0.576256,0.739073,0.010522,a string
6,25200000000,0.249345,0.721717,0.974772,0.022031,a string
7,28800000000,0.354067,0.43339,0.974664,0.391647,a string
8,32400000000,0.229329,0.788536,0.211988,0.664815,
9,36000000000,0.303607,0.607532,0.421865,0.736374,
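
One plausible reading of the error case mentioned above is that pandas needs all columns to have the same length, whereas a GTS with several values at one tick would contribute more values than there are entries in the shared `timestamps` list. Under that assumption, a hypothetical pre-check (`check_column_lengths` is an illustrative name) could report the offending columns before the DataFrame is built.

```python
def check_column_lengths(columns):
    """Return the row count, or raise ValueError listing mismatched columns."""
    expected = len(columns['timestamps'])
    bad = {key: len(val) for key, val in columns.items() if len(val) != expected}
    if bad:
        raise ValueError('expected %d rows, got %r' % (expected, bad))
    return expected

# Illustrative input: every column matches the length of 'timestamps'.
rows = check_column_lengths({'timestamps': [1, 2], 'a': [0.1, 0.2]})
```

It would be called on the result of `pkl.loads(...)` before passing it to `pd.DataFrame.from_dict`.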
