### This notebook shows how to make a Pandas DataFrame from WarpScript GTS

### 1. From a single GTS to a DataFrame

In [1]:
%load_ext warpscript_cellmagic
%alias_magic w warpscript

Created `%%w` as an alias for `%%warpscript`.


We will need the pandas and pickle libraries.

In [2]:
import pandas as pd
import pickle as pkl

We first create a random GTS.

In [3]:
%%w -s s
NEWGTS 'randGTS' RENAME 1 10 <% h RAND RAND NaN RAND ADDVALUE %> FOR

Starting connection with 127.0.0.1:25333.
Creating a new WarpScript stack accessible under variable "s".
top: 	<GTS with 10 values>



To make a GTS readable by a Python interpreter, we store its content in a map of lists and pickle it as a dict.<br/>
The macro `GTStoPickledDict` does this. To load it, either place the file `macros/GTStoPickledDict.mc2`<br/>
in the macro folder of the Warp 10 platform you are sending requests to, or execute the following cell.

In [4]:
%%w -s s
<%
    # Documenting the macro
    'GTS BOOLEAN @GTStoPickledDict' DOC

    # Check there are two arguments on the stack
    <% DEPTH 2 < %> <% 'Macro takes two arguments' MSGFAIL %> IFT
        
    # Check that top is a boolean indicating whether to use GTS classname or selector
    <% 1 PICK TYPEOF 'BOOLEAN' != %> <% 'First argument must be a boolean indicating whether to use GTS selector (true) or classname (false)' MSGFAIL %> IFT
    
    # Check that second argument is a GTS
    <% 2 PICK TYPEOF 'GTS' != %> <% 'Second argument must be a GTS' MSGFAIL %> IFT
    
    # Store the arguments
    'withSelector' STORE
    'gts' STORE
    
    # Make name
    $gts <% $withSelector %> <% TOSELECTOR %> <% NAME %> IFTE
    'name' STORE
    
    # macro: check not all NaN (for locations and elevations)
    <% UNIQUE DUP SIZE 1 == SWAP 0 GET ISNaN && %> 'isAllNaN' STORE
        
    # Return pickled dict for pandas
    {
        # ticks
        'timestamps' $gts TICKLIST
        
        # locations
        $gts LOCATIONS 'lon' STORE 'lat' STORE
        <% $lat @isAllNaN ! %> <% $name '.lat' + $lat %> IFT
        <% $lon @isAllNaN ! %> <% $name '.lon' + $lon %> IFT
        
        # elevations
        $gts ELEVATIONS 'elev' STORE
        <% $elev @isAllNaN ! %> <% $name '.elev' + $elev %> IFT
        
        # values        
        $name $gts VALUES
    }
    ->PICKLE
%>
'GTStoPickledDict' STORE

top: 	<GTS with 10 values>
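
For readers less familiar with WarpScript, the logic of the macro above can be sketched in plain Python. This is only an illustration: `is_all_nan` and `gts_to_pickled_dict` are hypothetical names, their arguments stand in for what `TICKLIST`, `LOCATIONS`, `ELEVATIONS` and `VALUES` return, and the NaN test is a simplification of the `UNIQUE`-based check used in the macro.

```python
import math
import pickle


def is_all_nan(values):
    """True when every entry is NaN (simplified stand-in for isAllNaN)."""
    return all(math.isnan(v) for v in values)


def gts_to_pickled_dict(name, ticks, lats, lons, elevs, values):
    """Build the dict of lists described above and pickle it."""
    columns = {"timestamps": list(ticks)}
    # Geo columns are only emitted when they carry at least one real value.
    for suffix, data in ((".lat", lats), (".lon", lons), (".elev", elevs)):
        if not is_all_nan(data):
            columns[name + suffix] = list(data)
    columns[name] = list(values)
    # Protocol 2 matches the b'\x80\x02' header visible in the cell outputs.
    return pickle.dumps(columns, protocol=2)


payload = gts_to_pickled_dict(
    "randGTS",
    ticks=[3600000000, 7200000000],
    lats=[0.58, 0.51],
    lons=[0.70, 0.75],
    elevs=[float("nan"), float("nan")],  # no elevation: column is skipped
    values=[0.79, 0.65],
)
restored = pickle.loads(payload)
print(list(restored))
```

Because the elevations are all NaN, the restored dict has no `randGTS.elev` key, which is why the DataFrames in this notebook show no elevation column.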



We evaluate the macro on the random GTS that was left on the stack.<br/>
Passing `false` names the columns after the GTS class name only, so its labels are dropped from the pickled representation.

In [5]:
%%w -s s
false
@GTStoPickledDict

top: 	b'\x80\x02}q\x00(X\n\x00\x00\x00timestampsq\x01]q\x02(I3600000000\nI7200000000\nI10800000000\nI14400000000\nI18000000000\nI21600000000\nI25200000000\nI28800000000\nI32400000000\nI36000000000\neX\x0b\x00\x00\x00randGTS.latq\x03]q\x04(G?\xe2\xc3\x14,\x80\x00\x00G?\xe0a\xe7\x88\x00\x00\x00G?\xd3CO\xc3\x00\x00\x00G?\xea_\xd9\xbb\x00\x00\x00G?\xd4:A\x07\x00\x00\x00G?\x93h\xb0\x10\x00\x00\x00G?\xe7\x12\xed\x0e\x80\x00\x00G?\xe2\xecz`\x80\x00\x00G?\xca\xbe\xcc\xf0\x00\x00\x00G?}K[\x00\x00\x00\x00eX\x0b\x00\x00\x00randGTS.lonq\x05]q\x06(G?\xe6\xac,/\x00\x00\x00G?\xe8\x0cq\xf1\x00\x00\x00G?\xee\x1f:j\x00\x00\x00G?\xe3\x05\x14\x01\x00\x00\x00G?\xd5\xd6-\xc8\x00\x00\x00G?\xdcb\x06\xc6\x00\x00\x00G?\xe1)[j\x00\x00\x00G?\xea\xac\x01\xa1\x00\x00\x00G?\xe4Y\x97U\x00\x00\x00G?\xd8.q\x0c\x00\x00\x00eX\x07\x00\x00\x00randGTSq\x07]q\x08(G?\xe9R\xd0JzK\xa7G?\xe5\x00\xa09\x9f|\xa3G?\xe1\xa8\x16\x8b\xd9UDG?\xca\xb7q<A\tpG?\xd9\xd3\x7f\x8b\xef\xcd\xceG?\xc0\xf4\xc9\xff\x7f\xa6\\G?\xe2\xe2\x0c\xe7\xa9\x

We then load the dict from its pickled representation and create a pandas DataFrame from it.

In [6]:
gts1 = s.pop()
df1 = pd.DataFrame.from_dict(pkl.loads(gts1))
df1

Unnamed: 0,timestamps,randGTS.lat,randGTS.lon,randGTS
0,3600000000,0.586313,0.708517,0.791359
1,7200000000,0.511951,0.751519,0.656326
2,10800000000,0.300983,0.941312,0.551769
3,14400000000,0.824201,0.59437,0.208723
4,18000000000,0.316056,0.341197,0.403534
5,21600000000,0.018954,0.443483,0.13247
6,25200000000,0.72106,0.536298,0.590094
7,28800000000,0.591367,0.833497,0.235734
8,32400000000,0.208948,0.635936,0.434192
9,36000000000,0.007152,0.377835,0.29969
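
The `timestamps` column holds raw platform ticks. In this output, one hour (`1 h`) corresponds to 3600000000 ticks, which suggests a microsecond time unit. Under that assumption, a sketch of turning the column into a datetime index could look like the following; `sample` is a small stand-in for `df1`.

```python
import pandas as pd

# Stand-in for df1: two rows copied from the output above.
sample = pd.DataFrame({
    "timestamps": [3600000000, 7200000000],
    "randGTS": [0.791359, 0.656326],
})

# Assumption: the Warp 10 platform uses microseconds as its time unit.
sample["datetime"] = pd.to_datetime(sample["timestamps"], unit="us", utc=True)
indexed = sample.set_index("datetime")
print(indexed)
```

If the platform is configured with another time unit, the `unit` argument would need to change accordingly.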


In the following example, we choose to keep label information.

In [7]:
%%w -s s
NEWGTS 'randGTS' RENAME 1 10 <% h RAND RAND NaN RAND ADDVALUE %> FOR
{ 'key1' 'info1' 'key2' 'info2' } RELABEL
true
@GTStoPickledDict

top: 	b'\x80\x02}q\x00(X\n\x00\x00\x00timestampsq\x01]q\x02(I3600000000\nI7200000000\nI10800000000\nI14400000000\nI18000000000\nI21600000000\nI25200000000\nI28800000000\nI32400000000\nI36000000000\neX"\x00\x00\x00randGTS{key1=info1,key2=info2}.latq\x03]q\x04(G?\xbc\xcd\x8c\\\x00\x00\x00G?\xe4\xac\x91\x87\x80\x00\x00G?\xdaE\xff\n\x00\x00\x00G?\xc1/\xc0\xa4\x00\x00\x00G?\xd4vD\xd6\x00\x00\x00G?\xb7\x85\xc1\xec\x00\x00\x00G?\xea\xfe ]\x00\x00\x00G?\xe2\xc6c\x9d\x80\x00\x00G?\x9e/\xac`\x00\x00\x00G?\xe3\x15j\xa1\x80\x00\x00eX"\x00\x00\x00randGTS{key1=info1,key2=info2}.lonq\x05]q\x06(G?\xe4\x8f\xb1}\x00\x00\x00G?\xe4\xac\xf0\x8a\x00\x00\x00G?\xef\xbf8\xac\x00\x00\x00G?\xef\x8e\xd5\xe5\x00\x00\x00G?\xe8\x153\xae\x00\x00\x00G?\xd9\x0f`,\x00\x00\x00G?\xebh\x9fU\x00\x00\x00G?\x9de,`\x00\x00\x00G?\xe4\xfc\x19!\x00\x00\x00G?\xe9\x9bn\xfa\x00\x00\x00eX\x1e\x00\x00\x00randGTS{key1=info1,key2=info2}q\x07]q\x08(G?\xc0x\xd8\x9d\xe9\x02\xccG?\xedqUi\x1f,\x0eG?\xdb\r\xf6\x1apsvG?\xe2\xe7\xbd\xf9\x94\x8b

In [8]:
gts2 = s.pop()
df2 = pd.DataFrame.from_dict(pkl.loads(gts2))
df2

Unnamed: 0,timestamps,"randGTS{key1=info1,key2=info2}.lat","randGTS{key1=info1,key2=info2}.lon","randGTS{key1=info1,key2=info2}"
0,3600000000,0.112511,0.642541,0.128688
1,7200000000,0.646065,0.646111,0.920085
2,10800000000,0.410522,0.992092,0.422727
3,14400000000,0.13427,0.986186,0.590789
4,18000000000,0.319719,0.752588,0.384815
5,21600000000,0.091885,0.391563,0.595955
6,25200000000,0.843521,0.856521,0.638896
7,28800000000,0.586717,0.028706,0.310371
8,32400000000,0.029479,0.655774,0.955717
9,36000000000,0.596364,0.800224,0.453578
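
The boolean argument only changes how the columns are named. A minimal sketch of that naming rule follows, using a hypothetical `column_name` helper that imitates the selector format seen in the output above (`classname{key=value,...}`). The real `TOSELECTOR` function may order or escape labels differently.

```python
def column_name(classname, labels, with_selector):
    """Return the DataFrame column name for a GTS (illustrative helper)."""
    if not with_selector:
        # Class name only: labels are dropped.
        return classname
    # Assumption: labels are rendered as key=value pairs in sorted key order.
    rendered = ",".join(f"{key}={value}" for key, value in sorted(labels.items()))
    return f"{classname}{{{rendered}}}"


labels = {"key1": "info1", "key2": "info2"}
print(column_name("randGTS", labels, with_selector=False))
print(column_name("randGTS", labels, with_selector=True))
```

Keeping the selector is mainly useful when several GTS share a class name, since dict keys (and therefore column names) must be unique.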


A GTS without geo information works too: the location and elevation columns are then omitted.

In [9]:
%%w -s s
NEWGTS 'randTS' RENAME 2 11 <% h NaN NaN NaN RAND ADDVALUE %> FOR
false
@GTStoPickledDict

top: 	b'\x80\x02}q\x00(X\n\x00\x00\x00timestampsq\x01]q\x02(I7200000000\nI10800000000\nI14400000000\nI18000000000\nI21600000000\nI25200000000\nI28800000000\nI32400000000\nI36000000000\nI39600000000\neX\x06\x00\x00\x00randTSq\x03]q\x04(G?\xdb\x99(\xd5\x92`\xc8G?\xe0.!\x7f\xa4\xd5\xe6G?\xd7T\xc3>\xc0\xef\\G?\xb8\xfb\x02\x8ap\x11XG?\xd1\xe4]\xeb\x03\x8a\xe4G?\xec\xd4\x16\x934\x0b\x14G?\xddfx}k\xb7vG?\xd8t\x01\x81\x9c\xf4\xdeG?\xdeO%\x10S\x92\xfcG?\xd5\xd5A\x15i\x8a\x86eu.'



In [10]:
gts3 = s.pop()
df3 = pd.DataFrame.from_dict(pkl.loads(gts3))
df3

Unnamed: 0,timestamps,randTS
0,7200000000,0.431223
1,10800000000,0.505631
2,14400000000,0.364549
3,18000000000,0.09758
4,21600000000,0.279563
5,25200000000,0.90089
6,28800000000,0.459379
7,32400000000,0.38208
8,36000000000,0.473581
9,39600000000,0.341141


### 2. Revert a DataFrame to a GTS

To turn a DataFrame back into a GTS, we first convert the DataFrame into a dict of lists.

In [11]:
gts1b = df1.to_dict('list')
gts1b

{'timestamps': [3600000000,
  7200000000,
  10800000000,
  14400000000,
  18000000000,
  21600000000,
  25200000000,
  28800000000,
  32400000000,
  36000000000],
 'randGTS.lat': [0.5863133305683732,
  0.5119512230157852,
  0.3009833721444011,
  0.824200501665473,
  0.3160555427893996,
  0.018954039551317692,
  0.7210603030398488,
  0.5913669476285577,
  0.2089477702975273,
  0.007151942700147629],
 'randGTS.lon': [0.708517162129283,
  0.7515191752463579,
  0.9413120336830616,
  0.5943698901683092,
  0.3411974385380745,
  0.4434830602258444,
  0.5362984724342823,
  0.8334968704730272,
  0.6359364185482264,
  0.37783456966280937],
 'randGTS': [0.7913590864794643,
  0.6563264012765057,
  0.5517685634064624,
  0.20872321550457285,
  0.40353382745453537,
  0.13247036910552168,
  0.5900940441755161,
  0.23573377307033538,
  0.4341923728411008,
  0.29968951018543877]}

We can push this dict directly onto the stack, since it is automatically converted to a map on the JVM side.

In [12]:
s.push(gts1b)
s

top: 	{'randGTS.lat': [0.5863133305683732, 0.5119512230157852, 0.3009833721444011, 0.824200501665473, 0.3160555427893996, 0.018954039551317692, 0.7210603030398488, 0.5913669476285577, 0.2089477702975273, 0.007151942700147629], 'timestamps': [3600000000, 7200000000, 10800000000, 14400000000, 18000000000, 21600000000, 25200000000, 28800000000, 32400000000, 36000000000], 'randGTS': [0.7913590864794643, 0.6563264012765057, 0.5517685634064624, 0.20872321550457285, 0.40353382745453537, 0.13247036910552168, 0.5900940441755161, 0.23573377307033538, 0.4341923728411008, 0.29968951018543877], 'randGTS.lon': [0.708517162129283, 0.7515191752463579, 0.9413120336830616, 0.5943698901683092, 0.3411974385380745, 0.4434830602258444, 0.5362984724342823, 0.8334968704730272, 0.6359364185482264, 0.37783456966280937]}

Now we can use the lists contained in this map to populate a GTS.

In [13]:
%%w -s s
'dict' STORE
$dict 'timestamps' GET
$dict 'randGTS.lat' GET
$dict 'randGTS.lon' GET
[] // no elevation
$dict 'randGTS' GET
MAKEGTS 'randGTS' RENAME

top: 	<GTS with 10 values>



In [14]:
print(s.pop().toString())

randGTS{}
=3600000000/0.5863133305683732:0.708517162129283/ 0.7913590864794643
=7200000000/0.5119512230157852:0.7515191752463579/ 0.6563264012765057
=10800000000/0.3009833721444011:0.9413120336830616/ 0.5517685634064624
=14400000000/0.824200501665473:0.5943698901683092/ 0.20872321550457285
=18000000000/0.3160555427893996:0.3411974385380745/ 0.40353382745453537
=21600000000/0.018954039551317692:0.4434830602258444/ 0.13247036910552168
=25200000000/0.7210603030398488:0.5362984724342823/ 0.5900940441755161
=28800000000/0.5913669476285577:0.8334968704730272/ 0.23573377307033538
=32400000000/0.2089477702975273:0.6359364185482264/ 0.4341923728411008
=36000000000/0.007151942700147629:0.37783456966280937/ 0.29968951018543877
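
The WarpScript cell above hard-codes the column names. The same lookup can be sketched on the Python side with a hypothetical `split_gts_columns` helper. It assumes the `.lat`, `.lon` and `.elev` suffixes produced by `GTStoPickledDict` and returns the five lists in the order they are pushed before `MAKEGTS` in that cell.

```python
def split_gts_columns(columns, name):
    """Return (ticks, lats, lons, elevs, values) for the GTS called `name`.

    Missing geo columns become empty lists, like the `[]` used for the
    elevations in the WarpScript cell above.
    """
    return (
        columns["timestamps"],
        columns.get(name + ".lat", []),
        columns.get(name + ".lon", []),
        columns.get(name + ".elev", []),
        columns[name],
    )


columns = {
    "timestamps": [3600000000, 7200000000],
    "randGTS.lat": [0.586, 0.512],
    "randGTS.lon": [0.709, 0.752],
    "randGTS": [0.791, 0.656],
}
ticks, lats, lons, elevs, values = split_gts_columns(columns, "randGTS")
print(elevs)
```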



### 3. From a list of GTS to a DataFrame

We want to put every GTS of a list in the same DataFrame, with a single `timestamps` column.<br/>
Since the GTS do not all have values at the same timestamps, we need to handle missing values,<br/>
and we have to assume that each GTS has at most one value per timestamp.<br/>
It is more efficient to do this in WarpScript, as done by the macro `ListGTStoPickledDict`.

In [15]:
%%w -s s -o
<%
    # Documenting the macro
    '[GTS] BOOLEAN @ListGTStoPickledDict' DOC

    # Check there are two arguments on the stack
    <% DEPTH 2 < %> <% 'Macro takes two arguments' MSGFAIL %> IFT
        
    # Check that top is a boolean indicating whether to use GTS classname or selector
    <% 1 PICK TYPEOF 'BOOLEAN' != %> <% 'First argument must be a boolean indicating whether to use GTS selector (true) or classname (false)' MSGFAIL %> IFT
    
    # Check that second argument is a list of GTS
    <% 2 PICK TYPEOF 'LIST' != %> <% 'Second argument must be a List of GTS' MSGFAIL %> IFT
    2 PICK <% <% TYPEOF 'GTS' != %> <% 'Second argument is a list that has an element that is not a GTS' MSGFAIL %> IFT %> FOREACH
    
    # Store the arguments
    'withSelector' STORE
    'gtsList' STORE
    
    # make tickbase of all GTS
    $gtsList TICKS 'ticks' STORE
    $ticks [] [] [] $ticks MAKEGTS 'baseGTS' STORE
    
    # macro: check not all NaN (for locations and elevations)
    <% UNIQUE DUP SIZE 1 == SWAP 0 GET ISNaN && %> 'isAllNaN' STORE
        
    # Return pickled dict for pandas
    {
        # ticks
        'timestamps' $ticks
        
        # loop over list of GTS
        $gtsList
        <%
            'gts' STORE
            
            # Make name
            $gts <% $withSelector %> <% TOSELECTOR %> <% NAME %> IFTE
            'name' STORE
        
            # Put on the same tick base and fill missing values with NaN
            [ $gts true mapper.replace 0 0 0 ] MAP
            'mask' STORE
            [ $mask [ $baseGTS ] [] op.negmask ] APPLY
            [ SWAP NaN mapper.replace 0 0 0 ] MAP
            0 GET 'residualSeries' STORE
            [ $gts $residualSeries ] MERGE SORT
            'gts' STORE
        
            # locations
            $gts LOCATIONS 'lon' STORE 'lat' STORE
            <% $lat @isAllNaN ! %> <% $name '.lat' + $lat %> IFT
            <% $lon @isAllNaN ! %> <% $name '.lon' + $lon %> IFT
        
            # elevations
            $gts ELEVATIONS 'elev' STORE
            <% $elev @isAllNaN ! %> <% $name '.elev' + $elev %> IFT
        
            # values        
            $name $gts VALUES
        %>
        FOREACH
    }
    ->PICKLE
%>
'ListGTStoPickledDict' STORE

Creating a new WarpScript stack accessible under variable "s".
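
The alignment step inside the macro (a common tick base, with missing values filled with NaN) can be sketched in plain Python. This is an illustration of the idea rather than a translation of the WarpScript code; `align_on_common_ticks` is a hypothetical helper, and each series is represented as a `{tick: value}` dict.

```python
import math


def align_on_common_ticks(series_by_name):
    """Align several {tick: value} series on the union of their ticks.

    Missing values are filled with NaN. Each series is a plain dict, so it
    can hold at most one value per tick, matching the assumption made above.
    """
    ticks = sorted(set().union(*series_by_name.values()))
    columns = {"timestamps": ticks}
    for name, series in series_by_name.items():
        columns[name] = [series.get(tick, math.nan) for tick in ticks]
    return columns


aligned = align_on_common_ticks({
    "a": {1: 0.1, 2: 0.2},
    "b": {2: "x", 3: "y"},
})
print(aligned["timestamps"])
```

Every resulting list has the same length as `timestamps`, which is what pandas needs in order to build a DataFrame from the dict.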



We apply the macro `ListGTStoPickledDict` in the same way as `GTStoPickledDict`,<br/>
except that it takes a list of GTS instead of a single GTS as its second argument.

In [16]:
%%w -s s
[ NEWGTS 'randGTS' RENAME 1 10 <% h RAND RAND NaN RAND ADDVALUE %> FOR
  NEWGTS 'randTS' RENAME 2 11 <% h NaN NaN NaN RAND ADDVALUE %> FOR
  NEWGTS 'stringTS' RENAME 5 8 <% h NaN NaN NaN 'a string' ADDVALUE %> FOR ]
false
@ListGTStoPickledDict

top: 	b'\x80\x02}q\x00(X\n\x00\x00\x00timestampsq\x01]q\x02(I3600000000\nI7200000000\nI10800000000\nI14400000000\nI18000000000\nI21600000000\nI25200000000\nI28800000000\nI32400000000\nI36000000000\nI39600000000\neX\x0b\x00\x00\x00randGTS.latq\x03]q\x04(G?\xe0\xc6@\xe8\x00\x00\x00G?\xc6J\x96T\x00\x00\x00G?\xdfj]s\x00\x00\x00G?\xd2\x8f\x86\xf5\x00\x00\x00G?\x99[\xbe\x90\x00\x00\x00G?\xea\x11\xf2K\x00\x00\x00G?\xe4;G\xbb\x80\x00\x00G?\xd0\xbd\xf3\xe3\x00\x00\x00G?\xee\xbfpO\x80\x00\x00G?\xee}\x96s\x00\x00\x00G\x7f\xf8\x00\x00\x00\x00\x00\x00eX\x0b\x00\x00\x00randGTS.lonq\x05]q\x06(G?\xc8\xf94\x8c\x00\x00\x00G?\xc5\xb1\x83\x10\x00\x00\x00G?\xc7\x86q\x00\x00\x00\x00G?\xb6\xbe0(\x00\x00\x00G?\xefe\xe9\xdc\x00\x00\x00G?\xd6\x06\xc3\xb6\x00\x00\x00G?\xc4\xd9\xf2L\x00\x00\x00G?\xe1x}\x80\x00\x00\x00G?\xebmH/\x00\x00\x00G?cq?\x00\x00\x00\x00G\x7f\xf8\x00\x00\x00\x00\x00\x00eX\x07\x00\x00\x00randGTSq\x07]q\x08(G?\xebK\x8d\xce\xa0\xa9\xc8G?\xdc\xac\x1b\xe3\xe6/\\G?\xc6\x8a\x15BaCtG?\xedt:\x8dK\x97

Unlike our first example with a single GTS, the following cell will raise<br/>
an error if a GTS of the list has several values for the same timestamp.

In [17]:
listGts = s.pop()
df4 = pd.DataFrame.from_dict(pkl.loads(listGts))
df4

Unnamed: 0,timestamps,randGTS.lat,randGTS.lon,randGTS,randTS,stringTS
0,3600000000,0.524201,0.195105,0.852973,,
1,7200000000,0.174151,0.16948,0.448005,0.34749,
2,10800000000,0.490867,0.18379,0.176089,0.068454,
3,14400000000,0.29001,0.08884,0.920438,0.971243,
4,18000000000,0.024764,0.981191,0.509041,0.632721,a string
5,21600000000,0.814691,0.344163,0.201234,0.2093,a string
6,25200000000,0.632236,0.162901,0.967781,0.051159,a string
7,28800000000,0.261594,0.545958,0.464111,0.103935,a string
8,32400000000,0.960869,0.85709,0.831691,0.770611,
9,36000000000,0.952831,0.002373,0.290394,0.315191,
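
pandas raises a `ValueError` when the lists of a dict have different lengths. That is presumably what happens when a GTS carries several values for one timestamp, since its column then ends up longer than the shared `timestamps` column. A small pre-check, sketched here with a hypothetical `find_length_mismatches` helper and made-up data, can point to the offending series before the DataFrame is built.

```python
def find_length_mismatches(columns):
    """Return the names of columns whose length differs from `timestamps`."""
    expected = len(columns["timestamps"])
    return [name for name, data in columns.items() if len(data) != expected]


# Hypothetical payload: `dupGTS` holds two values for the tick 2.
columns = {
    "timestamps": [1, 2, 3],
    "okGTS": [0.1, 0.2, 0.3],
    "dupGTS": [0.1, 0.2, 0.25, 0.3],
}
print(find_length_mismatches(columns))
```

An empty result means every column matches the tick base and the dict can be passed to `pd.DataFrame.from_dict`.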
