### This notebook shows how to make a pandas DataFrame from a WarpScript GTS

### 1. From a single GTS to a DataFrame

In [1]:
%load_ext warpscript
%alias_magic w warpscript

Created `%w` as an alias for `%warpscript`.
Created `%%w` as an alias for `%%warpscript`.


We will need the pandas and pickle libraries.

In [2]:
import pandas as pd
import pickle as pkl

We first create a random GTS.

In [3]:
%%w -s s -l
NEWGTS 'randGTS' RENAME 1 10 <% h RAND RAND NaN RAND ADDVALUE %> FOR

Local gateway launched on port 37937
Creating a new WarpScript stack accessible under variable "s".


top: 	randGTS{}<DOUBLE, 10 values>

To make a GTS readable by a Python interpreter, we store its content in a map of lists and pickle it as a dict.<br/>
The macro `GTStoPickledDict` does this. To load it, you can place the file `macros/GTStoPickledDict.mc2`<br/>
in the macros folder of the Warp 10 platform you are sending requests to, or you can execute the following cell.

In [4]:
%%w -s s
<%
    # Documenting the macro
    'GTS BOOLEAN @GTStoPickledDict, more doc in macros/GTStoPickledDict.mc2' DOC

    # Check there are two arguments on the stack
    <% DEPTH 2 < %> <% 'Macro takes two arguments' MSGFAIL %> IFT
        
    # Check that top is a boolean indicating whether to use GTS classname or selector
    <% 1 PICK TYPEOF 'BOOLEAN' != %> <% 'First argument must be a boolean indicating whether to use GTS selector (true) or classname (false)' MSGFAIL %> IFT
    
    # Check that second argument is a GTS
    <% 2 PICK TYPEOF 'GTS' != %> <% 'Second argument must be a GTS' MSGFAIL %> IFT
    
    # Store the arguments
    'withSelector' STORE
    'gts' STORE
    
    # Make name
    $gts <% $withSelector %> <% TOSELECTOR %> <% NAME %> IFTE
    'name' STORE
    
    # macro: true when every element is NaN (used to skip empty location and elevation columns)
    <% UNIQUE DUP SIZE 1 == SWAP 0 GET ISNaN && %> 'isAllNaN' STORE
        
    # Return pickled dict for pandas
    {
        # ticks
        'timestamps' $gts TICKLIST
        
        # locations
        $gts LOCATIONS 'lon' STORE 'lat' STORE
        <% $lat @isAllNaN ! %> <% $name '.lat' + $lat %> IFT
        <% $lon @isAllNaN ! %> <% $name '.lon' + $lon %> IFT
        
        # elevations
        $gts ELEVATIONS 'elev' STORE
        <% $elev @isAllNaN ! %> <% $name '.elev' + $elev %> IFT
        
        # values        
        $name $gts VALUES
    }
    ->PICKLE
%>
'GTStoPickledDict' STORE

top: 	randGTS{}<DOUBLE, 10 values>
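To make the structure concrete, here is a plain-Python sketch of the kind of dict the macro builds. It is only an illustration: the helpers `is_all_nan` and `gts_to_dict` are hypothetical names, the sample numbers are made up, and the pickle comes from Python's own `pickle` module rather than from WarpScript's `->PICKLE`.

```python
import math
import pickle


def is_all_nan(values):
    """Return True when every entry is a float NaN (mirrors the intent of isAllNaN)."""
    return len(values) > 0 and all(
        isinstance(v, float) and math.isnan(v) for v in values
    )


def gts_to_dict(name, ticks, lats, lons, elevs, values):
    """Hypothetical helper: build the column dict, skipping all-NaN geo columns."""
    columns = {"timestamps": list(ticks)}
    for suffix, data in ((".lat", lats), (".lon", lons), (".elev", elevs)):
        if not is_all_nan(data):
            columns[name + suffix] = list(data)
    columns[name] = list(values)
    return columns


nan = float("nan")
columns = gts_to_dict(
    "randGTS",
    ticks=[3600000000, 7200000000],
    lats=[0.42, 0.25],
    lons=[0.71, 0.92],
    elevs=[nan, nan],  # no elevation: this column is dropped
    values=[0.80, 0.92],
)

# Protocol 2 is used here because the pickled outputs in this notebook
# start with the protocol 2 header b'\x80\x02'.
payload = pickle.dumps(columns, protocol=2)
restored = pickle.loads(payload)
print(list(restored))
```

The keys of `restored` are the column names that pandas will use, in insertion order.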

We evaluate the macro on the random GTS that was left on the stack.<br/>
Setting the first argument to `false` names the columns after the GTS classname alone, so its labels are dropped from the pickled representation.

In [5]:
%%w -s s
false
@GTStoPickledDict # use instead '@./GTStoPickledDict' if GTStoPickledDict.mc2 is in the macros folder

top: 	b'\x80\x02}q\x00(X\n\x00\x00\x00timestampsq\x01]q\x02(I3600000000\nI7200000000\nI10800000000\nI14400000000\nI18000000000\nI21600000000\nI25200000000\nI28800000000\nI32400000000\nI36000000000\neX\x0b\x00\x00\x00randGTS.latq\x03]q\x04(G?\xda\xed\xe7.\x00\x00\x00G?\xd0X|\xb1\x00\x00\x00G?\xd1C,p\x00\x00\x00G?\xd89\xabl\x00\x00\x00G?\xde"=\x91\x00\x00\x00G?\xe5Z|\xb9\x00\x00\x00G?\xe4R\x97\'\x80\x00\x00G?\xe3\xbdl\xe0\x00\x00\x00G?\xc8\x92J\xce\x00\x00\x00G?\x95\xc1q0\x00\x00\x00eX\x0b\x00\x00\x00randGTS.lonq\x05]q\x06(G?\xe6\xcf\x12D\x00\x00\x00G?\xed\x83\x19\xfa\x00\x00\x00G?\xe8\x1ds\xb1\x00\x00\x00G?\xcaU\x8c|\x00\x00\x00G?\xe18N\x86\x00\x00\x00G?\xe1e\xbd\x17\x00\x00\x00G?\xef\x81Q\xa4\x00\x00\x00G?\xe9\xb2\x82u\x00\x00\x00G?\xdc\x89U\x8c\x00\x00\x00G?\xe1"o]\x00\x00\x00eX\x07\x00\x00\x00randGTSq\x07]q\x08(G?\xe9\xdeH\x07\x0e\xe0!G?\xedz\x80\x06\xe9\x0b\xebG?\xe3\xb0\xa1\xd5[\xdbEG?\xba\x8b\x93\xec\xeb\x878G?\xeb1k\x8f\xee\xb6\x89G?\xe6tr@\x9d\xcddG?\xc2\xf5\xc2s\x8e\xd5\xf0G?\x

We then load the dict from its pickled representation and create a pandas DataFrame from it.

In [6]:
gts1 = s.pop()
df1 = pd.DataFrame.from_dict(pkl.loads(gts1))
df1

| | timestamps | randGTS.lat | randGTS.lon | randGTS |
|---|---|---|---|---|
| 0 | 3600000000 | 0.42077 | 0.712777 | 0.808384 |
| 1 | 7200000000 | 0.255401 | 0.922254 | 0.921204 |
| 2 | 10800000000 | 0.269725 | 0.753595 | 0.615312 |
| 3 | 14400000000 | 0.37852 | 0.205736 | 0.103692 |
| 4 | 18000000000 | 0.47084 | 0.538123 | 0.849783 |
| 5 | 21600000000 | 0.667296 | 0.543669 | 0.701715 |
| 6 | 25200000000 | 0.635082 | 0.984536 | 0.148125 |
| 7 | 28800000000 | 0.616873 | 0.803041 | 0.075746 |
| 8 | 32400000000 | 0.191964 | 0.445882 | 0.663464 |
| 9 | 36000000000 | 0.021246 | 0.535453 | 0.254766 |
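The `timestamps` column holds raw ticks. In this notebook `1 h` yields 3600000000, so the ticks appear to be microseconds since the Unix epoch. Under that assumption, a minimal standard-library sketch for turning a tick into a readable UTC datetime could look like this (`tick_to_datetime` is a hypothetical helper name):

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def tick_to_datetime(tick_us):
    """Convert a tick in microseconds since the Unix epoch to an aware UTC datetime."""
    return EPOCH + timedelta(microseconds=tick_us)


ticks = [3600000000, 7200000000, 10800000000]
for tick in ticks:
    print(tick, tick_to_datetime(tick).isoformat())
```

With pandas, `pd.to_datetime(df1['timestamps'], unit='us', utc=True)` should give the equivalent column. If the platform is configured with another time unit, the `unit` argument would need to change accordingly.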


In the following example, we choose to keep label information.

In [7]:
%%w -s s
NEWGTS 'randGTS' RENAME 1 10 <% h RAND RAND NaN RAND ADDVALUE %> FOR
{ 'key1' 'info1' 'key2' 'info2' } RELABEL
true
@GTStoPickledDict # use instead '@./GTStoPickledDict' if GTStoPickledDict.mc2 is in the macros folder

top: 	b'\x80\x02}q\x00(X\n\x00\x00\x00timestampsq\x01]q\x02(I3600000000\nI7200000000\nI10800000000\nI14400000000\nI18000000000\nI21600000000\nI25200000000\nI28800000000\nI32400000000\nI36000000000\neX"\x00\x00\x00randGTS{key1=info1,key2=info2}.latq\x03]q\x04(G?\xbe;\x0bP\x00\x00\x00G?\xc5lb\x1c\x00\x00\x00G?\xee\xecG\x17\x00\x00\x00G?\xaf\xac\xd1\x08\x00\x00\x00G?\xeb\xde\xe5c\x80\x00\x00G?\xe6\xdey\xa7\x80\x00\x00G?\xcf\x8a\xfe\xc0\x00\x00\x00G?\xef\xb8b\xf2\x00\x00\x00G?\xe9"$\\\x80\x00\x00G?\xe1\xed\xdc\xc1\x80\x00\x00eX"\x00\x00\x00randGTS{key1=info1,key2=info2}.lonq\x05]q\x06(G?\xdf\xd7R>\x00\x00\x00G?\xee\xfaR\xb2\x00\x00\x00G?\xeeu\x1c\x9b\x00\x00\x00G?\xda\x94\x08\x94\x00\x00\x00G?\xdb\xe0`\xd0\x00\x00\x00G?\xd91\xa1\x8c\x00\x00\x00G?\xb7p\x00x\x00\x00\x00G?\xef\x13"`\x00\x00\x00G?\xb3\x8ccH\x00\x00\x00G?\xe6\xe9>+\x00\x00\x00eX\x1e\x00\x00\x00randGTS{key1=info1,key2=info2}q\x07]q\x08(G?\xce\xa5`1Z\xf2\x10G?\xc7\x9c\xca\x1f\xa3\x9b\xe4G?\xd5\xcfPX\xa6h\xe0G?\xed?;\xd9\x96\xbb\x

In [8]:
gts2 = s.pop()
df2 = pd.DataFrame.from_dict(pkl.loads(gts2))
df2

| | timestamps | randGTS{key1=info1,key2=info2}.lat | randGTS{key1=info1,key2=info2}.lon | randGTS{key1=info1,key2=info2} |
|---|---|---|---|---|
| 0 | 3600000000 | 0.118088 | 0.497517 | 0.239422 |
| 1 | 7200000000 | 0.16737 | 0.968057 | 0.184472 |
| 2 | 10800000000 | 0.966342 | 0.951796 | 0.340778 |
| 3 | 14400000000 | 0.061865 | 0.415285 | 0.913969 |
| 4 | 18000000000 | 0.870959 | 0.43557 | 0.010452 |
| 5 | 21600000000 | 0.714658 | 0.393654 | 0.528644 |
| 6 | 25200000000 | 0.246429 | 0.091553 | 0.666447 |
| 7 | 28800000000 | 0.991258 | 0.971086 | 0.602124 |
| 8 | 32400000000 | 0.785418 | 0.076361 | 0.932695 |
| 9 | 36000000000 | 0.560286 | 0.715972 | 0.772038 |
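When selectors are used as column names, the classname and labels can be recovered on the Python side. The following is a rough sketch based only on the column names shown above. `parse_selector_column` is a hypothetical helper, and it does not handle any escaping or encoding that selectors may apply to special characters.

```python
import re

# classname, then {labels}, then an optional .lat/.lon/.elev suffix
COLUMN_RE = re.compile(
    r"^(?P<cls>[^{}]+)\{(?P<labels>[^{}]*)\}(?:\.(?P<field>lat|lon|elev))?$"
)


def parse_selector_column(column):
    """Hypothetical helper: split 'class{k=v,...}.field' into (class, labels, field)."""
    match = COLUMN_RE.match(column)
    if match is None:
        raise ValueError("not a selector-style column name: %r" % column)
    labels = {}
    if match.group("labels"):
        for pair in match.group("labels").split(","):
            key, _, value = pair.partition("=")
            labels[key] = value
    # Columns without a suffix hold the values of the series
    return match.group("cls"), labels, match.group("field") or "value"


for column in (
    "randGTS{key1=info1,key2=info2}.lat",
    "randGTS{key1=info1,key2=info2}",
):
    print(parse_selector_column(column))
```

This can be handy when reverting a DataFrame to a GTS, since the labels could then be reapplied with `RELABEL`.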


We can also leave out geo information.

In [9]:
%%w -s s
NEWGTS 'randTS' RENAME 2 11 <% h NaN NaN NaN RAND ADDVALUE %> FOR
false
@GTStoPickledDict # use instead '@./GTStoPickledDict' if GTStoPickledDict.mc2 is in the macros folder

top: 	b'\x80\x02}q\x00(X\n\x00\x00\x00timestampsq\x01]q\x02(I7200000000\nI10800000000\nI14400000000\nI18000000000\nI21600000000\nI25200000000\nI28800000000\nI32400000000\nI36000000000\nI39600000000\neX\x06\x00\x00\x00randTSq\x03]q\x04(G?\xbc\xff\x9d\xf3\x8d\xc7\xd8G?\xed\xc0[p\xf7\xf3\xddG?\xee\xbb\x1d\xed`O\x83G?\xeb\x1a\xc8\xc1f\xad\x10G?\xee\xb6-L4l\x10G?\xe7\x1ca,o\xfa<G?\xe6\x00->\xda\xce\xdeG?\xe2B\x8b\x0fR\x04\xedG?\xea\x057\x96\xb2f G?\xc6\x0c\xa7(V\t\x1ceu.'

In [10]:
gts3 = s.pop()
df3 = pd.DataFrame.from_dict(pkl.loads(gts3))
df3

| | timestamps | randTS |
|---|---|---|
| 0 | 7200000000 | 0.113275 |
| 1 | 10800000000 | 0.929731 |
| 2 | 14400000000 | 0.960341 |
| 3 | 18000000000 | 0.84702 |
| 4 | 21600000000 | 0.959738 |
| 5 | 25200000000 | 0.722214 |
| 6 | 28800000000 | 0.687522 |
| 7 | 32400000000 | 0.570623 |
| 8 | 36000000000 | 0.813137 |
| 9 | 39600000000 | 0.172261 |


### 2. Revert a DataFrame to a GTS

To revert a DataFrame to a GTS, we first need to convert the DataFrame into a dict.

In [11]:
gts1b = df1.to_dict('list')
gts1b

{'timestamps': [3600000000,
  7200000000,
  10800000000,
  14400000000,
  18000000000,
  21600000000,
  25200000000,
  28800000000,
  32400000000,
  36000000000],
 'randGTS.lat': [0.4207704495638609,
  0.2554008224979043,
  0.2697249501943588,
  0.3785198740661144,
  0.4708398738875985,
  0.6672958005219698,
  0.6350818416103721,
  0.6168732047080994,
  0.19196448381990194,
  0.021245735697448254],
 'randGTS.lon': [0.7127772644162178,
  0.9222535975277424,
  0.7535952050238848,
  0.20573574118316174,
  0.5381233803927898,
  0.5436692666262388,
  0.9845359995961189,
  0.8030407223850489,
  0.4458822123706341,
  0.5354534927755594],
 'randGTS': [0.8083839547971402,
  0.9212036261527577,
  0.6153115431942316,
  0.10369228872471303,
  0.84978273498355,
  0.7017146360434876,
  0.1481249870536625,
  0.07574645773924882,
  0.6634639925847672,
  0.25476563188485957]}

We can push this dict directly onto the stack, since it will be automatically converted in the JVM.

In [12]:
s.push(gts1b)
s

top: 	{'randGTS.lat': [0.4207704495638609, 0.2554008224979043, 0.2697249501943588, 0.3785198740661144, 0.4708398738875985, 0.6672958005219698, 0.6350818416103721, 0.6168732047080994, 0.19196448381990194, 0.021245735697448254], 'timestamps': [3600000000, 7200000000, 10800000000, 14400000000, 18000000000, 21600000000, 25200000000, 28800000000, 32400000000, 36000000000], 'randGTS': [0.8083839547971402, 0.9212036261527577, 0.6153115431942316, 0.10369228872471303, 0.84978273498355, 0.7017146360434876, 0.1481249870536625, 0.07574645773924882, 0.6634639925847672, 0.25476563188485957], 'randGTS.lon': [0.7127772644162178, 0.9222535975277424, 0.7535952050238848, 0.20573574118316174, 0.5381233803927898, 0.5436692666262388, 0.9845359995961189, 0.8030407223850489, 0.4458822123706341, 0.5354534927755594]}

Now we can use the lists contained in this map to populate a GTS.

In [13]:
%%w -s s
'dict' STORE
$dict 'timestamps' GET
$dict 'randGTS.lat' GET
$dict 'randGTS.lon' GET
[] // no elevation
$dict 'randGTS' GET
MAKEGTS 'randGTS' RENAME

top: 	randGTS{}<DOUBLE, 10 values>

In [14]:
print(s.pop().toString())

randGTS{}
=3600000000/0.4207704495638609:0.7127772644162178/ 0.8083839547971402
=7200000000/0.2554008224979043:0.9222535975277424/ 0.9212036261527577
=10800000000/0.2697249501943588:0.7535952050238848/ 0.6153115431942316
=14400000000/0.3785198740661144:0.20573574118316174/ 0.10369228872471303
=18000000000/0.4708398738875985:0.5381233803927898/ 0.84978273498355
=21600000000/0.6672958005219698:0.5436692666262388/ 0.7017146360434876
=25200000000/0.6350818416103721:0.9845359995961189/ 0.1481249870536625
=28800000000/0.6168732047080994:0.8030407223850489/ 0.07574645773924882
=32400000000/0.19196448381990194:0.4458822123706341/ 0.6634639925847672
=36000000000/0.021245735697448254:0.5354534927755594/ 0.25476563188485957
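The cell that calls `MAKEGTS` hard-codes the column names. For a dict with several series, the grouping could be prepared on the Python side first. Below is a minimal sketch, assuming the column naming used in this notebook (`name`, `name.lat`, `name.lon`, `name.elev`). `split_series` is a hypothetical helper, and a classname that itself ends in `.lat`, `.lon` or `.elev` would be misclassified.

```python
GEO_SUFFIXES = (".lat", ".lon", ".elev")


def split_series(columns):
    """Hypothetical helper: group a {column: list} dict by series name.

    Geo lists that are absent from the input become empty lists, which is
    what the MAKEGTS call in this notebook passes for the missing elevation.
    """
    ticks = columns["timestamps"]
    names = [
        c for c in columns
        if c != "timestamps" and not c.endswith(GEO_SUFFIXES)
    ]
    series = {}
    for name in names:
        series[name] = {
            "ticks": ticks,
            "lat": columns.get(name + ".lat", []),
            "lon": columns.get(name + ".lon", []),
            "elev": columns.get(name + ".elev", []),
            "values": columns[name],
        }
    return series


# Made-up sample data in the same shape as df.to_dict('list')
columns = {
    "timestamps": [3600000000, 7200000000],
    "randGTS.lat": [0.42, 0.25],
    "randGTS.lon": [0.71, 0.92],
    "randGTS": [0.80, 0.92],
    "randTS": [0.11, 0.93],
}
for name, parts in split_series(columns).items():
    print(name, sorted(key for key, value in parts.items() if value))
```

Each entry then carries the five lists, in the order `MAKEGTS` expects them: ticks, latitudes, longitudes, elevations, values.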



### 3. From a list of GTS to a DataFrame

We want to put every GTS of a list in the same DataFrame, with a single `timestamps` column.<br/>
Since the GTS do not all have values at the same timestamps, we need to handle missing values,<br/>
and we must assume that each GTS has at most one value per timestamp.<br/>
It is more efficient to do this in WarpScript, as done by the macro `ListGTStoPickledDict`.

If there are many unaligned ticks, consider converting to a list of single-column DataFrames or Series instead.

In [15]:
%%w -s s -o -l
<%
    # Documenting the macro
    '[GTS] BOOLEAN @ListGTStoPickledDict , more doc in macros/ListGTStoPickledDict.mc2' DOC

    # Check there are two arguments on the stack
    <% DEPTH 2 < %> <% 'Macro takes two arguments' MSGFAIL %> IFT
        
    # Check that top is a boolean indicating whether to use GTS classname or selector
    <% 1 PICK TYPEOF 'BOOLEAN' != %> <% 'First argument must be a boolean indicating whether to use GTS selector (true) or classname (false)' MSGFAIL %> IFT
    
    # Check that second argument is a list of GTS
    <% 2 PICK TYPEOF 'LIST' != %> <% 'Second argument must be a List of GTS' MSGFAIL %> IFT
    2 PICK <% <% TYPEOF 'GTS' != %> <% 'Second argument is a list that has an element that is not a GTS' MSGFAIL %> IFT %> FOREACH
    
    # Store the arguments
    'withSelector' STORE
    'gtsList' STORE
    
    # make tickbase of all GTS
    $gtsList TICKS 'ticks' STORE
    $ticks [] [] [] $ticks MAKEGTS 'baseGTS' STORE
    
    # macro: true when every element is NaN (used to skip empty location and elevation columns)
    <% UNIQUE DUP SIZE 1 == SWAP 0 GET ISNaN && %> 'isAllNaN' STORE
        
    # Return pickled dict for pandas
    {
        # ticks
        'timestamps' $ticks
        
        # loop over list of GTS
        $gtsList
        <%
            'gts' STORE
            
            # Make name
            $gts <% $withSelector %> <% TOSELECTOR %> <% NAME %> IFTE
            'name' STORE
        
            # Put on the same tick base and fill missing values with NaN
            [ $gts true mapper.replace 0 0 0 ] MAP
            'mask' STORE
            [ $mask [ $baseGTS ] [] op.negmask ] APPLY
            [ SWAP NaN mapper.replace 0 0 0 ] MAP
            0 GET 'residualSeries' STORE
            [ $gts $residualSeries ] MERGE SORT
            'gts' STORE
        
            # locations
            $gts LOCATIONS 'lon' STORE 'lat' STORE
            <% $lat @isAllNaN ! %> <% $name '.lat' + $lat %> IFT
            <% $lon @isAllNaN ! %> <% $name '.lon' + $lon %> IFT
        
            # elevations
            $gts ELEVATIONS 'elev' STORE
            <% $elev @isAllNaN ! %> <% $name '.elev' + $elev %> IFT
        
            # values        
            $name $gts VALUES
        %>
        FOREACH
    }
    ->PICKLE
%>
'ListGTStoPickledDict' STORE

Creating a new WarpScript stack accessible under variable "s".
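For intuition, the alignment step of the macro (one shared tick base, missing points filled with NaN) can be mimicked in plain Python. This is only a sketch of the idea, not a translation of the WarpScript code. `align_series` is a hypothetical helper and the sample values are made up. Storing each series as a `{tick: value}` dict builds in the assumption of at most one value per timestamp.

```python
import math


def align_series(series_by_name):
    """Hypothetical helper: put several {tick: value} series on one sorted tick base.

    Missing points are filled with NaN so that every column has the same length.
    """
    ticks = sorted(set().union(*(s.keys() for s in series_by_name.values())))
    columns = {"timestamps": ticks}
    for name, points in series_by_name.items():
        columns[name] = [points.get(t, math.nan) for t in ticks]
    return columns


H = 3600000000  # one hour in microseconds, as produced by `h` in this notebook
series = {
    "randGTS": {1 * H: 0.1, 2 * H: 0.2},
    "randTS": {2 * H: 0.5, 3 * H: 0.6},
}
columns = align_series(series)
for name, values in columns.items():
    print(name, values)
```

Every column ends up with one entry per tick of the union, which is the shape pandas needs to build a single DataFrame.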




We apply the macro `ListGTStoPickledDict` in the same way as `GTStoPickledDict`,<br/>
except that it takes a list of GTS instead of a single GTS as its second argument.

In [16]:
%%w -s s
[ NEWGTS 'randGTS' RENAME 1 10 <% h RAND RAND NaN RAND ADDVALUE %> FOR
  NEWGTS 'randTS' RENAME 2 11 <% h NaN NaN NaN RAND ADDVALUE %> FOR
  NEWGTS 'stringTS' RENAME 5 8 <% h NaN NaN NaN 'a string' ADDVALUE %> FOR ]
false
@ListGTStoPickledDict  # use instead '@./ListGTStoPickledDict' if ListGTStoPickledDict.mc2 is in the macros folder

top: 	b'\x80\x02}q\x00(X\n\x00\x00\x00timestampsq\x01]q\x02(I3600000000\nI7200000000\nI10800000000\nI14400000000\nI18000000000\nI21600000000\nI25200000000\nI28800000000\nI32400000000\nI36000000000\nI39600000000\neX\x0b\x00\x00\x00randGTS.latq\x03]q\x04(G?\xedy\xb8\xa1\x00\x00\x00G?\xe3\xce\x1d\xda\x80\x00\x00G?\xe6\x7f\x03\x94\x00\x00\x00G?\xe3Kt\x00\x00\x00\x00G?\xbd\n\xe6\x10\x00\x00\x00G?\xe2\x91GI\x80\x00\x00G?\xe2\xc9\xe3\x0b\x80\x00\x00G?\xd1\xf5\xe4A\x00\x00\x00G?\xe0o\xd4\xba\x00\x00\x00G?\xcbZ*\x9c\x00\x00\x00G\x7f\xf8\x00\x00\x00\x00\x00\x00eX\x0b\x00\x00\x00randGTS.lonq\x05]q\x06(G?\xc2\xfb\xe0\\\x00\x00\x00G?\xeba\x1bh\x00\x00\x00G?\xe3Yoo\x00\x00\x00G?\xee\n\x1f\x9a\x00\x00\x00G?\xe7\x1e\xf2\x8c\x00\x00\x00G?\xef\xe2\x98\x0b\x00\x00\x00G?\xce\xd2\x1b\xe4\x00\x00\x00G?\xec\x8bYJ\x00\x00\x00G?\xee\xf2\x9b\xcb\x00\x00\x00G?\xee\xd4f\x1f\x00\x00\x00G\x7f\xf8\x00\x00\x00\x00\x00\x00eX\x07\x00\x00\x00randGTSq\x07]q\x08(G?\xec\x94t\xa8\x80\x1a\xfaG?\xee \xfcT\xcc\x0f\xb6G?\xe2\x8

Unlike our first example with a single GTS, the following cell raises<br/>
an error if a GTS in the list has several values at the same timestamp.

In [17]:
listGts = s.pop()
df4 = pd.DataFrame.from_dict(pkl.loads(listGts))
df4

| | timestamps | randGTS.lat | randGTS.lon | randGTS | randTS | stringTS |
|---|---|---|---|---|---|---|
| 0 | 3600000000 | 0.921109 | 0.148312 | 0.893122 | | |
| 1 | 7200000000 | 0.618911 | 0.855604 | 0.941527 | 0.38926 | |
| 2 | 10800000000 | 0.703005 | 0.604667 | 0.57987 | 0.377414 | |
| 3 | 14400000000 | 0.602961 | 0.938736 | 0.314582 | 0.23306 | |
| 4 | 18000000000 | 0.113448 | 0.722528 | 0.924263 | 0.319086 | a string |
| 5 | 21600000000 | 0.580234 | 0.99641 | 0.048028 | 0.943076 | a string |
| 6 | 25200000000 | 0.587144 | 0.240787 | 0.459515 | 0.380149 | a string |
| 7 | 28800000000 | 0.280633 | 0.89201 | 0.00856 | 0.98866 | a string |
| 8 | 32400000000 | 0.513651 | 0.967115 | 0.005018 | 0.648168 | |
| 9 | 36000000000 | 0.213689 | 0.963428 | 0.536075 | 0.755925 | |
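A quick check on the unpickled dict can give a clearer message than the pandas error. The sketch below assumes that the failure comes from columns of unequal length, which is what would happen if a GTS with several values at one tick produced a value list longer than `timestamps`. `check_column_lengths` is a hypothetical helper.

```python
def check_column_lengths(columns):
    """Hypothetical helper: raise if a column does not match the length of 'timestamps'."""
    expected = len(columns["timestamps"])
    bad = {
        name: len(values)
        for name, values in columns.items()
        if len(values) != expected
    }
    if bad:
        raise ValueError(
            "columns with a length different from %d: %r" % (expected, bad)
        )
    return expected


# Made-up example: 'dupTS' has one value too many for the tick base
columns = {
    "timestamps": [3600000000, 7200000000],
    "randTS": [0.1, 0.2],
    "dupTS": [0.3, 0.4, 0.5],
}
try:
    check_column_lengths(columns)
except ValueError as error:
    print(error)
```

In the notebook, it could be called on `pkl.loads(listGts)` before the dict is handed to `pd.DataFrame.from_dict`.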
