### This notebook shows how to make a Pandas DataFrame from WarpScript GTS

### 1. From a single GTS to a DataFrame

In [1]:
%load_ext warpscript
%alias_magic w warpscript

Created `%w` as an alias for `%warpscript`.
Created `%%w` as an alias for `%%warpscript`.


We will need the pandas and pickle libraries.

In [2]:
import pandas as pd
import pickle as pkl

We first create a random GTS.

In [3]:
%%w -s s -l
NEWGTS 'randGTS' RENAME 1 10 <% h RAND RAND NaN RAND ADDVALUE %> FOR

Local gateway launched on port 34613
Creating a new WarpScript stack accessible under variable "s".


top: 	randGTS{}<DOUBLE, 10 values>

For a Python interpreter to understand a GTS, we store its content in a map of lists and pickle it as a dict.<br/>
The macro `GTStoPickledDict` does this. To load it, you can either place the file `macros/GTStoPickledDict.mc2`<br/>
in the macros folder of the Warp 10 platform you are sending requests to, or execute the following cell.

In [4]:
%%w -s s
<%
    # Documenting the macro
    'GTS BOOLEAN @GTStoPickledDict, more doc in macros/GTStoPickledDict.mc2' DOC

    # Check that there are two arguments on the stack
    <% DEPTH 2 < %> <% 'Macro takes two arguments' MSGFAIL %> IFT
        
    # Check that top is a boolean indicating whether to use GTS classname or selector
    <% 1 PICK TYPEOF 'BOOLEAN' != %> <% 'First argument must be a boolean indicating whether to use GTS selector (true) or classname (false)' MSGFAIL %> IFT
    
    # Check that second argument is a GTS
    <% 2 PICK TYPEOF 'GTS' != %> <% 'Second argument must be a GTS' MSGFAIL %> IFT
    
    # Store the arguments
    'withSelector' STORE
    'gts' STORE
    
    # Make name
    $gts <% $withSelector %> <% TOSELECTOR %> <% NAME %> IFTE
    'name' STORE
    
    # macro: check whether a list is all NaN (used to skip empty locations and elevations)
    <% UNIQUE DUP SIZE 1 == SWAP 0 GET ISNaN && %> 'isAllNaN' STORE
        
    # Return pickled dict for pandas
    {
        # ticks
        'timestamps' $gts TICKLIST
        
        # locations
        $gts LOCATIONS 'lon' STORE 'lat' STORE
        <% $lat @isAllNaN ! %> <% $name '.lat' + $lat %> IFT
        <% $lon @isAllNaN ! %> <% $name '.lon' + $lon %> IFT
        
        # elevations
        $gts ELEVATIONS 'elev' STORE
        <% $elev @isAllNaN ! %> <% $name '.elev' + $elev %> IFT
        
        # values        
        $name $gts VALUES
    }
    ->PICKLE
%>
'GTStoPickledDict' STORE

top: 	randGTS{}<DOUBLE, 10 values>

We evaluate the macro on the random GTS that was left on the stack.<br/>
Setting the first argument to `false` names the columns after the classname only, so the labels are dropped from the pickled representation.

In [5]:
%%w -s s
false
@GTStoPickledDict # use instead '@./GTStoPickledDict' if GTStoPickledDict.mc2 is in the macros folder

top: 	b'\x80\x02}q\x00(X\n\x00\x00\x00timestampsq\x01]q\x02(I3600000000\nI7200000000\nI10800000000\nI14400000000\nI18000000000\nI21600000000\nI25200000000\nI28800000000\nI32400000000\nI36000000000\neX\x0b\x00\x00\x00randGTS.latq\x03]q\x04(G?\xee\xc8\xb1\xd7\x80\x00\x00G?\xc7\xbb\x1f\xfe\x00\x00\x00G?\xda\x1e\x18\n\x00\x00\x00G?\xc6\x1aq\x9a\x00\x00\x00G?ru8@\x00\x00\x00G?\xd1K0\xaf\x00\x00\x00G?\xdfa;5\x00\x00\x00G?\xde\x00\xffK\x00\x00\x00G?\xd0\xb4\xe2\x85\x00\x00\x00G?\xef\xfcwd\x80\x00\x00eX\x0b\x00\x00\x00randGTS.lonq\x05]q\x06(G?\xd1\xff\xd5\n\x00\x00\x00G?\x9e\x04\t \x00\x00\x00G?\xe5\x86;8\x00\x00\x00G?\xe3\xd52?\x00\x00\x00G?\xc33\xa9\x88\x00\x00\x00G?\xe6l\xf2?\x00\x00\x00G?\xdf\x17\xd6z\x00\x00\x00G?\xe7\x92\xf8{\x00\x00\x00G?\xcc\xa8zH\x00\x00\x00G?\xd0\x9at,\x00\x00\x00eX\x07\x00\x00\x00randGTSq\x07]q\x08(G?\xed+*\xff\xe7\\bG?\xed$#\xf4\xc6\x19\x98G?\xe7mp\xe6C\rXG?\xc4\xac\xa0\x1b<\x0eDG?\xeb\x7f\x87\x8d\x9e\xde\xe9G?\xe4\x01\xdd\xb7JnNG?\xbc\x95\xccGG\r\xd8G?\xd1\xd2n/\x

We then load the dict from its pickled representation and create a pandas DataFrame from it.

In [6]:
gts1 = s.pop()
df1 = pd.DataFrame.from_dict(pkl.loads(gts1))
df1

Unnamed: 0,timestamps,randGTS.lat,randGTS.lon,randGTS
0,3600000000,0.961999,0.28124,0.91152
1,7200000000,0.185398,0.029312,0.910662
2,10800000000,0.408087,0.672636,0.732109
3,14400000000,0.172682,0.619775,0.161518
4,18000000000,0.004506,0.150014,0.859318
5,21600000000,0.270214,0.700799,0.625228
6,25200000000,0.49031,0.48583,0.111661
7,28800000000,0.468811,0.736691,0.278469
8,32400000000,0.26104,0.223892,0.83035
9,36000000000,0.999569,0.259427,0.879311
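
To see what the macro hands to Python without a Warp 10 instance, the round trip can be sketched with the standard library alone. The dict below is hand-written sample data shaped like the macro's output (a `timestamps` list plus one list per column). Protocol 2 is an assumption, chosen only because the byte strings above start with `\x80\x02`.

```python
import pickle

# Hypothetical sample data shaped like the macro's output:
# one 'timestamps' list plus one list per column, all the same length.
sample = {
    "timestamps": [3600000000, 7200000000, 10800000000],
    "randGTS": [0.25, 0.5, 0.75],
}

# Protocol 2 is an assumption based on the '\x80\x02' prefix shown above.
payload = pickle.dumps(sample, protocol=2)

# This is the same step as pkl.loads(gts1) in the cell above.
restored = pickle.loads(payload)

print(payload[:2])       # protocol header bytes
print(sorted(restored))  # column names
```

The restored object is a plain dict of equal-length lists, which is the shape `pd.DataFrame.from_dict` expects by default (one column per key).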


In the following example, we choose to keep label information.

In [7]:
%%w -s s
NEWGTS 'randGTS' RENAME 1 10 <% h RAND RAND NaN RAND ADDVALUE %> FOR
{ 'key1' 'info1' 'key2' 'info2' } RELABEL
true
@GTStoPickledDict # use instead '@./GTStoPickledDict' if GTStoPickledDict.mc2 is in the macros folder

top: 	b'\x80\x02}q\x00(X\n\x00\x00\x00timestampsq\x01]q\x02(I3600000000\nI7200000000\nI10800000000\nI14400000000\nI18000000000\nI21600000000\nI25200000000\nI28800000000\nI32400000000\nI36000000000\neX"\x00\x00\x00randGTS{key1=info1,key2=info2}.latq\x03]q\x04(G?\xc7j\x18\xc0\x00\x00\x00G?\xb5\xf6UD\x00\x00\x00G?\xeb\x13\x1eo\x80\x00\x00G?\xa0IH\xb8\x00\x00\x00G?\xed\xc2\x89\xb4\x00\x00\x00G?\xda\xcfd\xb1\x00\x00\x00G?\xa6\x87(X\x00\x00\x00G?\xcf\xe1[\x1c\x00\x00\x00G?\xe9R\xdc\xeb\x80\x00\x00G?\xbf)\xf0l\x00\x00\x00eX"\x00\x00\x00randGTS{key1=info1,key2=info2}.lonq\x05]q\x06(G?\xca\xaf\xb4\x90\x00\x00\x00G?\xd8\xb1\xa8~\x00\x00\x00G?\xe5\xf9oE\x00\x00\x00G?\xeb\x0f\xd4[\x00\x00\x00G?\xc7*\x0fd\x00\x00\x00G?\xd4\x15\xceb\x00\x00\x00G?\xder\x92d\x00\x00\x00G?\xe4z\xce\x0c\x00\x00\x00G?\xe1\x81\xc1$\x00\x00\x00G?\xc1\'\xbf\xbc\x00\x00\x00eX\x1e\x00\x00\x00randGTS{key1=info1,key2=info2}q\x07]q\x08(G?\xe9\xeb\xd2c(\x10\x14G?\xba\xfd\xf5Z\x01\xa6\x80G?\xd5\x04\x16\x03\x83\xff\x06G?\xeb\xb4Z]L

In [8]:
gts2 = s.pop()
df2 = pd.DataFrame.from_dict(pkl.loads(gts2))
df2

Unnamed: 0,timestamps,"randGTS{key1=info1,key2=info2}.lat","randGTS{key1=info1,key2=info2}.lon","randGTS{key1=info1,key2=info2}"
0,3600000000,0.182925,0.208487,0.810037
1,7200000000,0.08579,0.385843,0.105438
2,10800000000,0.846084,0.686699,0.328374
3,14400000000,0.031809,0.845682,0.865766
4,18000000000,0.929997,0.180971,0.928074
5,21600000000,0.418908,0.313831,0.06087
6,25200000000,0.044,0.475743,0.406871
7,28800000000,0.249065,0.639991,0.335248
8,32400000000,0.791365,0.547089,0.190725
9,36000000000,0.121734,0.134026,0.088288


We can also omit geo information: location and elevation columns that contain only NaN are left out of the dict.

In [9]:
%%w -s s
NEWGTS 'randTS' RENAME 2 11 <% h NaN NaN NaN RAND ADDVALUE %> FOR
false
@GTStoPickledDict # use instead '@./GTStoPickledDict' if GTStoPickledDict.mc2 is in the macros folder

top: 	b'\x80\x02}q\x00(X\n\x00\x00\x00timestampsq\x01]q\x02(I7200000000\nI10800000000\nI14400000000\nI18000000000\nI21600000000\nI25200000000\nI28800000000\nI32400000000\nI36000000000\nI39600000000\neX\x06\x00\x00\x00randTSq\x03]q\x04(G?\xcdVJf\xb9\xa8\x14G?\xd1\xf7\xc4\n{\xb0>G?\xc6T\x12Q\xd0\xe7\x94G?\xe2\xd1\xa6\xd0&\xb5\x0cG?\xe0u\xcfV\xce\xd7\xf1G?\xe5fL"\x96]\x17G?\xcb\xe0[\xea\xe4\xa1\xc0G?\x9a\xe7\xe0:=\xf2@G?\xdf\xa4\'{\xa5r>G?\xa6\x96\xc1\xe1\x10s0eu.'

In [10]:
gts3 = s.pop()
df3 = pd.DataFrame.from_dict(pkl.loads(gts3))
df3

Unnamed: 0,timestamps,randTS
0,7200000000,0.229196
1,10800000000,0.280747
2,14400000000,0.174441
3,18000000000,0.588092
4,21600000000,0.514381
5,25200000000,0.668737
6,28800000000,0.217784
7,32400000000,0.026275
8,36000000000,0.494394
9,39600000000,0.044119
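
The absence of `.lat`, `.lon` and `.elev` columns in this last DataFrame comes from the `isAllNaN` check in the macro. A rough Python analogue of that check is sketched below. The names `is_all_nan` and `optional_column` are hypothetical, and treating an empty list as "not all NaN" is a choice made here rather than something taken from the macro.

```python
import math

def is_all_nan(values):
    """Return True when the list is non-empty and every element is NaN."""
    return len(values) > 0 and all(
        isinstance(v, float) and math.isnan(v) for v in values
    )

def optional_column(name, values):
    """Return a one-entry dict for the column, or an empty dict if it is all NaN."""
    return {} if is_all_nan(values) else {name: values}

nan = float("nan")
columns = {"timestamps": [1, 2, 3]}
columns.update(optional_column("randTS.lat", [nan, nan, nan]))  # dropped
columns.update(optional_column("randTS", [0.1, 0.2, 0.3]))      # kept
print(list(columns))
```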


### 2. Convert a DataFrame back to a GTS

To convert a DataFrame back to a GTS, we first need to turn the DataFrame into a dict of lists.

In [11]:
gts1b = df1.to_dict('list')
gts1b

{'timestamps': [3600000000,
  7200000000,
  10800000000,
  14400000000,
  18000000000,
  21600000000,
  25200000000,
  28800000000,
  32400000000,
  36000000000],
 'randGTS.lat': [0.9619988640770316,
  0.18539810087531805,
  0.4080867860466242,
  0.17268199939280748,
  0.00450632069259882,
  0.27021424379199743,
  0.49030952621251345,
  0.468810866586864,
  0.2610403345897794,
  0.9995686495676637],
 'randGTS.lon': [0.2812397573143244,
  0.029312269762158394,
  0.6726356595754623,
  0.6197749357670546,
  0.15001410618424416,
  0.7007991056889296,
  0.48582994379103184,
  0.7366907503455877,
  0.22389153018593788,
  0.25942711159586906],
 'randGTS': [0.9115195272560295,
  0.9106616764773205,
  0.7321094987204715,
  0.1615181096059236,
  0.8593175664927603,
  0.6252277927300811,
  0.11166073551344569,
  0.2784686533646078,
  0.8303498407121223,
  0.8793111206485495]}

We can push this dict directly onto the stack, since it will be automatically converted in the JVM.

In [12]:
s.push(gts1b)
s

top: 	{'randGTS.lat': [0.9619988640770316, 0.18539810087531805, 0.4080867860466242, 0.17268199939280748, 0.00450632069259882, 0.27021424379199743, 0.49030952621251345, 0.468810866586864, 0.2610403345897794, 0.9995686495676637], 'timestamps': [3600000000, 7200000000, 10800000000, 14400000000, 18000000000, 21600000000, 25200000000, 28800000000, 32400000000, 36000000000], 'randGTS': [0.9115195272560295, 0.9106616764773205, 0.7321094987204715, 0.1615181096059236, 0.8593175664927603, 0.6252277927300811, 0.11166073551344569, 0.2784686533646078, 0.8303498407121223, 0.8793111206485495], 'randGTS.lon': [0.2812397573143244, 0.029312269762158394, 0.6726356595754623, 0.6197749357670546, 0.15001410618424416, 0.7007991056889296, 0.48582994379103184, 0.7366907503455877, 0.22389153018593788, 0.25942711159586906]}

Now we can use the lists contained in this map to populate a GTS.

In [13]:
%%w -s s
'dict' STORE
$dict 'timestamps' GET
$dict 'randGTS.lat' GET
$dict 'randGTS.lon' GET
[] // no elevation
$dict 'randGTS' GET
MAKEGTS 'randGTS' RENAME

top: 	randGTS{}<DOUBLE, 10 values>

In [14]:
print(s.pop().toString())

randGTS{}
=3600000000/0.9619988640770316:0.2812397573143244/ 0.9115195272560295
=7200000000/0.18539810087531805:0.029312269762158394/ 0.9106616764773205
=10800000000/0.4080867860466242:0.6726356595754623/ 0.7321094987204715
=14400000000/0.17268199939280748:0.6197749357670546/ 0.1615181096059236
=18000000000/0.00450632069259882:0.15001410618424416/ 0.8593175664927603
=21600000000/0.27021424379199743:0.7007991056889296/ 0.6252277927300811
=25200000000/0.49030952621251345:0.48582994379103184/ 0.11166073551344569
=28800000000/0.468810866586864:0.7366907503455877/ 0.2784686533646078
=32400000000/0.2610403345897794:0.22389153018593788/ 0.8303498407121223
=36000000000/0.9995686495676637:0.25942711159586906/ 0.8793111206485495
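
To make the mapping between the DataFrame columns and the printed GTS explicit, here is a small sketch that formats a dict of lists into lines resembling the text above. `gts_lines` is a hypothetical helper, not part of any Warp 10 library. It assumes a GTS without labels or elevations and only mirrors the layout printed above.

```python
def gts_lines(name, columns):
    """Format a dict of lists as text resembling the printed GTS above."""
    lines = [name + "{}"]
    rows = zip(
        columns["timestamps"],
        columns[name + ".lat"],
        columns[name + ".lon"],
        columns[name],
    )
    for ts, lat, lon, value in rows:
        # No elevation, so nothing follows the second slash.
        lines.append(f"={ts}/{lat}:{lon}/ {value}")
    return lines

# Hand-written sample data with the same keys as the dict pushed above.
sample = {
    "timestamps": [3600000000, 7200000000],
    "randGTS.lat": [0.5, 0.25],
    "randGTS.lon": [0.125, 0.75],
    "randGTS": [1.5, 2.5],
}
print("\n".join(gts_lines("randGTS", sample)))
```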



### 3. From a list of GTS to a DataFrame

We want to put every GTS of a list in the same DataFrame, with a single `timestamps` column.<br/>
Since the GTS do not all have values at the same timestamps, we need to handle missing values,<br/>
and we need to assume that each GTS has at most one value per timestamp.<br/>
It is more efficient to do this in WarpScript, as done by the macro `ListGTStoPickledDict`.

If there are many unaligned ticks, consider converting each GTS to its own single-column DataFrame or Series instead.
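
The alignment described above can be illustrated in plain Python. This is only a sketch of the idea (union of ticks, NaN where a series has no value), not a translation of the macro. `align` is a hypothetical name, and representing each series as a `{tick: value}` dict builds in the assumption of at most one value per timestamp.

```python
import math

def align(series_by_name):
    """Align {name: {tick: value}} onto the sorted union of all ticks."""
    ticks = sorted(set().union(*(s.keys() for s in series_by_name.values())))
    table = {"timestamps": ticks}
    for name, series in series_by_name.items():
        # Ticks absent from this series are filled with NaN.
        table[name] = [series.get(t, math.nan) for t in ticks]
    return table

table = align({
    "a": {1: 0.1, 2: 0.2},
    "b": {2: 20.0, 3: 30.0},
})
print(table["timestamps"])
```

Every list in the result has the same length as `timestamps`, which is what allows the dict to be loaded as a single DataFrame.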

In [15]:
%%w -s s -o -l
<%
    # Documenting the macro
    '[GTS] BOOLEAN @ListGTStoPickledDict , more doc in macros/ListGTStoPickledDict.mc2' DOC

    # Check that there are two arguments on the stack
    <% DEPTH 2 < %> <% 'Macro takes two arguments' MSGFAIL %> IFT
        
    # Check that top is a boolean indicating whether to use GTS classname or selector
    <% 1 PICK TYPEOF 'BOOLEAN' != %> <% 'First argument must be a boolean indicating whether to use GTS selector (true) or classname (false)' MSGFAIL %> IFT
    
    # Check that second argument is a list of GTS
    <% 2 PICK TYPEOF 'LIST' != %> <% 'Second argument must be a List of GTS' MSGFAIL %> IFT
    2 PICK <% <% TYPEOF 'GTS' != %> <% 'Second argument is a list that has an element that is not a GTS' MSGFAIL %> IFT %> FOREACH
    
    # Store the arguments
    'withSelector' STORE
    'gtsList' STORE
    
    # make tickbase of all GTS
    $gtsList TICKS 'ticks' STORE
    $ticks [] [] [] $ticks MAKEGTS 'baseGTS' STORE
    
    # macro: check whether a list is all NaN (used to skip empty locations and elevations)
    <% UNIQUE DUP SIZE 1 == SWAP 0 GET ISNaN && %> 'isAllNaN' STORE
        
    # Return pickled dict for pandas
    {
        # ticks
        'timestamps' $ticks
        
        # loop over list of GTS
        $gtsList
        <%
            'gts' STORE
            
            # Make name
            $gts <% $withSelector %> <% TOSELECTOR %> <% NAME %> IFTE
            'name' STORE
        
            # Put on the same tick base and fill missing values with NaN
            [ $gts true mapper.replace 0 0 0 ] MAP
            'mask' STORE
            [ $mask [ $baseGTS ] [] op.negmask ] APPLY
            [ SWAP NaN mapper.replace 0 0 0 ] MAP
            0 GET 'residualSeries' STORE
            [ $gts $residualSeries ] MERGE SORT
            'gts' STORE
        
            # locations
            $gts LOCATIONS 'lon' STORE 'lat' STORE
            <% $lat @isAllNaN ! %> <% $name '.lat' + $lat %> IFT
            <% $lon @isAllNaN ! %> <% $name '.lon' + $lon %> IFT
        
            # elevations
            $gts ELEVATIONS 'elev' STORE
            <% $elev @isAllNaN ! %> <% $name '.elev' + $elev %> IFT
        
            # values        
            $name $gts VALUES
        %>
        FOREACH
    }
    ->PICKLE
%>
'ListGTStoPickledDict' STORE

Creating a new WarpScript stack accessible under variable "s".




We apply the macro `ListGTStoPickledDict` in the same way as `GTStoPickledDict`,<br/>
except that it takes a list of GTS instead of a single GTS as its second argument.

In [16]:
%%w -s s
[ NEWGTS 'randGTS' RENAME 1 10 <% h RAND RAND NaN RAND ADDVALUE %> FOR
  NEWGTS 'randTS' RENAME 2 11 <% h NaN NaN NaN RAND ADDVALUE %> FOR
  NEWGTS 'stringTS' RENAME 5 8 <% h NaN NaN NaN 'a string' ADDVALUE %> FOR ]
false
@ListGTStoPickledDict  # use instead '@./ListGTStoPickledDict' if ListGTStoPickledDict.mc2 is in the macros folder

top: 	b'\x80\x02}q\x00(X\n\x00\x00\x00timestampsq\x01]q\x02(I3600000000\nI7200000000\nI10800000000\nI14400000000\nI18000000000\nI21600000000\nI25200000000\nI28800000000\nI32400000000\nI36000000000\nI39600000000\neX\x0b\x00\x00\x00randGTS.latq\x03]q\x04(G?\xe1\xf2\xc1_\x80\x00\x00G?\xa2\xfc\xa5\xf0\x00\x00\x00G?r\xb0\xa2@\x00\x00\x00G?\x82~\xc1\x80\x00\x00\x00G?\xb6\x11\xd3\x8c\x00\x00\x00G?\xe6\xe7\xc2N\x00\x00\x00G?\xe2l\xd4a\x00\x00\x00G?\x83\xc8\xbd\xc0\x00\x00\x00G?\xe914\t\x00\x00\x00G?\xe7~\x9b\x83\x00\x00\x00G\x7f\xf8\x00\x00\x00\x00\x00\x00eX\x0b\x00\x00\x00randGTS.lonq\x05]q\x06(G?\xce\t\xc8\x10\x00\x00\x00G?\xd0\xb4\x8c\x90\x00\x00\x00G?\xe6\xc2\x81\xff\x00\x00\x00G?\xcew\xc8\xec\x00\x00\x00G?\xecs\xa2\x82\x00\x00\x00G?\xeaeB\x89\x00\x00\x00G?\xccJ\xda(\x00\x00\x00G?\xcc\xdf\xa3$\x00\x00\x00G?\xe5#\xba+\x00\x00\x00G?\xde\xce\x9fF\x00\x00\x00G\x7f\xf8\x00\x00\x00\x00\x00\x00eX\x07\x00\x00\x00randGTSq\x07]q\x08(G?\xe5\xfd\r\xb7\xc3\xe5\xf3G?\xd8\xf4\xef\\%\xe6\x9aG?\xc4\x92\x1a

Unlike our first example with a single GTS, the following cell will raise<br/>
an error if a GTS in the list has several values at the same timestamp.

In [17]:
listGts = s.pop()
df4 = pd.DataFrame.from_dict(pkl.loads(listGts))
df4

Unnamed: 0,timestamps,randGTS.lat,randGTS.lon,randGTS,randTS,stringTS
0,3600000000,0.560883,0.234674,0.68714,,
1,7200000000,0.037084,0.26102,0.38995,0.768348,
2,10800000000,0.004563,0.711244,0.160709,0.962148,
3,14400000000,0.009031,0.238031,0.335029,0.084742,
4,18000000000,0.08621,0.889116,0.626282,0.911916,a string
5,21600000000,0.715791,0.824861,0.644833,0.995838,a string
6,25200000000,0.575785,0.221034,0.53761,0.502281,a string
7,28800000000,0.00966,0.225575,0.452887,0.363706,a string
8,32400000000,0.787256,0.660611,0.876669,0.656574,
9,36000000000,0.734205,0.481361,0.27409,0.504505,
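
The error mentioned above presumably arises because a GTS with several values at one timestamp yields a column longer than `timestamps`. This is an inference from the macro, not something the notebook states. A cheap pre-check on the unpickled dict could look like the sketch below, where `check_columns` is a hypothetical helper.

```python
def check_columns(columns):
    """Raise ValueError if the lists in a dict of columns differ in length."""
    lengths = {name: len(values) for name, values in columns.items()}
    if len(set(lengths.values())) > 1:
        raise ValueError(f"Column lengths differ: {lengths}")
    return columns

ok = check_columns({"timestamps": [1, 2], "a": [0.1, 0.2]})

try:
    # 'a' has two values at one tick, so it is longer than 'timestamps'.
    check_columns({"timestamps": [1, 2], "a": [0.1, 0.15, 0.2]})
    failed = False
except ValueError:
    failed = True
print(failed)
```

Running such a check before building the DataFrame gives an error message that names the offending column lengths.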
