### This notebook shows how to make a Pandas DataFrame from WarpScript GTS

### 1. From a single GTS to a DataFrame

In [1]:
%load_ext warpscript_cellmagic
%alias_magic w warpscript

Created `%%w` as an alias for `%%warpscript`.


We will need the pandas and pickle libraries.

In [2]:
import pandas as pd
import pickle as pkl

We first create a random GTS.

In [3]:
%%w -s s
NEWGTS 'randGTS' RENAME 1 10 <% h RAND RAND NaN RAND ADDVALUE %> FOR

Starting connection with 127.0.0.1:25333.
Creating a new WarpScript stack accessible under variable "s".
top: 	<GTS with 10 values>



To make a GTS understandable by a Python interpreter, we store its content in a map of lists and pickle it as a dict.<br/>
The macro `GTStoPickledDict` does this. To load it, you can either place the file `macros/GTStoPickledDict.mc2`<br/>
in the macro folder of the Warp 10 platform you are sending requests to, or execute the following cell.
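As a rough illustration of the structure the macro produces, here is a minimal pure-Python sketch. The function `gts_to_pickled_dict` and the name `demoGTS` are hypothetical and only mimic the layout: a `timestamps` list, optional `.lat`, `.lon` and `.elev` lists that are skipped when they contain only NaN, and a list of values keyed by the series name. Pickle protocol 2 is used because the byte strings shown in this notebook start with `\x80\x02`.

```python
import math
import pickle


def gts_to_pickled_dict(name, ticks, lats, lons, elevs, values):
    """Hypothetical Python analogue of the macro's output layout."""

    def all_nan(xs):
        # True when every entry is a float NaN (analogue of the isAllNaN macro)
        return all(isinstance(x, float) and math.isnan(x) for x in xs)

    columns = {"timestamps": list(ticks)}
    # Location and elevation columns are only kept when they carry data
    for suffix, series in ((".lat", lats), (".lon", lons), (".elev", elevs)):
        if not all_nan(series):
            columns[name + suffix] = list(series)
    columns[name] = list(values)
    return pickle.dumps(columns, protocol=2)


nan = float("nan")
payload = gts_to_pickled_dict(
    "demoGTS",
    ticks=[3600000000, 7200000000],
    lats=[0.5, 0.6],
    lons=[0.1, 0.2],
    elevs=[nan, nan],  # no elevation, so no 'demoGTS.elev' key
    values=[1.0, 2.0],
)
restored = pickle.loads(payload)
print(list(restored))
```

The restored object is a plain dict of equally sized lists, which is the shape `pd.DataFrame.from_dict` expects.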

In [4]:
%%w -s s
<%
    # Documenting the macro
    'GTS BOOLEAN @GTStoPickledDict' DOC

    # Check there are two arguments on the stack
    <% DEPTH 2 < %> <% 'Macro takes two arguments' MSGFAIL %> IFT
        
    # Check that top is a boolean indicating whether to use GTS classname or selector
    <% 1 PICK TYPEOF 'BOOLEAN' != %> <% 'First argument must be a boolean indicating whether to use GTS selector (true) or classname (false)' MSGFAIL %> IFT
    
    # Check that second argument is a GTS
    <% 2 PICK TYPEOF 'GTS' != %> <% 'Second argument must be a GTS' MSGFAIL %> IFT
    
    # Store the arguments
    'withSelector' STORE
    'gts' STORE
    
    # Make name
    $gts <% $withSelector %> <% TOSELECTOR %> <% NAME %> IFTE
    'name' STORE
    
    # macro: check not all NaN (for locations and elevations)
    <% UNIQUE DUP SIZE 1 == SWAP 0 GET ISNaN && %> 'isAllNaN' STORE
        
    # Return pickled dict for pandas
    {
        # ticks
        'timestamps' $gts TICKLIST
        
        # locations
        $gts LOCATIONS 'lon' STORE 'lat' STORE
        <% $lat @isAllNaN ! %> <% $name '.lat' + $lat %> IFT
        <% $lon @isAllNaN ! %> <% $name '.lon' + $lon %> IFT
        
        # elevations
        $gts ELEVATIONS 'elev' STORE
        <% $elev @isAllNaN ! %> <% $name '.elev' + $elev %> IFT
        
        # values        
        $name $gts VALUES
    }
    ->PICKLE
%>
'GTStoPickledDict' STORE

top: 	<GTS with 10 values>



We evaluate the macro on the random GTS that was left on the stack.<br/>
Setting the first argument to false names the columns after the GTS classname only, so its labels are dropped from the pickled representation.

In [5]:
%%w -s s
false
@GTStoPickledDict

top: 	b'\x80\x02}q\x00(X\n\x00\x00\x00timestampsq\x01]q\x02(I3600000000\nI7200000000\nI10800000000\nI14400000000\nI18000000000\nI21600000000\nI25200000000\nI28800000000\nI32400000000\nI36000000000\neX\x0b\x00\x00\x00randGTS.latq\x03]q\x04(G?\xe6\x94\x9cm\x00\x00\x00G?\xe3\x0e\xb1\xe8\x80\x00\x00G?\xd0aXE\x00\x00\x00G?\xefMz\x11\x80\x00\x00G?\xc4:`\xd8\x00\x00\x00G?\xeeoC\t\x80\x00\x00G?\xe0\x01F\xfa\x80\x00\x00G?\xd9t\xf42\x00\x00\x00G?\xec\x93\x14%\x80\x00\x00G?\xe6\xe2i\x7f\x00\x00\x00eX\x0b\x00\x00\x00randGTS.lonq\x05]q\x06(G?\xe6i\xf8\xf0\x00\x00\x00G?\xd2;\xcb\xaa\x00\x00\x00G?\xeb\xb8[\xee\x00\x00\x00G?\xe0\xa4\x10h\x00\x00\x00G?\xdd\xaa]\xda\x00\x00\x00G?\xce\r\xc8\x84\x00\x00\x00G?\xb5\r\xb7\x80\x00\x00\x00G?\xe8\xd9y+\x00\x00\x00G?\xe6\xb4\x1f\xbb\x00\x00\x00G?\xe6\xfeO\xd7\x00\x00\x00eX\x07\x00\x00\x00randGTSq\x07]q\x08(G?\xea\xf1y\x9e\xb2g\xf8G?\xc1x\xaf\xb0\xe9{4G?\xe0\xe9\x12\xc7\x8b\x1c\x06G?\xed\xbb\x04m*>\xe0G?\xe4>b\xcd<NrG?\xdc\x8a:\x88~\xdd\x86G?\xd4<\x8b\xbd\xbe\xf6

We then load the dict from its pickled representation and create a pandas dataframe with it.

In [6]:
gts1 = s.pop()
df1 = pd.DataFrame.from_dict(pkl.loads(gts1))
df1

Unnamed: 0,timestamps,randGTS.lat,randGTS.lon,randGTS
0,3600000000,0.705641,0.700436,0.841977
1,7200000000,0.595544,0.2849,0.136496
2,10800000000,0.255941,0.866255,0.528451
3,14400000000,0.978208,0.520027,0.929079
4,18000000000,0.158032,0.463523,0.632615
5,21600000000,0.951082,0.234796,0.445937
6,25200000000,0.500156,0.082241,0.316195
7,28800000000,0.397763,0.776547,0.669269
8,32400000000,0.892954,0.709488,0.896741
9,36000000000,0.715138,0.718544,0.72252
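The `timestamps` column holds raw integer ticks. A minimal sketch of turning them into a datetime index follows, assuming the platform uses microsecond time units (consistent with one hour being `3600000000` ticks above, but still an assumption about your Warp 10 configuration). The frame and the column name `demoGTS` are made up for illustration.

```python
import pandas as pd

# Small stand-in for a DataFrame obtained from a pickled GTS
df = pd.DataFrame({
    "timestamps": [3600000000, 7200000000, 10800000000],
    "demoGTS": [0.84, 0.14, 0.53],
})

# Interpret the ticks as microseconds since the Unix epoch (assumed time unit)
df["time"] = pd.to_datetime(df["timestamps"], unit="us", utc=True)
indexed = df.set_index("time")
print(indexed)
```

If the platform is configured with millisecond or nanosecond time units, `unit` would need to be `"ms"` or `"ns"` instead.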


In the following example, we choose to keep label information.

In [7]:
%%w -s s
NEWGTS 'randGTS' RENAME 1 10 <% h RAND RAND NaN RAND ADDVALUE %> FOR
{ 'key1' 'info1' 'key2' 'info2' } RELABEL
true
@GTStoPickledDict

top: 	b'\x80\x02}q\x00(X\n\x00\x00\x00timestampsq\x01]q\x02(I3600000000\nI7200000000\nI10800000000\nI14400000000\nI18000000000\nI21600000000\nI25200000000\nI28800000000\nI32400000000\nI36000000000\neX"\x00\x00\x00randGTS{key1=info1,key2=info2}.latq\x03]q\x04(G?\xa8y\xd0\xe8\x00\x00\x00G?\xa9`F\xb8\x00\x00\x00G?\xeb!}\x19\x00\x00\x00G?\xe6l\xdd\x10\x80\x00\x00G?\xdb\x98\x92\xaa\x00\x00\x00G?\xd8[\x85\x16\x00\x00\x00G?\xc1\x17\x97\xf4\x00\x00\x00G?\xef\xa1\xe7\x13\x80\x00\x00G?\xdb\xfcw\xd9\x00\x00\x00G?\xe1\x99\xf9\xe9\x80\x00\x00eX"\x00\x00\x00randGTS{key1=info1,key2=info2}.lonq\x05]q\x06(G?\xde\xf2c^\x00\x00\x00G?\xd3}yP\x00\x00\x00G?\x8a\xc22\x00\x00\x00\x00G?\xea&\xae\x89\x00\x00\x00G?\xd7\x0ec\x0e\x00\x00\x00G?\xe5d\xf0\xd1\x00\x00\x00G?\xed\xb0\xbd\x1a\x00\x00\x00G?\xe3\xf3\xa0\xdf\x00\x00\x00G?\xd8F\xb4t\x00\x00\x00G?\xc3A\xb1\x18\x00\x00\x00eX\x1e\x00\x00\x00randGTS{key1=info1,key2=info2}q\x07]q\x08(G?\xd91\xd6$%g\xf8G?\xe9K\xe2lfL\x8dG?\xeb-\xc6\xa3\xe4c\xd0G?\xdf\x03\xa2\xda.\

In [8]:
gts2 = s.pop()
df2 = pd.DataFrame.from_dict(pkl.loads(gts2))
df2

Unnamed: 0,timestamps,"randGTS{key1=info1,key2=info2}.lat","randGTS{key1=info1,key2=info2}.lon","randGTS{key1=info1,key2=info2}"
0,3600000000,0.047804,0.483544,0.393667
1,7200000000,0.049563,0.304533,0.790513
2,10800000000,0.847838,0.013066,0.849338
3,14400000000,0.700789,0.817222,0.484597
4,18000000000,0.431187,0.360253,0.420333
5,21600000000,0.380586,0.668572,0.266557
6,25200000000,0.133533,0.927825,0.398617
7,28800000000,0.988514,0.62349,0.161993
8,32400000000,0.437284,0.379315,0.32006
9,36000000000,0.550046,0.150442,0.238042


We can also omit geo information.

In [9]:
%%w -s s
NEWGTS 'randTS' RENAME 2 11 <% h NaN NaN NaN RAND ADDVALUE %> FOR
false
@GTStoPickledDict

top: 	b'\x80\x02}q\x00(X\n\x00\x00\x00timestampsq\x01]q\x02(I7200000000\nI10800000000\nI14400000000\nI18000000000\nI21600000000\nI25200000000\nI28800000000\nI32400000000\nI36000000000\nI39600000000\neX\x06\x00\x00\x00randTSq\x03]q\x04(G?\xea\xe8\xad\\\xbe\xdb7G?\xe5?\xf0\xa0\x8a\x19\xf8G?\xd6\xa15\xa9\x1e\xcc`G?\xde\x16\xba\x11,l\x00G?\xc9\xa1\xb2$\x8e/\xa4G?\xe2\x06\x93\x13O\x89\x02G?\xdek \x8e\xd4\xf7\xacG?\xe5@\xf0\xd7\x07\xa2zG?\xb5M\x0f\x08\xe8\x87\xd8G?\xdf\x19\x07\x04\n\xc8\xaceu.'



In [10]:
gts3 = s.pop()
df3 = pd.DataFrame.from_dict(pkl.loads(gts3))
df3

Unnamed: 0,timestamps,randTS
0,7200000000,0.840903
1,10800000000,0.664055
2,14400000000,0.353589
3,18000000000,0.470137
4,21600000000,0.200247
5,25200000000,0.563303
6,28800000000,0.475289
7,32400000000,0.664177
8,36000000000,0.083207
9,39600000000,0.485903


### 2. Revert a DataFrame to a GTS

To revert a DataFrame to a GTS, we first need to convert the DataFrame into a dict.

In [11]:
gts1b = df1.to_dict('list')
gts1b

{'timestamps': [3600000000,
  7200000000,
  10800000000,
  14400000000,
  18000000000,
  21600000000,
  25200000000,
  28800000000,
  32400000000,
  36000000000],
 'randGTS.lat': [0.705640995875001,
  0.5955438176169991,
  0.2559414552524686,
  0.9782076207920909,
  0.15803156420588493,
  0.9510817704722285,
  0.5001559155061841,
  0.3977632988244295,
  0.8929539425298572,
  0.7151381950825453],
 'randGTS.lon': [0.7004360854625702,
  0.2848996315151453,
  0.866254772990942,
  0.5200273543596268,
  0.4635233525186777,
  0.23479563184082508,
  0.08224055171012878,
  0.776547035202384,
  0.7094877865165472,
  0.7185439299792051],
 'randGTS': [0.8419769382046516,
  0.1364955533817579,
  0.5284513375598869,
  0.9290792591218313,
  0.6326154716975465,
  0.4459368069614914,
  0.3161954262721772,
  0.6692693813708607,
  0.8967405642467337,
  0.7225203948170561]}
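The `'list'` orientation matters here. The short sketch below, on a made-up two-row frame, compares it with the default orientation of `to_dict`, which nests the values under their row index instead of producing flat lists.

```python
import pandas as pd

df = pd.DataFrame({
    "timestamps": [3600000000, 7200000000],
    "demoGTS": [0.25, 0.75],
})

# Default orientation: {column: {row index: value}}
by_index = df.to_dict()

# 'list' orientation: {column: [values]}, one flat list per column
by_list = df.to_dict(orient="list")

print(by_index)
print(by_list)
```

Flat lists are convenient because each one can be fetched with `GET` and handed over as a whole on the WarpScript side.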

We can push this dict directly onto the stack, since it will be automatically converted in the JVM.

In [12]:
s.push(gts1b)
s

top: 	{'randGTS.lat': [0.705640995875001, 0.5955438176169991, 0.2559414552524686, 0.9782076207920909, 0.15803156420588493, 0.9510817704722285, 0.5001559155061841, 0.3977632988244295, 0.8929539425298572, 0.7151381950825453], 'timestamps': [3600000000, 7200000000, 10800000000, 14400000000, 18000000000, 21600000000, 25200000000, 28800000000, 32400000000, 36000000000], 'randGTS': [0.8419769382046516, 0.1364955533817579, 0.5284513375598869, 0.9290792591218313, 0.6326154716975465, 0.4459368069614914, 0.3161954262721772, 0.6692693813708607, 0.8967405642467337, 0.7225203948170561], 'randGTS.lon': [0.7004360854625702, 0.2848996315151453, 0.866254772990942, 0.5200273543596268, 0.4635233525186777, 0.23479563184082508, 0.08224055171012878, 0.776547035202384, 0.7094877865165472, 0.7185439299792051]}

Now we can use the lists contained in this map to populate a GTS.

In [13]:
%%w -s s
'dict' STORE
$dict 'timestamps' GET
$dict 'randGTS.lat' GET
$dict 'randGTS.lon' GET
[] // no elevation
$dict 'randGTS' GET
MAKEGTS 'randGTS' RENAME

top: 	<GTS with 10 values>



In [14]:
print(s.pop().toString())

randGTS{}
=3600000000/0.705640995875001:0.7004360854625702/ 0.8419769382046516
=7200000000/0.5955438176169991:0.2848996315151453/ 0.1364955533817579
=10800000000/0.2559414552524686:0.866254772990942/ 0.5284513375598869
=14400000000/0.9782076207920909:0.5200273543596268/ 0.9290792591218313
=18000000000/0.15803156420588493:0.4635233525186777/ 0.6326154716975465
=21600000000/0.9510817704722285:0.23479563184082508/ 0.4459368069614914
=25200000000/0.5001559155061841:0.08224055171012878/ 0.3161954262721772
=28800000000/0.3977632988244295:0.776547035202384/ 0.6692693813708607
=32400000000/0.8929539425298572:0.7094877865165472/ 0.8967405642467337
=36000000000/0.7151381950825453:0.7185439299792051/ 0.7225203948170561



### 3. From a list of GTS to a DataFrame

We want to put every GTS of a list in the same DataFrame, with a single `timestamps` column.<br/>
Since the GTS do not all have values for the same timestamps, we need to handle missing values,<br/>
and we need to assume that each GTS has at most one value per timestamp.<br/>
It is more efficient to do this in WarpScript, as done by the macro `ListGTStoPickledDict`.
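For comparison, the same alignment can be sketched on the pandas side with an outer merge on `timestamps`. This is only an illustration of the target result, not what the macro does internally; the series names `seriesA` and `seriesB` are made up, and `HOUR` assumes microsecond time units.

```python
from functools import reduce

import pandas as pd

HOUR = 3_600_000_000  # one hour, assuming microsecond time units

# One small frame per series, each with its own set of timestamps
frames = [
    pd.DataFrame({"timestamps": [1 * HOUR, 2 * HOUR, 3 * HOUR],
                  "seriesA": [0.1, 0.2, 0.3]}),
    pd.DataFrame({"timestamps": [2 * HOUR, 3 * HOUR, 4 * HOUR],
                  "seriesB": [1.0, 2.0, 3.0]}),
]

# The outer merge keeps every timestamp and fills the gaps with NaN
merged = reduce(
    lambda left, right: pd.merge(left, right, on="timestamps", how="outer"),
    frames,
).sort_values("timestamps").reset_index(drop=True)

print(merged)
```

Doing this in Python means one merge per series in the list, which is part of why aligning everything on a common tick base before pickling is attractive.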

In [15]:
%%w -s s -o
<%
    # Documenting the macro
    '[GTS] BOOLEAN @ListGTStoPickledDict' DOC

    # Check there are two arguments on the stack
    <% DEPTH 2 < %> <% 'Macro takes two arguments' MSGFAIL %> IFT
        
    # Check that top is a boolean indicating whether to use GTS classname or selector
    <% 1 PICK TYPEOF 'BOOLEAN' != %> <% 'First argument must be a boolean indicating whether to use GTS selector (true) or classname (false)' MSGFAIL %> IFT
    
    # Check that second argument is a list of GTS
    <% 2 PICK TYPEOF 'LIST' != %> <% 'Second argument must be a List of GTS' MSGFAIL %> IFT
    2 PICK <% <% TYPEOF 'GTS' != %> <% 'Second argument is a list that has an element that is not a GTS' MSGFAIL %> IFT %> FOREACH
    
    # Store the arguments
    'withSelector' STORE
    'gtsList' STORE
    
    # make tickbase of all GTS
    $gtsList TICKS 'ticks' STORE
    $ticks [] [] [] $ticks MAKEGTS 'baseGTS' STORE
    
    # macro: check not all NaN (for locations and elevations)
    <% UNIQUE DUP SIZE 1 == SWAP 0 GET ISNaN && %> 'isAllNaN' STORE
        
    # Return pickled dict for pandas
    {
        # ticks
        'timestamps' $ticks
        
        # loop over list of GTS
        $gtsList
        <%
            'gts' STORE
            
            # Make name
            $gts <% $withSelector %> <% TOSELECTOR %> <% NAME %> IFTE
            'name' STORE
        
            # Put on the same tick base and fill missing values with NaN
            [ $gts true mapper.replace 0 0 0 ] MAP
            'mask' STORE
            [ $mask [ $baseGTS ] [] op.negmask ] APPLY
            [ SWAP NaN mapper.replace 0 0 0 ] MAP
            0 GET 'residualSeries' STORE
            [ $gts $residualSeries ] MERGE SORT
            'gts' STORE
        
            # locations
            $gts LOCATIONS 'lon' STORE 'lat' STORE
            <% $lat @isAllNaN ! %> <% $name '.lat' + $lat %> IFT
            <% $lon @isAllNaN ! %> <% $name '.lon' + $lon %> IFT
        
            # elevations
            $gts ELEVATIONS 'elev' STORE
            <% $elev @isAllNaN ! %> <% $name '.elev' + $elev %> IFT
        
            # values        
            $name $gts VALUES
        %>
        FOREACH
    }
    ->PICKLE
%>
'ListGTStoPickledDict' STORE

Creating a new WarpScript stack accessible under variable "s".



We apply the macro `ListGTStoPickledDict` in the same way as `GTStoPickledDict`,<br/>
except that it takes a list of GTS instead of a single GTS as its second argument.

In [16]:
%%w -s s
[ NEWGTS 'randGTS' RENAME 1 10 <% h RAND RAND NaN RAND ADDVALUE %> FOR
  NEWGTS 'randTS' RENAME 2 11 <% h NaN NaN NaN RAND ADDVALUE %> FOR
  NEWGTS 'stringTS' RENAME 5 8 <% h NaN NaN NaN 'a string' ADDVALUE %> FOR ]
false
@ListGTStoPickledDict

top: 	b'\x80\x02}q\x00(X\n\x00\x00\x00timestampsq\x01]q\x02(I3600000000\nI7200000000\nI10800000000\nI14400000000\nI18000000000\nI21600000000\nI25200000000\nI28800000000\nI32400000000\nI36000000000\nI39600000000\neX\x0b\x00\x00\x00randGTS.latq\x03]q\x04(G?\xd2 c\xe2\x00\x00\x00G?\xe1\xde{\xf5\x80\x00\x00G?\xe1\x13\x1a\x9b\x80\x00\x00G?\xeaD~\xde\x80\x00\x00G?\xd8\xf0\x07\x0e\x00\x00\x00G?\xec/5\x8e\x00\x00\x00G?\xe7\xf5\xda\xd3\x80\x00\x00G?\xbb\xc8\x82\xa8\x00\x00\x00G?\xe5C\xf2\x16\x80\x00\x00G?\xe1Sr]\x80\x00\x00G\x7f\xf8\x00\x00\x00\x00\x00\x00eX\x0b\x00\x00\x00randGTS.lonq\x05]q\x06(G?\x9b\xb6\xe2\x00\x00\x00\x00G?\xebL\x93\x8c\x00\x00\x00G?\xbe\xe5)x\x00\x00\x00G?\x9d%0`\x00\x00\x00G?\x91M\xfe\xc0\x00\x00\x00G?\xd3\x97\xf6@\x00\x00\x00G?\xa8g\x84\xb0\x00\x00\x00G?\xe5I\xbe\x1f\x00\x00\x00G?\xef9\x1e>\x00\x00\x00G?\xe8ww\xe9\x00\x00\x00G\x7f\xf8\x00\x00\x00\x00\x00\x00eX\x07\x00\x00\x00randGTSq\x07]q\x08(G?\xd8\xb1\x0bsg:@G?\xe0\x9fAkD\x1e\xe2G?\xe3(\xee\x14\xf3\x95\xb8G?\xdd\xe5\x

Unlike our first example with a single GTS, the following cell will raise<br/>
an error if a GTS in the list has a timestamp with multiple values.

In [17]:
listGts = s.pop()
df4 = pd.DataFrame.from_dict(pkl.loads(listGts))
df4

Unnamed: 0,timestamps,randGTS.lat,randGTS.lon,randGTS,randTS,stringTS
0,3600000000,0.283227,0.027065,0.385806,,
1,7200000000,0.558409,0.853098,0.51944,0.582836,
2,10800000000,0.533582,0.120684,0.598746,0.507609,
3,14400000000,0.820861,0.028462,0.467147,0.513203,
4,18000000000,0.38965,0.016899,0.824058,0.279753,a string
5,21600000000,0.880763,0.30615,0.409675,0.196021,a string
6,25200000000,0.748762,0.047665,0.093829,0.752237,a string
7,28800000000,0.108528,0.665252,0.027807,0.366883,a string
8,32400000000,0.664544,0.975722,0.74569,0.785369,
9,36000000000,0.541436,0.764584,0.752973,0.787244,
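The error mentioned before cell 17 presumably comes from pandas rather than from WarpScript: if a series holds two values for one timestamp, its list ends up longer than the shared `timestamps` list. The sketch below reproduces that situation with a hand-written dict; the column name `dupTS` is hypothetical.

```python
import pandas as pd

# Three ticks, but four values: one tick carries two values
columns = {
    "timestamps": [3600000000, 7200000000, 10800000000],
    "dupTS": [0.1, 0.2, 0.25, 0.3],
}

try:
    pd.DataFrame.from_dict(columns)
    error = None
except ValueError as exc:
    # pandas refuses columns of unequal length
    error = exc

print(type(error).__name__, error)
```

Removing the duplicate datapoints before calling the macro avoids the mismatch.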
