### This notebook shows how to make a Pandas DataFrame from WarpScript GTS

### 1. From a single GTS to a DataFrame

In [1]:
%load_ext warpscript_cellmagic
%alias_magic w warpscript

Created `%%w` as an alias for `%%warpscript`.


We will need the pandas and pickle libraries.

In [2]:
import pandas as pd
import pickle as pkl

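We first create a random GTS with ten hourly ticks, each carrying a random latitude, longitude and value, and no elevation (NaN).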

In [3]:
%%w -s s
NEWGTS 'randGTS' RENAME 1 10 <% h RAND RAND NaN RAND ADDVALUE %> FOR

Starting connection with 127.0.0.1:25333.
Creating a new WarpScript stack accessible under variable "s".
top: 	<GTS with 10 values>



To make a GTS readable from the Python interpreter, we store its content in a map and pickle that map, which Python then loads as a dict.<br/>
The following macro does this.

In [4]:
%%w -s s
<%
    # Documenting the macro
    'GTS BOOLEAN @GTStoPickledDict' DOC

    # Check that there are two arguments on the stack
    <% DEPTH 2 < %> <% 'Macro takes two arguments' MSGFAIL %> IFT
        
    # Check that top is a boolean indicating whether to use GTS classname or selector
    <% 1 PICK TYPEOF 'BOOLEAN' != %> <% 'First argument must be a boolean indicating whether to use GTS selector (true) or classname (false)' MSGFAIL %> IFT
    
    # Check that second argument is a GTS
    <% 2 PICK TYPEOF 'GTS' != %> <% 'Second argument must be a GTS' MSGFAIL %> IFT
    
    # Store the arguments
    'withSelector' STORE
    'gts' STORE
    
    # Make name
    $gts <% $withSelector %> <% TOSELECTOR %> <% NAME %> IFTE
    'name' STORE
    
    # macro: check not all NaN (for locations and elevations)
    <% UNIQUE DUP SIZE 1 == SWAP 0 GET ISNaN && %> 'isAllNaN' STORE
        
    # Return pickled dict for pandas
    {
        # ticks
        'timestamps' $gts TICKS
        
        # locations
        $gts LOCATIONS 'lon' STORE 'lat' STORE
        <% $lat @isAllNaN ! %> <% $name '.lat' + $lat %> IFT
        <% $lon @isAllNaN ! %> <% $name '.lon' + $lon %> IFT
        
        # elevations
        $gts ELEVATIONS 'elev' STORE
        <% $elev @isAllNaN ! %> <% $name '.elev' + $elev %> IFT
        
        # values        
        $name $gts VALUES
    }
    ->PICKLE
%>
'GTStoPickledDict' STORE

top: 	<GTS with 10 values>
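To make the layout produced by the macro easier to picture, here is a minimal pure-Python sketch of the same idea: one `timestamps` column, optional `.lat`, `.lon` and `.elev` columns that are skipped when they contain only NaN, and one value column named after the GTS. The function names `is_all_nan` and `gts_to_pickled_dict` are hypothetical and exist only for this illustration; pickle protocol 2 is chosen because the bytes shown in this notebook start with `\x80\x02`.

```python
import math
import pickle


def is_all_nan(values):
    """Return True when a non-empty list contains only float NaN."""
    return len(values) > 0 and all(
        isinstance(v, float) and math.isnan(v) for v in values
    )


def gts_to_pickled_dict(name, ticks, lats, lons, elevs, values):
    """Hypothetical Python counterpart of the @GTStoPickledDict macro."""
    columns = {"timestamps": list(ticks)}
    # Geo columns are only kept when they carry at least one real value.
    for suffix, column in ((".lat", lats), (".lon", lons), (".elev", elevs)):
        if not is_all_nan(column):
            columns[name + suffix] = list(column)
    columns[name] = list(values)
    return pickle.dumps(columns, protocol=2)


nan = float("nan")
payload = gts_to_pickled_dict(
    "randGTS",
    ticks=[3600000000, 7200000000],
    lats=[0.40, 0.79],      # illustrative values
    lons=[0.88, 0.16],      # illustrative values
    elevs=[nan, nan],       # no elevation, so no '.elev' column
    values=[0.20, 0.34],    # illustrative values
)
decoded = pickle.loads(payload)
print(list(decoded))
```

Because the elevations are all NaN, the decoded dict has no `randGTS.elev` key, which is the same behaviour the macro implements with `@isAllNaN`.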



We evaluate the macro on the random GTS that was left on the stack.<br/>
Passing false as the boolean argument names the columns after the GTS class name only, so the labels are left out of the pickled representation.

In [5]:
%%w -s s
false
@GTStoPickledDict

top: 	b'\x80\x02}q\x00(X\n\x00\x00\x00timestampsq\x01]q\x02(I3600000000\nI7200000000\nI10800000000\nI14400000000\nI18000000000\nI21600000000\nI25200000000\nI28800000000\nI32400000000\nI36000000000\neX\x0b\x00\x00\x00randGTS.latq\x03]q\x04(G?\xd9\xe9N\x81\x00\x00\x00G?\xe9z\xd1\xe5\x00\x00\x00G?\xb3\x9d\x83\xf8\x00\x00\x00G?\xed\xe8\x9e\x88\x00\x00\x00G?\xd5\xbc J\x00\x00\x00G?\x99X{@\x00\x00\x00G?\xd7\x0e}\xf3\x00\x00\x00G?\xe63_m\x80\x00\x00G?\xbc\x83\xfb\xdc\x00\x00\x00G?\xcd\xd7\x90\xd4\x00\x00\x00eX\x0b\x00\x00\x00randGTS.lonq\x05]q\x06(G?\xec2\xc2\x85\x00\x00\x00G?\xc5\x19=\xd0\x00\x00\x00G?\xc7\xf5*\xc8\x00\x00\x00G?\xab\xe8\xd3\xa0\x00\x00\x00G?\xdc\xcd\xb62\x00\x00\x00G?\xe8\x0fb\xfd\x00\x00\x00G?\xe5\xce&2\x00\x00\x00G?\xe2\x1e\xfa\xa7\x00\x00\x00G?\xdd\x14\xb1\xc2\x00\x00\x00G?\xd9\x03\xa0\x8e\x00\x00\x00eX\x07\x00\x00\x00randGTSq\x07]q\x08(G?\xcaA\xc1\x1d\x8be8G?\xd5\xf3b\x9e\xd1\xba\xb0G?\xday\xd5(\xa2\x1f\xb4G?\xe2\x08Z\x8a\xe0<\x8dG?\xe0\xb3@\xdf\x85\xf6\xb4G?\xee\xc92\x1

We then load the dict from its pickled representation and create a pandas DataFrame from it.

In [6]:
gts1 = s.pop()
df1 = pd.DataFrame.from_dict(pkl.loads(gts1))
df1

Unnamed: 0,timestamps,randGTS.lat,randGTS.lon,randGTS
0,3600000000,0.404865,0.881196,0.205132
1,7200000000,0.796243,0.164833,0.34298
2,10800000000,0.076622,0.187169,0.413686
3,14400000000,0.934646,0.054511,0.56352
4,18000000000,0.339607,0.450056,0.521882
5,21600000000,0.024752,0.751878,0.96206
6,25200000000,0.36026,0.681415,0.999211
7,28800000000,0.693771,0.566282,0.400321
8,32400000000,0.111389,0.454388,0.134888
9,36000000000,0.233141,0.390846,0.915845
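The `timestamps` column holds raw ticks. Since `1 h` produced `3600000000` above, the ticks here appear to be in microseconds, so a possible follow-up is to turn them into a time index. The sketch below assumes microseconds since the Unix epoch; on a platform configured with another time unit, the `unit` argument would have to change. The small frame is rebuilt by hand from the first rows of the output so that the example is self-contained.

```python
import pandas as pd

# First rows of the DataFrame shown above, rebuilt by hand.
df = pd.DataFrame({
    "timestamps": [3600000000, 7200000000, 10800000000],
    "randGTS": [0.205132, 0.342980, 0.413686],
})

# Assumption: ticks are microseconds since the Unix epoch.
df["time"] = pd.to_datetime(df["timestamps"], unit="us", utc=True)
indexed = df.set_index("time").drop(columns="timestamps")
print(indexed)
```

With a `DatetimeIndex` in place, the usual pandas time-series tools (resampling, rolling windows) become available.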


In the following example, we choose to keep label information.

In [7]:
%%w -s s
NEWGTS 'randGTS' RENAME 1 10 <% h RAND RAND NaN RAND ADDVALUE %> FOR
{ 'key1' 'info1' 'key2' 'info2' } RELABEL
true
@GTStoPickledDict

top: 	b'\x80\x02}q\x00(X\n\x00\x00\x00timestampsq\x01]q\x02(I3600000000\nI7200000000\nI10800000000\nI14400000000\nI18000000000\nI21600000000\nI25200000000\nI28800000000\nI32400000000\nI36000000000\neX"\x00\x00\x00randGTS{key1=info1,key2=info2}.latq\x03]q\x04(G?\x93#b\xd0\x00\x00\x00G?\xcb\xee\xdcd\x00\x00\x00G?\xee4\xcdv\x00\x00\x00G?\xef\x94\xf5\xf3\x80\x00\x00G?\xef?Jm\x80\x00\x00G?\xc6\xbe\x15\xca\x00\x00\x00G?\xc93\xdc\xd6\x00\x00\x00G?\xe0\xa5T\xb9\x00\x00\x00G?\xca\x97\xd6\x0e\x00\x00\x00G?\xdc\xa0L\x8d\x00\x00\x00eX"\x00\x00\x00randGTS{key1=info1,key2=info2}.lonq\x05]q\x06(G?\xbfm7x\x00\x00\x00G?\xe82Cr\x00\x00\x00G?\xe0f\xc7\x94\x00\x00\x00G?\xdc\x1b\xbaJ\x00\x00\x00G?\xc4-K\xdc\x00\x00\x00G?\xc1\x8f\xc3\x14\x00\x00\x00G?\xdb6\x14@\x00\x00\x00G?\xed\xb6C\xf7\x00\x00\x00G?\xe6\x9a\xa1&\x00\x00\x00G?\xb1#\x9c \x00\x00\x00eX\x1e\x00\x00\x00randGTS{key1=info1,key2=info2}q\x07]q\x08(G?\xddJR6\xdfe\xf4G?\xe6mu\x96\xd5T(G?}(98\x14p\x80G?\xbf\xc2\x19\xab\xea\xe1\x08G?j\x84\xea\xdaz3\x0

In [8]:
gts2 = s.pop()
df2 = pd.DataFrame.from_dict(pkl.loads(gts2))
df2

Unnamed: 0,timestamps,"randGTS{key1=info1,key2=info2}.lat","randGTS{key1=info1,key2=info2}.lon","randGTS{key1=info1,key2=info2}"
0,3600000000,0.01869,0.12276,0.457661
1,7200000000,0.218227,0.756136,0.700862
2,10800000000,0.943946,0.512546,0.007118
3,14400000000,0.986934,0.439192,0.124055
4,18000000000,0.976476,0.157632,0.003237
5,21600000000,0.177676,0.1372,0.710023
6,25200000000,0.196895,0.425176,0.890975
7,28800000000,0.520182,0.928499,0.347249
8,32400000000,0.207759,0.706376,0.107632
9,36000000000,0.447284,0.06695,0.699648
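When selectors are used as column names, it can be handy to split a column name back into class name, labels and geo suffix. The helper below is a hypothetical sketch based only on the column names visible in this output; it assumes label names and values contain no `,`, `=`, `{` or `}` characters and does not attempt to decode any escaping a real selector might use.

```python
import re

# Pattern for names such as 'randGTS{key1=info1,key2=info2}.lat'.
SELECTOR_RE = re.compile(
    r"^(?P<cls>[^{]+)\{(?P<labels>[^}]*)\}(?P<suffix>\.(?:lat|lon|elev))?$"
)


def split_selector(column):
    """Return (class name, labels dict, suffix) for a selector-style column."""
    match = SELECTOR_RE.match(column)
    if match is None:
        # Not a selector (for example 'timestamps'): return it unchanged.
        return column, {}, ""
    labels = dict(
        pair.split("=", 1)
        for pair in match.group("labels").split(",")
        if pair
    )
    return match.group("cls"), labels, match.group("suffix") or ""


print(split_selector("randGTS{key1=info1,key2=info2}.lat"))
print(split_selector("timestamps"))
```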


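We can also leave out geo information. When latitudes, longitudes and elevations are all NaN, the macro adds no `.lat`, `.lon` or `.elev` columns.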

In [9]:
%%w -s s
NEWGTS 'randTS' RENAME 2 11 <% h NaN NaN NaN RAND ADDVALUE %> FOR
false
@GTStoPickledDict

top: 	b'\x80\x02}q\x00(X\n\x00\x00\x00timestampsq\x01]q\x02(I7200000000\nI10800000000\nI14400000000\nI18000000000\nI21600000000\nI25200000000\nI28800000000\nI32400000000\nI36000000000\nI39600000000\neX\x06\x00\x00\x00randTSq\x03]q\x04(G?\xb7Qq\xe3E\xed0G?\x98PcL\x08\xc6\xa0G?\xdd\x94\xfa\x1e\xd1\x85TG?\xe7\x93^\xe6:\xcc\xfeG?\xef\xa6\xf5\x8a\x07\x95HG?\xca\xe5\x11@\x0c\x0clG?\x9c_E\xfb\xfa\x9c@G?\xed\xd4^\xd5-\r\x90G?\xa0\x8f\xf98\xf4\xff\x10G?\xcc\xdb,\xdf\xee\x7f\xa8eu.'



In [10]:
gts3 = s.pop()
df3 = pd.DataFrame.from_dict(pkl.loads(gts3))
df3

Unnamed: 0,timestamps,randTS
0,7200000000,0.091087
1,10800000000,0.023744
2,14400000000,0.462218
3,18000000000,0.73674
4,21600000000,0.989131
5,25200000000,0.210116
6,28800000000,0.027707
7,32400000000,0.932174
8,36000000000,0.032348
9,39600000000,0.225439


### 2. Revert a DataFrame to a GTS

To revert a DataFrame to a GTS, we first need to convert the DataFrame into a dict.

In [11]:
gts1b = df1.to_dict('list')
gts1b

{'timestamps': [3600000000,
  7200000000,
  10800000000,
  14400000000,
  18000000000,
  21600000000,
  25200000000,
  28800000000,
  32400000000,
  36000000000],
 'randGTS.lat': [0.40486490819603205,
  0.7962426636368036,
  0.0766222458332777,
  0.9346459060907364,
  0.33960730768740177,
  0.02475159242749214,
  0.3602595208212733,
  0.6937710894271731,
  0.11138891335576773,
  0.23314104415476322],
 'randGTS.lon': [0.8811962697654963,
  0.16483280807733536,
  0.1871694065630436,
  0.054510701447725296,
  0.4500556457787752,
  0.7518782559782267,
  0.6814146973192692,
  0.5662816297262907,
  0.45438808389008045,
  0.39084638468921185],
 'randGTS': [0.20513166372874436,
  0.3429800559014593,
  0.4136860749063047,
  0.5635197365208043,
  0.5218815198602074,
  0.962060025370153,
  0.999211052250773,
  0.4003212969848394,
  0.13488763645544766,
  0.9158450347747287]}

We can push this dict directly onto the stack, since it will be automatically converted in the JVM.

In [12]:
s.push(gts1b)
s

top: 	{'randGTS.lat': [0.40486490819603205, 0.7962426636368036, 0.0766222458332777, 0.9346459060907364, 0.33960730768740177, 0.02475159242749214, 0.3602595208212733, 0.6937710894271731, 0.11138891335576773, 0.23314104415476322], 'timestamps': [3600000000, 7200000000, 10800000000, 14400000000, 18000000000, 21600000000, 25200000000, 28800000000, 32400000000, 36000000000], 'randGTS': [0.20513166372874436, 0.3429800559014593, 0.4136860749063047, 0.5635197365208043, 0.5218815198602074, 0.962060025370153, 0.999211052250773, 0.4003212969848394, 0.13488763645544766, 0.9158450347747287], 'randGTS.lon': [0.8811962697654963, 0.16483280807733536, 0.1871694065630436, 0.054510701447725296, 0.4500556457787752, 0.7518782559782267, 0.6814146973192692, 0.5662816297262907, 0.45438808389008045, 0.39084638468921185]}

Now we can use the lists contained in this map to populate a GTS.

In [13]:
%%w -s s
'dict' STORE
$dict 'timestamps' GET
$dict 'randGTS.lat' GET
$dict 'randGTS.lon' GET
[] // no elevation
$dict 'randGTS' GET
MAKEGTS 'randGTS' RENAME

top: 	<GTS with 10 values>



In [14]:
print(s.pop().toString())

randGTS{}
=3600000000/0.40486490819603205:0.8811962697654963/ 0.20513166372874436
=7200000000/0.7962426636368036:0.16483280807733536/ 0.3429800559014593
=10800000000/0.0766222458332777:0.1871694065630436/ 0.4136860749063047
=14400000000/0.9346459060907364:0.054510701447725296/ 0.5635197365208043
=18000000000/0.33960730768740177:0.4500556457787752/ 0.5218815198602074
=21600000000/0.02475159242749214:0.7518782559782267/ 0.962060025370153
=25200000000/0.3602595208212733:0.6814146973192692/ 0.999211052250773
=28800000000/0.6937710894271731:0.5662816297262907/ 0.4003212969848394
=32400000000/0.11138891335576773:0.45438808389008045/ 0.13488763645544766
=36000000000/0.23314104415476322:0.39084638468921185/ 0.9158450347747287
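To see how each dict column ends up in the printed GTS, the following sketch rebuilds the same text lines in plain Python. It only mimics the layout printed above (`=tick/lat:lon/ value`) for a GTS with locations and no elevation; `render_rows` is a hypothetical helper, not part of Warp 10 or of the cell magic.

```python
def render_rows(columns, name):
    """Format dict columns like the text printed above (no elevation)."""
    lines = [name + "{}"]
    rows = zip(
        columns["timestamps"],
        columns[name + ".lat"],
        columns[name + ".lon"],
        columns[name],
    )
    for tick, lat, lon, value in rows:
        lines.append(f"={tick}/{lat}:{lon}/ {value}")
    return lines


# First two rows of the dict shown earlier in this section.
sample = {
    "timestamps": [3600000000, 7200000000],
    "randGTS.lat": [0.40486490819603205, 0.7962426636368036],
    "randGTS.lon": [0.8811962697654963, 0.16483280807733536],
    "randGTS": [0.20513166372874436, 0.3429800559014593],
}
for line in render_rows(sample, "randGTS"):
    print(line)
```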



### 3. From a list of GTS to a DataFrame

When converting a list of GTS to a DataFrame, we need to handle missing values in the resulting DataFrame, since the<br/>
GTS can have different timestamps. It is more efficient to do that in WarpScript, as done by the following macro.

In [15]:
%%w -s s -o
<%
    # Documenting the macro
    '[GTS] BOOLEAN @ListGTStoPickledDict' DOC

    # Check that there are two arguments on the stack
    <% DEPTH 2 < %> <% 'Macro takes two arguments' MSGFAIL %> IFT
        
    # Check that top is a boolean indicating whether to use GTS classname or selector
    <% 1 PICK TYPEOF 'BOOLEAN' != %> <% 'First argument must be a boolean indicating whether to use GTS selector (true) or classname (false)' MSGFAIL %> IFT
    
    # Check that second argument is a list of GTS
    <% 2 PICK TYPEOF 'LIST' != %> <% 'Second argument must be a List of GTS' MSGFAIL %> IFT
    2 PICK <% <% TYPEOF 'GTS' != %> <% 'Second argument is a list that has an element that is not a GTS' MSGFAIL %> IFT %> FOREACH
    
    # Store the arguments
    'withSelector' STORE
    'gtsList' STORE
    
    # make tickbase of all GTS
    $gtsList TICKS 'ticks' STORE
    $ticks [] [] [] $ticks MAKEGTS 'baseGTS' STORE
    
    # macro: check not all NaN (for locations and elevations)
    <% UNIQUE DUP SIZE 1 == SWAP 0 GET ISNaN && %> 'isAllNaN' STORE
        
    # Return pickled dict for pandas
    {
        # ticks
        'timestamps' $ticks
        
        # loop over list of GTS
        $gtsList
        <%
            'gts' STORE
            
            # Make name
            $gts <% $withSelector %> <% TOSELECTOR %> <% NAME %> IFTE
            'name' STORE
        
            # Put on the same tick base and fill missing values with NaN
            [ $gts true mapper.replace 0 0 0 ] MAP
            'mask' STORE
            [ $mask [ $baseGTS ] [] op.negmask ] APPLY
            [ SWAP NaN mapper.replace 0 0 0 ] MAP
            0 GET 'residualSeries' STORE
            [ $gts $residualSeries ] MERGE SORT
            'gts' STORE
        
            # locations
            $gts LOCATIONS 'lon' STORE 'lat' STORE
            <% $lat @isAllNaN ! %> <% $name '.lat' + $lat %> IFT
            <% $lon @isAllNaN ! %> <% $name '.lon' + $lon %> IFT
        
            # elevations
            $gts ELEVATIONS 'elev' STORE
            <% $elev @isAllNaN ! %> <% $name '.elev' + $elev %> IFT
        
            # values        
            $name $gts VALUES
        %>
        FOREACH
    }
    ->PICKLE
%>
'ListGTStoPickledDict' STORE

Creating a new WarpScript stack accessible under variable "s".
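For readers more familiar with pandas, the alignment the macro performs (put every GTS on the union of all ticks and fill the gaps with NaN) has the same effect as an outer join on the tick index. The sketch below shows that equivalent on the pandas side with made-up values; it is an analogy for the result, not a description of how the macro computes it.

```python
import pandas as pd

HOUR = 3600000000  # one hour in microseconds, as produced by `1 h` above

# Two series with overlapping but different ticks (illustrative values).
rand_gts = pd.Series(
    [0.1, 0.2, 0.3], index=[1 * HOUR, 2 * HOUR, 3 * HOUR], name="randGTS"
)
rand_ts = pd.Series(
    [0.5, 0.6, 0.7], index=[2 * HOUR, 3 * HOUR, 4 * HOUR], name="randTS"
)

# Column-wise concat aligns on the union of both indexes and fills with NaN.
aligned = pd.concat([rand_gts, rand_ts], axis=1).sort_index()
aligned.index.name = "timestamps"
print(aligned.reset_index())
```

Doing this in WarpScript instead means the DataFrame can be built from equally sized lists in a single step.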



We apply the macro ListGTStoPickledDict in the same way as GTStoPickledDict,<br/>
except that its second argument is a list of GTS instead of a single GTS.

In [16]:
%%w -s s
[ NEWGTS 'randGTS' RENAME 1 10 <% h RAND RAND NaN RAND ADDVALUE %> FOR
  NEWGTS 'randTS' RENAME 2 11 <% h NaN NaN NaN RAND ADDVALUE %> FOR
  NEWGTS 'stringTS' RENAME 5 8 <% h NaN NaN NaN 'a string' ADDVALUE %> FOR ]
false
@ListGTStoPickledDict

top: 	b'\x80\x02}q\x00(X\n\x00\x00\x00timestampsq\x01]q\x02(I3600000000\nI7200000000\nI10800000000\nI14400000000\nI18000000000\nI21600000000\nI25200000000\nI28800000000\nI32400000000\nI36000000000\nI39600000000\neX\x0b\x00\x00\x00randGTS.latq\x03]q\x04(G?\xe0\xcd\xff1\x00\x00\x00G?\xe5\xf2A\xd4\x00\x00\x00G?\xdf\x04\xba\xd6\x00\x00\x00G?\xdfL:\x96\x00\x00\x00G?\xc6\x12\xe9H\x00\x00\x00G?\xaa\xd8(0\x00\x00\x00G?\x95\xad\x9c\xa0\x00\x00\x00G?\xdb\x10Eb\x00\x00\x00G?\xd9h;\xd9\x00\x00\x00G?\xe6(\xf5Z\x80\x00\x00G\x7f\xf8\x00\x00\x00\x00\x00\x00eX\x0b\x00\x00\x00randGTS.lonq\x05]q\x06(G?\xday\xf4\x0e\x00\x00\x00G?\xe5\xb81\x10\x00\x00\x00G?\xe6qY[\x00\x00\x00G?\xc66\xec\xa8\x00\x00\x00G?\xd9,\xc0r\x00\x00\x00G?\xd1!\x016\x00\x00\x00G?\xc90\xa3\x04\x00\x00\x00G?\xc8\xee\x94\xdc\x00\x00\x00G?\xdf\x87\x14\x18\x00\x00\x00G?\xe1\xb38l\x00\x00\x00G\x7f\xf8\x00\x00\x00\x00\x00\x00eX\x07\x00\x00\x00randGTSq\x07]q\x08(G?\xa1\x13!\xeb\xe1S\xd0G?\xbf"\xf7\x07<.pG?\xed\x19=\r7q\xf8G?\xe49P\x9e!#\xd1G?

In [17]:
listGts = s.pop()
df4 = pd.DataFrame.from_dict(pkl.loads(listGts))
df4

Unnamed: 0,timestamps,randGTS.lat,randGTS.lon,randGTS,randTS,stringTS
0,3600000000,0.525146,0.413693,0.033349,,
1,7200000000,0.685822,0.678734,0.121627,0.857549,
2,10800000000,0.484664,0.701337,0.909331,0.403845,
3,14400000000,0.489028,0.173551,0.631996,0.657455,
4,18000000000,0.172452,0.393356,0.406904,0.349516,a string
5,21600000000,0.05243,0.267639,0.888134,0.949335,a string
6,25200000000,0.02117,0.196797,0.296513,0.663908,a string
7,28800000000,0.422868,0.194781,0.323313,0.82108,a string
8,32400000000,0.396987,0.49262,0.525312,0.548141,
9,36000000000,0.6925,0.553127,0.838751,0.586068,
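The empty cells in this table are NaN on the pandas side, so the usual missing-value tools apply. The sketch below rebuilds three rows of the output by hand and shows two common checks: counting missing values per column and keeping only the rows where every series has a value.

```python
import pandas as pd

nan = float("nan")

# Three rows copied from the output above (rows 0, 1 and 4).
df = pd.DataFrame({
    "timestamps": [3600000000, 7200000000, 18000000000],
    "randGTS": [0.033349, 0.121627, 0.406904],
    "randTS": [nan, 0.857549, 0.349516],
    "stringTS": [nan, nan, "a string"],
})

# Number of missing values in each column.
missing_per_column = df.isna().sum()
print(missing_per_column)

# Rows where all series have a value at that tick.
complete_rows = df.dropna()
print(complete_rows)
```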
