### This notebook shows how to make a pandas DataFrame from WarpScript Geo Time Series (GTS)

### 1. From a single GTS to a DataFrame

In [1]:
%load_ext warpscript_cellmagic
%alias_magic w warpscript

Created `%%w` as an alias for `%%warpscript`.


We will need the pandas and pickle libraries.

In [2]:
import pandas as pd
import pickle as pkl

We first create a GTS with 10 random values, one every hour, each with a random latitude and longitude and no elevation.

In [3]:
%%w -s s
NEWGTS 'randGTS' RENAME 1 10 <% h RAND RAND NaN RAND ADDVALUE %> FOR

Starting connection with 127.0.0.1:25333.
Creating a new WarpScript stack accessible under variable "s".
top: 	<GTS with 10 values>



To make a GTS understandable by a Python interpreter, we store its content in a map and pickle it, so that Python unpickles it as a dict.
The following macro does this.

In [4]:
%%w -s s
<%
    # Documenting the macro
    'GTS BOOLEAN @GTStoPickledDict' DOC

    # Check that there are two arguments on the stack
    <% DEPTH 2 < %> <% 'Macro takes two arguments' MSGFAIL %> IFT
        
    # Check that top is a boolean indicating whether to use GTS classname or selector
    <% 1 PICK TYPEOF 'BOOLEAN' != %> <% 'First argument must be a boolean indicating whether to use GTS selector (true) or classname (false)' MSGFAIL %> IFT
    
    # Check that second argument is a GTS
    <% 2 PICK TYPEOF 'GTS' != %> <% 'Second argument must be a GTS' MSGFAIL %> IFT
    
    # Store the arguments
    'withSelector' STORE
    'gts' STORE
    
    # Make name
    $gts <% $withSelector %> <% TOSELECTOR %> <% NAME %> IFTE
    'name' STORE
    
    # macro: check not all NaN (for locations and elevations)
    <% UNIQUE DUP SIZE 1 == SWAP 0 GET ISNaN && %> 'isAllNaN' STORE
        
    # Return pickled dict for pandas
    {
        # ticks
        'timestamps' $gts TICKS
        
        # locations
        $gts LOCATIONS 'lon' STORE 'lat' STORE
        <% $lat @isAllNaN ! %> <% $name '.lat' + $lat %> IFT
        <% $lon @isAllNaN ! %> <% $name '.lon' + $lon %> IFT
        
        # elevations
        $gts ELEVATIONS 'elev' STORE
        <% $elev @isAllNaN ! %> <% $name '.elev' + $elev %> IFT
        
        # values        
        $name $gts VALUES
    }
    ->PICKLE
%>
'GTStoPickledDict' STORE

top: 	<GTS with 10 values>
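For reference, the layout of the dict built by this macro can be mirrored in pure Python. The helpers `is_all_nan` and `gts_to_dict` are hypothetical and only imitate the macro: a `timestamps` key, then `<name>.lat`, `<name>.lon` and `<name>.elev` keys that are skipped when all their entries are NaN, then the values under `<name>`.

```python
import math
import pickle


def is_all_nan(xs):
    # Mirrors the macro's @isAllNaN check: a non-empty list whose entries are all NaN.
    # math.isnan is used rather than a set (the Python analogue of UNIQUE), because
    # NaN != NaN, so distinct NaN objects do not reliably collapse into one set element.
    return len(xs) > 0 and all(math.isnan(x) for x in xs)


def gts_to_dict(name, ticks, lats, lons, elevs, values):
    # Hypothetical pure-Python counterpart of the dict built by @GTStoPickledDict:
    # 'timestamps' first, then the geo columns that are not all NaN, then the values.
    d = {"timestamps": list(ticks)}
    for suffix, col in ((".lat", lats), (".lon", lons), (".elev", elevs)):
        if not is_all_nan(col):
            d[name + suffix] = list(col)
    d[name] = list(values)
    return d


nan = float("nan")
d = gts_to_dict(
    "randGTS",
    ticks=[3600000000, 7200000000],
    lats=[0.92, 0.47],
    lons=[0.21, 0.15],
    elevs=[nan, nan],  # no elevation, so no 'randGTS.elev' key
    values=[0.86, 0.30],
)

# Protocol 2 gives the b'\x80\x02' header seen in the pickled outputs of this notebook.
payload = pickle.dumps(d, protocol=2)
print(list(pickle.loads(payload)))
# ['timestamps', 'randGTS.lat', 'randGTS.lon', 'randGTS']
```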



We evaluate the macro on the random GTS that was left on the stack.
Setting the boolean argument to false means that the column names only use the GTS classname, so its labels are dropped from the pickled representation.

In [5]:
%%w -s s
false
@GTStoPickledDict

top: 	b'\x80\x02}q\x00(X\n\x00\x00\x00timestampsq\x01]q\x02(I3600000000\nI7200000000\nI10800000000\nI14400000000\nI18000000000\nI21600000000\nI25200000000\nI28800000000\nI32400000000\nI36000000000\neX\x0b\x00\x00\x00randGTS.latq\x03]q\x04(G?\xedt\xe2m\x00\x00\x00G?\xdd\xfbI\xac\x00\x00\x00G?\xed\x82\xb0\x82\x00\x00\x00G?\xe5\xc0\x9a\xd2\x80\x00\x00G?\xd8u\x7fk\x00\x00\x00G?\xde\xf9p\x1d\x00\x00\x00G?\xef\x0b\x93`\x00\x00\x00G?\xaa\nK\x90\x00\x00\x00G?\xcd\x03\xfa0\x00\x00\x00G?\xc2W\x05\n\x00\x00\x00eX\x0b\x00\x00\x00randGTS.lonq\x05]q\x06(G?\xcap$$\x00\x00\x00G?\xc2\xef\n\xa8\x00\x00\x00G?\xcd1\xfa \x00\x00\x00G?\xe54\x95l\x00\x00\x00G?\xeb\x8d\x14\x9d\x00\x00\x00G?\xd7\xdd\x10(\x00\x00\x00G?\xe9?f#\x00\x00\x00G?\xd0mm\xd8\x00\x00\x00G?\xe4\xe9h\x03\x00\x00\x00G?\xed=\x8c\xeb\x00\x00\x00eX\x07\x00\x00\x00randGTSq\x07]q\x08(G?\xeb\xad_\xc1\xc7\xcd\xd9G?\xd2\xfc_\x14\xb28\x86G?\xe5\xa7\x0f\xc6J\xaa\xdbG?\xd1{\x8b\nIAnG?\xe5\xe6\x85\x16\n\x05\xfeG?\xee,\x08p\xa2\xeb\x85G?\xc9\x060F\xdd\x

We then load the dict from its pickled representation and create a pandas DataFrame with it.

In [6]:
gts1 = s.pop()
df1 = pd.DataFrame.from_dict(pkl.loads(gts1))
df1

|   | timestamps | randGTS.lat | randGTS.lon | randGTS |
|---|---|---|---|---|
| 0 | 3600000000 | 0.920518 | 0.206547 | 0.864914 |
| 1 | 7200000000 | 0.468462 | 0.14792 | 0.296654 |
| 2 | 10800000000 | 0.922203 | 0.228088 | 0.676643 |
| 3 | 14400000000 | 0.679761 | 0.662669 | 0.273165 |
| 4 | 18000000000 | 0.382171 | 0.860972 | 0.68439 |
| 5 | 21600000000 | 0.483974 | 0.372868 | 0.942875 |
| 6 | 25200000000 | 0.970163 | 0.788989 | 0.195501 |
| 7 | 28800000000 | 0.05086 | 0.256679 | 0.425685 |
| 8 | 32400000000 | 0.226684 | 0.653492 | 0.008506 |
| 9 | 36000000000 | 0.143281 | 0.913763 | 0.700398 |
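The `timestamps` column holds raw Warp 10 ticks. Assuming the platform's default time unit of microseconds (consistent with `1 h` giving 3600000000 above), a minimal sketch to turn them into a UTC `DatetimeIndex` with pandas could look like this; the `time` column name is an arbitrary choice.

```python
import pandas as pd

# Same layout as the unpickled dict above, truncated to three rows for brevity.
data = {
    "timestamps": [3600000000, 7200000000, 10800000000],
    "randGTS": [0.864914, 0.296654, 0.676643],
}
df = pd.DataFrame.from_dict(data)

# Assumption: ticks are microseconds since the Unix epoch (Warp 10's default unit).
df["time"] = pd.to_datetime(df["timestamps"], unit="us", utc=True)
df = df.set_index("time").drop(columns="timestamps")
print(df)
```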


In the following example, we keep the label information: the GTS is relabeled and the boolean argument is set to true, so the column names use the GTS selector.

In [7]:
%%w -s s
NEWGTS 'randGTS' RENAME 1 10 <% h RAND RAND NaN RAND ADDVALUE %> FOR
{ 'key1' 'info1' 'key2' 'info2' } RELABEL
true
@GTStoPickledDict

top: 	b'\x80\x02}q\x00(X\n\x00\x00\x00timestampsq\x01]q\x02(I3600000000\nI7200000000\nI10800000000\nI14400000000\nI18000000000\nI21600000000\nI25200000000\nI28800000000\nI32400000000\nI36000000000\neX"\x00\x00\x00randGTS{key1=info1,key2=info2}.latq\x03]q\x04(G?\xe9\xd2\xde\x80\x00\x00\x00G?\xeb\xf1\xe5\x85\x00\x00\x00G?\xdao\xc0\x1f\x00\x00\x00G?\xe8\xb6N\xf8\x80\x00\x00G?\xda\xbd\x9bn\x00\x00\x00G?\xdc\x9c\x97U\x00\x00\x00G?^\xf8C\x00\x00\x00\x00G?\xd9\x1dQi\x00\x00\x00G?\xd2\xc6\x9d\xb6\x00\x00\x00G?\xdc\xe0SF\x00\x00\x00eX"\x00\x00\x00randGTS{key1=info1,key2=info2}.lonq\x05]q\x06(G?\xe9s\xca\xc6\x00\x00\x00G?\xc0\xc8_\x04\x00\x00\x00G?\xe7\xe2\xed\xae\x00\x00\x00G?\xea\x90)\xb3\x00\x00\x00G?\xd0W\x80r\x00\x00\x00G?\xeb\xe1\xe7\x93\x00\x00\x00G?\xd5\xbci\xc4\x00\x00\x00G?\xd3;\xeb \x00\x00\x00G?\xed2\rv\x00\x00\x00G?\xc7\xc7e4\x00\x00\x00eX\x1e\x00\x00\x00randGTS{key1=info1,key2=info2}q\x07]q\x08(G?\xab\xde\xe1\xc2\xa1A\xd0G?\xd4\xd9\'\x11\xba\xc0lG?\xe6\xa6\n\xde\xb9\xd2;G?\xef@b\x1

In [8]:
gts2 = s.pop()
df2 = pd.DataFrame.from_dict(pkl.loads(gts2))
df2

|   | timestamps | randGTS{key1=info1,key2=info2}.lat | randGTS{key1=info1,key2=info2}.lon | randGTS{key1=info1,key2=info2} |
|---|---|---|---|---|
| 0 | 3600000000 | 0.806991 | 0.795385 | 0.054435 |
| 1 | 7200000000 | 0.873278 | 0.131115 | 0.325754 |
| 2 | 10800000000 | 0.413071 | 0.746451 | 0.707769 |
| 3 | 14400000000 | 0.772254 | 0.830098 | 0.976609 |
| 4 | 18000000000 | 0.417823 | 0.255341 | 0.893858 |
| 5 | 21600000000 | 0.447058 | 0.871326 | 0.662417 |
| 6 | 25200000000 | 0.00189 | 0.339625 | 0.27934 |
| 7 | 28800000000 | 0.392414 | 0.300532 | 0.417766 |
| 8 | 32400000000 | 0.293373 | 0.91236 | 0.481557 |
| 9 | 36000000000 | 0.451192 | 0.185773 | 0.98421 |
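The two naming modes (false for `df1`, true for `df2`) can be summed up with a small Python sketch. `column_name` is a hypothetical helper that reproduces the names seen above; it assumes labels appear in the given order and ignores any escaping that TOSELECTOR may apply to special characters.

```python
def column_name(classname, labels, with_selector):
    # Hypothetical helper mirroring the macro's naming rule:
    #   with_selector=True  -> selector-like name 'classname{k1=v1,k2=v2}'
    #   with_selector=False -> bare classname, labels dropped
    # Assumption: labels keep insertion order and no escaping is applied.
    if not with_selector:
        return classname
    inner = ",".join(f"{k}={v}" for k, v in labels.items())
    return f"{classname}{{{inner}}}"


labels = {"key1": "info1", "key2": "info2"}
print(column_name("randGTS", labels, False))  # randGTS
print(column_name("randGTS", labels, True))   # randGTS{key1=info1,key2=info2}
```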


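We can also work without geo information: here latitude, longitude and elevation are all NaN, so the macro only keeps the timestamps and values columns.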

In [9]:
%%w -s s
NEWGTS 'randTS' RENAME 2 11 <% h NaN NaN NaN RAND ADDVALUE %> FOR
false
@GTStoPickledDict

top: 	b'\x80\x02}q\x00(X\n\x00\x00\x00timestampsq\x01]q\x02(I7200000000\nI10800000000\nI14400000000\nI18000000000\nI21600000000\nI25200000000\nI28800000000\nI32400000000\nI36000000000\nI39600000000\neX\x06\x00\x00\x00randTSq\x03]q\x04(G?\xb4\xe4\xcd2\xd2L\xa0G?\xc9?\xbe\xf6\xf4\x03\x84G?\xed\xcf\x12 \x1c]\xd3G?\xcd\xa8\x97\x92=\x9f\x04G?\xa1\x0bn\xd6:\x00\xa0G?\xd0CD\xfa\xa2\x9c\x0cG?\xe8\xb8>:\x00e\xc4G?\xed\x80\x92O\x12\xdf\xe0G?\xda\xce\xaf\xd1\xe2U\xe6G?\xee\x0e\xcc<\xe3\xa7\xfeeu.'



In [10]:
gts3 = s.pop()
df3 = pd.DataFrame.from_dict(pkl.loads(gts3))
df3

|   | timestamps | randTS |
|---|---|---|
| 0 | 7200000000 | 0.081616 |
| 1 | 10800000000 | 0.197258 |
| 2 | 14400000000 | 0.931527 |
| 3 | 18000000000 | 0.231708 |
| 4 | 21600000000 | 0.03329 |
| 5 | 25200000000 | 0.254106 |
| 6 | 28800000000 | 0.772491 |
| 7 | 32400000000 | 0.921945 |
| 8 | 36000000000 | 0.418865 |
| 9 | 39600000000 | 0.939306 |


### 2. Convert a DataFrame back to a GTS

To convert a DataFrame back to a GTS, we first turn the DataFrame into a dict of lists with `to_dict('list')`.

In [11]:
gts1b = df1.to_dict('list')
gts1b

{'timestamps': [3600000000,
  7200000000,
  10800000000,
  14400000000,
  18000000000,
  21600000000,
  25200000000,
  28800000000,
  32400000000,
  36000000000],
 'randGTS.lat': [0.9205181244760752,
  0.46846238896250725,
  0.9222033061087132,
  0.6797613250091672,
  0.3821714920923114,
  0.48397448379546404,
  0.9701630473136902,
  0.05085979588329792,
  0.22668387740850449,
  0.14328062999993563],
 'randGTS.lon': [0.20654727704823017,
  0.14791997149586678,
  0.2280876785516739,
  0.6626689061522484,
  0.8609717432409525,
  0.3728676214814186,
  0.7889891322702169,
  0.2566790208220482,
  0.6534919794648886,
  0.9137634839862585],
 'randGTS': [0.8649138245368092,
  0.2966535284148296,
  0.6766432640918895,
  0.27316547398097313,
  0.6843896322681642,
  0.9428751182489213,
  0.19550136051428335,
  0.42568543424880434,
  0.008505697092029196,
  0.700397843905618]}

We can push this dict directly onto the stack, since it is automatically converted to a map on the JVM side.

In [12]:
s.push(gts1b)
s

top: 	{'randGTS.lat': [0.9205181244760752, 0.46846238896250725, 0.9222033061087132, 0.6797613250091672, 0.3821714920923114, 0.48397448379546404, 0.9701630473136902, 0.05085979588329792, 0.22668387740850449, 0.14328062999993563], 'timestamps': [3600000000, 7200000000, 10800000000, 14400000000, 18000000000, 21600000000, 25200000000, 28800000000, 32400000000, 36000000000], 'randGTS': [0.8649138245368092, 0.2966535284148296, 0.6766432640918895, 0.27316547398097313, 0.6843896322681642, 0.9428751182489213, 0.19550136051428335, 0.42568543424880434, 0.008505697092029196, 0.700397843905618], 'randGTS.lon': [0.20654727704823017, 0.14791997149586678, 0.2280876785516739, 0.6626689061522484, 0.8609717432409525, 0.3728676214814186, 0.7889891322702169, 0.2566790208220482, 0.6534919794648886, 0.9137634839862585]}

Now we can use the lists contained in this map to populate a GTS. MAKEGTS takes the lists of ticks, latitudes, longitudes, elevations and values, in that order.

In [13]:
%%w -s s
'dict' STORE
$dict 'timestamps' GET
$dict 'randGTS.lat' GET
$dict 'randGTS.lon' GET
[] // no elevation
$dict 'randGTS' GET
MAKEGTS 'randGTS' RENAME

top: 	<GTS with 10 values>



In [14]:
print(s.pop().toString())

randGTS{}
=3600000000/0.9205181244760752:0.20654727704823017/ 0.8649138245368092
=7200000000/0.46846238896250725:0.14791997149586678/ 0.2966535284148296
=10800000000/0.9222033061087132:0.2280876785516739/ 0.6766432640918895
=14400000000/0.6797613250091672:0.6626689061522484/ 0.27316547398097313
=18000000000/0.3821714920923114:0.8609717432409525/ 0.6843896322681642
=21600000000/0.48397448379546404:0.3728676214814186/ 0.9428751182489213
=25200000000/0.9701630473136902:0.7889891322702169/ 0.19550136051428335
=28800000000/0.05085979588329792:0.2566790208220482/ 0.42568543424880434
=32400000000/0.22668387740850449:0.6534919794648886/ 0.008505697092029196
=36000000000/0.14328062999993563:0.9137634839862585/ 0.700397843905618
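Because the lists are passed to MAKEGTS positionally, a length mismatch or a missing column is easy to overlook. As a sketch under assumptions, a hypothetical Python helper `makegts_lists` can gather the lists of a dict such as `gts1b` in the order used by the cell above and check their lengths before the dict is pushed to the stack.

```python
def makegts_lists(d, name):
    # Hypothetical helper: collect the lists in the order used with MAKEGTS above
    # (ticks, latitudes, longitudes, elevations, values) and check their lengths.
    ticks = d["timestamps"]
    values = d[name]
    if len(values) != len(ticks):
        raise ValueError(f"{name}: {len(values)} values for {len(ticks)} ticks")
    geo = []
    for suffix in (".lat", ".lon", ".elev"):
        # Geo columns dropped when pickling become empty lists,
        # like the [] passed for elevation in the cell above.
        col = d.get(name + suffix, [])
        if col and len(col) != len(ticks):
            raise ValueError(f"{name}{suffix}: {len(col)} entries for {len(ticks)} ticks")
        geo.append(col)
    lats, lons, elevs = geo
    return ticks, lats, lons, elevs, values


d = {
    "timestamps": [3600000000, 7200000000],
    "randGTS.lat": [0.92, 0.47],
    "randGTS.lon": [0.21, 0.15],
    "randGTS": [0.86, 0.30],
}
ticks, lats, lons, elevs, values = makegts_lists(d, "randGTS")
print(elevs)  # []
```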



### 3. From a list of GTS to a DataFrame

When converting a list of GTS to a DataFrame, we need to handle missing values in the resulting DataFrame, since the GTS can have different timestamps. It is more efficient to do this in WarpScript, as done by the following macro: it puts every GTS on the union of all ticks and fills the missing values with NaN.

In [15]:
%%w -s s -o
<%
    # Documenting the macro
    '[GTS] BOOLEAN @ListGTStoPickledDict' DOC

    # Check that there are two arguments on the stack
    <% DEPTH 2 < %> <% 'Macro takes two arguments' MSGFAIL %> IFT
        
    # Check that top is a boolean indicating whether to use GTS classname or selector
    <% 1 PICK TYPEOF 'BOOLEAN' != %> <% 'First argument must be a boolean indicating whether to use GTS selector (true) or classname (false)' MSGFAIL %> IFT
    
    # Check that second argument is a list of GTS
    <% 2 PICK TYPEOF 'LIST' != %> <% 'Second argument must be a List of GTS' MSGFAIL %> IFT
    2 PICK <% <% TYPEOF 'GTS' != %> <% 'Second argument is a list that has an element that is not a GTS' MSGFAIL %> IFT %> FOREACH
    
    # Store the arguments
    'withSelector' STORE
    'gtsList' STORE
    
    # make tickbase of all GTS
    $gtsList TICKS 'ticks' STORE
    $ticks [] [] [] $ticks MAKEGTS 'baseGTS' STORE
    
    # macro: check not all NaN (for locations and elevations)
    <% UNIQUE DUP SIZE 1 == SWAP 0 GET ISNaN && %> 'isAllNaN' STORE
        
    # Return pickled dict for pandas
    {
        # ticks
        'timestamps' $ticks
        
        # loop over list of GTS
        $gtsList
        <%
            'gts' STORE
            
            # Make name
            $gts <% $withSelector %> <% TOSELECTOR %> <% NAME %> IFTE
            'name' STORE
        
            # Put on the same tick base and fill missing values with NaN
            [ $gts true mapper.replace 0 0 0 ] MAP
            'mask' STORE
            [ $mask [ $baseGTS ] [] op.negmask ] APPLY
            [ SWAP NaN mapper.replace 0 0 0 ] MAP
            0 GET 'residualSeries' STORE
            [ $gts $residualSeries ] MERGE SORT
            'gts' STORE
        
            # locations
            $gts LOCATIONS 'lon' STORE 'lat' STORE
            <% $lat @isAllNaN ! %> <% $name '.lat' + $lat %> IFT
            <% $lon @isAllNaN ! %> <% $name '.lon' + $lon %> IFT
        
            # elevations
            $gts ELEVATIONS 'elev' STORE
            <% $elev @isAllNaN ! %> <% $name '.elev' + $elev %> IFT
        
            # values        
            $name $gts VALUES
        %>
        FOREACH
    }
    ->PICKLE
%>
'ListGTStoPickledDict' STORE

Creating a new WarpScript stack accessible under variable "s".
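The alignment done by the macro can be sketched in pure Python. The input format (a dict mapping each column name to a `{tick: value}` dict) and the name `align_on_union` are assumptions for illustration; the actual macro works on GTS objects and also carries locations and elevations.

```python
import math


def align_on_union(series):
    # series: hypothetical input, {column name: {tick: value}}.
    # Sorted union of all ticks, like `$gtsList TICKS` in the macro.
    ticks = sorted(set().union(*(s.keys() for s in series.values())))
    out = {"timestamps": ticks}
    for name, points in series.items():
        # Ticks missing from this series are filled with NaN, like the
        # op.negmask and mapper.replace steps of the macro.
        out[name] = [points.get(t, math.nan) for t in ticks]
    return out


h = 3600000000  # one hour in microseconds
aligned = align_on_union({
    "randGTS": {1 * h: 0.18, 2 * h: 0.04, 3 * h: 0.47},
    "randTS": {2 * h: 0.89, 3 * h: 0.37},
    "stringTS": {3 * h: "a string"},
})
print(aligned["timestamps"])  # [3600000000, 7200000000, 10800000000]
print(aligned["stringTS"])    # [nan, nan, 'a string']
```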



We apply the macro ListGTStoPickledDict in the same way as GTStoPickledDict,
except that it takes a list of GTS instead of a single GTS as its second argument.

In [16]:
%%w -s s
[ NEWGTS 'randGTS' RENAME 1 10 <% h RAND RAND NaN RAND ADDVALUE %> FOR
  NEWGTS 'randTS' RENAME 2 11 <% h NaN NaN NaN RAND ADDVALUE %> FOR
  NEWGTS 'stringTS' RENAME 5 8 <% h NaN NaN NaN 'a string' ADDVALUE %> FOR ]
false
@ListGTStoPickledDict

top: 	b"\x80\x02}q\x00(X\n\x00\x00\x00timestampsq\x01]q\x02(I3600000000\nI7200000000\nI10800000000\nI14400000000\nI18000000000\nI21600000000\nI25200000000\nI28800000000\nI32400000000\nI36000000000\nI39600000000\neX\x0b\x00\x00\x00randGTS.latq\x03]q\x04(G?\xc1!Sz\x00\x00\x00G?\xdd/\x14T\x00\x00\x00G?\xcc,\\\x10\x00\x00\x00G?\xe5\xa3[\xc5\x80\x00\x00G?\xed\x0b\xf3\x89\x00\x00\x00G?\xe2y\xe7n\x00\x00\x00G?\xd1\xde\xcdB\x00\x00\x00G?\xe1\x9dd\x99\x80\x00\x00G?\xdaO\x95y\x00\x00\x00G?\xdb24Q\x00\x00\x00G\x7f\xf8\x00\x00\x00\x00\x00\x00eX\x0b\x00\x00\x00randGTS.lonq\x05]q\x06(G?\xdf\x8f\xd1\x16\x00\x00\x00G?\xec\x8c{\xae\x00\x00\x00G?\xba\xec\xbdh\x00\x00\x00G?\xe0J2\xdc\x00\x00\x00G?\xe6\x10G\xdd\x00\x00\x00G?\xe3l,'\x00\x00\x00G?\xdd\x9f\xa2d\x00\x00\x00G?\xc0j\xc3\xd0\x00\x00\x00G?\xe6\xacf\x04\x00\x00\x00G?\xe4\x06\xa3\xe7\x00\x00\x00G\x7f\xf8\x00\x00\x00\x00\x00\x00eX\x07\x00\x00\x00randGTSq\x07]q\x08(G?\xc7\x9f\xd5@\xd2%\x98G?\xa6\xfe\x82\xbc\x80G\xd0G?\xde\x19\xf1\x1bn\x16\xd2G?\xd1\x

In [17]:
listGts = s.pop()
df4 = pd.DataFrame.from_dict(pkl.loads(listGts))
df4

|   | timestamps | randGTS.lat | randGTS.lon | randGTS | randTS | stringTS |
|---|---|---|---|---|---|---|
| 0 | 3600000000 | 0.13383 | 0.493153 | 0.184565 | NaN | NaN |
| 1 | 7200000000 | 0.455998 | 0.892149 | 0.044911 | 0.889899 | NaN |
| 2 | 10800000000 | 0.220104 | 0.105175 | 0.470333 | 0.374779 | NaN |
| 3 | 14400000000 | 0.676191 | 0.509057 | 0.26571 | 0.310753 | NaN |
| 4 | 18000000000 | 0.907709 | 0.689487 | 0.31798 | 0.341025 | a string |
| 5 | 21600000000 | 0.577381 | 0.606955 | 0.224143 | 0.221835 | a string |
| 6 | 25200000000 | 0.279224 | 0.462868 | 0.725985 | 0.857803 | a string |
| 7 | 28800000000 | 0.550463 | 0.128258 | 0.136351 | 0.162575 | a string |
| 8 | 32400000000 | 0.411107 | 0.708545 | 0.905575 | 0.252265 | NaN |
| 9 | 36000000000 | 0.424939 | 0.625811 | 0.017779 | 0.279712 | NaN |
