To summarize the contents of this notebook:

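1. I have read in the Train and Test datasets and reduced their in-memory sizes substantially: Train from about 124.8 MB to about 71.5 MB (summing `memory_usage(deep=True)`), and Test from 39.0 MB to 16.8 MB (as reported by `info`).
2. I have also written them as pickle files into the working directory, so loading them later is much faster (for Train, 297 ms with `read_pickle` versus 17.2 s with `read_csv`).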
3. I confirmed that there are 600 satellites in Train and their true and simulated kinematics were all recorded in January 2014. Likewise, there are 300 satellites in Test and their simulated kinematics were all recorded in February 2014.
4. I inserted a new column called absolute time (abs_time) as a record of time that has passed since 00:00:00 1-Jan-2014, for both Train and Test.
5. I extracted the Satellite-1 data (the second satellite out of 600) from both Train and Test and wrote each to its own pickle file.


I am not entirely sure what to do next, but these are my next few objectives:

- Study time series analysis, forecasting and support vector machines.
- Read more papers on predicting resident space object (RSO) trajectories with an ML approach.
- Carry out a literature review (2000 words should be fine).
- Use Sat_1 data as a small sample to learn to apply forecasting techniques before moving on to the remaining 599.
- Submit predictions to the IDAO portal before the deadline on 12 Feb 2020.

**Ismail Dawoodjee 9:08am 22-Jan-2020**

### 20-Jan-2020

## File Size Reduction

In [1]:
import os
import numpy as np
import pandas as pd
import datetime as dt

In [20]:
%%time
train = pd.read_csv('train.csv')

Wall time: 17.2 s


In [16]:
# refer to https://www.kaggle.com/frankherfert/tips-tricks-for-working-with-large-datasets
train.info(memory_usage='deep')

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 649912 entries, 0 to 649911
Data columns (total 15 columns):
id        649912 non-null int64
epoch     649912 non-null object
sat_id    649912 non-null int64
x         649912 non-null float64
y         649912 non-null float64
z         649912 non-null float64
Vx        649912 non-null float64
Vy        649912 non-null float64
Vz        649912 non-null float64
x_sim     649912 non-null float64
y_sim     649912 non-null float64
z_sim     649912 non-null float64
Vx_sim    649912 non-null float64
Vy_sim    649912 non-null float64
Vz_sim    649912 non-null float64
dtypes: float64(12), int64(2), object(1)
memory usage: 119.0 MB


In [4]:
# the epoch column has object (string) dtype and uses the most memory by far
train.memory_usage(deep=True) * 1e-6

Index      0.000080
id         5.199296
epoch     51.992960
sat_id     5.199296
x          5.199296
y          5.199296
z          5.199296
Vx         5.199296
Vy         5.199296
Vz         5.199296
x_sim      5.199296
y_sim      5.199296
z_sim      5.199296
Vx_sim     5.199296
Vy_sim     5.199296
Vz_sim     5.199296
dtype: float64

In [27]:
# convert epoch to datetime
print("size before:", train["epoch"].memory_usage(deep=True) * 1e-6)
train["epoch"] = pd.to_datetime(train["epoch"], infer_datetime_format=True)
print("size after: ", train["epoch"].memory_usage(deep=True) * 1e-6)

size before: 51.99304
size after:  5.199376


In [28]:
train.memory_usage(deep=True).sum() * 1e-6

77.98952
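
The figures above work out to 80 bytes per row for epoch as strings (51,992,960 / 649,912) against 8 bytes per row as `datetime64`. A minimal sketch of the same effect on a made-up miniature column; the 1,000 timestamps and the variable names are my own illustration, not part of the contest data:

```python
import pandas as pd

# Hypothetical miniature of the epoch column: 1,000 timestamp strings
# in the same "YYYY-MM-DDTHH:MM:SS.mmm" shape (23 characters each).
stamps = pd.date_range("2014-01-01", periods=1000, freq="min")
as_text = pd.Series(
    [ts.strftime("%Y-%m-%dT%H:%M:%S.%f")[:-3] for ts in stamps],
    dtype=object,
)

# Parsing turns every Python string object into one 8-byte integer.
as_datetime = pd.to_datetime(as_text)

text_bytes = as_text.memory_usage(index=False, deep=True)
datetime_bytes = as_datetime.memory_usage(index=False, deep=True)

print("bytes per row as text:    ", text_bytes / len(as_text))
print("bytes per row as datetime:", datetime_bytes / len(as_datetime))
```

The exact per-row size of the text version depends on the Python build, so I only rely on it being several times larger than 8 bytes.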

In [13]:
train.head()

Unnamed: 0,id,epoch,sat_id,x,y,z,Vx,Vy,Vz,x_sim,y_sim,z_sim,Vx_sim,Vy_sim,Vz_sim
0,0,2014-01-01 00:00:00.000,0,-8855.823863,13117.780146,-20728.353233,-0.908303,-3.808436,-2.022083,-8843.131454,13138.22169,-20741.615306,-0.907527,-3.80493,-2.024133
1,1,2014-01-01 00:46:43.000,0,-10567.672384,1619.746066,-24451.813271,-0.30259,-4.272617,-0.612796,-10555.500066,1649.289367,-24473.089556,-0.303704,-4.269816,-0.616468
2,2,2014-01-01 01:33:26.001,0,-10578.684043,-10180.46746,-24238.280949,0.277435,-4.047522,0.723155,-10571.858472,-10145.939908,-24271.169776,0.27488,-4.046788,0.718768
3,3,2014-01-01 02:20:09.001,0,-9148.251857,-20651.43746,-20720.381279,0.7156,-3.373762,1.722115,-9149.620794,-20618.200201,-20765.019094,0.712437,-3.375202,1.718306
4,4,2014-01-01 03:06:52.002,0,-6719.092336,-28929.061629,-14938.907967,0.992507,-2.519732,2.344703,-6729.358857,-28902.271436,-14992.399986,0.989382,-2.522618,2.342237


In [41]:
# save to a pickle file 
train.to_pickle("train.pkl")

In [42]:
# compare sizes
print("train.csv:", os.stat('train.csv').st_size * 1e-6)
print("train.pkl:", os.stat('train.pkl').st_size * 1e-6)

train.csv: 170.63602
train.pkl: 77.990578


In [43]:
# delete train dataset
del train

In [44]:

train = pd.read_pickle('train.pkl')
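
Besides being smaller and faster, a pickle keeps the dtypes I set, whereas a CSV round trip loses them and the conversion has to be redone. A small sketch on a made-up three-column frame written to a temporary directory (the frame and paths are illustrative only):

```python
import os
import tempfile

import pandas as pd

# Hypothetical miniature frame with the dtypes used in this notebook.
frame = pd.DataFrame({
    "epoch": pd.to_datetime(["2014-01-01 00:00:00", "2014-01-01 00:46:43"]),
    "sat_id": pd.Series([0, 0], dtype="uint16"),
    "x": [-8855.823863, -10567.672384],
})

with tempfile.TemporaryDirectory() as tmp:
    pkl_path = os.path.join(tmp, "frame.pkl")
    csv_path = os.path.join(tmp, "frame.csv")
    frame.to_pickle(pkl_path)
    frame.to_csv(csv_path, index=False)
    from_pickle = pd.read_pickle(pkl_path)
    from_csv = pd.read_csv(csv_path)

# The pickle restores datetime64 and uint16; the CSV comes back as text and int64.
print(from_pickle.dtypes)
print(from_csv.dtypes)
```

Pickle files should only be loaded from sources I trust, since unpickling can execute arbitrary code.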

In [45]:
# get min and max values for integer columns; ljust is left justify
for col in train[['id','sat_id']]:
    print(col.ljust(10), f"min:{train[col].min()}".ljust(10), f"max:{train[col].max()}")

id         min:0      max:1234093
sat_id     min:0      max:599


In [46]:
# pandas reads integers as int64 by default, so downcast them to the smallest unsigned type that fits (uint32 or uint16 here)
# do NOT downcast the float64 columns: fewer significant digits mean less precision
train["id"] = pd.to_numeric(train["id"], downcast="unsigned")

In [47]:
train["sat_id"] = pd.to_numeric(train["sat_id"], downcast="unsigned")

In [48]:
train[['id','sat_id']].memory_usage(deep=True) * 1e-6

Index     0.000080
id        2.599648
sat_id    1.299824
dtype: float64
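
The sizes above match 4 bytes per row for `id` (649,912 × 4 = 2,599,648) and 2 bytes per row for `sat_id`. A short sketch of why `downcast="unsigned"` picks those types, using only the min and max values found earlier; the two-element Series are stand-ins for the real columns:

```python
import numpy as np
import pandas as pd

# Largest value each unsigned type can hold.
for name in ("uint8", "uint16", "uint32"):
    print(name.ljust(8), np.iinfo(name).max)

# Stand-ins carrying only the observed min and max of each column.
ids = pd.to_numeric(pd.Series([0, 1234093]), downcast="unsigned")
sat_ids = pd.to_numeric(pd.Series([0, 599]), downcast="unsigned")

# 1234093 exceeds the uint16 limit, so id needs uint32;
# 599 exceeds the uint8 limit, so sat_id needs uint16.
print(ids.dtype, sat_ids.dtype)
```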

In [18]:
train.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 649912 entries, 0 to 649911
Data columns (total 15 columns):
id        649912 non-null int64
epoch     649912 non-null datetime64[ns]
sat_id    649912 non-null int64
x         649912 non-null float64
y         649912 non-null float64
z         649912 non-null float64
Vx        649912 non-null float64
Vy        649912 non-null float64
Vz        649912 non-null float64
x_sim     649912 non-null float64
y_sim     649912 non-null float64
z_sim     649912 non-null float64
Vx_sim    649912 non-null float64
Vy_sim    649912 non-null float64
Vz_sim    649912 non-null float64
dtypes: datetime64[ns](1), float64(12), int64(2)
memory usage: 74.4 MB


In [50]:
train.memory_usage(deep=True) * 1e-6

Index     0.000080
id        2.599648
epoch     5.199296
sat_id    1.299824
x         5.199296
y         5.199296
z         5.199296
Vx        5.199296
Vy        5.199296
Vz        5.199296
x_sim     5.199296
y_sim     5.199296
z_sim     5.199296
Vx_sim    5.199296
Vy_sim    5.199296
Vz_sim    5.199296
dtype: float64
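
The twelve float64 columns now account for most of the memory, but I am leaving them alone. A quick sketch of what float32 would cost, using the first `z` value from `train.head()`; the variable names are my own:

```python
import numpy as np

z = -20728.353233            # first z value shown in train.head()
z32 = float(np.float32(z))   # what would be stored after a float32 downcast

# float32 keeps about 7 significant digits, so at this magnitude
# the stored value drifts in the fourth decimal place.
error = abs(z32 - z)
print(z32, error)
```

Whether an error of that size matters depends on how the predictions are scored, so keeping float64 is the cautious choice for now.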

In [51]:
# save the reduced-size dataset to a pickle file
train.to_pickle('train_initial.pkl')

## Data Exploration

In [52]:
%%time
train = pd.read_pickle('train_initial.pkl')

Wall time: 297 ms


In [76]:
train.head()

Unnamed: 0,id,epoch,sat_id,x,y,z,Vx,Vy,Vz,x_sim,y_sim,z_sim,Vx_sim,Vy_sim,Vz_sim
0,0,2014-01-01 00:00:00.000,0,-8855.823863,13117.780146,-20728.353233,-0.908303,-3.808436,-2.022083,-8843.131454,13138.22169,-20741.615306,-0.907527,-3.80493,-2.024133
1,1,2014-01-01 00:46:43.000,0,-10567.672384,1619.746066,-24451.813271,-0.30259,-4.272617,-0.612796,-10555.500066,1649.289367,-24473.089556,-0.303704,-4.269816,-0.616468
2,2,2014-01-01 01:33:26.001,0,-10578.684043,-10180.46746,-24238.280949,0.277435,-4.047522,0.723155,-10571.858472,-10145.939908,-24271.169776,0.27488,-4.046788,0.718768
3,3,2014-01-01 02:20:09.001,0,-9148.251857,-20651.43746,-20720.381279,0.7156,-3.373762,1.722115,-9149.620794,-20618.200201,-20765.019094,0.712437,-3.375202,1.718306
4,4,2014-01-01 03:06:52.002,0,-6719.092336,-28929.061629,-14938.907967,0.992507,-2.519732,2.344703,-6729.358857,-28902.271436,-14992.399986,0.989382,-2.522618,2.342237


In [24]:
# explore epoch column; extract years, months, days
epoch_year = train['epoch'].dt.year
epoch_month = train['epoch'].dt.month
epoch_day = train['epoch'].dt.day

In [25]:
# only one unique year 2014
epoch_year.unique()

array([2014], dtype=int64)

In [26]:
# only one unique month January
epoch_month.unique()

array([1], dtype=int64)

In [27]:
# all 31 days of January
epoch_day.unique()

array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17,
       18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31],
      dtype=int64)

Based on this information, the kinematics of all 600 satellites were recorded only in January 2014.
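
The same check can probably be done more compactly by looking at the earliest and latest epoch and at the distinct year-month periods. A sketch on three made-up timestamps (not the real column):

```python
import pandas as pd

# Hypothetical stand-in for train['epoch'].
epochs = pd.Series(pd.to_datetime([
    "2014-01-01 00:00:00.000",
    "2014-01-15 12:30:00.000",
    "2014-01-31 23:40:33.318",
]))

# Earliest and latest observation.
print(epochs.min(), epochs.max())

# Distinct year-month periods; a single entry means one calendar month.
months = epochs.dt.to_period("M").unique()
print(months)
```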

In [77]:
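# do all satellites start being recorded at 00:00:00 1 Jan 2014?
# note: searchsorted relies on the rows being sorted by sat_id
first_indices = []   # row position of each satellite's first observation, e.g. Sat_0 is 0, Sat_1 is 958
initial_records = [] # epoch of each satellite's first observation
for sat_no in train['sat_id'].unique():
    # np.ravel handles both a 1-element array (older pandas) and a scalar (newer pandas)
    first_id = int(np.ravel(train['sat_id'].searchsorted(sat_no))[0])
    first_indices.append(first_id)
    initial_records.append(train['epoch'].iloc[first_id])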

In [78]:
# yes
pd.Series(initial_records).unique()

array(['2014-01-01T00:00:00.000000000'], dtype='datetime64[ns]')

In [79]:
first_indices[1]

958
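
An alternative I could try later that avoids the Python loop: mark the first row of each `sat_id` with `duplicated` and read off the positions. This is only a sketch on a made-up six-row frame, and like the loop it assumes each satellite's rows are contiguous:

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of train: three satellites with 3, 2 and 1 rows.
toy = pd.DataFrame({
    "sat_id": [0, 0, 0, 1, 1, 2],
    "epoch": pd.to_datetime([
        "2014-01-01 00:00:00", "2014-01-01 00:46:43", "2014-01-01 01:33:26",
        "2014-01-01 00:00:00", "2014-01-01 00:21:11",
        "2014-01-01 00:00:00",
    ]),
})

# True on the first row of every satellite, False on the rest.
is_first = ~toy["sat_id"].duplicated()

# Row positions of those first rows, and the epochs recorded there.
first_positions = np.flatnonzero(is_first.to_numpy())
first_epochs = toy.loc[is_first, "epoch"]

print(first_positions.tolist())
print(first_epochs.nunique())  # 1 means every satellite starts at the same instant
```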

In [28]:
# calculate absolute difference in time since the initial record 
# insert that next to the epoch column
train.insert(2, 'abs_time', train['epoch']-train['epoch'][0])
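
Most forecasting tools will want a plain number rather than a timedelta, so `abs_time` will likely need converting to seconds at some point. A sketch on made-up rows; the explicit `origin` timestamp and the `abs_seconds` column name are my own choices, not something the notebook does yet:

```python
import pandas as pd

# Hypothetical rows: two January epochs and one February epoch.
toy = pd.DataFrame({"epoch": pd.to_datetime([
    "2014-01-01 00:00:00.000",
    "2014-01-01 00:46:43.000",
    "2014-02-01 00:01:45.162",
])})

# A fixed origin gives the same result regardless of row order.
origin = pd.Timestamp("2014-01-01 00:00:00")
toy["abs_time"] = toy["epoch"] - origin

# Timedelta to float seconds, which regressors can consume directly.
toy["abs_seconds"] = toy["abs_time"].dt.total_seconds()
print(toy)
```

Using a fixed origin instead of `train['epoch'][0]` also means the same line works unchanged on Test.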

In [80]:
train.head()

Unnamed: 0,id,epoch,sat_id,x,y,z,Vx,Vy,Vz,x_sim,y_sim,z_sim,Vx_sim,Vy_sim,Vz_sim
0,0,2014-01-01 00:00:00.000,0,-8855.823863,13117.780146,-20728.353233,-0.908303,-3.808436,-2.022083,-8843.131454,13138.22169,-20741.615306,-0.907527,-3.80493,-2.024133
1,1,2014-01-01 00:46:43.000,0,-10567.672384,1619.746066,-24451.813271,-0.30259,-4.272617,-0.612796,-10555.500066,1649.289367,-24473.089556,-0.303704,-4.269816,-0.616468
2,2,2014-01-01 01:33:26.001,0,-10578.684043,-10180.46746,-24238.280949,0.277435,-4.047522,0.723155,-10571.858472,-10145.939908,-24271.169776,0.27488,-4.046788,0.718768
3,3,2014-01-01 02:20:09.001,0,-9148.251857,-20651.43746,-20720.381279,0.7156,-3.373762,1.722115,-9149.620794,-20618.200201,-20765.019094,0.712437,-3.375202,1.718306
4,4,2014-01-01 03:06:52.002,0,-6719.092336,-28929.061629,-14938.907967,0.992507,-2.519732,2.344703,-6729.358857,-28902.271436,-14992.399986,0.989382,-2.522618,2.342237


In [25]:
# read the papers linked in the contest page; SOTA = State of the Art

In [27]:
train.to_pickle('train.pkl')

### 21-Jan-2020

In [81]:
# make a separate dataset to only look at satellite 1 (second one in the dataset)
train1 = train.iloc[first_indices[1]:first_indices[2],:]

In [82]:
train1.head()

Unnamed: 0,id,epoch,sat_id,x,y,z,Vx,Vy,Vz,x_sim,y_sim,z_sim,Vx_sim,Vy_sim,Vz_sim
958,1819,2014-01-01 00:00:00.000,1,10390.313089,-2796.458271,3179.562085,2.520477,6.14993,-2.827599,10405.813755,-2771.18076,3166.926302,2.508879,6.152996,-2.826227
959,1820,2014-01-01 00:21:11.845,1,11195.606833,5078.653968,-839.076593,-1.126667,5.826412,-3.255872,11199.853336,5102.405255,-845.930363,-1.130992,5.822303,-3.250049
960,1821,2014-01-01 00:42:23.690,1,8235.556436,11445.904263,-4680.514023,-3.230227,4.126433,-2.704441,8239.504004,11465.13735,-4680.291877,-3.227356,4.123931,-2.700255
961,1822,2014-01-01 01:03:35.534,1,3560.149776,15634.195146,-7654.177182,-3.964696,2.520867,-1.978151,3569.107805,15652.050271,-7650.341207,-3.960215,2.520907,-1.976723
962,1823,2014-01-01 01:24:47.379,1,-1580.476891,18023.318335,-9755.287599,-4.050865,1.296388,-1.346512,-1566.253652,18042.14334,-9750.982621,-4.047005,1.297625,-1.34701


In [83]:
train1.tail(10)

Unnamed: 0,id,epoch,sat_id,x,y,z,Vx,Vy,Vz,x_sim,y_sim,z_sim,Vx_sim,Vy_sim,Vz_sim
3056,3917,2014-01-31 20:29:46.714,1,-26325.014144,10570.481429,-9072.938557,-2.258731,-1.407697,0.896216,-26649.642737,11504.337647,-8922.646171,-1.693458,-1.931831,0.878298
3057,3918,2014-01-31 20:50:58.559,1,-28881.463509,8664.346191,-7831.985167,-1.764817,-1.580165,1.048091,-28494.865916,8926.357454,-7708.398061,-1.206114,-2.112617,1.025474
3058,3919,2014-01-31 21:12:10.404,1,-30821.671271,6572.487863,-6422.856696,-1.288627,-1.701815,1.162225,-29713.169888,6151.648233,-6324.240838,-0.707152,-2.241929,1.145855
3059,3920,2014-01-31 21:33:22.249,1,-32164.222216,4352.825564,-4888.483748,-0.823985,-1.782364,1.24598,-30287.447717,3243.635732,-4802.930087,-0.192718,-2.322406,1.24126
3060,3921,2014-01-31 21:54:34.093,1,-32920.027847,2053.732736,-3264.47988,-0.364995,-1.827375,1.303696,-30194.980398,264.597575,-3176.038569,0.342144,-2.353209,1.311728
3061,3922,2014-01-31 22:15:45.938,1,-33092.618399,-281.511542,-1582.378975,0.094088,-1.839427,1.337545,-29406.078919,-2719.769909,-1476.299104,0.903445,-2.329687,1.355335
3062,3923,2014-01-31 22:36:57.783,1,-32678.193673,-2611.293944,127.87953,0.55909,-1.818598,1.347891,-27882.742274,-5634.903507,259.572177,1.498239,-2.242285,1.367602
3063,3924,2014-01-31 22:58:09.628,1,-31665.342406,-4892.533056,1835.695504,1.036204,-1.76246,1.333303,-25577.429154,-8389.868466,1986.924648,2.134397,-2.074305,1.340251
3064,3925,2014-01-31 23:19:21.473,1,-30034.378237,-7077.220106,3507.336506,1.532305,-1.665557,1.290191,-22432.558095,-10865.83635,3646.968029,2.819524,-1.797623,1.258773
3065,3926,2014-01-31 23:40:33.318,1,-27756.26069,-9107.885238,5102.708242,2.055278,-1.518181,1.211938,-18382.812351,-12896.945935,5156.151522,3.557107,-1.364583,1.097643


In [84]:
# save that to a pickle file
train1.to_pickle('train1.pkl')

In [85]:
train1.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2108 entries, 958 to 3065
Data columns (total 15 columns):
id        2108 non-null uint32
epoch     2108 non-null datetime64[ns]
sat_id    2108 non-null uint16
x         2108 non-null float64
y         2108 non-null float64
z         2108 non-null float64
Vx        2108 non-null float64
Vy        2108 non-null float64
Vz        2108 non-null float64
x_sim     2108 non-null float64
y_sim     2108 non-null float64
z_sim     2108 non-null float64
Vx_sim    2108 non-null float64
Vy_sim    2108 non-null float64
Vz_sim    2108 non-null float64
dtypes: datetime64[ns](1), float64(12), uint16(1), uint32(1)
memory usage: 226.5 KB


## Doing the same for Test

In [54]:
%%time
test = pd.read_csv('test.csv')

Wall time: 6.1 s


In [55]:
test.info(memory_usage='deep')

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 284071 entries, 0 to 284070
Data columns (total 9 columns):
id        284071 non-null int64
sat_id    284071 non-null int64
epoch     284071 non-null object
x_sim     284071 non-null float64
y_sim     284071 non-null float64
z_sim     284071 non-null float64
Vx_sim    284071 non-null float64
Vy_sim    284071 non-null float64
Vz_sim    284071 non-null float64
dtypes: float64(6), int64(2), object(1)
memory usage: 39.0 MB


In [56]:
test.memory_usage(deep=True) * 1e-6

Index      0.000080
id         2.272568
sat_id     2.272568
epoch     22.725680
x_sim      2.272568
y_sim      2.272568
z_sim      2.272568
Vx_sim     2.272568
Vy_sim     2.272568
Vz_sim     2.272568
dtype: float64

In [57]:
test['epoch'] = pd.to_datetime(test['epoch'], infer_datetime_format=True)

In [58]:
test['id'] = pd.to_numeric(test['id'], downcast='unsigned')

In [59]:
test['sat_id'] = pd.to_numeric(test['sat_id'], downcast='unsigned')

In [60]:
test.info(memory_usage='deep')

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 284071 entries, 0 to 284070
Data columns (total 9 columns):
id        284071 non-null uint32
sat_id    284071 non-null uint16
epoch     284071 non-null datetime64[ns]
x_sim     284071 non-null float64
y_sim     284071 non-null float64
z_sim     284071 non-null float64
Vx_sim    284071 non-null float64
Vy_sim    284071 non-null float64
Vz_sim    284071 non-null float64
dtypes: datetime64[ns](1), float64(6), uint16(1), uint32(1)
memory usage: 16.8 MB


In [61]:
test.memory_usage(deep=True) * 1e-6

Index     0.000080
id        1.136284
sat_id    0.568142
epoch     2.272568
x_sim     2.272568
y_sim     2.272568
z_sim     2.272568
Vx_sim    2.272568
Vy_sim    2.272568
Vz_sim    2.272568
dtype: float64

In [62]:
# can I reuse first_indices to extract Sat_1 from the test data?
# no: the test data covers February and continues on from the January train data, so its row positions differ
# first observation ID of Test_Sat_1 is 3927
test.head()

Unnamed: 0,id,sat_id,epoch,x_sim,y_sim,z_sim,Vx_sim,Vy_sim,Vz_sim
0,3927,1,2014-02-01 00:01:45.162,-13366.891347,-14236.753503,6386.774555,4.333815,-0.692764,0.810774
1,3928,1,2014-02-01 00:22:57.007,-7370.434039,-14498.77152,7130.411325,5.077413,0.360609,0.313402
2,3929,1,2014-02-01 00:44:08.852,-572.068654,-13065.289498,7033.794876,5.519106,2.01283,-0.539412
3,3930,1,2014-02-01 01:05:20.697,6208.945257,-9076.852425,5548.2969,4.849212,4.338955,-1.8696
4,3931,1,2014-02-01 01:26:32.542,10768.200284,-2199.706707,2272.014862,1.940505,6.192887,-3.167724


In [63]:
# last observation ID of Train_Sat_1 is 3926
train.iloc[first_indices[1]:first_indices[2],:].tail()

Unnamed: 0,id,epoch,sat_id,x,y,z,Vx,Vy,Vz,x_sim,y_sim,z_sim,Vx_sim,Vy_sim,Vz_sim
3061,3922,2014-01-31 22:15:45.938,1,-33092.618399,-281.511542,-1582.378975,0.094088,-1.839427,1.337545,-29406.078919,-2719.769909,-1476.299104,0.903445,-2.329687,1.355335
3062,3923,2014-01-31 22:36:57.783,1,-32678.193673,-2611.293944,127.87953,0.55909,-1.818598,1.347891,-27882.742274,-5634.903507,259.572177,1.498239,-2.242285,1.367602
3063,3924,2014-01-31 22:58:09.628,1,-31665.342406,-4892.533056,1835.695504,1.036204,-1.76246,1.333303,-25577.429154,-8389.868466,1986.924648,2.134397,-2.074305,1.340251
3064,3925,2014-01-31 23:19:21.473,1,-30034.378237,-7077.220106,3507.336506,1.532305,-1.665557,1.290191,-22432.558095,-10865.83635,3646.968029,2.819524,-1.797623,1.258773
3065,3926,2014-01-31 23:40:33.318,1,-27756.26069,-9107.885238,5102.708242,2.055278,-1.518181,1.211938,-18382.812351,-12896.945935,5156.151522,3.557107,-1.364583,1.097643
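
For Sat_1 the Test ids pick up exactly where Train stops (3926, then 3927). A sketch of how that could be checked for every satellite at once with `groupby`; the two toy frames below only carry the Sat_1 ids seen above, so whether the pattern holds for the other satellites is still an open question:

```python
import pandas as pd

# Hypothetical miniatures holding a few of the Sat_1 ids shown above.
train_toy = pd.DataFrame({"id": [1819, 1820, 3926], "sat_id": [1, 1, 1]})
test_toy = pd.DataFrame({"id": [3927, 3928, 5827], "sat_id": [1, 1, 1]})

# Last id per satellite in train, first id per satellite in test.
last_train_id = train_toy.groupby("sat_id")["id"].max()
first_test_id = test_toy.groupby("sat_id")["id"].min()

# Subtraction aligns on sat_id; satellites missing from either side drop out.
gap = (first_test_id - last_train_id).dropna()
print(gap)  # a gap of 1 means test continues directly after train
```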


### 22-Jan-2020

In [64]:
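first_indices_test = []
for sat_no in test['sat_id'].unique():
    # np.ravel handles both a 1-element array (older pandas) and a scalar (newer pandas)
    first_id = int(np.ravel(test['sat_id'].searchsorted(sat_no))[0])
    first_indices_test.append(first_id)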

In [65]:
first_indices_test[1]

1901

In [66]:
test.iloc[1900]

id                              5827
sat_id                             1
epoch     2014-02-28 22:55:38.449000
x_sim                        10996.4
y_sim                        -1764.7
z_sim                        1427.18
Vx_sim                       1.42443
Vy_sim                       6.22262
Vz_sim                      -3.37183
Name: 1900, dtype: object

In [67]:
test.iloc[1901]

id                              6245
sat_id                             2
epoch     2014-02-01 00:07:54.678000
x_sim                       -64199.4
y_sim                        52297.7
z_sim                       -16704.6
Vx_sim                      -1.31379
Vy_sim                     -0.632881
Vz_sim                      0.952439
Name: 1901, dtype: object

In [68]:
test.head()

Unnamed: 0,id,sat_id,epoch,x_sim,y_sim,z_sim,Vx_sim,Vy_sim,Vz_sim
0,3927,1,2014-02-01 00:01:45.162,-13366.891347,-14236.753503,6386.774555,4.333815,-0.692764,0.810774
1,3928,1,2014-02-01 00:22:57.007,-7370.434039,-14498.77152,7130.411325,5.077413,0.360609,0.313402
2,3929,1,2014-02-01 00:44:08.852,-572.068654,-13065.289498,7033.794876,5.519106,2.01283,-0.539412
3,3930,1,2014-02-01 01:05:20.697,6208.945257,-9076.852425,5548.2969,4.849212,4.338955,-1.8696
4,3931,1,2014-02-01 01:26:32.542,10768.200284,-2199.706707,2272.014862,1.940505,6.192887,-3.167724


In [69]:
len(test['sat_id'].unique())

300

In [70]:
test.iloc[:1901,:].tail()

Unnamed: 0,id,sat_id,epoch,x_sim,y_sim,z_sim,Vx_sim,Vy_sim,Vz_sim
1896,5823,1,2014-02-28 21:30:51.070,-12172.502254,-14729.896827,7614.241477,4.406935,-0.495499,0.456099
1897,5824,1,2014-02-28 21:52:02.915,-6136.919502,-14726.318844,7862.718555,5.058846,0.579143,-0.109441
1898,5825,1,2014-02-28 22:13:14.759,553.235044,-13011.265336,7194.565001,5.357535,2.231405,-1.008146
1899,5826,1,2014-02-28 22:34:26.604,7007.220162,-8771.583052,5124.655752,4.490665,4.502656,-2.296487
1900,5827,1,2014-02-28 22:55:38.449,10996.426875,-1764.70356,1427.176527,1.424435,6.222616,-3.371827


In [71]:
train['epoch'][0]

Timestamp('2014-01-01 00:00:00')

In [51]:
# insert abs_time for test data, calculated since 1 Jan 2014
test.insert(3, 'abs_time', test['epoch']-train['epoch'][0])

In [52]:
test.head()

Unnamed: 0,id,sat_id,epoch,abs_time,x_sim,y_sim,z_sim,Vx_sim,Vy_sim,Vz_sim
0,3927,1,2014-02-01 00:01:45.162,31 days 00:01:45.162000,-13366.891347,-14236.753503,6386.774555,4.333815,-0.692764,0.810774
1,3928,1,2014-02-01 00:22:57.007,31 days 00:22:57.007000,-7370.434039,-14498.77152,7130.411325,5.077413,0.360609,0.313402
2,3929,1,2014-02-01 00:44:08.852,31 days 00:44:08.852000,-572.068654,-13065.289498,7033.794876,5.519106,2.01283,-0.539412
3,3930,1,2014-02-01 01:05:20.697,31 days 01:05:20.697000,6208.945257,-9076.852425,5548.2969,4.849212,4.338955,-1.8696
4,3931,1,2014-02-01 01:26:32.542,31 days 01:26:32.542000,10768.200284,-2199.706707,2272.014862,1.940505,6.192887,-3.167724


In [72]:
# test is finally ready to be written as pickle file
test.to_pickle('test_initial.pkl')

In [54]:
del test

In [73]:
%%time
test = pd.read_pickle('test_initial.pkl')

Wall time: 71.8 ms


In [74]:
# extract Sat_1 test data (test starts at sat_id 1, so first_indices_test[1] is where Sat_2 begins) and write to pickle file
test1 = test.iloc[:first_indices_test[1],:]

In [75]:
test1.to_pickle('test1.pkl')
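
One thing to watch: Test starts at `sat_id` 1, so a position in `first_indices_test` is not the same as a satellite number. Selecting by value sidesteps that. A sketch with a hypothetical helper `select_satellite` on a made-up frame:

```python
import pandas as pd

# Hypothetical miniature of test: satellites 1 and 2 only, as in the first rows above.
test_toy = pd.DataFrame({
    "id": [3927, 3928, 6245, 6246],
    "sat_id": [1, 1, 2, 2],
})


def select_satellite(frame, sat_id):
    """Return a copy of all rows belonging to one satellite."""
    # Boolean mask on the value, so row order and missing ids do not matter.
    return frame.loc[frame["sat_id"] == sat_id].copy()


sat1 = select_satellite(test_toy, 1)
print(sat1)

# A satellite that is absent from the frame simply gives an empty result.
print(select_satellite(test_toy, 0).empty)
```

The `.copy()` keeps later column assignments on the subset from triggering pandas' chained-assignment warning.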