# Luftdaten data : data cleaning, resampling - mini version

# - RETRYING THIS, just to make sure we've got the hang of it! 
## Code builds a continuous time tabular version of the luftdaten data, such that the same time period is present for each sensor in the data, regardless of whether each sensor has data for all the time slots. 

## Testing :
- using pd.resample
- constructing a time shift using pandas own tools, rather than my own


#### Reference documents

Resampling time series data with Pandas ( Ben Alex Keen ) 
http://benalexkeen.com/resampling-time-series-data-with-pandas/

Pandas reference manual : 

.at - access df values by label ( any kind of index ), for SINGLE VALUES
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.at.html

.iat - only integer index values for getting/setting SINGLE df values
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.iat.html

.loc - label based access ( getting/setting ) to rows and columns
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.loc.html#pandas.DataFrame.loc

.iloc - purely integer indexed access ( getting/setting ) values 
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.iloc.html#pandas.DataFrame.iloc
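
To keep the four accessors apart, here is a minimal sketch on a made-up three-row frame. The frame `toy` and its values are assumptions for illustration only; the column names simply mirror the Luftdaten CSV used further down.

```python
import pandas as pd

# Hypothetical toy frame; not the real Luftdaten data.
toy = pd.DataFrame(
    {"p1": [6.3, 174.8, 1.62], "p2": [2.6, 15.13, 1.02]},
    index=pd.to_datetime(
        ["2018-12-31 11:55:19", "2018-12-31 11:56:37", "2018-12-31 11:56:38"]
    ),
)

first_label = toy.index[0]
second_label = toy.index[1]

single_by_label = toy.at[first_label, "p1"]               # .at: one value, by labels
single_by_position = toy.iat[0, 0]                        # .iat: one value, by integer positions
rows_by_label = toy.loc[first_label:second_label, ["p1"]]  # .loc: label slices include the end
rows_by_position = toy.iloc[0:2, 0:1]                     # .iloc: integer slices exclude the end

print(single_by_label, single_by_position)
print(rows_by_label.equals(rows_by_position))
```

The point to remember is the slicing difference: `.loc` includes the end label, `.iloc` stops before the end position.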

datetime - documentation - useful for time! 
https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html

#### methods of filling … 

These are some of the common methods you might use for resampling:

| Method | Description |
|---|---|
| bfill | Backward fill |
| count | Count of values |
| ffill | Forward fill |
| first | First valid data value |
| last | Last valid data value |
| max | Maximum data value |
| mean | Mean of values in time range |
| median | Median of values in time range |
| min | Minimum data value |
| nunique | Number of unique values |
| ohlc | Opening value, highest value, lowest value, closing value |
| pad | Same as forward fill |
| std | Standard deviation of values |
| sum | Sum of values |
| var | Variance of values |
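
A minimal sketch of how a few of these methods behave on irregular readings. The series `readings` and its values are made up for illustration; only the 5-minute bin size matches what this notebook uses later.

```python
import pandas as pd

# Hypothetical readings at irregular times, loosely modelled on one sensor.
readings = pd.Series(
    [4.0, 6.0, 8.0, 10.0],
    index=pd.to_datetime(
        ["2018-12-31 12:00:55", "2018-12-31 12:03:25",
         "2018-12-31 12:05:55", "2018-12-31 12:16:10"]
    ),
    name="p1",
)

five_min = readings.resample("5min")   # "5min" is the same frequency as "5T"

means = five_min.mean()     # bins without readings become NaN
counts = five_min.count()   # bins without readings become 0
filled = means.ffill()      # carry the last known mean forward into empty bins

print(pd.DataFrame({"mean": means, "count": counts, "ffill": filled}))
```

The 12:10 bin has no readings, so it shows up as NaN in `means`, as 0 in `counts`, and takes the previous bin's value in `filled`.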

#### time abbreviations 

| Alias | Description |
|---|---|
| B | Business day |
| D | Calendar day |
| W | Weekly |
| M | Month end |
| Q | Quarter end |
| A | Year end |
| BA | Business year end |
| AS | Year start |
| H | Hourly frequency |
| T, min | Minutely frequency |
| S | Secondly frequency |
| L, ms | Millisecond frequency |
| U, us | Microsecond frequency |
| N, ns | Nanosecond frequency |
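
As a quick sketch of these aliases in use, the snippet below builds a regular 5-minute grid with `pd.date_range`. The names `window_start`, `window_end` and `five_minute_grid` are illustrative; the two timestamps are the 24-hour window this notebook sets up later.

```python
import pandas as pd

# Start and end mirror the window used later in this notebook.
window_start = pd.Timestamp("2018-12-31 12:00:00")
window_end = pd.Timestamp("2019-01-01 12:00:00")

# "5min" is equivalent to "5T"; pandas 2.2 deprecated "T" in favour of "min".
five_minute_grid = pd.date_range(start=window_start, end=window_end, freq="5min")

# 24 hours * 12 five-minute steps, plus one because both endpoints are included.
print(len(five_minute_grid))
```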

In [1]:
import pandas as pd
import numpy as np
import time

In [2]:
# parameters

# start_time = "2018-12-31 21:58:42"
end_time = "2019-01-01 11:58:42"
# generate this please
start_time = "?????"

time_frequency_for_periods__for_basic_data = "5T"
num_of_time_periods___for_basic_data = 24*12 # 24 hrs * 12 five-minute periods in each hour

# when generating time periods 
sampling_frequency = "3T"



# --- data urls 

curr_url = "????"
nordic_midnight_24_hrs_data__url = "/Users/miska/Documents/open_something/luftdaten/luftdaten_code/luftdaten__make_tabular_data__from_db_data/ld_NYE_midnight_24hrs_nordics_all_data_01.csv"
# nordic_midnight_24_hrs_data__url = "/home/miska/documents/opensomething/luftdaten/dustmin_to_csv__various_code/ld_NYE_midnight_24hrs_nordics_all_data_01.csv"



# set the current data source 
curr_url = nordic_midnight_24_hrs_data__url

In [3]:
# read the raw CSV ( the timestamp column is converted to a proper datetime further down )

in_data = pd.read_csv( curr_url )
in_data.shape

(127109, 7)

#### basic data checking

In [4]:
in_data.dtypes

sensor_id         int64
sensor_namee     object
lat             float64
lon             float64
timestamp        object
p1              float64
p2              float64
dtype: object

In [5]:
# is the timestamp column not an official timestamp column?
type( in_data['timestamp'][0] )

str

In [6]:
# aha - timestamp column not a timestamp column?
# - let's fix 
in_data['timestamp'] = pd.to_datetime( in_data['timestamp'] )

In [7]:
# check the timestamps column type again
type( in_data['timestamp'][0] )

pandas._libs.tslibs.timestamps.Timestamp
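
As an alternative sketch, the conversion and the index step can be done while reading the file. The CSV text below is a made-up two-row stand-in for the real file, so `csv_text` and `parsed` are assumptions; `parse_dates` and `index_col` are standard `pd.read_csv` parameters.

```python
import io
import pandas as pd

# Hypothetical two-row CSV with the same column names as the real file.
csv_text = (
    "sensor_id,sensor_namee,lat,lon,timestamp,p1,p2\n"
    "13012,SDS011,57.662,12.006,2018-12-31 11:55:19,6.3,2.6\n"
    "18112,SDS011,57.478,11.978,2018-12-31 11:56:37,174.8,15.13\n"
)

parsed = pd.read_csv(
    io.StringIO(csv_text),
    parse_dates=["timestamp"],   # convert the column while reading
    index_col="timestamp",       # and use it as the index straight away
)

print(type(parsed.index))
```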

In [8]:
# set the timestamp column as the index 
in_data = in_data.set_index( 'timestamp' )

In [9]:
in_data.index

DatetimeIndex(['2018-12-31 11:57:22', '2018-12-31 11:58:44',
               '2018-12-31 11:58:47', '2018-12-31 11:56:41',
               '2018-12-31 11:57:42', '2018-12-31 11:57:52',
               '2018-12-31 11:58:51', '2018-12-31 11:58:28',
               '2018-12-31 11:57:18', '2018-12-31 11:57:22',
               ...
               '2019-01-01 11:59:41', '2019-01-01 11:59:46',
               '2019-01-01 11:57:19', '2019-01-01 11:59:59',
               '2019-01-01 11:56:55', '2019-01-01 11:58:57',
               '2019-01-01 11:59:36', '2019-01-01 11:59:41',
               '2019-01-01 11:57:12', '2019-01-01 11:58:42'],
              dtype='datetime64[ns]', name='timestamp', length=127109, freq=None)

In [10]:
in_data = in_data.sort_index()

In [11]:
# check
in_data.index

DatetimeIndex(['2018-12-31 11:55:19', '2018-12-31 11:56:37',
               '2018-12-31 11:56:38', '2018-12-31 11:56:39',
               '2018-12-31 11:56:40', '2018-12-31 11:56:40',
               '2018-12-31 11:56:40', '2018-12-31 11:56:41',
               '2018-12-31 11:56:42', '2018-12-31 11:56:43',
               ...
               '2019-01-01 23:59:54', '2019-01-01 23:59:54',
               '2019-01-01 23:59:55', '2019-01-01 23:59:55',
               '2019-01-01 23:59:56', '2019-01-01 23:59:56',
               '2019-01-01 23:59:56', '2019-01-01 23:59:56',
               '2019-01-01 23:59:57', '2019-01-01 23:59:58'],
              dtype='datetime64[ns]', name='timestamp', length=127109, freq=None)

In [12]:
# order by time? 
in_data[:20]

timestamp,sensor_id,sensor_namee,lat,lon,p1,p2
2018-12-31 11:55:19,13012,SDS011,57.662,12.006,6.3,2.6
2018-12-31 11:56:37,18112,SDS011,57.478,11.978,174.8,15.13
2018-12-31 11:56:38,15067,SDS011,60.024,18.77,1.62,1.02
2018-12-31 11:56:39,11765,SDS011,55.716,13.244,33.95,13.4
2018-12-31 11:56:40,14811,SDS011,57.706,11.9,63.25,10.33
2018-12-31 11:56:40,10827,SDS011,59.334,13.512,11.45,6.0
2018-12-31 11:56:40,17538,SDS011,55.612,12.972,13.68,3.12
2018-12-31 11:56:41,7406,SDS011,56.964,24.128,11.05,6.62
2018-12-31 11:56:42,16155,SDS011,59.832,17.632,1.66,1.1
2018-12-31 11:56:43,11058,SDS011,59.272,15.22,2.99,1.46


In [13]:
# this works :) 
# in_data = in_data.sort_index()

In [14]:
# let's try sorting the index in a different way … just for the sake of trying
in_data = in_data.sort_values( by='timestamp' )

#### quick data printout

In [15]:
in_data.head()

timestamp,sensor_id,sensor_namee,lat,lon,p1,p2
2018-12-31 11:55:19,13012,SDS011,57.662,12.006,6.3,2.6
2018-12-31 11:56:37,18112,SDS011,57.478,11.978,174.8,15.13
2018-12-31 11:56:38,15067,SDS011,60.024,18.77,1.62,1.02
2018-12-31 11:56:39,11765,SDS011,55.716,13.244,33.95,13.4
2018-12-31 11:56:40,14811,SDS011,57.706,11.9,63.25,10.33


In [16]:
in_data.tail()

timestamp,sensor_id,sensor_namee,lat,lon,p1,p2
2019-01-01 23:59:56,12129,SDS011,59.34,18.04,1.85,0.8
2019-01-01 23:59:56,17538,SDS011,55.612,12.972,6.08,1.53
2019-01-01 23:59:56,12131,SDS011,59.45,17.916,1.48,0.52
2019-01-01 23:59:57,10843,SDS011,59.354,18.364,7.26,1.6
2019-01-01 23:59:58,11374,SDS011,59.258,18.008,2.65,2.01


#### quick data exploration 

In [17]:
# do a quick count of how many rows fall within a single five-minute window

In [18]:
first_five_mins_rows = in_data.loc[ '2019-01-01 12:00:00' : '2019-01-01 12:05:00' ]
first_five_mins_rows.shape

(314, 6)

In [19]:
# just check how many of the sensors have entries in that five-minute window, compared with all sensors
first_five_mins_rows['sensor_id'].unique().shape, in_data['sensor_id'].unique().shape

((178,), (205,))
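
A small sketch of the same check that also lists which sensors are silent in the window. The frame `toy` and its sensor ids are made up for illustration.

```python
import pandas as pd

# Hypothetical readings for three sensors; sensor 3 is silent in the window.
toy = pd.DataFrame(
    {"sensor_id": [1, 2, 1, 3, 2], "p1": [1.0, 2.0, 3.0, 4.0, 5.0]},
    index=pd.to_datetime(
        ["2019-01-01 12:00:10", "2019-01-01 12:01:00", "2019-01-01 12:03:30",
         "2019-01-01 12:07:00", "2019-01-01 12:08:00"]
    ),
)

window = toy.loc["2019-01-01 12:00:00":"2019-01-01 12:05:00"]

sensors_in_window = window["sensor_id"].nunique()
sensors_overall = toy["sensor_id"].nunique()
missing_sensors = sorted(set(toy["sensor_id"]) - set(window["sensor_id"]))

print(sensors_in_window, sensors_overall, missing_sensors)
```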

###### construct the different times we could use for different indexing 

####### - eg for inserting blank rows, to get similar time periods in each sensors data
####### - eg for querying different times … eg at the beginning / end of the data series for each sensor 


In [20]:
start_time = pd.to_datetime( '2018-12-31 12:00:00' )
start_time

Timestamp('2018-12-31 12:00:00')

In [21]:
startime_plus_five_mins = start_time + pd.offsets.Minute( 5 )
startime_plus_five_mins

Timestamp('2018-12-31 12:05:00')

In [22]:
end_time = pd.to_datetime( '2019-01-01 12:00:00' )
end_time

Timestamp('2019-01-01 12:00:00')

In [23]:
end_time_minus_five_mins = end_time - pd.offsets.Minute(5 )
end_time_minus_five_mins

Timestamp('2019-01-01 11:55:00')
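
Besides `pd.offsets.Minute`, a `Timestamp` can be snapped to a bin boundary directly. A minimal sketch, using the first timestamp seen in the sorted index above; the variable names are illustrative.

```python
import pandas as pd

first_reading = pd.Timestamp("2018-12-31 11:55:19")   # first timestamp in the sorted index

floored = first_reading.floor("5min")        # start of the 5-minute bin holding the reading
ceiled = first_reading.ceil("5min")          # next 5-minute boundary
shifted = ceiled + pd.Timedelta(minutes=5)   # same effect as adding pd.offsets.Minute(5)

print(floored, ceiled, shifted)
```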

In [24]:
in_data_start_time = in_data[:1]

In [25]:
type( in_data_start_time )

pandas.core.frame.DataFrame

In [26]:
type( in_data_start_time.index ) #iat[ 0, ]

pandas.core.indexes.datetimes.DatetimeIndex

In [27]:
in_data_start_time.index

DatetimeIndex(['2018-12-31 11:55:19'], dtype='datetime64[ns]', name='timestamp', freq=None)

In [28]:
# set_index returns a new DataFrame; in_data_start_time itself is left unchanged
in_data_start_time.set_index( pd.DatetimeIndex( [ start_time ] )   )

Unnamed: 0,sensor_id,sensor_namee,lat,lon,p1,p2
2018-12-31 12:00:00,13012,SDS011,57.662,12.006,6.3,2.6


In [29]:
in_data_start_time

timestamp,sensor_id,sensor_namee,lat,lon,p1,p2
2018-12-31 11:55:19,13012,SDS011,57.662,12.006,6.3,2.6


In [30]:
### make new template row for the time rows we'll insert later

In [31]:
in_data_start_time[:1]

timestamp,sensor_id,sensor_namee,lat,lon,p1,p2
2018-12-31 11:55:19,13012,SDS011,57.662,12.006,6.3,2.6


In [32]:
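# np.nan replaces the np.NaN alias, which was removed in NumPy 2.0
# careful: this is chained indexing ( [:1][...] = ... ), so pandas does not guarantee where the write lands
in_data_start_time[:1]['sensor_namee'] = np.nan
in_data_start_time[:1]['sensor_id'] = np.nan
in_data_start_time[:1]['p1'] = np.nan
in_data_start_time[:1]['p2'] = np.nan
in_data_start_time[:1]['lat'] = np.nan
in_data_start_time[:1]['lon'] = np.nan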
in_data_start_time

timestamp,sensor_id,sensor_namee,lat,lon,p1,p2
2018-12-31 11:55:19,13012,SDS011,,,,


In [33]:
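# NO EFFECT :-( the write goes to a temporary slice object ( chained indexing ), so it is not guaranteed to reach in_data_start_time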
in_data_start_time[:1]['sensor_namee'] = 13

In [34]:
in_data_start_time

timestamp,sensor_id,sensor_namee,lat,lon,p1,p2
2018-12-31 11:55:19,13012,SDS011,,,,
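
The mixed results above come from chained indexing. A minimal sketch of a more predictable way to blank a template row is to take an explicit copy and assign through a single `.loc` call. The frame `toy` and the name `template_row` are assumptions for illustration, not the notebook's own variables.

```python
import numpy as np
import pandas as pd

# Hypothetical two-row frame with a subset of the real columns.
toy = pd.DataFrame(
    {"sensor_id": [13012, 18112], "sensor_namee": ["SDS011", "SDS011"],
     "p1": [6.3, 174.8], "p2": [2.6, 15.13]},
    index=pd.to_datetime(["2018-12-31 11:55:19", "2018-12-31 11:56:37"]),
)

# Take an explicit copy so the edits cannot leak back into `toy`.
template_row = toy.iloc[:1].copy()

# Let the integer id column hold a missing value.
template_row["sensor_id"] = template_row["sensor_id"].astype("float64")

# One .loc call selects rows and columns together, instead of chained indexing.
template_row.loc[:, ["sensor_id", "p1", "p2"]] = np.nan

print(template_row)
print(toy.head(1))   # the source frame still has its original values
```

With the copy in place, the source frame keeps its first row intact, which matters here because the chained writes above also blanked the first row of `in_data`.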


### make template row for blank data, with right time, rows, to insert later
#### to make the time data for the sensor data have the same start and end 

In [35]:
in_data_templace_for_different_new_time_rows__series = in_data_start_time.iloc[0]

In [36]:
in_data_templace_for_different_new_time_rows__series

sensor_id        13012
sensor_namee    SDS011
lat                NaN
lon                NaN
p1                 NaN
p2                 NaN
Name: 2018-12-31 11:55:19, dtype: object

In [37]:
# now set the other values in the start time
in_data_templace_for_different_new_time_rows__series.at['p1'] = np.nan
in_data_templace_for_different_new_time_rows__series.at['p2'] = np.nan
in_data_templace_for_different_new_time_rows__series.at['lat'] = np.nan
in_data_templace_for_different_new_time_rows__series.at['lon'] = np.nan
# in_data_templace_for_different_new_time_rows__series.at['sensor_namee'] = np.nan
in_data_templace_for_different_new_time_rows__series.at['sensor_id'] = np.nan
in_data_templace_for_different_new_time_rows__series.name = start_time
in_data_templace_for_different_new_time_rows__series

sensor_id          NaN
sensor_namee    SDS011
lat                NaN
lon                NaN
p1                 NaN
p2                 NaN
Name: 2018-12-31 12:00:00, dtype: object

##### the (blank) START time data row 

In [38]:
start_time__blank_data_row = in_data_templace_for_different_new_time_rows__series
start_time__blank_data_row

sensor_id          NaN
sensor_namee    SDS011
lat                NaN
lon                NaN
p1                 NaN
p2                 NaN
Name: 2018-12-31 12:00:00, dtype: object

##### the (blank) END time data row 

In [39]:
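# take a copy, otherwise renaming the end row would also rename the start row ( same object )
end_time__blank_data_row = in_data_templace_for_different_new_time_rows__series.copy()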
end_time__blank_data_row.name = end_time
end_time__blank_data_row

sensor_id          NaN
sensor_namee    SDS011
lat                NaN
lon                NaN
p1                 NaN
p2                 NaN
Name: 2019-01-01 12:00:00, dtype: object

### TRY 2 : make blank data frame rows with the desired start and end times 
- this time make it MINIMAL, with only p1 and p2, and time index ;) 

In [40]:
blank_mininmal_START_TIME_row = pd.DataFrame( data={ 'p1' : np.nan, 'p2' : np.nan }, index=pd.DatetimeIndex( [ start_time ] ) )
blank_mininmal_START_TIME_row

Unnamed: 0,p1,p2
2018-12-31 12:00:00,,


In [41]:
blank_mininmal_END_TIME_row = pd.DataFrame( data={ 'p1' : np.nan, 'p2' : np.nan }, index=pd.DatetimeIndex( [ end_time ] ) )
blank_mininmal_END_TIME_row

Unnamed: 0,p1,p2
2019-01-01 12:00:00,,


## TEST ZONE 

In [42]:
# try making a dataframe with only given values 
given_sensors_data = in_data[ in_data['sensor_id'] == 13012]
given_sensors_data.head()

timestamp,sensor_id,sensor_namee,lat,lon,p1,p2
2018-12-31 11:55:19,13012,SDS011,,,,
2018-12-31 12:00:55,13012,SDS011,57.662,12.006,4.23,2.7
2018-12-31 12:03:25,13012,SDS011,57.662,12.006,4.98,2.66
2018-12-31 12:05:55,13012,SDS011,57.662,12.006,7.73,2.85
2018-12-31 12:08:25,13012,SDS011,57.662,12.006,8.76,3.78


In [43]:
given_sensors_data.shape

(706, 6)

In [44]:
given_sensors_data_MINIMAL = given_sensors_data[ ['p1', 'p2'] ]

In [45]:
given_sensors_data_MINIMAL.head()

timestamp,p1,p2
2018-12-31 11:55:19,,
2018-12-31 12:00:55,4.23,2.7
2018-12-31 12:03:25,4.98,2.66
2018-12-31 12:05:55,7.73,2.85
2018-12-31 12:08:25,8.76,3.78


In [46]:
given_sensors_data_MINIMAL.index

DatetimeIndex(['2018-12-31 11:55:19', '2018-12-31 12:00:55',
               '2018-12-31 12:03:25', '2018-12-31 12:05:55',
               '2018-12-31 12:08:25', '2018-12-31 12:13:42',
               '2018-12-31 12:16:10', '2018-12-31 12:18:39',
               '2018-12-31 12:21:07', '2018-12-31 12:23:52',
               ...
               '2019-01-01 23:23:57', '2019-01-01 23:26:27',
               '2019-01-01 23:34:46', '2019-01-01 23:34:46',
               '2019-01-01 23:42:57', '2019-01-01 23:45:35',
               '2019-01-01 23:48:07', '2019-01-01 23:50:37',
               '2019-01-01 23:55:37', '2019-01-01 23:58:07'],
              dtype='datetime64[ns]', name='timestamp', length=706, freq=None)

In [47]:
# now try combining this table with the blank start/end times rows generated earlier

### testing replacing values with various measures 

In [74]:
# note: NaN != NaN is always True, so this comparison keeps every row, including the NaN one
given_sensors_data[ given_sensors_data['p1'] != np.nan ]

timestamp,sensor_id,sensor_namee,lat,lon,p1,p2
2018-12-31 11:55:19,13012,SDS011,,,,
2018-12-31 12:00:55,13012,SDS011,57.662,12.006,4.23,2.70
2018-12-31 12:03:25,13012,SDS011,57.662,12.006,4.98,2.66
2018-12-31 12:05:55,13012,SDS011,57.662,12.006,7.73,2.85
2018-12-31 12:08:25,13012,SDS011,57.662,12.006,8.76,3.78
2018-12-31 12:13:42,13012,SDS011,57.662,12.006,6.85,2.88
2018-12-31 12:16:10,13012,SDS011,57.662,12.006,5.30,2.74
2018-12-31 12:18:39,13012,SDS011,57.662,12.006,6.44,2.90
2018-12-31 12:21:07,13012,SDS011,57.662,12.006,8.32,2.95
2018-12-31 12:23:52,13012,SDS011,57.662,12.006,7.62,2.80
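
The NaN row survives the filter above because NaN never compares equal to anything, itself included. A minimal sketch of the usual alternative, `isna()` / `notna()`, on a made-up frame `toy`:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one missing reading.
toy = pd.DataFrame(
    {"p1": [np.nan, 4.23, 4.98], "p2": [np.nan, 2.70, 2.66]},
    index=pd.to_datetime(
        ["2018-12-31 11:55:19", "2018-12-31 12:00:55", "2018-12-31 12:03:25"]
    ),
)

always_true = toy["p1"] != np.nan          # True for every row, NaN included
rows_with_data = toy[toy["p1"].notna()]    # keeps only rows that have a p1 value
rows_missing = toy[toy["p1"].isna()]       # keeps only rows where p1 is missing

print(always_true.all(), len(rows_with_data), len(rows_missing))
```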


### try making start and end time rows from this table, in the hope it'll work better… 

In [48]:
start_time_dataframe_row = given_sensors_data_MINIMAL.iloc[:1]

In [49]:
start_time_dataframe_row.shape

(1, 2)

In [50]:
start_time_dataframe_row

timestamp,p1,p2
2018-12-31 11:55:19,,


In [51]:
type( start_time_dataframe_row ) 

pandas.core.frame.DataFrame

In [52]:
start_time_dataframe_row.index = pd.DatetimeIndex( [start_time] )

In [53]:
start_time_dataframe_row

Unnamed: 0,p1,p2
2018-12-31 12:00:00,,


In [54]:
# pd.concat replaces DataFrame.append, which was removed in pandas 2.0
out = pd.concat( [ start_time_dataframe_row, given_sensors_data_MINIMAL ] )

In [55]:
out.head()

Unnamed: 0,p1,p2
2018-12-31 12:00:00,,
2018-12-31 11:55:19,,
2018-12-31 12:00:55,4.23,2.7
2018-12-31 12:03:25,4.98,2.66
2018-12-31 12:05:55,7.73,2.85


#### now make a blank end time… 

In [56]:
end_time_dataframe_row = given_sensors_data_MINIMAL.iloc[:1]

In [57]:
end_time_dataframe_row.index = pd.DatetimeIndex( [ end_time ] )

In [58]:
end_time_dataframe_row

Unnamed: 0,p1,p2
2019-01-01 12:00:00,,


In [59]:
# pd.concat replaces DataFrame.append, which was removed in pandas 2.0
out = pd.concat( [ out, end_time_dataframe_row ] )

In [60]:
out.head()

Unnamed: 0,p1,p2
2018-12-31 12:00:00,,
2018-12-31 11:55:19,,
2018-12-31 12:00:55,4.23,2.7
2018-12-31 12:03:25,4.98,2.66
2018-12-31 12:05:55,7.73,2.85


In [61]:
out.tail()

Unnamed: 0,p1,p2
2019-01-01 23:48:07,1.28,0.56
2019-01-01 23:50:37,1.36,0.65
2019-01-01 23:55:37,1.46,0.5
2019-01-01 23:58:07,1.12,0.46
2019-01-01 12:00:00,,
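
Note that the boundary rows end up out of time order in `out`. A minimal sketch of building both blank rows in one go and sorting afterwards; `sensor_rows`, `boundary_rows` and `padded` are illustrative names, and the readings are made up.

```python
import numpy as np
import pandas as pd

start_time = pd.Timestamp("2018-12-31 12:00:00")
end_time = pd.Timestamp("2019-01-01 12:00:00")

# Hypothetical readings for one sensor.
sensor_rows = pd.DataFrame(
    {"p1": [4.23, 4.98], "p2": [2.70, 2.66]},
    index=pd.to_datetime(["2018-12-31 12:00:55", "2018-12-31 12:03:25"]),
)

# Blank rows that pin the first and last timestamp of the window.
boundary_rows = pd.DataFrame(
    {"p1": [np.nan, np.nan], "p2": [np.nan, np.nan]},
    index=pd.DatetimeIndex([start_time, end_time]),
)

# Concatenate, then sort so the index is in time order again.
padded = pd.concat([sensor_rows, boundary_rows]).sort_index()

print(padded)
```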


#### now try to do the data resampling and filling, for each sensor, so there's the same time period in each sensor

In [62]:
##### work on a smaller section of the data
in_data__smlr = in_data[:]
in_data__smlr.shape

(127109, 6)

In [63]:
# check how it looks for how many sensors 
first_five_mins_rows__SMLR = in_data__smlr[ '2019-01-01 12:00:00' : '2019-01-01 12:05:00' ]
first_five_mins_rows__SMLR.shape, first_five_mins_rows__SMLR['sensor_id'].unique().shape

((314, 6), (178,))

In [76]:
# - for timing 
# ( named timer_start so it does not overwrite the start_time Timestamp defined earlier )

timer_start = time.time()

# - the rest

# let's cheat a bit and output things in a list
out_time_complete_arrays = []
out_time_complete_arrays__WITHOUT_WHERE_CLEANING = []

list_of_unique_sensor_ids = in_data__smlr['sensor_id'].unique()

length_of_list_of_unique_sensor_ids = len( list_of_unique_sensor_ids )

for curr_sensor_id in list_of_unique_sensor_ids:
    
    print("\n --  working on sensor id "+str( curr_sensor_id )+"/"+str(length_of_list_of_unique_sensor_ids) )
    
    # -- fetch all rows for a given sensor 
    curr_sensor_data = in_data__smlr[ in_data__smlr['sensor_id'] == curr_sensor_id  ]
    
    print("-- -- and that has "+str( curr_sensor_data.shape )+" entries " )
    
    # generate a minimised version of the data such that one only has the time index, p1 and p2 columns
    curr_sensor_data__minimised =  curr_sensor_data[ ['p1', 'p2'] ]
    
    # -- now add the blank start and end time rows 
    # ( pd.concat replaces DataFrame.append, which was removed in pandas 2.0 )
    curr_sensor_data_min_w_relv_start_n_end_times = pd.concat( [ start_time_dataframe_row, curr_sensor_data__minimised, end_time_dataframe_row ] )
    
    # -- now resample into 5 minute bins, taking the mean of each bin
    array_resampled = curr_sensor_data_min_w_relv_start_n_end_times.resample("5min").mean()
    
    # SAVE A COPY WITHOUT NaN CLEANING 
    out_time_complete_arrays__WITHOUT_WHERE_CLEANING.append( array_resampled )
    
    # - clean : fill the NaNs with 0
    array_resampled = array_resampled.fillna( 0 )
    
    # -- export
    out_time_complete_arrays.append( array_resampled )
    
    
print("\n -- | -- | -- |  and all that took "+str( time.time() - timer_start )+" seconds" )


 --  working on sensor id 13012/205
-- -- and that has (706, 6) entries 

 --  working on sensor id 18112/205
-- -- and that has (715, 6) entries 

 --  working on sensor id 15067/205
-- -- and that has (721, 6) entries 

 --  working on sensor id 11765/205
-- -- and that has (721, 6) entries 

 --  working on sensor id 14811/205
-- -- and that has (719, 6) entries 

 --  working on sensor id 10827/205
-- -- and that has (700, 6) entries 

 --  working on sensor id 17538/205
-- -- and that has (694, 6) entries 

 --  working on sensor id 7406/205
-- -- and that has (720, 6) entries 

 --  working on sensor id 16155/205
-- -- and that has (720, 6) entries 

 --  working on sensor id 11058/205
-- -- and that has (721, 6) entries 

 --  working on sensor id 16533/205
-- -- and that has (336, 6) entries 

 --  working on sensor id 14807/205
-- -- and that has (721, 6) entries 

 --  working on sensor id 10924/205
-- -- and that has (721, 6) entries 

 --  working on sensor id 10723/205
--


 --  working on sensor id 12385/205
-- -- and that has (659, 6) entries 

 --  working on sensor id 12673/205
-- -- and that has (720, 6) entries 

 --  working on sensor id 19337/205
-- -- and that has (721, 6) entries 

 --  working on sensor id 19547/205
-- -- and that has (722, 6) entries 

 --  working on sensor id 14801/205
-- -- and that has (703, 6) entries 

 --  working on sensor id 18590/205
-- -- and that has (721, 6) entries 

 --  working on sensor id 18820/205
-- -- and that has (720, 6) entries 

 --  working on sensor id 17712/205
-- -- and that has (719, 6) entries 

 --  working on sensor id 12679/205
-- -- and that has (234, 6) entries 

 --  working on sensor id 10647/205
-- -- and that has (721, 6) entries 

 --  working on sensor id 8683/205
-- -- and that has (721, 6) entries 

 --  working on sensor id 11552/205
-- -- and that has (357, 6) entries 

 --  working on sensor id 14809/205
-- -- and that has (720, 6) entries 

 --  working on sensor id 17351/205
--
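
As an alternative sketch to padding each sensor with blank start and end rows, each sensor can be resampled and then reindexed onto one shared grid. This is not the approach the loop above takes; `toy`, `common_grid` and `per_sensor` are assumed names, and the window is shortened to 15 minutes to keep the example small.

```python
import pandas as pd

# Hypothetical readings from two sensors with different coverage.
toy = pd.DataFrame(
    {
        "sensor_id": [1, 1, 1, 2],
        "p1": [4.0, 6.0, 8.0, 1.0],
        "p2": [2.0, 3.0, 4.0, 0.5],
    },
    index=pd.to_datetime(
        ["2018-12-31 12:00:55", "2018-12-31 12:03:25",
         "2018-12-31 12:05:55", "2018-12-31 12:11:00"]
    ),
)

# The shared grid every sensor should end up on.
common_grid = pd.date_range("2018-12-31 12:00:00", "2018-12-31 12:15:00", freq="5min")

per_sensor = {}
for sensor_id, sensor_rows in toy.groupby("sensor_id"):
    resampled = sensor_rows[["p1", "p2"]].resample("5min").mean()
    # reindex adds NaN rows for grid slots the sensor never reported in,
    # and drops bins that fall outside the grid
    per_sensor[sensor_id] = resampled.reindex(common_grid)

for sensor_id, frame in per_sensor.items():
    print(sensor_id, frame.shape)
```

Every frame in `per_sensor` then has exactly the same index, whatever period the sensor actually covered.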

In [77]:
out_time_complete_arrays[0]

Unnamed: 0,p1,p2
2018-12-31 11:55:00,0.000000,0.000000
2018-12-31 12:00:00,4.605000,2.680000
2018-12-31 12:05:00,8.245000,3.315000
2018-12-31 12:10:00,6.850000,2.880000
2018-12-31 12:15:00,5.870000,2.820000
2018-12-31 12:20:00,7.970000,2.875000
2018-12-31 12:25:00,4.752500,2.622500
2018-12-31 12:30:00,3.600000,2.120000
2018-12-31 12:35:00,4.060000,2.160000
2018-12-31 12:40:00,4.966667,2.546667


In [66]:
out_time_complete_arrays__WITHOUT_WHERE_CLEANING[0]

Unnamed: 0,p1,p2
2018-12-31 11:55:00,,
2018-12-31 12:00:00,4.605000,2.680000
2018-12-31 12:05:00,8.245000,3.315000
2018-12-31 12:10:00,6.850000,2.880000
2018-12-31 12:15:00,5.870000,2.820000
2018-12-31 12:20:00,7.970000,2.875000
2018-12-31 12:25:00,4.752500,2.622500
2018-12-31 12:30:00,3.600000,2.120000
2018-12-31 12:35:00,4.060000,2.160000
2018-12-31 12:40:00,4.966667,2.546667


In [75]:
out_time_complete_arrays__WITHOUT_WHERE_CLEANING[0].replace( np.nan, 0 )

Unnamed: 0,p1,p2
2018-12-31 11:55:00,0.000000,0.000000
2018-12-31 12:00:00,4.605000,2.680000
2018-12-31 12:05:00,8.245000,3.315000
2018-12-31 12:10:00,6.850000,2.880000
2018-12-31 12:15:00,5.870000,2.820000
2018-12-31 12:20:00,7.970000,2.875000
2018-12-31 12:25:00,4.752500,2.622500
2018-12-31 12:30:00,3.600000,2.120000
2018-12-31 12:35:00,4.060000,2.160000
2018-12-31 12:40:00,4.966667,2.546667
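
Filling with 0 is only one option, and for particulate readings a 0 looks like a real measurement of clean air. A minimal sketch comparing it with two other standard pandas fills, on a made-up series `resampled`:

```python
import numpy as np
import pandas as pd

# Hypothetical resampled series with a leading gap and an inner gap.
resampled = pd.Series(
    [np.nan, 4.0, np.nan, 8.0],
    index=pd.date_range("2018-12-31 11:55:00", periods=4, freq="5min"),
    name="p1",
)

zero_filled = resampled.fillna(0)                     # what the loop above does
forward_filled = resampled.ffill()                    # leading gap stays NaN, nothing to carry forward
interpolated = resampled.interpolate(method="time")   # linear in time between known points

print(pd.DataFrame({"zero": zero_filled, "ffill": forward_filled, "interp": interpolated}))
```

Which fill is appropriate depends on what the downstream analysis expects; this sketch only shows how the results differ.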


In [84]:
# note: `== np.nan` is False for every row, so .where() replaces every value with 0
out_time_complete_arrays__WITHOUT_WHERE_CLEANING[0]['p2'].where( out_time_complete_arrays__WITHOUT_WHERE_CLEANING[0]['p2'] == np.nan, 0 )

2018-12-31 11:55:00    0.0
2018-12-31 12:00:00    0.0
2018-12-31 12:05:00    0.0
2018-12-31 12:10:00    0.0
2018-12-31 12:15:00    0.0
2018-12-31 12:20:00    0.0
2018-12-31 12:25:00    0.0
2018-12-31 12:30:00    0.0
2018-12-31 12:35:00    0.0
2018-12-31 12:40:00    0.0
2018-12-31 12:45:00    0.0
2018-12-31 12:50:00    0.0
2018-12-31 12:55:00    0.0
2018-12-31 13:00:00    0.0
2018-12-31 13:05:00    0.0
2018-12-31 13:10:00    0.0
2018-12-31 13:15:00    0.0
2018-12-31 13:20:00    0.0
2018-12-31 13:25:00    0.0
2018-12-31 13:30:00    0.0
2018-12-31 13:35:00    0.0
2018-12-31 13:40:00    0.0
2018-12-31 13:45:00    0.0
2018-12-31 13:50:00    0.0
2018-12-31 13:55:00    0.0
2018-12-31 14:00:00    0.0
2018-12-31 14:05:00    0.0
2018-12-31 14:10:00    0.0
2018-12-31 14:15:00    0.0
2018-12-31 14:20:00    0.0
                      ... 
2019-01-01 21:30:00    0.0
2019-01-01 21:35:00    0.0
2019-01-01 21:40:00    0.0
2019-01-01 21:45:00    0.0
2019-01-01 21:50:00    0.0
2019-01-01 21:55:00    0.0
2

In [117]:
p1_col_only = out_time_complete_arrays__WITHOUT_WHERE_CLEANING[0]['p1']
p1_col_only_as_df = pd.DataFrame( p1_col_only )
p1_col_only_as_df.columns

Index(['p1'], dtype='object')

In [118]:
p1_col_only_as_df.where( p1_col_only_as_df['p1'] <= 0, 0 )

Unnamed: 0,p1
2018-12-31 11:55:00,0.0
2018-12-31 12:00:00,0.0
2018-12-31 12:05:00,0.0
2018-12-31 12:10:00,0.0
2018-12-31 12:15:00,0.0
2018-12-31 12:20:00,0.0
2018-12-31 12:25:00,0.0
2018-12-31 12:30:00,0.0
2018-12-31 12:35:00,0.0
2018-12-31 12:40:00,0.0


In [119]:
p1_col_only_as_df[ p1_col_only_as_df['p1'] == np.nan ]

Unnamed: 0,p1


In [120]:
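# renamed from `filter` so the Python builtin is not shadowed
nan_filter = p1_col_only_as_df['p1'] == np.nan   # always False : NaN never equals NaN

In [121]:
p1_col_only_as_df.where( nan_filter, 0 )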

Unnamed: 0,p1
2018-12-31 11:55:00,0.0
2018-12-31 12:00:00,0.0
2018-12-31 12:05:00,0.0
2018-12-31 12:10:00,0.0
2018-12-31 12:15:00,0.0
2018-12-31 12:20:00,0.0
2018-12-31 12:25:00,0.0
2018-12-31 12:30:00,0.0
2018-12-31 12:35:00,0.0
2018-12-31 12:40:00,0.0


In [122]:
p1_col_only_as_df

Unnamed: 0,p1
2018-12-31 11:55:00,
2018-12-31 12:00:00,4.605000
2018-12-31 12:05:00,8.245000
2018-12-31 12:10:00,6.850000
2018-12-31 12:15:00,5.870000
2018-12-31 12:20:00,7.970000
2018-12-31 12:25:00,4.752500
2018-12-31 12:30:00,3.600000
2018-12-31 12:35:00,4.060000
2018-12-31 12:40:00,4.966667
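
The all-zero outputs above follow from how `where` reads its condition: it keeps values where the condition is True and substitutes elsewhere. A minimal sketch of replacing only the NaNs, on a made-up series `p1`:

```python
import numpy as np
import pandas as pd

# Hypothetical series with one missing value.
p1 = pd.Series(
    [np.nan, 4.605, 8.245],
    index=pd.date_range("2018-12-31 11:55:00", periods=3, freq="5min"),
    name="p1",
)

# where(cond, other): keep the value where cond is True, use `other` elsewhere.
kept_where_present = p1.where(p1.notna(), 0)

# mask(cond, other): the mirror image, replace where cond is True.
replaced_where_missing = p1.mask(p1.isna(), 0)

print(kept_where_present)
```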


In [90]:
nan_b = out_time_complete_arrays__WITHOUT_WHERE_CLEANING[0]['p1'][0]

In [91]:
nan_b

nan

In [95]:
p1_col_only[ p1_col_only['p1'] == nan_b ]

KeyError: 'p1'

In [76]:
out_time_complete_arrays__WITHOUT_WHERE_CLEANING[0]['p1'][0]

nan

In [74]:
type( out_time_complete_arrays__WITHOUT_WHERE_CLEANING[0]['p1'][0] ) 

numpy.float64

In [79]:
out_time_complete_arrays__WITHOUT_WHERE_CLEANING[0]['p1'][0] == np.nan

False

In [82]:
np.nan

nan

In [72]:
out_time_complete_arrays__WITHOUT_WHERE_CLEANING[0][ out_time_complete_arrays__WITHOUT_WHERE_CLEANING[0]['p1'] == 'NaN' ]

  result = method(y)


TypeError: invalid type comparison
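
For a single value, the checks below are the standard ways to test for NaN. A minimal sketch, with `value` standing in for the cell pulled out of the resampled frame:

```python
import math
import numpy as np
import pandas as pd

value = np.float64("nan")   # stand-in for a NaN cell taken from a DataFrame

equals_nan = value == np.nan       # NaN never compares equal, even to itself
is_nan_numpy = np.isnan(value)     # numeric input only
is_nan_pandas = pd.isna(value)     # also accepts None, NaT and strings
is_nan_math = math.isnan(value)    # plain Python floats work too

print(equals_nan, is_nan_numpy, is_nan_pandas, is_nan_math)
```

Comparing against the string `'NaN'` cannot work either, since a missing float is not stored as text.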