# Time issues in L1A files

### Showing the problem

I had previously discovered that file `2012063023` contains illegal UTC time-stamps with a seconds field `>= 60`:

In [1]:
from diviner import file_utils as fu
fu.l1adatapath

PosixPath('/q/marks/feidata/DIV:opsL1A/data')

In [2]:
cols = 'date utc b3_11'.split() # get a list of columns

In [3]:
fu.get_clean_l1a('2012063023').tail(30).index

DatetimeIndex(['2012-06-30 23:59:59.532000', '2012-06-30 23:59:59.662000',
               '2012-06-30 23:59:59.792000', '2012-06-30 23:59:59.922000',
               '2012-07-01 00:00:00.052000', '2012-07-01 00:00:00.182000',
               '2012-07-01 00:00:00.312000', '2012-07-01 00:00:00.443000',
               '2012-07-01 00:00:00.573000', '2012-07-01 00:00:00.703000',
               '2012-07-01 00:00:00.833000', '2012-07-01 00:00:00.963000',
               '2012-07-01 00:00:00.092000', '2012-07-01 00:00:00.220000',
               '2012-07-01 00:00:00.348000', '2012-07-01 00:00:00.476000',
               '2012-07-01 00:00:00.604000', '2012-07-01 00:00:00.732000',
               '2012-07-01 00:00:00.860000', '2012-07-01 00:00:00.988000',
               '2012-07-01 00:00:01.116000', '2012-07-01 00:00:01.244000',
               '2012-07-01 00:00:01.372000', '2012-07-01 00:00:01.500000',
               '2012-07-01 00:00:01.628000', '2012-07-01 00:00:01.756000',
               '2012-07-0

In [4]:
# Look at the tail of the raw data file.
# Note that I am using 'raw' parsing here, so 'date' and 'utc' are plain strings.
# A proper time data type cannot hold such ill-defined values, so they only exist as strings.
fu.get_raw_l1a('2012063023')[cols].tail(30)

|       | date        | utc          | b3_11 |
|-------|-------------|--------------|-------|
| 28098 | 30-Jun-2012 | 23:59:59.532 | 27706 |
| 28099 | 30-Jun-2012 | 23:59:59.662 | 27705 |
| 28100 | 30-Jun-2012 | 23:59:59.792 | 27705 |
| 28101 | 30-Jun-2012 | 23:59:59.922 | 27711 |
| 28102 | 30-Jun-2012 | 23:59:60.052 | 27705 |
| 28103 | 30-Jun-2012 | 23:59:60.182 | 27708 |
| 28104 | 30-Jun-2012 | 23:59:60.312 | 27707 |
| 28105 | 30-Jun-2012 | 23:59:60.443 | 27707 |
| 28106 | 30-Jun-2012 | 23:59:60.573 | 27704 |
| 28107 | 30-Jun-2012 | 23:59:60.703 | 27706 |
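
A minimal sketch of how such rows could be flagged directly from the raw strings, and of what carrying the excess seconds over into the next minute would give. It uses only the Python standard library and the `utc` values shown above. `has_illegal_seconds` and `parse_with_rollover` are hypothetical helper names, not part of `diviner`, and the three-digit millisecond fraction is an assumption based on the rows shown.

```python
from datetime import datetime, timedelta

def has_illegal_seconds(utc):
    """True if the seconds field of an 'HH:MM:SS.fff' string is 60 or more."""
    return int(utc.split(":")[2].split(".")[0]) >= 60

def parse_with_rollover(date, utc):
    """Parse 'DD-Mon-YYYY' plus 'HH:MM:SS.fff', carrying seconds >= 60 forward.

    Assumes English month abbreviations and a millisecond fraction.
    """
    day = datetime.strptime(date, "%d-%b-%Y")
    hours, minutes, seconds = utc.split(":")
    whole, millis = seconds.split(".")
    return day + timedelta(hours=int(hours), minutes=int(minutes),
                           seconds=int(whole), milliseconds=int(millis))

# 'utc' values of the rows of file 2012063023 shown above
utc_tail = [
    "23:59:59.532", "23:59:59.662", "23:59:59.792", "23:59:59.922",
    "23:59:60.052", "23:59:60.182", "23:59:60.312", "23:59:60.443",
    "23:59:60.573", "23:59:60.703",
]

flagged = [t for t in utc_tail if has_illegal_seconds(t)]
rolled = parse_with_rollover("30-Jun-2012", flagged[0])
print(len(flagged), rolled)
```

Rolling over the first flagged row gives `2012-07-01 00:00:00.052`, the same value as the first 2012-07-01 entry of the cleaned index in `In [3]`. Whether `get_clean_l1a` does the conversion this way is not something this sketch can confirm.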


### More problems, possibly related

What I did not notice then is that the problems in this file go further and have more serious consequences than a mere ill-defined formality.
As the outputs above show, the file's `utc` time-stamps not only run well past the 2.048 seconds that we defined as the criterion for the hour cut-off; if the seconds `>= 60` are interpreted as time-stamps that have already rolled over, they also continue into the next hour and then jump back in time.

>Basically, we have a lot of data in here that should not exist, because Diviner does not take samples less than 128 milliseconds apart!
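
To make the jump back in time explicit, here is a small sketch that looks for backward steps in a run of consecutive entries copied from the cleaned index printed by `In [3]`. It is plain standard-library Python and independent of `diviner`; the variable names are illustrative only.

```python
from datetime import datetime, timedelta

# Consecutive entries copied from the cleaned index printed by In [3]
stamps = [
    "2012-07-01 00:00:00.573000", "2012-07-01 00:00:00.703000",
    "2012-07-01 00:00:00.833000", "2012-07-01 00:00:00.963000",
    "2012-07-01 00:00:00.092000", "2012-07-01 00:00:00.220000",
    "2012-07-01 00:00:00.348000", "2012-07-01 00:00:00.476000",
]
times = [datetime.strptime(s, "%Y-%m-%d %H:%M:%S.%f") for s in stamps]

# Step from each sample to the next one
steps = [later - earlier for earlier, later in zip(times, times[1:])]

# Positions within `times` whose time-stamp is earlier than its predecessor
backward = [i + 1 for i, step in enumerate(steps) if step < timedelta(0)]

for pos in backward:
    print(pos, stamps[pos], steps[pos - 1].total_seconds())
```

In this excerpt the only backward step is at the fifth entry, where the time drops from `00:00:00.963` to `00:00:00.092`.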

### Next hour block does not fit either
This problem continues into the next hour block, `2012070100`, which starts with time-stamps that overlap with the end of the previous file but, again, do not match any of its last times.

Compare the `utc` column below with the last times of the previous 1-hour file.

In [5]:
fu.get_raw_l1a('2012070100').head(10)[cols]

|   | date        | utc          | b3_11 |
|---|-------------|--------------|-------|
| 0 | 01-Jul-2012 | 00:00:01.394 | 27704 |
| 1 | 01-Jul-2012 | 00:00:01.522 | 27708 |
| 2 | 01-Jul-2012 | 00:00:01.650 | 27704 |
| 3 | 01-Jul-2012 | 00:00:01.778 | 27707 |
| 4 | 01-Jul-2012 | 00:00:01.906 | 27706 |
| 5 | 01-Jul-2012 | 00:00:02.034 | 27708 |
| 6 | 01-Jul-2012 | 00:00:02.162 | 27708 |
| 7 | 01-Jul-2012 | 00:00:02.290 | 27705 |
| 8 | 01-Jul-2012 | 00:00:02.418 | 27706 |
| 9 | 01-Jul-2012 | 00:00:02.546 | 27704 |


### Does it go on?
It seems mandatory to check the end of the hour `2012070100` above and the beginning of the next hour as well, to see whether the 'invented' data problem persists:

In [6]:
fu.get_raw_l1a('2012070100').tail()[cols]

|       | date        | utc          | b3_11 |
|-------|-------------|--------------|-------|
| 28123 | 01-Jul-2012 | 01:00:01.195 | 27591 |
| 28124 | 01-Jul-2012 | 01:00:01.323 | 27595 |
| 28125 | 01-Jul-2012 | 01:00:01.451 | 27600 |
| 28126 | 01-Jul-2012 | 01:00:01.579 | 27601 |
| 28127 | 01-Jul-2012 | 01:00:01.707 | 27601 |


In [7]:
fu.get_raw_l1a('2012070101').head()[cols]

|   | date        | utc          | b3_11 |
|---|-------------|--------------|-------|
| 0 | 01-Jul-2012 | 01:00:01.834 | 27602 |
| 1 | 01-Jul-2012 | 01:00:01.962 | 27597 |
| 2 | 01-Jul-2012 | 01:00:02.090 | 27597 |
| 3 | 01-Jul-2012 | 01:00:02.218 | 27590 |
| 4 | 01-Jul-2012 | 01:00:02.346 | 27588 |


Fortunately, it does not. Here, the times are neatly separated at the file endings.
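
The same check can be written down as a small sketch: take the last time-stamp of one file and the first time-stamp of the next, and look at the sign of the difference. The helpers `parse` and `boundary_gap` are hypothetical names for illustration and only handle well-formed times with seconds below 60; the values are copied from the two tables above.

```python
from datetime import datetime, timedelta

def parse(date, utc):
    """Parse raw 'DD-Mon-YYYY' and 'HH:MM:SS.fff' strings (seconds < 60 only)."""
    return datetime.strptime(f"{date} {utc}", "%d-%b-%Y %H:%M:%S.%f")

def boundary_gap(last_of_previous, first_of_next):
    """Time from the last sample of one file to the first sample of the next."""
    return parse(*first_of_next) - parse(*last_of_previous)

# Last row of 2012070100 and first row of 2012070101, from the tables above
gap = boundary_gap(("01-Jul-2012", "01:00:01.707"),
                   ("01-Jul-2012", "01:00:01.834"))

# A zero or negative gap would mean the two files overlap in time
overlaps = gap <= timedelta(0)
print(gap.total_seconds(), overlaps)
```

For this pair of files the gap is positive and close to one sampling interval, so the two files do not overlap.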

## Mo' data, mo' problems.

So far I have found one more set of data files that shows the overlapping but non-matching times, but without seconds > 60, so the two problems might be unrelated:

In [8]:
fu.get_raw_l1a('2011071909').tail(20)[cols]

|       | date        | utc          | b3_11 |
|-------|-------------|--------------|-------|
| 28108 | 19-Jul-2011 | 09:59:59.520 | 27798 |
| 28109 | 19-Jul-2011 | 09:59:59.648 | 27796 |
| 28110 | 19-Jul-2011 | 09:59:59.776 | 27797 |
| 28111 | 19-Jul-2011 | 09:59:59.904 | 27798 |
| 28112 | 19-Jul-2011 | 10:00:00.032 | 27801 |
| 28113 | 19-Jul-2011 | 10:00:00.160 | 27803 |
| 28114 | 19-Jul-2011 | 10:00:00.288 | 27802 |
| 28115 | 19-Jul-2011 | 10:00:00.416 | 27805 |
| 28116 | 19-Jul-2011 | 10:00:00.544 | 27804 |
| 28117 | 19-Jul-2011 | 10:00:00.672 | 27806 |


In [9]:
fu.get_raw_l1a('2011071910').head(20)[cols]

|   | date        | utc          | b3_11 |
|---|-------------|--------------|-------|
| 0 | 19-Jul-2011 | 10:00:00.038 | 27801 |
| 1 | 19-Jul-2011 | 10:00:00.166 | 27803 |
| 2 | 19-Jul-2011 | 10:00:00.294 | 27802 |
| 3 | 19-Jul-2011 | 10:00:00.422 | 27805 |
| 4 | 19-Jul-2011 | 10:00:00.550 | 27804 |
| 5 | 19-Jul-2011 | 10:00:00.678 | 27806 |
| 6 | 19-Jul-2011 | 10:00:00.806 | 27805 |
| 7 | 19-Jul-2011 | 10:00:00.934 | 27809 |
| 8 | 19-Jul-2011 | 10:00:01.062 | 27805 |
| 9 | 19-Jul-2011 | 10:00:01.190 | 27808 |


I find this very curious, and without further investigation on the JPL side I am unable to assess which times are the correct ones. IT CANNOT BE BOTH!
With some additional hacking I would be able to calibrate these data; in that case I would mark these areas as "highly unreliable calibration".
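
One way to quantify "overlapping but not matching" is to pair the rows of the two files by position and compare both the time-stamps and the `b3_11` counts. The sketch below does this for the rows shown above only; `to_millis` is a hypothetical helper, and pairing row 28112 of `2011071909` with row 0 of `2011071910` is an assumption suggested by the tables, not something the files themselves state.

```python
def to_millis(utc):
    """Convert 'HH:MM:SS.fff' to integer milliseconds since midnight."""
    hours, minutes, seconds = utc.split(":")
    whole, millis = seconds.split(".")
    return (int(hours) * 3600 + int(minutes) * 60 + int(whole)) * 1000 + int(millis)

# (utc, b3_11) for rows 28112-28117 of file 2011071909, from the table above
tail_09 = [("10:00:00.032", 27801), ("10:00:00.160", 27803),
           ("10:00:00.288", 27802), ("10:00:00.416", 27805),
           ("10:00:00.544", 27804), ("10:00:00.672", 27806)]

# (utc, b3_11) for rows 0-5 of file 2011071910, from the table above
head_10 = [("10:00:00.038", 27801), ("10:00:00.166", 27803),
           ("10:00:00.294", 27802), ("10:00:00.422", 27805),
           ("10:00:00.550", 27804), ("10:00:00.678", 27806)]

# Time offset of each row in the later file relative to its partner row
offsets_ms = [to_millis(b[0]) - to_millis(a[0]) for a, b in zip(tail_09, head_10)]

# Do the paired rows carry the same b3_11 counts?
same_counts = all(a[1] == b[1] for a, b in zip(tail_09, head_10))
print(offsets_ms, same_counts)
```

For the six pairs shown, the offset is a constant 6 ms and the `b3_11` counts are identical. This is only a hint: a single channel and six rows say nothing definite about the rest of the two files.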

### New files found on 2014-01-14
Not as bad: this time there are no impossible time-stamps, just non-monotonic time jumps within the data file, which is unusual.

In [10]:
tstr = '2010020913'

In [11]:
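import pandas as pd

df = fu.get_clean_l1a(tstr)
ind = df.index
ts = pd.Series(ind)
tdiff = ts.diff()  # time step between consecutive samples
# keep only the steps longer than 128 ms
tdiff[tdiff > pd.Timedelta(milliseconds=128)]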

6538    00:00:02.176000
6562    00:00:01.152000
6741    00:00:00.129000
6744    00:00:00.129000
6746    00:00:00.129000
              ...      
26398   00:00:00.129000
27028   00:00:00.129000
27031   00:00:00.129000
27661   00:00:00.129000
27664   00:00:00.129000
Length: 93, dtype: timedelta64[ns]

In [12]:
df = fu.get_raw_l1a(tstr)

A jump of more than 2 seconds:

In [13]:
df[6536:6540][cols]

|      | date        | utc          | b3_11 |
|------|-------------|--------------|-------|
| 6536 | 09-Feb-2010 | 13:13:57.743 | 27683 |
| 6537 | 09-Feb-2010 | 13:13:57.871 | 27683 |
| 6538 | 09-Feb-2010 | 13:14:00.047 | 27681 |
| 6539 | 09-Feb-2010 | 13:14:00.175 | 27678 |


A jump of more than 1 second:

In [14]:
df[6560:6564][cols]

|      | date        | utc          | b3_11 |
|------|-------------|--------------|-------|
| 6560 | 09-Feb-2010 | 13:13:59.797 | 27682 |
| 6561 | 09-Feb-2010 | 13:13:59.925 | 27682 |
| 6562 | 09-Feb-2010 | 13:14:01.077 | 27681 |
| 6563 | 09-Feb-2010 | 13:14:01.205 | 27679 |
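
The filter in `In [11]` only reports forward steps larger than 128 ms, so a backward step would not show up in it. The sketch below compares, for the rows shown in the two tables above, the elapsed time between rows with the time expected from the number of rows in between. It assumes a nominal step of 128 ms and a jitter allowance of 2 ms per row; the allowance is a guess motivated by the 129 ms steps in the output of `In [11]`, not a project value, and `to_millis` is a hypothetical helper.

```python
NOMINAL_MS = 128    # sampling step used as the threshold in In [11]
TOLERANCE_MS = 2    # assumed jitter allowance per row, not a project value

def to_millis(utc):
    """Convert 'HH:MM:SS.fff' to integer milliseconds since midnight."""
    hours, minutes, seconds = utc.split(":")
    whole, millis = seconds.split(".")
    return (int(hours) * 3600 + int(minutes) * 60 + int(whole)) * 1000 + int(millis)

# (row, utc) pairs copied from the two tables above; rows 6540-6559 are not shown
rows = [(6536, "13:13:57.743"), (6537, "13:13:57.871"),
        (6538, "13:14:00.047"), (6539, "13:14:00.175"),
        (6560, "13:13:59.797"), (6561, "13:13:59.925"),
        (6562, "13:14:01.077"), (6563, "13:14:01.205")]

anomalies = []
for (row_a, utc_a), (row_b, utc_b) in zip(rows, rows[1:]):
    n_steps = row_b - row_a
    actual = to_millis(utc_b) - to_millis(utc_a)   # elapsed time in the file
    expected = n_steps * NOMINAL_MS                # elapsed time at the nominal step
    if abs(actual - expected) > TOLERANCE_MS * n_steps:
        anomalies.append((row_a, row_b, actual, expected))

for entry in anomalies:
    print(entry)  # (first row, second row, actual ms, expected ms)
```

Besides the two forward jumps, this flags the span between rows 6539 and 6560: 21 rows later the time-stamp is 378 ms earlier, so time must have stepped backwards somewhere in the rows not shown.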
