Use HKArchiveScanner in g3-reader Data Server #46
The new so3g HKArchiveScanner requires some modifications to the http bridge to work with the way the ArchiveScanner returns data and timelines.
Use the ArchiveScanner more appropriately. We were scanning all the files every time a query was issued. Now we only scan files we haven't already processed. This means loading the data the first time will be 'slow', while subsequent requests for files already loaded should be faster. Perhaps it would be better to scan all files on startup, though once enough files exist this could take a long time, and would prevent returning data close to startup.
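The scan-only-new-files caching described above can be sketched roughly as follows. This is a minimal stand-in, not the PR's actual code: `FileScanner` is a stub in place of so3g's `HKArchiveScanner` (the `process_file()`/`finalize()` method names mirror that API), and `ReaderCache` plays the role of the `G3ReaderServer` attributes.

```python
class FileScanner:
    """Stub standing in for so3g's HKArchiveScanner; just records files."""
    def __init__(self):
        self.processed = []

    def process_file(self, path):
        self.processed.append(path)

    def finalize(self):
        # In so3g this would return an HKArchive; here, just the file list.
        return list(self.processed)


class ReaderCache:
    """Sketch of the g3-reader's cache_list/hkas/archive bookkeeping."""
    def __init__(self):
        self.cache_list = []        # files we've already scanned
        self.hkas = FileScanner()   # the scanner object
        self.archive = None         # rebuilt after scanning new files

    def update(self, available_files):
        """Scan only files not seen before; return how many were new."""
        new_files = [f for f in available_files if f not in self.cache_list]
        for f in new_files:
            self.hkas.process_file(f)
            self.cache_list.append(f)
        if new_files:
            self.archive = self.hkas.finalize()
        return len(new_files)
```

The first query pays the full scan cost; an identical follow-up query scans nothing, which is where the speed-up on subsequent requests comes from.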
The exception handling that caught a KeyError for already configured fields wasn't working recently. This implements the same workaround as we were using for the g3-reader: for fields that we can't find we just return an empty array, so if someone has already configured a query in Grafana we'll give them nothing back. This results in the field name still showing in the key of their plot (so they won't be confused about why their query isn't showing up in the plot), but no data being displayed. It also allows fields configured for the same plot that do have data to be plotted.
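The missing-field workaround amounts to something like the sketch below. The `data` mapping and field names are hypothetical; the point is that a KeyError becomes an empty array rather than a failed request.

```python
import numpy as np

def get_field_data(data, field_name):
    """Return samples for field_name, or an empty array if it's unknown.

    `data` is a hypothetical field-name -> samples mapping. Returning an
    empty array (instead of raising) means a stale Grafana query still
    shows its field name in the plot key with no data, and other fields
    on the same plot are unaffected.
    """
    try:
        return data[field_name]
    except KeyError:
        return np.array([])
```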
Glad to hear this is making things faster! I have one comment/question: as far as I can see, it would be better if downsampling were done by so3g. I'll have a look at #45 next ...
I'm alright with moving the downsampling to so3g. It is fairly independent, so I think I'm going to merge this for now and open an issue to move it. That said, I still think downsampled files would be good for when users are trying to load longer datasets.
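For reference, the naive `MAX_POINTS` downsampling discussed in this thread is essentially a stride over the samples, as in the sketch below. The constant's value here is an assumption (the real limit lives in sisock-http), and this stride approach is exactly the "naive" part: it keeps every n-th point rather than averaging or pre-downsampling to disk.

```python
MAX_POINTS = 1000  # assumed limit; the actual value is sisock-http's

def downsample(samples, max_points=MAX_POINTS):
    """Naively thin a timestream so at most max_points samples remain."""
    if len(samples) <= max_points:
        return samples
    stride = -(-len(samples) // max_points)  # ceiling division
    return samples[::stride]
```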
This PR replaces the custom G3 Pipeline for opening files with the so3g `HKArchive` object and its associated `get_data()` method. This begins work on #29.
g3-reader
The `G3ReaderServer` has three attributes which keep track of the data cache:

- `cache_list`, which keeps track of files we've already processed with the HKArchiveScanner
- `hkas`, the HKArchiveScanner object
- `archive`, the HKArchive object

When a query is issued by Grafana, new files which aren't already in the `cache_list` will be processed by `hkas`, and `archive` will be updated with the HKArchive object returned by `hkas.finalize()`. `archive.get_data()` is then used to retrieve the data between the requested timestamps. We still naively downsample using `MAX_POINTS` so that we don't overwhelm the communication protocol (and because it wouldn't make sense to display such high resolution data most of the time). We would still benefit from downsampling to disk, though, so that we can avoid loading full resolution data all the time.

g3-file-scanner
Small change here, but one that requires action from users already using the g3-reader/file-scanner/database. I was previously removing spaces from and lowercasing field names that got logged in the database. This caused issues because the so3g code doesn't apply the same normalization, so I took this out. Users will need to essentially wipe their DB instances and allow an updated g3-file-scanner to rescan their data.
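To illustrate why the old normalization broke lookups (with a made-up field name, not one from the actual data): the database stored a normalized name while so3g reported the original, so the two never matched.

```python
def old_normalize(name):
    """The previous g3-file-scanner behavior: strip spaces and lowercase."""
    return name.replace(" ", "").lower()

# Hypothetical field name as so3g reports it:
so3g_field = "Channel 01"

# What the DB used to store vs. what it stores after this PR:
db_field_old = old_normalize(so3g_field)  # "channel01" -- never matches so3g
db_field_new = so3g_field                 # stored as-is, matches so3g
```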
sisock-http
Due to differences in the implementation of the `get_fields`/`get_data` API in the so3g hk code, using so3g's `get_data` isn't exactly a drop-in replacement within sisock. Details are in #45, but the main thing is that timeline names are dynamically assigned in so3g, so you can't cache the results from `get_fields` and expect them to match a later call to `get_data`. Since this is how sisock was designed, sisock-http expects to be able to cache the `get_fields` results.

I've accommodated this in a somewhat hacked way: we identify that results came from so3g's `get_data` by looking for its first default dynamically generated field name, 'group0', and then process the results differently. This section repeats a bit of code from the previous processing, but the error handling was different enough that I haven't tried to make it nicer. Eventually, depending on how we handle #45, we might switch to using the new processing code in all instances.

Misc.
I've tested this both locally on a small subset of data and on all the HK data we've collected at Yale. Just yesterday we also set up the SAT1 system with this updated g3-reader, as they are the heaviest user of the reader at the moment and would benefit from any speed increases.
Overall this is dramatically faster than what we have currently, though several days of data can still take ~10 seconds to load ~25 timestreams on first load.