
Use HKArchiveScanner in g3-reader Data Server #46

Merged
merged 11 commits into from
Oct 22, 2019

Conversation

BrianJKoopman
Member

This PR replaces the custom G3 Pipeline for opening files with the so3g HKArchive object and its associated get_data() method.

This begins work on #29.

g3-reader

The G3ReaderServer has three attributes which keep track of the data cache:

  1. cache_list, which keeps track of files we've already processed with the HKArchiveScanner
  2. hkas, the HKArchiveScanner object
  3. archive, the HKArchive object

When Grafana issues a query, any new files not already in cache_list are processed by hkas, and archive is updated with the HKArchive object returned by hkas.finalize().
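The cache flow above can be sketched as follows. The so3g calls (HKArchiveScanner.process_file() and finalize()) are the real so3g API as used in this PR; the surrounding class is a simplified stand-in for the G3ReaderServer, not the actual implementation.

```python
class CachingReader:
    """Simplified stand-in for G3ReaderServer's caching attributes."""

    def __init__(self, scanner):
        self.cache_list = []   # files already processed
        self.hkas = scanner    # HKArchiveScanner (or any object with the same API)
        self.archive = None    # HKArchive returned by hkas.finalize()

    def update_cache(self, files):
        """Process only files we haven't seen yet, then refresh the archive."""
        new_files = [f for f in files if f not in self.cache_list]
        for f in new_files:
            self.hkas.process_file(f)
            self.cache_list.append(f)
        if new_files:
            self.archive = self.hkas.finalize()
        return new_files
```

On a repeated query over the same files, update_cache() returns an empty list and skips the (slow) scanning step entirely.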

archive.get_data() is then used to retrieve the data between the requested timestamps. We still naively downsample using MAX_POINTS so that we don't overwhelm the communication protocol (and because displaying such high-resolution data rarely makes sense anyway). We would still benefit from downsampling to disk, though, so that we can avoid loading full-resolution data every time.
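The naive downsampling looks roughly like the following. This is a sketch of the idea, not the reader's actual _down_sample_data code, and the MAX_POINTS value here is an assumed placeholder.

```python
MAX_POINTS = 1000  # assumed value; the real constant lives in the reader


def down_sample(times, values, max_points=MAX_POINTS):
    """Naive stride-based downsampling: keep every n-th sample so the
    result has at most max_points points."""
    if len(times) <= max_points:
        return times, values
    stride = len(times) // max_points + 1
    return times[::stride], values[::stride]
```

Note this simply drops samples; it does no averaging, which is why downsampled files on disk would still be worthwhile for long queries.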

g3-file-scanner

A small change here, but one that requires action from users already running the g3-reader/file-scanner/database. I was previously removing spaces from and lowercasing field names before logging them to the database. This caused issues because the so3g code does not perform the same normalization, so I took it out. Users will need to essentially wipe their DB instances and allow an updated g3-file-scanner to rescan their data.
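For illustration, the removed normalization amounted to something like this (the exact code and the example field name are assumptions; the point is only that so3g keeps field names verbatim, so names stored under the old scheme no longer match):

```python
def old_normalize(field):
    # Previous g3-file-scanner behavior (now removed): field names logged
    # to the database were lowercased with spaces stripped.
    return field.lower().replace(' ', '')


# so3g leaves names untouched, so a DB populated under the old scheme
# cannot be matched against so3g's field names:
assert old_normalize('Observatory Temp 01') != 'Observatory Temp 01'
```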

sisock-http

Due to differences in the get_fields/get_data API between sisock and the so3g hk code, so3g's get_data isn't quite a drop-in replacement within sisock. Details are in #45, but the main issue is that timeline names are assigned dynamically in so3g, so you can't cache the results from get_fields and expect them to match a later call to get_data. Since sisock was designed around that assumption, sisock-http expects to be able to cache the get_fields results.

I've accommodated this in a somewhat hacked way: we identify results returned from so3g's get_data by looking for its first default dynamically generated field name, 'group0', and then process the results differently. This section repeats a bit of code from the previous processing, but the error handling was different enough that I haven't tried to tidy it. Eventually, depending on how we handle #45, we might switch to using the new processing code everywhere.
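The detection hack described above boils down to a check like this. It's a simplified stand-in for the sisock-http code, not the real implementation; the 'group0' name is so3g's first default dynamically generated timeline name, as noted above.

```python
def looks_like_so3g_result(timelines):
    """Heuristic: so3g's get_data names timelines dynamically starting at
    'group0', so its presence signals so3g results rather than names
    cached from an earlier get_fields call."""
    return 'group0' in timelines


def process(timelines):
    # Two code paths with differing error handling, as described above.
    if looks_like_so3g_result(timelines):
        return 'so3g path'    # handle dynamically named timelines
    return 'legacy path'      # handle cached get_fields names
```

Being a string match on a default name, this would break if so3g ever changed its naming scheme, which is part of why #45 may lead to replacing it.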

Misc.

  • Started using txaio logging in more places, particularly for debug statements.
  • Added ability to set the log level via environment variable.
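The env-var mechanism can be sketched like so. The PR's code uses txaio for logging; the level names below match txaio's levels, but the LOGLEVEL variable name and the validation logic here are assumptions for illustration.

```python
import os

# Level names accepted by txaio's logging (plus 'none' to disable).
TXAIO_LEVELS = ('none', 'critical', 'error', 'warn', 'info', 'debug', 'trace')


def get_log_level(default='info'):
    """Read the desired log level from the environment, falling back to
    the default if the variable is unset or not a valid level."""
    level = os.environ.get('LOGLEVEL', default).lower()
    return level if level in TXAIO_LEVELS else default
```

The returned string would then be passed to txaio.start_logging(level=...) at startup.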

I've tested this both locally on a small subset of data and on all the HK data we've collected at Yale. Just yesterday we also set up the SAT1 system with this updated g3-reader, as they are the heaviest user of the reader at the moment and would benefit most from any speed increase.

Overall this is dramatically faster than what we have currently, though on first load several days of data can still take ~10 seconds to return ~25 timestreams.

New so3g HKArchiveScanner requires some modifications to the http bridge to
work with the way the ArchiveScanner returns data and timelines.
Use the ArchiveScanner more appropriately. We were scanning all the files every
time a query was issued. Now we only scan files we haven't scanned already.
This means loading the data the first time will be 'slow', while subsequent
requests on files already loaded should be faster.

Perhaps it would be better to scan all files on startup, though once enough
files exist this could take a long time, and would prevent returning data close
to startup.
@BrianJKoopman BrianJKoopman added this to To do in g3 File Scanning and Reading via automation Aug 29, 2019
@BrianJKoopman BrianJKoopman moved this from To do to In progress in g3 File Scanning and Reading Aug 29, 2019
The exception catching a KeyError wasn't working for already configured fields
recently. This implements the same workaround we were using for the g3-reader.
For fields that we can't find we just return an empty array, so if someone has
already configured a query in Grafana we give them nothing back. The field name
still shows in the key of their plot (so they won't be confused about why their
query isn't appearing), but no data is displayed. Fields configured on the same
plot that do have data can still be plotted.
@ahincks
Collaborator

ahincks commented Sep 3, 2019

Glad to hear this is making this faster! I have one comment/question:

As far as I can see, it would be better if downsampling were done by so3g.hk.getdata.get_data() based on the min_stride parameter. This doesn't—or rather, shouldn't—require having downsampled files available, as so3g.hk.getdata should be able to slice the data itself. (In fact, I would hope that eventually it will be smart enough to decide whether to use a downsampled file and/or whether to slice more frequently sampled data.) So, would it be possible and desirable to put the code from _down_sample_data into so3g.hk.getdata.get_data() rather than have it here? The MAX_POINTS parameter could still be used internally to sisock in order to force a min_stride parameter to be passed to the so3g method if necessary.
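The min_stride idea above could map onto the existing stride-based slicing with a conversion like the following. This is a sketch of the suggestion, not so3g's implementation; the mean-spacing estimate is an assumption.

```python
def stride_from_min_stride(times, min_stride):
    """Convert a minimum time between returned samples (seconds) into a
    slicing stride, estimated from the mean sample spacing."""
    if len(times) < 2 or min_stride is None:
        return 1
    dt = (times[-1] - times[0]) / (len(times) - 1)  # mean sample spacing
    return max(1, int(round(min_stride / dt)))
```

sisock could then derive min_stride internally from MAX_POINTS and the requested time span, as the comment suggests.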

I'll have a look at #45 next ...

@BrianJKoopman
Member Author

I'm alright with moving the downsampling to so3g. It's fairly independent, so I'm going to merge this for now and open an issue to move it.

That said, I still think downsampled files would be good for when users are trying to load longer datasets.

@BrianJKoopman BrianJKoopman merged commit b4607c8 into master Oct 22, 2019
g3 File Scanning and Reading automation moved this from In progress to Done Oct 22, 2019
@BrianJKoopman BrianJKoopman deleted the g3-reader-hk-archive-scanner branch October 22, 2019 17:39