Use HKArchiveScanner in g3-reader Data Server #46
The new so3g HKArchiveScanner requires some modifications to the http bridge to work with the way the ArchiveScanner returns data and timelines.
Use the ArchiveScanner more appropriately. We were scanning all the files every time a query was issued. Now we only scan files we haven't already processed. This means loading the data the first time will be 'slow', while subsequent requests for files already loaded should be faster. Perhaps it would be better to scan all files on startup, though once enough files exist this could take a long time, and would prevent returning data close to startup.
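The scan-only-new-files caching described above can be sketched roughly as follows. This is a minimal stand-in, not the PR's actual code: `FileScanner` is a stub in place of so3g's `HKArchiveScanner` (the `process_file()`/`finalize()` method names mirror that API), and `ReaderCache` plays the role of the `G3ReaderServer` attributes.

```python
class FileScanner:
    """Stub standing in for so3g's HKArchiveScanner; just records files."""
    def __init__(self):
        self.processed = []

    def process_file(self, path):
        self.processed.append(path)

    def finalize(self):
        # In so3g this would return an HKArchive; here, just the file list.
        return list(self.processed)


class ReaderCache:
    """Sketch of the g3-reader's cache_list/hkas/archive bookkeeping."""
    def __init__(self):
        self.cache_list = []        # files we've already scanned
        self.hkas = FileScanner()   # the scanner object
        self.archive = None         # rebuilt after scanning new files

    def update(self, available_files):
        """Scan only files not seen before; return how many were new."""
        new_files = [f for f in available_files if f not in self.cache_list]
        for f in new_files:
            self.hkas.process_file(f)
            self.cache_list.append(f)
        if new_files:
            self.archive = self.hkas.finalize()
        return len(new_files)
```

The first query pays the full scan cost; an identical follow-up query scans nothing, which is where the speed-up on subsequent requests comes from.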
The exception handling that caught a KeyError for already configured fields wasn't working recently. This implements the same workaround as we were using for the g3-reader: for fields that we can't find we just return an empty array, so if someone has already configured a query in Grafana we'll give them nothing back. This results in the field name still showing in the key of their plot (so they won't be confused about why their query isn't showing up in the plot), but no data being displayed. It also allows fields configured for the same plot that do have data to be plotted.
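The missing-field workaround amounts to something like the sketch below. The `data` mapping and field names are hypothetical; the point is that a KeyError becomes an empty array rather than a failed request.

```python
import numpy as np

def get_field_data(data, field_name):
    """Return samples for field_name, or an empty array if it's unknown.

    `data` is a hypothetical field-name -> samples mapping. Returning an
    empty array (instead of raising) means a stale Grafana query still
    shows its field name in the plot key with no data, and other fields
    on the same plot are unaffected.
    """
    try:
        return data[field_name]
    except KeyError:
        return np.array([])
```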
Glad to hear this is making things faster! I have one comment/question: as far as I can see, it would be better if downsampling were done by so3g. I'll have a look at #45 next ...
I'm alright with moving the downsampling to so3g. It is fairly independent, so I think I'm going to merge this for now and open an issue to move it. That said, I still think downsampled files would be good for when users are trying to load longer datasets.
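For reference, the naive `MAX_POINTS` downsampling discussed in this thread is essentially a stride over the samples, as in the sketch below. The constant's value here is an assumption (the real limit lives in sisock-http), and this stride approach is exactly the "naive" part: it keeps every n-th point rather than averaging or pre-downsampling to disk.

```python
MAX_POINTS = 1000  # assumed limit; the actual value is sisock-http's

def downsample(samples, max_points=MAX_POINTS):
    """Naively thin a timestream so at most max_points samples remain."""
    if len(samples) <= max_points:
        return samples
    stride = -(-len(samples) // max_points)  # ceiling division
    return samples[::stride]
```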
This PR replaces the custom G3 Pipeline for opening files with the so3g `HKArchive` object and its associated `get_data()` method. This begins work on #29.
g3-reader
The `G3ReaderServer` has three attributes which keep track of the data cache:

- `cache_list`, which keeps track of files we've already processed with the HKArchiveScanner
- `hkas`, the HKArchiveScanner object
- `archive`, the HKArchive object

When a query is issued by Grafana, new files which aren't already in the `cache_list` will be processed by `hkas`, and `archive` will be updated with the HKArchive object returned by `hkas.finalize()`. `archive.get_data()` is then used to retrieve the data between the requested timestamps. We still naively downsample using `MAX_POINTS` so that we don't overwhelm the communication protocol (and because it wouldn't make sense to display such high resolution data most of the time). We would still benefit from downsampling to disk, though, so that we can avoid loading full resolution data all the time.

g3-file-scanner
Small change here, but one that requires action from users already using the g3-reader/file-scanner/database. I was previously removing spaces from and lowercasing field names that got logged in the database. This caused issues because the so3g code doesn't apply the same normalization, so I took this out. Users will need to essentially wipe their DB instances and allow an updated g3-file-scanner to rescan their data.
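To illustrate why the old normalization broke lookups (with a made-up field name, not one from the actual data): the database stored a normalized name while so3g reported the original, so the two never matched.

```python
def old_normalize(name):
    """The previous g3-file-scanner behavior: strip spaces and lowercase."""
    return name.replace(" ", "").lower()

# Hypothetical field name as so3g reports it:
so3g_field = "Channel 01"

# What the DB used to store vs. what it stores after this PR:
db_field_old = old_normalize(so3g_field)  # "channel01" -- never matches so3g
db_field_new = so3g_field                 # stored as-is, matches so3g
```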
sisock-http
Due to differences in the implementation of the `get_fields`/`get_data` API in the so3g hk code, using so3g's `get_data` isn't exactly a drop-in replacement within sisock. Details are in #45, but the main thing is that timeline names are dynamically assigned in so3g, so you can't cache the results from `get_fields` and expect them to match a later call to `get_data`. Since this is how sisock was designed, sisock-http expects to be able to cache the `get_fields` results.

I've accommodated this in a somewhat hacked way: we identify that results came from so3g's `get_data` by looking for its first default dynamically generated field name, 'group0', and then process the results differently. This section repeats a bit of code from the previous processing, but the error handling was different enough that I haven't tried to make it nicer. Eventually, depending on how we handle #45, we might switch to using the new processing code in all instances.

Misc.
I've tested this both locally on a small subset of data and on all the HK data we've collected at Yale. Just yesterday we also set up the SAT1 system with this updated g3-reader, as they are the heaviest user of the reader at the moment and would benefit from any speed increases.
Overall this is dramatically faster than what we have currently, though several days of data can still take ~10 seconds to load ~25 timestreams on first load.