-
Notifications
You must be signed in to change notification settings - Fork 94
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Upgrade MC #67
Upgrade MC #67
Conversation
Pull main into fork
… to all I3Extractors
…g multiple pulsemaps from each frame; and comment out un-used functions
…rter more generic as a consequence
…of self._indices; and simpify a duplicate for-loop
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi Andreas! Looks nice.
It is not clear to me (because much have changed and getting an overview is difficult) how table names are defined now. Could you elaborate on this?
I do think having defined table names for certain quantities are important - its very convenient for making plotting code compatible with all databases.
EDIT:
Also, I think it would be a good check to make a tiny database (size doesn't matter as long as its not empty) and run a EXPLAIN QUERY PLAN
test to make sure indexation is still correct. See link for syntax usage.
(https://stackoverflow.com/questions/26936842/why-do-i-get-different-query-plans-in-sqlite3-for-the-same-query)
Expected behavior for queries using event_no is for the plan to utilize this index when querying - we can use one of the current databases to check if behavior is the same.
Absolutely: All instances of Wrt. the import sqlite3
import pandas as pd
db = "/groups/icecube/asogaard/temp/sqlite_test_upgrade/data_test/data/data_test.db"
#db = "/groups/icecube/leonbozi/datafromrasmus/GNNReco/data/databases/dev_level7_noise_muon_nu_classification_pass2_fixedRetro_v3/data/dev_level7_noise_muon_nu_classification_pass2_fixedRetro_v3.db"
pulsemap = "I3RecoPulseSeriesMapRFCleaned_mDOM"
#pulsemap = "SRTTWOfflinePulsesDC"
event_number = 1
with sqlite3.connect(db) as conn:
query1 = f"EXPLAIN QUERY PLAN select * from {pulsemap} where event_no = {event_number}"
query2 = f"EXPLAIN QUERY PLAN select energy from truth where event_no = {event_number}"
res1 = pd.read_sql(query1, conn)
res2 = pd.read_sql(query2, conn) (Commented-out lines are the reference.)
which seems to be the same apart from a pulse-series specific name for the index column ( The results of the second query are the exact same for the two files:
Let me know whether you are happy with there results or would like to see additional checks. [1] https://github.com/asogaard/gnn-reco/blob/b8527fd5907fa1f325a9a4176a36de4bbdda1a7f/src/gnn_reco/data/i3extractor.py#L13-L45 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the explanation and checks! Approving
Hi @RasmusOrsoe,
I have added an
I3FeatureExtractorUpgrade
to read Upgrade data from I3-files. I have also made theDataConverter
classes completely generic wrt. which data are extracted which means that we can now read multiple pulsemaps from the same I3-files, and write to the same SQLite database, at the same time. To do this, I have re-written some ofSQLiteDataConverter
to be more generic and remove hard-coded references to e.g."truth"
,"RetroReco"
, etc.Let me know what you think! (The next step will then be ingesting Upgrade data into the model, but one thing at a time.)
Closes #46.