Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Upgrade MC #67

Merged
merged 12 commits into from
Nov 11, 2021
Merged

Upgrade MC #67

merged 12 commits into from
Nov 11, 2021

Conversation

asogaard
Copy link
Collaborator

Hi @RasmusOrsoe,

I have added an I3FeatureExtractorUpgrade to read Upgrade data from I3-files. I have also made the DataConverter classes completely generic wrt. which data are extracted which means that we can now read multiple pulsemaps from the same I3-files, and write to the same SQLite database, at the same time. To do this, I have re-written some of SQLiteDataConverter to be more generic and remove hard-coded references to e.g. "truth", "RetroReco", etc.

Let me know what you think! (The next step will then be ingesting Upgrade data into the model, but one thing at a time.)

Closes #46.

@asogaard asogaard self-assigned this Nov 11, 2021
Copy link
Collaborator

@RasmusOrsoe RasmusOrsoe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Andreas! Looks nice.

It is not clear to me (because much have changed and getting an overview is difficult) how table names are defined now. Could you elaborate on this?

I do think having defined table names for certain quantities are important - its very convenient for making plotting code compatible with all databases.

EDIT:

Also, I think it would be a good check to make a tiny database (size doesn't matter as long as its not empty) and run a EXPLAIN QUERY PLAN test to make sure indexation is still correct. See link for syntax usage.
(https://stackoverflow.com/questions/26936842/why-do-i-get-different-query-plans-in-sqlite3-for-the-same-query)

Expected behavior for queries using event_no is for the plan to utilize this index when querying - we can use one of the current databases to check if behavior is the same.

@asogaard
Copy link
Collaborator Author

Absolutely:

All instances of I3Extractor now has a name attribute [1]. This is what is used to determine the name of the table corresponding to the extraction in question [2,3]. For instance, in I3FeatureExtractor the name is the same as the pulsemap [4], so that is completely unambiguous. For the truth and retro extractors, the default names are "truth" and "retro" [5,6], respectively, but these can be overridden using the first argument to the constructors, e.g. I3RetroExtractor("RetroReco").

Wrt. the EXPLAIN QUERY PLAN test, I have followed your helpful suggestions (thanks!) and run the following code snippet for existing database files as well as ones generated with the proposed code changes:

import sqlite3
import pandas as pd

db = "/groups/icecube/asogaard/temp/sqlite_test_upgrade/data_test/data/data_test.db"
#db = "/groups/icecube/leonbozi/datafromrasmus/GNNReco/data/databases/dev_level7_noise_muon_nu_classification_pass2_fixedRetro_v3/data/dev_level7_noise_muon_nu_classification_pass2_fixedRetro_v3.db"
pulsemap = "I3RecoPulseSeriesMapRFCleaned_mDOM"
#pulsemap = "SRTTWOfflinePulsesDC"
event_number = 1
with sqlite3.connect(db) as conn:
    query1 = f"EXPLAIN QUERY PLAN select * from {pulsemap} where event_no = {event_number}"
    query2 = f"EXPLAIN QUERY PLAN select energy from truth where event_no = {event_number}"
    res1 = pd.read_sql(query1, conn)
    res2 = pd.read_sql(query2, conn)

(Commented-out lines are the reference.)
The results of the first query, for the two files, are:

SEARCH I3RecoPulseSeriesMapRFCleaned_mDOM USING INDEX event_no_I3RecoPulseSeriesMapRFCleaned_mDOM (event_no=?)  # New
SEARCH SRTTWOfflinePulsesDC USING INDEX event_no_SRTTWOfflinePulsesDC (event_no=?)  # Original

which seems to be the same apart from a pulse-series specific name for the index column (event_no_<pulsemap>).

The results of the second query are the exact same for the two files:

SEARCH truth USING INTEGER PRIMARY KEY (rowid=?)

Let me know whether you are happy with there results or would like to see additional checks.


[1] https://github.com/asogaard/gnn-reco/blob/b8527fd5907fa1f325a9a4176a36de4bbdda1a7f/src/gnn_reco/data/i3extractor.py#L13-L45
[2] https://github.com/asogaard/gnn-reco/blob/b8527fd5907fa1f325a9a4176a36de4bbdda1a7f/src/gnn_reco/data/sqlite_dataconverter.py#L64
[3] https://github.com/asogaard/gnn-reco/blob/b8527fd5907fa1f325a9a4176a36de4bbdda1a7f/src/gnn_reco/data/sqlite_dataconverter.py#L188-L192
[4] https://github.com/asogaard/gnn-reco/blob/b8527fd5907fa1f325a9a4176a36de4bbdda1a7f/src/gnn_reco/data/i3extractor.py#L66-L69
[5] https://github.com/asogaard/gnn-reco/blob/b8527fd5907fa1f325a9a4176a36de4bbdda1a7f/src/gnn_reco/data/i3extractor.py#L197
[6] https://github.com/asogaard/gnn-reco/blob/b8527fd5907fa1f325a9a4176a36de4bbdda1a7f/src/gnn_reco/data/i3extractor.py#L242

Copy link
Collaborator

@RasmusOrsoe RasmusOrsoe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the explanation and checks! Approving

@asogaard asogaard merged commit 02b41b6 into graphnet-team:main Nov 11, 2021
@asogaard asogaard deleted the upgrade-mc branch November 11, 2021 12:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Ingest Upgrade MC
2 participants