Update mc readers #693
With the last "load_mcsensor_response_df functionality extended" commit we get:
Currently load_mcsensor_response_df returns a tuple of two pandas DataFrames: sns_response and bin_widths. It is the only MC reader with this behavior; the other MC readers just return what you ask for (hits, particles).
42f79f9 to f08cfc2
The MC structure table has been bothering me for a while and I am very happy to see it improving! However, I think we should all agree on what we want to do and try to keep the code as clean as possible.
I think that eliminating the event model completely requires a more thorough discussion, including @jjgomezcadenas. For the MC readers, we agreed on making a tag and then moving on to eliminating the readers that convert tables into dictionaries.
:-) I wish I had the time to pay attention to this, or even come over and discuss it with you all in person. Unfortunately, I've just unleashed a deluge of activity in my life, so I'm completely out of the picture for a while.
I'm glad you commented :-). I didn't mention you because we know you're out of the picture, but your interventions whenever you find the time are greatly appreciated!
No, I didn't mean to eliminate the whole event model; that doesn't seem feasible at the moment (though note that we are duplicating some functionality to work with pandas directly when performance is the issue...). I meant to declutter it, i.e. remove types that are simply not used anywhere in IC (as MCParticle would be if we eliminate the reader).
I think that we should at least keep the possibility for a user to create the MCHit type, since it needs to be fed into the paolina functions, for example. My suggestion would be to have a function that transforms the hits dataframe into a dictionary {event : List[MCHit]}. I don't know whether the output of the other readers (sns responses etc.) is used anywhere.
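The conversion suggested above could look roughly like the sketch below. It is only an illustration: the MCHit class here is a stand-in dataclass, not the IC event-model type, and the column names (event_id, x, y, z, time, energy, label) and the function name hits_df_to_dict are assumptions.

```python
from dataclasses import dataclass
from typing import Dict, List

import pandas as pd


@dataclass
class MCHit:
    # Stand-in for the event-model class; the fields are assumptions.
    x: float
    y: float
    z: float
    time: float
    energy: float
    label: str


def hits_df_to_dict(hits: pd.DataFrame) -> Dict[int, List[MCHit]]:
    """Group a hits DataFrame by event and build one MCHit per row."""
    hits_by_event: Dict[int, List[MCHit]] = {}
    for evt, evt_hits in hits.groupby('event_id'):
        hits_by_event[int(evt)] = [
            MCHit(h.x, h.y, h.z, h.time, h.energy, h.label)
            for h in evt_hits.itertuples(index=False)
        ]
    return hits_by_event


# Hypothetical input: two hits in event 0 and one hit in event 1.
example_hits = pd.DataFrame({
    'event_id': [0, 0, 1],
    'x'       : [1.0, 2.0, 3.0],
    'y'       : [0.0, 0.5, 1.0],
    'z'       : [10.0, 11.0, 12.0],
    'time'    : [0.1, 0.2, 0.3],
    'energy'  : [0.01, 0.02, 0.03],
    'label'   : ['ACTIVE', 'ACTIVE', 'ACTIVE'],
})
example_dict = hits_df_to_dict(example_hits)
```

Events with no rows in the hits table simply do not appear as keys, so a caller that needs every event would have to add empty lists separately.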
Indeed, your suggestions, even if sporadic, are more than welcome! Thanks for still keeping an eye on us! :)
976da13 to c9e315d
5655f11 to 74ff84b
0bd0ae7 to a1d6f60
Andrew, maybe the error comes from event number one (the one with id = 1)... according to the plot you sent, it wouldn't have any particles.
Nope, it's definitely the last one: there are 50 hit rows and the last index is 48, not 49 as it needs to be.
Hi! I would propose using another RWF file in the tests that use it; we have some of them already in the
That was exactly my point... The event with id=1 has 4 hits and 0 particles, which makes no sense at all.
#713 [author: andLaing] Removes the use of the file electrons_40keV_z250_RWF.h5 in the isidora tests and in the commandline tests for isidora and irene. This file was found to have badly written MC info in PR #693.
[reviewer: paolafer] This PR eliminates a test file that had wrong MC true information. The tests that used it have been modified to use a different file with similar content and the file has been removed. All the tests succeed now, so I approve this PR.
d8b209c to 9790e94
invisible_cities/io/mcinfo_io.py (outdated)
    with tb.open_file(file_name, 'r') as h5in:
        mc_tbls  = ['hits', 'particles', 'sns_response']
        evt_list = list(_try_unique_evt_itr(h5in.root.MC, mc_tbls))
Do we understand why there are events that are not in the particles or hits tables?
This is a question for the NEXUS authors really.
Certain types of simulation save some MC types and not others; a simulation of optical photons, for example, would have an empty hits table as far as I understand.
There are different kinds of simulations, where we save different kinds of output (some will save only sensor response information and not true particles, and sometimes the opposite happens, for instance). We want to be able to read all kinds of MC files.
An option would be to resurrect the old events table, which contained two columns, namely the eventID and the total deposited true energy, but I don't know if it's worth it...
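To illustrate the point of this thread, the sketch below collects event numbers from whichever MC tables are present and non-empty, so that a file with only sensor responses still yields a full event list. It works on in-memory DataFrames rather than an open HDF5 file, and the helper name unique_event_ids and the event_id column are assumptions; it is not the _try_unique_evt_itr implementation from the PR.

```python
from typing import Dict, List

import pandas as pd


def unique_event_ids(tables  : Dict[str, pd.DataFrame],
                     names   : List[str],
                     evt_col : str = 'event_id') -> List[int]:
    """Union of the event numbers found in the requested tables.

    Tables that are missing, or that lack the event column, are skipped.
    """
    found = set()
    for name in names:
        table = tables.get(name)
        if table is None or evt_col not in table.columns:
            continue
        found.update(int(evt) for evt in table[evt_col].unique())
    return sorted(found)


# Hypothetical optical-photon simulation: empty hits table,
# no particles table, sensor responses for three events.
optical_sim = {
    'hits'        : pd.DataFrame({'event_id': pd.Series([], dtype='int64')}),
    'sns_response': pd.DataFrame({'event_id': [0, 0, 1, 2]}),
}
events = unique_event_ids(optical_sim, ['hits', 'particles', 'sns_response'])
```

Taking the union over all tables avoids the need for a dedicated events table, at the cost of scanning each table once.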
invisible_cities/io/mcinfo_io.py (outdated)
    def sensor_binning_old(file_name : str) -> pd.DataFrame:
        # Is a pre-Feb 2020 MC file
        config = pd.read_hdf(file_name, 'MC/configuration').set_index('param_key')
        bins   = config[config.index.str.contains('binning')].copy()
Since config was read on a previous line and not passed in as a mutable input, you don't have to copy it.
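As a small illustration of this remark, the sketch below filters a configuration table with a boolean mask and shows that the selection is a separate object from config. The parameter names and values are invented for the example. Whether an explicit .copy() is still useful depends on what is done with the result afterwards: in some pandas versions, assigning in place into a filtered frame can raise a SettingWithCopyWarning.

```python
import pandas as pd

# Invented configuration table, indexed by parameter name as in the snippet.
config = pd.DataFrame({
    'param_key'  : ['num_events', 'Pmt_binning', 'SiPM_binning'],
    'param_value': ['10', '25 ns', '1 mus'],
}).set_index('param_key')

# Boolean-mask selection builds a new DataFrame holding the matching rows.
bins = config[config.index.str.contains('binning')]

# Deriving a further frame (rather than assigning in place) leaves both
# config and bins untouched.
renamed = bins.rename(index=lambda key: key.replace('_binning', ''))
```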
If all events not found in loop
Slight adjustment to comparisons in test_copy_mc_info_which_events_out_of_range
Makes df index types the same in old and new format reads
Avoids warning/error from pandas
This PR modifies the MC io module to adapt it to the new nexus output format, ensuring that the main readers/writer can be applied to the old format as well. The new functions make extensive use of DataFrames, which are faster to manipulate than the previous dictionary-based readers/writer. All the functionality is well tested, and a new MC test dataset is introduced to guarantee that the same information is read from the old and new formats.
Some functionality is duplicated because of the different old/new table structures; it is planned to be removed in the future, once the 'old' file readers are no longer needed.
The road to user-transparent compatibility was bumpy but it sparks joy to finally approve this PR! Well done!
Updates the pandas DataFrame based readers
for the MC hits, particles and sensors taking into
account the new nexus output structure.
Adapts existing functions to return the same structure
for both old and new format files.