Update mc readers #693

andLaing · 2020-02-25T15:53:59Z

Updates the pandas DataFrame based readers
for the MC hits, particles and sensors taking into
account the new nexus output structure.

Adapts existing functions to return the same structure
for both old and new format files.

jmunozv · 2020-02-26T18:23:47Z

With the last "load_mcsensor_response_df functionality extended" commit we get:

DB_name and run_number are no longer mandatory, so new geometries not stored in the database can be handled without passing arbitrary values for these variables.
A new parameter called sns_name has been added. It applies only to input files in the new hdf5 persistency format. If set, the function returns only the sns_response of that type of sensor.
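
As a rough illustration of the sns_name behaviour described above, here is a minimal sketch. It is not the actual load_mcsensor_response_df code: the helper name, the sensor_id and sensor_name columns, and the sensor type labels are all assumptions made for the example.

```python
from typing import Optional

import pandas as pd


def select_sensor_response(sns_response: pd.DataFrame,
                           sns_types: pd.DataFrame,
                           sns_name: Optional[str] = None) -> pd.DataFrame:
    """Hypothetical helper: keep only the responses of one sensor type.

    sns_types is an assumed lookup table with one row per sensor,
    holding its sensor_id and the name of its sensor type.
    """
    if sns_name is None:
        # No filter requested: return everything, as when sns_name is not set.
        return sns_response
    wanted_ids = sns_types.loc[sns_types["sensor_name"] == sns_name, "sensor_id"]
    return sns_response[sns_response["sensor_id"].isin(wanted_ids)]


# Illustrative data only; the values are made up.
sns_response = pd.DataFrame({"event_id" : [0, 0, 0, 1],
                             "sensor_id": [0, 1, 1000, 1000],
                             "time_bin" : [0, 3, 1, 2],
                             "charge"   : [2, 1, 5, 4]})
sns_types = pd.DataFrame({"sensor_id"  : [0, 1, 1000],
                          "sensor_name": ["PmtR11410", "PmtR11410", "SiPM"]})

sipm_only = select_sensor_response(sns_response, sns_types, sns_name="SiPM")
everything = select_sensor_response(sns_response, sns_types)
```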

jmunozv · 2020-02-26T18:29:52Z

Currently load_mcsensor_response_df returns a Tuple with 2 pandas DataFrames: sns_response and bin_widths. This is the only mc-reader with this behavior. The rest of the mc-readers return only what you ask for (hits, particles).
Given that the bin_widths can easily be obtained via the get_sensor_binning() function provided by IC, my proposal, for consistency, is that this function should return only the sns_response.

andLaing · 2020-02-28T12:01:55Z

With the commit 6ca6062 we remove the old readers which returned dictionaries of evm.event_model objects. @paolafer suggests a tag be made before merging these changes into the head.

mmkekic · 2020-02-28T20:22:41Z

The MC table structure has been bothering me for a while and I am very happy to see it improving! However, I think we should all agree on what we want to do and try to keep the code as clean as possible.
Since the MC table structure is changing, we need duplicated readers to be able to read the different data formats during this transition period. It is a bit ugly, but I don't see a way around it.
It seems that the new readers always return pandas dataframes, so the question is: do we still want to be able to read the MC information as our custom types (i.e. dictionaries)? @msorel , @ausonandres , @paolafer , @Aretno might want to comment on this? If the answer is no, I suggest we make an IC tag and clean the unused types from the event model (and all the functions using those types). If the answer is yes, we would need to provide functions that transform dataframes into the needed types.

paolafer · 2020-02-29T16:42:58Z

I think that eliminating completely the event model requires a more thorough discussion, including @jjgomezcadenas. For the MC readers, we agreed on making a tag and then move on eliminating the readers that convert tables into dictionaries.

jacg · 2020-02-29T16:51:49Z

> ... eliminating completely the event model ...

:-)

I wish I had the time to pay attention to this, or even come over and discuss it with you all in person. Unfortunately, I've just unleashed a deluge of activity in my life, so I'm completely out of the picture for a while.

paolafer · 2020-02-29T16:55:24Z

> I wish I had the time to pay attention to this, or even come over and discuss it with you all in person. Unfortunately, I've just unleashed a deluge of activity in my life, so I'm completely out of the picture for a while.

I'm glad you commented :-), I didn't mention you because we know you're out of the picture, but your interventions whenever you find the time are greatly appreciated!

mmkekic · 2020-02-29T21:40:16Z

> I think that eliminating completely the event model requires a more thorough discussion, including @jjgomezcadenas.

No, I didn't mean to eliminate the whole event model; that doesn't seem feasible at the moment (though note that we are duplicating some functionality to work with pandas directly when performance is the issue...). I meant to declutter it, i.e. remove types that are simply not used anywhere in IC (as MCParticle would be if we eliminate the reader).

> For the MC readers, we agreed on making a tag and then move on eliminating the readers that convert tables into dictionaries.

I think that we should at least keep the possibility for a user to create the MCHit type, since it is needed as input to the paolina functions, for example. My suggestion would be to have a function that transforms the hits dataframe into a dictionary {event : List[MCHit]}. I don't know whether the output of the other readers (sns responses etc.) is used anywhere.
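
To make the suggestion concrete, here is a minimal sketch of such a conversion. It is only an illustration under assumptions: MCHitSketch is a stand-in dataclass, not the real MCHit from the IC event model, the function name is hypothetical, and the hits dataframe is assumed to carry event_id, x, y, z, time and energy as plain columns.

```python
from dataclasses import dataclass
from typing import Dict, List

import pandas as pd


@dataclass
class MCHitSketch:
    """Stand-in for the event-model hit type (assumed fields)."""
    x: float
    y: float
    z: float
    time: float
    energy: float


def hits_df_to_dict(hits: pd.DataFrame) -> Dict[int, List[MCHitSketch]]:
    """Hypothetical helper: group a hits dataframe into {event: [hits]}."""
    events: Dict[int, List[MCHitSketch]] = {}
    for event_id, evt_hits in hits.groupby("event_id"):
        events[int(event_id)] = [
            MCHitSketch(h.x, h.y, h.z, h.time, h.energy)
            for h in evt_hits.itertuples(index=False)
        ]
    return events


# Illustrative data only; the values are made up.
hits = pd.DataFrame({"event_id": [0, 0, 1],
                     "x"       : [1.0, 2.0, 3.0],
                     "y"       : [0.5, 0.6, 0.7],
                     "z"       : [10.0, 11.0, 12.0],
                     "time"    : [0.0, 0.1, 0.0],
                     "energy"  : [0.1, 0.2, 0.3]})

hits_by_event = hits_df_to_dict(hits)
```

Keeping the conversion in one small function would let the readers return dataframes everywhere, while code that still needs hit objects converts explicitly.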

mmkekic · 2020-02-29T22:01:48Z

> > I wish I had the time to pay attention to this, or even come over and discuss it with you all in person. Unfortunately, I've just unleashed a deluge of activity in my life, so I'm completely out of the picture for a while.
>
> I'm glad you commented :-), I didn't mention you because we know you're out of the picture, but your interventions whenever you find the time are greatly appreciated!

Indeed, your suggestions, even if sporadic, are more than welcome! Thanks for still keeping an eye on us! :)

invisible_cities/reco/paolina_functions_test.py

invisible_cities/cities/penthesilea_test.py

andLaing · 2020-03-28T18:41:01Z

While adjusting the cities to use the new paradigm for copying MC info, I've come across a problem in a few tests that I'm not sure how best to solve. The file used (electrons_40keV_z250_RWF.h5) seems to have an error in the MC tables, as can be seen in the attached screenshot. The extents row for the last event gives last hit and last particle indices that stop one row short of the end of the corresponding table. This causes an error when trying to read either of these tables into a coherent dataframe, since the last value of the event_id column is a NaN. I could probably catch the exception and replace the NaN with the last valid event_id, but I was worried that this would cover up other corrupt files. It's not relevant for new MC, but I thought I'd better ask for opinions before I start rewriting functions or changing test files.

jmunozv · 2020-03-28T20:17:50Z

Andrew, maybe the error comes from event number one (the one with id = 1)... according to the plot you sent, it wouldn't have any particles.

andLaing · 2020-03-28T20:20:31Z

Nope, it's definitely the last one: there are 50 hit rows and the last index is 48, not 49 as it needs to be.
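
The mismatch can be reproduced with a small sketch. This is not the IC reader: the function names, the evt_number and last_hit column names, and the first extents row are assumptions for illustration. Only the 50 rows and the final index of 48 come from the numbers above.

```python
import numpy as np
import pandas as pd


def assign_event_ids(n_rows: int, extents: pd.DataFrame,
                     column: str = "last_hit") -> np.ndarray:
    """Label each table row with its event, using the extents table.

    Rows not covered by any extents entry keep the NaN they start with.
    """
    event_id = np.full(n_rows, np.nan)
    first = 0
    for evt, last in zip(extents["evt_number"], extents[column]):
        event_id[first:last + 1] = evt   # 'last' is an inclusive row index
        first = last + 1
    return event_id


def extents_cover_table(n_rows: int, extents: pd.DataFrame,
                        column: str = "last_hit") -> bool:
    """True if the final extents index points at the final table row."""
    return int(extents[column].iloc[-1]) == n_rows - 1


# 50 hit rows, but the extents table stops at index 48 instead of 49.
# The first extents row (45) is made up for the example.
extents = pd.DataFrame({"evt_number": [0, 1], "last_hit": [45, 48]})
ids = assign_event_ids(50, extents)

last_row_unassigned = bool(np.isnan(ids[-1]))
table_is_consistent = extents_cover_table(50, extents)
```

An explicit check of this kind would flag a corrupt file up front instead of silently replacing the NaN with the last valid event_id.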

paolafer · 2020-03-30T11:07:29Z

Hi! The electrons_40keV_z250_RWF.h5 file clearly has an error in it since, as Javi says, the extents table shows that event 0 and event 1 have the same last_particle value. I've checked that the corresponding MCRD file has the correct MC true information and that running diomira on it produces a correct table in the RWF file, so the IC code seems to be fine. For some reason, the electrons_40keV_z250_RWF.h5 file was produced incorrectly.

I would propose using another RWF file in the tests that use it, provided those tests don't assume anything in particular about the content of the file. We already have some in the test_data folder.

jmunozv · 2020-03-30T11:12:08Z

That was exactly my point... The event with id=1 has 4 hits and 0 particles, which makes no sense at all.

#713 [author: andLaing] Removes the use of the file electrons_40keV_z250_RWF.h5 in the isidora tests and in the commandline tests for isidora and irene. This file was found to have badly written MC info in PR #693.

[reviewer: paolafer] This PR eliminates a test file that had wrong MC true information. The tests that used it have been modified to use a different file with similar content and the file has been removed. All the tests succeed now, so I approve this PR.