Trf robustness #1773
Conversation
…nd trf table creation modified. run tests and try to maintain the interface on trf2pandas
Just a few changes to try and fix the "clean" tests
LGTM.
When I merged this into my draft PR/branch on my local system and incorporated the missing items in the dict (see the comment in the review), the trf to-csv run went to completion and the data looked about right (short of knowing what the items are supposed to contain).
This is a huge step forward for TRF decoding!
"2537_220": "Y1 Leaf 78/Positional Error (mm)", | ||
"2538_220": "Y1 Leaf 79/Positional Error (mm)", | ||
"2539_220": "Y1 Leaf 80/Positional Error (mm)", | ||
"2170_111": "Mlc Status/Actual Value (None)" |
Did you want to put this as the column name after the "Y1 Leaf 80/Positional Error" column name entry (below)?
Yes, I believe this is used on the latest version of the MR Linac (v4), but you can double-check against the version you have files from. I believe this 2170_111 is present. I agree with the suggestion below about filling the dictionary with unknowns, and then looking in the software later to complete config.json.
```python
    np.concatenate((timestamps[i], item_parts))
    for i, item_parts in enumerate(item_part_values_data)
]
column_names = ["Timestamp Data"] + [
```
This will raise a "key not found" error when there is a value `c` in `column_names_from_data` that is not in `column_names_from_dict`. That's appropriate behaviour for Python, but...

One could either just add to the dictionary and hope for the best, or:

```python
column_names_from_dict_including_unknowns = dict(column_names_from_dict)
for c in column_names_from_data:
    if c not in column_names_from_dict_including_unknowns:
        column_names_from_dict_including_unknowns[c] = "Item: " + c
        print(f'"{c}": "Item: {c}",')
```

and emit the missing items before the exception is raised.
That would let the user/programmer temporarily add those items for themselves in the config.json file. Which (looking forward) is what I did (this code is great!) so I could decode some MR Linac TRF data.
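The fallback idea above can be shown as a small standalone sketch (the dict contents and the unknown key `"9999_000"` here are hypothetical, not from the pymedphys config.json):

```python
# Hypothetical illustration of the fallback-mapping pattern suggested above.
column_names_from_dict = {"2537_220": "Y1 Leaf 78/Positional Error (mm)"}
column_names_from_data = ["2537_220", "9999_000"]  # "9999_000" is not in the dict

column_names_from_dict_including_unknowns = dict(column_names_from_dict)
for c in column_names_from_data:
    if c not in column_names_from_dict_including_unknowns:
        # Fall back to a placeholder name instead of raising KeyError,
        # and print a line the user can paste into config.json.
        column_names_from_dict_including_unknowns[c] = "Item: " + c
        print(f'"{c}": "Item: {c}",')

columns = [column_names_from_dict_including_unknowns[c] for c in column_names_from_data]
```

Every known key keeps its configured label, while unknown keys decode to a visible placeholder rather than crashing the run.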
This is a much better way to deal with this and avoid a runtime error. Since we have the list of item parts from the header, the dictionary can be completed later.
👍 I absolutely agree! Some brilliant work here @mguerrajordao 🙂 🎉
@sjswerdloff @SimonBiggs Hi both, happy that this small contribution is welcomed by you. I learned a lot from the previous code and just tried to offer my suggestions. It took a while peeking into the data in the header.
@mguerrajordao, there seem to be some regression test failures (with some of the sample TRF data that is stored on Zenodo for the purpose of regression tests).
@sjswerdloff, yes I have noticed. I have discussed with @SimonBiggs maintaining the interfaces. I am guessing it could be down to naming in the dictionary; I will double-check. But it could also be due to the previous output having 4 columns as Unknown, which are now set as the Timestamps. So I'll look one by one.
@mguerrajordao I am not familiar enough with the code and tests to say, but I imagine the values are being compared.
@sjswerdloff I'll focus on the tests today and push any changes if I manage to.
Got one issue: the original config.json has the linear scales as degrees ("Table Longitudinal/Scaled Actual (deg)"). Somehow in the past I must have corrected this to mm. I will roll back config.json to match the original and follow up on the next regression failure.
Got another couple of issues on the value comparison.
One of the ideas was to do this in two or three PRs: have the first PR (this one) leave all the tests unchanged and all the results the same, then have a follow-up PR be allowed to adjust the baselines as well as update a range of items. In some scenarios being stuck with this approach would be quite painful, so we weren't planning on making it a requirement to the level of making it painful; only if it is reasonably achievable.
Yup, this was done to match the original Elekta TRF decoding tool, and done under the impression that that tool was going to stick around. So it had the job of matching the previous tool, even when that tool had errors. But now that tool is no longer available, and I believe having these column labels be wrong for compatibility's sake is no longer the right choice. Still, let's have this PR remain consistent with the current baselines, and then a follow-up PR can make those corrections to the code and the baseline datasets.
@SimonBiggs I was under the impression the tool was still available, but I have no idea what a clinical site has to do to get it.
Corrected the difference on "Dose/Raw value (1/64th Mu)" by applying an offset... Now the test passes up to trf version 1 files (where 350 columns are expected): `dataframe["Dose/Raw value (1/64th Mu)"] = dataframe[`. However it is now failing on version 3 (Int 4). The converted dataframe has 351 columns (+1 extra column due to the timestamp), but the reference dataframe has 4 extra columns (from Unknown 1 to 4). I will shim this part by splitting the int64 timestamp back into 4 * int16 values so we can continue...
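The "bipolar signed int16" idea from the related commit can be illustrated in isolation (a sketch of reinterpreting raw unsigned values as signed, not the exact pymedphys change):

```python
import numpy as np

# Raw counter values decoded as unsigned 16-bit integers.
raw = np.array([0, 32767, 32768, 65535], dtype=np.uint16)

# Reinterpret the same bytes as signed (bipolar) int16 without copying:
# values at or above 2**15 wrap around to the negative range.
signed = raw.view(np.int16)
```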
@SimonBiggs @sjswerdloff After shimming with the unknown splitting of int64 into 4 * int16 I was able to pass the test 😄.
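The timestamp shim can be sketched as follows (a minimal standalone example, assuming each timestamp is a native-endian int64 and the four Unknown columns are its int16 quarters; not the exact pymedphys code):

```python
import numpy as np

# One int64 timestamp per row.
timestamps = np.array([1234567890123, -5], dtype=np.int64)

# Reinterpret each 8-byte int64 as four consecutive int16 values,
# giving one "Unknown 1..4" group per row.
unknown_parts = timestamps.view(np.int16).reshape(-1, 4)

# Round-trip: the four int16 columns reassemble into the original int64.
restored = unknown_parts.reshape(-1).view(np.int64)
```

This keeps the converted dataframe's column count matching the reference dataframe, since the shim emits the same four int16 columns the old decoder produced.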
Beautiful stuff @mguerrajordao :)
…ataframe columns. table.py: amended the factor of 10 on Table Isocentric, corrected from bipolar signed int16 on Dose/Raw value (1/64th of Mu) and split and dropped the int64 Timestamp Data column into 4 int16 named unknow1-4 columns
…o trf-robustness merging with differences for pytest pass.
This is absolutely beautiful stuff @mguerrajordao. It's brilliant to see TRF be taken into the next era of Integrity :).
Thank you so much @mguerrajordao. I have made a few stylistic comments. This is almost ready for a merge :)
Also, heads up @mguerrajordao, running the following in the CI suite on GitHub actions fails for this PR:
So, it would be worth running that locally and making sure it passes.
Hi Marcelo, Tried to send you a private email thanking you and asking for feedback, but it was blocked with the following message:
Hi Simon,
Regarding the poetry lint
…ded for the processing on table.py. run pytest will all tests marked as slow. all pass
Hi @mguerrajordao,
Beautiful stuff :). I have a few remaining "nit-pick" comments. Which you are free to ignore if you choose. This has my approval :).
Before merging the changelog file at the top of the repo needs to be updated.
```diff
@@ -12,197 +12,83 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.

-from typing import List
+# from typing import List
```
`# from typing import List`
```python
decoded_rows, column_adjustment_key = decode_rows(trf_table_contents)
version = header_table_contents["version"].values[0].astype(int)
item_parts_length = header_table_contents["item_parts_length"].values[0].astype(int)
item_parts = header_table_contents["item_parts"].values[0]
```
Would it be worth doing this header type conversion right after the header has been parsed? As in, type fixing of the header contents is likely the job of the "decode header" function, not the "decode table" function.
@SimonBiggs I'm not sure I understand. Do you mean passing the variables (version, item_parts_length, item_parts) into the function decode_trf_table, instead of passing header_table_contents?
I guess where I am confused is with the following piece: `.astype(int)`. My thought is that when the `header_table_contents` is created, what if it was made int right away? This would mean that it wouldn't be the job of downstream functions to convert it to int. I do actually believe you're doing this already, at least for `item_parts_length`:

pymedphys/lib/pymedphys/_trf/decode/header.py, line 162 in d4f9900:

```python
item_parts_length = int(len(item_parts))
```
And potentially `.astype(int)` isn't needed for version either?

pymedphys/lib/pymedphys/_trf/decode/header.py, line 154 in d4f9900:

```python
version = np.frombuffer(groups[4][8:12], dtype=np.int32).item()
```
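As a standalone check of this point (a minimal sketch, not the pymedphys code): `np.frombuffer(...).item()` already yields a native Python `int`, so a later `.astype(int)` would be redundant:

```python
import numpy as np

# Simulate four header bytes holding an int32 version field
# (tobytes/frombuffer both use native byte order, so this round-trips).
raw_bytes = np.int32(3).tobytes()

# .item() converts the NumPy scalar to a plain Python int.
version = np.frombuffer(raw_bytes, dtype=np.int32).item()
```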
I won't block merge on this though. Really just a "nit-pick". I'll merge after the tests pass, and you can opt to make this change in the next PR if you want to.
```diff
@@ -57,7 +57,8 @@ def trf2pandas(trf: path_or_binary_file) -> Tuple["pd.DataFrame", "pd.DataFrame"

     trf_header_contents, trf_table_contents = split_into_header_table(trf_contents)
     header_dataframe = header_as_dataframe(trf_header_contents)
-    table_dataframe = decode_trf_table(trf_table_contents)
+    # table_dataframe = decode_trf_table(trf_table_contents)
```
`# table_dataframe = decode_trf_table(trf_table_contents)`
Co-authored-by: Simon Biggs <simon.biggs@radiotherapy.ai>
Co-authored-by: Simon Biggs <simon.biggs@radiotherapy.ai>
@SimonBiggs Please see comments. Thanks for merging.
Amazing stuff @mguerrajordao! 🎉 Congrats on becoming a PyMedPhys contributor 🙂. Absolutely wonderful to have your contribution 🙂 Next thing to add to your to-do list is to add yourself to the contributors list 🙂. Put yourself below Derek and above Jake: https://github.com/pymedphys/pymedphys/blame/main/README.rst#L154
List of changes:
- lib/pymedphys/_trf/decode/config.json
- lib/pymedphys/_trf/decode/header.py
- lib/pymedphys/_trf/decode/table.py: this exhibits most of the changes. Line grouping can now be exactly determined from the header's item_parts_number and item_parts list, so most of the code that required inspecting the file and trial-and-error can be removed.
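The header-driven grouping described above can be sketched in isolation (names hypothetical; assuming a flat sequence of decoded values and a fixed per-row item-part count taken from the TRF header):

```python
def group_into_rows(flat_values, item_parts_length):
    """Hypothetical helper: split a flat list of decoded values into
    fixed-width rows, with the width taken from the TRF header."""
    return [
        flat_values[i : i + item_parts_length]
        for i in range(0, len(flat_values), item_parts_length)
    ]

# Six decoded values with three item parts per row -> two rows.
rows = group_into_rows(list(range(6)), 3)
```

Because the row width comes straight from the header, no heuristic inspection of the table bytes is needed to find row boundaries.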