
Trf robustness #1773

Merged: 13 commits merged into main from trf-robustness on Nov 22, 2022

Conversation

mguerrajordao
Collaborator

List of changes:

lib/pymedphys/_trf/decode/config.json

  • added a version_row dict and offset for decoding the trf, based on the version decoded from the trf header.
  • added item_part_names, a dictionary of names for the item_parts list decoded from the data in the trf header.

lib/pymedphys/_trf/decode/header.py

  • some formatting changes are due to the environment applying black automatically.
  • added mu, version, item_parts_number, item_parts_length and the item_parts list, all decoded from the data in the trf header
  • improved regex_trf for finding the length of the dynamically generated part of the header
  • decoded the multiple groups (changed ascii to utf-8, but the result should be similar)
  • mu, version and item_parts_number are also decoded from the last group of the regex match.
  • item_parts is a numpy array whose size is calculated dynamically from item_parts_number, which can be read as above. This allows the data to be decoded dynamically, independent of the version (a rough sketch follows below).
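A rough illustration of the dynamic part of this header decoding; the byte offsets other than version, and the assumed two int16 values per item part, are illustrative assumptions rather than the exact layout in header.py:

import numpy as np

def decode_dynamic_header_part(last_group: bytes) -> dict:
    # version is decoded from bytes 8:12 of the final regex group; the
    # neighbouring offsets used for item_parts_number here are assumed.
    version = np.frombuffer(last_group[8:12], dtype="<i4").item()
    item_parts_number = np.frombuffer(last_group[12:16], dtype="<i4").item()

    # Assume each item part is described by a pair of int16 values (hence
    # 2 * 2 bytes per item part). item_parts is sized from item_parts_number,
    # so the decode adapts to whatever the file declares rather than using a
    # hardcoded column count.
    item_parts = np.frombuffer(
        last_group[16 : 16 + 2 * 2 * item_parts_number], dtype="<i2"
    )
    return {
        "version": version,
        "item_parts_number": item_parts_number,
        "item_parts": item_parts,
        "item_parts_length": int(len(item_parts)),
    }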

lib/pymedphys/_trf/decode/table.py
- this exhibits most of the changes. The line grouping can now be determined exactly from the header's item_parts_number and item_parts list, so most of the code that relied on inspecting the file and trial and error can be removed.

  • decode_trf_table has changed while keeping the same interface. decode_column calculates the line_grouping dynamically from the header (version, item_parts_length and offset). The offset needs to be looked up from the version (as in the dictionary in config.json), because different versions may use different dtypes, so this still has to be hardcoded, but it can be easily estimated from the version (see the sketch after this list).
  • in convert_data_table, convert_applying_negative, divide_by_10 and remaining no longer need to be used.
  • the code is smaller but still needs validation
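A minimal sketch of the dynamic line grouping described above; the version-to-offset mapping and the exact relationship between offset and row length are illustrative assumptions standing in for the entries in config.json:

import numpy as np

# Hypothetical stand-in for the per-version offsets stored in config.json.
VERSION_OFFSET = {1: 2, 3: 4, 4: 4}

def group_table_rows(table_bytes: bytes, version: int, item_parts_length: int) -> np.ndarray:
    # Each row holds item_parts_length int16 values plus a few leading bytes;
    # the number of leading bytes (the offset) depends on the trf version.
    offset = VERSION_OFFSET[version]
    line_grouping = offset + 2 * item_parts_length

    n_rows = len(table_bytes) // line_grouping
    rows = np.frombuffer(table_bytes[: n_rows * line_grouping], dtype=np.uint8)
    return rows.reshape(n_rows, line_grouping)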

mguerrajordao and others added 3 commits November 2, 2022 00:36
…nd trf table creation modified. run tests and try to maintain the interface on trf2pandas
@SimonBiggs SimonBiggs marked this pull request as draft November 18, 2022 04:31
Member

@SimonBiggs SimonBiggs left a comment


Just a few changes to try and fix the "clean" tests

Collaborator

@sjswerdloff sjswerdloff left a comment


LGTM.
When I merged this into my draft PR/branch on my local system and incorporated the missing items in the dict (see the comment in the review), the trf to-csv run went to completion and the data looked about right (short of knowing what the items are supposed to contain).
This is a huge step forward for TRF decoding!

"2537_220": "Y1 Leaf 78/Positional Error (mm)",
"2538_220": "Y1 Leaf 79/Positional Error (mm)",
"2539_220": "Y1 Leaf 80/Positional Error (mm)",
"2170_111": "Mlc Status/Actual Value (None)"
Collaborator


Did you want to put this as the column name after the "Y1 Leaf 80/Positional Error" column name entry (below)?

Collaborator Author


Yes, I believe this is used on the latest version of the MR Linac (v4), but you can double-check against the version you have files from. I believe this 2170_111 is present. I agree with the suggestion below about filling the dictionary with unknowns, and then looking in the software later to complete config.json.

np.concatenate((timestamps[i], item_parts))
for i, item_parts in enumerate(item_part_values_data)
]
column_names = ["Timestamp Data"] + [
Collaborator


This will raise a KeyError when there is a value c in column_names_from_data that is not in column_names_from_dict. That's appropriate behaviour for Python, but...
One could either just add to the dictionary and hope for the best, or

column_names_from_dict_including_unknowns = dict(column_names_from_dict)
for c in column_names_from_data:
    if c not in column_names_from_dict_including_unknowns:
        column_names_from_dict_including_unknowns[c] = "Item: " + c
        print(f'"{c}": "Item: {c}",')

and emit the missing items before the exception is raised.
That would let the user/programmer temporarily add those items to the config.json file for themselves. Which (looking forward) is what I did (this code is great!) so I could decode some MR Linac TRF data.

Collaborator Author


This is a much better way to deal with it and avoid a runtime error. Since we have the list of item parts from the header, it can be completed later.

@SimonBiggs
Member

This is a huge step forward for TRF decoding!

👍 I absolutely agree! Some brilliant work here @mguerrajordao 🙂 🎉

@mguerrajordao
Collaborator Author

@sjswerdloff @SimonBiggs Hi both, happy that this small contribution is welcomed by you. I learned a lot from the previous code and just tried to offer my suggestions. It took a while peeking into the data in the header.
Also, the timestamps for each entry were a stroke of luck from trying to read the first bytes of each row and decoding them as different integer types. When the value looked like an epoch timestamp, and it was around the same time as the date in the header, that gave confidence.
Another difficulty was picking up somebody else's code and trying to add something without breaking it. I am still very much a rookie at collaborative coding, and pymedphys being such a mature library, please excuse any practices that do not conform to the norm.
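A rough sketch of the kind of check described above; the byte offset, endianness and epoch unit here are assumptions for illustration, not a statement of the actual trf row layout:

import numpy as np
import pandas as pd

def first_bytes_as_timestamp(row: bytes) -> pd.Timestamp:
    # Assumption: the first 8 bytes of a row hold a little-endian integer
    # epoch timestamp in milliseconds. If the decoded value lands close to
    # the date recorded in the header, that supports the interpretation.
    raw = np.frombuffer(row[:8], dtype="<i8").item()
    return pd.to_datetime(raw, unit="ms")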

@sjswerdloff
Collaborator

@mguerrajordao , there seem to be some regression test failures (with some of the sample TRF data that is stored on Zenodo for the purpose of regression tests).
Can you look into those?
Running the tests locally can be done with a command line (might be something like):
poetry run pymedphys dev tests --run-only-slow

@mguerrajordao
Collaborator Author

@sjswerdloff, yes I have noticed. I have discussed with @SimonBiggs maintaining the interfaces. I am guessing it could be down to naming in the dictionary; I will double-check. But it could also be due to the previous output having 4 columns as Unknown which are now set as the Timestamps. So I'll look one by one.
Does the test compare the actual decoded data as well?

@sjswerdloff
Collaborator

@mguerrajordao I am not familiar enough with the code and tests to say, but I imagine the values are being compared.
That's a big part of the value of regression testing...
It sounds like some change to the tests is appropriate given that you are decoding things that were not being decoded before.
I'll try to take a look at this later this week, but if you make edits to the test code and put that in the review, I'll make it a high priority to review that promptly (within 24 hours).

@mguerrajordao
Collaborator Author

@sjswerdloff I'll focus on the tests today and push any changes if I manage to.
I'll make sure that the regression tests pass first, or give justification for why any part is failing.

@mguerrajordao
Collaborator Author

Got one issue:

Original config.json has the linear scales as degrees (deg). Somehow in the past I must have corrected them to mm. I will roll back config.json to match the original and follow up on the next regression failure.

"Table Longitudinal/Scaled Actual (deg)",
"Table Longitudinal/Positional Error (deg)",
"Table Lateral/Scaled Actual (deg)",
"Table Lateral/Positional Error (deg)",
"Table Height/Scaled Actual (deg)",
"Table Height/Positional Error (deg)",

@mguerrajordao
Collaborator Author

Got another couple of issues in the value comparison:

  • On "Table Isocentric/Scaled Actual (deg)" there is a factor of 10 mismatch. I have corrected it and will investigate further.
  • On "Dose/Raw value (1/64th Mu)" we get negative values with the new method, which is obviously wrong. I hadn't noticed this before, but I believe it is due to the signed/unsigned int nature: the datatype is read as signed int16 for all the values in a row (which makes sense for most of the values, which are bipolar). However, I think we are reaching the counter limit on this particular item and an additional conversion needs to be made (the value is being processed as +/- 32768, i.e. 15 bits plus 1 sign bit). Will check further on how to convert it (see the sketch below).

@SimonBiggs
Member

It sounds like some change to the tests is appropriate given that you are decoding things that were not being decoded before.

One of the ideas was to do this in two or three PRs: have the first PR (this one) leave all the tests unchanged and all the results the same, then have a follow-up PR be allowed to adjust the baselines as well as update a range of items. In some scenarios being stuck in this approach would be quite painful, so we weren't planning on making it a requirement to the point of it being painful, only if it is reasonably achievable.

@SimonBiggs
Member

SimonBiggs commented Nov 21, 2022

Original config.json has the linear scales as degree (deg).

Yup, this was done to match the original Elekta TRF decoding tool, and done under the impression that that tool was going to stick around. So it had the job of matching the previous tool, even when that tool had errors.

But now, that tool is no longer available, and having these column labels be wrong for compatibility is no longer the right choice I believe.

Still, let's have this PR remain consistent with the current baselines, and then a follow-up PR can make those corrections to the code and the baseline datasets.

@mguerrajordao
Collaborator Author

@SimonBiggs
I am trying to convert whatever is necessary back so the regression tests pass. Working on it. I'll have to modify/add the last conversion step with some individual conversions in order to match the original dataset. Then we can move forward from there.

@sjswerdloff
Collaborator

But now, that tool is no longer available, and having these column labels be wrong for compatibility is no longer the right choice I believe.

I was under the impression the tool was still available, but I have no idea what a clinical site has to do to get it.
Which suggests that one might want to provide some mechanism for retaining that compatibility.
On the other hand, having someone hand edit their CSV or have an outboard conversion for the labelling isn't that big a deal.

@mguerrajordao
Collaborator Author

Corrected the difference on "Dose/Raw value (1/64th Mu)" by applying an offset... Now the test passes up to trf version 1 files (where 350 columns are expected):

dataframe["Dose/Raw value (1/64th Mu)"] = dataframe[
"Dose/Raw value (1/64th Mu)"
].apply(lambda x: x + 2**16 if x < 0 else x)

However it is now failing on version 3 (Int 4). The converted dataframe has 351 columns (+1 extra column due to the timestamp), whereas the reference dataframe has 4 extra columns (unknown 1 to 4). I will shim this part by splitting the int64 timestamp back into 4 * int16 values so we can continue (see the sketch below)...
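A minimal sketch of that shim, assuming the "Timestamp Data" column holds int64 values; the unknown1..unknown4 column names are illustrative stand-ins for whatever the reference baseline uses:

import numpy as np
import pandas as pd

def split_timestamp_column(dataframe: pd.DataFrame) -> pd.DataFrame:
    # Reinterpret each int64 timestamp as four consecutive int16 values so
    # the column layout matches the previous baseline's four unknown columns.
    raw = dataframe["Timestamp Data"].to_numpy(dtype=np.int64)
    parts = raw.view(np.int16).reshape(-1, 4)
    dataframe = dataframe.drop(columns=["Timestamp Data"])
    for i in range(4):
        dataframe[f"unknown{i + 1}"] = parts[:, i]
    return dataframe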

@mguerrajordao
Collaborator Author

@SimonBiggs @sjswerdloff
Some progress...

After shimming by splitting the int64 into 4 * int16 unknowns I was able to pass the test 😄.
I will push the code to the branch. I've omitted some warnings about the pylinac version and the virtualenv I have.
Somehow poetry is clashing with another flask application I have (managed with pipenv for its virtual environment), but I think it can be safely ignored.

Running pytest with cwd set to: /mnt/d/programming/pymedphys/lib/pymedphys

Test session starts (platform: linux, Python 3.10.8, pytest 7.1.3, pytest-sugar 0.9.5)
rootdir: /mnt/d/programming/pymedphys/lib/pymedphys
plugins: rerunfailures-10.2, hypothesis-5.49.0, sugar-0.9.5, anyio-3.6.1, Faker-15.3.2
collecting ...
 tests/trf/test_decode.py ✓                                100% ██████████
============================== warnings summary ==============================

Results (36.33s):
       1 passed
     185 deselected
poetry run pymedphys dev tests -k test_decode --slow  34.12s user 3.89s system 94% cpu 40.038 total
Running pytest with cwd set to: /mnt/d/programming/pymedphys/lib/pymedphys

Test session starts (platform: linux, Python 3.10.8, pytest 7.1.3, pytest-sugar 0.9.5)
rootdir: /mnt/d/programming/pymedphys/lib/pymedphys
plugins: rerunfailures-10.2, hypothesis-5.49.0, sugar-0.9.5, anyio-3.6.1, Faker-15.3.2
collecting ...
 tests/trf/test_date_convert.py ✓                          100% ██████████
============================== warnings summary ==============================

Results (9.13s):
       1 passed
     185 deselected
poetry run pymedphys dev tests -k test_date_convert  7.55s user 3.59s system 88% cpu 12.645 total

@SimonBiggs
Member

Beautiful stuff @mguerrajordao :)

…ataframe columns. table.py: amended the factor of 10 on Table Isocentric, corrected from bipolar signed int16 on Dose/Raw value (1/64th of Mu) and split and dropped the int64 Timestamp Data column into 4 int16 named unknow1-4 columns
…o trf-robustness

merging with differences for pytest pass.
Member

@SimonBiggs SimonBiggs left a comment


This is absolutely beautiful stuff @mguerrajordao. It's brilliant to see TRF be taken into the next era of Integrity :).

Thank you so much @mguerrajordao. I have made a few stylistic-type comments. This is almost ready for a merge :)

@SimonBiggs
Member

SimonBiggs commented Nov 21, 2022

Also, heads up @mguerrajordao, running the following in the CI suite on GitHub actions fails for this PR:

poetry run pymedphys dev lint

So, it would be worth running that locally and making sure it passes.

@SimonBiggs
Member

Hi Marcelo,

Tried to send you a private email thanking you and asking for feedback, but it was blocked with the following message:

The response from the remote server was:
550 Administrative prohibition - envelope blocked - https://community.mimecast.com/docs/DOC-1369#550 [t1PY0WDfN5qt9Mhz9I8JzA.uk251]

@mguerrajordao
Collaborator Author

Hi Simon,
I don't know why. I checked Mimecast for flagged messages, and I can see your attempted email. It says the envelope was rejected, with some details. We can use my private email.

@mguerrajordao
Collaborator Author

Regarding the lint check (poetry run pymedphys dev lint):
It complains about some of the code, but then fails after scoring 9.98/10. I will investigate.

Linting with cwd set to:
    /mnt/d/programming/pymedphys

************* Module lib.pymedphys._streamlit.apps.metersetmap._trf
lib/pymedphys/_streamlit/apps/metersetmap/_trf.py:299:22: E1120: No value for argument 'header_table_contents' in function call (no-value-for-parameter)
************* Module lib.pymedphys._trf.decode.detect
lib/pymedphys/_trf/decode/detect.py:34:16: E1123: Unexpected keyword argument 'input_line_grouping' in function call (unexpected-keyword-arg)
lib/pymedphys/_trf/decode/detect.py:34:16: E1123: Unexpected keyword argument 'input_linac_state_codes_column' in function call (unexpected-keyword-arg)
lib/pymedphys/_trf/decode/detect.py:34:16: E1123: Unexpected keyword argument 'reference_state_code_keys' in function call (unexpected-keyword-arg)
lib/pymedphys/_trf/decode/detect.py:34:16: E1120: No value for argument 'version' in function call (no-value-for-parameter)
lib/pymedphys/_trf/decode/detect.py:34:16: E1120: No value for argument 'item_parts_length' in function call (no-value-for-parameter)
lib/pymedphys/_trf/decode/detect.py:34:16: E1120: No value for argument 'item_parts' in function call (no-value-for-parameter)

-----------------------------------
Your code has been rated at 9.98/10

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/mnt/d/programming/pymedphys/lib/pymedphys/cli/__init__.py", line 142, in pymedphys_cli
    args.func(args, remaining)
  File "/mnt/d/programming/pymedphys/lib/pymedphys/_dev/tests.py", line 200, in run_pylint
    subprocess.check_call(command)
  File "/usr/lib/python3.10/subprocess.py", line 369, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['/home/mguerrajordao/.cache/pypoetry/virtualenvs/pymedphys-l94ZSRXe-py3.10/bin/python', '-m', 'pylint', 'pymedphys', '--rcfile=/mnt/d/programming/pymedphys/lib/pymedphys/.pylintrc']' returned non-zero exit status 2.
poetry run pymedphys dev lint  81.54s user 14.68s system 48% cpu 3:16.68 total

…ded for the processing on table.py. run pytest will all tests marked as slow. all pass
Member

@SimonBiggs SimonBiggs left a comment


Hi @mguerrajordao,

Beautiful stuff :). I have a few remaining "nit-pick" comments, which you are free to ignore if you choose. This has my approval :).

Before merging, the changelog file at the top of the repo needs to be updated.

@@ -12,197 +12,83 @@
# See the License for the specific language governing permissions and
# limitations under the License.

from typing import List
# from typing import List
Member


Suggested change (remove this leftover commented-out line):
# from typing import List

decoded_rows, column_adjustment_key = decode_rows(trf_table_contents)
version = header_table_contents["version"].values[0].astype(int)
item_parts_length = header_table_contents["item_parts_length"].values[0].astype(int)
item_parts = header_table_contents["item_parts"].values[0]
Member


Would it be worth doing this header type conversion right after the header has been parsed? As in, type fixing of the header contents is likely the job of the "decode header" function, not the "decode table" function.

Collaborator Author


@SimonBiggs I'm not sure I understand. Do you mean passing the variables (version, item_parts_length, item_parts) into the function decode_trf_table, instead of passing header_table_contents?

Member


I guess where I am confused is with the following piece:

.astype(int)

My thought is that when the header_table_contents is created, what if it was made int right away? This would mean that it wouldn't be the job of downstream functions to convert it to int.

I do actually believe you're doing this already, at least for item_parts_length:

item_parts_length = int(len(item_parts))

Member


And potentially .astype(int) isn't needed for version either?

version = np.frombuffer(groups[4][8:12], dtype=np.int32).item()

Member


I won't block merge on this though. Really just a "nit-pick". I'll merge after the tests pass, and you can opt to make this change in the next PR if you want to.

@@ -57,7 +57,8 @@ def trf2pandas(trf: path_or_binary_file) -> Tuple["pd.DataFrame", "pd.DataFrame"

trf_header_contents, trf_table_contents = split_into_header_table(trf_contents)
header_dataframe = header_as_dataframe(trf_header_contents)
table_dataframe = decode_trf_table(trf_table_contents)
# table_dataframe = decode_trf_table(trf_table_contents)
Member


Suggested change (remove this commented-out line):
# table_dataframe = decode_trf_table(trf_table_contents)

mguerrajordao and others added 2 commits November 22, 2022 09:50
Co-authored-by: Simon Biggs <simon.biggs@radiotherapy.ai>
Co-authored-by: Simon Biggs <simon.biggs@radiotherapy.ai>
@mguerrajordao
Collaborator Author

@SimonBiggs Please see comments. Thanks for merging.
Just not too sure on:

Would it be worth doing this header type conversion back right after the header has been parsed. As in type fixing of the header contents is likely the job of the "decode header" function, not the "decode table" function.

@SimonBiggs SimonBiggs merged commit 8b9215d into main Nov 22, 2022
@SimonBiggs
Member

SimonBiggs commented Nov 22, 2022

Amazing stuff @mguerrajordao! 🎉 Congrats on becoming a PyMedPhys contributor 🙂. Absolutely wonderful to have your contribution 🙂

Next thing to add to your to-do list is to add yourself to the contributors list 🙂. Put yourself below Derek and above Jake:

https://github.com/pymedphys/pymedphys/blame/main/README.rst#L154

@SimonBiggs SimonBiggs deleted the trf-robustness branch November 22, 2022 02:38