MRG: Use mffpy for EGI MFF event reading #13932
Hi @scott-huberty! Here is the first micro-PR for the event reader refactor, exactly as we discussed yesterday. Upstream Update: Failing Checks:
| # mffpy.Reader for locating the Events.xml files inside the MFF. | ||
| _soft_import("mffpy", "reading EGI MFF data") | ||
| _soft_import("defusedxml", "reading EGI MFF data") | ||
| import defusedxml.ElementTree as DET |
Is there a reason you're using defusedxml instead of mffpy.XML.from_file()?
Thanks so much for taking a look! There are two reasons for this:
- MNE-Python has a strict internal policy requiring defusedxml for all XML parsing, to protect against XML-based attacks.
- Because of the 9-digit fractional timestamp bug we discussed in issue BEL-Public/mffpy#138, calling mffpy.XML directly crashes the CI right now. I'm using defusedxml to route the parsing through a temporary shim in mne/fixes.py until your PR gets merged and released!
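For readers following along, here is a minimal sketch of the kind of workaround the 9-digit fraction calls for. It is not the actual shim in mne/fixes.py: the helper name `parse_mff_timestamp` is hypothetical, and truncating (rather than rounding) to microseconds is an assumption. It also assumes every timestamp carries a fractional part and a UTC offset.

```python
import re
from datetime import datetime

# Hypothetical helper for illustration only; not the shim in mne/fixes.py.
# Python's %f directive accepts at most 6 fractional digits, so any digits
# beyond microsecond precision are dropped before parsing.
_EXTRA_FRACTION = re.compile(r"(\.\d{6})\d+")


def parse_mff_timestamp(text):
    """Parse an ISO 8601 timestamp, truncating the fraction to microseconds."""
    trimmed = _EXTRA_FRACTION.sub(r"\1", text, count=1)
    # Assumes a fractional part and a UTC offset are always present.
    return datetime.strptime(trimmed, "%Y-%m-%dT%H:%M:%S.%f%z")


start = parse_mff_timestamp("2020-01-01T12:00:00.000000000-08:00")
event = parse_mff_timestamp("2020-01-01T12:00:01.123456789-08:00")
print((event - start).total_seconds())
```

Truncation loses sub-microsecond detail, which should not matter for event onsets expressed in seconds, but that is a judgment call for the maintainers.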
The struggle is real, as we say. This might be something more for @scott-huberty and @drammock to ponder beyond this GSoC.
The mffpy Reader() that is currently used for reading epochs or evoked files already uses mffpy's XML.from_file(), at least for getting header information and events (mne/io/egi/egimff.py). So there's already some exposure to another XML reader in the current codebase.
The advantage of keeping the mffpy way of doing things is how easy some things become: typed objects (.sensors, .epochs, .events), recover=True for malformed XML, and the fact that Reader() already uses it.
for evfile in sorted(glob(op.join(input_fname, "Events_*.xml"))):
    track = XML.from_file(evfile)
    for event in track.events:
        code = event.get("code")
        if code is None:
            continue
        ...
    ...
I suppose we could have a conversation with BEL and other mffpy maintainers about switching the current lxml backend to defusedxml.
My thesis is currently: "if the point of this project is to use mffpy to parse MFF files, then you should use mffpy's way of doing things, and if those need changing, then we should probably change them upstream". Perhaps the compromise could be to use the XML.* functions for this, with the goal of changing things upstream in mffpy.
Hmm, okay: faster than expected, I have a mostly working defusedxml backend for mffpy. I might be able to crank this out in a week or so.
| files_list = [] | ||
| tracks = [] | ||
| for xml_name in files_list: | ||
| if not xml_name.lower().endswith(".xml"): |
This will have you parsing all XML files instead of just event ones. Probably unnecessary file IO.
Great catch here! You are completely right: parsing info.xml and the other non-event files here is totally unnecessary I/O. I will update this loop to explicitly filter for the event files (e.g., startswith("Events_") or similar) in my next commit. Thank you!
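A rough sketch of what that filter could look like, using only the standard library. The helper name `is_event_file` and the sample file names are illustrative, and matching case-insensitively is an assumption carried over from the `.lower()` call in the diff, not something the thread settles.

```python
import os.path as op


def is_event_file(name):
    """Return True for names shaped like Events_*.xml (case-insensitive)."""
    base = op.basename(name).lower()
    return base.startswith("events_") and base.endswith(".xml")


# Illustrative file listing; real MFF contents will differ.
files_list = [
    "info.xml",
    "Events_ECI.xml",
    "signal1.bin",
    "epochs.xml",
    "Events_DIN1.xml",
]

# Only the event tracks are kept, so the other XML files are never parsed.
event_files = sorted(f for f in files_list if is_event_file(f))
print(event_files)
```

Filtering on the name before opening anything keeps the loop from touching files it will discard anyway.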
| if event_start is None: | ||
| continue | ||
| start_sec = (event_start - start_time).total_seconds() | ||
| code_str = ev.get("code", "") |
There are probably a couple of checks you could add:
- Codes are supposed to be 4 characters (e.g. "STIM"); if they are not, then you're probably not reading a real MFF file.
- You probably also want to read in the labels for use if/when you add annotations.
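As a sketch of both checks, assuming each event is a plain dict with a "code" key and an optional "label" key (mirroring the `ev.get("code", "")` call in the diff). The function name `collect_event_labels` is hypothetical, and returning the rejected codes instead of raising or warning is just one possible design.

```python
def collect_event_labels(events):
    """Map each 4-character event code to its first label.

    Codes that are missing or not exactly 4 characters long are returned
    separately so the caller can decide whether to warn or raise.
    """
    labels = {}
    rejected = []
    for ev in events:
        code = ev.get("code") or ""
        if len(code) != 4:
            rejected.append(code)
            continue
        # Keep the first label seen for a code; later duplicates are ignored.
        labels.setdefault(code, ev.get("label") or "")
    return labels, rejected


# Illustrative events; the dict layout is an assumption, not mffpy's schema.
sample = [
    {"code": "STIM", "label": "stimulus onset"},
    {"code": "DIN1"},
    {"code": "toolong", "label": "bad"},
    {"label": "no code"},
]
labels, rejected = collect_event_labels(sample)
print(labels)
print(rejected)
```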
These are really great suggestions. Because this is the very first step of my GSoC project, my mentors requested that I keep this initial micro-PR strictly confined to 1:1 functional parity with MNE's legacy reader. The legacy reader didn't enforce the 4-character limit or map the labels, so I want to avoid expanding the scope just yet.
However, adding proper annotations is on my roadmap for Phase 2 here #13926, so I will definitely be referencing those labels when we get there!
Thx for taking a look at this @pmolfese!!
Reference issue (if any)
None.
What does this implement/fix?
This replaces the internal EGI MFF event-reading path with mffpy and keeps the existing RawMff event contract intact. It also adds a small compatibility shim in mne/fixes.py for mffpy timestamp parsing so real-world MFF files with nanosecond fractional seconds can still be read correctly.
Additional information
I tested this change against the EGI MFF test module with the current testing dataset, and the full mne/io/egi/tests/test_egi.py suite passes locally.
This branch intentionally keeps the scope narrow. The separate changelog fragment should use the final PR number once the PR is opened on GitHub.
AI disclosure:
I used GitHub Copilot to help draft the PR text and review the scope.