Skip to content

MRG: Use mffpy for EGI MFF event reading#13932

Open
PragnyaKhandelwal wants to merge 7 commits into
mne-tools:mainfrom
PragnyaKhandelwal:egi-mff-events
Open

MRG: Use mffpy for EGI MFF event reading#13932
PragnyaKhandelwal wants to merge 7 commits into
mne-tools:mainfrom
PragnyaKhandelwal:egi-mff-events

Conversation

@PragnyaKhandelwal
Copy link
Copy Markdown
Contributor

Reference issue (if any)

None.

What does this implement/fix?

This replaces the internal EGI MFF event-reading path with mffpy and keeps the existing RawMff event contract intact. It also adds a small compatibility shim in mne/fixes.py for mffpy timestamp parsing so real-world MFF files with nanosecond fractional seconds can still be read correctly.

Additional information

I tested this change against the EGI MFF test module with the current testing dataset, and the full mne/io/egi/tests/test_egi.py suite passes locally.

This branch intentionally keeps the scope narrow. The separate changelog fragment should use the final PR number once the PR is opened on GitHub.

AI disclosure:
I used GitHub Copilot to help draft the PR text and review the scope.

@PragnyaKhandelwal
Copy link
Copy Markdown
Contributor Author

Hi @scott-huberty! Here is the first micro-PR for the event reader refactor, exactly as we discussed yesterday.
Quick status update on what this PR includes:
Refactored the EGI MFF event-reading path to use mffpy while keeping the existing RawMff event contract intact.
Removed legacy unused event helper code to address the Vulture dead-code warnings (commit b10ea86).
Added a temporary compatibility shim in mne/fixes.py for 9-digit fractional-second beginTime values (using defusedxml).
Kept the scope intentionally narrow to event-reading internals only.

Upstream Update:
I raised the issue here BEL-Public/mffpy#138 regarding the timestamp bug. The maintainer got back to me immediately and confirmed their pending PR #133 fixes this case! Once we review this PR and figure out the test failures, I will update the TODO comment in our shim to explicitly track their PR

Failing Checks:
The style checks are green now, but we are hitting some functional test failures in the CI. I am going to look into the logs to track down the mismatch, but if you have a minute to look at the failing CI checks and see if it's something obvious that I'm missing, I'd really appreciate the help!

@PragnyaKhandelwal PragnyaKhandelwal marked this pull request as ready for review May 29, 2026 13:55
Comment thread mne/io/egi/events.py
# mffpy.Reader for locating the Events.xml files inside the MFF.
_soft_import("mffpy", "reading EGI MFF data")
_soft_import("defusedxml", "reading EGI MFF data")
import defusedxml.ElementTree as DET
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason you're using difusedxml instead of mffpy.XML.from_file( )?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks so much for taking a look! There are two reasons for this:

MNE-Python has a strict internal policy requiring defusedxml for all XML parsing to protect against XML vulnerability attacks.

Because of the 9-digit fractional timestamp bug we discussed in issue BEL-Public/mffpy#138 calling mffpy.XML directly crashes the CI right now. I'm using defusedxml to route the parsing through a temporary shim in mne/fixes.py until your PR gets merged and released!

Copy link
Copy Markdown
Contributor

@pmolfese pmolfese May 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The struggle is real as we say. This might be something more for @scott-huberty and @drammock to ponder beyond this GSOC.

The MFF reader in mffpy currently used for reading epochs or evoked files is via Reader() is already using the mffpy XML.from_file() at least for getting header information and events (mne/io/egi/egimff.py). So there's already some exposure to another XML reader in the current codebase.

The pros of keeping the mffpy way of things is just how easy some things are like returning typed objects (.sensors, .epochs, .events), recover=True for malformed XML, and the fact that Reader() is already using it.

for evfile in sorted(glob(op.join(input_fname, "Events_*.xml"))):
	track = XML.from_file(evfile)
	for event in track.events:
		code = event.get("code")
		if code is None:
			continue
...
...

I suppose we could have a conversation with BEL and other mffpy maintainers about switching the current lxml backend to defusedxml.

My thesis is currently: "if the point of this project is to use mffpy to parse MFF files, then you should use mffpy's way of doing things and if those need changing, then we should probably change those upstream". Perhaps the compromise could be using the XML.* functions for this with the goal of changing things upstream in mffpy.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm - okay faster than expected I have a mostly working defusedxml backend for mffpy. Might be able to crank this out in a week or so.

Comment thread mne/io/egi/events.py Outdated
files_list = []
tracks = []
for xml_name in files_list:
if not xml_name.lower().endswith(".xml"):
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will have you parsing all XML files instead of just event ones. Probably unnecessary file IO.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great catch here! You are completely right—parsing info.xml and the others here is totally unnecessary I/O. I will update this loop to explicitly filter for the event files (e.g., startswith("Events_") or similar) in my next commit. Thank you!

Comment thread mne/io/egi/events.py
if event_start is None:
continue
start_sec = (event_start - start_time).total_seconds()
code_str = ev.get("code", "")
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are probably a couple checks you could add:

  1. codes are supposed to be 4 characters (e.g. "STIM") and if not, then you're probably not reading a real MFF file
  2. You probably also want to read in the labels for use if/when you add annotations

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are really great suggestions. Because this is the very first step of my GSoC project, my mentors requested that I keep this initial micro-PR strictly confined to 1:1 functional parity with MNE's legacy reader. The legacy reader didn't enforce the 4-character limit or map the labels, so I want to avoid expanding the scope just yet.
However, adding proper annotations is on my roadmap for Phase 2 here #13926, so I will definitely be referencing those labels when we get there!

@scott-huberty
Copy link
Copy Markdown
Contributor

Thx for taking a look at this @pmolfese !!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants