Add E-mail and MIME lexer #1246

Merged
merged 12 commits into from
Nov 16, 2019

Contributor

@tzing tzing commented Oct 29, 2019

This PR adds lexers for raw E-mail source and MIME data.

It was originally designed for E-mail only, but E-mail commonly carries MIME content, so a minimal MIME lexer is provided as well. It handles nested multipart messages but processes only a few headers.
(Currently it supports only Content-Type and Content-Transfer-Encoding, which are needed to recognize the data in the payload.)

The multipart/form-data format, which is commonly used in HTTP transfers, is not handled.
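
For readers unfamiliar with the input format, here is a minimal sketch of a nested multipart message carrying the two headers mentioned above. It uses only Python's standard `email` package, not the lexers from this PR; the addresses, subject, and the `build_sample` helper are made up for illustration.

```python
import email
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def build_sample() -> str:
    """Return raw source for a multipart/mixed message that nests a
    multipart/alternative part (hypothetical sample data)."""
    outer = MIMEMultipart("mixed")
    outer["Subject"] = "Nested multipart sample"
    outer["From"] = "alice@example.com"
    outer["To"] = "bob@example.com"

    inner = MIMEMultipart("alternative")
    # Each leaf part gets its own Content-Type and
    # Content-Transfer-Encoding header.
    inner.attach(MIMEText("plain body", "plain", "us-ascii"))
    inner.attach(MIMEText("<p>html body</p>", "html", "utf-8"))
    outer.attach(inner)
    return outer.as_string()


raw = build_sample()

# Parse the raw text back and list the content type of every part,
# outermost first.
types = [part.get_content_type() for part in email.message_from_string(raw).walk()]
print(raw)
print(types)
```

Raw text of this shape (top-level headers, then boundary-delimited parts that may themselves be multipart) is the kind of input the description refers to as nested multipart.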

@Anteru Anteru self-assigned this Nov 10, 2019
@Anteru Anteru merged commit bd306cf into pygments:master Nov 16, 2019
Collaborator

Anteru commented Nov 16, 2019

Queued for the next release, thanks!

@Anteru Anteru mentioned this pull request Feb 6, 2021
if not body.strip():
    return 0.1

invalid_headers = MIMELexer.tokens["header"].sub("", header)
Contributor

This line is raising an exception. What's the intention of sub("", header)? Is that meant to be re.sub or just a typo?

Contributor Author

I guess this line was meant to remove every line that could be a valid header.
It probably treats MIMELexer.tokens["header"] as a compiled re pattern by mistake.

Contributor

Oh, I see MIMELexer.tokens["header"] is a list of tuples that contain regex strings. Should it loop over that list and return any regex matches against header?
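
A rough sketch of the loop suggested above, under stated assumptions: `HEADER_RULES` is a hypothetical stand-in shaped like a lexer state (a list of tuples whose first element is a regex string), not the actual contents of `MIMELexer.tokens["header"]`, and `strip_valid_headers` is an illustrative helper name. Calling `.sub` on the list itself fails because a Python list has no such method; each regex string has to be compiled and applied in turn.

```python
import re

# Hypothetical rules, NOT copied from the real lexer: each entry is
# (regex string, label). Continuation lines start with a space or tab.
HEADER_RULES = [
    (r"^content-type:.*\n(?:[ \t].*\n)*", "content-type"),
    (r"^content-transfer-encoding:.*\n", "content-transfer-encoding"),
    (r"^[\w-]+:.*\n(?:[ \t].*\n)*", "other-header"),
]


def strip_valid_headers(header: str) -> str:
    """Remove every line matched by one of the header rules and
    return whatever is left (the candidate invalid lines)."""
    remaining = header
    for rule in HEADER_RULES:
        # rule[0] is a plain string, so compile it before calling sub().
        pattern = re.compile(rule[0], re.IGNORECASE | re.MULTILINE)
        remaining = pattern.sub("", remaining)
    return remaining


sample = (
    "Content-Type: text/plain;\n"
    " charset=utf-8\n"
    "Content-Transfer-Encoding: base64\n"
    "not a header line\n"
)
leftover = strip_valid_headers(sample)
print(repr(leftover))
```

Real lexer states can also contain entries that are not plain tuples, such as `include(...)`, which this sketch does not handle. A generic catch-all rule like the third one also shows why the result may be too permissive to be a useful heuristic.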

Member

I've removed this analyse_text for now. It is not easy to find a good implementation that is not too broad for such a generic format as MIME.

Anyone: If you have one, please open a PR.
