[eml loader] The email loader doesn't import email.parser #2312

Closed
frosencrantz opened this issue Feb 13, 2024 · 8 comments

@frosencrantz
Contributor

frosencrantz commented Feb 13, 2024

Small description
The eml loader uses Python's built-in email.parser module, but does not import it.

Expected result
Load and read MIME messages.

Actual result with screenshot
If you get an unexpected error, please include the full stack trace that you get with Ctrl-E.

Traceback (most recent call last):
  File "/usr/lib/python3.9/site-packages/visidata/threads.py", line 220, in _toplevelTryFunc
    t.status = func(*args, **kwargs)
  File "/usr/lib/python3.9/site-packages/visidata/sheets.py", line 260, in reload
    self.loader()
  File "/usr/lib/python3.9/site-packages/visidata/sheets.py", line 285, in loader
    for r in self.iterload():
  File "/usr/lib/python3.9/site-packages/visidata/loaders/eml.py", line 18, in iterload
    parser = email.parser.Parser()
AttributeError: module 'email' has no attribute 'parser'

Steps to reproduce with sample data and a .vd

vd -f eml foo.mhtml

EDIT: add example command:

vd -z -f eml https://raw.githubusercontent.com/testimio/mhtml-parser/master/demos/wikipedia.mhtml

You do not even need a valid email or MIME file to trigger this error: because import email.parser is missing, the loader fails as soon as it tries to parse the input.

Additional context
Please include the version of VisiData and Python. Latest develop branch. Python 3.9.2

I was able to save a webpage as an MHTML file via my web browser. Since this is a MIME format, I thought VisiData could read it. There is a missing import.

Even with the proper import in place, there is another bug that causes the code to not properly open or save individual file content from within the file. I only tried using an mhtml file.

@frosencrantz
Contributor Author

frosencrantz commented Feb 15, 2024

Also related: this line refers to a global, options, that does not exist. Maybe it should be vd.options?

https://github.com/saulpw/visidata/blob/develop/visidata/pyobj.py#L165

@frosencrantz
Contributor Author

According to https://en.wikipedia.org/wiki/MHTML, the eml and mhtml file formats are the same. Feature request: add mhtml as an alias of eml.

Searching GitHub for sample mhtml files https://github.com/search?q=path%3A%2F%5C.mhtml%24%2F&type=code , I found: https://github.com/testimio/mhtml-parser/tree/master/demos
Ex:

Opening these files appears to be broken: even just using the x command to extract a file produces errors.

vd -z -f eml https://raw.githubusercontent.com/testimio/mhtml-parser/master/demos/wikipedia.mhtml
Traceback (most recent call last):
  File "/usr/lib/python3.9/site-packages/visidata/basesheet.py", line 211, in execCommand
    escaped = super().execCommand2(cmd, vdglobals=vdglobals)
  File "/usr/lib/python3.9/site-packages/visidata/extensible.py", line 79, in wrappedfunc
    r = oldfunc(*args, **kwargs)
  File "/usr/lib/python3.9/site-packages/visidata/basesheet.py", line 76, in execCommand2
    exec(code, vdglobals, LazyChainMap(vd, self))
  File "extract-part", line 1, in <module>
    'VisiData: a curses interface for exploring and arranging tabular data'
  File "/usr/lib/python3.9/site-packages/visidata/_open.py", line 23, in inputPath
    return Path(vd.inputFilename(*args, **kwargs))
  File "/usr/lib/python3.9/site-packages/visidata/_open.py", line 18, in inputFilename
    return vd.input(prompt, type="filename", *args, completer=completer, **kwargs).strip()
  File "/usr/lib/python3.9/site-packages/visidata/_input.py", line 543, in input
    ret = vd.editText(y, promptlen, w=w,
  File "/usr/lib/python3.9/site-packages/visidata/_input.py", line 376, in editText
    v = type(starting_value)(v)
TypeError: NoneType takes no arguments

It looks like, at https://github.com/saulpw/visidata/blob/develop/visidata/loaders/eml.py#L49, cursorRow.get_filename() is returning None.
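To illustrate why get_filename() can return None, here is a small standalone sketch using only the standard library. A MIME part that has no filename parameter in Content-Disposition (and no name parameter in Content-Type) yields the failobj argument, which defaults to None. The sample message, the part_names helper, and the "partN" fallback naming are all hypothetical; this is not the VisiData code, just one way a default name could be supplied.

```python
import email.parser

# Hypothetical two-part message: the first part has no filename, the second does.
RAW = "\n".join([
    "MIME-Version: 1.0",
    'Content-Type: multipart/mixed; boundary="XYZ"',
    "",
    "--XYZ",
    "Content-Type: text/plain",
    "",
    "hello",
    "--XYZ",
    "Content-Type: text/plain",
    'Content-Disposition: attachment; filename="notes.txt"',
    "",
    "attached",
    "--XYZ--",
    "",
])


def part_names(msg):
    """Return one name per leaf part, falling back to 'partN' when none is declared."""
    leaves = [p for p in msg.walk() if not p.is_multipart()]
    # get_filename(failobj=...) returns failobj when the part declares no filename.
    return [p.get_filename(failobj=f"part{i}") for i, p in enumerate(leaves)]


msg = email.parser.Parser().parsestr(RAW)
names = part_names(msg)
print(names)
```

With a fallback like this, a prompt that expects a string starting value would never receive None.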

@saulpw
Owner

saulpw commented Feb 18, 2024

There is an import email, and it works to use email.parser in Python 3.10. We can do import email.parser to make this work in 3.9, and I don't think there's any downside.
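As a rough sketch of the fix described above: importing the submodule explicitly makes email.parser available no matter whether some other module happened to load it first. The parse_message helper and the sample text are hypothetical and are not taken from eml.py.

```python
import email.parser  # explicit submodule import; `import email` alone may not bind email.parser


def parse_message(text):
    """Parse RFC 822-style text into an email.message.Message."""
    parser = email.parser.Parser()
    return parser.parsestr(text)


# Hypothetical sample input, not taken from the issue.
SAMPLE = "Subject: hello\nFrom: a@example.com\n\nbody text\n"

msg = parse_message(SAMPLE)
print(msg["Subject"])
```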

@saulpw
Owner

saulpw commented Feb 18, 2024

I fixed the import and the other obvious error. I don't know about eml vs mhtml at the moment; I'll have to look into it later.

@frosencrantz
Contributor Author

frosencrantz commented Feb 19, 2024

Thanks for the fix Saul!

I have some additional comments to show how to reproduce the other stack trace I was seeing.

GitHub has .eml files: https://github.com/search?q=path%3A%2F%5C.eml%24%2F&type=code

Here are a few:

I still see the error when using the x command on entries without a filename. It would be useful if diving into a row opened that part in VisiData, so you could read the HTML files or other data files the message contains. This is my use case: I have an mhtml file containing an HTML file with a table that I would like to view with VisiData.

Traceback (most recent call last):
  File "/usr/lib/python3.9/site-packages/visidata/basesheet.py", line 211, in execCommand
    escaped = super().execCommand2(cmd, vdglobals=vdglobals)
  File "/usr/lib/python3.9/site-packages/visidata/extensible.py", line 79, in wrappedfunc
    r = oldfunc(*args, **kwargs)
  File "/usr/lib/python3.9/site-packages/visidata/basesheet.py", line 76, in execCommand2
    exec(code, vdglobals, LazyChainMap(vd, self))
  File "extract-part", line 1, in <module>
    'VisiData: a curses interface for exploring and arranging tabular data'
  File "/usr/lib/python3.9/site-packages/visidata/_open.py", line 23, in inputPath
    return Path(vd.inputFilename(*args, **kwargs))
  File "/usr/lib/python3.9/site-packages/visidata/_open.py", line 18, in inputFilename
    return vd.input(prompt, type="filename", *args, completer=completer, **kwargs).strip()
  File "/usr/lib/python3.9/site-packages/visidata/_input.py", line 543, in input
    ret = vd.editText(y, promptlen, w=w,
  File "/usr/lib/python3.9/site-packages/visidata/_input.py", line 376, in editText
    v = type(starting_value)(v)
TypeError: NoneType takes no arguments

https://asciinema.org/a/HENigRDqV7EOsvf6TM6uN1DAg

vd https://raw.githubusercontent.com/guardian/grid/c7ed800a97eeed21251a915d193683b7668a6377/dev/config/usages.eml https://raw.githubusercontent.com/phiresky/ripgrep-all/179f7cb5ceb63eee7de1abbdde2b44e7fa7d100e/exampledir/mail_nested.eml

The main comment from https://en.wikipedia.org/wiki/MHTML:

The .mhtml (Web archive) and .eml (email) filename extensions are interchangeable: either filename extension can be changed from one to the other.
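The interchangeability can be sketched with the standard library alone: an MHTML archive is a MIME multipart message, so the same email.parser that reads .eml files can walk its parts. The tiny archive below is a hypothetical hand-written sample (real browser output has more headers and parts), and the variable names are illustrative only.

```python
import email.parser

# Hypothetical, hand-written MHTML-style document with a single HTML part.
MHTML = "\n".join([
    "From: <Saved by a browser>",
    "Subject: Example page",
    "MIME-Version: 1.0",
    'Content-Type: multipart/related; type="text/html"; boundary="BOUNDARY"',
    "",
    "--BOUNDARY",
    "Content-Type: text/html",
    "Content-Transfer-Encoding: quoted-printable",
    "Content-Location: https://example.com/",
    "",
    "<p>caf=C3=A9</p>",
    "--BOUNDARY--",
    "",
])

msg = email.parser.Parser().parsestr(MHTML)

# Leaf parts are the saved resources; Content-Location records where each came from.
parts = [p for p in msg.walk() if not p.is_multipart()]
locations = [p["Content-Location"] for p in parts]

# decode=True undoes the Content-Transfer-Encoding and returns bytes.
html = parts[0].get_payload(decode=True).decode("utf-8")
print(msg.get_content_type(), locations, html)
```

Note that MHTML parts are typically identified by Content-Location rather than by a filename, which is consistent with get_filename() returning None for them.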

@frosencrantz
Contributor Author

I know Wikipedia is always correct. However, even if you don't believe that, the proof is in looking at the sample "mhtml" files on GitHub and seeing that the VisiData eml loader can read them.

@frosencrantz
Contributor Author

I’m going to close this bug and create two new issues, a FR and a bug based on the last comments from this issue. Thank you for fixing the original issue.
