Parse RTF attachments/bodies #26

jrideout · 2018-11-28T06:24:59Z

- Extract rtfbody (Extract RTF Bodies #30)
- Add CLI support (Add RTF output option to tnefparse #39)
- ~~Parse the rtf in some way, perhaps via an optional dependency.~~

petri · 2018-11-28T11:54:48Z

What does this mean exactly? The same as `tnefparse --htmlbody`, but for RTF bodies? I seem to remember from a long time ago that RTF is indeed embedded/wrapped in some funky way in TNEF...

jrideout · 2018-11-28T23:26:03Z

Exactly. We'll want to support `tnefparse --rtfbody`.

jrideout · 2018-11-28T23:29:42Z

The one thing I'm not certain about is whether we need to decompress the RTF, or whether the RTF data is valid even when compressed. https://github.com/delimitry/compressed_rtf seems to do what we need: it just decompresses the data without fully parsing it.

petri · 2018-11-29T09:39:17Z

Nice! That's a small dependency, well worth it I'd think.
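A minimal sketch of what decompression involves, assuming the Compressed RTF format from Microsoft's [MS-OXRTFCP] specification: a 16-byte header (COMPSIZE, RAWSIZE, a `LZFu`/`MELA` type tag, CRC) followed by LZ77-style data in which each control byte flags the next eight tokens as either literal bytes or 2-byte references into a 4096-byte dictionary. Because the payload interleaves control bytes and references, `LZFu` data cannot be used as RTF until it is expanded. `decompress_rtf` is a hypothetical helper, not tnefparse's or compressed_rtf's API, and it skips CRC verification, so the linked package is likely the safer choice in practice.

```python
import struct

# Dictionary prebuffer defined by [MS-OXRTFCP]: the first 207 bytes of the
# 4096-byte LZ dictionary hold this common RTF preamble.
INIT_DICT = (
    b"{\\rtf1\\ansi\\mac\\deff0\\deftab720{\\fonttbl;}{\\f0\\fnil \\froman "
    b"\\fswiss \\fmodern \\fscript \\fdecor MS Sans SerifSymbolArialTimes New "
    b"RomanCourier{\\colortbl\\red0\\green0\\blue0\r\n\\par "
    b"\\pard\\plain\\f0\\fs20\\b\\i\\u\\tab\\tx"
)
DICT_SIZE = 4096
assert len(INIT_DICT) == 207


def decompress_rtf(blob: bytes) -> bytes:
    """Hypothetical helper: decompress a PR_RTF_COMPRESSED-style blob.

    Header: little-endian uint32 COMPSIZE and RAWSIZE, then the 4-byte type
    tag (b"LZFu" or b"MELA"), then a CRC that this sketch does NOT verify.
    """
    if len(blob) < 16:
        raise ValueError("blob shorter than the 16-byte header")
    comp_size, raw_size = struct.unpack_from("<II", blob, 0)
    comp_type = blob[8:12]
    # COMPSIZE counts everything after the COMPSIZE field itself.
    data = blob[16:4 + comp_size]

    if comp_type == b"MELA":  # stored uncompressed
        return data[:raw_size]
    if comp_type != b"LZFu":
        raise ValueError(f"unknown compression type {comp_type!r}")

    dictionary = bytearray(DICT_SIZE)
    dictionary[: len(INIT_DICT)] = INIT_DICT
    write_pos = len(INIT_DICT)
    out = bytearray()

    def emit(ch: int) -> None:
        # Every output byte is also written back into the dictionary.
        nonlocal write_pos
        out.append(ch)
        dictionary[write_pos] = ch
        write_pos = (write_pos + 1) % DICT_SIZE

    i = 0
    while i < len(data):
        control = data[i]  # 8 flag bits, least significant bit first
        i += 1
        for bit in range(8):
            if i >= len(data):
                break
            if control & (1 << bit):
                # Reference: 2 bytes big-endian, 12-bit offset + 4-bit (length - 2).
                if i + 1 >= len(data):
                    return bytes(out)  # truncated reference
                ref = (data[i] << 8) | data[i + 1]
                i += 2
                offset, length = ref >> 4, (ref & 0x0F) + 2
                if offset == write_pos:  # end-of-stream marker
                    return bytes(out)
                for step in range(length):
                    emit(dictionary[(offset + step) % DICT_SIZE])
            else:
                emit(data[i])  # literal byte
                i += 1
    return bytes(out)
```

Copying references one byte at a time through the dictionary is what lets a reference overlap the bytes it is still producing: a literal `a` followed by a length-5 reference to it expands to `aaaaaa`.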

jrideout · 2018-11-30T15:37:58Z

This does what I want for RTF parsing: https://gist.github.com/gilsondev/7c1d2d753ddb522e7bc22511cfb08676

I'd rather not add a dependency for a full RTF parser. Should we include this file in our source, or just leave it outside the scope of the project?

petri · 2018-11-30T19:06:04Z

Hm. I consider document format conversions to be outside the scope, but some limited use cases might fall on the borderline. If I may ask, what's the goal - just support extraction of plaintext words for indexing, or something else?

jrideout · 2018-11-30T21:21:51Z

> what's the goal - just support extraction of plaintext words for indexing

Just that.

> I consider document format conversions to be outside the scope

I agree. Let's stop here. Users can do their own RTF parsing if desired.


petri · 2018-12-01T17:13:30Z

I gave this some more thought. Conversions of TNEF body content in general are out of scope for tnefparse.

But I'm pretty sure I remember that the RTF/HTML bodies are, to some extent, specific to the MS TNEF implementations, with some quirks and deviations. That makes me think plaintext extraction is within scope here.

So that can be revisited.
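If plaintext extraction is revisited, the step after decompression could look roughly like this minimal sketch. `rtf_to_text` is hypothetical (not part of tnefparse, and not the gist linked above); it assumes the decompressed RTF has been decoded to `str` and that `\'hh` escapes use cp1252. It drops a few common metadata destinations, treats `\uN` as having a single fallback character (`\uc1`), and does not special-case Outlook's HTML-encapsulated RTF (`\fromhtml`), so it is good enough for indexing words, not an RTF converter.

```python
import re

# Destinations whose content is formatting metadata rather than body text.
# Not exhaustive; chosen for illustration.
SKIP_DESTINATIONS = {"fonttbl", "colortbl", "stylesheet", "info", "pict"}

TOKEN = re.compile(
    r"\\([a-zA-Z]+)(-?\d+)? ?"  # control word, optional numeric parameter
    r"|\\'([0-9a-fA-F]{2})"     # \'hh: one byte in the document code page
    r"|\\(.)"                   # control symbol: \{ \} \\ \~ \* ...
    r"|([{}])"                  # group start / end
    r"|[\r\n]+"                 # raw line breaks carry no meaning in RTF
    r"|([^\\{}\r\n]+)",         # run of plain text
    re.S,
)


def rtf_to_text(rtf: str, encoding: str = "cp1252") -> str:
    """Hypothetical helper: rough plaintext from decompressed RTF, for indexing."""
    out = []
    stack = []        # saved "skip" flags, one per open group
    skip = False      # True while inside a metadata destination
    pending_skip = 0  # fallback characters to drop after \uN (assumes \uc1)
    for m in TOKEN.finditer(rtf):
        word, param, hexbyte, symbol, brace, text = m.groups()
        if brace == "{":
            stack.append(skip)
        elif brace == "}":
            skip = stack.pop() if stack else False
        elif word:
            if word in SKIP_DESTINATIONS:
                skip = True
            elif skip:
                continue
            elif word in ("par", "line"):
                out.append("\n")
            elif word == "tab":
                out.append("\t")
            elif word == "u" and param is not None:
                code = int(param)  # signed 16-bit in RTF
                out.append(chr(code + 65536 if code < 0 else code))
                pending_skip = 1
        elif symbol:
            if symbol == "*":
                skip = True  # {\*\dest ...} is an ignorable destination
            elif skip:
                continue
            elif symbol in "\\{}":
                out.append(symbol)  # escaped literal character
            elif symbol == "~":
                out.append("\u00a0")  # non-breaking space
            elif symbol in "\r\n":
                out.append("\n")  # backslash-newline acts like \par
        elif hexbyte:
            if pending_skip:
                pending_skip -= 1  # this byte was the \uN fallback
            elif not skip:
                out.append(bytes([int(hexbyte, 16)]).decode(encoding, errors="replace"))
        elif text:
            if pending_skip:
                text, pending_skip = text[1:], 0  # drop the \uN fallback character
            if not skip:
                out.append(text)
    return "".join(out)
```

Decoding the raw bytes as latin-1 before calling it (e.g. `rtf_to_text(raw.decode("latin-1"))`) never fails, since latin-1 maps every byte, while `\'hh` escapes are still interpreted via the `encoding` parameter.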

petri added the enhancement label Nov 28, 2018

jrideout mentioned this issue Nov 30, 2018

Parse all attribute level properties #38

Merged

jrideout closed this as completed Nov 30, 2018

jrideout pushed a commit to agaridata/tnefparse that referenced this issue Nov 30, 2018

Merge pull request koodaamo#26 from agaridata/feature/BP-205_fix_pars…

b4146ea

…ing_from_with_multiple_addresses BP-205: Fix header From: parsing when multiple addresses exist

This issue was closed.
