Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

URLs line-wrapped and therefore broken when using text format emails #96

Closed
jikamens opened this issue Feb 5, 2020 · 11 comments · Fixed by #172
Closed

URLs line-wrapped and therefore broken when using text format emails #96

jikamens opened this issue Feb 5, 2020 · 11 comments · Fixed by #172
Labels
good first issue An easy task to tackle if you want to contribute. has:solution is:bug Nasty little bugger.

Comments

@jikamens
Copy link
Contributor

jikamens commented Feb 5, 2020

When emails are being sent in text format instead of HTML, links are converted to markdown syntax and then the body of the text is line-wrapped. The line-wrapping doesn't treat links in any special way, which means that links with hyphens in them can get wrapped at the hyphen, thus breaking the link.

@Profpatsch Profpatsch added is:bug Nasty little bugger. good first issue An easy task to tackle if you want to contribute. labels Feb 6, 2020
@f6k
Copy link

f6k commented Feb 24, 2020

Hello. I have the same issue. I digged a little and this is what I found.

Actually the thing is that html2text (aka html2markdown) (version 2018.1.9 here) have defaults to put links inline in the document and to wrap links. This can be seen in its config.py. On my box, Debian, it's located in /usr/lib/python3/dist-packages/html2text/config.py where we can read that :

# Use inline, rather than reference, formatting for images and links
INLINE_LINKS = True
(...)
WRAP_LINKS = True

If we jump to the config.py of rss2email we see that nothing is declared for those variables. The funny thing is that there is actually something for putting the links after each paragraph, but since, by default, html2text put links inline, this one is obviously ineffective. My workaround would be to add two lines in config.py of rss2email under the def setup_html2text(self, section ='DEFAULT') about INLINE_LINKS and WRAP_LINKS:

(...)
_html2text.LINKS_EACH_PARAGRAPH = self.getboolean(section, 'links-after-each-paragraph')
_html2text.BODY_WIDTH = self.getint(section, 'body-width')
_html2text.INLINE_LINKS = self.getboolean(section, 'inline-links')
_html2text.WRAP_LINKS = self.getboolean(section, 'wrap-links')

Something should be added too under ## html2text options lower in the config.py, according to html2text defaults:

# Put the links inline rather than references
('inline-links', str(True)),
# Wrap links according to body width
('wrap-links', str(True)),

And then, in ~/.config/rss2email.cfg, having something like that under [DEFAULT]:

inline-links = False
links-after-each-paragraph = True
wrap-links = False

So, as a result, we should NOT have inline links but unwrapped links put as references at the end of each paragraph. One could play with inline-links and wrap-links to have the links inline in the document and not wrapped. Actually I did tested that directly from command line with:

html2markdown -b 72 --no-wrap-links something.html

and it' working very well. No reason that it shouldn't from rss2email calls to html2text.

Note that those modifications in config.py and rss2email.cfg has not been properly tested yet. I'll make an update if I find more.

By the way, while I'm at it. At first, even if the links are inline, I thinked that I could just put body-width = 0 so the text (and then the links inline) should not be wrapped. But it doesn't work. I'm still searching why rss2email doesn't properly take my modification for body-width. Whatever I put in my rss2email.cfg it sticks to the default declared by html2text (BODY_WIDTH = 78).

@Profpatsch
Copy link
Contributor

@f6k do you want to prepare a patch?

@f6k
Copy link

f6k commented Feb 24, 2020

No, no need to. I was able to test my idea but it doesn't work at all. It's no suprise though since even changing the value of body-width in rss2email.cfg doesn't make a thing.

I think that there's something wrong in the way rss2email calls html2text. I tried few things (even tried to hack feed.py) based on codes examples I found but my python skills are near zero. I will continue to try though.

EDIT: actually, it's not "I think". I can say it: there's something wrong in the way rss2email calls html2text. Just to see, I hacked html2text/config.py and after calling again r2e I had the results I wanted (i.e. links no wrapped, body width to some other value than 78, no inline links but references links, etc.). Now let's see what can be done so thatrss2email use correctly html2text.

@auouymous
Copy link
Contributor

#103 fixes the html2text options and this change works fine when using _html2text.config.*.

@Ekleog
Copy link
Member

Ekleog commented Sep 11, 2020

AFAIU this has been fixed by #103, so I'm closing, feel free to poke me if I was wrong and this is actually still a problem :)

@Ekleog Ekleog closed this as completed Sep 11, 2020
@auouymous
Copy link
Contributor

@Ekleog #103 fixed the other html2text options, it didn't add these two new options.

@f6k The following patch works fine if you want to open a PR.

diff --git a/r2e.1 b/r2e.1
index cc50a50..302796c 100644
--- a/r2e.1
+++ b/r2e.1
@@ -232,6 +232,10 @@ Put the links after each paragraph instead of at the end.
 .IP body-width
 Wrap long lines at position. Any negative value for no wrapping, 0 for 78 width
 (compatibility), or any positive width.
+.IP inline-links
+Put the links inline rather than references.
+.IP wrap-links
+Wrap links according to body width.
 .RE
 .SS Mailing
 .IP email-protocol
diff --git a/rss2email/config.py b/rss2email/config.py
index 9070194..da746a1 100644
--- a/rss2email/config.py
+++ b/rss2email/config.py
@@ -69,6 +69,8 @@ class Config (_configparser.ConfigParser):
         # hack to prevent breaking the default in every existing config file
         body_width = self.getint(section, 'body-width')
         _html2text.config.BODY_WIDTH = 0 if body_width < 0 else 78 if body_width == 0 else body_width
+        _html2text.config.INLINE_LINKS = self.getboolean(section, 'inline-links')
+        _html2text.config.WRAP_LINKS = self.getboolean(section, 'wrap-links')
 
 
 CONFIG = Config()
@@ -201,6 +203,10 @@ CONFIG['DEFAULT'] = _collections.OrderedDict((
         # Wrap long lines at position.
         # Any negative value for no wrapping, 0 for 78 width (compatibility), or any positive width.
         ('body-width', str(0)),
+        # Put the links inline rather than references.
+        ('inline-links', str(True)),
+        # Wrap links according to body width.
+        ('wrap-links', str(True)),
 
         ### Mailing
         # Select protocol from: sendmail, smtp, imap

@Ekleog
Copy link
Member

Ekleog commented Sep 13, 2020

Sorry about that, reopening, and thank you for the patch! :)

@Ekleog Ekleog reopened this Sep 13, 2020
@auouymous
Copy link
Contributor

The patch is technically f6k's work from this issue, I just added the #103 fix and copied his words to the man page to help speed up the PR.

@f6k
Copy link

f6k commented Sep 19, 2020

@auouymous sorry for the delay in my answer! tanks for the patch and the work on it!

@f6k The following patch works fine if you want to open a PR.

well, i'd be glad if you open a PR; do you think you could do it please? since you seems to have all working on your side, it'll be easier i guess? thank you very much

@Ekleog
Copy link
Member

Ekleog commented Mar 19, 2021

Hmm so it looks to me like #106 that recently landed already does half of this change. Would it be possible for someone to take this patch, turn it into a PR, add a test like #106 did (probably using the tooling introduced by #159), add a changelog entry and check that CI passes? :)

@auouymous
Copy link
Contributor

@Ekleog On it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
good first issue An easy task to tackle if you want to contribute. has:solution is:bug Nasty little bugger.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants