Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

links_in_email includes doctype uri and namespace #72

Open
nicholaides opened this issue Sep 7, 2011 · 6 comments
Open

links_in_email includes doctype uri and namespace #72

nicholaides opened this issue Sep 7, 2011 · 6 comments

Comments

@nicholaides
Copy link

In HTML emails links_in_email will include the URIs specified in the Doctype URI, any HTML namespace tags.

e.g.

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html xmlns='http://www.w3.org/1999/xhtml' xml:lang='en' lang='en'>

This breaks click_first_link_in_email for HTML emails because the first link is http://www.w3.org/TR/html4/loose.dtd in this example.

I patched my version to this hacky solulution:

def links_in_email(email)
  URI.extract(email.default_part_body.to_s, ['http', 'https']).reject do |link|
    ['www.w3.org', 'www.openmobilealliance.org'].include? URI.parse(link).host
  end
end

An ideal solution would probably use Nokogiri to get the links from all the <a href=...> elements

@nicholaides
Copy link
Author

Here's a better solution:

def links_in_email(email)
  part = email.default_part
  if part.content_type =~ /text\/html/
    Nokogiri::HTML::Document.parse(part.body.to_s).search('a').map{|a| a[:href] }
  else
    URI.extract(part.body.to_s, ['http', 'https'])
  end
end

@iblue
Copy link

iblue commented Nov 28, 2011

URI.extract also does not work for URIs in single quotes (which are generated when using HAML).

URI.extract("<a href=3D'http://example.org/'>Click me.</a>") # => ["http://example.org/'"]

Using @nicholaides solution would require Nokogiri, so I would suggest using some simple regular expressions instead.

@TylerRick
Copy link

Thanks for the workaround, @nicholaides!

It would be nice if this could be fixed in this gem. Nokogiri seems like a much more reliable solution than regular expressions.

If the dependency on Nokogiri is a problem (it's certainly not for me), what about making it optional?

One could either require 'nokogiri' and rescue LoadError, or check for defined?(Nokogiri)... and only if it's loaded, use Nokogiri in links_in_email....

@bmabey
Copy link
Collaborator

bmabey commented Jun 4, 2013

Yeah, that sounds reasonable. I'll merge in a PR with optional Nokogiri support.

@coorasse
Copy link

this issue is pretty old, but still valid

@Systho
Copy link

Systho commented Mar 22, 2017

still valid and with a working fix, could you merge it ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants