Bad Uri #938

Closed
irfancharania opened this issue Jul 27, 2015 · 25 comments · Fixed by #1125

Comments

@irfancharania
Contributor

Browsers seem to be more forgiving of malformed links than Ruby is.

I've got a site that I'm trying to extract URLs from, and because one URL is not properly formed, Ruby just borks with an "Error when fetching url: bad URI(is not URI?)" and stops parsing the rest of the page.

Example:

{
  "expected_update_period_in_days": "2",
  "url": "https://dl.dropboxusercontent.com/u/28950293/test.html",
  "type": "html",
  "mode": "on_change",
  "extract": {
    "url": {
      "css": "a",
      "value": "@href"
    },
    "title": {
      "css": "a",
      "value": "normalize-space(.)"
    }
  }
}

Is there any way to use something like the uri_escape filter that is available in Liquid during the extraction?

At the very least, I think it should ignore the bad URI and continue trying to parse the rest of the links on the page.

@knu
Member

knu commented Jul 27, 2015

WebsiteAgent automatically parses the url value in a created event as a URL, resolving it relative to the current page, and that parsing is strict. I think we could make it fall back to the original value, but I don't feel like applying uri_escape automatically.

An ugly workaround is to rename the key from url to something different and then make an EventFormattingAgent to transform the value through something like "url": "{{ key | uri_escape | uri_expand(base_uri) }}".

@irfancharania
Contributor Author

No, uri_escape shouldn't be applied automatically as it might end up double escaping.

I was thinking of maybe adding a flag to set the escape option, but that didn't make much sense either, because this seems like a one-off case specific to extracting the url key.

The ugly workaround doesn't quite work in this case... The problem is that once the WebsiteAgent comes across the malformed URL, it stops extracting. It only outputs everything before the bad URL (and nothing after).

If the html is as follows:

    <ul>
        <li><a href="http://google.com">google</a></li>
        <li><a href="https://www.google.ca/search?q=some query">broken</a></li>
        <li><a href="https://www.google.ca/search?q=some%20query">escaped</a></li>
    </ul>

Extract should try to process as many links as possible, not stop after it finds a bad URL. Ignoring the second one and just logging it would suffice, I think.

@knu
Copy link
Member

knu commented Jul 27, 2015

Did you actually try the workaround? The "rename the key from url to something different" part is essential because WebsiteAgent tries to parse a value as URI only if the key is "url". WebsiteAgent wouldn't stop after a URI error unless it found a bad URL in the "url" key.

@irfancharania
Contributor Author

Sorry, I hadn't.
I just did and it does work.

@knu
Member

knu commented Jul 27, 2015

OK, it's my turn to look for a real fix. I still think one option is to first try parsing the value as a URL and, if that fails, fall back to escaping the unescaped characters in it and retry parsing. What do you think?
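
A minimal sketch of that "parse strictly, then escape and retry" idea, in plain Ruby. The names `UNSAFE_CHARS`, `escape_unsafe` and `lenient_join` are made up for illustration, and the set of characters left untouched is an assumption (RFC 3986 reserved and unreserved characters, plus `%` so that existing escapes are not escaped twice); the real fix may well differ.

```ruby
require "uri"

# Assumption: everything outside this set gets percent-encoded. "%" is kept
# as-is so that an existing escape such as %20 is not double-escaped.
UNSAFE_CHARS = /[^A-Za-z0-9\-._~:\/?\#\[\]@!$&'()*+,;=%]/

# Percent-encode each unsafe character, byte by byte (UTF-8).
def escape_unsafe(str)
  str.gsub(UNSAFE_CHARS) { |ch| ch.bytes.map { |b| format("%%%02X", b) }.join }
end

# Try the strict parse first; only on failure escape and retry once.
# If the retry also fails, the error propagates to the caller.
def lenient_join(base_url, href)
  URI.join(base_url, href)
rescue URI::InvalidURIError
  URI.join(base_url, escape_unsafe(href))
end

puts lenient_join("https://example.com/", "https://www.google.ca/search?q=some query")
# prints https://www.google.ca/search?q=some%20query
```

Because the strict parse is tried first, well-formed URLs pass through untouched and the escaping only ever applies to values that would otherwise have raised.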

@irfancharania
Contributor Author

I spoke too soon. While the first part works, the second part fails...
It encodes the whole URL instead of just the query string.

Unless I misinterpreted something?
Here are my agents:

WebsiteAgent

{
  "expected_update_period_in_days": "2",
  "url": "https://dl.dropboxusercontent.com/u/28950293/test.html",
  "type": "html",
  "mode": "on_change",
  "extract": {
    "item_url": {
      "css": "a",
      "value": "@href"
    },
    "title": {
      "css": "a",
      "value": "normalize-space(.)"
    }
  }
}

EventFormattingAgent

{
  "instructions": {
    "url": "{{item_url | uri_escape | uri_expand(base_uri) }}"
  },
  "mode": "merge"
}

@knu
Member

knu commented Jul 27, 2015

Ah, of course. uri_escape would escape the colon and slash as well. What's needed here is something that does what Ruby's URI.escape (which has been deprecated as a "misfeature") would do.
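
To illustrate why escaping the whole URL goes wrong, here is a small Ruby sketch. It assumes the uri_escape filter behaves like Ruby's `CGI.escape`, which is meant for a single component such as a query value; the base URL `https://example.com/page/test.html` is a placeholder.

```ruby
require "cgi"
require "uri"

base = "https://example.com/page/test.html"
href = "https://www.google.ca/search?q=some query"

# Component-style escaping applied to the whole URL also encodes ":", "/",
# "?" and "=", so the result no longer looks like an absolute URL.
whole = CGI.escape(href)
puts whole
# => https%3A%2F%2Fwww.google.ca%2Fsearch%3Fq%3Dsome+query

# Resolved against the page, it is treated as a relative path on the
# page's own host.
puts URI.join(base, whole).host
# => example.com

# Escaping only the query value keeps the URL structure intact.
prefix, value = href.split("?q=", 2)
fixed = "#{prefix}?q=#{CGI.escape(value)}"
puts URI.join(base, fixed).host
# => www.google.ca
```

Splitting on `?q=` only works for this one example URL; it is there to show the difference between component-level and whole-URL escaping, not as a general approach.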

@irfancharania
Contributor Author

Running under Ruby 2.2.2, this issue goes away, and the WebsiteAgent runs as expected:

[
  {
    "url": "http://google.com",
    "title": "google"
  },
  {
    "url": "https://www.google.ca/search?q=some%20query",
    "title": "broken"
  },
  {
    "url": "https://www.google.ca/search?q=some%20query",
    "title": "escaped"
  }
]

@irfancharania
Contributor Author

I've got a different problem now: Unicode characters in URLs.

For a URL like this on a site: http://ko.wikipedia.org/wiki/위키백과:대문
I get an error stating "URI must be ascii only".

@cantino
Member

cantino commented Jul 29, 2015

Do you know which Gem is producing that error?

@irfancharania
Contributor Author

It seems the problem is this line.

Comment it out and the agent in the first post works...

Otherwise it gives this error:

I, [2015-07-29T16:11:12.466857 #20]  INFO -- : Fetching https://dl.dropboxusercontent.com/u/28950293/test.html
I, [2015-07-29T16:11:12.681116 #20]  INFO -- : Extracting html at a: ["http://google.com", "https://www.google.ca/search?q=some query", "https://www.google.ca/search?q=some%20query", "http://ko.wikipedia.org/wiki/위키백과:대문", "https://www.google.ca/search?q=위키백과:대문"]
I, [2015-07-29T16:11:12.684229 #20]  INFO -- : Extracting html at a: ["google", "broken", "escaped", "unicode url", "unicode param"]
I, [2015-07-29T16:11:12.685909 #20]  INFO -- : Storing new parsed result for 'AAA - Test Fetch': {"url"=>"http://google.com", "title"=>"google"}
I, [2015-07-29T16:11:12.716302 #20]  INFO -- : Storing new parsed result for 'AAA - Test Fetch': {"url"=>"https://www.google.ca/search?q=some%20query", "title"=>"broken"}
I, [2015-07-29T16:11:12.717489 #20]  INFO -- : Storing new parsed result for 'AAA - Test Fetch': {"url"=>"https://www.google.ca/search?q=some%20query", "title"=>"escaped"}
E, [2015-07-29T16:11:12.718580 #20] ERROR -- : Error when fetching url: URI must be ascii only "http://ko.wikipedia.org/wiki/\u{c704}\u{d0a4}\u{bc31}\u{acfc}:\u{b300}\u{bb38}"
/app/vendor/ruby-2.2.2/lib/ruby/2.2.0/uri/generic.rb:1100:in `rescue in merge'
/app/vendor/ruby-2.2.2/lib/ruby/2.2.0/uri/generic.rb:1097:in `merge'
/app/app/models/agents/website_agent.rb:320:in `block (3 levels) in check_url'
/app/app/models/agents/website_agent.rb:317:in `each'
/app/app/models/agents/website_agent.rb:317:in `block (2 levels) in check_url'
/app/app/models/agents/website_agent.rb:315:in `times'
/app/app/models/agents/website_agent.rb:315:in `block in check_url'
/app/vendor/bundle/ruby/2.2.0/gems/liquid-3.0.6/lib/liquid/context.rb:132:in `stack'
/app/app/models/agents/website_agent.rb:279:in `check_url'
/app/app/models/agents/website_agent.rb:266:in `block in check_urls'
/app/app/models/agents/website_agent.rb:265:in `each'
/app/app/models/agents/website_agent.rb:265:in `check_urls'
/app/app/models/agents/website_agent.rb:259:in `check'
/app/app/concerns/dry_runnable.rb:17:in `dry_run!'
/app/app/controllers/agents_controller.rb:53:in `dry_run'
/app/vendor/bundle/ruby/2.2.0/gems/actionpack-4.2.2/lib/action_controller/metal/implicit_render.rb:4:in `send_action'
/app/vendor/bundle/ruby/2.2.0/gems/actionpack-4.2.2/lib/abstract_controller/base.rb:198:in `process_action'
/app/vendor/bundle/ruby/2.2.0/gems/actionpack-4.2.2/lib/action_controller/metal/rendering.rb:10:in `process_action'
/app/vendor/bundle/ruby/2.2.0/gems/actionpack-4.2.2/lib/abstract_controller/callbacks.rb:20:in `block in process_action'
/app/vendor/bundle/ruby/2.2.0/gems/activesupport-4.2.2/lib/active_support/callbacks.rb:117:in `call'
/app/vendor/bundle/ruby/2.2.0/gems/activesupport-4.2.2/lib/active_support/callbacks.rb:117:in `call'
/app/vendor/bundle/ruby/2.2.0/gems/activesupport-4.2.2/lib/active_support/callbacks.rb:555:in `block (2 levels) in compile'
/app/vendor/bundle/ruby/2.2.0/gems/activesupport-4.2.2/lib/active_support/callbacks.rb:505:in `call'
/app/vendor/bundle/ruby/2.2.0/gems/activesupport-4.2.2/lib/active_support/callbacks.rb:505:in `call'
/app/vendor/bundle/ruby/2.2.0/gems/activesupport-4.2.2/lib/active_support/callbacks.rb:92:in `_run_callbacks'
/app/vendor/bundle/ruby/2.2.0/gems/activesupport-4.2.2/lib/active_support/callbacks.rb:776:in `_run_process_action_callbacks'
/app/vendor/bundle/ruby/2.2.0/gems/activesupport-4.2.2/lib/active_support/callbacks.rb:81:in `run_callbacks'
/app/vendor/bundle/ruby/2.2.0/gems/actionpack-4.2.2/lib/abstract_controller/callbacks.rb:19:in `process_action'
/app/vendor/bundle/ruby/2.2.0/gems/actionpack-4.2.2/lib/action_controller/metal/rescue.rb:29:in `process_action'
/app/vendor/bundle/ruby/2.2.0/gems/actionpack-4.2.2/lib/action_controller/metal/instrumentation.rb:32:in `block in process_action'
/app/vendor/bundle/ruby/2.2.0/gems/activesupport-4.2.2/lib/active_support/notifications.rb:164:in `block in instrument'
/app/vendor/bundle/ruby/2.2.0/gems/activesupport-4.2.2/lib/active_support/notifications/instrumenter.rb:20:in `instrument'
/app/vendor/bundle/ruby/2.2.0/gems/activesupport-4.2.2/lib/active_support/notifications.rb:164:in `instrument'
/app/vendor/bundle/ruby/2.2.0/gems/actionpack-4.2.2/lib/action_controller/metal/instrumentation.rb:30:in `process_action'
/app/vendor/bundle/ruby/2.2.0/gems/actionpack-4.2.2/lib/action_controller/metal/params_wrapper.rb:250:in `process_action'
/app/vendor/bundle/ruby/2.2.0/gems/activerecord-4.2.2/lib/active_record/railties/controller_runtime.rb:18:in `process_action'
/app/vendor/bundle/ruby/2.2.0/gems/actionpack-4.2.2/lib/abstract_controller/base.rb:137:in `process'
/app/vendor/bundle/ruby/2.2.0/gems/actionview-4.2.2/lib/action_view/rendering.rb:30:in `process'
/app/vendor/bundle/ruby/2.2.0/gems/actionpack-4.2.2/lib/action_controller/metal.rb:196:in `dispatch'
/app/vendor/bundle/ruby/2.2.0/gems/actionpack-4.2.2/lib/action_controller/metal/rack_delegation.rb:13:in `dispatch'
/app/vendor/bundle/ruby/2.2.0/gems/actionpack-4.2.2/lib/action_controller/metal.rb:237:in `block in action'
/app/vendor/bundle/ruby/2.2.0/gems/actionpack-4.2.2/lib/action_dispatch/routing/route_set.rb:74:in `call'
/app/vendor/bundle/ruby/2.2.0/gems/actionpack-4.2.2/lib/action_dispatch/routing/route_set.rb:74:in `dispatch'
/app/vendor/bundle/ruby/2.2.0/gems/actionpack-4.2.2/lib/action_dispatch/routing/route_set.rb:43:in `serve'
/app/vendor/bundle/ruby/2.2.0/gems/actionpack-4.2.2/lib/action_dispatch/journey/router.rb:43:in `block in serve'
/app/vendor/bundle/ruby/2.2.0/gems/actionpack-4.2.2/lib/action_dispatch/journey/router.rb:30:in `each'
/app/vendor/bundle/ruby/2.2.0/gems/actionpack-4.2.2/lib/action_dispatch/journey/router.rb:30:in `serve'
/app/vendor/bundle/ruby/2.2.0/gems/actionpack-4.2.2/lib/action_dispatch/routing/route_set.rb:819:in `call'
/app/vendor/bundle/ruby/2.2.0/gems/warden-1.2.3/lib/warden/manager.rb:35:in `block in call'
/app/vendor/bundle/ruby/2.2.0/gems/warden-1.2.3/lib/warden/manager.rb:34:in `catch'
/app/vendor/bundle/ruby/2.2.0/gems/warden-1.2.3/lib/warden/manager.rb:34:in `call'
/app/vendor/bundle/ruby/2.2.0/gems/rack-1.6.4/lib/rack/etag.rb:24:in `call'
/app/vendor/bundle/ruby/2.2.0/gems/rack-1.6.4/lib/rack/conditionalget.rb:38:in `call'
/app/vendor/bundle/ruby/2.2.0/gems/rack-1.6.4/lib/rack/head.rb:13:in `call'
/app/vendor/bundle/ruby/2.2.0/gems/actionpack-4.2.2/lib/action_dispatch/middleware/params_parser.rb:27:in `call'
/app/vendor/bundle/ruby/2.2.0/gems/actionpack-4.2.2/lib/action_dispatch/middleware/flash.rb:260:in `call'
/app/vendor/bundle/ruby/2.2.0/gems/rack-1.6.4/lib/rack/session/abstract/id.rb:225:in `context'
/app/vendor/bundle/ruby/2.2.0/gems/rack-1.6.4/lib/rack/session/abstract/id.rb:220:in `call'
/app/vendor/bundle/ruby/2.2.0/gems/actionpack-4.2.2/lib/action_dispatch/middleware/cookies.rb:560:in `call'
/app/vendor/bundle/ruby/2.2.0/gems/activerecord-4.2.2/lib/active_record/query_cache.rb:36:in `call'
/app/vendor/bundle/ruby/2.2.0/gems/activerecord-4.2.2/lib/active_record/connection_adapters/abstract/connection_pool.rb:649:in `call'
/app/vendor/bundle/ruby/2.2.0/gems/actionpack-4.2.2/lib/action_dispatch/middleware/callbacks.rb:29:in `block in call'
/app/vendor/bundle/ruby/2.2.0/gems/activesupport-4.2.2/lib/active_support/callbacks.rb:88:in `call'
/app/vendor/bundle/ruby/2.2.0/gems/activesupport-4.2.2/lib/active_support/callbacks.rb:88:in `_run_callbacks'
/app/vendor/bundle/ruby/2.2.0/gems/activesupport-4.2.2/lib/active_support/callbacks.rb:776:in `_run_call_callbacks'
/app/vendor/bundle/ruby/2.2.0/gems/activesupport-4.2.2/lib/active_support/callbacks.rb:81:in `run_callbacks'
/app/vendor/bundle/ruby/2.2.0/gems/actionpack-4.2.2/lib/action_dispatch/middleware/callbacks.rb:27:in `call'
/app/vendor/bundle/ruby/2.2.0/gems/actionpack-4.2.2/lib/action_dispatch/middleware/remote_ip.rb:78:in `call'
/app/vendor/bundle/ruby/2.2.0/gems/actionpack-4.2.2/lib/action_dispatch/middleware/debug_exceptions.rb:17:in `call'
/app/vendor/bundle/ruby/2.2.0/gems/actionpack-4.2.2/lib/action_dispatch/middleware/show_exceptions.rb:30:in `call'
/app/vendor/bundle/ruby/2.2.0/gems/railties-4.2.2/lib/rails/rack/logger.rb:38:in `call_app'
/app/vendor/bundle/ruby/2.2.0/gems/railties-4.2.2/lib/rails/rack/logger.rb:20:in `block in call'
/app/vendor/bundle/ruby/2.2.0/gems/activesupport-4.2.2/lib/active_support/tagged_logging.rb:68:in `block in tagged'
/app/vendor/bundle/ruby/2.2.0/gems/activesupport-4.2.2/lib/active_support/tagged_logging.rb:26:in `tagged'
/app/vendor/bundle/ruby/2.2.0/gems/activesupport-4.2.2/lib/active_support/tagged_logging.rb:68:in `tagged'
/app/vendor/bundle/ruby/2.2.0/gems/railties-4.2.2/lib/rails/rack/logger.rb:20:in `call'
/app/config/initializers/silence_worker_status_logger.rb:5:in `call_with_silence_worker_status'
/app/vendor/bundle/ruby/2.2.0/gems/actionpack-4.2.2/lib/action_dispatch/middleware/request_id.rb:21:in `call'
/app/vendor/bundle/ruby/2.2.0/gems/rack-1.6.4/lib/rack/methodoverride.rb:22:in `call'
/app/vendor/bundle/ruby/2.2.0/gems/rack-1.6.4/lib/rack/runtime.rb:18:in `call'
/app/vendor/bundle/ruby/2.2.0/gems/activesupport-4.2.2/lib/active_support/cache/strategy/local_cache_middleware.rb:28:in `call'
/app/vendor/bundle/ruby/2.2.0/gems/actionpack-4.2.2/lib/action_dispatch/middleware/static.rb:113:in `call'
/app/vendor/bundle/ruby/2.2.0/gems/rack-1.6.4/lib/rack/sendfile.rb:113:in `call'
/app/vendor/bundle/ruby/2.2.0/gems/actionpack-4.2.2/lib/action_dispatch/middleware/ssl.rb:24:in `call'
/app/vendor/bundle/ruby/2.2.0/gems/railties-4.2.2/lib/rails/engine.rb:518:in `call'
/app/vendor/bundle/ruby/2.2.0/gems/railties-4.2.2/lib/rails/application.rb:164:in `call'
/app/vendor/bundle/ruby/2.2.0/gems/railties-4.2.2/lib/rails/railtie.rb:194:in `public_send'
/app/vendor/bundle/ruby/2.2.0/gems/railties-4.2.2/lib/rails/railtie.rb:194:in `method_missing'
/app/vendor/bundle/ruby/2.2.0/gems/unicorn-4.8.3/lib/unicorn/http_server.rb:576:in `process_client'
/app/vendor/bundle/ruby/2.2.0/gems/unicorn-4.8.3/lib/unicorn/http_server.rb:670:in `worker_loop'
/app/vendor/bundle/ruby/2.2.0/gems/unicorn-4.8.3/lib/unicorn/http_server.rb:525:in `spawn_missing_workers'
/app/vendor/bundle/ruby/2.2.0/gems/unicorn-4.8.3/lib/unicorn/http_server.rb:140:in `start'
/app/vendor/bundle/ruby/2.2.0/gems/unicorn-4.8.3/bin/unicorn:126:in `<top (required)>'
/app/vendor/bundle/ruby/2.2.0/bin/unicorn:23:in `load'
/app/vendor/bundle/ruby/2.2.0/bin/unicorn:23:in `<main>'

@cantino
Member

cantino commented Jul 30, 2015

Oh wow, that looks like the error is deep in the Ruby standard library. I wonder if escaping the unicode in some way would help? Alternatively, we could possibly call to_s on it and then concatenate the strings ourselves instead of using the URI library.
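
One way to read "escaping the unicode in some way" is to percent-encode only the non-ASCII characters before handing the string to the URI library. The sketch below is an assumption about how that could be done in plain Ruby, not what Huginn does; the variable names are illustrative.

```ruby
require "uri"

href = "http://ko.wikipedia.org/wiki/위키백과:대문"

# On Ruby versions whose URI parser rejects non-ASCII input (as in the
# log above), the strict parse raises URI::InvalidURIError.
begin
  URI.parse(href)
  rejected = false
rescue URI::InvalidURIError => e
  rejected = true
  warn "rejected: #{e.message}"
end

# Percent-encode only the non-ASCII characters, byte by byte (UTF-8).
# ASCII characters, including ":" and "/", are left untouched.
ascii_href = href.gsub(/[^[:ascii:]]/) do |ch|
  ch.bytes.map { |b| format("%%%02X", b) }.join
end

uri = URI.parse(ascii_href)
puts uri.host
# => ko.wikipedia.org
```

This leaves other invalid characters (such as raw spaces) alone, so it only addresses the "URI must be ascii only" case.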

@irfancharania
Contributor Author

Well, really, the sites should be encoding the urls properly... For my particular use case, if huginn just skips the bad ones and carries on trying to extract the rest, it's good enough.

@cantino
Member

cantino commented Jul 30, 2015

You could definitely add a rescue similar to #943

@hughwi

hughwi commented Aug 17, 2015

I have the same error, but with a different setup. I have tried both the RSS Agent and the Website Agent; both produce an error, though the failures differ.

I have upgraded Ruby to 2.2.2 and also updated Huginn to the latest version from Git master, but no luck.

The RSS agent is as follows:

{
"expected_update_period_in_days": "1",
"clean": "true",
"url": "[ "http://www.teekay.com/rss/pressrelease.aspx", "http://www.swireshipping.com/index.php?option=com_content&view=category&id=2&Itemid=27&format=feed&type=rss", "http://www.pacificbasin.com/en/global/rss_xml.php?cat=announcement", "http://www.ap-group.co.uk/feed/", "http://www.imo.org/Pages/PressBriefingsRSS.aspx", "http://www.incoships.com.au/rss_news.cfm", "http://www.iss-shipping.com/XML/issnewsfeed.xml", "http://www.gepowerconversion.com/feed/inspire", "http://www.nyk.com/english/release/top_news.xml", "http://www.shi.samsung.co.kr/Eng/Rss/pr.aspx", "http://feeds.feedburner.com/EcoMarinePowerNews", "http://www.brittany-ferries.co.uk/press-office?articleaction=rss", "http://www.gtt.fr/feed/", "http://panasonic.net/news/rss/topics.xml", "http://www.rotork.com/en/media/rss", "http://www.imageline.co.uk/news-rss.php?news_id=16", "http://www.khi.co.jp/english/news_atom/news.xml", "http://www.raytheon-anschuetz.com/?type=9817", "http://worldmaritimenews.com/feed", "http://maritime-connector.com/news-rss/", "http://www.seadiscovery.com/Services/MarineTechnologyNewsXML.aspx", "http://www.joc.com/rss/13/all/rss.xml", "http://www.rolls-royce.com/press/rss.jsp", "http://www.wartsila.com/ss/Satellite?childpagename=WCom%2FUtilities%2FRSS&p=1267106760296&packedargs=locale%3Den%26newstitle%3DAll%2Bnews%2Bfrom%2BW%25C3%25A4rtsil%25C3%25A4%26rsscategory%3Dall%26site%3DWartsila&pagename=WCom%2FCommon%2FThirdPartyWrapper", "http://www.mandieselturbo.com/web/rss.aspx", "http://www.imo.org/MediaCentre/PressBriefings/_layouts/listfeed.aspx?List=%7b1FC6FB5E-156F-4CF2-A9F2-CCF052B87DAC%7d", "http://www.elabor8.co.uk/feed/", "http://j-l-a.com/press-releases?format=feed&type=atom", "http://www.businessweek.com/feeds/most-popular.rss", "http://www.indiatimes.com/feeds/feeds.xml", 
"http://www.wartsila.com/ss/Satellite?childpagename=WCom%2FUtilities%2FRSS&p=1267106760296&packedargs=locale%3Den%26newstitle%3DMarine%2BSolutions%2BNews%26rsscategory%3DMarine%2BSolutions%26site%3DWartsila&pagename=WCom%2FCommon%2FThirdPartyWrapper", "http://www.corporate.man.eu/en/technicalpages/RSS-Feed.html", "http://www.daimler.com/rss/5-1-494337.xml", "http://www.ship-technology.com/feeds/news-feed.xml", "http://www.maritimedenmark.dk/rss.aspx", "http://www.channelnewsasia.com/rss/latest_cna_asiapac_rss.xml", "http://www.channelnewsasia.com/rss/latest_cna_biz_rss.xml", "http://rssfeeds.shell.com/shell_media_releases", "http://www.tognum.com/press/press-releases/?type=100", "http://maritimeglobalnews.com/rss", "http://www.maerskpress.com/rss/Latest-News", "http://www.scania.com/_inc/rss.aspx", "http://www.chevron.com/news/press/articles.rss", "http://www.westfalia-separator.com/home/backend.php", "http://blogactiv.eu/feed/rss/", "http://www.littelfuse.com/rss.xml", "http://www.getransportation.com/index.php?option=com_ijoomla_rss&act=xml&cat=15&feedtype=RSS2.0", "http://www.micanti.com/feed/", "http://finance.thestandard.com.hk/feed/breaking.xml", "http://www.businesswire.com/portal/site/home/news/industry/?vnsId=31121", "http://tentea.ec.europa.eu/scripts/rss.php?channel=112470000004394", "http://www.kline.co.jp/en/news/news.xml", "http://www.agv.gr/?feed=rss2", "http://ir.horizonlines.com/corporate.rss?c=188937&Rule=Cat=news~subcat=ALL", "http://www.vanoord.com/news/rss.xml", "http://www.inmarsat.com/rss/index.htm?type=pressrelease", "http://www.tognum.com/index.php?id=1698&type=100&tt_news[category]=20", "http://www.zim.com/newsrss.xml", "https://www.dynamar.com/news/rss", "http://www.minervamarine.com/rss.xml" ]"
}

The error I get is:

E, [2015-08-17T15:41:07.077118 #20] ERROR -- : Failed to fetch [ARRAY OF URLS AS ABOVE] with message 'bad URI(is not URI?): [ "http://www.teekay.com/rss/pressrelease.aspx", "http://www.swireshipping.com/index.php':

For the Website agent, I get the following:

{
"expected_update_period_in_days": "1",
"url": "["http://www.teekay.com/rss/pressrelease.aspx","http://www.swireshipping.com/index.php?option=com_content&view=category&id=2&Itemid=27&format=feed&type=rss","http://www.pacificbasin.com/en/global/rss_xml.php?cat=announcement","http://www.ap-group.co.uk/feed/","http://www.imo.org/Pages/PressBriefingsRSS.aspx","http://www.incoships.com.au/rss_news.cfm","http://www.iss-shipping.com/XML/issnewsfeed.xml","http://www.gepowerconversion.com/feed/inspire","http://www.nyk.com/english/release/top_news.xml","http://www.shi.samsung.co.kr/Eng/Rss/pr.aspx","http://feeds.feedburner.com/EcoMarinePowerNews","http://www.brittany-ferries.co.uk/press-office?articleaction=rss","http://www.gtt.fr/feed/","http://panasonic.net/news/rss/topics.xml","http://www.rotork.com/en/media/rss","http://www.imageline.co.uk/news-rss.php?news_id=16","http://www.khi.co.jp/english/news_atom/news.xml","http://www.raytheon-anschuetz.com/?type=9817","http://worldmaritimenews.com/feed","http://maritime-connector.com/news-rss/","http://www.seadiscovery.com/Services/MarineTechnologyNewsXML.aspx","http://www.joc.com/rss/13/all/rss.xml","http://www.rolls-royce.com/press/rss.jsp","http://www.wartsila.com/ss/Satellite?childpagename=WCom%2FUtilities%2FRSS&p=1267106760296&packedargs=locale%3Den%26newstitle%3DAll%2Bnews%2Bfrom%2BW%25C3%25A4rtsil%25C3%25A4%26rsscategory%3Dall%26site%3DWartsila&pagename=WCom%2FCommon%2FThirdPartyWrapper","http://www.mandieselturbo.com/web/rss.aspx","http://www.imo.org/MediaCentre/PressBriefings/_layouts/listfeed.aspx?List=%7b1FC6FB5E-156F-4CF2-A9F2-CCF052B87DAC%7d","http://www.elabor8.co.uk/feed/","http://j-l-a.com/press-releases?format=feed&type=atom","http://www.businessweek.com/feeds/most-popular.rss","http://www.indiatimes.com/feeds/feeds.xml","http://www.wartsila.com/ss/Satellite?childpagename=WCom%2FUtilities%2FRSS&p=1267106760296&packedargs=locale%3Den%26newstitle%3DMarine%2BSolutions%2BNews%26rsscategory%3DMarine%2BSolutions%26site%3DWartsila&pagename=WCom%2FCommon%2FThir
dPartyWrapper","http://www.corporate.man.eu/en/technicalpages/RSS-Feed.html","http://www.daimler.com/rss/5-1-494337.xml","http://www.ship-technology.com/feeds/news-feed.xml","http://www.maritimedenmark.dk/rss.aspx","http://www.channelnewsasia.com/rss/latest_cna_asiapac_rss.xml","http://www.channelnewsasia.com/rss/latest_cna_biz_rss.xml","http://rssfeeds.shell.com/shell_media_releases","http://www.tognum.com/press/press-releases/?type=100","http://maritimeglobalnews.com/rss","http://www.maerskpress.com/rss/Latest-News","http://www.scania.com/_inc/rss.aspx","http://www.chevron.com/news/press/articles.rss","http://www.westfalia-separator.com/home/backend.php","http://blogactiv.eu/feed/rss/","http://www.littelfuse.com/rss.xml","http://www.getransportation.com/index.php?option=com_ijoomla_rss&act=xml&cat=15&feedtype=RSS2.0","http://www.micanti.com/feed/","http://finance.thestandard.com.hk/feed/breaking.xml","http://www.businesswire.com/portal/site/home/news/industry/?vnsId=31121","http://tentea.ec.europa.eu/scripts/rss.php?channel=112470000004394","http://www.kline.co.jp/en/news/news.xml","http://www.agv.gr/?feed=rss2","http://ir.horizonlines.com/corporate.rss?c=188937&Rule=Cat=news~subcat=ALL","http://www.vanoord.com/news/rss.xml","http://www.inmarsat.com/rss/index.htm?type=pressrelease","http://www.tognum.com/index.php?id=1698&type=100&tt_news[category]=20","http://www.zim.com/newsrss.xml","http://www.dynamar.com/news/rss","http://www.minervamarine.com/rss.xml"]",
"type": "xml",
"mode": "on_change",
"extract": {
"title": {
"css": "item title",
"value": ".//text()"
},
"url": {
"css": "item link",
"value": ".//text()"
},
"description": {
"css": "item description",
"value": ".//text()"
},
"pubDate": {
"css": "item pubDate",
"value": ".//text()"
}
}
}

E, [2015-08-17T15:43:14.839367 #17] ERROR -- : Ignoring a non-HTTP url: "["ESCAPED URL LIST FROM ABOVE"]"

@cantino
Member

cantino commented Aug 17, 2015

It looks like you have the Array inside of a string. It should be [ ... ], not "[ ... ]".
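
The difference is easy to see once the options are parsed. The sketch below uses Ruby's `json` library with two placeholder feed URLs; normalising with `Array(...)` is an assumption about how an agent might treat the value, not a statement about Huginn's code.

```ruby
require "json"

# The array wrapped in quotes: the whole list is one long string.
wrong = JSON.parse('{"url": "[\"http://a.example/feed\", \"http://b.example/feed\"]"}')

# A real JSON array: each feed is its own element.
right = JSON.parse('{"url": ["http://a.example/feed", "http://b.example/feed"]}')

puts wrong["url"].class        # => String
puts right["url"].class        # => Array

# Assumed normalisation step: a String becomes a one-element list, so the
# whole bracketed text would be fetched as a single (invalid) URL.
puts Array(wrong["url"]).size  # => 1
puts Array(right["url"]).size  # => 2
```

That matches both error messages above, which quote the entire bracketed list as if it were one URL.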

@haroonis

haroonis commented Oct 8, 2015

Hi there, was there a suggested resolution for this? I updated and started again with a fresh Huginn install in case I'd configured something incorrectly, but no dice. The error I'm getting is with non-ASCII characters:
"ERROR -- : Error when fetching url: URI must be ascii only"

@irfancharania
Contributor Author

I'm not sure where this is at.
I've got it working on my Huginn instance running Ruby 2.2.2 and using this

@cantino
Member

cantino commented Oct 10, 2015

@haroonis what URL is causing that error?

@haroonis

Thanks for the suggestion, @irfancharania. I'm using a workaround for now: I've named the URL key differently so it's not processed as a URL. That's not ideal, as I then have to append it to the domain name (it's a relative URL), but I'm not set up to recompile the code at the moment (I mainly use Windows!).
@cantino, it includes a £ sign, which works in the browser but isn't a compliant URL.

@cantino
Member

cantino commented Oct 14, 2015

Does @irfancharania's #958 PR fix these issues?

@haroonis

I found it a bit tricky to install Huginn in the first place, so I'm not sure how I'd go about testing it! It looks like it failed some checks, according to that link?

@cantino
Member

cantino commented Oct 31, 2015

I believe @knu was going to take a look at that PR again.

knu added a commit that referenced this issue Nov 2, 2015
@knu
Member

knu commented Nov 2, 2015

Just pushed #1125!

knu added a commit that referenced this issue Nov 14, 2015
@knu knu closed this as completed in #1125 Nov 14, 2015
@cantino
Member

cantino commented Nov 14, 2015

Awesome!

TildeWill pushed a commit to omniscopeio/huginn that referenced this issue Nov 28, 2015