You are ignoring rel="self" #349

Open
zharben opened this Issue Feb 14, 2014 · 4 comments

@zharben
zharben commented Feb 14, 2014

In the last several hours, our website has received about 6000 requests to the same page from someone using your RSS reader.

It appears the reader is ignoring the rel="self" attribute that we include on some RSS items.

To address the problem, we are blocking your user agent.

You should fix this - it makes you look like a DDoS tool!!!

@zharben
zharben commented Feb 14, 2014

More specifically, you are not handling "atom:link" elements correctly.

See http://tools.ietf.org/search/rfc4287#section-4.2.7.2 (item 3)

@mattkatz

Hi Zharben - what is your website? Getting a sample will greatly help in troubleshooting. Is SimplePie traversing the website multiple times?

I am not a SimplePie developer, but I use the SimplePie library in my code (as does WordPress), so I hope we can fix the issue...

@zharben
zharben commented Aug 26, 2014

Hi Matt,

Thanks for reaching out! We resolved this issue by blocking the user agent that SimplePie presents. Our site is www.volunteermatch.org - I'd prefer not to have any changes tested on our site, based on the problems we experienced.

Here's a sample from our RSS feed (with some content removed):

<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0" xmlns:georss="http://www.georss.org/georss" xmlns:vm="http://www.volunteermatch.org/schema/2009/1/vmrss" xmlns:gml="http://www.opengis.net/gml" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>Volunteer Opportunities within 20 miles of Santa Rosa, CA, USA</title>
    <link>http://www.volunteermatch.org/search/index.jsp?r=20.0&amp;aff=&amp;l=Santa+Rosa%2C+CA%2C+USA</link>
    <atom:link href="http://www.volunteermatch.org/search/index.jsp?rss=true&amp;r=20.0&amp;aff=&amp;l=Santa+Rosa%2C+CA%2C+USA" rel="self" type="application/rss+xml" />
    <description>VolunteerMatch - Where Volunteering Begins</description>
    <language>en-us</language>
    <pubDate>Tue, 26 Aug 2014 09:56:06 PDT</pubDate>
    <lastBuildDate>Tue, 26 Aug 2014 09:56:06 PDT</lastBuildDate>
    <item>
      <title>Test</title>
      <link>http://www.volunteermatch.org/search/opp1727956.jsp</link>
      <description>Test</description>
      <pubDate>Mon, 25 Aug 2014 12:23:17 PDT</pubDate>
      <guid isPermaLink="true">http://www.volunteermatch.org/search/opp1727956.jsp</guid>
      <category>Women</category>
      <category>Seniors</category>
      <category>Community</category>
    </item>
  </channel>
</rss>

The problem occurs when SimplePie parses the "atom:link" element. The rel="self" attribute indicates that the specified URL is the address of the current feed itself. SimplePie seems to ignore the rel="self" attribute and crawls the link, which fetches the same feed again and triggers an infinite recursion.
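To illustrate the handling described above, here is a minimal Python sketch (not SimplePie code, which is PHP) that separates rel="self" links from other channel-level atom:link elements so that a crawler never follows the feed's own URL. The function names, the trimmed feed, and the example.org URLs are hypothetical and exist only for this illustration.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"

# Trimmed, hypothetical feed modelled on the sample above.
# The example.org URLs are placeholders, not real endpoints.
SAMPLE_FEED = b"""<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>Example feed</title>
    <link>http://www.example.org/search/index.jsp?r=20.0</link>
    <atom:link href="http://www.example.org/search/index.jsp?rss=true&amp;r=20.0" rel="self" type="application/rss+xml" />
    <item>
      <title>Test</title>
      <link>http://www.example.org/search/opp1.jsp</link>
    </item>
  </channel>
</rss>"""


def split_atom_links(feed_xml):
    """Return (self_urls, other_urls) for channel-level atom:link elements."""
    root = ET.fromstring(feed_xml)
    channel = root.find("channel")
    self_urls, other_urls = [], []
    if channel is None:
        return self_urls, other_urls
    for link in channel.findall("{%s}link" % ATOM_NS):
        href = link.get("href")
        if not href:
            continue
        # RFC 4287 treats a missing rel attribute as rel="alternate".
        rel = link.get("rel", "alternate")
        if rel == "self":
            self_urls.append(href)
        else:
            other_urls.append(href)
    return self_urls, other_urls


def urls_to_fetch(feed_xml, already_fetched):
    """Return atom:link URLs worth fetching: never the feed itself, never a repeat."""
    self_urls, other_urls = split_atom_links(feed_xml)
    skip = set(already_fetched) | set(self_urls)
    return [url for url in other_urls if url not in skip]


self_urls, other_urls = split_atom_links(SAMPLE_FEED)
print("self:", self_urls)
print("to fetch:", urls_to_fetch(SAMPLE_FEED, already_fetched=set()))
```

The already_fetched set is a second, independent guard: even if a feed mislabels its own URL, a client that remembers what it has requested will not loop on the same page.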

@plaidfluff

This might not be a problem with SimplePie itself, but with a program making use of SimplePie and doing something incorrectly. SimplePie is just a library.

Unfortunately, it sounds like whoever wrote the software didn't bother to override the user-agent.
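As a rough sketch of what overriding the user-agent looks like in general, the Python snippet below builds a feed request that identifies the calling application rather than the underlying library. It is not SimplePie code; the helper name, the application name, and the contact URL are made up for illustration, and no request is actually sent.

```python
import urllib.request


def build_feed_request(url, app_name, app_version, contact_url):
    """Build a request whose User-Agent names the application, not the library."""
    # A contact URL lets site operators reach the author before resorting to a block.
    user_agent = "%s/%s (+%s)" % (app_name, app_version, contact_url)
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


# Hypothetical application details; nothing is fetched here.
request = build_feed_request(
    "http://www.example.org/feed.rss",
    app_name="ExampleReader",
    app_version="1.0",
    contact_url="http://www.example.org/contact",
)
print(request.get_header("User-agent"))
```

With a distinct user-agent per application, a site operator can block the one misbehaving program instead of every program built on the same library.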
