This script extracts links from Twitter and presents them as an Atom feed for aggregation.
It supports the most obvious bits of the Twitter API: searches, lists, user timelines, mentions etc.
An RSS/Atom feed of tweets is usually pretty useless because the feed items point to tweets, which typically aren't things you wish to bother opening in a browser after you've read them in a feed reader. It is more useful to examine the tweets, strip out the links, and have the feed items point to the links themselves rather than the tweets.
For example, this tweet:

    Study: People Far Away From You Not Actually Smaller http://t.co/123456789

becomes a feed item that points at the linked article:

    <entry>
      <title>Study: People Far Away From You Not Actually Smaller</title>
      <link>http://www.theonion.com/articles/...</link>
      <content>
        <b>The Onion (@theonion)</b>
        Study: People Far Away From You Not Actually Smaller http://www.theonion.com/articles/...
      </content>
    </entry>

rather than one that points back at the tweet:

    <entry>
      <title>Study: People Far Away From You Not Actually Smaller http://t.co/123456789</title>
      <link>http://twitter.com/theonion/statuses/123456789</link>
      <content>
        Study: People Far Away From You Not Actually Smaller http://t.co/123456789
      </content>
    </entry>
Tweets which don't contain links simply don't appear in the resultant feed.
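As a rough illustration of the idea (not the script's actual code), the sketch below turns one decoded tweet into feed items, one per URL entity. The function name `tweet_to_feed_items` and the shape of the returned array are assumptions made for this example; the `entities.urls`, `url`, `expanded_url`, `text` and `user` fields are those of a Twitter REST API v1.1 tweet object.

```php
<?php
// Hypothetical helper: $tweet is one tweet as returned by
// json_decode($json, true). Returns zero or more feed items.
function tweet_to_feed_items(array $tweet)
{
    $urls = isset($tweet['entities']['urls']) ? $tweet['entities']['urls'] : array();

    // Naive title: the tweet text with every wrapped (t.co) URL removed.
    $title = $tweet['text'];
    foreach ($urls as $entity) {
        $title = str_replace($entity['url'], '', $title);
    }
    $title = trim(preg_replace('/\s+/', ' ', $title));

    $author = $tweet['user']['name'] . ' (@' . $tweet['user']['screen_name'] . ')';

    $items = array();
    foreach ($urls as $entity) {
        $items[] = array(
            'title'  => $title,
            'link'   => $entity['expanded_url'], // the link itself, not the tweet
            'author' => $author,
        );
    }

    // A tweet without URL entities yields an empty array,
    // so it never reaches the feed.
    return $items;
}
```

A tweet containing two links produces two items that share the same title, which is one reason good titles are hard to derive.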
It is designed to have minimal requirements (PHP >= 5.2 & libcurl) and run on common shared hosting. It doesn't include any caching, so it is your responsibility to configure your feed reader/RSS client to poll at a sensible interval (so that you don't fall foul of Twitter's API rate restrictions). It is also assumed that you will run it from a protected location (e.g. behind http-auth), so it does not perform any authentication/authorisation of its own.
You need to create an app with a Twitter Dev account (which is simple & free, in case you've never done it before) to use this script. One upshot of this is that you are authenticated as yourself when you use it, so you can do things like query your own home timeline or fetch tweets from protected accounts that you are authorised to see.
- If you haven't already done so, register for a Twitter Dev account.
- Create a new Twitter app and make note of the OAuth tokens & keys.
- Clone the repository and fetch its submodules:
    git clone https://github.com/hjst/twitter2atom.git
    cd twitter2atom && git submodule init && git submodule update
- Edit config.php and copy/paste the tokens & keys from step 2.
- That's it, you should now be able to query feed.php and get Atom data.
Once installed on your web server, you can add URLs like these to your feed reader:
The full list of options/parameters is as follows:
You can copy/paste the SEARCHTERM straight from https://twitter.com/search
For example the list:
OWNER=henryto_dd and LIST=china
This is your "home" timeline when you're logged in to Twitter, i.e. the people you follow.
For example the profile:
The tweets directed at you, i.e. what you see at twitter.com/mentions
Each method will also pass through whatever else you put in the query parameters. Common examples are "count=50" to get more results, or "until=2013-08-08" to limit them. Check the REST API v1.1 docs for the full details.
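One way such pass-through could work is sketched below. This is an illustration under assumptions, not the script's own code: the function name `build_api_query` and the reserved parameter name `source` are hypothetical, and the real script may separate its own parameters from Twitter's differently.

```php
<?php
// Hypothetical helper: merge the parameters a method requires with any
// extra query parameters supplied by the caller.
//   $required - parameters the chosen method must send to Twitter
//   $request  - the incoming query parameters (e.g. $_GET)
//   $reserved - names used by the script itself, never forwarded
function build_api_query(array $required, array $request, array $reserved)
{
    // Keep only the caller's parameters that are not reserved.
    $extra = array_diff_key($request, array_flip($reserved));

    // array_merge lets the required values win on a key clash.
    return '?' . http_build_query(array_merge($extra, $required));
}

// Example call, using a hypothetical "source" parameter:
// build_api_query(
//     array('screen_name' => 'theonion'),
//     array('source' => 'user', 'count' => '50'),
//     array('source')
// );
```

Forwarding everything that is not reserved keeps the script small, at the cost of letting Twitter reject any parameter it does not recognise.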
Removing URL shorteners
By default one layer of t.co URL shortening is removed, and there is an optional setting to recursively strip all URL shortening (to handle cases where tweets contain URLs wrapped by bit.ly/goo.gl/whatever before they are auto-wrapped with t.co by Twitter).
Be aware though that this causes at least one HTTP HEAD request for each link in the feed, so it significantly increases the amount of time & resources required to generate the feed.
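To show where the cost comes from, here is a minimal sequential sketch of recursive unshortening. It is not the script's implementation (which uses rolling-curl to run the HEAD requests in parallel); the function names `head_location` and `resolve_chain` and the hop limit of 5 are assumptions for this example, and relative `Location` values are not handled.

```php
<?php
// Hypothetical helper: send one HEAD request and return the URL named in
// the Location header, or null if there is none.
function head_location($url)
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_NOBODY, true);          // HEAD: skip the body
    curl_setopt($ch, CURLOPT_HEADER, true);          // return the headers
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, false); // one hop per call
    curl_setopt($ch, CURLOPT_TIMEOUT, 10);
    $headers = curl_exec($ch);
    curl_close($ch);

    if ($headers !== false && preg_match('/^Location:\s*(\S+)/mi', $headers, $m)) {
        return $m[1];
    }
    return null;
}

// Follow redirects one hop at a time until a URL stops redirecting,
// a loop is detected, or $maxHops lookups have been made.
// $lookup is any callable that maps a URL to its redirect target or null.
function resolve_chain($url, $lookup = 'head_location', $maxHops = 5)
{
    $seen = array();
    while ($maxHops-- > 0 && !isset($seen[$url])) {
        $seen[$url] = true;
        $next = call_user_func($lookup, $url);
        if ($next === null) {
            break; // not a redirect: this is the final URL
        }
        $url = $next;
    }
    return $url;
}
```

Each hop is a separate HTTP round trip, so a feed of 50 links that are each wrapped twice needs at least 100 lookups plus one more per link to confirm the final URL.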
In the config file you can define a domain blacklist. Any links to these domains will be silently ignored and will not appear in Atom feeds.
It's worth pointing out that the Twitter API treats uploaded/embedded images as Media entities rather than URL entities. Twitter2Atom only operates on URL entities, so there is no need to blacklist pic.twitter.com and similar domains to avoid most photos of cats and people's lunch.
instagram.com might be a good idea though, depending on your
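A blacklist check of this kind might look like the sketch below. The function name `is_blacklisted` and the idea of passing the list as a plain array are assumptions for this example, and treating subdomains as matches is a design choice made here, not confirmed behaviour of the script.

```php
<?php
// Hypothetical helper: return true if $url points at a blacklisted domain.
// $blacklist is a plain list of domains, e.g. array('instagram.com').
function is_blacklisted($url, array $blacklist)
{
    $host = parse_url($url, PHP_URL_HOST);
    if (!$host) {
        return false; // no host could be parsed, so nothing to match
    }
    $host = strtolower($host);

    foreach ($blacklist as $domain) {
        $domain = strtolower($domain);
        // Match the domain itself or any subdomain of it, but not
        // unrelated hosts that merely end with the same letters.
        if ($host === $domain
            || substr($host, -strlen($domain) - 1) === '.' . $domain) {
            return true;
        }
    }
    return false;
}
```

Links for which this returns true would simply be skipped when building the feed.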
- James Mallison's twitter-api-php wrapper handles Twitter's v1.1 OAuth stuff.
- Josh Fraser's rolling-curl lib is used for parallel HTTP HEAD requests.
Coming up with good titles for links is difficult when you have to parse them somehow out of something as free-form as a tweet (especially when there is more than one link in a tweet). My naïve attempt is hardly brilliant so if you see any particularly bad titles (or have a suggestion for an improvement) then please get in touch (or submit a pull request - look in the
I've only implemented the Twitter API sources that I'm immediately interested in. If you would like to use other sources (something with Trending perhaps?) then get in touch.
I had a thought that in some cases it may be useful to de-duplicate the list of links prior to rendering them in the Atom feed. I'm not sure though. For my own purposes (Fever "spark" feeds) it is better to retain duplicates (as they increase the link's prominence). I'd welcome suggestions/use cases for this.