This script extracts links from Twitter and presents them as an Atom feed for aggregation.
It supports the most commonly used parts of the Twitter API: searches, lists, user timelines, mentions, etc.
An RSS/Atom feed of tweets is usually pretty useless because the feed items point to tweets, which typically aren't things you want to bother opening in a browser after you've read them in a feed reader. It is more useful to examine the tweets, extract the links, and have the feed items point to the links themselves rather than to the tweets.
This tweet:
Study: People Far Away From You Not Actually Smaller http://t.co/123456789
Becomes:
<entry>
<title>Study: People Far Away From You Not Actually Smaller</title>
<link>http://www.theonion.com/articles/...</link>
<content>
<b>The Onion (@theonion)</b>
Study: People Far Away From You Not Actually Smaller
http://www.theonion.com/articles/...
</content>
</entry>
Rather than:
<entry>
<title>Study: People Far Away From You Not Actually Smaller http://t.co/123456789</title>
<link>http://twitter.com/theonion/statuses/123456789</link>
<content>
Study: People Far Away From You Not Actually Smaller http://t.co/123456789
</content>
</entry>
Tweets which don't contain links simply don't appear in the resultant feed.
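To make the transformation above concrete, here is a minimal sketch in Python (for illustration only; the script itself is PHP and may well do things differently). It assumes the tweet is the JSON object returned by the Twitter REST API v1.1, using the `text`, `user.name`, `user.screen_name` and `entities.urls` fields, where each URL entity has `url` (the t.co link) and `expanded_url`. The function name `tweet_to_entries` and the dict shape of the returned entries are hypothetical, and the title rule (tweet text minus its t.co URLs) is deliberately crude.

```python
def tweet_to_entries(tweet):
    """Turn one v1.1 tweet dict into zero or more feed entries, one per URL entity."""
    urls = tweet.get("entities", {}).get("urls", [])
    if not urls:
        return []  # tweets without links don't appear in the feed

    # Naive title: the tweet text with its t.co URLs removed, whitespace collapsed.
    title = tweet["text"]
    for entity in urls:
        title = title.replace(entity["url"], "")
    title = " ".join(title.split())

    user = tweet["user"]
    byline = f"{user['name']} (@{user['screen_name']})"

    entries = []
    for entity in urls:
        # expanded_url is the link with Twitter's own t.co wrapper removed.
        link = entity.get("expanded_url") or entity["url"]
        entries.append({
            "title": title or link,  # fall back to the link if nothing is left
            "link": link,
            "content": f"{byline}\n{title}\n{link}",
        })
    return entries
```

A tweet with two links yields two entries sharing the same title, which is exactly the case where picking a good title gets awkward.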
It is designed to have minimal requirements (PHP >= 5.2 & libcurl) and to run on common shared hosting. It doesn't do any caching, so it is up to you to configure your feed reader/RSS client sensibly (e.g. with a modest polling interval) so that you don't fall foul of Twitter's API rate limits. It also assumes that you will run it from a protected location (e.g. behind HTTP auth), so it does not perform any authentication/authorisation itself.
You need to create an app with a Twitter Dev account to use this script (it's simple & free, in case you've never done it before). One upshot of this is that requests are authenticated as you, so you can do things like query your own home timeline or fetch tweets from protected accounts that you are authorised to see.
- If you haven't already done so, register for a Twitter Dev account.
- Create a new Twitter app and make note of the OAuth tokens & keys.
- Clone the repository and fetch its submodules:
git clone https://github.com/hjst/twitter2atom.git
cd twitter2atom && git submodule init && git submodule update
- Rename config.php-dist to config.php and copy/paste in the OAuth tokens & keys you noted when creating the app.
- That's it: you should now be able to query feed.php and get Atom data.
Once installed on your web server, you can add URLs like these to your feed reader:
http://.../feed.php?op=search&q=infosec
http://.../feed.php?op=list&list_name=china&list_owner=henryto_dd
The full list of options/parameters is as follows:
feed.php?op=search&q=SEARCHTERM
You can copy/paste the SEARCHTERM straight from https://twitter.com/search
feed.php?op=list&list_name=LIST&list_owner=OWNER
For example, for the list https://twitter.com/henryto_dd/lists/china, OWNER=henryto_dd and LIST=china.
feed.php?op=timeline
This is your "home" timeline when you're logged in to Twitter, i.e. the people you follow.
feed.php?op=timeline&user=USER
For example, for the profile https://twitter.com/ells, USER=ells.
feed.php?op=mentions
The tweets directed at you, i.e. what you see at twitter.com/mentions.
Each method will also pass through any other query parameters you add. Common examples are "count=50" to get more results, or (for searches) "until=2013-08-08" to limit them by date. Check the REST API v1.1 docs for the full details.
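If you generate these URLs programmatically, remember that search terms usually contain characters (spaces, `#`, `:`) that must be URL-encoded. A minimal Python sketch using the standard library's `urlencode`; the helper name `feed_url` and the base URL `https://example.com/feed.php` are hypothetical placeholders for wherever you installed the script.

```python
from urllib.parse import urlencode


def feed_url(base, op, **params):
    """Build a feed.php URL; extra keyword arguments become extra query parameters."""
    query = {"op": op, **params}  # op first, then everything else in the order given
    return f"{base}?{urlencode(query)}"  # urlencode percent-encodes '#', ':', etc.


base = "https://example.com/feed.php"
print(feed_url(base, "search", q="#infosec lang:en", count=50))
print(feed_url(base, "list", list_name="china", list_owner="henryto_dd"))
```

The first call produces `...?op=search&q=%23infosec+lang%3Aen&count=50`, so an unencoded `#` can't silently truncate the query string.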
By default, one layer of t.co URL shortening is removed. There is also an optional setting to strip all layers of URL shortening recursively (for tweets containing URLs that were already wrapped by bit.ly, goo.gl, etc. before Twitter auto-wrapped them with t.co):
feed.php?op=WHATEVER&unshorten_links=1
Be aware though that this causes at least one HTTP HEAD request for each link in the feed, so it significantly increases the amount of time & resources required to generate the feed.
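The recursive unshortening idea can be sketched as "send a HEAD request, read the `Location` header, repeat until there's no redirect". This Python version is illustrative only (the script uses PHP with rolling-curl for parallel requests; this sketch is sequential). The names `head_location` and `unshorten`, the `max_hops` limit and the loop check are assumptions of the sketch, not settings the script exposes.

```python
from urllib.error import HTTPError
from urllib.parse import urljoin
from urllib.request import HTTPRedirectHandler, Request, build_opener


class _NoRedirect(HTTPRedirectHandler):
    """Stop urllib following redirects so the Location header can be read directly."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None  # the 3xx response then surfaces as an HTTPError


def head_location(url, timeout=10.0):
    """Send one HEAD request; return the redirect target, or None if there isn't one."""
    opener = build_opener(_NoRedirect)
    try:
        with opener.open(Request(url, method="HEAD"), timeout=timeout):
            return None  # 2xx response: not a redirect
    except HTTPError as err:
        if 300 <= err.code < 400:
            return err.headers.get("Location")
        return None  # e.g. 405 from servers that refuse HEAD: stop here
    except OSError:
        return None  # network error or timeout: keep the URL we already have


def unshorten(url, resolve=head_location, max_hops=10):
    """Follow redirects one HEAD request per hop until a non-redirecting URL is reached."""
    seen = {url}
    for _ in range(max_hops):
        target = resolve(url)
        if not target:
            break
        target = urljoin(url, target)  # Location may be relative
        if target in seen:
            break  # redirect loop
        seen.add(target)
        url = target
    return url
```

Each hop costs one round trip, which is why a t.co link wrapping a bit.ly link needs at least two requests. Passing `resolve` as a parameter also makes it easy to test without touching the network.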
In the config file you can define a domain blacklist. Any links to these domains will be silently ignored and will not appear in Atom feeds.
It's worth pointing out that the Twitter API treats uploaded/embedded images as Media entities rather than URL entities. Twitter2Atom only operates on URL entities, so there is no need to blacklist pic.twitter.com and similar domains to avoid most photos of cats and people's lunch.
Blacklisting instagram.com might be a good idea though, depending on your needs.
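Domain blacklisting boils down to comparing each link's hostname against the configured list. The Python sketch below assumes that a blacklisted domain should also cover its subdomains (so `instagram.com` catches `www.instagram.com`); whether the script matches subdomains isn't stated here, and the function names are hypothetical.

```python
from urllib.parse import urlsplit


def is_blacklisted(url, blacklist):
    """True if the URL's host is a blacklisted domain or a subdomain of one."""
    host = (urlsplit(url).hostname or "").lower()
    for domain in blacklist:
        domain = domain.lower().lstrip(".")
        # Match the domain itself or any subdomain, but not e.g. "notinstagram.com".
        if host == domain or host.endswith("." + domain):
            return True
    return False


def filter_links(urls, blacklist):
    """Silently drop links to blacklisted domains, keeping the original order."""
    return [u for u in urls if not is_blacklisted(u, blacklist)]
```

Matching on the `"." + domain` suffix rather than a plain `endswith(domain)` avoids accidentally dropping unrelated domains that merely end in the same letters.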
- James Mallison's twitter-api-php wrapper handles Twitter's v1.1 OAuth stuff.
- Josh Fraser's rolling-curl lib is used for parallel HTTP HEAD requests.
- Coming up with good titles for links is difficult when you have to parse them somehow out of something as free-form as a tweet (especially when there is more than one link in a tweet). My naïve attempt is hardly brilliant, so if you see any particularly bad titles (or have a suggestion for an improvement) then please get in touch (or submit a pull request: look in the clean_title method in Twitter2Atom.php).
- I've only implemented the Twitter API sources that I'm immediately interested in. If you would like to use other sources (something with Trending, perhaps?) then get in touch.
- I think it may sometimes be useful to de-duplicate the list of links before rendering them in the Atom feed, but I'm not sure. For my own purposes (Fever "spark" feeds) it is better to retain duplicates, as they increase the link's prominence. I'd welcome suggestions/use cases for this.
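For anyone weighing up the de-duplication idea, one possible compromise is to keep only the first entry for each link but record how often it appeared, so the prominence signal isn't lost entirely. A minimal Python sketch; the entry dicts with a `link` key and the added `occurrences` field are hypothetical, not part of the script's output.

```python
from collections import Counter


def dedupe_links(entries):
    """Keep the first entry per link, annotated with how many times that link appeared."""
    counts = Counter(e["link"] for e in entries)
    seen = set()
    out = []
    for e in entries:
        link = e["link"]
        if link in seen:
            continue  # later duplicates are dropped
        seen.add(link)
        out.append({**e, "occurrences": counts[link]})
    return out
```

A feed reader that doesn't understand the extra count would simply see fewer, unique entries, so this trades away the implicit prominence boost that duplicates give in tools like Fever.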