A set of PHP functions for determining the canonical URL of a post, given a POSSEd (Publish on your Own Site, Syndicate Elsewhere) copy. This is a partial implementation of the original-post-discovery algorithm.
A demo can be found on waterpigs.co.uk/services/original-post.
Install using Composer: ./composer.phar require indieweb/original-post-discovery:dev-master
<?php
require __DIR__ . '/vendor/autoload.php';
list($url, $err) = IndieWeb\discoverOriginalPost('https://twitter.com/BarnabyWalters/status/423465842148671488');
if ($err !== null) {
    // handle HTTP errors here
}
// do stuff (e.g. auto-fill in-reply-to form controls) with $url
- string $str = cleanString($str): cleans up a bunch of weird encoding and character issues which can occur, specifically converting non-breaking space codepoints into normal spaces to handle some Twitter.com bugs
- string|null $url = originalPostUrlFromTwitter($html): a pure function for parsing HTML from Twitter.com and looking in it for trailing URLs
- string $str = stripHashtags($str): removes hashtags from a string
- string|null $url = getTrailingUrl($str): finds parenthesised (text text. (http://example.com)) or ellipsis (text text… http://example.com) trailing URLs in a string
- string|null $str = getUrlFromPermashortid($str): looks for a trailing permashortid ((cctld.me id)) and converts it into a URL (assumes HTTP)
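To make the two trailing-reference formats described above more concrete, here is a minimal standalone sketch of how such matching could work. It is not the library's own code: the function names sketchTrailingUrl and sketchPermashortidToUrl, the regular expressions, and the example.me domain are all assumptions made for illustration, and the real getTrailingUrl and getUrlFromPermashortid may behave differently on edge cases.

```php
<?php
// Illustrative sketch only: these helpers are hypothetical and are NOT the
// library's implementation of getTrailingUrl / getUrlFromPermashortid.

/**
 * Return a URL that ends the string, either wrapped in parentheses
 * ("text (http://example.com)") or following an ellipsis
 * ("text… http://example.com"). Returns null when neither form matches.
 */
function sketchTrailingUrl(string $str): ?string
{
    $str = trim($str);

    // Parenthesised form: "(" URL ")" at the very end of the string.
    if (preg_match('~\((https?://[^\s()]+)\)$~u', $str, $m)) {
        return $m[1];
    }

    // Ellipsis form: "…" or "...", optional whitespace, then a URL at the end.
    if (preg_match('~(?:…|\.\.\.)\s*(https?://\S+)$~u', $str, $m)) {
        return $m[1];
    }

    return null;
}

/**
 * Expand a trailing permashortid such as "(example.me abc1)" into
 * "http://example.me/abc1". Plain HTTP is assumed, as in the description above.
 */
function sketchPermashortidToUrl(string $str): ?string
{
    // Capture group 1 is the domain, group 2 is the short id.
    $pattern = '~\(([a-z0-9-]+(?:\.[a-z0-9-]+)+)\s+([^\s()]+)\)$~iu';
    if (preg_match($pattern, trim($str), $m)) {
        return 'http://' . $m[1] . '/' . $m[2];
    }

    return null;
}

var_dump(sketchTrailingUrl('text text. (http://example.com)'));  // string "http://example.com"
var_dump(sketchTrailingUrl('text text… http://example.com'));    // string "http://example.com"
var_dump(sketchPermashortidToUrl('a note (example.me abc1)'));   // string "http://example.me/abc1"
var_dump(sketchTrailingUrl('no trailing reference here'));       // NULL
```

Both helpers anchor their patterns to the end of the string, so a URL mentioned mid-sentence is deliberately ignored; only a reference in the trailing position is treated as a pointer back to the original post.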
A small PHPUnit test suite is provided. If you make contributions, please at least ensure that all the existing tests pass both before and after your changes. New tests covering the code you add would be great too.
- Initial extraction from Taproot, readme and basic test suite