This repository has been archived by the owner on Sep 9, 2022. It is now read-only.

Entities sometimes have faulty indices, leading to malformed HTML output #51

Open
graulund opened this issue Jul 3, 2013 · 0 comments


graulund commented Jul 3, 2013

Tweet entities, which allow the linkable parts of a tweet to be highlighted in HTML, carry indices (character offsets) that specify where in the original tweet text these transformations are to be made.

However, Tweet Nest does not apply them until after some characters have been transformed into HTML entities (such as ">" to "&gt;") and, in some cases, after SmartyPants stupefication, which can turn an em dash into three hyphens. In both cases the text ends up with more characters than the original tweet, so the entity indices no longer point at the right positions. Faulty indices result in misplaced links that cover parts of the tweet text that should not have been linked, and they render some tweets unreadable because that text is cut off.
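To make the drift concrete, here is a minimal sketch in Python rather than Tweet Nest's PHP. The sample text and the hashtag offsets are made up for illustration, and the offsets are assumed to be code-point positions in the unescaped text.

```python
import html

# Hypothetical tweet text and a hypothetical hashtag entity whose
# offsets were computed against this unescaped text.
raw = "a > b, see #php"
start, end = 11, 15

# Against the original text the offsets select the hashtag.
hashtag = raw[start:end]

# Escaping first turns ">" into "&gt;", adding three characters
# in front of the hashtag.
escaped = html.escape(raw, quote=False)

# The same offsets now select the wrong slice of the escaped text.
shifted = escaped[start:end]

print(repr(hashtag))
print(repr(shifted))
```

Every escaped character before an entity pushes that entity's real position further right, so the error grows along the tweet.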

This issue is complicated because it occurs only some of the time, usually on old tweets that were loaded recently, where for historical reasons the list of entities does not cover everything. Old tweets loaded before tweet entities were introduced have no entities at all. Old tweets loaded after tweet entities were introduced seem to have at least hashtag entities, and maybe more, regardless of the tweet creation date. Additionally, some tweets appear to be HTML entity-decoded to begin with (no raw "&gt;" sequences), while others are not, depending on the client and other factors.

Together, these factors produce inconsistent display behaviour depending on the tweet source, the time the tweet was posted, and the time it was loaded into the system, which can result in a poor viewing experience in Tweet Nest.

I don't have an immediate solution, but maybe some clever heads here on GitHub can look at the issue. Otherwise, we might have to look into using an external PHP library that can properly parse these tweet entities while simultaneously encoding HTML entities correctly.
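One possible direction, sketched here in Python only to illustrate the idea (it is not the code in inc/html.php), is to slice the unescaped text at the entity offsets first and escape each slice separately. The function name `render_tweet` and the entity shape (`start`, `end`, `href`) are hypothetical, and the offsets are assumed to refer to the text exactly as it is passed in.

```python
import html

def render_tweet(text, entities):
    """Build HTML by slicing the unescaped text at entity offsets.

    Each entity is a dict with hypothetical keys "start", "end"
    (offsets into `text`) and "href" (the link target).
    """
    out = []
    pos = 0
    for ent in sorted(entities, key=lambda e: e["start"]):
        start, end = ent["start"], ent["end"]
        # Skip entities that overlap an earlier one or fall outside the text.
        if start < pos or end > len(text) or start >= end:
            continue
        # Escape the plain text between the previous entity and this one.
        out.append(html.escape(text[pos:start]))
        # Escape the linked text and the URL separately.
        label = html.escape(text[start:end])
        href = html.escape(ent["href"], quote=True)
        out.append(f'<a href="{href}">{label}</a>')
        pos = end
    # Escape whatever follows the last entity.
    out.append(html.escape(text[pos:]))
    return "".join(out)

result = render_tweet(
    "a > b, see #php",
    [{"start": 11, "end": 15, "href": "https://example.com/tag/php"}],
)
print(result)
```

Because escaping happens per slice, the offsets are only ever used against the text they were computed for. Any typographic pass such as SmartyPants would have to run on the individual slices as well, and tweets stored in an already-encoded form would need to be normalized to one representation first.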

For relevant code, see the tweetHTML, linkifyTweet and entitifyTweet functions in inc/html.php.
