Recent Twitter char counting changes 😭 #8

Closed
snarfed opened this Issue Nov 18, 2017 · 2 comments

snarfed commented Nov 18, 2017

evidently, along with the recent switch to 280 chars, twitter also changed the way they count chars. (IRC discussion.) it's no longer simply unicode code points, it's some weighted thing i don't understand yet. one big effect is that emoji seem to now count for two chars, not one.

this made e.g. @tantek's recent bridgy publish attempt fail. we tried to publish this content:

This is meditation. #FBF to last Saturday, in the zone, racing the Mt. Tam Trail Run half marathon with singular focus. 📷 chasquirunner.com

Awake the night before @TheNorthFace #ECSCA, thinking back to last Saturday’s challenges and triumphs, this…

http://tantek.com/2017/321/t2/last-saturday-racing-tam-half-marathon

which twitter 403ed with "Tweet needs to be a bit shorter."

if you count chars normally, it's 319 total, - 17 + 23 for chasquirunner.com, - 68 + 23 for the permalink, == 280.
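to make that arithmetic concrete, here's a minimal sketch of the old style count: take the raw length, then swap each link's real length for the fixed 23 that twitter counts per link. the function name and `TCO_LENGTH` constant are made up for illustration, and the 319 is just the total quoted above, not recomputed from the text.

```python
# Length twitter counts for any link, per the arithmetic above.
TCO_LENGTH = 23

def naive_tweet_length(raw_length, links):
    """Old-style count: raw length, with each link counted as TCO_LENGTH."""
    total = raw_length
    for link in links:
        total += TCO_LENGTH - len(link)
    return total

links = [
    "chasquirunner.com",  # 17 chars
    "http://tantek.com/2017/321/t2/last-saturday-racing-tam-half-marathon",  # 68 chars
]
print(naive_tweet_length(319, links))  # 280
```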

however, if you paste it into the twitter UI, that says it's 3 chars over. deleting the camera and ellipsis emoji each drop it by 2. i haven't found the last extra char yet.

goddammit.

snarfed commented Nov 18, 2017

ok, after reading the docs, i understand the new way. some chars (code points) now count for two instead of one, and they have a data driven config that determines which are which. it'll take a little work to implement, but not a ton. seems doable.

https://developer.twitter.com/en/docs/developer-utilities/twitter-text
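a rough sketch of what that weighted count might look like in python. the scale, default weight, and code point ranges below are recalled from the twitter-text config rather than copied from it, so treat them as assumptions and check the real config file. this also skips url handling and unicode normalization, which the real library does.

```python
# ASSUMED values, modeled on twitter-text's data driven config.
SCALE = 100
DEFAULT_WEIGHT = 200  # anything outside RANGES counts double
RANGES = [
    # (first code point, last code point, weight)
    (0, 4351, 100),
    (8192, 8205, 100),
    (8208, 8223, 100),
    (8242, 8247, 100),
]

def char_weight(ch):
    """Weight of a single code point under the assumed config."""
    cp = ord(ch)
    for start, end, weight in RANGES:
        if start <= cp <= end:
            return weight
    return DEFAULT_WEIGHT

def weighted_length(text):
    """Weighted length of text, ignoring links and normalization."""
    return sum(char_weight(ch) for ch in text) // SCALE

print(weighted_length("abc"))         # 3
print(weighted_length("\U0001F4F7"))  # camera emoji: 2
print(weighted_length("\u2026"))      # ellipsis: 2
```

under these assumed ranges, the camera and the ellipsis each weigh 2, which matches what the twitter UI showed when deleting them.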

snarfed closed this Nov 21, 2017

snarfed added a commit to snarfed/brevity-testcases that referenced this issue Nov 24, 2017

snarfed commented Nov 24, 2017

reopening, this is still buggy. :/

added a failing test case in kylewm/brevity-testcases#3. the problem is that the new str_length() returns the weighted count, but we also use it to index into the python string, which we shouldn't. those are separate measurements that can't be mixed.
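to illustrate the mixup, here's a hedged sketch, not brevity's actual code. `char_weight` is a simplified stand-in (code points above 4351 count double, which is an assumption), and `truncate_to_weight` is a hypothetical helper. the point is that it tracks the weighted total and the string index as two separate numbers.

```python
def char_weight(ch):
    # Simplified stand-in for the real weighting: ASSUMES code points
    # above 4351 count as 2 and everything else counts as 1.
    return 1 if ord(ch) <= 4351 else 2

def weighted_length(text):
    return sum(char_weight(ch) for ch in text)

def truncate_to_weight(text, max_weight):
    """Longest prefix of text whose weighted length fits in max_weight."""
    used = 0  # weighted count so far
    for index, ch in enumerate(text):  # index is a plain string index
        w = char_weight(ch)
        if used + w > max_weight:
            return text[:index]
        used += w
    return text

text = "\U0001F4F7\U0001F4F7abc"  # two camera emoji, then "abc"

# Buggy: uses a weighted budget as a string index, so it keeps too much.
buggy = text[:4]
print(weighted_length(buggy))  # 6, over the budget of 4

# Separate measurements: slice by index, budget by weight.
fixed = truncate_to_weight(text, 4)
print(weighted_length(fixed))  # 4
```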
