Recent Twitter char counting changes 😭 #8

snarfed opened this Issue Nov 18, 2017 · 2 comments



snarfed commented Nov 18, 2017

evidently, along with the recent switch to 280 chars, twitter also changed the way they count chars. (IRC discussion.) it's no longer simply unicode code points, it's some weighted thing i don't understand yet. one big effect is that emoji seem to now count for two chars, not one.

this made e.g. @tantek's recent bridgy publish attempt fail. we tried to publish this content:

This is meditation. #FBF to last Saturday, in the zone, racing the Mt. Tam Trail Run half marathon with singular focus. 📷

Awake the night before @TheNorthFace #ECSCA, thinking back to last Saturday’s challenges and triumphs, this…

which twitter rejected with a 403 and "Tweet needs to be a bit shorter."

if you count chars normally, it's 319 total, - 17 + 23 for, - 68 + 23 for the permalink, == 280.

however, if you paste it into the twitter UI, that says it's 3 chars over. deleting the camera and ellipsis emoji each drop it by 2. i haven't found the last extra char yet.
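for reference, here's the old-style arithmetic from above as a quick check. this is only a sketch: it assumes both adjustments are links that twitter swaps for a 23 char t.co URL, and `naive_length` is a made-up helper name, not anything from bridgy or brevity.

```python
# assumption: every link is replaced by a fixed-length t.co URL
TCO_LENGTH = 23


def naive_length(code_points, link_lengths, tco_length=TCO_LENGTH):
    """Old-style count: plain code points, with each link's own length
    swapped out for the fixed t.co length."""
    return code_points - sum(link_lengths) + tco_length * len(link_lengths)


# 319 code points total, with one 17 char link and the 68 char permalink
print(naive_length(319, [17, 68]))
```

under the old rules that lands exactly on the limit, so the extra chars have to come from the new weighting.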





snarfed commented Nov 18, 2017

ok, after reading the docs, i understand the new way. some chars (code points) now count for two instead of one, and they have a data driven config that determines which are which. it'll take a little work to implement, but not a ton. seems doable.
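a rough sketch of what that weighted count could look like. big caveats: the ranges and weights below are my recollection of twitter-text's published v2 config, so treat them as assumptions and check them against the real file. `CONFIG`, `char_weight`, and `weighted_length` are illustrative names, and this skips link handling entirely.

```python
import unicodedata

# assumption: values modeled on twitter-text's v2 config; verify before relying on them
CONFIG = {
    "scale": 100,
    "default_weight": 200,
    "max_weighted_length": 280,
    # (first code point, last code point, weight), inclusive on both ends
    "ranges": [
        (0, 4351, 100),
        (8192, 8205, 100),
        (8208, 8223, 100),
        (8242, 8247, 100),
    ],
}


def char_weight(ch, config=CONFIG):
    """Return the unscaled weight of a single code point."""
    cp = ord(ch)
    for start, end, weight in config["ranges"]:
        if start <= cp <= end:
            return weight
    return config["default_weight"]


def weighted_length(text, config=CONFIG):
    """Sum per-code-point weights over the NFC-normalized text, then scale down."""
    text = unicodedata.normalize("NFC", text)
    total = sum(char_weight(ch, config) for ch in text)
    return total // config["scale"]


print(weighted_length("abc"))         # plain ASCII, weight 1 each
print(weighted_length("\U0001F4F7"))  # camera emoji
print(weighted_length("\u2026"))      # horizontal ellipsis
```

with these ranges the camera and the ellipsis both fall through to the default weight, which matches the drop of 2 each that the twitter UI showed.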

snarfed closed this Nov 21, 2017

snarfed added a commit to snarfed/brevity-testcases that referenced this issue Nov 24, 2017




snarfed commented Nov 24, 2017

reopening, this is still buggy. :/

added a failing test case in kylewm/brevity-testcases#3. the problem is that the new str_length() returns weighted count, but we also use it to index into the python string, which we shouldn't. those are separate measurements that can't be mixed.
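one way to keep the two measurements apart, as a sketch only. this is not brevity's actual code: `char_weight` is a simplified stand-in for the data driven config (its ranges are assumptions), and `index_for_weight` and `truncate` are hypothetical names. the idea is to walk the string once, track the weighted total separately, and only ever slice with the real string index.

```python
def char_weight(ch):
    """Assumed weights: 1 for code points in the 'light' ranges, 2 otherwise."""
    cp = ord(ch)
    light = (
        cp <= 4351
        or 8192 <= cp <= 8205
        or 8208 <= cp <= 8223
        or 8242 <= cp <= 8247
    )
    return 1 if light else 2


def index_for_weight(text, max_weight):
    """Return the largest index i such that text[:i] weighs at most max_weight."""
    used = 0
    for i, ch in enumerate(text):
        weight = char_weight(ch)
        if used + weight > max_weight:
            return i  # a string index, not a weighted count
        used += weight
    return len(text)


def truncate(text, max_weight):
    """Cut text to fit a weighted budget, slicing by string index."""
    return text[:index_for_weight(text, max_weight)]


text = "ab\U0001F4F7cd"   # weights 1, 1, 2, 1, 1
print(truncate(text, 4))  # keeps "ab" plus the camera
print(text[:4])           # mixing the units: this slice weighs 5, over budget
```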
