Emoji causes failed slug generation #1702

Closed
cleverdevil opened this Issue Apr 14, 2017 · 9 comments

Contributor

cleverdevil commented Apr 14, 2017

While trying to do this:

Perform a "repost" from Quill of this Twitter permalink.

I encountered this error:

The repost works, but the generated slug doesn't actually work, so you can't link to the post on my site. This is the slug that is generated:

https://cleverdevil.io/2017/aaron-parecki-on-twitter-micropub-pr-published-today-b4se7joppr-this

As you can see, the link returns a 404. However, you can scroll down and find the post itself in my timeline.

Also of note, once you find the post in my timeline, is that it also displays the improper encoding of special characters from this related issue:

Aaron Parecki on Twitter: "Micropub PR published today! 🎉 https://t.co/B4se7jOPPr This is the last step before REC status! We'd love your impl reports and feedback!"

Some other notes:

🕷  cat version.known
version = "0.9.5"
build = 2017041101

Contributor

cleverdevil commented May 2, 2017

Some more context here. I am using MySQL as the backing storage for my Known instance. For the above example, here is what is stored in the entities table:

_id is f489a455d3570e046c63751a5755d141
uuid is https://cleverdevil.io/2017/aaron-parecki-on-twitter-micropub-pr-published-today-b4se7joppr-this

Contents:

{"access":"PUBLIC","owner":"http:\/\/cleverdevil.io\/profile\/cleverdevil","body":"https:\/\/twitter.com\/aaronpk\/status\/852613547922042880","repostof":"https:\/\/twitter.com\/aaronpk\/status\/852613547922042880","description":false,"tags":false,"pageTitle":"Aaron Parecki on Twitter: \"Micropub PR published today! \ud83c\udf89 https:\/\/t.co\/B4se7jOPPr This is the last step before REC status! We'd love your impl reports and feedback!\"","slug":"aaron-parecki-on-twitter-micropub-pr-published-today-b4se7joppr-this","created":1492118357,"updated":1492118357,"publish_status":"published","_id":"f489a455d3570e046c63751a5755d141","uuid":"https:\/\/cleverdevil.io\/2017\/aaron-parecki-on-twitter-micropub-pr-published-today-b4se7joppr-this","entity_subtype":"IdnoPlugins\\Like\\Like"}

As far as I can tell, everything looks good. The entity is there in MySQL, and the JSON is valid and can be parsed. But I get a 404 when I try to visit the permalink.
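
For reference, the `\ud83c\udf89` sequence in `pageTitle` is a JSON-escaped surrogate pair for the 🎉 emoji. The following is a small Python sketch, not part of Known, that decodes a shortened copy of that field and shows that the character takes four bytes in UTF-8. The variable names are illustrative only.

```python
import json

# Shortened copy of the stored pageTitle, with the emoji still JSON-escaped
# as a UTF-16 surrogate pair (\ud83c\udf89).
raw = r'{"pageTitle": "Micropub PR published today! \ud83c\udf89"}'

title = json.loads(raw)["pageTitle"]
emoji = title[-1]  # json.loads joins the surrogate pair into one character

print(hex(ord(emoji)))              # Unicode code point of the emoji
print(len(emoji.encode("utf-8")))   # number of bytes it needs in UTF-8
```

Characters that need four bytes in UTF-8 are the ones that matter for the MySQL character set discussion later in this thread.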

Contributor

cleverdevil commented May 2, 2017

That said, there seems to be nothing in the metadata table for this entity.

Contributor

cleverdevil commented May 2, 2017

Okay, I just manually inserted the following rows into my MySQL database, and now the post shows up:

insert into metadata(_id, collection, entity, name, value) values(
  'f489a455d3570e046c63751a5755d141',
  'entities',
  'https://cleverdevil.io/2017/aaron-parecki-on-twitter-micropub-pr-published-today-b4se7joppr-this',
  'access',
  'PUBLIC'
);
insert into metadata(_id, collection, entity, name, value) values(
  'f489a455d3570e046c63751a5755d141',
  'entities',
  'https://cleverdevil.io/2017/aaron-parecki-on-twitter-micropub-pr-published-today-b4se7joppr-this',
  'body',
  'https://twitter.com/aaronpk/status/852613547922042880'
);
insert into metadata(_id, collection, entity, name, value) values(
  'f489a455d3570e046c63751a5755d141',
  'entities',
  'https://cleverdevil.io/2017/aaron-parecki-on-twitter-micropub-pr-published-today-b4se7joppr-this',
  'created',
  '2017-04-13 09:04:17'
);
insert into metadata(_id, collection, entity, name, value) values(
  'f489a455d3570e046c63751a5755d141',
  'entities',
  'https://cleverdevil.io/2017/aaron-parecki-on-twitter-micropub-pr-published-today-b4se7joppr-this',
  'description',
  '0'
);
insert into metadata(_id, collection, entity, name, value) values(
  'f489a455d3570e046c63751a5755d141',
  'entities',
  'https://cleverdevil.io/2017/aaron-parecki-on-twitter-micropub-pr-published-today-b4se7joppr-this',
  'entity_subtype',
  'IdnoPlugins\Like\Like'
);
insert into metadata(_id, collection, entity, name, value) values(
  'f489a455d3570e046c63751a5755d141',
  'entities',
  'https://cleverdevil.io/2017/aaron-parecki-on-twitter-micropub-pr-published-today-b4se7joppr-this',
  'likeof',
  'https://twitter.com/aaronpk/status/852613547922042880'
);
insert into metadata(_id, collection, entity, name, value) values(
  'f489a455d3570e046c63751a5755d141',
  'entities',
  'https://cleverdevil.io/2017/aaron-parecki-on-twitter-micropub-pr-published-today-b4se7joppr-this',
  'owner',
  'http://cleverdevil.io/profile/cleverdevil'
);
insert into metadata(_id, collection, entity, name, value) values(
  'f489a455d3570e046c63751a5755d141',
  'entities',
  'https://cleverdevil.io/2017/aaron-parecki-on-twitter-micropub-pr-published-today-b4se7joppr-this',
  'pageTitle',
  'Aaron Parecki on Twitter: "Micropub PR published today..."'
);
insert into metadata(_id, collection, entity, name, value) values(
  'f489a455d3570e046c63751a5755d141',
  'entities',
  'https://cleverdevil.io/2017/aaron-parecki-on-twitter-micropub-pr-published-today-b4se7joppr-this',
  'publish_status',
  'published'
);
insert into metadata(_id, collection, entity, name, value) values(
  'f489a455d3570e046c63751a5755d141',
  'entities',
  'https://cleverdevil.io/2017/aaron-parecki-on-twitter-micropub-pr-published-today-b4se7joppr-this',
  'slug',
  'aaron-parecki-on-twitter-micropub-pr-published-today-b4se7joppr-this'
);
insert into metadata(_id, collection, entity, name, value) values(
  'f489a455d3570e046c63751a5755d141',
  'entities',
  'https://cleverdevil.io/2017/aaron-parecki-on-twitter-micropub-pr-published-today-b4se7joppr-this',
  'tags',
  '0'
);
insert into metadata(_id, collection, entity, name, value) values(
  'f489a455d3570e046c63751a5755d141',
  'entities',
  'https://cleverdevil.io/2017/aaron-parecki-on-twitter-micropub-pr-published-today-b4se7joppr-this',
  'updated',
  '1492118357'
);
insert into metadata(_id, collection, entity, name, value) values(
  'f489a455d3570e046c63751a5755d141',
  'entities',
  'https://cleverdevil.io/2017/aaron-parecki-on-twitter-micropub-pr-published-today-b4se7joppr-this',
  'uuid',
  'https://cleverdevil.io/2017/aaron-parecki-on-twitter-micropub-pr-published-today-b4se7joppr-this'
);
insert into metadata(_id, collection, entity, name, value) values(
  'f489a455d3570e046c63751a5755d141',
  'entities',
  'https://cleverdevil.io/2017/aaron-parecki-on-twitter-micropub-pr-published-today-b4se7joppr-this',
  '_id',
  'f489a455d3570e046c63751a5755d141'
);

Contributor

cleverdevil commented May 2, 2017

So, confirmed, the source of this bug is the metadata rows not getting inserted, even though the entity itself shows up in the entities table.

Collaborator

mapkyca commented May 3, 2017

Interesting, that implies it's an encoding issue on mysql. Entity is stored in the main entity table, but metadata is used to search. Would explain why I've been unable to replicate it on my localhost (which is mongo).
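
One way to check whether a given string is affected: MySQL's legacy `utf8` character set stores at most three bytes per character, so any character whose UTF-8 encoding is four bytes long (anything outside the Basic Multilingual Plane, which includes most emoji) cannot be represented in it. A minimal Python sketch, with a hypothetical helper name, that lists such characters:

```python
def four_byte_chars(text: str) -> list:
    """Return the characters in text whose UTF-8 encoding needs four bytes."""
    return [ch for ch in text if len(ch.encode("utf-8")) == 4]


# Shortened version of the page title from this issue (emoji written as an escape).
title = "Micropub PR published today! \U0001F389 This is the last step"

print(four_byte_chars(title))                   # the emoji is reported
print(four_byte_chars("plain text, accents: é ü"))  # nothing is reported
```

This only inspects the text; it says nothing about how a particular MySQL server or driver reacts when it receives such a character.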

Collaborator

mapkyca commented May 3, 2017

Interestingly, on my localhost, after switching to MySQL, sharing and posting content with an emoji works fine.

Could you confirm your exact steps to replicate? Also, from my IRC readback I notice you've got a unit test; that'd be handy to have.

Collaborator

mapkyca commented May 3, 2017

... wondering if this is a local mysql version/encoding issue...

Contributor

cleverdevil commented May 3, 2017

Okay, I was able to fix this issue entirely with changes to MySQL, and no changes to Known's code, by following the advice from this post about mysql and utf8mb4.

TL;DR:

  1. Modify the Known MySQL database to use the right character set and collation:

    ALTER DATABASE database_name CHARACTER SET = utf8mb4 COLLATE = utf8mb4_unicode_ci;

  2. Modify each table in Known (specifically the metadata and entities tables):

    ALTER TABLE table_name CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;

  3. For step 2 to actually work, you have to resize columns from VARCHAR(255) down to VARCHAR(191).

  4. Modify each column that may store emoji to also use the proper character set:

    ALTER TABLE table_name CHANGE column_name column_name VARCHAR(191) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;

  5. Change MySQL server settings:

[client]
default-character-set = utf8mb4

[mysql]
default-character-set = utf8mb4

[mysqld]
character-set-client-handshake = FALSE
character-set-server = utf8mb4
collation-server = utf8mb4_unicode_ci
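
The 191 in step 3 is not arbitrary. Assuming the tables use an InnoDB row format with the 767-byte index key prefix limit (an assumption about this setup; newer row formats allow longer keys), an indexed column can hold 255 characters under three-byte `utf8` but only 191 under four-byte `utf8mb4`. A quick Python check of that arithmetic, with illustrative names:

```python
# Assumption: InnoDB index key prefix limit for the older row formats.
INDEX_KEY_LIMIT_BYTES = 767


def max_indexed_chars(bytes_per_char: int, limit: int = INDEX_KEY_LIMIT_BYTES) -> int:
    """Largest VARCHAR length whose worst-case byte size fits in the key limit."""
    return limit // bytes_per_char


print(max_indexed_chars(3))  # legacy utf8: up to 3 bytes per character
print(max_indexed_chars(4))  # utf8mb4: up to 4 bytes per character
```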

I believe that the installation instructions and any automated installation bits and pieces when running against MySQL should create the database and tables properly in the first place, and detect any misconfiguration regarding character sets and inform the user.
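
As a rough illustration of the kind of check suggested above, here is a Python sketch that flags character-set variables not set to `utf8mb4`. It does not talk to MySQL: the dictionary stands in for the output of `SHOW VARIABLES LIKE 'character_set%'`, and the function name, the sample values, and the choice of which variables to check are all assumptions, not Known's behaviour.

```python
# Variables expected to be utf8mb4 after the changes above (assumed list).
EXPECTED_UTF8MB4 = (
    "character_set_client",
    "character_set_connection",
    "character_set_database",
    "character_set_results",
    "character_set_server",
)


def misconfigured(variables: dict) -> dict:
    """Return the checked variables whose value is missing or not utf8mb4."""
    return {
        name: variables.get(name)
        for name in EXPECTED_UTF8MB4
        if variables.get(name) != "utf8mb4"
    }


# Hypothetical sample values, as they might look before the fix.
sample = {
    "character_set_client": "utf8",
    "character_set_connection": "utf8",
    "character_set_database": "latin1",
    "character_set_results": "utf8",
    "character_set_server": "utf8mb4",
}

for name, value in misconfigured(sample).items():
    print(f"{name} is {value!r}, expected 'utf8mb4'")
```

An installer could run a check along these lines and warn the user instead of silently creating tables that cannot store emoji.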

Collaborator

mapkyca commented May 4, 2017

Going to close this as it's not a Known-specific bug.

In the latest master branch I've added a stub troubleshooting section in the docs... I'd welcome a pull request to this with your solution!

mapkyca closed this May 4, 2017
