Emoji characters require utf8mb4_unicode_ci #69

Closed
thekingofcity opened this issue Jan 30, 2018 · 1 comment

Comments

@thekingofcity
Member

thekingofcity commented Jan 30, 2018

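Please answer the following questions before submitting an issue. Thank you!

1. What did you do?

Ran the following when a post contained an emoji: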
python3 first_task_execution/home_first.py

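Describe your steps as clearly as you can, ideally so that the problem can be reproduced.

2. What result did you expect?

The row is inserted into the table correctly.

3. What result did you actually get?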

DB operation error,here are details:(pymysql.err.InternalError) (1366, "Incorrect string value: '\\xF0\\x9F\\x98\\x8E\\xE5\\xBD...' for column 'weibo_cont' at row 1") [SQL: 'INSERT INTO weibo_data (weibo_id, weibo_cont, weibo_img, weibo_video, repost_num, comment_num, praise_num, uid, is_origin, device, weibo_url, create_time, comment_crawled, repost_crawled, dialogue_crawled) VALUES (%(weibo_id)s, %(weibo_cont)s, %(weibo_img)s, %(weibo_video)s, %(repost_num)s, %(comment_num)s, %(praise_num)s, %(uid)s, %(is_origin)s, %(device)s, %(weibo_url)s, %(create_time)s, %(comment_crawled)s, %(repost_crawled)s, %(dialogue_crawled)s)'] [parameters: {'weibo_cont': '#周二型不型# 这套礼服驳头部分我们采用全真丝制作,配上极致的黑色作为底色无论任何时间场合,你都能成为一匹黑马😎彰显王者的风采~\xa0\xa0#成都西服高级定制# 更多绅士款欢迎预约到店试穿 \u200b\u200b\u200b\u200b', 'weibo_img': 'https://wx3.sinaimg.cn/thumb150/005ODqergy1fmub4fcxcgj31fi24e1kx.jpg;https://wx4.sinaimg.cn/thumb150/005ODqergy1fmub4mkpprj321w3344qp.jpg;https://wx1 ... (322 characters truncated) ... w334x42.jpg;https://wx4.sinaimg.cn/thumb150/005ODqergy1fmub4kgv4vj321w334kjl.jpg;https://wx2.sinaimg.cn/thumb150/005ODqergy1fmub4xr7btj333421w1b7.jpg', 'repost_crawled': 0, 'device': '', 'weibo_id': '4189261395361540', 'weibo_video': '', 'weibo_url': 'https://weibo.com/5328876591/FBs5ZmuMs?from=page_1006065328876591_profile&wvr=6&mod=weibotime', 'create_time': '2017-12-26 17:40', 'praise_num': 3, 'is_origin': 1, 'comment_crawled': 0, 'repost_num': 0, 'comment_num': 6, 'dialogue_crawled': 0, 'uid': '5328876591'}]
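
The bytes quoted in the error can be checked directly. The following is a minimal sketch, assuming the usual explanation that MySQL's legacy `utf8` character set stores at most three bytes per character; the helper name `fits_in_utf8mb3` is hypothetical and not part of WeiboSpider.

```python
# The emoji that appears in the failing weibo_cont value.
emoji = "😎"
encoded = emoji.encode("utf-8")

# These are the same four bytes as the '\xF0\x9F\x98\x8E' prefix in error 1366.
print(encoded)
print(len(encoded))


def fits_in_utf8mb3(text):
    """Return True if every character needs at most 3 bytes in UTF-8.

    Hypothetical helper: it approximates what a 3-byte `utf8` column can store.
    """
    return all(len(ch.encode("utf-8")) <= 3 for ch in text)


print(fits_in_utf8mb3("黑马"))      # Chinese characters take 3 bytes each
print(fits_in_utf8mb3("黑马😎"))    # the emoji takes 4 bytes
```

A check like this could be used to spot rows that would be rejected before the column is converted, but it does not replace the schema change.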

4. Which version of WeiboSpider are you using? What is your operating system? Have you read this project's FAQ?

commit 5fc365b
Ubuntu 16.04

PS: This is the emoji problem in posts sent from Apple devices; see:
http://blog.csdn.net/tongsh6/article/details/52292336
https://www.cnblogs.com/h--d/p/5712490.html
http://blog.csdn.net/qiaqia609/article/details/51161943
Setting the weibo_cont column of weibo_data to character set utf8mb4 with collation utf8mb4_unicode_ci resolves the problem.
The comment_cont column of weibo_comment needs the same change.

ALTER TABLE weibo.weibo_data MODIFY COLUMN `weibo_cont` text CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci ;
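
Since both columns need the same change, the statements can be generated from one template. This is only a sketch: `build_alter_statements` and `EMOJI_COLUMNS` are hypothetical names, the `weibo` schema for weibo_comment and the `text` type for comment_cont are assumptions (the issue only shows them for weibo_cont), and nothing here is executed against a database.

```python
# Columns named in this issue. The `text` type for comment_cont is an
# assumption; the issue only shows the type of weibo_cont.
EMOJI_COLUMNS = [
    ("weibo_data", "weibo_cont"),
    ("weibo_comment", "comment_cont"),
]


def build_alter_statements(schema="weibo", columns=EMOJI_COLUMNS):
    """Return one ALTER TABLE statement per (table, column) pair."""
    template = (
        "ALTER TABLE {schema}.{table} MODIFY COLUMN `{column}` text "
        "CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;"
    )
    return [
        template.format(schema=schema, table=table, column=column)
        for table, column in columns
    ]


# Print the statements so they can be reviewed before being run by hand.
for statement in build_alter_statements():
    print(statement)
```

The client connection probably has to use utf8mb4 as well, for example through pymysql's `charset` argument; the issue itself does not cover that part.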

I hope this can be fixed when the tables are created.

@ResolveWang
Member

In the next version I will try to see whether the relevant columns can be created as utf8mb4 directly.
