Timestamp cleaning fails when crawling WeChat official-account articles #46

showthesunli · 2022-03-22T12:51:08Z

Test script:

from src.collector.wechat_feddd.start import WeiXinSpider
WeiXinSpider.request_config = {"RETRIES": 3, "DELAY": 5, "TIMEOUT": 20}
WeiXinSpider.start_urls = ['https://mp.weixin.qq.com/s/OrCRVCZ8cGOLRf5p5avHOg']
WeiXinSpider.start()

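Cause:
During data cleaning, the expected format is 2022-03-21 20:59 (no seconds), but the value actually scraped is 2022-03-22 20:37:12 (with seconds), so the clean_doc_ts function raises an error. See the screenshot below.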
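A minimal sketch of a parser that accepts both formats mentioned above. The function name `parse_doc_ts` and the format list are illustrative assumptions, not the project's actual `clean_doc_ts` implementation.

```python
from datetime import datetime

# Candidate formats, most specific first. Both appear in this report;
# the list itself is an assumption and can be extended.
CANDIDATE_FORMATS = ("%Y-%m-%d %H:%M:%S", "%Y-%m-%d %H:%M")


def parse_doc_ts(value: str) -> datetime:
    """Parse a publish-time string, trying each known format in turn.

    Hypothetical helper for illustration only.
    """
    text = value.strip()
    for fmt in CANDIDATE_FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            # This format did not match; try the next one.
            continue
    raise ValueError(f"unrecognised doc_ts format: {value!r}")
```

Trying formats in order keeps the cleaning step tolerant of either page variant, while an unknown format still fails loudly instead of being silently mis-parsed.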


showthesunli · 2022-03-22T12:54:09Z

If the doc_ts extraction in wechat_itme.py is replaced with the one on line 47, the article is crawled normally. See the screenshot below.

howie6879 · 2022-03-22T14:27:04Z

This is a bug. The time extraction will be changed to read the value directly from the page's JS script:
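The thread does not show the actual fix, so the following is only a rough sketch of what extracting a publish time from an inline script could look like. The variable name `ct`, the 10-digit Unix-timestamp form, and the sample HTML are all assumptions made for illustration.

```python
import re
from datetime import datetime, timezone
from typing import Optional

# Made-up page fragment; the real article markup may differ.
SAMPLE_HTML = '<script>var ct = "1647952632";</script>'


def extract_js_timestamp(html: str, var_name: str = "ct") -> Optional[datetime]:
    """Find `var <var_name> = "<10 digits>"` in the HTML and return it as UTC.

    Hypothetical helper: `var_name` defaults to an assumed variable name.
    Returns None when no such assignment is present.
    """
    pattern = re.compile(
        r"\bvar\s+" + re.escape(var_name) + r"\s*=\s*['\"](\d{10})['\"]"
    )
    match = pattern.search(html)
    if match is None:
        return None
    return datetime.fromtimestamp(int(match.group(1)), tz=timezone.utc)


publish_time = extract_js_timestamp(SAMPLE_HTML)
```

Reading an epoch value from the script avoids depending on how the visible date string happens to be formatted, which is the failure mode described in this issue.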

howie6879 · 2022-03-22T15:21:14Z

Fixed. Pull the updated image and restart:

docker pull liuliio/schedule:v0.2.4

howie6879 closed this as completed in c01348b Mar 22, 2022
