nghuyong committed Jan 3, 2024
1 parent 28a93ad commit a64db10
Showing 2 changed files with 5 additions and 0 deletions.
1 change: 1 addition & 0 deletions README.md
@@ -372,6 +372,7 @@ python run_spider.py tweet_by_keyword

## Changelog

- 2024.01: Support tracing a retweet back to its original tweet [#314](https://github.com/nghuyong/WeiboSpider/issues/314)
- 2023.12: Support collecting second-level comments on tweets [#302](https://github.com/nghuyong/WeiboSpider/issues/302)
- 2023.12: Support collecting a user's tweets within a specified time range [#308](https://github.com/nghuyong/WeiboSpider/issues/308)
- 2023.04: Support collecting tweets by tweet id [#272](https://github.com/nghuyong/WeiboSpider/issues/272)
4 changes: 4 additions & 0 deletions weibospider/spiders/common.py
@@ -101,6 +101,7 @@ def parse_tweet_info(data):
        "pic_urls": ["https://wx1.sinaimg.cn/orj960/" + pic_id for pic_id in data.get('pic_ids', [])],
        "pic_num": data['pic_num'],
        'isLongText': False,
        'is_retweet': False,
        "user": parse_user_info(data['user']),
    }
    if '</a>' in tweet['source']:
@@ -116,6 +117,9 @@ def parse_tweet_info(data):
    tweet['url'] = f"https://weibo.com/{tweet['user']['_id']}/{tweet['mblogid']}"
    if 'continue_tag' in data and data['isLongText']:
        tweet['isLongText'] = True
    if 'retweeted_status' in data:
        tweet['is_retweet'] = True
        tweet['retweet_id'] = data['retweeted_status']['mid']
    return tweet
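
The retweet handling added in this hunk can be tried in isolation. The following is a minimal sketch, not the repository's code: `annotate_retweet` is a hypothetical helper that copies only the `is_retweet` / `retweet_id` logic out of `parse_tweet_info`, and the payloads are made-up dictionaries that assume only the `retweeted_status` and `mid` keys seen in the diff.

```python
def annotate_retweet(tweet: dict, data: dict) -> dict:
    """Hypothetical helper mirroring the lines this commit adds.

    `data` stands in for the raw payload passed to parse_tweet_info;
    only the 'retweeted_status' and 'mid' keys from the diff are assumed.
    """
    # Default: treat the tweet as original content.
    tweet['is_retweet'] = False
    # A nested 'retweeted_status' object marks the payload as a retweet.
    if 'retweeted_status' in data:
        tweet['is_retweet'] = True
        # Keep the original tweet's id so it can be traced later.
        tweet['retweet_id'] = data['retweeted_status']['mid']
    return tweet


# Made-up payloads for illustration, not real API responses.
original = annotate_retweet({}, {"mid": "1001"})
retweet = annotate_retweet({}, {"mid": "1002", "retweeted_status": {"mid": "1001"}})

print(original)
print(retweet)
```

Note that `retweet_id` is set only on retweets, so downstream code should check `is_retweet` (or use `tweet.get('retweet_id')`) before reading it.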

