News web page crawler: uses Scrapy to fetch each news page's JSON description document and article text document.

quyunye/News-Crawler-Scrapy

News-Crawler-Scrapy

  • Fetches, for each news page, a JSON description document and an article text document.

  • The JSON file contains the node, title, path, author, publicated_time, and other fields.

  • Targets Japanese news websites: The Mainichi ("Japan's National Daily"), Kyodo News, The Asahi Shimbun...

  • Built with Scrapy 1.5 and Python 3.7.

  • Tested and working as of the end of 2018; the sites may have changed their page markup since then.

  • This repository contains only the crawler code; it does not include any crawled articles or other files.
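
As a rough illustration of the "one JSON description plus one text file per article" output described above, here is a minimal sketch of a Scrapy-style item pipeline. It is not the repository's actual code. The class name `ArticleFilePipeline`, the `text` field, the `output` directory, and the hash-based file naming are all hypothetical; treating `node` as the article's page URL and `path` as the location of the saved text file are assumptions based on the field list above.

```python
import hashlib
import json
from pathlib import Path


class ArticleFilePipeline:
    """Hypothetical pipeline: writes one .json and one .txt file per article."""

    def __init__(self, out_dir="output"):
        # Assumed output directory; not taken from the repository.
        self.out_dir = Path(out_dir)

    def open_spider(self, spider):
        # Scrapy calls this hook once when the spider starts.
        self.out_dir.mkdir(parents=True, exist_ok=True)

    def process_item(self, item, spider):
        record = dict(item)
        # "text" is an assumed field carrying the article body.
        text = record.pop("text", "")

        # Derive a stable file stem from the page URL (assumed to be "node").
        stem = hashlib.sha1(record["node"].encode("utf-8")).hexdigest()[:16]
        txt_path = self.out_dir / (stem + ".txt")
        json_path = self.out_dir / (stem + ".json")

        # Assumption: "path" records where the article text was saved.
        record["path"] = str(txt_path)

        txt_path.write_text(text, encoding="utf-8")
        # ensure_ascii=False keeps Japanese titles readable in the JSON file.
        json_path.write_text(
            json.dumps(record, ensure_ascii=False, indent=2), encoding="utf-8"
        )
        return item
```

A pipeline like this would be enabled through Scrapy's `ITEM_PIPELINES` setting, for example `ITEM_PIPELINES = {"myproject.pipelines.ArticleFilePipeline": 300}`, where the module path is a placeholder. Keeping the article body out of the JSON record keeps the index small and lets the text files be read on their own.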

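Scrapy Crawler: News Websites

  • Fetches JSON-format index information and txt-format article content for the news articles on each site.

  • The JSON file contains each article's web page, title, path, author, publication time, and other information.

  • Currently uses The Asahi Shimbun, Kyodo News, and The Mainichi Newspapers as examples.

  • Environment: Python 3.7, Scrapy 1.5

  • The sites' front-end markup may change; this code was tested and working at the end of 2018.

  • This repository contains only the technical crawling code and no articles or other files.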
