Naver and Daum news web crawler via JSoup + Selenium.

sunight1999/news-crawler

News Crawler

Naver and Daum news web crawler via Jsoup + Selenium.

By default it crawls all news from Naver and Daum. If you specify the categories you want, it crawls only those.

Unfortunately, Naver uses AJAX to refresh its pages with updated news every few minutes. AJAX means the page content is loaded dynamically, and Jsoup cannot read dynamically loaded content, so I had to use Selenium. For this reason, you have to install the Firefox browser and download its driver. The crawler opens a new browser instance and uses it for crawling.

Core Library Versions

Prerequisite

  1. Install the Firefox web browser.
  2. Install the Firefox driver as well, from here.
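
Selenium needs to know where the downloaded Firefox driver lives. One common way is the `webdriver.gecko.driver` system property. The following is a minimal sketch of that idea, not code from this repository: the class `DriverSetup`, the helper `configureGeckoDriver`, and the default path `geckodriver` are all hypothetical names chosen for illustration.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class DriverSetup {
    // System property that Selenium's Firefox support reads to locate the driver.
    static final String GECKO_PROPERTY = "webdriver.gecko.driver";

    // Hypothetical helper: checks that the driver file exists, then registers its path.
    static String configureGeckoDriver(Path driverPath) {
        if (!Files.isRegularFile(driverPath)) {
            throw new IllegalArgumentException("Firefox driver not found: " + driverPath);
        }
        String absolute = driverPath.toAbsolutePath().toString();
        System.setProperty(GECKO_PROPERTY, absolute);
        return absolute;
    }

    public static void main(String[] args) {
        // Assumed location; pass the real driver path as the first argument.
        Path driver = Paths.get(args.length > 0 ? args[0] : "geckodriver");
        try {
            System.out.println("Using driver at " + configureGeckoDriver(driver));
        } catch (IllegalArgumentException e) {
            System.err.println(e.getMessage());
        }
    }
}
```

Calling a helper like this before the crawler starts would fail early with a clear message if the driver is missing, rather than failing when the browser is launched.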

How to

  1. Download the core libraries above from the referenced links, and download this repository.
  2. Move all jar files of the core libraries into the repository directory.
  3. Import the project into Eclipse Photon, or just use the NaverCrawler or DaumCrawler .java files.

Supported Categories and Specifying

  • Naver : Breaking, Politics, Economic, Society, Culture, World, Science
  • Daum : Politics, Economic, Society, Culture, Foreign(=World), Digital(=Science), Sports, Entertain

To crawl only selected categories, follow the code below.

public static void main(String[] args) {
  // Combine category flags with bitwise OR to crawl only those categories.
  NaverCrawler ncrawler = new NaverCrawler();
  ncrawler.setCategory(NaverCrawler.CAT_CULTURE | NaverCrawler.CAT_SOCIETY | NaverCrawler.CAT_SCIENCE);
  ncrawler.run();
}
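
The `|` in the call above combines category constants into a single value, which suggests the categories are bit flags. Here is a small sketch of how such flags can be combined and checked. The class `CategoryFlags`, the method `isSelected`, and the numeric values are assumptions for illustration only; the actual values in `NaverCrawler` may differ.

```java
public class CategoryFlags {
    // Assumed values: each category gets its own bit so flags can be OR-ed together.
    static final int CAT_POLITICS = 1;
    static final int CAT_ECONOMIC = 1 << 1;
    static final int CAT_SOCIETY  = 1 << 2;
    static final int CAT_CULTURE  = 1 << 3;
    static final int CAT_WORLD    = 1 << 4;
    static final int CAT_SCIENCE  = 1 << 5;

    // Hypothetical check: true if the given category bit is set in the selection.
    static boolean isSelected(int selection, int category) {
        return (selection & category) != 0;
    }

    public static void main(String[] args) {
        int selection = CAT_CULTURE | CAT_SOCIETY | CAT_SCIENCE;
        System.out.println(isSelected(selection, CAT_CULTURE));  // true
        System.out.println(isSelected(selection, CAT_POLITICS)); // false
    }
}
```

With this layout, a crawler can loop over its known categories and skip any whose bit is not set in the selection.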

Result

[Screenshot: today politics]
