shaheen-syed/pygooglenewsscraper

pygooglenewsscraper

Scrape news content from the Google News website (https://news.google.com).

Given a search keyword, it retrieves the title, URL, publisher, and date of each matching news item. The full article text can then be retrieved from each item's URL.

Installation

pip3 install pygooglenewsscraper

Examples

Retrieve Google News items through a search keyword

from pygooglenewsscraper import GoogleNews, NewsArticle

# define keyword
keyword = 'artificial intelligence'

# google news object
googlenews = GoogleNews(keyword = keyword)

# perform google news search and retrieve raw news
raw_news = googlenews.get_raw_news()

# parse out the news articles
news = googlenews.parse_news(html = raw_news.text)

# print out results
for k, v in news.items():
	print(v['title'])
	print(v['url'])
	print(v['publisher'])
	print(v['date'])
	print()
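
The loop above suggests that `news` maps keys to dictionaries with `title`, `url`, `publisher`, and `date` fields. Assuming that shape, the results could be saved as CSV with the standard library. This is a minimal sketch: `news_to_csv` and the sample data are hypothetical and not part of the package.

```python
import csv
import io

# fields shown in the example above
FIELDS = ['title', 'url', 'publisher', 'date']

def news_to_csv(news, fileobj):
    """Write news items to fileobj as CSV and return the number of rows written.

    Hypothetical helper; assumes `news` is a dict of dicts as in the example above.
    """
    writer = csv.DictWriter(fileobj, fieldnames=FIELDS, extrasaction='ignore')
    writer.writeheader()
    count = 0
    for item in news.values():
        # fields missing from an item are written as empty strings
        writer.writerow({field: item.get(field, '') for field in FIELDS})
        count += 1
    return count

# made-up sample data standing in for the output of parse_news()
sample_news = {
    0: {'title': 'Example headline', 'url': 'https://example.com/a',
        'publisher': 'Example Times', 'date': '2024-01-01'},
    1: {'title': 'Another headline', 'url': 'https://example.com/b',
        'publisher': 'Example Post', 'date': '2024-01-02'},
}

buffer = io.StringIO()
rows_written = news_to_csv(sample_news, buffer)
csv_text = buffer.getvalue()
```

To write to disk instead, pass a file opened with `open('news.csv', 'w', newline='')` in place of the `StringIO` buffer.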

Extract the news content for each URL

# get main content of news items
# (reuses the `news` results from the previous example)
for k, v in news.items():

	# news article object
	news_article = NewsArticle(url = v['url'])

	# download the article and parse out its main content
	news_content = news_article.parse_main_content()

	print(news_content['content'])
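
When many URLs are processed, a single page that fails to download or parse would stop the loop above. One possible way to handle this is to collect the successes and failures separately. The sketch below is an assumption-laden illustration: `collect_contents` and `fake_fetch` are hypothetical names, and the broad `except Exception` is used only because this README does not document which exceptions the package raises.

```python
def collect_contents(news, fetch):
    """Return (contents, failures), both keyed by URL.

    `fetch` is any callable that takes a URL and returns a dict with a
    'content' key, mirroring the result of parse_main_content() shown above.
    """
    contents, failures = {}, {}
    for item in news.values():
        url = item['url']
        try:
            contents[url] = fetch(url)['content']
        except Exception as exc:  # assumption: exact exception types are unknown
            failures[url] = str(exc)
    return contents, failures

# hypothetical stand-in for a real fetch, so the sketch runs without network access
def fake_fetch(url):
    if 'broken' in url:
        raise ValueError('could not download page')
    return {'content': 'Body of ' + url}

# made-up sample data standing in for the output of parse_news()
sample_news = {
    0: {'url': 'https://example.com/a'},
    1: {'url': 'https://example.com/broken'},
}

contents, failures = collect_contents(sample_news, fake_fetch)
```

With the package itself, `fetch` could be `lambda url: NewsArticle(url = url).parse_main_content()`, following the example above.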