<a href="https://colab.research.google.com/github/jgamel/learn_n_dev/blob/python_web_scrapping/GoogleNews_example.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Google News

PyGoogleNews, created by the NewsCatcher Team, acts as a Python wrapper for Google News, in effect an unofficial Google News API. It relies on one simple trick: it reads the lightweight Google News RSS feed.

What data points can it fetch for you?

* Top stories
* Topic-related news feeds
* Geolocation-specific news feeds
* An extensive query-based search feed
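
To make the RSS trick mentioned above more concrete, here is a minimal sketch of reading an RSS document with Python's standard library. The XML is a made-up sample in the general RSS 2.0 shape (`channel`, `item`, `title`, `link`, `pubDate`), not a real Google News response, and `parse_rss_items` is a hypothetical helper rather than part of PyGoogleNews.

```python
import xml.etree.ElementTree as ET

# Made-up sample in the RSS 2.0 shape; this is NOT real Google News output.
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title>Sample feed</title>
    <item>
      <title>First headline - Example Source</title>
      <link>https://example.com/first</link>
      <pubDate>Fri, 25 Feb 2022 16:48:00 GMT</pubDate>
    </item>
    <item>
      <title>Second headline - Another Source</title>
      <link>https://example.com/second</link>
      <pubDate>Fri, 25 Feb 2022 12:25:00 GMT</pubDate>
    </item>
  </channel>
</rss>"""


def parse_rss_items(xml_text):
    """Return one dict per <item> with its title, link, and pubDate text."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.iter("item"):
        items.append({
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "published": item.findtext("pubDate", default=""),
        })
    return items


items = parse_rss_items(SAMPLE_RSS)
for item in items:
    print(item["title"], "|", item["published"])
```

A wrapper library saves you from writing this parsing yourself and returns the entries as ready-made Python objects.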

In [2]:
from google.colab import drive
drive.mount('/content/gdrive', force_remount=True)

Mounted at /content/gdrive


In [3]:
import sys
sys.path.append('/content/gdrive/My Drive/Colab Notebooks')

In [4]:
from pygooglenews import GoogleNews

gn = GoogleNews()
top = gn.top_news()

# Print a numbered line for each top story: title followed by publication date
for count, entry in enumerate(top["entries"], start=1):
  print(str(count) + ". " + entry["title"] + entry["published"])

1. Russia invades Ukraine: Live updates - CNNFri, 25 Feb 2022 16:48:00 GMT
2. Biden to nominate Ketanji Brown Jackson to the Supreme Court - The Washington PostFri, 25 Feb 2022 16:47:58 GMT
3. Kyiv residents told to make Molotov cocktails as they await Russian assault - ReutersFri, 25 Feb 2022 12:25:00 GMT
4. Zelensky to EU leaders: "This might be the last time you see me alive" - AxiosFri, 25 Feb 2022 16:19:47 GMT
5. Biden looks to cripple Russian economy with sanctions in response to invasion of Ukraine - Yahoo NewsThu, 24 Feb 2022 20:59:59 GMT
6. The Memo: Biden locks into battle with enigmatic Putin | TheHill - The HillFri, 25 Feb 2022 11:00:08 GMT
7. There will be a cost if Florida lawmakers embrace hate | Editorial - Orlando SentinelThu, 24 Feb 2022 10:16:09 GMT
8. Convoy of truck drivers could disrupt I-40 traffic in Raleigh on Friday morning - WRAL NewsFri, 25 Feb 2022 10:40:35 GMT
9. Three Former Officers Convicted of Violating George Floyd’s Rights - The New York TimesFri, 25

The code above shows how to extract selected data points from the top stories in the Google News RSS feed. Replace “gn.top_news()” with “gn.topic_headlines('business')” to get the top headlines for the “Business” topic, or with “gn.geo_headlines('San Fran')” to get the top news for the San Francisco region.
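
The loop in the cell above prints each title and its publication date back to back. As a small sketch, the helper below puts the date in parentheses and parses the date string into a `datetime` first. `format_headlines` is a hypothetical helper, not part of PyGoogleNews, and the two sample entries are copied from the output above; in the notebook you would pass `gn.top_news()["entries"]`, or the entries of a topic or geo feed, instead.

```python
from email.utils import parsedate_to_datetime


def format_headlines(entries, limit=None):
    """Return numbered 'title (YYYY-MM-DD HH:MM)' lines for feed entries."""
    lines = []
    for number, entry in enumerate(entries[:limit], start=1):
        # Dates in the output above look like 'Fri, 25 Feb 2022 16:48:00 GMT'.
        published = parsedate_to_datetime(entry["published"])
        lines.append(f"{number}. {entry['title']} ({published:%Y-%m-%d %H:%M})")
    return lines


# Hand-copied stand-ins for feed entries (only the two keys used here).
sample_entries = [
    {"title": "Russia invades Ukraine: Live updates - CNN",
     "published": "Fri, 25 Feb 2022 16:48:00 GMT"},
    {"title": "Biden to nominate Ketanji Brown Jackson to the Supreme Court - The Washington Post",
     "published": "Fri, 25 Feb 2022 16:47:58 GMT"},
]

lines = format_headlines(sample_entries)
for line in lines:
    print(line)
```

Parsing the date up front also makes it easy to sort or filter entries by time later on.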

You can also run more complex queries: “gn.search('boeing OR airbus')” finds news articles mentioning Boeing or Airbus, while “gn.search('boeing -airbus')” finds articles that mention Boeing but not Airbus.
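
If you assemble such queries from lists of terms, a small helper keeps the operators consistent. This is only a sketch: `build_query` is a hypothetical function, not part of PyGoogleNews, and it covers just the two operators shown above (`OR` and a leading `-`).

```python
def build_query(any_of=(), exclude=()):
    """Join the wanted terms with OR and prefix each excluded term with '-'."""
    parts = []
    if any_of:
        parts.append(" OR ".join(any_of))
    parts.extend(f"-{term}" for term in exclude)
    return " ".join(parts)


either = build_query(any_of=["boeing", "airbus"])
only_boeing = build_query(any_of=["boeing"], exclude=["airbus"])

print(either)       # boeing OR airbus
print(only_boeing)  # boeing -airbus
```

The resulting strings can be passed straight to `gn.search(...)`.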

When you scrape news articles with this library, every entry you capture comes with the following data points, which you can use for data processing, for training a machine learning model, or for running NLP scripts:

* Title - the headline of the article
* Link - the original link to the article
* Published - the date on which it was published
* Summary - the article summary
* Source - the website on which it was published
* Sub-Articles - a list of titles, publishers, and links on the same topic

We extracted just a few of the available data points, but you can extract the others as well, based on your requirements. The code below shows a small example of the results produced by a complex query:

In [8]:
from pygooglenews import GoogleNews

gn = GoogleNews()
s = gn.search('boeing OR airbus') 


for entry in s["entries"]:
    print(entry["title"])

Airbus to test hydrogen-fueled engine on A380 jet - CNN
Emirates may cancel Boeing 777X if delays extend beyond 2023 -report - Reuters
Emirates warns Airbus over A350 deliveries amid paint row -report - Reuters
Judge OKs Boeing settlement with investors over 737 MAX - The Seattle Times
Norwegian Air CEO considers Airbus jets amid drawn-out Boeing litigation - Reuters
Boeing, Airbus in talks for multi-jet order from Air India - ThePrint
Boeing wins helicopter contract for Thailand - DefenseNews.com
Seattle engineer Marvi Matos Rodriguez shares why she chose Boeing - Boeing
Boeing Would Kill Us All to Increase Its Profits - Jacobin magazine
Why Boeing, GE, and Ingersoll Rand Stocks Dropped Today - Motley Fool
12 Years After Its First Flight: The Story Of The 4th Boeing 787 Prototype - Simple Flying
Raytheon says it may not ship around 70 engines to Airbus in first quarter - Reuters
UK court orders Airbus to delay Qatar Airways plane cancellations - Al Jazeera English
Kuwait's state carri
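
Beyond the title, each entry carries the other data points listed earlier. The sketch below flattens one entry into a plain dictionary that is easy to store or load into a table. `entry_to_record` is a hypothetical helper, and the sample entry is hand-written: its link is a placeholder, and the assumption that `source` is a mapping with a `title` key should be checked against a real entry before you rely on it.

```python
def entry_to_record(entry):
    """Flatten one feed entry into a plain dict of strings."""
    # Assumption: 'source' is a mapping with a 'title' key; fall back to "".
    source = entry.get("source") or {}
    return {
        "title": entry.get("title", ""),
        "link": entry.get("link", ""),
        "published": entry.get("published", ""),
        "source": source.get("title", ""),
    }


# Hand-written stand-in for one search result; the link is a placeholder.
sample_entry = {
    "title": "Airbus to test hydrogen-fueled engine on A380 jet - CNN",
    "link": "https://example.com/article",
    "published": "Fri, 25 Feb 2022 16:48:00 GMT",
    "source": {"title": "CNN"},
}

record = entry_to_record(sample_entry)
print(record)
```

Using `.get` with defaults means a missing field produces an empty string instead of raising a `KeyError`.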

### Search Google News and Save to CSV File

In [20]:
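import csv
from pygooglenews import GoogleNews

gn = GoogleNews(lang='en', country='UK')

# Search for articles with "Christmas" in the title, published in December 2019
xmas_search = gn.search('intitle:Christmas', helper=True, from_='2019-12-01', to_='2019-12-31')

print(xmas_search['feed'].title)

for item in xmas_search['entries']:
  print(item['title'])

# Write a header row, then one row per article.
# newline="" keeps the csv module from adding blank lines on Windows.
with open("Christmassearch.csv", "w", newline="", encoding="utf-8") as file:
  writer = csv.writer(file)
  writer.writerow(["title", "link", "published"])
  for item in xmas_search['entries']:
    writer.writerow([item['title'], item['link'], item['published']])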

"intitle:Christmas after:2019-12-01 before:2019-12-31" - Google News
Should We Stop Sending Christmas Cards? - International Environmental Technology
Winter weekenders: 15 picturesque UK towns for a pre-Christmas break - The Guardian
The best Christmas films that aren't really about Christmas - Epigram
The curious tale of how Japan got hooked on KFC at Christmas - Wired.co.uk
Farm turkeys still a winner at Christmas - with over 10 million sold in UK - FG Insight
Is This $15 Million Christmas Tree The Most Expensive In The World? - Forbes
Five Christmas mysteries unwrapped - BBC News
Is the Christmas spirit all in the brain? - BBC News
Why football at Christmas is a very British tradition - BBC News
Prime Minister Boris Johnson's Christmas message: 24 December 2019 - GOV.UK
The 50 greatest Christmas songs – ranked! | Music - The Guardian
How advertising through the ages has shaped Christmas - The Conversation UK
London Christmas Museum Guide 2019 - What's Open, What's Closed - ArtLyst
O
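
The feed title printed first in the output above, `"intitle:Christmas after:2019-12-01 before:2019-12-31"`, suggests that the `from_` and `to_` arguments end up as `after:` and `before:` operators appended to the query. The sketch below reproduces that string by hand. `with_date_range` is a hypothetical helper, and how the library builds the query internally is an assumption based only on that output.

```python
from datetime import date


def with_date_range(query, start=None, end=None):
    """Append after:/before: operators (ISO dates) to a search query."""
    parts = [query]
    if start is not None:
        parts.append(f"after:{start.isoformat()}")
    if end is not None:
        parts.append(f"before:{end.isoformat()}")
    return " ".join(parts)


december_query = with_date_range(
    "intitle:Christmas", start=date(2019, 12, 1), end=date(2019, 12, 31)
)
print(december_query)  # intitle:Christmas after:2019-12-01 before:2019-12-31
```

Building the dates from `date` objects rather than raw strings guards against malformed values such as a missing leading zero.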