<a href="https://colab.research.google.com/github/jgamel/learn_n_dev/blob/python_web_scrapping/GoogleNews_example.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Google News

PyGoogleNews, created by the NewsCatcher team, is a Python wrapper for Google News that works as an unofficial Google News API. It is based on one simple trick: it reads the lightweight Google News RSS feed.

What data points can it fetch for you?

* Top stories
* Topic-related news feeds
* Geolocation-specific news feeds
* An extensive query-based search feed
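
To make the RSS trick more concrete, here is a rough sketch of how such a feed URL could be assembled. The base address and the `hl`, `gl`, and `ceid` parameters are assumptions about the public feed's URL pattern, and `feed_url` is a hypothetical helper, not part of PyGoogleNews.

```python
from urllib.parse import urlencode

# Assumed root of the Google News RSS feed (not confirmed by this notebook)
BASE_URL = "https://news.google.com/rss"


def feed_url(path="", query=None, lang="en", country="US"):
    """Build a feed URL for a language/country edition (hypothetical helper)."""
    params = {}
    if query is not None:
        params["q"] = query  # search text, only used for search feeds
    params["hl"] = lang  # assumed interface-language parameter
    params["gl"] = country  # assumed country parameter
    params["ceid"] = f"{country}:{lang}"  # assumed edition identifier
    # Keep ":" unescaped so the edition identifier stays readable
    return f"{BASE_URL}{path}?{urlencode(params, safe=':')}"


print(feed_url())
print(feed_url("/search", query="boeing OR airbus"))
```

A feed parser can then fetch such a URL and return its entries, which is roughly the part the library handles for you.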

Mount Drive:

In [None]:
from google.colab import drive
drive.mount('/content/gdrive', force_remount=True)

Mounted at /content/gdrive


In [None]:
import sys
sys.path.append('/content/gdrive/My Drive/Colab Notebooks')

### Example 1:

Pull Top Headlines from Google News

In [None]:
from pygooglenews import GoogleNews

gn = GoogleNews()
top = gn.top_news()  # fetch and parse the top-stories feed

# Number each headline and print it with its publication date
for count, entry in enumerate(top["entries"], start=1):
    print(f"{count}. {entry['title']} | {entry['published']}")

1. Live updates: Russia invades Ukraine - CNN | Sun, 27 Feb 2022 22:29:00 GMT
2. Sanctions on Russia's Central Bank Deal Direct Blow to Country's Financial Strength - The Wall Street Journal | Sun, 27 Feb 2022 20:14:00 GMT
3. EU shuts airspace to Russian airlines, will buy Ukraine arms - Associated Press | Sun, 27 Feb 2022 20:00:46 GMT
4. Retired general explains why Russian vehicles are running into trouble - CNN | Sun, 27 Feb 2022 19:10:54 GMT
5. In 'unprecedented' action, Putin orders nuclear deterrent forces to be on 'high alert' amid spiraling tensions over Ukraine - NBC News | Sun, 27 Feb 2022 22:16:26 GMT
6. At CPAC, Trump delivers a reminder of his muscle - POLITICO | Sun, 27 Feb 2022 12:00:00 GMT
7. Fence reinstalled around Capitol building ahead of Biden's State of the Union - NBC News | Sun, 27 Feb 2022 18:30:30 GMT
8. Clyburn: supreme court nomination of Ketanji Brown Jackson ‘beyond politics’ - The Guardian | Sun, 27 Feb 2022 18:48:00 GMT
9. He just won a $10 million lottery for the second time -

The code above shows how you can extract certain data points from the top news articles in the Google News RSS feed. You can replace “gn.top_news()” with “gn.topic_headlines('business')” to get the top business headlines, or with “gn.geo_headlines('San Fran')” to get the top news for the San Francisco region.

You can also use complex queries such as “gn.search('boeing OR airbus')” to find news articles mentioning Boeing or Airbus or “gn.search('boeing -airbus')” to find all news articles that mention Boeing but not Airbus.
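
If you build such queries from lists of terms, a small helper keeps the operators consistent. The sketch below is only an illustration: `build_query` is a hypothetical function, not part of the library, and it covers only the `OR` and `-` operators mentioned above.

```python
def build_query(terms, exclude=(), any_term=False):
    """Compose a search string from terms to match and terms to exclude."""
    # Join with OR to match any term, or with spaces to match all of them
    joiner = " OR " if any_term else " "
    query = joiner.join(terms)
    # Prefix each excluded term with a minus sign
    excluded = " ".join(f"-{term}" for term in exclude)
    return f"{query} {excluded}".strip()


print(build_query(["boeing", "airbus"], any_term=True))  # boeing OR airbus
print(build_query(["boeing"], exclude=["airbus"]))       # boeing -airbus
```

The resulting string can be passed straight to “gn.search(...)”.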

For every news entry you capture with this library, you get the following data points, which you can use for data processing, training a machine learning model, or running NLP scripts:

* Title - the headline of the article
* Link - the original link for the article
* Published - the date on which it was published
* Summary - the article summary
* Source - the website on which it was published
* Sub-Articles - list of titles, publishers, and links that are on the same topic

We extracted just a few of the available data points, but you can extract the others as well, based on your requirements. Here’s a small example of the results produced by complex queries.

The code below searches for articles that mention Russia but not Putin and prints the title of each result:

In [None]:
from pygooglenews import GoogleNews

gn = GoogleNews()
s = gn.search('russia -putin')

for entry in s["entries"]:
    print(entry["title"])

UPS and FedEx halting shipments to Russia and Ukraine - Reuters
FIFA impose measures on Russia in response to its invasion of Ukraine - The Athletic
Factbox-Companies With Exposure to Russia - U.S. News & World Report
Ukrainian minister says Russia lost some 4300 men in invasion - Reuters
The 'unprecedented' sanctions on Russia could make war unsustainable, expert says - NPR
Analyzing the state of Russia's military - NPR
Russia continues to advance on Kyiv in attempt to topple Ukrainian government - NPR
3 ways Russia's invasion of Ukraine will impact the American economy - NBC News
U.S. banks prepare for cyber attacks after latest Russia sanctions - Reuters
SWIFT ban prevents Russia from moving money easily. It also has unintended effects - NPR
Boxing's governing organizations won't sanction any title bouts in Russia due to invasion of Ukraine - ESPN
France urges its citizens making short-term visits to Russia to leave - Reuters
Biden sanctions spare Russia's energy sector. What that m
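
The examples so far print only titles. As a rough sketch of how the other data points listed earlier could be collected into a flat record, the code below converts one entry into a plain dictionary and parses the published date. The sample entry is made up for illustration, its field layout (in particular `source` as a dictionary with a `title` key) is an assumption about what the feed parser returns, and `entry_to_record` is a hypothetical helper.

```python
from email.utils import parsedate_to_datetime


def entry_to_record(entry):
    """Flatten one news entry into a dictionary of simple values."""
    source = entry.get("source") or {}
    # Assumption: the source is a mapping with a "title" key; fall back to str
    source_name = source.get("title", "") if isinstance(source, dict) else str(source)
    published = entry.get("published")
    return {
        "title": entry.get("title", ""),
        "link": entry.get("link", ""),
        # Dates such as "Sun, 27 Feb 2022 22:29:00 GMT" parse to a datetime
        "published": parsedate_to_datetime(published) if published else None,
        "summary": entry.get("summary", ""),
        "source": source_name,
        "sub_article_count": len(entry.get("sub_articles") or []),
    }


# Made-up entry shaped like the fields described above
sample_entry = {
    "title": "Example headline - Example Source",
    "link": "https://example.com/article",
    "published": "Sun, 27 Feb 2022 22:29:00 GMT",
    "summary": "Example summary",
    "source": {"href": "https://example.com", "title": "Example Source"},
    "sub_articles": [],
}

record = entry_to_record(sample_entry)
print(record["source"], record["published"].isoformat())
```

With real results you would call the helper on each item of `s["entries"]`. Parsing the date up front makes it easy to sort or filter entries by time later.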

### Example 2:

Search Google News and Save to CSV File

In [None]:
import csv
from pygooglenews import GoogleNews

gn = GoogleNews(lang='en', country='UK')

# Search for articles with "russia" in the title, dated between 2022-01-01 and 2022-12-31
russiasearch = gn.search('intitle:russia', helper=True, from_='2022-01-01', to_='2022-12-31')

print(russiasearch['feed'].title)

for item in russiasearch['entries']:
    print(item['title'])

# Write one title per row; the with block closes the file automatically
with open('/content/gdrive/My Drive/russia_search.csv', 'w', newline='', encoding='utf-8') as f:
    writer = csv.writer(f)
    writer.writerow(['russiasearch'])  # header row
    for item in russiasearch['entries']:
        writer.writerow([item['title']])

"intitle:russia after:2022-01-01 before:2022-12-31" - Google News
Russia-Ukraine live updates: Kyiv to hold talks with Moscow - Al Jazeera English
Russian ex-official: Putin's plan is full victory by March 2 - Al Jazeera English
Russia homes in on Kyiv and Kharkiv and pushes across Black Sea coast - Financial Times
Ukraine, Russia agree to talks ‘without preconditions’: Zelenskyy - Al Jazeera English
Putin signals escalation as he puts Russia’s nuclear force on high alert - The Guardian
As Russia invades Ukraine, Iraqis remember painful war memories - Al Jazeera English
More than 2,000 arrested at anti-war protests in Russia - Al Jazeera English
Ukraine appeals for foreign volunteers to join fight against Russia - The Guardian
Two top Russian billionaires speak out against invasion of Ukraine - The Guardian
Russia’s invasion of Ukraine: List of key developments from Day 4 - Al Jazeera English
‘A global financial pariah’: how central bank sanctions could hobble Russia - Financial Times
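
Example 2 saves only the titles. If you want several data points per row, `csv.DictWriter` is one option. The sketch below writes to an in-memory buffer so that it runs without Drive access; the sample entries are made up, and `write_entries` is a hypothetical helper rather than part of the library.

```python
import csv
import io


def write_entries(entries, handle, fields=("title", "link", "published")):
    """Write the chosen fields of each entry as one CSV row."""
    writer = csv.DictWriter(handle, fieldnames=list(fields))
    writer.writeheader()
    for entry in entries:
        # Missing fields become empty cells instead of raising KeyError
        writer.writerow({name: entry.get(name, "") for name in fields})


# Made-up entries standing in for russiasearch['entries']
sample_entries = [
    {"title": "First headline - Source A", "link": "https://example.com/a",
     "published": "Sun, 27 Feb 2022 20:00:46 GMT"},
    {"title": "Second headline, with a comma - Source B",
     "link": "https://example.com/b"},
]

buffer = io.StringIO()
write_entries(sample_entries, buffer)
print(buffer.getvalue())
```

To save to Drive instead, pass a file opened with `open(path, 'w', newline='', encoding='utf-8')` in place of the buffer.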
