# Description of the `FinanceNewsScraper` Class

The `FinanceNewsScraper` class is designed to scrape financial news articles from the business section of Google News based on a set of specified buzzwords and a given date range.

- **Initialization (`__init__`)**: 
  - The scraper accepts two sets of buzzwords:
    - **Must-have buzzwords**: Keywords that must appear in the article title or description.
    - **Percentage-based buzzwords**: Keywords that need to match a certain percentage within the article.
  - It also takes a `start_date`, `end_date`, and an interval for scraping in chunks (e.g., weekly).

- **URL Construction (`construct_url`)**: 
  - This function builds a Google News RSS URL specifically for the business section, incorporating the provided buzzwords and date range.

- **Fetching Data (`fetch_rss_feed`)**: 
  - This function retrieves the RSS feed using the constructed URL, retrying up to three times if errors are encountered.
  - **Robust Retry Mechanism**:
      - To ensure stable scraping even when there are network issues, the class includes a retry mechanism. It retries the process multiple times if it fails to retrieve the Yahoo Finance page, adding reliability to the data extraction process.


- **Keyword Matching**:
  - **Must-have buzzwords**: Ensures that at least one must-have buzzword appears in the article's title or description.
  - **Percentage-based buzzwords**: Verifies that a minimum percentage of the provided buzzwords are present in the article.

- **Article Parsing (`parse_articles`)**: 
  - This function parses the RSS feed and extracts relevant information such as the article title, URL, and publication date, but only for articles that match the buzzword criteria.

- **Scraping (`scrape`)**: 
  - This method iterates through the specified date range, fetching and parsing articles in chunks as defined by the provided interval.

- **Saving to CSV (`save_to_csv`)**: 
  - After scraping, the articles are saved to a CSV file using the `pandas` library for easy storage and further analysis.

This class simplifies the process of scraping Google News for business-related articles based on keywords, while also offering functionality to save the results as a CSV file for later analysis.


In [1]:
import requests
import random
from bs4 import BeautifulSoup
from datetime import datetime, timedelta
import math
import time
import csv
import pandas as pd

In [17]:
class FinanceNewsScraper:
    def __init__(self, primary_buzzwords, secondary_buzzwords, start_date, end_date, required_percentage, interval):
        """
        Initialize the scraper with two sets of buzzwords, start date, end date, and required percentage.
        :param primary_buzzwords: List of buzzwords that must be present.
        :param secondary_buzzwords: List of buzzwords to search for with percentage matching.
        :param start_date: The start date (YYYY-MM-DD) for the articles.
        :param end_date: The end date (YYYY-MM-DD) for the articles.
        :param required_percentage: The percentage of percentage-based buzzwords that should be present (default 60%).
        """
        self.primary_buzzwords = primary_buzzwords
        self.secondary_buzzwords = secondary_buzzwords
        self.start_date = datetime.strptime(start_date, '%Y-%m-%d')
        self.end_date = datetime.strptime(end_date, '%Y-%m-%d')
        self.required_percentage = required_percentage / 100  # Convert percentage to decimal for calculations
        self.base_url = "https://news.google.com/rss"
        self.interval = interval
        self.max_retries = 3  # Number of retries in case of failure

    def construct_url(self, start_date, end_date):
        """
        Construct the Google News RSS URL with all buzzwords and date range.
        :return: The constructed URL.
        """
        combined_buzzwords = self.primary_buzzwords + self.secondary_buzzwords
        query = " AND ".join(combined_buzzwords)  # Combine all buzzwords with 'AND' to ensure all words are present
        formatted_query = query.replace(" ", "%20")  # Format query for URL
        
        url = f"{self.base_url}?q={formatted_query}+after:{start_date}+before:{end_date}&hl=en-US&gl=US&ceid=US:en"
        
        return url

    def fetch_rss_feed(self, start_date, end_date, max_retries=5, backoff_factor=2):
        """
        Fetch the RSS feed from Google News for a given date range, with retries and exponential backoff to avoid 503 errors.
        :param start_date: The start date for fetching articles.
        :param end_date: The end date for fetching articles.
        :param max_retries: Maximum number of retries if the request fails.
        :param backoff_factor: Factor by which the wait time increases after each failure.
        :return: BeautifulSoup object with the RSS feed content.
        """
        rss_url = self.construct_url(start_date, end_date)
        attempt = 0
        delay = 5  # Start with an initial delay of 5 seconds

        while attempt < max_retries:
            try:
                response = requests.get(rss_url, headers={'User-Agent': 'Mozilla/5.0'})
                
                if response.status_code == 200:
                    return BeautifulSoup(response.content, 'xml')  # Parsing as XML
                else:
                    print(f"Failed to retrieve RSS feed with status code {response.status_code}. Retrying...")

            except requests.RequestException as e:
                print(f"Error fetching the RSS feed: {e}. Retrying...")

            # Apply the exponential backoff
            attempt += 1
            time.sleep(delay)
            delay *= backoff_factor  # Increase the delay exponentially

        print("Max retries exceeded. Could not fetch the RSS feed.")
        return None


    def contains_any_primary_buzzwords(self, text):
        """
        Check if any must-have buzzwords are present in the given text.
        :param text: The text to search for must-have buzzwords (case-insensitive).
        :return: True if at least one must-have buzzword is found, False otherwise.
        """
        text = text.lower()
        return any(buzzword.lower() in text for buzzword in self.primary_buzzwords)

    def contains_percentage_of_buzzwords(self, text):
        """
        Check if at least the required percentage of percentage-based buzzwords are present in the given text.
        :param text: The text to search for percentage-based buzzwords (case-insensitive).
        :return: True if the required percentage of percentage-based buzzwords are found, False otherwise.
        """
        text = text.lower()
        buzzwords_found = sum(1 for buzzword in self.secondary_buzzwords if buzzword.lower() in text)
        required_count = math.ceil(len(self.secondary_buzzwords) * self.required_percentage)
        
        # The condition now checks if at least the required count of buzzwords is found
        return buzzwords_found >= required_count

    def parse_articles(self, soup):
        """
        Parse the RSS feed and extract article information.
        Only return articles where all must-have buzzwords and a percentage of percentage-based buzzwords are found.
        :param soup: BeautifulSoup object of the RSS feed.
        :return: List of dictionaries with article titles, URLs, and publication dates.
        """
        articles = []
        for item in soup.find_all('item'):
            title = item.title.text
            link = item.link.text
            description = item.description.text if item.description else ""
            pub_date = item.pubDate.text
            pub_date = datetime.strptime(pub_date, '%a, %d %b %Y %H:%M:%S %Z')  # Format the date
            
            # Check if any must-have buzzwords are present in title or description
            first_100_words = " ".join(description.split()[:100])
            if self.contains_any_primary_buzzwords(title) or self.contains_any_primary_buzzwords(first_100_words):
                # Check if the required percentage of percentage-based buzzwords are present
                if self.contains_percentage_of_buzzwords(title) or self.contains_percentage_of_buzzwords(first_100_words):
                    articles.append({'title': title, 'url': link, 'date': pub_date})
        return articles

    def scrape(self):
        """
        Scrape the RSS feed and extract articles that match both must-have and percentage-based buzzwords.
        :return: List of articles (titles, URLs, and dates).
        """
        all_articles = []
        delta = timedelta(days=self.interval)  # Fetch in intervals (e.g., weekly)
        current_start_date = self.start_date
        print(f"Fetching articles from {current_start_date.strftime('%Y-%m-%d')} to {self.end_date.strftime('%Y-%m-%d')}")

        # Loop through the date range with the specified interval
        while current_start_date < self.end_date:
            current_end_date = min(current_start_date + delta, self.end_date)

            soup = self.fetch_rss_feed(current_start_date.strftime('%Y-%m-%d'),
                                       current_end_date.strftime('%Y-%m-%d'))
            if soup:
                articles = self.parse_articles(soup)
                all_articles.extend(articles)

            current_start_date += delta  # Move to the next interval

        if all_articles:
            print(f"Found {len(all_articles)} articles matching the criteria.")
        else:
            print("No articles found matching the criteria.")
        return all_articles

    def save_to_csv(articles, filename):
        """
        Save the scraped articles to a CSV file using pandas.
        :param articles: List of articles with title, URL, and date.
        :param filename: The name of the CSV file (default is "articles.csv").
        """
        # Convert the list of articles to a pandas DataFrame and drop duplicates in case there are any
        df = pd.DataFrame(articles).drop_duplicates()
        
        # Save DataFrame to CSV
        df.to_csv(filename, index=False, encoding='utf-8') 

In [18]:
primary_buzzwords = ["exxon", "xom", "chevron", "cvx", "viper", "vnom", "murphy", "mur","gas",
                     "petrol","fuel","price","oil","shell","carbon","hydrogen","nuclear","coal","fossil",
                     "energy","smart grid","power"]
secondary_buzzwords =  ["israel", "gaza", "palestine", "conflict", "war", "hamas", 
                       "ukraine", "russia","airstrike","attack", "crisis","oil","prices","nato","invasion"
                       "iran","afghanistan","china","taiwan","military","air strike","balakot","india-pakistan"
                       "indo-pacific","south china sea","market","nuclear","escalate","zelensky","putin"]   # List of buzzwords to search for
start_date = '2022-10-27'
end_date = '2024-10-27'  # End date
required_percentage = 10  # required_percentage% of the secondary buzzwords should be in title or description
interval = 1

scraper_news = FinanceNewsScraper(primary_buzzwords, secondary_buzzwords, start_date, end_date, required_percentage, interval)
articles_news = scraper_news.scrape()

# Save the output as CSV files 
FinanceNewsScraper.save_to_csv(articles_news, "google_news_geopolitical_energy.csv")

Fetching articles from 2022-10-27 to 2024-10-27
Failed to retrieve RSS feed with status code 503. Retrying...
Failed to retrieve RSS feed with status code 503. Retrying...
Failed to retrieve RSS feed with status code 503. Retrying...
Failed to retrieve RSS feed with status code 503. Retrying...
Failed to retrieve RSS feed with status code 503. Retrying...
Max retries exceeded. Could not fetch the RSS feed.


KeyboardInterrupt: 

### `FinanceNewsAPIScraper` Class Description

The `FinanceNewsAPIScraper` class is designed to fetch, filter, and save news articles from NewsAPI based on specified buzzwords. Key functionalities:

- **Initialization**: Takes in an API key, primary and secondary buzzwords, date range, and retry settings.
- **Fetch News**: Sends API requests and retries if rate limits are hit.
- **Filter**: Filters articles to ensure primary buzzwords are present, with a required percentage of secondary buzzwords.
- **Display & Save**: Displays the filtered articles and provides an option to save them to a CSV file.

### Key Methods:
- `fetch_news()`
- `filter_articles()`
- `display_articles()`


In [22]:
class FinanceNewsAPIScraper:
    def __init__(self, api_key, primary_buzzwords, secondary_buzzwords, start_date, end_date, required_percentage, retry_after):
        self.api_key = api_key
        self.primary_buzzwords = primary_buzzwords
        self.secondary_buzzwords = secondary_buzzwords
        self.start_date = start_date
        self.end_date = end_date
        self.required_percentage = required_percentage / 100
        self.base_url ='https://newsapi.org/v2/everything'
        self.retry_after = retry_after

    def contains_any_primary_buzzwords(self, text):
        text = text.lower()
        return any(buzzword.lower() in text for buzzword in self.primary_buzzwords)

    def contains_required_percentage_of_secondary_buzzwords(self, text):
        text = text.lower()
        buzzwords_found = sum(1 for buzzword in self.secondary_buzzwords if buzzword.lower() in text)
        required_count = math.ceil(len(self.secondary_buzzwords) * self.required_percentage)
        return buzzwords_found >= required_count

    def fetch_news(self, retries=3):
        params = {
                'q': ' OR '.join(self.primary_buzzwords + self.secondary_buzzwords),
                'apiKey': self.api_key,
                'from': self.start_date,
                'to': self.end_date,
                'language': 'en',
                'sortBy': 'relevancy'
            }
        attempt = 0
        while attempt < retries:
            response = requests.get(self.base_url, params=params)
            if response.status_code == 200:
                return response.json()
            elif response.status_code == 429:
                print(f"Rate limit exceeded. Retrying after {self.retry_after} seconds...")
                time.sleep(self.retry_after)
            else:
                print(f"Failed to fetch news articles. Status code: {response.status_code}")
                return None
            attempt += 1
        print("Max retries exceeded. Could not fetch the news.")
        return None

    def filter_articles(self, news_data):
        filtered_articles = []
        if news_data and 'articles' in news_data:
            for article in news_data['articles']:
                title = article['title']
                description = article['description'] or ""
                content = title + " " + description
                if self.contains_any_primary_buzzwords(content) and self.contains_required_percentage_of_secondary_buzzwords(content):
                    filtered_articles.append(article)
        return filtered_articles

    def display_articles(self, articles):
        if articles:
            for i, article in enumerate(articles, start=1):
                print(f"{i}. {article['title']} ({article['publishedAt']})")
        else:
            print("No articles found.")

    def scrape_and_filter_news(self):
        news_data = self.fetch_news()
        if news_data:
            filtered_articles = self.filter_articles(news_data)
            self.display_articles(filtered_articles)
            return filtered_articles  # Ensure filtered_articles is returned
        else:
            print("Failed to retrieve or filter articles.")
            return []

    def save_to_csv(self, articles, filename):
        if articles:
            data = [{
                'title': article['title'],
                'publishedAt': article['publishedAt'],
                'url': article['url']
            } for article in articles]
            df = pd.DataFrame(data)
            df.to_csv(filename, index=False, encoding='utf-8')
            print(f"Articles saved to {filename}")
        else:
            print("No articles to save.")


In [21]:
api_keys = ['51f3c8bce6b1473e9537d03fe37815e3','5d7f3433c9404b6aaba5c5db771f2c79','25d106c70b3c4ff3af1fb174e0afc2ed','5fd42a249293445fb159cdba3905b6ac','f49d4986d0cd4ec5b9a92905e9f6dc65']

**News Focus:** `Geopoliticial conflicts`


In [131]:
secondary_buzzwords = ["israel", "gaza", "palestine", "conflict", "war", "hamas", 
                       "ukraine", "russia","airstrike","attack", "crisis","oil","nato"
                       "iran","afghanistan","china","taiwan","military"
                        ,"south china sea","market","nuclear","escalate","zelensky","putin"]  # List of buzzwords to search for
required_percentage = 100/len(secondary_buzzwords)
retry_after = 60
start_date = '2024-09-27'
end_date = '2024-10-27'

**News Focus:** `Climate events`


In [128]:
secondary_buzzwords = ["extreme weather", "climate crisis", "rising sea levels", "coastal erosion", "heatwaves",
    "atmospheric rivers", "carbon emissions","greenhouse gases", "renewable energy", "deforestation", "carbon footprint",
    "sustainable development", "biodiversity loss","hurricane","tsunami","floods"]
 # List of buzzwords to search for
required_percentage = 3
retry_after = 60
start_date = '2024-09-27'
end_date = '2024-10-27'

**News Focus:** `Elections`

In [105]:
secondary_buzzwords = ["trump","kamala","harris","biden","modi","narendra","xi","putin", "election", 
                       "2024","campaign","vote","voter"]
 # List of buzzwords to search for
required_percentage = 100/len(secondary_buzzwords)
retry_after = 60
start_date = '2024-09-27'
end_date = '2024-10-27'

**News Focus:** `Pandemics`

In [96]:
secondary_buzzwords = ["pandemic","epidemic","covid","corona","covid","bird-flu", "public health crisis", "outbreak", "quarantine",
                        "lockdown","infection", "epicenter","virus","bacteria","vaccine","epicenter"]
 # List of buzzwords to search for
required_percentage = 100/len(secondary_buzzwords)
retry_after = 60
start_date = '2024-09-27'
end_date = '2024-10-27'

**News Focus:** `Government Policies`

In [94]:
secondary_buzzwords = ["tariffs", "sanctions", "subsidies", "regulations", "reforms","taxation", "immigration","trade",
                         "defense", "cybersecurity","privacy", "surveillance", "government", "monetary",
                        "labor", "manufacturing", "exports", "imports", "automation","health", "care","housing"] # List of buzzwords to search for
               
required_percentage = 100/len(secondary_buzzwords)
retry_after = 60
start_date = '2024-09-27'
end_date = '2024-10-27'

**Industry Focus:** `Energy`

**Timeline:** `One month`

In the US the two biggest stocks in these industries are `ExxonMobil` (XOM) and `Chevron` (CVX)

*Stock Focus*: `XOM`, `CVX`, `WFRD`, `VNOM`.

In [132]:
primary_buzzwords = ["exxon", "xom", "chevron", "cvx", "viper", "vnom", "murphy", "mur","gas",
                     "petrol","fuel","price","oil","shell","carbon","hydrogen","nuclear","coal","fossil",
                     "energy","smart grid","power"]
scraper = FinanceNewsAPIScraper(api_keys[1], primary_buzzwords, secondary_buzzwords, start_date, end_date, required_percentage, retry_after)
filtered_articles = scraper.scrape_and_filter_news()
scraper.save_to_csv(filtered_articles, "news_api_energy_one_month_geopolitical.csv")

1. The World Is Still Hooked on Russian Energy—at Its Own Peril (2024-10-20T11:30:00Z)
2. Jamie Dimon says the global order is at risk — and raging conflicts could explode into World War 3 (2024-10-25T15:49:49Z)
3. The American Who Waged a Tech War on China (2024-10-10T10:00:00Z)
4. 2 of the world's biggest oil producers are looking for new ways to power their economies (2024-10-22T02:43:33Z)
5. Why the Oil Market Is Not Shocked (2024-10-20T12:00:00Z)
6. China's economic issues are so serious that even the oil market doesn't care about Middle East tensions — for now (2024-10-01T06:38:56Z)
7. World’s Top Uranium Miner Backs Nuclear Plant in Referendum (2024-10-07T10:04:10Z)
8. Exclusive-Qatar LNG sales to key Asian markets confronted by US, UAE rivalry (2024-10-21T10:40:22Z)
9. China nuclear sub sank in its dock, US officials say (2024-09-27T15:35:26Z)
10. Stock market today: Tech rally leads stocks higher as oil prices plunge and earnings kick off (2024-10-08T20:12:25Z)
11. Hard Nuclea

**Industry Focus:** `Materials sector`

**Timeline:** `One month`

In the US the two biggest stocks in this industry are `Sherwin-Williams` (SHW) and `DuPont` (DD)

*Stock Focus*: `SHW`, `DD`, `GOLD`, `RIO`.

In [133]:
api_key = '5d7f3433c9404b6aaba5c5db771f2c79'  
primary_buzzwords = ["sherwin", "williams", "shw", "dupont", "barrick", "gold", "rio","tinto","steel",
                     "copper","aluminum","chemicals","minning","recycling","bio-based","materials","carbon",
                     "building","cement"]
scraper = FinanceNewsAPIScraper(api_keys[1], primary_buzzwords, secondary_buzzwords, start_date, end_date, required_percentage, retry_after)
filtered_articles = scraper.scrape_and_filter_news()
scraper.save_to_csv(filtered_articles, "news_api_materials_one_month_geopolitical.csv")

1. Vietnam condemns China for assault on its fishermen in the disputed South China Sea (2024-10-03T10:45:15Z)
2. Vietnam Accuses China of ‘Brutal’ Attack on Fishing Boat in South China Sea (2024-10-04T03:45:00Z)
3. Apple CEO Tim Cook and COO Jeff Williams Visiting China (2024-10-22T14:59:09Z)
4. China's economic issues are so serious that even the oil market doesn't care about Middle East tensions — for now (2024-10-01T06:38:56Z)
5. 11 Things You Should Never Put in the Dishwasher (2024-10-25T12:10:03Z)
6. US bans new types of goods from China over allegations of forced labor (2024-10-02T18:46:00Z)
Articles saved to news_api_materials_one_month_geopolitical.csv


**Industry Focus:** `Industrials sector`

**Timeline:** `One month`

In the US the two biggest stocks in this industry are `UPS` (UPS) and `Raytheon` (RTX)

*Stock Focus*: `UPS`, `RTX`, `DAL`, `UAL`,`LMT`,`BA`.

In [134]:
primary_buzzwords = ["ups", "raytheon", "rtx", "delta","airlines","dal","united","ual","lmt","ba","boeing","machinery"
                     ,"engineering","automation","defense","aerospace"]
scraper = FinanceNewsAPIScraper(api_keys[1], primary_buzzwords, secondary_buzzwords, start_date, end_date, required_percentage, retry_after)
filtered_articles = scraper.scrape_and_filter_news()
scraper.save_to_csv(filtered_articles, "news_api_industrials_one_month_geopolitical.csv")

1. The American Who Waged a Tech War on China (2024-10-10T10:00:00Z)
2. Jamie Dimon says the global order is at risk — and raging conflicts could explode into World War 3 (2024-10-25T15:49:49Z)
3. Putin hosts a summit in a bid to show the West it can't keep Russia off the global stage (2024-10-21T05:55:04Z)
4. U.S. to send air defense system and troops to Israel (2024-10-13T17:36:05Z)
5. Efforts by the US to counter China in the South China Sea could backfire (2024-10-12T07:00:01Z)
6. China, at UN, warns against 'expansion of the battlefield' in the Ukraine war (2024-09-28T16:21:20Z)
7. What's actually happening in the South China Sea? (2024-10-16T15:35:36Z)
8. China is menacing Taiwan with a drone swarm attack — and the US is playing catch-up (2024-10-08T13:42:24Z)
9. The West is trying to starve the wrong part of Russia's war machine, defense experts say (2024-10-20T11:23:01Z)
10. What is the U.N. peacekeeping force stationed in Lebanon? (2024-10-11T22:22:00Z)
11. DJI sues the US Dep

**Industry Focus:** `Utilities sector`

**Timeline:** `One month`

In the US the two biggest stocks in this industry are `Duke energy` (DUK) and `Consolidated Edison` (ED)

*Stock Focus*: `DUK`, `ED`.

In [135]:
primary_buzzwords = ["duke", "energy", "duk", "consolidated", "edison", "ed", "electricity",
                     "water supply","solar","wind","hydropower","energy prices","climate change","public utilities"]
scraper = FinanceNewsAPIScraper(api_keys[1], primary_buzzwords, secondary_buzzwords, start_date, end_date, required_percentage, retry_after)
filtered_articles = scraper.scrape_and_filter_news()
scraper.save_to_csv(filtered_articles, "news_api_utilities_one_month_geopolitical.csv")

1. India and China agree to de-escalate border tensions (2024-10-21T11:06:40Z)
2. The World Is Still Hooked on Russian Energy—at Its Own Peril (2024-10-20T11:30:00Z)
3. The American Who Waged a Tech War on China (2024-10-10T10:00:00Z)
4. 2 of the world's biggest oil producers are looking for new ways to power their economies (2024-10-22T02:43:33Z)
5. Wider Middle East conflict threatens the global economy — when the US and China already face headwinds, experts say (2024-10-04T08:58:26Z)
6. China, at UN, warns against 'expansion of the battlefield' in the Ukraine war (2024-09-28T16:21:20Z)
7. Why the Oil Market Is Not Shocked (2024-10-20T12:00:00Z)
8. A defiant Putin closes global summit aimed at reshaping global order (2024-10-25T03:50:30Z)
9. TikTok owner ByteDance unveils its first earbuds in China (2024-10-10T14:35:07Z)
10. Exclusive-Qatar LNG sales to key Asian markets confronted by US, UAE rivalry (2024-10-21T10:40:22Z)
11. China wants Taiwan to make mistakes and is looking for ex

**Industry Focus:** `Healthcare sector`

**Timeline:** `One month`

In the US the two biggest stocks in this industry are `UnitedHealth Group` (UNH) and `Johnson & Johnson` (JNJ)

*Stock Focus*: `UNH`, `JNJ`.

In [136]:
primary_buzzwords = ["united health group", "health", "unh", "johnson & johnson","jnj","pharmaceuticals"
                     ,"biotechnology","drug", "clinical","fda"]
scraper = FinanceNewsAPIScraper(api_keys[0], primary_buzzwords, secondary_buzzwords, start_date, end_date, required_percentage, retry_after)
filtered_articles = scraper.scrape_and_filter_news()
scraper.save_to_csv(filtered_articles, "news_api_health_one_month_geopolitical.csv")

1. I’m a doctor — these are the 3 worst chemicals in your food that could damage your health (2024-10-25T14:42:45Z)
2. Ransomware gang Trinity joins pile of scumbags targeting healthcare (2024-10-09T13:45:08Z)
3. UMC Health System diverted patients following a ransomware attack (2024-10-01T18:00:18Z)
4. A cyberattack on Kuwait Health Ministry impacted hospitals in the country (2024-09-28T06:11:16Z)
5. Top Foreign-Policy Takeaways From the Vice Presidential Debate (2024-10-02T04:38:57Z)
6. How Big Oil is Astroturfing opposition to Wind and Solar, and Helping Destroy the Earth (2024-10-09T04:06:52Z)
Articles saved to news_api_health_one_month_geopolitical.csv


**Industry Focus:** `Financials sector`

**Timeline:** `One month`

In the US the two biggest stocks in this industry are `Berkshire Hathaway` (BRK.A and BRK.B) and `JPMorgan Chase` (JPM)

*Stock Focus*: `BRK.A`, `BRK.BH`,`JPM`,`BAC`,`USB`, `pypl`.

In [137]:
primary_buzzwords = ["berkshire","hathaway","jp morgan","j.p. morgan","chase","bank","of america",
                     "brk.a","brk.b","bac","usb","paypal","pypl","stock","interest rates","inflation",
                     "bonds","capital","investment"]
scraper = FinanceNewsAPIScraper(api_keys[1], primary_buzzwords, secondary_buzzwords, start_date, end_date, required_percentage, retry_after)
filtered_articles = scraper.scrape_and_filter_news()
scraper.save_to_csv(filtered_articles, "news_api_finance_one_month_geopolitical.csv")

1. Stock market today: Tech rally leads stocks higher as oil prices plunge and earnings kick off (2024-10-08T20:12:25Z)
2. Efforts by the US to counter China in the South China Sea could backfire (2024-10-12T07:00:01Z)
3. BRICS' new declaration shows Russia's war still can't get the international backing that Moscow wants (2024-10-24T07:36:46Z)
4. Investing in China's stock market is like 'picking up dimes in front of bulldozers' given the nation's long-running stagnation, 'Big Short' investor Kyle Bass says (2024-10-01T18:30:44Z)
5. Putin's dollar problem is on clear display at the BRICS summit, starting with the moment guests land at the airport (2024-10-23T09:08:09Z)
6. China could raise nearly $1 trillion of fresh debt in the next 3 years to revive its economy (2024-10-16T16:52:44Z)
7. The US Navy is burning through missiles in the Middle East that it would need in a war with China (2024-10-11T08:00:01Z)
8. Why Chinese stocks will climb another 50% from current levels, research CEO

**Industry Focus:** `Consumers sector`

**Timeline:** `One month`

In the US the biggest stocks in this industry are `Coca-Cola` (KO), `Procter & Gamble` (PG), `Amazon` (AMZN), `Tesla` (TSLA) and McDonald's` (MCD)

*Stock Focus*: `KO`, `PG`, `AMZN`, `MCD`, `TSLA`,`TM`

In [138]:
primary_buzzwords = ["coca-cola","ko","procter & gamble","pg","amazon","macdonald's","amzn"
                     ,"mcd","retail","supply chain","household"
                     ,"automobile","tesla","tsla","toyota","tm","vehicles","car"
                     ,"ev","electric"]
scraper = FinanceNewsAPIScraper(api_keys[1], primary_buzzwords, secondary_buzzwords, start_date, end_date, required_percentage, retry_after)
filtered_articles = scraper.scrape_and_filter_news()
scraper.save_to_csv(filtered_articles, "news_api_consumers_one_month_geopolitical.csv")

1. India and China agree to de-escalate border tensions (2024-10-21T11:06:40Z)
2. 3 of China's Tesla rivals had a record month, putting pressure on Elon Musk (2024-10-01T10:04:03Z)
3. The Meteoric Rise of Temu and Pinduoduo—and What Might Finally Slow Them Down (2024-10-04T06:00:00Z)
4. Paris Motor Show opens during a brewing EV trade war between the EU and China (2024-10-14T17:14:36Z)
5. Dow Jones Futures Rise; Nvidia Climbs In Buy Area As China Stocks Sell Off (2024-10-08T13:12:07Z)
6. How America Must Stand Up to Putin’s ‘Axis of Evil’ (2024-10-24T17:00:04Z)
7. China's economic issues are so serious that even the oil market doesn't care about Middle East tensions — for now (2024-10-01T06:38:56Z)
8. BlackRock's CEO says China is the biggest supporter of Russia's economy amidst the Ukraine war (2024-10-02T06:10:04Z)
9. A defiant Putin closes global summit aimed at reshaping global order (2024-10-25T03:50:30Z)
10. This Chinese EV maker is up for a battle with Elon Musk in China (2024-1

**Industry Focus:** `IT sector`

**Timeline:** `One month`

In the US the biggest stocks in this industry are `Apple` (AAPL), and `Microsoft` (MSFT)
*Stock Focus*: `AAPL`, `MSFT`, `QCOM`, `NVDA`, `CRWD`

In [139]:
primary_buzzwords = ["crowd strike holdings","crwd","apple","microsoft","chips",
                     "qualcomm", "nvidia","information","technology","cybersecurity"
                     ,"semiconductors"," ai ","artificial","iphone","macbook","tim cook","bill gates"]
scraper = FinanceNewsAPIScraper(api_keys[4], primary_buzzwords, secondary_buzzwords, start_date, end_date, required_percentage, retry_after)
filtered_articles = scraper.scrape_and_filter_news()
scraper.save_to_csv(filtered_articles, "news_api_it_one_month_geopolitical.csv")

1. Tim Walz Called To Eliminate Spending on National Missile Defense (2024-10-14T09:00:12Z)
Articles saved to news_api_it_one_month_geopolitical.csv


**Industry Focus:** `Communication services sector`

**Timeline:** `One month`

In the US the biggest stocks in this industry are `Facebook` (FB), and `Alphabet` (GOOG)
*Stock Focus*: `FB`, `GOOG`, `WBD`, `NFLX`

In [140]:
primary_buzzwords = ["facebook","meta","fb","alphabet","goog","warner bros","wbd","netflix","nflx"
                     ,"broadband","5g","media","ott","television","streaming","platforms","film"
                     ,"movie","industry"]
scraper = FinanceNewsAPIScraper(api_keys[1], primary_buzzwords, secondary_buzzwords, start_date, end_date, required_percentage, retry_after)
filtered_articles = scraper.scrape_and_filter_news()
scraper.save_to_csv(filtered_articles, "news_api_communication_one_month_geopolitical.csv")

1. China-Linked Hackers Target US Internet Providers in Latest Attack (2024-09-30T22:35:00Z)
2. Apple Shares Trailer for 'Submerged' Immersive Vision Pro Short Film (2024-10-07T16:23:10Z)
3. U.S. Wiretap Systems Targeted in China-Linked Hack (2024-10-05T21:21:00Z)
4. Apple Launches 'Submerged' Short Film for Vision Pro, Outlines Upcoming Content (2024-10-10T16:23:53Z)
5. China Cyber Association Calls For Review of Intel Products Sold In China (2024-10-17T10:00:00Z)
6. A year after October 7: How Hamas’s attack and Israel’s response broke the world (2024-10-04T13:35:07Z)
Articles saved to news_api_communication_one_month_geopolitical.csv


**Industry Focus:** `Real estate sector`

**Timeline:** `One month`

In the US the biggest stocks in this industry are `American Tower` (AMT), and `Simon Property Group` (SPG)
*Stock Focus*: `AMT`, `SPG`

In [141]:
primary_buzzwords = ["american tower","amt","simon property group","spg","real","estate","commercial"
                     ,"residential","property","housing","rental","home","prices","mortgage","land"
                     "bubble","loan","urban","planning"]
scraper = FinanceNewsAPIScraper(api_keys[1], primary_buzzwords, secondary_buzzwords, start_date, end_date, required_percentage, retry_after)
filtered_articles = scraper.scrape_and_filter_news()
scraper.save_to_csv(filtered_articles, "news_api_realestate_one_month_geopolitical.csv")

1. Iran and China-linked actors used ChatGPT for preparing attacks (2024-10-11T11:05:49Z)
2. How Big Oil is Astroturfing opposition to Wind and Solar, and Helping Destroy the Earth (2024-10-09T04:06:52Z)
3. Amanda Ba’s Surreal New Exhibition Grapples with Chinese Values (2024-09-27T16:22:00Z)
Articles saved to news_api_realestate_one_month_geopolitical.csv
