<a href="https://colab.research.google.com/github/fdac25/trading/blob/main/src/forbesScraper.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
# Article scraper: scrapes Forbes articles for title, author, publication date, and content

# Imports
import requests
from bs4 import BeautifulSoup
import pandas as pd
import re
import time
import random

In [2]:
# Get critical elements from article webpage
def scrape(url, wait_times=(1, 5, 30), printErrors=True):
  # Retry until the scrape succeeds or the retry budget runs out
  errors = 0
  while True:
    soup = None  # Stays None if the request itself fails
    # Get html and scrape elements
    try:
      # The timeout value is an arbitrary choice; it keeps a stalled request from hanging forever
      response = requests.get(url, timeout=30)
      soup = BeautifulSoup(response.content, 'html.parser')
      title = soup.find_all('h1')[0].text.strip()
      date = soup.find_all('time')[0].text.strip()
      p = [elem.text.strip() for elem in soup.find_all('p')]
      author = p[0][2:-1]  # First <p>: drop its first two characters and its last one
      body = "\n".join(p[2:])  # Body is every <p> from the third one onward
      break

    # Print html (if any was fetched) and wait a bit if there's an error
    except Exception:
      errors += 1
      print(f"({errors}) Scraping error")
      if printErrors and soup is not None: print(soup.prettify())
      if len(wait_times) > 1: time.sleep(wait_times[errors])
      else: time.sleep(wait_times[0])

      # Skip article if the article cannot be scraped
      if errors >= len(wait_times)-1:
        print("(!) Skipping article...")
        return "0", "0", "0", "0"

  time.sleep(wait_times[0])
  return title, date, author, body

# Test - print elements
# scrape returns (title, date, author, body), so unpack in that order
title, date, author, body = scrape("https://www.forbes.com/sites/jonmarkman/2025/11/04/you-missed-nvidia-because-you-think-in-straight-lines/")
print(title)
print("")
print(author)
print("")
print(date)
print("")
print(body)

You Missed Nvidia Because You Think In Straight Lines

Jon Markman

Nov 04, 2025, 10:57am EST

Your greatest risk in this market isn’t what you own. It's the mental model you use to value it.
Investors often struggle with an ingrained inability to move beyond linear thinking. They dismiss transformative technologies as bubbles because their intuition can’t grasp the fundamentals of exponential math. Exponential growth fundamentally differs from linear progress. Small, iterative doublings lead to massive scale quickly, unlike steady, incremental increases.
Some of you may be aware of the A4 paper folding problem. A standard A4 is 1 mm thick. The first few folds are trivial, adding a few millimeters. This is the deceptive early slowness where linear forecasts tell investors to sell. But keep going: Folding the same sheet 50 times leads to thickness that exceeds the distance to the Moon. Exponential processes defy intuition through sudden, vast scale changes.
Our cognitive bias, an expone
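
The retry logic inside `scrape` can also be separated from the HTML parsing, which makes it possible to exercise it without network access. The following is a minimal sketch under that assumption, not the notebook's own method: `with_retries` and `flaky_fetch` are hypothetical names, and the `sleep` parameter is injected only so the example runs instantly. Unlike `scrape`, this version makes one attempt per entry in `wait_times`.

```python
import time

def with_retries(fetch, wait_times=(1, 5, 30), sleep=time.sleep):
    """Call fetch() once per entry in wait_times, sleeping that long after each failure.

    Returns fetch()'s result, or None once every wait time has been used up.
    """
    for wait in wait_times:
        try:
            return fetch()
        except Exception as exc:
            print(f"Scraping error: {exc!r}; waiting {wait}s")
            sleep(wait)
    return None

# Hypothetical stand-in for a network call: fails twice, then succeeds.
attempts = {"count": 0}

def flaky_fetch():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionError("simulated failure")
    return "ok"

slept = []  # Records the requested waits instead of actually sleeping
result = with_retries(flaky_fetch, sleep=slept.append)
print(result, slept)
```

Passing `sleep=slept.append` records the waits instead of performing them; in real use the default `time.sleep` would apply.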

In [3]:
# Open CSV containing links
df = pd.read_csv("forbes_search.csv")  # read_csv already returns a DataFrame
links = df['Link'].tolist()
numLinks = len(links)
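
Before scraping, it may help to drop empty cells, malformed entries, and duplicate links from the list. The sketch below is one way to do that with the standard library; `clean_links` and the sample URLs are hypothetical, and the default `domain` value is an assumption based on the links shown in this notebook.

```python
from urllib.parse import urlparse

def clean_links(links, domain="www.forbes.com"):
    """Keep http(s) URLs on the given domain, dropping duplicates but preserving order."""
    seen = set()
    cleaned = []
    for link in links:
        if not isinstance(link, str):
            continue  # e.g. NaN from an empty CSV cell
        link = link.strip()
        parsed = urlparse(link)
        if parsed.scheme in ("http", "https") and parsed.netloc == domain and link not in seen:
            seen.add(link)
            cleaned.append(link)
    return cleaned

sample = [
    "https://www.forbes.com/sites/example/a/",  # hypothetical URLs for illustration
    "https://www.forbes.com/sites/example/a/",  # duplicate
    float("nan"),                               # empty cell
    "not a url",
    "https://www.forbes.com/sites/example/b/",
]
print(clean_links(sample))
```

If the list is filtered this way, `numLinks` would need to be recomputed, and the `Link` column in the export step would need to come from the filtered list rather than `df['Link']` so that rows stay aligned.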

In [4]:
data = [["0"] for _ in range(numLinks)]

# Loop through articles until all articles have been scraped
# Note: an article that never scrapes successfully keeps this loop running
passNum = 1
while True:
  for i in range(numLinks):
    if data[i][0] == "0":
      print(f"Scraping article {i+1} of {numLinks}")
      data[i] = scrape(links[i], [random.uniform(1,3)], False)

  # Post-pass: count how many articles have been scraped so far
  passNum += 1
  acquired = sum(1 for row in data if row[0] != "0")
  print(f"Scraped {acquired} out of {numLinks} articles")
  if acquired == numLinks: break
  print(f"Waiting before starting pass {passNum}...")
  time.sleep(5)
print("Scraping Complete")

Scraping article 1 of 738
Scraping article 2 of 738
Scraping article 3 of 738
Scraping article 4 of 738
Scraping article 5 of 738
Scraping article 6 of 738
Scraping article 7 of 738
Scraping article 8 of 738
Scraping article 9 of 738
Scraping article 10 of 738
Scraping article 11 of 738
Scraping article 12 of 738
Scraping article 13 of 738
Scraping article 14 of 738
Scraping article 15 of 738
Scraping article 16 of 738
Scraping article 17 of 738
Scraping article 18 of 738
Scraping article 19 of 738
Scraping article 20 of 738
Scraping article 21 of 738
Scraping article 22 of 738
Scraping article 23 of 738
Scraping article 24 of 738
Scraping article 25 of 738
Scraping article 26 of 738
Scraping article 27 of 738
Scraping article 28 of 738
Scraping article 29 of 738
Scraping article 30 of 738
Scraping article 31 of 738
Scraping article 32 of 738
Scraping article 33 of 738
Scraping article 34 of 738
Scraping article 35 of 738
Scraping article 36 of 738
Scraping article 37 of 738
Scraping a
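
The multi-pass loop above repeats until every article has been acquired. A variant that caps the number of passes might look like the sketch below. `scrape_all`, `max_passes`, and `fake_scrape` are hypothetical names introduced for illustration; the failure marker `("0", "0", "0", "0")` is the one `scrape` returns when it skips an article.

```python
def scrape_all(links, scrape_fn, max_passes=3):
    """Scrape every link, retrying failed ones for at most max_passes passes."""
    failed = ("0", "0", "0", "0")  # same failure marker the notebook's scrape returns
    data = [failed] * len(links)
    for pass_num in range(1, max_passes + 1):
        for i, link in enumerate(links):
            if data[i][0] == "0":
                data[i] = scrape_fn(link)
        acquired = sum(row[0] != "0" for row in data)
        print(f"Pass {pass_num}: scraped {acquired} out of {len(links)} articles")
        if acquired == len(links):
            break
    return data

# Hypothetical stub: "bad" never scrapes, everything else succeeds immediately.
def fake_scrape(link):
    if link == "bad":
        return ("0", "0", "0", "0")
    return (f"title of {link}", "date", "author", "body")

rows = scrape_all(["a", "bad", "b"], fake_scrape, max_passes=2)
```

Rows that still hold the failure marker after the last pass can then be filtered out or logged before export.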

In [5]:
# Export data as CSV
df_final = pd.DataFrame(data, columns=['Title', 'Time', 'Author', 'Body'])
df_final.insert(0, 'Link', df['Link'])
df_final.to_csv('forbes_articles.csv', index=True)
df_final.head(10)

Unnamed: 0,Link,Title,Time,Author,Body
0,https://www.forbes.com/sites/catherinebrock/20...,Stock Prices Finish Week Lower After Put Filin...,"Nov 07, 2025, 06:30pm EST",Catherine Brock,U.S. stocks retreated this week after investor...
1,https://www.forbes.com/sites/greatspeculations...,A Decade Of Rewards: $83 Bil From NVIDIA Stock,"Nov 07, 2025, 11:55am EST",Trefis Team,"Over the past ten years, NVIDIA (NVDA) stock h..."
2,https://www.forbes.com/sites/johnkoetsier/2025...,8 Robotics Startups Backed By Nvidia And Amazon,"Nov 06, 2025, 02:20pm EST",John Koetsier,Amazon and Nvidia are backing eight AI and rob...
3,https://www.forbes.com/sites/ywang/2025/11/05/...,Founder Of The ‘Nvidia Of China’ Triples His W...,"Nov 05, 2025, 04:39pm EST",Yue Wang,This story is part of Forbes’ coverage of Chin...
4,https://www.forbes.com/sites/jonmarkman/2025/1...,You Missed Nvidia Because You Think In Straigh...,"Nov 04, 2025, 10:57am EST",Jon Markman,Your greatest risk in this market isn’t what y...
5,https://www.forbes.com/sites/roomykhan/2025/11...,All Roads Lead To NVIDIA: Bankrolling Its Own ...,"Nov 03, 2025, 11:06am EST",Roomy Khan,\nNVIDIA just wrote a $5 billion check for a s...
6,https://www.forbes.com/sites/siladityaray/2025...,Trump Says He’ll Allow China-Nvidia Deals Exce...,"Nov 03, 2025, 05:57am EST",Siladitya Ray,President Donald Trump on Sunday said he will ...
7,https://www.forbes.com/sites/marcochiappetta/2...,"US DOE Taps Nvidia, AMD, And Oracle For Quarte...","Oct 31, 2025, 05:34pm EDT",Marco Chiappetta,"Over the last few days, the U.S. Department of..."
8,https://www.forbes.com/sites/the-prototype/202...,Nvidia Expands Into Quantum Computing And Fusi...,"Oct 31, 2025, 02:01pm EDT",Alex Knapp,"In this week’s edition of The Prototype, we lo..."
9,https://www.forbes.com/sites/carminegallo/2025...,Nvidia’s $5 Trillion Storyteller-In-Chief,"Oct 31, 2025, 05:30am EDT",Carmine Gallo,A plaque that hangs above a Denny’s booth in S...
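
The `Time` column holds strings such as `Nov 07, 2025, 06:30pm EST`. If timestamps are needed downstream, they could be parsed along the lines of the sketch below. `parse_forbes_time` and `TZ_OFFSETS` are hypothetical names. The sketch assumes that only the two abbreviations visible in the output above (`EST`, `EDT`) occur, and that month names are parsed under an English locale.

```python
from datetime import datetime, timedelta, timezone

# Assumption: only the two abbreviations visible in the notebook output are handled.
TZ_OFFSETS = {"EST": timedelta(hours=-5), "EDT": timedelta(hours=-4)}

def parse_forbes_time(text):
    """Parse a string like 'Nov 07, 2025, 06:30pm EST' into a timezone-aware datetime."""
    stamp, tz_abbr = text.rsplit(" ", 1)  # Split off the trailing timezone abbreviation
    naive = datetime.strptime(stamp, "%b %d, %Y, %I:%M%p")
    return naive.replace(tzinfo=timezone(TZ_OFFSETS[tz_abbr], tz_abbr))

parsed = parse_forbes_time("Nov 07, 2025, 06:30pm EST")
print(parsed.isoformat())
```

An abbreviation outside `TZ_OFFSETS` raises a `KeyError`, which makes unexpected formats visible rather than silently mis-parsed.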
