# Import libraries

In [2]:
import requests
from bs4 import BeautifulSoup
import pandas as pd
import time


# Set Up Search

We want Financial Times articles that mention Bitcoin. Search engines support a "site:" operator that restricts results to a single domain, so the query is "bitcoin site:ft.com".

Google is the obvious place to run that query, but it blocks automated requests aggressively, so plain requests calls tend to fail there. A dedicated search-results service such as SerpApi, used through its Python library serpapi and free to start with, is the more reliable route for Google.

For now, let's try the basic approach: use requests and BeautifulSoup against DuckDuckGo's HTML-only results page, which is more scrape-friendly, and check whether we get usable results.
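The query contains a space and a colon, neither of which is safe to drop into a URL as-is. As a small sketch using only the standard library, this shows what the encoded query string looks like; requests applies equivalent encoding when the query is passed through its params argument. The names base_url, query_string and search_url are illustrative.

```python
from urllib.parse import urlencode

query = "bitcoin site:ft.com"
base_url = "https://html.duckduckgo.com/html/"  # the endpoint used in this notebook

# urlencode turns the space into "+" and the colon into "%3A"
query_string = urlencode({"q": query})
search_url = f"{base_url}?{query_string}"

print(query_string)  # q=bitcoin+site%3Aft.com
print(search_url)
```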

In [None]:

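# Step 1: Define the search query
query = "bitcoin site:ft.com"
url = "https://html.duckduckgo.com/html/"

# Step 2: Send the request
# Passing the query via params lets requests URL-encode the space and colon
headers = {
    "User-Agent": "Mozilla/5.0"
}
response = requests.get(url, params={"q": query}, headers=headers, timeout=10)
response.raise_for_status()  # stop early on an HTTP error instead of parsing an error page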

# Step 3: Parse the results
soup = BeautifulSoup(response.text, "html.parser")
results = soup.find_all("a", class_="result__a")

# Step 4: Store the results
articles = []
for result in results:
    title = result.get_text()
    link = result['href']
    articles.append({"title": title, "link": link})

# Step 5: Convert to DataFrame
ft_df = pd.DataFrame(articles)
print(ft_df.head())

# Step 6: Save to CSV
ft_df.to_csv("bitcoin_ft_articles.csv", index=False)



                                               title  \
0  Recht 24/7 - Online Kanzlei | Anwalt sofort on...   
1  1 Unze Kupfermünze Bitcoin Waage | Kettner Ede...   
2                          Bitcoin - Financial Times   
3  BlackRock closes in on crown of world's larges...   
4  Bitcoin's halving makes its future pay-to-play...   

                                                link  
0  //duckduckgo.com/l/?uddg=https%3A%2F%2Fduckduc...  
1  //duckduckgo.com/l/?uddg=https%3A%2F%2Fduckduc...  
2  //duckduckgo.com/l/?uddg=https%3A%2F%2Fwww.ft....  
3  //duckduckgo.com/l/?uddg=https%3A%2F%2Fwww.ft....  
4  //duckduckgo.com/l/?uddg=https%3A%2F%2Fwww.ft....  


# Note: What are headers?
Headers are extra info you send with your request to the website.

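The User-Agent header tells the site what kind of client is visiting, usually a web browser like Chrome or Firefox.

If we don't set it, requests announces itself with a default value like "python-requests/2.x", so the site can easily tell it's a script (not a human with a browser) and may block you.
So we're pretending to be a regular browser by saying:

“Hi, I’m a normal browser!” ("Mozilla/5.0" is the prefix that Chrome, Firefox and other mainstream browsers all put at the start of their User-Agent.)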

🛡️ Why it's important: Helps avoid blocks or errors from sites that dislike bots.
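To see the difference the header makes, here is a small sketch that inspects User-Agent values without sending any request. It assumes the requests library imported earlier; default_ua and session are illustrative names, and using a Session is an optional alternative to passing headers on every call.

```python
import requests

# What requests would send if we did not set a User-Agent ourselves
default_ua = requests.utils.default_user_agent()
print(default_ua)  # something like "python-requests/2.x.y"; the version varies

# A Session applies its headers to every request made through it,
# so the browser-like User-Agent only has to be set once
session = requests.Session()
session.headers.update({"User-Agent": "Mozilla/5.0"})
print(session.headers["User-Agent"])  # Mozilla/5.0
```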

❓ What is BeautifulSoup doing?
Think of BeautifulSoup like a smart highlighter. It takes the messy HTML code of a webpage and helps you pick out the parts you want.

soup = BeautifulSoup(...)
→ Turns the raw HTML from the page into a searchable format.

soup.find_all("a", class_="result__a")
→ Looks for all <a> tags (which are links), with the class "result__a", which DuckDuckGo uses for search result titles.

Basically, you’re saying:

“Show me all the links that are search results.”
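To make that concrete, here is a minimal sketch that runs the same find_all call on a tiny hand-written page. The HTML is made up for illustration and much simpler than DuckDuckGo's real markup; only the class name "result__a" is taken from the code above.

```python
from bs4 import BeautifulSoup

# A made-up, simplified stand-in for a results page (not real DuckDuckGo markup)
sample_html = """
<div class="result">
  <a class="result__a" href="https://example.com/one">First result</a>
  <a class="result__snippet" href="https://example.com/one">Some preview text</a>
</div>
<div class="result">
  <a class="result__a" href="https://example.com/two">Second result</a>
</div>
"""

soup = BeautifulSoup(sample_html, "html.parser")

# Only <a> tags whose class is "result__a" match; the snippet link is skipped
links = soup.find_all("a", class_="result__a")

titles = [a.get_text() for a in links]
hrefs = [a["href"] for a in links]
print(titles)  # ['First result', 'Second result']
print(hrefs)   # ['https://example.com/one', 'https://example.com/two']
```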

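Then we loop through those links and pull out each one's title (its text) and URL (its href attribute). As the output above shows, that URL is a DuckDuckGo redirect link rather than the article's own address.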
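The links in the output above start with //duckduckgo.com/l/?uddg=..., so the real address sits percent-encoded inside the uddg parameter. The following is a minimal sketch, using only the standard library, of one way to recover it and to check whether it points at ft.com. The helper names resolve_ddg_link and is_ft_link and the example link are hypothetical, not part of the original notebook.

```python
from urllib.parse import urlparse, parse_qs

def resolve_ddg_link(href):
    """Return the target URL hidden in a DuckDuckGo redirect link.

    Links that carry no "uddg" parameter are returned unchanged.
    """
    # Scheme-relative links start with "//"; give them a scheme so they parse cleanly
    if href.startswith("//"):
        href = "https:" + href
    params = parse_qs(urlparse(href).query)  # parse_qs also percent-decodes values
    targets = params.get("uddg")
    return targets[0] if targets else href

def is_ft_link(url):
    """True when the URL's host is ft.com or a subdomain such as www.ft.com."""
    host = urlparse(url).hostname or ""
    return host == "ft.com" or host.endswith(".ft.com")

# Hypothetical example link, shaped like the ones in the output above
example = "//duckduckgo.com/l/?uddg=https%3A%2F%2Fwww.ft.com%2Fbitcoin&rut=abc"
resolved = resolve_ddg_link(example)
print(resolved)              # https://www.ft.com/bitcoin
print(is_ft_link(resolved))  # True
```

Applied to the DataFrame, this could look like ft_df["link"].map(resolve_ddg_link) followed by a filter on is_ft_link, which would also drop rows like the first two in the output above, whose targets are not on ft.com.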