# Scraping Titles from 'Le Temps'

In [3]:
# Import required packages
import requests
from bs4 import BeautifulSoup
import pandas as pd
from datetime import datetime
from dateutil.relativedelta import relativedelta

As a first step, define a list of all the months to be scraped. This can be adapted as needed: the archive has already been scraped up to March 2023, so a later run can set the start to datetime(2023, 3, 1), which also keeps the dataset to be processed smaller.

In [None]:
# Define start and end dates
start = datetime(1998, 3, 1)
end = datetime.now()

date_list = []

date = start
while date <= end:
    date_list.append(date.strftime("%Y%m"))
    date += relativedelta(months=1)
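
The same month list can also be built without `dateutil`. The following is only a sketch: `month_codes` is a hypothetical helper name that does not appear in the original notebook, and it assumes that both dates should be included at month granularity.

```python
from datetime import datetime

def month_codes(start, end):
    """Return 'YYYYMM' strings for every month from start to end, inclusive."""
    codes = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        codes.append(f"{year:04d}{month:02d}")
        # Roll over to January of the next year after December
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return codes

print(month_codes(datetime(2022, 11, 1), datetime(2023, 2, 15)))
# ['202211', '202212', '202301', '202302']
```

Comparing `(year, month)` tuples avoids any dependence on the day of the month in `start` or `end`.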


Next, scrape the online archive of the newspaper 'Le Temps'. The titles are on the first page, inside the element with the class "main-grid", in list items of the form `<li><a href="url">Title</a></li>`, so both the link and its text can be accessed and scraped.
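
The structure described above can be tried out on a small hand-written fragment before touching the live site. This is a minimal sketch: `SAMPLE_HTML` is invented for illustration and is not real archive content, `extract_titles` is a hypothetical helper name, and `urljoin` from the standard library is swapped in here to turn relative links into absolute ones.

```python
from urllib.parse import urljoin
from bs4 import BeautifulSoup

# Hand-written fragment imitating the described structure (not real archive content)
SAMPLE_HTML = """
<div class="main-grid">
  <ul>
    <li><a href="/societe/example-article">Example title</a></li>
    <li><a href="https://www.letemps.ch/suisse/another-example">Another title</a></li>
  </ul>
</div>
"""

def extract_titles(html, month, base="https://www.letemps.ch"):
    """Return one dict per link found inside the 'main-grid' element."""
    soup = BeautifulSoup(html, "html.parser")
    main_grid = soup.find(class_="main-grid")
    if main_grid is None:
        return []
    rows = []
    for anchor in main_grid.select("li a[href]"):
        rows.append({
            "Title": anchor.get_text(strip=True),
            # urljoin leaves absolute links untouched and completes relative ones
            "Link": urljoin(base, anchor["href"]),
            "Date": month,
        })
    return rows

for row in extract_titles(SAMPLE_HTML, "199803"):
    print(row["Title"], row["Link"])
```

Keeping the parsing in a function that takes an HTML string makes it possible to check the selectors without sending any requests.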

In [None]:
# URL of the Le Temps archive page
url_start = "https://www.letemps.ch/archive/"
data = []
missed_data = []

for i in date_list:
    url = url_start+i

    headers = {"User-Agent": "Mozilla/5.0"}
    response = requests.get(url, headers=headers)
    response.raise_for_status()

    soup = BeautifulSoup(response.text, "html.parser")

    main_grid = soup.find(class_="main-grid")

    if main_grid:
        articles = main_grid.select("li a")
    
        # Extract titles and links
        for article in articles:
            title = article.get_text(strip=True)
            link = article["href"]

            if not link.startswith("http"):
                link = "https://www.letemps.ch" + link

            data.append({"Title": title, "Link": link, "Date": i})
    else:
        # Remember months without a 'main-grid' section so they can be re-checked later
        missed_data.append(i)
        print(f"No 'main-grid' section found for {i}.")

# Create a dataframe from the collected rows
df = pd.DataFrame(data=data, columns=["Title", "Link", "Date"])
df.head()

,Title,Link,Date
0,La Banque vaudoise de crédit coulée par sa fré...,https://www.letemps.ch/societe/banque-vaudoise...,199803
1,Un policier a abusé d'un adolescent pendant si...,https://www.letemps.ch/societe/un-policier-abu...,199803
2,"Le métier d'inventeur, ou l'art de plaire à se...",https://www.letemps.ch/societe/metier-dinvente...,199803
3,Par Luis Lema: une Europe en expansion,https://www.letemps.ch/opinions/editoriaux/lui...,199803
4,Les cantons sont poussés à des choix drastiques,https://www.letemps.ch/suisse/cantons-pousses-...,199803
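
In the loop above, `response.raise_for_status()` stops the whole run on the first failed request. A more tolerant variant could retry a few times and report the month as missed instead. The sketch below is one possible approach, not the method used for the data shown: `fetch_month` is a hypothetical helper, and the retry count, pause and timeout values are assumptions.

```python
import time
import requests

def fetch_month(month, session, base_url="https://www.letemps.ch/archive/",
                retries=2, pause=1.0):
    """Return the page HTML for one 'YYYYMM' month, or None if every attempt fails."""
    for attempt in range(retries + 1):
        try:
            response = session.get(
                base_url + month,
                headers={"User-Agent": "Mozilla/5.0"},
                timeout=30,
            )
            response.raise_for_status()
            return response.text
        except requests.RequestException:
            # Wait briefly before the next attempt, but not after the last one
            if attempt < retries:
                time.sleep(pause)
    return None

# Possible use with the names from the cell above (not executed here):
# with requests.Session() as session:
#     for month in date_list:
#         html = fetch_month(month, session)
#         if html is None:
#             missed_data.append(month)
```

Months collected in `missed_data` this way can be scraped again later without repeating the whole run.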


To save the scraped data at its different stages, it can be exported, for example to an Excel file. I used this to be able to work on the intermediate results at the same time.

In [None]:
# Export the dataframe to Excel (writing .xlsx needs an Excel writer such as openpyxl)
df.to_excel("Titles_le_Temps.xlsx", sheet_name="Le_Temps")
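
When a later run only scrapes recent months, its rows have to be combined with the earlier export. The following is a rough sketch of one way to do that: `merge_runs` is a hypothetical helper, the example rows and links are placeholders, and treating the link as the unique key of an article is an assumption.

```python
import pandas as pd

def merge_runs(previous, new):
    """Append newly scraped rows to an earlier export, keeping one row per link."""
    combined = pd.concat([previous, new], ignore_index=True)
    # Assumption: an article is identified by its link, so the first occurrence wins
    return combined.drop_duplicates(subset="Link", keep="first").reset_index(drop=True)

# Placeholder rows for illustration only
previous = pd.DataFrame([
    {"Title": "Old article", "Link": "https://www.letemps.ch/a", "Date": "202302"},
])
new = pd.DataFrame([
    {"Title": "Old article", "Link": "https://www.letemps.ch/a", "Date": "202303"},
    {"Title": "New article", "Link": "https://www.letemps.ch/b", "Date": "202303"},
])

merged = merge_runs(previous, new)
print(len(merged))  # 2
```

In practice, `previous` could come from `pd.read_excel("Titles_le_Temps.xlsx", index_col=0, dtype={"Date": str})`, so that the month codes stay strings and the saved index is not read back as an extra column.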