# Website scraping

---

Group name: Group C

---


## Setup

In [1]:
import pandas as pd
import requests
from bs4 import BeautifulSoup

## Web scraping with requests

In [2]:
# Website: FiveThirtyEight
url = "https://fivethirtyeight.com/features/2022-governor-state-government/"

In [3]:
html = requests.get(url)

In [4]:
print(f"HTTP status code: {html.status_code}")

HTTP status code: 200
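
Printing the status code confirms this one request succeeded, but it does not stop the notebook if a later run gets an error page. A minimal sketch of a stricter fetch is shown below. The helper name `fetch_html`, its `get` parameter (there only so the function can be exercised without network access), and the 10-second timeout are assumptions for illustration, not part of the original notebook.

```python
import requests


def fetch_html(url, *, get=requests.get, timeout=10.0):
    """Return the page body as text, raising on network or HTTP errors."""
    # Without a timeout, requests can wait indefinitely on a stalled server.
    response = get(url, timeout=timeout)
    # raise_for_status() turns 4xx/5xx responses into requests.HTTPError.
    response.raise_for_status()
    return response.text
```

With this helper, `BeautifulSoup(fetch_html(url), "html.parser")` would replace the separate request and status check.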


In [5]:
soup = BeautifulSoup(html.text, 'html.parser')

In [6]:
print(soup.get_text())

The Midterms Made State Governments Bluer | FiveThirtyEight
Skip to main content
FiveThirtyEight
Search
Search
ABC News
Menu
The Midterms Made State Governments Bluer
Share on Facebook
Share on Twitter
Politics
Sports
Science
Podcasts
Video
ABC News
2022 Election
The Midterms Made State Governments Bluer
By Nathaniel Rakich
Nov. 17, 2022, at 6:00 AM
Rebecca Cook / Reuters
Abortion bans, right-to-work laws, voting restrictions — for years, a lot of the major legislation coming out of state capitols has been conservative. But after Democrats’ clear victory in state-level elections last week, landmark liberal policies could be coming to a state near you.
For the first time in years, more Americans will live
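
The raw `get_text()` output is mostly whitespace left over from the page layout. One way to make it readable is sketched below; `collapse_blank_lines` is a hypothetical helper and the sample string is made up. BeautifulSoup's own `soup.get_text(separator="\n", strip=True)` is a built-in alternative with a similar effect.

```python
def collapse_blank_lines(text):
    """Strip each line and drop lines that are empty after stripping."""
    stripped = (line.strip() for line in text.splitlines())
    return "\n".join(line for line in stripped if line)


# Made-up sample that imitates the tab and newline padding of the page.
sample = "\n\n\n\t\t\tPolitics\t\t\n\n\nSports\n\n"
print(collapse_blank_lines(sample))
```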

In [7]:
# Article title
article_title = soup.h1.text
print(article_title)

The Midterms Made State Governments Bluer


In [8]:
# Article topic
article_topic = soup.find("a", {"class":"term"}).text
print(article_topic)

2022 Election


In [9]:
# Article author
article_author = soup.find("a",{"class":"author url fn"}).text
print(article_author)

Nathaniel Rakich
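
The lookups above chain `.text` directly onto `soup.find(...)`. `find` returns `None` when nothing matches, so a page without, say, an author link would raise `AttributeError`. A minimal sketch of a defensive variant follows. The helper `text_or_none` and the sample HTML are assumptions for illustration and do not reproduce the real page markup.

```python
from bs4 import BeautifulSoup


def text_or_none(soup, name, css_class):
    """Return the stripped text of the first matching tag, or None."""
    tag = soup.find(name, class_=css_class)
    return tag.get_text(strip=True) if tag is not None else None


# Made-up markup: it has a topic link but no author link.
sample_html = """
<h1 class="article-title">Example headline</h1>
<a class="term" href="#">Example topic</a>
"""
sample_soup = BeautifulSoup(sample_html, "html.parser")
print(text_or_none(sample_soup, "a", "term"))
print(text_or_none(sample_soup, "a", "author"))
```

Passing a single class name via `class_` matches any tag that carries that class, which is slightly more permissive than matching the full attribute string `"author url fn"`.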


In [10]:
# Article publication time
article_datetime = soup.find("time",{"class":"datetime"}).text.replace("\n","").replace("\t","")
print(article_datetime)

Nov. 17, 2022, at 6:00 AM
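
The publication time is stored as a display string, which cannot be sorted or filtered as a date. The sketch below converts it to a `datetime`. The helper `parse_article_datetime` is hypothetical, it assumes every article uses the same "Month day, year, at h:mm AM/PM" layout seen here, and the result carries no time zone because the page text does not state one.

```python
import re
from datetime import datetime

_MONTHS = ["jan", "feb", "mar", "apr", "may", "jun",
           "jul", "aug", "sep", "oct", "nov", "dec"]
_PATTERN = re.compile(
    r"([A-Za-z]+)\.?\s+(\d{1,2}),\s+(\d{4}),\s+at\s+(\d{1,2}):(\d{2})\s+([AP]M)"
)


def parse_article_datetime(raw):
    """Parse strings such as 'Nov. 17, 2022, at 6:00 AM' into a naive datetime."""
    match = _PATTERN.fullmatch(raw.strip())
    if match is None:
        raise ValueError(f"Unrecognized date format: {raw!r}")
    month_name, day, year, hour, minute, meridiem = match.groups()
    # Only the first three letters are used, so 'Sept.' and 'March' also work.
    month = _MONTHS.index(month_name[:3].lower()) + 1
    # Convert the 12-hour clock to 24 hours (12 AM -> 0, 12 PM -> 12).
    hour_24 = int(hour) % 12 + (12 if meridiem == "PM" else 0)
    return datetime(int(year), month, int(day), hour_24, int(minute))


print(parse_article_datetime("Nov. 17, 2022, at 6:00 AM"))
```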


In [11]:
# Article text
article_text_html = soup.find_all("p", {"data-paragraph":"main"})

In [12]:
# Extract the text of each paragraph and remove non-breaking spaces
article_text_list = [p.get_text().replace("\xa0", "") for p in article_text_html]

In [13]:
article_text = "".join(article_text_list)

In [14]:
article_text

'Abortion bans, right-to-work laws, voting restrictions — for years, a lot of the major legislation coming out of state capitols has been conservative. But after Democrats’ clear victory in state-level elections last week, landmark liberal policies could be coming to a state near you.For the first time in years, more Americans will live in a state fully controlled by Democrats than in one fully controlled by Republicans.1 Thanks to their wins in gubernatorial or state-legislative elections, Democrats2 took complete control of three new state governments in the 2022 elections: Michigan, Minnesota and Vermont. They broke the GOP monopoly on power in Arizona and, potentially, New Hampshire.3 They also kept full control of state government in four of the five states where they were in danger of losing it. And they prevented Republicans from taking full control of North Carolina, Wisconsin and maybe even Alaska. Republicans, on the other hand, didn’t flip a single legislative chamber from b
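
In the output above, paragraphs run together ("near you.For the first time") because they are joined with an empty string. A possible alternative is sketched below. `join_paragraphs` is a hypothetical helper, and the blank-line separator is an assumption about how the text will be used later. Footnote markers such as the "1" after "Republicans." are not handled here.

```python
def join_paragraphs(paragraphs, sep="\n\n"):
    """Normalize whitespace inside each paragraph, then join with a separator."""
    cleaned = []
    for paragraph in paragraphs:
        # str.split() with no argument also splits on non-breaking spaces,
        # so they become ordinary single spaces instead of being deleted.
        text = " ".join(paragraph.split())
        if text:
            cleaned.append(text)
    return sep.join(cleaned)


sample_paragraphs = ["coming to a state near you.\xa0", "For the first  time in years"]
print(join_paragraphs(sample_paragraphs))
```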

In [15]:
# Article tags
article_tags_html = soup.find_all("a",{"class":"tag"})

In [16]:
# Take the first child of each tag link and drop trailing whitespace
article_tags = [a.contents[0].rstrip() for a in article_tags_html]

In [17]:
article_tags = ", ".join(article_tags)
article_tags

'2022 Election, 2022 Midterms, 2022 Governors Elections, State Legislatures, State Legislature Elections, Trifectas, State Government'
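
The tag extraction can also be written with a CSS selector. The sketch below runs on made-up markup rather than the real page, and the variable names are illustrative. Keeping `tag_list` as a list next to the joined string is a design choice that makes later filtering by a single tag easier.

```python
from bs4 import BeautifulSoup

# Made-up markup: two tag links and one unrelated link.
sample_html = """
<div>
  <a class="tag" href="#">2022 Election </a>
  <a class="tag" href="#">Trifectas
  </a>
  <a class="other" href="#">Not a tag</a>
</div>
"""
sample_soup = BeautifulSoup(sample_html, "html.parser")

# select("a.tag") returns every <a> element that has the class "tag".
tag_list = [a.get_text(strip=True) for a in sample_soup.select("a.tag")]
tags_joined = ", ".join(tag_list)
print(tags_joined)
```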

In [18]:
# Create the DataFrame
df = pd.DataFrame({
    "Titel" : article_title,
    "URL" : url,
    "Thema" : article_topic,
    "Autor" : article_author,
    "Datum" : article_datetime,
    "Artikeltext" : article_text,
    "Tags" : article_tags
}, index=[0])
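
Passing scalars with `index=[0]` works for a single article. If more articles are scraped later, a list of dictionaries (one per article) extends more naturally. The sketch below uses made-up records and only a subset of the columns.

```python
import pandas as pd

# Made-up records; each dictionary becomes one row.
records = [
    {"Titel": "Example headline A", "URL": "https://example.invalid/a", "Tags": "x, y"},
    {"Titel": "Example headline B", "URL": "https://example.invalid/b", "Tags": "z"},
]
articles = pd.DataFrame(records)
print(articles.shape)
```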

In [19]:
df

| | Titel | URL | Thema | Autor | Datum | Artikeltext | Tags |
|---|---|---|---|---|---|---|---|
| 0 | The Midterms Made State Governments Bluer | https://fivethirtyeight.com/features/2022-gove... | 2022 Election | Nathaniel Rakich | Nov. 17, 2022, at 6:00 AM | Abortion bans, right-to-work laws, voting rest... | 2022 Election, 2022 Midterms, 2022 Governors E... |


In [24]:
# Export to a CSV file
df.to_csv("/Users/lukas/Documents/ds_homework_1/DS-Homework1/data/raw/fivethirtyeight_article.csv")
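
The export above uses an absolute path on one machine and writes the DataFrame index as an extra column, which shows up as `Unnamed: 0` when the file is read back. A more portable variant is sketched below. The helper `export_csv`, the demo frame, and the file name `demo_article.csv` are assumptions for illustration, and the demo writes to a temporary directory instead of the project folder.

```python
import tempfile
from pathlib import Path

import pandas as pd


def export_csv(frame, out_dir, filename):
    """Write frame to out_dir/filename without the index and return the path."""
    out_path = Path(out_dir) / filename
    # Create missing parent folders such as data/raw.
    out_path.parent.mkdir(parents=True, exist_ok=True)
    frame.to_csv(out_path, index=False)
    return out_path


demo = pd.DataFrame({"Titel": ["Example headline"], "Tags": ["x, y"]})
with tempfile.TemporaryDirectory() as tmp:
    written = export_csv(demo, Path(tmp) / "data" / "raw", "demo_article.csv")
    round_trip = pd.read_csv(written)
print(list(round_trip.columns))
```

In the notebook itself, a call such as `export_csv(df, Path("data") / "raw", "fivethirtyeight_article.csv")` would resolve relative to the working directory, assuming the notebook is started from the project root.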