We’re going to scrape the Hacker News front page (https://news.ycombinator.com/news)
using requests and Beautiful Soup.

Take some time to explore the page if you haven’t heard
about it already. 

Hacker News is a popular aggregator of news articles that “hackers”
(computer scientists, entrepreneurs, data scientists) find interesting.
We’ll store the scraped information in a simple Python list of dictionary objects for
this example. The code to scrape this page looks as follows:

In [None]:
import requests
import re
from bs4 import BeautifulSoup

articles = []
url = 'https://news.ycombinator.com/news'
r = requests.get(url)
html_soup = BeautifulSoup(r.text, 'html.parser')

# Each story is a <tr class="athing">; its score and comment count
# are looked up in the <tr> that immediately follows it
for item in html_soup.find_all('tr', class_='athing'):
    item_a = item.find('a', class_='storylink')
    item_link = item_a.get('href') if item_a else None
    item_text = item_a.get_text(strip=True) if item_a else None
    next_row = item.find_next_sibling('tr')
    item_score = next_row.find('span', class_='score')
    item_score = item_score.get_text(strip=True) if item_score else '0 points'
    # We use regex here to find the correct element
    item_comments = next_row.find(
        'a', string=re.compile(r'\d+(&nbsp;|\s)comment(s?)'))
    item_comments = item_comments.get_text(strip=True).replace('\xa0', ' ') \
        if item_comments else '0 comments'
    articles.append({
        'link' : item_link,
        'title' : item_text,
        'score' : item_score,
        'comments' : item_comments})

for article in articles:
    print(article)

This will output something like the following (the stories on the front page change constantly):

`{'link': 'http://moolenaar.net/habits.html', 'title': 'Seven habits of
effective text editing (2000)', 'score': '44 points', 'comments':
'9 comments'}
{'link': 'https://www.repository.cam.ac.uk/handle/1810/251038', 'title':
'Properties of expanding universes (1966)', 'score': '52 points',
'comments': '8 comments'}
[...]`
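
Note that the score and comment count are stored as strings such as '44 points' and '9 comments'. If you want to sort or filter on them, they need to be converted to numbers first. The following is a minimal sketch of one way to do so; the helper name `leading_int`, the `sample_articles` list (copied from the output above), and the added `score_value` and `comments_value` keys are our own illustrative choices and not part of the scraper itself.

```python
import re

def leading_int(text, default=0):
    """Return the first run of digits found in text as an int.

    Falls back to default when text is empty, None, or has no digits.
    """
    match = re.search(r'\d+', text or '')
    return int(match.group()) if match else default

# Two entries copied from the example output above
sample_articles = [
    {'link': 'http://moolenaar.net/habits.html',
     'title': 'Seven habits of effective text editing (2000)',
     'score': '44 points', 'comments': '9 comments'},
    {'link': 'https://www.repository.cam.ac.uk/handle/1810/251038',
     'title': 'Properties of expanding universes (1966)',
     'score': '52 points', 'comments': '8 comments'},
]

# Add numeric versions next to the original strings (hypothetical key names)
for article in sample_articles:
    article['score_value'] = leading_int(article['score'])
    article['comments_value'] = leading_int(article['comments'])

# Rank the stories by score, highest first
ranked = sorted(sample_articles, key=lambda a: a['score_value'], reverse=True)
for article in ranked:
    print(article['score_value'], article['comments_value'], article['title'])
```

Keeping the original strings alongside the parsed integers makes it easy to check the conversion against what was actually scraped.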


Try expanding this code to scrape a link to the comments page as well. Think
about potential use cases that would be possible when you also scrape the comments
themselves (for example, in the context of text mining).
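
As a starting point for this exercise, here is one possible sketch. It assumes that the anchor matched by the comments regex carries a relative `href` (such as `item?id=...`) that can be resolved against the front page URL with `urljoin`. To keep the example runnable without hitting the live site, it works on a small hand-written HTML fragment; `SAMPLE_HTML`, the ids in it, and the function name `scrape_comment_links` are hypothetical and only imitate the row structure the scraper above relies on.

```python
import re
from urllib.parse import urljoin
from bs4 import BeautifulSoup

BASE_URL = 'https://news.ycombinator.com/news'

# Hand-written sample (an assumption): a story row followed by a details row
SAMPLE_HTML = '''
<table>
<tr class="athing" id="1001">
  <td><a class="storylink" href="http://example.com/a">Example story</a></td>
</tr>
<tr>
  <td><span class="score">44 points</span>
      <a href="item?id=1001">9&nbsp;comments</a></td>
</tr>
<tr class="athing" id="1002">
  <td><a class="storylink" href="http://example.com/b">Story without comments</a></td>
</tr>
<tr>
  <td></td>
</tr>
</table>
'''

def scrape_comment_links(html, base_url=BASE_URL):
    """Return one dict per story with an absolute comments link (or None)."""
    soup = BeautifulSoup(html, 'html.parser')
    results = []
    for item in soup.find_all('tr', class_='athing'):
        next_row = item.find_next_sibling('tr')
        comments_link = None
        if next_row is not None:
            # Same regex as in the scraper above to locate the comments anchor
            item_comments = next_row.find(
                'a', string=re.compile(r'\d+(&nbsp;|\s)comment(s?)'))
            if item_comments is not None and item_comments.get('href'):
                # Resolve the relative href against the page URL
                comments_link = urljoin(base_url, item_comments.get('href'))
        results.append({'id': item.get('id'), 'comments_link': comments_link})
    return results

comment_links = scrape_comment_links(SAMPLE_HTML)
for entry in comment_links:
    print(entry)
```

To use this with the scraper above, you could compute `comments_link` inside the existing loop and add it as an extra key to each dictionary. Fetching those pages afterwards would give you the comment text itself, which is the raw material for the text mining use cases mentioned above.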