Scripts for extracting comments from dynamic web pages with Selenium

ffedox/ilgiornale-scraping

Extracting comments with a Selenium-based web scraper

Scripts for extracting comments and articles from the website of the newspaper Il Giornale.

Contents

  1. Extract_article_ilgiornale.py: script for extracting the text of an article given its URL.
  2. Extract_comments_ilgiornale.py: script for extracting the comments of an article given its URL.
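
The general approach behind a comment scraper for a dynamic page can be sketched as follows. This is a minimal illustration, not the code in `Extract_comments_ilgiornale.py`: the function names `parse_comments` and `fetch_comments` are hypothetical, and the CSS selector `div.comment` is an assumption, since the site's real markup is not described here.

```python
from bs4 import BeautifulSoup

# Hypothetical selector: the real class names used by the site are not given in this README.
COMMENT_SELECTOR = "div.comment"


def parse_comments(html, selector=COMMENT_SELECTOR):
    """Return the whitespace-normalized text of every element matching `selector`."""
    soup = BeautifulSoup(html, "html.parser")
    return [node.get_text(" ", strip=True) for node in soup.select(selector)]


def fetch_comments(driver, url, selector=COMMENT_SELECTOR, timeout=10):
    """Load `url` in an existing Selenium driver, wait for comments, then parse them."""
    # Imported lazily so parse_comments can be used on saved HTML without a browser.
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver.get(url)
    # Dynamic pages render comments after the initial load, so wait for the first one.
    WebDriverWait(driver, timeout).until(
        EC.presence_of_element_located((By.CSS_SELECTOR, selector))
    )
    return parse_comments(driver.page_source, selector)
```

If no element matches the selector within `timeout` seconds, `WebDriverWait.until` raises `TimeoutException`, so a caller would typically catch that to handle articles without comments.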

Getting the code

A copy of all the files can be downloaded by cloning the git repository:

git clone https://github.com/ffedox/ilgiornale_scraping

Setup and installation

  1. Install BeautifulSoup
    pip install beautifulsoup4
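  2. Make sure Tkinter is available. It ships with the standard Python installers for Windows and macOS and is not installed through pip; on Debian/Ubuntu it is packaged separately:
    sudo apt-get install python3-tk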
  3. Install Selenium
    pip install selenium
  4. Download ChromeDriver or install Chromedriver-Autoinstaller
    pip install chromedriver-autoinstaller
  5. Add ChromeDriver to the system's PATH or pass its path when instantiating webdriver.Chrome
    driver = webdriver.Chrome(executable_path='C:/path/to/chromedriver.exe')
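
Recent Selenium 4 releases expect the driver path to be wrapped in a `Service` object rather than passed as `executable_path`. The sketch below shows one way steps 4 and 5 might be combined under that API: use an explicit path if one is given, otherwise look for `chromedriver` on the PATH, and fall back to chromedriver-autoinstaller. The helper names `resolve_chromedriver` and `build_driver` are hypothetical and are not part of this repository.

```python
import shutil
from pathlib import Path


def resolve_chromedriver(explicit_path=None):
    """Return the explicit path if given, else the chromedriver found on PATH, else None."""
    if explicit_path is not None:
        path = Path(explicit_path)
        if not path.is_file():
            raise FileNotFoundError(f"ChromeDriver not found at {path}")
        return str(path)
    return shutil.which("chromedriver")


def build_driver(explicit_path=None):
    """Create a Chrome WebDriver using the Selenium 4 Service API."""
    # Imported lazily so resolve_chromedriver works even before Selenium is installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service

    driver_path = resolve_chromedriver(explicit_path)
    if driver_path is None:
        # Nothing on PATH: let chromedriver-autoinstaller download a matching driver.
        import chromedriver_autoinstaller

        driver_path = chromedriver_autoinstaller.install()
    return webdriver.Chrome(service=Service(executable_path=driver_path))
```

For example, `build_driver('C:/path/to/chromedriver.exe')` would use that specific binary, while `build_driver()` relies on the PATH lookup or the autoinstaller.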
