
Web scraper for extracting information about companies


mgranchelli/companies-scraper

Python project that uses the Scrapy framework to scrape information about companies from websites.

This project is homework 5 of the 2022-2023 Ingegneria dei Dati (Data Engineering) course at Roma Tre University. The web scraper collects information about companies from three websites: CompaniesMarketCap, Value.Today and Disfold.

Getting Started

$ git clone https://github.com/mgranchelli/companies-scraper.git
$ cd companies-scraper
$ pip install -r requirements.txt

Running the Spiders

The project has three spiders, market_cap, value_today and disfold. Run any of them from inside the project directory with:

$ scrapy crawl <spider name>
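To run all three spiders in one go, the command above can be scripted. The following is a minimal sketch that is not part of the repository: it assumes it is launched from the cloned project directory with `scrapy` on the PATH, and the helper names `build_crawl_command` and `run_all` as well as the `<spider>.<fmt>` output naming are hypothetical. It only wraps `scrapy crawl`, passing Scrapy's `-o` output option.

```python
import subprocess
from typing import List, Optional

# Spider names as listed in this README.
SPIDERS = ["market_cap", "value_today", "disfold"]


def build_crawl_command(spider: str, output: Optional[str] = None) -> List[str]:
    """Return the argv list for `scrapy crawl <spider> [-o <output>]`."""
    cmd = ["scrapy", "crawl", spider]
    if output is not None:
        cmd += ["-o", output]
    return cmd


def run_all(fmt: str = "json", dry_run: bool = False) -> List[List[str]]:
    """Run every spider in turn, saving each one's items to <spider>.<fmt>.

    Hypothetical helper: with dry_run=True the commands are only built and
    returned, nothing is executed.
    """
    if fmt not in ("json", "csv"):
        raise ValueError(f"unsupported format: {fmt!r}")
    commands = [build_crawl_command(name, f"{name}.{fmt}") for name in SPIDERS]
    if not dry_run:
        for cmd in commands:
            # check=True raises CalledProcessError if a command exits non-zero.
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    # Print the commands instead of running them.
    for cmd in run_all(dry_run=True):
        print(" ".join(cmd))
```

Calling `run_all("csv")` would run the three spiders sequentially; a subprocess per spider keeps each crawl isolated, at the cost of starting Scrapy three times.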

To save the scraped items to a JSON or CSV file:

$ scrapy crawl <spider name> -o <output file name>.json
$ scrapy crawl <spider name> -o <output file name>.csv
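The exported files can then be loaded for further processing. Below is a minimal sketch using only the standard library; `load_items` is a hypothetical helper, and since this README does not document the item fields, none are assumed. Note that `-o` appends to an existing file, so re-running a spider into the same `.json` file leaves it as invalid JSON; delete the old file first or use `-O` (overwrite, available since Scrapy 2.1).

```python
import csv
import json
from pathlib import Path
from typing import Any, Dict, List


def load_items(path: str) -> List[Dict[str, Any]]:
    """Load items exported by `scrapy crawl <spider name> -o <file>`.

    .json: a single JSON array of objects (value types are preserved).
    .csv:  a header row followed by one row per item (all values are strings).
    """
    p = Path(path)
    suffix = p.suffix.lower()
    if suffix == ".json":
        with p.open(encoding="utf-8") as f:
            return json.load(f)
    if suffix == ".csv":
        # newline="" lets the csv module handle line endings itself.
        with p.open(newline="", encoding="utf-8") as f:
            return list(csv.DictReader(f))
    raise ValueError(f"unsupported file type: {suffix!r}")
```

For example, `items = load_items("market_cap.json")` returns one dictionary per scraped company; inspecting `items[0].keys()` shows which fields the spider actually exports.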
