A Python project that uses the Scrapy framework to scrape information from websites.
This project is homework 5 of the 2022-2023 Ingegneria dei Dati course at Roma Tre University. The web scraper collects information about companies from three websites: CompaniesMarketCap, Value.Today and Disfold.
$ git clone https://github.com/mgranchelli/companies-scraper.git
$ cd companies-scraper
$ pip install -r requirements.txt
The project has three spiders, named market_cap, value_today and disfold, each of which can be run with the following command:
$ scrapy crawl <spider name>
To store the output in a JSON or CSV file:
$ scrapy crawl <spider name> -o <output file name>.json
$ scrapy crawl <spider name> -o <output file name>.csv
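To run all three spiders in one go, the commands above can be wrapped in a small Python script. This is a minimal sketch and not part of the repository: the helper name build_command and the choice to name each output file after its spider are assumptions made for illustration.

```python
import subprocess

# Spider names as listed in this README.
SPIDERS = ["market_cap", "value_today", "disfold"]


def build_command(spider, fmt="json"):
    """Return the `scrapy crawl` command that writes <spider>.<fmt>.

    Hypothetical helper: naming the output file after the spider is an
    arbitrary choice, not something the project prescribes.
    """
    if fmt not in ("json", "csv"):
        raise ValueError(f"unsupported format: {fmt}")
    return ["scrapy", "crawl", spider, "-o", f"{spider}.{fmt}"]


if __name__ == "__main__":
    # Run from the project root (the directory containing scrapy.cfg).
    for spider in SPIDERS:
        # check=True stops the loop if a crawl exits with an error.
        subprocess.run(build_command(spider), check=True)
```

Note that -o appends to an existing file, so rerunning a crawl with the same JSON file name can leave the file invalid; delete the old file first, or use the -O option if the installed Scrapy version supports it.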