Web scraper for extracting information about companies

Python project that uses the Scrapy framework to scrape information from websites.

This project is homework 5 of the 2022-2023 Ingegneria dei Dati (Data Engineering) course at Roma Tre University. The web scraper collects information about companies from three websites: CompaniesMarketCap, Value.Today and Disfold.

Getting Started

$ git clone https://github.com/mgranchelli/companies-scraper.git
$ cd companies-scraper
$ pip install -r requirements.txt

Running the Spider

The project provides three spiders, named market_cap, value_today and disfold. Run one from the project root (the directory containing scrapy.cfg) with:

$ scrapy crawl <spider name>

To save the output to a JSON or CSV file:

$ scrapy crawl <spider name> -o <output file name>.json
$ scrapy crawl <spider name> -o <output file name>.csv
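The exported files can then be loaded for further processing. A minimal sketch using only the standard library, assuming the default Scrapy exporters: the JSON feed is a single array of items, and the CSV feed has a header row. The function name `load_items` is hypothetical. Note that in recent Scrapy versions `-o` appends to an existing file while `-O` overwrites it, so re-running a crawl with `-o` into the same `.json` file can leave two arrays back to back that no longer parse as JSON.

```python
import csv
import json
from pathlib import Path


def load_items(path):
    """Load items exported with `scrapy crawl <spider name> -o <file>`."""
    path = Path(path)
    suffix = path.suffix.lower()
    if suffix == ".json":
        with path.open(encoding="utf-8") as f:
            # The JSON exporter writes one list containing every item
            return json.load(f)
    if suffix == ".csv":
        with path.open(newline="", encoding="utf-8") as f:
            # CSV values are read back as strings; convert numbers yourself
            return list(csv.DictReader(f))
    raise ValueError(f"unsupported feed format: {suffix}")
```

For example, `load_items("output/companies.json")` would return a list of dicts, one per scraped company (the file name here is only an illustration).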
