ramadanmostafa/reuter_spider

reuters spider

This project is a Scrapy spider that reads a list of symbols from a text file, then scrapes reuters.com to get the required information.

Installation

You just need Python 3.6 installed, along with all the packages listed in requirements.txt.

Used Packages

-scrapy: fetches and parses pages.

-sqlite database: stores the scraped data.

How to run

From the project root folder:

cd reuters

Update symbol.txt with the target symbols (one per line).

python run_crawler.py

This command starts the crawler, which:

1- reads all symbols from symbols.txt
2- fetches the HTML for each symbol page (one by one)
3- parses each page to extract the data we are interested in
4- stores each item (parsed data) in the sqlite db
5- writes the item to the output csv file
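The read, store, and write steps above can be sketched with the standard library alone. This is only an illustration, not the project's actual code: the function names, the `items` table, and the `page_length` field are all hypothetical, and the fetching and parsing that Scrapy performs is replaced by a placeholder.

```python
import csv
import sqlite3
from pathlib import Path

# Hypothetical item fields; the real spider extracts Reuters-specific data.
FIELDNAMES = ["symbol", "page_length"]


def read_symbols(path):
    """Return the non-empty, stripped lines of a symbols file (one symbol per line)."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return [line.strip() for line in lines if line.strip()]


def parse_symbol_page(symbol, html):
    """Placeholder parser: a real one would extract fields from the HTML."""
    return {"symbol": symbol, "page_length": len(html)}


def store_items(items, db_path, csv_path):
    """Write each item to an SQLite table and to a CSV file."""
    conn = sqlite3.connect(db_path)
    try:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS items "
            "(symbol TEXT PRIMARY KEY, page_length INTEGER)"
        )
        with open(csv_path, "w", newline="", encoding="utf-8") as handle:
            writer = csv.DictWriter(handle, fieldnames=FIELDNAMES)
            writer.writeheader()
            for item in items:
                # Named placeholders are filled from the item dict.
                conn.execute(
                    "INSERT OR REPLACE INTO items (symbol, page_length) "
                    "VALUES (:symbol, :page_length)",
                    item,
                )
                writer.writerow(item)
        conn.commit()
    finally:
        conn.close()
```

Using the symbol as the primary key with `INSERT OR REPLACE` means a rerun overwrites earlier rows instead of duplicating them; whether the real project does the same is not stated in this README.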

After the crawler finishes, it generates an errors.log file containing the crawler errors and an output.csv file containing the scraped data.
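One way an error-only log file like errors.log could be produced is with Python's standard `logging` module. This is a sketch under assumptions: the logger name and the `build_error_logger` helper are hypothetical, and the project may configure logging differently (for example through Scrapy's own settings).

```python
import logging


def build_error_logger(log_path="errors.log"):
    """Return a logger that writes only ERROR-level (and above) records to log_path."""
    logger = logging.getLogger("reuters_errors_sketch")  # hypothetical name
    logger.setLevel(logging.ERROR)
    logger.propagate = False  # keep records out of the root logger's handlers
    if not logger.handlers:
        # Attach the file handler once, so repeated calls do not duplicate lines.
        handler = logging.FileHandler(log_path, encoding="utf-8")
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(message)s")
        )
        logger.addHandler(handler)
    return logger
```

A caller would then record a failure with something like `build_error_logger().error("failed to parse %s", symbol)`; lower-severity messages are filtered out by the ERROR level.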
