
Reuters spider

This project is a Scrapy spider that reads a list of symbols from a text file, then scrapes reuters.com to collect the required information for each symbol.

Installation

You just need Python 3.6 installed, along with all the packages listed in requirements.txt.

Used Packages

-scrapy: fetches and parses pages.

-sqlite database: stores the scraped data.

How to run

From the project root folder:

cd reuters

Update symbol.txt with the target symbols (one per line).

python run_crawler.py

This command starts the crawler, which:

1- reads all symbols from symbols.txt
2- gets the HTML for each symbol page (one by one)
3- parses each page to extract the data we are interested in
4- stores each item (the parsed data) in the sqlite db
5- writes the item to the output csv file
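The non-network steps above (1, 4 and 5) can be sketched with the standard library alone. This is a minimal illustration, not the project's actual code: the function names, the `items` table and the `symbol`/`name` columns are all assumptions, since the real item fields are not described here.

```python
import csv
import sqlite3
from pathlib import Path


def read_symbols(path):
    """Return the non-empty lines of the symbols file, stripped of whitespace."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return [line.strip() for line in lines if line.strip()]


def store_item(conn, item):
    """Insert one parsed item into a hypothetical `items` table."""
    # Assumed schema: the real project may store different fields.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS items (symbol TEXT PRIMARY KEY, name TEXT)"
    )
    # Re-crawling the same symbol overwrites the previous row.
    conn.execute(
        "INSERT OR REPLACE INTO items (symbol, name) VALUES (?, ?)",
        (item["symbol"], item["name"]),
    )
    conn.commit()


def append_to_csv(path, item, fieldnames=("symbol", "name")):
    """Append one item to the CSV file, writing the header if the file is new."""
    target = Path(path)
    is_new = not target.exists() or target.stat().st_size == 0
    with open(target, "a", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        if is_new:
            writer.writeheader()
        writer.writerow(item)
```

In a Scrapy project, the storage and CSV steps would typically live in an item pipeline; the plain functions here only show the shape of the data flow.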

After the crawler finishes, it generates an errors.log file containing the crawler's errors and an output.csv file containing the scraped data.
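One way an error-only log file like errors.log could be produced is with Python's standard `logging` module. The sketch below is an assumption about the approach, not the project's implementation; the function name `build_error_logger` and the logger name `reuters_errors` are hypothetical.

```python
import logging


def build_error_logger(log_path="errors.log"):
    """Return a logger that appends ERROR-level (and above) records to log_path."""
    logger = logging.getLogger("reuters_errors")  # hypothetical logger name
    logger.setLevel(logging.ERROR)
    logger.propagate = False  # keep these records out of the root logger's handlers
    if not logger.handlers:  # avoid attaching a second handler on repeated calls
        handler = logging.FileHandler(log_path, encoding="utf-8")
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(message)s")
        )
        logger.addHandler(handler)
    return logger
```

A call such as `build_error_logger().error("failed to parse page for %s", symbol)` would then append one line to the log, while info-level messages are dropped.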


ramadanmostafa/reuter_spider
