Scrapy

Overview

This project is a web scraper built with Scrapy, a free, open-source web crawling framework written in Python.

"Any content that can be viewed on a webpage can be scraped. Period."

Purpose

Fake news is a prevalent problem in society today. One way to combat it is a machine-learning-based tool that classifies articles, or parts of articles, as untrue. Machine learning, however, requires a lot of data. By building a robust web scraper, I hope to gather the data needed to develop a dataset for a fake news detector.

Requirements

Python 3.5+
Scrapy

Install

The quick way:

pip install scrapy

See the install section in the documentation at https://docs.scrapy.org/en/latest/intro/install.html for more details.

Problems

The scraper must overcome four distinct threat defense mechanisms:

User agent filtering
Obfuscated JavaScript redirects
CAPTCHAs
Header consistency checks
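
As a rough illustration of the first and last items, a Scrapy project's settings.py can set a browser-like user agent together with headers that match it. This is only a sketch: USER_AGENT and DEFAULT_REQUEST_HEADERS are standard Scrapy settings, but the specific values are assumptions chosen for illustration and are not taken from this repository. It does not address JavaScript redirects or CAPTCHAs.

```python
# settings.py fragment (sketch). USER_AGENT and DEFAULT_REQUEST_HEADERS are
# standard Scrapy settings; the values below are illustrative assumptions.

# Present a browser-like identity instead of Scrapy's default user agent,
# which simple user agent filters can reject.
USER_AGENT = (
    "Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/115.0"
)

# Send headers that are plausible for the browser named above, so that the
# request does not look inconsistent to header consistency checks.
DEFAULT_REQUEST_HEADERS = {
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.5",
}
```

The user agent is kept in USER_AGENT rather than in the headers dictionary so that there is a single source for it. Whether these values are enough depends on the target site.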

