Skip to content

malvanos/simpleCrawler

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

6 Commits
 
 
 
 
 
 
 
 

Repository files navigation

simpleCrawler

Simple python script that uses the bs4 package to extract the text from the body of a webpage without html tags or Javascript.

Requirements

Install the following packages:

pip install requests pip install bs4

For the Selenium crawler you need also the selenium package:

pip install selenium

You have also to download the driver from here and unzip it in a directory named driver:

http://chromedriver.storage.googleapis.com/index.html

Instructions

Run python simple.py www.google.com to test it.

Run python simpleSelenium.py www.google.com to test it with the google chrome driver.

About

A simple crawler without Javascript support.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages