An awful search engine and crawler
bin
include
lib/python2.7
spider
static
templates
tests
.gitignore
application.py
data_structure
database.py
helpers.py
readme.md
routes.py
settings.py
stopwords.py
unit_tests.py


Toastie

About

Toastie is a basic web spider written in Python. It crawls web pages and stores their content in a MongoDB database. The website's front end lets the user search the pages that have been crawled.
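
As a rough illustration of the crawl-then-search idea described above, here is a minimal sketch, not Toastie's actual code. It assumes Python 3 and the standard library only, uses an in-memory dictionary in place of MongoDB, and skips the network fetch entirely. The names `TextExtractor`, `tokenize`, `index_page`, `search`, and the short `STOPWORDS` set are all hypothetical.

```python
from collections import defaultdict
from html.parser import HTMLParser
import re

# Hypothetical, tiny stopword list (the project keeps its own in stopwords.py).
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "is"}


class TextExtractor(HTMLParser):
    """Collects visible text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def tokenize(html):
    """Return the lowercase words of a page, minus stopwords."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    words = re.findall(r"[a-z0-9]+", " ".join(parser.parts).lower())
    return [w for w in words if w not in STOPWORDS]


def index_page(index, url, html):
    """Record, for every word on the page, that it occurs at this URL."""
    for word in set(tokenize(html)):
        index[word].add(url)


def search(index, query):
    """Return the URLs that contain every non-stopword term in the query."""
    terms = [w for w in re.findall(r"[a-z0-9]+", query.lower())
             if w not in STOPWORDS]
    if not terms:
        return set()
    results = set(index.get(terms[0], set()))
    for term in terms[1:]:
        results &= index.get(term, set())
    return results


# Stand-in for the database: word -> set of URLs containing it.
index = defaultdict(set)
index_page(index, "http://example.com/a", "<h1>Toast and jam</h1>")
index_page(index, "http://example.com/b", "<p>Toast with butter</p>")
print(search(index, "toast jam"))
```

In a real deployment the dictionary would presumably be replaced by a MongoDB collection accessed through pymongo, and the HTML would come from an HTTP fetch. Both are left out here so the sketch runs offline.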

Technologies

Python, Flask, Werkzeug, Jinja, Requests, urlparse, re, robotparser, pymongo, MongoDB, Twitter Bootstrap
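
To give a feel for how the URL-handling and robots.txt pieces in this list might fit together in a spider, here is a hedged sketch. It assumes Python 3, where the Python 2 modules `urlparse` and `robotparser` live at `urllib.parse` and `urllib.robotparser`. The helper names `make_robot_rules` and `next_urls` and the user agent `ToastieBot` are hypothetical and not taken from the project.

```python
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.robotparser import RobotFileParser


def make_robot_rules(robots_txt):
    """Build a rule set from the text of a robots.txt file (no network access)."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    return rules


def next_urls(page_url, hrefs, rules, user_agent="ToastieBot"):
    """Turn raw href values into absolute URLs that are safe to crawl next.

    Only same-host http(s) links are kept, because `rules` describes
    a single host's robots.txt.
    """
    host = urlparse(page_url).netloc
    seen = set()
    allowed = []
    for href in hrefs:
        # Resolve relative links and drop any #fragment.
        absolute, _fragment = urldefrag(urljoin(page_url, href))
        parts = urlparse(absolute)
        if parts.scheme not in ("http", "https") or parts.netloc != host:
            continue
        if absolute in seen or not rules.can_fetch(user_agent, absolute):
            continue
        seen.add(absolute)
        allowed.append(absolute)
    return allowed


rules = make_robot_rules("User-agent: *\nDisallow: /private/\n")
links = ["page2.html", "/private/secret.html", "mailto:me@example.com"]
print(next_urls("http://example.com/docs/index.html", links, rules))
```

Restricting the crawl to one host keeps the robots.txt check honest. A crawler that follows links across hosts would need to fetch and cache a separate rule set per host.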

Installation

Install Python 2.7, pip, MongoDB, and the Python-MongoDB connector
sudo pip install virtualenv
git clone http://www.github.com/gmiller2007/Py-Webscraper
cd Py-Webscraper
source bin/activate
bin/pip install flask
bin/pip install pymongo
bin/pip install BeautifulSoup
deactivate

Run the Application

source bin/activate
bin/python2.7 application.py

Notes

To exit the virtual environment safely, run the command 'deactivate'.