Python-web-crawler

This is a Python based web crawler that uses Pagerank to rank search results. Build it using the foundation Python course on Udacity.

The inputs needed are :

Seed Page - This is the page, from which it starts crawling the web.

Search term - The term you want to search. Soon, I will add support for querying of multiple words, but for now, give a single word

eg:is

Maximum depth - This is the number of links to crawl completely. It would take 30 second for first link and for second link 60 seconds and it keeps on doubling. So, maximum 10 links are more than enough, I would say

eg:10

After you give the above three inputs, the program starts running. It may take a lot of time, to crawl depending on the depth you have specified. So the depth number, will be visible, decrementing itself, when ever a link is completely crawled; so that when it reaches 0 you know the crawling ended.

Also, I used the page rank algorithm (compute_ranks module). The page ranks are displayed alongside the links after the search results are shown. Then the program sorts them, and presents the sorted results.

Name		Name	Last commit message	Last commit date
Latest commit History 2 Commits
README.md		README.md
crawler.py		crawler.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Python-web-crawler

About

Uh oh!

Releases

Packages

Languages

rishabhjain30/Python-web-crawler

Folders and files

Latest commit

History

Repository files navigation

Python-web-crawler

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages