sciurus/webster

OVERVIEW

This is the code I used to produce my master's thesis. It includes scripts that prepare a focused web crawl, run it, and analyze the results. As you might expect of an academic project, I prioritized learning as I went and getting results quickly over solid design and clean code. If you want to develop your own focused web crawler, there are much better bases to build on than this (e.g., Pattern or Scrapy). Still, since I learned a lot from code others shared with me, I wanted to continue that spirit by releasing this.

THESIS ABSTRACT

Websites are an increasingly important part of how people seek out political information. Many political scientists believe that the web provides diverse new sources of information. However, some research suggests that the structure of the web promotes a winners-take-all pattern wherein a few sites on any topic receive the bulk of the attention. By building software to analyze the links between webpages on a topic, I am able to demonstrate the latter pattern on four political issues. When analyzing the top sites on each issue, I find that traditional sources of information are outnumbered by web-only sources with formats that allow user participation.

About

Focused Web Crawler
