Spiders and Scrapers in Prolog (8/16/2016)
This project evolved out of my attempts to automate data collection for various projects I am working on. After surveying a number of other languages, I settled on Prolog as an excellent choice for writing a flexible codebase that could be reused for many different purposes. What I had not realized was how simple many web programming tasks become when you use the excellent tools provided by the SWI-Prolog community. This tutorial will show you how to take advantage of them.
This project is broken up into three parts. Part 1 will guide you through the process of writing a web scraper, a tool used to extract data from an HTML document. Data extraction is a key element in virtually any web programming project; Prolog makes it very easy by letting you query a parsed page much as you would query a database. Part 2 will put the web scraper built in Part 1 to good use by building a site crawler that verifies outbound links and reports which ones are broken. Part 3 will build upon the crawler developed in Part 2 and incorporate advanced Prolog features such as threads, session management, error handling, and manipulation of binary file types.
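To give a feel for what "querying a page like a database" can look like, here is a minimal sketch, assuming SWI-Prolog with its bundled library(sgml) and library(xpath). The predicate names page_link/2 and demo_links/1 and the inline HTML snippet are made up for illustration; they are not the code developed later in this tutorial.

```prolog
:- use_module(library(sgml)).    % load_html/3
:- use_module(library(xpath)).   % xpath/3

% page_link(+DOM, -Href)
% Hypothetical helper: on backtracking, yields the href attribute
% of every <a> element found anywhere in the parsed page.
page_link(DOM, Href) :-
    xpath(DOM, //a(@href), Href).

% demo_links(-Links)
% Parses a small in-memory HTML string (illustrative only) and
% collects all of its link targets into a list.
demo_links(Links) :-
    HTML = "<html><body><a href=\"a.html\">A</a> <a href=\"b.html\">B</a></body></html>",
    setup_call_cleanup(
        open_string(HTML, In),
        load_html(stream(In), DOM, []),
        close(In)),
    findall(Href, page_link(DOM, Href), Links).
```

Calling demo_links(Links) at the toplevel should bind Links to the href values found in the snippet. The point is that page_link/2 behaves like any other Prolog relation: you can enumerate its solutions, filter them with further goals, or collect them with findall/3.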
** IN PROGRESS **