
Spiders and Scrapers in Prolog (8/16/2016)

This project evolved out of my attempts to automate data collection for various projects I am working on. After surveying numerous other languages, Prolog looked like an excellent choice for writing a flexible codebase that could be reused for many different purposes. I had not realized how simple many web programming tasks become when you use the excellent tools provided by the SWI-Prolog community. This tutorial will show you how to take advantage of them.

This project is broken up into three parts. Part 1 will guide you through the process of writing a web scraper -- a tool used to extract data from an HTML document. Data extraction is a key element of virtually any web programming project; Prolog makes it very easy by providing a database-like interface to any page (a minimal sketch follows below). Part 2 will put the web scraper built in Part 1 to good use by building a site crawler that verifies outbound links and reports which ones are broken. Part 3 will build upon the crawler developed in Part 2 and incorporate advanced Prolog features such as threads, session management, error handling, and manipulation of binary file types.
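To make this concrete, here is a minimal sketch of what the Part 1 scraper and a first step toward the Part 2 link check can look like using SWI-Prolog's library(http/http_open), library(sgml), and library(xpath). The predicate names page_dom/2, page_link/2, and link_status/2 are illustrative placeholders, not code from this repository:

```prolog
:- use_module(library(http/http_open)).  % HTTP client
:- use_module(library(sgml)).            % HTML parsing (load_html/3)
:- use_module(library(xpath)).           % XPath-style DOM queries

%% page_dom(+URL, -DOM)
%  Fetch a page over HTTP and parse it into a structured DOM term.
page_dom(URL, DOM) :-
    setup_call_cleanup(
        http_open(URL, In, []),
        load_html(In, DOM, []),
        close(In)).

%% page_link(+URL, -Href)
%  The "database-like interface": each solution on backtracking binds
%  Href to the href attribute of one <a> element on the page.
page_link(URL, Href) :-
    page_dom(URL, DOM),
    xpath(DOM, //a(@href), Href).

%% link_status(+URL, -Status)
%  Toward the Part 2 link checker: the status_code(-Code) option makes
%  http_open/3 return non-2xx codes instead of throwing an exception.
link_status(URL, Status) :-
    setup_call_cleanup(
        http_open(URL, In, [method(head), status_code(Status)]),
        true,
        close(In)).
```

At the top level, `?- page_link('http://example.com/', L).` enumerates every link on the page on backtracking, and `?- link_status('http://example.com/', S).` reports the HTTP status of a single link.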

** IN PROGRESS **

About

A web scraper and data cleaner implemented in Prolog.
