rryley/prolog_web_programming: Prolog-implemented web scraper and data cleaner

Spiders and Scrapers in Prolog (8/16/2016)

This project evolved out of my attempts to automate data collection for various projects I am working on. After surveying numerous other languages, I found Prolog to be an excellent choice for writing a flexible codebase that can be reused for many different purposes. I had not realized how simple many web programming tasks become with the excellent tools provided by the SWI-Prolog community. This tutorial will show you how to take advantage of them.

This project is broken up into three parts. Part 1 guides you through writing a web scraper -- a tool that extracts data from an HTML document. Data extraction is a key element of virtually any web programming project; Prolog makes it easy because a parsed page can be queried much like a database. Part 2 puts the web scraper built in Part 1 to good use by building a site crawler that verifies outbound links and reports which ones are broken. Part 3 builds on the crawler developed in Part 2 and incorporates advanced Prolog features such as threads, session management, error handling, and manipulation of binary file types.
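
To give a rough feel for the database-like querying described above, here is a minimal sketch. It is not taken from this repository's `src` directory; it assumes SWI-Prolog with its bundled `library(sgml)` and `library(xpath)`, and the predicate name `page_links/2` is hypothetical.

```prolog
:- use_module(library(sgml)).    % load_html/3: parse HTML into a DOM term
:- use_module(library(xpath)).   % xpath/3: query the DOM by path

% page_links(+Html, -Links)
% Hypothetical helper: Links is the list of href attribute values
% found on <a> elements in the HTML text Html.
page_links(Html, Links) :-
    setup_call_cleanup(
        open_string(Html, In),            % read the text as a stream
        load_html(stream(In), DOM, []),   % parse it into a DOM term
        close(In)),
    % xpath/3 yields one matching href per solution on backtracking,
    % so findall/3 collects them the way a database query would.
    findall(Href, xpath(DOM, //a(@href), Href), Links).
```

A query such as `?- page_links("<html><body><a href='/b'>B</a></body></html>", Links).` collects the link targets of the page. Fetching a live page instead of a string would additionally need something like `http_open/3` from `library(http/http_open)`.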

** IN PROGRESS **

