Examples from my book "Scripting Intelligence: Web 3.0 Information Gathering and Processing"

appendices
part1
part2
part3
part4
.gitignore
README.txt
README.txt
Dear Reader,

Thank you for purchasing my book:

"Scripting Intelligence: Web 3.0 Information Gathering and Processing" (Apress, 2009)

This directory contains the source code and data for the examples in my book. Even if you didn't buy the book, hopefully you will still find the example code useful.

When I wrote the examples for this book, I created an Amazon EC2 AMI with the examples installed and running (as described in Appendix A). This AMI is now very out of date, and I suggest that you not try to use it.

I also suggest that you skip the old Rails demo programs in Part 4 of the book for reasons that I document on the errata web page for this book (http://markwatson.com/books/web3_book/).

The code in Parts 1, 2, and 3 of the book, while old, should still be relevant and useful. Almost all of the code is Ruby (with some Java Hadoop example code) and includes useful utilities for Natural Language Processing (NLP), the Semantic Web, accessing both relational and NoSQL-type data stores, etc.

There are subdirectories for each part of my book. I did not separate the examples into directories for individual chapters because sometimes examples for different chapters in a book part share libraries and data.

Each subdirectory also contains a README.txt file. Many of the examples require other software (usually open source) to run; these dependencies, with download links, are listed in the book. The README.txt file in each book part subdirectory explains how to run the examples, in the same order as the material appears in the book.

Here is a summary of the table of contents for the book:

PART 1 Text Processing: Natural Language Processing, Parsing Common Document Types, Cleaning, Segmenting, and Spell-Checking Text

PART 2 The Semantic Web: Using RDF and RDFS Data Formats, Delving Into RDF Data Stores, Performing SPARQL Queries and Understanding Reasoning, Implementing SPARQL Endpoint Web Portals

PART 3 Information Gathering and Storage: Relational Databases, Indexing and Search, Using Web Scraping to Create Semantic Relations, Strategies for Large-Scale Data Storage

PART 4 Information Publishing: Creating Web Mashups, Performing Large-Scale Data Processing, Building Information Web Portals
  
Best regards,
Mark Watson