Gleaner is a tool for extracting JSON-LD from web pages. You provide Gleaner with a list of sites to index, and it retrieves the pages listed in each domain's sitemap.xml. Gleaner can then check the documents for well-formed and valid structure and process the JSON-LD data graphs into a form that can drive a search interface.
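As a rough illustration of the sitemap step, the following Python sketch pulls the URLs out of a sitemap.xml document using only the standard library. It is not Gleaner's own code. The function name `urls_from_sitemap` and the example.org URLs are hypothetical, and fetching the sitemap over HTTP is left out so the example runs offline.

```python
import xml.etree.ElementTree as ET

# XML namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def urls_from_sitemap(xml_text: str) -> list[str]:
    """Return every <loc> URL in a sitemap document (hypothetical helper).

    For a <urlset> these are page URLs; for a <sitemapindex> they are the
    URLs of child sitemaps, which a crawler would fetch and parse in turn.
    """
    root = ET.fromstring(xml_text)
    return [
        loc.text.strip()
        for loc in root.findall(".//sm:loc", SITEMAP_NS)
        if loc.text and loc.text.strip()
    ]


if __name__ == "__main__":
    # Hypothetical sitemap for an example domain.
    sample = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.org/dataset/1</loc></url>
  <url><loc>https://example.org/dataset/2</loc></url>
</urlset>"""
    print(urls_from_sitemap(sample))
    # ['https://example.org/dataset/1', 'https://example.org/dataset/2']
```

Each returned URL would then be fetched and searched for embedded JSON-LD.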
GleanerIO
A set of projects implementing principles for indexing structured data on the web / schema.org (developed as part of NSF's EarthCube)
Repositories
- scienceonschemaexamples
This repository will contain actual science on schema JSON-LD files, along with HTML pages that embed JSON-LD scripts, to document possible test cases.
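HTML pages like the test cases described above carry their JSON-LD in `<script type="application/ld+json">` elements. A minimal sketch, assuming Python's standard `html.parser` and `json` modules, of how such a page might be checked: it collects each JSON-LD block and reports which ones are not well-formed JSON. The names `JSONLDScriptParser` and `extract_jsonld` and the sample page are hypothetical. This only checks JSON syntax, not JSON-LD semantics or schema.org conformance, so it is not a substitute for the structural validation Gleaner performs.

```python
import json
from html.parser import HTMLParser


class JSONLDScriptParser(HTMLParser):
    """Collect the raw text of every <script type="application/ld+json"> element."""

    def __init__(self) -> None:
        super().__init__()
        self._in_jsonld = False
        self._buffer: list[str] = []
        self.blocks: list[str] = []

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lowercased; attribute values may be None.
        if tag == "script":
            script_type = (dict(attrs).get("type") or "").strip().lower()
            if script_type == "application/ld+json":
                self._in_jsonld = True
                self._buffer = []

    def handle_data(self, data):
        # Script bodies are passed through as raw text, unparsed.
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.blocks.append("".join(self._buffer))
            self._in_jsonld = False


def extract_jsonld(html: str) -> tuple[list[object], list[str]]:
    """Return (parsed JSON-LD documents, error messages for malformed blocks)."""
    parser = JSONLDScriptParser()
    parser.feed(html)
    parser.close()
    parsed: list[object] = []
    errors: list[str] = []
    for i, raw in enumerate(parser.blocks):
        try:
            parsed.append(json.loads(raw))
        except json.JSONDecodeError as exc:
            errors.append(f"block {i}: {exc.msg} (line {exc.lineno}, column {exc.colno})")
    return parsed, errors


if __name__ == "__main__":
    # Hypothetical test page: one valid block, one with a trailing comma.
    page = """<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org/", "@type": "Dataset", "name": "Example"}
</script>
<script type="application/ld+json">{"@type": "Dataset",}</script>
</head><body></body></html>"""
    docs, problems = extract_jsonld(page)
    print(docs[0]["@type"])  # Dataset
    print(len(problems))     # 1
```

Using the standard-library parser keeps the sketch dependency-free. A static HTML parse like this cannot see JSON-LD that a page injects with JavaScript at load time.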