SDARTS is a protocol for metasearching over document collections. You may consider using SDARTS if:
- You want to search one or more local text or XML collections through a single search interface.
- You want to search remote document collections that export their metadata under the Open Archives protocol.
- You want to search multiple web-based document collections through a single search interface.
SDARTS was developed as part of PERSIVAL (an NSF Digital Library Initiative--Phase 2 project) at the Computer Science Department of Columbia University.
SDARTS is a hybrid of two previously existing protocols, STARTS and SDLIP. SDARTS is essentially an instantiation of the SDLIP protocol with a richer set of metadata, which can be effectively used for building sophisticated metasearchers. SDARTS makes a wide variety of collections with heterogeneous interfaces accessible under one uniform interface.
The SDARTS toolkit provides ready-to-use, configurable wrappers. They can be used directly for wrapping locally available text and XML collections, and for wrapping web-accessible databases.
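The idea of wrapping heterogeneous collections behind one uniform interface can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `CollectionWrapper`, `InMemoryTextWrapper`, `Hit`, and `metasearch` names are hypothetical and are not part of the SDARTS toolkit, and the term-count scoring is a stand-in for whatever ranking a real collection provides.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Hit:
    """One search result, tagged with the collection it came from."""
    collection: str
    doc_id: str
    score: float


class CollectionWrapper(ABC):
    """Hypothetical uniform interface; NOT the actual SDARTS wrapper API."""

    def __init__(self, name: str) -> None:
        self.name = name

    @abstractmethod
    def search(self, query: str) -> List[Hit]:
        """Run the query against the underlying collection."""


class InMemoryTextWrapper(CollectionWrapper):
    """Toy wrapper over a dict of doc_id -> plain text (illustration only)."""

    def __init__(self, name: str, docs: Dict[str, str]) -> None:
        super().__init__(name)
        self.docs = docs

    def search(self, query: str) -> List[Hit]:
        terms = query.lower().split()
        hits = []
        for doc_id, text in self.docs.items():
            words = text.lower().split()
            # Score = total number of query-term occurrences in the document.
            score = sum(words.count(term) for term in terms)
            if score > 0:
                hits.append(Hit(self.name, doc_id, float(score)))
        return hits


def metasearch(wrappers: List[CollectionWrapper], query: str) -> List[Hit]:
    """Send the same query to every wrapper and merge hits by score."""
    merged: List[Hit] = []
    for wrapper in wrappers:
        merged.extend(wrapper.search(query))
    return sorted(merged, key=lambda hit: hit.score, reverse=True)
```

A caller only ever talks to `metasearch`; adding a new kind of collection means writing one more subclass, which is the design benefit a wrapper-based toolkit aims for.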
The SDARTS toolkit also contains two optional sets of applications. The OAI SDARTS Cooperative Suite makes SDARTS OAI-compliant and enables SDARTS to access OAI-compliant collections. The SDARTS Automatic Content Summary Extraction for remote web databases extracts statistics about the vocabulary and word frequencies of web databases to which SDARTS does not have direct access.
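To make "statistics about the vocabulary and the word frequencies" concrete, here is a minimal sketch of what a content summary might contain. It assumes the documents have already been retrieved somehow; it does not reproduce how SDARTS actually samples a remote database, and the `build_content_summary` function and its output layout are hypothetical.

```python
import re
from collections import Counter
from typing import Dict, List


def build_content_summary(documents: List[str]) -> Dict[str, object]:
    """Count, for each word, the number of sampled documents containing it.

    Illustrative only: the tokenization and the summary layout are
    assumptions, not the format used by the SDARTS toolkit.
    """
    doc_freq: Counter = Counter()
    for text in documents:
        # Use a set so each word counts at most once per document.
        words = set(re.findall(r"[a-z0-9]+", text.lower()))
        doc_freq.update(words)
    return {"num_docs": len(documents), "doc_freq": dict(doc_freq)}


if __name__ == "__main__":
    sample = ["Heart disease and heart surgery", "Surgery recovery"]
    summary = build_content_summary(sample)
    print(summary["num_docs"], summary["doc_freq"]["surgery"])
```

A metasearcher could compare such summaries across databases to decide which ones are worth querying for a given topic.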
- SDARTS Server
- SDARTS Web Client
- SDARTS Web Client (with Collection Selection)
- DBSelection module
- SDARTS Automatic Content Summary Extraction
- OAI-SDARTS Suite
Source code documentation
Wrapper Configuration
SDARTS supports three types of collections, each with its own wrapper: the text "doc" wrapper for local plain-text documents, the XML "doc" wrapper for local XML documents, and the "www" wrapper for remote web-based collections fronted by a CGI-based search engine.
- SDARTS Server 3.0 (last updated on: Apr 2004)
- SDARTS SOAP API (last updated on: Apr 2004)
- Optional Components:
- OAI Harvester 2.0 for SDARTS (last updated on: May 2003)
- SDARTS Indexer and Database Selection (last updated on: May 2004)
- SDARTS Web Client 3.0 (last updated on: April 2004)
- SOAP API client Java classes (last updated on: April 2004)
(Note: These are the document collections themselves; the wrapping files are in the distribution)
- 20groups (.tar.gz) (.zip) (2,000 newsgroup articles; free text with structured headers)
- Aides (.tar.gz) (.zip) (66 XML documents)
The SDARTS API provides a way to query the collections indexed by an SDARTS server directly from within an application. The API is a web service over SOAP, and a WSDL description of the service is provided, so developers can call it from the language of their choice.
To use the SDARTS API, developers can either:
- Download the WSDL description of the service (and use, for example, SOAP::Lite for Perl, Visual Studio .NET, or any other language that supports web services), or
- Download the necessary proxy files for Java from the "Download" section.
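For readers unfamiliar with SOAP, the sketch below shows roughly what a request to a SOAP web service looks like on the wire, built with Python's standard library only. The operation name `search`, the `query` parameter, and the `urn:example:sdarts` namespace are placeholders invented for illustration; the real operation names and namespaces are defined by the SDARTS WSDL and should be taken from there. The code only builds the XML and does not contact any server.

```python
import xml.etree.ElementTree as ET
from typing import Dict

# Standard SOAP 1.1 envelope namespace.
SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"
# Hypothetical service namespace; the real one comes from the WSDL.
SERVICE_NS = "urn:example:sdarts"


def build_soap_request(operation: str, params: Dict[str, str]) -> str:
    """Serialize a SOAP 1.1 envelope wrapping a single operation call."""
    ET.register_namespace("soapenv", SOAP_ENV)
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    call = ET.SubElement(body, f"{{{SERVICE_NS}}}{operation}")
    for name, value in params.items():
        # Each parameter becomes a child element of the operation element.
        ET.SubElement(call, name).text = str(value)
    return ET.tostring(envelope, encoding="unicode")


if __name__ == "__main__":
    # "search" and "query" are placeholder names, not the real SDARTS API.
    print(build_soap_request("search", {"query": "heart surgery"}))
```

In practice a WSDL-aware toolkit generates this plumbing automatically, which is why the options above start from the WSDL or from pre-built proxy classes.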
- Distributed Search over the Hidden-Web: Hierarchical Database Sampling and Selection, Panagiotis G. Ipeirotis and Luis Gravano, in Proceedings of the 28th International Conference on Very Large Data Bases (VLDB 2002), 2002.
- Extending SDARTS: Extracting Metadata from Web Databases and Interfacing with the Open Archives Initiative, Panagiotis G. Ipeirotis, Tom Barry, and Luis Gravano, in Proceedings of the Second ACM+IEEE Joint Conference on Digital Libraries (JCDL 2002)
- SDLIP + STARTS = SDARTS: A Protocol and Toolkit for Metasearching, Noah Green, Panagiotis G. Ipeirotis, and Luis Gravano, in Proceedings of the First ACM+IEEE Joint Conference on Digital Libraries (JCDL 2001)
- Distributed Search over the Hidden-Web: Hierarchical Database Sampling and Selection (.ppt) (VLDB 2002)
- Extending SDARTS: Extracting Metadata from Web Databases and Interfacing with Open Archives Initiative (.ppt) (JCDL 2002)
- SDLIP + STARTS = SDARTS: Protocol and Toolkit for Metasearching (.ppt) (JCDL 2001)
- SDARTS - A Metasearching Protocol and Architecture for Digital Libraries (.ppt) (internal DLI2-PERSIVAL meeting at Columbia)