Ripple configuration properties

joshsh edited this page Mar 18, 2013 · 24 revisions

There are a number of aspects of Ripple's behavior, look and feel which can be customized using configuration properties. Ripple comes with a set of default properties, any or all of which may be overridden with user-defined properties: if you do not specify a value for a given property, Ripple falls back on the default value. Configuration properties may be provided at the Java level:

Properties props = ...

or passed in to the startup script:

./ my-config.props

See also Running Ripple.
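The fallback behavior described above can be illustrated with plain `java.util.Properties`. This is a minimal sketch, not Ripple's own loading code: the class `ConfigOverrideSketch` and the method `withOverrides` are hypothetical names, and the two default values are copied from the sections below.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical helper, not part of Ripple: shows how user-defined
// properties can override defaults while unspecified keys fall back.
final class ConfigOverrideSketch {

    // Returns a Properties object whose lookups fall back on `defaults`
    // for any key that `userConfigText` does not define.
    static Properties withOverrides(Properties defaults, String userConfigText) {
        Properties props = new Properties(defaults);
        try {
            props.load(new StringReader(userConfigText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        Properties defaults = new Properties();
        defaults.setProperty("net.fortytwo.ripple.cli.bufferQueryResults", "false");
        defaults.setProperty("net.fortytwo.ripple.cli.resultViewMaxObjects", "32");

        // Same key = value syntax as the examples on this page.
        Properties props = withOverrides(defaults,
                "net.fortytwo.ripple.cli.resultViewMaxObjects = 100\n");

        // Overridden by the user-defined text: prints 100
        System.out.println(props.getProperty("net.fortytwo.ripple.cli.resultViewMaxObjects"));
        // Not overridden, so the default applies: prints false
        System.out.println(props.getProperty("net.fortytwo.ripple.cli.bufferQueryResults"));
    }
}
```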

Demo application properties

Ripple comes with a "Demo" program, which is what you're using when you execute the startup script. The following properties allow you to customize the data store to which the demo connects.

The net.fortytwo.ripple.demo.sailType property defines the Sail implementation which Ripple uses as its RDF triple store. The default is LinkedDataSail. When using LinkedDataSail, you may also specify an alternate "base Sail" to use as its caching layer. The default is MemoryStore, a fast (but small-scale) in-memory triple store:

net.fortytwo.ripple.demo.sailType = net.fortytwo.linkeddata.sail.LinkedDataSail
net.fortytwo.ripple.demo.linkedDataSailBaseSail = org.openrdf.sail.memory.MemoryStore

You can save cached Linked Data between sessions using MemoryStore's persistence file:

net.fortytwo.ripple.demo.sailType = net.fortytwo.linkeddata.sail.LinkedDataSail
net.fortytwo.ripple.demo.linkedDataSailBaseSail = org.openrdf.sail.memory.MemoryStore
net.fortytwo.ripple.demo.memoryStorePersistFile = /Users/lebot/Desktop/ripple-save-file

This will cause the cache to be read in from the persistence file (if it exists) when Ripple initializes, and written back to the file when Ripple shuts down.

For a more dynamic cache which can also scale to hundreds of millions of triples, use NativeStore as the base Sail instead of MemoryStore. NativeStore is served from a configurable directory:

net.fortytwo.ripple.demo.sailType = net.fortytwo.linkeddata.sail.LinkedDataSail
net.fortytwo.ripple.demo.linkedDataSailBaseSail = org.openrdf.sail.nativerdf.NativeStore
net.fortytwo.ripple.demo.nativeStoreDirectory = /Users/lebot/Desktop/ripple-store 

You can also connect Ripple to NativeStore directly, without the Linked Data layer (for example, if you have pre-loaded a static dataset which you would like to explore with Ripple):

net.fortytwo.ripple.demo.sailType = org.openrdf.sail.nativerdf.NativeStore
net.fortytwo.ripple.demo.nativeStoreDirectory = /Users/lebot/Desktop/ripple-store 

Even higher-capacity triple stores, such as AllegroGraph and Blueprints Sail, may be used with Ripple, although they are not built into the demo. To use any other Sail implementation with Ripple, look at the Demo program's source and modify it to use the Sail of your choice, or see the "Embedded Ripple" section in Running Ripple.

To connect Ripple to a remote SPARQL endpoint, use SPARQLRepositorySail:

net.fortytwo.ripple.demo.sailType = net.fortytwo.ripple.config.SPARQLRepositorySail
net.fortytwo.ripple.demo.sparqlEndpointUrl =

Command-line properties

These properties allow you to customize Ripple's interactive command-line environment.

net.fortytwo.ripple.cli.bufferQueryResults (a boolean value which defaults to false) tells Ripple whether to wait until all solutions to a query have been found, before displaying them. Otherwise, Ripple will stream solutions into the terminal as soon as they are computed (recommended for the impatient).

net.fortytwo.ripple.cli.bufferQueryResults = false

net.fortytwo.ripple.cli.resultViewShowEdges (a boolean value which defaults to true) tells Ripple to display RDF statements about each query result (specifically, about the resource at the top of each stack). This is helpful when exploring new data, although it makes query results more verbose.

net.fortytwo.ripple.cli.resultViewShowEdges = true

net.fortytwo.ripple.cli.resultViewMaxObjects (an integer value which defaults to 32) places an upper limit on the number of objects Ripple will display for a given query result and predicate. For example, if resultViewShowEdges is enabled (see above) and the query result is a person with 100 friends, Ripple will save space by showing you only 32 of the friends, indicating additional friends with an ellipsis ("..."). Of course, if you're interested in the friends specifically, you can add a foaf:knows. to the end of your query and see them all. A value of -1 for this property removes the limit entirely.

net.fortytwo.ripple.cli.resultViewMaxObjects = 32
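The truncation rule can be sketched as follows. This is only an illustration of the behavior described above, not Ripple's rendering code: `MaxObjectsSketch` and `limitForDisplay` are hypothetical names, and treating every negative value (not just -1) as "no limit" is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of the resultViewMaxObjects rule.
final class MaxObjectsSketch {

    // Returns at most `maxObjects` items, followed by "..." when items
    // were cut off. A negative limit is assumed to mean "no limit".
    static List<String> limitForDisplay(List<String> objects, int maxObjects) {
        if (maxObjects < 0 || objects.size() <= maxObjects) {
            return new ArrayList<>(objects);
        }
        List<String> shown = new ArrayList<>(objects.subList(0, maxObjects));
        shown.add("...");
        return shown;
    }
}
```

With 100 friends and a limit of 32, the result holds 32 names plus the ellipsis marker; with a limit of -1, all 100 names are returned.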

net.fortytwo.ripple.cli.resultViewMaxPredicates (an integer value which defaults to -1) places an upper limit on the number of predicates Ripple will display for a given query result. For example, if the query result is a place which has a latitude and a longitude and an elevation and a category and a country code and... and you want to limit how many of these predicates can be displayed at a time, then use this property. The default is no limit.

net.fortytwo.ripple.cli.resultViewMaxPredicates = -1

net.fortytwo.ripple.cli.resultViewDeduplicateObjects (a boolean value which defaults to true) removes duplicate objects in result views. For example, if the query result is a person whose name is "Bob", and that fact has been asserted in multiple contexts (e.g. in multiple RDF documents), Ripple will only show the value "Bob" once. This frequently occurs in Linked Data, so eliminating duplicates makes query results less confusing.

net.fortytwo.ripple.cli.resultViewDeduplicateObjects = true
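As a rough sketch of the idea (not Ripple's implementation), duplicate removal that keeps the first occurrence of each value can be written with a `LinkedHashSet`. The class and method names here are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;

// Hypothetical illustration of de-duplicating displayed objects.
final class DeduplicateSketch {

    // Drops repeated values while preserving the order of first appearance.
    static List<String> deduplicate(List<String> objects) {
        return new ArrayList<>(new LinkedHashSet<>(objects));
    }
}
```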

net.fortytwo.ripple.cli.resultViewPrintEntireStack (a boolean value which defaults to true) tells Ripple to display the whole solution (where solutions in Ripple are stacks) instead of only the topmost item on the stack. Solution stacks with more than one item are similar to multiple return values in other programming paradigms, and are not needed by all applications.

net.fortytwo.ripple.cli.resultViewPrintEntireStack = true

Program control properties

These properties deal with multithreading in Ripple.

net.fortytwo.ripple.control.asynchronousQueries (a boolean value which defaults to true) allows Ripple to spawn new threads when waiting for I/O operations such as HTTP requests. This makes Linked Data crawling much faster.

net.fortytwo.ripple.control.asynchronousQueries = true

net.fortytwo.ripple.control.maxWorkerThreads (an integer value which defaults to 30) defines the maximum number of threads which Ripple may spawn. The more threads you allow, the more concurrent paths of execution Ripple may explore.

net.fortytwo.ripple.control.maxWorkerThreads = 30
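To make the interplay of the two settings concrete, here is a minimal sketch using a bounded thread pool from `java.util.concurrent`. It is not how Ripple schedules work internally: `WorkerPoolSketch`, `runAll`, and its parameters are hypothetical names chosen to mirror the two properties.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Supplier;

// Hypothetical illustration of asynchronousQueries and maxWorkerThreads.
final class WorkerPoolSketch {

    // Runs every task and returns the results in task order. When
    // `asynchronous` is false the tasks run one after another on the
    // calling thread; otherwise at most `maxWorkerThreads` run at once.
    static <T> List<T> runAll(List<Supplier<T>> tasks, boolean asynchronous, int maxWorkerThreads) {
        List<T> results = new ArrayList<>();
        if (!asynchronous) {
            for (Supplier<T> task : tasks) {
                results.add(task.get());
            }
            return results;
        }
        ExecutorService pool = Executors.newFixedThreadPool(maxWorkerThreads);
        try {
            List<CompletableFuture<T>> futures = new ArrayList<>();
            for (Supplier<T> task : tasks) {
                futures.add(CompletableFuture.supplyAsync(task, pool));
            }
            for (CompletableFuture<T> future : futures) {
                results.add(future.join()); // blocks until that task finishes
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }
}
```

A bounded pool is used here so that slow I/O on one task does not block the others, while the thread count stays capped.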

Input/output properties

These properties define Ripple's interaction with data sources such as Linked Data providers.

(a boolean value which defaults to false) tells Ripple what to do when it encounters a definition of a namespace prefix which is already bound to another URI. It may either keep the original definition (the default behavior) or override it with the new definition. For example, if Ripple discovers a document which uses the prefix "foaf" for one URI, and then discovers a second document which uses "foaf" for a different URI, Ripple will continue to associate "foaf" with the first URI by default.

= false

(a boolean value which defaults to false) allows Ripple to use two different namespace prefixes for the same URI. For clarity, this is not allowed by default.

= false

(a numeric value which defaults to 10000) is the number of milliseconds Ripple will wait on an HTTP connection. For example, if Ripple attempts to dereference a URI (through LinkedDataSail) and the remote server takes longer than 10 seconds to respond, Ripple will give up and use the thread for another request.

= 10000

(a numeric value which defaults to 500) is the number of milliseconds Ripple will wait between consecutive requests to the same HTTP host. This is a form of crawler etiquette which ensures that a Ripple program will not overload any given host with rapid-fire requests. Ripple waits half a second by default. Note that requests to different hosts may occur much more frequently.

= 500
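The per-host waiting rule for consecutive HTTP requests can be sketched like this. It is an illustration under assumptions, not Ripple's crawler code: `CourtesyIntervalSketch` and `reserve` are hypothetical names, and the current time is passed in explicitly so the logic is easy to test.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of a per-host courtesy interval.
final class CourtesyIntervalSketch {
    private final long intervalMillis;
    // Earliest time (in milliseconds) at which each host may be contacted again.
    private final Map<String, Long> nextAllowed = new HashMap<>();

    CourtesyIntervalSketch(long intervalMillis) {
        this.intervalMillis = intervalMillis;
    }

    // Reserves the next request slot for `host` and returns how many
    // milliseconds the caller should wait before sending the request.
    synchronized long reserve(String host, long nowMillis) {
        long earliest = nextAllowed.getOrDefault(host, nowMillis);
        long start = Math.max(earliest, nowMillis);
        nextAllowed.put(host, start + intervalMillis);
        return start - nowMillis;
    }
}
```

With an interval of 500, a second request to the same host 100 ms after the first is told to wait 400 ms, while a request to a different host at the same moment waits 0 ms.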

Data model properties

These properties deal with Ripple's RDF data model.

net.fortytwo.ripple.model.useBlankNodes (a boolean value which defaults to false) determines whether Ripple will use blank nodes, a kind of anonymous identifier in RDF. Use of blank nodes in Linked Data is sometimes discouraged, and Ripple avoids them (by default) as well, turning them into URIs instead.

net.fortytwo.ripple.model.useBlankNodes = false

net.fortytwo.ripple.model.memoizeListsFromRdf (a boolean value which defaults to true) allows Ripple to speed up program execution by caching Ripple programs and other RDF lists in an efficient linked-list format in memory. This feature can be disabled if there is a concern over very many programs causing the list cache to consume too much memory (which is generally not the case).

net.fortytwo.ripple.model.memoizeListsFromRdf = true

net.fortytwo.ripple.model.defaultNamespace (a URI which defaults to defines Ripple's default namespace, i.e. the URI to which the empty namespace prefix "" is bound. For example, if you type the QName :timbl in the command-line interface, this resolves to by default.

net.fortytwo.ripple.model.defaultNamespace =

LinkedDataSail properties

These properties are used in connection with LinkedDataSail.

net.fortytwo.linkeddata.cacheLifetime (a numeric value which defaults to 604800) is the number of seconds (not milliseconds) that LinkedDataSail will wait before either trying again to dereference a URI which previously failed, or making a new request for a URI which previously succeeded (the point being to keep up to date with Linked Data which changes over time). By default, LinkedDataSail will wait one week before refreshing its cache with respect to a given URI. Note that you can also supply a custom cache expiration policy, at the Java API level, using LinkedDataSail.getCache().setExpirationPolicy().

net.fortytwo.linkeddata.cacheLifetime = 604800
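Because the lifetime is given in seconds while Java timestamps are usually in milliseconds, the unit conversion is easy to get wrong. The following is a minimal sketch of an expiry check, not LinkedDataSail's actual expiration policy; `CacheLifetimeSketch` and `isExpired` are hypothetical names.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical illustration of a cache-lifetime check.
final class CacheLifetimeSketch {

    // One week expressed in seconds, matching the default of 604800.
    static final long ONE_WEEK_SECONDS = TimeUnit.DAYS.toSeconds(7);

    // True when at least `cacheLifetimeSeconds` have elapsed between the
    // last fetch and now. Both timestamps are in milliseconds.
    static boolean isExpired(long lastFetchedMillis, long nowMillis, long cacheLifetimeSeconds) {
        long lifetimeMillis = TimeUnit.SECONDS.toMillis(cacheLifetimeSeconds);
        return nowMillis - lastFetchedMillis >= lifetimeMillis;
    }
}
```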

net.fortytwo.linkeddata.datatypeHandlingPolicy (an enum with the value "ignore", "normalize", or "verify") tells the parser what to do with data-typed literals. If the value is "ignore", the parser will simply accept the value of any literal, regardless of its data type. For example, if it encounters a literal which is typed as xsd:integer but whose value is "42.1", it just won't care. This is the default behavior. If the config value is "normalize", the parser will attempt to normalize the value. For example, "42.1" might be changed to "42" to make it into an integer. If the config value is "verify", the parser will throw an error when the value is invalid, possibly causing things to break but guaranteeing that bad data will not be passed along to downstream applications. See RDFParser.DatatypeHandling.

net.fortytwo.linkeddata.datatypeHandlingPolicy = ignore
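The three policies can be contrasted with a small sketch for xsd:integer literals. This is not the parser's real logic: the `Policy` enum and `handleInteger` method are hypothetical, and truncating "42.1" to "42" is just one plausible reading of "normalize".

```java
import java.math.BigDecimal;
import java.math.BigInteger;

// Hypothetical illustration of the three datatype handling policies,
// applied to a literal that is typed as xsd:integer.
final class DatatypeHandlingSketch {

    enum Policy { IGNORE, NORMALIZE, VERIFY }

    // Returns the lexical value to pass downstream, or throws
    // NumberFormatException when the policy rejects the value.
    static String handleInteger(String lexical, Policy policy) {
        switch (policy) {
            case IGNORE:
                // Accept the value as-is, valid or not.
                return lexical;
            case NORMALIZE:
                // Assumed normalization: drop the fractional part.
                return new BigDecimal(lexical).toBigInteger().toString();
            case VERIFY:
                // The BigInteger constructor throws if the value is not an integer.
                new BigInteger(lexical);
                return lexical;
            default:
                throw new IllegalStateException("unknown policy: " + policy);
        }
    }
}
```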

net.fortytwo.linkeddata.memoryCacheCapacity (a numeric value which defaults to 10000) defines the maximum size of LinkedDataSail's fast in-memory cache. LinkedDataSail caches metadata about dereferenced Semantic Web URIs both in the triple store and also (for the most recently used URIs) in a special in-memory hash table. The in-memory cache makes cache lookups much faster when some URIs are accessed more often than others, which is almost always the case. The larger the in-memory cache, the faster these lookups occur (at the cost of additional memory).

net.fortytwo.linkeddata.memoryCacheCapacity = 10000
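A bounded in-memory table that keeps the most recently used entries can be sketched with `LinkedHashMap` in access order. This only illustrates the general technique; it is an assumption that LinkedDataSail's cache behaves similarly, and `LruCacheSketch` is a hypothetical name.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration of a capacity-bounded, most-recently-used cache.
final class LruCacheSketch<K, V> extends LinkedHashMap<K, V> {
    private static final long serialVersionUID = 1L;
    private final int capacity;

    LruCacheSketch(int capacity) {
        // accessOrder = true: iteration order runs from least to most recently used.
        super(16, 0.75f, true);
        this.capacity = capacity;
    }

    // Called by LinkedHashMap after each insertion; returning true evicts
    // the least recently used entry once the capacity is exceeded.
    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > capacity;
    }
}
```

A larger capacity keeps more entries available for fast lookup, at the cost of the memory those entries occupy.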