This repo is migrating. It is already a branch (named next) in the solrmarc/solrmarc repo. It will soon become the master in that repo.

Overview

This project is based on code written by Oliver Obenland (see https://github.com/oobenland/SolrMarc-Indexer-Tests).
The key design improvement Oliver made is to compile the indexing specification once and then apply that "compiled" version to each record that needs indexing. I have taken his code and added handling of SolrMarc's basic field specifications (such as title_display = 245abnp) via a parser specification (CUP and JFlex), which makes it simpler to define and handle more complex specifications. The code has been released and is ready for use. This entire repository will be migrated to become the master branch of the solrmarc project.
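To illustrate the compile-once idea, here is a minimal sketch in Java. It is not the CUP/JFlex parser used by this project: the class name CompiledFieldSpec, its methods, and the record model (a map from tag to subfield code to value, with one field per tag) are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: none of these names come from the SolrMarc code base.
class CompiledFieldSpec {
    final String solrField;  // e.g. "title_display"
    final String tag;        // e.g. "245"
    final String subfields;  // e.g. "abnp"

    private CompiledFieldSpec(String solrField, String tag, String subfields) {
        this.solrField = solrField;
        this.tag = tag;
        this.subfields = subfields;
    }

    // Parse a line such as "title_display = 245abnp". This runs once,
    // before any record is read.
    static CompiledFieldSpec compile(String line) {
        String[] parts = line.split("=", 2);
        if (parts.length != 2 || parts[1].trim().length() < 3) {
            throw new IllegalArgumentException("Expected 'solr_field = TAGsubfields': " + line);
        }
        String spec = parts[1].trim();
        return new CompiledFieldSpec(parts[0].trim(), spec.substring(0, 3), spec.substring(3));
    }

    // Apply the precompiled spec to one record; no parsing happens here.
    // Assumed record model: tag -> (subfield code -> value).
    String apply(Map<String, Map<Character, String>> record) {
        Map<Character, String> field = record.get(tag);
        if (field == null) {
            return null;
        }
        List<String> values = new ArrayList<>();
        for (Map.Entry<Character, String> subfield : field.entrySet()) {
            char code = subfield.getKey();
            if (subfields.indexOf(code) >= 0) {
                values.add(subfield.getValue());
            }
        }
        return values.isEmpty() ? null : String.join(" ", values);
    }

    public static void main(String[] args) {
        CompiledFieldSpec spec = CompiledFieldSpec.compile("title_display = 245abnp");

        Map<Character, String> title = new LinkedHashMap<>();
        title.put('a', "Moby Dick");
        title.put('b', "or, The whale");
        title.put('c', "Herman Melville");
        Map<String, Map<Character, String>> record = new LinkedHashMap<>();
        record.put("245", title);

        System.out.println(spec.solrField + " = " + spec.apply(record));
    }
}
```

Running main prints title_display = Moby Dick or, The whale: subfield c is skipped because it is not listed in the specification. The specification string is split only once, in compile, however many records are passed to apply.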

Included with this project is a Swing-based interactive interface that could eventually be used to develop, modify, extend and debug a set of indexing specifications, but for now it can be used to see how some of the new features will work.

This project contains the implementation of an idea for improving SolrMarc's performance, extensibility and stability.

Description as provided by Oliver Obenland

The indexer is divided into a compile time and a runtime. Compile time loads the configurations and translates (compiles) them into small indexer tasks with minimal functionality. The runtime loads records from input files, uses the small indexer tasks to extract data, and sends the data to Solr.

Compile time

This is mainly made out of factories. Each factory is for one type of import configuration in the indexer properties (e.g. marc.properties or marc_local.properties). Such a factory parses the configuration and creates a small indexer task. A factory is not a singleton, but only one instance of each factory is used, so a factory can build a cache or share information between indexer tasks. After all configurations have been compiled to tasks, the factories are no longer needed and will be collected by the garbage collector. A task is not allowed to own an instance of its factory. Every bit of calculation that can be done by the factory is a good bit of calculation: everything that can be preprocessed should be done by the factory, not by the indexer task.
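The factory rules above (one instance, a shared cache, and tasks that never hold their factory) can be sketched as follows. The name RegexTaskFactory and the use of a plain Function as the task type are assumptions for illustration, not the project's actual factory API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: these names are not part of the SolrMarc code base.
class RegexTaskFactory {
    // The cache belongs to the factory, so it goes away with the factory
    // once compile time is over.
    private final Map<String, Pattern> patternCache = new HashMap<>();

    // Counts real compilations; exists only to make the caching visible.
    int compileCount = 0;

    // Builds a task that returns the first match of the regex, or null.
    Function<String, String> createTask(String regex) {
        // Preprocessing happens here, in the factory, not in the task.
        final Pattern pattern = patternCache.computeIfAbsent(regex, r -> {
            compileCount++;
            return Pattern.compile(r);
        });
        // The returned lambda captures only 'pattern', never 'this',
        // so the task keeps no reference to its factory.
        return value -> {
            Matcher matcher = pattern.matcher(value);
            return matcher.find() ? matcher.group() : null;
        };
    }

    public static void main(String[] args) {
        RegexTaskFactory factory = new RegexTaskFactory();
        Function<String, String> year = factory.createTask("\\d{4}");
        Function<String, String> sameRegex = factory.createTask("\\d{4}");
        System.out.println("compiled patterns: " + factory.compileCount);

        factory = null; // the tasks keep working without the factory
        System.out.println(year.apply("c1999."));
        System.out.println(sameRegex.apply("no date"));
    }
}
```

Both tasks share one compiled Pattern, so the counter reports a single compilation, and the tasks remain usable after the only reference to the factory is dropped.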

Runtime

At this point only the indexer tasks exist: no factories, no properties, no unnecessary processing. The input file is read, and for each record all indexer tasks are called to create a new document.
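The runtime loop described above might look roughly like the sketch below. The Task class, the copyField helper, and the flat map used as a record are assumptions for the example. Sending the documents to Solr is left out, so the documents are simply returned as a list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch: these names are not part of the SolrMarc code base.
class RuntimeLoopSketch {
    // Stand-in for a compiled indexer task: a Solr field name plus a
    // function from a record to that field's values.
    static final class Task {
        final String solrField;
        final Function<Map<String, String>, List<String>> values;

        Task(String solrField, Function<Map<String, String>, List<String>> values) {
            this.solrField = solrField;
            this.values = values;
        }
    }

    // Builds a trivial task that copies one record entry into a Solr field.
    static Task copyField(String solrField, String key) {
        return new Task(solrField, record -> {
            String value = record.get(key);
            return value == null
                    ? Collections.<String>emptyList()
                    : Collections.singletonList(value);
        });
    }

    // For each record, call every task and collect the results in a document.
    static List<Map<String, List<String>>> index(List<Map<String, String>> records,
                                                 List<Task> tasks) {
        List<Map<String, List<String>>> documents = new ArrayList<>();
        for (Map<String, String> record : records) {
            Map<String, List<String>> document = new LinkedHashMap<>();
            for (Task task : tasks) {
                List<String> values = task.values.apply(record);
                if (!values.isEmpty()) {
                    document.put(task.solrField, values);
                }
            }
            documents.add(document);
        }
        return documents;
    }

    public static void main(String[] args) {
        List<Task> tasks = Arrays.asList(
                copyField("id", "001"),
                copyField("title_display", "245a"));

        Map<String, String> record = new LinkedHashMap<>();
        record.put("001", "u1");
        record.put("245a", "Moby Dick");

        System.out.println(index(Collections.singletonList(record), tasks));
    }
}
```

The loop itself reads no configuration and knows nothing about factories; everything it needs is already held by the tasks.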

Indexer task

A task is represented by the AbstractValueIndexer class and is a composition of three parts:

Extractor: reads data from a record.
Mapping: translates the data, e.g. by mapping one value to another or by using a regex to extract a value.
Collector: transforms the data, e.g. by joining multiple strings into one string or by splitting a string into parts.

Each indexer task generates the data for one Solr field.
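As a rough illustration of this three-part composition, the sketch below chains an extractor, a mapping and a collector. It does not reproduce AbstractValueIndexer: the interfaces, their method signatures, the class name IndexerTaskSketch and the record model (a map from a key such as "041a" to a list of values) are all assumptions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: the real AbstractValueIndexer API may differ.
class IndexerTaskSketch {
    interface Extractor { List<String> extract(Map<String, List<String>> record); }
    interface Mapping { String map(String value); }  // null drops the value
    interface Collector { List<String> collect(List<String> values); }

    final String solrField;
    private final Extractor extractor;
    private final Mapping mapping;
    private final Collector collector;

    IndexerTaskSketch(String solrField, Extractor extractor,
                      Mapping mapping, Collector collector) {
        this.solrField = solrField;
        this.extractor = extractor;
        this.mapping = mapping;
        this.collector = collector;
    }

    // Extract raw values, map each one, then let the collector shape the result.
    List<String> index(Map<String, List<String>> record) {
        List<String> mapped = new ArrayList<>();
        for (String raw : extractor.extract(record)) {
            String value = mapping.map(raw);
            if (value != null) {
                mapped.add(value);
            }
        }
        return collector.collect(mapped);
    }

    public static void main(String[] args) {
        // A tiny translation map standing in for a real one.
        Map<String, String> languageNames = new HashMap<>();
        languageNames.put("eng", "English");
        languageNames.put("fre", "French");

        IndexerTaskSketch task = new IndexerTaskSketch(
                "language_display",
                record -> record.getOrDefault("041a", Collections.<String>emptyList()),
                languageNames::get,
                values -> values.isEmpty()
                        ? Collections.<String>emptyList()
                        : Collections.singletonList(String.join(", ", values)));

        Map<String, List<String>> record = new HashMap<>();
        record.put("041a", Arrays.asList("eng", "fre", "xxx"));

        System.out.println(task.solrField + " = " + task.index(record));
    }
}
```

Running main prints language_display = [English, French]: the unmapped code xxx is dropped by the mapping, and the collector joins the remaining values into a single string.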
