Kite - Morphlines Examples
This module contains examples for how to unit test Morphline config files and custom Morphline commands.
For details consult the
pom.xml build file,
as well as the Morphline config files in the
as well as the test data files in the
as well as unit tests in the
src/test/java/ directory tree,
as well as the example custom morphline command implementations in the
src/main/java/ directory tree.
This step builds the software from source. It also runs the unit tests.
git clone https://github.com/kite-sdk/kite-examples.git
cd kite-examples/kite-examples-morphlines
#git checkout master
#git checkout 1.0.0 # or whatever the latest version is
mvn clean package
Using the Maven CLI to run test data through a morphline
- This section describes how to use the mvn CLI to run test data through a morphline config file.
- Here we use the simple MorphlineDemo class.
cd kite-examples/kite-examples-morphlines
mvn test -DskipTests exec:java -Dexec.mainClass="org.kitesdk.morphline.api.MorphlineDemo" -Dexec.args="src/test/resources/test-morphlines/addValues.conf src/test/resources/test-documents/email.txt" -Dexec.classpathScope=test
- The first parameter in exec.args above is the morphline config file, and the remaining parameters specify one or more data files to run over. At least one data file is required.
- To print diagnostic information such as the content of records as they pass through the morphline commands, consider enabling TRACE log level, for example by adding the following line to your
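The exec.args convention described above (config file first, then at least one data file) can be wrapped in a small shell function. This is only a sketch: the function name run_morphline is hypothetical and not part of the project, and it assumes none of the paths contain spaces, since exec.args is passed as a single space-separated string.

```shell
# Hypothetical helper (not part of kite-examples): run one morphline config
# file over one or more data files using the MorphlineDemo class.
run_morphline() {
  # Require the config file plus at least one data file.
  if [ "$#" -lt 2 ]; then
    echo "usage: run_morphline <morphline.conf> <data-file>..." >&2
    return 2
  fi
  # "$*" joins all arguments with single spaces, which is the form that
  # exec.args takes in the command shown above.
  mvn test -DskipTests exec:java \
    -Dexec.mainClass="org.kitesdk.morphline.api.MorphlineDemo" \
    -Dexec.args="$*" \
    -Dexec.classpathScope=test
}
```

Called from the kite-examples-morphlines directory with the same two paths as in the command above, this should issue an equivalent mvn invocation; calling it with only a config file prints a usage message and returns a non-zero status.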
Integrating with Eclipse
- This section describes how to integrate the codeline with Eclipse.
- Build the software as described above. Then create Eclipse projects like this:
cd kite-examples/kite-examples-morphlines
mvn eclipse:eclipse
mvn eclipse:eclipse creates several Eclipse projects, one for each Maven submodule. It also downloads the jars of all transitive dependencies, together with their source code, and attaches them to the Eclipse projects, so you can readily browse the source of the entire call stack.
- Then in Eclipse do Menu File/Import/Maven/Existing Maven Project/ on the root parent directory ~/kite-examples/kite-examples-morphlines and select all submodules, then "Next" and "Finish".
- You will see some Maven project errors that keep Eclipse from building the workspace, because the Eclipse Maven plugin has some quirks and limitations. To work around this, disable the Maven "Nature": right-click the project in the browser and choose Menu Maven/Disable Maven Nature. Repeat this for each project. This way you get the niceties of Maven dependency management without the hassle of the (current) Maven Eclipse plugin: everything compiles from within Eclipse, and JUnit works and passes from within Eclipse as well.
- When a pom changes, simply rerun mvn eclipse:eclipse and then run Menu Eclipse/Refresh Project. There is no need to disable the Maven "Nature" again.
- To run JUnit tests from within Eclipse, click on the project (e.g. kite-examples-morphlines) in the Eclipse project explorer, right-click, and choose Run As/JUnit Test.
Integrating with IntelliJ IDEA
- This section describes how to integrate the codeline with IntelliJ.
- Build the software as described above.
- Open the pom.xml file in IntelliJ. This should create the entire project in the IDE. You do not need to "Import the project" or anything like that, just do File>>Open and pick the
- You may have to select build>>rebuild project to get all the dependencies.
- You may have to build the project externally via mvn test to resolve dependencies.
- You may have to select
- In IntelliJ, you should be able to right-click on the testSimpleCSV() method inside the ExampleMorphlineTest.java file and see a choice to "Run testSimpleCSV" or "Debug testSimpleCSV" to run the unit test and see the magic green bar.
- To run all unit tests contained in the ExampleMorphlineTest class, right-click on the ExampleMorphlineTest.java file and choose "Run ExampleMorphlineTest" or "Debug ExampleMorphlineTest".
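Outside an IDE, the same test class or test method can usually be run through Maven instead. The sketch below assumes the Surefire plugin version used by the pom supports the Class#method form of the -Dtest property (not verified against this project); the function name run_single_test is hypothetical.

```shell
# Hypothetical helper: run one JUnit test class, or one method of it,
# through Maven Surefire instead of the IDE.
run_single_test() {
  # $1 = test class name (required), $2 = test method name (optional)
  if [ "$#" -lt 1 ]; then
    echo "usage: run_single_test <TestClass> [testMethod]" >&2
    return 2
  fi
  if [ -n "${2:-}" ]; then
    # Class#method selects a single test method.
    mvn test -Dtest="$1#$2"
  else
    # Class name alone runs every test in that class.
    mvn test -Dtest="$1"
  fi
}
```

For example, run_single_test ExampleMorphlineTest testSimpleCSV, issued from the kite-examples-morphlines directory, corresponds to the "Run testSimpleCSV" choice described above.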
Play around a bit before changing anything!
- Set some breakpoints and examine the morphline record.
- Examine the contents of the two sample input file records.
- Change one of the asserts to force a failure and see what that looks like.
- Skip all this of course if you're already familiar with JUnit etc.
Get to work
- Put your sample input data file into the resources/test-documents directory, as a sibling to
- Change the Java unit test method ExampleMorphlineTest.testSimpleCSV() to use that sample input data file by replacing simpleCSV.txt with said file.
- Now start adding commands to the simpleCSV.conf morphline config file in the
- You can use a different morphline config file: just put it in the same directory as simpleCSV.conf and load it in the test by changing the
- In the simpleCSV.conf file, you'll see a SOLR_HOME_DIR variable. That points to the resources/solr/collection1/conf directory (the /conf is implied). This is where your Solr schema.xml file must live. As you add morphline commands to put new fields into the record, you'll probably be changing the schema as well by adding those fields.
- If you examine your records and don't see fields that you know you put in, it's quite likely that you didn't add them to the schema.xml file and thus the morphline command sanitizeUnknownSolrFields removed the field.
- Pedantic recommendation: add just one or two morphline commands at a time. Adding lots of things at once is an easy way to get lost.
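When a field goes missing from the output record, a quick first check is whether schema.xml declares it at all. The following is a rough sketch using grep and sed, with hypothetical function names. It assumes each field declaration sits on a single line, and it only looks at plain field elements, so a field that is only covered by a dynamicField pattern will be reported as missing.

```shell
# Hypothetical helpers for checking which fields a Solr schema.xml declares.

# Print the declared <field> names, one per line, sorted and de-duplicated.
# $1 = path to schema.xml
list_schema_fields() {
  # '<field ' (with trailing space) skips <fieldType> and <dynamicField>.
  grep -o '<field [^>]*name="[^"]*"' "$1" \
    | sed 's/.*name="\([^"]*\)".*/\1/' \
    | sort -u
}

# Succeed (exit status 0) if the field name is declared in the schema.
# $1 = path to schema.xml, $2 = field name to look for
schema_has_field() {
  list_schema_fields "$1" | grep -qxF -- "$2"
}
```

Running schema_has_field against your schema.xml with the name of the field you expected to see tells you whether a missing declaration is a plausible explanation before you start debugging the morphline itself.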
Notice several things
- Notice several things about the current
- Actually adding the record to Solr is commented out. We don't need the complications of setting that up too at this stage.
- Near the top of the morphline config file, there are the import statements, one for Kite and one for the CDK. Use the Kite one! The pom is set up for Kite (e.g. for use with CDH 5). The CDK import is there for reference (e.g. for use with CDH 4).
Deploy to Flume or MapReduce or HBase Indexer
- Once this all runs to your satisfaction, copy the morphline config (and possibly the Solr schema file if you've modified it) to your Flume or MapReduce or HBase Indexer configuration and give it a spin.
- It's probably useful to just copy/paste the bits in the "commands" section of the morphlines configuration. Otherwise be careful to modify the SOLR_LOCATOR and (perhaps) import statements to reflect your setup.