Parses text blocks from files, strings, and other sources using Scala.

# Sparse: text parsing for Scala

You can use Sparse to parse text blocks from files and other sources.

## Get Sparse

First, add Sparse as a dependency in your project's `build.sbt`:

```scala
libraryDependencies += "eu.matthiasbraun" %% "sparse" % "1.0"
```

Then you can import its methods and objects:

```scala
import eu.matthiasbraun.sparse.Parser._
```

## Usage examples

### Basic

Let's say the file you want to parse is this:

```
(unrelated text before first block)
start
  first line in first block
  second line in first block
end

(unrelated text before second block)
start
  first line in second block
  second line in second block
end

(more unrelated text)
```

Assuming you're interested in the blocks that start with `start` and end with `end`, here's how you parse them. First, load the file using one of the methods in `scala.io.Source`:

```scala
import java.io.File
import scala.io.Source.fromFile

val yourFile = fromFile(new File("parse/this/file"))
```
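`fromFile` is just one of the factory methods in `scala.io.Source`; `Source.fromString`, for instance, builds a `Source` from an in-memory string, which is handy for experimenting. Whether Sparse's `parse` accepts every kind of `Source` is an assumption here, so this minimal sketch sticks to the standard library: it reads the lines of an in-memory copy of the example's beginning and closes the source afterwards. The object name `SourceSketch` is illustrative.

```scala
import scala.io.Source

object SourceSketch {
  // In-memory copy of the beginning of the example file.
  val text: String =
    """(unrelated text before first block)
      |start
      |  first line in first block
      |  second line in first block
      |end""".stripMargin

  /** Reads all lines from an in-memory Source and closes it afterwards. */
  def lines(): List[String] = {
    val source = Source.fromString(text)
    try source.getLines().toList // force the lines before closing
    finally source.close()
  }
}
```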

For our example file, we know exactly what the start and end of a block look like, so we can parse the two blocks from the file like this:

```scala
val blocksMaybe = parse(yourFile, from("start"), to("end"))
```

And this is how you print the blocks:

```scala
import scala.util.{Failure, Success}

blocksMaybe match {
  case Success(blocks)    => blocks.foreach(println)
  case Failure(exception) => println(exception)
}
```

`parse` returns a `Try`. If parsing succeeded, it contains the blocks from the parsed file. The first block we got back is this:

```
start
  first line in first block
  second line in first block
end
```

The second block probably won't surprise you, but here it is for completeness' sake:

```
start
  first line in second block
  second line in second block
end
```

If parsing failed, the `Try` instead holds the first exception that occurred during parsing.
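Besides pattern matching, the standard library's `Try` offers combinators such as `fold` and `getOrElse`. Because this sketch runs without Sparse, it uses a hard-coded stand-in for `blocksMaybe`; modeling a block as `List[String]` is an assumption, not necessarily Sparse's actual block type, and `TryHandlingSketch` is an illustrative name.

```scala
import scala.util.{Success, Try}

object TryHandlingSketch {
  // Assumption: a block is modeled as List[String]; Sparse's real block type may differ.
  type Block = List[String]

  // Hard-coded stand-in for the value returned by `parse`.
  val blocksMaybe: Try[List[Block]] =
    Success(List(List("start", "  first line in first block", "  second line in first block", "end")))

  /** Summarizes a parse result; `fold` takes the failure handler first, then the success handler. */
  def describe(result: Try[List[Block]]): String =
    result.fold(
      e => s"Parsing failed: ${e.getMessage}",
      blocks => s"Parsed ${blocks.size} block(s)"
    )

  /** Falls back to an empty list of blocks if parsing failed. */
  def blocksOrEmpty(result: Try[List[Block]]): List[Block] = result.getOrElse(Nil)
}
```

`fold` is convenient when both outcomes map to the same type; `getOrElse` suits code that can simply carry on with no blocks.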

If you're interested only in what's inside the blocks, and not in the lines that mark their beginning and end, call `parse` like this:

```scala
parse(yourFile, after("start"), until("end"))
```

The first block returned by that call is a bit different from the one we got using `from` and `to`:

```
first line in first block
second line in first block
```

So far you've seen `from`, `to`, `after`, and `until` marking the start and end of your blocks.

There is one more, `before`, which you can use if you're also interested in the line that precedes the matching line.

The resulting blocks of

```scala
parse(yourFile, before("start"), until("end"))
```

are

```
(unrelated text before first block)
start
  first line in first block
  second line in first block
```

and

```
(unrelated text before second block)
start
  first line in second block
  second line in second block
```
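One way to picture all five markers is "matching line plus an offset": a block boundary sits at the matching line (`from`, `to`), one line after it (`after`), or one line before it (`before`, `until`). The stand-alone sketch below implements that model in plain Scala. The `Marker` class, the offsets, the 1-based line numbers, and dropping a block whose end never matches are all assumptions inferred from the examples above, not taken from Sparse's source.

```scala
object OffsetModelSketch {
  /** Hypothetical marker: a predicate on (line, lineNr) plus an offset relative to the matching line. */
  final case class Marker(matches: (String, Int) => Boolean, offset: Int)

  private def exactly(s: String): (String, Int) => Boolean = (line, _) => line == s

  // Assumed offsets, inferred from the README's examples.
  def from(s: String): Marker   = Marker(exactly(s), 0)
  def after(s: String): Marker  = Marker(exactly(s), 1)
  def before(s: String): Marker = Marker(exactly(s), -1)
  def to(s: String): Marker     = Marker(exactly(s), 0)
  def until(s: String): Marker  = Marker(exactly(s), -1)

  /** Finds each start match and the next end match after it, then slices between the shifted boundaries. */
  def parse(lines: Vector[String], start: Marker, end: Marker): List[Vector[String]] = {
    // Line numbers passed to predicates are 1-based (assumption).
    def firstMatch(m: Marker, fromIdx: Int): Option[Int] =
      Iterator.range(fromIdx, lines.size).find(i => m.matches(lines(i), i + 1))

    @annotation.tailrec
    def loop(searchFrom: Int, acc: List[Vector[String]]): List[Vector[String]] =
      firstMatch(start, searchFrom) match {
        case None => acc.reverse
        case Some(s) =>
          firstMatch(end, s + 1) match {
            case None => acc.reverse // assumption: an unterminated block is dropped
            case Some(e) =>
              // Shift both boundaries by their offsets and clamp them to the file.
              val first = math.max(0, s + start.offset)
              val last  = math.min(lines.size - 1, e + end.offset)
              loop(e + 1, lines.slice(first, last + 1) :: acc)
          }
      }

    loop(0, Nil)
  }
}
```

Under this model, a custom marker boils down to choosing a different offset.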

### Intermediate

If the starts and ends of your blocks vary, you can define predicates to match them.

Let's change our example file a bit, to make parsing slightly more challenging:

```
blockStartPrefix: firstBlockHeader
  first line in first block
  second line in first block
end

blockStartPrefix: secondBlockHeader
  first line in second block
  second line in second block
end
```

Because the start of a block now differs from block to block, we can't match it verbatim as we did in the previous example. But all block starts share the common prefix `blockStartPrefix`, so let's match that:

```scala
parse(yourFile, from(_.startsWith("blockStartPrefix")), to("end"))
```

Predicates aren't limited to `from`, of course. Imagine that the block ends vary like so:

```
end of block 1 ###
end of block 2 ###
```

In this case, we use `to(_.endsWith("###"))` to match the end of a block.
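Since such a predicate is just a function from a line to `Boolean`, you can try it on sample lines with plain Scala before handing it to `from` or `to`. The sketch below does that with the two predicates above; the sample mixes lines from both variations and, like the object name `PredicateSketch`, is made up for illustration.

```scala
object PredicateSketch {
  // The predicates from the text, written as standalone function values.
  val startsBlock: String => Boolean = _.startsWith("blockStartPrefix")
  val endsBlock: String => Boolean   = _.endsWith("###")

  // Made-up sample combining the varying block starts and block ends.
  val sample: List[String] = List(
    "blockStartPrefix: firstBlockHeader",
    "  first line in first block",
    "end of block 1 ###",
    "blockStartPrefix: secondBlockHeader",
    "end of block 2 ###"
  )

  // Lines each predicate would accept as a block start or a block end.
  def starts: List[String] = sample.filter(startsBlock)
  def ends: List[String]   = sample.filter(endsBlock)
}
```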

If the patterns are more complicated than that, you can always resort to regular expressions:

```scala
from(_.matches(yourRegexPattern))
```
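Note that `String.matches` (inherited from Java) only succeeds if the *whole* line matches the pattern. To find the pattern anywhere in a line, `findFirstIn` on a `scala.util.matching.Regex` is one option. The pattern below is a made-up example for the headers from the Intermediate example, and `RegexSketch` is an illustrative name.

```scala
import scala.util.matching.Regex

object RegexSketch {
  // Example pattern (an assumption) for lines like "blockStartPrefix: firstBlockHeader".
  val yourRegexPattern: String = """blockStartPrefix: \w+Header"""

  // Whole-line match: this is what String.matches checks.
  def matchesWholeLine(line: String): Boolean = line.matches(yourRegexPattern)

  // Match anywhere in the line, e.g. despite leading indentation.
  val headerRegex: Regex = yourRegexPattern.r
  def containsMatch(line: String): Boolean = headerRegex.findFirstIn(line).isDefined
}
```

A predicate such as `RegexSketch.containsMatch` has the same `String => Boolean` shape as `_.matches(yourRegexPattern)`.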

### Expert

Maybe you also need the line number to decide whether a line begins or ends a block. Sparse lets you account for that, too:

```scala
val start = from((line, lineNr) => line.startsWith("start") && lineNr > 4)
parse(yourFile, start, to("end"))
```

This way, the line not only has to begin with the string `"start"` but also has to come after the fourth line in the file. If it's clear in your code that the first placeholder stands for the line and the second for the line number (or if you're feeling especially succinct today), you can shorten the example above to this:

```scala
val start = from(_.startsWith("start") && _ > 4)
```
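With an expected type of `(String, Int) => Boolean`, the placeholder version expands to the same two-parameter function as the explicit lambda. This plain-Scala sketch checks that by applying both to numbered lines. Numbering lines from 1 is an assumption about how Sparse counts, based on "after the fourth line" above; `LineNrPredicateSketch` and `matchingLineNrs` are illustrative names.

```scala
object LineNrPredicateSketch {
  // Both values denote the same (line, lineNr) => Boolean function.
  val explicitForm: (String, Int) => Boolean = (line, lineNr) => line.startsWith("start") && lineNr > 4
  val placeholderForm: (String, Int) => Boolean = _.startsWith("start") && _ > 4

  /** Returns the line numbers (1-based, an assumption) of the lines that satisfy `p`. */
  def matchingLineNrs(lines: List[String], p: (String, Int) => Boolean): List[Int] =
    lines.zipWithIndex.collect { case (line, i) if p(line, i + 1) => i + 1 }
}
```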

### Master

If you're not content with the predefined block markers (`from`, `to`, `after`, `until`, and `before`), you can roll your own:

```scala
/** The block begins two lines after the `predicate` matches. */
object twoLinesAfter extends MarkerFactory {
  override def apply(predicate: ((String, Int) => Boolean)) =
    BlockMarker(predicate, offset = +2)
}
```

Your custom marker is used like all the predefined ones shown above:

```scala
val blocksMaybe = parse(yourFile, twoLinesAfter("start"), to("end"))
```

If you're wondering why you can pass a plain string instead of a `(String, Int) => Boolean` predicate to `twoLinesAfter`, have a look at `MarkerFactory` in `Parser.scala`.
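One way a factory like this *could* accept a plain string is an overloaded `apply` that turns the string into an exact-match predicate and delegates to the predicate version. The self-contained sketch below illustrates that idea with hypothetical stand-ins, `SketchBlockMarker` and `SketchMarkerFactory`; they are not Sparse's `BlockMarker` and `MarkerFactory`, whose actual source may work differently (for example, via an implicit conversion).

```scala
object CustomMarkerSketch {
  /** Hypothetical stand-in for BlockMarker: a predicate plus a line offset. */
  final case class SketchBlockMarker(predicate: (String, Int) => Boolean, offset: Int)

  /** Hypothetical stand-in for MarkerFactory: implementations supply the predicate-based
    * `apply`; the String overload converts an exact line into a predicate and delegates. */
  trait SketchMarkerFactory {
    def apply(predicate: (String, Int) => Boolean): SketchBlockMarker
    def apply(line: String): SketchBlockMarker =
      apply((candidate: String, _: Int) => candidate == line)
  }

  /** The custom marker from the example above, written against the stand-ins. */
  object twoLinesAfter extends SketchMarkerFactory {
    override def apply(predicate: (String, Int) => Boolean): SketchBlockMarker =
      SketchBlockMarker(predicate, offset = +2)
  }
}
```

With this design, every custom factory only implements the predicate version and gets the string shorthand for free.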

## Dependencies of Sparse
