Parses text blocks from files, strings, and other sources using Scala.

mb720/sparse

Sparse: text parsing for Scala

You can use Sparse to parse text blocks from files and other sources.

Get Sparse

First, add Sparse as a dependency in your project's build.sbt:

libraryDependencies += "eu.matthiasbraun" %% "sparse" % "1.0"

Then you can import its methods and objects:

import eu.matthiasbraun.sparse.Parser._

Usage examples

Basic

Let's say the file you want to parse is this:

(unrelated text before first block)
start
  first line in first block
  second line in first block
end

(unrelated text before second block)
start
  first line in second block
  second line in second block
end

(more unrelated text)

Assume you're interested in the blocks that start with the line start and end with the line end. First, load the file using one of the methods in scala.io.Source:

import java.io.File
import scala.io.Source.fromFile

val yourFile = fromFile(new File("parse/this/file"))
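
If you'd like to experiment without a file on disk, scala.io.Source can also wrap a plain string. The sketch below uses only the standard library; the names sampleText, sampleSource, and sampleLines are made up for illustration, and it assumes (without having verified it) that parse accepts any scala.io.Source the way it accepts the one returned by fromFile.

```scala
import scala.io.Source

// Part of the example file, held in a string instead of on disk.
val sampleText =
  """(unrelated text before first block)
    |start
    |  first line in first block
    |  second line in first block
    |end
    |""".stripMargin

// Source.fromString is part of the Scala standard library.
val sampleSource = Source.fromString(sampleText)

// getLines() consumes the source, so create a fresh Source
// before handing one to a parser.
val sampleLines = sampleSource.getLines().toList
```

Inspecting sampleLines is a quick way to check what a line-based parser would see, including leading whitespace.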

In the case of our example file above, we know exactly what the start and end of a block look like. So we can do the following to parse the two blocks from the file:

val blocksMaybe = parse(yourFile, from("start"), to("end"))

And this is how you print the blocks:

import scala.util.{Failure, Success}

blocksMaybe match {
  case Success(blocks)    => blocks.foreach { println }
  case Failure(exception) => println(exception)
}

parse returns a Try which, if parsing succeeded, contains the blocks from the parsed file. The first block we got back is this:

start
  first line in first block
  second line in first block
end

Probably, the second block won't surprise you, but here it is for completeness' sake:

start
  first line in second block
  second line in second block
end

Otherwise, if something went wrong, the Try holds the first exception that occurred during parsing.
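
Pattern matching is not the only way to consume a Try. Here is a minimal sketch using only scala.util.Try; sketchResult is a stand-in for the value returned by parse, and its element type (Seq[String]) is an assumption made for this example, not Sparse's actual block type.

```scala
import scala.util.{Failure, Success, Try}

// Stand-in for the result of parse; the element type is assumed.
val sketchResult: Try[Seq[String]] =
  Success(Seq("first block", "second block"))

// Fall back to an empty sequence if parsing failed.
val blocksOrEmpty: Seq[String] = sketchResult.getOrElse(Seq.empty)

// Turn either outcome into a message.
def describe(result: Try[Seq[String]]): String = result match {
  case Success(blocks)    => s"parsed ${blocks.size} block(s)"
  case Failure(exception) => s"parsing failed: ${exception.getMessage}"
}
```

getOrElse is handy when an empty result is acceptable; keep the match when you need to report the exception.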

Should you be interested only in what's inside the blocks, and not in the lines that mark their beginning and their end, you might like to call parse like this:

parse(yourFile, after("start"), until("end"))

The first block returned by that call differs slightly from the one we got using from and to:

first line in first block
second line in first block

So far you've seen from, to, after, and until to mark the start and end points of your blocks.

There is another one, before, that you can use if you're interested in the line that precedes the matching line.

The resulting blocks of

parse(yourFile, before("start"), until("end"))

are

(unrelated text before first block)
start
  first line in first block
  second line in first block

and

(unrelated text before second block)
start
  first line in second block
  second line in second block
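
One way to think about the five markers is as a line predicate plus a line offset. The following is an illustrative reimplementation using only the standard library, not Sparse's code: SketchMarker and sketchBlocks are hypothetical names, and the offsets (0 for from and to, +1 for after, -1 for before and until) are inferred from the outputs shown above.

```scala
// Hypothetical stand-in for a block marker: a predicate plus an offset.
final case class SketchMarker(matches: String => Boolean, offset: Int)

def sketchBlocks(
    lines: Vector[String],
    start: SketchMarker,
    end: SketchMarker
): Vector[Vector[String]] = {
  @annotation.tailrec
  def loop(searchFrom: Int, acc: Vector[Vector[String]]): Vector[Vector[String]] = {
    val startIdx = lines.indexWhere(start.matches, searchFrom)
    if (startIdx < 0) acc
    else {
      // Look for the end marker strictly after the start line.
      val endIdx = lines.indexWhere(end.matches, startIdx + 1)
      if (endIdx < 0) acc
      else {
        // Shift both boundaries by their offsets, clamped to the input.
        val first = math.max(0, startIdx + start.offset)
        val last  = math.min(lines.length - 1, endIdx + end.offset)
        loop(endIdx + 1, acc :+ lines.slice(first, last + 1))
      }
    }
  }
  loop(0, Vector.empty)
}

val exampleLines = Vector(
  "(unrelated text before first block)",
  "start",
  "  first line in first block",
  "  second line in first block",
  "end"
)
val isStart = (line: String) => line == "start"
val isEnd   = (line: String) => line == "end"

// from/to, after/until, and before/until expressed as offsets.
val fromTo      = sketchBlocks(exampleLines, SketchMarker(isStart, 0), SketchMarker(isEnd, 0))
val afterUntil  = sketchBlocks(exampleLines, SketchMarker(isStart, 1), SketchMarker(isEnd, -1))
val beforeUntil = sketchBlocks(exampleLines, SketchMarker(isStart, -1), SketchMarker(isEnd, -1))
```

This sketch silently drops a block whose end marker is missing; how Sparse handles that case is not covered here.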

Intermediate

If the starts and the ends of your blocks vary, you can define predicates to match them.

Let's change our example file a bit, to make parsing slightly more challenging:

blockStartPrefix: firstBlockHeader
  first line in first block
  second line in first block
end

blockStartPrefix: secondBlockHeader
  first line in second block
  second line in second block
end

Now, because the start of a block is different for each block, we can't match it verbatim as we did in the previous example. But we notice that the beginnings of all blocks share a common prefix, blockStartPrefix. Let's match that:

parse(yourFile, from(_.startsWith("blockStartPrefix")), to("end"))

Defining predicates is of course not limited to from. Imagine that block ends vary like so:

end of block 1 ###
end of block 2 ###

In this case, we use to(_.endsWith("###")) in order to match the end of a block.

If the patterns are more complicated than that, you can always resort to regular expressions:

from(_.matches(yourRegexPattern))
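
One thing to keep in mind: String.matches only returns true if the whole line matches the pattern. The sketch below shows the difference using plain Scala predicates; headerPattern and headerRegex are made-up patterns for the example file above, not something Sparse provides.

```scala
// Hypothetical pattern for lines like "blockStartPrefix: firstBlockHeader".
val headerPattern = """blockStartPrefix:\s+\w+"""

// String.matches requires the entire line to match the pattern.
val isHeader: String => Boolean = _.matches(headerPattern)

// To match anywhere within a line, search with a Regex instead.
val headerRegex = """Header""".r
val mentionsHeader: String => Boolean =
  line => headerRegex.findFirstIn(line).isDefined
```

If your lines may carry leading or trailing whitespace, either account for it in the pattern or use the search-based form.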

Expert

Maybe you need to consider the line number as well to determine if a line should be the beginning or the end of a block. Sparse lets you account for that, too:

val start = from((line, lineNr) => line.startsWith("start") && lineNr > 4)
parse(yourFile, start, to("end"))

This way, the line not only has to begin with the string "start" but also needs to come after the fourth line in the file. If it's clear in your code that the first placeholder stands for the line and the second placeholder for the line number (or if you're feeling especially succinct today), you can shorten the above example to this:

val start = from(_.startsWith("start") && _ > 4)
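
In case the placeholder form looks cryptic: each underscore stands for the next parameter, in order. The sketch below shows both forms as plain Scala functions, independent of Sparse; explicitStart and shortStart are names chosen for this example.

```scala
// Explicit parameter names.
val explicitStart: (String, Int) => Boolean =
  (line, lineNr) => line.startsWith("start") && lineNr > 4

// Placeholder form: the first underscore is the line,
// the second underscore is the line number.
val shortStart: (String, Int) => Boolean =
  _.startsWith("start") && _ > 4
```

Both functions behave identically; each placeholder can be used only once, so a predicate that mentions the line twice needs the explicit form.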

Master

If you're not content with the predefined block markers (i.e., from, to, after, until, and before), you can roll your own:

/** The block begins two lines after the `predicate` matches. */
object twoLinesAfter extends MarkerFactory {
  override def apply(predicate: ((String, Int) => Boolean)) =
    BlockMarker(predicate, offset = +2)
}

Your custom marker is used like all the predefined ones shown above:

val blocksMaybe = parse(yourFile, twoLinesAfter("start"), to("end"))

If you're wondering why you could pass a simple string instead of the ((String, Int) => Boolean) predicate to twoLinesAfter, have a look at the MarkerFactory in Parser.scala.
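
One common way to get this behavior in Scala is a trait with overloaded apply methods that delegate to a single abstract one. The sketch below illustrates that technique only; it is not taken from Parser.scala. All names (SketchBlockMarker, SketchMarkerFactory, sketchTwoLinesAfter) are hypothetical, and matching a string by exact equality is an assumption.

```scala
// Hypothetical names throughout; this is not Sparse's MarkerFactory.
final case class SketchBlockMarker(
    predicate: (String, Int) => Boolean,
    offset: Int
)

trait SketchMarkerFactory {
  // The one method a concrete factory has to implement.
  def apply(predicate: (String, Int) => Boolean): SketchBlockMarker

  // Convenience overload: a predicate that ignores the line number.
  def apply(predicate: String => Boolean): SketchBlockMarker =
    apply((line: String, _: Int) => predicate(line))

  // Convenience overload: match a line verbatim (assumed semantics).
  def apply(exactLine: String): SketchBlockMarker =
    apply((line: String, _: Int) => line == exactLine)
}

object sketchTwoLinesAfter extends SketchMarkerFactory {
  override def apply(predicate: (String, Int) => Boolean): SketchBlockMarker =
    SketchBlockMarker(predicate, offset = 2)
}

// A plain string is accepted thanks to the overload in the trait.
val sketchedMarker = sketchTwoLinesAfter("start")
```

With this design, a custom factory overrides one method and gets the string and single-argument variants for free.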

Dependencies of Sparse
