# Files and Regular Expressions

## Reading files

In [1]:
import scala.io.Source

import scala.io.Source

In [6]:
val file = Source.fromFile("test.csv")

file: io.BufferedSource = non-empty iterator

In [7]:
val lines = file.getLines()
lines.length // counting consumes the iterator, so `lines` is reported as empty afterwards

lines: Iterator[String] = empty iterator
res6_1: Int = 419

In [8]:
file.close()

* Iterating over a `Source` object directly yields one character at a time.
* `Source.buffered` returns a buffered iterator, which can peek at the next element (`head`) without advancing.
* For small files, read the whole content into a string with `Source.mkString`.
* `Source.fromURL`, `Source.fromString`, and `Source.stdin` read input from other sources.
* Scala's own I/O support is limited; for most other tasks we fall back on Java standard packages such as `java.io` and `java.nio`.
* To read binary files, use the Java class `java.io.FileInputStream`.
* To write text files, use `java.io.PrintWriter`.
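The points above can be sketched as follows. This is a minimal example, assuming small text files; the helper names `writeLines`, `readAll`, and `leadingDigits` are hypothetical and chosen only for illustration.

```scala
import java.io.{File, PrintWriter}
import scala.io.Source

// Hypothetical helper: write lines with java.io.PrintWriter,
// closing the writer even if a write fails.
def writeLines(path: String, lines: Seq[String]): Unit = {
  val writer = new PrintWriter(new File(path))
  try lines.foreach(line => writer.println(line))
  finally writer.close()
}

// Hypothetical helper: read a small file into one string with mkString.
def readAll(path: String): String = {
  val source = Source.fromFile(path)
  try source.mkString
  finally source.close()
}

// Hypothetical helper: use a buffered iterator to peek at the next
// character (head) without consuming it.
def leadingDigits(text: String): String = {
  val chars = Source.fromString(text).buffered
  val digits = new StringBuilder
  while (chars.hasNext && chars.head.isDigit) digits += chars.next()
  digits.toString
}
```

The `try`/`finally` blocks make sure the file handle is released even when reading or writing throws.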

## Serialization

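* To make a class serializable, extend the `Serializable` trait (`scala.Serializable`; since Scala 2.13 this is an alias for `java.io.Serializable`).
* Scala collections are serializable.
* Serialize with `java.io.ObjectOutputStream` and deserialize with
`java.io.ObjectInputStream` (binary serialization).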
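A minimal round-trip sketch of the steps above, assuming an in-memory byte array instead of a file. The helper names `serialize` and `deserialize` are hypothetical; the example relies on the fact that Scala collections and tuples are serializable.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}

// Hypothetical helper: turn a serializable object into bytes.
def serialize(value: AnyRef): Array[Byte] = {
  val bytes = new ByteArrayOutputStream()
  val out = new ObjectOutputStream(bytes)
  try out.writeObject(value)
  finally out.close()
  bytes.toByteArray
}

// Hypothetical helper: rebuild an object from bytes.
// The cast is unchecked, so the caller must know the stored type.
def deserialize[T](data: Array[Byte]): T = {
  val in = new ObjectInputStream(new ByteArrayInputStream(data))
  try in.readObject().asInstanceOf[T]
  finally in.close()
}

val fruits = List("apple" -> 1, "banana" -> 20)
val restored = deserialize[List[(String, Int)]](serialize(fruits))
```

To persist to disk instead, the byte-array streams could be swapped for `FileOutputStream` and `FileInputStream`.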


## Process control

* `scala.sys.process` package provides shell utilities.

In [9]:
import scala.sys.process._

// the shell command strings below are implicitly converted to ProcessBuilder objects
// (the process package relies on Scala's implicit conversions)
// ! runs the command and returns its exit code
// 0 means success, non-zero means failure
"echo 'Hello'".!

Hello


import scala.sys.process._
res8_1: Int = 0

In [10]:
// to get the output as string use !!
val out = "echo 'hello'".!!
println(out)

hello



out: String = """hello
"""

In [11]:
// |, >> and > are already Scala operators (bitwise, shift, relational),
// so the shell versions are prefixed with #
// pipe
"echo 'Hello'".#|("cat").!

// redirect to file
"echo 'Hello'".#>(new java.io.File("output.txt")).!

// append to file
"echo 'World'".#>>(new java.io.File("output.txt")).!

// feed a file to the command's stdin
"cat".#<(new java.io.File("output.txt")).!

// combine shell command execution using &&
"echo 'Hello'".#&&("echo 'world'").!

Hello
Hello
World
Hello
world


res10_0: Int = 0
res10_1: Int = 0
res10_2: Int = 0
res10_3: Int = 0
res10_4: Int = 0

In [20]:
// Using Process object directly
val envs = List[(String, String)](
    ("MY_ENV", "123456789")
)
val p = Process("""
echo "hello"
""", cwd=None, envs:_*)
p.!

hello


envs: List[(String, String)] = List(("MY_ENV", "123456789"))
p: ProcessBuilder = [echo, hello]
res19_2: Int = 0
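Commands can also be given as a `Seq[String]`, where each element is passed as one argument. The sketch below assumes a Unix-like system with `echo` on the path; the variable name `GREETING` is made up for illustration.

```scala
import scala.sys.process._

// Each Seq element is one argument, so the space in
// "hello world" needs no shell-style quoting.
val greeting: String = Seq("echo", "hello world").!!

// Process(...) also accepts a Seq, plus an optional working directory
// and extra environment variables (GREETING is a made-up name).
val viaProcess: String =
  Process(Seq("echo", "hello"), None, "GREETING" -> "hi").!!
```

Note that `!!` throws an exception when the command exits with a non-zero code, whereas `!` simply returns that code.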

## Regular Expressions

* `scala.util.matching.Regex` is the regex class.
* `"some_regex_pattern".r` returns a `Regex` object.
* Prefer triple-quoted (raw) strings for patterns that contain backslashes, so the backslashes need no escaping.
* `findAllIn`, `findFirstIn`, `replaceFirstIn`, and `replaceAllIn` are
some of the useful methods.
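A short sketch of these methods on a pattern with backslashes. The sample string and the names `datePattern` and `log` are made up for illustration.

```scala
// Triple quotes keep \d as written; a normal string would need "\\d".
val datePattern = """\d{4}-\d{2}-\d{2}""".r
val log = "start 2024-01-15, end 2024-02-01"

val first: Option[String] = datePattern.findFirstIn(log)       // first match, if any
val all: List[String] = datePattern.findAllIn(log).toList      // every match
val firstMasked: String = datePattern.replaceFirstIn(log, "<date>") // replace first match only
val allMasked: String = datePattern.replaceAllIn(log, "<date>")     // replace every match
```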

In [22]:
val content = "1 apple, 20 banana, 10 watermelons"

// find quantities
val regexPattern = """[0-9]+""".r

for (m <- regexPattern.findAllIn(content))
    println(m)

val firstMatch: Option[String] = regexPattern.findFirstIn(content)
println(firstMatch)

1
20
10
Some(1)


content: String = "1 apple, 20 banana, 10 watermelons"
regexPattern: scala.util.matching.Regex = [0-9]+
firstMatch: Option[String] = Some("1")

In [23]:
// Suppose we need to mask all the quantities with XX
println(regexPattern.replaceAllIn(content, "XX"))

// suppose we need to double the quantities of every fruit
val newContent = regexPattern.replaceSomeIn(content,
    m => Some(s"${m.matched.toInt * 2}"))

println(newContent)

XX apple, XX banana, XX watermelons
2 apple, 40 banana, 20 watermelons


newContent: String = "2 apple, 40 banana, 20 watermelons"

## Regex groups

In [24]:
val pattern = """([0-9]+)\s+([a-zA-Z]+)""".r

for (matchedText <- pattern.findAllIn(content))
    println(matchedText)

1 apple
20 banana
10 watermelons


pattern: scala.util.matching.Regex = ([0-9]+)\s+([a-zA-Z]+)

In [26]:
// to fetch the matched groups we need to use findFirstMatchIn
// and findAllMatchIn methods
for (m <- pattern.findAllMatchIn(content)) {
    println(m.matched) // entire match
    println(m.group(0), m.start(0), m.end(0)) // zeroth group is entire match
    println(m.group(1), m.start(1), m.end(1)) // first group
    println(m.group(2), m.start(2), m.end(2)) // second group
}

1 apple
(1 apple,0,7)
(1,0,1)
(apple,2,7)
20 banana
(20 banana,9,18)
(20,9,11)
(banana,12,18)
10 watermelons
(10 watermelons,20,34)
(10,20,22)
(watermelons,23,34)
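The captured groups can be turned into structured data. The sketch below repeats the pattern and sample text under new, made-up names (`itemPattern`, `inventory`) so that it stands on its own. It also shows that a `Regex` can be used as an extractor in a `match`, in which case the pattern has to match the whole input string.

```scala
val itemPattern = """([0-9]+)\s+([a-zA-Z]+)""".r
val inventory = "1 apple, 20 banana, 10 watermelons"

// Build a fruit -> quantity map from the two capture groups.
val viaGroups: Map[String, Int] =
  itemPattern.findAllMatchIn(inventory).map(m => m.group(2) -> m.group(1).toInt).toMap

// Used as an extractor, the regex binds one variable per group.
val described: String = "20 banana" match {
  case itemPattern(quantity, fruit) => s"$quantity x $fruit"
  case _ => "no match"
}
```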
