# Designing your own functions

A *function* is a group of statements that produces a new value. Functions
are a fundamental part of Scala.

You've probably already seen examples of functions: we call them *methods* when they are defined for an entire class of objects. Strings, for example, have a `split` method that lets you split a String into an Array of Strings based on a pattern you define. You're not limited to these predefined functions, however: you can define your own functions using the `def` keyword.
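
As a minimal sketch of the `def` syntax, consider the hypothetical function below (`countNonSpaces` is an illustrative name, not part of this notebook). It declares one parameter with its type, declares the type of the result, and uses the last expression in its body as the value it produces.

```scala
// Hypothetical example: count the characters in a String that are not spaces.
// `def` introduces the function, `str: String` declares one parameter and its type,
// and `: Int` declares the type of the value the function produces.
def countNonSpaces(str: String): Int = {
  str.count(ch => ch != ' ')
}

countNonSpaces("Four score") // 9
```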

Consider the following cell, which uses the `split` method to split a text into an Array of words.


In [18]:
val gettysburg = """Four score and   seven years ago our fathers 
brought forth, on this continent, a new nation, 
conceived in Liberty, and dedicated to the proposition 
that all men are created equal."""

gettysburg.split(" ")

gettysburg: String = """Four score and   seven years ago our fathers 
brought forth, on this continent, a new nation, 
conceived in Liberty, and dedicated to the proposition 
that all men are created equal."""
res17_1: Array[String] = Array(
  "Four",
  "score",
  "and",
  "",
  "",
  "seven",
  "years",
  "ago",
  "our",
  "fathers",
  """
brought""",
  "forth,",
  "on",
  "this",
  "continent,",
  "a",
  "new",
  "nation,",
  """
conceived""",
  "in",
  "Liberty,",
  "and",
  "dedicated",
  "to",
  "the",
  "proposition",
  """
that""",
  "all",
  "men",
  "are",
  "created",
  "equal."
)

This is a handy first cut, but not exactly what we want. Punctuation marks are kept alongside alphabetic characters (`forth,`, `continent,`, `nation,`, `Liberty,`, `equal.`), as are other white-space characters such as the new line (notice the new-line character kept at the start of `brought`, `conceived` and `that`). Runs of more than one space character result in empty Strings (between `and` and `seven`). To split a String into exactly the kind of word units we want, we'll need to write our own function.



In [19]:
/** Divide a String into a Vector of words, original version.
*
* @param str String to split into words.
*/
def naiveTokens(str: String): Vector[String] = {
    str.split(" ").toVector
}

defined function naiveTokens

In [20]:
// Test naiveTokens:
naiveTokens(gettysburg)

res19: Vector[String] = Vector(
  "Four",
  "score",
  "and",
  "",
  "",
  "seven",
  "years",
  "ago",
  "our",
  "fathers",
  """
brought""",
  "forth,",
  "on",
  "this",
  "continent,",
  "a",
  "new",
  "nation,",
  """
conceived""",
  "in",
  "Liberty,",
  "and",
  "dedicated",
  "to",
  "the",
  "proposition",
  """
that""",
  "all",
  "men",
  "are",
  "created",
  "equal."
)

We can implement something better by regularizing white space and removing punctuation before we split the String.

In [None]:
/** Divide a String into a Vector of words, version 1.
*
* @param str String to split into words.
*/
def wordTokens1(str: String) : Vector[String] = {
    // Collapse each run of white space (spaces, tabs, new lines) into a single space.
    val regularWhiteSpace = str.replaceAll("[\\s]+", " ")
    // Drop everything except apostrophes, letters, and spaces, then split on spaces.
    val tokens = regularWhiteSpace.replaceAll("[^\\'a-zA-Z ]","").split(" ").toVector
    tokens
}

In [22]:
// Test this version:
wordTokens1(gettysburg)

res21: Vector[String] = Vector(
  "Four",
  "score",
  "and",
  "seven",
  "years",
  "ago",
  "our",
  "fathers",
  "brought",
  "forth",
  "on",
  "this",
  "continent",
  "a",
  "new",
  "nation",
  "conceived",
  "in",
  "Liberty",
  "and",
  "dedicated",
  "to",
  "the",
  "proposition",
  "that",
  "all",
  "men",
  "are",
  "created",
  "equal"
)
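
To see what each step of `wordTokens1` contributes, the sketch below applies the same two regular expressions one at a time to a short sample string. The names `sample`, `collapsed`, `lettersOnly`, and `words` are illustrative only and are not part of the notebook.

```scala
// Illustrative sample containing a run of spaces, a new line, and punctuation.
val sample = "forth,  on\nthis continent."

// Step 1: replace every run of white-space characters (spaces, tabs, new lines)
// with a single space.
val collapsed = sample.replaceAll("[\\s]+", " ")
// collapsed == "forth, on this continent."

// Step 2: delete every character that is not an apostrophe, a letter, or a space.
val lettersOnly = collapsed.replaceAll("[^\\'a-zA-Z ]", "")
// lettersOnly == "forth on this continent"

// Step 3: split on single spaces and convert the resulting Array to a Vector.
val words = lettersOnly.split(" ").toVector
// words == Vector("forth", "on", "this", "continent")
```

Because the apostrophe is included in the set of characters to keep, a contraction such as `It's` survives step 2 as a single token.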

In [None]:
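/** Divide a String into a Vector of words, version 2.
*
* @param str String to split into words.
* @param sorted True if the resulting words should be sorted
* alphabetically, ignoring case.
*/
def wordTokens2(str: String, sorted: Boolean = false) : Vector[String] = {
    val regularWhiteSpace = str.replaceAll("[\\s]+", " ")
    val tokens = regularWhiteSpace.replaceAll("[^\\'a-zA-Z ]","").split(" ").toVector
    if (sorted) {
        tokens.sortBy(tkn => tkn.toLowerCase)
    } else {
        tokens
    }
}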
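
The `sorted: Boolean = false` parameter of `wordTokens2` has a default value, so callers may leave it out. As a small sketch of how that works, the hypothetical function below (`emphasize` and its parameter names are made up for illustration) can be called with one argument, with two positional arguments, or with the second argument given by name.

```scala
// Hypothetical function with one required parameter and one optional parameter.
def emphasize(str: String, upper: Boolean = false): String = {
  if (upper) str.toUpperCase else str
}

emphasize("new nation")               // "new nation": `upper` falls back to false
emphasize("new nation", true)         // "NEW NATION": second argument given by position
emphasize("new nation", upper = true) // "NEW NATION": second argument given by name
```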

In [15]:
// `tip` is a String assumed to have been defined in an earlier cell.
wordTokens2(tip)
wordTokens2(tip, sorted = true)

res14_0: Vector[String] = Vector(
  "It's",
  "a",
  "long",
  "road",
  "to",
  "Tipperary",
  "and",
  "then",
  "some"
)
res14_1: Vector[String] = Vector(
  "a",
  "and",
  "It's",
  "long",
  "road",
  "some",
  "then",
  "Tipperary",
  "to"
)
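
The `sorted = true` branch of `wordTokens2` sorts by the lower-case form of each token rather than by the token itself. A brief sketch of why, using a hypothetical three-word Vector (`sampleWords` is an illustrative name): the default String ordering compares character codes, so every upper-case letter sorts before every lower-case one.

```scala
val sampleWords = Vector("long", "It's", "a")

// Default ordering: upper-case 'I' precedes all lower-case letters.
val byCharacterCode = sampleWords.sorted
// Vector("It's", "a", "long")

// Ordering used in wordTokens2: compare lower-case forms, keep the original spelling.
val caseInsensitive = sampleWords.sortBy(tkn => tkn.toLowerCase)
// Vector("a", "It's", "long")
```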