# Designing your own functions

A *function* is a group of statements that produces a new value. Functions
are a fundamental part of Scala.

You've probably already seen examples of functions: we call them *methods* when they are defined for an entire class of objects. Strings, for example, have a `split` method that lets you split up a String based on a pattern you define. The new value it produces is an Array of Strings. You're not limited to these predefined functions, however: you can define your own.

Consider a very common need when working with texts: we want to split a text into a list of words. We might start by defining "word" as "units separated by white space" and use the `split` method of the String class.

In [None]:
val gettysburg = """Four score and   seven years ago our fathers 
brought forth, on this continent, a new nation, 
conceived in Liberty, and dedicated to the proposition 
that all men are created equal."""

gettysburg.split(" ")

This is a handy first cut, but not exactly what we want. Punctuation marks are kept alongside alphabetic characters (`forth,`, `continent,`, `nation,`, `Liberty,`, `equal.`), as are other white-space characters like the newline (notice the newline character kept with `brought`, `conceived` and `that`). Runs of more than one space character result in empty Strings (between `and` and `seven`). To split up a String into exactly the kind of word units we want, we'll need to write our own function.
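To see the empty-String problem in isolation, here is a minimal sketch on a short made-up sample. The names `spaced`, `pieces` and `emptyCount` are introduced only for this illustration and are not used elsewhere in the notebook.

```scala
// Hypothetical sample: a run of three spaces between "and" and "seven".
val spaced = "and   seven"

// split(" ") treats every single space as a separator,
// so the two extra spaces each yield an empty String.
val pieces = spaced.split(" ").toVector

// Count how many of the resulting pieces are empty.
val emptyCount = pieces.count(_.isEmpty)
```

Here `pieces` should contain four elements, two of them empty, which is the behavior we want our own function to avoid.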

## Defining a function

The next cell illustrates the syntax of a function definition. It begins with the `def` keyword followed by the name of the function (here, `naiveTokens`). If the function needs any information to complete its task, define parameters between parentheses to supply that information. Our word-dividing function will need a String of characters to divide up, for example. We indicate that by naming a parameter and specifying its type (the parameter named `str` will be a String). After the parameter list, a colon introduces the type of value our function will create. Our result will be a Vector of String values.

We use the `=` sign to assign to this definition a series of statements grouped in curly brackets, the body of the function. The value of the last statement is the value that the function produces, or *returns*. Here, the body of the function has two steps. The first creates a value named `wordArray` by splitting on the space character, just like the preceding cell. The second converts `wordArray` to a Vector. This is the final statement of the body, so we will produce a Vector of String values, just as our function definition required.


In [None]:
/** Divide a String into a Vector of words, original version.
*
* @param str String to split into words.
*/
def naiveTokens(str: String): Vector[String] = {
    val wordArray = str.split(" ")
    wordArray.toVector
}

Test your new function.  Of course, we haven't improved on the `split` method yet, but we'll know that we have in fact defined a function that takes one String parameter and successfully creates a Vector of String values.

In [None]:
// Test naiveTokens:
naiveTokens(gettysburg)

## Using regular expressions for better word divisions

Our function needs to solve the two distinct problems we identified above:

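1.  runs of multiple spaces result in empty Strings, and other white-space characters like the newline stay attached to words
2.  punctuation marks are kept alongside the words, and we want to eliminate them.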

We can solve both problems with a similar approach.  First, we can replace all sequences of one or more white space characters (space, tab, newline...) with a single space character.  Then when we split on the space character, we shouldn't get any empty entries.

Second, we can eliminate punctuation characters by replacing them with an empty string.
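The two replacements can be traced one step at a time. This is a minimal sketch on a made-up sample; the names `messy`, `singleSpaced`, `lettersOnly` and `words` are hypothetical and exist only for this illustration.

```scala
// Hypothetical sample: a double space, a newline, and two punctuation marks.
val messy = "brought  forth,\non this continent."

// Step 1: collapse every run of white space into one space character.
val singleSpaced = messy.replaceAll("[\\s]+", " ")

// Step 2: delete every character that is not a letter, an apostrophe, or a space.
val lettersOnly = singleSpaced.replaceAll("[^\\'a-zA-Z ]", "")

// Step 3: split on the single space and convert the Array to a Vector.
val words = lettersOnly.split(" ").toVector
```

Keeping each intermediate result in its own named value makes it easy to inspect what each regular expression actually changed.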

In [None]:
/** Divide a String into a Vector of words, version 1.
*
* @param str String to split into words.
*/
def wordTokens1(str: String): Vector[String] = {
    val regularWhiteSpace = str.replaceAll("[\\s]+", " ")
    val tokens = regularWhiteSpace.replaceAll("[^\\'a-zA-Z ]","").split(" ").toVector
    tokens
}

In [None]:
// Test this version:
wordTokens1(gettysburg)

## Add a sorting option

In [None]:
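/** Divide a String into a Vector of words, version 2.
*
* @param str String to split into words.
* @param sorted If true, sort the words alphabetically, ignoring case;
* defaults to false.
*/
def wordTokens2(str: String, sorted: Boolean = false): Vector[String] = {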
    val regularWhiteSpace = str.replaceAll("[\\s]+", " ")
    val tokens = regularWhiteSpace.replaceAll("[^\\'a-zA-Z ]","").split(" ").toVector
    if (sorted) {
        tokens.sortBy(tkn => tkn.toLowerCase)
    } else {
        tokens
    }
}

In [None]:
wordTokens2(gettysburg)

In [None]:
wordTokens2(gettysburg, sorted = true)
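Because `sorted` has a default value of `false`, the first call can omit it, while the second names the parameter explicitly. The reason `wordTokens2` sorts by the lower-cased form of each word may be easier to see in a minimal sketch. The names `mixedCase`, `caseSensitive` and `caseInsensitive` are hypothetical and used only here.

```scala
// Hypothetical sample mixing upper- and lower-case initial letters.
val mixedCase = Vector("conceived", "Liberty", "and")

// The default String ordering compares character codes,
// so upper-case "L" sorts before lower-case "a".
val caseSensitive = mixedCase.sorted

// Sorting by the lower-cased form ignores case, as wordTokens2 does.
// The words themselves are left unchanged.
val caseInsensitive = mixedCase.sortBy(_.toLowerCase)
```

With the default ordering, `Liberty` would come first; sorting on the lower-cased key places it after `conceived`, which is usually what a reader expects from an alphabetical word list.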