# Grouping the content of Vectors

## Before reading this

You should first work through the notebooks on [named expressions and methods](named-expressions.ipynb), [Vectors](vectors.ipynb) and [Maps](maps.ipynb).


## A sample problem, and approach to a solution

We want to group together words in a text beginning with the same letter, and count how many words in the text begin with each letter.

One definition of a "word" is a unit of text separated from its neighbors by white space.  We can easily create a collection of words in this sense by applying the `split` method to a String.

If we group together all the words that begin with the same letter, we can count the words in each group, and sort the results alphabetically.

We'll test this out on the copy of [Lincoln's Gettysburg Address now in the White House and formerly in the possession of Colonel Alexander Bliss](http://www.abrahamlincolnonline.org/lincoln/speeches/gettysburg.htm).

In [None]:
val blissCopy = """Four score and seven years ago our fathers brought forth on this continent, a new nation, conceived in Liberty, and dedicated to the proposition that all men are created equal.

Now we are engaged in a great civil war, testing whether that nation, or any nation so conceived and so dedicated, can long endure. We are met on a great battle-field of that war. We have come to dedicate a portion of that field, as a final resting place for those who here gave their lives that that nation might live. It is altogether fitting and proper that we should do this.

But, in a larger sense, we can not dedicate -- we can not consecrate -- we can not hallow -- this ground. The brave men, living and dead, who struggled here, have consecrated it, far above our poor power to add or detract. The world will little note, nor long remember what we say here, but it can never forget what they did here. It is for us the living, rather, to be dedicated here to the unfinished work which they who fought here have thus far so nobly advanced. It is rather for us to be here dedicated to the great task remaining before us -- that from these honored dead we take increased devotion to that cause for which they gave the last full measure of devotion -- that we here highly resolve that these dead shall not have died in vain -- that this nation, under God, shall have a new birth of freedom -- and that government of the people, by the people, for the people, shall not perish from the earth.

"""

The `split` method creates an Array;  we'll convert this to a more convenient Vector.

In [None]:
val words = blissCopy.split(" ").toVector
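Note that splitting on a single space leaves line breaks attached to the neighboring words, so a paragraph break ends up inside one element. The following is a minimal sketch of one alternative, assuming we would rather split on any run of white space; the `sample` string is a made-up stand-in for `blissCopy`.

```scala
// Hypothetical short sample standing in for blissCopy.
val sample = "created equal.\n\nNow we are engaged"

// Splitting on a single space keeps the line breaks inside one element.
val bySpace = sample.split(" ").toVector
// Vector("created", "equal.\n\nNow", "we", "are", "engaged")

// The regular expression \s+ treats any run of white space as one separator.
val byWhitespace = sample.split("\\s+").toVector
// Vector("created", "equal.", "Now", "we", "are", "engaged")
```

The rest of this notebook keeps the simpler `split(" ")`, so counts for a few letters would differ slightly if you switched.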

Strings behave like sequences of characters (Chars), so we can use the `head` method to find the first character of a String, as in this example.

In [None]:
"Gettysburg Address".head

The Vector class has a method called `groupBy` that creates a Map.  The keys of the Map are the values you group things by;  the value associated with each key is a Vector with one or more elements of the original Vector.

The syntax looks a lot like the syntax you know for the `filter` and `map` methods on Vectors. On the left of the fat arrow, we supply a name that will be used for every element in the Vector.  On the right of the fat arrow, we define an expression whose result determines which group each element belongs to.  Here, we're grouping every word in the Vector by its first letter (`wrd.head`).

In [None]:
val groupedByLetter = words.groupBy( wrd => wrd.head)



As you can see, the result is a Map: its keys are Chars; its values are Vectors of Strings.  A Char is a single character.  We write Char values between single quotes. (Note the difference from Strings, which we write between double quotes.)
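The shape of the result is easier to see on a small input. Here is a minimal sketch, using a hypothetical `fewWords` Vector that is not part of the notebook's data:

```scala
// Hypothetical small Vector of words, for illustration only.
val fewWords = Vector("we", "can", "not", "consecrate", "we", "can", "not", "hallow")

// Group each word under its first character.
val fewGroups = fewWords.groupBy(wrd => wrd.head)
// A Map with four keys: 'w', 'c', 'n' and 'h'.
// Within each group, words keep the order they had in fewWords:
// 'c' -> Vector("can", "consecrate", "can")
```

The order in which the keys are displayed is not guaranteed, which is why we convert to a Vector before sorting later on.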

We can use normal Map notation to see what words begin with `'s'`, for example:

In [None]:
groupedByLetter('s')
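Looking up a key that is not in the Map throws an exception. As a hedged sketch of two safer lookups, assuming a small hypothetical Map named `tinyGroups` built the same way as `groupedByLetter`:

```scala
// Hypothetical small Map, built the same way as groupedByLetter.
val tinyGroups = Vector("so", "say", "shall", "the").groupBy(wrd => wrd.head)

// Direct lookup works when the key exists.
val sWords = tinyGroups('s')
// Vector("so", "say", "shall")

// getOrElse supplies a default when no word begins with the letter.
val zWords = tinyGroups.getOrElse('z', Vector.empty[String])
// Vector()

// get wraps the answer in an Option instead of throwing.
val maybeZ = tinyGroups.get('z')
// None
```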


To find out how many words occur for each letter, we only need to find the size of the Vector associated with each key.  We can map each key / value pair to a pairing of the key with the vector's size.

In [None]:
val groupedByCount = groupedByLetter.map{ case (k,v) => (k, v.size) }

The result is now a Map of characters to integers:  the number of words occurring for each letter.
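To check the key / value mapping by hand, it can help to run it on a Map small enough to read at a glance. This sketch assumes a hypothetical `smallGroups` Map written out literally:

```scala
// Hypothetical Map of first letters to words, written out by hand.
val smallGroups = Map(
  'w' -> Vector("we", "we"),
  'c' -> Vector("can", "consecrate", "can")
)

// Keep each key, and replace each Vector with its size.
val smallCounts = smallGroups.map { case (k, v) => (k, v.size) }
// Map('w' -> 2, 'c' -> 3)
```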

We would like to be able to sort the results easily, but Maps are unordered.  One step will convert the Map to a Vector.

In [None]:
val letterCounts = groupedByCount.toVector

We have turned the Map into a Vector of tuples.  Each tuple groups a character and an integer.

Let's sort the Vector by the first component of the tuple.

In [None]:
val alphabetic = letterCounts.sortBy(  tupl => tupl._1)

We could just as easily sort by the count.  To sort from most frequent to least frequent, we can reverse the default ordering, which runs from least to greatest.

In [None]:
val numericDescending = letterCounts.sortBy( tupl => tupl._2).reverse

`'t'`, `'a'` and `'w'` are the only letters that begin more than 20 words of the Gettysburg Address.
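Reversing a sorted Vector also reverses the order of elements with equal counts. As a small sketch of an alternative, assuming a hypothetical `tinyCounts` Vector, we can sort on the negated count instead:

```scala
// Hypothetical letter counts, for illustration only.
val tinyCounts = Vector(('a', 3), ('t', 5), ('w', 3), ('n', 1))

// Sort ascending, then reverse: tied letters come out in reversed order.
val viaReverse = tinyCounts.sortBy(tupl => tupl._2).reverse
// Vector(('t', 5), ('w', 3), ('a', 3), ('n', 1))

// Sort on the negated count: tied letters keep their original order.
val viaNegation = tinyCounts.sortBy(tupl => -tupl._2)
// Vector(('t', 5), ('a', 3), ('w', 3), ('n', 1))
```

Either approach puts the most frequent letter first; they differ only in how ties are arranged.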

## Summary

- The `groupBy` method of the Vector class creates a Map: each key is a result of the expression you define, and its associated value is a Vector of the elements from the original Vector that produce that result.
- We transformed each map value's Vector to its size using the Map class's `map` method.
- To simplify sorting, we converted the final Map to a Vector.
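
The steps summarized above can also be chained into a single expression. This is a minimal sketch on a hypothetical `shortText` String, not the full address, and it splits on runs of white space rather than on a single space:

```scala
// Hypothetical short text, for illustration only.
val shortText = "we can not dedicate we can not consecrate we can not hallow this ground"

val shortCounts = shortText
  .split("\\s+").toVector               // words separated by white space
  .groupBy(wrd => wrd.head)             // Map of first letter -> words
  .map { case (k, v) => (k, v.size) }   // Map of first letter -> count
  .toVector                             // Vector of (letter, count) tuples
  .sortBy(tupl => tupl._1)              // alphabetical by letter
// Vector(('c', 4), ('d', 1), ('g', 1), ('h', 1), ('n', 3), ('t', 1), ('w', 3))
```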