## Defining a Software Specification

With F#, we can choose between working in higher-level terms and working in lower-level details. When we can represent our application logic entirely in higher-level terminology, the result is a software specification: a shared understanding of what will be implemented, without going into the implementation details. This is useful for communicating with stakeholders while also keeping the specification true to the implementation.

There is a need to understand a subset of the F# language, but we can keep this to a minimum by explicitly defining types in understandable chunks and leaving out any implementation details. This gives us a balance of a shared language for our domain and a buildable specification.

We will work through an example of a simple feature using this technique. We will also take advantage of [interactive notebooks](https://github.com/dotnet/interactive) to compile the specification itself.

### Feature Requirement

Split a string into words and give a count of how often each word appears.

In [None]:
type WordCount = WordCount of Word:string * Count:int

type CountWords = string -> WordCount seq
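
To make the `WordCount` type concrete, here is a small sketch of building a value and taking it apart again with a pattern. The `sample` value and the `describe` helper are hypothetical names used only for illustration; they are not part of the specification.

```fsharp
// Repeated from the specification so the sketch stands alone.
type WordCount = WordCount of Word:string * Count:int

// A hypothetical value: the word "test" seen twice.
let sample = WordCount ("test", 2)

// Pattern matching in the parameter position unpacks both fields.
let describe (WordCount (word, count)) = sprintf "%s: %d" word count

let text = describe sample
```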

### There are a few steps to implement.

1. First we have to split the string.
2. Then any words will need to be normalized, removing punctuation and casing.
3. Then we have to iterate through and count the number of occurrences of each word.

In [None]:
type SplitIntoWords = string -> string seq

type NormalizeWords = string seq -> string seq

type CountEachWord = string seq -> WordCount seq

Now we have broken down our overall feature into the steps that make it up. We should check that our types fit together by composing the pieces.

In [None]:
/// Splits a string of text into words and creates a summary of the word counts.
let countWords
    (split:SplitIntoWords)
    (normalizeWords:NormalizeWords)
    (countEach:CountEachWord)
    : CountWords =
    split >> normalizeWords >> countEach

**Isn't this implementation?**

An implementation contains lower-level code and logic, the _implementation details_, and those are absent here. This defines one part of the specification in terms of other parts of the spec by composing them. It does actually compile, which gives us confidence that the pieces of our specification fit together, and it will keep us honest if we need to change the specification.
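
One way to see that the composition is only a statement about types is to feed it trivial stand-ins. The sketch below repeats the specification types so it stands alone; `stubSplit`, `stubNormalize`, and `stubCount` are hypothetical placeholders that satisfy the types without doing any real work.

```fsharp
// Types and composition repeated from the specification.
type WordCount = WordCount of Word:string * Count:int
type CountWords = string -> WordCount seq
type SplitIntoWords = string -> string seq
type NormalizeWords = string seq -> string seq
type CountEachWord = string seq -> WordCount seq

let countWords
    (split:SplitIntoWords)
    (normalizeWords:NormalizeWords)
    (countEach:CountEachWord)
    : CountWords =
    split >> normalizeWords >> countEach

// Hypothetical stand-ins: they type-check but contain no real logic.
let stubSplit : SplitIntoWords = fun input -> Seq.singleton input
let stubNormalize : NormalizeWords = id
let stubCount : CountEachWord = Seq.map (fun word -> WordCount (word, 1))

// The composed function has the CountWords type, whatever the parts do.
let stubbed : CountWords = countWords stubSplit stubNormalize stubCount
let result = stubbed "hello" |> Seq.toList
```

The stubs treat the whole input as a single word with a count of one, so the result says nothing about correctness. It only shows that any three functions of the right types can be plugged in.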

### Further Normalization

Normalizing the words could be broken down further into the normalization to do on each word.

* Removing the punctuation
* Changing them to lowercase

First, let's define a general type for normalizing a word.

In [None]:
type NormalizeWord = string -> string

Then define the types for the specific normalizations we want to do.

In [None]:
type RemovePunctuation = string -> string

type LowerCase = string -> string

Yes, these are the same type, but it's useful to break these out so that we can implement them separately and specify their composition. Let's do that now.

In [None]:
let normalizeWord
    (removePunctuation:RemovePunctuation)
    (lowerCase:LowerCase)
    : NormalizeWord =
    removePunctuation >> lowerCase
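
A caveat worth keeping in mind: `RemovePunctuation` and `LowerCase` are type abbreviations, so the compiler treats them as interchangeable. The sketch below, which uses the hypothetical stand-ins `stripBang` and `lower`, shows that swapping the arguments still compiles. The separate names document intent rather than enforce it.

```fsharp
// Repeated from the specification so the sketch stands alone.
type NormalizeWord = string -> string
type RemovePunctuation = string -> string
type LowerCase = string -> string

let normalizeWord
    (removePunctuation:RemovePunctuation)
    (lowerCase:LowerCase)
    : NormalizeWord =
    removePunctuation >> lowerCase

// Hypothetical stand-ins used only for this illustration.
let stripBang : RemovePunctuation = fun word -> word.Replace("!", "")
let lower : LowerCase = fun word -> word.ToLowerInvariant()

let intended = normalizeWord stripBang lower

// The arguments are swapped, yet this compiles, because both names
// are abbreviations for string -> string.
let swapped = normalizeWord lower stripBang

let a = intended "Hello!"
let b = swapped "Hello!"
```

Here both orderings happen to give `"hello"`, because these two steps do not interfere with each other. That will not hold for every pair of normalizations.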

Our spec actually takes `NormalizeWords` rather than `NormalizeWord` because it operates over a sequence of words. We can specify that here as well. Like the composition operator `>>`, `Seq.map` remains high level, so we can use it in our spec without getting into the implementation details of working with a collection.

In [None]:
let normalizeWords
    (normalizeWord:NormalizeWord)
    : NormalizeWords =
    Seq.map normalizeWord
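
As a quick illustration of how `Seq.map` lifts a single-word function to a sequence, the sketch below passes in a hypothetical `trimWord` normalizer (not part of the spec) and applies the result to two sample words.

```fsharp
// Repeated from the specification so the sketch stands alone.
type NormalizeWord = string -> string
type NormalizeWords = string seq -> string seq

let normalizeWords
    (normalizeWord:NormalizeWord)
    : NormalizeWords =
    Seq.map normalizeWord

// Hypothetical single-word normalizer used only for this illustration.
let trimWord : NormalizeWord = fun word -> word.Trim()

// The per-word function is applied to every element of the sequence.
let trimmed =
    [ " a "; "b " ]
    |> Seq.ofList
    |> normalizeWords trimWord
    |> Seq.toList
```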

## Implementation

At this point, we have a high-level specification that everyone can agree on and understand. By purposefully leaving out implementation specifics and keeping this in common domain terminology, we are able to reach a specification that is inclusive and relevant to everyone involved in the process.

We can now implement the specification, which is normally done in a development environment and not part of the specification. 

In [None]:
open System

// Implement the function types that were defined in the specification.

let split : SplitIntoWords =
    fun input ->
        // A null separator array makes Split treat whitespace as the delimiter.
        input.Split((null:char array),StringSplitOptions.RemoveEmptyEntries)

let countEach : CountEachWord =
    fun (words:string seq) ->
        query {
            for word in words do
            groupBy word into wordGroup
            let count = 
                query { 
                    for word in wordGroup do
                    select word
                    count
                }
            select (wordGroup.Key, count)
        }
        |> Seq.sortByDescending snd
        |> Seq.map WordCount

let removePunctuation : RemovePunctuation =
    fun word ->
        // Keep only ASCII letters, digits, spaces, and hyphens.
        System.Text.RegularExpressions.Regex.Replace(word, "[^A-Za-z0-9 -]", "")

let lowerCase : LowerCase =
    fun word -> word.ToLowerInvariant()

module Implementation =

    let normalizeWord : NormalizeWord = normalizeWord removePunctuation lowerCase
    let normalizeWords : NormalizeWords = normalizeWords normalizeWord
    let countWords : CountWords = countWords split normalizeWords countEach


The implementation gets into lower-level details, using language features like query expressions and framework features like the `String.Split` and `Regex.Replace` methods. Not only is it lower level, but it is no longer defined entirely in generic domain terms: it uses techniques and terminology that are exclusive to the implementor.
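
Because these details sit behind the specification's types, they can be swapped without touching the spec. As one illustration, here is a hypothetical alternative, `countEachAlt`, that uses `Seq.countBy` in place of the query expression. It is a sketch under the assumption that only the counts and the descending order matter, not a replacement the original code calls for.

```fsharp
// Repeated from the specification so the sketch stands alone.
type WordCount = WordCount of Word:string * Count:int
type CountEachWord = string seq -> WordCount seq

// Hypothetical alternative: Seq.countBy groups equal words and counts them.
let countEachAlt : CountEachWord =
    fun words ->
        words
        |> Seq.countBy id
        |> Seq.sortByDescending snd
        |> Seq.map WordCount

let counts = countEachAlt (seq [ "a"; "b"; "a" ]) |> Seq.toList
```

Since `countEachAlt` has the `CountEachWord` type, it could be passed to `countWords` in place of `countEach` with no other changes.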

But the big question remains: was our spec accurate, and does this actually work? Let's put the pieces together and try it.

In [None]:
let input = "This is a test of the emergency broadcast system. This is only a test."
Implementation.countWords input

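| index | Word | Count |
|---|---|---|
| 0 | this | 2 |
| 1 | is | 2 |
| 2 | a | 2 |
| 3 | test | 2 |
| 4 | of | 1 |
| 5 | the | 1 |
| 6 | emergency | 1 |
| 7 | broadcast | 1 |
| 8 | system | 1 |
| 9 | only | 1 |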


It worked fine on a sample where we can easily read all the data, so now for a real test.

In [None]:
let http = new System.Net.Http.HttpClient()
let input = http.GetStringAsync("http://catdir.loc.gov/catdir/samples/random045/2002031355.html").Result

Implementation.countWords input


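| index | Word | Count |
|---|---|---|
| 0 | the | 233 |
| 1 | and | 211 |
| 2 | a | 136 |
| 3 | i | 122 |
| 4 | to | 107 |
| 5 | was | 87 |
| 6 | of | 85 |
| 7 | that | 81 |
| 8 | it | 80 |
| 9 | you | 80 |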


## Summary

We've created a software specification by defining our application in high-level terminology that everyone can communicate in. Because the compiler understands it too, it can help us manage change.