# Structuring Data

In Excel you typically organize data in tables where each column is of a specific *type* like number, text string or date.  In R you use the *types* Logical, Numeric, Integer or Character.  Similarly, F# has *types*, and understanding the different kinds of types and when to use them is paramount to programming.

F# has a lot of built-in types such as `string`, `int` and `decimal`, and it also has container types like `array`, `list` and `seq` (sequence) that can contain other items.  The .NET framework itself also has a lot of types that you can use, such as the commonly used `DateTime` type.

## Values
You use the `let` keyword to create values of a type.

In [30]:
let theAnswer = 42 // An integer
let greeting = "Hello world" // A string
let pi = 3.141 // A double

Notice how you don't have to specify the type anywhere.  You just create the value and most of the time F# will figure out what type you intended.  F# will keep track of the types behind the scenes which is very useful when defining functions as you will see later.
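To make the type inference a bit more visible, here is a small sketch.  The names `inferredAnswer`, `annotatedAnswer` and `twice` are made up for this illustration.  It shows that an explicit type annotation is optional, and that F# also works out the type of a function from how its parameter is used.

```fsharp
// No annotation: F# infers int from the literal 42.
let inferredAnswer = 42

// The same value with an explicit (optional) type annotation.
let annotatedAnswer: int = 42

// F# infers that twice takes an int and returns an int,
// because x is multiplied by the int literal 2.
let twice x = x * 2

printfn "%d %d %d" inferredAnswer annotatedAnswer (twice 21)
```

If you hover over the names in an editor with F# support, you will typically see the inferred types without having written them yourself.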

Values in F# are *immutable* by default.  That means you can never change a value once it has been created.  If you try, you will get an error.

In [31]:
theAnswer <- 43

Unhandled exception: input.fsx (1,1)-(1,16) typecheck error This value is not mutable. Consider using the mutable keyword, e.g. 'let mutable theAnswer = expression'.
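The error message points to the `mutable` keyword.  As a minimal sketch (the names `counter`, `original` and `updated` are made up for illustration), this is what opting in to mutation looks like, next to the more common approach of simply binding a new value.

```fsharp
// Opting in to mutation with the mutable keyword.
let mutable counter = 1
counter <- counter + 1   // allowed because counter is mutable

// Without mutable, you create a new value instead of changing the old one.
let original = 42
let updated = original + 1   // original is still 42

printfn "%d %d %d" counter original updated
```

In most F# code you will rarely need `mutable`; creating new values from old ones is usually enough.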

The best part about types is that you can create your own.  For example, you can create *records*, *discriminated unions*, *anonymous types* and *tuples*.

## Records
You have already created your own type in the first example, where you created the `PersonPolicy` type.

In [32]:
type PersonPolicy = 
    {
        PersonId: string;
        PolicyNumber: string;
        Premium: decimal;
    }

As you can see, `PersonPolicy` is really just a combination of other types: `string` and `decimal`.  That way you can create an endless number of types by combining existing types.  This kind of type is called a *record* type.  It is also called a *product* type because its sample space is `string * string * decimal`.

In [33]:
let theAnswer = 42 // An integer
let greeting = "Hello world" // A string
let pi = 3.141 // A double
let pp = 
    {
        PersonId = "123";
        PolicyNumber = "Pol001";
        Premium = 10000m;
    }

> Why the `m` in line 8? The `m` after the premium amount tells F# that you want a `decimal` and not an integer.  If you remove the `m`, you will get a type error saying that a `decimal` was expected but an `int` was given, since the `Premium` field is of type `decimal`.
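The difference between the two literals can be sketched in a few lines.  The value names below are made up for this example; `decimal` is the built-in conversion function.

```fsharp
let asDecimal = 10000m          // decimal literal because of the m suffix
let asInt = 10000               // int literal

// F# does not silently convert an int to a decimal,
// so the conversion has to be written explicitly.
let converted = decimal asInt

printfn "%b" (asDecimal = converted)

// Decimals represent decimal fractions exactly,
// which is why they suit monetary amounts such as premiums.
printfn "%b" (0.1m + 0.2m = 0.3m)
```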

Since values are immutable, F# provides an easy way to copy a record while changing some of its fields.

In [34]:
let ppCopy = { pp with PolicyNumber = "Pol002"; Premium = 120m; }

display(pp)
display(ppCopy)

PersonId,PolicyNumber,Premium
123,Pol001,10000


PersonId,PolicyNumber,Premium
123,Pol002,120


## Discriminated Unions
Discriminated unions are a way of defining a type with mutually exclusive ways of creating values of that type.  It sounds weird but it is a really nice way to represent data.  Say for example that you have two policy systems in your company.  One is an old legacy system where policy numbers are represented as integers.  In the other, newer system, policy numbers are strings.  For this setup you might define the `PersonPolicy` type like so.

In [35]:
type PolicyNumber =
    | LegacyPolicyNumber of int
    | NewPolicyNumber of string
    
type PersonPolicy2 =
    {
        PersonId: string;
        PolicyNumber: PolicyNumber;
        Premium: decimal;
    }    

Discriminated unions are also called *sum types*.  That is because a value of a discriminated union type can be *either* of the options.  For example, a policy number can be *either* `LegacyPolicyNumber` or `NewPolicyNumber`.

Together with product types, sum types are called *Algebraic Data Types (ADT)*.

The `PersonPolicy2` type is used like so.

In [36]:
let legacyPolicyPerson = 
    {
        PersonId = "1";
        PolicyNumber = LegacyPolicyNumber(42);
        Premium = 1000m;
    }
    
let newPolicyPerson =
    {
        PersonId = "2";
        PolicyNumber = NewPolicyNumber("Pol01");
        Premium = 1200m;    
    }

display(legacyPolicyPerson.PolicyNumber)
display(newPolicyPerson.PolicyNumber)

Item,Tag,IsLegacyPolicyNumber,IsNewPolicyNumber
42,0,True,False


Item,Tag,IsLegacyPolicyNumber,IsNewPolicyNumber
Pol01,1,False,True


Using discriminated unions like that gives you complete control and type safety when handling data.  In the above example, the discriminated union ensures that there is no doubt whether you are holding a legacy policy number or a new policy number.  The type tells us what it is.  Later on you will learn how to use *matching* to handle discriminated unions.

Together with *pattern matching*, discriminated unions show their full power.  Pattern matching is similar to a switch statement in C# and other languages, but more powerful.  For one, pattern matching in F# is checked for *exhaustiveness*, which means that the F# compiler will warn you if you do not provide a case for each of the cases in your discriminated union.

Let us see a small example of what pattern matching looks like.

In [37]:
let printPolicyNumber policyNumber = 
    match policyNumber with
    | LegacyPolicyNumber i -> printfn "The legacy policy number is: %d" i
    | NewPolicyNumber s -> printfn "The new policy number is: %s" s
    
(LegacyPolicyNumber 123) |> printPolicyNumber
(NewPolicyNumber "Pol123") |> printPolicyNumber


The legacy policy number is: 123
The new policy number is: Pol123


The match expression starts in line 2 and each of the possible matches starts with the pipe ('|') character.  For each match the value is deconstructed so you can easily access the policy number value inside the discriminated union.  If you remove either line 3 or line 4, the F# compiler will complain with a warning that the pattern matches are incomplete.  This is very cool because if you one day decide to add another type of policy, the F# compiler will tell you where to fix the code.
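To illustrate what adding another type of policy could look like, here is a sketch with a separate type so that it does not interfere with the types above.  The type `ExtendedPolicyNumber`, its case names and the function `describePolicy` are all hypothetical.

```fsharp
// A hypothetical policy number type with a third case added.
type ExtendedPolicyNumber =
    | Legacy of int
    | Modern of string
    | Broker of string * int   // broker code and running number

// Every case is handled, so the compiler has nothing to complain about.
// Deleting one of the cases below would trigger the incomplete match warning.
let describePolicy policyNumber =
    match policyNumber with
    | Legacy i -> sprintf "Legacy policy %d" i
    | Modern s -> sprintf "New policy %s" s
    | Broker (code, n) -> sprintf "Broker policy %s-%d" code n

[ Legacy 123; Modern "Pol123"; Broker ("BR", 7) ]
|> List.iter (fun p -> printfn "%s" (describePolicy p))
```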

### Options
Probably one of the most important discriminated unions in F# is the [`Option`](https://docs.microsoft.com/en-us/dotnet/fsharp/language-reference/options) type.  An option can be either `Some` or `None` where `None` means that the value does not exist.  For example, you would use `None` if a value read from a CSV file or from a database is missing.  Option values are set like so.

In [38]:
let existingValue = Some(42)
let missingValue = None

We will get back to options later in this guide.
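Until then, here is a minimal sketch of how a missing value might be handled.  The helper functions `parsePremium` and `describePremium` are made up for this example and assume that an empty text field means "missing"; `Option.defaultValue` is part of the F# core library.

```fsharp
// Hypothetical helper: treat an empty field as a missing value.
let parsePremium (field: string) =
    if System.String.IsNullOrWhiteSpace field then None
    else Some (decimal field)

// Pattern matching forces you to handle both Some and None.
let describePremium premium =
    match premium with
    | Some p -> sprintf "Premium is %M" p
    | None -> "Premium is missing"

printfn "%s" (describePremium (parsePremium "250"))
printfn "%s" (describePremium (parsePremium ""))

// Option.defaultValue supplies a fallback when the value is missing.
let premiumOrZero = parsePremium "" |> Option.defaultValue 0m
```

Note that `decimal field` will throw an exception if the text is not a valid number, so a real import would need more careful parsing.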

Option is a *generic type* because it can be used for any other type.  It can be an option of int (like above), an option of double or whatever.  You define generic types with generic parameters which means that you define types that use other types as parameters.  For example, if you were to define your own option type it would look something like this.  Yes, it is all a bit weird but very cool.

In [39]:
// 'a means "generic type parameter"
type MyOption<'a> =
    | Some of 'a
    | None

let a = Some("abc") // a is of type MyOption<string> because we define 'a to be a string.
let b = None

printfn "%A" a
printfn "%A" b


Some "abc"
None
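Generic parameters are not limited to discriminated unions.  As a small sketch, the hypothetical record type `Labeled<'a>` below pairs a label with a value of any type, and the made-up function `describeLabeled` works for all of them.

```fsharp
// A hypothetical generic record: a label plus a value of any type 'a.
type Labeled<'a> = { Label: string; Value: 'a }

let labeledInt = { Label = "answer"; Value = 42 }          // Labeled<int>
let labeledText = { Label = "greeting"; Value = "hello" }  // Labeled<string>

// One function that works for every Labeled<'a>.
let describeLabeled (item: Labeled<'a>) =
    sprintf "%s = %A" item.Label item.Value

printfn "%s" (describeLabeled labeledInt)
printfn "%s" (describeLabeled labeledText)
```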


### Single Case Discriminated Unions

Let us say you have a function that creates a display name from a given name and a surname (we will get to functions in the next chapter).

In [40]:
let createDisplayName givenName surName =
    givenName + " " + surName
    
let a = "Jakob"
let b = "Christensen"
createDisplayName a b

Jakob Christensen

You may accidentally call it with the parameters swapped because both parameters are of type `string` and therefore interchangeable.

In [41]:
createDisplayName b a

Christensen Jakob

Obviously, that is not what we want.  To make it harder for the caller to make this mistake, you can introduce single case discriminated unions.

In [42]:
// GivenName and SurName are single case discriminated unions
type GivenName = GivenName of string
type SurName = SurName of string

// createDisplayName2 requires the parameters to be of type GivenName and SurName.
// "Deconstruct" givenName and surName to get the actual string values inside.
let createDisplayName2 (GivenName givenName) (SurName surName) =
    givenName + " " + surName
    
// "Construct" a GivenName and a SurName
let a2 = (GivenName "Jakob")
let b2 = (SurName "Christensen")
createDisplayName2 a2 b2

Jakob Christensen

If you accidentally switch the two arguments, you will get an error because the types `GivenName` and `SurName` are not considered the same by F#, even though they both contain strings.

In [43]:
createDisplayName2 b2 a2

Unhandled exception: input.fsx (1,20)-(1,22) typecheck error This expression was expected to have type
    'GivenName'    
but here has type
    'SurName'    
input.fsx (1,23)-(1,25) typecheck error This expression was expected to have type
    'SurName'    
but here has type
    'GivenName'    

If you want to get the value "inside" a single case discriminated union, you need to deconstruct it.  The function `createDisplayName2` above shows how to do that easily for function parameters.  If you want to deconstruct without doing it as a function parameter, it is a bit more cumbersome.  This is how it is done.

In [44]:
let (GivenName deconstructedGivenName) = a2
deconstructedGivenName

Jakob
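If you need to unwrap values in many places, one option is to put the deconstruction in a small helper function.  The sketch below repeats the `GivenName` type so that it can run on its own; the helper names `givenNameValue` and `givenNameLength` are made up for illustration.

```fsharp
type GivenName = GivenName of string

// Hypothetical helper that unwraps the string with a match expression.
let givenNameValue name =
    match name with
    | GivenName value -> value

// The same deconstruction done directly in the parameter list.
let givenNameLength (GivenName value) = value.Length

let jakob = GivenName "Jakob"
printfn "%s has %d letters" (givenNameValue jakob) (givenNameLength jakob)
```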

## Collections
A collection is a type that contains a list of other items.  Among others, F# has lists, arrays and sequences to contain other items in the same way that R has lists and arrays and Excel has table rows.  When you create a new collection, you tell F# what kind of items the collection contains.  It can be any type, like integers, strings or `PersonPolicy2` items.  It can also be a collection of collections.  All items in a collection have to be of the same type.

You have already seen an example of how to do that, like below where we create a list of `PersonPolicy2` items.

In [45]:
let data = 
    [ 
        { PersonId = "P1"; PolicyNumber = (NewPolicyNumber "Pol01"); Premium = 100m };
        { PersonId = "P1"; PolicyNumber = (NewPolicyNumber "Pol02"); Premium = 200m };
        { PersonId = "P2"; PolicyNumber = (NewPolicyNumber "Pol03"); Premium = 150m };
        { PersonId = "P3"; PolicyNumber = (NewPolicyNumber "Pol04"); Premium = 250m };
        { PersonId = "P3"; PolicyNumber = (NewPolicyNumber "Pol05"); Premium = 350m };
    ]
    
data

index,PersonId,PolicyNumber,Premium
0,P1,"{ FSI_0039+PolicyNumber+NewPolicyNumber: Item: Pol01, Tag: 1, IsLegacyPolicyNumber: False, IsNewPolicyNumber: True }",100
1,P1,"{ FSI_0039+PolicyNumber+NewPolicyNumber: Item: Pol02, Tag: 1, IsLegacyPolicyNumber: False, IsNewPolicyNumber: True }",200
2,P2,"{ FSI_0039+PolicyNumber+NewPolicyNumber: Item: Pol03, Tag: 1, IsLegacyPolicyNumber: False, IsNewPolicyNumber: True }",150
3,P3,"{ FSI_0039+PolicyNumber+NewPolicyNumber: Item: Pol04, Tag: 1, IsLegacyPolicyNumber: False, IsNewPolicyNumber: True }",250
4,P3,"{ FSI_0039+PolicyNumber+NewPolicyNumber: Item: Pol05, Tag: 1, IsLegacyPolicyNumber: False, IsNewPolicyNumber: True }",350


You have already seen in the first example some of the cool stuff you can do with the functions from the `List` module.  We will get back to that later on.
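Lists and arrays look very similar when you write them.  The following sketch uses made-up values to show the two literal syntaxes and a collection of collections.

```fsharp
// A list is written with [ ... ].
let premiumList = [ 100m; 200m; 150m ]

// An array is written with [| ... |] and supports fast access by index.
let premiumArray = [| 100m; 200m; 150m |]

// A collection of collections: a list of int lists.
let nested = [ [ 1; 2 ]; [ 3; 4; 5 ] ]

printfn "%M" (List.sum premiumList)
printfn "%M" premiumArray.[0]
printfn "%d" (nested |> List.map List.length |> List.sum)
```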

F# has a third collection type called a sequence.  A sequence is similar to a list, except that sequences are *lazily* evaluated, meaning that they can actually be infinite!  You can create sequences on the fly using a *sequence expression*.  For example you can create a sequence of squared numbers like so.

In [46]:
seq { for i in 1 .. 10 -> i * i }

index,value
0,1
1,4
2,9
3,16
4,25
5,36
6,49
7,64
8,81
9,100


Another way to create sequences is to *yield* elements. 

In [47]:
seq {
    yield { PersonId = "P1"; PolicyNumber = (NewPolicyNumber "Pol01"); Premium = 100m }
    yield { PersonId = "P1"; PolicyNumber = (NewPolicyNumber "Pol02"); Premium = 200m }
    yield { PersonId = "P2"; PolicyNumber = (NewPolicyNumber "Pol03"); Premium = 150m }
    yield { PersonId = "P3"; PolicyNumber = (NewPolicyNumber "Pol04"); Premium = 250m }
    yield { PersonId = "P3"; PolicyNumber = (NewPolicyNumber "Pol05"); Premium = 350m }    
}

index,PersonId,PolicyNumber,Premium
0,P1,"{ FSI_0039+PolicyNumber+NewPolicyNumber: Item: Pol01, Tag: 1, IsLegacyPolicyNumber: False, IsNewPolicyNumber: True }",100
1,P1,"{ FSI_0039+PolicyNumber+NewPolicyNumber: Item: Pol02, Tag: 1, IsLegacyPolicyNumber: False, IsNewPolicyNumber: True }",200
2,P2,"{ FSI_0039+PolicyNumber+NewPolicyNumber: Item: Pol03, Tag: 1, IsLegacyPolicyNumber: False, IsNewPolicyNumber: True }",150
3,P3,"{ FSI_0039+PolicyNumber+NewPolicyNumber: Item: Pol04, Tag: 1, IsLegacyPolicyNumber: False, IsNewPolicyNumber: True }",250
4,P3,"{ FSI_0039+PolicyNumber+NewPolicyNumber: Item: Pol05, Tag: 1, IsLegacyPolicyNumber: False, IsNewPolicyNumber: True }",350


You will find more information on how to create sequences in the [Microsoft documentation](https://docs.microsoft.com/en-us/dotnet/fsharp/language-reference/sequences).
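To see the lazy evaluation in action, here is a minimal sketch of an infinite sequence.  It uses `Seq.initInfinite` and `Seq.take` from the F# core library; the names `squares` and `firstFive` are made up for this example.

```fsharp
// An infinite sequence of squares: 1, 4, 9, 16, ...
// Nothing is computed until elements are actually requested.
let squares = Seq.initInfinite (fun i -> (i + 1) * (i + 1))

// Only the first five elements are ever evaluated.
let firstFive = squares |> Seq.take 5 |> Seq.toList

printfn "%A" firstFive   // [1; 4; 9; 16; 25]
```

Trying to turn `squares` itself into a list would never finish, which is why you first limit it with a function like `Seq.take`.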

You can create lists from sequences and vice versa using one of the functions `Seq.ofList`, `Seq.toList`, `List.ofSeq` and `List.toSeq`.
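A short sketch of these conversions, using made-up values:

```fsharp
let numbersList = [ 1; 2; 3 ]

// From list to sequence (Seq.ofList numbersList does the same).
let numbersSeq = List.toSeq numbersList

// From sequence back to list (List.ofSeq numbersSeq does the same).
let backToList = Seq.toList numbersSeq

// A sequence expression turned into a list.
let squaresList = seq { for i in 1 .. 3 -> i * i } |> List.ofSeq

printfn "%A" backToList
printfn "%A" squaresList
```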

## Anonymous Types
Anonymous types (called *anonymous records* in the F# documentation) are like records without a name.  They are very useful when transforming data.  Let us revisit the first example where we created a record type `PersonPremium` for summing up premiums for a person's policies.

```fsharp
type PersonPremium = 
    {
        PersonId: string;
        Premium: decimal;
    }
```

An anonymous type is defined with `{|` and `|}` like so.

In [48]:
let v = 
    {|
        PersonId = "123";
        Premium = 100m;
    |}
    
v

PersonId,Premium
123,100


So we can rewrite the original example using an anonymous type on the fly to hold the summed premiums, like so.

In [49]:
let groupedByPerson = 
    data
    |> List.groupBy (fun personPolicy -> personPolicy.PersonId)
    |> List.map (fun (personId, personPolicies) -> 
        {|  // Here begins the anonymous type
            PersonId = personId; 
            Premium = personPolicies |> List.sumBy (fun personPolicy -> personPolicy.Premium) 
        |})
    
groupedByPerson

index,PersonId,Premium
0,P1,300
1,P2,150
2,P3,600
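Values of an anonymous type are used much like ordinary records.  As a small sketch with made-up values, you read fields with the dot notation, and a copy expression can even add a new field (here the hypothetical `PolicyCount`).

```fsharp
let summary = {| PersonId = "P1"; Premium = 300m |}

// Fields are accessed just like record fields.
printfn "%s pays %M" summary.PersonId summary.Premium

// Copying an anonymous value while adding a new field.
let withCount = {| summary with PolicyCount = 2 |}

printfn "%s has %d policies" withCount.PersonId withCount.PolicyCount
```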


## Tuples
The last type we will discuss is the *tuple* type.  A tuple is a product type like records and anonymous types, except that a tuple has no named fields.  It is very easy to create a tuple.  All you have to do is write a comma-separated list of values, like so.

In [50]:
let theTuple = (23, "hello")
theTuple

Item1,Item2
23,hello


If you need to get the values inside the tuple, you "take it apart" by deconstructing it, i.e. with *pattern matching*.

In [51]:
let (theNumber, theString) = theTuple
display(theNumber)
display(theString)

hello
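For tuples with exactly two values, the built-in functions `fst` and `snd` are an alternative to deconstruction.  The sketch below also shows a made-up function, `minAndMax`, that returns two results at once as a tuple.

```fsharp
let theTuple = (23, "hello")

// fst and snd only work on tuples with exactly two values.
let number = fst theTuple
let text = snd theTuple

// Hypothetical function that returns two results as a tuple.
let minAndMax values = (List.min values, List.max values)

// Deconstruct the result directly when calling the function.
let (lowest, highest) = minAndMax [ 3; 1; 4 ]

printfn "%d %s %d %d" number text lowest highest
```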

You have already used tuples in the first example when you created the plots.  Tuples were used for the labels and values in the graphs.  In line 2 below we map to a list of tuples.

In [52]:
groupedByPerson
|> List.map (fun g -> g.PersonId, g.Premium)
|> Chart.Bar

Tuples are not limited to just two values.  You can put any number of values in a tuple.

In [54]:
open System // DateTime lives in the System namespace

let tupleWithSeveralValues = "hello", 42, 65.34, DateTime.Now

display(tupleWithSeveralValues)

let (greeting, theAnswer, someNumber, today) = tupleWithSeveralValues

display(greeting)
display(theAnswer)
display(someNumber)
display(today)

Item1,Item2,Item3,Item4
hello,42,65.34,2020-09-17 11:44:30Z


hello