# Structuring Data

In Excel you typically organize data in tables where each column is of a specific *type*, like number, text string or date.  In R you use vectors, matrices, lists and arrays, and your values are of *type* Logical, Numeric, Integer or Character.  Similarly, F# has *types*.

F# has a lot of built-in types such as `string`, `int` and `decimal`, and it also has container types like `array`, `list` and `seq` (sequence) that can contain other items.  The .NET framework itself also has a lot of types that you can use, such as the commonly used `DateTime` type.
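
As a rough illustration of the types mentioned above, the following sketch creates one value of each kind.  All value names are made up for this example.

```fsharp
open System

// Built-in types (the value names are hypothetical)
let policyHolder = "Jane Doe"      // string
let numberOfPolicies = 3           // int
let totalPremium = 1250.50m        // decimal

// Container types holding several items of the same type
let premiumArray = [| 100m; 200m; 150m |]   // array
let premiumList = [ 100m; 200m; 150m ]      // list
let numberSeq = seq { 1 .. 3 }              // seq (sequence) of 1, 2, 3

// A type that comes from .NET rather than from F# itself
let startDate = DateTime(2021, 1, 31)
```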

## Values
You use the `let` keyword to create values of a type.

In [19]:
let theAnswer = 42 // An integer
let greeting = "Hello world" // A string
let pi = 3.141 // A double

Notice how you don't have to specify the type anywhere.  You just create the value and most of the time F# will figure out what type you intended.  F# keeps track of the types behind the scenes, which is very useful when defining functions, as you will see later.
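
If you prefer to state the type yourself, you can add an optional type annotation.  The sketch below repeats the values with annotations; the names ending in `Annotated` and the `addOne` function are made up for this example.

```fsharp
// The same values with explicit (optional) type annotations
let theAnswerAnnotated: int = 42
let greetingAnnotated: string = "Hello world"
let piAnnotated: float = 3.141   // F# calls a double "float"

// Inference also works for functions: F# infers int -> int here,
// because x is added to the integer 1
let addOne x = x + 1
let next = addOne theAnswerAnnotated   // 43
```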

Values in F# are *immutable* by default.  That means you can never change a value once it has been created.  If you try, you will get an error.

In [20]:
theAnswer <- 43

Stopped due to error


Unhandled exception: input.fsx (1,1)-(1,16) typecheck error This value is not mutable. Consider using the mutable keyword, e.g. 'let mutable theAnswer = expression'.
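
The error message points to the `mutable` keyword.  A minimal sketch of both options follows: opting in to mutation, or (more commonly in F#) leaving the original value alone and creating a new one.  The names `counter` and `nextAnswer` are made up for this example.

```fsharp
// Opting in to mutation with the mutable keyword
let mutable counter = 42
counter <- 43          // allowed, because counter was declared mutable

// The more common alternative: create a new value from the old one
let theAnswer = 42
let nextAnswer = theAnswer + 1   // theAnswer is still 42
```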

The best part about types is that you can create your own.  You can create *records* and *discriminated unions*.

## Records
You have already created your own type in the first example, where you created the `PersonPolicy` type.

In [21]:
type PersonPolicy = 
    {
        PersonId: string;
        PolicyNumber: string;
        Premium: decimal;
    }

As you can see, `PersonPolicy` is really just a combination of other types: `string` and `decimal`.  In this way you can create an endless number of types by combining existing types.  This kind of type is called a *record* type.  It is also called a *product* type because the set of its possible values is the product `string * string * decimal`.
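
To make this concrete, here is a small sketch of working with a record value.  It repeats the type definition so that it can run on its own; `examplePolicy` and `discounted` are names made up for this example.

```fsharp
type PersonPolicy =
    {
        PersonId: string
        PolicyNumber: string
        Premium: decimal
    }

// Create a record value by giving every field a value
// (the m suffix marks a decimal literal)
let examplePolicy = { PersonId = "123"; PolicyNumber = "Pol001"; Premium = 10000m }

// Read a field with dot notation
let premium = examplePolicy.Premium

// Records are immutable, so "with" creates a modified copy instead
let discounted = { examplePolicy with Premium = examplePolicy.Premium * 0.9m }
```

After running this, `examplePolicy.Premium` is still `10000m`, while `discounted.Premium` is `9000m`.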

In [22]:
let theAnswer = 42 // An integer
let greeting = "Hello world" // A string
let pi = 3.141 // A double
let pp = 
    {
        PersonId = "123";
        PolicyNumber = "Pol001";
        Premium = 10000m;
    }

> Why the `m`? Notice the `m` after the premium amount in line 8 above?  The `m` tells F# that you want a decimal and not an integer.  If you remove the `m`, you will get an error saying that F# cannot convert the integer 10000 to a decimal, since the Premium field is of type decimal.

## Discriminated Unions
Discriminated unions are a way of defining a type with mutually exclusive ways of creating values of that type.  It sounds weird but it is a really nice way to represent data.  Say for example that you have two policy systems in your company.  One is an old legacy system where policy numbers are represented as integers.  In the other, newer system, policy numbers are strings.  For this setup you might define the types like so.

In [23]:
type PolicyNumber =
    | LegacyPolicyNumber of int
    | NewPolicyNumber of string
    
type PersonPolicy2 =
    {
        PersonId: string;
        PolicyNumber: PolicyNumber;
        Premium: decimal;
    }    

The `PersonPolicy2` type is used like so.

In [24]:
let legacyPolicyPerson = 
    {
        PersonId = "1";
        PolicyNumber = LegacyPolicyNumber(42);
        Premium = 1000m;
    }
    
let newPolicyPerson =
    {
        PersonId = "2";
        PolicyNumber = NewPolicyNumber("Pol01");
        Premium = 1200m;    
    }

display(legacyPolicyPerson.PolicyNumber)
display(newPolicyPerson.PolicyNumber)

Item
42


Item
Pol01


Using discriminated unions like that gives you complete control and type safety when handling data.  In the above example, the discriminated union ensures that there is no doubt whether you are holding a legacy policy number or a new policy number.  The type tells us what it is.  Later on you will learn how to use *matching* to handle discriminated unions.
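
As a small preview of matching, the sketch below handles each case of the union separately.  It repeats the `PolicyNumber` definition so that it can run on its own; `describePolicyNumber` is a hypothetical helper, not something defined elsewhere in this text.

```fsharp
type PolicyNumber =
    | LegacyPolicyNumber of int
    | NewPolicyNumber of string

// Hypothetical helper: turns either kind of policy number into text
let describePolicyNumber policyNumber =
    match policyNumber with
    | LegacyPolicyNumber number -> sprintf "Legacy policy %d" number
    | NewPolicyNumber text -> sprintf "New policy %s" text

let legacyText = describePolicyNumber (LegacyPolicyNumber 42)   // "Legacy policy 42"
let newText = describePolicyNumber (NewPolicyNumber "Pol01")    // "New policy Pol01"
```

If you leave out one of the cases, the compiler warns about an incomplete match, which is part of the type safety described above.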

### Options
Probably the most important discriminated union in F# is the [`Option`](https://docs.microsoft.com/en-us/dotnet/fsharp/language-reference/options) type.  An option can be either `Some` or `None` where `None` means that the value does not exist.  For example, you would use `None` if a value read from a CSV file or from a database is missing.  Option values are set like so.

In [25]:
let existingValue = Some(42)
let missingValue = None
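
Reading an option back out means deciding what to do in the `None` case.  The following sketch uses `Option.defaultValue` and `Option.map` from the F# core library; the value names are made up for this example, and `missingValue` is annotated so that it can run on its own.

```fsharp
let existingValue = Some 42
let missingValue: int option = None

// Fall back to a default when the value is missing
let existingOrZero = Option.defaultValue 0 existingValue   // 42
let missingOrZero = Option.defaultValue 0 missingValue     // 0

// Option.map transforms the value only if there is one
let doubled = Option.map (fun x -> x * 2) existingValue         // Some 84
let doubledMissing = Option.map (fun x -> x * 2) missingValue   // None
```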

### Single Case Discriminated Unions

Let us say you have a function that creates a display name from a given name and a surname (we will get to functions in the next chapter).

In [26]:
let createDisplayName givenName surName =
    givenName + " " + surName
    
let a = "Jakob"
let b = "Christensen"
createDisplayName a b

Jakob Christensen

You may accidentally call it with the arguments swapped, because both parameters are of type `string` and therefore interchangeable.

In [27]:
createDisplayName b a

Christensen Jakob

To make it harder for the caller to make this mistake, you can introduce single case discriminated unions.

In [28]:
// GivenName and SurName are single case discriminated unions
type GivenName = GivenName of string
type SurName = SurName of string

// createDisplayName2 requires the parameters to be of type GivenName and SurName.
// "Deconstruct" givenName and surName to get the actual string values inside.
let createDisplayName2 (GivenName givenName) (SurName surName) =
    givenName + " " + surName
    
// "Construct" a GivenName and a SurName
let a2 = (GivenName "Jakob")
let b2 = (SurName "Christensen")
createDisplayName2 a2 b2

Jakob Christensen

If you accidentally switch the two arguments, you will get an error because the types `GivenName` and `SurName` are not considered the same by F#, even though they both contain strings.

In [29]:
createDisplayName2 b2 a2

Stopped due to error


Unhandled exception: input.fsx (1,20)-(1,22) typecheck error This expression was expected to have type
    'GivenName'    
but here has type
    'SurName'    
input.fsx (1,23)-(1,25) typecheck error This expression was expected to have type
    'SurName'    
but here has type
    'GivenName'    

If you want to get the value "inside" a single case discriminated union, you need to deconstruct it.  The function `createDisplayName2` above shows how to do that easily for function parameters.  If you want to deconstruct without doing it as a function parameter, it is a bit more cumbersome.  This is how it is done.

In [30]:
let (GivenName deconstructedGivenName) = a2
deconstructedGivenName

Jakob
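
If you need the inner value in many places, two other ways to deconstruct may be handy.  This is only a sketch: `givenNameValue`, `viaHelper` and `viaMatch` are names made up for this example, and the type is repeated so the code can run on its own.

```fsharp
type GivenName = GivenName of string

// Hypothetical helper that deconstructs in its parameter, like createDisplayName2
let givenNameValue (GivenName value) = value

let a2 = GivenName "Jakob"
let viaHelper = givenNameValue a2   // "Jakob"

// The same thing with a match expression
let viaMatch =
    match a2 with
    | GivenName value -> value
```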

## Lists and Arrays
F# has lists and arrays to contain other items in the same way that R has lists and arrays and Excel has table rows.  Lists and arrays are also called collections.  When you create a new collection, you tell F# what kind of items the collection contains.  It can be any type, like integers, strings or `PersonPolicy2` items.  It can also be a collection of collections.  All items in a collection have to be of the same type.
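
A minimal sketch of the syntax, with value names made up for this example: lists use `[ ]`, arrays use `[| |]`, and items are separated by semicolons.

```fsharp
let premiumList = [ 100m; 200m; 150m ]       // decimal list
let premiumArray = [| 100m; 200m; 150m |]    // decimal array
let nested = [ [ 1; 2 ]; [ 3; 4; 5 ] ]       // a list of integer lists

// Items are looked up by zero-based index
let firstFromList = premiumList.[0]          // 100m

// Unlike lists, arrays can be changed in place
premiumArray.[0] <- 110m

// Mixing item types does not compile:
// let mixed = [ 1; "two" ]
```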

F# has a third collection type called a sequence.  A sequence is similar to a list, except that sequences can be *lazily* evaluated, meaning that they can actually be infinite!  Sequences are used more frequently than lists in F# but for now we will stick to lists because lists can more easily be created manually.  You have already seen an example of how to do that, like below where we create a list of `PersonPolicy2` items.

In [31]:
let data = 
    [ 
        { PersonId = "P1"; PolicyNumber = (NewPolicyNumber "Pol01"); Premium = 100m };
        { PersonId = "P1"; PolicyNumber = (NewPolicyNumber "Pol02"); Premium = 200m };
        { PersonId = "P2"; PolicyNumber = (NewPolicyNumber "Pol03"); Premium = 150m };
        { PersonId = "P3"; PolicyNumber = (NewPolicyNumber "Pol04"); Premium = 250m };
        { PersonId = "P3"; PolicyNumber = (NewPolicyNumber "Pol05"); Premium = 350m };
    ]
    
data

index,PersonId,PolicyNumber,Premium
0,P1,{ FSI_0027+PolicyNumber+NewPolicyNumber: Item: Pol01 },100
1,P1,{ FSI_0027+PolicyNumber+NewPolicyNumber: Item: Pol02 },200
2,P2,{ FSI_0027+PolicyNumber+NewPolicyNumber: Item: Pol03 },150
3,P3,{ FSI_0027+PolicyNumber+NewPolicyNumber: Item: Pol04 },250
4,P3,{ FSI_0027+PolicyNumber+NewPolicyNumber: Item: Pol05 },350


You have already seen in the first example some of the cool stuff you can do with the functions from the `List` module.  We will get back to that later on.
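
To illustrate the lazy evaluation of sequences mentioned earlier in this section, here is a small sketch using `Seq.initInfinite` and `Seq.take` from the F# core library.  The names `evenNumbers` and `firstFive` are made up for this example.

```fsharp
// An infinite sequence 0, 2, 4, ... (nothing is computed yet)
let evenNumbers = Seq.initInfinite (fun i -> i * 2)

// Only the first five items are ever evaluated
let firstFive = evenNumbers |> Seq.take 5 |> Seq.toList   // [0; 2; 4; 6; 8]
```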

## Anonymous Types
Anonymous types (called *anonymous records* in F#) are like records without a name.  They are very useful when transforming data.  Let us revisit the first example where we created a record type `PersonPremium` for summing up premiums for a person's policies.

```fsharp
type PersonPremium = 
    {
        PersonId: string;
        Premium: decimal;
    }
```

An anonymous type is defined with `{|` and `|}` like so.

In [32]:
let v = 
    {|
        PersonId = "123";
        Premium = 100m;
    |}
    
v

PersonId,Premium
123,100
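
Anonymous types behave much like named records once they are created.  The sketch below, with names made up for this example, reads a field, creates a modified copy, and compares two values.

```fsharp
let v = {| PersonId = "123"; Premium = 100m |}

// Fields are read with dot notation, just like on a named record
let personId = v.PersonId                      // "123"

// "with" creates a modified copy; v itself is unchanged
let updated = {| v with Premium = 200m |}

// Values with the same field names and types are compared field by field
let same = (v = {| PersonId = "123"; Premium = 100m |})   // true
```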


So we can rewrite the original example using an anonymous type on the fly to hold the summed premiums, like so.

In [33]:
let groupedByPerson = 
    data
    |> List.groupBy (fun personPolicy -> personPolicy.PersonId)
    |> List.map (fun (personId, personPolicies) -> 
        {|  // Here begins the anonymous type
            PersonId = personId; 
            Premium = personPolicies |> List.sumBy (fun personPolicy -> personPolicy.Premium) 
        |})
    
groupedByPerson

index,PersonId,Premium
0,P1,300
1,P2,150
2,P3,600
