Up and Running with F#
======================
A gentle guide to a powerful language

> ⚠️ This website is under construction!

[![Open in GitHub Codespaces](https://github.com/codespaces/badge.svg)](https://github.dev/johnW-ret/fstandsforfun)

In [1]:
#r "nuget: FSharp.Data"
#r "nuget: Plotly.NET"
#r "nuget: Plotly.NET.Interactive"

Loading extensions from `C:\Users\retru\.nuget\packages\plotly.net.interactive\4.2.1\lib\netstandard2.1\Plotly.NET.Interactive.dll`

In [2]:
open FSharp.Data

F# is composed of **expressions**:

In [3]:
// expressions separated by `;`s
1; 2; 3

// expressions separated by new lines
4
5
6

If you wrap expressions with `[]`, you get a **list**:

In [4]:
// This is a list.
[ 1; 2; 3 ]

// This is also a list.
[
    1
    2
    3
]

Lists are implemented as **linked lists**, which means the elements aren't contiguous (touching each other), but each element points to the next one, so traversing the list is like following points on a map.

You can write the same list above showing each addition like so:

In [5]:
// 1 points to 2 points to 3 points to []
1 :: 2 :: 3 :: []

The `::` operation is called a **cons**. Lists are nice when making new lists from other lists.
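
To make the "new lists from other lists" point concrete, here is a minimal sketch. The names `tail`, `whole`, and `describe` are illustrative and not part of this guide. Prepending with `::` reuses the existing list as the tail instead of copying it, and the same `::` shape can be used as a pattern to split a list back into its first element and the rest.

```fsharp
// An existing list we want to build on.
let tail = [ 2; 3 ]

// Prepending creates a new list whose first node points at `tail`;
// `tail` itself is left unchanged.
let whole = 1 :: tail

// `::` also works as a pattern that splits a list into head and rest.
let describe xs =
    match xs with
    | [] -> "empty"
    | head :: rest -> sprintf "starts with %d, then %d more" head (List.length rest)

printfn "%A" whole            // [1; 2; 3]
printfn "%A" tail             // [2; 3]
printfn "%s" (describe whole) // starts with 1, then 2 more
```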

If you wrap expressions with `[||]`, you get an **array**.

In [6]:
[| 1; 2; 3 |]

All elements of a list or array must be of the same **type**. So `[ 1; "a"; true ]` is not valid.

Arrays store their elements contiguously, which means they sit tightly packed next to each other in memory. If you tried to add another element to an array, it might not fit in the space allocated for it, which is why you can't.

In [7]:
// this creates a *new array*, which can be slow for large arrays
Array.append [| 1; 2; 3; |] [| 4; 5; 6 |]

Since array elements are tightly packed next to each other, you can get a specific element by its index very quickly, something not well suited for lists:

In [8]:
// 0th  1st  2nd
[| 997; 998; 999 |][1]
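
A small sketch of the difference, using the illustrative names `numbersArray` and `numbersList`. Both lookups return the same element, but the array jumps straight to position 1, whereas `List.item` has to walk the links from the head of the list.

```fsharp
let numbersArray = [| 997; 998; 999 |]
let numbersList = [ 997; 998; 999 ]

// Array indexing is a direct lookup.
let fromArray = numbersArray[1]

// List.item follows the links from the head until it reaches index 1.
let fromList = List.item 1 numbersList

printfn "%d %d" fromArray fromList // 998 998
```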

In F#, the `,` separates tuple elements, not collection elements. Tuples are useful for all kinds of things in F#, and the language comes with a terse syntax for representing them:

In [9]:
// two-ple
1, 2

// 4-ary tuple
3, 4, 5, 6

// Unlike collections (lists and arrays), tuples can hold elements of different types.
"Erica", 34, false

// Sometimes parentheses are required
(6, 7)

Item1,6
Item2,7
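
Getting values back out of a tuple is just as terse. A minimal sketch, with illustrative names: a tuple pattern on the left of `=` names each element, and the built-in `fst` and `snd` functions read the two halves of a pair.

```fsharp
let person = "Erica", 34, false

// A pattern on the left of `=` names each element; `_` ignores one.
let name, age, _ = person

// `fst` and `snd` only work on two-element tuples.
let point = 6, 7
let first = fst point
let second = snd point

printfn "%s is %d" name age   // Erica is 34
printfn "%d %d" first second  // 6 7
```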


We can use tuples and lists together to plot points:

In [10]:
open Plotly.NET

Chart.Point(
[ 
    1, 2
    2, 4
    3, 3
])

You can also create lists and arrays using the range operator `..`,

In [11]:
// start..end (both inclusive)
[ 1 .. 10 ]
// evaluates to [ 1; 2; 3; ...; 10 ]

//   ..step..
[| 5 .. -1 .. -5 |]

or by using sequence expressions:

In [12]:
[| for i in 1..10 -> i * i |]
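
Sequence expressions can do more than map each element. The sketch below (names are illustrative) uses the longer `do ... yield` form to filter, and nests two loops to produce pairs; `->` is shorthand for `do yield`.

```fsharp
// Keep only the even numbers from 1 to 10, then square them.
let evenSquares =
    [ for i in 1 .. 10 do
        if i % 2 = 0 then
            yield i * i ]

// Two nested loops produce every (x, y) combination.
let pairs =
    [ for x in 1 .. 2 do
        for y in 1 .. 2 -> x, y ]

printfn "%A" evenSquares // [4; 16; 36; 64; 100]
printfn "%A" pairs       // [(1, 1); (1, 2); (2, 1); (2, 2)]
```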

Let's use them together to plot the squares of the integers from 1 to 10:

In [13]:
open Plotly.NET

Chart.Line([ for x in 1 .. 10 -> x, x * x ])

F# supports all the built-in .NET types, as well as one called **`FSharpFunc`**. F# enables function constructs that C# cannot express, but because F# is built on top of the .NET Common Language Runtime, its designers needed a special .NET type to represent them. `FSharpFunc` is the name of that .NET type, but in F#, it's simply called a *function*.

In [14]:
0b00000101uy    // byte
2.              // float
2.0             // float
"abc"           // string
fun x -> x + 2  // int -> int

Types in F# can be thought of as domains, and functions as maps between those domains. Therefore, the way to read `int -> int` is "`int` mapped to `int`".

Functions aren't very useful unless we can *evaluate* them, which we do by *applying* them to arguments, like so:

In [15]:
(fun x -> x + 2) 3

Functions are very important in F#, as they underpin the theory of how expressions can be assigned names. If we wanted to start assigning names to expressions in an F# program, we could write our program like so:

In [16]:
(fun x ->   // define name (
    x + 2   //      rest-of-program-goes-here
) 3         // ) provide value `x` gets set to

As you can probably tell, this could easily get out of hand, as the argument is visually separated from the name it's bound to. The larger `rest-of-program-goes-here` gets, the farther apart they become.

Fortunately, we can rewrite this use of *functions* with a **`let` expression**.

In [17]:
let x = 3       // define name = value `x` gets set to
x + 2           //      rest-of-program-goes-here (that uses `x`)

This visually looks a lot cleaner!

Assigning a name to an expression in F# is called a *binding*, because the value can't change once set. Using `=` without a `let` compares the equality of two objects.

In [18]:
let a = 3
a = 4

We can move the body up into the same line with `in`:

In [19]:
let x = 3 in x + 2

And assign the whole `let` expression to another one:

In [20]:
let five = let x = 3 in x + 2
five

Rewriting our inner `let` binding back to a `fun` looks like this:

In [21]:
let five = (fun x -> x + 2) 3
five

We can remove the `3` to delay binding the parameter to our function:

In [22]:
let add2 = fun x -> x + 2
add2 3

We can rewrite the above function by moving `x` to the left of the `=`. This results in the same behavior.

In [23]:
let add2 x = x + 2
add2 3

We can replace `x + 2` with another `fun`, one that introduces a parameter `y` and uses it in tandem with `x` (an inner function that captures a name from its surroundings like this is called a **closure**):

In [24]:
let add x = fun y -> x + y
add 3 2

We can also move `y` to the left of `=`:

In [25]:
let add x y = x + y
add 3 2

Practically, we've come across a function `add` that can take not just one parameter, but two! It may not surprise you that we can keep repeating this process to allow for many parameters. Under the hood, a function that takes multiple parameters, like `add`, is still a chain of nested single-parameter `fun`s (this is called **currying**), which means we can bind it to a name without supplying all of its arguments:

In [26]:
let add2 = add 2
add2 3 

This feature is called **partial application**. On its surface, it's nice for adding convenient names to helper functions. In practice, it can allow you to express complexity using simple, modular pieces:

In [27]:
let combine f check x y =
    f <|| check x y

let id2 x y = x, y

let add = combine (+) id2
add 3 5 |> printfn "%d"

let divideThen check = combine (/) check
let safeDivide = divideThen (fun x y -> if y = 0 then x, nan else x, y)

safeDivide 4 0 |> printfn "%f"

8
NaN
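
The cell above leans on two operators this guide has not introduced yet, so here is a minimal sketch of each. The names `viaPipe`, `viaTuple`, `plus`, `addTen`, and `shifted` are illustrative. `|>` passes the value on its left to the function on its right, and `<||` applies the function on its left to the two elements of the tuple on its right, which is how `combine` feeds the pair returned by `check` into `f`.

```fsharp
// `x |> f` is the same as `f x`.
let viaPipe = 5 |> (fun n -> n + 2)

// `f <|| (a, b)` is the same as `f a b`.
let viaTuple = (+) <|| (3, 5)

// Partial application fits naturally into a pipeline:
// `addTen` is `plus` with its first argument already supplied.
let plus x y = x + y
let addTen = plus 10
let shifted = [ 1; 2; 3 ] |> List.map addTen

printfn "%d" viaPipe  // 7
printfn "%d" viaTuple // 8
printfn "%A" shifted  // [11; 12; 13]
```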


F# is different from languages like Python and JavaScript in that it has a strong opinion on what is a **type**.

Types are rules that the compiler (what checks and builds your code) uses to check you're writing a correct program.

Let's start with a simple exercise comparing with Python:

In [28]:
#!connect jupyter --kernel-name pythonkernel --conda-env base --kernel-spec python3

The `#!connect jupyter` feature is in preview. Please report any feedback or issues at https://github.com/dotnet/interactive/issues/new/choose.

Kernel added: #!pythonkernel

We create a function that prints the combined age of two people:

In [29]:
def add_person_age(person1, person2):
    print(f"{person1.name} and {person2.name}'s combined age is {person1.age + person2.age}")

We construct two objects that each contain the attributes our function uses:

In [30]:
person1 = type("", (), {})
person1.name, person1.age = "Rebecca", 23
person2 = type("", (), {})
person2.name, person2.age = "Eric", 27

We can also explicitly define the type using a class:

In [31]:
class Person:
    # optionally write an __init__ function to instead pass attribute values to a constructor
    pass

person1 = Person()
person1.name, person1.age = "Rebecca", 23
person2 = Person()
person2.name, person2.age = "Eric", 27

And then call the method with our objects:

In [32]:
add_person_age(person1, person2)

# ❌ this code runs, but the call will always raise an error
# person2.age = None
# add_person_age(person1, person2)

# ❌ this code also runs, but the call will always raise an error
# person3 = type("", (), {})
# person3.name = "Eric"
# add_person_age(person1, person3)

Rebecca and Eric's combined age is 50


This is a concise example of how object programming works in Python. Here, `person1` works a lot like a dictionary (or hashmap... or JSON object... or key-value pair set... all the same thing). You can easily assign attributes to an object, then use them later.

However, nothing exists in the language to tell you anything about the structure of `person1`. What if `age` got set to `None`, but you forgot to change it or to check for `None`?

In F#, objects (or better called *records*, in this case) don't work like dictionaries: you can't arbitrarily assign attributes. Everything has to be immutable, remember? On the flip side, this makes it easy to know exactly what lives on a record once it's created. When writing code, you'll get IntelliSense to pop up and show possible attributes, and accessing an invalid attribute means your code won't compile, helping you catch errors before your code is run.

An alternative in F# to our Python object programming example uses something called **Statically Resolved Type Parameters**. Don't mind the fancy name: the *static* part just means your code is checked before it runs.

Here's our function definition:

In [33]:
let inline printPerson (person: ^T) = // define `person` as some generic type `^T`
    printfn "%s's age is %d"
        // Both *access* the properties we want and *define* them on ^T at the same time!
        (^T: (member Name : string) person) // this adds a *constraint* requiring a `Name` of type string, and resolves to it on `person`
        (^T: (member Age : int) person) // like person.Age, but changes ^T to require `Age : int`

We can define a record of *unspecified type* called an **anonymous record** with the following syntax:

In [34]:
{| |} // empty

{| A = 3; B = "poyo!!" |}

{| C = 2.0; AnotherStruct = {| |} |}

Using anonymous records, we can evaluate `printPerson`:

In [35]:
open System.Drawing

printPerson {| Name = "Jerry"; Age = 31 |}
printPerson {| Name = "Rebecca"; FavoriteColor = Color.RebeccaPurple; Age = 12 |}
// printPerson {| Name = "Rebecca" |} ❌ uncomment this and it won't compile

Jerry's age is 31
Rebecca's age is 12


Effectively, this means that we can pass records with whatever stuff we want in them, as long as they at least have the properties we need (and those properties can't be null). 

We can mirror our `add_person_age` Python example below:

In [36]:
let inline addPersonAge (person1: ^T) (person2: ^T) =
    printfn "%s and %s's combined age is %d"
        (^T: (member Name : string) person1)            // here we add the constraint to ^T and use it on `person1`
        person2.Name                                    // the constraint already exists on ^T, so we can just use it on `person2`
        ((^T: (member Age : int) person1) + person2.Age)// same strategy here

Defining the constraints in the body of `addPersonAge` looks a bit funny when we have more than one parameter, so we can also define constraints from the "top down" at the top of the function.

In [37]:
// 'T is type `requires member Name and member Age`
let inline addPersonAge<'T
    when 'T : (member Name : string)
    and 'T : (member Age : int)>(person1: ^T) (person2: ^T) =
    printfn "%s and %s's combined age is %d"
        person1.Name
        person2.Name
        (person1.Age + person2.Age)

addPersonAge {| Name = "Joseph"; Age = 9 |} {| Name = "Kerry"; Age = 27 |}

Joseph and Kerry's combined age is 36


This style of programming is useful when you want to hack something together that works while still preventing yourself from making stupid errors*.

> Note ℹ️
> 
> *While SRTPs can be used in this way, they are more often used in libraries to assert invariants about types at compile time, which allows for abstract programming while maintaining high performance. For example, they enable operators such as `(+)` to work with different numeric types (`0.3 + 0.4` and `4 + 3`). This is why `let` bindings that use them require the `inline` keyword. See more [here](https://learn.microsoft.com/en-us/dotnet/fsharp/language-reference/generics/statically-resolved-type-parameters).
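
As a minimal sketch of that library-style use, the hypothetical function `twice` below works for any type that supports `+`. Without `inline`, the compiler would have to pin `x` to one concrete type; with it, the requirement is checked separately at each call site.

```fsharp
// `inline` lets the `+` requirement be resolved per call site.
let inline twice x = x + x

let fromInt = twice 2       // int
let fromFloat = twice 0.5   // float

printfn "%d" fromInt     // 4
printfn "%.1f" fromFloat // 1.0
```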

In practice, it's often easier to define **types** that constrain objects to be the *exact shape* of what you are expecting.

In [38]:
// this code defines a record type called `Person` which contains a `Name` of type `string` and an `Age` of type `int`
type Person = { Name : string; Age : int }

// to construct a named record, we use the following syntax (no `||`s)
let _ : Person = { Name = "John";  Age = 45 }

// notably, as long as we've defined `type Person` above, we *don't* need to rewrite `Person` again! The compiler just knows.
let examplePerson = { Name = "Don";  Age = 12 }

{| Name = nameof examplePerson; ``Type Name`` = examplePerson.GetType().Name; Value = examplePerson |}
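
Because a record can't be changed after it is created, "updating" one means building a new record. A minimal sketch using F#'s copy-and-update syntax (`with`); the `Book` type and its fields are made up for this example so they don't collide with `Person`.

```fsharp
type Book = { Title : string; Pages : int }

let draft = { Title = "Up and Running"; Pages = 10 }

// Copy every field from `draft`, replacing only `Pages`.
let revised = { draft with Pages = 42 }

// `draft` is untouched.
printfn "%d %d" draft.Pages revised.Pages // 10 42

// Records compare by value, field by field.
printfn "%b" (revised = { Title = "Up and Running"; Pages = 42 }) // true
```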

Using `Person`, we can rewrite `addPersonAge` above:

In [39]:
let addPersonAge person1 person2 =
    printfn "%s and %s's combined age is %d"
        person1.Name
        person2.Name
        (person1.Age + person2.Age)

addPersonAge { Name = "Joseph"; Age = 9 } { Name = "Kerry"; Age = 27 }

Joseph and Kerry's combined age is 36


As you can see, when we've pre-defined our types, our example looks a lot more "pythonic". In the above code cell, we didn't have to annotate our code with a single type, but we still get the same level of type safety we'd get with a language like C#. This is called **type inference**.

We *did* still have to define the structure of `Person` up front though. This is called **domain modeling**, and it's useful when you are writing an application to model a business function and want to reduce the number of possible error states to a minimum.

Sometimes, however, we're not writing an application but instead writing a script that deals with large amounts of data, and our stupid mistakes might come from accidentally misinterpreting the structure of our data.

F# has a feature for this called **type providers**. Whenever we're dealing with external data imports, type providers create a type for us based on the structure of the data we're importing, meaning we don't have to manually create types for huge data sets and our data and types never get out of sync.

Here's a quick example using the FSharp.Data WorldBank type provider that I took from [their documentation](https://fsprojects.github.io/FSharp.Data/library/WorldBank.html#Using-World-Bank-data-asynchronously):

In [40]:
let data = WorldBankData.GetDataContext()

data.Countries.``United Kingdom``.Indicators.``Gross capital formation (% of GDP)``
|> Seq.maxBy fst

Item1,2022.0
Item2,18.6020102387308


The WorldBank provider is a premade example where a type is specifically created and republished using a well-known data source. But we can also use "data type" (CSV, JSON, SQL, etc.) type providers to create types for our own data:

In [41]:
// define our data source
[<Literal>]
let uri = "http://query1.finance.yahoo.com/v7/finance/download/MSFT?period1=1678116713&period2=1709739113&interval=1d&events=history&includeAdjustedClose=true"

// create a type using it
// (normally, we'd pass a smaller and local file with the same structure here, but passing the same `uri` is fine for example / notebooks)
type Stocks = CsvProvider<uri>

// get a sample of the data from the new type using the default data source
let msft = Stocks.GetSample()

// plot the high vs low daily difference over time
Chart.Line(
    xy = [ for row in msft.Rows -> row.Date, row.High - row.Low ]
)
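
The same idea works for other formats. Here is a minimal sketch with FSharp.Data's `JsonProvider`, which accepts an inline sample document in place of a file or URL. The `Simple` type name and the sample values are illustrative. The provider infers `Name : string` and `Age : int` from the sample, and `Parse` then reads any document with the same shape.

```fsharp
#r "nuget: FSharp.Data"
open FSharp.Data

// The string literal is the *sample* the property types are inferred from.
type Simple = JsonProvider<""" { "name": "John", "age": 94 } """>

// Parse a different document with the same structure.
let parsed = Simple.Parse(""" { "name": "Tomas", "age": 4 } """)

printfn "%s is %d" parsed.Name parsed.Age // Tomas is 4
```

A misspelled property such as `parsed.Nmae` fails at compile time, just as it would on a hand-written record.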

Working with type providers can sometimes be less convenient than dynamically typed data access libraries like `pandas`'s `DataFrame` when your data isn't well structured.

For example, if you had data with similar structure as above (`Date`, `Open`, `Adj Close`, etc., columns) but for multiple companies all in one CSV file, they might be addressed with the name of the company first, then the column name, such as `MSFT_Date`, for example.

You could access this data in an unsafe way using programmatic access, such as with f-strings in Python (`frame[f"{ticker}_{column_name}"]`), but F# type providers would have no knowledge of the implicit structure via column naming because CSV does not allow for multi-level indexing unlike other data types like JSON. In fact, iterating over all or a subset of columns from a `CsvProvider` type requires a hack, and you'd be better off importing into a semi-strongly-typed `DataFrame` using a package called [Deedle](https://fslab.org/Deedle) than using type providers. 

However, when you need to access a few named columns of homogeneous types, it's actually not too difficult to work with the data provided from type providers as pure collections and not use a data frame at all:

In [42]:
let resample (interval : TimeSpan) (observations : (DateTime * decimal array) seq) =
    let groups = observations |> Seq.groupBy (fun (date, _) -> DateTime((date.Ticks / interval.Ticks) * interval.Ticks))
    
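    // Note: reducing pairwise weights later rows more heavily, so for groups of
    // more than two rows this is a running average, not the arithmetic mean.
    let flattenByAverage =
        Seq.reduce (fun acc next ->
            (acc |> Array.zip next)
            |> Array.map (fun (a, b) -> (a + b) * decimal 0.5)
        )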

    let flattenByTakeFirst = Seq.head

    groups
    |> Seq.map (fun (key, group) ->
        key,
        let rows = group |> Seq.map snd
        flattenByAverage rows
    )

let print observations = 
    observations
    |> Seq.map (fun (date, (cols : 'a array)) -> {| Date = date; Low = cols[0]; High = cols[1] |})
    |> Array.ofSeq
    |> _.DisplayTable()

msft.Rows
    |> Seq.map (fun row -> row.Date, [| row.Low; row.High |])
    |> resample (TimeSpan.FromDays 7)
    |> print

Date,High,Low
2023-03-06 00:00:00Z,255.4656199375,249.8818779375
2023-03-13 00:00:00Z,276.5512450625,267.7306269375
2023-03-20 00:00:00Z,280.2400038125,274.1731263125
2023-03-27 00:00:00Z,285.7424945000,280.8943768125
2023-04-03 00:00:00Z,290.167492125,282.947505875
2023-04-10 00:00:00Z,288.5650063125,283.2793785000
2023-04-17 00:00:00Z,287.9837437500,284.1906227500
2023-04-24 00:00:00Z,303.6206265000,296.6893751250
2023-05-01 00:00:00Z,310.1125010625,304.0624923750
2023-05-08 00:00:00Z,310.9949970625,306.5987567500
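
Two pieces of `resample` are worth seeing in isolation. The sketch below uses hypothetical helpers `floorTo` and `columnMeans`, which are not part of this guide's code. `floorTo` is the bucketing trick used inside `Seq.groupBy`: integer division by the interval's tick count discards the remainder, so every date in the same 7-day window maps to the same key. `columnMeans` is one possible alternative to `flattenByAverage`; since `Seq.reduce` averages pairwise, the original weights later rows more heavily, whereas this version takes the plain arithmetic mean of each column. The sample rows are the `Low` and `High` values for 6 to 10 March 2023 from the same MSFT data.

```fsharp
open System

// Round a date down to the start of its interval-sized bucket.
let floorTo (interval : TimeSpan) (date : DateTime) =
    DateTime((date.Ticks / interval.Ticks) * interval.Ticks)

// Arithmetic mean of each column across a group of rows.
// Assumes at least one row and that every row has the same length.
let columnMeans (rows : decimal array seq) =
    let rows = Array.ofSeq rows
    let width = rows[0].Length
    Array.init width (fun col -> rows |> Array.averageBy (fun row -> row[col]))

let week = TimeSpan.FromDays 7.0

// A Wednesday and a Friday both land in the bucket starting Monday, 6 March 2023.
printfn "%b" (floorTo week (DateTime(2023, 3, 8)) = DateTime(2023, 3, 6))  // true
printfn "%b" (floorTo week (DateTime(2023, 3, 10)) = DateTime(2023, 3, 6)) // true

// Each row is [| Low; High |].
let firstWeek =
    [ [| 255.979996m; 260.119995m |]
      [| 253.389999m; 257.690002m |]
      [| 250.809998m; 254.539993m |]
      [| 251.580002m; 259.559998m |]
      [| 247.600006m; 252.789993m |] ]

let means = columnMeans firstWeek
printfn "%M" means[0] // mean Low: 251.8720002
```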


If you do need access to a data frame, it's straightforward to import data into a `DataFrame` using Deedle and perform operations on it. You won't get the helpful IntelliSense hints informing you of valid navigations, and you'll get most type errors at runtime instead of compile time, but the errors should be clearer than Python's in most cases.

In [43]:
#r "nuget: Deedle"

In [44]:
open System.Net
open System.Net.Http
open Deedle

[<Literal>]
let url = "http://query1.finance.yahoo.com/v7/finance/download/MSFT?period1=1678116713&period2=1709739113&interval=1d&events=history&includeAdjustedClose=true"
let frame = 
    Frame.ReadCsv((new HttpClient()).GetStreamAsync(url).Result)
    |> Frame.indexRowsDate "Date"

frame.Print()

                         Open       High       Low        Close      Adj Close  Volume   
3/6/2023 12:00:00 AM  -> 256.429993 260.119995 255.979996 256.869995 254.778946 24109800 
3/7/2023 12:00:00 AM  -> 256.299988 257.690002 253.389999 254.149994 252.081085 21473200 
3/8/2023 12:00:00 AM  -> 254.039993 254.539993 250.809998 253.699997 251.634750 17340200 
3/9/2023 12:00:00 AM  -> 255.820007 259.559998 251.580002 252.320007 250.266006 26653400 
3/10/2023 12:00:00 AM -> 251.080002 252.789993 247.600006 248.589996 246.566360 28333900 
3/13/2023 12:00:00 AM -> 247.399994 257.910004 245.729996 253.919998 251.852966 33339700 
3/14/2023 12:00:00 AM -> 256.750000 261.070007 255.860001 260.790009 258.667053 33620300 
3/15/2023 12:00:00 AM -> 259.980011 266.480011 259.209991 265.440002 263.279205 46028000 
3/16/2023 12:00:00 AM -> 265.209991 276.559998 263.279999 276.200012 273.951630 54768800 
3/17/2023 12:00:00 AM -> 278.260010 283.329987 276.320007 279.429993 277.155334 69527400 

In [45]:
let df =
    frame?Low
    |> Series.sampleTime (TimeSpan.FromDays 7) Direction.Forward
    |> Series.mapValues (fun v -> Stats.mean v)
df.Print()

3/6/2023 12:00:00 AM   -> 251.8720002        
3/13/2023 12:00:00 AM  -> 260.0799988        
3/20/2023 12:00:00 AM  -> 272.40599979999996 
3/27/2023 12:00:00 AM  -> 278.0919984        
4/3/2023 12:00:00 AM   -> 283.64250925       
4/10/2023 12:00:00 AM  -> 283.0340024        
4/17/2023 12:00:00 AM  -> 285.17000160000003 
4/24/2023 12:00:00 AM  -> 289.076001         
5/1/2023 12:00:00 AM   -> 304.1639953999999  
5/8/2023 12:00:00 AM   -> 306.58600459999997 
5/15/2023 12:00:00 AM  -> 311.6499938        
5/22/2023 12:00:00 AM  -> 317.95             
5/29/2023 12:00:00 AM  -> 328.77999124999997 
6/5/2023 12:00:00 AM   -> 327.41800539999997 
6/12/2023 12:00:00 AM  -> 333.5020082        
...                    -> ...                
11/27/2023 12:00:00 AM -> 375.7160034        
12/4/2023 12:00:00 AM  -> 366.2200012        
12/11/2023 12:00:00 AM -> 367.547998         
12/18/2023 12:00:00 AM -> 370.3599976        
12/25/2023 12:00:00 AM -> 373.48750325000003 
1/1/2024 12:0