Up and Running with F#
======================
A gentle guide to a powerful language

> ⚠️ This website is under construction! 

F#'s default package manager and primary package source is called [**NuGet**](https://www.nuget.org). We'll need a charting library called **Plotly.NET** later on, so let's install it:

In [39]:
#r "nuget: FSharp.Data"
#r "nuget: Plotly.NET"
#r "nuget: Plotly.NET.Interactive"
#r "nuget: Deedle"

F# is composed of **expressions**:

In [3]:
// expressions separated by `;`s
1; 2; 3

// expressions separated by new lines
4
5
6

If you wrap expressions with `[]`, you get a **list**:

In [4]:
// This is a list.
[ 1; 2; 3 ]

// This is also a list.
[
    1
    2
    3
]

All elements of a list must be of the same *type*, so `[ 1; "a"; true ]` is not valid.

In F#, the `,` separates tuple elements, not collection elements. Tuples are useful for all kinds of things in F#, and the language comes with a terse syntax for representing them:

In [5]:
// two-ple
1, 2

// 4-ary tuple
3, 4, 5, 6

// Unlike collections (lists and arrays), tuples can hold elements of different types.
"Erica", 34, false

// Sometimes parentheses are required
(6, 7)

| Field | Value |
|---|---|
| Item1 | 6 |
| Item2 | 7 |


We can use tuples and lists together to plot points:

In [6]:
open Plotly.NET

Chart.Point(
[ 
    1, 2
    2, 4
    3, 3
])

You can also create lists using the range operator `..`,

In [7]:
// start..end (both inclusive)
[ 1..10 ]
// evaluates to [ 1; 2; 3; ...; 10 ]

// start .. step .. end
[ 5 .. -1 .. -5 ]

or by using sequence expressions:

In [8]:
[ for i in 1..10 -> i * i ]
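Sequence expressions can do more than map each element. As a minimal sketch (not taken from the original notebook), the longer `do ... yield` form allows a filter, and loops can be nested; `printfn "%A"` is used here only to display the resulting lists:

```fsharp
// `->` is shorthand for `do yield`; the long form also allows an `if` filter.
// This keeps the even numbers from 1 to 10 and squares them.
printfn "%A" [ for i in 1..10 do if i % 2 = 0 then yield i * i ]

// Nested loops produce every combination, here as (x, y) tuples.
printfn "%A" [ for x in 1..3 do for y in 1..2 -> (x, y) ]
```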

Let's use them together to plot the squares of the integers from 1 to 10:

In [9]:
open Plotly.NET

Chart.Line([ for x in 1 .. 10 -> x, x * x ])

F# supports all the built-in .NET types, plus one called **`FSharpFunc`**. F# allows function constructs that aren't possible in C#, but because it is built on top of the .NET Common Language Runtime, its developers needed a special .NET type to represent them, one not yet expressible in C#. `FSharpFunc` is the name of that .NET type, but in F#, it's simply called a *function*.

In [10]:
0b00000101uy    // byte
2.              // float
2.0             // float
"abc"           // string
fun x -> x + 2  // int -> int

Types in F# can be thought of as domains, and functions as maps between those domains. Therefore, the way to read `int -> int` is "`int` mapped to `int`".

Functions aren't very useful unless we can *evaluate* them, which we do by *applying* them to arguments, like so:

In [11]:
(fun x -> x + 2) 3
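To make the "maps between domains" reading concrete, here is a small sketch with three anonymous functions of different types, each applied to one argument. The specific functions are illustrative choices, and `printfn` is only there to show the results:

```fsharp
// int -> int: maps 3 to 5
printfn "%d" ((fun x -> x + 2) 3)

// string -> int: maps a string to its length
printfn "%d" ((fun (s: string) -> s.Length) "abc")

// int -> bool: maps a number to whether it is even
printfn "%b" ((fun n -> n % 2 = 0) 10)
```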

Functions are very important in F#, as they underpin the theory of how expressions can be assigned names. If we wanted to start assigning names to expressions in an F# program, we could write our program like so:

In [12]:
(fun x ->   // define name (
    x + 2   //      rest-of-program-goes-here
) 3         // ) provide value `x` gets set to

As you can probably tell, this could easily get out of hand, because the argument is visually separated from the name it's bound to. The larger `rest-of-program-goes-here` gets, the farther apart they become. Fortunately, we can fix this with a *`let`* expression.

In [13]:
let x = 3 in    // define name = value `x` gets set to
    x + 2       //      rest-of-program-goes-here (that uses `x`)

This visually looks a lot cleaner!

We can drop the `in` and the unnecessary spaces, but the underlying code is still the same:

In [14]:
let x = 3
x + 2
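As a quick check of that equivalence, the sketch below evaluates all three spellings side by side. The names `viaLambda`, `viaLetIn`, and `viaLet` are made up for this illustration:

```fsharp
// 1. A lambda applied to an argument
let viaLambda = (fun x -> x + 2) 3

// 2. An explicit `let ... in` expression
let viaLetIn =
    let x = 3 in x + 2

// 3. Light syntax: the line break stands in for `in`
let viaLet =
    let x = 3
    x + 2

// All three evaluate to the same value
printfn "%d %d %d" viaLambda viaLetIn viaLet
```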

What's important to remember is that, conceptually, all the same elements are still there (the name, the body, and the passed value). While the rules are relaxed a little in the outermost scope, a `let` expression requires a body expression after it, and that body can't consist solely of another `let` expression:

In [15]:
// a nested `let` is fine, as long as the chain ends in a body expression (here, `4`)
let a = 2 in
    let b = 3
    4

Assigning a name to an expression in F# is called a *binding*, because the value can't change once set. In fact, `=` (when not used with `let`) is an infix function that compares two values, evaluating to `true` or `false`.

In [16]:
let a = 3
a = 4
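A short sketch of both points, assuming nothing beyond the core language: `=` outside of `let` only compares, and a binding cannot be reassigned afterwards. The names `isFour` and `isThree` are illustrative:

```fsharp
let a = 3

// Outside of `let`, `=` compares; it never assigns
let isFour = (a = 4)
let isThree = (a = 3)

// Uncommenting the next line is a compile error, because `a` is not mutable:
// a <- 4

printfn "%b %b" isFour isThree
```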

Bindings are useful for the obvious reason that code is sometimes more readable when values have names, but bindings also let you *cache computed values*, potentially reducing computation time.

For example, in our squares example above, if we *also* wanted to chart $y + x \times \sin(y)$ for each square $y = x^2$, we could write it out literally like so:

In [17]:
open System // for Math.Sin
open Plotly.NET

Chart.combine [
    Chart.Line([ for x in 1 .. 10 -> x, x * x ])
    Chart.Line([ for x in 1 .. 10 -> x, float x * float x + Math.Sin(float x * float x) * float x ])
]

However, since we have already computed $y$, we can bind a name to our squares list and reuse it instead:

In [18]:
open System // for Math.Sin
open Plotly.NET

let squares = [ for x in 1 .. 10 -> x, x * x ]

Chart.combine [
    Chart.Line(squares)
    // `y` is already the square of `x`, so there is no need to recompute it
    Chart.Line([ for x, y in squares -> x, float y + Math.Sin(float y) * float x ])
]

If you're not already into types, it's very hard to talk about "types" and "fun" at the same time. Yet despite this, it's F#'s type system that helps make it such a "fun" language to play with. So let me demonstrate with some examples:

First, let's spin up a connection to our Python Jupyter kernel! (Yes we can do this with Polyglot Notebooks and it's awesome!!)

In [19]:
#!connect jupyter --kernel-name pythonkernel --conda-env base --kernel-spec python3

The `#!connect jupyter` feature is in preview. Please report any feedback or issues at https://github.com/dotnet/interactive/issues/new/choose.

Kernel added: #!pythonkernel

[pandas](https://pandas.pydata.org/) is a data manipulation and analysis library that provides a class called `DataFrame`. We can install it using Python's default package manager, `pip`.

In [20]:
pip install pandas

Note: you may need to restart the kernel to use updated packages.





It's very easy to import and print a CSV into a `DataFrame` in Python: 

In [21]:
import pandas as pd

# Read the CSV file into a Pandas DataFrame
df = pd.read_csv("http://query1.finance.yahoo.com/v7/finance/download/MSFT?period1=1678116713&period2=1709739113&interval=1d&events=history&includeAdjustedClose=true",
    # Notice, however, we have to know information about our file
    index_col="Date",
    parse_dates=[0])

df

| Date | Open | High | Low | Close | Adj Close | Volume |
|---|---|---|---|---|---|---|
| 2023-03-06 | 256.429993 | 260.119995 | 255.979996 | 256.869995 | 254.778961 | 24109800 |
| 2023-03-07 | 256.299988 | 257.690002 | 253.389999 | 254.149994 | 252.081085 | 21473200 |
| 2023-03-08 | 254.039993 | 254.539993 | 250.809998 | 253.699997 | 251.634750 | 17340200 |
| 2023-03-09 | 255.820007 | 259.559998 | 251.580002 | 252.320007 | 250.266006 | 26653400 |
| 2023-03-10 | 251.080002 | 252.789993 | 247.600006 | 248.589996 | 246.566345 | 28333900 |
| ... | ... | ... | ... | ... | ... | ... |
| 2024-02-29 | 408.640015 | 414.200012 | 405.920013 | 413.640015 | 413.640015 | 31947300 |
| 2024-03-01 | 411.269989 | 415.869995 | 410.880005 | 415.500000 | 415.500000 | 17800300 |
| 2024-03-04 | 413.440002 | 417.350006 | 412.320007 | 414.920013 | 414.920013 | 17596000 |
| 2024-03-05 | 413.959991 | 414.250000 | 400.640015 | 402.649994 | 402.649994 | 26919200 |


Manipulating data in a tabular format in the pandas domain of `DataFrame` and `Series` is pretty simple, given you know what data you're dealing with. However, it requires you to be aware of what types you're dealing with at all times. For example, these two bits of code do drastically different things:

In [22]:
Range = df["High"] - df["Low"] # Binds the name `Range` to a new `pandas.Series`; `df` itself is unchanged
df["Range"] = df["High"] - df["Low"] # Adds the same Series to `df` as a new column named `Range`

df.Range

Date
2023-03-06     4.139999
2023-03-07     4.300003
2023-03-08     3.729995
2023-03-09     7.979996
2023-03-10     5.189987
                ...    
2024-02-29     8.279999
2024-03-01     4.989990
2024-03-04     5.029999
2024-03-05    13.609985
2024-03-06     6.769989
Name: Range, Length: 253, dtype: float64

This might be painfully obvious to some, but it's a problem that can easily compound - especially when interacting with the rest of the Python ecosystem or when dealing with complex data types.

One way to resolve this problem is to encode the names and data types of your columns in the language's type system. Here's an example with C# and `CsvHelper`:

In [23]:
#r "nuget: CsvHelper"

In [24]:
using System.IO;
using System.Net.Http;
using System.Globalization;
using CsvHelper;
using CsvHelper.Configuration.Attributes;

IList<Foo> records = null;

{
    string url = "http://query1.finance.yahoo.com/v7/finance/download/MSFT?period1=1678116713&period2=1709739113&interval=1d&events=history&includeAdjustedClose=true";
    using StreamReader reader = new((new HttpClient()).GetStreamAsync(url).Result);
    using CsvReader csv = new(reader, CultureInfo.InvariantCulture);
    records = csv.GetRecords<Foo>().ToList();
}

public class Foo
{
    public DateTime Date { get; set; }
    public float Open { get; set; }
    public float High { get; set; }
    public float Low { get; set; }
    public float Close { get; set; }
    [Name("Adj Close")]
    public float AdjClose { get; set; }
    public float Volume { get; set; }
}

records.Select(r => r.High - r.Low)

This makes it easier to navigate the data we're importing, as the structure of our data becomes part of the language. If we accidentally write `AdjustedClose` instead of `AdjClose`, our program won't compile, and the compiler will tell us exactly where the problem is.

It's a bit inconvenient though... isn't it? We have to manually define the structure of the data we're importing in our program. If we're dealing with a huge dataset, we *can* just ignore some columns, but navigating that dataset easily still means doing the work of describing it up front.

Fortunately, we can use a package called [FSharp.Data](https://fsprojects.github.io/FSharp.Data) which makes convenient use of an F# feature called *type providers* that can incrementally incorporate code and data into our compilation process.

There's a lot more that could be said about types, but this preliminary discussion is mostly to demonstrate the amazing benefit of type providers.

Let's create a type using `CsvProvider` and our data source:

In [25]:
open FSharp.Data
open Plotly.NET

[<Literal>]
let url = "http://query1.finance.yahoo.com/v7/finance/download/MSFT?period1=1678116713&period2=1709739113&interval=1d&events=history&includeAdjustedClose=true"

// pass the type provider your data
// (usually you'd pass a smaller, local file on your hard drive with the same structure)
type Stocks = CsvProvider<url>

// get a sample of the provided data source
let msft = Stocks.GetSample()

// draw a chart with the sample data
Chart.Line(
    xy = [ for row in msft.Rows -> row.Date, row.High - row.Low ]
)

In [48]:
open System.Net
open System.Net.Http
open Deedle

[<Literal>]
let url = "http://query1.finance.yahoo.com/v7/finance/download/MSFT?period1=1678116713&period2=1709739113&interval=1d&events=history&includeAdjustedClose=true"
let frame = Frame.ReadCsv ((new HttpClient()).GetStreamAsync(url).Result)

// let a = frame?Date
frame.Print();

       Date                  Open       High       Low        Close      Adj Close  Volume   
0   -> 3/6/2023 12:00:00 AM  256.429993 260.119995 255.979996 256.869995 254.778931 24109800 
1   -> 3/7/2023 12:00:00 AM  256.299988 257.690002 253.389999 254.149994 252.081100 21473200 
2   -> 3/8/2023 12:00:00 AM  254.039993 254.539993 250.809998 253.699997 251.634750 17340200 
3   -> 3/9/2023 12:00:00 AM  255.820007 259.559998 251.580002 252.320007 250.266006 26653400 
4   -> 3/10/2023 12:00:00 AM 251.080002 252.789993 247.600006 248.589996 246.566360 28333900 
5   -> 3/13/2023 12:00:00 AM 247.399994 257.910004 245.729996 253.919998 251.852966 33339700 
6   -> 3/14/2023 12:00:00 AM 256.750000 261.070007 255.860001 260.790009 258.667053 33620300 
7   -> 3/15/2023 12:00:00 AM 259.980011 266.480011 259.209991 265.440002 263.279175 46028000 
8   -> 3/16/2023 12:00:00 AM 265.209991 276.559998 263.279999 276.200012 273.951630 54768800 
9   -> 3/17/2023 12:00:00 AM 278.260010 283.329987

In [60]:
let a = [1, 3; 2, 3] |> series

let b = frame.GetColumn<DateTime>("Date")

Data access and manipulation is pretty simple in both languages, so why choose F#?

One of the best features of F# is its type system. With F#, you get type inference like in TypeScript (but often better!), with the runtime-persistence of types you expect with C# and .NET, but the syntactical conciseness of Python.

...That was a lot of words though, and it's not immediately apparent to new or non-programmers why all that is beneficial!

The most apparent benefit of a strong and static type system is powerful "IntelliSense", or simply, the ability for your editor to be able to predict the next valid tokens you could input that would result in a valid program. This results in you - the programmer - hitting `Tab` a lot more but typing a lot less, which results in getting code written faster.
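As a small illustration of the type inference mentioned above, the sketch below defines a function with no type annotations at all. The function `nextBirthday` is a made-up example, not part of any library:

```fsharp
// No type annotations: `+ " turns "` forces `name` to be a string,
// and `age + 1` forces `age` to be an int.
// Inferred type: string -> int -> string
let nextBirthday name age = name + " turns " + string (age + 1)

printfn "%s" (nextBirthday "Abby" 26)

// Swapping the arguments is rejected at compile time, before anything runs:
// nextBirthday 26 "Abby"
```

Because the editor knows these types as you type, it can offer only the completions that would keep the program valid.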

The benefits are going to be hard to show on the web version of this guide, but if you pop open [this source document](https://github.com/johnW-ret/fstandsforfun/blob/main/docs/index.ipynb), hit `.` to open vscode.dev, then open in VS Code desktop or create a new codespace in the browser, you can play around with the source code and see IntelliSense pop up as you edit.

Alternatively, feel free to just read along and try all that later.

I like to think of what you'd call *objects* in languages like Python or JavaScript as "databags". Their implementation usually ends up as some form of a hash map.

In [28]:
# in python, as a dict
person1 = { "name": "Abby" }
person1["age"] = 26
person1[0] = "AAA"
person1["0"] = "BBB" # fun fact: note how 0 and "0" are different because Python is strongly-typed (even though it isn't statically typed)

person1

{'name': 'Abby', 'age': 26, 0: 'AAA', '0': 'BBB'}

What you would think of as *objects* are called *records* in F#. But there's a key difference: records are *immutable*, which means that to make a change you have to create a new copy (the `{| ... |}` syntax below creates an *anonymous* record, one without a declared type):

In [29]:
let person1 = {| Name = "Abby" |}
let person2 = {| person1 with Age = 26 |}

person2

| Field | Value |
|---|---|
| Age | 26 |
| Name | Abby |


This encourages a programming style where the *shape* of your data is provided up-front at object creation - you can't go after the fact and stick on or remove a bunch of marker fields. Likewise, any alterations to your object structure require you to redefine that structure (with the `with` keyword, as above).

This means that any time you hit `.`, you know you're seeing the full picture of all data available on that record.

If you know you're going to have more than one instance of some record of data (which is very often the case), creating a `type` allows you to tag a name to that structure:

In [30]:
type Person = { Name: string; Age: int }

let person1 = { Name = "Abby"; Age = 26 } // inferred as Person
let people = [ 
    { Name = "Abby"; Age = 26 }
    { Name = "Jeremy"; Age = 14 }
] // inferred as Person list
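Named records support the same copy-and-update style via `with`. A minimal sketch, where the names `abby` and `olderAbby` are illustrative:

```fsharp
type Person = { Name: string; Age: int }

let abby = { Name = "Abby"; Age = 26 }

// Copy-and-update: builds a new Person; `abby` itself is left untouched
let olderAbby = { abby with Age = 27 }

// Fields are read with `.`
printfn "%s was %d and is now %d" abby.Name abby.Age olderAbby.Age
```

Unlike the anonymous record earlier, `with` on a named record can only change fields that `Person` already declares; it cannot add new ones.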


> ⚠️ This area specifically is under construction

F# has a *very* useful feature called *partial application*, which lets you supply some of a function's arguments now and the rest later.

Let's build a function that adds two numbers:

In [34]:
// we can start with our original function
let add = (fun a -> a + 1)

// but change `1` to `b`, which means wrapping our function with another one that defines `b`
let add_1 = (fun b -> (fun a -> a + b))

add_1 2 3

It might make intuitive sense that we can call `add_1` with only one argument. After all, it only takes one parameter, so it makes sense that it evaluates to its body (now `fun a -> a + 2`)! Mouse over `add2To` below to confirm its type is `int -> int`.

In [35]:
let add2To = add_1 2

The magic is that *all* F# functions are able to work this way. We can rewrite our `add_1` like so, but maintain the same functionality:

In [36]:
let add_1 = fun b a -> a + b
let add5To = add_1 5
add5To 3

In fact, that's not all. If we're binding a `fun` to a name, we can drop the ceremony of using `fun` (this is a bit jarring to get used to but looks a lot cleaner in practice):

In [37]:
let add_1 b a = a + b // The type of `add_1` is still `int -> int -> int`!
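Partial application pays off when a function is passed to another function. The sketch below uses `List.map`, which applies a function to every element of a list; `add`, `add10To`, `shifted`, and `shiftedInline` are illustrative names:

```fsharp
// Same shape as `add_1` above: the first argument is what gets added
let add b a = a + b

// Supplying only `b` yields a new function of type int -> int
let add10To = add 10

// A partially applied function can be handed straight to List.map
let shifted = List.map add10To [ 1; 2; 3 ]
let shiftedInline = List.map (add 100) [ 1; 2; 3 ]

printfn "%A %A" shifted shiftedInline
```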

Let's use [this Plotly.NET example](https://plotly.net/simple-charts/range-plots.html) to map some data using functions and `let` bindings: 

In [38]:
// First, let's bind to an instance of `System.Random`
// (The `()` is what tells us we're getting a new *instance*; this may be explained later.)
let rnd = System.Random()

// Next, let's define some initial points to plot.
let x = [ 1.; 2.; 3.; 4.; 5.; 6.; 7.; 8.; 9.; 10. ]
let y = [ 2.; 1.5; 5.; 1.5; 3.; 2.5; 2.5; 1.5; 3.5; 1. ]

// Let's define some points to plot *in terms of `y`*.
let yUpper = List.map (fun v -> v + rnd.NextDouble()) y
yUpper
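The same idea combines naturally with partial application. The sketch below is an assumption about where this example could go next, not the original author's code: `jitter` and `yLower` are hypothetical names, and the fixed seed `42` is there only so repeated runs draw the same numbers.

```fsharp
// Fixed seed so repeated runs draw the same numbers (an assumption for this sketch)
let rnd = System.Random(42)

let y = [ 2.; 1.5; 5.; 1.5; 3.; 2.5; 2.5; 1.5; 3.5; 1. ]

// `jitter` is a hypothetical helper: it moves `v` by a random amount in [0, 1)
// in the direction given by `sign`
let jitter sign v = v + sign * rnd.NextDouble()

// Partial application fixes the direction; List.map supplies each `v`
let yUpper = List.map (jitter 1.0) y
let yLower = List.map (jitter (-1.0)) y

// Pair up each lower and upper bound for inspection
printfn "%A" (List.zip yLower yUpper)
```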