# Who Attended Online User Meetings
This notebook is a supplement to the talk [The Heart of The Algorithm]() given by Rich Park at the APL Seeds '23 online user meeting. In that talk, he introduces the basic syntax of APL before explaining the core algorithm used to solve this problem.

In this notebook, we outline the problem and show how its solution can be used to gain insights from data. Then, we will build up the solution piece by piece, explaining every part in detail.

## The problem
Each year, Dyalog hosts a meeting of staff and users during which we give presentations and share ideas. For 2020 and 2021, these meetings were held online.

Afterwards, we obtained the attendance data from Zoom. It has been simplified and anonymised for the purpose of this problem.

**attendees.csv** is a table with columns:
- Attendee: the name of each attendee (fake, anonymous and unique for each attendee)
- Join Time: the date and time in `MM/DD/YYYY hh:mm` format when a user joined the meeting
- Leave Time: the date and time the user disconnected from the meeting

The same user may join and leave the meeting multiple times.

**schedule.csv** is a table with columns:
- Session: a unique text ID for each session
- Title: the presentation title or session type (e.g. break)
- Start Time: the date and time the session started (same format as in attendees.csv)
- End Time: the date and time the session ended

> Note that I have renamed these files to keep a convention that functions begin with capital letters and values are all lowercase.

In [56]:
]Get /d/Presentations/APLSeeds23/attendees.csv
5↑attendees

In [58]:
]Get /d/Presentations/APLSeeds23/schedule.csv
5↑schedule

## Selecting data
We can select elements from arrays with square brackets, giving one index per dimension separated by semicolons. Bracket indexing is special syntax in APL rather than a function, but it is quite convenient.

In [59]:
schedule[3;2]   ⍝ 3rd row, 2nd column

Omitting an index returns all data along that dimension of the array:

In [27]:
schedule[1;]   ⍝ Header row is the first row
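The same bracket syntax works on any matrix. The following is a small sketch on a made-up table (`tbl` and its contents are hypothetical, not part of the meeting data), assuming the default index origin `⎕IO←1`:

```apl
⍝ tbl is a hypothetical 3-row, 2-column table, not part of the meeting data
tbl←3 2⍴'id' 'name' 1 'Ann' 2 'Bob'
tbl[;2]       ⍝ every row of the 2nd column
tbl[2 3;1]    ⍝ rows 2 and 3 of the 1st column: 1 2
tbl[2 3;]     ⍝ rows 2 and 3, every column
```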

It may be preferred to select data from specific columns according to the name of the column from the header row.

To do this, we will look up the position of our desired column in the header using the **index-of** primitive `⍺⍳⍵`:

In [28]:
schedule[1;]⍳'Start Time'

Index-of returns one greater than the length of the left argument (`1+≢⍺`) where elements in the right argument `⍵` are not found. In this case, none of the individual characters in `'Start Time'` were found in our header.

This is because `'Start Time'` has a different structure to our header. `'Start Time'` is a simple list of characters, whereas our header is a list of lists of characters: a nested vector of character vectors.

We can see the difference by comparing the `]Box`-ed display:

In [29]:
schedule[1;]    ⍝ nested vector of character vectors
'Start Time'    ⍝ simple character vector
⊂'Start Time'   ⍝ nested scalar containing a character vector
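The difference in structure can also be checked with **depth** `≡⍵`, and it explains why the lookup only succeeds once the name is enclosed with `⊂⍵`. This is a sketch that assumes a header `hdr` typed in by hand from the column names listed above, rather than the imported one:

```apl
⍝ hdr is a hypothetical stand-in for schedule[1;]
hdr←'Session' 'Title' 'Start Time' 'End Time'
≡hdr                ⍝ 2: a vector of vectors
≡'Start Time'       ⍝ 1: a simple vector
≡⊂'Start Time'      ⍝ 2: a scalar containing a vector
hdr⍳'Start Time'    ⍝ 5 5 5 5 5 5 5 5 5 5: each character is looked up on its own
hdr⍳⊂'Start Time'   ⍝ 3: the whole name is looked up as one item
```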

The **nest** primitive `⊆⍵`, also known as enclose-if-simple, only encloses its argument if it is simple. This is convenient for cases like this, where we may want to supply either a single list or a list of lists.

Stranding (juxtaposing arrays with spaces) forms a list of lists:

In [31]:
'one' 'two' 'three'

Enclosing adds a level of nesting:

In [32]:
⊂'one' 'two' 'three'

Enclose-if-simple nests a simple array:

In [33]:
⊆'one'

But adds no extra nesting to an already nested array:

In [34]:
⊆'one' 'two' 'three'
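To see why this matters for column lookup, the sketch below compares the results of nest using **match** `⍺≡⍵` and then looks up one name or several with the same expression. The header `hdr` is again a hand-typed assumption based on the column names listed earlier:

```apl
⍝ hdr is a hypothetical stand-in for schedule[1;]
hdr←'Session' 'Title' 'Start Time' 'End Time'
(⊆'Title')≡⊂'Title'                        ⍝ 1: a simple vector gets enclosed
(⊆'Title' 'End Time')≡'Title' 'End Time'   ⍝ 1: a nested vector is left alone
hdr⍳⊆'Title'                               ⍝ 2
hdr⍳⊆'End Time' 'Title'                    ⍝ 4 2
```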

We can now use text column names to select columns from our data. We want the data without the header, so we will drop the first row.

In [53]:
5↑ (1↓schedule)[;schedule[1;]⍳⊆'Start Time' 'Title']

We can factor this out as a function:

In [54]:
Get←{(1↓⍺)[;⍺[1;]⍳⊆⍵]}
5↑ schedule Get 'Start Time' 'Title'
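One thing to watch for: a misspelt column name makes index-of return `1+≢` of the header, and the indexing then fails with an `INDEX ERROR`. A possible variant that reports the problem more clearly is sketched below. `GetChecked` and the table `tbl` are hypothetical names introduced only for this illustration:

```apl
⍝ GetChecked is a hypothetical variant of Get that rejects unknown column names
GetChecked←{i←⍺[1;]⍳⊆⍵ ⋄ ∨/i>≢⍺[1;]:'Unknown column name'⎕SIGNAL 11 ⋄ (1↓⍺)[;i]}
tbl←3 2⍴'Session' 'Title' 'S1' 'Welcome' 'B1' 'Break'   ⍝ made-up table with a header row
tbl GetChecked 'Title'              ⍝ the two titles
tbl GetChecked 'Title' 'Session'    ⍝ 2-by-2 matrix, columns in the requested order
tbl GetChecked 'Titel'              ⍝ signals error 11 with the message above
```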

This is very convenient, but dropping the header with `1↓⍺` creates a new array every time we select columns, which is wasteful. Instead, we will use `⎕CSV` to separate the header row on import and refer to the header directly.

```
path←'/d/Presentations/APLSeeds23/schedule.csv'
(s_data s_cols)←⎕CSV path ⍬ 1 1
Schedule←{s_data[;s_cols⍳⊆⍵]}
```
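A detail worth knowing when using a function of this shape: a single column name produces a scalar index, so the result is a vector, whereas a list of names produces a matrix. The sketch below shows this with hand-built stand-ins; `data`, `cols` and `Pick` are hypothetical names, not the imported `s_data` and `s_cols`:

```apl
⍝ data and cols are hypothetical stand-ins for s_data and s_cols
cols←'Session' 'Title'
data←3 2⍴'S1' 'Welcome' 'B1' 'Break' 'S2' 'Demo'
Pick←{data[;cols⍳⊆⍵]}
⍴Pick 'Title'              ⍝ 3: a single name gives a vector
⍴Pick 'Title' 'Session'    ⍝ 3 2: a list of names gives a matrix
⍴Pick ,⊂'Title'            ⍝ 3 1: a one-element list keeps the matrix shape
```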

## Using datetimes
Our data has datetimes represented as lists of characters. In order to compare them efficiently, we will convert each one into a scalar number. We will use the Unix time number, which is the number of seconds since 1st January 1970, so that we get 1 second precision.

The system function `⎕VFI` is used to safely convert character data into numbers. The **execute** primitive `⍎⍵` can also convert characters into numbers, but because it executes any APL expression, it can be dangerous to use with data from external sources.

In the monadic case, `⎕VFI` checks space-separated tokens to see if they are valid APL numeric literals.

In [5]:
⎕VFI'42 1,4 1.5   1e3 2J¯4 2J-4 -6 ¯6'

It returns a two-element vector. The 1st element is a Boolean mask in which a `1` marks each number in the 2nd element that was converted from a valid numeric literal in the argument; invalid tokens give a `0` in both elements. The Boolean mask can be used with **compress** `⍺/⍵` to extract the valid numbers.

In [6]:
⊃(//⎕VFI)'42 1,4 1.5   1e3 2J¯4 2J-4 -6 ¯6'
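The same extraction can be written in separate steps, which may be easier to follow than the reduction. This is a minimal sketch on a made-up input; `mask` and `nums` are names chosen for the illustration:

```apl
(mask nums)←⎕VFI '42 abc 7'   ⍝ made-up input with one invalid token
mask          ⍝ 1 0 1
nums          ⍝ 42 0 7
mask/nums     ⍝ 42 7
```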

We can provide a left argument to specify other separator characters:

In [8]:
'/ :'⎕VFI'11/9/2020 14:00'

We know our datetimes should be all numbers, so we'll pick the 2nd element instead of using compress.

In [9]:
2⊃'/ :'⎕VFI'11/9/2020 14:00'
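Picking the 2nd element is only safe if every field really is a number, because an invalid field silently becomes `0`. If that assumption is in doubt, the mask can be checked first. The sketch below uses a deliberately broken, made-up date string:

```apl
(ok nums)←'/ :'⎕VFI'11/x/2020 14:00'   ⍝ made-up input with a bad day field
nums        ⍝ 11 0 2020 14 0
∧/ok        ⍝ 0: at least one field was not a number
```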

The system function `⎕DT` can convert between many datetime formats. We will convert from `⎕TS`-style time stamps to Unix time numbers.

But first, remember that our timestamps have the months first. `⎕TS`-style time stamps are `year month day hour minute second millisecond`. We can omit trailing elements such as the seconds and milliseconds, but we must rotate our dates to be in the correct order.

In [10]:
¯1⌽11 9 2020

In a full time stamp, we only want to rotate the first three elements.

In [11]:
¯1(⌽@1 2 3)11 9 2020 14 0
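To illustrate why the **at** operator `@` is needed, the sketch below compares rotating the whole vector with rotating only the date part. It also shows the monadic form, which reverses the selected elements and would suit day-first dates (an assumption for illustration; our data is month-first):

```apl
¯1⌽11 9 2020 14 0           ⍝ rotates everything: 0 11 9 2020 14
¯1(⌽@1 2 3)11 9 2020 14 0   ⍝ month first, rotate the date part: 2020 11 9 14 0
(⌽@1 2 3)9 11 2020 14 0     ⍝ day first, reverse the date part:  2020 11 9 14 0
```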

We can then turn this into our Unix time number:

In [15]:
20⎕DT⊂2020 11 9 14 0

`⎕TS`-style time stamps are numeric lists. `⎕DT` accepts lists of numeric lists, so it can convert many time stamps with a single call. This means a single time stamp must be enclosed, so that it is treated as one time stamp rather than as several separate numbers.

In [14]:
2020 11 9 14 0                                     ⍝ A single time stamp
(2020 11 9 14 0)(2020 11 9 14 0)(2020 11 9 14 0)   ⍝ A list of time stamps
⊂2020 11 9 14 0                                    ⍝ A single enclosed timestamp
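Once converted, time stamps are plain numbers, so ordinary arithmetic and comparison apply. The sketch below converts two made-up time stamps in one call; `t` is a name chosen for the illustration:

```apl
t←20 ⎕DT (2020 11 9 14 0)(2020 11 9 14 30)   ⍝ two made-up time stamps, 30 minutes apart
t[2]-t[1]     ⍝ 1800 seconds
</t           ⍝ 1: the first is earlier than the second
```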

We can put these pieces together to form our datetime conversion function:

In [49]:
∇ Timestamp2Unix←{
  ⍺←⊢   ⍝ ⍺: amount to rotate the date part. If omitted, the date part is reversed.
  ⍝ ⍵: character date times
  20 ⎕DT ⍺∘(⌽@1 2 3)¨2⊃¨'/ :'∘⎕VFI¨⊆⍵
  }
∇

In [52]:
7↑¯1 Timestamp2Unix schedule Get 'Start Time' 'End Time'
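Because of nest, the function also accepts a single character vector. As a quick sanity check, a month-first string rotated by `¯1` and the equivalent day-first string with the default reversal should give the same number. The two date strings below are made up for the illustration:

```apl
a←¯1 Timestamp2Unix '11/9/2020 14:00'   ⍝ month first, rotate by ¯1
b←Timestamp2Unix '9/11/2020 14:00'      ⍝ day first, default reversal
a≡b                                     ⍝ 1
```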

## Comparing date times
As explained in [the presentation at APL Seeds '23](#), 

While the nested table is very convenient, we are going to use an "inverted table" format for this problem. An inverted table takes less memory and is faster to process.

While the default import of CSV is a matrix of character vectors, the inverted table is a vector of character matrices. The nested matrix is similar to a spreadsheet, whereas an inverted table is like a columnar (column store) database.
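The difference in shape can be sketched on a tiny made-up table. The conversion shown here is only one possible way to build an inverted table from a nested matrix, for illustration; `nested` and `inverted` are hypothetical names:

```apl
nested←2 2⍴'S1' 'Welcome' 'B1' 'Break'   ⍝ made-up nested matrix, one vector per cell
inverted←↑¨↓⍉nested                      ⍝ one character matrix per column
≢inverted        ⍝ 2 columns
⍴¨inverted       ⍝ (2 2)(2 7): each column is padded to its longest entry
```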

In [1]:
∇ ReadCSV←{
⍝ Convert CSV from source ⍵
⍝ ⍬ means ⍵ is a file path
⍝ 1 means all columns should be imported as character data
⍝ 1 means the first row should be read in separately as a header row
⍝ The result is a 2-element vector. First element is a nested matrix of the data, second is the header row.
  ⎕CSV ⍵ ⍬ 1 1 
}
∇