# Parsing a SWIFT Message

First off, we load _quercus_:

In [28]:
ret:{enlist(x;y)};
bind:{raze({(a;s):y;x[a]s}[x]')y@};
map:{bind[(ret x::)]};
trav:{({bind[{map[{raze(x;y)}[y]][x]}[y]]x}/)(x')y};
seqA:trav[::];
zero:{[x]()};
plus:{{x[z],y z}[x;y]};
mget:{enlist(x;x)};
mset:{[x;y]enlist(();x)};
mmod:{enlist(();x y)};
fil:{bind[{(zero;ret y)x y}[x]][y]};
opt:{plus[x;ret()]};
many:{plus[bind[{map[(enlist[z],)]y x}[x;.z.s];x];ret()]};
many1:{bind[{map[(enlist[y],)]many x}[x];x]};
upto:{fil[{x>=count y}[x]]many y};
among:{fil[{(x<=c)&(c:count[z])<=y}[x;y]]many z};
upto:among[0];
times:{$[x<1;ret();seqA x#y]};
sep1:{bind[{map[{enlist[x],y}[z]]many seqr[x;y]}[y;x]]x};
sep:{plus[sep1[x;y];ret()]};
skip:map zero;
item:{$[""~x;();enlist(first x;1_ x)]};
seqf:{bind[{map[{x(y;z)}[x;z]][y]}[x;z]]y};
seql:seqf[first];seqr:seqf[last];seq:seqf[enlist .];
sat:{bind[{$[x y;ret y;zero]}[x];item]};
oneof:{sat in[;x]};
noneof:{sat(not in[;x]::)};
between:{seqr[x;seql[z;y]]};
range:{sat{(x<=z)&z<=y}[x;y]};
digit:range ."09";
lwr:range ."az";
upr:range ."AZ";
letter:plus[lwr;upr];
alphanum:plus[letter;digit];
str:{{$[x~count[x]#y;enlist(x;count[x]_y);()]}[x]};
word:many1 letter;
num:many1 digit;
chr:{[x:`c]sat[(x=)]};
spaces:skip many space:chr" ";
eof:{$[""~x;ret[()]x;zero x]};
eol:chr"\n";
parens:between[chr"(";chr")"];
braces:between[chr"{";chr"}"];
lift:map enlist;
tok:{map(x$)};
(S;D;F;J;T):tok[`],(tok')"DFJT";
c:item;
j:J num;
s:S word;

rparse:{$[()~r:x y;'`parse;1<count r;'`ambig;[(a;s):r 0;not ""~s];'`spare;a]};
vparse:{.[{[x]1b}rparse::;(x;y);0b]};

## 1. Types of Characters Allowed

The SWIFT protocol predefines several character types that are used throughout the specification. Our first task consists of representing these types in terms of _quercus_.

#### n: numeric digits (0 through 9) only

The `digit` primitive from _quercus_ is all we need to represent SWIFT's _n_ type: it consumes one character from the input, provided that character is a digit, and returns it together with the remaining input.

In [5]:
digit"123"

"1" "23"


Otherwise it fails silently, returning no results.

In [6]:
digit"A23"



We therefore assign this parser to `nT` (n Type).

In [21]:
nT:digit

In [4]:
nT"123"

"1" "23"


In [5]:
nT"abc"



#### a: alphabetic letters (A through Z), upper case only

We also have `upr` in _quercus_, a native parser that knows how to deal with uppercase letters:

In [7]:
aT:upr

As expected it knows how to consume an uppercase character.

In [8]:
aT"ABC"

"A" "BC"


But it cannot parse lowercase letters or digits.

In [9]:
aT"abc"



#### c: alphabetic letters (upper case) and digits only

As we've seen, _quercus_ supplies parsers for uppercase letters and for digits. However, there is no single primitive that accepts either, as SWIFT's _c_ type requires. Fortunately, we have `plus`, which takes two parsers and produces a new one that tries both on the same input and collects their results.

In [10]:
cT:plus[digit;upr]

We can check that it behaves as expected.

In [11]:
cT"ABC"
cT"123"
cT"abc"

"A" "BC"


"1" "23"




#### x: any character of the X permitted set

* a b c d e f g h i j k l m n o p q r s t u v w x y z
* A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
* 0 1 2 3 4 5 6 7 8 9
* / - ? : ( ) . , ' + CrLf Space

Once we know how to represent uppercase letters, digits, and their combination, it should be easy to accommodate a parser for lowercase characters. Although _quercus_ defines `lwr`, we'll use `letter`, which parses uppercase and lowercase alike.

In [13]:
letter"ABC"
letter"abc"
plus[letter;digit]"123"

"A" "BC"


"a" "bc"


"1" "23"


But how can we represent the particular set of characters `/-?:().,'`? For this task we can use `oneof`, which parses any one of the characters it is given.

In [14]:
oneof["/-?:().,'"]"?abc"
oneof["/-?:().,'"]"(abc"

"?" "abc"


"(" "abc"


We decided to leave the space aside, since it has its own primitive in _quercus_, `space`:

In [17]:
space" foo"

" " "foo"


Now that we've implemented the individual parsers, it's time to "plus" them together!

In [24]:
xT0:plus[plus[plus[digit;letter];space];oneof"/-?:().,'"]

This works, but it becomes hard to read as the number of parsers grows. Fortunately, we can use q iterators to produce a nicer expression: `plus/` folds `plus` over a list of parsers. This is the final version of `xT`:

In [15]:
xT:(plus/)digit,letter,space,oneof"/-?:().,'"

In [21]:
xT"???123ABC"

"?" "??123ABC"


Consuming individual characters is all right, but there is still a lot of work to do before we can parse a whole SWIFT message. However, as we'll see, the idea of combining small parsers to form larger and more complex ones recurs throughout the exercise.

## 2. Field Examples

The SWIFT specification defines a large number of fields. Each is typically introduced with a name (and description), a type, and an example value. Our objective in this section is not only to read whole fields, but also to consume a sequence of them. Let's start by parsing them individually:

#### OutputDate: 6!n YYMMDD (240424)

In essence, this notation says that _OutputDate_ is a field of exactly 6 characters of type _n_. We can use `times` from _quercus_ to repeat a parser a given number of times.

In [None]:
od0:times[6;nT]

We can test that it works nicely by feeding the supplied example:

In [26]:
od0"240424"

"240424" ""


As you can see, `od0` is a parser that consumes 6 characters instead of a single one.

When parsing dates in q, you probably don't want a _string_ as a result. It would be nicer if the parser returned a date directly, so that no further manipulation is needed. We can use `D` from _quercus_ to produce a new parser that turns the result of the previous parser into a date.

In [39]:
od:D times[6;nT]

The new version for `od` consumes the whole input and produces a date as result.

In [40]:
od"240424"

2024.04.24 ""


As expected, if the input is not valid, e.g. one of the characters does not belong to SWIFT's _n_ type, the parser fails.

In [128]:
od"240X24"
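
The helpers `rparse` and `vparse`, defined next to _quercus_ at the top of this notebook, are aimed at callers who want a plain value or a yes/no answer rather than a list of results. A minimal sketch of how they could be applied to `od`, assuming the definitions above are loaded:

```q
/ rparse returns the parsed value when there is exactly one result and no input is left;
/ it signals `parse on failure, `ambig on several results and `spare on leftover input
rparse[od]"240424"
/ vparse traps those signals and answers with a boolean instead
vparse[od]"240424"
vparse[od]"240X24"
```

Reading their definitions, `rparse` should hand back the date itself, while `vparse` answers `1b` when `rparse` succeeds and `0b` when it signals.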



#### LTAddress: 12!x (BANKFRPPAXXX)

We can extend the same idea to other fields. For instance, we can parse _LTAddress_ and turn it into a symbol by means of `S`:

In [127]:
lt:S times[12;xT]

We can feed the supplied example to show that this parser is actually returning a symbol.

In [129]:
lt"BANKFRPPAXXX"

`BANKFRPPAXXX ""


#### SequenceNumber: 6!n (123456)

We can also cast the result to a _long_, using `J` (built with `tok`).

In [41]:
sn:J times[6;nT]

In [130]:
sn"123456"

123456 ""


#### Consuming a Sequence of Fields

Once we have individual parsers for each field, it's time to put them together. Our next task consists of creating a parser that combines `od`, `lt` and `sn`, so that we can consume the whole input `"240424BANKFRPPAXXX123456"` (240424 + BANKFRPPAXXX + 123456) sequentially.

Fortunately, _quercus_ supplies `seq`, which takes two parsers and returns a new parser that runs the first, then runs the second on the remaining input, and pairs their results.

In [131]:
seq[od;lt]"240424BANKFRPPAXXX123456"

(2024.04.24;`BANKFRPPAXXX) "123456"


If we want to sequence more than two parsers, we can fold `seq` over a list of them, just as we did with `plus` above.

In [132]:
((seq/)od,lt,sn)"240424BANKFRPPAXXX123456"

((2024.04.24;`BANKFRPPAXXX);123456) ""


However, we'd like to avoid the nested results. Luckily, _quercus_ provides `seqA`, which returns a flat list instead.

In [137]:
flds:seqA od,lt,sn;
flds"240424BANKFRPPAXXX123456"

(2024.04.24;`BANKFRPPAXXX;123456) ""


## 3. Dependent Fields

If the value of a field depends on the value of another one, we get a _dependent field_. In the following example, _delivery_ depends on _priority_.

#### Priority: 1!a (U)
This character, used within FIN Application Headers only, defines the priority with which a message is delivered. The possible values are:
* U = Urgent
* N = Normal

#### Delivery: 1!n (1)
Delivery monitoring options apply only to FIN user-to-user messages. The chosen option is expressed as a single digit:
* 1 = Non-Delivery Warning
* 2 = Delivery Notification
* 3 = Non-Delivery Warning and Delivery Notification

**If the message has priority 'U', the user must request delivery monitoring option '1' or '3'. If the message has priority 'N', the user can request delivery monitoring option '2' or, by leaving the option blank, no delivery monitoring.**

Implementing a parser for _Priority_ is straightforward.

In [55]:
pri:S oneof"UN";
pri"UN"

`U ,"N"


Note that we've used `S` to cast to symbol.

There are two different parsers for `Delivery` which depend on the priority. If priority equals `U` then the parser should be:

In [62]:
cU:oneof"13"

If, on the other hand, priority equals `N`, the parser *may* consume a `2`. Consuming one particular character is easy with `chr`:

In [58]:
chr["2"]"2oo"

"2" "oo"


But how can we model the optional behaviour? There is another combinator for that: `opt`.

In [99]:
cN:opt chr"2"

It returns the empty value `()` and leaves the input untouched when no `2` is found:

In [100]:
cN"abc"

() "abc"


But the behaviour looks odd when it does find the character, because two results come back:

In [101]:
cN"2abc"

"2" "abc" 
()  "2abc"


You can ignore this aspect for now; we'll come back to it later. In essence, this is how non-determinism is expressed: the parser returns every way in which it can succeed, here both consuming the `2` and skipping it.
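One way to collapse the two branches is to require that the whole input be consumed. The sketch below assumes that `seql`, `eof` and `rparse` behave as their definitions at the top of the notebook read:

```q
/ seql keeps the result of cN and discards that of eof;
/ only the branches that reach the end of the input survive
seql[cN;eof]"2"
/ with a single surviving branch, rparse can return the bare value
rparse[seql[cN;eof]]"2"
```

The branch that skipped the `2` still has input left, so `eof` rejects it and only the branch that consumed the character remains.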

Once we know how to deal with optional fields, let's get back to dependent ones. Having implemented all cases, we can deal with the _delivery_ logic by using `bind` from _quercus_:

In [103]:
dlv:bind[`U`N!(cU;cN)]pri

This is probably the most powerful combinator in the library. It takes a function (or a dictionary) and a parser: the parser runs first, and its result is passed to the function, which selects the parser to apply to the remaining input. Here, if the priority is `U`, `cU` is applied to the remaining text; if it is `N`, we continue with `cN`.
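The same dependency can be written with a function instead of a dictionary, which helps when the choice cannot be expressed as a simple lookup. A minimal sketch, where `dlvF` is a hypothetical name introduced only for illustration:

```q
/ dlvF is hypothetical: the lambda receives the parsed priority (a symbol)
/ and returns the parser to run on the rest of the input
dlvF:bind[{$[x~`U;cU;cN]}]pri
dlvF"U3"
dlvF"N2"
```

Unlike the dictionary, the lambda treats every priority other than `U` as `N`; since `pri` only accepts `U` or `N`, the two versions should agree on all inputs.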

In [105]:
dlv"U1"
dlv"U2" / ko
dlv"U3"
dlv"N2"

"1" ""




"3" ""


"2" ""  
()  ,"2"
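
The remaining cells of this section parse whole blocks such as `{1:BANKFRPPAXXX4321123456}`. They rely on two names, `tos` and `b1`, that are not defined anywhere in this notebook. The sketch below offers definitions that are consistent with the outputs shown further down; both are assumptions, not the original code:

```q
/ assumption: tos turns the parsed characters into a symbol, exactly as S does
tos:S;
/ assumption: b1 reads 12!x, 4!n and 6!n and labels the three values;
/ lift wraps the address so that seqA does not splice its characters into the result
b1:map[`LTAddress`SessionNumber`SequenceNumber!]seqA(lift times[12;xT];J times[4;nT];J times[6;nT]);
b1"BANKFRPPAXXX4321123456"
```

The `lift` is needed because `seqA` joins results with `raze`, which would otherwise flatten the address string into individual characters.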


In [None]:
"{1:BANKFRPPAXXX4321123456}"

In [106]:
cT:plus[upr;digit]

In [115]:
(braces seq[seql[tos upto[3;cT];chr":"];b1])"{1:BANKFRPPAXXX4321123456}"

(`1;`LTAddress`SessionNumber`SequenceNumber!("BANKFRPPAXXX";4321;123456)) ""


In [161]:
.sw.1:b1;
.sw.2:j;
.sw.3:str"foo";
p:many1 braces bind[get`.sw] seql[tos upto[3;cT];chr":"];
rparse[seql[p;eof]]"{1:BANKFRPPAXXX4321123456}{2:123456}{3:foo}"

`LTAddress`SessionNumber`SequenceNumber!("BANKFRPPAXXX";4321;123456)
123456
"foo"


## 4. Validation

For the next problem, we are going to introduce a simple validation. Say the output date has to be greater than or equal to today.

In [119]:
od"240424"

2024.04.24 ""


We can filter out branches whose result fails a predicate by using `fil`.

In [121]:
vp:fil[.z.d<=]od

Now, if the value passes the validation we get a result; otherwise we get nothing. Note that the outcome depends on the date on which the cell is run.

In [125]:
vp"250424"
vp"230424"

2025.04.24 ""
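
The same pattern applies to any field. As a sketch, with `snPos` a hypothetical name and the rule itself an assumption made purely for illustration, a sequence number could be required to be positive:

```q
/ snPos is hypothetical: keep only sequence numbers greater than zero
snPos:fil[0<]sn
snPos"123456"
snPos"000000"
```

Because `fil` runs after `J` has converted the digits, the predicate works on a long rather than on the raw string.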




We have reached the end of the example!