Skip to content

Tutorial: Unary arithmetic

Martin Ender edited this page Jan 16, 2018 · 15 revisions

Under construction... (...and out of date. This was written for Retina 0.8.2.)

Table of contents

  1. Base conversion
    1. Decimal to Unary
    2. Unary to Decimal
    3. Binary to Unary
    4. Unary to Binary
    5. Unary to Any Base
  2. Basic arithmetic
    1. Addition
    2. Subtraction
    3. Multiplication
    4. Divsion
    5. Modulo
    6. Divmod
    7. GCD

Retina is perfectly capable of arithmetic computations, at least on integers. However, since all processing of data is string-based, the best representation of numbers is the unary numeral system, where each integer N is represented represented as a string of N 1s (or any other convenient character). Of course, this limits computations to natural numbers (i.g. non-negative integers) — we'll look into extensions to negative integers later. Since all mathematical operations have to be implemented manually on unary numbers in Retina, this article gives an overview of how various computations can be performed.

I'm hoping that this article can both act as a reference when looking for a specific mathematical operation as well as a learning resource that can be read from top to bottom to get a better grasp on how to solve more general problems in Retina.

A note of caution regarding zeros: Zeros can be a bit tricky in unary notation, since they correspond to the empty string. When you want to support unary zeros in your input for any sort of operation, that means that your regex will be able to match an empty string. If the input string contains more than just a single number, you'll have to be careful that these empty matches only occur in positions where you expect numbers to be. Whether and how exactly you ensure this depends very strongly on your specific scenario. Therefore, all the snippets below will assume that input is always non-zero. In most cases, adapting them for zero input only requires changing some +s or *s and potentially adding a lookaround to restrict admissible match positions. Keep this in mind if you have to deal with inputs that are potentially zero.

Base conversion

Before we start processing unary numbers, we have to get those unary representations. More likely than not, we want decimal input and output, so we'll have to do some conversion. This section covers only decimal/unary. More generic base conversion will be treated later.

Decimal to unary

You've got a decimal number (or several) in the string and want to convert them to unary. This can easily be done with Retina's repetition operator $*. When used in a substitution it repeats the character on the right by the first decimal number it finds in the token on the left. A simple way to turn every integer in the input into a unary representation using 1 would be:

\d+
$&$*1

However, if $* is the first or last token in the substitution pattern, $& and 1 will be assume as default arguments, respectively, so the above is equivalent to:

\d+
$*

A different unary digit can be used by giving $* a different right-hand operator than 1.

Unary to decimal

You're done with your computation and you want to turn the unary representation back into a decimal number. There's a couple of ways to go about that.

If the current string only contains a single unary number and all you want the output to be is its decimal representation, you can simply count the digits using a match stage:

M`1

If this is the last stage in the entire program, M` can even be omitted. This can also be used for multiple numbers, provided they are separated by linefeeds:

M%`1

Here, % is per-line mode, which treats each line as a separate string, so the stage counts the matches in each of them, and replaces each line with that number.

Sometimes you're not so lucky though, e.g. because you've got different separators or because there's other information in the string that needs to be retained. In this case you can use the . and # token modifiers in Retina's substitution syntax. The former specifies that a tokens length should be inserted instead, and the latter counts the number of captures.

So in the simplest case you can do:

1+
$.&

Alternatively, you could also do something like:

(1)+
$#1

The latter is quite useful because it can be directly combined with various arithmetic operations. E.g. if you know your input is even, and you want to halve it before converting to decimal, you could do:

(11)+
$#1

Or if want to subtract 1, just don't include it in the captures:

1(1)*
$#1

Binary to Unary

1
01
+`10
011
^0+

It uses this algorithm, but used 1 instead of |.

Note that the second rule is applied first, then followed by repeated application of the first rule, and finally one application of the third rule.

Only the second rule needs repetition.

Unary to Binary

+`(1+)\1
${1}0
01
1

The first stage is repeated.

It matches groups that are repeated once, and then only copies the repeated group once, and add a 0 behind it.

This stage effectively does the repeated carrying.

However, we need to replace 01 with 1 to finish the conversion.

Unary to Any Base (Positive Integer Base)

Let us say that we would like to convert unary to base 3.

+`(1+)\1\1
${1};
^
;
;(1*)
$.1

To modify it to other bases, just modify the first line. For example, base 4 would have the first line being +`(1+)\1\1\1 (or \1{3} instead of \1\1\1 which makes it shorter).

Alternatively, you could repeatedly apply divmod:

+r`(1*)(\3)*(?<=1);(1+)$
$1;$#2$*1;$3

The (?<=1) checks that the number is not zero, or else we would loop forever!

For example, 11111111111;111 (eleven;three) gets converted to 11;;1;;111 (0102_3, digits reversed) through the following steps (you can see this by putting a colon between + and r!):

  • 11;111;111 (32_3)
  • 11;;1;111 (102_3)
  • 11;;1;;111 (0102_3)
  • Now the number is 0 and the replacement stopped.

Basic arithmetic

This section covers the basic arithmetic operators, i.e. addition, subtraction, multiplication, division, and also modulo.

Addition

Adding two (or more) numbers is the simplest thing in unary. Simply remove the delimiters, such that all the numbers are written back to back. E.g. if you've separated your inputs with ;:

;

This works for an arbitrary number of inputs. Note that if you want to convert the result back to decimal using the match approach, you don't need to do anything at all: just count all the 1s in the string.

Addition with a fixed second operand (i.e. adding a hard-coded N to the input) is also quite simple using the repetition operator. The only question is how to match the input. This depends a bit on your exact scenario. You just need to make sure that you get a single match per number. If you've only got the number in the string something like ^ suffices. E.g. if you want to add 15 to the input:

^
15%*

If there are multiple numbers, it's usually easiest to match the entire number, and replace it with itself plus whatever you want to add:

1+
$&15$*

Remember that this implies a right-hand argument 1 for $*. Simpler solutions may be possible depending on the assumptions you can make about your input.

Subtraction

Of course, the natural numbers aren't closed under subtraction, which means given inputs a and b, a - b isn't necessarily a natural number. Ideally, you'll be able to assume that a ≥ b so you don't have to worry about this. If b > a is a possibility, then there are several reasonable ways to define the result a - b on the natural numbers. The two most natural implementations of subtraction in Retina allow for two such extensions.

Subtracting one number from another is fairly easy with a backreference. If the input is a;b, we can compute a - b as follows:

(1+);\1

Interestingly, this code gives the same result if the input is given as b;a. This is because this stage actually computes |a - b|.

There is another way to implement this: instead of using a backreference, we only remove one pair of digits from both numbers at the same time, but do so in a loop:

+`1;1
;

However, note that this leaves the separator ; in the string which you may or may not have to remove manually. The upside is that the ; will be on the or the right of the result, depending on whether a > b or b < a. That means, this solution can easily be adapted to compute max(a - b, 0), i.e. to clamp the subtraction to the natural numbers:

+`1;1
;
;1*

Subtracting a fixed constant from a number is trivial by matching and removing the desired amount of digits. Just make sure to anchor the match to one end of the number so that it can only match once. Subtracting 15:

\b1{15}

Note that this will leave the number unchanged if it is greater than 15. To clamp subtraction to non-negative results instead, we can simply make the quantifier variable:

\b1{1,15}

Subtracting a single input from a fixed constant is a bit trickier. Conceptually the simplest solution might be to insert the constant into the string and apply the two-operand computation above. Alternatively, we can make use of balancing groups if we really want to perform this computation in a single step. The idea is to push N empty matches onto a stack (e.g. group 1) and then remove one for every digit in the input. Using the # modifier in the substitution we can determine how many captures were left:

(){15}(?<-1>1)+
$#1$*

Note that this actually computes |15 - b|, so when b > 15, we will end up with b - 15. This is because the regex will stop matching 1s after the first 15, remove those completely, and the remainder will be unchanged. To clamp subtraction to non-negative results instead, we'll just have to match additional 1s separately:

(){15}(?<-1>1)+1*
$#1$*

We will revisit subtraction later when talking about extending the unary representation to negative integers.

Multiplication

Now it gets a bit trickier. To multiply two numbers a and b, we want to replace each digit in a with a copy of b. This can be done with a lookahead:

1(?=.*;(1+))
$1

The problem with this is that it turns a;b into a*b;b. Usually, we also want to get rid of the b. To do that we could append a second stage that removes ;1+. However, since the substitution of the above stage doesn't contain any literals (but only the group reference $1), we can actually squeeze the cleanup into that:

1(?=.*;(1+))|.
$1

Now for each digit in a, the left alternative matches and we replace it with a copy of b. But for the separator, the 1 doesn't match and for the digits in b, the lookahead doesn't match, so the engine will match those characters with . instead. But now group 1 is unused so $1 gives an empty string and these characters are simply removed.

As opposed to addition, this doesn't work for multiple arguments out of the box. I don't know of a simpler way to compute a product of several integers other than folding multiplication over successive pairs. There are several ways to do that though. Assume that our input is a;b;c and we want to computer a*b*c. It appears to be simplest to fold the multiplication such that a single application gives a*c;b even though that seems somewhat convoluted. However, by doing the main computation on the first number and simultaneously removing the last number, we can quite easily ensure that only those parts are processed. Here is the code:

+`\G1(?=.*;(1+))|;1+$
$1

Note that the + is just Retina's loop option, which does the folding. If you omitted it, then the result stage would exactly do one step of the above computation. Note that otherwise there are only two small differences from the two-argument case:

  1. We now need \G to ensure that only digits in the first input are replaced, because now the lookahead can also match digit in all other inputs except the last. Note that we don't need to change the lookahead. The greediness of .* ensures that we always multiply the first input by the last.
  2. The removal is done on the entire last input at once using ;1+$. This is because we need the anchor $ to ensure that only the last input is removed, otherwise this part would match both b and c.

Finally, how do we multiply a single number by a hardcoded second operand? Similar to hardcoded addition, we can use the repetition operator to replace each digit with a fixed number of 1s. E.g. multiplying by 15:

1
15$*

Division

Division is comparably simple. Of course, the natural numbers aren't closed under division either, so we need to decide if we want to round down or up (provided we can't guarantee that b divides a exactly). Let's assume we want to round down first.

The idea is to check how many backreferences of b fit into a. For simplicity, let's consider an input format of b;a first. To take care of the rounding down, we simply match any extraneous digits in a to remove them:

(1+);(\1)*1*
$#2$*

We might be restricted to taking input as a;b though. The above solution can be adapted using either a lookahead or right-to-left matching. The latter is a bit easier (remember to also read the regex from right to left):

r`1*(\2)*;(1+)
$#1$*

Rounding up can be done quite easily by pushing an additional capture for group 1 if any additional 1s have to be removed:

r`(?<1>1+)?(\2)*;(1+)
$#1$*

Division by a constant is even simpler. We just check how often the constant fits into the number, so we don't need a backreference and therefore we don't need RTL mode either. Dividing by 15:

(1{15})*1*
$#1$*

Or rounding up:

(1{15})*(?<1>1+)?
$#1$*

Also note that if you've only got a single number in the string, and you want the output to be merely the decimal result, you can use the counting feature of match stages instead. Rounding down:

M`1{15}

Rounding up:

M`1{1,15}

And of course the M` can be omitted if this is your program's last stage. Unfortunately, I don't think there's a good way to divide a fixed constant by the input. In that case, it seems to be simplest to prepend the constant to the input and use the two-operand version above.

Modulo

The modulo operation sort of falls out of the above division solutions. Instead of counting how often b fits into a we remove all multiples of b from a. If the input is b;a that's simply:

(1+);\1*

Otherwise, we return to right-to-left matching:

r`\1*;(1+)

The same goes for modulo a fixed number:

1{15}

Divmod

Note that this makes it very easy to implement divmod, i.e. turn a;b into (a/b);(a%b):

r`(1*)(\3)*;(1+)
$#2$*1;$1

GCD (Greatest Common Divisor, Highest Common Factor, HCF)

It is quite easy to find the GCD in Retina. We don't even need to use the Euclidean Algorithm.

Assuming the input is a;b:

^(1+)\1*;\1+$
$1

This finds the greatest number of which the two numbers are a multiple, and then replace the whole string by that number.