---
title: "The spreadsheet column problem"
author: "Damien Martin"
date: "2025-07-02"
categories: [interview,puzzles]
---

# The Spreadsheet Column Problem

In a spreadsheet, we have columns `A`, `B`, ..., `Z`, `AA`, `AB`, etc., that allow us to specify a cell address like `A3`, `C6`, or `CA56`.

The problem here is to write a function `col_to_index(col: str)` that takes the column part of an address and maps it to a zero-based column number. Some examples:

* `col_to_index('A')` returns 0
* `col_to_index('B')` returns 1
* `col_to_index('C')` returns 2
* `col_to_index('Z')` returns 25
* `col_to_index('AA')` returns 26
* `col_to_index('AB')` returns 27

and so on.

If the columns were only single characters A-Z, this is pretty easy:

In [1]:
def col_to_index(col: str) -> int:
    """Only works for single letter columns"""
    return ord(col.upper()) - ord('A')
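
As a quick illustration of where the single-letter version stops working, here is a minimal sketch. The name `single_letter_index` is made up for this snippet so that it runs on its own; it repeats the logic of the cell above.

```python
def single_letter_index(col: str) -> int:
    """Offset of a single letter from 'A' (same logic as the cell above)."""
    return ord(col.upper()) - ord('A')

print(single_letter_index('A'), single_letter_index('z'))  # 0 25

try:
    single_letter_index('AA')
except TypeError as err:
    # ord() only accepts a single character, so two letters fail outright
    print(f"TypeError: {err}")
```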

# Solution

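Let's start with what it _isn't_. This looks close to a "change of base" problem, where A = 0, B = 1, etc. The problem is that 26 in base 10 is `10` in base 26, which would be `BA` under that mapping, while the column with index 26 is `AA`. Read as a base-26 number, `AA` is `00`, which is just 0.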

We can see this more clearly by using an alphabet with only two letters, and comparing binary with "column names" built from those two letters:

| Number | Binary (4 bit) | Letter representation (AB only) | 
|--------|----------------|---------------------------------|
| 0      |           0000 |                               A |
| 1      |           0001 |                               B |
| 2      |           0010 |                              AA |
| 3      |           0011 |                              AB |
| 4      |           0100 |                              BA |
| 5      |           0101 |                              BB |
| 6      |           0110 |                             AAA |
| 7      |           0111 |                             AAB |
| 8      |           1000 |                             ABA |
| 9      |           1001 |                             ABB |
| 10     |           1010 |                             BAA |
| 11     |           1011 |                             BAB |
| 12     |           1100 |                             BBA |
| 13     |           1101 |                             BBB |
| 14     |           1110 |                            AAAA |
| 15     |           1111 |                            AAAB |

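Read as binary with A = 0 and B = 1, `A`, `AA` and `AAA` would all be zero, yet they are the distinct columns 0, 2 and 6. This isn't quite just a base replacement problem!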
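
The table above can be regenerated with a few lines of Python. This is a sketch rather than part of the solution; the helper name `names_in_order` is an assumption introduced here, and it simply enumerates every string over the alphabet, shortest first.

```python
import itertools

def names_in_order(symbols: str, max_length: int) -> list[str]:
    """All strings over `symbols` of length 1..max_length, shortest first."""
    return [
        "".join(tup)
        for n in range(1, max_length + 1)
        for tup in itertools.product(symbols, repeat=n)
    ]

# Print number, 4-bit binary, and the two-letter "column name" side by side
for number, name in enumerate(names_in_order("AB", max_length=4)[:16]):
    print(f"{number:>2} | {number:04b} | {name:>4}")
```

Because the enumeration is brute force, it also serves as a reference to compare any faster implementation against.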

To make testing easier, we can generalize the problem a little and let the function take its alphabet as a parameter:
```
import string

def col_to_index(col: str, symbols=string.ascii_uppercase) -> int:
    ...
```

## Breaking the problem into two parts

The problem is easiest to break into two parts. If the length of the string is `N`:

* How many strings can we make with `N-1` digits or fewer?
* If you interpret your string as an `N`-digit number whose base is the size of the alphabet, what do you get?



For example, if our alphabet is just `['A', 'B']`, then the index of `AAAB` is obtained by:

1. How many strings can we make with N-1 = 3 digits or fewer? That is, what is the index of `BBB`, plus 1 (because indices start at 0)? Answer: 13 + 1 = 14.
2. With `A=0, B=1`, what does `AAAB` correspond to in binary? Answer: `0001` in binary is 1.

So `AAAB` is 14+1 = 15.


The second part is tedious to do by hand, but trivial on a computer. The first part is a little more interesting. Suppose there are `L` letters in your alphabet. How many strings are there of length `N-1` or less?

* Strings of length 1 = `L`
* Strings of length 2 = `L*L`
* Strings of length 3 = `L*L*L`

i.e. a geometric series!

Summing these, the number of strings of length `N-1` or less is $L + L^2 + \dots + L^{N-1} = (L^N - L)/(L - 1)$. In our example with $L = 2$ and $N = 4$, the number of strings of length 3 or less is $(2^4 - 2)/(2 - 1) = 14$. As a sanity check, for $N = 2$ (the first non-trivial case) we get $(L^2 - L)/(L-1) = L(L-1)/(L-1) = L$, which is just the `L` single-letter strings.
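
A quick numerical check of the closed form against the explicit sum, as a sketch. The function names `count_by_summing` and `count_closed_form` are made up for this snippet, and the closed form assumes an alphabet of at least two symbols (otherwise the denominator is zero).

```python
def count_by_summing(max_len: int, alphabet_size: int) -> int:
    """Add up alphabet_size**k for every length k from 1 to max_len."""
    return sum(alphabet_size**k for k in range(1, max_len + 1))

def count_closed_form(max_len: int, alphabet_size: int) -> int:
    """Geometric-series closed form; requires alphabet_size >= 2."""
    return (alphabet_size ** (max_len + 1) - alphabet_size) // (alphabet_size - 1)

# The two agree for a range of alphabet sizes and lengths (including length 0)
for size in (2, 3, 10, 26):
    for max_len in range(0, 6):
        assert count_by_summing(max_len, size) == count_closed_form(max_len, size)

print(count_closed_form(3, 2))   # 14
print(count_closed_form(1, 26))  # 26
```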

For part 2, we need to convert each letter into a digit, and then parse the resulting sequence of digits as a number in base `L`. This is pretty standard.

In [2]:
import string 

def parse(s: str, symbols: str) -> int:
    """Convert an alphanumeric string to an int, by treating each character as a digit.
    
    Symbols are in order: symbols[0] maps to 0, symbols[1] maps to 1, symbols[i] maps to i.
    The string of 'digits' is then interpreted in base len(symbols).
    
    >>> parse("10", symbols=string.digits)
    10
    >>> parse("10", symbols="01")
    2
    >>> parse("BB", symbols="AB")
    3
    """
    accum = 0
    num_seq = [symbols.index(symbol) for symbol in s]
    for num in num_seq:
        accum = accum * len(symbols) + num
    return accum

Or, if you want to be a fancy functional programmer:

In [3]:
from functools import reduce

def parse(s: str, symbols: str) -> int:
    """Convert an alphanumeric string to an int, by treating each character as a digit.
    
    Symbols are in order: symbols[0] maps to 0, symbols[1] maps to 1, symbols[i] maps to i.
    The string of 'digits' is then interpreted in base len(symbols).
    
    >>> parse("10", symbols=string.digits)
    10
    >>> parse("10", symbols="01")
    2
    >>> parse("BB", symbols="AB")
    3
    """
    num_seq = [symbols.index(symbol) for symbol in s]
    return reduce(lambda acc, val: len(symbols)*acc + val, num_seq, 0)

In [4]:
assert parse("10", symbols = string.digits) == 10
assert parse("10", symbols="01") == 2
assert parse("FF", symbols="0123456789ABCDEF") == 255
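
For the conventional digit alphabets, Python's built-in `int(text, base)` does the same job, which gives a convenient cross-check. The sketch below repeats a compact `parse` so that it runs on its own; the constant name `ALPHANUMERIC` is an assumption for this snippet.

```python
import string

def parse(s: str, symbols: str) -> int:
    """Digit-by-digit parse, repeated here so the snippet is self-contained."""
    accum = 0
    for ch in s:
        accum = accum * len(symbols) + symbols.index(ch)
    return accum

# '0'-'9' followed by 'A'-'Z': the digit order int() uses for bases up to 36
ALPHANUMERIC = string.digits + string.ascii_uppercase

for base, text in [(2, "1011"), (8, "777"), (16, "FF"), (36, "ZZ")]:
    assert parse(text, symbols=ALPHANUMERIC[:base]) == int(text, base)
```

The built-in cannot replace `parse` for the column problem, because the alphabet there starts at `A` rather than `0`.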

In [5]:
def num_strings_len_n_or_less(length: int, alphabet_size: int) -> int:
    """Count the non-empty strings of at most `length` symbols (needs alphabet_size >= 2)."""
    return (alphabet_size**(length + 1) - alphabet_size) // (alphabet_size - 1)

In [6]:
# Our example of A = 0 to BBB = 13, so 14 strings of length 3 or less
assert num_strings_len_n_or_less(length=3, alphabet_size=2)==14

In [7]:
def col_to_index(col_name: str, symbols: str=string.ascii_uppercase) -> int:
    """Map a column name such as 'A' or 'AB' to its zero-based column index."""
    if len(col_name) == 0:
        raise ValueError("Cannot convert empty string")
    length_of_string = len(col_name)
    # Part 2: read the name as a number in base len(symbols)
    parsed_as_num = parse(s=col_name, symbols=symbols)
    if len(col_name) == 1:
        return parsed_as_num
    # Part 1: offset by the count of all strictly shorter names
    return num_strings_len_n_or_less(length=length_of_string - 1, alphabet_size=len(symbols)) + parsed_as_num


In [8]:
assert col_to_index('A', symbols="AB")==0
assert col_to_index('B', symbols="AB")==1
assert col_to_index('AA', symbols="AB")==2
assert col_to_index('AB', symbols="AB")==3
assert col_to_index('BA', symbols="AB")==4
assert col_to_index('BB', symbols="AB")==5
assert col_to_index('BBB', symbols="AB")==13

In [9]:
assert col_to_index("AA") == 26
assert col_to_index("AB") == 27
assert col_to_index("BA") == 26 + 26
assert col_to_index("ZZ") + 1 == col_to_index("AAA")
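
As an aside, the same index can be computed in a single pass. This is an alternative formulation, not the two-part method used above: treat the letters as digits 1 to `L` instead of 0 to `L-1` (often called bijective numeration), then subtract 1 at the end to make the result zero-based. The function name `col_to_index_bijective` is made up for this sketch.

```python
import string

def col_to_index_bijective(col_name: str, symbols: str = string.ascii_uppercase) -> int:
    """One-pass variant: each letter counts as a digit in 1..len(symbols)."""
    if not col_name:
        raise ValueError("Cannot convert empty string")
    base = len(symbols)
    accum = 0
    for ch in col_name:
        # The +1 shifts each digit from 0..base-1 to 1..base
        accum = accum * base + symbols.index(ch) + 1
    return accum - 1  # back to a zero-based index

assert col_to_index_bijective("A") == 0
assert col_to_index_bijective("Z") == 25
assert col_to_index_bijective("AA") == 26
assert col_to_index_bijective("BA") == 26 + 26
assert col_to_index_bijective("ZZ") + 1 == col_to_index_bijective("AAA")
assert col_to_index_bijective("BBB", symbols="AB") == 13
```

The extra 1 added at each of the `N` positions contributes $1 + L + \dots + L^{N-1}$ in total, and subtracting the final 1 leaves $L + \dots + L^{N-1}$, which is exactly the count of shorter strings from part 1.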

# Testing and invariants

We have written some common-sense tests so far, and I would be pretty confident this is working as intended. But it is worth asking: what properties do we think this function should have, and can we test them?

I am going to exclude error handling. For example, if the string contains symbols that are not in the "alphabet", the call to `index` currently raises a `ValueError`, and that seems sufficient.
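
To see what actually happens with an unknown symbol, here is a small sketch. It repeats a compact `parse` so that it runs on its own, and `fails_with_value_error` is a hypothetical helper introduced only for this demonstration.

```python
import string

def parse(s: str, symbols: str) -> int:
    """Digit-by-digit parse, repeated here so the snippet is self-contained."""
    accum = 0
    for ch in s:
        # str.index raises ValueError when ch is not in symbols
        accum = accum * len(symbols) + symbols.index(ch)
    return accum

def fails_with_value_error(col_name: str, symbols: str = string.ascii_uppercase) -> bool:
    """Hypothetical helper: True if parsing `col_name` raises ValueError."""
    try:
        parse(col_name, symbols)
    except ValueError:
        return True
    return False

print(fails_with_value_error("A1"))  # True: '1' is not in the alphabet
print(fails_with_value_error("ab"))  # True: lowercase is not in ascii_uppercase
print(fails_with_value_error("AB"))  # False
```

Note that, unlike the single-letter version at the top, nothing here upper-cases the input, so lowercase column names are rejected.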


Here are some properties:

1. Any single letter should be represented by its index in the alphabet.
2. If we generate all strings of length `N` or less in order ("A", "B", ..., "Z", "AA", "AB", ...), the value of the function for each string should be its index in this list.
 
The first test is trivial from the implementation. Let's write the second test!

In [10]:
import itertools


def test_all_sequential(max_length: int, unordered_symbols: str):
    """Check col_to_index against a brute-force enumeration of every name up to max_length."""
    symbols = ''.join(sorted(set(unordered_symbols)))
    assert len(symbols) == len(unordered_symbols), 'Please make all symbols unique'

    # Every string of length 1, then length 2, ..., each group in alphabet order
    up_to_n = itertools.chain.from_iterable(
        itertools.product(symbols, repeat=n) for n in range(1, max_length + 1)
    )
    for expected, tup in enumerate(up_to_n):
        computed = col_to_index(''.join(tup), symbols=symbols)
        assert expected == computed, f"{''.join(tup)} input, expected {expected}, got {computed}"

In [11]:
test_all_sequential(max_length=3, unordered_symbols="AB")
test_all_sequential(max_length=3, unordered_symbols=string.ascii_uppercase)
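
For completeness, property 1 can be checked just as directly, and a third property follows naturally: converting an index back to a name and forward again should return the same index. The sketch below assumes a hypothetical inverse, `index_to_col`, which is not part of the solution above, and repeats a compact `col_to_index` so that it runs on its own.

```python
import string

def col_to_index(col_name: str, symbols: str = string.ascii_uppercase) -> int:
    """Compact restatement of the two-part solution, for a self-contained snippet."""
    if not col_name:
        raise ValueError("Cannot convert empty string")
    base = len(symbols)
    shorter = (base ** len(col_name) - base) // (base - 1)  # names with fewer letters
    value = 0
    for ch in col_name:
        value = value * base + symbols.index(ch)
    return shorter + value

def index_to_col(index: int, symbols: str = string.ascii_uppercase) -> str:
    """Hypothetical inverse: zero-based index back to a column name."""
    if index < 0:
        raise ValueError("Index must be non-negative")
    base = len(symbols)
    n = index + 1  # work with a one-based count
    letters = []
    while n > 0:
        n, rem = divmod(n - 1, base)  # peel off the last letter
        letters.append(symbols[rem])
    return "".join(reversed(letters))

# Property 1: a single letter maps to its position in the alphabet
for i, letter in enumerate(string.ascii_uppercase):
    assert col_to_index(letter) == i

# Round trip: index -> name -> index is the identity
for i in range(20_000):
    assert col_to_index(index_to_col(i)) == i
for i in range(100):
    assert col_to_index(index_to_col(i, "AB"), "AB") == i
```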