<p>For a number written in Roman numerals to be considered valid there are basic rules which must be followed. Even though the rules allow some numbers to be expressed in more than one way there is always a "best" way of writing a particular number.</p>
<p>For example, it would appear that there are at least six ways of writing the number sixteen:</p>
<p class="margin_left monospace">IIIIIIIIIIIIIIII<br />
VIIIIIIIIIII<br />
VVIIIIII<br />
XIIIIII<br />
VVVI<br />
XVI</p>
<p>However, according to the rules only <span class="monospace">XIIIIII</span> and <span class="monospace">XVI</span> are valid, and the last example is considered to be the most efficient, as it uses the least number of numerals.</p>
<p>The 11K text file, <a href="https://projecteuler.net/project/resources/p089_roman.txt">roman.txt</a> (right click and 'Save Link/Target As...'), contains one thousand numbers written in valid, but not necessarily minimal, Roman numerals; see <a href="https://projecteuler.net/about=roman_numerals">About... Roman Numerals</a> for the definitive rules for this problem.</p>
<p>Find the number of characters saved by writing each of these in their minimal form.</p>
<p class="smaller">Note: You can assume that all the Roman numerals in the file contain no more than four consecutive identical units.</p>


**Solution(s):**
We parse the text file, storing each line as a string that contains one Roman numeral. We then use two functions to convert each numeral to a decimal integer and then back to its most efficient numeral. Both functions are written as chains of if/elif statements. Once we have the list of efficient Roman numerals, we compare the lengths of the strings in the two lists and sum the differences.

In [None]:
def convertToDecimal(s):
    """
    Takes in a string of a Roman numeral 
    and returns the integer represented by the numeral
    """
    tot = 0
    ind = 0
    l = len(s)
    while ind < l:
        let = s[ind]
        if let == "M":
            tot += 1000
        elif let == "D":
            tot += 500
        elif let == "C":
            if ind < l-1 and (s[ind+1] == "M" or s[ind+1] == "D"):
                # C before M or D is subtractive (CM = 900, CD = 400)
                tot -= 100
            else:
                tot += 100
        elif let == "L":
            tot += 50
        elif let == "X":
            if ind < l-1 and (s[ind+1] == "L" or s[ind+1] == "C"):
                # X before L or C is subtractive (XL = 40, XC = 90)
                tot -= 10
            else:
                tot += 10
        elif let == "V":
            tot += 5
        elif let == "I":
            if ind < l-1 and (s[ind+1] == "V" or s[ind+1] == "X"):
                # I before V or X is subtractive (IV = 4, IX = 9)
                tot -= 1
            else:
                tot += 1
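        else:
            # Stop on an unknown letter instead of silently skipping it
            raise ValueError("Invalid Roman numeral character: " + repr(let))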
        ind += 1
    return tot
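
The lookahead rule used in `convertToDecimal` can also be expressed with a lookup table. The sketch below is an alternative formulation, not the function above: the names `VALUES` and `roman_to_int` are hypothetical, and it assumes the input is already a valid numeral, because it treats any symbol followed by a larger one as subtractive.

```python
# Hypothetical table-driven variant of the parser; assumes valid input.
VALUES = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}


def roman_to_int(s):
    """Return the integer value of the Roman numeral string s."""
    total = 0
    for i, letter in enumerate(s):
        value = VALUES[letter]  # raises KeyError on an unknown letter
        # A symbol that precedes a larger one is subtracted (IV, XC, CM, ...).
        if i + 1 < len(s) and VALUES[s[i + 1]] > value:
            total -= value
        else:
            total += value
    return total


print(roman_to_int("XIIIIII"))    # 16
print(roman_to_int("XXXXVIIII"))  # 49
```

Unlike the if/elif chain, this version would also accept pairs such as `IM`, so it is only suitable when the input is known to follow the rules.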

In [None]:
# Read the file and save each numeral as a string in a list.
# strip() removes the trailing newline only when one is present, so the last
# line stays intact even if the file does not end with a newline.
with open("p089_roman.txt", "r") as romans:
    numbers = [line.strip() for line in romans if line.strip()]

In [None]:
decimals = [convertToDecimal(number) for number in numbers]

In [None]:
def convertToRomans(tot):
    """
    Takes in a decimal and returns the most efficient way of writing
    that decimal as a Roman numeral.
    """
    rom = ""
    while tot > 0:
        if tot >= 1000:
            rom += "M"
            tot -= 1000
        elif tot >= 900:
            rom += "CM"
            tot -= 900
        elif tot >= 500:
            rom += "D"
            tot -= 500
        elif tot >= 400:
            rom += "CD"
            tot -= 400
        elif tot >= 100:
            rom += "C"
            tot -= 100
        elif tot >= 90:
            rom += "XC"
            tot -= 90
        elif tot >= 50:
            tot -= 50
            rom += "L"
        elif tot >= 40:
            tot -= 40
            rom += "XL"
        elif tot >= 10:
            tot -= 10
            rom += "X"
        elif tot >= 9:
            tot -= 9
            rom += "IX"
        elif tot >= 5:
            tot -= 5
            rom += "V"
        elif tot >= 4:
            tot -= 4
            rom += "IV"
        else:
            tot -= 1
            rom += "I"
    return rom
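
The greedy conversion above always takes the largest value that still fits. The same idea can be sketched with a table of value/symbol pairs. This is an illustrative alternative rather than the function above, and the names `NUMERALS` and `int_to_roman` are hypothetical.

```python
# Hypothetical table-driven variant of the greedy conversion.
# Pairs are ordered from largest to smallest and include the subtractive forms.
NUMERALS = [
    (1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
    (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
    (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I"),
]


def int_to_roman(n):
    """Return the minimal Roman numeral for a positive integer n."""
    parts = []
    for value, symbol in NUMERALS:
        count, n = divmod(n, value)  # how many times this symbol fits
        parts.append(symbol * count)
    return "".join(parts)


print(int_to_roman(16))  # XVI
print(int_to_roman(49))  # XLIX
```

Keeping the subtractive pairs in the table is what lets a single greedy pass produce the minimal form.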

In [None]:
# Create a list of efficient Roman numerals
reconverts = [convertToRomans(n) for n in decimals]

# This loop checks that the conversion doesn't change the value of the decimal.
for i in range(len(numbers)):
    # print(reconverts[i], numbers[i])
    if convertToDecimal(reconverts[i]) != decimals[i]:
        print(i, numbers[i], decimals[i], reconverts[i])

In [None]:
diff = 0  # Running total of the difference in string lengths.
for i in range(len(numbers)):
    diff += len(numbers[i])-len(reconverts[i])

    # The next two lines show which numerals were converted.
    # if reconverts[i] != numbers[i]:
    #    print(i, reconverts[i], numbers[i], decimals[i], diff)
diff
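
As a possible cross-check on the total, the savings can also be estimated without converting to decimals at all. The sketch below is a different technique from the one used above: it assumes, as the problem's note states, that each numeral is valid and contains no more than four consecutive identical units, so every non-minimal run can be replaced by a two-character subtractive pair. The names `PATTERN` and `chars_saved` are hypothetical, and the result has not been verified against the file here.

```python
import re

# Each non-minimal run below can be written as a two-character pair.
# Longer patterns come first so "DCCCC" is matched before "CCCC".
PATTERN = re.compile(r"DCCCC|LXXXX|VIIII|CCCC|XXXX|IIII")


def chars_saved(numeral):
    """Return how many characters the minimal form of numeral would save."""
    shortened = PATTERN.sub("--", numeral)  # any two-character placeholder
    return len(numeral) - len(shortened)


print(chars_saved("XXXXVIIII"))  # 5, since XLIX has four characters
print(chars_saved("XVI"))        # 0, already minimal
```

Summing `chars_saved` over the list of numerals should agree with `diff` if the assumptions hold for every line of the file.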