# Introduction

At work, I often need to parse the contents of files such as `/etc/os-release`. Here is a sample of what the file looks like:

```
VERSION="10 (buster)"
VERSION_CODENAME=buster
ID=raspbian
ID_LIKE=debian
HOME_URL="http://www.raspbian.org/"
SUPPORT_URL="http://www.raspbian.org/RaspbianForums"
BUG_REPORT_URL="http://www.raspbian.org/RaspbianBugs"
```

In this post, I will discuss my different attempts and their pros and cons.


## First Attempt: Split Line at the Equal Sign

This was the most obvious approach. Given a line such as:

In [1]:
line = 'VERSION="10 (buster)"'

We can split it by the equal sign:

In [2]:
line.split("=")

['VERSION', '"10 (buster)"']

The split results in 2 parts: the key (VERSION) and the value ("10 (buster)"). We are not done yet: we need to strip the quotes from the value. Putting it all together:

In [3]:
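def parse_line(line):
    # Split on the first "=" only, since a value (a URL, for example) may contain "="
    key, value = line.split("=", 1)
    # Remove the surrounding double quotes, if any
    value = value.strip('"')
    return key, value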

Test it out:

In [4]:
parse_line(line)

('VERSION', '10 (buster)')
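
The `strip('"')` call assumes double quotes. The os-release format also allows single-quoted values, and a value may itself contain an equal sign. As a minimal sketch of one way to cover both cases (the names `unquote` and `parse_line_quoted` are hypothetical helpers, not part of the original code):

```python
def unquote(value):
    """Remove one pair of matching single or double quotes, if present."""
    if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
        return value[1:-1]
    return value


def parse_line_quoted(line):
    """Split a KEY=value line at the first "=" and unquote the value."""
    # partition() splits at the first "=" only, so "=" inside the value survives
    key, _, value = line.partition("=")
    return key, unquote(value.strip())


print(parse_line_quoted("NAME='Raspbian GNU/Linux'"))
# ('NAME', 'Raspbian GNU/Linux')
```

Unlike `strip('"')`, this removes at most one quote from each end, so a value that legitimately ends in a quote character is left alone.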

Now that we can successfully parse one line, we can parse multiple lines. Given the following text:

In [5]:
text = """VERSION="10 (buster)"
VERSION_CODENAME=buster
ID=raspbian
ID_LIKE=debian
HOME_URL="http://www.raspbian.org/"
SUPPORT_URL="http://www.raspbian.org/RaspbianForums"
BUG_REPORT_URL="http://www.raspbian.org/RaspbianBugs"
"""

We can write a parser which splits the text into lines and parses each line:

In [6]:
def parse_etc_os_release(text):
    dict_object = dict(
        parse_line(line)
        for line in text.splitlines()
    )
    return dict_object

Test it out:

In [7]:
parse_etc_os_release(text)

{'VERSION': '10 (buster)',
 'VERSION_CODENAME': 'buster',
 'ID': 'raspbian',
 'ID_LIKE': 'debian',
 'HOME_URL': 'http://www.raspbian.org/',
 'SUPPORT_URL': 'http://www.raspbian.org/RaspbianForums',
 'BUG_REPORT_URL': 'http://www.raspbian.org/RaspbianBugs'}
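
A real `/etc/os-release` may also contain blank lines and comment lines starting with `#`. A blank line would make `parse_line` raise a `ValueError`, since there is no `=` to split on. The sketch below is one possible way to skip such lines; `parse_os_release_text` and `sample` are illustrative names, not part of the code above.

```python
def parse_os_release_text(text):
    """Parse KEY=value lines, skipping blank lines and '#' comments."""
    result = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Ignore empty lines and comments
        if not line or line.startswith("#"):
            continue
        # Split at the first "=" only, then drop surrounding double quotes
        key, _, value = line.partition("=")
        result[key] = value.strip('"')
    return result


sample = """# This is a comment
ID=raspbian

VERSION="10 (buster)"
"""
print(parse_os_release_text(sample))
# {'ID': 'raspbian', 'VERSION': '10 (buster)'}
```

To parse the actual file, the text could be read with `pathlib.Path("/etc/os-release").read_text()` and passed to this function.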

This approach is straightforward and does not need any library, standard or third-party, so I really like it. However, I usually challenge myself to find alternative solutions, so I looked around and found another one.

## Second Attempt: Use the `csv` Library

On closer inspection, the contents of /etc/os-release look like a comma-separated values (CSV) file, only with the equal sign as the separator. That means I could use the `csv` library to parse it:

In [8]:
import csv

def parse_etc_os_release2(text):
    # CSV can "sniff" or guess the separators
    dialect = csv.Sniffer().sniff(text)
    
    # Parse it and turn into a dictionary
    lines = text.splitlines()
    csv_reader = csv.reader(lines, dialect)
    dict_object = dict(csv_reader)
    
    return dict_object

Test it out:

In [9]:
parse_etc_os_release2(text)

{'VERSION': '10 (buster)',
 'VERSION_CODENAME': 'buster',
 'ID': 'raspbian',
 'ID_LIKE': 'debian',
 'HOME_URL': 'http://www.raspbian.org/',
 'SUPPORT_URL': 'http://www.raspbian.org/RaspbianForums',
 'BUG_REPORT_URL': 'http://www.raspbian.org/RaspbianBugs'}

In the code above, we first use the `csv` library to guess the dialect of the text, which includes the separator, the quote character, and other characteristics. Next, we create a `csv.reader` object and pass it to the `dict` constructor, which iterates over the parsed rows and builds a dictionary.

The advantage of this method is that the `csv` library can handle a wide variety of separators and quote characters. The disadvantage, compared to the first method, is that it requires importing a library, whereas the first needs none, standard or third-party.
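
Sniffing is a heuristic, so the guessed dialect depends on the sample text. When the separator is known ahead of time, it can be passed to `csv.reader` directly. The following is a minimal sketch under that assumption; `parse_key_value` is a hypothetical name, and silently dropping rows that do not have exactly two fields is a design choice made here for brevity.

```python
import csv


def parse_key_value(text, delimiter="="):
    """Parse KEY<delimiter>value lines using an explicitly configured csv.reader."""
    reader = csv.reader(text.splitlines(), delimiter=delimiter, quotechar='"')
    # Blank lines come back as empty rows; keep only rows with exactly two fields
    return {row[0]: row[1] for row in reader if len(row) == 2}


print(parse_key_value('VERSION="10 (buster)"\nID=raspbian\n'))
# {'VERSION': '10 (buster)', 'ID': 'raspbian'}

# The same function with a colon as the separator
print(parse_key_value('user:alice\nshell:"/bin/bash"\n', delimiter=":"))
# {'user': 'alice', 'shell': '/bin/bash'}
```

Because the delimiter is only honored outside quotes, a quoted value such as `"http://www.raspbian.org/"` parses correctly even when the separator is a colon.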

## Conclusion

For parsing /etc/os-release alone, I believe the two methods are a tie. However, the second can parse more formats: imagine file formats which use a colon instead of an equal sign as the separator.