### Adventures in Software Engineering #1

This simple exercise shows how to handle a text file whose content is slightly off specification.

The `csv` module is intended to handle files containing comma-separated data. Such data is produced
by many different utilities, not all of which observe the specifications correctly. I found one such
file in my own data, a list of videos.

In [None]:
import csv

In [None]:
reader = csv.reader(open("data/videos.txt", "r", newline=""))
reader

In [None]:
next(reader)

As you might spot, the fields in that first row do not come out cleanly.
Let's take a look at the first few lines of the raw file.

In [None]:
open("data/videos.txt", "r").readlines()[:5]

The issue here is the space that follows each comma intended to separate the fields.
Because of it, the reader sees the next field as starting with a space, so the double-quote
that follows is _not_ treated as an opening quote character.
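The effect is easy to reproduce in isolation. The sketch below parses a made-up line (an assumption for illustration, not taken from `videos.txt`) with the reader's default settings; the second quoted field is split at its embedded comma.

```python
import csv
from io import StringIO

# Hypothetical sample line; the real file's contents are not reproduced here.
sample = '"Intro, part 1", "Setup, part 2"\n'

rows = list(csv.reader(StringIO(sample)))
print(rows)  # [['Intro, part 1', ' "Setup', ' part 2"']]
```

The first field is parsed as intended because its opening quote sits at the very start of the field. The second is not, so its quotes are kept as literal characters.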

The `csv` module allows you to define _dialects_, which affect how the file is interpreted.
Since this is a very simple issue, however, I chose instead to create an object that could
be used in place of the file, masking the defects of the original data.
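For comparison, the dialect route can be short in this case: the `skipinitialspace` formatting parameter tells the reader to ignore whitespace immediately after a delimiter. A minimal sketch, assuming a made-up sample line and a hypothetical dialect name `spaced_quotes`:

```python
import csv
from io import StringIO

# Hypothetical sample line standing in for the real file.
sample = '"Intro, part 1", "Setup, part 2"\n'

# Register a dialect based on the default one, but skipping the
# whitespace that follows each delimiter.
csv.register_dialect("spaced_quotes", skipinitialspace=True)

rows = list(csv.reader(StringIO(sample), dialect="spaced_quotes"))
print(rows)  # [['Intro, part 1', 'Setup, part 2']]
```

Passing `skipinitialspace=True` directly to `csv.reader` has the same effect without registering a name.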

Python's `io` module defines the `StringIO` object, essentially an in-memory replacement for a text file.
Below I create one by concatenating all the lines of the input file after removing the space in
any occurrence of the sequence `'", "'`, rendering it `'","'` instead.
Since the replacement is a string operation, `StringIO` is the right choice here; its binary counterpart, `BytesIO`, accepts only bytes.

In [None]:
from io import StringIO

In [None]:
# Read the file once, removing the space between each separating comma and the next opening quote.
with open("data/videos.txt", "r") as file:
    newfile = StringIO("".join(line.replace('", "', '","') for line in file))

Rewinding the `StringIO` to the beginning of its content with the `.seek()` method allows
you to use it as the argument when creating a new `csv.reader` object.
Iterating over that object shows the data being interpreted correctly.

In [None]:
newfile.seek(0)
list(csv.reader(newfile))
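Because `csv.reader` accepts any iterable that yields strings, the in-memory copy is not strictly necessary. As a sketch of an alternative, the hypothetical helper `fix_quotes` below cleans each line lazily as the reader asks for it; the sample lines are made up for illustration.

```python
import csv

def fix_quotes(lines):
    """Yield each line with the space after a separating comma removed."""
    for line in lines:
        yield line.replace('", "', '","')

# Hypothetical sample lines standing in for the real file.
sample_lines = [
    '"Title", "Notes"\n',
    '"Intro, part 1", "Setup, part 2"\n',
]

rows = list(csv.reader(fix_quotes(sample_lines)))
print(rows)  # [['Title', 'Notes'], ['Intro, part 1', 'Setup, part 2']]
```

With a real file, an open file object would take the place of `sample_lines`, which avoids holding a second copy of the data in memory.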