# Containers

**Definition**: A container is any object that holds an arbitrary number of other objects. Generally, containers provide a way to access the contained objects and to iterate over them.

A common situation is the need to iterate over a large file multiple times. The simplest approach would be to use a generator to iterate over the file, as below:

In [1]:
def read_file(data_path):
    with open(data_path, 'r') as in_file:
        for line in in_file:
            yield line

A file can be iterated over like so:

In [2]:
iteration = read_file('example.txt')
print(list(iteration))

['Data file line 1\n', 'Data file line 2\n', 'Data file line 3\n', 'Data file line 4\n', 'Data file line 5\n', 'Data file line 6\n', 'Data file line 7\n', 'Data file line 8\n', 'Data file line 9\n', 'Data file line 10']


Now, if we need to iterate over the file again, we can just do the same thing:

In [3]:
print(list(iteration))

[]


Hmm... not what I expected. This happens because a generator produces its results only once. The generator above reached the end of the file, which raised `StopIteration`, and any further attempt to iterate over it yields nothing. What's more confusing is that no error is raised either. This is because consumers of iterators, such as `for` loops and the `list` constructor used above, expect `StopIteration` as the normal end-of-iteration signal and handle it silently. They cannot tell the difference between an iterator that never had any data and one that has already been exhausted.
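To make the silent exhaustion visible, here is a minimal sketch that calls `next` directly on a spent generator. The `make_lines` function is a hypothetical stand-in for `read_file`, used only so the example runs without a file on disk.

```python
def make_lines():
    # Hypothetical stand-in for read_file: yields three lines from memory.
    for n in range(1, 4):
        yield f"Data file line {n}\n"

gen = make_lines()
first_pass = list(gen)    # consumes every item
second_pass = list(gen)   # the generator is already exhausted

# Calling next() by hand exposes the StopIteration that list() swallowed.
try:
    next(gen)
    exhausted = False
except StopIteration:
    exhausted = True

print(first_pass)
print(second_pass)
print(exhausted)
```

The second `list` call returns an empty list rather than failing, which is exactly the behaviour seen with the file above; only the explicit `next` call surfaces the exception.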

So how can we get around this problem? We could store the result of the first pass in a list, but for a large file that could take up a large amount of memory and crash the program, so we probably don't want to do that. Instead, we can create an iterable container by defining the `__iter__` special method. `__iter__` defines how to iterate over the data; in this case we use it to iterate through a file. It returns an iterator object, and whatever is consuming it (a `for` loop, for example) calls that iterator's `__next__` method repeatedly until `StopIteration` is raised. Because the `__iter__` below contains a `yield`, it is a generator function, so each call returns a fresh generator that serves as the iterator. Below is an example container that iterates over a file given its path. Python calls `__iter__` for us whenever we iterate over an instance of the class, e.g. `for i in ReadFile('example.txt'):`

In [4]:
class ReadFile:
    def __init__(self, data_path):
        self.data_path = data_path
    
    def __iter__(self):
        with open(self.data_path, 'r') as in_file:
            for line in in_file:
                yield line

Let's see if we can iterate over the data multiple times:

In [5]:
f = ReadFile('example.txt')
print(list(f))
print(list(f))

['Data file line 1\n', 'Data file line 2\n', 'Data file line 3\n', 'Data file line 4\n', 'Data file line 5\n', 'Data file line 6\n', 'Data file line 7\n', 'Data file line 8\n', 'Data file line 9\n', 'Data file line 10']
['Data file line 1\n', 'Data file line 2\n', 'Data file line 3\n', 'Data file line 4\n', 'Data file line 5\n', 'Data file line 6\n', 'Data file line 7\n', 'Data file line 8\n', 'Data file line 9\n', 'Data file line 10']


In [6]:
for i in f:
    print(i)

Data file line 1

Data file line 2

Data file line 3

Data file line 4

Data file line 5

Data file line 6

Data file line 7

Data file line 8

Data file line 9

Data file line 10


This now works because each call to `ReadFile.__iter__` creates a new iterator object, so every pass reopens the file and starts again from the first line.
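The difference between the container and the iterators it hands out can be checked directly with `iter` and `next`. The sketch below repeats the `ReadFile` class so it runs on its own, and writes a small temporary file (an assumption made for the example, standing in for `example.txt`).

```python
import os
import tempfile

class ReadFile:
    def __init__(self, data_path):
        self.data_path = data_path

    def __iter__(self):
        # A generator function: every call returns a brand-new generator.
        with open(self.data_path, 'r') as in_file:
            for line in in_file:
                yield line

# Temporary two-line file, used here in place of example.txt.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write('line 1\nline 2\n')
    path = tmp.name

f = ReadFile(path)
it_a = iter(f)   # calls f.__iter__()
it_b = iter(f)   # calls it again, giving a separate iterator

distinct = it_a is not it_b
first_a = next(it_a)   # each iterator starts from the top of the file
first_b = next(it_b)

# An iterator returns itself from iter(), which is why it cannot restart.
gen_is_own_iterator = iter(it_a) is it_a

# Drain both iterators so their files are closed, then clean up.
rest_a = list(it_a)
rest_b = list(it_b)
os.remove(path)

print(distinct, first_a == first_b, gen_is_own_iterator)
```

The container itself holds only the path, so it stays cheap to keep around; the file is opened only while one of its iterators is being consumed.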