# Intro to File Management

Working with files is crucial for writing any meaningful software. The good news is that file management in Python is super simple. For example, this is how you can read a file:

```python
f = open('data_file.csv')
f.read()
```

Incredibly simple, right? Python's built-in [`open`](https://docs.python.org/3/library/functions.html#open) function is the key mechanism behind all file management, and we'll keep seeing more details about it throughout this lesson. From the code above, you can see that it takes the file name (`'data_file.csv'` in this case) and returns that `f` thing, which is actually a "file object": the interface we'll use to work with the file. Again, we'll keep exploring details about it.

Now, the bad news: when working with files, Python isn't the only thing we have to deal with.

### The Good, the Bad and the Ugly

As you just saw in our previous example, working with files in Python (reading, writing, etc.) is bliss. It's honestly a pleasure to do file management, compared to other languages (Java, I'm looking at you). The problem is that Python isn't the only actor to consider when dealing with files. Until today, our programs depended only on us: if we wrote our code correctly, it'd work correctly. If we had bugs, it'd fail. But it was entirely our responsibility.

Starting with files, we'll also need to deal with:
* The Operating System (the "watchman" of files), and
* The user (ugh! 😖) and the environment.

### The Bad: Operating System

The Operating System is very protective of its files (and other I/O). After all, compromising files can cause severe damage and security threats to the overall system. The OS is in charge of controlling who (what program) can read which files, and to what extent: how much you can write, what permissions you can change, etc.

The OS forms a barrier between our Python code and the actual files. That means that all the code you write in Python is actually being watched and supervised by the Operating System. For example, when you want to read from a file, Python doesn't access the file directly; it asks the OS to do it for us. The operating system does the file management under the hood and presents the results to Python.

![Python and OS Interaction](https://docs.google.com/drawings/d/e/2PACX-1vS8_ENvTn7GpurxVKLSFy0kShHudStcz61nniB3FTEzeOOkHEdZ8dzAC86fimnjI9Ep49LRKEOY2gk5/pub?w=960&h=720)

On top of that, there are **different operating systems** to deal with. As you know, Python is a very versatile programming language, and you can write software for Linux, Windows, macOS, Raspberry Pis, etc. Things tend to work differently on each Operating System. For example, permissions and owners work very differently on Windows and Linux.

**Conclusion:** You can have a perfectly written Python program, but if the file you're trying to access has the wrong permissions, the OS won't grant you access. The result: your code will fail. Not because your code didn't work, but because of an external factor: The Bad, the OS.

### The Ugly: User and environment

When working with files, there's usually a "user" involved: the one deciding what files to read, or typing the name of the new file to save the results of that report that you've just perfectly written. There's also the OS user, the one with certain privileges, hard drive space assigned, etc. The user will screw things up most of the time. So you must be really careful and check everything the user gives you before acting on it.

Even if you can deal with wild users, you still have to deal with the environment: [Cosmic rays](https://spectrum.ieee.org/computing/hardware/how-to-kill-a-supercomputer-dirty-power-cosmic-rays-and-bad-solder), [Bit rot (old drives)](https://superuser.com/questions/1075831/can-data-on-a-hard-disk-degrade-without-windows-warning-me-that-this-has-happene) or even [loud noises](https://www.youtube.com/watch?v=tDacjrSCeq4) (please don't yell at your hard drives 🙏, they get scared).

**Conclusion:** You can have another perfectly working program, this time with the OS at your side, but the user might just choose the wrong directory to save their progress (one where they don't have write privileges for example). Or even if you have a smart user, and a friendly OS, a cosmic ray or a yelling engineer can screw over your entire program.

### Final conclusion

File management in Python is simple. But we need to be **extremely cautious**, and sadly, **brace for impact** before we even start writing code: we need to be ready to handle every possible error, because there will be errors, I guarantee you...
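To make "ready to handle every possible error" a bit more concrete, here's a minimal sketch of opening a file defensively. The helper name `read_text_safely` and the file name are made up for illustration, and returning `None` on failure is just one possible design. It catches the two failures described above (a missing file and an OS permission refusal) plus a generic `OSError` for everything else the OS might report:

```python
def read_text_safely(path):
    """Return the file's text, or None if the file can't be opened.

    Hypothetical helper for illustration; not part of Python itself.
    """
    try:
        f = open(path)
    except FileNotFoundError:
        # The Ugly: the user gave us a path that doesn't exist.
        print(f"No such file: {path}")
        return None
    except PermissionError:
        # The Bad: the OS refused to grant us access.
        print(f"The OS denied access to: {path}")
        return None
    except OSError as error:
        # Any other OS-level failure (both errors above are OSError subclasses).
        print(f"Could not open {path}: {error}")
        return None

    try:
        return f.read()
    finally:
        f.close()  # release the file even if reading fails


content = read_text_safely('this_file_does_not_exist_12345.csv')
print(content)
```

Ordering matters here: the specific exceptions go first, because `FileNotFoundError` and `PermissionError` would otherwise be swallowed by the broader `OSError` clause.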

### Important Concepts

There are two important concepts to understand when dealing with files:
* Open mode
* File cursor (or pointer)

##### Open Mode
The "Open Mode" is something you decide when you're opening a file, and it basically states whether you want to read from or write to the file (or both). For example, you can open a file "only for reading". If you try to perform an operation the open mode doesn't allow, you'll get an error. For example, I'll open a file for reading, and try to write to it:

```python
f = open('alice.txt', 'r')
f.write('Hello World')
```

```
UnsupportedOperation: not writable
```

As you can see, an `UnsupportedOperation` exception was raised: we tried to write to a file opened only for reading, which that open mode doesn't allow.
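If you'd rather not crash, a small sketch like the following shows two ways to deal with it: asking the file object whether it's writable, or catching the exception (which lives in the `io` module as `io.UnsupportedOperation`). The temporary file is only there so the example runs anywhere; it stands in for `alice.txt`.

```python
import io
import os
import tempfile

# Create an empty throwaway file (a stand-in for 'alice.txt').
fd, path = tempfile.mkstemp(suffix='.txt')
os.close(fd)

f = open(path, 'r')

# Option 1: ask before trying.
can_write = f.writable()
print(can_write)  # False

# Option 2: try, and handle the failure.
try:
    f.write('Hello World')
    write_failed = False
except io.UnsupportedOperation as error:
    write_failed = True
    print('Write rejected:', error)

f.close()
os.remove(path)  # clean up the throwaway file
```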

You can already guess that the `open` function we introduced before takes a second parameter, which is actually the "open mode".

```python
open(file_name, [open_mode], ...)
```

The most common options are:

* `'r'`: Open the file for reading (the file must already exist).
* `'w'`: Open the file for writing, creating it if it doesn't exist (**WARNING,** erases the contents of an existing file).
* `'a'`: Open for writing, but places the _cursor_ at the end ("appending"). Existing content is kept.
* `'r+'`: Read and write, but places the _cursor_ at the beginning. Existing content is kept.
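The differences between these modes are easier to see in action. Here's a small sketch that runs the four of them against a throwaway file; the directory, the file name `modes_demo.txt` and the helper `contents` are all made up for the example.

```python
import os
import shutil
import tempfile

# Throwaway directory and file, so no real data is touched.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, 'modes_demo.txt')


def contents(file_path):
    """Read the whole file using the default mode, 'r'."""
    f = open(file_path)
    text = f.read()
    f.close()
    return text


f = open(path, 'w')        # 'w': creates the file, or empties an existing one
f.write('first\n')
f.close()

f = open(path, 'a')        # 'a': keeps the content, cursor starts at the end
f.write('second\n')
f.close()
after_append = contents(path)

f = open(path, 'r+')       # 'r+': keeps the content, cursor starts at the beginning
f.write('FIRST')           # overwrites the first 5 characters in place
f.close()
after_r_plus = contents(path)

f = open(path, 'w')        # reopening with 'w' erases everything
f.close()
after_w = contents(path)

print(repr(after_append))  # 'first\nsecond\n'
print(repr(after_r_plus))  # 'FIRST\nsecond\n'
print(repr(after_w))       # ''

shutil.rmtree(tmp_dir)     # clean up
```

Notice that just opening with `'w'` was enough to wipe the file; we didn't even have to write anything.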

By default, `open_mode` will be `'r'`, which is the safest choice. Both the `'r+'` and `'a'` modes mention a "_cursor_". That's our second concept:

##### Cursors (or pointers)

We can think of a file as a big string of characters (or bytes). Suppose we're working with a CSV file with the following content:

```
InvoiceNo,StockCode,Description,Quantity
536365,85123A,WHITE HANGING HEART T-LIGHT HOLDER,6
536365,71053,WHITE METAL LANTERN,6
536365,84406B,CREAM CUPID HEARTS COAT HANGER,8
536365,84029G,KNITTED UNION FLAG HOT WATER BOTTLE,6
```

We always need to have a notion of "position". That's the idea of the cursor or pointer. When you open a file, Python automatically places the cursor at some given position. For example, if we open it with mode `'r'`, the pointer is placed at the beginning:

![pointers](https://user-images.githubusercontent.com/872296/37557772-73301a5a-29e8-11e8-91ba-29836fcd20db.png)

Now we can perform different operations (reading, writing, etc) which will be performed based on the pointer. For example, we can decide to read 10 characters: Python will take the current position of the pointer, read the 10 characters, and place the pointer after the last one read.

![pointers](https://user-images.githubusercontent.com/872296/37557778-858f333e-29e8-11e8-8a4b-a4ddcf466a2d.png)

```python
f = open('products.csv', 'r')
f.read(10)
```

```
'InvoiceNo,'
```

If we then decide to read 5 more characters, Python will start from the pointer's previous position and, after reading, leave the pointer just after the last character read:

![pointers](https://user-images.githubusercontent.com/872296/37557785-91549010-29e8-11e8-98a9-8a1b831a5849.png)

```python
f.read(5)
```

```
'Stock'
```

There are two methods to manage the pointer: `tell` will tell you the position of the pointer, and `seek` will let you move that pointer to whatever position you want. For example, after reading 10 characters first, and 5 characters in our second operation, we should expect this file's pointer position to be `15`:

```python
f.tell()
```

```
15
```

I could now "reset" the pointer and move it back to the beginning (just as it was when we first opened the file):

![pointers](https://user-images.githubusercontent.com/872296/37557772-73301a5a-29e8-11e8-91ba-29836fcd20db.png)

```python
f.seek(0)
```

```
0
```

And then try to read 8 characters:

![pointers](https://user-images.githubusercontent.com/872296/37557794-a15fab2a-29e8-11e8-9130-a21fdb95d9b0.png)

```python
f.read(8)
```

```
'InvoiceN'
```
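If you don't have `products.csv` at hand, the whole pointer walkthrough can be reproduced with a self-contained sketch. It first writes the header line of the CSV shown above into a throwaway file (the temporary directory is an assumption of this example), then repeats the same `read`, `tell` and `seek` steps:

```python
import os
import shutil
import tempfile

# Build a tiny stand-in for 'products.csv' in a throwaway directory.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, 'products.csv')
f = open(path, 'w', encoding='utf-8')
f.write('InvoiceNo,StockCode,Description,Quantity\n')
f.close()

f = open(path, 'r', encoding='utf-8')
first = f.read(10)      # pointer moves from 0 to 10
second = f.read(5)      # pointer moves from 10 to 15
position = f.tell()     # where are we now?
f.seek(0)               # back to the beginning
again = f.read(8)       # reads from the start again
f.close()

print(repr(first))      # 'InvoiceNo,'
print(repr(second))     # 'Stock'
print(position)         # 15
print(repr(again))      # 'InvoiceN'

shutil.rmtree(tmp_dir)  # clean up
```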

### More reading

You've already seen the `read()` method a couple of times. But let's dig a little bit deeper. `read()` doesn't require any parameters; by default, it'll read the entire content of the file:

```python
f = open('alice.txt')
content = f.read()
print(content)
```

```
ALICE’S ADVENTURES IN WONDERLAND

CHAPTER I. Down the Rabbit-Hole

Alice was beginning to get very tired of sitting by her sister on the
bank, and of having nothing to do: once or twice she had peeped into the
book her sister was reading, but it had no pictures or conversations in
it, ‘and what is the use of a book,’ thought Alice ‘without pictures or
conversations?’

So she was considering in her own mind (as well as she could, for the
hot day made her feel very sleepy and stupid), whether the pleasure
of making a daisy-chain would be worth the trouble of getting up and
picking the daisies, when suddenly a White Rabbit with pink eyes ran
close by her.
```



This is of course **VERY DANGEROUS** if you have large files: `read()` with no arguments loads the entire content into memory at once.
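One way around that, sketched below, is to pass a size to `read()` (as we did in the pointer examples) and process the file one chunk at a time. The chunk size of 4096 is an arbitrary choice, and the throwaway file just simulates a "large" one:

```python
import os
import shutil
import tempfile

# Simulate a "large" file: 10,000 characters in a throwaway location.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, 'big_file.txt')
f = open(path, 'w')
f.write('abcdefghij' * 1000)
f.close()

CHUNK_SIZE = 4096  # arbitrary; tune it to your needs

total_chars = 0
chunks_read = 0
f = open(path)
while True:
    chunk = f.read(CHUNK_SIZE)
    if not chunk:        # read() returns '' once the end of the file is reached
        break
    total_chars += len(chunk)   # "process" the chunk; here we only count it
    chunks_read += 1
f.close()

print(total_chars)  # 10000
print(chunks_read)  # 3

shutil.rmtree(tmp_dir)  # clean up
```

At no point does the program hold more than `CHUNK_SIZE` characters of the file in memory.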

`read()` takes an optional parameter: the number of "characters" (or bytes) to read. I've used "characters" and "bytes" interchangeably, but they're not the same thing. Which one you get depends on how the file was opened: in text mode (the default in Python 3), `read(n)` counts characters; in binary mode, it counts bytes. It can get messy. So for now, since we're reading text files, we can just assume it's "the number of characters":

```python
f = open('alice.txt')
f.read(20)
```

```
'ALICE’S ADVENTURES I'
```
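The curly apostrophe in `ALICE’S` is a nice way to see the difference between characters and bytes. The sketch below assumes the file is encoded as UTF-8 and uses binary mode (`'rb'`), which this lesson hasn't covered, purely for comparison. The throwaway file is made up for the example:

```python
import os
import shutil
import tempfile

tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, 'quote_demo.txt')

f = open(path, 'w', encoding='utf-8')
f.write('’S')               # a curly apostrophe followed by a plain 'S'
f.close()

f = open(path, 'r', encoding='utf-8')   # text mode: read(1) means 1 character
text_piece = f.read(1)
f.close()

f = open(path, 'rb')        # binary mode: read(1) means 1 byte
byte_piece = f.read(1)
f.close()

print(repr(text_piece))     # '’'
print(repr(byte_piece))     # b'\xe2'
print(len('’'.encode('utf-8')))  # 3: one character, three bytes in UTF-8

shutil.rmtree(tmp_dir)      # clean up
```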

There's also a `readline()` method, which reads an entire line **of text**, up to and including the next newline (`'\n'`). Let's see an example with our previous `'alice.txt'` file. Just as a reference, these are the first lines of the file:

```
ALICE’S ADVENTURES IN WONDERLAND

CHAPTER I. Down the Rabbit-Hole

Alice was beginning to get very tired of sitting by her sister on the
bank, and of having nothing to do: once or twice she had peeped into the
book her sister was reading, but it had no pictures or conversations in
it, ‘and what is the use of a book,’ thought Alice ‘without pictures or
conversations?’
```

```python
f = open('alice.txt')
f.readline()
```

```
'ALICE’S ADVENTURES IN WONDERLAND\n'
```

```python
f.readline()
```

```
'\n'
```

```python
f.readline()
```

```
'CHAPTER I. Down the Rabbit-Hole\n'
```

```python
f.readline()
```

```
'\n'
```

As you can see, `readline` returns one line at a time, using the newline character as the divider, and includes the trailing `'\n'` in the result.
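One detail worth knowing: when there's nothing left to read, `readline()` returns an empty string (`''`), which is different from the `'\n'` you get for a blank line. A minimal sketch of reading a whole file that way, using a made-up three-line throwaway file:

```python
import os
import shutil
import tempfile

tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, 'three_lines.txt')
f = open(path, 'w')
f.write('line one\n\nline three')   # the middle line is blank
f.close()

lines = []
f = open(path)
while True:
    line = f.readline()
    if line == '':       # empty string: end of file (a blank line would be '\n')
        break
    lines.append(line)
f.close()

print(lines)  # ['line one\n', '\n', 'line three']

shutil.rmtree(tmp_dir)  # clean up
```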

Reading one line at a time is a pretty common pattern, and there's a better way...

### File objects and the iterator pattern

When you open a file with the `open` function, the result is a "file object". It's a full-featured Python object with a few methods and attributes (`read`, `readline`, etc.). But like many other Python objects (lists or dicts, for example), it also supports "iteration". That means you can just "iterate" over it with a regular `for` loop. Each "pass" of the iterator gives you a new line (similar to the `readline()` method).

```python
f = open('alice.txt')
print(f)
```

```
<_io.TextIOWrapper name='alice.txt' mode='r' encoding='UTF-8'>
```

```python
for line in f:
    print(line)
```

```
ALICE’S ADVENTURES IN WONDERLAND



CHAPTER I. Down the Rabbit-Hole



Alice was beginning to get very tired of sitting by her sister on the

bank, and of having nothing to do: once or twice she had peeped into the

book her sister was reading, but it had no pictures or conversations in

it, ‘and what is the use of a book,’ thought Alice ‘without pictures or

conversations?’



So she was considering in her own mind (as well as she could, for the

hot day made her feel very sleepy and stupid), whether the pleasure

of making a daisy-chain would be worth the trouble of getting up and

picking the daisies, when suddenly a White Rabbit with pink eyes ran

close by her.
```
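The extra blank lines in that output appear because each `line` already ends with `'\n'`, and `print` adds another one. Two common ways to avoid the doubling are sketched below, using a made-up two-line throwaway file instead of `alice.txt`:

```python
import os
import shutil
import tempfile

tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, 'two_lines.txt')
f = open(path, 'w')
f.write('first line\nsecond line\n')
f.close()

# Option 1: tell print() not to add its own newline.
f = open(path)
for line in f:
    print(line, end='')
f.close()

# Option 2: strip the trailing newline from each line.
f = open(path)
stripped = [line.rstrip('\n') for line in f]
f.close()

print(stripped)  # ['first line', 'second line']

shutil.rmtree(tmp_dir)  # clean up
```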



### Closing the files

The operating system needs to allocate resources to keep up with the files you open. Remember that the OS needs to "monitor" all our file-related activities; so every time you open a file, the OS needs to allocate resources for that "monitoring" task.

If you open too many files and don't close them, you can hit the OS limit on open files and your program might crash, or your data might be lost (writes still waiting in memory are flushed to the file when you close it).

![Closing files](https://docs.google.com/drawings/d/e/2PACX-1vRR4bnLVVzZkFhyaRG3I5BO402tJ__yZGVP64TP-n3jV9svUeVjSzZuVPu-25sLS45pSLD5IIb22pu-/pub?w=960&h=720)

Closing a file is simple: just use the `close()` method of the file object returned by `open()`:

```python
f.close()
f
```

```
<_io.TextIOWrapper name='alice.txt' mode='r' encoding='UTF-8'>
```

Any successive operation that we try to perform with a closed file will raise an exception:

```python
f.read(5)
```

```
ValueError: I/O operation on closed file.
```

```python
f.closed
```

```
True
```
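This lesson closes files by hand with `close()`. A common alternative, not covered above, is Python's `with` statement, which closes the file for you when the block ends, even if an error happens in the middle. A minimal sketch with a made-up throwaway file:

```python
import os
import shutil
import tempfile

tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, 'with_demo.txt')

# The file is open only inside the indented block.
with open(path, 'w') as f:
    f.write('hello')
    closed_inside = f.closed   # still open here

closed_after = f.closed        # closed automatically when the block ended

print(closed_inside)  # False
print(closed_after)   # True

shutil.rmtree(tmp_dir)  # clean up
```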

### Full File Object API

Finally, you can check the docs to learn more about the file object: https://docs.python.org/3/library/io.html#io.IOBase

You can see interesting methods like `flush()`, `readable()`, etc.
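As a quick taste of those methods, here's a small sketch using `readable()`, `writable()` and `flush()` on a file opened for writing. The throwaway file is made up for the example, and checking the size on disk with `os.path.getsize` is just one way to observe that `flush()` pushed the data out without closing the file:

```python
import os
import shutil
import tempfile

tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, 'flush_demo.txt')

f = open(path, 'w')
is_readable = f.readable()   # opened with 'w', so reading is not allowed
is_writable = f.writable()

f.write('buffered text')     # 13 characters, possibly still sitting in memory
f.flush()                    # hand the pending data over to the OS now
size_after_flush = os.path.getsize(path)
f.close()

print(is_readable)       # False
print(is_writable)       # True
print(size_after_flush)  # 13

shutil.rmtree(tmp_dir)   # clean up
```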