# Files

Python is very popular when it comes to working with text files, and many developers prefer it for this kind of work.

Python offers several file handling libraries, and in many scenarios you will use the `pandas` package to work with files. However, it is important to understand the core fundamentals of file handling in Python. This module covers just that.

## Reading a File

Let's get started by reading an existing file. We have a file named `test.txt` in the working directory. We will read this file and simply print its contents.

In [3]:
file = open('test.txt') 

for line in file: 
    print(line) 

this is a sample text file

line2


We use a built-in function called `open` to open a file. The file to open is specified by the parameter passed. The passed parameter must map to a file that is actually present on the filesystem. 

The above output may look like one block of text, but we actually read it line by line. The blank lines appear because each line read from the file already ends with a newline, and `print` adds another one. To see exactly where each line starts, let's add a marker to the print statement. This way we will know how Python read the file.

In [4]:
file = open('test.txt') 

for line in file: 
    print('>>>',line)

>>> this is a sample text file

>>> line2


Now it is clear. Every `>>>` marks the place where a new line starts. Python iterates through the for loop once per line, printing `>>>` each time.

Interesting. Let's see if we can read the same file twice.

In [10]:
file = open('test.txt') 

for line in file: 
    print('>>>',line)

    
for line in file: 
    print('---',line)

>>> this is a sample text file

>>> line2


Uh! So we wrote the for loop twice to read the file twice, but it does not seem like the second for loop executed at all.

This is true. While the for loop is iterating, the file pointer moves forward one line at a time. When the first for loop completes, the file pointer has reached the end of the file. This is also when the for loop ends, as there is nothing more to read.

When the second for loop runs on the same file object, the file pointer is already at the end of the file, so there is nothing left to read. The second for loop does not get any data, so it does not iterate at all. It simply ends because the file has already been read.

It is important to keep this in mind when performing file operations. Iterating again over a file object that has already been read to the end yields nothing, so we must manage what we read appropriately, for example by assigning it to a variable and keeping it there until we need it.
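As a minimal sketch of the two usual ways around this, the example below either keeps the lines in a variable or moves the file pointer back to the start with `seek(0)`. The temporary directory and the file name `sample.txt` are assumptions made only so that the example is self-contained.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    # Hypothetical throwaway file, created only for this illustration.
    path = os.path.join(tmp_dir, 'sample.txt')
    with open(path, 'w') as f:
        f.write('this is a sample text file\nline2\n')

    file = open(path)

    # Option 1: keep everything that was read in a variable.
    lines = file.readlines()        # the file pointer is now at the end
    second_pass = file.readlines()  # nothing left to read, so this is empty

    # Option 2: move the file pointer back to the start and read again.
    file.seek(0)
    reread = file.readlines()

    file.close()

print(lines)        # the two lines of the file
print(second_pass)  # an empty list
print(reread)       # the same two lines again
```

Keeping the lines in a list is convenient for small files; for very large files, `seek(0)` avoids holding the whole file in memory.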

## Closing a File

In the above example, even though we read the file all the way to the end, the file is still open. We must explicitly close a file that we open. It is important to close a file because every open file consumes system resources. Also, operating systems limit the number of files you can keep open at any point in time. This limit is relatively high, but if you reach it, you will not be able to open any more files until you have closed some of your already open files.

In [11]:
file.close()

The `close()` method can be used on the file object to close an open file. You can invoke `close()` on an already closed file without any problems. However, once you close a file, you will not be able to read from it, even if you had not read anything from it before closing.

In [12]:
file = open('test.txt') 

file.close()

for line in file: 
    print(line)

ValueError: I/O operation on closed file.

In the above example, we opened a file and then immediately closed it. After closing the file, we attempted to read all the lines in it. What we end up with is an `I/O operation` error. This is because we cannot read a file that is already closed.
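If you are unsure whether a file object is still usable, its `closed` attribute tells you. The sketch below checks the attribute before and after `close()` and catches the resulting `ValueError`; the temporary file and its name are assumptions for illustration only.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    # Hypothetical throwaway file, created only for this illustration.
    path = os.path.join(tmp_dir, 'sample.txt')
    with open(path, 'w') as f:
        f.write('line1\n')

    file = open(path)
    closed_before = file.closed  # the file is open at this point
    file.close()
    closed_after = file.closed   # the file is closed at this point

    try:
        file.read()
        error_message = None
    except ValueError as err:
        # Reading from a closed file raises ValueError.
        error_message = str(err)

print(closed_before, closed_after)
print(error_message)
```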

## File Reading Modes

Python supports specifying the mode in which you wish to open the file. The following are the supported file modes:
- "r", read-only access.
- "w", write-only access. Creates a new file, overwriting any existing file with the same name.
- "x", write-only access. Fails if a file with the same name is already present.
- "a", write access for appending to a file. Creates a new file if one does not already exist.
- "r+", both read and write access.

In addition to the above modes, we can also specify `b` or `t` to indicate whether the file should be opened in binary mode or text mode.

If we specify the mode as `rb`, the file is opened for read-only access in binary mode. If we specify `rt`, it is opened for read-only access in text mode. If we specify just `r`, it defaults to `rt`. Files are opened in text mode whenever `b` is not specified.
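To make the difference concrete, here is a small sketch that reads the same file in both modes. Text mode returns a `str`, while binary mode returns the raw `bytes` stored on disk. The file name and the explicit UTF-8 encoding are assumptions chosen for the illustration.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    # Hypothetical throwaway file, created only for this illustration.
    path = os.path.join(tmp_dir, 'sample.txt')
    with open(path, 'w', encoding='utf-8') as f:
        f.write('café')

    # Text mode: bytes are decoded into a str.
    with open(path, 'rt', encoding='utf-8') as f:
        text = f.read()

    # Binary mode: the raw bytes are returned unchanged.
    with open(path, 'rb') as f:
        raw = f.read()

print(type(text), len(text))  # a str with 4 characters
print(type(raw), len(raw))    # a bytes object with 5 bytes, as 'é' takes 2 bytes in UTF-8
```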

Let's look at an example.

In [13]:
file = open('test.txt', 'r') 

for line in file: 
    print(line)

this is a sample text file

line2


As we can see, we can specify the mode as a second parameter to the `open` function. If we don't specify a mode, the file opens in the mode `r` by default, which is for reading only. We won't be able to write to such a file or modify its contents.

If you want to open an existing file for writing, be careful not to accidentally use the `w` mode, as it erases the existing contents. Use the `a` or `r+` mode instead, unless you definitely want to overwrite the file.
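When the goal is to create a file without any risk of overwriting an existing one, the `x` mode listed earlier is a safer choice than `w`. The sketch below shows the second attempt failing with `FileExistsError`; the file name `report.txt` is a hypothetical name used only for this example.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    # Hypothetical file name, used only for this illustration.
    path = os.path.join(tmp_dir, 'report.txt')

    # The first call succeeds because the file does not exist yet.
    with open(path, 'x') as f:
        f.write('first version')

    # The second call fails instead of silently overwriting the file.
    try:
        with open(path, 'x') as f:
            f.write('second version')
        overwritten = True
    except FileExistsError:
        overwritten = False

    with open(path) as f:
        content = f.read()

print(overwritten)
print(content)
```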

## With Statement

In most cases, you will use the with-as statement to work on files. If you want to open a file, do some work on it and then forget about it, with-as is your best choice. It automatically closes the file once you are done working on it.

In [14]:
with open('test.txt') as file:
    for line in file: 
        print(line)

this is a sample text file

line2


In [19]:
with open('test.csv') as file:
    for line in file:
        print(line)

1,2,3

4,5,6

7,8,9



The `with` version does exactly the same thing as the earlier examples, but it is a safer implementation because we do not have to explicitly close the file.
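The safety benefit is easiest to see when something goes wrong inside the block. In the sketch below, an error is raised on purpose while the file is open, and the file is still closed afterwards. The temporary file and the `RuntimeError` are assumptions made purely for illustration.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    # Hypothetical throwaway file, created only for this illustration.
    path = os.path.join(tmp_dir, 'sample.txt')
    with open(path, 'w') as f:
        f.write('line1\nline2\n')

    try:
        with open(path) as file:
            first_line = file.readline()
            # Simulate a failure while the file is still open.
            raise RuntimeError('something went wrong while processing')
    except RuntimeError:
        pass

    # The with statement closed the file even though an error occurred.
    closed_after_error = file.closed

print(first_line)
print(closed_after_error)
```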

## Creating a File / Writing to a File

We can create a new file by using the `w` mode, which opens a file for writing. If a file with the same name is already present, it will be overwritten.

In [20]:
with open('hello.txt', 'w') as file:
    file.write('Hello World')

So that's it. There is no output, but we have now created a new file and written a single line, `Hello World`, to it. Go back to the file list in this notebook and see if you can view the newly created file. The file should be in the same location as this program file and should be named `hello.txt`. You can open the file to confirm that `Hello World` is written in it.

Now, let's see what happens if we perform the write operation in a loop. 

In [21]:
for x in range(10):
    with open('hello.txt', 'w') as file:
        file.write('Hello ' + str(x))

Again there is no output, but rest assured that the file was written 10 times. We used the `w` mode, so each iteration of the for loop completely overwrites the file.

If you open the file `hello.txt`, you will see that its contents read:

``` text
Hello 9
```

Instead of containing 10 lines of `Hello x`, you actually have just one line with the last value of `x`, which is when `x = 9`.

This is expected behaviour, as the write mode was chosen to be `w`.

Let's try another file mode. What would happen if we used the mode `a`, which opens the file for appending?

In [22]:
for x in range(10):
    with open('tmp.txt', 'a') as file:
        file.write('Hello ' + str(x))

This time we created a file named `tmp.txt`. If we open this file, we can see that its contents are:

``` text
Hello 0Hello 1Hello 2Hello 3Hello 4Hello 5Hello 6Hello 7Hello 8Hello 9
```

So what happened here? The only thing we changed is the file mode: we used `a` instead of `w`. So each time we opened the file, it was opened for appending instead of overwriting.

We can see that the `Hello` from each iteration of the for loop was appended to the file.

Now what if we wanted each `Hello` to appear on its own line?

In [11]:
for x in range(10):
    with open('tmp2.txt', 'a') as file:
        file.write('Hello ' + str(x) + '\n')

We have now created a new file called `tmp2.txt`. If we open the file, the contents are as follows:

``` text
Hello 0
Hello 1
Hello 2
Hello 3
Hello 4
Hello 5
Hello 6
Hello 7
Hello 8
Hello 9

```

Each iteration of the for loop is now written to a new line. This is because we put a `\n` at the end of each string, and `\n` is the escape sequence that represents a new line.

We are not covering escape sequences here, as the common escape sequences in Python are the same as in many other languages.
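The examples above reopen the file on every iteration to show how the modes behave. In everyday code it is more common to open the file once and write inside the loop. The following is a minimal sketch of that pattern; the file name `tmp3.txt` and the temporary directory are assumptions for illustration.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    # Hypothetical file name, used only for this illustration.
    path = os.path.join(tmp_dir, 'tmp3.txt')

    # Open once in 'w' mode; every write inside the block goes to the same file.
    with open(path, 'w') as file:
        for x in range(10):
            file.write('Hello ' + str(x) + '\n')

    # Read the file back to check what was written.
    with open(path) as file:
        content = file.read()

print(content)
```

Because the file is opened only once, the `w` mode does not erase earlier iterations here: the truncation happens at `open`, not at each `write`.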

## Writing Multiple Lines

In a single go we can write multiple lines to a file by using the `writelines` function. Let's take a look at this. 

In [12]:
lines = []
for x in range(10):
    lines.append('Hello ' + str(x) + '\n')
    
with open('hello.txt', 'w') as file:
    file.writelines(lines)

In the above example, we created a list called `lines`. We filled the list with values, and then passed `lines` to the `writelines` function. In a single function call, all the strings in the list were written to the file.

If we now open `hello.txt`, we should find this inside:

```text
Hello 0
Hello 1
Hello 2
Hello 3
Hello 4
Hello 5
Hello 6
Hello 7
Hello 8
Hello 9

```
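One detail worth knowing is that `writelines` does not add line breaks by itself, which is why each string in `lines` ends with `\n`. The sketch below contrasts the two cases; the list `words` and the file name are hypothetical and exist only for this example.

```python
import os
import tempfile

words = ['alpha', 'beta', 'gamma']  # hypothetical sample data

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'words.txt')

    # Without '\n', the strings are written back to back.
    with open(path, 'w') as file:
        file.writelines(words)
    with open(path) as file:
        joined = file.read()

    # Adding '\n' to each string puts every item on its own line.
    with open(path, 'w') as file:
        file.writelines(word + '\n' for word in words)
    with open(path) as file:
        separated = file.read()

print(repr(joined))
print(repr(separated))
```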

## Deleting a File

We can delete a file in Python by using the `os` module. Let's delete some of the files we created so far. It will also help us clean up the directory that we messed up. 

In [13]:
import os
os.remove('hello.txt')
os.remove('tmp.txt')
os.remove('tmp2.txt')

Removing a file is a very simple operation provided by the `os` module. After executing this, you will notice that the previously created files have been deleted.

Also, it is important to handle errors here. If we call the `remove` function on a nonexistent or already deleted file, it will raise an error. We can see this in the example below.

In [14]:
import os
os.remove('xyz.txt')

FileNotFoundError: [Errno 2] No such file or directory: 'xyz.txt'
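One way to handle this error is to wrap the call in `try`/`except`. The helper `remove_if_present` below is a hypothetical name introduced for this sketch, not something provided by the `os` module, and the temporary directory is used only to keep the example self-contained.

```python
import os
import tempfile


def remove_if_present(path):
    """Delete the file at path and report whether a file was actually removed."""
    try:
        os.remove(path)
        return True
    except FileNotFoundError:
        # The file was never there, or it has already been deleted.
        return False


with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'xyz.txt')
    with open(path, 'w') as f:
        f.write('temporary')

    first_attempt = remove_if_present(path)   # the file exists, so it is removed
    second_attempt = remove_if_present(path)  # the file is already gone

print(first_attempt, second_attempt)
```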