# Input and output (IO)
Often, you want to read data into your program. When you start a Python session, your data is not yet present in collections such as strings, lists, tuples and dictionaries. These collections only live in memory, not in a persistent state. To create such collections, you often start by reading data from a persistent state (files). In this lesson, we will deal with reading from files and writing to files. Python can open many different file types. This course will concentrate on ASCII files (also called plain text files). A text file is a file in which each byte represents one character according to the ASCII code. There is no layout such as bold or superscript. Remember that we already opened text files in lesson 2. Let's start with a short summary from that lesson.

The most basic file type is the text file or ASCII file. This is a file that you can open with a text-editor and yields readable text:

In [None]:
# you do not need to understand the code below yet.
import platform
os_type = platform.system()
if os_type == "Windows":
    !more file1.txt
else: # must be Unix-like, thus cat is probably installed.
    !cat file1.txt

In the code above we used the command `more` (installed on Windows) or the command `cat` (installed on Unix-like systems) to read the content of the file. We can do that with Python too:

In [1]:
filename = "../Opdrachten/Data/file1.txt"
file_object = open(filename)
print(file_object)
file_content = file_object.read()
print(file_content)

<_io.TextIOWrapper name='../Opdrachten/Data/file1.txt' mode='r' encoding='UTF-8'>
This is a text file.
If you open it with a text editor, you will be able to read the text.
End of message...


- The first line stores the path to the file in the variable `filename`. The path is relative to the current working directory, which is normally the directory that contains this notebook.
- The `open` function returns a file object. The file content is not read yet.
- The file object is printed to show that the content of the file has not been read yet. Postponing the read saves memory.
- The `read` method of the file object is called to read the content of the file, and the content is returned as a multi-line string. This string is assigned to the variable `file_content`.
- The string is printed.



## Reading files in streaming mode
While our previous method works, it is often not advised to work with large files this way. The `file_object.read()` method loads the entire file into memory at once. If you use large files, it is better to work in streaming mode, in which the file is read one line at a time. Let's repeat the previous example in streaming mode:

In [None]:
filename = "../Data/file1.txt"
file_object = open(filename)
for line in file_object:
    print(line, end="")

Note that we use a for loop to read through the content of the file. You can only do this once. If you try to do it again you will observe that the file object is exhausted:

In [None]:
for line in file_object:
    print(line, end="")
print("This is used to show that this cell is executed")

So if you want to print the content of the file again, you need to create a new object using the open function:

In [None]:
filename = "../Data/file1.txt"
file_object = open(filename)
for line in file_object:
    print(line, end="")

To show you the difference in memory usage, I will use some code that you do not need to understand yet, but it does show the amount of memory used. First, reading the content of the file using the `file_object.read()` method:

In [None]:
import sys
my_file = open('file1.txt')
content = my_file.read()
print(sys.getsizeof(content), 'bytes')

Now reading the same file in streaming mode:

In [None]:
my_file = open('file1.txt')
for line in my_file:
    print(sys.getsizeof(line), 'bytes')

As you can see, processing a file line by line allocates less memory, because only one line is held in memory at a time instead of the whole file. Because bioinformaticians often work with very large files, processing files line by line is often preferred.
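As a minimal sketch of line-by-line processing, the example below counts lines and characters without ever holding the whole file in memory. The file name `example_sequences.txt` and the variable names are hypothetical; the file is created in a temporary directory first so that the example runs on its own.

```python
import os
import tempfile

# Create a small throwaway file so the example is self-contained.
# The file name is hypothetical and only used for this illustration.
tmp_dir = tempfile.mkdtemp()
example_path = os.path.join(tmp_dir, "example_sequences.txt")
example_file = open(example_path, "w")
example_file.write("GAATC\nCAACC\nGAGGG\n")
example_file.close()

# Stream through the file: only one line is held in memory at a time.
line_count = 0
total_characters = 0
example_file = open(example_path)
for line in example_file:
    line_count += 1
    # Do not count the newline character at the end of each line.
    total_characters += len(line.rstrip("\n"))
example_file.close()

print(line_count, "lines,", total_characters, "characters")
```

The same loop works unchanged on a file of many gigabytes, because the running totals are the only data that is kept between lines.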

## File modes
There are different modes a file object can be in:
- read or 'r'
- write or 'w'
- append or 'a'

There are more modes, but we will concentrate on these three.
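The three modes can be compared side by side in a short sketch. The file name `modes_example.txt` and the variable names are hypothetical, and the file is created in a temporary directory so that nothing in your working directory is touched.

```python
import os
import tempfile

# Hypothetical file in a temporary directory, used only for this illustration.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "modes_example.txt")

# 'w' creates the file (or empties it if it already exists).
f = open(path, "w")
f.write("first\n")
f.close()

# 'a' keeps the existing content and adds to the end.
f = open(path, "a")
f.write("second\n")
f.close()

# 'r' only reads.
f = open(path, "r")
content_after_append = f.read()
f.close()

# Opening in 'w' again throws away everything written before.
f = open(path, "w")
f.write("third\n")
f.close()

f = open(path, "r")
content_after_overwrite = f.read()
f.close()

print(repr(content_after_append))
print(repr(content_after_overwrite))
```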

Read mode is the default, so you do not have to specify it explicitly, although it does not hurt if you do:

In [None]:
my_file = open('file1.txt', 'r') # explicit in read mode
for line in my_file:
    print(line, end='')

If you only read a file, closing it is not very important. Python will close it for you when the script stops. It is, however, good practice to close your file after use, and it is **very important** when you write to files.

In [None]:
my_file = open('file1.txt', 'r') # explicit in read mode
for line in my_file:
    print(line, end='')
my_file.close() # explicitly close your files.
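Whether a file object is still open can be checked with its `closed` attribute. The sketch below uses a hypothetical file name, `closing_example.txt`, in a temporary directory, and also shows that a closed file object can no longer be used.

```python
import os
import tempfile

# Hypothetical file in a temporary directory, used only for this illustration.
tmp_dir = tempfile.mkdtemp()
example_path = os.path.join(tmp_dir, "closing_example.txt")

my_file = open(example_path, "w")
closed_before = my_file.closed  # the file is still open here
my_file.write("hello\n")
my_file.close()
closed_after = my_file.closed   # the file is closed now

print(closed_before, closed_after)

# Writing to a closed file raises a ValueError.
try:
    my_file.write("too late\n")
    write_failed = False
except ValueError:
    write_failed = True

print("write after close failed:", write_failed)
```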

## Write data using the print function
You can write data to a file using the `print()` function:

In [None]:
my_file = open("hello.txt", "w") # write mode
print("hello", file=my_file)
print("This is used to show that this cell is executed")
my_file.close()

As you can see, the string `hello` is not printed to screen but it is written to the file hello.txt:

In [None]:
for i in open('hello.txt'):
    print(i, end='')

Note that the file mode used is write mode. That means that the content of the file will be overwritten each time the code is executed:

In [None]:
my_file = open("hello.txt", "w") # write mode
print("bla bla", file=my_file) # different string is written to the file
my_file.close()

for i in open('hello.txt'):
    print(i, end='') # hello is replaced by bla bla

The `file=` parameter of the print function is often used to write messages to a log file. For a log file it is useful to open the file in append mode, so that earlier entries are kept:

In [None]:
seq = "gatc"
log = open("log.txt", "a")
print("sequence converted to:", seq, file=log)
log.close()

In [None]:
# Read the file:
for line in open("log.txt"):
    print(line, end="")

In [None]:
# Add a new log entry:
seq = "cccc"
log = open("log.txt", "a")
print("sequence converted to:", seq, file=log)
log.close()

In [None]:
# Read the file again:
for line in open("log.txt"):
    print(line, end="")

## Write data using the file_object.write() method
As an alternative to the print function, you can also write to a file using the `file_object.write()` method. Here is an example:

In [None]:
sequences = ["GAATC", "CAACC", "GAGGG", "TTTTT", "AAAA"]
seq_file_obj = open("seq.txt", 'w')
for seq in sequences:
    seq_file_obj.write(seq)
seq_file_obj.close()
print("done")

In [None]:
# And read the content of the file:
for line in open("seq.txt"):
    print(line, end="")

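Oops, that's not what we wanted. The sequences are written directly after each other on a single line. Of course we want a newline "\n" after each sequence. The print function adds a newline by default, but the `write()` method does not. We can add one ourselves: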

In [None]:
sequences = ["GAATC", "CAACC", "GAGGG", "TTTTT", "AAAA"]
seq_file_obj = open("seq.txt", 'w')
for seq in sequences:
    seq_file_obj.write(seq + "\n")
seq_file_obj.close()
print("done")

In [None]:
# Read the file:
for line in open("seq.txt"):
    print(line, end="")
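To check that both approaches give the same result, the sketch below writes the same sequences once with `print(..., file=...)` and once with `write()` plus an explicit newline, and then compares the two files. The file names `seq_print.txt` and `seq_write.txt` are hypothetical and live in a temporary directory.

```python
import os
import tempfile

# Hypothetical files in a temporary directory, used only for this illustration.
tmp_dir = tempfile.mkdtemp()
print_path = os.path.join(tmp_dir, "seq_print.txt")
write_path = os.path.join(tmp_dir, "seq_write.txt")
sequences = ["GAATC", "CAACC", "GAGGG"]

# Variant 1: print() adds the newline for us.
print_file = open(print_path, "w")
for seq in sequences:
    print(seq, file=print_file)
print_file.close()

# Variant 2: write() needs an explicit newline.
write_file = open(write_path, "w")
for seq in sequences:
    write_file.write(seq + "\n")
write_file.close()

# Read both files back and compare their content.
print_file = open(print_path)
print_content = print_file.read()
print_file.close()

write_file = open(write_path)
write_content = write_file.read()
write_file.close()

print(print_content == write_content)
```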

## Using the with statement for reading and writing files
When we open a file for writing, it is very important to explicitly close the file. Forgetting to do so might cause you to lose data. Luckily, Python has a built-in mechanism that will close the file automatically as soon as you are done.

This is done with the `with` statement. Note that `with` is a statement, not a function. It works together with a so-called context manager: an object that defines what should happen when a block of code is entered and when it is left. File objects are context managers that close themselves when the block is left. For now, we will just focus on the use case of reading and writing files.

To read or write a file using the `with` statement, we place `with` in front of the `open()` call and give the resulting file object a name using the `as` keyword. Inside the indented block, we use that name to do the reading or writing. As soon as the block ends, `with` will close the file for us.

In [None]:
# file reading without the with statement
file_reader = open('a_file.txt', 'r')  # the mode must be a string: 'r', not r
for line in file_reader:
    print(line, end='')  # the line already ends with a newline
file_reader.close()

# file reading using the with statement
with open('a_file.txt', 'r') as file_reader:
    for line in file_reader:
        print(line, end='')
# no need to close this last opened file, the with statement deals with the closing for us
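The same pattern works for writing. The sketch below writes a few sequences inside a `with` block and then checks the `closed` attribute of the file object to confirm that the file was closed automatically. The file name `seq_with.txt` and the variable names are hypothetical, and the file is placed in a temporary directory.

```python
import os
import tempfile

# Hypothetical file in a temporary directory, used only for this illustration.
tmp_dir = tempfile.mkdtemp()
seq_path = os.path.join(tmp_dir, "seq_with.txt")
sequences = ["GAATC", "CAACC", "GAGGG"]

# Writing: the file is closed as soon as the indented block ends.
with open(seq_path, "w") as seq_writer:
    for seq in sequences:
        seq_writer.write(seq + "\n")

writer_is_closed = seq_writer.closed
print("closed after the with block:", writer_is_closed)

# Reading the file back, again without an explicit close().
lines_read = []
with open(seq_path) as seq_reader:
    for line in seq_reader:
        lines_read.append(line.rstrip("\n"))

print(lines_read)
```

Because the closing happens automatically, even when an error occurs inside the block, this is the safest habit to adopt when writing files.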
