# Seminar 02: Reading files like CAT




## Introduction

Author: Jurre Hageman

For this lesson, we will write a small program that implements the command-line program "cat".
cat is a frequently used command on Unix-like operating systems. It can display text files, combine copies of different text files and create new text files.

We will write a small Python program that mimics the text display feature of the cat command.
This exercise will give you a general overview of the informatics 1 course.
Note that we do not expect you to be able to write all the code yourself at this point.
It shows the level of code you should be able to write by the end of the course.

So no worries (yet). We will guide you through this lesson!

Let's first think of what the program should do:
- it should open a file
- it should loop through every line
- it should print the content of the line to the screen
- it should close the file

Before starting to write the cat program, let's first write a message to the screen.
In Python you can write a message to the screen using the print function.
The print function needs something to print. It needs an argument.
We can "feed" the print function with a message like "Hello".
Note that we use quotation marks surrounding Hello so that Python knows that this is a text string. 
We call this a string literal.

Open IDLE3 and repeat the code below:


In [1]:
print("Hello")

Hello


We can also store content as a variable:

In [2]:
mssg = "hello"
print(mssg)

hello


The variable mssg now refers to the string "hello".
We can also create another variable, name, that stores my name.

In [3]:
name = "Jurre"
print(name)

Jurre


We can combine them to print the message and the name:

In [4]:
print(mssg, name)

hello Jurre
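
When print receives several arguments, it puts a single space between them. As a small sketch of how that can be changed, the sep argument of print sets the text placed between the arguments. The variables are the same as above; the separators are chosen only for illustration.

```python
mssg = "hello"
name = "Jurre"

print(mssg, name)             # default: one space between the arguments
print(mssg, name, sep=", ")   # a comma and a space between the arguments
print(mssg, name, sep="")     # nothing between the arguments
```

This prints hello Jurre, then hello, Jurre, then helloJurre, each on its own line.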


## Exercise I. Print message to the screen

Create two variables that store your first name and last name. Print them using the print function after the message "Hello". The output should be like: Hello Jan Janssen.
Use IDLE3 to write this script.

In [5]:
# create two variables below: firstname and lastname. Print your name.
# firstname =
# lastname =
# print

Nice work! Next we will write some code to print multiple items using a loop.
There are different types of loops. We will explore a for loop in the next section.
The for loop in Python is used to avoid repetition of code.
Suppose we have a collection of different items stored. We can store them in a list:

In [6]:
items = ["This", "is", "a", "list", "with", "each", "item", "being", "a", "string"]
print(items)

['This', 'is', 'a', 'list', 'with', 'each', 'item', 'being', 'a', 'string']


A text file is also a collection of items. It is a collection of lines containing characters.
To loop through the collection we can use a for loop. Each item will be printed:

In [7]:
items = ["This", "is", "a", "list", "with", "each", "item", "being", "a", "string"]
for item in items:
    print(item)

This
is
a
list
with
each
item
being
a
string


The variable item is a placeholder that is overwritten on each pass through the loop. On the first pass item refers to "This", on the second pass to "is", and so on.
Note that item (singular) differs from items (plural): items refers to the complete list, while item is the placeholder in the for loop that refers to one item of the list at a time. In the next lessons, you will learn more about lists and for loops. For now this will be enough.
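
To make the placeholder idea concrete, the sketch below writes out by hand what a for loop does for a short list. The names words and word are made up for this illustration.

```python
# Illustration only: `words` and `word` are made-up names for this sketch.
words = ["This", "is", "a"]

# The for loop ...
for word in words:
    print(word)

# ... does the same as assigning each item by hand and printing it:
word = "This"   # first pass through the loop
print(word)
word = "is"     # second pass: the old value is overwritten
print(word)
word = "a"      # third pass
print(word)
```

Both halves print the same three lines. The loop saves you from repeating the assignment and the print call for every item.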

The next thing we would like to do is to open a file.
Files can be opened in Python using the open function.
We have a text file named "input.txt". Make sure that you download this file into your working directory (the same directory in which you store your Python file).
Using the following code we can open the file and get a file object:

In [1]:
file_name = "input.txt"
file_object = open(file_name)
print(file_object)
file_object.close()  # close the file when we are done with it

<_io.TextIOWrapper name='input.txt' mode='r' encoding='UTF-8'>


Printing the file object gives a somewhat intimidating result. Do not let it intimidate you: it is just a file object in read mode, with a certain encoding, that gives access to the content of the file.
To display the content we can loop through the file object in the following manner:

In [13]:
file_name = "input.txt"
my_file = open(file_name)
for line in my_file:
    print(line)
my_file.close()

This is a text files.

It has multiple lines...

but not a very interesting content.

End of message.



The variable "line" is now a placeholder for every line of the file. On the first pass through the loop it refers to the first line, on the second pass to the second line, and so on.
Note that an empty line appears between the lines. Each line in the file already ends with a line break, and the print function adds another one after everything it prints. The extra line break can be avoided by using the end='' argument:

In [10]:
mssg1 = "Hello"
mssg2 = "World"
print(mssg1)
print(mssg2)
print(mssg1, end='')
print(mssg2, end='')

Hello
World
HelloWorld
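
Each line read from a text file still ends with its own newline character, written as "\n" in Python. The sketch below uses a hand-written string instead of a real file (an assumption, to keep the example self-contained) and the built-in repr function to make that hidden character visible.

```python
# A line as it would come out of a text file: the text plus a newline character.
line = "It has multiple lines...\n"

print(repr(line))     # repr shows the hidden newline as \n
print(line)           # two line breaks: one stored in the line, one added by print
print(line, end='')   # one line break: only the one stored in the line

# Alternative: strip the newline from the line and let print add its own.
stripped = line.rstrip("\n")
print(stripped)
```

Either approach gives one line break per line. Using end='' leaves the line untouched, while rstrip gives you a copy of the line without the newline.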

## Exercise II. Loop through a file

Now open the file <a href="L2_sources/excercise2.txt">exercise2.txt</a> and assign the file object to a variable.
Use a for loop to loop through each line. Avoid line breaks between the lines using the end='' argument in the print function. Use IDLE3 to write the script.

In [11]:
# open a file object for the exercise file
# loop through each line here
# print the content of each line and avoid line breaks
# close the file

Well done!
So far we have opened a file and displayed its content.
You may also want to write something to a file. 
To do so, first we have to open a new output file:

In [27]:
file_name = "output.txt"
output_file_obj = open(file_name, "w")
print(output_file_obj)
output_file_obj.close()

<_io.TextIOWrapper name='output.txt' mode='w' encoding='UTF-8'>


We have generated a file object and assigned it to the variable output_file_obj.
The "w" argument informs the Python interpreter that the file object should be opened in write mode.
The default is read mode ("r") which does not need to be specified.
To write something to the file we can use the print function with the file= argument:

In [28]:
file_name = "output.txt"
output_file_obj = open(file_name, "w")
print("This is written to the file", file=output_file_obj)
print("End of message", file=output_file_obj)
output_file_obj.close()

Now there is a file named output.txt, containing the text, in the same directory as the Jupyter notebook.
We can open it to check if the content has been written to the file:

In [29]:
file_name = "output.txt"
my_file = open(file_name)
for line in my_file:
    print(line, end='')
my_file.close()

This is written to the file
End of message
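
One detail of write mode is easy to miss: opening an existing file with "w" empties it first. Python also has an append mode, "a", that keeps what is already there. The sketch below shows the difference. It works in a temporary directory with a made-up file name, demo.txt, so that output.txt is left alone. The tempfile, os and shutil modules are used only for that housekeeping and are not part of this lesson.

```python
import os
import shutil
import tempfile

# Work in a temporary directory so no existing file is touched.
# The file name "demo.txt" is made up for this sketch.
temp_dir = tempfile.mkdtemp()
file_name = os.path.join(temp_dir, "demo.txt")

# "w" creates the file, or empties it first if it already exists.
out = open(file_name, "w")
print("first", file=out)
out.close()

out = open(file_name, "w")
print("second", file=out)   # "first" is gone: the file was emptied on opening
out.close()

# "a" (append mode) keeps the existing content and adds to the end.
out = open(file_name, "a")
print("third", file=out)
out.close()

# Read the file back to see what is left.
lines = []
my_file = open(file_name)
for line in my_file:
    lines.append(line)
    print(line, end='')
my_file.close()

# Remove the temporary directory again.
shutil.rmtree(temp_dir)
```

Only the lines second and third are printed, because the second open in "w" mode discarded the first line.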


Now that we can read and write text files, we can move one step further.
We will open a source file, open an output file in write mode, and add a line number to each line.
If we take the previous example where we used a list of strings, we can add line numbers in the following way:

In [30]:
items = ["This", "is", "a", "list", "with", "each", "item", "being", "a", "string"]
line_number = 1
for item in items:
    print(line_number, item)
    line_number = line_number + 1

1 This
2 is
3 a
4 list
5 with
6 each
7 item
8 being
9 a
10 string


The line_number variable is set to 1 before the loop.
On each pass through the loop, its current value is printed in front of the item, after which the variable is overwritten with its old value plus 1.
This is why the number goes up by one on every pass.
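
Python also has a built-in function, enumerate, that does this counting for you. The sketch below is an alternative to the manual counter, shown for comparison. The list is shortened only to keep the output brief.

```python
items = ["This", "is", "a", "list"]

# enumerate pairs each item with a counter; start=1 makes it begin at 1 instead of 0.
for line_number, item in enumerate(items, start=1):
    print(line_number, item)
```

The output has the same shape as with the manual counter: the number, a space, and the item. The manual version shows more clearly what happens on each pass, which is why it is used in this lesson.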

## Exercise III. Read a FASTA file and write content to a new file

Now we come to the final exercise:
The file: <a href="L2_sources/dna_sequence.txt">dna_sequence.txt</a> is a FASTA file. The FASTA format is a text-based format for representing nucleotide sequences or peptide sequences, in which nucleotides or amino acids are represented using single-letter codes. 
This is an example: <br>
&gt;My_dna_sequence <br>
atcaggatggggatggagagaggaccaaccac <br>
acagagtagagagagaggagagagacaagata <br>
tatatttttatacccaggagagacagatagag <br>
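
In a FASTA file, the line that starts with ">" is a header that names the sequence; the lines after it hold the sequence itself. As a minimal sketch, assuming the example above is stored as a list of strings rather than read from a file, the two kinds of lines can be told apart with the string method startswith. The variable names are chosen for this illustration.

```python
# The example record from above, stored as a list of lines (no file needed).
fasta_lines = [
    ">My_dna_sequence",
    "atcaggatggggatggagagaggaccaaccac",
    "acagagtagagagagaggagagagacaagata",
    "tatatttttatacccaggagagacagatagag",
]

header = ""
sequence = ""
for line in fasta_lines:
    if line.startswith(">"):
        header = line[1:]            # keep everything after the leading ">"
    else:
        sequence = sequence + line   # glue the sequence lines together

print("Name:", header)
print("Length:", len(sequence))
```

The if and else statements and the slice line[1:] go beyond what this lesson explains; they are covered later in the course.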

To open the file one could use the following code:

In [31]:
file_name = "dna_sequence.txt"
open_file = open(file_name)

We can check if the file object is created:

In [32]:
print(open_file)

<_io.TextIOWrapper name='dna_sequence.txt' mode='r' encoding='UTF-8'>


Use IDLE3 to write a program that:
- reads the content of the file
- writes the content of the file with line numbers before each line.

Well done!
You did a great job. Remember that this introductory lesson is meant to give you a global overview of informatics 1.
At this point, we do not expect you to be able to write this code yet. It shows you the end level of informatics 1!

## Solutions

Solutions for the exercises are given below. Programming is like playing the piano: practice, practice, practice. You learn most from typing every single word yourself. If you have no clue what to do you can have a look, but only after your first and second try!

<p><a href="L2_solutions/excercise01.py">excercise01.py</a></p>
<p><a href="L2_solutions/excercise02.py">excercise02.py</a></p>
<p><a href="L2_solutions/excercise03.py">excercise03.py</a></p>
