# Lesson 2

## Simulating Cat

In this lesson, we will simulate the `cat` command on Linux (also installed by default on Mac). `cat` is used to concatenate text files, but it is also often used to display the content of a text file on screen. A similar command is `less` on Linux or `more` on Windows. Whether `less` is `more` is up to you to decide...
Anyway, my `cat` could not care `less`:
![my cat](figs/fig1.png)


As mentioned before, `cat` can display the content of a text file:
![text file](figs/fig2.png)

Cat displaying the content of a text file on Ubuntu Linux running as [Windows subsystem for Linux (WSL)](https://docs.microsoft.com/en-us/windows/wsl/install-win10). 

The aim of this lesson is to simulate this behaviour with Python.
The required steps are:
- Use the `open` function to generate a file object.
- Use a `for` loop to iterate through the individual lines.
- Use the `print` function to display the lines (strings) to screen.
- Close the file object.

## Strings

During this lesson, we will make use of another data type: strings (`str`). Some information about strings:
- Strings are collections: ordered sequences of Unicode characters.
- Strings can be created using single quotes or double quotes.
- Strings are iterable, which means that you can loop over their characters one by one.
- Strings are indexed, which means that each character has a position. Indexes in Python start at zero.

Some examples:

In [2]:
name1 = "Jan" # double quotes
name2 = 'Pien' # single quotes
message1 = "Let's code some Python code" # note the use of a single quote in the string
message2 = 'Let\'s code some Python code' # the \ character serves as an escape character
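
To make the zero-based indexing concrete, here is a minimal sketch. The variable names are chosen for illustration only; index positions are covered in more detail next lesson.

```python
name = "Jan"

# Indexes start at zero, so the first character is at position 0.
first = name[0]

# The last character sits at position len(name) - 1.
last = name[len(name) - 1]

print(first)      # J
print(last)       # n
print(len(name))  # 3
```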

### String methods

Since Python is an object-oriented programming language, all data in Python are represented as objects.
In terms of programming, an object is a structure with data (properties or attributes) and methods (functionality). 
We will explore these concepts a little with string objects.

What is a string method?
A function is a piece of code that is accessible through its name.
You can call a function and cause its code to be executed.
Have a look at the print function for example:

In [4]:
print("Hello")

Hello


Here the `print` function accepted the string `Hello` as an `argument`. The execution of the `print` function resulted in the display of the string `Hello` on screen.

Methods are functions too, but they are attached to an object. They can be called using the `object.method(arguments)` syntax.
We can explore the methods attached to a string object using the `dir` function:

In [8]:
# dir(str) # not executed in order to save space.
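
The difference between calling a function and calling a method can be sketched as follows. The variable names are only for illustration.

```python
greeting = "hello"

# A function is called by its name and receives the object as an argument.
length = len(greeting)

# A method is called on the object itself, using the dot syntax.
shouted = greeting.upper()

print(length)   # 5
print(shouted)  # HELLO

# String methods return a new string; the original is unchanged.
print(greeting)  # hello
```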

The string methods that we will cover here are:
- strip
- rstrip
- lstrip
- split
- upper
- lower
- find
- index
- isnumeric
- startswith
- endswith

Some examples:

In [18]:
my_string = "Hello World"

print(my_string)
print(my_string.strip("Hd")) # strips any of the given characters from the left AND right side. Default is whitespace.
print(my_string.lstrip("H")) # strips the given characters from the left side. Default is whitespace.
print(my_string.rstrip("H")) # strips the given characters from the right side (nothing to strip here). Default is whitespace.
print(my_string.split()) # splits on the argument and returns a list (more on lists next lesson). Default is whitespace.
print(my_string.upper()) # returns the string in upper case
print(my_string.lower()) # returns the string in lower case
print(my_string.find("l")) # returns the position of the first occurrence, or -1 if the substring is not found. More on index positions next lesson.
print(my_string.index("l")) # returns the position of the first occurrence, but raises a ValueError if the substring is not found.
print(my_string.isnumeric()) # returns True if the string is not empty and all its characters are numeric. Else False is returned.
print(my_string.startswith("H")) # returns True if the string starts with the substring. Else False is returned.
print(my_string.endswith("ld")) # returns True if the string ends with the substring. Else False is returned.

Hello World
ello Worl
ello World
Hello World
['Hello', 'World']
HELLO WORLD
hello world
2
2
False
True
True
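
Two of these methods have edge cases that are easy to miss, so here is a small sketch of them. It uses `try` and `except`, which have not been introduced yet, only to show that `index` raises an error where `find` returns -1.

```python
my_string = "Hello World"

# find returns -1 when the substring is absent.
position = my_string.find("z")
print(position)  # -1

# index raises a ValueError instead, which we catch here.
try:
    my_string.index("z")
    index_failed = False
except ValueError:
    index_failed = True
print(index_failed)  # True

# isnumeric only looks at the characters themselves:
# a minus sign or a decimal point is not a numeric character.
print("42".isnumeric())   # True
print("-42".isnumeric())  # False
print("4.2".isnumeric())  # False
```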


## For loop

Strings are collections of Unicode characters, and any collection in Python is also an iterable.
That means that you can iterate through a string using a for loop. A for loop is used when you have a block of code which you want to repeat a fixed number of times, once for each item in the iterable. Let's have a look at an example:

In [19]:
my_string = "Hello World"

for letter in my_string:
    print(letter)

H
e
l
l
o
 
W
o
r
l
d


As you can see, the for loop runs a finite number of times: once for each character.
When the end of the sequence is reached, the for loop stops.
Let's digest this a bit:
- The `for` keyword specifies the type of loop (later we will cover the while loop).
- `letter` is a placeholder: a temporary variable that is overwritten on each pass through the loop. You can name it whatever you like, but please use a meaningful name.
- The `in` keyword in Python has two different purposes:
    - The `in` keyword is used to check if a value is present in a sequence.
    - The `in` keyword is also used in a for loop, where it separates the loop variable from the iterable.
- `my_string` points to the string `"Hello World"`, which is an iterable because it is a collection.
- The `:` specifies that a code block is beginning (remember that you want to repeat a block of code a number of times). As you can see, the code after a `:` is indented (4 spaces is the convention).
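
The two purposes of the `in` keyword can be sketched side by side. The variable names `has_w`, `has_z` and `vowel_count` are made up for this illustration.

```python
my_string = "Hello World"

# Purpose 1: a membership test, which evaluates to True or False.
has_w = "W" in my_string
has_z = "z" in my_string
print(has_w)  # True
print(has_z)  # False

# Purpose 2: part of the for loop syntax, between the loop variable and the iterable.
vowel_count = 0
for letter in my_string:
    # Inside the loop, in is used as a membership test again.
    if letter in "aeiou":
        vowel_count = vowel_count + 1
print(vowel_count)  # 3
```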

## The `range` function

The `range` function returns a sequence of numbers. By default it starts at 0 and increments by 1. It stops before a specified number. The `range` function returns a range object:

In [46]:
print(range(5))

range(0, 5)


For demonstration purposes, we will create a `list` from the returned range object. Do not do this just to iterate over a range object with a for loop, because the list stores every number in memory at once.

In [4]:
print(list(range(5)))

[0, 1, 2, 3, 4]


As you can see in the example above, only one argument was used (5). The default starting point is 0. The default step size is 1. Now with two arguments:

In [13]:
print(list(range(2, 5)))
print(list(range(2, 4)))

[2, 3, 4]
[2, 3]


Thus, the first argument is the starting point.  
The second argument is the stop point (not included). 
The default step size is 1.  
Now with 3 arguments:

In [16]:
print(list(range(0, 10, 2)))
print(list(range(10, -1, -2))) # now -1 is the stop point.

[0, 2, 4, 6, 8]
[10, 8, 6, 4, 2, 0]


In the example above, the first argument is the starting point.  
The second argument is the stop point.  
The third argument is the step size.

> The Python `range` function works only with integers. It does not support floats. If you need a range of floats, you will need a library (numpy, beyond the scope of this course). 
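
As a rough illustration of this limitation, the sketch below shows what happens when a float is passed to `range`, followed by one possible workaround without a library: iterate over integers and scale each value. The use of `try` and `except` is ahead of the course material and is only there to keep the example running.

```python
# Passing a float to range raises a TypeError.
try:
    range(0, 1, 0.5)
    float_accepted = True
except TypeError:
    float_accepted = False
print(float_accepted)  # False

# Workaround: loop over integers and divide each one.
for step in range(5):
    print(step / 2)  # prints 0.0, 0.5, 1.0, 1.5 and 2.0 on separate lines
```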

## For loop and range combined

The for loop is often combined with the range function in order to iterate over a sequence of numbers:

In [17]:
for num in range(5):
    print(num) 

0
1
2
3
4


> Note that you can iterate directly on a range object. Do not make a list from it as it will allocate a lot of memory!
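
The memory difference can be made visible with `sys.getsizeof`. This is only a rough sketch: the exact sizes depend on the Python implementation, and for the list the function reports the list itself, not the integers it refers to.

```python
import sys

numbers = range(1_000_000)

# A range object only stores its start, stop and step, so it stays small.
range_size = sys.getsizeof(numbers)

# A list stores a reference to every element, so it grows with its length.
list_size = sys.getsizeof(list(numbers))

print(range_size < list_size)  # True
```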

## File IO

In [54]:
import platform
os_type = platform.system()
if os_type == "Windows":
    !more test.txt
else: # must be Unix-like, thus cat is probably installed. 
    !cat test.txt

This is a text file.
This is the second line of the file.
The third line.
End of message...


You can create a file object using the `open` function.

In [56]:
my_file_handle = open("test.txt")
print(my_file_handle)

<_io.TextIOWrapper name='test.txt' mode='r' encoding='cp1252'>


This shows a bit of information about the file object. The name of the file is `test.txt`. The mode is `r`, which stands for read mode. The file encoding is `cp1252`, one of many text encodings. Because no encoding was passed to `open`, Python fell back to a default that depends on the platform, so you may see a different value on your machine. Now that we have a file object, we can use a for loop to iterate through the lines, just as we looped through each character of a string.

In [57]:
for line in my_file_handle:
    print(line)

This is a text file.

This is the second line of the file.

The third line.

End of message...


As you can see, each line is printed. However, we also see an empty line between the printed lines. This is because each line read from the file already ends with a newline character, and the `print` function also prints a newline by default:

In [58]:
print("line 1")
print("line 2")
print("line 3")

line 1
line 2
line 3
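
To see where the extra empty lines in the file output come from, the `repr` function can be used to make the hidden newline character visible. The string below is a stand-in for a line as it is read from a file.

```python
line = "This is a text file.\n"  # what a line read from a file looks like

# repr shows the string with its escape sequences made visible.
print(repr(line))

# The hidden newline is the last character of the line.
print(line.endswith("\n"))  # True

# print then adds its own newline, so two newlines are written in total.
```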


We can change that behaviour by altering the `end` argument of the `print` function.

In [62]:
help(print)

Help on built-in function print in module builtins:

print(...)
    print(value, ..., sep=' ', end='\n', file=sys.stdout, flush=False)
    
    Prints the values to a stream, or to sys.stdout by default.
    Optional keyword arguments:
    file:  a file-like object (stream); defaults to the current sys.stdout.
    sep:   string inserted between values, default a space.
    end:   string appended after the last value, default a newline.
    flush: whether to forcibly flush the stream.



In [59]:
print("line 1", end="")
print("line 2", end="")
print("line 3", end="")

line 1line 2line 3

Thus in practice:

In [65]:
my_file_handle = open("test.txt") # We have to create a new file object because the old one is exhausted (a property of iterators, as you will learn later)
for line in my_file_handle:
    print(line, end="")

This is a text file.
This is the second line of the file.
The third line.
End of message...

As you can see, this is the same output as the `cat` command. 
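
The last step from the list at the start of this lesson, closing the file object, still needs to be done. A minimal sketch that puts all four steps together could look like this. The function name `simulate_cat` and the sample file are assumptions made for this example, and functions and `try`/`finally` are beyond what this lesson has covered so far.

```python
import os
import tempfile


def simulate_cat(path):
    """Print the content of a text file, like the cat command."""
    # Step 1: create a file object. Naming the encoding avoids
    # depending on the platform default.
    file_handle = open(path, encoding="utf-8")
    try:
        # Step 2: iterate through the individual lines.
        for line in file_handle:
            # Step 3: each line already ends with a newline,
            # so print should not add another one.
            print(line, end="")
    finally:
        # Step 4: close the file object, even if printing fails.
        file_handle.close()


# Create a small sample file so that the sketch runs on its own.
sample_path = os.path.join(tempfile.mkdtemp(), "sample.txt")
sample_file = open(sample_path, "w", encoding="utf-8")
sample_file.write("This is a text file.\nThe second line.\n")
sample_file.close()

simulate_cat(sample_path)
```

A common alternative is `with open(path) as file_handle:`, which closes the file automatically at the end of the indented block.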

The end...