# Programming with Python

# 5 Data input and Processing
There are several ways to get data into a program and to present its output: data can be read from the keyboard or from a file, printed in a human-readable form, or written to a file for future use. This chapter will discuss some of the possibilities.


## 5.1 Keyboard input


To read from the keyboard, use the function **input()**:

In [1]:
input()

12


'12'

In [2]:
# assign keyboard input to a variable
a = input()
print('You entered',a)
print(a,'is of type',type(a))

56
You entered 56
56 is of type <class 'str'>


Every keyboard entry is returned as a **string**, even if it looks like a number. Typecast it (for example with int() or float()) if you need to calculate with it!
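As a minimal sketch of such a typecast, the helper below tries int() first and falls back to float(). The name `to_number` is made up for this illustration, and a fixed string stands in for the keyboard entry so that the cell runs without user interaction.

```python
def to_number(text):
    """Convert a keyboard entry to int if possible, otherwise to float."""
    try:
        return int(text)
    except ValueError:
        # not a whole number, so try a floating point number instead;
        # float() raises ValueError itself if the text is not numeric at all
        return float(text)

entry = '56'                 # stands in for: entry = input()
number = to_number(entry)
print(number + 1, type(number))   # 57 <class 'int'>
print(to_number('0.5') * 2)       # 1.0
```

Without the typecast, `entry + 1` would raise a TypeError, because a string and a number cannot be added.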

## 5.2 Reading a file from the computer

In [3]:
file = open('./img/lecture05/measurement_data.txt')
print(file)

<_io.TextIOWrapper name='./img/lecture05/measurement_data.txt' mode='r' encoding='UTF-8'>


In [4]:
data = file.read()
print(data)

0.1	1	11
0.2	2	19
0.3	3	41
0.4	4	76
0.5	5	55
0.6	6	43
0.7	7	70
0.8	8	81
0.9	9	65


In [5]:
data = file.read()
data

''

This behaviour is a relic of tape technology. When data was read from a tape, the tape was not rewound to the beginning after each read. File objects still work the same way: they keep track of a current position (the file pointer), and after read() has consumed the whole file, the pointer sits at the end, so a second read() returns an empty string. For this reason, programming languages generally provide some kind of rewind functionality.

In [6]:
file.seek(0) # rewind file pointer to beginning
data = file.read()
print(data)

0.1	1	11
0.2	2	19
0.3	3	41
0.4	4	76
0.5	5	55
0.6	6	43
0.7	7	70
0.8	8	81
0.9	9	65
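The position of the file pointer can be made visible with tell(). The following sketch writes a small throwaway file first (the file name `demo.txt` and the temporary folder are assumptions made only to keep the example self-contained) and then repeats the read, read again, rewind sequence from above.

```python
import os
import tempfile

# create a small throwaway file so the example is self-contained
path = os.path.join(tempfile.mkdtemp(), 'demo.txt')
with open(path, 'w') as f:
    f.write('0.1\t1\t11\n0.2\t2\t19\n')

file = open(path)
start = file.tell()      # position before reading: 0
first = file.read()      # reads everything up to the end of the file
second = file.read()     # nothing is left to read, so this is ''
file.seek(0)             # move the file pointer back to the beginning
third = file.read()      # the full content again
file.close()

print(start, repr(second), first == third)   # 0 '' True
```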


In practice, it is often simpler to just **reopen** the file:

In [7]:
file = open('./img/lecture05/measurement_data.txt')
data = file.read()
file.close()
print(data)

0.1	1	11
0.2	2	19
0.3	3	41
0.4	4	76
0.5	5	55
0.6	6	43
0.7	7	70
0.8	8	81
0.9	9	65


The complete content of a file read with read() is handled as a **single** string:

In [8]:
print('data is of type', type(data))
print('the length of data is', len(data))
data

data is of type <class 'str'>
the length of data is 80


'0.1\t1\t11\n0.2\t2\t19\n0.3\t3\t41\n0.4\t4\t76\n0.5\t5\t55\n0.6\t6\t43\n0.7\t7\t70\n0.8\t8\t81\n0.9\t9\t65'

**print()** is a function that displays the data in a more human-readable way. Internally, the file content is one long string in which control characters separate values, lines, etc.  
**\t** is a tab.  
**\n** is a new line.  
**\r** stands for carriage return. This is a relic of typewriters: to start a new line, you also had to return the carriage to the beginning.  
Today you will still find files with **\r\n** line endings, mostly from Windows operating systems. Most other operating systems use **\n** alone and omit the carriage return.  
Python understands and works with **both**!
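To illustrate the last point, the sketch below writes a file with Windows-style line endings in binary mode and reads it back in the default text mode. The file name is an assumption made for this example only. With the default settings of open(), Python translates \r\n to \n while reading.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'windows_style.txt')

# write raw bytes, so the \r\n line endings end up in the file unchanged
with open(path, 'wb') as f:
    f.write(b'0.1\t1\t11\r\n0.2\t2\t19\r\n')

# read in text mode: the line endings arrive as plain \n
with open(path) as f:
    text = f.read()

print(repr(text))   # '0.1\t1\t11\n0.2\t2\t19\n'
```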

**Split the data**  
In order to process the data, it is necessary to make all values accessible. A single string is hard to handle. Therefore, the data needs to be split.

In [9]:
splitted_data = data.split()
print(splitted_data)

['0.1', '1', '11', '0.2', '2', '19', '0.3', '3', '41', '0.4', '4', '76', '0.5', '5', '55', '0.6', '6', '43', '0.7', '7', '70', '0.8', '8', '81', '0.9', '9', '65']


The split method splits a string into a list of values. By **default**, it splits at so-called **white space**. White space comprises spaces and control characters such as tab, newline, carriage return, etc.  
The problem with the default argument is that all information about line breaks is lost.  
Therefore, we will pass the control character for a new line \n as the splitting character:

In [10]:
data2 = data.split("\n")
print(data2)

['0.1\t1\t11', '0.2\t2\t19', '0.3\t3\t41', '0.4\t4\t76', '0.5\t5\t55', '0.6\t6\t43', '0.7\t7\t70', '0.8\t8\t81', '0.9\t9\t65']
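One detail is worth knowing when splitting at \n: if the text ends with a newline, the result contains an empty string as its last element. The short sketch below compares the variants on a made-up two-line string; splitlines() is a built-in string method that avoids the empty element.

```python
text = '0.1\t1\t11\n0.2\t2\t19\n'   # note the trailing newline

by_whitespace = text.split()        # line structure is lost
by_newline = text.split('\n')       # keeps lines, but adds '' at the end
by_lines = text.splitlines()        # keeps lines, no empty element

print(by_whitespace)   # ['0.1', '1', '11', '0.2', '2', '19']
print(by_newline)      # ['0.1\t1\t11', '0.2\t2\t19', '']
print(by_lines)        # ['0.1\t1\t11', '0.2\t2\t19']
```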


In [11]:
# Python also provides a shortcut for this: readlines()
file = open('./img/lecture05/measurement_data.txt')
data = file.readlines()
file.close()
data

['0.1\t1\t11\n',
 '0.2\t2\t19\n',
 '0.3\t3\t41\n',
 '0.4\t4\t76\n',
 '0.5\t5\t55\n',
 '0.6\t6\t43\n',
 '0.7\t7\t70\n',
 '0.8\t8\t81\n',
 '0.9\t9\t65']

Now all lines are split into list elements.  
Next, we need to split each list element at its white space and create a two-dimensional array.

In [12]:
# split white spaces in string elements
data3 = []
for string in data: # the loop variable is named string to remind the programmer which type of value she or he is working with.
    dummy = string.split()
    data3.append(dummy)
data3

[['0.1', '1', '11'],
 ['0.2', '2', '19'],
 ['0.3', '3', '41'],
 ['0.4', '4', '76'],
 ['0.5', '5', '55'],
 ['0.6', '6', '43'],
 ['0.7', '7', '70'],
 ['0.8', '8', '81'],
 ['0.9', '9', '65']]

Now all values need to be converted to numbers in order to process them.

In [13]:
# typecast all strings to float
data4 = []
for line in data3:
    dummy = []
    for string in line:
        number = float(string)
        dummy.append(number)
    data4.append(dummy)
data4

[[0.1, 1.0, 11.0],
 [0.2, 2.0, 19.0],
 [0.3, 3.0, 41.0],
 [0.4, 4.0, 76.0],
 [0.5, 5.0, 55.0],
 [0.6, 6.0, 43.0],
 [0.7, 7.0, 70.0],
 [0.8, 8.0, 81.0],
 [0.9, 9.0, 65.0]]

In [14]:
# combining split and typecasting
data5 = []
for line in data:
    dummy = line.split("\t")
    temp_list = []
    for s in dummy:
        temp_list.append(float(s))
    data5.append(temp_list)

In [15]:
data5

[[0.1, 1.0, 11.0],
 [0.2, 2.0, 19.0],
 [0.3, 3.0, 41.0],
 [0.4, 4.0, 76.0],
 [0.5, 5.0, 55.0],
 [0.6, 6.0, 43.0],
 [0.7, 7.0, 70.0],
 [0.8, 8.0, 81.0],
 [0.9, 9.0, 65.0]]

In [16]:
# access some value
data5[4][1]

5.0

In [17]:
# using list comprehension to reduce the read-in and convert to a two-dimensional table in a single line.
file = open('./img/lecture05/measurement_data.txt')
data6 = [ [ float(s) for s in line.split() ] for line in file.readlines() ]
data6

[[0.1, 1.0, 11.0],
 [0.2, 2.0, 19.0],
 [0.3, 3.0, 41.0],
 [0.4, 4.0, 76.0],
 [0.5, 5.0, 55.0],
 [0.6, 6.0, 43.0],
 [0.7, 7.0, 70.0],
 [0.8, 8.0, 81.0],
 [0.9, 9.0, 65.0]]

It is recommended to **close the file** after reading it, because an open file occupies resources of the operating system, and on some systems other programs cannot modify or delete the file while it is open. Closing the file releases it again.

In [18]:
file.close()

In [19]:
# once a file has been closed, it can no longer be read
file = open('./img/lecture05/measurement_data.txt')
file.close()
data7 = [ [ float(s) for s in line.split() ] for line in file.readlines() ]


ValueError: I/O operation on closed file.
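A common way to avoid forgetting close() is the with statement, which closes the file automatically when its block ends, even if an error occurs inside it. The following is a minimal sketch; the throwaway file and its name are assumptions made to keep the example self-contained.

```python
import os
import tempfile

# create a small throwaway data file
path = os.path.join(tempfile.mkdtemp(), 'measurement_demo.txt')
with open(path, 'w') as f:
    f.write('0.1\t1\t11\n0.2\t2\t19\n')

# the file is open only inside the with-block
with open(path) as file:
    table = [[float(s) for s in line.split()] for line in file]

print(file.closed)   # True
print(table)         # [[0.1, 1.0, 11.0], [0.2, 2.0, 19.0]]
```

Iterating over the file object directly yields one line at a time, so readlines() is not needed here.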

In [20]:
# write a file reader as a function
def file_open(filepath):
    file = open(filepath)
    datafile = [ [ float(s) for s in line.split() ] for line in file.readlines() ]
    file.close()
    return datafile

df = file_open('./img/lecture05/measurement_data.txt')
df

[[0.1, 1.0, 11.0],
 [0.2, 2.0, 19.0],
 [0.3, 3.0, 41.0],
 [0.4, 4.0, 76.0],
 [0.5, 5.0, 55.0],
 [0.6, 6.0, 43.0],
 [0.7, 7.0, 70.0],
 [0.8, 8.0, 81.0],
 [0.9, 9.0, 65.0]]
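Real data files often contain empty lines, for example at the end of the file. With the reader above, such a line would become an empty list in the result. The variant below is a sketch of one way to skip them; the name `read_table` and the demo file are assumptions for this illustration.

```python
import os
import tempfile

def read_table(filepath):
    """Read a white-space separated table of numbers, skipping empty lines."""
    table = []
    with open(filepath) as file:
        for line in file:
            fields = line.split()
            if not fields:        # empty or white-space-only line
                continue
            table.append([float(s) for s in fields])
    return table

# demo file with an empty line in the middle and one at the end
path = os.path.join(tempfile.mkdtemp(), 'with_gaps.txt')
with open(path, 'w') as f:
    f.write('0.1\t1\t11\n\n0.2\t2\t19\n\n')

print(read_table(path))   # [[0.1, 1.0, 11.0], [0.2, 2.0, 19.0]]
```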

## 5.3 Basics on processing data

Let's assume we've read the following data from disk:

In [21]:
data = [ [0.1, 1, 11],
         [0.2, 2, 19],
         [0.3, 3, 41],
         [0.4, 4, 76],
         [0.5, 5, 55],
         [0.6, 6, 43],
         [0.7, 7, 70],
         [0.8, 8, 81],
         [0.9, 9, 65], ]

Let's assume that we want to calculate the average of the third column.
We want to use the following function:

In [22]:
def mean_of_list(list_of_numbers):
    m = sum(list_of_numbers)/len(list_of_numbers)
    return m

As you can see, mean_of_list() expects its argument to be a list. From the internal name of the argument, we can see that its creator expects us to pass only lists of numbers.

The task at hand therefore is to extract the third column of the data object. Remember that list indices in Python are 0-based, hence we're looking for something with an index of 2. Our first approach might be to extract the item at index 2 of data:

In [23]:
data[2]

[0.3, 3, 41]

You can see that data[2] yielded the third *line* when we wanted the third *column* of data...

### The 'column getter' pattern
In software engineering, a (software design) pattern is a general, reusable solution to a commonly occurring problem. While getting (and setting) lines of a matrix is a simple operation in Python:

In [24]:
data[0]

[0.1, 1, 11]

In [25]:
data[0] = [0.1, 0, 0]
data

[[0.1, 0, 0],
 [0.2, 2, 19],
 [0.3, 3, 41],
 [0.4, 4, 76],
 [0.5, 5, 55],
 [0.6, 6, 43],
 [0.7, 7, 70],
 [0.8, 8, 81],
 [0.9, 9, 65]]

getting a column of a matrix (= a list of lists of numbers) is not a native operation. Therefore, we need to create this functionality on our own. We'll use an old friend: a custom-built function, named after what it does, which accepts two parameters. The first one, let's call it *matrix*, is the list of lists of numbers from which we want to extract the *j*-th column. We'll pass in the column index *j* as the second parameter:

In [26]:
def get_col(matrix, j):
    col = []
    for line in matrix:
        col.append(line[j])
    return col

Please note that from now on, we'll stick to the convention that matrix lines are denoted by index i and matrix columns by index j. But now, let's test our new function:

In [27]:
col_three = get_col(data, 2)
print(col_three)

[0, 19, 41, 76, 55, 43, 70, 81, 65]
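The same column getter can be written more compactly. The sketch below shows two alternatives on a shortened example matrix: a list comprehension with the same behaviour as the loop version, and the built-in zip(), which transposes the matrix so that all columns become available at once.

```python
data = [[0.1, 1, 11],
        [0.2, 2, 19],
        [0.3, 3, 41]]

def get_col(matrix, j):
    # list comprehension: take element j of every line
    return [line[j] for line in matrix]

# zip(*data) pairs up the first, second, third ... elements of all lines
columns = [list(col) for col in zip(*data)]

print(get_col(data, 2))   # [11, 19, 41]
print(columns[0])         # [0.1, 0.2, 0.3]
print(columns[2])         # [11, 19, 41]
```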


### Basics of column modification
Now we can compute the arithmetic mean of column three by simply calling *mean_of_list()* and passing in *col_three* as its argument:

In [28]:
mean_of_list(col_three)

50.0

Also, we can easily multiply all elements by 2:

In [29]:
new_col_three = []
for n in col_three:
    new_col_three.append(2*n)
col_three = new_col_three
print(col_three)

[0, 38, 82, 152, 110, 86, 140, 162, 130]


Just as easily, we can divide all elements by 10:

In [30]:
new_col_three = []
for n in col_three:
    new_col_three.append(n/10)
col_three = new_col_three
print(col_three)

[0.0, 3.8, 8.2, 15.2, 11.0, 8.6, 14.0, 16.2, 13.0]
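Since both modifications follow the same scheme, they can be wrapped into small reusable functions. This is only a sketch; the names `scale_col` and `divide_col` are made up for this illustration.

```python
def scale_col(col, factor):
    """Return a new list with every element multiplied by factor."""
    return [factor * n for n in col]

def divide_col(col, divisor):
    """Return a new list with every element divided by divisor."""
    return [n / divisor for n in col]

col_three = [0, 19, 41, 76, 55, 43, 70, 81, 65]
doubled = scale_col(col_three, 2)
scaled = divide_col(doubled, 10)

print(doubled)   # [0, 38, 82, 152, 110, 86, 140, 162, 130]
print(scaled)    # [0.0, 3.8, 8.2, 15.2, 11.0, 8.6, 14.0, 16.2, 13.0]
```

Both functions return a new list and leave the original column untouched.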


### The 'column setter' pattern
Another important pattern in software design is setting a column of a data matrix. Here is a very simple implementation:

In [31]:
def set_col(matrix, col, j):
    i = 0
    for line in matrix:
        line[j] = col[i]
        i = i + 1
    return matrix    

Let's see how it works. Note that set_col() modifies the lines of the matrix in place and also returns the matrix, so assigning the return value back to *data* is optional, but it makes the intent explicit. Also remember that column three has j = 2:

In [32]:
print('data, before:')
data

data, before:


[[0.1, 0, 0],
 [0.2, 2, 19],
 [0.3, 3, 41],
 [0.4, 4, 76],
 [0.5, 5, 55],
 [0.6, 6, 43],
 [0.7, 7, 70],
 [0.8, 8, 81],
 [0.9, 9, 65]]

In [33]:
print('col_three =', col_three)

col_three = [0.0, 3.8, 8.2, 15.2, 11.0, 8.6, 14.0, 16.2, 13.0]


In [34]:
data = set_col(data, col_three, 2)
data

[[0.1, 0, 0.0],
 [0.2, 2, 3.8],
 [0.3, 3, 8.2],
 [0.4, 4, 15.2],
 [0.5, 5, 11.0],
 [0.6, 6, 8.6],
 [0.7, 7, 14.0],
 [0.8, 8, 16.2],
 [0.9, 9, 13.0]]
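Because set_col() changes the lists inside the matrix directly, the matrix that was passed in is modified as well. If the original data should be preserved, one option is to work on a copy. The sketch below assumes a hypothetical variant called `set_col_copy` that uses copy.deepcopy() from the standard library.

```python
import copy

def set_col(matrix, col, j):
    # same in-place behaviour as the implementation above
    for i, line in enumerate(matrix):
        line[j] = col[i]
    return matrix

def set_col_copy(matrix, col, j):
    # hypothetical variant: copy the matrix first, then modify the copy
    new_matrix = copy.deepcopy(matrix)
    return set_col(new_matrix, col, j)

original = [[0.1, 1, 11], [0.2, 2, 19]]

changed = set_col_copy(original, [0, 0], 2)
print(original)           # [[0.1, 1, 11], [0.2, 2, 19]]
print(changed)            # [[0.1, 1, 0], [0.2, 2, 0]]

same = set_col(original, [5, 6], 2)
print(same is original)   # True
print(original)           # [[0.1, 1, 5], [0.2, 2, 6]]
```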

## 5.4 Output using print functions

So far we’ve encountered two ways of writing values: expression statements and the print() function.

Often you’ll want more control over the formatting of your output than simply printing space-separated values. There are several ways to format output.

To use **formatted string literals**, begin a string with **f** or **F** before the opening quotation mark or triple quotation mark. Inside this string, you can write a Python expression between **{** and **}** characters that can refer to variables or literal values.

In [35]:
year = 2019
event = 'referendum'
print(f'I hope to do my {event} in {year}')

I hope to do my referendum in 2019
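Formatted string literals also accept an optional format specifier after a colon, which controls details such as the number of decimals or the column width. The values below are made up for illustration.

```python
x = 0.123456
n = 42
row = [0.1, 1.0, 11.0]

two_decimals = f'{x:.2f}'      # fixed point notation with 2 decimals
padded = f'{n:5d}'             # integer, right-aligned in 5 characters
table_line = ' '.join(f'{v:6.1f}' for v in row)

print(two_decimals)   # 0.12
print(padded)         #    42
print(table_line)     #    0.1    1.0   11.0
```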


## 5.5 Writing a file

Writing a file is very similar to displaying information on the screen. The main difference is that the information ends up in a file and that you use the **write() method instead of the print() function**.

In order to write into a file, the file needs to be opened first. This is identical to opening a file for reading, but requires an additional **option for write access** (here: w+).  

The following options are also available:  
**r**: Open text file for reading. The stream is positioned at the beginning of the file.  
**r+**: Open for reading and writing. The stream is positioned at the beginning of the file.  
**w**: Truncate file to zero length or create text file for writing. The stream is positioned at the beginning of the file.  
**w+**: Open for reading and writing. The file is created if it does not exist, otherwise it is truncated. The stream is positioned at the beginning of the file.  
**a**: Open for writing. The file is created if it does not exist. The stream is positioned at the end of the file. Subsequent writes to the file will always end up at the then current end of file, irrespective of any intervening seek() or similar call.  
**a+**: Open for reading and writing. The file is created if it does not exist. The stream is positioned at the end of the file. Subsequent writes to the file will always end up at the then current end of file, irrespective of any intervening seek() or similar call.  
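The difference between truncating (w) and appending (a) can be seen in a small experiment. This is a sketch on a throwaway file; the file name is an assumption made for the example.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'modes_demo.txt')

with open(path, 'w') as f:      # 'w' creates the file
    f.write('first\n')
with open(path, 'a') as f:      # 'a' adds to the end of the file
    f.write('second\n')
with open(path, 'r') as f:
    after_append = f.read()

with open(path, 'w') as f:      # 'w' again: the old content is discarded
    f.write('third\n')
with open(path, 'r') as f:
    after_truncate = f.read()

print(repr(after_append))     # 'first\nsecond\n'
print(repr(after_truncate))   # 'third\n'
```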

In [36]:
# open a file for writing; remember to state a mode string explicitly, when in doubt use 'w+'
f = open('modified_data.txt', 'w+')
f.close()  # close it again; write_to_file() below opens the file itself

In [37]:
def write_to_file(data, filename):
    f = open(filename, 'w+')
    # iterate over the lines of the data matrix
    # remember that each line is a list of numbers, which we'll call numbers for short
    for numbers in data:
        # use a list comprehension to transform the line of numbers to a line of strings
        strings = [ str(n) for n in numbers ]
        # now, concatenate to a string that contains the whole line, with cols separated by tabs
        line = '\t'.join(strings)
        # add a new line (\n); in text mode, Python translates it to the line ending
        # of the operating system (\r\n on Windows), so no explicit \r is needed
        line = line + '\n'
        # write string to file
        f.write(line)
    f.close()
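To check that writing works, the data can be written to a file and read back in. The sketch below repeats the writer in a compact form using a with-block, adds a hypothetical reader called `read_from_file`, and uses a temporary folder so that no existing file is overwritten.

```python
import os
import tempfile

def write_to_file(data, filename):
    # compact version of the writer: one tab-separated line per matrix line
    with open(filename, 'w') as f:
        for numbers in data:
            f.write('\t'.join(str(n) for n in numbers) + '\n')

def read_from_file(filename):
    # hypothetical counterpart: read the table back as lists of floats
    with open(filename) as f:
        return [[float(s) for s in line.split()] for line in f]

data = [[0.1, 0, 0.0],
        [0.2, 2, 3.8],
        [0.3, 3, 8.2]]

path = os.path.join(tempfile.mkdtemp(), 'modified_data.txt')
write_to_file(data, path)
restored = read_from_file(path)

print(restored)           # [[0.1, 0.0, 0.0], [0.2, 2.0, 3.8], [0.3, 3.0, 8.2]]
print(restored == data)   # True
```

Note that the integers come back as floats, because the reader converts every value with float().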