# Reading data from files
How do you read from a file? As mentioned before, files are organized sequentially, i.e. they consist of consecutive
lines. The `for` loop is suited to processing sequences, so you can iterate over the lines of a file as follows:

```python
# open file
with open("lorem_ipsum.txt", "r") as file:
    # read file line by line and output the lines
    for line in file:
        print(line)
```

```
Lorem ipsum dolor sit amet, consectetur adipiscing elit,

sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.

Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris

nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in

reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla

pariatur. Excepteur sint occaecat cupidatat non proident, sunt in

culpa qui officia deserunt mollit anim id est laborum.
```


If you compare the output of the program with the content of the file (e.g. in a text editor), you notice that blank
lines have been added to the output. What is the reason for this?  
At the end of each line there is a line break `\n` in the text file. This is only visible indirectly, because the text
continues on the next line. On output, the function `print()` adds another line break, hence the blank line. 
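You can make the hidden line break visible by printing each line with `repr()`. The sketch below does not use `lorem_ipsum.txt`. Instead it creates a small made-up file, `sample.txt`, in a temporary directory so that it runs on its own.

```python
import os
import tempfile

# Hypothetical two-line file standing in for lorem_ipsum.txt
path = os.path.join(tempfile.mkdtemp(), "sample.txt")
with open(path, "w") as file:
    file.write("first line\nsecond line\n")

lines_seen = []
with open(path, "r") as file:
    for line in file:
        # repr() shows the otherwise invisible line break as \n
        print(repr(line))
        lines_seen.append(line)
```

Both printed strings end in `\n`. This is the line break that `print()` then supplements with its own.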

You can correct this behaviour in several ways. One way is to set the `end` parameter of the `print()` function to an
empty string: `end=""`.  
Another way is to *strip* the line first. Strings have a method `.strip()`, which removes spaces, tabs and line
breaks at the beginning and at the end of a string. `.strip()` is often used when processing form input to prevent a leading
space from changing the input. With an optional argument, you can also specify which characters should be removed.  
Alternatively, `.lstrip()` or `.rstrip()` can be used. These remove characters only at the left (beginning) or only at the right (end) of the
string.
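The following sketch shows the three methods on a made-up string. It also shows the optional argument, which is treated as a set of characters to remove from the ends.

```python
text = "  \tHello, world!\n"

# strip() removes whitespace (spaces, tabs, line breaks) at both ends
print(repr(text.strip()))    # 'Hello, world!'
# lstrip() only removes whitespace at the beginning
print(repr(text.lstrip()))   # 'Hello, world!\n'
# rstrip() only removes whitespace at the end
print(repr(text.rstrip()))   # '  \tHello, world!'
# with an argument, only the given characters are removed from both ends
print(repr("xxHello!xx".strip("x")))  # 'Hello!'
# rstrip("\n") removes just the line break and keeps leading indentation
print(repr("    indented\n".rstrip("\n")))  # '    indented'
```

When reading files, `.rstrip("\n")` can be a gentler choice than `.strip()` if leading spaces in a line are meaningful.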

```python
# open file
with open("lorem_ipsum.txt", "r") as file:
    # read file line by line, strip the line break and output the lines
    for line in file:
        line = line.strip()
        print(line)
```

```
Lorem ipsum dolor sit amet, consectetur adipiscing elit,
sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.
Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris
nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in
reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla
pariatur. Excepteur sint occaecat cupidatat non proident, sunt in
culpa qui officia deserunt mollit anim id est laborum.
```
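The other fix mentioned above, `end=""`, keeps the line break that is already in the file and stops `print()` from adding a second one. The sketch below uses a made-up temporary file instead of `lorem_ipsum.txt`. It captures the printed text with `contextlib.redirect_stdout` only so that the result can be checked.

```python
import io
import os
import tempfile
from contextlib import redirect_stdout

# Hypothetical two-line file standing in for lorem_ipsum.txt
path = os.path.join(tempfile.mkdtemp(), "sample.txt")
with open(path, "w") as file:
    file.write("first line\nsecond line\n")

captured = io.StringIO()
with redirect_stdout(captured):
    with open(path, "r") as file:
        for line in file:
            # the line already ends with "\n", so print() must not add another one
            print(line, end="")

output = captured.getvalue()
print(output, end="")
```

The output contains no empty lines. Unlike `.strip()`, this variant also keeps any leading spaces of each line.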


## Output the contents of a file twice
In the following program, the `for` loop is run twice. What does the output look like? Why?

```python
# open file
with open("lorem_ipsum.txt", "r") as file:
    # read file line by line and print the lines
    print("First round")
    for line in file:
        line = line.strip()
        print(line)

    # read file line by line and print the lines
    print("Second round")
    for line in file:
        line = line.strip()
        print(line)
```

```
First round
Lorem ipsum dolor sit amet, consectetur adipiscing elit,
sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.
Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris
nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in
reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla
pariatur. Excepteur sint occaecat cupidatat non proident, sunt in
culpa qui officia deserunt mollit anim id est laborum.
Second round
```


When reading a file, a "read cursor" (or "read pointer") moves through the file character by character. Once the *read
pointer* has reached the end of the file, nothing more can be read unless it is reset or moved to another position. This is
why the second loop prints nothing. The *read cursor* can be positioned with the method `.seek()`. However, this is beyond the scope of the course. 
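For the curious, here is a minimal sketch of the "output twice" program in which `file.seek(0)` moves the read pointer back to the beginning after the first round. It assumes a made-up two-line file in a temporary directory instead of `lorem_ipsum.txt`.

```python
import os
import tempfile

# Hypothetical two-line file standing in for lorem_ipsum.txt
path = os.path.join(tempfile.mkdtemp(), "sample.txt")
with open(path, "w") as file:
    file.write("first line\nsecond line\n")

rounds = []
with open(path, "r") as file:
    for round_name in ("First round", "Second round"):
        print(round_name)
        lines = []
        for line in file:
            line = line.strip()
            lines.append(line)
            print(line)
        rounds.append(lines)
        # move the read pointer back to the beginning of the file
        file.seek(0)
```

Without `file.seek(0)`, the second round would print nothing, just like the program above. For small files, another option avoids repositioning entirely: read the lines once into a list and loop over that list twice.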

## Read a file into a list in one go
Sometimes the line breaks are superfluous and only exist because, for example, a printed page has a limited width.
In this case, it may make sense to read the entire text "in one go" instead of iterating over the lines with a loop. The
method `.readlines()` is useful for this. It reads all lines at once and returns a list with **one** entry per line. Each entry still ends with the line break `\n`, except the last line if the file does not end with a line break.

```python
# open file
with open("lorem_ipsum.txt", "r") as file:
    # read all lines of the file in one go
    lines = file.readlines()
    print(lines)
```

```
['Lorem ipsum dolor sit amet, consectetur adipiscing elit,\n', 'sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.\n', 'Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris\n', 'nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in\n', 'reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla\n', 'pariatur. Excepteur sint occaecat cupidatat non proident, sunt in\n', 'culpa qui officia deserunt mollit anim id est laborum.']
```
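If the goal is a single block of text without the superfluous line breaks, two tools are relevant. The file method `.read()` returns the whole file as one string. Alternatively, the entries of `.readlines()` can be stripped and joined with spaces. The sketch below uses a made-up temporary file instead of `lorem_ipsum.txt`.

```python
import os
import tempfile

# Hypothetical stand-in for lorem_ipsum.txt
path = os.path.join(tempfile.mkdtemp(), "sample.txt")
with open(path, "w") as file:
    file.write("Lorem ipsum dolor\nsit amet\n")

with open(path, "r") as file:
    # readlines(): a list with one string per line, "\n" included
    lines = file.readlines()

with open(path, "r") as file:
    # read(): the whole file as one single string
    text = file.read()

print(lines)       # ['Lorem ipsum dolor\n', 'sit amet\n']
print(repr(text))  # 'Lorem ipsum dolor\nsit amet\n'

# join the stripped lines into one paragraph without line breaks
paragraph = " ".join(line.strip() for line in lines)
print(paragraph)   # Lorem ipsum dolor sit amet
```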


# Exercise 1:
In the file `numbers2.txt` there is one number per line. Read the file and sum up the numbers. Output your result.

The file contains the integers 0 to 100, one per line. Each line is stripped of its line break and converted to an integer before it is added to the sum:

```python
sum_lines = 0

# open file and add up the numbers line by line
with open("numbers2.txt", "r") as file:
    for line in file:
        sum_lines += int(line.strip())

# output the result
print(sum_lines)
```

```
5050
```
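A more compact variant passes a generator expression to the built-in `sum()`. This sketch assumes a made-up file `numbers_demo.txt`, created in a temporary directory with the numbers 0 to 100, as a stand-in for `numbers2.txt`.

```python
import os
import tempfile

# Hypothetical stand-in for numbers2.txt: the integers 0 to 100, one per line
path = os.path.join(tempfile.mkdtemp(), "numbers_demo.txt")
with open(path, "w") as file:
    for number in range(101):
        file.write(f"{number}\n")

with open(path, "r") as file:
    # skip empty lines (e.g. a trailing blank line), convert the rest and add them up
    total = sum(int(line.strip()) for line in file if line.strip())

print(total)  # 5050
```

The `if line.strip()` filter is a defensive choice. Without it, an empty line would make `int("")` raise a `ValueError`.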


# Self Test

### Question 1
`2.0 Pts`

Which of the following statements about line breaks are correct?

*Note: There are 3 correct answers to this question.*

The line break `\n` cannot be used in strings to format the output. 

From the computer's point of view, the line break is nothing but the special character `\n`. This character forces the editor to go to the next line. `correct` 

If there are two line breaks `\n\n`, the program goes into the next line twice and an empty line is created. To have two empty lines, four line breaks `\n\n\n\n` are required. 

As there is a line break `\n` in standard text files at the end of each line and another line break `\n` is added by the `print()` statement, this can lead to additional empty lines when outputting the file. `correct` 

The line break `\n` is added at the end of the output of each `print()` call if the `end` parameter is not explicitly set to another value. `correct`
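The third statement can be checked with a short experiment. One `\n` only ends the current line, so n consecutive line breaks produce n - 1 empty lines. Two empty lines therefore need three line breaks, not four.

```python
# one "\n" ends a line, so n line breaks in a row give n - 1 empty lines
one_empty = "first\n\nsecond"
two_empty = "first\n\n\nsecond"

print(one_empty)
print("---")
print(two_empty)

# splitting at "\n" shows each empty line as an empty string
print(one_empty.split("\n"))  # ['first', '', 'second']
print(two_empty.split("\n"))  # ['first', '', '', 'second']
```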

# Unit 3: Exercise

### Instructions:

The file `numbers.txt` contains random integer numbers. There is exactly one number per line. Read the file and output the three biggest numbers in the following form:

```
2345
223
89
```

**Hint** 

Read the file line by line and remove the line break. Since every line read from a text file is a string, the number must then be converted into an integer. Add the number to a list. When all numbers are in the list, sort the list. Then print out the biggest numbers.

```python
numbers_list = []

# open file and read data line by line
with open("numbers3.txt", "r") as file:
    for line in file:
        line = line.strip()
        line = int(line)
        numbers_list.append(line)

# sort the created list
sorted_list = sorted(numbers_list)

print(sorted_list[-1])
print(sorted_list[-2])
print(sorted_list[-3])
```

```
9853
9760
9745
```


```python
# sorted() with reverse=True returns a new list in descending order
print(sorted(numbers_list, reverse=True))
```

```
[9853, 9760, 9745, 9613, 9539, 9529, 9479, 9475, 9472, 9433, 9424, 9313, 9312, 9307, 9276, 9214, 9190, 9122, 9115, 9097, 9075, 9043, 9006, 8996, 8993, 8960, 8922, 8910, 8843, 8749, 8749, 8740, 8717, 8693, 8689, 8670, 8658, 8655, 8527, 8436, 8424, 8401, 8389, 8369, 8322, 8063, 8061, 8061, 8035, 7951, 7946, 7722, 7706, 7691, 7676, 7649, 7612, 7599, 7564, 7484, 7463, 7461, 7415, 7334, 7292, 7183, 7176, 7092, 7005, 6938, 6913, 6906, 6812, 6704, 6652, 6571, 6506, 6504, 6435, 6419, 6405, 6384, 6337, 6311, 6279, 6223, 6018, 5985, 5895, 5885, 5868, 5785, 5637, 5618, 5579, 5565, 5564, 5523, 5496, 5491, 5416, 5335, 5295, 5222, 5205, 5154, 5136, 5077, 5058, 4939, 4916, 4857, 4854, 4845, 4830, 4780, 4683, 4675, 4619, 4601, 4598, 4560, 4557, 4437, 4313, 4265, 4157, 4148, 4130, 3846, 3791, 3773, 3632, 3530, 3523, 3422, 3390, 3374, 3374, 3219, 3202, 3180, 3061, 3020, 2984, 2956, 2951, 2807, 2787, 2765, 2703, 2691, 2629, 2623, 2603, 2571, 2499, 2469, 2443, 2433, 2404, 2380, 2349, 2335, 2277, 2275, 219
```

```python
# Another way to sort: .sort() sorts the list itself (in place)
numbers_list.sort(reverse=True)

print(numbers_list)
```

```
[9853, 9760, 9745, 9613, 9539, 9529, 9479, 9475, 9472, 9433, 9424, 9313, 9312, 9307, 9276, 9214, 9190, 9122, 9115, 9097, 9075, 9043, 9006, 8996, 8993, 8960, 8922, 8910, 8843, 8749, 8749, 8740, 8717, 8693, 8689, 8670, 8658, 8655, 8527, 8436, 8424, 8401, 8389, 8369, 8322, 8063, 8061, 8061, 8035, 7951, 7946, 7722, 7706, 7691, 7676, 7649, 7612, 7599, 7564, 7484, 7463, 7461, 7415, 7334, 7292, 7183, 7176, 7092, 7005, 6938, 6913, 6906, 6812, 6704, 6652, 6571, 6506, 6504, 6435, 6419, 6405, 6384, 6337, 6311, 6279, 6223, 6018, 5985, 5895, 5885, 5868, 5785, 5637, 5618, 5579, 5565, 5564, 5523, 5496, 5491, 5416, 5335, 5295, 5222, 5205, 5154, 5136, 5077, 5058, 4939, 4916, 4857, 4854, 4845, 4830, 4780, 4683, 4675, 4619, 4601, 4598, 4560, 4557, 4437, 4313, 4265, 4157, 4148, 4130, 3846, 3791, 3773, 3632, 3530, 3523, 3422, 3390, 3374, 3374, 3219, 3202, 3180, 3061, 3020, 2984, 2956, 2951, 2807, 2787, 2765, 2703, 2691, 2629, 2623, 2603, 2571, 2499, 2469, 2443, 2433, 2404, 2380, 2349, 2335, 2277, 2275, 219
```

```python
# sorted_list from above: sorted() sorts in ascending order by default
print(sorted_list)
```

```
[1022, 1040, 1060, 1136, 1141, 1163, 1166, 1223, 1269, 1290, 1323, 1369, 1387, 1441, 1448, 1452, 1456, 1482, 1521, 1561, 1625, 1642, 1650, 1705, 1716, 1760, 1840, 1875, 1895, 1925, 1948, 1974, 2084, 2191, 2275, 2277, 2335, 2349, 2380, 2404, 2433, 2443, 2469, 2499, 2571, 2603, 2623, 2629, 2691, 2703, 2765, 2787, 2807, 2951, 2956, 2984, 3020, 3061, 3180, 3202, 3219, 3374, 3374, 3390, 3422, 3523, 3530, 3632, 3773, 3791, 3846, 4130, 4148, 4157, 4265, 4313, 4437, 4557, 4560, 4598, 4601, 4619, 4675, 4683, 4780, 4830, 4845, 4854, 4857, 4916, 4939, 5058, 5077, 5136, 5154, 5205, 5222, 5295, 5335, 5416, 5491, 5496, 5523, 5564, 5565, 5579, 5618, 5637, 5785, 5868, 5885, 5895, 5985, 6018, 6223, 6279, 6311, 6337, 6384, 6405, 6419, 6435, 6504, 6506, 6571, 6652, 6704, 6812, 6906, 6913, 6938, 7005, 7092, 7176, 7183, 7292, 7334, 7415, 7461, 7463, 7484, 7564, 7599, 7612, 7649, 7676, 7691, 7706, 7722, 7946, 7951, 8035, 8061, 8061, 8063, 8322, 8369, 8389, 8401, 8424, 8436, 8527, 8655, 8658, 8670, 8689, 869
```
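The three cells above differ in one important way. `sorted()` returns a new list and leaves its input unchanged. `list.sort()` rearranges the list itself and returns `None`. That is why `sorted_list` is still in ascending order after `numbers_list.sort(reverse=True)`. A small sketch with made-up values:

```python
numbers = [42, 7, 19]

# sorted() returns a new list and leaves the original unchanged
ascending = sorted(numbers)
print(ascending)  # [7, 19, 42]
print(numbers)    # [42, 7, 19]

# list.sort() sorts the list itself (in place) and returns None
result = numbers.sort(reverse=True)
print(numbers)    # [42, 19, 7]
print(result)     # None
```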

```python
# Review

num_sort = []

with open("numbers3.txt", "r") as numbers:
    for line in numbers:
        line = int(line.strip())
        num_sort.append(line)

num_sort.sort(reverse=True)
print(*num_sort[:3], sep="\n")
```

```
9853
9760
9745
```
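If only the largest few values are needed, the standard library function `heapq.nlargest(n, iterable)` returns them directly, largest first. This is an alternative to sorting the whole list, not the approach the exercise asks for. The list below contains made-up sample values rather than the contents of the numbers file.

```python
import heapq

# Hypothetical sample values standing in for the numbers read from the file
numbers = [89, 2345, 17, 223, 5]

# nlargest() returns the n biggest values in descending order
top_three = heapq.nlargest(3, numbers)
print(*top_three, sep="\n")
```

For these sample values, the result is the same as `sorted(numbers, reverse=True)[:3]`.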


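Even though the code above produces the right output, it fails to pass the functional test. The issue was that the student did not separate their `.strip()` call from the `int()` type cast. Note that `int(line.strip())` computes the same value as stripping and converting in two separate statements, so the two versions do not differ in what the program outputs.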