# Files input/output

## Python Programming for Engineers
### Tel-Aviv University / 0509-1820 / Fall 2025-2026

## Agenda

<table style="display:block" align=center>
    <tbody>
        <tr>
            <td>
                <h3>
                    Text parsing: handling spaces flanking the text (self-learning)
                </h3>
                <h3>
                    File input-output 
                </h3>
                <ul>
                    <li>
                        <code>open()</code> and <code>close()</code>
                    </li>
                    <li>
                        <code>read()</code> and <code>write()</code>
                    </li>
                </ul>      
                <ul>
                    <li>
                        Questions from past exams 
                    </li>
                </ul>
                <h3>
                    List comprehension
                </h3>
            </td>
            <td>    
    </td>
        </tr></tbody>
</table>


## Text parsing: handling spaces flanking the text

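|Method|Description|
|:-|:-|
|`.split()`|splits a string into a list of tokens (by whitespace, unless a separator is given)|
|`.lstrip()`|removes leading whitespace (spaces, tabs, newlines)|
|`.rstrip()`|removes trailing whitespace (spaces, tabs, newlines)|
|`.strip()`|removes both leading and trailing whitespace|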


### Split:

In [None]:
s = "Boom!    Big reveal!\n I turned myself into a pickle!"
l1 = s.split()
print(l1)
l2 = s.split(" ")
print(l2)

In [None]:
s = "Boom! Big reveal! I turned myself into a pickle!"
l2 = s.split('!')
print(l2)

### [*]Strip:

In [None]:
# Has two leading spaces and a trailing one.
value = "  a line "

# Remove left spaces.
value1 = value.lstrip()
print("|" + value1 + "|")

# Remove right spaces.
value2 = value.rstrip()
print("|" + value2 + "|")

# Remove left and right spaces.
value3 = value.strip()
print("|" + value3 + "|")
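
These methods remove any whitespace character, not only the space itself, and they also accept an optional argument listing other characters to remove. A minimal sketch (the sample strings are made up for illustration):

```python
# A made-up string with a tab, spaces and a newline around the text.
raw = "\t  a line \n"

print(repr(raw.strip()))    # whitespace removed on both sides
print(repr(raw.lstrip()))   # only the leading whitespace is removed
print(repr(raw.rstrip()))   # only the trailing whitespace is removed

# An optional argument lists the characters to remove instead of whitespace.
print("##title##".strip("#"))
```

`repr()` is used here only so that the remaining tab and newline characters are visible in the printed result.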

## File input-output 

### What is a file?

- A block of arbitrary information
- A “digital” document
- Has a path (=address) in the computer
- Example (Windows):
    - C:/Users/User/test_file.txt


#### Why do we need files?
- We shall next show how to deal with textual files
- Files can also contain arbitrary, “binary” information

#### <span style="color:red"> DO NOT USE NON-ENGLISH PATHS! </span>

### Opening a file

```python 
f=open(filename, mode) 
```
- `filename`: an address of a file
- `mode`:     

|||
|:-|:-|
|"w"|**overwrites** (“deletes”) prior data (BEWARE!)|
|"r"|read|
|"a"|append - adds at end of prior data|

#### <span style='color:red'>Remember to close a file after you are done with it using </span> `f.close()`
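
The difference between the modes can be seen in a short sketch that opens the same file three times and closes it after each step. The file name `modes_demo.txt` is hypothetical and used only for this illustration:

```python
# 'modes_demo.txt' is a hypothetical file name used only for this sketch.
f = open('modes_demo.txt', 'w')   # creates the file, or empties it if it exists
f.write('first\n')
f.close()

f = open('modes_demo.txt', 'a')   # append: keeps the existing content
f.write('second\n')
f.close()

f = open('modes_demo.txt', 'r')
after_append = f.read()
f.close()
print(after_append)

f = open('modes_demo.txt', 'w')   # opening with 'w' again discards everything
f.write('third\n')
f.close()

f = open('modes_demo.txt', 'r')
after_overwrite = f.read()
f.close()
print(after_overwrite)
```

After the append step the file holds both lines. After the second `'w'` step only the last written line remains.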

### Using context manager (`with`)

Instead of closing a file manually, we can use the `with` context manager:

In [None]:
with open('rick_and_morty_file.txt', "r") as f:
    lines = f.readlines()
# after this block ends, f.close() is called behind the scenes
print(lines)

is equivalent to:

In [None]:
f = open('rick_and_morty_file.txt', "r")
lines = f.readlines()
f.close()
print(lines)
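
One way to check that `with` really closes the file is the `closed` attribute of the file object. The sketch below assumes a small demo file with the hypothetical name `with_demo.txt`. Note that `with` also closes the file if an error occurs inside the block, which the manual version above does not guarantee.

```python
# Create a small file so the sketch is self-contained (hypothetical name).
with open('with_demo.txt', 'w') as f:
    f.write('Accepting\nInitiated')

with open('with_demo.txt', 'r') as f:
    lines = f.readlines()
    inside = f.closed      # False: the file is still open inside the block

outside = f.closed         # True: the file was closed when the block ended
print(inside, outside)
```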

#### What happens if the file does not exist?

In [None]:
print(open("non_existed_file.txt",'r').read())

In [None]:
open("test3.txt",'w').write("rick")
print(open("test3.txt",'r').read())

- `r` expects an existing file while `w` does not
- If the file exists `w` will overwrite it!
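
Opening a missing file with `'r'` raises a `FileNotFoundError`. A minimal sketch of handling it, assuming `try`/`except` is familiar and that the hypothetical file `surely_missing_file.txt` does not exist:

```python
# 'surely_missing_file.txt' is a hypothetical name assumed not to exist.
missing = 'surely_missing_file.txt'

try:
    f = open(missing, 'r')
except FileNotFoundError:
    # We get here because 'r' requires an existing file.
    content = None
    print("Cannot read", missing, "- the file does not exist")
else:
    # We get here only if the file was opened successfully.
    content = f.read()
    f.close()
```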

### Two common ways for reading from a file

In [1]:
## Here we only WRITE to a file named rick_and_morty_file.txt so we can work with it in the examples below
f=open('rick_and_morty_file.txt', 'w')
f.write('Accepting\nRerountiong\nOperation Phoenix\nInitiated')
f.close() # Releases the file lock, frees resources. More details about this operation soon...

In [2]:
## (1) Here we simply read the entire text contained in the file into the "text" variable
f = open('rick_and_morty_file.txt', 'r')
text =  f.read()
print(text)
f.close() # releases the file lock, frees resources

Accepting
Rerountiong
Operation Phoenix
Initiated


In [3]:
# (2) We can also read a list of strings, each representing a single line in the file
# Note that at the end of each line (except possibly the last one) there is a newline char ("\n")
f = open('rick_and_morty_file.txt', 'r')
lines =  f.readlines()
print(lines)
f.close() # releases the file lock, frees resources

['Accepting\n', 'Rerountiong\n', 'Operation Phoenix\n', 'Initiated']


#### Newline ('\n') char (cont.)

- `\n` appears between lines in the text file (at the end of each line)
- To get rid of trailing newline characters, the string method **rstrip()** can be used:

In [4]:
txt = 'Accepting\nRerountiong\nOperation Phoenix\nInitiated'
with open('rick_and_morty_file.txt', 'w') as f:
    f.write(txt)

with open('rick_and_morty_file.txt', 'r') as f:
    lines = f.readlines()

print(lines)
for i in range(len(lines)):
    lines[i] = lines[i].rstrip()
print(lines)

['Accepting\n', 'Rerountiong\n', 'Operation Phoenix\n', 'Initiated']
['Accepting', 'Rerountiong', 'Operation Phoenix', 'Initiated']


### How would you remove `\n` using list comprehension?

In [5]:
txt = 'Accepting\nRerountiong\nOperation Phoenix\nInitiated'
with open('rick_and_morty_file.txt', 'w') as f:
    f.write(txt)

with open('rick_and_morty_file.txt', 'r') as f:
    lines = f.readlines()

print(lines)
print([l.rstrip() for l in lines])


['Accepting\n', 'Rerountiong\n', 'Operation Phoenix\n', 'Initiated']
['Accepting', 'Rerountiong', 'Operation Phoenix', 'Initiated']


- Another way to get rid of trailing newline characters is splitting by '\n'

In [6]:
txt = 'Accepting\nRerountiong\nOperation Phoenix\nInitiated'
with open('rick_and_morty_file.txt', 'w') as f:
    f.write(txt)

f = open('rick_and_morty_file.txt', 'r')
lines = f.read()
print(lines.split("\n"))
f.close()

['Accepting', 'Rerountiong', 'Operation Phoenix', 'Initiated']
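
One caveat with splitting by `'\n'`: if the text ends with a newline, the resulting list ends with an empty string. The built-in string method `splitlines()` avoids this. A short sketch with made-up strings:

```python
text_without_final_newline = 'Accepting\nInitiated'
text_with_final_newline = 'Accepting\nInitiated\n'

print(text_without_final_newline.split('\n'))
print(text_with_final_newline.split('\n'))     # note the extra '' at the end
print(text_with_final_newline.splitlines())    # no extra ''
```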


#### Two ways to iterate over lines of file

In [7]:
f = open('rick_and_morty_file.txt', 'r')
lines = []
for line in f:
    lines.append(line) 
print(lines)
f.close()

['Accepting\n', 'Rerountiong\n', 'Operation Phoenix\n', 'Initiated']


In [8]:
f = open('rick_and_morty_file.txt', 'r')
lines = []
line = f.readline() # This is not the same as ".readlines()"!
while line:
    lines.append(line)
    line = f.readline() 
print(lines)
f.close()

['Accepting\n', 'Rerountiong\n', 'Operation Phoenix\n', 'Initiated']


- Can you think of other ways to iterate over a file's lines? (hint: we just discussed "`.readlines()`")
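
One possible answer to the question above, as a sketch: read all the lines into a list with `.readlines()` and then iterate over that list. Unlike the `for line in f` loop, this keeps the whole file in memory at once.

```python
# Recreate the file used above so the sketch runs on its own.
with open('rick_and_morty_file.txt', 'w') as f:
    f.write('Accepting\nRerountiong\nOperation Phoenix\nInitiated')

with open('rick_and_morty_file.txt', 'r') as f:
    all_lines = f.readlines()   # the whole file is read here, as a list

stripped = []
for line in all_lines:          # iterate over the list, not over the file
    stripped.append(line.rstrip())
print(stripped)
```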

### Exercise 1: Copy a text file while omitting commented lines

#### Write a function that copies every line from a source file to a target file, excluding lines that start with a ‘#’

In [None]:
txt='Accepting\nRerountiong\n# Operation Phoenix\nInitiated'
f=open('src.txt', 'w')
f.write(txt)
f.close()

In [9]:
def copy_file_excluding_comments(source, target):
    infile  = open(source, 'r')
    outfile = open(target, 'w')
    for line in infile:
        if line[0] == '#':
            continue
        outfile.write(line)
    infile.close()
    outfile.close()
    
copy_file_excluding_comments("src.txt", "trgt.txt")

#### What is the result in the txt files?

In [None]:
f = open('src.txt', 'r')
print(f.read())
f.close()

In [None]:
f = open('trgt.txt', 'r')
print(f.read())
f.close()
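
The same exercise can also be sketched with a single `with` statement that manages both files, and with `str.startswith` instead of indexing. The function name `copy_without_comments` and the `*_demo.txt` file names are hypothetical, chosen only for this illustration:

```python
def copy_without_comments(source, target):
    # One with statement can manage both files; both are closed at the end.
    with open(source, 'r') as infile, open(target, 'w') as outfile:
        for line in infile:
            if not line.startswith('#'):
                outfile.write(line)

# Build a small source file so the sketch is self-contained.
with open('src_demo.txt', 'w') as f:
    f.write('Accepting\n# Operation Phoenix\nInitiated')

copy_without_comments('src_demo.txt', 'trgt_demo.txt')

with open('trgt_demo.txt', 'r') as f:
    copied = f.read()
print(copied)
```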

### Reading a CSV file

- A CSV (comma-separated values) file contains data in a tabular format, using commas to separate the values
- Each row holds the same number of columns
    - e.g.,   
        1,5,8,3    
        6,4,2,1    
        99,98,97,0 
- <span style='color:red'>Read more in Google about what a CSV is.</span>
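
The exercises below parse CSV rows manually with `split(',')`. As a side note, Python's standard library also has a `csv` module. A minimal sketch, assuming a hypothetical file `table_demo.csv` that holds the example rows above:

```python
import csv

# 'table_demo.csv' is a hypothetical file name used only for this sketch.
with open('table_demo.csv', 'w') as f:
    f.write('1,5,8,3\n6,4,2,1\n99,98,97,0\n')

# newline='' is the setting recommended by the csv module documentation.
with open('table_demo.csv', 'r', newline='') as f:
    rows = list(csv.reader(f))   # each row is a list of strings

print(rows)
```

The values are returned as strings, so they still need `int()` before any arithmetic.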


### Exercise 2: Sum each row of numbers read from a CSV file

#### Write a function that sums the numbers in each row for a given CSV file.
- Input: CSV file name. 
- Output:  A list containing the line sums


In [10]:
def sum_lines_in_csv_file(filename):
    f = open(filename,'r')
    sums = []
    for line in f:
        tokens = line.rstrip().split(',')
        line_sum = 0
        for token in tokens:
            line_sum+=int(token)
        sums.append(line_sum)

    f.close()
    return sums 

In [11]:
txt='0,1,3,7\n100,30,7,0\n50,-100,-88,1'
with open('numbers.csv', 'w') as f:  # the file is closed before it is read again
    f.write(txt)

sum_lines_in_csv_file('numbers.csv')

[11, 137, -137]
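
Since list comprehension is also on the agenda, here is a sketch of the same row sums written with a nested comprehension. The function name `sum_lines_with_comprehension` and the file name `numbers_demo.csv` are hypothetical:

```python
def sum_lines_with_comprehension(filename):
    with open(filename, 'r') as f:
        # Inner comprehension: convert the tokens of one line to integers.
        # Outer comprehension: one sum per line of the file.
        return [sum([int(token) for token in line.rstrip().split(',')])
                for line in f]

with open('numbers_demo.csv', 'w') as f:
    f.write('0,1,3,7\n100,30,7,0\n50,-100,-88,1')

result = sum_lines_with_comprehension('numbers_demo.csv')
print(result)
```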

## Self Learning

### Exercise 1: file to data structures.

Implement the function `csv_to_lists` that gets a path to an existing CSV file and returns the matrix it contains as a list of lists: one inner list per row, holding that row's values as strings.

In [None]:
# Solution
def csv_to_lists(filename):
    f = open(filename,'r')
    matrix = []
    for line in f:
        tokens = line.rstrip().split(',')
        matrix_line = []
        for token in tokens:
            matrix_line.append(token)
        matrix.append(matrix_line)

    f.close()
    return matrix 

## Questions from previous exams

Open [Exam 2022-2023 semester A Moed A](https://courses.cs.tau.ac.il/pyProg/2526a/exams/exam2223a_moedA.pdf) and answer questions 1.A (a+b), 3.A.

#### Solutions

1.A.1

In [None]:
def build_suffix_dict(lst, k):
    d = {}
    for i in range(len(lst)):
        sfx = lst[i][-k:]
        if sfx in d:
            d[sfx].append(i)
        else:
            d[sfx] = [i]
    return d

In [None]:
lst = ["good luck!", "Hello", "cartago", "duck duck go", "go girl", "lololo"]
build_suffix_dict(lst, 2)

1.A.2

In [None]:
def prefix_suffix_overlap(lst, k):
    d = build_suffix_dict(lst, k)
    for i in range(len(lst)):
        if lst[i][:k] in d:
            for j in d[lst[i][:k]]:
                if i != j:
                    return True
    return False


In [None]:
print(prefix_suffix_overlap(["good luck!", "Hello", "cartago"], 2))
print(prefix_suffix_overlap(["good luck!", "Hello", "cartago"], 3))
print(prefix_suffix_overlap(["aaaabbbaaaa"], 4))


3.A

In [None]:
def fix_data(corrupted_file_name, out_file_name):
    f = open(corrupted_file_name)
    fw = open(out_file_name, "w")
    for l in f:
        l = l.replace("#", "")
        lst = l.split()
        if len(lst) > 0:
            fw.write(",".join(lst) + "\n")
    f.close()
    fw.close()

In [None]:
import os
os.makedirs('files', exist_ok=True)  # make sure the 'files' folder exists
with open('files/corrupted.txt', 'w') as f:
    f.write("#2 45 44# 66\n\n### ### ## #\n###############\n\n\n9 10# 34# 22")

fix_data('files/corrupted.txt', 'files/clean.txt')
print("=== corrupted ===")
print(open('files/corrupted.txt','r').read())
print("\n=== clean ===")
print(open('files/clean.txt','r').read())

Open [Exam 2022-2023 semester B Moed B](https://courses.cs.tau.ac.il/pyProg/2526a/exams/exam2223a_moedB.pdf) and answer question 3.A.

#### Solution

3.A

In [None]:
def merge_files(infile1, infile2, lst1, lst2, out_file):
    f1 = open(infile1, 'r')
    f2 = open(infile2, 'r')
    c_file1 = f1.read().split(' ')
    c_file2 = f2.read().split(' ')
    c_out = ''
    c_file=c_file1+c_file2
    lst=lst1+lst2
    for a in range(len(lst)):
        c_out += c_file[lst.index(a)] + ' '  # add the word at position a, followed by a space

    f_out = open(out_file, 'w')
    f_out.write(c_out[:-1]) # remove last redundant ' '
    
    f1.close()
    f2.close()
    f_out.close()
    


In [None]:
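import os
os.makedirs('files', exist_ok=True)  # make sure the 'files' folder exists
with open('files/infile1.txt', 'w') as f:
    f.write('python is best the world')
with open('files/infile2.txt', 'w') as f:
    f.write('the course in whole')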
lst1 = [0, 1, 3, 6, 8]
lst2 = [2, 4, 5, 7]

merge_files('files/infile1.txt', 'files/infile2.txt', lst1, lst2, 'files/out.txt')
print(open('files/out.txt', 'r').read())