# Quiz 04: Working with Files
This quiz tests your understanding of reading from and writing to files. It consists of 6 multiple-choice, 4 fill-in-the-blank, 4 code-reading, and 6 programming problems.

## Multiple Choice

### Absolute Paths
What is an absolute path?

A reference to a location ...
1. that includes `/`
2. starting from the root
3. starting from the current directory
4. that includes the word absolute

<details>
<summary>Answer</summary>
2 - An absolute path starts from the root.
</details>

### Valid Paths
Which of these are valid paths in Linux?

1. `/mnt/media/`
2. `../help me/obi-wan`
3. `./../../use_the_force.txt`
4. `/usr/../usr/../etc`

<details>
<summary>Answer</summary>
1, 2, 3, 4 - All are valid paths.
</details>
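<details>
<summary>Example</summary>

Normalizing each path can make the answer easier to see. The sketch below uses the standard-library `posixpath` module so it behaves the same on any operating system; the helper name `describe` is made up for this illustration. Normalization only looks at the syntax of the path, not at whether it exists on disk.

```python
import posixpath


def describe(path):
    """Return (is_absolute, normalized_form) for a POSIX-style path string."""
    return posixpath.isabs(path), posixpath.normpath(path)


for candidate in ["/mnt/media/", "../help me/obi-wan",
                  "./../../use_the_force.txt", "/usr/../usr/../etc"]:
    print(candidate, "->", describe(candidate))
```

</details>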

### Special Characters
Match each of the following characters with its definition.

1. `/`
2. `~`
3. `..`
4. `.`

A. The root directory or path separator<br>
B. The current directory<br>
C. The user's home directory<br>
D. The parent directory<br>

<details>
<summary>Answer</summary>
1 - A <br>
2 - C <br>
3 - D <br>
4 - B <br>
</details>

### File Modes
Which mode should you use to open a file for writing, erasing its contents if it already exists or creating the file if it doesn't exist?

1. `r`
2. `w`
3. `a`
4. `x`

<details>
<summary>Answer</summary>
2 - The "write" mode truncates the file if it already exists and creates it if it does not.
</details>
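<details>
<summary>Example</summary>

A small experiment can confirm how the modes differ. This is a minimal sketch that works in a temporary directory; the function name `demo_modes` and the file name `demo.txt` are assumptions made for the example.

```python
import os
import tempfile


def demo_modes():
    """Show how 'w', 'a', and 'x' treat a file that already exists."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.txt")

        with open(path, "w") as f:      # file does not exist yet: it is created
            f.write("first\n")
        with open(path, "w") as f:      # file exists: contents are truncated
            f.write("second\n")
        with open(path, "a") as f:      # file exists: new text goes at the end
            f.write("third\n")

        try:
            open(path, "x")             # exclusive creation refuses an existing file
            exclusive_failed = False
        except FileExistsError:
            exclusive_failed = True

        with open(path, "r") as f:
            return f.read(), exclusive_failed


print(demo_modes())
```

</details>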

### File Existence
What happens if you try to open a nonexistent file in `r` mode?

1. The file is deleted
2. The file is created
3. The file is renamed
4. An error occurs

<details>
<summary>Answer</summary>
4 - Python will raise a `FileNotFoundError` if the file does not exist.
</details>

### Binary vs Text
Which mode should you use to read a binary file in Python?

1. `r`
2. `w`
3. `rb`
4. `wb`

<details>
<summary>Answer</summary>
3 - `r` opens the file for reading and `b` selects binary mode.
</details>

## Fill In The Blank

### File Writing
The method used to write a string to a file object is _________.

<details>
<summary>Answer</summary>
`write`
</details>

### File Reading
Use the _________ method to read all lines of a file into a list.

<details>
<summary>Answer</summary>
`readlines`
</details>

### Context Manager
Using the _________ statement helps manage file opening and closing, even if an error occurs.

<details>
<summary>Answer</summary>
`with`
</details>
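<details>
<summary>Example</summary>

The sketch below illustrates the "even if an error occurs" part: an exception is raised inside the block, and the file object is still closed afterwards. The function name `closed_after_error` and the simulated `RuntimeError` are made up for this illustration.

```python
import os
import tempfile


def closed_after_error():
    """Return True if the file object is closed after an error inside the block."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.txt")
        try:
            with open(path, "w") as f:
                f.write("partial")
                raise RuntimeError("simulated failure")
        except RuntimeError:
            pass                        # the error still propagated out of the block
        return f.closed                 # the context manager closed the file anyway


print(closed_after_error())
```

</details>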

### And You Shall Find
To move the file pointer to a specific location in a file, use the _________ method.

<details>
<summary>Answer</summary>
`seek`
</details>
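<details>
<summary>Example</summary>

To see the file pointer move, one option is to read a file to the end, try to read again, and then rewind. This is only a sketch; the helper name `read_twice` is an assumption, and it writes its own temporary file so it can be run anywhere.

```python
import os
import tempfile


def read_twice(text):
    """Write `text` to a temp file, read it all, rewind, and read the first line."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.txt")
        with open(path, "w") as f:
            f.write(text)
        with open(path, "r") as f:
            everything = f.read()       # pointer is now at the end of the file
            at_end = f.read()           # nothing left to read, so this is ''
            f.seek(0)                   # move the pointer back to the start
            first_line = f.readline()
    return everything, at_end, first_line


print(read_twice("first\nsecond\n"))
```

</details>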

## Reading Problems

### Basic Reading
```python
with open("data.txt", "r") as f:
    line = f.readline()
    line = f.readline()
print(line[0])
```
If `data.txt` contains:
```text
hello
world
```
What does this code print?

<details>
<summary>Answer</summary>
w
</details>

### Reading Multiple
```python
with open("numbers.txt", "r") as f:
    for line in f:
        print(int(line.strip()) * 2)
```
If `numbers.txt` contains:
```text
1
2
3
```
What is the output of this code?

<details>
<summary>Answer</summary>
2 <br>
4 <br>
6 <br>
</details>

### Seek
```python
with open("data.txt", "r") as f:
    print(f.read())
    f.seek(0)
    print(f.readline())
```
If `data.txt` contains:
```text
first
second
```
What is the output of this code?

<details>
<summary>Answer</summary>
first <br>
second <br>
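first <br>
<br>
If `data.txt` ends with a newline, each `print` call also leaves a blank line after its output.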
</details>


### Interleaved Files
```python
with open("a.txt", "r") as fa, open("b.txt", "r") as fb:
    for line_a in fa:
        print(line_a.strip() + "-" + fb.readline().strip())
```
If `a.txt` contains:
```text
apple
banana
cherry
```
and `b.txt` contains:
```text
red
yellow
dark red
purple
```
What is the output of this code?

<details>
<summary>Answer</summary>
apple-red <br>
banana-yellow <br>
cherry-dark red <br>
</details>

## Software Problems

### Problem 0 — Count lines
Write a small helper `count_nonempty_lines(path)` that returns the number of non-empty lines in a file. Lines that are empty or contain only whitespace should not be counted. Implement the function so it reads the file line-by-line (so it scales to large files).

For example, a file containing 'hello\nworld\n' should return 2. An empty file returns 0, and a file with only whitespace lines returns 0.

pytest snippet:
```python
def test_count_lines(tmp_path):
    p = tmp_path / '1.txt'
    p.write_text('hello\nworld\n')
    assert count_nonempty_lines(str(p)) == 2

def test_count_lines_empty_lines(tmp_path):
    p = tmp_path / '2.txt'
    p.write_text('\n\nhello\n\n')
    assert count_nonempty_lines(str(p)) == 1

def test_count_lines_empty(tmp_path):
    p = tmp_path / '3.txt'
    p.write_text('')
    assert count_nonempty_lines(str(p)) == 0
```
Tip: see [here](https://docs.pytest.org/en/6.2.x/tmpdir.html) for how the `tmp_path` fixture works in pytest.
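
<details>
<summary>Possible approach</summary>

One way to meet the line-by-line requirement is to iterate over the file object directly. This is a sketch of one possible solution, not the only acceptable one.

```python
def count_nonempty_lines(path):
    """Count lines that contain at least one non-whitespace character."""
    count = 0
    with open(path, "r") as f:
        for line in f:              # iterating reads one line at a time
            if line.strip():        # an empty string after stripping means blank
                count += 1
    return count
```

</details>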


### Problem 1 — Count word frequencies (easy)
Implement `count_word_freq(path)` which reads a text file and returns a dictionary mapping normalized words to their counts. Normalize tokens by lowercasing and stripping surrounding punctuation; treat any whitespace as a separator. The function should stream the file (process line-by-line) so it handles large inputs efficiently. Punctuation-only tokens should be ignored and an empty file should return an empty dictionary.

Example: the content 'Hello hello world.' should produce {'hello': 2, 'world': 1}. The implementation must also handle newlines and repeated punctuation such as 'Hi!\nHi?'.

pytest snippet:
```python
def test_count_word_freq_simple(tmp_path):
    p = tmp_path / 'words1.txt'
    p.write_text('Hello hello world.')
    assert count_word_freq(str(p)) == {'hello': 2, 'world': 1}

def test_count_word_freq_empty(tmp_path):
    p = tmp_path / 'words2.txt'
    p.write_text('')
    assert count_word_freq(str(p)) == {}

def test_count_word_freq_punctuation(tmp_path):
    p = tmp_path / 'words3.txt'
    p.write_text('Hi!\nHi?\nHi...')
    assert count_word_freq(str(p)) == {'hi': 3}

def test_count_word_freq_mixed(tmp_path):
    p = tmp_path / 'words4.txt'
    p.write_text('A, B; C: a b')
    assert count_word_freq(str(p)) == {'a': 2, 'b': 2, 'c': 1}
```
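
<details>
<summary>Possible approach</summary>

A sketch of one possible solution, assuming "punctuation" means the ASCII characters in `string.punctuation`. A course that expects Unicode punctuation or regex-based tokenizing would need a different normalization step.

```python
import string
from collections import Counter


def count_word_freq(path):
    """Return a dict mapping lowercased, punctuation-stripped words to counts."""
    counts = Counter()
    with open(path, "r") as f:
        for line in f:
            for token in line.split():      # split() treats any whitespace as a separator
                word = token.strip(string.punctuation).lower()
                if word:                    # skip tokens that were only punctuation
                    counts[word] += 1
    return dict(counts)
```

</details>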

### Problem 2 — Top-N longest lines
Write `top_n_longest_lines(path, n)` which returns the n longest lines from a text file in descending order by length. Each returned line should be trimmed of its trailing newline. If multiple lines have identical length, preserve their original file order. The function should handle edge cases such as n being zero or larger than the number of lines in the file, and it should stream the file line-by-line.

Example: for a file containing 'apple\nbanana\ncherry pie\n' and n=2, the function should return ['cherry pie', 'banana']. If the file is empty, return an empty list.

pytest snippet:
```python
def test_top_n_longest_lines_basic(tmp_path):
    p = tmp_path / 'lines1.txt'
    p.write_text('apple\nbanana\ncherry pie\n')
    assert top_n_longest_lines(str(p), 2) == ['cherry pie', 'banana']

def test_top_n_longer_than_file(tmp_path):
    p = tmp_path / 'lines2.txt'
    p.write_text('a\nbb\nccc\n')
    assert top_n_longest_lines(str(p), 5) == ['ccc', 'bb', 'a']

def test_top_n_zero(tmp_path):
    p = tmp_path / 'lines2.txt'
    p.write_text('a\nbb\nccc\n')
    assert top_n_longest_lines(str(p), 0) == []

def test_top_n_ties_preserve_order(tmp_path):
    p = tmp_path / 'lines3.txt'
    p.write_text('aa\nbb\ncc\n')
    assert top_n_longest_lines(str(p), 2) == ['aa', 'bb']
```
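
<details>
<summary>Possible approach</summary>

One way to stream the file while keeping only the best candidates is a bounded heap. The sketch below uses `heapq.nsmallest` on `(-length, index, text)` tuples, so longer lines sort first and earlier lines win ties. Treat it as one possible solution rather than the expected one.

```python
import heapq


def top_n_longest_lines(path, n):
    """Return the n longest lines, longest first; ties keep file order."""
    if n <= 0:
        return []
    with open(path, "r") as f:
        stripped = (line.rstrip("\n") for line in f)
        # Negated length makes longer lines "smaller"; the index breaks ties
        # in favor of lines that appear earlier in the file.
        keyed = ((-len(text), index, text) for index, text in enumerate(stripped))
        best = heapq.nsmallest(n, keyed)    # consumes the generator inside the with block
    return [text for _, _, text in best]
```

</details>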


### Problem 3 — Merge two sorted integer files
Implement `merge_sorted_files(path_a, path_b)` to merge two files containing one integer per line (each file is already sorted ascending) and return a single sorted list containing all integers from both files. The function should stream both files simultaneously and perform an efficient merge (do not load both files entirely into memory). If any line cannot be parsed as an integer, raise ValueError.

Example: merging files with contents '1\n3\n5\n' and '2\n4\n6\n' should return [1,2,3,4,5,6]. If one file is empty, return the contents of the other file.

pytest snippet:
```python
import pytest

def test_merge_sorted_files_normal(tmp_path):
    a = tmp_path / 'a.txt'
    b = tmp_path / 'b.txt'
    a.write_text('1\n3\n5\n')
    b.write_text('2\n4\n6\n')
    assert merge_sorted_files(str(a), str(b)) == [1,2,3,4,5,6]

def test_merge_sorted_files_one_empty(tmp_path):
    b = tmp_path / 'b.txt'
    b.write_text('2\n4\n6\n')
    c = tmp_path / 'c.txt'
    c.write_text('')
    assert merge_sorted_files(str(c), str(b)) == [2,4,6]

def test_merge_sorted_files_duplicates(tmp_path):
    d = tmp_path / 'd.txt'
    e = tmp_path / 'e.txt'
    d.write_text('1\n2\n2\n')
    e.write_text('2\n3\n')
    assert merge_sorted_files(str(d), str(e)) == [1,2,2,2,3]

def test_merge_sorted_files_nonint_raises(tmp_path):
    b = tmp_path / 'b.txt'
    b.write_text('2\n4\n6\n')
    f = tmp_path / 'f.txt'
    f.write_text('1\na\n')
    with pytest.raises(ValueError):
        merge_sorted_files(str(f), str(b))
```
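
<details>
<summary>Possible approach</summary>

The standard library's `heapq.merge` already performs a lazy merge of sorted iterables, so one possible solution wraps each file in a generator of integers. The helper name `_ints` is made up for this sketch. A hand-written two-pointer merge would work as well if the course prefers that.

```python
import heapq


def _ints(file_obj):
    """Yield one int per line; int() raises ValueError on unparsable text."""
    for line in file_obj:
        yield int(line.strip())


def merge_sorted_files(path_a, path_b):
    """Merge two ascending one-integer-per-line files into one sorted list."""
    with open(path_a, "r") as fa, open(path_b, "r") as fb:
        # heapq.merge pulls values lazily from both generators,
        # so each file is read one line at a time.
        return list(heapq.merge(_ints(fa), _ints(fb)))
```

</details>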


### Problem 4 — Lines containing all query words
Write `lines_with_all_words(path, queries)` to return the lines from a file that contain every query word provided. Matching should be case-insensitive and word-based; to avoid false negatives, strip surrounding punctuation from tokens before matching. Return matching lines in file order, without their trailing newlines. For this exercise, an empty `queries` list returns an empty list (a deliberate choice, since every line trivially contains all of zero words).

Example: given lines 'I like apples and bananas' and 'Bananas are tasty', searching for ['apples','bananas'] should return only the first line; searching for ['bananas'] should return both lines (case-insensitive).

pytest snippet:
```python
def test_lines_with_all_words_both(tmp_path):
    p = tmp_path / 't.txt'
    p.write_text('I like apples and bananas\nBananas are tasty\n')
    assert lines_with_all_words(str(p), ['apples','bananas']) == ['I like apples and bananas']

def test_lines_with_all_words_single(tmp_path):
    p = tmp_path / 't.txt'
    p.write_text('I like apples and bananas\nBananas are tasty\n')
    assert lines_with_all_words(str(p), ['bananas']) == ['I like apples and bananas', 'Bananas are tasty']

def test_lines_with_all_words_empty_queries(tmp_path):
    p = tmp_path / 't.txt'
    p.write_text('I like apples and bananas\nBananas are tasty\n')
    assert lines_with_all_words(str(p), []) == []
```
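
<details>
<summary>Possible approach</summary>

One possible solution compares sets: the set of query words must be a subset of the set of words on the line. As a simplifying assumption, this sketch strips only the ASCII punctuation in `string.punctuation`.

```python
import string


def lines_with_all_words(path, queries):
    """Return lines (without trailing newline) that contain every query word."""
    wanted = {q.lower() for q in queries}
    if not wanted:
        return []                           # the problem's stated choice for no queries
    matches = []
    with open(path, "r") as f:
        for line in f:
            words = {tok.strip(string.punctuation).lower() for tok in line.split()}
            if wanted <= words:             # subset test: every query word is present
                matches.append(line.rstrip("\n"))
    return matches
```

</details>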


### Problem 5 — Replace lines in a file safely (hard)
Create `replace_lines_in_file(path, predicate, replacer)` which scans a file line-by-line. When `predicate(line)` is True, replace that line with the string returned by `replacer(line)`. To avoid data loss, perform updates by writing to a temporary file and then atomically replacing the original (use `os.replace` for cross-platform behavior). The function should return the number of replacements made; if none were made it should leave the file unchanged and return 0.

Example: replacing 'blue' with 'azure' in a file containing 'red', 'blue', 'yellow' should update the file to ['red','azure','yellow'] and return 1. Make sure your replacer returns proper newline characters if needed.

pytest snippet:
```python
def test_replace_lines_single(tmp_path):
    p = tmp_path / 'colors1.txt'
    p.write_text('red\nblue\nyellow\n')
    n = replace_lines_in_file(str(p), lambda l: l.strip()=='blue', lambda l: 'azure\n')
    assert n == 1

def test_replace_lines_file_content_after_single(tmp_path):
    p = tmp_path / 'colors1.txt'
    p.write_text('red\nblue\nyellow\n')
    replace_lines_in_file(str(p), lambda l: l.strip()=='blue', lambda l: 'azure\n')
    assert p.read_text().splitlines() == ['red','azure','yellow']

def test_replace_lines_no_matches(tmp_path):
    p2 = tmp_path / 'colors2.txt'
    p2.write_text('red\ngreen\n')
    n2 = replace_lines_in_file(str(p2), lambda l: 'blue' in l, lambda l: 'azure\n')
    assert n2 == 0

def test_replace_lines_file_content_unchanged(tmp_path):
    p2 = tmp_path / 'colors2.txt'
    p2.write_text('red\ngreen\n')
    replace_lines_in_file(str(p2), lambda l: 'blue' in l, lambda l: 'azure\n')
    assert p2.read_text().splitlines() == ['red','green']

def test_replace_lines_multiple(tmp_path):
    p3 = tmp_path / 'colors3.txt'
    p3.write_text('blue\nblue\n')
    n3 = replace_lines_in_file(str(p3), lambda l: l.strip()=='blue', lambda l: 'azure\n')
    assert n3 == 2

def test_replace_lines_file_content_after_multiple(tmp_path):
    p3 = tmp_path / 'colors3.txt'
    p3.write_text('blue\nblue\n')
    replace_lines_in_file(str(p3), lambda l: l.strip()=='blue', lambda l: 'azure\n')
    assert p3.read_text().splitlines() == ['azure','azure']
```
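
<details>
<summary>Possible approach</summary>

A sketch of the temp-file-then-replace pattern described in the problem. It makes a few choices of its own: the temporary file is created with `tempfile.mkstemp` in the same directory as the target, and it is deleted again when nothing was replaced or when an error occurs.

```python
import os
import tempfile


def replace_lines_in_file(path, predicate, replacer):
    """Rewrite matching lines via a temp file; return the replacement count."""
    replaced = 0
    directory = os.path.dirname(os.path.abspath(path))
    # Keep the temp file next to the target so os.replace stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, text=True)
    try:
        with os.fdopen(fd, "w") as out, open(path, "r") as src:
            for line in src:
                if predicate(line):
                    out.write(replacer(line))
                    replaced += 1
                else:
                    out.write(line)
        if replaced:
            os.replace(tmp_path, path)      # swap the new contents into place
        else:
            os.remove(tmp_path)             # nothing changed: leave the original alone
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)             # do not leave a stray temp file behind
        raise
    return replaced
```

Note that `mkstemp` creates a file readable and writable only by its owner, so this sketch does not preserve the original file's permissions. Copying them over is left out to keep the example short.

</details>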

---
Notes:
For safe in-place editing, use `tempfile` and `os.replace`. For word-based matching elsewhere in the quiz, consider regex word boundaries (e.g. `\bword\b`) or splitting tokens on non-word characters, depending on course level.