Original text is unavailable. The homework was provided in a web-based PDF viewer without a save button. The PDF format doesn't necessarily store spaces as space characters; sometimes coordinates position the text instead of a run of spaces:
```
<div style="left: 118.04px; top: 317.233px; font-size: 18.4px; font-family: sans-serif; transform: scaleX(0.893967);">Python is a high</div>
<div style="left: 235.6333333333333px; top: 317.2333333333333px; font-size: 18.4px; font-family: sans-serif;">-</div>
<div style="left: 241.233px; top: 317.233px; font-size: 18.4px; font-family: sans-serif; transform: scaleX(0.905979);">level, general</div>
...
<div style="left: 118.04px; top: 429.283px; font-size: 18.4px; font-family: sans-serif; transform: scaleX(0.907185);">one of the       most popular programming languages.</div>
<div style="left: 516.717px; top: 429.283px; font-size: 18.4px; font-family: sans-serif; transform: scaleX(0.900499);">The developers of </div>
...
```
When copy-pasting this text from the browser, several things can happen. Firefox inserts line breaks at the end of DIVs. DIVs are block-level HTML elements by default, so this is probably the correct behaviour.
Brave, on the other hand, ignores these line breaks and copies the text to the clipboard without any.
Firefox variants use the Gecko engine, while Brave is Chromium-based (the Blink engine, which descends from WebKit and KHTML). Browsers within one engine family probably behave similarly, so there is no need to check Chromium, Chrome, etc. as well. Most current browsers use one of these two engines; Safari, which uses WebKit, is the main exception.
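The two copy behaviours can be modelled in a few lines. This is only a simplified sketch: `div_texts` and both helper functions are hypothetical names, and real browsers apply more rules than joining the text of each DIV.

```python
# Text content of three consecutive DIVs from the viewer markup above.
div_texts = ["Python is a high", "-", "level, general"]


def copy_like_firefox(divs: list[str]) -> str:
    # Assumed model: every block-level DIV ends with a line break.
    return "".join(d + "\n" for d in divs)


def copy_like_brave(divs: list[str]) -> str:
    # Assumed model: the DIV texts are concatenated without separators.
    return "".join(divs)


print(repr(copy_like_firefox(div_texts)))
print(repr(copy_like_brave(div_texts)))
```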
Let's try to sanitize this:

In [1]:
# 1. Assign this text to a variable
firefox_text = """Python is a high
-
level, general
-
purpose     programming language. Its design philosophy emphasizes 
code readability. Guido van Rossum, a Dutch       computer 
programmer, began working on    Python     
in the late 1980s as a successor to the ABC programming language, and first released      it as Python 
0.9.0 in 1991. Python 2.0 was released in 2000, and Python 3.0, released in 2008, was a major 
revision not com
pletely    backward
-
compatible     with earlier versions. Python consistently ranks as 
one of the       most popular programming languages.
The developers of 
Python     aim for it to be fun 
to use. This is reflected in its name, a tribute to the      Briti
sh comedy group Monty Python, and in 
occasionally playful approaches to tutorials       and reference materials.
"""
# this time triple quotes/multiline strings are the most convenient

In [2]:
brave_text = "Python is a high-level, general-purpose     programming language. Its design philosophy emphasizes code readability. Guido van Rossum, a Dutch       computer programmer, began working on    Python     in the late 1980s as a successor to the ABC programming language, and first released      it as Python 0.9.0 in 1991. Python 2.0 was released in 2000, and Python 3.0, released in 2008, was a major revision not completely    backward-compatible     with earlier versions. Python consistently ranks as one of the       most popular programming languages.The developers of Python     aim for it to be fun to use. This is reflected in its name, a tribute to the      British comedy group Monty Python, and in occasionally playful approaches to tutorials       and reference materials."

In [3]:
# 2. Break the text into multiple program lines in the cell, so that we don’t need to scroll the cell.
# brave_text above is indeed ugly, let's correct that here:
brave_text = "Python is a high-level, general-purpose     programming language. \
Its design philosophy emphasizes code readability. Guido van Rossum, a Dutch       computer programmer, began \
working on    Python     in the late 1980s as a successor to the ABC programming language, and first \
released      it as Python 0.9.0 in 1991. Python 2.0 was released in 2000, and Python 3.0, released \
in 2008, was a major revision not completely    backward-compatible     with earlier versions. Python \
consistently ranks as one of the       most popular programming languages.The developers of Python     \
aim for it to be fun to use. This is reflected in its name, a tribute to the      British comedy group \
Monty Python, and in occasionally playful approaches to tutorials       and reference materials."
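
An alternative to backslash continuations is to put adjacent string literals inside parentheses; Python concatenates them into a single string. A short sketch on a shortened sample (`sample_long` and `sample_wrapped` are illustrative names, not part of the assignment):

```python
sample_long = "Python is a high-level, general-purpose     programming language. Its design philosophy emphasizes code readability."

# Adjacent literals inside parentheses are concatenated into one string.
# Spaces must be written explicitly at the end (or start) of each piece.
sample_wrapped = (
    "Python is a high-level, general-purpose     programming language. "
    "Its design philosophy emphasizes code readability."
)

assert sample_wrapped == sample_long
```

Unlike a backslash, this form tolerates trailing whitespace after a line and allows comments between the pieces.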

In [4]:
# Before continuing the assignment, let's clean up the Firefox variant as well (remove white space), and then check if they match:
firefox_text = firefox_text.replace('\n', '')
assert firefox_text == brave_text
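
If such an assertion fails, a bare `AssertionError` does not say where the two strings diverge. A small helper can locate the first difference; `first_mismatch` is a hypothetical name, and the two sample strings are made up for illustration:

```python
def first_mismatch(a: str, b: str) -> int:
    """Return the index of the first differing character, or -1 if equal."""
    for i, (ca, cb) in enumerate(zip(a, b)):
        if ca != cb:
            return i
    # One string may be a prefix of the other.
    return -1 if len(a) == len(b) else min(len(a), len(b))


left = "languages.The developers"
right = "languages. The developers"
pos = first_mismatch(left, right)
print(pos, repr(left[pos:pos + 10]), repr(right[pos:pos + 10]))
```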

In [5]:
# Nice, they seem to match.
# 3. Remove all the unnecessary spaces from the text (full trim).

# Let's write a simple function for removing spaces:
def true_trim(s: str) -> str:
    return ' '.join(s.split())


# We can use it to clean up both strings (they should lead to the same result, they have the same content)
firefox_text = true_trim(firefox_text)
brave_text = true_trim(brave_text)
assert firefox_text == brave_text
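
`true_trim` relies on `str.split()` being called without arguments: it then splits on runs of any whitespace and drops the empty strings at both ends. A small illustration on a made-up sample string:

```python
sample = "  a Dutch       computer \n programmer  "

# With no argument, split() treats any run of whitespace as one separator.
print(sample.split())
# With an explicit ' ', every single space is a separator, leaving '' items.
print(sample.split(' ')[:4])
# Joining the clean pieces with one space gives the fully trimmed text.
print(' '.join(sample.split()))
```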

In [6]:
# Nice! Now pick one and call it text; from now on one copy is enough. (Deleting the other variable is not important, it's a small text.)
text = firefox_text
# 4. Replace all “Python” words to upper case.
word_to_replace = "Python"
text = text.replace(word_to_replace, word_to_replace.upper())
# 5. Within version numbers, change points (.) to underscores (_).
# This is a bit trickier, because '.' can be a full stop at the end of a sentence, or part of a version number. A simple replace won't be enough.
# The correct solution would probably be re.sub, regular expressions could handle this easily. If a digit follows the '.', it's part of a
# version number, so \.(\d) could be replaced by _\1. First, let's do this without regexp; a loop can handle this situation.
text_backup = text
idx = 0
while (idx := text.find('.', idx)) != -1:
    if idx + 1 < len(text) and text[idx + 1].isdigit():
        text = text[0: idx] + '_' + text[idx + 1:]  # strings are immutable, so either do it this way, or convert the str to a list, then convert it back
    idx += 1  # continue after this '.'; skipping 2 characters could miss a '.' that directly follows another '.'

# just for the record:
import re
text_regexp = re.sub(r'\.(\d)', r'_\1', text_backup)  # IMHO a lot nicer
assert text_regexp == text
# 6. Insert line breaks after the end of each sentence.
# Let's use re this time. The end of a sentence can be '.' or '?' or '!' with an unknown (after cleanup 0 or 1) number of spaces following it
# e.g. "languages.The developers" has no space, the PDF "inserted" it by positioning the sentence slightly to the right.
text = re.sub(r'([.?!])[ ]*', r'\1\n', text)
# 7. Write out the cleaned text by print().
print(text)

PYTHON is a high-level, general-purpose programming language.
Its design philosophy emphasizes code readability.
Guido van Rossum, a Dutch computer programmer, began working on PYTHON in the late 1980s as a successor to the ABC programming language, and first released it as PYTHON 0_9_0 in 1991.
PYTHON 2_0 was released in 2000, and PYTHON 3_0, released in 2008, was a major revision not completely backward-compatible with earlier versions.
PYTHON consistently ranks as one of the most popular programming languages.
The developers of PYTHON aim for it to be fun to use.
This is reflected in its name, a tribute to the British comedy group Monty PYTHON, and in occasionally playful approaches to tutorials and reference materials.
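
The comment in step 5 mentions converting the string to a list as an alternative to slicing. A minimal sketch of that variant on a shortened sample; `dots_to_underscores` is a hypothetical helper name:

```python
def dots_to_underscores(s: str) -> str:
    chars = list(s)  # lists are mutable, so characters can be replaced in place
    for i in range(len(chars) - 1):
        # A '.' directly followed by a digit is treated as part of a version number.
        if chars[i] == '.' and chars[i + 1].isdigit():
            chars[i] = '_'
    return ''.join(chars)


print(dots_to_underscores("released it as PYTHON 0.9.0 in 1991. PYTHON 2.0 was released in 2000."))
```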



In [11]:
# 8. Find the position of the very first release of Python in the text, then use the slicing 
# to write out only that part of the text which contains the name of the first release version 
# and the year.

# Finding the position could be done in several ways.
# * By hand. This is bad, especially if we need to clean up similar texts by program.
# * By using loops and the find method like above. Unfortunately we don't know what version to look for directly with str.find, so that's not enough.
# * By using regular expressions. Versions tend to have 2 or 3 numbers separated by a '.' (now replaced by '_').
# The release year is 4 digits. Python didn't run on the Antikythera mechanism.

if m := re.search(r'\d+_\d+.*\d{4}', text):
    start_index = m.start()
    end_index = m.end()
    print(text[start_index:end_index])  # same as m.group(), but the assignment asked for the slicing operator
else:
    print("Not found")

0_9_0 in 1991
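
The pattern above relies on the greedy `.*` backtracking to the last four-digit number on the line. A slightly more explicit sketch captures the version and the year separately with named groups; the group names and the shortened `sample` string are assumptions made for illustration:

```python
import re

sample = "first released it as PYTHON 0_9_0 in 1991.\nPYTHON 2_0 was released in 2000."

# version: digits joined by underscores; year: the next 4-digit number after it.
pattern = re.compile(r'(?P<version>\d+(?:_\d+)+)\D+?(?P<year>\d{4})')

if m := pattern.search(sample):
    print(m.group('version'), m.group('year'))
    # Slicing from the start of the version to the end of the year.
    print(sample[m.start('version'):m.end('year')])
```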


In [8]:
# 9. Print a final message to the user. 

print("KTHXBYE")

KTHXBYE
