## Reading and writing files

First, we'll download a text file from the web with a library called `requests` ([documentation](https://docs.python-requests.org/en/latest/)).

In [1]:
import requests

If it's not installed, install it in your activated conda environment with:
```shell
conda install requests
```
<br>

In [2]:
# Request web source.
url = 'https://loremipsum.de/downloads/original.txt'
r = requests.get(url)
# Raise an exception if the server answered with an error status.
r.raise_for_status()

# Store text content in a variable called txt.
txt = r.text

In [3]:
# Inspect txt.
print(type(txt))
print(len(txt), 'characters')

<class 'str'>
3971 characters


In [4]:
# Print first 100 characters.
print(txt[:100])

Lorem ipsum dolor sit amet, consetetur sadipscing elitr,  sed diam nonumy eirmod tempor invidunt ut 


### Writing a file to disk

Writing and reading files is done with a **context manager**, called via the keyword `with`.<br>
Syntax:<br>
```python
with open('path_to_file', 'mode') as file_object:
    file_object.write(content)
```
<br>
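Modes (copied from <code>help(open)</code>):

| Character | Meaning |
|---|---|
| `'r'` | open for reading (default) |
| `'w'` | open for writing, truncating the file first |
| `'x'` | create a new file and open it for writing |
| `'a'` | open for writing, appending to the end of the file if it exists |
| `'b'` | binary mode |
| `'t'` | text mode (default) |
| `'+'` | open a disk file for updating (reading and writing) |

You will probably use `'r'` (read) and `'w'` (write) most of the time.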
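
As a minimal sketch of how two of the write modes differ: `'w'` silently replaces existing content, while `'x'` refuses to touch a file that already exists. The file name `modes_demo.txt` and the temporary directory are assumptions chosen for illustration only.

```python
import os
import tempfile

# Work in a temporary directory so no existing files are touched.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'modes_demo.txt')  # hypothetical file name

    # 'w' creates the file, and truncates it if it already exists.
    with open(path, 'w', encoding='utf-8') as f:
        f.write('first version')
    with open(path, 'w', encoding='utf-8') as f:
        f.write('second')

    with open(path, 'r', encoding='utf-8') as f:
        content = f.read()

    # 'x' only works for new files and raises FileExistsError otherwise.
    try:
        with open(path, 'x', encoding='utf-8') as f:
            f.write('never written')
        x_mode_failed = False
    except FileExistsError:
        x_mode_failed = True

print(content)        # only the text of the second write is left
print(x_mode_failed)
```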

In [5]:
# Open text file.
with open('lorem.txt', 'w') as f:
    # Write content of txt to file.
    f.write(txt)
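
The point of the context manager is that the file is closed automatically as soon as the indented block ends. The following sketch checks this with the file object's `closed` attribute; the file name `demo_closed.txt` and the temporary directory are assumptions for illustration.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'demo_closed.txt')  # hypothetical file name

    with open(path, 'w', encoding='utf-8') as f:
        f.write('some text')
        closed_inside = f.closed   # the file is still open inside the block

    closed_after = f.closed        # the file was closed on leaving the block

print(closed_inside)
print(closed_after)
```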

Sometimes when writing text files it is necessary to specify the `encoding` parameter of `open()`, for example if the `write()` method raises a `UnicodeEncodeError`.

In [6]:
with open('lorem.txt', 'w', encoding='utf-8') as f:
    f.write(txt)
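
To see why the encoding matters, it can help to encode a string by hand. When writing in text mode, the text is converted to bytes using the chosen encoding, and that conversion fails if the encoding cannot represent a character. The sketch below uses `str.encode()` to show the effect without touching the disk; the sample string is an assumption for illustration.

```python
text = 'Grüße'  # sample string containing non-ASCII characters

# UTF-8 can represent every character; ü and ß take two bytes each.
utf8_bytes = text.encode('utf-8')

# ASCII cannot represent ü or ß, so encoding fails.
try:
    text.encode('ascii')
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

print(len(text), 'characters')
print(len(utf8_bytes), 'bytes in UTF-8')
print(ascii_ok)
```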

### Reading a file from disk

In [7]:
# Open text file in reading mode.
with open('lorem.txt', 'r') as f:
    # Read content into variable.
    lorem = f.read()

In [8]:
print(lorem[:100])

Lorem ipsum dolor sit amet, consetetur sadipscing elitr,  sed diam nonumy eirmod tempor invidunt ut 


In [9]:
print(type(lorem))

<class 'str'>


Sometimes it is better to store a text line by line in a list. This is possible with `.readlines()`.

In [10]:
with open('lorem.txt', 'r') as f:
    lorem = f.readlines()

In [11]:
# Inspect variable:
print(type(lorem))
print(len(lorem))

<class 'list'>
7


In [12]:
print(lorem[0])

Lorem ipsum dolor sit amet, consetetur sadipscing elitr,  sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet. Lorem ipsum dolor sit amet, consetetur sadipscing elitr,  sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet. Lorem ipsum dolor sit amet, consetetur sadipscing elitr,  sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet.
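
Note that `.readlines()` keeps the newline character at the end of each line. A small sketch of this, using `io.StringIO` as an in-memory stand-in for an open text file (the sample text is an assumption for illustration), together with a loop over the file object that strips the newlines:

```python
import io

# io.StringIO behaves like an open text file but lives in memory.
fake_file = io.StringIO('first line\nsecond line\nthird line\n')

# readlines() returns one list element per line, newline included.
raw_lines = fake_file.readlines()

# Go back to the start and iterate over the file object directly,
# removing the trailing newline from every line.
fake_file.seek(0)
stripped_lines = [line.rstrip('\n') for line in fake_file]

print(raw_lines)
print(stripped_lines)
```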



### Intermezzo: Removing multiple whitespaces

The text contains runs of several whitespace characters in a row (between "elitr," and "sed" for example). We can collapse each run into a single whitespace with the following code:

In [13]:
# Iterate over lines of lorem:
for index, line in enumerate(lorem):
    # Remove multiple whitespaces.
    cleaned_line = ' '.join(line.split())
    # Overwrite the current element in the list lorem:
    lorem[index] = cleaned_line

In [14]:
print(lorem[0])

Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet. Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet. Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet.


<br>
How does it work?<br>
<code>.split()</code> without any argument splits at every run of whitespace (including tabs and newline characters) and drops the whitespace from the returned list.<br>
After that, all elements are joined together with a single whitespace (<code>' '</code>) in between them.

In [15]:
s = 'A string with some regular and  some    irregular  whitespaces.'

In [16]:
s_list = s.split()
print(s_list)

['A', 'string', 'with', 'some', 'regular', 'and', 'some', 'irregular', 'whitespaces.']


In [17]:
s = ' '.join(s_list)
print(s)

A string with some regular and some irregular whitespaces.
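
The trick only works when `.split()` is called without an argument. As a short sketch of the difference (the sample string is an assumption for illustration): with an explicit separator such as `' '`, every single space is a split point, so repeated spaces produce empty strings and tabs are left alone.

```python
s = 'two  spaces\tand a tab'

# No argument: split at runs of any whitespace, no empty strings.
default_split = s.split()

# Explicit ' ': split at every single space only.
space_split = s.split(' ')

print(default_split)
print(space_split)
```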


Another method is using **regular expressions** (regex). (They are somewhat cryptic, so we won't cover them in detail here. Helpful for figuring out expressions: [pythex](https://pythex.org/) or [regex101](https://regex101.com/).)

In [18]:
import re # regex module

s = 'A string with some regular and  some    irregular  whitespaces.'
print(s)

# Substitute runs of one or more spaces with a single space.
s = re.sub(' +', ' ', s)

print(s)

A string with some regular and  some    irregular  whitespaces.
A string with some regular and some irregular whitespaces.
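
The pattern `' +'` only matches space characters. If tabs and newlines should be collapsed as well, one option is the pattern `\s+`, which matches runs of any whitespace. A minimal sketch with an assumed sample string:

```python
import re

s = 'tabs\tand\nnewlines  and   spaces'

# ' +' collapses runs of spaces but leaves the tab and newline alone.
only_spaces = re.sub(' +', ' ', s)

# r'\s+' collapses runs of any whitespace into a single space.
all_whitespace = re.sub(r'\s+', ' ', s)

print(repr(only_spaces))
print(repr(all_whitespace))
```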


#### Inserting whitespaces

In [19]:
sequence = 'Rose is a rose is a rose'
l = len(sequence)

print(sequence.ljust(l+10))
print(sequence.center(l+10))
print(sequence.rjust(l+10))

Rose is a rose is a rose          
     Rose is a rose is a rose     
          Rose is a rose is a rose
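
The three methods pad a string to the given total width. They also accept an optional second argument, the fill character, which makes the padding visible. A short sketch with an assumed sample string:

```python
sequence = 'rose'
width = len(sequence) + 6  # pad with six extra characters

left = sequence.ljust(width, '.')       # padding on the right
centered = sequence.center(width, '.')  # padding on both sides
right = sequence.rjust(width, '.')      # padding on the left

print(left)
print(centered)
print(right)
```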


### Overwriting a file on disk

In [28]:
with open('lorem.txt', 'w') as f:
    f.write(lorem)

<br>
The cell above raises a <code>TypeError</code>: the variable <code>lorem</code> is still of type <code>list</code>, which we can't write to a file. We have to join it into a <code>str</code> before we can write it to disk.

In [22]:
print(type(lorem))

<class 'list'>


In [23]:
# Join list to one string.
lorem = '\n'.join(lorem) # Insert newline characters between paragraphs.

# If we use 'w' as argument with open on an existing file,
# we will overwrite it.
with open('lorem.txt', 'w') as f:
    f.write(lorem)
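
The whole round trip can be sketched in one self-contained cell: writing a list fails with a `TypeError`, writing the joined string works, and reading the file back returns that string. The list contents, the file name `joined_demo.txt`, and the temporary directory are assumptions for illustration.

```python
import os
import tempfile

lines = ['first paragraph', 'second paragraph']

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'joined_demo.txt')  # hypothetical file name

    with open(path, 'w', encoding='utf-8') as f:
        # write() only accepts a str, so passing a list fails.
        try:
            f.write(lines)
            list_write_failed = False
        except TypeError:
            list_write_failed = True

        # Joining the list into one string first works.
        f.write('\n'.join(lines))

    with open(path, 'r', encoding='utf-8') as f:
        content = f.read()

print(list_write_failed)
print(content)
```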

### Append to a file

With the argument <code>'a'</code> we can append content to an existing file. If the file does not exist yet, it will be created.

In [24]:
# Create a new file and write one line into it.
with open('looped_text.txt', 'w') as f:
    f.write('First line of a new text.\n') # The \n at the end moves to the next line.

In [25]:
with open('looped_text.txt', 'a') as f:
    for e in s.split():
        f.write(e+'\n')
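
To check what append mode does, we can read the file back afterwards. The sketch below opens a file in `'a'` mode twice and then inspects its lines; the file name `append_demo.txt` and the temporary directory are assumptions for illustration.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'append_demo.txt')  # hypothetical file name

    # The file does not exist yet, so 'a' creates it.
    with open(path, 'a', encoding='utf-8') as f:
        f.write('first\n')

    # The second call keeps the existing content and adds to the end.
    with open(path, 'a', encoding='utf-8') as f:
        f.write('second\n')

    with open(path, 'r', encoding='utf-8') as f:
        appended_lines = f.read().splitlines()

print(appended_lines)
```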

### Remove a file

In [26]:
# To remove a file we can use the module os:
import os
os.remove('looped_text.txt')

If we execute the code again, it raises an error, because the file no longer exists:

In [27]:
# To remove a file we can use the module os:
import os
os.remove('looped_text.txt')

FileNotFoundError: [Errno 2] No such file or directory: 'looped_text.txt'
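
One way to avoid this error is to catch the `FileNotFoundError` (checking with `os.path.exists()` first is another option). A minimal sketch, where the file name `to_remove.txt` and the temporary directory are assumptions for illustration:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'to_remove.txt')  # hypothetical file name

    with open(path, 'w', encoding='utf-8') as f:
        f.write('temporary content')

    # The first call removes the file.
    os.remove(path)
    exists_after_first = os.path.exists(path)

    # The second call fails, but the exception is handled.
    try:
        os.remove(path)
        second_call_failed = False
    except FileNotFoundError:
        second_call_failed = True

print(exists_after_first)
print(second_call_failed)
```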