Lab 3 - Part 5: Reading from and writing to files
========================================

In this lab you will practice reading from and writing to files.

You will be given two functions that you will run to complete two different tasks. You are not supposed to change the functions, but you need to import one module from the Python standard library that is used in one of the functions. 

The two functions might use Python functionality that you are not yet familiar with, so it is also an exercise in understanding Python code.

The three tasks are I: Read from files, II: Write to files, and III: Read from file and filter content.
There are three code blocks at the end of this notebook, where you can write the code.

Task I: Read from files
------------------------
1) Write code that reads the two text files located in `/kaggle/input/io-lab-text-files`. In the code block below, there is example code which prints out the names of the files in `/kaggle/input/io-lab-text-files`. Instead of just printing the file names, you should open and read the files. Use the syntax `with open("file_name.txt", encoding="utf-8") as my_file:`.

**What Python keywords are you using in your code for opening a file?**

Append the texts that you have read from the files to the list `text_list`. I.e., the list should contain two strings, one with the text from the file `intro.txt` and one with the text in the file `history.txt`.

To check that it works, you can, e.g. print the content of `text_list`.

2) Try to understand as much as possible from the `get_nonsense_text_modifications` function. To be able to run it, you must import a module. Add this import, after `import os` below. 

**What module do you import?**

**What is the expected type for the parameter to the `get_nonsense_text_modifications` function?**

**What does the function return?**

**What type does `" ".join(modified_text_list)` create?**

3) Run the code for task I.

4) The `get_nonsense_text_modifications` function has two return values.  

**How can you see that it returns two values?** 
**What is the syntax used for storing the results from a function returning multiple values?** (Note that you can also see it as a tuple, and treat it as a tuple.)

5) **The second return value is the total number of upper-case letters in all modified strings. How many are there?**

6) The first return value is a list containing the modified strings. You need this list for task II.
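
The multiple-return pattern from steps 4 to 6 can be sketched in isolation as follows. The function `count_words_and_chars` and its input string are made up for this illustration and are not part of the lab code; the point is only to show that a comma in the `return` statement builds a tuple, which the caller can either unpack or keep whole.

```python
def count_words_and_chars(text):
    """Hypothetical helper: return the word count and the character count."""
    words = text.split()
    return len(words), len(text)  # the comma builds a tuple of two values


# Unpack the two return values into two variables
nr_of_words, nr_of_chars = count_words_and_chars("reading and writing files")

# Or keep the result as one tuple and index into it
result = count_words_and_chars("reading and writing files")
print(type(result).__name__, result[0], result[1])  # prints: tuple 4 25
```

The call in the task I code block, `modified_text_list, nr_of_upper = get_nonsense_text_modifications(text_list)`, uses the first of these two styles.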


Task II: Write to files
-------------------

7) Write code that creates a directory where you will store text files. Use the content in the variable `output_dir` for naming the directory. Before creating it, check that a directory with that name  does not already exist. (If it already exists, then don't create it.) Use functionality in [https://docs.python.org/3/library/os.path.html](https://docs.python.org/3/library/os.path.html) and [https://docs.python.org/3/library/os.html](https://docs.python.org/3/library/os.html). **What functionality in `os` and `os.path` are you using?**

8) Manually check that the code works, i.e. check that the directory that you aimed to create exists. (If you have not specified any other location, the default is that the directory will be created under `/kaggle/working`, which you find under `Output` to the right.)

9) Take the list of strings that was returned from the first task, i.e. the list stored in `modified_text_list`. For each string in the list, write it to a separate txt-file in the `output_dir` directory. 
Note: The files must have the suffix ".txt".

As filenames, you can, for instance, use numbers, so that the first filename is written to "my_output_dir/0.txt" and the second to "my_output_dir/1.txt". 

Again, use the `with open...` syntax. **`open` is a built-in function. It has one mandatory argument, and many optional arguments. What is the mandatory argument? To write to a file, you need to use one of the optional arguments. Which? What are the other possible values you can use for this optional argument?**

The filename that you use as argument to `open` needs to include the directory name as well, i.e. it needs to be "my_output_dir/0.txt" and not only "0.txt". Create the filename-string which includes the directory name by using `os.path.join()`
**What does `os.path.join` do?**
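
As a small illustration of the filename construction described in step 9, the sketch below only builds the path strings and prints them; it does not create any files. The directory name is the one used in the lab, and the range of two indices is an assumption matching the two input files.

```python
import os

output_dir = "my_output_dir"  # same directory name as in the lab

# Build one path per file index. os.path.join inserts the path separator
# used by the current operating system (available as os.sep).
file_paths = [os.path.join(output_dir, str(i) + ".txt") for i in range(2)]
print(file_paths)
```

On Kaggle, which runs Linux, the separator is `/`, so the printed paths have the form "my_output_dir/0.txt".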

10) Manually check that the two files have been created in the directory.

11) With the `get_size_of_text_files_in_folders` function you will find out the total size of the text files in `/kaggle/input/io-lab-text-files` and in your newly created directory. 

Try to understand what happens in `get_size_of_text_files_in_folders`. **What is the function of glob?**

12) **Record the total file size for each of the two folders. Which one is the largest?**
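
To experiment with pattern matching and file sizes without touching the lab folders, a minimal sketch along the following lines may help. It works in a temporary directory, and the file names and contents are invented for the illustration. It follows the same idea as `get_size_of_text_files_in_folders`: match files with a wildcard pattern, then sum their sizes in bytes.

```python
import glob
import os
import tempfile

# Work in a throwaway directory that is deleted when the with-block ends
with tempfile.TemporaryDirectory() as tmp_dir:
    samples = [("a.txt", "abc"), ("b.txt", "de"), ("notes.md", "ignored")]
    for name, content in samples:
        path = os.path.join(tmp_dir, name)
        with open(path, mode="w", encoding="utf-8") as out_file:
            out_file.write(content)

    # glob.glob returns the paths that match the wildcard pattern;
    # the order is not guaranteed, so the result is sorted here
    txt_paths = sorted(glob.glob(os.path.join(tmp_dir, "*.txt")))
    txt_names = [os.path.basename(p) for p in txt_paths]

    # os.stat(...).st_size is the file size in bytes
    total_size = sum(os.stat(p).st_size for p in txt_paths)

print(txt_names, total_size)  # prints: ['a.txt', 'b.txt'] 5
```

The file "notes.md" is written but not counted, because it does not match the pattern `*.txt`.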

Task III: Read from file and filter content
-------------------------------------------
13) Read the content from the file "sustainability_termlist_2022.txt", line-by-line, and append all rows that start with *enTE* to the list `list_with_enTE`. Before adding the content of the row, remove the starting "enTE " and the newline at the end. You will then have a list of English terms.

14) There are at least two ways for reading from a file object, line by line. You could either use a for loop or a method. 

**What is the name of the method?**
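
The two approaches mentioned in step 14 can be compared on a small sample file, as in the sketch below. The sample lines, including the "svTE" prefix, are invented for this illustration and only assume that each row starts with a prefix followed by a space; the real term list may look different. The method-based variant uses `readline()`, which returns an empty string at the end of the file.

```python
import os
import tempfile

prefix = "enTE "

with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = os.path.join(tmp_dir, "sample_terms.txt")  # hypothetical file
    with open(sample_path, mode="w", encoding="utf-8") as out_file:
        out_file.write("enTE resilience\nsvTE resiliens\nenTE tipping point\n")

    # Variant 1: iterate over the file object with a for loop
    terms_from_loop = []
    with open(sample_path, encoding="utf-8") as in_file:
        for line in in_file:
            if line.startswith(prefix):
                terms_from_loop.append(line[len(prefix):].rstrip("\n"))

    # Variant 2: call readline() until it returns an empty string
    terms_from_readline = []
    with open(sample_path, encoding="utf-8") as in_file:
        line = in_file.readline()
        while line:
            if line.startswith(prefix):
                terms_from_readline.append(line[len(prefix):].rstrip("\n"))
            line = in_file.readline()

print(terms_from_loop)  # prints: ['resilience', 'tipping point']
```

Both variants read one line at a time and produce the same list; the for loop is usually the shorter of the two to write.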

In [27]:
import os
import glob
import string

#####
# Two functions to use in the lab
##### 

# NOTE: To be able to run this function, you need to import one module from the
# Python standard library (it's part of the exercise to see which module)
def get_nonsense_text_modifications(text_list):
    
    # Example of a doc-string
    """Makes a nonsense-modification of a list of strings, 
    and counts the total number of upper-case letters in all 
    modified strings in the list

    Parameters
    ----------
    text_list : list
        A list of strings

    Returns
    -------
    list
        a list of modified strings 
    int
        the number of upper-case letters in all 
        modified strings
    """
    
    # Example of 'assert'
    assert isinstance(text_list, list), "The argument to the function must be a list"
    
    # Example of list comprehension
    modified_text_list = ["nonsense-modified-text: \n" + string.capwords(txt) for txt in text_list]
    
    # Example of how to use 'join'
    nr_of_upper = 0
    for ch in " ".join(modified_text_list):
        if ch.isupper():
            nr_of_upper += 1 # Example of augmented assignment
    
    # Example of how to return multiple values
    return modified_text_list, nr_of_upper
    
# NOTE: To be able to run this function, you need to import the 'os' and 'glob' modules
def get_size_of_text_files_in_folders(folders):
    
    # Example of a doc-string
    """Takes a list of folder paths. For each folder: take all the '.txt'-files 
    in the folder and compute the total size of these in bytes. 
    Return a list of sizes corresponding to the list of folders given as a parameter

    Parameters
    ----------
    folders : list
        A list of strings, each string being a path to a folder

    Returns
    -------
    list
        A list of ints, each the total size in bytes of the .txt files in the corresponding folder
    """
    
    # Example of 'assert'
    assert isinstance(folders, list), "The argument to the function must be a list"
    
    sizes = []
    for folder in folders:
        text_files = glob.glob(os.path.join(folder, "*.txt"))
        
        # Example of using sum of list and list comprehension
        sizes.append(sum([os.stat(f).st_size for f in text_files]))
    return sizes

#########
#########

In [28]:
# Task I: Read from files

text_list = []
        
# YOUR CODE FOR READING FROM FILE
# - read the text content of each file in '/kaggle/input/io-lab-text-files' 
# and add it as an element to text_list.

for filename in glob.glob("/kaggle/input/io-lab-text-files/*.txt"):
    print(filename)
    with open(filename, encoding="utf-8") as f:
        text_list.append(f.read())

# Do nonsense modifications
modified_text_list, nr_of_upper = get_nonsense_text_modifications(text_list) 
print("Nr of upper case letters after nonsense modification:", nr_of_upper)



/kaggle/input/io-lab-text-files/intro.txt
/kaggle/input/io-lab-text-files/history.txt
Nr of upper case letters after nonsense modification: 467


In [29]:
# Task II: Write to files in output_dir

output_dir = 'my_output_dir'
# YOUR CODE FOR CREATING A DIRECTORY AND WRITING TO FILES
# - create a directory with the name stored in the variable output_dir
# - for each text-string stored in the variable modified_text_list: 
#     write it to a ".txt"-file in output_dir
if not os.path.exists(output_dir):
    os.makedirs(output_dir)
for i, text in enumerate(modified_text_list):
    with open(os.path.join(output_dir, str(i) + '.txt'), mode='w', encoding='utf-8') as of:
        of.write(text)

folders_to_check = ['/kaggle/input/io-lab-text-files', output_dir]
byte_size_list = get_size_of_text_files_in_folders(folders_to_check)
for folder_name, byte_size in zip(folders_to_check, byte_size_list):
    print("Total size stored in ", folder_name, ":", byte_size)

Total size stored in  /kaggle/input/io-lab-text-files : 3549
Total size stored in  my_output_dir : 3582


In [30]:
# Task III: Read from file and filter content
file_name = '/kaggle/input/sustainability-term-list-2022/sustainability_termlist_2022.txt'

list_with_enTE = []

# Open file for reading, and read it line by line.
# If a line starts with 'enTE', add to the list 'list_with_enTE'
# i.e. create a list containing all the English terms. 
# (Remove enTE and newline before adding the term to the list)
with open(file_name, encoding='utf-8') as f:
    for line in f:
        if line.startswith('enTE'):
            list_with_enTE.append(line[4:].strip())

list_with_enTE            

['biological diversity',
 'ecosystem service',
 'resilience',
 'tipping point',
 'weather attribution',
 'tipping element',
 'feedback mechanism',
 'carbon neutrality',
 'decarbonization',
 'sustainable development']

Extra
------
Instead of using functionality in `os.path` to check that the directory you plan to create doesn't exist, solve it with the help of exceptions.
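
One possible shape of such a solution is sketched below; it is a sketch under stated assumptions, not the only way to solve the extra task. It relies on `os.mkdir` raising `FileExistsError` when the directory is already there. To keep the example from creating anything in your working directory, it runs inside a temporary directory, and the list `already_existed` is a helper added only to make the outcome visible.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    output_dir = os.path.join(tmp_dir, "my_output_dir")

    already_existed = []  # illustration only: records what happened on each attempt
    for _ in range(2):
        try:
            os.mkdir(output_dir)  # raises FileExistsError on the second attempt
            already_existed.append(False)
        except FileExistsError:
            # The directory is already there, so there is nothing to create
            already_existed.append(True)

    dir_exists = os.path.isdir(output_dir)

print(already_existed, dir_exists)  # prints: [False, True] True
```

A related shortcut is `os.makedirs(output_dir, exist_ok=True)`, which suppresses the error internally, although it does not practise writing a `try`/`except` block yourself.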