# Working with external files
So far, all the data and code that we have been using were contained in the same script we were writing. But that will almost never be the case. At the very least, we will need to access external data (musical scores, audio recordings, annotations, etc.) stored in different files. We will also probably want to use someone else's code in our own. So let's see how this can be done.

## Accessing external text files
By default, Python comes with functions for accessing text files. To work with other types of files, we will need extra code that has already been written and is available online, but is not "pre-installed" in Python. In this notebook we will just look at these default functions.

We are going to be working with the file `WhatAWonderfulWorld-lyrics.txt`, contained in the `files` folder that you downloaded together with this notebook. Before you start working with it, open it and take a look. You can open it directly in Jupyter, just by double-clicking on it in the Jupyter Dashboard (the Dashboard is the page that opens in your browser when you run the `jupyter-notebook` command in Conda, with the URL `http://localhost:8888/`). There is a common saying in data-driven research: before you start working with your data, "take a look at your data."

⇒ **Note**: `WhatAWonderfulWorld-lyrics.txt` is a **plain text** file. If you open it with an advanced text editor, the editor might apply some formatting to it. So when closing the file, be careful not to save it as a rich text file; Python cannot deal with those. Make sure that you don't modify the file format. If you want to be extra sure, open a copy of the file. In any case, the safest way is to open the file directly in Jupyter.

As you see, this file contains the lyrics of the classic tune "[What a wonderful world](https://en.wikipedia.org/wiki/What_a_Wonderful_World)." Let's work with it!

The first requirement for accessing an external file is "telling" Python where it is located and what it is called. That is, telling Python the ***path*** to the file and the ***file name***.

The file name is just the name of the file including the extension, in this case: `WhatAWonderfulWorld-lyrics.txt`

The path to the file can be given in two ways:
- **absolute path**: the full path, starting from the root of the drive.
    - In Windows, absolute paths start with the drive letter, followed by a colon ( `:` ) and then the rest of the path separated by **forward slashes** `/` (not backslashes). So, they should look something like: `C:/Users/MyUser/Documents/MyFolder/`.
    - In macOS, they start simply with a **forward slash**, generally followed by `Users` (sometimes it can be `Home`). They should look something like: `/Users/MyUser/Documents/My Folder/`

    ⇒ **Note**: Even though on your computer you might see the names of system folders in your language (`Benutzer`, `Dokumente`), both Windows and macOS name them internally in English.
    

- **relative path**: is a path relative to the location of the Python script from which the target file will be accessed. For example, let's assume that this Notebook is saved in (using a Windows path)

            C:/Users/MyUser/Documents/CompMethEthno/notebooks/

    If I want to access a file saved in
            
            C:/Users/MyUser/Documents/CompMethEthno/notebooks/files/

    that is, a folder that is saved in the same folder as the notebook, I don't need to give the full path of the file starting from `C:/...`, just the relative position with respect to the script. Therefore:    

    - If the relative path starts in the same folder where the script (in our case, our notebook) is saved, we start with `./`. This single period means "the folder where I am now". Looked at in a different way, `./` replaces the part of the path that is common to the current location of the script and the file we want to access. So, for the previous example, `./` is the same as `C:/Users/MyUser/Documents/CompMethEthno/notebooks/`. Therefore, the relative path we need to give Python would be
    
            ./files
            
        And if it were the case, we could give consecutive subfolders, like `./files/examples/lyrics/`.
        
    - If we need to go up through parent folders to get to our target, the double period `../` can be used for every step up. For example, imagine that our script is saved in
    
            C:/Users/MyUser/Documents/CompMethEthno/notebooks/
           
     The file we want to access is saved in
     
            C:/Users/MyUser/Documents/files/
     
     That means that we need to go two levels up until reaching the folder that is shared by the two paths (`C:/Users/MyUser/Documents/`), and then, from there, enter `files/`. So, the relative path would be
     
             ../../files

Absolute paths are less preferable since, when you share code, you can't know which OS or folder structure the new user will have. It is good practice to share your data together with the code, so that you can control the paths to all the required data. If that is not possible, you should let users know that they have to provide correct paths themselves.
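
To check how a relative path resolves, the standard library can do the bookkeeping for us. The following is only a minimal sketch, assuming the example folders used above (they are illustrative and do not need to exist on your computer). It uses `posixpath`, the forward-slash flavour of `os.path`, so the result should be the same on any operating system.

```python
import posixpath

# Example location of the notebook (illustrative, taken from the text above)
notebook_dir = "C:/Users/MyUser/Documents/CompMethEthno/notebooks"

# "./files" stays inside the notebook folder
same_folder = posixpath.normpath(posixpath.join(notebook_dir, "./files"))

# "../../files" climbs two folders up before entering "files"
two_levels_up = posixpath.normpath(posixpath.join(notebook_dir, "../../files"))

print(same_folder)
print(two_levels_up)
```

The first printed path should end in `notebooks/files`, and the second should be `C:/Users/MyUser/Documents/files`, matching the two cases described above.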

So now, let's work with our `WhatAWonderfulWorld-lyrics.txt` file.

In [1]:
# If you downloaded the files folder together with the notebook, the relative path to WhatAWonderfulWorld-lyrics.txt is:
path = "./files/"  # We save the path in a variable, in case we want to retrieve different files from the same folder

# The name of the file
fileName = "WhatAWonderfulWorld-lyrics.txt"

# Now we join the path and the file name in the variable fn
fn = path + fileName

# Just to check that it is correct, print it out
print(fn)

./files/WhatAWonderfulWorld-lyrics.txt


Now that we have the path and file name joined in the variable `fn`, ready to use, let's open the file. The most common way of doing that is as follows:

In [2]:
with open(fn, 'r') as f:
    lyrics = f.readlines()

The `open()` function takes two parameters: first, the file name together with the path, and then a letter indicating if we want to open the file just to read it (`'r'`), or for writing it from scratch (`'w'`), or to append new lines (`'a'`). In this case, we just want to read it, therefore, `'r'`.

What `open()` does is create a handle to the file. Using the `with` statement together with `as`, this handle is saved in the variable `f`. In the next line, we use the `.readlines()` method to read all the lines from the file and save them in the variable `lyrics`.

You don't really need to understand this process. If you ever want to open a file to save its content in a variable, just use the code in the cell above, give a correct `path` + `file name` as `fn`, and save the data in the variable you want (in this case `lyrics`, but you can use the name you want).
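
If you are curious about what the handle does, here is a small self-contained sketch. It does not use the lyrics file: it first writes a hypothetical two-line file (`demo.txt`, a name made up for this example) in a temporary folder, then reads it back with the same pattern as the cell above. The `encoding="utf-8"` argument is an optional extra that I am assuming is appropriate for this demo.

```python
import os
import tempfile

# Work in a temporary folder that is deleted automatically afterwards
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_fn = os.path.join(tmp_dir, "demo.txt")  # hypothetical file name

    # Create a tiny two-line file to read back
    with open(demo_fn, "w", encoding="utf-8") as f:
        f.write("first line\nsecond line")

    # Same pattern as above: open for reading and retrieve all the lines
    with open(demo_fn, "r", encoding="utf-8") as f:
        demo_lines = f.readlines()
        open_inside = not f.closed   # the handle is still open in the block

    closed_after = f.closed          # the handle is closed once the block ends

print(demo_lines)
print(open_inside, closed_after)
```

The handle is open only inside the indented `with` block, while the list of lines remains available afterwards.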

Now, let's see exactly what we have in the variable `lyrics`, just by calling it

In [3]:
lyrics

['I see trees of green,\n',
 'red roses too,\n',
 'I see them bloom,\n',
 'for me and you.\n',
 'And I think to myself:\n',
 'what a wonderful world!\n',
 '\n',
 'I see skies of blue\n',
 'and clouds of white,\n',
 'the bright blessed day,\n',
 'the dark sacred night.\n',
 'And I think to myself:\n',
 'what a wonderful world!\n',
 '\n',
 'The colors of the rainbow,\n',
 'so pretty in the sky,\n',
 'are also on the faces\n',
 'of people going by.\n',
 'I see friends shaking hands,\n',
 'saying: "How do you do?"\n',
 "They're really saying:\n",
 '"I love you!"\n',
 '\n',
 'I hear babies crying,\n',
 'I watch them grow,\n',
 "they'll learn much more\n",
 "than I'll never know.\n",
 'And I think to myself:\n',
 'what a wonderful world!\n',
 'Yes, I think to myself:\n',
 'what a wonderful world!']

As you see, the content of the file has been saved as a `list`. This can be known from the square brackets (`[  ]`) enclosing all the lines. Then, each line in the original file is retrieved as an independent `string` in this list.

Notice that each string in the list (which, remember, corresponds to a line in the original file) ends with a `\n`. This is the escape sequence for new lines (escape sequences were introduced in Notebook 02). When we write a text file, every time we press `Enter` to finish a paragraph, a `\n` is automatically created (and hidden from us). You can see that there are strings that consist only of `'\n'`. These are the empty lines that we add for better readability in our text files.
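
If the trailing `\n` gets in the way, it can be removed with the string method `.rstrip()`. The sketch below uses a short hand-typed sample (four strings copied from the list above, stored in a variable I named `sample_lines`) rather than the full `lyrics` list, and also skips the lines that are empty.

```python
# A few lines copied from the output above, including an empty line
sample_lines = ['I see trees of green,\n', 'red roses too,\n', '\n', 'what a wonderful world!']

clean_lines = []
for line in sample_lines:
    # Remove the new line escape sequence at the end, if there is one
    clean = line.rstrip('\n')
    # Keep only the lines that still have some content
    if clean != '':
        clean_lines.append(clean)

print(clean_lines)
```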

So, as with any list, we can access its content with indexing and slicing:

In [4]:
# First occurrence of "What a wonderful world!"
print(lyrics[5])

# First verse
print(lyrics[:6]) # Remember that the slice of a list is another list
print()

# Last verse
print(lyrics[-8:])

what a wonderful world!

['I see trees of green,\n', 'red roses too,\n', 'I see them bloom,\n', 'for me and you.\n', 'And I think to myself:\n', 'what a wonderful world!\n']

['I hear babies crying,\n', 'I watch them grow,\n', "they'll learn much more\n", "than I'll never know.\n", 'And I think to myself:\n', 'what a wonderful world!\n', 'Yes, I think to myself:\n', 'what a wonderful world!']


Great! So now we are ready to start working with these data.

## Importing modules
I wrote several functions for analysing lyrics. Since I want to use these functions in several projects, I saved them all in a single Python file (that is, a file with the extension `.py`). It is called `lyricsAnalysis.py`, and you downloaded it together with the notebook.

As always, the first thing to do is open it and "take a look at your data." (Well, in this case it's not data, but code, but still...) You can open it directly in Jupyter by double-clicking on it in the Jupyter Dashboard. Or you can also open it with a text editor, but remember to save it in the same `.py` format.

You will see that the `lyricsAnalysis.py` file contains, first, three variables that will be used later in the functions and, most importantly, four functions for performing different analyses. All of them are well documented, both with docstrings and comments. So have a look at them and see if you understand what they do and how they can be used. (If you see that any of those functions uses a method that is new to you, in the last notebook you learnt how to get information about it.)

This file is different from the previous text file. The `WhatAWonderfulWorld-lyrics.txt` file contains data that we want to work on. However, the `lyricsAnalysis.py` file contains code that we want to use in our script to work on the data retrieved from `WhatAWonderfulWorld-lyrics.txt`. If we use the `open()` function, Python will consider the `lyricsAnalysis.py` file as a standard text file, and we will get a list of strings, one string per line. That is not what we want. We want to have the functions ready to use. Therefore, we need to use a different way.

These files containing code that we can import to our own code are called **modules**. And the way to import them is by using an `import` statement.

⇒ **Note**: the great majority of, let's say, "professional" modules have to be installed. In fact, music21, the software that we will be using for analysing music scores, is such a module (or better, group of modules). However, for single files like our `lyricsAnalysis.py` file, the only requirement for them to be imported is that **they have to be in the same folder as our code**. That is, `import` does not allow paths. So, make sure that `lyricsAnalysis.py` is saved in the same folder as this notebook.

In [5]:
# The extension .py has to be omitted

import lyricsAnalysis

So now, all the functions contained in `lyricsAnalysis.py` are ready to be used in our code!

To call functions from an imported module, we need to give the name of the module first, followed by period ( `.` ) and then the name of the function.
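
The same `module.function()` pattern works for any module. As a quick illustration that does not depend on our own file, here is a sketch with `math`, a module that comes with Python:

```python
import math

# Module name, then a period, then the function name
root = math.sqrt(16)
print(root)

# Without the module name, Python does not know the function
try:
    sqrt(16)
except NameError as error:
    print("Calling sqrt() without the module name fails:", error)
```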

Let's start by using the function `countWords()` contained in the `lyricsAnalysis.py` file. And let's start by reading its documentation:

In [None]:
lyricsAnalysis.countWords?
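
The question mark syntax is a Jupyter/IPython feature. In plain Python, the built-in `help()` function shows the same docstring. The sketch below uses a hypothetical function, `count_items()`, written only for this example (it is not part of `lyricsAnalysis.py`):

```python
def count_items(text, separator=","):
    """Return the number of items in `text` separated by `separator`."""
    return len(text.split(separator))

# Plain Python alternative to `count_items?`
help(count_items)

# The docstring itself is stored in the __doc__ attribute
print(count_items.__doc__)
```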

So, now that we know how to use it, let's count how many words are contained in the lyrics of "What a wonderful world":

In [None]:
# Since the countWords() function counts the words of single lines, we need to keep adding the number of words per line.
# So, first, create a word counter
word_counter = 0

# Iterate over the lines in the lyrics
for line in lyrics:
    # Call the countWords function for each line
    line_words = lyricsAnalysis.countWords(line)
    # Update the word counter with the number of words in this line
    word_counter += line_words

# Print the results
print("These lyrics have {} words.".format(word_counter))

Excellent!

So, every time we want to use a function from `lyricsAnalysis.py` we need to write the name of the module. And this can become a bit tiresome. Luckily, `import` allows us to give a "nickname" to our imported modules, in the following way:

In [None]:
import lyricsAnalysis as lA

In the previous cell we re-imported `lyricsAnalysis.py` and assigned to it the nickname `lA`. Now, when we want to use its functions, we just need to call `lA.` and the name of the function.

So, let's do that. Our file `lyricsAnalysis.py` contains functions for counting vowels and consonants. Let's read their documentation first:

In [None]:
lA.countVowels?

In [None]:
lA.countConsonants?

Now we are ready to use them:

In [None]:
# These functions count vowels and consonants per line. So, create first counters for vowels and consonants
vowel_counter = 0
consonant_counter = 0

# Iterate over the lines in lyrics
for line in lyrics:
    # Count vowels
    line_vowels = lA.countVowels(line)
    # Count consonants
    line_consonants = lA.countConsonants(line)
    # Update the vowels counter
    vowel_counter += line_vowels
    # Update the consonants counter
    consonant_counter += line_consonants

# Print the results
print("These lyrics have {} vowels and {} consonants.".format(vowel_counter, consonant_counter))

Let's compute some statistics, why not?

In [None]:
# You might already know what to do...
total = vowel_counter + consonant_counter

vowels_percentage = vowel_counter / total * 100
consonant_percentage = consonant_counter / total * 100

print("These lyrics have a {}% of vowels and a {}% of consonants.".format(round(vowels_percentage, 1), round(consonant_percentage, 1)))
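
As an alternative to `round()`, the number of decimals can be set directly in the placeholder with a format specification such as `{:.1f}` (one decimal). The percentages in this sketch are made-up values, not the real results for these lyrics:

```python
# Made-up percentages, only to illustrate the formatting
example_vowels = 41.666666
example_consonants = 58.333333

# {:.1f} formats the number with exactly one decimal
message = "These lyrics have a {:.1f}% of vowels and a {:.1f}% of consonants.".format(example_vowels, example_consonants)
print(message)
```

One difference worth knowing: `{:.1f}` always shows one decimal (for example `50.0`), whereas `round()` returns a number that is then printed as it is.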

Finally, `import` allows importing only pieces of big modules:

```Python
from module import thisPart
```

Our `lyricsAnalysis.py` file is way too small to need this functionality. However, we can use a variant of it. If we replace `thisPart` with an asterisk `*`, it means that we are importing everything from that module. So, what is the difference between `import module` and `from module import *`? With this second option, we won't need to write the module's name every time we call a function in that module. So, in our case, if we run

```Python
from lyricsAnalysis import *
```

when we want to use, for example, the `countWords()` function, we don't need to call `lyricsAnalysis.countWords()`, but just `countWords()` directly.

⇒ **Note**: I explain this because this is how we are going to import music21, following its developer's recommendation. In general, however, this way of importing is not advisable, since it can interfere with Python built-in functions, functions from other modules, or our own code.
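
To make this kind of interference concrete, here is a small sketch using the standard `math` module instead of our own file. Python has a built-in `pow()` function, and `math` contains a different function with the same name. Importing the name directly (which is what `from math import *` would also do, among many other names) silently replaces the built-in one in our code:

```python
import builtins

# Importing the name directly hides the built-in function of the same name
from math import pow

builtin_result = builtins.pow(2, 3)   # the original built-in function
imported_result = pow(2, 3)           # now refers to math.pow

print(builtin_result)    # an integer
print(imported_result)   # a float
print(pow is builtins.pow)
```

The two results are numerically equal, but one is an integer and the other a float, which is exactly the sort of subtle change that is hard to notice when many names are imported at once.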

To try this, let's use the fourth function in `lyricsAnalysis.py`, the one called `revowelizer()`. Let's import `lyricsAnalysis.py` using this last method, and then read the documentation of this function.

In [None]:
from lyricsAnalysis import *

revowelizer?

Now that we know what this function does and how to use it, let's put it to work!

In [None]:
# Let's revowelize the first verse (lyrics[:6]), using the default target vowel, a
# Since we imported everything with *, we can call revowelizer() directly, without the module name

for line in lyrics[:6]:
    print(revowelizer(line))

In [None]:
# Let's try the vowel o now, with the second verse
for line in lyrics[7:13]:
    print(revowelizer(line, vowel="oO"))

print()

# And, why not ü? Let's use the third verse for that
for line in lyrics[14:22]:
    print(revowelizer(line, vowel="üÜ"))

## Saving output in text files
I liked the output of the last cell, and I'd like to save it in a text file, to share it with someone else, or just to reuse it somewhere else. Python allows creating text files, also using the `open()` function. If we give either `'w'` (for writing) or `'a'` (for appending new content) as the second parameter, and the file given as the first parameter doesn't exist, it will be created. The output of the `open()` function is a handle to the newly created file. To add content to it, the method `.write()` can be used. Finally, we have to close the handle to the new file by running the method `.close()`.

Perhaps the best way to understand all this is just to see it in action. We are going to run the `revowelizer()` function on all the lines of the last verse of the lyrics, to change all their vowels to "i" (or "I"), and then save the output in a text file:

In [None]:
# First, create a new file to save the output
# This is the name of the new file
newFile = 'revowelizedArmstrong.txt'

# Use the same path defined at the beginning, ./files, so the new file will be created there
new_fn = path + newFile

# Second, open the new file.
# Since the function revowelizer() will be called line by line, the returned output will be appended every time to the file.
# Therefore, we open the file for appending ('a') new content.
f = open(new_fn, 'a')

# Third, run the code
# Iterate over the lines of the last verse (lyrics[23:])
for line in lyrics[23:]:
    # Run the revowelizer() function in the current line and save it in the variable new_line
    new_line = lA.revowelizer(line, vowel="iI")
    # Print the new line, just for us to know what the code is doing
    print(new_line)
    # The revowelizer() function removes the escape sequence for new lines ("\n")
    # If we append the output without "\n", everything will be appended to a single line
    # Therefore, we add "\n" to each line
    new_line += '\n'
    # Now we append the new line to the new file, using the .write() method
    f.write(new_line)

# Once we are finished, we need to close the handle to the new file.
f.close()

If you go to the `files` folder, a new `revowelizedArmstrong.txt` file should have been created there with the output of our code.

⇒ **Note**: When we used the function `open()` at the beginning of this notebook, we did it within a `with` statement. The advantage of this is that the handle is closed automatically at the end of the indented block, without needing to run the `.close()` method. The only condition is that everything that uses the handle (for example, all the calls to `.write()`) has to be written inside that indented block.

You can reuse the code of the previous cell as many times as you want to try out different vowels or verses and save the output in text files. However, if you don't change the name of the new file, since you are opening it for appending lines (`open(new_fn, 'a')`), you will add the new output to the content already saved in that file. If you want to avoid that, you can either change the name of the new file or delete the file previously created.
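
The difference between appending (`'a'`) and writing from scratch (`'w'`) can be seen in the following sketch. It is self-contained and does not touch the `files` folder: it writes a hypothetical file, `demo_output.txt`, in a temporary folder, running the same writing code twice in each mode.

```python
import os
import tempfile

demo_lines = ["first line", "second line"]  # made-up content

with tempfile.TemporaryDirectory() as tmp_dir:
    demo_fn = os.path.join(tmp_dir, "demo_output.txt")  # hypothetical file name

    # Run the same writing code twice in append mode
    for run in range(2):
        with open(demo_fn, "a", encoding="utf-8") as f:
            for line in demo_lines:
                f.write(line + "\n")
    with open(demo_fn, "r", encoding="utf-8") as f:
        after_append = f.readlines()

    # Run it twice again, but now in write mode
    for run in range(2):
        with open(demo_fn, "w", encoding="utf-8") as f:
            for line in demo_lines:
                f.write(line + "\n")
    with open(demo_fn, "r", encoding="utf-8") as f:
        after_write = f.readlines()

print(len(after_append))  # the two runs accumulated
print(len(after_write))   # each run replaced the previous content
```

In append mode the file ends up with four lines, because the second run adds to the first. In write mode it ends up with two, because opening a file with `'w'` empties it first.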

We will not be using this functionality of writing new files much. I just introduced it here for you to know that it exists. So if you didn't fully get it, don't worry! It is not important for our course.

## Working with tables (`csv` files)
Text files generally contain what is known as *unstructured* (or also *raw*) data. However, for our computational work, most of the time we will need *structured* data. One of the simplest and most common ways of structuring data is the **table**. A table consists of a collection of data items, arranged in rows, for which we have different types of information, arranged in columns. But the best way of understanding this is just to have a look at an example.

In the `files` folder that you downloaded with this notebook, you will also find a `top20movies-RottenTomatoes.csv` file. This file contains information about the first 20 of the ["Top 100 Movies of All Time"](https://www.rottentomatoes.com/top/bestofrt/) according to the website Rotten Tomatoes.

Let's first consider the extension, `.csv`. `csv` stands for "comma-separated values," and it is the simplest type of table. As always, before you start working with your data, "take a look at your data." First, open this file directly in Jupyter by double-clicking on it in the Jupyter Dashboard (or in a text editor, but taking the aforementioned precautions). You'll probably find all the information confusing at first. But just read the first line. That is the *heading*, and it describes what type of information is stored in each column: `Title`, `Rating`, `Director`, `Year`, etc. These information types are separated by commas, meaning that each of them is in a different column. Then, if you look at the following lines, you will see that each line is a movie, that is, a data item, and for each of them we have the types of information described in the heading, that is, the title of the movie, the rating (according to the users of Rotten Tomatoes), the director of the movie, etc. And in each line, these pieces of information are also separated by commas.

There is a more readable way of looking at this data, which is opening the `top20movies-RottenTomatoes.csv` file with spreadsheet software, such as Microsoft Excel, LibreOffice Calc, or Google Sheets. (I uploaded it to my Google Drive, so hopefully you can access it through Google Sheets with [this link](https://docs.google.com/spreadsheets/d/1zYA0puYwwMEPIK__lkQAPUaJpHWwdWfKgM82FO38U5c/edit?usp=sharing).) You might get a pop-up window asking for some details about how to handle the file. If so, make sure that you select commas as separators. So, if you are looking at this file in spreadsheet software, hopefully the information is clear! When you close the file, make sure that you keep the `.csv` format and don't save any format changes.

Now that we understand our data, let's start working with them! There are quite sophisticated ways of loading `.csv` files to Python, with different modules. However, in this first approach, we will use the same `open()` function.

In [None]:
# The name of the file
fileName = 'top20movies-RottenTomatoes.csv'

# This file is in the folder ./files/, so we use the same path variable
fn = path + fileName

# Open the file and retrieve all the lines
with open(fn, 'r') as f:
    data = f.readlines()

Using the `open()` function, the `.csv` file is treated as a standard text file. So let's look at what was retrieved:

In [None]:
data

As you can see, as in the case of `WhatAWonderfulWorld-lyrics.txt`, the data are retrieved as a list of strings (notice that all of them also end with `"\n"`). However, now each string corresponds to a row of the table. And each of them contains all the information separated by commas. The data are structured.

So, let's have a closer look. We will skip the first row with the heading (`data[0]`), and we'll have a look at the first data item, in the second row (`data[1]`).

In [None]:
movie_01 = data[1]

print(movie_01)

Luckily, we now know a string method that segments a given string according to the separator we decide: `.split()`. So, let's separate this first row using comma `,` as separator:

In [None]:
# First, use the .rstrip() method to remove the '\n' at the end
movie_01_clean = movie_01.rstrip()

# Use .split(',') to segment the string according to commas
movie_01_info = movie_01_clean.split(',')

print(movie_01_info)

Now we have a list with all the information for that movie, and we can access each piece using indexing:

In [None]:
for i in range(len(movie_01_info)):
    print("Index {}: {}".format(i, movie_01_info[i]))
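
One caveat about `.split(',')`: it cuts at every comma, including commas that are part of a value (for instance, a title that contains a comma). In `.csv` files such values are usually wrapped in double quotes, and the `csv` module that comes with Python knows how to handle them. The sketch below uses a hypothetical mini-table typed by hand, not the content of `top20movies-RottenTomatoes.csv`:

```python
import csv
import io

# Hypothetical table with a title that contains a comma (made up for this example)
raw_table = 'Title,Year\n"Example, The Movie",2001\nAnother Example,2009\n'

# Naive approach: the title is cut in two pieces
naive_row = raw_table.splitlines()[1].split(',')
print(naive_row)

# csv.reader respects the double quotes and keeps the title in one piece
# (io.StringIO lets us treat the string as if it were an open file)
rows = list(csv.reader(io.StringIO(raw_table)))
print(rows[1])
```

With a real file, the handle returned by `open()` can be passed to `csv.reader()` in place of `io.StringIO(raw_table)`.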

Great! So now we know how to access the information we need for each movie using the `.split()` method and indexing. We just need to remember what information type can be retrieved for each index. Let's make a note by looping over the heading (first row, `data[0]`):

In [None]:
# Same logic as the previous cells

heading = data[0]

heading_clean = heading.rstrip()
heading_info = heading_clean.split(',')

for i in range(len(heading_info)):
    print("{} : {}".format(i, heading_info[i]))
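
If remembering index numbers becomes error-prone, one possible alternative is to pair each heading name with the corresponding value in a dictionary, so that the information can be retrieved by name. This is only a sketch with two hand-typed rows: the column names are the first four mentioned above, and the movie values are invented.

```python
# Hand-typed example rows (the movie is invented for this example)
example_heading = "Title,Rating,Director,Year\n"
example_movie = "Example Movie,99,Jane Doe,2001\n"

example_heading_info = example_heading.rstrip().split(',')
example_movie_info = example_movie.rstrip().split(',')

# Pair each heading name with the value at the same index
movie_dict = {}
for i in range(len(example_heading_info)):
    movie_dict[example_heading_info[i]] = example_movie_info[i]

# Now the information can be retrieved by name instead of by index
print(movie_dict["Director"])
print(movie_dict["Year"])
```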

We are ready for our data analysis. The rest of this notebook is just some questions about our data that we can answer with the amount of programming you already know.

**Q1**: Most of these top 20 movies are from the USA. Are there movies from other countries? From which ones?

In [None]:
# Create an empty list to store countries different from USA
not_USA = []

# Iterate over all movies, ignoring the first row (data[0]), which contains the heading. So: data[1:]
for movie in data[1:]:
    # Remove '\n' at the end
    movie_clean = movie.rstrip()
    # Segment the string according to commas
    movie_info = movie_clean.split(',')
    # Retrieve the country: index 4
    movie_country = movie_info[4]
    # Check if the country is different to "USA", and also if it is not already in the not_USA list, to avoid repetitions
    if movie_country != "USA" and movie_country not in not_USA:
        # Append to the not_USA list
        not_USA.append(movie_country)

# Print results
print("Besides the USA, the top 20 movies by Rotten Tomatoes include movies from:")
for country in not_USA:
    print("\t" + country)

**Q2**: Most of the films are from recent years. Which movies reached the top 20 from before the 2000s?

In [None]:
# Empty list to store years
pre2000 = []

# Iterate over all movies, excluding heading
for movie in data[1:]:
    # Concatenate the methods .rstrip().split(',') to remove '\n' and segment the string at once
    movie_info = movie.rstrip().split(',')
    # Retrieve the year: index 3. Convert the year into an integer (originally is a string)
    movie_year = int(movie_info[3])
    # Check if the year is less than 2000
    if movie_year < 2000:
        # Append to the pre2000 list
        pre2000.append(movie_year)

print("The top 20 movies by Rotten Tomatoes include {} filmed before 2000:".format(len(pre2000)))
# I use the function sorted() to order the list of integers from lowest to highest
for year in sorted(pre2000):
    print("\t" + str(year))

**Q3**: To which genres do these top 20 movies belong?

⇒ **Note**: I hope that by now you are getting used to some of the common procedures. So I will start reducing the comments.

In [None]:
genres = []

for movie in data[1:]:
    # Concatenate .rstrip().split(',') and indexing
    # Genre: index 6
    movie_genre = movie.rstrip().split(',')[6]
    if movie_genre not in genres:
        genres.append(movie_genre)
        
print("The top 20 movies by Rotten Tomatoes belong to {} genres:".format(len(genres)))
# When used with a list of strings, the sorted() function orders them alphabetically
for genre in sorted(genres):
    print("\t" + genre)

**Q4**: What percentage of movies corresponds to each genre?

In [None]:
# Counters
action_movies = 0
animation_movies = 0
comedy_movies = 0
drama_movies = 0
horror_movies = 0
musical_movies = 0

for movie in data[1:]:
    # Concatenate .rstrip().split(',') and indexing
    movie_genre = movie.rstrip().split(',')[6]
    if movie_genre == "Action":
        action_movies += 1
    elif movie_genre == "Animation":
        animation_movies += 1
    elif movie_genre == "Comedy":
        comedy_movies += 1
    elif movie_genre == "Drama":
        drama_movies += 1
    elif movie_genre == "Horror":
        horror_movies += 1
    elif movie_genre == "Musical":
        musical_movies += 1

print("The distribution of the top 20 movies by Rotten Tomatoes by genre is as follows:")
print("- {}% of action movies".format(action_movies / 20 * 100))
print("- {}% of animation movies".format(animation_movies / 20 * 100))
print("- {}% of comedy movies".format(comedy_movies / 20 * 100))
print("- {}% of drama movies".format(drama_movies / 20 * 100))
print("- {}% of horror movies".format(horror_movies / 20 * 100))
print("- {}% of musical movies".format(musical_movies / 20 * 100))
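
The cell above needs one counter variable and one `elif` branch per genre, so it has to be edited whenever the list of genres changes. A possible alternative is `Counter`, from the `collections` module that comes with Python, which creates the counters by itself. The genre list in this sketch is invented, not taken from the file:

```python
from collections import Counter

# Invented list with one genre per movie, only for illustration
example_genres = ["Drama", "Comedy", "Drama", "Horror"]

# Counter counts how many times each value appears
genre_counts = Counter(example_genres)

for genre in sorted(genre_counts):
    percentage = genre_counts[genre] / len(example_genres) * 100
    print("- {}% of {} movies".format(percentage, genre.lower()))
```

Dividing by `len(example_genres)` instead of a fixed number also keeps the percentages correct if the number of movies changes.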

**Q5**: What is the mean duration of these movies? Which movie is the longest one? And the shortest?

In [None]:
longest_movie_duration = 0
# To look for the longest movie, I will compare the duration of each movie to this variable
# If it is longer than this variable, I will update the variable value to the duration of the movie
# Therefore, I need to start with a very low value

# I want to have information about the longest movie, so I create variables to store them
# Since when a variable is defined a value has to be assigned to it, I create them as empty strings ("")
longest_movie_title = ""
longest_movie_director = ""
longest_movie_year = ""

# The same for the shortest movie
shortest_movie_duration = 1000000
# Now I will look for movies with a shorter duration than this variable, so I start with a very high value

# Information about the shortest movie
shortest_movie_title = ""
shortest_movie_director = ""
shortest_movie_year = ""

# To compute the mean duration, I need to sum all the durations, and divide it later by the number of movies
total_duration = 0

for movie in data[1:]:
    movie_info = movie.rstrip().split(',')
    # Retrieve information about the movie
    movie_duration = int(movie_info[5])   # Convert the duration to integer (originally a string)
    movie_title = movie_info[0]
    movie_director = movie_info[2]
    movie_year = movie_info[3]
    # Check if the duration of this movie is longer than the duration stored in longest_movie_duration
    if movie_duration > longest_movie_duration:
        # If so, this duration is the new value for the longest_movie_duration
        longest_movie_duration = movie_duration
        # And therefore, save the related information of this movie
        longest_movie_title = movie_title
        longest_movie_director = movie_director
        longest_movie_year = movie_year
    # Check if the duration of this movie is shorter than the duration stored in shortest_movie_duration
    if movie_duration < shortest_movie_duration:
        # If so, this duration is the new value for the shortest_movie_duration
        shortest_movie_duration = movie_duration
        # And therefore, save the related information of this movie
        shortest_movie_title = movie_title
        shortest_movie_director = movie_director
        shortest_movie_year = movie_year
    
    # In all cases, sum the duration of the current movie to the total duration
    total_duration += movie_duration
    
# Calculate the mean duration
mean_duration = total_duration / 20

print("The mean duration of the top 20 movies by Rotten Tomatoes is {} minutes.".format(mean_duration))
print('The longest movie is "{}" by {} ({}), with {} minutes.'.format(longest_movie_title, longest_movie_director, longest_movie_year, longest_movie_duration))
print('The shortest movie is "{}" by {} ({}), with {} minutes.'.format(shortest_movie_title, shortest_movie_director, shortest_movie_year, shortest_movie_duration))
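
The loop above spells out every step of the search, which is a good exercise. For reference, Python's built-in functions `max()`, `min()` and `sum()` can do the same job more compactly. This sketch uses three invented movies stored as (title, duration) pairs, and a small helper function, `get_duration()`, written only for this example:

```python
# Invented (title, duration in minutes) pairs, only for illustration
example_movies = [("Movie A", 95), ("Movie B", 142), ("Movie C", 118)]

def get_duration(movie):
    """Return the duration, stored in the second position of the pair."""
    return movie[1]

# key tells max() and min() which value to compare
longest = max(example_movies, key=get_duration)
shortest = min(example_movies, key=get_duration)

# Sum all the durations and divide by the number of movies
total = 0
for movie in example_movies:
    total += get_duration(movie)
mean = total / len(example_movies)

print('The longest movie is "{}", with {} minutes.'.format(longest[0], longest[1]))
print('The shortest movie is "{}", with {} minutes.'.format(shortest[0], shortest[1]))
print("The mean duration is {:.1f} minutes.".format(mean))
```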

So, do you have any other questions about these movies?