# Session 3: Functions, Files, and Built-in Modules
Now that you're all familiar with Python's syntax, data types, and control flow (loops and conditional statements), you have the basic skills to actually start writing little programs.  
## 3.0 Functions
To write a usable program, it's good practice to structure your code into _functions_ that execute certain tasks. Creating separate functions for each well-defined task makes it easy to change the order in which you are doing things, or remove one step without accidentally breaking the program.  
### 3.0.0 Defining a function
To define a function, write `def function_name(parameter_1, parameter_2, parameter_n):` and indent the lines that follow (just like the body of a loop or conditional statement from last week). The concept of _parameters_ is new, but it's really quite simple: these are variables that receive the values you put into the function when you _call_ it, so you can do something with them.  
Let's start with a function that adds two numbers together:

In [None]:
def add_nums(num_1, num_2):
    return num_1 + num_2

### 3.0.1 Returning a value
The `return` statement tells the function what value to hand back when you _call_ it. You don't strictly need to return a value from a function (a function without a `return` statement returns `None`), but it's often useful to at least `return True` to signal that the function ran successfully if you don't need any other return value from it.  
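
As a small illustration of what happens with and without a `return` statement, here is a minimal sketch. The two function names are made up for this example and are not used elsewhere in the session.

```python
def add_and_return(num_1, num_2):
    # hands the sum back to whoever called the function
    return num_1 + num_2

def add_and_forget(num_1, num_2):
    # computes the sum, but never returns it
    total = num_1 + num_2

returned = add_and_return(3, 6)
forgotten = add_and_forget(3, 6)

print(returned)   # 9
print(forgotten)  # None
```

The second function still runs without errors, but the result of the addition is lost as soon as the function finishes.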

We only _defined_ our function, but didn't actually _call_ it. Let's do that now:

In [None]:
a = 3
b = 6
c = add_nums(a, b)

print(c)

We just _passed_ the _arguments_ `a` and `b` into our function and stored the return value in `c`.  
You don't need to do this all separately though: function calls can be nested.

In [None]:
print(add_nums(12.01, 4.3))

Notice how we nested our function inside the print function?
We also used _floats_ this time instead of _integers_. This still works because our function secretly uses the `+` operator to perform the addition. This means the function can also add other things, like strings for instance.  
Try adding two strings together using our function and printing the result:

### 3.0.2 Scope
One of the most useful features of Python, and functions in particular, is something called _scope_. This is the principle that variables defined inside a function stay inside the function. An example:

In [None]:
def example_function():
    x = 5
    
print(x)

As you can see, `x` was only defined inside the function. Using it outside of the function's _scope_ doesn't work and raises a `NameError`.  
This also applies to variable names that already exist outside the function:

In [None]:
y = 3

def another_example():
    y = 6
    print(f'the value of y inside the function is {y}')
    
print(f'the value of y outside the function is {y}')
another_example()
print(f'the value of y outside the function is {y}')
another_example()

You can repeat this as many times as you like, but `y` inside the function scope never affects `y` outside the function scope. This is great, because it makes accidental reuse of variable names inside and outside functions unlikely to cause problems. (You should still aim to not reuse variable names too much though, just because it makes the code harder to read.)

### 3.0.3 Arbitrary numbers of parameters
Sometimes, we might want to add more than just two things together. One option would be to write a function that accepts a _list_ as an argument, and then loop over that list, adding all the items together.  
Python has another way of accommodating arbitrary numbers of arguments though: the `*` operator, which means "pack these items into a tuple".  
This sounds a little complicated, but is easier to understand when demonstrated in code:

In [None]:
# define a new function that packs all arguments into a tuple named "nums"
def add_multiple(*nums):
    # start our total at 0
    total = 0
    
    # loop over the items in the nums tuple and add them to our total
    for num in nums:
        total += num
        
    # return the total
    return total

# test the function with four arguments
print(add_multiple(3, 4, 1, 40))

You can use the `*` outside of function definitions as well, for example to unpack a list or tuple into separate arguments when you call a function.  
Within a function definition, you can have other parameters __before__ the `*` parameter. Parameters that come __after__ it can only be passed by keyword (as in `name=value`), because every remaining positional argument gets packed into the tuple.  
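
The different uses of `*` can be sketched as follows. The function `add_scaled` and its `factor` parameter are hypothetical names chosen just for this illustration.

```python
def add_scaled(first, *rest, factor=1):
    # 'first' takes the first argument, 'rest' packs all remaining
    # positional arguments into a tuple, and 'factor' can only be
    # passed by keyword because it comes after *rest
    total = first
    for num in rest:
        total += num
    return total * factor

print(add_scaled(1, 2, 3))            # 6
print(add_scaled(1, 2, 3, factor=2))  # 12

# in a function call, * unpacks a sequence into separate arguments
numbers = [1, 2, 3]
print(add_scaled(*numbers))           # 6

# in an assignment, * collects the leftover items (into a list this time)
head, *tail = numbers
print(head, tail)                     # 1 [2, 3]
```
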
### 3.0.4 Default parameter values
Sometimes, you'll want to give a parameter a default value, just in case no argument is passed in when the function is called. This looks just like a normal variable assignment:

In [None]:
def greet(name='stranger'):
    print(f'Hello, {name}!')

greet('Jimmy')
greet()

Notice how, when we pass 'Jimmy' as an argument the function uses that value, but if we pass nothing, the default is used.  
### 3.0.5 Side effects
Another interesting thing about this function is that we didn't get a _return_ value from it that we then use for other things. Instead, the function directly prints its output. This is called a _side effect_.  Printing as a side effect is okay, because it usually just gives us information about what's happening in the function.  
Other side effects, such as directly reading from or changing variables outside the function scope, for instance, can be dangerous because it's much harder to keep track of function side effects than it is to keep track of return values.  
Always consider using arguments and return values where possible.
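
To make the difference concrete, here is a minimal sketch comparing a function with a side effect to one that only uses arguments and a return value. The names `log`, `add_and_log` and `add_only` are invented for this example.

```python
log = []  # a variable that lives outside any function

def add_and_log(num_1, num_2):
    total = num_1 + num_2
    log.append(total)  # side effect: changes a variable outside the function
    return total

def add_only(num_1, num_2):
    # no side effects: everything goes in via arguments and out via return
    return num_1 + num_2

first_result = add_and_log(1, 2)
second_result = add_only(1, 2)

print(first_result, second_result)  # 3 3
print(log)                          # [3]
```

Both functions return the same value, but only by reading the body of `add_and_log` can you tell that it also changed `log`.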

## 3.1 Files
If we're going to write useful programs, we're going to need to read and write information to and from other files. Python has many convenient ways to read and write data built in, and lots more can be installed easily with additional packages.

### 3.1.0 Opening a file for reading
In the same directory as this notebook is a text file named `sherlock_holmes.txt`. This file contains Arthur Conan Doyle's "The Adventures of Sherlock Holmes".
Let's open the file and print the first line:

In [None]:
with open('sherlock_holmes.txt', 'r') as sherlock_file:
    print(sherlock_file.readline())

Excellent!
There's a lot going on in this first line, so let's take it bit by bit. The `with` statement tells the Python interpreter that everything indented below this statement is to be executed with this particular file opened. After the indented block is done, the file will be closed again.  
The `open('sherlock_holmes.txt', 'r')` bit tells Python to open this filename, in `r` or _read_ mode. There is also `w` for _write_ mode and a couple of other, less important, modes.  
The `as sherlock_file` bit just tells Python the file object should be temporarily stored in a variable named `sherlock_file`.
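
The automatic closing can be checked with the file object's `closed` attribute. The sketch below creates its own throwaway file (using the standard `tempfile` module, which is only there to keep the example self-contained) instead of relying on `sherlock_holmes.txt`.

```python
import os
import tempfile

# create a small throwaway file so the example doesn't depend on any existing file
demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, 'demo.txt')
with open(demo_path, 'w') as demo_file:
    demo_file.write('first line\nsecond line\n')

with open(demo_path, 'r') as demo_file:
    first_line = demo_file.readline()
    print(demo_file.closed)  # False: we are still inside the with block

print(demo_file.closed)      # True: the file was closed automatically
print(first_line)
```
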

On the next line you can see that the file object has a `readline()` method that just reads a single line from it. Let's try using it to read another line.

In [None]:
print(sherlock_file.readline())

That doesn't work. When the indented block was completed, the `with` statement closed the file, so trying to read from it now raises a `ValueError`.  
If we want to keep access to the whole file, we might want to read all the lines into a variable:

In [None]:
with open('sherlock_holmes.txt', 'r') as sherlock_file:
    sherlock_text = sherlock_file.read()

# the sherlock_text variable now contains an entire book, so let's not print it here
# we can check the length of the string instead
print(len(sherlock_text))
# or use sys.getsizeof() to check how many bytes the string object takes up in memory
import sys
print(sys.getsizeof(sherlock_text))

That sounds like a lot, but it's more useful to know the number of lines, rather than the number of characters. One easy way to count the number of lines is to _split_ this huge string on the line endings. The character used to denote a line ending is `\n`. Python strings have a `str.split()` method that lets you split on any substring you want. After splitting, the string is turned into a `list` of strings.

In [None]:
# split the text on line endings
sherlock_lines = sherlock_text.split('\n')

# print the length of the list (the number of separate lines)
print(f'the sherlock holmes text is {len(sherlock_lines)} lines long')

Do you think splitting on line endings is the best way of getting an indication of the text length?  
Can you think of other characters or strings to split on that might be more useful?  
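
One detail worth knowing when counting lines: if a text ends with a line ending, splitting on `'\n'` leaves an empty string at the end of the list. The built-in `str.splitlines()` method avoids that. A small sketch with a made-up three-line string:

```python
sample_text = 'line one\nline two\nline three\n'

# splitting on '\n' leaves an empty string after the final line ending
print(sample_text.split('\n'))   # ['line one', 'line two', 'line three', '']

# str.splitlines() does not produce that trailing empty string
print(sample_text.splitlines())  # ['line one', 'line two', 'line three']
```
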

Either way, it's too much to print here, but using slicing we can print the first hundred lines and see what we have here.

In [None]:
# print the first 100 lines using list slicing
print('\n'.join(sherlock_lines[0:100]))

It looks like there is an index, and each chapter starts with a line that starts with the word "ADVENTURE". In the next section we'll take advantage of that to write each chapter to a separate file. First we'll put the chapters in a dictionary though, so we can access each chapter by the chapter name.  

We'll use iteration to go through all the lines one by one and check if they're chapter headings. If a line is the start of a chapter, we'll make a new `dict` _key_. If it's a normal line, we'll add it to the current chapter entry in the `dict`.

In [None]:
sherlock_chapters = {}  # initiate a chapters dict
current_chapter = 'PREAMBLE'  # before the first chapter starts there is some preamble
sherlock_chapters[current_chapter] = []  # start a list we can append lines to

# iterate over our list of strings
for line in sherlock_lines:
    
    # strings have a .startswith() method which is very useful here
    if line.startswith('ADVENTURE'):
        current_chapter = line  # use the chapter heading as a new chapter name
        sherlock_chapters[line] = []  # make a new dict key from the chapter name and start a list
    
    # if the line is a normal line instead of a chapter name, we append it to the list of lines in that chapter
    else:
        sherlock_chapters[current_chapter].append(line)
        
# let's print the dict keys to check if we indeed have chapter names here
print(sherlock_chapters.keys())        

That looks good, let's write that to separate files.
### 3.1.1 Opening a file for writing
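Opening a file for writing is just the same as opening it for reading, except we specify `'w'` for write mode instead of `'r'` for reading. Just like the `file.readline()` and `file.read()` methods for reading, there is a `file.write()` method for writing a string, and a `file.writelines()` method for writing a list of strings (note that `writelines()` does not add line endings for you).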

In [None]:
# iterate over the chapters in the dict
for chapter, lines in sherlock_chapters.items():
    
    # if we want to use the chapter names as filenames, we need to replace the periods first
    filename = chapter.replace('.', '-')
    
    # open a new chapter text file
    with open(filename + '.txt', 'w') as chapter_file:
        
        # use str.join() to turn the list of lines into a single string again before we write to file
        chapter_text = '\n'.join(lines)
        
        # write to file
        chapter_file.write(chapter_text)
        
        # print a little message to indicate our progress
        print(f'wrote chapter {chapter} to file')

That looks good, but comparing the index and the chapter files we just made reveals a mismatch. Can you figure out why the last few chapters were not written to a separate file?

__NOTE:__ When using write mode (`'w'`) Python always __overwrites the entire file__. Keep this in mind, since you can delete important data that way. There is an append mode (`'a'`) for appending to an existing file, should you need it.  
__NOTE ABOUT THE NOTE:__ If you're using git and GitHub to keep track of the version history of your important files, the risk of overwriting a file is a bit less serious, since you can revert to a prior state of the file. (I.e., before you accidentally ruined it.)
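
The difference between write mode and append mode can be sketched like this. The file name `notes.txt` is made up, and the file is created in a temporary directory (via the standard `tempfile` module) so that nothing important can be overwritten.

```python
import os
import tempfile

notes_path = os.path.join(tempfile.mkdtemp(), 'notes.txt')

with open(notes_path, 'w') as notes_file:
    notes_file.write('first version\n')

# opening in 'w' mode again throws away the old content
with open(notes_path, 'w') as notes_file:
    notes_file.write('second version\n')

# opening in 'a' mode keeps the old content and adds to the end
with open(notes_path, 'a') as notes_file:
    notes_file.write('an appended line\n')

with open(notes_path, 'r') as notes_file:
    notes_text = notes_file.read()

print(notes_text)
```

The printed text contains "second version" and "an appended line", but "first version" is gone.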

## 3.2 Importing modules

As you learn more about coding and use Python more often, you will probably get to a point where you'd like to reuse some of the code that you've previously written. As you might have noticed, whenever you quit your Python interpreter, you "lose" all of the variables and functions that you have defined in that session. So if you wanted to reuse a function that you wrote in an earlier programming session in a new Python program, you'd have to define that function all over again. Retyping all your function definitions would get pretty redundant and frustrating... Fortunately, Python accommodates the need to reuse code.

__Modules__ are Python files that contain function definitions and statements. There are a lot of very useful modules out there (and we'll talk about some of them in a lot more detail later), but it's actually really easy to write a module yourself.

Imagine, for example, that you wanted to reuse the greet function from above. Rather than redefining it over and over again every time you use it, you can simply save it as a .py file and import it whenever you'd like to reuse it. 

Here's the greet function again:

In [None]:
def greet(name='stranger'):
    print(f'Hello, {name}!')

In order to turn this into a module, you can simply open a text editor, paste your function code in there, and save it as a file ending in .py (for example, helper_functions.py) in the same directory as this notebook, so that Python can find it.

Now you can import your module (in this case, the module name corresponds to the file name, so it's called helper_functions) into your current Python session like this:

In [None]:
import helper_functions

In order to use one of the functions from your module, you'll have to tell Python that it should look within the module you imported:

In [None]:
helper_functions.greet('greta')

If you intend to use that function a lot, you might want to rename it within your current script:

In [None]:
g = helper_functions.greet
g('jeroen')

There are several ways of importing modules into a script or session. For example, you can choose to import only specific functions: In that case, you won't have to specify the module name each time.

In [None]:
from helper_functions import greet
greet('creepy vampire doggo')

You could also import __all__ the functions within a given module, like this:

In [None]:
from helper_functions import *

...or you could choose to assign a different name to the module as you import it. Some modules are weirdly named/tedious to type out every time, so it is pretty much convention to rename them as you import them (for example, numpy is generally imported as np).

In [None]:
import helper_functions as hf
hf.greet()

This also works together with "from" statements:

In [None]:
from helper_functions import greet as g
g()
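
The same import forms work for Python's built-in modules. As a sketch that runs without a `helper_functions.py` file, here they are applied to the standard `math` module (the short names `m` and `fl` are arbitrary choices for this example):

```python
# import the whole module and use the module name as a prefix
import math
print(math.sqrt(16))  # 4.0

# import one specific function, no prefix needed
from math import sqrt
print(sqrt(16))       # 4.0

# import the module under a shorter name
import math as m
print(m.floor(2.7))   # 2

# import one function under a different name
from math import floor as fl
print(fl(2.7))        # 2
```
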

## 3.3 The OS module - Manipulating files and directories

One of the most useful modules out there is os. It provides a whole lot of functions and methods to manipulate files and directories, and it allows you to interface with your underlying operating system. 

For example, you can use os to ask Python for your current working directory (the folder you're currently in):

In [None]:
import os
os.getcwd()

You can also use it to change your working directory:

In [None]:
os.chdir(r"../..") # .. means 'move up' - ../.. means 'move up two levels in the hierarchy'
os.getcwd()

In [None]:
os.chdir(r"mpi-python-intro/Session3/")
os.getcwd()

You can use os to create new directories for you:

In [None]:
os.mkdir('test') # this will make a new directory called "test" in your current working directory

If you want to delete a folder, you can do so like this:

In [None]:
os.rmdir('test') # this will raise an error (OSError) if the folder isn't empty
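
What happens when `os.rmdir` is used on a folder that still has something in it can be sketched as follows. The example works inside a temporary directory (created with the standard `tempfile` module) so it doesn't touch your own folders, and the names `outer_dir` and `inner_dir` are made up.

```python
import os
import tempfile

base_dir = tempfile.mkdtemp()
outer_dir = os.path.join(base_dir, 'test')
inner_dir = os.path.join(outer_dir, 'inner')
os.mkdir(outer_dir)
os.mkdir(inner_dir)  # 'test' now contains another folder, so it isn't empty

try:
    os.rmdir(outer_dir)
    removed_first_try = True
except OSError as error:
    removed_first_try = False
    print(f'could not remove the directory: {error}')

# empty the folder first, then removing it works
os.rmdir(inner_dir)
os.rmdir(outer_dir)
print(os.path.exists(outer_dir))  # False
```
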

The os.path submodule also has a lot of useful functions. For example, the path.join function will forever help to solve my confusion about forward and backslashes in Windows and Linux:

In [None]:
new_path = os.path.join(os.getcwd(), 'my_filename.py')
new_path

The os.listdir function returns a list of the names of everything (files as well as subdirectories) in a given directory:

In [None]:
os.listdir(os.getcwd())

This can be incredibly useful, for example when you want to loop over all the files in a directory and perform a certain action:

In [None]:
for filename in os.listdir(os.getcwd()):
    absolute_path = os.path.join(os.getcwd(), filename)
    print(absolute_path) # usually you'll do something more useful than this :)

...and there's a lot more to discover, so feel free to read in the documentation if you like. :)
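
As one example of what else is in there: `os.listdir` returns the names of files and folders mixed together, and `os.path.isfile` and `os.path.isdir` can tell them apart. A minimal sketch (the variable names are arbitrary):

```python
import os

current_dir = os.getcwd()
file_names = []
dir_names = []

for entry in os.listdir(current_dir):
    full_path = os.path.join(current_dir, entry)
    if os.path.isfile(full_path):
        file_names.append(entry)  # a regular file
    elif os.path.isdir(full_path):
        dir_names.append(entry)   # a subdirectory

print(f'{len(file_names)} files and {len(dir_names)} directories')
```
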

## 4.0 DIY TIME! :)

Remember your Fibonacci loop from last session? Turn that into a function that takes as its argument the highest number in the sequence you'd like to go to. (So, for example, fibonacci(15) would return [0,1,1,2,3,5,8,13].) 

Save your function as a module called "session3".

Write another function that takes a list as its input and returns as its output another list with all the items from the original list, minus any duplicates. (E.g., input_list = [1,2,3,4,5,5,6,7,8,8,9], output_list = [1,2,3,4,5,6,7,8,9].)

As always, there are several ways of going about this. If you're up for it, try thinking of two different ways that you could write this function (perhaps one using a list comprehension and one using a loop, or something different altogether). 

Save this function into the same "session3" module, together with your fibonacci function.

Now try different ways of importing and using your two new functions from your session3 module.

Open the sherlock text again. Make a new directory within your Session3 folder (using os), and in that directory, make ten new files. To each of them, write one line from the original sherlock text, and name them "line1.txt", "line2.txt", and so on.

Loop through your new folder and check whether any of the lineX.txt files are empty. If yes, then delete the empty file from the directory.