# Functions

What are functions?
- Named, reusable commands that are not bound to a specific variable
	
Functions help us reuse code
- Saves time
- Learn from & share with others

Modularity is good, because…
- It makes changes easier (change one place instead of, e.g., three)
- It helps reduce the likelihood of errors
- It helps make your code more tidy and readable
    
Use `def` to make a function (see cell below)

In [1]:
#function average: calculates the average of a list of numbers
#Expected input: a list that consists only of numbers (int or float)
#Output: a float that is the average of all the numbers in the list
def average (a_list):
    avg = 0
    
    #loop through elements and add them
    for element in a_list: 
        avg += element
    
    #return the sum divided by the length of the list, i.e. the number of elements
    return (avg / len(a_list))
     

In [2]:
#Data from https://www.statbank.dk/10021
DK_population =[2447,2477,2506,2532,2560,2589,2621,2652,2687,2722,
                2757,2788,2820,2851,2886,2921,2958,2991,3027,3061,
                3265,3306,3340,3372,3406,3439,3467,3487,3510,3531,
                3557,3590,3620,3651,3683,3711,3738,3765,3794,3826,
                3849,3882,3926,3973,4023,4075,4123,4168,4211,4252,
                4285,4315,4349,4389,4424,4454,4479,4501,4532,4566,
                4601,4630,4666,4703,4741,4779,4820,4855,4879,4907,
                4951,4976,5008,5036,5054,5065,5080,5097,5112,5122,
                5124,5119,5116,5112,5111,5116,5125,5129,5130,5135,
                5146,5162,5181,5197,5216,5251,5275,5295,5314,5330,
                5349,5368,5384,5398,5411,5427,5447,5476,5511,5535,
                5561,5581,5603,5627,5660,5707,5749,5781,5806,5823]

print(average(DK_population))

4362.975


In [3]:
#list of integers
int_list = [5,3,8,11]
float_list = [3.6, 8.8, 1.3, 8.2]

#calculate average
my_average1 = average(int_list)
my_average2 = average(float_list)

#print average
print(my_average1)
print(my_average2)


6.75
5.475


In [6]:
#Testing average function with a different list (one that also contains strings)
#This will raise a TypeError, because our function tries to add every element of the list to a number.

mixed_list = ['1.3','a',3,9]
#my_average3 = average(mixed_list)
#print(my_average3)

# Testing

Important for eliminating errors from a program and your analysis
- The above cell is an example of testing our average function


Test in a separate cell from the original code
- In a separate notebook, when we get to that


What we test:
- Typical inputs
- Where we expect errors
- Boundary conditions -- something that might be overlooked, or unusual input
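
As a minimal sketch of these three kinds of test, the cell below repeats `average` so that it runs on its own, then tries a typical input, an input where we expect an error, and a boundary condition. Treating the empty list as the boundary case is an assumption made here for illustration.

```python
# average is repeated here so the sketch is self-contained
def average(a_list):
    total = 0
    for element in a_list:
        total += element
    return total / len(a_list)

# Typical input: a short list of integers
typical_result = average([5, 3, 8, 11])
print(typical_result)

# Where we expect an error: a string cannot be added to a number
try:
    average([1, 'a', 3])
    error_case = "no error"
except TypeError:
    error_case = "TypeError"
print(error_case)

# Boundary condition: an empty list has length 0, so we would divide by zero
try:
    average([])
    boundary_case = "no error"
except ZeroDivisionError:
    boundary_case = "ZeroDivisionError"
print(boundary_case)
```

The empty list is easy to overlook, which is exactly why it belongs in the tests.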


# Error handling

Use `if` statements to check conditions
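
A small sketch of checking a condition with `if` before doing the risky step. The function name `safe_average` is hypothetical (it is not part of the notes), and it assumes the list contains only numbers.

```python
# Hypothetical helper (not from the notes): checks a condition before dividing
def safe_average(a_list):
    # An empty list would cause a division by zero, so check for it first
    if len(a_list) == 0:
        print("The list is empty. An average cannot be calculated.")
        return None
    # sum() adds all the numbers in the list
    return sum(a_list) / len(a_list)

print(safe_average([5, 3, 8, 11]))
print(safe_average([]))
```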


Use `try` statements if you know something might cause an error
- Usually for when others might use your code
- See the example of the `try` statement below. There we try to convert each element to a float, which we know can fail (hence the `except ValueError` line), so we provide our own error message.
- `None` is a way to return nothing (not 0, but nothing). So if there is an error: print an error message, then return `None`.

In [4]:
#all the above are tests of our function 'average'.
#we learned that there are problems if 'average' gets a list that includes
#elements that are not numbers
#making a new function better_average that includes error handling

#function better_average: calculates the average of a list of numbers
#Expected input: a list that consists only of numbers (int, float, or string representations of a number)
#Expected output: a float that is the average of all the numbers in the list
#Returns None if a list element is invalid
def better_average(a_list): 
    avg = 0
    
    for element in a_list: #remember element is not a keyword, but a name I chose. It could also have been i, for example
        #Try to do this
        try:
            value = float(element)
        #Do this if there was a ValueError
        except ValueError: 
            print("This list includes elements that are not numbers. An average cannot be calculated.")
            return None 
        #Otherwise add the value to the average sum
        else:
            avg += value
        
    return (avg / len(a_list))       

In [7]:
#Testing better_average function with our previously declared lists
b_avg_var1 = better_average(int_list)
print(b_avg_var1)
b_avg_var2 = better_average(float_list)
print(b_avg_var2)
b_avg_var3 = better_average(mixed_list)
print(b_avg_var3)

6.75
5.475
This list includes elements that are not numbers. An average cannot be calculated.
None


# Functions continued

Variables declared within a function only exist within that function
- In other words, they only have a local scope
- This is why you return a value: the code that called the function can then store it in a variable outside the function


As before, variables declared outside a function persist throughout the script/notebook
- They have a global scope
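
A short sketch of the difference between the two scopes. The names `greeting`, `make_message` and `message` are made up for this illustration.

```python
# A variable declared outside a function has a global scope
greeting = "Hello"

def make_message(name):
    # message is declared inside the function, so it only has a local scope
    message = greeting + ", " + name + "!"
    return message

# The returned value is stored in a new variable outside the function
result = make_message("Ada")
print(result)

# message itself no longer exists once the function has finished
try:
    print(message)
    local_visible = True
except NameError:
    local_visible = False
print(local_visible)
```

The function can read the global `greeting`, but the rest of the notebook can only get at `message` through the returned value.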


Creating a function is like creating a contract (with the rest of your code) about how that function will operate


Good documentation/commenting on what the function does leads to good readability and expected behavior
- Above each function, include a description of what the function does, the expected input, and the expected output (see the comments above `better_average` in the example above)


# Text files

Also known as IO (input-output) operations

Four things you can do with text files
- Opening
- Reading
- Writing
- Closing

To read or write a text file you have to open it

For now, we are "hardcoding" the text file name into our code
- There are other ways of asking for input

To open a file, call the function `open`, which returns a file object:
- `open(filename, 'r')` - to read a text file
- `open(filename, 'w')` - to write to the text file
- `open(filename, 'a')` - to append to the text file (adds to the end, whereas write mode empties the file first and starts from the beginning)
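
To make the difference between write and append mode concrete, here is a small sketch. It works in a temporary directory, and the file name `modes_demo.txt` is made up, so no real course files are touched.

```python
import os
import tempfile

# Work in a temporary directory so no real files are touched
folder = tempfile.mkdtemp()
path = os.path.join(folder, 'modes_demo.txt')  # hypothetical file name

f = open(path, 'w')   # 'w' creates the file (or empties an existing one)
f.write("first\n")
f.close()

f = open(path, 'a')   # 'a' keeps the content and adds to the end
f.write("second\n")
f.close()

f = open(path, 'r')
after_append = f.read()
f.close()

f = open(path, 'w')   # opening with 'w' again throws away the old content
f.write("third\n")
f.close()

f = open(path, 'r')
after_write = f.read()
f.close()

print(repr(after_append))
print(repr(after_write))
```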


### You can read the whole file into a string
- `file.read()` - creates a string that's the content of the entire file (not recommended for large files; opening a file for reading only works if the file exists)
- `file.readline()` - reads one line; a line ends with a "\n" or newline character (use a loop to repeatedly read lines)
- `file.read(x)` - reads up to x characters (and moves the cursor x characters forward in the text)
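
The three ways of reading can be sketched on a small file. The file is created in a temporary directory first, and its name and content are invented for the example.

```python
import os
import tempfile

# Create a small three-line file to read from (hypothetical name and content)
folder = tempfile.mkdtemp()
path = os.path.join(folder, 'read_demo.txt')
f = open(path, 'w')
f.write("line one\nline two\nline three\n")
f.close()

f = open(path, 'r')
first_chars = f.read(4)       # the first 4 characters; the cursor moves forward
rest_of_line = f.readline()   # from the cursor up to and including the next "\n"
remaining = f.read()          # everything that is left in the file
f.close()

print(repr(first_chars))
print(repr(rest_of_line))
print(repr(remaining))
```

Each read continues from where the previous one stopped, which is why `remaining` does not include the first line.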

In [9]:
#Opening a file
f = open('course_description.txt', 'r')

In [10]:
#reading the first line of the file
first_string = f.readline()

In [11]:
#printing to see what the first line is
print(first_string)

#Returns a long line, because it is literally on a single line in the txt-file. (no newline characters added)

The rise of new types of digital data and the varieties of social life taking place on social media platforms enable new relations between quantitative and qualitative methods of inquiry and analysis. How such new complementarities are best exploited for social-scientific and practical purposes will be the focus of this course.



### You can write strings into files
- `file.write("This is my new string.")`
- The command also returns the number of characters written (which is why the cell below prints 29)

Strings will be concatenated (added together) with multiple writes

A file needs to be opened in write or append mode to be able to write to it

#### !! Write mode will overwrite what's already in the file

In [12]:
#Opening a file to write
#Note: If this file already exists, it will be overwritten
f2 = open('new_file.txt', 'w')

#Writing to the file
f2.write("I’m writing to my first file!")

29

### Always remember to close textfiles! Else you risk losing data.
- `file.close()`
- See cell below

In [13]:
#Closing all our files
f.close()
f2.close()
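
An alternative worth knowing, although it is not part of the notes above: Python's `with` statement closes the file for you when the indented block ends. A minimal sketch, using a temporary directory and a made-up file name:

```python
import os
import tempfile

folder = tempfile.mkdtemp()
path = os.path.join(folder, 'with_demo.txt')  # hypothetical file name

# The file is closed automatically at the end of the with block
with open(path, 'w') as out_file:
    out_file.write("Closed automatically.")

# .closed is True once a file object has been closed
was_closed = out_file.closed
print(was_closed)

# Reading the text back to confirm it was saved
with open(path, 'r') as in_file:
    saved_text = in_file.read()
print(saved_text)
```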

# Navigating files

Our first library: `os`
- `import os`

`os.listdir(x)` gives a list of files in x (x is a directory)

Lots of functions
- `os.rename(original filename, new filename)` - Both filenames should be strings. Can be used with loops to rename several files in a directory (just remember they need different names, e.g. adding +1 to each name)
- `os.getcwd()` - the current working directory (for a notebook, usually the directory it is saved within)
- `os.mkdir("New")` makes a new directory called New in the current working directory
- Use caution with `os.remove` - it _will_ delete the file from the computer
- Etc. (see slides for more)
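
A sketch of several of these functions used together. Everything happens inside a temporary directory, and the file names are invented, so nothing in the course folder is renamed or deleted.

```python
import os
import tempfile

# The current working directory
print(os.getcwd())

# Make a directory called New inside a temporary directory
folder = tempfile.mkdtemp()
new_dir = os.path.join(folder, "New")
os.mkdir(new_dir)

# Create three empty files to practise on (hypothetical names)
for number in range(3):
    f = open(os.path.join(new_dir, 'file' + str(number) + '.txt'), 'w')
    f.close()

# Rename every file, giving each one a different new name
for count, filename in enumerate(sorted(os.listdir(new_dir))):
    old_path = os.path.join(new_dir, filename)
    new_path = os.path.join(new_dir, 'renamed_' + str(count + 1) + '.txt')
    os.rename(old_path, new_path)

renamed = sorted(os.listdir(new_dir))
print(renamed)

# Delete one file: it is gone for good, not moved to a recycle bin
os.remove(os.path.join(new_dir, 'renamed_1.txt'))
remaining = sorted(os.listdir(new_dir))
print(remaining)
```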

In [18]:
#Importing our first library
import os

#looping through all of the files in the current directory
for filename in os.listdir('.'): # the dot, . , means 'current directory'. Left out, this achieves the same thing. Could be replaced with another path.
    #printing the files
    print(filename)
    
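#Making a new directory (os.mkdir raises a FileExistsError if New already exists, so check first)
if not os.path.exists("New"):
    os.mkdir("New")
#Creating an empty file inside the new directory, then closing it
file = open("./New/new_file.txt", 'w')
file.close()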

.git
.gitattributes
.ipynb_checkpoints
3 - Core concepts of Python notebook.ipynb
4 - Containers.ipynb
course_description.txt
New
new_file.txt
octocat.png
README.md
sodas_people.txt
Speech_2019.txt
stop_words.txt
W1D2-3_2_commenting.ipynb
W1D2-3_2_commenting.py
W2D1-Exercise1.ipynb
W2D1-Exercise1_sol.ipynb
W2D2 Demo.ipynb
W2D2-Exercise2.ipynb
W2D2-Exercise_sol.ipynb
W3D1 Demo.ipynb
W3D1-Exercise3.ipynb


# Regular expressions

Our second library `re`!
- Note the minimalism in library names

#### Regular expressions help you find patterns in text
Useful to find things that follow specific formats or have minor variations
- Addresses
- E-mail addresses
- Currency amounts
- Phone numbers
- Etc.

Three key functions for finding a pattern:
- `re.match(pattern, string)` - tries to find the pattern at the start of the string
- `re.search(pattern, string)` - tries to find the pattern anywhere in the string (probably what we want most of the time)
- `re.findall(pattern, string)` - finds all matches of the pattern and returns them as a list
- Look at the documentation for these (they are probably not in the textbook)
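
The difference between the three functions shows up when the same pattern is run on the same string. The text and pattern below are made up for illustration.

```python
import re

text = 'Call 22556699 or 33445566'
pattern = r'\d{8}'   # eight digits in a row

# match only looks at the start of the string, which here is 'Call'
match_result = re.match(pattern, text)
print(match_result)

# search looks anywhere and stops at the first match
search_result = re.search(pattern, text)
print(search_result.group())

# findall returns a list of every match
findall_result = re.findall(pattern, text)
print(findall_result)
```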

Regular expressions use special characters to mark pieces of patterns
- See slides for the list
- Refer to documentation for special characters lists and how they work!
- A `\` in front of a letter gives it a special meaning instead of matching the letter itself. E.g. `\s` looks for whitespace, not the letter s.
- Capitalizing the letter means the opposite, e.g. `\S` means not whitespace. (For an example, see the e-mail cell.)
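
A few of these special characters tried on a made-up string, as a sketch. The last lines go beyond the notes: in front of a symbol such as the dot, the backslash works the other way round and makes the pattern match the literal symbol.

```python
import re

text = 'Room 12 B'

digits = re.findall(r'\d', text)    # \d matches a digit
spaces = re.findall(r'\s', text)    # \s matches whitespace
words = re.findall(r'\S+', text)    # \S+ matches runs of non-whitespace

print(digits)
print(spaces)
print(words)

# A plain dot matches any character; \. only matches an actual dot
any_chars = re.findall(r'.', 'a.b')
real_dots = re.findall(r'\.', 'a.b')
print(any_chars)
print(real_dots)
```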


`re.search` and `re.match` return a `match` object if a match was found
- A match object includes the subgroup matches of your pattern
- A successful match is also treated as `True` in an `if` statement
- No match returns a null value (`None`)
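
A sketch of working with a match object. The pattern here is a simplified one chosen for the example (it is not the e-mail pattern from the notes); each pair of parentheses creates a subgroup.

```python
import re

# Simplified, hypothetical pattern: group 1 is before the @, group 2 after it
pattern = r'(\S+)@(\S+)'

found = re.search(pattern, 'Write to samantha.breslin@anthro.ku.dk today')

# A match object counts as True, so it can be used directly in an if statement
if found:
    whole = found.group()     # the whole matched text
    user = found.group(1)     # the first subgroup
    domain = found.group(2)   # the second subgroup
    print(whole)
    print(user)
    print(domain)

# No match gives None, which counts as False
not_found = re.search(pattern, 'no address here')
print(not_found)
```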

In [39]:
import re
#our regular expression pattern for an email
#the r in front makes it a raw string, so Python leaves the backslashes alone
email_pattern = r'(\S+@\S+\.(\w){2,3})'
#\S+ - finding one or more characters (+) that are NOT whitespace (\S)
#the \.(\w){2,3} means that it wants a . and then 2 or 3 characters (for the domain ending)
#If we wanted to find all mail addresses that started with "s", we would just add s (without backslash) at the beginning: (s\S+@\S+\.(\w){2,3})

#Our test cases
print(re.search(email_pattern, 'samantha.breslin@anthro.ku.dk'))
print(re.search(email_pattern, 'I\'ll be @ work'))

<re.Match object; span=(0, 29), match='samantha.breslin@anthro.ku.dk'>
None


In [37]:
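#Finding a phone number
#Note: whitespace is written \s (backslash); with /s the pattern looks for a literal "/" and finds nothing
import re
phonenumber = r'((\d){8}(\s?))'
print(re.search(phonenumber, '22556699'))

<re.Match object; span=(0, 8), match='22556699'>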


In [29]:
#A function that finds the emails in a file
#Expected input: a filename for an existing file (will cause an error if the file does not exist)
#Expected output: Writes to a text file called filename_emails.txt that lists all the emails found 
#in the original file separated by newlines
def read_emails(filename):
    #import regular expressions
    import re

    #our regular expression pattern for an email
    email_pattern = r'(\S+@\S+\.(\w){2,3})'

    #Opening file
    file = open(filename,'r')
    #Create empty list to store emails we collect through the search
    emails = []

    #loop through each line in the file
    for line in file:
        #try to search for an email
        temp_email = re.search(email_pattern, line)
        
        #If an email was found, add it to the list
        if temp_email: 
            emails.append(temp_email.group()) #.group() returns the whole matched text, i.e. the full email

    #Close the file, since we're done with it
    file.close()
    
    #removing the file extension (so our new file is not XXX.txt.txt but XXX.txt)
    #rsplit with 1 only splits at the last dot, so dots elsewhere in the path are kept
    short_name = filename.rsplit('.', 1)[0]
    
    #Create a new file for the email names
    email_file = open(short_name +'_emails.txt', 'w')

    #go through each email in the list and write the email to the new file
    for element in emails:
        email_file.write(element + '\n')

    #Close the new file with all the emails
    email_file.close()


In [30]:
#Testing our function
read_emails('sodas_people.txt')
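
One limitation of `read_emails` is that `re.search` stops at the first match, so a second email on the same line is missed. A possible variation, sketched here with a hypothetical helper `find_all_emails` and a pattern without parentheses (so that `re.findall` returns whole emails rather than subgroups):

```python
import re

# Hypothetical helper: returns every email found in one string
def find_all_emails(text):
    # Same idea as the email pattern in the notes, but without subgroups
    pattern = r'\S+@\S+\.\w{2,3}'
    return re.findall(pattern, text)

found_emails = find_all_emails('Mail a@b.dk or c@d.com')
print(found_emails)
```

Inside `read_emails`, the list returned for each line could be added to `emails` with `emails.extend(...)`. This is only one option, and the pattern is still a rough approximation of what a valid email looks like.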