## Deep and shallow copies

If you have experience with other programming languages you might already know about this. Most programming courses and websites mention it fairly late, but it is one of the things that are **really, really, really** important to understand. It is actually pretty simple, so let's talk about it and look at a few examples.

In [None]:
string_1 = "I am a string"
string_2 = "I am a string"
print(string_1 == string_2)
print(string_1 is string_2)
string_2 = string_1
print(string_1 is string_2)
string_2 = "2"
print(string_1)
print(string_2)
print(string_1 == string_2)
print(string_1 is string_2)

**And:**

In [None]:
list_1 = ['list', 1]
list_2 = ['list', 1]
print(list_1 == list_2)
print(list_1 is list_2)
list_2 = list_1
print(list_1 is list_2)
list_2[1] = 2
print(list_1)
print(list_2)
print(list_1 == list_2)
print(list_1 is list_2)
#spot the difference between the two examples

Python has **immutable** and **mutable** types. 
A number (int, float), a bool, string or tuple are all immutable, meaning we can't modify them.

In [None]:
some_letters = "I am immutable"
some_letters = "Completely new string" #we can assign the variable some_letters a completely NEW string
print(some_letters[4])
some_letters[4] = 'L' #but we can't change anything in the string assigned to 'some_letters'
#try the same with a tuple
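Since a string can't be changed in place, the usual way around this is to build a new string from pieces of the old one. A minimal sketch (the name `patched` is made up for this illustration):

```python
some_letters = "I am immutable"

# strings can't be changed in place, so this raises a TypeError
try:
    some_letters[4] = 'L'
except TypeError as error:
    print("Not allowed:", error)

# instead we build a NEW string from slices of the old one
patched = some_letters[:4] + 'L' + some_letters[5:]
print(patched)
print(some_letters)  # the original string is unchanged
```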

Lists, dictionaries and sets are mutable, meaning we can modify the object that is stored in the variable.

In [None]:
list_nr_1 = [1,2,4]
print(list_nr_1)
list_nr_1[2] = 3
print(list_nr_1)

What is dangerous about this is that when we assign a mutable object to a second variable (thinking we are making a copy), it isn't really a copy but just another name for the same object. So any modification made through either name will modify the same object.

In [None]:
#remember the list example above?
list_1 = ['list', 1]  #at the moment, the lists look the same
list_2 = ['list', 1]  #but they are two different objects
#we can modify them independently
print(list_1, list_2)
list_1[0] = 'the list'
list_2[1] = 2
print(list_1, list_2)
#now we point them at the same object (or rather we point the name list_2 to the object that list_1 refers to)
list_2 = list_1
print(list_1, list_2)
#now any modification using either name for the object (list_1 or list_2) will modify the object since they refer to the same thing
list_1[0] = 'aaaaah'
print(list_1, list_2)
list_2[1] = 14
print(list_1, list_2)
print(list_1 is list_2) #checks whether both names refer to the same object

This can lead to cases where you think you have made a copy but you haven't. You have just created two ways of accessing the same object and modification through either way will change it.

In [None]:
#try to create a similar situation to the above using dictionaries (i.e. where two variables point to the same dictionary)
#then modify the dictionary through both and show the result

Therefore, in cases where we want a real copy (and not just another way to access the old object) we need to copy explicitly, for example with deepcopy.

In [None]:
from copy import deepcopy
list_1 = ['list', 1]
list_2 = ['list', 1]
#we can modify them independently
print(list_1, list_2)
list_1[0] = 'the list'
list_2[1] = 2
print(list_1, list_2)
list_2 = deepcopy(list_1)
print(list_1, list_2)
list_1[0] = 'aaaaah'
print(list_1, list_2)
list_2[1] = 14
print(list_1, list_2)
print(list_1 is list_2)
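The difference between a shallow and a deep copy only shows up when the object contains other mutable objects. The following sketch uses a made-up nested list to illustrate this: `copy` creates a new outer list but shares the inner list, while `deepcopy` copies everything.

```python
from copy import copy, deepcopy

nested = ['list', [1, 2]]   # a list that contains another (mutable) list
shallow = copy(nested)      # new outer list, but the inner list is shared
deep = deepcopy(nested)     # new outer list AND new inner list

nested[0] = 'changed'       # replaces an item in the outer list of 'nested' only
nested[1].append(3)         # modifies the inner list

print(nested)               # ['changed', [1, 2, 3]]
print(shallow)              # ['list', [1, 2, 3]]  <- inner list changed too
print(deep)                 # ['list', [1, 2]]     <- completely independent
print(nested[1] is shallow[1])
print(nested[1] is deep[1])
```

For a flat list of immutable items (like the examples above) a shallow copy is enough; deepcopy is the safe choice when things are nested.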

## Working with files

#### Reading

In [None]:
with open("testfile.csv", 'r') as test_file: #'r' tells it that you want to read the file
    for line in test_file:                   #goes through the file line by line
        number, number_word = line.strip().split(",")
        print(number, number_word)
#using this construct, the file is closed automatically as soon as the with block ends

If the file has a header, then we need to deal with that.

In [None]:
#ignore header row
with open("testfile_with_header.csv", 'r') as test_file:
    next(test_file) #ignore the header line
    for line in test_file:
        number, number_word = line.strip().split(",")
        print(number, number_word)

In [None]:
#save header row for later
with open("testfile_with_header.csv", 'r') as test_file:
    header = test_file.readline()
    for line in test_file:
        number, number_word = line.strip().split(",")
        print(number, number_word)
print("Header was:", header)

#### Writing

In [None]:
out_file = open("outfile.txt", 'w')
out_file.write("First line.\n")  #unlike print, write does not do newlines for you. You have to do them by hand using \n
out_file.write("Second line.\n")
out_file.write("Third line.")
out_file.close() #don't forget to close the file

Open the created file in explorer. Then change the 'w' in the code to an 'a'. Run the code again and check the file again. What is the difference between w and a?
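Writing also works with the `with` construct from the reading examples, so that the file gets closed for us. A small sketch, which writes into a temporary folder (an assumption made here so that the example doesn't touch your working directory):

```python
import os
import tempfile

# write into a temporary folder so we don't clutter the working directory
tmp_dir = tempfile.mkdtemp()
out_path = os.path.join(tmp_dir, "outfile.txt")

with open(out_path, 'w') as out_file:
    out_file.write("First line.\n")
    out_file.write("Second line.\n")
# leaving the with block closes the file for us

print(out_file.closed)

# read the file back to check what was written
with open(out_path, 'r') as in_file:
    content = in_file.read()
print(content)
```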

In [None]:
commonly_seen_in_uk = {'unicorn': False, 'badger': True, 'lion': False, 'narwhal': False, 'sheep': True,
                       'cow': True, 'pony': True, 'pygmy hippo': False, 'cat': True, 'python': False}
#write the animals which are not commonly seen in the UK into one file and the other animals into another

#### csv read

In [None]:
import csv

total = 0 #don't call this 'sum', that would hide python's built-in sum() function
with open("testfile2.csv", 'r') as test_file:
    reader = csv.reader(test_file)

    next(reader) #ignore header row

    for row in reader:
        number = int(row[0])
        total += number

print(total)

R is a bit more elegant with things like this as it can load the whole table as a data frame in one go. In python you can do that with **Pandas**. Or you can just stick to R and use python for the cases where you want scripting/want to mine through a file line by line.
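Short of Pandas, the csv module can at least load all rows in one go and use the header row as column names. The sketch below uses `csv.DictReader` on made-up data held in an `io.StringIO` object, which stands in for a real file here:

```python
import csv
import io

# made-up data standing in for a csv file with a header row
fake_file = io.StringIO("number,word\n1,one\n2,two\n3,three\n")

reader = csv.DictReader(fake_file)  # uses the header row as keys
rows = list(reader)                 # loads all rows in one go

print(rows[0]["word"])

# all values come back as strings, so we convert before adding up
total = 0
for row in rows:
    total += int(row["number"])
print(total)
```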

### os package

The os package in python allows you to do a lot of filesystem things easily.

In [None]:
import os

In [None]:
operating_system = os.name
print("You are running {}.".format(operating_system))
if operating_system != 'posix': #linux and mac report 'posix', windows reports 'nt'
    print("Some things might work differently for you.")
print("The path separator for your operating system is {}.".format(os.sep))

In [None]:
os.uname() #gives you additional information (not available under windows)
#os.uname().sysname #accessing individual fields

In [None]:
os.getcwd() #returns the current working directory

In [None]:
os.chdir('folders') #change the working directory using a relative path
os.getcwd()
#try running this command again

In [None]:
#using an absolute path
os.chdir('HERE/subfolder1') #copy and paste absolute path of folder 'folders' here
os.getcwd()

In [None]:
#get the home directory
home_dir = os.path.expanduser('~')
print(home_dir)

#### os.path.join()

In [None]:
python_dir = "/home/me/python_course/introduction"
#replace with the path to your python, remember to use \\ or raw strings under windows
folder_folder = 'folders'
print(python_dir+folder_folder) #dangerous!

In [None]:
#instead use os.path.join()
folder_of_folders = os.path.join(python_dir, folder_folder) #what we actually wanted
print(folder_of_folders)

In [None]:
#works with more levels as well
one_more = 'filesfolder'
a_file = 'a_file.txt'
print(os.path.join(python_dir, folder_folder, one_more, a_file))
#os.path.join automatically uses the right separator for your operating system
#it puts one separator between each part you tell it to join
print(os.path.join(python_dir, folder_folder, one_more))
print(os.path.join(python_dir, folder_folder, one_more, ""))

In [None]:
#if one of the components is an absolute path then everything before gets ignored
#os.path.join("/home/myhome/MyDocs", "Docs", "a_folder", "/usr", "more") #linux example
os.path.join("C:\\MyDocs", "Docs", "a_folder", "C:\\Programs", "more") #windows example
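To try both flavours of this on any machine, the standard library offers `posixpath` and `ntpath`, which are the linux/mac and windows versions of `os.path`. A short sketch using the same made-up paths as above:

```python
import ntpath     # the windows flavour of os.path
import posixpath  # the linux/mac flavour of os.path

# everything before the last absolute component gets dropped
linux_result = posixpath.join("/home/myhome/MyDocs", "Docs", "a_folder", "/usr", "more")
windows_result = ntpath.join("C:\\MyDocs", "Docs", "a_folder", "C:\\Programs", "more")

print(linux_result)
print(windows_result)
```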

In [None]:
#os.path.join together with os.path.expanduser is very useful if you have similar folder structure 
#on home and work machine but different usernames!

#write a bit of code that prints a full path to a file named 'wolpertinger.txt' in the users home directory

In [None]:
#collapses redundant separators and, under windows, it turns / into \
print(os.path.normpath("C:/MyDocs//MyName\\MyFiles"))
#also collapses up-level references
print(os.path.normpath("/home/myname/something/../file_in_myname"))

In [None]:
#get the full path for a file
folder_of_folders = os.path.join(python_dir, "folders")
os.chdir(folder_of_folders)
os.path.realpath("subfolder1")

In [None]:
#get relative paths
os.path.relpath('home/ezes1m13/python_course/folders/subfolder1', start='home/ezes1m13/')

#### Split a path into folder and filename

In [None]:
#windows: split into drive name and path (raw string, so the backslashes are not read as escape sequences)
os.path.splitdrive(r"C:\User\something")

In [None]:
my_file_path = "/home/user1/files/stuff/thing2.txt"
print(os.path.dirname(my_file_path))
print(os.path.basename(my_file_path))

In [None]:
#we can also use split for a one-liner
my_path, my_file = os.path.split(my_file_path)
#what is the type of the thing that os.path.split returns?

In [None]:
#try these
my_file_path1 = "/home/user1/files/stuff/thing2.txt"
my_file_path2 = "/home/user1/files/stuff/"
my_file_path3 = "/home/user1/files/stuff"
my_file_path4 = "a_file"
print(os.path.split(my_file_path1))
print(os.path.split(my_file_path2))
print(os.path.split(my_file_path3))
print(os.path.split(my_file_path4))

#### Split filename and extension

In [None]:
myfilename = "hello.txt"
filename, extension = os.path.splitext(myfilename)
print("Filename: {}\nExtension: {}".format(filename, extension))
print("Extension without a dot: {}".format(extension[1:]))

In [None]:
another_file_path = "/home/user/work/data/datas.csv"
#split this path into:
#path
#filename
#extension without dot
#and print all of these

#### Check whether a file or path exists

In [None]:
#finding out whether a file exists can be useful
subfol1 = os.path.join(folder_of_folders, "subfolder1") #folder_of_folders was defined further up
files_to_check = ["a_file.txt", "anotherfile.txt", "some_file.txt"]
for filename in files_to_check:
    print(os.path.exists(os.path.join(subfol1, filename)))

#### EXTRA: Iterate over all files in a folder

In [None]:
#iterate over everything
folder_with_files = "/home/me/python_teaching/folders/filesfolder/" #set to the right path
list_dir = os.listdir(folder_with_files)
for files_and_folders in list_dir:
    print(files_and_folders)
#change this so that it prints the full path for each file/folder

In [None]:
#iterate over files
folder_with_files = "/home/me/python_teaching/folders/filesfolder/" #set to the right path
list_dir = os.listdir(folder_with_files)
#filter to get only files
just_files = [x for x in list_dir if os.path.isfile(os.path.join(folder_with_files, x))] #equivalent for dirs: os.path.isdir()
for a_file in just_files:
    print(a_file)

In [None]:
#iterate over all files with a certain extension
folder_with_files = "/home/me/python_teaching/folders/filesfolder/" #set to the right path
list_dir = os.listdir(folder_with_files)
#filter to get only files
just_files = [x for x in list_dir if os.path.isfile(os.path.join(folder_with_files, x))] #equivalent for dirs: os.path.isdir()
just_csvs = [x for x in just_files if x.endswith(".csv")] #str.startswith() is also a thing
for a_file in just_csvs:
    print(a_file)

Note: you could also do the filtering in the loop if you prefer!

In [None]:
folder_with_files = "/home/codenotebooks/python_teaching/folders/filesfolder/" #set to the right path
list_dir = os.listdir(folder_with_files)
for a_thing in list_dir:
    if os.path.isfile(os.path.join(folder_with_files, a_thing)) and a_thing.endswith(".csv"):
        print(a_thing)

In [None]:
#write some code that displays both files and directories but tells us which is which like so:
#name: Directory!
#name: File
#name: File
#.....

## Functions

When our code gets longer and longer it often becomes unreadable. We can pack code into smaller bits called **functions**. Functions are also a great way to reuse code.

In [None]:
#let's define a function called print_hello
def print_hello(): #the name() tells python that name is a function (and not a variable)
    print("Hello")

print_hello()
print_hello()
print_hello()

In [None]:
#write a small function that prints a triangle made out of **** s
#then call that function 5 times
#how many lines of code have you saved in comparison to copying and pasting the same code 5 times?

Functions don't have to do exactly the same thing every time. We can pass **arguments** to functions. They can be used to switch between different behaviours of the function or they can specify information that the function works on.

In [None]:
def greeting(name): #this function has one argument, named 'name'
    print("Hello {}!".format(name))
greeting('Elisabeth')
greeting('Your name')
greeting('someone')
greeting('mysterious stranger')
greeting('Doctor')
#change the statement so that it uses a different greeting such as Howdy
#now imagine you had to change that for all five people! Using a function means we only have to do one change no matter
#how often we use the function! Yay!

In [None]:
#write a variant of the greetings function that prints something that depends on the name passed
#such as a correct regional greeting depending on country of origin or something else

In [None]:
#functions generally return something (but don't have to)
def give_it_back(something):
    return something #the return keyword specifies what to return
print(give_it_back("ping")) #the function itself prints nothing, it just hands the value back to print
#try passing the function an int instead of a string
#or a list
#python doesn't mind (unlike java et al)

In [None]:
#a function can have more than one argument
def powers(base, exponent):
    return base**exponent
print("{} is not the same as {}.".format(powers(2,3), powers(3,2)))

In [None]:
#and we can (and ideally should) refer to the arguments using their names
#this means that even if we are confused about the order, we still get the right result
print("{} IS the same as {}.".format(powers(base=2,exponent=3), powers(exponent=3,base=2)))

In [None]:
#functions can return more than one value in python (the type of the returned thing is a tuple by default)
def i_return_two_things(filename):
    name, extension = os.path.splitext(filename)
    return name, extension[1:]
things = i_return_two_things("made_up_filename.txt")
print(type(things)) #a tuple
nam, ext = things #unpack the tuple into two variables
print("Name:", nam)
print("Extension:", ext)

In [None]:
#write a function that splits a filename with path into path, filename without extension and extension without . 
#and returns these 3 parts
a_filename_with_path = "/home/wally/files/mr_file.csv"

In [None]:
#we can specify default values for arguments, which makes them optional
def say_hi(name, greeting="Hello"):
    print("{} {}!!!".format(greeting, name))
say_hi(name="Elisabeth")
say_hi(name="Elisabeth", greeting="Hallo")
say_hi(greeting='Yo') #what is wrong with this one?

In [None]:
#optional, undefined arguments: sometimes we want to leave an argument undefined
#we can use the keyword None for that
def say_hi2(name=None, greeting="Hello"):
    if name is None: #now we can check whether the user has set name and act accordingly
        if greeting == "Hallo":
            name = "Elisabeth"
        elif greeting == "Buon giorno":
            name = "Carla or Alessandro"
        else:
            name = "Stranger"
    print("{} {}!!!".format(greeting, name))
say_hi2()
#test a few more combinations

#### EXTRA: Why using None can be really important

In [None]:
def spam(eggs=[]): #using empty list
    eggs.append("spam")
    return eggs
spam()
spam()
spam()
spam()

Huh, what happened there? The issue is that we have assigned a *mutable* object as a default. Python creates the default value only once, when the function is defined, and not every time the function is called. So every call without an argument gets the very same list, and the "spam" from earlier calls is still in it. Which is not what we want most of the time!

In [None]:
#instead use
def spam(eggs=None): #using None instead of an empty list
    if eggs is None:
        eggs = []
    eggs.append("spam")
    return eggs
spam()
spam()
spam()
spam()
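If you want to see where the shared list lives, you can look at the function's `__defaults__` attribute, which holds the default values. A small sketch using the mutable-default version of the function (renamed to `spam_shared` here so that it doesn't overwrite the fixed one):

```python
def spam_shared(eggs=[]):  # the default list is created once, when def runs
    eggs.append("spam")
    return eggs

print(spam_shared.__defaults__)  # the stored default before any call
first = spam_shared()
second = spam_shared()
print(spam_shared.__defaults__)  # the same list, now with two entries
print(first is second)           # both calls handed back the very same object
```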

#### EXTRA: Side effects (not the medical kind but potentially fatal)

In [None]:
def i_have_side_effects(unsorted_list):
    unsorted_list.sort()
    for item in unsorted_list:
        print(item)
my_list = [4, 17, 1, 32, 9, 12]
print(my_list)
i_have_side_effects(my_list)
print(my_list)

What happens in the function does not stay in the function if we work on mutable objects!
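One way to avoid this particular side effect is to work on a new object inside the function. In the sketch below (the function name `no_side_effects` is made up), `sorted()` returns a new sorted list instead of sorting the original in place:

```python
def no_side_effects(unsorted_list):
    # sorted() returns a new list and leaves the original alone
    for item in sorted(unsorted_list):
        print(item)

my_list = [4, 17, 1, 32, 9, 12]
no_side_effects(my_list)
print(my_list)  # still in the original order
```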

In [None]:
def plus_one(number):
    number += 1
    print(number, "!")
my_number = 7
print(my_number)
plus_one(my_number)
print(my_number)

A number is immutable, so what happens in the function stays in the function!

Right, so this is all a bit upsetting but there is more that you need to know...

#### EXTRA: Scope

In [None]:
def mathsy_stuff(nr):
    nr2 = 10
    return nr+nr2
nr = 1
nr2 = 2
print(mathsy_stuff(nr))

What happens here is that the nr2 within the function is a different nr2 to the nr2 on the outside. nr2 within the function only exists within the function and the space where it exists is called its scope.

In [None]:
def mathsy_stuff(nr):
    return nr+nr2
nr = 1
nr2 = 2
print(mathsy_stuff(nr))

So now that the inner nr2 doesn't exist, python looks further away and finds the outer nr2.

In [None]:
def mathsy_stuff(nr):
    nr2 += 1
    return nr+nr2
nr = 1
nr2 = 2
print(mathsy_stuff(nr))

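Ahm yes, I did say this might be a bit uncomfortable/confusing? What happens here is that we are trying to modify nr2 from within the function. But nr2 doesn't belong to the function mathsy_stuff because it normally lives outside. As soon as we assign to nr2 inside the function, python treats it as a local variable, and that local nr2 has no value yet when we try to add 1 to it. So mathsy_stuff is allowed to look at the outer nr2, but not change it!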
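The error can be caught and inspected, and one simple way around it is to pass the value in as an argument. A sketch of both (the function names `broken` and `works` are made up for this illustration):

```python
nr2 = 2

def broken(nr):
    nr2 += 1         # the assignment makes nr2 local, but it has no value yet
    return nr + nr2

try:
    broken(1)
except UnboundLocalError as error:
    print("Failed:", error)

def works(nr, nr2):  # one possible fix: hand nr2 in as an argument
    nr2 += 1         # only changes what the local name nr2 refers to
    return nr + nr2

result = works(1, nr2)
print(result)
print(nr2)           # the outer nr2 is unchanged
```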

In [None]:
def mathsy_stuff(nr):
    nr5 = 5
    return nr+nr5
nr = 1
print(mathsy_stuff(nr))
print(nr5)

This one fails because nr5 only exists within the function, so we can't access it from outside of the function, because it doesn't exist on the outside.

#### Time to unconfuse!

1) Variables declared outside defs are visible everywhere (outside and inside the def). But within the def we can't assign a new value to them because the def doesn't "own" the variable.

In [None]:
i_am_global = "GLOBAL"
def outer_def():
    print(i_am_global)  #i_am_global is not defined within outer_def, so python uses the global one
    #i_am_global = "NOT" #not allowed
outer_def()
print(i_am_global)      #the global variable is unchanged

2) Variables inside a def that have the same name as a variable on the outside hide the outside variable temporarily.

In [None]:
i_am_global = "GLOBAL"
def outer_def():
    i_am_global = "NOT" #with this we create a new variable with the same name. It hides the other variable
    print(i_am_global) 
outer_def()
print(i_am_global)      #but only within the function. On the outside we can see the global one again.

3) Variables defined within a def are local and can not be accessed from outside the def.

In [None]:
def outer_def():
    non_global = "LOCAL"
print(non_global)

4) The whole thing is hierarchical: Variables inside a def that has another def nested inside are global from the view of the inner def and local when seen from the outer def.

In [None]:
def outer_def():
    i_am_local_global = "HM?"
    print(i_am_fully_local)
    def inner_def():
        i_am_fully_local = "LOCAL"
        print(i_am_local_global)
        print(i_am_fully_local)
    inner_def()
outer_def()
print(i_am_local_global)
print(i_am_fully_local)
#comment out all statements that lead to errors

### Running from the command line

In the python_course folder is a file called *my_library.py*. Open that file in an editor and see what it contains.

Now open the command window, it should be under: Start, All Programs, Accessories, Command Prompt.

Type: python3 c:\pathtofile\my_library.py and press enter

This should run the script and you should be able to see the output in the command prompt.

#### Import your own code

We have imported existing modules before, but we can do the same with our own code.

In [None]:
from my_library import useful_function

What happens here is that when we load the file, the code that is not in a function gets executed, printing a lot of things.

If we don't want this to happen we have to define a main method. A main method is run only when the script is run as a stand-alone script, but **not** when it is imported as a library (which is what we want).

In [None]:
#!/usr/bin/python3

#defs go here

def main():
    #stuff that only gets executed in stand-alone mode goes here
    pass #placeholder so that the empty function is valid python

if __name__ == "__main__":
    main()

Adjust my_library.py in that way.

In [None]:
#after import we can use the functions we imported
useful_function(lolspeak=True)

In [None]:
#not imported, so can't use it
last_element([1,2,3])

In [None]:
import my_library #imports the whole library
my_library.last_element([1,2,3]) #but we have to put the filename/module name in front of the function
#last_element([1,2,3]) #this still doesn't work

In [None]:
#or we can import everything from the whole library
from my_library import *
useful_function(lolspeak=True)
last_element([1,2,3])

## Useful links

a nice interactive beginners course (if you want to practice some of the things from here some more):
    
http://www.codecademy.com/

a blogpost on python books and courses:

http://simulatingcomplexity.wordpress.com/2014/11/03/python/

and a (challenging but not because of the programming) online game:

http://www.pythonchallenge.com/

an online course:

http://learnpythonthehardway.org/book/

The official python documentation:

https://docs.python.org/3.4/