# Introduction to Python
## Programming tools, 2016 Winter semester, CEU
_Jeno Pal_, Jan-Feb 2016

## Outline

* [Functions](#functions)
* [File Input-Output](#fileio)
* [Debugging](#debugging)
* [Coding style: PEP 8](#pep8)

<a id='functions'></a>

## Functions

Functions are objects that take inputs, do some calculations and possibly return an output.
__They are essential and should be used all the time.__ Why?

* they let you be organized and logical: ideally you should break whatever you do into small, easily testable pieces
* abstraction is key: you should not repeat yourself, ever
    * easier to modify later, easier to fix bugs

### Built-in functions

You have already seen some examples!

In [1]:
# any
print(any([1, 1, 0]))

# len
print(len(range(10)))

# str
print(str(42))

True
10
42


You can get information on a function in IPython by typing its name followed by a "?".

In [2]:
len?

These are functions that are loaded whenever you start Python: you don't have to import anything to use them. The list of built-in functions is [here](https://docs.python.org/3/library/functions.html).
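
The `?` syntax only works in IPython. As a small sketch of the plain-Python equivalents, the built-in `help()` function and the `__doc__` attribute give the same kind of information:

```python
# plain-Python alternatives to IPython's "len?"
help(len)             # prints the help text for len

# every function stores its documentation in the __doc__ attribute
print(len.__doc__)
```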

### User-defined functions

Beyond these built-in functions, you can create your own functions, too. This is a power that sets you free. :)
The syntax is

```
def name_of_function(arguments):
    """
    docstring
    """
    code block
    return something
```
Both arguments and return statements are optional. A function without a return statement returns `None`.

Let's see some examples!

In [3]:
# greet

def greeting(name):
    """
    Greet someone with the name "name"
    """
    print("Hello {}, how are you today?".format(name))
    
# this function has one argument, name
# it does not return anything

greeting("Mary")
greeting("John")

Hello Mary, how are you today?
Hello John, how are you today?


In [4]:
# functions can be used inside other functions

def greet_people(list_people):
    """
    Greet a list of people
    """
    for name in list_people:
        greeting(name)
        
greet_people(["Adele", "Meghan", "Katy", "Taylor"])

Hello Adele, how are you today?
Hello Meghan, how are you today?
Hello Katy, how are you today?
Hello Taylor, how are you today?


In [5]:
# inspect these functions for a moment
print(type(greeting))

greeting?

<class 'function'>


In [6]:
greeting??

Docstrings are what is shown when someone asks for help on a function.

* apart from very small, self-explanatory functions, you should always include a docstring describing in one sentence what the function does
* it is best to include a description of the arguments and their types

Python functions are very flexible:
    
* functions may have no arguments, several arguments or even a variable number of arguments
* functions may have an arbitrary number of return statements, even zero
    * execution of the function stops at the first return statement that is reached
* functions can be defined within functions
* functions can be arguments or return values of other functions

In [7]:
# a function without arguments

def print_smileys():
    """ This function prints smileys"""
    print(":-) :-D")
    
print_smileys()

:-) :-D


In [8]:
# a function with multiple return statements
# --- only the first one hit is executed
def is_even(num):
    """
    Return True if a number is even, False if it is odd.
    """
    if num % 2 == 0:
        return True
    else:
        return False
    
print(is_even(2))
print(is_even(3))

True
False


In [9]:
def print_foo():
    print("foo")
    
# a function that takes a function as argument

def do_twice(f):
    f()
    f()
    

do_twice(print_foo)

foo
foo
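
The list of flexible features also mentions a variable number of arguments and functions that are defined inside, and returned by, other functions. A minimal sketch of both follows; `add_all` and `make_multiplier` are made-up names for illustration:

```python
# a variable number of positional arguments is collected into a tuple
def add_all(*numbers):
    """Return the sum of all arguments."""
    total = 0
    for num in numbers:
        total += num
    return total


# a function defined inside a function, and returned as the result
def make_multiplier(factor):
    """Return a function that multiplies its input by factor."""
    def multiply(x):
        return x * factor
    return multiply


print(add_all())            # called with no arguments at all
print(add_all(1, 2, 3))

double = make_multiplier(2)
print(double(21))
```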


### Functions have their own namespaces

You can pass arguments to a function and define variables within a function. The important thing is that these exist only _within_ the function: after execution you can no longer access their values.

In [10]:
def foo(a, y):
    # a and y exist here, are passed as arguments
    print(a)
    print(y)
    b = a + 1
    print(b)

foo(2, 3)

# after the execution is done, you can't access b any more - you get an error
print(b)

2
3
3


NameError: name 'b' is not defined
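
The separation works in the other direction as well. As a small sketch (with made-up names), assigning to a variable inside a function creates a local variable and leaves a variable of the same name outside the function untouched:

```python
x = 10    # a variable defined outside the function


def shadow():
    # this x is a new local variable; the outer x is left untouched
    x = 99
    return x


print(shadow())   # value of the local x
print(x)          # the outer x is unchanged
```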

### Keyword arguments

A function may have special types of arguments called _keyword arguments_. The arguments you've seen so far are called _positional arguments_.

Positional arguments are determined by their positions: first, second, third... 

In [11]:
# the order defines the role of the inputs

# range(a, b): the integers in the interval [a, b)
# turn it into a list with list()
print(list(range(3, 6)))
# if a >= b, the list is empty
print(list(range(6, 3)))

[3, 4, 5]
[]


Keyword arguments are named: you pass values with the `f(arg_name=value)` syntax.

In [12]:
# write a version of range with named arguments!

def kw_range(from_num=0, to_num=0):
    return list(range(from_num, to_num))

print(kw_range(from_num=3, to_num=6))

# order does not matter
print(kw_range(to_num=6, from_num=3))

[3, 4, 5]
[3, 4, 5]


In [13]:
# the function definition involves default values for the named parameters
print(kw_range(to_num=5))
print(kw_range(from_num=-2))
print(kw_range())

[0, 1, 2, 3, 4]
[-2, -1]
[]


Keyword arguments are useful if there are many parameters whose order is hard to remember. Default values are also useful: you often call a function with inputs that rarely change, and making the usual value the default saves typing.
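
Positional and keyword arguments can be combined in one function. The sketch below uses a hypothetical function `describe_file` with one required positional argument and two keyword arguments with default values:

```python
def describe_file(filename, delimiter=",", has_header=True):
    """
    Build a one-line description of a delimited text file.
    filename is positional; delimiter and has_header have defaults.
    """
    header_text = "with" if has_header else "without"
    return "{} uses {!r} as delimiter, {} a header".format(
        filename, delimiter, header_text)


# only the required argument: the defaults are used for the rest
print(describe_file("data.csv"))

# override just one default, by name
print(describe_file("data.tsv", delimiter="\t"))

# in a call, positional arguments come before keyword arguments
print(describe_file("raw.csv", has_header=False, delimiter=";"))
```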

<a id='fileio'></a>

## File Input-Output

Reading and writing files is essential since data is stored in files.

### Basic reading and writing

Let's create a new file, write text into it and save it.

In [15]:
# this creates a new file object: open's first argument is the file's name, "w" is for writing
f = open("newfile.txt", "w")                

# the file object's write method takes a string and writes it. "\n" is a newline character.
f.write("Hello World!\n")
f.write("Python is a lot of fun!")

# if we open a file object, we have to close it, too. Upon closing, it gets saved.
f.close()

Let's read the file we've just created.

In [16]:
# f is again a file object, "r" stands for reading
f = open("newfile.txt", "r")

# most of the time we want to read a file line by line.
# you can iterate on f to get the lines!
for line in f:
    print(line)
    
# we have to close the file
f.close()

Hello World!

Python is a lot of fun!


The `with ... as` statement spares you from having to remember to close the file: it is closed automatically when the indented block ends. I use this all the time.

In [17]:
with open("newfile.txt", "r") as f_in:
    for line in f_in:
        print(line)

Hello World!

Python is a lot of fun!


In [18]:
with open("newfile_2.txt", "w") as f_out:
    for text in ["Santa", "Claus", "is", "coming", "to", "town!"]:
        f_out.write(text + "\n")
        
with open("newfile_2.txt", "r") as f_in:
    for line in f_in:
        print(line)

Santa

Claus

is

coming

to

town!
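
The empty lines in the outputs above appear because every line read from the file still ends with "\n", and `print` adds a line break of its own. A small sketch of how to avoid this, and of checking that `with` really closed the file, follows. The file name and the temporary directory are only illustrative choices:

```python
import os
import tempfile

# write the example into a temporary directory so nothing is overwritten
path = os.path.join(tempfile.mkdtemp(), "newfile_3.txt")

with open(path, "w") as f_out:
    for text in ["Santa", "Claus", "is", "coming"]:
        f_out.write(text + "\n")

# the file is closed automatically once the with block ends
print(f_out.closed)

lines = []
with open(path, "r") as f_in:
    for line in f_in:
        # drop the trailing "\n" so print does not add a second line break
        lines.append(line.rstrip("\n"))

for line in lines:
    print(line)
```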



### Paths

OK - where is the "newfile.txt" file stored? In the current working directory (cwd). What is that?

* in IPython, you can get that by typing `pwd`
* alternatively:

In [19]:
import os
os.getcwd()

'/home/paljenczy/Dropbox/Programming_Tools/Winter2016/progtools2-2016-winter/code'

You can specify where you want to store a new file by specifying a path to it.

* absolute path: the full path to the file
* relative path: the path relative to the current working directory

In [20]:
pwd

'/home/paljenczy/Dropbox/Programming_Tools/Winter2016/progtools2-2016-winter/code'

You can change the current working directory

* in IPython, use `cd`

You can list the content of the current working directory with `ls` in IPython.
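
The same things can be done in plain Python with the `os` module. This is a sketch; the path "data/example.txt" is hypothetical and the file does not need to exist for the path functions to work:

```python
import os

cwd = os.getcwd()

# a relative path is interpreted starting from the current working directory
relative_path = os.path.join("data", "example.txt")   # hypothetical file

# os.path.abspath turns it into the corresponding absolute path
absolute_path = os.path.abspath(relative_path)

print(relative_path)
print(absolute_path)
print(os.path.isabs(relative_path), os.path.isabs(absolute_path))

# os.listdir lists the content of a directory, like ls in IPython;
# os.chdir(some_directory) would change the working directory, like cd
print(os.listdir(cwd))
```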

### The csv module

#### Simple writing and reading

Data is best stored as text files as opposed to application-specific file formats (e.g., `.dta` in Stata). It's better because it can be read by many applications (including Stata) - and even humans, too.

`.csv` stands for [comma (or character) separated values](http://en.wikipedia.org/wiki/Comma-separated_values). It is a plain text file whose rows contain fields separated by a fixed character (most often a comma or a tab).

* example: "../data/UNRATE.csv": go and open it with a text editor

Python's `csv` module allows us to work with .csv files. Remember to import it!

In [21]:
# read a .csv file with the previous method and print its header and first 10 data rows
# each line is read as one string
with open("../data/UNRATE.csv", "r") as f:
    count = 0
    for line in f:
        print(line)
        count += 1
        if count > 10:
            break

DATE,VALUE

1948-01-01,3.4

1948-02-01,3.8

1948-03-01,4.0

1948-04-01,3.9

1948-05-01,3.5

1948-06-01,3.6

1948-07-01,3.6

1948-08-01,3.9

1948-09-01,3.8

1948-10-01,3.7



In [22]:
import csv 

# read a .csv file with the csv module's reader
with open("../data/UNRATE.csv", "r") as f:
    reader = csv.reader(f, delimiter=",")
    count = 0
    for line in reader:
        print(line)
        count += 1
        if count > 10:
            break

['DATE', 'VALUE']
['1948-01-01', '3.4']
['1948-02-01', '3.8']
['1948-03-01', '4.0']
['1948-04-01', '3.9']
['1948-05-01', '3.5']
['1948-06-01', '3.6']
['1948-07-01', '3.6']
['1948-08-01', '3.9']
['1948-09-01', '3.8']
['1948-10-01', '3.7']


So we can iterate on the reader. Why is this better than iterating on the open file like before?

* `csv.reader` splits each line into fields at the delimiting character
* each row is returned as a list of strings, one item per field
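
The reader also handles quoted fields that contain the delimiter, which a simple split on commas gets wrong. The sketch below uses made-up data, with `io.StringIO` standing in for an open file:

```python
import csv
import io

# made-up data: the second field of the data row contains a comma inside quotes
raw_text = 'city,description\nBudapest,"capital, on the Danube"\n'

# naive approach: split every line on commas
naive_rows = [line.split(",") for line in raw_text.splitlines()]

# csv.reader accepts any iterable of lines; io.StringIO stands in for a file
csv_rows = list(csv.reader(io.StringIO(raw_text), delimiter=","))

print(naive_rows[1])   # the quoted field is cut in two
print(csv_rows[1])     # the quoted field is kept together
```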

In [23]:
import csv

# writing: collect data to be written to list of lists
# one list: one row
countries_capitals = [["Hungary", "Budapest"], ["Poland", "Warsaw"], ["Croatia", "Zagreb"]]

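# newline="" is what the csv documentation recommends for files used with the csv module
with open("../data/countries_capitals.csv", "w", newline="") as f_out: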
    # this time we use a tab for delimiting
    writer = csv.writer(f_out, delimiter="\t")
    # write a header row
    writer.writerow(["country", "capital"])
    for c_c in countries_capitals:
        writer.writerow(c_c)

#### DictReader, DictWriter

If you have a .csv file, it may be convenient to read each row as a dictionary: field names as keys and the respective field values as values. You can also write rows from a list of dicts.

In [24]:
# Let's read in the country file we've just written

with open("../data/countries_capitals.csv", "r") as f_in:
    d_reader = csv.DictReader(f_in, delimiter="\t")
    for xx in d_reader:
        print(type(xx))
        print(xx)

<class 'dict'>
{'country': 'Hungary', 'capital': 'Budapest'}
<class 'dict'>
{'country': 'Poland', 'capital': 'Warsaw'}
<class 'dict'>
{'country': 'Croatia', 'capital': 'Zagreb'}


In [25]:
# writing is similar: you have the data in a list of dicts 
countries_data = [{"country": "Hungary", "capital": "Budapest", "currency": "Forint"},
                  {"country": "Poland", "capital": "Warsaw", "currency": "Zloty"},
                  {"country": "Croatia", "capital": "Zagreb", "currency": "Kuna"}]
                       
# important: you have to pass a fieldnames argument, a list of the fields
# to be written; it also sets the order of the columns
with open("../data/countries_capitals_currencies.csv", "w", newline="") as f_out:
    d_writer = csv.DictWriter(f_out, fieldnames=["country", "currency", "capital"])
    # header is written: fieldnames
    d_writer.writeheader()
    for dd in countries_data:
        d_writer.writerow(dd)
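
To see what `DictWriter` actually produces without touching the disk, the sketch below writes to an in-memory buffer (`io.StringIO`, used here only as a stand-in for a file) and reads the result back with `DictReader`:

```python
import csv
import io

rows = [{"country": "Hungary", "capital": "Budapest", "currency": "Forint"},
        {"country": "Poland", "capital": "Warsaw", "currency": "Zloty"}]

# io.StringIO is an in-memory stand-in for a file opened for writing
buffer = io.StringIO()
d_writer = csv.DictWriter(buffer, fieldnames=["country", "currency", "capital"])
d_writer.writeheader()
d_writer.writerows(rows)      # writes all the dicts at once

written_text = buffer.getvalue()
print(written_text)

# read it back: the column order follows fieldnames, not the dicts
d_reader = csv.DictReader(io.StringIO(written_text))
print(d_reader.fieldnames)
first_row = next(d_reader)
print(first_row["currency"])
```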

<a id='debugging'></a>

## Debugging

A large chunk of time when writing code is spent on fixing things that don't work - debugging.

### Easy but useful: printing

A not-so-sophisticated but sometimes useful method is to print out variables to see what's going on inside your code (e.g., what values does a variable have in a loop?). It's ok to do that but it's not very effective.
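
A minimal sketch of this approach: a temporary `print` inside a loop shows how the variables change at each step. The function `average` is a made-up example:

```python
def average(values):
    """Return the arithmetic mean of a list of numbers."""
    total = 0
    for i, value in enumerate(values):
        total += value
        # temporary debugging output: remove it once the code works
        print("step {}: value={}, total={}".format(i, value, total))
    return total / len(values)


print(average([2, 4, 9]))
```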

### IPython's `debug` magic and pdb

Let's try to write a function that takes a dict of data and prints a message based on it.

In [49]:
def print_country_info(d_info):
    """
    Based on a dictionary containing information about a country,
    this function prints a sentence about a country.
    """
    print("The capital of {} is {}".format(d_info["country"], d_info["capital"]))

In [54]:
# an example dictionary
d_info = {"country": "Switzerland", "capitol": "Bern"}

In [55]:
# try to use our function
print_country_info(d_info)

KeyError: 'capital'

Whoops, we got an error. Why? Where did things go wrong? Now see the magic: type `debug` in a new cell, then press `Shift + Enter`.

In [57]:
debug

> <ipython-input-49-385d06509084>(6)print_country_info()
      4     a sentence about a country.
      5     """
----> 6     print("The capital of {} is {}".format(d_info["country"], d_info["capital"]))

ipdb> d_info
{'country': 'Switzerland', 'capitol': 'Bern'}
ipdb> q


You are put back into the code at the point where the error happened. This lets you inspect variables and objects and experiment with them to find out what went wrong.

* this works when you execute code with IPython, and even here, in IPython Notebook

<a id='pep8'></a>

## Coding style and PEP 8

Coding style is a set of recommendations for how to write code.

* you can give (almost) any name to a variable
* but typically you don't want to
* you want to help yourself, your future self and others who read your code by writing according to a consistent set of rules

In [58]:
first_name = "Jeno"       # lowercase words separated by underscores
lastName = "Pal"          # this is called mixed case

# both names are valid, but mixing the two styles suggests a difference
# between the variables that is not there, which can be misleading

For Python, you want to adhere to a set of rules called [PEP 8](https://www.python.org/dev/peps/pep-0008/). You should skim through it and gradually start applying it in your code.
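
As a short sketch of a few PEP 8 conventions (naming, indentation and spacing), consider the following. All names in it are made up for illustration:

```python
# constants: upper case with underscores
MAX_RETRIES = 3


# classes: CapWords
class CountryRecord:
    """Store a country's name and its capital (hypothetical example class)."""

    def __init__(self, name, capital):
        self.name = name
        self.capital = capital


# functions and variables: lowercase with underscores
def format_country(record):
    """Return a one-sentence description of a CountryRecord."""
    return "The capital of {} is {}".format(record.name, record.capital)


# 4 spaces per indentation level, a space after commas and around "="
swiss_record = CountryRecord("Switzerland", "Bern")
print(format_country(swiss_record))
```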

Type `import this` at the prompt to read more on Python's philosophy!

In [59]:
import this

The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!
