# Python Basics, Part 2

Now that we know some of the basic "things" in `Python`---numbers, strings, `for` loops, `if` statements---it's important to learn how to, well, do things with things. Two big pieces we're currently missing are:
1. How do we manipulate our data---and, in particular, how do we do the _same_ thing to different things?
2. How do we store our work for later use, and how do we load up work that other people have done for us ahead of time?
The first problem is solved with _functions_, which allow you to "bottle" code you've already written and use it when you need it. (You're already a pro at this point---we've been using functions informally the entire day.) The second problem is solved with `Python`'s I/O (or "input / output") interface.

## 0. Advanced basic `Python` data types

Before we get started with more advanced topics, there are a few more data types that we need to be able to work with: dictionaries and tuples. Given how much you already know about numbers, strings, and lists, dictionaries and tuples will be a breeze.

### 0.0. Dictionaries

Dictionaries (or, more properly, `dicts`) work a lot like lists, except instead of looking up things in a dictionary by their _index_, we look them up by their _key_. It's easy to think of a `Python` dictionary as almost the same thing as a Webster's dictionary:
* __The Key:__ The "key" in a `Python` dictionary corresponds to the word you want to look up.
* __The Value:__ The "value" in a `Python` dictionary is whatever is stored at a key, which corresponds to a word's definition in a real dictionary.

(Dictionaries are known as *associative arrays* or *hash tables* in other languages.)

Dictionaries are constructed much like lists, except that they use curly braces instead of square brackets and hold key-value pairs separated by commas, like so: `{key1: value1, key2: value2, key3: value3}`. Let's try it:

In [1]:
gre_study_guide = {
    "bucolic": "adj. pastoral, rustic, countryfied",
    "tendentious": "adj. controversial, one-sided",
    "skulk": "v. to move in a stealthy or furtive manner"
}

Now, suppose the big test is tomorrow, but we can't remember the definition of "skulk." No problem---we'll just look it up in much the same way that we'd look something up in a list:

In [2]:
gre_study_guide["skulk"]

'v. to move in a stealthy or furtive manner'

#### 0.0.0. Exercise

Try it yourself! Make a dictionary with a few of your favorite words. Look up their definitions.

In [3]:
### START
my_words = {
    "flambeau": "n. a lit torch or decorative candlestick",
    "excursus": "n. a long intellectual digression in a speech or piece of writing",
    "facultative": "adj. occurring optionally in response to circumstances rather than by nature"
}
### END

Of course, like lists, dictionaries can store more than strings. For instance, you might use a dictionary to store information for something like an address book:

In [4]:
address_book = {
    "simon": {
        "first_name": "Simon",
        "last_name": "Cowell",
        "phone": "447911123456"
    },
    "paula": {
        "first_name": "Paula",
        "last_name": "Abdul",
        "phone": "5303228051"
    },
    "randy": {
        "first_name": "Randy",
        "last_name": "Jackson",
        "phone": "2122002099"
    }
}

Now we can find someone's contact information just by looking up their name.

In [5]:
address_book["randy"]

{'first_name': 'Randy', 'last_name': 'Jackson', 'phone': '2122002099'}

What's more, we can even go directly to their phone number by accessing the `"phone"` key in the dictionary returned after looking up `"randy"`!

In [6]:
address_book["randy"]["phone"]

'2122002099'

#### 0.0.1. Exercise

Use the `address_book` dictionary to find Paula's area code.

In [7]:
#### START
address_book["paula"]["phone"][:3]
#### END

'530'

#### 0.0.2. Methods and utility functions for dictionaries

Let's get familiar with how we can work with dictionaries.

In [8]:
me = {'name':'Hans', 'email':'jgaeb@stanford.edu'}
print(me)

{'name': 'Hans', 'email': 'jgaeb@stanford.edu'}


In [9]:
me['cell'] = '414-123-4567'
print(me)

{'name': 'Hans', 'email': 'jgaeb@stanford.edu', 'cell': '414-123-4567'}


You can add new `key:value` pairs by assigning to a new key, as above, or delete existing `key:value` pairs with the `del` statement.

In [10]:
del me['email']
print(me)

{'name': 'Hans', 'cell': '414-123-4567'}


The `key` of a dictionary can't be a list (because lists are mutable), but the `value` sure can!

In [11]:
me['siblings'] = ['Carrie', 'Karl']
print(me)

{'name': 'Hans', 'cell': '414-123-4567', 'siblings': ['Carrie', 'Karl']}
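
To see the other half of that claim in action, here is a minimal sketch of what happens when a list is used as a key. The `family` dictionary and the `key_error_message` variable are made up for this illustration.

```python
family = {}
family['siblings'] = ['Carrie', 'Karl']      # a list as a value: fine

try:
    family[['Carrie', 'Karl']] = 'siblings'  # a list as a key: not allowed
except TypeError as error:
    key_error_message = str(error)
    print('Python refused the list key:', key_error_message)

print(family)  # only the valid pair was stored
```

The failed assignment leaves the dictionary untouched, so only the `'siblings'` entry remains.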


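Use the `keys()` method of dictionary objects to get the keys used in the dictionary. (In Python 3 this returns a `dict_keys` view rather than a true list, but you can loop over it or pass it to `list()`.)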

In [12]:
me.keys()

dict_keys(['name', 'cell', 'siblings'])

And use the `in` keyword (which also works with lists) to see if a certain key exists in the dictionary.

In [13]:
'name' in me.keys()

True

In [14]:
'email' in me.keys()

False
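
As a small aside, `in` can also be applied to the dictionary itself, and the `get()` method offers a way to look up a key that might be missing. The sketch below uses an illustrative copy of the `me` dictionary, and the fallback text is made up.

```python
me = {'name': 'Hans', 'cell': '414-123-4567'}  # illustrative copy of the dictionary above

has_name = 'name' in me    # same answer as 'name' in me.keys()
has_email = 'email' in me
print(has_name, has_email)

# get() returns a fallback value instead of raising KeyError for a missing key
email = me.get('email', 'no email on file')
print(email)
```

Whether to test with `in` first or to rely on `get()` with a default is mostly a matter of taste; `get()` tends to be handy when a sensible fallback exists.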

When the keys are simple strings, it is sometimes easier to specify pairs using the `dict` constructor.

In [15]:
me = dict(name='Hans', email='jgaeb@stanford.edu', siblings=['Carrie', 'Karl'])
print(me)

{'name': 'Hans', 'email': 'jgaeb@stanford.edu', 'siblings': ['Carrie', 'Karl']}


Of course, it's worth keeping in mind that the `key` in a dictionary can be _anything_, as long as it's immutable (more precisely, hashable). So `key`s can be strings, numbers, or... tuples!

### 0.1. Tuples

The (very last!) data type we'll need to talk about in `Python` is tuples. Fortunately, tuples are much easier than lists or dicts. Think of tuples as lists that are _immutable_: once you've put some stuff in a tuple, you can't change it.

Tuples consist of a number of values separated by commas (not necessarily, but often, enclosed in parentheses).

In [16]:
description = 'male', 'dark hair'
print(description)

('male', 'dark hair')


In [17]:
description[0]  # tuples are also sequences, and can be indexed

'male'

In [18]:
description[1:]  # or sliced

('dark hair',)
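
The trailing comma in `('dark hair',)` is not a typo: it is the comma, not the parentheses, that makes a tuple. A minimal sketch of the difference (the variable names are made up for this illustration):

```python
not_a_tuple = ('dark hair')   # parentheses alone just group an expression
one_tuple = ('dark hair',)    # the trailing comma makes it a one-element tuple

print(type(not_a_tuple))
print(type(one_tuple), len(one_tuple))
```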

In [19]:
description[0] = 'female'  # but NOT changed, because they are immutable

TypeError: 'tuple' object does not support item assignment

While being immutable may seem like a minor difference from lists, the implications are quite big, and tuples are generally used for very different purposes compared to lists. For example, tuples can be used as the `key` for dictionaries (think sparse matrices). 

In [20]:
super_sparse_matrix = {(0, 0):1, (1000, 1000):1}  # a 1001*1001 matrix with only two non-zero elements
print(super_sparse_matrix)

{(0, 0): 1, (1000, 1000): 1}


In [21]:
word_matrix = {('apples', 'bananas'):1, ('apples', 'pears'):1}  # a matrix indexed by words
print(word_matrix)

{('apples', 'bananas'): 1, ('apples', 'pears'): 1}


There are many more data structures commonly used in `python`, but lists, dictionaries, and tuples pretty much cover the basics (not to mention that these three constitute enough to fully represent the [JSON](http://json.org/) format in `python`, something you might see some of this afternoon when you're working with APIs and scrapers.)

#### 0.1.0. Exercise

Create a tuple that contains all the people you live with. (__HINT:__ If you live by yourself, you'll need to try something like `("me",)`.) What happens if you try to add a roommate?

In [22]:
### START
apartment = ("Chandler", "Joey")
apartment[2] = "Eddie"
### END

TypeError: 'tuple' object does not support item assignment

### 0.2. List and dictionary comprehensions

List comprehension is `python`'s way of creating lists (and also other data structures) in a concise manner. In particular, a lot of `for` loops can be rewritten with list comprehensions.

#### 0.2.0. List comprehensions

One way to create a list of squares would be:

In [23]:
squares = []  # make an empty list
for x in range(10):
    squares.append(x**2)
    
print(squares)

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]


However, the more 'pythonic' way to do this is to use a list comprehension:

In [24]:
[x**2 for x in range(10)]

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

The command reads: 
> build a list out of the square of x (x\*\*2), for the values of x in `range(10)`

List comprehension can be used to build a list of tuples too.

In [25]:
[(x, y) for x in range(10) for y in range(10) if x*y == 21]

[(3, 7), (7, 3)]

Note the use of `if` to filter out pairs that we _don't_ want in our list.

This is equivalent to the nested `for` loop:

In [26]:
twenty_one = []
for x in range(10):
    for y in range(10):
        if x*y == 21:
            twenty_one.append((x, y))
            

print(twenty_one)

[(3, 7), (7, 3)]


#### 0.2.1. Exercise

Look back at Exercise 1.5. from earlier this morning. Using list comprehensions, you can solve the whole problem in just one line. We'll give you a few of the functions you need, including `uniform()`, `mean()`, and `on_board()`.

In [27]:
from random import uniform
from statistics import mean
from math import sqrt

def on_board(x, y):
    if (abs(x) > 1 or abs(y) > 1):
        raise Exception(
            'Your throw should lie in the square from (-1, -1) to (1, 1). The'
            ' throw you gave me was: ' + str((x, y))
        )
    if (sqrt(x ** 2 + y ** 2) < 1):
        return True
    else:
        return False

### START
4 * mean([on_board(uniform(-1,1), uniform(-1,1)) for i in range(0,10000)])
### END

3.136

__PROTIP:__ We needed to reload those functions because each notebook is a separate execution environment. What that means is that all of the cells know about all the variables and functions you've defined in the _other_ cells in the _same_ notebook, but nothing else, unless you explicitly `import` it. We'll be learning more about that in a bit.

Be aware that if the item of the list is a tuple, it must be parenthesized.

In [28]:
[x, y for x in range(10) for y in range(10) if x*y == 21]  # this won't work

SyntaxError: invalid syntax (<ipython-input-28-9c5a6788255d>, line 1)

List comprehension can also be used to build dictionaries!

For example, we can also build what's called a _sparse matrix_ using a dictionary. The position is indexed by words $x$ and $y$, and
    \begin{equation}
        (x, y) = \begin{cases}
            1 & \text{if} \quad y \quad \text{is longer than} \quad x \\
            0 & \text{otherwise}
        \end{cases}
    \end{equation}

(But note that the example below is not always the best way to do this! Implementation should depend on your context - what do you want to do with the data/matrix?)

In [29]:
words = ['anti', 'happy', 'evening', 'eagles', 'interior', 'zebra']
{(x,y):1 for x in words for y in words if len(y) > len(x)}

{('anti', 'happy'): 1,
 ('anti', 'evening'): 1,
 ('anti', 'eagles'): 1,
 ('anti', 'interior'): 1,
 ('anti', 'zebra'): 1,
 ('happy', 'evening'): 1,
 ('happy', 'eagles'): 1,
 ('happy', 'interior'): 1,
 ('evening', 'interior'): 1,
 ('eagles', 'evening'): 1,
 ('eagles', 'interior'): 1,
 ('zebra', 'evening'): 1,
 ('zebra', 'eagles'): 1,
 ('zebra', 'interior'): 1}
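
If the sparse matrix feels like a lot to take in, a simpler dictionary comprehension may help. The sketch below reuses the `words` list from above; `word_lengths` and `long_words` are names made up for this illustration.

```python
words = ['anti', 'happy', 'evening', 'eagles', 'interior', 'zebra']

# key: the word, value: its length
word_lengths = {word: len(word) for word in words}
print(word_lengths)

# the same kind of `if` filter used in list comprehensions works here too
long_words = {word: length for word, length in word_lengths.items() if length > 5}
print(long_words)
```

The pattern is always `{key_expression: value_expression for ... in ... if ...}`, with the `if` part being optional.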

## 1. Read/Write Files
Often, you will need to read some data into your `python` workspace, do something to/with said data, and then write the results to another file. We'll take a look at the most basic file read/write methods, which will get you started with your work.

### 1.0. File objects
Think of a `python` file object as a portal connecting your `python` workspace to a file on your hard drive. You can open a file object with the built-in `open(filename, mode)` function. The `filename` argument is a string specifying the file name, and the `mode` argument can be one of the following values, specifying whether you want to read from or write to the file:
- `'r'`: read
- `'w'`: write (overwrites any existing files with same filename)
- `'a'`: append (write additional to any existing data)

(You can also open files for both reading and writing with mode `'r+'`, but this is best avoided if possible.)

By default, files are opened in "text" mode (think: "files that can be opened and read by a human in a text editor"). Alternatively, you can open files in "binary mode" by appending a `b` to the `mode` argument (e.g., `wb`, `rb`, `ab`).

Remember that `open()` simply creates the 'portal', and you have to call additional methods on that file object to either read or write. Since reading can be a little more complicated, let's start with a simple write:

In [30]:
f = open('data/example.txt', 'w')
print(f)

<_io.TextIOWrapper name='data/example.txt' mode='w' encoding='UTF-8'>


Note that after creating the file object, an empty file with the given `filename` (in the above example, `example.txt` in the `data` directory) is created. Now, let's actually write something to it:

In [31]:
f.write('Something')

9

You can only write strings to a file object:

In [32]:
some_list = [1, 2, 3]
f.write(some_list)

TypeError: write() argument must be str, not list

To write anything other than a string, use the `str()` built-in function to convert it to a string first:

In [33]:
f.write(str(some_list))

9

You might notice that even though you've called `write()` a couple of times, the actual file on your hard drive doesn't necessarily get updated. That's because a file object's `write`s are kept in a buffer. To complete all the `write`s and close the file object, call the `close()` method:

In [34]:
f.close()

Note that using a closed file object will result in an error:

In [35]:
f.write('...')

ValueError: I/O operation on closed file.

#### 1.0.0. Exercise

Write a short poem ([here](https://en.wikipedia.org/wiki/Category:Poetic_form) are some poetic forms if you're having trouble nailing down your rhyme scheme), and then save each line to a file called `poem.txt` in the `data` directory. Open up the poem in `JupyterLab` to check that it saved correctly.

In [36]:
### START
poem = [
    "岱宗夫如何？齐鲁青未了。",
    "造化钟神秀，阴阳割昏晓。",
    "荡胸生曾云，决眦入归鸟。",
    "会当凌绝顶，一览众山小。"
]
f = open("data/poem.txt", 'w')
for line in poem:
    f.write(line + '\n')
f.close()
### END

### 1.1. Reading from a URL
Reading data from a URL in `python` is pretty simple, using the `urllib.request` module. The `urllib.request` module lets you open URLs in `read` mode, as if they were file objects.

Let's use `python`'s `urllib.request` to read Charles Dickens' "A Tale of Two Cities" from https://goo.gl/fHIeOi

(This is just for illustration. Note, there are other libraries that are usually more appropriate for reading/scraping web pages. We'll be learning about some of these this afternoon.)

In [37]:
from urllib.request import urlopen  # the import statement is used in python to import modules/libraries

link = urlopen('https://goo.gl/fHIeOi')  # open the url
print(link)
text = link.read()
link.close()  # just like file objects, url connections should be closed after you're done

<http.client.HTTPResponse object at 0x10c56cb90>


The `text` variable now contains the entire text of "A Tale of Two Cities". 

In [38]:
print(text[0:20])

b'A Tale of Two Cities'


Notice the `b` in front of the quotes. This indicates that the data in our `text` object is saved as `bytes` not strings.
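
To turn `bytes` into an ordinary string, you can decode them. The sketch below uses a short hard-coded `bytes` value rather than the downloaded text, and it assumes the data is UTF-8 encoded, which may not hold for every web page.

```python
raw = b'A Tale of Two Cities'   # bytes, like the data returned by read()
title = raw.decode('utf-8')     # assumption: the bytes are UTF-8 encoded text

print(type(raw), type(title))
print(title)
print(title.encode('utf-8') == raw)  # encode() goes back the other way
```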

Now let's try writing `text` to a file.

In [39]:
f = open('data/two_cities.txt', 'wb')  # open file object in write (bytes) mode
f.write(text)
f.close()

### 1.2. Reading from file objects
And now, we have a file to practice reading from! We can create a file object just like we did for writing, but with the `'r'` mode specified:

In [40]:
f = open('data/two_cities.txt', 'r')  # open file object in read mode

A file object will iterate over the contents of the file it is connected to. For example, the `readline()` method will read the file, one line at a time. And consecutive calls to `readline()` will keep giving you the next line:

In [41]:
print('first line:', f.readline())  # read the first line
print('second line:', f.readline())  # read the second line

first line: A Tale of Two Cities, by Charles Dickens

second line: 



Since the file object essentially provides an iterator over each line of the file, you can loop over the file object line-by-line. This is memory efficient, fast, and leads to simple code:

In [42]:
n = 1  # a simple counter to control the number of lines printed
for line in f:
    print(line)
    if n > 10: 
        break
    n += 1

[A story of the French Revolution]







CONTENTS











Book the First--Recalled to Life



Just like when writing, don't forget to close files after you're done!

In [43]:
f.close()

As your file I/O gets complex, opening and closing can become quite painful (e.g., what if an error occurs before you close the file object? what happens to the memory it's using?), and forgetting to close file objects is potentially dangerous. So, it's good practice to use the `with` and `as` keywords, which make sure that the file is properly closed after operations are finished, even if an error occurs during operations:

In [44]:
with open('data/two_cities.txt', 'r') as f:
    n = 1
    for line in f:
        print(line)
        if n > 10: break
        n += 1

A Tale of Two Cities, by Charles Dickens



[A story of the French Revolution]







CONTENTS
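
The same `with` pattern works for writing. Here is a minimal sketch that writes to and then reads from a scratch file; the file name `scratch.txt` and the temporary directory are made up for this demo so that nothing in your `data` directory is touched.

```python
import os
import tempfile

# hypothetical scratch file, created in a temporary directory for this demo
scratch_dir = tempfile.mkdtemp()
scratch_path = os.path.join(scratch_dir, 'scratch.txt')

with open(scratch_path, 'w') as f:
    f.write('first line\n')
    f.write('second line\n')

print(f.closed)  # the file was closed automatically when the block ended

with open(scratch_path, 'r') as f:
    lines = [line.rstrip('\n') for line in f]  # strip the trailing newlines

print(lines)
```

Note that there is no call to `close()` anywhere: leaving the indented block is what closes the file.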











#### 1.2.0. Exercise

1. Create a `python` dictionary that counts the number of occurrences of words, delimited by whitespace, in "A Tale of Two Cities". Do this using a comprehension.
2. Find words that occur between 500 and 700 times.

__HINT:__ If you do this naively using comprehensions, you might find that it takes a pretty long time! That's because if you do something like `{... for word in words}`, and the word `the` shows up in `words` 10,000 times, you're going to do the _same thing 10,000 times,_ once each time the word "the" comes up. To avoid wasting your (and your computer's!) time, you can use `Python`'s set datatype. A set is like a list, except that each element occurs only once and the elements have no particular order. To create a set from a list, simply call `set()` on it, e.g., `set(my_list)`. So, for example
```python
    >>> set([1,2,3])
    {1, 2, 3}
    >>> set([1,2,3,1,2,3])
    {1, 2, 3}
```
Sets can be used in more or less the same ways as lists and dicts. In particular, `for element in set` will work the way you expect!

If you're still having trouble, try just using the first 10,000 words of _A Tale of Two Cities_ to construct your dictionary.

In [45]:
### START
# Part 1
with open('data/two_cities.txt') as f:
    words = f.read().split()
    
tts_dict = {word:words.count(word) for word in set(words)}
    
# Part 2
common_tts_dict = {
    word:num
    for (word,num) in tts_dict.items()
    if 500 <= num <= 700
}

print(common_tts_dict)
### END

{'The': 574, 'have': 699, 'is': 688, 'said': 570, 'by': 527, 'be': 692, 'Mr.': 603, 'my': 568, 'him': 525, 'were': 630}
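
Outside of this exercise, the standard library's `collections.Counter` is a common alternative for counting, since it tallies everything in a single pass instead of calling `count()` once per distinct word. The sketch below uses a short made-up sentence rather than the full novel, and the bounds in the filter are chosen only to suit that sample.

```python
from collections import Counter

# a short stand-in for the list of words read from the file
sample_words = 'it was the best of times it was the worst of times'.split()

word_counts = Counter(sample_words)  # one pass over the list
print(word_counts['it'], word_counts['best'])

# the same kind of range filter as in Part 2, with bounds suited to this sample
repeated = {word: num for word, num in word_counts.items() if 2 <= num <= 3}
print(repeated)
```

A `Counter` behaves like a dictionary, so `items()` and comprehensions work on it just as they do on `tts_dict`.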


## 2. Functions: The powerhouse of the cell

One of the best ways to unlock the full potential of `Python` for data analysis is with functions. Functions allow you to automate repetitive tasks in a safer and easier to understand way than copying and pasting code.

### 2.0. Writing functions

Let's create a function to count the number of vowels in a given string:

In [46]:
def count_vowels(s):
    """Count the number of vowels in a string."""
    vowels = 'aeiouAEIOU'
    nvowels = [s.count(v) for v in vowels]  # count the number of each vowel in s
    return sum(nvowels)  # return the sum of elements in nvowel

# use the new function
count_vowels('Eels are delicious animals')

12

A few quick remarks about this syntax:
- The `def` keyword declares a function **def**inition, followed by a function name and the parenthesized list of formal parameters.
- The statements that form the body of the function start at the next line, and must be indented.
- The first statement of the function body can optionally be a string, also known as the [docstring](https://docs.python.org/3/tutorial/controlflow.html#tut-docstrings).
- Many tools use the docstring to give users meaningful information - so help yourself, make a habit of writing meaningful docstrings.
- Functions that don't finish with a `return` statement return `None` (a special `python` object for "Nothing").
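
The last two remarks can be checked with a tiny function. This is only a sketch, and `greet` is a name made up for the illustration.

```python
def greet(name):
    """Print a greeting for name."""
    print('Hello,', name)  # note: no return statement

result = greet('Python')
print(result)          # None, because greet() never returns a value
print(greet.__doc__)   # the docstring is stored on the function itself
```

Tools such as the built-in `help()` read that same docstring, which is one reason a good one is worth writing.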

Functions can also return a tuple of values. For example, let's modify our `count_vowels` function to return the number of vowels along with a `list` specifying the number of each vowel.

In [47]:
def count_vowels(s):
    """
    Count the number of vowels in a string.
    
    returns: number of vowels, list containing number of appearance for each vowel 
    """
    vowels = 'aeiouAEIOU'
    nvowels = [s.count(v) for v in vowels]  # count the number of each vowel in s
    return sum(nvowels), list(zip(vowels, nvowels))  # return the sum and a zipped list
                              
count_vowels('Eels are delicious animals')

(12,
 [('a', 3),
  ('e', 3),
  ('i', 3),
  ('o', 1),
  ('u', 1),
  ('A', 0),
  ('E', 1),
  ('I', 0),
  ('O', 0),
  ('U', 0)])

A returned tuple can also be 'unpacked' into multiple variables.

In [48]:
total_count, individual_count = count_vowels('Eels are delicious animals')
print('Found total', total_count, 'vowels, each vowel as follows:')
print(individual_count)

Found total 12 vowels, each vowel as follows:
[('a', 3), ('e', 3), ('i', 3), ('o', 1), ('u', 1), ('A', 0), ('E', 1), ('I', 0), ('O', 0), ('U', 0)]


#### 2.0.0. Exercise

Create a function that takes the following two arguments:
* `x_0`, which is a duple of numbers
* `x_1`, which is also a duple of numbers

and then returns the distance between the two points. Be sure to add a docstring and a `return` statement!

__HINT:__ Remember the formula
    \begin{equation}
        d(x_0, x_1) = \sqrt{(x_0[0] - x_1[0])^2 + (x_0[1] - x_1[1])^2}
    \end{equation}
and the `sqrt()` function.

In [49]:
### START
def distance(x_0, x_1):
    """Finds the distance between two points in __R__^2.
    Args:
        x_0: The first point, as a duple of floats.
        x_1: The second point, as a duple of floats.
    
    Returns:
        Distance between the points."""
    
    return sqrt((x_0[0] - x_1[0]) ** 2 + (x_0[1] - x_1[1]) ** 2)
### END

### 2.1. Functions with optional arguments
Let's further enhance the `count_vowels` function by letting the user specify
- which vowels to count ('aeiouAEIOU' by default)
- whether to return a single sum or a tuple of the sum and list (single sum by default)

This can be achieved by specifying default values in the function declaration.

In [50]:
def count_vowels(s, vowels = 'aeiouAEIOU', returnAll = False):
    """
    Count the number of vowels in a string.
    
    Args:  
        s: the string to count vowels from
        vowels: string of characters that should be considered 
            a vowel (default: aeiouAEIOU)
        returnAll: boolean indicating whether to return just the sum of 
            vowels (default: False) or a tuple of the sum of vowels and a 
            list of occurrences for each character
                 
    Returns: 
        number of vowels[ , list of vowel count] 
    """
    # count the number of each vowel in s
    nvowels = [s.count(v) for v in vowels]  
    if returnAll:
        # return the sum and a zipped list
        return sum(nvowels), list(zip(vowels, nvowels))  
    else:
        # return just the sum
        return sum(nvowels)  
                              
count_vowels('Eels are delicious animals')

12

In [51]:
count_vowels('Eels are delicious animals', vowels = 'aeiou')  # no caps

11

In [52]:
count_vowels('Eels are delicious animals', returnAll = True)  # give me EVERYTHING

(12,
 [('a', 3),
  ('e', 3),
  ('i', 3),
  ('o', 1),
  ('u', 1),
  ('A', 0),
  ('E', 1),
  ('I', 0),
  ('O', 0),
  ('U', 0)])

Be careful with having mutable defaults, though. Default values of a function's argument are shared between subsequent calls, and this might cause problems if you're manipulating the argument's value within the function. For example,

In [53]:
def fun(n, stuff=[]):
    """Illustrating issues with mutable defaults."""
    stuff.append(n)
    return stuff

print(fun(1))  # stuff is empty by default
print(fun(2))  # stuff was manipulated, and is now [1] from the previous call!
print(fun(3))  # even worse, stuff is now [1, 2] !!!

[1]
[1, 2]
[1, 2, 3]


This behavior isn't necessarily a problem, and it might even make sense in some contexts. However, it's definitely worth keeping in mind to avoid being surprised. If you want to prevent such behavior, one simple work-around is to set the default to `None`, and check if it is indeed `None`, before assigning the 'true' default, such as:

In [54]:
def fun(n, stuff=None):
    """Fix for mutable defaults."""
    if stuff is None:
        stuff = []
    stuff.append(n)
    return stuff

print(fun(1))  # unspecified argument stuff is None, then set to []
print(fun(2))  # unspecified argument stuff is None, then set to []
print(fun(3))  # unspecified argument stuff is None, then set to []
print(fun(3, [1,2]))  # and we can always specify stuff if we need to!

[1]
[2]
[3]
[1, 2, 3]


#### 2.1.0. Exercise

Modify your `distance()` function from Exercise 2.0.0. so that `x_1` is, by default, the point `(0,0)`.
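
One possible solution is sketched below (the docstring wording is ours, and `sqrt` is imported again so the cell stands alone). Using the tuple `(0, 0)` as the default is safe here, because tuples are immutable and so avoid the shared-default pitfall described above.

```python
from math import sqrt

def distance(x_0, x_1=(0, 0)):
    """Find the distance between two points in the plane.

    Args:
        x_0: The first point, as a pair of numbers.
        x_1: The second point, as a pair of numbers (default: the origin).

    Returns:
        The Euclidean distance between the points.
    """
    return sqrt((x_0[0] - x_1[0]) ** 2 + (x_0[1] - x_1[1]) ** 2)

print(distance((3, 4)))          # distance from the origin
print(distance((3, 4), (3, 0)))  # distance between two explicit points
```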

Finally, to capture an arbitrary number of arguments in a function, you can use the `*name` and `**name` parameters. Note that, if both are present, `*name` **must** occur before `**name`, and both must occur after all the formal parameters. When present, the `*name` parameter receives a tuple containing the positional arguments beyond the formal parameter list, and `**name` receives a dictionary containing the key-value pairs of the named arguments, except for those corresponding to a formal parameter. For example:

In [55]:
def fun(n, name='Jongbin', *arguments, **keywords):
    """Demo of *name and **name parameters."""
    print('\n' + '=' * 79)
    print('Function called with n=', n, end = ', ')  # values of the end argument will replace new lines (\n)
    print('Name=', name)
    print('Arguments received:')
    print(end = '\t')  # a tab character to print appropriate indents
    for arg in arguments:
        print(arg, '|', end = ' ')
    print('\nNamed arguments received:')
    print(end = '\t')  # a tab character to print appropriate indents
    for key, value in keywords.items():
        print(key, '=', value, '|', end = '')
        
fun(1)  # supply minimal arguments
fun(2, 'Padme', 'Amidala', 'Princess', 'testing additional arguments')  # some additional arguments
fun(2, 'Luke', gender='male', affiliation='Rebel Alliance', text='testing named arguments')  # named arguments
fun(3, 'Anakin', 'Skywalker', 'Jedi', 2015, weapon='Lightsaber', skill='force')  # both


Function called with n= 1, Name= Jongbin
Arguments received:
	
Named arguments received:
	
Function called with n= 2, Name= Padme
Arguments received:
	Amidala | Princess | testing additional arguments | 
Named arguments received:
	
Function called with n= 2, Name= Luke
Arguments received:
	
Named arguments received:
	gender = male |affiliation = Rebel Alliance |text = testing named arguments |
Function called with n= 3, Name= Anakin
Arguments received:
	Skywalker | Jedi | 2015 | 
Named arguments received:
	weapon = Lightsaber |skill = force |

Sometimes, the opposite situation may occur, where the required arguments are in a list/tuple or keyword arguments are in a dictionary, and you would like to unpack them programmatically in the function call. In such cases, you can use the `*name` and `**name` conventions introduced above in the function call. For example:

In [56]:
print('regular call:', range(1, 10, 2))  # the range function takes arguments (start, stop[, step])
args = [1, 10, 2]  # pack the arguments (equivalent to above) into a list
print('unpack from list:', range(*args))  # call the function by unpacking the list

regular call: range(1, 10, 2)
unpack from list: range(1, 10, 2)


In [57]:
def print_info(name, email, phone):
    """Quick demo of keyword argument unpacking."""
    print('Name:', name)
    print('email:', email)
    print('phone:', phone)
    
kwargs = {'name':'Jongbin Jung', 'email':'jongbin at stanford.edu', 'phone':'650-123-4567'}
print_info(**kwargs)

Name: Jongbin Jung
email: jongbin at stanford.edu
phone: 650-123-4567


#### 2.1.1. Exercise

Write a function `top_n(d, n=5)`, which takes a dictionary of word counts (such as that created in Exercise 1.2.0.) and an optional argument `n`, and prints words that have the top `n` count, along with the actual count. Don't forget to include a docstring!

__HINT:__ To sort a dictionary's keys by their values, use the built-in function `sorted(iterable, *, key=None, reverse=False)`; you can set the sorting `key` to the dictionary's value by setting `key=d.get`, and sort in descending order by setting `reverse=True`.

__HINT:__ You might want to create a simple word count dictionary to test your function.
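
As a rough sketch of how both hints fit together, here is `sorted()` applied to a tiny word count dictionary. The counts in `toy_counts` are made up.

```python
toy_counts = {'the': 12, 'whale': 3, 'sea': 7, 'a': 9}  # made-up counts

# sort the keys by their counts, largest first
by_count = sorted(toy_counts, key=toy_counts.get, reverse=True)
print(by_count)

# without a key function, the keys are simply sorted alphabetically
alphabetical = sorted(toy_counts)
print(alphabetical)
```

Note that `sorted()` returns a list of keys only; the counts still have to be looked up in the original dictionary.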

__BONUS CHALLENGE:__ Can you modify `top_n` so that instead of a dictionary `d`, it takes a series of named arguments, e.g., `a = 100, the = 99, of = 98`? (Reserved words such as `for` can't be used as argument names.)

In [58]:
### START
def top_n(d, n = 5):
    """Prints the `n` words occurring most frequently.
    
    Args:
        d: A dictionary which contains `key:value` pairs of the following sort:
            `word:count`, where `word` is a word, and `count` is the number of
            occurrences.
        n: The number of words to print.
    Returns:
        None. (Prints the top `n` words as a side-effect.)
    """
    top_words = sorted(d, key = d.get, reverse = True)[:n]
    for word in top_words:
        print(word, d[word])  # print each word along with its count

def top_n_bonus(n = 5, **kwargs):
    """Prints the `n` words occurring most frequently in the keyword arguments.
    
    Args:
        n: The number of words to print.
        kwargs: keyword arguments of the form `word = n`, where `n` is the word count.
    Returns:
        None. (Prints the top `n` words as a side-effect.)
    """
    top_words = sorted(kwargs, key = kwargs.get, reverse = True)[:n]
    for word in top_words:
        print(word, kwargs[word])  # print each word along with its count
### END

### 2.2. Modules
Once you start building functions, you might want to collect certain functions as a general 'toolbox' to be used across multiple projects. In `python`, you can put definitions in a file with a `.py` extension. Such a file is called a `module`. Once you save your functions into a `module`, you can `import` them. Let's practice with some examples.

For illustration purposes, let's create two modules that each contain one function of the same name:

In [59]:
# save this function to a file named module1.py
def speak():
    """Make module 1 say something"""
    print('Module 1 speaking ...')

In [60]:
# save this function to a file named module2.py
def speak():
    """Make module 2 say something"""
    print('Hi, this is module 2 speaking!')

You can import each module (and the functions in them) using the `import` statement as follows:

In [61]:
import module1
import module2

Note that the name you use in the `import` statement is just the file name of the module, without the `.py` extension. 

When you `import` a module, `python` creates an isolated 'space' for each module. This allows different modules to have functions of the same name, without causing confusion. But because of this, whenever you want to use a function from a certain module, you have to specify the module name before calling the function. Compare:

In [62]:
module1.speak()
module2.speak()

Module 1 speaking ...
Hi, this is module 2 speaking!


This can be a bit painful (and messy) if your module names get longer. There are typically two ways to work around this:
1. `import` with the `as` keyword to assign your own name to a module
1. assign your own function name to a module's function

Each approach is illustrated below; which to use depends on the context and personal style:

In [63]:
import module1 as m1
import module2 as m2
m1.speak()
m2.speak()

Module 1 speaking ...
Hi, this is module 2 speaking!


In [64]:
import module1
import module2

speak1 = module1.speak  # note the lack of parentheses
speak2 = module2.speak  # when assigning functions to a new name

speak1()
speak2()

Module 1 speaking ...
Hi, this is module 2 speaking!
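
A related shortcut, not one of the two approaches listed above, combines `from ... import` with `as` to import a single function under a new name. The sketch below uses the standard `math` module instead of `module1`, so it runs without any extra files; `square_root` is a name made up for the illustration.

```python
from math import sqrt as square_root  # import one function under a new name

print(square_root(16))
```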


Finally, modules can also be executed as standalone scripts. However, to do this, the module must know whether it's been `import`ed or executed. This is done in `python` by specifying a `__name__` variable within each module's 'space'. When a module is `import`ed, its `__name__` variable is set to the module's name (the filename without the `.py` extension):

In [65]:
module1.__name__

'module1'

However, if a module is executed, for example from the terminal with the command,
```bash
python module_name.py
```
then the `__name__` variable is set to `__main__`.

To illustrate this, let's create a new module, `module3.py`:

In [66]:
# save this code to a file named module3.py
def speak():
    """Make module 3 say something"""
    print('My __name__ is', __name__)
    

if __name__ == '__main__':
    speak()
    print('You\'ve executed me!')

My __name__ is __main__
You've executed me!


In [67]:
import module3

module3.speak()

My __name__ is module3


Now, instead of `import`ing module3, execute it from a command prompt with the command:
```bash
python module3.py
```
(you can open a command prompt within `spyder`)

The output should look like:
> `My __name__ is __main__` <br />
> `You've executed me!`
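
If you would rather see both behaviours side by side without leaving `python`, the sketch below writes a small throwaway module to a temporary folder, then runs it once as a script and once via `import`, each in a fresh interpreter. The module name `name_demo` and its contents are made up for this illustration, and the sketch assumes Python 3.7 or later (for `capture_output`).

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# Source code for a hypothetical module, used only for this demonstration.
source = '''
def speak():
    print('My __name__ is', __name__)

if __name__ == '__main__':
    speak()
    print('Executed as a script')
'''

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / 'name_demo.py'
    path.write_text(source)

    # Case 1: execute the file as a script, like `python name_demo.py`.
    run_as_script = subprocess.run(
        [sys.executable, str(path)],
        capture_output=True, text=True, check=True,
    ).stdout

    # Case 2: import the module from a fresh interpreter and call speak().
    run_as_import = subprocess.run(
        [sys.executable, '-c', 'import name_demo; name_demo.speak()'],
        capture_output=True, text=True, check=True, cwd=tmp,
    ).stdout

print(run_as_script)  # two lines: __name__ is __main__, then the extra message
print(run_as_import)  # one line: __name__ is name_demo
```

Only the first run prints the extra message, because the `if __name__ == '__main__':` block is skipped when the module is `import`ed.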

### 2.3. Using modules from the command line

When executing a module from the command prompt, you can also pass arguments to the module in the form of 
> `python filename.py arguments`

The arguments are passed to the module via a list in the `sys` standard module, and can be accessed as `sys.argv`. (*Standard* modules are modules that are built into `python`, like `statistics` or `math`; we've been using these since the warmup!) The first (position 0) element of `sys.argv` contains the name of the script as it was called, so arguments that are passed through the command prompt start from position 1.

For example, we can write a module that takes a single argument from the command prompt as follows:

In [68]:
# save this code to a file named module4.py
import sys  # import the standard module sys

if __name__ == '__main__':
    print('The first element of sys.argv is', sys.argv[0])
    print('The argument passed was:', sys.argv[1])

The first element of sys.argv is /usr/local/lib/python3.7/site-packages/ipykernel_launcher.py
The argument passed was: -f


Then, execute from the command prompt with an argument, for example:
> `python module4.py hello`

This should print to the screen:
> `The first element of sys.argv is module4.py` <br />
> `The argument passed was: hello`

Note that all arguments are passed as strings. If you want a different type, you will have to convert it within `python`, e.g., `int(sys.argv[1])` to convert the first argument into an integer.
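
As a rough sketch of that conversion, the helper below takes a list shaped like `sys.argv` and returns the first argument as an integer. The function name `parse_count` and the sample list are made up for illustration; in a real script you would pass `sys.argv` itself.

```python
def parse_count(argv):
    """Return the first command-line argument as an int (hypothetical helper)."""
    if len(argv) < 2:
        # No argument was given after the script name.
        raise SystemExit('usage: python script.py <count>')
    return int(argv[1])  # raises ValueError if the text is not a whole number


# Simulates what sys.argv would hold after `python module4.py 3`.
example_argv = ['module4.py', '3']

count = parse_count(example_argv)
print(count + 1)  # 4, because `count` is now an integer, not the string '3'
```

Without the `int(...)` call, `count + 1` would fail with a `TypeError`, since you cannot add a number to a string.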

(If you want to do some serious argument parsing, you should take a look at the [`argparse` module](https://docs.python.org/3/howto/argparse.html).)
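
To give a flavour of `argparse`, here is a small sketch that declares two positional arguments and lets `argparse` handle the string-to-integer conversion. The argument names `word` and `times` are invented for this example, and the list passed to `parse_args` stands in for what you would type at the command prompt.

```python
import argparse

parser = argparse.ArgumentParser(description='Print a word several times.')
parser.add_argument('word', help='the word to print')
parser.add_argument('times', type=int, help='how many times to print it')

# Passing a list simulates `python script.py hello 3`.
# Called with no arguments, parse_args() reads sys.argv[1:] instead.
args = parser.parse_args(['hello', '3'])

print(args.word * args.times)  # hellohellohello
```

Compared with indexing `sys.argv` by hand, this also gives you a `--help` message and an error message for missing or badly typed arguments.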

#### 2.3.0. Exercise

Expand your function from Exercise 2.1.0. into a module called `dictionary_maker.py` that can be executed with a target filename and an integer `n` as arguments, i.e.,
```bash
python dictionary_maker.py target_file.txt 5
```
which:
 1. Reads the contents of the target file,
 2. Generates a word occurrence count dictionary from the text, and
 3. Prints the word and count for the `n` words with the most occurrences.
 
__HINT:__ You shouldn't need to write any new functions---see if you can do this exercise entirely with functions you've already written.

In [72]:
# Put this code into `dictionary_maker.py`!

### START
# from sys import argv
# 
# def top_n(d, n = 5):
#     """Prints the `n` words occurring most frequently.
#     
#     Args:
#         d: A dictionary which contains `key:value` pairs of the following sort:
#             `word:count`, where `word` is a word, and `count` is the number of
#             occurrences.
#         n: The number of words to print.
#     Returns:
#         None. (Prints the top `n` words as a side-effect.)
#     """
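#     # Sort the words by count, largest first, and keep the top `n`.
#     top_words = sorted(d, key = d.get, reverse = True)[:n]
#     print([(word, d[word]) for word in top_words])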
# 
# def make_dict(f):
#     """Turns a file into a dictionary by tokenizing text using whitespace.
#     
#     Args:
#         f: A file object.
#     Returns:
#         A dictionary of `key:value` pairs where `key` is a token (i.e., word)
#         and `value` is the number of occurrences in the file.
#     """
#     words = f.read().split()  # no argument: split on any run of whitespace
#     return {word:words.count(word) for word in set(words)}
# 
# if __name__ == "__main__":
#     with open(argv[1], "r") as f:
#         d = make_dict(f)
#         top_n(d, n = int(argv[2]))  # command-line arguments arrive as strings
### END