# Python Basics, Part 2

Now that we know some of the basic "things" in `Python` (numbers, strings, dicts, lists), it's important to learn how to, well, do things with things. Some of the basics are doing different things depending on what our data are, transforming and manipulating data, saving and retrieving data, and structuring data. Generally speaking, in `Python`, changing how your code handles different inputs is accomplished using _control flow_, data manipulation is handled with _functions_, saving and retrieving data is handled with file objects from Python's `io` module, and data structuring is done with _objects_.

## Control flow

There are three basic ways of changing your code's behavior depending on the input: `if` statements, `for` statements, and `while` statements. At a high level, `if` statements only do something (surprise, surprise) if something is true: you only take the square root of a number if it's positive or open a file if it actually exists. On the other hand, `for` statements do the same thing for everything in a group: if you want to double every number in a list or extract the phone number of everyone in your address book, use a `for` statement. While they're very important in other languages, `while` statements are very rarely necessary in `Python`. If you're using a `while` statement, chances are you could do the same thing more safely and more cleanly with a `for` statement.
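To make the `while` versus `for` comparison concrete, here is a minimal sketch that doubles every number in a list both ways. The list `numbers` and the variable names are made up for illustration.

```python
numbers = [1, 2, 3]  # hypothetical example data

# while version: we have to manage the index ourselves
doubled_while = []
i = 0
while i < len(numbers):
    doubled_while.append(numbers[i] * 2)
    i += 1  # forgetting this line would make the loop run forever

# for version: no index to manage, so that mistake cannot happen
doubled_for = []
for number in numbers:
    doubled_for.append(number * 2)

print(doubled_while)
print(doubled_for)
```

Both loops build the same list; the `for` version simply has fewer moving parts.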

### `if` statements
An example should suffice:

In [1]:
x = int(input('Give me a BIG number: '))
if x < 0:
    print('You\'re joking, right?')
elif x < 1e3:
    print('Try harder ... ')
else:
    print('Nice.')

Give me a BIG number:  10


Try harder ... 


Some notes on the above code:
- the `input()` function (as you've now seen) prompts the user for input
- the `int()` function (tries to) convert a string value to an integer (`input()` always returns the user's input as a string)
- `elif` is short for "else if", and there can be zero or more `elif` clauses
- the `else` clause is optional
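Because `input()` hands back a string, the conversion step matters. The sketch below uses a hard-coded string in place of real user input (an assumption, so that it runs without prompting) to show the difference.

```python
raw = '10'  # stands in for what input() returns when the user types 10

print(type(raw))     # a str, not a number
number = int(raw)    # convert the string to an integer
print(type(number))  # an int
print(number < 1e3)  # the numeric comparison now works
```

Comparing the unconverted string with a number (`raw < 1e3`) would raise a `TypeError` in Python 3.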

One more thing that's implicit but *__extremely__* important: **indents.**

- `Python`, unlike many other languages out there, doesn't use curly braces `{}` to group code
- instead, blocks of grouped code are identified by the level of indents (this is something to get used to, if you've never seen it before)
- word of caution: NEVER USE <kbd>Tab</kbd> (although, don't worry too much: `JupyterLab` changes all your <kbd>Tab</kbd>s to four spaces by default, which is the [PEP 8 spec for indentation][pep8] in `python`)

[pep8]: https://www.python.org/dev/peps/pep-0008/#indentation
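A small sketch of how the indentation level decides which lines belong to which block. The value of `x` and the list name `messages` are arbitrary choices for illustration.

```python
x = 5
messages = []

if x > 0:
    messages.append('positive')     # runs only when x > 0
    if x > 100:
        messages.append('and big')  # runs only when x > 0 and x > 100
messages.append('done')             # not indented, so it always runs

print(messages)
```

Moving the last line four spaces to the right would make it part of the outer `if` block instead.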

### `for` statements
The `for` statement in `python` iterates over the items of any sequence (e.g., lists and even strings!), in the order that they appear in the sequence.

In [2]:
names = ['Jamie', 'Cersei', 'Jon', 'Sansa']

for name in names:
    print(name, 'has', len(name), 'characters and starts with a', name[0])

Jamie has 5 characters and starts with a J
Cersei has 6 characters and starts with a C
Jon has 3 characters and starts with a J
Sansa has 5 characters and starts with a S


The example above introduces a few new concepts:
- the variable `name` is defined along with the declaration of the `for` statement. It doesn't need to exist beforehand
- it's good practice to use plurals for collections (`names` for the list) and singulars for individual items (`name` for each name)

Since strings are like lists, you can also loop over a string, one character at a time.

In [3]:
vowels = ['a', 'e', 'i', 'o', 'u']  # make a list of vowels
for name in names:
    vowel_count = 0  # initialize the vowel count
    for char in name:
        if char in vowels:
            vowel_count += 1
    print(name, 'has', vowel_count, 'vowel(s)')

Jamie has 3 vowel(s)
Cersei has 3 vowel(s)
Jon has 1 vowel(s)
Sansa has 2 vowel(s)


You can use the built-in `range()` function to do a more 'classic' `for` loop over a sequence of numbers.

In [4]:
for i in range(10): print(i)

0
1
2
3
4
5
6
7
8
9


`range(len)` generates the legal indices (starting from 0) for a sequence of length `len`. You can also use `range(start, stop[, step])` to specify the start, end, and (optionally) step to take.

(The `[, step]` notation in the function signature shows that the `step` argument is optional. It's useful to know such conventions when referring to the docs.)

In [5]:
for i in range(4,8): print(i)

4
5
6
7


In [6]:
for i in range(4,8,2): print(i)

4
6


In [7]:
for i in range(20,4,-3): print(i)

20
17
14
11
8
5


You can combine `range()` with `len()` to iterate over the indices of a sequence.

In [8]:
for i in range(len(names)):
    print('Name', i, 'is', names[i])

Name 0 is Jamie
Name 1 is Cersei
Name 2 is Jon
Name 3 is Sansa


But in such cases, the `enumerate()` function is usually more convenient.

In [9]:
for i, name in enumerate(names):
    print('Name', i, 'is', name)

Name 0 is Jamie
Name 1 is Cersei
Name 2 is Jon
Name 3 is Sansa


As you might have guessed, the `enumerate()` function takes a sequence and returns an (index, value) pair for each item (the 'pairs' are actually `tuple`s!). In the `for` statement, you can assign each item of such a `tuple` to its own variable.

In [10]:
print(list(enumerate(names)))

[(0, 'Jamie'), (1, 'Cersei'), (2, 'Jon'), (3, 'Sansa')]
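The tuple unpacking described above also works outside of a `for` statement. A minimal sketch, with a hand-written tuple standing in for one of the pairs that `enumerate()` produces:

```python
pair = (0, 'Jamie')  # one (index, value) tuple, like those enumerate() produces
i, name = pair       # unpack the tuple into two variables
print(i, name)

names = ['Jamie', 'Cersei']
# enumerate() accepts an optional start argument if counting from 0 is not wanted
for position, name in enumerate(names, start=1):
    print(position, name)
```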


Occasionally, you might want to loop over two or more sequences at a time. You can pair the entries with the `zip()` function.

In [11]:
title = 'Game of Thrones'
houses = ['Lannister', 'Lannister', 'Snow', 'Stark']
for char, house, name in zip(title, houses, names):
    print(char, '-', name, house)

G - Jamie Lannister
a - Cersei Lannister
m - Jon Snow
e - Sansa Stark


Note how `zip()` gracefully fits the iterator to the length of the shortest sequence, i.e., only the first four characters of the string 'Game of Thrones' were iterated.
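If silently dropping the leftover items is not what you want, the standard library's `itertools.zip_longest()` pads the shorter sequences instead. A minimal sketch with made-up data:

```python
from itertools import zip_longest

letters = 'abc'
numbers = [1, 2]

short = list(zip(letters, numbers))           # stops at the shortest input
padded = list(zip_longest(letters, numbers))  # pads missing values with None

print(short)
print(padded)
```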

### `break` and `continue` statements
You can manage your loops in more detail using `break` and `continue` statements. 

A `break` statement, as the name implies, will break you out of the smallest enclosing loop.

In [12]:
for name, house in zip(names, houses):
    if house == 'Snow':
        break
    else:
        print(name, house)

Jamie Lannister
Cersei Lannister


A `continue` statement will simply skip over to the next item in the iterator, instead of breaking out of the loop.

In [13]:
for name, house in zip(names, houses):
    if house == 'Snow':
        continue  # compare to the previous example where we stopped the loop at Snow, now we simply skip it
    else:
        print(name, house)

Jamie Lannister
Cersei Lannister
Sansa Stark


## Read/Write Files
Often, you will need to read some data into your `python` workspace, do something to/with said data, and then write the results to another file. We'll take a look at the most basic file read/write methods, which will get you started with your work.

### File objects
Think of a `python` file object as a portal connecting your `python` workspace to a file on your hard drive. You can open a file object with the built-in `open(filename, mode)` function. The `filename` argument is a string specifying the file name, and the `mode` argument can be one of the following values, specifying whether you want to read from or write to the file:
- `'r'`: read
- `'w'`: write (overwrites any existing files with same filename)
- `'a'`: append (write additional to any existing data)

(you can also open files for both read/write with mode `'r+'`, but this is best avoided if possible)

By default, files are opened in "text" mode (think: files that can be opened and read by a human in a text editor). Alternatively, you can open files in "binary" mode by appending a `b` to the `mode` argument (e.g., `wb`, `rb`, `ab`). 

Remember that `open()` simply creates the 'portal', and you have to call additional methods on that file object to either read or write. Since reading can be a little more complicated, let's start with a simple write:

In [14]:
f = open('data/example.txt', 'w')
print(f)

<_io.TextIOWrapper name='data/example.txt' mode='w' encoding='UTF-8'>


Note that after creating the file object, an empty file with the given name (in the above example, `data/example.txt`) is created. The path is relative to your working directory, and the `data` folder must already exist. Now, let's actually write something to it:

In [15]:
f.write('Something')

9

You can only write strings to a file object:

In [16]:
some_list = [1, 2, 3]
f.write(some_list)

TypeError: write() argument must be str, not list

To write anything other than a string, use the `str()` built-in function to convert it to a string first:

In [17]:
f.write(str(some_list))

9

You might notice that even though you've called `write()` a couple of times, the actual file on your hard drive doesn't necessarily get updated. That's because a file object's `write`s are kept in a buffer. To complete all the `write`s and close the file object, call the `close()` method:

In [18]:
f.close()

Note that using a closed file object will result in an error:

In [19]:
f.write('...')

ValueError: I/O operation on closed file.
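To add to a file after closing it, reopen it in append mode. The sketch below contrasts `'w'` and `'a'`. It writes to a throwaway temporary directory (an assumption, so it does not touch your `data` folder) and uses the file object's `read()` method to check the result.

```python
import os
import tempfile

# use a throwaway directory so the sketch does not touch your real files
path = os.path.join(tempfile.mkdtemp(), 'example.txt')

f = open(path, 'w')  # 'w' creates the file (or empties an existing one)
f.write('first\n')
f.close()

f = open(path, 'a')  # 'a' keeps what is there and writes at the end
f.write('second\n')
f.close()

f = open(path, 'r')
contents = f.read()  # read() returns the whole file as one string
f.close()
print(contents)
```

Opening the file with `'w'` a second time would have discarded the first line.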

### Reading from a URL
Reading data from a URL in `python` is pretty simple, using the `urllib.request` module. The `urllib.request` module lets you open URLs in `read` mode, as if they were file objects.

Let's use `python`'s `urllib.request` to read Charles Dickens' "A Tale of Two Cities" from https://goo.gl/fHIeOi

(This is just for illustration. Note, there are other libraries that are usually more appropriate for reading/scraping web pages.)

In [20]:
from urllib.request import urlopen  # the import statement is used in python to import modules/libraries

link = urlopen('https://goo.gl/fHIeOi')  # open the url
print(link)
text = link.read()
link.close()  # just like file objects, url connections should be closed after you're done

<http.client.HTTPResponse object at 0x10c4c3d50>


The `text` variable now contains the entire text of "A Tale of Two Cities". 

In [21]:
print(text[0:20])

b'A Tale of Two Cities'


Notice the `b` in front of the quotes. This indicates that the data in our `text` object is stored as `bytes`, not as a string (`str`).
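Converting between `bytes` and `str` is done with `decode()` and `encode()`. A minimal sketch, using a short hand-written `bytes` value in place of the downloaded text and assuming the bytes are UTF-8 encoded:

```python
data = b'A Tale of Two Cities'  # a bytes object, like the one read() returned

as_text = data.decode('utf-8')  # bytes -> str (assuming UTF-8 encoded bytes)
back = as_text.encode('utf-8')  # str -> bytes

print(type(data), type(as_text))
print(data[0], as_text[0])  # indexing bytes gives an integer, indexing str gives a character
```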

Now let's try writing `text` to a file.

In [22]:
f = open('data/two_cities.txt', 'wb')  # open file object in write (bytes) mode
f.write(text)
f.close()

### Reading from file objects
And now, we have a file to practice reading from! We can create a file object just like we did for writing, but with the `'r'` mode specified:

In [24]:
f = open('data/two_cities.txt', 'r')  # open file object in read mode

A file object will iterate over the contents of the file it is connected to. For example, the `readline()` method will read the file, one line at a time. And consecutive calls to `readline()` will keep giving you the next line:

In [25]:
print('first line:', f.readline())  # read the first line
print('second line:', f.readline())  # read the second line

first line: A Tale of Two Cities, by Charles Dickens

second line: 



Since the file object essentially provides an iterator over each line of the file, you can loop over the file object line-by-line. This is memory efficient, fast, and leads to simple code:

In [26]:
n = 1  # a simple counter to control the number of lines printed
for line in f:
    print(line)
    if n > 10: 
        break
    n += 1

[A story of the French Revolution]







CONTENTS











Book the First--Recalled to Life
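Some of the blank lines in the output above appear because each `line` already ends with a newline character and `print()` adds another one. Here is a minimal sketch of one way around that, using a small throwaway file (its contents are made up) and `enumerate()` in place of the manual counter:

```python
import os
import tempfile

# create a small throwaway file to read from
path = os.path.join(tempfile.mkdtemp(), 'lines.txt')
f = open(path, 'w')
f.write('one\ntwo\nthree\nfour\n')
f.close()

first_lines = []
f = open(path, 'r')
for n, line in enumerate(f):
    if n >= 2:  # stop after the first two lines
        break
    first_lines.append(line.rstrip('\n'))  # drop the trailing newline
f.close()

print(first_lines)
```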



Just like when writing, don't forget to close files after you're done!

In [27]:
f.close()

As your file I/O gets complex, opening and closing can become quite painful (e.g., what if an error occurs before you close the file object? what happens to the memory it's using?), and forgetting to close file objects is potentially dangerous. So, it's good practice to use the `with` and `as` keywords, which make sure that the file is properly closed after operations are finished, even if an error occurs during operations:

In [29]:
with open('data/two_cities.txt', 'r') as f:
    n = 1
    for line in f:
        print(line)
        if n > 10: break
        n += 1

A Tale of Two Cities, by Charles Dickens



[A story of the French Revolution]







CONTENTS
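You can check that `with` really closes the file by looking at the file object's `closed` attribute. The sketch below writes to a throwaway temporary file (an assumption, to keep it self-contained). It also simulates an error with `try`/`except`, which this tutorial does not cover; it is used here only so the made-up error does not stop the example.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'example.txt')

with open(path, 'w') as f:
    f.write('Something')
    print(f.closed)  # still open inside the block

print(f.closed)      # closed automatically once the block ends

# the file is closed even when an error interrupts the block
try:
    with open(path, 'r') as g:
        raise RuntimeError('simulated problem')  # made-up error for illustration
except RuntimeError:
    pass
print(g.closed)
```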











## Functions: The powerhouse of the cell

One of the best ways to unlock the full potential of `Python` for data analysis is with functions. Functions allow you to automate repetitive tasks in a safer and easier to understand way than copying and pasting code.

### Writing functions
Let's create a function to count the number of vowels in a given string:


In [30]:
def count_vowels(s):
    """Count the number of vowels in a string."""
    vowels = 'aeiouAEIOU'
    nvowels = [s.count(v) for v in vowels]  # count the number of each vowel in s
    return sum(nvowels)  # return the sum of elements in nvowels

# use the new function
count_vowels('Eels are delicious animals')

12

- the `def` keyword declares a function **def**inition, followed by a function name and the parenthesized list of formal parameters
- the statements that form the body of the function start at the next line, and must be indented
- the first statement of the function body can optionally be a string, also known as the [docstring](https://docs.python.org/3/tutorial/controlflow.html#tut-docstrings)
- many tools (such as `spyder`) use the docstring to give users meaningful information - so help yourself, make a habit of writing meaningful docstrings
- functions that don't finish with a `return` statement return `None` (a special `python` object for "Nothing")
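The last point in the list above is easy to check. A minimal sketch with a hypothetical `greet()` function that prints something but has no `return` statement:

```python
def greet(name):
    """Print a greeting; note there is no return statement."""
    print('Hello,', name)

result = greet('Sansa')  # the greeting is printed here
print(result)            # the function's return value
print(result is None)
```

This is a common source of confusion: printing a value inside a function is not the same as returning it.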

Functions can also return a tuple of values. For example, let's modify our `count_vowels` function to return the number of vowels along with a `list` specifying the number of each vowel.

In [31]:
def count_vowels(s):
    """
    Count the number of vowels in a string.
    
    returns: number of vowels, list containing number of appearance for each vowel 
    """
    vowels = 'aeiouAEIOU'
    nvowels = [s.count(v) for v in vowels]  # count the number of each vowel in s
    return sum(nvowels), list(zip(vowels, nvowels))  # return the sum and a zipped list
                              
count_vowels('Eels are delicious animals')

(12,
 [('a', 3),
  ('e', 3),
  ('i', 3),
  ('o', 1),
  ('u', 1),
  ('A', 0),
  ('E', 1),
  ('I', 0),
  ('O', 0),
  ('U', 0)])

A returned tuple can also be 'unpacked' into multiple variables.

In [32]:
total_count, individual_count = count_vowels('Eels are delicious animals')
print('Found total', total_count, 'vowels, each vowel as follows:')
print(individual_count)

Found total 12 vowels, each vowel as follows:
[('a', 3), ('e', 3), ('i', 3), ('o', 1), ('u', 1), ('A', 0), ('E', 1), ('I', 0), ('O', 0), ('U', 0)]


### Modules
Once you start building functions, you might want to collect certain functions as a general 'toolbox' to be used across multiple projects. In `python`, you can put definitions in a file with a `.py` extension. Such a file is called a `module`. Once you save your functions into a `module`, you can `import` them. Let's practice with some examples.

For illustration purposes, let's create two modules that each contain one function of the same name:

In [33]:
# save this function to a file named module1.py
def speak():
    """Make module 1 say something"""
    print('Module 1 speaking ...')

In [34]:
# save this function to a file named module2.py
def speak():
    """Make module 2 say something"""
    print('Hi, this is module 2 speaking!')

You can import each module (and the functions in them) using the `import` statement as follows:

In [35]:
import module1
import module2

Note that the name you use in the `import` statement is just the file name of the module, without the `.py` extension. 

When you `import` a module, `python` creates an isolated 'space' for each module. This allows different modules to have functions of the same name, without causing confusion. But because of this, whenever you want to use a function from a certain module, you have to specify the module name before calling the function. Compare:

In [36]:
module1.speak()
module2.speak()

Module 1 speaking ...
Hi, this is module 2 speaking!


This can be a bit painful (and messy) if your module names get longer. There are typically two ways to work around this:
1. `import` with the `as` keyword to assign your own name to a module
1. assign your own function name to a module's function

Each approach is illustrated below. Which one to use depends on the context and personal style:

In [37]:
import module1 as m1
import module2 as m2
m1.speak()
m2.speak()

Module 1 speaking ...
Hi, this is module 2 speaking!


In [38]:
import module1
import module2

speak1 = module1.speak  # note the lack of parentheses
speak2 = module2.speak  # when assigning functions to a new name

speak1()
speak2()

Module 1 speaking ...
Hi, this is module 2 speaking!
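The same two approaches work for any module, including those that ship with `python`. Here is a sketch using the standard library's `math` module, so it runs without creating any files. The names `m`, `square_root`, and `root` are arbitrary choices.

```python
import math as m         # approach 1: a shorter name for the module
import math

square_root = math.sqrt  # approach 2: our own name for one function (no parentheses)

# a third common spelling, not covered above: import one function directly
from math import sqrt as root

print(m.sqrt(16))
print(square_root(16))
print(root(16))
```

All three names refer to the same function, so the three calls give the same result.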


### Classes
If you've worked with/written classes before in another language, then dealing with `class`es in `python` should be a breeze, once you get some syntax cleared. (But on the other hand, if you are experienced in programming to that degree, this introductory workshop was probably pretty boring for you...)

If you've never heard of object-oriented programming or `class`es in a programming context, keep in mind that while we'll try to cover some of the core concepts, this is by no means an exhaustive treatment of the topic. If you want to learn more about the fundamentals of object-oriented programming, there are tons of books and courses focusing on exactly that. 

Here, let's try to focus on what's useful in getting work done with `python`, even if you don't quite understand precisely what's going on under the hood.

In `python`, you can define a `class` using the `class` keyword, followed by a class name and indented statements that belong to the class (one `python` naming convention is to use CamelCase for class names):

In [39]:
class MyFirstClass:
    """A simple example class"""
    def __init__(self, name='Jongbin'):
        self.name = name

Let's pause there for a second.

So, what *is* a `class`? I find it useful (not always correct, but useful) to think of a `class` as a blueprint for some kind of entity. For example, a Human would be a `class`. A `class` itself doesn't necessarily do anything, just like the concept of a Human doesn't do much. However, there can be *instances* (copies, actual realizations) of a `class`, just like we are all instances of the Human `class`.

The `class` definition, therefore, defines what that `class` looks like. 

- What **attributes** (data, values) does it have? For example, the Human `class` typically has a head, two arms, and two legs. 
- What **methods** does the `class` have, i.e., what can the `class` do? For example, the Human `class` can think(), walk(), run(), type(), etc., each involving some kind of maneuvering of its attributes (head, arms, legs, ...)

While we're at it, let's try and define a simple Human `class`:

In [40]:
class Human:
    """Basic Human class"""
    def __init__(self):
        """Initialize the Human"""
        self.head = True
    
    def think(self):
        """Makes the Human think"""
        if self.head:
            print("I'm thinking...")
        else:
            print("Pretty hard to think without a head...")

Remember, this definition is just a blueprint. If we actually want to use a human of the `class` Human, we need to create an *instance* of `Human` (note another convention: instances will usually be lower case). Creating an instance in `python` is as simple as pretending the `class` is a function and *calling* it:

In [41]:
jongbin = Human()

Once we've created an instance, that instance has all the attributes and methods described in the blueprint:

In [42]:
print(jongbin.head)  # print the value of jongbin's head
jongbin.think()  # make the Human jongbin think
jongbin.head = False  # set the value of jongbin's head to False
jongbin.think()  # make the Human jongbin think, again

True
I'm thinking...
Pretty hard to think without a head...


Some notes on syntax:
- `self`: The name `self` in `class` definitions refers to the instance of the `class` that is making the call. (Strictly speaking, `self` is a strong naming convention rather than a keyword.) Methods (functions defined inside a `class` definition) take the instance as their first argument, which is named `self` by convention. For now, let's think of this as a syntactic requirement. More on this later ...
- `__init__`: The `__init__` method is called by default when a `class` instance is created. If the `class` definition doesn't explicitly define an `__init__` method, the instance is created without any custom initialization. In other words, you don't *__have to__* define an `__init__` method, but it's usually useful to have one so that you can declare some initial values for instances of your `class`. 
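Both notes can be seen in a few lines. The sketch below uses a hypothetical `Dog` class (not part of the examples in this section) that has no `__init__` method, and calls its method in two equivalent ways.

```python
class Dog:
    """A hypothetical class with no __init__ method."""
    def speak(self):
        return 'Woof'

rex = Dog()            # works: nothing special happens at creation
print(rex.speak())     # python passes rex as `self` automatically
print(Dog.speak(rex))  # the same call, with `self` written out explicitly
```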

The `__init__` method can also take arguments (other than `self`). So, for example, if we wanted our humans to have a name, and let users define the human's name when the human is created, we can modify the `__init__` method to:

In [43]:
class Human:
    """Basic Human class"""
    def __init__(self, name):
        """Initialize the Human"""
        self.head = True
        self.name = name
    
    def think(self):
        """Makes the Human think"""
        if self.head:
            print('My name is', self.name, "and I'm thinking...")
        else:
            print("Pretty hard to think without a head...")

In [44]:
jongbin = Human('Jongbin')
jongbin.think()

My name is Jongbin and I'm thinking...


#### The `self`
Now let's talk a bit more about that `self`. It is important to have a clear distinction between the `class` as a class definition (or `class` object), and an actual *__instance__* of said `class`. Attributes in a `class` definition that are not prepended with `self` will belong to the `class` object, and anything prepended with a `self` will belong to the instance of that `class` that is currently making the call. 

This can be mildly confusing, but becomes quite important if you have mutable attributes. Let's add a list of `tools` for our Human `class`:

In [45]:
class Human:
    """Basic Human class"""
    tools = []  # all Humans have a list of tools
    
    def __init__(self, name):
        """Initialize the Human"""
        self.head = True
        self.name = name
    
    def think(self):
        """Makes the Human think"""
        if self.head:
            print('My name is', self.name, "and I'm thinking...")
        else:
            print("Pretty hard to think without a head...")
            
    def add_tool(self, tool):
        """Add a tool to the list of tools"""
        self.tools.append(tool)
        
    def get_tools(self):
        """Print the tools that the human has"""
        print(self.name, 'has:', end = ' ')
        for tool in self.tools:
            print(tool, end = ' | ')
        print()  # create newline after last tool

Now, let's create some humans, and see how the list of tools works:

In [46]:
luke = Human('Luke')
anakin = Human('Anakin')
luke.add_tool('light saber')  # add light saber to Luke's tools
anakin.add_tool('death star')  # add death star to Anakin's tools
luke.get_tools()  # print Luke's tools
anakin.get_tools()  # print Anakin's tools

Luke has: light saber | death star | 
Anakin has: light saber | death star | 


See that even though two Human instances added their own tool, both ended up having all the tools, since the `tools` list was declared without the `self` prepended, making it an attribute of the `class` object, instead of an attribute of each instance.
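One way to convince yourself that there is only one list is to compare the objects directly with `is`. A minimal sketch, using a stripped-down version of the Human `class` that keeps only the shared `tools` attribute:

```python
class Human:
    """Stripped-down Human with a single class attribute."""
    tools = []  # class attribute: one list shared by every instance

luke = Human()
anakin = Human()

print(luke.tools is anakin.tools)  # both names point at the same list
print(luke.tools is Human.tools)   # the list lives on the class itself

luke.tools.append('light saber')
print(anakin.tools)                # anakin sees luke's tool
```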

If we wanted each Human to have their own list of tools, a better way to define the `class` would have been:

In [47]:
class Human:
    """Basic Human class"""
    def __init__(self, name):
        """Initialize the Human"""
        self.head = True
        self.name = name
    
        self.tools = []  # initialize a specific human's list of tools
        
    def think(self):
        """Makes the Human think"""
        if self.head:
            print('My name is', self.name, "and I'm thinking...")
        else:
            print("Pretty hard to think without a head...")
            
    def add_tool(self, tool):
        """Add a tool to the list of tools"""
        self.tools.append(tool)
        
    def get_tools(self):
        """Print the tools that the human has"""
        print(self.name, 'has:', end = " ")
        for tool in self.tools:
            print(tool, end = ' | ')
        print()  # create newline after last tool

In [48]:
luke = Human('Luke')
anakin = Human('Anakin')
luke.add_tool('light saber')  # add light saber to Luke's tools
anakin.add_tool('death star')  # add death star to Anakin's tools
luke.get_tools()  # print Luke's tools
anakin.get_tools()  # print Anakin's tools

Luke has: light saber | 
Anakin has: death star | 


Note that there's no right or wrong way to define attributes. There actually might be cases where you *__want__* different instances to share a single attribute. 

For example, think of a Bar `class` that has a bathroom attribute. All bars share a single bathroom which can only hold a single customer. We want to print a message if any bar tries to use *the* bathroom while it's occupied. We can define such a Bar `class` as:

In [49]:
class Bar:
    """A fictional Bar"""
    bathroom_occupancy = []
    def use_bathroom(self):
        if len(self.bathroom_occupancy) < 1:
            print('use bathroom')
            self.bathroom_occupancy.append('in use')
        else:
            print('bathroom in use!')
            
            
# instantiate two bars
coolBar = Bar()
hotBar = Bar()

print('coolBar:', end = ' ') 
coolBar.use_bathroom()
print('hotBar:', end = ' ') 
hotBar.use_bathroom()

coolBar: use bathroom
hotBar: bathroom in use!
