# BDSI Python 101

Date: June 12, 2017

Instructor: Jonathan Stroud (stroud@umich.edu)

*******************************************************************
*******************************************************************

## Introduction
*Python* is a modern, robust, high-level programming language. It is easy to pick up even if you are completely new to programming.

*Jupyter notebooks* are a convenient way to write and execute Python code. They are widely used by data scientists and researchers. You're using one right now!

## Installation

To load these notebooks on your machine, you need to install Jupyter. You can do this with Anaconda, which also includes many other Python packages. If you prefer something lightweight, you can install Jupyter directly.

### Option 1: With Anaconda (preferred)

Follow the instructions on Anaconda's website, here: https://www.continuum.io/downloads. Anaconda includes everything you need for these tutorials. **\*\*These tutorials assume you are using Python 2.7.**

Once you're done with that, open a terminal and enter

```
jupyter notebook
```

Congrats! You're running Jupyter. The interface should appear in your browser.

### Option 2: With plain Python

#### 1. Install Python

Mac OS X and Linux come with Python preinstalled. Windows users can download Python from https://www.python.org/downloads/ .

**\*\*These tutorials assume you are using Python 2.7.**

#### 2. Install Jupyter

To install Jupyter, open a terminal and run

```
pip install jupyter
```

Once you're done with that, open a terminal and enter

```
jupyter notebook
```

Congrats! You're running Jupyter. The interface should appear in your browser.


## Launching Jupyter Notebook

From the terminal, run the following in the directory that contains the notebooks (the older `ipython notebook` command is deprecated in favor of this one):

```
jupyter notebook
```

## Using this resource

(from the command line)

1. Download the notebooks and data: 
```
git clone bdsipython
```
2. Launch Jupyter from the folder which contains the notebooks.
```
cd bdsipython
jupyter notebook
```
3. Open each notebook and choose the following from the menu bar:
```
Cell > All Output > Clear
```
This clears all the outputs, so you can run each statement yourself and learn interactively.

## Acknowledgments

These notes are adapted and condensed from: https://github.com/rajathkumarmp/Python-Lectures

****************************************************************
****************************************************************

# Variables

A variable is a name that refers to a value. In Python, you create a variable by assigning a value to it, as follows:

In [1]:
x = 2
y = 5
xy = '"Hey"'
z = '2'
b = True

In [2]:
print x+y, xy
print x + 1
print z + xy
print z, xy
print b

7 "Hey"
3
2"Hey"
2 "Hey"
True


In [3]:
z

'2'
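
The reason `z + xy` above joins two strings instead of adding numbers is that each value carries a type. The sketch below is only an illustration: the names `count` and `label` are made up for this example, and it uses single-argument `print(...)` calls so that it should run the same way under Python 2.7 and Python 3.

```python
# Illustrative names; these are not part of the original notebook.
count = 2        # an integer
label = '2'      # a string that happens to contain a digit

print(type(count) is int)   # True
print(type(label) is str)   # True

# Convert explicitly when mixing numbers and strings.
as_number = count + int(label)    # integer addition
as_text = str(count) + label      # string concatenation
print(as_number)   # 4
print(as_text)     # 22
```

Writing `count + label` without a conversion raises an error, because Python does not guess whether addition or concatenation is intended.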

# Operators


## Arithmetic Operators

| Symbol | Task Performed |
|----|---|
| +  | Addition |
| -  | Subtraction |
| /  | Division |
| *  | Multiplication |
| **  | To the power of |

In [4]:
1+2

3

In [5]:
2-1

1

In [6]:
1*2

2

In [7]:
2/3

0

Why 0? In Python 2, dividing one integer by another performs integer division: the result is rounded down to the nearest integer, and an integer is returned. Converting either the numerator or the denominator to a float gives the expected answer.

In [67]:
print 2/3         # integer division
print 2/3.0       # float division
print 2.0/3.0     # float division
print 2/float(3)  # float division

0
0.666666666667
0.666666666667
0.666666666667
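
When rounding down is actually what you want, the `//` operator states that intent explicitly. A minimal sketch with made-up numbers follows; `//` and `%` behave the same way in Python 2.7 and Python 3, whereas a plain `/` between two integers returns a float in Python 3.

```python
# // always performs floor division, regardless of Python version.
quotient = 7 // 2        # 3
remainder = 7 % 2        # 1

# Converting one operand to float gives true division.
ratio = 7 / float(2)     # 3.5

print(quotient)
print(remainder)
print(ratio)

# The quotient and remainder together recover the original number.
print(quotient * 2 + remainder == 7)   # True
```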


In [9]:
2**3

8

## Relational Operators

| Symbol | Task Performed |
|----|---|
| == | True if equal |
| !=  | True if not equal |
| < | True if less than |
| > | True if greater than |
| <=  | True if less than or equal to |
| >=  | True if greater than or equal to |
| and | True if both conditions are true (logical operator) |
| or | True if either or both conditions are true (logical operator) |

In [10]:
z = 1  # sets z to 1

In [11]:
z == 1 # checks if z is 1

True

In [12]:
z > 1

False

In [13]:
(z > 1) and (z < 2)

False

In [14]:
(z > 1) or (z < 2)

True
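
Comparisons produce `True` or `False` values that can be stored in variables and combined like any other value. The sketch below uses a hypothetical `age` variable to show a range check built from the operators in the table.

```python
age = 25  # hypothetical value chosen for the example

is_adult = age >= 18
in_twenties = (age >= 20) and (age < 30)
outside_twenties = (age < 20) or (age >= 30)

print(is_adult)          # True
print(in_twenties)       # True
print(outside_twenties)  # False
```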

# Strings

_Strings_ are a data type in Python that consists of a sequence of characters. We commonly use strings to store words or sentences. Python has several ways to define strings, including both single and double quotes.

In [52]:
word1 = 'Hello'
word2 = "world"

In [53]:
print word1, word2

Hello world


Strings often behave differently than numerical values when we apply typical operators.

In [58]:
print word1 + word2
print word1 + ' ' + word2 + '!'
print word1 * 2
#print word1 - word2 # This causes an error
#print word1 / word2 # Error

Helloworld
Hello world!
HelloHello


In [61]:
print len(word1)
print word1.lower()
print word2.upper()

5
hello
WORLD
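
Functions such as `lower()` and `upper()` return a new string and leave the original untouched. The following sketch, which uses an illustrative `greeting` variable, shows this and combines repetition with concatenation.

```python
greeting = 'Hello'   # illustrative value

shouted = greeting.upper()
print(shouted)       # HELLO
print(greeting)      # Hello  (the original string is unchanged)

# Repetition and concatenation can be combined.
banner = '-' * 5 + ' ' + greeting + ' ' + '-' * 5
print(banner)        # ----- Hello -----
print(len(banner))   # 17
```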


# Lists

*Lists* are a data structure that stores many values in sequence. They behave a lot like strings, but can store any values, not just characters.

To create a list, use square brackets to enclose the list, and commas to separate values. Like this:

In [15]:
x = ['apple', 'orange']
print x

['apple', 'orange']


You can refer to specific elements in a list by indexing. Indices always start at 0.

In [16]:
x[0]

'apple'

In [17]:
x[1]

'orange'

In [18]:
#x[2] # This will raise an IndexError, because x has only two elements!

Lists can contain any kind of data, even other lists.

In [19]:
y = ['carrot', 'potato']

In [20]:
z = [x, y, 1.5, False, 'Hello!']
print z

[['apple', 'orange'], ['carrot', 'potato'], 1.5, False, 'Hello!']


In [21]:
print z[0]
print z[0][0]

['apple', 'orange']
apple


### Slicing

*Slicing* allows you to retrieve many elements of a list at the same time.

A slice is written as `parentlist[a:b]`, where `a` is the index of the first element to include and `b` is the index just past the last element to include, so the element at index `b` itself is left out. If `a` is omitted, the slice starts at the beginning of the list; if `b` is omitted, it runs through the end of the list.

In [22]:
num = [0,1,2,3,4,5,6,7,8,9]

In [23]:
print num[1:4] # elements 1, 2, 3
print num[0:4] # elements 0, 1, 2, 3
print num[:4]  # first 4 elements
print num[4:]  # elements 4 through end

[1, 2, 3]
[0, 1, 2, 3]
[0, 1, 2, 3]
[4, 5, 6, 7, 8, 9]
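
Because `num` holds the numbers 0 through 9, its values and its indices look identical, which can hide the fact that the end index is excluded. The sketch below uses a hypothetical list of letters to make the indices easier to tell apart from the values.

```python
letters = ['a', 'b', 'c', 'd', 'e', 'f']   # illustrative list

middle = letters[2:5]     # indices 2, 3, 4 (index 5 is excluded)
print(middle)             # ['c', 'd', 'e']
print(len(middle) == 5 - 2)   # True: a slice [a:b] has b - a elements

# Omitting both bounds copies the whole list.
whole = letters[:]
print(whole == letters)   # True

# Two slices that share a boundary split the list with no overlap.
print(letters[:3] + letters[3:] == letters)   # True
```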


### Operations on lists

You can do lots of cool things with lists.

| Function | Task Performed |
|----|---|
| len(list) | length of the list |
| min(list)  | minimum value in the list |
| max(list) | maximum value in the list |
| sum(list) | add all values in the list |
| sorted(list) | sort list elements in ascending order |
| list1 + list2 | concatenate two lists |
| v in list  | check if value v is in list |

In [24]:
len(num) # length of the list

10

In [25]:
min(num) # minimum value

0

In [26]:
max(num) # maximum value

9

In [69]:
sum(num) # sum of all values

45

In [88]:
sorted([3, 4, 2, 7, 1, 9, 5])

[1, 2, 3, 4, 5, 7, 9]

Lists can be concatenated by adding them with `+`. The resulting list contains all the elements of the first list followed by all the elements of the second.

In [27]:
[1,2,3] + [5,4,7]

[1, 2, 3, 5, 4, 7]

To check if a list contains an element, use `in`:

In [28]:
names = ['Earth','Air','Fire','Water']

In [29]:
'Fire' in names

True

In [30]:
'BDSI' in names

False
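
These functions are often combined. As a small sketch with made-up data (the `scores` list is purely illustrative), the block below computes an average and a range, and shows that `sorted` returns a new list rather than changing the one it is given.

```python
scores = [88, 92, 79, 95]   # made-up data

# Convert the length to float so the division is not rounded down in Python 2.
average = sum(scores) / float(len(scores))
spread = max(scores) - min(scores)
print(average)   # 88.5
print(spread)    # 16

# sorted() returns a new list and leaves the original untouched.
ranked = sorted(scores)
print(ranked)    # [79, 88, 92, 95]
print(scores)    # [88, 92, 79, 95]
```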

### More List functions

Lists are a type of Python *object*, which means they have their own set of functions (called *methods*) that are accessed by `list.function_name`. Some of these methods change the list in place.

| Function | Task performed |
|----|---|
| list.append(v)  | add value v to end of list |
| list.index(v) | return the first index of value v in list |
| list.insert(i, v) | insert value v in list at index i |
| list.remove(v) | remove the first occurrence of value v from list |
| list.count(v) | count how many times v appears in list |

`append` is used to add a single element to the end of a list.

In [31]:
l = [1, 1, 4, 8, 7]
print l
l.append(1)
print l

[1, 1, 4, 8, 7]
[1, 1, 4, 8, 7, 1]


`index` returns the location of the specified value. If the value appears multiple times, it returns only the first index. If the value doesn't exist, Python raises an error.

In [32]:
print l.index(1)
print l.index(4)
#print l.index(100) # this will raise a ValueError

0
2


`remove` deletes the first occurrence of a value, and `insert` places a value at a given index, shifting later elements to the right.

In [33]:
print l

l.remove(4) # removes value 4
print l

l.insert(3, 1000) # adds value 1000 at index 3
print l

[1, 1, 4, 8, 7, 1]
[1, 1, 8, 7, 1]
[1, 1, 8, 1000, 7, 1]


In [34]:
print l.count(1)

3
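
A common stumbling block is that these functions modify the list directly instead of returning a new one. The sketch below, using a hypothetical `values` list, illustrates this along with the first-occurrence behavior of `remove`.

```python
values = [3, 1, 3, 2]   # illustrative list

values.remove(3)        # removes only the first 3
print(values)           # [1, 3, 2]

values.insert(0, 9)     # 9 becomes the new first element
print(values)           # [9, 1, 3, 2]

# append changes the list in place and returns None,
# so do not write `values = values.append(4)`.
result = values.append(4)
print(result)           # None
print(values)           # [9, 1, 3, 2, 4]
print(values.index(3))  # 2
```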


# Sets

_Sets_ are another type of collection available in Python. Like mathematical sets, they are unordered and contain each element at most once.

One way to create a set is from a list; any duplicate values are dropped.

In [41]:
s = set([1,2,2,3,3,4])
print s

set([1, 2, 3, 4])


In [45]:
len(s)

4

Unlike with lists, we can't index sets.

In [46]:
#s[0]  # This causes an error
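
Sets are handy for removing duplicates and for membership tests. The following is a minimal sketch with made-up data; the `add` call is a standard set function that does not appear elsewhere in these notes.

```python
colors = ['red', 'blue', 'red', 'green', 'blue']   # made-up data

unique_colors = set(colors)
print(len(colors))          # 5
print(len(unique_colors))   # 3

# Membership tests work the same way as with lists.
print('red' in unique_colors)      # True
print('purple' in unique_colors)   # False

# Adding an element that is already present changes nothing.
unique_colors.add('red')
print(len(unique_colors))   # 3

# Convert back to a sorted list when order or indexing is needed.
print(sorted(unique_colors))   # ['blue', 'green', 'red']
```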

# List Comprehensions

_List comprehensions_ are a compact way of constructing a list from the elements of another sequence. They're a distinctive feature of Python.

In [71]:
origlist = range(10) # the numbers 0 through 9 (10 itself is excluded)
print origlist

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]


List comprehensions are created using brackets and the 'for' keyword, like this.

In [80]:
print [number/2.0 for number in origlist]
print [number**2 for number in origlist]

[0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5]
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]


List comprehensions can also contain an 'if' keyword to filter out some values.

In [81]:
print [number for number in origlist if number < 4]

[0, 1, 2, 3]
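
It can help to see a comprehension next to the longer code it replaces. The sketch below builds the same list twice, once with a comprehension and once with an explicit `for` loop and `append`; the variable names are illustrative, and loops are not otherwise covered in these notes.

```python
numbers = list(range(10))

# Comprehension form: transform and filter in one expression.
small_squares = [n**2 for n in numbers if n < 4]

# The same result built step by step with a loop.
loop_result = []
for n in numbers:
    if n < 4:
        loop_result.append(n**2)

print(small_squares)                 # [0, 1, 4, 9]
print(small_squares == loop_result)  # True
```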


***********************************************************************************
***********************************************************************************

# Exercise: Hamlet Frequency

In this exercise, we'll use lists to analyze _Hamlet_.

### Before you start

Make sure you have `hamlet.txt` in the same directory where you're running this notebook. You can download this file from https://raw.githubusercontent.com/thejakeyboy/Python-Lectures/master/data/hamlet.txt


### Here's some help getting started

This cell loads the words in _Hamlet_ into a big list. You don't need to know how it works for now.

In [62]:
import re
hamlet = 'hamlet.txt'
# Read the file, lowercase it, and extract every run of letters, digits, or underscores
with open(hamlet) as f:
    words = re.findall(r'\w+', f.read().lower())

In [63]:
print words[0:10]
print len(words)

['hamlet', 'dramatis', 'personae', 'claudius', 'king', 'of', 'denmark', 'king', 'claudius', 'hamlet']
27577
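
For the curious, here is a rough sketch of what the loading cell does, applied to a single line instead of the whole play so that it runs without `hamlet.txt`. The `sample` string is a stand-in chosen for this example; `re.findall` with the pattern `\w+` returns every run of letters, digits, or underscores, which drops punctuation and spaces.

```python
import re

sample = "To be, or not to be: that is the question."  # short stand-in text
tokens = re.findall(r'\w+', sample.lower())
print(tokens)
# ['to', 'be', 'or', 'not', 'to', 'be', 'that', 'is', 'the', 'question']
print(len(tokens))   # 10
```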


## 1. How many unique words did Shakespeare use?

In [64]:
print len(set(words))

4086


## 2. What is the average word length?

In [79]:
print sum([len(word) for word in words])/float(len(words))

4.13090618994


## 3. How many times did Shakespeare write the word 'hamlet'?

In [102]:
print words.count('hamlet')

368


## 4. What are the 10 most common words used in Hamlet?

Hint: You can sort a list of strings by a custom criterion like this: `sorted(list, key=len)`. The `key=len` part specifies that the strings should be sorted by their length. Any function that takes one element and returns a value to sort by can be used as the key.

In [114]:
unique = list(set(words))
unique_sorted = sorted(unique, key=words.count, reverse=True)
print(unique_sorted[:10])

['the', 'and', 'to', 'of', 'i', 'you', 'a', 'my', 'in', 'hamlet']


In [115]:
counts = [words.count(word) for word in unique]
counts_sorted = sorted(counts, reverse=True)
print(counts_sorted[:10])

[930, 843, 652, 562, 517, 496, 450, 439, 378, 368]
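
The solution above calls `words.count` once per unique word, so it rescans the full word list thousands of times. As an alternative sketch (not the approach used in this notebook), `collections.Counter` from the standard library tallies every word in a single pass. The `sample_words` list is a small stand-in so the block runs without `hamlet.txt`.

```python
from collections import Counter

# Stand-in data; in the exercise this would be the `words` list.
sample_words = ['the', 'king', 'the', 'queen', 'the', 'king', 'hamlet']

tally = Counter(sample_words)
print(tally['the'])           # 3

# most_common(n) returns the n most frequent items with their counts.
print(tally.most_common(2))   # [('the', 3), ('king', 2)]
```

With the real data, `Counter(words).most_common(10)` would give the words and their counts together, instead of computing the two sorted lists separately.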
