# Crash Course on Python for Data Analysis

## Where We're Headed
**Our End Goal:**

* Input data
* Manipulate data until it's in the form we want
* Produce a compelling graphic

**Jupyter Notebook Basics**

* Entering and executing Python
* Documenting what you've done

**Python Basics Pt 1**

* Numbers and Calculations
* Comments
* Assignments
* Strings
* Lists
* Dictionaries, Tuples and Sets

**Python Basics Pt 2**

* Booleans
* Flow of Control
* Extending Python by Importing Packages
* Objects and Methods


**Python Basics Pt 3**

* Input and Output

===============================

## Numbers

Python has various "types" of numbers (numeric literals). We'll mainly focus on integers and floating point numbers.

* Integers are whole numbers: positive, negative, or zero. For example, 2 and -2 are integers.

* Floating point numbers in Python are written with a decimal point, or with an exponent (e or E). For example, 2.0 and -2.1 are floating point numbers. 4E2 (4 times 10 to the power of 2) is also a floating point number in Python.

Numbers can represent the following kinds of data:

* integer values
* floating point values
* complex values
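
As a quick illustration, the built-in `type()` function reports which kind of number Python sees. This is only a sketch; the variable names are made up for the example, and `print(...)` is written with parentheses so that it runs on Python 3 as well as Python 2.

```python
# Each literal below produces a different numeric type.
whole = 2            # integer
decimal = -2.1       # floating point number
scientific = 4E2     # exponent notation always yields a float
imaginary = 3 + 4j   # complex number; j marks the imaginary part

print(type(whole))
print(type(decimal))
print(scientific)    # 400.0
print(type(imaginary))
```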

## Calculations

The most basic use of Python that I can imagine is to treat it as a calculator.
As a starting point, let's take a few minutes to run through the most basic arithmetic operations.
In each of the cells below, enter an expression that will return the requested results.
Remember, after you type the expression, hit Ctrl-Return to execute the cell.

To get you started, here are some of the most fundamental arithmetic operations:

| Operator | Function |
|:--------:|:--------:|
| + | Addition |
| - | Subtraction |
| * | Multiplication |
| / | Division |

<div class="alert alert-info" role="alert">
You may have noticed that each of the exercise cells in this notebook starts with a line that begins with a hash sign, "#". The hash sign tells Python that the rest of the line is a comment and shouldn't be executed.
</div>

As a quick example, let's imagine that we want to simply add two numbers, say 5 and 10. After execution, that would look like the following in Python:

In [1]:
5 + 10

15

<div class="alert alert-success">
Your turn, please multiply 2 integers, say 2 and 3.
</div>

In [2]:
# create your answer to the exercise here

<div class="alert alert-success">
Now try dividing 2 integers, say 3 by 2 as 3/2. Pay attention to what happens.
</div>

In [3]:
# create your answer to the exercise here

In Python 2, when we divide two integers, we get what's called "classic" division within the Python community: the result is rounded down to a whole number, so 3/2 evaluates to 1 rather than 1.5. We can avoid this "classic" division by making sure that at least one of the numbers is a floating point number.

<div class="alert alert-success">
For example, retry our division example as 3.0 / 2. 
</div>

In [4]:
# create your answer to the exercise here

And, just to be clear, it doesn't matter which of the two numbers is the floating point number. Feel free to go back and try that in one of the cells. As it turns out, we can also force the issue with the built-in "float()" function, which "casts" one of the integers to a floating point number.

<div class="alert alert-success">
We'll talk more about functions in a little while, but for now convince yourself of what happens if you evaluate float(3)/2
</div>

In [5]:
# create your answer to the exercise here
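
For comparison, here is a minimal sketch of the division variants side by side. The `//` operator is not used elsewhere in this notebook; it always rounds down, which matches what Python 2's classic division does with two integers. Note that in Python 3, `3 / 2` returns 1.5 without any casting. The variable names are illustrative only.

```python
floored = 3 // 2         # floor division: rounds down to a whole number
mixed = 3.0 / 2          # one float operand is enough to get a float result
mixed_other = 3 / 2.0    # the float can be on either side
cast = float(3) / 2      # float() converts the integer 3 to 3.0 first

print(floored)       # 1
print(mixed)         # 1.5
print(mixed_other)   # 1.5
print(cast)          # 1.5
```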

Finally, this is probably a good time to mention that Python has rules on the order in which it will evaluate the results of operations.

In [6]:
2 * 3 + 4

10

Make sure that you're comfortable with what just happened there. Generally, Python follows the same rules of arithmetic that we learned a long time ago; to be fair, some of those rules are esoteric enough that we might not remember them all. Whenever there's a doubt, keep in mind that you can control the order in which operations are executed by judiciously using parentheses. For instance, try:

In [7]:
2 * ( 3 + 4 )

So what happened here? By default, Python has a specific order in which it will execute operations. As a rule, it will perform multiplication before addition.
In this final example, the parentheses are introduced to force the expression "3 + 4" to be executed first. If you are completely new to programming, this can be a bit confusing, and I recommend that you take a look at the Python documentation for more detail on the order in which operations are executed.
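
The two expressions above can be compared directly in a short sketch. The last line uses the `**` (power) operator, which this notebook has not introduced; it is included only as one more example of a precedence rule that is easy to forget.

```python
without_parens = 2 * 3 + 4     # multiplication happens first: 6 + 4
with_parens = 2 * (3 + 4)      # parentheses happen first: 2 * 7
power_first = 2 * 3 ** 2       # the power is evaluated before the multiplication: 2 * 9

print(without_parens)  # 10
print(with_parens)     # 14
print(power_first)     # 18
```

When an expression is hard to read, adding parentheses costs nothing and makes the intended order explicit.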

## Using Built-In Functions To Create An Expression
If we want to use other mathematical functions, such as trigonometric functions or
essentially anything more complex than arithmetic, then we'll need to import the functions from Python's **math** module.

By itself, the core Python language is pretty simple. Fortunately, however, an enormous number of packages are available that add functionality. We'll come back to this later in more detail, but for now let's just introduce the concept so that we can use these math functions.
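
A minimal sketch of what importing looks like, assuming we only need a few functions from the standard library's `math` module. The variable names and the numbers are chosen for illustration.

```python
import math   # makes the functions available as math.<name>

root = math.sqrt(16)                    # square root
cosine = math.cos(0)                    # trigonometric functions take radians
hypotenuse = math.sqrt(3**2 + 4**2)     # combining arithmetic with a function call

print(root)         # 4.0
print(cosine)       # 1.0
print(hypotenuse)   # 5.0
print(math.pi)      # the module also provides constants such as pi
```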


## Assigning Variables

If we could only use Python as a calculator to evaluate expressions, then we'd be pretty limited in terms of the types of problems that we could solve. As a first step in moving towards more interesting computations, we can store results by assigning a label, or a variable name, to our results as we move along. For example, let's say that we want to store the value "10" using a variable named "a".

Python does this as:

In [8]:
a = 10

Try it in the cell below:

In [9]:
# create your answer to the exercise here

If you typed that in exactly as shown, then you shouldn't have gotten any type of output. There are a couple of ways in which we can test whether or not the variable was actually set, though.

First, you can simply evaluate a cell with the name of the variable:

In [10]:
a

Alternatively, you can use "print" in Python, which will evaluate an expression (in this case the variable) and print the result to the screen. In Python 2, the syntax for this is:

In [11]:
print a

Try both techniques here so you can verify that the variable **a** was set to 10:

In [12]:
# create your answer to the exercise here

In terms of creating variable names, there are some rules on what's allowable:

    1. Names cannot start with a number.
    2. Names cannot contain spaces.
    3. Names cannot contain any of these symbols :'",<>/?|\()!@#$%^&*~-+

Well-chosen variable names are a very useful way to keep track of different values in Python. Beyond the rules above, there are a couple of conventions worth following:

    1. It's considered best practice that the names are lowercase.
    2. Also, if you need to create a long name, maybe comprised of several words strung together, Python best practice is to connect these words using underscores. e.g. `this_is_johns_variable`
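
To illustrate the naming conventions, here is a small sketch with lowercase, underscore-separated names. The quantities are hypothetical and unrelated to the exercise below.

```python
# Hypothetical values chosen only for illustration.
unit_price = 2.50
units_sold = 4

# A descriptive name makes the calculation read almost like a sentence.
total_sales = unit_price * units_sold

print(total_sales)  # 10.0
```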

<div class="alert alert-success">
Construct a series of expressions to compute the taxes you would need to pay on a typical data scientist's salary. Assume the following:
<ul>
<li>My income is $20,000,000</li>
<li>My tax rate is 30%</li>
</ul>
As a simple exercise, please set up variables called my_income and my_tax_rate, then compute the taxes you owe using simple multiplication. Finally, print out the taxes you owe.
</div>

In [13]:
# create your answer to the exercise here

When you evaluate your results, you should get a result of 6000000.0.

## Strings

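* Written with double quotes or single quotes
* Indexing starts from 0
* Slicing extracts part of a string
* A slice does not include the right index: "up to but not including"
* Leaving out both indices includes everything
* Negative indexing counts from the end
* A slice can take a frequency / step size; a step size of -1 reverses a string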

### String Properties
* Strings are immutable, i.e. once they are created, their characters can't be changed or replaced in place.
* Concatenation uses +
* Duplication uses *
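
The indexing, slicing and immutability notes above can be sketched as follows. The string is the same `'Hello world'` used in the cells below; the names `assignment_failed` and `new_s` are made up for this example, and the `try`/`except` construct is used only to show the error without stopping the cell.

```python
s = 'Hello world'

print(s[0])      # H        (indexing starts at 0)
print(s[0:5])    # Hello    (up to but not including index 5)
print(s[6:])     # world    (omitting the right index runs to the end)
print(s[-1])     # d        (negative indices count from the end)
print(s[::2])    # Hlowrd   (every second character)

# Strings are immutable: assigning to a position raises a TypeError.
try:
    s[0] = 'J'
    assignment_failed = False
except TypeError:
    assignment_failed = True
print(assignment_failed)  # True

# To get a modified string, build a new one from pieces of the old one.
new_s = 'J' + s[1:]
print(new_s)     # Jello world
```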

### String methods
Methods are actions that are called on the objects themselves.

In [14]:
len('Hello world!')

12

In [15]:
s = 'Hello world'

In [16]:
s[::-1]

'dlrow olleH'

In [17]:
s + ' there'

'Hello world there'

In [18]:
s*2

'Hello worldHello world'

In [19]:
s = 'hello'

In [20]:
s.upper()

'HELLO'

In [21]:
s.capitalize()

'Hello'

In [22]:
s.find('l')

2

In [23]:
s.split('e')

['h', 'llo']

## Lists

In [24]:
my_list = [1,2,3]

In [25]:
my_list

[1, 2, 3]

In [26]:
my_list = ['string', 23, 1.234, '0']

In [27]:
len(my_list)

4

### Index and Slicing

* Works just like strings
* Can do list concatenation
* No fixed size
* No type constraints on contents
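
A short sketch of these points, reusing the mixed-type list from above. The name `longer` is hypothetical.

```python
my_list = ['string', 23, 1.234, '0']

print(my_list[0])     # string        (indexing starts at 0, as with strings)
print(my_list[1:3])   # [23, 1.234]   (slices exclude the right index)
print(my_list[-1])    # 0             (the last item, the string '0')

# Concatenation builds a new list and leaves the original untouched.
longer = my_list + ['new item']
print(len(longer))    # 5
print(len(my_list))   # 4
```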

### Methods

In [28]:
l = [1,2,3]

In [29]:
l.append('append me')

In [30]:
l

[1, 2, 3, 'append me']

In [31]:
l.pop()

'append me'

In [32]:
l

[1, 2, 3]

In [33]:
l.pop(0)

1

In [34]:
l

[2, 3]

In [35]:
x = l.pop(0)

In [36]:
x

2

In [37]:
l

[3]

In [38]:
l[99]

In [39]:
new_list = ['a', 'e', 'x', 'b', 'c']

In [40]:
new_list

In [41]:
new_list.reverse()

In [42]:
new_list

In [43]:
new_list.sort()

In [44]:
new_list
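
One detail worth noticing in the cells above: `reverse()` and `sort()` change the list in place and return `None`, which is why the cells calling them show no output. A minimal sketch, where `result` is a name made up for the example:

```python
new_list = ['a', 'e', 'x', 'b', 'c']

new_list.reverse()          # reverses the list in place
print(new_list)             # ['c', 'b', 'x', 'e', 'a']

result = new_list.sort()    # sorts the list in place
print(new_list)             # ['a', 'b', 'c', 'e', 'x']
print(result)               # None
```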

In [45]:
l_1 = [1,2,3]
l_2 = [4,5,6]
l_3 = [7,8,9]

In [46]:
matrix = [l_1, l_2, l_3]

In [47]:
matrix # nested data structures

In [48]:
matrix[0]

In [49]:
matrix[1][2]

### List Comprehensions

In [50]:
first_column = [row[0] for row in matrix]

In [51]:
first_column
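
To make the comprehension less mysterious, here is a sketch that builds the same list with an ordinary `for` loop (loops are covered later under Flow of Control). The names `first_column_loop` and `squares` are assumptions for this example.

```python
matrix = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

print(matrix[1][2])   # 6  (row 1, then item 2 within that row)

# The comprehension: take item 0 from every row.
first_column = [row[0] for row in matrix]
print(first_column)   # [1, 4, 7]

# The same result, spelled out as a loop.
first_column_loop = []
for row in matrix:
    first_column_loop.append(row[0])
print(first_column_loop)  # [1, 4, 7]

# A comprehension can also transform each item as it goes.
squares = [value ** 2 for value in first_column]
print(squares)        # [1, 16, 49]
```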

## Dictionaries

A dictionary is basically a mapping between objects: each key is mapped to a value. What we've seen so far are sequences, objects whose items we can index based on their position. In a dictionary, we look items up by key instead, and keys must be unique.

In [52]:
my_dict = {'key1': 'value', 'key2': 'value2'}

In [53]:
my_dict

{'key1': 'value', 'key2': 'value2'}

In [54]:
my_dict['key1']

In [55]:
my_dict = {'key1': 'value', 'key2': 123, 'key3': [1,2,3]}

In [56]:
my_dict['key1']

In [57]:
my_dict['key3']

In [58]:
my_dict['key3'].pop()

In [59]:
my_dict

In [60]:
my_dict['key1'].upper()

In [61]:
my_dict

In [62]:
my_dict['key2'] *= 2

In [63]:
my_dict
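
The cells above mix two kinds of operations, and it is easy to lose track of which ones change the dictionary. A sketch under the same starting values; `popped` and `shouted` are names made up for the example.

```python
my_dict = {'key1': 'value', 'key2': 123, 'key3': [1, 2, 3]}

# pop() changes the list stored in the dictionary, because lists are mutable.
popped = my_dict['key3'].pop()
print(popped)             # 3
print(my_dict['key3'])    # [1, 2]

# upper() returns a new string; the stored string is left as it was.
shouted = my_dict['key1'].upper()
print(shouted)            # VALUE
print(my_dict['key1'])    # value

# Assigning back to a key replaces the stored value.
my_dict['key2'] *= 2
print(my_dict['key2'])    # 246
```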

In [64]:
d = {}
d

In [65]:
d['animal'] = 'Dog'
d['answer'] = 42

In [66]:
d

In [67]:
d = {'k1': {'nestKey': {'subnest': 'value'}}}

In [68]:
d

In [69]:
d['k1']

In [70]:
d['k1']['nestKey']['subnest'].upper()

In [71]:
d = {}

In [72]:
d['k1'] = 1
d['k2'] = 2
d['k3'] = 3

In [73]:
d

In [74]:
d.keys()       # note the order. In Python 2, dictionaries do not guarantee any order

In [75]:
d.values()

In [76]:
d.items()       # this returns tuples of all of the dictionary key-value pairs
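
A sketch of the three views with their contents written out. `sorted()` is used here so that the printed order does not depend on the Python version. The `in` test and the `get()` method are not covered in this notebook and are included only as common companions to key lookup.

```python
d = {'k1': 1, 'k2': 2, 'k3': 3}

print(sorted(d.keys()))     # ['k1', 'k2', 'k3']
print(sorted(d.values()))   # [1, 2, 3]
print(sorted(d.items()))    # [('k1', 1), ('k2', 2), ('k3', 3)]

# Checking for a key before looking it up avoids a KeyError.
print('k1' in d)            # True
print('k9' in d)            # False

# get() returns a default instead of raising an error for a missing key.
print(d.get('k9', 0))       # 0
```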

## Tuples

Similar to lists but immutable

In [77]:
t = (1,2,3)

In [78]:
t

(1, 2, 3)

In [79]:
len(t)

3

In [80]:
t[1]

2

In [81]:
t = ('one', 2)

In [82]:
t[0]

'one'

In [83]:
t[1]

2

In [84]:
t[-1]

2

In [85]:
t.index('one')

0

In [86]:
t.count('one')

1

In [87]:
t = (1,1,2,3)

In [88]:
t.count(1)

2

Tuples are immutable

In [89]:
l = [1,2,3]

In [90]:
l

[1, 2, 3]

In [91]:
t = (1,2,3)

In [92]:
l[0] = 'string'

In [93]:
l

['string', 2, 3]

In [94]:
t[0] = 'string'
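
The last cell raises a `TypeError`, because a tuple does not support item assignment. A minimal sketch that shows this without stopping the notebook; `tuple_changed` is a name made up for the example, and `try`/`except` is used only to catch the error.

```python
t = (1, 2, 3)

try:
    t[0] = 'string'        # not allowed: tuples are immutable
    tuple_changed = True
except TypeError:
    tuple_changed = False

print(tuple_changed)  # False
print(t)              # (1, 2, 3)
```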

## Sets

An unordered collection of *unique* objects

In [95]:
x = set()

In [96]:
x.add(1)
x

{1}

In [97]:
x.add(2)
x

{1, 2}

In [98]:
x.add(1)
x

{1, 2}

In [99]:
# can use the set function to cast a list into a set of its unique elements
l = [1,1,1,2,2,2,2,2,3,3,3,3]
print l

[1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3]

In [100]:
set(l)

{1, 2, 3}
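
As a small follow-up sketch, a set is a convenient way to count distinct values or test membership. The name `unique_values` is hypothetical, and `sorted()` is used because a set has no defined order of its own.

```python
l = [1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3]
unique_values = set(l)

print(len(l))                  # 12  (items in the list)
print(len(unique_values))      # 3   (distinct items)
print(sorted(unique_values))   # [1, 2, 3]
print(2 in unique_values)      # True
print(5 in unique_values)      # False
```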