# Introduction to Python

Learning Objectives:
* Explain the capabilities of Python 3
* Implement basic Python 3 functions specific to data science

# Python

Python is a programming language, i.e. we can use Python to tell a computer what to do. Python tends to be less complicated than some other popular text-based programming languages like Java or C++. Let's run a quick program right now for fun.

In [1]:
# Import sys so the pip command below can find the current interpreter
import sys

# Install a pip package in the current Jupyter kernel
!{sys.executable} -m pip install tk

# Import the turtle module
from turtle import *

# Get a canvas/screen for a turtle to draw on
canvas = Screen()

# Set the canvas width and height
canvas.setup(400,200)

# Create a new turtle named tina
tina = Turtle()

# Set tina's shape
tina.shape("turtle")

# Set tina's color to blue
tina.color("blue")

# Ask tina to move forward by 100 pixels
tina.forward(100)

# Close the window when the canvas is clicked
canvas.exitonclick()


Python has been on the programming stage since its first release in 1991. Python is a powerful computational tool when we have to solve complicated tasks in the fields of finance, econometrics, economics, data science and machine learning. A more technical description is that Python is an open-source, general-purpose, high-level programming language.

### Why Jupyter Notebook?

Jupyter is a client-server application that allows you to edit and run your code through a web browser. The Jupyter server provides the environment where a client is matched with a corresponding language kernel. We will focus on Python, using a web browser as the client, much like an interactive shell.

# Data Structures

## Lists

Lists are ordered collections of objects, which may be of different types. They can be appended to, iterated over, etc., and we will use them for lots of fun things. They're especially useful when you don't know in advance how big something is going to be or what types of objects will be in it.

We'll set a simple one up that includes the numbers 1 through 9.

In [2]:
a = [1, 2, 3, 4, 5, 6, 7, 8, 9]

Now let's call `dir` on it to see what things we can do to it.  Note that this will include lots of things starting with two underscores; for the most part these are "hidden" methods that we will use implicitly when we do things.  The main methods you'll use directly are the ones that don't start with underscores.

In [3]:
dir(a)

['__add__',
 '__class__',
 '__contains__',
 '__delattr__',
 '__delitem__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getitem__',
 '__gt__',
 '__hash__',
 '__iadd__',
 '__imul__',
 '__init__',
 '__init_subclass__',
 '__iter__',
 '__le__',
 '__len__',
 '__lt__',
 '__mul__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__reversed__',
 '__rmul__',
 '__setattr__',
 '__setitem__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 'append',
 'clear',
 'copy',
 'count',
 'extend',
 'index',
 'insert',
 'pop',
 'remove',
 'reverse',
 'sort']
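To focus on the methods you would call directly, one option is to filter out the names that start with an underscore. This is a small sketch; the variable names `numbers` and `public_methods` are made up for illustration.

```python
# Keep only the attribute names that do not start with an underscore
numbers = [1, 2, 3]
public_methods = [name for name in dir(numbers) if not name.startswith("_")]
print(public_methods)
```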

Lists can be reversed *in-place*. This means that the return value is empty (`None`) and the list itself has been changed. The important consequence is that lists are *mutable*: you can change them without copying them into a new object.

In [4]:
a.reverse()

In [5]:
a

[9, 8, 7, 6, 5, 4, 3, 2, 1]
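One practical effect of mutability is that two names can refer to the same list, so an in-place change shows up under both. The sketch below uses made-up names (`original`, `alias`, `snapshot`) so it does not disturb the list `a` used in this notebook.

```python
original = [1, 2, 3]
alias = original             # a second name for the same list object
snapshot = original.copy()   # an independent (shallow) copy

result = original.reverse()  # reverses in place and returns None

print(result)    # None
print(original)  # [3, 2, 1]
print(alias)     # [3, 2, 1]
print(snapshot)  # [1, 2, 3]
```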

We can sort them, too.  Here the sorting is trivial -- it'll end up just reversing it back to what it was.  But, we can sort a more complex list as well.

In [6]:
a.sort()

In [7]:
a

[1, 2, 3, 4, 5, 6, 7, 8, 9]
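As an illustration of sorting something less trivial, here is a sketch with a hypothetical list of words. It uses the `key` and `reverse` arguments of `sort` and the built-in `sorted`, which returns a new list instead of changing the original.

```python
words = ["pear", "fig", "banana", "kiwi"]

words.sort()          # alphabetical, in place
print(words)          # ['banana', 'fig', 'kiwi', 'pear']

words.sort(key=len)   # shortest first; ties keep their previous order
print(words)          # ['fig', 'kiwi', 'pear', 'banana']

# sorted() leaves `words` untouched and returns a new list
longest_first = sorted(words, key=len, reverse=True)
print(longest_first)  # ['banana', 'kiwi', 'pear', 'fig']
```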

Because lists are mutable, we can insert things into them. Lists are zero-indexed, which means that the very first position is 0, not 1. `insert` places the new item before whatever currently sits at the given position, so inserting at 0 puts it ahead of the first item, and so on. Here, we'll insert at position 3, which is between the numbers 3 and 4 in this list.

In [8]:
a.insert(3, 3.9)

In [9]:
a

[1, 2, 3, 3.9, 4, 5, 6, 7, 8, 9]

We can also append values.

In [10]:
a.append(10)

In [11]:
a

[1, 2, 3, 3.9, 4, 5, 6, 7, 8, 9, 10]

We can also remove an item; note that using `pop` here will not only remove the item, but return it as a return value.  If we were to use `del` then it would not return it.

In [13]:
a.pop(3)

3.9

In [14]:
a

[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
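For comparison, here is a short sketch of the removal options side by side, using a hypothetical list `letters`. `pop` hands the item back, `del` removes by position without returning anything, and `remove` deletes the first item equal to a given value.

```python
letters = ["a", "b", "c", "d"]

removed = letters.pop(1)   # removes and returns the item at index 1
print(removed)             # b
print(letters)             # ['a', 'c', 'd']

del letters[0]             # removes the item at index 0, returns nothing
print(letters)             # ['c', 'd']

letters.remove("d")        # removes the first item equal to "d"
print(letters)             # ['c']
```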

We can also use negative indices, which count from the end of the list. Index `-1` means "the last" item, `-2` the one before it, and so on.

In [15]:
a.pop(-1)

10
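Negative indices also work for plain lookups, not just `pop`. A minimal sketch with a made-up list `values`:

```python
values = [10, 20, 30, 40]

print(values[-1])   # 40
print(values[-2])   # 30

# Index -1 refers to the same position as len(values) - 1
print(values[-1] == values[len(values) - 1])   # True
```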

### Slices

We can also slice lists. This uses the square brackets `[]` to supply a start index, a stop index, and a step, separated by colons. The start index is included and the stop index is not. This lets us choose subsets. If you leave one of the items out, it defaults to the maximum selection, i.e., first, last, and step of 1.

In [16]:
a[1:5:2]

[2, 4]

Here we just start at the beginning and take every other item.

In [17]:
a[::2]

[1, 3, 5, 7, 9]

Every other item, starting from the second:

In [18]:
a[1::2]

[2, 4, 6, 8]

A negative step walks through the list in reverse:

In [19]:
a[::-1]

[9, 8, 7, 6, 5, 4, 3, 2, 1]

In reverse, but every second.

In [20]:
a[::-2]

[9, 7, 5, 3, 1]
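Two further properties of slices are worth a quick sketch: a slice is a new list, so changing it leaves the original alone, and indices past the end are clipped rather than raising an error. The list `digits` is a stand-in for illustration.

```python
digits = [1, 2, 3, 4, 5, 6, 7, 8, 9]

first_three = digits[:3]   # stop given; start and step left out
print(first_three)         # [1, 2, 3]

last_three = digits[-3:]   # a negative start counts from the end
print(last_three)          # [7, 8, 9]

first_three[0] = 100       # changing the slice...
print(digits[0])           # 1  (...does not change the original list)

print(digits[7:100])       # [8, 9]  (a stop past the end is clipped)
```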

Lists can include objects of different types.

In [21]:
a.append("blast off")

In [22]:
a

[1, 2, 3, 4, 5, 6, 7, 8, 9, 'blast off']

In [23]:
a.pop(-1)

'blast off'

A common problem you may run into is that sometimes, numbers look like strings.  This can cause problems, as we'll see:

In [24]:
a.append('10')

In [25]:
a

[1, 2, 3, 4, 5, 6, 7, 8, 9, '10']

If it were the number 10, this would work.  Unfortunately, strings and numbers can't be sorted together.

In [26]:
a.sort()

TypeError: '<' not supported between instances of 'str' and 'int'
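One way around this, assuming every string in the list really does hold a whole number, is to convert before sorting or to compare the items as integers. The list `mixed` below is hypothetical.

```python
mixed = [3, "10", 1, "2"]

# Convert every item to int first, then sort the converted values
cleaned = sorted(int(item) for item in mixed)
print(cleaned)   # [1, 2, 3, 10]

# Or keep the original items and only compare them as integers
ordered = sorted(mixed, key=int)
print(ordered)   # [1, '2', 3, '10']
```

If a string cannot be parsed as an integer, `int` raises a `ValueError`, so real data may need extra checks.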

## Dictionaries

Dictionaries (`dict` objects) are hash tables, where a key is looked up to find a value. Both keys and values can be of heterogeneous types within a given dict, but there are some restrictions on what can be used as a key. (The type must be "hashable," which among other things means that it can't be a list.)
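The hashability rule can be seen in a small sketch: tuples and strings are accepted as keys, while a list is rejected with a `TypeError`. The names `lookup` and `message` are made up for this example, and the exact wording of the error may differ between Python versions.

```python
lookup = {}
lookup[(1, 2)] = "a tuple works as a key"
lookup["name"] = "so does a string"

message = None
try:
    lookup[[1, 2]] = "a list does not"
except TypeError as error:
    message = str(error)   # explains that a list is unhashable

print(message)
print(len(lookup))   # 2
```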

We can initialize an empty dict with the curly brackets, `{}`, and then we can assign things to this dict.

In [None]:
b = {}

Here, we can just use an integer key that gets us to a string.

In [None]:
b[0] = 'a'

If we look at the dict, we can see what it includes.

In [None]:
b

We can see a view on what all the keys are using `.keys()`:

In [None]:
b.keys()

If we just want to see what all the values are, we can use `.values()`:

In [None]:
b.values()

If we ask for a key that doesn't exist, we get a `KeyError`:

In [None]:
b[1]
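When a missing key is an expected situation rather than a bug, there are a few ways to avoid the exception. The sketch below uses a separate, hypothetical dict called `inventory`.

```python
inventory = {0: "a"}

print(1 in inventory)                # False
print(inventory.get(1))              # None
print(inventory.get(1, "missing"))   # missing

try:
    inventory[1]
except KeyError as error:
    print("no such key:", error)     # no such key: 1
```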

Earlier, I noted that lists can't be used as *keys* in dicts, but they can be used as *values*.  For example:

In [None]:
b = {0: [1, 2, 3], 1: [4, 5, 6], 2: [7, 8, 9]}

In [None]:
b

We can also iterate over the keys in a dict, simply by iterating over the dict itself.  This statement will return each of the keys in turn, and we can see what value it is associated with.

In [None]:
for key in b:
    print(b[key])
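If both the key and the value are needed, `.items()` yields them as pairs, which avoids the extra lookup. A minimal sketch, using a stand-in dict `rows` with the same contents as above and a made-up `totals` dict:

```python
rows = {0: [1, 2, 3], 1: [4, 5, 6], 2: [7, 8, 9]}

totals = {}
for key, values in rows.items():
    totals[key] = sum(values)   # add up the list stored under each key

print(totals)   # {0: 6, 1: 15, 2: 24}
```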

## Sets

Sometimes, we need to keep track of all unique values in something. This is where sets come in. These are unordered collections that contain only one of every item. This means you can do neat set operations on them. Let's initialize two sets that overlap:

In [None]:
c = set([1,2,3,4,5])
d = set([4,5,6,7,8])

We can now subtract one from the other, to see all objects in one but not the other.

In [None]:
c - d

We can also union them:

In [None]:
e = c.union(d)

In [None]:
e
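The same operations are also available as operators. The sketch below uses two made-up sets, `left` and `right`, with the same contents as above. Printed order is not shown because sets are unordered.

```python
left = {1, 2, 3, 4, 5}
right = {4, 5, 6, 7, 8}

both = left & right       # intersection: items in both sets
either = left | right     # union: items in at least one set
only_one = left ^ right   # symmetric difference: items in exactly one set
right_only = right - left # difference: items in right but not in left

print(both, either, only_one, right_only)
```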

An interesting feature of sets is that they can be built from any iterable. This means that if you supply a string, each character of the string is treated as an independent object. So we can create two sets from two strings, and see what they contain: all the unique characters in each of the strings.

In [None]:
s1 = "Hello there, how are you?"
s2 = "I am fine, how are you doing today?"
v1 = set(s1)
v2 = set(s2)

In [None]:
v1

In [None]:
v2

Let's see how many there are in each:

In [None]:
len(s1), len(v1)

In [None]:
len(s2), len(v2)

If we combine, we can see how many unique characters in the two strings combined there are:

In [None]:
len(v1.union(v2))
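The size of the union relates to the sizes of the two sets and of their overlap: characters that appear in both strings are counted only once. A small sketch of that check, with made-up names `first`, `second`, `shared`, and `combined`:

```python
first = set("Hello there, how are you?")
second = set("I am fine, how are you doing today?")

shared = first & second     # characters that appear in both strings
combined = first | second   # characters that appear in either string

# The union counts shared characters once, not twice
print(len(combined) == len(first) + len(second) - len(shared))   # True
```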

## Iteration

We can use the `for` construct to iterate over objects.  Depending on the object type, this has different meaning.  If we iterate over a list, we get each item.

In [None]:
for value in a:
    print(value)
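When the position of each item matters too, the built-in `enumerate` pairs every item with its index. A minimal sketch with a hypothetical list `values`:

```python
values = ["x", "y", "z"]

pairs = []
for position, value in enumerate(values):
    pairs.append((position, value))
    print(position, value)

print(pairs)   # [(0, 'x'), (1, 'y'), (2, 'z')]
```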

If we iterate over a dictionary, we get the keys.  We can also explicitly iterate over keys:

In [None]:
for name in b.keys():
    print(b[name])

If we iterate over a set, we get all the values in that set.  Note, however, that this iteration order is not guaranteed to be consistent, and should not be relied upon.

In [None]:
for value in v1:
    print(value)
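If a predictable order is needed, one option is to sort the set while iterating. `sorted` returns a new list and leaves the set unchanged. The set `characters` below is made up for illustration.

```python
characters = set("hello")

ordered = sorted(characters)   # a list in a fixed, sorted order
for character in ordered:
    print(character)

print(ordered)   # ['e', 'h', 'l', 'o']
```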