# Intro to Python and Notebooks

This notebook gives a brief introduction to the Python programming language's data types, constructs, and operations, and to the structure of Jupyter notebooks.

This first cell is a *Markdown cell*: it contains text, using the [Markdown formatting language](https://daringfireball.net/projects/markdown/) for formatting.

Other cells are *code cells*, containing Python code that is executed.  The notebook displays the value of the last line of the cell.

> **Tip:** in the notebook's *command mode* (cells have blue borders, hit `Esc` to get there), `y` changes a cell to code and `m` changes it to Markdown.

For a more thorough introduction to the Python programming language, with way more than you will need for this class, see the [Python tutorial](https://docs.python.org/3/tutorial/index.html).

In [1]:
'heffalump'

'heffalump'

## Structuring Notebooks

If you look at the source of the Markdown cells in this notebook, you will see that I am using a hierarchical structure of headings (lines that begin with one or more `#` characters).

This is deliberate, and you should do this too. It makes the documents easier to read, and also supports accessibility tools such as screen readers.

You should treat your notebook as a document that walks readers through your project, with code used to compute data-driven answers.  A notebook is primarily a document with data; it is secondarily a means to run Python code.

Markdown cells should explain *why* we are doing things: what question are we trying to answer? What should we look for in the results? What is noteworthy in the results we just saw?  If there are significant, non-obvious aspects of why something is coded the way it is, the text can explain those as well.

## Python Data Types

Python has many of the same kinds of data types and objects you may be used to in other programming languages.  It has numbers, including integers:

In [2]:
42

42

and floating point numbers:

In [3]:
3.14157

3.14157

Python strings can be written using either single or double quotes - there is no difference:

In [4]:
'heffalump'

'heffalump'

In [5]:
"woozle"

'woozle'

Python has a few special values.  It has booleans `True` and `False`:

In [6]:
True

True

The value `None` means 'no value'; it is like `null` in Java.

In [7]:
None

That cell didn't display anything - that's curious!  If the value of the last line of the cell is `None`, Jupyter does not display anything.  Functions that just perform an action and do not have a meaningful result to return, such as `print`, return `None`; not displaying that `None` keeps the output less cluttered.
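A minimal sketch of that behavior: `print` writes its argument to the output and returns `None`. The name `result` is introduced here only for illustration.

```python
# print displays its argument as a side effect and hands back None.
result = print('heffalump')

# Comparing against None with `is` confirms there was no meaningful return value.
print(result is None)
```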

Python also has direct support for writing out lists and *dictionaries*, which are like Java's `Map`s.  This is a list:

In [8]:
['apples', 'bananas', 'durians']

['apples', 'bananas', 'durians']

And this is a dictionary:

In [9]:
{'apple': 'fruit', 'spider': 'arachnid'}

{'apple': 'fruit', 'spider': 'arachnid'}

The values of a dictionary can be any objects, including lists or other dictionaries.
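As a small sketch of nesting, the dictionary below holds a string, a list, and another dictionary as values. The keys and the name `record` are made up for illustration.

```python
# Hypothetical record: each value is a different kind of object.
record = {
    'name': 'splintercat',                   # a string
    'habitats': ['forest'],                  # a list
    'origin': {'region': 'North America'},   # another dictionary
}

# Lookups can be chained to reach values in the inner containers.
print(record['habitats'][0])
print(record['origin']['region'])
```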

Another important Python object is the *tuple*.  It is what you use in Python to represent a collection of things that go together, like a mathematical ordered pair.  Tuples are written in parentheses:

In [18]:
('x', 'y')

('x', 'y')

An oddity is the *singleton tuple*, a tuple consisting of one item.  A singleton tuple really isn't a thing mathematically, but due to the way Python structures its data types, it is necessary.  It has parentheses and a comma with no second value:

In [19]:
('a',)

('a',)

Python tuples and lists have a lot of similarities. The primary technical difference is that lists are mutable and tuples are not: you can add, remove, and replace items in a list, but once you have created a tuple, you cannot change its elements or its length.

There is more of a semantic difference, though.  Use a list when you have many of what are effectively the same thing, such as a list of data points or a list of people.  Use a tuple when you have a (usually small) number of things that may be different, and together comprise a larger item, such as a row of a spreadsheet or a pair of coordinates.

## Operations on Basic Python Objects

Python supports the usual set of mathematical operators, such as `+`, `-`, `*`, and `/`.  So we can write:

In [10]:
2 + 2

4

In [11]:
7.5 * 3

22.5

In [12]:
42 - 8

34

The `/` operator always performs floating-point division, even when both operands are integers:

In [13]:
3 / 4

0.75

If you specifically need integer division, use `//`:

In [14]:
7 // 6

1

The order of operations, grouping (with `(..)`), etc. work like they do in most other programming languages:

In [17]:
5 * (3 + 2)

25

Python also supports *operator overloading*.  This means that many operations on other data types are performed with operators; most of them make sense.

We can use `+` to concatenate strings:

In [15]:
'beetle' + 'juice'

'beetlejuice'

It also concatenates lists:

In [16]:
['wumpus', 'axe-handle hound'] + ['chupacabra', 'splintercat']

['wumpus', 'axe-handle hound', 'chupacabra', 'splintercat']

Python is *dynamically typed*, in that data types are associated with objects and not variables, and there is no type checking before a line of code is run.  However, it will detect mismatches and refuse to allow you to perform operations with incompatible types.  For example, you cannot add a string and a number:

In [24]:
1 + 'bob'

TypeError: unsupported operand type(s) for +: 'int' and 'str'

You can't even do this if the string is a number:

In [25]:
'5' + 7

TypeError: can only concatenate str (not "int") to str

You must be explicit about what you want. We can convert the number to a string:

In [28]:
str(1) + 'bob'

'1bob'

Or the string to a number:

In [29]:
int('5') + 7

12

## Variables and Lookups

Now that we can write Python objects and basic expressions, let's start to build them into larger things.  A good starting point for this is to store values in *variables*.  In Python, we do not need to declare variables; we can just assign things to names and it will work:

In [20]:
x = 5
y = 7
species = 'splintercat'

We then get the contents of a variable by writing its name:

In [21]:
x

5

We can use variables in expressions:

In [23]:
x + y

12

Assigning to a variable does not produce any result, so Jupyter doesn't show anything.  A pattern I commonly use is to assign a variable, and then conclude the cell with a line just dumping the variable's contents to Jupyter.  This allows me to see what is in the variable before moving on, to aid in understanding subsequent code:

In [30]:
x = 72 + 13 * 9
x

189
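Because types belong to objects rather than to variables, a name can be rebound to an object of a different type. A minimal sketch using the built-in `type` function; the name `value` is only for illustration.

```python
value = 5
print(type(value))   # the name currently refers to an int

value = 'five'       # rebinding the same name to a str is allowed
print(type(value))
```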

## Control Flow

Python has an `if` statement to make decisions.  Unlike other languages you may have seen, the body of the `if` statement is delimited by `:` and indentation - whitespace is significant.  De-indenting closes the `if`.

In [53]:
x = 7
if x > 5:
    print('large')
else:
    print('small')
print('done')

large
done


In [54]:
x = 3
if x > 5:
    print('large')
else:
    print('small')
print('done')

small
done


## List Operations

Let's make a variable with a list:

In [31]:
cryptids = ['wumpus', 'axe-handle hound', 'chupacabra', 'splintercat']

The `len` function tells us the length of the list:

In [32]:
len(cryptids)

4

We can access individual elements of the list with integer indices that start from 0:

In [33]:
cryptids[0]

'wumpus'

In [34]:
cryptids[2]

'chupacabra'

We can use negative indices to count backwards from the end - `-1` is the last element:

In [35]:
cryptids[-1]

'splintercat'

Python will throw an error if we attempt to access an element that does not exist:

In [36]:
cryptids[4]

IndexError: list index out of range

We can get a *slice* of the list with `s:e`, which will return the elements in positions `[s,e)` - starting with `s`, and up to but **not** including `e`.  We call this a *half-open interval*, and it is very common in languages with 0-based indexing such as Python, C, and Java.

In [37]:
cryptids[1:3]

['axe-handle hound', 'chupacabra']
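One way to see why half-open intervals are convenient: the length of a slice is simply `e - s`, and slices that share an endpoint fit together without overlapping. A small sketch, re-creating the original four-element list so that the block stands alone; the names `middle` and `rejoined` are only for illustration.

```python
cryptids = ['wumpus', 'axe-handle hound', 'chupacabra', 'splintercat']

# The slice 1:3 covers positions 1 and 2, so it has 3 - 1 = 2 elements.
middle = cryptids[1:3]
print(len(middle))

# Adjacent slices meet at index 2 with nothing repeated or skipped.
rejoined = cryptids[0:2] + cryptids[2:4]
print(rejoined == cryptids)
```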

**Try it yourself:** you can also slice and index strings, just like lists - try that with the variable `species`.
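As a starting point, here is a sketch with a different string; the name `word` is only for illustration, and `species` is left for you to try.

```python
word = 'heffalump'

print(word[0])     # first character
print(word[-1])    # last character
print(word[0:4])   # characters in positions 0, 1, 2, and 3
```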

**More:** One of the principles of Python is that objects that support similar operations should support them in the same way - things should be consistent.  Tuples have elements in positions like lists, so they support indexing just like lists:

In [38]:
w_tup = ('splintercat', 'North America', 'lumberjack legends')
w_tup[1]

'North America'

### Adding to Lists

We can append items to lists (but not tuples):

In [40]:
cryptids.append('left-handed sidewinder')
cryptids

['wumpus',
 'axe-handle hound',
 'chupacabra',
 'splintercat',
 'left-handed sidewinder']

It is now longer:

In [41]:
len(cryptids)

5

We can also append lists with `+=`:

In [42]:
cryptids += ['kraken', 'nessie']
cryptids

['wumpus',
 'axe-handle hound',
 'chupacabra',
 'splintercat',
 'left-handed sidewinder',
 'kraken',
 'nessie']
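Tuples, by contrast, have no `append` method and reject item assignment. A minimal sketch follows. The `try`/`except` statement, which this notebook does not otherwise cover, catches the error so the rest of the cell keeps running. The names `pair`, `message`, and `longer` are only for illustration.

```python
pair = ('x', 'y')

# Tuples do not provide append.
print(hasattr(pair, 'append'))

# Assigning to a position raises TypeError instead of changing the tuple.
try:
    pair[0] = 'z'
except TypeError as err:
    message = str(err)
    print(message)

# To "extend" a tuple, build a new one; the original is left unchanged.
longer = pair + ('z',)
print(pair, longer)
```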

## Dictionaries

*Dictionaries* are useful for mapping arbitrary keys to values.  They are containers, like lists, but the keys can be other objects - including strings - and not just 0-based numbers.  Many other objects we will see later, such as data frames, act like dictionaries.

Let's create a dictionary mapping some of our cryptids to their habitats:

In [45]:
habitats = {
    'wumpus': 'cave',
    'splintercat': 'forest',
    'axe-handle hound': 'forest'
}

Dictionaries have a length:

In [46]:
len(habitats)

3

We can look up elements in the dictionary:

In [47]:
habitats['splintercat']

'forest'

We can also add or replace elements in the dictionary:

In [49]:
habitats['yeti'] = 'mountain'
habitats

{'wumpus': 'cave',
 'splintercat': 'forest',
 'axe-handle hound': 'forest',
 'yeti': 'mountain'}
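Assigning to a key that already exists replaces its value instead of adding a new entry. A small sketch, using a separate dictionary named `small_habitats` so that `habitats` is left alone; the replacement value is made up for illustration.

```python
small_habitats = {'wumpus': 'cave', 'splintercat': 'forest'}

# 'wumpus' is already a key, so this overwrites 'cave'.
small_habitats['wumpus'] = 'labyrinth'

print(small_habitats['wumpus'])
print(len(small_habitats))   # still 2: nothing was added
```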

## Unpacking

Another useful feature of Python is *unpacking*.  This is most commonly done with tuples, but it also works with lists.

To *unpack* we assign a tuple value to multiple variables:

In [50]:
x, y = (7, 20)

In [51]:
x

7

In [52]:
y

20
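The same syntax works when the right-hand side is a list, provided the number of names matches the number of elements. A brief sketch with made-up names:

```python
# Unpack a two-element list into two variables.
first, second = ['wumpus', 'chupacabra']

print(first)
print(second)
```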

Tuples are a common way to return multiple values from a function, or otherwise represent values that somehow go together.  Unpacking the tuples into variables allows us to write out the order in one place, and thereafter access them by name, instead of writing `0` and `1` all over the place.
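A sketch of that pattern, using a hypothetical helper `min_max` that returns two values as a tuple:

```python
def min_max(values):
    """Hypothetical helper: return the smallest and largest value as a tuple."""
    return min(values), max(values)

# Unpacking names each position once, at the call site.
low, high = min_max([7, 20, 3])
print(low, high)

# Without unpacking, the positions have to be remembered.
result = min_max([7, 20, 3])
print(result[0], result[1])
```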