# Up and running from spreadsheets to Python

## Hello, Jupyter

This is the interface that we will use to execute `.ipynb` (IPython notebook) files.

You may also often see Python files with a `.py` extension. These are *script* files versus *notebook* files.

Notebooks are divided into cells which can be either text or code, among other things.

Go ahead and click into this cell. What happens? 

You can add, reorder, cut and paste cells using the menu icons. 

(From a new cell.) Once you click into the cell above, you are seeing its *raw Markdown* styling.

You can close out of it by **running the cell.** `Ctrl + Enter` is the keyboard shortcut.

Markdown allows us to style text using plain-text format. 

There's [a lot you can do with Markdown](https://www.markdownguide.org/cheat-sheet). Some basics:

# Big Header 1
## Smaller Header 2
### Even smaller headers
#### Still more

*Using one asterisk renders italics*

**Using two asterisks renders bold**

It's worth studying up on Markdown to write elegant text in your notebooks. 

But in this class we'll focus on the *code* block, because that's where executable code goes!


## Python as a fancy calculator

We can use Python as a highfalutin calculator, just as you might do with Excel.

Enter some basic arithmetic below, then **run the cell** (Do you remember how to do that?).

In [8]:
# This is a code block. 
# You can execute code here.

# Python can be used as a fancy calculator.


3

Some of these arithmetic operators are the same as in Excel, but others are different:


| Operator | Description    |
| -------- | -------------- |
| `+`      | Addition       |
| `-`      | Subtraction    |
| `*`      | Multiplication |
| `/`      | Division       |
| `%`      | Modulus        |
| `**`     | Exponent       |
| `//`     | Floor division |
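As a quick sketch of the operators in the table, the cell below applies each one to the same pair of numbers. The values 7 and 2 are arbitrary and chosen only for illustration.

```python
print(7 + 2)   # Addition: 9
print(7 - 2)   # Subtraction: 5
print(7 * 2)   # Multiplication: 14
print(7 / 2)   # Division always returns a decimal: 3.5
print(7 % 2)   # Modulus, the remainder after division: 1
print(7 ** 2)  # Exponent, 7 raised to the power of 2: 49
print(7 // 2)  # Floor division, divide then round down: 3
```

For comparison, Excel writes exponents with `^` and gets the remainder from the `MOD()` function.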

## Cell comments

What's the deal with the hashtags and text in the above cell?

Those are cell comments used to give us verbal instructions and reminders about our code. This helps other users -- and ourselves -- remember what we are doing with it.

![Gandalf coding meme](images/gandalf.jpg)

And yes, you can embed images into notebooks 😎.

Try writing comments in the cell below.


In [10]:
# Hello, world!

In [13]:
# Python follows the order of operations, just like spreadsheets. 

2+3/4*2**.5

3.0606601717798214
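To make the order of operations concrete, here is a rough sketch that breaks the same expression into its individual steps. The names `power`, `quotient`, `product`, `total` and `grouped` are made up for this example; assigning values to names like this is covered later in the notebook.

```python
# The exponent is evaluated first
power = 2 ** 0.5

# Division and multiplication come next, working left to right
quotient = 3 / 4
product = quotient * power

# Addition comes last
total = 2 + product
print(total)                           # 3.0606601717798214
print(total == 2 + 3 / 4 * 2 ** 0.5)   # True

# Parentheses change the order, just like in a spreadsheet formula
grouped = (2 + 3) / 4 * 2 ** 0.5
print(grouped)                         # about 1.77
```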

### Running functions

Python includes many functions for working with data.

Functions take arguments inside parentheses, just like in Excel:

In [14]:
# We can also call functions:
# Let's find the absolute value of -100
abs(-100)

100

There are some important differences:

In [15]:
# Neither of these will work: Python does not recognize ABS or Abs
ABS(-100)
Abs(-100)

NameError: name 'ABS' is not defined

Moral of the story: **Python is case-sensitive** and all-around finicky. 

## Comparison operators

We can also test for whether one value is greater than another, much like you would do in Excel:

In [16]:
# Is 3 > 4?
3 > 4

False

Like in Excel, Python will return either a `True` or `False`. 

Many of these comparison operators will look familiar to you:

| Operator | Meaning                  |
| -------- | ------------------------ |
| `!=`     | Not equal to             |
| `>`      | Greater than             |
| `<`      | Less than                |
| `>=`     | Greater than or equal to |
| `<=`     | Less than or equal to    |
| **`==`**     | **Equal to**                 |


Did you catch that last one?

You do not check for whether two values are equal to each other using `=`, but instead using `==`. Why?

Because in Python, we assign data to *variables*. This is a game-changer!
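The difference between the two can be sketched in a few lines. The name `units` is a placeholder invented for this example.

```python
# A single equals sign stores the value 5 under the name `units`
units = 5

# A double equals sign asks a question: is `units` equal to 5?
print(units == 5)   # True
print(units == 6)   # False
print(units != 6)   # True
```

The single `=` produces no output because it is an instruction, not a question.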

## Assigning variables

Calling functions like `abs(-100)` can be useful, but things get *really* interesting in Python when we assign the results of operations to variables.

Let's go ahead and assign the absolute value of -100 to a variable, `my_first_variable`.

In [17]:
my_first_variable = abs(-100)

The result of `abs(-100)` has been stored in a *variable*, which will make it much easier for us to refer to and use it. 

### Printing variables

To see the result of that variable, we can *print* it using the `print()` function: 

In [18]:
print(my_first_variable)

100


What do you think the result of the below will be?

In [19]:
print(MY_FIRST_VARIABLE)

NameError: name 'MY_FIRST_VARIABLE' is not defined

## Python variable naming conventions

> There are only two hard things in Computer Science: cache invalidation and naming things. --Phil Karlton


There are some rules in naming Python variables:

- They must start with a letter or underscore.
- The rest of your variable can only contain letters, numbers or underscores.

In theory, you can name your variables almost anything so long as they fit these rules. But `sales` may be a better name for sales data than `scooby_doo`. 
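One way to check a candidate name without triggering an error is the string method `.isidentifier()`, which reports whether a piece of text follows the rules above. This is only a sketch, and the candidate names are invented for illustration.

```python
print("sales".isidentifier())        # True
print("sales_2024".isidentifier())   # True
print("_sales".isidentifier())       # True
print("2024_sales".isidentifier())   # False: starts with a number
print("total sales".isidentifier())  # False: contains a space
```

Note that reserved words such as `for` pass this check but still cannot be used as variable names.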

### DRILL

Based on these rules, which of the following is an invalid variable name?

A. `My_string_`  
B. `string_1`  
C. `razzle.dazzle`  
D. `_`  

In [16]:
# Try assigning and printing these variables if you're not sure!

## Variable types

You can think about a variable as a box that we are putting a piece of information into. 

Variables can be of different types, like different categories and dimensions of boxes. 

![variables shoebox](images/variables-shoebox.png)



You can find a variable's type with the `type()` function.

In [20]:
# Assigning different variable types

# Integer
my_int = 2

# Float(ing point decimal)
my_float = 2.222

# String
my_string = 'Hello'

# Boolean
my_boolean = True

print(type(my_int))
print(type(my_float))
print(type(my_string))
print(type(my_boolean))

<class 'int'>
<class 'float'>
<class 'str'>
<class 'bool'>


We can call functions directly on these variables.

In fact, that's what we were doing with `print()` and `type()` all along!
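Because `type()` is an ordinary function, it can also be called on the *result* of an operation. The short sketch below reuses the values of `my_int` and `my_float` from above to show how the type of a result depends on the operation.

```python
my_int = 2
my_float = 2.222

print(type(my_int + my_int))    # <class 'int'>
print(type(my_int + my_float))  # <class 'float'>
print(type(my_int / my_int))    # <class 'float'>: division always gives a float
print(type(my_int > my_float))  # <class 'bool'>
```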

In [25]:
# Absolute value of my_int
abs(my_int)

2

In [26]:
# Length of my_string
len(my_string)

5

In [27]:
# Assign the product to a variable
my_nonsense = abs(my_int) * len(my_string)
print(my_nonsense)

10


# DRILLS

1. Assign the sum of -10 and 2 to `a`.
2. Assign the absolute value of `a` to `b`.
3. Assign `b` minus 1 to `d`.
4. Print the result of `d`. What is the value? What type is this variable?

You can insert a code cell below to conduct your work.


# From spreadsheet ranges to Python lists

Generally in spreadsheets we want to operate on multiple cells at a time and the same is true in Python. 

In [21]:
# You know how to assign the number 1 to a variable... do it now!
my_variable = 1
print(my_variable)
print(type(my_variable))

1
<class 'int'>


What about the numbers 1, 2 and 3? Do we have to assign each to its own variable?

Thank heavens not! We can use a *collection* variable to assign all of them at once. Let's look at a common collection data type, a list.

## Lists

Lists are denoted with brackets `[]`. 

Each *element* of the list is separated by commas `,`.

In [24]:
# Make a list
my_first_list = [1,2,3]
print(my_first_list)
print(type(my_first_list))

[1, 2, 3]
<class 'list'>


Notice that the type isn't `int` but `list`. This is its own type of variable!

A list can contain all sorts of individual data types.

![List shoebox](images/list-shoebox.png)

In [25]:
my_other_list = [1,2,3,"Boo!"]
print(my_other_list)
print(type(my_other_list))

[1, 2, 3, 'Boo!']
<class 'list'>


They can even contain *other lists*!

In [26]:
my_list_here = [1,2,3,[1,2,3,"Boo!"]]
print(my_list_here)
print(type(my_list_here))

[1, 2, 3, [1, 2, 3, 'Boo!']]
<class 'list'>


We can find the number of *elements* in a list using the `len()` function. Any list inside a list is considered one element.

In [27]:
len(my_list_here)

4
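To see why the count is 4, it can help to give the nested list its own name first. A minimal sketch, where `inner_list` and `outer_list` are names invented for this example:

```python
inner_list = [1, 2, 3, "Boo!"]
outer_list = [1, 2, 3, inner_list]

print(len(outer_list))  # 4: three numbers plus one nested list
print(len(inner_list))  # 4: the nested list has its own length
```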

# DRILL

1. Create a list containing the values `North`, `East`, `South` and `West`.  
2. What is the result of the below?

```
len(['Monday','Tuesday','Wednesday','Thursday','Friday',['Saturday','Sunday']])
```

# Modifying lists

There are several ways you might want to manipulate a list. Let's look at a couple of common ones.

## Sorting lists 

You can do this using the `.sort()` method. A method is similar to a function, but we attach it to the end of our variable with a dot.

The method will operate directly on our variable.

In [31]:
my_list = [-1,4,3,2]

# This is a method
my_list.sort()
print(my_list)

[-1, 2, 3, 4]


Methods can contain arguments: for example, we can set `reverse` to `True` to sort the list in reverse:

In [33]:
my_list.sort(reverse=True)
print(my_list)

[4, 3, 2, -1]
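Because `.sort()` operates directly on the variable, the original order is lost. If you want to keep it, Python's built-in `sorted()` function returns a new sorted list instead. The sketch below contrasts the two; the names `scores`, `ordered_scores` and `result` are made up for illustration.

```python
scores = [-1, 4, 3, 2]

# sorted() returns a new list and leaves the original alone
ordered_scores = sorted(scores)
print(ordered_scores)  # [-1, 2, 3, 4]
print(scores)          # [-1, 4, 3, 2]

# .sort() changes the list itself and returns None
result = scores.sort()
print(result)          # None
print(scores)          # [-1, 2, 3, 4]
```

A common slip is writing `scores = scores.sort()`, which replaces the list with `None`.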


## Appending lists

We can add elements to our list using the `.append()` method.

In [34]:
# Add number 0 to the list
my_list.append(0)
print(my_list)

[4, 3, 2, -1, 0]


In [35]:
# Let's re-sort our list!
my_list.sort()
print(my_list)

[-1, 0, 2, 3, 4]


For other list methods, [check out this article](https://www.w3schools.com/python/python_ref_list.asp).
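One of those other methods, `.extend()`, is worth contrasting with `.append()`, since the two are easy to confuse. A small sketch with invented list names:

```python
letters = ["a", "b"]

# .append() adds its argument as ONE element, even if it is a list
letters.append(["c", "d"])
print(letters)       # ['a', 'b', ['c', 'd']]
print(len(letters))  # 3

more_letters = ["a", "b"]

# .extend() adds each element of its argument separately
more_letters.extend(["c", "d"])
print(more_letters)       # ['a', 'b', 'c', 'd']
print(len(more_letters))  # 4
```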

# DRILL

1. What do you expect to be the result of the following? Run the code and see how you did.

```
my_week = ['Monday','Tuesday','Wednesday','Thursday','Friday','Saturday','Sunday']
my_week.sort()
print(my_week)
```

2. Call the `.clear()` method on `my_week` from above. What happens?

# What questions do you have about variables?

# Lists and Python indexing

Have you ever accidentally downloaded the same file multiple times and seen something like this?

![Computer downloads are an example of zero-based indexing](images/zero-based-index.png)

The first time you downloaded it, there was no number given. But after that, the file was suffixed with the numbers 1, 2, 3, and so on. 

This is an everyday example of *zero-based indexing*. 

We tend to count things from 1... **but Python counts from *zero*.** 

In [36]:
my_list = [7,12,5,10,9]

We would like to pull out the third element of this list.

We can do so using this notation:

```
list[position number]
```
So let's try it:

In [37]:
# Get the third element from my list... right?
my_list[3]

10

### Wrong!

This gets us the *fourth* element...

...so what gives?

This is zero-based indexing at work. What we see as the third element is, to Python, in *position* `2`:


| `0` | `1` | `2` | `3` | `4` |
| --- | --- | --- | --- | --- |
| 7   | 12  | 5   | 10  | 9   |

Let's try again:


In [38]:
my_list[2]

5

Nice work!

![Kip meme](images/kip-yes.gif)
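If you want to double-check a position, the list method `.index()` works in the other direction: give it a value and it returns that value's zero-based position. A minimal sketch using the same list; `last_position` is a name made up for this example.

```python
my_list = [7, 12, 5, 10, 9]

# .index() returns the zero-based position of the first match
print(my_list.index(7))  # 0: the first element
print(my_list.index(5))  # 2: the third element

# The last valid position is always one less than the length
last_position = len(my_list) - 1
print(last_position)           # 4
print(my_list[last_position])  # 9
```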

### Negative indexing

It's also worth noting that you can index starting at the *end* of the list, as well.

The last element will be in position `-1`.

| `0`<br>`-5` | `1`<br>`-4` | `2`<br>`-3` | `3`<br>`-2` | `4`<br>`-1` |
| ----------- | ----------- | ----------- | ----------- | ----------- |
| 7           | 12          | 5           | 10          | 9           |
  

Give it a try!

In [14]:
my_new_list = [6,10,3,9,1]

# Find the next-to-last element in the list 
# using a negative index 

my_new_list[-2]

9
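Every negative position has a matching positive one. The sketch below checks both ends of a list; the name `readings` is invented for illustration.

```python
readings = [6, 10, 3, 9, 1]

# Index -1 is the last element, the same as position len(readings) - 1
print(readings[-1])                 # 1
print(readings[len(readings) - 1])  # 1

# Index -5 reaches all the way back to the first element
print(readings[-5])                 # 6
print(readings[0])                  # 6
```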

## Slicing a list

What if we wanted to index multiple elements of a list at once?

This is called *slicing* and ... of course, it's got a catch!

The basic notation for slicing a list is

`list[starting_element:ending_element]`
 
However, the result is *exclusive* of the ending element. 🙈

Let's take an example.

In [39]:
my_list = [7,12,5,10,9]

# This gives me the 
# first through second elements... right?
my_list[0:1]

[7]

### Wrong!

The ending element is not included in the final results. You get everything *up until* that element.

Weird, right?

![Head scratch](images/confused.gif)

Because our result is *exclusive* of that final element, we'll need:

In [40]:
# This gives me the
# first through second elements
my_list[0:2]

[7, 12]

Let's see this in action a couple more times.

In [41]:
my_list = [7,12,5,10,9]

# First through second elements
print(my_list[0:2])

# Third through fifth elements
print(my_list[2:5])

# Fourth-last through second-last elements
print(my_list[-4:-1])

[7, 12]
[5, 10, 9]
[12, 5, 10]
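One handy consequence of excluding the ending element is that the length of a slice equals the ending position minus the starting position, and slices that share a boundary never overlap. A short sketch, with a list of month names invented for illustration:

```python
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun"]

# The slice holds (ending position - starting position) elements
middle = months[1:4]
print(middle)       # ['Feb', 'Mar', 'Apr']
print(len(middle))  # 3, which is 4 - 1

# Slices that share a boundary fit together with no overlap
early = months[0:2]
late = months[2:6]
print(early + late == months)  # True
```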


## Drill

Practice some more slicing below:

In [None]:
my_list = [7,12,5,10,9]

# Get the first through third elements


# Get the third-last to second-last elements


# Get the second through last elements


## Slicing to/from first/last elements

Remember our notation for slicing a list:

`list[starting_element:ending_element]`

If we leave part of our slice blank, Python will index *all* the remaining elements in the list:

In [42]:
my_list = [7,12,5,10,9]
# Print the second through the last element
print(my_list[1:])

[12, 5, 10, 9]


In [43]:
my_big_list = [1,3,2,5,3,1,8,3,11,4]
# Works the same here
print(my_big_list[1:])

[3, 2, 5, 3, 1, 8, 3, 11, 4]


Likewise, we can get everything from the *beginning* of the list to a certain element by leaving the first part of our slice blank:

In [44]:
# Get everything but the last element
my_list = [7,12,5,10,9]
print(my_list[:-1])

[7, 12, 5, 10]


In [45]:
my_big_list = [1,3,2,5,3,1,8,3,11,4]
# Get everything up to and including the fourth element
print(my_big_list[:4])

[1, 3, 2, 5]


In [46]:
# Yes, this would print the whole list 😎
my_big_list = [1,3,2,5,3,1,8,3,11,4]
print(my_big_list[:])

[1, 3, 2, 5, 3, 1, 8, 3, 11, 4]
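Slicing with both parts blank is more useful than it looks: the result is a new list, so it can serve as a quick copy. A minimal sketch, with names invented for this example:

```python
original = [1, 3, 2, 5]

# A slice with both parts left blank builds a new list with the same elements
copy_of_original = original[:]
copy_of_original.append(99)

print(original)          # [1, 3, 2, 5]
print(copy_of_original)  # [1, 3, 2, 5, 99]
```

Changing the copy leaves the original untouched.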


## DRILL

Practice slicing lists below.

In [28]:
this_list = ["Slicing","works","on","lists","of","strings","identically"]

# Get the third to final elements
print(this_list[2:])


# Get everything up to and including the fourth element
print(this_list[:4])


# Get everything starting with the second-last element
print(this_list[-2:])

['on', 'lists', 'of', 'strings', 'identically']
['Slicing', 'works', 'on', 'lists']
['strings', 'identically']


# What questions do you have about lists?

## Variable management

We've defined quite a few variables in this notebook. 

To see a list of them all, use the command

```
%who
```

In [47]:
%who

my_big_list	 my_boolean	 my_first_list	 my_first_variable	 my_float	 my_int	 my_list	 my_list_here	 my_other_list	 
my_string	 my_variable	 


This is an example of an [IPython magic command](https://ipython.readthedocs.io/en/stable/interactive/magics.html). These commands help with monitoring and managing your code environment.

We should be aware of the variables that we create as they take memory and can bloat our environment.

If we aren't using a variable anymore, it's not a bad idea to delete it. We can do so with `del`.

In [48]:
# Remove the my_big_list variable
del my_big_list

# my_big_list has left the building!
print(my_big_list)

NameError: name 'my_big_list' is not defined
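Two more things `del` can do, sketched with throwaway names invented for this example: it accepts several variables at once, and it can remove a single position from a list.

```python
first_total = 10
second_total = 20
scratch_values = [7, 12, 5]

# Several names can be deleted in one statement
del first_total, second_total

# del also works on a position inside a list
del scratch_values[0]
print(scratch_values)  # [12, 5]
```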

## Restarting the kernel


We can remove *all* assigned variables by restarting the kernel. The kernel is how our notebook communicates with the Python programming language.

When you're having coding difficulties, restarting the kernel is a good first step.

![Restarting the kernel](images/restart-kernel.gif)

Go ahead and restart the kernel in your notebook. But remember, *this will wipe any variables you created in your environment!*

## Lists and data analysis

Lists are a foundational variable type in Python, so it's worth getting comfortable with them.

All that said, lists do not easily handle many common data analysis tasks. Take doubling every value in what we would call a "range" of cells, something we do in spreadsheets all the time:

In [52]:
from IPython.display import display, HTML
display(HTML('<center><iframe width="600" height="400" frameborder="0" scrolling="no" src="https://onedrive.live.com/embed?resid=57D2AB2A84D54C81%21997&authkey=%21AGdlGfKL9x3bed4&em=2&wdAllowInteractivity=True&AllowTyping=True&wdDownloadButton=True&wdInConfigurator=True"></iframe></center>'))

This is not easily done with a list, even for the first range. Multiplying a list by 2 repeats its elements rather than doubling them:

In [53]:
my_list = [1,9,5,3,8]
my_list * 2

[1, 9, 5, 3, 8, 1, 9, 5, 3, 8]
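For contrast, here is one way to double each element in plain Python. It uses a *list comprehension*, a feature this notebook has not introduced, so treat it as a preview rather than something to memorize. The name `doubled` is made up for this example.

```python
my_list = [1, 9, 5, 3, 8]

# Build a new list by doubling each value, one element at a time
doubled = [value * 2 for value in my_list]
print(doubled)  # [2, 18, 10, 6, 16]
```

It works, but it is wordier than dragging a formula down a column.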

For easier data analysis, we will make use of some external packages and modules. 

But before we do that, let's take some time to learn about ... packages and modules.

# Python modules

Out of the box, Python is not yet the analytics powerhouse it can be. First we need to install and load a few *modules*.

### The [Python standard library](https://docs.python.org/3/library/index.html)

Some functions and methods are available as soon as Python starts. Others need to be loaded explicitly.

We can call in *modules*, which are bundles of code allowing us to do different things, like run different functions or methods.

Some modules come included with Python in the Python standard library. 

For example, the `math` module is part of the standard library, so we don't need to install anything more, but we do need to call it into our session. 

We can do this with the `import` statement.

In [55]:
# Import the math module from the Python standard library
import math

We now have access to the `sqrt()` function, but when we use it, we need to tell Python *where* we got it from. We will do that by prefixing `sqrt()` with `math`:

In [56]:
# Take the square root of 100 
# by using the math.sqrt() function:
math.sqrt(100)

10.0
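The same `math.` prefix works for everything else in the module. A brief sketch with a few other members of `math`:

```python
import math

print(math.sqrt(16))    # 4.0
print(math.floor(2.8))  # 2: round down
print(math.ceil(2.2))   # 3: round up
print(math.pi)          # 3.141592653589793
```

Note that `math.pi` is a value rather than a function, so it takes no parentheses.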

## Drill

The `factorial()` function from `math` returns the factorial of a number `X`.

Find the factorial of 10 using this function.

# Installing modules

Python comes with an [impressive number of modules in the standard library](https://docs.python.org/3/library/index.html), but the real power comes from installing "aftermarket" modules developed by the community.

These modules are typically published to the [Python Package Index](https://pypi.org), bundled as packages. A package is a way of bundling modules.

Anyone is free to install and use these packages as they please. It's easy to install them using the `pip` package installer.

From a notebook, we can install a package with the command `!pip install [package name]`.

In [57]:
# Install a package called "pandas"
!pip install pandas



You will use packages all the time, and if you ever have an issue with one, a good place to start (after restarting the kernel!) is checking whether you have it installed, and what version.

You can see all the packages installed in your environment, along with their versions, using `pip freeze`.

In [58]:
pip freeze

alabaster==0.7.12
anaconda-client==1.7.2
Note: you may need to restart the kernel to use updated packages.
anaconda-navigator==1.9.6
anaconda-project==0.8.3
appier==1.18.24
archspec @ file:///home/conda/feedstock_root/build_artifacts/archspec_1596649123309/work
argh==0.26.2
asn1crypto @ file:///C:/ci/asn1crypto_1594339244757/work
astroid @ file:///C:/ci/astroid_1592481955828/work
astropy==4.0.1.post1
atomicwrites==1.4.0
attrs==19.3.0
autopep8 @ file:///tmp/build/80754af9/autopep8_1592412889138/work
Babel==2.8.0
backcall==0.2.0
backports.shutil-get-terminal-size==1.0.0
bcrypt==3.1.7
beautifulsoup4==4.9.1
bitarray @ file:///C:/ci/bitarray_1594753961793/work
bkcharts==0.2
bleach==3.1.5
bokeh @ file:///C:/ci/bokeh_1593179283802/work
boto==2.49.0
Bottleneck==1.3.2
brotlipy==0.7.0
cachetools==4.0.0
certifi==2020.6.20
cffi==1.14.0
chardet==3.0.4
click==7.1.2
cloudpickle @ file:///tmp/build/80754af9/cloudpickle_1594141588948/work
clyent==1.2.2
colorama==0.4.3
comtypes==1.1.7
conda==4.8.4
conda
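Scrolling through a long listing to find one package can be tedious. As an alternative sketch, assuming Python 3.8 or later, the standard library's `importlib.metadata` module can look up a single package's version. The helper name `installed_version` is hypothetical and invented for this example; it is not part of `pip` or the standard library.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version as text, or None if the package is missing."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


# Prints a version string if pandas is installed, otherwise None
print(installed_version("pandas"))

# A made-up package name that should not be installed anywhere
print(installed_version("not-a-real-package-name-xyz"))
```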

# Drill 

One interesting thing about packages is that they can be built on top of, and borrow code from, other packages!

Install the `seaborn` package. This is a visualization package that we'll use later in the class, and it is indeed built on top of another visualization package.

In [None]:
# Install seaborn package

# Questions about modules and packages?