In [6]:
# Run the following code to print multiple outputs from a cell
get_ipython().ast_node_interactivity = 'all'
# (Don't worry about what this code means; it just helps for display purposes.)

# Getting to know Jupyter notebooks

Jupyter Notebooks help you do data analysis that can be read and replicated by others.

There are different types of cells:

In [2]:
# This is a code cell. You would type your code here.
# By the way, any text in a code cell that follows a hash symbol (#) is ignored by Python.

# this is a markdown cell.  This line is a header.
## This is a sub header

This is just regular text in a markdown cell.

Most of what you'll be doing is in the code cells.

To add a cell, you can click the + symbol above (in this pane, not the files pane...that adds a new file) -- this inserts a new cell below the current cell. 

You can also use the keyboard shortcut "a" to insert a cell above the current cell and "b" to insert below. To do this, click to the left of the cell where you want to insert and a blue bar will appear. Don't click the blue bar itself unless you want to collapse the cell. Instead, click the white space directly to the left of the cell and type "a" or "b" and a new cell will appear. Try inserting a code cell above this block of text.

## Storing data with variables

Whenever you want to store a value for later use in your program, you need to assign it to a variable.

In [2]:
pigs = 36
cows = 5

animals = pigs + cows

To view the current value of an existing variable, you can simply type the name of the variable:

In [3]:
animals

41

## Assignment statements
This is the first bit of coding you'll learn in this class -- **the "assignment statement."** It takes the following form:

`variable_name = [value of variable]`

When Python sees an assignment statement, it first evaluates what's on the right side of the "=" sign and then saves it to the variable name listed on the left side. Before running the next code block, think about what the final value of x is.

In [4]:
x = 5
x = x + 1
x

6

Variable names can be any sequence of letters, digits, and underscores ( _ ), but they must start with a letter or an underscore, and they cannot be one of Python's reserved words (such as `for` or `if`).
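The naming rule above can be checked directly in Python. The following is a minimal sketch: `str.isidentifier()` and `keyword.iskeyword()` are standard Python, while the helper `is_usable_name` and the sample names are made up for illustration.

```python
import keyword

def is_usable_name(name):
    """Hypothetical helper: True if `name` could be used as a variable name."""
    # isidentifier() checks the letters/digits/underscore rule;
    # iskeyword() rules out reserved words such as "for".
    return name.isidentifier() and not keyword.iskeyword(name)

# Sample names chosen for illustration only
for candidate in ["pigs", "_total", "cows2", "2cows", "farm-animals", "for"]:
    print(candidate, is_usable_name(candidate))
```

The first three names should pass; `2cows` starts with a digit, `farm-animals` contains a hyphen, and `for` is a reserved word.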

Case matters -- x isn't the same as X. Hence if you try to retrieve the value of X instead, you will get an error:

In [5]:
X

NameError: name 'X' is not defined

Python is case-sensitive for all names (variables, functions, modules, and so on), not just for variables.

## Functions
Obviously, we can (and will) do a lot more than simple math. For example, we will often use functions. 

To illustrate, let's use rounding. Remember that π is 3 when rounded to 0 decimal places and 3.142 when rounded to 3 decimal places.

In Python, the function to round is, cleverly, `round(x)`, where x is the value that you want to round. Now you try it: in the next code cell, round π to 0 decimal places and assign the result to a variable named z1. Use 3.14159 as your value for π:

In [8]:
z1 = round(3.14159)
z1

3

Now, use the `round()` function to round 7.1, 23.156, and 192.9834 to whole numbers:

In [9]:
round(7.1)
round(23.156)
round(192.9834)

7

23

193

How do you figure out what a function like `round()` does?

In [10]:
help(round)

Help on built-in function round in module builtins:

round(number, ndigits=None)
    Round a number to a given precision in decimal digits.
    
    The return value is an integer if ndigits is omitted or None.  Otherwise
    the return value has the same type as the number.  ndigits may be negative.



You can also use the Jupyter Help menu up above.
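The help text for `round()` makes two claims that are easy to verify: the result is an integer when `ndigits` is omitted, and `ndigits` may be negative. Here is a small sketch of both; the variable names are chosen for illustration only.

```python
whole = round(3.14159)            # ndigits omitted: the result is an int
tenths = round(3.14159, 1)        # ndigits given: the result stays a float
hundreds = round(1234.5678, -2)   # negative ndigits rounds to the left of the decimal point

print(whole, type(whole).__name__)
print(tenths, type(tenths).__name__)
print(hundreds, type(hundreds).__name__)
```

A negative `ndigits` of -2 rounds to the nearest hundred, so 1234.5678 becomes 1200.0.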

## Function Parameters
In this example, `round()` is the function and 3.14159 is the value passed for its `number` parameter. You can specify parameters by position or by name:

In [7]:
round(3.14159)
round(number = 3.14159)

3

3

The first example specified the parameter by position (it was the first parameter); the second specified it by name.

Specifying parameters becomes more complicated when there are more of them. Expanding this example, `round()` actually has more than one parameter. One parameter is for the number that you want to round; the other is for the number of decimal places.

In [12]:
round(3.14159, 0)
round(3.14159, 1)
round(3.14159, 2)
round(number = 3.14159, ndigits = 2)
round(ndigits = 3, number = 3.14159) 
# notice that position doesn't matter when you specify the parameter names

3.0

3.1

3.14

3.14

3.142
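Positional and named parameters can also be mixed in a single call. The sketch below reuses the same `round()` call; the variable names are only for illustration.

```python
by_position = round(3.14159, 2)                 # both values given by position
by_name = round(number=3.14159, ndigits=2)      # both values given by name
mixed = round(3.14159, ndigits=2)               # first by position, second by name

print(by_position, by_name, mixed)
```

All three calls give the same result. When mixing the two styles, the positional values must come first; writing a named parameter before a positional one is a syntax error.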

## Extending Python
Let's use Python to find the square root of a number. Remember that 9 x 9 = 81 so the square root of 81 is 9. 

The function for square root in Python is `sqrt()`. Try finding the square root of 81 in the next code cell:

In [1]:
sqrt(81)

NameError: name 'sqrt' is not defined

You should have gotten an error. Why? The `sqrt()` function is not loaded automatically; it lives in the `math` module. You need to `import` that module first (similar to loading packages in R) and then call the function as `math.sqrt()`:

In [8]:
import math
math.sqrt(81)

9.0
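As an aside, Python has a few equivalent ways to write an import. The sketch below shows three common forms side by side; the alias `m` is an arbitrary choice for illustration, not a convention used elsewhere in this notebook.

```python
import math              # import the whole module; call as math.sqrt()
from math import sqrt    # import one function; call as sqrt()
import math as m         # import under a shorter alias; call as m.sqrt()

full_name = math.sqrt(81)
bare_name = sqrt(81)
alias_name = m.sqrt(81)

print(full_name, bare_name, alias_name)
```

All three calls run the same function, so they return the same value. Which form to use is mostly a matter of readability.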

Let's try another function, this time from the `scipy` package.

Suppose a value is drawn at random from a normal distribution with a mean of 0 and a standard deviation of 1. What is the probability that it will be less than -1? We can answer this with the `cdf()` function from `scipy`. The function is located in the `stats` submodule, so the import statement looks a bit different:

In [9]:
from scipy.stats import norm
norm.cdf(-1)

0.15865525393145707

Great, but what if the mean wasn't 0 or the standard deviation wasn't 1? Use `help` to see what the function parameters for `cdf()` are and figure out the probability that a random value will be less than -1, given a normal distribution where the mean is 0 and the standard deviation is 2. (Hint: `scipy` calls the mean `loc` and the standard deviation `scale`.)

In [16]:
norm.cdf(x = -1, loc = 0, scale = 2)
norm.cdf(-1, 0, 2)

0.3085375387259869

0.3085375387259869
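One way to sanity-check this result is to standardize by hand: subtract the mean, divide by the standard deviation, and look up the resulting z-score in a standard normal distribution. The sketch below uses `NormalDist` from Python's built-in `statistics` module (Python 3.8 or later) rather than `scipy`, which is a substitution made here so the check is independent; the variable names are illustrative.

```python
from statistics import NormalDist

x, mean, sd = -1, 0, 2

# Probability computed directly from a normal distribution with this mean and sd
direct = NormalDist(mu=mean, sigma=sd).cdf(x)

# Same probability via the z-score and the standard normal (mean 0, sd 1)
z = (x - mean) / sd
standardized = NormalDist().cdf(z)

print(z)
print(round(direct, 4), round(standardized, 4))
```

Both approaches should agree with the `scipy` result above, about 0.3085.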

*By the way, once you've imported a module in a notebook session, you don't need to import it again until the kernel restarts.*

Most of the analysis we do will require modules that are not automatically loaded. Therefore, you should get into the habit of importing the modules you need before writing your analysis script.

## Why use notebooks?
Now that you've been introduced to variables and functions, it's just a matter of running all the right functions in the right order. Right?

Well...
* We don't want to have to type things over and over
* We want to be able to fix mistakes, even if we don't realize the mistake at the moment we make it
* We want to be able to get back to work after interruptions
* We want others to see what we've done

Notebooks and scripts let us do all of this. Working this way:
* Saves the process, not the data
   * Raw data + script = analysis
* Is reproducible and transparent
   * Document what you've done and why
   * Others can repeat your analysis
* Is correctable
   * Helps recover from inevitable mistakes
* Saves effort
   * Autocomplete, hotkeys
   
In this class, everything must be in a notebook file. You need to be able to send me everything I need to recreate your analysis.

## Practice
* What is 89.12345 rounded to 3 decimal places?
* What is the square root of 39.45 rounded to 2 decimal places?
* What is the probability that a random variable will be less than -2 if selected from a normal distribution with a mean of -1 and a standard deviation of 2?

In [17]:
round(89.12345, 3)

89.123

In [18]:
round(math.sqrt(39.45), 2)

6.28

In [19]:
norm.cdf(-2, -1, 2)

0.3085375387259869