# What are Jupyter Notebooks?

Jupyter is a web-based interactive development environment that supports multiple programming languages. It is commonly used as an "IDE" (Integrated Development Environment) for the Python programming language, especially when doing data science. The interactive environment that Jupyter provides enables researchers to create reproducible analyses and formulate a story from data within a single document.

Here is an example of a completed Jupyter Notebook from a scientific investigation that includes extensive code, mathematics, graphics, and text.

Concretely speaking, a Jupyter notebook is a file with file extension ".ipynb". You can download and share these files to share your work.

# How does it work?

Jupyter is an interface that runs in a web browser. The code that you write in a Jupyter notebook does not execute in the browser. Instead, Jupyter connects to a Python session, which may be running on your computer or somewhere in the cloud. In this course, you will be using Jupyter notebooks that interface with Python sessions running on Coursera's cloud.

# The structure of a Jupyter notebook

A Jupyter notebook is a series of "cells". Each cell in a Jupyter notebook is either a "code cell" or a "Markdown cell". In the toolbar you will see a dropdown that allows you to change the type of a cell. For this series of courses, the code will always be Python code. The contents of "Markdown" cells are entered as plain text, optionally formatted using Markdown as discussed further below. Double-click a cell to edit its contents. Then press shift-enter to cause Markdown cells to be rendered, and code cells to be executed.

After executing a cell, anything that is explicitly printed by your code will be displayed below the cell. If the last line of code in a cell is an expression, its value is also displayed, unless the line ends with a semicolon or the value is None. If your code generates any graphics, they will be displayed below the cell.

# Program flow

In a Jupyter notebook you will probably organize your code into multiple cells. These cells execute in the same underlying Python session and can access the results of each other's computations. Note that it is possible to run the cells in any order (by pressing shift-enter on the cells in some sequence). However, doing this can be quite dangerous, as the results may depend on the order in which the cells are run. It is strongly advised to always run the cells in order, starting from the first cell, unless you are sure that there are no dependencies between the cells. The "Cell" menu on the toolbar contains an option to "Run All" cells, which is the safest way to produce a reproducible result.

The number next to each code cell gives the order in which the cells were run. A cell that has not yet been run will not be assigned a number, and a cell that is currently running will have an asterisk (*) instead of a number.
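
The order dependence described above can be illustrated outside a notebook. The sketch below is a hypothetical simulation, not how Jupyter is implemented: the `session` dictionary stands in for the shared Python session, and each function stands in for one code cell.

```python
# Hypothetical stand-ins: `session` plays the role of the shared Python
# session, and each function plays the role of one notebook cell.
session = {}

def cell_1():
    session["total"] = 10

def cell_2():
    session["total"] = session["total"] * 2

# Running the cells once, in order, gives a predictable result.
cell_1()
cell_2()
in_order_result = session["total"]
print(in_order_result)  # 20

# Re-running only the second cell changes the result, because it reads
# the value left behind by the earlier run.
cell_2()
rerun_result = session["total"]
print(rerun_result)  # 40
```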

# What is Markdown?

Markdown is a "markup language" that uses plain text formatting syntax. This means that we can modify the formatting of our text with the use of plain text format specifiers. Almost all plain text is also valid Markdown, and will appear exactly as you have typed it. You can learn Markdown gradually, beginning with plain text, and then learning some of the formatting syntax as it arises in your work.

Some examples of formatting that can be accomplished in Markdown are:

* Headers
* Text modifications such as italics and bold
* Ordered and unordered lists
* Links
* Tables
* Images
* Etc.

Next we will showcase some examples of how this formatting is done. For each of the cells below, double-click on the cell to reveal the underlying Markdown, and then press ctrl+enter to return to the formatted view.

Headers:

* `# H1`
* `## H2`
* `### H3`
* `#### H4`
* `##### H5`
* `###### H6`

Text modifications:

Emphasis, aka italics, with *asterisks* or _underscores_.

Strong emphasis, aka bold, with **asterisks** or __underscores__.

Combined emphasis with **asterisks and _underscores_**.

Strikethrough uses two tildes. ~~Scratch this.~~

Lists:

1. First ordered list item
2. Another item
    * Unordered sub-list.
1. Actual numbers don't matter, just that it's a number
    1. Ordered sub-list
4. And another item.

* Unordered list can use asterisks
- Or minuses
+ Or pluses

Hyperlinks:

http://www.umich.edu

<http://www.umich.edu>

[The University of Michigan's Homepage](http://www.umich.edu)

To see more examples of Markdown syntax and features such as tables, images, etc. head to the following link: Markdown Reference

# Simple examples of code cells

In [2]:
# This is python code
print("This is a python code cell")

This is a python code cell


In [3]:
# All lines of code are executed, but only the results of the final line are displayed
1+2
1+3

4

In [4]:
# To display the result of a line that is not the final line, use print.
print(1+2)
1+3

3


4

In [5]:
### Global variables initialized in one cell are visible in any cell executed subsequently

x = 1738

print("x has been set to " + str(x))

x has been set to 1738


In [7]:
### Print x

print(x)

1738


# Command vs. Edit Mode & Shortcuts

Jupyter notebooks have two modes: Edit Mode and Command Mode. The current mode is easily identified by the color of the left border of the cell.

Blue = Command Mode.

Green = Edit Mode.

Press **esc** to enter Command Mode, and press **enter** to return to Edit Mode.

In Command Mode, keyboard commands execute notebook functions, for example changing the format of a Markdown cell or adding line numbers.

Let's toggle line numbers while in Command Mode by pressing **L**.

#### Additional Shortcuts

There are a lot of shortcuts that can be used to improve productivity while using Jupyter Notebooks.

Here is a list:

# How do you install Jupyter Notebooks?

Note: Coursera provides embedded Jupyter notebooks within the course, so installing Jupyter is not required unless you wish to explore it further on your own computer.

Official Installation Guide: https://jupyter.readthedocs.io/en/latest/install.html

Jupyter recommends using Anaconda, which is a bundle of software compatible with Windows, macOS, and Linux systems.

Anaconda Download: https://www.anaconda.com/download/#macos

# Data Types in Python

In computing, a data type refers to the way in which a value is stored in the computer's memory, and to the types of calculations that can be performed on it. The following data types can be used in base Python:

* **boolean**
* **integer**
* **float**
* **string**
* **list**
* **None**
* complex
* object
* set
* dictionary

Here we will only focus on the bolded data types. In this notebook, we will be using data types from base Python, as well as some data types from the NumPy and pandas libraries. Therefore we import the NumPy library next.

In [8]:
import numpy as np

Let's connect these base Python data types to the variable types that we learned about in the Variable Types video. Recall that a "variable type" in this context refers to the type of information encoded in the variable, and informs the statistical analyses that can be performed on it. While there is a relationship between data types in computing and variable types in statistics, they are distinct ideas, and there is no one-to-one mapping between them.

# The quantitative (numerical) variable type

Quantitative variables usually represent an amount that can be measured in the real world. It is possible to do arithmetic when working with quantitative variables. As a result, statistical summaries like the mean (average value) make sense. Sometimes, it is useful to distinguish two types of quantitative variables:

* Discrete: a variable that can only take on a limited range of values, e.g. only positive integers
* Continuous: a variable that can in theory represent any real number, or a quantitative value measured to arbitrarily high precision

It is often (but not always) the case that discrete data are represented by the computer with integers, and continuous data are represented by the computer with float values.

In base Python a single "literal" number is stored as an integer or as a "float" value based on whether it is expressed with a decimal point. We can see this in the following examples:

In [9]:
type(4)

int

In [10]:
type(4.)

float

In [11]:
type(0)

int

In [12]:
type(-3)

int

### Floats

Floating point values are the main numeric data type used to represent quantitative data, or any numeric value that is not a whole number.

In [13]:
3/5

0.6

Recall that ** represents exponentiation in Python.

In [14]:
6*10**(-1)

0.6000000000000001
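
The trailing digit in this result comes from binary floating point rounding: 0.1 cannot be stored exactly, so `6*10**(-1)` and `3/5` differ very slightly. The sketch below shows one way such values can be compared, using `math.isclose` from the standard library. The variable names are chosen only for illustration.

```python
import math

a = 6 * 10 ** (-1)   # computed via 0.1, which is not exactly representable
b = 3 / 5

# Exact comparison fails because the two values differ in the last digit.
exactly_equal = (a == b)
print(exactly_equal)  # False

# math.isclose compares within a small relative tolerance instead.
nearly_equal = math.isclose(a, b)
print(nearly_equal)   # True
```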

In [15]:
type(3/5)

float

The integer division operator // divides two numbers and rounds the result down to a whole number. For positive integers this amounts to dropping the remainder.

In [16]:
print(3//5)
type(3//5)

0


int
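
With negative operands, `//` rounds toward negative infinity rather than toward zero. The short sketch below illustrates this, together with the built-in `divmod`, which returns the quotient and the remainder as a pair. The variable names are illustrative.

```python
positive_quotient = 7 // 2
print(positive_quotient)   # 3

# Rounds down (toward negative infinity), not toward zero.
negative_quotient = -7 // 2
print(negative_quotient)   # -4

# divmod returns the quotient and the remainder together.
quotient, remainder = divmod(7, 2)
print(quotient, remainder)  # 3 1

# With a float operand the result is still a whole number, stored as a float.
float_quotient = 7.0 // 2
print(float_quotient)       # 3.0
```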

In [17]:
type(np.pi)

float

In [18]:
type(4.0)

float

We can do something similar using NumPy. Note that here we obtain a "NumPy float", which is distinct from a base Python float. For most purposes we can treat these two varieties of floats as being equivalent.

In [19]:
numbers = np.r_[2, 3, 4, 5]
print(numbers.mean())
print(type(numbers.mean()))

3.5
<class 'numpy.float64'>


### Categorical (or qualitative) variable types

In statistics, there are two main variants of a categorical variable:

* *Nominal* variables have no ordering, e.g. what country was a person born in, or whether a person is of age 65 years or older.

* *Ordinal* variables have an ordering, e.g. how many times has a person been involved in a traffic accident, or how strongly does a person support a policy (e.g. strongly oppose, neutral, strongly support).

In base Python, Integer (int), Boolean (bool), and String (str) data types are often used to represent nominal values.

Ordinal values may be represented by numbers, but it is important to remember that these numbers are codes that do not contain any quantitative information.
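
As a sketch of this idea, the hypothetical mapping below assigns integer codes to the support levels mentioned above. The names `support_codes` and `responses`, and the data, are made up for illustration.

```python
# Hypothetical coding of an ordinal variable: the integers record order only.
support_codes = {"strongly oppose": 0, "neutral": 1, "strongly support": 2}
responses = ["neutral", "strongly support", "strongly oppose", "neutral"]

coded = [support_codes[r] for r in responses]
print(coded)  # [1, 2, 0, 1]

# Sorting by code respects the ordering of the categories.
ordered = sorted(responses, key=support_codes.get)
print(ordered)
# ['strongly oppose', 'neutral', 'neutral', 'strongly support']
```

Because the codes are labels, the gap between 0 and 1 need not mean the same thing as the gap between 1 and 2, so a summary such as the mean of `coded` is hard to interpret.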

### Boolean

A Boolean variable has two possible values: "True" and "False" (in Python the capitalization of these terms is important). A "Boolean expression" is an expression involving comparison operators (<, <=, >, >=, ==, !=) that evaluates to a Boolean value.

In [20]:
# Boolean
type(True)

bool

In [21]:
# Print the result of two Boolean expressions
print(6 < 5)
print(5 < 6)

# Print the type of a Boolean expression's result
print(type(6 < 5))

False
True
<class 'bool'>


Boolean expressions are often used in "if blocks" to control program flow.

In [22]:
if 6 < 5:
    print("Yes!")
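
Because `6 < 5` is False, the cell above prints nothing. A small sketch with an `else` branch makes both outcomes visible. The function name `compare` is hypothetical.

```python
def compare(a, b):
    """Return a message saying whether a is less than b."""
    if a < b:
        return "Yes!"
    else:
        return "No!"

print(compare(6, 5))  # No!
print(compare(5, 6))  # Yes!
```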

Square brackets [...] create a literal list. The list here contains only values of Boolean type. See below for further discussion of "None" and "is".

In [23]:
myList = [True, 6<5, 1==3, None is None]
print(myList)
for element in myList:
    print(type(element))

[True, False, False, True]
<class 'bool'>
<class 'bool'>
<class 'bool'>
<class 'bool'>


Python converts Boolean values to integers when doing arithmetic: False is converted to 0 and True is converted to 1.

In [24]:
print(sum(myList)/len(myList))
type(sum(myList)/len(myList))

0.5


float
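
The same conversion makes it easy to count, or take the proportion of, values that satisfy a condition. A minimal sketch with made-up data (`ages` is a hypothetical list):

```python
ages = [23, 67, 45, 71, 65, 30]

# One Boolean per person: is this person aged 65 or older?
is_senior = [age >= 65 for age in ages]

# True counts as 1 and False as 0, so sum() counts the True values.
n_senior = sum(is_senior)
print(n_senior)     # 3

# The mean of the Booleans is the proportion of True values.
prop_senior = sum(is_senior) / len(is_senior)
print(prop_senior)  # 0.5
```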

### String

A string is a single "text" value of arbitrary length. Technically, text in Python 3 is represented using a standard called Unicode. Characters from nearly every human language can be a part of a Unicode string.

Single or double quotes are equivalent in Python and can be used to create string literals.

In [25]:
type("This sentence makes sense")

str

In [26]:
type('This sentence makes sense')

str

Note that the back-tick character cannot be used to create a string literal.

In [27]:
# This is not allowed
# x = `invalid`

Triple quotes can be used when you want a literal string to span multiple lines. Try this with an ordinary pair of single or double quotes and you will see that it fails.

In [28]:
print("""This sentence makes 
sense""")

This sentence makes 
sense
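
An ordinary quoted string can still contain a line break if it is written with the escape sequence `\n`. The sketch below, with illustrative variable names, compares the two spellings.

```python
# A triple-quoted literal may span several lines in the source code.
multi_line = """This sentence makes
sense"""

# The same text in an ordinary string, using \n for the line break.
escaped = "This sentence makes\nsense"

same_text = (multi_line == escaped)
print(same_text)  # True
```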


A Python expression in quotation marks is a string and is not evaluated as code by the Python interpreter.

In [29]:
type("np.pi")

str

It does not make sense to take the average of string values, so an error results.

In [30]:
x = np.asarray(['dog', 'koala', 'goose'])
# This is not allowed:
# x.mean()
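
To see the kind of error involved without halting a notebook, the failure can be caught with `try`/`except`. The sketch below uses the built-in `sum` on a plain list rather than the NumPy array above, so the error message may differ from the one NumPy gives. The variable names are illustrative.

```python
words = ['dog', 'koala', 'goose']

try:
    average = sum(words) / len(words)
except TypeError as err:
    # sum() starts from the integer 0 and cannot add a string to it.
    average = None
    print("Cannot average strings:", err)

# A numeric summary of the strings, such as their lengths, can be averaged.
lengths = [len(w) for w in words]
print(lengths)  # [3, 5, 5]
mean_length = sum(lengths) / len(lengths)
```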

### Nonetype

None is a special value that is a placeholder representing "no meaningful value". It is often returned by functions that do not explicitly return a value, or that cannot return a meaningful value for certain inputs.

In [31]:
type(None)

NoneType
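
Two common places where None appears are sketched below: a function with no `return` statement, and the dictionary `get` method when a key is missing. The function and dictionary names are hypothetical.

```python
def greet(name):
    # This function prints a message but has no return statement.
    print("Hello, " + name)

result = greet("Ada")
print(result is None)   # True

# dict.get returns None by default when the key is not present.
capitals = {"France": "Paris"}
missing = capitals.get("Atlantis")
print(missing is None)  # True
```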

None can be compared using "is" or "==", but conventionally "is" is preferred.

In [32]:
None is None
None == None

True

None cannot be used in arithmetic:

In [33]:
noneList = [None]*5
# This is not allowed:
# sum(noneList)/len(noneList)
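
When None marks missing entries in otherwise numeric data, one common workaround is to drop the None entries before doing arithmetic. A minimal sketch with hypothetical data:

```python
measurements = [2.5, None, 3.5, None, 6.0]

# Keep only the entries that are not None.
observed = [m for m in measurements if m is not None]
print(observed)       # [2.5, 3.5, 6.0]

mean_observed = sum(observed) / len(observed)
print(mean_observed)  # 4.0
```

Whether dropping missing values is appropriate depends on the analysis; this only shows the mechanics.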

### Lists

A list can hold values (possibly of different types) in sequence.

In [35]:
myList = [1, 1.1, "This is a string", None]
for element in myList:
    print(type(element))

<class 'int'>
<class 'float'>
<class 'str'>
<class 'NoneType'>


As we have seen, arithmetic operations can only be used with numeric values:

In [36]:
# This is not allowed:
# sum(myList)/len(myList)

In [37]:
myList = [1, 2, 3]
for element in myList:
    print(type(element))
sum(myList)/len(myList) # note that this outputs a float

<class 'int'>
<class 'int'>
<class 'int'>


2.0

Elements of lists and vectors can be accessed by position, noting that Python always counts from zero:

In [38]:
myList = ['third', 'first', 'medium', 'small', 'large']
myList[0]

'third'
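
Beyond a single position, lists also support negative indices, which count from the end, and slices, which select a range of positions. The sketch below uses a list with the same contents under the illustrative name `sizes`.

```python
sizes = ['third', 'first', 'medium', 'small', 'large']

first_item = sizes[0]      # position 0 is the first element
last_item = sizes[-1]      # negative indices count from the end
middle_items = sizes[1:3]  # positions 1 and 2; the end position is excluded

print(first_item)    # third
print(last_item)     # large
print(middle_items)  # ['first', 'medium']

# The valid positions are 0 through 4, so position 5 raises an IndexError.
try:
    sizes[5]
except IndexError as err:
    out_of_range_message = str(err)
    print(out_of_range_message)  # list index out of range
```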

You can invoke certain "methods" on a list, which may compute a result using the list, or change the contents of the list.

In [39]:
myList.count('medium')

1

In [40]:
myList.sort()
myList

['first', 'large', 'medium', 'small', 'third']
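
Note that `sort` changes the list in place and returns None. When the original order should be kept, the built-in `sorted` function returns a new sorted list instead. A short sketch, using the illustrative name `sizes`:

```python
sizes = ['third', 'first', 'medium', 'small', 'large']

# sorted() returns a new list and leaves the original unchanged.
sorted_copy = sorted(sizes)
print(sorted_copy)   # ['first', 'large', 'medium', 'small', 'third']
print(sizes[0])      # third

# list.sort() reorders the list itself and returns None.
return_value = sizes.sort()
print(return_value)  # None
print(sizes[0])      # first
```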

There are more data types available when using libraries such as pandas and NumPy, which we will introduce to you as we use them.