# A Jupyter notebook is a browser-based environment that integrates:
- A kernel (Python)
- Text
- Executable code
- Plots and images
- Rendered mathematical equations

## Cell

The basic unit of a Jupyter notebook is a `cell`. A `cell` can contain any of the above elements.

In a notebook, to run a cell of code, hit `Shift-Enter`. This executes the cell and puts the cursor in the next cell below, or makes a new one if you are at the end. Alternatively, you can use:
    
- `Alt-Enter` to force the creation of a new cell unconditionally (useful when inserting new content in the middle of an existing notebook).
- `Control-Enter` to execute the cell and keep the cursor in the same cell (useful for quick experimentation with snippets that you don't need to keep permanently).

## Hello World

In [2]:
print("Hello World!")

Hello World!


In [3]:
# lines that begin with a # are treated as comment lines and not executed

# print("This line is not printed")

print("This line is printed")

This line is printed


## Create a variable

In [11]:
g = 3.0 * 2.0 


## Print out the value of the variable

In [9]:
print(g)

6.0


## or even easier:

In [10]:
g

6.0

# Datatypes

In computer programming, a data type is a classification identifying one of various types that data
can have. 

The most common data types we will see in this class are:

* **Integers** (`int`): Integers are whole numbers, whether negative, zero, or positive: ... -3, -2, -1, 0, 1, 2, 3, 4, ...
    
* **Floating Point** (`float`): Floating-point values are numbers with a decimal point: 1.2, 34.98, -67.23354435, ...

  - Floating point values can also be expressed in scientific notation: 1e3 = 1000
  
  
* **Booleans** (`bool`): Boolean types can only have one of two values: `True` or `False`. In many languages 0 is considered `False`, and any other value is considered `True`.

* **Strings** (`str`): Strings can be composed of one or more characters: 'a', 'spam', 'spam spam eggs and spam'. Quotes, either single (') or double ("), are used to specify a string. For example, '12' refers to the string, not the integer.

## Collections of Data Types

* **Scalar**: A single value of any data type.

* **List**: A collection of values. May be mixed data types: [1, 2.34, 'Spam', True], including lists of lists: [1, [1,2,3], [3,4]]

* **Array**: A collection of values. Must be same data type. [1,2,3,4] or [1.2, 4.5, 2.6] or [True, False, False] or ['Spam', 'Eggs', 'Spam']

* **Matrix**: A multi-dimensional array: [[1,2], [3,4]] (an array of arrays).
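
As a rough sketch of how these collections look in Python (the variable names below are illustrative and not part of the original notebook): a Python list is written with square brackets and may mix types, while a NumPy array stores a single data type.

```python
import numpy as np

# A scalar: a single value
scalar = 3.5

# A list: square brackets, mixed data types allowed, lists may be nested
mixed_list = [1, 2.34, "Spam", True]
nested_list = [1, [1, 2, 3], [3, 4]]

# An array: every element has the same data type
int_array = np.array([1, 2, 3, 4])
float_array = np.array([1.2, 4.5, 2.6])

# A matrix: a two-dimensional array (an array of arrays)
matrix = np.array([[1, 2], [3, 4]])

print(type(mixed_list), len(nested_list))
print(int_array.dtype.kind, float_array.dtype.kind)   # 'i' = integer, 'f' = float
print(matrix.shape)                                   # (2, 2)
```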

In [14]:
a = 1
b = 2.3
c = 2.3e4
d = True
e = "Spam"

In [15]:
type(a), type(b), type(c), type(d), type(e)

(int, float, float, bool, str)

In [16]:
a + b, type(a + b)

(3.3, float)

In [17]:
c + d, type(c + d)    # True = 1

(23001.0, float)

In [18]:
a + e

TypeError: unsupported operand type(s) for +: 'int' and 'str'

In [19]:
str(a) + e

'1Spam'
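
The same idea works in the other direction. The short sketch below (variable names `label`, `number`, and `ratio` are illustrative) converts explicitly between strings and numbers, and shows that booleans behave like 1 and 0 in arithmetic, as in the `c + d` cell above.

```python
a = 1
e = "Spam"

# Convert explicitly when mixing numbers and strings
label = str(a) + e          # number -> string, then concatenate
number = int("12") + a      # string -> integer, then add
ratio = float("2.5") * 2    # string -> float, then multiply

print(label, number, ratio)      # 1Spam 13 5.0

# Booleans behave like the integers 1 and 0 in arithmetic
print(True + True, False * 10)   # 2 0
```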

# NumPy (Numerical Python) is the fundamental package for scientific computing with Python.

### Load the numpy library:

In [3]:
import numpy as np

#### pi and e are  built-in constants:

In [21]:
np.pi, np.e

(3.141592653589793, 2.718281828459045)

## Here is a link to all [NumPy math functions](https://docs.scipy.org/doc/numpy/reference/routines.math.html).

# Arrays

* Each element of the array has a **Value**
* The *position* of each **Value** is called its **Index**

## Our basic unit will be the NumPy array

In [5]:
np.random.seed(42)                 # set the seed - everyone gets the same random numbers
x = np.random.randint(1,10,20)     # 20 random ints from 1 to 9 (the upper bound, 10, is excluded)
x

array([7, 4, 8, 5, 7, 3, 7, 8, 5, 4, 8, 8, 3, 6, 5, 2, 8, 6, 2, 5])

## Indexing

In [26]:
x[0]    # The Value at Index = 0

7

In [27]:
x[-1]    # The last Value in the array x

5

## Slices

`x[start:stop:step]`
 
- `start` is the first Index that you want [default = first element]
- `stop`  is the first Index that you **do not** want [default = the end of the array, so the last element is included]
- `step`  defines size of `step` and whether you are moving forwards (positive) or backwards (negative) [default = 1]

In [31]:
x

array([7, 4, 8, 5, 7, 3, 7, 8, 5, 4, 8, 8, 3, 6, 5, 2, 8, 6, 2, 5])

In [32]:
x[0:4]           # first 4 items

array([7, 4, 8, 5])

In [33]:
x[:4]            # same

array([7, 4, 8, 5])

In [34]:
x[0:4:2]         # first four items, step = 2

array([7, 8])

In [35]:
x[3::-1]         # first four items backwards, step = -1

array([5, 8, 4, 7])

In [36]:
x[::-1]          # Reverse the array x

array([5, 2, 6, 8, 2, 5, 6, 3, 8, 8, 4, 5, 8, 7, 3, 7, 5, 8, 4, 7])

In [37]:
print(x[-5:])    # last 5 elements of the array x

[2 8 6 2 5]


## There are lots of different `methods` that can be applied to a NumPy array

In [7]:
x.size                   # Number of elements in x

20

In [39]:
x.mean()                 # Average of the elements in x

5.5499999999999998

In [40]:
x.sum()                  # Total of the elements in x

111

In [41]:
x[-5:].sum()              # Total of last 5 elements in x

23

In [42]:
x.cumsum()                # Cumulative sum

array([  7,  11,  19,  24,  31,  34,  41,  49,  54,  58,  66,  74,  77,
        83,  88,  90,  98, 104, 106, 111])

In [43]:
x.cumsum()/x.sum()        # Cumulative percentage

array([ 0.06306306,  0.0990991 ,  0.17117117,  0.21621622,  0.27927928,
        0.30630631,  0.36936937,  0.44144144,  0.48648649,  0.52252252,
        0.59459459,  0.66666667,  0.69369369,  0.74774775,  0.79279279,
        0.81081081,  0.88288288,  0.93693694,  0.95495495,  1.        ])

In [44]:
x.flatten                 # without parentheses you get the method itself, not its result

<function ndarray.flatten>

## Help about a function:

In [45]:
?x.min

Docstring:
a.min(axis=None, out=None, keepdims=False)

Return the minimum along a given axis.

Refer to `numpy.amin` for full documentation.

See Also
--------
numpy.amin : equivalent function
Type:      builtin_function_or_method


## NumPy math works over an entire array:

In [46]:
y = x * 2
y

array([14,  8, 16, 10, 14,  6, 14, 16, 10,  8, 16, 16,  6, 12, 10,  4, 16,
       12,  4, 10])

In [47]:
sin(x)     # fails: you need to use NumPy's math functions

NameError: name 'sin' is not defined

In [48]:
np.sin(x)

array([ 0.6569866 , -0.7568025 ,  0.98935825, -0.95892427,  0.6569866 ,
        0.14112001,  0.6569866 ,  0.98935825, -0.95892427, -0.7568025 ,
        0.98935825,  0.98935825,  0.14112001, -0.2794155 , -0.95892427,
        0.90929743,  0.98935825, -0.2794155 ,  0.90929743, -0.95892427])
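
For comparison, here is a small sketch using the standard library's `math.sin`, which is not used in the original notebook. It accepts a single number but not a whole array, which is one reason to prefer the `np.` versions when working with arrays.

```python
import math
import numpy as np

x = np.array([7, 4, 8])

print(math.sin(7))        # works on a single number
print(np.sin(x))          # works element by element on the whole array

try:
    math.sin(x)           # the standard-library version cannot take an array
except TypeError as err:
    print("TypeError:", err)
```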

## Masking - The key to fast programs

In [6]:
mask1 = np.where(x>5)
x, mask1

(array([7, 4, 8, 5, 7, 3, 7, 8, 5, 4, 8, 8, 3, 6, 5, 2, 8, 6, 2, 5]),
 (array([ 0,  2,  4,  6,  7, 10, 11, 13, 16, 17]),))

In [50]:
x[mask1], y[mask1]

(array([7, 8, 7, 7, 8, 8, 8, 6, 8, 6]),
 array([14, 16, 14, 14, 16, 16, 16, 12, 16, 12]))

In [51]:
mask2 = np.where((x>3) & (x<7))
x[mask2]

array([4, 5, 5, 4, 6, 5, 6, 5])
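
The comparison inside `np.where` already produces a boolean array, and that array can be used as a mask on its own. The sketch below rebuilds `x` from the values shown above so that it runs by itself; `bool_mask` and `index_mask` are illustrative names.

```python
import numpy as np

x = np.array([7, 4, 8, 5, 7, 3, 7, 8, 5, 4, 8, 8, 3, 6, 5, 2, 8, 6, 2, 5])

bool_mask = (x > 3) & (x < 7)      # array of True/False, same length as x
index_mask = np.where(bool_mask)   # tuple holding the indices where the mask is True

print(bool_mask[:5])               # [False  True False  True False]
print(x[bool_mask])                # [4 5 5 4 6 5 6 5]
print(np.array_equal(x[bool_mask], x[index_mask]))   # True
print(bool_mask.sum())             # 8 (True counts as 1)
```

Both forms select the same values. The boolean form is handy for counting matches, while the index form tells you where the matches are.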

## Fancy masking

In [52]:
mask3 = np.where(x >= 8)
x[mask3]

array([8, 8, 8, 8, 8])

In [53]:
# Set all values of x that match mask3 to 0

x[mask3] = 0
x

array([7, 4, 0, 5, 7, 3, 7, 0, 5, 4, 0, 0, 3, 6, 5, 2, 0, 6, 2, 5])

In [55]:
mask4 = np.where(x != 0)
mask4

(array([ 0,  1,  3,  4,  5,  6,  8,  9, 12, 13, 14, 15, 17, 18, 19]),)

In [56]:
# Add 100 to every value of x that matches mask4:

x[mask4] += 100
x

array([107, 104,   0, 105, 107, 103, 107,   0, 105, 104,   0,   0, 103,
       106, 105, 102,   0, 106, 102, 105])

## Sorting

In [57]:
np.random.seed(13)                 # set the seed - everyone gets the same random numbers
z = np.random.randint(1,10,20)     # 20 random ints from 1 to 9 (the upper bound, 10, is excluded)
z

array([3, 1, 1, 7, 3, 5, 4, 5, 3, 7, 6, 5, 3, 1, 4, 6, 4, 7, 6, 2])

In [58]:
np.sort(z)

array([1, 1, 1, 2, 3, 3, 3, 3, 4, 4, 4, 5, 5, 5, 6, 6, 6, 7, 7, 7])

In [59]:
np.sort(z)[0:4]

array([1, 1, 1, 2])

In [60]:
# Returns the indices that would sort an array

np.argsort(z)

array([ 1,  2, 13, 19, 12,  8,  0,  4, 14, 16,  6,  7,  5, 11, 18, 10, 15,
        3, 17,  9])

In [61]:
z, z[np.argsort(z)]

(array([3, 1, 1, 7, 3, 5, 4, 5, 3, 7, 6, 5, 3, 1, 4, 6, 4, 7, 6, 2]),
 array([1, 1, 1, 2, 3, 3, 3, 3, 4, 4, 4, 5, 5, 5, 6, 6, 6, 7, 7, 7]))

In [62]:
maskS = np.argsort(z)

z, z[maskS]

(array([3, 1, 1, 7, 3, 5, 4, 5, 3, 7, 6, 5, 3, 1, 4, 6, 4, 7, 6, 2]),
 array([1, 1, 1, 2, 3, 3, 3, 3, 4, 4, 4, 5, 5, 5, 6, 6, 6, 7, 7, 7]))
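
The index array from `np.argsort` is mainly useful for reordering a second array in step with the first, or for sorting from high to low. A minimal sketch, assuming a hypothetical companion array `labels` that lines up with `z` element by element:

```python
import numpy as np

z = np.array([3, 1, 1, 7, 3, 5, 4, 5, 3, 7, 6, 5, 3, 1, 4, 6, 4, 7, 6, 2])
labels = np.arange(z.size) * 10    # hypothetical second array, aligned with z

order = np.argsort(z)              # indices that sort z from low to high
descending = z[order][::-1]        # reverse the sorted values: high to low

print(descending[:4])              # [7 7 7 6]
print(labels[order][:3])           # labels belonging to the three smallest values of z
```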

# Control Flow

Like all computer languages, Python supports the standard types of control flow, including:

* IF statements
* FOR loops

In [63]:
xx = -1

if xx > 0:
    print("This number is positive")
else:
    print("This number is NOT positive")

This number is NOT positive


In [64]:
xx = 0

if xx > 0:
    print("This number is positive")
elif xx == 0:
    print("This number is zero")
else:
    print("This number is negative")

This number is zero


## `For loops` are different in Python.

You do not need to specify the beginning and end values of the loop. The loop simply visits each element of the collection in turn.

In [65]:
z

array([3, 1, 1, 7, 3, 5, 4, 5, 3, 7, 6, 5, 3, 1, 4, 6, 4, 7, 6, 2])

In [66]:
for value in z:
    print(value)

3
1
1
7
3
5
4
5
3
7
6
5
3
1
4
6
4
7
6
2


In [67]:
for idx,val in enumerate(z):
    print(idx,val)

0 3
1 1
2 1
3 7
4 3
5 5
6 4
7 5
8 3
9 7
10 6
11 5
12 3
13 1
14 4
15 6
16 4
17 7
18 6
19 2
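
For readers coming from languages with index-based loops, the sketch below puts the two styles side by side on a shortened copy of `z`. The explicit `range` version is shown only for contrast; `evens` is an illustrative name.

```python
import numpy as np

z = np.array([3, 1, 1, 7])

# Explicit index loop, in the style of C or Fortran
for i in range(0, len(z)):
    print(i, z[i])

# Equivalent Python idiom: no start or end value needed
for idx, val in enumerate(z):
    print(idx, val)

# range(start, stop, step) still helps when you need the numbers themselves
evens = list(range(0, 10, 2))
print(evens)   # [0, 2, 4, 6, 8]
```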


In [71]:
for idx,val in enumerate(z):
    if (val > 5):
        z[idx] = 0

In [72]:
for idx,val in enumerate(z):
    print(idx,val)

0 3
1 1
2 1
3 0
4 3
5 5
6 4
7 5
8 3
9 0
10 0
11 5
12 3
13 1
14 4
15 0
16 4
17 0
18 0
19 2


## Loops are slow in Python. Do not use them if you do not have to!

In [73]:
np.random.seed(42)
BigZ = np.random.random(10000)    # 10,000 value array
BigZ[:10]

array([ 0.37454012,  0.95071431,  0.73199394,  0.59865848,  0.15601864,
        0.15599452,  0.05808361,  0.86617615,  0.60111501,  0.70807258])

In [74]:
# This is slow!

for Idx,Val in enumerate(BigZ):
    if (Val > 0.5):
        BigZ[Idx] = 0

BigZ[:10]

array([ 0.37454012,  0.        ,  0.        ,  0.        ,  0.15601864,
        0.15599452,  0.05808361,  0.        ,  0.        ,  0.        ])

In [75]:
%%timeit

for Idx,Val in enumerate(BigZ):
    if (Val > 0.5):
        BigZ[Idx] = 0

2.68 ms ± 478 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


In [76]:
# Masks are MUCH faster

mask = np.where(BigZ>0.5)
BigZ[mask] = 0

BigZ[:10]

array([ 0.37454012,  0.        ,  0.        ,  0.        ,  0.15601864,
        0.15599452,  0.05808361,  0.        ,  0.        ,  0.        ])

In [77]:
%%timeit -o

mask = np.where(BigZ>0.5)
BigZ[mask] = 0

13.1 µs ± 5.24 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)


<TimeitResult : 13.1 µs ± 5.24 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)>
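
One caveat with the timings above: `BigZ` is modified in place, so after the first pass no values above 0.5 remain for later passes to change. The sketch below is one way to compare the two approaches outside a notebook, using the standard-library `timeit` module and working on a fresh copy each time. The function names `zero_with_loop` and `zero_with_mask` are hypothetical, and the timings will differ from machine to machine.

```python
import timeit
import numpy as np

np.random.seed(42)
values = np.random.random(10000)    # 10,000 value array

def zero_with_loop(arr):
    out = arr.copy()                # copy so every run starts from the same data
    for idx, val in enumerate(out):
        if val > 0.5:
            out[idx] = 0
    return out

def zero_with_mask(arr):
    out = arr.copy()
    out[out > 0.5] = 0              # boolean mask, no Python-level loop
    return out

loop_time = timeit.timeit(lambda: zero_with_loop(values), number=20)
mask_time = timeit.timeit(lambda: zero_with_mask(values), number=20)

# Both approaches should give identical results
print(np.array_equal(zero_with_loop(values), zero_with_mask(values)))
print(f"loop: {loop_time:.4f} s, mask: {mask_time:.4f} s for 20 runs each")
```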

# Functions

In computer science, a `function` (also called a `procedure`, `method`, `subroutine`, or `routine`) is a portion
of code within a larger program that performs a specific task and is relatively independent of the
remaining code. The big advantage of a `function` is that it breaks a program into smaller, easier
to understand pieces. It also makes debugging easier. A `function` can also be reused in another
program.

The basic idea of a `function` is that it will take various values, do something with them, and `return` a result. The variables in a `function` are local. That means that they do not affect anything outside the `function`.
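
A small sketch of what "local" means in practice (`double_it` and `zero_first` are hypothetical names used only for illustration). Reassigning a name inside a function leaves the caller's variable alone, but changing an array in place inside a function is visible to the caller, so the rule has one important exception.

```python
import numpy as np

def double_it(value):
    result = value * 2      # 'result' exists only inside the function
    value = 0               # rebinding the local name leaves the caller's variable alone
    return result

def zero_first(arr):
    arr[0] = 0              # in-place change: this DOES modify the caller's array

value = 5
doubled = double_it(value)
print(value, doubled)       # 5 10

data = np.array([1, 2, 3])
zero_first(data)
print(data)                 # [0 2 3]
```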

Below is a simple example of a `function` that evaluates the equation:

$f(x,y) = x^2 \sin(y)$

In the example the name of the `function` is **find_f** (you can name `functions` whatever you want). The `function` **find_f** takes two arguments `x` and `y`, and returns the value of the equation to the main program. In the main program a variable named `value_f` is assigned the value returned by **find_f**. Notice that in the main program the `function` **find_f** is called using the arguments `array_x` and `array_y`. Since the variables in the `function` are local, you do not have to name them `x` and `y` in the main program.

In [97]:
def find_f(x,y):
    
    result = (x ** 2) * np.sin(y)           # assign the variable result the value of the function
    return result                           # return the value of the function to the main program

In [81]:
np.random.seed(42)

array_x = np.random.rand(10) * 10
array_y = np.random.rand(10) * 2.0 * np.pi

In [82]:
array_x, array_y

(array([ 3.74540119,  9.50714306,  7.31993942,  5.98658484,  1.5601864 ,
         1.5599452 ,  0.58083612,  8.66176146,  6.01115012,  7.08072578]),
 array([ 0.12933619,  6.09412333,  5.23039137,  1.33416598,  1.14243996,
         1.15236452,  1.91161039,  3.2971419 ,  2.71399059,  1.82984665]))

In [83]:
value_f = find_f(array_x,array_y)

value_f

array([  1.80927791, -16.98689063, -46.55215549,  34.8404827 ,
         2.21425259,   2.22349046,   0.31796601, -11.62325069,
        14.98437621,  48.46380134])

### The results of one function can be used as the input to another function

In [84]:
def find_g(z):
    
    result = z / np.e
    return result

In [85]:
find_g(value_f)

array([  0.66559615,  -6.24912783, -17.12558095,  12.81709731,
         0.81457801,   0.81797643,   0.11697316,  -4.27595497,
         5.51244395,  17.82883616])

In [86]:
find_g(find_f(array_x,array_y))

array([  0.66559615,  -6.24912783, -17.12558095,  12.81709731,
         0.81457801,   0.81797643,   0.11697316,  -4.27595497,
         5.51244395,  17.82883616])

# Creating Arrays

## NumPy has a wide variety of ways to create arrays: [Array creation routines](https://docs.scipy.org/doc/numpy-1.13.0/reference/routines.array-creation.html)

In [87]:
# a new array filled with zeros

array_0 = np.zeros(10)

array_0

array([ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.])

In [91]:
# a new array filled with ones

array_1 = np.ones(10)

array_1

array([ 1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.])

In [89]:
# a new array filled with evenly spaced values within a given interval

array_2 = np.arange(10,20)

array_2

array([10, 11, 12, 13, 14, 15, 16, 17, 18, 19])

In [94]:
# a new array filled with evenly spaced numbers over a specified interval (start, stop, num)

array_3 = np.linspace(10,20,5)

array_3

array([ 10. ,  12.5,  15. ,  17.5,  20. ])

In [96]:
# a new array filled with evenly spaced numbers over a log scale. (start, stop, num, base=...)
# base must be passed by keyword: the fourth positional argument is endpoint, not base

array_4 = np.logspace(1,2,5,base=10)

array_4

array([  10.        ,   17.7827941 ,   31.6227766 ,   56.23413252,  100.        ])
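
To see how `np.logspace` relates to `np.linspace`, the sketch below builds the same array by hand: the exponents are evenly spaced, and the base is raised to each of them. The names `exponents`, `by_hand`, and `from_logspace` are illustrative.

```python
import numpy as np

exponents = np.linspace(1, 2, 5)          # 1.0, 1.25, 1.5, 1.75, 2.0
by_hand = 10.0 ** exponents               # raise the base to each exponent
from_logspace = np.logspace(1, 2, 5, base=10)

print(np.allclose(by_hand, from_logspace))   # True
print(np.logspace(0, 3, 4, base=2))          # [1. 2. 4. 8.]
```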