# Complex Data: Dictionaries and NumPy
### Workshop 3 of DASIL's Series on "Introduction to Python"
### Created by Linh Tang, Britney He, Christopher Moppel, Sergio Martelo, Martin Pollack, Nandika Jhunjhunwala, and Anil Alper

# Dictionaries

Like a list, a dictionary is a collection of values. But instead of being identified by a numeric index, each value in a dictionary is stored under a name, called a "key".

Dictionaries are mutable (changeable) and written with curly brackets {} instead of square brackets []. Looking up a key returns its value, much like looking up a list element by its index, but you cannot use a value directly to look up its key. Dictionaries also do not allow duplicate keys. Since Python 3.7, a dictionary keeps its key-value pairs in the order they were inserted, but values are still accessed by key, not by position.

When designing a dictionary, think about what should be the key and what should be the value. The right choice depends on the purpose of the dictionary, in particular on what you will want to look up.

In [None]:
d = {}
print(d, type(d))

{} <class 'dict'>


`buildings` is a dictionary for University of Iowa building name abbreviations. If you look up an abbreviation, it returns its full name.

In [None]:
buildings = {"UCC": "University Capitol Center", "PBB": "Pappajohn Business Building"}
print(buildings, type(buildings))

{'UCC': 'University Capitol Center', 'PBB': 'Pappajohn Business Building'} <class 'dict'>


If you look up a key that exists in the dictionary, you get back its value.

In [None]:
buildings["UCC"]

'University Capitol Center'

In [None]:
buildings["PBB"]

'Pappajohn Business Building'

If the key is not in the dictionary, Python raises a `KeyError`.

In [None]:
buildings["IMU"]

KeyError: 'IMU'
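
If you would rather not stop with an error when a key is missing, two common options are the dictionary's `get()` method and a `try`/`except` block. The sketch below rebuilds the `buildings` dictionary from above so that it runs on its own; the fallback text `"unknown building"` is just a placeholder chosen for this example.

```python
buildings = {"UCC": "University Capitol Center", "PBB": "Pappajohn Business Building"}

# get() returns None instead of raising KeyError when the key is missing
print(buildings.get("IMU"))

# get() also accepts a default value to return for missing keys
print(buildings.get("IMU", "unknown building"))

# Alternatively, catch the KeyError yourself
try:
    name = buildings["IMU"]
except KeyError:
    name = "unknown building"
print(name)
```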

In [None]:
buildings.keys()

dict_keys(['UCC', 'PBB'])

In [None]:
buildings.values()

dict_values(['University Capitol Center', 'Pappajohn Business Building'])
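
The objects returned by `keys()` and `values()` are mainly useful for looping. As a small sketch, the `items()` method gives you each key together with its value, and `list()` turns the keys into an ordinary list if you need to index into them. The variable names `abbreviation`, `full_name`, and `key_list` are chosen only for this example.

```python
buildings = {"UCC": "University Capitol Center", "PBB": "Pappajohn Business Building"}

# Loop over key-value pairs
for abbreviation, full_name in buildings.items():
    print(abbreviation, "->", full_name)

# dict_keys cannot be indexed directly, but a list made from it can
key_list = list(buildings.keys())
print(key_list[0])
```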

In [None]:
buildings["IMU"] = "Iowa Memorial Union"
buildings

{'UCC': 'University Capitol Center',
 'PBB': 'Pappajohn Business Building',
 'IMU': 'Iowa Memorial Union'}
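
Because dictionaries are mutable and cannot hold duplicate keys, assigning to a key that already exists replaces its value rather than adding a second entry. The following sketch uses a made-up dictionary called `hours` (not part of the workshop data) to show this, along with removing an entry using `del`.

```python
# Hypothetical example data: hours worked per day
hours = {"Mon": 8, "Tue": 6}

# Assigning to an existing key overwrites the old value
hours["Mon"] = 4
print(hours["Mon"], len(hours))

# del removes a key and its value
del hours["Tue"]
print(hours)
```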

You can check if a key is in a dictionary using the `in` operator.

In [None]:
"IMU" in buildings

True

In [None]:
"SH" in buildings

False

The length of a dictionary is the number of key-value mappings in the dictionary. 

In [None]:
len(buildings)

3
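
As mentioned at the start, a value cannot be used directly to look up its key. If you do need to go from a value back to a key, one approach is to loop over the pairs and collect the keys that match. This is a minimal sketch using a made-up dictionary, `state_to_capital`; since several keys could share the same value, the result is collected in a list.

```python
# Hypothetical example data
state_to_capital = {"Iowa": "Des Moines", "Illinois": "Springfield"}

# Collect every key whose value matches what we are looking for
matches = []
for state, capital in state_to_capital.items():
    if capital == "Springfield":
        matches.append(state)

print(matches)
```

If you find yourself doing this often, it may be a sign that the keys and values should be swapped.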

## Exercises - Dictionaries

Create a dictionary that represents your shopping list.

You want to buy 5 carrots, 2 onions, and 10 peas.

Think carefully about what your keys should be, and what your values should be. Which thing do you want to use to search?

In [None]:
# Your code here

You need to quickly know where to go in the grocery store. Print out the types of items you need.

In [None]:
# Your code here

What is the largest quantity on your shopping list? You do not need to report which item it belongs to.

In [None]:
# Your code here

# NumPy

To use NumPy, we first need to import the `numpy` package:

In [None]:
import numpy as np

## Arrays

A numpy array is a grid of values, all of the same type. This is different from a list, which can have elements of various types.

The number of dimensions is the rank of the array; the shape of an array is a tuple of integers giving the size of the array along each dimension.

We can initialize numpy arrays from nested Python lists, and access elements using square brackets:

In [None]:
a = np.array([1, 2, 3])  # Create a rank 1 array
print(a[0], a[1], a[2])

a[0] = 5                 # Change an element of the array
print(a[0])

1 2 3
5
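
The "all of the same type" rule has practical consequences. In the sketch below (variable names are chosen just for this example), NumPy converts a mixed list to a single type when building the array, and a decimal value assigned into an integer array loses its fractional part.

```python
import numpy as np

# A list may mix integers and floats; the array converts everything to float
mixed_list = [1, 2.5, 3]
mixed_arr = np.array(mixed_list)
print(mixed_arr.dtype)

# Assigning a float into an integer array truncates it to an integer
int_arr = np.array([1, 2, 3])
int_arr[0] = 2.7
print(int_arr[0])
```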


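The `shape` attribute gives the shape of the array as a tuple. Note that it is an attribute, not a method, so it is written without parentheses. For a 2-dimensional array the shape is (number of rows, number of columns); a 1-dimensional array has a single entry, as shown in the following example.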

In [None]:
a.shape

(3,)

We can also create a 2-dimensional NumPy array.

In [None]:
b = np.array([[1,2,3],[4,5,6]])   # Create a rank 2 array
print(b)

[[1 2 3]
 [4 5 6]]


In [None]:
print(b.shape)
print(b[0, 0], b[0, 1], b[1, 0])

(2, 3)
1 2 4
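
The rank and shape described earlier can be read directly from an array. As a short sketch (with example names `vec` and `mat`), the `ndim` attribute holds the rank, `shape` holds the size along each dimension, and `size` holds the total number of elements.

```python
import numpy as np

vec = np.array([1, 2, 3])
mat = np.array([[1, 2, 3], [4, 5, 6]])

print(vec.ndim, vec.shape, vec.size)
print(mat.ndim, mat.shape, mat.size)

# For a rank 2 array, the shape unpacks into rows and columns
n_rows, n_cols = mat.shape
print(n_rows, n_cols)
```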


Numpy also provides many functions to create arrays:

In [None]:
a = np.zeros((2,2))  # Create an array of all zeros
print(a)

[[0. 0.]
 [0. 0.]]


In [None]:
b = np.ones((1,2))   # Create an array of all ones
print(b)

[[1. 1.]]


In [None]:
c = np.full((2,2), 7) # Create a constant array
print(c)

[[7 7]
 [7 7]]


In [None]:
d = np.eye(2)        # Create a 2x2 identity matrix
print(d)

[[1. 0.]
 [0. 1.]]


In [None]:
e = np.random.random((2,2)) # Create an array filled with random values
print(e)

[[0.8690054  0.57244319]
 [0.29647245 0.81464494]]


In [None]:
f = np.arange(0, 10, 3) # Create an array from 0 to 10 (exclusive) by increments of 3
print(f)

[0 3 6 9]


## Exercises - Arrays

Create an identity matrix of dimension 4-by-4

In [None]:
# Your answer here

Create a NumPy array containing the numbers from 0 to 100 in steps of 2: 0, 2, 4, ...

In [None]:
# Your answer here

## Array indexing

Numpy offers several ways to index into arrays.

Slicing: Similar to Python lists, numpy arrays can be sliced. Since arrays may be multidimensional, you must specify a slice for each dimension of the array:

In [None]:
# Create the following rank 2 array with shape (3, 4)
# [[ 1  2  3  4]
#  [ 5  6  7  8]
#  [ 9 10 11 12]]
a = np.array([[1,2,3,4], [5,6,7,8], [9,10,11,12]])
print(a)

# Use slicing to pull out the subarray consisting of the first 2 rows
# and columns 1 and 2; b is the following array of shape (2, 2):
# [[2 3]
#  [6 7]]
b = a[:2, 1:3]
print(b)

[[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]]
[[2 3]
 [6 7]]


A slice of an array is a view into the same data (sometimes loosely called a "shallow copy"), not a separate copy, so modifying the slice also changes the original array.

In [None]:
print(a[0, 1])
b[0, 0] = 77    # b[0, 0] is the same piece of data as a[0, 1]
print(a[0, 1])

2
77


To instead make a copy of an array that you can change independently, use the `np.copy()` function.

In [None]:
c = np.copy(b)
print(c)

print(b[1, 1])
c[1, 1] = 99
print(b[1, 1])

[[2 3]
 [6 7]]
7
7
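
If you are unsure whether two arrays are a view and its original or two independent copies, `np.shares_memory()` can tell you. The sketch below uses its own names (`grid`, `view`, `independent`) so it does not disturb the arrays defined above.

```python
import numpy as np

grid = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

view = grid[:2, 1:3]          # a slice: shares data with grid
independent = np.copy(view)   # a copy: has its own data

print(np.shares_memory(grid, view))
print(np.shares_memory(grid, independent))

view[0, 0] = 77               # changes grid[0, 1] as well
independent[0, 0] = -1        # leaves grid untouched
print(grid[0, 1])
```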


You can also mix integer indexing with slice indexing. Each integer index removes one dimension, so the result is an array of lower rank than the original array.

In [None]:
# Create the following rank 2 array with shape (3, 4)
a = np.array([[1,2,3,4], [5,6,7,8], [9,10,11,12]])
print(a)

print(a[1, :])

[[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]]
[5 6 7 8]


If instead we use slices or lists for all dimensions, the NumPy array will maintain its rank.

In [None]:
row_r1 = a[1, :]    # Rank 1 view of the second row of a  
row_r2 = a[1:2, :]  # Rank 2 view of the second row of a
row_r3 = a[[1], :]  # Rank 2 view of the second row of a
print(row_r1, row_r1.shape)
print(row_r2, row_r2.shape)
print(row_r3, row_r3.shape)

[5 6 7 8] (4,)
[[5 6 7 8]] (1, 4)
[[5 6 7 8]] (1, 4)


We can make the same distinction when accessing columns of an array:

In [None]:
col_r1 = a[:, 1]
col_r2 = a[:, 1:2]
print(col_r1, col_r1.shape)
print(col_r2, col_r2.shape)

[ 2  6 10] (3,)
[[ 2]
 [ 6]
 [10]] (3, 1)


Boolean array indexing: Boolean array indexing lets you pick out arbitrary elements of an array. Frequently this type of indexing is used to select the elements of an array that satisfy some condition. Here is an example:

In [None]:
import numpy as np

a = np.array([[1,2], [3, 4], [5, 6]])

bool_idx = (a > 2)  # Find the elements of a that are bigger than 2;
                    # this returns a numpy array of Booleans of the same
                    # shape as a, where each slot of bool_idx tells
                    # whether that element of a is > 2.

print(bool_idx)

[[False False]
 [ True  True]
 [ True  True]]


In [None]:
# We use boolean array indexing to construct a rank 1 array
# consisting of the elements of a corresponding to the True values
# of bool_idx
print(a[bool_idx])

# We can do all of the above in a single concise statement:
print(a[a > 2])

[3 4 5 6]
[3 4 5 6]
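
Conditions can also be combined. As a sketch (the names `arr` and `mask` are chosen for this example), `&` means "and" and `|` means "or" for Boolean arrays. Each condition needs its own parentheses. Because `True` counts as 1, summing a Boolean array counts how many elements satisfy the condition.

```python
import numpy as np

arr = np.array([[1, 2], [3, 4], [5, 6]])

# Elements that are greater than 2 AND even
mask = (arr > 2) & (arr % 2 == 0)
print(arr[mask])

# Elements that are less than 2 OR greater than 5
print(arr[(arr < 2) | (arr > 5)])

# Count how many elements are greater than 2
print(np.sum(arr > 2))
```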


## Exercises - Array Indexing

Given the following NumPy array 'a', extract the second column of 'a'. 

In [None]:
a = np.array([[1,2,3,4], [5,6,7,8], [9,10,11,12]])

In [None]:
# Your answer here

Get the element at the intersection of the first row and third column of 'a' defined above

In [None]:
# Your answer here

Replace all elements of an array which have values greater than 50 with the value 15.50

In [None]:
n_arr = np.array([75.42436315, 42.48558583, 60.32924763])

In [None]:
# Your answer here

Replace all negative values of an array with 0

In [None]:
ini_array1 = np.array([1, 2, -3, 4, -5, -6])

In [None]:
# Your answer here

## Datatypes

Every numpy array is a grid of elements of the same type. Numpy provides a large set of numeric datatypes that you can use to construct arrays. Numpy tries to guess a datatype when you create an array, but functions that construct arrays usually also include an optional argument to explicitly specify the datatype. Here is an example:

In [None]:
x = np.array([1, 2])  # Let numpy choose the datatype
y = np.array([1.0, 2.0])  # Let numpy choose the datatype
z = np.array([1, 2], dtype=np.int64)  # Force a particular datatype

print(x.dtype, y.dtype, z.dtype)

int64 float64 int64
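
An existing array can also be converted to another datatype with its `astype()` method, which returns a new array. A brief sketch, with variable names chosen for this example; note that converting floats to integers drops the fractional part rather than rounding.

```python
import numpy as np

ints = np.array([1, 2, 3])
floats = ints.astype(np.float64)   # new array; ints is unchanged
print(floats, floats.dtype)

# Converting float to integer truncates toward zero
truncated = np.array([1.9, -1.9]).astype(np.int64)
print(truncated)
```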


## Array math

Basic mathematical functions operate elementwise on arrays, and are available both as operator overloads and as functions in the numpy module:

In [None]:
x = np.array([[1,2],[3,4]], dtype=np.float64)
y = np.array([[5,6],[7,8]], dtype=np.float64)

# Elementwise sum; both produce the same array
print(x + y)
print(np.add(x, y))

[[ 6.  8.]
 [10. 12.]]
[[ 6.  8.]
 [10. 12.]]


In [None]:
# Elementwise difference; both produce the same array
print(x - y)
print(np.subtract(x, y))

[[-4. -4.]
 [-4. -4.]]
[[-4. -4.]
 [-4. -4.]]


In [None]:
# Elementwise product; both produce the same array
print(x * y)
print(np.multiply(x, y))

[[ 5. 12.]
 [21. 32.]]
[[ 5. 12.]
 [21. 32.]]


In [None]:
# Elementwise division; both produce the array
# [[ 0.2         0.33333333]
#  [ 0.42857143  0.5       ]]
print(x / y)
print(np.divide(x, y))

[[0.2        0.33333333]
 [0.42857143 0.5       ]]
[[0.2        0.33333333]
 [0.42857143 0.5       ]]


In [None]:
# Elementwise square root; produces the array
# [[ 1.          1.41421356]
#  [ 1.73205081  2.        ]]
print(np.sqrt(x))

[[1.         1.41421356]
 [1.73205081 2.        ]]


You can also apply the same operation to every element of an array at once. For example, `x + 1` returns a new array with 1 added to each element of `x` (`x` itself is not modified):

In [None]:
print(x + 1)

[[2. 3.]
 [4. 5.]]
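
To make the difference between computing a new array and changing an existing one concrete, here is a small sketch using an example array `v`. An expression like `v + 1` leaves `v` alone, while the augmented assignment `v += 1` updates `v` in place.

```python
import numpy as np

v = np.array([[1.0, 2.0], [3.0, 4.0]])

shifted = v + 1   # new array; v is unchanged
print(v)

v += 1            # modifies v itself
print(v)
```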


Numpy provides many useful functions for performing computations on single arrays; one of the most useful is `sum`:

In [None]:
x = np.array([[1,2],[3,4]])

print(np.sum(x))  # Compute sum of all elements; prints "10"
print(np.sum(x, axis=0))  # Compute sum of each column; prints "[4 6]"
print(np.sum(x, axis=1))  # Compute sum of each row; prints "[3 7]"

10
[4 6]
[3 7]
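
The `axis` argument is easier to see on an array that is not square. In this sketch, `m` is an example array with 2 rows and 3 columns: `axis=0` collapses the rows and gives one result per column, while `axis=1` collapses the columns and gives one result per row.

```python
import numpy as np

m = np.array([[1, 2, 3],
              [10, 20, 30]])

col_totals = np.sum(m, axis=0)   # one total per column, shape (3,)
row_totals = np.sum(m, axis=1)   # one total per row, shape (2,)

print(col_totals, col_totals.shape)
print(row_totals, row_totals.shape)
```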


Similarly, we can find row and column means with the `np.mean()` function.

In [None]:
print(np.mean(x, axis=0)) # column means
print(np.mean(x, axis=1)) # row means

[2. 3.]
[1.5 3.5]

## Exercises - Array math

Given two NumPy arrays of the same shape, 'a' and 'b', compute a new NumPy array in which every element is the sum of the corresponding elements of 'a' and 'b'.

In [None]:
a = np.array([[1,2,3],
              [4,5,6]])

b = np.array([[10,11,12],
              [13,14,15]])

In [None]:
# Your answer here

Multiply each element of `a` by 3

In [None]:
# Your answer here

Find the two row means of `a + b`, the array you get by adding `a` and `b` element-wise.

In [None]:
# Your answer here

## Concatenating

We can also build a NumPy array by putting together two arrays that we have already built. This is called "concatenation". This is especially helpful when we want to work with data from two or more sources at once.

Let's say we want to add some new observations to our data. This means we want to add more rows, so we need to stack two arrays vertically, one on top of the other. The `vstack()` function does this. All we have to do is pass it a collection of arrays to stack.

In [None]:
obs1 = np.array([[1, 1, 1]])
obs2 = np.array([[2, 2, 2]])

np.vstack([obs1, obs2])

array([[1, 1, 1],
       [2, 2, 2]])

If we want to add new columns to our data, we can use `hstack()` to stack two arrays horizontally, side by side. Again, we just pass this function a collection of the arrays we want to stack.

In [None]:
col1 = np.array([[1], [1], [1]])
col2 = np.array([[2], [2], [2]])

np.hstack([col1, col2])

array([[1, 2],
       [1, 2],
       [1, 2]])
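
Both kinds of stacking can also be written with the more general `np.concatenate()` function, where the `axis` argument picks the direction. The sketch below uses example arrays (`top`, `bottom`, `left`, `right`) and compares the results with `vstack()` and `hstack()`. The arrays must agree in every dimension except the one being joined.

```python
import numpy as np

top = np.array([[1, 1, 1]])
bottom = np.array([[2, 2, 2]])
rows = np.concatenate([top, bottom], axis=0)   # join along rows, like vstack
print(rows)

left = np.array([[1], [1]])
right = np.array([[2], [2]])
cols = np.concatenate([left, right], axis=1)   # join along columns, like hstack
print(cols)

print(np.array_equal(rows, np.vstack([top, bottom])))
print(np.array_equal(cols, np.hstack([left, right])))
```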

## Exercises - Concatenating

Stack the following two NumPy arrays horizontally. Horizontal stacking requires the arrays to have the same first dimension (the same number of rows for 2D arrays).

In [None]:
a1 = np.array([[1,2,3],
               [4,5,6]])

a2 = np.array([[7,8,9],
               [10,11,12]])

In [None]:
# Your answer here

Stack the following two NumPy arrays vertically. Vertical stacking requires the arrays to have the same last dimension (the same number of columns for 2D arrays).

In [None]:
a3 = np.array([[1,2],
               [3,4],
               [5,6]])

a4 = np.array([[7,8],
               [9,10],
               [11,12]])

In [None]:
# Your answer here