Homework 3: Arrays


In this assignment, we will study arrays. Along the way, we will practice writing functions and using packages.

The questions guide you step by step. Please post any questions to Slack.


### Rubric

Question | Points
--- | ---
Question 1.1 | 1
Question 1.2 | 1
Question 1.3 | 2
Question 2.1 | 0
Question 2.2 | 1
Question 2.3 | 1
Question 3.1 | 1
Question 3.2 | 1
Total | 8

### 0. Load Packages

We will use the `numpy` package for operations on arrays of numbers.

```python
import numpy as np
```

```python
# TEST

import sys

assert "numpy" in sys.modules and "np" in locals()
```

Many companies employ data analysts to understand customer habits. Instacart is a company that operates grocery delivery services in the United States and Canada. Its marketing groups want to encourage purchases in future orders based on records of purchases in past orders, and its data analytics groups collect and analyze data on customer habits.

| order_id | product_id | add_to_cart_order | reordered |
|:---:|:---:|:---:|:---:|
| 1 | 49302 | 1 | 1 |
| 1 | 11109 | 2 | 1 |
| 1 | 10246 | 3 | 0 |
| 1 | 49683 | 4 | 0 |
| 1 | 43633 | 5 | 1 |
| 1 | 13176 | 6 | 0 |
| 1 | 47209 | 7 | 0 |
| 1 | 22035 | 8 | 1 |

Each order has a record of purchased products, one product per row. The `reordered` column indicates whether the product also appeared in a past order (1) or not (0). Data analysts at Instacart can look for patterns of recurring purchases in a table that combines the orders of different customers.

<center>
<img src="table.PNG" width="500"/>
</center>
    
In this table, each row is a customer and each column is an item in the company's inventory. An entry is 1 if the customer purchased the item and 0 otherwise. Note that many entries have the value 0; we can think of these entries as blanks.

Data with many blank entries is called **sparse**. Data analysts can skip the blank entries to reduce computing time and save storage.
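To make "sparse" concrete, the short sketch below measures how many entries of a small purchase table are blank. The table values are made up for illustration (they are not Instacart data); `np.count_nonzero` is the NumPy function that counts the non-zero entries.

```python
import numpy as np

# Made-up 0/1 purchase table: rows are customers, columns are items
purchases = np.array([
    [1, 0, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 0, 1],
])

# Count the purchases (non-zero entries) and the total number of entries
nonzero = np.count_nonzero(purchases)
sparsity = 1 - nonzero / purchases.size  # fraction of blank (0) entries

print(nonzero, purchases.size, sparsity)  # 3 15 0.8
```

Here 80% of the entries are blanks, so storing only the 3 purchases would skip most of the table.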

### 1. Reshaping Arrays

Sometimes we need to convert an array with two axes to an array with one axis. 

#### Question 1.1

Write a function called `flatten`:

- Input: a `numpy` array with two dimensions
- Output: a `numpy` array with one dimension containing the entries of the input  

For example, if the input is `np.array([[1,2], [3,4]])`, then the output would be `np.array([1,2,3,4])`. 

Try using nested `for` loops to add the entries to a list: iterate over the row and column indices of the entries.

```python
def flatten(input_array):
    output_array = []

    number_rows, number_columns = input_array.shape

    # Visit rows top to bottom and, within each row, columns left to right
    for i in range(number_rows):
        for j in range(number_columns):
            output_array.append(input_array[i, j])

    return np.array(output_array)
```

```python
# TEST

input_array = np.array([[1,2],[3,4]])
output_array = flatten(input_array)

assert output_array.shape == (4,)
```
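For comparison, NumPy already provides ways to flatten a two-dimensional array. A minimal sketch using the built-in `flatten`, `ravel`, and `reshape(-1)` methods:

```python
import numpy as np

input_array = np.array([[1, 2], [3, 4]])

# Three built-in ways to get a one-dimensional array in row-major order
a = input_array.flatten()    # always returns a new copy
b = input_array.ravel()      # returns a view of the data when it can
c = input_array.reshape(-1)  # -1 asks NumPy to infer the length

print(a, b, c)  # [1 2 3 4] [1 2 3 4] [1 2 3 4]
```

Because all three read the entries row by row by default, a check such as `np.array_equal(flatten(x), x.ravel())` is one way to test the loop-based `flatten` on other inputs.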


#### Question 1.2

Write a function called `flatten_jagged`:

- Input: a jagged `numpy` array (a `numpy` array with `dtype=object` whose rows are lists of different lengths)
- Output: a `numpy` array with one dimension containing the entries of the input  

For example, if the input is `np.array([[1,2,3], [4]], dtype=object)`, then the output would be `np.array([1,2,3,4])`.

Try using a `for` loop to add the entries to a list: iterate over the row indices, and for each row, iterate over its entries.

```python
def flatten_jagged(input_array):
    output_array = []

    number_rows = input_array.shape[0]

    # Rows can have different lengths, so ask each row for its own length
    for i in range(number_rows):
        for j in range(len(input_array[i])):
            output_array.append(input_array[i][j])

    return np.array(output_array)
```

```python
# TEST

input_array = np.array([[1,2,3],[4]], dtype = object)
output_array = flatten_jagged(input_array)

assert output_array.shape == (4,)
```
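Jagged arrays need `dtype=object`: in NumPy 1.24 and later, `np.array([[1,2,3],[4]])` without it raises a `ValueError`. The sketch below builds the jagged array from Question 1.2 and, as an alternative to the nested loop, joins its rows with `np.concatenate`:

```python
import numpy as np

# With dtype=object, NumPy stores each row as a Python list,
# so the array has one axis whose length is the number of rows
jagged = np.array([[1, 2, 3], [4]], dtype=object)
print(jagged.shape)  # (2,)

# Convert each row to an array and join the rows end to end
flat = np.concatenate([np.asarray(row) for row in jagged])
print(flat)  # [1 2 3 4]
```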


<ol start=2>
<img src="diagram.PNG" alt="drawing" width="200" style="float:right"/>

<br>
<br>

#### Question 1.3
    
Modify the function `flatten` from Question 1.1: 

- Add a parameter called `major`. 
  * Make the default value of `major` be `"row"`
- Rewrite the loop so that
  * If `major` is `"row"`, then the array is flattened in row major order
  * If `major` is `"column"`, then the array is flattened in column major order 
 
Call the function `flatten_v2`.
</ol>

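```python
def flatten_v2(input_array, major="row"):
    output_array = []

    number_rows, number_columns = input_array.shape

    if major == "row":
        # Row-major order: reuse flatten from Question 1.1
        return flatten(input_array)
    elif major == "column":
        # Column-major order: walk down each column before moving right
        for j in range(number_columns):
            for i in range(number_rows):
                output_array.append(input_array[i, j])
    else:
        # Reject anything other than the two supported orders
        raise ValueError('major must be "row" or "column"')

    return np.array(output_array)
```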

```python
# TEST

input_array = np.array([[1,2],[3,4]])
output_array = flatten_v2(input_array, "row")

assert output_array.shape == (4,)
```


```python
# TEST

input_array = np.array([[1,2],[3,4]])
output_array = flatten_v2(input_array, "column")

assert output_array.shape == (4,)
```
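NumPy names the two orders from Question 1.3 with the `order` argument of `flatten`: `"C"` for row-major and `"F"` (Fortran style) for column-major. A short sketch:

```python
import numpy as np

input_array = np.array([[1, 2], [3, 4]])

row_major = input_array.flatten(order="C")     # read row by row
column_major = input_array.flatten(order="F")  # read column by column

print(row_major)     # [1 2 3 4]
print(column_major)  # [1 3 2 4]
```

A check such as `np.array_equal(flatten_v2(x, "column"), x.flatten(order="F"))` is one way to compare `flatten_v2` against NumPy on other inputs.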


### 2. Measuring Size of Arrays

We can measure the size of an array in different ways. One measure is memory usage: the space needed to store the array in memory. Memory usage helps us distinguish small arrays from large arrays.
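We do not know how `helper_functions.get_size` measures memory. As one common measure (an assumption, not necessarily what `get_size` uses), every NumPy array reports the size of its data buffer in the `nbytes` attribute, which equals the number of entries times the bytes per entry (`itemsize`):

```python
import numpy as np

# dtype=np.int64 is fixed here so the byte counts do not depend on the platform
arr = np.array([[1, 0, 0, 1, 0, 0], [0, 0, 2, 0, 0, 1], [0, 0, 0, 2, 0, 0]], dtype=np.int64)

# nbytes counts only the data: number of entries times bytes per entry
print(arr.size, arr.itemsize, arr.nbytes)  # 18 8 144
```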

#### Question 2.1

We have provided you with a file called `helper_functions.py`. Import the module `helper_functions`.

```python
import helper_functions
```

#### Question 2.2

Consider the array `arr`:

```python
arr = np.array([[1, 0, 0, 1, 0, 0], [0, 0, 2, 0, 0, 1], [0, 0, 0, 2, 0, 0]])
```

Use `helper_functions.get_size` from the module you imported in Question 2.1 to determine how much memory `arr` takes up. The function `helper_functions.get_size` takes an array as input and returns its size in bytes.


```python
arr = np.array([[1, 0, 0, 1, 0, 0], [0, 0, 2, 0, 0, 1], [0, 0, 0, 2, 0, 0]])
size_in_memory = helper_functions.get_size(arr)
```

```python
# TEST

assert 0 < size_in_memory < 200
```


#### Question 2.3

Use the function `helper_functions.sparse_matrix` to convert

```python
arr = np.array([[1, 0, 0, 1, 0, 0], [0, 0, 2, 0, 0, 1], [0, 0, 0, 2, 0, 0]])
```

into a sparse array. Call the sparse array `arr_sparse`. Then use `helper_functions.get_size` to determine how much memory `arr_sparse` takes up.

```python
arr_sparse = helper_functions.sparse_matrix(arr)

size_in_memory = helper_functions.get_size(arr_sparse)
```

```python
# TEST

assert 0 < size_in_memory < 100
```


### 3. Sparse Arrays

We find that `arr_sparse` takes up less memory than `arr`.

> Why does `arr_sparse` take up less space in memory?

If we think of the value `0` as a blank, then `arr` has many blank entries. The function `sparse_matrix` removes these blanks, so only the non-zero values and their positions need to be recorded.
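A minimal sketch of the idea, assuming a coordinate (COO) layout that stores three parallel arrays: row indices, column indices, and values. This layout is our illustration; `helper_functions.sparse_matrix` may store its data differently.

```python
import numpy as np

arr = np.array([[1, 0, 0, 1, 0, 0],
                [0, 0, 2, 0, 0, 1],
                [0, 0, 0, 2, 0, 0]], dtype=np.int64)

# Illustrative COO layout: one row index, one column index,
# and one value for each non-zero entry
rows, cols = np.nonzero(arr)
values = arr[rows, cols]

dense_bytes = arr.nbytes  # 18 entries * 8 bytes
coo_bytes = rows.nbytes + cols.nbytes + values.nbytes

print(len(values), dense_bytes, coo_bytes)  # on a 64-bit platform: 5 144 120
```

Each non-zero entry costs three numbers instead of one, so when indices and values use the same number of bytes, this layout saves memory only if fewer than about a third of the entries are non-zero. Here 5 of the 18 entries are non-zero.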

#### Question 3.1

We can write our own version of `sparse_matrix`. Write a function called `dense_to_sparse` that takes an array as input and returns two values:

- a dictionary whose keys are `(row, column)` tuples for all non-zero entries and whose values are the corresponding non-zero entries
- the shape of the input

The resulting entries should be in _row-major_ order (see Question 1.3). 

For example, if the input is 

> `np.array([[1,0], [0,4]])` 

then the output is

> `{(0,0):1, (1,1):4}`, `(2,2)`       
    

```python
def dense_to_sparse(array):
    output_dictionary = {}

    # Loop in row-major order so the keys are inserted row by row
    for i in range(array.shape[0]):
        for j in range(array.shape[1]):
            # Keep every non-zero entry, including negative ones
            if array[i, j] != 0:
                output_dictionary[(i, j)] = array[i, j]

    return output_dictionary, array.shape
```

```python
# TEST

input_array = np.array([[1,0],[0,4]])
output_array, shape = dense_to_sparse(input_array)

assert len(output_array) == 2
assert shape == (2,2)
```
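The nested loop can also be replaced by `np.nonzero`, which returns the row and column indices of the non-zero entries in row-major order. The function name `dense_to_sparse_np` is hypothetical; this is a sketch of the same idea as Question 3.1, not a required answer:

```python
import numpy as np

def dense_to_sparse_np(array):
    """Hypothetical vectorized variant of dense_to_sparse from Question 3.1."""
    # np.nonzero returns the indices of non-zero entries in row-major order
    rows, cols = np.nonzero(array)
    values = array[rows, cols]
    # .item() turns NumPy scalars into plain Python numbers
    entries = {(r.item(), c.item()): v.item() for r, c, v in zip(rows, cols, values)}
    return entries, array.shape

print(dense_to_sparse_np(np.array([[1, 0], [0, 4]])))
# ({(0, 0): 1, (1, 1): 4}, (2, 2))
```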


#### Question 3.2



Write the inverse of `dense_to_sparse` from Question 3.1 and call it `sparse_to_dense`. For example, if the input is

> `{(0,0):1, (1,1):4}`, `(2,2)`

then the output is

> `np.array([[1,0], [0,4]])` 

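```python
def sparse_to_dense(input_dictionary, shape):
    # Start from an array of zeros (np.zeros uses float64 by default)
    output_array = np.zeros(shape)

    # Each key is a (row, column) tuple, which NumPy accepts as an index
    for key, value in input_dictionary.items():
        output_array[key] = value

    return output_array
```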

```python
# TEST

input_array = {(0,0):1, (1,1):4}
shape = (2,2)

output_array = sparse_to_dense(input_array, shape)

assert output_array.shape == (2,2)
```
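Going the other way, NumPy's fancy indexing can place all the values at once. The sketch below uses a hypothetical `sparse_to_dense_np` that also accepts a `dtype` argument, because `np.zeros` otherwise returns floats (`[[1., 0.], [0., 4.]]`) rather than the integer array in the example:

```python
import numpy as np

def sparse_to_dense_np(entries, shape, dtype=float):
    """Hypothetical vectorized variant of sparse_to_dense from Question 3.2."""
    dense = np.zeros(shape, dtype=dtype)
    if entries:  # zip(*...) below needs at least one key
        # Split the (row, column) keys into a tuple of rows and a tuple of columns
        rows, cols = zip(*entries.keys())
        # Fancy indexing assigns every value in one step
        dense[list(rows), list(cols)] = list(entries.values())
    return dense

original = np.array([[1, 0], [0, 4]])
restored = sparse_to_dense_np({(0, 0): 1, (1, 1): 4}, original.shape, dtype=original.dtype)
print(restored)
print(np.array_equal(restored, original))  # True
```

Converting an array to the dictionary form and back should return an equal array, so a round trip is a convenient test for both functions.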
