# Numpy 1 - *Python and Grid-based Gambling*

You work for a casino that is piloting new gambling machines.  The casino has 6 rows of the new machines; each row contains 6 machines. Each machine in this 6x6 grid allows no more than 10 jackpot prizes per hour.

The casino asks you to monitor and log the machines over time; this requires various tasks outlined in the sections below.

Goal: This assignment aims to explore file input and output with a complex data structure.

## Part 1 -- Create

(10 points)  You've been asked to simulate **12 hours** of data collection, where the number of **jackpots per hour** for each machine is recorded.

Create a 3D structure capable of storing the hourly totals for each machine in the 6x6 grid; fill this structure with random integer values between 0 and 10 (inclusive) to indicate how many jackpots per machine happened.

Your 3D data structure should be able to hold 12 copies of 6x6 values. Print your structure.
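
The layout described above can be sketched as follows, assuming a seeded `np.random.default_rng` generator is acceptable for the simulation. The seed `42` and the names `rng`, `hours_first`, and `hours_last` are illustrative and not part of the assignment. The sketch shows two axis orders: hour-first `(12, 6, 6)`, which reads literally as "12 copies of 6x6", and hour-last `(6, 6, 12)`, which is the layout the solution cells in this notebook use.

```python
import numpy as np

# Illustrative seed so that repeated runs produce the same simulation
rng = np.random.default_rng(42)

# Hour-first layout: hours_first[h] is the 6x6 grid recorded during hour h.
# endpoint=True makes the upper bound (10) inclusive.
hours_first = rng.integers(0, 10, size=(12, 6, 6), endpoint=True)

# Hour-last layout: hours_last[r, c] holds the 12 hourly values for the
# machine in row r, column c. moveaxis reorders axes without changing values.
hours_last = np.moveaxis(hours_first, 0, -1)

print(hours_first.shape, hours_last.shape)
```

Either order works as long as later steps reduce over the matching axis: `axis=0` for the hour-first layout, `axis=2` for the hour-last layout.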

In [1]:
import numpy as np

# create 3D array for 6 rows of 6 machines and simulate jackpots for 12 hours
# jackpots simulated by random integers from 0-10 representing number of 
# jackpots per hour
jackpot_arr = np.random.randint(0,11, size=(6,6,12))

# print 3d array
print(jackpot_arr)

[[[ 9  7  8 10  5  8  4  9  5  8  6 10]
  [ 6  3  2  2  5  3 10  4  3  9  2  0]
  [ 6  2  1  6  1  1  0  7  5  2  7  5]
  [ 7  0 10  0  3  1  2  4  0  2  7  7]
  [ 5  5  4  5  2  6  8  6  5 10  3  8]
  [ 5 10  7  4  4  1  4  5  1  2  6  1]]

 [[ 5  3 10  9  2  0  7  2  1 10  8  9]
  [ 7  7  4  3  5  8  6  1  7  2  7  2]
  [10  1  4  7  7  5  0  9  2  1  7  0]
  [ 7  7 10  6  7  1  7  1  4  6  5 10]
  [ 2  6  0  2  8 10  5  1 10  8  4  6]
  [ 3  7 10  4  5  4  8  5  2  3  4  8]]

 [[10  6  5  4  2  6  0  2  3  5  9  4]
  [ 9  4  7  3 10  9  8  0  6  8  4  3]
  [ 8 10  1  6 10  7  0  0  9  8  5  1]
  [ 6  1 10  2  2  7  1  5  8 10  8  8]
  [ 5  5  4  1  4  9 10  6  3  1  4  4]
  [10  0  7  5  6  2  6  9 10  9  6  3]]

 [[ 2  4 10  8 10  2  3  3  3  6  1  5]
  [ 3  6  4  0  7  5  8  2  3  0  3  2]
  [ 9  7  3  4  7  3  1  4  1  3  8  0]
  [ 2  4  7  5  3 10  5  2  2  1  0  6]
  [ 2  3  9 10  5  5 10  1  9  2  0  8]
  [10  4  1  4  8 10  8  0  7  4  9  3]]

 [[ 7  6  7  1  4  6  9  3 10  2

## Part 2 -- Write to File

(15 points) Write a function that takes (1) a filename and (2) a 3D data structure; this function should write the contents of the 3D structure to the specified file. Use any file format or any module you wish.  Run your function with an appropriate filename and the 3D structure from Part #1.

In [2]:
def datasave_func(fname, data):
  '''
  Function to save NumPy array as binary .npy file using NumPy "save" function.
  
  PARAMETERS
  -----------
  fname : string
      filename or path
  data : array
      array of any shape

  '''
  # call NumPy function to save any array to .npy file
  np.save(fname, data)

# creating a filename variable with .npy extension for saving 3d data structure
# to working directory
filename = 'jackpot_file.npy'

# calling function created above to save data structure created in part 1
datasave_func(filename, jackpot_arr)
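
Since the task allows any file format, a plain-text alternative to the binary `.npy` file is also possible. The sketch below is one hedged way to do it, assuming integer data: `np.savetxt` only accepts 1D or 2D arrays, so the 3D array is flattened to 2D and its shape is stored in the header line. The names `save_text`, `load_text`, and `jackpots.txt` are hypothetical, and the file is written to a temporary directory.

```python
import os
import tempfile

import numpy as np


def save_text(fname, data):
    """Write a 3D integer array as text, storing its shape in the header."""
    # Flatten everything after the first axis so the array becomes 2D
    flat = data.reshape(data.shape[0], -1)
    # The header is written as a comment line, e.g. "# 2 3 4"
    header = " ".join(str(n) for n in data.shape)
    np.savetxt(fname, flat, fmt="%d", header=header)


def load_text(fname):
    """Read a file written by save_text and restore the original 3D shape."""
    with open(fname) as fh:
        shape = tuple(int(n) for n in fh.readline().lstrip("# ").split())
    # loadtxt skips the "#" header line by default
    return np.loadtxt(fname, dtype=int).reshape(shape)


demo = np.arange(24).reshape(2, 3, 4)
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "jackpots.txt")
    save_text(path, demo)
    restored = load_text(path)

print(np.array_equal(demo, restored))
```

The text file is human-readable but larger and slower to parse than `.npy`, and the `"%d"` format would not suit non-integer data.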


## Part 3 -- Read from File

(15 points) Write a function that takes a filename, opens it, and returns the 3D structure saved inside the specified file.  Call your function using the filename from Part 2 and save the results into a new variable. Print the contents of this variable. 

In other words, your function should open what your function in Part 2 has saved.

In [3]:
def dataload_func(fname):
  '''
  Function to load binary .npy file as NumPy ndarray using the NumPy "load" function.
  
  PARAMETERS
  -----------
  fname : string
      filename or path
  
  RETURNS
  --------
  NumPy ndarray

  '''
  # call NumPy function to load .npy file and return array
  return np.load(fname)

# open data file created in part 2 as new array
# using previous filename variable from part 2
new_jackpot_arr = dataload_func(filename)

# test to compare original array to new array with boolean
comparison = jackpot_arr == new_jackpot_arr

# print overall result of comparison boolean array
print('Is the new array the same as the old array?\n', comparison.all(), '\n')

# print new array
print(new_jackpot_arr)

Is the new array the same as the old array?
 True 

[[[ 9  7  8 10  5  8  4  9  5  8  6 10]
  [ 6  3  2  2  5  3 10  4  3  9  2  0]
  [ 6  2  1  6  1  1  0  7  5  2  7  5]
  [ 7  0 10  0  3  1  2  4  0  2  7  7]
  [ 5  5  4  5  2  6  8  6  5 10  3  8]
  [ 5 10  7  4  4  1  4  5  1  2  6  1]]

 [[ 5  3 10  9  2  0  7  2  1 10  8  9]
  [ 7  7  4  3  5  8  6  1  7  2  7  2]
  [10  1  4  7  7  5  0  9  2  1  7  0]
  [ 7  7 10  6  7  1  7  1  4  6  5 10]
  [ 2  6  0  2  8 10  5  1 10  8  4  6]
  [ 3  7 10  4  5  4  8  5  2  3  4  8]]

 [[10  6  5  4  2  6  0  2  3  5  9  4]
  [ 9  4  7  3 10  9  8  0  6  8  4  3]
  [ 8 10  1  6 10  7  0  0  9  8  5  1]
  [ 6  1 10  2  2  7  1  5  8 10  8  8]
  [ 5  5  4  1  4  9 10  6  3  1  4  4]
  [10  0  7  5  6  2  6  9 10  9  6  3]]

 [[ 2  4 10  8 10  2  3  3  3  6  1  5]
  [ 3  6  4  0  7  5  8  2  3  0  3  2]
  [ 9  7  3  4  7  3  1  4  1  3  8  0]
  [ 2  4  7  5  3 10  5  2  2  1  0  6]
  [ 2  3  9 10  5  5 10  1  9  2  0  8]
  [10  4  1  4  8 10  

## Part 4 -- Averages

(15 points) Write a function that takes a 3D data structure and returns a 6x6 2D structure, where each value is the average of the hourly grid values *per machine*. Call this function using the data from Part 1; print the results.

In [4]:
def mean3d_func(data):
  '''
  Function to calculate averages of third dimension in 3D array and return 2D 
  array of means.
  
  PARAMETERS
  -----------
  data : array
      3D array
  
  RETURNS
  --------
  NumPy ndarray

  '''
  # call NumPy function on 3rd dimension (axis 2) and return 2D array of means
  # using NumPy "mean" function
  return np.mean(data, axis=2)

# call function created above to calculate average of simulated jackpots 
# for each machine over 12 hour period
jackpot_mean_arr = mean3d_func(jackpot_arr)

# print means and round to 3 decimal places for readability
print(jackpot_mean_arr.round(3))

[[7.417 4.083 3.583 3.583 5.583 4.167]
 [5.5   4.917 4.417 5.917 5.167 5.25 ]
 [4.667 5.917 5.417 5.667 4.667 6.083]
 [4.75  3.583 4.167 3.917 5.333 5.667]
 [5.333 5.5   4.167 5.083 4.917 5.5  ]
 [5.25  3.917 5.917 4.167 5.833 5.667]]
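
To see which axis a reduction collapses, a small hand-checkable example may help. The array `tiny` below is made up for illustration: 2 rows, 2 machines per row, and 3 hours per machine, stored hour-last like the arrays above.

```python
import numpy as np

# tiny[r, c] holds the 3 hourly values for the machine at row r, column c
tiny = np.array([[[1, 2, 3], [4, 4, 4]],
                 [[0, 0, 6], [10, 0, 5]]])

# Collapse the hour axis: one average per machine, shape (2, 2)
per_machine = tiny.mean(axis=2)

# Collapse both grid axes: one average per hour, shape (3,)
per_hour = tiny.mean(axis=(0, 1))

print(per_machine)
print(per_hour)
```

The axis passed to `mean` is the one that disappears from the result's shape, so `axis=2` answers "per machine" and `axis=(0, 1)` answers "per hour".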


## Part 5 -- Averages (Constrained)

(15 points) A coworker training you casually mentions that occurrences of 0s in the jackpot hourly totals imply the machine was disabled and out of order for that hour.

Write a function that takes a 3D structure and returns a 6x6 2D structure, where each value is the average of the **non-zero** hourly grid values per machine. If all hourly values for a single machine are 0, return -1 for that machine's totals.

Call this function using the data from Part 1; print the results.

In [5]:
def nzmean3d_func(data):
  '''
  Function to calculate averages ignoring zero values of third dimension in 
  3D array and return 2D array of means. When all values for a machine are 0, 
  result returned as -1.
  
  PARAMETERS
  -----------
  data : array
      3D array
  
  RETURNS
  --------
  NumPy 2D ndarray
  
  '''
  # count the non-zero (operational) hours for each machine along the 3rd
  # dimension; a count of 0 means the machine was disabled for every hour
  nz_count = np.count_nonzero(data, axis=2)

  # sum each machine's hourly jackpots along the 3rd dimension; zeros add
  # nothing, so this is also the sum of the non-zero values
  total = np.sum(data, axis=2)

  # divide sum by non-zero count for the constrained mean; np.maximum avoids
  # dividing by 0, and np.where returns -1 for machines with no non-zero hours;
  # unlike assigning -1 into data, this leaves the input array unmodified
  nzm = np.where(nz_count > 0, total / np.maximum(nz_count, 1), -1)

  # return 2D array of constrained means, with -1 for all-zero machines
  return nzm

# calculate constrained average jackpots of each machine over 12 hour period
jackpot_constrainedmean_arr = nzmean3d_func(jackpot_arr)

# print results of constrained means, rounding to 3 decimals for readability
print(jackpot_constrainedmean_arr.round(3))

# testing machine on first row and column with 0 hourly jackpots
# copy 3d array
test1 = jackpot_arr.copy()
# change first row, first column, all 3rd dimension values to 0
test1[0,0,:] = 0
# calling function from above, should give -1 in position 0,0
print('\n', nzmean3d_func(test1).round(3))

# testing input with all machines having 0 hourly jackpots; should give 2d array
# of all -1 values
test2 = np.zeros((6,6,12))
print('\n', nzmean3d_func(test2))

[[7.417 4.455 3.909 4.778 5.583 4.167]
 [6.    4.917 5.3   5.917 5.636 5.25 ]
 [5.091 6.455 6.5   5.667 4.667 6.636]
 [4.75  4.3   4.545 4.273 5.818 6.182]
 [5.333 6.    4.545 5.545 4.917 6.   ]
 [5.25  4.7   6.455 5.556 6.364 6.8  ]]

 [[-1.     4.455  3.909  4.778  5.583  4.167]
 [ 6.     4.917  5.3    5.917  5.636  5.25 ]
 [ 5.091  6.455  6.5    5.667  4.667  6.636]
 [ 4.75   4.3    4.545  4.273  5.818  6.182]
 [ 5.333  6.     4.545  5.545  4.917  6.   ]
 [ 5.25   4.7    6.455  5.556  6.364  6.8  ]]

 [[-1. -1. -1. -1. -1. -1.]
 [-1. -1. -1. -1. -1. -1.]
 [-1. -1. -1. -1. -1. -1.]
 [-1. -1. -1. -1. -1. -1.]
 [-1. -1. -1. -1. -1. -1.]
 [-1. -1. -1. -1. -1. -1.]]
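
An alternative way to express "average of the non-zero values" is NumPy's masked-array module, sketched below. The function name `nonzero_mean_masked` and the array `demo` are hypothetical. `np.ma.masked_equal` hides the zeros, the masked mean skips them, and `filled` substitutes -1 wherever every value along the hour axis was masked.

```python
import numpy as np


def nonzero_mean_masked(data):
    """Mean of non-zero values along the last axis; -1 where all values are 0."""
    # Mask (hide) every element equal to 0; the input array is not modified
    masked = np.ma.masked_equal(data, 0)
    # Entries whose values were all masked stay masked after the mean,
    # and filled() replaces those masked entries with -1
    return masked.mean(axis=-1).filled(-1.0)


# 2x2 grid of machines with 3 hourly values each
demo = np.array([[[0, 2, 4], [0, 0, 0]],
                 [[5, 5, 5], [10, 0, 0]]])

print(nonzero_mean_masked(demo))
```

This keeps the "ignore zeros" rule in one place, at the cost of the extra overhead that masked arrays carry compared with plain `ndarray` arithmetic.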


## Part 6 -- Targeting Machines

(10 points) The company is very happy with your ability to create averages of their machines' jackpots.  They instruct you to disable any machine that averaged at least 7 jackpots during the initial 12 hour period because it's "too hot" and to run your simulation again on the remaining machines.

In other words, repeat <mark>Part 1</mark> for another 12 hour period by creating another 3D structure of randomized results 0-10, where the "too hot" machines from the first simulation are also disabled and forced to appear as 0s at every hour. 

Show your results by printing the indices of the "too hot" machines and by printing the final 3D structure.
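
Disabling machines by position can be sketched with a boolean mask. The per-machine averages in `means` are invented for illustration, and the simulated values are drawn from 1-10 here only so that every zero in the result comes from the disabling step. A 2D boolean mask applied to a 3D array selects all values along the remaining (hour) axis for each flagged machine.

```python
import numpy as np

rng = np.random.default_rng(0)  # illustrative seed

# Hypothetical per-machine averages for a 2x2 grid
means = np.array([[7.5, 4.0],
                  [6.9, 7.0]])

# True where a machine averaged at least 7 jackpots
hot_mask = means >= 7

# New simulation: 2x2 machines, 3 hours, values 1-10 (no natural zeros)
sim = rng.integers(1, 10, size=(2, 2, 3), endpoint=True)

# Zero out every hour for each hot machine
sim[hot_mask] = 0

print(np.argwhere(hot_mask))  # (row, column) indices of the hot machines
print(sim)
```

`np.argwhere(hot_mask)` gives the indices to report, while the mask itself does the assignment, so no separate row and column index arrays are needed.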

In [6]:
# create new jackpot simulation for another 12 hour period;
# 6 machines on 6 rows for 12 hours (same methods as part 1 above)
next_jackpot_arr = np.random.randint(0,11, size=(6,6,12))

# find indices of "hot" machines (average of at least 7 jackpots per hour) using
# the constrained means computed in part 5 above
# "argwhere" function returns indices where input array is True; here input array
# is mask of constrained means that are 7 or greater
hot_idx = np.argwhere(jackpot_constrainedmean_arr >= 7)

# disable hot machines found in new simulation by changing to zero
# for second 12 hour time period; use indices determined in step above to index
# simulation and replace hot machines with 0 ("hot" rows, "hot" columns, all 12 hours)
next_jackpot_arr[hot_idx[:,0], hot_idx[:,1],:] = 0

# print indices and final modified second simulation
print('Indices of hot machines in the first simulation: \n', hot_idx)
print('\nSecond simulation without "hot" machines: \n', next_jackpot_arr)

Indices of hot machines in the first simulation: 
 [[0 0]]

Second simulation without "hot" machines: 
 [[[ 0  0  0  0  0  0  0  0  0  0  0  0]
  [ 4  3  8  6  1  9  6  2  1  2 10  9]
  [ 4  7  7  3  6  3  4  5  8  1  0  7]
  [10 10  2  2  7  2  2  2  4 10  3  2]
  [ 5  8  0  8  7  4  0  5  2  4  9  4]
  [10  1  9  0  6  9  6  7  9  0 10  5]]

 [[ 1  9  4  1  5  0  1  1  2  3  5 10]
  [ 6  2  4  7  2  6  0  0  1  1  3  8]
  [ 9  8  3  4 10  3  8  7  5  5  1  0]
  [ 7  2  3  0  6  2  5  8  7  0  3  0]
  [ 8  0  3  0  6  2 10  7  9  5  5  0]
  [ 9  9  0  3  2  6  4  6  7  8  0  5]]

 [[ 2  3  6  6  3  6 10  8  1  3  9  6]
  [ 1  4  3  0  8  5  7  5  3  1  4  9]
  [ 6  1  9 10  4  1  8 10  1  7  7  8]
  [ 0  4  2  1  1  2  6  9  9  3  5  7]
  [ 5  3  4  6  3  0  4  9  4  4  5  5]
  [ 9  5  5 10  9  9 10  6  1 10  6 10]]

 [[ 9  2  3  0  2  4  5  3  8  5  0  1]
  [10  9  9  0  0  2  3  6  7  2  8  4]
  [10  1  0  5 10  9 10 10  1  4  3 10]
  [ 0  7  4  6  9  8  4  7  2  8  8  4]
  [ 2 10  

## Review Points

The points below may be helpful in preparing for this assignment. The examples use NumPy, but pandas examples would be similar in spirit.

In [7]:
# The following code generates a random 3D array of integers 0-9
import numpy as np
ar3 = np.random.randint(0, 10, (3, 4, 4))

In [8]:
# This is a numpy ndarray
print(type(ar3))

<class 'numpy.ndarray'>


In [9]:
# The entire array is printable
print(ar3)

[[[6 6 8 0]
  [9 4 3 4]
  [5 3 3 2]
  [2 4 2 0]]

 [[3 3 1 2]
  [9 2 9 4]
  [0 9 1 5]
  [7 1 5 5]]

 [[1 9 8 1]
  [8 4 6 0]
  [1 4 3 9]
  [0 9 9 8]]]


In [10]:
# Recall that you can access data inside arrays via indexing; the 0th element is a 4x4 array.
print(ar3[0])

[[6 6 8 0]
 [9 4 3 4]
 [5 3 3 2]
 [2 4 2 0]]
