![Ironhack logo](https://i.imgur.com/1QgrNNw.png)

# Lab | Numpy

## Introduction

An important skill for a data scientist or data engineer is knowing where and how to find the information needed to get the work done. In this exercise, you will practice the NumPy features discussed in the lesson and learn new ones by looking them up in the documentation and other references. You will work on your own, but remember that the teaching staff is available whenever you run into problems.

## Getting Started
The notebook contains comments that tell you what to do, step by step. Follow the instructions in order from top to bottom. Read each instruction carefully and provide your answer beneath it. Test your answers to make sure they are correct: if one response is incorrect, you may not be able to proceed, because later responses may depend on earlier ones.


## Resources

Some of the questions in this assignment are not covered in the lesson, so you will learn how to look up information efficiently on your own. The resources below are good places to find what you need.

[Numpy User Guide](https://docs.scipy.org/doc/numpy/user/index.html)

[Numpy Reference](https://docs.scipy.org/doc/numpy/reference/)

[Google Search](https://www.google.com/search?q=how+to+use+numpy)

## Additional Challenges for the Nerds

If you are well ahead of your classmates and willing to take on some tough NumPy challenges, try one or more of the following Codewars *katas*. You should already have a good grasp of Python and statistics, because you will need to write Python functions, loops, and conditionals, and work with matrices.

* [Insert dashes](https://www.codewars.com/kata/insert-dashes)
* [Thinkful - Logic Drills: Red and bumpy](https://www.codewars.com/kata/thinkful-logic-drills-red-and-bumpy)


### 1. Import the NUMPY package under the name np.

In [1]:
import numpy as np

### 2. Print the NUMPY version and the configuration.

In [2]:
np.version.version

'1.18.1'

In [None]:
np.show_config()

In [None]:
# note that np.version is a module, so evaluating it shows the path of its source file
np.version
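
The version string is also commonly read from `np.__version__`. The snippet below is a minimal sketch that assumes only that NumPy is installed; the names `version` and `same_string` are illustrative.

```python
import numpy as np

# np.__version__ is the conventional attribute holding the version string
version = np.__version__
print(version)

# np.version.version (used above) holds the same string
same_string = (version == np.version.version)
print(same_string)
```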

### 3. Generate a 3x2x5 3-dimensional array with random values. Assign the array to variable "a"
* Challenge: there are at least three easy ways that use numpy to generate random arrays. How many ways can you find?

In [5]:
# note: here "dimension" means an array axis, so a 3x2x5 array has 3 axes
# (in a DataFrame, by contrast, each column is often called a dimension)

### Method 1: using numpy.random.rand
It returns a sample (or samples) from the uniform distribution over [0, 1).

* https://docs.scipy.org/doc/numpy/reference/generated/numpy.random.rand.html


In [6]:
# an array with 1 element
np.random.rand(1)

array([0.84357784])

In [7]:
# an array with 2 elements 
np.random.rand(2)

array([0.14174862, 0.29858063])

In [8]:
# the arguments passed to rand() define the shape of the array

In [9]:
# a 3-D array of shape (2, 1, 3): 2 blocks, each with 1 row of 3 elements
np.random.rand(2, 1, 3)

array([[[0.56770102, 0.28649358, 0.173129  ]],

       [[0.21808276, 0.72848563, 0.59667493]]])

In [10]:
# a 3-D array of shape (3, 1, 2): 3 blocks, each with 1 row of 2 elements
np.random.rand(3, 1, 2)

array([[[0.99561979, 0.13583569]],

       [[0.99916866, 0.39868337]],

       [[0.54535035, 0.0261194 ]]])

In [11]:
# method 1: generate random numbers by passing the size of each dimension
a = np.random.rand(3, 2, 5)
print(a)

[[[0.22017856 0.88764119 0.72658821 0.23510415 0.23309436]
  [0.78320958 0.76104241 0.92200557 0.35251785 0.66872625]]

 [[0.25136189 0.38656696 0.72296182 0.1101356  0.99020643]
  [0.29111354 0.44298057 0.48520334 0.24322697 0.35593313]]

 [[0.84482342 0.54578201 0.27369268 0.86321989 0.58727674]
  [0.98541693 0.77589301 0.03370206 0.32439396 0.9545534 ]]]


In [12]:
# another way to sample from [0, 1): np.random.random takes the shape as a single tuple
a = np.random.random((3,2,5))
print(a)


[[[0.99699095 0.39560216 0.62386433 0.26091464 0.76939102]
  [0.88539666 0.95737445 0.76328245 0.66878237 0.91235369]]

 [[0.39472089 0.84508085 0.96029453 0.10466319 0.11316267]
  [0.73352149 0.78534497 0.35344214 0.77562133 0.02881426]]

 [[0.6970978  0.98910494 0.94318763 0.25913986 0.66208265]
  [0.01764176 0.04547547 0.07069185 0.75278326 0.15424034]]]


In [13]:
# np.random.random_sample does the same (np.random.random is an alias of it)
a = np.random.random_sample((3,2,5))
print(a)

[[[0.83867925 0.87382267 0.24582605 0.97958316 0.58558162]
  [0.17977047 0.1244125  0.22491835 0.81407705 0.74723232]]

 [[0.05850054 0.14357927 0.4744351  0.50675259 0.00804975]
  [0.16136561 0.82574474 0.04408009 0.98491127 0.24591901]]

 [[0.08132056 0.35483949 0.60468938 0.85758055 0.8509201 ]
  [0.75682622 0.45299075 0.71279466 0.04999433 0.08725225]]]


In [14]:
# np.random.ranf is yet another alias of random_sample
a = np.random.ranf((3,2,5))
print(a)

[[[0.45013111 0.59865131 0.34251233 0.45807046 0.95110193]
  [0.10893062 0.82529559 0.8194393  0.9421153  0.71242431]]

 [[0.39167552 0.32619121 0.05200376 0.51279941 0.90491788]
  [0.07056273 0.20169783 0.91742803 0.55597442 0.36208024]]

 [[0.56456348 0.28350581 0.23209108 0.32813382 0.3064288 ]
  [0.44572411 0.61636921 0.5562398  0.43456474 0.94055776]]]


### Method 2: using np.random.randn
It returns a sample (or samples) from the "standard normal" distribution.


https://docs.scipy.org/doc/numpy/reference/generated/numpy.random.randn.html

In [15]:
# a 3-D array of shape (3, 2, 5) drawn from the standard normal distribution
a = np.random.randn(3, # number of 2-D blocks (first axis)
                    2, # number of rows in each block (second axis)
                    5 # number of elements in each row (third axis)
           )


In [16]:
print(a)

[[[ 0.93171459 -0.57702829 -0.25880783 -0.10006801  0.46362387]
  [ 0.09943998 -1.40715152 -0.47992992 -1.39020338  0.13464111]]

 [[ 1.26782984 -0.95785409 -0.55697048  0.11377147 -1.16066617]
  [-0.98434157 -0.5212356   0.7956859  -0.46703895  0.40780353]]

 [[ 0.07872914  0.5051086  -0.37602142  0.17527685  0.66651272]
  [-1.09160435 -0.12817557 -0.16869472 -1.37376022  0.21393816]]]


### Method 3: using np.random.randint
We can also create an array of integer values using np.random.randint

* check it out: https://docs.scipy.org/doc/numpy/reference/generated/numpy.random.randint.html


In [17]:
# notice that the parameters are different from np.random.rand and np.random.randn
a = np.random.randint(5, # exclusive upper bound: values are drawn from 0 to 4
                      size=[
                          # number of 2-D blocks (first axis)
                          3,
                          # number of rows in each block (second axis)
                          2, 
                          # number of elements in each row (third axis)
                          5])
print(a)

[[[3 0 0 2 3]
  [2 2 2 4 1]]

 [[0 1 0 4 2]
  [4 1 3 2 2]]

 [[2 4 0 0 0]
  [0 1 0 2 2]]]
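
As one more answer to the challenge, newer NumPy documentation recommends the `Generator` API for random sampling. The sketch below assumes NumPy 1.17 or later, where `np.random.default_rng` is available; the seed value 42 and the variable names are arbitrary choices for illustration.

```python
import numpy as np

# Create a Generator; the seed makes the results reproducible
rng = np.random.default_rng(seed=42)

# Uniform samples over [0, 1), shape passed as a tuple
a_uniform = rng.random((3, 2, 5))

# Samples from the standard normal distribution
a_normal = rng.standard_normal((3, 2, 5))

# Integers from 0 (inclusive) to 5 (exclusive)
a_int = rng.integers(0, 5, size=(3, 2, 5))

print(a_uniform.shape, a_normal.shape, a_int.shape)
```

All three arrays have shape (3, 2, 5); only the distribution of the values differs.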


### 4. Print a.

In [18]:
a = np.random.rand(3, 2, 5)
print(a)

[[[0.94384431 0.6761689  0.48945308 0.19861991 0.31511073]
  [0.11355712 0.91717167 0.13030967 0.53276879 0.81000779]]

 [[0.0379854  0.74994714 0.75892984 0.11118469 0.47539152]
  [0.05433635 0.42071734 0.93173972 0.0327763  0.03721198]]

 [[0.53153679 0.36757282 0.90079861 0.17411678 0.65031526]
  [0.23311142 0.01846718 0.21790026 0.87346036 0.78288957]]]


### 5. Create a 5x2x3 3-dimensional array with all values equaling 1. Assign the array to variable "b".

### 6. Print b.

In [19]:
# Method 1: using np.ones()

b = np.ones((5,2,3))

In [20]:
b

array([[[1., 1., 1.],
        [1., 1., 1.]],

       [[1., 1., 1.],
        [1., 1., 1.]],

       [[1., 1., 1.],
        [1., 1., 1.]],

       [[1., 1., 1.],
        [1., 1., 1.]],

       [[1., 1., 1.],
        [1., 1., 1.]]])

In [21]:
# Method 2: using np.full()

b = np.full(
    # shape of the array
    (5,2,3), 
    # element to fill the matrix
    1)

In [22]:
print(b)

[[[1 1 1]
  [1 1 1]]

 [[1 1 1]
  [1 1 1]]

 [[1 1 1]
  [1 1 1]]

 [[1 1 1]
  [1 1 1]]

 [[1 1 1]
  [1 1 1]]]


In [23]:
b.shape

(5, 2, 3)
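
The two outputs above look slightly different (`1.` versus `1`) because the two methods produce different data types. The following sketch illustrates this under the assumption of default settings; the names `b_ones`, `b_full`, and `b_full_float` are illustrative.

```python
import numpy as np

# np.ones defaults to a floating-point dtype
b_ones = np.ones((5, 2, 3))

# np.full infers the dtype from the fill value: the integer 1 gives an integer array
b_full = np.full((5, 2, 3), 1)

# Passing 1.0 (or dtype=float) gives a float array instead
b_full_float = np.full((5, 2, 3), 1.0)

print(b_ones.dtype, b_full.dtype, b_full_float.dtype)

# The values compare equal even though the dtypes differ
print(np.array_equal(b_ones, b_full))
```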

### 7. Do a and b have the same size? How do you prove that in Python code?

__Method 1__: manually compare the number of elements in each array using the `size` attribute

In [24]:
# comparing the size of arrays 'a' and 'b'
a.size == b.size

True

__Method 2:__ Now let's build a function to compare the sizes

In [25]:
def compare_arrays(array_1, array_2):
    if array_1.size == array_2.size:
        return('Arrays with the same size')
    else:
        return('Arrays with different sizes')

In [26]:
# After calling the function we can see that 'a' and 'b' have the same size
compare_arrays(a,b)

'Arrays with the same size'
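
Note that `size` counts elements, while `shape` describes how they are arranged, so two arrays can have the same size and still differ in shape. A short sketch with freshly created arrays (the names `same_size` and `same_shape` are illustrative):

```python
import numpy as np

a = np.random.rand(3, 2, 5)
b = np.ones((5, 2, 3))

# size is the total number of elements: 3*2*5 == 5*2*3 == 30
same_size = a.size == b.size

# shape is the tuple of axis lengths: (3, 2, 5) vs (5, 2, 3)
same_shape = a.shape == b.shape

print(same_size, same_shape)
```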

### 8. Are you able to add a and b? Why or why not?

In [27]:
# Before performing any array operation, let's check the arrays' shapes

In [28]:
a.shape

(3, 2, 5)

In [29]:
b.shape

(5, 2, 3)

* For element-wise operations, the arrays must have the same shape, or shapes that are compatible under NumPy's broadcasting rules

In [30]:
# trying to add 'a' to 'b' using the 'plus' operator
# It won't work because the shapes (3, 2, 5) and (5, 2, 3) cannot be broadcast together
a + b

ValueError: operands could not be broadcast together with shapes (3,2,5) (5,2,3) 

In [31]:
# the same error will be raised if we use NumPy's add() function
np.add(a, b)

ValueError: operands could not be broadcast together with shapes (3,2,5) (5,2,3) 
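
The error message mentions broadcasting. As a rough sketch of the rule: shapes are compared from the last axis backwards, and each pair of lengths must be equal, or one of them must be 1 or missing. The arrays `row`, `plane`, and `column` below are hypothetical examples, not part of the lab.

```python
import numpy as np

a = np.random.rand(3, 2, 5)

row = np.ones(5)              # shape (5,): matches the last axis of a
plane = np.ones((2, 5))       # shape (2, 5): matches the last two axes
column = np.ones((3, 1, 1))   # shape (3, 1, 1): axes of length 1 are stretched

# All three additions broadcast to the shape of a
shape_row = (a + row).shape
shape_plane = (a + plane).shape
shape_column = (a + column).shape
print(shape_row, shape_plane, shape_column)

# (3, 2, 5) and (5, 2, 3) fail: the last axes are 5 and 3, and neither is 1
try:
    a + np.ones((5, 2, 3))
    failed = False
except ValueError:
    failed = True
print(failed)
```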

### 9. Transpose b so that it has the same structure as a (i.e. becomes a 3x2x5 array). Assign the transposed array to variable "c".

In [32]:
# Method 1: transpose the data
c = np.transpose(b,
                 # new order of the axes: shape (5, 2, 3) becomes (3, 2, 5)
                 (2,1,0))

In [33]:
print(c)

[[[1 1 1 1 1]
  [1 1 1 1 1]]

 [[1 1 1 1 1]
  [1 1 1 1 1]]

 [[1 1 1 1 1]
  [1 1 1 1 1]]]


In [34]:
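# Method 2:
# reshape also gives shape (3, 2, 5), but it keeps the elements in their original order
# instead of swapping axes; it matches the transpose here only because every value in b is 1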
c = b.reshape((3, 2, 5))

In [35]:
c.shape

(3, 2, 5)

In [36]:
print(c)

[[[1 1 1 1 1]
  [1 1 1 1 1]]

 [[1 1 1 1 1]
  [1 1 1 1 1]]

 [[1 1 1 1 1]
  [1 1 1 1 1]]]
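
Because `b` contains only ones, the two methods above are indistinguishable. The sketch below uses a hypothetical array `x` with distinct values to show that transposing and reshaping generally give different results.

```python
import numpy as np

# An array with distinct values 0..29 so that element positions are visible
x = np.arange(30).reshape(5, 2, 3)

t = np.transpose(x, (2, 1, 0))  # swaps the first and last axes
r = x.reshape(3, 2, 5)          # keeps the flat element order, only regroups

# After the transpose, t[i, j, k] is x[k, j, i]
print(t[0, 0, 1], x[1, 0, 0])

# After the reshape, elements stay in their original flat order
print(r[0, 0, 1])

# Same shape, different contents
same_shape = t.shape == r.shape
same_values = np.array_equal(t, r)
print(same_shape, same_values)
```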


### 10. Try to add a and c. Now it should work. Assign the sum to variable "d". But why does it work now?

In [37]:
# Method 1: use + to add arrays 'a' and 'c' element-wise (it works now because both have shape (3, 2, 5))
d_m1 = a + c

In [38]:
d_m1.shape

(3, 2, 5)

In [39]:
d_m1

array([[[1.94384431, 1.6761689 , 1.48945308, 1.19861991, 1.31511073],
        [1.11355712, 1.91717167, 1.13030967, 1.53276879, 1.81000779]],

       [[1.0379854 , 1.74994714, 1.75892984, 1.11118469, 1.47539152],
        [1.05433635, 1.42071734, 1.93173972, 1.0327763 , 1.03721198]],

       [[1.53153679, 1.36757282, 1.90079861, 1.17411678, 1.65031526],
        [1.23311142, 1.01846718, 1.21790026, 1.87346036, 1.78288957]]])

In [40]:
# Method 2: use the add() function from numpy
d = np.add(a, c)

In [41]:
d.shape

(3, 2, 5)

In [42]:
d

array([[[1.94384431, 1.6761689 , 1.48945308, 1.19861991, 1.31511073],
        [1.11355712, 1.91717167, 1.13030967, 1.53276879, 1.81000779]],

       [[1.0379854 , 1.74994714, 1.75892984, 1.11118469, 1.47539152],
        [1.05433635, 1.42071734, 1.93173972, 1.0327763 , 1.03721198]],

       [[1.53153679, 1.36757282, 1.90079861, 1.17411678, 1.65031526],
        [1.23311142, 1.01846718, 1.21790026, 1.87346036, 1.78288957]]])

In [43]:
# compare d_m1 to d element by element
# they're equal
d_m1 == d

array([[[ True,  True,  True,  True,  True],
        [ True,  True,  True,  True,  True]],

       [[ True,  True,  True,  True,  True],
        [ True,  True,  True,  True,  True]],

       [[ True,  True,  True,  True,  True],
        [ True,  True,  True,  True,  True]]])
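
The comparison above returns one boolean per element. When a single True/False answer is enough, `np.array_equal` or `.all()` can be used. A minimal sketch with freshly created arrays (the names `all_equal` and `all_equal_alt` are illustrative):

```python
import numpy as np

a = np.random.rand(3, 2, 5)
c = np.ones((3, 2, 5))

d_m1 = a + c
d = np.add(a, c)

# True only if both arrays have the same shape and all elements are equal
all_equal = np.array_equal(d_m1, d)

# Equivalent check: reduce the element-wise comparison with .all()
all_equal_alt = bool((d_m1 == d).all())

print(all_equal, all_equal_alt)
```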

### 11. Print a and d. Do you notice the difference and the relation between the two arrays in terms of their values? Explain.

In [44]:
print(a)

[[[0.94384431 0.6761689  0.48945308 0.19861991 0.31511073]
  [0.11355712 0.91717167 0.13030967 0.53276879 0.81000779]]

 [[0.0379854  0.74994714 0.75892984 0.11118469 0.47539152]
  [0.05433635 0.42071734 0.93173972 0.0327763  0.03721198]]

 [[0.53153679 0.36757282 0.90079861 0.17411678 0.65031526]
  [0.23311142 0.01846718 0.21790026 0.87346036 0.78288957]]]


In [45]:
# each value in d equals the value of a at the same position plus 1, because c contains only ones
print(d)

[[[1.94384431 1.6761689  1.48945308 1.19861991 1.31511073]
  [1.11355712 1.91717167 1.13030967 1.53276879 1.81000779]]

 [[1.0379854  1.74994714 1.75892984 1.11118469 1.47539152]
  [1.05433635 1.42071734 1.93173972 1.0327763  1.03721198]]

 [[1.53153679 1.36757282 1.90079861 1.17411678 1.65031526]
  [1.23311142 1.01846718 1.21790026 1.87346036 1.78288957]]]


### 12. Multiply a and c. Assign the result to e.

In [46]:
# Method 1: using * to multiply values
e_m1 = a * c

In [47]:
e_m1.shape

(3, 2, 5)

In [48]:
e_m1

array([[[0.94384431, 0.6761689 , 0.48945308, 0.19861991, 0.31511073],
        [0.11355712, 0.91717167, 0.13030967, 0.53276879, 0.81000779]],

       [[0.0379854 , 0.74994714, 0.75892984, 0.11118469, 0.47539152],
        [0.05433635, 0.42071734, 0.93173972, 0.0327763 , 0.03721198]],

       [[0.53153679, 0.36757282, 0.90079861, 0.17411678, 0.65031526],
        [0.23311142, 0.01846718, 0.21790026, 0.87346036, 0.78288957]]])

In [49]:
# Method 2: use the multiply() function from numpy
e = np.multiply(a, c)

In [50]:
e.shape

(3, 2, 5)

In [51]:
e

array([[[0.94384431, 0.6761689 , 0.48945308, 0.19861991, 0.31511073],
        [0.11355712, 0.91717167, 0.13030967, 0.53276879, 0.81000779]],

       [[0.0379854 , 0.74994714, 0.75892984, 0.11118469, 0.47539152],
        [0.05433635, 0.42071734, 0.93173972, 0.0327763 , 0.03721198]],

       [[0.53153679, 0.36757282, 0.90079861, 0.17411678, 0.65031526],
        [0.23311142, 0.01846718, 0.21790026, 0.87346036, 0.78288957]]])

In [52]:
# comparing the results from method 1 and method 2 -- they're equal
e_m1 == e

array([[[ True,  True,  True,  True,  True],
        [ True,  True,  True,  True,  True]],

       [[ True,  True,  True,  True,  True],
        [ True,  True,  True,  True,  True]],

       [[ True,  True,  True,  True,  True],
        [ True,  True,  True,  True,  True]]])

### 13. Does e equal a? Why or why not?

In [53]:
# yes, array 'e' is equal to 'a': they have the same shape and the same values,
# because every value in c is 1, so the multiplication leaves the values of a unchanged
e == a

array([[[ True,  True,  True,  True,  True],
        [ True,  True,  True,  True,  True]],

       [[ True,  True,  True,  True,  True],
        [ True,  True,  True,  True,  True]],

       [[ True,  True,  True,  True,  True],
        [ True,  True,  True,  True,  True]]])

In [54]:
e.shape

(3, 2, 5)

In [55]:
a.shape

(3, 2, 5)

### 14. Identify the max, min, and mean values in d. Assign those values to variables "d_max", "d_min", and "d_mean"

In [56]:
# use the array methods max(), min(), and mean()

In [57]:
d_max = d.max()
print(d_max)

1.943844311704622


In [58]:
d_min = d.min()
print(d_min)

1.0184671811104282


In [59]:
d_mean = d.mean()
print(d_mean)

1.4495800433689014
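
Called without arguments, these methods reduce over the whole array. They also accept an `axis` argument to reduce along selected axes only. The sketch below uses a hypothetical array of the values 0 to 29 instead of the random `d`, so that the results are easy to check by hand.

```python
import numpy as np

# Values 0..29 arranged as 3 blocks of 2 rows with 5 elements each
d_demo = np.arange(30, dtype=float).reshape(3, 2, 5)

# Reduce over everything: one scalar each
overall_max = d_demo.max()
overall_min = d_demo.min()
overall_mean = d_demo.mean()

# Reduce over the first axis only: the result has shape (2, 5)
max_over_blocks = d_demo.max(axis=0)

# Reduce over the last two axes: one mean per block, shape (3,)
mean_per_block = d_demo.mean(axis=(1, 2))

print(overall_max, overall_min, overall_mean)
print(max_over_blocks.shape, mean_per_block)
```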


### 15. Now we want to label the values in d. First create an empty array "f" with the same shape (i.e. 3x2x5) as d using `np.empty`.

In [60]:
# calling the empty function from numpy (the values are uninitialized, so they are arbitrary)
f = np.empty([3,2,5])

In [61]:
f

array([[[0.94384431, 0.6761689 , 0.48945308, 0.19861991, 0.31511073],
        [0.11355712, 0.91717167, 0.13030967, 0.53276879, 0.81000779]],

       [[0.0379854 , 0.74994714, 0.75892984, 0.11118469, 0.47539152],
        [0.05433635, 0.42071734, 0.93173972, 0.0327763 , 0.03721198]],

       [[0.53153679, 0.36757282, 0.90079861, 0.17411678, 0.65031526],
        [0.23311142, 0.01846718, 0.21790026, 0.87346036, 0.78288957]]])
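
The output above contains leftover values because `np.empty` allocates memory without initializing it. The sketch below shows a few related ways to create `f`, assuming a stand-in array `d`; the names `f_like` and `f_zero` are illustrative.

```python
import numpy as np

# Stand-in for d from the previous steps: values between 1 and 2
d = np.random.rand(3, 2, 5) + 1

# Take the shape from d instead of hard-coding it; the contents are arbitrary
f = np.empty(d.shape)

# empty_like copies both shape and dtype from d; the contents are still arbitrary
f_like = np.empty_like(d)

# zeros_like gives a predictable starting value if that is preferred
f_zero = np.zeros_like(d)

print(f.shape, f_like.shape, f_zero.shape)
```

Since every element of `f` is overwritten in the next step, the arbitrary initial contents do no harm here.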


### 16. Populate the values in f.
For each value in d, assign the corresponding value in f as follows:
* If it is larger than d_min but smaller than d_mean, assign 25.
* If it is larger than d_mean but smaller than d_max, assign 75.
* If it equals d_mean, assign 50.
* If it equals d_min, assign 0.
* If it equals d_max, assign 100.

In the end, f should have only the following values: 0, 25, 50, 75, and 100.

**Note**: you don't have to use Numpy in this question.


In [62]:
for i in range(3):
    for j in range(2):
        for k in range(5):

            if d[i,j,k]>d_min and d[i,j,k]<d_mean:
                f[i,j,k]=25
            elif d[i,j,k]>d_mean and d[i,j,k]<d_max:
                f[i,j,k]=75
            elif d[i,j,k]==d_mean:
                f[i,j,k]=50
            elif d[i,j,k]==d_min:
                f[i,j,k]=0
            elif d[i,j,k]==d_max:
                f[i,j,k]=100

print(f)

[[[100.  75.  75.  25.  25.]
  [ 25.  75.  25.  75.  75.]]

 [[ 25.  75.  75.  25.  75.]
  [ 25.  25.  75.  25.  25.]]

 [[ 75.  25.  75.  25.  75.]
  [ 25.   0.  25.  75.  75.]]]
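
The nested loops above work well for a small array. As an alternative, the same labelling can be sketched without explicit loops using `np.select`, which picks, for each element, the choice of the first condition that is true. The array `d` below is a stand-in generated with an arbitrary seed, and `f_vec` is an illustrative name.

```python
import numpy as np

# Stand-in for d: values between 1 and 2 (the seed is arbitrary)
rng = np.random.default_rng(0)
d = rng.random((3, 2, 5)) + 1

d_min, d_max, d_mean = d.min(), d.max(), d.mean()

# Conditions are checked in order, so the exact matches come first
conditions = [
    d == d_min,
    d == d_max,
    d == d_mean,
    d < d_mean,   # everything else below the mean
    d > d_mean,   # everything else above the mean
]
choices = [0, 100, 50, 25, 75]

# default=-1 would only appear if no condition matched
f_vec = np.select(conditions, choices, default=-1).astype(float)
print(f_vec)
```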


### 17. Print d and f. Do you have your expected f?
For instance, if your *d* is:
```python
[[[1.85836099, 1.67064465, 1.62576044, 1.40243961, 1.88454931],
  [1.75354326, 1.69403643, 1.36729252, 1.61415071, 1.12104981]],

[[1.72201435, 1.1862918 , 1.87078449, 1.7726778 , 1.88180042],
  [1.44747908, 1.31673383, 1.02000951, 1.52218947, 1.97066381]],

[[1.79129243, 1.74983003, 1.96028037, 1.85166831, 1.65450881],
 [1.18068344, 1.9587381 , 1.00656599, 1.93402165, 1.73514584]]]
```
Your *f* should be:
```python
[[[ 75.  75.  75.  25.  75.]
  [ 75.  75.  25.  25.  25.]]

 [[ 75.  25.  75.  75.  75.]
  [ 25.  25.  25.  25. 100.]]

 [[ 75.  75.  75.  75.  75.]
  [ 25.  75.   0.  75.  75.]]]
```

In [63]:
# your code
print(d)

[[[1.94384431 1.6761689  1.48945308 1.19861991 1.31511073]
  [1.11355712 1.91717167 1.13030967 1.53276879 1.81000779]]

 [[1.0379854  1.74994714 1.75892984 1.11118469 1.47539152]
  [1.05433635 1.42071734 1.93173972 1.0327763  1.03721198]]

 [[1.53153679 1.36757282 1.90079861 1.17411678 1.65031526]
  [1.23311142 1.01846718 1.21790026 1.87346036 1.78288957]]]


In [64]:
print(f)

[[[100.  75.  75.  25.  25.]
  [ 25.  75.  25.  75.  75.]]

 [[ 25.  75.  75.  25.  75.]
  [ 25.  25.  75.  25.  25.]]

 [[ 75.  25.  75.  25.  75.]
  [ 25.   0.  25.  75.  75.]]]


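### 18. Bonus question: instead of using numbers (i.e. 0, 25, 50, 75, and 100), how would you label the values with strings (i.e. A, B, C, D, and E)? For the example *d* shown in question 17, the expected result is: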
```python
[[['D' 'D' 'D' 'B' 'D']
  ['D' 'D' 'B' 'B' 'B']]

 [['D' 'B' 'D' 'D' 'D']
  ['B' 'B' 'B' 'B' 'E']]

 [['D' 'D' 'D' 'D' 'D']
  ['B' 'D' 'A' 'D' 'D']]]
```
**Note**: you don't have to use Numpy in this question.

In [65]:
# convert f to a string array so that it can hold the letter labels
f = f.astype(str)
for i in range(3):
    for j in range(2):
        for k in range(5):

            if d[i,j,k]>d_min and d[i,j,k]<d_mean:
                f[i,j,k]="B"
            elif d[i,j,k]>d_mean and d[i,j,k]<d_max:
                f[i,j,k]="D"
            elif d[i,j,k]==d_mean:
                f[i,j,k]="C"
            elif d[i,j,k]==d_min:
                f[i,j,k]="A"
            elif d[i,j,k]==d_max:
                f[i,j,k]="E"

print(f) 

[[['E' 'D' 'D' 'B' 'B']
  ['B' 'D' 'B' 'D' 'D']]

 [['B' 'D' 'D' 'B' 'D']
  ['B' 'B' 'D' 'B' 'B']]

 [['D' 'B' 'D' 'B' 'D']
  ['B' 'A' 'B' 'D' 'D']]]
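
The same labels can also be sketched with boolean masks instead of loops. This version starts from an array filled with "C" and overwrites the other cases; `d` is a stand-in generated with an arbitrary seed, and `labels` is an illustrative name.

```python
import numpy as np

# Stand-in for d: values between 1 and 2 (the seed is arbitrary)
rng = np.random.default_rng(0)
d = rng.random((3, 2, 5)) + 1

d_min, d_max, d_mean = d.min(), d.max(), d.mean()

# Start with "C" everywhere (the label for values equal to the mean)
labels = np.full(d.shape, "C", dtype="<U1")

# Overwrite by mask; the exact matches go last so they take precedence
labels[d < d_mean] = "B"
labels[d > d_mean] = "D"
labels[d == d_min] = "A"
labels[d == d_max] = "E"

print(labels)
```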
