![Ironhack logo](https://i.imgur.com/1QgrNNw.png)

# Lab | Numpy

## Introduction

An important skill for a data scientist or data engineer is knowing where and how to find the information needed to get the work done. In this exercise, you will practice the NumPy features discussed in the lesson and learn new ones by consulting documentation and references. You will work on your own, but remember that the teaching staff is available whenever you run into problems.

## Getting Started
The notebook contains comments that tell you what to do, step by step. Follow the instructions in order from top to bottom. Read each instruction carefully and provide your answer beneath it. Test your answers to make sure they are correct: later responses may depend on earlier ones, so an incorrect response can keep you from proceeding.


## Resources

Some of the questions in this assignment are not covered in the lesson, so you will learn to look up information efficiently on your own. The resources below are a good place to start.

[Numpy User Guide](https://docs.scipy.org/doc/numpy/user/index.html)

[Numpy Reference](https://docs.scipy.org/doc/numpy/reference/)

[Google Search](https://www.google.com/search?q=how+to+use+numpy)



# Introduction to NumPy


#### 1. Import NumPy under the name np.

In [1]:
!pip install numpy --upgrade --user

Requirement already up-to-date: numpy in c:\users\shcle\appdata\roaming\python\python37\site-packages (1.18.5)


In [2]:
import numpy as np

#### 2. Print your NumPy version.

In [3]:
np.__version__

'1.18.5'

#### 3. Generate a 3x2x5 3-dimensional array with random values. Assign the array to variable *a*.
**Challenge**: there are at least three easy ways that use numpy to generate random arrays. How many ways can you find?

**Example of output**:
````python
[[[0.29932768, 0.85812686, 0.75266145, 0.09278988, 0.78358352],
  [0.13437453, 0.65695946, 0.82047594, 0.09764179, 0.52230096]],
 
 [[0.54248247, 0.06431281, 0.65902257, 0.92736679, 0.3302839 ],
  [0.86867236, 0.33960592, 0.62295821, 0.74563567, 0.24351584]],
 
 [[0.21276812, 0.06917533, 0.35106591, 0.82273425, 0.7910178 ],
  [0.37768961, 0.56107736, 0.99965953, 0.97615549, 0.2445537 ]]]
````

In [4]:
# Method 1
a = np.random.random(size=(3,2,5))
a

array([[[0.63131549, 0.86466053, 0.54718164, 0.55899052, 0.40201374],
        [0.99818757, 0.91526968, 0.2499162 , 0.95107085, 0.46316482]],

       [[0.80407886, 0.9867449 , 0.95910554, 0.56338864, 0.54489734],
        [0.81995348, 0.68079327, 0.47926141, 0.08197315, 0.6494385 ]],

       [[0.91863423, 0.87443887, 0.17841802, 0.21345031, 0.45761625],
        [0.81531385, 0.44121037, 0.75371992, 0.36206328, 0.99315307]]])

In [5]:
# Method 2
a = np.random.randint(1,20,size=(3,2,5))
a

array([[[16,  3,  6, 10, 11],
        [ 2,  5, 10,  2, 16]],

       [[ 8,  8,  9,  2, 10],
        [16,  8,  9, 18, 13]],

       [[19,  9, 15,  1, 12],
        [16,  8,  1, 10, 14]]])

In [6]:
# Method 3
a = np.random.random_sample(size=(3,2,5))
a

array([[[0.28830171, 0.76638545, 0.78366185, 0.16105664, 0.34914714],
        [0.17005055, 0.35084144, 0.15840016, 0.52523907, 0.79636335]],

       [[0.24131766, 0.22132747, 0.47719533, 0.00672019, 0.96268522],
        [0.09704316, 0.81939395, 0.97298392, 0.51985184, 0.56774563]],

       [[0.82698018, 0.55989667, 0.14690625, 0.40967054, 0.21010993],
        [0.34166183, 0.61165489, 0.37845614, 0.59543237, 0.3961828 ]]])
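
As a further option for the challenge, newer NumPy releases also offer a `Generator` interface. The following is a minimal sketch, assuming NumPy 1.17 or later; the names `rng`, `a_alt`, and `a_int` and the seed value are illustrative and not part of the lab.

```python
import numpy as np

# Assumption: NumPy >= 1.17, where np.random.default_rng is available.
rng = np.random.default_rng(seed=42)  # the seed is arbitrary, used only for repeatability

# Uniform floats in the half-open interval [0, 1)
a_alt = rng.random((3, 2, 5))

# Integers in [1, 20), comparable to np.random.randint(1, 20, ...)
a_int = rng.integers(1, 20, size=(3, 2, 5))

print(a_alt.shape, a_int.shape)
```

Passing a seed makes the values reproducible between runs, which can help when comparing answers with classmates.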

#### 4. Print *a*.


In [7]:
print(a)

[[[0.28830171 0.76638545 0.78366185 0.16105664 0.34914714]
  [0.17005055 0.35084144 0.15840016 0.52523907 0.79636335]]

 [[0.24131766 0.22132747 0.47719533 0.00672019 0.96268522]
  [0.09704316 0.81939395 0.97298392 0.51985184 0.56774563]]

 [[0.82698018 0.55989667 0.14690625 0.40967054 0.21010993]
  [0.34166183 0.61165489 0.37845614 0.59543237 0.3961828 ]]]


#### 5. Create a 5x2x3 3-dimensional array with all values equaling 1. Assign the array to variable *b*.

Expected output:

````python
      [[[1, 1, 1],
        [1, 1, 1]],

       [[1, 1, 1],
        [1, 1, 1]],

       [[1, 1, 1],
        [1, 1, 1]],

       [[1, 1, 1],
        [1, 1, 1]],

       [[1, 1, 1],
        [1, 1, 1]]]
````

In [8]:
b = np.ones(shape=(5,2,3))

#### 6. Print *b*.


In [9]:
print(b)

[[[1. 1. 1.]
  [1. 1. 1.]]

 [[1. 1. 1.]
  [1. 1. 1.]]

 [[1. 1. 1.]
  [1. 1. 1.]]

 [[1. 1. 1.]
  [1. 1. 1.]]

 [[1. 1. 1.]
  [1. 1. 1.]]]


#### 7. Do *a* and *b* have the same size? How do you prove that in Python code?

In [10]:
print(f'Size of array a: {a.shape}')
print(f'Size of array b: {b.shape}')

Size of array a: (3, 2, 5)
Size of array b: (5, 2, 3)


In [11]:
a.shape == b.shape

False
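
Note that "size" can be read in two ways. The sketch below uses stand-in arrays (`a_demo` and `b_demo` are hypothetical names with the same shapes as *a* and *b*) to show that the element count and the shape are separate properties.

```python
import numpy as np

a_demo = np.zeros((3, 2, 5))  # stand-in with the same shape as a
b_demo = np.ones((5, 2, 3))   # stand-in with the same shape as b

# .size is the total number of elements: 3*2*5 == 5*2*3 == 30
same_count = a_demo.size == b_demo.size

# .shape is the length of each axis, compared as tuples
same_shape = a_demo.shape == b_demo.shape

print(same_count, same_shape)  # True False
```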

#### 8. Are you able to add *a* and *b*? Why or why not?


In [12]:
# NumPy adds arrays element-wise, so their shapes must match (or be broadcast-compatible).
# As shown above, a has shape 3x2x5 and b has shape 5x2x3,
# so a and b cannot be added.


def add_array(x, y):
    if x.shape != y.shape:
        print("ERROR!: Two matrices may be added or subtracted only if they have the same dimension")
    else:
        print("Two arrays show same shapes, and then, the result of addtion of two arrays below:")
        print("\n")
        print (x + y)
              
add_array(a, b)

ERROR!: Two matrices may be added or subtracted only if they have the same dimension
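
An alternative to checking shapes by hand is to attempt the addition and let NumPy report the problem. This is a sketch with stand-in arrays (`a_demo`, `b_demo`, and `row` are hypothetical names); it also shows that shapes do not need to be identical as long as they are broadcast-compatible.

```python
import numpy as np

a_demo = np.zeros((3, 2, 5))  # stand-in with the same shape as a
b_demo = np.ones((5, 2, 3))   # stand-in with the same shape as b

try:
    a_demo + b_demo
    added = True
except ValueError as err:
    # NumPy raises ValueError when the shapes cannot be broadcast together
    added = False
    print(f"Cannot add: {err}")

# A shape of (5,) is broadcast along the last axis of (3, 2, 5), so this works
row = np.ones(5)
broadcast_sum = a_demo + row
print(broadcast_sum.shape)  # (3, 2, 5)
```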


#### 9. Reshape *b* so that it has the same structure as *a* (i.e. becomes a 3x2x5 array). Assign the reshaped array to variable *c*.

Expected output:

````python
      [[[1, 1, 1, 1, 1],
        [1, 1, 1, 1, 1]],

       [[1, 1, 1, 1, 1],
        [1, 1, 1, 1, 1]],

       [[1, 1, 1, 1, 1],
        [1, 1, 1, 1, 1]]]
````

In [13]:
c = b.reshape((3,2,5))
c

array([[[1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.]],

       [[1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.]],

       [[1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.]]])
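
Reshaping works here because both shapes hold 30 elements. The sketch below, using a stand-in array (`b_demo` and `c_demo` are hypothetical names), shows that one axis can be left as `-1` for NumPy to infer, and that reshaping a contiguous array returns a view rather than a copy.

```python
import numpy as np

b_demo = np.ones((5, 2, 3))  # stand-in with the same shape as b

# -1 asks NumPy to infer the axis length from the element count: 30 / (3 * 2) = 5
c_demo = b_demo.reshape(3, 2, -1)
print(c_demo.shape)  # (3, 2, 5)

# For a freshly created, contiguous array the result shares memory with the original
print(np.shares_memory(b_demo, c_demo))  # True
```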

#### 10. Try to add *a* and *c*. Now it should work. Assign the sum to variable *d*. But why does it work now?

In [14]:
d = a + c
d

array([[[1.28830171, 1.76638545, 1.78366185, 1.16105664, 1.34914714],
        [1.17005055, 1.35084144, 1.15840016, 1.52523907, 1.79636335]],

       [[1.24131766, 1.22132747, 1.47719533, 1.00672019, 1.96268522],
        [1.09704316, 1.81939395, 1.97298392, 1.51985184, 1.56774563]],

       [[1.82698018, 1.55989667, 1.14690625, 1.40967054, 1.21010993],
        [1.34166183, 1.61165489, 1.37845614, 1.59543237, 1.3961828 ]]])

In [15]:
a.shape == c.shape

True

In [16]:
add_array(a,c)

Two arrays show same shapes, and then, the result of addtion of two arrays below:


[[[1.28830171 1.76638545 1.78366185 1.16105664 1.34914714]
  [1.17005055 1.35084144 1.15840016 1.52523907 1.79636335]]

 [[1.24131766 1.22132747 1.47719533 1.00672019 1.96268522]
  [1.09704316 1.81939395 1.97298392 1.51985184 1.56774563]]

 [[1.82698018 1.55989667 1.14690625 1.40967054 1.21010993]
  [1.34166183 1.61165489 1.37845614 1.59543237 1.3961828 ]]]


#### 11. Print *a* and *d*. Do you notice the difference and the relationship between the two arrays in terms of their values? Explain.

In [17]:
print(a)
print(d)

d - a  # subtracting a from d gives back c (the array of ones)

[[[0.28830171 0.76638545 0.78366185 0.16105664 0.34914714]
  [0.17005055 0.35084144 0.15840016 0.52523907 0.79636335]]

 [[0.24131766 0.22132747 0.47719533 0.00672019 0.96268522]
  [0.09704316 0.81939395 0.97298392 0.51985184 0.56774563]]

 [[0.82698018 0.55989667 0.14690625 0.40967054 0.21010993]
  [0.34166183 0.61165489 0.37845614 0.59543237 0.3961828 ]]]
[[[1.28830171 1.76638545 1.78366185 1.16105664 1.34914714]
  [1.17005055 1.35084144 1.15840016 1.52523907 1.79636335]]

 [[1.24131766 1.22132747 1.47719533 1.00672019 1.96268522]
  [1.09704316 1.81939395 1.97298392 1.51985184 1.56774563]]

 [[1.82698018 1.55989667 1.14690625 1.40967054 1.21010993]
  [1.34166183 1.61165489 1.37845614 1.59543237 1.3961828 ]]]


array([[[1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.]],

       [[1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.]],

       [[1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.]]])

#### 12. Multiply *a* and *c*. Assign the result to *e*.

In [18]:
e = a * c
e

array([[[0.28830171, 0.76638545, 0.78366185, 0.16105664, 0.34914714],
        [0.17005055, 0.35084144, 0.15840016, 0.52523907, 0.79636335]],

       [[0.24131766, 0.22132747, 0.47719533, 0.00672019, 0.96268522],
        [0.09704316, 0.81939395, 0.97298392, 0.51985184, 0.56774563]],

       [[0.82698018, 0.55989667, 0.14690625, 0.40967054, 0.21010993],
        [0.34166183, 0.61165489, 0.37845614, 0.59543237, 0.3961828 ]]])

#### 13. Is *e* equal to *a*? Why or why not?


In [19]:
e == a  # c is an array of ones (np.ones), so element-wise multiplication leaves every value of a unchanged

array([[[ True,  True,  True,  True,  True],
        [ True,  True,  True,  True,  True]],

       [[ True,  True,  True,  True,  True],
        [ True,  True,  True,  True,  True]],

       [[ True,  True,  True,  True,  True],
        [ True,  True,  True,  True,  True]]])
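
The element-wise comparison above returns one boolean per element. To get a single yes/no answer, one option is `np.array_equal`. The sketch below uses stand-in arrays (`a_demo`, `c_demo`, and `e_demo` are hypothetical names).

```python
import numpy as np

a_demo = np.random.default_rng(0).random((3, 2, 5))  # stand-in for a
c_demo = np.ones((3, 2, 5))                          # stand-in for c
e_demo = a_demo * c_demo                             # element-wise product

# True only if the shapes match and every element is equal
print(np.array_equal(e_demo, a_demo))  # True

# Equivalent check built from the element-wise comparison
print((e_demo == a_demo).all())  # True
```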

#### 14. Identify the max, min, and mean values in *d*. Assign those values to variables *d_max*, *d_min* and *d_mean*.

In [20]:
def calc_array(x):
    '''
    Print the shape, max, min, and mean of an array.
    '''
    print(f'size of array: {x.shape}')
    print(f'Max value in array:  {x.max():0.2f}')
    print(f'Min value in array:  {x.min():0.2f}')
    print(f'Mean value in array: {x.mean():0.2f}')
    
calc_array(d)

# Store the values in the variables the exercise asks for
d_max, d_min, d_mean = d.max(), d.min(), d.mean()

size of array: (3, 2, 5)
Max value in array:  1.97
Min value in array:  1.01
Mean value in array: 1.46


#### 15. Now we want to label the values in *d*. First create an empty array *f* with the same shape (i.e. 3x2x5) as *d* using `np.empty`.


In [21]:
f = np.empty(shape=(3,2,5))
f

array([[[0.28830171, 0.76638545, 0.78366185, 0.16105664, 0.34914714],
        [0.17005055, 0.35084144, 0.15840016, 0.52523907, 0.79636335]],

       [[0.24131766, 0.22132747, 0.47719533, 0.00672019, 0.96268522],
        [0.09704316, 0.81939395, 0.97298392, 0.51985184, 0.56774563]],

       [[0.82698018, 0.55989667, 0.14690625, 0.40967054, 0.21010993],
        [0.34166183, 0.61165489, 0.37845614, 0.59543237, 0.3961828 ]]])
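
`np.empty` only allocates memory and does not initialize it, so the values shown are whatever happened to be in that memory and should not be relied on. As a sketch of a related helper, `np.empty_like` copies the shape and dtype from an existing array (`d_demo` and `f_demo` are hypothetical stand-ins).

```python
import numpy as np

d_demo = np.ones((3, 2, 5))  # stand-in with the same shape as d

# Same shape and dtype as d_demo; contents are uninitialized until assigned
f_demo = np.empty_like(d_demo)
print(f_demo.shape, f_demo.dtype)

# Every element should be assigned before it is read
f_demo[...] = 0
```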

#### 16. Populate the values in *f*. 

For each value in *d*, 

* if it's larger than *d_min* but smaller than *d_mean*, assign 25 to the corresponding value in *f*. 
* If a value in *d* is larger than *d_mean* but smaller than *d_max*, assign 75 to the corresponding value in *f*. 
* If a value equals *d_mean*, assign 50 to the corresponding value in *f*. 
* Assign 0 to the corresponding value(s) in *f* for *d_min* in *d*. 
* Assign 100 to the corresponding value(s) in *f* for *d_max* in *d*. 
* In the end, f should have only the following values: 0, 25, 50, 75, and 100.

**Note**: you don't have to use Numpy in this question.

In [22]:
print(d.shape)
d

(3, 2, 5)


array([[[1.28830171, 1.76638545, 1.78366185, 1.16105664, 1.34914714],
        [1.17005055, 1.35084144, 1.15840016, 1.52523907, 1.79636335]],

       [[1.24131766, 1.22132747, 1.47719533, 1.00672019, 1.96268522],
        [1.09704316, 1.81939395, 1.97298392, 1.51985184, 1.56774563]],

       [[1.82698018, 1.55989667, 1.14690625, 1.40967054, 1.21010993],
        [1.34166183, 1.61165489, 1.37845614, 1.59543237, 1.3961828 ]]])

In [23]:
print(f'Mean: {d.mean()}')
print(f'Max:  {d.max()}')
print(f'Min:  {d.min()}', '\n')

for i in range(d.shape[0]):
    for j in range(d.shape[1]):
        for k in range(d.shape[2]):

            if (d[i][j][k] > d.min() and d[i][j][k] < d.mean()):
                f[i][j][k] = 25

            elif(d[i][j][k] > d.mean() and d[i][j][k] < d.max()):
                f[i][j][k] = 75

            elif(d[i][j][k] == d.mean()):
                f[i][j][k] = 50

            elif(d[i][j][k] == d.min()):
                f[i][j][k] = 0
                
            elif(d[i][j][k] == d.max()):
                f[i][j][k] = 100
print(f)

Mean: 1.4570887786582432
Max:  1.9729839227999184
Min:  1.006720188280794 

[[[ 25.  75.  75.  25.  25.]
  [ 25.  25.  25.  75.  75.]]

 [[ 25.  25.  75.   0.  75.]
  [ 25.  75. 100.  75.  75.]]

 [[ 75.  75.  25.  25.  25.]
  [ 25.  75.  25.  75.  25.]]]
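
The nested loops can also be replaced by a vectorized labeling step. The following is a sketch using `np.select` on a stand-in array (`d_demo`, `f_demo`, `conditions`, and `labels` are hypothetical names); it is one possible approach, not the expected solution.

```python
import numpy as np

# Stand-in for d: random values in [1, 2)
d_demo = 1 + np.random.default_rng(1).random((3, 2, 5))
d_min, d_max, d_mean = d_demo.min(), d_demo.max(), d_demo.mean()

# np.select uses the first condition that is True for each element,
# so the exact matches (min, max, mean) are listed before the ranges
conditions = [
    d_demo == d_min,
    d_demo == d_max,
    d_demo == d_mean,
    d_demo < d_mean,
    d_demo > d_mean,
]
labels = [0, 100, 50, 25, 75]

f_demo = np.select(conditions, labels)
print(f_demo)
```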


#### 17. Print *d* and *f*. Do you have your expected *f*?
For instance, if your *d* is:
```python
[[[1.85836099, 1.67064465, 1.62576044, 1.40243961, 1.88454931],
  [1.75354326, 1.69403643, 1.36729252, 1.61415071, 1.12104981]],

[[1.72201435, 1.1862918 , 1.87078449, 1.7726778 , 1.88180042],
  [1.44747908, 1.31673383, 1.02000951, 1.52218947, 1.97066381]],

[[1.79129243, 1.74983003, 1.96028037, 1.85166831, 1.65450881],
 [1.18068344, 1.9587381 , 1.00656599, 1.93402165, 1.73514584]]]
```
Your *f* should be:
```python
[[[ 75.  75.  75.  25.  75.]
  [ 75.  75.  25.  25.  25.]]

 [[ 75.  25.  75.  75.  75.]
  [ 25.  25.  25.  25. 100.]]

 [[ 75.  75.  75.  75.  75.]
  [ 25.  75.   0.  75.  75.]]]
```

#### 18. Bonus question: instead of using numbers (i.e. 0, 25, 50, 75, and 100), use string values  ("A", "B", "C", "D", and "E") to label the array elements. For the example above, the expected result is:

```python
[[['D' 'D' 'D' 'B' 'D']
  ['D' 'D' 'B' 'B' 'B']]

 [['D' 'B' 'D' 'D' 'D']
  ['B' 'B' 'B' 'B' 'E']]

 [['D' 'D' 'D' 'D' 'D']
  ['B' 'D' 'A' 'D' 'D']]]
```
**Note**: you don't have to use Numpy in this question.

In [24]:
g = np.empty((3, 2, 5))

# Change data type of given numpy array: .astype()
g = g.astype('str')

for i in range(d.shape[0]):
    for j in range(d.shape[1]):
        for k in range(d.shape[2]):

            if (d[i][j][k] > d.min() and d[i][j][k] < d.mean()):
                g[i][j][k] = 'B'

            elif(d[i][j][k] > d.mean() and d[i][j][k] < d.max()):
                g[i][j][k] = 'D'

            elif(d[i][j][k] == d.mean()):
                g[i][j][k] = 'C'

            elif(d[i][j][k] == d.min()):
                g[i][j][k] = 'A'

            elif(d[i][j][k] == d.max()):
                g[i][j][k] = 'E'
print(f)
print('\n')
print(g)

[[[ 25.  75.  75.  25.  25.]
  [ 25.  25.  25.  75.  75.]]

 [[ 25.  25.  75.   0.  75.]
  [ 25.  75. 100.  75.  75.]]

 [[ 75.  75.  25.  25.  25.]
  [ 25.  75.  25.  75.  25.]]]


[[['B' 'D' 'D' 'B' 'B']
  ['B' 'B' 'B' 'D' 'D']]

 [['B' 'B' 'D' 'A' 'D']
  ['B' 'D' 'E' 'D' 'D']]

 [['D' 'D' 'B' 'B' 'B']
  ['B' 'D' 'B' 'D' 'B']]]
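
A string array can also be filled with boolean masks instead of loops. This is a sketch on a stand-in array (`d_demo` and `g_demo` are hypothetical names); it starts with every element labeled "C" and then overwrites the other cases.

```python
import numpy as np

# Stand-in for d: random values in [1, 2)
d_demo = 1 + np.random.default_rng(1).random((3, 2, 5))
d_min, d_max, d_mean = d_demo.min(), d_demo.max(), d_demo.mean()

# One-character Unicode strings; "C" remains only where a value equals the mean
g_demo = np.full(d_demo.shape, "C", dtype="<U1")

g_demo[d_demo < d_mean] = "B"
g_demo[d_demo > d_mean] = "D"

# The extremes are assigned last so they override "B" and "D"
g_demo[d_demo == d_min] = "A"
g_demo[d_demo == d_max] = "E"

print(g_demo)
```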


## Additional Challenges for the Nerds

If you are well ahead of your classmates and willing to accept some tough challenges, take on one or more of the following Codewars *katas*. You should already have a good grasp of Python and statistics, because you will need to write Python functions, loops, and conditionals, and work with matrices.

* [Insert dashes](https://www.codewars.com/kata/insert-dashes)
* [Thinkful - Logic Drills: Red and bumpy](https://www.codewars.com/kata/thinkful-logic-drills-red-and-bumpy)

> <b>Insert dashes</b>
>
>Write a function insertDash(num)/InsertDash(int num) that will insert dashes ('-') between each two odd numbers in num. 
>For example: if num is 454793 the output should be 4547-9-3. Don't count zero as an odd number.
>Note that the number will always be non-negative (>= 0).

In [25]:
def insert_dash(num: int):
    '''
    Insert a dash ('-') between each pair of adjacent odd digits in num.
    '''
    # The number must be non-negative
    if num < 0:
        return print('The number should be a non-negative integer')

    # Convert the number to a list of digit characters
    str_num = list(str(num))

    # Append a dash to a digit when both it and the next digit are odd
    for i in range(len(str_num) - 1):
        if int(str_num[i]) % 2 != 0 and int(str_num[i + 1]) % 2 != 0:
            str_num[i] = str_num[i] + '-'

    return "".join(str_num)


print(insert_dash (454793))
print(insert_dash (64687679171))


4547-9-3
6468767-9-1-7-1


In [27]:
# Note:
num=454793
print(list(str(num)))
print(str(num)[0])


['4', '5', '4', '7', '9', '3']
4


><b>Thinkful - Logic Drills: Red and bumpy</b>
>
>You're playing a game with a friend involving a bag of marbles. In the bag are ten marbles:
> 
> 1 smooth red marble \
> 4 bumpy red marbles \
> 2 bumpy yellow marbles \
> 1 smooth yellow marble \
> 1 bumpy green marble \
> 1 smooth green marble 
>
> You can see that 
>the probability of picking a smooth red marble from the bag is 1 / 10 or 0.10 and \
>the probability of picking a bumpy yellow marble is 2 / 10 or 0.20.

> The game works like this: 
> your friend puts her hand in the bag, chooses a marble (without looking at it) and 
> tells you whether it's bumpy or smooth. 
> Then you have to guess which color it is before she pulls it out and reveals whether you're correct or not.
>
> You know that the information about 
> whether the marble is bumpy or smooth changes the probability of what color it is, and 
> you want some help with your guesses.
>
> Write a function color_probability() that takes two arguments: 
> a color ('red', 'yellow', or 'green') and a texture ('bumpy' or 'smooth') 
> and 
> returns the probability as a decimal fraction accurate to two places.

> The probability should be a string and should discard any digits after the 100ths place. 
> For example, 2 / 3 or 0.6666666666666666 would become the string '0.66'. 
> Note this is different from rounding.

> As a complete example, color_probability('red', 'bumpy') should return the string '0.57'.


In [28]:
def color_probability(color:str, touch:str):
    '''
    Return the probability of the chosen color, given the marble's texture
    (smooth or bumpy), as a string truncated to two decimal places.
    '''
    # probability of colors for smooth marble
    if touch == 'smooth':
        if color =='red':
            prob = str(1/3)
            return prob[:4]
        
        elif color == 'yellow':
            prob = str(1/3)
            return prob[:4]
        
        elif color == 'green':
            prob = str(1/3)
            return prob[:4]
        
    # probability of colors for bumpy marble
    elif touch == 'bumpy':
        if color =='red':
            prob = str(4/7)
            return prob[:4]
        
        elif color == 'yellow':
            prob = str(2/7)
            return prob[:4]
        
        elif color == 'green':
            prob = str(1/7)
            return prob[:4]
        
print(color_probability('red', 'bumpy'))   
print(color_probability('red', 'smooth')) 

0.57
0.33
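
The hard-coded fractions can also be derived from the marble counts given in the kata. The following is a sketch under that assumption; `MARBLES` and `color_probability_counts` are hypothetical names, and integer arithmetic is used so that digits are truncated rather than rounded.

```python
# Marble counts from the kata description, keyed by (color, texture)
MARBLES = {
    ("red", "smooth"): 1,
    ("red", "bumpy"): 4,
    ("yellow", "bumpy"): 2,
    ("yellow", "smooth"): 1,
    ("green", "bumpy"): 1,
    ("green", "smooth"): 1,
}


def color_probability_counts(color: str, texture: str) -> str:
    """Return P(color | texture) as a string truncated to two decimals."""
    # Number of marbles that share the reported texture
    total = sum(n for (_, t), n in MARBLES.items() if t == texture)
    # Floor division keeps whole hundredths only (truncation, not rounding)
    hundredths = MARBLES[(color, texture)] * 100 // total
    return f"{hundredths // 100}.{hundredths % 100:02d}"


print(color_probability_counts("red", "bumpy"))   # 0.57
print(color_probability_counts("red", "smooth"))  # 0.33
```

Keeping the counts in one table means that changing the contents of the bag requires editing only the data, not the branching logic.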
