### Concatenate

Concatenation joins multiple arrays along an existing axis, in either the vertical or the horizontal direction.

By default, concatenation happens in the vertical direction, i.e. `axis=0`.
- A 1-D array has only one axis, `0`, which is both its first and its last axis. So `axis=1` raises an error because that axis doesn't exist for a 1-D array.
<br>

- `c = np.concatenate((a, b))` is possible only if `a.ndim == b.ndim` and the shapes match on every axis except the one being joined. `c` then has the same number of dimensions as `a` and `b`.

In [8]:
## 1D array

a = np.arange(1,5)
b = np.arange(10,15)

print(a)
print(a.shape)

print(b)
print(b.shape)

[1 2 3 4]
(4,)
[10 11 12 13 14]
(5,)


In [9]:
np.concatenate([a,b])

## this looks like a horizontal concatenation, but it happens along axis=0: 1-D arrays have a single axis,
## and their elements are simply displayed in a row

array([ 1,  2,  3,  4, 10, 11, 12, 13, 14])

In [11]:
np.concatenate([a,b], axis=1)

## axis=1 doesn't exist for 1-D arrays, so this raises an error

AxisError: axis 1 is out of bounds for array of dimension 1

In [12]:
np.concatenate([a,b], axis=0)

array([ 1,  2,  3,  4, 10, 11, 12, 13, 14])

In [18]:
## 2-D array

d = np.arange(1,5).reshape((1,4))
e = np.arange(1,5).reshape((1,4))

print(d)
print(d.shape)
print(d.ndim)

print("-" * 20)

print(e)
print(e.shape)
print(e.ndim)

[[1 2 3 4]]
(1, 4)
2
--------------------
[[1 2 3 4]]
(1, 4)
2


In [19]:
np.concatenate((d,e))

## default concatenation happens vertically, i.e. axis=0

array([[1, 2, 3, 4],
       [1, 2, 3, 4]])

In [20]:
np.concatenate((d,e), axis=1)

array([[1, 2, 3, 4, 1, 2, 3, 4]])

In [21]:
np.concatenate((d,e), axis=0)

## concatenation in vertical direction

array([[1, 2, 3, 4],
       [1, 2, 3, 4]])

In [32]:
## example :

## default concatenation happens along the vertical direction

x = np.arange(1,13).reshape(3,4)
y = np.arange(1,13).reshape(3,4)

np.concatenate((x,y))

array([[ 1,  2,  3,  4],
       [ 5,  6,  7,  8],
       [ 9, 10, 11, 12],
       [ 1,  2,  3,  4],
       [ 5,  6,  7,  8],
       [ 9, 10, 11, 12]])

In [33]:
np.concatenate((x,y), axis=1)

## horizontal direction

array([[ 1,  2,  3,  4,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  5,  6,  7,  8],
       [ 9, 10, 11, 12,  9, 10, 11, 12]])

In [37]:
## example:

d = np.arange(1,13).reshape(3,4)
e = np.arange(-1,-5,-1).reshape(1,4)

print(d)
print(d.shape)
print(d.ndim)

print("-" * 30)

print(e)
print(e.shape)
print(e.ndim)

[[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]]
(3, 4)
2
------------------------------
[[-1 -2 -3 -4]]
(1, 4)
2


In [38]:
np.concatenate((d,e))

array([[ 1,  2,  3,  4],
       [ 5,  6,  7,  8],
       [ 9, 10, 11, 12],
       [-1, -2, -3, -4]])

In [39]:
np.concatenate((d,e), axis=1)

## this will NOT work: `d` has 3 rows but `e` has only 1

ValueError: all the input array dimensions for the concatenation axis must match exactly, but along dimension 0, the array at index 0 has size 3 and the array at index 1 has size 1

<br>

For arrays with 2 or more dimensions, if concatenation is to be done in the:<br>
- horizontal direction (`axis=1`) => `a` and `b` should have an equal number of **rows**.
<br>

- vertical direction (`axis=0`) => `a` and `b` should have an equal number of **columns**.

(Keep in mind that `a.ndim == b.ndim` is a prerequisite for concatenation.)
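
The shape requirements above can be checked before calling `np.concatenate`. The sketch below is only an illustration: `can_concatenate` is a hypothetical helper (not part of NumPy) that compares `ndim` and every dimension except the one being joined.

```python
import numpy as np

def can_concatenate(a, b, axis=0):
    """Hypothetical helper: True if np.concatenate((a, b), axis=axis) should succeed."""
    # both arrays need the same number of dimensions, and the axis must exist
    if a.ndim != b.ndim or not (-a.ndim <= axis < a.ndim):
        return False
    axis = axis % a.ndim  # normalise negative axes
    # every dimension except the concatenation axis must match
    return all(sa == sb
               for i, (sa, sb) in enumerate(zip(a.shape, b.shape))
               if i != axis)

d = np.arange(1, 13).reshape(3, 4)
e = np.arange(-1, -5, -1).reshape(1, 4)

print(can_concatenate(d, e, axis=0))          # True  (both have 4 columns)
print(can_concatenate(d, e, axis=1))          # False (3 rows vs 1 row)
print(np.concatenate((d, e), axis=0).shape)   # (4, 4)
```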

### Stacking

- `hstack`
- `vstack`

In [41]:
## 1-D array

## both arrays are 1-D

x = np.arange(1,5)
y = np.arange(1,5)

print(x)
print(x.shape)
print(x.ndim)

print("-" * 20)

print(y)
print(y.shape)
print(y.ndim)

[1 2 3 4]
(4,)
1
--------------------
[1 2 3 4]
(4,)
1


In [46]:
h_xy = np.hstack((x, y))

print("after h-stacking => ",h_xy)
print(h_xy.ndim)
print(h_xy.shape)

after h-stacking =>  [1 2 3 4 1 2 3 4]
1
(8,)


In [49]:
v_xy = np.vstack((x,y))

print("after v-stacking => ",v_xy)
print("-" * 10)
print(v_xy.ndim)
print(v_xy.shape)

## here, the 1-D inputs are promoted to 2-D, so the resultant array has 2 dimensions

after v-stacking =>  [[1 2 3 4]
 [1 2 3 4]]
----------
2
(2, 4)


In [52]:
## 2-D array

## both arrays are 2-D , each with a single row and multiple columns

a = np.arange(1,5).reshape((1,4))
b = np.arange(-1,-5,-1).reshape((1,4))

print(a)
print(a.shape)
print(a.ndim)

print("-" * 20)

print(b)
print(b.shape)
print(b.ndim)

[[1 2 3 4]]
(1, 4)
2
--------------------
[[-1 -2 -3 -4]]
(1, 4)
2


In [54]:
np.hstack((a, b))

array([[ 1,  2,  3,  4, -1, -2, -3, -4]])

In [55]:
np.vstack((a, b))

array([[ 1,  2,  3,  4],
       [-1, -2, -3, -4]])

In [76]:
## example :
## both arrays are 2-D, each with multiple rows as well as columns

x = np.arange(1,13).reshape((3,4))
y = np.arange(-1,-13,-1).reshape((3,4))

x, y

(array([[ 1,  2,  3,  4],
        [ 5,  6,  7,  8],
        [ 9, 10, 11, 12]]),
 array([[ -1,  -2,  -3,  -4],
        [ -5,  -6,  -7,  -8],
        [ -9, -10, -11, -12]]))

In [77]:
np.hstack((x, y))

array([[  1,   2,   3,   4,  -1,  -2,  -3,  -4],
       [  5,   6,   7,   8,  -5,  -6,  -7,  -8],
       [  9,  10,  11,  12,  -9, -10, -11, -12]])

In [78]:
np.vstack((x, y))

array([[  1,   2,   3,   4],
       [  5,   6,   7,   8],
       [  9,  10,  11,  12],
       [ -1,  -2,  -3,  -4],
       [ -5,  -6,  -7,  -8],
       [ -9, -10, -11, -12]])

In [79]:
## example :

## one array 2-D while another is 1-D array

d = np.arange(1,13).reshape((3,4))
print(d)
print(d.shape)
print(d.ndim)

print("-" * 20)

f = np.arange(-5, -9, -1)
print(f)
print(f.shape)
print(f.ndim)

[[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]]
(3, 4)
2
--------------------
[-5 -6 -7 -8]
(4,)
1


In [81]:
np.vstack((d, f))

array([[ 1,  2,  3,  4],
       [ 5,  6,  7,  8],
       [ 9, 10, 11, 12],
       [-5, -6, -7, -8]])

In [83]:
np.hstack((d, f))

## NOT POSSIBLE: `hstack` doesn't promote the 1-D array `f` to 2-D

ValueError: all the input arrays must have same number of dimensions, but the array at index 0 has 2 dimension(s) and the array at index 1 has 1 dimension(s)
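
A small sketch of why `vstack` accepts the 1-D array while `hstack` does not: `vstack` treats a 1-D input as a single row, which `np.atleast_2d` reproduces here. The array `g` is a hypothetical column added for illustration; it is not part of the example above.

```python
import numpy as np

d = np.arange(1, 13).reshape((3, 4))
f = np.arange(-5, -9, -1)              # shape (4,)

# vstack behaves as if the 1-D array were first turned into a (1, 4) row
f_row = np.atleast_2d(f)
same = np.array_equal(np.vstack((d, f)),
                      np.concatenate((d, f_row), axis=0))
print(f_row.shape, same)

# for hstack, the second array must already be 2-D with the same number of rows
g = np.array([-1, -2, -3]).reshape(3, 1)   # hypothetical (3, 1) column
print(np.hstack((d, g)).shape)             # (3, 5)
```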

<br>

__Note :__
`np.stack` should be used sparingly, because it joins the arrays along a new axis, so the result has one more dimension than the inputs.<br>
This can cause confusion down the line, so `hstack` and `vstack` are commonly preferred.
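
To make the difference concrete, here is a short comparison of the resulting shapes. The array `m` is introduced only for this illustration.

```python
import numpy as np

x = np.arange(1, 5)
y = np.arange(1, 5)

# np.stack always inserts a new axis
print(np.stack((x, y)).shape)           # (2, 4)
print(np.stack((x, y), axis=1).shape)   # (4, 2)

# hstack keeps 1-D inputs 1-D; vstack promotes them to rows
print(np.hstack((x, y)).shape)          # (8,)
print(np.vstack((x, y)).shape)          # (2, 4)

# with 2-D inputs, np.stack gives a 3-D result while vstack stays 2-D
m = np.arange(1, 13).reshape(3, 4)
print(np.stack((m, m)).shape)           # (2, 3, 4)
print(np.vstack((m, m)).shape)          # (6, 4)
```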

<br>

### Splitting

- `np.split(arr, ...)` => returns a list that contains the sub-arrays of `arr`

- Splitting happens from left to right.

<u>__1-D array__</u>

In [123]:
d = np.arange(30, 48)
print(d)

## here we split the array `d` in 3 equal parts

np.split(d, 3)

[30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47]


[array([30, 31, 32, 33, 34, 35]),
 array([36, 37, 38, 39, 40, 41]),
 array([42, 43, 44, 45, 46, 47])]

In [102]:
## throws an error if the array cannot be divided into equal parts
## 18 elements cannot be split into 4 equal parts

np.split(d, 4)

ValueError: array split does not result in an equal division

In [107]:
## specify a list of indices to split at one or more positions
## 0 to 4 inclusive
## 5 to 17 inclusive

np.split(d, [5])

[array([30, 31, 32, 33, 34]),
 array([35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47])]

In [108]:
## 0 to 4 inclusive
## then, 5 to 7 inclusive
## then, 8 to 9 inclusive
## then, 10 to 17 inclusive

np.split(d, [5, 8, 10])

[array([30, 31, 32, 33, 34]),
 array([35, 36, 37]),
 array([38, 39]),
 array([40, 41, 42, 43, 44, 45, 46, 47])]

In [113]:
## index 25 is beyond the end of the array, but no error arises: the split runs till the end,
## and the next sub-array is an empty array

np.split(d, [5, 25])

[array([30, 31, 32, 33, 34]),
 array([35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47]),
 array([], dtype=int32)]

In [114]:
np.split(d, [-5])

[array([30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42]),
 array([43, 44, 45, 46, 47])]

In [116]:
np.split(d, [-5, -8])

## split cannot happen from -5 to -8 index (i.e., from right to left) , so second sub-array is empty.
## third sub-array starts from -8 index.

[array([30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42]),
 array([], dtype=int32),
 array([40, 41, 42, 43, 44, 45, 46, 47])]

In [119]:
## similarly,

np.split(d, [2, 10, 6])

[array([30, 31]),
 array([32, 33, 34, 35, 36, 37, 38, 39]),
 array([], dtype=int32),
 array([36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47])]
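
The empty sub-arrays above make sense once index-based splitting is read as consecutive slicing. The sketch below mimics that behaviour for 1-D arrays; `split_by_slicing` is a hypothetical helper written for illustration, not NumPy's actual implementation.

```python
import numpy as np

def split_by_slicing(arr, indices):
    """Hypothetical 1-D equivalent of np.split(arr, indices) using plain slices."""
    bounds = [0] + list(indices) + [len(arr)]
    # each sub-array is arr[start:stop]; a start at or past its stop gives an empty slice
    return [arr[start:stop] for start, stop in zip(bounds[:-1], bounds[1:])]

d = np.arange(30, 48)

for indices in ([5], [5, 8, 10], [5, 25], [-5], [-5, -8], [2, 10, 6]):
    expected = np.split(d, indices)
    got = split_by_slicing(d, indices)
    print(indices, all(np.array_equal(e, g) for e, g in zip(expected, got)))
```

For example, `[-5, -8]` produces `d[0:-5]`, `d[-5:-8]` (empty) and `d[-8:18]`.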

<u>__2-D array__</u>

In [125]:
## example: single row but multiple columns

d = np.arange(1, 13).reshape((1, 12))
print(d)

[[ 1  2  3  4  5  6  7  8  9 10 11 12]]


In [126]:
np.split(d, 2, axis=1)

[array([[1, 2, 3, 4, 5, 6]]), array([[ 7,  8,  9, 10, 11, 12]])]

In [129]:
## as in the step above, `hsplit` does the same

np.hsplit(d, 2)

[array([[1, 2, 3, 4, 5, 6]]), array([[ 7,  8,  9, 10, 11, 12]])]

In [None]:
## example: multiple rows as well as columns

In [131]:
d = np.arange(1,21).reshape(4,5)
d

array([[ 1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10],
       [11, 12, 13, 14, 15],
       [16, 17, 18, 19, 20]])

In [135]:
np.split(d, 2)

## 2 equal parts
## default operation is vertical, i.e. axis=0.

## for 3 equal parts, it would throw an error because 4 rows cannot be split into 3 equal parts.

[array([[ 1,  2,  3,  4,  5],
        [ 6,  7,  8,  9, 10]]),
 array([[11, 12, 13, 14, 15],
        [16, 17, 18, 19, 20]])]

In [150]:
np.split(d, [1, 3, -1], axis=0)

[array([[1, 2, 3, 4, 5]]),
 array([[ 6,  7,  8,  9, 10],
        [11, 12, 13, 14, 15]]),
 array([], shape=(0, 5), dtype=int32),
 array([[16, 17, 18, 19, 20]])]

In [151]:
np.split(d, [1, 3, -1], axis=1)

[array([[ 1],
        [ 6],
        [11],
        [16]]),
 array([[ 2,  3],
        [ 7,  8],
        [12, 13],
        [17, 18]]),
 array([[ 4],
        [ 9],
        [14],
        [19]]),
 array([[ 5],
        [10],
        [15],
        [20]])]

In [152]:
np.split(d, [2,4], axis=1)

[array([[ 1,  2],
        [ 6,  7],
        [11, 12],
        [16, 17]]),
 array([[ 3,  4],
        [ 8,  9],
        [13, 14],
        [18, 19]]),
 array([[ 5],
        [10],
        [15],
        [20]])]

In [153]:
np.hsplit(d, [1,3])

[array([[ 1],
        [ 6],
        [11],
        [16]]),
 array([[ 2,  3],
        [ 7,  8],
        [12, 13],
        [17, 18]]),
 array([[ 4,  5],
        [ 9, 10],
        [14, 15],
        [19, 20]])]

In [154]:
np.vsplit(d, [1,3])

[array([[1, 2, 3, 4, 5]]),
 array([[ 6,  7,  8,  9, 10],
        [11, 12, 13, 14, 15]]),
 array([[16, 17, 18, 19, 20]])]
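
As the outputs above suggest, for 2-D arrays `hsplit` and `vsplit` behave like `split` with `axis=1` and `axis=0` respectively. A quick check, reusing the same `d`:

```python
import numpy as np

d = np.arange(1, 21).reshape(4, 5)

h_parts = np.hsplit(d, [1, 3])
v_parts = np.vsplit(d, [1, 3])

# compare piece by piece with np.split along the matching axis
h_same = all(np.array_equal(p, q)
             for p, q in zip(h_parts, np.split(d, [1, 3], axis=1)))
v_same = all(np.array_equal(p, q)
             for p, q in zip(v_parts, np.split(d, [1, 3], axis=0)))

print(h_same, v_same)
print([p.shape for p in h_parts])   # [(4, 1), (4, 2), (4, 2)]
print([p.shape for p in v_parts])   # [(1, 5), (2, 5), (1, 5)]
```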

## Broadcasting

Broadcasting is the name given to the method that NumPy uses to allow arithmetic between arrays of different shapes or sizes.<br>
It is an intrinsic mechanism that NumPy carries out automatically.<br>
Broadcasting solves the problem of arithmetic between arrays of differing shapes by, in effect, replicating the smaller array along every mismatched dimension whose size is 1.

https://machinelearningmastery.com/broadcasting-with-numpy-arrays/

In order to understand it, let's take a look at its rules.


### <u>Rules</u> :

### *Rule 1 :* If two arrays differ in the number of dimensions, the shape of the one with fewer dimensions is padded with `1`s on its leading (i.e., left) side.

### Prerequisites for Rule 2 :

#### Consider each array's shape (going from the right side),
#### 1. The values of the corresponding dimension sizes should be the same, OR
#### 2. The value of one of the dimension sizes should be 1

### *Rule 2 :* If the corresponding dimension sizes of two arrays do not match, the array whose dimension size equals 1 is stretched to match the corresponding dimension size of the other array.

### *Rule 3 :* If the dimension sizes still disagree and neither equals 1, then an error is raised.
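
The three rules can be written out as a small function that predicts the result shape. This is a minimal sketch for illustration; `broadcast_shape` is a hypothetical helper, and NumPy applies the same logic internally without any such call from the user.

```python
import numpy as np

def broadcast_shape(shape_a, shape_b):
    """Hypothetical helper: result shape of broadcasting two shapes, per Rules 1-3."""
    ndim = max(len(shape_a), len(shape_b))
    # Rule 1: pad the shorter shape with 1s on the left
    a = (1,) * (ndim - len(shape_a)) + tuple(shape_a)
    b = (1,) * (ndim - len(shape_b)) + tuple(shape_b)
    result = []
    for sa, sb in zip(a, b):
        if sa == sb or sb == 1:
            result.append(sa)      # sizes agree, or b is stretched (Rule 2)
        elif sa == 1:
            result.append(sb)      # a is stretched (Rule 2)
        else:
            # Rule 3: sizes disagree and neither is 1
            raise ValueError(f"cannot broadcast {shape_a} with {shape_b}")
    return tuple(result)

print(broadcast_shape((3, 2), (2,)))     # (3, 2)
print(broadcast_shape((1, 3), (3, 1)))   # (3, 3)
print(broadcast_shape((1, 4), (5, 1)))   # (5, 4)

# the prediction agrees with what NumPy actually produces
print((np.empty((3, 2)) + np.empty((2,))).shape)

try:
    broadcast_shape((3, 4), (2,))
except ValueError as err:
    print("error:", err)
```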

In [9]:
## example :
a = np.arange(1, 5)

a + 2

array([3, 4, 5, 6])

In [11]:
## example :

b = np.arange(1,7).reshape(3,2)
c = np.arange(1,3)

print(b)
print(b.ndim)
print(b.shape)

print("-" * 20)

print(c)
print(c.ndim)
print(c.shape)

[[1 2]
 [3 4]
 [5 6]]
2
(3, 2)
--------------------
[1 2]
1
(2,)


In [16]:
c % b

array([[0, 0],
       [1, 2],
       [1, 2]])

In [24]:
## example :

b = np.arange(1,7).reshape(3,2)
d = np.arange(1,5)

print(b)
print(b.ndim)
print(b.shape)

print("-" * 20)

print(d)
print(d.ndim)
print(d.shape)

[[1 2]
 [3 4]
 [5 6]]
2
(3, 2)
--------------------
[1 2 3 4]
1
(4,)


In [25]:
d + b

ValueError: operands could not be broadcast together with shapes (4,) (3,2) 

In [34]:
## example :

a = np.arange(1,7).reshape((3,2))
b = np.array([10, 20, 30, 40]).reshape(2,2)

print(a)
print(a.ndim)
print(a.shape)

print("-" * 20)

print(b)
print(b.ndim)
print(b.shape)

[[1 2]
 [3 4]
 [5 6]]
2
(3, 2)
--------------------
[[10 20]
 [30 40]]
2
(2, 2)


In [35]:
a * b

ValueError: operands could not be broadcast together with shapes (3,2) (2,2) 

In [36]:
## example :

a = np.array([[1,2,3]])
b = np.array([[10], [20], [30]])

print(a)
print(a.ndim)
print(a.shape)

print("-" * 20)

print(b)
print(b.ndim)
print(b.shape)

[[1 2 3]]
2
(1, 3)
--------------------
[[10]
 [20]
 [30]]
2
(3, 1)


In [38]:
a * b

array([[10, 20, 30],
       [20, 40, 60],
       [30, 60, 90]])

In [44]:
## example :

a = np.arange(1,5).reshape(1,4)
b = np.arange(10,15).reshape(5,1)

print(a)
print(a.ndim)
print(a.shape)

print("-" * 20)

print(b)
print(b.ndim)
print(b.shape)

[[1 2 3 4]]
2
(1, 4)
--------------------
[[10]
 [11]
 [12]
 [13]
 [14]]
2
(5, 1)


In [45]:
a + b

array([[11, 12, 13, 14],
       [12, 13, 14, 15],
       [13, 14, 15, 16],
       [14, 15, 16, 17],
       [15, 16, 17, 18]])

In [51]:
## example :
A = np.arange(12).reshape(3,4)
B = np.array([1, 3])

print(A)
print(A.ndim)
print(A.shape)

print("-" * 20)

print(B)
print(B.ndim)
print(B.shape)

A + B

[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]
2
(3, 4)
--------------------
[1 3]
1
(2,)


ValueError: operands could not be broadcast together with shapes (3,4) (2,) 

In [54]:
## example :
A = np.arange(1,3).reshape(2,1)
B = np.arange(1,4).reshape(1,3)

print(A)
print(A.ndim)
print(A.shape)

print("-" * 20)

print(B)
print(B.ndim)
print(B.shape)

A * B


#### explanation =>
## - arrays A and B have the same number of dimensions, so Rule 1 is already satisfied. Move ahead to Rule 2.
## - Now for Rule 2: A.shape = (2,1)  |  B.shape = (1,3). Going from the right side,
## 3 (of B's shape) is not equal to 1 (of A's shape).
## But one of these is 1 (in A's shape), so it is stretched to 3.
## Similarly, on the left side, 1 (of B's shape) is not equal to 2 (of A's shape). But since it is 1, it gets
## stretched to 2. Hence the resultant shape is (2,3).

[[1]
 [2]]
2
(2, 1)
--------------------
[[1 2 3]]
2
(1, 3)


array([[1, 2, 3],
       [2, 4, 6]])
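
The stretching in this example can be made visible with `np.broadcast_to`, which returns a read-only view of an array at the broadcast shape. The names `A_stretched` and `B_stretched` are used here only for illustration.

```python
import numpy as np

A = np.arange(1, 3).reshape(2, 1)
B = np.arange(1, 4).reshape(1, 3)

# explicitly stretch both operands to the common shape (2, 3)
A_stretched = np.broadcast_to(A, (2, 3))
B_stretched = np.broadcast_to(B, (2, 3))

print(A_stretched)
print(B_stretched)

# multiplying the stretched arrays gives the same result as A * B
print(np.array_equal(A_stretched * B_stretched, A * B))

# no data is copied: the stretched axis simply has a stride of 0
print(A_stretched.strides[1], B_stretched.strides[0])
```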

In [2]:
## example :

A = np.array([1, 2, 3])
B = np.array([9, 8, 7])

print(A)
print(A.ndim)
print(A.shape)

print("-" * 20)

print(B)
print(B.ndim)
print(B.shape)

np.dot(A, B)

[1 2 3]
1
(3,)
--------------------
[9 8 7]
1
(3,)


46

In [3]:
## example :

A = np.array([[1,2], [3,4]])
B = np.array([[1], [2]])

print(A)
print(A.ndim)
print(A.shape)

print("-" * 20)

print(B)
print(B.ndim)
print(B.shape)

np.dot(A, B)

[[1 2]
 [3 4]]
2
(2, 2)
--------------------
[[1]
 [2]]
2
(2, 1)


array([[ 5],
       [11]])

In [4]:
## example :

A = np.array([1, 2, 3])
k = 3

np.dot(A, k)

array([3, 6, 9])

In [5]:
## example :

A = np.array([[1,2], [3,4]])
B = np.array([1, 1])

print(A)
print(A.ndim)
print(A.shape)

print("-" * 20)

print(B)
print(B.ndim)
print(B.shape)

np.dot(A, B)

[[1 2]
 [3 4]]
2
(2, 2)
--------------------
[1 1]
1
(2,)


array([3, 7])

### Shallow and Deep copy

In [236]:
## example :

a = np.arange(1,5)
print("before `a` :",a)

b = a.reshape(2,2)
print("before `b` :",b)
print("-" * 30)

a[0] = 50            ## change value of 0 index of `a` 

print("after `a` :",a)
print("after `b` :",b)

before `a` : [1 2 3 4]
before `b` : [[1 2]
 [3 4]]
------------------------------
after `a` : [50  2  3  4]
after `b` : [[50  2]
 [ 3  4]]


Notice that `a` and `b` both reflect the change made in array `a`.
<br><br>

In [237]:
b[1][1] = 180

In [238]:
a

array([ 50,   2,   3, 180])

In [239]:
b

array([[ 50,   2],
       [  3, 180]])

In [240]:
np.shares_memory(a, b)

True

Similar behavior happens if a change is made in array `b`.
<br><br>

**This is because `a` and `b` point to the same memory location, where their elements are allocated.**

<br>

In [241]:
## example :

a = np.arange(1,5)
print("before `a` :", a)
print("-" * 20)

c = a + 2

print("c :", c)
print("after `a` :", a)

before `a` : [1 2 3 4]
--------------------
c : [3 4 5 6]
after `a` : [1 2 3 4]


In [242]:
np.shares_memory(a, c)

False

Here, however, `a` does not change, **because `c` is an entirely new array that was allocated as the result of a mathematical operation on `a`.**

<hr>

### Internal organization of NumPy arrays - https://numpy.org/devdocs/dev/internals.html

When an entirely new array is created - __Deep copy__ (NumPy's own term: *copy*)

When a new NumPy array is NOT created from the existing array, but only a new header gets created - __Shallow copy__ (NumPy's own term: *view*)

`a.copy()` => performs a deep copy of array `a` and creates a new, separate array.
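
A short sketch contrasting the two cases. It uses `a.view()` to create a new header over the same data and the `.base` attribute to see which array owns the memory; the variable names `v` and `c` are chosen only for this illustration.

```python
import numpy as np

a = np.arange(1, 5)

v = a.view()   # shallow copy: new header, same data
c = a.copy()   # deep copy: new header and new data

print(v.base is a)       # True, `v` borrows its data from `a`
print(c.base is None)    # True, `c` owns its own data

a[0] = 50
print(v[0], c[0])        # 50 1

print(np.shares_memory(a, v), np.shares_memory(a, c))   # True False
```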

<hr><br>

In [244]:
## example :

d = np.arange(1,5)
print("before `d` :",d)
print("-" * 20)

e = d * 1
print("e :",e)
print("after `d` :",d)

## `e` and `d` look the same, but since `e` is the result of a mathematical operation on `d`, they don't
## share memory.

before `d` : [1 2 3 4]
--------------------
e : [1 2 3 4]
after `d` : [1 2 3 4]


In [245]:
np.shares_memory(e,d)

False

In [253]:
## example :

a = np.arange(1,10).reshape(1,9)
print('before `a` :',a)
print("-" * 20)

b = a[: , ::2]           ## the new header also stores the stride information
print(b)
print('after `a` :',a)

before `a` : [[1 2 3 4 5 6 7 8 9]]
--------------------
[[1 3 5 7 9]]
after `a` : [[1 2 3 4 5 6 7 8 9]]


In [254]:
np.shares_memory(a, b)

True
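
The stride stored in the header can be inspected through the `.strides` attribute (in bytes). The sketch below fixes the dtype to `np.int64` so that the numbers are the same on every platform; that dtype is an assumption made for this illustration, not something the example above specifies.

```python
import numpy as np

# 8-byte integers, so neighbouring elements are 8 bytes apart
a = np.arange(1, 10, dtype=np.int64).reshape(1, 9)
b = a[:, ::2]

print(a.strides[1])   # 8  -> step of one element along the row
print(b.strides[1])   # 16 -> same buffer, but every second element

print(np.shares_memory(a, b))   # True
b[0, 0] = 100
print(a[0, 0])                  # 100, because `b` is a view of `a`
```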

In [277]:
## example :

a = np.arange(1,10)
print('before `a` :',a)
print("-" * 20)

b = np.split(a, 3)
a[0] = 548

print('after `a` :',a)
print(b)

## np.split returns shallow copies (views), so the change in `a` shows up in `b[0]`

before `a` : [1 2 3 4 5 6 7 8 9]
--------------------
after `a` : [548   2   3   4   5   6   7   8   9]
[array([548,   2,   3]), array([4, 5, 6]), array([7, 8, 9])]


In [286]:
np.shares_memory(a, b)   ## False

np.shares_memory(a, b[0])         ## True, also for (a,b[1]) as well as (a,b[2])

True
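
If the pieces returned by `np.split` need to be independent of the original array, one option is to copy each of them. A minimal sketch; the names `parts` and `independent` are chosen only for this illustration.

```python
import numpy as np

a = np.arange(1, 10)

parts = np.split(a, 3)                      # views into `a`
independent = [p.copy() for p in parts]     # deep copies of each piece

a[0] = 548

print(parts[0])         # [548   2   3]
print(independent[0])   # [1 2 3]
print(np.shares_memory(a, parts[0]), np.shares_memory(a, independent[0]))   # True False
```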

In [275]:
## example :
a = np.array([3,4,6,0,1])
print(a)

b = a.copy()
print(b)
print("-" * 10)

a[2] = 99
print(a)
print(b)

[3 4 6 0 1]
[3 4 6 0 1]
----------
[ 3  4 99  0  1]
[3 4 6 0 1]


In [276]:
np.shares_memory(a, b)

False