#### Notes on hacking numpy arrays

Note: For the homework assignment you will use the NumPy array representation of the data.

To experiment with different sets of features, you will need
to be able to concatenate the columns of the
dataset that are appropriate to each experiment.
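As a minimal sketch of that workflow, suppose the dataset is a 2D array `X` with one row per example and one column per feature, plus a separately computed feature `extra` already shaped as a column. Both names and the column choices are hypothetical. Column indexing picks out the features for one experiment, and `np.hstack` joins columns together:

```python
import numpy as np

# Hypothetical dataset: 4 examples (rows) x 3 features (columns)
X = np.arange(12, dtype=float).reshape(4, 3)
# Hypothetical extra feature, already shaped as a single column (4 x 1)
extra = np.array([[10.0], [20.0], [30.0], [40.0]])

# Experiment 1: features 0 and 2 only (a list index keeps the result 2D)
X_exp1 = X[:, [0, 2]]
# Experiment 2: all original features plus the extra column
X_exp2 = np.hstack([X, extra])

print(X_exp1.shape)  # (4, 2)
print(X_exp2.shape)  # (4, 4)
```

Indexing with a list such as `[0, 2]` (or `[0]`) keeps the result 2D; a plain integer index such as `X[:, 0]` would return a 1D array instead.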

Here are some of the relevant functions.

In [3]:
import numpy as np

# Cook up random 1D and 2D arrays of different shapes
a_1D_1 = np.random.rand(4,)
a_1D_2 = np.random.rand(2,)
b_2D_1 = np.random.rand(4,2)
b_2D_2 = np.random.rand(4,3)
b_2D_3 = np.random.rand(2,2)

For example:

In [4]:
b_2D_3

array([[0.39292363, 0.2782125 ],
       [0.61183459, 0.15992419]])
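`np.random.rand` draws fresh values every time the cell runs, so rerunning it changes every number printed in these notes. If repeatable values are wanted, one option (not used in the original cells) is NumPy's seeded `Generator`. The helper name `make_arrays` and the seed `0` are arbitrary choices for this sketch:

```python
import numpy as np

def make_arrays(seed):
    """Hypothetical helper: build arrays with the shapes above from a seeded generator."""
    rng = np.random.default_rng(seed)
    return {
        "a_1D_1": rng.random(4),
        "a_1D_2": rng.random(2),
        "b_2D_1": rng.random((4, 2)),
        "b_2D_2": rng.random((4, 3)),
        "b_2D_3": rng.random((2, 2)),
    }

first = make_arrays(0)
second = make_arrays(0)

# Same seed, same values; the shapes match the arrays defined above
print(all(np.array_equal(first[k], second[k]) for k in first))  # True
print(first["b_2D_3"].shape)  # (2, 2)
```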

#### Concatenate columnwise (hstack and concatenate)

We concatenate two 2D arrays columnwise using `np.hstack`. Notice that `np.hstack`
takes a single positional argument: the sequence (e.g. a list or tuple) of arrays to be concatenated.

All the `numpy` concatenation functions work this way.
Here we pass a sequence containing an array with 2 columns and an array with 3 columns, and get back an array with 5 columns:
the columns of the first input followed by the columns of the second.

In [5]:
print(b_2D_1)
print(b_2D_2)
np.hstack([b_2D_1, b_2D_2])

[[0.66679165 0.51653912]
 [0.53947541 0.7048727 ]
 [0.22253847 0.88057288]
 [0.92553945 0.78514359]]
[[0.95012211 0.90591711 0.73263701]
 [0.76655295 0.27463151 0.0868449 ]
 [0.65977099 0.43141711 0.37612845]
 [0.10337356 0.24392897 0.04832889]]


array([[0.66679165, 0.51653912, 0.95012211, 0.90591711, 0.73263701],
       [0.53947541, 0.7048727 , 0.76655295, 0.27463151, 0.0868449 ],
       [0.22253847, 0.88057288, 0.65977099, 0.43141711, 0.37612845],
       [0.92553945, 0.78514359, 0.10337356, 0.24392897, 0.04832889]])
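The single-sequence convention is easy to trip over. A small sketch with stand-in arrays `left` and `right` (hypothetical names, with the same column counts as `b_2D_1` and `b_2D_2`):

```python
import numpy as np

left = np.ones((4, 2))    # 2 columns
right = np.zeros((4, 3))  # 3 columns

# The arrays go in as ONE sequence argument: a list or a tuple both work
from_list = np.hstack([left, right])
from_tuple = np.hstack((left, right))
print(from_list.shape)                        # (4, 5)
print(np.array_equal(from_list, from_tuple))  # True

# Passing the arrays as separate positional arguments is a TypeError
try:
    np.hstack(left, right)
    separate_args_failed = False
except TypeError:
    separate_args_failed = True
print(separate_args_failed)  # True
```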

The number of rows must be the same in all the arrays being concatenated; here both have 4:

In [6]:
b_2D_1.shape, b_2D_2.shape

((4, 2), (4, 3))

For example, `b_2D_1` has 4 rows but `b_2D_3` has only 2, so they cannot be stacked side by side:

In [7]:
print(b_2D_1.shape)
print(b_2D_3.shape)

# ValueError: hstacking a 4x2 with a 2x2 array; the numbers of rows do not agree.
#np.hstack([b_2D_1, b_2D_3])

(4, 2)
(2, 2)
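Rather than commenting out the failing call, the error can be caught and inspected. A sketch with stand-in arrays of the same shapes (the exact wording of the message depends on the NumPy version):

```python
import numpy as np

tall = np.ones((4, 2))   # 4 rows, like b_2D_1
short = np.ones((2, 2))  # 2 rows, like b_2D_3

# Catch the error instead of commenting the call out
try:
    np.hstack([tall, short])
    rows_error = False
except ValueError as err:
    rows_error = True
    print("ValueError:", err)
```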


The number of dimensions must also agree.

In [8]:
print(b_2D_1.shape)
print(a_1D_1.shape)

# ValueError: b_2D_1 and a_1D_1 do not have the same number of dimensions
# (2D vs 1D)
#np.hstack([b_2D_1, a_1D_1])

(4, 2)
(4,)
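When the 1D array is really one more feature column, a common fix is to give it an explicit second axis first. Either `reshape(-1, 1)` or `np.column_stack` (which treats 1D inputs as columns) does this. The arrays here are stand-ins with the same shapes as `b_2D_1` and `a_1D_1`:

```python
import numpy as np

features = np.ones((4, 2))  # same shape as b_2D_1
new_col = np.arange(4.0)    # same shape as a_1D_1: (4,)

# Option 1: turn the (4,) array into a (4, 1) column, then hstack
as_column = new_col.reshape(-1, 1)
stacked = np.hstack([features, as_column])

# Option 2: np.column_stack treats 1D inputs as columns automatically
stacked2 = np.column_stack([features, new_col])

print(stacked.shape)                      # (4, 3)
print(np.array_equal(stacked, stacked2))  # True
```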


Equivalently, use `np.concatenate`.

The function `np.concatenate` can do the same concatenations as `np.hstack`, but
because it can concatenate along different axes (columnwise or rowwise),
the `axis` parameter must be taken into account.

The default value of `axis` is 0 (rowwise), so the `axis=1` parameter (columnwise concatenation) is necessary here.

In [9]:
# concat 4x2 with 4x3 to get 4x5.  Columnwise concat
print(b_2D_1.shape)
print(b_2D_2.shape)
np.concatenate([b_2D_1, b_2D_2],axis=1)

(4, 2)
(4, 3)


array([[0.66679165, 0.51653912, 0.95012211, 0.90591711, 0.73263701],
       [0.53947541, 0.7048727 , 0.76655295, 0.27463151, 0.0868449 ],
       [0.22253847, 0.88057288, 0.65977099, 0.43141711, 0.37612845],
       [0.92553945, 0.78514359, 0.10337356, 0.24392897, 0.04832889]])
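A quick check that the two calls agree, plus one case where `np.hstack` behaves differently: with 1D inputs it joins along axis 0, since a 1D array has no axis 1. The arrays are stand-ins with the shapes used above:

```python
import numpy as np

left = np.random.rand(4, 2)
right = np.random.rand(4, 3)

# For 2D inputs, hstack is concatenation along axis 1
same = np.array_equal(np.hstack([left, right]),
                      np.concatenate([left, right], axis=1))
print(same)  # True

# For 1D inputs, hstack joins along axis 0 (there is no axis 1)
u = np.array([1, 2, 3, 4])
v = np.array([5, 6])
print(np.hstack([u, v]))  # [1 2 3 4 5 6]
```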

#### Concatenate rowwise (vstack and concatenate)

We concatenate two 2D arrays rowwise using `np.vstack`: stacking a 4x2 array on top of a 2x2 array gives a 6x2 array.


In [10]:
print(b_2D_1)
print(b_2D_3)
np.vstack([b_2D_1, b_2D_3,])

[[0.66679165 0.51653912]
 [0.53947541 0.7048727 ]
 [0.22253847 0.88057288]
 [0.92553945 0.78514359]]
[[0.39292363 0.2782125 ]
 [0.61183459 0.15992419]]


array([[0.66679165, 0.51653912],
       [0.53947541, 0.7048727 ],
       [0.22253847, 0.88057288],
       [0.92553945, 0.78514359],
       [0.39292363, 0.2782125 ],
       [0.61183459, 0.15992419]])
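When rows arrive in pieces (for example, batches of new examples), a common pattern is to collect the pieces in a list and call `np.vstack` once at the end rather than stacking repeatedly, since every call copies all of its inputs into a new array. The batch sizes here are made up:

```python
import numpy as np

# Hypothetical batches of examples: 4, 2 and 3 rows, each with 2 columns;
# batch i is filled with the value i so the pieces are easy to spot
batches = [np.full((n, 2), float(i)) for i, n in enumerate([4, 2, 3])]

# One vstack over the whole list: row counts add up, columns stay at 2
dataset = np.vstack(batches)
print(dataset.shape)  # (9, 2)
```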

In this case, the number of columns must agree:

In [165]:
print(b_2D_1.shape)
print(b_2D_2.shape)
# ValueError: b_2D_1 has 2 columns, b_2D_2 has 3
# np.vstack([b_2D_1, b_2D_2])

(4, 2)
(4, 3)
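The shape rules for `np.concatenate` can be summed up in one check: all inputs need the same number of dimensions and the same size along every axis except the one being joined. Below is a sketch of that check as a hypothetical helper, `can_concatenate`, which assumes a non-empty list and a non-negative, in-range `axis`. It does not model `np.vstack`, which turns 1D inputs into rows first:

```python
import numpy as np

def can_concatenate(arrays, axis=0):
    """Hypothetical helper: True if np.concatenate(arrays, axis=axis) should succeed.

    Checks equal ndim and equal sizes on every axis except `axis`.
    Assumes a non-empty list and a non-negative, in-range axis.
    """
    if len({a.ndim for a in arrays}) != 1:
        return False
    # Drop the joined axis from each shape; what remains must match
    rest = [a.shape[:axis] + a.shape[axis + 1:] for a in arrays]
    return all(r == rest[0] for r in rest)

print(can_concatenate([np.ones((4, 2)), np.ones((2, 2))], axis=0))  # True
print(can_concatenate([np.ones((4, 2)), np.ones((4, 3))], axis=0))  # False
print(can_concatenate([np.ones((4, 2)), np.ones((4, 3))], axis=1))  # True
print(can_concatenate([np.ones((4, 2)), np.ones(4)], axis=1))       # False
```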


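Again, the number of dimensions must also agree, at least for `np.concatenate`.
`np.vstack` is more lenient: it treats a 1D array of shape `(N,)` as a single row of shape `(1, N)`,
so stacking `a_1D_2` (2 entries) onto `b_2D_1` (2 columns) succeeds and gives a 5x2 array:

In [166]:
print(b_2D_1.shape)
print(a_1D_2.shape)
# ValueError: np.concatenate requires the same number of dimensions (2D vs 1D)
#np.concatenate([b_2D_1, a_1D_2], axis=0)
# No error: vstack treats a_1D_2 as a 1x2 row, giving a 5x2 array
#np.vstack([b_2D_1, a_1D_2])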

(4, 2)
(2,)
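Strictly speaking, the dimension rule belongs to `np.concatenate`; `np.vstack` first reshapes each 1D array of shape `(N,)` into a `(1, N)` row. A runnable check of the difference, on stand-in arrays with the shapes printed above:

```python
import numpy as np

table = np.ones((4, 2))     # same shape as b_2D_1
row = np.array([7.0, 8.0])  # same shape as a_1D_2: (2,)

# vstack promotes the (2,) array to a (1, 2) row before stacking
grown = np.vstack([table, row])
print(grown.shape)  # (5, 2)

# concatenate requires matching dimensions, so this raises ValueError
try:
    np.concatenate([table, row], axis=0)
    concat_failed = False
except ValueError:
    concat_failed = True
print(concat_failed)  # True
```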


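Alternatively, use `np.concatenate` with `axis=0`.
Since the default value of `axis` is 0, the `axis=0` parameter can be omitted,
but it is often included for clarity. (The values printed below differ from those above,
evidently because the random arrays were regenerated before this cell was run; only the shapes matter here.)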

In [167]:
print(b_2D_1.shape)
print(b_2D_3.shape)
np.concatenate([b_2D_1, b_2D_3,],axis=0)

(4, 2)
(2, 2)


array([[0.03400579, 0.12307457],
       [0.8332505 , 0.5723692 ],
       [0.9454464 , 0.59837892],
       [0.83839972, 0.97849475],
       [0.30522725, 0.67635996],
       [0.99056986, 0.92396184]])
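To confirm that omitting `axis` gives rowwise concatenation, and that for 2D inputs this matches `np.vstack`, here is a check on fresh stand-in arrays:

```python
import numpy as np

top = np.random.rand(4, 2)
bottom = np.random.rand(2, 2)

default_axis = np.concatenate([top, bottom])  # axis defaults to 0
explicit_axis = np.concatenate([top, bottom], axis=0)
stacked = np.vstack([top, bottom])

print(default_axis.shape)                           # (6, 2)
print(np.array_equal(default_axis, explicit_axis))  # True
print(np.array_equal(default_axis, stacked))        # True
```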

#### Concatenating 1D arrays

Since all the successful examples up till now have been 2D
arrays, it is worth noting that 1D arrays can be concatenated
too, as long as all the arrays being concatenated are 1D.

We cook up some 1D data.

In [170]:
n1 = np.array([3,1,0,-2,])
n2 = np.array([13/2,1,1/2,-4,2])
# T is a tuple of 1D arrays
T = (n1, 2*n1, n2, 3*n2)
T

(array([ 3,  1,  0, -2]),
 array([ 6,  2,  0, -4]),
 array([ 6.5,  1. ,  0.5, -4. ,  2. ]),
 array([ 19.5,   3. ,   1.5, -12. ,   6. ]))
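Note that `n1` holds integers while `n2` holds floats. The result of a concatenation has a single dtype, so NumPy promotes the inputs to a common type, here a float type; the integer values carry over unchanged. A sketch rebuilding `n1` and `n2`:

```python
import numpy as np

n1 = np.array([3, 1, 0, -2])          # integer dtype
n2 = np.array([13/2, 1, 1/2, -4, 2])  # float64

joined = np.concatenate([n1, n2])
print(n1.dtype.kind, n2.dtype.kind)  # i f
print(joined.dtype)                  # float64
print(joined[:4].tolist())           # [3.0, 1.0, 0.0, -2.0]
```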

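The result is necessarily a 1D array whose length is the sum of the input lengths (4 + 4 + 5 + 5 = 18).
Since a 1D array has only one axis, axis 0 (the default), there is nothing to choose and `axis` can be left out: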

In [171]:
for a in T:
    print(a.shape)

print(np.concatenate(T).shape)

(4,)
(4,)
(5,)
(5,)
(18,)
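A short check on how `axis` behaves with 1D inputs, rebuilding `T` as above: `axis=0` is the default, `axis=None` flattens every input before joining (for 1D inputs that gives the same 18 elements), and `axis=1` fails because a 1D array has no second axis. NumPy raises `AxisError` here, which is a subclass of `ValueError`, so catching `ValueError` covers it:

```python
import numpy as np

n1 = np.array([3, 1, 0, -2])
n2 = np.array([13/2, 1, 1/2, -4, 2])
T = (n1, 2*n1, n2, 3*n2)

print(np.concatenate(T, axis=0).shape)     # (18,)
print(np.concatenate(T, axis=None).shape)  # (18,)

# A 1D array has no axis 1, so this raises AxisError (a ValueError subclass)
try:
    np.concatenate(T, axis=1)
    axis1_failed = False
except ValueError:
    axis1_failed = True
print(axis1_failed)  # True
```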
