# 8 Data Manipulation with NumPy
- Examine how to clean and preprocess data using NumPy.
- How to detect missing values (and fill them in).
- Ways to remove irrelevant data.
- sort(), shuffle(), reshape(), stack(), strip()
## 8_11 Stacking ndarrays
- stack(), vstack() (vertical), hstack() (horizontal), dstack() (depth)

#### numpy.stack(arrays, axis=0, out=None, *, dtype=None, casting='same_kind')
- Join a sequence of arrays along a new axis.
- The axis parameter specifies the index of the new axis in the dimensions of the result. For example, if axis=0 it will be the first dimension and if axis=-1 it will be the last dimension.
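A minimal sketch of how `axis` decides where the new dimension goes, using two small made-up 1-D arrays (all inputs to `np.stack` must have the same shape):

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

# axis=0: the new axis comes first, so each input becomes a row
s0 = np.stack((a, b), axis=0)
print(s0.shape)   # (2, 3)

# axis=-1: the new axis comes last, so each input becomes a column
s1 = np.stack((a, b), axis=-1)
print(s1.shape)   # (3, 2)
print(s1.tolist())  # [[1, 4], [2, 5], [3, 6]]
```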

#### numpy.vstack(tup, *, dtype=None, casting='same_kind')
- Stack arrays in sequence vertically (row wise).
- This is equivalent to concatenation along the first axis after 1-D arrays of shape (N,) have been reshaped to (1,N). Rebuilds arrays divided by vsplit.
- This function makes most sense for arrays with up to 3 dimensions. For instance, for pixel-data with a height (first axis), width (second axis), and r/g/b channels (third axis). The functions concatenate, stack and block provide more general stacking and concatenation operations.
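A short sketch of the equivalence described above, with made-up 1-D inputs: `vstack` treats each `(N,)` array as a `(1, N)` row and concatenates along axis 0, and `vsplit` takes the result apart again.

```python
import numpy as np

a = np.array([1, 2, 3])          # shape (3,)
b = np.array([4, 5, 6])          # shape (3,)

# 1-D inputs are treated as (1, 3) rows, then joined along axis 0
v = np.vstack((a, b))
print(v.shape)                   # (2, 3)

# Explicit equivalent: reshape to (1, N), then concatenate along axis 0
v_alt = np.concatenate((a.reshape(1, -1), b.reshape(1, -1)), axis=0)
print(np.array_equal(v, v_alt))  # True

# vsplit undoes vstack: split into 2 equal parts along axis 0
parts = np.vsplit(v, 2)
print(parts[0].tolist())         # [[1, 2, 3]]
```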

#### numpy.hstack(tup, *, dtype=None, casting='same_kind')
- Stack arrays in sequence horizontally (column wise).
- This is equivalent to concatenation along the second axis, except for 1-D arrays where it concatenates along the first axis. Rebuilds arrays divided by hsplit.
- This function makes most sense for arrays with up to 3 dimensions. For instance, for pixel-data with a height (first axis), width (second axis), and r/g/b channels (third axis). The functions concatenate, stack and block provide more general stacking and concatenation operations.
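A minimal sketch of the two cases above, with made-up arrays: 1-D inputs are joined end to end, while 2-D inputs are joined column-wise and must have the same number of rows. `hsplit` with a list of indices reverses the operation.

```python
import numpy as np

# 1-D inputs: joined along the first (only) axis; lengths may differ
a = np.array([1, 2, 3])
b = np.array([4, 5])
h1 = np.hstack((a, b))
print(h1.tolist())               # [1, 2, 3, 4, 5]

# 2-D inputs: joined along the second axis (columns); row counts must match
c = np.array([[1], [2]])         # shape (2, 1)
d = np.array([[3, 4], [5, 6]])   # shape (2, 2)
h2 = np.hstack((c, d))
print(h2.tolist())               # [[1, 3, 4], [2, 5, 6]]

# hsplit at column index 1 recovers the original pieces
left, right = np.hsplit(h2, [1])
print(left.shape, right.shape)   # (2, 1) (2, 2)
```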

#### numpy.dstack(tup)
- Stack arrays in sequence depth wise (along third axis).
- This is equivalent to concatenation along the third axis after 2-D arrays of shape (M,N) have been reshaped to (M,N,1) and 1-D arrays of shape (N,) have been reshaped to (1,N,1). Rebuilds arrays divided by dsplit.
- This function makes most sense for arrays with up to 3 dimensions. For instance, for pixel-data with a height (first axis), width (second axis), and r/g/b channels (third axis). The functions concatenate, stack and block provide more general stacking and concatenation operations.
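A sketch of the pixel-data case mentioned above, assuming three tiny made-up 2 x 3 colour channels: `dstack` turns each `(M, N)` channel into `(M, N, 1)` and joins them on the third axis, and `dsplit` separates them again.

```python
import numpy as np

# Three made-up (M, N) = (2, 3) colour channels
red = np.zeros((2, 3), dtype=np.uint8)
green = np.full((2, 3), 255, dtype=np.uint8)
blue = np.zeros((2, 3), dtype=np.uint8)

# Each (2, 3) input is treated as (2, 3, 1); the result stacks them on axis 2
img = np.dstack((red, green, blue))
print(img.shape)              # (2, 3, 3)
print(img[0, 0].tolist())     # [0, 255, 0]

# 1-D inputs of shape (N,) are treated as (1, N, 1)
p = np.dstack((np.array([1, 2]), np.array([3, 4])))
print(p.shape)                # (1, 2, 2)

# dsplit undoes dstack: one (2, 3, 1) array per channel
r, g, b = np.dsplit(img, 3)
print(r.shape)                # (2, 3, 1)
```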

In [1]:
import numpy as np
np.__version__

'1.26.4'

In [2]:
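# Function show_attr

def show_attr(arrnm: str) -> str:
    # Build a one-line summary of an array given its *name* as a string;
    # eval() resolves the name in the notebook's global namespace
    strout = f' {arrnm}: '

    for attr in ('shape', 'ndim', 'size', 'dtype'):     #, 'itemsize'):
        arrnm_attr = arrnm + '.' + attr
        strout += f'| {attr}: {eval(arrnm_attr)} '

    return strout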
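Because `show_attr` relies on `eval`, it only works for names visible in the global namespace. A possible eval-free variant is sketched below; `show_attr_obj` and its `name` parameter are hypothetical, introduced here for illustration. It takes the array itself and uses `getattr`, producing the same string format.

```python
import numpy as np

def show_attr_obj(arr: np.ndarray, name: str = 'arr') -> str:
    """Hypothetical variant of show_attr: pass the array itself, no eval needed."""
    strout = f' {name}: '
    for attr in ('shape', 'ndim', 'size', 'dtype'):
        # getattr reads the attribute directly from the array object
        strout += f'| {attr}: {getattr(arr, attr)} '
    return strout

x = np.zeros((2, 3), dtype='<U12')
print(show_attr_obj(x, 'x'))
# ' x: | shape: (2, 3) | ndim: 2 | size: 6 | dtype: <U12 '
```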

In [3]:
# Load only some string columns (cols 1, 2 and 4)
lend_LT = np.genfromtxt('lending-co-LT.csv',
                        delimiter=',',
                        usecols=[1,2,4],
                        dtype=str,
                        skip_header=1)

display(show_attr('lend_LT'))
lend_LT

' lend_LT: | shape: (1043, 3) | ndim: 2 | size: 3129 | dtype: <U12 '

array([['id_1', 'Product B', 'Location 2'],
       ['id_2', 'Product B', 'Location 3'],
       ['id_3', 'Product C', 'Location 5'],
       ...,
       ['id_1041', 'Product B', 'Location 23'],
       ['id_1042', 'Product C', 'Location 52'],
       ['id_1043', 'Product B', 'Location 142']], dtype='<U12')
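Since `lending-co-LT.csv` is not included here, the sketch below reproduces the same `genfromtxt` call on an in-memory stand-in. Only columns 1, 2 and 4 come from the output above; the header row and columns 0 and 3 are hypothetical placeholders.

```python
import io
import numpy as np

# Hypothetical stand-in for 'lending-co-LT.csv': columns 0 and 3 and the
# header names are made up; columns 1, 2, 4 mirror the rows shown above
csv_text = """col0,id,product,col3,location
x,id_1,Product B,y,Location 2
x,id_2,Product B,y,Location 3
x,id_3,Product C,y,Location 5
"""

arr = np.genfromtxt(io.StringIO(csv_text),
                    delimiter=',',
                    usecols=[1, 2, 4],   # keep only these column indices
                    dtype=str,           # read every field as a Unicode string
                    skip_header=1,       # skip the header row
                    encoding=None)       # decode the input as text

print(arr.shape)       # (3, 3)
print(arr[0].tolist()) # ['id_1', 'Product B', 'Location 2']
```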

In [4]:
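# First strip 'id_' from col[0]. strip() is not in-place (it returns a new array), so assign the result back.
# Note: strip() removes any of the characters 'i', 'd', '_' from both ends, not the exact prefix 'id_'
lend_LT[:,0] = np.chararray.strip(lend_LT[:,0], 'id_')
lend_LT
## FUTURE: write a function that removes every non-numeric character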

array([['1', 'Product B', 'Location 2'],
       ['2', 'Product B', 'Location 3'],
       ['3', 'Product C', 'Location 5'],
       ...,
       ['1041', 'Product B', 'Location 23'],
       ['1042', 'Product C', 'Location 52'],
       ['1043', 'Product B', 'Location 142']], dtype='<U12')
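One possible take on the FUTURE note above, as a sketch: `keep_digits` is a hypothetical helper name, and it uses a plain Python loop with `re.sub` rather than any NumPy string function, so it is simple but not vectorized.

```python
import re
import numpy as np

def keep_digits(arr: np.ndarray) -> np.ndarray:
    """Hypothetical helper: drop every non-digit character from each string."""
    # \D matches any non-digit character; loop over a flat copy of the input
    cleaned = [re.sub(r'\D', '', s) for s in arr.ravel().tolist()]
    # Restore the original shape so 1-D and 2-D inputs both work
    return np.array(cleaned).reshape(arr.shape)

ids = np.array(['id_1', 'id_1041', 'Location 142'])
print(keep_digits(ids).tolist())   # ['1', '1041', '142']
```

Applied to the notebook's data, it could replace the strip step, e.g. `lend_LT[:, 0] = keep_digits(lend_LT[:, 0])`.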

In [5]:
# Strip the label text from cols[1-2], leaving only the letter in col[1] and the number in col[2]
lend_LT[:,1] = np.chararray.strip(lend_LT[:,1], 'Product ')
lend_LT[:,2] = np.chararray.strip(lend_LT[:,2], 'Location ')
lend_LT

array([['1', 'B', '2'],
       ['2', 'B', '3'],
       ['3', 'C', '5'],
       ...,
       ['1041', 'B', '23'],
       ['1042', 'C', '52'],
       ['1043', 'B', '142']], dtype='<U12')

In [6]:
## Convert letters of col[1] to numbers
# np.where(lend_LT[:,1] == 'A', 1, lend_LT[:,1])
# np.where(lend_LT[:,1] == 'B', 2, lend_LT[:,1])
# np.where(lend_LT[:,1] == 'C', 3, lend_LT[:,1])
# dic = {'A':1, 'B':2, 'C':3, 'D':4, 'E':5}
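dic = {chr(i + 64): i for i in range(1,7)}   # {'A': 1, ..., 'F': 6}; chr(65) == 'A'
for k,v in dic.items():
    # str(v) keeps the replacement the same (string) type as the column
    lend_LT[:,1] = np.where(lend_LT[:,1] == k, str(v), lend_LT[:,1])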

lend_LT

array([['1', '2', '2'],
       ['2', '2', '3'],
       ['3', '3', '5'],
       ...,
       ['1041', '2', '23'],
       ['1042', '3', '52'],
       ['1043', '2', '142']], dtype='<U12')
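As an alternative to looping over the dictionary with `np.where`, each letter can be looked up directly. A sketch using `np.vectorize` with `dict.get` on a made-up column; the `-1` sentinel for unknown letters is an assumption. Note that `np.vectorize` is a convenience wrapper around a Python loop, not a speed-up.

```python
import numpy as np

# Same mapping as in the cell above: 'A'->1, ..., 'F'->6
dic = {chr(i + 64): i for i in range(1, 7)}

col = np.array(['B', 'B', 'C', 'A', 'F', 'Z'])

# Look each letter up; unknown letters map to -1 (an assumed sentinel)
lookup = np.vectorize(lambda s: dic.get(s, -1), otypes=[np.int32])
codes = lookup(col)
print(codes.tolist())   # [2, 2, 3, 1, 6, -1]
```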

In [7]:
## Now the whole array can be cast to a numeric type: float, or directly int
# astype() returns a new array; assign it to a variable to keep the result
lend_LT.astype(dtype=np.int32)

array([[   1,    2,    2],
       [   2,    2,    3],
       [   3,    3,    5],
       ...,
       [1041,    2,   23],
       [1042,    3,   52],
       [1043,    2,  142]])
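The cast only works at this point because every entry is now a numeric string. A small sketch on made-up rows showing that the same `astype` call raises `ValueError` while a letter is still present:

```python
import numpy as np

raw = np.array([['1', 'B', '2'], ['2', 'C', '3']])

# Casting fails while a column still holds letters...
failed = False
try:
    raw.astype(np.int32)
except ValueError as err:
    failed = True
    print('ValueError:', err)

# ...but succeeds once every entry is a numeric string
clean = np.array([['1', '2', '2'], ['2', '3', '3']])
as_int = clean.astype(np.int32)       # astype returns a new array
as_float = clean.astype(np.float64)
print(as_int.dtype, as_float.dtype)   # int32 float64
```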

### Notes and Examples from the Manual: np.char.strip (np.strings.strip in NumPy 2.x)

In [8]:
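display(c := np.array(['aAaAaA', '  aA  ', 'abBABba']))
np.chararray.strip(c)             # strips whitespace; not displayed (not the last expression)
# chararray hides trailing whitespace, so c[1] ('  aA  ') is shown as '  aA' below
display(np.chararray.strip(c, 'a'))
# display(c)
np.chararray.strip(c, 'A')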

array(['aAaAaA', '  aA  ', 'abBABba'], dtype='<U7')

chararray(['AaAaA', '  aA', 'bBABb'], dtype='<U7')

chararray(['aAaAa', '  aA', 'abBABba'], dtype='<U7')
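Two details are easy to miss here, so a sketch with made-up values: `strip(chars)` removes any characters from the given *set* at both ends, which can eat more than an intended prefix, and `np.char.strip` returns a plain ndarray, so the trailing spaces of `'  aA  '` that the chararray output hid are still there. Python's `str.removeprefix` (Python 3.9+) is one way to remove an exact prefix. NumPy's documentation recommends the `np.char` functions over the legacy `chararray` class.

```python
import numpy as np

ids = np.array(['id_1', 'id_d2'])
locs = np.array(['Location 2', 'Location no'])

# strip(chars) removes ANY of the given characters from both ends
by_set_ids = np.char.strip(ids, 'id_')          # ['1', '2']  ('d' after 'id_' is eaten too)
by_set_locs = np.char.strip(locs, 'Location ')  # ['2', '']   ('no' is entirely in the set)

# str.removeprefix removes only an exact leading prefix
by_prefix = np.array([s.removeprefix('Location ') for s in locs.tolist()])  # ['2', 'no']

# np.char.strip returns a plain ndarray: surrounding spaces of c[1] are kept
c = np.array(['aAaAaA', '  aA  ', 'abBABba'])
plain = np.char.strip(c, 'a')                   # ['AaAaA', '  aA  ', 'bBABb']
print(by_set_ids.tolist(), by_set_locs.tolist(), by_prefix.tolist(), plain.tolist())
```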

In [9]:
### FUTURE: experiment with Python's str.strip() on strings, and ??