### ***ndarray Object Internals***

1. **ndarray provides a way to interpret a block of homogeneously typed data as a multidimensional array object**.
2. **Part of what makes ndarray flexible is that every array object is a strided view on a block of data**.
3. **More precisely, the ndarray internally consists of the following**:
    1. *A pointer to data, that is, a block of data in RAM or in a memory-mapped file*
    2. *The data type, or dtype, describing fixed-size value cells in the array*
    3. *A tuple indicating the array’s shape*
    4. *A tuple of strides, integers indicating the number of bytes to “step” in order to advance one element along a dimension*

In [1]:
import numpy as np

In [2]:
np.ones((10,5)).shape

(10, 5)

In [3]:
np.ones((3,4,5),dtype=np.float64).strides

(160, 40, 8)

*Strides can even be negative, which allows an array to move backward through memory.*

e.g. obj[::-1]
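To make the link between shape, dtype, and strides concrete, here is a minimal sketch. It assumes a C-contiguous array (NumPy's default layout); the variable names `expected`, `vec`, and `rev` are only illustrative.

```python
import numpy as np

# A C-contiguous float64 array: each element occupies 8 bytes.
arr = np.ones((3, 4, 5), dtype=np.float64)

# In C order, the stride of an axis is the itemsize times the
# product of the lengths of all later axes.
expected = tuple(
    arr.itemsize * int(np.prod(arr.shape[i + 1:])) for i in range(arr.ndim)
)
print(arr.strides)  # (160, 40, 8)
print(expected)     # (160, 40, 8)

# Reversing with a slice produces a view with a negative stride;
# the underlying data is not copied.
vec = np.arange(5, dtype=np.int64)
rev = vec[::-1]
print(rev.strides)                 # (-8,)
print(np.shares_memory(vec, rev))  # True
```

The stride formula above only holds for contiguous C-ordered arrays; views created by slicing or transposing can have other stride patterns.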

#### ***NumPy dtype hierarchy***

In [4]:
# the dtypes have superclasses such as np.integer and np.floating, which can be used with the np.issubdtype function
ints = np.ones(10,dtype=np.uint16)

In [5]:
floats = np.ones(10,dtype=np.float32)

In [6]:
np.issubdtype(ints.dtype,np.integer)

True

In [7]:
np.issubdtype(floats.dtype,np.floating)

True

In [8]:
# you can also see all of the parent classes of a specific dtype by calling the type's mro method:
np.float64.mro()

[numpy.float64,
 numpy.floating,
 numpy.inexact,
 numpy.number,
 numpy.generic,
 float,
 object]

In [9]:
# we also have
np.issubdtype(ints.dtype,np.number)

True
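One place these abstract superclasses can be handy is in code that must accept several concrete dtypes. The sketch below uses a hypothetical helper, `describe_dtype` (not part of NumPy), to sort dtypes into coarse categories with `np.issubdtype`.

```python
import numpy as np

def describe_dtype(dtype):
    """Hypothetical helper: return a coarse label for a dtype."""
    if np.issubdtype(dtype, np.bool_):
        return "boolean"
    if np.issubdtype(dtype, np.integer):
        return "integer"
    if np.issubdtype(dtype, np.floating):
        return "floating"
    if np.issubdtype(dtype, np.complexfloating):
        return "complex"
    return "other"

for dt in (np.uint16, np.int64, np.float32, np.complex128, np.bool_):
    print(np.dtype(dt).name, "->", describe_dtype(dt))
```

Checking against `np.integer` rather than a specific type such as `np.int64` means the same test covers signed and unsigned integers of every width.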

### ***Advanced Array Manipulation***

*While much of the heavy lifting for data analysis is done by pandas, you may need these array manipulation techniques at some point.*

#### ***Reshaping Arrays***

*In many cases, you can convert an array from one shape to another without copying any data.*

In [10]:
arr = np.arange(8)

In [11]:
arr

array([0, 1, 2, 3, 4, 5, 6, 7])

In [12]:
arr.reshape((4,2))

array([[0, 1],
       [2, 3],
       [4, 5],
       [6, 7]])

In [13]:
# a multidimensional array can also be reshaped
arr.reshape((4,2)).reshape((2,4))

array([[0, 1, 2, 3],
       [4, 5, 6, 7]])

In [14]:
# one of the shape dimensions can be -1, in which case the value used for that dimension will be inferred from the data
arr = np.arange(15)


In [15]:
arr

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14])

In [16]:
arr.reshape((5,-1))

array([[ 0,  1,  2],
       [ 3,  4,  5],
       [ 6,  7,  8],
       [ 9, 10, 11],
       [12, 13, 14]])

In [17]:
arr1 = np.arange(20)

In [18]:
arr1.reshape((5,-1))

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15],
       [16, 17, 18, 19]])

In [19]:
# since an array's shape attribute is a tuple, it can be passed to reshape too:

other_arr = np.ones((3,5))

In [20]:
other_arr.shape

(3, 5)

In [21]:
arr.reshape(other_arr.shape)

array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14]])
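Whether reshape avoids a copy depends on the memory layout of the input. The following sketch uses `np.shares_memory` to check this; the variable names `view` and `copied` are only illustrative.

```python
import numpy as np

arr = np.arange(8)
view = arr.reshape((4, 2))

# For a contiguous array, the reshaped result shares the original buffer...
print(np.shares_memory(arr, view))  # True

# ...so writing through the reshaped array changes the original.
view[0, 0] = 99
print(arr[0])  # 99

# Flattening a transposed (non-contiguous) array cannot be expressed
# with strides alone, so NumPy has to copy the data here.
copied = view.T.reshape(-1)
print(np.shares_memory(view, copied))  # False
```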

**The opposite operation to reshaping from one-dimensional to a higher dimension is typically known as flattening or raveling**

In [22]:
arr = np.arange(15).reshape((5,3))

In [23]:
arr

array([[ 0,  1,  2],
       [ 3,  4,  5],
       [ 6,  7,  8],
       [ 9, 10, 11],
       [12, 13, 14]])

In [24]:
arr.ravel()

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14])

In [25]:
arr.flatten()

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14])
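Although ravel and flatten print the same values, they differ in copy behavior: ravel returns a view when it can, while flatten always returns a copy. A short check, assuming a contiguous input array:

```python
import numpy as np

arr = np.arange(15).reshape((5, 3))

raveled = arr.ravel()      # a view, because arr is already contiguous
flattened = arr.flatten()  # always a new copy

print(np.shares_memory(arr, raveled))    # True
print(np.shares_memory(arr, flattened))  # False

# Modifying the copy leaves the original untouched.
flattened[0] = -1
print(arr[0, 0])  # 0
```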

#### ***C versus FORTRAN Order***

*FORTRAN -- matrices are column major -- column values are stored in adjacent memory locations.*

*C -- matrices are row major -- row values are stored in adjacent memory locations.*

In [26]:
arr = np.arange(12).reshape((3,4))

In [27]:
arr

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])

In [28]:
arr.ravel()

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11])

In [29]:
arr.ravel('F')

array([ 0,  4,  8,  1,  5,  9,  2,  6, 10,  3,  7, 11])
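The two orders can also be seen in an array's strides and flags. This sketch assumes 8-byte integers (`np.int64`) so that the stride values are predictable; `c_arr` and `f_arr` are illustrative names.

```python
import numpy as np

c_arr = np.arange(12, dtype=np.int64).reshape((3, 4))  # C order by default
f_arr = np.asfortranarray(c_arr)  # same values, stored column-major

print(c_arr.strides)  # (32, 8): moving along a row is the small step
print(f_arr.strides)  # (8, 24): moving down a column is the small step
print(c_arr.flags['C_CONTIGUOUS'], f_arr.flags['F_CONTIGUOUS'])  # True True
print(np.array_equal(c_arr, f_arr))  # True

# reshape accepts the same order argument as ravel
print(np.arange(6).reshape((2, 3), order='F'))
# [[0 2 4]
#  [1 3 5]]
```

Both arrays compare equal element by element; only the way the values are laid out in memory differs.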

#### ***Concatenating and Splitting Arrays***

*numpy.concatenate takes a sequence of arrays and joins them together in order along the input axis:*

In [30]:
arr1 = np.array([[1,2,3],[4,5,6]])

In [31]:
arr2 = np.array([[7,8,9],[10,11,12]])

In [32]:
np.concatenate([arr1,arr2],axis=1)

array([[ 1,  2,  3,  7,  8,  9],
       [ 4,  5,  6, 10, 11, 12]])

In [33]:
np.concatenate([arr1,arr2],axis=0)

array([[ 1,  2,  3],
       [ 4,  5,  6],
       [ 7,  8,  9],
       [10, 11, 12]])

In [34]:
np.vstack((arr1,arr2))

array([[ 1,  2,  3],
       [ 4,  5,  6],
       [ 7,  8,  9],
       [10, 11, 12]])

In [35]:
np.hstack((arr1,arr2))

array([[ 1,  2,  3,  7,  8,  9],
       [ 4,  5,  6, 10, 11, 12]])
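For two-dimensional inputs like these, vstack and hstack give the same results as concatenate along axis 0 and axis 1. The sketch below checks that and, for contrast, shows `np.stack`, which joins arrays along a new axis.

```python
import numpy as np

arr1 = np.array([[1, 2, 3], [4, 5, 6]])
arr2 = np.array([[7, 8, 9], [10, 11, 12]])

rows = np.concatenate([arr1, arr2], axis=0)
cols = np.concatenate([arr1, arr2], axis=1)
print(np.array_equal(np.vstack((arr1, arr2)), rows))  # True
print(np.array_equal(np.hstack((arr1, arr2)), cols))  # True

# np.stack adds a new leading axis instead of extending an existing one
stacked = np.stack([arr1, arr2])
print(stacked.shape)  # (2, 2, 3)
```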

***split, on the other hand, slices apart an array into multiple arrays along an axis:***

In [36]:
arr = np.random.randn(5,2)

In [37]:
arr

array([[ 0.62759792,  0.81979715],
       [ 0.30847778,  0.53597999],
       [ 0.78667828,  1.22697069],
       [-1.97315729,  1.78041135],
       [ 0.73235149, -0.26197522]])

In [38]:
first,second,third = np.split(arr,[1,3]) # the value [1,3] indicates the indices at which to split the array into pieces 

In [39]:
first

array([[0.62759792, 0.81979715]])

In [40]:
second

array([[0.30847778, 0.53597999],
       [0.78667828, 1.22697069]])

In [41]:
third

array([[-1.97315729,  1.78041135],
       [ 0.73235149, -0.26197522]])
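The second argument to split can be either a list of split points or a single integer. A small sketch with deterministic data (an `np.arange` array rather than random values) shows both forms:

```python
import numpy as np

arr = np.arange(10).reshape((5, 2))

# Split points 1 and 3 give rows [0:1], [1:3], and [3:]
first, second, third = np.split(arr, [1, 3])
print(first.shape, second.shape, third.shape)  # (1, 2) (2, 2) (2, 2)

# An integer asks for that many equal-sized pieces along the chosen axis
left, right = np.split(arr, 2, axis=1)
print(left.shape, right.shape)  # (5, 1) (5, 1)

# Concatenating the pieces restores the original array
print(np.array_equal(np.concatenate([first, second, third]), arr))  # True
```

With an integer argument, the axis length must divide evenly; otherwise `np.split` raises an error.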

#### ***Stacking Helpers: r_ and c_***

In [42]:
arr = np.arange(6)

In [43]:
arr1 = arr.reshape((3,2))

In [44]:
arr2 = np.random.randn(3,2)

In [45]:
np.r_[arr1,arr2]

array([[ 0.        ,  1.        ],
       [ 2.        ,  3.        ],
       [ 4.        ,  5.        ],
       [ 0.23669321,  0.62308946],
       [-1.38531346, -0.86860088],
       [ 0.23018646, -2.37609683]])

In [46]:
np.c_[np.r_[arr1,arr2],arr]

array([[ 0.        ,  1.        ,  0.        ],
       [ 2.        ,  3.        ,  1.        ],
       [ 4.        ,  5.        ,  2.        ],
       [ 0.23669321,  0.62308946,  3.        ],
       [-1.38531346, -0.86860088,  4.        ],
       [ 0.23018646, -2.37609683,  5.        ]])

In [47]:
np.c_[arr1,arr2]

array([[ 0.        ,  1.        ,  0.23669321,  0.62308946],
       [ 2.        ,  3.        , -1.38531346, -0.86860088],
       [ 4.        ,  5.        ,  0.23018646, -2.37609683]])

In [48]:
np.c_[1:6,-10:-5]

array([[  1, -10],
       [  2,  -9],
       [  3,  -8],
       [  4,  -7],
       [  5,  -6]])
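For two-dimensional inputs, `r_` and `c_` behave like concatenate along the rows and columns respectively, and slices are expanded into ranges. A brief check with deterministic data (the arrays here are illustrative, not the random ones above):

```python
import numpy as np

arr1 = np.arange(6).reshape((3, 2))
arr2 = np.arange(6, 12).reshape((3, 2))

# r_ stacks along rows, c_ along columns
print(np.array_equal(np.r_[arr1, arr2], np.concatenate([arr1, arr2], axis=0)))  # True
print(np.array_equal(np.c_[arr1, arr2], np.concatenate([arr1, arr2], axis=1)))  # True

# Slices are expanded like np.arange
print(np.r_[1:6])  # [1 2 3 4 5]

# With 1-D pieces, c_ places each one in its own column
pairs = np.c_[1:6, -10:-5]
print(np.array_equal(pairs, np.column_stack((np.arange(1, 6), np.arange(-10, -5)))))  # True
```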

#### ***Repeating Elements: tile and repeat***

*Two useful tools for repeating and replicating arrays to produce larger arrays are the **repeat** and **tile** functions.*

***repeat** replicates each element in an array some number of times, producing a larger array.*

***tile** is a shortcut for stacking copies of an array along an axis.*

In [49]:
arr = np.arange(3)

In [50]:
arr

array([0, 1, 2])

In [51]:
arr.repeat(3)

array([0, 0, 0, 1, 1, 1, 2, 2, 2])

In [52]:
arr.repeat([2,3,4]) # passing an array of integers repeats each element a different number of times

array([0, 0, 1, 1, 1, 2, 2, 2, 2])

In [53]:
# multidimensional arrays can have their elements repeated along a particular axis

arr = np.random.randn(2,2)

In [54]:
arr

array([[ 0.21563322,  0.93231894],
       [ 0.07884797, -0.60083107]])

In [55]:
arr.repeat(2,axis=0)

array([[ 0.21563322,  0.93231894],
       [ 0.21563322,  0.93231894],
       [ 0.07884797, -0.60083107],
       [ 0.07884797, -0.60083107]])

In [56]:
arr.repeat(2) # notice it flattens the array if axis is not passed

array([ 0.21563322,  0.21563322,  0.93231894,  0.93231894,  0.07884797,
        0.07884797, -0.60083107, -0.60083107])

In [57]:
arr.repeat([2,3],axis=1)

array([[ 0.21563322,  0.21563322,  0.93231894,  0.93231894,  0.93231894],
       [ 0.07884797,  0.07884797, -0.60083107, -0.60083107, -0.60083107]])

In [58]:
arr

array([[ 0.21563322,  0.93231894],
       [ 0.07884797, -0.60083107]])

In [59]:
np.tile(arr,2)

array([[ 0.21563322,  0.93231894,  0.21563322,  0.93231894],
       [ 0.07884797, -0.60083107,  0.07884797, -0.60083107]])

In [60]:
arr

array([[ 0.21563322,  0.93231894],
       [ 0.07884797, -0.60083107]])

In [61]:
np.tile(arr,(2,1))

array([[ 0.21563322,  0.93231894],
       [ 0.07884797, -0.60083107],
       [ 0.21563322,  0.93231894],
       [ 0.07884797, -0.60083107]])

In [62]:
np.tile(arr,(3,2))

array([[ 0.21563322,  0.93231894,  0.21563322,  0.93231894],
       [ 0.07884797, -0.60083107,  0.07884797, -0.60083107],
       [ 0.21563322,  0.93231894,  0.21563322,  0.93231894],
       [ 0.07884797, -0.60083107,  0.07884797, -0.60083107],
       [ 0.21563322,  0.93231894,  0.21563322,  0.93231894],
       [ 0.07884797, -0.60083107,  0.07884797, -0.60083107]])
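The difference between the two functions is easier to see with small integers than with random floats. In this sketch, repeat duplicates each row in place, while tile lays down whole copies of the array:

```python
import numpy as np

arr = np.array([[1, 2], [3, 4]])

repeated = arr.repeat(2, axis=0)
print(repeated)
# [[1 2]
#  [1 2]
#  [3 4]
#  [3 4]]

tiled = np.tile(arr, (2, 1))
print(tiled)
# [[1 2]
#  [3 4]
#  [1 2]
#  [3 4]]

# The second argument to tile is the number of copies along each axis
print(np.tile(arr, (3, 2)).shape)  # (6, 4)
```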

#### ***Fancy Indexing Equivalent: take and put***

In [63]:
arr = np.arange(10)*100

In [64]:
arr

array([  0, 100, 200, 300, 400, 500, 600, 700, 800, 900])

In [65]:
inds = [7,2,6,3]

In [66]:
arr[inds]

array([700, 200, 600, 300])

In [67]:
arr.take(inds)

array([700, 200, 600, 300])

In [68]:
arr.put(inds,42)

In [69]:
arr

array([  0, 100,  42,  42, 400, 500,  42,  42, 800, 900])

In [70]:
arr.put(inds,[40,41,43,42])

In [71]:
arr

array([  0, 100,  41,  42, 400, 500,  43,  40, 800, 900])

In [72]:
# to use take along other axes, you can pass the axis keyword
inds = [2,0,2,1]

In [73]:
arr = np.random.randn(2,4)

In [74]:
arr

array([[-0.09663553,  0.35229709,  0.41068387,  1.5435796 ],
       [ 0.59125662, -1.12538739,  1.15570769, -0.39656128]])

In [75]:
arr.take(inds,axis=1)

array([[ 0.41068387, -0.09663553,  0.41068387,  0.35229709],
       [ 1.15570769,  0.59125662,  1.15570769, -1.12538739]])
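As the section title suggests, take and put mirror fancy indexing for reading and writing. The sketch below checks that correspondence on deterministic data; `copy`, `mat`, and `cols` are illustrative names.

```python
import numpy as np

arr = np.arange(10) * 100
inds = [7, 2, 6, 3]

# take matches fancy indexing
print(np.array_equal(arr.take(inds), arr[inds]))  # True

# put writes in place, like fancy-index assignment
copy = arr.copy()
copy[inds] = [40, 41, 43, 42]
arr.put(inds, [40, 41, 43, 42])
print(np.array_equal(arr, copy))  # True

# With axis=1, take selects columns
mat = np.arange(8).reshape((2, 4))
cols = [2, 0, 2, 1]
print(np.array_equal(mat.take(cols, axis=1), mat[:, cols]))  # True
```

Note that put does not accept an axis argument; it indexes into the flattened version of the array.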

### ***Broadcasting***

*Broadcasting governs how operations work between arrays of different shapes.*

In [76]:
arr= np.arange(5)

In [77]:
arr

array([0, 1, 2, 3, 4])

In [78]:
arr*4 # the scalar value 4 is broadcast to every element of arr in the multiplication

array([ 0,  4,  8, 12, 16])
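The rule behind this is that two shapes are compatible when, comparing dimensions from the trailing end, each pair is either equal or contains a 1 (missing leading dimensions count as 1). Here is a rough sketch of that rule; `broadcast_shape` is a hypothetical helper written for illustration, and its result is compared against what NumPy itself reports through `np.broadcast`.

```python
import numpy as np

def broadcast_shape(shape_a, shape_b):
    """Hypothetical helper: apply the broadcasting rule to two shapes."""
    ndim = max(len(shape_a), len(shape_b))
    # Left-pad the shorter shape with 1s so both have the same length
    a = (1,) * (ndim - len(shape_a)) + tuple(shape_a)
    b = (1,) * (ndim - len(shape_b)) + tuple(shape_b)
    result = []
    for dim_a, dim_b in zip(a, b):
        if dim_a == dim_b or dim_a == 1 or dim_b == 1:
            # A length-1 dimension stretches to match the other one
            result.append(dim_b if dim_a == 1 else dim_a)
        else:
            raise ValueError(f"shapes {shape_a} and {shape_b} are not compatible")
    return tuple(result)

pairs = [((4, 3), (3,)), ((4, 3), (4, 1)), ((3, 4, 5), (3, 4, 1)), ((5,), ())]
for sa, sb in pairs:
    expected = np.broadcast(np.empty(sa), np.empty(sb)).shape
    print(sa, sb, "->", broadcast_shape(sa, sb), expected)
```

A pair such as `(4, 3)` and `(4,)` fails the rule, because the trailing dimensions 3 and 4 differ and neither is 1.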

In [79]:
arr = np.random.randn(4,3)

In [80]:
arr

array([[-0.39944912,  0.34825583, -0.01394106],
       [-0.90059731,  0.05042898, -1.09216457],
       [-0.2305115 ,  2.16578349, -2.31892095],
       [ 0.32282304,  0.66212233,  0.30563256]])

In [81]:
arr.mean(0)

array([-0.30193372,  0.80664766, -0.77984851])

In [82]:
demeaned = arr-arr.mean(0)

In [83]:
demeaned

array([[-0.09751539, -0.45839183,  0.76590744],
       [-0.59866359, -0.75621868, -0.31231606],
       [ 0.07142223,  1.35913583, -1.53907245],
       [ 0.62475676, -0.14452533,  1.08548106]])

In [84]:
arr

array([[-0.39944912,  0.34825583, -0.01394106],
       [-0.90059731,  0.05042898, -1.09216457],
       [-0.2305115 ,  2.16578349, -2.31892095],
       [ 0.32282304,  0.66212233,  0.30563256]])

In [85]:
row_means = arr.mean(1)

In [86]:
row_means

array([-0.02171145, -0.6474443 , -0.12788299,  0.43019264])

In [87]:
row_means.shape

(4,)

In [88]:
row_means.reshape((4,1))

array([[-0.02171145],
       [-0.6474443 ],
       [-0.12788299],
       [ 0.43019264]])

In [89]:
demeaned = arr-row_means.reshape((4,1))

In [90]:
demeaned

array([[-0.37773767,  0.36996728,  0.00777039],
       [-0.25315302,  0.69787328, -0.44472027],
       [-0.10262851,  2.29366648, -2.19103797],
       [-0.1073696 ,  0.23192969, -0.12456008]])

In [91]:
demeaned.mean(1)

array([-5.20417043e-18,  3.70074342e-17,  0.00000000e+00, -1.85037171e-17])

In [92]:
arr-arr.mean(1).reshape((4,1))

array([[-0.37773767,  0.36996728,  0.00777039],
       [-0.25315302,  0.69787328, -0.44472027],
       [-0.10262851,  2.29366648, -2.19103797],
       [-0.1073696 ,  0.23192969, -0.12456008]])
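The reshape to `(4, 1)` is needed because a `(4,)` array of row means does not broadcast against a `(4, 3)` array. An alternative sketch, assuming the `keepdims` argument of `mean` and a seeded generator for reproducibility:

```python
import numpy as np

rng = np.random.default_rng(0)
arr = rng.standard_normal((4, 3))

# Without reshaping, shapes (4, 3) and (4,) are incompatible
try:
    arr - arr.mean(1)
except ValueError as exc:
    print("ValueError:", exc)

# keepdims=True leaves the reduced axis in place with length 1
row_means = arr.mean(axis=1, keepdims=True)
print(row_means.shape)  # (4, 1)

demeaned = arr - row_means
print(np.allclose(demeaned.mean(axis=1), 0))  # True
print(np.allclose(demeaned, arr - arr.mean(1).reshape((4, 1))))  # True
```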

In [93]:
arr = np.zeros((4,4))

In [94]:
arr_3d = arr[:,np.newaxis,:]

In [95]:
arr_3d.shape

(4, 1, 4)

In [96]:
arr_3d

array([[[0., 0., 0., 0.]],

       [[0., 0., 0., 0.]],

       [[0., 0., 0., 0.]],

       [[0., 0., 0., 0.]]])

In [97]:
arr_1d = np.random.normal(size=3)

In [98]:
arr_1d

array([ 1.48883856, -1.03475736, -0.11668608])

In [99]:
arr_1d[:,np.newaxis]

array([[ 1.48883856],
       [-1.03475736],
       [-0.11668608]])

In [100]:
arr_1d[np.newaxis,:]

array([[ 1.48883856, -1.03475736, -0.11668608]])

In [101]:
arr = np.random.randn(3,4,5)

In [102]:
depth_means = arr.mean(2)

In [103]:
depth_means

array([[ 0.33501239,  0.22620632, -0.30178896, -0.16890987],
       [-0.14001324, -0.00210464, -0.0064693 , -0.150707  ],
       [-0.07509772,  0.30277673,  0.07868871,  0.08624097]])

In [104]:
depth_means.shape

(3, 4)

In [105]:
demeaned = arr - depth_means[:,:,np.newaxis]

In [106]:
demeaned.mean(2)

array([[ 0.00000000e+00,  0.00000000e+00, -3.33066907e-17,
         0.00000000e+00],
       [-1.11022302e-17, -4.44089210e-17,  0.00000000e+00,
         0.00000000e+00],
       [-2.22044605e-17,  3.33066907e-17,  2.22044605e-17,
         1.38777878e-17]])
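The same pattern can be generalized to any axis by building the `np.newaxis` index programmatically. The function below, `demean_axis`, is a sketch written for illustration rather than a NumPy function:

```python
import numpy as np

def demean_axis(arr, axis=0):
    """Subtract the mean along `axis`, whatever the number of dimensions."""
    means = arr.mean(axis)
    # Build an index like [:, :, np.newaxis], with newaxis at the reduced axis
    indexer = [slice(None)] * arr.ndim
    indexer[axis] = np.newaxis
    return arr - means[tuple(indexer)]

rng = np.random.default_rng(0)
arr = rng.standard_normal((3, 4, 5))

for axis in range(arr.ndim):
    result = demean_axis(arr, axis)
    print(axis, result.shape, np.allclose(result.mean(axis), 0))
```

Converting `indexer` to a tuple before indexing matters, since indexing with a plain list would be interpreted as fancy indexing.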