# NumPy Basics

In [13]:
import numpy as np

In [14]:
np.__name__

'numpy'

In [15]:
np.__version__

'1.19.2'

In [16]:
np.__doc__

'\nNumPy\n=====\n\nProvides\n  1. An array object of arbitrary homogeneous items\n  2. Fast mathematical operations over arrays\n  3. Linear Algebra, Fourier Transforms, Random Number Generation\n\nHow to use the documentation\n----------------------------\nDocumentation is available in two forms: docstrings provided\nwith the code, and a loose standing reference guide, available from\n`the NumPy homepage <https://www.scipy.org>`_.\n\nWe recommend exploring the docstrings using\n`IPython <https://ipython.org>`_, an advanced Python shell with\nTAB-completion and introspection capabilities.  See below for further\ninstructions.\n\nThe docstring examples assume that `numpy` has been imported as `np`::\n\n  >>> import numpy as np\n\nCode snippets are indicated by three greater-than signs::\n\n  >>> x = 42\n  >>> x = x + 1\n\nUse the built-in ``help`` function to view a function\'s docstring::\n\n  >>> help(np.sort)\n  ... # doctest: +SKIP\n\nFor some objects, ``np.info(obj)`` may provide a

A single integer in Python 3.4 actually contains four pieces:
* `ob_refcnt`, a reference count that helps Python silently handle memory allocation and deallocation
* `ob_type`, which encodes the type of the variable
* `ob_size`, which specifies the size of the following data members
* `ob_digit`, which contains the actual integer value that we expect the Python variable to represent

PyObject_HEAD is the part of the structure containing the reference count, type code, and other pieces mentioned before.

A C integer is essentially a label for a position in memory whose bytes encode an integer value. A Python integer is a pointer to a position in memory containing all the Python object information, including the bytes that contain the integer value. This extra information in the Python integer structure is what allows Python to be coded so freely and dynamically. All this additional information in Python types comes at a cost, however, which becomes especially apparent in structures that combine many of these objects.

At the implementation level, the array essentially contains a single pointer to one contiguous block of data. The Python list, on the other hand, contains a pointer to a block of pointers, each of which in turn points to a full Python object like the Python integer we saw earlier. Again, the advantage of the list is flexibility: because each list element is a full structure containing both data and type information, the list can be filled with data of any desired type. Fixed-type NumPy-style arrays lack this flexibility, but are much more efficient for storing and manipulating data.
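
The storage difference described above can be made concrete with a rough measurement. The following is a minimal sketch, assuming CPython on a 64-bit platform; the names `py_list` and `np_arr` are illustrative, and `sys.getsizeof` only approximates the true footprint of a list and its elements.

```python
import sys
import numpy as np

n = 1000
py_list = list(range(n))
np_arr = np.arange(n, dtype=np.int64)

# A list stores pointers; each element is a separate Python int object
list_bytes = sys.getsizeof(py_list) + sum(sys.getsizeof(v) for v in py_list)

# An array stores the raw values in one contiguous buffer
arr_bytes = np_arr.nbytes

print("list (container + element objects):", list_bytes, "bytes")
print("array data buffer:", arr_bytes, "bytes")
```

The exact list figure depends on the Python build, but the array buffer is simply eight bytes per `int64` element.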

### Lists and arrays

In [17]:
a = [3,4,5] #list

In [18]:
id(a)

140341552795072

In [19]:
b = ['4', 7.4, 8]  # a list can hold elements of multiple types,
# because each element is a separate Python object and the list stores a pointer to each one

In [20]:
id(b) 

140343162180864

In [21]:
c = np.array([a,b])

In [22]:
c  # a NumPy array holds elements of a single data type,
# so it is faster to manipulate than a list, but a bit less flexible
# when the input lists mix data types, NumPy converts every element to a common type (here, strings)

array([['3', '4', '5'],
       ['4', '7.4', '8']], dtype='<U21')

In [23]:
a[0] = 9 # 1st element is altered

In [24]:
a

[9, 4, 5]

In [25]:
c  # the NumPy array does not change, because np.array copied the list data

array([['3', '4', '5'],
       ['4', '7.4', '8']], dtype='<U21')

In [26]:
c[0] = 5  # sets every element in the first row of c; the list a is unaffected

In [27]:
a

[9, 4, 5]

In [28]:
c = np.array([a,[3.3,5,6]])

In [29]:
c

array([[9. , 4. , 5. ],
       [3.3, 5. , 6. ]])

In [30]:
d = np.array(a)

In [31]:
d.shape

(3,)

In [32]:
a[0] = 0

In [33]:
d

array([9, 4, 5])
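
The cells above suggest two behaviors that are worth checking directly: `np.array` copies the data out of the list, and mixed numeric inputs are promoted to a common type. A small sketch of both, with illustrative variable names:

```python
import numpy as np

a = [3, 4, 5]
d = np.array(a)      # the list's values are copied into a new buffer
a[0] = 0             # changing the list afterwards...
print(d)             # ...leaves the array untouched

# Mixing ints and floats promotes every element to float
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)
```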

### Creating arrays from Python lists

Python's built-in `array` module can create dense arrays of a uniform type. NumPy does not build on that module; it provides its own array type, the `ndarray`.

While Python’s `array` object provides efficient storage of array-based data, NumPy's `ndarray` adds to this efficient operations on that data. We will explore these operations in later sections; here we’ll demonstrate several ways of creating a NumPy array.

In [34]:
np.array([1, 4, 2, 5, 3])

array([1, 4, 2, 5, 3])

In [35]:
np.array([3.14, 4, 2, 3])

array([3.14, 4.  , 2.  , 3.  ])

In [36]:
np.array([1, 2, 3, 4], dtype='float32')

array([1., 2., 3., 4.], dtype=float32)

In [37]:
np.array([1, 2, 3, 4], dtype=str)

array(['1', '2', '3', '4'], dtype='<U1')

In [38]:
np.array([1, 2, 3, 4], dtype='str')

array(['1', '2', '3', '4'], dtype='<U1')

In [39]:
np.array([range(i, i + 3) for i in [2, 4, 6]])

array([[2, 3, 4],
       [4, 5, 6],
       [6, 7, 8]])

### Creating Arrays from Scratch

In [40]:
# Create a 10x10 integer array filled with zeros
np.zeros([10,10], dtype=int)

array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]])

In [41]:
# Create a 3x5 floating-point array filled with ones
np.ones([3, 5], dtype=float)

array([[1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.]])

In [42]:
# Create a 3x5 array filled with 3.14 
np.full((3, 5), 3.14)

array([[3.14, 3.14, 3.14, 3.14, 3.14],
       [3.14, 3.14, 3.14, 3.14, 3.14],
       [3.14, 3.14, 3.14, 3.14, 3.14]])

In [43]:
np.full((3, 5), np.pi)

array([[3.14159265, 3.14159265, 3.14159265, 3.14159265, 3.14159265],
       [3.14159265, 3.14159265, 3.14159265, 3.14159265, 3.14159265],
       [3.14159265, 3.14159265, 3.14159265, 3.14159265, 3.14159265]])

In [44]:
# Create an array filled with a linear sequence
# Starting at 0, ending at 20, stepping by 2
# (this is similar to the built-in range() function)
np.arange(0, 20, 2)

array([ 0,  2,  4,  6,  8, 10, 12, 14, 16, 18])

In [45]:
np.arange(0, 20, 2).reshape(2,5)

array([[ 0,  2,  4,  6,  8],
       [10, 12, 14, 16, 18]])

In [46]:
# Create an array of five values evenly spaced between 0 and 1 
np.linspace(0, 1, 5)

array([0.  , 0.25, 0.5 , 0.75, 1.  ])

In [47]:
# Create a 3x3 array of uniformly distributed random values in [0, 1)
np.random.random((3, 3))

array([[0.547242  , 0.92875906, 0.68380791],
       [0.36860524, 0.74847228, 0.00200374],
       [0.88786353, 0.36829402, 0.15570853]])

In [48]:
# Create a 3x3 array of normally distributed random values
# with mean 0 and standard deviation 1
np.random.normal(0, 1, (3, 3))

array([[-0.80582124,  1.30216285,  0.26691602],
       [-0.546633  ,  0.48320273, -0.17183117],
       [-0.38437868, -1.71684094, -0.03122701]])

In [49]:
# Create a 3x3 array of random integers in the interval [0, 10)
np.random.randint(0, 10, (3, 3))

array([[1, 8, 3],
       [0, 6, 0],
       [9, 8, 2]])

In [50]:
np.random.seed()

In [51]:
# Create a 3x3 identity matrix 
np.eye(3)

array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])

In [52]:
# Create an uninitialized array of three floats (the default dtype)
# The values will be whatever happens to already exist at that memory location
np.empty(3)

array([1., 1., 1.])

In [53]:
np.empty((3,3))

array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])

In [54]:
np.empty([3,3])

array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])

NumPy arrays contain values of a single type, so it is important to have detailed knowledge of those types and their limitations. Because NumPy is built in C, the types will be familiar to users of C, Fortran, and other related languages.
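
Because each array has one fixed type, that type's range and precision apply to every element. The sketch below uses `np.iinfo` and `np.finfo` to inspect a few common types; the choice of `int8`, `int16`, and `float32` is only for illustration.

```python
import numpy as np

# Integer types have a fixed range
info8 = np.iinfo(np.int8)
print(info8.min, info8.max)            # -128 127

# Floating-point types have limited precision
print(np.finfo(np.float32).eps)        # gap between 1.0 and the next float32

# The dtype determines how many bytes each element uses
small = np.zeros(4, dtype=np.int16)
large = np.zeros(4, dtype=np.int64)
print(small.itemsize, large.itemsize)  # 2 8
```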

In [55]:
np.diag([5])

array([[5]])

<insert figure>

In [56]:
A = np.random.randn(4,4)
A

array([[-2.46761746, -1.23199465,  2.52767526, -0.87525496],
       [ 0.26484056, -1.19916063,  1.91655928, -0.30702834],
       [ 0.61255439, -1.08275293,  1.47066464,  1.04540809],
       [-0.29947627, -0.40535738, -0.04029403, -0.33097644]])

In [57]:
dia = np.diag(A)
dia

array([-2.46761746, -1.19916063,  1.47066464, -0.33097644])

In [58]:
U = np.triu(A)
U

array([[-2.46761746, -1.23199465,  2.52767526, -0.87525496],
       [ 0.        , -1.19916063,  1.91655928, -0.30702834],
       [ 0.        ,  0.        ,  1.47066464,  1.04540809],
       [ 0.        ,  0.        ,  0.        , -0.33097644]])

In [59]:
L = np.tril(A)
L

array([[-2.46761746,  0.        ,  0.        ,  0.        ],
       [ 0.26484056, -1.19916063,  0.        ,  0.        ],
       [ 0.61255439, -1.08275293,  1.47066464,  0.        ],
       [-0.29947627, -0.40535738, -0.04029403, -0.33097644]])

<insert figure>

1. __Attributes of arrays__
_Determining the size, shape, memory consumption, and data types of arrays_

2. __Indexing of arrays__
_Getting and setting the value of individual array elements_

3. __Slicing of arrays__
_Getting and setting smaller subarrays within a larger array_

4. __Reshaping of arrays__
_Changing the shape of a given array_

5. __Joining and splitting of arrays__
_Combining multiple arrays into one, and splitting one array into many_


### 1. __Attributes of arrays__

In [60]:
np.random.seed() 

In [61]:
np.random.seed(0) # seed for reproducibility

In [133]:
x1 = np.random.randint(10, size=6) # One-dimensional array
x2 = np.random.randint(10, size=(3, 4)) # Two-dimensional array
x3 = np.random.randint(10, size=(3, 4, 5)) # Three-dimensional array

In [63]:
print("x3 ndim: ", x3.ndim) 
print("x3 shape:", x3.shape)
print("x3 size: ", x3.size)

x3 ndim:  3
x3 shape: (3, 4, 5)
x3 size:  60


In [64]:
print("x1 ndim: ", x1.ndim) 
print("x1 shape:", x1.shape)
print("x1 size: ", x1.size)

x1 ndim:  1
x1 shape: (6,)
x1 size:  6


In [65]:
x1

array([5, 0, 3, 3, 7, 9])

In [66]:
print("dtype:", x3.dtype)

dtype: int64


Other attributes include `itemsize`, which lists the size (in bytes) of each array element, and `nbytes`, which lists the total size (in bytes) of the array:

In [67]:
print("itemsize:", x3.itemsize, "bytes")
print("nbytes:", x3.nbytes, "bytes")

itemsize: 8 bytes
nbytes: 480 bytes


In general, we expect that nbytes is equal to itemsize times size.
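
This relationship is easy to check for several element types. A minimal sketch, using a freshly created array of the same shape as `x3` (the loop and the name `arr` are illustrative):

```python
import numpy as np

for dtype in (np.int8, np.float32, np.int64):
    arr = np.zeros((3, 4, 5), dtype=dtype)
    # nbytes should match itemsize * size for every dtype
    print(arr.dtype, arr.itemsize, arr.size, arr.nbytes)
    assert arr.nbytes == arr.itemsize * arr.size
```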

### 2. __Indexing of arrays__

In [68]:
x1

array([5, 0, 3, 3, 7, 9])

In [69]:
x1[0]

5

In [70]:
x1[-1]

9

In [71]:
x2

array([[3, 5, 2, 4],
       [7, 6, 8, 8],
       [1, 6, 7, 7]])

In [72]:
x2[0, 0]

3

In [73]:
x2[0]

array([3, 5, 2, 4])

In [74]:
x2[0][0]

3

In [75]:
x2[2, -1]

7

In [76]:
x2[2] [-1]

7

In [77]:
# Values can also be modified using any of the above index notations:

In [78]:
x2[0, 0] = 12
x2

array([[12,  5,  2,  4],
       [ 7,  6,  8,  8],
       [ 1,  6,  7,  7]])

In [79]:
x1[0] = 3.14159  # this will be truncated, because x1 has an integer dtype

In [80]:
x1

array([3, 0, 3, 3, 7, 9])

### 3. __Slicing of arrays__

__Syntax:__

`x[start:stop:step]`

Here `x` is the name of a NumPy array. If any of these are unspecified, they default to `start=0`, `stop=`*size of dimension*, and `step=1`. When `step` is negative, the defaults for `start` and `stop` are swapped.
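
The defaults can be inspected without guessing. The sketch below relies on Python's built-in `slice.indices` method, which resolves omitted fields for a given length; the array `x` here is only an example.

```python
import numpy as np

x = np.arange(10)

# Omitted fields are equivalent to the explicit defaults
print(np.array_equal(x[:], x[0:x.size:1]))    # True

# slice.indices(length) resolves the defaults for a given length
print(slice(None, None, 1).indices(x.size))   # (0, 10, 1)
print(slice(None, None, -1).indices(x.size))  # (9, -1, -1)
```

With a negative step, the resolved start is the last index and the stop lies just before the first element, which is why `x[::-1]` reverses the array.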



In [81]:
x = np.arange(10) 

In [82]:
x

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [83]:
x[:5] # first five elements

array([0, 1, 2, 3, 4])

In [84]:
x[5:] # elements from index 5 onward

array([5, 6, 7, 8, 9])

In [85]:
x[4:7] # middle subarray

array([4, 5, 6])

In [86]:
x[::2] # every other element

array([0, 2, 4, 6, 8])

In [87]:
x[1::2] # every other element, starting at index 1

array([1, 3, 5, 7, 9])

In [88]:
x[::-1] # all elements, reversed

array([9, 8, 7, 6, 5, 4, 3, 2, 1, 0])

In [89]:
x[5::-2] # reversed every other from index 5

array([5, 3, 1])

#### Multidimensional subarrays

In [134]:
x2

array([[3, 7, 5, 5],
       [0, 1, 5, 9],
       [3, 0, 5, 0]])

In [142]:
y = x2[:2, :3].copy()  # copy of the first two rows and first three columns

In [143]:
y 

array([[10000,     7,     5],
       [    0,     1,     5]])

In [144]:
y[0, 0] = 2000  # modifies only the copy y, not x2

In [138]:
x2

array([[10000,     7,     5,     5],
       [    0,     1,     5,     9],
       [    3,     0,     5,     0]])

In [92]:
x2[:3, ::2] # all rows, every other column


array([[12,  2],
       [ 7,  8],
       [ 1,  7]])

In [93]:
x2[::-1, ::-1]

array([[ 7,  7,  6,  1],
       [ 8,  8,  6,  7],
       [ 4,  2,  5, 12]])

#### Accessing array rows and columns

One commonly needed routine is accessing single rows or columns of an array. You can do this by combining indexing and slicing, using an empty slice marked by a single colon (`:`):

In [94]:
print(x2[0, :]) # first row of x2


[12  5  2  4]


In [95]:
print(x2[:, 0]) # first column of x2

[12  7  1]


In the case of row access, the empty slice can be omitted for a more compact syntax:

In [96]:
print(x2[0]) # equivalent to x2[0, :]

[12  5  2  4]


#### Subarrays as no-copy views

One important thing to know about array slices is that they return views rather than copies of the array data. This is one area in which NumPy array slicing differs from Python list slicing: in lists, slices will be copies. Consider our two-dimensional array from before:

In [97]:
print(x2)

[[12  5  2  4]
 [ 7  6  8  8]
 [ 1  6  7  7]]


In [98]:
x2_sub = x2[:2, :2] 
print(x2_sub)

[[12  5]
 [ 7  6]]


In [99]:
x2_sub[0, 0] = 99 
print(x2_sub)

[[99  5]
 [ 7  6]]


In [100]:
print(x2)

[[99  5  2  4]
 [ 7  6  8  8]
 [ 1  6  7  7]]


This default behavior is actually quite useful: it means that when we work with large datasets, we can access and process pieces of these datasets without the need to copy the underlying data buffer.

#### __Creating copies of arrays__

In [101]:
x2_sub_copy = x2[:2, :2].copy() 
print(x2_sub_copy)

[[99  5]
 [ 7  6]]


In [102]:
x2_sub_copy[0, 0] = 42 
print(x2_sub_copy)

[[42  5]
 [ 7  6]]


In [103]:
print(x2)


[[99  5  2  4]
 [ 7  6  8  8]
 [ 1  6  7  7]]
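
Whether a subarray is a view or a copy can also be checked directly with `np.shares_memory`. The following sketch uses a fresh array and illustrative names (`sub_view`, `sub_copy`), and contrasts the behavior with Python list slicing:

```python
import numpy as np

x2 = np.arange(12).reshape(3, 4)

sub_view = x2[:2, :2]          # slicing returns a view
sub_copy = x2[:2, :2].copy()   # .copy() allocates new memory

print(np.shares_memory(x2, sub_view))  # True
print(np.shares_memory(x2, sub_copy))  # False

# Python lists behave differently: a list slice is a (shallow) copy
items = [1, 2, 3, 4]
part = items[:2]
part[0] = 99
print(items)  # [1, 2, 3, 4]
```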


### 4. __Reshaping of arrays__

In [104]:
grid = np.arange(1, 10).reshape((3, 3)) 
print(grid)

[[1 2 3]
 [4 5 6]
 [7 8 9]]


Note that for this to work, the size of the initial array must match the size of the reshaped array. Where possible, the `reshape` method will use a no-copy view of the initial array, but with noncontiguous memory buffers this is not always the case.
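
Both conditions can be observed in a short experiment. This is a sketch under simple assumptions: a transposed array stands in for a noncontiguous buffer, and `np.shares_memory` is used to tell a view from a copy.

```python
import numpy as np

a = np.arange(6)

# Contiguous data: reshape can return a view
r = a.reshape((2, 3))
print(np.shares_memory(a, r))        # True

# A transposed array is not C-contiguous, so flattening it needs a copy
t = r.T
flat = t.reshape(6)
print(np.shares_memory(t, flat))     # False

# Sizes must match: 6 elements cannot fill a 4x2 array
try:
    a.reshape((4, 2))
    size_mismatch_rejected = False
except ValueError as err:
    size_mismatch_rejected = True
    print("reshape failed:", err)
```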


Another common reshaping pattern is the conversion of a one-dimensional array into a two-dimensional row or column matrix. You can do this with the `reshape` method, or more easily by making use of the `np.newaxis` keyword within a slice operation:

In [105]:
x = np.array([1, 2, 3])

In [106]:
x.ndim

1

In [107]:
x.shape

(3,)

In [108]:
x.size

3

In [109]:
# row vector via reshape
y = x.reshape((1, 3))
y

array([[1, 2, 3]])

In [110]:
y.shape

(1, 3)

In [111]:
# row vector via newaxis 
x[np.newaxis, :].ndim

2

In [112]:
# column vector via reshape
x.reshape((3, 1))

array([[1],
       [2],
       [3]])

In [113]:
# column vector via newaxis
x[:, np.newaxis]

array([[1],
       [2],
       [3]])

In [114]:
x

array([1, 2, 3])

#### Array Concatenation

In [115]:
x = np.array([1, 2, 3])
y = np.array([3, 2, 1])
np.concatenate([x, y])

array([1, 2, 3, 3, 2, 1])

In [116]:
np.concatenate([[x], [y]])

array([[1, 2, 3],
       [3, 2, 1]])

In [117]:
np.concatenate([[x, y]])

array([[1, 2, 3],
       [3, 2, 1]])

In [118]:
np.concatenate([[x.T], [y.T]])  # .T is a no-op on 1-D arrays, so the result is the same as above

array([[1, 2, 3],
       [3, 2, 1]])

In [119]:
z = [99, 99, 99] 
print(np.concatenate([x, y, z]))

[ 1  2  3  3  2  1 99 99 99]


In [120]:
grid = np.array([[1, 2, 3], [4, 5, 6]])
grid

array([[1, 2, 3],
       [4, 5, 6]])

In [121]:
# concatenate along the first axis 
np.concatenate([grid, grid])


array([[1, 2, 3],
       [4, 5, 6],
       [1, 2, 3],
       [4, 5, 6]])

In [122]:
np.concatenate([grid, grid], axis=1)

array([[1, 2, 3, 1, 2, 3],
       [4, 5, 6, 4, 5, 6]])

For working with arrays of mixed dimensions, it can be clearer to use the np.vstack (vertical stack) and np.hstack (horizontal stack) functions:

In [123]:
x = np.array([1, 2, 3]) 
grid = np.array([[9, 8, 7],[6, 5, 4]])

In [124]:
np.vstack([x, grid])

array([[1, 2, 3],
       [9, 8, 7],
       [6, 5, 4]])

In [125]:
# horizontally stack the arrays
y = np.array([[99],[99]]) 
np.hstack([grid, y])

array([[ 9,  8,  7, 99],
       [ 6,  5,  4, 99]])

Similarly, `np.dstack` will stack arrays along the third axis.

For concatenation to work, the arrays must have the same size along every axis except the one being joined.
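
The shape requirement can be seen by joining arrays that agree on one axis but not the other. A minimal sketch with made-up arrays `a`, `b`, and `tall`:

```python
import numpy as np

a = np.zeros((2, 3))
b = np.ones((2, 3))

# dstack joins along a third axis
print(np.dstack([a, b]).shape)                   # (2, 3, 2)

# Axis 0 may differ in length when joining along axis 0
tall = np.zeros((3, 3))
print(np.concatenate([a, tall], axis=0).shape)   # (5, 3)

# ...but joining along axis 1 then fails, because axis 0 does not match
try:
    np.concatenate([a, tall], axis=1)
    mismatch_rejected = False
except ValueError:
    mismatch_rejected = True
print(mismatch_rejected)                         # True
```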

#### Splitting of arrays

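The opposite of concatenation is splitting, which is implemented by the functions `np.split`, `np.hsplit`, and `np.vsplit`. For each of these, we can pass a list of indices giving the split points: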

In [126]:
x = [1, 2, 3, 99, 99, 3, 2, 1]
x1, x2, x3 = np.split(x, [3, 5])
print(x1, x2, x3)

[1 2 3] [99 99] [3 2 1]


Notice that N split points lead to N + 1 subarrays. The related functions np.hsplit and np.vsplit are similar:

In [127]:
grid = np.arange(16).reshape((4, 4)) 
grid

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15]])

In [128]:
upper, mid, lower = np.vsplit(grid, [1, 3])
print(upper)
print(mid)
print(lower)

[[0 1 2 3]]
[[ 4  5  6  7]
 [ 8  9 10 11]]
[[12 13 14 15]]


In [129]:
left, right = np.hsplit(grid, [2]) 
print(left)
print(right)

[[ 0  1]
 [ 4  5]
 [ 8  9]
 [12 13]]
[[ 2  3]
 [ 6  7]
 [10 11]
 [14 15]]


Similarly, `np.dsplit` will split arrays along the third axis.
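
As a brief illustration of `np.dsplit`, the sketch below splits a small three-dimensional array; the array `cube` and the split points are arbitrary choices for the example.

```python
import numpy as np

cube = np.arange(24).reshape((2, 3, 4))

# One split point along the third axis gives two subarrays
front, back = np.dsplit(cube, [1])
print(front.shape)  # (2, 3, 1)
print(back.shape)   # (2, 3, 3)

# Two split points give three subarrays
parts = np.dsplit(cube, [1, 3])
print(len(parts))   # 3
```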

In [130]:
grid[grid > 5]  # boolean mask: select the elements of grid greater than 5

array([ 6,  7,  8,  9, 10, 11, 12, 13, 14, 15])