# **Data Analysis with Python - 1  (23 Apr 22)**

## **Pre-Class**

### Introduction

In [4]:
import numpy as np
a = np.array([1, 2, 3])
a

array([1, 2, 3])

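Three main benefits of a NumPy array over a Python list:
- Less memory: elements share one fixed-size dtype and are stored in a contiguous block
- Faster: operations run in compiled code rather than in a Python loop
- More convenient: arithmetic and many other operations apply element-wise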
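
A minimal sketch of the convenience point. The data and variable names are illustrative and not part of the class notes.

```python
import numpy as np

prices_list = [10, 20, 30]          # hypothetical sample data
prices_arr = np.array(prices_list)

# A list needs a comprehension to double every element
doubled_list = [p * 2 for p in prices_list]

# An array applies the operation element-wise in one expression
doubled_arr = prices_arr * 2

print(doubled_list)   # [20, 40, 60]
print(doubled_arr)    # [20 40 60]
```

Note that `prices_list * 2` would repeat the list instead of doubling its values.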

In [2]:
a[0]

1

In [3]:
a[1]

2

In [5]:
import sys

list1 = range(1000)
# approximate size of 1000 Python int objects (getsizeof(5) bytes each)
print(sys.getsizeof(5)*len(list1))

array = np.arange(1000)
print(array.size * array.itemsize)

28000
4000


In [6]:
np.__version__

'1.20.3'

### Creating NumPy arrays using built-in methods

`.arange` Returns an ndarray object containing evenly spaced values within a given range. 

In [7]:
np.arange(0, 50, 5)

array([ 0,  5, 10, 15, 20, 25, 30, 35, 40, 45])

In [8]:
type(np.arange(0, 50, 5))

numpy.ndarray

`.linspace` Returns a specified number of evenly spaced values over a given interval; the endpoint is included by default.

In [11]:
# 5 evenly spaced points from 0 to 30 (4 equal intervals)
np.linspace(0, 30, 5)

array([ 0. ,  7.5, 15. , 22.5, 30. ])
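
The difference between the two functions can be sketched as follows: `arange` takes a step and excludes the stop value, while `linspace` takes a count and includes the stop value unless told otherwise. The variable names are illustrative.

```python
import numpy as np

# arange: fixed step, stop value is excluded
by_step = np.arange(0, 30, 7.5)          # values: 0, 7.5, 15, 22.5

# linspace: fixed number of points, stop value is included by default
by_count = np.linspace(0, 30, 5)         # values: 0, 7.5, 15, 22.5, 30

# endpoint=False makes linspace exclude the stop value as well
open_ended = np.linspace(0, 30, 4, endpoint=False)   # values: 0, 7.5, 15, 22.5

print(len(by_step), len(by_count), len(open_ended))  # 4 5 4
```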

`.random.rand` Returns random floats in the half-open interval [0.0, 1.0).

In [24]:
np.random.seed(5) # to get the same results every time
np.random.rand(3)

array([0.22199317, 0.87073231, 0.20671916])
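
As an alternative to the global `np.random.seed`, NumPy 1.17 and later also provide a `Generator` object through `np.random.default_rng`. The sketch below is not from the class; it only shows that the same seed reproduces the same values. The numbers drawn will differ from the `np.random.rand` output above because the two APIs use different algorithms.

```python
import numpy as np

# Two generators created with the same seed produce the same stream
rng_a = np.random.default_rng(5)
rng_b = np.random.default_rng(5)

sample_a = rng_a.random(3)   # floats in [0.0, 1.0)
sample_b = rng_b.random(3)

print(np.array_equal(sample_a, sample_b))  # True

# Integers from [0, 20), the Generator counterpart of randint
ints = rng_a.integers(0, 20, size=6)
print(ints.shape)  # (6,)
```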

`.zeros` Returns a new array of specified size, filled with zeros

In [25]:
np.zeros(7, dtype=int)

array([0, 0, 0, 0, 0, 0, 0])

`.ones` Returns a new array of specified size and type, filled with ones

In [29]:
np.ones(5) # default data type is float

array([1., 1., 1., 1., 1.])
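
Both functions also accept a shape tuple for multi-dimensional arrays. A short sketch, which additionally uses `np.ones_like` and `np.full` (neither was covered in class):

```python
import numpy as np

grid = np.zeros((2, 3), dtype=int)    # 2 rows, 3 columns of zeros
ones_like_grid = np.ones_like(grid)   # same shape and dtype as grid
sevens = np.full(4, 7)                # four elements, all equal to 7

print(grid.shape)                          # (2, 3)
print(ones_like_grid.dtype == grid.dtype)  # True
print(sevens)                              # [7 7 7 7]
```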

### Array attributes and methods

`.shape` The shape tuple gives the lengths of the corresponding array dimensions.

In [32]:
arr = np.arange(0, 50, 10)
print(arr)
arr.shape

[ 0 10 20 30 40]


(5,)

`.reshape` Gives a new shape to an array without changing its data.

In [36]:
arr.reshape(1, 5).shape

(1, 5)
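
A sketch of two details of `reshape`, using a six-element array chosen for illustration: one dimension can be given as `-1` so that NumPy infers it, and the new shape must hold exactly the same number of elements.

```python
import numpy as np

arr = np.arange(0, 60, 10)        # six elements: 0, 10, 20, 30, 40, 50

as_rows = arr.reshape(2, 3)       # 2 rows, 3 columns
inferred = arr.reshape(-1, 2)     # -1 lets NumPy infer the row count (3)

print(as_rows.shape)    # (2, 3)
print(inferred.shape)   # (3, 2)

# The element count must match, otherwise reshape raises ValueError
try:
    arr.reshape(4, 2)
except ValueError as err:
    print("cannot reshape:", err)
```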

`.max`  Return the maximum along a given axis.

In [38]:
np.random.seed(7)
randarr = np.random.randint(0, 20, 6)

randarr

array([15,  4,  3, 19,  7, 14])

In [39]:
randarr.max()

19

`.min` Return the minimum along a given axis.

In [40]:
randarr.min()

3

`.argmax`  Return the **index** of the **maximum** element along a given axis

`.argmin` Return the **index** of the **minimum** element along a given axis 

In [44]:
print(randarr)

print(randarr.argmax())

[15  4  3 19  7 14]
3


In [42]:
randarr.argmin()

2
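
On a 2-D array the "given axis" matters. The sketch below uses a small hand-written array (not from the class) so the results can be checked by eye.

```python
import numpy as np

scores = np.array([[3, 9, 1],
                   [8, 2, 7]])

print(scores.argmax())         # 1: index into the flattened array
print(scores.argmax(axis=0))   # [1 0 1]: row index of the max in each column
print(scores.argmax(axis=1))   # [1 0]: column index of the max in each row

# Convert the flat index back to (row, column)
row, col = np.unravel_index(scores.argmax(), scores.shape)
print(row, col)                # 0 1
```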

### The Basics - ndarray

`.ndim`  the number of axes (dimensions) of the array.

In [45]:
np.random.seed(17)
t = np.random.randint(10, size=10) 

# random.randint(low, high=None, size=None, dtype=int)
# Returns random integers from the "discrete uniform" distribution in the
# half-open interval [low, high). If high is None (the default), results are from [0, low).

t

array([1, 6, 6, 9, 0, 6, 4, 7, 4, 7])

In [47]:
# t is one-dimensional; an array with rows and columns would have ndim == 2
t.ndim

1

`.shape`  the dimensions  of the array. This is a tuple of integers indicating the size of the array in each dimension. For a matrix with n rows and m columns, shape will be (n,m). The length of the shape tuple is therefore the number of axes, ndim.

In [49]:
u = np.random.randint(13, size=(4, 9))
u

array([[ 1, 12,  9,  8,  3, 11,  2, 11,  7],
       [11, 10, 11,  8,  9, 10,  2,  1,  8],
       [ 1,  8,  0, 12,  4,  5, 10,  4,  4],
       [11,  3,  3,  3,  7,  8,  5, 12,  1]])

In [50]:
u.shape

(4, 9)

`.size` the total number of elements of the array. This is equal to the product of the elements of shape.

In [52]:
u.size # equal to 4*9 (the product of u.shape)

36

`.dtype` an object describing the type of the elements in the array.

In [53]:
u.dtype

dtype('int32')

`.itemsize` the size in bytes of each element of the array. It is equivalent to  .dtype.itemsize.

In [54]:
u.itemsize

4

In [55]:
u.dtype.itemsize

4
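
The `int32` shown above is the default integer type on the machine used for these notes; on other platforms or NumPy versions the default is often `int64`. A sketch that sets the dtype explicitly and relates `itemsize` to the total memory reported by `nbytes`:

```python
import numpy as np

small = np.zeros((4, 9), dtype=np.int8)    # 1 byte per element
large = np.zeros((4, 9), dtype=np.int64)   # 8 bytes per element

print(small.itemsize, small.nbytes)   # 1 36
print(large.itemsize, large.nbytes)   # 8 288

# nbytes equals size * itemsize
print(large.nbytes == large.size * large.itemsize)  # True
```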

`.concatenate` Join a sequence of arrays along an existing axis.

In [57]:
x = np.array([2, 4, 6])
y = np.array([1, 3, 5])

In [58]:
x

array([2, 4, 6])

In [59]:
y

array([1, 3, 5])

In [60]:
np.concatenate([y, x])

array([1, 3, 5, 2, 4, 6])

In [62]:
np.concatenate([x, y]) # the order of the arrays determines the order of the result

array([2, 4, 6, 1, 3, 5])
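
For 2-D arrays, the "existing axis" can be chosen with the `axis` argument. A small sketch with illustrative arrays:

```python
import numpy as np

top = np.array([[1, 2], [3, 4]])
bottom = np.array([[5, 6], [7, 8]])

stacked_rows = np.concatenate([top, bottom])          # axis=0 is the default
stacked_cols = np.concatenate([top, bottom], axis=1)  # join side by side

print(stacked_rows.shape)  # (4, 2)
print(stacked_cols.shape)  # (2, 4)
```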

`.split` Split an array into multiple sub-arrays as views into *ary*

numpy.split(ary, indices_or_sections, axis=0)

If indices_or_sections is a 1-D array of sorted integers, the entries indicate where along axis the array is split. For example, [2, 3] would, for axis=0, result in

        ary[:2]

        ary[2:3]

        ary[3:]



In [63]:
k = np.array([1, 2, 3, 99, 99, 3, 2, 1])
k

array([ 1,  2,  3, 99, 99,  3,  2,  1])

In [64]:
np.split(k, [3, 5])

[array([1, 2, 3]), array([99, 99]), array([3, 2, 1])]

In [68]:
np.split(k, [4, 4])

[array([ 1,  2,  3, 99]), array([], dtype=int32), array([99,  3,  2,  1])]

In [67]:
np.split(k, 2)

[array([ 1,  2,  3, 99]), array([99,  3,  2,  1])]
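
When an integer is passed, `np.split` requires the array to divide evenly. The sketch below shows the failure case and `np.array_split`, which tolerates unequal pieces (it was not covered in class).

```python
import numpy as np

k = np.array([1, 2, 3, 99, 99, 3, 2, 1])   # same values as above

# np.split requires an equal division: 8 elements cannot be cut into 3
try:
    np.split(k, 3)
except ValueError as err:
    print("np.split failed:", err)

# np.array_split allows unequal pieces (sizes 3, 3, 2 here)
pieces = np.array_split(k, 3)
print([p.tolist() for p in pieces])   # [[1, 2, 3], [99, 99, 3], [2, 1]]
```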

`.sort` `np.sort` returns a sorted copy of an array; the `ndarray.sort` method sorts the array in place.

In [70]:
v = np.random.randint(9, size=5)
v

array([4, 7, 3, 3, 1])

In [76]:
np.sort(v)

array([1, 3, 3, 4, 7])

In [72]:
v

array([4, 7, 3, 3, 1])

In [74]:
v.sort() # sorts v in place and returns None
v

array([1, 3, 3, 4, 7])
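
Two related operations, sketched here as an aside: `np.argsort` returns the indices that would sort the array, and reversing a sorted copy gives descending order.

```python
import numpy as np

v = np.array([4, 7, 3, 3, 1])   # same values as the example above

order = np.argsort(v, kind="stable")   # indices that would sort v
descending = np.sort(v)[::-1]          # sort ascending, then reverse

print(order)        # [4 2 3 0 1]
print(v[order])     # [1 3 3 4 7]
print(descending)   # [7 4 3 3 1]
```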

### NumPy Array Indexing and Selection

In [77]:
p = np.array([5, 7, 9])
p[0:2]

array([5, 7])

In [78]:
p[-1]

9

In [80]:
g = np.array([[6, 7, 8], [1, 2, 3], [9, 3, 2]])
g

array([[6, 7, 8],
       [1, 2, 3],
       [9, 3, 2]])

In [81]:
g[1, 2] # row index 1, column index 2 (zero-based)

3

In [84]:
g[0:2, 2] # from the first two rows, return column index 2 (the third column)

array([8, 3])

In [86]:
g[-1] # returns the last row

array([9, 3, 2])

In [87]:
g[-1, 0:2]

array([9, 3])

In [89]:
g[:, 1:]

array([[7, 8],
       [2, 3],
       [3, 2]])
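
One behaviour worth knowing about the slices above: basic slicing returns a view onto the same data, so writing to the slice changes the original array. A minimal sketch, with `.copy()` shown as the way to get independent data:

```python
import numpy as np

g = np.array([[6, 7, 8], [1, 2, 3], [9, 3, 2]])

view = g[0:2, 1:]          # basic slicing returns a view
view[0, 0] = 100           # writes through to g[0, 1]
print(g[0, 1])             # 100

independent = g[0:2, 1:].copy()   # .copy() detaches the data
independent[0, 0] = -1
print(g[0, 1])             # still 100
```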

Iterate through arrays

In [90]:
g = np.array([[6, 7, 8], [1, 2, 3], [9, 3, 2]])
g

array([[6, 7, 8],
       [1, 2, 3],
       [9, 3, 2]])

In [93]:
for row in g:
    print(row)

[6 7 8]
[1 2 3]
[9 3 2]


In [94]:
for cell in g.flat:
    print(cell)

6
7
8
1
2
3
9
3
2
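
Two further iteration patterns, sketched as an aside: looping over the transpose visits columns instead of rows, and `np.ndenumerate` yields each value together with its index.

```python
import numpy as np

g = np.array([[6, 7, 8], [1, 2, 3], [9, 3, 2]])

# Iterate column by column by looping over the transpose
columns = [col.tolist() for col in g.T]
print(columns)   # [[6, 1, 9], [7, 2, 3], [8, 3, 2]]

# np.ndenumerate yields ((row, col), value) pairs
positions_of_three = [idx for idx, value in np.ndenumerate(g) if value == 3]
print(positions_of_three)   # [(1, 2), (2, 1)]
```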


## **In-Class (23 Apr 22)**

See DAwPy_S1