<a href="https://colab.research.google.com/github/saffarizadeh/BUAN4061/blob/main/Numpy_and_Pandas_I.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

<img src="http://saffarizadeh.com/Logo.png" width="300px"/>

# *BUAN 4061: Advanced Business Analytics*

# **Numpy and Pandas I**

Instructor: Dr. Kambiz Saffarizadeh

---

# Numpy

`numpy` is the core library for scientific computing in Python. It provides a high-performance multidimensional array object, and tools for working with these arrays. If you are already familiar with MATLAB, you might find this tutorial useful to get started with Numpy.

## Arrays
A numpy array is a grid of values, all of the same type, and is indexed by a tuple of nonnegative integers. The number of dimensions is the rank of the array; the shape of an array is a tuple of integers giving the size of the array along each dimension.

We can initialize numpy arrays from nested Python lists, and access elements using square brackets:

In [1]:
import numpy as np

In [2]:
a = np.array([1, 2, 3])   # Create a rank 1 array
print(type(a))            # Prints "<class 'numpy.ndarray'>"
print(a.shape)            # Prints "(3,)"
print(a[0], a[1], a[2])   # Prints "1 2 3"

<class 'numpy.ndarray'>
(3,)
1 2 3


In [3]:
a[0] = 5                  # Change an element of the array
print(a)                  # Prints "[5 2 3]"

[5 2 3]


In [4]:
b = np.array([[1,2,3],[4,5,6]])    # Create a rank 2 array
print(b)
print(b.shape)                     # Prints "(2, 3)"

[[1 2 3]
 [4 5 6]]
(2, 3)


In [5]:
print(b[0, 0], b[0, 1], b[1, 0])   # Prints "1 2 4"

1 2 4
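
The rank and shape described above can be read directly from an array's attributes. The following is a small sketch; the variable name `m` is only illustrative.

```python
import numpy as np

m = np.array([[1, 2, 3], [4, 5, 6]])

print(m.ndim)    # 2      -> number of dimensions (the rank)
print(m.shape)   # (2, 3) -> size along each dimension
print(m.size)    # 6      -> total number of elements
```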


### Array indexing
Numpy offers several ways to index into arrays.

Slicing: Similar to Python lists, numpy arrays can be sliced. Since arrays may be multidimensional, you must specify a slice for each dimension of the array:

Create the following rank 2 array with shape (3, 4):

[[ 1  2  3  4]

 [ 5  6  7  8]
 
 [ 9 10 11 12]]

In [6]:
a = np.array([[1,2,3,4], [5,6,7,8], [9,10,11,12]])
print(a)

[[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]]


Use slicing to pull out the subarray consisting of the first 2 rows and columns 1 and 2; b is the following array of shape (2, 2):

[[2 3]

 [6 7]]

In [7]:
b = a[:2, 1:3]

A slice of an array is a view into the same data, so modifying it will modify the original array.

In [8]:
print(a[0, 1])   # Prints "2"

2


In [9]:
b[0, 0] = 77     # b[0, 0] is the same piece of data as a[0, 1]

In [10]:
print(a[0, 1])   # Prints "77"

77
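
If you want a subarray that can be changed without touching the original, one option is to take an explicit copy of the slice. The sketch below assumes the same 3x4 array as above; the names `view` and `independent` are illustrative.

```python
import numpy as np

a = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

view = a[:2, 1:3]                # a slice: shares memory with a
independent = a[:2, 1:3].copy()  # an explicit copy: owns its own data

independent[0, 0] = 77           # changes only the copy
print(a[0, 1])                   # Prints "2": a is unchanged

print(np.shares_memory(a, view))         # True
print(np.shares_memory(a, independent))  # False
```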


You can select a single row or column of an array too:

In [11]:
row_r1 = a[1, :]    # Rank 1 view of the second row of a
print(row_r1)

[5 6 7 8]


In [12]:
col_r1 = a[:, 2]    # Rank 1 view of the third column of a
print(col_r1)

[ 3  7 11]
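
Selecting a row with an integer index drops that dimension, while selecting it with a slice keeps it. A short sketch of the difference, using a fresh copy of the 3x4 array:

```python
import numpy as np

a = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

row_r1 = a[1, :]     # integer index: result has rank 1
row_r2 = a[1:2, :]   # slice: result keeps rank 2

print(row_r1, row_r1.shape)   # Prints "[5 6 7 8] (4,)"
print(row_r2, row_r2.shape)   # Prints "[[5 6 7 8]] (1, 4)"
```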


Boolean array indexing: Boolean array indexing lets you pick out arbitrary elements of an array. Frequently this type of indexing is used to select the elements of an array that satisfy some condition. Here is an example:

In [13]:
a = np.array([[1,2], [3, 4], [5, 6]])
print(a)

[[1 2]
 [3 4]
 [5 6]]


In [14]:
bool_idx = (a > 2)   # Find the elements of a that are bigger than 2;
                     # this returns a numpy array of Booleans of the same
                     # shape as a, where each slot of bool_idx tells
                     # whether that element of a is > 2.
print(bool_idx)

[[False False]
 [ True  True]
 [ True  True]]


We use boolean array indexing to construct a rank 1 array consisting of the elements of `a` that correspond to the `True` values of `bool_idx`:

In [15]:
print(a[bool_idx])

[3 4 5 6]


We can do all of the above in a single concise statement:

In [16]:
print(a[a > 2])     # Prints "[3 4 5 6]"

[3 4 5 6]
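
Conditions can also be combined, and a Boolean mask can be used on the left side of an assignment. The sketch below uses the elementwise operator `&` (the Python keyword `and` does not work on arrays); the name `capped` is illustrative.

```python
import numpy as np

a = np.array([[1, 2], [3, 4], [5, 6]])

# Each condition needs its own parentheses because & binds tighter than >
mask = (a > 2) & (a % 2 == 0)
print(a[mask])            # Prints "[4 6]"

capped = a.copy()
capped[capped > 4] = 4    # replace every element above 4 with 4
print(capped)
```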


## Numpy Datatypes

https://numpy.org/doc/stable/reference/arrays.dtypes.html

Every numpy array is a grid of elements of the same type. Numpy provides a large set of numeric datatypes that you can use to construct arrays. Numpy tries to guess a datatype when you create an array, but functions that construct arrays usually also include an optional argument to explicitly specify the datatype. Here is an example:

In [17]:
x = np.array([1, 2])   # Let numpy choose the datatype
print(x.dtype)         # Prints "int64"

x = np.array([1.0, 2.0])   # Let numpy choose the datatype
print(x.dtype)             # Prints "float64"

x = np.array([1, 2], dtype=np.int64)   # Force a particular datatype
print(x.dtype)

int64
float64
int64
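
An existing array can be converted to another datatype with `astype`. The following sketch, with made-up values, shows that converting floats to integers truncates toward zero rather than rounding, and that mixing integers and floats produces a float array.

```python
import numpy as np

prices = np.array([1.7, 2.2, -0.9])   # hypothetical values
as_int = prices.astype(np.int64)      # fractional part is discarded
print(as_int)                         # Prints "[1 2 0]"
print(as_int.dtype)                   # Prints "int64"

mixed = np.array([1, 2.5])            # integers are upcast to float
print(mixed.dtype)                    # Prints "float64"
```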


## Numpy Methods

### Min and Max
https://numpy.org/doc/stable/reference/generated/numpy.amin.html

https://numpy.org/doc/stable/reference/generated/numpy.amax.html

You can find `min` and `max` values in an array.

In [18]:
b = np.array([12, 4, 1.2, 2.5, 5, 14])

In [19]:
b.min()

1.2

In [20]:
b.max()

14.0

In [21]:
# or you can use np itself
np.amin(b)

1.2

In [22]:
np.amax(b)

14.0

You can also find the index of the `min` and `max` values.

In [23]:
np.argmin(b)

2

In [24]:
np.argmax(b)

5
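
For a rank 2 array, these methods also accept an `axis` argument. A brief sketch with a hypothetical `sales` array:

```python
import numpy as np

sales = np.array([[3, 9, 4],
                  [7, 2, 8]])   # hypothetical data: 2 rows, 3 columns

print(sales.max(axis=0))     # Prints "[7 9 8]": largest value in each column
print(sales.max(axis=1))     # Prints "[9 8]": largest value in each row
print(sales.argmax(axis=1))  # Prints "[1 2]": column index of each row's maximum
```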

### Mean and Standard Deviation

https://numpy.org/doc/stable/reference/generated/numpy.mean.html

https://numpy.org/doc/stable/reference/generated/numpy.std.html

In [25]:
c = np.array([[1,2], [3, 4], [5, 6]])
c

array([[1, 2],
       [3, 4],
       [5, 6]])

In [26]:
c.mean()

3.5

In [27]:
c.mean(axis=0) # mean along axis 0 (down the rows): one value per column

array([3., 4.])

In [28]:
c.mean(axis=1) # mean along axis 1 (across the columns): one value per row

array([1.5, 3.5, 5.5])

In [29]:
c.std()

1.707825127659933

In [30]:
c.std(axis=0)

array([1.63299316, 1.63299316])

In [31]:
c.std(axis=1)

array([0.5, 0.5, 0.5])
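
By default `std` computes the population standard deviation (it divides by N). If you need the sample standard deviation (dividing by N - 1), you can pass `ddof=1`. A minimal sketch using the same array `c`:

```python
import numpy as np

c = np.array([[1, 2], [3, 4], [5, 6]])

population_std = c.std()        # default ddof=0: divide by N
sample_std = c.std(ddof=1)      # ddof=1: divide by N - 1

print(population_std)
print(sample_std)               # slightly larger than the population value
```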

### NAN

To specify a "not a number" value, which is often used to represent missing data, you can use `np.nan`.

https://numpy.org/doc/stable/reference/constants.html#numpy.nan

In [32]:
np.nan

nan
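
Two properties of `nan` are worth knowing: it is not equal to itself, and it propagates through ordinary calculations. The sketch below shows `np.isnan` for detecting missing values and `np.nanmean` for ignoring them; the array `x` is illustrative.

```python
import numpy as np

x = np.array([1.0, np.nan, 3.0])

print(np.nan == np.nan)   # Prints "False": use np.isnan instead of ==
print(np.isnan(x))        # Prints "[False  True False]"
print(x.mean())           # Prints "nan": one missing value spoils the mean
print(np.nanmean(x))      # Prints "2.0": mean of the non-missing values
```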

### arange

https://numpy.org/doc/stable/reference/generated/numpy.arange.html

In [33]:
np.arange(6)

array([0, 1, 2, 3, 4, 5])
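
Like Python's `range`, `arange` also accepts a start, a stop (which is excluded), and a step. A short sketch:

```python
import numpy as np

evens = np.arange(2, 10, 2)        # start=2, stop=10 (excluded), step=2
print(evens)                       # Prints "[2 4 6 8]"

quarters = np.arange(0, 1, 0.25)   # non-integer steps are allowed
print(quarters)                    # 0, 0.25, 0.5, 0.75
```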

### Linspace

https://numpy.org/doc/stable/reference/generated/numpy.linspace.html

In [34]:
np.linspace(0,1,5)

array([0.  , 0.25, 0.5 , 0.75, 1.  ])

In [35]:
np.linspace(0,1,5, endpoint=False)

array([0. , 0.2, 0.4, 0.6, 0.8])

### Repeat

https://numpy.org/doc/stable/reference/generated/numpy.repeat.html

In [36]:
np.repeat(0,5)

array([0, 0, 0, 0, 0])
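
`repeat` also works on arrays, where it repeats each element in turn. This differs from `np.tile`, which repeats the whole array. A small sketch of both:

```python
import numpy as np

print(np.repeat([1, 2], 3))   # Prints "[1 1 1 2 2 2]": each element repeated
print(np.tile([1, 2], 3))     # Prints "[1 2 1 2 1 2]": whole array repeated

# With axis=0, each row of a rank 2 array is repeated
rows = np.repeat([[1, 2], [3, 4]], 2, axis=0)
print(rows)
```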

### Zeros and Ones

https://numpy.org/doc/stable/reference/generated/numpy.zeros.html

https://numpy.org/doc/stable/reference/generated/numpy.zeros_like.html

https://numpy.org/doc/stable/reference/generated/numpy.ones.html

https://numpy.org/doc/stable/reference/generated/numpy.ones_like.html

In [37]:
np.zeros((2,3))

array([[0., 0., 0.],
       [0., 0., 0.]])

In [38]:
np.ones((4,2))

array([[1., 1.],
       [1., 1.],
       [1., 1.],
       [1., 1.]])

In [39]:
np.zeros_like(a) # returns an array of the same shape as the input array with elements replaced by zero

array([[0, 0],
       [0, 0],
       [0, 0]])

In [40]:
np.ones_like(a) # returns an array of the same shape as the input array with elements replaced by one

array([[1, 1],
       [1, 1],
       [1, 1]])
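
Note that `zeros` and `ones` produce floats by default, while the `_like` variants inherit the datatype of the input array. The sketch below also shows `np.full`, which fills an array with any constant you choose.

```python
import numpy as np

a = np.array([[1, 2], [3, 4], [5, 6]])

z = np.zeros_like(a)                 # same shape and integer dtype as a
zf = np.zeros_like(a, dtype=float)   # same shape, dtype overridden
sevens = np.full((2, 3), 7)          # 2x3 array filled with 7

print(z.dtype == a.dtype)   # Prints "True"
print(zf.dtype)             # Prints "float64"
print(sevens)
```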

### Reshape

https://numpy.org/doc/stable/reference/generated/numpy.reshape.html

In [41]:
a = np.arange(6)
a

array([0, 1, 2, 3, 4, 5])

In [42]:
a_reshaped = a.reshape(3,2)
a_reshaped

array([[0, 1],
       [2, 3],
       [4, 5]])

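A special case of reshaping that is very useful in deep learning is flattening: collapsing all elements of an array into a single row or a single column. Passing `-1` tells numpy to infer the size of that dimension from the total number of elements. Note that `reshape(1, -1)` and `reshape(-1, 1)` still return 2-dimensional arrays, with one row or one column respectively.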

In [43]:
a_reshaped.reshape(1,-1) # one row

array([[0, 1, 2, 3, 4, 5]])

In [44]:
a_reshaped.reshape(-1,1) # one column

array([[0],
       [1],
       [2],
       [3],
       [4],
       [5]])
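
The two `reshape` calls above return rank 2 arrays of shape (1, 6) and (6, 1). If you need a truly 1-dimensional result, the following sketch shows three common options:

```python
import numpy as np

a_reshaped = np.arange(6).reshape(3, 2)

flat = a_reshaped.reshape(-1)      # shape (6,)
flat_copy = a_reshaped.flatten()   # always returns a copy
flat_view = a_reshaped.ravel()     # returns a view when possible

print(flat.shape)                                # Prints "(6,)"
print(np.shares_memory(a_reshaped, flat_copy))   # Prints "False"
print(np.shares_memory(a_reshaped, flat_view))   # Prints "True"
```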

There is another special case of reshaping that is also useful in deep learning: adding a new axis to an array.

In [45]:
b = np.arange(6).reshape(3,2)
b

array([[0, 1],
       [2, 3],
       [4, 5]])

In [46]:
b.shape

(3, 2)

In [47]:
reshaped_b = b.reshape((1,3,2))
reshaped_b

array([[[0, 1],
        [2, 3],
        [4, 5]]])

In [48]:
reshaped_b.shape

(1, 3, 2)

We can also achieve the same result using the following code:

In [49]:
reshaped_b_2 = b[np.newaxis, :]
reshaped_b_2

array([[[0, 1],
        [2, 3],
        [4, 5]]])

In [50]:
reshaped_b_2.shape

(1, 3, 2)
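
The new axis does not have to go first. As a sketch, `np.newaxis` can be placed at any position in the index, and `np.expand_dims` is an equivalent function that takes the position as an argument:

```python
import numpy as np

b = np.arange(6).reshape(3, 2)

print(np.expand_dims(b, axis=0).shape)   # Prints "(1, 3, 2)"
print(b[:, np.newaxis, :].shape)         # Prints "(3, 1, 2)"
print(b[..., np.newaxis].shape)          # Prints "(3, 2, 1)"
```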

### Squeeze

https://numpy.org/doc/stable/reference/generated/numpy.squeeze.html

To remove dimensions of length 1, which add nothing to the array, you can use `squeeze`.

In [51]:
squeezed_reshaped_b_2 = reshaped_b_2.squeeze()
squeezed_reshaped_b_2

array([[0, 1],
       [2, 3],
       [4, 5]])

In [52]:
squeezed_reshaped_b_2.shape

(3, 2)
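
Called without arguments, `squeeze` removes every dimension of length 1. To remove only a specific one, you can pass `axis`. A minimal sketch with an illustrative array `x`:

```python
import numpy as np

x = np.zeros((1, 3, 1))

print(x.squeeze().shape)         # Prints "(3,)": both length-1 axes removed
print(x.squeeze(axis=0).shape)   # Prints "(3, 1)": only the first axis removed
```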

### Random and Distributions

You can use numpy to generate random numbers.

https://numpy.org/doc/stable/reference/random/index.html

In [53]:
np.random.random()   # random float in the half-open interval [0.0, 1.0)

0.19575776423682334

In [54]:
np.random.rand()   # also a random float in [0.0, 1.0)

0.6214867091969837

In [55]:
np.random.rand()

0.4557305892772239

In [56]:
np.random.randint(1,10)   # random integer from 1 to 9 (the upper bound is excluded)

6
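
The values above change on every run. When you need reproducible results, one approach is to create a seeded generator with `np.random.default_rng`, the interface recommended in the numpy documentation linked above. The seed value 42 below is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(42)   # arbitrary seed
first = rng.random(3)             # three floats in [0.0, 1.0)

rng = np.random.default_rng(42)   # same seed again
second = rng.random(3)

print(np.array_equal(first, second))   # Prints "True": same seed, same numbers

die = rng.integers(1, 10)   # integer from 1 to 9, like np.random.randint(1, 10)
print(die)
```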

In [57]:
b = np.array([12, 4, 1.2, 2.5, 5, 14])
np.random.choice(b)

4.0

You can specify the number of draws from a list.

In [58]:
np.random.choice(b, 2)

array([ 1.2, 12. ])

You can assign a probability to each item in a list. The probabilities must sum to 1.

In [59]:
np.random.choice([12, 4, 1.2, 2.5, 5, 14], p=[0.1, 0.1, 0.1, 0.5, 0.05, 0.15])

2.5
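
By default `choice` samples with replacement, so the same item can be drawn more than once. To draw distinct items, you can pass `replace=False`, as in this sketch:

```python
import numpy as np

b = np.array([12, 4, 1.2, 2.5, 5, 14])

sample = np.random.choice(b, 3, replace=False)   # three different items
print(sample)
```

Asking for more items than the array contains raises an error when `replace=False`.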

#### Draw from `normal` distribution

Check this tool to learn more about this distribution: https://homepage.divms.uiowa.edu/~mbognar/applets/normal.html

In [60]:
np.random.normal()   # one draw from the standard normal (mean 0, standard deviation 1)

-0.5759024190947967
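
`normal` also accepts a mean (`loc`), a standard deviation (`scale`), and the number of draws (`size`). The values below (mean 100, standard deviation 15, seed 0) are made up for illustration.

```python
import numpy as np

np.random.seed(0)   # arbitrary seed so the run is repeatable

draws = np.random.normal(loc=100, scale=15, size=10_000)

print(draws.shape)    # Prints "(10000,)"
print(draws.mean())   # close to 100
print(draws.std())    # close to 15
```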

#### Draw from `binomial` distribution

Check this tool to learn more about this distribution: https://homepage.divms.uiowa.edu/~mbognar/applets/binnormal.html

In [61]:
np.random.binomial(2000, 0.2)   # number of successes in 2000 trials, each with success probability 0.2

386
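
The result above is close to 400, the expected number of successes (2000 x 0.2). As a sketch, passing `size` repeats the experiment many times, and the average of the counts approaches that expected value. The seed is arbitrary.

```python
import numpy as np

np.random.seed(0)   # arbitrary seed so the run is repeatable

counts = np.random.binomial(n=2000, p=0.2, size=10_000)

print(counts.shape)    # Prints "(10000,)"
print(counts.mean())   # close to n * p = 400
print(counts.std())    # close to sqrt(n * p * (1 - p)), about 17.9
```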