<h1>NumPy</h1>

Import the NumPy library into your notebook

In [1]:
import numpy as np

Create a one-dimensional array with a data type of np.int32

In [2]:
arr = np.array([1,2,3,4,5], dtype = np.int32)
arr

array([1, 2, 3, 4, 5])

In [3]:
arr.dtype

dtype('int32')

Get the itemsize and size of the array

In [4]:
print("Array size: ", arr.size)
print("Array item size: ", arr.itemsize)

Array size:  5
Array item size:  4


Calculate the total memory occupied by this array in bytes and compare the result with the value of the nbytes attribute

In [5]:
print("Calculated total size: ", arr.size * arr.itemsize, "bytes")

Calculated total size:  20 bytes


In [6]:
print("Total size: ", arr.nbytes, "bytes")

Total size:  20 bytes


Change the data type to np.int16

In [7]:
arr = arr.astype(np.int16)

In [8]:
arr.dtype

dtype('int16')

Get the itemsize, size, and total memory again. Do you see any change? If so, why?

In [9]:
print("Array size: ", arr.size)
print("Array item size: ", arr.itemsize)
print("Calculated total size: ", arr.size * arr.itemsize, "bytes")
print("Total size: ", arr.nbytes, "bytes")

Array size:  5
Array item size:  2
Calculated total size:  10 bytes
Total size:  10 bytes


In [10]:
# Yes. Each item now takes 2 bytes instead of 4, so the total memory drops from 20 to 10 bytes while the number of elements stays the same.
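
The relationship between itemsize and nbytes can be checked for several integer types at once. The sketch below is only an illustration: the variable names are made up, and the range check with np.iinfo is an extra precaution that the exercise itself does not ask for.

```python
import numpy as np

values = np.array([1, 2, 3, 4, 5], dtype=np.int32)

# For each dtype, nbytes should equal size * itemsize
for dtype in (np.int8, np.int16, np.int32, np.int64):
    converted = values.astype(dtype)
    print(dtype.__name__, converted.itemsize, converted.nbytes)

# np.iinfo reports the representable range of an integer dtype,
# which is worth checking before converting to a smaller type
int16_range = np.iinfo(np.int16)
fits_in_int16 = bool(values.min() >= int16_range.min and values.max() <= int16_range.max)
print("Safe to store as int16:", fits_in_int16)
```

Values outside the target range would not raise an error in astype, so a check like this helps avoid silently corrupted data.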

Create a two-dimensional array with a shape of your choice

In [11]:
arr = np.array([[1,2,3,4,5], [6,7,8,9,10]], dtype = np.int32)
print(arr)

[[ 1  2  3  4  5]
 [ 6  7  8  9 10]]


Get the shape of the array. 

In [12]:
arr.shape

(2, 5)

Multiply the elements of the shape together manually and compare the result with the value of the size attribute

In [13]:
2 * 5

10

In [14]:
arr.size

10
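
The same comparison can be done without typing the numbers by hand. This is a small sketch with an illustrative array name; math.prod is from the Python standard library (3.8+).

```python
import math
import numpy as np

grid = np.zeros((2, 5), dtype=np.int32)

# The number of elements is the product of all entries in the shape tuple
element_count = math.prod(grid.shape)
print(element_count, grid.size)
```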

Create the following array and use indexing to get the value 9

![two%20dimensional%20array.png](attachment:two%20dimensional%20array.png)

In [15]:
arr = np.array([[4,1,6,5], [2,9,3,8]], dtype = np.int32)

arr[1,1]

9

Use slicing to get the following:

![sliced%20array2.png](attachment:sliced%20array2.png)

In [16]:
arr[1,1:3]

array([9, 3])

Reverse the order of the first row.
You should obtain the following:

![reversed_array-2.png](attachment:reversed_array-2.png)

In [17]:
arr[0] = arr[0,::-1]
arr

array([[5, 6, 1, 4],
       [2, 9, 3, 8]])

Swap the second and third columns. You should get the following:

![replaced%20one%20dimensional%20array.png](attachment:replaced%20one%20dimensional%20array.png)

In [18]:
arr[:,[1,2]] = arr[:,[2,1]]
arr

array([[5, 1, 6, 4],
       [2, 3, 9, 8]])
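
The swap above works because indexing with a list of integers on the right-hand side produces a copy before anything is assigned. An equivalent way to think about it is as a reordering of all columns. The names below are illustrative only.

```python
import numpy as np

table = np.array([[5, 6, 1, 4], [2, 9, 3, 8]], dtype=np.int32)

# Integer-list indexing on the right-hand side yields a copy, so the swap is safe
table[:, [1, 2]] = table[:, [2, 1]]
print(table)

# Alternative: spell out the full column order and build a new array
original = np.array([[5, 6, 1, 4], [2, 9, 3, 8]], dtype=np.int32)
reordered = original[:, [0, 2, 1, 3]]
print(reordered)
```

The second form leaves the original array untouched, which can be preferable when the unswapped data is still needed.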

Run the following code in your own notebook

![random%20array%20code.png](attachment:random%20array%20code.png)

In [19]:
np.random.seed(42)
arr = np.random.randint(1, 20, size = (5,5))
print(arr)

[[ 7 15 11  8  7]
 [19 11 11  4  8]
 [ 3  2 12  6  2]
 [ 1 12 12 17 10]
 [16 15 15 19 12]]


Your task is to select the first, third, and fourth columns of the first three rows of arr and get the following:

![sliced%20array-2.png](attachment:sliced%20array-2.png)

In [20]:
new_arr = arr[:3,[0,2,3]]
new_arr

array([[ 7, 11,  8],
       [19, 11,  4],
       [ 3, 12,  6]])
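
Selecting columns with a list of integers returns an independent copy, whereas a plain slice returns a view of the original data. The sketch below illustrates the difference with np.shares_memory; the array and variable names are hypothetical.

```python
import numpy as np

source = np.arange(25).reshape(5, 5)

basic_slice = source[:3, 0:3]        # basic slicing: a view of source
fancy_pick = source[:3, [0, 2, 3]]   # integer-list indexing: an independent copy

print(np.shares_memory(source, basic_slice))  # True
print(np.shares_memory(source, fancy_pick))   # False

# Changing the copy leaves the original array untouched
fancy_pick[0, 0] = -1
print(source[0, 0])
```

This is why later modifications to a subarray obtained this way do not affect the array it was taken from.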

Swap the first and third rows of the previous two-dimensional subarray:

![subarray.png](attachment:subarray.png)

In [21]:
new_arr[[0,2], :] = new_arr[[2,0], :]
new_arr

array([[ 3, 12,  6],
       [19, 11,  4],
       [ 7, 11,  8]])

Swap the first and third columns of the previous two-dimensional subarray:

![subarray2.png](attachment:subarray2.png)

In [22]:
new_arr[:, [0,2]] = new_arr[:, [2,0]]
new_arr

array([[ 6, 12,  3],
       [ 4, 11, 19],
       [ 8, 11,  7]])

Obtain the diagonal of the matrix using only slicing. Hint: You can reshape it to a 1D array. If you have a better solution with slicing, let us know!

In [23]:
new_arr = new_arr.reshape(-1)
new_arr[::4]

array([ 6, 11,  7])
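
The step of 4 is not arbitrary: in a flattened n x n matrix, consecutive diagonal elements are n + 1 positions apart. A minimal sketch, using illustrative names and np.diagonal only as a cross-check:

```python
import numpy as np

matrix = np.array([[6, 12, 3],
                   [4, 11, 19],
                   [8, 11, 7]])
n = matrix.shape[0]

# Flatten, then take every (n + 1)-th element starting from index 0
diagonal_by_slicing = matrix.reshape(-1)[:: n + 1]

# Built-in equivalent, used here to verify the slicing result
diagonal_builtin = np.diagonal(matrix)

print(diagonal_by_slicing)
print(diagonal_builtin)
```

Unlike the cell above, this version does not overwrite the 2-D array with its flattened form.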

Create a 2-dimensional NumPy array with dimensions 3x3, filled with zeros.

In [24]:
arr = np.zeros((3, 3))
print(arr)

[[0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]]


Get the following array in NumPy. Hint: use zero matrix and replace proper indices with 1

![chessboard.png](attachment:chessboard.png)

In [25]:
chessboard = np.zeros((8, 8), dtype = np.int32)
chessboard[::2, ::2] = 1
chessboard[1::2, 1::2] = 1
print(chessboard)

[[1 0 1 0 1 0 1 0]
 [0 1 0 1 0 1 0 1]
 [1 0 1 0 1 0 1 0]
 [0 1 0 1 0 1 0 1]
 [1 0 1 0 1 0 1 0]
 [0 1 0 1 0 1 0 1]
 [1 0 1 0 1 0 1 0]
 [0 1 0 1 0 1 0 1]]
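
An alternative way to see the pattern: a cell holds 1 exactly when its row index plus its column index is even. The sketch below builds the board from that rule with np.indices and compares it with the slicing approach; the variable names are illustrative.

```python
import numpy as np

# Row and column index grids, each of shape (8, 8)
rows, cols = np.indices((8, 8))

# A cell is 1 when row + column is even, otherwise 0
board_from_parity = ((rows + cols) % 2 == 0).astype(np.int32)

# Slicing version, as in the cell above
board_from_slicing = np.zeros((8, 8), dtype=np.int32)
board_from_slicing[::2, ::2] = 1
board_from_slicing[1::2, 1::2] = 1

print(np.array_equal(board_from_parity, board_from_slicing))
```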


Use NumPy slicing to extract a 2x2 sub-array from a 4x4 NumPy array. 


In [26]:
import numpy as np

array = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]])
sub_array = array[1:3, 1:3]
print(sub_array)

# It doesn't matter which sub-array you choose. Just make sure its shape is (2, 2).

[[ 6  7]
 [10 11]]


Create two NumPy arrays a and b, each with dimensions 2x3, filled with random integers between 0 and 9. Use NumPy's concatenate() function to concatenate the two arrays vertically.

In [27]:
import numpy as np

# Create two arrays with dimensions 2x3 filled with random integers between 0 and 9
a = np.random.randint(0, 10, size=(2, 3))
b = np.random.randint(0, 10, size=(2, 3))

# Use concatenate() function to concatenate the two arrays vertically
result = np.concatenate((a, b), axis=0)

print("Array a:")
print(a)
print("\nArray b:")
print(b)
print("\nResult after vertical concatenation:")
print(result)


Array a:
[[6 3 8]
 [2 4 2]]

Array b:
[[6 4 8]
 [6 1 3]]

Result after vertical concatenation:
[[6 3 8]
 [2 4 2]
 [6 4 8]
 [6 1 3]]
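
The axis argument decides the direction of the join. The sketch below contrasts axis=0 with axis=1 and checks that np.vstack gives the same result as the vertical case. It uses np.random.default_rng with an arbitrary seed instead of np.random.randint, purely so that the example is reproducible; the names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)            # arbitrary seed, chosen for reproducibility
top = rng.integers(0, 10, size=(2, 3))    # integers from 0 to 9
bottom = rng.integers(0, 10, size=(2, 3))

stacked_rows = np.concatenate((top, bottom), axis=0)  # vertical: shape (4, 3)
stacked_cols = np.concatenate((top, bottom), axis=1)  # horizontal: shape (2, 6)

print(stacked_rows.shape, stacked_cols.shape)
print(np.array_equal(stacked_rows, np.vstack((top, bottom))))
```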


Create a 2-dimensional NumPy array with dimensions 4x4, filled with random integers between 0 and 9. <br>Calculate the mean, standard deviation, and variance of the array using NumPy's built-in functions.

<h3>Hint:</h3>

You can use the random.randint() function to create the random array, and the mean(), std(), and var() functions to calculate the mean, standard deviation, and variance, respectively.

In [28]:
import numpy as np

# Create a 2-dimensional array with dimensions 4x4 filled with random integers between 0 and 9
arr = np.random.randint(0, 10, size=(4, 4))

# Calculate the mean, standard deviation, and variance
mean = np.mean(arr)
std_dev = np.std(arr)
variance = np.var(arr)

# Print the original array and the calculated statistics
print("Array:")
print(arr)
print("Mean:", mean)
print("Standard deviation:", std_dev)
print("Variance:", variance)


Array:
[[8 1 9 8]
 [9 4 1 3]
 [6 7 2 0]
 [3 1 7 3]]
Mean: 4.5
Standard deviation: 3.0618621784789726
Variance: 9.375
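
By default, np.std and np.var divide by the number of elements N (the population formulas). Passing ddof=1 divides by N - 1 instead, which gives the sample estimates. The sketch below reuses the array printed above as a fixed input; the variable names are illustrative.

```python
import numpy as np

scores = np.array([[8, 1, 9, 8],
                   [9, 4, 1, 3],
                   [6, 7, 2, 0],
                   [3, 1, 7, 3]])

population_var = np.var(scores)        # ddof=0 (default): divide by N
sample_var = np.var(scores, ddof=1)    # ddof=1: divide by N - 1
population_std = np.std(scores)
sample_std = np.std(scores, ddof=1)

print("Variance (population, sample):", population_var, sample_var)
print("Std dev  (population, sample):", population_std, sample_std)
```

Which version is appropriate depends on whether the data is the whole population or a sample drawn from it.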


<h1>Least Squares Method</h1>

You will be given a synthetic (not real-world) dataset, and your task is to:

a) Read csv file with Pandas

In [29]:
import pandas as pd
df = pd.read_csv('../data/data.csv')

b) Convert it to numpy array

In [30]:
# You can also simply type "df.values". Check this one out as well!

arr = df.to_numpy()

c) Extract each column into a separate variable, say "feature" and "target". Reshape them to (100, 1) so that they have 2 dimensions

In [31]:
feature, target = arr[:, 0], arr[:, 1]

feature = feature.reshape(100, 1)
target = target.reshape(100, 1)
print(feature.shape, target.shape, sep = '\n')

(100, 1)
(100, 1)


In [32]:
# # There is also a more concise way that does not need the reshape function:
# # indexing with a list keeps the column dimension.
# feature, target = arr[:, [0]], arr[:, [1]]
# print(feature.shape, target.shape, sep = '\n')
# # Check this one out as well!
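
The different ways of pulling out a column differ only in the shape they return. The sketch below compares them on a small made-up array that stands in for the CSV data; all names are hypothetical.

```python
import numpy as np

# Hypothetical stand-in for the data loaded from the CSV file
data = np.arange(10, dtype=float).reshape(5, 2)

column_1d = data[:, 0]                       # integer index drops the axis: shape (5,)
column_from_list = data[:, [0]]              # list index keeps the axis: shape (5, 1)
column_from_slice = data[:, 0:1]             # slice keeps the axis: shape (5, 1)
column_reshaped = data[:, 0].reshape(-1, 1)  # -1 lets NumPy infer the row count

print(column_1d.shape, column_from_list.shape,
      column_from_slice.shape, column_reshaped.shape)
```

Using reshape(-1, 1) instead of a hard-coded row count keeps the code working if the dataset size changes.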

d) Use the numpy.linalg.lstsq method to estimate the coefficient that relates the feature to the target. The method returns 4 outputs: the coefficient(s), the sum of squared residuals, the rank of the feature matrix, and its singular values, in that order. You will need the first one

In [33]:
## Note: lstsq always expects the feature (coefficient matrix) argument to be a 2-D array.
##       That is why we reshaped it to (100, 1).
##       The target array, however, may be 1-D.

coefficient, residuals, rank, singular = np.linalg.lstsq(feature, target, rcond = None)
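
To see what lstsq returns without the CSV file, the sketch below runs it on made-up data where the target is exactly 2.5 times the feature. The data, the value 2.5, and all variable names are assumptions for illustration only.

```python
import numpy as np

# Hypothetical data: the target is an exact multiple of the feature
x = np.linspace(1.0, 10.0, 100).reshape(-1, 1)   # shape (100, 1)
y = 2.5 * x                                      # shape (100, 1)

coef, residual_sum, rank, singular_values = np.linalg.lstsq(x, y, rcond=None)
print(coef.shape, coef)      # 2-D target gives a (1, 1) coefficient array
print(rank)                  # one feature column, so the rank is 1

# A 1-D target is also accepted; the coefficient then has shape (1,)
coef_1d, *_ = np.linalg.lstsq(x, y.ravel(), rcond=None)
print(coef_1d.shape)
```

Note that this model has no intercept term: it fits target = coefficient * feature only.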

e) Use the below formula to make some predictions

![formula.png](attachment:formula.png)

In [34]:
prediction = feature * coefficient
prediction.shape

(100, 1)

f) Now evaluate your results with the coefficient of determination, R^2. Follow the steps shown in the picture

![r2%20score.png](attachment:r2%20score.png)

PS: If you don't have the scikit-learn library, run "%pip install scikit-learn" in an empty cell of your notebook (the package is installed as scikit-learn but imported as sklearn). If you run into problems, you can skip this part

In [35]:
from sklearn.metrics import r2_score
r2_score(target, prediction)

1.0
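
If scikit-learn is not available, R^2 can also be computed directly with NumPy. The sketch below assumes the standard definition, R^2 = 1 - SS_res / SS_tot; the function name and the sample data are hypothetical.

```python
import numpy as np

def r2_from_scratch(y_true, y_pred):
    """Coefficient of determination, assuming R^2 = 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    ss_res = np.sum((y_true - y_pred) ** 2)          # unexplained variation
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total variation around the mean
    return 1.0 - ss_res / ss_tot

y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_perfect = y_true.copy()                            # predictions equal to the targets
y_mean_only = np.full_like(y_true, y_true.mean())    # always predicting the mean

print(r2_from_scratch(y_true, y_perfect))
print(r2_from_scratch(y_true, y_mean_only))
```

Perfect predictions give a score of 1, while always predicting the mean gives 0, which is a useful sanity check for any implementation.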

g) Try to write the following code yourself and compare the result with the second output of numpy.linalg.lstsq. (PS: If you used different variable names, adjust the code accordingly)

![residuals.png](attachment:residuals.png)

In [36]:
## This exercise shows how the residual value returned by lstsq is computed:
## it is the sum of squared differences between the predictions and the targets

print("Calculated residuals from scratch:", np.sum((prediction - target) ** 2))
print("Actual residuals:", residuals[0])

Calculated residuals from scratch: 7.267801741789267e-14
Actual residuals: 7.267801869492718e-14
