![numpy logo](numpy.png)

# <center>Activity 2</center>
<center>Prepared by: Princess Nicole Oriola</center>
<center>BSCS Data Science III</center>

### Introduction
In this activity, I explore various methods and operations in **NumPy**, Python's core library for numerical computing. The goal is to demonstrate array creation, perform a range of NumPy operations, and apply statistical functions to analyze real data. For this project, I selected the "Simple Rainfall Classification Dataset," which records meteorological measurements (rainfall, temperature, humidity, and wind speed) together with a categorical `weather_condition` label. I chose this dataset because it is small and, apart from the date and label columns, entirely numeric, which makes it well suited to computing summary statistics with NumPy.

The dataset used in this notebook is sourced from Kaggle: https://www.kaggle.com/datasets/sujithmandala/simple-rainfall-classification-dataset

### I. Demonstrate various methods for creating NumPy arrays.


import numpy as np

In [40]:
# creating a 1D NumPy array with 5 elements
array_1d = np.array([1, 2, 3, 4, 5])
print("1D Array:", array_1d)

1D Array: [1 2 3 4 5]


In [42]:
# creating a 2D NumPy array (matrix)
array_2d = np.array([[1, 2, 3],
                     [4, 5, 6],
                     [7, 8, 9]])
print("2D Array:\n", array_2d)

2D Array:
 [[1 2 3]
 [4 5 6]
 [7 8 9]]
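Every array also records its dimensions and element type. As a quick sanity check on what was just built, the `shape`, `ndim`, and `dtype` attributes can be inspected; this small sketch rebuilds the same 3x3 matrix so that it runs on its own.

```python
import numpy as np

# same 3x3 matrix as above
array_2d = np.array([[1, 2, 3],
                     [4, 5, 6],
                     [7, 8, 9]])

print("Shape:", array_2d.shape)         # (rows, columns) -> (3, 3)
print("Dimensions:", array_2d.ndim)     # 2 for a matrix
print("Element type:", array_2d.dtype)  # a platform integer type, commonly int64
```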


In [44]:
# creating an array of zeros (shape of 4x4)
zeros_array = np.zeros((4, 4))
print("Array of zeros:\n", zeros_array)

Array of zeros:
 [[0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]]


In [46]:
# creating a 3x5 array of ones
ones_array = np.ones((3, 5))
print("Array of ones:\n", ones_array)

Array of ones:
 [[1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1.]]
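Both arrays above print as floats (`0.`, `1.`) because `np.zeros` and `np.ones` default to `float64`. A minimal sketch of overriding that with the `dtype` argument, plus `np.full` for filling an array with any other constant (the shapes and the value 7 are arbitrary choices for illustration):

```python
import numpy as np

# dtype=int overrides the float64 default
int_zeros = np.zeros((2, 3), dtype=int)
print(int_zeros)

# np.full fills an array of the given shape with a chosen constant
sevens = np.full((2, 2), 7)
print(sevens)
```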


In [48]:
# creating an array with values ranging from 0 to 9
range_array = np.arange(10)
print("Array with a range of values:", range_array)

Array with a range of values: [0 1 2 3 4 5 6 7 8 9]


In [50]:
# creating an array with 5 evenly spaced values between 0 and 1 (inclusive)
linspace_array = np.linspace(0, 1, 5)
print("Array with evenly spaced values:", linspace_array)

Array with evenly spaced values: [0.   0.25 0.5  0.75 1.  ]
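`np.arange` and `np.linspace` can produce similar sequences, but they are parameterized differently: `arange` takes a step size and excludes the stop value, while `linspace` takes a number of points and includes the stop value by default. The NumPy documentation recommends `linspace` for non-integer steps, since floating-point rounding can make the length of an `arange` result hard to predict. A small comparison:

```python
import numpy as np

# step of 0.25, stop value 1 is excluded
arange_steps = np.arange(0, 1, 0.25)     # 4 values: 0.0, 0.25, 0.5, 0.75
# 5 points, stop value 1 is included
linspace_points = np.linspace(0, 1, 5)   # 5 values: 0.0, 0.25, 0.5, 0.75, 1.0

print("arange:  ", arange_steps)
print("linspace:", linspace_points)
```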


In [54]:
# creating a 4x4 identity matrix (diagonal is 1s and the rest are 0s)
identity_matrix = np.eye(4)
print("Identity Matrix:\n", identity_matrix)

Identity Matrix:
 [[1. 0. 0. 0.]
 [0. 1. 0. 0.]
 [0. 0. 1. 0.]
 [0. 0. 0. 1.]]


In [66]:
# creating a 2x3 array of random values in [0, 1) (the values change on every run)
random_array = np.random.rand(2, 3)
print("Random array:\n", random_array)

Random array:
 [[0.49655959 0.26530973 0.41730127]
 [0.78079064 0.21147845 0.12843992]]
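Because `np.random.rand` is unseeded here, re-running the cell prints different numbers. If reproducible output is wanted, one option is NumPy's newer `Generator` API with a fixed seed; the seed value 42 below is an arbitrary assumption, and the resulting numbers will not match the array printed above.

```python
import numpy as np

# a Generator created with a fixed seed yields the same values on every run
rng = np.random.default_rng(seed=42)
random_array = rng.random((2, 3))  # floats in [0, 1), shape 2x3
print("Reproducible random array:\n", random_array)
```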


### II. Perform a range of NumPy operations.


- **Basic Arithmetic Operations** (applied element-wise)


In [72]:
array_a = np.array([1, 2, 3])
array_b = np.array([4, 5, 6])

In [74]:
# adding two arrays
sum_array = array_a + array_b
print("Sum:", sum_array)

Sum: [5 7 9]


In [82]:
# subtracting two arrays
difference_array = array_b - array_a
print("Difference:", difference_array)

Difference: [3 3 3]


In [84]:
# multiplying two arrays
product_array = array_a * array_b
print("Product:", product_array)

Product: [ 4 10 18]
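Note that `*` multiplies matching elements; it is not a matrix or dot product. For contrast, the `@` operator computes the dot product of two 1D arrays, summing the element-wise products into a single number:

```python
import numpy as np

array_a = np.array([1, 2, 3])
array_b = np.array([4, 5, 6])

elementwise = array_a * array_b  # [ 4 10 18], same as above
dot_product = array_a @ array_b  # 1*4 + 2*5 + 3*6 = 32

print("Element-wise product:", elementwise)
print("Dot product:", dot_product)
```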


In [86]:
# dividing two arrays
div_array = array_b / array_a
print("Division:", div_array)

Division: [4.  2.5 2. ]
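The `/` operator always performs true division and returns floats, even for integer inputs. When an integer quotient is wanted instead, NumPy also supports floor division (`//`) and the remainder (`%`) element-wise:

```python
import numpy as np

array_a = np.array([1, 2, 3])
array_b = np.array([4, 5, 6])

true_div = array_b / array_a    # floats: [4.  2.5 2. ]
floor_div = array_b // array_a  # integer quotients: [4 2 2]
remainder = array_b % array_a   # remainders: [0 1 0]

print("True division:", true_div)
print("Floor division:", floor_div)
print("Remainder:", remainder)
```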


- **Broadcasting** (lets NumPy combine arrays of different but compatible shapes, such as an array and a scalar, by virtually stretching the smaller operand without copying data)

In [99]:
# adding a scalar to an array
broadcast_array = array_a + 100
print("Broadcasting result (add 100 to each element):", broadcast_array)

Broadcasting result (add 100 to each element): [101 102 103]
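Broadcasting is not limited to scalars. NumPy compares shapes from the trailing dimension backward, and two dimensions are compatible when they are equal or one of them is 1. The sketch below (the row and column values are arbitrary) adds a 1D row to every row of a 3x3 matrix and a `(3, 1)` column to every column:

```python
import numpy as np

matrix = np.array([[1, 2, 3],
                   [4, 5, 6],
                   [7, 8, 9]])
row = np.array([10, 20, 30])              # shape (3,)
column = np.array([[100], [200], [300]])  # shape (3, 1)

# (3, 3) + (3,): the row is repeated down each of the 3 rows
print(matrix + row)
# (3, 3) + (3, 1): the column is repeated across each of the 3 columns
print(matrix + column)
```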


- **Aggregation Functions**

In [104]:
array_2d

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

In [106]:
# sum of all elements
total_sum = np.sum(array_2d)
print("Sum of all elements:", total_sum)

Sum of all elements: 45


In [110]:
# mean of elements
mean_value = np.mean(array_2d)
print("Mean of elements:", mean_value)

Mean of elements: 5.0


In [112]:
# minimum and maximum values
min_value = np.min(array_2d)
max_value = np.max(array_2d)

print("Minimum value:", min_value)
print("Maximum value:", max_value)

Minimum value: 1
Maximum value: 9


- **Axis-based Operations** (allow us to perform operations along a specific dimension)

In [121]:
array_2d

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

In [119]:
# sum of each column (axis=0 collapses the rows)
sum_cols = np.sum(array_2d, axis=0)
print("Sum of each column:", sum_cols)

Sum of each column: [12 15 18]


In [123]:
# sum of each row (axis=1 collapses the columns)
sum_rows = np.sum(array_2d, axis=1)
print("Sum of each row:", sum_rows)

Sum of each row: [ 6 15 24]
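The same `axis` argument works for other aggregation functions. For example, `np.mean` with `axis=0` gives one average per column and with `axis=1` one average per row; this is the pattern used later to summarize each dataset column.

```python
import numpy as np

array_2d = np.array([[1, 2, 3],
                     [4, 5, 6],
                     [7, 8, 9]])

col_means = np.mean(array_2d, axis=0)  # one value per column: [4. 5. 6.]
row_means = np.mean(array_2d, axis=1)  # one value per row: [2. 5. 8.]

print("Mean of each column:", col_means)
print("Mean of each row:", row_means)
```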


- **Sorting an Array**

In [128]:
# sorting a 1D array
unsorted_array = np.array([4, 1, 8, 3, 5])
sorted_array = np.sort(unsorted_array)
print("Sorted array:", sorted_array)

Sorted array: [1 3 4 5 8]


In [132]:
# sorting along an axis in a 2D array
unsorted_array_2d = np.array([[3, 8, 2],
                              [6, 1, 9],
                              [4, 7, 5]])
sorted_2d_array = np.sort(unsorted_array_2d, axis=1)
print("Row-wise sorted 2D array:\n", sorted_2d_array)

Row-wise sorted 2D array:
 [[2 3 8]
 [1 6 9]
 [4 5 7]]
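`np.sort` has no descending flag, so a common idiom is to reverse the ascending result with `[::-1]`. Relatedly, `np.argsort` returns the indices that would sort the array, which is useful for reordering other arrays (for example, other dataset columns) in the same order:

```python
import numpy as np

unsorted_array = np.array([4, 1, 8, 3, 5])

# reverse the ascending result to get descending order
descending = np.sort(unsorted_array)[::-1]  # [8 5 4 3 1]

# indices that would sort the array in ascending order
order = np.argsort(unsorted_array)          # [1 3 0 4 2]

print("Descending:", descending)
print("Sort order (indices):", order)
print("Reordered with argsort:", unsorted_array[order])
```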


### III. Read a dataset and convert it into a NumPy array.


In [145]:
import pandas as pd

In [147]:
# reading the dataset
df = pd.read_csv('rainfall.csv')

In [193]:
df.head()

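|   | date | rainfall | temperature | humidity | wind_speed | weather_condition |
|---|------|----------|-------------|----------|------------|-------------------|
| 0 | 1/1/2022 | 12.5 | 15.2 | 78 | 8.5 | Rainy |
| 1 | 1/2/2022 | 8.2 | 17.8 | 65 | 5.2 | Rainy |
| 2 | 1/3/2022 | 0.0 | 20.1 | 52 | 3.1 | Sunny |
| 3 | 1/4/2022 | 3.7 | 18.6 | 71 | 6.7 | Rainy |
| 4 | 1/5/2022 | 21.1 | 14.8 | 82 | 9.3 | Rainy |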


In [195]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 53 entries, 0 to 52
Data columns (total 6 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   date               53 non-null     object 
 1   rainfall           53 non-null     float64
 2   temperature        53 non-null     float64
 3   humidity           53 non-null     int64  
 4   wind_speed         53 non-null     float64
 5   weather_condition  53 non-null     object 
dtypes: float64(3), int64(1), object(2)
memory usage: 2.6+ KB


In [149]:
# converting the numerical columns into a NumPy array
numeric_columns = df[['rainfall', 'temperature', 'humidity', 'wind_speed']].to_numpy()

In [151]:
print("NumPy Array of numerical data:\n", numeric_columns)

NumPy Array of numerical data:
 [[12.5 15.2 78.   8.5]
 [ 8.2 17.8 65.   5.2]
 [ 0.  20.1 52.   3.1]
 [ 3.7 18.6 71.   6.7]
 [21.1 14.8 82.   9.3]
 [15.3 16.5 75.   7.8]
 [ 6.8 19.2 61.   4.5]
 [ 0.  21.7 48.   2.9]
 [11.2 17.3 73.   6.1]
 [18.6 15.8 79.   8.9]
 [ 9.5 16.2 72.   5.7]
 [ 2.1 19.8 58.   3.5]
 [ 0.  22.4 45.   2.1]
 [ 7.4 17.9 69.   6.5]
 [14.9 15.1 81.   8.7]
 [19.2 16.8 77.   7.2]
 [ 5.6 18.5 63.   4.9]
 [ 0.  21.2 51.   3.3]
 [10.8 17.6 75.   6.7]
 [16.3 14.5 84.   9.1]
 [ 3.9 16.1 71.   5.3]
 [ 0.  19.4 59.   4.1]
 [ 7.1 18.2 67.   7.1]
 [12.7 15.7 79.   8.3]
 [18.4 17.1 76.   6.8]
 [ 9.8 19.7 62.   4.7]
 [ 0.  22.9 47.   2.5]
 [ 6.2 18.5 70.   7.3]
 [15.5 14.2 83.   9.5]
 [21.8 16.4 78.   7.6]
 [11.6 19.1 65.   5.1]
 [ 0.  21.6 53.   3.7]
 [ 4.3 17.2 73.   7.5]
 [ 9.1 15.4 81.   8.9]
 [17.9 16.7 77.   6.2]
 [ 7.5 18.9 64.   4.3]
 [ 0.  22.1 49.   2.7]
 [ 5.8 17.8 72.   6.9]
 [13.2 14.7 85.   9.3]
 [19.6 16.2 79.   7.8]
 [10.4 19.5 66.   5.5]
 [ 0.  23.4 44.   3.1]
 ...

In [153]:
# checking the shape of the NumPy array
print("Shape of the NumPy array:", numeric_columns.shape)

Shape of the NumPy array: (53, 4)


In [155]:
# checking the data type of elements
print("Data type of elements:", numeric_columns.dtype)

Data type of elements: float64
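The array is `float64` even though `df.info()` lists `humidity` as `int64`. A NumPy array holds a single dtype, so when integer and float columns are combined the values are promoted to a common type that can represent both, which is `float64` here (this is why humidity prints as `78.` rather than `78`). The sketch below reproduces that promotion with NumPy alone, using the first three rainfall and humidity values from `df.head()`:

```python
import numpy as np

# humidity is integer-valued; rainfall is float-valued (as in the DataFrame)
humidity = np.array([78, 65, 52], dtype=np.int64)
rainfall = np.array([12.5, 8.2, 0.0])

# stacking the two columns promotes everything to the common type
combined = np.column_stack([rainfall, humidity])
print(combined.dtype)                         # float64
print(np.result_type(np.int64, np.float64))  # float64
```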


### IV. Apply statistical functions to the NumPy array.


- **Mean** (average value)

In [172]:
# mean of each column
mean_values = np.mean(numeric_columns, axis=0)
print("Mean of each column:", mean_values)

Mean of each column: [ 9.0490566  17.9509434  69.24528302  6.40377358]


- **Median** (middle value of the sorted data)

In [175]:
# median of each column
median_values = np.median(numeric_columns, axis=0)
print("Median of each column:", median_values)

Median of each column: [ 8.3 17.8 72.   6.7]
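With 53 rows (an odd count), each column's median is simply its 27th value after sorting. When the count is even, `np.median` instead averages the two middle values. A small illustration using rainfall values from `df.head()`:

```python
import numpy as np

odd = np.array([0.0, 3.7, 8.2, 12.5, 21.1])  # 5 values: the middle one is 8.2
even = np.array([0.0, 3.7, 8.2, 12.5])       # 4 values: (3.7 + 8.2) / 2, about 5.95

print("Median (odd count):", np.median(odd))
print("Median (even count):", np.median(even))
```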


- **Standard Deviation** (how spread out the values are from the mean)

In [178]:
# standard deviation of each column
std_values = np.std(numeric_columns, axis=0)
print("Standard deviation of each column:", std_values)

Standard deviation of each column: [ 6.6488933   2.47695969 11.85590476  2.1881294 ]
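By default `np.std` (and `np.var`) use `ddof=0`, meaning they divide by n and return the population statistic. If the 53 rows are treated as a sample, `ddof=1` divides by n - 1 instead; this is also the default of pandas' `DataFrame.std()`, so the two libraries can report slightly different numbers for the same column. The sketch uses a small made-up array whose values are chosen so the arithmetic is easy to check (mean 5, squared deviations summing to 32):

```python
import numpy as np

values = np.array([2, 4, 4, 4, 5, 5, 7, 9])  # hypothetical example data

population_std = np.std(values)          # sqrt(32 / 8) = 2.0
sample_std = np.std(values, ddof=1)      # sqrt(32 / 7), about 2.138

print("Population std (ddof=0):", population_std)
print("Sample std (ddof=1):", sample_std)
```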


- **Variance** (the average squared deviation from the mean; equal to the square of the standard deviation)

In [181]:
# variance of each column
variance_values = np.var(numeric_columns, axis=0)
print("Variance of each column:", variance_values)

Variance of each column: [ 44.20778213   6.1353293  140.56247775   4.78791029]
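Variance and standard deviation carry the same information: the variance is the mean squared deviation from the mean, and the standard deviation is its square root. The rainfall column above is consistent with this, since 6.6488933² ≈ 44.2078. The sketch below computes the variance by hand on a small hypothetical array and compares it with `np.var` and `np.std(...) ** 2`:

```python
import numpy as np

values = np.array([2, 4, 4, 4, 5, 5, 7, 9])  # hypothetical example data

# mean squared deviation from the mean, computed manually
manual_var = np.mean((values - values.mean()) ** 2)

print("Manual variance:", manual_var)
print("np.var:", np.var(values))
print("np.std squared:", np.std(values) ** 2)
```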


- **Minimum and Maximum** (can help identify extreme data points)

In [186]:
# minimum values of each column
min_values = np.min(numeric_columns, axis=0)
print("Minimum values of each column:", min_values)

# maximum values of each column
max_values = np.max(numeric_columns, axis=0)
print("Maximum values of each column:", max_values)

Minimum values of each column: [ 0.  13.9 44.   2.1]
Maximum values of each column: [21.8 23.4 89.  10.5]
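`np.min` and `np.max` report the extreme values but not where they occur. `np.argmin` and `np.argmax` return the index instead, and with `axis=0` applied to `numeric_columns` they would give the row index of each column's extreme. Note that with ties (several rows have 0.0 rainfall) only the first occurrence is returned. A small sketch on the rainfall values from `df.head()`:

```python
import numpy as np

# rainfall values from the first five rows shown by df.head()
rainfall = np.array([12.5, 8.2, 0.0, 3.7, 21.1])

wettest = np.argmax(rainfall)  # index 4 (21.1)
driest = np.argmin(rainfall)   # index 2 (0.0)

print("Row with most rainfall:", wettest, "->", rainfall[wettest])
print("Row with least rainfall:", driest, "->", rainfall[driest])
```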


- **Sum of Values**

In [189]:
# sum of values in each column
sum_values = np.sum(numeric_columns, axis=0)
print("Sum of values in each column:", sum_values)

Sum of values in each column: [ 479.6  951.4 3670.   339.4]
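As a consistency check, dividing these column sums by the number of rows (53, from `numeric_columns.shape`) should reproduce the column means computed earlier, since the mean is the sum divided by the count. The sketch below uses the printed sums directly:

```python
import numpy as np

# column sums and row count as printed above
sum_values = np.array([479.6, 951.4, 3670.0, 339.4])
n_rows = 53

means_from_sums = sum_values / n_rows
print("Sum / n:", means_from_sums)  # should match the earlier column means
```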
