## Assignment: Understanding and Applying NumPy for Big Data Analysis

The goal of this assignment is to provide you with hands-on experience using the NumPy library 
for numerical computations in Python. You will explore array creation, mathematical operations, 
slicing, and aggregations, which are foundational skills for analyzing and processing large
datasets in a Big Data environment.

Scenario: Data Preparation for Retail Analytics
You are a data scientist working for a retail company. The company collects massive amounts of 
sales data daily, which you need to preprocess and analyze efficiently. This assignment will 
focus on tasks such as:
1. Converting raw sales data into structured arrays.
2. Performing basic mathematical operations for sales insights.
3. Aggregating sales data across multiple dimensions.
4. Using slicing and indexing to extract subsets of data for analysis.

# Task 1: Array Creation
Create a 2D NumPy array to represent the sales data for 7 days across 3 products. Use the
following sales values:
o Product A: [100, 150, 200, 250, 300, 350, 400]
o Product B: [120, 170, 220, 270, 320, 370, 420]
o Product C: [90, 140, 190, 240, 290, 340, 390]
o Steps:
▪ Create the array using np.array.
▪ Print the shape, dimensions, and total size of the array.
o Deliverable Example Output:
Shape: (3, 7)
Dimensions: 2
Total elements: 21

In [1]:
import numpy as np

x = np.array([
    [100, 150, 200, 250, 300, 350, 400],  # Product A
    [120, 170, 220, 270, 320, 370, 420],  # Product B
    [90, 140, 190, 240, 290, 340, 390],   # Product C
])
print(x)

print(x.shape)
print(x.ndim)
print(x.size)

[[100 150 200 250 300 350 400]
 [120 170 220 270 320 370 420]
 [ 90 140 190 240 290 340 390]]
(3, 7)
2
21
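
As a hedged alternative to typing the nested list inline, the rows can be built from one list per product and stacked, which keeps each product's values labelled. The names `product_a`, `product_b`, `product_c`, and `sales` are illustrative and not part of the assignment; the printed labels follow the deliverable example.

```python
import numpy as np

# Daily sales for each product (values taken from the task statement).
product_a = [100, 150, 200, 250, 300, 350, 400]
product_b = [120, 170, 220, 270, 320, 370, 420]
product_c = [90, 140, 190, 240, 290, 340, 390]

# Stack the three lists as rows: one row per product, one column per day.
sales = np.vstack([product_a, product_b, product_c])

print("Shape:", sales.shape)
print("Dimensions:", sales.ndim)
print("Total elements:", sales.size)
```

With this layout, axis 0 indexes products and axis 1 indexes days, which is the convention the later tasks rely on.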


# Task 2: Mathematical Operations
Calculate the following for the sales data array:
o Square of each element.
o Sales data increased by 10% across all products.
o Boolean array indicating sales greater than 250.
o Steps:
▪ Use elementwise operations to compute the results.
▪ Use logical operations to generate the boolean array.
o Deliverable Example Output:
Sales increased by 10%:
[[110. 165. 220. ... ]
...
]

In [3]:
# Square of each element
print(x**2)

# Sales data increased by 10% across all products.
print( (x*0.10)+x)

# Boolean array indicating sales greater than 250.
print( x>250)

[[ 10000  22500  40000  62500  90000 122500 160000]
 [ 14400  28900  48400  72900 102400 136900 176400]
 [  8100  19600  36100  57600  84100 115600 152100]]
[[110. 165. 220. 275. 330. 385. 440.]
 [132. 187. 242. 297. 352. 407. 462.]
 [ 99. 154. 209. 264. 319. 374. 429.]]
[[False False False False  True  True  True]
 [False False False  True  True  True  True]
 [False False False False  True  True  True]]
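
The boolean array is most useful as a mask. The sketch below, which goes slightly beyond what the task asks, uses it to pull out the qualifying values and to count qualifying days per product. The names `sales`, `mask`, `high_sales`, and `days_above_per_product` are illustrative.

```python
import numpy as np

sales = np.array([
    [100, 150, 200, 250, 300, 350, 400],  # Product A
    [120, 170, 220, 270, 320, 370, 420],  # Product B
    [90, 140, 190, 240, 290, 340, 390],   # Product C
])

mask = sales > 250

# Boolean indexing returns the matching values as a flat 1D array.
high_sales = sales[mask]

# True counts as 1, so summing along axis=1 counts qualifying days per product.
days_above_per_product = mask.sum(axis=1)

print(high_sales)
print(days_above_per_product)  # [3 4 3]
```

Summing along `axis=1` collapses the day axis and leaves one number per product; `axis=0` would instead give one number per day.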


# Task 3: Slicing and Indexing
1. Extract the following subsets from the sales array:
o Sales data for the first 3 days across all products.
o Sales data for Product B only.
o Modify the sales data for Product C on Day 7 to 500.

o Deliverable Example Output:
Sales for the first 3 days:
[[100 150 200]
[120 170 220]
[90 140 190]]
Updated array:
[[100 ...]
[120 ...]
[90 ... 500]]

In [17]:
# Sales data for the first 3 days across all products.
print(x[:, :3]) #extracts first 3 columns for all 3 rows
    
# Sales data for Product B only.
print(x[1]) # extracts the second row (index 1), which represents Product B
    
# Modify the sales data for Product C on Day 7 to 500.
x[2, 6] = 500 # set the element at row index 2, column index 6 (third row, seventh column) to 500
print(x)


[[100 150 200]
 [120 170 220]
 [ 90 140 190]]
[120 170 220 270 320 370 420]
[[100 150 200 250 300 350 400]
 [120 170 220 270 320 370 420]
 [ 90 140 190 240 290 340 500]]
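
One detail worth keeping in mind when extracting subsets: basic slicing returns a view, not a copy, so writing to a slice also changes the original array. The following is a minimal sketch of that behaviour; the names `first_three_view` and `first_three_copy` are illustrative.

```python
import numpy as np

sales = np.array([
    [100, 150, 200, 250, 300, 350, 400],
    [120, 170, 220, 270, 320, 370, 420],
    [90, 140, 190, 240, 290, 340, 390],
])

# Basic slicing returns a view: it shares memory with `sales`.
first_three_view = sales[:, :3]

# .copy() returns an independent array.
first_three_copy = sales[:, :3].copy()

# Writing through the view also changes the original array.
first_three_view[0, 0] = -1

print(sales[0, 0])             # -1
print(first_three_copy[0, 0])  # 100
print(np.shares_memory(sales, first_three_view))  # True
```

Taking a `.copy()` is the safer choice when a subset will be modified independently of the source data.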


# Task 5: Placeholder Arrays
1. Create the following placeholder arrays for future data:
o A 3x3 array of zeros to represent sales adjustments.
o A 4x4 identity matrix for matrix computations.
o A 2x3 random array for forecasting future sales.
o Deliverable Example Output:
Zeros:
[[0. 0. 0.]
[0. 0. 0.]
...
Random:
[[0.1 0.5 ...]

In [20]:
# A 3x3 array of zeros to represent sales adjustments.
sales_adj= np.zeros((3,3))
print(sales_adj)

#A 4x4 identity matrix for matrix computations.
identity= np.eye(4)
print(identity)

#A 2x3 random array for forecasting future sales.
random=np.random.rand(2,3)
print(random)

[[0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]]
[[1. 0. 0. 0.]
 [0. 1. 0. 0.]
 [0. 0. 1. 0.]
 [0. 0. 0. 1.]]
[[0.18923788 0.52664555 0.77459448]
 [0.97697021 0.25134082 0.96631422]]
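
The random values above will differ on every run. If reproducible placeholders are wanted, one option is NumPy's seeded `Generator` interface. This is a sketch, not a requirement of the task; the seed value 42 and the names `rng`, `forecast`, and `forecast_again` are arbitrary choices.

```python
import numpy as np

# Seeded generator: the same seed gives the same numbers on every run.
rng = np.random.default_rng(seed=42)
forecast = rng.random((2, 3))  # uniform floats in [0, 1)

# A second generator with the same seed reproduces the array.
forecast_again = np.random.default_rng(seed=42).random((2, 3))

print(forecast)
print(np.array_equal(forecast, forecast_again))  # True
```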


# Task 6: Real-World Application
1. Using np.linspace, generate 12 evenly spaced timestamps representing monthly sales 
data.
2. Compute the sine transformation of the sales data for Product A (hint: use np.sin).

o Deliverable Example Output:
Timestamps: [1, 2, 3, ...]
Sine-transformed sales:
[sin(100), sin(150), ...]

In [22]:
#generate 12 evenly spaced timestamps
timestamps = np.linspace(1, 12, 12, dtype=int)  # 12 months

#illustrative 12-month sales data for Product A (the 7-day series extended in steps of 50)
sales_data = np.array([100, 150, 200, 250, 300, 350, 400, 450, 500, 550, 600, 650])

#sine transform of the data (np.sin treats its input as radians)
sine_trf=np.sin(sales_data)


print("Timestamps:", timestamps)
print("\nSine-transformed sales:")
print(sine_trf)

Timestamps: [ 1  2  3  4  5  6  7  8  9 10 11 12]

Sine-transformed sales:
[-0.50636564 -0.71487643 -0.8732973  -0.97052802 -0.99975584 -0.95893283
 -0.85091936 -0.68328373 -0.46777181 -0.21948408  0.04418245  0.3047532 ]
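
Two points about this cell can be checked directly: `np.linspace` includes both endpoints by default, and `np.sin` interprets its argument in radians. The sketch below uses the original 7-day Product A values; the degree conversion is shown only for contrast and is an assumption, since the task does not say the values are angles.

```python
import numpy as np

# 12 evenly spaced points from 1 to 12 inclusive; retstep also returns the spacing.
months, step = np.linspace(1, 12, 12, retstep=True)
print(months)  # floats by default
print(step)    # 1.0

# np.sin works in radians, so sin(100) means the sine of 100 radians.
product_a = np.array([100, 150, 200, 250, 300, 350, 400])
print(np.sin(product_a))

# If the values were meant as degrees (an assumption, not stated in the task),
# they would need converting first.
print(np.sin(np.deg2rad(product_a)))
```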


# Bonus Task
• Using np.dot or @, compute the dot product of sales data (Product A, B) with the
transformation matrix:
[[1.1, 0.9],
[0.8, 1.2]]
• Interpret the result in the context of adjusting sales data for a discount scenario.

In [23]:
sales_data = np.array([
    [100, 200],  
    [150, 250] 
])

transformation_matrix = np.array([
    [1.1, 0.9],  
    [0.8, 1.2]  
])

# Compute the dot product to get adjusted sales
adjusted_sales = np.dot(sales_data, transformation_matrix)  

# Print results
print("Original Sales Data:")
print(sales_data)

print("\nAdjusted Sales Data:")
print(adjusted_sales)

Original Sales Data:
[[100 200]
 [150 250]]

Adjusted Sales Data:
[[270. 330.]
 [365. 435.]]
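
The cell above uses a small illustrative 2x2 sales matrix. As a sketch of how the same product could be applied to the Product A and Product B values from Task 1, the version below arranges the data with one row per day and one column per product, using the `@` operator. The matrix is the one from the cell above; the names `daily_sales` and `adjusted` are illustrative.

```python
import numpy as np

product_a = np.array([100, 150, 200, 250, 300, 350, 400])
product_b = np.array([120, 170, 220, 270, 320, 370, 420])

# One row per day, one column per product: shape (7, 2).
daily_sales = np.column_stack([product_a, product_b])

transformation_matrix = np.array([
    [1.1, 0.9],
    [0.8, 1.2],
])

adjusted = daily_sales @ transformation_matrix

# Column 0 is 1.1 * A + 0.8 * B; column 1 is 0.9 * A + 1.2 * B.
print(adjusted.shape)  # (7, 2)
print(adjusted)
```

One possible reading for the discount scenario: each adjusted column is a weighted blend of both products, because the off-diagonal entries (0.9 and 0.8) mix them. A matrix with zeros off the diagonal would instead scale each product independently, which is closer to a plain per-product discount or markup.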
