# NumPy Practice — SOLUTIONS
### Dataset: AusApparalSales4thQrt2020.csv (Australian Apparel Sales Q4 2020)

**⚠️ Try solving the questions yourself first before looking at solutions!**

Open `numpy_practice.ipynb` to attempt the questions, then come here to verify your answers.

In [None]:
# Setup
import numpy as np
import pandas as pd

df = pd.read_csv('../AusApparalSales4thQrt2020.csv')
# Strip stray leading/trailing whitespace from every text (object-dtype) column
df = df.apply(lambda x: x.str.strip() if x.dtype == 'object' else x)

units = df['Unit'].to_numpy()
sales = df['Sales'].to_numpy()

print(f"Dataset loaded: {len(sales)} rows")
print(f"Units: {units[:5]}")
print(f"Sales: {sales[:5]}")

---
## Section 1: Array Basics

**Q1.** Print the shape, dimensions, size, and data type of the `sales` array.

In [None]:
# Q1 Solution
print(f"Shape: {sales.shape}")
print(f"Dimensions: {sales.ndim}")
print(f"Size: {sales.size}")
print(f"Data type: {sales.dtype}")

**Q2.** Print the first 10 elements, last 5 elements, and every 3rd element of the `units` array.

In [None]:
# Q2 Solution
print("First 10:", units[:10])
print("Last 5:", units[-5:])
print("Every 3rd:", units[::3])

**Q3.** Reshape the first 60 elements of `sales` into a 2D array of shape (10, 6). Print the result.

In [None]:
# Q3 Solution
reshaped = sales[:60].reshape(10, 6)
print(f"Shape: {reshaped.shape}")
print(reshaped)

**Q4.** Flatten the reshaped array from Q3 back to 1D and verify it matches the original 60 elements.

In [None]:
# Q4 Solution
flattened = reshaped.flatten()
print(f"Shape: {flattened.shape}")
print(f"Matches original: {np.array_equal(flattened, sales[:60])}")
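
`flatten()` always returns a copy. `ravel()` returns a view whenever the memory layout allows it, so writes through it reach the original. The sketch below uses a small made-up array in place of `sales` to show the difference with `np.shares_memory`.

```python
import numpy as np

# A small stand-in for the sales data (values are made up for illustration)
sample = np.arange(12).reshape(3, 4)

flat_copy = sample.flatten()   # always a new, independent array
flat_view = sample.ravel()     # a view here, because `sample` is C-contiguous

print(np.shares_memory(sample, flat_copy))  # False
print(np.shares_memory(sample, flat_view))  # True

flat_view[0] = -1              # writing through the view changes `sample`
print(sample[0, 0])            # -1
```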

**Q5.** Create a copy of the `units` array. Modify the first element of the copy to 999. Verify the original array is unchanged.

In [None]:
# Q5 Solution
units_copy = units.copy()
units_copy[0] = 999
print(f"Copy first element: {units_copy[0]}")
print(f"Original first element: {units[0]}")
print(f"Original unchanged: {units[0] != 999}")
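
The reason `.copy()` matters: basic slicing returns a view that shares memory with the original, while fancy indexing (a list of indices) always returns a copy. A minimal sketch with made-up unit values:

```python
import numpy as np

units_demo = np.array([8, 8, 4, 15, 3])   # made-up values

view = units_demo[:3]        # basic slicing returns a view, not a copy
view[0] = 999
print(units_demo[0])         # 999: the original changed

units_demo2 = np.array([8, 8, 4, 15, 3])
picked = units_demo2[[0, 1, 2]]   # fancy indexing always returns a copy
picked[0] = 999
print(units_demo2[0])        # 8: the original is untouched
```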

---
## Section 2: Indexing & Slicing

**Q6.** Extract all sales values from index 100 to 120 (inclusive).

In [None]:
# Q6 Solution
print(sales[100:121])  # the stop index is exclusive, so 121 is needed to include index 120

**Q7.** Extract every 50th element from the `sales` array.

In [None]:
# Q7 Solution
print(sales[::50])

**Q8.** Using boolean indexing, find all `units` values that are greater than 20.

In [None]:
# Q8 Solution
high_units = units[units > 20]
print(f"Count: {len(high_units)}")
print(f"Values: {high_units[:20]}...")  # first 20

**Q9.** Using boolean indexing, find all `sales` values where `units` is exactly 10.

In [None]:
# Q9 Solution
sales_when_10_units = sales[units == 10]
print(f"Count: {len(sales_when_10_units)}")
print(f"Values: {sales_when_10_units[:20]}...")  # first 20

**Q10.** Use fancy indexing to extract sales at indices [0, 100, 500, 1000, 5000].

In [None]:
# Q10 Solution
indices = [0, 100, 500, 1000, 5000]
print(sales[indices])

---
## Section 3: Mathematical & Statistical Operations

**Q11.** Calculate the mean, median, standard deviation, and variance of the `sales` array.

In [None]:
# Q11 Solution
print(f"Mean: {np.mean(sales):.2f}")
print(f"Median: {np.median(sales):.2f}")
print(f"Std Dev: {np.std(sales):.2f}")   # population std (ddof=0); pass ddof=1 for the sample std
print(f"Variance: {np.var(sales):.2f}")  # population variance (ddof=0)
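
NumPy's `np.std` and `np.var` default to `ddof=0` (divide by n), whereas pandas' `Series.std()` defaults to `ddof=1` (divide by n - 1), so the two libraries can report slightly different values for the same column. A small sketch on made-up numbers:

```python
import numpy as np

x = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])   # made-up data, mean = 5

pop_std = np.std(x)             # ddof=0: sqrt(32 / 8) = 2.0
sample_std = np.std(x, ddof=1)  # ddof=1: sqrt(32 / 7), slightly larger
print(pop_std, sample_std)
```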

**Q12.** Find the minimum and maximum values in `units`. Also find their index positions using `argmin()` and `argmax()`.

In [None]:
# Q12 Solution
print(f"Min: {np.min(units)} at index {np.argmin(units)}")
print(f"Max: {np.max(units)} at index {np.argmax(units)}")

**Q13.** Calculate the total (sum) of all sales. What is the cumulative sum of the first 20 sales values?

In [None]:
# Q13 Solution
print(f"Total sales: {np.sum(sales)}")
print(f"Cumulative sum (first 20): {np.cumsum(sales[:20])}")

**Q14.** Calculate the average sales per unit. (Hint: divide `sales` array by `units` array — handle division by zero if any)

In [None]:
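# Q14 Solution
# Divide only where units != 0; rows with zero units stay 0 in the output array
valid = units != 0
sales_per_unit = np.divide(sales, units, out=np.zeros_like(sales, dtype=float), where=valid)
# Average only the rows with a valid division, so placeholder zeros don't drag the mean down
print(f"Average sales per unit: {np.mean(sales_per_unit[valid]):.2f}")
print(f"First 10 values: {sales_per_unit[:10]}")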
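
"Average sales per unit" can mean two different things: the mean of the per-row ratios, or total sales divided by total units. They agree only when every row has the same ratio. The sketch below uses made-up numbers (including a zero-unit row that is excluded) to show how they can differ; which one you want is a judgment call for the analysis.

```python
import numpy as np

# Made-up rows; the last one has zero units and must be excluded from the ratio
sales_demo = np.array([100.0, 300.0, 0.0])
units_demo = np.array([1, 2, 0])

valid = units_demo != 0
per_row = sales_demo[valid] / units_demo[valid]        # [100., 150.]
mean_of_ratios = per_row.mean()                        # 125.0
ratio_of_totals = sales_demo.sum() / units_demo.sum()  # 400 / 3, about 133.33

print(mean_of_ratios, ratio_of_totals)
```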

**Q15.** What percentage of total sales does each individual sale represent? Store the result in a new array.

In [None]:
# Q15 Solution
pct_of_total = (sales / np.sum(sales)) * 100
print(f"First 10 percentages: {pct_of_total[:10]}")
print(f"Sum of all percentages: {np.sum(pct_of_total):.2f}%")

---
## Section 4: Array Operations

**Q16.** Add 500 to every element in the `sales` array (broadcasting). Print the first 10 results.

In [None]:
# Q16 Solution
sales_plus_500 = sales + 500
print(f"Original first 10: {sales[:10]}")
print(f"After +500:        {sales_plus_500[:10]}")

**Q17.** Multiply all `units` values by 2.5 (simulating a 2.5x increase in units sold). Print the first 10 results.

In [None]:
# Q17 Solution
units_scaled = units * 2.5
print(f"Original first 10: {units[:10]}")
print(f"After x2.5:        {units_scaled[:10]}")

**Q18.** Create a boolean array where `sales > 30000`. How many True values are there? What percentage of total rows is that?

In [None]:
# Q18 Solution
high_sales = sales > 30000
count = np.sum(high_sales)
pct = (count / len(sales)) * 100
print(f"Count of sales > 30000: {count}")
print(f"Percentage: {pct:.2f}%")

**Q19.** Use `np.where()` to create a new array that labels each sale as `'High'` if sales > 25000, else `'Low'`.

In [None]:
# Q19 Solution
labels = np.where(sales > 25000, 'High', 'Low')
print(f"First 20 labels: {labels[:20]}")
print(f"High count: {np.sum(labels == 'High')}")
print(f"Low count: {np.sum(labels == 'Low')}")
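
`np.where` handles exactly two labels. For three or more tiers, `np.select` takes a list of conditions and a matching list of labels, and the first condition that matches wins. The thresholds below are illustrative, not part of the original question.

```python
import numpy as np

sales_demo = np.array([7500, 20000, 35000, 50000])   # made-up values

# Conditions are checked in order; the first True one decides the label
conditions = [sales_demo > 40000, sales_demo > 25000]
choices = ['High', 'Medium']
tiers = np.select(conditions, choices, default='Low')
print(tiers)   # ['Low' 'Low' 'Medium' 'High']
```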

**Q20.** Clip the `sales` array so that all values are between 10000 and 40000. Print the first 20 values.

In [None]:
# Q20 Solution
clipped = np.clip(sales, 10000, 40000)
print(f"First 20 clipped: {clipped[:20]}")
print(f"Min after clip: {clipped.min()}, Max after clip: {clipped.max()}")

---
## Section 5: Sorting & Searching

**Q21.** Sort the `sales` array in ascending order. What are the 5 smallest and 5 largest sales values?

In [None]:
# Q21 Solution
sorted_sales = np.sort(sales)
print(f"5 smallest: {sorted_sales[:5]}")
print(f"5 largest: {sorted_sales[-5:]}")

**Q22.** Use `np.argsort()` on `sales` to find the indices of the top 10 highest sales.

In [None]:
# Q22 Solution
# argsort sorts ascending, so take the last 10 indices and reverse them for highest-first
top_10_indices = np.argsort(sales)[-10:][::-1]
print(f"Top 10 indices: {top_10_indices}")
print(f"Top 10 values: {sales[top_10_indices]}")
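
Sorting the whole array just to get the top k is more work than needed. `np.argpartition` moves the indices of the k largest values into the last k slots (in no particular order) without fully sorting, and then only those k need ordering. A sketch on made-up values:

```python
import numpy as np

sales_demo = np.array([7500, 50000, 20000, 35000, 47500, 10000, 30000])  # made-up
k = 3

# The last k positions hold the indices of the k largest values, unordered
top_k_unordered = np.argpartition(sales_demo, -k)[-k:]
# Order just those k indices by value, highest first
top_k = top_k_unordered[np.argsort(sales_demo[top_k_unordered])[::-1]]
print(top_k, sales_demo[top_k])   # [1 4 3] [50000 47500 35000]
```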

**Q23.** Find all unique values in the `units` array. How many unique unit values exist?

In [None]:
# Q23 Solution
unique_units = np.unique(units)
print(f"Unique values: {unique_units}")
print(f"Count: {len(unique_units)}")

**Q24.** Use `np.unique()` with `return_counts=True` on `units` to find the frequency of each unit value.

In [None]:
# Q24 Solution
values, counts = np.unique(units, return_counts=True)
for v, c in zip(values, counts):
    print(f"Unit {v}: {c} occurrences")
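
For non-negative integers such as unit counts, `np.bincount` is an alternative: entry `k` of its result is the number of times `k` occurs. Indexing it at the values returned by `np.unique` recovers the same counts. A sketch with made-up unit values:

```python
import numpy as np

units_demo = np.array([3, 8, 8, 4, 15, 8, 3])   # made-up values

values, counts = np.unique(units_demo, return_counts=True)
freq = np.bincount(units_demo)    # freq[k] = occurrences of k, for k = 0..max

print(dict(zip(values.tolist(), counts.tolist())))  # {3: 2, 4: 1, 8: 3, 15: 1}
print(freq[values])               # the same counts, read at the observed values
```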

**Q25.** Use `np.searchsorted()` on a sorted `sales` array to find where the value 25000 would be inserted.

In [None]:
# Q25 Solution
sorted_sales = np.sort(sales)
pos = np.searchsorted(sorted_sales, 25000)
print(f"25000 would be inserted at index: {pos}")
print(f"That means {pos} values are below 25000 ({pos/len(sales)*100:.1f}%)")
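
`np.searchsorted` defaults to `side='left'`, which returns the count of values strictly below the target. That is why the solution above can read `pos` as "values below 25000". With `side='right'` it counts values less than or equal to the target, and the difference between the two is the number of exact matches. A sketch on a small made-up sorted array:

```python
import numpy as np

sorted_demo = np.array([10000, 25000, 25000, 25000, 40000])   # made-up, already sorted

left = np.searchsorted(sorted_demo, 25000)                 # values < 25000 come before it
right = np.searchsorted(sorted_demo, 25000, side='right')  # values <= 25000 come before it
print(left, right, right - left)   # 1 4 3 (3 exact matches)
```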

---
## Section 6: Reshaping & Stacking

**Q26.** Take the first 120 sales values. Reshape them into a (10, 12) matrix. Find the sum of each row and each column.

In [None]:
# Q26 Solution
matrix = sales[:120].reshape(10, 12)
print(f"Shape: {matrix.shape}")
print(f"Row sums: {matrix.sum(axis=1)}")
print(f"Column sums: {matrix.sum(axis=0)}")

**Q27.** Split the first 100 elements of `units` into 5 equal arrays using `np.split()`.

In [None]:
# Q27 Solution
splits = np.split(units[:100], 5)
for i, s in enumerate(splits):
    print(f"Split {i+1}: {s} (length: {len(s)})")

**Q28.** Create two arrays: one with the first 50 sales values and another with the next 50. Stack them vertically and horizontally.

In [None]:
# Q28 Solution
a = sales[:50]
b = sales[50:100]

vstacked = np.vstack([a, b])
hstacked = np.hstack([a, b])

print(f"a shape: {a.shape}, b shape: {b.shape}")
print(f"vstack shape: {vstacked.shape}")
print(f"hstack shape: {hstacked.shape}")

**Q29.** Take the (10, 12) matrix from Q26. Transpose it. What is the new shape?

In [None]:
# Q29 Solution
transposed = matrix.T
print(f"Original shape: {matrix.shape}")
print(f"Transposed shape: {transposed.shape}")

**Q30.** Add a new axis to the `units` array to make it a column vector. Print the shape.

In [None]:
# Q30 Solution
col_vector = units[:, np.newaxis]
print(f"Original shape: {units.shape}")
print(f"Column vector shape: {col_vector.shape}")
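
A column vector is mainly useful for broadcasting: combining an `(n, 1)` array with a length-`m` array produces an `(n, m)` grid. `reshape(-1, 1)` is an equivalent way to get the column. The arrays below are made up for illustration.

```python
import numpy as np

units_demo = np.array([2, 5, 10])
offsets = np.array([0, 1, 2])

col = units_demo[:, np.newaxis]      # shape (3, 1)
same = units_demo.reshape(-1, 1)     # equivalent column vector
grid = col + offsets                 # (3, 1) + (3,) broadcasts to (3, 3)

print(col.shape, grid.shape)  # (3, 1) (3, 3)
print(grid)                   # row i is units_demo[i] + offsets
```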

---
## Section 7: Aggregation with Axis

**Q31.** Reshape the first 200 sales values into a (10, 20) matrix. Calculate sum and mean along both axes.

In [None]:
# Q31 Solution
matrix = sales[:200].reshape(10, 20)
print(f"Sum axis=0 (column-wise): {matrix.sum(axis=0)}")
print(f"Sum axis=1 (row-wise): {matrix.sum(axis=1)}")
print(f"Mean axis=0: {matrix.mean(axis=0)}")
print(f"Mean axis=1: {matrix.mean(axis=1)}")
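
A quick way to remember the axis semantics: the `axis` argument names the dimension that gets collapsed. `axis=0` collapses the rows and leaves one result per column; `axis=1` collapses the columns and leaves one result per row. On a tiny 2x3 matrix:

```python
import numpy as np

m = np.array([[1, 2, 3],
              [4, 5, 6]])

print(m.sum(axis=0))   # [5 7 9]  one total per column
print(m.sum(axis=1))   # [ 6 15]  one total per row
print(m.mean(axis=1))  # [2. 5.]  one mean per row
```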

**Q32.** For the same matrix, find the max value in each row and the min value in each column.

In [None]:
# Q32 Solution
print(f"Max per row (axis=1): {matrix.max(axis=1)}")
print(f"Min per column (axis=0): {matrix.min(axis=0)}")

**Q33.** For the same matrix, use `np.argmax(axis=1)` to find which column has the highest value in each row.

In [None]:
# Q33 Solution
col_of_max = np.argmax(matrix, axis=1)
for i, col in enumerate(col_of_max):
    print(f"Row {i}: max at column {col} (value: {matrix[i, col]})")

---
## Section 8: Random & Simulation

**Q34.** Set a random seed of 42. Generate a random array of 100 values from a normal distribution with mean=25000 and std=5000. Compare with actual sales.

In [None]:
# Q34 Solution
np.random.seed(42)
simulated = np.random.normal(loc=25000, scale=5000, size=100)

print(f"Simulated - Mean: {simulated.mean():.2f}, Std: {simulated.std():.2f}")
print(f"Actual    - Mean: {sales.mean():.2f}, Std: {sales.std():.2f}")
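
`np.random.seed` configures NumPy's legacy global random state. Newer NumPy code usually creates a `Generator` with `np.random.default_rng`, which keeps its own state and does not affect the global one. It uses a different bit generator, so the numbers will not match the `np.random.seed(42)` stream above. A minimal sketch:

```python
import numpy as np

rng = np.random.default_rng(42)   # independent Generator with its own state
simulated = rng.normal(loc=25000, scale=5000, size=100)
picked = rng.choice(simulated, size=10, replace=False)   # sampling without replacement

print(f"Mean: {simulated.mean():.2f}, picked: {picked.shape}")
```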

**Q35.** Use `np.random.choice()` to randomly sample 50 values from the `sales` array (without replacement).

In [None]:
# Q35 Solution
sample = np.random.choice(sales, size=50, replace=False)
print(f"Sample size: {len(sample)}")
print(f"Sample mean: {sample.mean():.2f}")
print(f"Sample: {sample[:10]}...")

**Q36.** Simulate 1000 random unit counts by sampling from the `units` array with replacement. What is the mean?

In [None]:
# Q36 Solution
simulated_units = np.random.choice(units, size=1000, replace=True)
print(f"Simulated mean: {simulated_units.mean():.2f}")
print(f"Actual mean: {units.mean():.2f}")

**Q37.** Generate a 5x5 random integer matrix with values between 1 and 50. Find its determinant and inverse (if it exists).

In [None]:
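# Q37 Solution
np.random.seed(42)
# randint's upper bound is exclusive, so use 51 to include 50
mat = np.random.randint(1, 51, size=(5, 5))
print("Matrix:")
print(mat)

det = np.linalg.det(mat)
print(f"\nDeterminant: {det:.2f}")

# A floating-point determinant is rarely exactly 0, so test invertibility with the rank
if np.linalg.matrix_rank(mat) == mat.shape[0]:
    inv = np.linalg.inv(mat)
    print(f"\nInverse:\n{inv}")
else:
    print("Matrix is singular, no inverse exists.")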
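
Comparing a computed determinant with `0` is fragile: a singular matrix often yields a tiny nonzero value from round-off. `np.linalg.matrix_rank` applies a tolerance and is a more reliable singularity test. The matrix below is singular by construction (its third row is the sum of the first two).

```python
import numpy as np

# Row 3 = row 1 + row 2, so this matrix is singular by construction
singular = np.array([[1.0, 2.0, 3.0],
                     [4.0, 5.0, 6.0],
                     [5.0, 7.0, 9.0]])

det = np.linalg.det(singular)
rank = np.linalg.matrix_rank(singular)
print(det)    # zero or a tiny round-off value, not a reliable test on its own
print(rank)   # 2, fewer than 3, so no inverse exists
```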

---
## Section 9: Linear Algebra (Bonus)

**Q38.** Create a 3x3 matrix from the first 9 sales values. Calculate its transpose, determinant, and trace.

In [None]:
# Q38 Solution
mat = sales[:9].reshape(3, 3).astype(float)
print(f"Matrix:\n{mat}")
print(f"\nTranspose:\n{mat.T}")
print(f"\nDeterminant: {np.linalg.det(mat):.2f}")
print(f"Trace: {np.trace(mat):.2f}")

**Q39.** Create two 3x3 matrices from sales data. Perform matrix multiplication.

In [None]:
# Q39 Solution
A = sales[:9].reshape(3, 3).astype(float)
B = sales[9:18].reshape(3, 3).astype(float)

print(f"A:\n{A}")
print(f"\nB:\n{B}")
print(f"\nA @ B (using @ operator):\n{A @ B}")
print(f"\nnp.dot(A, B):\n{np.dot(A, B)}")

**Q40.** Solve the system of equations: 2x + 3y = 25000, 4x + y = 30000

In [None]:
# Q40 Solution
A = np.array([[2, 3], [4, 1]])
b = np.array([25000, 30000])

solution = np.linalg.solve(A, b)
print(f"x = {solution[0]:.2f}")
print(f"y = {solution[1]:.2f}")

# Verify
print("\nVerification:")
print(f"2x + 3y = {2*solution[0] + 3*solution[1]:.2f} (should be 25000)")
print(f"4x + y  = {4*solution[0] + 1*solution[1]:.2f} (should be 30000)")

---
## Section 10: Real-World Analysis with NumPy

**Q41.** Calculate the z-score for each value in the `sales` array. How many sales are more than 2 standard deviations from the mean?

In [None]:
# Q41 Solution
z_scores = (sales - np.mean(sales)) / np.std(sales)
outliers = np.sum(np.abs(z_scores) > 2)
print(f"Z-scores (first 10): {z_scores[:10]}")
print(f"Sales > 2 std from mean: {outliers}")
print(f"Percentage: {outliers/len(sales)*100:.2f}%")

**Q42.** Calculate the correlation coefficient between `units` and `sales` using `np.corrcoef()`.

In [None]:
# Q42 Solution
corr = np.corrcoef(units, sales)
print(f"Correlation matrix:\n{corr}")
print(f"\nCorrelation between units and sales: {corr[0, 1]:.4f}")
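
`np.corrcoef` returns the Pearson correlation: the covariance divided by the product of the two standard deviations. As long as the same `ddof` is used throughout, it cancels out. A sketch on made-up data confirming the two agree:

```python
import numpy as np

x = np.array([2.0, 4.0, 6.0, 9.0])   # made-up data
y = np.array([1.0, 3.0, 2.0, 5.0])

# Pearson r = cov(x, y) / (std(x) * std(y)), all with ddof=0 here
r_manual = np.mean((x - x.mean()) * (y - y.mean())) / (x.std() * y.std())
r_numpy = np.corrcoef(x, y)[0, 1]
print(np.isclose(r_manual, r_numpy))   # True
```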

**Q43.** Use `np.percentile()` to find the 25th, 50th, 75th, and 90th percentiles of `sales`.

In [None]:
# Q43 Solution
percentiles = [25, 50, 75, 90]
for p in percentiles:
    print(f"{p}th percentile: {np.percentile(sales, p):.2f}")

**Q44.** Bin the `sales` data into 5 equal-width bins using `np.histogram()`. Print the bin edges and counts.

In [None]:
# Q44 Solution
counts, bin_edges = np.histogram(sales, bins=5)
print("Bin edges:", bin_edges)
print("Counts:", counts)
print("\nBin ranges and counts:")
for i in range(len(counts)):
    print(f"  {bin_edges[i]:.0f} - {bin_edges[i+1]:.0f}: {counts[i]}")
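
`np.histogram` only returns counts. To label each individual value with its bin, `np.digitize` can be used with the same edges. One subtlety: the histogram's last bin is closed on the right, while `digitize` treats the maximum as falling past the last edge, so the index is clipped. A sketch on made-up values:

```python
import numpy as np

sales_demo = np.array([5000, 12000, 25000, 39000, 50000])   # made-up values
counts, edges = np.histogram(sales_demo, bins=3)             # edges 5000, 20000, 35000, 50000

# digitize returns 1-based bin positions; shift to 0-based and keep the max in the last bin
bin_idx = np.clip(np.digitize(sales_demo, edges) - 1, 0, len(counts) - 1)
print(bin_idx)   # [0 0 1 2 2]
print(np.bincount(bin_idx, minlength=len(counts)), counts)   # both [2 1 2]
```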

**Q45.** Calculate the moving average of `sales` with a window size of 7 using `np.convolve()`.

In [None]:
# Q45 Solution
window = 7
kernel = np.ones(window) / window
# mode='valid' keeps only full windows, so the output has len(sales) - window + 1 values
moving_avg = np.convolve(sales, kernel, mode='valid')

print(f"Original length: {len(sales)}")
print(f"Moving average length: {len(moving_avg)}")
print(f"First 10 moving averages: {moving_avg[:10]}")
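
The same `mode='valid'` moving average can be computed from a running sum: each window total is the difference of two cumulative sums `window` positions apart. This is a common alternative to convolution, sketched here on a small made-up series and checked against `np.convolve`:

```python
import numpy as np

values = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])   # made-up series
window = 3

via_convolve = np.convolve(values, np.ones(window) / window, mode='valid')

# Prepend 0 so that csum[i + window] - csum[i] is the sum of values[i:i + window]
csum = np.concatenate(([0.0], np.cumsum(values)))
via_cumsum = (csum[window:] - csum[:-window]) / window

print(via_convolve)                            # averages of [1,2,3], [2,3,4], ...
print(np.allclose(via_convolve, via_cumsum))   # True
```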

---
## ✅ All 45 NumPy Solutions Complete!

Go back to `numpy_practice.ipynb` and try any questions you couldn't solve.