### Mini-Project: Basic Statistics & Data Filtering with NumPy

This mini-project uses NumPy to analyze and summarize simulated data. It covers array manipulation, summary statistics, reshaping, and boolean indexing, all core FCC topics.

##### 1. Import Libraries

In [1]:
import numpy as np

##### 2. Generate Sample Data
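Create a 1D array of 50 random integers between 10 and 99 (inclusive). The upper bound passed to randint is exclusive, which is why the call uses 100. No seed is set, so the values differ on every run.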

In [2]:
data = np.random.randint(10, 100, size=50)
print("Data array:\n", data)

Data array:
 [70 60 47 94 15 54 71 84 78 20 83 33 90 59 38 41 36 19 64 43 27 25 86 41
 25 51 53 71 76 28 15 48 99 47 42 46 64 13 34 32 33 85 94 12 53 63 11 75
 35 30]
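
Because `np.random.randint` draws from NumPy's global random state, the array above changes each time the cell runs. The following is a minimal sketch of a reproducible variant, assuming NumPy's `Generator` API is available (NumPy 1.17 or later). The seed `42` and the names `rng`, `sample`, and `repeat` are illustrative choices, not part of the original notebook, and the values it produces will not match the output shown above.

```python
import numpy as np

# Seeded generator: the same seed yields the same sequence on every run.
rng = np.random.default_rng(42)  # 42 is an arbitrary, illustrative seed

# As with np.random.randint, the upper bound is exclusive by default,
# so this draws integers from 10 through 99.
sample = rng.integers(10, 100, size=50)

# A second generator created with the same seed reproduces the array exactly.
repeat = np.random.default_rng(42).integers(10, 100, size=50)
print("Reproducible:", np.array_equal(sample, repeat))
```

Keeping the generator in a local variable, rather than seeding the global state, makes it explicit which code depends on the seed.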


##### 3. Basic Statistics

In [3]:
mean = data.mean()
std = data.std()  # population standard deviation (ddof=0, the NumPy default)
min_val = data.min()
max_val = data.max()

print(f"Mean: {mean:.2f}")
print(f"Std Dev: {std:.2f}")
print(f"Min: {min_val}, Max: {max_val}")

Mean: 50.26
Std Dev: 24.55
Min: 11, Max: 99
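
The standard deviation above uses NumPy's default, which divides by n (the population formula). If the 50 values are treated as a sample from a larger population, the sample formula (dividing by n - 1) may be more appropriate. A small sketch of the difference, with the data from step 2 copied in so the cell runs on its own; the names `pop_std` and `sample_std` are introduced here for illustration:

```python
import numpy as np

# The 50 values printed in step 2, copied so this cell is self-contained.
data = np.array([
    70, 60, 47, 94, 15, 54, 71, 84, 78, 20,
    83, 33, 90, 59, 38, 41, 36, 19, 64, 43,
    27, 25, 86, 41, 25, 51, 53, 71, 76, 28,
    15, 48, 99, 47, 42, 46, 64, 13, 34, 32,
    33, 85, 94, 12, 53, 63, 11, 75, 35, 30,
])

pop_std = data.std()           # ddof=0: divides by n (population formula)
sample_std = data.std(ddof=1)  # ddof=1: divides by n - 1 (sample formula)

print(f"Population std: {pop_std:.2f}")
print(f"Sample std:     {sample_std:.2f}")

# The median is less sensitive to extreme values than the mean.
print(f"Median:         {np.median(data):.1f}")
```

The sample standard deviation is always slightly larger than the population value, so any threshold built from it shifts upward as well.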


##### 4. Reshape and Slice
Reshape data into a 5x10 matrix (5 rows of 10 values) and compute the sum of each row by summing along axis 1.


In [4]:
data_matrix = data.reshape(5, 10)
print("Data (5x10 matrix):\n", data_matrix)

row_sums = data_matrix.sum(axis=1)
print("Sum of each row:", row_sums)

Data (5x10 matrix):
 [[70 60 47 94 15 54 71 84 78 20]
 [83 33 90 59 38 41 36 19 64 43]
 [27 25 86 41 25 51 53 71 76 28]
 [15 48 99 47 42 46 64 13 34 32]
 [33 85 94 12 53 63 11 75 35 30]]
Sum of each row: [593 506 483 440 491]
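
The cell above reshapes and sums rows but does not slice. As a possible complement, this sketch shows column sums with `axis=0` and two basic slices on the same matrix, which is entered directly so the cell is self-contained. The names `col_sums`, `top_left`, and `last_col` are illustrative.

```python
import numpy as np

# The 5x10 matrix printed above, entered directly.
data_matrix = np.array([
    [70, 60, 47, 94, 15, 54, 71, 84, 78, 20],
    [83, 33, 90, 59, 38, 41, 36, 19, 64, 43],
    [27, 25, 86, 41, 25, 51, 53, 71, 76, 28],
    [15, 48, 99, 47, 42, 46, 64, 13, 34, 32],
    [33, 85, 94, 12, 53, 63, 11, 75, 35, 30],
])

# axis=0 collapses the rows, leaving one sum per column (10 values).
col_sums = data_matrix.sum(axis=0)
print("Sum of each column:", col_sums)

# Slice rows and columns together: first two rows, first three columns.
top_left = data_matrix[:2, :3]
print("Top-left 2x3 block:\n", top_left)

# Selecting a single column returns a 1D array.
last_col = data_matrix[:, -1]
print("Last column:", last_col)
```

A useful rule of thumb: the axis passed to `sum` is the one that disappears from the result's shape.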


##### 5. Boolean Indexing: Filter Values

Select every value in the original data that lies more than one standard deviation above the mean, that is, greater than mean + std (about 74.81 for this run).

In [5]:
high_values = data[data > (mean + std)]
print("Values greater than one std above mean:", high_values)

Values greater than one std above mean: [94 84 78 83 90 86 76 99 85 94 75]
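
The expression `data > (mean + std)` produces a boolean array that can be inspected on its own before it is used as an index. A brief sketch of that idea, again with the step 2 data copied in; the names `mask`, `positions`, `lower`, and `within` are introduced here for illustration:

```python
import numpy as np

# The 50 values printed in step 2, copied so this cell is self-contained.
data = np.array([
    70, 60, 47, 94, 15, 54, 71, 84, 78, 20,
    83, 33, 90, 59, 38, 41, 36, 19, 64, 43,
    27, 25, 86, 41, 25, 51, 53, 71, 76, 28,
    15, 48, 99, 47, 42, 46, 64, 13, 34, 32,
    33, 85, 94, 12, 53, 63, 11, 75, 35, 30,
])

upper = data.mean() + data.std()
lower = data.mean() - data.std()

# The comparison yields a boolean array with the same shape as data.
mask = data > upper

# True counts as 1, so summing the mask counts the matching elements.
print("Count above threshold:", mask.sum())

# np.flatnonzero returns the positions where the mask is True.
positions = np.flatnonzero(mask)
print("Positions:", positions)

# Conditions combine element-wise with & and |; each comparison
# needs its own parentheses because & binds tighter than >= or <=.
within = data[(data >= lower) & (data <= upper)]
print("Values within one std of the mean:", within.size)
```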


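##### 6. Replace Outliers

Replace all values above (mean + std) with the mean, truncated to an integer because the array holds integers (demonstrating boolean assignment). Values only one standard deviation above the mean are not outliers in the strict sense; the threshold is used here to illustrate the technique.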

In [6]:
data_clean = data.copy()
data_clean[data_clean > (mean + std)] = int(mean)
print("Data after replacing high outliers with mean:\n", data_clean)

Data after replacing high outliers with mean:
 [70 60 47 50 15 54 71 50 50 20 50 33 50 59 38 41 36 19 64 43 27 25 50 41
 25 51 53 71 50 28 15 48 50 47 42 46 64 13 34 32 33 50 50 12 53 63 11 50
 35 30]
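
The replacement value above is 50 rather than 50.26 because `int(mean)` drops the fractional part before it is stored in the integer array. If the exact mean matters, one option is to work on a float copy. The sketch below uses a small made-up array (not the notebook's data) and also shows `np.where` as a way to build the result without modifying any existing array; all variable names are illustrative.

```python
import numpy as np

values = np.array([12, 47, 94, 53, 99, 30])  # illustrative values only
mean = values.mean()
upper = mean + values.std()

# Integer array: the replacement is truncated to a whole number first.
as_int = values.copy()
as_int[as_int > upper] = int(mean)

# Float array: the exact mean is stored.
as_float = values.astype(float)
as_float[as_float > upper] = mean

# np.where builds a new array in one expression and leaves `values` untouched.
replaced = np.where(values > upper, mean, values)

print("Integer result:", as_int)
print("Float result:  ", as_float)
print("np.where:      ", replaced)
```

The mask assignment changes an array in place, while `np.where` returns a fresh array, which can be convenient when the original data must be kept for comparison.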


#### Reflection

In this mini-project, I used NumPy to generate, analyze, and transform a dataset of random integers. I practiced several core skills: array creation, reshaping, statistical calculations (mean, standard deviation, min, max), and boolean indexing for data filtering and conditional replacement.

What I learned:
- How to quickly generate and reshape numerical arrays with NumPy functions.
- Efficient ways to calculate aggregate statistics and extract meaningful data patterns.
- The power of boolean indexing for filtering and cleaning data without explicit loops.
- How reshaping and axis-based operations simplify working with multidimensional data.

Questions/Next Steps:
- How might I handle data that contains missing values (NaN) using only NumPy?
- Are there NumPy functions specialized for trimming or capping extreme outliers in more complex data scenarios?
- I’d like to explore more real-world datasets to apply these methods and compare with Pandas workflows in the future.
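
As a tentative starting point for the first two questions, NumPy does provide NaN-aware aggregates such as `np.nanmean`, along with `np.isnan`, `np.percentile`, and `np.clip`. The sketch below combines them on a small made-up array; the 5th and 95th percentile bounds are an arbitrary choice for illustration, not a recommended rule.

```python
import numpy as np

# Made-up readings with two missing values; NaN requires a float array.
readings = np.array([12.0, np.nan, 47.0, 94.0, np.nan, 30.0])

print(np.mean(readings))     # nan: one NaN propagates through plain aggregates
print(np.nanmean(readings))  # NaN-aware mean skips the missing entries

# Option 1: drop the missing entries with a boolean mask.
valid = readings[~np.isnan(readings)]

# Option 2: fill the missing entries, here with the NaN-aware mean.
filled = np.where(np.isnan(readings), np.nanmean(readings), readings)

# Capping extremes: clip the valid values to chosen percentile bounds.
low, high = np.percentile(valid, [5, 95])
capped = np.clip(valid, low, high)

print("Valid: ", valid)
print("Filled:", filled)
print("Capped:", capped)
```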

Overall, this project reinforced my understanding of NumPy and prepared me for deeper data analysis tasks using Python.
