# 📌 **NumPy Essentials for Data Science: Mathematical & Statistical Operations**

---

## 1. **Basic Arithmetic Operations**

* Element-wise operations
* `+`, `-`, `*`, `/`, `**` (power)
* Scalar vs. Array operations
* Practice: Apply arithmetic to whole arrays, then to individual rows and columns.
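
A minimal sketch of the points above, using two small made-up arrays (`a` and `b` are illustrative names, not from any dataset):

```python
import numpy as np

# Two small example arrays of the same shape (illustrative values)
a = np.array([[1, 2, 3],
              [4, 5, 6]])
b = np.array([[10, 20, 30],
              [40, 50, 60]])

# Element-wise operations between arrays of the same shape
added = a + b          # [[11, 22, 33], [44, 55, 66]]
multiplied = a * b     # [[10, 40, 90], [160, 250, 360]]
divided = b / a        # true division, result is a float array
squared = a ** 2       # [[1, 4, 9], [16, 25, 36]]

# Scalar vs. array: the scalar is applied to every element
scaled = a * 10        # [[10, 20, 30], [40, 50, 60]]

# Arithmetic on a single row or a single column
first_row_plus_one = a[0] + 1       # [2, 3, 4]
second_col_doubled = a[:, 1] * 2    # [4, 10]
```

Note that `*` is element-wise multiplication here, not matrix multiplication.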

---

## 2. **Aggregate Functions (Statistical Measures)**

These functions are staples of exploratory data analysis (EDA).

* `np.sum()`: Summation
* `np.mean()`: Mean
* `np.median()`: Median
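* `np.std()`: Standard deviation (population by default; pass `ddof=1` for the sample version)
* `np.var()`: Variance (same `ddof` behavior as `np.std()`)
* `np.min()`, `np.max()`: Minimum & Maximum
* `np.percentile()`: Percentiles (`q` given on a 0-100 scale)
* `np.quantile()`: Quantiles (`q` given on a 0-1 scale)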
* `np.cumsum()`: Cumulative sum
* `np.cumprod()`: Cumulative product

👉 **Practice:**
Try computing these across:

* Entire arrays
* Specific axes (rows/columns)
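
As a rough illustration, the sketch below computes several of these aggregates over a whole array and along each axis. The array `data` is a made-up example; `axis=0` collapses the rows (one result per column) and `axis=1` collapses the columns (one result per row).

```python
import numpy as np

# Illustrative 2x3 array
data = np.array([[1, 2, 3],
                 [4, 5, 6]])

# Entire array
total = np.sum(data)                 # 21
median = np.median(data)             # 3.5
pop_std = np.std(data)               # population standard deviation (ddof=0)
sample_std = np.std(data, ddof=1)    # sample standard deviation

# Specific axes
col_sums = np.sum(data, axis=0)      # [5, 7, 9]
row_means = np.mean(data, axis=1)    # [2., 5.]
col_max = np.max(data, axis=0)       # [4, 5, 6]

# Percentile uses a 0-100 scale, quantile a 0-1 scale
p50 = np.percentile(data, 50)        # 3.5
q50 = np.quantile(data, 0.5)         # 3.5

# Cumulative operations (flattened by default, or along an axis)
running_total = np.cumsum(data)              # [1, 3, 6, 10, 15, 21]
row_products = np.cumprod(data, axis=1)      # [[1, 2, 6], [4, 20, 120]]
```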

---

## 3. **Mathematical Functions**

* `np.sqrt()`: Square root
* `np.exp()`: Exponential function (e raised to each element)
* `np.log()`, `np.log10()`, `np.log2()`: Natural, base-10, and base-2 logarithms
* `np.abs()`: Absolute values
* Trigonometric: `np.sin()`, `np.cos()`, `np.tan()`
* Rounding: `np.floor()`, `np.ceil()`, `np.round()` (note that `np.round()` rounds exact halves to the nearest even value)

👉 **Practice:**
Apply these to arrays of random numbers and plot the results.
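
Before moving to random inputs, it can help to try the functions on small arrays whose answers are easy to check by hand. The sketch below does that with illustrative values; plotting is left out to keep it dependency-free.

```python
import numpy as np

x = np.array([1.0, 4.0, 9.0])
roots = np.sqrt(x)                                  # [1., 2., 3.]

growth = np.exp(np.array([0.0, 1.0]))               # [1., e]
natural = np.log(np.exp(2.0))                       # log undoes exp
base10 = np.log10(np.array([1.0, 10.0, 1000.0]))    # approximately [0, 1, 3]
base2 = np.log2(np.array([1.0, 2.0, 8.0]))          # approximately [0, 1, 3]

magnitudes = np.abs(np.array([-1.5, 0.0, 2.5]))     # [1.5, 0., 2.5]

# Trigonometric functions expect angles in radians
angles = np.array([0.0, np.pi / 2, np.pi])
sines = np.sin(angles)                              # approximately [0, 1, 0]

# Rounding: np.round sends exact halves to the nearest even value
halves = np.array([0.5, 1.5, 2.5])
floored = np.floor(halves)                          # [0., 1., 2.]
ceiled = np.ceil(halves)                            # [1., 2., 3.]
rounded = np.round(halves)                          # [0., 2., 2.]
```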

---

## 4. **Linear Algebra (Very Important for Data Science & Machine Learning)**

* `np.dot()`: Dot product (inner product for 1D arrays, matrix product for 2D arrays)
* `np.matmul()`: Matrix multiplication (also available as the `@` operator)
* `np.transpose()`: Matrix transpose
* `np.linalg.inv()`: Matrix inverse
* `np.linalg.det()`: Determinant
* `np.linalg.eig()`: Eigenvalues and Eigenvectors
* `np.linalg.norm()`: Norm of a vector/matrix
* `np.linalg.solve()`: Solve system of linear equations

👉 **Practice:**

* Multiply matrices of different but compatible shapes, e.g. (2, 3) by (3, 4).
* Solve simple linear equations.
* Calculate eigenvalues of small matrices.
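
The sketch below walks through the three practice items on small hand-checkable inputs. The matrix `A` and vector `b` are illustrative: they encode the system 2x + y = 3, x + 3y = 5.

```python
import numpy as np

# Dot product of two vectors: 1*4 + 2*5 + 3*6 = 32
v = np.array([1.0, 2.0, 3.0])
w = np.array([4.0, 5.0, 6.0])
dot_vw = np.dot(v, w)

# Matrix multiplication needs matching inner dimensions: (2, 3) @ (3, 4) -> (2, 4)
M = np.ones((2, 3))
N = np.ones((3, 4))
product = np.matmul(M, N)       # same as M @ N; every entry is 3.0
M_T = np.transpose(M)           # shape (3, 2)

# A small square system: A @ x = b
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
b = np.array([3.0, 5.0])

det_A = np.linalg.det(A)        # 2*3 - 1*1 = 5
inv_A = np.linalg.inv(A)        # A @ inv_A is (approximately) the identity
x = np.linalg.solve(A, b)       # approximately [0.8, 1.4]

# Eigen-decomposition: each column of eigvecs pairs with one eigenvalue
eigvals, eigvecs = np.linalg.eig(A)

# Euclidean norm of a vector: sqrt(3^2 + 4^2) = 5
length = np.linalg.norm(np.array([3.0, 4.0]))
```

For solving `A @ x = b`, `np.linalg.solve` is generally preferable to computing `inv_A @ b` explicitly, since it avoids forming the inverse.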

---

## 5. **Random Number Generation**

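* `np.random.rand()`: Uniform distribution over [0, 1)
* `np.random.randn()`: Standard normal distribution (mean 0, standard deviation 1)
* `np.random.randint()`: Random integers from `low` (inclusive) to `high` (exclusive)
* `np.random.choice()`: Random sampling from a 1D array, with or without replacement
* `np.random.seed()`: Set the seed so results are reproducible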

👉 **Practice:**

* Simulate dice rolls.
* Generate random datasets.
* Use random sampling for bootstrapping.
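
One possible take on the practice items, using the functions listed above. The seed value, sample sizes, and the `sample` array are arbitrary choices for illustration.

```python
import numpy as np

np.random.seed(42)  # arbitrary seed, chosen only for reproducibility

# Simulate 1000 rolls of a six-sided die (upper bound 7 is exclusive)
rolls = np.random.randint(1, 7, size=1000)

# Random datasets: 100 rows x 3 columns, uniform on [0, 1) and standard normal
uniform_data = np.random.rand(100, 3)
normal_data = np.random.randn(100, 3)

# Bootstrapping: resample with replacement and recompute the mean each time
sample = np.array([2.0, 4.0, 6.0, 8.0, 10.0])   # made-up observations
boot_means = np.array([
    np.random.choice(sample, size=sample.size, replace=True).mean()
    for _ in range(500)
])
boot_std_error = boot_means.std()   # spread of the resampled means

# Re-seeding with the same value reproduces the same draws
np.random.seed(42)
first = np.random.rand(3)
np.random.seed(42)
second = np.random.rand(3)          # identical to `first`
```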

---

## 6. **Statistical Distributions (from `numpy.random`)**

* Normal Distribution: `np.random.normal()`
* Binomial Distribution: `np.random.binomial()`
* Poisson Distribution: `np.random.poisson()`
* Uniform Distribution: `np.random.uniform()`

👉 **Practice:**

* Generate synthetic datasets from these distributions.
* Visualize them using histograms.
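
A small sketch of generating synthetic data from each distribution. The parameter values and variable names are invented for illustration, and `np.histogram` stands in for a plot by computing the bin counts that a histogram would draw.

```python
import numpy as np

np.random.seed(0)   # arbitrary seed for reproducibility
n = 10_000

# Hypothetical synthetic variables, one per distribution
heights = np.random.normal(loc=170, scale=10, size=n)    # mean 170, std 10
heads = np.random.binomial(n=10, p=0.5, size=n)          # heads in 10 fair coin flips
arrivals = np.random.poisson(lam=3, size=n)              # events per interval, rate 3
temps = np.random.uniform(low=15, high=25, size=n)       # uniform on [15, 25)

# Bin counts and bin edges for a 20-bin histogram of the normal sample
counts, edges = np.histogram(heights, bins=20)
```

If `matplotlib` is installed, `plt.hist(heights, bins=20)` draws the same histogram directly.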

---

## 7. **Correlation and Covariance**

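* `np.corrcoef()`: Pearson correlation matrix (each row is treated as a variable by default)
* `np.cov()`: Covariance matrix (sample covariance by default, rows as variables)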

👉 **Practice:**

* Check correlation between synthetic variables.
* Visualize covariance matrices.
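
The following sketch builds three synthetic variables, where `y` is constructed to depend on `x` and `z` is independent noise, then inspects their correlation and covariance matrices. All names and coefficients are illustrative.

```python
import numpy as np

np.random.seed(1)   # arbitrary seed for reproducibility

x = np.random.randn(500)
y = 2 * x + 0.5 * np.random.randn(500)   # strongly related to x by construction
z = np.random.randn(500)                  # unrelated to x and y

# Stack so that each row is one variable (the default layout for both functions)
variables = np.vstack([x, y, z])          # shape (3, 500)

corr = np.corrcoef(variables)             # 3x3, ones on the diagonal
cov = np.cov(variables)                   # 3x3, variances on the diagonal (ddof=1)

corr_xy = corr[0, 1]                      # close to 1
corr_xz = corr[0, 2]                      # close to 0
```

A heatmap of `corr` or `cov` (for example with `matplotlib`'s `imshow`) is one way to visualize these matrices.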

---

## 8. **Sorting & Searching (Useful in EDA)**

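* `np.sort()`: Return a sorted copy of an array
* `np.argsort()`: Indices that would sort the array
* `np.searchsorted()`: Index at which a value would be inserted into a sorted array to keep it sorted
* `np.argmin()`, `np.argmax()`: Indices of the minimum & maximum values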

👉 **Practice:**

* Sort and find percentiles.
* Identify positions of minimum/maximum values.
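
A short sketch of these functions on a made-up array of scores:

```python
import numpy as np

scores = np.array([70, 95, 60, 85])   # illustrative values

sorted_scores = np.sort(scores)       # [60, 70, 85, 95]; original is unchanged
order = np.argsort(scores)            # [2, 0, 3, 1]; scores[order] is sorted

# Where would a score of 80 go in the sorted array?
insert_at = np.searchsorted(sorted_scores, 80)   # 2 (between 70 and 85)

# Positions of the smallest and largest values in the original array
lowest_idx = np.argmin(scores)        # 2
highest_idx = np.argmax(scores)       # 1

# Median via percentile: midpoint of the two middle values, 70 and 85
median_score = np.percentile(scores, 50)         # 77.5
```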

---

## 9. **Handling Missing Data**

* Represent missing data with `np.nan`
* Use:

  * `np.isnan()`
  * `np.nanmean()`, `np.nanstd()` (ignore NaNs during calculations)

👉 **Practice:**

* Simulate datasets with NaN values and perform calculations ignoring them.
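
As a minimal illustration, the sketch below places `np.nan` in small made-up arrays and compares the regular aggregates with their NaN-aware counterparts.

```python
import numpy as np

values = np.array([1.0, np.nan, 3.0, np.nan, 5.0])   # illustrative data

mask = np.isnan(values)             # [False, True, False, True, False]
n_missing = mask.sum()              # 2

plain_mean = np.mean(values)        # nan: a single NaN poisons the result
clean_mean = np.nanmean(values)     # 3.0, NaNs ignored
clean_std = np.nanstd(values)       # std of [1, 3, 5]

# Boolean indexing drops the missing entries explicitly
observed = values[~mask]            # [1., 3., 5.]

# NaN-aware functions also accept an axis argument
table = np.array([[1.0, np.nan],
                  [3.0, 4.0]])
col_means = np.nanmean(table, axis=0)   # [2., 4.]
```

Since `np.nan` is a float, integer arrays need to be converted to a float dtype before they can hold missing values.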

---

## 10. **Vectorization and Broadcasting (Key to Speed)**

* Avoid loops by applying vectorized operations.
* Use broadcasting to perform operations between arrays of different shapes.

👉 **Practice:**

* Apply functions across arrays without using for-loops.
* Perform operations between 1D and 2D arrays using broadcasting.
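
The sketch below contrasts a Python loop with the equivalent vectorized expression, then shows broadcasting between a 2D array and 1D or column-shaped arrays. All arrays are small illustrative examples.

```python
import numpy as np

matrix = np.array([[1.0, 2.0, 3.0],
                   [4.0, 5.0, 6.0]])      # shape (2, 3)

# Loop version vs. vectorized version of f(v) = v**2 + 1
looped = np.array([v ** 2 + 1 for v in matrix.ravel()]).reshape(matrix.shape)
vectorized = matrix ** 2 + 1              # same result, no Python loop

# Broadcasting a 1D array of shape (3,) across each row of a (2, 3) array
row = np.array([10.0, 20.0, 30.0])
plus_row = matrix + row                   # [[11, 22, 33], [14, 25, 36]]

# Broadcasting a column of shape (2, 1) across each column
col = np.array([[100.0],
                [200.0]])
plus_col = matrix + col                   # [[101, 102, 103], [204, 205, 206]]

# Common use: subtract each column's mean from that column
centered = matrix - matrix.mean(axis=0)   # column means are [2.5, 3.5, 4.5]
```

Broadcasting compares shapes from the trailing dimension backward; two dimensions are compatible when they are equal or one of them is 1.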

---

# ✅ Summary Table

| Area            | Key Functions/Concepts                                                       |
| --------------- | ---------------------------------------------------------------------------- |
| Arithmetic      | +, -, \*, /, \*\*                                                            |
| Aggregate Stats | sum, mean, median, std, var, min, max, percentile, quantile, cumsum, cumprod |
| Math Functions  | sqrt, exp, log, abs, trigonometry, rounding                                  |
| Linear Algebra  | dot, matmul, transpose, inv, det, eig, norm, solve                           |
| Random          | rand, randn, randint, choice, seed                                           |
| Distributions   | normal, binomial, poisson, uniform                                           |
| Correlation     | corrcoef, cov                                                                |
| Sorting/Search  | sort, argsort, searchsorted, argmin, argmax                                  |
| Missing Data    | nan, isnan, nanmean, nanstd                                                  |
| Vectorization   | Avoid loops, use broadcasting                                                |

---

## 🎯 **Next Steps for You:**

1. Practice these on **real or synthetic datasets**.
2. Try **visualizing results** using `matplotlib` or `seaborn`.
3. Move on to **pandas** after this – it's the next step in data handling.

---

If you want, I can also:

* Provide **practice problems** on each of the above.
* Provide **step-by-step solutions** for them.

Would you like me to prepare that for you? 😊
