---

# **🧪 SciPy Cheatsheet**

---

### 📐 1. `scipy.stats` – Statistics & Distributions  
📌 Probability, tests, PDFs, and CDFs  
```python
from scipy import stats

stats.describe(data)                 # 📊 Summary stats  
stats.zscore(data)                   # 🧠 Z-scores  
stats.ttest_ind(a, b)                # 📈 T-test (independent)  
stats.ttest_rel(a, b)                # 🔄 Paired T-test  
stats.pearsonr(x, y)                 # 🔗 Pearson correlation  
stats.norm.pdf(x, loc, scale)        # 🧮 Normal PDF  
stats.norm.cdf(x, loc, scale)        # 📈 Normal CDF  
stats.shapiro(data)                  # 🧪 Normality test  
```
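
🧪 A minimal runnable sketch of a few of these calls on synthetic data. The arrays `a` and `b`, the seed, and the sample sizes are illustrative assumptions, not part of any real dataset.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)                  # fixed seed (arbitrary choice)
a = rng.normal(loc=0.0, scale=1.0, size=200)
b = rng.normal(loc=1.0, scale=1.0, size=200)    # same spread, mean shifted by 1

z = stats.zscore(a)                      # standardized copy of a (mean 0, std 1)
t_stat, p_value = stats.ttest_ind(a, b)  # two-sided test of equal means
below_196 = stats.norm.cdf(1.96)         # P(Z <= 1.96) for a standard normal

print(f"t = {t_stat:.2f}, p = {p_value:.3g}")
print(f"P(Z <= 1.96) = {below_196:.4f}")
```

A small `p_value` here only reflects the shift built into `b`; on real data, check the test's assumptions (for example with `stats.shapiro`) before reading much into it.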

---

### 📏 2. `scipy.spatial` – Distance & Geometry  
📌 Useful for clustering, k-NN, NLP  
```python
from scipy.spatial import distance

distance.euclidean(a, b)             # 📐 Euclidean distance  
distance.cosine(a, b)                # 🔄 Cosine distance (1 - cosine similarity)  
distance.cdist(X, Y, metric='euclidean')  # 📏 Distance matrix  
```
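
🧪 A short sketch of how cosine distance relates to cosine similarity, and of the shape `cdist` returns. The vectors and point sets are made-up values for illustration.

```python
import numpy as np
from scipy.spatial import distance

a = np.array([1.0, 0.0])
b = np.array([1.0, 1.0])

cos_dist = distance.cosine(a, b)         # 1 - cos(angle between a and b)
cos_sim = 1.0 - cos_dist                 # similarity, here cos(45 degrees)

X = np.array([[0.0, 0.0], [3.0, 4.0]])
Y = np.array([[0.0, 0.0], [6.0, 8.0], [3.0, 0.0]])
D = distance.cdist(X, Y, metric="euclidean")   # D[i, j] = distance from X[i] to Y[j]

print(cos_sim)
print(D.shape)
```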

---

### 🔧 3. `scipy.optimize` – Optimization  
📌 Minimize functions, solve equations  
```python
from scipy.optimize import minimize, curve_fit

minimize(f, x0)                      # 🎯 Minimize any function  
curve_fit(model_func, xdata, ydata)  # 🧩 Fit model to data  
```
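
🧪 A minimal sketch of `minimize` with and without bounds. The objective `f` and the bounds are hypothetical, chosen so the answers are easy to verify by hand.

```python
import numpy as np
from scipy.optimize import minimize

def f(x):
    # Convex bowl with its unconstrained minimum at (1, -2)
    return (x[0] - 1.0) ** 2 + (x[1] + 2.0) ** 2

free = minimize(f, x0=[0.0, 0.0])        # no bounds or constraints: BFGS by default
boxed = minimize(f, x0=[0.0, 0.0],
                 bounds=[(0.0, 0.5), (-1.0, 1.0)])   # bounds switch the default to L-BFGS-B

print(free.x, free.success)
print(boxed.x)                           # pushed to the edge of the box
```

Always check `result.success` and `result.message` before trusting `result.x`.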

---

### 🧮 4. `scipy.linalg` – Linear Algebra  
📌 Used behind the scenes in ML libraries  
```python
from scipy import linalg

linalg.inv(A)                        # 🔁 Matrix inverse  
linalg.det(A)                        # 🔢 Determinant  
linalg.solve(A, b)                   # 🧠 Solve Ax = b  
linalg.svd(A)                        # 🔍 Singular Value Decomposition  
```
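
🧪 A small worked example, assuming a hand-picked 2×2 system, that shows why `solve` is usually preferred over computing an explicit inverse.

```python
import numpy as np
from scipy import linalg

A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([9.0, 8.0])

x = linalg.solve(A, b)                   # solves Ax = b directly
x_via_inv = linalg.inv(A) @ b            # same answer, but more work and more rounding
U, s, Vh = linalg.svd(A)                 # factorization with A == U @ diag(s) @ Vh

print(x)
print(np.allclose(U @ np.diag(s) @ Vh, A))
```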

---

### ⚙️ 5. `scipy.integrate` – Calculus Tools  
📌 Numerical integration, useful in modeling  
```python
from scipy.integrate import quad, odeint

quad(lambda x: x**2, 0, 1)           # ∫ x² dx from 0 to 1  
odeint(model, y0, t)                 # 🧪 Solve ODE system  
```
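
🧪 A runnable sketch of both tasks. It swaps `odeint` for `solve_ivp`, which the SciPy docs recommend for new code; note that `solve_ivp` expects the callback as `f(t, y)`, the reverse of `odeint`'s default `f(y, t)`. The decay model and tolerances are illustrative assumptions.

```python
import numpy as np
from scipy.integrate import quad, solve_ivp

area, abs_err = quad(lambda x: x**2, 0, 1)   # returns (value, error estimate)

def decay(t, y):
    # dy/dt = -0.5 * y, with exact solution y(t) = y0 * exp(-0.5 * t)
    return -0.5 * y

sol = solve_ivp(decay, t_span=(0.0, 4.0), y0=[2.0],
                t_eval=np.linspace(0.0, 4.0, 5), rtol=1e-8, atol=1e-10)

print(area)                              # close to 1/3
print(sol.t, sol.y[0])                   # y has shape (n_states, n_times)
```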

---

### 🌱 6. `scipy.interpolate` – Interpolation  
📌 Fill missing data or create smooth curves  
```python
from scipy.interpolate import interp1d

f = interp1d(x, y, kind='linear')   # 🔁 Linear interpolation  
f(2.5)                               # 🎯 Interpolated value  
```
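
🧪 A quick sketch comparing linear and cubic interpolation on samples of a known curve. The sample points are assumptions chosen so the result can be checked by hand.

```python
import numpy as np
from scipy.interpolate import interp1d

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = x ** 2                               # samples of y = x^2

f_linear = interp1d(x, y, kind="linear")
f_cubic = interp1d(x, y, kind="cubic")

print(f_linear(2.5))   # → 6.5, halfway along the line from (2, 4) to (3, 9)
print(f_cubic(2.5))    # closer to the true value 2.5**2 = 6.25
```

By default `interp1d` raises a `ValueError` for queries outside `[x.min(), x.max()]`; pass `fill_value="extrapolate"` only if extrapolation is really what you want.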

---

### 🧊 7. `scipy.ndimage` – Image Processing  
📌 Useful in computer vision and 2D data  
```python
from scipy import ndimage

ndimage.gaussian_filter(img, sigma=2)    # 🌫 Gaussian blur  
ndimage.rotate(img, 45)                  # 🔄 Rotate image  
ndimage.zoom(img, 2)                     # 🔍 Upsample 2x (spline interpolation)  
```
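
🧪 A small sketch on a synthetic image that shows how each call affects the array shape. The 20×30 test image is an assumption for illustration.

```python
import numpy as np
from scipy import ndimage

img = np.zeros((20, 30))
img[8:12, 10:20] = 1.0                   # bright rectangle on a dark background

blurred = ndimage.gaussian_filter(img, sigma=2)
rotated = ndimage.rotate(img, 45)        # reshape=True by default: canvas grows to fit
rotated_same = ndimage.rotate(img, 45, reshape=False)   # keeps the input shape, crops corners
zoomed = ndimage.zoom(img, 2)            # every axis resampled to twice its length

print(blurred.shape, rotated.shape, rotated_same.shape, zoomed.shape)
```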

---

### 🧱 8. `scipy.cluster` – Clustering  
📌 Hierarchical clustering and dendrograms  
```python
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram

Z = linkage(X, method='ward')       # 🔗 Linkage matrix  
dendrogram(Z)                       # 🌳 Plot dendrogram  
plt.show()
```
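
🧪 When you need cluster labels rather than a plot, `fcluster` cuts the linkage tree into flat clusters. A minimal sketch on two synthetic blobs (the data and seed are assumptions):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.3, size=(10, 2)),    # blob around (0, 0)
               rng.normal(5.0, 0.3, size=(10, 2))])   # blob around (5, 5)

Z = linkage(X, method="ward")            # (n - 1) x 4 table of merges
labels = fcluster(Z, t=2, criterion="maxclust")   # at most 2 clusters, labels start at 1

print(Z.shape)
print(labels)
```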

---

✅ **When to Use SciPy?**  
- Higher-level than NumPy: ready-made scientific algorithms that operate on NumPy arrays  
- Used under the hood by many ML libraries (scikit-learn depends on it)  
- Great for modeling, hypothesis testing, clustering, optimization, etc.


---

# **⚔️ NumPy vs SciPy – Quick Comparison**

| Feature / Use Case               | ✅ NumPy                             | 🔬 SciPy                                 |
|-------------------------------|-------------------------------------|-----------------------------------------|
| 📦 **Purpose**                 | Core array & math operations        | Built on top of NumPy – advanced tools  |
| 🧮 **Array Support**           | ✅ `ndarray` is core                 | 🚫 No array type of its own; operates on NumPy arrays |
| ➕ **Basic Math & Algebra**     | Yes (e.g., `np.sum`, `np.dot`)      | Yes (but via `scipy.linalg`)            |
| 📊 **Statistics**             | Basic (mean, std, var)              | Advanced (`scipy.stats`)                |
| 📏 **Linear Algebra**          | Basic (`np.linalg`)                 | Advanced (`scipy.linalg`)               |
| 📐 **Optimization**            | ❌ Not available                     | ✅ `scipy.optimize`                      |
| 📚 **Distributions**           | Sampling only (`np.random`)         | ✅ `scipy.stats` (PDF, CDF, fitting)     |
| 🔁 **Interpolation**           | Basic via `np.interp`               | Advanced (`scipy.interpolate`)          |
| 🧪 **Numerical Integration**   | Basic trapezoid rule (`np.trapezoid`, formerly `np.trapz`) | ✅ `scipy.integrate` |
| 🧊 **Image Processing**        | ❌ Not available                     | ✅ `scipy.ndimage`                       |
| 🧠 **Clustering & Distance**   | ❌ Not available                     | ✅ `scipy.cluster`, `scipy.spatial`     |
| 🧾 **Signal Processing**       | Very limited                        | ✅ `scipy.signal`                        |
| 📥 **Dependencies**            | Lightweight                         | Heavier (depends on NumPy)              |

---

### ✅ Summary:
- **NumPy** = 🔢 Fast array ops + math foundations  
- **SciPy** = 🧠 Advanced scientific tools **built on NumPy**  
- **Use together** in almost every ML/DS pipeline!

---

# **🤝 SciPy + Scikit-learn Combo Cheatsheet**
🧪 Add scientific power 💥 to your machine learning pipeline

---

### 1️⃣ Custom Distance Metrics with `scipy.spatial`  
📌 Use in clustering, k-NN, etc.
```python
from scipy.spatial.distance import euclidean
from sklearn.neighbors import KNeighborsClassifier

model = KNeighborsClassifier(metric=euclidean)  # ✅ Custom distance
model.fit(X_train, y_train)
```

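✅ Other SciPy callables such as `cosine` or `cityblock` plug in the same way. For built-in metrics, passing the name as a string (e.g. `metric='cosine'`) is much faster than a Python callable; use `cdist` when you need the full pairwise matrix yourself.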

---

### 2️⃣ Feature Selection with `scipy.stats`  
📌 Filter features using statistical tests  
```python
from scipy.stats import ttest_ind
import numpy as np

p_vals = [ttest_ind(X[y==0, i], X[y==1, i]).pvalue for i in range(X.shape[1])]
selected = np.where(np.array(p_vals) < 0.05)[0]   # 🎯 Only significant features
X_new = X[:, selected]
```

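✅ For a pipeline-friendly version, `sklearn.feature_selection.SelectKBest(f_classif)` ranks features with an ANOVA F-test, which for two classes is equivalent to this equal-variance t-test.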
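
🧪 A self-contained sketch of the same filter, vectorized with `axis=0` so every feature is tested in one call. The synthetic `X` and `y`, and the Bonferroni threshold, are assumptions added here because testing many features at a fixed 0.05 inflates false positives.

```python
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(42)
n_samples, n_features = 200, 5
y = np.repeat([0, 1], n_samples // 2)
X = rng.normal(size=(n_samples, n_features))
X[y == 1, 0] += 2.0                      # only feature 0 carries class signal

# axis=0 runs one t-test per column instead of looping over features
p_vals = ttest_ind(X[y == 0], X[y == 1], axis=0).pvalue
alpha = 0.05 / n_features                # Bonferroni correction for 5 tests
selected = np.where(p_vals < alpha)[0]
X_new = X[:, selected]

print(p_vals.round(4))
print(selected)
```

Compute the p-values on training data only; selecting features on the full dataset leaks test information into the model.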

---

### 3️⃣ Curve Fitting as ML Model with `scipy.optimize.curve_fit`  
📌 Fit custom function as your "regressor"
```python
from scipy.optimize import curve_fit

def model_func(x, a, b):  # 📈 Define your function
    return a * x + b

params, _ = curve_fit(model_func, X_train.ravel(), y_train)
preds = model_func(X_test.ravel(), *params)
```

✅ Good for baseline or interpretable models.
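
🧪 A runnable sketch of the fit-then-predict pattern on noisy synthetic data. The true parameters, noise level, and the even/odd split are hypothetical choices for illustration; the covariance returned by `curve_fit` gives a rough uncertainty for each parameter.

```python
import numpy as np
from scipy.optimize import curve_fit

def model_func(x, a, b):
    return a * x + b

rng = np.random.default_rng(1)
x_all = np.linspace(0.0, 10.0, 100)
y_all = model_func(x_all, 3.0, 0.5) + rng.normal(scale=0.2, size=x_all.size)

# Hypothetical split: even indices for fitting, odd indices held out
x_train, y_train = x_all[::2], y_all[::2]
x_test, y_test = x_all[1::2], y_all[1::2]

params, pcov = curve_fit(model_func, x_train, y_train)
param_std = np.sqrt(np.diag(pcov))       # one-sigma uncertainty per parameter
preds = model_func(x_test, *params)
rmse = np.sqrt(np.mean((preds - y_test) ** 2))

print(params, param_std)
print(rmse)
```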

---

### 4️⃣ Optimize Model Hyperparameters Manually  
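📌 Use `scipy.optimize` for custom tuning  
```python
from scipy.optimize import minimize_scalar
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

def loss_fn(k):
    # n_neighbors must be an integer, so round the continuous candidate
    model = KNeighborsClassifier(n_neighbors=int(round(k)))
    return -cross_val_score(model, X, y, cv=3).mean()

# Derivative-free bounded search; gradient-based `minimize` would see a flat
# step function here and stop at its starting point
result = minimize_scalar(loss_fn, bounds=(1, 10), method='bounded')
best_k = int(round(result.x))
```

✅ Handy when the search space does not fit a fixed `GridSearchCV` grid. Treat the result as a heuristic, since the CV score is noisy and not smooth.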
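
🧪 A toy illustration of why integer hyperparameters call for a derivative-free search. `step_loss` is a hypothetical stand-in for a cross-validation loss: it depends on `k` only through its rounded value, so its finite-difference gradient is zero almost everywhere.

```python
import numpy as np
from scipy.optimize import minimize, minimize_scalar

def step_loss(k):
    # Toy loss with the best "hyperparameter" at 7 (an arbitrary choice);
    # accepts either a scalar or a 1-element array
    k_int = int(np.round(np.ravel(k)[0]))
    return float((k_int - 7) ** 2)

# L-BFGS-B estimates the gradient numerically, sees zero, and stops at x0
stalled = minimize(step_loss, x0=[3.0], bounds=[(1, 10)])
# The bounded scalar method compares function values only
found = minimize_scalar(step_loss, bounds=(1, 10), method="bounded")

print("gradient-based:", stalled.x[0])
print("derivative-free:", int(np.round(found.x)))
```

For a small integer range, simply looping over every candidate is just as valid and easier to reason about.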

---

### 5️⃣ Solve ML Equations (like Linear Regression)  
📌 Use `scipy.linalg` to solve Ax = b  
```python
import numpy as np
from scipy.linalg import solve

X_b = np.c_[np.ones((X.shape[0], 1)), X]  # Add bias
theta = solve(X_b.T @ X_b, X_b.T @ y)     # 💡 Normal Equation
```

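✅ Fast for small, well-conditioned problems. Prefer `scipy.linalg.lstsq` when features are correlated, because forming `X.T @ X` squares the condition number.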
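
🧪 A self-contained sketch comparing the normal equation with `lstsq` on synthetic data. The true coefficients and noise level are assumptions; `assume_a="pos"` tells `solve` that `X_b.T @ X_b` is symmetric positive definite, which holds when the columns of `X_b` are linearly independent.

```python
import numpy as np
from scipy.linalg import solve, lstsq

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
true_theta = np.array([4.0, 1.0, -2.0, 0.5])     # bias first, then three weights
X_b = np.c_[np.ones((X.shape[0], 1)), X]         # prepend a column of ones
y = X_b @ true_theta + rng.normal(scale=0.01, size=100)

theta_normal = solve(X_b.T @ X_b, X_b.T @ y, assume_a="pos")   # normal equation
theta_lstsq, residues, rank, sv = lstsq(X_b, y)                # never forms X^T X

print(theta_normal)
print(theta_lstsq, rank)
```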

---

### 6️⃣ Evaluate Gaussian Assumptions with `scipy.stats`  
📌 Use in Naive Bayes / Gaussian modeling  
```python
from scipy.stats import norm

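# mu, sigma: mean and std estimated from the training data (e.g. per feature)
pdf_vals = norm.pdf(X_test, loc=mu, scale=sigma)  # 🔍 Gaussian likelihood
```

✅ Use for generative modeling or Naive Bayes tweaks. Prefer `norm.logpdf` when combining many features, to avoid numerical underflow.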
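
🧪 A minimal Gaussian Naive Bayes style sketch: estimate a mean and standard deviation per class and feature, then sum log-densities across features. The two synthetic classes, the query point `x_new`, and the equal class priors are all assumptions.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
X_class0 = rng.normal(loc=0.0, scale=1.0, size=(50, 2))
X_class1 = rng.normal(loc=3.0, scale=1.0, size=(50, 2))

# Row k holds the per-feature estimates for class k
mu = np.array([X_class0.mean(axis=0), X_class1.mean(axis=0)])
sigma = np.array([X_class0.std(axis=0), X_class1.std(axis=0)])

x_new = np.array([2.8, 3.1])
# Summing log-densities over features is the "naive" independence assumption
log_lik = norm.logpdf(x_new, loc=mu, scale=sigma).sum(axis=1)   # one value per class
predicted = int(np.argmax(log_lik))      # equal class priors assumed

print(log_lik, predicted)
```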

---

### 7️⃣ Signal or Time-Series Preprocessing  
📌 With `scipy.signal` before feeding to models  
```python
from scipy.signal import savgol_filter

smoothed = savgol_filter(signal_data, window_length=11, polyorder=2)
```

✅ Helps remove noise before ML step.
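
🧪 A quick check, on a synthetic noisy sine wave, that the filter actually moves the signal closer to the clean curve. The signal, noise level, and seed are assumptions for illustration.

```python
import numpy as np
from scipy.signal import savgol_filter

rng = np.random.default_rng(0)
t = np.linspace(0.0, 2.0 * np.pi, 200)
clean = np.sin(t)
signal_data = clean + rng.normal(scale=0.2, size=t.size)

# Fits a degree-2 polynomial in each sliding window of 11 samples
smoothed = savgol_filter(signal_data, window_length=11, polyorder=2)

noisy_rmse = np.sqrt(np.mean((signal_data - clean) ** 2))
smooth_rmse = np.sqrt(np.mean((smoothed - clean) ** 2))
print(noisy_rmse, smooth_rmse)
```

A wider window smooths more aggressively but can flatten genuine peaks, so tune `window_length` against the time scale of the features you care about.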

---

### 8️⃣ Distance Matrix for Clustering  
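📌 Use a precomputed distance matrix for custom clusters  
```python
from scipy.spatial import distance_matrix
from sklearn.cluster import AgglomerativeClustering

D = distance_matrix(X, X)            # Minkowski distances (p=2, i.e. Euclidean, by default)
# `metric=` replaces the deprecated `affinity=` in scikit-learn >= 1.2
model = AgglomerativeClustering(metric='precomputed', linkage='average')
model.fit(D)
```

✅ For non-Euclidean or domain-specific metrics, build `D` with `scipy.spatial.distance.cdist`, or `pdist` + `squareform`, instead.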
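
🧪 A small sketch of building a square distance matrix under a non-Euclidean metric with `pdist` and `squareform`. The four points and the choice of Manhattan (`cityblock`) distance are illustrative assumptions; the resulting `D` has the form that estimators accepting precomputed distances expect.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

X = np.array([[0.0, 0.0],
              [1.0, 1.0],
              [5.0, 5.0],
              [6.0, 5.0]])

condensed = pdist(X, metric="cityblock")   # n*(n-1)/2 pairwise Manhattan distances
D = squareform(condensed)                  # symmetric (n, n) matrix, zeros on the diagonal

print(condensed.shape, D.shape)
print(D[0, 2])                             # |5 - 0| + |5 - 0|
```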

---

### 9️⃣ Hierarchical Clustering & Dendrogram  
📌 Combine `scipy.cluster` with `sklearn`  
```python
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import dendrogram, linkage

Z = linkage(X, method='ward')
dendrogram(Z)
plt.title("Dendrogram for Clustering")
plt.show()
```

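✅ The tree comes from SciPy's own `linkage`; with the same linkage method and metric it shows the merge order that `sklearn.cluster.AgglomerativeClustering` follows.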

---

### 🔚 Wrap-Up Summary

| Use Case                 | SciPy Tool Used             | sklearn Purpose       |
|--------------------------|-----------------------------|-----------------------|
| Distance metrics         | `scipy.spatial`             | k-NN, clustering      |
| Feature selection        | `scipy.stats.ttest_ind`     | Reduce input size     |
| Custom model fitting     | `scipy.optimize.curve_fit`  | Regression            |
| Manual tuning            | `scipy.optimize` minimizers | Hyperparameter tuning |
| Linear systems           | `scipy.linalg.solve`        | Linear models         |
| Gaussian modeling        | `scipy.stats.norm`          | Naive Bayes           |
| Signal smoothing         | `scipy.signal`              | Time-series cleanup   |
| Clustering visualization | `scipy.cluster`             | Hierarchical ML       |

---