# 🤝 Section 8: Interfacing NumPy with Other Libraries (Real-World Applications)

NumPy is not an isolated library; it's the **numerical foundation** for nearly every major Python data and AI framework.

In this section, you'll learn how to interface NumPy arrays with:
- **pandas** for data manipulation
- **matplotlib** for visualization
- **scikit-learn** for machine learning
- **Numba** for performance acceleration

You'll also see how NumPy can enable **zero-copy data sharing** across these libraries, which cuts memory use and copying overhead in real-world workloads.

## 📦 1. NumPy + Pandas: Efficient Data Wrangling

pandas DataFrames store their standard numeric columns in NumPy arrays. Understanding this relationship helps you:
- Transfer data efficiently between pandas and NumPy
- Handle conversions without unnecessary copying
- Work with structured numerical data in analytics pipelines

In [ ]:
import numpy as np
import pandas as pd

# Create a NumPy array simulating product sales data
np.random.seed(42)
sales_data = np.random.randint(50, 500, size=(10, 3))
columns = ['Product_A', 'Product_B', 'Product_C']

# Convert to pandas DataFrame
df = pd.DataFrame(sales_data, columns=columns)
print("DataFrame built from NumPy array:\n", df.head(), '\n')

# Back to NumPy array for numerical processing
arr = df.to_numpy()
print("Converted back to NumPy array:\n", arr[:5])

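✅ NumPy–pandas conversions can be **zero-copy** when the data has a single, compatible dtype, meaning the buffer is shared rather than duplicated. Mixed dtypes force a copy, and the exact behavior depends on the pandas version and its copy-on-write settings.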

This makes NumPy essential for **ETL pipelines**, **data cleaning**, and **feature preprocessing** in analytics workflows.
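One way to find out whether a particular conversion shares memory is to ask NumPy directly with `np.shares_memory`. The sketch below uses two small, made-up frames (`df_same` and `df_mixed` are illustrative names). The homogeneous case is only printed, not promised, because the result can differ between pandas versions; the mixed-dtype case has to allocate a new `float64` array.

```python
import numpy as np
import pandas as pd

# Homogeneous frame: every column is float64
source = np.arange(12, dtype=np.float64).reshape(4, 3)
df_same = pd.DataFrame(source, columns=["a", "b", "c"])
arr_same = df_same.to_numpy()

# Mixed frame: int64 and float64 columns must be upcast into one new array
df_mixed = pd.DataFrame({
    "units": np.array([1, 2, 3], dtype=np.int64),
    "price": np.array([9.5, 7.25, 3.0]),
})
arr_mixed = df_mixed.to_numpy()

# Version-dependent: may be True or False depending on pandas and its settings
print("homogeneous shares memory with source:",
      np.shares_memory(source, arr_same))

# The mixed result is a fresh float64 array, separate from the stored column
print("mixed dtype:", arr_mixed.dtype)
print("mixed shares memory with 'price' column:",
      np.shares_memory(arr_mixed, df_mixed["price"].to_numpy()))
```

If the homogeneous check prints `False` in your environment, treat the round trip as a copy and plan memory accordingly.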

## 📊 2. NumPy + Matplotlib: Visualization at Scale

Most Matplotlib plotting functions accept NumPy arrays directly. 
This is especially useful when visualizing large-scale simulation or sensor data.

### Real-World Example: Analyzing Stock Price Trends

In [ ]:
import matplotlib.pyplot as plt

# Simulate 1 year of daily stock price data using NumPy
days = np.arange(365)
base_price = 100 + np.cumsum(np.random.normal(0, 1, size=365))

# Compute moving averages with NumPy
short_ma = np.convolve(base_price, np.ones(7)/7, mode='valid')
long_ma = np.convolve(base_price, np.ones(30)/30, mode='valid')

# Plot results
plt.figure(figsize=(10,5))
plt.plot(days, base_price, label='Daily Price', alpha=0.6)
plt.plot(days[6:], short_ma, label='7-Day MA', color='orange')
plt.plot(days[29:], long_ma, label='30-Day MA', color='red')
plt.title('Stock Price Trend with Moving Averages')
plt.xlabel('Day')
plt.ylabel('Price ($)')
plt.legend()
plt.grid(True)
plt.show()

✅ NumPy provides fast numerical computation, while Matplotlib turns those results into visuals.

Together they form the **core of exploratory data analysis** (EDA) in finance, science, and engineering.
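The slices `days[6:]` and `days[29:]` in the plot follow from how `mode='valid'` works: a window of length `W` over `N` samples yields `N - W + 1` averages, and the first one lines up with index `W - 1`. A minimal sketch with made-up prices (the values and variable names are illustrative only):

```python
import numpy as np

# Six made-up prices and a 4-sample window (illustrative values only)
prices = np.array([10.0, 12.0, 11.0, 13.0, 15.0, 14.0])
window = 4

# 'valid' keeps only positions where the window fully overlaps the data
moving_avg = np.convolve(prices, np.ones(window) / window, mode="valid")

# x positions that align each average with the last sample in its window
x_positions = np.arange(window - 1, len(prices))

print(moving_avg)    # averages: 11.5, 12.75, 13.25
print(x_positions)   # positions: 3, 4, 5
```

Plotting `moving_avg` against `x_positions` keeps the smoothed curve aligned with the raw series instead of shifting it to the left.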

## 🤖 3. NumPy + Scikit-Learn: Machine Learning Pipelines

Scikit-learn uses NumPy arrays as its **primary data structure**. 
All models and transformers expect NumPy-like input (or objects convertible to it).

### Real-World Example: Predicting House Prices

In [ ]:
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Generate synthetic housing data
np.random.seed(0)
n_samples = 500
size = np.random.normal(1500, 300, n_samples)  # square feet
bedrooms = np.random.randint(1, 5, n_samples)
age = np.random.randint(0, 30, n_samples)

# Price is a linear combination + noise
price = 50_000 + (size * 200) + (bedrooms * 10_000) - (age * 1000) + np.random.normal(0, 10_000, n_samples)

# Create NumPy feature matrix and target vector
X = np.column_stack((size, bedrooms, age))
y = price

# Train/test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Fit model
model = LinearRegression().fit(X_train, y_train)
print("Coefficients:", model.coef_)
print("Intercept:", model.intercept_)
print("R² on test data:", model.score(X_test, y_test))

✅ All features (`X`) and targets (`y`) are **NumPy arrays**.

This tight integration allows seamless preprocessing and model training pipelines.
You can also pass the result of `df.to_numpy()` (or the older `df.values`) directly to ML models.
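`LinearRegression` fits an ordinary least-squares model, so the same kind of fit can be sketched in plain NumPy with `np.linalg.lstsq`. The snippet below is an illustration, not scikit-learn's implementation. It regenerates similar synthetic data with `np.random.default_rng`, so the numbers will not match the cell above exactly, and it appends a column of ones to carry the intercept. The recovered coefficients should land near the true values of 200, 10,000 and -1,000.

```python
import numpy as np

# Synthetic housing data, same recipe as above but with a different generator
rng = np.random.default_rng(0)
n_samples = 500
size = rng.normal(1500, 300, n_samples)       # square feet
bedrooms = rng.integers(1, 5, n_samples)      # 1 to 4 bedrooms
age = rng.integers(0, 30, n_samples)          # 0 to 29 years
price = (50_000 + size * 200 + bedrooms * 10_000 - age * 1000
         + rng.normal(0, 10_000, n_samples))

# Design matrix with a trailing column of ones for the intercept
A = np.column_stack((size, bedrooms, age, np.ones(n_samples)))

# Least-squares solution of A @ w ≈ price
w, residuals, rank, singular_values = np.linalg.lstsq(A, price, rcond=None)
coef, intercept = w[:3], w[3]

print("Coefficients:", coef)
print("Intercept:", intercept)
```

Seeing the fit as a plain matrix problem makes it clearer why scikit-learn wants a 2-D feature matrix and a 1-D target vector.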

## ⚡ 4. NumPy + Numba: Accelerating Computations

[Numba](https://numba.pydata.org/) uses **JIT compilation** to turn NumPy-based Python functions into optimized machine code.

This is ideal for accelerating inner loops or custom numerical kernels that can't be vectorized easily.

In [ ]:
from numba import njit
import time

# Plain NumPy-based function (slow loop)
def moving_average(arr, window):
    result = np.empty(len(arr) - window + 1)
    for i in range(len(result)):
        result[i] = np.mean(arr[i:i+window])
    return result

# JIT-compiled version
@njit
def moving_average_fast(arr, window):
    result = np.empty(len(arr) - window + 1)
    for i in range(len(result)):
        result[i] = np.mean(arr[i:i+window])
    return result

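# Benchmark (perf_counter is a monotonic, high-resolution timer)
data = np.random.random(1_000_000)
# Warm-up call: triggers JIT compilation so it is not counted in the timing below
moving_average_fast(data[:100], 50)
start = time.perf_counter(); moving_average(data, 50); print(f'Python: {time.perf_counter()-start:.3f}s')
start = time.perf_counter(); moving_average_fast(data, 50); print(f'Numba JIT: {time.perf_counter()-start:.3f}s')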

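✅ The **Numba-accelerated function** can be 10–100× faster depending on complexity, which makes it a good fit for scientific computing and finance simulations. The first call to a JIT-compiled function also pays a one-time compilation cost, so time a later call when benchmarking.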
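This particular moving average can also be vectorized, so it is worth comparing a JIT against pure NumPy before adding a dependency. The sketch below shows two vectorized variants and checks them against the loop. The function names are illustrative, and `sliding_window_view` assumes NumPy 1.20 or newer.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def moving_average_loop(arr, window):
    """Reference implementation: one np.mean call per window."""
    result = np.empty(len(arr) - window + 1)
    for i in range(len(result)):
        result[i] = np.mean(arr[i:i + window])
    return result

def moving_average_windows(arr, window):
    """Vectorized: mean over a strided (N - window + 1, window) view."""
    return sliding_window_view(arr, window).mean(axis=1)

def moving_average_cumsum(arr, window):
    """Vectorized: difference of cumulative sums, O(N) for any window size."""
    csum = np.cumsum(np.insert(arr, 0, 0.0))
    return (csum[window:] - csum[:-window]) / window

data = np.random.default_rng(1).random(10_000)
expected = moving_average_loop(data, 50)

print(np.allclose(moving_average_windows(data, 50), expected))
print(np.allclose(moving_average_cumsum(data, 50), expected))
```

The cumulative-sum variant does the least work per element but can accumulate rounding error on very long or very large-valued series. Numba remains the better choice for kernels with branching or state that has no such closed form.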

## 🔍 Under the Hood

All these libraries (pandas, scikit-learn, Numba, and others) build on NumPy's array protocols, such as the **array interface** (`__array_interface__`) and the Python buffer protocol, to access underlying memory buffers.

These protocols allow **zero-copy interoperability**: when dtype and memory layout already match, large datasets don't have to be copied between tools, which is a crucial performance advantage in large pipelines.
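To make the mechanism concrete, here is a small sketch of an object that exposes an existing buffer through `__array_interface__`. `BufferOwner` is a hypothetical class written for this illustration, not part of any library. NumPy wraps the exposed memory without copying it, and the same holds for a `memoryview` obtained through the buffer protocol.

```python
import numpy as np

class BufferOwner:
    """Hypothetical wrapper exposing an array's memory via the array interface."""

    def __init__(self, array):
        self._array = array  # keep the owning array alive

    @property
    def __array_interface__(self):
        # Dict describing shape, type string, strides and the data pointer
        return self._array.__array_interface__

original = np.arange(6, dtype=np.float64).reshape(2, 3)

# Inspect the metadata NumPy publishes about the buffer
iface = original.__array_interface__
print(iface["shape"], iface["typestr"])

# NumPy builds an array on top of the exposed memory instead of copying it
exported = np.asarray(BufferOwner(original))
print(np.shares_memory(original, exported))

# Writing through the new array is visible in the original
exported[0, 0] = 99.0
print(original[0, 0])

# The buffer protocol gives the same zero-copy behavior
through_buffer = np.asarray(memoryview(original))
print(np.shares_memory(original, through_buffer))
```

Because both arrays alias the same memory, an in-place change on one side is visible on the other, which is the price of avoiding the copy.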

## ⚠️ Best Practices & Pitfalls

**Best Practices:**
- Use `df.to_numpy()` instead of `values` (future-proof).
- Always ensure dtype consistency when passing arrays between libraries.
- For performance-critical loops, prefer JIT compilation (Numba) or vectorized NumPy functions.
- Avoid unnecessary DataFrame <-> NumPy conversions; work natively where possible.

**Pitfalls:**
- Copying large arrays between pandas, NumPy, and ML libraries can silently double memory usage.
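- Some estimators convert input to a different float dtype internally (for example, scikit-learn's tree-based models work on `float32`, and other estimators may upcast to `float64`), which creates a copy. Check each model's expectations.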
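The memory cost of a dtype mismatch is easy to observe with NumPy alone. In this sketch (array size and names are illustrative), requesting the dtype the array already has returns the same object, while requesting a different dtype allocates a second buffer that lives alongside the first.

```python
import numpy as np

features = np.ones((1_000, 100), dtype=np.float64)

# Same dtype requested: NumPy hands back the very same object, no copy
same = np.asarray(features, dtype=np.float64)

# Different dtype requested: a new buffer is allocated next to the original
converted = np.asarray(features, dtype=np.float32)

print(same is features)
print(np.shares_memory(features, converted))
print(features.nbytes, converted.nbytes)  # 800000 and 400000 bytes
```

Creating arrays in the dtype the downstream library expects avoids holding both versions in memory at once.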

## 🧩 Challenge Exercise

**Task:** Build a mini data pipeline:
1. Generate synthetic e-commerce sales data with NumPy.
2. Load it into pandas, compute rolling weekly averages, and convert back to NumPy.
3. Train a regression model (using scikit-learn) to predict next week's sales.
4. Use Numba to optimize one computational step.

🎯 *Goal:* Practice end-to-end interoperability and performance tuning in a realistic workflow.
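If you want a starting point for step 1, the sketch below generates a year of daily sales and frames "predict next week" as a supervised problem using NumPy only. The trend, noise level, and variable names are assumptions made up for illustration; steps 2 to 4 are left to you.

```python
import numpy as np

rng = np.random.default_rng(7)
n_weeks = 52

# Hypothetical daily sales: a slow upward trend plus noise, 7 days per week
days = np.arange(n_weeks * 7)
daily_sales = 200 + 0.3 * days + rng.normal(0, 20, size=days.size)

# Collapse to weekly totals by reshaping to (weeks, 7) and summing each row
weekly_sales = daily_sales.reshape(n_weeks, 7).sum(axis=1)

# Supervised framing: week t is the feature, week t + 1 is the target
X = weekly_sales[:-1].reshape(-1, 1)
y = weekly_sales[1:]

print(X.shape, y.shape)   # (51, 1) (51,)
```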

# --- End of Section 8 ---

Next up → **Advanced Capstone: Building a High-Performance Numerical Pipeline**, where we'll combine memory mapping, Numba, and linear algebra for large-scale analytics.