## Question:
You have a dataset with millions of records and need to perform efficient similarity search to find the most similar data points to a given query. How would you implement Approximate Nearest Neighbors (ANN) Search in Python using FAISS for high-dimensional data?

## Solution:
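FAISS (Facebook AI Similarity Search) is a library for nearest-neighbor search over dense vectors at large scale. Its approximate indexes avoid comparing the query against every stored vector, which makes them much faster than brute-force search on millions of records, at the cost of occasionally missing a true nearest neighbor.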

In [None]:
import numpy as np
import pandas as pd
import faiss
from numba import jit, prange

ImportError: Numba needs NumPy 2.1 or less. Got NumPy 2.2.

In [None]:
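# Generate a large synthetic dataset (10M rows, 100 features).
# In float32 this is about 4 GB; sampling float32 directly avoids
# an 8 GB float64 intermediate array.
rng = np.random.default_rng(42)
data = rng.random((10_000_000, 100), dtype=np.float32)
df = pd.DataFrame(data)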

In [None]:
@jit(nopython=True, parallel=True, fastmath=True)
def pearson_corr(X):
    """
    Compute the Pearson correlation matrix with Numba.
    :param X: NumPy array of shape (n_samples, n_features)
    :return: Correlation matrix of shape (n_features, n_features)
    """
    n, m = X.shape
    corr_matrix = np.zeros((m, m), dtype=np.float32)

    # Per-feature mean and population standard deviation, computed column by
    # column because Numba's nopython mode does not accept the axis argument
    # of np.mean and np.std
    means = np.empty(m, dtype=np.float64)
    stds = np.empty(m, dtype=np.float64)
    for k in prange(m):
        means[k] = np.mean(X[:, k])
        stds[k] = np.std(X[:, k])

    # Fill the upper triangle in parallel over i and mirror each entry
    for i in prange(m):
        for j in range(i, m):
            num = np.sum((X[:, i] - means[i]) * (X[:, j] - means[j]))
            denom = n * stds[i] * stds[j]  # np.std uses ddof=0, so scale by n
            corr_matrix[i, j] = num / denom if denom != 0 else 0.0
            corr_matrix[j, i] = corr_matrix[i, j]  # Symmetric matrix

    return corr_matrix
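One way to sanity-check a hand-written correlation routine is to compare it with a plain NumPy version on a small sample. The sketch below is an illustration under stated assumptions: `pearson_corr_numpy` is a hypothetical helper name, and the sample size is arbitrary. It standardizes the columns, takes a matrix product, and checks the result against `np.corrcoef`.

```python
import numpy as np

def pearson_corr_numpy(X):
    """Pearson correlation matrix via a standardized matrix product."""
    X = np.asarray(X, dtype=np.float64)
    n = X.shape[0]
    centered = X - X.mean(axis=0)
    stds = X.std(axis=0)  # population standard deviation (ddof=0)
    # Constant columns have zero spread; divide by 1 so they yield 0, not NaN
    safe_stds = np.where(stds == 0, 1.0, stds)
    Z = centered / safe_stds
    return (Z.T @ Z) / n

# Small random sample (hypothetical sizes) for the comparison
rng = np.random.default_rng(0)
sample = rng.random((1_000, 5))

reference = np.corrcoef(sample, rowvar=False)
matches = np.allclose(pearson_corr_numpy(sample), reference)
print(matches)
```

The same comparison can be applied to the output of `pearson_corr` on a slice of the full dataset, using a looser tolerance since that function stores its result in `float32`.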


In [3]:
# Convert DataFrame to NumPy and compute correlation matrix
correlation_matrix = pearson_corr(df.to_numpy())


NameError: name 'pearson_corr' is not defined

In [4]:
# Convert back to DataFrame if needed
correlation_df = pd.DataFrame(correlation_matrix, index=df.columns, columns=df.columns)


NameError: name 'correlation_matrix' is not defined

In [None]:
# Display first few rows of the correlation matrix
print(correlation_df.head())