# AI-Powered Prospectivity Mapping for Rare Earth Minerals

**Copyright (c) 2026 Shrikara Kaudambady. All rights reserved.**

This notebook demonstrates how to use a Random Forest machine learning model to create a mineral prospectivity map. We will use synthetic datasets representing various geological features to predict the likelihood of finding Rare Earth Element (REE) deposits.

### 1. Setup and Library Imports

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

sns.set(style='whitegrid', context='notebook')
```

### 2. Data Simulation

Real-world exploration data are often proprietary or hard to obtain, so here we generate synthetic data to simulate a simplified exploration scenario. We'll create three data layers plus a set of training labels:

1.  **Geology Layer:** Four rock types, coded 1 to 4 and assigned at random to each pixel.
2.  **Geophysics Layer:** A magnetic anomaly, which could indicate a mineral-rich intrusion.
3.  **Structure Layer:** A fault line, which can be a pathway for mineralizing fluids.
4.  **Known Deposits:** Locations of confirmed mineralization, which we will use for training.
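Because favorable ground may extend some distance either side of a fault trace, prospectivity studies often use *proximity* to structures as a feature rather than the fault raster itself. A minimal sketch, assuming SciPy is available, that converts a fault mask into a distance-to-fault layer with `scipy.ndimage.distance_transform_edt`; the helper `distance_to_fault` and its `threshold` of 0.5 are illustrative choices, not part of this notebook:

```python
import numpy as np
from scipy.ndimage import distance_transform_edt


def distance_to_fault(structure, threshold=0.5):
    """Euclidean distance (in pixels) from each pixel to the nearest fault pixel.

    `threshold` is a hypothetical cut-off that turns a noisy structure layer
    into a binary fault mask.
    """
    fault_mask = structure > threshold
    # distance_transform_edt gives each non-zero element its distance to the
    # nearest zero element, so pass the NON-fault mask: fault pixels are the
    # zeros and receive distance 0.
    return distance_transform_edt(~fault_mask)


# Toy 7x7 grid with a horizontal fault along row 3
toy = np.zeros((7, 7))
toy[3, :] = 1.0
dist = distance_to_fault(toy)
print(dist[:, 0])  # distances down the first column
```

Applied to this notebook, `distance_to_fault(structure_layer)` would yield a 500 × 500 layer that could be sampled alongside the other three as an extra feature column.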

```python
MAP_SIZE = 500
np.random.seed(42)

# Layer 1: Geological formations, four rock types (codes 1-4) drawn at random per pixel
geology_layer = np.random.randint(1, 5, size=(MAP_SIZE, MAP_SIZE))

# Layer 2: Geophysical anomaly (e.g., a magnetic high), a Gaussian centred at row 200, column 250
x, y = np.mgrid[0:MAP_SIZE, 0:MAP_SIZE]
geophysics_layer = 255 * np.exp(-((x - 200)**2 / 8000 + (y - 250)**2 / 8000))
geophysics_layer += np.random.normal(0, 10, size=(MAP_SIZE, MAP_SIZE))  # Add some noise

# Layer 3: Structural feature (e.g., a fault line)
structure_layer = np.zeros((MAP_SIZE, MAP_SIZE))
structure_layer[248:252, 100:400] = 1  # Horizontal fault segment: rows 248-251, columns 100-399
structure_layer = np.maximum(0, structure_layer + np.random.normal(0, 0.1, size=(MAP_SIZE, MAP_SIZE)))

# Known mineral deposits as (row, column) pixel indices, used for training.
# These are placed where the geophysics and structure layers are favorable.
known_deposits = np.array([
    [190, 260], [210, 240], [200, 250], [220, 255], [180, 245],  # Near the anomaly
    [250, 150], [250, 200], [250, 300], [250, 350]  # Along the fault
])
```
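The geophysical anomaly is a 2-D Gaussian, so its footprint can be worked out directly. Matching `exp(-r²/8000)` to the standard form `exp(-r²/(2σ²))` gives σ = √4000 ≈ 63.2 pixels, and the anomaly falls to half its peak at r = √(8000 ln 2) ≈ 74.5 pixels. The sketch below recomputes the noise-free anomaly at each known deposit; the helper `anomaly` is hypothetical, with its parameters copied from the cell above:

```python
import numpy as np

# Parameters copied from the notebook's geophysics layer:
# 255 * exp(-((x - 200)**2 / 8000 + (y - 250)**2 / 8000))
PEAK, CENTER, SCALE = 255.0, (200, 250), 8000.0


def anomaly(row, col):
    """Noise-free anomaly value at pixel (row, col)."""
    r2 = (row - CENTER[0]) ** 2 + (col - CENTER[1]) ** 2
    return PEAK * np.exp(-r2 / SCALE)


# exp(-r^2 / SCALE) is a Gaussian with 2 * sigma^2 = SCALE
sigma = np.sqrt(SCALE / 2)
# Radius where the value drops to PEAK / 2: r^2 / SCALE = ln 2
half_max_radius = np.sqrt(SCALE * np.log(2))

deposits = np.array([
    [190, 260], [210, 240], [200, 250], [220, 255], [180, 245],
    [250, 150], [250, 200], [250, 300], [250, 350],
])
for row, col in deposits:
    print(f"({row}, {col}): {anomaly(row, col):6.1f}")
```

The five deposits near the anomaly centre sit at roughly 240 to 255, while the two fault deposits at columns 150 and 350 see only about 53. For those, the model has to rely mainly on the structure layer.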

### 3. Feature Engineering

We will now build a training dataset by extracting the values of all three layers at:
1.  The locations of the `known_deposits` (positive samples, target = 1).
2.  Random locations at least 20 pixels from any known deposit (negative samples, target = 0).

```python
# Create positive samples at the known deposit locations
positive_samples = []
for row, col in known_deposits:
    positive_samples.append({
        'geo_val': geology_layer[row, col],
        'gph_val': geophysics_layer[row, col],
        'str_val': structure_layer[row, col],
        'target': 1
    })

# Create negative samples at random locations away from known deposits
negative_samples = []
num_negative = len(positive_samples) * 3  # 3 negatives per positive: a 1:3 ratio, not a balanced one
while len(negative_samples) < num_negative:
    row, col = np.random.randint(0, MAP_SIZE, size=2)
    # Reject candidates within 20 pixels of any known deposit
    is_far_enough = True
    for dep_row, dep_col in known_deposits:
        if np.sqrt((row - dep_row)**2 + (col - dep_col)**2) < 20:
            is_far_enough = False
            break
    if is_far_enough:
        negative_samples.append({
            'geo_val': geology_layer[row, col],
            'gph_val': geophysics_layer[row, col],
            'str_val': structure_layer[row, col],
            'target': 0
        })

# Combine into a single DataFrame
training_data = pd.DataFrame(positive_samples + negative_samples)

print(f"Created {len(positive_samples)} positive and {len(negative_samples)} negative samples.")
training_data.head()
```
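The rejection loop above checks one candidate at a time against every deposit in Python. A vectorized sketch of the same rule (reject anything closer than 20 pixels) using NumPy broadcasting; the function `sample_negatives` and its parameters are illustrative, and because it uses NumPy's `default_rng` generator it will not reproduce the notebook's exact samples:

```python
import numpy as np


def sample_negatives(deposits, n, map_size, min_dist=20.0, seed=0):
    """Draw n random (row, col) pixels at least `min_dist` from every deposit.

    Vectorized rejection sampling: draw a batch of candidates, keep those whose
    nearest-deposit distance is >= min_dist, and repeat until n are collected.
    """
    rng = np.random.default_rng(seed)
    deposits = np.asarray(deposits, dtype=float)
    kept, n_kept = [], 0
    while n_kept < n:
        cand = rng.integers(0, map_size, size=(2 * n, 2))
        # Pairwise distances, shape (n_candidates, n_deposits)
        d = np.linalg.norm(cand[:, None, :] - deposits[None, :, :], axis=-1)
        ok = cand[d.min(axis=1) >= min_dist]
        kept.append(ok)
        n_kept += len(ok)
    return np.concatenate(kept)[:n]


known_deposits = np.array([
    [190, 260], [210, 240], [200, 250], [220, 255], [180, 245],
    [250, 150], [250, 200], [250, 300], [250, 350],
])
neg = sample_negatives(known_deposits, n=27, map_size=500, seed=42)
print(neg.shape)
```

Feature values for the sampled pixels can then be read in one step with fancy indexing, e.g. `geology_layer[neg[:, 0], neg[:, 1]]`.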

### 4. Model Training

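We will now train a Random Forest classifier on our dataset. A Random Forest is an ensemble of decision trees, each grown on a bootstrap sample of the training data with a random subset of features considered at each split. It needs no feature scaling and outputs class probabilities, which we need for the prospectivity map. Note that it treats the geology codes as ordered numbers, and that with only 9 positive samples the evaluation metrics below are a rough indication at best.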
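Before reading the metrics, it helps to see how small the evaluation set is. The sketch below runs the same stratified 70/30 `train_test_split` as the next cell on placeholder arrays with the notebook's class counts (9 positives, 27 negatives); the dummy single feature is an assumption, since only the labels determine the split sizes:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder arrays with the notebook's class counts: 9 deposits, 27 negatives
y = np.array([1] * 9 + [0] * 27)
X = np.arange(len(y)).reshape(-1, 1)  # dummy feature; only y matters here

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)
print(f"train: {len(y_train)} samples, {y_train.sum()} positive")
print(f"test:  {len(y_test)} samples, {y_test.sum()} positive")
```

The test set ends up with 11 samples, only 3 of them positive, so a single misclassified sample shifts accuracy by 1/11 ≈ 0.09.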

```python
# Prepare data for scikit-learn
X = training_data[['geo_val', 'gph_val', 'str_val']]
y = training_data['target']

# Stratified 70/30 split: both sets keep the class proportions as closely as the small sample allows
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42, stratify=y)

# Initialize and train the classifier; class_weight='balanced' weights classes inversely to their frequency
rf_model = RandomForestClassifier(n_estimators=100, random_state=42, class_weight='balanced')
rf_model.fit(X_train, y_train)

# Evaluate the model
y_pred = rf_model.predict(X_test)
print(f"Model Accuracy: {accuracy_score(y_test, y_pred):.2f}")
print("\nConfusion Matrix:")
print(confusion_matrix(y_test, y_pred))
print("\nClassification Report:")
print(classification_report(y_test, y_pred, zero_division=0))
```
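With so few positives, a single train/test split gives a noisy estimate. A sketch of stratified k-fold cross-validation scored by ROC AUC, which, unlike accuracy, does not depend on a 0.5 decision threshold. The feature matrix here is random placeholder data with an artificial signal, not the notebook's layers; in the notebook you would pass the `X` and `y` defined in the cell above:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Hypothetical stand-in data with the notebook's shape: 9 positives, 27 negatives,
# three features. Positives get a shifted second feature so there is a signal to learn.
rng = np.random.default_rng(0)
y = np.array([1] * 9 + [0] * 27)
X = rng.normal(size=(len(y), 3))
X[y == 1, 1] += 2.0

# 3 folds -> each held-out fold contains 3 positives and 9 negatives
cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=42)
model = RandomForestClassifier(n_estimators=100, random_state=42, class_weight="balanced")
scores = cross_val_score(model, X, y, cv=cv, scoring="roc_auc")
print(f"ROC AUC per fold: {np.round(scores, 2)}; mean {scores.mean():.2f}")
```

Reporting the mean and spread across folds gives a better sense of how much the score depends on which few deposits happen to be held out.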

### 5. Prospectivity Map Generation

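Now for the exciting part. We apply our trained model to every pixel in the map area and take the predicted probability of the deposit class (target = 1) as the prospectivity value. For a Random Forest this is the average of the individual trees' class probabilities, so it is best read as a relative ranking of favorability rather than a calibrated probability.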

```python
# Create a feature set for the entire map (one row per pixel, row-major order)
map_features = pd.DataFrame({
    'geo_val': geology_layer.flatten(),
    'gph_val': geophysics_layer.flatten(),
    'str_val': structure_layer.flatten()
})

# Predict probabilities for the entire map; columns follow rf_model.classes_, so column 1 is target=1
prospectivity_probabilities = rf_model.predict_proba(map_features)[:, 1]

# Reshape the probabilities back into a 2D map
prospectivity_map = prospectivity_probabilities.reshape(MAP_SIZE, MAP_SIZE)
```
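For this 500 × 500 grid, predicting all 250,000 rows at once is fine, but real rasters can be far larger. A minimal sketch of chunked prediction with a hypothetical helper `predict_in_chunks`; it also looks up the deposit class in `model.classes_` instead of assuming it is column 1, as the `[:, 1]` above does:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier


def predict_in_chunks(model, features, chunk_size=50_000):
    """Return P(class 1) for each row of `features`, predicting slice by slice.

    Works for NumPy arrays and pandas DataFrames (row slicing); bounds peak
    memory when the feature table covers a large raster.
    """
    pos_col = list(model.classes_).index(1)  # column of the deposit class
    out = np.empty(len(features))
    for start in range(0, len(features), chunk_size):
        stop = start + chunk_size
        out[start:stop] = model.predict_proba(features[start:stop])[:, pos_col]
    return out


# Toy check on random data: chunked and one-shot predictions should agree
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = (X[:, 0] > 0).astype(int)
model = RandomForestClassifier(n_estimators=20, random_state=0).fit(X, y)

grid = rng.normal(size=(10 * 10, 3))          # a 10x10 "map", flattened row-major
probs = predict_in_chunks(model, grid, chunk_size=7)
prob_map = probs.reshape(10, 10)              # row-major reshape restores the grid
```

In the notebook this would be `predict_in_chunks(rf_model, map_features).reshape(MAP_SIZE, MAP_SIZE)`.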

### 6. Visualization

Let's visualize our input data layers and the final prospectivity map. The prospectivity map should highlight the areas our model thinks are most likely to contain mineral deposits.

```python
fig, axes = plt.subplots(2, 2, figsize=(14, 12))
fig.suptitle('Prospectivity Mapping for Rare Earth Minerals', fontsize=16)

# Plot Geology Layer
sns.heatmap(geology_layer, ax=axes[0, 0], cmap='viridis', cbar=False)
axes[0, 0].set_title('Layer 1: Geology')
axes[0, 0].set_xticks([])
axes[0, 0].set_yticks([])

# Plot Geophysics Layer
sns.heatmap(geophysics_layer, ax=axes[0, 1], cmap='inferno', cbar=False)
axes[0, 1].set_title('Layer 2: Geophysics (Anomaly)')
axes[0, 1].set_xticks([])
axes[0, 1].set_yticks([])

# Plot Structure Layer
sns.heatmap(structure_layer, ax=axes[1, 0], cmap='binary', cbar=False)
axes[1, 0].set_title('Layer 3: Structure (Fault)')
axes[1, 0].set_xticks([])
axes[1, 0].set_yticks([])

# Plot Final Prospectivity Map; deposits are (row, col), so column goes on x and row on y
im = axes[1, 1].imshow(prospectivity_map, cmap='hot_r', interpolation='nearest')
axes[1, 1].scatter(known_deposits[:, 1], known_deposits[:, 0], marker='*', color='cyan', s=100, label='Known Deposits')
axes[1, 1].set_title('Prospectivity Map')
axes[1, 1].legend()
axes[1, 1].set_xticks([])
axes[1, 1].set_yticks([])
fig.colorbar(im, ax=axes[1, 1], fraction=0.046, pad=0.04).set_label('Probability of Mineralization')

plt.tight_layout(rect=[0, 0.03, 1, 0.95])
plt.show()
```
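Visual inspection is subjective. A common quantitative check in prospectivity mapping is a success-rate curve: rank pixels from most to least prospective and plot the fraction of known deposits captured against the fraction of map area. A curve that rises steeply near zero area means the deposits fall in the highest-ranked ground. A minimal sketch with a hypothetical helper `success_rate_curve`, checked on a toy map:

```python
import numpy as np


def success_rate_curve(prob_map, deposits):
    """Return (area_fraction, deposits_captured) points for a success-rate curve.

    For each deposit, area_fraction is the share of map pixels scoring at least
    as high as the deposit's pixel (ties count as ranked above it, which is
    conservative). Sorting those fractions gives the curve's x values.
    """
    scores = prob_map.ravel()
    dep_scores = prob_map[deposits[:, 0], deposits[:, 1]]  # (row, col) indexing
    area_frac = np.sort([(scores >= s).mean() for s in dep_scores])
    captured = np.arange(1, len(area_frac) + 1) / len(area_frac)
    return area_frac, captured


# Toy check: a 10x10 map where only 5 pixels (5% of the area) are "hot"
toy_map = np.zeros((10, 10))
toy_map[0, :5] = 1.0
toy_deposits = np.array([[0, 1], [0, 3], [9, 9]])
area, captured = success_rate_curve(toy_map, toy_deposits)
print(area, captured)
```

In the notebook, `success_rate_curve(prospectivity_map, known_deposits)` followed by `plt.step(area, captured, where='post')` would draw the curve. Because these deposits were also used for training, the curve measures fit; deposits held out from training would be needed to measure prediction.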

### 7. Conclusion

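If the model has learned the patterns in the training data, the highest probabilities (hottest colors) in the final map will concentrate around the geophysical anomaly and along the fault line, which is where we placed our synthetic known deposits. This agreement is expected by construction, since the deposits were placed on favorable features, and the geology layer is random per pixel and carries no real signal. The map therefore confirms that the workflow runs end to end rather than measuring predictive skill on unseen deposits.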

This notebook provides a complete, albeit simplified, template for applying machine learning to mineral exploration. By replacing the synthetic layers with real, co-registered geospatial datasets and validating against deposits held out from training, exploration teams can build data-driven tools to help prioritize where to search for resources such as rare earth elements.