<a href="https://colab.research.google.com/github/fjadidi2001/Cyber-Attack-Detection/blob/main/SatelliteImageEnvironment.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>


```markdown
# Vegetation Monitoring Workflow for Remote Sensing Satellite Images

## Dataset Adaptation
**New Dataset**: `umeradnaan/remote-sensing-satellite-images` from Kaggle
- **Structure**: Directory-based organization (likely class folders)
- **Content**: Satellite imagery with land cover classes
- **Key Changes**:
  - No captions → Focus on class labels from directory structure
  - Requires different loading approach
  - Likely contains multiple land cover classes beyond vegetation

## Revised Technical Workflow

### Phase 1: Dataset Setup & Exploration
```python
import os
import cv2
import kagglehub  # needed for dataset_download below
import numpy as np
from sklearn.model_selection import train_test_split

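# Load dataset from directory: one sub-folder per class
def load_dataset(path):
    # Sort for a stable class-index mapping; skip stray files at the top level
    classes = sorted(
        d for d in os.listdir(path) if os.path.isdir(os.path.join(path, d))
    )
    images = []
    labels = []

    for class_idx, class_name in enumerate(classes):
        class_path = os.path.join(path, class_name)
        for img_file in sorted(os.listdir(class_path)):
            # Match extensions case-insensitively (.JPG, .PNG, ...)
            if img_file.lower().endswith(('.png', '.jpg', '.jpeg')):
                images.append(os.path.join(class_path, img_file))
                labels.append(class_idx)

    return images, labels, classes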

# Path from Kaggle download
path = kagglehub.dataset_download("umeradnaan/remote-sensing-satellite-images")
image_paths, labels, class_names = load_dataset(path)

# Split dataset (stratified so each class keeps its share in both splits)
X_train, X_test, y_train, y_test = train_test_split(
    image_paths, labels, test_size=0.2, random_state=42, stratify=labels
)
```
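
Before training, it can help to check how many images each class contributes, since a skewed distribution affects both the split and the classifier. The following is a minimal sketch, assuming `labels` and `class_names` have the shape returned by `load_dataset`; the helper name `class_distribution` and the sample values are hypothetical and not taken from the Kaggle dataset.

```python
from collections import Counter

def class_distribution(labels, class_names):
    """Return {class_name: image_count} for integer-encoded labels."""
    counts = Counter(labels)
    return {name: counts.get(idx, 0) for idx, name in enumerate(class_names)}

# Illustrative values only (hypothetical class names)
example_labels = [0, 0, 1, 2, 2, 2]
example_classes = ["class_a", "class_b", "class_c"]
print(class_distribution(example_labels, example_classes))
```

A strongly uneven result would be one reason to keep `class_weight='balanced'` in the classifier used later.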

### Phase 2: Vegetation-Focused Preprocessing
**Key Adjustments**:
1. **Standardize image sizes** (critical for satellite imagery)
2. **Atmospheric correction** (simplified)
3. **Shadow reduction** (common in satellite images)

```python
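def preprocess_image(img_path, target_size=(256, 256)):
    img = cv2.imread(img_path)
    if img is None:
        # cv2.imread returns None instead of raising on unreadable files
        raise ValueError(f"Could not read image: {img_path}")
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    img = cv2.resize(img, target_size)

    # Simplified atmospheric correction: CLAHE on the lightness channel
    # (a contrast normalization, not a physical atmospheric model)
    lab = cv2.cvtColor(img, cv2.COLOR_RGB2LAB)
    l, a, b = cv2.split(lab)
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8,8))
    l = clahe.apply(l)
    lab = cv2.merge((l,a,b))
    img = cv2.cvtColor(lab, cv2.COLOR_LAB2RGB)

    # Shadow reduction: equalize the value channel to lift dark regions
    hsv = cv2.cvtColor(img, cv2.COLOR_RGB2HSV)
    hsv[:,:,2] = cv2.equalizeHist(hsv[:,:,2])
    return cv2.cvtColor(hsv, cv2.COLOR_HSV2RGB)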
```

### Phase 3: Vegetation Feature Extraction
**Enhanced Techniques**:
1. **Advanced Vegetation Indices**:
   ```python
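   def calculate_vegetation_indices(rgb):
       # Cast to float first: uint8 arithmetic would wrap around in g + r - b
       rgb = rgb.astype(np.float32)
       r, g, b = rgb[:,:,0], rgb[:,:,1], rgb[:,:,2]
       with np.errstate(divide='ignore', invalid='ignore'):
           # Modified Visible Vegetation Index (MVVI)
           mvvi = (g - 1.3*r) / (g + r - b)
           # Zero out NaN/inf from zero denominators so they cannot dominate the mean
           mvvi = np.nan_to_num(mvvi, nan=0.0, posinf=0.0, neginf=0.0)

           # Triangular Greenness Index (TGI)
           tgi = g - 0.39*r - 0.61*b
       return mvvi, tgi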
   ```
2. **Multiscale Texture Analysis**:
   - GLCM at multiple distances (1,3,5 pixels)
   - Rotation-invariant LBP

### Phase 4: Vegetation Classification & Segmentation
**Revised Approach**:
```python
from skimage.feature import graycomatrix, graycoprops  # used for GLCM texture below
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Feature extraction pipeline
def extract_features(images):
    features = []
    for img in images:
        preprocessed = preprocess_image(img)
        mvvi, tgi = calculate_vegetation_indices(preprocessed)
        
        # Color features
        color_features = np.concatenate((
            np.mean(preprocessed, axis=(0,1)),
            np.std(preprocessed, axis=(0,1))
        ))
        
        # Texture features (GLCM example)
        gray = cv2.cvtColor(preprocessed, cv2.COLOR_RGB2GRAY)
        glcm = graycomatrix(gray, distances=[1], angles=[0], symmetric=True, normed=True)
        contrast = graycoprops(glcm, 'contrast')[0,0]
        
        features.append(np.hstack([color_features, mvvi.mean(), tgi.mean(), contrast]))
    return np.array(features)

# Train classifier
train_features = extract_features(X_train)
classifier = make_pipeline(
    StandardScaler(),
    RandomForestClassifier(n_estimators=100, class_weight='balanced', random_state=42)
)
classifier.fit(train_features, y_train)
```

### Phase 5: Vegetation Health Assessment
**Key Metrics**:
1. **Vegetation Vigor Index**:
   ```python
   def calculate_vigor(img):
       _, tgi = calculate_vegetation_indices(img)
       return np.percentile(tgi, 75)  # Upper quartile: less sensitive to extreme pixels than the maximum
   ```
2. **Stress Detection**:
   - Color clustering in CIELAB space
   - Brown/Yellow pixel ratio

### Phase 6: Temporal Analysis (If Multiple Timestamps)
**Implementation Strategy**:
1. Organize images by location ID
2. Compute vegetation index time series
3. Use change vector analysis:
   ```python
   def detect_change(img1, img2):
       mvvi1, _ = calculate_vegetation_indices(img1)
       mvvi2, _ = calculate_vegetation_indices(img2)
       change = mvvi2 - mvvi1
       return np.abs(change) > 0.2  # Empirical threshold
   ```
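
Steps 1 and 2 above (grouping by location and building an index time series) might look like the following. This is a sketch under assumptions: the dataset is not known to carry location IDs or timestamps, so the `(location_id, timestamp, image)` record format and both function names are hypothetical. TGI uses the same coefficients as in Phase 3.

```python
import numpy as np

def mean_tgi(rgb):
    """Mean Triangular Greenness Index of an RGB image."""
    rgb = rgb.astype(np.float32)  # avoid uint8 wrap-around
    r, g, b = rgb[:, :, 0], rgb[:, :, 1], rgb[:, :, 2]
    return float(np.mean(g - 0.39 * r - 0.61 * b))

def index_time_series(records):
    """Group (location_id, timestamp, image) records into per-location series.

    Returns {location_id: [(timestamp, mean_tgi), ...]} sorted by timestamp.
    """
    series = {}
    for location_id, timestamp, image in records:
        series.setdefault(location_id, []).append((timestamp, mean_tgi(image)))
    for values in series.values():
        values.sort(key=lambda item: item[0])
    return series
```

Timestamps only need to be sortable, so ISO date strings such as `"2024-06-01"` work as well as `datetime` objects.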

### Phase 7: Validation
**New Approach**:
```python
from sklearn.metrics import accuracy_score, f1_score

# Test evaluation
test_features = extract_features(X_test)
y_pred = classifier.predict(test_features)

print("Accuracy:", accuracy_score(y_test, y_pred))
print("F1 Score:", f1_score(y_test, y_pred, average='weighted'))

# Vegetation-specific evaluation (only meaningful if a class name contains 'vegetation')
vegetation_indices = [i for i, name in enumerate(class_names) if 'vegetation' in name.lower()]
vegetation_mask = np.isin(y_test, vegetation_indices)
if vegetation_mask.any():
    print("Vegetation Classification Accuracy:",
          accuracy_score(np.array(y_test)[vegetation_mask],
                         np.array(y_pred)[vegetation_mask]))
else:
    print("No test samples belong to a vegetation class")
```

## Technical Adjustments for Satellite Imagery

1. **Optimal Feature Set**:
```python
FEATURE_SET = [
    'mean_R', 'mean_G', 'mean_B',
    'std_R', 'std_G', 'std_B',
    'MVVI_mean', 'TGI_mean',
    'GLCM_contrast', 'GLCM_dissimilarity',
    'vegetation_cover_ratio'
]
```

2. **Vegetation-Specific Processing**:
```python
def create_vegetation_mask(img):
    hsv = cv2.cvtColor(img, cv2.COLOR_RGB2HSV)
    # Green color range in HSV
    lower_green = np.array([35, 40, 40])
    upper_green = np.array([85, 255, 255])
    return cv2.inRange(hsv, lower_green, upper_green)
```
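
`FEATURE_SET` lists `vegetation_cover_ratio`, which `extract_features` does not compute. One simple way to derive it from the binary mask returned by `create_vegetation_mask` is sketched below; the function name is hypothetical, and the mask is assumed to mark vegetation with any nonzero value.

```python
import numpy as np

def vegetation_cover_ratio(mask):
    """Fraction of pixels flagged as vegetation in a binary mask (nonzero = vegetation)."""
    if mask.size == 0:
        return 0.0
    return np.count_nonzero(mask) / mask.size

# Example: one vegetation pixel out of four
example_mask = np.array([[255, 0], [0, 0]], dtype=np.uint8)
print(vegetation_cover_ratio(example_mask))  # 0.25
```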

3. **Performance Optimization**:
- Implement image tiling for large satellite images
- Use OpenCV `UMat` (the OpenCL-backed Transparent API) to offload supported operations to a GPU where one is available
- Apply pyramid downsampling for initial exploration
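
Image tiling, the first item above, can be sketched as a small generator. The name `iter_tiles` and the default tile size of 256 (chosen to match the `target_size` used in preprocessing) are assumptions; overlapping tiles are possible by passing a `stride` smaller than `tile_size`.

```python
import numpy as np

def iter_tiles(image, tile_size=256, stride=None):
    """Yield (row, col, tile) for tiles covering `image`; edge tiles may be smaller."""
    stride = stride or tile_size
    height, width = image.shape[:2]
    for row in range(0, height, stride):
        for col in range(0, width, stride):
            # NumPy slicing clips at the border, so no explicit bounds check is needed
            yield row, col, image[row:row + tile_size, col:col + tile_size]

# Example: a 600 x 500 image yields a 3 x 2 grid of tiles
large = np.zeros((600, 500, 3), dtype=np.uint8)
print(sum(1 for _ in iter_tiles(large)))  # 6
```

The tiles are views into the original array, so iterating does not copy pixel data.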

## Revised Implementation Stack

```python
# Core libraries
import cv2          # OpenCV 4.x
import numpy as np
from skimage.feature import graycomatrix, graycoprops

# Machine learning
from sklearn.ensemble import RandomForestClassifier
from sklearn.cluster import KMeans

# Parallel processing
from joblib import Parallel, delayed

# Visualization
import matplotlib.pyplot as plt
import seaborn as sns
```

## Critical Success Factors

1. **Class Mapping Strategy**:
```python
VEGETATION_CLASSES = {
    'forest': ['evergreen_forest', 'deciduous_forest'],
    'agriculture': ['crops', 'farmland'],
    'grassland': ['meadow', 'pasture'],
    'stressed': ['dried_vegetation', 'burnt_areas']
}
```
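
To apply this mapping to directory-derived class names, the dictionary can be inverted into a lookup from raw class name to vegetation group. This is a sketch only: the helper names and the `"non_vegetation"` fallback are hypothetical, and the class names are the illustrative ones from the mapping above, which may not match the actual dataset folders.

```python
VEGETATION_CLASSES = {
    'forest': ['evergreen_forest', 'deciduous_forest'],
    'agriculture': ['crops', 'farmland'],
    'grassland': ['meadow', 'pasture'],
    'stressed': ['dried_vegetation', 'burnt_areas'],
}

def build_class_lookup(groups):
    """Invert {group: [class_name, ...]} into {class_name: group}."""
    return {name: group for group, names in groups.items() for name in names}

def to_group(class_name, lookup, default="non_vegetation"):
    """Map a raw class name to its group, falling back to `default`."""
    return lookup.get(class_name.lower(), default)

lookup = build_class_lookup(VEGETATION_CLASSES)
print(to_group("Crops", lookup))   # agriculture
print(to_group("water", lookup))   # non_vegetation
```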

2. **Validation Approach**:
- Stratified sampling by land cover class
- Visual validation against NDVI computed from a near-infrared source (if other data sources are available; RGB imagery alone cannot provide NDVI)
- Confusion matrix analysis per vegetation type

3. **Performance Targets**:
- >85% accuracy for vegetation vs non-vegetation
- >75% F1-score for vegetation sub-types
- <15% false positives in change detection
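
The confusion matrix analysis and the per-type F1 target above can be checked with a few lines of NumPy. The sketch assumes integer-encoded labels like those produced by `load_dataset`; the function names are hypothetical, and scikit-learn's `confusion_matrix` and `f1_score` would serve the same purpose.

```python
import numpy as np

def confusion_matrix_counts(y_true, y_pred, n_classes):
    """Rows are true classes, columns are predicted classes."""
    matrix = np.zeros((n_classes, n_classes), dtype=np.int64)
    np.add.at(matrix, (np.asarray(y_true), np.asarray(y_pred)), 1)
    return matrix

def per_class_f1(matrix):
    """F1 per class as 2TP / (2TP + FP + FN); NaN for classes never seen or predicted."""
    tp = np.diag(matrix).astype(float)
    fp = matrix.sum(axis=0) - tp
    fn = matrix.sum(axis=1) - tp
    denom = 2 * tp + fp + fn
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(denom > 0, 2 * tp / denom, np.nan)

# Illustrative labels only
cm = confusion_matrix_counts([0, 0, 1, 1, 2], [0, 1, 1, 1, 0], n_classes=3)
print(cm)
print(per_class_f1(cm))
```

Comparing each vegetation class's F1 against the 75% target shows which sub-types are being confused, which an overall accuracy figure hides.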

## Execution Plan

1. **Phase 1** (2 days): Dataset organization and exploratory analysis
2. **Phase 2-3** (3 days): Feature engineering pipeline
3. **Phase 4** (2 days): Classifier training and optimization
4. **Phase 5-6** (3 days): Health assessment and temporal analysis
5. **Phase 7** (2 days): Validation and reporting

## Key Advantages of Revised Workflow
1. Handles directory-based dataset organization
2. Reduces haze and uneven illumination through contrast normalization (CLAHE)
3. Implements visible-band (RGB) vegetation indices suited to imagery without a near-infrared channel
4. Includes class imbalance handling
5. Optimized for medium-resolution satellite data
6. Optional GPU acceleration through OpenCV's OpenCL support (`UMat`)
```

This revised workflow:
1. Adapts to the Kaggle dataset's directory structure
2. Uses satellite-specific preprocessing techniques
3. Implements robust vegetation indices for RGB imagery
4. Includes class imbalance handling strategies
5. Optimizes for medium-resolution satellite data
6. Provides clear validation metrics
7. Adds temporal analysis capabilities
8. Includes GPU acceleration options

The workflow maintains focus on classical computer vision while addressing the unique characteristics of satellite imagery and directory-based dataset organization.

In [1]:
import kagglehub

# Download latest version
path = kagglehub.dataset_download("mahmoudreda55/satellite-image-classification")

print("Path to dataset files:", path)

Path to dataset files: /kaggle/input/satellite-image-classification
