<a href="https://colab.research.google.com/github/fjadidi2001/Cyber-Attack-Detection/blob/main/SatelliteImageEnvironment.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Vegetation Monitoring with Sentinel-2 RGB Dataset Using Classical Computer Vision

## Project Overview
Develop a vegetation monitoring system using the Sentinel-2 RGB captioned dataset with classical computer vision techniques to analyze vegetation cover, health, and changes over time.

## Dataset Information
**Dataset**: `sshh12/sentinel-2-rgb-captioned` from Hugging Face
- **Content**: Pre-processed Sentinel-2 RGB images with captions
- **Format**: RGB images (Red, Green, Blue bands)
- **Advantages**: Clean, pre-processed data with descriptive captions
- **Focus**: Vegetation analysis using the visible spectrum only (there is no near-infrared band, so NIR-based indices such as NDVI cannot be computed)

## Detailed Workflow

### Phase 1: Dataset Setup & Exploration ( 1)
**Objectives:**
- Load and explore the Sentinel-2 RGB dataset
- Understand data structure and captions
- Set up vegetation monitoring framework

**Tasks:**
1. **Dataset Loading**
   ```python
   from datasets import load_dataset
   ds = load_dataset("sshh12/sentinel-2-rgb-captioned")
   ```
2. **Data Exploration**
   - Analyze image dimensions and RGB channel distributions
   - Study caption content for vegetation-related keywords
   - Create sample visualizations of different vegetation types
3. **Environment Setup**
   - Install libraries: OpenCV, scikit-image, matplotlib, pandas, numpy
   - Set up project structure for vegetation analysis
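
The exploration step can be sketched with NumPy and the standard library. This is a minimal sketch that assumes each image has already been converted to an H x W x 3 array (for a PIL image, `np.asarray(img)`). `VEGETATION_KEYWORDS` is a hypothetical starter list, and which dataset column holds the captions should be checked against the loaded dataset rather than assumed.

```python
import re

import numpy as np

# Hypothetical starter list; extend it after reading a sample of real captions.
VEGETATION_KEYWORDS = ("forest", "trees", "vegetation", "grass", "field", "crops", "farmland")

def channel_stats(image):
    """Mean and standard deviation of each channel of an H x W x 3 RGB array."""
    arr = np.asarray(image, dtype=np.float64)
    if arr.ndim != 3 or arr.shape[2] != 3:
        raise ValueError(f"expected an H x W x 3 array, got shape {arr.shape}")
    return {
        name: {"mean": float(arr[..., i].mean()), "std": float(arr[..., i].std())}
        for i, name in enumerate(("R", "G", "B"))
    }

def keyword_counts(captions, keywords=VEGETATION_KEYWORDS):
    """Number of captions containing each keyword as a whole word (case-insensitive)."""
    counts = {kw: 0 for kw in keywords}
    for caption in captions:
        # Whole-word matching avoids false hits such as "tree" inside "street".
        words = set(re.findall(r"[a-z]+", caption.lower()))
        for kw in keywords:
            if kw in words:
                counts[kw] += 1
    return counts

# Usage on synthetic data (no download needed).
patch = np.zeros((4, 4, 3), dtype=np.uint8)
patch[..., 1] = 100  # a uniformly green patch
print(channel_stats(patch)["G"])  # {'mean': 100.0, 'std': 0.0}
print(keyword_counts(["Dense forest with trees", "Urban street grid"]))
```

Running `channel_stats` over a sample of images and collecting the results into a `pandas.DataFrame` gives the per-channel distributions the exploration step asks for.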

### Phase 2: Vegetation-Focused Preprocessing ( 2)
**Objectives:**
- Enhance RGB images for vegetation analysis
- Extract vegetation-specific features from limited spectral bands

**Classical CV Techniques:**
1. **RGB Enhancement for Vegetation**
   - Histogram equalization on individual channels
   - Contrast Limited Adaptive Histogram Equalization (CLAHE)
   - Color space conversions (RGB → HSV, RGB → LAB)

2. **Vegetation Index Approximation**
   - **Visible Atmospherically Resistant Index (VARI)**: (Green - Red) / (Green + Red - Blue)
   - **Green Leaf Index (GLI)**: (2×Green - Red - Blue) / (2×Green + Red + Blue)
   - **Red-Green Ratio**: Red/Green for vegetation stress detection

3. **Color-Based Vegetation Enhancement**
   - Green channel enhancement
   - Color thresholding for vegetation masking
   - HSV-based vegetation extraction
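
The index formulas above translate directly into NumPy. This is a sketch, not a reference implementation: the `EPS` guard, the function names, and the `gli_threshold` default used for masking are illustrative assumptions that would need tuning on real imagery.

```python
import numpy as np

EPS = 1e-6  # small constant that keeps denominators away from zero

def _channels(image):
    # Cast to float first: uint8 arithmetic would silently wrap around.
    arr = np.asarray(image, dtype=np.float64)
    return arr[..., 0], arr[..., 1], arr[..., 2]

def vari(image):
    """Visible Atmospherically Resistant Index: (G - R) / (G + R - B)."""
    r, g, b = _channels(image)
    denom = g + r - b
    # G + R - B can be zero or negative, so VARI is unbounded; replace near-zero denominators.
    denom = np.where(np.abs(denom) < EPS, EPS, denom)
    return (g - r) / denom

def gli(image):
    """Green Leaf Index: (2G - R - B) / (2G + R + B), within [-1, 1] for non-negative inputs."""
    r, g, b = _channels(image)
    return (2 * g - r - b) / (2 * g + r + b + EPS)

def red_green_ratio(image):
    """Red / Green; values above 1 mean red outweighs green (possible stress)."""
    r, g, _ = _channels(image)
    return r / (g + EPS)

def green_dominance(image):
    """Green / (R + G + B), the green share of total brightness."""
    r, g, b = _channels(image)
    return g / (r + g + b + EPS)

def vegetation_mask(image, gli_threshold=0.05):
    """Boolean vegetation mask from a GLI threshold (the threshold is an illustrative guess)."""
    return gli(image) > gli_threshold

pixels = np.array([[[60, 120, 40], [120, 60, 40]]], dtype=np.uint8)
print(vari(pixels))             # first pixel about 0.43, second about -0.43
print(vegetation_mask(pixels))  # only the first (green) pixel is flagged
```

If CLAHE or per-channel histogram equalization is applied first, the index values shift, so it is worth computing indices on both raw and enhanced images and comparing.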

### Phase 3: Vegetation Feature Extraction ( 3)
**Objectives:**
- Extract vegetation-specific features from RGB imagery

**Classical CV Techniques:**
1. **Color-Based Vegetation Features**
   - **Green Dominance Analysis**: Quantify green pixel distribution
   - **Color Moment Analysis**: Mean, variance, skewness of each channel
   - **Color Histogram Features**: Vegetation-specific color patterns

2. **Texture Analysis for Vegetation**
   - **GLCM on Green Channel**: Vegetation texture characterization
   - **Local Binary Patterns (LBP)**: Forest vs. grassland texture differentiation
   - **Gabor Filters**: Directional texture analysis for crop patterns

3. **Morphological Features**
   - **Vegetation Boundary Detection**: Canny edge detection on green-enhanced images
   - **Shape Analysis**: Contour analysis for vegetation patches
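   - **Canopy Structure**: Morphological operations to delineate canopy clusters (at Sentinel-2's best RGB resolution of 10 m, individual tree crowns are usually not resolvable)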

4. **Spatial Vegetation Patterns**
   - **Vegetation Density Maps**: Green pixel density analysis
   - **Patch Size Distribution**: Connected component analysis
   - **Fragmentation Metrics**: Edge-to-area ratios
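
A sketch of the color-moment and spatial-pattern features using only NumPy and the standard library. It assumes a boolean vegetation mask already exists (for example from a GLI threshold). The 4-connected flood-fill labelling is a plain stand-in for library routines such as `skimage.measure.label`, and the perimeter here counts pixel sides bordering non-vegetation or the image edge, which is only one of several possible perimeter definitions.

```python
from collections import deque

import numpy as np

def color_moments(image):
    """Mean, variance and skewness of each RGB channel (9 values, grouped by moment)."""
    arr = np.asarray(image, dtype=np.float64).reshape(-1, 3)
    mean = arr.mean(axis=0)
    var = arr.var(axis=0)
    std = np.sqrt(var)
    third = ((arr - mean) ** 3).mean(axis=0)
    # Skewness is undefined for a constant channel; report 0 there.
    skew = np.divide(third, std ** 3, out=np.zeros(3), where=std > 0)
    return np.concatenate([mean, var, skew])

def label_patches(mask):
    """4-connected component labelling of a boolean mask; returns (labels, n_patches)."""
    mask = np.asarray(mask, dtype=bool)
    labels = np.zeros(mask.shape, dtype=np.int32)
    h, w = mask.shape
    n = 0
    for y in range(h):
        for x in range(w):
            if mask[y, x] and labels[y, x] == 0:
                n += 1
                labels[y, x] = n
                queue = deque([(y, x)])
                while queue:
                    cy, cx = queue.popleft()
                    for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                        if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and labels[ny, nx] == 0:
                            labels[ny, nx] = n
                            queue.append((ny, nx))
    return labels, n

def patch_metrics(mask):
    """Vegetation density, patch area distribution and edge-to-area ratio."""
    mask = np.asarray(mask, dtype=bool)
    labels, n = label_patches(mask)
    areas = np.bincount(labels.ravel(), minlength=n + 1)[1:]  # drop background label 0
    # Count pixel sides whose neighbour (up, down, left, right) is not vegetation.
    padded = np.pad(mask, 1, constant_values=False)
    core = padded[1:-1, 1:-1]
    neighbours = (padded[:-2, 1:-1], padded[2:, 1:-1], padded[1:-1, :-2], padded[1:-1, 2:])
    perimeter = sum(int(np.sum(core & ~nb)) for nb in neighbours)
    total_area = int(areas.sum())
    return {
        "density": float(mask.mean()),
        "patch_areas": areas.tolist(),
        "edge_to_area": perimeter / total_area if total_area else 0.0,
    }

mask = np.array([[1, 1, 0, 0],
                 [1, 1, 0, 1],
                 [0, 0, 0, 1]], dtype=bool)
print(patch_metrics(mask))  # two patches: a 2x2 block and a 2-pixel strip
```

Higher edge-to-area values indicate more fragmented vegetation: many small or elongated patches expose more boundary per unit of area than one compact block.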

### Phase 4: Vegetation Classification & Segmentation ( 4)
**Objectives:**
- Classify different vegetation types and health conditions

**Vegetation Categories:**
- Dense Forest
- Sparse Forest/Woodland
- Grassland/Shrubland
- Agricultural Crops
- Stressed/Unhealthy Vegetation
- Non-Vegetation (Urban, Water, Bare Soil)

**Classical CV Techniques:**
1. **Color-Based Segmentation**
   - **K-means Clustering**: Separate vegetation types by color characteristics
   - **HSV Thresholding**: Isolate healthy green vegetation
   - **Watershed Segmentation**: Separate individual vegetation patches

2. **Machine Learning Classification**
   - **Support Vector Machine (SVM)**: Multi-class vegetation classification
   - **Random Forest**: Combine multiple vegetation features
   - **Decision Trees**: Interpretable vegetation health assessment

3. **Rule-Based Classification**
   - **Vegetation Index Thresholding**: VARI and GLI-based classification
   - **Color Rule Sets**: IF-THEN rules for vegetation types
   - **Multi-criteria Decision**: Combine color, texture, and shape features
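
The rule-based route can be sketched as a chain of IF-THEN rules over per-image summaries. The `ImageSummary` fields and every threshold below are hypothetical placeholders chosen for illustration, not calibrated values. Agricultural Crops is left out because separating it from grassland likely needs texture or shape cues (regular field patterns) rather than color alone.

```python
from dataclasses import dataclass

@dataclass
class ImageSummary:
    mean_gli: float       # mean Green Leaf Index over the image
    veg_fraction: float   # share of pixels flagged as vegetation, 0..1
    mean_rg_ratio: float  # mean Red/Green ratio; > 1 hints at yellowing/browning

def classify_by_rules(s: ImageSummary) -> str:
    """First matching IF-THEN rule wins; all thresholds are illustrative."""
    if s.veg_fraction < 0.2:
        return "Non-Vegetation"
    if s.mean_rg_ratio > 1.0:
        return "Stressed/Unhealthy Vegetation"
    if s.veg_fraction > 0.7 and s.mean_gli > 0.15:
        return "Dense Forest"
    if s.veg_fraction > 0.4:
        return "Sparse Forest/Woodland"
    return "Grassland/Shrubland"

print(classify_by_rules(ImageSummary(mean_gli=0.3, veg_fraction=0.9, mean_rg_ratio=0.6)))  # Dense Forest
```

Rule order matters: the non-vegetation and stress checks run first so that a sparsely green or browning scene is never labelled as forest. The same summaries can also serve as input features for the SVM or Random Forest classifiers.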

### Phase 5: Vegetation Health Assessment ( 5)
**Objectives:**
- Assess vegetation health and stress conditions

**Classical CV Approaches:**
1. **Health Indicators from RGB**
   - **Greenness Assessment**: Green channel intensity analysis
   - **Color Deviation Analysis**: Deviation from healthy vegetation colors
   - **Browning Detection**: Red/Brown pixel identification for stress

2. **Vegetation Vigor Analysis**
   - **VARI Trend Analysis**: Vegetation activity and vigor
   - **Seasonal Color Changes**: Multi-temporal color analysis
   - **Stress Pattern Recognition**: Identify yellowing/browning patterns

3. **Canopy Analysis**
   - **Canopy Coverage**: Percentage of vegetation cover
   - **Canopy Density**: Pixel intensity-based density estimation
   - **Gap Analysis**: Identify clearings and deforestation
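
A minimal sketch of three health indicators for a single image, assuming a GLI threshold for the canopy mask and a simple "red > green > blue" hue rule for browning; both rules are illustrative assumptions. Bare soil also tends to satisfy red > green > blue, so the browning share is most meaningful inside areas already known to be vegetated, or when compared across dates for the same location.

```python
import numpy as np

def health_summary(image, veg_threshold=0.05):
    """Canopy coverage, browning share and canopy greenness for one RGB image."""
    arr = np.asarray(image, dtype=np.float64)
    r, g, b = arr[..., 0], arr[..., 1], arr[..., 2]
    gli = (2 * g - r - b) / (2 * g + r + b + 1e-6)
    canopy = gli > veg_threshold  # illustrative threshold
    # Browning rule (assumption): red > green > blue, i.e. yellow/brown hues.
    brown = (r > g) & (g > b)
    return {
        "canopy_coverage_pct": 100.0 * float(canopy.mean()),
        "browning_pct": 100.0 * float(brown.mean()),
        # Mean green intensity inside the canopy as a simple greenness/density proxy.
        "canopy_greenness": float(g[canopy].mean()) if canopy.any() else 0.0,
    }

image = np.array([[[60, 120, 40], [60, 120, 40], [150, 100, 50], [100, 100, 100]]], dtype=np.uint8)
print(health_summary(image))  # half the pixels are canopy, one pixel reads as brown
```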

### Phase 6: Temporal Vegetation Analysis ( 6)
**Objectives:**
- Monitor vegetation changes over time using available temporal data

**Change Detection Methods:**
1. **RGB-Based Change Detection**
   - **Image Differencing**: Compare vegetation indices across time
   - **Color Change Analysis**: Track color shifts indicating phenology
   - **Threshold-Based Change**: Binary change detection

2. **Vegetation Trend Analysis**
   - **Greenness Trends**: Long-term vegetation health trends
   - **Seasonal Pattern Recognition**: Identify phenological cycles
   - **Disturbance Detection**: Identify sudden vegetation loss
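
Index differencing with a threshold can be sketched as follows. This assumes two co-registered images of the same location with identical shapes, which the captioned dataset may not provide directly; the clipping range and the 0.2 change threshold are illustrative choices, not values from the source.

```python
import numpy as np

def vari(image, eps=1e-6):
    """VARI = (G - R) / (G + R - B), clipped to [-1, 1] for stable differencing."""
    arr = np.asarray(image, dtype=np.float64)
    r, g, b = arr[..., 0], arr[..., 1], arr[..., 2]
    denom = g + r - b
    denom = np.where(np.abs(denom) < eps, eps, denom)
    # Clip because VARI is unbounded when G + R - B approaches zero.
    return np.clip((g - r) / denom, -1.0, 1.0)

def vari_change(before, after, threshold=0.2):
    """Pixel-wise VARI differencing; returns (delta, loss_mask, gain_mask)."""
    before, after = np.asarray(before), np.asarray(after)
    if before.shape != after.shape:
        raise ValueError("images must be co-registered and have the same shape")
    delta = vari(after) - vari(before)
    return delta, delta < -threshold, delta > threshold

before = np.array([[[60, 120, 40], [60, 120, 40]]], dtype=np.uint8)
after = np.array([[[120, 60, 40], [60, 120, 40]]], dtype=np.uint8)
delta, loss, gain = vari_change(before, after)
print(loss)  # the first pixel lost greenness, the second is unchanged
```

Summing `loss` over an image gives a simple disturbance score; tracking the mean VARI per date instead gives the greenness trend.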

### Phase 7: Validation & Results ( 7)
**Objectives:**
- Validate results and create comprehensive vegetation analysis

**Validation Methods:**
1. **Caption-Based Validation**
   - Use image captions to validate classification results
   - Cross-reference vegetation descriptions with analysis
   - Accuracy assessment using caption keywords

2. **Visual Validation**
   - Expert interpretation of results
   - Comparison with known vegetation patterns
   - Ground truth validation where available
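
Caption-based validation can be sketched as an agreement score between a caption-derived label and the predicted vegetation/non-vegetation label. The keyword set is a hypothetical example, and captions are a weak proxy: a caption that never mentions vegetation does not guarantee there is none in the image.

```python
import re

# Hypothetical keyword set; plural forms are listed because matching is whole-word.
VEGETATION_WORDS = {"forest", "forests", "tree", "trees", "grass", "grassland",
                    "vegetation", "field", "fields", "crop", "crops", "farmland"}

def caption_mentions_vegetation(caption):
    """True if any whole word of the caption is in VEGETATION_WORDS."""
    words = set(re.findall(r"[a-z]+", caption.lower()))
    return not words.isdisjoint(VEGETATION_WORDS)

def caption_agreement(captions, predicted_is_vegetation):
    """Share of images where the caption-derived label matches the predicted label."""
    if len(captions) != len(predicted_is_vegetation):
        raise ValueError("need one prediction per caption")
    if not captions:
        return 0.0
    matches = sum(
        caption_mentions_vegetation(c) == bool(p)
        for c, p in zip(captions, predicted_is_vegetation)
    )
    return matches / len(captions)

captions = ["Dense green forest", "City blocks and roads", "A wide river"]
print(caption_agreement(captions, [True, False, True]))  # 2 of 3 agree
```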

## Technical Implementation Stack

### Core Libraries
```python
# Data handling
from datasets import load_dataset
import pandas as pd
import numpy as np

# Image processing
import cv2
from skimage import filters, segmentation, measure, morphology
from skimage.feature import graycomatrix, graycoprops, local_binary_pattern

# Machine learning
from sklearn.cluster import KMeans
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, confusion_matrix

# Visualization
import matplotlib.pyplot as plt
import seaborn as sns
```

### Key Vegetation Algorithms
1. **VARI Calculation**: `(Green - Red) / (Green + Red - Blue)`
2. **GLI Calculation**: `(2*Green - Red - Blue) / (2*Green + Red + Blue)`
3. **Green Dominance**: `Green / (Red + Green + Blue)`
4. **Vegetation Masking**: HSV-based green extraction
5. **Canopy Coverage**: Green pixel percentage calculation
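
As a worked check of these formulas, take a single hypothetical vegetation-like pixel with $(R, G, B) = (60, 120, 40)$:

$$
\begin{aligned}
\text{VARI} &= \frac{G - R}{G + R - B} = \frac{120 - 60}{120 + 60 - 40} = \frac{60}{140} \approx 0.429 \\
\text{GLI} &= \frac{2G - R - B}{2G + R + B} = \frac{240 - 60 - 40}{240 + 60 + 40} = \frac{140}{340} \approx 0.412 \\
\text{Green dominance} &= \frac{G}{R + G + B} = \frac{120}{220} \approx 0.545
\end{aligned}
$$

A neutral gray pixel such as $(100, 100, 100)$ gives VARI = GLI = 0 and a green dominance of $1/3$, which is why positive index values and a green share above $1/3$ serve as greenness signals. The VARI denominator $G + R - B$ can reach zero (for example at $(50, 50, 100)$), so any implementation needs a guard against division by zero.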

## Vegetation-Specific Features to Extract

### Color Features
- Mean, std, skewness of R, G, B channels
- VARI and GLI vegetation indices
- Green dominance ratio
- HSV color moments
- Color histogram bins

### Texture Features
- GLCM properties (contrast, dissimilarity, homogeneity, energy)
- LBP histogram for vegetation texture
- Gabor filter responses for directional patterns

### Morphological Features
- Vegetation patch area and perimeter
- Compactness and roundness of vegetation areas
- Edge density within vegetation regions

## Expected Vegetation Classification Results

### Vegetation Types to Identify
1. **Dense Forest**: High green intensity, coarse texture
2. **Open Woodland**: Moderate green, mixed texture
3. **Grassland**: Uniform green, fine texture
4. **Cropland**: Regular patterns, seasonal color changes
5. **Stressed Vegetation**: Yellow/brown tones, reduced green intensity
6. **Mixed Vegetation**: Varied color and texture patterns

### Target Performance Metrics
- Overall classification accuracy > 80%
- Vegetation vs. non-vegetation accuracy > 90%
- Healthy vs. stressed vegetation accuracy > 75%
- F1-score per vegetation class > 0.7
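
`sklearn.metrics.classification_report`, already imported in the stack above, reports per-class F1. The sketch below spells the computation out without scikit-learn and checks it against the accuracy and F1 targets listed here; the class names in the usage example are hypothetical.

```python
from collections import Counter

def per_class_f1(y_true, y_pred):
    """Per-class F1 from paired label lists; undefined precision/recall count as 0."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p was wrong
            fn[t] += 1  # true class t was missed
    scores = {}
    for c in sorted(set(y_true) | set(y_pred)):
        precision = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        recall = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        scores[c] = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return scores

def meets_targets(y_true, y_pred, min_accuracy=0.80, min_f1=0.70):
    """Return (ok, accuracy, per_class_f1) against the target thresholds."""
    f1 = per_class_f1(y_true, y_pred)  # also validates matching lengths
    if not y_true:
        raise ValueError("no labels to evaluate")
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    ok = accuracy >= min_accuracy and all(v >= min_f1 for v in f1.values())
    return ok, accuracy, f1

ok, acc, f1 = meets_targets(["forest", "forest", "grass", "urban"],
                            ["forest", "grass", "grass", "urban"])
print(ok, acc)  # False 0.75
```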

## Sample Code Structure

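```python
from datasets import load_dataset

# Outline only: every helper called below (explore_vegetation_dataset,
# preprocess_for_vegetation, ...) is a placeholder to be implemented per phase.

# 1. Dataset loading and exploration
ds = load_dataset("sshh12/sentinel-2-rgb-captioned")
train = ds["train"]  # load_dataset returns a DatasetDict; pick a split (assumed "train")
explore_vegetation_dataset(train)

# 2. Preprocessing
enhanced_images = preprocess_for_vegetation(train["image"])
vegetation_indices = calculate_vegetation_indices(enhanced_images)

# 3. Feature extraction
color_features = extract_color_features(enhanced_images)
texture_features = extract_texture_features(enhanced_images)
vegetation_features = combine_features(color_features, texture_features, vegetation_indices)

# 4. Classification
# `labels` (one vegetation class per image, e.g. derived from caption keywords)
# and `test_images` (held-out images) must be prepared beforehand.
vegetation_classifier = train_vegetation_classifier(vegetation_features, labels)
vegetation_map = classify_vegetation(vegetation_classifier, test_images)

# 5. Analysis and visualization
health_results = analyze_vegetation_health(vegetation_map)
create_vegetation_visualizations(vegetation_map, health_results)
```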

## Final Deliverables
1. **Vegetation Classification System**: Automated vegetation type identification
2. **Vegetation Health Assessment Tool**: RGB-based health monitoring
3. **Vegetation Coverage Analysis**: Quantitative vegetation coverage metrics
4. **Temporal Vegetation Monitoring**: Change detection capabilities
5. **Comprehensive Report**: Methodology, results, and vegetation insights
6. **Interactive Visualizations**: Vegetation maps and health indicators

## Success Criteria
- Accurate vegetation type classification using only RGB data
- Effective vegetation health assessment from color analysis
- Reliable vegetation change detection over time
- Clear visualization of vegetation patterns and trends
- Validation against image captions and expert knowledge

In [2]:
!pip install mlcroissant


In [3]:
from mlcroissant import Dataset

# The Croissant metadata exposes the first 5GB of this dataset
ds = Dataset(jsonld="https://huggingface.co/api/datasets/sshh12/sentinel-2-rgb-captioned/croissant")
records = ds.records("default")

  -  [Metadata(sentinel-2-rgb-captioned)] Property "http://mlcommons.org/croissant/citeAs" is recommended, but does not exist.
  -  [Metadata(sentinel-2-rgb-captioned)] Property "https://schema.org/datePublished" is recommended, but does not exist.
  -  [Metadata(sentinel-2-rgb-captioned)] Property "https://schema.org/version" is recommended, but does not exist.
