# Step 1: Data Preprocessing (NASA FIRMS Fire Data)

In this step, we preprocess the raw NASA FIRMS satellite fire-detection dataset.  
We perform the following tasks:
- Keep only the columns needed downstream (`latitude`, `longitude`, `acq_date`, `acq_time`)
- Parse `acq_date` as a datetime and format `acq_time` as `HH:MM`
- Drop rows with a missing latitude, longitude, or acquisition date
- Save the cleaned dataset as **fires_clean.csv**

This prepares the fire dataset for further analysis.
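
The date and time conversion listed above can be tried on a few sample values first. This is a minimal sketch, not the notebook's own code: the helper name `format_hhmm` and the sample rows are hypothetical, and it assumes `acq_time` holds HHMM numbers (for example `43` or `43.0` for 00:43) that may include missing entries.

```python
import pandas as pd


def format_hhmm(acq_time: pd.Series) -> pd.Series:
    """Turn HHMM numbers (e.g. 43 or 43.0 for 00:43) into 'HH:MM' strings.

    Hypothetical helper for illustration; missing or non-numeric values stay missing.
    """
    hhmm = (
        pd.to_numeric(acq_time, errors="coerce")  # non-numeric entries become NaN
        .astype("Int64")                          # nullable integers drop the ".0"
        .astype("string")
        .str.zfill(4)                             # "43" -> "0043"
    )
    return hhmm.str[:2] + ":" + hhmm.str[2:]


# Made-up sample rows, including one bad date and one missing time
sample = pd.DataFrame({
    "acq_date": ["2024-11-01", "2024-11-01", "not a date"],
    "acq_time": [43.0, 1305, None],
})
sample["acq_date"] = pd.to_datetime(sample["acq_date"], errors="coerce")
sample["acq_time"] = format_hhmm(sample["acq_time"])
print(sample)
```

Going through a nullable integer type avoids formatting a float such as `43.0` as text, which would otherwise leave the decimal point in the string.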


STEP 1: Load and Inspect Raw Fire Data

In [1]:
import pandas as pd

df_raw = pd.read_csv("/content/fire_nrt_J1V-C2_565335.csv")
print(df_raw.shape)
print(df_raw.columns)
df_raw.head()

(87154, 14)
Index(['latitude', 'longitude', 'brightness', 'scan', 'track', 'acq_date',
       'acq_time', 'satellite', 'instrument', 'confidence', 'version',
       'bright_t31', 'frp', 'daynight'],
      dtype='object')


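|   | latitude | longitude | brightness | scan | track | acq_date | acq_time | satellite | instrument | confidence | version | bright_t31 | frp | daynight |
|---|----------|-----------|------------|------|-------|----------|----------|-----------|------------|------------|---------|------------|-----|----------|
| 0 | 70.39933 | 68.27425 | 323.29 | 0.32 | 0.55 | 2024-11-01 | 43.0 | N20 | VIIRS | n | 2.0NRT | 258.44 | 3.14 | N |
| 1 | 70.39788 | 68.27815 | 338.5 | 0.32 | 0.55 | 2024-11-01 | 43.0 | N20 | VIIRS | n | 2.0NRT | 260.52 | 3.08 | N |
| 2 | 70.38014 | 68.293 | 320.24 | 0.32 | 0.55 | 2024-11-01 | 43.0 | N20 | VIIRS | n | 2.0NRT | 254.21 | 1.92 | N |
| 3 | 65.7707 | 24.18973 | 323.35 | 0.41 | 0.37 | 2024-11-01 | 48.0 | N20 | VIIRS | n | 2.0NRT | 266.97 | 1.64 | N |
| 4 | 65.76741 | 24.18719 | 338.44 | 0.41 | 0.37 | 2024-11-01 | 48.0 | N20 | VIIRS | n | 2.0NRT | 269.87 | 5.8 | N |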
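
Before dropping rows, it can help to see how many values are missing in each column. The sketch below is one possible way to do that; the helper name `missing_report` is hypothetical and the small `demo` frame is made-up data standing in for `df_raw`.

```python
import pandas as pd


def missing_report(df: pd.DataFrame) -> pd.DataFrame:
    """Count and percentage of missing values per column, worst first.

    Hypothetical helper for illustration only.
    """
    counts = df.isna().sum()
    report = pd.DataFrame({
        "missing": counts,
        "percent": (counts / len(df) * 100).round(2),
    })
    return report.sort_values("missing", ascending=False)


# Made-up stand-in for df_raw with a few gaps
demo = pd.DataFrame({
    "latitude": [70.39933, None, 65.7707],
    "longitude": [68.27425, 68.27815, 24.18973],
    "acq_date": ["2024-11-01", None, None],
})
report = missing_report(demo)
print(report)
```

In the notebook, the same call would be `missing_report(df_raw)`.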


STEP 2: Clean Fire Data

In [2]:
import pandas as pd

# Load your uploaded file
df = pd.read_csv("/content/fire_nrt_J1V-C2_565335.csv")

# Keep only important columns
df = df[["latitude", "longitude", "acq_date", "acq_time"]].copy()

# Parse acquisition date
df["acq_date"] = pd.to_datetime(df["acq_date"], errors="coerce")

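# Format acquisition time as HH:MM. Values are HHMM numbers (43 or 43.0 means 00:43),
# so convert to nullable integers first; this drops the ".0" and keeps missing values missing.
hhmm = pd.to_numeric(df["acq_time"], errors="coerce").astype("Int64").astype("string").str.zfill(4)
df["acq_time"] = hhmm.str[:2] + ":" + hhmm.str[2:]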

# Add simple date column
df["date"] = df["acq_date"].dt.date

# Drop missing values
df = df.dropna(subset=["latitude", "longitude", "acq_date"])

# Save cleaned dataset
df.to_csv("fires_clean.csv", index=False)

print("Cleaned data saved as fires_clean.csv")
print("Rows:", len(df))
print("Date range:", df['date'].min(), "→", df['date'].max())
print("Latitude range:", df['latitude'].min(), "→", df['latitude'].max())
print("Longitude range:", df['longitude'].min(), "→", df['longitude'].max())



Cleaned data saved as fires_clean.csv
Rows: 823481
Date range: 2024-11-01 → 2024-11-19
Latitude range: -53.20483 → 76.66631
Longitude range: -175.07471 → 179.60384
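
A quick sanity check on the cleaned data can catch out-of-range coordinates or badly formatted times before the file is used elsewhere. The following is a minimal sketch under stated assumptions: the helper name `check_clean_fires` is hypothetical, the `demo` rows are made up, and it expects the `latitude`, `longitude`, and `acq_time` columns produced by the cleaning cell.

```python
import pandas as pd


def check_clean_fires(df: pd.DataFrame) -> dict:
    """Return how many rows violate each basic expectation.

    Hypothetical helper for illustration; missing values count as violations.
    """
    valid_time = df["acq_time"].astype(str).str.fullmatch(r"([01]\d|2[0-3]):[0-5]\d")
    return {
        "bad_latitude": int((~df["latitude"].between(-90, 90)).sum()),
        "bad_longitude": int((~df["longitude"].between(-180, 180)).sum()),
        "bad_time": int((~valid_time).sum()),
    }


# Made-up rows: the last one has an impossible latitude and a malformed time
demo = pd.DataFrame({
    "latitude": [70.39933, -53.20483, 95.0],
    "longitude": [68.27425, -175.07471, 24.18973],
    "acq_time": ["00:43", "13:05", "43:.0"],
})
problems = check_clean_fires(demo)
print(problems)
```

On the real output, this could be run as `check_clean_fires(pd.read_csv("fires_clean.csv"))`; all three counts would be expected to be zero.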
