
# Week 1 Project — Climate Risk & Disaster Management  
## Dataset: Heatwave Risk in India

This notebook completes the **Week 1 Project** up to the **data understanding** stage.

**Steps:**  
1. Import necessary libraries  
2. Load the dataset (Heatwave Risk in India)  
3. Explore the dataset with:  
   - `.info()`  
   - `.describe()`  
   - `.isnull().sum()` (to check missing values)



## Instructions

1. Download the dataset from Kaggle manually:  
   👉 https://www.kaggle.com/datasets/s3programmer/heatwave-risk-in-india  

   OR use the **Kaggle API cell below** to download automatically.

2. Place the CSV file (`Heatwave_Risk_India.csv`) into the `data/` folder if downloaded manually.

3. Run each cell step by step.

4. Save the notebook when finished.



## 🔄 Optional: Download dataset automatically with Kaggle API

To skip the manual download, run the cell below **after placing your `kaggle.json` API key**:

- On Colab: upload your `kaggle.json` file when prompted.  
- On a local machine: put `kaggle.json` into `~/.kaggle/` (Linux/Mac) or `C:\Users\<you>\.kaggle\` (Windows).  
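
Before running the download cell, it can help to confirm that the key file is where the two locations above expect it. The following is a minimal sketch, assuming the default location under the home directory; the helper name `kaggle_credentials_path` is hypothetical and not part of the Kaggle API.

```python
from pathlib import Path


def kaggle_credentials_path(home=None):
    """Return the conventional kaggle.json location under a home directory.

    `home` defaults to the current user's home directory, which resolves to
    ~ on Linux/Mac and C:\\Users\\<you> on Windows.
    """
    home = Path(home) if home is not None else Path.home()
    return home / ".kaggle" / "kaggle.json"


cred = kaggle_credentials_path()
if cred.is_file():
    print(f"Found Kaggle credentials: {cred}")
else:
    print(f"No kaggle.json at {cred}; place it there before running the download cell.")
```

If the file is reported missing, the `chmod` and download commands in the next cell are likely to fail, so it is worth fixing the location first.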


In [None]:

# Install Kaggle API (skip if already installed)
!pip install kaggle -q

# For Colab users: upload kaggle.json here
#from google.colab import files
#files.upload()  # uncomment if running on Colab to upload kaggle.json

# Create kaggle directory if it doesn't exist
!mkdir -p ~/.kaggle

# If uploaded in Colab, copy to ~/.kaggle
#!cp kaggle.json ~/.kaggle/
!chmod 600 ~/.kaggle/kaggle.json

# Download Heatwave Risk in India dataset into data/ folder
!kaggle datasets download -d s3programmer/heatwave-risk-in-india -p data/ --unzip

print("✅ Dataset downloaded into data/ folder")


In [None]:

# Step 1: Import libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import os

print("Libraries imported successfully")


In [None]:

# Step 2a: Set dataset path (adjust if filename differs)
DATA_FILE = "data/Heatwave_Risk_India.csv"

if not os.path.exists(DATA_FILE):
    print(f"⚠️ File not found: {DATA_FILE}. Please place the dataset CSV in the data/ folder.")
else:
    print(f"Found dataset: {DATA_FILE}")


In [None]:

# Helper: List available files in data/
for root, dirs, files in os.walk("data"):
    for file in files:
        print(os.path.join(root, file))
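
If the downloaded file has a different name than expected, listing only the CSV files can make it easier to choose the right value for `DATA_FILE`. The sketch below is one possible way to do that; the helper name `find_csv_files` is hypothetical, and it assumes the dataset ships with a lowercase `.csv` extension.

```python
from pathlib import Path


def find_csv_files(folder="data"):
    """Return all .csv files under `folder` (searched recursively), sorted by path."""
    base = Path(folder)
    if not base.is_dir():
        # Folder does not exist yet, so there is nothing to list
        return []
    return sorted(base.rglob("*.csv"))


csv_files = find_csv_files("data")
if csv_files:
    print("CSV files found:")
    for path in csv_files:
        print("  ", path)
else:
    print("No CSV files found under data/.")
```

Any path printed here can be copied into `DATA_FILE` as a string.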


In [None]:

# Step 2b: Load dataset
if os.path.exists(DATA_FILE):
    try:
        df = pd.read_csv(DATA_FILE)
    except UnicodeDecodeError:
        # Fall back to Latin-1 if the file is not valid UTF-8
        df = pd.read_csv(DATA_FILE, encoding="latin-1")

    print("Dataset loaded successfully!")
    print("Shape:", df.shape)
    display(df.head())
else:
    print("Please download from Kaggle and set DATA_FILE correctly.")


In [None]:

# Step 3a: Dataset info
if 'df' in globals():
    df.info()
else:
    print("Dataset not loaded yet.")


In [None]:

# Step 3b: Dataset statistics
if 'df' in globals():
    display(df.describe(include='all'))
else:
    print("Dataset not loaded yet.")


In [None]:

# Step 3c: Missing values check
if 'df' in globals():
    missing = df.isnull().sum().sort_values(ascending=False)
    display(missing)
else:
    print("Dataset not loaded yet.")
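
Raw missing counts are easier to interpret next to the share of rows they represent. The following is a minimal sketch of such a summary, shown on a small made-up table; the column names and values are illustrative assumptions, not taken from the heatwave dataset, and `missing_summary` is a hypothetical helper.

```python
import numpy as np
import pandas as pd


def missing_summary(frame):
    """Return per-column missing counts and percentages, largest count first."""
    counts = frame.isnull().sum()
    # Guard against division by zero for an empty DataFrame
    percent = (counts / max(len(frame), 1) * 100).round(2)
    summary = pd.DataFrame({"missing": counts, "percent": percent})
    return summary.sort_values("missing", ascending=False)


# Toy data for illustration only (not the real dataset)
toy = pd.DataFrame({
    "city": ["A", "B", "C", "D"],
    "max_temp": [41.0, np.nan, 44.5, np.nan],
    "humidity": [30.0, 28.0, np.nan, 35.0],
})

print(missing_summary(toy))
```

Once the real dataset is loaded, calling `missing_summary(df)` should give the same kind of table for its columns.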



## (Optional Next Steps)
- Plot histograms or boxplots for numeric columns.  
- Check duplicates: `df.duplicated().sum()`  
- Write 4–6 bullet points summarizing what you learned from `.info()`, `.describe()`, and missing values.  
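
The first two suggestions above could be sketched as follows. This is an illustration on a small made-up table, since the real column names are not listed in this notebook; the names `city`, `max_temp`, and `humidity` are assumptions. With the dataset loaded, `toy` would be replaced by `df`.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Toy data for illustration only (not the real dataset)
toy = pd.DataFrame({
    "city": ["A", "B", "B", "C"],
    "max_temp": [41.0, 43.5, 43.5, 44.0],
    "humidity": [30.0, 28.0, 28.0, 35.0],
})

# Count rows that exactly repeat an earlier row across all columns
n_duplicates = int(toy.duplicated().sum())
print("Fully duplicated rows:", n_duplicates)

# Plot one histogram per numeric column
numeric_cols = toy.select_dtypes(include="number").columns.tolist()
axes = toy[numeric_cols].hist(bins=5, figsize=(8, 3))
plt.tight_layout()
plt.show()
```

`select_dtypes(include="number")` keeps text columns out of the plots, which matters when a dataset mixes categorical and numeric fields.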
