
# 📍 Indian District-Wise Population Change Analysis (2001-2011)

## 🚀 Project Overview

India's population landscape is incredibly diverse, shaped by urbanization, migration, fertility, and state-level policies. Between 2001 and 2011, significant demographic shifts occurred, making it critical to understand how population growth varied across the country.

This project aims to perform an insightful statistical and geospatial analysis of district-level population changes across India using R and Jupyter Notebook.

### Project Objectives

- Calculate and analyze district-wise population growth rates between 2001 and 2011.
- Visualize which districts experienced the highest and lowest growth rates.
- Perform clustering to reveal spatial groupings of districts based on growth patterns.
- Explore state-wise growth trends to uncover broader regional dynamics.
- Provide simple projections for 2021 populations (bonus).
- Interpret and summarize key demographic insights for policymakers and stakeholders.

By leveraging statistical techniques, clustering algorithms, and interactive maps, this notebook serves as a powerful portfolio project demonstrating both technical and analytical skills.


In [None]:

# Load required libraries
library(tidyverse)
library(ggplot2)
library(sf)
library(cluster)
library(leaflet)
library(dendextend)


In [None]:

# Load dataset
data <- read.csv("district wise population and centroids.csv")
head(data)


In [None]:

# Check for missing values
sum(is.na(data))

# Calculate population growth percentage
data <- data %>%
  mutate(Growth_Percent = ((Population.in.2011 - Population.in.2001) / Population.in.2001) * 100)
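
The growth formula above can be checked on a few made-up numbers. This is only a sketch: the `toy` data frame and the `safe_growth()` helper are hypothetical and not part of the dataset. The helper returns `NA` when the 2001 baseline is zero or missing, cases where the plain formula would yield `Inf`, `NaN`, or `NA`.

```r
# Hypothetical helper: percentage growth with a guard for unusable baselines
safe_growth <- function(pop_start, pop_end) {
  valid <- !is.na(pop_start) & !is.na(pop_end) & pop_start > 0
  out <- rep(NA_real_, length(pop_start))
  out[valid] <- (pop_end[valid] - pop_start[valid]) / pop_start[valid] * 100
  out
}

# Made-up districts: one growing, one with a zero baseline, one shrinking
toy <- data.frame(
  District = c("A", "B", "C"),
  Population.in.2001 = c(1000, 0, 2000),
  Population.in.2011 = c(1200, 500, 1900)
)

toy$Growth_Percent <- safe_growth(toy$Population.in.2001, toy$Population.in.2011)
print(toy)
```

Whether a zero baseline should become `NA` or be dropped entirely is a judgment call that depends on how the source data records newly created districts.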


In [None]:

# Descriptive statistics
summary(data$Population.in.2001)
summary(data$Population.in.2011)
summary(data$Growth_Percent)

# Top and Bottom 10 districts
top10_growth <- data %>% arrange(desc(Growth_Percent)) %>% head(10)
bottom10_growth <- data %>% arrange(Growth_Percent) %>% head(10)

top10_growth
bottom10_growth

# Correlation analysis (Pearson); rows with missing values are skipped
cor(data$Latitude, data$Growth_Percent, use = "complete.obs")
cor(data$Longitude, data$Growth_Percent, use = "complete.obs")



### 📊 Interpretation of Statistical Analysis Results

- **Population growth rates varied widely across districts**. Some districts grew rapidly while others had slow or even negative growth.
- **High growth districts were not exclusively urban**. Many rapidly growing districts were semi-urban or rural areas with higher fertility rates or inward migration. For example, Malda (West Bengal) had significant growth despite low urbanization.
- **Low growth districts were not always rural**. Some established or highly urban districts like Kolkata recorded lower population growth, which may be linked to factors such as lower fertility rates or population saturation.
- **Correlation with Latitude and Longitude**:
  - The correlation coefficients were close to zero, indicating **no strong linear relationship** between a district's geographical position (north-south or east-west) and its population growth rate.
  - This means that **geography alone does not explain population growth**. Demographic factors, socio-economic dynamics, and migration trends have a much greater influence.

Overall, this analysis reinforces that population growth patterns in India are influenced by complex factors rather than simple urban-rural or geographic categorizations.
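
One caveat on the correlation result is worth making concrete: a coefficient near zero rules out a linear trend, not every kind of relationship. The sketch below uses made-up values (not the district data) in which growth depends entirely on latitude through a symmetric U-shape, yet both the Pearson and Spearman coefficients come out at essentially zero. This is why the scatter plot is a useful companion to `cor()`.

```r
# Hypothetical data: growth is a deterministic U-shaped function of latitude
latitude <- seq(8, 36, length.out = 29)
growth <- (latitude - mean(latitude))^2 / 10

# Linear (Pearson) and rank-based (Spearman) correlation
pearson <- cor(latitude, growth, use = "complete.obs")
spearman <- cor(latitude, growth, method = "spearman", use = "complete.obs")

print(round(c(pearson = pearson, spearman = spearman), 3))
```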


In [None]:

# Bar Plot - Top 10 Growing Districts
ggplot(top10_growth, aes(x = reorder(District, Growth_Percent), y = Growth_Percent)) +
  geom_bar(stat = "identity", fill = "steelblue") +
  coord_flip() +
  labs(title = "Top 10 Districts by Population Growth", x = "District", y = "Growth %") +
  theme_minimal()

# Bar Plot - Bottom 10 Growing Districts
ggplot(bottom10_growth, aes(x = reorder(District, Growth_Percent), y = Growth_Percent)) +
  geom_bar(stat = "identity", fill = "tomato") +
  coord_flip() +
  labs(title = "Bottom 10 Districts by Population Growth", x = "District", y = "Growth %") +
  theme_minimal()

# Scatter plot
ggplot(data, aes(x = Latitude, y = Growth_Percent)) +
  geom_point(color = "purple") +
  labs(title = "Population Growth vs Latitude", x = "Latitude", y = "Growth %") +
  theme_minimal()


In [None]:

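# K-means clustering on standardized growth (a single variable)
data$Growth_Scaled <- as.numeric(scale(data$Growth_Percent))
set.seed(42)
kmeans_result <- kmeans(data$Growth_Scaled, centers = 3, nstart = 25)

# k-means numbers its clusters arbitrarily, so renumber them by descending
# mean growth: Cluster 1 = highest growth, Cluster 3 = lowest
cluster_rank <- unname(rank(-kmeans_result$centers[, 1]))
data$Cluster <- factor(cluster_rank[kmeans_result$cluster], levels = 1:3)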

# Visualize clusters
ggplot(data, aes(x = Longitude, y = Latitude, color = Cluster)) +
  geom_point(size = 3) +
  labs(title = "Spatial Clusters of Districts by Growth Rate", x = "Longitude", y = "Latitude") +
  theme_minimal()



### 📌 Interpretation of Spatial Clustering Results

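- **Red (Cluster 1)** → Fastest-growing districts
- **Green (Cluster 2)** → Moderate growth
- **Blue (Cluster 3)** → Low or negative growth

k-means assigns cluster numbers arbitrarily, so confirm this mapping against the cluster centers (`kmeans_result$centers`) before relying on it. The clusters are built from growth rate alone: they are growth tiers plotted on a map, and they say nothing directly about whether a district is urban or rural.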
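
The choice of three clusters is fixed in the code rather than derived from the data. One possible check, sketched here on made-up growth values, is the average silhouette width from the `cluster` package: higher values suggest better-separated groups. The `growth` vector and the candidate range `2:5` are illustrative assumptions, not results from the district data.

```r
library(cluster)

# Made-up growth percentages with three visibly separated groups
growth <- c(-5, -3, -4, 10, 12, 11, 13, 40, 42, 45)
scaled <- as.numeric(scale(growth))
distances <- dist(scaled)

# Average silhouette width for each candidate number of clusters
set.seed(42)
candidate_k <- 2:5
avg_sil <- sapply(candidate_k, function(k) {
  km <- kmeans(scaled, centers = k, nstart = 25)
  mean(silhouette(km$cluster, distances)[, "sil_width"])
})
names(avg_sil) <- candidate_k

best_k <- candidate_k[which.max(avg_sil)]
print(round(avg_sil, 3))
print(best_k)
```

On real data the silhouette curve is often much flatter than in a toy example, so it is better read as one input to the choice of `k` than as a verdict.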


In [None]:

# Interactive Leaflet map with markers colored by cluster
cluster_colors <- c("1" = "red", "2" = "green", "3" = "blue")
data$Cluster_Color <- cluster_colors[as.character(data$Cluster)]

leaflet(data) %>%
  addTiles() %>%
  addCircleMarkers(~Longitude, ~Latitude, 
                   popup = ~paste("District:", District, "<br>Growth %:", round(Growth_Percent, 2)),
                   color = ~Cluster_Color, radius = 5, fillOpacity = 0.7) %>%
  addLegend("bottomright", colors = c("red", "green", "blue"), 
            labels = c("Cluster 1", "Cluster 2", "Cluster 3"), title = "Clusters")



### 🗺️ Interpretation of Interactive Leaflet Map

Click a marker to see a district's name and growth rate. Marker color shows the growth-rate cluster:

- **Red** → Fastest growing districts
- **Green** → Moderate growth
- **Blue** → Low/negative growth
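
The map code uses `popup`, which opens on click. If hover behavior is wanted, `addCircleMarkers()` also accepts a `label` argument that shows text on mouse-over. The sketch below assumes a small made-up `toy` data frame standing in for `data`, and uses `colorFactor()` so that marker colors and the legend come from the same palette.

```r
library(leaflet)

# Made-up rows standing in for the notebook's `data` frame
toy <- data.frame(
  District = c("A", "B", "C"),
  Longitude = c(77.2, 88.4, 72.9),
  Latitude = c(28.6, 22.6, 19.1),
  Growth_Percent = c(21.2, -1.7, 8.3),
  Cluster = factor(c(1, 3, 2), levels = 1:3)
)

# One palette shared by the markers and the legend
pal <- colorFactor(c("red", "green", "blue"), domain = levels(toy$Cluster))

m <- leaflet(toy) %>%
  addTiles() %>%
  addCircleMarkers(~Longitude, ~Latitude,
                   # `label` appears on hover; `popup` would appear on click
                   label = ~paste0(District, ": ", round(Growth_Percent, 2), "%"),
                   color = ~pal(Cluster), radius = 5, fillOpacity = 0.7) %>%
  addLegend("bottomright", pal = pal, values = ~Cluster, title = "Clusters")

m
```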



## 📍 Additional Analysis: Population Growth by State


In [None]:

# State-wise Average Growth
state_growth <- data %>%
  group_by(State) %>%
  summarise(Average_Growth_Percent = mean(Growth_Percent, na.rm = TRUE)) %>%
  arrange(desc(Average_Growth_Percent))

head(state_growth, 10)
tail(state_growth, 10)

ggplot(state_growth, aes(x = reorder(State, Average_Growth_Percent), y = Average_Growth_Percent)) +
  geom_bar(stat = "identity", fill = "darkorange") +
  coord_flip() +
  labs(title = "Average District Population Growth by State (2001-2011)", x = "State", y = "Average Growth %") +
  theme_minimal()



### 📌 Interpretation of State-wise Growth

- States with the highest average district growth are not always highly urbanized. Some may be experiencing rapid population increases due to factors like higher birth rates, rural expansion, or internal migration rather than urban economic development alone.
- Similarly, lower-growth states are not necessarily rural or losing residents to migration; they may instead have stable or aging populations, or lower birth rates.
- Population growth at the state level reflects a complex mix of demographics, local migration patterns, fertility rates, and economic factors, not just urbanization.

This highlights the importance of avoiding simplistic assumptions and instead looking at state-specific contexts to understand growth patterns.
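
One detail to keep in mind when reading the state ranking: the average of district growth percentages weights every district equally, so it can differ from the growth of the state's total population. The sketch below uses made-up figures for two hypothetical states to show the gap. The `Aggregate_Growth_Percent` column is an illustrative addition, not something the notebook computes.

```r
library(dplyr)

# Made-up districts: state X has one small fast-growing and one large slow-growing district
toy <- data.frame(
  State = c("X", "X", "Y", "Y"),
  Population.in.2001 = c(100000, 1000000, 500000, 500000),
  Population.in.2011 = c(150000, 1050000, 550000, 600000)
)

state_compare <- toy %>%
  mutate(Growth_Percent = (Population.in.2011 - Population.in.2001) /
           Population.in.2001 * 100) %>%
  group_by(State) %>%
  summarise(
    # Unweighted mean of district growth rates (what the notebook reports)
    Average_Growth_Percent = mean(Growth_Percent, na.rm = TRUE),
    # Growth of the state's summed population
    Aggregate_Growth_Percent = (sum(Population.in.2011) - sum(Population.in.2001)) /
      sum(Population.in.2001) * 100
  )

print(state_compare)
```

In this toy case the two measures agree for state Y, where both districts are the same size, and diverge sharply for state X, where a small district's rapid growth dominates the unweighted mean.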


In [None]:

# Project 2021 population by repeating the 2001-2011 absolute change (linear extrapolation)
data <- data %>%
  mutate(Population.in.2021 = Population.in.2011 + (Population.in.2011 - Population.in.2001))

head(data[,c("District", "Population.in.2001", "Population.in.2011", "Population.in.2021")])



### 📈 Interpretation of 2021 Prediction

Simple linear extrapolation assumes each district adds (or loses) the same absolute number of people in 2011-2021 as it did in 2001-2011. It ignores changes in fertility, migration, policy, and the economy, so treat these estimates as indicative only.
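
To see how much the projection depends on the assumed growth model, the sketch below compares the linear rule used above with a geometric alternative that repeats the 2001-2011 growth ratio instead of the absolute change. The helper names and the two made-up districts are hypothetical. Neither rule is claimed to be the better forecast.

```r
# Linear: repeat the absolute change from 2001-2011
project_linear <- function(pop_2001, pop_2011) {
  pop_2011 + (pop_2011 - pop_2001)
}

# Geometric: repeat the growth ratio from 2001-2011
project_geometric <- function(pop_2001, pop_2011) {
  pop_2011 * (pop_2011 / pop_2001)
}

# Made-up districts: one growing by 20%, one shrinking by 10%
toy <- data.frame(
  District = c("Growing", "Shrinking"),
  Population.in.2001 = c(100000, 100000),
  Population.in.2011 = c(120000, 90000)
)

toy$Linear_2021 <- project_linear(toy$Population.in.2001, toy$Population.in.2011)
toy$Geometric_2021 <- project_geometric(toy$Population.in.2001, toy$Population.in.2011)

print(toy)
```

The gap between the two columns grows with the size of the decadal change, which gives a rough sense of how sensitive a district's projection is to the modeling assumption.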



# 📌 Final Conclusion and Insights

### 📊 District Level Patterns

- **High Growth Districts**: These districts are not universally urban but reflect a mixture of rapidly expanding cities, semi-urban areas undergoing development, and rural regions experiencing natural population increases (e.g., higher birth rates). 
- **Low Growth or Negative Growth Districts**: Found largely in established or aging urban centers (where birth rates may be low) and remote or rural districts experiencing migration outflows or stagnation.

### 🧭 Spatial Clustering Findings

- The k-means analysis grouped districts into three growth tiers (the number of clusters was fixed at three in advance):
  - **Red (High Growth)**: Rapidly expanding regions (urban or natural growth)
  - **Green (Moderate Growth)**: Balanced growth districts
  - **Blue (Low or Negative Growth)**: Slow-growing or shrinking districts, which include both mature urban centers and areas with migration outflows
  
- These clusters highlight regional disparities that can aid planners in targeting interventions more effectively.

### 🏙️ State Level Trends

- Contrary to assumptions, higher population growth at the state level was **not necessarily linked to urbanization**.
- States such as **Meghalaya** and **Arunachal Pradesh**, which are less urbanized, saw high growth rates, plausibly driven by natural population increase and other socio-demographic factors that this dataset does not measure directly.
- Highly urbanized states like **Kerala** and **Goa** recorded relatively low population growth, showcasing that economic development and urbanization may correlate with slower growth in some cases due to lower fertility rates and aging populations.

### 🧹 Forecasting and Limitations

- Simple linear extrapolation for 2021 populations provided a rough idea of potential district populations.
- However, this method does not account for unpredictable variables such as policy shifts, migration patterns, and economic developments. More advanced models should be used for accurate forecasting.

### 📌 Final Takeaways

- Population growth is a complex phenomenon shaped by a variety of factors beyond mere geography or urbanization.
- This analysis underscores the importance of nuanced, multi-dimensional approaches when examining demographic changes.
- By combining descriptive statistics, clustering, and geospatial visualizations, this project offers a robust view of India's evolving population landscape and serves as a valuable tool for future urban planning, policy formulation, and academic research.

---

This notebook serves as a comprehensive portfolio project that showcases data wrangling, spatial analysis, clustering, and insightful reporting using R in a Jupyter environment.
