# EXPLORING GLOBAL POPULATION TRENDS

# INTRODUCTION

In a world characterized by constant change, understanding population dynamics is essential for policymakers, researchers, and anyone curious about our planet's evolving demographics. This project, "Exploring Global Population Trends," delves into a rich dataset encompassing a wide array of information about countries worldwide, including their populations over several decades, geographical attributes, and more. Through data analysis and visualization, I aim to uncover intriguing insights into the past, present, and potential future of the world's population.


**DATA PATH EXTRACTION**

In [None]:
import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))


**IMPORTING LIBRARIES**

In [None]:
import numpy as np 
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

***NumPy***

Function: Numerical Python (NumPy) is a library for numerical and mathematical operations.

***Pandas***

Function: Pandas is a library for data manipulation and analysis.

***Matplotlib.pyplot***

Function: Matplotlib is a plotting library for creating static, animated, or interactive visualizations in Python.

***Seaborn***

Function: Seaborn is a data visualization library built on top of Matplotlib.


# EXPLORATION

**Brief overview of the data set**


In [None]:
df=pd.read_csv("/kaggle/input/world-population-dataset/world_population.csv")
df.head()

**Total number of rows and columns**

In [None]:
df.shape

**Column names**

In [None]:
df.columns

**Column details**

In [None]:
df.info()

**Column Data types**

In [None]:
df.dtypes

**General Overview**

In [None]:
print(df["Country/Territory"].unique())
print(df.Capital.unique())
print(df.Continent.unique())

**DATA CLEANING**

**Renaming column for easy reference**

Remove the word "Population" from the column headings, e.g. "2022 Population" becomes simply "2022".

In [None]:
df.columns = df.columns.str.replace(' Population', '', regex=False)  # literal match, not a regex
df.head()
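
To see what the renaming step does in isolation, the sketch below applies the same `str.replace` call to a handful of column names. The names are assumptions modelled on the headings mentioned in this notebook, not values read from the CSV.

```python
import pandas as pd

# Hypothetical column names (assumed for illustration, not read from the dataset)
cols = pd.Index([
    "Country/Territory",
    "2022 Population",
    "1970 Population",
    "World Population Percentage",
])

# Remove the literal substring " Population" wherever it occurs
cleaned = cols.str.replace(" Population", "", regex=False)
print(list(cleaned))
```

Note that the replacement also shortens any other heading containing " Population", which is why the percentage column is later referred to as `World Percentage`.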

Change the "Country/Territory" column name to "Country"

In [None]:
df.rename(columns={'Country/Territory':'Country'}, inplace=True)

**Check for Null and Duplicate Entries**

Check for null entries

In [None]:
df.isnull().sum()

Check for duplicate entries

In [None]:
df.duplicated().sum()
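
Both checks only report counts. If either returned a non-zero value, one possible follow-up is to drop the affected rows. This is a minimal sketch on a toy frame with made-up values, and the choice to drop rather than impute is an assumption, not a step taken in this notebook.

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical): one exact duplicate row and one missing value
toy = pd.DataFrame({
    "Country": ["A", "B", "B", "C"],
    "2022": [100.0, 200.0, 200.0, np.nan],
})

n_null = int(toy.isnull().sum().sum())   # total missing cells
n_dup = int(toy.duplicated().sum())      # rows that repeat an earlier row

# Drop repeated rows first, then rows with any missing value
cleaned_toy = toy.drop_duplicates().dropna().reset_index(drop=True)
print(n_null, n_dup, len(cleaned_toy))
```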

**Convert growth rate to percent**


In [None]:
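# "Growth Rate" is stored as a ratio; subtract 1 and scale by 100 to get a percentage change.
# Run this cell only once: re-running it would apply the conversion a second time.
df["Growth Rate"] = (df["Growth Rate"] - 1) * 100
df[["Country", "Growth Rate"]].head()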
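
The conversion can be checked on a few hand-picked ratios. The sketch below uses made-up values and writes the result to a separate, hypothetical column name (`Growth Rate (%)`), so that re-running the code cannot convert the same column twice.

```python
import pandas as pd

# Toy ratios (hypothetical): 1.02 means +2 %, 1.00 means no change, 0.95 means -5 %
toy = pd.DataFrame({
    "Country": ["X", "Y", "Z"],
    "Growth Rate": [1.02, 1.00, 0.95],
})

# Same formula as above, stored in a new column and rounded to hide float noise
toy["Growth Rate (%)"] = ((toy["Growth Rate"] - 1) * 100).round(2)
print(toy)
```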

# ANALYSIS

**Calculate Total Current World Population**

In [None]:
current_world_population = df['2022'].sum()
print("Current World Population:", current_world_population)

**Plot World Population Trend**

In [None]:
plt.subplots()
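# Sum every year column across countries; the columns run from 2022 back to 1970,
# so reverse the result to put the years in chronological order
growth = df.iloc[:, 5:13].sum()[::-1]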
sns.lineplot(x=growth.index, y=growth.values, marker="o")
plt.xticks(rotation=70)
plt.ylabel("Population")
plt.title("World Population Trend (1970-2022)")
plt.show()
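
Because the year labels are strings, the line plot spaces them evenly even if the gaps between observation years differ. A possible refinement, sketched here on made-up totals, is to convert the labels to integers and compute the percentage change between consecutive observations. The numbers are hypothetical and only illustrate the mechanics.

```python
import pandas as pd

# Hypothetical world totals keyed by year label, newest first
totals = pd.Series({"2022": 165.0, "2020": 150.0, "2015": 120.0, "2010": 100.0})

totals.index = totals.index.astype(int)   # "2022" -> 2022, so the axis is numeric
totals = totals.sort_index()              # oldest observation first

# Percentage change from one observation to the next (first entry is NaN)
pct_change = totals.pct_change().mul(100).round(2)
print(pct_change)
```

Plotting `totals` with its integer index would place the observations at their true positions on the time axis.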

**Plot World Population Trend by Continent**

In [None]:
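# Sum the numeric columns per continent, then select the year columns by label
# (2022 ... 1970) so the result does not depend on column positions
df_continent = df.groupby('Continent').sum(numeric_only=True).loc[:, '2022':'1970']
df_continent = df_continent.iloc[:, ::-1]  # reverse to chronological order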

# Create line plots
plt.figure(figsize = (10,5))
sns.lineplot(data = df_continent.transpose()) # Transpose to change the axis
plt.title("Population Growth by Continent (1970-2022)")
plt.xlabel("Year")
plt.ylabel("Population")
plt.legend(title = "Continent")
plt.show()
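
Which columns survive a grouped `sum()` can depend on the pandas version, so selecting the year columns by label before aggregating is a safer pattern. The sketch below shows it on a toy frame. The countries and figures are made up, and `year_cols` is a hypothetical helper name.

```python
import pandas as pd

# Toy data (hypothetical values)
toy = pd.DataFrame({
    "Country": ["A", "B", "C"],
    "Continent": ["Asia", "Asia", "Europe"],
    "2022": [30, 20, 10],
    "1970": [15, 10, 8],
})

year_cols = ["1970", "2022"]  # chronological order, chosen explicitly

# Aggregate only the selected columns; text columns never enter the sum
by_continent = toy.groupby("Continent")[year_cols].sum()
print(by_continent)
```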



**Calculate percentage of each continent in Total Population**

In [None]:
cont_data_percentage = df.groupby("Continent")["World Percentage"].sum().round(1).sort_values(ascending=False).reset_index()
cont_data_percentage
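
As a cross-check on the pre-computed percentage column, each continent's share can also be derived directly from the population figures. This is a sketch on toy data with made-up numbers, not a result from the dataset.

```python
import pandas as pd

# Toy data (hypothetical values)
toy = pd.DataFrame({
    "Continent": ["Asia", "Asia", "Africa", "Europe"],
    "2022": [40, 20, 30, 10],
})

# Continent total divided by the overall total, expressed in percent
share = toy.groupby("Continent")["2022"].sum() / toy["2022"].sum() * 100
share = share.round(1).sort_values(ascending=False)
print(share)
```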

**Create labels for Pie Chart**

In [None]:
labels = cont_data_percentage['Continent']
labels

In [None]:
pp = cont_data_percentage["World Percentage"]
pp

**Plot pie chart to show Population Distribution by Continent**

In [None]:
pp.plot.pie(labels=labels, autopct='%1.2f%%')
plt.ylabel("")  # hide the default axis label (the series name)
plt.title("Population Distribution by Continent")
plt.show()

**Plot Bar Charts to show 10 Most and Least Populated Countries**

In [None]:
# Sort countries by 2022 population, largest first
df_sorted = df[['Country', '2022']].sort_values('2022', ascending=False)

fig, ax = plt.subplots(2, 1, figsize=(10, 10), constrained_layout=True)

# Plot 1: the ten most populated countries
sns.barplot(x="Country", y="2022", data=df_sorted.head(10), palette="Blues_r", ax=ax[0])
ax[0].set_title("10 Most Populated Countries 2022")
ax[0].set_xlabel("Countries")
ax[0].set_ylabel("Population\n")
ax[0].tick_params(axis='x', labelrotation=30)  # rotate labels for readability

# Plot 2: the ten least populated countries
sns.barplot(x="Country", y="2022", data=df_sorted.tail(10), palette="Blues_r", ax=ax[1])
ax[1].set_title("10 Least Populated Countries 2022")
ax[1].set_xlabel("Countries")
ax[1].set_ylabel("Population\n")
ax[1].tick_params(axis='x', labelrotation=30)
plt.show()
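
Sorting the whole frame and taking `head`/`tail` works, but pandas also offers `nlargest` and `nsmallest` for exactly this kind of selection. The sketch below uses a toy frame with made-up values.

```python
import pandas as pd

# Toy data (hypothetical values)
toy = pd.DataFrame({
    "Country": ["A", "B", "C", "D", "E"],
    "2022": [50, 10, 40, 5, 30],
})

most = toy.nlargest(2, "2022")    # two largest populations, largest first
least = toy.nsmallest(2, "2022")  # two smallest populations, smallest first
print(most)
print(least)
```

Unlike `tail`, `nsmallest` returns the smallest value first, which may be the more natural order for a "least populated" chart.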


**Plot Bar Charts to show top 5 most Populated Countries in Each Continent**

In [None]:
# Get unique continents from your DataFrame
continents = df['Continent'].unique()

# Create a figure with subplots
fig, axes = plt.subplots(2, 3, figsize=(15, 10), constrained_layout = True)

# Loop through continents and create subplots
for i, continent in enumerate(continents):
    row, col = i // 3, i % 3  # Calculate row and column indices for subplots
    ax = axes[row, col]  # Get the current subplot
    
    # Filter data for the current continent
    pop_by_cont = df[df['Continent'] == continent][['Country', '2022']]
    
    # Sort by Population and Get Top 5
    pop_by_cont = pop_by_cont.sort_values('2022', ascending=False)[:5]
    
    # Create a barplot for the current continent
    sns.barplot(x='Country', y='2022', data=pop_by_cont, ax=ax, palette='Blues_r')
    
    # Customize subplot labels and title
    ax.set_xlabel('Country')
    ax.set_ylabel('Population in 2022')
    ax.set_title(f'Top 5 Countries in {continent}')
    ax.tick_params(axis='x', labelrotation=30)  # Rotate x-axis labels for readability

# Display the subplots
plt.show()
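
The per-continent filtering inside the loop can also be done in one step by sorting once and taking the first rows of each group. This is a sketch on toy data with made-up values, using a top 2 instead of a top 5 to keep it short.

```python
import pandas as pd

# Toy data (hypothetical values)
toy = pd.DataFrame({
    "Country": ["A", "B", "C", "D", "E"],
    "Continent": ["Asia", "Asia", "Asia", "Europe", "Europe"],
    "2022": [30, 20, 5, 10, 8],
})

# Sort by population once, then keep the first 2 rows within each continent
top2 = (
    toy.sort_values("2022", ascending=False)
       .groupby("Continent")
       .head(2)
)
print(top2)
```

The resulting frame could then be filtered by continent inside the plotting loop, or passed to a faceted plot.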


**POPULATION GROWTH RATE**

Calculate population growth rate statistics 

In [None]:
# Growth rate statistics
print("Number of countries with positive growth rate:", (df["Growth Rate"] > 0).sum())
print("Number of countries with negative growth rate:", (df["Growth Rate"] < 0).sum())
print("Number of countries with stagnant growth rate:", (df["Growth Rate"] == 0).sum())
print(f"Growth Rate Statistics:\n{df['Growth Rate'].describe()}")
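
The three counts can also be obtained in a single pass by labelling each rate by its sign. The sketch below uses made-up percentages, and the label names are hypothetical.

```python
import numpy as np
import pandas as pd

# Toy growth rates in percent (hypothetical values)
rates = pd.Series([2.0, -5.0, 0.0, 1.5, -0.3], name="Growth Rate")

# np.sign gives 1.0, -1.0 or 0.0; map each sign to a readable label
labels = np.sign(rates).map({1.0: "positive", -1.0: "negative", 0.0: "stagnant"})
counts = labels.value_counts()
print(counts)
```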

**Plot Countries with Highest and Lowest Population Growth**

In [None]:
# Countries with highest and lowest population growth 

# Sort countries by growth rate
growth_sorted = df.sort_values("Growth Rate", ascending=False)

# Plot bar chart
fig, ax = plt.subplots(2,1, figsize=(10,10),constrained_layout = True)

# Countries with the highest growth rate
sns.barplot(x="Country", y="Growth Rate", data=growth_sorted.head(10), 
            palette="Blues_r", ax=ax[0])
ax[0].set_title("10 Countries with highest population growth")
ax[0].set_xlabel("Countries")
ax[0].set_ylabel("Growth Rate (%)\n")
ax[0].set_xticklabels(ax[0].get_xticklabels(), rotation=30)

# Countries with the lowest growth rate (reversed so the lowest comes first)
sns.barplot(x="Country", y="Growth Rate", data=growth_sorted.tail(10)[::-1], 
            palette="Blues_r", ax=ax[1])
ax[1].set_title("\n10 Countries with lowest population growth")
ax[1].set_xlabel("Countries")
ax[1].set_ylabel("Growth Rate (%)")
ax[1].set_xticklabels(ax[1].get_xticklabels(), rotation=30)
plt.show()

# SUMMARY

Between 1970 and 2022, the global population grew steadily. Asia remained the most populous continent, while Oceania had the smallest population, attributable to its small landmass and scattered islands. At the country level, China held the top position as the most populous nation, followed by India and the United States, a ranking that remained constant throughout the study period. Asia and Africa experienced significant population increases, likely influenced by higher birth rates. Notably, Moldova registered the highest population growth rate among countries, indicative of significant demographic shifts, while Ukraine reported the lowest, possibly due to the ongoing war with Russia.

