# Unified Mentor Project 3 - World Population Analysis
Hemamalya K P

Objective:\
 The objective of this project is to analyze historical world population data and predict
 future population trends. Understanding population dynamics is crucial for planning and
 policy-making in various sectors such as healthcare, education, and infrastructure.

In [1]:
# Importing necessary libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
import plotly.subplots as sp
import plotly.graph_objects as go
from plotly.subplots import make_subplots
import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)
from plotly.offline import download_plotlyjs, init_notebook_mode, plot, iplot
init_notebook_mode(connected=True)
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler


In [2]:
# Load the dataset (example: world_population.csv)
df = pd.read_csv('C:/Users/HemaMalya/Downloads/Unified Mentor/world_population.csv')

In [3]:
 # Display basic info about the dataset
print(df.info())


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 234 entries, 0 to 233
Data columns (total 17 columns):
 #   Column                       Non-Null Count  Dtype  
---  ------                       --------------  -----  
 0   Rank                         234 non-null    int64  
 1   CCA3                         234 non-null    object 
 2   Country/Territory            234 non-null    object 
 3   Capital                      234 non-null    object 
 4   Continent                    234 non-null    object 
 5   2022 Population              234 non-null    int64  
 6   2020 Population              234 non-null    int64  
 7   2015 Population              234 non-null    int64  
 8   2010 Population              234 non-null    int64  
 9   2000 Population              234 non-null    int64  
 10  1990 Population              234 non-null    int64  
 11  1980 Population              234 non-null    int64  
 12  1970 Population              234 non-null    int64  
 13  Area (km²)          

In [4]:
print(df.head())

   Rank CCA3 Country/Territory           Capital Continent  2022 Population  \
0    36  AFG       Afghanistan             Kabul      Asia         41128771   
1   138  ALB           Albania            Tirana    Europe          2842321   
2    34  DZA           Algeria           Algiers    Africa         44903225   
3   213  ASM    American Samoa         Pago Pago   Oceania            44273   
4   203  AND           Andorra  Andorra la Vella    Europe            79824   

   2020 Population  2015 Population  2010 Population  2000 Population  \
0         38972230         33753499         28189672         19542982   
1          2866849          2882481          2913399          3182021   
2         43451666         39543154         35856344         30774621   
3            46189            51368            54849            58230   
4            77700            71746            71519            66097   

   1990 Population  1980 Population  1970 Population  Area (km²)  \
0         10694796

In [5]:
df.columns

Index(['Rank', 'CCA3', 'Country/Territory', 'Capital', 'Continent',
       '2022 Population', '2020 Population', '2015 Population',
       '2010 Population', '2000 Population', '1990 Population',
       '1980 Population', '1970 Population', 'Area (km²)', 'Density (per km²)',
       'Growth Rate', 'World Population Percentage'],
      dtype='object')

In [6]:
df.shape

(234, 17)

In [7]:
missing_data = df.isnull().mean() * 100
missing_data

Rank                           0.0
CCA3                           0.0
Country/Territory              0.0
Capital                        0.0
Continent                      0.0
2022 Population                0.0
2020 Population                0.0
2015 Population                0.0
2010 Population                0.0
2000 Population                0.0
1990 Population                0.0
1980 Population                0.0
1970 Population                0.0
Area (km²)                     0.0
Density (per km²)              0.0
Growth Rate                    0.0
World Population Percentage    0.0
dtype: float64

In [8]:
# Summary of statistics for numerical columns
print(df.describe())

             Rank  2022 Population  2020 Population  2015 Population  \
count  234.000000     2.340000e+02     2.340000e+02     2.340000e+02   
mean   117.500000     3.407441e+07     3.350107e+07     3.172996e+07   
std     67.694165     1.367664e+08     1.355899e+08     1.304050e+08   
min      1.000000     5.100000e+02     5.200000e+02     5.640000e+02   
25%     59.250000     4.197385e+05     4.152845e+05     4.046760e+05   
50%    117.500000     5.559944e+06     5.493074e+06     5.307400e+06   
75%    175.750000     2.247650e+07     2.144798e+07     1.973085e+07   
max    234.000000     1.425887e+09     1.424930e+09     1.393715e+09   

       2010 Population  2000 Population  1990 Population  1980 Population  \
count     2.340000e+02     2.340000e+02     2.340000e+02     2.340000e+02   
mean      2.984524e+07     2.626947e+07     2.271022e+07     1.898462e+07   
std       1.242185e+08     1.116982e+08     9.783217e+07     8.178519e+07   
min       5.960000e+02     6.510000e+02    

In [9]:
print(f"amount of duplicates: {df.duplicated().sum()}")

amount of duplicates: 0


In [10]:
df.drop(['CCA3', 'Capital'], axis=1, inplace=True)

In [11]:
df.head()

Unnamed: 0,Rank,Country/Territory,Continent,2022 Population,2020 Population,2015 Population,2010 Population,2000 Population,1990 Population,1980 Population,1970 Population,Area (km²),Density (per km²),Growth Rate,World Population Percentage
0,36,Afghanistan,Asia,41128771,38972230,33753499,28189672,19542982,10694796,12486631,10752971,652230,63.0587,1.0257,0.52
1,138,Albania,Europe,2842321,2866849,2882481,2913399,3182021,3295066,2941651,2324731,28748,98.8702,0.9957,0.04
2,34,Algeria,Africa,44903225,43451666,39543154,35856344,30774621,25518074,18739378,13795915,2381741,18.8531,1.0164,0.56
3,213,American Samoa,Oceania,44273,46189,51368,54849,58230,47818,32886,27075,199,222.4774,0.9831,0.0
4,203,Andorra,Europe,79824,77700,71746,71519,66097,53569,35611,19860,468,170.5641,1.01,0.0


In [12]:
custom_palette = ['#0b3d91', '#e0f7fa', '#228b22', '#1e90ff', '#8B4513', '#D2691E',
'#DAA520', '#556B2F']


In [13]:
# Name the columns explicitly so the result is the same before and after pandas 2.0
countries_by_continent = df['Continent'].value_counts().rename_axis('index').reset_index(name='Continent')

In [14]:
print(countries_by_continent)

           index  Continent
0         Africa         57
1           Asia         50
2         Europe         50
3  North America         40
4        Oceania         23
5  South America         14


In [15]:
type(countries_by_continent)

pandas.core.frame.DataFrame

In [16]:
custom_palette = ['#636EFA', '#EF553B', '#00CC96', '#AB63FA', '#FFA15A']

In [17]:
# Create the bar chart
fig = px.bar(
    countries_by_continent,
    x='index',
    y='Continent',
    color='index',
    text='Continent',
    title='Number of Countries by Continent',
    color_discrete_sequence=custom_palette
)
 # Customize the layout
fig.update_layout(
xaxis_title='Continents',
yaxis_title='Number of Countries',
plot_bgcolor='rgba(0,0,0,0)', # Set the background color to transparent
font_family='Arial', # Set font family
title_font_size=20 # Set title font size
 )
# Show the plot
fig.show()

In [18]:
continent_population_percentage = df.groupby('Continent')['World Population Percentage'].sum().reset_index()

In [19]:
continent_population_percentage

Unnamed: 0,Continent,World Population Percentage
0,Africa,17.87
1,Asia,59.19
2,Europe,9.33
3,North America,7.51
4,Oceania,0.55
5,South America,5.48
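
A quick sanity check on the table above: the continent shares should add up to roughly 100%. The sketch below re-enters the six values by hand rather than reading `df`, and the 0.5-point tolerance is an assumption chosen to allow for per-country rounding.

```python
import pandas as pd

# Continent shares copied from the table above (percent of world population).
shares = pd.Series(
    {"Africa": 17.87, "Asia": 59.19, "Europe": 9.33,
     "North America": 7.51, "Oceania": 0.55, "South America": 5.48},
    name="World Population Percentage",
)

total = shares.sum()
# Assumed tolerance: 0.5 percentage points for accumulated rounding error.
shares_look_complete = abs(total - 100) < 0.5
print(f"Sum of continent shares: {total:.2f}% (complete: {shares_look_complete})")
print(f"Largest share: {shares.idxmax()}")
```

A total slightly below 100 is expected here, because each country's share is rounded to two decimals before summing.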


In [20]:
# Create the pie chart
fig = go.Figure(data=[go.Pie(
    labels=continent_population_percentage['Continent'],
    values=continent_population_percentage['World Population Percentage']
)])

In [21]:
# Update layout
fig.update_layout(
    title='World Population Percentage by Continent',
    template='plotly',
    paper_bgcolor='rgba(255,255,255,0)',  # Set the paper background color to transparent
    plot_bgcolor='rgba(255,255,255,0)'    # Set the plot background color to transparent
)

# Update pie chart colors
fig.update_traces(
    marker=dict(colors=custom_palette, line=dict(color='#FFFFFF', width=1))
)
# Show the plot
fig.show()

In [22]:
# Define custom color palette
custom_palette = ['#636EFA', '#EF553B', '#00CC96', '#AB63FA', '#FFA15A']

# Create the pie chart
fig = go.Figure(data=[
    go.Pie(
        labels=continent_population_percentage['Continent'],
        values=continent_population_percentage['World Population Percentage']
    )
])

# Update layout
fig.update_layout(
    title='World Population Percentage by Continent',
    template='plotly',
    paper_bgcolor='rgba(255,255,255,0)',  # Set the paper background color to transparent
    plot_bgcolor='rgba(255,255,255,0)'   # Set the plot background color to transparent
)

# Update pie colors
fig.update_traces(
    marker=dict(
        colors=custom_palette,
        line=dict(color='#FFFFFF', width=1)
    )
)

# Show the plot
fig.show()


In [23]:
df_melted = df.melt(
    id_vars=['Continent'],
    value_vars=[
        '2022 Population', '2020 Population', '2015 Population',
        '2010 Population', '2000 Population', '1990 Population',
        '1980 Population', '1970 Population'
    ],
    var_name='Year',
    value_name='Population'
)

In [24]:
# Convert 'Year' to a more suitable format
df_melted['Year'] = df_melted['Year'].str.split().str[0].astype(int)
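
To make the reshaping step concrete, here is a minimal sketch on two rows taken from the `df.head()` output (Afghanistan and Albania), keeping only two of the eight year columns for brevity. The names `sample` and `long_df` are illustrative and not part of the notebook.

```python
import pandas as pd

# Two rows copied from the df.head() output; only two year columns are kept.
sample = pd.DataFrame({
    "Continent": ["Asia", "Europe"],
    "2022 Population": [41128771, 2842321],
    "1970 Population": [10752971, 2324731],
})

# Wide -> long: one row per (continent, year column) pair.
long_df = sample.melt(id_vars=["Continent"], var_name="Year", value_name="Population")

# "2022 Population" -> ["2022", "Population"] -> "2022" -> 2022
long_df["Year"] = long_df["Year"].str.split().str[0].astype(int)
print(long_df)
```

Note that `melt` keeps the column order, so the 2022 rows come before the 1970 rows unless the result is sorted.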


In [25]:
# Aggregate population by continent and year
population_by_continent = df_melted.groupby(['Continent', 'Year'])['Population'].sum().reset_index()


In [26]:
print(population_by_continent)

        Continent  Year  Population
0          Africa  1970   365444348
1          Africa  1980   481536377
2          Africa  1990   638150629
3          Africa  2000   818946032
4          Africa  2010  1055228072
5          Africa  2015  1201102442
6          Africa  2020  1360671810
7          Africa  2022  1426730932
8            Asia  1970  2144906290
9            Asia  1980  2635334228
10           Asia  1990  3210563577
11           Asia  2000  3735089604
12           Asia  2010  4220041327
13           Asia  2015  4458250182
14           Asia  2020  4663086535
15           Asia  2022  4721383274
16         Europe  1970   655923991
17         Europe  1980   692527159
18         Europe  1990   720320797
19         Europe  2000   726093423
20         Europe  2010   735613934
21         Europe  2015   741535608
22         Europe  2020   745792196
23         Europe  2022   743147538
24  North America  1970   315434606
25  North America  1980   368293361
26  North America  1990   42

In [27]:

fig = px.line(
    population_by_continent,
    x='Year',
    y='Population',
    color='Continent',
    title='Population Trends by Continent Over Time',
    labels={'Population': 'Population', 'Year': 'Year'}
)

fig.update_layout(
    template='plotly_white',
    xaxis_title='Year',
    yaxis_title='Population',
    title_font_size=20
)

fig.show()
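
The line chart shows the shape of each trend; a compound annual growth rate (CAGR) puts a single number on it. The sketch below is one possible way to compute it, using the 1970 and 2022 totals printed above for three continents. The column names `growth_factor` and `cagr_pct` are illustrative.

```python
import pandas as pd

# 1970 and 2022 totals copied from the population_by_continent output above.
totals = pd.DataFrame(
    {"1970": [365444348, 2144906290, 655923991],
     "2022": [1426730932, 4721383274, 743147538]},
    index=["Africa", "Asia", "Europe"],
)

n_years = 2022 - 1970
totals["growth_factor"] = totals["2022"] / totals["1970"]
# Compound annual growth rate, in percent per year.
totals["cagr_pct"] = (totals["growth_factor"] ** (1 / n_years) - 1) * 100
print(totals[["growth_factor", "cagr_pct"]].round(2))
```

CAGR assumes a constant rate over the whole period, so it smooths over changes such as Europe's recent decline.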


# World Population Comparison: 1970 to 2020

In [28]:
# List of features for which to create the choropleth maps
features = ['1970 Population', '2020 Population']

# Loop through the features and create a choropleth for each
for feature in features:
    fig = px.choropleth(
        df,
        locations='Country/Territory',
        locationmode='country names',
        color=feature,
        hover_name='Country/Territory',
        template='plotly_white',
        title=feature
    )
    fig.show()

In [29]:
growth = (df.groupby(by='Country/Territory')['2022 Population'].sum() - 
          df.groupby(by='Country/Territory')['1970 Population'].sum()).sort_values(ascending=False).head(8)

In [30]:
fig = px.bar(
    growth,
    x=growth.index,
    y=growth.values,
    title='Top 8 Countries by Population Growth (1970 to 2022)',
    labels={'x': 'Country/Territory', 'y': 'Population Growth'},
    color=growth.values,
    color_continuous_scale='Viridis'
)

fig.update_layout(
    template='plotly_white',
    xaxis_title='Country/Territory',
    yaxis_title='Population Growth',
    title_font_size=20
)

fig.show()


In [31]:
fig = px.bar(
    x=growth.index,
    y=growth.values,
    text=growth.values,
    color=growth.values,
    color_continuous_scale='Viridis',  # Adding a color scale
    title='Growth Of Population From 1970 to 2022 (Top 8)',
    template='plotly_white'
)
# Customize layout
fig.update_layout(
    xaxis_title='Country/Territory',
    yaxis_title='Population Growth',
    title_font_size=20
)

# Show the plot
fig.show()

In [32]:
# Group by 'Country/Territory' and get the top 8 populated countries for 1970 and 2022
top_8_populated_countries_1970 = df.groupby('Country/Territory')['1970 Population'].sum().sort_values(ascending=False).head(8)
top_8_populated_countries_2022 = df.groupby('Country/Territory')['2022 Population'].sum().sort_values(ascending=False).head(8)


In [33]:
# Create a dictionary to store the data for both years
features = {
    'top_8_populated_countries_1970': top_8_populated_countries_1970,
    'top_8_populated_countries_2022': top_8_populated_countries_2022
}

# Loop through each feature and create a bar chart
for feature_name, feature_data in features.items():
    year = feature_name.split('_')[-1]  # Extract the year from the feature name
    
    # Create the bar chart for each year
    fig = px.bar(
        x=feature_data.index,
        y=feature_data.values,
        text=feature_data.values,
        color=feature_data.values,
        title=f'Top 8 Most Populated Countries ({year})',
        template='plotly_white'
    )

    # Update layout
    fig.update_layout(
        xaxis_title='Country/Territory',
        yaxis_title='Population',
        title_font_size=20
    )

    # Show the plot
    fig.show()


# World Population Growth Rates: The Fastest Growing Countries

In [34]:
sorted_df_growth = df.sort_values(by='Growth Rate', ascending=False)

top_fastest = sorted_df_growth.head(6)
top_slowest = sorted_df_growth.tail(6)

top_fastest


Unnamed: 0,Rank,Country/Territory,Continent,2022 Population,2020 Population,2015 Population,2010 Population,2000 Population,1990 Population,1980 Population,1970 Population,Area (km²),Density (per km²),Growth Rate,World Population Percentage
133,135,Moldova,Europe,3272996,3084847,3277388,3678186,4251573,4480199,4103240,3711140,33846,96.7026,1.0691,0.04
164,37,Poland,Europe,39857145,38428366,38553146,38597353,38504431,38064255,35521429,32482943,312679,127.4698,1.0404,0.5
148,54,Niger,Africa,26207977,24333639,20128124,16647543,11622665,8370647,6173177,4669708,1267000,20.6851,1.0378,0.33
202,60,Syria,Asia,22125249,20772595,19205178,22337563,16307654,12408996,8898954,6319199,185180,119.4797,1.0376,0.28
189,116,Slovakia,Europe,5643453,5456681,5424444,5396424,5376690,5261305,4973883,4522867,49037,115.0856,1.0359,0.07
55,15,DR Congo,Africa,99010212,92853164,78656904,66391257,48616317,35987541,26708686,20151733,2344858,42.2244,1.0325,1.24


In [35]:

def plot_population_trends(countries, df, custom_palette):
    # Calculate the number of rows needed
    n_cols = 2
    n_rows = (len(countries) + n_cols - 1) // n_cols  # Number of rows needed

    # Create subplots
    fig = sp.make_subplots(
        rows=n_rows, cols=n_cols, subplot_titles=countries, 
        horizontal_spacing=0.1, vertical_spacing=0.1
    )

    for i, country in enumerate(countries, start=1):
        # Filter data for the selected country
        country_df = df[df['Country/Territory'] == country]

        # Melt the DataFrame to have a long format
        country_melted = country_df.melt(
            id_vars=['Country/Territory'],
            value_vars=[
                '2022 Population', '2020 Population', '2015 Population',
                '2010 Population', '2000 Population', '1990 Population',
                '1980 Population', '1970 Population'
            ],
            var_name='Year',
            value_name='Population'
        )



In [36]:
# Function to plot population trends
def plot_population_trends(countries, df, custom_palette):
    # Calculate the number of rows needed
    n_cols = 2  # Fixed number of columns
    n_rows = (len(countries) + n_cols - 1) // n_cols  # Calculate number of rows required

    # Create subplots
    fig = sp.make_subplots(
        rows=n_rows, cols=n_cols, subplot_titles=countries, 
        horizontal_spacing=0.1, vertical_spacing=0.1
    )

    for i, country in enumerate(countries, start=1):
        # Filter data for the selected country
        country_df = df[df['Country/Territory'] == country]

        # Melt the DataFrame to have a long format
        country_melted = country_df.melt(
            id_vars=['Country/Territory'],
            value_vars=[
                '2022 Population', '2020 Population', '2015 Population',
                '2010 Population', '2000 Population', '1990 Population',
                '1980 Population', '1970 Population'
            ],
            var_name='Year',
            value_name='Population'
        )

        # Extract the leading year from labels like '2022 Population' and cast it to int
        country_melted['Year'] = country_melted['Year'].str.split().str[0].astype(int)

        # Create a line plot for each country
        line_fig = px.line(
            country_melted, x='Year', y='Population',
            color='Country/Territory',
            labels={'Population': 'Population', 'Year': 'Year'},
            color_discrete_sequence=custom_palette
        )

        # Update the line plot to fit the subplot
        row = (i - 1) // n_cols + 1  # Calculate the row index
        col = (i - 1) % n_cols + 1   # Calculate the column index

        for trace in line_fig.data:
            fig.add_trace(trace, row=row, col=col)

    # Update the layout of the subplots
    fig.update_layout(
        title='Population Trends of Selected Countries Over Time',
        template='plotly_white',
        font_family='Arial',
        title_font_size=20,
        showlegend=False,
        height=600 * n_rows,  # Adjust height based on the number of rows
        width=800  # Set a fixed width for consistency
    )

    # Update line properties for all traces
    fig.update_traces(line=dict(width=3))

    # Update axis labels
    fig.update_xaxes(title_text='Year')
    fig.update_yaxes(title_text='Population')
    # Show the plot (inside the function, so it displays the figure built above)
    fig.show()

In [37]:
# Assuming 'top_fastest' is a DataFrame with 'Country/Territory' and 'Growth Rate' columns
# Sort 'top_fastest' by 'Growth Rate' in descending order
fastest = top_fastest[['Country/Territory', 'Growth Rate']].sort_values(by='Growth Rate', ascending=False).reset_index(drop=True)
# Display the sorted DataFrame
fastest

Unnamed: 0,Country/Territory,Growth Rate
0,Moldova,1.0691
1,Poland,1.0404
2,Niger,1.0378
3,Syria,1.0376
4,Slovakia,1.0359
5,DR Congo,1.0325
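
The `Growth Rate` values are easier to read as percentages. The sketch below assumes the column is a ratio of populations, so that 1.0 means no change; the dataset output shown here does not state the definition, so treat that as an assumption. The values are copied from the table above.

```python
import pandas as pd

# Values copied from the `fastest` table above.
fastest_sample = pd.DataFrame({
    "Country/Territory": ["Moldova", "Poland", "Niger", "Syria", "Slovakia", "DR Congo"],
    "Growth Rate": [1.0691, 1.0404, 1.0378, 1.0376, 1.0359, 1.0325],
})

# Assumption: 'Growth Rate' is a ratio, so (ratio - 1) * 100 is the percent change.
fastest_sample["Growth (%)"] = ((fastest_sample["Growth Rate"] - 1) * 100).round(2)
print(fastest_sample)
```

Under the same assumption, a value below 1 (such as Ukraine's 0.912) would correspond to a shrinking population.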


In [38]:

def plot_population_trends(countries, df, custom_palette):
    n_cols = 2
    n_rows = (len(countries) + n_cols - 1) // n_cols

    fig = sp.make_subplots(rows=n_rows, cols=n_cols, subplot_titles=countries,
                           horizontal_spacing=0.1, vertical_spacing=0.1)

    for i, country in enumerate(countries, start=1):
        # Filter data for the selected country
        country_df = df[df['Country/Territory'] == country]

        # Melt the DataFrame to have a long format
        country_melted = country_df.melt(id_vars=['Country/Territory'],
                                         value_vars=['2022 Population', '2020 Population', '2015 Population',
                                                     '2010 Population', '2000 Population', '1990 Population',
                                                     '1980 Population', '1970 Population'],
                                         var_name='Year', value_name='Population')

        # Convert 'Year' to a more suitable format
        country_melted['Year'] = country_melted['Year'].str.split().str[0].astype(int)

        # Print the last melted row; value_vars are listed newest-first, so this is the 1970 value
        last_population = country_melted.iloc[-1]['Population']
        print(f"Last population for {country} in {country_melted.iloc[-1]['Year']}: {last_population}")

        # Create a line plot for each country
        line_fig = px.line(country_melted, x='Year', y='Population',
                           color='Country/Territory', labels={'Population': 'Population', 'Year': 'Year'},
                           color_discrete_sequence=custom_palette)

        # Update the line plot to fit the subplot
        row = (i - 1) // n_cols + 1
        col = (i - 1) % n_cols + 1
        for trace in line_fig.data:
            fig.add_trace(trace, row=row, col=col)

    # Update the layout of the subplots
    fig.update_layout(
        title='Population Trends of Selected Countries Over Time',
        template='plotly_white',
        font_family='Arial',
        title_font_size=20,
        showlegend=False,
        height=600 * n_rows,  # Adjust height for bigger plots
    )

    fig.update_traces(line=dict(width=3))
    fig.update_xaxes(title_text='Year')
    fig.update_yaxes(title_text='Population')

    # Show the plot (inside the function, so it displays the figure built above)
    fig.show()



In [39]:
# Custom color palette (you can change it as per your preference)
custom_palette = ['#1f77b4', '#ff7f0e', '#2ca02c', '#d62728', '#9467bd', '#8c564b']

# Assuming 'df' is already your DataFrame containing the population data
plot_population_trends(['Moldova', 'Poland', 'Niger', 'Syria', 'Slovakia', 'DR Congo'], df, custom_palette)


Last population for Moldova in 1970: 3711140
Last population for Poland in 1970: 32482943
Last population for Niger in 1970: 4669708
Last population for Syria in 1970: 6319199
Last population for Slovakia in 1970: 4522867
Last population for DR Congo in 1970: 20151733
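
The printout reports 1970 because the year columns are melted newest-first, so the last row is the oldest snapshot. If the most recent value is what is wanted, sorting by year first is one option. A minimal sketch using Moldova's 2022 and 1970 figures from the `top_fastest` table (the names `row`, `melted`, and `latest` are illustrative):

```python
import pandas as pd

# Moldova's endpoints, copied from the top_fastest table above.
row = pd.DataFrame({
    "Country/Territory": ["Moldova"],
    "2022 Population": [3272996],
    "1970 Population": [3711140],
})

melted = row.melt(id_vars=["Country/Territory"], var_name="Year", value_name="Population")
melted["Year"] = melted["Year"].str.split().str[0].astype(int)

# Without sorting, the last row is the oldest snapshot.
unsorted_last_year = int(melted.iloc[-1]["Year"])

# After sorting by year, the last row is the most recent snapshot.
latest = melted.sort_values("Year").iloc[-1]
print(f"Latest population for {latest['Country/Territory']} in {latest['Year']}: {latest['Population']}")
```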


# World Population Growth Rates: The Slowest Growing Countries

In [40]:
slowest = top_slowest[['Country/Territory', 'Growth Rate']].sort_values(by='Growth Rate', ascending=False).reset_index(drop=True)
slowest

Unnamed: 0,Country/Territory,Growth Rate
0,Latvia,0.9876
1,Lithuania,0.9869
2,Bulgaria,0.9849
3,American Samoa,0.9831
4,Lebanon,0.9816
5,Ukraine,0.912


In [41]:
custom_palette = ['#1f77b4', '#ff7f0e', '#2ca02c', '#d62728', '#9467bd', '#8c564b']

# Call the plot_population_trends function with the specified countries
plot_population_trends(
    ['Latvia', 'Lithuania', 'Bulgaria', 'American Samoa', 'Lebanon', 'Ukraine'],
    df,  # Ensure df contains the data you want to use
    custom_palette
)

Last population for Latvia in 1970: 2397414
Last population for Lithuania in 1970: 3210147
Last population for Bulgaria in 1970: 8582950
Last population for American Samoa in 1970: 27075
Last population for Lebanon in 1970: 2381791
Last population for Ukraine in 1970: 47279086


# Land Area by Country

In [42]:
# Total land area per country, sorted from largest to smallest
land_by_country = df.groupby('Country/Territory')['Area (km²)'].sum().sort_values(ascending=False)

# Get the top 5 countries with the most land area
most_land = land_by_country.head(5)

# Get the bottom 5 countries with the least land area
least_land = land_by_country.tail(5)

In [43]:
print("Top 5 countries with the most land area:")
print(most_land)

Top 5 countries with the most land area:
Country/Territory
Russia           17098242
Canada            9984670
China             9706961
United States     9372610
Brazil            8515767
Name: Area (km²), dtype: int64


In [44]:
print("\nBottom 5 countries by land area:")
print(least_land)



Bottom 5 countries by land area:
Country/Territory
Saint Barthelemy    21
Tokelau             12
Gibraltar            6
Monaco               2
Vatican City         1
Name: Area (km²), dtype: int64


In [45]:
# Create subplots
fig = sp.make_subplots(
    rows=1, cols=2, subplot_titles=("Countries with Most Land", "Countries with Least Land")
)

# Plot countries with the most land
fig.add_trace(
    go.Bar(x=most_land.index, y=most_land.values, name='Most Land', marker_color=custom_palette[0]),
    row=1, col=1
)

# Plot countries with the least land
fig.add_trace(
    go.Bar(x=least_land.index, y=least_land.values, name='Least Land', marker_color=custom_palette[1]),
    row=1, col=2
)

# Update layout
fig.update_layout(
    title_text="Geographical Distribution of Land Area by Country",
    showlegend=False,
    template='plotly_white'
)

# Update y-axes for both subplots
fig.update_yaxes(title_text="Area (km²)", row=1, col=1)
fig.update_yaxes(title_text="Area (km²)", row=1, col=2)

# Show the plot
fig.show()



# Land Area Per Person by Country

In [46]:
# Calculate the Area per Person for each country
df['Area per Person'] = df['Area (km²)'] / df['2022 Population']

# Group by 'Country/Territory' and sum the 'Area per Person' for each country
country_area_per_person = df.groupby('Country/Territory')['Area per Person'].sum()

# Get the top 5 countries with the most land available per person
most_land_available = country_area_per_person.sort_values(ascending=False).head(5)

# Get the bottom 5 countries with the least land available per person
least_land_available = country_area_per_person.sort_values(ascending=False).tail(5)





In [47]:
# Print the results
print("Top 5 countries with the most land available per person:")
print(most_land_available)

Top 5 countries with the most land available per person:
Country/Territory
Greenland           38.360890
Falkland Islands     3.220370
Western Sahara       0.461817
Mongolia             0.460254
Namibia              0.321625
Name: Area per Person, dtype: float64


In [48]:
print("\nBottom 5 countries by land available per person:")
print(least_land_available)


Bottom 5 countries by land available per person:
Country/Territory
Gibraltar    0.000184
Hong Kong    0.000147
Singapore    0.000119
Monaco       0.000055
Macau        0.000043
Name: Area per Person, dtype: float64
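
Values like 0.000043 km² are hard to picture. Converting to square metres per person, and taking the reciprocal to get people per km², may make the two printouts easier to compare. The sketch re-enters four of the printed values; because those are rounded to six decimals, the reciprocals are only approximate and will not exactly match the `Density (per km²)` column.

```python
import pandas as pd

# km² per person, copied from the two printouts above.
area_per_person_km2 = pd.Series({
    "Greenland": 38.360890,
    "Mongolia": 0.460254,
    "Singapore": 0.000119,
    "Macau": 0.000043,
})

summary = pd.DataFrame({
    "m2_per_person": area_per_person_km2 * 1_000_000,  # 1 km² = 1,000,000 m²
    "people_per_km2": 1 / area_per_person_km2,         # reciprocal of area per person
})
print(summary.round(1))
```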


In [49]:
# Create subplots
fig = sp.make_subplots(
    rows=1, cols=2, 
    subplot_titles=("Countries with Most Land Available Per Capita", 
                    "Countries with Least Land Available Per Capita")
)

# Plot countries with the most land available per person
fig.add_trace(
    go.Bar(
        x=most_land_available.index, 
        y=most_land_available.values,
        name='Most Land Available Per Capita', 
        marker_color=custom_palette[2]
    ), 
    row=1, col=1
)

# Plot countries with the least land available per person
fig.add_trace(
    go.Bar(
        x=least_land_available.index, 
        y=least_land_available.values,
        name='Least Land Available Per Capita', 
        marker_color=custom_palette[3]
    ), 
    row=1, col=2
)

# Update layout
fig.update_layout(
    title_text="Distribution of Available Land Area by Country Per Capita",
    showlegend=False,
    template='plotly_white'
)

# Update y-axes titles
fig.update_yaxes(title_text="Land Available Per Person (km²)", row=1, col=1)
fig.update_yaxes(title_text="Land Available Per Person (km²)", row=1, col=2)

# Show the plot
fig.show()


# Conclusion

1. The dataset supports analysis of population growth over five decades, using eight snapshots from 1970 to 2022.
2. Countries like India and China (not shown in the previews above, but likely in the data) dominate population totals; the largest 2022 population in the summary statistics is about 1.43 billion.
3. Population density varies widely among countries, reflecting differences in geography and urbanization. For example, small territories such as Andorra (about 171 people per km²) and American Samoa (about 222 per km²) are far denser than a large country like Algeria (about 19 per km²).
4. High growth rates might indicate rapid urbanization or higher birth rates (e.g., Afghanistan, with a Growth Rate value of 1.0257, which corresponds to roughly 2.6% growth if the column is read as a ratio of populations).
5. By comparison, many developed countries show slower or declining growth: Latvia, Lithuania, Bulgaria, and Ukraine all have Growth Rate values below 1. The pattern is not uniform, since Moldova, Poland, and Slovakia are among the six fastest-growing countries in this dataset.
6. Some countries contribute a significant percentage to the world population (e.g., likely China and India with over 35% combined).
7. Smaller countries, like Andorra or American Samoa, contribute negligible amounts (0.00% after rounding).
8. Continent data allows us to analyze population distribution across continents, highlighting trends like:\
    Asia: the largest share of world population (59.19%), growing from about 2.14 billion in 1970 to 4.72 billion in 2022.\
    Europe: slower growth, turning negative recently (745.8 million in 2020 to 743.1 million in 2022).\
    Africa: fast-growing, from about 365 million in 1970 to 1.43 billion in 2022, with an increasing share of world population.
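
To put a rough number on where the Africa trend could lead, the sketch below fits a constant growth rate to the eight continent totals printed earlier and extrapolates it. This is a naive illustration, not a forecast model and not part of the analysis above: it assumes growth continues at the fitted rate, and the 2030 horizon is an arbitrary choice.

```python
import numpy as np

# Africa's totals copied from the population_by_continent output.
years = np.array([1970, 1980, 1990, 2000, 2010, 2015, 2020, 2022])
africa = np.array([365444348, 481536377, 638150629, 818946032,
                   1055228072, 1201102442, 1360671810, 1426730932])

# Fit log(population) = intercept + slope * (year - 1970), i.e. constant-rate growth.
slope, intercept = np.polyfit(years - 1970, np.log(africa), 1)

target_year = 2030  # hypothetical horizon chosen for illustration
projected_2030 = float(np.exp(intercept + slope * (target_year - 1970)))

print(f"Fitted growth: {slope * 100:.2f}% per year (log scale)")
print(f"Naive {target_year} projection for Africa: {projected_2030:,.0f}")
```

With only eight data points and no uncertainty estimate, the result is best read as an order-of-magnitude guide.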
    
