# Population Data

* id: Serial number or rank of the country in the dataset.
* Country (or dependency): The name of the country or territory.
* Population 2025: Estimated total population of the country in the year 2025.
* Yearly Change: Percent change in the population compared to the previous year.
* Net Change: Absolute number of people added (or lost) in a year.
* Density (P/Km²): Population density, i.e. the number of people per square kilometer of land area.
* Land Area (Km²): Total land area of the country.
* Migrants (net): Net migration, i.e. the difference between the number of people moving into and out of the country.
* Fert. Rate: Total fertility rate, i.e. the average number of children a woman is expected to have during her lifetime.
* Median Age: The age at which half the population is older and half is younger.
* Urban Pop %: Percentage of the population living in cities and other urban areas.
* World Share: The country's percentage share of the total world population.
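
As a quick illustration of how the density column relates to the population and land area columns, here is a minimal sketch. The two rows are made-up toy values, not figures from the dataset, and `toy` is a hypothetical stand-in for the real DataFrame.

```python
import pandas as pd

# Toy rows with made-up numbers, for illustration only (not real figures).
toy = pd.DataFrame({
    "Country": ["A", "B"],
    "Population 2025": [1_000_000, 250_000],
    "Land Area (Km²)": [50_000, 1_000],
})

# Density is population divided by land area, in people per square kilometer.
toy["Density (P/Km²)"] = toy["Population 2025"] / toy["Land Area (Km²)"]
print(toy)
```

On the real data, the same division could serve as a sanity check against the published density column, allowing for rounding.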


# Cleaning Process

- Import the library

In [None]:
import pandas as pd

- Import the dataset

In [None]:
df = pd.read_csv('population_data.csv')

In [None]:
df

- Check the number of rows and columns

In [None]:
df.shape

## Checking Data Types

Columns whose dtype needs to change:
- Yearly Change (object --> float64)
- Net Change (object --> int64)
- Migrants (net) (object --> int64)
- Urban Pop % (object --> float64)
- World Share (object --> float64)

Columns to rename:

- Country (or dependency) --> Country
- Yearly Change --> Yearly Change (%)
- Urban Pop % --> Urban Pop (%)
- World Share --> World Share (%)

In [None]:
df.info()

- Check for duplicated rows

In [None]:
df.duplicated().sum()

- Check for missing values

In [None]:
df.isnull().sum()
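
Note that `isnull()` only counts real missing values (`None`/`NaN`). If a column were to store placeholders as text, they would need a separate check. The sketch below uses a toy column; the placeholder strings (`"N.A."`, `"-"`, `""`) are assumptions for illustration and are not confirmed to occur in this dataset.

```python
import pandas as pd

# Toy column; "N.A." is a hypothetical placeholder, not confirmed to occur in the dataset.
toy = pd.DataFrame({"Urban Pop %": ["35%", "N.A.", None, "100%"]})

# isnull() only flags real missing values (None/NaN) ...
true_missing = toy["Urban Pop %"].isnull().sum()

# ... so placeholder strings need an explicit membership check.
placeholders = toy["Urban Pop %"].isin(["N.A.", "-", ""]).sum()

print(true_missing, placeholders)
```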

## 1. Fixing Column Names

In [None]:
df.columns

In [None]:
df.rename(columns={'Country (or dependency)':'Country'}, inplace=True)

In [None]:
df.rename(columns={'Yearly Change':'Yearly Change (%)'}, inplace=True)

In [None]:
df.rename(columns={'Urban Pop %':'Urban Pop (%)'}, inplace=True)

In [None]:
df.rename(columns={'World Share':'World Share (%)'}, inplace=True)

In [None]:
df.head(3)
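
The four renames in this section could also be done in a single call by passing one dictionary to `rename`. A minimal sketch on an empty toy frame (`toy` and `renames` are illustrative names, not part of the notebook):

```python
import pandas as pd

# Empty toy frame that only carries the four original column names.
toy = pd.DataFrame(columns=[
    "Country (or dependency)", "Yearly Change", "Urban Pop %", "World Share",
])

# One mapping from old names to new names.
renames = {
    "Country (or dependency)": "Country",
    "Yearly Change": "Yearly Change (%)",
    "Urban Pop %": "Urban Pop (%)",
    "World Share": "World Share (%)",
}

# rename() returns a new frame; keys that match no column are ignored by default.
toy = toy.rename(columns=renames)
print(list(toy.columns))
```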

## 2. Fixing Formatting Errors and Correcting Data Types

- Yearly Change (%)

In [None]:
df['Yearly Change (%)']

In [None]:
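# Remove the percent sign so the values can be parsed as numbers
df['Yearly Change (%)'] = df['Yearly Change (%)'].str.replace('%', '', regex=False)

In [None]:
# Replace the Unicode minus sign (U+2212) with an ASCII hyphen, then convert to float
df['Yearly Change (%)'] = df['Yearly Change (%)'].str.replace('−', '-', regex=False).astype(float)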
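
The clean-up steps used throughout this section (stripping `%`, removing thousands separators, and replacing the Unicode minus sign U+2212) could be wrapped in one helper. This is a sketch only: `clean_numeric` is a hypothetical function, it assumes the column holds strings, and the sample values are made up. Unlike `astype`, `pd.to_numeric(..., errors="coerce")` turns unparseable entries into `NaN` rather than raising.

```python
import pandas as pd

def clean_numeric(series: pd.Series) -> pd.Series:
    """Strip '%' and ',' and replace the Unicode minus (U+2212) with ASCII '-'."""
    cleaned = (
        series.str.replace("%", "", regex=False)
        .str.replace(",", "", regex=False)
        .str.replace("\u2212", "-", regex=False)
        .str.strip()
    )
    # Entries that still cannot be parsed become NaN instead of raising an error.
    return pd.to_numeric(cleaned, errors="coerce")

# Made-up sample values covering each formatting problem.
sample = pd.Series(["0.89 %", "\u22120.41 %", "1,234", "\u22125,678"])
result = clean_numeric(sample)
print(result)
```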

- Net Change

In [None]:
df['Net Change']

In [None]:
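# Remove thousands separators
df['Net Change'] = df['Net Change'].str.replace(',', '', regex=False)

In [None]:
# Replace the Unicode minus sign (U+2212) with an ASCII hyphen, then convert to int
df['Net Change'] = df['Net Change'].str.replace('−', '-', regex=False).astype(int)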

- Migrants (net)

In [None]:
df['Migrants (net)']

In [None]:
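# Remove thousands separators
df['Migrants (net)'] = df['Migrants (net)'].str.replace(',', '', regex=False)

In [None]:
# Replace the Unicode minus sign (U+2212) with an ASCII hyphen, then convert to int
df['Migrants (net)'] = df['Migrants (net)'].str.replace('−', '-', regex=False).astype(int)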

- Urban Pop (%)

In [None]:
df['Urban Pop (%)']

In [None]:
df['Urban Pop (%)'] = df['Urban Pop (%)'].str.replace('%','').astype(float)

- World Share (%)

In [None]:
df['World Share (%)'].tail(3)

In [None]:
df['World Share (%)'] = df['World Share (%)'].str.replace('%','').astype(float)

In [None]:
df['World Share (%)'] = df['World Share (%)'].round(2)

In [None]:
df.info()
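
Instead of reading the `df.info()` output by eye, the target dtypes listed earlier could be checked programmatically. A minimal sketch: `expected` mirrors the planned conversions, `dtype_mismatches` is a hypothetical helper, and the toy frame uses made-up values with one column deliberately left as text.

```python
import pandas as pd

# Target dtypes, taken from the conversion plan in this notebook.
expected = {
    "Yearly Change (%)": "float64",
    "Net Change": "int64",
    "Migrants (net)": "int64",
    "Urban Pop (%)": "float64",
    "World Share (%)": "float64",
}

def dtype_mismatches(frame: pd.DataFrame, expected: dict) -> dict:
    """Return {column: actual dtype} for columns whose dtype differs from the target."""
    return {
        col: str(frame[col].dtype)
        for col, want in expected.items()
        if col in frame.columns and str(frame[col].dtype) != want
    }

# Toy frame with made-up values; "Net Change" is still stored as text.
toy = pd.DataFrame({
    "Yearly Change (%)": [0.5, -0.1],
    "Net Change": ["1", "-2"],
})
mismatches = dtype_mismatches(toy, expected)
print(mismatches)
```

An empty dictionary would mean every listed column present in the frame already has its target dtype.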

## 3. Filling In Missing Data

In [None]:
df.isnull().sum()

In [None]:
df[df['Urban Pop (%)'].isnull()]

In [None]:
df.loc[df['Country'] == 'Venezuela', 'Urban Pop (%)'] = 89.8

In [None]:
df.loc[df['Country'] == 'Hong Kong', 'Urban Pop (%)'] = 100

In [None]:
df.loc[df['Country'] == 'Singapore', 'Urban Pop (%)'] = 100

In [None]:
df.loc[df['Country'] == 'Uruguay', 'Urban Pop (%)'] = 95.9

In [None]:
df.loc[df['Country'] == 'Puerto Rico', 'Urban Pop (%)'] = 97.7

In [None]:
df.loc[df['Country'] == 'Bahrain', 'Urban Pop (%)'] = 89.5

In [None]:
df.loc[df['Country'] == 'Réunion', 'Urban Pop (%)'] = 99.8

In [None]:
df.loc[df['Country'] == 'Guadeloupe', 'Urban Pop (%)'] = 100

In [None]:
df.loc[df['Country'] == 'Martinique', 'Urban Pop (%)'] = 89.5

In [None]:
df.loc[df['Country'] == 'U.S. Virgin Islands', 'Urban Pop (%)'] = 97

In [None]:
df.loc[df['Country'] == 'American Samoa', 'Urban Pop (%)'] = 87.3

In [None]:
df.loc[df['Country'] == 'Northern Mariana Islands', 'Urban Pop (%)'] = 92.2

In [None]:
df.loc[df['Country'] == 'Monaco', 'Urban Pop (%)'] = 100

In [None]:
df.loc[df['Country'] == 'Marshall Islands', 'Urban Pop (%)'] = 100

In [None]:
df.loc[df['Country'] == 'San Marino', 'Urban Pop (%)'] = 97.9

In [None]:
df.loc[df['Country'] == 'Palau', 'Urban Pop (%)'] = 84

In [None]:
df.loc[df['Country'] == 'Anguilla', 'Urban Pop (%)'] = 100

In [None]:
df.loc[df['Country'] == 'Cook Islands', 'Urban Pop (%)'] = 100

In [None]:
df.loc[df['Country'] == 'Saint Barthelemy', 'Urban Pop (%)'] = 100

In [None]:
df.loc[df['Country'] == 'Wallis & Futuna', 'Urban Pop (%)'] = 0

In [None]:
df.loc[df['Country'] == 'Saint Pierre & Miquelon', 'Urban Pop (%)'] = 100

In [None]:
df.loc[df['Country'] == 'Tokelau', 'Urban Pop (%)'] = 0

In [None]:
df.loc[df['Country'] == 'Holy See', 'Urban Pop (%)'] = 100
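
The one-assignment-per-country pattern above could be condensed into a single lookup dictionary. The sketch below lists only a few of the values assigned above, and the remaining countries would be added the same way. `urban_pop_fill` and `toy` are illustrative names, and "Exampleland" is a made-up row. One difference from the cells above: this version fills only rows that are actually missing, so existing values are never overwritten.

```python
import pandas as pd

# A few of the values assigned above; the other countries would be added the same way.
urban_pop_fill = {
    "Venezuela": 89.8,
    "Hong Kong": 100,
    "Singapore": 100,
    "Uruguay": 95.9,
    "Tokelau": 0,
}

# Toy frame standing in for df; "Exampleland" and its value are made up.
toy = pd.DataFrame({
    "Country": ["Venezuela", "Exampleland", "Tokelau"],
    "Urban Pop (%)": [float("nan"), 50.0, float("nan")],
})

# Map each country to its fill value (NaN if not in the dictionary),
# then use it only where the existing value is missing.
toy["Urban Pop (%)"] = toy["Urban Pop (%)"].fillna(toy["Country"].map(urban_pop_fill))
print(toy)
```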

## 4. Save and Export

In [None]:
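# Note: this writes to the same filename that was read at the start, so the raw file is overwritten
df.to_csv('population_data.csv', index=False)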

In [None]:
from google.colab import files
files.download('population_data.csv')
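
Because the export reuses the input filename, one option is to write the cleaned data under a separate name and to trigger the browser download only when the notebook is actually running in Google Colab. This is a sketch under assumptions: `export_clean` and the filename `population_data_clean.csv` are hypothetical and not part of the original notebook.

```python
import pandas as pd

def export_clean(frame: pd.DataFrame, path: str = "population_data_clean.csv") -> str:
    """Write the frame to CSV and, when running in Colab, offer it for download."""
    frame.to_csv(path, index=False)
    try:
        # google.colab is only importable inside a Colab runtime.
        from google.colab import files
    except ImportError:
        # Outside Colab the file is simply left on disk.
        return path
    files.download(path)
    return path
```

Calling `export_clean(df)` at the end of the notebook would keep the raw `population_data.csv` untouched.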