1. datasets chosen : https://ourworldindata.org/urbanization
2. data preprocessing : 
3. why I chose the data : 
    - the data collected spans from 1960 – 2021
    - part of the World Bank's World Development Indicators
    - high quality, accurate, clearly defined, and well-tested data
    - provides breakdown of data for each region, even country
4. the problem : Classify countries based on urbanization numbers and analyze each country's characteristics.
5. Classification of countries into high and low levels of urbanization
6. explain results with graphics, solution : Point out which countries have the highest and lowest urbanization
7. remarks and suggestion
8. write article

# 1. Datasets Chosen 
- Number of people living in urban areas

https://ourworldindata.org/urbanization

# 2. Data Preprocessing
- import dataset
- filter for year
- make another dataset for counting migration over a 5-year period
- remove unnecessary columns

In [7]:
import pandas as pd

# import all data
urban = pd.read_csv('./urban-and-rural-population.csv')
urban.head()

Unnamed: 0,Entity,Code,Year,Urban population,Rural population
0,Afghanistan,AFG,1960,724373,7898093.0
1,Afghanistan,AFG,1961,763336,8026804.0
2,Afghanistan,AFG,1962,805062,8163985.0
3,Afghanistan,AFG,1963,849446,8308019.0
4,Afghanistan,AFG,1964,896820,8458694.0


In [8]:
# exclude world, regions, and income groups
aggregates = [
    'Low-income countries',
    'Lower-middle-income countries',
    'Middle-income countries',
    'Upper-middle-income countries',
    'High-income countries',
    'Middle East and North Africa (WB)',
    'North America (WB)',
    'South Asia (WB)',
    'Sub-Saharan Africa (WB)',
    'Latin America and Caribbean (WB)',
    'Europe and Central Asia (WB)',
    'European Union (27)',
    'East Asia and Pacific (WB)',
    'World',
]
urban = urban[~urban['Entity'].isin(aggregates)].reset_index(drop=True)

urban.head()

Unnamed: 0,Entity,Code,Year,Urban population,Rural population
0,Afghanistan,AFG,1960,724373,7898093.0
1,Afghanistan,AFG,1961,763336,8026804.0
2,Afghanistan,AFG,1962,805062,8163985.0
3,Afghanistan,AFG,1963,849446,8308019.0
4,Afghanistan,AFG,1964,896820,8458694.0


In [9]:
# filter for year: keep only the 2021 rows
urban2 = urban[urban['Year'] == 2021].reset_index(drop=True)

urban2.head()

Unnamed: 0,Entity,Code,Year,Urban population,Rural population
0,Afghanistan,AFG,2021,10551772,29547690.0
1,Albania,ALB,2021,1770478,1041188.0
2,Algeria,DZA,2021,32807002,11370967.0
3,American Samoa,ASM,2021,39257,5778.0
4,Andorra,AND,2021,69438,9596.0


In [10]:
# filter for the most recent 5-year period (2017 to 2021, both inclusive)
urban_5yr = urban[urban['Year'].between(2017, 2021)].reset_index(drop=True)

urban_5yr.head()

Unnamed: 0,Entity,Code,Year,Urban population,Rural population
0,Afghanistan,AFG,2017,8999963,26643456.0
1,Afghanistan,AFG,2018,9353296,27333488.0
2,Afghanistan,AFG,2019,9727157,28042342.0
3,Afghanistan,AFG,2020,10142913,28829316.0
4,Afghanistan,AFG,2021,10551772,29547690.0
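
One way the 5-year subset could be used is to measure how much each country's urban share changed between 2017 and 2021. The following is only a sketch under stated assumptions: it uses the column names shown in the preview above, the two Afghanistan rows are copied from that preview, and `Exampleland` is a made-up entity with made-up numbers added purely for illustration.

```python
import pandas as pd

# Two real rows from the preview above plus one hypothetical entity ("Exampleland").
rows = pd.DataFrame({
    "Entity": ["Afghanistan", "Afghanistan", "Exampleland", "Exampleland"],
    "Year": [2017, 2021, 2017, 2021],
    "Urban population": [8999963, 10551772, 600, 700],
    "Rural population": [26643456.0, 29547690.0, 400.0, 300.0],
})

# Urban share = urban population / total population.
total = rows["Urban population"] + rows["Rural population"]
rows["Urban share"] = rows["Urban population"] / total

# Reshape to one row per country and one column per year.
share = rows.pivot(index="Entity", columns="Year", values="Urban share")

# Change over the period, expressed in percentage points.
change = (share[2021] - share[2017]) * 100
print(change.round(2))
```

In the notebook itself, `rows` would be replaced by `urban_5yr`, provided the `Year` column is still present at that point.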


In [11]:
# drop columns that are no longer needed
# (Year is kept in urban_5yr so that rows from different years stay distinguishable)
urban2 = urban2.drop(columns=['Code', 'Year'])
urban_5yr = urban_5yr.drop(columns=['Code'])

# rename columns
new_names = {'Entity': 'Country', 'Urban population': 'Urban', 'Rural population': 'Rural'}
urban2 = urban2.rename(columns=new_names)
urban_5yr = urban_5yr.rename(columns=new_names)

urban2.head()

Unnamed: 0,Country,Urban,Rural
0,Afghanistan,10551772,29547690.0
1,Albania,1770478,1041188.0
2,Algeria,32807002,11370967.0
3,American Samoa,39257,5778.0
4,Andorra,69438,9596.0


# 3. Reason for choosing the data
- the data collected spans from 1960 – 2021
- part of the World Bank's World Development Indicators
- high quality, accurate, clearly defined, and well-tested data
- provides breakdown of data for each region, even country

# 4. The Problem

Classify countries based on urbanization numbers and analyze each country's characteristics.

# 5. Classify Countries by Urbanization

### 2021 period
We will analyze the level of urbanization of every country in the world in 2021. We will use the k-means algorithm to cluster the data and plotly.express to plot it.

In [12]:
import pandas as pd
import numpy as np
import seaborn as sns

import matplotlib.pyplot as plt
%matplotlib inline

from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
from sklearn.manifold import TSNE
from sklearn.decomposition import PCA
from sklearn.metrics import euclidean_distances
from scipy.spatial.distance import cdist

import plotly.express as px 

# scatter plot of urban vs. rural population, one point per country
fig = px.scatter(urban2, x='Urban', y='Rural', color='Urban',
                 hover_data=['Country', 'Urban', 'Rural'])
fig.show()
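
The clustering step announced above is not written yet. A minimal sketch of how it might look is given below, with several assumptions that are not in the notebook: the clustering runs on the urban share (urban / total population) rather than on raw head counts, so that large and small countries are comparable; two clusters are used, matching the "high" and "low" split in the outline; and `sample` holds only the five 2021 rows shown in the preview instead of the full `urban2` frame.

```python
import pandas as pd
from sklearn.cluster import KMeans

# The five 2021 rows from the preview above; the notebook would use urban2 here.
sample = pd.DataFrame({
    "Country": ["Afghanistan", "Albania", "Algeria", "American Samoa", "Andorra"],
    "Urban": [10551772, 1770478, 32807002, 39257, 69438],
    "Rural": [29547690.0, 1041188.0, 11370967.0, 5778.0, 9596.0],
})

# Assumption: cluster on the urban share, not on raw population counts.
sample["UrbanShare"] = sample["Urban"] / (sample["Urban"] + sample["Rural"])

# Assumption: two clusters ("high" and "low"); random_state fixes the seeding.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
sample["Cluster"] = kmeans.fit_predict(sample[["UrbanShare"]])

# Cluster ids are arbitrary, so name them by comparing the centroids.
high_cluster = kmeans.cluster_centers_[:, 0].argmax()
sample["Level"] = sample["Cluster"].map(
    lambda c: "high" if c == high_cluster else "low"
)

print(sample[["Country", "UrbanShare", "Level"]])
```

The resulting `Level` column could then be passed as the `color` argument of `px.scatter` to show the clusters in the plot. The number of clusters is a choice, not a given; an elbow plot or silhouette score on the full data would be a reasonable way to check it.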


The 

# 6. Explain the Results using graphs



In [13]:
# code here
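
The outline asks for the countries with the highest and lowest urbanization. A possible starting point is sketched below, assuming urbanization is measured as the urban share of total population and again using only the five preview rows in place of the full `urban2` frame.

```python
import pandas as pd

# The five 2021 rows from the preview; the notebook would use urban2 here.
sample = pd.DataFrame({
    "Country": ["Afghanistan", "Albania", "Algeria", "American Samoa", "Andorra"],
    "Urban": [10551772, 1770478, 32807002, 39257, 69438],
    "Rural": [29547690.0, 1041188.0, 11370967.0, 5778.0, 9596.0],
})

# Urban share per country, indexed by country name and sorted ascending.
sample["UrbanShare"] = sample["Urban"] / (sample["Urban"] + sample["Rural"])
ranked = sample.set_index("Country")["UrbanShare"].sort_values()

# idxmax / idxmin return the index label (the country name) of the extremes.
highest = ranked.idxmax()
lowest = ranked.idxmin()

print(f"Highest urban share: {highest} ({ranked[highest]:.1%})")
print(f"Lowest urban share:  {lowest} ({ranked[lowest]:.1%})")
```

For the graph itself, `px.bar(ranked.reset_index(), x='Country', y='UrbanShare')` would give a sorted bar chart of the same values.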

# 7. Remarks and Suggestions



# 8. Article


### Nuclear Power is The Future

Each year, energy consumption rises.

the problem:
However, that consumption is not sustainable: non-renewable energy sources are limited and cause more harm than good, while renewable energy is expensive and not yet reliable enough to replace them.

solution:
We should use nuclear energy because it is a clean, reliable energy source that can replace non-renewable sources, at least until renewable sources become cheaper and more reliable.

structure :
- introduction
- non-renewables
    - what are non-renewables
    - status quo of non-renewables
    - the benefits of non-renewables (cheap, reliable)
    - the disadvantages of non-renewables (limited , pollution, global warming, accidents)
- renewables
    - what are renewables
    - status quo of renewables
    - the benefits of renewables (clean, infinite)
    - the disadvantages of renewables (expensive, unreliable)
- The perfect solution : nuclear energy
    - what is nuclear energy
    - status quo nuclear energy
    - the benefits of nuclear energy (clean, reliable, extremely efficient)
    - the disadvantages of nuclear energy (limited, byproduct waste, accidents)
- conclusion

# Nuclear Power is The Future

## Introduction
As modern civilization continues to grow and evolve, the demand for energy is increasing at an unprecedented rate. However, no single current method of power generation, whether fossil fuels, hydroelectric power, or other renewable sources, can meet the world's energy needs on its own. Each of these methods also has its own limitations and drawbacks, such as air pollution, climate change, and the need for large land areas.

To address this issue, nuclear power has emerged as a viable alternative to traditional energy sources. Despite its controversial reputation, nuclear power offers many benefits such as high energy output, low greenhouse gas emissions, and a relatively small environmental footprint. Furthermore, advances in nuclear technology have made it increasingly safe and efficient, with modern nuclear reactors designed with multiple layers of safety features to prevent accidents and mitigate the consequences if they occur.

Therefore, as the demand for energy continues to increase, and the need for a sustainable and reliable power source becomes more critical, nuclear power has become an essential consideration for meeting our energy needs in a safe, efficient, and sustainable way.

## Non-Renewables

### What Are Non-Renewables
Non-renewable energy sources are energy resources that are finite and can be depleted over time. They include fossil fuels such as coal, oil, and natural gas, as well as nuclear fuels like uranium. Yes, uranium, the fuel of nuclear power, is non-renewable too; we'll talk more about that later.

### Status Quo of Non-Renewables
Right now, non-renewables are viewed as cheap but dirty energy. Pollution from burning fossil fuels is a leading contributor to global warming, smog, and premature deaths. Despite these disadvantages, non-renewable sources such as fossil fuels and nuclear power remain the world's primary sources of energy. ACCORDING TO ........

### Benefits of Non-Renewables
Non-renewable energy sources like fossil fuels and nuclear power have been the primary sources of energy for human civilization for many decades. While they are not sustainable, they still offer a range of benefits, including:

- High Energy Density: Non-renewable energy sources have a high energy density, meaning they contain a significant amount of energy per unit of volume or mass, which makes them a reliable and efficient source of energy.

- Cost-Effective: Fossil fuels, in particular, have been the cheapest source of energy for many years, making them an affordable option for powering homes, industries, and transportation.

- Reliability: Non-renewable energy sources offer a reliable source of energy that can be used to meet the energy demands of a growing population, ensuring a stable supply of electricity and energy.

- Established Infrastructure: The infrastructure for non-renewable energy sources is well-established and readily available, making it easier to supply energy to remote locations or areas with limited infrastructure.

### Disadvantages of Non-Renewables

Non-renewable energy sources have a range of significant disadvantages, including:

- Environmental Impact: Non-renewable energy sources, particularly fossil fuels, have a significant impact on the environment, contributing to air pollution, water pollution, and climate change, which has far-reaching and long-term consequences for the planet and all living beings.

- Finite Resources: Non-renewable energy sources are finite and can be depleted over time, which means that they cannot meet the growing energy demands of a growing population indefinitely.

- Price Volatility: The prices of non-renewable energy sources can fluctuate sharply, which makes it challenging to plan and budget for energy costs.

- Health and Safety Risks: Extracting non-renewable resources, for example through coal mining and oil drilling, poses significant health and safety risks to workers and communities, including accidents, explosions, and exposure to toxic substances. Fossil fuels, coal in particular, have the highest death rates per terawatt-hour of energy produced.

- Geopolitical Tensions: Non-renewable energy sources can contribute to geopolitical tensions and conflicts, particularly in regions where they are abundant or located near strategic locations.

## Renewables

### What are Renewables

### Status Quo of Renewables

### Benefits of Renewables

### Disadvantages of Renewables

## Nuclear Power

### What is Nuclear Power

### Status Quo of Nuclear Power

### Benefits of Nuclear Power

### Disadvantages of Nuclear Power

## Conclusion

### Sources

https://ourworldindata.org/nuclear-energy
https://energydata.info/dataset?q=nuclear