# Jupyter
Build a data analytics project in a Jupyter notebook. In particular, do the following:
- Load the Gapminder dataset using the code cell below.
- Extract and visualize interesting insights about countries using [Pandas](https://pandas.pydata.org/) and [Plotly](https://plotly.com/python/).
- Structure, document, and decorate your notebook using [Markdown](https://www.markdownguide.org/basic-syntax/).

# Data Analysis of Gapminder Dataset

## Introduction
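In this notebook, we analyze the Gapminder dataset, which records life expectancy, population, and GDP per capita by country and year. We look at how these indicators differ across continents and how they change over time.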

## Loading the Dataset
We start by loading the Gapminder dataset using the code cell below:

In [31]:
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go

df = px.data.gapminder()  # Loading the Gapminder dataset
df.head()  # Displaying the first few rows of the dataset

|   | country | continent | year | lifeExp | pop | gdpPercap | iso_alpha | iso_num |
|---|---|---|---|---|---|---|---|---|
| 0 | Afghanistan | Asia | 1952 | 28.801 | 8425333 | 779.445314 | AFG | 4 |
| 1 | Afghanistan | Asia | 1957 | 30.332 | 9240934 | 820.85303 | AFG | 4 |
| 2 | Afghanistan | Asia | 1962 | 31.997 | 10267083 | 853.10071 | AFG | 4 |
| 3 | Afghanistan | Asia | 1967 | 34.02 | 11537966 | 836.197138 | AFG | 4 |
| 4 | Afghanistan | Asia | 1972 | 36.088 | 13079460 | 739.981106 | AFG | 4 |
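
Before exploring the data, it can help to confirm its basic structure. The sketch below uses a hypothetical helper, `check_gapminder` (not part of Pandas or Plotly), that reports the row count, the number of missing values, and whether any country-year pair is repeated. The five rows displayed above are reproduced as stand-in data so the block runs on its own; in the notebook, `df` would be passed instead.

```python
import pandas as pd

# Stand-in data: the five rows displayed by df.head() above.
sample = pd.DataFrame({
    'country': ['Afghanistan'] * 5,
    'continent': ['Asia'] * 5,
    'year': [1952, 1957, 1962, 1967, 1972],
    'lifeExp': [28.801, 30.332, 31.997, 34.02, 36.088],
    'pop': [8425333, 9240934, 10267083, 11537966, 13079460],
    'gdpPercap': [779.445314, 820.85303, 853.10071, 836.197138, 739.981106],
})


def check_gapminder(frame: pd.DataFrame) -> dict:
    """Hypothetical helper: summarize basic structural properties of the data."""
    return {
        'rows': len(frame),
        'missing_values': int(frame.isna().sum().sum()),
        # True when no (country, year) pair is repeated
        'unique_country_year': not frame.duplicated(['country', 'year']).any(),
    }


print(check_gapminder(sample))
# {'rows': 5, 'missing_values': 0, 'unique_country_year': True}
```

If `unique_country_year` is `True` for the full dataset, grouping by country and year and taking the mean returns the original values unchanged.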


## Analysis and Visualization
Now, we will explore the dataset using Pandas for data manipulation and analysis, and Plotly for interactive visualizations.

## Insight 1: Life Expectancy by Region 

In [32]:
# Calculating the average life expectancy by region and year
life_expectancy_region_df = df.groupby(['year', 'continent'])['lifeExp'].mean().reset_index()

# Creating a line plot of the average life expectancy by region over time
fig = px.line(life_expectancy_region_df, x='year', y='lifeExp', color='continent', title='Life Expectancy by Region')
fig.show()

# Alternatively, creating a box plot for life expectancy by region
fig = px.box(df, x='continent', y='lifeExp', title='Life Expectancy by Region')
fig.show()
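
The line plot shows the trend visually. As a possible numeric companion, the sketch below compares the mean life expectancy of each continent in the first and last year of the data. The helper name `life_expectancy_gain` is hypothetical, and the function assumes the `continent`, `year`, and `lifeExp` columns shown earlier.

```python
import pandas as pd


def life_expectancy_gain(frame: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: mean lifeExp per continent in the first and last year, plus the gain."""
    # Rows: continents, columns: years, values: mean life expectancy
    means = frame.groupby(['continent', 'year'])['lifeExp'].mean().unstack('year')
    first_year, last_year = means.columns.min(), means.columns.max()
    out = pd.DataFrame({'start': means[first_year], 'end': means[last_year]})
    out['gain'] = out['end'] - out['start']
    # Largest improvement first
    return out.sort_values('gain', ascending=False)
```

In the notebook, `life_expectancy_gain(df)` would return one row per continent, sorted by the size of the improvement.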


## Insight 2: Life Expectancy Over Time

In [33]:
# Grouping the data by country and year and calculating the average life expectancy
# (each country-year pair has a single row, so the mean leaves the values unchanged)
life_expectancy_df = df.groupby(['country', 'year'])['lifeExp'].mean().reset_index()

# Creating a line plot for life expectancy over time for selected countries
countries = ['United States', 'China', 'India']  # Modify this list to compare other countries
filtered_df = life_expectancy_df[life_expectancy_df['country'].isin(countries)]

fig = px.line(filtered_df, x='year', y='lifeExp', color='country', title='Life Expectancy Over Time')
fig.show()
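
To read the slopes of these lines as numbers, one option is to compute the change in life expectancy between consecutive observations of each country. This is a minimal sketch with a hypothetical helper, `life_expectancy_change`, assuming the `country`, `year`, and `lifeExp` columns.

```python
import pandas as pd


def life_expectancy_change(frame: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: change in lifeExp since the previous observation, per country."""
    out = frame.sort_values(['country', 'year']).copy()
    # diff() within each country; a country's first observation has no predecessor (NaN)
    out['lifeExp_change'] = out.groupby('country')['lifeExp'].diff()
    return out[['country', 'year', 'lifeExp', 'lifeExp_change']]
```

For the Afghanistan rows shown earlier, the change from 1952 to 1957 would be 30.332 - 28.801 = 1.531 years.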

## Insight 3: GDP per Capita vs. Life Expectancy

In [34]:
# Creating a scatter plot of GDP per capita against life expectancy for a specific year
year = 2007  # Modify the year to inspect a different snapshot

filtered_df = df[df['year'] == year]

fig = px.scatter(filtered_df, x='gdpPercap', y='lifeExp', color='continent', hover_name='country',
                 log_x=True, title=f'GDP per Capita vs. Life Expectancy ({year})')
fig.show()
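
Because the scatter plot uses a logarithmic x-axis, the relationship it shows is between life expectancy and the logarithm of GDP per capita. The sketch below quantifies that relationship with a Pearson correlation. The helper `log_gdp_correlation` is hypothetical, and base-10 logarithms are an arbitrary choice that does not affect the correlation value.

```python
import numpy as np
import pandas as pd


def log_gdp_correlation(frame: pd.DataFrame, year: int) -> float:
    """Hypothetical helper: Pearson correlation of log10(gdpPercap) with lifeExp for one year."""
    subset = frame[frame['year'] == year]
    # Series.corr uses the Pearson method by default
    return float(np.log10(subset['gdpPercap']).corr(subset['lifeExp']))
```

In the notebook, `log_gdp_correlation(df, 2007)` would summarize the plotted year in a single number between -1 and 1.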

## Insight 4: Population Growth by Continent

In [35]:
# Calculating the total population for each continent and year
population_by_continent_df = df.groupby(['continent', 'year'])['pop'].sum().reset_index()

# Creating an area plot for population growth by continent
fig = px.area(population_by_continent_df, x='year', y='pop', color='continent', title='Population Growth by Continent')
fig.show()
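
The area plot shows absolute totals. To see how the distribution across continents shifts, one possible follow-up is to compute each continent's share of the yearly total. This sketch assumes the `continent`, `year`, and `pop` columns; the helper name `population_share` is hypothetical.

```python
import pandas as pd


def population_share(frame: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: each continent's fraction of the total population per year."""
    totals = frame.groupby(['continent', 'year'])['pop'].sum().reset_index()
    # Divide each continent's total by the sum over all continents in the same year
    totals['share'] = totals['pop'] / totals.groupby('year')['pop'].transform('sum')
    return totals
```

The shares within each year sum to 1, so the result can be plotted with the same `px.area` call by using `y='share'`.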

## Insight 5: Population Distribution

In [36]:
# Selecting the population of each country in the most recent year
# (summing 'pop' across years would add up repeated snapshots of the same population)
latest_year = df['year'].max()
total_population_df = df[df['year'] == latest_year][['country', 'pop']]

# Sorting the data by population in descending order
total_population_df = total_population_df.sort_values('pop', ascending=False)

# Creating a bar chart for population distribution
fig = px.bar(total_population_df, x='country', y='pop', title=f'Population Distribution ({latest_year})')
fig.show()
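
A bar chart with one bar per country can be hard to read. As a sketch of one alternative, the hypothetical helper `top_populations` keeps only the `n` most populous countries in the most recent year of the data.

```python
import pandas as pd


def top_populations(frame: pd.DataFrame, n: int = 10) -> pd.DataFrame:
    """Hypothetical helper: the n most populous countries in the latest year."""
    latest = frame[frame['year'] == frame['year'].max()]
    # nlargest returns the rows sorted by 'pop' in descending order
    return latest.nlargest(n, 'pop')[['country', 'pop']]
```

In the notebook, the result could be passed to `px.bar` with `x='country'` and `y='pop'`.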

## Insight 6: Population Growth

In [37]:
# Grouping the data by country and year and calculating the mean population
# (each country-year pair has a single row, so the mean leaves the values unchanged)
population_df = df.groupby(['country', 'year'])['pop'].mean().reset_index()

# Creating a line plot for population growth
fig = px.line(population_df, x='year', y='pop', color='country', title='Population Growth')
fig.show()
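
On an absolute scale, the largest countries dominate the plot and growth in smaller countries is hard to see. One way to compare growth rates is to index each country's population to its first recorded year. This is a minimal sketch; the helper name `population_index` and the base value of 100 are assumptions made for illustration.

```python
import pandas as pd


def population_index(frame: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: population relative to each country's first year (first year = 100)."""
    out = frame.sort_values(['country', 'year']).copy()
    # After sorting, 'first' picks the earliest year's population within each country
    base = out.groupby('country')['pop'].transform('first')
    out['pop_index'] = out['pop'] / base * 100
    return out[['country', 'year', 'pop', 'pop_index']]
```

In the notebook, `px.line(population_index(df), x='year', y='pop_index', color='country')` would draw every country starting from the same baseline.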