# Day 21 - Creating Basic Plots in Python: Line, Bar, and Scatter


### Why Are Basic Plots Important?
Basic plots like line, bar, and scatter plots are the building blocks of data visualization. They provide a simple yet powerful way to understand your data at a glance. Whether you're tracking change over time, comparing quantities across categories, or looking for correlations between two variables, these plots are invaluable tools in your data science toolkit.


In [None]:
import pandas as pd
import matplotlib.pyplot as plt

## Line Plot
Line plots are ideal for visualizing data trends over time.


In [None]:
# Example data (illustrative values, not real statistics)
data = {
    'Year': [2010, 2012, 2014, 2016, 2018, 2020],
    'Population': [2.5, 2.7, 2.9, 3.0, 3.3, 3.5]
}
df = pd.DataFrame(data)

# Creating a line plot
plt.figure(figsize=(10, 6))
plt.plot(df['Year'], df['Population'], marker='o')
plt.title('Population Growth Over Years')
plt.xlabel('Year')
plt.ylabel('Population (in billions)')
plt.grid(True)
plt.show()
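
The same chart can also be drawn with Matplotlib's object-oriented interface, which tends to be easier to manage once a figure holds more than one plot. This is a minimal sketch that reuses the illustrative values from the example; the names `fig` and `ax` are only the conventional choices, not something the rest of this post depends on.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Same illustrative values as in the line plot example
df = pd.DataFrame({
    'Year': [2010, 2012, 2014, 2016, 2018, 2020],
    'Population': [2.5, 2.7, 2.9, 3.0, 3.3, 3.5],
})

# plt.subplots() returns a Figure and an Axes; plotting calls go on the Axes
fig, ax = plt.subplots(figsize=(10, 6))
ax.plot(df['Year'], df['Population'], marker='o')
ax.set_title('Population Growth Over Years')
ax.set_xlabel('Year')
ax.set_ylabel('Population (in billions)')
ax.set_xticks(df['Year'])  # one tick per observed year
ax.grid(True)
plt.show()
```

Both styles produce the same picture; the `plt.*` calls used elsewhere in this post simply act on the current figure and axes behind the scenes.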

## Bar Plot
Bar plots are useful for comparing quantities among different groups.


In [None]:
# Example data
data = {
    'Country': ['USA', 'China', 'India', 'Brazil', 'Russia'],
    'GDP': [21.4, 14.3, 2.9, 2.1, 1.6]
}
df = pd.DataFrame(data)

# Creating a bar plot
plt.figure(figsize=(10, 6))
plt.bar(df['Country'], df['GDP'], color='skyblue')
plt.title('GDP by Country')
plt.xlabel('Country')
plt.ylabel('GDP (in Trillions USD)')
plt.grid(True)
plt.show()
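
Bar charts are often easier to read when each bar carries its value. The sketch below assumes Matplotlib 3.4 or newer, where `Axes.bar_label` is available, and reuses the example figures from above.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Same example values as in the bar plot above
df = pd.DataFrame({
    'Country': ['USA', 'China', 'India', 'Brazil', 'Russia'],
    'GDP': [21.4, 14.3, 2.9, 2.1, 1.6],
})

fig, ax = plt.subplots(figsize=(10, 6))
bars = ax.bar(df['Country'], df['GDP'], color='skyblue')

# Write each bar's height above it, formatted to one decimal place
labels = ax.bar_label(bars, fmt='%.1f')

ax.set_title('GDP by Country')
ax.set_xlabel('Country')
ax.set_ylabel('GDP (in Trillions USD)')
ax.grid(axis='y')  # horizontal gridlines only, so they don't cut through the bars
plt.show()
```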

## Scatter Plot
Scatter plots are excellent for identifying relationships between two variables.


In [None]:
# Example data
data = {
    'Height': [150, 160, 170, 180, 190],
    'Weight': [50, 60, 65, 70, 80]
}
df = pd.DataFrame(data)

# Creating a scatter plot
plt.figure(figsize=(10, 6))
plt.scatter(df['Height'], df['Weight'], color='red')
plt.title('Height vs. Weight')
plt.xlabel('Height (cm)')
plt.ylabel('Weight (kg)')
plt.grid(True)
plt.show()
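
To put a number on the relationship a scatter plot suggests, you can compute the Pearson correlation and overlay a straight-line fit. This is a sketch on the same five example points; using `numpy.polyfit` for the line is just one simple option, and with so few points the result is only illustrative.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Same example values as in the scatter plot above
df = pd.DataFrame({
    'Height': [150, 160, 170, 180, 190],
    'Weight': [50, 60, 65, 70, 80],
})

# Pearson correlation coefficient between the two columns
r = df['Height'].corr(df['Weight'])

# Least-squares straight line: Weight is approximated by slope * Height + intercept
slope, intercept = np.polyfit(df['Height'], df['Weight'], deg=1)

fig, ax = plt.subplots(figsize=(10, 6))
ax.scatter(df['Height'], df['Weight'], color='red', label='observations')
ax.plot(df['Height'], slope * df['Height'] + intercept,
        color='gray', linestyle='--', label=f'linear fit (r = {r:.2f})')
ax.set_title('Height vs. Weight')
ax.set_xlabel('Height (cm)')
ax.set_ylabel('Weight (kg)')
ax.legend()
ax.grid(True)
plt.show()
```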

## Use Case: Visualizing Population Data from a Public Dataset
For this use case, we'll use the `pandas-datareader` library to fetch total-population data from the World Bank for five countries, then visualize it with line, bar, and scatter plots.

### Step 1: Install the Required Libraries
First, make sure you have the necessary libraries installed. You can install them using `pip`:
```sh
pip install pandas pandas-datareader matplotlib
```

### Step 2: Fetch and Reshape the Data

In [None]:
from pandas_datareader import wb

# Fetching the population data from the World Bank for the years 2000 to 2020
population_data = wb.download(
    indicator='SP.POP.TOTL',  # Total population indicator
    country=['US', 'CN', 'IN', 'BR', 'RU'],  # Countries: USA, China, India, Brazil, Russia
    start=2000,
    end=2020
)

# Reshaping the data: the result is indexed by (country, year), and countries
# are labelled with full names (e.g. 'United States'), not the codes requested
population_data = population_data.reset_index()
population_data['year'] = population_data['year'].astype(int)  # so .loc[2020] works later
population_df = population_data.pivot(index='year', columns='country', values='SP.POP.TOTL')

# Display the first few rows of the dataset
print(population_df.head())
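
To see what the reshaping step does without a network call, here is a small sketch on hand-made rows in the same long layout (one row per country and year). The numbers are placeholders, not World Bank figures, and `long_df` and `wide_df` are names made up for this illustration.

```python
import pandas as pd

# Long format: one row per (country, year) pair; values are placeholders
long_df = pd.DataFrame({
    'country': ['Brazil', 'Brazil', 'India', 'India'],
    'year': ['2019', '2020', '2019', '2020'],  # year labels may arrive as strings
    'SP.POP.TOTL': [1.0, 1.1, 2.0, 2.2],
})

# Convert years to integers so they can be looked up with .loc[2020]
long_df['year'] = long_df['year'].astype(int)

# Wide format: one row per year, one column per country
wide_df = long_df.pivot(index='year', columns='country', values='SP.POP.TOTL')
print(wide_df)
```

The wide layout is what makes the later plotting loops simple: each column is one country's series, and each row label is a year.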

### Step 3: Creating Visualizations
#### Line Plot: Visualizing Population Growth Over Time


In [None]:
plt.figure(figsize=(10, 6))
for country in population_df.columns:
    plt.plot(population_df.index, population_df[country], marker='o', label=country)
plt.title('Population Growth Over Time')
plt.xlabel('Year')
plt.ylabel('Population')
plt.legend()
plt.grid(True)
plt.show()
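
When some countries are many times larger than others, growth in the smaller ones is hard to see on a shared axis. One option is to rescale every series so it starts at 100 in the first year. The sketch below uses placeholder numbers standing in for `population_df`; `demo_df` and `indexed_df` are hypothetical names.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Placeholder numbers standing in for population_df (not World Bank figures)
demo_df = pd.DataFrame(
    {'Country A': [100.0, 110.0, 121.0], 'Country B': [1000.0, 1020.0, 1040.0]},
    index=pd.Index([2000, 2010, 2020], name='year'),
)

# Divide every column by its first-year value so each series starts at 100
indexed_df = demo_df / demo_df.iloc[0] * 100

fig, ax = plt.subplots(figsize=(10, 6))
for country in indexed_df.columns:
    ax.plot(indexed_df.index, indexed_df[country], marker='o', label=country)
ax.set_title('Population Index (first year = 100)')
ax.set_xlabel('Year')
ax.set_ylabel('Index')
ax.legend()
ax.grid(True)
plt.show()
```

On this scale the lines show relative growth, so a small country that grows quickly is no longer flattened by a large one.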

#### Bar Plot: Comparing Population by Country for a Specific Year (e.g., 2020)


In [None]:
plt.figure(figsize=(10, 6))
population_2020 = population_df.loc[2020]
plt.bar(population_2020.index, population_2020.values, color='skyblue')
plt.title('Population by Country in 2020')
plt.xlabel('Country')
plt.ylabel('Population')
plt.grid(True)
plt.show()
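
Raw population counts run into the hundreds of millions, so Matplotlib's default tick labels fall back to an offset such as `1e9`, which is hard to read. A sketch of formatting the axis in millions with `matplotlib.ticker.FuncFormatter` follows; the values are rounded placeholders standing in for `population_2020`, and `millions` is a helper defined here for illustration.

```python
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.ticker import FuncFormatter

# Rounded placeholder values standing in for population_2020
demo_2020 = pd.Series({'Country A': 1.4e9, 'Country B': 3.3e8, 'Country C': 2.1e8})

def millions(value, pos):
    """Format a raw count as whole millions with an 'M' suffix."""
    return f'{value / 1e6:.0f}M'

fig, ax = plt.subplots(figsize=(10, 6))
ax.bar(demo_2020.index, demo_2020.values, color='skyblue')
ax.yaxis.set_major_formatter(FuncFormatter(millions))  # applied to every y tick
ax.set_title('Population by Country (placeholder data)')
ax.set_xlabel('Country')
ax.set_ylabel('Population')
ax.grid(axis='y')
plt.show()
```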

#### Scatter Plot: Population vs. GDP


In [None]:
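# Hard-coded, approximate GDP figures (USD) for illustration purposes.
# Keys assume the World Bank country names used as columns in population_df.
gdp_data = {
    'United States': 21.43e12,
    'China': 14.34e12,
    'India': 2.87e12,
    'Brazil': 2.05e12,
    'Russian Federation': 1.7e12
}
# Align GDP to the population index so each point pairs the same country
gdp_2020 = pd.Series(gdp_data, name='GDP').reindex(population_2020.index)

# Create a scatter plot
plt.figure(figsize=(10, 6))
plt.scatter(population_2020, gdp_2020, color='purple')
plt.title('Population vs. GDP in 2020')
plt.xlabel('Population')
plt.ylabel('GDP (USD)')
# Label each point by country, looking values up by label rather than position
for country in population_2020.index:
    plt.text(population_2020[country], gdp_2020[country], country)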
plt.grid(True)
plt.show()

## Conclusion
In today's post, we learned how to create line, bar, and scatter plots using Matplotlib and pandas. We applied these skills to World Bank population data, demonstrating how different types of plots can reveal different aspects of your data. Visualizing your data is a key step in any data analysis process, allowing you to communicate your findings effectively.

Stay tuned as we continue to explore more advanced visualization techniques in the upcoming posts!
