## Basic Plotting with Pandas

While we typically use a library like **matplotlib** or **seaborn** to plot graphs in Jupyter Notebook, Pandas DataFrames and Series also provide a handy **.plot()** method (built on top of matplotlib) for quick and easy plotting.
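As a minimal sketch of what **.plot()** does, the cell below plots a small Series of made-up values (illustrative only, not taken from the COVID dataset). Pandas hands the drawing over to matplotlib and returns a matplotlib `Axes` object, which you can keep customising afterwards.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Made-up values, for illustration only
s = pd.Series([3, 7, 12, 9, 4], name='example_values')

# Series.plot() draws a line chart by default and returns a matplotlib Axes
ax = s.plot(title='Example line plot')

# The returned Axes can be customised with regular matplotlib methods
ax.set_xlabel('position')
ax.set_ylabel('value')
```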

In [None]:
from urllib.request import urlretrieve
import pandas as pd

In [None]:
# Download the day-wise COVID-19 data for Italy (skipped if the file is already present)
import os

italy_covid_url = 'https://gist.githubusercontent.com/aakashns/f6a004fa20c84fec53262f9a8bfee775/raw/f309558b1cf5103424cef58e2ecb8704dcd4d74c/italy-covid-daywise.csv'

# Save the file with the correct filename
if not os.path.exists('italy-covid-daywise.csv'):
    urlretrieve(italy_covid_url, 'italy-covid-daywise.csv')

covid_df = pd.read_csv('italy-covid-daywise.csv')

In [None]:
covid_df

In [None]:
covid_df['total_cases'] = covid_df.new_cases.cumsum()
covid_df['total_deaths'] = covid_df.new_deaths.cumsum()
covid_df['total_tests'] = covid_df.new_tests.cumsum()
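**.cumsum()** computes a running total: each value is the sum of all values up to and including that row. A small sketch with made-up numbers also shows how missing values behave, which may matter here if some daily counts (for example new_tests in the early days) are missing: by default the NaN stays NaN in its own row, and the running total simply continues after it.

```python
import numpy as np
import pandas as pd

# Made-up daily counts with one missing value
daily = pd.Series([5, 3, np.nan, 2])

# Running total; the NaN is skipped when summing but stays NaN in its own row
running = daily.cumsum()
print(running.tolist())  # [5.0, 8.0, nan, 10.0]
```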

In [None]:
covid_df

In [None]:
covid_df.new_cases.plot();

Although this plot shows the overall trend, it's hard to tell when the peak occurred, as there are no dates on the X-axis.
We can address this by using the date column as the index of the DataFrame.
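Here is a minimal sketch of the idea on a tiny, made-up DataFrame shaped like covid_df (the numbers are invented). Once the parsed dates are in the index, the X-axis shows dates instead of 0, 1, 2, ..., and finding the date of the peak becomes a one-liner with **.idxmax()**.

```python
import pandas as pd
import matplotlib.pyplot as plt

# A made-up frame shaped like covid_df: a date column plus a daily count
df = pd.DataFrame({
    'date': ['2020-03-01', '2020-03-02', '2020-03-03', '2020-03-04'],
    'new_cases': [100, 250, 180, 300],
})

# Parse the strings into datetimes and move them into the index
df['date'] = pd.to_datetime(df['date'])
df = df.set_index('date')

# With a DatetimeIndex, the X-axis is labelled with dates
ax = df['new_cases'].plot(title='New cases')

# idxmax() returns the index label (here: the date) of the largest value
peak_day = df['new_cases'].idxmax()
print(peak_day)  # 2020-03-04 00:00:00
```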

In [None]:
covid_df.date

Since the dtype of the date column is object (i.e. the dates are stored as plain strings), we need to convert it into a proper datetime type before we can use it to plot graphs.
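A small sketch of what the conversion buys us, using made-up date strings: after **pd.to_datetime()**, the values behave like real dates, so date arithmetic and the **.dt** accessor work on them.

```python
import pandas as pd

# Dates read from a CSV typically arrive as plain strings (dtype object)
raw = pd.Series(['2020-02-24', '2020-02-25', '2020-02-26'])

# pd.to_datetime parses them into a datetime64 column
parsed = pd.to_datetime(raw)

# Date-aware operations now work, e.g. the difference between two days
span = parsed.iloc[-1] - parsed.iloc[0]
print(span)  # 2 days 00:00:00

# ...and the .dt accessor exposes date parts
days = parsed.dt.day.tolist()  # [24, 25, 26]
```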

In [None]:
covid_df['date'] = pd.to_datetime(covid_df.date)

In [None]:
covid_df.date

In [None]:
covid_df

In [None]:
# Extract date parts with the .dt accessor of the datetime column
covid_df['year'] = covid_df.date.dt.year
covid_df['month'] = covid_df.date.dt.month
covid_df['day'] = covid_df.date.dt.day
covid_df['weekday'] = covid_df.date.dt.weekday  # Monday=0 ... Sunday=6
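Each of these date parts is also available through the **.dt** accessor of a datetime Series. A small sketch on two known dates (9 and 15 March 2020, a Monday and a Sunday) shows the values, including the fact that weekday counts Monday as 0 and Sunday as 6.

```python
import pandas as pd

dates = pd.Series(pd.to_datetime(['2020-03-09', '2020-03-15']))

parts = pd.DataFrame({
    'year': dates.dt.year,
    'month': dates.dt.month,
    'day': dates.dt.day,
    'weekday': dates.dt.weekday,          # Monday=0 ... Sunday=6
    'weekday_name': dates.dt.day_name(),  # full English day name
})
print(parts)
```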

In [None]:
covid_df['month_name'] = covid_df['date'].dt.month_name()

In [None]:
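# Grouping by plain month names sorts them alphabetically (April, August, ...), not by calendar order
covid_month_df = covid_df.groupby('month_name')[['new_cases', 'new_tests', 'new_deaths']].sum()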

In [None]:
# Defining the correct order of months
month_order = ['January', 'February', 'March', 'April', 'May', 'June', 'July', 'August', 'September', 'October', 'November', 'December']

In [None]:
# Convert 'month_name' to an ordered categorical so that sorting follows the calendar order
covid_df['month_name'] = pd.Categorical(covid_df['month_name'], categories=month_order, ordered=True)

In [None]:
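# observed=True keeps only the months that actually appear in the data
# (recent pandas versions also warn when this argument is left unspecified for categoricals)
covid_month_df = covid_df.groupby('month_name', observed=True)[['new_cases', 'new_tests', 'new_deaths']].sum().sort_index()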
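To see why the categorical conversion matters, here is a small sketch with made-up rows whose month names are out of order. With plain strings, **groupby** sorts the labels alphabetically; with an ordered categorical, it follows the calendar order. The observed=True argument keeps only months that actually occur in the data.

```python
import pandas as pd

# Made-up rows whose month names are out of order
df = pd.DataFrame({
    'month_name': ['March', 'January', 'February', 'March'],
    'new_cases': [10, 1, 5, 20],
})

# Plain strings: groupby sorts the labels alphabetically
alphabetical = df.groupby('month_name')['new_cases'].sum()
print(alphabetical.index.tolist())  # ['February', 'January', 'March']

# Ordered categorical: groupby follows the calendar order instead
month_order = ['January', 'February', 'March', 'April', 'May', 'June', 'July',
               'August', 'September', 'October', 'November', 'December']
df['month_name'] = pd.Categorical(df['month_name'], categories=month_order, ordered=True)

# observed=True drops the months (April ... December) that have no rows
monthly = df.groupby('month_name', observed=True)['new_cases'].sum()
print(monthly.index.tolist())  # ['January', 'February', 'March']
```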

In [None]:
covid_month_df

In [None]:
covid_df

In [None]:
covid_df['weekday_name'] = covid_df['date'].dt.day_name()

In [None]:
covid_df.set_index('date', inplace = True)

In [None]:
covid_df

In [None]:
covid_df.plot()

In [None]:
covid_df.new_cases.plot();

## Note:
**covid_df.set_index('date', inplace = True)**

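Setting the index and plotting could also be combined into a single line by chaining the calls, as long as **inplace=True** is not used:

**covid_df.set_index('date').plot()**

Without inplace=True, .set_index() returns a new DataFrame (leaving covid_df itself unchanged), and .plot() is called on that new DataFrame. Note that this only works while date is still a column, i.e. before running the in-place cell above.

But if you use:
**covid_df.set_index('date', inplace = True).plot()**
        **OR**
**covid_df.set_index('date', inplace = True).plot();**

you will get an error. When you use .set_index(..., inplace=True), the method modifies the DataFrame in place and returns None, so chaining .plot() immediately after it raises an error like:

**AttributeError: 'NoneType' object has no attribute 'plot'**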

You can't use .plot() directly after .set_index(..., inplace=True) because:

* .set_index(..., inplace=True) returns None (it just modifies the DataFrame itself).

* So you're effectively doing None.plot(), which causes an error.
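A minimal sketch that checks this behaviour on a tiny, made-up DataFrame: the non-inplace call returns a new DataFrame (which you can chain .plot() onto), while the inplace call changes the original and returns None.

```python
import pandas as pd

df = pd.DataFrame({'date': pd.to_datetime(['2020-03-01', '2020-03-02']),
                   'new_cases': [100, 250]})

# Without inplace: a NEW DataFrame is returned and df is untouched,
# so further calls such as .plot() can be chained onto the result
indexed = df.set_index('date')

# With inplace=True: df itself is modified and the call returns None
result = df.set_index('date', inplace=True)
print(result)  # None
```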

The index of a DataFrame doesn't have to be numeric. Since we turned the date into the index, we can now retrieve the data for a specific date using the **.loc** method.
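Besides a single date, a DatetimeIndex also supports partial date strings and date ranges in **.loc**. A small sketch on made-up daily data from 28 February to 5 March 2020 (2020 is a leap year, so 29 February exists):

```python
import pandas as pd

# Made-up daily data: 2020-02-28 ... 2020-03-05 (7 days)
idx = pd.date_range('2020-02-28', '2020-03-05', freq='D')
df = pd.DataFrame({'new_cases': range(len(idx))}, index=idx)

# A full date string selects that single row (returned as a Series)
row = df.loc['2020-03-02']

# A partial string such as 'YYYY-MM' selects every row in that month
march = df.loc['2020-03']

# Date slices include both endpoints
window = df.loc['2020-02-29':'2020-03-01']
```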

In [None]:
covid_df.loc['2020-09-01']

In [None]:
covid_df.new_cases

In [None]:
covid_df.new_cases.plot()
covid_df.new_deaths.plot();

## Why does this work without error?

**covid_df.new_cases.plot()
covid_df.new_deaths.plot();**

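In Python, a semicolon at the end of a line is optional. In a Jupyter Notebook it is mainly used for two things:

1. To suppress the output in Jupyter Notebooks:
   the semicolon hides the text output of the last line, such as:
       **<matplotlib.axes._subplots.AxesSubplot at 0x000001...>**

2. To separate multiple statements on the same line.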

**Why no error even if you skip the semicolon?**

Because Python doesn't need a semicolon at the end of a line. Both of these are fine:

**covid_df.new_deaths.plot()** and **covid_df.new_deaths.plot();**

The semicolon is purely cosmetic in this context: it doesn't affect the plot itself, only whether the text output is shown.

### Summary

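* The semicolon is not required; it just suppresses the text output in notebooks.

* Jupyter only displays the value of the last expression in a cell, so a semicolon on the first line would have no visible effect. That's why both lines work, and why the semicolon only matters on the second one.

* You could remove it entirely, or add one to both lines; the plots themselves won't change.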
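Both lines also end up in the same chart. When no ax argument is given, Series.plot() draws on the current matplotlib Axes if a figure is already open, so consecutive calls in one cell overlay each other. A sketch with made-up data that checks this, and shows the explicit alternative of passing ax= yourself:

```python
import pandas as pd
import matplotlib.pyplot as plt

# Made-up daily values
cases = pd.Series([10, 40, 90, 60], name='new_cases')
deaths = pd.Series([1, 3, 8, 5], name='new_deaths')

plt.close('all')  # start from a clean state so the check below is meaningful

# Two Series.plot() calls in a row draw onto the same (current) Axes
ax1 = cases.plot()
ax2 = deaths.plot()
print(ax1 is ax2)            # True
print(len(ax1.get_lines()))  # 2

# Explicit alternative: create the Axes yourself and pass it with ax=
fig, ax = plt.subplots()
cases.plot(ax=ax)
deaths.plot(ax=ax)
```

Passing ax= explicitly makes the intent clear and does not depend on which figure happens to be current.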

In [None]:
covid_df.total_tests.plot()
covid_df.total_deaths.plot();

In [None]:
death_rate = covid_df.total_deaths / covid_df.total_cases

In [None]:
death_rate.plot(title='Death Rate');

In [None]:
positive_rate = covid_df.total_cases / covid_df.total_tests
positive_rate.plot(title='Positive Rate');
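Dividing one Series by another works element by element, matching rows by their index labels. A sketch with made-up cumulative totals: if the first rows have zero totals, 0 / 0 gives NaN, which .plot() simply leaves out of the line.

```python
import pandas as pd

# Made-up cumulative totals on a shared date index
dates = pd.date_range('2020-02-20', periods=4, freq='D')
total_deaths = pd.Series([0, 0, 2, 5], index=dates)
total_cases = pd.Series([0, 10, 40, 100], index=dates)

# Element-wise division, aligned on the shared date index
death_rate = total_deaths / total_cases
print(death_rate.tolist())  # [nan, 0.0, 0.05, 0.05]
```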

In [None]:
covid_df

In [None]:
# Now group and sort
covid_month_df = covid_df.groupby('month_name', observed=True)[['new_cases', 'new_tests', 'new_deaths']].sum().sort_index()

In [None]:
covid_month_df

In [None]:
covid_month_df.plot(kind='bar')

In [None]:
covid_month_df.new_cases.plot(kind='bar')
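A small sketch of the bar-chart variants on made-up monthly totals: kind='bar' and the .plot.bar() shortcut are equivalent, and each (month, column) pair becomes one bar. Because columns such as new_tests and new_deaths can differ by orders of magnitude, a log-scaled y-axis (logy=True) may keep the smaller columns readable.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Made-up monthly totals
monthly = pd.DataFrame(
    {'new_cases': [1000, 5000, 3000], 'new_deaths': [10, 80, 40]},
    index=['January', 'February', 'March'],
)

# One bar per (month, column) pair: 3 months x 2 columns = 6 bars
ax = monthly.plot(kind='bar', title='Monthly totals')
print(len(ax.patches))  # 6

# Same chart via the .plot.bar() shortcut, with a log-scaled y-axis
ax_log = monthly.plot.bar(logy=True)
```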