# Creating Charts

Pandas allows you to read structured datasets and visualize them using the `plot()` method. By default, Pandas uses `matplotlib` to create the plots.

In this notebook, we will work with an open dataset of crime in London.

In [None]:
import pandas as pd
import os
import glob

%matplotlib inline
import matplotlib.pyplot as plt

We have 12 different CSV files containing crime data for each month of 2020. We can use the `glob` module to find all files matching a pattern.

In [None]:
data_pkg_path = 'data'
folder = 'crime'
file_pattern = '2020-*.csv'
file_path_pattern = os.path.join(data_pkg_path, folder, file_pattern)

# glob.glob() already returns a list of matching paths; sorting keeps the months in order
file_list = sorted(glob.glob(file_path_pattern))
file_list
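
To see which names a pattern such as `2020-*.csv` selects, here is a minimal sketch using the standard-library `fnmatch` module, which applies the same shell-style wildcard rules that `glob` uses for file names. The file names in the list are made up for illustration and are not the actual contents of the data folder.

```python
import fnmatch

# Hypothetical file names, only for illustration.
candidate_names = ['2020-01.csv', '2020-12.csv', '2019-12.csv', '2020-01.txt']

# '*' matches any run of characters, so only names that start with
# '2020-' and end with '.csv' are kept.
matching_names = [name for name in candidate_names
                  if fnmatch.fnmatch(name, '2020-*.csv')]
print(matching_names)
```

The file from 2019 and the `.txt` file are left out because they do not fit the prefix and suffix around the wildcard.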

It will be helpful to merge all these files into a single dataframe. We can use `pd.concat()` to merge a list of dataframes.

In [None]:
dataframe_list = []

for file in file_list:
    df = pd.read_csv(file)
    dataframe_list.append(df)

merged_df = pd.concat(dataframe_list)
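
One detail worth knowing about `pd.concat()` is that it keeps each dataframe's original index by default, so row labels repeat in the merged result. The sketch below uses two tiny made-up tables (not the real crime data) to show this, along with the `ignore_index=True` option that renumbers the rows.

```python
import pandas as pd

# Two tiny, made-up monthly tables standing in for the CSV files.
jan = pd.DataFrame({'Month': ['2020-01', '2020-01'],
                    'Crime type': ['Burglary', 'Robbery']})
feb = pd.DataFrame({'Month': ['2020-02'],
                    'Crime type': ['Burglary']})

# Default behaviour: each frame's index is kept, so labels repeat.
stacked = pd.concat([jan, feb])
print(stacked.index.tolist())

# ignore_index=True assigns a fresh 0..n-1 index to the merged rows.
renumbered = pd.concat([jan, feb], ignore_index=True)
print(renumbered.index.tolist())
```

The repeated labels do not affect the `groupby()` steps in this notebook, but a fresh index can be helpful if you later select rows by label.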

Let's create a pie chart showing the distribution of different types of crime. The Pandas `groupby()` method allows us to calculate group statistics.

In [None]:
type_counts = merged_df.groupby('Crime type').size()
type_counts
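
To make the grouping step concrete, here is a small sketch on a made-up table (the rows are invented for illustration). `groupby(...).size()` returns a Series with one entry per group, ordered by group name, and `sort_values()` can reorder it by count if that suits the chart better.

```python
import pandas as pd

# Made-up rows, only for illustration.
toy = pd.DataFrame({
    'Crime type': ['Burglary', 'Robbery', 'Burglary', 'Drugs', 'Burglary']
})

# One row count per crime type, indexed by the group name.
by_type = toy.groupby('Crime type').size()
print(by_type)

# Optional: put the most frequent category first.
largest_first = by_type.sort_values(ascending=False)
print(largest_first)
```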

We now use the `plot()` method to create the chart. This method is a wrapper around `matplotlib` and passes any extra keyword arguments through to it.

Reference: [pandas.DataFrame.plot](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.plot.html)

In [None]:
fig, ax = plt.subplots(1, 1)
fig.set_size_inches(15,7)
type_counts.plot(kind='pie', ax=ax, wedgeprops={'linewidth': 1.0, 'edgecolor': 'white'}, label='')
plt.title('Crime Types', fontsize=18)
# Call tight_layout() after adding the title so the title is included in the layout
plt.tight_layout()

plt.show()
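
As one more illustration of passing `matplotlib` arguments through `plot()`, the sketch below adds percentage labels to each wedge with `autopct`, an argument of matplotlib's `pie()` function. The counts are made up for illustration, and the figure is closed rather than shown so the snippet also runs outside a notebook.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up counts, only for illustration.
toy_counts = pd.Series({'Burglary': 3, 'Drugs': 1, 'Robbery': 1})

fig, ax = plt.subplots(figsize=(6, 6))
# autopct is not a pandas option; it is handed on to matplotlib's pie().
toy_counts.plot(kind='pie', ax=ax, autopct='%1.1f%%', label='')
ax.set_title('Toy Crime Types')

# In a notebook you would call plt.show() here instead.
plt.close(fig)
```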

We can also chart the trend of crime over the year. For this, let's group the data by month.

In [None]:
monthly_counts = merged_df.groupby('Month').size()
monthly_counts

In [None]:
fig, ax = plt.subplots(1, 1)
fig.set_size_inches(15,7)
monthly_counts.plot(kind='bar', ax=ax)
plt.show()

We can make the chart more informative by stacking the bars by crime type.

In [None]:
counts_by_type = merged_df.groupby(['Month', 'Crime type']).size()
counts_by_type

The result is a Series indexed by both month and crime type, which is not a suitable format for plotting. We call `unstack()` to pivot the crime types into columns, creating a dataframe.

In [None]:
counts_df = counts_by_type.unstack()
counts_df
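
The sketch below shows the same reshaping on a small made-up table. It also passes `fill_value=0`, an optional `unstack()` argument, so that month and type combinations with no rows show up as 0 instead of `NaN`. The data is invented for illustration.

```python
import pandas as pd

# Made-up rows, only for illustration.
toy = pd.DataFrame({
    'Month': ['2020-01', '2020-01', '2020-02', '2020-02', '2020-02'],
    'Crime type': ['Burglary', 'Robbery', 'Burglary', 'Burglary', 'Drugs'],
})

# Series with a two-level index: (Month, Crime type) -> count.
counts = toy.groupby(['Month', 'Crime type']).size()

# Move the inner level (Crime type) into columns; one row per month.
wide = counts.unstack(fill_value=0)
print(wide)
```

Each row of `wide` is one month and each column is one crime type, which is the layout a stacked bar chart needs.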

Now we can create the stacked bar chart. Instead of the default legend, we create a horizontal legend with a frame using the `legend()` function.

In [None]:
fig, ax = plt.subplots(1, 1)
fig.set_size_inches(20,10)
counts_df.plot(kind='bar', stacked=True, ax=ax, colormap='tab20')
plt.legend(loc='upper center', ncol=5, frameon=True, bbox_to_anchor=(0.5, 1.1), fancybox=True, shadow=True)
ax.spines['right'].set_visible(False)
ax.spines['top'].set_visible(False)
plt.xlabel('Month', size = 15)
plt.ylabel('Number of Incidents', size = 15)
plt.title('Crime in London (2020)', size = 18, y=1.1)
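output_folder = 'output'
# Create the output folder if it does not exist yet
os.makedirs(output_folder, exist_ok=True)
output_path = os.path.join(output_folder, 'stacked_chart.jpg')
# bbox_inches='tight' keeps the legend and title, which sit above the axes, inside the saved image
plt.savefig(output_path, bbox_inches='tight')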
plt.show()

## Exercise

Plot the trend of Bicycle thefts as a line chart.

Hint: Select the column 'Bicycle theft' from the `counts_df` dataframe and use the `plot()` function on the result.