# Creating Data Visualisations using Python

Creating data visualisations in Python generally involves three stages:

1. Loading the dataset.
2. Identifying, cleaning, and shaping relevant data.
3. Visually presenting these insights.

> __Note__: We'll explore each of these stages using a [Kickstarter Projects](https://www.kaggle.com/kemical/kickstarter-projects?select=ks-projects-201801.csv) dataset.

## 1. Loading the dataset

### What happens at this stage?

This is the stage where we import our data into the Python environment.

### What are some of the common data sources used?

We can load data stored in common file formats, including `.csv`, `.xlsx`, and `.json`. We can also load data from a database.

### What Python libraries can we use for this task?

- [pandas]() - _pandas_ can read data from many file formats and returns the result as a DataFrame.
- [openpyxl]() or [xlrd]() - _openpyxl_ and _xlrd_ are popular choices for loading Excel files (_openpyxl_ for `.xlsx`, _xlrd_ for legacy `.xls`).
- [sqlalchemy]() - _sqlalchemy_ is a SQL toolkit and object-relational mapper (ORM) for Python that can be used to access SQL databases and load data.
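
As a minimal sketch of the loading stage, the snippet below reads CSV data with _pandas_. To stay self-contained it reads from an in-memory string rather than a file on disk, and the column names and values are hypothetical placeholders, not taken from the Kickstarter dataset.

```python
import io

import pandas as pd

# Hypothetical sample standing in for a CSV file on disk; the columns and
# values are illustrative only.
csv_text = """name,category,goal,launched
Project A,Music,1000,2017-01-15
Project B,Film,5000,2017-03-02
Project C,Music,2500,2017-06-20
"""

# pd.read_csv accepts a file path or any file-like object
df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)   # (number of rows, number of columns)
print(df.dtypes)  # column types inferred by pandas
```

With a real file, the `io.StringIO(...)` argument would be replaced by the file's path. _pandas_ offers similar readers for the other sources mentioned above, such as `pd.read_excel`, `pd.read_json`, and `pd.read_sql`.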

## 2. Cleaning and shaping the dataset

### What happens at this stage?

At this stage, we identify the data that is relevant for the insights we wish to explore. From here, we ensure the data is accurate, complete, and in a format that’s ready for analysis or visualisation. This can include:

1. Handling missing values (`df.dropna()`, `df.fillna()`)
2. Renaming or selecting columns
3. Changing data types (`df['date'] = pd.to_datetime(df['date'])`)
4. Filtering or grouping data
5. Merging or joining datasets
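
The five steps above could be sketched as follows. The two DataFrames, their column names, and their values are hypothetical and exist only to make the example runnable; a real dataset will need its own choices (for instance, whether missing values should be dropped or filled).

```python
import numpy as np
import pandas as pd

# Hypothetical raw data with one missing value and dates stored as strings
raw = pd.DataFrame({
    'Project Name': ['A', 'B', 'C', 'D'],
    'category': ['Music', 'Film', 'Music', 'Film'],
    'pledged': [1200.0, np.nan, 300.0, 4500.0],
    'date': ['2017-01-15', '2017-03-02', '2017-06-20', '2017-08-05'],
})
# Hypothetical lookup table used for the merge step
categories = pd.DataFrame({
    'category': ['Music', 'Film'],
    'group': ['Audio', 'Video'],
})

# 1. Handle missing values: here, treat a missing pledge as zero
clean = raw.fillna({'pledged': 0.0})

# 2. Rename columns to consistent, code-friendly names
clean = clean.rename(columns={'Project Name': 'name'})

# 3. Change data types: parse the date strings into datetimes
clean['date'] = pd.to_datetime(clean['date'])

# 4. Filter rows, and group to get the total pledged per category
recent = clean[clean['date'] >= pd.Timestamp('2017-03-01')]
totals = clean.groupby('category', as_index=False)['pledged'].sum()

# 5. Merge the totals with the lookup table
summary = totals.merge(categories, on='category', how='left')
print(summary)
```

Each step returns a new DataFrame rather than modifying `raw`, which keeps the original data available for comparison.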

### What are some of the common tools and libraries used at this stage?

- [pandas]() - _pandas_ for data manipulation
- [numpy]() - _numpy_ for numerical operations

## 3. Visually presenting these insights

### What happens at this stage?

This is where we use our prepared data to create visualisations such as charts, graphs, and other visual representations of the data.

### What are some of the popular libraries used for this task?

- [matplotlib]() - _matplotlib_ is a low-level plotting library that offers a high degree of control but can be more verbose.
- [seaborn]() - _seaborn_ provides statistical plots built on top of matplotlib; it is generally considered easier to use but offers less fine-grained control.
- [plotly]() - _plotly_ produces interactive visualisations.
- [pandas]() - _pandas_ offers built-in plotting through `df.plot()`.

Example using Seaborn:

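```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Placeholder data; substitute your own cleaned DataFrame
df = pd.DataFrame({'category': ['A', 'B', 'C'], 'value': [10, 25, 15]})

# One bar per category; bar height is the mean of 'value' for that category
sns.barplot(x='category', y='value', data=df)
plt.title('Value by Category')
plt.show()
```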
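
For comparison, a similar bar chart could be drawn with _matplotlib_ directly. This is a sketch under the same assumptions as the Seaborn example: the DataFrame, with its `category` and `value` columns, is placeholder data, and the styling choices are arbitrary.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Placeholder data with the same assumed columns as the Seaborn example
df = pd.DataFrame({'category': ['A', 'B', 'C'], 'value': [10, 25, 15]})

# Create the figure and axes explicitly, then draw one bar per row
fig, ax = plt.subplots(figsize=(6, 4))
ax.bar(df['category'], df['value'], color='steelblue')

# Titles and axis labels are set by hand
ax.set_title('Value by Category')
ax.set_xlabel('category')
ax.set_ylabel('value')
fig.tight_layout()
```

Calling `plt.show()` afterwards displays the figure. The extra lines illustrate the trade-off noted above: _matplotlib_ asks for more explicit setup in exchange for control over each element of the chart.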