# Creating Data Visualisations Using Python

Creating data visualisations in Python typically involves three key stages:

1. Loading the dataset
2. Cleaning and shaping the relevant data
3. Visually presenting the insights

> __Note__: We'll explore each of these stages using the [Kickstarter Projects dataset](https://www.kaggle.com/datasets/kemical/kickstarter-projects?select=ks-projects-201801.csv).

## 1. Loading the Dataset

### What happens at this stage?

This is where we import our data into the Python environment.

### What are some common data sources?

We can load data stored in common file formats such as `.csv`, `.xlsx`, and `.json`. Data can also be loaded directly from databases.
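As a rough illustration of loading file-based data, the sketch below reads a small CSV and its JSON equivalent with pandas. The rows are made up and held in memory so the snippet runs without downloading anything; with the real dataset you would pass a file path to `pd.read_csv` instead.

```python
import io

import pandas as pd

# Tiny in-memory stand-in for a CSV file; the rows are invented for illustration.
csv_text = """ID,name,main_category,state,goal,pledged
1,Project A,Music,successful,1000,1500
2,Project B,Games,failed,5000,300
"""

# pd.read_csv accepts a file path or any file-like object.
df_csv = pd.read_csv(io.StringIO(csv_text))

# The same records expressed as JSON (a list of objects), then read back in.
json_text = df_csv.to_json(orient="records")
df_json = pd.read_json(io.StringIO(json_text))

print(df_csv.shape)
print(df_json.columns.tolist())
```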

### Which Python libraries can be used for this task?

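- [pandas](https://pandas.pydata.org/docs/) – Reads data from many file formats (for example via `read_csv`, `read_excel`, and `read_json`) and returns a DataFrame.
- [openpyxl](https://openpyxl.readthedocs.io/en/stable/#documentation) or [xlrd](https://xlrd.readthedocs.io/en/latest/) – Engines that pandas relies on to read Excel files: openpyxl handles `.xlsx`, while xlrd (version 2.0 and later) reads only the legacy `.xls` format.
- [sqlalchemy]() – A SQL toolkit and Object Relational Mapper (ORM) for Python. It provides database connections that pandas can use to load query results into a DataFrame.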
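Loading from a database might look like the sketch below. To stay self-contained it uses Python's built-in `sqlite3` module with an in-memory database; the `projects` table and its rows are hypothetical. A SQLAlchemy engine or connection can be passed to `pd.read_sql_query` in place of the `sqlite3` connection.

```python
import sqlite3

import pandas as pd

# In-memory SQLite database standing in for a real one (hypothetical table and rows).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE projects (name TEXT, state TEXT, goal REAL)")
conn.executemany(
    "INSERT INTO projects VALUES (?, ?, ?)",
    [("Project A", "successful", 1000.0), ("Project B", "failed", 5000.0)],
)
conn.commit()

# read_sql_query runs the query and returns the result as a DataFrame.
df_sql = pd.read_sql_query(
    "SELECT name, goal FROM projects WHERE state = 'successful'", conn
)
conn.close()

print(df_sql)
```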

## 2. Cleaning and Shaping the Dataset

### What happens at this stage?

Here, we identify and extract data relevant to the insights we want to explore. We then clean and structure the data to ensure it is accurate, complete, and ready for analysis or visualisation. This process can include:

- Handling missing values (`df.dropna()`, `df.fillna()`)
- Renaming or selecting specific columns
- Changing data types (e.g., `df['date'] = pd.to_datetime(df['date'])`)
- Filtering or grouping data
- Merging or joining datasets
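The steps listed above can be sketched on a small sample. The column names below (`main_category`, `state`, `launched`, `usd_goal_real`, `usd_pledged_real`) are assumed to match the Kickstarter CSV, and the rows and the `labels` lookup table are invented for illustration.

```python
import io

import pandas as pd

# Made-up rows; column names are assumed to mirror the Kickstarter dataset.
raw = io.StringIO("""ID,name,main_category,state,launched,usd_goal_real,usd_pledged_real
1,Project A,Music,successful,2016-03-01 10:00:00,1000,1500
2,Project B,Games,failed,2017-06-15 08:30:00,5000,300
3,,Games,successful,2017-07-01 12:00:00,2000,2500
4,Project D,Music,canceled,2015-01-20 09:00:00,800,
""")
df = pd.read_csv(raw)

# Handle missing values: drop rows without a name, treat a missing pledge as 0.
df = df.dropna(subset=["name"])
df["usd_pledged_real"] = df["usd_pledged_real"].fillna(0)

# Rename columns and keep only the ones we need.
df = df.rename(columns={"usd_goal_real": "goal_usd", "usd_pledged_real": "pledged_usd"})
df = df[["name", "main_category", "state", "launched", "goal_usd", "pledged_usd"]]

# Change data types: parse the launch timestamp.
df["launched"] = pd.to_datetime(df["launched"])

# Filter to finished projects, then group by category.
finished = df[df["state"].isin(["successful", "failed"])]
summary = finished.groupby("main_category", as_index=False)["pledged_usd"].sum()

# Merge with a (hypothetical) lookup table of display labels.
labels = pd.DataFrame(
    {"main_category": ["Games", "Music"], "label": ["Games & Play", "Music & Audio"]}
)
summary = summary.merge(labels, on="main_category", how="left")

print(summary)
```

Whether to drop or fill a missing value depends on the column and the question being asked; the choices here are only one reasonable option.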

### What tools and libraries are commonly used?

- [pandas](https://pandas.pydata.org/docs/) – For powerful data manipulation and transformation
- [numpy](https://numpy.org/doc/stable/) – For numerical operations and array-based computations
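As a small example of where numpy fits in, the sketch below derives new columns with vectorised operations. The column names `goal_usd` and `pledged_usd` and the values are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical goal and pledge amounts.
df = pd.DataFrame(
    {"goal_usd": [1000.0, 5000.0, 800.0], "pledged_usd": [1500.0, 300.0, 0.0]}
)

# Element-wise division across whole columns, no explicit loop needed.
df["funded_ratio"] = df["pledged_usd"] / df["goal_usd"]

# np.where chooses a value per row based on a boolean condition.
df["met_goal"] = np.where(df["pledged_usd"] >= df["goal_usd"], "yes", "no")

# A log scale compresses goals that span several orders of magnitude.
df["log_goal"] = np.log10(df["goal_usd"])

print(df)
```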

## 3. Visually Presenting the Insights

### What happens at this stage?

This is where we turn our cleaned and prepared data into visual representations, such as charts and graphs, that communicate the insights effectively.

### Which libraries are commonly used for data visualisation?

- [matplotlib](https://matplotlib.org/stable/index.html) – A foundational plotting library that provides fine-grained control, though it can be more verbose.
- [seaborn](https://seaborn.pydata.org/tutorial.html) – Built on top of matplotlib, it offers a higher-level interface for creating statistical graphics.
- [plotly]() – Allows creation of interactive and web-based visualisations.
- [pandas](https://pandas.pydata.org/docs/) – Offers built-in plotting capabilities via `.plot()`.
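To compare two of these options, the sketch below draws the same bar chart twice: once with matplotlib directly and once through the pandas `.plot()` wrapper. The category counts are made up, and the `Agg` backend is selected only so the snippet runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; not needed in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Made-up project counts per category.
counts = pd.Series({"Film & Video": 5, "Music": 3, "Games": 2}, name="projects")

fig, (ax_left, ax_right) = plt.subplots(1, 2, figsize=(10, 4))

# matplotlib: explicit calls for the bars, title, and axis labels.
ax_left.bar(counts.index, counts.values, color="steelblue")
ax_left.set_title("matplotlib: ax.bar")
ax_left.set_xlabel("Main category")
ax_left.set_ylabel("Number of projects")

# pandas: one call that draws onto the axes we pass in.
counts.plot(kind="bar", ax=ax_right, color="darkorange", title="pandas: Series.plot")
ax_right.set_xlabel("Main category")

fig.tight_layout()
```

In a script, `fig.savefig("projects_by_category.png")` would write the figure to disk; in a notebook the figure renders inline. seaborn and plotly follow a similar pattern of passing a DataFrame and column names to a plotting function.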