# Data Visualization 

Python is a popular programming language with mature libraries such as _**NumPy**_ and _**Pandas**_ for manipulating data, and _**Matplotlib**_ and _**Seaborn**_ for visualizing it. With them we can make many types of visualizations, such as bar graphs, line graphs, boxplots, and histograms.

Data visualization is about loading, simplifying, and cleaning data, augmenting it (when it is not rich enough), and understanding it on a more intuitive level.

First of all, we need the data in a plottable form. NumPy and Pandas can be used for this purpose: with these libraries we can efficiently load, store, manipulate, and export data.
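
As a minimal sketch of that workflow, the snippet below manipulates a few values with NumPy, stores them in a Pandas DataFrame, and exports them as CSV text. The numbers are taken from the car-loan dataset used later in this section; the variable names are illustrative assumptions, not part of the original notebook.

```python
import numpy as np
import pandas as pd

# Values taken from the first three rows of the car-loan dataset.
starting_balance = np.array([34689.96, 34205.66, 33718.53])
principal_paid = np.array([484.30, 487.13, 489.98])

# Manipulate: vectorized arithmetic on whole arrays, no explicit loop.
new_balance = np.round(starting_balance - principal_paid, 2)

# Store: wrap the arrays in a DataFrame with named columns.
loan = pd.DataFrame({
    "Starting Balance": starting_balance,
    "Principal Paid": principal_paid,
    "New Balance": new_balance,
})

# Export: to_csv returns the CSV text when no file path is given.
csv_text = loan.to_csv(index=False)
print(csv_text)
```

Passing a file path to `to_csv` would write the same content to disk instead of returning it as a string.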

Matplotlib and Seaborn are very popular Python plotting libraries. While the Matplotlib API is relatively low-level, the Seaborn API is high-level: it is built on top of Matplotlib and produces statistical graphics with less code.

### Data Loading

We can load data from a CSV file or an Excel file using the Pandas library. First, we import Pandas. Then, we read our data with Pandas' read_csv function, which takes the file path or URL of a CSV file as its argument (Excel files are read with read_excel instead).

In [1]:
import pandas as pd
file_url = 'https://drive.google.com/uc?id=1_0F4v5dven3QQ9QgmJieoxmTJi6mwjT5' 
dataset = pd.read_csv(file_url)
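
The same call also works on any file-like object, which is handy for trying things out without a network connection. The sketch below is only an illustration: it assumes the file's header matches the column names shown in this section, and it uses an in-memory string holding two rows as a hypothetical stand-in for the real file.

```python
import io
import pandas as pd

# Hypothetical stand-in for the CSV file: an assumed header plus two data rows.
csv_text = """Month,Starting Balance,Repayment,Interest Paid,Principal Paid,New Balance,term,interest_rate,car_type
1,34689.96,687.23,202.93,484.3,34205.66,60,0.0702,Toyota Sienna
2,34205.66,687.23,200.1,487.13,33718.53,60,0.0702,Toyota Sienna
"""

# read_csv accepts a path, a URL, or a file-like object such as StringIO.
sample = pd.read_csv(io.StringIO(csv_text))

# usecols restricts loading to the named columns.
subset = pd.read_csv(io.StringIO(csv_text),
                     usecols=["Month", "Interest Paid", "car_type"])

print(sample.shape)
print(list(subset.columns))
```

Loading only the columns you need with `usecols` can save memory on larger files.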

Let's look at a small part of our dataset to figure out what kind of data we have. Rows hold observations and columns hold features. The row labels are called the index and, by default, start at zero; the column labels are called headers.

In [2]:
dataset.head(5)  # show the first 5 rows

| | Month | Starting Balance | Repayment | Interest Paid | Principal Paid | New Balance | term | interest_rate | car_type |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 34689.96 | 687.23 | 202.93 | 484.3 | 34205.66 | 60 | 0.0702 | Toyota Sienna |
| 1 | 2 | 34205.66 | 687.23 | 200.1 | 487.13 | 33718.53 | 60 | 0.0702 | Toyota Sienna |
| 2 | 3 | 33718.53 | 687.23 | 197.25 | 489.98 | 33228.55 | 60 | 0.0702 | Toyota Sienna |
| 3 | 4 | 33228.55 | 687.23 | 194.38 | 492.85 | 32735.7 | 60 | 0.0702 | Toyota Sienna |
| 4 | 5 | 32735.7 | 687.23 | 191.5 | 495.73 | 32239.97 | 60 | 0.0702 | Toyota Sienna |


In [3]:
dataset.tail(5)  # show the last 5 rows

| | Month | Starting Balance | Repayment | Interest Paid | Principal Paid | New Balance | term | interest_rate | car_type |
|---|---|---|---|---|---|---|---|---|---|
| 403 | 56 | 3951.11 | 796.01 | 9.54 | 786.47 | 3164.64 | 60 | 0.029 | VW Golf R |
| 404 | 57 | 3164.64 | 796.01 | 7.64 | 788.37 | 2376.27 | 60 | 0.029 | VW Golf R |
| 405 | 58 | 2376.27 | 796.01 | 5.74 | 790.27 | 1586.0 | 60 | 0.029 | VW Golf R |
| 406 | 59 | 1586.0 | 796.01 | 3.83 | 792.18 | 793.82 | 60 | 0.029 | VW Golf R |
| 407 | 60 | 793.82 | 796.01 | 1.91 | 794.1 | -0.28 | 60 | 0.029 | VW Golf R |


Each tabular view above is a Pandas _**DataFrame**_, the structure into which read_csv loaded our data. Now we can examine whether the data was loaded correctly and is valid. We also look at the last rows to check that they have the same format as the first rows.
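
The row labels and column headers of a DataFrame can also be inspected directly. The following is a small sketch on a hand-built sample (three rows and three columns taken from the table above), not on the full dataset.

```python
import pandas as pd

# A small sample built by hand from values in the table above.
sample = pd.DataFrame({
    "Month": [1, 2, 3],
    "Repayment": [687.23, 687.23, 687.23],
    "car_type": ["Toyota Sienna"] * 3,
})

print(sample.index)                # row labels: a default index starting at 0
print(list(sample.columns))        # column headers
print(sample.loc[0, "Repayment"])  # one cell, selected by row label and column name
```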

For example, this data shows how much interest you pay each month, and from it you can work out the total interest paid over the months for a particular car. But it is hard to see this information just by looking at the table. That's why we need visualizations.
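
Before plotting anything, totals like these can be computed with a group-by. The sketch below uses only six rows taken from the tables above, so the sums cover those rows rather than the full loan term; the column name "Cumulative Interest" is an assumption introduced for illustration.

```python
import pandas as pd

# Six rows taken from the tables above (three per car).
sample = pd.DataFrame({
    "Month": [1, 2, 3, 58, 59, 60],
    "Interest Paid": [202.93, 200.10, 197.25, 5.74, 3.83, 1.91],
    "car_type": ["Toyota Sienna"] * 3 + ["VW Golf R"] * 3,
})

# Total interest per car over the rows in the sample.
total_interest = sample.groupby("car_type")["Interest Paid"].sum().round(2)

# Running total of interest per car, in row order.
sample["Cumulative Interest"] = (
    sample.groupby("car_type")["Interest Paid"].cumsum().round(2)
)

print(total_interest)
print(sample)
```

Applied to the full `dataset`, the same two lines would give the total and month-by-month cumulative interest for every car.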

We should verify our dataset. The Pandas library provides several methods and attributes for this. For example:

In [4]:
dataset.shape  # to see the shape of our dataset

(408, 9)

In [5]:
dataset.dtypes  # to check the column data types

Month                 int64
Starting Balance    float64
Repayment           float64
Interest Paid       float64
Principal Paid      float64
New Balance         float64
term                  int64
interest_rate       float64
car_type             object
dtype: object

In [6]:
dataset.info()  # shows the number of non-null values in each column

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 408 entries, 0 to 407
Data columns (total 9 columns):
Month               408 non-null int64
Starting Balance    408 non-null float64
Repayment           408 non-null float64
Interest Paid       408 non-null float64
Principal Paid      408 non-null float64
New Balance         408 non-null float64
term                408 non-null int64
interest_rate       408 non-null float64
car_type            408 non-null object
dtypes: float64(6), int64(2), object(1)
memory usage: 28.8+ KB


The last check is important because null values are often a problem for data analysis and visualization tasks. Here every column has 408 non-null values for 408 rows, so nothing is missing. If a row did contain null values, we could remove or modify that row.
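
Since this dataset has no missing values, the sketch below uses a hypothetical sample with one gap inserted on purpose. It shows one way to count nulls, remove the affected rows, or fill the gap; filling with the column mean is an assumption chosen for illustration, not a recommendation for every dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: one "Interest Paid" value is deliberately missing.
sample = pd.DataFrame({
    "Month": [1, 2, 3],
    "Interest Paid": [202.93, np.nan, 197.25],
    "car_type": ["Toyota Sienna"] * 3,
})

# Count the null values in each column.
missing_per_column = sample.isnull().sum()

# Option 1: remove every row that contains a null value.
dropped = sample.dropna()

# Option 2: modify the row by filling the gap, here with the column mean.
filled = sample.fillna({"Interest Paid": sample["Interest Paid"].mean()})

print(missing_per_column)
print(dropped)
print(filled)
```

Both `dropna` and `fillna` return new DataFrames and leave `sample` unchanged.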