# Data Visualization
<li>Data Visualization is the process of presenting data in the form of graphs or charts.</li>
<li>It makes large, complex datasets much easier to understand.</li>
<li>It helps decision-makers act efficiently and identify new trends and patterns.</li>
<li>It is also used in high-level data analysis for Machine Learning and Exploratory Data Analysis (EDA).</li>
<li>At the heart of any data science workflow is data exploration. Most commonly, we explore data by using the following:</li>
<ol>
    <li>Statistical methods (measuring averages, measuring variability, etc.)</li>
    <li>Data visualization (transforming data into a visual form)</li>
</ol>

<li>This indicates that one of the central tasks of data visualization is to help us explore data.</li>
<li>The other central task is to help us communicate and explain the results we've found through exploring data.</li>
<li>That being said, we have two kinds of data visualization:</li>
<ol>
    <li><b>Exploratory data visualization:</b> we build graphs for ourselves to explore data and find patterns.</li>
    <li><b>Explanatory data visualization:</b> we build graphs for others to communicate and explain the patterns we've found through exploring data.</li>
</ol>
<li>Data visualization can be done with various tools, such as Tableau, Power BI, Matplotlib, Plotly, and Seaborn.</li>
<li>Here, we are going to learn about Matplotlib and Seaborn.</li>


## Matplotlib
<li>Matplotlib is a Python library specifically designed for creating visualizations.</li>
<li>Matplotlib is open source and we can use it freely.</li>
<li>It is a cross-platform library for making plots, primarily 2D, from data in arrays.</li>
<li>It provides an object-oriented API that helps in embedding plots in applications using Python GUI toolkits such as PyQt and Tkinter.</li>
<li>It can be used in Python and IPython shells, Jupyter notebooks, and web application servers.</li>
<li>The plots we can create with Matplotlib include:</li>
<ol>
    <li><b>Line Plot</b></li>
    <li><b>Histogram</b></li>
    <li><b>Bar Graph</b></li>
    <li><b>Pie Chart</b></li>
    <li><b>Scatter Plot</b></li>
    <li><b>Box Plot</b></li>
    <li><b>Distribution Plot</b></li>
    <li><b>3D Plot</b></li>
    <li><b>Image</b></li>
</ol>

## Installation Of Matplotlib
<li>Open your terminal, activate your virtual environment, and then run the following command to install Matplotlib.</li>
<code>
    pip install matplotlib
</code>


## Importing Matplotlib
<li>We need to import Matplotlib before we can create any visualizations.</li>
<li>We can import Matplotlib's pyplot module using the following statement:</li>
<code>
import matplotlib.pyplot as plt
</code>

## Coordinate Systems In A Graph

<li>We can create a graph by drawing two lines at right angles to each other.</li>
<li>Each line is called an axis — the horizontal line at the bottom is the x-axis, and the vertical line on the left is the y-axis.</li>
<li>The point where the two lines intersect is called the origin.</li>

![](images/axis_and_origin.png)

<li>Each axis has length — below, we see both axes marked with numbers, which represent unit lengths.</li>

![](images/unit_distance_from_axis.png)

<li>The length of the axes helps us precisely locate any point drawn on the graph.</li>
<li>Point A on the graph below, for instance, is seven length units away from the y-axis and two units away from the x-axis.</li>
<li>The x-coordinate shows the distance in unit lengths relative to the y-axis.</li>
<li>The y-coordinate shows the distance in unit lengths relative to the x-axis.</li>

![](images/coordinate_system.png)
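
As a rough illustration, the point A described above (seven units from the y-axis, two units from the x-axis) can be drawn with Matplotlib. The axis limits of 0 to 10, the variable names, and the label offset are assumptions made for this sketch; they are not taken from the figure.

```python
import matplotlib.pyplot as plt

# Point A: x-coordinate 7 (distance from the y-axis),
# y-coordinate 2 (distance from the x-axis)
a_x = 7
a_y = 2

# Draw the single point and label it "A" slightly above and to the right
point = plt.scatter([a_x], [a_y])
plt.annotate("A", (a_x, a_y), xytext=(5, 5), textcoords="offset points")

# Assumed axis range, chosen only so the origin and the point are both visible
plt.xlim(0, 10)
plt.ylim(0, 10)
plt.grid(True)
plt.show()
```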

## Interpret the following graph

![](images/coordinate_system_question.png)
<ol>
    <li>What is the unit length of the x-axis? Assign your answer to x_unit_length.</li>
    <li>What is the unit length of the y-axis? Assign your answer to y_unit_length.</li>
    <li>What is the x-coordinate of point A? Assign your answer to x_coordinate_A.</li>
    <li>What is the y-coordinate of point B? Assign your answer to y_coordinate_B.</li>
    <li>What are the x- and y-coordinates of point C? Assign your answer as a Python list to C_coordinates; the x-coordinate must come first in your list.</li>
</ol>

## 1. Line Plot
<li>To create a line plot, we need to:</li>
<ol>
    <li>Pass the data to the plt.plot() function.</li>
    <li>The data should be the coordinates of the points we want to plot.</li>
    <li>Pass the x-coordinates and the y-coordinates separately, as two lists.</li>
    <li>Display the plot using the plt.show() function.</li>
    
</ol>

## COVID-19 deaths by month
month_number = [1, 2, 3, 4, 5, 6, 7]

new_deaths = [213, 2729, 37718, 184064, 143119, 136073, 165003]
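
A minimal sketch of the steps above, using the two lists just defined: the month numbers go on the x-axis and the death counts on the y-axis. The variable `lines` is only kept here so the result of `plt.plot()` can be inspected; it is not required for drawing.

```python
import matplotlib.pyplot as plt

month_number = [1, 2, 3, 4, 5, 6, 7]
new_deaths = [213, 2729, 37718, 184064, 143119, 136073, 165003]

# x-coordinates first, y-coordinates second, passed as two separate lists;
# plt.plot() returns a list of the line objects it drew
lines = plt.plot(month_number, new_deaths)

# Display the plot
plt.show()
```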

## Question:

<li>Read cricket_scores.csv and load it into a pandas DataFrame.</li>
<li>Extract the scores and overs for the innings in which New Zealand was batting.</li>
<li>Create a line plot showing the runs New Zealand scored in each over.</li>
<li>Put the overs on the x-axis and the scores on the y-axis.</li>

## Customizing a graph

<li>To give the plot a title, call <b>plt.title()</b> and pass the title text as its argument.</li>
<li>Similarly, we can name the coordinate axes using <b>plt.xlabel()</b> and <b>plt.ylabel()</b>.</li>
<li>To highlight the individual data points on a line plot, pass the <b>marker</b> and <b>markersize</b> parameters to plt.plot().</li>
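
The customizations above could look like the following sketch, which reuses the monthly death counts listed earlier. The title text, axis label text, marker style `"o"`, and marker size `8` are illustrative choices, not values prescribed by this tutorial.

```python
import matplotlib.pyplot as plt

month_number = [1, 2, 3, 4, 5, 6, 7]
new_deaths = [213, 2729, 37718, 184064, 143119, 136073, 165003]

# marker draws a symbol at every data point; markersize controls its size
lines = plt.plot(month_number, new_deaths, marker="o", markersize=8)

# Title and axis labels (the wording here is an assumption for illustration)
plt.title("COVID-19 deaths by month")
plt.xlabel("Month number")
plt.ylabel("New deaths")

plt.show()
```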

## Instructions
<li>Add a title to the New Zealand score-per-over line plot.</li>
<li>Also provide a suitable xlabel and ylabel.</li>
<li>Experiment with the marker and markersize parameters to highlight the score in each over.</li>

## Visualizing time series data using line graph
<li>Time-series data is a sequence of data points collected over time intervals, allowing us to track changes over time.</li>
<li>Time-series data can track changes over milliseconds, days, or even years.</li>
<li>Typically we visualize time series with line graphs. The time values are always plotted, by convention, on the x-axis.</li>
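
As a small sketch of a time-series line graph, the monthly death counts listed earlier can be plotted against real dates instead of month numbers. The year 2020 is an assumption (this document does not state the year), and the axis label text and tick rotation are illustrative choices.

```python
import datetime

import matplotlib.pyplot as plt

new_deaths = [213, 2729, 37718, 184064, 143119, 136073, 165003]

# Assumption: the year is not given in the source; 2020 is used for illustration only
months = [datetime.date(2020, month, 1) for month in range(1, 8)]

# Time values go on the x-axis, the measured quantity on the y-axis
lines = plt.plot(months, new_deaths)
plt.xlabel("Month")
plt.ylabel("New deaths")

# Rotate the date labels so they do not overlap
plt.xticks(rotation=45)
plt.show()
```

Matplotlib accepts `datetime.date` objects directly and formats the x-axis ticks as dates.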

<b>Use 'AirPassengers.csv' data</b>

### Growth over time and their graphs
<li>Generally, a quantity that increases very quickly at first and then slows down more and more over time has logarithmic growth.</li>
<li>Generally, a quantity that increases slowly at first but then grows faster and faster over time has exponential growth.</li>
<li>Generally, a quantity that increases at a constant rate over time has linear growth.</li>
<li>To sum up, these are the three types of growth we've learned in this session:</li>

![](images/types_of_graph.png)
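
To get a feel for these three shapes, the sketch below draws one curve of each kind. The x range and the scaling constants are arbitrary assumptions, chosen only so that the three curves fit on one graph.

```python
import numpy as np
import matplotlib.pyplot as plt

# 50 evenly spaced x values (range chosen arbitrarily for illustration)
x = np.linspace(1, 10, 50)

linear = 2 * x                 # grows by the same amount at every step
exponential = np.exp(x / 2)    # slow at first, then faster and faster
logarithmic = 10 * np.log(x)   # fast at first, then slower and slower

plt.plot(x, linear, label="Linear")
plt.plot(x, exponential, label="Exponential")
plt.plot(x, logarithmic, label="Logarithmic")
plt.xlabel("Time")
plt.ylabel("Quantity")
plt.legend()
plt.show()
```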

<li>Change is not only about growth.</li>
<li>A quantity can also decrease following a linear, exponential, or logarithmic pattern.</li>

![](images/negative_growth_graph.png)



<li>In practice, most of the line graphs we plot don't show any clear pattern.</li>
<li>We need to pay close attention to what we see and try to extract meaning without forcing the data into some patterns we already know.</li>
<li>When we see irregularities on a line graph, this doesn't mean we can't extract any meaning.</li>
<li>By analyzing the irregularities, we can sometimes uncover interesting details.</li>

![](images/irregular_graph.png)



## Comparing graphs
<li>To compare how two quantities change with respect to the same variable, we can draw two line plots on the same graph.</li>
<li>When drawing two line plots on the same graph, pass the label parameter to each plt.plot() call.</li>
<li>You can also give each line its own color using the color parameter of plt.plot().</li>
<li>Call plt.legend() to add a legend to the graph; it only lists the lines for which you supplied a label.</li>
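
The points above can be sketched with the monthly death counts listed earlier. The second series, a running total computed from those counts, is introduced here only to have something to compare against; the label text and colors are likewise illustrative choices.

```python
from itertools import accumulate

import matplotlib.pyplot as plt

month_number = [1, 2, 3, 4, 5, 6, 7]
new_deaths = [213, 2729, 37718, 184064, 143119, 136073, 165003]

# Running total of deaths up to and including each month (derived, not new data)
cumulative_deaths = list(accumulate(new_deaths))

# Two plt.plot() calls draw two lines on the same graph;
# each gets its own label and color
new_lines = plt.plot(month_number, new_deaths, label="New deaths", color="red")
total_lines = plt.plot(month_number, cumulative_deaths, label="Cumulative deaths", color="blue")

plt.xlabel("Month number")
plt.ylabel("Deaths")

# The legend is built from the label values supplied above
plt.legend()
plt.show()
```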
