# Data Visualization with Python - Lab 

Time to take a closer look at the data yourself!

![national-cancer-institute-ct10qdGv1hQ-unsplash.jpg](attachment:national-cancer-institute-ct10qdGv1hQ-unsplash.jpg) 

[Data Visualization Tips](#DVT)

**Tasks:**
* [Initial setup](#setup)
* [Task 1](#Task1)
* [Task 2](#Task2)
* [Task 3](#Task3)
* [Task 4](#Task4)
* [Task 5](#Task5)
* [Task 6](#Task6)
* [Task 7](#Task7)

## Data Visualization Tips <a class = 'anchor' id = 'DVT' ></a>

### 1. Think about what message you want to give
* Try to tell a story visually
* Use the right representation that conveys your message best

### 2. Choose the right data visualization for your data

Some examples...

* **Bar graphs** are one of the most popular types of data visualizations. They convey a lot of information at a glance and are best for comparing a few values within the same category. For example, comparing the sales of two different products over the years.

* **Line** plots are useful for visualizing the trend in a numerical value over a continuous time interval. They effectively capture the trends and patterns in data and can be used to compare multiple values. An example of such a data visualization would be to show the trend in the monthly income of a company over the last few months.

* **Scatter** plots are useful for showing the relationship between two variables. Correlations between variables and outliers in the data are easy to spot in a scatter plot. For example, a scatter plot can show how the price of a house varies with the size of the living room.

* **Pie charts** are suitable for showing the proportional distribution of items within the same category. Use them sparingly, otherwise they do more harm than good. For example, the percentage of Android users compared to iOS users in a country.

* **Histograms** show the distribution of numeric data over a continuous interval by segmenting the data into bins. For example, visualizing how the order values for a product are distributed.
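The chart types above can be sketched side by side. This is a minimal illustration on made-up numbers, not the lab data: every value and variable name (`sales_a`, `living_area`, `price`, ...) is an assumption chosen for the example. Pie charts are left out to keep the grid at 2x2.

```python
import numpy as np
import matplotlib.pyplot as plt

# All numbers below are made up purely for illustration.
rng = np.random.default_rng(seed=0)
years = np.arange(2018, 2024)
sales_a = np.array([12, 15, 14, 18, 21, 24])
sales_b = np.array([10, 11, 15, 16, 17, 23])
living_area = rng.uniform(40, 200, size=100)
price = 2000 * living_area + rng.normal(0, 30000, size=100)

fig, axes = plt.subplots(2, 2, figsize=(10, 8))

# Bar graph: compare a few values within the same category
width = 0.4
axes[0, 0].bar(years - width / 2, sales_a, width=width, label="Product A")
axes[0, 0].bar(years + width / 2, sales_b, width=width, label="Product B")
axes[0, 0].set_title("Bar: sales per year")
axes[0, 0].legend()

# Line plot: trend of a numerical value over time
axes[0, 1].plot(years, sales_a, marker="o")
axes[0, 1].set_title("Line: sales trend of product A")

# Scatter plot: relationship between two variables
axes[1, 0].scatter(living_area, price, alpha=0.6)
axes[1, 0].set_title("Scatter: price vs. living area")

# Histogram: distribution of one numeric variable, split into bins
axes[1, 1].hist(price, bins=20)
axes[1, 1].set_title("Histogram: price distribution")

fig.tight_layout()
```

In a notebook the figure is displayed when the cell finishes; in a plain script you would add `plt.show()`.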

### 3. Label your data visualizations
* **Labels should be legible**. A label that is not clear is of no use. Therefore, make sure the labels are easy to read and understand.

* **Give a title** to the graph. A suitable title lets viewers get the gist of the graph instantly.

* **Use a legend wisely**. A legend makes it easier to tell the various lines in a graph apart. For line charts, however, try labeling the lines directly, which makes them easier to identify.

* **Label your axes**. Sometimes it might not be clear from the title what the axes represent. Therefore, you might want to label your axes at times.

* **Pay attention to the labeling on the axes**. Sometimes, you don’t need to label all the ticks on the axes. You can instead label them at intervals if they still convey the right message.
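The labeling tips above can be sketched with matplotlib as follows. The data and the names "Company A" and "Company B" are made up for illustration; labeling every third month is just one possible tick interval.

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.ticker import MultipleLocator

# Made-up monthly income figures for two hypothetical companies.
months = np.arange(1, 13)
income_a = np.array([20, 22, 21, 25, 27, 30, 29, 33, 35, 34, 38, 40])
income_b = np.array([18, 18, 20, 21, 21, 24, 26, 25, 28, 30, 31, 33])

fig, ax = plt.subplots(figsize=(8, 4))
for values, name in [(income_a, "Company A"), (income_b, "Company B")]:
    (line,) = ax.plot(months, values)
    # Label the line directly at its right end, in the line's own color,
    # instead of relying on a separate legend.
    ax.annotate(
        name,
        xy=(months[-1], values[-1]),
        xytext=(5, 0),
        textcoords="offset points",
        color=line.get_color(),
        va="center",
    )

ax.set_title("Monthly income (made-up data)")
ax.set_xlabel("Month")
ax.set_ylabel("Income (thousand EUR)")

# Label only every third month instead of every tick.
ax.xaxis.set_major_locator(MultipleLocator(3))
```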

### 4. Reduce clutter through Gestalt Principles

**Enclosure** <br/>
Objects collected within a boundary-like structure are perceived as a group. Placing a line or shading around visual elements on a dashboard signals that the objects within the boundary belong together.

**Similarity** <br/>
When objects share similar attributes, such as color, direction, or shape, they are perceived as being part of a group. The Similarity Principle can help someone more readily identify which group the displayed data belong to.

**Proximity** <br/>
This is the principle that most data visualizers know well. We are accustomed to placing charts and graphs near each other or in some organized arrangement. We usually group filters together, and we place titles near (above) the graph that they describe. It’s common to align text near the object that it describes. The Proximity Principle states that objects arranged close together are perceived as more related than those placed farther apart.

**Symmetry** <br/>
The Symmetry Principle states that symmetrical elements tend to be perceived as belonging together even if they are far apart. Symmetry gives us a sense of solidity and order. If we think about nature, many things have symmetry: our bodies, the shape of a leaf, a butterfly’s wings. Humans are attracted to symmetry mostly because it’s familiar, but also because we are programmed to recognize symmetrical objects such as faces.

**Common Fate** <br/>
Objects functioning or moving in the same direction appear to belong together, that is, they are perceived as a single unit (e.g., a flock of birds).

**Continuity** <br/>
The principle of continuity states that elements that are arranged on a line or curve are perceived to be more related than elements not on the line or curve.


### 5. Keep it as simple as possible, but no simpler
* Remove anything that doesn’t support the story
***


Now let's practice!

## Initial setup <a class = 'anchor' id = 'setup' ></a>

Import all needed libraries.

In [1]:
# You will need the following libraries: pandas, numpy, matplotlib.pyplot, seaborn
# If you want interactive plots, you may also need: plotly.graph_objects, plotly.express

# Import your libraries in this cell


# Press SHIFT+ENTER or press the play button in the toolbar above, to run the piece of code

Import the data

In [2]:
# Import the data from the house_prices.csv file 
# Import the data in this cell


# Press SHIFT+ENTER or press the play button in the toolbar above, to run the piece of code

## Task 1 <a class = 'anchor' id = 'Task1' ></a>

Create a scatterplot with house prices on the x axis and living room sizes on the y axis.

In [3]:
# Code your solution in this cell 


# Always press SHIFT+ENTER or press the play button in the toolbar above, to run the piece of code

## Task 2 <a class = 'anchor' id = 'Task2' ></a>

Visualize the comparison between houses with and houses without balconies. 
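As a hint, a count plot on a binary column could look like the sketch below. It runs on made-up data: the column name `balcony` and its 0/1 encoding are assumptions and may differ in `house_prices.csv`.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Made-up stand-in for the lab data; the column name 'balcony'
# (0 = no balcony, 1 = balcony) is an assumption.
rng = np.random.default_rng(seed=4)
houses = pd.DataFrame({"balcony": rng.integers(0, 2, size=200)})

fig, ax = plt.subplots(figsize=(5, 4))
sns.countplot(data=houses, x="balcony", ax=ax)

# Replace the raw 0/1 tick labels with readable category names.
ax.set_xticks([0, 1])
ax.set_xticklabels(["No balcony", "Balcony"])
ax.set_xlabel("")
ax.set_ylabel("Number of houses")
ax.set_title("Houses with and without a balcony (made-up data)")
```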

In [4]:
# Code your solution in this cell (You could use a Seaborn count plot for this use case...)




## Task 3 <a class = 'anchor' id = 'Task3' ></a>

Show the price distribution of all houses up to 1 million euros. Change the scale of the diagram to make it more visible. 
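One way to read "change the scale" is a logarithmic count axis, which keeps sparsely filled price bins visible. The sketch below shows that idea on made-up data; the column name `price` is an assumption, and a log-scaled x axis would be an equally valid choice.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Made-up stand-in for the lab data; the column name 'price' is an assumption.
rng = np.random.default_rng(seed=1)
houses = pd.DataFrame({"price": rng.lognormal(mean=12.5, sigma=0.6, size=1000)})

# Keep only houses that cost at most 1 million euros.
affordable = houses[houses["price"] <= 1_000_000]

fig, ax = plt.subplots(figsize=(8, 4))
ax.hist(affordable["price"], bins=40)

# A logarithmic count axis makes bins with few houses easier to see.
ax.set_yscale("log")
ax.set_title("House prices up to 1 million euros (made-up data)")
ax.set_xlabel("Price (EUR)")
ax.set_ylabel("Number of houses (log scale)")
```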

In [5]:
# Code your solution in this cell 




## Task 4 <a class = 'anchor' id = 'Task4' ></a>

Create a histogram that illustrates the distribution of the age of the buildings. <br/>
Consider only houses that are not older than 100 years.

In [6]:
# Code your solution in this cell 




## Task 5 <a class = 'anchor' id = 'Task5' ></a>

Visualize the age of houses with alarm systems compared to those without. <br/>
Consider only houses that are 70 years old or less. <br/>
Use boxplots for this purpose. <br/>
(The houses that do not have an alarm system have the value 0 in the variable 'alarm', the houses with an alarm system have the value 1.)
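The filter-then-boxplot pattern described above could be sketched like this. The data is made up, and the column name `age` is an assumption (only `alarm` is named in the task). A single `sns.boxplot` call is used here; `sns.catplot` with `col` or `row` is an alternative.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Made-up stand-in for the lab data; the column name 'age' is an assumption,
# 'alarm' uses the 0/1 encoding described in the task.
rng = np.random.default_rng(seed=2)
houses = pd.DataFrame({
    "age": rng.integers(0, 120, size=500),
    "alarm": rng.integers(0, 2, size=500),
})

# Consider only houses that are 70 years old or less.
recent = houses[houses["age"] <= 70]

fig, ax = plt.subplots(figsize=(6, 4))
sns.boxplot(data=recent, x="alarm", y="age", ax=ax)

# Replace the raw 0/1 tick labels with readable category names.
ax.set_xticks([0, 1])
ax.set_xticklabels(["No alarm", "Alarm"])
ax.set_xlabel("")
ax.set_ylabel("Age of house (years)")
ax.set_title("House age by alarm system (made-up data)")
```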

In [7]:
# Code your solution in this cell - Maybe using the 'col' or 'row' argument in Seaborn could be useful... 




## Task 6 <a class = 'anchor' id = 'Task6' ></a>

Create a heat map of the correlations between the variables of the house data.
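A correlation heat map is usually built in two steps: compute the correlation matrix, then plot it. The sketch below does this on made-up data; the column names `living_area`, `price`, and `age` are assumptions. Selecting only numeric columns first avoids errors if the real data contains text columns.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Made-up stand-in for the lab data; all column names are assumptions.
rng = np.random.default_rng(seed=3)
living_area = rng.uniform(40, 200, size=300)
houses = pd.DataFrame({
    "living_area": living_area,
    "price": 2500 * living_area + rng.normal(0, 40000, size=300),
    "age": rng.integers(0, 100, size=300),
})

# Step 1: pairwise Pearson correlations of the numeric columns.
corr = houses.select_dtypes("number").corr()

# Step 2: draw the matrix; fixing the color range to [-1, 1] keeps
# the colors comparable across different data sets.
fig, ax = plt.subplots(figsize=(6, 5))
sns.heatmap(corr, annot=True, fmt=".2f", cmap="coolwarm", vmin=-1, vmax=1, ax=ax)
ax.set_title("Correlation matrix (made-up data)")
```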

In [8]:
# Code your solution in this cell 




## Task 7 <a class = 'anchor' id = 'Task7' ></a>

Apply Principal Component Analysis (PCA) to the Iris dataset.
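For orientation, the overall flow (scale, fit, project, combine with labels) could look like the sketch below. It is one possible approach, not the required solution: keeping two components and the column names `PC1` and `PC2` are assumptions made for the example.

```python
import pandas as pd
from sklearn import datasets, preprocessing
from sklearn.decomposition import PCA

iris = datasets.load_iris()
X = iris.data    # input: four flower measurements per sample
y = iris.target  # output: species encoded as 0, 1, 2

# Scale every feature to zero mean and unit variance so that no
# single measurement dominates the principal components.
X_scaled = preprocessing.scale(X)

# Keep two components (an assumption) so the result fits a 2D plot.
pca = PCA(n_components=2)
coords = pca.fit_transform(X_scaled)

# Combine the new coordinates with the labels for plotting.
pca_df = pd.DataFrame(coords, columns=["PC1", "PC2"])
labels_df = pd.DataFrame(y, columns=["Types"])
new_coordinates = pd.concat([pca_df, labels_df], axis=1)

# Share of the total variance captured by each kept component.
print(pca.explained_variance_ratio_)
```

The resulting `new_coordinates` frame can then be passed to a scatter plot, with `Types` used for the point colors.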

In [9]:
import pandas as pd
import numpy as np
import random as rd
from sklearn.decomposition import PCA
from sklearn import preprocessing
import matplotlib.pyplot as plt

# Load the library
from sklearn import datasets

# Load the dataset
iris = datasets.load_iris()

In [10]:
# Assigning Input (X) and Output (y) variables




In [11]:
# Scale the data


# Create pca object 


# Build the model with our scaled input data


# Creating coordinates for the PCA Graph 


# Create a DataFrame from the pca data: pca_df

# Put labels in a DataFrame and concatenate with the pca dataframe for the new coordinates to plot
# (This part is done for you)
labels_df = pd.DataFrame(y, columns = ['Types'])

new_coordinates = pd.concat([pca_df, labels_df], axis = 1)
new_coordinates.head()

In [None]:
# Visualize the data with matplotlib or seaborn


