# Data Visualization

Choosing an appropriate visualization method is important for communicating the results of a data analysis efficiently.
Each visualization method has several chart types associated with it. Let's examine each method and its charts.

In [None]:
import numpy as np 
import pandas as pd 
import matplotlib.pyplot as plt
import seaborn as sns  # used only to download the example dataset

**1. Time visualization**

Data recorded over time captures change and is called time series data.
Time series data is used to track trends that change over time.
Time series data can be divided into continuous and discrete types.

For visualization, we can use a bar chart, a stacked bar chart, or a dot chart.
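
The difference between a continuous and a discrete view of a time series can be sketched as follows. This is a minimal illustration on synthetic data: the `sales` series, its date range, and the weekly aggregation are assumptions made for the example and are not part of the tips dataset used in this notebook.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical data: 60 days of synthetic daily sales with an upward trend.
rng = np.random.default_rng(0)
dates = pd.date_range("2023-01-01", periods=60, freq="D")
sales = pd.Series(100 + np.arange(60) + rng.normal(0, 5, size=60), index=dates)

# Discrete view: aggregate the daily values into weekly totals.
weekly = sales.resample("W").sum()

fig, (ax_line, ax_bar) = plt.subplots(1, 2, figsize=(10, 4))
ax_line.plot(sales.index, sales.values)            # continuous view: a line over time
ax_line.set_title("Daily sales (line)")
ax_bar.bar(weekly.index, weekly.values, width=5)   # discrete view: one bar per week
ax_bar.set_title("Weekly sales (bars)")
fig.autofmt_xdate()                                # tilt the date labels for readability
plt.show()
```

A line suits values that change continuously, while bars suit values that are totalled over separate periods.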


In [None]:
# dataset
df = sns.load_dataset('tips')
df.head()

In [None]:
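# bar chart
tips_for_day = df.groupby('day').tip.sum()

# Take the labels from the grouped index so each bar matches its day;
# df['day'].unique() can return the days in a different order than groupby.
plt.bar(tips_for_day.index.astype(str),
        tips_for_day,
        alpha=0.8,
        width=0.5)
plt.title('Total Tips per Day')
plt.xlabel('Day')
plt.ylabel('Sum of Tips')
plt.show()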

In [None]:
# stacked bar chart
tips_for_day_f = df[df['sex']=='Female'].groupby('day').tip.sum()
tips_for_day_m = df[df['sex']=='Male'].groupby('day').tip.sum()
days = tips_for_day_f.index.astype(str)  # labels in the same order as the grouped sums

p1 = plt.bar(days, tips_for_day_f, color='r')
p2 = plt.bar(days, tips_for_day_m, color='b',
             bottom=tips_for_day_f)  # stack the male totals on top of the female totals

plt.title('Stacked Bar Chart of Sum of Tips by Day & Sex')
plt.ylabel('Sum of Tips')
plt.xlabel('Day')
plt.legend((p1[0], p2[0]), ('Female', 'Male'))
plt.show()

In [None]:
# dot chart
female = df[df['sex']=='Female']
male = df[df['sex']=='Male']

plt.plot(female['total_bill'], female['tip'], 'r^', label='Female')
plt.plot(male['total_bill'], male['tip'], 'bs', label='Male')
plt.title('Dot Chart of Tip vs. Total Bill by Sex')
plt.xlabel('total bill')  # x-axis holds total_bill
plt.ylabel('tip')         # y-axis holds tip
plt.legend()
plt.show()

**2. Distribution visualization**

Distribution data can be described by its maximum, its minimum, and its overall distribution.
With this kind of data, the focus is on how the whole is divided among its parts.
The parts of a distribution sum to 1, or 100%,
so the chart should show how each part relates to the whole.

For visualization, we can use a pie chart, a donut chart, a tree map, or a stacked continuous chart.
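
The stacked continuous chart mentioned above can be sketched with `stackplot`, here read as a stacked area chart. The `counts` table and its category names are hypothetical values chosen for illustration; the point is that each row is converted to shares that sum to 100% before plotting.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical counts per category for four quarters (illustrative numbers only).
counts = pd.DataFrame(
    {"A": [30, 35, 40, 45], "B": [50, 45, 40, 35], "C": [20, 20, 20, 20]},
    index=["Q1", "Q2", "Q3", "Q4"],
)

# Convert each row to percentages so that every quarter sums to 100%.
shares = counts.div(counts.sum(axis=1), axis=0) * 100

fig, ax = plt.subplots()
positions = range(len(shares))
ax.stackplot(positions, shares.T.values, labels=shares.columns)  # one band per category
ax.set_xticks(positions)
ax.set_xticklabels(shares.index)
ax.set_ylabel("Share (%)")
ax.set_title("Stacked area chart of category shares")
ax.legend(loc="upper right")
plt.show()
```

Because every column of the stack reaches 100%, the chart shows how the composition shifts rather than how the totals change.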

In [None]:
# pie chart
# Slices are plotted counter-clockwise, starting from startangle.
sizes = df.groupby('day').day.count()
labels = sizes.index.astype(str)  # labels in the same order as the counts
explode = (0.2, 0, 0, 0)  # only "explode" the 1st slice

fig1, ax1 = plt.subplots()
ax1.pie(sizes,
        explode=explode,
        labels=labels,
        autopct='%1.1f%%',  # print each slice's percentage
        shadow=True,
        startangle=90)
ax1.axis('equal')  # Equal aspect ratio ensures that pie is drawn as a circle.
plt.title('Share of tip records by day')
plt.show()

In [None]:
# donut chart
sizes = df.groupby('day').day.count()
labels = sizes.index.astype(str)  # labels in the same order as the counts

fig1, ax1 = plt.subplots()
ax1.pie(sizes,
        labels=labels,
        autopct='%1.1f%%',
        startangle=90,
        wedgeprops={'width': 0.5},  # a wedge width below 1 leaves the hole in the middle
        pctdistance=0.7  # position of the percentage text, as a fraction of the radius
       )
ax1.axis('equal')
plt.title('Share of tip records by day')
plt.show()

In [None]:
# tree map
# matplotlib has no built-in tree map, so we use the squarify package
import squarify

plt.style.use('default')

sizes = df.groupby('day').day.count()
labels = sizes.index.astype(str)  # labels in the same order as the counts
colors = ['lightgreen', 'cornflowerblue', 'mediumpurple', 'lightcoral']

squarify.plot(sizes, label=labels, color=colors,
              bar_kwargs=dict(linewidth=3, edgecolor="#eee"))
plt.show()

**3. Relationship visualization**

If we know the correlation between two variables, we can predict how one variable changes as the other changes. Relationship visualization helps us see the correlation between two variables.

For visualization, we can use a scatter plot, a bubble chart, or a histogram.
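
The idea of predicting one variable from another can be made concrete with a correlation coefficient and a fitted line. The sketch below uses synthetic data; the variables `x` and `y`, the noise level, and the helper `predict` are hypothetical and serve only to illustrate the reasoning.

```python
import numpy as np

# Hypothetical data: y depends linearly on x, plus random noise.
rng = np.random.default_rng(42)
x = rng.uniform(5, 50, size=200)
y = 1.0 + 0.15 * x + rng.normal(0, 1.0, size=200)

# Pearson correlation coefficient: values near +1 or -1 indicate a strong linear relationship.
r = np.corrcoef(x, y)[0, 1]

# Least-squares line, used to predict y for a new value of x.
slope, intercept = np.polyfit(x, y, 1)

def predict(new_x):
    """Predict y from x using the fitted line (illustrative helper)."""
    return slope * new_x + intercept

print(f"correlation: {r:.2f}")
print(f"predicted y at x=30: {predict(30):.2f}")
```

The stronger the correlation, the more closely the points follow the fitted line and the more useful such a prediction becomes.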

In [None]:
# scatter plot
data = df.sort_values('total_bill')  # sort by x so the line and band are drawn left to right
x = data['total_bill']
y = data['tip']

a, b = np.polyfit(x, y, 1)  # fit a line and estimate its y-values and their error
y_est = a * x + b
resid_std = np.sqrt(np.sum((y - y_est)**2) / (len(x) - 2))  # residual standard error
y_err = resid_std * np.sqrt(1/len(x) +
                            (x - x.mean())**2 / np.sum((x - x.mean())**2))

fig, ax = plt.subplots()
ax.set_title('Relationship between total bill and tip')
ax.set_xlabel('total bill')
ax.set_ylabel('tip')
ax.plot(x, y_est, '-')
ax.fill_between(x, y_est - y_err, y_est + y_err, alpha=0.3)
ax.plot(x, y, 'o')
plt.show()

In [None]:
# bubble chart
x = df['size']
y = df['total_bill']
volume = df['tip']
fig, ax = plt.subplots()
# the marker area grows with the tip amount
ax.scatter(x, y, s=volume*50, c="g", alpha=0.5, label="tip", marker=r'$\clubsuit$')
ax.grid(True)

ax.set_xlabel("party size")
ax.set_ylabel("total_bill")
ax.legend(loc='upper left')
fig.tight_layout()
plt.show()

In [None]:
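# histogram (drawn here as a grouped bar chart of counts per day)
men_count = df[df['sex']=='Male'].groupby('day').day.count()
women_count = df[df['sex']=='Female'].groupby('day').day.count()
labels = men_count.index.astype(str)  # labels in the same order as the counts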

x = np.arange(len(labels))  # the label locations
width = 0.35  # the width of the bars

fig, ax = plt.subplots()
rects1 = ax.bar(x - width/2, men_count, width, label='Men')
rects2 = ax.bar(x + width/2, women_count, width, label='Women')

# Add the axis label, the title, and custom x-axis tick labels.
ax.set_ylabel('Number of tips')
ax.set_title('Number of tips by day and gender')
ax.set_xticks(x)
ax.set_xticklabels(labels)
ax.legend()

ax.bar_label(rects1, padding=3)
ax.bar_label(rects2, padding=3)

fig.tight_layout()
plt.show()
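
The chart above counts records per category. For a continuous variable, a histogram instead groups the values into bins and draws one bar per bin. The following is a minimal sketch on synthetic data: the `bills` array and the choice of 15 bins are assumptions made for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical data: 300 synthetic bill amounts drawn from a right-skewed distribution.
rng = np.random.default_rng(1)
bills = rng.gamma(shape=4.0, scale=5.0, size=300)

# Compute the bin counts explicitly so they can be inspected as numbers.
counts, bin_edges = np.histogram(bills, bins=15)

fig, ax = plt.subplots()
ax.hist(bills, bins=bin_edges, edgecolor="white")  # reuse the same bin edges for the plot
ax.set_xlabel("bill amount")
ax.set_ylabel("number of bills")
ax.set_title("Histogram of bill amounts")
plt.show()
```

Every value falls into exactly one bin, so the bar heights add up to the number of observations.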