# What is data visualization and why is it important?

**Data visualization is the representation of data or information in a graph, chart, or other visual format. It communicates relationships in the data through images, which makes trends and patterns easier to see. With the rise of big data, we need to interpret increasingly large batches of data. Machine learning makes it easier to conduct analyses such as predictive analysis, and the results can then be presented as visualizations. Data visualization is not only important for data scientists and data analysts; it is useful in almost any career. Whether you work in finance, marketing, tech, design, or anything else, you will need to visualize data.**

# Why do we need data visualization?

**We need data visualization because a visual summary of information makes it easier to identify patterns and trends than looking through thousands of rows on a spreadsheet. It’s the way the human brain works. Since the purpose of data analysis is to gain insights, data is much more valuable when it is visualized. Even if a data analyst can pull insights from data without visualization, it will be more difficult to communicate the meaning without visualization. Charts and graphs make communicating data findings easier even if you can identify the patterns without them.**

**In undergraduate business schools, students are often taught the importance of presenting data findings with visualization. Without a visual representation of the insights, it can be hard for the audience to grasp the true meaning of the findings. For example, rattling off numbers to your boss won’t tell them why they should care about the data, but showing them a graph of how much money the insights could save or make them is far more likely to get their attention.**
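
The point above, that a picture can reveal what a table of numbers hides, can be illustrated with Anscombe's quartet, a classic set of four small datasets. This is a minimal sketch rather than part of the original notebook: the data values are typed in by hand, and the names `quartet` and `summary` are chosen only for this example. The four datasets have nearly identical means and correlations, yet their scatter plots look completely different.

```python
import numpy as np
import matplotlib.pyplot as plt

# Anscombe's quartet: four datasets with almost identical summary statistics
x_common = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
quartet = {
    "I": (x_common, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x_common, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x_common, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

# numeric summary: the four rows look practically the same
summary = {}
for name, (x, y) in quartet.items():
    r = np.corrcoef(x, y)[0, 1]
    summary[name] = (np.mean(x), np.mean(y), r)
    print(f"{name:>3}: mean x = {np.mean(x):.2f}, mean y = {np.mean(y):.2f}, r = {r:.3f}")

# visual summary: one scatter plot per dataset on shared axes
fig, axes = plt.subplots(1, 4, figsize=(16, 4), sharex=True, sharey=True)
for ax, (name, (x, y)) in zip(axes, quartet.items()):
    ax.scatter(x, y)
    ax.set_title(f"Dataset {name}")
    ax.set_xlabel("x")
axes[0].set_ylabel("y")
```

The printed statistics suggest four interchangeable datasets, while the plots show a linear trend, a curve, a line with one outlier, and a vertical cluster with one extreme point.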

# Index:
> 2D Plot
> > 1. Scatter Plot
> > 2. Line Plot
> > 3. Histogram Plot
> > 4. Bar Plot
> > 5. Pie Chart Plot
> > 6. Box Plot
> > 7. Heat Map
> > 8. Faceting
> > 9. Pairplot
> > 10. Time Plot
> > 11. Tree Plot
> > 12. Sunburst Plot
> > 13. Bubble Plot
> > 14. Calendar Plot
> > 15. Violin Plot
> > 16. Folium Map Plot
> > 17. Choropleth
> > 18. Rose Plot         

> 3D plot
> > 1. Surface Plot
> > 2. Scatter Plot
> > 3. Bar Plot
> > 4. Volume Plot

In [None]:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from plotly.offline import iplot, init_notebook_mode
import plotly.offline as pyo
import plotly.graph_objs as go
pyo.init_notebook_mode()
import plotly.express as px
import warnings
warnings.filterwarnings("ignore")
#init_notebook_mode(connected=False)

cp = pd.read_csv("../input/factors-affecting-campus-placement/Placement_Data_Full_Class.csv")
df1 = pd.read_csv("/kaggle/input/house-prices-advanced-regression-techniques/train.csv")
age = pd.read_csv("../input/titanic/train.csv")
sales = pd.read_csv("/kaggle/input/competitive-data-science-predict-future-sales/items.csv")
cp = cp.fillna(0)

In [None]:
import pandas as pd
iris = pd.read_csv('/kaggle/input/iris-flower-dataset/IRIS.csv')
print(iris.head())

In [None]:
wine_reviews = pd.read_csv('/kaggle/input/winemagdata130k/winemag-data-130k-v2.csv', index_col=0)

# Scatter Plot

1. matplotlib

In [None]:
import matplotlib.pyplot as plt

# create a figure and axis
fig, ax = plt.subplots()

# scatter the sepal_length against the sepal_width
ax.scatter(iris['sepal_length'], iris['sepal_width'])
# set a title and labels
ax.set_title('Iris Dataset')
ax.set_xlabel('sepal_length')
ax.set_ylabel('sepal_width')


In [None]:
# create color dictionary
colors = {'Iris-setosa':'r', 'Iris-versicolor':'g', 'Iris-virginica':'b'}
# create a figure and axis
fig, ax = plt.subplots()
# plot all data points in one call, mapping each species to its color
ax.scatter(iris['sepal_length'], iris['sepal_width'], c=iris['species'].map(colors))
# set a title and labels
ax.set_title('Iris Dataset')
ax.set_xlabel('sepal_length')
ax.set_ylabel('sepal_width')

2. Pandas Visualization

In [None]:
iris.plot.scatter(x='sepal_length', y='sepal_width', title='Iris Dataset')

3. seaborn

In [None]:
sns.scatterplot(x='sepal_length', y='sepal_width', data=iris)

In [None]:
sns.scatterplot(x='sepal_length', y='sepal_width', hue='species', data=iris)

# Line Plot

1. matplotlib

In [None]:
# get columns to plot
columns = iris.columns.drop(['species'])
# create x data
x_data = range(0, iris.shape[0])
# create figure and axis
fig, ax = plt.subplots()
# plot each column
for column in columns:
    ax.plot(x_data, iris[column], label=column)
# set title and legend
ax.set_title('Iris Dataset')
ax.legend()

2. Pandas Visualization

In [None]:
iris.drop(['species'], axis=1).plot.line(title='Iris Dataset')

3. seaborn

In [None]:
sns.lineplot(data=iris.drop(['species'], axis=1))

# Histogram Plot

1. matplotlib

In [None]:
# create figure and axis
fig, ax = plt.subplots()
# plot histogram
ax.hist(wine_reviews['points'])
# set title and labels
ax.set_title('Wine Review Scores')
ax.set_xlabel('Points')
ax.set_ylabel('Frequency')

In [None]:
from matplotlib import pyplot as plt
m = cp[cp['gender']=="M"]
f = cp[cp['gender']=="F"]
plt.figure(figsize=(10,6))
ax = m['degree_p'].plot.hist(bins=10 ,color="red")
ax = f['degree_p'].plot.hist(bins=10,color="blue")
plt.legend(['Male', 'Female'])

2. Pandas Visualization

In [None]:
iris.plot.hist(subplots=True, layout=(2,2), figsize=(10, 10), bins=20)

3. seaborn

In [None]:
# distplot is deprecated in recent seaborn releases; histplot is its replacement
sns.histplot(wine_reviews['points'], bins=10, kde=False)

In [None]:
# same histogram with a kernel density estimate overlaid
sns.histplot(wine_reviews['points'], bins=10, kde=True)

# Bar Plot

1. matplotlib

In [None]:
# create a figure and axis 
fig, ax = plt.subplots() 
# count the occurrence of each class 
data = wine_reviews['points'].value_counts() 
# get x and y data 
points = data.index 
frequency = data.values 
# create bar chart 
ax.bar(points, frequency) 
# set title and labels 
ax.set_title('Wine Review Scores') 
ax.set_xlabel('Points') 
ax.set_ylabel('Frequency')

2. Pandas Visualization

In [None]:
wine_reviews['points'].value_counts().sort_index().plot.bar()

In [None]:
wine_reviews['points'].value_counts().sort_index().plot.barh()

In [None]:
wine_reviews.groupby("country").price.mean().sort_values(ascending=False)[:5].plot.bar()

3. seaborn

In [None]:
sns.countplot(x=wine_reviews['points'])

4. plotly

In [None]:
import plotly.express as px
grgs = cp.groupby(["gender","specialisation"])[["salary"]].mean().reset_index()
fig = px.bar(grgs[['gender', 'salary','specialisation']].sort_values('salary', ascending=False), 
             y="salary", x="gender", color='specialisation', 
             log_y=True, template='ggplot2')
fig.show()


# Pie Chart

In [None]:
grdsp = cp.groupby(["degree_t"])[["degree_p"]].mean().reset_index()

fig = px.pie(grdsp,
             values="degree_p",
             names="degree_t",
             template="seaborn")
fig.update_traces(rotation=45, pull=0.03, textinfo="percent+label")
fig.show()

# Box Plot

In [None]:
df = wine_reviews[(wine_reviews['points']>=95) & (wine_reviews['price']<1000)]
sns.boxplot(x='points', y='price', data=df)

In [None]:
plt.figure(figsize=(10,6))
ax = sns.boxplot(x="ssc_b", y="ssc_p", hue="gender",
                 data=cp, palette="Set3")

# Heat Map

In [None]:
import numpy as np

# get correlation matrix of the numeric columns (species is a string column)
corr = iris.drop(columns='species').corr()
fig, ax = plt.subplots()
# create heatmap
im = ax.imshow(corr.values)

# set labels
ax.set_xticks(np.arange(len(corr.columns)))
ax.set_yticks(np.arange(len(corr.columns)))
ax.set_xticklabels(corr.columns)
ax.set_yticklabels(corr.columns)

# Rotate the tick labels and set their alignment.
plt.setp(ax.get_xticklabels(), rotation=45, ha="right",
         rotation_mode="anchor")

In [None]:
# get correlation matrix of the numeric columns (species is a string column)
corr = iris.drop(columns='species').corr()
fig, ax = plt.subplots()
# create heatmap
im = ax.imshow(corr.values)

# set labels
ax.set_xticks(np.arange(len(corr.columns)))
ax.set_yticks(np.arange(len(corr.columns)))
ax.set_xticklabels(corr.columns)
ax.set_yticklabels(corr.columns)

# Rotate the tick labels and set their alignment.
plt.setp(ax.get_xticklabels(), rotation=45, ha="right",
         rotation_mode="anchor")

# Loop over data dimensions and create text annotations.
for i in range(len(corr.columns)):
    for j in range(len(corr.columns)):
        text = ax.text(j, i, np.around(corr.iloc[i, j], decimals=2),
                       ha="center", va="center", color="black")

In [None]:
sns.heatmap(iris.drop(columns='species').corr(), annot=True)

In [None]:
import seaborn as sns
plt.figure(figsize=(15,6))
h=pd.pivot_table(cp,columns='sl_no',values=["salary"])
sns.heatmap(h,cmap=['skyblue','red','green'],linewidths=0.05)


# Faceting

In [None]:
g = sns.FacetGrid(iris, col='species')
g = g.map(sns.kdeplot, 'sepal_length')

# Pairplot

1. seaborn

In [None]:
sns.pairplot(iris)

2. Pandas Visualization

In [None]:
from pandas.plotting import scatter_matrix

fig, ax = plt.subplots(figsize=(12,12))
scatter_matrix(iris, alpha=1, ax=ax)

# Time Plot

In [None]:
corona_data=pd.read_csv('/kaggle/input/novel-corona-virus-2019-dataset/covid_19_data.csv')
choro_map=px.choropleth(corona_data, 
                    locations="Country/Region", 
                    locationmode = "country names",
                    color="Confirmed", 
                    hover_name="Country/Region", 
                    animation_frame="ObservationDate"
                   )

choro_map.update_layout(
    title_text = 'Global Spread of Coronavirus',
    title_x = 0.5,
    geo=dict(
        showframe = False,
        showcoastlines = False,
    ))
    
choro_map.show()

# Tree Plot 

In [None]:
covid_India_cases = pd.read_csv('../input/covid19-in-india/covid_19_india.csv')
covid_India_cases.rename(columns={'State/UnionTerritory': 'State', 'Cured': 'Recovered'}, inplace=True)

# take the maximum cumulative counts per state (select columns with a list, not a tuple)
statewise_cases = covid_India_cases.groupby('State')[['Confirmed', 'Deaths', 'Recovered']].max().reset_index()
statewise_cases["Country"] = "India" # in order to have a single root node
fig = px.treemap(statewise_cases, path=['Country','State'], values='Confirmed',
                  color='Confirmed', hover_data=['State'],
                  color_continuous_scale='Rainbow')
fig.show()

# Sunburst Plot

In [None]:
sales = sales.tail(20)
fig = px.sunburst(sales, path=["item_category_id",'item_id'],
                  color='item_category_id', hover_data=['item_id'],
                  color_continuous_scale='thermal')
fig.show()


# Bubble Plot

In [None]:
pip install bubbly

In [None]:
pip install chart-studio

In [None]:
m = pd.read_csv("../input/global-hospital-beds-capacity-for-covid19/hospital_beds_USA_v1.csv")
from bubbly.bubbly import bubbleplot 
from plotly.offline import iplot
import chart_studio.plotly as py


figure = bubbleplot(dataset=m, x_column='beds', y_column='population', 
    bubble_column='state', size_column='beds', color_column='type', 
    x_logscale=True, scale_bubble=2, height=350)

iplot(figure)

# Calendar Plot

In [None]:
pip install calmap

In [None]:
import calmap
import numpy as np
f = plt.figure(figsize=(20,10))
all_days = pd.date_range('1/1/2019', periods=700, freq='D')
days = np.random.choice(all_days, 100)
events = pd.Series(np.random.randn(len(days)), index=days)
calmap.yearplot(events, year=2020)

# Violin Plot

In [None]:
plt.figure(figsize=(10,6))
ax = sns.violinplot(x="degree_t", y="salary", hue="specialisation",
                    data=cp, palette="muted")

# Folium Map

In [None]:
m = pd.read_csv("../input/global-hospital-beds-capacity-for-covid19/hospital_beds_USA_v1.csv")

import folium
# avoid shadowing the built-in names `map` and `type`
us_map = folium.Map(location=[37.0902, -95.7129], zoom_start=4, tiles='cartodbpositron')

# add one circle marker per record, with the state shown in the popup
for lat, lon, state in zip(m['lat'], m['lng'], m['state']):
    folium.CircleMarker([lat, lon],
                        radius=5,
                        color='red',
                        popup='State: ' + str(state) + '<br>',
                        fill_color='red',
                        fill_opacity=0.7).add_to(us_map)
us_map

# Choropleth

In [None]:
pyo.init_notebook_mode()
fig = px.choropleth(m, locations=m["state"],
                    color=m["beds"],
                    locationmode="USA-states",
                    scope="usa",
                    color_continuous_scale='Reds',
                   )

fig.show()



# Rose Plot

In [None]:
import plotly.express as px
df = px.data.wind()
fig = px.bar_polar(df, r="frequency", theta="direction",
                   color="strength", template="plotly_dark",
                   color_discrete_sequence= px.colors.sequential.Plasma_r)
fig.show()

In [None]:
import plotly.graph_objects as go
pyo.init_notebook_mode()

fig = go.Figure()

fig.add_trace(go.Barpolar(
    r=[77.5, 72.5, 70.0, 45.0, 22.5, 42.5, 40.0, 62.5],
    name='11-14 m/s',
    marker_color='rgb(106,81,163)'
))
fig.add_trace(go.Barpolar(
    r=[57.5, 50.0, 45.0, 35.0, 20.0, 22.5, 37.5, 55.0],
    name='8-11 m/s',
    marker_color='rgb(158,154,200)'
))
fig.add_trace(go.Barpolar(
    r=[40.0, 30.0, 30.0, 35.0, 7.5, 7.5, 32.5, 40.0],
    name='5-8 m/s',
    marker_color='rgb(203,201,226)'
))
fig.add_trace(go.Barpolar(
    r=[20.0, 7.5, 15.0, 22.5, 2.5, 2.5, 12.5, 22.5],
    name='< 5 m/s',
    marker_color='rgb(242,240,247)'
))

fig.update_traces(text=['North', 'N-E', 'East', 'S-E', 'South', 'S-W', 'West', 'N-W'])
fig.update_layout(
    title='Wind Speed Distribution in Laurel, NE',
    font_size=16,
    legend_font_size=16,
    polar_radialaxis_ticksuffix='%',
    polar_angularaxis_rotation=90,

)
fig.show()

# Chord Diagram

In [None]:
pip install chord

In [None]:
matrix = [
    [0, 5, 6, 4, 7, 4],
    [5, 0, 5, 4, 6, 5],
    [6, 5, 0, 4, 5, 5],
    [4, 4, 4, 0, 5, 5],
    [7, 6, 5, 5, 0, 4],
    [4, 5, 5, 5, 4, 0],
]

names = ["Action", "Adventure", "Comedy", "Drama", "Fantasy", "Thriller"]

In [None]:
from chord import Chord
ax=Chord(matrix, names)
ax.show()

In [None]:
#get this graph from output file
Chord(matrix, names, wrap_labels=False, label_color="#4c40bf").to_html()

# 3D plot

## Surface Plot

In [None]:
from mpl_toolkits import mplot3d

import numpy as np
import matplotlib.pyplot as plt

fig = plt.figure()
ax = plt.axes(projection="3d")

plt.show()

In [None]:
fig = plt.figure()
ax = plt.axes(projection="3d")

z_line = np.linspace(0, 15, 1000)
x_line = np.cos(z_line)
y_line = np.sin(z_line)
ax.plot3D(x_line, y_line, z_line, 'gray')

z_points = 15 * np.random.random(100)
x_points = np.cos(z_points) + 0.1 * np.random.randn(100)
y_points = np.sin(z_points) + 0.1 * np.random.randn(100)
ax.scatter3D(x_points, y_points, z_points, c=z_points, cmap='hsv');

plt.show()

In [None]:
# define the surface to plot; the figure itself is created after the grid is built
def z_function(x, y):
    return np.sin(np.sqrt(x ** 2 + y ** 2))

x = np.linspace(-6, 6, 30)
y = np.linspace(-6, 6, 30)

X, Y = np.meshgrid(x, y)
Z = z_function(X, Y)

fig = plt.figure()
ax = plt.axes(projection="3d")
ax.plot_wireframe(X, Y, Z, color='green')
ax.set_xlabel('x')
ax.set_ylabel('y')
ax.set_zlabel('z')

plt.show()

In [None]:
ax = plt.axes(projection='3d')
ax.plot_surface(X, Y, Z, rstride=1, cstride=1,
                cmap='winter', edgecolor='none')
ax.set_title('surface');

In [None]:
from mpl_toolkits.mplot3d import Axes3D
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
 
# Get the data (csv file is hosted on the web)
url = 'https://python-graph-gallery.com/wp-content/uploads/volcano.csv'
data = pd.read_csv(url)
 
# Transform it to a long format
df=data.unstack().reset_index()
df.columns=["X","Y","Z"]
 
# And transform the old column name in something numeric
df['X']=pd.Categorical(df['X'])
df['X']=df['X'].cat.codes
 
# Make the plot
fig = plt.figure()
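# fig.gca(projection=...) was removed in recent matplotlib releases; add_subplot is the supported call
ax = fig.add_subplot(projection='3d')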
ax.plot_trisurf(df['Y'], df['X'], df['Z'], cmap=plt.cm.viridis, linewidth=0.2)
plt.show()
 
# to Add a color bar which maps values to colors.
surf=ax.plot_trisurf(df['Y'], df['X'], df['Z'], cmap=plt.cm.viridis, linewidth=0.2)
fig.colorbar( surf, shrink=0.5, aspect=5)
plt.show()
 
# Rotate it
ax.view_init(30, 45)
plt.show()
 
# Other palette
ax.plot_trisurf(df['Y'], df['X'], df['Z'], cmap=plt.cm.jet, linewidth=0.01)
plt.show()

## Scatter Plot

In [None]:
from mpl_toolkits.mplot3d import Axes3D
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
 
# Dataset
df=pd.DataFrame({'X': range(1,101), 'Y': np.random.randn(100)*15+range(1,101), 'Z': (np.random.randn(100)*15+range(1,101))*2 })
 
# plot
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.scatter(df['X'], df['Y'], df['Z'], c='skyblue', s=60)
ax.view_init(30, 185)
plt.show()

In [None]:
import pandas as pd
import numpy as np
from sklearn.decomposition import PCA
 
# Get the iris dataset
import seaborn as sns
sns.set_style("white")
df = sns.load_dataset('iris')
 
my_dpi=96
plt.figure(figsize=(480/my_dpi, 480/my_dpi), dpi=my_dpi)
 
# Keep the 'species' column apart and make it numeric for coloring
df['species']=pd.Categorical(df['species'])
my_color=df['species'].cat.codes
df = df.drop(columns='species')
 
# Run The PCA
pca = PCA(n_components=3)
pca.fit(df)
 
# Store results of PCA in a data frame
result=pd.DataFrame(pca.transform(df), columns=['PCA%i' % i for i in range(3)], index=df.index)
 
# Plot initialisation
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.scatter(result['PCA0'], result['PCA1'], result['PCA2'], c=my_color, cmap="Set2_r", s=60)
 
# make simple, bare axis lines through space:
xAxisLine = ((min(result['PCA0']), max(result['PCA0'])), (0, 0), (0,0))
ax.plot(xAxisLine[0], xAxisLine[1], xAxisLine[2], 'r')
yAxisLine = ((0, 0), (min(result['PCA1']), max(result['PCA1'])), (0,0))
ax.plot(yAxisLine[0], yAxisLine[1], yAxisLine[2], 'r')
zAxisLine = ((0, 0), (0,0), (min(result['PCA2']), max(result['PCA2'])))
ax.plot(zAxisLine[0], zAxisLine[1], zAxisLine[2], 'r')
 
# label the axes
ax.set_xlabel("PC1")
ax.set_ylabel("PC2")
ax.set_zlabel("PC3")
ax.set_title("PCA on the iris data set")
#plt.show()

## Bar Plot 

In [None]:
import random
fig = plt.figure()
ax = plt.axes(projection="3d")

num_bars = 15
x_pos = random.sample(range(20), num_bars)
y_pos = random.sample(range(20), num_bars)
z_pos = [0] * num_bars
x_size = np.ones(num_bars)
y_size = np.ones(num_bars)
z_size = random.sample(range(20), num_bars)

ax.bar3d(x_pos, y_pos, z_pos, x_size, y_size, z_size, color='aqua')
plt.show()

## Volume Plot

In [None]:
init_notebook_mode(connected=False)
import plotly.graph_objects as go
import numpy as np
X, Y, Z = np.mgrid[-8:8:40j, -8:8:40j, -8:8:40j]
values = np.sin(X*Y*Z) / (X*Y*Z)

fig = go.Figure(data=go.Volume(
    x=X.flatten(),
    y=Y.flatten(),
    z=Z.flatten(),
    value=values.flatten(),
    isomin=0.1,
    isomax=0.8,
    opacity=0.1, # needs to be small to see through all surfaces
    surface_count=17, # needs to be a large number for good volume rendering
    ))
fig.show()

# Conclusion 
Effective data visualization is the crucial final step of data analysis. Without it, important insights and messages can be lost.
Some guidelines can help improve the visual quality of routine, everyday designs. Attractive displays of statistical information:
*  have a properly chosen format and design 
*  use words, numbers, and drawing together 
*  reflect a balance, a proportion, a sense of relevant scale 
*  display an accessible complexity of detail 
*  often have a narrative quality, a story to tell about the data 
*  are drawn in a professional manner, with the technical details of production done with care 
*  avoid content-free decoration, including chart junk.
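
A few of the guidelines above can be sketched in code. The example below is only an illustration under stated assumptions: the monthly revenue figures are invented, and the styling choices (removing the top and right spines, starting the y-axis at zero, adding one annotation) are one reasonable way to apply the advice, not the only one.

```python
import matplotlib.pyplot as plt

# Hypothetical monthly figures, invented purely for illustration
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun"]
revenue = [12.1, 12.8, 13.0, 15.9, 16.4, 17.2]  # thousand USD
x = list(range(len(months)))

fig, ax = plt.subplots(figsize=(8, 4))
ax.plot(x, revenue, marker="o", color="tab:blue")
ax.set_xticks(x)
ax.set_xticklabels(months)

# words, numbers, and drawing together: title, labels with units, one annotation
ax.set_title("Monthly revenue, first half of the year")
ax.set_xlabel("Month")
ax.set_ylabel("Revenue (thousand USD)")
ax.annotate("Jump in April (hypothetical event)",
            xy=(3, revenue[3]), xytext=(0.2, 17.5),
            arrowprops=dict(arrowstyle="->"))

# a sense of relevant scale: start the y-axis at zero
ax.set_ylim(0, 20)

# avoid content-free decoration: drop the spines that carry no information
ax.spines["top"].set_visible(False)
ax.spines["right"].set_visible(False)
```

The same data plotted with default settings and no labels would be technically correct, but the reader would have to guess the units, the time span, and why April matters.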