>In this notebook, I work on data visualization with Plotly. If you find it useful, please upvote, comment, and share it with others.

# Introduction
A video game is an electronic game that involves interaction with a user interface or input device – such as a joystick, controller, keyboard, or motion sensing device – to generate visual feedback for a player. Video games are defined by their platform, which includes arcade games, console games, and PC games. They are also classified into a wide range of genres based on their type of gameplay and purpose. (Source: Wikipedia)

1. [Loading Data and Explanation of Features](#1)
2. [Visualization with Plotly](#2)
    * [Line Charts](#21)
    * [Scatter Charts](#22)
    * [Bar Charts](#23)
    * [Pie Charts](#24)
    * [Bubble Charts](#25)
    * [Histogram](#26)
    * [Word Cloud](#27)
    * [Box Plot](#28)
    * [Scatter Plot Matrix](#29)
    * [3D Scatter Plot with Colorscaling](#210)
    * [Multiple Subplots](#211)


In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

import missingno as mn
from collections import Counter

# import plotly.plotly as py
from plotly.offline import init_notebook_mode, iplot, plot
import plotly as py
init_notebook_mode(connected=True)
import plotly.graph_objs as go

# word cloud library
from wordcloud import WordCloud

import warnings
warnings.filterwarnings('ignore')

# matplotlib
import matplotlib.pyplot as plt
# Input data files are available in the "../input/" directory.
# For example, running this (by clicking run or pressing Shift+Enter) will list the files in the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 20GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

<a id=1></a>
# Loading Data and Explanation of Features

In [None]:
game_data = pd.read_csv("../input/videogamesales/vgsales.csv")

In [None]:
game_data.columns

* Rank - Ranking of overall sales
 
* Name - The game's name
 
* Platform - Platform of the game's release (e.g. PC, PS4)
 
* Year - Year of the game's release
 
* Genre - Genre of the game
 
* Publisher - Publisher of the game
 
* NA_Sales - Sales in North America (in millions)
 
* EU_Sales - Sales in Europe (in millions)
 
* JP_Sales - Sales in Japan (in millions)
 
* Other_Sales - Sales in the rest of the world (in millions)

* Global_Sales - Total worldwide sales.

In [None]:
#Statistical Info on Numeric Values
game_data.describe()

In [None]:
game_data.head()

In [None]:
game_data.info()

Some values in this dataset are null, so we need to handle them first.

In [None]:
# mn --> missingno
mn.bar(game_data)
plt.show()

In [None]:
game_data.isnull().sum()

271 of the 16598 Year values and 58 of the 16598 Publisher values are null. We could either look up the missing values or remove the affected rows. In this case, I will simply remove these rows.

In [None]:
game_data = game_data.dropna()
#check null values
game_data.isnull().sum()
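
The other option mentioned above, filling the gaps instead of dropping rows, can be sketched on a small made-up frame. Only the column names mirror the dataset; the rows, the `"Unknown"` placeholder, and the choice of the median are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Toy frame with made-up rows; only the column names mirror vgsales.csv.
toy = pd.DataFrame({
    "Name": ["Game A", "Game B", "Game C", "Game D"],
    "Year": [2006.0, np.nan, 2009.0, 2009.0],
    "Publisher": ["Pub X", "Pub Y", None, "Pub X"],
})

# Option 1: drop every row that has at least one null value.
dropped = toy.dropna()

# Option 2: keep all rows and fill the gaps instead.
# "Unknown" is a hypothetical placeholder label, not a value from the dataset.
filled = toy.copy()
filled["Publisher"] = filled["Publisher"].fillna("Unknown")
filled["Year"] = filled["Year"].fillna(filled["Year"].median())

print(len(dropped), len(filled))
print(filled.isnull().sum().sum())
```

Dropping is simpler and avoids inventing values, while filling keeps every row at the cost of adding guesses to the data.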

In [None]:
# Convert the "Year" column to int

game_data.Year = game_data.Year.astype('int')
game_data.info()
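
The conversion only works once the nulls are gone, presumably because pandas stores a numeric column containing NaN as float. A minimal sketch with made-up years (not taken from the dataset):

```python
import numpy as np
import pandas as pd

# Made-up years; the NaN forces pandas to store the values as float64.
years = pd.Series([2006, np.nan, 2009], name="Year")
print(years.dtype)

# Casting while a NaN is still present fails, which is why the nulls go first.
try:
    years.astype("int")
    cast_failed = False
except ValueError:
    cast_failed = True

# Once the null is dropped, the remaining values convert cleanly.
clean_years = years.dropna().astype("int")
print(cast_failed, clean_years.tolist())
```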

<a id=2></a>
# Visualization with Plotly

<a id=21></a>
## Line Charts 

In [None]:
game_data.info()

In [None]:
#select only first 100 rows
df = game_data.iloc[:100,:]

trace1 = go.Scatter(
                    x = df.Rank,
                    y = df.NA_Sales,
                    mode = "lines",
                    name = "NA_Sales",
                    marker = dict(color = 'rgba(147, 112, 219, 0.8)'),
                    text= df.Name)

trace2 = go.Scatter(
                    x = df.Rank,
                    y = df.EU_Sales,
                    mode = "lines+markers",
                    name = "EU_Sales",
                    marker = dict(color = 'rgba(250, 128, 114, 0.8)'),
                    text= df.Name)

data = [trace1, trace2]
layout = dict(title = 'NA and EU Sales vs Rank of Top 100 Games',
              xaxis= dict(title= 'Rank',ticklen= 5,zeroline= False)
             )
fig = dict(data = data, layout = layout)
iplot(fig)

<a id=22></a>
## Scatter Charts 

In [None]:
game_data.info()

In [None]:
#select some data from game_data
df2017 = game_data[game_data.Year == 2017].iloc[:100,:]
df2007 = game_data[game_data.Year == 2007].iloc[:100,:]
df1997 = game_data[game_data.Year == 1997].iloc[:100,:]

trace1 =go.Scatter(
                    x = df2017.Genre,
                    y = df2017.NA_Sales,
                    mode = "markers",
                    name = "2017",
                    marker = dict(color = 'rgba(255, 128, 255, 0.8)'),
                    text= df2017.Name)

trace2 =go.Scatter(
                    x = df2007.Genre,
                    y = df2007.NA_Sales,
                    mode = "markers",
                    name = "2007",
                    marker = dict(color = 'rgba(255, 128, 2, 0.8)'),
                    text= df2007.Name)

trace3 =go.Scatter(
                    x = df1997.Genre,
                    y = df1997.NA_Sales,
                    mode = "markers",
                    name = "1997",
                    marker = dict(color = 'rgba(0, 255, 200, 0.8)'),
                    text= df1997.Name)

data = [trace1, trace2, trace3]
layout = dict(title = 'Genre vs NA_Sales of Top 100 Games in 2017, 2007 and 1997',
              xaxis= dict(title= 'Genre',ticklen= 5,zeroline= False),
              yaxis= dict(title= 'NA_Sales',ticklen= 5,zeroline= False)
             )
fig = dict(data = data, layout = layout)
iplot(fig)

In [None]:
# same three years, this time Publisher vs Genre
df2017 = game_data[game_data.Year == 2017].iloc[:100,:]
df2007 = game_data[game_data.Year == 2007].iloc[:100,:]
df1997 = game_data[game_data.Year == 1997].iloc[:100,:]

trace1 =go.Scatter(
                    x = df2017.Publisher,
                    y = df2017.Genre,
                    mode = "markers",
                    name = "2017",
                    marker = dict(color = 'rgba(255, 128, 255, 0.8)'),
                    text= df2017.Name)
trace2 =go.Scatter(
                    x = df2007.Publisher,
                    y = df2007.Genre,
                    mode = "markers",
                    name = "2007",
                    marker = dict(color = 'rgba(255, 128, 2, 0.8)'),
                    text= df2007.Name)
trace3 =go.Scatter(
                    x = df1997.Publisher,
                    y = df1997.Genre,
                    mode = "markers",
                    name = "1997",
                    marker = dict(color = 'rgba(0, 255, 200, 0.8)'),
                    text= df1997.Name)
data = [trace1, trace2, trace3]
layout = dict(title = 'Publisher vs Genre of Top 100 Games in 2017, 2007 and 1997',
              xaxis= dict(title= 'Publisher',ticklen= 5,zeroline= False),
              yaxis= dict(title= 'Genre',ticklen= 5,zeroline= False)
             )
fig = dict(data = data, layout = layout)
iplot(fig)

<a id=23></a>
## Bar Charts 

In [None]:
#Total Video Game Count by Year
game_count_in_years=  game_data.groupby('Year')['Name'].count().reset_index()
game_count_in_years.sort_values(by = "Name", ascending = False).T
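
Instead of reading the busiest year off the sorted table, it can also be picked out directly. A small sketch on made-up rows, using the same grouping as the cell above:

```python
import pandas as pd

# Made-up rows; only the column names mirror the dataset.
toy = pd.DataFrame({
    "Name": ["Game A", "Game B", "Game C", "Game D", "Game E"],
    "Year": [2008, 2009, 2009, 2009, 2010],
})

# Same grouping as above: number of titles per release year.
counts = toy.groupby("Year")["Name"].count()

# idxmax returns the index label (here the year) holding the largest count.
busiest_year = counts.idxmax()
print(busiest_year, counts.max())
```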

In [None]:
#Total Sales value for each game
# Sum the four regional columns only: Global_Sales is already the worldwide total, so adding it would double count.

total_sales = game_data["NA_Sales"] + game_data["EU_Sales"] + game_data["JP_Sales"] + game_data["Other_Sales"]
game_data["Total_Sales"] = total_sales
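
Global_Sales is described above as total worldwide sales, so it is worth checking how it relates to the four regional columns before combining them. A minimal sketch on made-up numbers; the 0.02 tolerance for rounding is an assumption:

```python
import numpy as np
import pandas as pd

# Made-up sales figures (in millions); only the column names mirror the dataset.
toy = pd.DataFrame({
    "NA_Sales": [10.0, 4.0],
    "EU_Sales": [5.0, 3.0],
    "JP_Sales": [2.5, 1.0],
    "Other_Sales": [1.5, 0.5],
    "Global_Sales": [19.0, 8.5],
})

regional_cols = ["NA_Sales", "EU_Sales", "JP_Sales", "Other_Sales"]
regional_sum = toy[regional_cols].sum(axis=1)

# True where the regional columns already add up to Global_Sales.
matches = np.isclose(regional_sum, toy["Global_Sales"], atol=0.02)
print(matches.all())
```

Running the same comparison on `game_data` would show whether the regional columns and `Global_Sales` carry the same information.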

In [None]:
# 2009 is the year with the most released games, so take that year's first 20 games
df2009 = game_data[game_data.Year == 2009].iloc[:20,:]

trace1 = go.Bar(
                x = df2009.Name,
                y = df2009.Total_Sales,
                name = "Total_Sales",
                marker = dict(color = 'rgba(255, 174, 255, 0.5)',
                             line=dict(color='rgb(0,0,0)',width=1.5)),
                text = df2009.Platform)

trace2 = go.Bar(
                x = df2009.Name,
                y = df2009.Global_Sales,
                name = "Global_Sales",
                marker = dict(color = 'rgba(255, 255, 128, 0.5)',
                              line=dict(color='rgb(0,0,0)',width=1.5)),
                text = df2009.Platform)
data = [trace1, trace2]

layout = go.Layout(barmode = "group")
fig = go.Figure(data = data, layout = layout)
iplot(fig)

<a id=24></a>
## Pie Charts 

In [None]:
pie = game_data.Total_Sales
labels = game_data.Genre

fig = {
  "data": [
    {
      "values": pie,
      "labels": labels,
      "domain": {"x": [0, .5]},
      "name": "Total sales of all genres",
      "hoverinfo":"label+percent",
      "hole": .2,
      "type": "pie"
    },],
  "layout": {
        "title":"Share of total sales by genre",
        "annotations": [
            { "font": { "size": 20},
              "showarrow": False,
              "text": "Share of Total Sales",
                "x": 0.10,
                "y": 1.10
            },
        ]
    }
}
iplot(fig)
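
The pie trace above receives one value per game with repeated genre labels, and Plotly sums values that share a label. The same per-genre shares can be computed explicitly with pandas. A sketch on made-up numbers:

```python
import pandas as pd

# Made-up rows; only the column names mirror the notebook's frame.
toy = pd.DataFrame({
    "Genre": ["Sports", "Racing", "Sports", "Puzzle"],
    "Total_Sales": [10.0, 4.0, 6.0, 5.0],
})

# One total per genre, then each total as a percentage of the grand total.
genre_totals = toy.groupby("Genre")["Total_Sales"].sum()
genre_share = genre_totals / genre_totals.sum() * 100
print(genre_share.round(1).to_dict())
```

Passing `genre_totals.values` and `genre_totals.index` to the pie trace should give the same chart while sending far fewer points to the browser.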

<a id=25></a>
## Bubble Charts 

In [None]:
sub_game_data = game_data.iloc[:12,:]
sales = sub_game_data.Total_Sales
genre_color = ["indigo","purple","gold","orange","turquoise","royalblue","crimson","blueviolet","coral","olive","lime","magenta"]
data = [
    {
        'y': sub_game_data.Year,
        'x': sub_game_data.Genre,
        'mode': 'markers',
        'marker': {
            'color': genre_color,
            'size': sales,
            'showscale': True
        },
        "text" :  sub_game_data.Name    
    }
]
iplot(data)
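
In the chart above, the raw sales numbers are used directly as marker sizes in pixels. Plotly's bubble chart documentation suggests scaling bubbles by area with `sizeref` instead. A sketch of that calculation; the sales values and the 40-pixel target are made up:

```python
import numpy as np

# Made-up sales values (in millions) standing in for Total_Sales.
sales = np.array([80.0, 40.0, 20.0])

# Hypothetical target: the largest bubble should be about 40 pixels across.
desired_max_px = 40

# Scaling rule from Plotly's bubble chart docs for sizemode="area".
sizeref = 2.0 * sales.max() / desired_max_px ** 2

# This dict can be passed as the "marker" of a scatter trace.
marker = dict(size=sales.tolist(), sizemode="area", sizeref=sizeref, sizemin=4)
print(sizeref)
```

With `sizemode="area"`, a game with twice the sales gets a bubble with twice the area, which is usually easier to read than doubling the diameter.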

<a id=26></a>
## Histogram

In [None]:
game_data.Year.unique()

In [None]:
# prepare data
df2015 = game_data.Total_Sales[game_data.Year == 2015]
df2000 = game_data.Total_Sales[game_data.Year == 2000]

trace1 = go.Histogram(
    x=df2015,
    opacity=0.75,
    name = "2015",
    marker=dict(color='rgba(171, 50, 96, 0.6)'))
trace2 = go.Histogram(
    x=df2000,
    opacity=0.75,
    name = "2000",
    marker=dict(color='rgba(12, 50, 196, 0.6)'))

data = [trace1, trace2]

layout = go.Layout(barmode='overlay',
                   title='Total Game Sales in 2015 and 2000',
                   xaxis=dict(title='Total Sales'),
                   yaxis=dict( title='Count'),
)
fig = go.Figure(data=data, layout=layout)
iplot(fig)

<a id=27></a>
## Word Cloud

In [None]:
df2000 = game_data.Genre[game_data.Year == 2000]
plt.subplots(figsize=(8,8))

wordcloud = WordCloud(
                          background_color='white',
                          width=512,
                          height=384
                         ).generate(" ".join(df2000))
plt.imshow(wordcloud)
plt.axis('off')
plt.savefig('graph.png')

plt.show()

<a id=28></a>
## Box Plot

In [None]:
publisher = game_data[game_data.Year==2000]

trace0 = go.Box(
    y=publisher.Total_Sales,
    name = 'Total Game Sales in 2000',
    marker = dict(
        color = 'rgb(12, 12, 140)',
    )
)
trace1 = go.Box(
    y=publisher.EU_Sales,
    name = 'Game Sales in Europe in 2000',
    marker = dict(
        color = 'rgb(12, 128, 128)',
    )
)
trace2 = go.Box(
    y=publisher.NA_Sales,
    name = 'Game Sales in North America in 2000',
    marker = dict(
        color = 'rgb(102, 128, 150)',
    )
)
data = [trace0, trace1, trace2]
iplot(data)

<a id=29></a>
## Scatter Plot Matrix

In [None]:
game_data.info()

In [None]:
#import a new package
import plotly.figure_factory as ff

df2009 = game_data[game_data.Year == 2009]
sub_df2009 = df2009.loc[:,["Genre","JP_Sales", "Other_Sales"]]
sub_df2009['index'] = np.arange(1,len(sub_df2009)+1)
# scatter matrix
fig = ff.create_scatterplotmatrix(sub_df2009, diag='box', index='index',colormap='Blues',
                                  colormap_type='cat',
                                  height=700, width=1200)
iplot(fig)

<a id=210></a>
## 3D Scatter Plot with Colorscaling 

In [None]:
trace = go.Scatter3d(
    x=game_data.Year,
    y=game_data.Genre,
    z=game_data.Publisher,
    mode='markers',
    marker=dict(
        size=10,
        color=game_data.Year,  # numeric values are needed for the colorscale to take effect
        colorscale = 'twilight'
    )
)

data = [trace]
layout = go.Layout(
    margin=dict(#left, right, bottom, top
        l=0,
        r=0,
        b=0,
        t=0  
    )
    
)
fig = go.Figure(data=data, layout=layout)
iplot(fig)

<a id=211></a>
## Multiple Subplots

In [None]:
sub_game_data = game_data.iloc[:40,:]
trace1 = go.Scatter(
    x=sub_game_data.Rank,
    y=sub_game_data.EU_Sales,
    name = "EU_Sales"
)
trace2 = go.Scatter(
    x=sub_game_data.Rank,
    y=sub_game_data.Other_Sales,
    xaxis='x2',
    yaxis='y2',
    name = "Other_Sales"
)
trace3 = go.Scatter(
    x=sub_game_data.Rank,
    y=sub_game_data.JP_Sales,
    xaxis='x3',
    yaxis='y3',
    name = "JP_Sales"
)
trace4 = go.Scatter(
    x=sub_game_data.Rank,
    y=sub_game_data.Genre,
    xaxis='x4',
    yaxis='y4',
    name = "Genre"
)
data = [trace1, trace2, trace3, trace4]
layout = go.Layout(
    xaxis=dict(
        domain=[0, 0.45]
    ),
    yaxis=dict(
        domain=[0, 0.45]
    ),
    xaxis2=dict(
        domain=[0.55, 1]
    ),
    xaxis3=dict(
        domain=[0, 0.45],
        anchor='y3'
    ),
    xaxis4=dict(
        domain=[0.55, 1],
        anchor='y4'
    ),
    yaxis2=dict(
        domain=[0, 0.45],
        anchor='x2'
    ),
    yaxis3=dict(
        domain=[0.55, 1]
    ),
    yaxis4=dict(
        domain=[0.55, 1],
        anchor='x4'
    ),
    title = 'EU Sales, Other Sales, JP Sales and Genre VS Rank of Top 40 Games'
)
fig = go.Figure(data=data, layout=layout)
iplot(fig)