PyGal documentation by Mukul Pathak. http://mukulpathak.com


### PyGal Reference Guide
Source:
    http://www.pygal.org/en/stable/documentation/index.html
    
        

PyGal is a Python library for creating SVG (Scalable Vector Graphics) charts. SVG describes vector-based graphics in XML format. In this document we will use PyGal and explore its various features.

To install PyGal, run the "pip install pygal" command in a terminal, or prefix it with an "!" in a Jupyter notebook as shown below.

In [1]:
!pip install pygal



In [2]:
#We then import pygal and other important packages
import pygal
import pandas as pd
import numpy as np

##### First we will run a basic program involving PyGal.
## 1. Use of Line Chart
Here, I have collected data from PyGal's documentation website showing browser usage in percentage.

In [3]:
line_chart = pygal.Line()        #Calls Line functionality of PyGal
line_chart.title = 'Browser usage evolution (in %)'
line_chart.x_labels = list(map(str, range(2002, 2013)))  #One label per year, 2002 to 2012
line_chart.add('Firefox', [None, None,    0, 16.6,   25,   31, 36.4, 45.5, 46.3, 42.8, 37.1])
line_chart.add('Chrome',  [None, None, None, None, None, None,    0,  3.9, 10.8, 23.8, 35.3])
line_chart.add('IE',      [85.8, 84.6, 84.7, 74.5,   66, 58.6, 54.7, 44.8, 36.2, 26.6, 20.1])
line_chart.add('Others',  [14.2, 15.4, 15.3,  8.9,    9, 10.4,  8.9,  5.8,  6.7,  6.8,  7.5])
line_chart.render_to_file('pic.svg')   

This will create an SVG file named "pic.svg" in the current directory.
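
Values are matched to the x labels by position, so each series should contain one entry per label, with None standing in for years without data. As a minimal sketch (the helper name `check_series_lengths` and the 'Short' series are hypothetical and not part of PyGal), a quick length check before adding the series can catch a list that is a value short:

```python
# Hypothetical helper, not part of PyGal: report series whose length
# differs from the number of x labels.
def check_series_lengths(x_labels, series):
    """Return {name: length} for every series that does not match x_labels."""
    expected = len(x_labels)
    return {name: len(values)
            for name, values in series.items()
            if len(values) != expected}


x_labels = [str(year) for year in range(2002, 2013)]  # 11 labels: 2002..2012

series = {
    'IE':     [85.8, 84.6, 84.7, 74.5, 66, 58.6, 54.7, 44.8, 36.2, 26.6, 20.1],
    'Others': [14.2, 15.4, 15.3, 8.9, 9, 10.4, 8.9, 5.8, 6.7, 6.8, 7.5],
    'Short':  [None, None, 0, 16.6],  # deliberately too short, for illustration
}

mismatched = check_series_lengths(x_labels, series)
print(mismatched)  # only the deliberately short series is reported
```

An empty dictionary means every series lines up with the labels.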

To render the visualisations inside the IPYNB file, we will embed each chart in the following HTML template. It loads two scripts from the pygal.js GitHub pages (svg.jquery.js and pygal-tooltips.min.js) so that the browser can display the chart with its tooltips.

In [4]:
from IPython.display import display, HTML

base_html = """
<!DOCTYPE html>
<html>
  <head>
  <script type="text/javascript" src="http://kozea.github.com/pygal.js/javascripts/svg.jquery.js"></script>
  <script type="text/javascript" src="https://kozea.github.io/pygal.js/2.0.x/pygal-tooltips.min.js"></script>
  </head>
  <body>
    <figure>
      {rendered_chart}
    </figure>
  </body>
</html>
"""


Next, we will visualise a movie dataset fetched from TMDb. It has 71 movie entries and 13 data columns (plus an unnamed index column) describing each movie; the sample rows below have release dates in 2016. We can inspect the columns by running the .head() command.

In [5]:
tmdb = pd.read_csv("tmdb.csv")  # Load the dataset
tmdb.head()

Unnamed: 0,budget,genres,id,keywords,popularity,production_companies,production_countries,release_date,revenue,runtime,title,vote_average,vote_count
0,15000000,"[{""id"": 53, ""name"": ""Thriller""}, {""id"": 878, ""...",333371,"[{""id"": 1930, ""name"": ""kidnapping""}, {""id"": 23...",53.698683,"[{""name"": ""Paramount Pictures"", ""id"": 4}, {""na...",us,3/10/16,108286421,103,10 Cloverfield Lane,6.8,2468
1,50000000,"[{""id"": 28, ""name"": ""Action""}, {""id"": 18, ""nam...",300671,"[{""id"": 818, ""name"": ""based on novel""}, {""id"":...",42.526529,"[{""name"": ""Paramount Pictures"", ""id"": 4}, {""na...",mt,1/13/16,69411370,144,13 Hours: The Secret Soldiers of Benghazi,7.0,938
2,4500000,"[{""id"": 53, ""name"": ""Thriller""}, {""id"": 28, ""n...",375290,"[{""id"": 2227, ""name"": ""evacuation""}, {""id"": 60...",2.551559,"[{""name"": ""T-Series"", ""id"": 3522}, {""name"": ""H...",in,1/22/16,32000000,126,Airlift,7.3,57
3,170000000,"[{""id"": 14, ""name"": ""Fantasy""}]",241259,"[{""id"": 818, ""name"": ""based on novel""}, {""id"":...",56.268916,"[{""name"": ""Walt Disney Pictures"", ""id"": 2}, {""...",gb,5/25/16,299370084,113,Alice Through the Looking Glass,6.5,1725
4,110000000,"[{""id"": 12, ""name"": ""Adventure""}, {""id"": 878, ...",262504,"[{""id"": 818, ""name"": ""based on novel""}, {""id"":...",86.105615,"[{""name"": ""Summit Entertainment"", ""id"": 491}, ...",us,3/9/16,179246868,121,Allegiant,5.9,1998


##### Using a PyGal bar graph, we will chart the budget of each movie.
## 2. Use of Bar Chart

In [6]:
Mobudget=tmdb.groupby('title')['budget'].mean()
Mobudget.head(10)

title
10 Cloverfield Lane                           15000000
13 Hours: The Secret Soldiers of Benghazi     50000000
Airlift                                        4500000
Alice Through the Looking Glass              170000000
Allegiant                                    110000000
Bad Moms                                      20000000
Batman v Superman: Dawn of Justice           250000000
Ben-Hur                                      100000000
Captain America: Civil War                   250000000
Central Intelligence                          50000000
Name: budget, dtype: int64
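
Note that head(10) returns the first ten rows in index order (alphabetical by title here), not the ten largest budgets. In pandas, `Mobudget.nlargest(10)` would select the largest values instead. The same idea as a plain-Python sketch, using only the ten rows printed above (the variable names are illustrative):

```python
# Budgets copied from the ten rows shown in the output above.
budgets = {
    '10 Cloverfield Lane': 15000000,
    '13 Hours: The Secret Soldiers of Benghazi': 50000000,
    'Airlift': 4500000,
    'Alice Through the Looking Glass': 170000000,
    'Allegiant': 110000000,
    'Bad Moms': 20000000,
    'Batman v Superman: Dawn of Justice': 250000000,
    'Ben-Hur': 100000000,
    'Captain America: Civil War': 250000000,
    'Central Intelligence': 50000000,
}

# Sort by budget, largest first, and keep the first three entries.
# sorted() is stable, so tied budgets keep their original relative order.
top3 = sorted(budgets.items(), key=lambda item: item[1], reverse=True)[:3]

for title, budget in top3:
    print(f'{title}: {budget:,}')
```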

In [7]:
bar_chart = pygal.Bar(height=400)
for title, budget in Mobudget.items():  #Add one bar series per movie
    bar_chart.add(title, budget)
display(HTML(base_html.format(rendered_chart=bar_chart.render(is_unicode=True))))  #This line will render the graphs on our IPYNB file itself
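
The template has a single {rendered_chart} placeholder, which `str.format` replaces with the chart's SVG markup. A minimal sketch of that substitution, using a shortened copy of the template and a hand-written SVG string as a stand-in for the rendered chart (both are assumptions made for illustration):

```python
# Shortened stand-in for the base_html template; the placeholder name matches.
template = """
<figure>
  {rendered_chart}
</figure>
"""

# Hand-written SVG used in place of a real chart's rendered output.
fake_svg = '<svg xmlns="http://www.w3.org/2000/svg"><circle r="5"/></svg>'

# str.format swaps the named placeholder for the SVG text.
page = template.format(rendered_chart=fake_svg)
print(page)
```

Because `str.format` treats every brace as placeholder syntax, any literal braces added to the template later (inline CSS or JavaScript, for example) would need to be doubled.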


##### We will create a chart depicting the popularity of each movie.
## 3. Use of Treemap/Area chart


In [8]:
popularityChart = tmdb.groupby(['title'])['popularity'].apply(list) #group by title and collect each movie's popularity values into a list
popularityChart

title
10 Cloverfield Lane                          [53.698682999999996]
13 Hours: The Secret Soldiers of Benghazi             [42.526529]
Airlift                                                [2.551559]
Alice Through the Looking Glass              [56.268916000000004]
Allegiant                                             [86.105615]
                                                     ...         
The Young Messiah                            [1.8845580000000002]
Triple 9                                              [29.371987]
Warcraft                                              [63.148529]
X-Men: Apocalypse                                    [139.272042]
Zoolander 2                                           [37.253774]
Name: popularity, Length: 71, dtype: object

In [9]:
treemap = pygal.Treemap(height=300)    #calling the treemap functionality
for title, popularity in popularityChart.items(): #fetching all the inputs from the grouped series
    treemap.add(title, popularity)
display(HTML(base_html.format(rendered_chart=treemap.render(is_unicode=True)))) #Running the same on ipynb

##### We will now show the rating of each movie on gauge charts
## 4. Use of Gauge Chart

In [10]:
ratings=tmdb.groupby('title')['vote_average'].mean()  #grouping title with vote average

In [11]:
gauge = pygal.SolidGauge(inner_radius=0.70) #creating a solid gauge with an inner radius of 0.7
for title, rating in ratings.items(): #items() replaces iteritems(), which was removed in pandas 2.0
    gauge.add(title, [{"value": rating * 10}]) #scale each score from out of 10 to out of 100
display(HTML(base_html.format(rendered_chart=gauge.render(is_unicode=True)))) #rendering the chart in the notebook
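
Since vote_average is a score out of 10, multiplying by 10 expresses it on a 0 to 100 scale, and each series is passed as a list holding one {"value": ...} dictionary. A minimal sketch of building that payload in plain Python, using three ratings from the table shown earlier (the helper `to_percent_series` is hypothetical; the rounding is an addition here, only to keep floating-point noise out of the values):

```python
# Hypothetical helper: convert a 0-10 rating into the list-of-dict form
# passed to gauge.add() above.
def to_percent_series(rating_out_of_10):
    return [{'value': round(rating_out_of_10 * 10, 1)}]


# Three ratings taken from the .head() output earlier in this document.
ratings = {'10 Cloverfield Lane': 6.8, 'Airlift': 7.3, 'Allegiant': 5.9}

payload = {title: to_percent_series(score) for title, score in ratings.items()}
print(payload)
```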

#### Alternatively
We can also place all the movies on a single dial by using a different type of gauge chart.
Let's try that alternative.

In [12]:
gauge = pygal.Gauge(human_readable=True) #calling gauge with human readable form
for title, rating in ratings.items(): #values out of 10 this time, so no multiplication by 10
    gauge.add(title, [{"value": rating}])
display(HTML(base_html.format(rendered_chart=gauge.render(is_unicode=True))))


##### We will now show a map of movie production countries with the movie count for each.
## 5. Use of World Map

For setup, we will run the pip command below from within the notebook to install the world map package.

In [13]:
!pip install pygal.maps.world


Processing /Users/mukulpathak/Library/Caches/pip/wheels/54/e6/11/5be0d3206bdc0ea8f0fcf1fe32661d7e614863c8b6a22655ae/pygal_maps_world-1.0.2-py3-none-any.whl
Installing collected packages: pygal.maps.world
Successfully installed pygal.maps.world


In [14]:
releaseCou=tmdb['production_countries'].value_counts() #get counts for each location

In [15]:
releaseCou

us    44
ca     7
gb     6
cn     3
in     2
hk     1
fl     1
dk     1
fr     1
bg     1
de     1
mt     1
jp     1
au     1
Name: production_countries, dtype: int64

NOTE: PyGal plots a location only when the country is given as a lowercase two-letter country code. The code for each country is listed on the official website: http://www.pygal.org/en/stable/documentation/types/maps/pygal_maps_world.html

In [16]:
#Convert Series into dictionary, PyGal works with dictionaries when it comes to maps. 
dictLoc = releaseCou.to_dict() 
dictLoc

{'us': 44,
 'ca': 7,
 'gb': 6,
 'cn': 3,
 'in': 2,
 'hk': 1,
 'fl': 1,
 'dk': 1,
 'fr': 1,
 'bg': 1,
 'de': 1,
 'mt': 1,
 'jp': 1,
 'au': 1}
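
Since the map can only place entries whose keys are recognised country codes, it can help to check the dictionary before plotting. The sketch below uses a small hand-written set of codes, which is an assumption for illustration only; the pygal_maps_world package ships its own table of supported codes (pygal_maps_world.i18n.COUNTRIES, if available in your installed version) that could be used instead. In this data, 'fl' does not look like a standard two-letter country code and may be a typo.

```python
# Illustrative subset of valid two-letter codes, NOT the full list PyGal supports.
KNOWN_CODES = {'us', 'ca', 'gb', 'cn', 'in', 'hk', 'dk', 'fr',
               'bg', 'de', 'mt', 'jp', 'au', 'fi'}

# Counts copied from the dictionary printed above.
dictLoc = {'us': 44, 'ca': 7, 'gb': 6, 'cn': 3, 'in': 2, 'hk': 1, 'fl': 1,
           'dk': 1, 'fr': 1, 'bg': 1, 'de': 1, 'mt': 1, 'jp': 1, 'au': 1}

# Keep only the entries whose key is a known code; collect the rest for review.
plottable = {code: count for code, count in dictLoc.items() if code in KNOWN_CODES}
unknown = sorted(set(dictLoc) - KNOWN_CODES)

print('unrecognised codes:', unknown)
print('movies that can be placed on the map:', sum(plottable.values()))
```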

In [17]:
from pygal_maps_world.maps import World #import the World map chart from the pygal_maps_world package

wm1 = World() #create a World map chart object and store it in wm1

In [18]:
wm1.force_uri_protocol = 'http'
wm1.title="Map of primary movie production location"
wm1.add('In 2017', dictLoc)  #pass the dictionary of counts to the map

wm1.render_to_file('map.svg') #Since the map won't be visible in the IPYNB file, store it locally as an SVG file.
