Title: Analysis of World of Warcraft PvP Leaderboards
Date: 2020-02-09 17:40
Modified: 2020-02-22 18:57
Category: Data Science
Tags: data-analysis, visualization, SQL
Slug: world-of-warcraft
Status: draft

---
**Don't forget to check final version against [project 1 rubric](https://git.generalassemb.ly/DSI-TOR-6/projects/tree/master/project_1#rubric) and [report template](https://git.generalassemb.ly/DSI-TOR-6/projects/blob/master/project_1/project_1.ipynb)**

If you combine your **problem statement, executive summary, data dictionary, and conclusions/recommendations**, you have an amazing README.md file that quickly aligns your audience to the contents of your project. Don't forget to cite your 
**data sources**!

Some recommendations on plotting:
 * Plots have titles
 * Plots have axis labels
 * Plots have appropriate tick labels
 * All text is legible in a plot
 * Plots demonstrate meaningful and valid relationships
 * Plots are interpreted to aid understanding

---------

### Contents

#### Introduction

    **problem statement**
    **executive summary**
    
#### Data Acquisition

    connect to db, list tables
    SQL query: battlegrounds & players
    **data dictionary**
    **data sources**
    
#### Measures of Best Player

    rating
    rank
    win ratio    

    rating vs rank
    win ratio vs rank
    num matches vs win ratio
    wins vs num matches
    ratings vs rank grouped by leaderboard

#### Histograms, Populations
        
    rank histogram
    ilvl histogram
    achievement histogram

    player class bar chart
    player class bar chart stacked factions

    ratings histogram for all leaderboards
    
#### Exploring Correlations

    pair plot
    heat map

    ilvl vs rank
    ilvl vs rating
    achievement vs rank
    rating histogram
    win ratio bar chart by faction

#### Groupings

    ilvl vs rank grouped by leaderboard
    ilvl vs rank grouped by player class
    win ratio vs rank grouped by player class
    win ratio vs rank grouped by faction

    mean rating bar chart by class
    mean rank by player details

    mean rank bar chart by class for all leaderboards

#### Descriptive and Inferential Statistics

    measures of central tendency, spread, and shape/skewness
    For each variable in your data, summarize the underlying distributions
    Be sure to back up these summaries with statistics.
    
    plot and interpret boxplots

#### Summary, Conclusions, etc.

    best class to play?
    best PvP type to play? (2v2 3v3 battlegrounds)
    recommendations for climbing leaderboards
    does the Horde receive preferential treatment from Blizzard or not?

I recently scraped the [PvP leaderboards](https://worldofwarcraft.com/en-us/game/pvp/leaderboards/) from the World of Warcraft website and wrote them out to a [SQLite database]({static}/attachments/wow.zip). Let's plot some charts of the data and see if anything interesting turns up.

There are three leaderboards: the 2v2 Arena, the 3v3 Arena, and the 10v10 Battleground. Each leaderboard lists the top 1000 players by rating. (I think these are Elo ratings.) I saved the leaderboards to the SQLite database as three separate tables. I also scraped a small amount of data from the profile page of each character on the leaderboards and saved it to a fourth table.
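
To make that layout concrete, here is a minimal sketch of it using an in-memory database. The table names match the ones queried later in this post, but the column lists are trimmed down, the split of columns between the leaderboard tables and `players` is an assumption, and every row is made up for illustration.

```python
import sqlite3

# Hypothetical, trimmed-down schema: the real tables hold more columns.
con = sqlite3.connect(':memory:')
con.executescript('''
    CREATE TABLE arena_2v2     (rank INTEGER, rating INTEGER, name TEXT, url TEXT);
    CREATE TABLE arena_3v3     (rank INTEGER, rating INTEGER, name TEXT, url TEXT);
    CREATE TABLE battlegrounds (rank INTEGER, rating INTEGER, name TEXT, url TEXT);
    CREATE TABLE players       (url TEXT PRIMARY KEY, ilvl INTEGER, achievement INTEGER);

    -- Made-up rows, for illustration only.
    INSERT INTO arena_2v2     VALUES (1, 2900, 'Alpha', '/character/alpha');
    INSERT INTO arena_3v3     VALUES (1, 3100, 'Alpha', '/character/alpha');
    INSERT INTO battlegrounds VALUES (1, 2500, 'Bravo', '/character/bravo');
    INSERT INTO players VALUES ('/character/alpha', 470, 21000);
    INSERT INTO players VALUES ('/character/bravo', 455, 18000);
''')

# Tag each leaderboard with its name, stack the three tables,
# then attach the profile data through the shared url column.
rows = con.execute('''
    SELECT board, name, rating, ilvl
    FROM (
        SELECT '2v2' AS board, name, rating, url FROM arena_2v2
        UNION ALL
        SELECT '3v3' AS board, name, rating, url FROM arena_3v3
        UNION ALL
        SELECT 'battlegrounds' AS board, name, rating, url FROM battlegrounds
    ) AS leaderboards
    JOIN players ON players.url = leaderboards.url
    ORDER BY board
''').fetchall()
con.close()

for row in rows:
    print(row)
```

A character who appears on more than one leaderboard shows up once per board, each time with the same profile data attached.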

In [None]:
import sqlite3
import altair as alt
import pandas as pd

In [None]:
query = '''
SELECT *
FROM sqlite_master 
WHERE type='table'
'''
con = sqlite3.connect('data/wow.db')
tables = pd.read_sql(query, con)
con.close()  # release the database file once the result is in memory
tables

In [None]:
query = '''
    -- LEFT JOIN keeps every leaderboard row, even if no profile was scraped for it
    SELECT name, rank, rating, wins, losses, achievement, ilvl
    FROM battlegrounds
    LEFT JOIN players
    ON battlegrounds.url = players.url
'''
con = sqlite3.connect('data/wow.db')
df = pd.read_sql(query, con)
con.close()
df

In [None]:
# Total matches played, and the fraction of them that were won
df['num_matches'] = df['wins'] + df['losses']
df['win_ratio'] = df['wins'] / df['num_matches']

In [None]:
alt.Chart(df).mark_point().encode(x='rank', y='rating')

In [None]:
alt.Chart(df).mark_point().encode(x='rank', y='ilvl')

In [None]:
alt.Chart(df).mark_point().encode(alt.X('rating:Q', scale=alt.Scale(zero=False)), 
                                  alt.Y('ilvl:Q', scale=alt.Scale(zero=False)))

In [None]:
alt.Chart(df).mark_point().encode(x='rank', y='achievement')

In [None]:
alt.Chart(df).mark_point().encode(x='rank', y='win_ratio')

In [None]:
alt.Chart(df).mark_point().encode(alt.X('win_ratio:Q', scale=alt.Scale(zero=False)), y='num_matches')

In [None]:
alt.Chart(df).mark_point().encode(alt.Y('win_ratio:Q', scale=alt.Scale(zero=False)), x='num_matches')

In [None]:
alt.Chart(df).mark_point().encode(x='num_matches', y='wins')

In [None]:
alt.Chart(df).mark_bar().encode(alt.X("rating:Q", bin=True), y='count()')

In [None]:
alt.Chart(df).mark_bar().encode(alt.X("rank:Q", bin=True), y='count()')

In [None]:
alt.Chart(df).mark_bar().encode(alt.X("ilvl:Q", bin=True), y='count()')

In [None]:
alt.Chart(df).mark_bar().encode(alt.X("achievement:Q", bin=True), y='count()')
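
The histograms above show the shape of each distribution, and the outline also calls for backing that up with measures of central tendency, spread, and skewness. Here is a minimal sketch of one way to tabulate those with pandas. The `toy` frame and its values are made up for illustration; the real `df` columns would go in its place.

```python
import pandas as pd

# Made-up values standing in for the real columns, for illustration only.
toy = pd.DataFrame({
    'rating': [2400, 2450, 2500, 2600, 3000],
    'ilvl': [450, 455, 460, 465, 470],
})

# One row per variable, one column per statistic.
summary = pd.DataFrame({
    'mean': toy.mean(),      # central tendency
    'median': toy.median(),  # central tendency, robust to outliers
    'std': toy.std(),        # spread (sample standard deviation)
    'skew': toy.skew(),      # shape: > 0 means a longer right tail
})
print(summary)
```

In this toy data the single high rating pulls the mean above the median and gives a positive skew, while the evenly spaced `ilvl` values have a skew of zero.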

In [None]:
alt.Chart(df).mark_circle().encode(
    alt.X(alt.repeat("column"), type='quantitative', scale=alt.Scale(zero=False)),
    alt.Y(alt.repeat("row"), type='quantitative', scale=alt.Scale(zero=False))
).properties(
    width=100,
    height=100
).repeat(
    row=['rank', 'rating', 'win_ratio', 'ilvl', 'achievement'],
    column=['rank', 'rating', 'win_ratio', 'ilvl', 'achievement']
)
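
The outline pairs this pair plot with a heat map. A heat map needs the correlation matrix in long form, with one row per pair of variables. Here is a minimal sketch of that reshaping step. The `toy` frame is made up for illustration, and the names `var_x`, `var_y`, and `corr` are my own choices, not anything from the database.

```python
import pandas as pd

# Made-up values standing in for the real columns, for illustration only.
toy = pd.DataFrame({
    'rank': [1, 2, 3, 4, 5],
    'rating': [3000, 2800, 2700, 2650, 2600],
    'ilvl': [470, 465, 468, 460, 455],
})

# Pairwise Pearson correlations (the pandas default), as a square matrix.
corr = toy.corr()

# Reshape the square matrix to long form: one row per (var_x, var_y) pair.
corr_long = (
    corr.rename_axis(index='var_x', columns='var_y')
        .stack()
        .rename('corr')
        .reset_index()
)
print(corr_long)
```

Passing `corr_long` to `alt.Chart(corr_long).mark_rect().encode(x='var_x:N', y='var_y:N', color='corr:Q')` should then draw it as a heat map.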

In [None]:
query = '''
    SELECT *, '2v2' as board
    FROM arena_2v2
    UNION ALL
    SELECT *, '3v3' as board
    FROM arena_3v3
    UNION ALL
    SELECT *, 'battlegrounds' as board
    FROM battlegrounds
'''
con = sqlite3.connect('data/wow.db')
df_boards = pd.read_sql(query, con)
con.close()
df_boards

In [None]:
alt.Chart(df_boards).mark_point().encode(x='rank', y='rating', color='board')

In [None]:
query = '''
SELECT board, rank, rating, name, title, realm, class, details,
    faction, wins, losses, achievement, ilvl, players.url
FROM (
    SELECT *, '2v2 arenas' as board
    FROM arena_2v2
    UNION
    SELECT *, '3v3 arenas' as board
    FROM arena_3v3
    UNION
    SELECT *, 'battlegrounds' as board
    FROM battlegrounds
) leaderboards
JOIN players
ON players.url = leaderboards.url
'''
con = sqlite3.connect('data/wow.db')
df_boards = pd.read_sql(query, con)
con.close()
df_boards

In [None]:
alt.Chart(df_boards).mark_point().encode(x='rank', y='rating', color='board')

In [None]:
alt.Chart(df_boards).mark_point().encode(alt.X('rank:Q', scale=alt.Scale(zero=False)), 
                                  alt.Y('rating:Q', scale=alt.Scale(zero=False)),
                                 color='board')

In [None]:
alt.Chart(df_boards).mark_point(opacity=0.75).encode(alt.X('rank:Q', scale=alt.Scale(zero=False)), 
                                  alt.Y('ilvl:Q', scale=alt.Scale(zero=False)),
                                 color='board')

In [None]:
alt.Chart(df_boards).mark_point(opacity=0.75, clip=True).encode(alt.X('rank:Q', scale=alt.Scale(zero=False)), 
                                  alt.Y('ilvl:Q', scale=alt.Scale(domain=(420, 480))), color='board')

In [None]:
alt.Chart(df_boards).mark_point(opacity=0.75, clip=True).encode(alt.X('rank:Q', scale=alt.Scale(zero=False)), 
                                  alt.Y('ilvl:Q', scale=alt.Scale(domain=(420, 480))),
                                 color='class')

In [None]:
df_boards['num_matches'] =  df_boards['wins'] + df_boards['losses']
df_boards['win_ratio'] = df_boards['wins'] / df_boards['num_matches']

In [None]:
alt.Chart(df_boards).mark_circle().encode(alt.X('rank:Q', scale=alt.Scale(zero=False)), 
                                  alt.Y('win_ratio:Q', scale=alt.Scale(zero=False)), color='class')

In [None]:
alt.Chart(df_boards).mark_circle().encode(alt.X('rank:Q', scale=alt.Scale(zero=False)), 
                                  alt.Y('win_ratio:Q', scale=alt.Scale(zero=False)),
                                 color='faction')

In [None]:
alt.Chart(df_boards).mark_bar().encode(x='class', y='count()')

In [None]:
alt.Chart(df_boards).mark_bar().encode(
    x='class',
    y='count()',
    color='faction'
)

In [None]:
alt.Chart(df_boards).mark_bar().encode(
    x='faction',
    y='mean(win_ratio)',  # one aggregated value per faction instead of one bar per row
)
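
A bar chart of win ratio by faction needs a single number per faction, and there is more than one reasonable way to get it. Here is a minimal sketch comparing two of them: the mean of each character's win ratio, and the pooled ratio of all wins to all matches. The `toy` frame, including the faction labels, is made up for illustration.

```python
import pandas as pd

# Made-up records, for illustration only.
toy = pd.DataFrame({
    'faction': ['Alliance', 'Alliance', 'Horde', 'Horde'],
    'wins': [30, 10, 60, 5],
    'losses': [10, 10, 40, 5],
})
toy['num_matches'] = toy['wins'] + toy['losses']
toy['win_ratio'] = toy['wins'] / toy['num_matches']

# Option 1: average the per-character ratios (every character counts equally).
mean_ratio = toy.groupby('faction')['win_ratio'].mean()

# Option 2: pool wins and matches first (characters with more matches count more).
totals = toy.groupby('faction')[['wins', 'num_matches']].sum()
pooled_ratio = totals['wins'] / totals['num_matches']

print(pd.DataFrame({'mean_ratio': mean_ratio, 'pooled_ratio': pooled_ratio}))
```

The two numbers differ whenever characters in a faction have played different numbers of matches. An Altair encoding of `y='mean(win_ratio)'` corresponds to the first option.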

In [None]:
bar = alt.Chart(df_boards).mark_bar().encode(
    x='class:O',
    y='mean(rating):Q'
)
rule = alt.Chart(df_boards).mark_rule(color='red').encode(
    y='mean(rating):Q'
)
(bar + rule).properties(width=600)



In [None]:
bar = alt.Chart(df_boards).mark_bar().encode(
    alt.Y('mean(rating)', scale=alt.Scale(zero=False)),
    x='class'
)
rule = alt.Chart(df_boards).mark_rule(color='red').encode(
    y='mean(rating)'
)
(bar + rule).properties(width=500)

In [None]:
# Invert rank so that a larger value means a better (lower-numbered) rank
df_boards['rank2'] = 1000 - df_boards['rank']

In [None]:
alt.Chart(df_boards).mark_bar().encode(
    alt.Y('mean(rank2)', scale=alt.Scale(zero=False)),
    x='class',
    column='board'
)

In [None]:
alt.Chart(df_boards).mark_bar().encode(
    alt.X('class'),
    alt.Y('mean(rank2)', scale=alt.Scale(zero=False)),
    alt.Color('board'),
    alt.Column('board')
)

In [None]:
bar = alt.Chart(df_boards).mark_bar().encode(
    alt.X('mean(rank2)', scale=alt.Scale(zero=False)),
    alt.Y('details', sort='-x'),
)
rule = alt.Chart(df_boards).mark_rule(color='red').encode(
    x='mean(rank2)'
)
(bar + rule)

In [None]:
# df_boards is already in long form (one row per character, with a 'board'
# column), so no fold transform is needed before overlaying the histograms.
alt.Chart(df_boards).mark_area(
    opacity=0.5,
    interpolate='step'
).encode(
    alt.X('rating:Q', bin=alt.Bin(maxbins=100)),
    alt.Y('count()', stack=None),
    alt.Color('board:N')
)