# Webscraping with Clash Royale

Clash Royale is a massively popular mobile game by Supercell. In the game, players battle one-on-one against other real players across the world. Each player builds a Deck to bring into battle, and building a strong Deck is an important part of the game. I'd like to see which Decks the top players use. This information is freely available online from APIs and on many websites.

In this notebook I use Python code with the BeautifulSoup library to scrape data from the website https://statsroyale.com/. Then I use that data to try to improve my own deck.

### Import Libraries

In [218]:
import pandas as pd # Build dataframes
import requests # Download webpage
from bs4 import BeautifulSoup # Parse html 
import re # regular expressions
#import html5lib

### Import seed Webpage 

The page at https://statsroyale.com/top/players lists the current Top 200 Players in Clash Royale. I download that page and parse it using BeautifulSoup.

In [219]:
url = 'https://statsroyale.com/top/players'
page = requests.get(url)

soup = BeautifulSoup(page.content, 'lxml')
page

<Response [200]>

The Response [200] indicates a successful response from the website. Now I can use the soup object created by BeautifulSoup to find all the links on the page.

### Find Links to Top 200 Players

I iterate through all the links on the page to find the ones pointing to a Player Profile. Those are the top 200 Players.

In [220]:
# Get links to the top 200 players' profiles
TopPlayerLinks = []
for link in soup.find_all('a'):
    linkurl = link.get('href', '')  # some <a> tags have no href

    match = re.match(r"https://statsroyale\.com/profile/.*", linkurl)
    if match:
        TopPlayerLinks.append(linkurl)
    
print('Number of Profiles Found: ' + str(len(TopPlayerLinks)))
print('Top Player is at:', TopPlayerLinks[0])

Number of Profiles Found: 200
Top Player is at: https://statsroyale.com/profile/2JGCJG0G
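The profile filter can be tried offline on a handful of made-up href values. This is a minimal sketch, not the notebook's code: it assumes player tags consist of upper-case letters and digits (true for the tags shown in this notebook), skips <a> tags without an href, and drops duplicate links while keeping page order. Using fullmatch with a tag pattern, rather than re.match with ".*", also keeps out sub-pages such as a profile's /decks page.

```python
import re

# Assumption: profile URLs look like https://statsroyale.com/profile/<TAG>
# where TAG is upper-case letters and digits.
PROFILE_RE = re.compile(r"https://statsroyale\.com/profile/[A-Z0-9]+")

def filter_profile_links(hrefs):
    """Return unique profile links in page order, ignoring missing hrefs."""
    seen = set()
    links = []
    for href in hrefs:
        # <a> tags without an href give None from link.get('href')
        if href and PROFILE_RE.fullmatch(href) and href not in seen:
            seen.add(href)
            links.append(href)
    return links

# Hypothetical hrefs standing in for the values from soup.find_all('a')
sample = [
    "https://statsroyale.com/profile/2JGCJG0G",
    None,
    "/top/clans",
    "https://statsroyale.com/profile/2JGCJG0G",
    "https://statsroyale.com/profile/UPPJ0JY9/decks",
    "https://statsroyale.com/profile/LUGGR08G",
]
print(filter_profile_links(sample))
# ['https://statsroyale.com/profile/2JGCJG0G', 'https://statsroyale.com/profile/LUGGR08G']
```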


### Get Deck from Player Profile

Now that I have links for the Top 200 Players I can scrape the information I am interested in from each page.

If you follow the url to the top player printed above, you see that the player's current deck is displayed with the image of each card. I want to get the name of the card from the html for the image. 

The following function gets the deck from the parsed html (soup) from a player's profile page:

In [221]:
# Function to get cards from html
def getcards(soup):

    deck = set()
    
    # iterate through the images on the page
    for img in soup.find_all('img'):
        
        # get src path for the image (empty string if the img has no src)
        image_src = img.get('src', '')

        # if the image src matches the path for cards, add cardname to deck
        match = re.match(r'/images/cards/', image_src)
        if match:
            # removes path from src string to make cardname
            cardname = image_src.replace('/images/cards/full/','').replace('.png','')
            
            # add cardname to deck
            deck.add(cardname)

    return deck

# Test getcards with the Top Player
TopPlayerPage=requests.get(TopPlayerLinks[0])
soup=BeautifulSoup(TopPlayerPage.content, 'lxml')
getcards(soup)

{'goblin_archer',
 'graveyard',
 'ice_golem',
 'mega_minion',
 'poison',
 'rascals',
 'the_log',
 'tombstone'}

This shows the cards in the top player's deck.
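The string replacements in getcards assume every card image lives under /images/cards/full/. A regular expression with a capture group states that assumption once and returns None for anything else, including <img> tags without a src. A minimal sketch on hypothetical src values; card_from_src is an illustrative helper, not part of the notebook:

```python
import re

# Assumed layout of card image paths, based on the src values used above:
# /images/cards/full/<card>.png
CARD_SRC_RE = re.compile(r"/images/cards/full/([^/]+)\.png")

def card_from_src(src):
    """Return the card's url code, or None if src is not a card image."""
    if not src:  # <img> without a src attribute
        return None
    match = CARD_SRC_RE.fullmatch(src)
    return match.group(1) if match else None

# Hypothetical src values
srcs = ["/images/cards/full/the_log.png", "/images/logo.png", None,
        "/images/cards/full/graveyard.png"]
deck = {card for card in map(card_from_src, srcs) if card}
print(sorted(deck))  # ['graveyard', 'the_log']
```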

### Get Player Name and Number of Trophies

In addition to the player's deck, we can grab some other details from the page. The player's current trophy count and highest-ever trophy count are good to have. There is probably a better way to get these numbers, but the following worked:

In [222]:
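def get_highest_trophies(soup):
    # find the trophy caption div by its class attribute
    highesttrophies = soup.find('div', {'class': 'statistics__subheader statistics__trophyCaption'})
    # step forward through the parsed elements to the text holding the number
    highesttrophies = highesttrophies.next.next.next.text
    highesttrophies = highesttrophies.split('\n')[3]
    return highesttrophies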

get_highest_trophies(soup)

'6577'

In [223]:
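def get_current_trophies(soup):
    # the second metric caption on the page leads to the current trophy count
    currenttrophies = soup.find_all('div', {'class':'statistics__metricCaption ui__mediumText'})[1]
    # step forward through the parsed elements to the text node with the number
    currenttrophies = currenttrophies.next.next.next.next.next.next
    return currenttrophies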

get_current_trophies(soup)

'5576'
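Both helpers return text, so the highest and current values stored later are strings. Strings compare character by character, which only matches numeric order while all counts have the same number of digits. A small illustration; the "10000" entry is made up to show the effect:

```python
trophies = ["6577", "5576", "10000"]  # hypothetical scraped strings

print(sorted(trophies))            # ['10000', '5576', '6577'] (character order)
print(sorted(map(int, trophies)))  # [5576, 6577, 10000] (numeric order)
```

In the DataFrame built below, the same conversion could be done with pd.to_numeric on the highest and current columns before sorting or averaging them.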

## Scrape Deck for Each of Top 200 Players

Now I iterate through the Top 200 Players and build a list of data. For each player I collect the unique id, name, deck, favorite card, highest trophy count, and current trophy count.

After gathering the data as nested lists in the loop, I use Pandas to turn it into a DataFrame for further analysis.

In [224]:
# Scrape Data
top_player_decks = []

for url in TopPlayerLinks:
    page = requests.get(url)
    soup = BeautifulSoup(page.content, 'lxml')
    
    playerid = url.split('https://statsroyale.com/profile/')[1]
    player_name = soup.find('span', {'class':'profileHeader__nameCaption'}).contents[0]
    deck = getcards(soup)
    fav_card=soup.find('span', {'class':'profile__favouriteCardName'}).contents[0]
    highest_trophies = get_highest_trophies(soup)
    current_trophies = get_current_trophies(soup)

    top_player_decks.append([playerid,player_name,deck,fav_card,highest_trophies,current_trophies])
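The loop above sends 200 requests back to back and stops at the first profile whose layout differs, for example when soup.find(...) returns None and .contents raises AttributeError. A hedged sketch of a more tolerant loop: the download and parsing steps are passed in as functions (hypothetical names fetch and parse), so it can run here without the network; in the notebook they would wrap requests.get and the BeautifulSoup code. The one-second pause is an arbitrary politeness default, not a documented limit of the site, and network errors from fetch still propagate.

```python
import time

def scrape_all(urls, fetch, parse, delay=1.0):
    """Fetch and parse each url; record parsing failures instead of stopping."""
    rows, failures = [], []
    for i, url in enumerate(urls):
        if i:  # pause between requests, not before the first one
            time.sleep(delay)
        try:
            rows.append(parse(url, fetch(url)))
        except (AttributeError, IndexError, ValueError) as err:
            # layout changes usually surface as None.contents or a bad index
            failures.append((url, type(err).__name__))
    return rows, failures

# Offline stand-ins (hypothetical) for requests.get(...).content and the parsing code
pages = {"u1": "BigSpin", "u2": None}
rows, failures = scrape_all(
    ["u1", "u2"],
    fetch=pages.get,
    parse=lambda url, html: [url, html.lower()],  # None.lower() raises AttributeError
    delay=0,
)
print(rows)      # [['u1', 'bigspin']]
print(failures)  # [('u2', 'AttributeError')]
```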

In [225]:
RoyaleDeckData = pd.DataFrame(top_player_decks, columns=['player_id','name','deck','fav','highest','current'])
RoyaleDeckData.shape
RoyaleDeckData.head()

| | player_id | name | deck | fav | highest | current |
|---|---|---|---|---|---|---|
| 0 | 2JGCJG0G | BigSpin | {the_log, goblin_archer, ice_golem, graveyard,... | Rascals | 6577 | 5576 |
| 1 | LUGGR08G | 小软软 | {three_musketeers, hunter, minion_horde, ice_g... | Three Musketeers | 6648 | 5567 |
| 2 | YQULRC8Y | أم القيوين | {rocket, building_xbow, the_log, ice_golem, to... | X-Bow | 6625 | 5525 |
| 3 | 28GP0UCJ | Tree™⚡️ | {princess, mega_knight, inferno_dragon, ghost,... | Electro Wizard | 6608 | 5513 |
| 4 | 9QL8YQRQ | ☆SuperMan☆ | {the_log, zap, mega_minion, prince, dark_princ... | Three Musketeers | 6112 | 5507 |


## Using the Scraped Data to Improve my own Deck

Now that I have the data I can put it to use.

I'd like to find decks containing particular cards. I like to play with the Balloon and Miner cards, so let's find all the decks among the top 200 players that include both of them.

The code below loops through the decks, printing all those that include the balloon and miner cards. 

In [226]:
decks = RoyaleDeckData['deck']
for deck in decks:
    if deck.issuperset({'chr_balloon','miner'}):
        print(deck)

{'ice_golem', 'zap', 'inferno_dragon', 'fire_fireball', 'mega_minion', 'chr_balloon', 'miner', 'tombstone'}
{'ice_golem', 'zap', 'pekka', 'fire_fireball', 'mega_minion', 'chr_balloon', 'electro_wizard', 'miner'}
{'ice_golem', 'zap', 'inferno_dragon', 'fire_fireball', 'mega_minion', 'chr_balloon', 'miner', 'tombstone'}
{'ice_golem', 'zap', 'pekka', 'fire_fireball', 'mega_minion', 'chr_balloon', 'electro_wizard', 'miner'}


Four of the top 200 players are playing a Balloon and Miner deck.
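The four printed decks are really two distinct decks, each used by two players. Counting identical decks makes that visible. A stdlib sketch: sets are unhashable, so each deck is frozen with frozenset before it goes into a Counter. The decks are copied from the output above.

```python
from collections import Counter

decks = [
    {'ice_golem', 'zap', 'inferno_dragon', 'fire_fireball', 'mega_minion', 'chr_balloon', 'miner', 'tombstone'},
    {'ice_golem', 'zap', 'pekka', 'fire_fireball', 'mega_minion', 'chr_balloon', 'electro_wizard', 'miner'},
    {'ice_golem', 'zap', 'inferno_dragon', 'fire_fireball', 'mega_minion', 'chr_balloon', 'miner', 'tombstone'},
    {'ice_golem', 'zap', 'pekka', 'fire_fireball', 'mega_minion', 'chr_balloon', 'electro_wizard', 'miner'},
]

wanted = {'chr_balloon', 'miner'}
# Same superset filter as the loop above, then count identical decks
matches = Counter(frozenset(d) for d in decks if d >= wanted)
for deck, n in matches.most_common():
    print(n, sorted(deck))
```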

### Look up Decks by Card with Dummy Coding

That code works, but using a for loop to find the decks with a given card is not the most elegant or reusable solution.

One idea is to dummy code each card: add a binary (True/False) column for each card indicating whether it is in the player's deck.

I need to get a list of all the cards available. I can scrape that off the same website at: https://statsroyale.com/cards

In [227]:
cards_url ='https://statsroyale.com/cards'
cardspage = requests.get(cards_url)
soup = BeautifulSoup(cardspage.content, 'lxml')
allcards = []
    
# iterate through the images on the page
for img in soup.find_all('img'):

    # get src path for the image (empty string if the img has no src)
    image_src = img.get('src', '')

    # if the image src matches the path for cards, add cardname to the list
    match = re.match(r'/images/cards/', image_src)
    if match:
        # removes path from src string to make cardname
        cardname = image_src.replace('/images/cards/full/','').replace('.png','')
        
        # add cardname to the list of all cards
        allcards.append(cardname)
        
print(allcards)

['three_musketeers', 'chr_golem', 'pekka', 'lava_hound', 'mega_knight', 'royal_giant', 'angry_barbarian', 'giant_skeleton', 'zapMachine', 'barbarians', 'minion_horde', 'rascals', 'chr_balloon', 'chr_witch', 'prince', 'bowler', 'executioner', 'cannon_cart', 'giant', 'wizard', 'baby_dragon', 'dark_prince', 'hunter', 'rage_barbarian', 'inferno_dragon', 'electro_wizard', 'dark_witch', 'magic_archer', 'valkyrie', 'musketeer', 'mini_pekka', 'hog_rider', 'battle_ram', 'flying_machine', 'zappies', 'knight', 'archers', 'minion', 'bomber', 'goblin_gang', 'skeleton_balloon', 'skeleton_horde', 'skeleton_warriors', 'ice_wizard', 'princess', 'miner', 'bandit', 'ghost', 'mega_minion', 'blowdart_goblin', 'goblins', 'goblin_archer', 'fire_spirits', 'bats', 'ice_golem', 'skeletons', 'snow_spirits', 'barbarian_hut', 'building_xbow', 'building_elixir_collector', 'fire_furnace', 'building_inferno', 'bomb_tower', 'building_mortar', 'building_tesla', 'firespirit_hut', 'chaos_cannon', 'tombstone', 'lightning'

Now that I have a list of all the card names, I can add a column for each card to my data frame. I initialize the values to False, then loop through each player and set the columns for the cards in their deck to True.

In [228]:
card_columns = pd.DataFrame(False, index = RoyaleDeckData.index, columns=allcards,)
RoyaleDeckData = pd.concat([RoyaleDeckData,card_columns],axis=1)

for index, row in RoyaleDeckData.iterrows():
    # .loc needs a list of column labels; recent pandas versions reject a set as an indexer
    RoyaleDeckData.loc[index, list(row['deck'])] = True

RoyaleDeckData.head()

| | player_id | name | deck | fav | highest | current | three_musketeers | chr_golem | pekka | lava_hound | ... | order_volley | goblin_barrel | tornado | copy | barb_barrel | heal | zap | rage | the_log | mirror |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2JGCJG0G | BigSpin | {the_log, goblin_archer, ice_golem, graveyard,... | Rascals | 6577 | 5576 | False | False | False | False | ... | False | False | False | False | False | False | False | False | True | False |
| 1 | LUGGR08G | 小软软 | {three_musketeers, hunter, minion_horde, ice_g... | Three Musketeers | 6648 | 5567 | True | False | False | False | ... | False | False | False | False | False | False | True | False | False | False |
| 2 | YQULRC8Y | أم القيوين | {rocket, building_xbow, the_log, ice_golem, to... | X-Bow | 6625 | 5525 | False | False | False | False | ... | False | False | True | False | False | False | False | False | True | False |
| 3 | 28GP0UCJ | Tree™⚡️ | {princess, mega_knight, inferno_dragon, ghost,... | Electro Wizard | 6608 | 5513 | False | False | False | False | ... | False | False | False | False | False | False | False | False | False | False |
| 4 | 9QL8YQRQ | ☆SuperMan☆ | {the_log, zap, mega_minion, prince, dark_princ... | Three Musketeers | 6112 | 5507 | False | False | False | False | ... | False | False | False | False | False | False | True | False | True | False |
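The same dummy coding can be sketched without pandas: one dict per deck, mapping every card name to True or False. The card list and decks here are shortened, hypothetical samples. Passing such a list of dicts to pd.DataFrame would give boolean columns like the ones above. The unknown check flags deck cards that are missing from the card list, since the comprehension would otherwise drop them silently.

```python
all_cards = ['three_musketeers', 'pekka', 'zap', 'the_log', 'tornado']  # shortened sample
decks = [{'the_log', 'zap'}, {'three_musketeers', 'zap'}, {'tornado', 'the_log'}]  # hypothetical

# One dict per deck: True if the card is in that deck, False otherwise
rows = [{card: card in deck for card in all_cards} for deck in decks]

# Any deck card missing from all_cards would be silently dropped above
unknown = set().union(*decks) - set(all_cards)
print(unknown)  # set()

for row in rows:
    print([card for card, used in row.items() if used])
```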


Now I can find all the decks where miner and balloon are used in a single line without a loop:

In [229]:
mydecks = RoyaleDeckData.loc[RoyaleDeckData['chr_balloon'] & RoyaleDeckData['miner']]
mydecks

| | player_id | name | deck | fav | highest | current | three_musketeers | chr_golem | pekka | lava_hound | ... | order_volley | goblin_barrel | tornado | copy | barb_barrel | heal | zap | rage | the_log | mirror |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 75 | PYLGJLPL | hulksmash | {ice_golem, zap, inferno_dragon, fire_fireball... | Mega Minion | 6421 | 5230 | False | False | False | False | ... | False | False | False | False | False | False | True | False | False | False |
| 91 | 2CL0GU8 | ⚡Xx_MaHmOd_Xx⚡ | {ice_golem, zap, pekka, fire_fireball, mega_mi... | P.E.K.K.A | 5986 | 5260 | False | False | True | False | ... | False | False | False | False | False | False | True | False | False | False |
| 120 | 2JY09RVPJ | S E N C E R ⭐ | {ice_golem, zap, inferno_dragon, fire_fireball... | Balloon | 6130 | 4883 | False | False | False | False | ... | False | False | False | False | False | False | True | False | False | False |
| 168 | PCLL9Q8J | White Light | {ice_golem, zap, pekka, fire_fireball, mega_mi... | P.E.K.K.A | 6601 | 5246 | False | False | True | False | ... | False | False | False | False | False | False | True | False | False | False |


And now I can get a list of other cards that were used in any or all of the decks:

In [230]:
cardused = mydecks.iloc[:,-len(allcards):].any()
print(cardused[cardused])

pekka             True
chr_balloon       True
inferno_dragon    True
electro_wizard    True
miner             True
mega_minion       True
ice_golem         True
tombstone         True
fire_fireball     True
zap               True
dtype: bool


In [231]:
alluse = mydecks.iloc[:,-len(allcards):].all()
alluse[alluse]

chr_balloon      True
miner            True
mega_minion      True
ice_golem        True
fire_fireball    True
zap              True
dtype: bool
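When decks are kept as sets of card names, .any() and .all() across the dummy columns correspond to a union and an intersection of the decks. A stdlib check using the two distinct Balloon Miner decks from the output above; note that set.intersection needs at least one deck, so an empty selection would need a special case.

```python
decks = [
    {'ice_golem', 'zap', 'inferno_dragon', 'fire_fireball', 'mega_minion', 'chr_balloon', 'miner', 'tombstone'},
    {'ice_golem', 'zap', 'pekka', 'fire_fireball', 'mega_minion', 'chr_balloon', 'electro_wizard', 'miner'},
]

used_by_any = set().union(*decks)       # like .any(): card appears in at least one deck
used_by_all = set.intersection(*decks)  # like .all(): card appears in every deck

print(sorted(used_by_all))
# ['chr_balloon', 'fire_fireball', 'ice_golem', 'mega_minion', 'miner', 'zap']
print(len(used_by_any))  # 10
```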

How many of these players use each card?

In [232]:
mydecks.loc[:,list(cardused[cardused].index)].sum().sort_values(ascending=False)

zap               4
fire_fireball     4
ice_golem         4
mega_minion       4
miner             4
chr_balloon       4
tombstone         2
electro_wizard    2
inferno_dragon    2
pekka             2
dtype: int64

This gives me a list of the cards most used in Balloon Miner decks. I can use this information to help improve my own deck.
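Counting how many decks contain each card is also short with collections.Counter when the decks are kept as sets. A sketch using the four Balloon Miner decks printed earlier; it gives the same counts as the output above, with six cards in all four decks and four cards in two.

```python
from collections import Counter

# The four Balloon Miner decks printed earlier (two distinct decks, each twice)
deck_a = {'ice_golem', 'zap', 'inferno_dragon', 'fire_fireball', 'mega_minion', 'chr_balloon', 'miner', 'tombstone'}
deck_b = {'ice_golem', 'zap', 'pekka', 'fire_fireball', 'mega_minion', 'chr_balloon', 'electro_wizard', 'miner'}
decks = [deck_a, deck_b, deck_a, deck_b]

# Each deck adds at most one count per card because decks are sets
counts = Counter(card for deck in decks for card in deck)
for card, n in counts.most_common():
    print(f"{card:<16}{n}")
```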

## Suggest Cards Function

Next I would like to generalize this into a function that accepts the names of the cards to include (e.g. Balloon and Miner in the example above) and outputs a list like the one above, showing which other cards are used with them.

In [233]:
mycards = ['chr_balloon','miner']
mydecks=RoyaleDeckData.loc[RoyaleDeckData.loc[:,mycards].apply(all, axis=1),:]
cardused = mydecks.iloc[:,-len(allcards):].any()
mydecks.loc[:,list(cardused[cardused].index)].sum().sort_values(ascending=False)

zap               4
fire_fireball     4
ice_golem         4
mega_minion       4
miner             4
chr_balloon       4
tombstone         2
electro_wizard    2
inferno_dragon    2
pekka             2
dtype: int64

In [234]:
mycards = ['chr_balloon','miner']
def suggest_cards(mycards, func=all):
    
    # find all decks that include those cards
    mydecks=RoyaleDeckData.loc[RoyaleDeckData.loc[:,mycards].apply(func, axis=1),:]
    
    # check whether each card in the game is used with mycards by any of the Top 200 Players
    cardused = mydecks.select_dtypes(['bool']).any()
    
    # count how many of the Top 200 Players use each card
    return mydecks.loc[:,list(cardused[cardused].index)].sum().sort_values(ascending=False)
                    
                                     
suggest_cards(mycards)


zap               4
fire_fireball     4
ice_golem         4
mega_minion       4
miner             4
chr_balloon       4
tombstone         2
electro_wizard    2
inferno_dragon    2
pekka             2
dtype: int64
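The func argument decides how the chosen cards combine: all keeps decks that contain every card, while any keeps decks that contain at least one. A hedged stdlib version of the same idea on a list of sets (suggest_cards_sets is a hypothetical name, and the decks are made up) makes that difference easy to see:

```python
from collections import Counter

def suggest_cards_sets(decks, mycards, func=all):
    """Count cards across the decks where func(card in deck for card in mycards) holds."""
    chosen = [d for d in decks if func(card in d for card in mycards)]
    return Counter(card for d in chosen for card in d).most_common()

# Hypothetical decks
decks = [
    {'chr_balloon', 'miner', 'zap'},
    {'chr_balloon', 'tombstone', 'zap'},
    {'hog_rider', 'miner', 'the_log'},
]
print(suggest_cards_sets(decks, ['chr_balloon', 'miner']))       # only the first deck matches
print(suggest_cards_sets(decks, ['chr_balloon', 'miner'], any))  # all three decks match
```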

I tried the zap, fireball, tombstone, ice golem, mega minion, miner, inferno dragon, balloon deck and had very good results. My own [profile page](https://statsroyale.com/profile/UPPJ0JY9/decks) shows a 5-1-1 record with that deck after the first 7 battles.

Since I first wrote this, the decks have changed. It would be useful to record the decks I find over time to see how the trends change. This would also provide many more deck options, making deck searches more useful.

I'll have to think about the best way to implement that.
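One possible way to record decks over time, sketched under assumptions: each scrape is appended to a CSV file along with the date it was collected, and each deck is stored as a sorted, "|"-joined string so it fits in one cell. The functions append_snapshot and load_history, the column layout, and the file name are all hypothetical.

```python
import csv
import os
from datetime import date

def append_snapshot(path, rows, scraped_on=None):
    """Append (player_id, deck) rows to a CSV, tagging each with a scrape date."""
    scraped_on = scraped_on or date.today().isoformat()
    new_file = not os.path.exists(path)
    with open(path, "a", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        if new_file:
            writer.writerow(["scraped_on", "player_id", "deck"])
        for player_id, deck in rows:
            # sorted + joined so the same deck always serializes the same way
            writer.writerow([scraped_on, player_id, "|".join(sorted(deck))])

def load_history(path):
    """Read the CSV back, turning each deck string into a set again."""
    with open(path, newline="", encoding="utf-8") as f:
        return [(r["scraped_on"], r["player_id"], set(r["deck"].split("|")))
                for r in csv.DictReader(f)]
```

In the notebook, a call might look like append_snapshot('deck_history.csv', zip(RoyaleDeckData['player_id'], RoyaleDeckData['deck'])) after each scrape, with load_history returning every snapshot for trend analysis.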

For now, let's use the function I wrote to suggest supporting cards for Balloon decks.

In [235]:
# what cards go with balloon and tombstone?
suggest_cards(['chr_balloon','tombstone'])

tombstone            16
mega_minion          16
chr_balloon          16
zap                  15
fire_fireball        15
minion               14
lava_hound           14
skeleton_warriors    11
goblin_gang           3
ice_golem             2
miner                 2
inferno_dragon        2
order_volley          1
lightning             1
dtype: int64

More of the top 200 use the combination of balloon and tombstone, so this list gives me more card options to try. If I decide to use the balloon, tombstone, and skeleton warriors, I might try the following deck:


In [236]:
suggest_cards(['chr_balloon','tombstone','skeleton_warriors'])

zap                  11
fire_fireball        11
tombstone            11
mega_minion          11
skeleton_warriors    11
minion               11
chr_balloon          11
lava_hound           11
dtype: int64

zap, fireball, tombstone, mega minion, skeleton warriors, minions, balloon, lava hound.
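Picking the final eight cards from a suggestion list can be sketched as: keep the cards I asked for, then fill the remaining slots with the most-used companions. This assumes the standard 8-card deck size; pick_deck is a hypothetical helper, and the counts are copied from the Balloon and Tombstone output above. On these counts it lands on the same eight cards, though it ranks companions from the two-card search rather than filtering on all three cards.

```python
def pick_deck(required, counts, size=8):
    """Start from the required cards, then add the most-used other cards."""
    deck = list(required)
    # counts: (card, n) pairs; sort by n, highest first (ties keep their order)
    for card, _ in sorted(counts, key=lambda item: item[1], reverse=True):
        if len(deck) == size:
            break
        if card not in deck:
            deck.append(card)
    return deck

# Counts from suggest_cards(['chr_balloon','tombstone'])
balloon_tombstone = [
    ('tombstone', 16), ('mega_minion', 16), ('chr_balloon', 16), ('zap', 15),
    ('fire_fireball', 15), ('minion', 14), ('lava_hound', 14),
    ('skeleton_warriors', 11), ('goblin_gang', 3), ('ice_golem', 2),
    ('miner', 2), ('inferno_dragon', 2), ('order_volley', 1), ('lightning', 1),
]
print(pick_deck(['chr_balloon', 'tombstone', 'skeleton_warriors'], balloon_tombstone))
# ['chr_balloon', 'tombstone', 'skeleton_warriors', 'mega_minion', 'zap', 'fire_fireball', 'minion', 'lava_hound']
```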


## Add Card Names 

Now I would like to gather the card url codes (like those used above) along with the common names of the cards into a Python dictionary. For example, the card commonly called "Balloon" has the url code "chr_balloon". It would be much better to search for cards by their common name without knowing the url code. Most of these url codes are similar to the common name, but at least one is very different: "order_volley" is the code for "arrows".

https://statsroyale.com/images/cards/full/order_volley.png

### Card Name Dictionary  

I create a dictionary of cards with common names pointing to the image url component.

Remember I need to match image url components because the only indication of the Player's Deck on the profile page is the Image of the card, not its actual name. 

By making a dictionary, the user can supply the card name instead of knowing the card url component. For some cards, such as arrows ("order_volley"), the url component looks nothing like the name.

In [237]:
cards_url ='https://statsroyale.com/cards'
cardspage = requests.get(cards_url)
soup = BeautifulSoup(cardspage.content, 'lxml')
allcards = {}

for img in soup.find_all('img'):
    
    # get src path for the image (empty string if the img has no src)
    image_src = img.get('src', '')

    # does the image point to a card?
    match = re.match(r'/images/cards/', image_src)
    if match:
        
        # get image url component
        imagename = image_src.replace('/images/cards/full/','').replace('.png','')
        
        # get cardname
        cardname = img.find_next('div',{'class':'ui__tooltip ui__tooltipTop ui__tooltipMiddle cards__tooltip'})
        cardname = cardname.text.replace('\n','').strip().lower()
        
        # add cardname: imagename to dictionary
        allcards[cardname]=imagename                                           
                                                          

Using a dictionary, we are free to add additional unique keys that point to the url codes. This means the user can supply shorter, easier, or preferred names. Below I add several new keys, including "loon" as a new name for "balloon". Users can use either "loon" or "balloon" to specify the same card.

In [238]:
# add custom keys
def addkey(newkey, oldkey):
    allcards[newkey] = allcards[oldkey]

addkey('log','the log')
addkey('pekka','p.e.k.k.a')
addkey('mini pekka','mini p.e.k.k.a')

# add any cardnames you like, just specify actual card name once:
addkey('skelmy','skeleton army')
addkey('loon','balloon')
allcards

{'archers': 'archers',
 'arrows': 'order_volley',
 'baby dragon': 'baby_dragon',
 'balloon': 'chr_balloon',
 'bandit': 'bandit',
 'barbarian barrel': 'barb_barrel',
 'barbarian hut': 'barbarian_hut',
 'barbarians': 'barbarians',
 'bats': 'bats',
 'battle ram': 'battle_ram',
 'bomb tower': 'bomb_tower',
 'bomber': 'bomber',
 'bowler': 'bowler',
 'cannon': 'chaos_cannon',
 'cannon cart': 'cannon_cart',
 'clone': 'copy',
 'dark prince': 'dark_prince',
 'dart goblin': 'blowdart_goblin',
 'electro wizard': 'electro_wizard',
 'elite barbarians': 'angry_barbarian',
 'elixir collector': 'building_elixir_collector',
 'executioner': 'executioner',
 'fire spirits': 'fire_spirits',
 'fireball': 'fire_fireball',
 'flying machine': 'flying_machine',
 'freeze': 'freeze',
 'furnace': 'firespirit_hut',
 'giant': 'giant',
 'giant skeleton': 'giant_skeleton',
 'goblin barrel': 'goblin_barrel',
 'goblin gang': 'goblin_gang',
 'goblin hut': 'fire_furnace',
 'goblins': 'goblins',
 'golem': 'chr_golem',
 'gr

Now the suggest_cards function just needs a simple tweak so that it accepts the common card names (or even user defined names) instead of requiring the specific url component.

In [239]:
def suggest_cards(mycards, func=all):
    # suggest_cards takes either a single card name or list of card names and finds 
    # all the other cards being used with those cards among the top 200 players. 
    # Reports the number of decks including each card
    
    # check that input is a list, if not, make it one
    if not isinstance(mycards, list): mycards = [mycards] 
        
    # get card urls for each of the cards supplied in mycards
    mycard_urls = [allcards[card] for card in mycards]
    
    # find all decks that include those cards
    mydecks=RoyaleDeckData.loc[RoyaleDeckData.loc[:,mycard_urls].apply(func, axis=1),:]
    
    # check whether each card in the game is used with mycards by any of the Top 200 Players
    cardused = mydecks.select_dtypes(['bool']).any()
    
    # count how many of the Top 200 Players use each card
    return mydecks.loc[:,list(cardused[cardused].index)].sum().sort_values(ascending=False)
                                     
suggest_cards('loon')

chr_balloon                  25
mega_minion                  22
zap                          19
fire_fireball                17
tombstone                    16
minion                       16
lava_hound                   15
skeleton_warriors            13
ice_golem                     7
order_volley                  6
goblin_gang                   5
miner                         4
freeze                        4
minion_horde                  3
giant                         2
rage_barbarian                2
snow_spirits                  2
pekka                         2
rocket                        2
copy                          2
building_elixir_collector     2
electro_wizard                2
inferno_dragon                2
lightning                     1
tornado                       1
zapMachine                    1
valkyrie                      1
executioner                   1
wizard                        1
skeleton_horde                1
ghost                         1
skeleton

In [240]:
suggest_cards(['loon','tombstone','lava hound'])

tombstone            14
mega_minion          14
minion               14
chr_balloon          14
lava_hound           14
zap                  13
fire_fireball        13
skeleton_warriors    11
goblin_gang           3
order_volley          1
lightning             1
dtype: int64

The output of suggest_cards should be the common or user-supplied names, not url components!

How can I fix that?



The deck values are actually url components, not card names, and I used them to make the column names. The good thing about this is that it keeps the column names simple (no spaces or dots, unlike a name such as "p.e.k.k.a"), so I should probably keep it like this. Let's just replace the names in the output of the suggest_cards function.

To facilitate this, I reverse the name-to-url dictionary to make a url-to-name dictionary. Note that although the allcards dict has multiple valid names pointing to the same url (multiple valid names for the same card), the url_to_name dict can hold only one name for each url.

In [241]:
url_to_name = {curl: cname for cname, curl in allcards.items()}
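In this comprehension, a url with several names keeps whichever name comes last in allcards, because dicts preserve insertion order and later assignments overwrite earlier ones. That is why aliases added later, such as "loon", appear in the output below. If the official name is preferred instead, dict.setdefault keeps the first name seen. A small sketch on a hand-made subset of the dictionary:

```python
names = {'balloon': 'chr_balloon', 'the log': 'the_log', 'log': 'the_log', 'loon': 'chr_balloon'}

# Comprehension: each later name overwrites the earlier one for the same url
last_wins = {url: name for name, url in names.items()}

first_wins = {}
for name, url in names.items():
    first_wins.setdefault(url, name)  # only sets the url if it is not present yet

print(last_wins)   # {'chr_balloon': 'loon', 'the_log': 'log'}
print(first_wins)  # {'chr_balloon': 'balloon', 'the_log': 'the log'}
```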

In [246]:
def suggest_cards(mycards, func=all):
    # suggest_cards takes either a single card name or list of card names and finds 
    # all the other cards being used with those cards among the top 200 players. 
    # Reports the number of decks including each card
    
    # check that input is a list, if not, make it one
    if not isinstance(mycards, list): mycards = [mycards] 
        
    # do some more checks
    
    # get card urls for each of the cards supplied in mycards
    mycard_urls = [allcards[card] for card in mycards]
    
    # find all decks that include those cards
    mydecks=RoyaleDeckData.loc[RoyaleDeckData.loc[:,mycard_urls].apply(func, axis=1),:]

    # check whether each card in the game is used with mycards by any of the Top 200 Players
    cardused = mydecks.select_dtypes(['bool']).any()
    
    # count how many of the Top 200 Players use each card
    usedwithmycards = mydecks.loc[:,list(cardused[cardused].index)].sum().sort_values(ascending=False)
    
    # replace card urls with card names for reporting
    usedwithmycards.index = [url_to_name[url] for url in usedwithmycards.index]
    
    return usedwithmycards
                                     
suggest_cards(['loon','tombstone'])

tombstone         16
mega minion       16
loon              16
zap               15
fireball          15
minions           14
lava hound        14
guards            11
goblin gang        3
ice golem          2
miner              2
inferno dragon     2
arrows             1
lightning          1
dtype: int64

Now we see the card names instead of url components in the output of the function.
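The "# do some more checks" comment in suggest_cards leaves input validation open. One option, sketched here as an assumption rather than the notebook's approach, is to normalize case and spacing and to suggest close matches for misspelled names with the standard library's difflib.get_close_matches. resolve_cards is a hypothetical helper and the dictionary is a small subset of allcards.

```python
import difflib

def resolve_cards(mycards, name_to_url):
    """Map user-supplied card names to url codes, with hints for typos."""
    if isinstance(mycards, str):
        mycards = [mycards]
    urls = []
    for card in mycards:
        key = " ".join(card.lower().split())  # 'Lava  Hound ' -> 'lava hound'
        if key not in name_to_url:
            hints = difflib.get_close_matches(key, name_to_url, n=3)
            raise ValueError(f"unknown card {card!r}; did you mean {hints}?")
        urls.append(name_to_url[key])
    return urls

# Hypothetical subset of the allcards dictionary
names = {'balloon': 'chr_balloon', 'loon': 'chr_balloon',
         'lava hound': 'lava_hound', 'tombstone': 'tombstone'}
print(resolve_cards(['Loon', 'Lava  Hound '], names))  # ['chr_balloon', 'lava_hound']
```

Calling resolve_cards('baloon', names) raises a ValueError whose message suggests 'balloon', instead of the bare KeyError that allcards[card] would give.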