# Webscraping with Clash Royale

Clash Royale is a massively popular mobile game by Supercell. In the game, players battle one-on-one against other real players from around the world. Each player builds a Deck to bring into battle, and building a strong Deck is an important part of the game. I'd like to see what Decks the top players use. This information is freely available online from APIs and on many websites.

In this notebook I use Python code with the BeautifulSoup library to scrape data from the website https://statsroyale.com/

### Import Libraries

In [1]:
import pandas as pd # Build dataframes
import requests # Download webpage
from bs4 import BeautifulSoup # Parse html 
import re # regular expressions
#import html5lib

### Import seed Webpage 

The page at https://statsroyale.com/top/players gives the current Top 200 Players in Clash Royale. I download that page and parse it using BeautifulSoup.

In [2]:
url = 'https://statsroyale.com/top/players'
page = requests.get(url)

soup = BeautifulSoup(page.content, 'lxml')
page

<Response [200]>

The Response [200] indicates a successful response from the website. Now I can use the soup object created by BeautifulSoup to find all the links on the page.
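The status check can also be made explicit instead of being read off the cell output. The sketch below uses only the standard library; `describe_status` is a hypothetical helper name I made up for illustration, not part of `requests`.

```python
from http import HTTPStatus

def describe_status(code):
    """Return (is_success, phrase) for an HTTP status code.

    Hypothetical helper for illustration; codes in the 2xx range
    are treated as success.
    """
    phrase = HTTPStatus(code).phrase  # raises ValueError for unknown codes
    return 200 <= code < 300, phrase

print(describe_status(200))  # (True, 'OK')
print(describe_status(404))  # (False, 'Not Found')
```

With a real `requests` response, `page.ok` or `page.raise_for_status()` serve a similar purpose without any helper.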

### Find Links to Top 200 Players

I iterate through all the links on the page to find the ones pointing to a Player Profile. Those are the top 200 Players.

In [3]:
# Get links to the top 200 players' profiles
TopPlayerLinks = []
for link in soup.find_all('a'):
    linkurl = link.get('href')

    # Skip <a> tags without an href; keep only links to player profiles
    if linkurl and re.match(r"https://statsroyale\.com/profile/.*", linkurl):
        TopPlayerLinks.append(linkurl)

print('Number of Profiles Found: ' + str(len(TopPlayerLinks)))
print('Top Player is at:', TopPlayerLinks[0])

Number of Profiles Found: 200
Top Player is at: https://statsroyale.com/profile/LCRU2CC2
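The link filter can be tried without downloading anything. The sketch below runs the same kind of pattern over a hand-written list of hrefs. `profile_ids` and `sample_hrefs` are made-up names, and the assumption that a player id consists only of letters and digits is mine, based on the ids seen here.

```python
import re

# Raw string so the backslash in \. reaches the regex engine unchanged.
# Assumption: player ids are made of word characters only.
PROFILE_PATTERN = re.compile(r"https://statsroyale\.com/profile/(\w+)")

def profile_ids(hrefs):
    """Return the player ids from hrefs that point at a profile page."""
    ids = []
    for href in hrefs:
        if href is None:  # <a> tags without an href give None
            continue
        match = PROFILE_PATTERN.fullmatch(href)
        if match:
            ids.append(match.group(1))
    return ids

sample_hrefs = [
    "https://statsroyale.com/profile/LCRU2CC2",
    None,
    "https://statsroyale.com/cards",
    "https://statsroyale.com/profile/282QVRGP",
]
print(profile_ids(sample_hrefs))  # ['LCRU2CC2', '282QVRGP']
```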


### Get Deck from Player Profile

Now that I have links for the Top 200 Players I can scrape the information I am interested in from each page.

If you follow the url to the top player printed above, you see that the player's current deck is displayed with the image of each card. I want to get the name of the card from the html for the image. 

The following function gets the deck from the parsed html (soup) from a player's profile page:

In [4]:
# Function to get cards from html
def getcards(soup):

    deck = set()

    # iterate through the images on the page
    for img in soup.find_all('img'):

        # get src path for the image (None if the tag has no src)
        image_src = img.get('src')

        # if the image src matches the path for cards, add cardname to deck
        if image_src and re.match(r'/images/cards/', image_src):
            # removes path from src string to make cardname
            cardname = image_src.replace('/images/cards/full/','').replace('.png','')

            # add cardname to deck
            deck.add(cardname)

    return deck

# Test getcards with the Top Player
TopPlayerPage = requests.get(TopPlayerLinks[0])
soup = BeautifulSoup(TopPlayerPage.content, 'lxml')
getcards(soup)

{'bandit',
 'electro_wizard',
 'ghost',
 'mega_minion',
 'miner',
 'pekka',
 'poison',
 'zap'}

This shows the cards in the top player's deck.
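The step that turns an image path into a card name can be isolated and checked on its own. This is a minimal sketch, assuming card images follow the `/images/cards/<folder>/<name>.png` layout seen above. `card_name` is a hypothetical helper, and it uses a single regular expression instead of the two `replace` calls.

```python
import re

# Assumption: card images look like /images/cards/<folder>/<name>.png.
# Anything else (logos, icons, missing src) is ignored.
CARD_SRC = re.compile(r"/images/cards/(?:[^/]+/)*([^/]+)\.png")

def card_name(src):
    """Return the card name encoded in an image src, or None."""
    if not src:
        return None
    match = CARD_SRC.fullmatch(src)
    return match.group(1) if match else None

print(card_name('/images/cards/full/pekka.png'))           # pekka
print(card_name('/images/cards/full/electro_wizard.png'))  # electro_wizard
print(card_name('/images/logo.png'))                       # None
```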

### Get Player Name and Number of Trophies

In addition to the player's deck, we can grab some other details off the page. The player's current trophy count and highest ever trophy count are good to have. There is probably a better way to get these numbers, but the following worked:

In [5]:
def get_highest_trophies(soup):
    highesttrophies = soup.find('div', {'class': 'statistics__subheader statistics__trophyCaption'})
    highesttrophies = highesttrophies.next.next.next.text
    highesttrophies = highesttrophies.split('\n')[3]
    return highesttrophies

get_highest_trophies(soup)

'6626'

In [6]:
def get_current_trophies(soup):
    currenttrophies = soup.find_all('div', {'class':'statistics__metricCaption ui__mediumText'})[1]
    currenttrophies = currenttrophies.next.next.next.next.next.next
    return currenttrophies

get_current_trophies(soup)

'5303'
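Both helpers return the trophy counts as text. For any later numeric comparison it may help to convert them to integers. The sketch below is one way to do that, assuming the count appears as a plain run of digits with no thousands separator. `parse_trophies` is a made-up helper name.

```python
import re

def parse_trophies(text):
    """Return the first run of digits in text as an int, or None."""
    match = re.search(r"\d+", str(text))
    return int(match.group()) if match else None

print(parse_trophies('6626'))         # 6626
print(parse_trophies('\n  5303 \n'))  # 5303
print(parse_trophies('n/a'))          # None
```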

## Scrape Deck for Each of Top 200 Players

Now I iterate through the 200 Top Players and populate a list of data. I get the player's unique id, name, deck, favorite card, highest trophy level, and current trophy level.

After gathering the data in the form of nested lists within the loop, I use Pandas to transform it into a Data Frame for future analysis.

In [7]:
# Scrape Data
top_player_decks = []

for url in TopPlayerLinks:
    page = requests.get(url)
    soup = BeautifulSoup(page.content, 'lxml')
    
    playerid = url.split('https://statsroyale.com/profile/')[1]
    player_name = soup.find('span', {'class':'profileHeader__nameCaption'}).contents[0]
    deck = getcards(soup)
    fav_card=soup.find('span', {'class':'profile__favouriteCardName'}).contents[0]
    highest_trophies = get_highest_trophies(soup)
    current_trophies = get_current_trophies(soup)

    top_player_decks.append([playerid,player_name,deck,fav_card,highest_trophies,current_trophies])
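The loop above stops at the first profile where an element is missing, because `soup.find(...)` returns `None` and `.contents` then raises an `AttributeError`. It also sends 200 requests back to back. A hedged sketch of a more forgiving loop follows. `scrape_all` and `fake_scrape` are hypothetical names, the example.com URLs are placeholders, and the one-second default delay is an arbitrary assumption rather than a documented limit of the site.

```python
import time

def scrape_all(urls, scrape_one, delay_seconds=1.0):
    """Call scrape_one(url) for each url, pausing between calls.

    scrape_one is any callable that returns one row of data or raises
    an exception when the page cannot be parsed.
    Returns (rows, failed), where failed holds (url, error) pairs.
    """
    rows, failed = [], []
    for i, url in enumerate(urls):
        if i > 0:
            time.sleep(delay_seconds)  # pause between consecutive requests
        try:
            rows.append(scrape_one(url))
        except Exception as exc:  # e.g. an expected element was not found
            failed.append((url, repr(exc)))
    return rows, failed

# Stand-in for the real per-page scraping, so the sketch runs offline
def fake_scrape(url):
    if url.endswith('BAD'):
        raise AttributeError('profile header not found')
    return [url.rsplit('/', 1)[-1]]

rows, failed = scrape_all(
    ['https://example.com/profile/LCRU2CC2', 'https://example.com/profile/BAD'],
    fake_scrape,
    delay_seconds=0,
)
print(rows)         # [['LCRU2CC2']]
print(len(failed))  # 1
```

In the notebook, the body of the loop would move into a function passed as `scrape_one`.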

In [8]:
RoyaleDeckData = pd.DataFrame(top_player_decks, columns=['player_id','name','deck','fav','highest','current'])
RoyaleDeckData.shape
RoyaleDeckData.head()

Unnamed: 0,player_id,name,deck,fav,highest,current
0,LCRU2CC2,Bogdan,"{zap, miner, ghost, pekka, mega_minion, electr...",P.E.K.K.A,6626,5303
1,282QVRGP,Alif,"{zap, three_musketeers, bats, building_elixir_...",Three Musketeers,6544,5272
2,9PLGQ0V9,RIAN,"{zap, miner, mega_minion, skeleton_warriors, g...",Lava Hound,6664,5254
3,PCV2CUVY,"""'A1_¡ RєZΑ""'","{tornado, zap, dark_prince, ghost, mega_minion...",Executioner,6620,5245
4,2UL8YUQC,TimRedBeard,"{zap, minion, miner, ghost, pekka, electro_wiz...",Bandit,6524,5231


Now that I have the data I can put it to use.

I'd like to find decks containing particular cards. I like to play with the Balloon card and the Miner card, so let's find all the decks among the top 200 players that include those two cards.

The code below loops through the decks, printing all those that include both the Balloon and Miner cards.

In [9]:
decks = RoyaleDeckData['deck']
for deck in decks:
    if deck.issuperset({'chr_balloon','miner'}):
        print(deck)

{'zap', 'ice_golem', 'miner', 'bats', 'pekka', 'blowdart_goblin', 'chr_balloon', 'goblin_gang'}
{'zap', 'ice_golem', 'miner', 'mega_minion', 'archers', 'tombstone', 'chr_balloon', 'fire_fireball'}
{'zap', 'ice_golem', 'miner', 'mega_minion', 'tombstone', 'chr_balloon', 'inferno_dragon', 'fire_fireball'}


I find 3 players playing with a Balloon and Miner deck. The decks are similar but not identical: all three also include Zap and Ice Golem, and the second and third differ by only one card. Maybe I will give one of these a try!

That code worked, but stepping through the decks with a for loop is not the most elegant or reusable solution.

One idea is to dummy code each card: a binary variable is added for each card to indicate whether it is present in the player's deck.

I need to get a list of all the cards available. I can scrape that off the same website at: https://statsroyale.com/cards

In [11]:
cards_url = 'https://statsroyale.com/cards'
cardspage = requests.get(cards_url)
soup = BeautifulSoup(cardspage.content, 'lxml')
allcards = []

# iterate through the images on the page
for img in soup.find_all('img'):

    # get src path for the image (None if the tag has no src)
    image_src = img.get('src')

    # if the image src matches the path for cards, add cardname to the list
    if image_src and re.match(r'/images/cards/', image_src):
        # removes path from src string to make cardname
        cardname = image_src.replace('/images/cards/full/','').replace('.png','')

        # add cardname to the list of all cards
        allcards.append(cardname)

print(allcards)

['three_musketeers', 'chr_golem', 'pekka', 'lava_hound', 'mega_knight', 'royal_giant', 'angry_barbarian', 'giant_skeleton', 'zapMachine', 'barbarians', 'minion_horde', 'rascals', 'chr_balloon', 'chr_witch', 'prince', 'bowler', 'executioner', 'cannon_cart', 'giant', 'wizard', 'baby_dragon', 'dark_prince', 'hunter', 'rage_barbarian', 'inferno_dragon', 'electro_wizard', 'dark_witch', 'magic_archer', 'valkyrie', 'musketeer', 'mini_pekka', 'hog_rider', 'battle_ram', 'flying_machine', 'zappies', 'knight', 'archers', 'minion', 'bomber', 'goblin_gang', 'skeleton_balloon', 'skeleton_horde', 'skeleton_warriors', 'ice_wizard', 'princess', 'miner', 'bandit', 'ghost', 'mega_minion', 'blowdart_goblin', 'goblins', 'goblin_archer', 'fire_spirits', 'bats', 'ice_golem', 'skeletons', 'snow_spirits', 'barbarian_hut', 'building_xbow', 'building_elixir_collector', 'fire_furnace', 'building_inferno', 'bomb_tower', 'building_mortar', 'building_tesla', 'firespirit_hut', 'chaos_cannon', 'tombstone', 'lightning'

Now that I have a list of all the card names, I can add a column for each card to my data frame. I initialize the values to False, then loop through each player to set the columns for the cards in their deck to True.

In [13]:
card_columns = pd.DataFrame(False, index=RoyaleDeckData.index, columns=allcards)
RoyaleDeckData = pd.concat([RoyaleDeckData, card_columns], axis=1)

for index, row in RoyaleDeckData.iterrows():
    # .loc needs a list of column labels; recent pandas versions reject a set
    RoyaleDeckData.loc[index, list(row['deck'])] = True

RoyaleDeckData.head()

Unnamed: 0,player_id,name,deck,fav,highest,current,three_musketeers,chr_golem,pekka,lava_hound,...,order_volley,goblin_barrel,tornado,copy,barb_barrel,heal,zap,rage,the_log,mirror
0,LCRU2CC2,Bogdan,"{zap, miner, ghost, pekka, mega_minion, electr...",P.E.K.K.A,6626,5303,False,False,True,False,...,False,False,False,False,False,False,True,False,False,False
1,282QVRGP,Alif,"{zap, three_musketeers, bats, building_elixir_...",Three Musketeers,6544,5272,True,False,False,False,...,False,False,False,False,False,False,True,False,False,False
2,9PLGQ0V9,RIAN,"{zap, miner, mega_minion, skeleton_warriors, g...",Lava Hound,6664,5254,False,False,False,False,...,False,False,False,False,False,False,True,False,True,False
3,PCV2CUVY,"""'A1_¡ RєZΑ""'","{tornado, zap, dark_prince, ghost, mega_minion...",Executioner,6620,5245,False,False,False,False,...,False,False,True,False,False,False,True,False,False,False
4,2UL8YUQC,TimRedBeard,"{zap, minion, miner, ghost, pekka, electro_wiz...",Bandit,6524,5231,False,False,True,False,...,False,False,False,False,False,False,True,False,False,False
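As an alternative to filling the columns row by row, the indicator columns can be built in one pass from the deck sets. The sketch below uses a tiny made-up frame (`toy`, `toy_cards`, and the player ids are hypothetical stand-ins for the scraped data) so it runs on its own.

```python
import pandas as pd

# Toy stand-ins for the scraped data (values are made up)
toy_cards = ['miner', 'zap', 'chr_balloon', 'pekka']
toy = pd.DataFrame({
    'player_id': ['AAA', 'BBB'],
    'deck': [{'miner', 'zap'}, {'chr_balloon', 'miner', 'pekka'}],
})

# One row of booleans per deck, one column per card
indicators = pd.DataFrame(
    [[card in deck for card in toy_cards] for deck in toy['deck']],
    index=toy.index,
    columns=toy_cards,
)
toy = pd.concat([toy, indicators], axis=1)

# Players whose deck contains both cards
both = toy.loc[toy['chr_balloon'] & toy['miner'], 'player_id'].tolist()
print(both)  # ['BBB']
```

Unlike the `.loc` assignment, a card that appears in a deck but not in the card list is silently ignored here, which may or may not be what is wanted.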


Now I can find all the decks where miner and balloon are used in a single line without a loop:

In [14]:
mydecks = RoyaleDeckData.loc[RoyaleDeckData['chr_balloon'] & RoyaleDeckData['miner']]
mydecks

Unnamed: 0,player_id,name,deck,fav,highest,current,three_musketeers,chr_golem,pekka,lava_hound,...,order_volley,goblin_barrel,tornado,copy,barb_barrel,heal,zap,rage,the_log,mirror
55,2Y0VQC22,Gumm¥bär,"{zap, ice_golem, miner, bats, pekka, blowdart_...",Giant,6437,5111,False,False,True,False,...,False,False,False,False,False,False,True,False,False,False
66,2LP99P92,肥平,"{zap, ice_golem, miner, mega_minion, archers, ...",Three Musketeers,6338,5106,False,False,False,False,...,False,False,False,False,False,False,True,False,False,False
89,9U02J90P,Gustavo,"{zap, ice_golem, miner, mega_minion, tombstone...",Graveyard,6178,5061,False,False,False,False,...,False,False,False,False,False,False,True,False,False,False


And now I can get a list of the other cards that were used in any or all of these decks.

In [15]:
cardused = mydecks.iloc[:,-len(allcards):].any()
print(cardused[cardused])

pekka              True
chr_balloon        True
inferno_dragon     True
archers            True
goblin_gang        True
miner              True
mega_minion        True
blowdart_goblin    True
bats               True
ice_golem          True
tombstone          True
fire_fireball      True
zap                True
dtype: bool


In [16]:
alluse = mydecks.iloc[:,-len(allcards):].all()
alluse[alluse]

chr_balloon    True
miner          True
ice_golem      True
zap            True
dtype: bool

How many of these players are using each card?

In [17]:
mydecks.loc[:,list(cardused[cardused].index)].sum().sort_values(ascending=False)

zap                3
ice_golem          3
miner              3
chr_balloon        3
fire_fireball      2
tombstone          2
mega_minion        2
bats               1
blowdart_goblin    1
goblin_gang        1
archers            1
inferno_dragon     1
pekka              1
dtype: int64

This gives me a list of the cards most used in Balloon Miner decks. I can use this information to help improve my own deck.

Next I would like to generalize this into a function that accepts the names of the cards to include (e.g. Balloon and Miner in the example above) and outputs a list like the one above, showing what other cards are used with those cards.
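One possible shape for that function, sketched under assumptions: it works directly on the deck sets (for example the `deck` column) rather than on the dummy-coded columns, and the name `cards_used_with` is hypothetical. The sample decks are the three Balloon Miner decks printed earlier plus the top player's deck, which has Miner but no Balloon.

```python
from collections import Counter

def cards_used_with(decks, required):
    """Count the cards appearing in decks that contain all required cards.

    decks: iterable of sets of card names
    required: iterable of card names that must all be present
    Returns (number of matching decks, list of (card, count) pairs),
    sorted by count descending and then by card name.
    """
    required = set(required)
    matching = [deck for deck in decks if required <= set(deck)]
    counts = Counter(card for deck in matching for card in deck)
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return len(matching), ranked

sample_decks = [
    {'zap', 'ice_golem', 'miner', 'bats', 'pekka', 'blowdart_goblin', 'chr_balloon', 'goblin_gang'},
    {'zap', 'ice_golem', 'miner', 'mega_minion', 'archers', 'tombstone', 'chr_balloon', 'fire_fireball'},
    {'zap', 'ice_golem', 'miner', 'mega_minion', 'tombstone', 'chr_balloon', 'inferno_dragon', 'fire_fireball'},
    {'bandit', 'electro_wizard', 'ghost', 'mega_minion', 'miner', 'pekka', 'poison', 'zap'},
]

n, ranked = cards_used_with(sample_decks, {'chr_balloon', 'miner'})
print(n)           # 3
print(ranked[:4])  # [('chr_balloon', 3), ('ice_golem', 3), ('miner', 3), ('zap', 3)]
```

Called as `cards_used_with(RoyaleDeckData['deck'], {'chr_balloon', 'miner'})`, it should reproduce the counts shown above.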