In [None]:
import numpy as np
import pandas as pd
import seaborn as sns
from matplotlib import pyplot as plt

# Final Fantasy XIV - Analysis of Arms

In FFXIV (an MMORPG), arms are the weapons that characters can equip. This notebook explores what the data can tell us about weapon balancing and uses those insights to devise a weapon rating system, which we can then use to identify unbalanced weapons.

For this exercise, we'll use data scraped from Lodestone, the official FFXIV community website. The complete arms database has approximately 200,000 entries, so to keep scraping times low, the scraper only collects weapons with an item level of 250 or higher. If you'd like to scrape the complete database, see the scraping script at the end of the notebook. You'll need to change the `url` assignment inside its first loop from:

`f"https://eu.finalfantasyxiv.com/lodestone/playguide/db/item/?category2=1&page={page_number}&min_item_lv=250"`

to

`f"https://eu.finalfantasyxiv.com/lodestone/playguide/db/item/?category2=1&page={page_number}"`

You'll also need to change `range(1, 36)` to `range(1, 4121)` and the `35` in the progress message to `4120`, so that the loop covers all 4,120 pages. Note that the scraper won't run in a Kaggle notebook.
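Rather than editing the URL string by hand, the page URL could be built by a small helper that only adds the item-level filter when one is given. This is a minimal sketch: `build_page_url` is a hypothetical name that doesn't appear in the scraper, and it assumes the Lodestone query parameters used in the script (`category2=1`, `page`, `min_item_lv`) keep behaving as they do there.

```python
from typing import Optional

BASE_URL = "https://eu.finalfantasyxiv.com/lodestone/playguide/db/item/"


def build_page_url(page_number: int, min_item_lv: Optional[int] = 250) -> str:
    """Hypothetical helper: build a Lodestone arms listing URL (category2=1).

    Pass min_item_lv=None to drop the item-level filter and page through the whole arms database.
    """
    url = f"{BASE_URL}?category2=1&page={page_number}"
    if min_item_lv is not None:
        # Only weapons at or above this item level are listed
        url += f"&min_item_lv={min_item_lv}"
    return url


# Filtered URL (as used by the scraper) and unfiltered URL (for the full database)
print(build_page_url(2))
print(build_page_url(2, min_item_lv=None))
```

With a helper like this, switching between the filtered and full scrape becomes a single argument instead of a string edit.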

## Import Data 

In [None]:
df = pd.read_csv('../input/final-fantasy-xiv-arms/ff14_weapons_250.csv')
df.head()

In [None]:
df.describe()

## Missing Values

The scraper should have dealt with any missing values before export; let's just double-check:

In [None]:
df.isnull().sum()

It looks like we're good to go!

## Exploratory Data Analysis 

In [None]:
df.shape

We've got 20 features and approximately 1,700 rows. Let's start by categorising our features:

### Quantitative Features 
* level
* damage
* autoAttack
* delay
* Strength
* Vitality
* CriticalHit
* Determination 
* Tenacity
* SkillSpeed
* DirectHitRate
* Dexterity
* Intelligence
* SpellSpeed
* Mind
* Piety

### Qualitative Features
* name
* category
* isUnique
* isTradeable

In [None]:
quant_features = ['level', 'damage', 'autoAttack', 'delay', 'Strength', 'Vitality', 'CriticalHit',
                  'Determination', 'Tenacity', 'SkillSpeed', 'DirectHitRate', 'Dexterity', 'Intelligence',
                  'SpellSpeed', 'Mind', 'Piety']

qualt_features = ['name', 'category', 'isUnique', 'isTradeable']

### Pearson Correlation Matrix

There are quite a few quantitative features, so let's look at the Pearson correlation matrix for a better understanding of how the features are related:

In [None]:
cor = df[quant_features].corr()
plt.figure(figsize=(18,12))
heat = sns.heatmap(cor, annot=True, cmap=plt.cm.Reds)
heat.tick_params(labelsize=20)
plt.show()

Surprisingly, many features don't show a strong relationship with level, although the bonus stats explain this to a degree. Every feature after delay is a bonus attribute, and a weapon that lacks a given bonus has 0 in that column. No weapon has a bonus for every attribute, so each bonus column contains many zeros, which are likely dragging the coefficients down. It makes sense that damage and autoAttack are closely related to level; higher-level weapons should hit harder.
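One way to test the "zeros drag the coefficients down" explanation is to recompute each bonus column's correlation with level using only the weapons that actually have that bonus. This is an illustrative sketch rather than part of the original analysis: `nonzero_level_corr` is a hypothetical helper, and the toy dataframe is made up purely to show the effect.

```python
import numpy as np
import pandas as pd


def nonzero_level_corr(frame, bonus_cols, level_col='level'):
    """Pearson correlation of each bonus column with level, using only the rows
    where that bonus is present (non-zero). Columns with fewer than two
    non-zero rows get NaN."""
    result = {}
    for col in bonus_cols:
        present = frame[col] != 0
        if present.sum() > 1:
            result[col] = frame.loc[present, col].corr(frame.loc[present, level_col])
        else:
            result[col] = np.nan
    return pd.Series(result)


# Toy data (invented for illustration): the bonus grows with level,
# but only every other weapon has it and the rest store 0
toy = pd.DataFrame({'level': np.arange(250, 650, 4)})
toy['Strength'] = np.where(np.arange(len(toy)) % 2 == 0, toy['level'] * 0.5, 0)

print(toy['Strength'].corr(toy['level']))     # zeros included: weak correlation
print(nonzero_level_corr(toy, ['Strength']))  # zeros excluded: perfect correlation
```

On the real data, the equivalent call would be `nonzero_level_corr(df, quant_features[4:])`, which covers every column from Strength onwards.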

Perhaps we could calculate a new value from level, damage and autoAttack that gives us a better idea of how powerful a weapon is. We could also combine all of the bonus attribute columns into a single column containing the sum of a weapon's bonuses, which would make it much easier to establish a baseline for a good weapon. Let's create these new features first and then look at the resulting Pearson correlation matrix.

In [None]:
# LDA (level-damage-attack) multiplier: damage plus autoAttack per item level
df['LDA'] = (df['damage'] + df['autoAttack']) / df['level']

# totalBonus: sum of all bonus attributes (a missing bonus is stored as 0)
bonus_features = ['Strength', 'Vitality', 'CriticalHit', 'Determination', 'Tenacity', 'SkillSpeed',
                  'DirectHitRate', 'Dexterity', 'Intelligence', 'SpellSpeed', 'Mind', 'Piety']
df['totalBonus'] = df[bonus_features].sum(axis=1)

cor = df[['LDA', 'delay', 'totalBonus']].corr()
plt.figure(figsize=(18,6))
heat = sns.heatmap(cor, annot=True, cmap=plt.cm.Reds)
heat.tick_params(labelsize=20)
plt.show()

LDA and totalBonus are much more strongly correlated than the individual columns were, so let's stick with those. The correlation is negative because LDA tends to decrease for higher-level weapons, which are also more likely to have high bonuses: as level increases, LDA falls and totalBonus rises. Delay is also strongly related to LDA. Since LDA increases with damage and autoAttack, this suggests that heavier-hitting weapons are held back by longer delays.

### Qualitative Feature Analysis

Now let's take a look at our qualitative features to get a feel for how they are distributed in the data (let's not worry about the names):

In [None]:
plt.figure(figsize=(18,8))
count = sns.countplot(data = df, y = 'category')
count.set_xlabel("Count",fontsize=20)
count.set_ylabel('',fontsize=20)
count.tick_params(labelsize=20)
plt.show()

for feature in ['isUnique', 'isTradeable']:
    plt.figure(figsize=(18,4))
    count = sns.countplot(data = df, x = feature)
    count.set_xlabel(feature,fontsize=20)
    count.set_ylabel('Count',fontsize=20)
    count.tick_params(labelsize=20)
    plt.show()

Most weapons in the data are unique, and most aren't tradeable. These features could be helpful for clustering, but we probably won't use them this time, as we're trying to devise a rating system for detecting unbalanced weapons. The category plot shows that weapons are spread very evenly across categories. The exceptions are Gunbreaker and Dancer arms, but I don't think this is a huge issue. Let's dig into categories a little more and make some plots to see how category affects our constructed quantitative features.

In [None]:
for feature in ['totalBonus', 'LDA']:
    plt.figure(figsize=(18,8))
    box = sns.boxplot(data = df, x=feature, y='category')
    box.set_xlabel(feature,fontsize=20)
    box.set_ylabel('Category',fontsize=20)
    box.tick_params(labelsize=20)
    plt.show()

plt.figure(figsize=(18,8))
violin = sns.violinplot(data = df, x='delay', y='category')
violin.set_xlabel('Delay',fontsize=20)
violin.set_ylabel('Category',fontsize=20)
violin.tick_params(labelsize=20)
plt.show()

Once again excepting Gunbreaker and Dancer arms, totalBonus is fairly even across categories, although physical categories seem to perform slightly better than magical ones. LDA is much higher for the magical classes. This makes sense, as the magic types tend to hit harder (LDA increases with damage and autoAttack). Longer delays balance out these high LDA scores. The exceptions are Marauder and Archer arms, which usually have a higher critical hit bonus as a tradeoff.

Generally speaking, a high LDA is good, as it shows that damage keeps pace with level, while a high delay is bad (more frequent hits mean more damage per second). Let's combine these two features to simplify things further, dividing LDA (level-damage-attack) by delay to create a new multiplier, which we'll call LDS (level-damage-speed). This penalises harder-hitting weapons for their slower attack speed. We'll then look at the relationship between LDS and category, and display the totalBonus box plot again to see whether totalBonus balances out high LDS values.

In [None]:
df['LDS'] = df['LDA']/df['delay']

for feature in ['LDS', 'totalBonus']:
    plt.figure(figsize=(18,8))
    box = sns.boxplot(data = df, x=feature, y='category')
    box.set_xlabel(feature, fontsize=20)
    box.set_ylabel('Category',fontsize=20)
    box.tick_params(labelsize=20)
    plt.show()

LDS and totalBonus balance each other fairly well: Gladiator arms do the best overall for LDS but the worst for totalBonus, and the opposite is true for Dancer arms. The system isn't perfect, but these two performance indicators should give us a reasonable gauge of how balanced a weapon is.
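To put numbers on the trade-off visible in the box plots, a per-category table of median LDS and totalBonus can help. This is a minimal sketch, assuming `df` already has the `category`, `LDS` and `totalBonus` columns built above; `category_medians` is a hypothetical helper name. It uses the median because that is also the line drawn inside each box.

```python
import pandas as pd


def category_medians(frame, features=('LDS', 'totalBonus')):
    """Median of each indicator per weapon category, plus each category's rank
    (1 = highest median) so the trade-off between indicators is easy to read."""
    medians = frame.groupby('category')[list(features)].median()
    ranks = medians.rank(ascending=False).add_suffix('_rank')
    return medians.join(ranks)


# Toy data (invented for illustration): category A trades bonus for LDS, B does the reverse
toy = pd.DataFrame({
    'category': ['A', 'A', 'B', 'B', 'C', 'C'],
    'LDS': [0.30, 0.32, 0.20, 0.22, 0.25, 0.27],
    'totalBonus': [400, 420, 600, 620, 500, 520],
})
print(category_medians(toy))
```

On the real data, `category_medians(df).sort_values('LDS_rank')` would list categories from best to worst LDS, with their totalBonus rank alongside.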

## Overpowered vs Underpowered - Finding the Sweet Spot

Now that we've got some indicators for rating weapons effectively, we can try to find the right balance between totalBonus and LDS. totalBonus is the total of all of a weapon's attribute bonuses, and LDS is calculated using the formula below:

## <h1><center>$LDS = \frac{damage + autoAttack}{level \times delay}$</center></h1>
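In the code, LDS is computed as LDA divided by delay, i.e. (damage + autoAttack) / (level × delay). As a quick arithmetic check, here it is for a made-up weapon; the stats below are invented for illustration and are not taken from the dataset, and `lds` is a hypothetical helper name.

```python
def lds(damage, auto_attack, level, delay):
    """LDS = LDA / delay = (damage + autoAttack) / (level * delay)."""
    lda = (damage + auto_attack) / level  # level-damage-attack multiplier
    return lda / delay                    # penalise slow (high-delay) weapons


# Hypothetical weapon: 100 damage, 90 autoAttack, item level 500, 3.0 s delay
print(round(lds(damage=100, auto_attack=90, level=500, delay=3.0), 4))  # 0.1267
```

Here LDA is 190 / 500 = 0.38, and dividing by the 3.0 delay gives roughly 0.1267.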

Let's start by plotting LDS and totalBonus on a scatter chart to see the relationship between these two variables.

In [None]:
plt.figure(figsize=(18,8))
scatter = sns.scatterplot(x=df['LDS'], y=df['totalBonus'])
scatter.set_xlabel('LDS', fontsize=20)
scatter.set_ylabel('totalBonus',fontsize=20)
scatter.tick_params(labelsize=15)
plt.show()

The higher each indicator is, the better. If the weapons were perfectly balanced, the data points would be clustered much more tightly. The graph also shows that the two indicators are inversely related: many of the points form distinct curved bands that start near the high end of totalBonus and run down and to the right towards the high end of LDS. In theory, the further right a weapon is, the more overpowered it is. So now, all we need to do is define a sweet spot.

By multiplying the base-10 log of totalBonus by LDS and plotting the result as a distribution, we should get a feel for where our sweet spot lies.
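One detail worth guarding against in this score: a weapon with no bonuses would have totalBonus = 0, and `np.log10(0)` evaluates to -inf (with a RuntimeWarning). The sketch below returns NaN in that case instead. `balance_score` is a hypothetical helper name, treating bonus-less weapons as "no score" is an assumption rather than something the original analysis does, and whether any such weapons exist in the item-level-250+ data isn't checked here.

```python
import numpy as np


def balance_score(total_bonus, lds):
    """log10(totalBonus) * LDS, with NaN wherever totalBonus <= 0
    (assumption: such weapons get no score instead of -inf)."""
    total_bonus = np.asarray(total_bonus, dtype=float)
    lds = np.asarray(lds, dtype=float)
    # Replace non-positive bonuses with NaN so log10 never sees 0 or negatives
    safe_bonus = np.where(total_bonus > 0, total_bonus, np.nan)
    return np.log10(safe_bonus) * lds


# Invented values: a weapon with 1000 total bonus and LDS 0.2, and one with no bonuses
print(balance_score([1000, 0], [0.2, 0.3]))
```

Applied to the notebook's data, this would be `df['totalBonus*LDS'] = balance_score(df['totalBonus'], df['LDS'])`.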

In [None]:
df['totalBonus*LDS'] = np.log10(df['totalBonus']) * df['LDS'] 

plt.figure(figsize=(18,4))
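# histplot replaces the deprecated distplot; kde=True with stat='density' reproduces its default look
hist = sns.histplot(x=df['totalBonus*LDS'], kde=True, stat='density')
hist.set_xlabel('Log of TotalBonus multiplied by LDS',fontsize=20)
hist.set_ylabel('Density',fontsize=20)
hist.tick_params(labelsize=20)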
plt.show()

The plot shows a fairly even distribution, with balanced weapons in the middle, underpowered weapons on the left and overpowered weapons on the right. Roughly 70% of the data points appear to fall within the densely packed center, so let's assume that the bottom and top 15% of weapons form our underpowered and overpowered groups, respectively. We can find the limits for these groups from the 15th and 85th percentiles.

In [None]:
lower_limit = round(df['totalBonus*LDS'].quantile(.15), 3)
upper_limit = round(df['totalBonus*LDS'].quantile(.85), 3)
lower_limit, upper_limit

These numbers show that if we multiply the base-10 log of a weapon's totalBonus by its LDS, the weapon is balanced only if the result falls between 0.546 and 0.644. Using these limits, we can split our dataframe into three separate dataframes: underpowered, overpowered and balanced (these are included at the end if you'd like to see them). Once that's done, let's visualize the data again on our scatter plot.

In [None]:
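# split on the limits computed above; values exactly at a limit go to the outer groups
balanced_df = df[(df['totalBonus*LDS'] > lower_limit) & (df['totalBonus*LDS'] < upper_limit)]
underpowered_df = df[df['totalBonus*LDS'] <= lower_limit]
overpowered_df = df[df['totalBonus*LDS'] >= upper_limit]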

plt.figure(figsize=(18,8))
plt.scatter(underpowered_df['LDS'], underpowered_df['totalBonus'], color='cornflowerblue', label='Underpowered')
plt.scatter(balanced_df['LDS'], balanced_df['totalBonus'], color='thistle', label='Balanced')
plt.scatter(overpowered_df['LDS'], overpowered_df['totalBonus'], color='darkviolet', label='Overpowered')
plt.xlabel('LDS', fontsize=20)
plt.ylabel('totalBonus',fontsize=20)
plt.xticks(fontsize=15)
plt.yticks(fontsize=15)
plt.legend(fontsize=20)
plt.show()
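Instead of keeping three separate dataframes, the same split could be stored as a single label column, which makes it easy to count or group weapons by rating. This is a minimal sketch: `label_weapons` and a `rating` column are hypothetical names, and unlike the cells above it uses the unrounded quantiles, so a weapon sitting within 0.0005 of a limit could land in a different group.

```python
import numpy as np
import pandas as pd


def label_weapons(scores, lower_q=0.15, upper_q=0.85):
    """Label each score Underpowered / Balanced / Overpowered using quantile cut-offs.

    Same boundary rule as above: <= lower limit is underpowered, >= upper limit
    is overpowered, and anything strictly in between is balanced.
    """
    scores = pd.Series(scores, dtype=float)
    lower, upper = scores.quantile(lower_q), scores.quantile(upper_q)
    labels = np.select([scores <= lower, scores >= upper],
                       ['Underpowered', 'Overpowered'], default='Balanced')
    return pd.Series(labels, index=scores.index)


# Invented scores 1..20: the bottom and top three fall outside the 15th/85th percentiles
print(label_weapons(np.arange(1, 21)).value_counts())
```

On the notebook's data this would be `df['rating'] = label_weapons(df['totalBonus*LDS'])`, after which `df['rating'].value_counts()` gives the size of each group.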

## Summary

The analysis has shown that by summing all of a weapon's attribute bonuses and by relating its level, damage, autoAttack and delay, we can derive two performance indicators for a weapon (totalBonus and LDS). By observing the relationship between these two indicators, we can then determine a range for balanced weapons. In this case, a weapon can be categorised as balanced if the base-10 log of its totalBonus multiplied by its LDS falls between 0.546 and 0.644.

The system isn't perfect, but it's a good starting point for looking at weapon balancing. It does have some significant limitations, though:
* The dataset only covers weapons with an item level of 250 or higher, rather than the entire weapons database.
* The data doesn't include the number of materia slots a weapon has (think of materia slots as attachment slots), so this system could rate a weapon as overpowered even if it has no materia slots, which would limit the weapon significantly.
* LDS accounts for the higher damage of high-level weapons by dividing damage by level. However, totalBonus isn't normalised by level, so this system is inherently more likely to rate high-level weapons as overpowered.

### Overpowered Arms


In [None]:
pd.set_option('display.max_rows', None)
overpowered_df

### Underpowered Arms

In [None]:
underpowered_df

### Balanced Arms

In [None]:
balanced_df

### Web Scraper 

In [None]:
'''
# example page: https://eu.finalfantasyxiv.com/lodestone/playguide/db/item/?category2=1&page=2&min_item_lv=250
# robots.txt: No robots.txt file - scrape away!

import requests
import pandas as pd
from bs4 import BeautifulSoup

if __name__ == '__main__':
    # url list is used to store the urls for each page
    url_list = []

    # 35 pages for weapons, loop through these
    for page_number in range(1, 36):
        # print progress
        if page_number % 10 == 0:
            print(f"URL scraping ~ {page_number}/35")

        # add page number to url
        url = f"https://eu.finalfantasyxiv.com/lodestone/playguide/db/item/?category2=1&page={page_number}&min_item_lv=250"

        # get page soup
        r = requests.get(url)
        soup = BeautifulSoup(r.text, 'html.parser')

        # get the weapon links then append to url list
        for weapon in soup.find_all(class_='db_popup db-table__txt--detail_link'):
            url_list.append(weapon['href'])

    print('=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*')
    print('URL scraping complete')
    print('=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*')

    # create empty list for storing JSON objects
    JSON_list = []

    # loop through all of the weapon urls
    for index, url in enumerate(url_list):
        try:
            # print progress
            if index % 50 == 0 and index != 0:
                print(f"Weapon Scraping ~ {index}/{len(url_list)}")

            # get page soup
            r = requests.get(f"https://eu.finalfantasyxiv.com/{url}")
            soup = BeautifulSoup(r.text, 'html.parser')

            # get weapon name
            weapon = soup.find_all('h2')[-1].text.replace('\n', '').replace('\t', '')

            # determine if weapon is unique or tradeable
            unique_line = str(soup.find_all(class_='db-view__item__text__element en-gb'))
            if 'Unique' in unique_line:
                isUnique = 1
            else:
                isUnique = 0
            if 'Untradable' in unique_line:
                isTradeable = 0
            else:
                isTradeable = 1

            # get item category and level
            category = soup.find(class_='db-view__item__text__category').text
            level = soup.find(class_='db-view__item_level').text.split(' ')[-1]

            # get weapon attributes
            damage, autoAttack, delay = [float(val) for val in
                                         soup.find(class_='clearfix sys_nq_element').text.split('\n')[1:4]]

            # create JSON object
            weapon_JSON = {
                'name': weapon,
                'category': category,
                'level': level,
                'damage': damage,
                'autoAttack': autoAttack,
                'delay': delay,
                'isUnique': isUnique,
                'isTradeable': isTradeable
            }

            # get weapon bonuses - these are optional, so the bonus section may be missing entirely
            bonus_table = soup.find(class_='db-view__basic_bonus')
            bonuses = []
            if bonus_table is not None:
                bonuses = [val for val in bonus_table.text.split('\n') if val != '']

            # add any bonuses to the dictionary, e.g. 'Critical Hit +120' -> CriticalHit: 120.0
            for bonus in bonuses:
                attribute = ''.join(bonus.split(' ')[:-1])
                amount = bonus.split(' ')[-1].replace('+', '')
                weapon_JSON[attribute] = float(int(amount))

            # append JSON object to list
            JSON_list.append(weapon_JSON)

        except Exception as e:
            print(f"{e} @ {url}")

    # convert JSON list to pandas dataframe
    output_df = pd.DataFrame.from_dict(JSON_list)

    # fill any missing values with 0, values will only be missing for bonuses
    output_df = output_df.fillna(0)

    # export to csv
    output_df.to_csv('ff14_weapons_250.csv', index=False)
'''
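The bonus-parsing step is the least obvious part of the scraper, so here it is pulled out as a standalone function that can be tested without requesting anything from Lodestone. `parse_bonus` is a hypothetical name, and the example strings assume bonus lines look like 'Critical Hit +120', which is the shape the scraper's split-on-spaces logic expects; the exact text on Lodestone isn't verified here.

```python
def parse_bonus(line):
    """Split a bonus line such as 'Critical Hit +120' into ('CriticalHit', 120.0).

    Mirrors the scraper: the last space-separated token is the amount, and the
    remaining words are joined without spaces to form the column name.
    """
    *words, amount = line.split(' ')
    return ''.join(words), float(int(amount.replace('+', '')))


print(parse_bonus('Critical Hit +120'))
print(parse_bonus('Direct Hit Rate +75'))
```

Joining the words without spaces is what produces column names such as `CriticalHit` and `DirectHitRate` in the CSV.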