# **Data analysis using data from League of Legends champions**

[League of Legends](https://www.leagueoflegends.com/) is a multiplayer online battle arena video game developed and published by Riot Games, in which the player controls a character ("champion") with a set of unique abilities from an isometric perspective.

We will analyze some statistics of the champions in the **marksman** category using PySpark, the Python API for Apache Spark, in a Google Colab notebook. Colab is a free cloud service created by Google that allows anyone to write and run arbitrary Python code in the browser, and it is especially suited to machine learning, data analysis and education.

All data used were extracted from the [Data Dragon](https://developer.riotgames.com/docs/lol) web API, a public API by Riot Games.    
<br>
<br>
**Starting from the beginning**

First we need to install the three libraries that we will use in this analysis: [Requests](https://requests.readthedocs.io/en/latest/), [PySpark](https://spark.apache.org/docs/latest/api/python/index.html) and [pandas](https://pandas.pydata.org/docs/).

(As I already have them installed, pip will just verify that the requirements are already satisfied.)

*Requests* is an HTTP library for the Python programming language. The project's goal is to make HTTP requests simpler and easier to use;<br>
*PySpark* is an interface for Apache Spark in Python. It not only allows you to write Spark applications using Python APIs, but also provides the PySpark shell for interactively analyzing your data in a distributed environment;<br>
*pandas* is a software library created for the Python language for data manipulation and analysis. In particular, it offers structures and operations for manipulating numeric tables and time series.





In [24]:
!pip install requests
!pip install pyspark
!pip install pandas

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/


# **Importing**

Now, with all the libraries properly installed, we will make the necessary imports.

In [25]:
import requests
import pandas as pd
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

With SparkSession imported, we create a session through its builder by chaining methods such as **appName** and **getOrCreate**.

In [26]:
spark = (SparkSession.builder
         .appName("League of Legends Marksman Champions Data Analysis")
         .getOrCreate()
         )


# **Extracting the champions data**

The League of Legends champion data is extracted via an HTTP request to a [Data Dragon](https://developer.riotgames.com/docs/lol) API endpoint. Data Dragon is a public Riot Games API that centralizes game data such as champions, items and spells.

In response to the request, we get a JSON document in the following format:

<pre>{
  "type":"champion",
  "format":"standAloneComplex",
  "version":"12.23.1",
  "data":{
    "Aatrox": {},
    "Ahri": {},
    "Akali": {},
    "Akshan": {},
    "Alistar": {},
    ...
    }
}</pre>

In [27]:
r = requests.get(
    "https://ddragon.leagueoflegends.com/cdn/12.23.1/data/en_US/champion.json")

champions = r.json().get("data")
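
Before relying on the response, it can help to check that it has the shape shown in the sample above. The following is a minimal sketch, not part of the original notebook: the helper name `champions_from_payload` is hypothetical, and the sample payload is a hand-written stand-in for `r.json()` so that the snippet runs without network access.

```python
def champions_from_payload(payload: dict) -> dict:
    """Return the champion mapping stored under the "data" key.

    Raises ValueError if the payload does not look like the
    champion document sketched above.
    """
    data = payload.get("data")
    if not isinstance(data, dict) or not data:
        raise ValueError('expected a non-empty "data" object in the payload')
    return data


# Hand-written stand-in for r.json(); real entries hold full champion records.
sample_payload = {
    "type": "champion",
    "format": "standAloneComplex",
    "version": "12.23.1",
    "data": {"Aatrox": {}, "Ahri": {}},
}

sample_champions = champions_from_payload(sample_payload)
print(sorted(sample_champions))  # champion ids are the keys
```

In the notebook this could be called as `champions_from_payload(r.json())`, ideally after `r.raise_for_status()` so that HTTP errors surface before parsing.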

**Structure example**

Now that we have all the data from the request under the **"data"** key, we will list all the champion names (the keys) and fetch one champion as an example of the full structure.

In [28]:
champions.keys()
champions.get("Draven")

{'version': '12.23.1',
 'id': 'Draven',
 'key': '119',
 'name': 'Draven',
 'title': 'the Glorious Executioner',
 'blurb': 'In Noxus, warriors known as Reckoners face one another in arenas where blood is spilled and strength tested—but none has ever been as celebrated as Draven. A former soldier, he found that the crowds uniquely appreciated his flair for the dramatic, and...',
 'info': {'attack': 9, 'defense': 3, 'magic': 1, 'difficulty': 8},
 'image': {'full': 'Draven.png',
  'sprite': 'champion0.png',
  'group': 'champion',
  'x': 192,
  'y': 96,
  'w': 48,
  'h': 48},
 'tags': ['Marksman'],
 'partype': 'Mana',
 'stats': {'hp': 675,
  'hpperlevel': 104,
  'mp': 361,
  'mpperlevel': 39,
  'movespeed': 330,
  'armor': 29,
  'armorperlevel': 4.5,
  'spellblock': 30,
  'spellblockperlevel': 1.3,
  'attackrange': 550,
  'hpregen': 3.75,
  'hpregenperlevel': 0.7,
  'mpregen': 8.05,
  'mpregenperlevel': 0.65,
  'crit': 0,
  'critperlevel': 0,
  'attackdamage': 62,
  'attackdamageperlevel': 

# **TRANSFORMING AND CLEANING THE DATA**

Before starting the analysis, we have to clean the data to keep only what will be useful for our purposes. For that, we keep only the name, title and tags of each champion and flatten the nested **info** and **stats** dictionaries into top-level columns.

In [29]:
champions = [
    {
        'name': value['name'],
        'title': value['title'],
        'tags': value['tags'],
        **value['info'],
        **value['stats'],
    }
    for value in champions.values()
]
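
To make the flattening step concrete, here is a small sketch using a shortened copy of the Draven record shown earlier (most fields are omitted for brevity). The `**` operator copies the keys of the nested **info** and **stats** dictionaries into the new top-level dictionary.

```python
# Shortened copy of the Draven record from the structure example above.
draven = {
    "key": "119",  # not selected below, so it is dropped
    "name": "Draven",
    "title": "the Glorious Executioner",
    "tags": ["Marksman"],
    "info": {"attack": 9, "defense": 3, "magic": 1, "difficulty": 8},
    "stats": {"hp": 675, "hpperlevel": 104, "mp": 361, "mpperlevel": 39},
}

flat = {
    "name": draven["name"],
    "title": draven["title"],
    "tags": draven["tags"],
    **draven["info"],   # attack, defense, magic, difficulty
    **draven["stats"],  # hp, hpperlevel, mp, mpperlevel
}

print(list(flat))
```

One thing to keep in mind with this pattern: if **info** and **stats** ever shared a key, the value unpacked last would silently overwrite the earlier one.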

Now, as our goal is to analyze only the champions of the marksman category, we will create the dataframe with the **pandas library** that we imported earlier and filter on the tags. Because the filter compares each champion's whole tag list against **['Marksman']**, the result contains only the champions whose sole tag is Marksman.

In [30]:
dff = pd.DataFrame(champions)
dff = dff[dff['tags'].isin([['Marksman']])]
dff

| | name | title | tags | attack | defense | magic | difficulty | hp | hpperlevel | mp | ... | hpregen | hpregenperlevel | mpregen | mpregenperlevel | crit | critperlevel | attackdamage | attackdamageperlevel | attackspeedperlevel | attackspeed |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 8 | Aphelios | the Weapon of the Faithful | [Marksman] | 6 | 2 | 1 | 10 | 580 | 102 | 348.0 | ... | 3.25 | 0.55 | 6.5 | 0.4 | 0 | 0 | 55 | 3.0 | 2.1 | 0.64 |
| 17 | Caitlyn | the Sheriff of Piltover | [Marksman] | 8 | 2 | 2 | 6 | 580 | 107 | 315.0 | ... | 3.5 | 0.55 | 7.4 | 0.55 | 0 | 0 | 62 | 3.8 | 4.0 | 0.681 |
| 21 | Corki | the Daring Bombardier | [Marksman] | 8 | 3 | 6 | 6 | 588 | 105 | 350.0 | ... | 5.5 | 0.55 | 7.4 | 0.55 | 0 | 0 | 55 | 2.8 | 2.3 | 0.638 |
| 24 | Draven | the Glorious Executioner | [Marksman] | 9 | 3 | 1 | 8 | 675 | 104 | 361.0 | ... | 3.75 | 0.7 | 8.05 | 0.65 | 0 | 0 | 62 | 3.6 | 2.7 | 0.679 |
| 38 | Graves | the Outlaw | [Marksman] | 8 | 5 | 3 | 3 | 625 | 106 | 325.0 | ... | 8.0 | 0.7 | 8.0 | 0.7 | 0 | 0 | 68 | 4.0 | 2.6 | 0.475 |
| 50 | Jinx | the Loose Cannon | [Marksman] | 9 | 2 | 4 | 6 | 630 | 100 | 245.0 | ... | 3.75 | 0.5 | 6.7 | 1.0 | 0 | 0 | 59 | 3.4 | 1.0 | 0.625 |
| 51 | Kai'Sa | Daughter of the Void | [Marksman] | 8 | 5 | 3 | 6 | 670 | 102 | 344.88 | ... | 3.5 | 0.55 | 8.2 | 0.45 | 0 | 0 | 59 | 2.0 | 1.8 | 0.644 |
| 52 | Kalista | the Spear of Vengeance | [Marksman] | 8 | 2 | 4 | 7 | 574 | 114 | 300.0 | ... | 3.75 | 0.55 | 6.3 | 0.4 | 0 | 0 | 66 | 3.5 | 4.5 | 0.694 |
| 61 | Kindred | The Eternal Hunters | [Marksman] | 8 | 2 | 2 | 4 | 610 | 99 | 300.0 | ... | 7.0 | 0.55 | 7.0 | 0.4 | 0 | 0 | 65 | 2.5 | 3.5 | 0.625 |
| 70 | Lucian | the Purifier | [Marksman] | 8 | 5 | 3 | 6 | 641 | 100 | 349.0 | ... | 3.75 | 0.65 | 8.18 | 0.7 | 0 | 0 | 60 | 2.9 | 3.3 | 0.638 |
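
The filter above matches the whole tag list, so a champion tagged with Marksman plus a second role is left out. The sketch below contrasts that exact match with a membership test in plain Python. The second and third records are made-up placeholders, not real Data Dragon entries.

```python
records = [
    {"name": "Draven", "tags": ["Marksman"]},                    # from the output above
    {"name": "ExampleHybrid", "tags": ["Marksman", "Support"]},  # hypothetical
    {"name": "ExampleMage", "tags": ["Mage"]},                   # hypothetical
]

# Exact match: the tag list must be exactly ['Marksman'].
only_marksman = [r["name"] for r in records if r["tags"] == ["Marksman"]]

# Membership: 'Marksman' may appear alongside other tags.
any_marksman = [r["name"] for r in records if "Marksman" in r["tags"]]

print(only_marksman)  # ['Draven']
print(any_marksman)   # ['Draven', 'ExampleHybrid']
```

With pandas, the membership variant could be written on the unfiltered dataframe as `dff[dff['tags'].apply(lambda tags: 'Marksman' in tags)]`. Which one is right depends on whether hybrid champions should count as marksmen for the analysis.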


With the pandas dataframe created, we go back to **Spark** and convert it to a Spark dataframe so we can work with it there.

In [31]:
df = spark.createDataFrame(dff)

# **Concatenating columns**

With our Spark dataframe created, we will concatenate the name and title columns to get a more specific full name for each champion, and add it to the dataframe as a new column for use in the analysis.

In [32]:
df = df.withColumn("full_name", F.concat(df.name, F.lit(", "), df.title))
df.select("full_name").show()

+--------------------+
|           full_name|
+--------------------+
|Aphelios, the Wea...|
|Caitlyn, the Sher...|
|Corki, the Daring...|
|Draven, the Glori...|
|  Graves, the Outlaw|
|Jinx, the Loose C...|
|Kai'Sa, Daughter ...|
|Kalista, the Spea...|
|Kindred, The Eter...|
|Lucian, the Purifier|
|Miss Fortune, the...|
|Samira, the Deser...|
|Sivir, the Battle...|
|    Xayah, the Rebel|
|Zeri, The Spark o...|
+--------------------+



In [33]:
base_columns = ["attackdamage", "armor", "hp", "mp"]

(df.orderBy(*base_columns, ascending=False)
 .select("full_name", *base_columns)
 .show()
 )

+--------------------+------------+-----+---+------+
|           full_name|attackdamage|armor| hp|    mp|
+--------------------+------------+-----+---+------+
|  Graves, the Outlaw|          68|   33|625| 325.0|
|Kalista, the Spea...|          66|   24|574| 300.0|
|Kindred, The Eter...|          65|   29|610| 300.0|
|Draven, the Glori...|          62|   29|675| 361.0|
|Caitlyn, the Sher...|          62|   28|580| 315.0|
|Lucian, the Purifier|          60|   28|641| 349.0|
|    Xayah, the Rebel|          60|   25|660| 340.0|
|Kai'Sa, Daughter ...|          59|   28|670|344.88|
|Jinx, the Loose C...|          59|   26|630| 245.0|
|Sivir, the Battle...|          58|   26|600| 340.0|
|Samira, the Deser...|          57|   26|600| 349.0|
|Corki, the Daring...|          55|   28|588| 350.0|
|Aphelios, the Wea...|          55|   26|580| 348.0|
|Zeri, The Spark o...|          53|   24|630| 250.0|
|Miss Fortune, the...|          52|   28|640| 300.0|
+--------------------+------------+-----+---+------+
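
The ordering above uses several sort keys: rows are compared on attackdamage first, and armor, hp and mp only break ties, all in descending order. A plain-Python sketch of the same idea, using a few rows copied from the output above (it mirrors the behaviour with `sorted`; it is not how Spark implements the sort):

```python
base_columns = ["attackdamage", "armor", "hp", "mp"]

rows = [
    {"name": "Caitlyn", "attackdamage": 62, "armor": 28, "hp": 580, "mp": 315.0},
    {"name": "Xayah", "attackdamage": 60, "armor": 25, "hp": 660, "mp": 340.0},
    {"name": "Draven", "attackdamage": 62, "armor": 29, "hp": 675, "mp": 361.0},
    {"name": "Lucian", "attackdamage": 60, "armor": 28, "hp": 641, "mp": 349.0},
]

# Tuples compare element by element, so later columns only matter on ties.
ordered = sorted(
    rows,
    key=lambda row: tuple(row[col] for col in base_columns),
    reverse=True,
)

print([row["name"] for row in ordered])
# ['Draven', 'Caitlyn', 'Lucian', 'Xayah']
```

Draven and Caitlyn tie on attackdamage (62), so armor decides their order, which matches the table above.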

# **ANALYZING THE DATA**

For the analysis, we will look into three different scenarios.<br>
Anyone who has played League of Legends for a while has noticed that some champions are stronger or weaker depending on how far the match has progressed.
To better understand this, we will place the marksmen in the three most common scenarios: the beginning, middle and end of the match.<br> In a typical League of Legends game, characters start at level 1 and go up to a maximum level of 18. Dividing this into three scenarios, we will use level 1 for the beginning of the game, level 10 for the middle and level 18 for the end. <br>
With this, we will be able to understand a little better the strength of each champion during the match, based on each one's basic stats.


**FIRST SCENARIO**<br>
Early Game:<br>
level = 1

In [34]:
level = 1

df2 = df.withColumns({
    "attackdamage": df.attackdamage+df.attackdamageperlevel*level,
    "armor": df.armor+df.armorperlevel*level,
    "hp": df.hp+df.hpperlevel*level,
    "mp": df.mp+df.mpperlevel*level
})
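
The cell above applies a simple linear model: each stat is the base value plus the per-level growth multiplied by the level. Note that, as written, the growth is already added once at level 1, so the level 1 figures are higher than the base stats listed earlier. A worked example in plain Python, using Draven's hp values from the structure example (the helper name `stat_at_level` is ours, not part of the notebook):

```python
def stat_at_level(base: float, per_level: float, level: int) -> float:
    """Linear model used in this notebook: base + per_level * level."""
    return base + per_level * level


# Draven: 'hp': 675, 'hpperlevel': 104
draven_hp = {level: stat_at_level(675, 104, level) for level in (1, 10, 18)}
print(draven_hp)  # {1: 779, 10: 1715, 18: 2547}
```

These three values match Draven's hp in the level 1, level 10 and level 18 tables of this notebook.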

In [35]:
(df2.orderBy(*base_columns, ascending=False)
 .select("full_name", *base_columns)
 .show()
 )

+--------------------+------------+-----+---+------+
|           full_name|attackdamage|armor| hp|    mp|
+--------------------+------------+-----+---+------+
|  Graves, the Outlaw|        72.0| 37.6|731| 365.0|
|Kalista, the Spea...|        69.5| 29.2|688| 345.0|
|Kindred, The Eter...|        67.5| 33.7|709| 335.0|
|Caitlyn, the Sher...|        65.8| 32.7|687| 350.0|
|Draven, the Glori...|        65.6| 33.5|779| 400.0|
|    Xayah, the Rebel|        63.5| 29.2|762| 380.0|
|Lucian, the Purifier|        62.9| 32.2|741| 387.0|
|Jinx, the Loose C...|        62.4| 30.7|730| 290.0|
|Kai'Sa, Daughter ...|        61.0| 32.2|772|382.88|
|Sivir, the Battle...|        60.8|30.45|704| 380.0|
|Samira, the Deser...|        60.3| 30.7|708| 387.0|
|Aphelios, the Wea...|        58.0| 30.2|682| 390.0|
|Corki, the Daring...|        57.8| 32.7|693| 404.0|
|Zeri, The Spark o...|        54.5| 28.2|745| 295.0|
|Miss Fortune, the...|        54.4| 32.2|743| 335.0|
+--------------------+------------+-----+---+------+

***Some power stats at this current stage of the game.***<br>
We will determine the **maximum** attack damage and armor, the **minimum** hp and the **average** mp.<br>

For that we will use the ***agg*** method. This method receives a dictionary as a parameter, where the keys are the names of the columns we want to analyze and the values are the functions we want to apply to them.

In [36]:
(df2.agg({
    "attackdamage": "max",
    "hp": "min",
    "mp": "mean",
    "armor": "max"
})
    .show()
)

+------------------+----------+-----------------+-------+
|           avg(mp)|max(armor)|max(attackdamage)|min(hp)|
+------------------+----------+-----------------+-------+
|361.72533333333337|      37.6|             72.0|    682|
+------------------+----------+-----------------+-------+
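
To illustrate what the dictionary passed to ***agg*** expresses, the sketch below applies the same column-to-function mapping to three rows copied from the level 1 table above. It is a plain-Python imitation for illustration only: the helper `aggregate` and the `FUNCTIONS` table are hypothetical names, and Spark's own implementation is different (for instance, Spark labels the mean column `avg(mp)`).

```python
from statistics import mean

# Map the function names used in the agg dictionary to Python callables.
FUNCTIONS = {"max": max, "min": min, "mean": mean}


def aggregate(rows, spec):
    """Apply one named function per column, as in df.agg({...})."""
    return {
        f"{func}({column})": FUNCTIONS[func]([row[column] for row in rows])
        for column, func in spec.items()
    }


# First three rows of the level 1 table.
rows = [
    {"attackdamage": 72.0, "armor": 37.6, "hp": 731, "mp": 365.0},  # Graves
    {"attackdamage": 69.5, "armor": 29.2, "hp": 688, "mp": 345.0},  # Kalista
    {"attackdamage": 67.5, "armor": 33.7, "hp": 709, "mp": 335.0},  # Kindred
]

result = aggregate(
    rows,
    {"attackdamage": "max", "hp": "min", "mp": "mean", "armor": "max"},
)
print(result)
```

Because dictionary keys are unique, this form allows only one function per column; asking for both the minimum and the maximum of the same column needs a different call style.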



**SECOND SCENARIO**<br>
Mid Game:<br>
level = 10

In [37]:
level = 10

df3 = df.withColumns({
    "attackdamage": df.attackdamage+df.attackdamageperlevel*level,
    "armor": df.armor+df.armorperlevel*level,
    "hp": df.hp+df.hpperlevel*level,
    "mp": df.mp+df.mpperlevel*level
})

In [38]:
(df3.orderBy(*base_columns, ascending=False)
 .select("full_name", *base_columns)
 .show()
 )

+--------------------+------------+-----+----+------+
|           full_name|attackdamage|armor|  hp|    mp|
+--------------------+------------+-----+----+------+
|  Graves, the Outlaw|       108.0| 79.0|1685| 725.0|
|Kalista, the Spea...|       101.0| 76.0|1714| 750.0|
|Caitlyn, the Sher...|       100.0| 75.0|1650| 665.0|
|Draven, the Glori...|        98.0| 74.0|1715| 751.0|
|    Xayah, the Rebel|        95.0| 67.0|1680| 740.0|
|Jinx, the Loose C...|        93.0| 73.0|1630| 695.0|
|Kindred, The Eter...|        90.0| 76.0|1600| 650.0|
|Samira, the Deser...|        90.0| 73.0|1680| 729.0|
|Lucian, the Purifier|        89.0| 70.0|1641| 729.0|
|Sivir, the Battle...|        86.0| 70.5|1640| 740.0|
|Aphelios, the Wea...|        85.0| 68.0|1600| 768.0|
|Corki, the Daring...|        83.0| 75.0|1638| 890.0|
|Kai'Sa, Daughter ...|        79.0| 70.0|1690|724.88|
|Miss Fortune, the...|        76.0| 70.0|1670| 650.0|
|Zeri, The Spark o...|        68.0| 66.0|1780| 700.0|
+--------------------+------------+-----+----+------+

***Some power stats at this current stage of the game.***<br>
We will determine the **maximum** attack damage and armor, the **minimum** hp and the **average** mp, using the ***agg*** method again.

In [39]:
(df3.agg({
    "attackdamage": "max",
    "hp": "min",
    "mp": "mean",
    "armor": "max"
})
    .show()
)

+-----------------+----------+-----------------+-------+
|          avg(mp)|max(armor)|max(attackdamage)|min(hp)|
+-----------------+----------+-----------------+-------+
|727.1253333333334|      79.0|            108.0|   1600|
+-----------------+----------+-----------------+-------+



**THIRD SCENARIO**<br>
Late Game:<br>
level = 18

In [40]:
level = 18

df4 = df.withColumns({
    "attackdamage": df.attackdamage+df.attackdamageperlevel*level,
    "armor": df.armor+df.armorperlevel*level,
    "hp": df.hp+df.hpperlevel*level,
    "mp": df.mp+df.mpperlevel*level
})

In [41]:
(df4.orderBy(*base_columns, ascending=False)
 .select("full_name", *base_columns)
 .show()
 )

+--------------------+------------------+------------------+----+-------+
|           full_name|      attackdamage|             armor|  hp|     mp|
+--------------------+------------------+------------------+----+-------+
|  Graves, the Outlaw|             140.0|             115.8|2533| 1045.0|
|Caitlyn, the Sher...|130.39999999999998|112.60000000000001|2506|  945.0|
|Kalista, the Spea...|             129.0|117.60000000000001|2626| 1110.0|
|Draven, the Glori...|             126.8|             110.0|2547| 1063.0|
|    Xayah, the Rebel|             123.0|100.60000000000001|2496| 1060.0|
|Jinx, the Loose C...|120.19999999999999|110.60000000000001|2430| 1055.0|
|Samira, the Deser...|             116.4|110.60000000000001|2544| 1033.0|
|Lucian, the Purifier|112.19999999999999|103.60000000000001|2441| 1033.0|
|Kindred, The Eter...|             110.0|113.60000000000001|2392|  930.0|
|Aphelios, the Wea...|             109.0|101.60000000000001|2416| 1104.0|
|Sivir, the Battle...|             108
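
The long decimals in the level 18 output, such as 130.39999999999998, are ordinary binary floating-point rounding effects rather than errors in the data. A short plain-Python sketch using Caitlyn's values from the dataframe above (attackdamage 62, attackdamageperlevel 3.8) shows the effect and how rounding for display hides it:

```python
base_attackdamage = 62
attackdamage_per_level = 3.8
level = 18

raw = base_attackdamage + attackdamage_per_level * level
rounded = round(raw, 1)

print(raw)      # may show a long decimal tail instead of 130.4
print(rounded)  # 130.4
```

In PySpark, a similar result could be obtained by wrapping the column expression in `F.round(..., 1)` before calling `show()`.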

***Some power stats at this current stage of the game.***<br>
We will determine the **maximum** attack damage and armor, the **minimum** hp and the **average** mp, using the ***agg*** method again.

In [42]:
(df4.agg({
    "attackdamage": "max",
    "hp": "min",
    "mp": "mean",
    "armor": "max"
})
    .show()
)

+------------------+------------------+-----------------+-------+
|           avg(mp)|        max(armor)|max(attackdamage)|min(hp)|
+------------------+------------------+-----------------+-------+
|1051.9253333333334|117.60000000000001|            140.0|   2392|
+------------------+------------------+-----------------+-------+
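
To see how the ranking can change between stages, the three scenarios can also be compared side by side. A minimal plain-Python sketch, using the hp and hpperlevel values of three champions from the pandas dataframe shown earlier and the same linear model as the cells above:

```python
# (hp, hpperlevel) copied from the pandas dataframe above.
champions_hp = {
    "Draven": (675, 104),
    "Graves": (625, 106),
    "Kalista": (574, 114),
}

leaders = {}
for level in (1, 10, 18):
    hp_at_level = {
        name: base + per_level * level
        for name, (base, per_level) in champions_hp.items()
    }
    # Champion with the highest hp at this level.
    leaders[level] = max(hp_at_level, key=hp_at_level.get)

print(leaders)  # {1: 'Draven', 10: 'Draven', 18: 'Kalista'}
```

Kalista starts with the lowest hp of the three but has the highest growth per level, so under this model she overtakes the others by level 18.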



# **Last but not least**

So that's it for this analysis.
Here we could see the differences between the marksman champions during the game and understand why some are **OVERPOWERED** in the early/mid game, while others need more time before they can help the team.

Hope you liked it, and see you next time! ♥