# Database creation

The data used in this project can be created by running two files, getMatchID.py and createData.py, in that order. However, Riot's API allows only 300 requests every 2 minutes, so re-creating the dataset used in this project would take many days!

That approach is not recommended. Instead, this notebook was created to help readers better understand how we collected and processed the data, and to provide a simpler, more efficient way of checking the original method.

In [11]:
import pandas as pd
import requests
import json

Running getMatchId.py in the folder generates a text file containing the match_id values that we can use to request data from Riot's API. For testing purposes, we take only one match from the list of over 150,000 matches.

In [3]:
# Print only the first match ID; the with-block closes the file afterwards.
with open('Data/matchids.txt') as f:
    print(f.readline())

4241678498
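The cell above reads only the first line. For the full run, all IDs are needed at once. Below is a minimal sketch of one way to load them, assuming the file holds one match ID per line (as the output above suggests). The helper name `load_match_ids` is hypothetical and not part of the project's scripts.

```python
from pathlib import Path


def load_match_ids(path):
    """Return the non-empty lines of a match-ID file as a list of strings.

    Assumption: the file contains one match ID per line.
    """
    with Path(path).open() as f:
        # strip() drops the trailing newline; blank lines are skipped.
        return [line.strip() for line in f if line.strip()]
```

In this notebook it could be called as `load_match_ids('Data/matchids.txt')`.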



In [None]:
key = 'RGAPI-2e386383-b1bc-4f52-88ea-51f69387012e'

Next, we need a key provided by Riot to be able to use their APIs. Please note that each key is valid for only 24 hours, so you need to create an account and generate a new key at the link below:
https://developer.riotgames.com
The key shown above is no longer valid.
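Because keys expire and should not be shared, it can help to keep the key out of the notebook itself. Below is a minimal sketch, assuming the key is stored in an environment variable. The variable name `RIOT_API_KEY` and the helper `riot_headers` are hypothetical. `X-Riot-Token` is the request header Riot accepts as an alternative to the `api_key` query parameter.

```python
import os


def riot_headers(env_var="RIOT_API_KEY"):
    """Build request headers carrying the API key read from the environment.

    Assumption: the key was exported beforehand under the name in env_var.
    """
    key = os.environ.get(env_var)
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable to your Riot API key")
    return {"X-Riot-Token": key}
```

The result can be passed to a request as `requests.get(url, headers=riot_headers())`, in which case the `?api_key=...` part of the URL is not needed.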

Unfortunately, Riot does not allow us to request multiple matches at once. We can retrieve the details of only one match per API request, so the only way to build a database is to call the same endpoint repeatedly with a different match_id each time. This makes database creation extremely time-consuming.
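Since every match costs one request, the loop has to stay under the rate limit. Below is a rough sketch of a simple throttle that spaces calls evenly. It is not taken from createData.py. The class name `Throttle` is hypothetical, and even spacing is stricter than the limit actually requires.

```python
import time


class Throttle:
    """Space out calls so that at most max_calls happen per period (seconds)."""

    def __init__(self, max_calls, period, clock=time.monotonic, sleep=time.sleep):
        # Minimum gap between two consecutive calls.
        self.min_interval = period / max_calls
        self.clock = clock
        self.sleep = sleep
        self.last_call = None

    def wait(self):
        """Block until enough time has passed since the previous call."""
        now = self.clock()
        if self.last_call is not None:
            remaining = self.min_interval - (now - self.last_call)
            if remaining > 0:
                self.sleep(remaining)
                now = self.clock()
        self.last_call = now
```

With the figure of 300 requests per 2 minutes quoted earlier, `Throttle(300, 120)` leaves 0.4 seconds between requests. Call `wait()` right before each `requests.get`.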


With the match_id and key ready, let's try using Riot's API.

In [10]:
match_id = '4241678498'
response = requests.get("https://euw1.api.riotgames.com/lol/match/v4/matches/{}?api_key={}".format(match_id, key)) 
match = response.json()
print(match)

{'gameId': 4241678498, 'platformId': 'EUW1', 'gameCreation': 1571779464690, 'gameDuration': 1216, 'queueId': 420, 'mapId': 11, 'seasonId': 13, 'gameVersion': '9.20.292.2452', 'gameMode': 'CLASSIC', 'gameType': 'MATCHED_GAME', 'teams': [{'teamId': 100, 'win': 'Fail', 'firstBlood': False, 'firstTower': False, 'firstInhibitor': False, 'firstBaron': False, 'firstDragon': False, 'firstRiftHerald': False, 'towerKills': 0, 'inhibitorKills': 0, 'baronKills': 0, 'dragonKills': 0, 'vilemawKills': 0, 'riftHeraldKills': 0, 'dominionVictoryScore': 0, 'bans': [{'championId': 119, 'pickTurn': 1}, {'championId': 90, 'pickTurn': 2}, {'championId': 25, 'pickTurn': 3}, {'championId': 111, 'pickTurn': 4}, {'championId': 23, 'pickTurn': 5}]}, {'teamId': 200, 'win': 'Win', 'firstBlood': True, 'firstTower': True, 'firstInhibitor': False, 'firstBaron': False, 'firstDragon': True, 'firstRiftHerald': True, 'towerKills': 7, 'inhibitorKills': 0, 'baronKills': 0, 'dragonKills': 2, 'vilemawKills': 0, 'riftHeraldKil

Wow. Although this is just one match, the amount of data returned is immense. Many of these features are certainly not needed. Drawing on a good understanding of League of Legends, the team decided to select the 48 most important features of a match.


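General information about the match includes gameId and gameDuration, along with team-level objectives: firstBaron, firstBlood, firstTower, firstInhibitor, inhibitorKills, dragonKills, towerKills, firstDragon and baronKills.

And player information such as: win, wardsPlaced, wardsKilled, deaths, kills, assists, totalDamageDealtToChampions, goldEarned, totalMinionsKilled, champLevel, neutralMinionsKilled, killingSprees, totalHeal and damageDealtToObjectives.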

The new problem is that the returned information has the form:

```
{ 'match_id' : int,
  'player1' : list of individual information,
  'player2' : list of individual information,
  'player3' : list of individual information,
  .
  .
  .
}
```

Like football, League of Legends is a team game, so it does not make much sense to analyze players individually: a single player cannot make a big impact on the outcome of a match. We will therefore focus on the performance of the entire team, which means the information for the 10 players needs to be combined and split into 2 teams, blue and red.


In [8]:
clean_data = {}


# General data: these fields do not belong to either team.
clean_data['gameId'] = [match['gameId']]
clean_data['gameDuration'] = [match['gameDuration']]



# Features you want from each participant's 'stats' dict
wanted_data = ['win', 'wardsPlaced', 'wardsKilled', 'deaths', 'kills',
               'assists', 'totalDamageDealtToChampions', 'goldEarned', 'totalMinionsKilled',
               'champLevel', 'neutralMinionsKilled', 'killingSprees',
               'totalHeal', 'damageDealtToObjectives']
for player in match['participants']:
    # Determine which team this player belongs to (teamId 100 is blue, otherwise red).
    team_name = 'blue' if player['teamId'] == 100 else 'red'

    for i in wanted_data:
        # Column name in clean_data, e.g. 'blueWardsplaced'. Note that
        # str.capitalize() also lowercases everything after the first letter.
        name = team_name + i.capitalize()
        if name not in clean_data:
            clean_data[name] = [0]
        if isinstance(player['stats'][i], bool):
            # Boolean stats (e.g. 'win') are the same for the whole team: store as 0/1.
            clean_data[name][0] = int(player['stats'][i])
        else:
            # Numeric stats are summed over the team's players.
            clean_data[name][0] += player['stats'][i]

            
# Features you want from match['teams']
wanted_data = ['firstBaron', 'firstBlood', 'firstTower', 'firstInhibitor', 'inhibitorKills', 'dragonKills',
               'towerKills', 'firstDragon', 'baronKills']
for team in match['teams']:
    # Determine which team this is (teamId 100 is blue, otherwise red).
    team_name = 'blue' if team['teamId'] == 100 else 'red'

    for i in wanted_data:
        name = team_name + i.capitalize()  # column name in clean_data
        if name not in clean_data:
            clean_data[name] = [0]
        if isinstance(team[i], bool):
            # Boolean flags (e.g. 'firstBlood') are stored as 0/1.
            clean_data[name][0] = int(team[i])
        else:
            clean_data[name][0] += int(team[i])
    
print(clean_data)

{'gameId': [4241678498], 'gameDuration': [1216], 'blueWin': [0], 'blueWardsplaced': [27], 'blueWardskilled': [8], 'blueDeaths': [27], 'blueKills': [9], 'blueAssists': [5], 'blueTotaldamagedealttochampions': [29450], 'blueGoldearned': [30168], 'blueTotalminionskilled': [401], 'blueChamplevel': [53], 'blueNeutralminionskilled': [94], 'blueKillingsprees': [1], 'blueTotalheal': [7680], 'blueDamagedealttoobjectives': [8720], 'redWin': [1], 'redWardsplaced': [43], 'redWardskilled': [10], 'redDeaths': [9], 'redKills': [27], 'redAssists': [27], 'redTotaldamagedealttochampions': [45074], 'redGoldearned': [43623], 'redTotalminionskilled': [429], 'redChamplevel': [60], 'redNeutralminionskilled': [128], 'redKillingsprees': [6], 'redTotalheal': [21713], 'redDamagedealttoobjectives': [51909], 'blueFirstbaron': [0], 'blueFirstblood': [0], 'blueFirsttower': [0], 'blueFirstinhibitor': [0], 'blueInhibitorkills': [0], 'blueDragonkills': [0], 'blueTowerkills': [0], 'blueFirstdragon': [0], 'blueBaronkills'

All selected attributes can be listed in wanted_data as above, and the script will select and classify each value itself. We end up with a dictionary that looks pretty clean. Let's try converting it into a DataFrame.

In [9]:
df = pd.DataFrame.from_dict(clean_data)
display(df)

Unnamed: 0,gameId,gameDuration,blueWin,blueWardsplaced,blueWardskilled,blueDeaths,blueKills,blueAssists,blueTotaldamagedealttochampions,blueGoldearned,...,blueBaronkills,redFirstbaron,redFirstblood,redFirsttower,redFirstinhibitor,redInhibitorkills,redDragonkills,redTowerkills,redFirstdragon,redBaronkills
0,4241678498,1216,0,27,8,27,9,5,29450,30168,...,0,0,1,1,0,0,2,7,1,0
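As a quick sanity check, the two wanted_data lists and the two general fields should together give the 48 features mentioned earlier. The sketch below rebuilds the expected column names. The variable names `general`, `player_stats` and `team_stats` are introduced here for illustration only.

```python
# Fields copied from the cell that builds clean_data.
general = ['gameId', 'gameDuration']
player_stats = ['win', 'wardsPlaced', 'wardsKilled', 'deaths', 'kills',
                'assists', 'totalDamageDealtToChampions', 'goldEarned', 'totalMinionsKilled',
                'champLevel', 'neutralMinionsKilled', 'killingSprees',
                'totalHeal', 'damageDealtToObjectives']
team_stats = ['firstBaron', 'firstBlood', 'firstTower', 'firstInhibitor', 'inhibitorKills',
              'dragonKills', 'towerKills', 'firstDragon', 'baronKills']

# Every player/team stat appears once per team, prefixed with 'blue' or 'red'.
expected_columns = general + [
    team + stat.capitalize()
    for stats in (player_stats, team_stats)
    for team in ('blue', 'red')
    for stat in stats
]

print(len(expected_columns))  # 2 + 2 * (14 + 9) = 48
```

Comparing `set(expected_columns)` with `set(df.columns)` is one way to confirm that no feature was dropped.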


Great. We have successfully used the Riot API and converted the data to a DataFrame. All that is left to create the complete database is to put everything in a loop and wait for the results.
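Below is a rough sketch of what that loop might look like. It is not the actual contents of createData.py. `build_rows`, `fetch_match`, `flatten_match` and `pause` are hypothetical names. `fetch_match` stands for the `requests.get` call shown earlier, and `flatten_match` stands for the clean_data logic, assumed here to return plain values rather than single-element lists.

```python
def build_rows(match_ids, fetch_match, flatten_match, pause=None):
    """Fetch and flatten every match, returning one dict (row) per match.

    fetch_match(match_id) -> dict  (hypothetical wrapper around the API request)
    flatten_match(match)  -> dict  (hypothetical wrapper around the clean_data logic)
    pause()                        (optional, called before each request, e.g. a rate limiter)
    """
    rows = []
    for match_id in match_ids:
        if pause is not None:
            pause()
        try:
            match = fetch_match(match_id)
            rows.append(flatten_match(match))
        except Exception as exc:
            # Skip a failing match so one bad request does not end a long run.
            print(f"Skipping match {match_id}: {exc}")
    return rows
```

The returned list can be turned into a table with `pd.DataFrame(rows)` and saved with `to_csv`.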