In this notebook the data will be analyted using subgroup discovery methods, with the focus of splitting the data into short, medium/normal, and long games and using subgroup discovery to find out the champions/team compositions most used in each population of games. 

To allow for that each row in the dataframe will get a column with the respective champions on each role in that game on that team 

In [75]:
import pandas as pd
import pysubgroup as ps

In [76]:
#Load our dataset/frame
df = pd.read_csv('./out/df.csv')
game_player_stats = pd.read_csv('./out/game_player_stats.csv')

For each game we have ten entries, we can use for each row of the df the combination of gameid and teamid to get to five rows and then go over each position
Maybe there is also a way to group them by gameid, teamid and position to directly get the correct entry
-->  use pivot table and index gameplayer stats by gameid and teamid to create a new dataframe with columns being the positions and the index gameid and teamid and fill those columns with the values in the champion column from gameplayerstats

In [77]:
#Include the champions for each role of a game in a dataframe 
champions_df = (
    game_player_stats
        .pivot_table(
            index=['gameid', 'teamid'], 
            columns='position', 
            values='champion',
            aggfunc='first'
        )
        .reset_index()
)

#Merge the champions df into the normal df 
df = df.merge(champions_df, on=['gameid', 'teamid'], how='left')
print(df.dtypes)

gameid                       object
teamid                       object
result                        int64
side                         object
kills                         int64
deaths                        int64
assists                       int64
visionscore                 float64
earnedgold                  float64
golddiffat15                float64
total cs                    float64
wardsplaced                 float64
wardskilled                 float64
wcpm                        float64
damagetochampions           float64
side_adv                      int64
adc_golddiffat15            float64
adc_damagetochampions       float64
adc_earnedgold              float64
adc_damagetakenperminute    float64
adc_dpm                     float64
adc_damageshare             float64
adc_kills                     int64
adc_deaths                    int64
adc_assists                   int64
adc_golddiffat20            float64
adc_golddiffat25            float64
jng_golddiffat15            

To split everything into short, normal and long games in terms of time we first need to decide what a normal, short and long game is 
For that we take the mean gamelength as well as the standard deviation over all our games, then our upper bound is the mean + std and our lower bound is the mean - std. Everything inside that is a normal game anything smaller is a short game and everything larger is a long game

In [78]:
#To split into normal long and short games we first need to find out what a normal game is 
mean_gamelength = df['gamelength'].mean()
std_gamelength = df['gamelength'].std()

upper_bound = mean_gamelength + std_gamelength
lower_bound = mean_gamelength - std_gamelength

print(f'Mean gamelength: {mean_gamelength} seconds, {mean_gamelength / 60} minutes')
print(f'std gamelength: {std_gamelength} seconds, {std_gamelength / 60} minutes')
print(f'Upper bound normal gamelength: {upper_bound} seconds, {upper_bound / 60} minutes')
print(f'lower bound normal gamelength: {lower_bound} seconds, {lower_bound / 60} minutes')

''' 
We say anything inbetween 27 minutes and 39 minutes can be considered a normal match 
'''

Mean gamelength: 1999.2420084137023 seconds, 33.32070014022837 minutes
std gamelength: 358.3217314879099 seconds, 5.972028858131832 minutes
Upper bound normal gamelength: 2357.5637399016123 seconds, 39.292728998360204 minutes
lower bound normal gamelength: 1640.9202769257922 seconds, 27.348671282096536 minutes


' \nWe say anything inbetween 27 minutes and 39 minutes can be considered a normal match \n'

In [79]:
#We create binary columns for long and short games since they work best with pysubgroup and normal games are those where both are false 
df['is_long'] = df['gamelength'] > upper_bound
df['is_short'] = df['gamelength'] < lower_bound

df['is_normal'] = ((df['is_long'] == 0) & (df['is_short'] == 0))


In [80]:
#Try to look at only recent games so larger than 2024
df["date"] = pd.to_datetime(df["date"])

df_recent = df[
    (df["date"].dt.year >= 2025)
]


the idea with the subgroup discovery is to see whether we can find interesting subgroups in the population of games for the champions played

In [82]:
#start with short games
#target = ps.BinaryTarget('result', False)
target = ps.NumericTarget('golddiffat15')
#target = ps.BinaryTarget('is_long', True)

#For now set searchspace to champions on roles 
searchspace = (
    [ps.EqualitySelector('top', c) for c in df.top.unique()] +
    [ps.EqualitySelector('mid', c) for c in df.mid.unique()] +
    [ps.EqualitySelector('jng', c) for c in df.jng.unique()] +
    [ps.EqualitySelector('bot', c) for c in df.bot.unique()] +
    [ps.EqualitySelector('sup', c) for c in df.sup.unique()]
)

#create automatic selectors 
#searchspace = ps.create_selectors(df, ignore=["gamelength", "is_short", "is_long", 'predicted_prob_late', 'predicted_result_late', 'gameid', 'teamid', 'date'])
#searchspace = ps.create_selectors(df, ignore=['result', 'deaths', 'win_prob', 'predicted_prob_late', 'predicted_result_late', 'gameid', 'teamid', 'date'])
searchspace = ps.create_selectors(df, ignore=['result','golddiffat15', 'predicted_prob_late', 'predicted_result_late', 'gameid', 'teamid', 'date'])
#searchspace = ps.create_selectors(df, ignore=['result', 'gameid', 'date', 'teamid'])
#run the discovery
task = ps.SubgroupDiscoveryTask (
    df,
    target,
    searchspace,
    result_set_size=5,
    depth=5,
    qf=ps.StandardQFNumeric()
)
#WRA for binary targets and standardqf for numeric (standadqf is just mean subgroup - mean dataset)
result = ps.DFS().execute(task)

print(result.to_dataframe())

TypeError: StandardQFNumeric.__init__() missing 1 required positional argument: 'a'

target numeric golddiffat15 without result in ignore 
     quality                                         subgroup  size_sg  \
0  9217210.0                                        result==1     7931   
1  9160524.0                     is_long==False AND result==1     7059   
2  7180491.0                        playoffs==0 AND result==1     6040   
3  6415199.0                    is_short==False AND result==1     6947   
4  6370631.0  jng_damagetakenperminute>=1070.59 AND result==1     5665   
5  6307887.0                          jng_golddiffat20>=961.0     3095   
6  6298360.0                                        kills>=21     5053   
7  6280200.0                          kills>=21 AND result==1     4544   
8  6221032.0                     is_long==False AND kills>=21     4208   
9  6154756.0                         jng_golddiffat25>=1381.0     3199   

   size_dataset      mean_sg  mean_dataset       std_sg  std_dataset  \
0         15862  1162.175009           0.0  2000.569698  2313.639961   
1         15862  1297.708457           0.0  1985.429877  2313.639961   
2         15862  1188.823013           0.0  2004.242335  2313.639961   
3         15862   923.448827           0.0  1877.621420  2313.639961   
4         15862  1124.559753           0.0  2029.770343  2313.639961   
5         15862  2038.089499           0.0  2008.800933  2313.639961   
6         15862  1246.459529           0.0  2157.405485  2313.639961   
7         15862  1382.086268           0.0  2117.795572  2313.639961   
8         15862  1478.382129           0.0  2138.068940  2313.639961   
9         15862  1923.962488           0.0  2043.755542  2313.639961   

   median_sg  median_dataset  max_sg  max_dataset  min_sg  min_dataset  \
...
6        inf          inf  
7        inf          inf  
8        inf          inf  
9        inf          inf  
Output is truncated. View as a scrollable element or open in a text editor. Adjust cell output settings...
/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/pysubgroup/numeric_target.py:121: RuntimeWarning: divide by zero encountered in scalar divide
  statistics["mean_lift"] = statistics["mean_sg"] / statistics["mean_dataset"]
/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/pysubgroup/numeric_target.py:123: RuntimeWarning: divide by zero encountered in scalar divide
  statistics["median_sg"] / statistics["median_dataset"]

    For short games is true
    quality         subgroup  size_sg  size_dataset  positives_sg  \
0  0.002081    bot=='Kai'Sa'    10911        139772          1764   
1  0.002048  sup=='Nautilus'    15448        139772          2372   
2  0.001993      bot=='Jinx'     6491        139772          1155   
3  0.001670  bot=='Aphelios'    10106        139772          1598   
4  0.001524     jng=='Viego'     6303        139772          1064   

   positives_dataset  size_complement  relative_size_sg  \
0              18872           128861          0.078063   
1              18872           124324          0.110523   
2              18872           133281          0.046440   
3              18872           129666          0.072303   
4              18872           133469          0.045095   

   relative_size_complement  coverage_sg  coverage_complement  \
0                  0.921937     0.093472             0.906528   
1                  0.889477     0.125689             0.874311   
2                  0.953560     0.061202             0.938798   
3                  0.927697     0.084676             0.915324   
4                  0.954905     0.056380             0.943620   

   target_share_sg  target_share_complement  target_share_dataset      lift  
0         0.161672                 0.132763               0.13502  1.197392  
1         0.153547                 0.132718               0.13502  1.137220  
2         0.177939                 0.132930               0.13502  1.317870  
3         0.158124                 0.133219               0.13502  1.171116  
4         0.168809                 0.133424               0.13502  1.250249  

Long games is true
    quality        subgroup  size_sg  size_dataset  positives_sg  \
0  0.003667  jng=='Rek'Sai'     4941        139772          1250   
1  0.003498   jng=='Gragas'     5837        139772          1360   
2  0.003335    jng=='Elise'     4689        139772          1166   
3  0.002833    bot=='Sivir'     5474        139772          1213   
4  0.002747   top=='Maokai'     2794        139772           801   

   positives_dataset  size_complement  relative_size_sg  \
0              20860           134831          0.035350   
1              20860           133935          0.041761   
2              20860           135083          0.033547   
3              20860           134298          0.039164   
4              20860           136978          0.019990   

   relative_size_complement  coverage_sg  coverage_complement  \
0                  0.964650     0.059923             0.940077   
1                  0.958239     0.065197             0.934803   
2                  0.966453     0.055896             0.944104   
3                  0.960836     0.058150             0.941850   
4                  0.980010     0.038399             0.961601   

   target_share_sg  target_share_complement  target_share_dataset      lift  
0         0.252985                 0.145441              0.149243  1.695122  
1         0.232996                 0.145593              0.149243  1.561188  
2         0.248667                 0.145792              0.149243  1.666189  
3         0.221593                 0.146294              0.149243  1.484779  
4         0.286686                 0.146440              0.149243  1.920932  

automatic selector with is long 
    quality                                           subgroup  size_sg  \
0  0.080146                                   total cs>=1200.0     2917   
1  0.078662    damagetochampions>=91493.0 AND total cs>=1200.0     2358   
2  0.078142            total cs>=1200.0 AND visionscore>=295.0     2433   
3  0.076656  damagetochampions>=91493.0 AND visionscore>=295.0     2893   
4  0.076456                                 visionscore>=295.0     3695   
5  0.074352           earnedgold>=45050.0 AND total cs>=1200.0     2225   
6  0.073323  adc_damagetochampions>=26880.0 AND total cs>=1...     2055   
7  0.072518         earnedgold>=45050.0 AND visionscore>=295.0     2435   
8  0.071459  adc_damagetochampions>=26880.0 AND visionscore...     2515   
9  0.070377                                earnedgold>=45050.0     3626   

   size_dataset  positives_sg  positives_dataset  size_complement  \
0         15862          1592               1744            12945   
1         15862          1507               1744            13504   
2         15862          1507               1744            13429   
3         15862          1534               1744            12969   
4         15862          1619               1744            12167   
5         15862          1424               1744            13637   
6         15862          1389               1744            13807   
7         15862          1418               1744            13427   
8         15862          1410               1744            13347   
9         15862          1515               1744            12236   

   relative_size_sg  relative_size_complement  coverage_sg  \
0          0.183899                  0.816101     0.912844   
1          0.148657                  0.851343     0.864106   
2          0.153385                  0.846615     0.864106   
3          0.182386                  0.817614     0.879587   
4          0.232947                  0.767053     0.928326   
5          0.140272                  0.859728     0.816514   
6          0.129555                  0.870445     0.796445   
7          0.153512                  0.846488     0.813073   
8          0.158555                  0.841445     0.808486   
9          0.228597                  0.771403     0.868693   

   coverage_complement  target_share_sg  target_share_complement  \
0             0.087156         0.545766                 0.011742   
1             0.135894         0.639101                 0.017550   
2             0.135894         0.619400                 0.017648   
3             0.120413         0.530245                 0.016192   
4             0.071674         0.438160                 0.010274   
5             0.183486         0.640000                 0.023466   
6             0.203555         0.675912                 0.025712   
7             0.186927         0.582341                 0.024279   
8             0.191514         0.560636                 0.025024   
9             0.131307         0.417816                 0.018715   

   target_share_dataset      lift  
0              0.109948  4.963844  
1              0.109948  5.812740  
2              0.109948  5.633556  
3              0.109948  4.822679  
4              0.109948  3.985143  
5              0.109948  5.820917  
6              0.109948  6.147547  
7              0.109948  5.296497  
8              0.109948  5.099089  
9              0.109948  3.800111  

target result , depth 2
    quality                           subgroup  size_sg  size_dataset  \
0  0.127191                          kills>=21     5053         15862   
1  0.125363                        assists>=49     4959         15862   
2  0.116946       is_long==False AND kills>=21     4208         15862   
3  0.114456     assists>=49 AND is_long==False     4073         15862   
4  0.111115                    jng_assists>=11     4779         15862   
5  0.109192                     adc_assists>=8     5242         15862   
6  0.108971    assists>=49 AND is_short==False     4437         15862   
7  0.107332      is_short==False AND kills>=21     4415         15862   
8  0.107300          assists>=49 AND kills>=21     4122         15862   
9  0.101973  adc_assists>=8 AND is_long==False     4361         15862   

   positives_sg  positives_dataset  size_complement  relative_size_sg  \
0          4544               7931            10809          0.318560   
1          4468               7931            10903          0.312634   
2          3959               7931            11654          0.265288   
3          3852               7931            11789          0.256777   
4          4152               7931            11083          0.301286   
5          4353               7931            10620          0.330475   
6          3947               7931            11425          0.279725   
7          3910               7931            11447          0.278338   
8          3763               7931            11740          0.259866   
9          3798               7931            11501          0.274934   

   relative_size_complement  coverage_sg  coverage_complement  \
...
6         0.889565                 0.348709                   0.5  1.779130  
7         0.885617                 0.351271                   0.5  1.771234  
8         0.912906                 0.355026                   0.5  1.825813  
9         0.870901                 0.359360                   0.5  1.741802  
Output is truncated. View as a scrollable element or open in a text editor. Adjust cell output settings...

result targer with 5 depth and champion only selectors 
    quality             subgroup  size_sg  size_dataset  positives_sg  \
0  0.004413       mid=='Taliyah'     1120         15862           630   
1  0.002995         bot=='Varus'     1527         15862           811   
2  0.002143        top=='Rumble'     1206         15862           637   
3  0.001608          mid=='Yone'      741         15862           396   
4  0.001576        jng=='Maokai'      710         15862           380   
5  0.001482         bot=='Xayah'      541         15862           294   
6  0.001482        bot=='Kai'Sa'      951         15862           499   
7  0.001450         sup=='Braum'      838         15862           442   
8  0.001387           top=='Jax'      514         15862           279   
9  0.001355  bot=='Miss Fortune'     1009         15862           526   

   positives_dataset  size_complement  relative_size_sg  \
0               7931            14742          0.070609   
1               7931            14335          0.096268   
2               7931            14656          0.076031   
3               7931            15121          0.046715   
4               7931            15152          0.044761   
5               7931            15321          0.034107   
6               7931            14911          0.059955   
7               7931            15024          0.052831   
8               7931            15348          0.032404   
9               7931            14853          0.063611   

   relative_size_complement  coverage_sg  coverage_complement  \
...
6         0.524711                 0.498424                   0.5  1.049422  
7         0.527446                 0.498469                   0.5  1.054893  
8         0.542802                 0.498567                   0.5  1.085603  
9         0.521308                 0.498552                   0.5  1.042616  
Output is truncated. View as a scrollable element or open in a text editor. Adjust cell output settings...

target result with 20 result_set_size
     quality                                           subgroup  size_sg  \
0   0.127191                                          kills>=21     5053   
1   0.125363                                        assists>=49     4959   
2   0.116946                       is_long==False AND kills>=21     4208   
3   0.114456                     assists>=49 AND is_long==False     4073   
4   0.111115                                    jng_assists>=11     4779   
5   0.109192                                     adc_assists>=8     5242   
6   0.108971                    assists>=49 AND is_short==False     4437   
7   0.107332                      is_short==False AND kills>=21     4415   
8   0.107300                          assists>=49 AND kills>=21     4122   
9   0.101973                  adc_assists>=8 AND is_long==False     4361   
10  0.101311                 is_long==False AND jng_assists>=11     3910   
11  0.099452                                       adc_kills>=7     4667   
12  0.097245    jng_damagetakenperminute>=1070.59 AND kills>=21     3969   
13  0.096772                          kills>=21 AND playoffs==0     3828   
14  0.095606                is_short==False AND jng_assists>=11     4281   
15  0.095165  assists>=49 AND jng_damagetakenperminute>=1070.59     3871   
16  0.094881                        assists>=49 AND playoffs==0     3752   
17  0.094723                    assists>=49 AND jng_assists>=11     3695   
18  0.094093                     adc_assists>=8 AND assists>=49     3731   
19  0.093715                             jng_earnedgold>=8507.0     4815   

    size_dataset  positives_sg  positives_dataset  size_complement  \
0          15862          4544               7931            10809   
1          15862          4468               7931            10903   
...
16                   0.5  1.802239  
17                   0.5  1.813261  
18                   0.5  1.800054  
19                   0.5  1.617445  
Output is truncated. View as a scrollable element or open in a text editor. Adjust cell output settings...

result = False so loss as target and rest normal 
    quality                                          subgroup  size_sg  \
0  0.127506                                        deaths>=21     5075   
1  0.117293                     deaths>=21 AND is_long==False     4229   
2  0.107553                    deaths>=21 AND is_short==False     4434   
3  0.106418                                     jng_deaths>=5     4838   
4  0.100113                                     adc_deaths>=4     5130   
5  0.099798  deaths>=21 AND jng_damagetakenperminute>=1070.59     4074   
6  0.098821                  is_long==False AND jng_deaths>=5     4069   
7  0.096993                        deaths>=21 AND playoffs==0     3845   
8  0.093525                      deaths>=21 AND jng_deaths>=5     3539   
9  0.092769                  adc_deaths>=4 AND is_long==False     4371   

   size_dataset  positives_sg  positives_dataset  size_complement  \
0         15862          4560               7931            10787   
1         15862          3975               7931            11633   
2         15862          3923               7931            11428   
3         15862          4107               7931            11024   
4         15862          4153               7931            10732   
5         15862          3620               7931            11788   
6         15862          3602               7931            11793   
7         15862          3461               7931            12017   
8         15862          3253               7931            12323   
9         15862          3657               7931            11491   

   relative_size_sg  relative_size_complement  coverage_sg  \
...
6                   0.5  1.770460  
7                   0.5  1.800260  
8                   0.5  1.838372  
9                   0.5  1.673301  
Output is truncated. View as a scrollable element or open in a text editor. Adjust cell output settings...

same as above but ignoring deaths
    quality                                           subgroup  size_sg  \
0  0.106418                                      jng_deaths>=5     4838   
1  0.100113                                      adc_deaths>=4     5130   
2  0.098821                   is_long==False AND jng_deaths>=5     4069   
3  0.092769                   adc_deaths>=4 AND is_long==False     4371   
4  0.089522                  is_short==False AND jng_deaths>=5     4272   
5  0.086622                                      kills: [7:12[     3066   
6  0.085960  jng_damagetakenperminute>=1070.59 AND jng_deat...     4009   
7  0.084731                                 earnedgold<29602.0     2690   
8  0.084731              earnedgold<29602.0 AND is_long==False     2690   
9  0.082713                   is_long==False AND kills: [7:12[     2920   

   size_dataset  positives_sg  positives_dataset  size_complement  \
0         15862          4107               7931            11024   
1         15862          4153               7931            10732   
2         15862          3602               7931            11793   
3         15862          3657               7931            11491   
4         15862          3556               7931            11590   
5         15862          2907               7931            12796   
6         15862          3368               7931            11853   
7         15862          2689               7931            13172   
8         15862          2689               7931            13172   
9         15862          2772               7931            12942   

   relative_size_sg  relative_size_complement  coverage_sg  \
...
6                   0.5  1.680220  
7                   0.5  1.999257  
8                   0.5  1.999257  
9                   0.5  1.898630  
Output is truncated. View as a scrollable element or open in a text editor. Adjust cell output settings...