# PART 2: DATA ANALYSIS AND VISUALIZATION

## Imports

In [19]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px

## Reading in the dataframe

In [2]:
players = pd.read_csv('FAWSL_CDM.csv')

players

Unnamed: 0,player,passes,incomplete_passes,pass_accuracy,assists,interceptions,ints_per_match,recoveries,recs_per_match,dribbles,dribble_success_rate,duels,duel_success_rate,miscontrols,fouls
0,Keira Walsh,1335,149,88.838951,2,24,1.846154,113,5.65,33,54.545455,51,64.705882,18,15
1,Lia Wälti,1245,163,86.907631,1,22,1.692308,112,6.222222,28,85.714286,82,47.560976,10,10
2,Angharad James,1007,217,78.450844,1,52,2.888889,159,7.227273,26,61.538462,104,44.230769,20,15
3,Katie Zelem,946,195,79.386892,2,27,2.25,126,7.411765,29,86.206897,57,47.368421,13,16
4,Melanie Leupolz,817,121,85.189718,0,22,2.0,83,4.368421,15,86.666667,61,52.459016,26,35
5,Jessica Fishlock,763,177,76.802097,3,32,2.133333,146,7.684211,35,60.0,63,49.206349,31,15
6,Sophie Louise Ingle,760,109,85.657895,1,15,2.142857,66,4.125,6,83.333333,37,43.243243,6,23
7,Jackie Groenen,735,156,78.77551,3,17,1.545455,135,6.428571,38,71.052632,48,72.916667,29,20
8,Jill Scott,683,125,81.698389,1,19,2.375,91,5.6875,31,67.741935,67,46.268657,29,10
9,Alanna Stephanie Kennedy,678,150,77.876106,0,26,2.363636,100,5.555556,30,76.666667,56,33.928571,37,18


## Passing

In [17]:
fig = px.scatter(players, x="pass_accuracy", y="assists", size='passes', color="player")
fig.show()

- Passing is a very important attribute for a CDM (central defensive midfielder). A high passing accuracy suggests that the player is good at recycling possession and finding playmakers or other teammates accurately. While assists aren't the most important statistic for a CDM, they can show how adept the player is at making the decisive pass when needed.



- With the above in mind, Keira Walsh, Lia Wälti and Sophie Louise Ingle really stand out for passing. Jackie Groenen has more assists but a lower passing accuracy than those three, while Melanie Leupolz has a high passing accuracy but no assists.
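The `pass_accuracy` column appears to be completed passes as a percentage of attempted passes. The notebook does not show how the column was built, so that definition is an assumption; the sketch below checks it against three rows copied from the table above.

```python
import pandas as pd

# Three rows copied from the table above; the full analysis would use `players`.
sample = pd.DataFrame({
    "player": ["Keira Walsh", "Lia Wälti", "Sophie Louise Ingle"],
    "passes": [1335, 1245, 760],
    "incomplete_passes": [149, 163, 109],
    "pass_accuracy": [88.838951, 86.907631, 85.657895],
})

# Assumed definition: completed passes as a percentage of attempted passes.
sample["recomputed_accuracy"] = (
    (sample["passes"] - sample["incomplete_passes"]) / sample["passes"] * 100
)

# Largest absolute gap between the stored and the recomputed values.
max_gap = (sample["pass_accuracy"] - sample["recomputed_accuracy"]).abs().max()

print(sample[["player", "pass_accuracy", "recomputed_accuracy"]])
print(f"largest gap: {max_gap:.6f}")
```

If the assumed definition holds, the gap should be no larger than the rounding in the displayed table.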

## Breaking up opposition play

In [24]:
fig = px.scatter(players, x="ints_per_match", y="recs_per_match", size='recoveries', color="player",
                 labels = dict(ints_per_match = 'Interceptions Per Match', recs_per_match='Recoveries Per Match'))
fig.show()

- Breaking up opposition play through interceptions and recoveries is also important for a CDM. Higher numbers in those areas suggest more productivity in the defensive side of the game. It is also worth noting that players in a possession-oriented team are less likely to make constant interceptions and recoveries, because their team already has the ball so much. Those players are likely to have higher numbers in passing stats instead. For example, the two graphs above show that Keira Walsh has the highest number of passes and the highest passing accuracy but relatively low interceptions and recoveries per match. This might suggest that Manchester City play a more possession-oriented style. A similar example is Lia Wälti, whose team might also be playing possession football.




- Sometimes teams like to deploy two CDMs working together in a pivot, each with a slightly different role. For example, Katie Zelem's high interceptions and recoveries per match alongside Jackie Groenen's higher number of assists and dribbles might suggest that Manchester United operate with a pivot in which Groenen is more of a playmaker while Zelem focuses on breaking up opposition play.




- With the above in mind, Angharad James, Katie Zelem, Jessica Fishlock and Christie Murray show impressive numbers in interceptions and recoveries per match.
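One way to summarise the scatter plot in a single ordering is to rank the players on each per-match metric and average the two ranks. This is only an illustrative sketch: the equal weighting of interceptions and recoveries is an assumption, and the data covers only the ten rows displayed in the table above (values rounded to three decimals).

```python
import pandas as pd

# Per-match figures copied from the displayed table (rounded to three decimals).
sample = pd.DataFrame({
    "player": [
        "Keira Walsh", "Lia Wälti", "Angharad James", "Katie Zelem",
        "Melanie Leupolz", "Jessica Fishlock", "Sophie Louise Ingle",
        "Jackie Groenen", "Jill Scott", "Alanna Stephanie Kennedy",
    ],
    "ints_per_match": [1.846, 1.692, 2.889, 2.250, 2.000,
                       2.133, 2.143, 1.545, 2.375, 2.364],
    "recs_per_match": [5.650, 6.222, 7.227, 7.412, 4.368,
                       7.684, 4.125, 6.429, 5.688, 5.556],
})

# Rank each metric so that 1 is the highest value, then average the two ranks.
# Equal weighting of the two metrics is an assumption made for illustration.
ranks = sample[["ints_per_match", "recs_per_match"]].rank(ascending=False)
sample["mean_rank"] = ranks.mean(axis=1)

top_three = sample.sort_values("mean_rank").head(3)["player"].tolist()
print(top_three)
```

A lower `mean_rank` means a player sits nearer the top right of the scatter plot. Players not shown in the table are not covered by this sketch.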

## Winning the ball and winning with the ball

In [27]:
fig = px.scatter(players, x="dribble_success_rate", y="duel_success_rate", size='duels', color="player")
fig.show()

- Dribbling is also an important consideration for a CDM. It can help the CDM out of a tight spot or create enough space to make a good pass. Being able to win duels is another factor in how good a CDM is.



- Players displaying a good dribbling success rate are Melanie Leupolz, Katie Zelem, Lia Wälti and Sophie Louise Ingle. Players displaying a good duel success rate are Jackie Groenen and Keira Walsh, with Groenen showing a decent dribbling success rate as well.
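Success rates are easier to read next to the number of attempts behind them. The sketch below converts the dribble success rates of the four players named above back into counts of successful dribbles; the column name `successful_dribbles` is introduced here for illustration and is not part of the original dataframe.

```python
import pandas as pd

# Rows copied from the table above for the four players named in the text.
sample = pd.DataFrame({
    "player": ["Melanie Leupolz", "Katie Zelem", "Lia Wälti", "Sophie Louise Ingle"],
    "dribbles": [15, 29, 28, 6],
    "dribble_success_rate": [86.666667, 86.206897, 85.714286, 83.333333],
})

# Recover the count of successful dribbles from attempts and success rate.
# Rounding removes the small error left by the truncated percentages.
sample["successful_dribbles"] = (
    sample["dribbles"] * sample["dribble_success_rate"] / 100
).round().astype(int)

print(sample)
```

Seen this way, similar percentages can rest on quite different numbers of attempts, which is worth keeping in mind when comparing players.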

## Mistakes

In [29]:
fig = px.scatter(players, x="fouls", y="miscontrols", size='fouls', color="player")
fig.show()

- Miscontrols and fouls help show a CDM's temperament and discipline, and how often they make mistakes, for example when under pressure. A CDM who makes fewer miscontrols and commits fewer fouls, which should lead to fewer bookings, is obviously desirable.



- The above graph shows Melanie Leupolz committing an abnormally high number of fouls compared to her peers (12 more than the second highest), while she also has a fairly high number of miscontrols. Alanna Stephanie Kennedy has a high number of miscontrols, while Jackie Groenen has fairly high numbers for both fouls and miscontrols.



- Lia Wälti and Christie Murray seem to have the lowest numbers of fouls and miscontrols, with Katie Zelem and Keira Walsh following closely.
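The comparisons above can be checked with a short sketch that adds fouls and miscontrols into one count and measures the gap between the two highest foul totals. Treating a foul and a miscontrol as equally costly is an assumption made here for illustration, and the data covers only the ten rows displayed in the table.

```python
import pandas as pd

# Miscontrols and fouls copied from the ten rows displayed in the table above.
sample = pd.DataFrame({
    "player": [
        "Keira Walsh", "Lia Wälti", "Angharad James", "Katie Zelem",
        "Melanie Leupolz", "Jessica Fishlock", "Sophie Louise Ingle",
        "Jackie Groenen", "Jill Scott", "Alanna Stephanie Kennedy",
    ],
    "miscontrols": [18, 10, 20, 13, 26, 31, 6, 29, 29, 37],
    "fouls": [15, 10, 15, 16, 35, 15, 23, 20, 10, 18],
})

# Assumption: a foul and a miscontrol count equally as one "mistake".
sample["mistakes"] = sample["miscontrols"] + sample["fouls"]
ordered = sample.sort_values("mistakes")

fewest = ordered.iloc[0]["player"]
most = ordered.iloc[-1]["player"]

# Gap between the highest and second-highest foul totals.
top_two_fouls = sample["fouls"].nlargest(2).tolist()
foul_gap = top_two_fouls[0] - top_two_fouls[1]

print(ordered[["player", "mistakes"]])
print(f"fewest: {fewest}, most: {most}, foul gap: {foul_gap}")
```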

## Conclusion

Looking at the above graphs and analysis, the top 3 CDMs in the FA WSL would be the following:

### - Keira Walsh

- Walsh has obvious strengths in passing accuracy and total passes, and she can pop up with an occasional assist. She also has one of the best duel win rates among her peers and has shown good discipline and temperament, with a small number of fouls and miscontrols.

- She could potentially do better in interceptions and recoveries, but it is worth noting that those numbers may be lower because her team potentially dominates the ball more. That being said, her dribbling success rate could also be better.

- I would definitely consider signing Walsh if my team needed a CDM that could dictate the tempo of the game and win duels.




### - Lia Wälti

- Wälti has a similar profile to Keira Walsh. This can be seen in her high number of passes attempted and the second highest passing accuracy, right after Walsh. Furthermore, Wälti shows excellent discipline and temperament, with the lowest total miscontrols and fouls amongst her selected peers.

- She also has a great dribble success rate, while her duel success rate leaves a bit to be desired.

- While Wälti completes slightly fewer passes than Walsh, she more than makes up for it with her ability to dribble. Even though her duel success rate could be better, there is no question that Wälti is a calming presence in midfield, which is crucial for a CDM.




### - Katie Zelem

- Katie Zelem has a somewhat different profile from Wälti and Walsh. Zelem is a more industrious type of CDM who boasts higher total recoveries and interceptions per match than the other two. She also has a great dribble success rate to go with that, not to mention good discipline, with one of the lowest totals of fouls and miscontrols.

- While she seems to get the occasional assist, her passing accuracy and duel success rate could be higher.

- Zelem seems to be a great CDM whose main role is to break up opposition play, recycle possession and then make the safe pass to a more creative minded teammate.

## Limitations of the study

- This study is missing important context in the form of domain knowledge about the teams the players play for and their style of play. As discussed elsewhere in this study, a team that dominates the ball is less likely to have players with really high numbers of interceptions and recoveries. Having that knowledge would allow the CDMs to be evaluated better. For example, Angharad James is a massive outlier in the breaking up opposition play section, making a really high number of interceptions and recoveries per game compared to her peers while not featuring as prominently in the other graphs. Knowing the strength of her team and their playstyle would give important context to those numbers and would suggest whether James is actually that good at breaking up opposition play or whether she plays in a team that is consistently dominated and has to defend that much more.




- Player-relative aggregated stats are a paid StatsBomb feature, so I was not able to get stats per 90 minutes played. The workaround was to use unique match IDs and treat matches played as the denominator, but this cannot account for a player being subbed off mid-game, which still counts as a full match. Stats per 90 minutes played would therefore have been preferable. However, it can be reasonably inferred that the best CDMs in the league play most games barring injury and, as a result, play a similar number of minutes.



- This study does not capture the mental side of the game as well as watching the actual matches could. For example, I have looked at the number of passes and passing accuracy, but these numbers cannot distinguish a smart pass that could eventually create a goalscoring chance from a pass made for its own sake, such as a sideways or backwards pass while not under pressure. This is why studies like this one are most helpful when used alongside watching full games.
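The per-match workaround described in the limitations above can be sketched as follows. The code that produced `FAWSL_CDM.csv` is not shown in this part of the study, so everything here is hypothetical: the `events` table, the `minutes` series, the players "A" and "B" and all of the numbers are made up purely to show how a per-match figure and a per-90 figure can diverge.

```python
import pandas as pd

# Hypothetical event-level data: one row per interception, tagged with its match.
events = pd.DataFrame({
    "player": ["A", "A", "A", "B", "B"],
    "match_id": [101, 101, 102, 101, 103],
})

# Hypothetical minutes played; this is the data the study did not have access to.
minutes = pd.Series({"A": 180, "B": 135}, name="minutes")

# Count events and unique matches per player (the match-ID workaround).
per_player = events.groupby("player").agg(
    interceptions=("match_id", "count"),
    matches=("match_id", "nunique"),
)
per_player["per_match"] = per_player["interceptions"] / per_player["matches"]

# Per-90 rate: events divided by minutes played, scaled to a 90-minute match.
per_player["per_90"] = per_player["interceptions"] / minutes * 90

print(per_player)
```

In this made-up example, player B appears in two matches but plays only 135 minutes, so the per-match figure (1.0) understates the per-90 figure (about 1.33). Note also that counting unique match IDs from event rows only picks up matches in which the player recorded at least one such event.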