<a href="https://colab.research.google.com/github/ncwalker59/hello-world/blob/main/Penalty%20Kicks%20Behavioral%20Analysis%20Capstone%20Project.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Penalty Kick Behavior Analysis**

## Link to entire cleaned Excel file:  
##### https://drive.google.com/file/d/1hANXgikba5vRu25asCLQZ2thzFxbA9dP/view?usp=sharing

### Mounting the Raw Dataset

In [7]:

import pandas as pd                     
import matplotlib.pyplot as plt          # plotting
import numpy as np                       # dense matrices
from scipy.sparse import csr_matrix      # sparse matrices
%matplotlib inline



from google.colab import drive

# see whole dataset
pd.set_option('display.max_rows', 500)

#mount dataset
drive.mount('/content/gdrive')
df = pd.read_csv('/content/gdrive/My Drive/Colab Datasets/WorldCupPKs.csv')




Drive already mounted at /content/gdrive; to attempt to forcibly remount, call drive.mount("/content/gdrive", force_remount=True).


## Background and Audience

The penalty kick is one of the most exciting and vital moments in a football match. When a player from the attacking team is fouled in the defending team's penalty box, the attacking team is awarded a free shot from 12 yards in front of the defending team's goal. The only thing standing between the shooter and the goal is the defending team's goalkeeper, who is responsible for blocking the shot. At most levels of competition, the goalkeeper does not have time to react to the shot because of the penalty-taker's proximity and the goal's large size. Instead, the goalkeeper must guess where the penalty-taker will kick the ball.

Football is a low-scoring sport, and the outcome of a penalty kick often determines a match's outcome. At the professional level, goalkeepers can study film and data on their opponents' tendencies. However, goalkeepers at the youth and amateur levels do not have access to the same resources and often rely on pure luck to guess where a player will direct their penalty kick.

In my capstone project, I intend to identify penalty kick-takers' tendencies so that goalkeepers of all levels will make better-educated decisions when facing penalty kicks.  This information will provide value to coaches and players at the youth, high school, collegiate, and amateur levels.


## The Data
I used a dataset compiled from every penalty taken between the 1982 and 2018 World Cups. The original dataset contained 9 variables and 305 rows. 24 of these rows contained null values, so I reduced the dataset to 281 rows.

The independent variable for my analysis is the 'Foot' column, a categorical variable recording which foot a player uses to take a penalty kick. I selected this variable because it is the easiest visual indicator a goalkeeper can use to predict a penalty kick's direction.

The dependent variable is a new column called 'Zone_Simple', which I derived from the 'Zone' column. This categorical variable simplifies the zone in which a penalty kick is directed into 3 areas: left, center, and right. I simplified the original 9 zones into 3 because a goalkeeper usually chooses a side (or stays in the middle) rather than a specific area to dive towards, and can then react to the ball's specific trajectory in the air. I expected a relationship between a penalty-taker's preferred foot and the zone in which he or she directs the kick.

I also examined a moderating variable, found in the 'Elimination' column, which indicates whether the observed kick directly determined the elimination of the player's team. I hypothesized that the level of pressure would influence a kick-taker's behavior, and thus the relationship between the independent and dependent variables.

As a control, I used the 'OnTarget' variable, because I only wanted to examine kicks that were directed on goal. I reasoned that goalkeepers would not need to worry about kicks that missed the goal.

- **Independent Variable: 'Foot'** (Categorical)
- **Dependent Variable: 'Zone_Simple'** (Categorical)
- **Moderating Variable: 'Elimination'** (Boolean)
- **Control Variable: 'OnTarget'** (Boolean)
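
The collapse from nine zones to three can be sketched as follows. This is a minimal illustration, not the original derivation: the helper name `simplify_zone` is hypothetical, and the mapping (zones 1, 4, 7 to left; 2, 5, 8 to center; 3, 6, 9 to right) is an assumption inferred from the sample rows printed in this notebook.

```python
# Hypothetical helper: collapse the 9-zone grid into left / center / right.
# Assumption: zones are numbered 1-9 in rows of three, so the position within
# each row decides the side (1, 4, 7 -> L; 2, 5, 8 -> C; 3, 6, 9 -> R).
def simplify_zone(zone):
    if not 1 <= zone <= 9:
        raise ValueError(f"zone must be between 1 and 9, got {zone}")
    return "LCR"[(zone - 1) % 3]

# Zone values taken from the first rows of the dataset printout
sample_zones = [7, 9, 6, 2, 9, 4, 8, 3]
simplified = [simplify_zone(z) for z in sample_zones]
print(simplified)
```

With pandas, the same function could be applied to the whole column with `df['Zone'].map(simplify_zone)` once the rows containing null values have been dropped.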




---



In [15]:
#raw data table
print(df)



     Zone Foot Keeper  OnTarget  Goal  Elimination Zone_Simple
0       7    R      R         1     1            0           L
1       9    R      C         1     1            0           R
2       6    R      L         1     1            0           R
3       2    R      C         1     1            0           C
4       9    R      L         1     1            0           R
5       4    R      L         1     0            0           L
6       8    L      L         1     0            0           C
7       3    R      R         1     1            0           R
8       9    R      L         1     1            0           R
9       9    R      C         1     1            1           R
10      7    R      L         1     0            0           L
11      9    R      C         1     1            1           R
12      4    R      L         1     0            0           L
13      2    R      R         1     1            0           C
14      6    R      R         1     1            0     

## Graphic Visualization of Data (by Foot type)
- Left: Purple (L)
- Center: Green (C)
- Right: Red (R)

This representation suggests that left-footed penalty takers favor the right side of the goal, while right-footed kickers favor the left side of the goal.

In [10]:
# bar graph of all PKs on target 
import plotly.express as px
df = pd.read_csv('/content/gdrive/My Drive/Colab Datasets/WorldCupPKs.csv')
fig = px.bar(df, x='Foot', y='OnTarget', color='Zone_Simple')
fig.show()
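
The preference described above can be put into numbers by converting the observed counts into per-foot shares. This is a minimal sketch, assuming the counts used in the chi-squared cell of this notebook are ordered as left-footed then right-footed kickers (rows) and left, center, right zones (columns). That ordering is not stated in the notebook and is inferred from the written conclusion.

```python
# Observed on-target counts, copied from the chi-squared cell.
# Assumption: rows are [left-footed, right-footed]; columns are [L, C, R].
observed = {
    "L": [19, 12, 21],
    "R": [96, 38, 70],
}
zones = ["L", "C", "R"]

shares = {}
for foot, counts in observed.items():
    total = sum(counts)
    # Share of this foot's kicks aimed at each zone
    shares[foot] = {z: c / total for z, c in zip(zones, counts)}
    formatted = ", ".join(f"{z}: {shares[foot][z]:.1%}" for z in zones)
    print(f"{foot}-footed (n={total}): {formatted}")
```

Under these assumptions, right-footed kickers aimed about 47% of their on-target kicks left and 34% right, while left-footed kickers split about 37% left and 40% right.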

# Methodology
## Chi-Squared Tests:

I used a chi-squared test of independence to examine whether the association between a player's kicking foot and their kick-taking behavior could be attributed to chance. The chi-squared test compares the observed counts in a sample with the counts that would be expected if the variables were unrelated, in order to assess whether there is a relationship between the independent and dependent variables. I performed this test for all kicks on target, then for only elimination kicks, to see whether the pressure of an elimination kick was associated with a stronger relationship.

The null hypothesis, or H0, is that no relationship exists between the 'Foot' and 'Zone_Simple' variables. In other words, the target zone is independent of the kick-taker's kicking foot.

The alternative hypothesis, or H1, asserts that there is a relationship between the 'Foot' and 'Zone_Simple' variables. This means that the target zone is dependent on a kick-taker's kicking foot.

I will test these hypotheses for all PKs on goal, then perform the same test for PKs on goal during elimination situations to determine whether the 'Elimination' variable affects the relationship between the independent and dependent variables.


In [16]:
#chi-squared tests
from scipy.stats import chi2, chi2_contingency

def run_chi_squared(table, prob=0.95):
    """Run a chi-squared test of independence on observed counts and print the results."""
    print(table)
    stat, p, dof, expected = chi2_contingency(table)
    print('dof=%d' % dof)
    print(expected)
    # interpret test-statistic
    critical = chi2.ppf(prob, dof)
    print('probability=%.3f, critical=%.3f, stat=%.3f' % (prob, critical, stat))
    if stat >= critical:
        print('Dependent (reject H0)')
    else:
        print('Independent (fail to reject H0)')
    # interpret p-value
    alpha = 1.0 - prob
    print('significance=%.3f, p=%.3f' % (alpha, p))
    if p <= alpha:
        print('Dependent (reject H0)')
    else:
        print('Independent (fail to reject H0)')

# chi-squared test for PKs on target
run_chi_squared([[19, 12, 21],
                 [96, 38, 70]])

# chi-squared test for elimination PKs on target
run_chi_squared([[1, 3, 3],
                 [12, 3, 10]])
print('*tables indicate observed values*\n')



[[19, 12, 21], [96, 38, 70]]
dof=2
[[23.359375 10.15625  18.484375]
 [91.640625 39.84375  72.515625]]
probability=0.950, critical=5.991, stat=1.871
Independent (fail to reject H0)
significance=0.050, p=0.392
Independent (fail to reject H0)
[[1, 3, 3], [12, 3, 10]]
dof=2
[[ 2.84375  1.3125   2.84375]
 [10.15625  4.6875  10.15625]]
probability=0.950, critical=5.991, stat=4.318
Independent (fail to reject H0)
significance=0.050, p=0.115
Independent (fail to reject H0)
*tables indicate observed values*
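
To make the mechanics of the test concrete, the statistic for the on-target table can be recomputed by hand. This is an illustrative sketch using only the standard library, not a replacement for `chi2_contingency`; the function name `chi_squared` is hypothetical. It relies on the fact that, with 2 degrees of freedom, the chi-squared upper-tail probability is exactly exp(-x/2), so the p-value shortcut applies only to tables with 2 degrees of freedom, such as the 2x3 tables used here.

```python
import math

def chi_squared(table):
    """Return (statistic, expected counts) for an observed contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Expected count under independence: row total * column total / grand total
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    # Sum of (observed - expected)^2 / expected over every cell
    stat = sum(
        (obs - exp) ** 2 / exp
        for obs_row, exp_row in zip(table, expected)
        for obs, exp in zip(obs_row, exp_row)
    )
    return stat, expected

stat, expected = chi_squared([[19, 12, 21], [96, 38, 70]])
dof = (2 - 1) * (3 - 1)
# Closed-form upper-tail probability, valid only when dof == 2
p = math.exp(-stat / 2)
print('stat=%.3f, dof=%d, p=%.3f' % (stat, dof, p))
```

The printed values should agree with the `stat=1.871` and `p=0.392` reported by SciPy for the same table.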

Conclusion: Right-footed kickers appear to prefer the left side of the goal, while left-footed kickers tend to prefer the right side. However, a p-value > 0.05 in both tests indicates this pattern could be attributed to chance, although the p-value was smaller in elimination situations.


# Results

## All Kicks on Target

**Chi-Squared Test p-value = 0.392**

A p-value > 0.05 indicates this association could be attributed to chance.

## Elimination Situations
**Chi-Squared Test p-value = 0.115**

The p-value in elimination situations is smaller, which may hint at a stronger relationship between a kick-taker's kicking foot and penalty kick behavior during high-pressure situations. However, a smaller p-value does not by itself measure the strength of a relationship, this subset contains only 32 kicks, and the p-value is still greater than 0.05, so this test is also not statistically significant.
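
Strength of association can be quantified separately from significance. One common effect-size measure for contingency tables is Cramér's V, sketched below from the chi-squared statistics reported in the output above. Using Cramér's V here is a suggestion rather than part of the original analysis, and `cramers_v` is a hypothetical helper.

```python
import math

def cramers_v(stat, n, n_rows, n_cols):
    """Cramér's V: sqrt(chi2 / (n * min(rows - 1, cols - 1)))."""
    return math.sqrt(stat / (n * min(n_rows - 1, n_cols - 1)))

# Observed tables from the chi-squared cell
all_table = [[19, 12, 21], [96, 38, 70]]
elim_table = [[1, 3, 3], [12, 3, 10]]

n_all = sum(map(sum, all_table))      # total on-target kicks
n_elim = sum(map(sum, elim_table))    # total on-target elimination kicks

# Chi-squared statistics as printed in the notebook output
v_all = cramers_v(1.871, n_all, 2, 3)
v_elim = cramers_v(4.318, n_elim, 2, 3)
print('V (all on-target kicks) = %.3f' % v_all)
print('V (elimination kicks)   = %.3f' % v_elim)
```

With these inputs V is roughly 0.09 for all on-target kicks and roughly 0.37 for elimination kicks, but an estimate based on so few elimination kicks is very uncertain.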

In both tests, my results failed to reject the null hypothesis that there is no relationship between the variables.
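
The chi-squared approximation is usually considered unreliable when expected counts are small; a common rule of thumb asks for expected counts of at least 5 in most cells. The sketch below recomputes the expected counts for the elimination table and counts how many fall under that threshold. The threshold of 5 and the helper name `expected_counts` are assumptions for illustration, not part of the original analysis.

```python
def expected_counts(table):
    """Expected counts under independence: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

# Observed elimination-kick counts from the chi-squared cell
elim_table = [[1, 3, 3], [12, 3, 10]]
expected = expected_counts(elim_table)

# Flatten the table and keep the cells below the rule-of-thumb threshold
cells = [value for row in expected for value in row]
small = [value for value in cells if value < 5]
print('%d of %d expected counts are below 5' % (len(small), len(cells)))
```

Four of the six expected counts fall below 5 (they match the expected table printed by `chi2_contingency` above), so the elimination result should be read with extra caution.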

# Conclusion

Although I was unable to find a statistically significant relationship between a penalty kick-taker's dominant foot and shooting preference, I believe this data provides a starting point for further research.

In the future, I would like to conduct further observations of penalty kicks at the amateur, collegiate, and high school levels to investigate whether kick-takers behave differently at different levels of play. This will provide value to my intended audience, who are not professional players like the ones in the dataset I used for this project.

If I am able to find a strong predictor of kick-taking preferences, I will be able to share this research with non-professional goalkeepers, who don't have the resources to scout their opponents. To my knowledge, no such research currently exists for these levels, so this information will help lower-level goalkeeper coaches and players correctly anticipate the direction of penalty kicks, and therefore become stronger players.

# Data Source
https://www.kaggle.com/pablollanderos33/world-cup-penalty-shootouts