In [1]:
import pandas as pd 
import numpy as np 
import seaborn as sns
import plotly.express as px
 

### Background info

**Brazilian Jiu-Jitsu (BJJ)** is a martial art that focuses on wrestling or throwing an opponent to the ground. While on the ground, the athlete aims to achieve a submission victory. A submission victory occurs when an athlete puts his opponent into a joint lock or a choke, and the submitted opponent signals defeat by *tapping on his opponent or the mat.*

[![BJJ match](https://upload.wikimedia.org/wikipedia/commons/2/22/GABRIEL_VELLA_vs_ROMINHO_51.jpg "The athlete on the bottom has caught his opponent in a choke, click for more info")](https://en.wikipedia.org/wiki/Brazilian_jiu-jitsu)

This analysis aims to find the most common types of submission that occur in one of the most popular BJJ competitions, the *Abu Dhabi Combat Club Submission Fighting World Championship (ADCC).*

This analysis will be useful to anyone practicing BJJ, as it shows which submissions succeed most often in the sport. Practitioners can then focus their efforts on learning these submissions.

A separate app was built to present the data in an easy-to-understand and engaging way.

### Data source
The raw CSV file was obtained from Kaggle.com

https://www.kaggle.com/datasets/bjagrelli/adcc-historical-dataset

Now we have to clean the data and prepare it.

In [2]:
df = pd.read_csv(r'C:\Users\lizhi\OneDrive\Desktop\Portfolio\ADCC data viz\adcc_historical_data.csv', sep = ';')

The dataframe should first be checked to ensure that the CSV file was read correctly.

In [3]:
df.head(15)

Unnamed: 0,match_id,winner_id,winner_name,loser_id,loser_name,win_type,submission,winner_points,loser_points,adv_pen,weight_class,sex,stage,year
0,3314,484,Murilo Santana,733,Vinicius Magalhaes,DECISION,,-1,-1,,ABS,M,4F,2011
1,35049,7507,Nicholas Meregali,9554,Henrique Cardoso,SUBMISSION,Kimura,-1,-1,,99KG,M,R1,2022
2,35053,7507,Nicholas Meregali,1740,Yuri Simoes,DECISION,,-1,-1,,99KG,M,4F,2022
3,35057,7507,Nicholas Meregali,576,Rafael Lovato Jr,POINTS,,0,0,PEN,99KG,M,3RD,2022
4,35096,7507,Nicholas Meregali,11797,Giancarlo Bodoni,POINTS,,6,2,,ABS,M,4F,2022
5,35100,7507,Nicholas Meregali,12003,Tye Ruotolo,DECISION,,-1,-1,,ABS,M,SF,2022
6,21816,12110,Nick Rodriguez,5842,Mahamed Aly,DECISION,,-1,-1,,+99KG,M,R1,2019
7,21822,12110,Nick Rodriguez,2452,Orlando Sanchez,POINTS,,0,0,PEN,+99KG,M,4F,2019
8,21883,12110,Nick Rodriguez,224,Roberto Abreu,DECISION,,-1,-1,,+99KG,M,SF,2019
9,35071,12110,Nick Rodriguez,2416,Felipe Pena,POINTS,,3,0,,+99KG,M,SF,2022


The match_id, winner_id, loser_id, winner_name and loser_name columns will be dropped from the dataframe, as all matches are given equal weight (i.e. the superstars' matches are treated the same as those of lesser-known athletes).

In [4]:
df.drop(['match_id','winner_id', 'loser_id', 'winner_name','loser_name'], axis=1, inplace=True)
df.head(15)

Unnamed: 0,win_type,submission,winner_points,loser_points,adv_pen,weight_class,sex,stage,year
0,DECISION,,-1,-1,,ABS,M,4F,2011
1,SUBMISSION,Kimura,-1,-1,,99KG,M,R1,2022
2,DECISION,,-1,-1,,99KG,M,4F,2022
3,POINTS,,0,0,PEN,99KG,M,3RD,2022
4,POINTS,,6,2,,ABS,M,4F,2022
5,DECISION,,-1,-1,,ABS,M,SF,2022
6,DECISION,,-1,-1,,+99KG,M,R1,2019
7,POINTS,,0,0,PEN,+99KG,M,4F,2019
8,DECISION,,-1,-1,,+99KG,M,SF,2019
9,POINTS,,3,0,,+99KG,M,SF,2022


Athletes can normally win in 3 different ways: by submission, by points obtained from getting into advantageous positions, or by advantages/penalties given at the referee's discretion.

We will attempt to see which type of victory is most common.

First, we need to know how many entries we have.

In [5]:
df.shape

(1028, 9)

We have a total of **1028 entries**. The next step is to check for missing or invalid data.

In [6]:
df.isnull().sum()

win_type           0
submission       628
winner_points      0
loser_points       0
adv_pen          999
weight_class       0
sex                0
stage              0
year               0
dtype: int64

There are many missing values in the 'submission' column, which records the type of submission applied to win the match, and in the 'adv_pen' column, which records whether the match was won by an advantage or penalty given by the referee. This is not necessarily an error in the data: the 'submission' column stays blank if the match was won by points or advantage/penalty, and the 'adv_pen' column is only filled in if the athlete won via advantage/penalty.
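
One way to test this explanation is to cross-tabulate the win type against whether 'submission' is missing. The sketch below uses a small made-up dataframe (`toy`) rather than the ADCC data, and the "missing"/"recorded" labels are names chosen purely for illustration.

```python
import pandas as pd

# Toy stand-in for the ADCC dataframe (illustrative values, not real matches)
toy = pd.DataFrame({
    "win_type": ["POINTS", "SUBMISSION", "DECISION", "SUBMISSION", "POINTS"],
    "submission": [None, "Armbar", None, None, None],
})

# Label each row by whether its 'submission' value is missing
status = toy["submission"].isna().map({True: "missing", False: "recorded"})

# Rows: win type; columns: missing vs recorded submission
missing_by_win_type = pd.crosstab(toy["win_type"], status)
print(missing_by_win_type)
```

If the explanation holds, non-submission win types should only have counts in the "missing" column, and any "missing" count in the SUBMISSION row would point to records worth inspecting.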

In [7]:
df.value_counts('submission').sum()

400

The number of null values in the 'submission' column (628) plus the number of submissions recorded (400) equals the total of **1028** entries.

In [8]:
df.value_counts('win_type')

win_type
POINTS              520
SUBMISSION          402
DECISION             97
INJURY                8
DESQUALIFICATION      1
dtype: int64
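
To compare the win types more directly, these counts can be converted into shares of all matches. This is a small sketch that retypes the counts from the output above into a Series; `win_counts` and `win_share` are illustrative names, not part of the original notebook.

```python
import pandas as pd

# Counts copied from the value_counts('win_type') output above
win_counts = pd.Series(
    {"POINTS": 520, "SUBMISSION": 402, "DECISION": 97, "INJURY": 8, "DESQUALIFICATION": 1},
    name="matches",
)

# Share of all matches for each win type, as a percentage rounded to one decimal
win_share = (win_counts / win_counts.sum() * 100).round(1)
print(win_share)
```

On the dataframe itself, `df['win_type'].value_counts(normalize=True)` should give the same proportions expressed as fractions.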

However, value_counts() for the win types reveals 402 wins by submission instead. The type of submission was not recorded for 2 matches that were won by submission.

In [9]:
df[(df['win_type']=='SUBMISSION') & df['submission'].isna()]

Unnamed: 0,win_type,submission,winner_points,loser_points,adv_pen,weight_class,sex,stage,year
515,SUBMISSION,,-1,-1,,+60KG,F,F,2013
883,SUBMISSION,,-1,-1,,99KG,M,R1,2015


We have found the 2 anomalous records. Since we cannot be sure of the actual data for these 2 matches, we will drop them from the dataframe. The loss of **2** out of **1028** matches is not expected to have a significant effect on the visualization.

In [10]:
df.drop(index=[515,883],inplace=True)
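
Hard-coded index labels work here, but the same rows can also be selected by condition, so that the drop does not depend on specific labels. The following is a minimal sketch on a made-up dataframe; `toy`, `unrecorded` and `cleaned` are hypothetical names used only for this example.

```python
import pandas as pd

# Toy stand-in for the ADCC dataframe (illustrative values, not real matches)
toy = pd.DataFrame({
    "win_type": ["POINTS", "SUBMISSION", "SUBMISSION", "DECISION"],
    "submission": [None, "Armbar", None, None],
})

# True for submission wins whose technique was never recorded
unrecorded = (toy["win_type"] == "SUBMISSION") & toy["submission"].isna()

# Keep every other row; the original index labels are preserved
cleaned = toy[~unrecorded]
print(len(toy), "->", len(cleaned))
```

This version builds a new dataframe rather than modifying the original in place, which can make it easier to re-run the cell without errors about labels that were already dropped.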

Next, we will attempt to group the submissions into families of techniques based on which part of the body they attack: the neck, the arms or the legs.

In [11]:
df.groupby('submission').count()

submission,win_type,winner_points,loser_points,adv_pen,weight_class,sex,stage,year
Americana,3,3,3,0,3,3,3,3
Anaconda,1,1,1,0,1,1,1,1
Armbar,68,68,68,0,68,68,68,68
Calf slicer,2,2,2,0,2,2,2,2
Choke,11,11,11,0,11,11,11,11
Cross face,1,1,1,0,1,1,1,1
D'arce choke,6,6,6,0,6,6,6,6
Dogbar,1,1,1,0,1,1,1,1
Estima lock,1,1,1,0,1,1,1,1
Ezekiel,1,1,1,0,1,1,1,1


A look at the varieties of submissions performed at ADCC revealed two new issues:

1. Heel hooks are split across 3 categories: "Heel hook", "Inside heel hook" and "Outside heel hook". Every heel hook is either an inside or an outside heel hook, so the generic "Heel hook" label overlaps with the other two.

![Inside heel hook](https://bjj.tv/wp-content/uploads/2020/08/Inside-Heel-Hook.jpg "Inside heel hook")

*Inside heel hooks turn the heel towards the centre of the body*

![Outside heel hook](https://grapplinginsider.com/wp-content/uploads/2021/03/Dean-Lister.jpg "Outside heel hook")

*Outside heel hooks turn the heel towards the outside of the body*

2. Entries such as "Submission" or "Verbal tap" do not indicate which submission was performed, so they do not clarify which types of submission are favoured in ADCC.

We will do the following to solve these two problems:

1. Inside heel hooks and outside heel hooks will be classified as heel hooks and merged with the "Heel hook" entry.

2. Entries such as "Submission" or "Verbal tap" will be dropped.


In [19]:
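# Issue 1: merge inside and outside heel hooks into the generic "Heel hook" label
df['submission'] = df['submission'].replace(['Inside heel hook', 'Outside heel hook'], 'Heel hook')
# Issue 2: drop entries that do not name a specific technique
uninformative = df['submission'].isin(['Submission', 'Verbal tap'])
df.drop(index=df[uninformative].index, inplace=True)
# Re-check the submission categories after cleaning
df.groupby('submission').count()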

submission,win_type,winner_points,loser_points,adv_pen,weight_class,sex,stage,year
Americana,3,3,3,0,3,3,3,3
Anaconda,1,1,1,0,1,1,1,1
Armbar,68,68,68,0,68,68,68,68
Calf slicer,2,2,2,0,2,2,2,2
Choke,11,11,11,0,11,11,11,11
Cross face,1,1,1,0,1,1,1,1
D'arce choke,6,6,6,0,6,6,6,6
Dogbar,1,1,1,0,1,1,1,1
Estima lock,1,1,1,0,1,1,1,1
Ezekiel,1,1,1,0,1,1,1,1
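
With the labels cleaned, the grouping by body part mentioned earlier could be sketched with a lookup table. The mapping below is a hypothetical illustration: the family assigned to each technique is an assumption made for this example, it covers only a handful of techniques, and `FAMILY_BY_SUBMISSION` and the `family` column are names invented here.

```python
import pandas as pd

# Hypothetical mapping for illustration; these assignments are assumptions
FAMILY_BY_SUBMISSION = {
    "Armbar": "arms",
    "Americana": "arms",
    "Kimura": "arms",
    "Choke": "neck",
    "D'arce choke": "neck",
    "Anaconda": "neck",
    "Ezekiel": "neck",
    "Heel hook": "legs",
    "Calf slicer": "legs",
}

# Toy stand-in for the cleaned 'submission' column
toy = pd.DataFrame({"submission": ["Armbar", "Heel hook", "Choke", "Kimura", "Dogbar"]})

# Techniques missing from the mapping are labelled 'unclassified' rather than dropped
toy["family"] = toy["submission"].map(FAMILY_BY_SUBMISSION).fillna("unclassified")
family_counts = toy["family"].value_counts()
print(family_counts)
```

Keeping an explicit "unclassified" label makes it easy to spot techniques that still need to be assigned to a family before the counts are interpreted.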
