In [8]:
import pandas as pd 
import numpy as np 
import seaborn as sns
import plotly as plt
 

### Background info

**Brazilian Jiu-Jitsu (BJJ)** is a martial art that focuses on wrestling or throwing an opponent to the ground. While on the ground, the athlete aims to achieve a submission victory. A submission victory occurs when an athlete puts his opponent into a joint lock or a choke, and the submitted opponent signals defeat by *tapping on his opponent or the mat.*

[![BJJ match](https://upload.wikimedia.org/wikipedia/commons/2/22/GABRIEL_VELLA_vs_ROMINHO_51.jpg "The athlete on the bottom has caught his opponent in a choke, click for more info")](https://en.wikipedia.org/wiki/Brazilian_jiu-jitsu)

This analysis aims to find the most common types of submissions that occur in one of the most popular BJJ competitions, the *Abu Dhabi Combat Club Submission Fighting World Championship (ADCC).*

It should be useful to anyone practicing BJJ, as it shows which submissions succeed most often in the sport. Practitioners can then focus their efforts on learning these submissions.

A separate app was built to present the data in an easy-to-understand and engaging way.

### Data source
The raw CSV file was obtained from Kaggle:

https://www.kaggle.com/datasets/bjagrelli/adcc-historical-dataset

In [14]:
df = pd.read_csv(r'C:\Users\lizhi\OneDrive\Desktop\Portfolio\ADCC data viz\adcc_historical_data.csv', sep = ';')

The dataframe should first be checked to ensure that it read the CSV file correctly.

In [15]:
df.head(15)

Unnamed: 0,match_id,winner_id,winner_name,loser_id,loser_name,win_type,submission,winner_points,loser_points,adv_pen,weight_class,sex,stage,year
0,3314,484,Murilo Santana,733,Vinicius Magalhaes,DECISION,,-1,-1,,ABS,M,4F,2011
1,35049,7507,Nicholas Meregali,9554,Henrique Cardoso,SUBMISSION,Kimura,-1,-1,,99KG,M,R1,2022
2,35053,7507,Nicholas Meregali,1740,Yuri Simoes,DECISION,,-1,-1,,99KG,M,4F,2022
3,35057,7507,Nicholas Meregali,576,Rafael Lovato Jr,POINTS,,0,0,PEN,99KG,M,3RD,2022
4,35096,7507,Nicholas Meregali,11797,Giancarlo Bodoni,POINTS,,6,2,,ABS,M,4F,2022
5,35100,7507,Nicholas Meregali,12003,Tye Ruotolo,DECISION,,-1,-1,,ABS,M,SF,2022
6,21816,12110,Nick Rodriguez,5842,Mahamed Aly,DECISION,,-1,-1,,+99KG,M,R1,2019
7,21822,12110,Nick Rodriguez,2452,Orlando Sanchez,POINTS,,0,0,PEN,+99KG,M,4F,2019
8,21883,12110,Nick Rodriguez,224,Roberto Abreu,DECISION,,-1,-1,,+99KG,M,SF,2019
9,35071,12110,Nick Rodriguez,2416,Felipe Pena,POINTS,,3,0,,+99KG,M,SF,2022
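
Beyond eyeballing the first rows, a quick programmatic check can confirm that the delimiter was parsed as intended. The sketch below is only an illustration: it uses an in-memory string (`raw`, a hypothetical two-row excerpt with a shortened column list) in place of the real file and compares the column count with and without `sep=';'`.

```python
import io
import pandas as pd

# Hypothetical excerpt in the same semicolon-separated layout as the ADCC file;
# only a few of the real columns are included to keep the example short.
raw = (
    "match_id;winner_name;win_type;year\n"
    "3314;Murilo Santana;DECISION;2011\n"
    "35049;Nicholas Meregali;SUBMISSION;2022\n"
)

# Parsed with the correct delimiter: one column per field.
right = pd.read_csv(io.StringIO(raw), sep=";")

# Parsed with the default comma delimiter: every line collapses into one column.
wrong = pd.read_csv(io.StringIO(raw))

print(right.shape)
print(wrong.shape)
```

A dataframe with a single column after loading is usually a sign that the wrong separator was used.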


The match_id, winner_id, loser_id, winner_name and loser_name columns are dropped from the dataframe because all matches are weighted equally (i.e. a superstar's match is treated the same as a lesser-known athlete's).

In [16]:
df.drop(['match_id','winner_id', 'loser_id', 'winner_name','loser_name'], axis=1, inplace=True)
df.head(15)

Unnamed: 0,win_type,submission,winner_points,loser_points,adv_pen,weight_class,sex,stage,year
0,DECISION,,-1,-1,,ABS,M,4F,2011
1,SUBMISSION,Kimura,-1,-1,,99KG,M,R1,2022
2,DECISION,,-1,-1,,99KG,M,4F,2022
3,POINTS,,0,0,PEN,99KG,M,3RD,2022
4,POINTS,,6,2,,ABS,M,4F,2022
5,DECISION,,-1,-1,,ABS,M,SF,2022
6,DECISION,,-1,-1,,+99KG,M,R1,2019
7,POINTS,,0,0,PEN,+99KG,M,4F,2019
8,DECISION,,-1,-1,,+99KG,M,SF,2019
9,POINTS,,3,0,,+99KG,M,SF,2022
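
As an aside, the same drop can be written without `inplace=True`, which leaves the original dataframe untouched so the cell can be re-run safely. This is a minimal sketch on a two-row sample (`sample`, built from the first two rows shown earlier with a reduced set of columns), not a replacement for the cell above; the names `sample`, `id_cols` and `trimmed` are illustrative.

```python
import pandas as pd

# Two rows taken from the head() output, with only some of the columns kept.
sample = pd.DataFrame({
    "match_id": [3314, 35049],
    "winner_id": [484, 7507],
    "winner_name": ["Murilo Santana", "Nicholas Meregali"],
    "loser_id": [733, 9554],
    "loser_name": ["Vinicius Magalhaes", "Henrique Cardoso"],
    "win_type": ["DECISION", "SUBMISSION"],
    "submission": [None, "Kimura"],
    "year": [2011, 2022],
})

# Identifier columns that carry no information for this analysis.
id_cols = ["match_id", "winner_id", "loser_id", "winner_name", "loser_name"]

# drop() returns a new dataframe; `sample` itself is not modified.
trimmed = sample.drop(columns=id_cols)
print(trimmed.columns.tolist())
```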


Athletes can normally win in three ways: by submission, by points obtained from getting into advantageous positions, or by advantages/penalties given at the referee's discretion.

We will attempt to see which type of victory is most common.

First, we need to know how many entries we have.

In [21]:
df.shape

(1028, 9)

We have a total of **1028 entries**. The next step is to check for missing or invalid data.

In [22]:
df.isnull().sum()

win_type           0
submission       628
winner_points      0
loser_points       0
adv_pen          999
weight_class       0
sex                0
stage              0
year               0
dtype: int64

There are many missing values in two columns: 'submission', which records the type of submission used to win the match, and 'adv_pen', which records whether the match was won by an advantage or penalty given by the referee. These gaps are not necessarily errors in the data: 'submission' is expected to be blank whenever the match was not won by submission, and 'adv_pen' is only filled in when the athlete won via advantage/penalty.
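
One way to test this reasoning is to count the missing 'submission' values per win type. The sketch below assumes a small made-up dataframe (`toy`, with illustrative values rather than real matches) that mimics the relevant columns; on the real data the same two expressions could be applied to `df`.

```python
import pandas as pd

# Toy stand-in for the ADCC dataframe (illustrative values, not real matches).
toy = pd.DataFrame({
    "win_type": ["SUBMISSION", "POINTS", "DECISION", "POINTS", "SUBMISSION"],
    "submission": ["Kimura", None, None, None, None],
    "adv_pen": [None, "PEN", None, None, None],
})

# Number of missing 'submission' values within each win type.
missing_by_type = toy["submission"].isnull().groupby(toy["win_type"]).sum()
print(missing_by_type)

# Submission wins with no recorded technique are the only gaps that
# cannot be explained by the structure of the data.
unexplained = toy[(toy["win_type"] == "SUBMISSION") & toy["submission"].isnull()]
print(len(unexplained))
```

If nearly all missing values fall under non-submission win types, the blanks are structural rather than a data-quality problem.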

In [27]:
df.value_counts('submission').sum()

400

In [28]:
df.value_counts('win_type')

win_type
POINTS              520
SUBMISSION          402
DECISION             97
INJURY                8
DESQUALIFICATION      1
dtype: int64
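
The two outputs above can be put side by side: the 'submission' column has 400 non-null entries, while 402 matches are labelled SUBMISSION. The sketch below re-enters the printed counts by hand (rather than reading the file) to express each win type as a share of all matches and to quantify that gap; the variable names are illustrative.

```python
import pandas as pd

# Counts copied from the value_counts('win_type') output.
win_counts = pd.Series(
    {
        "POINTS": 520,
        "SUBMISSION": 402,
        "DECISION": 97,
        "INJURY": 8,
        "DESQUALIFICATION": 1,
    },
    name="matches",
)

# Share of each win type, in percent of all matches.
total = win_counts.sum()
shares = (win_counts / total * 100).round(1)
print(shares)

# Recorded techniques = total rows minus missing 'submission' values.
recorded = 1028 - 628
gap = win_counts["SUBMISSION"] - recorded
print(gap)
```

Points wins account for roughly half of all matches and submissions for a little under 40%. The gap of 2 suggests that two submission wins have no technique recorded, which may be worth inspecting before ranking submission types.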

We would expect victories by points to be more common as it is more difficult to catch your opponent in a submission rather than 