# MMA Fight Visualization

In this notebook, we will do some data visualization using Python. We will focus on visualizing the locations of UFC fights from 1993 to June 2019 using geographic heatmaps. We will also pick one of the fighters with the most fights during this period and map out where his fights took place. [Data can be found here.](https://www.kaggle.com/rajeevw/ufcdata)

### Import Libraries

In [17]:
import pandas as pd
import gmplot
import numpy as np
from IPython.display import IFrame

### Read Data

In [2]:
data = pd.read_csv('data.csv')
data.head()

Unnamed: 0,R_fighter,B_fighter,Referee,date,location,Winner,title_bout,weight_class,no_of_rounds,B_current_lose_streak,...,R_win_by_KO/TKO,R_win_by_Submission,R_win_by_TKO_Doctor_Stoppage,R_wins,R_Stance,R_Height_cms,R_Reach_cms,R_Weight_lbs,B_age,R_age
0,Henry Cejudo,Marlon Moraes,Marc Goddard,2019-06-08,"Chicago, Illinois, USA",Red,True,Bantamweight,5,0.0,...,2.0,0.0,0.0,8.0,Orthodox,162.56,162.56,135.0,31.0,32.0
1,Valentina Shevchenko,Jessica Eye,Robert Madrigal,2019-06-08,"Chicago, Illinois, USA",Red,True,Women's Flyweight,5,0.0,...,0.0,2.0,0.0,5.0,Southpaw,165.1,167.64,125.0,32.0,31.0
2,Tony Ferguson,Donald Cerrone,Dan Miragliotta,2019-06-08,"Chicago, Illinois, USA",Red,False,Lightweight,3,0.0,...,3.0,6.0,1.0,14.0,Orthodox,180.34,193.04,155.0,36.0,35.0
3,Jimmie Rivera,Petr Yan,Kevin MacDonald,2019-06-08,"Chicago, Illinois, USA",Blue,False,Bantamweight,3,0.0,...,1.0,0.0,0.0,6.0,Orthodox,162.56,172.72,135.0,26.0,29.0
4,Tai Tuivasa,Blagoy Ivanov,Dan Miragliotta,2019-06-08,"Chicago, Illinois, USA",Blue,False,Heavyweight,3,0.0,...,2.0,0.0,0.0,3.0,Southpaw,187.96,190.5,264.0,32.0,26.0


It looks like the dataset contains both fighters involved in each fight, located in the columns __R_fighter__ and __B_fighter__. Before we build the heatmaps, we need the number of fights for each fighter so we can pick one to focus on. We'll do this by using a groupby on both columns, which gives the number of times a fighter was on either side (Red or Blue). Then, we will add both columns to get the total number of fights.

In [3]:
# Group By
r_fight = data.groupby('R_fighter').size().to_frame().reset_index()
b_fight = data.groupby('B_fighter').size().to_frame().reset_index()

r_fight.rename(columns={0:'Num_fight'}, inplace=True)
b_fight.rename(columns={0:'Num_fight'}, inplace=True)

# Join Two Columns
df_fight_count = r_fight.merge(b_fight, left_on='R_fighter', right_on='B_fighter', how='left')
df_fight_count.drop(columns='B_fighter', inplace=True)
df_fight_count.fillna(0, inplace=True)
df_fight_count['Num_fight_y'] = df_fight_count['Num_fight_y'].astype(int)
df_fight_count.head()

Unnamed: 0,R_fighter,Num_fight_x,Num_fight_y
0,Aaron Phillips,1,1
1,Aaron Riley,4,5
2,Aaron Rosa,1,2
3,Aaron Simpson,8,3
4,Abdul Razak Alhassan,2,3


In [4]:
# Add Two Columns for Total Fight Count
df_fight_count['Total_fight'] = df_fight_count['Num_fight_x'] + df_fight_count['Num_fight_y']
df_fight_count.head()

Unnamed: 0,R_fighter,Num_fight_x,Num_fight_y,Total_fight
0,Aaron Phillips,1,1,2
1,Aaron Riley,4,5,9
2,Aaron Rosa,1,2,3
3,Aaron Simpson,8,3,11
4,Abdul Razak Alhassan,2,3,5
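One thing to keep in mind is that the left merge above keeps only fighters who appear at least once in the red corner. As a hedged alternative, here is a minimal sketch that stacks both corner columns and counts names in a single pass. The toy DataFrame and the `total_fights` helper are assumptions made for illustration and are not part of the original notebook.

```python
import pandas as pd

def total_fights(fights: pd.DataFrame) -> pd.Series:
    """Count appearances per fighter across both corners."""
    # Stack the red and blue corner columns into one Series of names
    names = pd.concat([fights['R_fighter'], fights['B_fighter']], ignore_index=True)
    # value_counts() returns the counts sorted in descending order
    return names.value_counts()

# Toy data (hypothetical): fighter C only ever appears in the blue corner
toy = pd.DataFrame({
    'R_fighter': ['A', 'A', 'B'],
    'B_fighter': ['B', 'C', 'A'],
})
counts = total_fights(toy)
print(counts)
```

With the real dataset, `total_fights(data).head(10)` should give a comparable top-ten view without the intermediate merge.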


Now that we have the total fight counts for each fighter, we need to see which fighters have the most fights during this time period. We'll do this by using sort_values().

In [5]:
df_fight_count.sort_values('Total_fight', ascending=False).head(10)

Unnamed: 0,R_fighter,Num_fight_x,Num_fight_y,Total_fight
343,Donald Cerrone,22,10,32
579,Jim Miller,23,9,32
562,Jeremy Stephens,11,19,30
880,Michael Bisping,21,8,29
337,Diego Sanchez,21,8,29
321,Demian Maia,21,8,29
69,Andrei Arlovski,19,10,29
450,Gleison Tibau,16,12,28
1022,Rafael Dos Anjos,14,13,27
229,Clay Guida,17,10,27


It looks like Donald Cerrone and Jim Miller are tied for the most fights during this time period, with 32 each. To keep things simple, we will focus on Donald Cerrone for this analysis and visualization.
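Sorting shows the tie, but only if we read the table by eye. As a small sketch (using toy values copied from the table above, not the full dataset), pandas' `nlargest` with `keep='all'` returns every row that shares the top value:

```python
import pandas as pd

# Toy frame with the top three rows from the table above
toy = pd.DataFrame({
    'R_fighter': ['Donald Cerrone', 'Jim Miller', 'Jeremy Stephens'],
    'Total_fight': [32, 32, 30],
})

# keep='all' keeps every row tied for the largest value instead of just one
most_active = toy.nlargest(1, 'Total_fight', keep='all')
print(most_active)
```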

Now that we have figured out the fighter we will focus on, we need to get a unique list of locations in the dataset. Once we do this, we need to get latitudes and longitudes for each location in order to map them.

In [6]:
# Get a list of unique locations in the dataset
locations = list(data['location'].unique())
len(locations)

157

We have 157 unique locations where UFC fights occurred between 1993 and 2019. Now, using the geopy library, we will get the latitude and longitude associated with each. Let's see what this looks like with the location "London, UK".

In [7]:
from geopy.geocoders import Nominatim
geolocator = Nominatim(user_agent='loc_locator')
city = "London"
country = "UK"
loc = geolocator.geocode(city + ',' + country)
print("latitude is :-", loc.latitude, "\nlongitude is:-", loc.longitude)

latitude is :- 51.5073219 
longitude is:- -0.1276474


Now that we know how to get latitudes and longitudes, we will use a for loop to iterate through the list of unique locations in the dataset. Since some locations might not be found by the library, we will wrap each lookup in a try/except block and store an error string instead, which makes it easy to identify locations that do not have latitudes and longitudes.

In [8]:
# Create a for loop to get lat long of each location
latitudes = []
longitudes = []

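for location in locations:
    try:
        # geocode() returns None when nothing is found, so the attribute
        # access below raises and ends up in the except branch as well
        loc = geolocator.geocode(location)
        latitudes.append(loc.latitude)
        longitudes.append(loc.longitude)
    except Exception:
        # Store a marker string so failed lookups are easy to spot later
        latitudes.append('Error finding: {}'.format(location))
        longitudes.append('Error finding: {}'.format(location))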
    
len(latitudes)

157

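The latitudes list has the same length as our list of unique locations. That alone doesn't prove every lookup succeeded, because the loop also appends an error string on failure, so let's verify that we didn't have any errors when pulling the lat longs.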

In [10]:
for x, y in zip(latitudes, longitudes):
    if 'Error finding' in str(x): 
        print(x)
    elif 'Error finding' in str(y):
        print(y)

The check printed nothing, so we didn't have any issues pulling lat longs using geopy. Let's turn the two lists into a dictionary keyed by location so we can map the values into our dataframe.

In [11]:
# Turn lists into dictionary
loc_lat_long = dict(zip(locations, zip(latitudes, longitudes)))
list(loc_lat_long.keys())[0]
list(loc_lat_long.values())[0]

(41.8755616, -87.6244212)
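The lookup loop, the error check and the dictionary construction above can also be folded into one helper. This is only a sketch under stated assumptions: `geocode_all`, `FakeResult` and `fake_geocode` are hypothetical names, and the fake geocoder stands in for `geolocator.geocode` so the example runs without a network connection.

```python
from typing import Callable, Dict, List, Optional, Tuple

Coord = Tuple[float, float]

def geocode_all(
    locations: List[str],
    geocode: Callable[[str], Optional[object]],
) -> Tuple[Dict[str, Coord], List[str]]:
    """Return ({location: (lat, long)}, [locations that could not be resolved])."""
    found: Dict[str, Coord] = {}
    failed: List[str] = []
    for location in locations:
        try:
            result = geocode(location)
        except Exception:
            # Treat timeouts and other lookup errors as "not found"
            result = None
        if result is None:
            failed.append(location)
        else:
            found[location] = (result.latitude, result.longitude)
    return found, failed


class FakeResult:
    """Hypothetical stand-in for a geopy result (has latitude and longitude)."""
    def __init__(self, latitude: float, longitude: float):
        self.latitude = latitude
        self.longitude = longitude

def fake_geocode(query: str) -> Optional[FakeResult]:
    # Offline stand-in; in the notebook we would pass geolocator.geocode instead
    known = {'Chicago, Illinois, USA': FakeResult(41.8755616, -87.6244212)}
    return known.get(query)

found, failed = geocode_all(['Chicago, Illinois, USA', 'Nowhere, Atlantis'], fake_geocode)
print(found)
print(failed)
```

Keeping failures in their own list means the coordinate dictionary only ever holds numeric values. When calling the public Nominatim service for many locations, geopy's `geopy.extra.rate_limiter.RateLimiter` can wrap `geolocator.geocode` to space out the requests.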

In [12]:
# Map lat longs to location
data['lat_long'] = data['location'].map(loc_lat_long)
data.head()

Unnamed: 0,R_fighter,B_fighter,Referee,date,location,Winner,title_bout,weight_class,no_of_rounds,B_current_lose_streak,...,R_win_by_Submission,R_win_by_TKO_Doctor_Stoppage,R_wins,R_Stance,R_Height_cms,R_Reach_cms,R_Weight_lbs,B_age,R_age,lat_long
0,Henry Cejudo,Marlon Moraes,Marc Goddard,2019-06-08,"Chicago, Illinois, USA",Red,True,Bantamweight,5,0.0,...,0.0,0.0,8.0,Orthodox,162.56,162.56,135.0,31.0,32.0,"(41.8755616, -87.6244212)"
1,Valentina Shevchenko,Jessica Eye,Robert Madrigal,2019-06-08,"Chicago, Illinois, USA",Red,True,Women's Flyweight,5,0.0,...,2.0,0.0,5.0,Southpaw,165.1,167.64,125.0,32.0,31.0,"(41.8755616, -87.6244212)"
2,Tony Ferguson,Donald Cerrone,Dan Miragliotta,2019-06-08,"Chicago, Illinois, USA",Red,False,Lightweight,3,0.0,...,6.0,1.0,14.0,Orthodox,180.34,193.04,155.0,36.0,35.0,"(41.8755616, -87.6244212)"
3,Jimmie Rivera,Petr Yan,Kevin MacDonald,2019-06-08,"Chicago, Illinois, USA",Blue,False,Bantamweight,3,0.0,...,0.0,0.0,6.0,Orthodox,162.56,172.72,135.0,26.0,29.0,"(41.8755616, -87.6244212)"
4,Tai Tuivasa,Blagoy Ivanov,Dan Miragliotta,2019-06-08,"Chicago, Illinois, USA",Blue,False,Heavyweight,3,0.0,...,0.0,0.0,3.0,Southpaw,187.96,190.5,264.0,32.0,26.0,"(41.8755616, -87.6244212)"


Now that we have the lat longs mapped into the dataframe, let's split this column into separate latitude and longitude columns.

In [13]:
# Split column into two
data.loc[:, 'lat'] = data['lat_long'].map(lambda x: x[0])
data.loc[:, 'long'] = data['lat_long'].map(lambda x: x[1])
data.head()

Unnamed: 0,R_fighter,B_fighter,Referee,date,location,Winner,title_bout,weight_class,no_of_rounds,B_current_lose_streak,...,R_wins,R_Stance,R_Height_cms,R_Reach_cms,R_Weight_lbs,B_age,R_age,lat_long,lat,long
0,Henry Cejudo,Marlon Moraes,Marc Goddard,2019-06-08,"Chicago, Illinois, USA",Red,True,Bantamweight,5,0.0,...,8.0,Orthodox,162.56,162.56,135.0,31.0,32.0,"(41.8755616, -87.6244212)",41.875562,-87.624421
1,Valentina Shevchenko,Jessica Eye,Robert Madrigal,2019-06-08,"Chicago, Illinois, USA",Red,True,Women's Flyweight,5,0.0,...,5.0,Southpaw,165.1,167.64,125.0,32.0,31.0,"(41.8755616, -87.6244212)",41.875562,-87.624421
2,Tony Ferguson,Donald Cerrone,Dan Miragliotta,2019-06-08,"Chicago, Illinois, USA",Red,False,Lightweight,3,0.0,...,14.0,Orthodox,180.34,193.04,155.0,36.0,35.0,"(41.8755616, -87.6244212)",41.875562,-87.624421
3,Jimmie Rivera,Petr Yan,Kevin MacDonald,2019-06-08,"Chicago, Illinois, USA",Blue,False,Bantamweight,3,0.0,...,6.0,Orthodox,162.56,172.72,135.0,26.0,29.0,"(41.8755616, -87.6244212)",41.875562,-87.624421
4,Tai Tuivasa,Blagoy Ivanov,Dan Miragliotta,2019-06-08,"Chicago, Illinois, USA",Blue,False,Heavyweight,3,0.0,...,3.0,Southpaw,187.96,190.5,264.0,32.0,26.0,"(41.8755616, -87.6244212)",41.875562,-87.624421
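As an alternative to the two `map` calls, the tuple column can be expanded into both columns in one step. This is a minimal sketch on a toy DataFrame (the two rows are copied from values shown in this notebook), not a change to the cell above.

```python
import pandas as pd

toy = pd.DataFrame({
    'location': ['Chicago, Illinois, USA', 'Kallang, Singapore'],
    'lat_long': [(41.8755616, -87.6244212), (1.310759, 103.866262)],
})

# Build a two-column frame from the tuples, aligned on the original index
coords = pd.DataFrame(toy['lat_long'].tolist(), index=toy.index, columns=['lat', 'long'])
toy = toy.join(coords)
print(toy[['location', 'lat', 'long']])
```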


### GMPLOT
Now that we have latitudes and longitudes for each fight, we are ready to map the fights. We will use [gmplot](https://pypi.org/project/gmplot/), which is a matplotlib-like interface for plotting data on a Google Map. In order to do this, we need to get our latitudes and longitudes into lists. Then we will pass those lists into a heatmap.

In [20]:
# Get Latitudes and Longitudes into a list for mapping
all_lat = data['lat'].tolist()
all_long = data['long'].tolist()

gmap = gmplot.GoogleMapPlotter(0, 0, 2)

gmap.heatmap(all_lat, all_long)
gmap.scatter(all_lat, all_long, c='r', marker=True)

gmap.draw("all_fight.html")

In [18]:
# Display the map that we created
IFrame(src='all_fight.html', width=900, height=600)

Now that we have created a heatmap for all UFC fights, let's make one for Donald Cerrone, one of the fighters with the most fights between 1993 and 2019. First, we need to filter our dataframe for fights that include Cerrone. Remember, there are two fighters involved in every fight, so we will need to search both columns for Cerrone.

In [19]:
# Get cerrone only
cerrone = data.loc[(data['R_fighter'] == 'Donald Cerrone') | (data['B_fighter'] == 'Donald Cerrone')]
cerrone.head()

Unnamed: 0,R_fighter,B_fighter,Referee,date,location,Winner,title_bout,weight_class,no_of_rounds,B_current_lose_streak,...,R_wins,R_Stance,R_Height_cms,R_Reach_cms,R_Weight_lbs,B_age,R_age,lat_long,lat,long
2,Tony Ferguson,Donald Cerrone,Dan Miragliotta,2019-06-08,"Chicago, Illinois, USA",Red,False,Lightweight,3,0.0,...,14.0,Orthodox,180.34,193.04,155.0,36.0,35.0,"(41.8755616, -87.6244212)",41.875562,-87.624421
50,Al Iaquinta,Donald Cerrone,Jerin Valel,2019-05-04,"Ottawa, Ontario, Canada",Blue,False,Lightweight,5,0.0,...,9.0,Orthodox,177.8,177.8,155.0,36.0,32.0,"(45.421106, -75.690308)",45.421106,-75.690308
216,Alexander Hernandez,Donald Cerrone,Todd Ronald Anderson,2019-01-19,"Brooklyn, New York, USA",Blue,False,Lightweight,3,0.0,...,2.0,Orthodox,175.26,182.88,155.0,35.0,26.0,"(40.6501038, -73.9495823)",40.650104,-73.949582
311,Donald Cerrone,Mike Perry,Keith Peterson,2018-11-10,"Denver, Colorado, USA",Red,False,Welterweight,3,0.0,...,20.0,Orthodox,185.42,185.42,155.0,27.0,35.0,"(39.7392364, -104.9848623)",39.739236,-104.984862
485,Donald Cerrone,Leon Edwards,Leon Roberts,2018-06-23,"Kallang, Singapore",Blue,False,Welterweight,5,0.0,...,20.0,Orthodox,185.42,185.42,155.0,26.0,35.0,"(1.310759, 103.866262)",1.310759,103.866262
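If we wanted to repeat this filter for other fighters, it could be wrapped in a small helper. The sketch below is an assumption for illustration (the `fights_for` name and the toy rows are not part of the original notebook); it compares both corner columns against the name at once.

```python
import pandas as pd

def fights_for(fights: pd.DataFrame, name: str) -> pd.DataFrame:
    """Return the rows where `name` appears in either corner."""
    # eq() compares both columns to the name; any(axis=1) is True if either matches
    in_either_corner = fights[['R_fighter', 'B_fighter']].eq(name).any(axis=1)
    return fights.loc[in_either_corner]

# Toy data (hypothetical subset of fights)
toy = pd.DataFrame({
    'R_fighter': ['Tony Ferguson', 'Donald Cerrone', 'Henry Cejudo'],
    'B_fighter': ['Donald Cerrone', 'Mike Perry', 'Marlon Moraes'],
})
cerrone_toy = fights_for(toy, 'Donald Cerrone')
print(cerrone_toy)
```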


Now that we've filtered our dataframe, we'll repeat the same process we did for all UFC Fights to map all of Cerrone's fights.

In [21]:
# Get a list of latitudes and longitudes
cerrone_lat = cerrone['lat'].tolist()
cerrone_long = cerrone['long'].tolist()

# Map
gmap = gmplot.GoogleMapPlotter(0, 0, 2)

gmap.heatmap(cerrone_lat, cerrone_long)
gmap.scatter(cerrone_lat, cerrone_long, c='r', marker=True)

gmap.draw("cerrone_fight.html")

In [22]:
# Display the map that we created
IFrame(src='cerrone_fight.html', width=900, height=600)

### PLOTLY

Using gmplot is just one of many ways we can use Python to create heatmaps. Next, we'll use plotly to create heatmaps for both Donald Cerrone and all UFC Fights. We'll start with visualizing Donald Cerrone first. To give a different view, we'll focus on his fights in the U.S. - since most of them happened there.

In [23]:
cerrone.groupby('location').size()

location
Anaheim, California, USA               1
Atlantic City, New Jersey, USA         1
Austin, Texas, USA                     1
Boston, Massachusetts, USA             1
Brooklyn, New York, USA                1
Chicago, Illinois, USA                 3
Denver, Colorado, USA                  3
Fairfax, Virginia, USA                 1
Gdansk, Poland                         1
Indianapolis, Indiana, USA             1
Kallang, Singapore                     1
Las Vegas, Nevada, USA                 9
Milwaukee, Wisconsin, USA              1
Orlando, Florida, USA                  2
Ottawa, Ontario, Canada                2
Pittsburgh, Pennsylvania, USA          1
Toronto, Ontario, Canada               1
Vancouver, British Columbia, Canada    1
dtype: int64

In order to pass this info to Plotly correctly, we need to get the State abbreviations for each state that Cerrone has fought in. First, we need to split the location column so we can separate city, state and country for each fight in the United States.

In [25]:
cerrone_usa = cerrone.loc[cerrone['location'].str.contains(', USA')]
cerrone_usa = cerrone_usa.groupby('location').size().to_frame().reset_index()
cerrone_usa[['city', 'state', 'country']] = cerrone_usa['location'].str.split(',', expand=True)
cerrone_usa['state'] = cerrone_usa['state'].str.lstrip()
cerrone_usa.rename(columns={0:'count'}, inplace=True)
cerrone_usa.head()

Unnamed: 0,location,count,city,state,country
0,"Anaheim, California, USA",1,Anaheim,California,USA
1,"Atlantic City, New Jersey, USA",1,Atlantic City,New Jersey,USA
2,"Austin, Texas, USA",1,Austin,Texas,USA
3,"Boston, Massachusetts, USA",1,Boston,Massachusetts,USA
4,"Brooklyn, New York, USA",1,Brooklyn,New York,USA


Next, we'll create a dictionary to map states with the appropriate two letter state abbreviation.

In [26]:
us_state_abbrev = {
    'Alabama': 'AL',
    'Alaska': 'AK',
    'Arizona': 'AZ',
    'Arkansas': 'AR',
    'California': 'CA',
    'Colorado': 'CO',
    'Connecticut': 'CT',
    'Delaware': 'DE',
    'District of Columbia': 'DC',
    'Florida': 'FL',
    'Georgia': 'GA',
    'Hawaii': 'HI',
    'Idaho': 'ID',
    'Illinois': 'IL',
    'Indiana': 'IN',
    'Iowa': 'IA',
    'Kansas': 'KS',
    'Kentucky': 'KY',
    'Louisiana': 'LA',
    'Maine': 'ME',
    'Maryland': 'MD',
    'Massachusetts': 'MA',
    'Michigan': 'MI',
    'Minnesota': 'MN',
    'Mississippi': 'MS',
    'Missouri': 'MO',
    'Montana': 'MT',
    'Nebraska': 'NE',
    'Nevada': 'NV',
    'New Hampshire': 'NH',
    'New Jersey': 'NJ',
    'New Mexico': 'NM',
    'New York': 'NY',
    'North Carolina': 'NC',
    'North Dakota': 'ND',
    'Northern Mariana Islands':'MP',
    'Ohio': 'OH',
    'Oklahoma': 'OK',
    'Oregon': 'OR',
    'Palau': 'PW',
    'Pennsylvania': 'PA',
    'Puerto Rico': 'PR',
    'Rhode Island': 'RI',
    'South Carolina': 'SC',
    'South Dakota': 'SD',
    'Tennessee': 'TN',
    'Texas': 'TX',
    'Utah': 'UT',
    'Vermont': 'VT',
    'Virgin Islands': 'VI',
    'Virginia': 'VA',
    'Washington': 'WA',
    'West Virginia': 'WV',
    'Wisconsin': 'WI',
    'Wyoming': 'WY',
}

In [27]:
# Map abbreviations for states
cerrone_usa['state_abv'] = cerrone_usa['state'].map(us_state_abbrev)
cerrone_usa.head()

Unnamed: 0,location,count,city,state,country,state_abv
0,"Anaheim, California, USA",1,Anaheim,California,USA,CA
1,"Atlantic City, New Jersey, USA",1,Atlantic City,New Jersey,USA,NJ
2,"Austin, Texas, USA",1,Austin,Texas,USA,TX
3,"Boston, Massachusetts, USA",1,Boston,Massachusetts,USA,MA
4,"Brooklyn, New York, USA",1,Brooklyn,New York,USA,NY
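Note that this table still has one row per city, while a state-level choropleth expects one value per state. Each state happens to appear only once in Cerrone's fights, but for a fighter with several cities in the same state the counts would need to be summed first. A minimal sketch, assuming a toy per-city table (the Reno row is made up for illustration):

```python
import pandas as pd

# Toy per-city counts (hypothetical values)
per_city = pd.DataFrame({
    'city': ['Las Vegas', 'Reno', 'Denver'],
    'state_abv': ['NV', 'NV', 'CO'],
    'count': [9, 1, 3],
})

# Collapse to one row per state so each state code carries a single total
per_state = per_city.groupby('state_abv', as_index=False)['count'].sum()
print(per_state)
```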


In [28]:
import plotly
import plotly.express as px
plotly.offline.init_notebook_mode(connected=True)

fig = px.choropleth(locations=cerrone_usa['state_abv'].tolist(), locationmode="USA-states", color=cerrone_usa['count'].tolist(), scope="usa")
fig.show()

It looks like Cerrone has fought most often in Nevada, which makes sense since so many UFC events are held in Las Vegas. It also makes sense that Donald Cerrone has fought in Colorado often, since he was born there. Another interesting finding is that he has fought in California only once.

Now that we have Cerrone's fights visualized, let's use Plotly to visualize all UFC fights. This time, we'll focus on all countries, not just the United States. We'll start by splitting the location column into separate columns.

In [30]:
# Split location
data_country = data.groupby('location').size().to_frame().reset_index()
data_country[['city', 'state', 'country']] = data_country['location'].str.split(',', expand=True)
data_country['state'] = data_country['state'].str.lstrip()
data_country['city'] = data_country['city'].str.lstrip()
data_country['country'] = data_country['country'].str.lstrip()
data_country.rename(columns={0:'count'}, inplace=True)
data_country.head()

Unnamed: 0,location,count,city,state,country
0,"Abu Dhabi, United Arab Emirates",18,Abu Dhabi,United Arab Emirates,
1,"Adelaide, South Australia, Australia",24,Adelaide,South Australia,Australia
2,"Albany, New York, USA",12,Albany,New York,USA
3,"Albuquerque, New Mexico, USA",11,Albuquerque,New Mexico,USA
4,"Anaheim, California, USA",72,Anaheim,California,USA


We can see that some rows have a null value in the country column. This is because some locations are listed only as City, Country and do not have a "State". To handle the null values in the country column, we'll simply shift the values one column to the right.

In [31]:
# Fix null countries
for index, row in data_country.iterrows():
    if pd.isnull(row['country']):
        country = row['state']
        state = row['city']
        data_country.at[index, 'country'] = country
        data_country.at[index, 'state'] = state
        data_country.at[index, 'city'] = np.nan
        
data_country.head()

Unnamed: 0,location,count,city,state,country
0,"Abu Dhabi, United Arab Emirates",18,,Abu Dhabi,United Arab Emirates
1,"Adelaide, South Australia, Australia",24,Adelaide,South Australia,Australia
2,"Albany, New York, USA",12,Albany,New York,USA
3,"Albuquerque, New Mexico, USA",11,Albuquerque,New Mexico,USA
4,"Anaheim, California, USA",72,Anaheim,California,USA
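Another way to handle two-part locations is to parse each string from the right, so the last part is always the country. The sketch below is an alternative under stated assumptions: `parse_location` is a hypothetical helper, and unlike the loop above it keeps the city in the city column and leaves the state empty.

```python
from typing import Optional, Tuple

def parse_location(location: str) -> Tuple[Optional[str], Optional[str], str]:
    """Split 'City, State, Country' or 'City, Country' into (city, state, country)."""
    parts = [part.strip() for part in location.split(',')]
    country = parts[-1]                             # last part is always the country
    city = parts[0] if len(parts) > 1 else None     # first part, if there is more than one
    state = parts[1] if len(parts) > 2 else None    # middle part only for 3-part locations
    return city, state, country

print(parse_location('Abu Dhabi, United Arab Emirates'))
print(parse_location('Albany, New York, USA'))
```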


Now that this is done, we have to prepare our data to be mapped using Plotly. To do this, we need to get Alpha 3 codes for each country. Luckily, we can use the pycountry library to get these Alpha 3 codes for each country. We'll use a for loop to get these codes into a dictionary, then map the codes back into the dataframe.

In [32]:
# get list of ISO 3166-1 alpha-3 codes
import pycountry

country_alpha3 = {}
for country in pycountry.countries:
    country_alpha3[country.name] = country.alpha_3
    
country_alpha3

{'Aruba': 'ABW',
 'Afghanistan': 'AFG',
 'Angola': 'AGO',
 'Anguilla': 'AIA',
 'Åland Islands': 'ALA',
 'Albania': 'ALB',
 'Andorra': 'AND',
 'United Arab Emirates': 'ARE',
 'Argentina': 'ARG',
 'Armenia': 'ARM',
 'American Samoa': 'ASM',
 'Antarctica': 'ATA',
 'French Southern Territories': 'ATF',
 'Antigua and Barbuda': 'ATG',
 'Australia': 'AUS',
 'Austria': 'AUT',
 'Azerbaijan': 'AZE',
 'Burundi': 'BDI',
 'Belgium': 'BEL',
 'Benin': 'BEN',
 'Bonaire, Sint Eustatius and Saba': 'BES',
 'Burkina Faso': 'BFA',
 'Bangladesh': 'BGD',
 'Bulgaria': 'BGR',
 'Bahrain': 'BHR',
 'Bahamas': 'BHS',
 'Bosnia and Herzegovina': 'BIH',
 'Saint Barthélemy': 'BLM',
 'Belarus': 'BLR',
 'Belize': 'BLZ',
 'Bermuda': 'BMU',
 'Bolivia, Plurinational State of': 'BOL',
 'Brazil': 'BRA',
 'Barbados': 'BRB',
 'Brunei Darussalam': 'BRN',
 'Bhutan': 'BTN',
 'Bouvet Island': 'BVT',
 'Botswana': 'BWA',
 'Central African Republic': 'CAF',
 'Canada': 'CAN',
 'Cocos (Keeling) Islands': 'CCK',
 'Switzerland': 'CHE',
 

In [33]:
# Map back into dataframe
data_country['country_alpha3'] = data_country['country'].map(country_alpha3)
data_country.head()

Unnamed: 0,location,count,city,state,country,country_alpha3
0,"Abu Dhabi, United Arab Emirates",18,,Abu Dhabi,United Arab Emirates,ARE
1,"Adelaide, South Australia, Australia",24,Adelaide,South Australia,Australia,AUS
2,"Albany, New York, USA",12,Albany,New York,USA,
3,"Albuquerque, New Mexico, USA",11,Albuquerque,New Mexico,USA,
4,"Anaheim, California, USA",72,Anaheim,California,USA,


Looks like this worked, except for the USA. This is because the key in our dictionary for USA is "United States". Let's manually replace the Alpha 3 code for USA in our dataframe.

In [34]:
for index, row in data_country.iterrows():
    if row['country'] == 'USA':
        data_country.at[index, 'country_alpha3'] = 'USA'
        
data_country.head()

Unnamed: 0,location,count,city,state,country,country_alpha3
0,"Abu Dhabi, United Arab Emirates",18,,Abu Dhabi,United Arab Emirates,ARE
1,"Adelaide, South Australia, Australia",24,Adelaide,South Australia,Australia,AUS
2,"Albany, New York, USA",12,Albany,New York,USA,USA
3,"Albuquerque, New Mexico, USA",11,Albuquerque,New Mexico,USA,USA
4,"Anaheim, California, USA",72,Anaheim,California,USA,USA


Now that USA is taken care of, let's make sure that there are no other nulls in the country_alpha3 column before we start mapping.

In [35]:
data_country.loc[data_country['country_alpha3'].isnull()]

Unnamed: 0,location,count,city,state,country,country_alpha3
93,"Moscow, Moscow, Russia",12,Moscow,Moscow,Russia,
114,"Prague, Czech Republic",13,,Prague,Czech Republic,
122,"Saint Petersburg, Saint Petersburg, Russia",11,Saint Petersburg,Saint Petersburg,Russia,
135,"Seoul, South Korea",11,,Seoul,South Korea,


Looks like Russia, Czech Republic and South Korea are countries that we need to map manually. They came up as null because the dictionary keys for these countries are slightly different from the names in our dataset (pycountry uses official ISO names such as "Russian Federation" and "Korea, Republic of"). We'll go ahead and set the Alpha 3 codes using a for loop.

In [36]:
for index, row in data_country.iterrows():
    if row['country'] == 'Russia':
        data_country.at[index, 'country_alpha3'] = 'RUS'
    elif row['country'] == 'South Korea':
        data_country.at[index, 'country_alpha3'] = 'KOR'
    elif row['country'] == 'Czech Republic':
        data_country.at[index, 'country_alpha3'] = 'CZE'

data_country.loc[data_country['country_alpha3'].isnull()]

Unnamed: 0,location,count,city,state,country,country_alpha3
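The two manual loops above could also be replaced by a small overrides dictionary merged into the lookup before mapping. This is a sketch only: the shortened `country_alpha3` dictionary below is a stand-in for the pycountry-derived one, and `manual_overrides`, `lookup` and the made-up "Atlantis" entry are assumptions for illustration.

```python
# Stand-in for the pycountry-derived dictionary (a few entries only)
country_alpha3 = {
    'United Arab Emirates': 'ARE',
    'Australia': 'AUS',
    'United States': 'USA',
    'Russian Federation': 'RUS',
}

# Dataset spellings that differ from the ISO names
manual_overrides = {
    'USA': 'USA',
    'Russia': 'RUS',
    'South Korea': 'KOR',
    'Czech Republic': 'CZE',
}

# Later keys win, so the overrides extend the ISO lookup
lookup = {**country_alpha3, **manual_overrides}

countries = ['Australia', 'USA', 'Russia', 'South Korea', 'Czech Republic', 'Atlantis']
codes = {country: lookup.get(country) for country in countries}
unmapped = [country for country, code in codes.items() if code is None]
print(codes)
print(unmapped)
```

In the notebook, `data_country['country'].map(lookup)` would then fill the column in one pass, and the `unmapped` list plays the role of the null check.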


Looks like there are no more nulls left in this column! Now we need to get counts of fights for each country to create a heatmap of fights. We'll use a groupby combined with an .agg("sum") to get these counts. Then, we'll use Plotly to create the heatmap.

In [37]:
# Sum by country
data_country_sum = data_country.groupby(['country', 'country_alpha3'])['count'].agg('sum').to_frame().reset_index()
data_country_sum.head()

Unnamed: 0,country,country_alpha3,count
0,Argentina,ARG,12
1,Australia,AUS,162
2,Brazil,BRA,405
3,Canada,CAN,342
4,Chile,CHL,13


In [38]:
# Create heatmap
fig = px.choropleth(data_country_sum, locations="country_alpha3",
                    color="count", # number of fights per country
                    hover_name="country", # column to add to hover information
                    color_continuous_scale=px.colors.sequential.Plasma)
fig.show()

It is not surprising that a majority of fights took place in the United States. What is more surprising is how many countries have never hosted a UFC fight: a good number of European countries have had none, and it looks like Africa has not had any either.

### Summary

In this notebook, we created heatmaps using two libraries: gmplot and Plotly. We were able to visualize where the majority of UFC fights occurred and where UFC fights have never occurred. We also found the UFC fighters with the most fights between 1993 and June 2019. We visualized the fights for Donald Cerrone, who is tied for the most fights in this time period. We saw that most of his fights happened in Las Vegas, Nevada, which is no surprise since Las Vegas hosts so many major combat sports events (boxing and MMA).