# Heavy Data: Exploring Metallica Concerts

As an aspiring data scientist with a passion for heavy metal, I decided to explore data about one of my favourite bands: Metallica.

The dataset is available on [Kaggle](https://www.kaggle.com/kitsamho/metallica-tour-history) and contains information about 2070 Metallica concerts played between 1982 and 2018. All the data was scraped from the Metallica.com website.

The information contained in the dataset for each concert:

- Date;
- Venue;
- Festival: if the concert was performed in a festival, this column will contain its name;
- Latitude and longitude of the Venue;
- Tour: what tour was the concert part of;
- Set: a list of songs played in the concert;
- Last_Track_Set: the last song played in the concert;
- Encore: the songs played after the band first leaves the stage and comes back to play more songs;
- Columns containing how many songs from each album were played in the concert;
- URL: the URL of the official Metallica page with information about the concert, from which the data was scraped.

In [1]:
# importing the data analysis libraries:

import pandas as pd
import numpy as np

In [2]:
# loading the dataset as a pandas dataframe:

metallica = pd.read_csv('Metallica_Data_Clean.csv')

In [3]:
# taking a first look in the information:

metallica.head(5)

Unnamed: 0,Date,Venue,Festival,City_Country,Lat,Long,Tour,Set,Last_track_Set,Encores,...,Master_Of_Puppets_Count,And_Justice_For_All_Count,Metallica_Count,Load_Count,Re_Load_Count,Garage_Inc_Count,St_Anger_Count,Death_Magnetic_Count,Hardwired_To_Self_Destruct_Count,URL
0,"December 9, 2018",Save Mart Center,,"Fresno, California, United States",36.737798,-119.787125,WorldWired,"['Hardwired', 'Atlas, Rise!', 'Seek and Destro...",Master of Puppets,"['Spit Out The Bone', 'Nothing Else Matters', ...",...,2.0,1.0,1.0,0.0,1.0,0.0,0.0,0.0,4.0,https://www.metallica.com/events/event-36824.html
1,"December 7, 2018",Golden 1 Center,,"Sacramento, California, United States",38.581572,-121.4944,WorldWired,"['Hardwired', 'Atlas, Rise!', 'Seek and Destro...",Master of Puppets,"['Battery', 'Nothing Else Matters', 'Enter San...",...,2.0,2.0,2.0,0.0,0.0,0.0,0.0,0.0,5.0,https://www.metallica.com/events/event-36823.html
2,"December 5, 2018",Moda Center,,"Portland, Oregon, United States",45.512231,-122.658719,WorldWired,"['Hardwired', 'Atlas, Rise!', 'Seek and Destro...",Master of Puppets,"['Spit Out The Bone', 'Nothing Else Matters', ...",...,2.0,1.0,2.0,0.0,1.0,0.0,0.0,0.0,4.0,https://www.metallica.com/events/event-36822.html
3,"December 2, 2018",Spokane Arena,,"Spokane, Washington, United States",47.65878,-117.426047,WorldWired,"['Hardwired', 'Atlas, Rise!', 'Seek and Destro...",Master of Puppets,"['Blackened', 'Nothing Else Matters', 'Enter S...",...,2.0,1.0,2.0,0.0,0.0,0.0,0.0,1.0,5.0,https://www.metallica.com/events/event-36821.html
4,"November 30, 2018",Vivint Smart Home Arena,,"Salt Lake City, Utah, United States",40.760779,-111.891047,WorldWired,"['Hardwired', 'Atlas, Rise!', 'Seek and Destro...",Master of Puppets,"['Fight Fire with Fire', 'Nothing Else Matters...",...,2.0,1.0,2.0,0.0,1.0,0.0,0.0,0.0,5.0,https://www.metallica.com/events/event-36820.html


In [4]:
# exploring the column names to check whether any changes are necessary:

print(metallica.columns)

Index(['Date', 'Venue', 'Festival', 'City_Country', 'Lat', 'Long', 'Tour',
       'Set', 'Last_track_Set', 'Encores', 'Last_track_Encore',
       'Encores_Count', 'Set_Length', 'Other_Acts', 'Has_Guitar_Solo',
       'Has_Bass_Solo', 'Has_Drum_Solo', 'Has_Doodle', 'Has_Medley',
       'Kill_'Em_All_Count', 'Ride_The_Lightning_Count',
       'Master_Of_Puppets_Count', 'And_Justice_For_All_Count',
       'Metallica_Count', 'Load_Count', 'Re_Load_Count', 'Garage_Inc_Count',
       'St_Anger_Count', 'Death_Magnetic_Count',
       'Hardwired_To_Self_Destruct_Count', 'URL'],
      dtype='object')


In [5]:
# changing some column names to make the analysis easier and more "readable":

metallica.columns = ['date', 'venue', 'festival', 'city_country', 'lat', 'long', 'tour',
       'set', 'last_track_set', 'encores', 'last_track_encore',
       'encores_count', 'set_length', 'other_acts', 'has_guitar_solo',
       'has_bass_solo', 'has_drum_solo', 'has_doodle', 'has_medley',
       'kill_count', 'ride_count',
       'master_count', 'justice_count',
       'black_count', 'load_count', 'reload_count', 'garage_count',
       'anger_Count', 'magnetic_count',
       'hardwired_count', 'url']
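
Assigning a full list to `metallica.columns` only works if the list matches the existing columns in both number and order. A more explicit alternative is `DataFrame.rename` with a mapping, so that each old name is paired with its new name. The sketch below uses a small toy frame (the `toy` and `new_names` names are illustrative, not part of the notebook) and renames only three columns:

```python
import pandas as pd

# Toy frame with a few of the original column names (no rows needed for the demo).
toy = pd.DataFrame(columns=["Date", "Venue", "Metallica_Count"])

# Explicit old-name -> new-name mapping; columns not listed keep their names.
new_names = {
    "Date": "date",
    "Venue": "venue",
    "Metallica_Count": "black_count",
}

toy = toy.rename(columns=new_names)
print(list(toy.columns))
```

With a mapping, a reordered or extended CSV would not silently shift names onto the wrong columns.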

In [6]:
# checking the new names in the dataframe:

metallica.head(5)

Unnamed: 0,date,venue,festival,city_country,lat,long,tour,set,last_track_set,encores,...,master_count,justice_count,black_count,load_count,reload_count,garage_count,anger_Count,magnetic_count,hardwired_count,url
0,"December 9, 2018",Save Mart Center,,"Fresno, California, United States",36.737798,-119.787125,WorldWired,"['Hardwired', 'Atlas, Rise!', 'Seek and Destro...",Master of Puppets,"['Spit Out The Bone', 'Nothing Else Matters', ...",...,2.0,1.0,1.0,0.0,1.0,0.0,0.0,0.0,4.0,https://www.metallica.com/events/event-36824.html
1,"December 7, 2018",Golden 1 Center,,"Sacramento, California, United States",38.581572,-121.4944,WorldWired,"['Hardwired', 'Atlas, Rise!', 'Seek and Destro...",Master of Puppets,"['Battery', 'Nothing Else Matters', 'Enter San...",...,2.0,2.0,2.0,0.0,0.0,0.0,0.0,0.0,5.0,https://www.metallica.com/events/event-36823.html
2,"December 5, 2018",Moda Center,,"Portland, Oregon, United States",45.512231,-122.658719,WorldWired,"['Hardwired', 'Atlas, Rise!', 'Seek and Destro...",Master of Puppets,"['Spit Out The Bone', 'Nothing Else Matters', ...",...,2.0,1.0,2.0,0.0,1.0,0.0,0.0,0.0,4.0,https://www.metallica.com/events/event-36822.html
3,"December 2, 2018",Spokane Arena,,"Spokane, Washington, United States",47.65878,-117.426047,WorldWired,"['Hardwired', 'Atlas, Rise!', 'Seek and Destro...",Master of Puppets,"['Blackened', 'Nothing Else Matters', 'Enter S...",...,2.0,1.0,2.0,0.0,0.0,0.0,0.0,1.0,5.0,https://www.metallica.com/events/event-36821.html
4,"November 30, 2018",Vivint Smart Home Arena,,"Salt Lake City, Utah, United States",40.760779,-111.891047,WorldWired,"['Hardwired', 'Atlas, Rise!', 'Seek and Destro...",Master of Puppets,"['Fight Fire with Fire', 'Nothing Else Matters...",...,2.0,1.0,2.0,0.0,1.0,0.0,0.0,0.0,5.0,https://www.metallica.com/events/event-36820.html


In [7]:
# exploring the kind of data contained in each column:

metallica.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2070 entries, 0 to 2069
Data columns (total 31 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   date               2070 non-null   object 
 1   venue              2070 non-null   object 
 2   festival           239 non-null    object 
 3   city_country       2070 non-null   object 
 4   lat                2070 non-null   float64
 5   long               2070 non-null   float64
 6   tour               1681 non-null   object 
 7   set                1862 non-null   object 
 8   last_track_set     1861 non-null   object 
 9   encores            1862 non-null   object 
 10  last_track_encore  1717 non-null   object 
 11  encores_count      1862 non-null   float64
 12  set_length         1862 non-null   float64
 13  other_acts         1513 non-null   object 
 14  has_guitar_solo    2070 non-null   bool   
 15  has_bass_solo      2070 non-null   bool   
 16  has_drum_solo      2070 

In [8]:
# taking a look at the format of the dates, to check whether the year can be extracted into a new column:

metallica['date'].value_counts()

June 11, 2003         3
June 4, 1996          3
November 14, 1997     3
November 16, 1997     2
April 18, 2010        2
                     ..
August 20, 2004       1
October 24, 1986      1
September 14, 1985    1
December 5, 1986      1
June 5, 1986          1
Name: date, Length: 2052, dtype: int64

As we can see, the dates are stored as strings. In order to generate specific information for each year in the dataset, we will create a separate column containing only the year of the concert:

In [9]:
# separating the year of the concert in a different column:

metallica['year'] = metallica['date'].str[-4:]
metallica['year'].describe()

count     2070
unique      36
top       1992
freq       178
Name: year, dtype: object
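
Slicing the last four characters works because every date ends with the year. As an alternative sketch, the strings can be parsed into real timestamps with `pd.to_datetime`, which also gives access to the month or weekday later on. The example below runs on a toy Series (`dates` is a stand-in for the `date` column) and assumes every value follows the "Month D, YYYY" pattern seen above:

```python
import pandas as pd

# Toy stand-in for the 'date' column, same "Month D, YYYY" format as the dataset.
dates = pd.Series(["December 9, 2018", "June 4, 1996", "October 24, 1986"])

# Parse the strings into timestamps; errors="coerce" turns unparseable values into NaT.
parsed = pd.to_datetime(dates, format="%B %d, %Y", errors="coerce")

# The .dt accessor exposes the year as an integer column.
years = parsed.dt.year
print(years.tolist())
```

This yields a numeric year directly, so no separate conversion step is needed.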

In [10]:
# converting the year to a numeric type:

metallica['year'] = metallica['year'].astype(int)

In [11]:
metallica['year'].describe()

count    2070.000000
mean     1997.368599
std        10.187933
min      1982.000000
25%      1989.000000
50%      1996.000000
75%      2004.750000
max      2018.000000
Name: year, dtype: float64

## At this point we can answer our first question: how many concerts has Metallica played in each year?

In [12]:
#counting the amount of concerts per year and sorting it in descending order:

years_count = metallica['year'].value_counts(ascending=False)
years_count

1992    178
1989    156
1986    148
2004    119
1997     98
2009     96
1988     94
1996     91
1993     79
2003     78
1998     71
2010     64
2017     62
2008     60
1984     59
1985     58
1991     58
1983     55
1999     53
1994     51
2018     50
2012     33
2014     31
2000     30
1987     30
1982     28
2013     28
2011     22
2016     22
2006     17
2015     17
2007     14
1990     11
1995      5
2002      2
2005      2
Name: year, dtype: int64

We can see that Metallica performed 178 concerts in 1992. By then the band had just reached the mainstream after the release of the Black Album. There is also a large number of concerts in 2004, shortly after Metallica released its most controversial album, St. Anger.

We can also see some inconsistencies: for example, only 11 concerts are recorded for 1990 and only 5 for 1995. It is quite likely that this information is wrong, so we will remove from the dataset all the years with fewer than 50 recorded shows:

In [13]:
type(years_count)

pandas.core.series.Series

In [14]:
# first, we can create a list from the pandas Series that holds the number of concerts per year:

years_count_list = list(years_count.items())

In [15]:
# we now have a list of tuples, each tuple containing the year and the number of concerts:

years_count_list

[(1992, 178),
 (1989, 156),
 (1986, 148),
 (2004, 119),
 (1997, 98),
 (2009, 96),
 (1988, 94),
 (1996, 91),
 (1993, 79),
 (2003, 78),
 (1998, 71),
 (2010, 64),
 (2017, 62),
 (2008, 60),
 (1984, 59),
 (1985, 58),
 (1991, 58),
 (1983, 55),
 (1999, 53),
 (1994, 51),
 (2018, 50),
 (2012, 33),
 (2014, 31),
 (2000, 30),
 (1987, 30),
 (1982, 28),
 (2013, 28),
 (2011, 22),
 (2016, 22),
 (2006, 17),
 (2015, 17),
 (2007, 14),
 (1990, 11),
 (1995, 5),
 (2002, 2),
 (2005, 2)]

In [16]:
# we will now convert this list of tuples into a list of lists:

year_count_list = []

for item in years_count_list:
    year_count = list(item)
    year_count_list.append(year_count)
    
year_count_list

[[1992, 178],
 [1989, 156],
 [1986, 148],
 [2004, 119],
 [1997, 98],
 [2009, 96],
 [1988, 94],
 [1996, 91],
 [1993, 79],
 [2003, 78],
 [1998, 71],
 [2010, 64],
 [2017, 62],
 [2008, 60],
 [1984, 59],
 [1985, 58],
 [1991, 58],
 [1983, 55],
 [1999, 53],
 [1994, 51],
 [2018, 50],
 [2012, 33],
 [2014, 31],
 [2000, 30],
 [1987, 30],
 [1982, 28],
 [2013, 28],
 [2011, 22],
 [2016, 22],
 [2006, 17],
 [2015, 17],
 [2007, 14],
 [1990, 11],
 [1995, 5],
 [2002, 2],
 [2005, 2]]

In [17]:
# just checking if everything is ok so far:

year_count_list[10][0]

1998

In [18]:
# we can now use a for loop to remove from the dataset all rows for years with fewer than 50 shows:

for year_count in year_count_list:
    
    count = year_count[1]
    year = year_count[0]
    
    if count < 50:
        metallica = metallica.loc[metallica['year'] != year,:]

In [19]:
metallica['year'].value_counts(ascending=False)

1992    178
1989    156
1986    148
2004    119
1997     98
2009     96
1988     94
1996     91
1993     79
2003     78
1998     71
2010     64
2017     62
2008     60
1984     59
1985     58
1991     58
1983     55
1999     53
1994     51
2018     50
Name: year, dtype: int64
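
The list conversion and the loop above get the job done. The same filter can also be sketched in a few lines with a boolean mask, without leaving pandas. The example below uses a toy frame (`toy`) and a lower threshold (`MIN_SHOWS = 2`, where the notebook uses 50) so that the result is easy to check by eye:

```python
import pandas as pd

# Toy frame standing in for `metallica`; only the 'year' column matters here.
toy = pd.DataFrame({"year": [1992, 1992, 1992, 1995, 2002, 2004, 2004]})

MIN_SHOWS = 2  # the notebook uses 50; 2 keeps the toy example small

# Count concerts per year and keep the years that reach the threshold.
counts = toy["year"].value_counts()
years_to_keep = counts[counts >= MIN_SHOWS].index

# Keep only the rows whose year is in the retained set.
filtered = toy[toy["year"].isin(years_to_keep)]
print(sorted(filtered["year"].unique().tolist()))
```

Assigning the result to a new name such as `filtered` also leaves the original dataframe available for comparison.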

With this operation we dropped from 2070 to 1778 concerts, as the command below shows. Since our analysis is driven only by curiosity and has no professional purpose, it's OK to proceed:

In [20]:
metallica.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 1778 entries, 0 to 2041
Data columns (total 32 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   date               1778 non-null   object 
 1   venue              1778 non-null   object 
 2   festival           133 non-null    object 
 3   city_country       1778 non-null   object 
 4   lat                1778 non-null   float64
 5   long               1778 non-null   float64
 6   tour               1552 non-null   object 
 7   set                1572 non-null   object 
 8   last_track_set     1571 non-null   object 
 9   encores            1572 non-null   object 
 10  last_track_encore  1495 non-null   object 
 11  encores_count      1572 non-null   float64
 12  set_length         1572 non-null   float64
 13  other_acts         1290 non-null   object 
 14  has_guitar_solo    1778 non-null   bool   
 15  has_bass_solo      1778 non-null   bool   
 16  has_drum_solo      1778 

## We can now proceed to a second question: which cities has Metallica visited the most since 1982?

In [21]:
# checking the information contained in the city_country column:

metallica['city_country'].value_counts()

San Francisco, California, United States    35
London, England                             29
Los Angeles, California, United States      24
Detroit, Michigan, United States            24
Chicago, Illinois, United States            24
                                            ..
Dover, New Jersey, United States             1
Hanley, England                              1
Bielefeld, Germany                           1
Guatemala City, Guatemala                    1
Brunssum, Netherlands                        1
Name: city_country, Length: 465, dtype: int64

As we can see above, the column stores the city, the state and the country of the concert in a single string. We will therefore clean this column, separating the city from the rest:

In [22]:
metallica['city'] = metallica['city_country'].str.split(',')

In [23]:
metallica['city'] = metallica['city'].str[0]

In [24]:
# checking the top 10 cities that have received the largest amount of Metallica concerts:

metallica['city'].value_counts().head(10)

San Francisco    35
London           30
Los Angeles      24
Chicago          24
Detroit          24
Paris            21
Copenhagen       21
New York         20
Denver           19
Philadelphia     18
Name: city, dtype: int64

In conclusion, the top city is San Francisco, the band's home base (Metallica was formed in Los Angeles in 1981 and relocated to the San Francisco Bay Area shortly afterwards). Outside the US, we also have London, Paris and Copenhagen, in Denmark, the country where the drummer, Lars Ulrich, was born.

## What about the countries? Do they repeat the pattern seen in the top 10 cities? Let's check it out:

In [25]:
metallica['country'] = metallica['city_country'].str.split(',')
metallica['country'] = metallica['country'].str[-1]

In [26]:
metallica['country'].value_counts().head(10)

 United States    1003
 Canada            110
 Germany           106
 England            75
 Australia          47
 France             45
 Japan              34
 Spain              27
 Denmark            25
 Netherlands        23
Name: country, dtype: int64

We discovered that 1003 concerts were performed in the United States, which is around 56% of the total number of concerts! We can also see Australia and Japan in the top 10, which is a bit surprising to me. Converting these counts to proportions, we have:

In [27]:
metallica['country'].value_counts(normalize=True).head(10)

 United States    0.564117
 Canada           0.061867
 Germany          0.059618
 England          0.042182
 Australia        0.026434
 France           0.025309
 Japan            0.019123
 Spain            0.015186
 Denmark          0.014061
 Netherlands      0.012936
Name: country, dtype: float64
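
Note that every country label in the outputs above starts with a space (for example `' United States'`), because splitting on `','` keeps the space that follows each comma. A minimal sketch of the same split with the whitespace stripped, on a toy Series (`places` is a stand-in for the `city_country` column):

```python
import pandas as pd

# Toy stand-in for the 'city_country' column.
places = pd.Series([
    "San Francisco, California, United States",
    "London, England",
    "Copenhagen, Denmark",
])

# Split once, then take the first and last pieces and strip surrounding spaces.
parts = places.str.split(",")
city = parts.str[0].str.strip()
country = parts.str[-1].str.strip()

print(country.tolist())
```

Clean labels make later lookups such as `country == 'Denmark'` behave as expected.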

## What were the longest tours in terms of number of concerts?

In [28]:
metallica['tour'].value_counts(ascending=False)

Wherever We May Roam                    229
Damaged Justice                         219
World Magnetic                          179
Poor Touring Me                         135
Damage, Inc.                            114
Madly In Anger With The World           105
WorldWired                              104
Ride the Lightning                       77
Poor ReTouring Me                        60
Summer Shit                              51
The Garage Remains the Same              42
Guns N' Roses/Metallica Stadium Tour     36
Kill 'Em All for One                     32
Van Halen's Monsters of Rock             28
Nowhere Else to Roam                     24
Lollapalooza                             23
European Vacation                        20
Summer Sanitarium 2003                   20
Monsters of Rock                         20
Hell On Earth Tour                       11
Blitzkrieg '97                            9
Seven Dates of Hell                       6
Garage Barrage                  

As expected, the longest tour was Wherever We May Roam, the tour supporting the Black Album, the most successful album Metallica has released, with more than 200 concerts.

## What percentage of the concerts were performed at festivals?

To answer this question we can apply the value_counts method to the "festival" column, setting normalize=True to get proportions and dropna=False to keep the null values.

In [29]:
metallica['festival'].value_counts(normalize=True,dropna=False)

NaN                       0.925197
Sonisphere                0.007312
Big Day Out               0.004499
Monsters Of Rock          0.002812
Rock Werchter             0.002812
                            ...   
Doctor Music Festival     0.000562
Fm4 Frequency Festival    0.000562
Nulle Part Ailleurs       0.000562
Open Air                  0.000562
Ozzfest                   0.000562
Name: festival, Length: 83, dtype: float64

As a result, we can see that about 92.5% of the concerts were not performed at a festival, which means that about 7.5% were.
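
If only the overall festival share is of interest, it can be computed directly from the null mask, since the mean of a boolean Series is the proportion of `True` values. A small sketch on toy data (`festival` here is a stand-in for the real column):

```python
import pandas as pd

# Toy stand-in for the 'festival' column: one festival date out of four concerts.
festival = pd.Series([None, "Sonisphere", None, None], dtype="object")

# notna() marks the festival concerts; the mean of the mask is their share.
festival_share = festival.notna().mean()
print(festival_share)
```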

## How many different setlists has Metallica played?

In [30]:
# describing the column set:

metallica['set'].describe()

count                                                  1572
unique                                                  705
top       ['Breadfan', 'Master of Puppets', 'Wherever I ...
freq                                                     48
Name: set, dtype: object

Metallica has played 705 different setlists across the 1572 concerts that have a recorded set.

In [31]:
metallica['set'].value_counts(ascending=False).head(1)

['Breadfan', 'Master of Puppets', 'Wherever I May Roam', 'Harvester of Sorrow', 'Welcome Home (Sanitarium)', 'The God That Failed', 'Kill/Ride Medley', 'For Whom the Bell Tolls', 'Disposable Heroes', 'Seek and Destroy', 'Guitar Solo', 'Nothing Else Matters', 'Creeping Death', 'Bass Solo', 'Fade to Black', 'Whiplash']    48
Name: set, dtype: int64

The most common setlist was played 48 times and consists of the following songs:
    
    ['Breadfan', 'Master of Puppets', 'Wherever I May Roam', 'Harvester of Sorrow', 'Welcome Home (Sanitarium)', 'The God That Failed', 'Kill/Ride Medley', 'For Whom the Bell Tolls', 'Disposable Heroes', 'Seek and Destroy', 'Guitar Solo', 'Nothing Else Matters', 'Creeping Death', 'Bass Solo', 'Fade to Black', 'Whiplash'] 

## How many times each album has been played?

To answer this question we can sum, for each album, the number of songs played in each concert. We will also create a new column, called "set_len", that holds the total number of songs played in each concert:

In [32]:
metallica[['kill_count', 'ride_count',
       'master_count', 'justice_count',
       'black_count', 'load_count', 'reload_count', 'garage_count',
       'anger_Count', 'magnetic_count',
       'hardwired_count']].describe()

Unnamed: 0,kill_count,ride_count,master_count,justice_count,black_count,load_count,reload_count,garage_count,anger_Count,magnetic_count,hardwired_count
count,1572.0,1572.0,1572.0,1572.0,1572.0,1572.0,1572.0,1572.0,1572.0,1572.0,1572.0
mean,0.902036,2.266539,1.663486,1.433842,2.237913,0.667303,0.379135,0.307252,0.14631,0.592875,0.344784
std,1.205999,1.082033,1.200728,1.304187,1.845224,1.415788,0.662747,0.747837,0.435787,1.628721,1.301502
min,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
25%,0.0,1.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
50%,1.0,3.0,2.0,1.0,3.0,0.0,0.0,0.0,0.0,0.0,0.0
75%,1.0,3.0,2.0,2.0,3.0,0.0,1.0,1.0,0.0,0.0,0.0
max,8.0,5.0,6.0,5.0,7.0,7.0,3.0,10.0,3.0,7.0,6.0


In [33]:
metallica[['kill_count', 'ride_count',
       'master_count', 'justice_count',
       'black_count', 'load_count', 'reload_count', 'garage_count',
       'anger_Count', 'magnetic_count',
       'hardwired_count']].sum(axis=0).sort_values(ascending=False)

ride_count         3563.0
black_count        3518.0
master_count       2615.0
justice_count      2254.0
kill_count         1418.0
load_count         1049.0
magnetic_count      932.0
reload_count        596.0
hardwired_count     542.0
garage_count        483.0
anger_Count         230.0
dtype: float64

We conclude that Ride The Lightning and the Black Album are the most played albums, well ahead of the third place, Master Of Puppets.

In [34]:
# creating the column with the length of the setlist:

metallica['set_len'] = metallica[['kill_count', 'ride_count',
       'master_count', 'justice_count',
       'black_count', 'load_count', 'reload_count', 'garage_count',
       'anger_Count', 'magnetic_count',
       'hardwired_count']].sum(axis=1)

Now that the column with the setlist length has been created, we can get new information about the setlists:

In [35]:
metallica['set_len'].describe()

count    1778.000000
mean        9.673791
std         4.499909
min         0.000000
25%         8.000000
50%        11.000000
75%        13.000000
max        18.000000
Name: set_len, dtype: float64

Above we can see some general statistics about the setlist lengths in the dataset. As a fan of the band, I can tell that these numbers are not very accurate, since the band usually plays many more songs than that. This is probably because the per-album song counts do not include the songs played in the encore.
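
A second effect is visible in the numbers themselves: `set_len` has 1778 values and a minimum of 0, while the album count columns have only 1572 non-null values. By default, a row-wise `sum` treats a row of missing values as 0, so concerts without a recorded setlist count as empty sets and pull the mean down. A minimal sketch of the difference, on a toy frame with two of the album columns (`toy`, `plain` and `strict` are illustrative names):

```python
import numpy as np
import pandas as pd

# Toy frame: the middle concert has no recorded setlist (all counts missing).
toy = pd.DataFrame({
    "kill_count": [1.0, np.nan, 0.0],
    "ride_count": [3.0, np.nan, 2.0],
})

# Default behaviour: the all-NaN row sums to 0.0.
plain = toy.sum(axis=1)

# min_count=1 requires at least one real value, so the all-NaN row stays NaN.
strict = toy.sum(axis=1, min_count=1)

print(plain.mean(), strict.mean())
```

With `min_count=1`, `describe()` would report statistics only for the concerts that actually have a setlist.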

In [36]:
year_len = metallica[['year', 'set_len']]
year_len

Unnamed: 0,year,set_len
0,2018,14.0
1,2018,14.0
2,2018,14.0
3,2018,14.0
4,2018,14.0
...,...,...
2037,1983,6.0
2038,1983,4.0
2039,1983,6.0
2040,1983,8.0


Since the number of songs in each concert does not seem reliable enough to describe individual setlists, we can use the date to look at something else, for example the total number of songs played in each year:

In [37]:
year_len.groupby('year').set_len.sum().sort_values(ascending=False)

year
1992    1881.0
2009    1371.0
1997    1144.0
2004    1107.0
1996    1096.0
2010     931.0
1993     895.0
1989     889.0
2008     836.0
1998     834.0
2017     795.0
1988     772.0
1986     691.0
2018     665.0
1999     646.0
2003     633.0
1994     607.0
1991     587.0
1984     327.0
1985     286.0
1983     207.0
Name: set_len, dtype: float64

We can see that in 1992 Metallica played almost 1900 songs (1881, to be exact). If we apply a correction and add the encore songs to this number, we will probably have more than 2000 plays in that year! We can check the same number for each tour:

In [38]:
metallica.groupby('tour').set_len.sum().sort_values(ascending=False)

tour
World Magnetic                          2662.0
Wherever We May Roam                    2538.0
Poor Touring Me                         1756.0
Damaged Justice                         1464.0
WorldWired                              1420.0
Madly In Anger With The World            991.0
Poor ReTouring Me                        761.0
Damage, Inc.                             678.0
Summer Shit                              607.0
The Garage Remains the Same              511.0
Ride the Lightning                       435.0
Nowhere Else to Roam                     311.0
Guns N' Roses/Metallica Stadium Tour     309.0
European Vacation                        300.0
Lollapalooza                             286.0
Van Halen's Monsters of Rock             182.0
Summer Sanitarium 2003                   180.0
Monsters of Rock                         172.0
Blitzkrieg '97                            97.0
Kill 'Em All for One                      79.0
Garage Barrage                            50.0
Seven Da

## How much does the band travel around the world?

How many different cities has the band visited in each year?

In [39]:
metallica.groupby('year').city.nunique().sort_values(ascending=False)

year
1986    135
1989    135
1992    126
2004    105
1988     77
1997     76
2009     74
1996     68
1993     65
2003     60
1998     60
2008     54
1985     51
1999     50
1994     50
1984     49
2017     48
1991     47
2018     42
2010     42
1983     40
Name: city, dtype: int64

The band visited 126 different cities in 1992 and 105 in 2004, and the peak is 135 cities in both 1986 and 1989. What a journey! What about the countries?

In [40]:
metallica.groupby('year').country.nunique().sort_values(ascending=False)

year
2010    34
1993    33
1999    32
2004    24
1996    21
2008    21
2009    19
1986    19
2003    18
1988    18
1992    17
2017    17
1991    16
2018    15
1984    13
1997     8
1998     7
1989     6
1985     4
1994     2
1983     1
Name: country, dtype: int64
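
The per-year figures computed so far (concerts, cities, countries) live in separate outputs. As a sketch, they could be gathered into a single table with named aggregation in `groupby(...).agg`. The example runs on a toy frame (`toy`) with made-up rows, so the numbers are illustrative only:

```python
import pandas as pd

# Toy frame with made-up rows, standing in for the year/city/country columns.
toy = pd.DataFrame({
    "year": [1992, 1992, 1992, 2004, 2004],
    "city": ["Paris", "Paris", "London", "Oslo", "Rome"],
    "country": ["France", "France", "England", "Norway", "Italy"],
})

# One row per year: number of concerts, distinct cities and distinct countries.
summary = toy.groupby("year").agg(
    concerts=("city", "size"),
    cities=("city", "nunique"),
    countries=("country", "nunique"),
)
print(summary)
```

Having the three columns side by side makes it easier to spot years with many concerts but few countries, and vice versa.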

We can see that the band visited 34 different countries in 2010. What if we run the same analysis by tour?

In [41]:
metallica.groupby('tour').country.nunique().sort_values(ascending=False)

tour
World Magnetic                          46
The Garage Remains the Same             32
Wherever We May Roam                    29
WorldWired                              28
Damaged Justice                         23
Poor Touring Me                         21
Madly In Anger With The World           21
Nowhere Else to Roam                    19
European Vacation                       18
Monsters of Rock                        13
Ride the Lightning                      12
Damage, Inc.                            12
Blitzkrieg '97                           7
Poor ReTouring Me                        6
Seven Dates of Hell                      6
Hell On Earth Tour                       3
Summer Sanitarium 2003                   2
Lollapalooza                             2
Guns N' Roses/Metallica Stadium Tour     2
Garage Barrage                           2
Summer Shit                              2
M2K                                      1
Kill 'Em All for One                     1
Van Ha

Metallica visited 46 different countries on the World Magnetic tour, which supported the release of the Death Magnetic album.

## How many different setlists has the band played in each year?

In [42]:
metallica.groupby('year').set.nunique().sort_values(ascending=False)

year
2004    110
2009     93
2010     62
2008     58
2018     49
2017     49
2003     44
1997     34
1992     33
1996     25
1986     20
1993     18
1984     18
1998     17
1983     17
1999     16
1991     15
1989     15
1988     10
1985     10
1994      4
Name: set, dtype: int64

## Which songs are the most played in Metallica concerts?

To finish, let's check which songs have been played the most over all those years.

In [43]:
# creating a pandas Series from the "set" column:

sets = metallica['set']
type(sets)

pandas.core.series.Series

In [44]:
# converting the pandas Series into a list of (index, setlist) tuples:

sets_list = list(sets.items())
sets_list

[(0,
  '[\'Hardwired\', \'Atlas, Rise!\', \'Seek and Destroy\', \'Ride the Lightning\', \'Welcome Home (Sanitarium)\', "Now That We\'re Dead", \'Creeping Death\', \'For Whom the Bell Tolls\', \'Fade to Black\', \'Hit the Lights\', \'Fuel\', \'Moth Into Flame\', \'Sad But True\', \'One\', \'Master of Puppets\']'),
 (1,
  '[\'Hardwired\', \'Atlas, Rise!\', \'Seek and Destroy\', \'The Shortest Straw\', \'The Unforgiven\', "Now That We\'re Dead", \'Confusion\', \'For Whom the Bell Tolls\', \'Welcome Home (Sanitarium)\', \'Whiplash\', \'Creeping Death\', \'Moth Into Flame\', \'Sad But True\', \'One\', \'Master of Puppets\']'),
 (2,
  '[\'Hardwired\', \'Atlas, Rise!\', \'Seek and Destroy\', \'Holier Than Thou\', \'Welcome Home (Sanitarium)\', "Now That We\'re Dead", \'Creeping Death\', \'For Whom the Bell Tolls\', \'Fade to Black\', \'No Remorse\', \'Fuel\', \'Moth Into Flame\', \'Sad But True\', \'One\', \'Master of Puppets\']'),
 (3,
  '[\'Hardwired\', \'Atlas, Rise!\', \'Seek and Destroy\

In [45]:
# checking the results:

sets_list[0]

(0,
 '[\'Hardwired\', \'Atlas, Rise!\', \'Seek and Destroy\', \'Ride the Lightning\', \'Welcome Home (Sanitarium)\', "Now That We\'re Dead", \'Creeping Death\', \'For Whom the Bell Tolls\', \'Fade to Black\', \'Hit the Lights\', \'Fuel\', \'Moth Into Flame\', \'Sad But True\', \'One\', \'Master of Puppets\']')

In [46]:
set_lists_clean = []

for item in sets_list:
    set_lists_clean.append(item[1])
    
set_lists_clean[0:5]

['[\'Hardwired\', \'Atlas, Rise!\', \'Seek and Destroy\', \'Ride the Lightning\', \'Welcome Home (Sanitarium)\', "Now That We\'re Dead", \'Creeping Death\', \'For Whom the Bell Tolls\', \'Fade to Black\', \'Hit the Lights\', \'Fuel\', \'Moth Into Flame\', \'Sad But True\', \'One\', \'Master of Puppets\']',
 '[\'Hardwired\', \'Atlas, Rise!\', \'Seek and Destroy\', \'The Shortest Straw\', \'The Unforgiven\', "Now That We\'re Dead", \'Confusion\', \'For Whom the Bell Tolls\', \'Welcome Home (Sanitarium)\', \'Whiplash\', \'Creeping Death\', \'Moth Into Flame\', \'Sad But True\', \'One\', \'Master of Puppets\']',
 '[\'Hardwired\', \'Atlas, Rise!\', \'Seek and Destroy\', \'Holier Than Thou\', \'Welcome Home (Sanitarium)\', "Now That We\'re Dead", \'Creeping Death\', \'For Whom the Bell Tolls\', \'Fade to Black\', \'No Remorse\', \'Fuel\', \'Moth Into Flame\', \'Sad But True\', \'One\', \'Master of Puppets\']',
 '[\'Hardwired\', \'Atlas, Rise!\', \'Seek and Destroy\', \'Leper Messiah\', \'The

In [48]:
# creating a function to clean the setlist:

def clean_set_list(setlist):
    
    setlist = setlist.replace(']','')
    setlist = setlist.replace('[','')
    setlist = setlist.replace("'",'')
    setlist = setlist.replace("' ","'")
    setlist = setlist.strip()

    
    return setlist

In [49]:
# testing the function on an example:

clean_set_list(set_lists_clean[1343])

'Blackened, For Whom the Bell Tolls, Welcome Home (Sanitarium), Leper Messiah, Harvester of Sorrow, Eye of the Beholder, Bass Solo, Master of Puppets, One, Seek and Destroy, ...And Justice for All'

In [50]:
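# creating a frequency table with the total number of plays of each song:

frequency_table = {}

for item in set_lists_clean:
    
    # concerts without a recorded setlist hold NaN (a float), not None, so skip anything that is not a string:
    if not isinstance(item, str):
        pass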
    else:
        setlist = clean_set_list(item)
        musics = setlist.split(',')
    
        for music in musics:
            if music not in frequency_table:
                frequency_table[music] = 1
            else:
                frequency_table[music] += 1
            
frequency_table

{'Hardwired': 104,
 ' Atlas': 101,
 ' Rise!': 101,
 ' Seek and Destroy': 873,
 ' Ride the Lightning': 243,
 ' Welcome Home (Sanitarium)': 806,
 ' "Now That Were Dead"': 104,
 ' Creeping Death': 739,
 ' For Whom the Bell Tolls': 1220,
 ' Fade to Black': 864,
 ' Hit the Lights': 16,
 ' Fuel': 355,
 ' Moth Into Flame': 103,
 ' Sad But True': 915,
 ' One': 763,
 ' Master of Puppets': 1012,
 ' The Shortest Straw': 55,
 ' The Unforgiven': 376,
 ' Confusion': 27,
 ' Whiplash': 611,
 ' Holier Than Thou': 57,
 ' No Remorse': 184,
 ' Leper Messiah': 89,
 ' The Day That Never Comes': 153,
 ' Motorbreath': 48,
 ' Through the Never': 209,
 ' Halo On Fire': 77,
 ' The Memory Remains': 190,
 ' The Four Horsemen': 367,
 ' Harvester of Sorrow': 704,
 'Disposable Heroes': 1,
 ' When a Blind Man Cries': 1,
 ' "Please Dont Judas Me"': 1,
 ' Turn the Page': 49,
 ' Bleeding Me': 149,
 ' Veteran of the Psychic Wars': 1,
 ' Nothing Else Matters': 550,
 ' All Within My Hands': 1,
 ' Enter Sandman': 252,
 ' Har

In [51]:
# converting the frequency table into a dataframe:

series = pd.Series(frequency_table) 
df_songs = series.to_frame(name='amount of plays')

In [55]:
# creating a top 30 of the most played songs:

df_songs.sort_values('amount of plays',ascending = False).head(30)

Unnamed: 0,amount of plays
For Whom the Bell Tolls,1220
Master of Puppets,1012
Sad But True,915
Seek and Destroy,873
Fade to Black,864
Welcome Home (Sanitarium),806
One,763
Creeping Death,739
Harvester of Sorrow,704
Wherever I May Roam,656


To finish, we can see that the most played songs are:

- For Whom the Bell Tolls
- Master of Puppets
- Sad But True
- Seek and Destroy
- Fade to Black

As a fan, I can see some inconsistencies in this information. Enter Sandman is probably the band's most popular song and is, in reality, among its three most played songs of all time, yet it does not even reach the top ten here. The most likely reason is that it is usually played in the encore, which the "set" column does not include. Likewise, The Memory Remains was probably played more times than Until It Sleeps.
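
Part of the noise also comes from the counting itself: splitting on commas breaks titles such as "Atlas, Rise!" in two, and the leading space left after each comma makes `'Disposable Heroes'` and `' Disposable Heroes'` count as different songs. Because each setlist is stored as the text of a Python list, one possible fix is to parse it with `ast.literal_eval` and count with `collections.Counter`. The sketch below uses toy data, and it assumes the "encores" column uses the same list-like format as "set"; `parse_song_list` and `song_counts` are illustrative names:

```python
import ast
from collections import Counter

import pandas as pd

# Toy stand-ins for the 'set' and 'encores' columns (None = no recorded setlist).
sets = pd.Series([
    "['Hardwired', 'Atlas, Rise!', \"Now That We're Dead\"]",
    "['Hardwired', 'One']",
    None,
])
encores = pd.Series([
    "['Enter Sandman']",
    "['Enter Sandman', 'One']",
    None,
])


def parse_song_list(raw):
    """Turn the stored text of a list into a real list; missing values give []."""
    if not isinstance(raw, str):
        return []
    return ast.literal_eval(raw)


# Count every song across both the main set and the encores.
song_counts = Counter()
for column in (sets, encores):
    for raw in column:
        song_counts.update(parse_song_list(raw))

print(song_counts.most_common(3))
```

Counting the encores together with the main set should bring songs like Enter Sandman closer to where a fan would expect them.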

This analysis was driven entirely by my own curiosity and I did not put a lot of effort into data cleaning, so it could probably be much more complete. Feel free to improve it if you are also a Metallica fan!