In [1]:
import pandas as pd

In [2]:
# death_metal.csv
url = 'https://drive.google.com/file/d/11HsCgxJL_PtJ8xxdT5VZbw6e0y0-VKag/view?usp=sharing' 
path = 'https://drive.google.com/uc?export=download&id='+url.split('/')[-2]
bands = pd.read_csv(path)

In [3]:
bands.head()

Unnamed: 0,name,country,status,formed_in,genre,theme,active
0,**Act of Destruction?>,united states,ac,2005.0,Melodic Death/Thrash Metal,Death| Love| Life| Evil| Darker Tones,2005-present
1,**Nirvana 2002?>,sweden,su,1988.0,Death Metal,Metaphysical Philosophy| Parapsychology,1988-1992
2,**Olemus?>,austria,ac,1993.0,Death/Black/Gothic Metal,Sadness| Life| Death,1993-present
3,**Misanthrope?>,mexico,oh,2010.0,Death Metal,Death| Destruction| War| Decadence,2010-present
4,**Detonator?>,russia,su,1991.0,Technical Death/Thrash Metal,Loneliness philosophy| state of mind of the pe...,1991-2002


In [None]:
bands.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 26 entries, 0 to 25
Data columns (total 7 columns):
 #   Column     Non-Null Count  Dtype  
---  ------     --------------  -----  
 0   name       26 non-null     object 
 1   country    26 non-null     object 
 2   status     26 non-null     object 
 3   formed_in  26 non-null     float64
 4   genre      26 non-null     object 
 5   theme      26 non-null     object 
 6   active     26 non-null     object 
dtypes: float64(1), object(6)
memory usage: 1.5+ KB


### **Exercise 1:** 
Cleaning the 'name' column

The band names are messy: each one has extra characters, '\*\*' at the beginning and '?>' at the end. Remove them using any or all of these methods:  
- [pandas.Series.str.replace](https://pandas.pydata.org/docs/reference/api/pandas.Series.str.replace.html)
- [pandas.Series.str.lstrip](https://pandas.pydata.org/docs/reference/api/pandas.Series.str.lstrip.html)
- [pandas.Series.str.rstrip](https://pandas.pydata.org/docs/reference/api/pandas.Series.str.rstrip.html)
- [pandas.Series.str.strip](https://pandas.pydata.org/docs/reference/api/pandas.Series.str.strip.html)
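
As a rough sketch of how these methods differ, the snippet below applies them to a small made-up Series (the band names are hypothetical, not rows from `death_metal.csv`). The point to keep in mind is that the `strip` family treats its argument as a set of characters to remove from the ends, while `str.replace` with `regex=True` matches a pattern.

```python
import pandas as pd

# Hypothetical toy data, not taken from death_metal.csv
names = pd.Series(["**Example Band?>", "**Another*Band?>"])

# strip() removes any of the listed characters from both ends,
# in any order; characters in the middle are left alone
stripped = names.str.strip("*?>")

# lstrip() and rstrip() do the same for one side each
two_step = names.str.lstrip("*").str.rstrip("?>")

# replace() with a regex anchored to the start (^) and the end ($)
replaced = names.str.replace(r"^\*\*|\?>$", "", regex=True)

print(stripped.tolist())
```

All three approaches leave the inner `*` in the second name untouched, because they only act on the ends of each string.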

In [None]:
bands['name'].head(5)

0    **Act of Destruction?>
1          **Nirvana 2002?>
2                **Olemus?>
3           **Misanthrope?>
4             **Detonator?>
Name: name, dtype: object

In [8]:
# strip() removes any of the characters '*', '?' and '>' from both ends of each name
bands['name'] = bands['name'].str.strip('*?>')

### **Exercise 2:** 
Cleaning the country column

The country column has every country name written in lowercase. Change them so that each word starts with a capital letter.  
The best method for this is: [pandas.Series.str.title](https://pandas.pydata.org/docs/reference/api/pandas.Series.str.title.html)
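
To see why `str.title` fits here, the following sketch compares it with two related methods on a small made-up Series (the values are illustrative only).

```python
import pandas as pd

# Hypothetical toy data
countries = pd.Series(["united states", "czech republic", "sweden"])

title = countries.str.title()            # first letter of every word
capitalized = countries.str.capitalize() # first letter of the string only
upper = countries.str.upper()            # every letter

print(pd.DataFrame({"title": title, "capitalize": capitalized, "upper": upper}))
```

Only `str.title` capitalizes both words of a multi-word country name.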

In [None]:
bands['country'].head(5)

0    united states
1           sweden
2          austria
3           mexico
4           russia
Name: country, dtype: object

In [9]:
bands['country'] = bands.country.str.title()
bands['country']

0      United States
1             Sweden
2            Austria
3             Mexico
4             Russia
5            Germany
6              Italy
7            Finland
8              Italy
9             Sweden
10       Switzerland
11           Germany
12            France
13          Colombia
14    Czech Republic
15    United Kingdom
16           Germany
17      Korea| South
18          Malaysia
19            Sweden
20     United States
21     United States
22         Australia
23           Germany
24    United Kingdom
25     United States
Name: country, dtype: object

### **Exercise 3:**
Cleaning the status column

The status column has some abbreviations instead of the real status.  
Change them in accordance with this:  
* ac = Active  
* su = Split-up  
* cn = Changed name  
* oh = On hold  
* un = Unknown

You can use  
pandas.Series.str.replace -> https://pandas.pydata.org/docs/reference/api/pandas.Series.str.replace.html  
or you can create a dictionary and remap values according to it with  
pandas.Series.replace -> https://pandas.pydata.org/docs/reference/api/pandas.Series.replace.html
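
A minimal sketch of the dictionary approach on made-up data follows. The code `"xx"` is a hypothetical value that is not in the mapping; it is included to show that `Series.replace` leaves unmapped values as they are, whereas `Series.map` (a different method, shown only for comparison) turns them into `NaN`.

```python
import pandas as pd

# Hypothetical toy data; "xx" is an invented code with no mapping
status = pd.Series(["ac", "oh", "xx"])
labels = {"ac": "Active", "oh": "On hold"}

# replace() swaps whole values found in the dict and keeps the rest
replaced = status.replace(labels)

# map() returns NaN for values that are missing from the dict
mapped = status.map(labels)

print(replaced.tolist())
print(mapped.isna().tolist())
```

Note that `Series.str.replace` works on substrings rather than whole values, so short codes such as `ac` could also match inside longer strings.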

In [None]:
bands['status'].head(5)

0    ac
1    su
2    ac
3    oh
4    su
Name: status, dtype: object

In [11]:
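# Assign the result back: replace(..., inplace=True) on a selected column may not update the DataFrame
bands['status'] = bands['status'].replace({'ac': 'Active', 'su': 'Split up', 'oh': 'On hold', 'cn': 'Changed name', 'un': 'Unknown'})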
bands.head()

Unnamed: 0,name,country,status,formed_in,genre,theme,active
0,Act of Destruction,United States,Active,2005.0,Melodic Death/Thrash Metal,Death| Love| Life| Evil| Darker Tones,2005-present
1,Nirvana 2002,Sweden,Split up,1988.0,Death Metal,Metaphysical Philosophy| Parapsychology,1988-1992
2,Olemus,Austria,Active,1993.0,Death/Black/Gothic Metal,Sadness| Life| Death,1993-present
3,Misanthrope,Mexico,On hold,2010.0,Death Metal,Death| Destruction| War| Decadence,2010-present
4,Detonator,Russia,Split up,1991.0,Technical Death/Thrash Metal,Loneliness philosophy| state of mind of the pe...,1991-2002


In [None]:
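# Alternative: expand each abbreviation with a plain function and Series.apply
def replace_abb(x):
    if x == "ac":
        return "Active"
    elif x == "su":
        return "Split-up"
    elif x == "oh":
        return "On hold"
    elif x == "cn":
        return "Changed name"
    elif x == "un":
        return "Unknown"
    # Leave any other value unchanged, so already-expanded statuses are not wiped
    return x

bands['status'] = bands['status'].apply(replace_abb)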

### **Exercise 4:**
Cleaning the genre column

The `genre` column stores each band's genres in a single string, separated by the character `/`.  
1. First, transform the string into a list of strings  
(e.g. 'Avant-garde Black/Death Metal' to \['Avant-garde Black', 'Death Metal'\]).  
2. Then, create a new column 'number_of_genres' that stores the number of genres in each list.  

Methods you can use are:  
pandas.Series.str.split -> https://pandas.pydata.org/docs/reference/api/pandas.Series.str.split.html  
pandas.Series.str.len -> https://pandas.pydata.org/docs/reference/api/pandas.Series.str.len.html
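
The two steps can be sketched on a made-up Series like this (the second value is illustrative). `str.len` works on lists as well as strings, so it returns the number of elements once the column holds lists.

```python
import pandas as pd

# Toy data based on the example string above
genres = pd.Series(["Avant-garde Black/Death Metal", "Death Metal"])

# Step 1: split each string on "/" into a list of strings
genre_lists = genres.str.split("/")

# Step 2: count the elements of each list
number_of_genres = genre_lists.str.len()

print(genre_lists.tolist())
print(number_of_genres.tolist())
```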

In [None]:
bands['genre'].head(5)

0      Melodic Death/Thrash Metal
1                     Death Metal
2        Death/Black/Gothic Metal
3                     Death Metal
4    Technical Death/Thrash Metal
Name: genre, dtype: object

In [None]:
bands['genre'] = bands.genre.str.strip()
bands['genre'] = bands.genre.str.split('/')
bands['number_of_genres'] = bands.genre.str.len()

### **Exercise 5:** 
Cleaning the active column

The column `active` contains information about the years when a band was active or '?' if the status or year is unknown.  

Create two new columns 
- `active_from`: when the band formed
- `active_to`: when the band broke up

and fill them with the information contained in the column `active`.  

Method you can use is: pandas.Series.str.extract -> https://pandas.pydata.org/docs/reference/api/pandas.Series.str.extract.html#
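
As a sketch of how `str.extract` behaves, the example below uses made-up values and named capture groups (an optional regex feature) so that the resulting columns get their names directly from the pattern. Rows that do not match the pattern, such as a bare `'?'`, come back as `NaN` in every column.

```python
import pandas as pd

# Hypothetical toy data covering the three shapes described above
active = pd.Series(["2005-present", "1988-1992", "?"])

# Each capture group becomes one column; (?P<name>...) names the column
pattern = r"(?P<active_from>\d{4})-(?P<active_to>\d{4}|present)"
years = active.str.extract(pattern)

print(years)
```

The extracted values are strings, so a further conversion would be needed before doing arithmetic on the years.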

In [None]:
bands['active'].head(5)

0    2005-present
1       1988-1992
2    1993-present
3    2010-present
4       1991-2002
Name: active, dtype: object

In [24]:
# Raw string so that \d and \? reach the regex engine unchanged
bands[["active_from", "active_to"]] = bands['active'].str.extract(r"(\d+)-(\d+|present|\?)")

In [None]:
bands['active'].str.split('-' ,expand=True)

### **Exercise 6:** 
Counting the themes

Count how many times the words Love, Life, and Death appear in the `theme` column.  
Method you can use is:  
pandas.Series.str.count -> https://pandas.pydata.org/docs/reference/api/pandas.Series.str.count.html  
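
A small sketch on made-up themes shows how `str.count` works: it returns the number of regex matches per row, and the `flags` argument accepts constants from Python's `re` module. The pattern counts substrings, so a word such as "Lifeless" would also be counted under this approach.

```python
import re

import pandas as pd

# Hypothetical toy data
themes = pd.Series(["Death| Love| Life", "death and rebirth", "War"])

# "|" in the pattern means "or"; IGNORECASE also matches lowercase "death"
per_row = themes.str.count("Love|Life|Death", flags=re.IGNORECASE)
total = per_row.sum()

print(per_row.tolist(), total)
```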

In [None]:
bands['theme'].head(5)

0                Death| Love| Life| Evil| Darker Tones
1              Metaphysical Philosophy| Parapsychology
2                                 Sadness| Life| Death
3                   Death| Destruction| War| Decadence
4    Loneliness philosophy| state of mind of the pe...
Name: theme, dtype: object

In [30]:
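import re

# re.IGNORECASE makes the match case-insensitive
(bands.theme.str.count('Love', flags=re.IGNORECASE).sum() +
bands.theme.str.count('Life', flags=re.IGNORECASE).sum() +
bands.theme.str.count('Death', flags=re.IGNORECASE).sum())

19

In [29]:
bands.theme.str.count('Love|Life|Death', flags=re.IGNORECASE).sum()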

19

## Final result

In [32]:
bands

Unnamed: 0,name,country,status,formed_in,genre,theme,active,number_of_genres,active_from,active_to
0,Act of Destruction,United States,Active,2005.0,"[Melodic Death, Thrash Metal]",Death| Love| Life| Evil| Darker Tones,2005-present,2,2005,present
1,Nirvana 2002,Sweden,Split up,1988.0,[Death Metal],Metaphysical Philosophy| Parapsychology,1988-1992,1,1988,1992
2,Olemus,Austria,Active,1993.0,"[Death, Black, Gothic Metal]",Sadness| Life| Death,1993-present,3,1993,present
3,Misanthrope,Mexico,On hold,2010.0,[Death Metal],Death| Destruction| War| Decadence,2010-present,1,2010,present
4,Detonator,Russia,Split up,1991.0,"[Technical Death, Thrash Metal]",Loneliness philosophy| state of mind of the pe...,1991-2002,2,1991,2002
5,Blood Agent,Germany,Active,2013.0,"[Death, Thrash Metal]",War| Death| Apocalypse,2013-present,2,2013,present
6,Traumagain,Italy,Active,2001.0,[Brutal Death Metal],Nihilism| Death,2001-present,1,2001,present
7,Anaktorian,Finland,Split up,2001.0,[Melodic Death Metal],Death| emotions| pain,2001-2007,1,2001,2007
8,Revolt,Italy,Changed name,2001.0,"[Death, Thrash Metal]",Society| Hate,2001-2006,2,2001,2006
9,Coldworker,Sweden,Split up,2006.0,[Death Metal],Death,2006-2013,1,2006,2013
