# Task

- Clean the duration column so that it contains only integer values: the total duration in seconds
- Delete the songs with an empty `instrumentalness` column
- Create a new field called "tag" and assign:
    - "energetic" if the song has energy higher than 0.7
    - "positive" if the song has energy higher than 0.6 and danceability higher than 0.7
- Answer the following questions:
    - What is the average danceability of a Backstreet Boys song?
    - Which artist has the highest average popularity?
    - What is the average duration in seconds of mainstream (popularity > 50) and indie (popularity <= 50) songs?

## Data manipulation

In [1]:
import pandas as pd

In [19]:
songs_df = pd.read_csv('Data/songs.csv')

In [20]:
# Clean the duration column to only have integer values as total duration in seconds

songs_df['duration_sec'] = songs_df['duration_sec'].astype('int')
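The cast above works when every value is already numeric and present. As a hedged sketch only: if the raw column held stray strings or missing values (an assumption, since the contents of `songs.csv` are not shown here), `pd.to_numeric` with `errors="coerce"` and the nullable `Int64` dtype would avoid a failed cast. The sample values below are hypothetical.

```python
import pandas as pd

# Hypothetical sample values; the real songs.csv may not look like this.
raw = pd.Series(["318", "220.0", "n/a", None], name="duration_sec")

# Turn anything that is not a number into NaN instead of raising an error.
numeric = pd.to_numeric(raw, errors="coerce")

# Round to whole seconds, then use the nullable Int64 dtype so that
# missing values stay missing instead of breaking the integer cast.
cleaned = numeric.round().astype("Int64")

print(cleaned)
```

Rows that end up missing could then be dropped or inspected before the analysis, depending on how many there are.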

In [21]:
# Delete the songs with an empty instrumentalness column

songs_df.dropna(axis=0, subset=['instrumentalness'], inplace=True)

In [22]:
# Create a new field called "tag" and assign
#   - "energetic" if the song has energy higher than 0.7
#   - "positive" if the song has energy higher than 0.6 and danceability higher than 0.7

# Temporarily store (energy, danceability) pairs in 'tag'; they are mapped to labels in the next cells.
songs_df['tag'] = list(zip(songs_df['energy'].values, songs_df['danceability'].values))

In [23]:
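def tag_value(numbers):
    """Map an (energy, danceability) pair to a tag, or None if no rule matches."""
    energy, danceability = numbers
    # The danceability <= 0.7 check keeps the two tags mutually exclusive:
    # a song that meets both rules is tagged 'positive', not 'energetic'.
    if energy > 0.7 and danceability <= 0.7:
        return 'energetic'
    elif energy > 0.6 and danceability > 0.7:
        return 'positive'
    return None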

In [24]:
songs_df['tag'] = songs_df['tag'].apply(tag_value)
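The same tagging rules can also be written with boolean masks instead of `zip` and `apply`. This is a minimal sketch on a small hypothetical frame (`demo` and its values are made up for illustration), not a replacement for the cells above. Assigning "positive" last reproduces the precedence of `tag_value`.

```python
import pandas as pd

# Hypothetical rows used only to illustrate the rules.
demo = pd.DataFrame({
    "energy": [0.955, 0.65, 0.403, 0.9],
    "danceability": [0.511, 0.75, 0.31, 0.8],
})

# Start with no tag at all (object dtype so it can hold strings later).
demo["tag"] = pd.Series([None] * len(demo), index=demo.index, dtype="object")

# Rule 1: energy above 0.7.
demo.loc[demo["energy"] > 0.7, "tag"] = "energetic"

# Rule 2 is applied last, so it overrides rule 1 when both match.
is_positive = (demo["energy"] > 0.6) & (demo["danceability"] > 0.7)
demo.loc[is_positive, "tag"] = "positive"

print(demo)
```

The masks operate on whole columns at once, which avoids the temporary tuple column and tends to scale better on large frames.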

In [25]:
songs_df.head(50)

| Unnamed: 0 | artist_name | track_name | popularity | danceability | energy | key | loudness | mode | speechiness | acousticness | instrumentalness | liveness | valence | tempo | duration_sec | time_signature | tag |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | David Bowie | Space Oddity - 2015 Remaster | 73.0 | 0.31 | 0.403 | | -13.664 | 1 | 0.0326 | 0.0726 | 9.3e-05 | 0.139 | 0.466 | 134.48 | 318 | 4 | |
| 1 | Crimson Sun | Essence of Creation | 34.0 | 0.511 | 0.955 | 1.0 | -5.059 | 1 | 0.129 | 0.0004 | 9e-06 | 0.263 | 0.291 | 151.937 | 220 | 4 | energetic |
| 3 | Shawn Mendes | Wonder | 80.0 | 0.333 | 0.637 | 1.0 | -4.904 | 0 | 0.0581 | 0.131 | 1.8e-05 | 0.149 | 0.132 | 139.898 | 172 | 4 | |
| 5 | Paul McCartney | Pretty Boys (feat. Khruangbin) | 56.0 | 0.696 | 0.979 | 11.0 | -5.338 | 0 | 0.0445 | 0.00244 | 0.768 | 0.116 | 0.962 | 101.021 | 348 | 4 | energetic |
| 6 | The Flaming Lips | She Don't Use Jelly | 60.0 | 0.33 | 0.556 | 7.0 | -11.494 | 1 | 0.0796 | 0.207 | 4.4e-05 | 0.315 | 0.506 | 173.828 | 222 | 4 | |
| 7 | The Vanities | Wasted All My Days | 23.0 | 0.43 | 0.942 | 10.0 | -3.008 | 1 | 0.0913 | 0.000166 | 1.9e-05 | 0.103 | 0.557 | 91.483 | 140 | 4 | energetic |
| 8 | pizzagirl | car freshener aftershave | 36.0 | 0.603 | 0.724 | 7.0 | -6.843 | 1 | 0.0265 | 0.00135 | 1e-05 | 0.184 | 0.387 | 129.984 | 243 | 4 | energetic |
| 9 | Jane's Addiction | Been Caught Stealing | 57.0 | 0.639 | 0.929 | | -4.762 | 1 | 0.212 | 0.00326 | 0.0501 | 0.242 | 0.688 | 103.758 | 214 | 4 | energetic |
| 10 | The Mönic | Just Mad | 29.0 | 0.324 | 0.798 | 2.0 | -8.417 | 1 | 0.031 | 0.0518 | 6e-06 | 0.256 | 0.534 | 168.132 | 172 | 4 | energetic |
| 13 | Foo Fighters | Shame Shame | 62.0 | 0.652 | 0.864 | 7.0 | -4.108 | 1 | 0.0325 | 0.00138 | 0.0433 | 0.0371 | 0.38 | 122.026 | 257 | 4 | energetic |


## Questions

In [26]:
# What is the average danceability of a Backstreet Boys song?

bsb_df = songs_df[songs_df['artist_name']=='Backstreet Boys']

In [10]:
answer = round(bsb_df['danceability'].mean(),3)
print(f"The average danceability of a Backstreet Boys song is {answer}")

The average danceability of a Backstreet Boys song is 0.483


In [11]:
# What is the artist with the highest average popularity?

In [12]:
all_averages = songs_df.groupby('artist_name')['popularity'].mean().sort_values(ascending=False)

answer_2 = all_averages.index.to_list()[0]
print(f"The artist with the highest average popularity is {answer_2}")

The artist with the highest average popularity is Doja Cat
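Sorting every artist just to read the first index works, but `Series.idxmax` returns the label of the maximum directly. A small sketch with hypothetical artists and values (`demo`, "A", "B", "C" are illustrative only):

```python
import pandas as pd

# Hypothetical data: artist A averages 70, B averages 75, C averages 40.
demo = pd.DataFrame({
    "artist_name": ["A", "A", "B", "C"],
    "popularity": [60.0, 80.0, 75.0, 40.0],
})

# Average popularity per artist.
avg_popularity = demo.groupby("artist_name")["popularity"].mean()

# idxmax gives the index label (the artist) of the largest average.
top_artist = avg_popularity.idxmax()
top_value = avg_popularity.max()

print(f"The artist with the highest average popularity is {top_artist} ({top_value})")
```

If several artists share the top average, `idxmax` reports only the first one, so a tie check may be worth adding on real data.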


In [13]:
# What is the average duration in seconds of mainstream (popularity > 50) and indie (popularity <= 50) songs?

In [14]:
mainstream_df = songs_df[songs_df['popularity']>50]

In [15]:
answer_mainstream = round(mainstream_df['duration_sec'].mean())
print(f"The average duration of mainstream songs is {answer_mainstream} seconds")

The average duration of mainstream songs is 214.0 seconds


In [16]:
indie_df = songs_df[songs_df['popularity']<=50]

In [18]:
answer_indie = round(indie_df['duration_sec'].mean())
print(f"The average duration of indie songs is {answer_indie} seconds")

The average duration of indie songs is 219.0 seconds
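Both averages can also be computed in one pass by grouping on a mainstream/indie label. This is a sketch on hypothetical rows (`demo` and `segment` are names introduced here), using the same threshold of 50 as the cells above.

```python
import pandas as pd

# Hypothetical rows; popularity of exactly 50 counts as indie.
demo = pd.DataFrame({
    "popularity": [73.0, 34.0, 80.0, 50.0],
    "duration_sec": [318, 220, 172, 140],
})

# Label each song by comparing its popularity against the threshold.
segment = (demo["popularity"] > 50).map({True: "mainstream", False: "indie"})

# One groupby yields the mean duration for both segments.
avg_duration = demo.groupby(segment)["duration_sec"].mean()

for name, value in avg_duration.items():
    print(f"The average duration of {name} songs is {round(value)} seconds")
```

One difference to keep in mind: a row with missing popularity compares as `False` and would be labelled indie here, whereas the two separate filters above leave such rows out of both groups.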
