# Who are the top Billboard artists ever? Let's find that out and more! #
If you want to ask me questions or make comments on the code, do so here or on Twitch in real time! https://twitch.tv/MitchsWorkshop  

First, we import pandas and plotly:

In [None]:
import pandas as pd
import plotly.graph_objects as go
from plotly.offline import iplot

Now, let's import our data, convert our `date` column to `datetime` type, and create a `year` column from it. We will use `year` later to color the bars in our graphs by the decade in which each song reached number 1.

In [None]:
df = pd.read_csv("../input/billboard-the-hot-100-songs/charts.csv")

df["date"] = pd.to_datetime(df["date"])
df["year"] = df["date"].dt.year

df.info()
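
As a quick sanity check of the conversion above, here is a minimal sketch on a made-up three-row frame. The dates are invented for illustration and are not read from `charts.csv`.

```python
import pandas as pd

# toy data, invented for illustration only
toy = pd.DataFrame({"date": ["1964-02-01", "1997-10-11", "2021-11-06"]})

# parse the strings into datetime values, then pull out the year
toy["date"] = pd.to_datetime(toy["date"])
toy["year"] = toy["date"].dt.year

print(toy.dtypes)
print(toy["year"].tolist())
```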

### Removing Duplicate Songs ###
Some songs stay at number 1 for several weeks, so they appear in the data more than once. To count each song only once, I will keep just the rows whose `peak-rank` is `1`, create a `groupby` object by `song`, and take only the first instance of each song. This gives one entry per song for the first week it appeared at number one. Then, we will `sort_values` by ascending `date` to get a view of our data chronologically.

In [None]:
is_number1 = df["peak-rank"] == 1
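# sort by date before grouping so that .first() really picks each song's earliest week at number 1
number1_no_duplicates = df[is_number1].sort_values("date").groupby("song").first().sort_values("date", ascending = True)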
number1_no_duplicates.sample(5)
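
One thing worth keeping in mind: `groupby(...).first()` returns the first row of each group in whatever order the rows arrive, so sorting by date beforehand matters if the file is not already oldest-first. Here is a minimal sketch on toy data (all songs, artists, and dates are invented for illustration):

```python
import pandas as pd

# toy chart rows, newest first; every value here is made up
toy = pd.DataFrame({
    "date": pd.to_datetime(["2000-01-15", "2000-01-08", "2000-01-01", "2000-01-08"]),
    "song": ["Song A", "Song A", "Song A", "Song B"],
    "artist": ["Artist X", "Artist X", "Artist X", "Artist Y"],
    "peak-rank": [1, 1, 3, 1],
})

is_number1 = toy["peak-rank"] == 1
first_weeks = (
    toy[is_number1]
    .sort_values("date")   # oldest rows first, so .first() means "earliest"
    .groupby("song")
    .first()
)
print(first_weeks)
```

Grouping by `song` alone also merges different songs that happen to share a title. If that turns out to matter, grouping by both `song` and `artist` is one possible way around it.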

Now that we have our data sorted and clean, let's define a function called `get_decade` and `apply` it to the `year` column of our dataframe to populate a `decade` column of strings (1980's, 2010's, etc.).

In [None]:
def get_decade(year):
    year = str(year)
    decade = year[0:3] + "0's"
    return decade

number1_no_duplicates["decade"] = number1_no_duplicates["year"].apply(get_decade)
number1_no_duplicates.sample(5)
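
The string slicing above works for four-digit years. As an alternative sketch, the same label can be built with integer arithmetic. The name `get_decade_arith` is hypothetical and not part of the notebook.

```python
def get_decade_arith(year):
    """Return a label such as "1980's" using integer arithmetic."""
    # floor-divide by 10 to drop the last digit, then multiply back up
    return f"{int(year) // 10 * 10}'s"

print(get_decade_arith(1987))
print(get_decade_arith(2010))
```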

## Artists with the most #1 Billboard hits by decade! ##
Everything looks to be in order. Let's graph these suckers! To do so, we need a `Series` object that holds the `value_counts` of each `artist` and `decade` pair in our newly prepped data.

In [None]:
# change this to see more or fewer artists
top_num = 15

graph_data = number1_no_duplicates[["artist", "decade"]].value_counts().head(top_num)
graph_data.sample(5)
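
Calling `value_counts()` on two columns returns a `Series` with a two-level index, which is why the plotting code later uses `get_level_values`. A minimal sketch on invented data:

```python
import pandas as pd

# toy data, invented for illustration only
toy = pd.DataFrame({
    "artist": ["Artist X", "Artist X", "Artist Y", "Artist X"],
    "decade": ["1980's", "1980's", "1990's", "1990's"],
})

# one count per (artist, decade) pair, largest count first
counts = toy[["artist", "decade"]].value_counts()
print(counts)

# level 0 holds the artists, level 1 holds the decades
artists = counts.index.get_level_values(0)
decades = counts.index.get_level_values(1)
print(sorted(decades.unique()))
```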

Now let's build the final product. Since plotly needs a separate `trace` for each color group (especially since we're using a legend), we are going to perform a little trick. Using a `for` loop that iterates through every unique decade in our graphed data, we can create a list containing one `trace` per decade and use that list as the `data` parameter for our graph. We can fix the order of the bars afterwards when we create our final `figure`.  
  
First, let's create some default values for plot aesthetics and use them for our future plots.

In [None]:
# default plot aesthetics
default_layout = {
    # display title with bigger font
    "title": dict(
        font = {"size": 30}
    ),
    
    # change plot font
    "font" : dict(
        family = "arial",
        size = 16,
        color = "white"
    ),
    
    # plot colors and size
    "paper_bgcolor" : "#3d3d3d",
    "plot_bgcolor" : "rgba(0,0,0,0)",
    "height" : 700,
    "yaxis" : {"showgrid": False},
    
    # move the legend
    "legend" : dict(
        orientation = "v",
        x=.7,
        y=.93,
        traceorder="normal",
    )
}

In [None]:
# ordered list of unique decades in our graph data
decade_index = sorted(graph_data.index.get_level_values(1).unique())

# dict defining decades by color
decade_colors = {
    "1950's" : "#cd4b42",
    "1960's" : "#cd7e42",
    "1970's" : "#b13e55",
    "1980's" : "#42cd56",
    "1990's" : "#42b8cd",
    "2000's" : "#606faf",
    "2010's" : "#9677bb",
    "2020's" : "#b771ab"
}

# will become our final list of traces
data = []

for decade in decade_index:
    search_filter = graph_data.index.get_level_values(1) == decade
    subset = graph_data[search_filter]
    
    # create the bars for the decade
    trace = go.Bar(
        x = subset.index.get_level_values(0),
        y = subset,

        # annotate text above the bars
        text = subset,
        textposition = "outside",
        
        # change bar color based on custom dict decade_colors
        marker = {"color" : decade_colors[decade]},
        
        name = decade
    )
    # add trace to the data list
    data.append(trace)

fig = go.Figure(data = data, layout = default_layout)

# updates to the default layout
fig.update_layout(
    {
    "title": dict(
            text = f"Top {top_num} Artists by Number of #1 Billboard Songs"
        ),
    "xaxis": dict(
            categoryorder = 'array',
            categoryarray = graph_data.index.get_level_values(0)
        ),
    "legend": dict(
            orientation = "v",
            x=.7,
            y=.93,
            traceorder="normal"
        )
    }
)

# display the plot!
iplot(fig)
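
Since the same loop is needed again for the next plot, it could be wrapped in a small helper. The sketch below is one possible way to do that. The name `build_decade_traces` and the toy data are hypothetical, and it returns plain dictionaries rather than `go.Bar` objects so that it runs without plotly installed.

```python
import pandas as pd

def build_decade_traces(counts, colors):
    """Build one bar-trace dict per decade from a (name, decade)-indexed Series."""
    decades = counts.index.get_level_values(1)
    traces = []
    for decade in sorted(decades.unique()):
        # keep only the rows that belong to this decade
        subset = counts[decades == decade]
        traces.append({
            "type": "bar",
            "x": list(subset.index.get_level_values(0)),
            "y": subset.tolist(),
            "text": subset.tolist(),
            "textposition": "outside",
            "marker": {"color": colors[decade]},
            "name": decade,
        })
    return traces

# toy counts and colors, invented for illustration only
toy_counts = pd.Series(
    [3, 2, 1],
    index=pd.MultiIndex.from_tuples(
        [("Artist X", "1980's"), ("Artist Y", "1990's"), ("Artist Z", "1980's")]
    ),
)
toy_colors = {"1980's": "#42cd56", "1990's": "#42b8cd"}

traces = build_decade_traces(toy_counts, toy_colors)
print([t["name"] for t in traces])
```

plotly accepts trace dictionaries with a `type` key in place of trace objects, so a list like this should be usable as `go.Figure(data = traces, layout = default_layout)`.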

## Which songs spent the most total weeks at number 1? ##
To find out, we can get every week's number 1 song by keeping only the rows of the dataframe where the `rank` column equals `1`. After that, we can call `value_counts()` on the `song` column of the filtered dataframe, and we will have our answer! Then it's a simple matter of plotting!

In [None]:
# keep only each week's number 1 song
# .copy() lets us add and edit columns later without a SettingWithCopyWarning
all_number1 = df[df["rank"]==1].copy()
all_number1.head()

Now let's re-define our function to add a `decade` column just to remember the syntax, and use it on our new subset of our data so we can tint our next plot by decade like we did the first time.

In [None]:
def get_decade(year):
    year = str(year)
    decade = year[0:3] + "0's"
    return decade

all_number1["decade"] = all_number1["year"].apply(get_decade)
all_number1.head()

Now that we've separated out the number 1 songs, we can see how many times each appears using `value_counts()`.

In [None]:
# after building the plot, I saw that a title of one of the songs was too long and made the chart ugly, so I crop it here
is_long_song = all_number1["song"]=="Candle In The Wind 1997/Something About The Way You Look Tonight"
all_number1.loc[is_long_song, "song"] = "Candle In The Wind"

top_songs = all_number1[["song", "decade"]].value_counts()

top_songs.head()
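
To see the filter, rename, and count steps in isolation, here is a minimal sketch on invented rows. The song titles are made up, and `.copy()` is used so that the edits do not touch the original frame.

```python
import pandas as pd

# toy weekly chart rows, invented for illustration only
toy = pd.DataFrame({
    "rank": [1, 2, 1, 1, 1],
    "song": ["Song A", "Song B", "Song A",
             "A Very Long Song Title/With A B-Side", "Song A"],
})

# keep the number 1 rows as an independent copy
number1 = toy[toy["rank"] == 1].copy()

# shorten one long title using a boolean mask with .loc
is_long = number1["song"] == "A Very Long Song Title/With A B-Side"
number1.loc[is_long, "song"] = "A Very Long Song Title"

# each row is one week at number 1, so the count is the number of weeks
weeks_at_top = number1["song"].value_counts()
print(weeks_at_top)
```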

Our data is ready to plot! Using the same logic as last time we made our plot, we will loop through the decades and apply a different color each time.

In [None]:
# change this number to see more or fewer songs
num_songs = 15
graph_data = top_songs.head(num_songs)

decade_index = sorted(graph_data.index.get_level_values(1).unique())
data = []

for decade in decade_index:
    search_filter = graph_data.index.get_level_values(1) == decade
    subset = graph_data[search_filter]
    
    # create the bars for the decade
    trace = go.Bar(
        x = subset.index.get_level_values(0),
        y = subset,

        # annotate text above the bars
        text = subset,
        textposition = "outside",
        
        # change bar color based on custom dict decade_colors
        marker = {"color" : decade_colors[decade]},
        name = decade,
        
        # prevent the text annotations from being cropped by the top of the plot
        cliponaxis = False
    )
    # add trace to the data list
    data.append(trace)

fig = go.Figure(data = data, layout = default_layout)

# updates to the default layout
fig.update_layout(
    {
    "title": dict(
            text = f"Top {num_songs} Songs by Total Weeks as Billboard #1"
        ),
    "xaxis": dict(
            categoryorder = 'array',
            categoryarray = graph_data.index.get_level_values(0)
        ),
    "legend": dict(
            orientation = "v",
            x=.9,
            y=1,
            traceorder="normal"
        )
    }
)

# display the plot!
iplot(fig)

Interestingly enough, we can see that songs from more recent decades (1990's and beyond) spend far more weeks at number 1 than older songs. My instinct says that this is because of the rise of digital music, which emphasizes singles over albums. The Beatles, for example, might have released an album with multiple number 1 songs, each overtaking the other during the life cycle of the album. Unfortunately, this dataset does not contain album names, so this remains speculation. If you agree or disagree with my rationale, let me know! Thanks for reading!

# Connect with me! #  
### Watch me build these types of things live on Twitch! https://twitch.tv/Mitchsworkshop ###  
### Twitter is where these charts get posted! https://twitter.com/MitchsWorkshop ###  
### Join the Discord to give or receive programming advice. https://discord.gg/2hcWnTF ###