# About this notebook

In this notebook I want to show non-technical readers that building a dashboard is not like working in a slide or spreadsheet program. Here, in the “wild world of Python” ~(the friendliest of all languages)~, you have to consider three questions:

  1. Where will my work be shown: app, blog, paper, or website?
  2. How much “visual space” will I have?
  3. Do I need the plots to be interactive, and can I make them so?

These questions get tricky when you consider only one of them in isolation. Why? Below I show the results you can get from three ways of building a dashboard. Two of them need a programming language (Python) and a friendly environment or IDE (Jupyter Notebook). The third, Tableau, is a powerful tool for Exploratory Data Analysis and Business Intelligence, but it has one small drawback (as far as I know for now).

But first, let's bring in the code from the scripts. Until I work out how to load my scripts here, the code cells are kept hidden in this notebook. The code is meant to live in separate scripts, to keep the main ideas apart from the details of how each library is used.
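One common way to load a local script into a notebook is to put its folder on `sys.path` and import it by name. The sketch below is only an illustration under assumptions: the module name `data_filters_demo` and the temporary folder are hypothetical stand-ins, not the real `graph.data_filters` script.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Stand-in for a real scripts folder: a temporary directory holding one tiny
# module. In practice the folder would already exist next to the notebook.
scripts_dir = Path(tempfile.mkdtemp())
(scripts_dir / "data_filters_demo.py").write_text(
    "def count_if(array, value):\n"
    "    return sum(1 for element in array if element == value)\n"
)

# Make the folder importable, then import the module by its file name.
sys.path.insert(0, str(scripts_dir))
data_filters_demo = importlib.import_module("data_filters_demo")

print(data_filters_demo.count_if(["a", "b", "a"], "a"))  # prints 2
```

With a real folder, the temporary-directory lines go away and only the `sys.path` line and the import remain.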


## Simple data filter functions

In [None]:
import pandas as pd

def count_if(array, value):
    count = 0
    for element in array:
        if element == value:
            count += 1
    return count

def array_count(elements_to_count, original_array):
    # Occurrences of each requested element in original_array, in request order
    return [original_array.count(element) for element in elements_to_count]

def reverse(array):
    # Return a reversed copy as a list
    return list(reversed(array))

def rank_set_list(df, rank, info, set_list=True, array=True):
    ranking = df[df['Rank'] == rank]
    raw_info_in_rank = ranking[info].to_list()

    if set_list:
        # Unique values, with missing entries (None) dropped
        set_info = list(set(raw_info_in_rank))
        if None in set_info:
            set_info.remove(None)

        if array:
            return set_info, raw_info_in_rank
        return set_info

    return raw_info_in_rank

def rank_info(df, rank, column_info, info_name='Albums'):
    set_info, raw_info_in_rank = rank_set_list(df, rank, column_info)
    info_count = array_count(elements_to_count=set_info, original_array=raw_info_in_rank)
    info = {info_name: set_info,
            'Counts' : info_count}
    df_info = pd.DataFrame(info)
    df_info = df_info.sort_values(by=['Counts'])

    return df_info
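
The counting done by `rank_info` can also be expressed with pandas' own `value_counts`. This is a minimal sketch on toy data, not a replacement for the functions above: the helper name `rank_counts` and the toy rows are made up for illustration, and the order of tied counts may differ from `rank_info`.

```python
import pandas as pd

# Toy data; the column names mirror the ones used in this notebook.
toy_df = pd.DataFrame({
    "Rank": [1, 1, 1, 2, 1],
    "Authors/Company": ["A", "B", "A", "C", None],
})

def rank_counts(df, rank, column, info_name="Albums"):
    """Count how often each value of `column` appears at the given rank."""
    counts = (
        df.loc[df["Rank"] == rank, column]
        .value_counts()                # missing values are dropped by default
        .sort_values(kind="stable")    # ascending, like rank_info
    )
    # Name the label column and the count column explicitly.
    return counts.rename_axis(info_name).reset_index(name="Counts")

result = rank_counts(toy_df, rank=1, column="Authors/Company", info_name="Artist")
print(result)
```

The result has the same two columns that `rank_info` produces, the label column and `Counts`, so it could feed the same plotting code.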

In [None]:
file_dir = '../input/amazon-mexico-top-50-best-sellers/mexico/parquet/mx-music.parquet'

## Matplotlib code

In [None]:
# Personal Build
# from graph.data_filters import *

# Kernel Graphics
import matplotlib.pyplot as plt
from matplotlib import dates as mdates
import matplotlib.patches as mpatches

# Data Management
import pandas as pd

def plot_rank_barh(x, y, rank):
    plt.figure(figsize=(8,8))
    plt.barh(y=y, width=x)

    for i, v in enumerate(x):
        plt.text(v+1, i, str(v), fontweight='bold')

    xlabel = f'Times shown at #{rank} position' 
    plt.xlabel(xlabel)

    header = f'Times an album hit the #{rank} position' 
    plt.title(label=header)

    plt.show()

df = pd.read_parquet(file_dir)

def barh_and_scatters(pivot_column, df_info=df, top_rank=5, extracted_info_name='Albums', n_plots=1):

    top = top_rank + 1 
    info = pivot_column
    df = df_info

    for rank in range(1, top):
        
        #Extract Info
        
        extracted_df = rank_info(df, rank, info, extracted_info_name)
        albums = extracted_df[extracted_info_name].to_list()
        info_count = extracted_df['Counts'].to_list()

        #Plot
        plot_rank_barh(info_count, albums, rank)

        # scatter
        rank_df = df[df['Rank'] == rank]
        x = rank_df['Stars']
        y = rank_df['Reviews']
        
        groups = rank_df.groupby(info)        
        markers = ['o', 'x', 'v', 'p']
        fig, ax = plt.subplots()
        m = 0
        p = 0
        

        for name, group in groups:
            if (m % 10 == 0) and (m > 0):
                p += 1
            if p == 4: p = 0

            ax.plot(group.Stars , group.Reviews, marker=markers[p], linestyle='', label=name)
            m += 1

        plt.legend(loc='best', bbox_to_anchor=(1,1), ncol=2)
        plt.xlabel("Stars")
        plt.ylabel("Count of Reviews")

def barh_plus_scatter(pivot_column, df_info=df, top_rank=5, extracted_info_name='Albums', n_plots=1, w_fig=8, h_fig=8):
    top = top_rank + 1 
    info = pivot_column
    df = df_info

    for rank in range(1, top):
        
        #Extract Info
        
        extracted_df = rank_info(df, rank, info, extracted_info_name)
        albums = extracted_df[extracted_info_name].to_list()
        info_count = extracted_df['Counts'].to_list()

        #Plot
        fig, (ax1, ax2) = plt.subplots(nrows=1, ncols=2,figsize=(w_fig, h_fig))
        title = f' Top #{rank} Info' 
        fig.suptitle(title)
        
        # plt.figure(figsize=(w_fig,h_fig))

        ax1.barh(y=albums, width=info_count)

        for i, v in enumerate(info_count):
            ax1.text(v+1, i, str(v))

        xlabel = f'Times shown at #{rank} position' 
        ax1.set_xlabel(xlabel)

        bar_header = f'Times an {extracted_info_name} hit the #{rank} position' 
        ax1.set_title(label=bar_header)

        # scatter
        rank_df = df[df['Rank'] == rank]
        x = rank_df['Stars']
        y = rank_df['Reviews']
        
        ax2.set_xlabel('Stars')
        ax2.set_title(f'Reviews vs Stars per {extracted_info_name} at the #{rank} position')
        ax2.set_ylabel('Reviews')
        groups = rank_df.groupby(info)        
        markers = ['o', 'x', 'v', 'p']
        
        m = 0
        p = 0
        for name, group in groups:
            if (m % 10 == 0) and (m > 0):
                p += 1
            if p == 4: p = 0
            ax2.plot(group.Stars , group.Reviews, marker=markers[p], linestyle='', label=name)
            
            m += 1

        ax2.legend(loc='best', bbox_to_anchor=(1,1), ncol=2)
        ax2.set_xlim([0, 5])
        plt.show()
        

## Plotly

In [None]:
# Personal Build
# from graph.data_filters import *

# Kernel Graphics
import plotly.graph_objects as go
from plotly.subplots import make_subplots
import plotly.express as px

# Data Management
import pandas as pd

main_colors = px.colors.qualitative.Bold
df = pd.read_parquet(file_dir)
# extracted_df = rank_info(df, 1, artist, 'Artist/Band')
# labels = extracted_df['Artist/Band'].to_list()
# labels_count = extracted_df['Counts'].to_list()
# labels = reverse(labels)
album = 'Product Names'
artist = 'Authors/Company'

def plotly_dashboard(pivot_column, rank=1, df_info=df, top_rank=None, extracted_info_name='Albums'):    
    info = pivot_column
    df = df_info
    main_colors = px.colors.qualitative.Bold
    
    if top_rank:
        top = top_rank + 1 
        for rank in range(rank, top):
            plotly_figures(info, df, main_colors, rank, extracted_info_name)
    
    else:
        plotly_figures(info, df, main_colors, rank, extracted_info_name)

def plotly_figures(info, df, main_colors, rank, extracted_info_name):
    extracted_df = rank_info(df, rank, info, extracted_info_name)
    labels = extracted_df[extracted_info_name].to_list()
    labels_count = extracted_df['Counts'].to_list()
    row_1_barh_and_lines(rank, extracted_df, info, extracted_info_name, labels, labels_count, original_df=df, line_colors=main_colors)

    #FIGURE 2
    labels = reverse(labels)
    row_2_rank_boxes(rank, info, extracted_info_name, labels, original_df=df, box_colors=main_colors)

    #FIGURE 3
    #Labels are reversed in order to match with the other graphs
    row_3_boxes_and_scatter(rank, info, labels, original_df=df, scatter_colors=main_colors)

def row_1_barh_and_lines(rank, extracted_df, info, extracted_info_name, labels, labels_count, line_colors, original_df=df):
    df = original_df
    bar_header = f'Times at the top #{rank}, by {extracted_info_name}'
    line_header = f'{extracted_info_name} Rank History'
    
    
    fig = make_subplots(
        rows=1, cols=2,
        specs=[[{'type': 'bar'}, {'type': 'scatter'}]],
        subplot_titles=(bar_header, line_header)
        )

    #Horizontal Bars
    fig.add_trace(
        go.Bar(
            x = labels_count, y = labels, orientation = 'h',
            showlegend = False,
            text = labels,
                hovertemplate=
                "<b>%{text}</b><br><br>" +
                "Times: %{x}<br>" 
            ), row = 1, col = 1)
    fig.update_traces(marker_color='#F58518')
    fig.update_yaxes(title_text=extracted_info_name, row=1, col=1)
    # fig.update_xaxes(title_text=f'Times hited the top#{rank}', row=1, col=1)
    

    #lines
    labels = reverse(labels)
    color = 0
    for label in labels:
        best_seller = df[df[info] == label]
        fig.add_trace(
            go.Scatter(
                x = best_seller.time, y = best_seller.Rank, mode = 'lines', name = label, line=dict(color = line_colors[color]),
                text = best_seller[info],
                hovertemplate=
                "<b>%{text}</b><br><br>" +
                "Time: %{x}<br>" +
                "Rank Position: #%{y:,.0f}<br>"
                ), row=1, col=2)
        if color < (len(line_colors)-1):
            color += 1
        else:
            color = 0
    # fig.update_layout(yaxis=dict(autorange = "reversed"), row=1, col=2)
    fig.update_yaxes(title_text='Rank Position', range=[1,50], row=1, col=2, autorange="reversed")
    fig.update_xaxes(title_text='Date' , row=1, col=2)


    fig.update_layout(
        title = {
            'text' : f"Amazon Music sales on the website. Top #{rank} Info",
            'x' : 0.5,
            'xanchor' : 'center',
            'yanchor' : 'top',
            'y' : 0.9
        },
        title_font_size = 30,
        margin = dict(r=30, t=100, b=0, l=10),
        height = 400,
    )
    
    fig.show()

def row_2_rank_boxes(rank, info,  extracted_info_name, labels, box_colors, original_df=df):
    df = original_df
    ranking_box_header = f'Ranking distribution of the {extracted_info_name} at Top #{rank}:'  

    fig2 = make_subplots(
        rows=1, cols=1,
        specs=[[{'type': 'Box'}]]
        )
    color = 0
    for label in labels:
        info_df = df[df[info] == label]
        fig2.add_trace(
                go.Box(
                    y = info_df.Rank, showlegend = False,
                    name = label,
                    marker_color = box_colors[color]
                ))
        if color < (len(box_colors)-1):
            color += 1
        else:
            color = 0

    fig2.update_layout(
        margin=dict(r=10, t=40, b=50, l=10),
        height=300,
        title_text = ranking_box_header
    )
    fig2.update_yaxes(title_text='Rank Position', range=[1,50], autorange="reversed", row=1, col=1)
    fig2.show()

def row_3_boxes_and_scatter(rank, info, labels, scatter_colors, original_df=df):
    df = original_df
    box_header = f'Stats of the #{rank} position'
    scatter_header = f'Reviews vs Stars of the #{rank} position'

    fig3 = make_subplots(
        rows=1, cols=5,
        column_widths=[0.19, 0.19, 0.19, 0.005, 0.425],
        specs=[[{'type': 'box'}, {'type': 'box'}, {'type': 'box'}, None, {'type': 'scatter'}]],
        subplot_titles=(None, box_header, None, scatter_header, None)
    )
    
    #BOXES INFO
    top_5 = df[(df['Rank'] >= 1) & (df['Rank']<=5)]
    rank_df = df[df['Rank'] == rank]
    box_info = ['Stars', 'Reviews', 'Price_std_or_min']
    colors = ['#FECB52', 'gold', 'crimson', 'red','green','lightseagreen']

    #Box
    b = 1
    c = 0
    for i in box_info:
        fig3.add_trace(
            go.Box(
                y = top_5[i], showlegend = False,
                name = f'Tops 1-5',
                marker_color = colors[c],
            ), row = 1, col = b)
        
        c += 1

        fig3.add_trace(
            go.Box(
                y = rank_df[i], showlegend = False,
                name = f'Top {rank}',
                marker_color = colors[c],
            ),
            row = 1, col = b)
        if i == 'Price_std_or_min':
            i = 'Price'
        fig3.update_xaxes(title_text=i, row=1, col=b)
        b += 1
        c += 1 

    #Scatter
    groups = rank_df.groupby(info)
    color = 0 
    for label in labels:
        best_seller = df[df[info] == label]
        fig3.add_trace(
            go.Scatter(
                x = best_seller.Stars, y = best_seller.Reviews, mode = 'markers', name = label, marker=dict(color = scatter_colors[color]),
                text = best_seller[artist],
                hovertemplate=
                "<b>%{text}</b><br><br>" +
                "Stars: %{x}<br>" +
                "Reviews: %{y:,.0f}<br>"
                ), row = 1, col = 5)
        if color < (len(scatter_colors)-1):
            color += 1
        else:
            color = 0

    fig3.update_yaxes(title_text="Reviews", row=1, col=5)
    fig3.update_xaxes(title_text="Stars",range=[0,5], row=1, col=5)
    fig3.update_layout(
        margin=dict(r=10, t=40, b=50, l=10),
        height=300,
    )

    fig3.show()


# Building graphs

Matplotlib and Seaborn are two of the most popular plotting libraries, and nearly every data analyst learns them.

Let us start with two of the most used plots in this area: count bars (bar charts of counts, close cousins of histograms) and scatter plots. The bars show how many times an artist or group hit first place on the board; the scatter plot shows whether there is any relationship between the star rating (x-axis) and the number of reviews (y-axis).

In [None]:
artist = 'Authors/Company'
barh_and_scatters(artist, top_rank=1)

What?! Gloria Trevi beats Katy Perry, Lady Gaga, and Britney Spears?! I don’t know about you, but I find it funny to see how loyal the fans of “La Oreja de Van Gogh” still are today. The scatter plot, though, gives me some trouble to read. Still, I can see the main point: there is a stars/reviews wall, and past 500 reviews it seems impossible to keep 5 stars.

This could be used on a web page or blog: I can take any of these pictures, save it as a .jpg or .png file, and pass it to the front end. But how would they look if I needed them on paper? Let’s see.
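
Saving a figure as an image file for the front end can be sketched with Matplotlib's `savefig`. The values, the file name, and the temporary output folder below are assumptions made up for this example, not part of the dataset or the notebook's scripts.

```python
import tempfile
from pathlib import Path

import matplotlib.pyplot as plt

# Toy values, not taken from the dataset.
stars = [4.2, 4.5, 4.8, 5.0]
reviews = [900, 650, 300, 40]

fig, ax = plt.subplots(figsize=(6, 4))
ax.scatter(stars, reviews)
ax.set_xlabel("Stars")
ax.set_ylabel("Count of Reviews")

# Hypothetical output location; any writable folder works.
out_path = Path(tempfile.mkdtemp()) / "reviews_vs_stars.png"

# dpi sets the resolution; bbox_inches="tight" trims the surrounding whitespace.
fig.savefig(out_path, dpi=150, bbox_inches="tight")
plt.close(fig)
```

The file extension chooses the format, so the same call with a different extension should produce another image type, provided the installed Matplotlib supports it.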

In [None]:
barh_plus_scatter(artist, extracted_info_name='Artist' , top_rank=2, w_fig=20, h_fig=8)

At first sight, on a 24” screen, this looks quite right, because we don’t have many labels. But what happens when there are a lot of them? To find out, we are going to request the first three positions, which is closer to a “real” case.

In [None]:
barh_plus_scatter(album, top_rank=3, w_fig=20, h_fig=8)

As you can see, this is not the best choice. Do not worry: there are ways to reposition the legend around the plots, but the problem will remain. Formatting plots is not the objective of this notebook, though; we are here to show a solution for a web page or blog.
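
For reference, one way to reposition a crowded legend is to anchor it below the axes and spread it over several columns. This is only a sketch on made-up labels; it tidies the layout, but it does not make a long list of labels any shorter.

```python
import matplotlib.pyplot as plt

# Toy data: 12 made-up labels with one point each.
labels = [f"Album {i}" for i in range(1, 13)]

fig, ax = plt.subplots(figsize=(8, 5))
for i, label in enumerate(labels):
    ax.plot([3 + (i % 5) * 0.4], [100 * (i + 1)],
            marker="o", linestyle="", label=label)

ax.set_xlabel("Stars")
ax.set_ylabel("Reviews")

# Anchor the legend's top-center point below the axes, in four columns.
legend = ax.legend(loc="upper center", bbox_to_anchor=(0.5, -0.15), ncol=4)

# Leave room at the bottom of the figure so the legend is not clipped.
fig.subplots_adjust(bottom=0.3)
```

Here `bbox_to_anchor` is given in axes coordinates, so a negative y value places the legend under the plot area.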

# Plotly and its interactive plots


[Plotly](https://plotly.com/python) is a graphing library, also available for R and JavaScript. From the Plotly page:
>[…]makes interactive, publication-quality graphs. Examples of how to make line plots, scatter plots, area charts, bar charts, error bars, box plots, histograms, heatmaps, subplots, multiple-axes, polar charts, and bubble charts.
Plotly.py is free and open source and you can view the source, report issues, or contribute on GitHub.

Taking advantage of the interactive graphs, you can put a bunch of them together on a dashboard and explore the data behind them. If you haven’t tried one before, try these:
 - Hover over an artist's bar and see how many times he/she/they hit the first position.
 - Double-click that artist's or band's name in the line plot's legend to see its history.
 - Look at the box plot (second row) and check each artist's typical position. Yes, it can be read as: "La Oreja de Van Gogh" has the best median ranking of all (the value that splits its data in half), second place. Can you tell which artists have a median at the 4th position?
 - What is the difference between the median price of the top 1 versus the top 5?

In [None]:
artist = 'Authors/Company'
plotly_dashboard(artist, rank=1, extracted_info_name="Artist/Band")

Thanks for making it this far! The best part comes now. You can imagine the hours it took me to learn how to code these graphs just by checking my old code in the _music graphs notebook.ipynb_. But what would you say if you could build them in minutes? And even better, without a notebook! Just on a web page! Like a blog! Well, not quite.

Tableau is powerful software designed for building great interactive dashboards over large amounts of data. You can share your dashboards with others and make them public [here.](https://public.tableau.com/profile/edward5144#!/vizhome/AmazonBestSellers_16116180017330/AmazonMexicoTop50BestSellersMusic) Keep in mind that Tableau is built for Business Intelligence and does not quite fit analyses like the one made here. The main reason I say this is that, in the data filters, I wrote a special function that filters the artists who reached a rank between 1 and the selected one. Doing the same in Tableau implies making a query for every request, which means duplicating the data. So on the Tableau page you will see all 50 rank positions at once.
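
To see why one query per request duplicates data, here is a rough sketch under one possible reading of that filter: for every selected rank, keep all rows of the artists that reached it, then stack the results. The helper name `rows_per_selected_rank`, the `Selected rank` column, and the toy rows are hypothetical and only serve the illustration.

```python
import pandas as pd

# Toy data; the column names mirror the ones used in this notebook.
toy_df = pd.DataFrame({
    "Rank": [1, 2, 3, 1, 2],
    "Authors/Company": ["A", "A", "B", "C", "B"],
})

def rows_per_selected_rank(df, max_rank, column="Authors/Company"):
    """Stack, for each selected rank, all rows of the artists that reached it."""
    parts = []
    for selected in range(1, max_rank + 1):
        reached = df.loc[df["Rank"] == selected, column].dropna().unique()
        part = df[df[column].isin(reached)].copy()
        part["Selected rank"] = selected  # remember which request this copy serves
        parts.append(part)
    return pd.concat(parts, ignore_index=True)

stacked = rows_per_selected_rank(toy_df, max_rank=2)
print(len(toy_df), len(stacked))
```

An artist who reached several ranks appears once per selected rank, so the stacked table grows beyond the original one.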

# Conclusion

Data visualization is an interesting way of doing Exploratory Data Analysis. Yes, you have just read an EDA of the Amazon Mexico music best-sellers board. The first things we have to consider are the medium through which we are going to communicate, the visual space we have, and how to make our dashboards readable for the audience.

I hope this serves as a starting point for your journey into the world of data analysis!