# <font color='#54595F'><div style = 'background-color:aquamarine'><center>Web-scraping and analysing Beatport.com </center> </div></font>
![](beatport.png "Title")

## <font color='#54595F'>Introduction</font>
<font color='#54595F'>In this Jupyter Notebook I scrape Beatport.com to build a dataframe of the top 100 songs in each genre, and then analyse the resulting table.</font>
    
<font color='#54595F'>The findings come first, followed by the code that produced them.</font>


---
## <font color='#54595F'>What is Beatport?</font>

<font color='#54595F'>**Beatport** is a digital electronic music store headquartered in the USA. Since the company's founding in 2004, its goal has been the same: to meet the distinctive needs of its community by providing top-notch goods and services that uplift and unite the community of artists, DJs, and fans. Through this dedication, Beatport hopes to continuously inspire and push innovation, leading and defining the development of dance music culture. With over 36 million unique users, 465 thousand DJ customers, and 11 million curated tracks provided by 75 thousand label relationships, Beatport is still the acknowledged industry leader for the DJ community today.</font>

---

#### <div class="alert alert-danger"> Disclaimer: The data was scraped on 27 October 2022. The code will work as long as Beatport does not make any changes to the structure of its website. Beatport is entitled to make any changes to its website. </div>

---

![](images/imagen.png "Title")

# <font color='#54595F'> Findings</font>

<font color='#54595F'>The first 10 rows of the dataframe obtained after scraping the website look like this:</font>

![](images/head.png "Title")

---
### <font color='#54595F'>Song names</font>

<font color='#54595F'>Of the 3200 songs, 2911 have distinct names. Repeated names are expected: in electronic music the same song is often reinterpreted by different artists, in different genres, and as different remixes. The name that appears most often is "Do It To It" (8 times).</font>


![](images/names.png "Title")

---
### <font color='#54595F'>Remix</font>

<font color='#54595F'>There are 647 different remix types; the ones that appear most often are "Original Mix" and "Extended Mix". Because mix types are spelled inconsistently on Beatport.com (some with capital letters, some without, and so on), some items appear more than once under slightly different names. In a future project of this type I will normalise these labels so that this doesn't happen.</font>

![](images/remix.png "Title")
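
The duplicated mix labels described above could be merged with a small normalisation step. The following is a minimal sketch in plain Python and is not part of the original notebook: `MIX_ALIASES` and `normalize_mix` are hypothetical names, and treating "Extended" as the same thing as "Extended Mix" is an assumption about what those labels mean.

```python
from collections import Counter

# Hypothetical alias table: variant spellings mapped to one canonical label.
# Which labels really mean the same thing is an assumption to review by hand.
MIX_ALIASES = {
    "extended": "extended mix",
    "accapella": "acapella",
    "acappella": "acapella",
}


def normalize_mix(label: str) -> str:
    """Lowercase the label, collapse whitespace, then apply the alias table."""
    cleaned = " ".join(label.split()).lower()
    return MIX_ALIASES.get(cleaned, cleaned)


# Illustrative labels, not the scraped data.
raw_labels = ["Original Mix", "original mix ", "Extended Mix", "Extended", "Acapella", "Accapella"]
counts = Counter(normalize_mix(label) for label in raw_labels)
print(counts.most_common())
```

Applied to the Remix column before counting, a step like this would reduce the number of distinct remix types; how far depends on which aliases are judged to be equivalent.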

---
### <font color='#54595F'> Artist</font>

<font color='#54595F'>2153 unique artists appear in the top 100 by genre on Beatport. Only the first listed artist of each track is counted. The artist with the most songs is Block & Crown with 28, followed by Ondamike with 19, and David Guetta and Dusky with 13 each.</font>

![](images/artists.png "Title")

---
### <font color='#54595F'> Label</font>

<font color='#54595F'>Labels are companies, large or small, that manufacture, distribute, and promote the recordings of affiliated musicians. Essentially, record labels work to sell the brand of the artist and the products they create. 1396 unique labels appear in the top 100 by genre on Beatport. The labels with the most songs are Defected, Spinnin' Records, Ravesta Records, Deadbeats and Musical Freedom with 33, 31, 29, 25 and 19 songs respectively.</font>

![](images/labels.png)

---
### <font color='#54595F'>Genre</font>

<font color='#54595F'>In principle there should be 32 genres with 100 songs each. However, some of Beatport's top 100 charts mix genre labels: for example, the DJ Tools top 100 on the website contains tracks labelled both Acapellas and DJ Tools, so the Genre column does not split evenly into 32 groups of 100.</font>
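
One way to keep exactly 100 rows per chart would be to record which chart each row came from, alongside the genre label printed on the track. The sketch below is not part of the original notebook; it derives a chart name from the top-100 URLs that the notebook collects, assuming they keep the `/genre/<slug>/<id>/top-100` layout seen in the link list. The helper name `chart_slug` is hypothetical.

```python
from urllib.parse import urlparse


def chart_slug(top100_url: str) -> str:
    """Return the genre slug from a URL shaped like .../genre/<slug>/<id>/top-100."""
    parts = [part for part in urlparse(top100_url).path.split("/") if part]
    # Expected layout: ["genre", "<slug>", "<id>", "top-100"]
    if len(parts) < 2 or parts[0] != "genre":
        raise ValueError(f"unexpected chart URL: {top100_url}")
    return parts[1]


print(chart_slug("https://www.beatport.com/genre/afro-house/89/top-100"))  # afro-house
```

Storing this value in an extra column would let the analysis group by chart (always 100 rows) or by track-level genre, depending on the question.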

---
### <font color='#54595F'>Date</font>
<font color='#54595F'>Although the data cover the songs in Beatport's top 100 by genre on 27 October 2022, not all of them were released in 2022; some are much older:</font>

![](images/bydate.png)

---
    
# <font color='#54595F'>About me</font>

<font color='#54595F'>Besides working as a data analyst, one of my hobbies is DJing. My favorite genres to play are Deep House, Progressive House, Melodic Techno, and Tech House. I want to share my SoundCloud channel with you, where you can find my mixes. If you like them, I invite you to follow me and like the mixes. Thank you very much! </font>
    
##### <a href="https://on.soundcloud.com/PBvwv"><font color='#54595F'>Visit my Soundcloud</font></a>
        

# <font color='#54595F'>CODE</font>

In [1]:
# Import the libraries to be used

from bs4 import BeautifulSoup
import requests
import pandas as pd
import matplotlib.pyplot as plt
import plotly.express
import os

In [2]:
# Store the URL in a variable and fetch the page with the requests library.
# The printed response should be <Response [200]>.

url = 'https://www.beatport.com/'
response = requests.get(url)
print(response)

# Store the raw content of the response in the variable src.

src = response.content

# Parse the page content by creating a Beautiful Soup object from src.

soup = BeautifulSoup(src, 'html.parser')

# Find all the "a" tags inside the genres drop-down menu;
# each one carries the "href" of a genre page.

results = soup.find('div',{'class':'genres-drop head-drop header-tooltip-menu'}).find_all('a')

# Collect every "href" in the list links.

links = []

for result in results:
    links.append(result['href'])

# Build the URLs to scrape: remove the trailing "/" from the original URL,
# then append each genre href and the '/top-100' suffix.

url_list = []
for i in links:
    url_list.append(url.rstrip('/') + i + '/top-100')

# Create the pandas DataFrame and export it as CSV.

df = pd.DataFrame({'link':url_list})
df.to_csv('links_beatport.csv', index=False)

<Response [200]>
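
The URL-building step above concatenates strings by hand. As an alternative sketch (not the notebook's own method), the standard library's `urljoin` can resolve each href against the site root. The helper name `top100_url` is hypothetical, and the example assumes root-relative hrefs such as `/genre/afro-house/89`, which is what the resulting links suggest.

```python
from urllib.parse import urljoin


def top100_url(base_url: str, genre_href: str) -> str:
    """Resolve a genre href against the site root and append the top-100 suffix."""
    genre_url = urljoin(base_url, genre_href)
    return genre_url.rstrip("/") + "/top-100"


print(top100_url("https://www.beatport.com/", "/genre/afro-house/89"))
# https://www.beatport.com/genre/afro-house/89/top-100
```

`urljoin` also copes with hrefs that are already absolute URLs, which plain concatenation does not.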


In [3]:
df.head()

|   | link |
|---|------|
| 0 | https://www.beatport.com/genre/140-deep-dubstep-grime/95/top-100 |
| 1 | https://www.beatport.com/genre/afro-house/89/top-100 |
| 2 | https://www.beatport.com/genre/amapiano/98/top-100 |
| 3 | https://www.beatport.com/genre/bass-club/85/top-100 |
| 4 | https://www.beatport.com/genre/bass-house/91/top-100 |


In [4]:
# Create a list for each column I want in my dataframe.
urls = df['link'] # link column of my df
name = []
remix = []
artist = []
label = []
genre = []
release_date = []

# Fetch each top-100 page with the requests library.
for url in urls:
    response = requests.get(url)

    # Print the response to check that every URL works.
    print(response)

    # Store the content of the response in src and parse it with Beautiful Soup.
    src = response.content
    soup = BeautifulSoup(src, 'html.parser')

    # Each "li" inside the top-100 bucket is one track.
    results = soup.find('div', {'class':'bucket tracks top-hundred-tracks'}).find('ul').find_all('li')

    # Append the scraped fields of every track to the lists created above.
    for result in results:
        name.append(result.find('span',{'class':'buk-track-primary-title'}).text.strip("'"))
        remix.append(result.find('span', {'class':"buk-track-remixed"}).text)
        artist.append(result.find('p', {'class':'buk-track-artists'}).find('a').text.strip())
        label.append(result.find('p', {'class':'buk-track-labels'}).text.strip())
        genre.append(result.find('p', {'class':'buk-track-genre'}).text.strip())
        release_date.append(result.find('p', {'class':'buk-track-released'}).text.strip())

<Response [200]>
<Response [200]>
<Response [200]>
<Response [200]>
<Response [200]>
<Response [200]>
<Response [200]>
<Response [200]>
<Response [200]>
<Response [200]>
<Response [200]>
<Response [200]>
<Response [200]>
<Response [200]>
<Response [200]>
<Response [200]>
<Response [200]>
<Response [200]>
<Response [200]>
<Response [200]>
<Response [200]>
<Response [200]>
<Response [200]>
<Response [200]>
<Response [200]>
<Response [200]>
<Response [200]>
<Response [200]>
<Response [200]>
<Response [200]>
<Response [200]>
<Response [200]>
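
The loop above sends 32 requests back to back and stops at the first error. A gentler variant is sketched below, under the assumption that a short pause between requests and a list of failures are acceptable; it is not part of the original notebook. `fetch_all` and `fake_fetch` are hypothetical names, and the fetcher is passed in as a function so the sketch runs without network access.

```python
import time


def fetch_all(urls, fetch, delay_seconds=1.0, sleep=time.sleep):
    """Call fetch(url) for each URL, pausing between calls.

    Returns (pages, failures): pages maps url -> fetched content,
    failures lists (url, error message) for every call that raised.
    """
    pages, failures = {}, []
    for position, url in enumerate(urls):
        if position > 0:
            sleep(delay_seconds)  # pause between requests, not before the first one
        try:
            pages[url] = fetch(url)
        except Exception as error:  # keep going and report the failure afterwards
            failures.append((url, str(error)))
    return pages, failures


# Stand-in fetcher so the sketch runs offline; it fails for one URL on purpose.
def fake_fetch(url):
    if url.endswith("broken"):
        raise RuntimeError("HTTP 404")
    return f"<html>{url}</html>"


pages, failures = fetch_all(["a", "broken", "b"], fake_fetch, delay_seconds=0)
print(len(pages), failures)  # 2 [('broken', 'HTTP 404')]
```

In the notebook, the fetcher could wrap `requests.get(url, timeout=10)` and call `response.raise_for_status()` before returning `response.content`, so that non-200 responses end up in `failures` instead of being parsed.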


In [5]:
# Create DataFrame
beatport_top100 = pd.DataFrame(
    {
        'Name':name,
        'Remix':remix, 
        'Artist':artist,
        'Label':label, 
        'Genre':genre, 
        'Release Date':release_date
    }
)

In [6]:
# finding null values
beatport_top100.isna().sum()


Name            0
Remix           0
Artist          0
Label           0
Genre           0
Release Date    0
dtype: int64

In [7]:
# Export dataframe to csv file
beatport_top100.to_csv('beatport_top_allgn.csv')

In [8]:
# Print first 10 rows
beatport_top100.head(10)

|   | Name | Remix | Artist | Label | Genre | Release Date |
|---|------|-------|--------|-------|-------|--------------|
| 0 | Discovery | Original Mix | Hamdi | Jadu Dala | 140 / Deep Dubstep / Grime | 2022-09-14 |
| 1 | TEK | Original Mix | Monty | 1985 Music | 140 / Deep Dubstep / Grime | 2022-09-30 |
| 2 | Skanka | Original Mix | Hamdi | DUPLOC | 140 / Deep Dubstep / Grime | 2022-05-11 |
| 3 | Grinding Ft. High Tara | Mystic State Remix | An-ten-nae | Medicine | 140 / Deep Dubstep / Grime | 2022-10-07 |
| 4 | Magenta | Original Mix | sumthin sumthin | Bassrush Records | 140 / Deep Dubstep / Grime | 2022-10-07 |
| 5 | Obsidian Vortex | Original Mix | ATLiens | Bassrush Records | 140 / Deep Dubstep / Grime | 2022-09-16 |
| 6 | Secrecy | Original Mix | Peekaboo | Deadbeats | 140 / Deep Dubstep / Grime | 2022-07-29 |
| 7 | Elite | Original Mix | Säkä | Deadbeats | 140 / Deep Dubstep / Grime | 2022-08-26 |
| 8 | Reaper (feat. JID) | Monty & Visages Remix | Monty | Create Music Group | 140 / Deep Dubstep / Grime | 2022-06-24 |
| 9 | Clutch | Original Mix | Ternion Sound | Deep Dark & Dangerous | 140 / Deep Dubstep / Grime | 2022-09-23 |


In [9]:
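# Transform the date column to datetime. The scraped dates are in YYYY-MM-DD form,
# so the format is given explicitly (infer_datetime_format is deprecated in recent pandas).
beatport_top100['Release Date'] = pd.to_datetime(beatport_top100['Release Date'], format='%Y-%m-%d')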

In [10]:
# These are all of the columns
columns = beatport_top100.columns
columns

Index(['Name', 'Remix', 'Artist', 'Label', 'Genre', 'Release Date'], dtype='object')

In [11]:
for i in columns:
    print(i,'({})'.format(beatport_top100[i].nunique()))
    print(beatport_top100[i].value_counts())
    print()
    print()

Name (2911)
Do It To It                    8
Move Your Body                 5
Move                           5
Higher                         5
Breathe                        5
                              ..
Ballz                          1
sunscreen                      1
Are We Here? (30 Something)    1
White Girl Got Some Ass        1
Rewind                         1
Name: Name, Length: 2911, dtype: int64


Remix (647)
Original Mix                  1864
Extended Mix                   503
Extended                        35
Acapella                        22
Accapella                       21
                              ... 
Kings of the Rollers Remix       1
Bladerunner Remix                1
Nick The Lot Remix               1
Burr Oak Remix                   1
El-B Mix                         1
Name: Remix, Length: 647, dtype: int64


Artist (2153)
Block & Crown     28
Ondamike          19
David Guetta      13
Dusky             13
Ghostbusterz      11
                  ..
Univac

In [12]:
individual_dfs = [beatport_top100[['Name']], beatport_top100[['Remix']], beatport_top100[['Artist']], beatport_top100[['Label']], beatport_top100[['Genre']]] 

In [13]:
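# For each column, build a table of value counts (column Q),
# sorted from the most frequent value to the least frequent.
df_pivot = []
for i in individual_dfs:
    pivot_table = i.pivot_table(
        index=i.columns[0],
        values=i.columns[0],
        aggfunc={i.columns[0]: ['count']}
    )

    # set_axis returns a new object; its inplace argument was removed in pandas 2.0.
    pivot_table = pivot_table.set_axis(['Q'], axis=1)
    pivot_table = pivot_table.reset_index()
    pivot_table = pivot_table.sort_values(by='Q', ascending=False)

    df_pivot.append(pivot_table)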
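
The loop above produces, for each column, a table of distinct values and how often each occurs, sorted from most to least frequent. The same idea can be sketched in plain Python with `collections.Counter`; this is an illustration of the counting logic, not a replacement for the notebook's pandas code. `count_table` is a hypothetical helper and the sample names are made-up input, not scraped results.

```python
from collections import Counter


def count_table(values):
    """Return (value, count) pairs sorted by count, highest first."""
    return Counter(values).most_common()


sample_names = ["Do It To It", "Move", "Do It To It", "Higher", "Move", "Do It To It"]
for name, quantity in count_table(sample_names):
    print(name, quantity)
# Do It To It 3
# Move 2
# Higher 1
```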
    

In [14]:
print(df_pivot[0])

                                                   Name  Q
669                                         Do It To It  8
1638                                               Move  5
1642                                     Move Your Body  5
357                                             Breathe  5
1120                                             Higher  5
...                                                 ... ..
1008                                               Gold  1
1010                                         Gold Teeth  1
1011                                         Gom Jabbar  1
1013  Gonna Be Alright feat. Jamie 3:26 feat. Annett...  1
2910                                                حلم  1

[2911 rows x 2 columns]


In [15]:
import plotly.express as px
import plotly.io as pio


In [16]:
df_pivot[0].columns[0]

'Name'

In [17]:
# Create a folder called images to save my graphs
if not os.path.exists("images"):
    os.mkdir("images")

In [18]:
# Keep only the top 25 rows so that the chart does not crash the browser
top_names = df_pivot[0].head(25)

# Construct the graph and style it.
fig = px.bar(top_names, x=df_pivot[0].columns[0], y='Q', template='plotly_dark', text_auto=True)
fig.update_layout(
    title='Top 25 songs that appear more than once',
    xaxis=dict(showgrid=True),
    yaxis=dict(showgrid=True),
    legend=dict(orientation='v'),
    # barmode='group',
    # paper_bgcolor='#000000'
)
fig.show(renderer="iframe")
fig.write_image("images/names.png")


In [19]:
# Keep only the top 10 rows so that the chart does not crash the browser
top_remixes = df_pivot[1].head(10)

# Construct the graph and style it.
fig = px.bar(top_remixes, x=df_pivot[1].columns[0], y='Q', template='plotly_dark', text_auto=True)
fig.update_layout(
    title='Top 10 types of Remix',
    xaxis=dict(showgrid=True),
    yaxis=dict(showgrid=True),
    legend=dict(orientation='v'),
    # barmode='group',
    # paper_bgcolor='#000000'
)
fig.show(renderer="iframe")
fig.write_image("images/remix.png")


In [20]:
# Keep only the top 25 rows so that the chart does not crash the browser
top_artists = df_pivot[2].head(25)

# Construct the graph and style it.
fig = px.bar(top_artists, x=df_pivot[2].columns[0], y='Q', template='plotly_dark', text_auto=True)
fig.update_layout(
    title='Top 25 artists by number of songs',
    xaxis=dict(showgrid=True),
    yaxis=dict(showgrid=True),
    legend=dict(orientation='v'),
    # barmode='group',
    # paper_bgcolor='#000000'
)
fig.show(renderer="iframe")
fig.write_image("images/artists.png")


In [21]:
# Keep only the top 25 rows so that the chart does not crash the browser
top_labels = df_pivot[3].head(25)

# Construct the graph and style it.
fig = px.bar(top_labels, x=df_pivot[3].columns[0], y='Q', template='plotly_dark', text_auto=True)
fig.update_layout(
    title='Top 25 labels by number of songs',
    xaxis=dict(showgrid=True),
    yaxis=dict(showgrid=True),
    legend=dict(orientation='v'),
    # barmode='group',
    # paper_bgcolor='#000000'
)
fig.show(renderer="iframe")
fig.write_image("images/labels.png")

In [22]:
beatport_top100

|   | Name | Remix | Artist | Label | Genre | Release Date |
|---|------|-------|--------|-------|-------|--------------|
| 0 | Discovery | Original Mix | Hamdi | Jadu Dala | 140 / Deep Dubstep / Grime | 2022-09-14 |
| 1 | TEK | Original Mix | Monty | 1985 Music | 140 / Deep Dubstep / Grime | 2022-09-30 |
| 2 | Skanka | Original Mix | Hamdi | DUPLOC | 140 / Deep Dubstep / Grime | 2022-05-11 |
| 3 | Grinding Ft. High Tara | Mystic State Remix | An-ten-nae | Medicine | 140 / Deep Dubstep / Grime | 2022-10-07 |
| 4 | Magenta | Original Mix | sumthin sumthin | Bassrush Records | 140 / Deep Dubstep / Grime | 2022-10-07 |
| ... | ... | ... | ... | ... | ... | ... |
| 3195 | Usch | Original Mix | Kornel Kovacs | Studio Barnhus | UK Garage / Bassline | 2022-10-21 |
| 3196 | Have You Over | Original Mix | Axel Boy | Crucast | Bassline | 2022-10-21 |
| 3197 | Believe Me Boy | TC4 Remix | Sam Deeley | Steppers Club | UK Garage | 2022-10-21 |
| 3198 | Bubble & Move feat. Eloheema | El-B Mix | Beat Merchants | Beats Galore | UK Garage | 2022-10-21 |


In [23]:
beatport_top100.sort_values(by='Release Date', inplace=True)

In [24]:
beatport_top100.head(10)

|   | Name | Remix | Artist | Label | Genre | Release Date |
|---|------|-------|--------|-------|-------|--------------|
| 3030 | September | Original Mix | Earth, Wind & Fire | Columbia/Legacy | Trap / Wave | 1978-11-23 |
| 3024 | Funkytown | 12" Version | Lipps Inc. | Polydor | Trap / Wave | 1979-11-01 |
| 3020 | I Feel Love | Original Mix | Donna Summer | Mercury Records | Trap / Wave | 1980-09-08 |
| 1684 | Passion | Naked Mix | Gat Decor | Altra Moda | House | 1992-06-12 |
| 3077 | Ghostbusters | Original Mix | Ray Parker Jr. | Arista/Legacy | Trap / Wave | 1993-10-12 |
| 891 | Turn Me Out feat. Kathy Brown | Acappella | Praxis | Cutting Records | Acapellas | 1994-03-20 |
| 1661 | Gotta Let You Go | Club Mix | Dominica | Altra Moda | House | 1994-04-11 |
| 554 | Fall Deeper | Original Mix | Force Mass Motion | Force Mass Motion | Breaks / Breakbeat / UK Bass | 1996-01-01 |
| 676 | Freed from Desire | Full Vocals Mixx | Gala | Do It Yourself | Dance / Electro Pop | 1996-03-23 |
| 1406 | U Found Out | Tony De Vit Remix | Handbaggers | Hard Drive | Hard Dance / Hardcore | 1996-09-23 |
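
The sorted rows above show the oldest releases. To see how the whole chart is spread over time, the release dates could be tallied per year. The sketch below does this in plain Python for a handful of dates taken from the outputs in this notebook. `songs_per_year` is a hypothetical helper, and the input is assumed to be text in YYYY-MM-DD form, as the scraped values are.

```python
from collections import Counter
from datetime import date


def songs_per_year(release_dates):
    """Count ISO-formatted release dates (YYYY-MM-DD) by calendar year."""
    years = (date.fromisoformat(text).year for text in release_dates)
    return dict(sorted(Counter(years).items()))


sample_dates = ["2022-09-14", "2022-09-30", "1978-11-23", "1996-01-01", "1996-03-23"]
print(songs_per_year(sample_dates))  # {1978: 1, 1996: 2, 2022: 2}
```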
