# Spotify top 200 charts (2020 - 2021)
### This dataset contains information about the songs that appeared in Spotify's top 200 charts from 2020 to 2021.

### The csv includes:
```
index, highest charting position, number of times charted, week of highest charting, song name, streams, artist, artist followers, song id, genre, release date, weeks charted, popularity, danceability, energy, loudness, speechiness, acousticness, liveness, tempo, duration, valence, chord
```

### For this visualization we're interested in:
- Getting a brief understanding of which genre was the most listened to from 2020 to 2021.
- Knowing which chords are used most often when composing a song and whether the chord affects the charting position.
- Knowing if there's an "ideal" duration for a song to be top charted.
- Knowing if there's an "ideal" tempo for a song to be top charted.

### How will this help us?
This will give us insight into what type of song is more likely to be top charted, and therefore listened to more, according to the variables considered: genre, duration, tempo and chord.

In [149]:
import numpy as np
import matplotlib.pyplot as plt
import csv

from wordcloud import WordCloud, STOPWORDS

def read_csv(csv_route):
  # Columns kept: highest charting position, song name, artist, genre, tempo, duration, chord
  data_indices = [1, 4, 6, 9, 19, 20, 22]
  # Positions within each kept row that must be numeric: position, tempo, duration
  float_indices = [0, 4, 5]
  with open(csv_route, encoding="cp850", newline="") as file:
    reader = csv.reader(file, delimiter=",")
    header = next(reader)
    header = [header[i] for i in data_indices]
    data = []
    for row in reader:
      song = [row[i] for i in data_indices]
      # Skip rows with a missing value rather than deleting from the list while iterating over it
      if any(val.strip() == "" for val in song):
        continue
      for i in float_indices:
        song[i] = float(song[i])
      data.append(song)
  return data, header

def filter_data(data, upper_limit, lower_limit):
  return list(filter(lambda x: float(x[0]) <= upper_limit and float(x[0]) >= lower_limit, data))


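def generate_chord_wordcloud(data):
  chord_text = " ".join(song[6] for song in data)
  # Match "C#/Db"-style names first, then single-letter chords.
  # stopwords=set(): the default list contains "a" (the A chord); collocations=False: no chord pairs.
  wordcloud = WordCloud(width=400, height=400, background_color="white", regexp=r"[A-G]#/[A-G]b|[A-G]", stopwords=set(), collocations=False).generate(chord_text)
  return wordcloud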


### First, let's read the dataset and retrieve the songs whose highest charting position was in the top 10 (best) and the ones whose highest charting position was in the bottom 10 (worst).

In [150]:
data, header = read_csv('./spotify_dataset.csv')
top_ten = filter_data(data, 10, 0)
last_ten = filter_data(data, 200, 191)  # positions 191 to 200, both inclusive

### Let's find out the most listened-to genres from 2020 to 2021

In [None]:
#TODO: graph word_cloud for genres in top_ten and last_ten
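
One possible way to start on this cell is to count genre tags before drawing anything. The sketch below assumes the row layout returned by `read_csv` (genre at index 3) and assumes the genre field is a stringified list such as `"['pop', 'dance pop']"`; that format is a guess about the CSV, not something confirmed here. `parse_genres` and `count_genres` are hypothetical helper names, and the sample rows are made up.

```python
import ast
from collections import Counter

GENRE_COL = 3  # position of "genre" in the rows built by read_csv


def parse_genres(raw):
  """Turn a genre field into a list of genre names.

  Assumes the field looks like "['pop', 'dance pop']"; anything that
  cannot be parsed as a list is treated as a single genre name.
  """
  try:
    parsed = ast.literal_eval(raw)
  except (ValueError, SyntaxError, TypeError):
    parsed = raw
  if isinstance(parsed, (list, tuple)):
    return [str(g).strip() for g in parsed if str(g).strip()]
  text = str(parsed).strip()
  return [text] if text else []


def count_genres(songs):
  """Count how many songs carry each genre tag."""
  counts = Counter()
  for song in songs:
    counts.update(parse_genres(song[GENRE_COL]))
  return counts


# Made-up rows in read_csv's layout: position, name, artist, genre, tempo, duration, chord
sample = [
  [1.0, "Song A", "Artist A", "['pop', 'dance pop']", 120.0, 200000.0, "C"],
  [2.0, "Song B", "Artist B", "['pop']", 98.0, 180000.0, "G#/Ab"],
  [3.0, "Song C", "Artist C", "[]", 140.0, 210000.0, "A"],
]
genre_counts = count_genres(sample)
print(genre_counts.most_common(2))
```

The resulting `Counter` could then be passed to `WordCloud.generate_from_frequencies`, which takes a word-to-frequency mapping, so multi-word genres such as "dance pop" would stay together in the cloud.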

### Let's see if there's an ideal duration for a song to be top charted

In [None]:
#TODO: graph durations for top_ten and last_ten
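
Before plotting, a numeric summary can show whether the two groups differ at all. The sketch below uses only the standard library and assumes the `read_csv` row layout (duration at index 5). `summarize` is a hypothetical helper and the sample values are invented, so the numbers say nothing about the real dataset.

```python
from statistics import mean, median

DURATION_COL = 5  # position of "duration" in the rows built by read_csv


def summarize(songs, col):
  """Return min, median, mean and max of one numeric column."""
  values = [song[col] for song in songs]
  if not values:
    return None  # nothing to summarize for an empty group
  return {
    "min": min(values),
    "median": median(values),
    "mean": mean(values),
    "max": max(values),
  }


# Invented rows in read_csv's layout: position, name, artist, genre, tempo, duration, chord
top_sample = [
  [1.0, "Song A", "Artist A", "['pop']", 120.0, 180.0, "C"],
  [2.0, "Song B", "Artist B", "['pop']", 98.0, 200.0, "G"],
  [3.0, "Song C", "Artist C", "['rap']", 140.0, 220.0, "A"],
]
last_sample = [
  [195.0, "Song D", "Artist D", "['rock']", 110.0, 240.0, "D"],
  [200.0, "Song E", "Artist E", "['pop']", 90.0, 260.0, "E"],
]

for label, group in (("top", top_sample), ("last", last_sample)):
  print(label, summarize(group, DURATION_COL))
```

In the notebook, the same lists of duration values for `top_ten` and `last_ten` could be handed to `ax.boxplot` or `ax.hist` to compare the two groups side by side.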

### Let's see if there's an ideal tempo for a song to be top charted

In [None]:
#TODO: graph tempo for top_ten and last_ten
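
To look for an "ideal" tempo, one option is to group songs into tempo bins and see where they cluster. The sketch below is a minimal text histogram, assuming the `read_csv` row layout (tempo at index 4). `tempo_histogram` and the 20 BPM bin width are illustrative choices, and the sample rows are made up.

```python
from collections import Counter

TEMPO_COL = 4  # position of "tempo" in the rows built by read_csv


def tempo_histogram(songs, bin_width=20):
  """Count songs per tempo bin; keys are the lower edge of each bin."""
  counts = Counter()
  for song in songs:
    lower_edge = int(song[TEMPO_COL] // bin_width) * bin_width
    counts[lower_edge] += 1
  return dict(sorted(counts.items()))


# Made-up rows in read_csv's layout: position, name, artist, genre, tempo, duration, chord
sample = [
  [1.0, "Song A", "Artist A", "['pop']", 118.0, 180.0, "C"],
  [2.0, "Song B", "Artist B", "['pop']", 101.5, 200.0, "G"],
  [3.0, "Song C", "Artist C", "['rap']", 95.0, 220.0, "A"],
  [4.0, "Song D", "Artist D", "['rap']", 140.0, 210.0, "B"],
]
bins = tempo_histogram(sample)
for lower_edge, count in bins.items():
  print(f"{lower_edge}-{lower_edge + 20} BPM: {'#' * count}")
```

Running the same function on `top_ten` and `last_ten` and comparing the two results would show whether the best-charting songs concentrate in a narrower tempo range.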

### Now let's find out the most frequently used chords for top charted and least charted songs

In [None]:


fig, ax = plt.subplots(1,2)
fig.suptitle("Frequently used chords for:")
ax[0].set_title("top charted songs")
ax[0].imshow(generate_chord_wordcloud(top_ten))
ax[0].axis("off")

ax[1].set_title("least charted songs")
ax[1].imshow(generate_chord_wordcloud(last_ten))
ax[1].axis("off")

fig.tight_layout()
plt.show()
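
The word clouds show how often each chord appears, but not whether a chord goes along with a better charting position. A rough way to check is to compute, per chord, the number of songs and their mean highest charting position. The sketch below assumes the `read_csv` row layout (position at index 0, chord at index 6). `chord_stats` is a hypothetical helper and the sample rows are invented.

```python
from collections import defaultdict

POSITION_COL = 0  # "highest charting position" in the rows built by read_csv
CHORD_COL = 6     # "chord" in the rows built by read_csv


def chord_stats(songs):
  """Map each chord to (number of songs, mean highest charting position)."""
  positions = defaultdict(list)
  for song in songs:
    positions[song[CHORD_COL]].append(song[POSITION_COL])
  return {
    chord: (len(vals), sum(vals) / len(vals))
    for chord, vals in positions.items()
  }


# Invented rows in read_csv's layout: position, name, artist, genre, tempo, duration, chord
sample = [
  [1.0, "Song A", "Artist A", "['pop']", 120.0, 180.0, "C"],
  [5.0, "Song B", "Artist B", "['pop']", 98.0, 200.0, "C"],
  [10.0, "Song C", "Artist C", "['rap']", 140.0, 220.0, "G#/Ab"],
]
stats = chord_stats(sample)
# Most frequent chords first
for chord, (count, mean_pos) in sorted(stats.items(), key=lambda kv: -kv[1][0]):
  print(f"{chord}: {count} songs, mean position {mean_pos:.1f}")
```

A lower mean position means the songs charted higher. With only a handful of songs per chord the averages are noisy, so they are better read as a hint than as evidence that the chord affects charting.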

### Insights