# SONG LABELING IN COMMERCIAL SECTORS
*Creating a binary Pandas DataFrame with business sector labels for each song: the brand that used each song is retrieved from [Songs used in commercials](https://www.songfacts.com/category/songs-used-in-commercials), and the appropriate sector is then chosen for that brand.*

### Importing Libraries

In [None]:
import pandas as pd
import collections

## 1. Sector Occurrences
*Counting the occurrences of each sector listed in the file ``brands.txt`` and saving the counts in the file ``sect_occur.csv``*

In [None]:
with open("brands.txt") as f:
    data = f.readlines()

sectors = []
# In brands.txt the sectors found for each song are separated by a '|' symbol
# (a song may have been used in multiple TV commercials); the first field of each line is skipped
data = [line.replace("\n","").replace("\t","").split("|")[1:] for line in data]

for row in data:
    for sec in row:
        sectors.append(sec)

# Counting occurrences
occur = dict(collections.Counter(sectors))
occur = [[k,v] for k,v in occur.items()]

# Creating the csv file with the counts
df = pd.DataFrame(occur, columns = ["SECTOR", "OCCURRENCES"])
df.sort_values(by = ["OCCURRENCES"], ascending = False, inplace = True)
df.to_csv("sect_occur.csv", index = False)
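
As a quick illustration of the parsing and counting step above, here is a minimal sketch on made-up lines. The line layout (a song title followed by `|`-separated sectors) and the sample values are assumptions for illustration, not the actual content of ``brands.txt``.

```python
import collections

# Hypothetical sample lines; the real brands.txt layout is assumed, not confirmed
sample_lines = [
    "Song A|Car|Beer\n",
    "Song B|Car(Luxury)\n",
    "Song C\t|Tech|Car\n",
]

# Same parsing as above: strip newlines and tabs, split on '|', skip the first field
parsed = [line.replace("\n", "").replace("\t", "").split("|")[1:] for line in sample_lines]

# Flatten the per-song lists and count each sector string
counts = collections.Counter(sec for row in parsed for sec in row)

print(parsed)
print(counts.most_common())
```

In this sketch a sector with a subcategory, such as the hypothetical `Car(Luxury)`, is counted separately from `Car`, because the counting runs before any subcategory is removed.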

## 2. Creating the Labels DataFrame
*Most frequent sectors found after some cleaning:* \
**Car, Tech, Clothing, Beer, Soft Drink, Snack, Department Store, Finance, Sport, Food, Fast Food, Phone Company**

In [None]:
sectors = ["Car","Tech","Clothing","Beer","Soft Drink","Snack","Department Store","Finance","Sport","Food","Fast Food","Phone Company"]

# Removing subcategories (the text in parentheses) and any surrounding whitespace
data = [[x.split("(")[0].strip() for x in l] for l in data]

# Defining a 0 matrix
sectors_df = pd.DataFrame(0, index = range(len(data)), columns = sectors)

# Writing 1 for each selected sector that is found for a song
for i in range(len(data)):
    for x in data[i]:
        if x in sectors:
            sectors_df.at[i,x] = 1

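# Updating the dataset, dropping songs that match none of the selected sectors
# (row i of features.csv is taken to be the song on line i of brands.txt)
feat_df = pd.read_csv("1. Dataset Creation/features.csv")

# Index labels of the rows whose sector columns are all 0
unlabeled = sectors_df.index[~sectors_df.any(axis = 1)]
sectors_df.drop(unlabeled, axis = 0, inplace = True)
feat_df.drop(unlabeled, axis = 0, inplace = True)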

sectors_df.reset_index(drop = True, inplace = True)
feat_df.reset_index(drop = True, inplace = True)

feat_df.to_csv("features_2.csv", index = False)
sectors_df.to_csv("sectors_2.csv", index = False)
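
The labeling and filtering steps above can be checked on a small toy example. This is only a sketch: the sector list is shortened, and `toy_data`, `toy_feat` and the song titles are hypothetical values, not taken from the real files. It uses a boolean mask to keep the labels and the features aligned row by row.

```python
import pandas as pd

# Hypothetical toy inputs (assumptions for illustration only)
sectors = ["Car", "Tech", "Beer"]
toy_data = [["Car", "Beer"], ["Perfume"], ["Tech"]]
toy_feat = pd.DataFrame({"title": ["Song A", "Song B", "Song C"]})

# Zero matrix with one row per song and one column per selected sector
labels = pd.DataFrame(0, index=range(len(toy_data)), columns=sectors)
for i, row in enumerate(toy_data):
    for sec in row:
        if sec in sectors:
            labels.at[i, sec] = 1

# True for songs that have at least one selected sector
keep = labels.any(axis=1)

# Apply the same mask to both frames so that row i still refers to the same song
labels_kept = labels[keep].reset_index(drop=True)
feat_kept = toy_feat[keep].reset_index(drop=True)

print(labels_kept)
print(feat_kept)
```

Here the hypothetical `Song B` is dropped from both frames because its only sector, `Perfume`, is not in the selected list.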