
Agglomerative hierarchical clustering on 39.7K female fragrances 🤖


katarzynajanicka/agglomerative-fragrance-clustering


agglomerative-fragrance-clustering

Hierarchical agglomerative clustering on female fragrance accords


Table of contents

General info

An unsupervised machine learning project that applies hierarchical agglomerative clustering to 39.7K female fragrances.

This project is part of my fragrance exploration series:

  1. K-means++ clustering on fragrance accords
    https://github.com/katarzynajanicka/fragrance-clustering
  2. Agglomerative hierarchical clustering on 39.7K female fragrances
    https://github.com/katarzynajanicka/agglomerative-fragrance-clustering
  3. Accords-based recommendation system for female fragrances
    https://github.com/katarzynajanicka/fragrance-finder

Technologies

The project was created with Python 3.8.2.

Python libraries:

  • scipy - version 1.5.2
  • scikit-learn - version 0.23.2
  • pandas - version 1.1.1
  • numpy - version 1.19.2
  • matplotlib - version 3.3.1
  • seaborn - version 0.11.0

Setup

Input data: result.csv, the end result of the https://github.com/katarzynajanicka/fragrance-clustering project.

Output data:

  • hierarchical-clustering.ipynb (Jupyter notebook)
  • hierarchical_result.csv (end result)

Results

Project structure

Data structure

There are 39.7K rows. Each observation is a unique female fragrance.

Fields:

  • brand - name of the brand
  • title - name of the fragrance
  • date - release date (in YYYY format)
  • rating_score - fragrance rating
  • votes - number of votes cast for a scent
  • accords - top five notes
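To cluster on accords, the accords field has to become numeric. The sketch below shows one possible way to do that with pandas: one binary column per accord. It is an illustration only. The sample rows are hypothetical placeholders, and the assumption that accords are stored as a single comma-separated string may not match the actual layout of result.csv or the encoding used in the notebook.

```python
import pandas as pd

# Hypothetical rows mimicking the fields listed above. The real result.csv
# may store accords differently (assumed here: one comma-separated string).
sample = pd.DataFrame(
    {
        "brand": ["Brand A", "Brand B", "Brand C"],
        "title": ["Scent One", "Scent Two", "Scent Three"],
        "accords": [
            "floral,fresh,citrus,woody,powdery",
            "woody,warm spicy,amber,vanilla,floral",
            "citrus,fresh,aromatic,green,woody",
        ],
    }
)

# One binary column per accord: 1 if the fragrance lists it, 0 otherwise.
accord_matrix = sample["accords"].str.get_dummies(sep=",")

print(accord_matrix.shape)
print(accord_matrix.sum(axis=1).tolist())  # number of accords per fragrance
```

Each row of accord_matrix can then serve as the feature vector for one fragrance.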

Dendrograms

Hierarchical clustering

Cluster description by top accords

Fragrance tree

Most popular fragrances

Most popular fragrances by cluster

Most popular fragrances by brand

Final thoughts

The agglomerative hierarchical clustering technique turned out to be a better approach than K-means++ clustering (see: https://github.com/katarzynajanicka/fragrance-clustering). This is because different fragrances usually share the same notes: it is not unusual for a fragrance to have accords from two or three fragrance families (Floral, Fresh, Woody, Oriental).
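The point about shared accords can be sketched on toy data with SciPy. The example below is a minimal illustration, not the notebook's actual code: the five vectors are made up, and Ward linkage is an assumption, since this README does not state which linkage method was used. It shows how a fragrance spanning several families can sit in its own sub-branch while still belonging to a larger family-level branch.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Toy multi-hot accord vectors (columns: floral, fresh, woody, oriental).
# Values are illustrative only, not taken from the dataset.
X = np.array(
    [
        [1, 1, 0, 0],  # floral + fresh
        [1, 1, 0, 0],  # floral + fresh
        [1, 1, 1, 0],  # floral + fresh + woody (spans families)
        [0, 0, 1, 1],  # woody + oriental
        [0, 0, 1, 1],  # woody + oriental
    ],
    dtype=float,
)

# Build the merge tree bottom-up. Ward linkage is an assumed choice here.
Z = linkage(X, method="ward")

# Cut the same tree at two different levels of granularity.
labels_2 = fcluster(Z, t=2, criterion="maxclust")
labels_3 = fcluster(Z, t=3, criterion="maxclust")

print("2 clusters:", labels_2)
print("3 clusters:", labels_3)
```

Because the tree is built once and cut afterwards, the coarse and fine groupings stay consistent with each other, whereas K-means++ would need to be re-run for each number of clusters.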

Status

The project is finished.