
mathicard/DSE-AMD


Market-Basket Analysis using “Old Newspapers” dataset

Algorithms for Massive Data final project - Università degli Studi di Milano

Authors: Mathias Cardarello Fierro & Lorenzo Polli

This project investigates techniques commonly used for market-basket analysis over very large datasets in order to find frequent itemsets. The dataset is taken from Kaggle's public repository and contains more than 16 million paragraphs from old newspapers. Although the newspapers are written in 67 different languages, the analysis is restricted to English-language newspapers. In total, a subset of more than 1 million articles published between 2005 and 2012 was analyzed using algorithms for massive datasets.
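To make the idea of frequent itemsets concrete, here is a minimal A-Priori sketch. It is not taken from the project's notebooks: it assumes each paragraph is treated as a basket of distinct lowercase words, and the helper names (`to_basket`, `apriori`), the toy sentences and the support threshold are hypothetical, chosen only for illustration.

```python
from collections import Counter
from itertools import combinations


def to_basket(paragraph):
    """Hypothetical preprocessing: a basket is the set of distinct lowercase words."""
    words = (w.strip(".,;:!?\"'()").lower() for w in paragraph.split())
    return frozenset(words) - {""}


def apriori(baskets, min_support):
    """Return {itemset: count} for every itemset found in >= min_support baskets."""
    # Pass 1: count single items and keep the frequent ones.
    counts = Counter(item for basket in baskets for item in basket)
    frequent = {frozenset([i]): c for i, c in counts.items() if c >= min_support}
    result = dict(frequent)
    k = 2
    while frequent:
        # Candidate generation: join frequent (k-1)-itemsets into k-itemsets and
        # prune any candidate with an infrequent (k-1)-subset (monotonicity).
        prev = list(frequent)
        candidates = set()
        for i in range(len(prev)):
            for j in range(i + 1, len(prev)):
                union = prev[i] | prev[j]
                if len(union) == k and all(
                    frozenset(s) in frequent for s in combinations(union, k - 1)
                ):
                    candidates.add(union)
        # Pass k: count only the surviving candidates.
        counts = Counter()
        for basket in baskets:
            for cand in candidates:
                if cand <= basket:
                    counts[cand] += 1
        frequent = {c: n for c, n in counts.items() if n >= min_support}
        result.update(frequent)
        k += 1
    return result


# Toy paragraphs invented for illustration; not from the Kaggle dataset.
paragraphs = [
    "The stock market fell.",
    "Stock market news today.",
    "The market opened and stock prices fell.",
    "Weather news today.",
]
baskets = [to_basket(p) for p in paragraphs]
frequent_itemsets = apriori(baskets, min_support=2)

for itemset, count in sorted(frequent_itemsets.items(), key=lambda kv: (-len(kv[0]), -kv[1])):
    print(sorted(itemset), count)
```

Counting only candidates whose subsets are all frequent is what keeps memory usage manageable: on real newspaper text the number of possible word pairs is enormous, but most are pruned after the first pass.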

This repository contains the following files:

  • Final report: AMD_Market_basket_Analysis_Cardarello_Polli.pdf
  • Google Colab notebook applying the A-Priori and SON algorithms: A_Priori_&SON_Market_basket_analysis(eng_newspapers).ipynb
  • Google Colab notebook applying the FP-Growth algorithm: FP_growth_Market_basket_analysis_(eng_newspapers).ipynb
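The SON algorithm mentioned above splits the baskets into chunks, runs an in-memory miner such as A-Priori on each chunk with a proportionally scaled threshold, and then counts the union of the locally frequent itemsets over the full data. The sketch below is an illustration under stated assumptions, not the notebook's code: a distributed implementation would run each phase as a map-reduce job, whereas here both passes are simulated in plain Python, and the function names, toy baskets and chunking scheme are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import ceil


def apriori(baskets, min_support):
    """Compact A-Priori: return the set of itemsets found in >= min_support baskets."""
    counts = Counter(item for basket in baskets for item in basket)
    frequent = {frozenset([i]) for i, c in counts.items() if c >= min_support}
    result, k = set(frequent), 2
    while frequent:
        # Join frequent (k-1)-itemsets and prune candidates with an infrequent subset.
        candidates = {
            a | b
            for a in frequent
            for b in frequent
            if len(a | b) == k
            and all(frozenset(s) in frequent for s in combinations(a | b, k - 1))
        }
        counts = Counter(c for basket in baskets for c in candidates if c <= basket)
        frequent = {c for c, n in counts.items() if n >= min_support}
        result |= frequent
        k += 1
    return result


def son(baskets, min_support, num_chunks):
    """Hypothetical SON sketch: return {itemset: count} for globally frequent itemsets."""
    n = len(baskets)
    size = ceil(n / num_chunks)
    chunks = [baskets[i:i + size] for i in range(0, n, size)]

    # Phase 1: locally frequent itemsets, with the threshold scaled to the chunk size.
    # An itemset frequent in the whole data must be locally frequent in at least one
    # chunk, so the union of local results contains no false negatives.
    candidates = set()
    for chunk in chunks:
        local_threshold = min_support * len(chunk) / n
        candidates |= apriori(chunk, local_threshold)

    # Phase 2: count every candidate over all baskets to remove false positives.
    counts = Counter(c for basket in baskets for c in candidates if c <= basket)
    return {c: cnt for c, cnt in counts.items() if cnt >= min_support}


# Toy baskets invented for illustration.
baskets = [
    frozenset({"stock", "market", "fell"}),
    frozenset({"stock", "market", "news"}),
    frozenset({"market", "stock", "prices"}),
    frozenset({"weather", "news"}),
    frozenset({"stock", "news"}),
    frozenset({"weather", "news", "today"}),
]
frequent = son(baskets, min_support=3, num_chunks=2)
print(sorted((sorted(s), c) for s, c in frequent.items()))
```

In this toy run, {"weather", "news"} is locally frequent in the second chunk but is discarded in phase 2, which is exactly the false-positive filtering the second pass exists for.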
