
An exploratory data analysis of book review data with additional metadata.




EDA Amazon Books Reviews

An exploratory analysis of the Amazon Book Reviews data. This project is extended with Collaborative Filtering models in the research: A Comparative Analysis of Amazon Book Ratings Using Collaborative Filtering.


◘ Introduction

This research aims to identify patterns and relationships among the many features available in the acquired data. The analysis is carried out with a range of Python tools and packages, so that statistical and/or machine learning models can then be applied and generalize better.

◘ Study Flowchart

[Figure: study flowchart]

◘ Project Organization

├── Makefile          				<- Makefile with commands.
├──             	<- The top-level README for developers using this project.
├── data
├── features                		<- Set of files that build a more readable and usable dataset.
│   ├──		<- Script that filters the data and keeps the important features.
│   ├──   <- File where the data is cleaned and its distribution is analyzed visually.
│   ├──    <- Script that searches for patterns and relationships among the features.
│   └──                     <- Text from the review features is processed and analyzed for better intuition.
├── figures            				<- Generated graphics and figures to be used in reporting (includes IDE- and notebook-generated graphs).
├── notebooks          			<- Additional Jupyter Notebook scripts for better visualization.
├── requirements.txt    		<- The requirements file for reproducing the analysis environment, e.g.
│                         				    generated with `pip freeze > requirements.txt`
├──           			<- Makes the project pip-installable (pip install -e .) so src can be imported.
├── visualization           		<- Scripts that create exploratory and results-oriented visualizations.
│   ├──		<- Script providing abstractions for generating simple graphs.
│   └──		<- Script for generating more specific graph types used during data inspection.
└── tox.ini            				<- tox file with settings for running tox; see
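
The first two steps described under `features` (keeping the important columns, then cleaning the data) can be sketched roughly as follows. This is only an illustration and not the project's actual script: the column names in `KEEP`, the helper `filter_and_clean`, and the sample rows are all hypothetical, and plain Python is used instead of pandas so the sketch runs without the pinned packages.

```python
import csv
import io

# Hypothetical column names; the real dataset's schema is not shown in this README.
KEEP = ["title", "score", "text"]


def filter_and_clean(rows, keep=KEEP):
    """Keep only the chosen columns and drop rows where any of them is empty."""
    cleaned = []
    for row in rows:
        # Treat missing keys and None values as empty strings, then trim whitespace.
        subset = {col: (row.get(col) or "").strip() for col in keep}
        if all(subset.values()):
            cleaned.append(subset)
    return cleaned


# Tiny made-up sample so the sketch is self-contained.
SAMPLE = """title,score,text,price
Book A,5,Great read,9.99
Book B,,No score given,4.50
Book C,3,Average,
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
cleaned = filter_and_clean(rows)

for row in cleaned:
    print(row)
```

In the sample, the row without a score is dropped, while the row without a price survives because `price` is not one of the kept columns.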

◘ Modules Required:

  • pandas 2.0.0
  • plotly 5.15.0
  • missingno 0.5.2
  • vaderSentiment 3.3.2
  • spacy 3.5.3
  • matplotlib 3.7.1
  • seaborn 0.12.2
  • wordcloud 1.9.2
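
To confirm that an environment matches the versions listed above, a small check along these lines may help. It is a minimal sketch using only the standard library (`importlib.metadata`); the helper name `check_versions` is hypothetical, and the package names are assumed to match their distribution names on PyPI.

```python
from importlib.metadata import PackageNotFoundError, version

# Pins copied from the "Modules Required" list.
REQUIRED = {
    "pandas": "2.0.0",
    "plotly": "5.15.0",
    "missingno": "0.5.2",
    "vaderSentiment": "3.3.2",
    "spacy": "3.5.3",
    "matplotlib": "3.7.1",
    "seaborn": "0.12.2",
    "wordcloud": "1.9.2",
}


def check_versions(required):
    """Return {package: (pinned, installed_or_None)} for every mismatch."""
    mismatches = {}
    for name, pinned in required.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != pinned:
            mismatches[name] = (pinned, installed)
    return mismatches


if __name__ == "__main__":
    for name, (pinned, installed) in check_versions(REQUIRED).items():
        print(f"{name}: expected {pinned}, found {installed or 'not installed'}")
```

An empty result means every listed package is installed at exactly the pinned version.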





