In [None]:
import collections
import json

import folium
import folium.plugins
import geopandas as gpd
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import plotly.express as px
import seaborn as sns


In [None]:
# Set the default font to Helvetica for matplotlib; seaborn inherits these
# rcParams. Plotly figures are not affected and need the font set separately.
plt.rcParams["font.family"] = "Helvetica"

# Table of contents

1. Motivation
    1. The datasets
    2. Why these datasets?
    3. Goals
2. Data cleaning and preprocessing
    1. Data cleaning
    2. Data filtering
    3. Data preprocessing
    4. Fundamental visualizations
3. Exploratory Data Analysis
4. Genre
5. Visualization
6. Discussion
    1. Improvements and limitations
    2. Further work
7. Contributions
8. References

# Motivation

## 1.1: The Datasets
**RateBeer Datasets**

Below are descriptions of each attribute in the RateBeer datasets. Together, the datasets cover the beers, their breweries and brewery locations, the ratings and written reviews, and the users and their locations.


## `beers.csv`

- `beer_id`: The unique identifier for each beer.
- `beer_name`: The name of the beer, which is also used in the reviews.
- `brewery_id`: The unique identifier of the brewery (also used in the reviews).
- `brewery_name`: The name of the brewery that produces the beer.
- `style`: The style category the beer belongs to.
- `nbr_ratings` : The number of ratings a beer received.
- `avg`: The average rating of a beer from 0 to 5.
- `abv`: The alcohol content of the beer (percentage).
- `overall_score`: A score out of 100 that reflects the ratings given by RateBeer users and how this beer compares to all other beers on RateBeer.
- `style_score`: A score that ranks this beer against all beers within its own style category.

> These two scores are calculated only from ratings that are accompanied by a written review of 75 or more characters. A rating doesn't count toward the final rating if the rater has left fewer than 10 ratings, if it is deemed inauthentic, derogatory, or abusive, or if it was made by a brewer or brewer affiliate. Reference: [RateBeer scores](https://www.ratebeer.com/our-scores)
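
The two mechanical criteria in the note above (review length and rater activity) can be sketched as a pandas filter. This is only an illustration on toy data: `user_nbr_ratings` is a hypothetical column assumed to hold each reviewer's total rating count (for example, merged in from `users.csv`), and the authenticity and brewer-affiliation checks cannot be reproduced from these datasets.

```python
import pandas as pd

# Toy reviews for illustration only. `user_nbr_ratings` is a hypothetical
# column: the reviewer's total number of ratings.
reviews = pd.DataFrame(
    {
        "text": ["Great beer.", "x" * 80, "y" * 120],
        "user_nbr_ratings": [250, 3, 42],
    }
)

MIN_TEXT_LENGTH = 75   # review must have 75 or more characters
MIN_USER_RATINGS = 10  # raters with fewer than 10 ratings are excluded

long_enough = reviews["text"].str.len() >= MIN_TEXT_LENGTH
active_rater = reviews["user_nbr_ratings"] >= MIN_USER_RATINGS

# Keep only reviews that satisfy both criteria.
eligible = reviews[long_enough & active_rater]
```

Here only the third toy review passes: the first is too short and the second comes from a rater with too few ratings.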


---

## `breweries.csv`

- `id`: The brewery's unique identifier, which corresponds to `brewery_id` in `beers.csv`.
- `location`: The location of the brewery.
- `name`: The name of the brewery.
- `nbr_beers`: The number of beers the brewery produces.
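
Because the brewery `id` matches `brewery_id` in the beers table, the two tables can be joined to attach a location to every beer. The sketch below uses small made-up rows rather than the real files, and the variable name `beers_with_location` is chosen here for illustration.

```python
import pandas as pd

# Toy rows for illustration only; the real tables come from the CSV files.
beers = pd.DataFrame(
    {
        "beer_id": [1, 2, 3],
        "beer_name": ["Pale Ale", "Stout", "Lager"],
        "brewery_id": [10, 10, 20],
    }
)
breweries = pd.DataFrame(
    {
        "id": [10, 20],
        "name": ["Brewery A", "Brewery B"],
        "location": ["Belgium", "Germany"],
        "nbr_beers": [2, 1],
    }
)

# Left join keeps every beer; validate checks that each beer maps to at most
# one brewery row (brewery ids must be unique in `breweries`).
beers_with_location = beers.merge(
    breweries[["id", "location"]],
    left_on="brewery_id",
    right_on="id",
    how="left",
    validate="many_to_one",
).drop(columns="id")
```

A left join is used so that beers whose brewery is missing from the breweries table are kept, with a missing `location`, rather than silently dropped.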

---

## `users.csv`

- `user_id`: The unique identifier of the user.
- `nbr_ratings`: The number of ratings the user has posted on the website.
- `user_name`: The username.
- `joined`: The date when the user joined the website.
- `location`: The user's location.

--- 

## `reviews.csv`

- `beer_name` and `beer_id`: The name and unique identifier of the beer, matching those in `beers.csv`.
- `brewery_name` and `brewery_id`: The name and identifier of the brewery, matching those in `breweries.csv`.
- `style`: The type of beer, categorized into one of 104 beer styles.
- `abv`: The alcohol by volume of the beer (%).
- `date`: When the review was posted.
- `user_name` and `user_id`: The username and identifier of the reviewer, matching those in `users.csv`.
- `text`: The written review provided by the user.

### Rating System

RateBeer employs a detailed rating system in which users evaluate several aspects of the beer. Each rating is composed of the following attributes:

- `appearance`
- `aroma`
- `mouthfeel`
- `taste`
- `overall`

On RateBeer, *Appearance* and *Mouthfeel* are each scored out of 5, *Aroma* and *Taste* are each scored out of 10, and *Overall* is scored out of 20. These combine into a total score out of 50, which is then divided by 10 and displayed as a score out of 5 for each rating. Reference [here](https://www.ratebeer.com/our-scores).
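
As a worked example of this arithmetic, the sketch below combines the five aspect scores into the displayed rating. The function name `rating_out_of_5` and the example scores are made up for illustration; only the per-aspect maxima and the division from 50 down to 5 come from the description above.

```python
# Maximum score per aspect, as described above (5 + 10 + 5 + 10 + 20 = 50).
MAX_SCORES = {
    "appearance": 5,
    "aroma": 10,
    "mouthfeel": 5,
    "taste": 10,
    "overall": 20,
}


def rating_out_of_5(scores):
    """Combine the five aspect scores into a single rating out of 5."""
    for aspect, maximum in MAX_SCORES.items():
        if scores[aspect] > maximum:
            raise ValueError(f"{aspect} cannot exceed {maximum}")
    total = sum(scores[aspect] for aspect in MAX_SCORES)  # out of 50
    return total / 10  # rescale 50 -> 5


# Made-up example scores: 4 + 7 + 3 + 8 + 15 = 37 out of 50.
example = {"appearance": 4, "aroma": 7, "mouthfeel": 3, "taste": 8, "overall": 15}
rating = rating_out_of_5(example)
```

With these example scores the total is 37 out of 50, so the displayed rating is 3.7.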