In [1]:
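# Import only the dataset classes this notebook uses instead of a wildcard import
from src.data.datasets import BeerAdvocateDataset, RateBeerDataset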
from src.scripts import notation_system

In [2]:
DATA_FOLDER = "data/"
GENERATED_FOLDER = "generated/"
rb_dataset = RateBeerDataset(data_folder=DATA_FOLDER, generated_folder=GENERATED_FOLDER, from_raw=False)
ba_dataset = BeerAdvocateDataset(data_folder=DATA_FOLDER, generated_folder=GENERATED_FOLDER, from_raw=False)

In [3]:
data_rb = rb_dataset.get_data()
data_ba = ba_dataset.get_data()

# Rating system

In [None]:
fig1, fig2 = notation_system.fig_notation_system(data_ba["reviews"], data_rb["reviews"])

### The Importance of the Rating System in Interpreting Beer Ratings

To interpret the score given to a beer, we first need to understand how each platform computes it. The same final rating on BeerAdvocate and on RateBeer does not correspond to the same topic scores. Conversely, if a user gave identical topic scores to a beer on both platforms, the final rating would still differ between the two sites.

The topics considered in these ratings are **"appearance," "aroma," "palate," "taste," and "overall."** Note that "overall" is itself one of the topics and is distinct from the final rating. Concretely, the final rating is a weighted average of the five topic scores. Our objective is to determine these weights.

This relationship can be analyzed through a **linear regression (without intercept)**. Formally, we aim to find the coefficients $(\beta_1, \beta_2, \beta_3, \beta_4, \beta_5)$ such that:

$$
\text{rating} = \beta_1 \cdot \text{appearance} + \beta_2 \cdot \text{aroma} + \beta_3 \cdot \text{palate} + \beta_4 \cdot \text{taste} + \beta_5 \cdot \text{overall}
$$
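
The fit described above can be sketched with ordinary least squares. This is a minimal illustration, assuming NumPy; the helper `fit_topic_weights` and the synthetic data are hypothetical and are not the implementation used in `notation_system`.

```python
import numpy as np

TOPICS = ["appearance", "aroma", "palate", "taste", "overall"]


def fit_topic_weights(topic_scores, ratings):
    """Least-squares fit of rating = X @ beta with no intercept column.

    topic_scores: array of shape (n_reviews, 5), one column per topic.
    ratings: array of shape (n_reviews,), the final rating of each review.
    """
    X = np.asarray(topic_scores, dtype=float)
    y = np.asarray(ratings, dtype=float)
    # No column of ones is added, so the model has no intercept.
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


# Synthetic check: generate ratings from made-up weights and recover them.
rng = np.random.default_rng(0)
X_demo = rng.uniform(0.0, 1.0, size=(200, len(TOPICS)))
true_beta = np.array([0.1, 0.2, 0.1, 0.3, 0.3])  # made-up weights, not real results
y_demo = X_demo @ true_beta

beta_hat = fit_topic_weights(X_demo, y_demo)
print(dict(zip(TOPICS, beta_hat.round(3).tolist())))
```

Because the synthetic ratings are an exact linear combination of the topic scores, the recovered coefficients match the made-up weights up to floating-point error.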

### Normalization

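Before fitting, it is crucial to **normalize** the topic scores so that all values fall between 0 and 1. Topics may be scored on different scales, and without a common scale the fitted coefficients could not be compared as relative weights.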
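
One common way to do this is min-max scaling. The sketch below assumes NumPy; the helper `min_max_normalize` and the scale bounds in `bounds` are hypothetical placeholders and should be replaced with the actual scale of each topic on each platform.

```python
import numpy as np


def min_max_normalize(scores, lower, upper):
    """Map scores from the interval [lower, upper] onto [0, 1]."""
    scores = np.asarray(scores, dtype=float)
    return (scores - lower) / (upper - lower)


# Hypothetical example: two topics scored on different scales.
raw_scores = {
    "aroma": [1, 5, 10],
    "palate": [1, 3, 5],
}
bounds = {"aroma": (1, 10), "palate": (1, 5)}  # assumed (min, max) per topic

normalized = {
    topic: min_max_normalize(values, *bounds[topic])
    for topic, values in raw_scores.items()
}
print(normalized["palate"].tolist())  # [0.0, 0.5, 1.0]
```

Using the known bounds of each scale, rather than the minimum and maximum observed in the data, keeps the mapping identical across datasets.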

### Results

We find the following $\beta$ parameters for the topics:

In [None]:
fig1.show()

### Analysis of the BeerAdvocate and RateBeer Rating Systems

BeerAdvocate and RateBeer assign similar weights to **appearance**, **aroma**, and **palate** (approximately 6%-9%, 20%-24%, and 10%, respectively). These criteria contribute comparably to the overall rating on both platforms.

However, significant differences emerge for **taste** and **overall**:  
- BeerAdvocate gives **taste** twice the weight that RateBeer does (40% versus 20%), so the perception of taste is decisive for a high rating on BeerAdvocate.  
- Conversely, RateBeer gives **overall** roughly twice the weight that BeerAdvocate does (42% versus 20%), emphasizing a more general, holistic evaluation.

### Implications
- **BeerAdvocate** prioritizes sensory aspects like **taste**, making it key for achieving high ratings.  
- **RateBeer** takes a broader approach, with the **overall** impression being the most critical factor.

This distinction helps explain why the same beer may receive different scores across the two platforms, even with similar topic-level ratings.
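
As a worked illustration, the sketch below applies two sets of weights to the same normalized topic scores. The weights are rounded from the percentages quoted above, not the exact fitted coefficients; the text only gives ranges for appearance and aroma, so assigning 6% and 24% to BeerAdvocate and 9% and 20% to RateBeer is an assumption, and the review scores are made up.

```python
import numpy as np

TOPICS = ["appearance", "aroma", "palate", "taste", "overall"]

# Approximate weights rounded from the text (assumed split of the quoted ranges).
ba_weights = np.array([0.06, 0.24, 0.10, 0.40, 0.20])
rb_weights = np.array([0.09, 0.20, 0.10, 0.20, 0.42])  # rounded, so the sum is 1.01

# Made-up normalized topic scores for one review: strong taste, weaker overall.
scores = np.array([0.8, 0.6, 0.7, 0.9, 0.5])

ba_rating = float(ba_weights @ scores)
rb_rating = float(rb_weights @ scores)

print(f"BeerAdvocate-style rating: {ba_rating:.3f}")  # 0.722
print(f"RateBeer-style rating:     {rb_rating:.3f}")  # 0.652
```

With these assumed numbers, a review that scores high on taste but lower on overall ends up with a higher rating under the BeerAdvocate-style weights than under the RateBeer-style weights.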


The linear regressions yield an $R^2$ of **1.00** for both platforms. This indicates that the five topic scores explain essentially **100%** of the variation in the final rating. Below are the predicted ratings compared to the real ratings:


In [None]:
fig2.show()
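
For reference, $R^2$ can be computed directly from the real and predicted ratings. The sketch below assumes NumPy and uses the usual centered definition; the helper `r_squared` is hypothetical, and `notation_system` may compute the value differently. For instance, statsmodels reports an uncentered $R^2$ for models fitted without an intercept.

```python
import numpy as np


def r_squared(y_true, y_pred):
    """Centered coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
    return float(1.0 - ss_res / ss_tot)


# Made-up ratings for illustration only.
y_true = np.array([1.0, 2.0, 3.0, 4.0])

perfect = r_squared(y_true, y_true)                           # exact predictions
baseline = r_squared(y_true, np.full(4, y_true.mean()))       # always predict the mean
print(perfect, baseline)  # 1.0 0.0
```

A value of 1.0 means the predictions reproduce the ratings exactly, while 0.0 means the model does no better than always predicting the mean rating.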

### Conclusion

The linear regression supports the topic weightings found for both BeerAdvocate and RateBeer. With an $R^2$ of **1.00** on both platforms, the five topics (**appearance**, **aroma**, **palate**, **taste**, and **overall**) determine the final rating almost entirely. The coefficients reflect the relative importance of each topic and highlight the key difference between the platforms: BeerAdvocate prioritizes **taste**, while RateBeer emphasizes the **overall** impression. These findings clarify how each platform computes its ratings and which factors drive its scoring system.
