jjsmu/group16

Introduction

Letter to the CEO and CFO of Budweiser,

We extend our sincere gratitude for entrusting us with this comprehensive data set encompassing a diverse range of beers and breweries. Our team has meticulously analyzed the information, employing advanced statistical techniques, while ensuring the findings are explained in a manner that aligns with your esteemed company profile.

In our journey through the data, we examined the fine details of beer varieties, exploring attributes such as Alcohol By Volume (ABV), International Bitterness Units (IBU), and Style. Our focus has been on drawing meaningful insights into how beers are conventionally categorized, particularly Ale and IPA style beers.

We used the k-Nearest Neighbors (kNN) algorithm to classify beers, and k-fold cross-validation to validate our models and check that our results are reproducible.

Our analysis is presented through clear, concise visualizations that help you understand the trends and correlations. We are confident that it will serve as an asset in your strategic decision-making processes, enhancing Budweiser's position in the competitive landscape of the brewing industry.

We look forward to discussing these findings in detail, and exploring how these insights can be leveraged to propel Budweiser to new heights of success and innovation.

Best Regards,

Group 6

Presentation for the CEO and CFO of Budweiser (group1-6)

Purpose: Budweiser Case Study 01

  • This case study examines the distributions and the central tendency (mean and median) of the beers sold by the breweries listed in the dataset. The two datasets, Beers.csv and Breweries.csv, are combined to study the relationship between alcohol by volume and bitterness and to understand current trends in the alcoholic beverage industry in the United States. A KNN machine learning model is implemented for classification and regression tasks to predict beer style from its alcohol and bitterness content. Finally, k-fold cross-validation is used to check the validity of our model. In essence, the purpose of this case study is to analyze the current market segment and to classify and predict beer style trends regionally across the US.
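The k-fold cross-validation step mentioned above can be sketched as follows. This is a minimal illustration, not the code in EDA.rmd: the data are simulated (they are not values from Beers.csv), and the use of `knn()` from the `class` package, five folds, and k = 5 neighbors are all assumptions made for the example.

```r
# Sketch of k-fold cross-validation around a kNN classifier.
# The data are simulated for illustration; they are NOT values from Beers.csv.
library(class)  # provides knn(); assumed here, not confirmed as the authors' choice

set.seed(42)
n <- 100
toy <- data.frame(
  ABV   = c(rnorm(n / 2, 0.055, 0.008), rnorm(n / 2, 0.070, 0.008)),
  IBU   = c(rnorm(n / 2, 30, 8),        rnorm(n / 2, 70, 8)),
  Style = factor(rep(c("Ale", "IPA"), each = n / 2))
)

n_folds <- 5
fold_id <- sample(rep(seq_len(n_folds), length.out = n))  # random fold labels

fold_accuracy <- sapply(seq_len(n_folds), function(f) {
  held_out <- fold_id == f
  # Standardise with the training folds only, then apply the same
  # centre and scale to the held-out fold.
  train_x <- scale(toy[!held_out, c("ABV", "IBU")])
  test_x  <- scale(toy[held_out, c("ABV", "IBU")],
                   center = attr(train_x, "scaled:center"),
                   scale  = attr(train_x, "scaled:scale"))
  pred <- knn(train_x, test_x, cl = toy$Style[!held_out], k = 5)
  mean(pred == toy$Style[held_out])  # accuracy on the held-out fold
})

cv_accuracy <- mean(fold_accuracy)  # average accuracy across folds
print(round(fold_accuracy, 3))
print(cv_accuracy)
```

Each observation is held out exactly once, so the averaged accuracy is less dependent on one lucky or unlucky train/test split.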

Summary Memo: Analysis of Breweries and Beer Data

Objective: The analysis aims to provide insights into the breweries' distribution, beer characteristics, and the relationship between beer bitterness and alcohol content.

Breweries Distribution: We will determine the number of breweries present in each US state.
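A count of breweries per state can be obtained with a frequency table. The sketch below uses a hypothetical six-row data frame with the `Brew_ID`, `Name`, and `State` columns from the data dictionary; the rows are made up and the actual analysis lives in EDA.rmd.

```r
# Hypothetical rows shaped like Breweries.csv; not real records.
breweries <- data.frame(
  Brew_ID = 1:6,
  Name    = paste("Brewery", LETTERS[1:6]),
  State   = c("CO", "CO", "MN", "MI", "CO", "MN")
)

# Frequency of each state, largest first.
breweries_per_state <- sort(table(breweries$State), decreasing = TRUE)
print(breweries_per_state)
```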

Beer Characteristics by State: We'll compute and visualize the median alcohol content (ABV) and bitterness (IBU) for beers in each state using a bar chart.
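One way to compute the per-state medians is base R's `aggregate()`, passing `na.rm = TRUE` so that beers with a missing IBU do not turn the whole state into `NA`. The data frame below is a made-up stand-in for the merged data, and the name `merged` is an assumption for this sketch.

```r
# Hypothetical merged beer/brewery rows; values are illustrative only.
merged <- data.frame(
  State = c("CO", "CO", "CO", "MN", "MN", "MI"),
  ABV   = c(0.050, 0.070, 0.065, 0.045, 0.055, 0.060),
  IBU   = c(20, 65, NA, 15, 35, 40)
)

# Median ABV and IBU per state; na.rm = TRUE is forwarded to median().
state_medians <- aggregate(
  merged[, c("ABV", "IBU")],
  by  = list(State = merged$State),
  FUN = median,
  na.rm = TRUE
)
print(state_medians)
```

A bar chart of the result could then be drawn with, for example, `barplot(state_medians$ABV, names.arg = state_medians$State)`.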

States with Extreme Beer Traits: We will identify the states producing the most alcoholic beer (highest ABV) and the most bitter beer (highest IBU).
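Finding the extreme beers amounts to locating the row with the maximum ABV and the row with the maximum IBU. A minimal sketch on made-up rows (the beer names, states, and values are hypothetical and say nothing about the real results):

```r
# Hypothetical merged rows; not results from the actual data.
merged <- data.frame(
  Name  = c("Beer A", "Beer B", "Beer C", "Beer D"),
  State = c("CO", "MI", "MN", "CO"),
  ABV   = c(0.050, 0.090, 0.070, NA),
  IBU   = c(20, NA, 92, 35)
)

# which.max() skips NA values and returns the first row on ties.
most_alcoholic <- merged[which.max(merged$ABV), c("State", "Name", "ABV")]
most_bitter    <- merged[which.max(merged$IBU), c("State", "Name", "IBU")]
print(most_alcoholic)
print(most_bitter)
```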

ABV Analysis: An analysis and commentary on the distribution and summary statistics of the ABV variable will be provided.

Bitterness vs. Alcohol Content: A scatter plot will be drawn to examine the relationship between beer bitterness (IBU) and its alcohol content (ABV). An interpretative analysis will follow.
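The strength of the IBU and ABV relationship can be summarised with a correlation coefficient and a simple linear fit. The sketch below simulates data with a built-in positive trend purely to show the mechanics; the numbers are not taken from the beer data and imply nothing about the real relationship.

```r
# Simulated data for illustration only; NOT values from Beers.csv.
set.seed(1)
ibu <- runif(80, 10, 100)
abv <- 0.045 + 0.0003 * ibu + rnorm(80, sd = 0.005)
toy <- data.frame(IBU = ibu, ABV = abv)
toy$IBU[c(3, 17)] <- NA  # mimic beers with a missing IBU

# Keep only rows where both variables are present.
complete <- toy[complete.cases(toy), ]

r   <- cor(complete$IBU, complete$ABV)  # Pearson correlation
fit <- lm(ABV ~ IBU, data = complete)   # straight-line trend
print(r)
print(coef(fit))
```

The scatter plot itself could be drawn with `plot(ABV ~ IBU, data = complete)` followed by `abline(fit)` to overlay the fitted line.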

KNN Classification for IPAs vs. Other Ales: Using the KNN classification method, we'll investigate the differences in IBU and ABV between IPAs (India Pale Ales) and other types of Ale.
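A kNN classifier of this kind can be sketched with `knn()` from the `class` package. Everything below is an assumption for illustration: the two simulated style groups, the 70/30 split, and k = 5 are not taken from EDA.rmd.

```r
# Sketch of kNN classification of IPA vs. other Ales from ABV and IBU.
# Simulated data for illustration only; NOT values from Beers.csv.
library(class)

set.seed(7)
n <- 60
ales <- data.frame(ABV = rnorm(n, 0.055, 0.008), IBU = rnorm(n, 30, 8),
                   Style = "Other Ale")
ipas <- data.frame(ABV = rnorm(n, 0.070, 0.008), IBU = rnorm(n, 70, 8),
                   Style = "IPA")
toy <- rbind(ales, ipas)
toy$Style <- factor(toy$Style)

# 70/30 train/test split.
train_idx <- sample(nrow(toy), size = round(0.7 * nrow(toy)))

# Standardise both predictors using the training set's centre and scale,
# so that IBU (tens) does not dominate ABV (hundredths) in the distance.
train_x <- scale(toy[train_idx, c("ABV", "IBU")])
test_x  <- scale(toy[-train_idx, c("ABV", "IBU")],
                 center = attr(train_x, "scaled:center"),
                 scale  = attr(train_x, "scaled:scale"))

pred <- knn(train_x, test_x, cl = toy$Style[train_idx], k = 5)

confusion <- table(Predicted = pred, Actual = toy$Style[-train_idx])
accuracy  <- sum(diag(confusion)) / sum(confusion)
print(confusion)
print(accuracy)
```

Scaling matters here because kNN relies on Euclidean distance, and the two variables are measured on very different scales.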

Additional Insights: To provide Budweiser with a unique value proposition, we will derive an additional inference from the data (regarding naming conventions and beer sizes) to give Budweiser a thorough understanding of the U.S. craft beer landscape and guide future strategic decisions.

Summary memo written with the aid of ChatGPT.

Code

Codebook For Budweiser Case study

OS

  • R version 4.3.1 (2023-06-16 ucrt)
  • Platform: x86_64-w64-mingw32/x64 (64-bit)
  • Running under: Windows 10 x64 (build 19045)

Data

  • Data sets

  • Data sets obtained from Client: Beers & Breweries

  • Also saved in this repository as of October 9, 2023: Beers & Breweries

  • The datasets provided consist of information on 2410 US craft beers and 558 US breweries with details such as beer name, ABV, IBU, and brewery location.

Data Dictionary for Beer:

| Variable | Label | Data Type | Missing Data Code | Example Values |
|---|---|---|---|---|
| Name | Name of beer | String | empty cells | Blood Orange Gose, Summer Solstice Cerveza Crema (2009) |
| Beer_ID | Unique identifier for beer | Integer | empty cells | 35, 767, 1712 |
| ABV | Alcohol by volume, stored as a proportion (0.09 = 9%) | Decimal | empty cells | 0.09, 0.125 |
| IBU | International Bitterness Units | Integer | empty cells | 92, 17, 4 |
| Brewery_id | Unique identifier for brewery | Integer | empty cells | 409, 2, 73 |
| Style | Style of beer | Nominal | empty cells | American IPA, American Pale Ale (APA) |
| Ounces | Amount of beer in an individual can/bottle (ounces) | Decimal | empty cells | 12, 8.4, 16 |

Data Dictionary for Brewery:

| Variable | Label | Data Type | Missing Data Code | Example Values |
|---|---|---|---|---|
| Brew_ID | Unique identifier for brewery | Integer | empty cells | 1, 20, 558 |
| Name | Name of brewery | String | empty cells | NorthGate Brewing, Avery Brewing Company |
| City | City where brewery is located | String | empty cells | Seven Points, Portland |
| State | U.S. state (abbreviation) where brewery is located | String | empty cells | MN, MI, CO |

Data preparation

  • Data Merging: The beer data will be merged with the breweries' data.

  • Handling Missing Data: We address missing values across all columns. For the purpose of this analysis, we treated the missing data as MCAR (Missing Completely At Random). Please note that many of the missing values pertain to IBU. Filtering is addressed in the EDA.rmd file.

  • Key variable: The key variable that connects the data sets is Brewery_id in Beers.csv and Brew_ID in Breweries.csv. We renamed variables so that the column names match before merging.

  • Refer to EDA.rmd file for code relating to merging, transformation and analysis.
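The preparation steps above can be sketched as follows. This is a hedged illustration rather than the code in EDA.rmd: the rows are made up, the new column names `Beer_Name` and `Brewery_Name` are hypothetical, and dropping incomplete rows is just one way of filtering that is consistent with an MCAR assumption.

```r
# Hypothetical rows shaped like Beers.csv and Breweries.csv; not real records.
beers <- data.frame(
  Name       = c("Beer A", "Beer B", "Beer C"),
  Beer_ID    = c(1, 2, 3),
  ABV        = c(0.05, NA, 0.07),
  IBU        = c(20, 45, NA),
  Brewery_id = c(1, 2, 2)
)
breweries <- data.frame(
  Brew_ID = c(1, 2),
  Name    = c("Brewery X", "Brewery Y"),
  City    = c("City X", "City Y"),
  State   = c("MN", "CO")
)

# Rename the key so both tables share it, and disambiguate the two Name columns.
names(beers)[names(beers) == "Brewery_id"] <- "Brew_ID"
names(beers)[names(beers) == "Name"]       <- "Beer_Name"
names(breweries)[names(breweries) == "Name"] <- "Brewery_Name"

merged <- merge(beers, breweries, by = "Brew_ID")

# Count missing values per column, then keep rows with both ABV and IBU.
missing_counts <- colSums(is.na(merged))
complete_rows  <- merged[!is.na(merged$ABV) & !is.na(merged$IBU), ]
print(missing_counts)
print(complete_rows)
```

With the real files, the two data frames would instead come from something like `read.csv("Beers.csv")` and `read.csv("Breweries.csv")`.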
