# Introduction 


As someone who has been coding in R for years, I have slowly started to explore Python as well. I am sharing this notebook as a way to "get my hands dirty" with some EDA in Python. While I love R and the R community, it never hurts to learn something new!

So let's start by importing the modules I will use:

In [None]:
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import os
import matplotlib.pyplot as plt
import seaborn as sns
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))


### And let's read in the data

In [None]:
beerRaw = pd.read_csv("../input/craft-cans/beers.csv")
# The breweries table lives in its own file, breweries.csv
breweriesRaw = pd.read_csv("../input/craft-cans/breweries.csv")

# Explore

I will first merge the two dataframes into a single dataframe called `beerDf`. I will use `merge()` on `brewery_id`.

In [None]:
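print(list(beerRaw.columns.values))

print(list(breweriesRaw.columns.values))

# Add a brewery_id key to breweriesRaw from its row index
# (this assumes the row position matches brewery_id in beerRaw)
breweriesRaw['brewery_id'] = breweriesRaw.index

# Join on brewery_id only. Columns present in both frames (such as name)
# get a "_beer" suffix on the beer side, so plain `name` is the brewery name.
beerDf = pd.merge(beerRaw, breweriesRaw, how = "outer", on = "brewery_id",
                  suffixes = ("_beer", ""))

# head is a method, so it needs parentheses to return the first rows
print(beerDf.head())

# breweriesRaw now includes the new brewery_id column
print(list(breweriesRaw.columns.values))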
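
To make the join easier to follow, here is a minimal sketch of the same kind of merge on two tiny made-up tables. The rows, values, and the `toy_` names are invented for illustration and are not from the craft-cans data. It assumes, as the cell above does, that a brewery's row position serves as its `brewery_id`.

```python
import pandas as pd

# Toy stand-ins for the two CSV files (values are invented for illustration)
toy_beers = pd.DataFrame({
    "name": ["Beer A", "Beer B", "Beer C"],
    "abv": [0.05, 0.07, 0.06],
    "brewery_id": [0, 1, 1],
})
toy_breweries = pd.DataFrame({
    "name": ["Brewery X", "Brewery Y"],
    "state": ["CO", "OR"],
})

# Use the row position as the key
toy_breweries["brewery_id"] = toy_breweries.index

# Join on the key only; suffixes keep the two "name" columns apart.
# validate="many_to_one" raises an error if a brewery_id repeats on the right.
toy_merged = pd.merge(
    toy_beers,
    toy_breweries,
    on="brewery_id",
    how="left",
    suffixes=("_beer", "_brewery"),
    validate="many_to_one",
)

print(toy_merged)
```

Naming the key with `on=` matters here because both tables have a `name` column. Without it, pandas joins on every shared column name, which is probably not what is wanted.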

Now I want to take a quick look at some descriptive statistics for the dataset.

In [None]:
# What columns are in our dataset? 

print(beerDf.columns.values)

In [None]:
# Digging a little deeper
print(beerDf.ibu.describe())

In [None]:
# Describe with grouping
beerDf.groupby(by = "style").describe()

With `groupby()` we can see the descriptive statistics for each style.
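
Calling `describe()` on a grouped dataframe summarises every numeric column, so the output gets wide quickly. As a rough sketch on made-up data (the rows below are invented, not taken from the dataset), selecting columns first or using `agg()` keeps the result narrow:

```python
import pandas as pd

# Invented rows, just to show the shape of the output
toy = pd.DataFrame({
    "style": ["IPA", "IPA", "Stout", "Stout", "Stout"],
    "abv": [0.06, 0.07, 0.05, 0.08, 0.06],
    "ibu": [60.0, 70.0, 30.0, None, 40.0],
})

# Summary statistics for a single column, one row per style
abv_by_style = toy.groupby("style")["abv"].describe()

# Hand-picked statistics for several columns; missing values are skipped
summary = toy.groupby("style")[["abv", "ibu"]].agg(["count", "mean", "median"])

print(abv_by_style)
print(summary)
```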

In [None]:
# Counts of the breweries in the dataset 
beerDf['name'].value_counts()


In [None]:
# What are the most common styles?

beerDf['style'].value_counts()[:15]
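
`value_counts()` sorts from most to least frequent by default, which is why slicing with `[:15]` gives the 15 most common styles. A small sketch on an invented series shows two variations I found handy, proportions and counting missing values:

```python
import pandas as pd

# Invented values; None stands in for a missing style
styles = pd.Series(["IPA", "IPA", "IPA", "Stout", "Stout", "Porter", None])

top2 = styles.value_counts().head(2)              # same idea as [:2]
shares = styles.value_counts(normalize=True)      # proportions of non-missing values
with_missing = styles.value_counts(dropna=False)  # also counts missing values

print(top2)
print(shares)
print(with_missing)
```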

Hefeweizens are my favorite beers, so I will subset the data and take a quick glance at some of their characteristics.

In [None]:
hefeDf = beerDf[beerDf['style'] == "Hefeweizen"]

hefeDf
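
The subsetting above uses a boolean mask, which is the closest pandas equivalent I know to `filter()` in R. Here is a minimal sketch on invented rows. The `toy_` names are hypothetical, and the looser `str.contains` match is just one option, not something the analysis above relies on.

```python
import pandas as pd

# Invented rows for illustration
toy = pd.DataFrame({
    "name": ["Beer A", "Beer B", "Beer C", "Beer D"],
    "style": ["Hefeweizen", "American IPA", "Hefeweizen", None],
    "abv": [0.05, 0.07, 0.052, 0.06],
})

# Boolean Series: True where the style matches; missing styles compare as False
mask = toy["style"] == "Hefeweizen"

# .copy() gives an independent dataframe that is safe to modify later
toy_hefe = toy.loc[mask].copy()

# Looser match: any style containing "hefeweizen", ignoring case
toy_hefe_like = toy[toy["style"].str.contains("hefeweizen", case=False, na=False)]

print(toy_hefe)
```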

In [None]:
# Average ABV of a hefeweizen
hefeDf.abv.mean()

In [None]:
# Median ABV of a hefeweizen
hefeDf.abv.median()

In [None]:
# Average IBU of a hefeweizen 
hefeDf.ibu.mean()

In [None]:
# Median IBU of a hefeweizen
hefeDf.ibu.median()

In [None]:
hefeDf.describe() # Same info with describe()
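
One thing worth knowing when comparing these numbers: pandas skips missing values by default in `mean()` and `median()`. The sketch below uses invented values to show that behaviour, and how to get both statistics for several columns in one call:

```python
import pandas as pd

# Invented values; one IBU is missing on purpose
toy_hefe = pd.DataFrame({
    "abv": [0.050, 0.052, 0.055, 0.048],
    "ibu": [12.0, None, 15.0, 18.0],
})

# Mean and median for both columns in one call
stats = toy_hefe[["abv", "ibu"]].agg(["mean", "median"])

# Missing values are skipped by default (skipna=True)
ibu_mean = toy_hefe["ibu"].mean()

# With skipna=False the result is NaN as soon as one value is missing
ibu_mean_strict = toy_hefe["ibu"].mean(skipna=False)

# How many values were skipped?
n_missing = toy_hefe["ibu"].isna().sum()

print(stats)
print(ibu_mean, ibu_mean_strict, n_missing)
```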

# Visualizations

We can also make some visualizations to get a better sense of our data.

In [None]:
beerDf['name'].value_counts()[:20].plot(kind = "bar")

plt.title("Count of Brewers")

plt.show()


In [None]:
plt.scatter("ibu", "abv", data = beerDf)

plt.title("IBU vs. ABV")

In [None]:
beerDf['style'].value_counts()[:15].plot(kind = "bar", color = "red")

plt.title("Most Common Beer Styles")

In [None]:
# Let's also see how the abv is distributed

plt.hist("abv", 8, data = beerDf, alpha = 0.8)

plt.title("Distribution of ABV")

In [None]:
plt.hist("ibu", 11, data = beerDf, alpha = 0.8, color = "green")

plt.title("Distribution of IBU")
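
The histograms above use the quick `plt.hist()` interface. As a sketch of an alternative, the version below uses matplotlib's `Axes` object to add axis labels and drops missing values explicitly. The data is invented, and the `Agg` backend line is only there so the snippet runs outside a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Invented IBU values with one missing entry
toy = pd.DataFrame({"ibu": [10, 15, 20, 22, None, 60, 65, 70, 72, 80]})

fig, ax = plt.subplots()

# hist returns the bin counts and bin edges along with the drawn patches
counts, bin_edges, _ = ax.hist(toy["ibu"].dropna(), bins=5, alpha=0.8, color="green")

ax.set_xlabel("IBU")
ax.set_ylabel("Number of beers")
ax.set_title("Distribution of IBU (toy data)")

print(counts)
print(bin_edges)
```

Keeping the returned `counts` and `bin_edges` makes it possible to check the plot against the numbers.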

# Conclusion

Not bad for an R user! We learned a little bit more about this dataset through EDA and reinforced what we saw descriptively with some visualizations.

For example, we read that the mean IBU score was **42.7**, but the histogram of IBU looks bimodal, so the mean on its own may not describe a typical beer very well.
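
To illustrate why a mean can be misleading for a two-peaked distribution, here is a small sketch with synthetic numbers. The two clusters, their centres, and their spreads are invented for illustration and are not estimates from the beer data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic two-peaked sample: one cluster near 20, another near 65
low = rng.normal(loc=20, scale=5, size=500)
high = rng.normal(loc=65, scale=8, size=500)
sample = np.concatenate([low, high])

mean_value = sample.mean()

# Count how many observations fall within 5 units of the mean
near_mean = np.sum(np.abs(sample - mean_value) < 5)

print(f"mean = {mean_value:.1f}")
print(f"observations within 5 of the mean: {near_mean} of {sample.size}")
```

In a sample like this the mean lands in the gap between the two peaks, where very few observations actually sit.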

We also learned that the most common beer style in this dataset is **American IPA**.

I hope to continue learning more ways to do data analysis in both R and Python. But this has been a fun way to get more familiar with Python code.
