# Analyze Buc-ee's Ratings

In this notebook we will analyze the ratings of Buc-ee's locations. We will look at the distribution of ratings, the number of ratings per location, and the average rating per location.

In [47]:
# For data manipulation and analysis we import pandas
import pandas as pd

# We will use plotly for visualizations
import plotly.express as px
import plotly.graph_objects as go

In [48]:
# Import the data
buc_ees_ratings_df = pd.read_csv('../output/buc-ees_directory_with_ratings.csv')

buc_ees_ratings_df.head()

| Unnamed: 0 | name | address | directions | rating | total_ratings |
|---|---|---|---|---|---|
| 0 | #57 – Athens, AL | 2328 Lindsay Lane South Athens, AL 35613 | https://www.google.com/maps/search/2328 Lindsa... | 4.4 | 57.0 |
| 1 | #43 – Leeds, AL | 6900 Buc-ee’s Blvd. Leeds, Alabama 35094 | https://www.google.com/maps/search/6900 Buc-ee... | 4.4 | 1270.0 |
| 2 | #42 – Loxley, AL | 20403 County Rd. 68 Robertsdale, Alabama 36567 | https://www.google.com/maps/search/20403 Count... | 4.0 | 2050.0 |
| 3 | #47 – Daytona Beach, FL | 2330 Gateway North Drive Daytona Beach, FL 32117 | https://www.google.com/maps/search/2330 Gatewa... | 4.4 | 1596.0 |
| 4 | #46 – Saint Augustine, FL | 200 World Commerce Pkwy Saint Augustine, Flori... | https://www.google.com/maps/search/200 World C... | 4.4 | 1198.0 |

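The `Unnamed: 0` column looks like a saved copy of the DataFrame index: `to_csv` writes the index by default, and `read_csv` labels that headerless first column `Unnamed: 0`. Assuming that is where it came from, passing `index_col=0` turns it back into the index instead of a data column. The tiny inline CSV below is made up for illustration and is not the real file.

```python
import io

import pandas as pd

# Made-up two-row sample shaped like the real CSV (headerless index column first)
sample_csv = io.StringIO(
    ",name,rating,total_ratings\n"
    "0,#57 - Athens,4.4,57.0\n"
    "1,#43 - Leeds,4.4,1270.0\n"
)

# Default read: the saved index comes back as a column called "Unnamed: 0"
df_default = pd.read_csv(sample_csv)
print(df_default.columns.tolist())

# Rewind and read again, this time using the first column as the index
sample_csv.seek(0)
df_indexed = pd.read_csv(sample_csv, index_col=0)
print(df_indexed.columns.tolist())
```

Applied to this notebook, that would mean adding `index_col=0` to the `pd.read_csv('../output/buc-ees_directory_with_ratings.csv')` call.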

Let's clean up the data a little.

In [49]:
# Let's rename the "directions" column to "google_maps_url"
buc_ees_ratings_df.rename(columns={'directions': 'google_maps_url'}, inplace=True)

# Let's also sort the dataframe by rating
buc_ees_ratings_df.sort_values(by='rating', ascending=False, inplace=True)
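
The two `inplace=True` calls can also be written as one chained expression whose result is reassigned. This is a stylistic alternative rather than a fix; the sketch uses a small made-up DataFrame in place of `buc_ees_ratings_df`.

```python
import pandas as pd

# Made-up stand-in for buc_ees_ratings_df
raw_df = pd.DataFrame({
    'name': ['Store A', 'Store B', 'Store C'],
    'directions': ['url-a', 'url-b', 'url-c'],
    'rating': [4.0, 4.7, 3.6],
    'total_ratings': [2050.0, 4753.0, 162.0],
})

# Same two cleaning steps as above; each call returns a new DataFrame,
# so raw_df itself is left unchanged
clean_df = (
    raw_df
    .rename(columns={'directions': 'google_maps_url'})
    .sort_values(by='rating', ascending=False)
)

print(clean_df[['name', 'rating']])
```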

Let's see how many locations the dataset contains, and drop those with too few ratings for their average to be meaningful.

In [50]:
print(f'There are {len(buc_ees_ratings_df)} Buc-ee\'s locations in the dataset.')

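# Keep only locations with at least 100 total_ratings. The .copy() makes the result an independent
# DataFrame rather than a filtered slice, so the in-place sort later on can't raise a SettingWithCopyWarning
buc_ees_ratings_df = buc_ees_ratings_df[buc_ees_ratings_df['total_ratings'] >= 100].copy()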

print(f"But because we want each location to have at least 100 total ratings, we are left with {len(buc_ees_ratings_df)} locations.")

There are 44 Buc-ee's locations in the dataset.
But because we want each location to have at least 100 total ratings, we are left with 33 locations.
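
The `.0` suffix on `total_ratings` (e.g. `57.0`) means the column was loaded as floats, which is often a sign of missing values, since a default integer column cannot hold `NaN`. We haven't checked whether that applies here, but if it does, the `>= 100` filter also silently drops rows with a missing count, because any comparison with `NaN` evaluates to `False`. A sketch on made-up data that counts the two kinds of dropped rows separately:

```python
import numpy as np
import pandas as pd

# Made-up total_ratings values, including one missing count
df = pd.DataFrame({
    'name': ['A', 'B', 'C', 'D'],
    'total_ratings': [57.0, 1270.0, np.nan, 100.0],
})

missing = df['total_ratings'].isna()
too_few = df['total_ratings'] < 100  # NaN < 100 is False, so C is not counted here

print(f'{missing.sum()} missing, {too_few.sum()} with fewer than 100 ratings')

# The notebook's filter keeps only rows where the comparison is True,
# so the NaN row is dropped along with the low-count row
kept = df[df['total_ratings'] >= 100]
print(kept['name'].tolist())
```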


With cleaning out of the way, let's do some analysis.

Let's look at the distribution of ratings.

In [51]:
# Let's create a histogram of the ratings
fig = px.histogram(buc_ees_ratings_df, x='rating', nbins=20, title='Distribution of Buc-ee\'s Ratings')
fig.show()
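
Plotly treats `nbins` as an upper bound and chooses the bin edges itself, so with one-decimal ratings a single bar may end up covering more than one rating value. An exact tally per rating is a one-liner with `value_counts`; shown here on made-up ratings standing in for `buc_ees_ratings_df['rating']`:

```python
import pandas as pd

# Made-up one-decimal ratings
ratings = pd.Series([4.4, 4.4, 4.0, 4.7, 4.5, 3.4, 4.4, 4.0])

# Number of locations at each exact rating, ordered from lowest to highest rating
rating_counts = ratings.value_counts().sort_index()
print(rating_counts)
```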

In [52]:
print(f'The average rating for Buc-ee\'s is {buc_ees_ratings_df["rating"].mean():.1f}.')

The average rating for Buc-ee's is 4.2.
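
This is an unweighted mean: every location counts equally, whether it has 129 ratings or 4,753. Weighting each location by `total_ratings` instead approximates the average over individual ratings, under the assumption that each location's `rating` is a plain mean of its individual ratings (the data source may compute it differently). A sketch on made-up numbers:

```python
import numpy as np
import pandas as pd

# Made-up ratings and rating counts for three locations
df = pd.DataFrame({
    'rating': [4.7, 4.0, 3.4],
    'total_ratings': [4753.0, 2050.0, 129.0],
})

# Every location counts once
unweighted = df['rating'].mean()

# Each location's rating counts in proportion to how many ratings it received
weighted = np.average(df['rating'], weights=df['total_ratings'])

print(f'unweighted: {unweighted:.2f}, weighted: {weighted:.2f}')
```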


Now let's find the highest- and lowest-rated locations.

In [53]:
# The DataFrame is sorted by rating in descending order, so head(3) returns the three highest-rated locations
buc_ees_ratings_df.head(3)


| Unnamed: 0 | name | address | google_maps_url | rating | total_ratings |
|---|---|---|---|---|---|
| 32 | #44 – Melissa, TX | 1550 Central Texas Expressway Melissa, Texas 7... | https://www.google.com/maps/search/1550 Centra... | 4.7 | 4753.0 |
| 35 | #20 – Pearland, TX | 11151 Shadow Creek Pky Pearland, Texas 77584 | https://www.google.com/maps/search/11151 Shado... | 4.5 | 1122.0 |
| 7 | #55 – Richmond, KY | 1013 Buc-ee's Boulevard Richmond, Kentucky 40475 | https://www.google.com/maps/search/1013 Buc-ee... | 4.4 | 940.0 |

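Several locations tie at 4.4 (Leeds, Daytona Beach and Saint Augustine in the first preview also have 4.4 with well over 100 ratings), so which of them shows up third is effectively arbitrary: `sort_values` with its default `kind='quicksort'` does not guarantee any order among ties. Adding `total_ratings` as a secondary sort key makes the order deterministic and puts the location with more ratings first. A sketch on made-up data:

```python
import pandas as pd

# Made-up locations; three of them tie on rating
df = pd.DataFrame({
    'name': ['Store A', 'Store B', 'Store C', 'Store D'],
    'rating': [4.4, 4.7, 4.4, 4.4],
    'total_ratings': [940.0, 4753.0, 1596.0, 1270.0],
})

# Sort by rating first, then break ties by number of ratings (both descending)
top = df.sort_values(by=['rating', 'total_ratings'], ascending=[False, False]).head(3)
print(top['name'].tolist())
```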

In [54]:
# Re-sort by rating in ascending order so that the lowest-rated locations are at the top
buc_ees_ratings_df.sort_values(by='rating', ascending=True, inplace=True)

# Use head(3) again, this time to get the three lowest-rated locations
buc_ees_ratings_df.head(3)

| Unnamed: 0 | name | address | google_maps_url | rating | total_ratings |
|---|---|---|---|---|---|
| 22 | #7 – Freeport, TX | 4231 E. Hwy 332 Freeport, Texas 77541 | https://www.google.com/maps/search/4231 E. Hwy... | 3.4 | 129.0 |
| 28 | #29 – Lake Jackson, TX | 598 Hwy 332 Lake Jackson, Texas 77566 | https://www.google.com/maps/search/598 Hwy 332... | 3.6 | 162.0 |
| 37 | #31 – Richmond, TX | 1243 Crabb River Rd Richmond, Texas 77469 | https://www.google.com/maps/search/1243 Crabb ... | 3.7 | 230.0 |

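Re-sorting in place means the row order of `buc_ees_ratings_df` depends on which cell ran last. `DataFrame.nlargest` and `DataFrame.nsmallest` return the top or bottom rows directly and leave the original order alone. A sketch on made-up data:

```python
import pandas as pd

# Made-up ratings standing in for buc_ees_ratings_df
df = pd.DataFrame({
    'name': ['Store A', 'Store B', 'Store C', 'Store D', 'Store E'],
    'rating': [4.4, 3.4, 4.7, 3.6, 3.7],
})

# Three highest and three lowest ratings, without sorting df itself
highest = df.nlargest(3, 'rating')
lowest = df.nsmallest(3, 'rating')

print(highest['name'].tolist())
print(lowest['name'].tolist())

# The original row order is untouched
print(df['name'].tolist())
```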

Let's create a scatter plot of the number of ratings per location and the average rating per location.

In [61]:
# Scatter plot of rating vs. total_ratings, with the location name and address in the hover text
fig = px.scatter(buc_ees_ratings_df, x='total_ratings', y='rating', hover_name='name', hover_data=['address'], title='Buc-ee\'s Ratings vs. Total Ratings')

# Axis extents (padded slightly so markers at the edges aren't clipped) and the averages for the reference lines
x_max = buc_ees_ratings_df['total_ratings'].max() * 1.05
y_min = buc_ees_ratings_df['rating'].min() - 0.1
y_max = buc_ees_ratings_df['rating'].max() + 0.1
mean_total_ratings = buc_ees_ratings_df['total_ratings'].mean()
mean_rating = buc_ees_ratings_df['rating'].mean()

# Fit the axes to the range of the data
fig.update_layout(xaxis_range=[0, x_max], yaxis_range=[y_min, y_max])

# Horizontal red line at the average rating
fig.add_shape(type='line', x0=0, x1=x_max, y0=mean_rating, y1=mean_rating, line=dict(color='red', width=2))

# Vertical red line at the average total_ratings
fig.add_shape(type='line', x0=mean_total_ratings, x1=mean_total_ratings, y0=y_min, y1=y_max, line=dict(color='red', width=2))

fig.show()
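
To put a number on the relationship the scatter plot shows, `Series.corr` gives the Pearson correlation between the two columns. A rank-based (Spearman) correlation is less sensitive to the few locations with very large rating counts; computing it as the Pearson correlation of the ranks avoids the SciPy dependency that `corr(method='spearman')` on a Series pulls in. The values below are invented and say nothing about the real data.

```python
import pandas as pd

# Made-up rating counts and ratings for six locations
df = pd.DataFrame({
    'total_ratings': [129.0, 162.0, 230.0, 940.0, 1270.0, 4753.0],
    'rating': [3.4, 3.6, 3.7, 4.4, 4.4, 4.7],
})

# Pearson correlation (pandas' default): strength of a linear relationship
pearson = df['total_ratings'].corr(df['rating'])

# Spearman correlation: Pearson correlation of the ranks (ties get their average rank)
spearman = df['total_ratings'].rank().corr(df['rating'].rank())

print(f'Pearson: {pearson:.2f}, Spearman: {spearman:.2f}')
```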