<h2 style='text-align:center; color:darkblue'><b>Turning Criticism into Opportunity: Proactive Strategies to Tackle Negative Feedback at Starbucks</b></h2>
<h5 style="text-align: center;">Shafanda Nabil Sembodo</h5>

In [8]:
# Importing necessary libraries
import pandas as pd 
import warnings 

# Hide warning issues to prevent unnecessary output clutter
warnings.filterwarnings("ignore")

# Setting the maximum column width for pandas display to 100 characters
pd.options.display.max_colwidth = 100

<a id="numerical"></a>
## <b><span style='color:darkturquoise'>Section 1 |</span><span style='color:darkblue'> Business Understanding</span></b>

____

In any data-driven project, a deep understanding of the business context is crucial to ensure that the solutions address the most pressing challenges effectively. For Starbucks, customer feedback is essential, as it directly impacts brand reputation, customer satisfaction, and long-term profitability. Negative reviews, particularly those with low ratings, can significantly influence customer retention and acquisition costs. Without a clear comprehension of the underlying business dynamics, it becomes difficult to identify meaningful insights, define relevant objectives, or develop actionable strategies. In this section, we will explore the urgency of establishing a robust business understanding to guide the analysis of Starbucks reviews and maximize its impact on improving customer satisfaction and operational performance.

<a id="basic"></a>
### <b><span style='color:darkblue'> 1.1 Business Context</span></b>

Starbucks is a global leader in the coffee industry, recognized for its high-quality products and premium customer experiences. However, even industry leaders face challenges, and recent analysis indicates a trend of low customer ratings and negative feedback across multiple locations in the United States. These ratings often stem from customer dissatisfaction with factors such as service quality, product consistency, or ambiance. Negative feedback, especially when aggregated, can lead to tangible business risks, such as:

- **Revenue loss:** A decrease of just 1 star in average ratings can result in a 5–10% drop in revenue, underscoring the critical role of customer feedback in influencing purchasing decisions. Studies have shown that customers rely heavily on online reviews and ratings when choosing where to dine or purchase beverages, making negative feedback a significant threat to sales performance.
- **Brand erosion:** Poor reviews gradually erode brand reputation, diminishing the customer trust and loyalty that are essential for sustaining long-term success.
- **Customer attrition:** In a highly competitive coffee market with numerous alternatives, failing to address customer concerns effectively amplifies the risk of attrition, as dissatisfied patrons are more likely to shift their loyalty to competitors that prioritize their needs.

By turning criticism into actionable insights, Starbucks has an opportunity to convert negative experiences into areas of improvement, strengthening its market position and ensuring long-term customer satisfaction.

<a id="basic"></a>
### <b><span style='color:darkblue'> 1.2 Business Task</span></b>

The primary objective is to develop a data-driven strategy to analyze and address low customer ratings across U.S. Starbucks locations. This involves:

1. Analyzing historical ratings and reviews to identify recurring themes and patterns of dissatisfaction (e.g., service speed, product quality, or location cleanliness).
2. Segmenting feedback by location, time, and category (e.g., beverages, food, or ambiance) to pinpoint the specific areas with the highest impact on ratings.
3. Quantifying the relationship between specific issues highlighted in reviews and overall customer ratings. For instance, leveraging text analytics to extract insights from reviews and correlating these with numerical ratings.
4. Proposing specific and measurable actions to improve customer satisfaction, such as enhancing employee training, improving supply chain reliability, or introducing quality assurance mechanisms.

Subsequently, the insights will be translated into actionable strategies to mitigate negative feedback and improve average ratings within a year. This involves targeting critical improvement areas such as staff training, product quality, or service redesign, depending on the results of the analysis.
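
As a rough illustration of step 3, the sketch below flags reviews that mention a theme and compares the mean rating of reviews with and without that theme. The toy reviews and the keyword lists in `themes` are hypothetical and chosen only for demonstration; a real analysis would derive themes from the review text itself.

```python
import pandas as pd

# Toy data standing in for the Review and Rating columns (hypothetical values)
toy = pd.DataFrame({
    "Review": [
        "Waited 20 minutes, very slow service",
        "Great latte, friendly staff",
        "Slow line and cold coffee",
        "Cold drink, wrong order",
    ],
    "Rating": [1.0, 5.0, 2.0, 1.0],
})

# Hypothetical theme keywords; these are assumptions, not derived from the dataset
themes = {
    "service_speed": ["slow", "wait"],
    "product_quality": ["cold", "bitter"],
}

rows = []
for theme, keywords in themes.items():
    # Case-insensitive match on any keyword of the theme
    mask = toy["Review"].str.contains("|".join(keywords), case=False, regex=True)
    rows.append({
        "theme": theme,
        "mentions": int(mask.sum()),
        "mean_rating_with": toy.loc[mask, "Rating"].mean(),
        "mean_rating_without": toy.loc[~mask, "Rating"].mean(),
    })

theme_summary = pd.DataFrame(rows)
print(theme_summary)
```

A large gap between the two means for a theme hints that the issue is associated with lower ratings, although simple keyword matching cannot establish causation.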

<a id="basic"></a>
### <b><span style='color:darkblue'> 1.3 Success Metrics</span></b>

To evaluate the effectiveness of the proposed strategies, the following measurable metrics will be tracked:

**1. Rating Improvement**

* Target: Achieve a 0.5-point increase in average ratings for underperforming locations within the next 6 months.
* Measurement: Initial ratings vs. ratings after implementing improvement strategies.
* To calculate the percentage increase in average ratings:

$$\text{Rating Increase (\%)} = \frac{\text{New Average Rating} - \text{Initial Average Rating}}{\text{Initial Average Rating}} \times 100$$

Where:
- New Average Rating: The updated average rating after implementing improvements.
- Initial Average Rating: The average rating before any changes or interventions.
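
The formula above can be sketched as a small helper. The function name and the example values are illustrative assumptions; the example uses a hypothetical location moving from 2.0 to 2.5 stars, which corresponds to the 0.5-point target.

```python
def rating_increase_pct(initial_avg: float, new_avg: float) -> float:
    """Percentage change in average rating relative to the initial average."""
    if initial_avg <= 0:
        raise ValueError("initial_avg must be positive")
    return (new_avg - initial_avg) / initial_avg * 100


# Hypothetical example: 2.0 stars before, 2.5 stars after
example_increase = rating_increase_pct(2.0, 2.5)
print(f"{example_increase:.1f}%")  # 25.0%
```

Note that the same 0.5-point gain yields a smaller percentage for a location that starts from a higher average, so the absolute and relative figures are best reported together.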


**2. Reduction in Negative Reviews**

* Target: Decrease the proportion of reviews with negative sentiment (e.g., scores ≤ 2) by 25% within six months.
* Measurement: Percentage of negative reviews pre- and post-strategy implementation using sentiment analysis tools.
* The formula to calculate the percentage of negative reviews is:

$$\text{Negative Review Percentage} = \left( \frac{\text{Negative Reviews}}{\text{Total Reviews}} \right) \times 100$$

Where:
- Negative Reviews: The total number of reviews with low ratings (e.g., 1 or 2 stars).
- Total Reviews: The total number of all customer reviews.
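
A minimal sketch of this metric in pandas follows. It assumes that ratings are numeric and that unrated reviews (NaN) are excluded from the denominator; that exclusion is a choice made here for illustration, not something the formula above specifies. The function name and toy values are hypothetical.

```python
import pandas as pd


def negative_review_pct(ratings: pd.Series, threshold: float = 2.0) -> float:
    """Share of rated reviews at or below the threshold, in percent."""
    rated = ratings.dropna()  # assumption: unrated reviews are left out
    if rated.empty:
        return float("nan")
    return float((rated <= threshold).mean() * 100)


# Toy ratings: two of the four rated reviews are 1 or 2 stars
toy_ratings = pd.Series([1.0, 2.0, 5.0, 4.0, None])
print(negative_review_pct(toy_ratings))  # 50.0
```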

By achieving these measurable outcomes, Starbucks can transform customer criticism into actionable opportunities for sustained improvement and growth.

<a id="numerical"></a>
## <b><span style='color:darkturquoise'>Section 2 |</span><span style='color:darkblue'> Data Examination</span></b>

____

Data examination and preparation are foundational steps that ensure the Starbucks review dataset is clean, consistent, and meaningful. For the Starbucks review dataset, these processes are particularly vital as they uncover hidden patterns, handle missing or inconsistent data, and transform raw information into actionable insights. Without a thorough examination and proper preparation, even the most sophisticated analysis may yield unreliable or misleading results, ultimately hindering efforts to improve customer satisfaction and address negative feedback effectively.

<a id="basic"></a>
### <b><span style='color:darkblue'> 2.1 Dataset Dictionary</span></b>

The dataset provides detailed information about customer reviews and ratings of Starbucks locations in the United States. This structured data is essential for analyzing customer feedback, identifying patterns, and developing strategies to improve ratings and customer satisfaction. Please note that whether all variables will be used for data analysis and machine learning modeling will be determined based on findings during data exploration. 


In [4]:
# Import the dataset from a CSV file into a pandas DataFrame
data = pd.read_csv('../data/data.csv')

# Display a random sample of 3 rows from the dataset
data.sample(3)

| | name | location | Date | Rating | Review | Image_Links |
|---|---|---|---|---|---|---|
| 638 | Annonymouslydiscriminated | Brooklyn, NY | Reviewed Sept. 5, 2012 | 1.0 | I live and worked in Soho, NYC for years. I lived around the corner of Spring Street. I began to... | ['No Images'] |
| 775 | Lisa | Palo Alto, CA | Reviewed Aug. 13, 2009 | NaN | I ordered a 2% Grande Latte this morning, Thursday, 8/13 at the Menlo Park, CA store on Santa Cr... | ['No Images'] |
| 335 | Susan | Sag Harbor, New York | Reviewed Aug. 25, 2017 | 3.0 | Their coffee is somewhat bitter and is very overpriced. In addition, I would like to see more gl... | ['No Images'] |


Below is the dataset dictionary that defines the columns and their significance:

- `name`: The name or username of the customer who provided the review.
- `location`: The Starbucks branch or location where the review was provided, recorded as a city and state.
- `Date`: The date when the review was posted.
- `Rating`: The numeric rating given by the customer, typically on a scale of 1 to 5.
- `Review`: The textual feedback provided by the customer.
- `Image_Links`: Links to images associated with the reviews, if available.
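
The sample rows show `Date` values stored as text such as "Reviewed Sept. 13, 2023". The sketch below is one possible way to convert them to datetimes, assuming every value follows the "Reviewed <month> <day>, <year>" pattern seen in the sample and that month names are in English. The helper name `parse_review_date` is hypothetical.

```python
import pandas as pd


def parse_review_date(raw: pd.Series) -> pd.Series:
    """Convert strings like 'Reviewed Sept. 13, 2023' to datetimes (NaT on failure)."""
    cleaned = raw.str.replace("Reviewed ", "", regex=False)
    parts = cleaned.str.extract(
        r"^(?P<month>[A-Za-z]+)\.?\s+(?P<day>\d{1,2}),\s+(?P<year>\d{4})$"
    )
    # Keep the first three letters so 'Sept' and 'July' become 'Sep' and 'Jul'
    normalized = parts["month"].str[:3] + " " + parts["day"] + " " + parts["year"]
    return pd.to_datetime(normalized, format="%b %d %Y", errors="coerce")


samples = pd.Series([
    "Reviewed Sept. 13, 2023",
    "Reviewed July 16, 2023",
    "Reviewed Aug. 13, 2009",
    "not a date",
])
parsed = parse_review_date(samples)
print(parsed)
```

Values that do not match the assumed pattern become `NaT`, which makes it easy to count and inspect them before any time-based analysis.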

<a id="basic"></a>
### <b><span style='color:darkblue'> 2.2 Data Condition: Quality and Representativeness</span></b>

Undetected and unresolved data quality issues undermine the trust in, and accuracy of, the results. We therefore built a function that performs a basic sanity check covering column names, row count, data types, missing value rate, duplicate rate, unique values, and sample values.

In [6]:
# Define a function to inspect the dataframe and return a summary
def inspect_dataframe(df):
    # Print the number of rows and columns in the dataframe
    print(f'The dataframe contains {df.shape[0]} rows and {df.shape[1]} cols.')
    
    # Create a summary dictionary containing:
    # - Column names
    # - Data types of each column
    # - Percentage of missing values per column
    # - Percentage of duplicate rows in the dataframe
    # - Sample unique values for each column
    # - Count of unique values per column
    summary = {
        'ColumnName': df.columns.values.tolist(),
        'DataType': df.dtypes.values.tolist(),
        'NAPct': (df.isna().mean() * 100).round(2).tolist(),  # NaN percentage per column
        'DuplicatePct': (df.duplicated().sum()/len(df)*100).round(2),  # Percentage of duplicate rows
        'Sample': [df[col].unique() for col in df.columns],  # Sample of unique values per column
        'UniqueValue': df.nunique().tolist()  # Number of unique values per column
    }
    
    # Return the summary as a DataFrame
    return pd.DataFrame(summary)

# Call the function to inspect the dataframe and output the summary
inspect_dataframe(data)

The dataframe contains 850 rows and 6 cols.


| | ColumnName | DataType | NAPct | DuplicatePct | Sample | UniqueValue |
|---|---|---|---|---|---|---|
| 0 | name | object | 0.0 | 0.12 | [Helen, Courtney, Daynelle, Taylor, Tenessa, Alyssa, ken, Nikki, Alex, Sunny, Breggetta, Shannon... | 604 |
| 1 | location | object | 0.0 | 0.12 | [Wichita Falls, TX, Apopka, FL, Cranberry Twp, PA, Seattle, WA, Gresham, OR, Sunnyvale, TX, Spri... | 633 |
| 2 | Date | object | 0.0 | 0.12 | [Reviewed Sept. 13, 2023, Reviewed July 16, 2023, Reviewed July 5, 2023, Reviewed May 26, 2023, ... | 741 |
| 3 | Rating | float64 | 17.06 | 0.12 | [5.0, 1.0, 2.0, 3.0, 4.0, nan] | 5 |
| 4 | Review | object | 0.0 | 0.12 | [Amber and LaDonna at the Starbucks on Southwest Parkway are always so warm and welcoming. There... | 814 |
| 5 | Image_Links | object | 0.0 | 0.12 | [['No Images'], ['https://media.consumeraffairs.com/files/cache/reviews/starbucks_804950_thumbna... | 47 |


**Key Takeaways**
- The dataset comprises 850 rows and 6 columns: five `object` (string) columns and one `float64` column (`Rating`). `Date` is stored as text rather than as a datetime type, so it must be parsed before any temporal analysis. The `DuplicatePct` value of 0.12% is the share of fully duplicated rows in the whole dataframe (about 1 row in 850); it is repeated on every line of the summary and is not a per-column duplication rate.
- **Name:** There is no missing data (0.00%). The 604 unique names across 850 reviews show that many names repeat, but because the values are first names or usernames, a repeated name does not necessarily identify a repeat reviewer. The column is therefore of limited use as a customer identifier.
- **Location:** This column is complete with no missing values (0.00%), so every review includes location details. The 633 unique locations indicate a wide geographic distribution of reviews. The values are free-text entries with inconsistent formatting (for example, "Brooklyn, NY" versus "Sag Harbor, New York"), so they should be standardized before reviews are grouped by location.
- **Date:** There is no missing data (0.00%), which is crucial for temporal analysis. The 741 unique dates across 850 reviews mean that some dates carry more than one review, while feedback is otherwise spread over a long period. Once parsed, the dates can be used to track seasonal or time-based patterns, such as peak review periods or specific events leading to customer dissatisfaction.
- **Rating:** This column has a significant amount of missing data (17.06%, or 145 of 850 reviews), which must be handled carefully during data cleaning, through either imputation or exclusion, to avoid skewing the analysis. The scale ranges from 1.0 to 5.0 with 5 unique values. The missing values may reflect customers who skipped the rating step.
- **Review:** There is no missing data (0.00%). The 814 unique values across 850 reviews show that some review text repeats, which is worth inspecting for placeholders or copied entries. The remaining text is a rich source of customer sentiment: it can help uncover recurring themes, common complaints, and areas of improvement through techniques such as sentiment analysis.
- **Image Link:** This column has no null values, but the placeholder `['No Images']` marks reviews without images and should be treated as "no image" rather than as data. The 47 unique values include that placeholder, so image-based reviews are rare compared with textual feedback. Where present, images can add context for evaluating product presentation, ambiance, or store conditions.
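
The `DuplicatePct` figure in the summary comes from `df.duplicated()`, which flags whole rows. The sketch below uses toy data (hypothetical values) to show how that differs from repetition within a single column, and illustrates one possible cleaning path: dropping exact duplicate rows and excluding unrated reviews. Exclusion is only one of the options mentioned above; imputation would be an alternative.

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical): rows 0 and 1 are exact duplicates, row 2 has no rating
toy = pd.DataFrame({
    "name": ["Ann", "Ann", "Ann", "Cy"],
    "Rating": [1.0, 1.0, np.nan, 4.0],
    "Review": ["Cold coffee", "Cold coffee", "Rude staff", "Nice store"],
})

# Whole-row duplication versus repetition within one column
row_dup_pct = toy.duplicated().mean() * 100           # 25.0
name_repeat_pct = toy["name"].duplicated().mean() * 100  # 50.0

# Drop exact duplicate rows, then keep only rated reviews
deduped = toy.drop_duplicates().reset_index(drop=True)
rated = deduped.dropna(subset=["Rating"])

print(len(toy), len(deduped), len(rated))  # 4 3 2
```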

<a id="basic"></a>
### <b><span style='color:darkblue'> 2.3 Data Size: Amount and Range</span></b>

Once we have established an understanding of the different types & quality of data, we can switch our examination towards the shape and size of this data, looking at the quantitative attributes across all variables. Statistical methods will assist in describing further physical characteristics. 

In [26]:
# Convert the Rating column to categorical
data['Rating'] = pd.Categorical(data['Rating'])

# Get statistical summary for categorical var
data.describe().round(2).transpose()

| | count | unique | top | freq |
|---|---|---|---|---|
| name | 850.0 | 604.0 | Linda | 13.0 |
| location | 850.0 | 633.0 | New York, NY | 14.0 |
| Date | 850.0 | 741.0 | Reviewed Sept. 14, 2017 | 4.0 |
| Rating | 705.0 | 5.0 | 1.0 | 451.0 |
| Review | 850.0 | 814.0 | No Review Text | 37.0 |
| Image_Links | 850.0 | 47.0 | ['No Images'] | 804.0 |


**Key Takeaways**

The summary above gives an overview of the Starbucks review dataset, showing the most frequent value and its frequency for each column. Here is a breakdown of key observations:

- **Name:** The name "Linda" appears 13 times. This may point to engaged customers who review repeatedly, but because the column holds first names or usernames, it may equally reflect a common name shared by different reviewers. Repeat-review behavior should therefore be confirmed with additional fields, such as location, before it is used for customer segmentation.
- **Location:** The concentration of reviews from "New York, NY" (14 reviews) indicates that some locations are more likely to be represented, possibly due to a larger customer base or more frequent visits to Starbucks in that area. This could be explored further to identify regional trends in customer feedback.
- **Date:**  The review date "Sept. 14, 2017" appearing 4 times is quite low, indicating that the reviews are relatively spread out over time. This dispersion suggests that the dataset might cover a wide period, which could allow for trend analysis across different times (e.g., seasonal changes, yearly patterns, or special promotions).
- **Rating:** The dominance of the "1.0" rating (451 occurrences) suggests a significant number of customers had very negative experiences. This high frequency of low ratings could be a potential concern, and it would be important to analyze why customers are dissatisfied, possibly through sentiment analysis of the reviews and identifying common pain points.
- **Review:** The presence of 37 reviews with "No Review Text" implies that some customers may have submitted ratings without providing any written feedback. These blank reviews might limit the ability to perform detailed analysis on customer sentiment or reasons behind their ratings, indicating an area for improvement in gathering complete customer feedback.
- **Image Link:** The top value `['No Images']` accounts for 804 of the 850 reviews, so only 46 reviews (about 5%) include images. The 47 unique values consist of that placeholder plus 46 distinct sets of image links. Images could provide additional context, especially when a review concerns a specific store or product issue, so encouraging image submissions may offer more granular insights for quality analysis.
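
The two placeholders noted above can be handled explicitly before analysis. The sketch below uses toy data and assumes that `Image_Links` is read from the CSV as the string form of a Python list (as the sample output suggests) and that "No Review Text" is the only placeholder for empty reviews. The helper `count_images` and the example URL are hypothetical.

```python
import ast

import pandas as pd

# Toy data mirroring the two placeholder values seen in the dataset
toy = pd.DataFrame({
    "Review": ["Great service", "No Review Text", "Cold coffee"],
    "Image_Links": [
        "['No Images']",
        "['No Images']",
        "['https://example.com/a.jpg']",  # hypothetical URL
    ],
})

# Treat the text placeholder as a missing value
toy["Review"] = toy["Review"].mask(toy["Review"] == "No Review Text")


def count_images(cell: str) -> int:
    """Number of image links in a cell; the placeholder counts as zero."""
    links = ast.literal_eval(cell)  # assumption: cell is a stringified list
    return 0 if links == ["No Images"] else len(links)


toy["n_images"] = toy["Image_Links"].apply(count_images)
print(toy[["Review", "n_images"]])
```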

<a id="numerical"></a>
## <b><span style='color:darkblue'> Summary</span></b>

1. The Starbucks review dataset provides valuable insights into customer feedback, including ratings, review text, customer names, locations, and images. This data is crucial for identifying trends, understanding customer satisfaction, and improving service. The business context involves addressing negative feedback and leveraging customer insights to enhance the overall Starbucks experience. Key objectives include identifying areas for improvement based on ratings and reviews and understanding segment performance.

2. The dataset consists of 850 reviews, with 604 unique customer names and 633 unique locations. The most frequent rating is 1.0, suggesting a significant number of negative experiences. Review text is generally unique, but 37 entries lack text, limiting the depth of sentiment analysis. Image links are sparse, with most reviews not including images, which may provide more context for feedback. Understanding these patterns is essential for developing actionable strategies to address low ratings, improve customer experience, and retain customer loyalty across diverse regions. Proper data examination and preparation are crucial to ensuring the reliability of any subsequent analysis.