Business Intelligence Challenge: Strategic Analysis of Superstore Performance
Objective: This exercise is a case study in practical data analysis. You will act as a data analyst for a national retail company, tasked with analyzing the US Superstore dataset. Your goal is to move beyond descriptive plotting to produce diagnostic insights and formulate data-driven recommendations that can inform business strategy.
Key Competencies:
Strategic Inquiry: Translate broad business objectives into specific, testable questions.
Data Storytelling: Develop a clear and compelling narrative supported by visualizations.
Tool Proficiency: Differentiate between exploratory and explanatory analysis, selecting the right tool (Matplotlib for deep dives, Seaborn for clear communication).
Actionable Insights: Formulate concrete recommendations based on your analytical findings.
Project Brief
Phase 1: Data Scoping and Preparation
A reliable analysis is built on a foundation of clean, well-understood data.
Data Ingestion and Initial Assessment:
Download and load the US Superstore dataset.
Perform an initial data review using .info(), .describe(), and .isnull().sum().
Guiding Questions: What is the data type of the date columns? What is the time frame of this dataset? Are there significant gaps in the data that could compromise the analysis?
Data Cleaning and Preprocessing:
Address any missing values or duplicates. In your documentation, you must justify your methodology. Explain why you chose to drop, fill, or otherwise impute data for each specific case.
Ensure data types are correct, particularly converting date columns to datetime objects for time-series analysis.
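The review and cleaning steps above can be sketched as follows. This is a minimal illustration, not a prescribed method: it assumes pandas is available, the tiny frame is made up, and the column names (Order ID, Order Date, Sales) are assumptions about the dataset's schema. The drop-based policy shown is only one option and would still need to be justified in your documentation.

```python
import pandas as pd

# Tiny made-up sample standing in for the real file; column names are assumed.
raw = pd.DataFrame({
    "Order ID": ["A-1", "A-1", "A-2", "A-3"],
    "Order Date": ["2016-01-05", "2016-01-05", "2016-02-10", "2016-03-15"],
    "Sales": [100.0, 100.0, 250.0, None],
})

# Initial review: missing values per column and number of fully duplicated rows.
missing_per_column = raw.isnull().sum()
duplicate_rows = int(raw.duplicated().sum())

# One possible policy: drop exact duplicates, then drop rows with no Sales value.
clean = raw.drop_duplicates().dropna(subset=["Sales"]).copy()

# Convert the date column from strings to datetime64 for time-series work.
clean["Order Date"] = pd.to_datetime(clean["Order Date"], format="%Y-%m-%d")

print(missing_per_column)
print(duplicate_rows)
print(clean.dtypes)
```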
Feature Engineering:
To enable deeper analysis, create new features from the existing data. At a minimum, create:
Profit Margin calculated as (Profit / Sales) * 100.
Order Year and Order Month extracted from the primary date column.
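A possible sketch of these engineered features, assuming pandas and NumPy and a made-up frame whose column names (Order Date, Sales, Profit) mirror the ones the brief refers to. Mapping zero Sales to NaN is a choice made here for illustration, so that the margin does not become infinite.

```python
import numpy as np
import pandas as pd

# Made-up rows for illustration only.
orders = pd.DataFrame({
    "Order Date": pd.to_datetime(["2016-01-05", "2016-11-20", "2017-03-15"]),
    "Sales": [200.0, 50.0, 0.0],
    "Profit": [50.0, -10.0, 0.0],
})

# Profit Margin in percent; a zero Sales value would divide by zero,
# so it is replaced with NaN before the division.
safe_sales = orders["Sales"].replace(0, np.nan)
orders["Profit Margin"] = orders["Profit"] / safe_sales * 100

# Calendar parts extracted from the primary date column.
orders["Order Year"] = orders["Order Date"].dt.year
orders["Order Month"] = orders["Order Date"].dt.month

print(orders)
```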
Phase 2: Exploratory Analysis with Matplotlib
In this phase, you use Matplotlib for its control and flexibility to conduct a deep, interactive exploration of the data.
Time-Series Trend Investigation:
Create a line chart of total Sales aggregated by month across all years.
Technical Requirement: Enhance this chart with interactivity. Using a library like ipywidgets, add a dropdown filter for Product Category. This will allow for dynamic comparison of sales trends between categories.
Analytical Focus: Document your findings. Identify any evidence of seasonality, long-term growth or decline, and significant differences in trends between product categories.
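The data preparation behind this chart might look like the sketch below. It assumes pandas, made-up rows, and the column names Order Date, Category and Sales. The helper monthly_sales is a hypothetical name, and it reads "by month" as one total per year-month pair, which is only one interpretation of the requirement.

```python
import pandas as pd

# Made-up rows for illustration only.
orders = pd.DataFrame({
    "Order Date": pd.to_datetime(
        ["2016-01-05", "2016-01-20", "2016-02-10", "2016-02-11", "2016-03-01"]
    ),
    "Category": ["Furniture", "Technology", "Furniture", "Furniture", "Technology"],
    "Sales": [100.0, 300.0, 50.0, 25.0, 400.0],
})

def monthly_sales(df, category="All"):
    """Return total Sales per year-month, optionally for a single category."""
    if category != "All":
        df = df[df["Category"] == category]
    # Group on the month period so each year-month pair gets one total.
    months = df["Order Date"].dt.to_period("M")
    return df.groupby(months)["Sales"].sum()

print(monthly_sales(orders))
print(monthly_sales(orders, "Furniture"))
```

In the notebook, a function like this could serve as the body of the dropdown callback: the selected category is passed in, and the returned series is drawn as a Matplotlib line chart.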
Geographic Performance Analysis:
Visualize total sales by State. A sorted bar chart is a good starting point.
Technical Requirement: Create an interactive control, such as a slider, to dynamically display the "Top N" performing states. This simulates a common dashboard feature for filtering and ranking.
Analytical Focus: Identify the top revenue-generating states. Are sales concentrated in a few key markets, or are they widely distributed? Note any states that appear to be underperforming relative to their population or region.
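The "Top N" ranking and the concentration question can be sketched as below, assuming pandas, made-up rows, and the column names State and Sales. The helpers top_states and sales_share are hypothetical names introduced for this example.

```python
import pandas as pd

# Made-up rows for illustration only.
orders = pd.DataFrame({
    "State": ["California", "Texas", "California", "Ohio", "Texas", "Utah"],
    "Sales": [500.0, 200.0, 300.0, 150.0, 100.0, 50.0],
})

def top_states(df, n=10):
    """Return the n states with the highest total Sales, largest first."""
    totals = df.groupby("State")["Sales"].sum()
    return totals.nlargest(n)

def sales_share(df, n=10):
    """Fraction of all Sales contributed by the top n states."""
    return top_states(df, n).sum() / df["Sales"].sum()

print(top_states(orders, n=2))
print(round(sales_share(orders, n=2), 3))
```

A slider value could be passed as n to redraw the bar chart, and a share close to 1 for a small n would suggest that sales are concentrated in a few markets.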
Phase 3: Communicating Findings with Seaborn
Now, shift your focus from exploration to communication. Use Seaborn to create polished, presentation-ready visualizations designed to convey clear messages to a business audience.
Product Profitability Report:
Generate a horizontal bar chart displaying the Top 10 Most Profitable Products.
Presentation Requirement: This chart is intended for an executive summary. It must be clear and impactful. Use a descriptive title, label axes correctly, and annotate each bar with its corresponding profit value to eliminate ambiguity.
Discount Strategy Analysis:
Create a scatter plot to examine the relationship between Discount and Profit.
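Analytical Requirement: A simple scatter plot is insufficient. Use the hue parameter to color the data points by Product Category. This will reveal whether the impact of discounting is uniform across the business. Consider adding a regression line to clarify the trend. Note that regplot does not accept a hue parameter, so fit one line per category either with lmplot (which does accept hue) or by calling regplot once per category on the same axes.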
Analytical Focus: What is the relationship between discounts and profitability? Does this relationship vary by category? Identify the point at which discounts begin to consistently result in losses. This analysis is critical for providing actionable advice.
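One way to put a number on "the point at which discounts begin to consistently result in losses" is sketched below. It assumes pandas and the column names Category, Discount and Profit; the rows are made up and are not results from the dataset. The helper loss_threshold is a hypothetical name, and its definition of "consistently" (mean profit is negative at that discount level and at every higher level) is an assumption that you may want to refine.

```python
import pandas as pd

# Made-up rows for illustration only; these are not findings.
orders = pd.DataFrame({
    "Category": ["Furniture"] * 4 + ["Technology"] * 4,
    "Discount": [0.0, 0.2, 0.3, 0.5, 0.0, 0.2, 0.3, 0.5],
    "Profit": [40.0, 5.0, -15.0, -60.0, 90.0, 30.0, 10.0, -20.0],
})

def loss_threshold(df):
    """Per category, the lowest discount from which mean profit stays negative."""
    result = {}
    for category, group in df.groupby("Category"):
        mean_profit = group.groupby("Discount")["Profit"].mean().sort_index()
        threshold = None
        # Walk from the highest discount downward and stop at the first
        # level whose mean profit is not negative.
        for level, value in reversed(list(mean_profit.items())):
            if value >= 0:
                break
            threshold = float(level)
        result[category] = threshold
    return result

print(loss_threshold(orders))
```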
Phase 4: Methodology and Tooling Review
A key skill for a senior analyst is understanding the strengths and weaknesses of their tools.
Comparative Evaluation: In a markdown cell, provide a concise comparison of Matplotlib and Seaborn based on this project. Address the following:
Efficiency: How quickly could you create a functional vs. a presentation-quality visual in each?
Control vs. Convention: Where did Matplotlib's granular control prove essential? Where did Seaborn's high-level API accelerate your work?
Tool Selection Criteria: Based on your experience, document a clear policy for your future work. For example: "For initial, multi-faceted data exploration, I will primarily use [Library] because... For final reporting to non-technical stakeholders, I will prefer [Library] because..."
Phase 5: Final Deliverable
Your final output is a Jupyter Notebook that serves as a complete report of your analysis.
Professional Structure: Organize your notebook with clear headings, markdown explanations for each step, and clean, commented code. The notebook should be easily readable by a colleague.
Executive Summary: At the top of the notebook, provide a concise summary of your key findings and recommendations, written for a management audience. Use 3-5 bullet points to highlight the most critical insights.
Example Finding: "Analysis reveals a strong negative correlation between discount rates above 20% and profitability, particularly within the 'Furniture' category, which becomes consistently unprofitable at these discount levels."
Example Recommendation: "Recommend capping the standard discount for all furniture items at 20% and implementing a formal review process for any exceptions."
Optional Advanced Challenges
Integrated Dashboard: Use ipywidgets or Voilà to combine two or more of your charts into a single interactive dashboard view. For example, selecting a state could update a chart showing the product category breakdown for that state.
Outlier Annotation: On your Discount vs. Profit scatter plot, programmatically identify and annotate the top 3 most profitable and top 3 least profitable transactions. What products were they, and what were the circumstances?
Alternative Tooling: Re-create one of the interactive charts using Plotly Express. Briefly comment on its advantages and disadvantages compared to the Matplotlib/ipywidgets combination.
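For the outlier annotation challenge, selecting the transactions to label might be sketched as follows. It assumes pandas and the column names Product Name, Discount and Profit; the rows are made up for illustration.

```python
import pandas as pd

# Made-up rows for illustration only.
orders = pd.DataFrame({
    "Product Name": ["Copier", "Binder", "Table", "Phone", "Chair", "Printer", "Lamp"],
    "Discount": [0.0, 0.8, 0.5, 0.2, 0.3, 0.7, 0.0],
    "Profit": [900.0, -40.0, -300.0, 120.0, -25.0, -500.0, 60.0],
})

# Three highest-profit and three lowest-profit transactions.
most_profitable = orders.nlargest(3, "Profit")
least_profitable = orders.nsmallest(3, "Profit")
outliers = pd.concat([most_profitable, least_profitable])

# Each tuple holds the (x, y) position and the label text for one annotation.
annotations = list(
    zip(outliers["Discount"], outliers["Profit"], outliers["Product Name"])
)

for x, y, label in annotations:
    print(f"{label}: discount={x:.0%}, profit={y:,.0f}")
```

Each tuple could then be passed to Matplotlib's Axes.annotate, with the product name as the text and (x, y) as the point being labeled.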



In [1]:
pip install xlrd

Defaulting to user installation because normal site-packages is not writeable
You should consider upgrading via the '/Library/Developer/CommandLineTools/usr/bin/python3 -m pip install --upgrade pip' command.
Note: you may need to restart the kernel to use updated packages.


The goal is to create a class that represents a simple circle.
A Circle can be defined by either specifying the radius or the diameter.
The user can query the circle for either its radius or diameter.

Other abilities of a Circle instance:

Compute the circle’s area
Print the attributes of the circle - use a dunder method
Be able to add two circles together, and return a new circle with the new radius - use a dunder method
Be able to compare two circles to see which is bigger, and return a Boolean - use a dunder method
Be able to compare two circles to see if they are equal, and return a Boolean - use a dunder method
Be able to put them in a list and sort them
Bonus (not mandatory): Install the Turtle module, and draw the sorted circles


In [2]:
import math

class Circle:
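    def __init__(self, radius=None, diameter=None):
        # Compare against None so that an explicit 0 is not treated as "missing".
        # If both arguments are given, radius takes precedence.
        if radius is not None:
            self.radius = radius
        elif diameter is not None:
            self.radius = diameter / 2
        else:
            raise ValueError("You must provide either radius or diameter.")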

    @property
    def diameter(self):
        return self.radius * 2

    def area(self):
        return math.pi * self.radius ** 2

    def __str__(self):
        return f"Circle with radius: {self.radius:.2f}, diameter: {self.diameter:.2f}, area: {self.area():.2f}"

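    def __add__(self, other):
        # Adding two circles yields a new circle whose radius is the sum of both radii.
        if not isinstance(other, Circle):
            return NotImplemented
        return Circle(radius=self.radius + other.radius)

    def __gt__(self, other):
        if not isinstance(other, Circle):
            return NotImplemented
        return self.radius > other.radius

    def __eq__(self, other):
        if not isinstance(other, Circle):
            return NotImplemented
        return self.radius == other.radius

    def __lt__(self, other):
        # list.sort() and sorted() rely on __lt__.
        if not isinstance(other, Circle):
            return NotImplemented
        return self.radius < other.radius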

# --- Example Usage ---
c1 = Circle(radius=3)
c2 = Circle(diameter=10)

print(c1)             # Uses __str__
print(c2.area())      # Area of second circle

c3 = c1 + c2          # Adds radii
print(c3)

print(c1 > c2)        # Comparison
print(c1 == c2)

# --- Sorting ---
circles = [c2, c1, c3]
circles.sort()
for c in circles:
    print(c)


Circle with radius: 3.00, diameter: 6.00, area: 28.27
78.53981633974483
Circle with radius: 8.00, diameter: 16.00, area: 201.06
False
False
Circle with radius: 3.00, diameter: 6.00, area: 28.27
Circle with radius: 5.00, diameter: 10.00, area: 78.54
Circle with radius: 8.00, diameter: 16.00, area: 201.06
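
As a possible alternative to writing each comparison by hand, the standard library's functools.total_ordering can derive the remaining comparison methods from __eq__ and one ordering method. The sketch below uses a hypothetical, stripped-down class named OrderedCircle; it is not the class from the cell above, only an illustration of the decorator.

```python
from functools import total_ordering

@total_ordering
class OrderedCircle:
    """Hypothetical variant of Circle that derives <=, > and >= automatically."""

    def __init__(self, radius):
        self.radius = radius

    def __eq__(self, other):
        if not isinstance(other, OrderedCircle):
            return NotImplemented
        return self.radius == other.radius

    def __lt__(self, other):
        if not isinstance(other, OrderedCircle):
            return NotImplemented
        return self.radius < other.radius

small, large = OrderedCircle(3), OrderedCircle(5)

# <= and >= are supplied by total_ordering; != comes from __eq__.
print(small <= large, small >= large, small != large)
print([c.radius for c in sorted([large, small])])
```

The trade-off is less code to maintain in exchange for slightly slower derived comparisons, which is unlikely to matter for an exercise of this size.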
