# **Import section**

In [None]:
!pip install -U langchain langchain-community langchain-experimental replicate pandas


# **IBM Granite Integration section**

In [54]:
import os
from google.colab import userdata
from langchain_community.llms import Replicate

try:
    api_token = userdata.get("api_token")
    os.environ["REPLICATE_API_TOKEN"] = api_token
except userdata.SecretNotFoundError:
    print('Secret not found. Please add your Replicate API token to the secrets manager with the key "api_token".')

parameters = {
  "top_k": 50,
  "top_p": 0.9,
  "max_tokens": 512, # Lowered from 1024 to improve stability
  "min_tokens": 0,
  "repetition_penalty": 1.1,
  "temperature": 0.7
}

# Initialize the language model
llm = Replicate(
    model="ibm-granite/granite-3.3-8b-instruct",
    model_kwargs=parameters,
)
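The cell above prints a message when the secret is missing but still goes on to build the `Replicate` client. Below is a minimal sketch of a stricter lookup, assuming it is acceptable to fall back to an already exported `REPLICATE_API_TOKEN` when the notebook runs outside Colab. The helper name `load_replicate_token` is hypothetical and not part of any library.

```python
import os


def load_replicate_token(secret_name="api_token", env_var="REPLICATE_API_TOKEN"):
    """Return the Replicate token, preferring Colab secrets over the environment."""
    try:
        # google.colab only exists inside Colab; elsewhere the import fails.
        from google.colab import userdata
    except ImportError:
        userdata = None

    token = None
    if userdata is not None:
        try:
            token = userdata.get(secret_name)
        except Exception:
            # Broad on purpose: covers a missing secret or denied notebook access.
            token = None

    # Fall back to a token that was exported before the notebook started.
    token = token or os.environ.get(env_var)
    if not token:
        raise RuntimeError(
            f'No Replicate token found. Add a Colab secret named "{secret_name}" '
            f"or set the {env_var} environment variable."
        )

    os.environ[env_var] = token
    return token
```

Calling `load_replicate_token()` before constructing `llm` would stop the notebook at the point where the token is missing, rather than at the first `llm.invoke(...)` call.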

# **Dataset section**

**About Dataset**

This dataset captures daily sales transactions from a coffee shop in Cape Town (the legislative capital of South Africa), starting in March 2024.
It includes transaction timestamps, payment types (card/cash), coffee product names, and revenue per transaction.

The dataset is designed to help explore customer habits and business performance, which makes it a good fit for time series analysis, data visualization, or beginner-friendly data analytics projects.

🧰 Columns Description

| Column | What it means |
| --- | --- |
| date | Transaction date (documented as YYYY/MM/DD; the file loaded in this notebook uses DD MM YYYY) |
| datetime | Exact timestamp of the transaction |
| cash_type | Payment method (card or cash) |
| card | Anonymized customer ID (card-based loyalty) |
| money | Amount spent per transaction (in South African Rand) |
| coffee_name | Type of coffee purchased |
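The preview further down in this notebook shows `money` stored as text with an `R` prefix (for example `R38.70`) and dates written as `DD MM YYYY`. A minimal sketch of converting both to numeric and datetime types follows. The three rows are copied from that preview, and the helper name `clean_sales` is hypothetical.

```python
import pandas as pd

# Three rows copied from the notebook's DataFrame preview.
raw = pd.DataFrame(
    {
        "date": ["01 03 2024", "01 03 2024", "23 03 2025"],
        "datetime": [
            "01 03 2024 10:15:51",
            "01 03 2024 13:46:33",
            "23 03 2025 15:47:29",
        ],
        "money": ["R38.70", "R28.90", "R25.96"],
        "coffee_name": ["Latte", "Americano", "Americano"],
    }
)


def clean_sales(frame):
    """Return a copy with numeric `money` and real datetime columns."""
    cleaned = frame.copy()
    # Drop the currency prefix, then convert the remaining text to float.
    cleaned["money"] = cleaned["money"].str.replace("R", "", regex=False).astype(float)
    # An explicit day-first format avoids ambiguous month/day guessing.
    cleaned["date"] = pd.to_datetime(cleaned["date"], format="%d %m %Y")
    cleaned["datetime"] = pd.to_datetime(cleaned["datetime"], format="%d %m %Y %H:%M:%S")
    return cleaned


cleaned = clean_sales(raw)
print(cleaned.dtypes)
```

Without a step like this, sums and averages over `money` would operate on strings rather than amounts.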

📊 Possible analyses

- Trend of transactions and sales by day of the week (Monday–Sunday)
- Revenue distribution by coffee type
- Payment method preference (card vs. cash)
- Average daily transactions and average daily sales
- Peak times of the day: morning, afternoon, evening
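The analyses listed above map onto a handful of pandas group-by operations. The sketch below assumes `money` has already been converted to a number and uses nine rows taken from the notebook's preview as a stand-in for the full file, so the figures it prints describe only that sample.

```python
import pandas as pd

# Nine rows from the notebook's preview, with `money` already numeric.
sales = pd.DataFrame(
    {
        "date": ["2024-03-01"] * 5 + ["2025-03-23"] * 4,
        "Weekday": ["Fri"] * 5 + ["Sun"] * 4,
        "Time_of_Day": ["Morning"] + ["Afternoon"] * 4 + ["Morning"] + ["Afternoon"] * 3,
        "cash_type": ["card"] * 9,
        "coffee_name": [
            "Latte", "Hot Chocolate", "Hot Chocolate", "Americano", "Latte",
            "Cappuccino", "Cocoa", "Cocoa", "Americano",
        ],
        "money": [38.70, 38.70, 38.70, 28.90, 38.70, 35.76, 35.76, 35.76, 25.96],
    }
)

# Revenue distribution by coffee type, largest first.
revenue_by_coffee = sales.groupby("coffee_name")["money"].sum().sort_values(ascending=False)

# Transactions and sales by day of the week.
by_weekday = sales.groupby("Weekday").agg(
    transactions=("money", "size"), revenue=("money", "sum")
)

# Payment method preference as a share of all transactions.
payment_share = sales["cash_type"].value_counts(normalize=True)

# Peak times of the day.
by_time_of_day = sales["Time_of_Day"].value_counts()

# Average daily transactions and average daily sales.
daily = sales.groupby("date").agg(
    transactions=("money", "size"), revenue=("money", "sum")
)
avg_daily = daily.mean()

print(revenue_by_coffee, by_weekday, payment_share, by_time_of_day, avg_daily, sep="\n\n")
```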

🎨 Visualization preview

The attached Power BI dashboard shows:

- Total sales, total transactions, and average transaction value
- Revenue by coffee type
- Double-axis trend line showing daily sales and transactions
- Sales split by payment type
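The three headline figures on the dashboard (total sales, total transactions, average transaction value) can be reproduced in a few lines. This is a sketch over nine amounts taken from the notebook's preview, not the dashboard's own calculation. The function name `headline_kpis` is hypothetical.

```python
import pandas as pd

# Nine transaction amounts (in Rand) from the notebook's preview.
amounts = pd.Series(
    [38.70, 38.70, 38.70, 28.90, 38.70, 35.76, 35.76, 35.76, 25.96], name="money"
)


def headline_kpis(money):
    """Compute total sales, transaction count, and average transaction value."""
    total_sales = float(money.sum())
    total_transactions = int(money.count())
    return {
        "total_sales": round(total_sales, 2),
        "total_transactions": total_transactions,
        "avg_transaction_value": round(total_sales / total_transactions, 2),
    }


kpis = headline_kpis(amounts)
print(kpis)
```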

🌍 Why this dataset?

Coffee data is relatable, seasonal, and well suited to practicing:

- Time-based grouping (weekdays, months, times of day)
- KPI design and visualization
- Building dashboards with clear business insights

**Read data from .csv file**

In [59]:
import pandas as pd

df = pd.read_csv("/content/Coffe_sales.csv")[:3700]
df

Unnamed: 0,date,datetime,hour_of_day,cash_type,card,money,coffee_name,Time_of_Day,Weekday,Month_name,Weekdaysort,Monthsort
0,01 03 2024,01 03 2024 10:15:51,10,card,ANON-0000-0000-0001,R38.70,Latte,Morning,Fri,Mar,5,3
1,01 03 2024,01 03 2024 12:19:23,12,card,ANON-0000-0000-0002,R38.70,Hot Chocolate,Afternoon,Fri,Mar,5,3
2,01 03 2024,01 03 2024 12:20:18,12,card,ANON-0000-0000-0002,R38.70,Hot Chocolate,Afternoon,Fri,Mar,5,3
3,01 03 2024,01 03 2024 13:46:33,13,card,ANON-0000-0000-0003,R28.90,Americano,Afternoon,Fri,Mar,5,3
4,01 03 2024,01 03 2024 13:48:15,13,card,ANON-0000-0000-0004,R38.70,Latte,Afternoon,Fri,Mar,5,3
...,...,...,...,...,...,...,...,...,...,...,...,...
3631,23 03 2025,23 03 2025 10:34:55,10,card,ANON-0000-0000-1158,R35.76,Cappuccino,Morning,Sun,Mar,7,3
3632,23 03 2025,23 03 2025 14:43:37,14,card,ANON-0000-0000-1315,R35.76,Cocoa,Afternoon,Sun,Mar,7,3
3633,23 03 2025,23 03 2025 14:44:17,14,card,ANON-0000-0000-1315,R35.76,Cocoa,Afternoon,Sun,Mar,7,3
3634,23 03 2025,23 03 2025 15:47:29,15,card,ANON-0000-0000-1316,R25.96,Americano,Afternoon,Sun,Mar,7,3
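The `Unnamed: 0` column in the output above is typically what pandas produces when a CSV was saved together with its index, although that is an assumption about how this file was written. The sketch below reads a three-row excerpt from an in-memory string and shows `index_col=0` and `nrows` as one possible alternative to slicing after the read.

```python
import io

import pandas as pd

# Excerpt of the file: the first header cell is empty, as in a CSV saved with its index.
csv_text = """,date,datetime,hour_of_day,cash_type,card,money,coffee_name
0,01 03 2024,01 03 2024 10:15:51,10,card,ANON-0000-0000-0001,R38.70,Latte
1,01 03 2024,01 03 2024 12:19:23,12,card,ANON-0000-0000-0002,R38.70,Hot Chocolate
2,01 03 2024,01 03 2024 12:20:18,12,card,ANON-0000-0000-0002,R38.70,Hot Chocolate
"""

# Default read: the unnamed first column becomes a data column called "Unnamed: 0".
default_read = pd.read_csv(io.StringIO(csv_text))

# index_col=0 treats that column as the row index; nrows limits how many rows are parsed.
preview = pd.read_csv(io.StringIO(csv_text), index_col=0, nrows=2)

print(default_read.columns.tolist())
print(preview.columns.tolist())
```

With the real file, the `io.StringIO(...)` argument would be replaced by the CSV path used in the cell above.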


# **Program section**

**1. Report with suggestions**

In [60]:
# Only run the analysis when the setup cells have defined both objects.
if 'df' in locals() and 'llm' in locals():
    try:
        # Step 1: Count sales per product with pandas so that only a short summary is sent to the LLM.
        print("\n--- Analyzing Data with Pandas ---")
        top_5_coffees = df['coffee_name'].value_counts().nlargest(5)
        top_5_data_string = top_5_coffees.to_string()

        print("\n--- Analysis Result ---")
        print(top_5_data_string)
        print("---------------------------\n")

        # Step 2: Create a detailed prompt for the AI to generate the final business report.
        report_prompt = f"""
        You are a professional business analyst. Your client is a business planner who wants to open a new cafe.
        Your task is to write a concise, insightful, and professional business report based on competitor sales data.
        The tone should be encouraging and actionable.

        Here is the summary of the top 5 selling items from the competitor's sales data:
        {top_5_data_string}

        Please structure the report with the following sections:
        1.  **Executive Summary:** Briefly explain that this report analyzes key sales data to form a foundational menu and inventory strategy for a successful launch.
        2.  **Key Insights:** Discuss the importance of these top 5 items. Explain what they represent (e.g., classic coffee demand, options for non-coffee drinkers).
        3.  **Strategic Recommendations:** Provide clear, actionable advice on inventory. Advise the client to secure reliable suppliers for the ingredients of these top 5 items, as they will be the core of the business's revenue.
        4.  **Conclusion:** End with an encouraging closing statement about building a successful cafe on this data-driven foundation.
        """

        # Step 3: Invoke the LLM with the detailed prompt to get the final report.
        print("\n--- Generating Final AI Business Report ---\n")
        final_report = llm.invoke(report_prompt)
        print(final_report)

    except NameError as e:
        print(f"An error occurred: {e}. Please ensure 'llm' and 'df' are initialized correctly.")
    except Exception as e:
        print(f"An unexpected error occurred: {e}")

else:
    print("DataFrame 'df' or LLM 'llm' not found. Please run the setup cells first.")




--- Analyzing Data with Pandas ---

--- Analysis Result ---
coffee_name
Americano with Milk    824
Latte                  782
Americano              578
Cappuccino             501
Cortado                292
---------------------------


--- Generating Final AI Business Report ---

1. **Executive Summary:**
This report provides a comprehensive analysis of the top-selling coffee items from our competitor's sales data. The insights derived from this data will serve as the foundation for developing a strategic menu and inventory plan, ensuring a successful launch for your new cafe.

2. **Key Insights:**
The top five selling coffee items—Americano with Milk (824 units), Latte (782 units), Americano (578 units), Cappuccino (501 units), and Cortado (292 units)—demonstrate a clear preference for classic coffee options among consumers. These figures highlight the demand for versatile, milk-based coffee drinks, indicating that your cafe should cater to both coffee enthusiasts and those seeking 
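Because the prompt in the cell above is an indented triple-quoted string, the leading spaces on every line are sent to the model as well. A minimal sketch of one way to avoid that, and to check the prompt without a network call, is shown below. `build_report_prompt` and `EchoLLM` are hypothetical names; the stub only mimics the `invoke` method used above and is not a LangChain class. The prompt text is shortened for illustration.

```python
from textwrap import dedent


def build_report_prompt(top_items_text):
    """Build a report prompt around a pre-computed sales summary."""
    header = dedent(
        """\
        You are a professional business analyst.
        Write a concise business report based on the sales summary below.

        Top-selling items:
        """
    )
    # The summary is appended after dedenting: its lines are not indented,
    # so interpolating it first would stop dedent from removing anything.
    return header + top_items_text


class EchoLLM:
    """Hypothetical stand-in exposing the same `invoke` method as the real client."""

    def invoke(self, prompt):
        return f"[stub] received {len(prompt)} characters"


summary = "Americano with Milk    824\nLatte                  782"
prompt = build_report_prompt(summary)
print(prompt)
print(EchoLLM().invoke(prompt))
```

Swapping `EchoLLM()` for the real `llm` object would send the same prompt to the hosted model.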

**2. Report without suggestions**

In [61]:
# Only run the analysis when the setup cells have defined both objects.
if 'df' in locals() and 'llm' in locals():
    try:
        # Step 1: Count sales per product with pandas so that only a short summary is sent to the LLM.
        print("\n--- Analyzing Data with Pandas ---")
        top_5_coffees = df['coffee_name'].value_counts().nlargest(5)
        top_5_data_string = top_5_coffees.to_string()

        print("\n--- Analysis Result ---")
        print(top_5_data_string)
        print("---------------------------\n")

        # Step 2: Create a new, more direct prompt for the AI to generate a data-focused analysis.
        report_prompt = f"""
        You are a data analyst. Your task is to analyze the following sales data summary and provide a straightforward analysis in bullet points.
        Base your answer *only* on the data provided.

        Here is the summary of the top 5 selling items:
        {top_5_data_string}

        Based on this data, please provide a simple, bullet-point analysis covering the following:
        - Which items are the most popular and what does this say about the core of the business?
        - What does the popularity of these specific items suggest about general customer preferences?
        - Do any non-coffee options appear among the top sellers, and what does their presence or absence suggest?
        """

        # Step 3: Invoke the LLM with the prompt to get the bullet-point analysis.
        print("\n--- Generating Final AI Data Analysis ---\n")
        final_report = llm.invoke(report_prompt)
        print(final_report)

    except NameError as e:
        print(f"An error occurred: {e}. Please ensure 'llm' and 'df' have been initialized correctly.")
    except Exception as e:
        print(f"An unexpected error occurred: {e}")

else:
    print("DataFrame 'df' or LLM 'llm' not found. Please run the setup cells first.")



--- Analyzing Data with Pandas ---

--- Analysis Result ---
coffee_name
Americano with Milk    824
Latte                  782
Americano              578
Cappuccino             501
Cortado                292
---------------------------


--- Generating Final AI Data Analysis ---

- The most popular items are Americano with Milk (824), Latte (782), and Americano (578). This indicates that espresso-based beverages, particularly those with milk, form the core of the business.

- The high sales of these specific items suggest that customers generally prefer milk-based coffee drinks, possibly indicating a preference for creamier, milder flavors over more robust, black coffee or specialty drinks.

- The absence of non-coffee options in the top sellers signifies that the business primarily caters to coffee enthusiasts, with espresso-based drinks dominating customer preferences. This could imply opportunities for further growth by introducing new milk-based coffee variations or promoting exist
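Whether a question about non-coffee options makes sense depends on what the top sellers actually contain; in the run above, none of the five items is a non-coffee drink. The sketch below builds that question from the data instead of hard-coding it. Treating `Hot Chocolate` and `Cocoa` as the non-coffee products is an assumption based on the names in the data preview, and `non_coffee_question` is a hypothetical helper.

```python
# Counts copied from the analysis result printed above.
top_sellers = {
    "Americano with Milk": 824,
    "Latte": 782,
    "Americano": 578,
    "Cappuccino": 501,
    "Cortado": 292,
}

# Assumption: these product names are the non-coffee drinks on the menu.
NON_COFFEE = {"Hot Chocolate", "Cocoa"}


def non_coffee_question(counts, non_coffee=NON_COFFEE):
    """Return a prompt bullet that matches what the top sellers contain."""
    present = sorted(name for name in counts if name in non_coffee)
    if present:
        return (
            "- What is the significance of these non-coffee options "
            "appearing in the top sellers: " + ", ".join(present) + "?"
        )
    return "- No non-coffee options appear in the top sellers. What might that suggest?"


print(non_coffee_question(top_sellers))
```

The returned line could replace the third bullet in the prompt so that the question never presupposes items the summary does not contain.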