### 1. Running an LLM for a Business Scenario


In [1]:
%%capture
# Install the dependencies needed to run the following cells.
# The version specifiers are quoted so the shell does not treat ">" as a redirect.
!pip install "transformers>=4.40.1" "accelerate>=0.27.2"

In [2]:
## We load our model onto the GPU for faster inference.
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load model and tokenizer
model = AutoModelForCausalLM.from_pretrained(
    "microsoft/Phi-3-mini-4k-instruct",
    device_map="cuda",
    torch_dtype="auto",
    trust_remote_code=False,
)
tokenizer = AutoTokenizer.from_pretrained("microsoft/Phi-3-mini-4k-instruct")


The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


`torch_dtype` is deprecated! Use `dtype` instead!



In [3]:
## Make it easier for us to use the generator by creating a pipeline object
from transformers import pipeline

# Create a pipeline
generator = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    return_full_text=False,
    max_new_tokens=500,
    do_sample=False
)

Device set to use cuda
The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


In [4]:
## Create a prompt as a user and feed it to the model.
# The three prompts are run in separate cells for readability.

# The prompt (user input / query)
messages = [
    {"role": "user", "content": "Write an email to a potential client explaining how customer acquisition costs, churn, and lifetime value vary across DFW zip codes, and which local markets warrant increased marketing spend versus pullback."}
]

# Generate output
output = generator(messages)
print(output[0]["generated_text"])

 Subject: Understanding Customer Acquisition Costs, Churn, and Lifetime Value Across DFW Zip Codes

Dear [Client's Name],

I hope this email finds you well. I am writing to provide you with insights into customer acquisition costs, churn, and lifetime value across different zip codes in the Dallas-Fort Worth (DFW) area. As you consider expanding your business or refining your marketing strategy, understanding these metrics can help you make informed decisions about where to allocate your resources.

Customer Acquisition Cost (CAC) is the total cost of acquiring a new customer, including marketing and sales expenses. Churn refers to the percentage of customers who stop using your product or service over a given period. Lifetime Value (LTV) is the projected revenue a customer will generate over their entire relationship with your business.

To help you better understand these metrics in the DFW area, we have analyzed data from various zip codes and identified key trends and insights.

1.

In [5]:
# The prompt (user input / query)
messages = [
    {"role": "user", "content": "Suggest Dallas–Fort Worth neighborhoods that show the strongest mismatch between consumer demand and existing retail or service locations, and where should expansion or closures be prioritized to maximize ROI."}
]

# Generate output
output = generator(messages)
print(output[0]["generated_text"])

 As an AI, I don't have real-time data access, but I can provide a general approach to identifying neighborhoods in the Dallas-Fort Worth (DFW) area that may have mismatches between consumer demand and existing retail or service locations. This approach can help businesses and investors make informed decisions about where to expand or close operations to maximize return on investment (ROI).

1. **Market Research and Data Analysis**:
   - Conduct market research to understand the demographics, consumer behavior, and preferences in different DFW neighborhoods.
   - Analyze retail and service data to identify areas with high demand but low supply.
   - Use tools like Google Trends, social media analytics, and local surveys to gauge consumer interest in various products and services.

2. **Identify High-Demand Needs**:
   - Look for gaps in essential services such as grocery stores, pharmacies, healthcare, and family-friendly dining.
   - Consider the demand for specialty stores, boutiques

In [6]:
# The prompt (user input / query)
messages = [
    {"role": "user", "content": "Provide examples of which traffic patterns, commute flows, and time-of-day trends in the DFW metroplex most impact workforce productivity, and how hybrid or location strategies be adjusted to reduce costs."}
]

# Generate output
output = generator(messages)
print(output[0]["generated_text"])

 Traffic patterns, commute flows, and time-of-day trends in the Dallas-Fort Worth (DFW) metroplex can significantly impact workforce productivity. Here are some examples and strategies to adjust hybrid or location strategies to reduce costs:

1. Peak Traffic Hours: The DFW metroplex experiences heavy traffic congestion during peak hours, typically between 7:00 AM and 9:00 AM, and 4:00 PM to 6:00 PM. This can lead to longer commute times, increased stress, and reduced productivity for employees. To mitigate this, companies can implement flexible work schedules, allowing employees to start and end their workday outside of peak hours. This can help reduce traffic congestion and improve employee satisfaction.

2. Public Transportation: The DFW metroplex has a well-developed public transportation system, including buses, light rail, and commuter rail. Encouraging employees to use public transportation can help reduce traffic congestion and improve workforce productivity. Companies can offer

Writeup:

An LLM can support business analysts in many ways. For one, it can automate repetitive, low-value text generation such as emails. It can also aid idea generation and strategy alignment. However, the outputs above show clear limitations. Phi-3-mini-4k-instruct has no access to real-time, up-to-date geographical data, so it cannot give grounded geographical suggestions. Hallucinations also make it hard to tell real information from false information if you are unfamiliar with the topic you are exploring.
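
One way to work around the missing-data limitation is to put the analyst's own figures into the prompt, so the model summarizes supplied numbers instead of inventing them. The sketch below is a minimal illustration and not part of the original notebook: `build_grounded_messages` is a hypothetical helper, and the sample figures are placeholders rather than real DFW data.

```python
def build_grounded_messages(question, rows):
    """Build a chat-style message list that embeds analyst-supplied data.

    `rows` is a list of dicts that share the same keys. The values used
    below are placeholders, not real market data.
    """
    if not rows:
        raise ValueError("rows must contain at least one record")
    columns = list(rows[0].keys())
    header = " | ".join(columns)
    lines = [" | ".join(str(row[col]) for col in columns) for row in rows]
    table = "\n".join([header] + lines)
    content = (
        "Use only the data below. If the data does not answer the question, "
        "say so.\n\n"
        f"{table}\n\n"
        f"Question: {question}"
    )
    return [{"role": "user", "content": content}]


# Hypothetical placeholder figures, for illustration only.
sample_rows = [
    {"zip": "75201", "cac_usd": 120, "churn_pct": 4.0},
    {"zip": "76102", "cac_usd": 95, "churn_pct": 6.5},
]
messages = build_grounded_messages(
    "Which zip code has the lower acquisition cost?", sample_rows
)
print(messages[0]["content"])
```

The resulting `messages` list has the same shape as the prompts passed to `generator` above, so it could be passed in the same way.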

### 2. Tokenizing Business Text

In [20]:
%%capture
!pip install --upgrade transformers==4.41.2 sentence-transformers==3.0.1 gensim==4.3.2 scikit-learn==1.5.0 accelerate==0.31.0 peft==0.11.1 scipy==1.10.1 numpy==1.26.4

In [8]:
# prompt chosen: Suggest Dallas–Fort Worth neighborhoods that show the strongest mismatch between consumer demand and existing retail or service locations, and where should expansion or closures be prioritized to maximize ROI.

prompt = "Suggest Dallas–Fort Worth neighborhoods that show the strongest mismatch between consumer demand and existing retail or service locations, and where should expansion or closures be prioritized to maximize ROI."

## Convert the prompt into input_ids
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to("cuda")

# Generate the text
generation_output = model.generate(
  input_ids=input_ids,
  max_new_tokens=20
)

## Print the raw input_ids
print(input_ids)

## Loop over each ID and print its decoded token on its own line
# (named token_id to avoid shadowing the built-in id())
for token_id in input_ids[0]:
  print(tokenizer.decode(token_id))

The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.


tensor([[25589,  7118, 27043, 29994, 29943,   441,   399,  2072, 18403, 29879,
           393,  1510,   278,  4549,   342, 29635,  1546, 21691,  9667,   322,
          5923,  3240,   737,   470,  2669, 14354, 29892,   322,   988,   881,
         13184,   470,  4694,  1973,   367,  7536,   277,  1891,   304,  5256,
           675, 16641, 29902, 29889]], device='cuda:0')
Sug
gest
Dallas
–
F
ort
W
orth
neighborhood
s
that
show
the
strong
est
mismatch
between
consumer
demand
and
existing
ret
ail
or
service
locations
,
and
where
should
expansion
or
clos
ures
be
prior
it
ized
to
maxim
ize
RO
I
.


Writeup:

In downstream analytics tasks such as sentiment analysis, named entity recognition, or KPI extraction, inconsistent tokenization of financial terms or locations can reduce model accuracy. This is especially important in business contexts where small numerical or semantic differences can change analytical conclusions.

For instance, in this prompt alone, words such as "Fort Worth" and "prioritized" were broken into two to four tokens each (`F`, `ort`, `W`, `orth` and `prior`, `it`, `ized`), and even the abbreviation ROI was split into two tokens (`RO`, `I`).
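
The kind of fragmentation described above can be checked systematically rather than by eye. The sketch below is an illustration under stated assumptions: `fragmented_words` is a hypothetical helper that accepts any callable mapping a string to a list of token strings, and `toy_tokenize` is a stand-in that simply chunks characters so the example runs without downloading a model. With the notebook's tokenizer, `tokenizer.tokenize` could be passed instead, though tokenizing words in isolation may not match how they are split in context.

```python
def fragmented_words(text, tokenize):
    """Return {word: pieces} for whitespace-separated words that split into 2+ tokens.

    `tokenize` is any callable that maps a string to a list of token strings.
    """
    result = {}
    for word in text.split():
        pieces = tokenize(word)
        if len(pieces) > 1:
            result[word] = pieces
    return result


def toy_tokenize(word, max_len=4):
    """Stand-in tokenizer (an assumption for this demo): fixed-size character chunks."""
    return [word[i:i + max_len] for i in range(0, len(word), max_len)]


report = fragmented_words("maximize ROI in Fort Worth", toy_tokenize)
for word, pieces in report.items():
    print(f"{word}: {len(pieces)} tokens -> {pieces}")
```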


### 3. Comparing Tokenizers on Business Phrases

In [9]:
# Borrowing show_tokens function from Chapter 2 notebook

from transformers import AutoModelForCausalLM, AutoTokenizer

colors_list = [
    '102;194;165', '252;141;98', '141;160;203',
    '231;138;195', '166;216;84', '255;217;47'
]

def show_tokens(sentence, tokenizer_name):
    tokenizer = AutoTokenizer.from_pretrained(tokenizer_name)
    token_ids = tokenizer(sentence).input_ids
    for idx, t in enumerate(token_ids):
        print(
            f'\x1b[0;30;48;2;{colors_list[idx % len(colors_list)]}m' +
            tokenizer.decode(t) +
            '\x1b[0m',
            end=' '
        )


In [10]:
text = """
In Q4, revenue increased 12.5% to $4.3M, driven by higher average
order value (+8.1%) and improved conversion rates (3.4% → 3.9%).
Gross profit margin expanded from 41.2% to 44.6% following supplier
renegotiations and reduced fulfillment costs per unit (–9.7%).
Key KPIs showed steady operational health, with customer acquisition
cost holding at $126, monthly active users up 10.2% QoQ, and on-time
delivery reaching 97.8%. However, customer churn ticked up slightly
from 4.6% to 5.1%, signaling a need to strengthen retention
initiatives despite overall profitability gains.

"""

In [None]:
# Compare different tokenizers

In [14]:
show_tokens(text, "bert-base-uncased")

[CLS] in q ##4 , revenue increased 12 . 5 % to $ 4 . 3 ##m , driven by higher average order value ( + 8 . 1 % ) and ...

In [15]:
show_tokens(text, "bert-base-cased")

[CLS] In Q ##4 , revenue increased 12 . 5 % to $ 4 . 3 ##M , driven by higher average order value ( + 8 . 1 % ) and ...

In [16]:
show_tokens(text, "gpt2")

 In  Q 4 ,  revenue  increased  12 . 5 %  to  $ 4 . 3 M ,  driven  by  higher  average
 order  value  (+ 8 . 1 %)  and  im...

Writeup:

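The tokenizers differ mainly in casing, whitespace handling, and how they group symbols, rather than in how they split the numbers themselves. In the output shown, all three break `12.5%` into `12`, `.`, `5`, `%`. `bert-base-uncased` lowercases the text (`Q4` becomes `q`, `##4`, and `$4.3M` ends in `##m`), while `bert-base-cased` preserves case; both add `[CLS]` and mark word continuations with `##`. GPT-2 keeps newlines as tokens, attaches the leading space to the following token, and merges some adjacent symbols into a single token, such as ` (+` and `%)`. These differences can affect downstream analytics such as KPI extraction or sentiment scoring, where misaligned token boundaries may cause models to miss or misinterpret key financial indicators.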
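
A quick way to quantify such differences is to count how many tokens each tokenizer produces for the same text. The sketch below is illustrative only: `token_counts` is a hypothetical helper, and the two toy tokenizers (whitespace splitting and a regex that separates digits, letters, and symbols) are stand-ins so the example runs offline. With real tokenizers, the `tokenize` method of each `AutoTokenizer` instance could be passed in their place.

```python
import re


def token_counts(text, tokenizers):
    """Return {tokenizer_name: number_of_tokens} for the same input text.

    `tokenizers` maps a name to any callable that turns a string into a
    list of token strings.
    """
    return {name: len(tokenize(text)) for name, tokenize in tokenizers.items()}


# Toy stand-ins (assumptions for this demo), not real model tokenizers.
toy_tokenizers = {
    "whitespace": str.split,
    "symbol_split": lambda s: re.findall(r"\d+|[A-Za-z]+|\S", s),
}

counts = token_counts("revenue increased 12.5% to $4.3M", toy_tokenizers)
print(counts)
```

A large gap between counts for the same KPI-heavy sentence is a hint that numbers and symbols are being split very differently, which is worth inspecting before relying on token-level extraction.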


### 4. Text Embeddings for Customer Feedback

In [32]:
# Reusing SentenceTransformer from Chapter 2

from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity
import numpy as np

# Load model
model = SentenceTransformer('sentence-transformers/all-mpnet-base-v2')

# 2. Create customer feedback sentences (8+)
feedback_sentences = [
    "Checkout was fast and easy",
    "The website is easy to navigate",
    "Customer support never replied",
    "I waited a long time for help from support",
    "The product arrived on time",
    "Packaging was fine, nothing special",
    "I love the design of the app",
    "The instructions were unclear"
]

# 3. Compute embeddings for all sentences
feedback_embeddings = model.encode(feedback_sentences)

# 4. Query sentence
query_sentence = "I am unhappy with the response time from support"
query_embedding = model.encode([query_sentence])

# Compute cosine similarity
similarities = cosine_similarity(query_embedding, feedback_embeddings)[0]

# 5. Get top 3 most similar sentences
top_indices = np.argsort(similarities)[-3:][::-1]

print("Query:", query_sentence)
print("\nTop 3 most similar feedback sentences:\n")

for idx in top_indices:
    print(f"- {feedback_sentences[idx]} (similarity: {similarities[idx]:.3f})")

Query: I am unhappy with the response time from support

Top 3 most similar feedback sentences:

- I waited a long time for help from support (similarity: 0.716)
- Customer support never replied (similarity: 0.649)
- The product arrived on time (similarity: 0.350)


Writeup:

Embedding similarity supports business analytics by letting analysts anticipate emerging problems or connect related problems and find holistic solutions that address all of them. Grouping similar complaints shows which kinds of problems the business faces most often. In the example above, the two support-related sentences score 0.716 and 0.649 against the query, while the unrelated delivery sentence scores only 0.350.
Embedding similarity can also help businesses route tickets, so that each one reaches the people with the right expertise in the shortest amount of time.
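
The ticket-routing idea can be sketched with plain cosine similarity and a cutoff, so that weak matches (like the 0.350 result above) are not routed automatically. This is a minimal sketch under stated assumptions: `route_ticket`, the team names, the three-dimensional toy vectors, and the 0.5 threshold are all hypothetical; in practice the vectors would come from `model.encode` and the threshold would need tuning on real tickets.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def route_ticket(ticket_vec, team_vecs, min_similarity=0.5):
    """Return (team, score) for the most similar team, or manual review if below the cutoff."""
    best_team, best_score = None, -1.0
    for team, vec in team_vecs.items():
        score = cosine(ticket_vec, vec)
        if score > best_score:
            best_team, best_score = team, score
    if best_score < min_similarity:
        return "manual_review", best_score
    return best_team, best_score


# Toy reference vectors standing in for per-team embeddings (hypothetical).
team_vecs = {"support": [1.0, 0.0, 0.0], "shipping": [0.0, 1.0, 0.0]}
print(route_ticket([0.9, 0.1, 0.0], team_vecs))  # close to the "support" vector
print(route_ticket([0.0, 0.0, 1.0], team_vecs))  # matches neither team
```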


### 5. Recommending Business Content Using Embeddings

In [35]:
import pandas as pd
import numpy as np
from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity


In [36]:
model = SentenceTransformer('sentence-transformers/all-mpnet-base-v2')


In [37]:
# Create Business Resources DataFrame (10 items)

data = [
    {
        "id": 1,
        "title": "Churn Analysis for Subscription Services",
        "description": "Techniques for predicting customer churn using behavioral and transaction data."
    },
    {
        "id": 2,
        "title": "Sales Forecasting with Time Series Models",
        "description": "Using historical sales data to forecast future revenue and demand."
    },
    {
        "id": 3,
        "title": "Customer Segmentation with Clustering",
        "description": "Grouping customers based on demographics and purchasing behavior."
    },
    {
        "id": 4,
        "title": "Fraud Detection in Financial Transactions",
        "description": "Identifying fraudulent transactions using machine learning models."
    },
    {
        "id": 5,
        "title": "Marketing Campaign Performance Analysis",
        "description": "Measuring the effectiveness of digital marketing campaigns."
    },
    {
        "id": 6,
        "title": "Predictive Maintenance for Manufacturing",
        "description": "Using sensor data to predict equipment failures before they occur."
    },
    {
        "id": 7,
        "title": "Customer Lifetime Value Modeling",
        "description": "Estimating long-term customer value using historical transaction data."
    },
    {
        "id": 8,
        "title": "Recommendation Systems for E-Commerce",
        "description": "Building product recommendation systems using user behavior data."
    },
    {
        "id": 9,
        "title": "Supply Chain Optimization Analytics",
        "description": "Optimizing inventory and logistics using data-driven techniques."
    },
    {
        "id": 10,
        "title": "Sentiment Analysis for Customer Feedback",
        "description": "Analyzing customer reviews and feedback using natural language processing."
    }
]

df = pd.DataFrame(data)
df



|   | id | title | description |
|---|----|-------|-------------|
| 0 | 1 | Churn Analysis for Subscription Services | Techniques for predicting customer churn using... |
| 1 | 2 | Sales Forecasting with Time Series Models | Using historical sales data to forecast future... |
| 2 | 3 | Customer Segmentation with Clustering | Grouping customers based on demographics and p... |
| 3 | 4 | Fraud Detection in Financial Transactions | Identifying fraudulent transactions using mach... |
| 4 | 5 | Marketing Campaign Performance Analysis | Measuring the effectiveness of digital marketi... |
| 5 | 6 | Predictive Maintenance for Manufacturing | Using sensor data to predict equipment failure... |
| 6 | 7 | Customer Lifetime Value Modeling | Estimating long-term customer value using hist... |
| 7 | 8 | Recommendation Systems for E-Commerce | Building product recommendation systems using ... |
| 8 | 9 | Supply Chain Optimization Analytics | Optimizing inventory and logistics using data-... |
| 9 | 10 | Sentiment Analysis for Customer Feedback | Analyzing customer reviews and feedback using ... |


In [39]:
# Compute Embeddings for Descriptions
embeddings = model.encode(df["description"].tolist())

df["embedding"] = embeddings.tolist()



In [40]:
def recommend_resources(query, top_n=3):
    query_embedding = model.encode([query])

    similarities = cosine_similarity(
        query_embedding,
        np.vstack(df["embedding"])
    )[0]

    top_indices = np.argsort(similarities)[-top_n:][::-1]

    print(f"Query: {query}\n")
    print("Top recommended resources:\n")

    for idx in top_indices:
        print(f"- {df.iloc[idx]['title']}")
        print(f"  {df.iloc[idx]['description']}")
        print(f"  Similarity: {similarities[idx]:.3f}\n")


In [41]:
recommend_resources("predicting customer churn from transaction data")


Query: predicting customer churn from transaction data

Top recommended resources:

- Churn Analysis for Subscription Services
  Techniques for predicting customer churn using behavioral and transaction data.
  Similarity: 0.885

- Customer Lifetime Value Modeling
  Estimating long-term customer value using historical transaction data.
  Similarity: 0.636

- Customer Segmentation with Clustering
  Grouping customers based on demographics and purchasing behavior.
  Similarity: 0.561



Writeup:
This embedding-based recommender could be extended into a business knowledge system by indexing internal documentation, dashboards, and reports using semantic embeddings. Analysts could receive relevant resources automatically based on their search queries or current tasks, even if exact keywords do not match. Over time, usage data could be incorporated to personalize recommendations by role or department. This approach helps reduce knowledge silos and improves productivity across data-driven teams.
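
The personalization step mentioned above could, for example, blend each resource's semantic similarity with how often the user's role has opened it. The sketch below is one possible approach, not the notebook's method: `rerank_with_usage` and the 0.2 usage weight are hypothetical, the similarity values are copied from the example output above, and the usage counts are made up for illustration.

```python
def rerank_with_usage(similarities, usage_counts, weight=0.2):
    """Blend semantic similarity with normalized usage counts.

    similarities: {resource_id: cosine similarity to the query}
    usage_counts: {resource_id: times opened by the user's role}
    weight: share of the final score given to usage (hypothetical default)
    """
    max_usage = max(usage_counts.values(), default=0)
    scores = {}
    for rid, sim in similarities.items():
        # Normalize usage to [0, 1]; resources with no recorded usage get 0.
        usage = usage_counts.get(rid, 0) / max_usage if max_usage else 0.0
        scores[rid] = (1 - weight) * sim + weight * usage
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Similarities from the churn query above; usage counts are invented.
similarities = {1: 0.885, 7: 0.636, 3: 0.561}
usage_counts = {3: 40, 7: 5}
ranked = rerank_with_usage(similarities, usage_counts)
for rid, score in ranked:
    print(f"resource {rid}: blended score {score:.3f}")
```

With these made-up counts, resource 3 moves ahead of resource 7 because it is heavily used, while resource 1 stays on top because its similarity is much higher.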