# BUSINESS SCIENCE GENERATIVE AI/ML TIPS
### AI-TIP 002 | CSV SEMANTIC SEARCH

**GOALS:**
- Perform semantic search on a CSV file of businesses using a pre-trained sentence transformer model
- Display the top 5 results for a given query

```python
# Libraries
import pandas as pd
from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity
```

## 1.0 LOAD DATA

Load the business data from a CSV file.

```python
# Load the business data
business_data = pd.read_csv("002_csv_semantic_search/data/business_data.csv")

business_data
```

Output (first 10 rows shown):

| Unnamed: 0 | id | business_name | description | category |
|---|---|---|---|---|
| 0 | 1 | TechCorp Solutions | Innovative IT solutions provider specializing ... | Technology |
| 1 | 2 | GreenLeaf Eco | Sustainable products promoting an eco-friendly... | Eco Products |
| 2 | 3 | UrbanStyle Fashions | Trendy and affordable clothing for urban youth. | Retail |
| 3 | 4 | DigitalDynamics | Cutting-edge digital marketing and e-commerce ... | Technology |
| 4 | 5 | HealthyHarvest Foods | Organic and non-GMO foods for a healthier life... | Food & Beverage |
| 5 | 6 | BrightFuture Education | Education resources and e-learning platforms f... | Education |
| 6 | 7 | HomeHaven Decor | Modern and stylish home decor for urban homes. | Home & Living |
| 7 | 8 | EliteFitness Center | State-of-the-art fitness equipment and persona... | Fitness |
| 8 | 9 | AutoPro Mechanics | Comprehensive automotive repair and maintenanc... | Automotive |
| 9 | 10 | SecureVault Systems | Advanced cybersecurity solutions for businesse... | Technology |

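The `Unnamed: 0` column typically appears when a DataFrame was saved with `to_csv()` using the default `index=True`: the old index is written as a first column with an empty header, and `read_csv` names it `Unnamed: 0`. Assuming that is what happened to this file, passing `index_col=0` restores that column as the row index. A minimal sketch using an in-memory CSV (the two toy rows are illustrative only, not the real file):

```python
import io

import pandas as pd

# Toy frame standing in for the business data (illustrative rows only)
df = pd.DataFrame({
    "id": [1, 2],
    "business_name": ["TechCorp Solutions", "GreenLeaf Eco"],
})

# Writing with the default index=True stores the index as an unnamed first column
csv_text = df.to_csv()

# Reading it back naively turns that column into "Unnamed: 0"
naive = pd.read_csv(io.StringIO(csv_text))
print(naive.columns.tolist())  # ['Unnamed: 0', 'id', 'business_name']

# index_col=0 uses the first column as the row index instead
fixed = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(fixed.columns.tolist())  # ['id', 'business_name']
```

Applied to this tip, that would be `pd.read_csv("002_csv_semantic_search/data/business_data.csv", index_col=0)`; the later outputs would then show one column fewer.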

## 2.0 GENERATE EMBEDDINGS

Use a pre-trained sentence transformer model to generate embeddings for the business descriptions.

```python
# Load the pre-trained sentence transformer model
# - The first run downloads the model, which can take a few seconds
model = SentenceTransformer('all-MiniLM-L6-v2')

# Generate one embedding vector per business description
business_data['description_embedding'] = list(model.encode(business_data['description'].tolist()))

business_data
```

| Unnamed: 0 | id | business_name | description | category | description_embedding |
|---|---|---|---|---|---|
| 0 | 1 | TechCorp Solutions | Innovative IT solutions provider specializing ... | Technology | [-0.063927054, -0.044173643, 0.090164825, -0.0... |
| 1 | 2 | GreenLeaf Eco | Sustainable products promoting an eco-friendly... | Eco Products | [-0.009359969, 0.06440454, 0.028187241, 0.0473... |
| 2 | 3 | UrbanStyle Fashions | Trendy and affordable clothing for urban youth. | Retail | [-0.020425871, 0.0617135, 0.07148955, 0.046504... |
| 3 | 4 | DigitalDynamics | Cutting-edge digital marketing and e-commerce ... | Technology | [-0.032339267, -0.014483877, 0.0021632928, -0.... |
| 4 | 5 | HealthyHarvest Foods | Organic and non-GMO foods for a healthier life... | Food & Beverage | [0.061246254, 0.011559292, 0.032011457, 0.1009... |
| 5 | 6 | BrightFuture Education | Education resources and e-learning platforms f... | Education | [0.04338222, 0.002566659, 0.019263519, -0.0395... |
| 6 | 7 | HomeHaven Decor | Modern and stylish home decor for urban homes. | Home & Living | [0.07156117, 0.051035494, 0.12542138, 0.029695... |
| 7 | 8 | EliteFitness Center | State-of-the-art fitness equipment and persona... | Fitness | [-0.10383554, -0.022419028, 0.031146636, 0.028... |
| 8 | 9 | AutoPro Mechanics | Comprehensive automotive repair and maintenanc... | Automotive | [-0.045173284, -0.017059632, 0.069921665, -0.0... |
| 9 | 10 | SecureVault Systems | Advanced cybersecurity solutions for businesse... | Technology | [0.017845903, 0.026372747, -0.00019522308, -0.... |

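Each cell of `description_embedding` holds one 1-D NumPy array (384 values for `all-MiniLM-L6-v2`), and `semantic_search` rebuilds a list of them on every call. A minimal sketch, assuming you want to stack them once into a 2-D `(n_rows, dim)` matrix for vectorized math; random vectors stand in for real embeddings so the snippet runs without downloading the model:

```python
import numpy as np
import pandas as pd

# Toy stand-in for business_data: random 384-dim vectors instead of model output
rng = np.random.default_rng(seed=0)
toy = pd.DataFrame({"business_name": ["A", "B", "C"]})
toy["description_embedding"] = list(rng.normal(size=(3, 384)).astype(np.float32))

# Stack the per-row 1-D arrays into a single (n_rows, dim) matrix
embedding_matrix = np.vstack(toy["description_embedding"].tolist())
print(embedding_matrix.shape)  # (3, 384)

# L2-normalize each row once; for unit vectors, cosine similarity is a plain dot product
unit_matrix = embedding_matrix / np.linalg.norm(embedding_matrix, axis=1, keepdims=True)
print(np.allclose(np.linalg.norm(unit_matrix, axis=1), 1.0))  # True
```

With the real data, replace `toy` with `business_data`; the stacked matrix can then be reused across queries instead of being rebuilt each time.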

## 3.0 PERFORM SEMANTIC SEARCH

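Define a function that embeds the query, scores every business description by cosine similarity to it, and returns the `top_k` closest matches. Then use it to find the most relevant businesses for a given query.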

```python
# Define a function to perform semantic search
# - Uses the `model` loaded in Section 2.0 to embed the query
def semantic_search(query, data, top_k=5):
    # Generate the embedding for the query
    query_embedding = model.encode([query])[0]

    # Calculate cosine similarities between the query and every description
    similarities = cosine_similarity([query_embedding], data['description_embedding'].tolist())

    # Get the positions of the top_k most similar descriptions (highest first)
    top_indices = similarities[0].argsort()[-top_k:][::-1]

    # Return the top_k rows (positional lookup, matching the order of `similarities`)
    return data.iloc[top_indices]
```

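`cosine_similarity` scores each description embedding `d` against the query embedding `q` as `cos(q, d) = (q · d) / (‖q‖ ‖d‖)`, so 1 means the same direction and 0 means orthogonal. The sketch below reimplements that formula and the top-k selection in plain NumPy on toy 2-D vectors, purely to show what the function computes; the helper names `cosine_scores` and `top_k_positions` are illustrative:

```python
import numpy as np

def cosine_scores(query_vec, matrix):
    """Cosine similarity between one query vector and each row of `matrix`."""
    query_vec = np.asarray(query_vec, dtype=float)
    matrix = np.asarray(matrix, dtype=float)
    # Numerator: dot product of every row with the query
    dots = matrix @ query_vec
    # Denominator: product of each row's length and the query's length
    norms = np.linalg.norm(matrix, axis=1) * np.linalg.norm(query_vec)
    return dots / norms

def top_k_positions(scores, k):
    """Positions of the k largest scores, highest first (same as argsort()[-k:][::-1])."""
    return np.argsort(scores)[::-1][:k]

# Toy 2-D "embeddings": same direction, orthogonal, and 45 degrees from the query
corpus = [[2.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query = [1.0, 0.0]

scores = cosine_scores(query, corpus)
print(scores)                      # approximately [1.0, 0.0, 0.7071]
print(top_k_positions(scores, 2))  # [0 2]
```

Note that the first row scores 1.0 even though it is twice as long as the query: cosine similarity ignores vector length and compares direction only.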
```python
# Perform a semantic search
query = "Find healthy organic food businesses"
k = 5
results = semantic_search(query, business_data, top_k=k)

# Display the results
print(f"Top {k} results for your query:")
print(results[['business_name', 'description', 'category']])
```

Top 5 results for your query:

| | business_name | description | category |
|---|---|---|---|
| 4 | HealthyHarvest Foods | Organic and non-GMO foods for a healthier life... | Food & Beverage |
| 1 | GreenLeaf Eco | Sustainable products promoting an eco-friendly... | Eco Products |
| 16 | Innovate Marketing | Creative marketing strategies for startups and... | Marketing |
| 14 | EcoFriendly Supplies | Eco-friendly office and household supplies. | Eco Products |
| 12 | QuickBite Deli | Quick and delicious deli meals for busy profes... | Food & Beverage |

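Because `semantic_search` always returns exactly `top_k` rows, a weak match such as Innovate Marketing (category Marketing) still shows up for a food query. One option is to return each row's similarity score and optionally drop rows below a cutoff. The sketch below is hypothetical: `semantic_search_with_scores`, its `encode` parameter (any callable mapping a list of strings to a 2-D array, such as `model.encode`), and `min_score` are illustrative names, and this tip does not suggest any particular threshold value. A fake encoder is used so the example runs without the model:

```python
import numpy as np
import pandas as pd

def semantic_search_with_scores(query, data, encode, top_k=5, min_score=None,
                                embedding_col="description_embedding"):
    """Hypothetical variant of semantic_search that also returns similarity scores.

    `encode` is any callable mapping a list of strings to a 2-D array
    (e.g. model.encode). Rows scoring below `min_score` are dropped.
    """
    # Embed the query as a 1-D vector
    query_vec = np.asarray(encode([query]), dtype=float)[0]
    # Stack the stored embeddings into an (n_rows, dim) matrix
    corpus = np.vstack(data[embedding_col].tolist()).astype(float)
    # Cosine similarity of every row with the query
    scores = corpus @ query_vec / (np.linalg.norm(corpus, axis=1) * np.linalg.norm(query_vec))
    # Positions of the top_k scores, highest first
    top = np.argsort(scores)[::-1][:top_k]
    results = data.iloc[top].copy()
    results["similarity"] = scores[top]
    # Optionally keep only matches at or above the cutoff
    if min_score is not None:
        results = results[results["similarity"] >= min_score]
    return results

def fake_encode(texts):
    """Toy encoder: maps every input text to the vector [1, 0]."""
    return np.array([[1.0, 0.0] for _ in texts])

# Toy data: one aligned match, one orthogonal row, one partial match
toy = pd.DataFrame({"business_name": ["Food", "Marketing", "Deli"]})
toy["description_embedding"] = [np.array([1.0, 0.0]), np.array([0.0, 1.0]), np.array([0.8, 0.6])]

print(semantic_search_with_scores("food", toy, fake_encode, top_k=3, min_score=0.5))
```

With the notebook's objects, a call would look like `semantic_search_with_scores(query, business_data, model.encode, top_k=5, min_score=0.3)`, where `0.3` is an arbitrary placeholder to tune against your own queries.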

## 4.0 WANT TO LEARN HOW TO USE GENERATIVE AI AND LLMS FOR DATA SCIENCE PROJECTS?

- Join My Live 8-Week AI For Data Scientists Bootcamp
- Live Cohorts are happening once per quarter. Schedule:
    - Week 1: Live Kickoff Clinic + Local LLM Training + AI Fast Track
    - Week 2: Retrieval Augmented Generation (RAG) For Data Scientists
    - Week 3: Business Intelligence AI Copilot (SQL + Pandas Tools)
    - Week 4: Customer Analytics Agent Team (Multi-Agent Workflows)
    - Week 5: Time Series Forecasting Agent Team (Multi-Agent Machine Learning Workflows)
    - Week 6: LLM Model Deployment With AWS Bedrock
    - Week 7: Fine-Tuning LLM Models & RAG Deployments With AWS Bedrock
    - Week 8: AI App Deployment With AWS Cloud (Docker, EC2, NGINX)

**Enroll here: [https://learn.business-science.io/generative-ai-bootcamp-enroll](https://learn.business-science.io/generative-ai-bootcamp-enroll)**