---
title: King Arthur Baking AI Assistant
emoji: 🍰
colorFrom: blue
colorTo: red
sdk: streamlit
sdk_version: 1.28.0
app_file: app.py
pinned: false
---
# King Arthur Baking AI Assistant

A sophisticated AI-powered assistant for King Arthur Baking mixes, using RAG (Retrieval-Augmented Generation) with LangGraph, OpenAI, and MongoDB Atlas.
## Features

- Intelligent Web Scraping: Automatically scrapes the King Arthur Baking mixes category
- RAG System: Combines semantic search with OpenAI embeddings for accurate product recommendations
- LangGraph Agent: Advanced AI agent with multi-step reasoning and tool use
- MongoDB Atlas Integration: Vector search and document storage
- Streamlit Frontend: Beautiful, interactive web interface
- Real-time Analytics: Product insights and data visualization
## Tech Stack

- Frontend: Streamlit with custom CSS styling
- AI Framework: LangGraph + LangChain
- LLM: OpenAI GPT-4o
- Embeddings: OpenAI text-embedding-3-small
- Database: MongoDB Atlas
- Web Scraping: BeautifulSoup4 + Requests
- Visualization: Plotly
- Deployment: Hugging Face Spaces
## Installation

1. Clone the repository:

   ```bash
   git clone <repository-url>
   cd king-arthur-baking-ai
   ```

2. Install dependencies:

   ```bash
   pip install -r requirements.txt
   ```

3. Set up environment variables by creating a `.env` file in the root directory:

   ```
   OPENAI_API_KEY=your_openai_api_key_here
   MONGODB_URI=mongodb+srv://username:password@cluster.mongodb.net/
   MONGODB_DB_NAME=king_arthur_baking_db
   MONGODB_COLLECTION_NAME=mixes
   ```
## Configuration

The application uses several configuration parameters that can be adjusted in `config.py`:
- API Keys: OpenAI API key for LLM and embeddings
- Database: MongoDB Atlas connection string and collection names
- Scraping: Delay between requests and retry limits
- Agent: Temperature, max tokens, and model parameters
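A minimal sketch of what such a `config.py` could look like; the variable names and defaults here are illustrative assumptions, not the project's actual code:

```python
import os

# Hypothetical configuration module; names and defaults are illustrative.
OPENAI_API_KEY = os.environ.get("OPENAI_API_KEY", "")
MONGODB_URI = os.environ.get("MONGODB_URI", "")
MONGODB_DB_NAME = os.environ.get("MONGODB_DB_NAME", "king_arthur_baking_db")
MONGODB_COLLECTION_NAME = os.environ.get("MONGODB_COLLECTION_NAME", "mixes")

# Scraping behavior
SCRAPE_DELAY_SECONDS = 1.0   # delay between requests
MAX_RETRIES = 3              # retry limit per request

# Agent / model parameters
LLM_MODEL = "gpt-4o"
EMBEDDING_MODEL = "text-embedding-3-small"
TEMPERATURE = 0.2
MAX_TOKENS = 1024
```

Reading secrets from the environment (rather than hard-coding them) keeps the same file usable locally and on Hugging Face Spaces, where these values come from the Space settings.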
## Web Scraping

The scraper automatically extracts comprehensive product information:
- Product names and descriptions
- Pricing information
- Ingredients and nutritional details
- Instructions and features
- Product images and URLs
- Availability status
Run the scraper:

```bash
python scraper.py
```

This will:

- Scrape all mixes from the King Arthur Baking website
- Save data to `mixes_data.json`, avoiding duplicates
- Display scraping statistics
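A sketch of how the duplicate-avoiding save might work, assuming each product dict carries a unique `url` field (the field name and function signature are assumptions):

```python
import json
import os

def save_to_json(products, path="mixes_data.json", key="url"):
    """Merge new products into the JSON file, skipping entries whose
    `key` field is already present (a simple dedupe strategy)."""
    existing = []
    if os.path.exists(path):
        with open(path, "r", encoding="utf-8") as f:
            existing = json.load(f)
    seen = {p.get(key) for p in existing}
    added = [p for p in products if p.get(key) not in seen]
    with open(path, "w", encoding="utf-8") as f:
        json.dump(existing + added, f, indent=2)
    return len(added)  # how many new products were written
```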
## MongoDB Atlas Setup

1. Create a MongoDB Atlas account
2. Create a new cluster
3. Get your connection string
4. Update the `MONGODB_URI` in your `.env` file
Then initialize the database:

```bash
python database.py
```

This will:

- Connect to MongoDB Atlas
- Load data from the JSON file
- Generate embeddings for all products
- Create necessary indexes
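The embedding step could be batched roughly as follows. This is a sketch with an injectable `embed_fn` standing in for the OpenAI client call, so the helper itself makes no API assumptions:

```python
def embed_in_batches(products, embed_fn, batch_size=100, text_field="description"):
    """Attach an `embedding` vector to each product, calling embed_fn
    once per batch of texts to limit API round-trips.

    embed_fn: callable taking a list of strings and returning a list of
    vectors (e.g. a wrapper around the OpenAI embeddings endpoint).
    """
    for start in range(0, len(products), batch_size):
        batch = products[start:start + batch_size]
        vectors = embed_fn([p.get(text_field, "") for p in batch])
        for product, vector in zip(batch, vectors):
            product["embedding"] = vector
    return products
```

With the real client, `embed_fn` would wrap a call to the OpenAI embeddings endpoint using the `text-embedding-3-small` model listed in the tech stack.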
## AI Agent

The LangGraph agent features:
- Query Analysis: Understands user intent and extracts key information
- Multi-Modal Search: Combines semantic and text search
- Product Recommendations: Personalized suggestions based on preferences
- Product Comparison: Side-by-side analysis of different mixes
- Reasoning: Advanced analysis and insights about products
The workflow proceeds through five steps:

1. Analyze Query: Determine intent and extract keywords
2. Route Decision: Choose the appropriate search strategy
3. Search/Recommend/Compare: Execute the chosen action
4. Reasoning: Analyze results and provide insights
5. Response Generation: Create helpful, detailed responses
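As an illustration of the route-decision step, here is a minimal dependency-free sketch. The real project wires these steps into a LangGraph graph and lets the LLM analyze intent; the keyword cues below are assumptions for demonstration only:

```python
def route_query(query: str) -> str:
    """Pick a search strategy from simple intent cues in the query.
    A real agent would use the LLM's query analysis instead of
    keyword matching."""
    q = query.lower()
    if "compare" in q or " vs " in q:
        return "compare"
    if "recommend" in q or "suggest" in q:
        return "recommend"
    return "search"

def run_agent(query: str, tools: dict) -> str:
    """Mini pipeline: analyze -> route -> act -> respond."""
    action = route_query(query)       # route decision
    results = tools[action](query)    # search / recommend / compare
    return f"[{action}] {results}"    # response generation (stub)
```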
## Frontend

The Streamlit interface includes:
- Chat Interface: Natural language interaction with the AI
- Agent Graph: Visual representation of the AI workflow
- Product Cards: Rich product information display
- Analytics Dashboard: Data insights and visualizations
- Control Panel: Scraping and embedding management
## Analytics

The application provides:
- Price Distribution: Analysis of product pricing
- Feature Analysis: Most common product features
- Database Statistics: Product counts and embedding status
- Search Performance: Query analysis and results
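The price-distribution analysis could be computed along these lines; this is a standard-library sketch (the real app renders results with Plotly, and the `price` field name is an assumption):

```python
from statistics import mean, median

def price_stats(products):
    """Summarize pricing across products that carry a numeric price."""
    prices = [p["price"] for p in products
              if isinstance(p.get("price"), (int, float))]
    if not prices:
        return {"count": 0}
    return {
        "count": len(prices),
        "min": min(prices),
        "max": max(prices),
        "mean": round(mean(prices), 2),
        "median": median(prices),
    }
```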
## Deployment

To deploy on Hugging Face Spaces:

1. Create a new Space on Hugging Face
2. Choose "Streamlit" as the SDK
3. Upload your code files
4. Set environment variables in the Space settings
5. The app will automatically deploy
## Usage

Run the app locally:

```bash
streamlit run app.py
```

Example queries:

- "I'm looking for a chocolate cake mix"
- "Can you recommend some easy baking mixes for beginners?"
- "Compare different pancake mixes"
- "What ingredients are in your bread mixes?"
## API Reference

Scraper:

- `scrape_all_mixes()`: Scrape all products from the website
- `save_to_json()`: Save data to a JSON file, avoiding duplicates

Database:

- `insert_products()`: Insert products into the database
- `search_products()`: Text-based search
- `semantic_search()`: Embedding-based search

Embeddings:

- `generate_embedding()`: Create embeddings for text
- `semantic_search()`: Find similar products
- `hybrid_search()`: Combine semantic and text search

Agent:

- `chat()`: Main chat interface
- `get_graph_visualization()`: Get the agent workflow graph
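A sketch of what `hybrid_search` might do: blend a cosine-similarity score over embeddings with a keyword-overlap score. The weighting scheme and field names here are assumptions, not the project's actual implementation:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def hybrid_search(query_vec, query_text, products, alpha=0.7, top_k=3):
    """Rank products by a weighted mix of semantic and text relevance."""
    terms = set(query_text.lower().split())
    scored = []
    for p in products:
        semantic = cosine(query_vec, p["embedding"])
        words = set(p.get("description", "").lower().split())
        text = len(terms & words) / len(terms) if terms else 0.0
        scored.append((alpha * semantic + (1 - alpha) * text, p))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [p for _, p in scored[:top_k]]
```

In production, MongoDB Atlas can run the vector-similarity side of this server-side via a vector search index, rather than scoring in Python.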
## Error Handling

The application includes comprehensive error handling:
- Connection Errors: Automatic retry with exponential backoff
- API Rate Limits: Proper delays and batch processing
- Data Validation: Input sanitization and type checking
- Graceful Degradation: Fallback options when services are unavailable
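The retry-with-exponential-backoff behavior can be sketched as a small decorator (illustrative only; attempt counts and delays are configurable):

```python
import time
from functools import wraps

def with_retries(max_attempts=3, base_delay=0.5):
    """Retry a function with exponential backoff: 0.5s, 1s, 2s, ...
    Re-raises the last exception once attempts are exhausted."""
    def decorator(fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            for attempt in range(max_attempts):
                try:
                    return fn(*args, **kwargs)
                except Exception:
                    if attempt == max_attempts - 1:
                        raise
                    time.sleep(base_delay * (2 ** attempt))
        return wrapper
    return decorator
```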
## Performance Optimizations

- Batch Processing: Embeddings generated in batches
- Caching: Frequent queries cached for faster response
- Indexing: MongoDB indexes for efficient search
- Rate Limiting: Respectful scraping with delays
## Contributing

1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Add tests if applicable
5. Submit a pull request
## License

This project is licensed under the MIT License; see the LICENSE file for details.
## Acknowledgments

- King Arthur Baking for their excellent products and website
- OpenAI for the powerful GPT-4o and embedding models
- MongoDB Atlas for the vector database capabilities
- Streamlit for the beautiful web interface framework
- LangGraph for the advanced agent workflow capabilities
## Troubleshooting

Common issues:

- Connection Errors: Check your internet connection and API keys
- Rate Limiting: Reduce scraping frequency or batch sizes
- Memory Issues: Limit the number of products processed at once
- Authentication: Verify your MongoDB Atlas and OpenAI credentials
## Support

For issues and questions:
- Check the logs for detailed error messages
- Verify all environment variables are set correctly
- Ensure all dependencies are installed
- Check the MongoDB Atlas connection and permissions
## Changelog

- Initial release
- Web scraping functionality
- MongoDB Atlas integration
- OpenAI embeddings and chat
- LangGraph agent workflow
- Streamlit frontend
- Hugging Face Spaces deployment ready