Deployed Application Link: https://website-interaction.streamlit.app/
This is a simple web-based tool built with Streamlit, LangChain, and the Google Gemini API that lets you ask questions about the content of websites you provide. The tool answers questions solely from the information scraped from the URLs you input, without relying on general world knowledge.
Key Features:
- URL Input: Users can enter one or more website URLs in a text area.
- Content Ingestion: The tool scrapes the text content from the provided URLs. It also supports ingesting content from sitemap.xml files for broader website coverage.
- Question Answering: Users can ask questions related to the ingested website content.
- Accurate Answers: Answers are generated using Google's Gemini Pro model and are grounded strictly in the scraped website content.
- Simple UI: A user-friendly Streamlit interface with clear input fields and buttons.
- Two Ingestion Modes:
  - Ingest URLs: Processes content from the URLs directly entered by the user.
  - Ingest all subdomains: Attempts to find and process content from the sitemap.xml of each provided URL, potentially covering more pages of the website.
- Persistent Vector Store: The ingested website content is vectorized and stored, allowing you to ask multiple questions without re-ingesting the URLs each time.
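The sitemap-based ingestion mode described above can be sketched roughly as follows. This is a minimal illustration using only the standard library, not the project's actual code: the function names `sitemap_url_for` and `extract_sitemap_urls` are hypothetical, and the sketch assumes the sitemap lives at `/sitemap.xml` on the site root and follows the standard sitemaps.org schema.

```python
from typing import List
from urllib.parse import urljoin
import xml.etree.ElementTree as ET

# Namespace used by standard sitemap files (sitemaps.org protocol).
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def sitemap_url_for(page_url: str) -> str:
    """Guess the sitemap location for a page (assumption: it sits at the site root)."""
    return urljoin(page_url, "/sitemap.xml")


def extract_sitemap_urls(sitemap_xml: str) -> List[str]:
    """Return every URL found in a <loc> element of a sitemap document.

    Note: for a sitemap index file, the <loc> entries point to other
    sitemaps rather than pages; this sketch does not follow them.
    """
    root = ET.fromstring(sitemap_xml)
    urls = []
    for loc in root.iter(f"{SITEMAP_NS}loc"):
        if loc.text and loc.text.strip():
            urls.append(loc.text.strip())
    return urls
```

The XML text itself would still have to be downloaded (for example with an HTTP client) before being passed to `extract_sitemap_urls`; that step is left out here to keep the sketch free of network access.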
This project was built with the following evaluation criteria in mind:
- Relevance & Accuracy of answers: Answers should be directly relevant to the ingested website content and factually accurate based on that content alone.
- UI/UX: The user interface should be straightforward, intuitive, and easy to use for anyone.
- Implementation Clarity: The codebase should be well-organized, commented, and maintainable for future modifications or understanding.
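One common way to keep answers tied to the ingested content alone is to pass the retrieved text to the model inside a prompt that forbids outside knowledge. The sketch below shows that idea in plain Python. The function name `build_grounded_prompt` and the prompt wording are assumptions for illustration; the prompt this project actually uses is not shown here.

```python
from typing import List


def build_grounded_prompt(question: str, context_chunks: List[str]) -> str:
    """Assemble a prompt that asks the model to answer only from the given context."""
    # Drop empty chunks and separate the rest with blank lines.
    context = "\n\n".join(chunk.strip() for chunk in context_chunks if chunk.strip())
    return (
        "Answer the question using only the context below. "
        "If the context does not contain the answer, say that you don't know.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question.strip()}\n"
        "Answer:"
    )
```

The resulting string would then be sent to the language model; the explicit "say that you don't know" instruction is what discourages answers drawn from general world knowledge.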
Follow these steps to run the Web Content Q&A Tool on your local machine:
- Prerequisites:
  - Python 3.8 or higher must be installed on your system.
  - pip (the Python package installer) should be installed.
- Install Python Libraries: Open your terminal or command prompt and run the following command to install the necessary Python libraries:
  pip install -r requirements.txt
- Get a Google AI Studio API Key:
  - Go to Google AI Studio and create a project.
  - Generate an API key for the Gemini API within your project.
  - Important Security Note: For local testing, you will enter this API key directly into the application's text box. This is NOT a secure method for production deployments.
- Run the Streamlit App: In your terminal, navigate to the directory where you saved the app.py file, then run the Streamlit application using the command:
  streamlit run app.py
- Access the App in Your Browser: Streamlit will provide a local URL in your terminal (usually http://localhost:8501). Open this URL in your web browser to access the Web Content Q&A Tool.
Once the app is open, use it as follows:
- Enter your Google AI Studio API key into the provided text box.
- Enter the website URLs you want to query (one URL per line).
- Click either "Ingest URLs" or "Ingest all subdomains" to process the website content.
- Ask your question in the question input box.
- Click "Ask Question" to get your answer.
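Because URLs are entered one per line, the raw text from the input box has to be split and cleaned before ingestion. A minimal sketch of that step is shown below; the function name `parse_url_lines` is hypothetical, and the validation rule (keep only `http` and `https` URLs with a host, drop duplicates) is an assumption rather than the app's confirmed behavior.

```python
from typing import List
from urllib.parse import urlparse


def parse_url_lines(raw_text: str) -> List[str]:
    """Split text-area input into a de-duplicated list of http(s) URLs."""
    urls = []
    seen = set()
    for line in raw_text.splitlines():
        candidate = line.strip()
        if not candidate:
            continue  # skip blank lines
        parsed = urlparse(candidate)
        # Keep only entries that look like absolute web URLs.
        if parsed.scheme in ("http", "https") and parsed.netloc and candidate not in seen:
            seen.add(candidate)
            urls.append(candidate)
    return urls
```

Lines that fail the check are silently skipped in this sketch; an actual app might prefer to report them back to the user.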
Warning: Entering your API key directly into the application or the code is highly insecure and is only recommended for local testing. Do not use this method for production or publicly accessible deployments.
For secure deployment, especially if you are using Streamlit Cloud or other platforms, you should use secure methods to manage your API keys, such as:
- Streamlit Secrets (Recommended for Streamlit Cloud):
  - In your Streamlit Cloud app settings, define a secret named GOOGLE_API_KEY and paste your actual API key as the value.
  - In your app.py code, replace the API key input text box with the following line (or uncomment it if it is already present in the code) to load the API key from secrets:
    api_key = st.secrets["GOOGLE_API_KEY"]
    Then revert the initialize_llm, initialize_embeddings, and ingest_urls functions to use GOOGLE_API_KEY directly instead of receiving it as an argument.
  - Deploy your application to Streamlit Cloud.
- Environment Variables (For other hosting platforms): Configure your hosting environment to set an environment variable named GOOGLE_API_KEY with your API key value. Access it in your Python code using os.environ.get("GOOGLE_API_KEY").
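The two approaches above can be combined into a single lookup, sketched below. This is an illustrative helper, not part of the project: the function name `load_api_key` and its parameters are hypothetical. It takes any mapping as `secrets`, so in a Streamlit app `st.secrets` could be passed in, while the function itself stays free of a Streamlit dependency.

```python
import os
from typing import Mapping, Optional

KEY_NAME = "GOOGLE_API_KEY"


def load_api_key(
    secrets: Optional[Mapping[str, str]] = None,
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """Return the API key from a secrets mapping, falling back to the environment."""
    # Default to the real process environment when no mapping is supplied.
    env = os.environ if env is None else env
    if secrets is not None and KEY_NAME in secrets:
        return secrets[KEY_NAME]
    key = env.get(KEY_NAME)
    if key:
        return key
    raise RuntimeError(f"{KEY_NAME} is not set in secrets or environment variables.")
```

Failing loudly when the key is missing makes a misconfigured deployment obvious at startup instead of surfacing later as an authentication error from the API.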
Steps for Streamlit Cloud Deployment (using Secrets):
- Push your code to a GitHub repository.
- Sign up for Streamlit Cloud at streamlit.io/cloud.
- Connect your GitHub repository to Streamlit Cloud.
- Set up your API key as a secret in Streamlit Cloud: In your Streamlit Cloud app's settings, add a secret named GOOGLE_API_KEY and paste your Gemini API key as the value.
- Revert your code, or uncomment the st.secrets part, to use st.secrets["GOOGLE_API_KEY"] for secure API key loading (as mentioned above).
- Deploy your app from your GitHub repository in Streamlit Cloud.
[Link to your GitHub Repository will be here]
This is a basic implementation of a Web Content Q&A Tool and can be further enhanced. Potential future improvements could include:
- More robust error handling and user feedback.
- Improved UI/UX design.
- More advanced text processing and chunking strategies for better content ingestion.
- Exploration of different LangChain chain types and retrieval methods for optimized question answering.
- Support for different document loaders and file types.
- Support for the OpenAI API as an alternative to the Google Gemini API.
Feel free to contribute to this project or use it as a starting point for your own web content analysis tools!