## GenAI Chatbot: Kickstarter Trend Advisor

This generative AI chatbot uses a Mistral Large Language Model (LLM), called through the Mistral API as `mistral-large-latest`, together with Retrieval-Augmented Generation (RAG) to answer questions about current Kickstarter trends.

The chatbot builds its context by scraping the official Kickstarter stats page. When a user asks a question, the scraped text is passed to the model together with the query, so the answers reflect the statistics published at the time of the request.

⚙️ Key Libraries & Tools:

- LangChain: A framework that simplifies integration with LLMs and retrieval tools, and makes it easier to switch to a different LLM later.

- Mistral: The primary LLM used here (`mistral-large-latest`), called through the OpenAI client pointed at the Mistral API.

- Sentence Transformers: Used to embed text for semantic search in the RAG step.
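The semantic-search step mentioned above can be illustrated with a minimal, dependency-free sketch. It ranks text chunks against a question using bag-of-words cosine similarity, which is a plain stand-in for Sentence Transformers embeddings and not the notebook's actual method. The helper names (`tokenize`, `cosine`, `top_chunks`) and the sample chunks are hypothetical.

```python
import math
import re
from collections import Counter
from typing import List


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_chunks(question: str, chunks: List[str], k: int = 2) -> List[str]:
    """Return the k chunks most similar to the question."""
    q_vec = Counter(tokenize(question))
    ranked = sorted(
        chunks,
        key=lambda c: cosine(q_vec, Counter(tokenize(c))),
        reverse=True,
    )
    return ranked[:k]


# Made-up chunks standing in for pieces of the scraped stats page
chunks = [
    "Games: launched projects and success rate",
    "Music: launched projects and success rate",
    "Total dollars pledged to all projects",
]
best = top_chunks("What is the success rate for Games?", chunks, k=1)
print(best)
```

In a real pipeline the `Counter` vectors would be replaced by embedding vectors, and only the top-ranked chunks would be placed in the prompt instead of the whole page.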






1. Installing the necessary libraries

In [1]:
# Install LangChain and its community integrations
!pip install -U langchain langchain-community
# Document loading, configuration, and UI helpers
!pip install -U python-docx python-dotenv PyPDF2 pypdf streamlit
# Packages imported in the cells below
!pip install -U openai requests beautifulsoup4 tiktoken toml tqdm

# Import check: confirms that langchain-community installed correctly
from langchain_community.llms import Together


2. Importing the necessary libraries and packages

In [10]:
# Standard library
import glob
import os
from typing import Dict

# Third-party utilities
import requests
import tiktoken
import toml
from bs4 import BeautifulSoup
from dotenv import load_dotenv
from openai import OpenAI
from PyPDF2 import PdfReader
from tqdm import tqdm

# LangChain components (integrations are imported from langchain_community)
from langchain.chains import RetrievalQA
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.chat_models import ChatAnthropic
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import Chroma
from langchain_core.documents import Document

# config = toml.load("config.toml")

3. Setting up Mistral LLM

In [11]:
# Create the .env file with your key.
# Replace the placeholder with your own key, and never publish a notebook that contains a real one.
with open('/content/.env', 'w') as f:
    f.write('MISTRAL_API_KEY=<your-mistral-api-key>\n')

# Verify it was created correctly (this prints the key, so avoid it in shared notebooks)
!cat /content/.env

MISTRAL_API_KEY=<your-mistral-api-key>


In [12]:
# Load environment variables from the .env file
env_path = "/content/.env"  # Path to your .env file
if os.path.exists(env_path):
    load_dotenv(env_path)
else:
    print(f"Warning: .env file not found at {env_path}")

# Read the API key, falling back to the alternative variable name
mistral_api_key = os.getenv("MISTRAL_API_KEY") or os.getenv("mistral_key")

if not mistral_api_key:
    raise ValueError(
        "Mistral API key not found in environment variables. "
        "Please set MISTRAL_API_KEY in your .env file or environment."
    )

# Point the OpenAI client (imported in step 2) at the Mistral API endpoint
client = OpenAI(
    api_key=mistral_api_key,
    base_url="https://api.mistral.ai/v1"
)

4. Initialising Mistral LLM

In [13]:
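# Reload environment variables (harmless if they are already loaded)
load_dotenv()

# Re-create the Mistral client; this repeats step 3 so the cell can run on its own
client = OpenAI(
    api_key=os.getenv("MISTRAL_API_KEY"),
    base_url="https://api.mistral.ai/v1"
)

# Initialize a tokenizer for estimating prompt size.
# cl100k_base is an OpenAI encoding, so counts are only approximate for Mistral models.
tokenizer = tiktoken.get_encoding("cl100k_base")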
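The tokenizer above is created but not used in the cells shown here. One plausible use is capping the scraped context so the prompt stays within a token budget. The sketch below is an assumption about how that could look, not part of the original notebook: `approx_token_count` and `fit_to_budget` are hypothetical helpers, and the default counter is a rough four-characters-per-token heuristic so the example runs without `tiktoken`.

```python
from typing import Callable


def approx_token_count(text: str) -> int:
    """Rough heuristic (an assumption): about four characters per token."""
    return max(1, len(text) // 4) if text else 0


def fit_to_budget(
    text: str,
    max_tokens: int,
    count_tokens: Callable[[str], int] = approx_token_count,
) -> str:
    """Keep whole lines from the top of the text until the budget is used up."""
    kept = []
    used = 0
    for line in text.splitlines():
        cost = count_tokens(line)
        if used + cost > max_tokens:
            break
        kept.append(line)
        used += cost
    return "\n".join(kept)


# Made-up text standing in for scraped page content
sample = "aaaa bbbb\ncccc dddd eeee\nffff"
trimmed = fit_to_budget(sample, max_tokens=5)
print(trimmed)
```

To count with the notebook's tokenizer instead of the heuristic, pass `count_tokens=lambda s: len(tokenizer.encode(s))`.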

5. Web Scraping: Extract the latest Kickstarter statistics from the official stats page

In [14]:
def fetch_kickstarter_stats() -> str:
    """Fetch the Kickstarter stats page and return its visible text ("" on failure)."""
    url = "https://www.kickstarter.com/help/stats"
    try:
        headers = {'User-Agent': 'Mozilla/5.0'}
        # A timeout keeps the notebook from hanging if the site does not respond
        response = requests.get(url, headers=headers, timeout=30)
        response.raise_for_status()

        soup = BeautifulSoup(response.text, 'html.parser')

        # Remove elements that carry no statistics
        for element in soup(['script', 'style', 'nav', 'footer', 'iframe']):
            element.decompose()

        # Prefer the main container; fall back to the whole page if it is missing
        stats_content = soup.find('div', class_='container') or soup
        return stats_content.get_text('\n', strip=True)

    except requests.RequestException as e:
        print(f"Error fetching Kickstarter stats: {e}")
        return ""
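The cleaning step can be tried offline on a small HTML string. The sketch below mirrors the idea of dropping `script`, `style`, `nav`, `footer`, and `iframe` content, but it uses the standard library's `html.parser` instead of BeautifulSoup so that it runs without extra packages. `TextExtractor`, `extract_text`, and the sample HTML are hypothetical and only illustrate the technique.

```python
from html.parser import HTMLParser

# Tags whose contents are ignored, matching the list used in the scraper above
SKIPPED_TAGS = {"script", "style", "nav", "footer", "iframe"}


class TextExtractor(HTMLParser):
    """Collect visible text, ignoring anything nested inside SKIPPED_TAGS."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED_TAGS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIPPED_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self.skip_depth == 0:
            self.parts.append(text)


def extract_text(html: str) -> str:
    """Return the visible text of an HTML string, one fragment per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.parts)


# Made-up page; the figures are placeholders, not real Kickstarter data
sample_html = """
<html><head><style>p {color: red}</style></head>
<body><nav>Home | Help</nav>
<div class="container"><h1>Stats</h1><p>Launched projects: 10</p></div>
<script>track()</script><footer>Legal</footer></body></html>
"""
cleaned = extract_text(sample_html)
print(cleaned)
```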

6. Use the scraped Kickstarter statistics as context for the RAG-based LLM

In [20]:
from typing import Optional

def analyze_kickstarter_trends(question: str, context: Optional[str] = None) -> str:
    """Answer a question about Kickstarter trends using the scraped stats as context."""
    # Fetch fresh data only when the caller did not supply any
    if context is None:
        context = fetch_kickstarter_stats()

    try:
        response = client.chat.completions.create(
            model="mistral-large-latest",
            messages=[
                {
                    "role": "system",
                    "content": f"""You are a Kickstarter trends analyst.
                    Use this data to answer questions about crowdfunding trends:
                    {context}

                    Provide accurate, data-driven responses with relevant statistics when available.
                    If the question can't be answered from the data, say so. Present answers neatly, using bullet points where helpful. Don't use ** for emphasis, because it does not render as bold text."""
                },
                {
                    "role": "user",
                    "content": question
                }
            ],
            # A low temperature keeps answers close to the supplied statistics
            temperature=0.3
        )
        return response.choices[0].message.content
    except Exception as e:
        return f"Error analyzing trends: {e}"

# Usage
if __name__ == "__main__":
    # Fetch the data once and reuse it for every question (this result could be cached)
    stats_data = fetch_kickstarter_stats()
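The comment above notes that the fetched data could be cached. A minimal sketch of one way to do that is a small time-to-live (TTL) cache that wraps any fetch function. `CachedFetcher` is a hypothetical helper, and the demo uses a fake fetch function and a fake clock so it runs without network access.

```python
import time
from typing import Callable, Optional


class CachedFetcher:
    """Cache a fetch function's result and refresh it after ttl_seconds."""

    def __init__(
        self,
        fetch: Callable[[], str],
        ttl_seconds: float = 3600.0,
        clock: Callable[[], float] = time.monotonic,
    ):
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._value: Optional[str] = None
        self._fetched_at = 0.0

    def get(self) -> str:
        now = self._clock()
        expired = self._value is None or (now - self._fetched_at) >= self._ttl
        if expired:
            fresh = self._fetch()
            if fresh:
                # Store only non-empty results, so a failed fetch is retried next time
                self._value = fresh
                self._fetched_at = now
            elif self._value is None:
                return ""
        # Falls back to the stale value when a refresh returned nothing
        return self._value


# Demo with a fake fetcher and a controllable clock
calls = []

def fake_fetch() -> str:
    calls.append(1)
    return f"stats v{len(calls)}"

fake_now = [0.0]
cache = CachedFetcher(fake_fetch, ttl_seconds=60, clock=lambda: fake_now[0])
first = cache.get()    # triggers the first fetch
second = cache.get()   # served from the cache
fake_now[0] = 61.0     # move past the TTL
third = cache.get()    # triggers a refresh
print(first, second, third, len(calls))
```

In the notebook this might be used as `stats_cache = CachedFetcher(fetch_kickstarter_stats)`, with `stats_cache.get()` passed as the `context` argument.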

7. User Interaction

Add questions about current Kickstarter trends to the `questions` list. The chatbot is designed to:

- Help users identify lower-risk crowdfunding projects.

- Improve the likelihood of returns by recommending promising, data-backed projects.

- Provide detailed insights into a specific project, given its name or URL, including how its characteristics may influence funding success.




In [21]:
# Sample questions
# Add further questions to the questions list
questions = [
    "What percentage of projects get fully funded?",
    "What are the success rates for different project categories?",
    "What's the average pledge amount for successful projects?",
    "What trends have you noticed in project success rates over time?",
    "What is the number of successfully funded projects in Crafts?",
    "What combination of filters is best for finding projects that are likely to be successful?",
    "Is this project likely to be successful? Suggest alternatives: https://www.kickstarter.com/projects/cristianluca/from-farm-to-fiber-luca-s-yarns-sustainable-and-ethical?ref=discovery_category&total_hits=39372&category_id=263"
]


In [22]:
# Print each question with its answer

for question in questions:
    print(f"Q: {question}")
    answer = analyze_kickstarter_trends(question, stats_data)
    print(f"A: {answer}\n")

Q: What percentage of projects get fully funded?
A: Based on the provided data, the percentage of projects that get fully funded on Kickstarter is 42.25%.

Here's the calculation:
- Total successfully funded projects: 279,223
- Total launched projects: 663,855

Percentage of fully funded projects = (Successfully funded projects / Total launched projects) * 100
= (279,223 / 663,855) * 100
= 42.25%

Q: What are the success rates for different project categories?
A: Based on the provided data, here are the success rates for different project categories on Kickstarter:

- Games: 51.35%
- Design: 43.98%
- Technology: 24.16%
- Film & Video: 38.30%
- Publishing: 39.27%
- Music: 50.48%
- Comics: 68.13%
- Fashion: 31.42%
- Art: 49.18%
- Food: 25.87%
- Photography: 35.84%
- Theater: 59.66%
- Crafts: 27.57%
- Journalism: 23.62%
- Dance: 61.01%

These success rates are calculated as the percentage of successfully funded projects out of the total number of launched projects in each category.

Q: Wh