In [1]:
!pip install -q groq langchain_community langchain_groq


[notice] A new release of pip is available: 24.0 -> 25.1.1
[notice] To update, run: python.exe -m pip install --upgrade pip


In [2]:
import os
from groq import Groq
from langchain.prompts import PromptTemplate
from langchain.memory import ConversationBufferMemory
from langchain_groq import ChatGroq

In [3]:
import os
groq_api_key = os.getenv("GROQ_API_KEY")


In [12]:
llm = ChatGroq(
    api_key =groq_api_key,
    model="llama-3.3-70b-versatile",
    temperature=0,
    max_tokens=None,
    timeout=None,
    max_retries=2,
    # other params...
)

In [13]:
custom_prompt_template_for_chatbot = """
You are a knowledgeable assistant specializing in Data Science and Artificial Intelligence (AI).

Your primary objective is to assist students by providing clear, concise, and accurate answers to their questions specifically related to Data Science and AI. This includes, but is not limited to, the following topics:
- Programming languages and tools: Python, SQL (MySQL, SQLite, MongoDB)
- Data visualization tools: Power BI, Tableau
- Statistical concepts and methodologies
- Machine Learning (ML) techniques and frameworks
- MLFlow for managing machine learning workflows
- Containerization with Docker
- Deep Learning concepts and frameworks
- Natural Language Processing (NLP)
- Generative AI technologies
- Skills required for a career in Data Science and AI

When responding, ensure that your answers are focused and straightforward, avoiding unnecessary details. If users ask complex questions, break down your responses into manageable parts and provide step-by-step explanations when needed.

Always be polite and encouraging, ensuring that you provide accurate information at all times.

If a question is asked that falls outside the realm of Data Science and AI or does not relate to the topics mentioned above, respond with a polite message indicating that the question is unrelated. For example: "I'm sorry, but that topic is outside the scope of Data Science and AI. I'm unable to provide an answer."

Remember previous exchanges in the conversation to provide better context for your responses.

History: {history}

Context: {context}

Question: {question}
"""


In [14]:
prompt = PromptTemplate(template=custom_prompt_template_for_chatbot,
                            input_variables=['history','context', 'question'])

memory = ConversationBufferMemory(input_key="question", memory_key="history")

In [15]:
# Function to format output in Markdown
def format_output(response):
    if isinstance(response, str):
        # Check if the response contains code (We can customize this check)
        if response.startswith("``````"):
            return response  # Return as is if it's already in code block
        else:
            # Format as markdown for theoretical responses
            formatted_response = f"# Response\n\n{response}"
            return formatted_response
    return response

In [16]:
chain = prompt | llm

def invoke_chain(question):
    response = chain.invoke({"history": memory, "context": "", "question": question})
    return format_output(response)

In [17]:
question = "write a python code to train linear regression model from taking data till predictions"
response = invoke_chain(question)
print(response)

content="Here's a step-by-step Python code to train a linear regression model:\n\n**Step 1: Import necessary libraries**\n```python\nimport pandas as pd\nfrom sklearn.model_selection import train_test_split\nfrom sklearn.linear_model import LinearRegression\nfrom sklearn.metrics import mean_squared_error, r2_score\nimport numpy as np\n```\n\n**Step 2: Load the data**\n```python\n# Let's assume we have a dataset in a CSV file named 'data.csv'\ndata = pd.read_csv('data.csv')\n```\n\n**Step 3: Prepare the data**\n```python\n# Let's assume we have two columns: 'features' (independent variable) and 'target' (dependent variable)\nX = data[['features']]\ny = data['target']\n```\n\n**Step 4: Split the data into training and testing sets**\n```python\nX_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)\n```\n\n**Step 5: Create and train the linear regression model**\n```python\nmodel = LinearRegression()\nmodel.fit(X_train, y_train)\n```\n\n**Step 6: Make p

In [18]:
while True:
    question = input("Ask a question (or type 'exit' to quit): ")
    if question.lower() == 'exit':
        break

    response = invoke_chain(question)
    print(response)

content='Data science is an interdisciplinary field that combines elements of computer science, statistics, and domain-specific knowledge to extract insights and knowledge from data. It involves using various techniques, such as machine learning, data visualization, and statistical modeling, to analyze and interpret complex data sets.\n\nData science typically involves the following steps:\n\n1. **Data collection**: Gathering data from various sources, such as databases, APIs, or files.\n2. **Data preprocessing**: Cleaning, transforming, and formatting the data for analysis.\n3. **Exploratory data analysis**: Visualizing and summarizing the data to understand its distribution, patterns, and relationships.\n4. **Modeling**: Applying machine learning or statistical models to the data to make predictions, classify outcomes, or identify trends.\n5. **Evaluation**: Assessing the performance of the models and refining them as needed.\n6. **Deployment**: Implementing the models in a productio

: 

: 