# **Assignment: Code-Focused Inference**

Your task is to load a pre-trained GPT-2 model and configure it to answer *only* questions related to Python coding.

1. **Load Model and Tokenizer:** Load a suitable pre-trained GPT-2 model and its corresponding tokenizer. You can use `transformers.AutoModelForCausalLM` and `transformers.AutoTokenizer`. A smaller model like `gpt2` or `gpt2-medium` might be sufficient.
2. **Implement a Filtering Mechanism:** Before generating a response, check if the input prompt is related to Python coding. You can use simple keyword matching (e.g., "Python", "code", "function", "class", "import") or a more sophisticated approach using a text classification model (optional).
3. **Generate Response:** If the prompt is deemed a Python coding question, generate a response using the loaded GPT-2 model.
4. **Handle Non-Coding Questions:** If the prompt is not related to Python coding, return a predefined message indicating that the model can only answer coding questions.
5. **Test:** Test your implementation with various prompts, including both Python coding questions and non-coding questions, to ensure the filtering mechanism works correctly.

## Load Model and Tokenizer

## Implement a Filtering Mechanism

## Generate Response

In [23]:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

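# Load Model and Tokenizer

class PythonCodeGPT:
    """GPT-2 wrapper that only answers prompts that look like Python questions."""

    def __init__(self, model_name='gpt2-medium'):
        print(f"Loading model: {model_name}...")
        self.device = "cuda" if torch.cuda.is_available() else "cpu"
        self.tokenizer = AutoTokenizer.from_pretrained(model_name)
        self.model = AutoModelForCausalLM.from_pretrained(model_name).to(self.device)
        # GPT-2 ships without a padding token, so reuse the end-of-sequence token.
        self.tokenizer.pad_token = self.tokenizer.eos_token
        # Keywords that mark a prompt as Python-related.
        self.python_keywords = [
            'python', 'code', 'function', 'class', 'import', 'list',
            'dict', 'tuple', 'pandas', 'numpy', 'def', 'return', 'for',
            'while', 'try', 'except', 'lambda', 'pip'
        ]
        print(f"Model loaded successfully on device: {self.device}")

    # Implement a Filtering Mechanism

    def _is_python_question(self, prompt: str) -> bool:
        """Return True if the prompt contains any keyword (case-insensitive)."""
        # Substring match: a keyword also counts inside a longer word,
        # e.g. 'list' in 'lists'.
        prompt_lower = prompt.lower()
        return any(keyword in prompt_lower for keyword in self.python_keywords)

    # Generate Response

    def generate_response(self, prompt: str, max_length: int = 100) -> str:
        """Generate a reply for Python prompts; return a fixed refusal otherwise."""
        if not self._is_python_question(prompt):
            return "I'm sorry, I can only answer questions about Python coding."

        # Tokenize into input_ids plus attention_mask and move both to the model's device.
        inputs = self.tokenizer(prompt, return_tensors='pt').to(self.device)

        outputs = self.model.generate(
            **inputs,
            max_length=max_length,  # total length in tokens, prompt included
            num_return_sequences=1,
            pad_token_id=self.tokenizer.eos_token_id,
            no_repeat_ngram_size=2  # never repeat the same 2-token sequence
        )

        # outputs[0] holds the prompt tokens followed by the generated ones,
        # so the decoded text starts with the prompt.
        generated_text = self.tokenizer.decode(outputs[0], skip_special_tokens=True)
        return generated_text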
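
Because `_is_python_question` looks for keywords as substrings, it also accepts prompts where a keyword only appears inside a longer word, such as "try" in "country" or "def" in "define". The sketch below compares that behaviour with a whole-word variant built on the standard `re` module. The names `PYTHON_KEYWORDS`, `KEYWORD_PATTERN`, `is_python_question_substring` and `is_python_question_whole_word` are illustrative and not part of the class above, and the optional trailing "s" is an assumption made here so that plurals like "lists" still match.

```python
import re

# Same keyword list as in PythonCodeGPT, copied here so the sketch is self-contained.
PYTHON_KEYWORDS = [
    'python', 'code', 'function', 'class', 'import', 'list',
    'dict', 'tuple', 'pandas', 'numpy', 'def', 'return', 'for',
    'while', 'try', 'except', 'lambda', 'pip'
]

# \b anchors the match at word boundaries; the optional "s" (an assumption of
# this sketch) keeps simple plurals such as "lists" and "tuples" matching.
KEYWORD_PATTERN = re.compile(
    r"\b(?:" + "|".join(re.escape(k) for k in PYTHON_KEYWORDS) + r")s?\b",
    re.IGNORECASE,
)


def is_python_question_substring(prompt: str) -> bool:
    """Mirror of the notebook's check: any keyword anywhere in the text."""
    prompt_lower = prompt.lower()
    return any(keyword in prompt_lower for keyword in PYTHON_KEYWORDS)


def is_python_question_whole_word(prompt: str) -> bool:
    """Hypothetical variant: keywords must appear as whole words."""
    return KEYWORD_PATTERN.search(prompt) is not None


for prompt in [
    "Which country should I visit?",
    "How do I define success?",
    "what is the difference between lists and tuples?",
]:
    print(
        prompt,
        is_python_question_substring(prompt),
        is_python_question_whole_word(prompt),
    )
```

Whole-word matching removes the accidental hits, but several keywords ("for", "while", "try", "code") are ordinary English words, so a prompt like "What should I pack for a trip?" still passes either check. Trimming the list to Python-specific terms, or using the optional classifier from step 2, would be the next refinement.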

In [24]:
python_bot = PythonCodeGPT()

Loading model: gpt2-medium...


Model loaded successfully on device: cpu


## Test

In [25]:
test_prompts = [
    "How do I write a function in Python to add two numbers?",
    "What is the capital of France?",
    "Can you show me a Python code example for reading a file?",
    "What's the weather like today?",
    "Explain the 'import' statement in Python."
]

print("\n Testing Prompts ")
for i, prompt in enumerate(test_prompts):
    print(f"\n[Test {i+1}]")
    print(f"User Prompt: {prompt}")
    response = python_bot.generate_response(prompt)
    print(f"Bot Response: {response}")


 Testing Prompts 

[Test 1]
User Prompt: How do I write a function in Python to add two numbers?
Bot Response: How do I write a function in Python to add two numbers?

The answer is to use the add function.
, , and are the two arguments to the function, and the result is the number of the second argument. The add() function takes two parameters, the first is a number and a string. If you want to write the same function as above, you can use a list of numbers. For example, to create a new function that adds two integers, we can write:

[Test 2]
User Prompt: What is the capital of France?
Bot Response: I'm sorry, I can only answer questions about Python coding.

[Test 3]
User Prompt: Can you show me a Python code example for reading a file?
Bot Response: Can you show me a Python code example for reading a file?

Yes, you can.
, the file is read from the standard input, and the output is written to the stdout. The file can be read by any program that can read files. For example, if you w
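
Each bot response above begins by repeating the user prompt, because `generate` returns the prompt tokens followed by the new tokens and `generate_response` decodes the whole sequence. A minimal sketch of one way to return only the continuation is shown below. `continuation_only` is a hypothetical helper, not part of the notebook, and it assumes the decoded text starts with the prompt exactly as typed.

```python
def continuation_only(generated_text: str, prompt: str) -> str:
    """Return the text after the prompt, or the full text if the prompt is not a prefix."""
    if generated_text.startswith(prompt):
        # Drop the echoed prompt and the blank lines that follow it.
        return generated_text[len(prompt):].lstrip()
    # Fallback: decoding changed the prompt text, so leave the output untouched.
    return generated_text


prompt = "How do I write a function in Python to add two numbers?"
decoded = prompt + "\n\nThe answer is to use the add function."
print(continuation_only(decoded, prompt))
# prints: The answer is to use the add function.
```

An alternative that avoids string comparison is to slice off the prompt tokens before decoding, for example decoding `outputs[0][prompt_length:]`, where `prompt_length` is the number of input tokens. That variant would need to be checked against the installed `transformers` version.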

## Dynamic Testing

In [26]:
print("Starting the interactive Python Code Bot...")
print("Type 'quit' or 'exit' to end the conversation.")
print("-" * 40)

while True:
    user_input = input("> You: ")

    if user_input.lower() in ['quit', 'exit']:
        print("\n🤖 Bot: Goodbye!")
        break

    response = python_bot.generate_response(user_input)
    print(f"\n🤖 Bot: {response}\n")

Starting the interactive Python Code Bot...
Type 'quit' or 'exit' to end the conversation.
----------------------------------------
> You: what is the difference between lists and tuples?

🤖 Bot: what is the difference between lists and tuples?

The difference is that lists are immutable, while tups are mutable.
, and are both immutable. Lists are also immutable in the sense that they are not mutably modified. This means that you can't change the contents of a list without changing the list itself. For example, if you want to change a value in a tuple, you must first change its contents. If you wanted to modify the value of the tuple itself,

> You: What is 817 divided by 29?

🤖 Bot: I'm sorry, I can only answer questions about Python coding.

> You: quit

🤖 Bot: Goodbye!




# **Conclusion**

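* The project demonstrated how placing a **filter in front of a pre-trained LLM** can restrict a chatbot to a single topic.
* A **simple keyword filter** handled the prompts tested here: it passed the Python-related queries and rejected off-topic ones such as "What is the capital of France?" and "What is 817 divided by 29?". Because it matches substrings, it can also let through off-topic prompts that happen to contain a keyword (for example, "try" inside "country"), so it is a coarse gate rather than a reliable classifier.
* However, the **base model (gpt2-medium)** showed clear limitations. Its responses repeated the prompt, stopped mid-sentence at the `max_length` limit, and were often inaccurate: for example, it stated that “lists are immutable,” when Python lists are mutable and tuples are immutable. The text looked technical but was not factually reliable.
* The experiment suggests that a **small general-purpose base model** such as gpt2-medium is insufficient for expert tasks.
* Overall, the project underscored both the **usefulness of input filtering** and the **importance of domain-specific models**.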

