## To make the cost manageable, what else can I do?
Throughout this course, I spent a significant amount of time building my ideation tool, a project I became deeply invested in and plan to keep developing after the semester ends. As the course wraps up, I have started considering a switch to a smaller, more cost-effective LLM, especially for long-term development and scalability.

![](hero-img.jpg)

Before making that switch, however, I wanted to check whether a smaller model could complete the same tasks at a quality comparable to GPT-4o. In this post, I document my testing process with the Mistral model and examine whether its performance meets the standards I set during my work with GPT-4o.

#### Choosing the model
There are several high-quality small LLMs available on Ollama, such as Llama 2, Mistral, Gemma, and Phi-2, and I wanted to choose the one most aligned with the goals of my ideation tool. After weighing their respective strengths, I decided to use Mistral 7B.

Mistral 7B stands out as a strong all-purpose model, combining solid reasoning with good language generation quality for its relatively compact size. Among the models I considered, it offered the best balance of performance, creativity, and local efficiency, which is crucial for generating diverse and innovative product ideas.

#### Running Mistral 7B locally
I wrote a simplified version of my original code using the LangGraph framework. This version replicates the essential functionality of the original, including idea generation, concept expansion, and concept combination, while being more lightweight and easier to test and refine.

![](code-snippet.png)

Then I started running Mistral locally.

![](running-mistral.png)
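For readers who want to try something similar without LangGraph, here is a minimal sketch of calling a locally running Ollama server over its REST API using only the Python standard library. It is not the code from the screenshot above. It assumes Ollama is listening on its default port (11434) and that the model was pulled under the tag `mistral`; the helper names `build_request` and `generate` are made up for this example.

```python
import json
import urllib.request

# Assumptions: Ollama is serving on its default local port, and the model
# was pulled under the tag "mistral".
OLLAMA_URL = "http://localhost:11434/api/generate"
MODEL_NAME = "mistral"


def build_request(prompt: str, model: str = MODEL_NAME) -> urllib.request.Request:
    """Build a non-streaming generate request for a local Ollama server."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, timeout: float = 120.0) -> str:
    """Send the prompt to the local server and return the generated text."""
    with urllib.request.urlopen(build_request(prompt), timeout=timeout) as resp:
        body = json.loads(resp.read().decode("utf-8"))
    # With "stream": False, the full completion arrives in the "response" field.
    return body["response"]
```

With the server running, `generate("List three emotional root causes of procrastination.")` should return the model's reply as a plain string.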

#### Formatting problem
A problem quickly showed up when I asked the LLM to ideate by exploring emotional root causes. Based on the error message, it seems that Mistral was able to generate output, but it didn't fully follow the formatting requirements.

```
    ValueError: Invalid format specifier ' "Fear of letting others down",
      "explanation": "Social and professional pressures can make people fear judgment from peers.",
      "productDirection": "Use progress-sharing with supportive feedback loops instead of performance scores."
    ' for object of type 'str'
```
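One possible contributing cause is worth checking, offered as an assumption because the prompt code is only shown as an image: Python raises this kind of `ValueError` when JSON-style braces sit unescaped inside an f-string. Python reads `{"key": "value"}` as a replacement field and treats everything after the colon as a format specifier. The sketch below reproduces the behaviour and shows the usual fix of doubling the braces. The key name `rootCause` and both function names are hypothetical.

```python
def broken_prompt(topic: str) -> str:
    # The single braces open an f-string replacement field: Python evaluates
    # the literal "rootCause" and treats the text after the colon as a format
    # specifier, which fails at runtime with a ValueError.
    return f'''Explore emotional root causes of: {topic}
Answer as JSON like {"rootCause": "Fear of letting others down"}'''


def escaped_prompt(topic: str) -> str:
    # Doubled braces are emitted as literal { and } characters.
    return f'''Explore emotional root causes of: {topic}
Answer as JSON like {{"rootCause": "Fear of letting others down"}}'''


def raises_value_error(fn, *args) -> bool:
    """Return True if calling fn(*args) raises ValueError."""
    try:
        fn(*args)
    except ValueError:
        return True
    return False
```

If the JSON sample in a prompt template is written with single braces, escaping it this way (or building the sample with `json.dumps` and inserting it as a variable) avoids the error regardless of which model is used.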

#### Simpler output but similar content overall
Once I fixed the formatting issue by providing a more explicit sample output, the code ran smoothly. Since I was working with a simplified version of the original code, Mistral wasn’t able to produce highly detailed responses. However, after comparing the actual content generated, I noticed that the output was fairly similar to what I had been getting from GPT-4o despite the difference in model size, especially when the task is well structured.

However, the combined product feature didn’t perform as well. Because the concept branches were simplified, the resulting product ideas felt clichéd and lacked originality, nowhere near the quality you’d expect from a commercially viable concept.

```
- b8: Real-time Feedback Via Virtual Tutor
- b9: Instant Quiz Feature
- b10: Gamified Reward System
- b11: Point System
- b12: Level System
- b13: Competitive Leaderboards

Enter branch IDs to combine (space-separated, e.g., 'b1 b2 b3'): b1 b8 b11
Created product branch b14: Lecture Quest: Gamified Virtual Tutor

Created combined product: Lecture Quest: Gamified Virtual Tutor
Explanation: Lecture Quest gamifies learning by integrating quizzes, mini-games, augmented reality elements, and an AI-powered virtual tutor that provides real-time feedback. Earning points through active engagement fuels students' motivation to stay focused during lectures.
Features:
- Interactive Quizzes
- Mini-Games
- Augmented Reality Elements
- AI-Powered Virtual Tutor for Real-Time Feedback
- Dynamic Point System
```
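The branch-selection step in the transcript above can be sketched roughly as follows. This is an illustration, not the tool's actual code: the branch titles are copied from the transcript, while the function names and the wording of the combine prompt are assumptions.

```python
# Branch titles as listed in the transcript above.
BRANCHES = {
    "b8": "Real-time Feedback Via Virtual Tutor",
    "b9": "Instant Quiz Feature",
    "b10": "Gamified Reward System",
    "b11": "Point System",
    "b12": "Level System",
    "b13": "Competitive Leaderboards",
}


def parse_branch_ids(raw: str, branches: dict[str, str]) -> list[str]:
    """Split space-separated IDs and reject any that do not exist."""
    ids = raw.split()
    unknown = [b for b in ids if b not in branches]
    if unknown:
        raise KeyError(f"unknown branch IDs: {unknown}")
    return ids


def next_branch_id(branches: dict[str, str]) -> str:
    """Allocate the next ID after the highest existing number."""
    highest = max(int(branch_id[1:]) for branch_id in branches)
    return f"b{highest + 1}"


def build_combine_prompt(ids: list[str], branches: dict[str, str]) -> str:
    """Build a (hypothetical) prompt asking the model to merge the concepts."""
    concepts = "\n".join(f"- {branches[b]}" for b in ids)
    return (
        "Combine the following concepts into one product idea with a name, "
        "a short explanation, and a feature list:\n" + concepts
    )
```

Asking for more detail per concept in the combine prompt, rather than passing only short titles, is one place where the simplified version could be made richer.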

#### Mistral has the potential to produce higher-quality ideas with better prompt engineering
Based on my current tests, I was genuinely impressed by Mistral’s ability to generate output of comparable quality to a much larger model. This suggests that, with the right setup, smaller LLMs can be effective for creative tasks like product ideation.

There’s still plenty of room for further exploration: refining prompts, enforcing more structured outputs, and experimenting with other lightweight models. But this trial reinforces my belief that, with well-crafted prompts, smaller LLMs like Mistral can handle complex tasks efficiently and affordably.
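As a starting point for the "more structured outputs" item, here is a minimal sketch of validating a model reply before using it. The keys `explanation` and `productDirection` come from the error output earlier in this post; `rootCause` is a hypothetical name for the first field, and `parse_idea` is a helper made up for this example.

```python
import json

# "explanation" and "productDirection" appear in the error output earlier in
# the post; "rootCause" is a hypothetical name for the first field.
REQUIRED_KEYS = ("rootCause", "explanation", "productDirection")


def parse_idea(raw: str) -> dict:
    """Parse the span from the first '{' to the last '}' and check its keys."""
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in model output")
    # json.JSONDecodeError is a subclass of ValueError, so callers can catch
    # a single exception type for both malformed and incomplete replies.
    data = json.loads(raw[start : end + 1])
    missing = [key for key in REQUIRED_KEYS if key not in data]
    if missing:
        raise ValueError(f"missing keys: {missing}")
    return data
```

A caller could catch `ValueError` and re-prompt the model with the sample output repeated, which tolerates the extra chatter that smaller models sometimes wrap around their JSON.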