## Generate dataset

The number of distinct problems this generator can create depends on the range of operands and the set of operations. Every problem uses the same sentence template, so the wording adds no further variety. Let me break it down:

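1. **Range of Numbers**:
   - Your generator picks `num1` and `num2` independently from 1 to 100, inclusive (`random.randint` includes both endpoints).
   - Order matters (`seven minus three` is a different problem from `three minus seven`), so there are **100 × 100 = 10,000** ordered operand pairs.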

2. **Operations**:
   - You have four operations: `plus`, `minus`, `times`, and `divided by`.
   - Each operand pair can be combined with any of the four operations, giving **10,000 × 4 = 40,000** distinct problems.

3. **Division Specifics**:
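   - Because `num2` is drawn from 1 to 100, the divisor is never zero. Every division is therefore defined, and the code's zero-divisor guard never fires. Quotients are rounded to two decimal places; for example, one divided by three gives 0.33.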

4. **Word Conversion**:
   - The generator uses `num2words` to convert operands and results into words. Each integer gets its own unique wording, so this conversion neither adds nor removes distinct problems; it only changes how they are written.
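The counting above can be checked by brute force. This sketch enumerates every `(num1, operation, num2)` triple the generator can draw, using the same 1 to 100 range and the same four operation words. `problem_space` and `OPERATIONS` are hypothetical names used only for illustration; they are not part of the generator.

```python
from itertools import product

# Operation words copied from SimpleMathProblemsGenerator.operations
OPERATIONS = ["plus", "minus", "times", "divided by"]


def problem_space(low=1, high=100, operations=OPERATIONS):
    """Return every (num1, operation, num2) triple the generator can draw.

    random.randint(low, high) includes both endpoints, so range(low, high + 1)
    covers exactly the same operand values.
    """
    values = range(low, high + 1)
    return list(product(values, operations, values))


space = problem_space()
print(len(space))                              # → 40000 (100 * 4 * 100)
print(any(num2 == 0 for _, _, num2 in space))  # → False: no zero divisor is possible
```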

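In total, then, the generator can produce at most **40,000 distinct** math problems. The `existing_questions_and_answers` set rejects any (question, answer) pair that has already been emitted, so no problem appears twice in a dataset. Because `generate_problem` keeps redrawing until it finds an unseen pair, a dataset can never exceed 40,000 rows, and each additional row needs more retries as the set fills up.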
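The retry cost can be estimated with the partial coupon-collector sum. Assume each draw is uniform over the N = 40,000 problems; this holds here because `num1`, `num2`, and the operation are drawn independently and uniformly. Then the expected number of draws to collect k distinct problems is N/N + N/(N-1) + ... + N/(N-k+1). The sketch below computes that sum; `expected_draws` is a hypothetical helper, not part of the generator. For the 10,000 samples in the example usage, the sum comes to roughly 11,500 draws. Collecting all 40,000 problems would take roughly 447,000.

```python
def expected_draws(k, n=40_000):
    """Expected number of uniform draws (with replacement) from n items
    until k distinct items have been seen: the partial coupon-collector sum."""
    if not 0 <= k <= n:
        raise ValueError("k must be between 0 and n")
    # With i distinct items already seen, a draw is new with probability (n - i) / n,
    # so the wait for the next new item is geometric with mean n / (n - i).
    return sum(n / (n - i) for i in range(k))


for k in (10_000, 30_000, 40_000):
    print(k, round(expected_draws(k)))
```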

Extending the range of numbers, adding more operations (such as exponentiation), or introducing several question formats would increase the number of distinct problems. Widening the operand range grows the count quadratically, while each added operation or question format grows it linearly.
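The growth described above can be sketched as a simple product. This assumes that every ordered operand pair stays valid for every operation and that each question format gives a different wording. `count_problems` and its `num_templates` parameter are hypothetical and only illustrate the arithmetic.

```python
def count_problems(low, high, num_operations, num_templates=1):
    """Distinct problems when operands come from low..high inclusive, every
    ordered pair is allowed for every operation, and each problem can be
    phrased with num_templates different question formats."""
    operands = high - low + 1
    return operands * operands * num_operations * num_templates


print(count_problems(1, 100, 4))     # → 40000   (current generator)
print(count_problems(1, 100, 5))     # → 50000   (e.g. adding exponentiation)
print(count_problems(1, 1000, 4))    # → 4000000 (10x wider range, 100x more problems)
print(count_problems(1, 100, 4, 3))  # → 120000  (three hypothetical question formats)
```

In practice some additions may need restrictions (for example, capping exponents to keep results readable), which would lower these upper bounds.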

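```python
import random

import pandas as pd
from num2words import num2words


class SimpleMathProblemsGenerator:
    """Generate unique word-form arithmetic problems and save them to CSV.

    Despite the default filename, problems use all four operations.
    """

    LOW, HIGH = 1, 100  # operand range; random.randint includes both endpoints

    def __init__(self, num_samples=1000, output_file="simple_math_problems_addition_only.csv"):
        self.num_samples = num_samples
        self.output_file = output_file
        self.operations = [
            ("plus", lambda x, y: x + y),
            ("minus", lambda x, y: x - y),
            ("times", lambda x, y: x * y),
            # Quotient rounded to 2 decimals; the None branch is only a safety net,
            # because operands are never 0 while LOW >= 1
            ("divided by", lambda x, y: round(x / y, 2) if y != 0 else None),
        ]
        # Each (num1, operation, num2) triple gives one distinct question,
        # so this is the largest number of unique problems available
        span = self.HIGH - self.LOW + 1
        self.max_unique = span * span * len(self.operations)
        if num_samples > self.max_unique:
            raise ValueError(
                f"num_samples={num_samples} exceeds the {self.max_unique} unique problems available"
            )
        self.existing_questions_and_answers = set()  # (question, answer) pairs already emitted

    def generate_problem(self):
        # Without this check the retry loop below would never end once every problem is used
        if len(self.existing_questions_and_answers) >= self.max_unique:
            raise RuntimeError("all unique problems have already been generated")

        # Rejection sampling: draw until an unseen problem comes up
        while True:
            num1 = random.randint(self.LOW, self.HIGH)
            num2 = random.randint(self.LOW, self.HIGH)  # never 0, so no division-by-zero redraw is needed
            operation, func = random.choice(self.operations)

            result = func(num1, num2)
            # num2words hyphenates ("twenty-one") and may insert commas in larger numbers
            result_word = (
                num2words(result).replace("-", " ").replace(",", "")
                if result is not None else "undefined"
            )
            question = f"{num2words(num1)} {operation} {num2words(num2)}".replace("-", " ")
            answer = result_word

            # Ensure uniqueness
            if (question, answer) not in self.existing_questions_and_answers:
                self.existing_questions_and_answers.add((question, answer))
                return question, answer

    def generate_dataset(self):
        data = [self.generate_problem() for _ in range(self.num_samples)]
        return pd.DataFrame(data, columns=["Problem", "Solution"])

    def save_dataset(self):
        df = self.generate_dataset()
        df.to_csv(self.output_file, index=False)
        print(f"Dataset saved to {self.output_file}")


# Example usage
if __name__ == "__main__":
    generator = SimpleMathProblemsGenerator(num_samples=10000)
    generator.save_dataset()
```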

Dataset saved to simple_math_problems_addition_only.csv
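The whole problem space is only 40,000 triples, so one alternative to the retry loop is to sample indices without replacement using `random.sample`. That approach never repeats a problem and never retries. The minimal sketch below assumes the same 1 to 100 range and four operations. `sample_problem_triples` is a hypothetical helper that returns numeric triples; these would still need to be converted with `num2words` as in `generate_problem`.

```python
import random

OPERATIONS = ["plus", "minus", "times", "divided by"]
LOW, HIGH = 1, 100  # inclusive operand range, matching random.randint(1, 100)


def sample_problem_triples(k, seed=None):
    """Draw k distinct (num1, operation, num2) triples without rejection sampling."""
    span = HIGH - LOW + 1
    total = span * span * len(OPERATIONS)
    if k > total:
        raise ValueError(f"only {total} distinct problems exist, requested {k}")
    rng = random.Random(seed)  # a seeded generator makes the dataset reproducible
    triples = []
    # random.sample picks k distinct indices from range(total) ...
    for index in rng.sample(range(total), k):
        # ... and each index decodes to exactly one triple (mixed-radix decoding)
        index, op_index = divmod(index, len(OPERATIONS))
        num1_offset, num2_offset = divmod(index, span)
        triples.append((LOW + num1_offset, OPERATIONS[op_index], LOW + num2_offset))
    return triples


sample = sample_problem_triples(10_000, seed=0)
print(len(sample), len(set(sample)))  # → 10000 10000
```

The index-to-triple mapping is a bijection, so distinct indices always give distinct problems. This makes the cost independent of how close k is to 40,000.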
