<a href="https://colab.research.google.com/github/patrickfleith/datapipes/blob/main/Simple_Question_Generation_Distilabel_OpenAI.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Easy Question Generation with distilabel and OpenAI
In this notebook, I show you how to use an OpenAI model through distilabel to generate a batch of questions.


## Install distilabel and OpenAI setup
- Install distilabel in a Google Colab notebook with the command below.
- Configure your OpenAI API key. If you are unsure how, see [this notebook](https://github.com/patrickfleith/datapipes/blob/main/How_to_use_an_OpenAI_Chat_model.ipynb).

In [1]:
!pip install distilabel --upgrade --quiet


In [11]:
import openai, distilabel
from distilabel.llms import OpenAILLM
from google.colab import userdata
from itertools import product

In [12]:
OPENAI_API_KEY = userdata.get('OPENAI_API_KEY')
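The cell above only works inside Google Colab. If you also run this notebook elsewhere, a fallback to an environment variable can help. The snippet below is a minimal sketch under that assumption; the helper name `get_openai_api_key` is hypothetical and not part of distilabel or Colab.

```python
import os


def get_openai_api_key(name: str = "OPENAI_API_KEY") -> str:
    """Return the key from Colab secrets, or from the environment as a fallback."""
    try:
        # google.colab is only importable inside a Colab runtime
        from google.colab import userdata

        key = userdata.get(name)
    except Exception:
        # Not in Colab, or the secret is missing or not accessible
        key = os.environ.get(name)
    if not key:
        raise RuntimeError(f"{name} is not set in Colab secrets or the environment")
    return key
```

You would then write `OPENAI_API_KEY = get_openai_api_key()` in place of the direct `userdata.get` call.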

In [13]:
# print the versions used here so you can compare them with yours
print(f"Open AI version: {openai.__version__}")
print(f"Distilabel version: {distilabel.__version__}")

Open AI version: 1.54.3
Distilabel version: 1.4.1


## Configuration
For this demo notebook we want to generate **questions of several difficulty levels for given topics**

So we define:
- difficulty levels: no more than 4 is recommended.
- topics: you can be more specific and use a much longer list than the one below.

💡 I created my list using an LLM. You see how it starts to be interesting?

In [19]:
# config
levels = ["easy", "typical", "difficult", "hardcore"]

topics = [
    # Classical Mechanics
    "Newton's Laws of Motion",
    "Kinematics and Dynamics",
    "Work, Energy, and Power",
    "Momentum and Collisions",
    "Rotational Motion and Angular Momentum",
    "Gravitation and Orbital Mechanics",
    "Fluid Mechanics",
    "Harmonic Motion and Oscillations",
    "Lagrangian and Hamiltonian Mechanics",

    # Thermodynamics and Statistical Mechanics
    "Temperature and Heat",
    "Laws of Thermodynamics",
    "Entropy and the Second Law",
    "Thermodynamic Processes",
    "Heat Engines and Refrigerators",
    "Statistical Mechanics and Probability",
    "Kinetic Theory of Gases",
    "Phase Transitions and Critical Phenomena",

    # Electromagnetism
    "Electrostatics (Coulomb’s Law, Electric Fields, Potential)",
    "Electric Circuits (Ohm’s Law, Kirchhoff’s Laws, AC/DC Circuits)",
    "Magnetostatics (Magnetic Fields, Biot-Savart Law, Ampère's Law)",
    "Electromagnetic Induction (Faraday's Law, Lenz’s Law)",
    "Maxwell's Equations",
    "Electromagnetic Waves and Light",
    "Optics (Geometric and Wave Optics)",

    # Quantum Mechanics
    "Wave-Particle Duality",
    "Schrödinger Equation",
    "Quantum States and Operators",
    "Quantum Harmonic Oscillator",
    "Angular Momentum and Spin",
    "Quantum Tunneling",
    "Heisenberg Uncertainty Principle",
    "Quantum Entanglement",
    "Atomic and Molecular Physics",

    # Relativity
    "Special Relativity (Time Dilation, Length Contraction, E=mc²)",
    "General Relativity (Gravitational Waves, Black Holes, Space-Time Curvature)",
    "Lorentz Transformations",
    "Relativistic Dynamics",

    # Nuclear and Particle Physics
    "Radioactivity and Nuclear Decay",
    "Nuclear Reactions and Fusion",
    "Particle Accelerators and Detectors",
    "Fundamental Particles (Quarks, Leptons, Bosons)",
    "The Standard Model of Particle Physics",
    "Quantum Chromodynamics (QCD)",
    "Symmetry and Conservation Laws",

    # Condensed Matter Physics
    "Crystallography and Lattice Structures",
    "Band Theory of Solids",
    "Semiconductors and Electronics",
    "Superconductivity",
    "Magnetism and Magnetic Materials",
    "Phase Transitions (Ferromagnetic, Antiferromagnetic, etc.)",
    "Nanomaterials and Nanotechnology",

    # Astrophysics and Cosmology
    "The Structure of Stars and Stellar Evolution",
    "Galaxies and Dark Matter",
    "Big Bang Theory and Cosmic Microwave Background",
    "Expansion of the Universe and Dark Energy",
    "Black Holes and Neutron Stars",
    "Exoplanets and Habitability",
    "Gravitational Lensing",

    # Plasma Physics
    "Basics of Plasma States",
    "Plasma Confinement and Fusion Energy",
    "Magnetohydrodynamics (MHD)",
    "Space and Astrophysical Plasmas",
    "Plasma Applications (Lasers, Propulsion)",

    # Mathematical Physics
    "Vector Calculus and Differential Equations",
    "Tensor Analysis and General Relativity",
    "Group Theory and Symmetry",
    "Complex Analysis and Fourier Transforms",
    "Computational Physics and Numerical Methods",

    # Modern and Applied Physics
    "Solid-State Physics",
    "Quantum Computing and Information Theory",
    "Materials Science",
    "Biophysics and Medical Physics",
    "Optoelectronics and Photonics",
    "Renewable Energy Technologies"
]

In [20]:
print(f"A total of {len(levels)*len(topics)} questions will be generated")

A total of 296 questions will be generated
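The count is simply the number of (level, topic) combinations, since `itertools.product` yields one pair per combination. The sketch below uses small stand-in lists (`demo_levels` and `demo_topics` are illustrative names, not the notebook's config) to show the pairing and its order.

```python
from itertools import product

# Small stand-in lists, only for illustration
demo_levels = ["easy", "difficult"]
demo_topics = ["Fluid Mechanics", "Superconductivity", "Quantum Tunneling"]

# One (level, topic) pair per combination; the last iterable varies fastest
pairs = list(product(demo_levels, demo_topics))

# The number of pairs equals the product of the list lengths
assert len(pairs) == len(demo_levels) * len(demo_topics)

for level, topic in pairs:
    print(f"{level:>9} | {topic}")
```

All topics are listed for the first level before the next level starts. Keeping a list like `pairs` around makes it possible to match each generated question back to its level and topic by position.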


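## TCMMI framework
- Templates: write a prompt template with placeholders (here `{level}` and `{topic}`)
- Constructor: fill the template for every combination of level and topic
- Messages: wrap each prompt in a chat message list
- Model: create and load the LLM
- Inference: run generation on all inputs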

In [21]:
# prompt template
prompt_template = "Ask me a {level} question about {topic}. Only respond with the question, nothing else."

# prompt constructor
prompts = [prompt_template.format(level=level, topic=topic) for level, topic in product(levels, topics)]

# messages assembly
inputs = [[{"role": "user", "content": prompt}] for prompt in prompts]

# llm
llm = OpenAILLM(model="gpt-4o-mini", api_key=OPENAI_API_KEY)
llm.load()

# inference
outputs = llm.generate_outputs(inputs=inputs, temperature=0.9)
outputs

[["What is Newton's first law of motion commonly known as?"],
 ['What is the formula for calculating the average velocity of an object?'],
 ['What is the formula for calculating work done when a force is applied over a distance?'],
 ['What is the law of conservation of momentum?'],
 ['What is the formula for calculating the angular momentum of an object rotating about an axis?'],
 ['What force keeps planets in orbit around the Sun?'],
 ['What is the main difference between a liquid and a gas in terms of their ability to flow?'],
 ['What is the term used to describe the time it takes for one complete cycle of oscillation in harmonic motion?'],
 ['What is the principle of least action in the context of Lagrangian mechanics?'],
 ['What is the difference between temperature and heat?'],
 ['What is the first law of thermodynamics commonly known as?'],
 ['What does the Second Law of Thermodynamics state about the direction of natural processes in terms of entropy?'],
 ['What is the process c