# Easy Question Generation with distilabel and OpenAI
In this notebook I show you how to use OpenAI in distilabel to generate a bunch of questions.

*Why are we using distilabel? Couldn't we just do it with `openai`? Yes, we could; here I would just like to introduce distilabel.*

## Install distilabel and OpenAI setup
- Below is how to install distilabel within a Google Colab notebook
- You need to configure the OpenAI key. If you are unsure how, see [this notebook](https://github.com/patrickfleith/datapipes/blob/main/How_to_use_an_OpenAI_Chat_model.ipynb)
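
If you run this notebook outside Colab, `google.colab.userdata` cannot be imported. A minimal sketch of a fallback, assuming the key is also available as an environment variable named `OPENAI_API_KEY` (the helper `get_openai_api_key` is hypothetical and not part of distilabel or `openai`):

```python
import os


def get_openai_api_key(name: str = "OPENAI_API_KEY") -> str:
    """Read the key from Colab secrets when available, else from the environment."""
    try:
        # Only importable inside Google Colab.
        from google.colab import userdata
    except ImportError:
        userdata = None

    if userdata is not None:
        # Assumes a Colab secret called `name` exists and notebook access is granted.
        key = userdata.get(name)
    else:
        # Assumption: outside Colab the key is exported as an environment variable.
        key = os.environ.get(name)

    if not key:
        raise RuntimeError(f"No value found for {name}")
    return key
```

With this helper, the key assignment later in the notebook could be written as `OPENAI_API_KEY = get_openai_api_key()`.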

In [1]:
!pip install distilabel --upgrade --quiet

In [2]:
import openai, distilabel
from distilabel.llms import OpenAILLM
from google.colab import userdata
from itertools import product

In [3]:
OPENAI_API_KEY = userdata.get('OPENAI_API_KEY')

In [4]:
# print my versions so that you can check yours
print(f"Open AI version: {openai.__version__}")
print(f"Distilabel version: {distilabel.__version__}")

Open AI version: 1.57.4
Distilabel version: 1.4.2


## Configuration
For this demo notebook we want to generate **questions of several difficulty levels for given topics**.

So you can define:
- difficulty levels: no more than 4 is recommended.
- topics: you could be more specific and have a much longer list than the one below

💡 I created my list using an LLM. See how this starts to get interesting?

In [5]:
# config
levels = ["easy", "typical", "difficult", "hardcore"]

topics = [
    # Classical Mechanics
    "Newton's Laws of Motion",
    "Kinematics and Dynamics",
    "Work, Energy, and Power",
    "Momentum and Collisions",
    "Rotational Motion and Angular Momentum",
    "Gravitation and Orbital Mechanics",
    "Fluid Mechanics",
    "Harmonic Motion and Oscillations",
    "Lagrangian and Hamiltonian Mechanics",

    # Thermodynamics and Statistical Mechanics
    "Temperature and Heat",
    "Laws of Thermodynamics",
    "Entropy and the Second Law",
    "Thermodynamic Processes",
    "Heat Engines and Refrigerators",
    "Statistical Mechanics and Probability",
    "Kinetic Theory of Gases",
    "Phase Transitions and Critical Phenomena",

    # Electromagnetism
    "Electrostatics (Coulomb’s Law, Electric Fields, Potential)",
    "Electric Circuits (Ohm’s Law, Kirchhoff’s Laws, AC/DC Circuits)",
    "Magnetostatics (Magnetic Fields, Biot-Savart Law, Ampère's Law)",
    "Electromagnetic Induction (Faraday's Law, Lenz’s Law)",
    "Maxwell's Equations",
    "Electromagnetic Waves and Light",
    "Optics (Geometric and Wave Optics)",

    # Quantum Mechanics
    "Wave-Particle Duality",
    "Schrödinger Equation",
    "Quantum States and Operators",
    "Quantum Harmonic Oscillator",
    "Angular Momentum and Spin",
    "Quantum Tunneling",
    "Heisenberg Uncertainty Principle",
    "Quantum Entanglement",
    "Atomic and Molecular Physics",

    # Relativity
    "Special Relativity (Time Dilation, Length Contraction, E=mc²)",
    "General Relativity (Gravitational Waves, Black Holes, Space-Time Curvature)",
    "Lorentz Transformations",
    "Relativistic Dynamics",

    # Nuclear and Particle Physics
    "Radioactivity and Nuclear Decay",
    "Nuclear Reactions and Fusion",
    "Particle Accelerators and Detectors",
    "Fundamental Particles (Quarks, Leptons, Bosons)",
    "The Standard Model of Particle Physics",
    "Quantum Chromodynamics (QCD)",
    "Symmetry and Conservation Laws",

    # Condensed Matter Physics
    "Crystallography and Lattice Structures",
    "Band Theory of Solids",
    "Semiconductors and Electronics",
    "Superconductivity",
    "Magnetism and Magnetic Materials",
    "Phase Transitions (Ferromagnetic, Antiferromagnetic, etc.)",
    "Nanomaterials and Nanotechnology",

    # Astrophysics and Cosmology
    "The Structure of Stars and Stellar Evolution",
    "Galaxies and Dark Matter",
    "Big Bang Theory and Cosmic Microwave Background",
    "Expansion of the Universe and Dark Energy",
    "Black Holes and Neutron Stars",
    "Exoplanets and Habitability",
    "Gravitational Lensing",

    # Plasma Physics
    "Basics of Plasma States",
    "Plasma Confinement and Fusion Energy",
    "Magnetohydrodynamics (MHD)",
    "Space and Astrophysical Plasmas",
    "Plasma Applications (Lasers, Propulsion)",

    # Mathematical Physics
    "Vector Calculus and Differential Equations",
    "Tensor Analysis and General Relativity",
    "Group Theory and Symmetry",
    "Complex Analysis and Fourier Transforms",
    "Computational Physics and Numerical Methods",

    # Modern and Applied Physics
    "Solid-State Physics",
    "Quantum Computing and Information Theory",
    "Materials Science",
    "Biophysics and Medical Physics",
    "Optoelectronics and Photonics",
    "Renewable Energy Technologies"
]

In [6]:
print(f"A total of {len(levels)*len(topics)} questions will be generated")

A total of 296 questions will be generated
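
The count is `len(levels) * len(topics)` because each level is paired with each topic. Here is a small sketch of how `itertools.product` builds and orders those pairs, using shortened stand-in lists (`demo_levels` and `demo_topics` are illustrative names, not part of the notebook):

```python
from itertools import product

# Small stand-ins for the full `levels` and `topics` lists.
demo_levels = ["easy", "difficult"]
demo_topics = ["Fluid Mechanics", "Superconductivity", "Quantum Tunneling"]

# One tuple per (level, topic) combination.
pairs = list(product(demo_levels, demo_topics))
assert len(pairs) == len(demo_levels) * len(demo_topics)

# The last iterable varies fastest: every topic for the first level
# is listed before the second level starts.
for level, topic in pairs:
    print(f"{level:<10} {topic}")
```

Since the prompts are built in this same order, the i-th prompt belongs to the i-th pair. That should make it possible to match each generated question back to its level and topic, assuming the outputs come back in input order.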


## The TICMI framework

This is how I like to think about LLM usage *in general* (the details depend on the framework, the LLM, whether it is open-source or closed-source, etc.), but the ingredients are often the same:

- **Templates**: these are reusable strings with placeholders, used to build prompts.
    - For example: `"Do the following task: {task}. Answer in {language}"`

- **Instructions**: these are the values I inject into the templates.
    - For example: `task = "solve the equation for x"` or `language="French"`.

- **Constructor**: takes a prompt template and the instructions, and builds a prompt.
    - For example: our prompt would become `Do the following task: solve the equation for x. Answer in French`

- **Messages**: Models often expect a list of messages with alternating roles (user/assistant or similar). Here I use the constructed prompt to build the messages.
    - For example:
```
[
    {'role': 'system', 'content': '...'},
    {'role': 'user', 'content': '...'}
]
```

- **Model**: Here you load the model: you choose which model to use and supply any configuration it needs (such as an API key) or other model-specific parameters.

- **Inference**: Here you pass the `messages` to the `model`, possibly along with parameters specific to that inference call (temperature, maximum number of tokens, etc.)
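
The first four ingredients can be sketched in plain Python without calling any model. This reuses the example template and instructions from the list above; the system message text is a placeholder I made up for illustration:

```python
# Template: a string with named placeholders.
template = "Do the following task: {task}. Answer in {language}"

# Instructions: the values injected into the template.
instructions = {"task": "solve the equation for x", "language": "French"}

# Constructor: combine the template and the instructions into a prompt.
prompt = template.format(**instructions)

# Messages: wrap the prompt in the role/content structure chat models expect.
# The system message content here is only a placeholder.
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": prompt},
]

print(prompt)
```

The model and inference steps are left out here because they need an API key.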

In [7]:
# prompt template
prompt_template = "Ask me a {level} question about {topic}. Only respond with the question, nothing else."

# prompt constructor
prompts = [prompt_template.format(level=level, topic=topic) for level, topic in product(levels, topics)]

# messages
inputs = [[{"role": "user", "content": prompt}] for prompt in prompts]

# model
llm = OpenAILLM(model="gpt-4o-mini", api_key=OPENAI_API_KEY)
llm.load()

# inference
outputs = llm.generate_outputs(inputs=inputs, temperature=0.9)
outputs

[['What is the first law of motion that states an object at rest stays at rest unless acted upon by an external force?'],
 ['What is the formula for calculating the distance traveled by an object under uniform acceleration?'],
 ['What is the formula for calculating kinetic energy?'],
 ['What is the law of conservation of momentum?'],
 ['What is the formula for calculating angular momentum (L) of an object rotating around an axis?'],
 ['What is the force that pulls objects toward each other due to their mass?'],
 ['What is the definition of viscosity in fluid mechanics?'],
 ['What is the definition of simple harmonic motion?'],
 ['What is the principle of least action in the context of Lagrangian mechanics?'],
 ['What is the freezing point of water in degrees Celsius?'],
 ['What is the first law of thermodynamics commonly known as?'],
 ['What does the Second Law of Thermodynamics state about the direction of spontaneous processes in isolated systems?'],
 ['What is the term for a process