# Auto-Tagging using a Controlled Vocabulary - IPTC Media Topics

This notebook demonstrates 'auto-tagging' using the IPTC Media Topics controlled vocabulary. 

A Google Gemini generative AI model classifies the text. Its output is constrained by a structured JSON response schema whose allowed values are the terms defined in the controlled vocabulary.

## Import Python packages

In [None]:
import json
import os
import requests
from IPython.display import display, Markdown
from collections import deque
from dotenv import load_dotenv
from google import genai

load_dotenv()

GOOGLE_AI_API_KEY = os.getenv("GOOGLE_AI_API_KEY")

## Read the IPTC Media Topics Controlled Vocabulary

Media Topics is a constantly updated taxonomy of over 1,200 terms with a focus on categorising text.

Originally based on the IPTC Subject Codes taxonomy, the Media Topics taxonomy was first released in 2010 and is updated at least once a year.

https://iptc.org/standards/media-topics/


In [58]:
MEDIATOPICS_URL = "https://cv.iptc.org/newscodes/mediatopic?lang=en-US&format=json"

MEDIATOPICS_PATH = "./schema/mediatopic_cptall-en-US.json"

# Function to download the Media Topics JSON file
def download_mediatopics_json():
    try:
        # request media topics (the timeout stops the call from hanging indefinitely)
        response = requests.get(MEDIATOPICS_URL, timeout=30)
        # check if the request was successful
        response.raise_for_status()
        # parse the JSON content into a dictionary
        data = response.json()
        # create the schema directory if it doesn't exist
        os.makedirs("./schema", exist_ok=True)
        # write the data to the JSON file
        with open(MEDIATOPICS_PATH, 'w') as f:
            json.dump(data, f, indent=4)

    except requests.exceptions.RequestException as e:
        print(f"Error during request: {e}")
    except json.JSONDecodeError as e:
        print(f"Error decoding JSON: {e}")
    except IOError as e:
        print(f"Error writing to file: {e}")

# Download the Media Topics JSON file if it doesn't exist
if not os.path.exists(MEDIATOPICS_PATH):
    print("Downloading Media Topics Controlled Vocabulary from IPTC web")
    download_mediatopics_json()
# Load the Media Topics JSON file
with open(MEDIATOPICS_PATH, "r") as file:
    media_topics = json.load(file)

In [None]:
# Display the Media Topics keys
media_topics.keys()

dict_keys(['@context', 'uri', 'type', 'prefSchemeAlias', 'authority', 'copyrightHolder', 'licenceLink', 'dateReleased', 'prefLabel', 'definition', 'note', 'hasTopConcept', 'conceptSet'])

In [None]:
# Display the Media Topics schema excluding 'hasTopConcept' and 'conceptSet'
for key, val in media_topics.items():
    if key in ('hasTopConcept', 'conceptSet'):
        continue
    print(f"{key}: {val}")

@context: https://www.iptc.org/std/IKOS/IKOS.jsonld
uri: http://cv.iptc.org/newscodes/mediatopic/
type: http://www.w3.org/2004/02/skos/core#ConceptScheme
prefSchemeAlias: medtop
authority: http://www.iptc.org
copyrightHolder: IPTC, International Press Telecommunications Council - https://iptc.org
licenceLink: http://creativecommons.org/licenses/by/4.0/
dateReleased: 2025-10-09T12:00:00+00:00
prefLabel: {}
definition: {}
note: {}


In [113]:
# What is the number of top concepts and total concepts in the Media Topics vocabulary?
print(f"There are {len(media_topics['hasTopConcept'])} top concepts in the Media Topics vocabulary.")
print(f"There are a total of {len(media_topics['conceptSet'])} concepts in the Media Topics vocabulary.")

There are 17 top concepts in the Media Topics vocabulary.
There are a total of 1392 concepts in the Media Topics vocabulary.


In [62]:
# Let's load the concepts into a dictionary for easier access later
concepts_dict = {concept['qcode']: concept for concept in media_topics['conceptSet']}

## Top Concepts

Let's take a look at the top-level concepts (or broad headings) and read their definitions.

In [None]:
# Let's look at the top concepts and count their narrower concepts
top_concept_dict = {}  # dictionary to hold top concepts and their narrower concepts
for concept_uri in media_topics.get("hasTopConcept", []):
    # extract the concept qcode from the URI
    qcode = f"medtop:{concept_uri.split('/')[-1]}"
    concept = concepts_dict.get(qcode)
    # add the broad concept to the dictionary
    concept_label = concept.get('prefLabel', {}).get('en-US', 'None')
    top_concept_dict[concept_label] = []

    # Now let's traverse the narrower concepts using a queue (BFS) and count them and collect their labels
    n = len(concept.get('narrower', []))
    queue = deque(concept.get('narrower', []))
    while queue:
        current_concept_qcode = queue.popleft()
        current_concept = concepts_dict.get(current_concept_qcode)
        current_concept_label = current_concept.get('prefLabel', {}).get('en-US')
        if current_concept_label:
            top_concept_dict[concept_label].append(current_concept_label)
        for narrower_concept in current_concept.get('narrower', []):
            queue.append(narrower_concept)
            n += 1

    # display the broad concept label and definition
    display(Markdown(f"### {concept_label}"))
    display(Markdown(f"**Definition:** {concept.get('definition', {}).get('en-US', 'None')}"))
    # display the total number of narrower concepts
    display(Markdown(f"**Total narrower concepts:** {n}"))

### arts, culture, entertainment and media

**Definition:** All forms of arts, entertainment, cultural heritage and media

**Total narrower concepts:** 73

### crime, law and justice

**Definition:** The establishment and/or statement of the rules of behavior in society, the enforcement of these rules, breaches of the rules, the punishment of offenders and the organizations and bodies involved in these activities

**Total narrower concepts:** 70

### disaster, accident and emergency incident

**Definition:** Man made or natural event resulting in loss of life or injury to living creatures and/or damage to inanimate objects or property

**Total narrower concepts:** 35

### economy, business and finance

**Definition:** All matters concerning the planning, production and exchange of wealth.

**Total narrower concepts:** 278

### education

**Definition:** All aspects of furthering knowledge, formally or informally

**Total narrower concepts:** 26

### environment

**Definition:** The protection, damage, and condition of the ecosystem of the planet Earth and its surroundings

**Total narrower concepts:** 32

### health

**Definition:** All aspects of physical and mental well-being

**Total narrower concepts:** 67

### human interest

**Definition:** Item that discusses individuals, groups, animals, plants or other objects in an emotional way

**Total narrower concepts:** 15

### labor

**Definition:** Social aspects, organizations, rules and conditions affecting the employment of human effort for the generation of wealth or provision of services and the economic support of the unemployed.

**Total narrower concepts:** 34

### lifestyle and leisure

**Definition:** Activities undertaken for pleasure, relaxation or recreation outside paid employment, including eating and travel.

**Total narrower concepts:** 58

### politics and government

**Definition:** Local, regional, national and international exercise of power, the day-to-day running of government, and the relationships between governing bodies and states.

**Total narrower concepts:** 107

### religion

**Definition:** Belief systems, institutions and people who provide moral guidance to followers

**Total narrower concepts:** 64

### science and technology

**Definition:** All aspects pertaining to human understanding of, as well as methodical study and research of natural, formal and social sciences, such as astronomy, linguistics or economics

**Total narrower concepts:** 55

### society

**Definition:** The concerns, issues, affairs and institutions relevant to human social interactions, problems and welfare, such as poverty, human rights and family planning

**Total narrower concepts:** 73

### sport

**Definition:** Competitive activity or skill that involves physical and/or mental effort and organizations and bodies involved in these activities

**Total narrower concepts:** 354

### conflict, war and peace

**Definition:** Acts of socially or politically motivated protest or violence, military activities, geopolitical conflicts, as well as resolution efforts

**Total narrower concepts:** 30

### weather

**Definition:** The study, prediction and reporting of meteorological phenomena

**Total narrower concepts:** 4
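
The traversal used above can be packaged as a small reusable helper. The following is a minimal sketch on a hypothetical toy hierarchy: the `toy:` qcodes and their labels are made up for illustration and are not real Media Topics entries. It also keeps a `seen` set, an addition that is not in the notebook's loop, so that a concept reachable through two parents is collected only once.

```python
from collections import deque

# Hypothetical toy concepts shaped like entries of media_topics['conceptSet'].
toy_concepts = {
    "toy:1": {"prefLabel": {"en-US": "sport"}, "narrower": ["toy:2", "toy:3"]},
    "toy:2": {"prefLabel": {"en-US": "ball sport"}, "narrower": ["toy:4"]},
    "toy:3": {"prefLabel": {"en-US": "water sport"}, "narrower": ["toy:4"]},
    "toy:4": {"prefLabel": {"en-US": "water polo"}},
}

def collect_descendants(root_qcode, concepts):
    """Return the en-US labels of every concept below root_qcode, breadth-first."""
    labels = []
    seen = set()  # qcodes already visited, to avoid counting a concept twice
    queue = deque(concepts[root_qcode].get("narrower", []))
    while queue:
        qcode = queue.popleft()
        if qcode in seen or qcode not in concepts:
            continue
        seen.add(qcode)
        concept = concepts[qcode]
        label = concept.get("prefLabel", {}).get("en-US")
        if label:
            labels.append(label)
        # enqueue the children so they are visited after the current level
        queue.extend(concept.get("narrower", []))
    return labels

descendants = collect_descendants("toy:1", toy_concepts)
print(descendants)
```

In the toy data, `toy:4` sits under both `toy:2` and `toy:3`, which is the case the `seen` set guards against.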

## Gemini Classifier

Let's use Google Gemini to classify text using the broad terms from our controlled vocabulary.

We'll define a JSON response schema that represents our controlled vocabulary. Each enum value is a concept in the vocabulary.

The Google Gemini API appears to limit the number of enum terms a schema can contain, maxing out at around 100.

Let's first test out a response schema that includes only the 17 top-level concepts of the Media Topics vocabulary.

In [88]:
# Define the response schema for classification
def load_json_response_schema(concepts: list[str]) -> dict:
    """Load a JSON schema for the given concepts."""

    response_schema = {
        '$defs': {
            'Tags': {
                'enum': concepts, 
                'title': 'Tags', 
                'type': 'string'
                }
            }, 
        'properties': {
            'keywords': {
                'items': {
                    '$ref': '#/$defs/Tags'
                    }, 
                'title': 'Keywords', 
                'type': 'array'
                }
            }, 
        'required': ['keywords'], 
        'title': 'Metadata', 
        'type': 'object'
        }
    
    return response_schema

broad_response_schema = load_json_response_schema(list(top_concept_dict.keys()))
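
Because the enum is what constrains the model, it can be useful to double-check a parsed response against the same schema locally. The following is a minimal sketch, assuming the schema layout produced by `load_json_response_schema` above. `find_invalid_keywords` and the sample values are hypothetical and used only for illustration.

```python
def find_invalid_keywords(parsed: dict, response_schema: dict) -> list[str]:
    """Return the keywords in `parsed` that are not enum values of the schema."""
    allowed = set(response_schema["$defs"]["Tags"]["enum"])
    return [kw for kw in parsed.get("keywords", []) if kw not in allowed]

# Hypothetical schema with the same layout as the one built in the notebook.
example_schema = {
    "$defs": {
        "Tags": {"enum": ["education", "health", "weather"], "title": "Tags", "type": "string"}
    },
    "properties": {
        "keywords": {"items": {"$ref": "#/$defs/Tags"}, "title": "Keywords", "type": "array"}
    },
    "required": ["keywords"],
    "title": "Metadata",
    "type": "object",
}

# Hypothetical parsed response containing one term outside the vocabulary.
example_parsed = {"keywords": ["health", "wellness"]}
invalid = find_invalid_keywords(example_parsed, example_schema)
print(invalid)
```

An empty list means every returned keyword is a term from the vocabulary.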

In [89]:
# Initialize the GenAI client
client = genai.Client(api_key=GOOGLE_AI_API_KEY)

In [90]:
# Function to classify media topics using GenAI
def classify_media_topics(content: str, response_schema: dict) -> genai.types.GenerateContentResponse:
    response = client.models.generate_content(
        model="gemini-2.5-flash",
        contents=[content],
        config=genai.types.GenerateContentConfig(
            system_instruction="Extract relevant media topics from the text based on the IPTC Media Topics Controlled Vocabulary. Respond with a JSON object containing an array of 'keywords' chosen from the concept labels allowed by the response schema. Only include keywords that are directly relevant to the content provided. Do not include any additional text or explanation outside of the JSON object.",
            temperature=1.0,
            response_mime_type="application/json",
            response_schema=response_schema
        )
    )
    return response

### Write some text...

This is the text we will tag.

In [83]:
text_to_classify = """The digital commons are burning. Scroll through any LinkedIn comment section under an AI-generated image, and you will feel the heat.
"You created nothing."
"This is theft."
"You are just a commissioner, not an artist."
The anger is palpable. It is the anger of the craftsman watching the factory rise. It is the defense of "sweat equity." The belief that art must be difficult, that the soul leaks onto the canvas only through physical exhaustion.
But this picket line is being walked sixty years too late.
The definition of art was not broken by a tech bro in Silicon Valley in 2024. It was dismantled, piece by piece, in the lofts of New York City in the 1960s. A group of artists called Fluxus already severed the hand from the tool. They taught us that the art is not in the execution.
The art is in the choice.
The Prompt is just a Score (George Brecht)
Critics argue that typing a text prompt isn't creating. You are just giving orders.
In 1961, George Brecht wrote a piece titled Word Event. The entire artwork consisted of one word: "EXIT."
That was it. The viewer had to enact it. The "score" was a set of instructions; the reality was the rendering. When you type “imagine a red cube” into Midjourney, you are not cheating. You are writing a Fluxus Event Score. You provide the code; the machine provides the rendering. Brecht realized that the artist is not the builder. The artist is the architect of the situation.
The Machine plays itself (Joe Jones)
"But the machine does all the work!" they shout. "You don't even know how to hold a brush."
Joe Jones, the "Music Machine Man," didn't hold a violin bow. He built mechanical orchestras; violins fitted with motors, drums beaten by rubber bands. He opened the Tone Deaf Music Store in 1969, where the public could push a button and make the art happen.
Jones removed the virtuoso. He proved that music wasn't about the dexterity of fingers on a fretboard; it was about the organization of sound. AI removes the illustrator. It asks: is art the movement of the wrist, or the organization of the pixel?
The Idea is the Engine (Sol LeWitt)
The most damning accusation against AI is that it is "lazy." It bypasses the struggle.
Sol LeWitt, the father of Conceptual Art, handed us the defense decades ago: "The idea becomes a machine that makes the art."
LeWitt would write instructions for wall drawings—"Draw a line from the left corner to the center"—and let assistants execute them. He never touched the wall. Was he lazy? No. He understood that authorship lies in the concept, not the carpentry. If LeWitt can claim the wall drawn by an assistant, the modern creator can claim the image drawn by the algorithm. The assistant has simply changed from carbon to silicon.
Silence is just Latent Space (John Cage)
Finally, there is the fear that AI is just rearranging old data. That it is random noise.
John Cage sat at a piano for 4 minutes and 33 seconds and played nothing. He framed the silence. He forced the audience to listen to the ambient noise of the room and called it music.
An AI model is infinite noise. It is a "latent space" of chaos. The artist’s job is no longer to apply paint. The artist's job is to frame the noise. To reach into the chaos and pull out a specific moment of clarity. Selection is creation.
The Verdict
Anthropologist Ellen Dissanayake calls art "making special." It is the act of taking the ordinary and making it significant.
If a user types "cat" and posts it, they have made nothing special. That is a cheap signal. But the creator who wrestles with the prompt, who curates the output, who forces the machine to visualize a new reality—they are walking the path paved by Fluxus.
The anger you feel is real. It is the pain of a paradigm shift. But do not blame the software.
The ghost in the machine isn't a thief. It's just the spirit of 1960s avant-garde, finally accessible to everyone.
The brush is dead. Long live the idea.
"""

### Classify the text!

Send our text to the model for classification using the 17 top-level Media Topics concepts.

In [None]:
response_object = classify_media_topics(content=text_to_classify, response_schema=broad_response_schema)

### Classification Results

Let's see how the classifier tagged our text. The SDK parses the response for us, so `response_object.parsed` holds the resulting list of keywords.

In [84]:
broad_classification_results = response_object.parsed
broad_classification_results

{'keywords': ['arts, culture, entertainment and media',
  'human interest',
  'science and technology',
  'society']}

In [111]:
print(f"Our text was classified into {len(broad_classification_results['keywords'])} broad media topics.")

Our text was classified into 4 broad media topics.
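
The keywords returned are the `en-US` labels used as enum values, not qcodes. If qcodes are needed downstream, the labels can be mapped back. The following is a minimal sketch with hypothetical toy data: the `toy:` qcodes are made up, and it assumes labels are unique among the concepts being mapped.

```python
# Hypothetical stand-in for concepts_dict (qcode -> concept).
toy_concepts_dict = {
    "toy:01": {"qcode": "toy:01", "prefLabel": {"en-US": "education"}},
    "toy:02": {"qcode": "toy:02", "prefLabel": {"en-US": "health"}},
    "toy:03": {"qcode": "toy:03", "prefLabel": {}},  # no en-US label
}

# Invert the dictionary: en-US label -> qcode, skipping concepts without a label.
label_to_qcode = {
    concept["prefLabel"]["en-US"]: qcode
    for qcode, concept in toy_concepts_dict.items()
    if "en-US" in concept.get("prefLabel", {})
}

# Hypothetical classification result in the same shape as the parsed response.
toy_results = {"keywords": ["health", "education"]}
qcodes = [label_to_qcode[label] for label in toy_results["keywords"]]
print(qcodes)
```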


### Classify Narrow Concepts

Now that we have our content classified into broad concepts, let's run our text through again to label it with narrow concepts. 

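For each broad concept the text was tagged with, we'll send the text to the model again, this time with a response schema containing that concept's narrower concepts.

In the worst case, we will make 18 model inferences: one to tag broad concepts and up to 17 more, one per broad concept, to tag narrow concepts.

As long as a broad concept has fewer than ~100 narrower concepts, we shouldn't run into any issues. Note that three branches counted earlier exceed that: economy, business and finance (278), politics and government (107) and sport (354). Text tagged with one of those may hit the schema limit.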
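
One way to stay under the apparent limit for larger branches is to split the list of narrower concepts into batches and classify once per batch. The following is a minimal sketch, assuming a batch size of 100 (taken from the approximate limit mentioned earlier, not from any documented figure). `chunk_concepts` is a hypothetical helper and the labels are placeholders.

```python
def chunk_concepts(concepts: list[str], batch_size: int = 100) -> list[list[str]]:
    """Split a list of concept labels into consecutive batches of at most batch_size."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [concepts[i:i + batch_size] for i in range(0, len(concepts), batch_size)]

# Placeholder labels standing in for a branch with 278 narrower concepts.
placeholder_labels = [f"concept {i}" for i in range(278)]
batches = chunk_concepts(placeholder_labels)
print([len(batch) for batch in batches])
```

Each batch could then be passed to `load_json_response_schema` and `classify_media_topics`, with the resulting keyword lists merged. The trade-off is extra model calls for the larger branches.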

In [None]:
complete_classification_results = {}
for keyword in broad_classification_results.get("keywords", []):
    # get the narrower concepts for the broad concept
    narrow_concepts = top_concept_dict.get(keyword, [])
    # Load the response schema for the narrower concepts
    narrow_concept_response_schema = load_json_response_schema(narrow_concepts)
    # Classify the text again with the narrower concepts
    response_object = classify_media_topics(content=text_to_classify, response_schema=narrow_concept_response_schema)
    # Get the narrower classification results (fall back to an empty dict if nothing was parsed)
    narrow_classification_results = response_object.parsed or {}
    # Add to the complete results
    complete_classification_results[keyword] = narrow_classification_results.get("keywords", [])


### Complete Classification Results

Our classification process is complete. The text has been tagged first with broad concepts, then again with the narrower concepts under each.

In [96]:
# Display the complete classification results
for broad_concept, narrow_concepts in complete_classification_results.items():
    display(Markdown(f"### {broad_concept}"))
    display(Markdown(f"**Narrower Concepts:** {', '.join(narrow_concepts)}"))

### arts, culture, entertainment and media

**Narrower Concepts:** arts and entertainment, culture, mass media, music, theater, visual arts, cultural development, online media outlet, social media, musical instrument, musical performance, design (visual arts), drawing, painting

### human interest

**Narrower Concepts:** accomplishment, people

### science and technology

**Narrower Concepts:** artificial intelligence, technology and engineering, information technology and computer science, philosophy, sociology, history, anthropology, mechanical engineering, scientific innovation

### society

**Narrower Concepts:** social problem, values, ethics