# Build Structured Dataset from Meeting Notes

This notebook demonstrates how to extract structured action items (tasks, due dates, and owners) from unstructured meeting notes by calling the OpenAI Chat Completions API with a JSON schema response format.

In [2]:
# Import necessary libraries
import pandas as pd
import json
from openai import OpenAI

# Initialize the OpenAI client (by default it reads the API key from the OPENAI_API_KEY environment variable)
client = OpenAI()

# Sample DataFrame with meeting notes
df_meeting_notes = pd.DataFrame({
    'meeting_id': ['001', '002'],
    'meeting_notes': [
        """
        Discussed project deadlines. John is responsible for creating the project timeline, and it's due by September 15th. 
        Sarah will handle client communication, and she needs to send the initial report by September 20th. 
        The budget report will be prepared by Michael, but there's no set deadline yet.
        """,
        """
        The website redesign is in progress. Emily will create the new layout by October 1st. 
        Tom will review the SEO strategy by September 30th. 
        We need to finalize the new logo, and James is in charge, but no date has been set.
        """
    ]
})

# Display the meeting notes DataFrame
df_meeting_notes

|   | meeting_id | meeting_notes |
|---|---|---|
| 0 | 001 | \n Discussed project deadlines. John is... |
| 1 | 002 | \n The website redesign is in progress.... |


## Extract Action Items from Meeting Notes

We define a `MeetingNotesProcessor` class that:
- Extracts action items (tasks, owners, and due dates) using an API.
- Structures and normalizes the extracted data.
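
The structuring step can be tried without any API call. The sketch below uses hand-written action items (illustrative data, not real model output) to show how `explode` and `pd.json_normalize` turn a column of lists of dicts into one row per action item. The variable names `exploded`, `items`, and `flat` are chosen for this example only.

```python
import pandas as pd

# Hand-written stand-in for what the extraction step returns per meeting
# (illustrative data, not real API output).
df = pd.DataFrame({
    "meeting_id": ["001", "002"],
    "action_items": [
        [
            {"description": "Create the project timeline", "due_date": "September 15th", "owner": "John"},
            {"description": "Prepare the budget report", "due_date": None, "owner": "Michael"},
        ],
        [
            {"description": "Review the SEO strategy", "due_date": "September 30th", "owner": "Tom"},
        ],
    ],
})

# One row per action item; reset the index so it lines up with the normalized frame.
exploded = df.explode("action_items").reset_index(drop=True)

# Turn each dict into columns (description, due_date, owner).
items = pd.json_normalize(exploded["action_items"].tolist())

# Attach the new columns to the meeting metadata.
flat = pd.concat([exploded.drop(columns="action_items"), items], axis=1)
print(flat)
```

Resetting the index before the `concat` matters: `explode` repeats the original row index for every item, while the normalized frame is numbered from zero.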


In [3]:
class MeetingNotesProcessor:
    def __init__(self, dataframe):
        self.df = dataframe

    def extract_action_items(self, meeting_notes):
        """
        Call the OpenAI Chat Completions API to extract action items from meeting notes.
        The JSON schema in `response_format` constrains the reply to a list of
        action items, each with a description, due date, and owner.
        """
        response = client.chat.completions.create(
            model="gpt-4o-2024-08-06",
            messages=[
                {
                    "role": "system",
                    "content": "Extract action items, due dates, and owners from meeting notes."
                },
                {
                    "role": "user",
                    "content": meeting_notes
                }
            ],
            response_format={
                "type": "json_schema",
                "json_schema": {
                    "name": "action_items",
                    "strict": True,
                    "schema": {
                        "type": "object",
                        "properties": {
                            "action_items": {
                                "type": "array",
                                "items": {
                                    "type": "object",
                                    "properties": {
                                        "description": {"type": "string"},
                                        "due_date": {"type": ["string", "null"]},
                                        "owner": {"type": ["string", "null"]}
                                    },
                                    "required": ["description", "due_date", "owner"],
                                    "additionalProperties": False
                                }
                            }
                        },
                        "required": ["action_items"],
                        "additionalProperties": False
                    }
                }
            }
        )
        # Extract the action items from the response
        json_content = response.choices[0].message.content
        parsed_json = json.loads(json_content)
        return parsed_json['action_items']

    def process_notes(self):
        """
        Process the meeting notes by extracting action items and normalizing the data structure.
        """
        # Work on a copy so the caller's DataFrame is not modified
        df = self.df.copy()

        # Apply extract_action_items to each row in the 'meeting_notes' column
        df['action_items'] = df['meeting_notes'].apply(self.extract_action_items)

        # Explode the lists into one row per action item and reset the index
        exploded_df = df.explode('action_items').reset_index(drop=True)

        # Normalize the action item dicts into columns; passing a plain list
        # keeps the result on a fresh 0..n-1 index that matches exploded_df
        action_items_df = pd.json_normalize(exploded_df['action_items'].tolist())

        # Concatenate the normalized columns and drop the old 'action_items' column
        self.df = pd.concat([exploded_df.drop(columns=['action_items']), action_items_df], axis=1)
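
The flattening step in `process_notes` assumes every meeting yields at least one action item. If the model returns an empty list, `explode` produces a single NaN row for that meeting, and how `pd.json_normalize` treats a NaN entry may vary across pandas versions. The sketch below makes the behavior explicit by substituting an empty dict; `flatten_action_items` is a hypothetical helper, not part of the class above.

```python
import pandas as pd

def flatten_action_items(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: expand an 'action_items' list column into columns."""
    exploded = df.explode("action_items").reset_index(drop=True)
    # An empty list explodes to a single NaN row; swap it for an empty dict
    # so the meeting is kept with missing values in the item columns.
    records = [item if isinstance(item, dict) else {} for item in exploded["action_items"]]
    items = pd.json_normalize(records)
    return pd.concat([exploded.drop(columns="action_items"), items], axis=1)

df = pd.DataFrame({
    "meeting_id": ["003", "004"],  # made-up IDs for illustration
    "action_items": [
        [],  # meeting with no action items
        [{"description": "Finalize the new logo", "due_date": None, "owner": "James"}],
    ],
})

flat = flatten_action_items(df)
print(flat)
```

Whether a meeting without action items should stay in the dataset as an empty row or be dropped is a design choice; calling `dropna(subset=["description"])` on the result would remove such rows.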


## Build the Structured Dataset

We now create an instance of the `MeetingNotesProcessor` class and process the meeting notes.

In [4]:
# Create an instance of the processor and process the notes
processor = MeetingNotesProcessor(df_meeting_notes)
processor.process_notes()

# Display the processed DataFrame with structured action items
processor.df

|   | meeting_id | meeting_notes | description | due_date | owner |
|---|---|---|---|---|---|
| 0 | 001 | \n Discussed project deadlines. John is... | Create the project timeline | September 15th | John |
| 1 | 001 | \n Discussed project deadlines. John is... | Handle client communication and send the initi... | September 20th | Sarah |
| 2 | 001 | \n Discussed project deadlines. John is... | Prepare the budget report |  | Michael |
| 3 | 002 | \n The website redesign is in progress.... | Create the new website layout | October 1st | Emily |
| 4 | 002 | \n The website redesign is in progress.... | Review the SEO strategy | September 30th | Tom |
| 5 | 002 | \n The website redesign is in progress.... | Finalize the new logo |  | James |


The resulting dataset includes:
- `meeting_id`: ID of the meeting.
- `meeting_notes`: Original meeting notes.
- `description`: Description of the action item.
- `due_date`: The due date as stated in the notes (free text such as "September 15th"), or empty when no deadline was set.
- `owner`: The individual responsible for the task.

This structured format provides a clean way to track action items from meetings.
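
Because `due_date` comes back as free text, tracking deadlines usually needs one more conversion step. The following is a minimal sketch, assuming the dates follow the "Month day" pattern seen in the output above and that the year is supplied by the caller (the notes never state one). `parse_due_date` and the `tasks` frame are hypothetical, and `%B` assumes English month names in the current locale.

```python
import re
from datetime import datetime

import pandas as pd

def parse_due_date(text, year):
    """Hypothetical helper: parse strings like 'September 15th' into a date.

    Returns None when the value is missing or not in 'Month day' form.
    """
    if not isinstance(text, str):
        return None
    # Drop ordinal suffixes: '15th' -> '15', '1st' -> '1'
    cleaned = re.sub(r"(\d+)(st|nd|rd|th)\b", r"\1", text.strip())
    try:
        return datetime.strptime(f"{cleaned} {year}", "%B %d %Y").date()
    except ValueError:
        return None

# A few rows copied from the structured output above.
tasks = pd.DataFrame({
    "description": ["Create the project timeline", "Prepare the budget report", "Create the new website layout"],
    "due_date": ["September 15th", None, "October 1st"],
    "owner": ["John", "Michael", "Emily"],
})

# The year 2024 is an assumption made for this example.
tasks["due_date_parsed"] = tasks["due_date"].apply(lambda s: parse_due_date(s, year=2024))

# Action items that still need a deadline.
unscheduled = tasks[tasks["due_date_parsed"].isna()]
print(unscheduled[["description", "owner"]])
```

An alternative is to ask the model for ISO 8601 dates in the system prompt, which would move the parsing into the extraction step; that choice depends on how much the notes' original wording needs to be preserved.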