Description
Hello, below is the callback I created to handle an attached email (.eml) file and its attachments. The email is expected to contain attached PDF documents:
```python
import json
import logging
from typing import Optional

from google.adk.agents.callback_context import CallbackContext
from google.genai import types

logger = logging.getLogger(__name__)

# process_email_content() and _extract_text_from_html() are my own helpers (not shown).


def before_agent_callback(callback_context: CallbackContext) -> Optional[types.Content]:
    """
    Replaces an attached .eml part in the user message with a text version of the
    email (headers + body) and saves each attached PDF as an artifact.
    Always returns None, so the agent runs normally.
    """
    logger.info("Entering %s (invocation %s)",
                callback_context.agent_name, callback_context.invocation_id)
    user_content = callback_context.user_content
    if not user_content or not user_content.parts:
        return None

    parts_reformatted = []
    pdf_mime_type = "application/pdf"
    for part in user_content.parts:
        is_email = bool(part.inline_data and part.inline_data.data
                        and part.inline_data.mime_type == "message/rfc822")
        if not is_email:
            # Keep text parts and any other inline data unchanged.
            parts_reformatted.append(part)
            continue

        processed_email = process_email_content(part.inline_data.data)
        email_body = _extract_text_from_html(processed_email["body_html"])
        str_email_headers = json.dumps(processed_email["headers"], indent=4)
        # Replace the raw .eml with a plain-text rendering of the email.
        parts_reformatted.append(
            types.Part(text=f"Headers:\n{str_email_headers}\nBody:\n{email_body}\n")
        )

        attached_pdfs = [
            a for a in processed_email["attachments"]
            if a["content_type"].startswith(pdf_mime_type)
        ]
        for pdf in attached_pdfs:
            filename = pdf["filename"]
            pdf_artifact = types.Part(
                inline_data=types.Blob(data=pdf["data"], mime_type=pdf_mime_type)
            )
            # parts_reformatted.append(pdf_artifact)
            try:
                # Returns the new version number (may need `await` in newer ADK releases).
                version = callback_context.save_artifact(filename=filename, artifact=pdf_artifact)
                logger.info("Attached artifact %s with version %s", filename, version)
            except Exception:
                logger.exception("Error attaching artifact %s", filename)

    user_content.parts = parts_reformatted
    return None
```
This code handles the email correctly, and the PDFs are stored as artifacts. Previously, I added the PDFs to the message parts instead (the commented-out line).
Separately, I created an agent for document processing:
```python
from typing import Any, Dict

from google.adk.agents import Agent
from google.adk.tools.tool_context import ToolContext

# MODEL_NAME, document_analysis_prompt and before_agent_callback are defined elsewhere.


def process_attached_documents(tool_context: ToolContext) -> Dict[str, Any]:
    """
    Retrieves the content of all files attached to the email.

    This tool accesses all artifacts (files) saved from the email and returns their content.
    It doesn't require any parameters as it automatically processes all available files.

    Returns:
        Dict[str, Any]: A dictionary with:
            - 'status': 'success' or 'error'
            - 'files_content': List of dictionaries, each containing:
                - 'file_name': Name of the file
                - 'content': The loaded artifact (a types.Part holding the file's bytes)
            - 'message': Error message (only if status is 'error')

    Use this tool when you need to analyze the content of documents attached to the email,
    such as PDFs, images, or other file types that were included in the original message.
    """
    files_content = []
    try:
        for file_name in tool_context.list_artifacts():
            # load_artifact() is the ADK method for reading a saved artifact back.
            files_content.append(
                {"file_name": file_name, "content": tool_context.load_artifact(file_name)}
            )
        return {"status": "success", "files_content": files_content}
    except Exception as e:
        return {"status": "error", "message": f"Error: Could not get file content: {e}"}


# The tool must be defined before the agent that references it.
document_analyst_agent = Agent(
    name="document_analyst_agent",
    model=MODEL_NAME,
    description=(
        "Agent to extract structured information from attached documents related to "
        "shipments, containers, vehicles, and other logistics details."
    ),
    instruction=(
        f"""{document_analysis_prompt}
In order to extract the structured information from the attached documents, you can use the following tool:
- process_attached_documents"""
    ),
    output_key="attached_documents_analysis",
    tools=[process_attached_documents],
    before_agent_callback=before_agent_callback,
)
```
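The tool above puts whole artifacts (file bytes) inside its result dict. A function response most likely reaches the model as structured, JSON-like data rather than as an attached document, so raw PDF bytes there are unlikely to be read as a PDF. A minimal sketch of a tool body that returns only JSON-friendly metadata is below. `ArtifactInfo` and `describe_artifacts` are hypothetical names, and the two callables stand in for ADK calls so the logic can be tested without a session.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional


@dataclass
class ArtifactInfo:
    """Hypothetical stand-in for an artifact's inline Blob (mime type + bytes)."""
    mime_type: str
    data: bytes


def describe_artifacts(
    list_names: Callable[[], List[str]],
    load: Callable[[str], Optional[ArtifactInfo]],
) -> Dict[str, Any]:
    """Return filename, mime type and size for each artifact instead of raw bytes."""
    try:
        files = []
        for name in list_names():
            blob = load(name)
            if blob is None:
                continue  # listed but not loadable; skip it
            files.append({
                "file_name": name,
                "mime_type": blob.mime_type,
                "size_bytes": len(blob.data),
            })
        return {"status": "success", "files": files}
    except Exception as e:
        return {"status": "error", "message": f"Error: Could not get file content: {e}"}


if __name__ == "__main__":
    store = {"invoice.pdf": ArtifactInfo("application/pdf", b"%PDF-1.4")}
    print(describe_artifacts(lambda: list(store), store.get))
```

Inside an ADK tool, `list_names` would presumably be `tool_context.list_artifacts` and `load` a small wrapper returning `tool_context.load_artifact(name).inline_data`; both mappings are assumptions to check against the installed ADK version.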
The `process_attached_documents` tool is never called (I suspect I need to adjust the prompt), yet the output still appears to contain content from the attached PDF documents.
My first question is: do I have to tell the agent explicitly to retrieve the content of the PDF documents (e.g. using the function above), or does the agent interact with the artifacts automatically?
I am testing with `adk web`, and the Artifacts tab is empty, which is confusing; I haven't been able to work out what is going on. I also tried adding another callback that runs before `document_analyst_agent`, but I couldn't get the list of artifacts from its context.
I would appreciate any help or examples, ideally ones that work with `adk web`, since I find it extremely helpful for testing different agent configurations.
EDIT:
By adding the following to the code above, I confirmed that the artifacts are saved successfully and can be retrieved from the context:
```python
for pdf in attached_pdfs:
    filename = pdf['filename']
    pdf_artifact = types.Part(inline_data=types.Blob(data=pdf['data'], mime_type=pdf_mime_type))
    # parts_reformatted.append(pdf_artifact)
    try:
        version = callback_context.save_artifact(filename=filename, artifact=pdf_artifact)
        if "attachments" not in callback_context.state:
            callback_context.state["attachments"] = []
        callback_context.state["attachments"].append({"filename": filename, "version": version})
        logger.info(f"Attached artifact {filename} with version {version}")
    except Exception as e:
        print(f"Error attaching artifact {filename}: {e}")
        version = None
```
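One detail worth checking in the snippet above: as far as I understand ADK's `State`, a change is recorded when a key is assigned (`state[key] = value`). Appending to a list read back from state mutates it in place, which may not be persisted once `attachments` already exists from an earlier turn. A safer pattern is to copy, append, and reassign. The sketch below uses a plain dict in place of `callback_context.state`, and `record_attachment` is a hypothetical helper name.

```python
from typing import Any, Dict, List, MutableMapping


def record_attachment(
    state: MutableMapping[str, Any], filename: str, version: int
) -> List[Dict[str, Any]]:
    """Append an attachment record by reassigning the key, never mutating in place."""
    # Copy whatever is there (or start empty) so the write below is a fresh assignment.
    attachments = list(state.get("attachments", []))
    attachments.append({"filename": filename, "version": version})
    state["attachments"] = attachments  # assignment is what marks the key as changed
    return attachments


if __name__ == "__main__":
    state: Dict[str, Any] = {}
    record_attachment(state, "invoice.pdf", 0)
    record_attachment(state, "packing_list.pdf", 0)
    print(state["attachments"])
```

In the callback, the three state lines inside `try` would become a single `record_attachment(callback_context.state, filename, version)` call, assuming `State` supports `.get()` and item assignment as it appears to.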
So my question is: what is a good way to pass artifact content to the agent (for example, when it needs to process a PDF)?