# Microsoft Graph Mail API Query

Use this notebook to query the Microsoft Graph Mail API. It is purpose-built for ingesting basic email content (Graph refers to emails as messages). The level of detail returned depends on the permissions granted to the app (see Permissions).

The query is designed to loop through a list of user IDs. Although you can feed it a full user list (which can be retrieved with a separate API call), you may wish to restrict the list to specific IDs (e.g. customer service users). For each user ID provided, it retrieves that user's email messages.
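If you do want to start from the full user list, the separate call is `GET https://graph.microsoft.com/v1.0/users`, which is paged the same way as the messages endpoint (via `@odata.nextLink`). The sketch below assumes the app also holds a permission that allows listing users (for example User.Read.All, which is not covered under Permissions). The helper name `list_user_ids` and its `fetch_page` parameter are hypothetical; the page fetcher is passed in so the paging logic stays independent of the HTTP library.

```python
from typing import Callable, Dict, List, Optional

# $select=id keeps each user record down to the one field the notebook needs
USERS_URL = "https://graph.microsoft.com/v1.0/users?$select=id"

def list_user_ids(access_token: str,
                  fetch_page: Callable[[str, Dict[str, str]], dict],
                  url: Optional[str] = USERS_URL) -> List[str]:
    """Hypothetical helper: collect every user id by following @odata.nextLink."""
    headers = {"Authorization": access_token}  # same 'Bearer ...' string as the notebook
    ids: List[str] = []
    while url:
        page = fetch_page(url, headers)        # parsed JSON body of one page
        ids.extend(user["id"] for user in page.get("value", []))
        url = page.get("@odata.nextLink")      # absent on the last page, which ends the loop
    return ids
```

In the notebook this could be called as `ids = list_user_ids(access_token, lambda url, headers: requests.get(url, headers=headers).json())`, then filtered down to the mailboxes you actually need.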

## More information

Microsoft Graph REST API endpoint reference for Mail - https://learn.microsoft.com/en-us/graph/api/resources/mail-api-overview?view=graph-rest-1.0

Microsoft Graph API full documentation - https://learn.microsoft.com/en-us/graph/

## Permissions

This query uses an app registration to authenticate to the Graph API, so you will need to grant the required permissions to the app registration. These are application permissions, which require admin consent. A full description of all the app permissions can be found here - https://learn.microsoft.com/en-us/graph/permissions-reference.

Be mindful that the level of permission determines how much data is retrieved. It is possible to retrieve basic message properties (sender, recipients, subject, timestamps and so on) without the body content of the emails (recommended).

Here are some examples:
1. Mail.ReadBasic.All - Allows the app to read basic mail properties in all mailboxes without a signed-in user. Includes all properties except body, previewBody, attachments and any extended properties.
2. Mail.Read - Allows the app to read mail in all mailboxes without a signed-in user. Includes body of the email.
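To stay on the recommended basic-properties path, a request can also ask Graph for only the fields you need with the OData `$select` query parameter, and for larger pages with `$top` (Graph returns 10 messages per page by default, which matches the `%24top=10` in the notebook output further down). A minimal sketch follows; the function name `build_messages_url`, the default page size of 50 and the property list in the usage note are illustrative assumptions, not part of the original notebook.

```python
from typing import Iterable
from urllib.parse import urlencode

GRAPH_BASE = "https://graph.microsoft.com/v1.0"

def build_messages_url(user_id: str, select: Iterable[str], top: int = 50) -> str:
    """Hypothetical helper: /users/{id}/messages URL requesting only the listed properties."""
    # safe="$," leaves the OData '$' prefix and the comma-separated list unescaped
    query = urlencode({"$select": ",".join(select), "$top": top}, safe="$,")
    return f"{GRAPH_BASE}/users/{user_id}/messages?{query}"
```

For example, `build_messages_url(user_id, ["subject", "from", "toRecipients", "receivedDateTime"])` could replace the plain messages URL in the query function; check the Graph list-messages reference for the allowed `$top` range before raising it.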


In [1]:
# import packages

import msal
import requests
import json
import os
from datetime import datetime

In [18]:
# Enter the Key Vault URL - DEFINITE CHANGE

keyvaulturl = "https://kv-coolcustomer-dev.vault.azure.net/"

# Enter the details of your AAD app registration - shouldn't have to change
# Secrets are read from Key Vault with notebookutils, which is built into Microsoft Fabric notebooks (no import needed)

client_id = notebookutils.credentials.getSecret(keyvaulturl, 'graph-mail-client-id')
client_secret = notebookutils.credentials.getSecret(keyvaulturl, 'graph-mail-client-secret')

# Enter the Entra Tenant ID below - DEFINITE CHANGE

entra_tenant_id = "daaee2d7-84e4-45e7-9907-8238d02b6e5a"

# Leave the below alone

scope = ["https://graph.microsoft.com/.default"]
authority = f'https://login.microsoftonline.com/{entra_tenant_id}'

# IDs for mailboxes - enter a list of IDs required for querying

ids = ["de5e10bf-3a7e-4a9c-a547-760123d1e93e"]

In [19]:
# Define the base path for the Lakehouse - top level folder for all Graph data
lakehouse_base_path = "/lakehouse/default/Files/MSGraphAPItest/Mail"

# Ensure the directory exists
os.makedirs(lakehouse_base_path, exist_ok=True)

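# Generate the current date - used in folder and file generation
# Take a single timestamp so every part below refers to the same instant
current_datetime = datetime.now()
year = current_datetime.strftime("%Y")
month = current_datetime.strftime("%m")
day = current_datetime.strftime("%d")  # %d = zero-padded day; %D would give "mm/dd/yy" and create extra folders
hour = current_datetime.strftime("%H")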
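The folder layout above is base path, then year, month and day taken from one timestamp. Keeping that logic in a pure function makes the layout easy to check. In this sketch the helper name `partition_path` is hypothetical, and the timestamp is a parameter so you could pass `datetime.now(timezone.utc)` if you prefer UTC partitions (an assumption; the notebook itself uses local `datetime.now()`).

```python
import os
from datetime import datetime

def partition_path(base_path: str, user_id: str, ts: datetime) -> str:
    """Hypothetical helper: base/YYYY/MM/DD/<user_id>_<YYYYmmddHHMMSS>.json."""
    folder = os.path.join(base_path, ts.strftime("%Y"), ts.strftime("%m"), ts.strftime("%d"))
    # A compact timestamp avoids the spaces and colons that str(datetime) would add to the file name
    file_name = f"{user_id}_{ts.strftime('%Y%m%d%H%M%S')}.json"
    return os.path.join(folder, file_name)
```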


In [20]:
# This cell creates an access token using the details above

# Create an MSAL instance providing the client_id, authority and client_credential parameters
client = msal.ConfidentialClientApplication(client_id, authority=authority, client_credential=client_secret)

# Acquire an access token
token_result = client.acquire_token_for_client(scopes=scope)

# Fail fast with MSAL's error details instead of a KeyError further down
if 'access_token' not in token_result:
    raise RuntimeError(f"Token request failed: {token_result.get('error')} - {token_result.get('error_description')}")

# Do not print token_result: it contains the bearer token itself
access_token = 'Bearer ' + token_result['access_token']

print('New access token was acquired from Azure AD')
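MSAL's token result includes `expires_in`, the number of seconds the token stays valid (the token returned in this notebook's run was valid for 3599 seconds). A long run over many mailboxes can outlive that. Recent MSAL Python versions already serve `acquire_token_for_client` from an in-memory cache, so simply calling it again before each user may be enough; the hypothetical wrapper below just makes the refresh explicit. The class name, the 300-second early-refresh margin and the injectable `clock` are assumptions for illustration.

```python
import time
from typing import Callable, Dict, Optional

class BearerTokenProvider:
    """Hypothetical wrapper: re-acquire the token shortly before it expires."""

    def __init__(self, acquire: Callable[[], Dict], skew_seconds: int = 300,
                 clock: Callable[[], float] = time.monotonic):
        self._acquire = acquire      # e.g. lambda: client.acquire_token_for_client(scopes=scope)
        self._skew = skew_seconds    # refresh this many seconds before expiry
        self._clock = clock
        self._header: Optional[str] = None
        self._expires_at = 0.0

    def header(self) -> str:
        """Return a 'Bearer ...' string, requesting a new token if the current one is near expiry."""
        if self._header is None or self._clock() >= self._expires_at - self._skew:
            result = self._acquire()
            if "access_token" not in result:
                raise RuntimeError(f"Token request failed: {result.get('error')} - {result.get('error_description')}")
            self._header = "Bearer " + result["access_token"]
            self._expires_at = self._clock() + int(result.get("expires_in", 0))
        return self._header
```

Usage would be `provider = BearerTokenProvider(lambda: client.acquire_token_for_client(scopes=scope))`, then passing `provider.header()` wherever `access_token` is used.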

In [8]:
# This cell defines a function to query endpoints on the graph API and save into JSON files

def query_user_graph_api(user_id, access_token):

    # Define path and file name for the Lakehouse
    endpoint_path = lakehouse_base_path  # Adjust this base path as needed, recommend different end folder for each query
    year_month_day_path = os.path.join(endpoint_path, year, month, day)
    # File-safe timestamp (no spaces or colons) in the file name
    file_name = f"{user_id}_{current_datetime.strftime('%Y%m%d%H%M%S')}.json"
    file_path = os.path.join(year_month_day_path, file_name)

    # Ensure the directory exists
    os.makedirs(year_month_day_path, exist_ok=True)

    # Define the URL for the API; the access token obtained earlier goes in the header
    url = f"https://graph.microsoft.com/v1.0/users/{user_id}/messages"
    headers = {
        'Authorization': access_token
    }

    # Query the API, looping over pages
    records = []  # appending our data here, note there could be multiple pages
    iteration = 0  # provides the iteration/page number
    print('')
    print(f'Running for mail query for user id {user_id}')  # message to the user
    while url:
        # Make a GET request to the provided url, passing the access token in a header
        graph_result = requests.get(url=url, headers=headers, timeout=60)
        iteration += 1
        print(f'Page: {iteration}, URL: {url}')
        if graph_result.status_code == 200:
            print(f'Response Code {graph_result.status_code}')
            json_data = graph_result.json()
            records.extend(json_data['value'])  # append data to the list
            url = json_data.get('@odata.nextLink')  # next page, or None on the last page
        elif graph_result.status_code == 404:
            # 404 usually means the user or their mailbox could not be found
            print(f'No data available for user id {user_id}.')
            print(f'Response Code {graph_result.status_code}')
            print('*'*50)
            break
        else:
            print(f'Error on mail endpoint for user id {user_id}. Page: {iteration}, URL: {url}')
            print(f'Response Code {graph_result.status_code}: {graph_result.text}')
            print('*'*50)
            break

    # Write the JSON data to the file
    if records:
        with open(file_path, "w") as output_file:
            json.dump(records, output_file, indent=4)
        print(f'Mail data for user id {user_id} ingested. {iteration} pages total')
        print('*'*50)
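Graph throttles heavy callers by answering 429 (Too Many Requests), usually with a `Retry-After` header giving the wait in seconds; the function above treats that as a hard error and stops paging for the user. A minimal retry sketch, assuming you want to retry only 429 and 503 responses: the helper name `get_with_retry`, the attempt limit and the exponential fallback delay are assumptions, and the `get` and `sleep` callables are passed in so the logic can be exercised without a network.

```python
import time
from typing import Any, Callable, Dict

def get_with_retry(get: Callable[..., Any], url: str, headers: Dict[str, str],
                   max_attempts: int = 5, sleep: Callable[[float], None] = time.sleep) -> Any:
    """Hypothetical helper: repeat a GET while Graph answers 429 (throttled) or 503."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        response = get(url=url, headers=headers)
        # Return anything that is not a throttling/unavailable status, or the last attempt as-is
        if response.status_code not in (429, 503) or attempt == max_attempts:
            return response
        # Honour Retry-After (in seconds) when present, otherwise back off exponentially
        retry_after = response.headers.get("Retry-After")
        delay = float(retry_after) if retry_after and retry_after.isdigit() else 2 ** attempt
        sleep(delay)
```

Inside `query_user_graph_api`, the request line would become `graph_result = get_with_retry(requests.get, url, headers)`; the existing status-code branches then only see a 429 if every attempt was throttled.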

In [21]:
# run the function for the desired mailbox IDs

for i in ids:
    query_user_graph_api(i, access_token)

print("ingestion complete")



Running for mail query for user id de5e10bf-3a7e-4a9c-a547-760123d1e93e
Page: 1, URL: https://graph.microsoft.com/v1.0/users/de5e10bf-3a7e-4a9c-a547-760123d1e93e/messages
Response Code 200
Page: 2, URL: https://graph.microsoft.com/v1.0/users/de5e10bf-3a7e-4a9c-a547-760123d1e93e/messages?%24top=10&%24skip=10
Response Code 200
Page: 3, URL: https://graph.microsoft.com/v1.0/users/de5e10bf-3a7e-4a9c-a547-760123d1e93e/messages?%24top=10&%24skip=20
Response Code 200
Page: 4, URL: https://graph.microsoft.com/v1.0/users/de5e10bf-3a7e-4a9c-a547-760123d1e93e/messages?%24top=10&%24skip=30
Response Code 200
Page: 5, URL: https://graph.microsoft.com/v1.0/users/de5e10bf-3a7e-4a9c-a547-760123d1e93e/messages?%24top=10&%24skip=40
Response Code 200
Page: 6, URL: https://graph.microsoft.com/v1.0/users/de5e10bf-3a7e-4a9c-a547-760123d1e93e/messages?%24top=10&%24skip=50
Response Code 200
Page: 7, URL: https://graph.microsoft.com/v1.0/users/de5e10bf-3a7e-4a9c-a547-760123d1e93e/messages?%24top=10&%24skip=60
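In the loop above, an exception raised for one user (for example a `requests` connection error or a response body that is not valid JSON) stops ingestion for every remaining ID. A small sketch that keeps going and reports failures at the end; the helper name `run_for_users` is hypothetical, and catching the broad `Exception` is a deliberate simplification.

```python
from typing import Callable, Iterable, List, Tuple

def run_for_users(user_ids: Iterable[str], query: Callable[[str], None]) -> List[Tuple[str, str]]:
    """Hypothetical helper: run `query` for each user, collecting failures instead of stopping."""
    failures: List[Tuple[str, str]] = []
    for user_id in user_ids:
        try:
            query(user_id)
        except Exception as exc:  # e.g. requests.exceptions.ConnectionError
            failures.append((user_id, repr(exc)))
    return failures
```

Usage: `failures = run_for_users(ids, lambda uid: query_user_graph_api(uid, access_token))`, then print `failures` after "ingestion complete" so failed mailboxes can be rerun.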
