# 🚀 Getting Started

💡 <b>Before running this notebook</b>, ensure you have configured SharePoint and Azure AI Foundry, registered an application to handle API authentication, granted the appropriate roles in Microsoft Purview, and set the required configuration parameters. [The steps are listed in the README.](README.md)

## 1. Setup

### 1.1 Install required libraries

In [1]:
!pip install -r requirements.txt






### 1.2 Load libraries

In [2]:
import os
# json is used later to pretty-print the file listings and extracted content
import json
from azure.identity import ClientSecretCredential
from pyapacheatlas.core import PurviewClient
from purviewautomation import PurviewCollections, ServicePrincipalAuthentication
from azure.ai.inference import ChatCompletionsClient
from azure.core.credentials import AzureKeyCredential
from pyapacheatlas.core.typedef import ClassificationTypeDef, EntityTypeDef
# Purview custom libraries
from custom_libs.purview_utils import (
    filesystemFileSampleList,
    listFilesystemFiles,
    getAADToken,
    moveCollection,
    estimateTokens,
    unstructuredDataClassification,
    rollupClassifications,
    loadPurviewAssets,
    applyPurviewClassifications
)
# SharePoint custom libraries
from custom_libs.sharepoint_utils import (
    SharePointUtils,
)

### 1.3 Initialize Environment

This notebook reads its configuration from environment variables, which you must set before running it. Keeping settings out of the notebook is more secure because it prevents sensitive data from being accidentally committed and pushed to version control.

Create a `.env` file in your project root (use the provided `.env.sample` as a template). [Detailed steps here](README.md)

> 📌 **Note**
> Remember not to commit the .env file to your version control system. Add it to your .gitignore file to prevent it from being tracked.
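
A quick check for missing settings can catch an incomplete `.env` file early. The sketch below is illustrative only: `missing_env_vars` is a hypothetical helper that is not part of `custom_libs`, and the list of names is a partial example based on the variables this notebook reads.

```python
import os

# Hypothetical helper (not part of custom_libs): report which settings are absent or empty.
def missing_env_vars(names, env=None):
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]

# Example names only; adjust the list to match your own .env file.
required = ["AZURE_TENANT_ID", "AZURE_CLIENT_ID", "AZURE_CLIENT_SECRET", "PURVIEW_ACCOUNT_NAME"]
missing = missing_env_vars(required)
if missing:
    print(f"Missing settings: {', '.join(missing)}")
```

Passing a plain dictionary as `env` makes the helper easy to try without touching the real environment.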

In [3]:
# Instantiate the SharePointUtils client
# The client handles the complexities of interacting with SharePoint's REST API, providing an easy-to-use interface for data extraction.
sharepointClient = SharePointUtils()

# Load environment variables from the .env file
sharepointClient.loadEnvFile()

# Retrieve environment variables
azureOpenAIApiKey = os.getenv("AZURE_OPENAI_API_KEY")
azureOpenAIDeploymentName = os.getenv("AZURE_OPENAI_DEPLOYMENT_NAME")
azureOpenAILLMModel = os.getenv("AZURE_OPENAI_LLM_MODEL")
azureOpenAIApiEndpoint = os.getenv("AZURE_OPENAI_ENDPOINT")
azureOpenAIApiVersion = os.getenv("AZURE_OPENAI_API_VERSION")
purviewAccountName = os.getenv("PURVIEW_ACCOUNT_NAME")
purviewEndpointUrl = os.getenv("PURVIEW_ENDPOINT_URL")
purviewTokenUrl = os.getenv("PURVIEW_TOKEN_URL")
tenantId = os.getenv("AZURE_TENANT_ID")
clientId = os.getenv("AZURE_CLIENT_ID")
clientSecret = os.getenv("AZURE_CLIENT_SECRET")
siteDomain = os.getenv("SITE_DOMAIN")
siteName = os.getenv("SITE_NAME")

2025-04-01 14:01:45,874 - micro - MainProcess - INFO     Successfully loaded environment variables: AZURE_TENANT_ID, AZURE_CLIENT_ID, AZURE_CLIENT_SECRET, APP_NAME, PURVIEW_ACCOUNT_NAME, PURVIEW_ENDPOINT_URL, PURVIEW_TOKEN_URLSITE_DOMAIN, SITE_NAME, AZURE_OPENAI_API_KEY, AZURE_OPENAI_ENDPOINT, AZURE_OPENAI_DEPLOYMENT_NAME, AZURE_OPENAI_LLM_MODEL, AZURE_OPENAI_API_VERSION (sharepoint_utils.py:loadEnvFile:122)


Update the values in the cell below to match your environment.

In [4]:
# Enable or disable display of variables
displayVariables = True

# Global variable definitions
fileExtensions = ["docx","pdf","pptx"]
sharepointPath="/Insurance/Claims"
filesystemPath = r"SampleFiles"

# Number of characters to be analyzed by Large Language Model (LLM) from each file
textLength=800

# Sample size for filesystem and SharePoint files
sampleSize=0

# Entity types for classification
entityTypes = ['SharePoint','FileSystem']

# List of custom classifications to be created in Purview
# This list can be customized based on the specific needs of the organization or project.
classifications=[
    "Empty Content", 
    "Insurance Claim",  
    "Sales Receipt",  
    "Insurance Policy",
    "Report",
    "Invoice",
    "PII",
    "Other"
]
# Convert classification list to string
classificationsStr = ''.join(classification+'\n' for classification in classifications)
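
How `textLength` and `classificationsStr` reach the model is internal to `unstructuredDataClassification`, which is not shown in this notebook. As a rough sketch only, assuming the helper truncates each document and lists the allowed labels in the prompt, the idea could look like this (`build_prompt` and its wording are hypothetical):

```python
# Hypothetical sketch: the real prompt lives in custom_libs.purview_utils and may differ.
def build_prompt(text, labels_str, text_length):
    excerpt = text[:text_length]  # only the first text_length characters are sent
    return (
        "Classify the document excerpt using exactly one of these labels:\n"
        f"{labels_str}\n"
        f"Excerpt:\n{excerpt}"
    )

# Same newline-delimited format as classificationsStr, with a shortened label list.
labels_str = "".join(label + "\n" for label in ["Report", "Invoice", "Other"])
prompt = build_prompt("Asbestos inspection report " * 100, labels_str, text_length=800)
print(len(prompt))
```

A smaller `textLength` lowers token usage per document, at the cost of giving the model less context to classify from.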

In [5]:
if displayVariables:
    print(f"Tenant ID: {tenantId}")
    print(f"Client ID: {clientId}") 
    # Never print secret values; report only whether the key is set
    print(f"Azure OpenAI API Key: {'<set>' if azureOpenAIApiKey else '<not set>'}")
    print(f"Azure OpenAI Endpoint: {azureOpenAIApiEndpoint}")

Tenant ID: cfc1af90-64cb-4745-948f-32bbe51244dd
Client ID: 4c37c4dc-2228-437f-af9c-b5dc0e3f2bba
Azure OpenAI API Key: <set>
Azure OpenAI Endpoint: https://aiservicesbasemodels.openai.azure.com/openai/deployments/gpt-4o-mini
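
Printing credentials in notebook output makes them easy to leak when the notebook is shared or committed. If some visual confirmation of a secret is needed, masking is one option. The helper below is a hypothetical sketch, not part of this project's libraries.

```python
# Hypothetical helper: show only the last few characters of a secret.
def mask_secret(value, visible=4):
    if not value:
        return "<not set>"
    if len(value) <= visible:
        # Too short to reveal anything safely; mask every character.
        return "*" * len(value)
    return "*" * (len(value) - visible) + value[-visible:]

print(mask_secret("example-key-1234"))
```

Even a masked suffix reveals part of the secret, so reporting only whether the value is set remains the more conservative choice.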


In [6]:
if not tenantId or not clientId or not clientSecret or not azureOpenAIApiKey:
    raise ValueError("Azure credentials are not set in the environment variables.")

# Generate token for REST API calls
token = getAADToken(tenantId,clientId, clientSecret,purviewTokenUrl)

# Authenticate with Microsoft Graph API
response = sharepointClient.msgraph_auth()

# Generate authentication credentials for Service Principal and Atlas client authentication for different Purview functions
servicePrincipalAuth = ServicePrincipalAuthentication(
    tenant_id=tenantId,
    client_id=clientId,
    client_secret=clientSecret
)

clientCredential = ClientSecretCredential(
    tenant_id=tenantId,
    client_id=clientId,
    client_secret=clientSecret
)

# Create clients for Purview administration and Azure AI Foundry
purviewClient = PurviewClient(
    account_name = purviewAccountName,
    authentication = clientCredential
)

collectionClient = PurviewCollections(
    purview_account_name=purviewAccountName,
    auth = servicePrincipalAuth
)

llmClient = ChatCompletionsClient(
    endpoint=azureOpenAIApiEndpoint,
    credential=AzureKeyCredential(azureOpenAIApiKey),
    temperature=0
)

2025-04-01 14:01:46,175 - micro - MainProcess - INFO     New access token retrieved. (sharepoint_utils.py:msgraph_auth:154)
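
The body of `getAADToken` is not shown here. As a rough illustration of the OAuth 2.0 client-credentials flow it presumably wraps, the sketch below only builds the form-encoded request body and sends nothing. It assumes a v2.0 token endpoint and the Purview scope `https://purview.azure.net/.default`; `build_token_request` is a hypothetical name.

```python
from urllib.parse import urlencode

# Hypothetical sketch of a client-credentials token request body; getAADToken may differ.
def build_token_request(client_id, client_secret, scope="https://purview.azure.net/.default"):
    return urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        "scope": scope,
    })

# Placeholder values for illustration; real values come from the .env file.
body = build_token_request("my-client-id", "my-client-secret")
print(body.split("&")[0])
```

The resulting string would be posted to the token URL with the `application/x-www-form-urlencoded` content type.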


### 1.4 Create Purview asset dependencies

Creates entity type definitions and classifications required by the Purview clients to assign classifications to assets discovered.

In [7]:
# Creation of custom Entity Types, required by the custom Classifications
# The list of Entity Types is taken from the variable named entityTypes
for entityName in entityTypes:
    edef = EntityTypeDef(
        name = entityName,
        superTypes= ['DataSet']
    )
    results = purviewClient.upload_typedefs(
        entityDefs=[edef],
        force_update=True
    )

# Creation of custom Classifications
# The list of classifications is taken from the variable named classifications
for classification in classifications:
    # Create custom classifications to be applied to unstructured data assets
    cdef = ClassificationTypeDef(
        name=classification,
        # Define ahead of time the asset types that each classification can be applied to.
        # entityTypes restricts which asset types can carry this classification.
        # For example, if an asset has type FileSystem and the classification has entityTypes=['DataSet'],
        # the attempt to classify the asset will fail.
        # entityTypes=['SharePoint','FileSystem','DataSet','Process']
        entityTypes=entityTypes
    )
    # Do the upload
    results = purviewClient.upload_typedefs(
        classificationDefs=[cdef],
        force_update=True
    )
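
For orientation, the classification typedef uploaded above corresponds roughly to an Atlas JSON payload like the one built below. This is an approximation for illustration: `pyapacheatlas` generates the real payload, which may include additional fields, and `classification_typedef` is a hypothetical helper.

```python
import json

# Illustrative only: approximate Atlas typedef payload; pyapacheatlas builds the real one.
def classification_typedef(name, entity_types):
    return {
        "category": "CLASSIFICATION",
        "name": name,
        "entityTypes": list(entity_types),  # asset types allowed to carry this classification
        "superTypes": [],
        "attributeDefs": [],
    }

payload = {"classificationDefs": [classification_typedef("Report", ["SharePoint", "FileSystem"])]}
print(json.dumps(payload, indent=2))
```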

### 1.5 Create custom collections

Creates multiple custom collections under the parent collection passed as `start_collection` (the Purview domain).


In [8]:
# To create multiple collections, the parent collection defined by the start_collection parameter
# MUST exist.
response = collectionClient.create_collections(start_collection=purviewAccountName,
                          collection_names=['Unstructured/SharePoint','Unstructured/FileSystem'])

### 1.6 Capture Sampling Size

This step sets the number of files that will be analyzed for classification.

> 📌 **Note:**
> The sample size is currently a fixed count, but it could be changed to a percentage of the total number of files found during the scan.
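
A percentage-based sample size, as the note suggests, could be derived along these lines. This is a minimal sketch under that assumption; `sample_size_from_percent` is a hypothetical helper, not part of the notebook.

```python
import math

# Hypothetical helper: derive a sample size from a percentage of the files found.
def sample_size_from_percent(total_files, percent):
    if total_files <= 0 or percent <= 0:
        return 0
    # Round up so small percentages still select at least one file, capped at the total.
    return min(total_files, math.ceil(total_files * percent / 100))

print(sample_size_from_percent(20, 50))
```

In this notebook a `sampleSize` of 0 means "use every file", so a caller adopting this helper would need to handle the zero case explicitly.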

In [9]:
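sampleSize = input("Enter how many documents to analyze: ").strip()
# isdecimal() accepts only characters that int() can parse; anything else falls back to 0 (analyze all files)
if sampleSize.isdecimal():
    sampleSize = int(sampleSize)
else:
    sampleSize = 0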
print(f"\n{sampleSize} documents will be analyzed from the list of documents found.")


10 documents will be analyzed from the list of documents found.


## 2. SharePoint Demo

### 2.1 Scan SharePoint Site

In [10]:
"""
List all files in the SharePoint site that match the defined file extensions.
"""
spFileList = sharepointClient.listSharepointFiles(
    site_domain=siteDomain,
    site_name=siteName,
    file_formats = fileExtensions,
    folder_path=sharepointPath,
    # Files modified N minutes ago
    # minutes_ago=60,
)
print(f"{len(spFileList)} files found matching the patterns {fileExtensions}: \n")

2025-04-01 14:02:16,127 - micro - MainProcess - INFO     Getting the Site ID... (sharepoint_utils.py:get_site_id:223)
2025-04-01 14:02:16,299 - micro - MainProcess - INFO     Site ID retrieved: mngenvmcap551180.sharepoint.com,f21d87ac-28cd-405c-a867-6e4334177b8d,81df1c36-fe3c-47aa-aad8-1f6881c85b4d (sharepoint_utils.py:get_site_id:227)
2025-04-01 14:02:16,602 - micro - MainProcess - INFO     Successfully retrieved drive ID: b!rIcd8s0oXECoZ25DNBd7jTYc34E8_qpHqtgfaIHIW032pkhmG4UcR58B-Y4sLD2Y (sharepoint_utils.py:get_drive_id:244)
2025-04-01 14:02:16,603 - micro - MainProcess - INFO     Making request to Microsoft Graph API (sharepoint_utils.py:get_files_in_site:283)
2025-04-01 14:02:16,907 - micro - MainProcess - INFO     Received response from Microsoft Graph API (sharepoint_utils.py:get_files_in_site:286)


20 files found matching the patterns ['docx', 'pdf', 'pptx']: 
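
The extension filter itself is handled inside `listSharepointFiles`. As a sketch of the idea, assuming each Microsoft Graph item exposes its file name under `name`, a case-insensitive filter could look like this (`filter_by_extension` is hypothetical):

```python
# Hypothetical sketch: keep items whose "name" ends with one of the wanted extensions.
def filter_by_extension(items, extensions):
    suffixes = tuple("." + ext.lower().lstrip(".") for ext in extensions)
    return [item for item in items if item.get("name", "").lower().endswith(suffixes)]

# Made-up items for illustration; real items come from the Graph API response.
items = [{"name": "Claim.pdf"}, {"name": "notes.txt"}, {"name": "Deck.PPTX"}]
print([item["name"] for item in filter_by_extension(items, ["docx", "pdf", "pptx"])])
```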



In [11]:
if displayVariables:
    print(json.dumps(spFileList, indent=2))

[
  {
    "@microsoft.graph.downloadUrl": "https://mngenvmcap551180.sharepoint.com/sites/SPODemo/_layouts/15/download.aspx?UniqueId=59f2089f-342c-4b6d-a56c-ee65c5befd19&Translate=false&tempauth=v1.eyJzaXRlaWQiOiJmMjFkODdhYy0yOGNkLTQwNWMtYTg2Ny02ZTQzMzQxNzdiOGQiLCJhcHBfZGlzcGxheW5hbWUiOiJzcC1TZWVyIiwiYXVkIjoiMDAwMDAwMDMtMDAwMC0wZmYxLWNlMDAtMDAwMDAwMDAwMDAwL21uZ2Vudm1jYXA1NTExODAuc2hhcmVwb2ludC5jb21AY2ZjMWFmOTAtNjRjYi00NzQ1LTk0OGYtMzJiYmU1MTI0NGRkIiwiZXhwIjoiMTc0MzUxOTczNiJ9.CgoKBHNuaWQSAjY0EgsIqNCb__rL-D0QBRoOMjAuMTkwLjE1Mi4xNTMqLGZBUC96TjZWelpUN0IzOHF0eENrMmp5MjZlNVpVM25PTEYzY1B1QmF5THM9MI0BOAFCEKGQPVzNQAAAkoZ9zr7qxz9KEGhhc2hlZHByb29mdG9rZW56ATG6AR5hbGxzaXRlcy5yZWFkIGFsbHByb2ZpbGVzLnJlYWTCAUk0YzM3YzRkYy0yMjI4LTQzN2YtYWY5Yy1iNWRjMGUzZjJiYmFAY2ZjMWFmOTAtNjRjYi00NzQ1LTk0OGYtMzJiYmU1MTI0NGRkyAEB.bNOpjD2SLbJRG7fxfewJ3gkdvWHjspMrJyeVLfQDetU&ApiVersion=2.0",
    "createdBy": {
      "user": {
        "email": "admin@MngEnvMCAP551180.onmicrosoft.com",
        "id": "210d07f6-9a65-465d-a4f6-26f

### 2.2 Generate file subset

In [12]:
# Create a subset of spFileList with sampleSize entries. If sampleSize is 0 or exceeds the list length, the entire list is used.
if sampleSize == 0 or sampleSize > len(spFileList):
    sampleSize = len(spFileList)
# Create a subset of the SharePoint file list
spFileSubset = sharepointClient.sharepointFileSampleList(spFileList,sampleSize)

In [13]:
if displayVariables:
    print(f"\nSubset of SharePoint files to be analyzed: {sampleSize} files\n")
    for file in spFileSubset:
        print(f"{file}")


Subset of SharePoint files to be analyzed: 10 files

asbestos_simple_sample_redacted.pdf
rra_mold_report_sample.pdf
Sample-Mold-Test-Report.pdf
sample_asbestos_test_report_redacted.pdf
rra-_uphelp_sample_xactimate_estimate.pdf
sample_other_structures_construction_estimate.pdf
sample_scope_redacted.pdf
SanBrunoCoordinatedDebrisRemovalRightofEntryForm.pdf
Kerley-Air-Quality.pdf
asbestos_simple_sample_redacted - Copy.pdf
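
How `sharepointFileSampleList` picks its subset is not shown. One common approach, assumed here purely for illustration, is random sampling without replacement; `sample_files` is a hypothetical helper.

```python
import random

# Hypothetical sketch: draw up to sample_size distinct file names at random.
def sample_files(file_names, sample_size, seed=None):
    rng = random.Random(seed)  # a fixed seed makes the selection reproducible
    count = min(sample_size, len(file_names))
    return rng.sample(list(file_names), count)

names = [f"file_{i}.pdf" for i in range(20)]
subset = sample_files(names, 10, seed=42)
print(len(subset))
```

Fixing the seed is useful when the same subset must be analyzed across repeated runs.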


### 2.3 Extract file contents

In [14]:
"""
Extract file contents and process all file information included in the subset from a 
specific Site ID.
"""
spFileContent = sharepointClient.getSharepointFileContent(
    site_domain=os.environ["SITE_DOMAIN"],
    site_name=os.environ["SITE_NAME"],
    folder_path=sharepointPath,
    file_names=spFileSubset
    # Files modified N minutes ago
    # minutes_ago=60,
)

2025-04-01 14:02:17,187 - micro - MainProcess - INFO     Getting the Site ID... (sharepoint_utils.py:get_site_id:223)
2025-04-01 14:02:17,314 - micro - MainProcess - INFO     Site ID retrieved: mngenvmcap551180.sharepoint.com,f21d87ac-28cd-405c-a867-6e4334177b8d,81df1c36-fe3c-47aa-aad8-1f6881c85b4d (sharepoint_utils.py:get_site_id:227)
2025-04-01 14:02:17,502 - micro - MainProcess - INFO     Successfully retrieved drive ID: b!rIcd8s0oXECoZ25DNBd7jTYc34E8_qpHqtgfaIHIW032pkhmG4UcR58B-Y4sLD2Y (sharepoint_utils.py:get_drive_id:244)
2025-04-01 14:02:17,504 - micro - MainProcess - INFO     Making request to Microsoft Graph API (sharepoint_utils.py:get_files_in_site:283)
2025-04-01 14:02:17,787 - micro - MainProcess - INFO     Received response from Microsoft Graph API (sharepoint_utils.py:get_files_in_site:286)


Processing File {'@microsoft.graph.downloadUrl': 'https://mngenvmcap551180.sharepoint.com/sites/SPODemo/_layouts/15/download.aspx?UniqueId=636f545e-40f0-4443-8f18-a7ee4d121611&Translate=false&tempauth=v1.eyJzaXRlaWQiOiJmMjFkODdhYy0yOGNkLTQwNWMtYTg2Ny02ZTQzMzQxNzdiOGQiLCJhcHBfZGlzcGxheW5hbWUiOiJzcC1TZWVyIiwiYXVkIjoiMDAwMDAwMDMtMDAwMC0wZmYxLWNlMDAtMDAwMDAwMDAwMDAwL21uZ2Vudm1jYXA1NTExODAuc2hhcmVwb2ludC5jb21AY2ZjMWFmOTAtNjRjYi00NzQ1LTk0OGYtMzJiYmU1MTI0NGRkIiwiZXhwIjoiMTc0MzUxOTczNyJ9.CgoKBHNuaWQSAjY0EgsIyK7hh_vL-D0QBRoOMjAuMTkwLjE1Mi4xNTMqLE5JbEkvcnV6RHFrVy9mY3NsNFJrUlVmTG1Bdmk4SjdCK3lNZGZSM2FMMkU9MI0BOAFCEKGQPV0EQAAAl8Ne2yofhW9KEGhhc2hlZHByb29mdG9rZW56ATG6AR5hbGxzaXRlcy5yZWFkIGFsbHByb2ZpbGVzLnJlYWTCAUk0YzM3YzRkYy0yMjI4LTQzN2YtYWY5Yy1iNWRjMGUzZjJiYmFAY2ZjMWFmOTAtNjRjYi00NzQ1LTk0OGYtMzJiYmU1MTI0NGRkyAEB.f1s-al_3T015LWyi5Iy3EP-iusi-Hq_tw1nwPiJS_u8&ApiVersion=2.0', 'createdBy': {'user': {'email': 'admin@MngEnvMCAP551180.onmicrosoft.com', 'id': '210d07f6-9a65-465d-a4f6-26fd15a35adf', 'displayN

2025-04-01 14:02:18,838 - micro - MainProcess - INFO     Text extraction from PDF bytes was successful. (sharepoint_utils.py:extract_text_from_pdf_bytes:519)


Processing File {'@microsoft.graph.downloadUrl': 'https://mngenvmcap551180.sharepoint.com/sites/SPODemo/_layouts/15/download.aspx?UniqueId=e42fd8fe-a0f6-4460-a6f3-14a79b23b5fb&Translate=false&tempauth=v1.eyJzaXRlaWQiOiJmMjFkODdhYy0yOGNkLTQwNWMtYTg2Ny02ZTQzMzQxNzdiOGQiLCJhcHBfZGlzcGxheW5hbWUiOiJzcC1TZWVyIiwiYXVkIjoiMDAwMDAwMDMtMDAwMC0wZmYxLWNlMDAtMDAwMDAwMDAwMDAwL21uZ2Vudm1jYXA1NTExODAuc2hhcmVwb2ludC5jb21AY2ZjMWFmOTAtNjRjYi00NzQ1LTk0OGYtMzJiYmU1MTI0NGRkIiwiZXhwIjoiMTc0MzUxOTczNyJ9.CgoKBHNuaWQSAjY0EgsIyK7hh_vL-D0QBRoOMjAuMTkwLjE1Mi4xNTMqLHVHd0pOcjRUbzFxeWRsalNsUmszazJnaklnR3VZVTk2RHZrMHNkMTk3WDA9MI0BOAFCEKGQPV0EQAAAl8Ne2yofhW9KEGhhc2hlZHByb29mdG9rZW56ATG6AR5hbGxzaXRlcy5yZWFkIGFsbHByb2ZpbGVzLnJlYWTCAUk0YzM3YzRkYy0yMjI4LTQzN2YtYWY5Yy1iNWRjMGUzZjJiYmFAY2ZjMWFmOTAtNjRjYi00NzQ1LTk0OGYtMzJiYmU1MTI0NGRkyAEB.njI7gU03YEyGr6hnf-iVSQUjPVS2dY0cybs3PKozabc&ApiVersion=2.0', 'createdBy': {'user': {'email': 'admin@MngEnvMCAP551180.onmicrosoft.com', 'id': '210d07f6-9a65-465d-a4f6-26fd15a35adf', 'displayN

2025-04-01 14:02:20,532 - micro - MainProcess - INFO     Text extraction from PDF bytes was successful. (sharepoint_utils.py:extract_text_from_pdf_bytes:519)


Processing File {'@microsoft.graph.downloadUrl': 'https://mngenvmcap551180.sharepoint.com/sites/SPODemo/_layouts/15/download.aspx?UniqueId=021851bb-9026-464d-b85f-d4a77ab84868&Translate=false&tempauth=v1.eyJzaXRlaWQiOiJmMjFkODdhYy0yOGNkLTQwNWMtYTg2Ny02ZTQzMzQxNzdiOGQiLCJhcHBfZGlzcGxheW5hbWUiOiJzcC1TZWVyIiwiYXVkIjoiMDAwMDAwMDMtMDAwMC0wZmYxLWNlMDAtMDAwMDAwMDAwMDAwL21uZ2Vudm1jYXA1NTExODAuc2hhcmVwb2ludC5jb21AY2ZjMWFmOTAtNjRjYi00NzQ1LTk0OGYtMzJiYmU1MTI0NGRkIiwiZXhwIjoiMTc0MzUxOTczNyJ9.CgoKBHNuaWQSAjY0EgsI7rb0h_vL-D0QBRoOMjAuMTkwLjE1Mi4xNTMqLHNHbm5TeTc5TTNlY3JmNlh4TTNHK0Qyc0NTMnVCKzYxUXI2S1FLVW50MHM9MI0BOAFCEKGQPV0EQAAAl8Ne2yofhW9KEGhhc2hlZHByb29mdG9rZW56ATG6AR5hbGxzaXRlcy5yZWFkIGFsbHByb2ZpbGVzLnJlYWTCAUk0YzM3YzRkYy0yMjI4LTQzN2YtYWY5Yy1iNWRjMGUzZjJiYmFAY2ZjMWFmOTAtNjRjYi00NzQ1LTk0OGYtMzJiYmU1MTI0NGRkyAEB.WkGBUXJ7QLJ2GGUQtjICXr5W6HMqmZPQj5C_wbteT5w&ApiVersion=2.0', 'createdBy': {'user': {'email': 'admin@MngEnvMCAP551180.onmicrosoft.com', 'id': '210d07f6-9a65-465d-a4f6-26fd15a35adf', 'displayN

2025-04-01 14:02:21,243 - micro - MainProcess - INFO     Text extraction from PDF bytes was successful. (sharepoint_utils.py:extract_text_from_pdf_bytes:519)


Processing File {'@microsoft.graph.downloadUrl': 'https://mngenvmcap551180.sharepoint.com/sites/SPODemo/_layouts/15/download.aspx?UniqueId=a22ef6a7-e600-4493-9b15-fd1b0a9c87c5&Translate=false&tempauth=v1.eyJzaXRlaWQiOiJmMjFkODdhYy0yOGNkLTQwNWMtYTg2Ny02ZTQzMzQxNzdiOGQiLCJhcHBfZGlzcGxheW5hbWUiOiJzcC1TZWVyIiwiYXVkIjoiMDAwMDAwMDMtMDAwMC0wZmYxLWNlMDAtMDAwMDAwMDAwMDAwL21uZ2Vudm1jYXA1NTExODAuc2hhcmVwb2ludC5jb21AY2ZjMWFmOTAtNjRjYi00NzQ1LTk0OGYtMzJiYmU1MTI0NGRkIiwiZXhwIjoiMTc0MzUxOTczNyJ9.CgoKBHNuaWQSAjY0EgsI7rb0h_vL-D0QBRoOMjAuMTkwLjE1Mi4xNTMqLHIzRVUyTnc1cUhFZ0VsSmZoYUJmWFo5aEtFQ2dhTlNiTzNXVWZsbURWMVE9MI0BOAFCEKGQPV0EQAAAl8Ne2yofhW9KEGhhc2hlZHByb29mdG9rZW56ATG6AR5hbGxzaXRlcy5yZWFkIGFsbHByb2ZpbGVzLnJlYWTCAUk0YzM3YzRkYy0yMjI4LTQzN2YtYWY5Yy1iNWRjMGUzZjJiYmFAY2ZjMWFmOTAtNjRjYi00NzQ1LTk0OGYtMzJiYmU1MTI0NGRkyAEB.zg6cQHyiXbfUUUCotYDs_gHImZ2GrVmXTf-VcSSGaGE&ApiVersion=2.0', 'createdBy': {'user': {'email': 'admin@MngEnvMCAP551180.onmicrosoft.com', 'id': '210d07f6-9a65-465d-a4f6-26fd15a35adf', 'displayN

2025-04-01 14:02:22,571 - micro - MainProcess - INFO     Text extraction from PDF bytes was successful. (sharepoint_utils.py:extract_text_from_pdf_bytes:519)


Processing File {'@microsoft.graph.downloadUrl': 'https://mngenvmcap551180.sharepoint.com/sites/SPODemo/_layouts/15/download.aspx?UniqueId=72d1b0a6-2450-49d8-b727-158fe680ef67&Translate=false&tempauth=v1.eyJzaXRlaWQiOiJmMjFkODdhYy0yOGNkLTQwNWMtYTg2Ny02ZTQzMzQxNzdiOGQiLCJhcHBfZGlzcGxheW5hbWUiOiJzcC1TZWVyIiwiYXVkIjoiMDAwMDAwMDMtMDAwMC0wZmYxLWNlMDAtMDAwMDAwMDAwMDAwL21uZ2Vudm1jYXA1NTExODAuc2hhcmVwb2ludC5jb21AY2ZjMWFmOTAtNjRjYi00NzQ1LTk0OGYtMzJiYmU1MTI0NGRkIiwiZXhwIjoiMTc0MzUxOTczNyJ9.CgoKBHNuaWQSAjY0EgsI7rb0h_vL-D0QBRoOMjAuMTkwLjE1Mi4xNTMqLFdWM0NTSHE3UDJKa2l3eUlPTUU5bVMwVDY5aEZNS3o1SEF0MkxOZ0RraE09MI0BOAFCEKGQPV0EQAAAl8Ne2yofhW9KEGhhc2hlZHByb29mdG9rZW56ATG6AR5hbGxzaXRlcy5yZWFkIGFsbHByb2ZpbGVzLnJlYWTCAUk0YzM3YzRkYy0yMjI4LTQzN2YtYWY5Yy1iNWRjMGUzZjJiYmFAY2ZjMWFmOTAtNjRjYi00NzQ1LTk0OGYtMzJiYmU1MTI0NGRkyAEB.YhSCAZGPraeRJbj9PthhJV7KduS7mDeI2xVNhf3M1VE&ApiVersion=2.0', 'createdBy': {'user': {'email': 'admin@MngEnvMCAP551180.onmicrosoft.com', 'id': '210d07f6-9a65-465d-a4f6-26fd15a35adf', 'displayN

2025-04-01 14:02:23,707 - micro - MainProcess - INFO     Text extraction from PDF bytes was successful. (sharepoint_utils.py:extract_text_from_pdf_bytes:519)


Processing File {'@microsoft.graph.downloadUrl': 'https://mngenvmcap551180.sharepoint.com/sites/SPODemo/_layouts/15/download.aspx?UniqueId=e274312f-200f-49a0-b83f-b36c80ae48f8&Translate=false&tempauth=v1.eyJzaXRlaWQiOiJmMjFkODdhYy0yOGNkLTQwNWMtYTg2Ny02ZTQzMzQxNzdiOGQiLCJhcHBfZGlzcGxheW5hbWUiOiJzcC1TZWVyIiwiYXVkIjoiMDAwMDAwMDMtMDAwMC0wZmYxLWNlMDAtMDAwMDAwMDAwMDAwL21uZ2Vudm1jYXA1NTExODAuc2hhcmVwb2ludC5jb21AY2ZjMWFmOTAtNjRjYi00NzQ1LTk0OGYtMzJiYmU1MTI0NGRkIiwiZXhwIjoiMTc0MzUxOTczNyJ9.CgoKBHNuaWQSAjY0EgsInsCHiPvL-D0QBRoOMjAuMTkwLjE1Mi4xNTMqLERCRGdmdk15NTQ5MDhnaDhIY09EbktpaGx3UkJYK1NXMEFER3lMbTZRVjQ9MI0BOAFCEKGQPV0EQAAAl8Ne2yofhW9KEGhhc2hlZHByb29mdG9rZW56ATG6AR5hbGxzaXRlcy5yZWFkIGFsbHByb2ZpbGVzLnJlYWTCAUk0YzM3YzRkYy0yMjI4LTQzN2YtYWY5Yy1iNWRjMGUzZjJiYmFAY2ZjMWFmOTAtNjRjYi00NzQ1LTk0OGYtMzJiYmU1MTI0NGRkyAEB.6w2hAJbVcs0dZMCvYyQbtxOer2GEKcdTeNQfy-t4aec&ApiVersion=2.0', 'createdBy': {'user': {'email': 'admin@MngEnvMCAP551180.onmicrosoft.com', 'id': '210d07f6-9a65-465d-a4f6-26fd15a35adf', 'displayN

2025-04-01 14:02:24,771 - micro - MainProcess - INFO     Text extraction from PDF bytes was successful. (sharepoint_utils.py:extract_text_from_pdf_bytes:519)


Processing File {'@microsoft.graph.downloadUrl': 'https://mngenvmcap551180.sharepoint.com/sites/SPODemo/_layouts/15/download.aspx?UniqueId=28cedac8-161a-4b1a-86ac-bcc4a59c0bf6&Translate=false&tempauth=v1.eyJzaXRlaWQiOiJmMjFkODdhYy0yOGNkLTQwNWMtYTg2Ny02ZTQzMzQxNzdiOGQiLCJhcHBfZGlzcGxheW5hbWUiOiJzcC1TZWVyIiwiYXVkIjoiMDAwMDAwMDMtMDAwMC0wZmYxLWNlMDAtMDAwMDAwMDAwMDAwL21uZ2Vudm1jYXA1NTExODAuc2hhcmVwb2ludC5jb21AY2ZjMWFmOTAtNjRjYi00NzQ1LTk0OGYtMzJiYmU1MTI0NGRkIiwiZXhwIjoiMTc0MzUxOTczNyJ9.CgoKBHNuaWQSAjY0EgsInsCHiPvL-D0QBRoOMjAuMTkwLjE1Mi4xNTMqLFNvMDNlMGlKQnlCMHRPOHdIZkl1bjBZUHlKelAvanNqMnRqbDEvbnVGdVk9MI0BOAFCEKGQPV0EQAAAl8Ne2yofhW9KEGhhc2hlZHByb29mdG9rZW56ATG6AR5hbGxzaXRlcy5yZWFkIGFsbHByb2ZpbGVzLnJlYWTCAUk0YzM3YzRkYy0yMjI4LTQzN2YtYWY5Yy1iNWRjMGUzZjJiYmFAY2ZjMWFmOTAtNjRjYi00NzQ1LTk0OGYtMzJiYmU1MTI0NGRkyAEB.0EIV_PDc9EosvA5Say32K8pKRBl_7v5e5gFjjDf62BY&ApiVersion=2.0', 'createdBy': {'user': {'email': 'admin@MngEnvMCAP551180.onmicrosoft.com', 'id': '210d07f6-9a65-465d-a4f6-26fd15a35adf', 'displayN

2025-04-01 14:02:26,010 - micro - MainProcess - INFO     Text extraction from PDF bytes was successful. (sharepoint_utils.py:extract_text_from_pdf_bytes:519)


Processing File {'@microsoft.graph.downloadUrl': 'https://mngenvmcap551180.sharepoint.com/sites/SPODemo/_layouts/15/download.aspx?UniqueId=aa3dccae-c468-437c-9d21-affd7f41fee8&Translate=false&tempauth=v1.eyJzaXRlaWQiOiJmMjFkODdhYy0yOGNkLTQwNWMtYTg2Ny02ZTQzMzQxNzdiOGQiLCJhcHBfZGlzcGxheW5hbWUiOiJzcC1TZWVyIiwiYXVkIjoiMDAwMDAwMDMtMDAwMC0wZmYxLWNlMDAtMDAwMDAwMDAwMDAwL21uZ2Vudm1jYXA1NTExODAuc2hhcmVwb2ludC5jb21AY2ZjMWFmOTAtNjRjYi00NzQ1LTk0OGYtMzJiYmU1MTI0NGRkIiwiZXhwIjoiMTc0MzUxOTczNyJ9.CgoKBHNuaWQSAjY0EgsInsCHiPvL-D0QBRoOMjAuMTkwLjE1Mi4xNTMqLHZzb0ErYzh0b2gwQkZpaC9CYWxWWFhwSDRKNzhGTDgvSEtGSXdsYWtVWUk9MI0BOAFCEKGQPV0EQAAAl8Ne2yofhW9KEGhhc2hlZHByb29mdG9rZW56ATG6AR5hbGxzaXRlcy5yZWFkIGFsbHByb2ZpbGVzLnJlYWTCAUk0YzM3YzRkYy0yMjI4LTQzN2YtYWY5Yy1iNWRjMGUzZjJiYmFAY2ZjMWFmOTAtNjRjYi00NzQ1LTk0OGYtMzJiYmU1MTI0NGRkyAEB.o9GlupCGP3dzfQaMxefP9pVpBbHiEWaNibVJk4bqF9I&ApiVersion=2.0', 'createdBy': {'user': {'email': 'admin@MngEnvMCAP551180.onmicrosoft.com', 'id': '210d07f6-9a65-465d-a4f6-26fd15a35adf', 'displayN

2025-04-01 14:02:27,032 - micro - MainProcess - INFO     Text extraction from PDF bytes was successful. (sharepoint_utils.py:extract_text_from_pdf_bytes:519)


Processing File {'@microsoft.graph.downloadUrl': 'https://mngenvmcap551180.sharepoint.com/sites/SPODemo/_layouts/15/download.aspx?UniqueId=cef72a70-2516-4899-97ac-e67c5955dda1&Translate=false&tempauth=v1.eyJzaXRlaWQiOiJmMjFkODdhYy0yOGNkLTQwNWMtYTg2Ny02ZTQzMzQxNzdiOGQiLCJhcHBfZGlzcGxheW5hbWUiOiJzcC1TZWVyIiwiYXVkIjoiMDAwMDAwMDMtMDAwMC0wZmYxLWNlMDAtMDAwMDAwMDAwMDAwL21uZ2Vudm1jYXA1NTExODAuc2hhcmVwb2ludC5jb21AY2ZjMWFmOTAtNjRjYi00NzQ1LTk0OGYtMzJiYmU1MTI0NGRkIiwiZXhwIjoiMTc0MzUxOTczNyJ9.CgoKBHNuaWQSAjY0EgsInsCHiPvL-D0QBRoOMjAuMTkwLjE1Mi4xNTMqLGUxTHhsaDVVU0tLOWIvZzVDdi9UeVRzR2FKQURmeUp0S1E5Zk1wd1JoNVk9MI0BOAFCEKGQPV0EQAAAl8Ne2yofhW9KEGhhc2hlZHByb29mdG9rZW56ATG6AR5hbGxzaXRlcy5yZWFkIGFsbHByb2ZpbGVzLnJlYWTCAUk0YzM3YzRkYy0yMjI4LTQzN2YtYWY5Yy1iNWRjMGUzZjJiYmFAY2ZjMWFmOTAtNjRjYi00NzQ1LTk0OGYtMzJiYmU1MTI0NGRkyAEB.9jKvB1qdL7Uoq4PPEl69msYLZCqBUdOyXze_e9rmAkE&ApiVersion=2.0', 'createdBy': {'user': {'email': 'admin@MngEnvMCAP551180.onmicrosoft.com', 'id': '210d07f6-9a65-465d-a4f6-26fd15a35adf', 'displayN

2025-04-01 14:02:28,202 - micro - MainProcess - INFO     Text extraction from PDF bytes was successful. (sharepoint_utils.py:extract_text_from_pdf_bytes:519)


Processing File {'@microsoft.graph.downloadUrl': 'https://mngenvmcap551180.sharepoint.com/sites/SPODemo/_layouts/15/download.aspx?UniqueId=b19fbbad-1f49-40f3-bb86-651bd7cd442c&Translate=false&tempauth=v1.eyJzaXRlaWQiOiJmMjFkODdhYy0yOGNkLTQwNWMtYTg2Ny02ZTQzMzQxNzdiOGQiLCJhcHBfZGlzcGxheW5hbWUiOiJzcC1TZWVyIiwiYXVkIjoiMDAwMDAwMDMtMDAwMC0wZmYxLWNlMDAtMDAwMDAwMDAwMDAwL21uZ2Vudm1jYXA1NTExODAuc2hhcmVwb2ludC5jb21AY2ZjMWFmOTAtNjRjYi00NzQ1LTk0OGYtMzJiYmU1MTI0NGRkIiwiZXhwIjoiMTc0MzUxOTczNyJ9.CgoKBHNuaWQSAjY0EgsInsCHiPvL-D0QBRoOMjAuMTkwLjE1Mi4xNTMqLDZKOExoTHRTWWo2d3BoVlJaY3l2M1dFcEZ0UHpNM0kzN3hLRGRORTlEaTA9MI0BOAFCEKGQPV0EQAAAl8Ne2yofhW9KEGhhc2hlZHByb29mdG9rZW56ATG6AR5hbGxzaXRlcy5yZWFkIGFsbHByb2ZpbGVzLnJlYWTCAUk0YzM3YzRkYy0yMjI4LTQzN2YtYWY5Yy1iNWRjMGUzZjJiYmFAY2ZjMWFmOTAtNjRjYi00NzQ1LTk0OGYtMzJiYmU1MTI0NGRkyAEB.vPjIxV1zB7yw0KVMAxnI-KzpjAeSYWsyz4Bf7y1JAK8&ApiVersion=2.0', 'createdBy': {'user': {'email': 'admin@MngEnvMCAP551180.onmicrosoft.com', 'id': '210d07f6-9a65-465d-a4f6-26fd15a35adf', 'displayN

2025-04-01 14:02:29,015 - micro - MainProcess - INFO     Text extraction from PDF bytes was successful. (sharepoint_utils.py:extract_text_from_pdf_bytes:519)


In [15]:
if displayVariables:
    print(json.dumps(spFileContent, indent=2))

[
  {
    "content": " \n \n \n \nJune 10, 2019  \n \n  \n \n   \n \n \nProject Name :  \nProject Address:   \nProject S\ncope:  Asbestos Inspection  \n Dear , \n \nGuzi-West certified asbestos pe rsonn el collected suspect a sbestos-containing ma terial sa mples \nat the structure referenced above on June 6, 2 019.  Foll owing s ample coll ection, the samples \nwere submit ted under chain-of-c ustody doc umentation to a Cali fornia-ce rtified laboratory for \nanaly sis by pola rized light microscopy. As evidenced by the attached laboratory report, asbestos \nwas identified in the foll owing materials and locations:  \n \no Acoustic Ceiling Texture  (10% Chrysotile Asbestos)  \n We will follow up with a project invoice within 7 business days. Thank you for the opportunity \nto work on the proj ect, and feel free to call/e- mail if you have any que stions.  \n \n \nSincerely, \n \nGuzi - \nWest Inspection and Consulting, LLC \nCertified Asbestos Consultant  \n \n\n \n  ATTACHMENT  A \nC

### 2.4 Analyze File Contents with LLM

Before processing the documents, estimate the number of tokens the LLM will consume.

In [16]:
tokens = estimateTokens(spFileContent,textLength,classificationsStr,azureOpenAILLMModel)
print(f"Estimated Number of Tokens: {tokens}")

Estimated Number of Tokens: 2183
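
`estimateTokens` takes the model name as an argument, so it likely relies on a model-specific tokenizer. When no tokenizer is available, a coarse rule of thumb of about four characters per token for English text gives a ballpark figure. The sketch below uses that heuristic; `rough_token_estimate` is a hypothetical helper and its result will not match the notebook's estimate exactly.

```python
import math

# Rough heuristic only: about 4 characters per token for English text.
def rough_token_estimate(texts, text_length, prompt_overhead_chars=0, chars_per_token=4):
    # Each document contributes at most text_length characters plus any fixed prompt overhead.
    total_chars = sum(min(len(t), text_length) + prompt_overhead_chars for t in texts)
    return math.ceil(total_chars / chars_per_token)

print(rough_token_estimate(["a" * 2000, "b" * 100], text_length=800))
```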


### 2.5 Classify document contents using LLM

In [17]:
"""
Analyze SharePoint folder contents using Large Language Model to determine applicable
classifications. 
"""
spFileContent = unstructuredDataClassification(spFileContent,textLength,llmClient,azureOpenAIDeploymentName,classificationsStr)

Processing asbestos_simple_sample_redacted - Copy.pdf -> Report
{'completion_tokens': 2, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens': 265, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}, 'total_tokens': 267}
Processing asbestos_simple_sample_redacted.pdf -> Report
{'completion_tokens': 2, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens': 265, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}, 'total_tokens': 267}
Processing Kerley-Air-Quality.pdf -> Empty Content
Processing rra_mold_report_sample.pdf -> Report
{'completion_tokens': 2, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens': 366, 'prompt_tokens_details': {'audio_tokens': 0, 'ca
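
The per-document usage dictionaries printed above can be totaled to compare actual consumption with the earlier estimate. A minimal sketch, assuming the usage records are collected into a list (the notebook only prints them); `total_usage` is a hypothetical helper.

```python
# Sum token usage across per-document usage dictionaries.
def total_usage(usages):
    totals = {"prompt_tokens": 0, "completion_tokens": 0, "total_tokens": 0}
    for usage in usages:
        for key in totals:
            totals[key] += usage.get(key, 0)
    return totals

# Values copied from the first two usage records in the output above.
usages = [
    {"completion_tokens": 2, "prompt_tokens": 265, "total_tokens": 267},
    {"completion_tokens": 2, "prompt_tokens": 265, "total_tokens": 267},
]
print(total_usage(usages))
```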

### 2.6 Organize and Rollup Classifications

In [18]:
"""
Collect document classifications identified for SharePoint folder
"""
spClassifications = rollupClassifications(spFileContent)


In [19]:
if displayVariables:
    print(f"\nClassifications for SharePoint files: {spClassifications}")


Classifications for SharePoint files: ['Empty Content', 'Report', 'Insurance Claim']
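
Conceptually, the rollup reduces the per-file labels to a distinct list for the folder. The sketch below shows one way to do that; the `classification` key and the `rollup` helper are assumptions for illustration, and the ordering may differ from what `rollupClassifications` returns.

```python
# Hypothetical sketch: collect distinct labels in first-seen order.
# Assumes each item carries its label under a "classification" key.
def rollup(items, key="classification"):
    seen = []
    for item in items:
        label = item.get(key)
        if label and label not in seen:
            seen.append(label)
    return seen

docs = [
    {"name": "a.pdf", "classification": "Report"},
    {"name": "b.pdf", "classification": "Empty Content"},
    {"name": "c.pdf", "classification": "Report"},
]
print(rollup(docs))
```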


### 2.7 Ingest assets into Purview via Atlas API

In [20]:
"""
Load SharePoint Assets in Purview.
"""
spGuids = loadPurviewAssets(purviewClient,spFileContent)

Processing mngenvmcap551180.sharepoint.com/SPODemo/Insurance/Claims - Report


In [21]:
spGuids[0]

'42ba5837-27a3-4a0c-aa36-9cc2db95281e'

### 2.8 Apply classifications to assets

In [22]:
"""
Apply classification to SharePoint assets
"""
result = applyPurviewClassifications(purviewClient,spGuids,spClassifications)



### 2.9 Move assets to their final collection

In [23]:
"""
Move assets from default (root) collection to collectionName
"""
collectionName = 'SharePoint'
output = moveCollection(collectionName,purviewEndpointUrl,token,spGuids)

## 3. File System Demo

### 3.1 Scan Filesystem

In [24]:
"""
List all files under the filesystem path that match the defined file extensions.
"""
fsFileList = listFilesystemFiles(filesystemPath, fileExtensions)

In [25]:
if displayVariables:
    for file in fsFileList:
        print(f"{file}")

SampleFiles\2021-Marshall-Fire-CO-Post-Wildfire-Toxicology-Report-sample-2_Redacted - Copy.pdf
SampleFiles\8 business-insurance-policy.pdf
SampleFiles\9 business-insurance-policy-wording.pdf
SampleFiles\asbestos_simple_sample_redacted - Copy.pdf
SampleFiles\Claim.pdf
SampleFiles\Dwelling_Scope-2.pdf
SampleFiles\GeneralEarthquakeDamageInspectionChecklist.pdf
SampleFiles\Marshall-Fire-C0-2021-Post-Wildfire-Toxicology-Report_Redacted.pdf
SampleFiles\Sample-Mold-Test-Report.pdf
SampleFiles\sample_asbestos_test_report_redacted.pdf
SampleFiles\SanBrunoCoordinatedDebrisRemovalRightofEntryForm.pdf
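
A listing like the one above can be produced with the standard library alone. The sketch below is one possible approach, not the implementation of `listFilesystemFiles`; whether that function recurses into subfolders is not shown, so the recursion here is an assumption, and `list_files` is a hypothetical name.

```python
from pathlib import Path

# Hypothetical sketch: recursively list files under root with the given extensions.
def list_files(root, extensions):
    wanted = {"." + ext.lower().lstrip(".") for ext in extensions}
    return sorted(
        str(p) for p in Path(root).rglob("*")
        if p.is_file() and p.suffix.lower() in wanted
    )

# Only runs the listing when the sample folder exists in the working directory.
if Path("SampleFiles").is_dir():
    for path in list_files("SampleFiles", ["docx", "pdf", "pptx"]):
        print(path)
```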


### 3.2 Generate file subset and extract contents

In [26]:
"""
Create a subset of the fsFileList based on the number specified by sampleSize, extract file 
contents, and metadata.
"""
# Bound the sample size by the filesystem list (fsFileList), not the SharePoint list
if sampleSize == 0 or sampleSize > len(fsFileList):
    sampleSize = len(fsFileList)

fsFileContent = filesystemFileSampleList(fsFileList,sampleSize,filesystemPath)

Processing 9 business-insurance-policy-wording.pdf
Processing sample_asbestos_test_report_redacted.pdf
Processing Dwelling_Scope-2.pdf
Processing SanBrunoCoordinatedDebrisRemovalRightofEntryForm.pdf
Processing 2021-Marshall-Fire-CO-Post-Wildfire-Toxicology-Report-sample-2_Redacted - Copy.pdf
Processing Marshall-Fire-C0-2021-Post-Wildfire-Toxicology-Report_Redacted.pdf
Processing GeneralEarthquakeDamageInspectionChecklist.pdf
Processing Sample-Mold-Test-Report.pdf
Processing 8 business-insurance-policy.pdf
Processing asbestos_simple_sample_redacted - Copy.pdf


In [27]:
fsFileContent

[{'id': 'f43a713e-a1be-45ed-87f5-715e71a05322',
  'name': '9 business-insurance-policy-wording.pdf',
  'created_datetime': datetime.datetime(2025, 4, 1, 13, 29, 51, 160942),
  'created_by': '0',
  'size': 641042,
  'last_modified_datetime': datetime.datetime(2025, 4, 1, 13, 29, 51, 161939),
  'last_modified_by': '0',
  'source': 'SampleFiles\\9 business-insurance-policy-wording.pdf',
  'parentObject': 'SampleFiles',
  'typedef': 'FileSystem',
  'content': "Small  \nBusiness  \nInsuranceThis document contains the details of the AA Business Insurance Retail Policy. You should read it together with your Policy Schedule and Policy Summary, which contain information about the policy as it applies to you and your business.     Please read all this information carefully to make sure that the cover meets your needs.  Keep this information in a safe place - it contains important information about your policy should you want to make a claim or make changes to your insurance cover.  Useful contac
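
The metadata fields in the record above can be gathered from the filesystem with standard library calls. The following sketch builds a similar record; it is not the code behind `filesystemFileSampleList`, `file_metadata` is a hypothetical helper, and the creation time is omitted because `st_ctime` has different meanings across platforms.

```python
import os
import tempfile
import uuid
from datetime import datetime

# Hypothetical sketch: build a metadata record shaped like the output above.
def file_metadata(path, typedef="FileSystem"):
    info = os.stat(path)
    return {
        "id": str(uuid.uuid4()),
        "name": os.path.basename(path),
        "size": info.st_size,
        "last_modified_datetime": datetime.fromtimestamp(info.st_mtime),
        "source": path,
        "parentObject": os.path.dirname(path),
        "typedef": typedef,
    }

# Demonstrate on a throwaway file so the example does not depend on local data.
with tempfile.NamedTemporaryFile(suffix=".pdf") as tmp:
    print(file_metadata(tmp.name)["size"])
```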

### 3.3 Estimate number of tokens to be used by LLM

In [28]:
tokens = estimateTokens(fsFileContent,textLength,classificationsStr,azureOpenAILLMModel)
print(f"Estimated Number of Tokens: {tokens}")

Estimated Number of Tokens: 893


### 3.4 Classify document contents using LLM

In [29]:
"""
Analyze Filesystem folder contents using Large Language Model to determine applicable
classifications. 
"""
fsFileContent = unstructuredDataClassification(fsFileContent,textLength,llmClient,azureOpenAIDeploymentName,classificationsStr)

Processing 9 business-insurance-policy-wording.pdf -> Insurance Policy
{'completion_tokens': 3, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens': 220, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}, 'total_tokens': 223}
Processing sample_asbestos_test_report_redacted.pdf -> Report
{'completion_tokens': 2, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens': 259, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}, 'total_tokens': 261}
Processing Dwelling_Scope-2.pdf -> Empty Content
Processing SanBrunoCoordinatedDebrisRemovalRightofEntryForm.pdf -> Empty Content
Processing 2021-Marshall-Fire-CO-Post-Wildfire-Toxicology-Report-sample-2_Redacted - Copy.pdf -> Empty Content
Processing Marshall-Fire-C0-2021-Post-Wildfire-Toxicology-Report_Redacted.pdf ->
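Each classified document prints a usage dictionary, which can be compared against the estimate from section 3.3. The sketch below totals the token counts from the two usage records visible in the output above. It assumes the records have been collected into a list by hand; `summarize_usage` is a hypothetical helper, and whether `unstructuredDataClassification` returns these records is not shown here.

```python
def summarize_usage(usage_records):
    """Sum prompt, completion, and total token counts across usage records."""
    totals = {"prompt_tokens": 0, "completion_tokens": 0, "total_tokens": 0}
    for usage in usage_records:
        for key in totals:
            # Missing keys are treated as zero.
            totals[key] += usage.get(key, 0)
    return totals


# Values copied from the first two usage records printed above.
usage_records = [
    {"completion_tokens": 3, "prompt_tokens": 220, "total_tokens": 223},
    {"completion_tokens": 2, "prompt_tokens": 259, "total_tokens": 261},
]
print(summarize_usage(usage_records))
# {'prompt_tokens': 479, 'completion_tokens': 5, 'total_tokens': 484}
```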

### 3.5 Organize and Rollup Classifications

In [30]:
"""
Collect the document classifications identified for the FileSystem folder
"""
fsClassifications = rollupClassifications(fsFileContent)


In [31]:
if displayVariables:
    print(f"\nClassifications for FileSystem files: {fsClassifications}")


Classifications for FileSystem files: ['Empty Content', 'Insurance Policy', 'Report']
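The rollup step reduces per-document labels to a unique list for the folder. The sketch below shows one way this could work, assuming each document dictionary stores its label under a `classification` key; that key name and the `rollup` function are hypothetical, and the real `rollupClassifications` may differ.

```python
def rollup(documents, key="classification"):
    """Return the sorted, de-duplicated labels found across all documents."""
    # A set removes duplicates; sorting gives a stable, readable order.
    return sorted({doc[key] for doc in documents if doc.get(key)})


docs = [
    {"name": "a.pdf", "classification": "Insurance Policy"},
    {"name": "b.pdf", "classification": "Report"},
    {"name": "c.pdf", "classification": "Empty Content"},
    {"name": "d.pdf", "classification": "Report"},
]
print(rollup(docs))
# ['Empty Content', 'Insurance Policy', 'Report']
```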


### 3.6 Ingest assets into Purview via Atlas API

In [32]:
"""
Load FileSystem assets into Purview.
"""
fsGuids = loadPurviewAssets(purviewClient,fsFileContent)

Processing SampleFiles - Insurance Policy


In [33]:
if displayVariables:
    print(f"\nFileSystem GUIDs: {fsGuids}")


FileSystem GUIDs: ['ecb5b454-fa54-445f-acd3-0f41e354a001']


### 3.7 Apply classifications to assets

In [34]:
"""
Apply classifications to FileSystem assets
"""
result = applyPurviewClassifications(purviewClient,fsGuids,fsClassifications)

### 3.8 Move assets to their final collection

In [35]:
"""
Move assets from the default (root) collection to collectionName
"""
collectionName = 'FileSystem'
output = moveCollection(collectionName,purviewEndpointUrl,token,fsGuids)

## 4. Cleanup section


### 4.1 Delete assets and collections

You can delete individual assets by GUID, or use `collectionClient` to delete collections recursively, together with the assets they contain.

In [36]:
# Delete Entities
for guid in [*fsGuids, *spGuids]:
    response = purviewClient.delete_entity(guid=guid)
    print(json.dumps(response, indent=2))

{
  "mutatedEntities": {
    "DELETE": [
      {
        "typeName": "FileSystem",
        "attributes": {
          "qualifiedName": "customScanner://SampleFiles",
          "name": "SampleFiles"
        },
        "lastModifiedTS": "4",
        "guid": "ecb5b454-fa54-445f-acd3-0f41e354a001",
        "status": "ACTIVE",
        "displayText": "SampleFiles",
        "classificationNames": [
          "Insurance Policy",
          "Empty Content",
          "Report"
        ],
        "meaningNames": [],
        "meanings": [],
        "isIncomplete": false,
        "labels": [],
        "isIndexed": true,
        "collectionId": "FileSystem",
        "domainId": "seer"
      }
    ]
  }
}
{
  "mutatedEntities": {
    "DELETE": [
      {
        "typeName": "SharePoint",
        "attributes": {
          "qualifiedName": "customScanner://mngenvmcap551180.sharepoint.com/SPODemo/Insurance/Claims",
          "name": "mngenvmcap551180.sharepoint.com/SPODemo/Insurance/Claims"
        },
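The delete responses are verbose, so it can help to reduce each one to the type and GUID of what was removed. The sketch below assumes the response shape shown in the output above (`mutatedEntities` containing a `DELETE` list); `deleted_entities` is a hypothetical helper, not part of pyapacheatlas.

```python
def deleted_entities(response):
    """Extract (typeName, guid) pairs from a delete response dictionary."""
    deleted = response.get("mutatedEntities", {}).get("DELETE", [])
    return [(entity["typeName"], entity["guid"]) for entity in deleted]


# Trimmed copy of the first response printed above.
sample_response = {
    "mutatedEntities": {
        "DELETE": [
            {
                "typeName": "FileSystem",
                "guid": "ecb5b454-fa54-445f-acd3-0f41e354a001",
            }
        ]
    }
}
print(deleted_entities(sample_response))
```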
    

In [37]:
# Delete sub-collection contents and sub-collections
collectionClient.delete_collections_recursively("Unstructured",delete_assets=True)
# Delete parent collection
collectionClient.delete_collections("Unstructured")

Attempting to delete assets in collection: 'FileSystem'
Note: This could take time if there's a large number of assets in the collection
All assets have been successfully deleted from collection: 'FileSystem'


The collection 'FileSystem' was successfully deleted


Attempting to delete assets in collection: 'SharePoint'
Note: This could take time if there's a large number of assets in the collection
All assets have been successfully deleted from collection: 'SharePoint'


The collection 'SharePoint' was successfully deleted


The collection 'Unstructured' was successfully deleted




### 4.2 Delete custom classifications and entity types

In [38]:
# Delete custom classifications
for classification in classifications:
    purviewClient.delete_type(classification)

# Delete custom entity types
for entityName in entityTypes:
    # Rebuild a minimal type definition so the type can be deleted by name
    edef = EntityTypeDef(
        name=entityName,
        superTypes=['DataSet']
    )
    results = purviewClient.delete_typedefs(
        entityDefs=[edef],
        force_update=True
    )

In [39]:
# Delete all Jupyter notebook variables
%reset -f