# Glean API Intro - Search and Uploading
This notebook shows how to use the Glean search and indexing APIs.

**Note:** You will need a client API token and an indexing API token. You can create both in the API Tokens section of the Glean Admin console, or they will be provided for you.

# 🔐 API Key Setup Instructions

This lab requires **two API keys** from Glean:
1. **GLEAN-CLIENT-API** - for search operations
2. **GLEAN-INDEX-API** - for document indexing operations

## 🌐 Google Colab Setup

1. Click the **🔑 key icon** in the left sidebar (Secrets)
2. Click **"+ Add new secret"**
3. Add both secrets:
   - **Name:** `GLEAN-CLIENT-API` → **Value:** [paste your client API token]
   - **Name:** `GLEAN-INDEX-API` → **Value:** [paste your indexing API token]
4. Make sure **"Notebook access"** is enabled for both secrets
5. Run the notebook cells in order

## 💻 VSCode / Cursor / Local Setup

1. Create a file named **`.env`** in the project root directory
2. Add both keys to the file:
   ```
   GLEAN_CLIENT_API=your_client_token_here
   GLEAN_INDEX_API=your_indexing_token_here
   ```
3. Save the file (don't worry, it's automatically ignored by git)
4. Run the notebook cells in order
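
For readers curious how a `.env` file like the one above could be read, here is a minimal sketch that uses only the standard library. The function name `read_env_file` is hypothetical, and this is not necessarily what `glean_api_helpers.py` does.

```python
from pathlib import Path


def read_env_file(path=".env"):
    """Parse simple KEY=value lines into a dict (hypothetical helper)."""
    values = {}
    env_path = Path(path)
    if not env_path.exists():
        return values
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and lines without an equals sign
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value
        values[key.strip()] = value.strip().strip("'\"")
    return values
```

Calling `read_env_file()` from the project root would return a dict such as `{"GLEAN_CLIENT_API": "...", "GLEAN_INDEX_API": "..."}`, or an empty dict if the file is missing.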

## 🎯 Getting API Tokens

- Log in to your Glean instance
- Go to **Admin Console** → **API Tokens**
- Create tokens with appropriate scopes for your custom datasource

---

**Note:** The notebook will automatically detect which environment you're in and load keys accordingly!
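
As an illustration of what environment detection can look like, the sketch below checks whether the `google.colab` module is already loaded, a common heuristic that assumes Colab preloads it, and otherwise falls back to environment variables. The names `running_in_colab` and `get_secret` are hypothetical; the notebook's real logic lives in `glean_api_helpers.py` and may differ.

```python
import os
import sys


def running_in_colab():
    """Return True when the google.colab module is already loaded."""
    return "google.colab" in sys.modules


def get_secret(name):
    """Fetch a secret from Colab Secrets or, locally, from the environment."""
    if running_in_colab():
        # Imported lazily because this module only exists inside Colab
        from google.colab import userdata
        return userdata.get(name)
    # Assumption: locally, the .env values were already exported to the process
    return os.environ.get(name)
```

Outside Colab, `get_secret("GLEAN_CLIENT_API")` returns `None` when the variable is not set, so the caller can decide how to report a missing key.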


# 🚀 Quick Start
This notebook will automatically set up all dependencies when you run it. Just follow these steps:

1. **Run the next cell** to automatically install all required packages
2. **Continue running cells in order** - all imports will be handled automatically

Works in:
- ✅ Google Colab
- ✅ VSCode with Jupyter extension
- ✅ Cursor with Jupyter extension
- ✅ JupyterLab / Jupyter Notebook


In [None]:
# 📦 Automatic Setup & Dependency Installation
# This helper automatically installs required packages for Colab and local environments
# For details on the installation logic, see: setup_helper.py

from setup_helper import install_requirements

# Install all required dependencies
install_requirements()

# Import Required Packages
The cell below will automatically import all required packages after installation.

In [None]:
try:
    import requests
    import json
    import os
    print("✅ All packages imported successfully!")
except ImportError as e:
    print(f"❌ Error importing packages: {e}")
    print("Please run the setup cell above to install dependencies.")

In [None]:
# 🔐 Load Glean API Keys
# This helper automatically loads your API keys from Google Colab secrets or .env file
# For details on the key loading logic, see: glean_api_helpers.py

from glean_api_helpers import load_all_api_keys

# Load both GLEAN_CLIENT_API and GLEAN_INDEX_API
api_keys = load_all_api_keys()

# Define your parameters
* **group_num:** your assigned group number
* **search:** the text of your search query
* **GLEAN_INSTANCE:** the Glean instance name; change it only if you are not using the default `support-lab`

In [None]:
# Configuration
group_num = '2026'  # Your assigned group number

# Glean instance name - Change 'support-lab' if using a different Glean instance
GLEAN_INSTANCE = "support-lab"
GLEAN_BASE_URL = f"https://{GLEAN_INSTANCE}-be.glean.com"

# Use pre-loaded API keys
client_api_token = api_keys['client']

# Validate we have the client token before proceeding
if not client_api_token:
    raise ValueError("❌ GLEAN-CLIENT-API is required for search operations. Please set it up and rerun.")

# Search parameters
search = 'Glean PTO policy'  # You can change this to search for specific terms
api_url_search = f"{GLEAN_BASE_URL}/rest/api/v1/search"

print(f"✅ Client API configured for search on {GLEAN_BASE_URL}")


## Authorization and Prepare to Search ##
* **headers:** required; carries the content type and the `Authorization` header with your token in `Bearer <token>` format
* **data:** required; the search payload. Adjusting `pageSize` returns more or fewer results
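
The two pieces described above can be assembled by a small function, which makes it easy to vary `pageSize` between runs. This is only a sketch: `build_search_request` is a hypothetical helper, the token is a placeholder, and it carries over just `query` and `pageSize` from the notebook's payload.

```python
def build_search_request(token, query, page_size=10):
    """Return (headers, payload) for a search call (hypothetical helper)."""
    if page_size < 1:
        raise ValueError("page_size must be at least 1")
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {token}",
    }
    payload = {"query": query, "pageSize": page_size}
    return headers, payload


# Placeholder token and query, used only to show the resulting structure
example_headers, example_payload = build_search_request(
    "example-token", "Glean PTO policy", page_size=5
)
print(example_payload)
```

Both return values can be passed to `requests.post` in the same way as the `headers` and `data` variables in the next cell.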

In [None]:
headers = {
     'Content-Type': 'application/json',
     "Authorization": f"Bearer {client_api_token}"
}

data = {
    "trackingToken": "trackingToken",
    "query": search,
    "pageSize": 1
}

## Make the POST request and submit the search ##

In [None]:
response = requests.post(api_url_search, headers=headers, data=json.dumps(data))

## Print the response ##

In [None]:
print(response.status_code)
print(json.dumps(response.json(), indent=4))

# print(response.content) only needed if you want to see the raw response content
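
The full JSON dump can be long, so a compact summary is often easier to scan. The sketch below assumes the response contains a `results` list whose items carry `title` and `url` keys; check that against the JSON printed above before relying on it. `summarize_results` is a hypothetical helper and the sample data is made up.

```python
def summarize_results(response_json):
    """Return (title, url) pairs from a search response dict."""
    summary = []
    # .get() keeps the helper tolerant of missing or null keys
    for result in response_json.get("results") or []:
        summary.append((result.get("title", "(no title)"), result.get("url", "")))
    return summary


# Stand-in for response.json(); the values are made up for illustration
sample = {"results": [{"title": "PTO Policy", "url": "https://example.com/pto"}]}
for result_title, result_url in summarize_results(sample):
    print(f"{result_title}: {result_url}")
```

In the notebook you would call `summarize_results(response.json())` after a successful search.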

# Define your variables for Checking a Data Source
* **datasource:** your custom data source name, which must match the unique name field in setup
* **index_api_token:** the token must be created from the Indexing API tokens tab and have your custom datasource as its scope
* **api_url_debug:** the debug status URL, which is built from your `datasource` variable

In [None]:
# Data source configuration
datasource = f"boostcampfeb26-group{group_num}"

# Use pre-loaded indexing API key
index_api_token = api_keys['index']

# Validate we have the index token before proceeding
if not index_api_token:
    raise ValueError("❌ GLEAN-INDEX-API is required for indexing operations. Please set it up and rerun.")

api_url_debug = f'{GLEAN_BASE_URL}/api/index/v1/debug/{datasource}/status'

print("✅ Index API configured for data source operations")

# Make the POST request and submit the Data Source & Document Status
**Note:** The data source is built into `api_url_debug`, so no separate `data` payload is needed as in the previous examples.

In [None]:
headers = {
        'Authorization': f'Bearer {index_api_token}',
        'Content-Type': 'application/json'
    }
response = requests.post(api_url_debug, headers=headers)
print(response.status_code)
print(json.dumps(response.json(), indent=4))

# print(response.content) Only run this code if needed for debugging when unformatted json is returned
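
Because `response.json()` raises an error when the body is empty or not valid JSON, a tolerant decoder can save a debugging round trip. The following is a minimal sketch using only the standard library; `decode_body` is a hypothetical helper name.

```python
import json


def decode_body(text):
    """Return parsed JSON, None for an empty body, or the raw text as a fallback."""
    if not text or not text.strip():
        return None
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        # Not JSON (for example an HTML error page): hand back the raw text
        return text


print(decode_body('{"status": "ok"}'))
print(decode_body(""))
```

In the notebook, `decode_body(response.text)` could replace `response.json()` wherever an empty or unformatted body is possible.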

# Bulk Indexing
**Notes:**
* You will need an indexing API token, which can be created in the API Tokens section of the Admin console in Glean.
* You will need a custom data source set up for the bulk indexing operation to work.

# Define your parameters
* **object_type:** name of the object defined in the setup tab of the custom data source
* **id:** the unique ID you assign to the document in the bulk indexing request
* **song:** the filename of the song
* **title:** the title of the document as seen in search results
* **name:** the document author's name. For this exercise it is pre-filled with your group number, but you can add your name
* **email:** the author's email address. For this exercise, use your own email address

In [None]:
object_type = 'Song' # Change to your object type if you are using a different one
id = f'gleandoc{group_num}'
upload_id = f'apiupload{group_num}'
song = 'Song01.txt' # Change to the song assigned to you in the sheet
title = f'This is the song title for the {song}'
name = f'Glean Group {group_num} - Your Name' # Change to your name if you want
email = 'jennifer.shannon@glean-sandbox.com' # Change to your assigned email address, e.g. 'yourID@glean-sandbox.com'
view_url = f'https://customdatasource.blob.core.windows.net/customdatasource/{song}'
api_url_bulk = f'{GLEAN_BASE_URL}/api/index/v1/bulkindexdocuments'
# Don't forget to run this cell, or the subsequent steps will fail

# Authorization and Prepare to Bulk Index

In [None]:
headers = {
        'Authorization': f'Bearer {index_api_token}',
        'Content-Type': 'application/json'
}

data = {
    "uploadId": upload_id,
    "isFirstPage": True,  # Python booleans serialize to JSON true/false
    "isLastPage": True,
    "forceRestartUpload": True,
    "datasource": datasource,
    "documents": [
        {
            "title": title,
            "filename": song,
            "datasource": datasource,
            "objectType": object_type,
            "viewURL": view_url,
            "id": id,
            "author": {
                "email": email,
                "name": name
            },
            "owner": {
                "email": email,
                "name": name
            },
            "permissions": {
                "allowAnonymousAccess": True,
                "allowAllDatasourceUsersAccess": True
            },
            "createdAt": 1749060000,
            "updatedAt": 1749060000
        }
    ],
    "disableStaleDocumentDeletionCheck": True
}

response = requests.post(api_url_bulk, headers=headers, data=json.dumps(data))
print(response.status_code)
print(response.content)
# Note: a response of 200 with b'' is expected for a successful upload, because the response body is empty.
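
The cell above sends everything in one request, so both `isFirstPage` and `isLastPage` are set on the same payload. For a larger batch, the same fields can describe a multi-page upload. The sketch below shows one way to split documents into pages; `build_bulk_pages` is a hypothetical helper, the datasource and upload ID are placeholders, and sending `forceRestartUpload` only on the first page is an assumption to verify against the Indexing API reference.

```python
def build_bulk_pages(documents, upload_id, datasource, page_size=100):
    """Yield one bulk-index payload per page of documents (hypothetical helper)."""
    if page_size < 1:
        raise ValueError("page_size must be at least 1")
    pages = [documents[i:i + page_size] for i in range(0, len(documents), page_size)]
    for index, page in enumerate(pages):
        payload = {
            "uploadId": upload_id,  # same ID on every page of one upload
            "isFirstPage": index == 0,
            "isLastPage": index == len(pages) - 1,
            "datasource": datasource,
            "documents": page,
        }
        if index == 0:
            # Assumption: restarting only makes sense on the first page
            payload["forceRestartUpload"] = True
        yield payload


# Placeholder documents, only to show how the page flags change
demo_docs = [{"id": f"doc{n}"} for n in range(5)]
for page_payload in build_bulk_pages(demo_docs, "apiupload-demo", "demo-datasource", page_size=2):
    print(page_payload["isFirstPage"], page_payload["isLastPage"], len(page_payload["documents"]))
```

Each yielded payload would be posted to `api_url_bulk` in order, using the same headers as the single-page request.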

# Check Status of Data Source & Document

In [None]:
headers = {
    'Authorization': f'Bearer {index_api_token}',
    'Content-Type': 'application/json'
}

data = {
    "objectType": object_type,
    "docId": id
}

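# Document-level checks use the /document debug endpoint, which accepts objectType and docId
api_url_debug_doc = f'{GLEAN_BASE_URL}/api/index/v1/debug/{datasource}/document'
response = requests.post(api_url_debug_doc, headers=headers, data=json.dumps(data))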
print(response.status_code)
print(json.dumps(response.json(), indent=4))

# Index One More Document
* **song_two:** file name for second song to upload
* **doc_title:** document title that will be seen in search results
* **doc_id:** unique number for document
* **text_content:** text describing the document
* **view_url:** where Glean can go to index
* **api_url_index_doc:** index document endpoint

In [None]:
song_two = 'Song02.txt' # Change to the song assigned to you in the sheet
doc_title = f'This is the song title for the {song_two}'
doc_id = f'docid{group_num}'
text_content = f'This is the text content for the {song_two} from Group {group_num}'
view_url = f'https://customdatasource.blob.core.windows.net/customdatasource/{song_two}'
api_url_index_doc = f"{GLEAN_BASE_URL}/api/index/v1/indexdocument"
# Don't forget to run this cell, or the subsequent steps will fail

# Authorization and Prepare to Index a Document


In [None]:
headers = {
    'Authorization': f'Bearer {index_api_token}',
    'Content-Type': 'application/json'
}

data = {
    "document": {
        "title": doc_title,
        "filename": song_two,
        "datasource": datasource,
        "objectType": object_type,
        "viewURL": view_url,
        "id": doc_id,
        "summary": {
            "mimeType": "text/plain",
            "textContent": text_content
        },
        "author": {
            "email": email,
            "name": name
        },
        "owner": {
            "email": email,
            "name": name
        },
        "permissions": {
            "allowAnonymousAccess": True,
            "allowAllDatasourceUsersAccess": True
        },
        "createdAt": 1749060500, # Timestamp in epoch seconds
        "updatedAt": 1749060500,
        "updatedBy": {
            "email": email,
            "name": name
        }
    }
}

response = requests.post(api_url_index_doc, headers=headers, data=json.dumps(data))
print(response.status_code)

# Good response is 200, with an empty body
# print(response) #Only needed if you want to see the raw response object
# print(response.content)
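
The `createdAt` and `updatedAt` fields in both indexing cells are epoch seconds. Rather than hard-coding them, they can be derived from a `datetime`, as in this small sketch. `to_epoch_seconds` is a hypothetical helper; it insists on timezone-aware values so the result does not depend on the machine's local time zone.

```python
from datetime import datetime, timezone


def to_epoch_seconds(moment):
    """Convert a timezone-aware datetime to whole epoch seconds."""
    if moment.tzinfo is None:
        raise ValueError("use a timezone-aware datetime to avoid local-time surprises")
    return int(moment.timestamp())


created_at = to_epoch_seconds(datetime(2025, 6, 4, 18, 0, tzinfo=timezone.utc))
print(created_at)  # 1749060000, the value used in the bulk indexing cell
```

For a document indexed right now, `to_epoch_seconds(datetime.now(timezone.utc))` gives the current timestamp.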