# Managing Content in Texas A&M University's ArcGIS Online Environment

## Table of Contents
- [Section 1: Import necessary modules](#section-1-import-necessary-modules)
- [Section 2: Connect to ArcGIS Online](#section-2-connect-to-arcgis-online)
- [Section 3: Configuration Setup](#section-3-configuration-setup)
- [Section 4: Identify Inactive Users](#section-4-identify-inactive-users)
- [Section 5: Identify Flagged Content](#section-5-identify-flagged-content)
- [Section 6: Remove Flagged Content](#section-6-remove-flagged-content)
- [Section 7: Generate Cleanup Report](#section-7-generate-cleanup-report)
- [Section 8: Main Function - Cleanup Execution](#section-8-main-function---cleanup-execution)


## Section 1: Import necessary modules
This section imports the Python libraries that the ArcGIS Online cleanup relies on:

- `arcgis.gis` - Enables interaction with ArcGIS Online.
- `datetime`, `timedelta` - Help calculate dates and time cutoffs.
- `pandas` - Used to organize and export user and content information into structured tables.
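- `getpass` - Supplies the operating system login name as a fallback when the ArcGIS username is unavailable, so the report can still name who ran the script.

Together these libraries cover the three tasks the script performs: querying the organization, computing date cutoffs, and exporting results.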

In [1]:
from arcgis.gis import GIS
from datetime import datetime, timedelta
import pandas as pd
import getpass

## Section 2: Connect to ArcGIS Online
Here, a live connection to the ArcGIS Online organization is created.
The **Organization ID** is retrieved so that content searches can be restricted to this organization, and the **executor** (the person running the script) is identified so the report can record who ran it.

Connecting through an authenticated session also ensures that all operations target the correct environment and that actions such as content deletion are properly authorized.

In [None]:
gis = GIS("home")
org_id = gis.properties.id
executor = gis.users.me.username if gis.users.me else getpass.getuser()
print(f"Connected to organization: {gis.properties.name} as {executor}")

## Section 3: Configuration Setup
Here, the time-based thresholds that define 'inactive' users and 'unused' content are configured.

- `YEARS_UNVIEWED` → Content not viewed in this many years is considered unused.
- `YEARS_INACTIVE` → Users inactive this long are flagged.
- `YEARS_UNMODIFIED` → Items not updated in this long are also flagged.

Cutoff dates are calculated relative to the current date. A **timestamp** is also generated for use in output filenames, which keeps the files from each run distinct.

In [None]:
YEARS_UNVIEWED = 1
YEARS_INACTIVE = 4
YEARS_UNMODIFIED = 8
TODAY = datetime.now()
CUTOFF_VIEWED = TODAY - timedelta(days=YEARS_UNVIEWED * 365)
CUTOFF_LOGIN = TODAY - timedelta(days=YEARS_INACTIVE * 365)
CUTOFF_MODIFIED = TODAY - timedelta(days=YEARS_UNMODIFIED * 365)
TIMESTAMP = TODAY.strftime('%Y%m%d_%H%M%S')
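
The cell above approximates a year as 365 days, so a multi-year cutoff drifts by a day or two because of leap years. That is harmless at this scale, but if exact calendar dates matter, the cutoffs could be computed along these lines. This is a minimal sketch; `years_before` is a hypothetical helper, not part of the original script, and the fixed date is only there to make the example reproducible.

```python
from datetime import datetime


def years_before(moment: datetime, years: int) -> datetime:
    """Return the same calendar date `years` earlier (hypothetical helper)."""
    try:
        return moment.replace(year=moment.year - years)
    except ValueError:
        # Feb 29 has no counterpart in a non-leap year; fall back to Feb 28.
        return moment.replace(year=moment.year - years, day=28)


# Fixed date instead of datetime.now() so the example is reproducible.
today = datetime(2024, 2, 29, 12, 0, 0)
cutoff_login = years_before(today, 4)   # lands on Feb 29, 2020 (a leap year)
cutoff_viewed = years_before(today, 1)  # falls back to Feb 28, 2023
print(cutoff_login, cutoff_viewed)
```

In the notebook itself, `today` would be `datetime.now()` and the year counts would come from `YEARS_INACTIVE`, `YEARS_UNVIEWED`, and `YEARS_UNMODIFIED`.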

## Section 4: Identify Inactive Users
This function scans users in the organization and identifies those who have not logged in recently.

- Users who have never logged in are treated as very old accounts.
- The login timestamps are converted to readable dates.
- Users who have been inactive beyond the established threshold are added to a DataFrame and saved to a CSV file.

Finding inactive users is the first step toward identifying unused content and improving platform management.

In [None]:
def getInactiveUsers():
    all_users = gis.users.search(max_users=1000, sort_field='lastLogin', sort_order='desc')
    inactive_users = []
    for user in all_users:
        try:
            if user.lastLogin <= 0:  # 0 or a negative sentinel (e.g. -1) means never logged in
                last_login_date = datetime(1970, 1, 1)
                last_login_str = "Never"
            else:
                last_login_date = datetime.utcfromtimestamp(user.lastLogin / 1000)
                last_login_str = last_login_date.strftime('%Y-%m-%d')
            if last_login_date < CUTOFF_LOGIN:
                inactive_users.append({
                    "Username": user.username,
                    "Full Name": getattr(user, "fullName", "N/A"),
                    "Email": getattr(user, "email", "N/A"),
                    "Last Login": last_login_str,
                    "_SortKey": last_login_date
                })
        except Exception as e:
            print(f"Error processing user {user.username}: {e}")
    inactive_users.sort(key=lambda x: x["_SortKey"])
    for u in inactive_users:
        del u["_SortKey"]
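    # Name the columns explicitly so an empty result still has a "Username" column.
    df_inactive = pd.DataFrame(inactive_users, columns=["Username", "Full Name", "Email", "Last Login"])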
    filename = f"inactive_users_{TIMESTAMP}.csv"
    df_inactive.to_csv(filename, index=False)
    print(f"Inactive users exported: {filename}")
    return df_inactive["Username"].tolist(), df_inactive
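
The login check in the function above boils down to converting an epoch-milliseconds timestamp and comparing it with the cutoff. The sketch below isolates that step so it can be tried without a GIS connection. The names `login_ms_to_datetime` and `is_inactive` are hypothetical, and treating any non-positive value as "never logged in" is an assumption about how the portal reports such accounts.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)


def login_ms_to_datetime(last_login_ms):
    """Convert epoch milliseconds to a naive UTC datetime; None means never."""
    # Assumption: 0 or a negative value marks an account that never logged in.
    if last_login_ms is None or last_login_ms <= 0:
        return None
    return EPOCH + timedelta(milliseconds=last_login_ms)


def is_inactive(last_login_ms, cutoff):
    """A user is inactive if they never logged in or last did so before cutoff."""
    last_login = login_ms_to_datetime(last_login_ms)
    return last_login is None or last_login < cutoff


cutoff = datetime(2021, 1, 1)
print(login_ms_to_datetime(1577836800000))  # 2020-01-01 00:00:00
print(is_inactive(-1, cutoff))              # True
```

Adding a `timedelta` to the epoch avoids `datetime.utcfromtimestamp`, which is deprecated as of Python 3.12 and can fail on negative values on some platforms.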

## Section 5: Identify Flagged Content
This function searches the content owned by inactive users and examines two metadata fields for each item:

- **Last Modified** → The last time the item was updated.
- **Last Viewed** → The last time the item was accessed.

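An item is flagged when it fails either check, that is, when it has not been modified or has not been viewed since the corresponding cutoff. The `Reason` column records which check applied, and items with no recorded view date are treated as never viewed.
Because the search covers only content owned by inactive users, content belonging to active users is left untouched.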

In [None]:
def getFlaggedContent(usernames):
    flagged_content = []
    for username in usernames:
        try:
            user_content = gis.content.search(query=f"owner:{username} AND orgid:{org_id}", max_items=100)
            for item in user_content:
                modified_date = datetime.utcfromtimestamp(item.modified / 1000)
                if hasattr(item, "lastViewed") and item.lastViewed:
                    last_viewed_date = datetime.utcfromtimestamp(item.lastViewed / 1000)
                else:
                    last_viewed_date = datetime(1970, 1, 1)
                is_unmodified = modified_date < CUTOFF_MODIFIED
                is_unviewed = last_viewed_date < CUTOFF_VIEWED
                if is_unmodified and is_unviewed:
                    reason = "unmodified & unviewed"
                elif is_unmodified:
                    reason = "unmodified"
                elif is_unviewed:
                    reason = "unviewed"
                else:
                    continue
                flagged_content.append({
                    "Title": item.title,
                    "Owner": item.owner,
                    "Item Type": item.type,
                    "Item ID": item.id,
                    "Last Modified": modified_date.strftime('%Y-%m-%d'),
                    "Last Viewed": last_viewed_date.strftime('%Y-%m-%d'),
                    "URL": item.homepage if hasattr(item, 'homepage') else f"https://www.arcgis.com/home/item.html?id={item.id}",
                    "Reason": reason
                })
        except Exception as e:
            print(f"Error processing content for user {username}: {e}")
    df_flagged = pd.DataFrame(flagged_content)
    if not df_flagged.empty:
        df_flagged["Last Modified"] = pd.to_datetime(df_flagged["Last Modified"])
        df_flagged["Last Viewed"] = pd.to_datetime(df_flagged["Last Viewed"])
        df_flagged.sort_values(by="Last Modified", inplace=True)
        filename = f"flagged_items_{TIMESTAMP}.csv"
        df_flagged.to_csv(filename, index=False)
        print(f"Flagged content exported: {filename}")
    return df_flagged
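
The flagging rule can be expressed as a small pure function, which makes it easy to check edge cases before running against live content. This is a sketch only; `flag_reason` is a hypothetical name, and the cutoff dates are fixed example values rather than the ones the notebook computes.

```python
from datetime import datetime


def flag_reason(modified, last_viewed, cutoff_modified, cutoff_viewed):
    """Return why an item is flagged, or None if it passes both checks."""
    is_unmodified = modified < cutoff_modified
    # A missing view date counts as "never viewed", as in the function above.
    is_unviewed = last_viewed is None or last_viewed < cutoff_viewed
    if is_unmodified and is_unviewed:
        return "unmodified & unviewed"
    if is_unmodified:
        return "unmodified"
    if is_unviewed:
        return "unviewed"
    return None


cutoff_modified = datetime(2017, 1, 1)  # example value
cutoff_viewed = datetime(2024, 1, 1)    # example value

# Recently modified but never viewed: still flagged.
print(flag_reason(datetime(2024, 6, 1), None, cutoff_modified, cutoff_viewed))
# Recently modified and recently viewed: not flagged.
print(flag_reason(datetime(2024, 6, 1), datetime(2024, 7, 1), cutoff_modified, cutoff_viewed))
```

Note that a recently edited item is still flagged if it has no view date, so the `Reason` column is worth reading before anything is removed.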

## Section 6: Remove Flagged Content
Once flagged items are reviewed, they can be removed.

- Each flagged item is removed by its unique Item ID.
- Items successfully removed are tracked for reporting.
- Errors during removal are logged.

This phase is critical and must be executed carefully to prevent accidental data loss.

In [None]:
def removeFlaggedContent(df_flagged):
    removed = []
    for _, row in df_flagged.iterrows():
        try:
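            item = gis.content.get(row["Item ID"])
            # Count the item as removed only if it exists and the delete call reports success.
            if item is not None and item.delete():
                removed.append(row)
                print(f"Removed: {row['Title']} (ID: {row['Item ID']})")
            else:
                print(f"Could not remove {row['Item ID']}: item missing or delete unsuccessful")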
        except Exception as e:
            print(f"Failed to remove {row['Item ID']}: {e}")
    return pd.DataFrame(removed)
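
Because deletion is hard to undo, it can help to rehearse the removal loop first. The sketch below shows one way to add a dry-run mode. `remove_items` and its `delete_fn` parameter are hypothetical and stand in for the real lookup-and-delete calls, and the stub deleter exists only so the example runs without a GIS connection.

```python
def remove_items(item_ids, delete_fn, dry_run=True):
    """Delete each item via delete_fn; with dry_run=True only report the plan."""
    removed, failed = [], []
    for item_id in item_ids:
        if dry_run:
            print(f"[dry run] would remove {item_id}")
            continue
        try:
            # delete_fn is expected to return True on success.
            if delete_fn(item_id):
                removed.append(item_id)
            else:
                failed.append(item_id)
        except Exception as exc:
            print(f"Failed to remove {item_id}: {exc}")
            failed.append(item_id)
    return removed, failed


def fake_delete(item_id):
    """Stub deleter for illustration: refuses the item named 'locked'."""
    return item_id != "locked"


print(remove_items(["a1", "locked", "b2"], fake_delete, dry_run=True))
print(remove_items(["a1", "locked", "b2"], fake_delete, dry_run=False))
```

Defaulting `dry_run` to `True` means a caller has to opt in explicitly before anything is deleted.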

## Section 7: Generate Cleanup Report
After the scanning or removal phase, a report is generated summarizing:

- How many users were inactive.
- How many items were flagged.
- How many items were successfully removed.

This report provides transparency and accountability, especially useful for audits or future cleanups.

In [None]:
def generateReport(df_inactive, df_flagged, df_removed):
    report_lines = [
        f"GIS Cleanup Report - {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}",
        f"Executor: {executor}",
        f"Organization: {gis.properties.name}",
        f"\nSummary:",
        f"Total Inactive Users: {len(df_inactive)}",
        f"Total Flagged Content: {len(df_flagged)}",
        f"Total Removed Items: {len(df_removed)}",
    ]
    if not df_flagged.empty:
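        # Label the listing by what it actually shows: removed items if any, otherwise flagged items.
        use_removed = not df_removed.empty
        report_lines.append("\nRemoved Content (first 10):" if use_removed else "\nFlagged Content (first 10):")
        preview = (df_removed if use_removed else df_flagged).head(10)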
        for _, row in preview.iterrows():
            report_lines.append(
                f"- {row.get('Title', 'N/A')} ({row.get('Item ID', 'N/A')}) by {row.get('Owner', 'N/A')} | Last Modified: {row.get('Last Modified', 'N/A')} | Last Viewed: {row.get('Last Viewed', 'N/A')}"
            )
    else:
        report_lines.append("\nNo flagged content found.")
    report_filename = f"cleanup_report_{TIMESTAMP}.txt"
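    # Write as UTF-8 so item titles with non-ASCII characters do not break the export.
    with open(report_filename, "w", encoding="utf-8") as file: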
        file.write("\n".join(report_lines))
    print(f"Report generated: {report_filename}")

## Section 8: Main Function - Cleanup Execution
This final section ties everything together:

- Identifies inactive users.
- Flags content.
- Offers three options: generate a report, cancel, or confirm removal.

Nothing is removed unless the user types `confirm` and then answers `yes` to a second prompt. Declining at the second prompt leaves all content in place and still writes a report of the flagged items.

In [None]:
inactive_usernames, df_inactive = getInactiveUsers()
df_flagged = getFlaggedContent(inactive_usernames)

if df_flagged.empty:
    print("No flagged content found.")
else:
    print(f"{len(df_flagged)} items flagged for potential removal.")
    print("Options:")
    print("Type 'report' → Generate a report of flagged items")
    print("Type 'cancel' → Exit without removing anything")
    print("Type 'confirm' → Proceed to removal of flagged items")

    choice = input("Enter your choice: ").strip().lower()

    if choice == "report":
        generateReport(df_inactive, df_flagged, pd.DataFrame())
    elif choice == "cancel":
        print("Exiting without changes.")
    elif choice == "confirm":
        confirm = input("Are you sure you want to remove flagged items? (yes/no): ").strip().lower()
        if confirm == "yes":
            df_removed = removeFlaggedContent(df_flagged)
            generateReport(df_inactive, df_flagged, df_removed)
        else:
            print("Exiting without changes.")
            generateReport(df_inactive, df_flagged, pd.DataFrame())
    else:
        print("Invalid choice. No actions taken.")
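
The branching in the cell above can also be written as a pure function, which separates the decision from the `input()` prompts and makes the two-step confirmation easy to verify. This is a sketch under that assumption; `decide_action` and the action names it returns are hypothetical.

```python
def decide_action(choice, confirmation=None):
    """Map the user's menu choice (and optional second answer) to an action."""
    choice = choice.strip().lower()
    if choice == "report":
        return "report"
    if choice == "cancel":
        return "cancel"
    if choice == "confirm":
        # Removal requires an explicit "yes" at the second prompt.
        if (confirmation or "").strip().lower() == "yes":
            return "remove"
        # Declining still produces a report, as in the cell above.
        return "report"
    return "invalid"


print(decide_action("confirm", "yes"))  # remove
print(decide_action("confirm", "no"))   # report
print(decide_action(" Cancel "))        # cancel
```

The calling code would then run removal only when the result is `"remove"`, so every other answer, including typos, leaves the content untouched.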