# Version diffing - Part 2: Automation

In Part 1, I introduced programmatic Version diffing of Speckle data. In this part I will show how that simple example can be deployed to respond to new commits on a branch, using the Speckle webhook functionality.

In the example of Rhino models being incrementally amended and Versions committed to Speckle, the comparison was between the latest and the immediately previous Version. In a webhook scenario, the comparison is between the Version that triggered the webhook and the Version immediately preceding it. This is because the webhook payload identifies a specific commit, and the webhook is triggered each time a new commit is made.

**NB:**
A different version of this could be that any new commit triggers the automation, which ignores which commit was made and always takes the latest and the previous. Which option you prefer will depend on how frequently you make commits. If you commit frequently, you may produce the same diffing result twice. When dealing with events based on large models, this may be undesirable.
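
If you did go with the "always latest and previous" variant, one way to avoid producing the same diff twice would be to remember which pairs have already been processed. The following is only a minimal sketch: `should_diff` and `_seen_pairs` are hypothetical names, and the in-memory set is an assumption that would not survive a restart of a deployed function, where persistent storage would be needed instead.

```python
# Hypothetical guard against diffing the same pair of commits twice.
# The set lives in memory only, which is an assumption made for illustration.
_seen_pairs = set()


def should_diff(latest_id: str, previous_id: str) -> bool:
    """Return True only the first time a (latest, previous) pair is seen."""
    pair = (latest_id, previous_id)
    if pair in _seen_pairs:
        return False
    _seen_pairs.add(pair)
    return True
```

The automation would call `should_diff` with the two commit ids before receiving any object data, and return early when it is `False`.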

Recapping the basic functionality:

In [21]:
%%capture
%pip install specklepy==2.14.0

# dotenv is a library that allows you to load environment variables from a .env file
%pip install --upgrade python-dotenv
%reload_ext dotenv
%dotenv

In [22]:
# boilerplate user credentials and server

import os
HOST_SERVER = os.getenv('HOST_SERVER')
ACCESS_TOKEN = os.getenv('ACCESS_TOKEN')

In [23]:
from specklepy.api.client import SpeckleClient

client = SpeckleClient(host=HOST_SERVER)  # or whatever your host is
client.authenticate_with_token(ACCESS_TOKEN)  # or whatever your token is

In [24]:
stream_id = "20e76a799c"  # or whatever your stream id is

from specklepy.transports.server import ServerTransport
from specklepy.api.wrapper import StreamWrapper

transport = ServerTransport(client=client, stream_id=stream_id)

stream = client.stream.get(stream_id)

In [25]:
commits = client.commit.list(stream_id=stream_id, limit=2)

from specklepy.api import operations

# get obj id from latest commit
latest = commits[0].referencedObject
previous = commits[1].referencedObject

# receive objects from speckle
latest_data = operations.receive(obj_id=latest, remote_transport=transport)
previous_data = operations.receive(obj_id=previous, remote_transport=transport)

But wait! The `commit.list` is getting us the last two commits. But in Part 1 of this tutorial we made a "Diff!" commit.

In [26]:
commits

[Commit( id: b6537cf855, message: Diff!, referencedObject: 698ef89bde18ec88296164b0e267ff93, authorName: Jonathon Broughton, branchName: diffs, createdAt: 2023-05-08 19:22:59.188000+00:00 ),
 Commit( id: c13e9777b1, message: Diff!, referencedObject: 698ef89bde18ec88296164b0e267ff93, authorName: Jonathon Broughton, branchName: diffs, createdAt: 2023-05-08 19:20:06.272000+00:00 )]


Sensibly, however, the Diff commit was made on a separate Model branch. This is one of the advantages of using branches in general, partitioning versions for later recall.

To query a specific branch for commits, we use a different part of the Client API:

In [27]:
# Add the branch name as a query filter,
# limit changes to commits_limit for this call.
main_branch_commits = client.branch.get(
    stream_id=stream_id, name="main", commits_limit=2
)

main_branch_commits.commits.items

[Commit( id: 1ebb510278, message: Second Commit, referencedObject: e44034f90b817573270cc3ec2534c74f, authorName: Jonathon Broughton, branchName: main, createdAt: 2023-04-17 21:57:05.460000+00:00 ),
 Commit( id: 2bf8b491ce, message: First Commit, referencedObject: 9e18b762dd52788a457de6a7b961d428, authorName: Jonathon Broughton, branchName: main, createdAt: 2023-04-17 21:55:50.244000+00:00 )]

There are our two original commits.

But we need to go further. If we are responding to a commit event, we will want to request that specific commit and the immediately preceding one.

We can inspect the properties of the webhook to determine how to do that.

## Webhooks

We write about Speckle Webhooks in the docs, so I won't go into great detail here except to show the setup for this case.

The webhook setup: url, description (optional), secret (optional) and events.

![Creating a webhook for new commits](../../assets/webhook-create.png "Creating a webhook for new commits")

and the resultant payload from a new commit being added:

```
{
  "payload": {
    ...
    "event": {
      "event_name": "commit_create",
      "data": {
          "id": "1ebb510278",
          "commit": {
            "message": "Second Commit",
            "objectId": "e44034f90b817573270cc3ec2534c74f",
            "sourceApplication": "Rhino7",
            "branchName": "main",
            "authorName": "Jonathon"
          }
      }
    },
    "server": {
      ...
    },
    "stream": {
      "id": "20e76a799c",
      "name": "Diffing Demo",
      "createdAt": "2023-04-16T17:17:05.058Z",
      ...
    },
    "user": {
      ...
      "name": "Jonathon",
      ...
      },
    "webhook": {
      "id": "a77eb6c427",
      "streamId": "20e76a799c",
      "url": "https://your.code/endpoint",
      "triggers": [
        "commit_create"
      ]
    }
  }
}
```

The critical parts worth noting are the `event_name` and the `data` object. The `event_name` is the event that triggered the webhook, in this case `commit_create`. The `data` object contains the `id` of the commit that was created, alongside a `commit` object holding its details, including the `branchName`.
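
As a minimal sketch of reading those fields, the function below walks a dictionary shaped like the payload shown above. `extract_commit_info` is a hypothetical helper, and it assumes the request body has already been parsed into nested dictionaries with a top-level `payload` key.

```python
from typing import Optional, Tuple


def extract_commit_info(body: dict) -> Optional[Tuple[str, str, str]]:
    """Return (stream_id, commit_id, branch_name) for a commit_create event.

    Returns None for any other event. Assumes the nesting shown in the
    example payload: payload -> event -> data -> commit.
    """
    payload = body.get("payload", {})
    event = payload.get("event", {})
    if event.get("event_name") != "commit_create":
        return None

    data = event.get("data", {})
    return (
        payload.get("stream", {}).get("id"),
        data.get("id"),
        data.get("commit", {}).get("branchName"),
    )


# A trimmed copy of the example payload
sample = {
    "payload": {
        "event": {
            "event_name": "commit_create",
            "data": {
                "id": "1ebb510278",
                "commit": {"message": "Second Commit", "branchName": "main"},
            },
        },
        "stream": {"id": "20e76a799c", "name": "Diffing Demo"},
    }
}

print(extract_commit_info(sample))  # ('20e76a799c', '1ebb510278', 'main')
```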


In [28]:
%%capture

%pip install functions_framework

The flatten function we can reuse from before.

In [29]:
# Flatten the objects into a list of objects
from collections.abc import Iterable, Mapping
from specklepy.objects import Base


def flatten(obj, visited=None):
    
    # Avoiding pesky circular references
    if visited is None:
        visited = set()

    if obj in visited:
        return

    visited.add(obj)

    # Define a logic for what objects to include in the diff
    should_include = any(
        [
            hasattr(obj, "displayValue"),
            hasattr(obj, "speckle_type")
            and obj.speckle_type == "Objects.Organization.Collection",
            hasattr(obj, "displayStyle"),
        ]
    )

    if should_include:
        yield obj

    props = obj.__dict__

    # traverse the object's nested properties - which may include yieldable objects
    for prop in props:
        value = getattr(obj, prop)

        if value is None:
            continue

        if isinstance(value, Base):
            yield from flatten(value, visited)

        elif isinstance(value, Mapping):
            for dict_value in value.values():
                if isinstance(dict_value, Base):
                    yield from flatten(dict_value, visited)

        elif isinstance(value, Iterable):
            for list_value in value:
                if isinstance(list_value, Base):
                    yield from flatten(list_value, visited)


And the comparison function:

In [30]:
# Compare two Speckle commits and yield (old, new) tuples
from specklepy.objects.base import Base
from typing import Iterator, List, Optional, Tuple

# Yields (old, new) for unchanged and changed objects,
# (None, new) for added objects and (old, None) for removed objects.
def compare_speckle_commits(
    commit1_objects: List[Base], commit2_objects: List[Base]
) -> Iterator[Tuple[Optional[Base], Optional[Base]]]:
    commit1_dict = {obj.id: obj for obj in commit1_objects[1:]}
    commit2_dict = {obj.id: obj for obj in commit2_objects[1:]}

    # Find unchanged objects
    for obj_id in commit1_dict.keys():
        if obj_id in commit2_dict.keys():
            yield (commit1_dict[obj_id], commit2_dict[obj_id]) # old, new

    # Find changed objects
    for obj_id, obj in commit1_dict.items():
        if obj_id not in commit2_dict.keys() and obj.applicationId in [
            x.applicationId for x in commit2_dict.values()
        ]:
            yield (
                obj, # old object
                [ x for x in commit2_dict.values()
                    if x.applicationId == obj.applicationId
                ][0], # new changed object
            )

    # Find added objects
    for obj_id, obj in commit2_dict.items():
        if obj_id not in commit1_dict.keys() and obj.applicationId not in [
            x.applicationId for x in commit1_dict.values()
        ]:
            yield (None, obj) # old, new

    # Find removed objects
    for obj_id, obj in commit1_dict.items():
        if obj_id not in commit2_dict.keys() and obj.applicationId not in [
            x.applicationId for x in commit2_dict.values()
        ]:
            yield (obj, None) # old, new


To vary the example for a different use case, I have amended the `store_speckle_commit_diff` function into `report_speckle_commit_diff`, which reports only changed elements, keyed by their applicationId. This could be useful if you are updating a separate database. In this case three verbs are worth noting: `DELETE` for an element that was removed, `UPDATE` for an element that was changed and `INSERT` for an element that was added.

In [31]:
def report_speckle_commit_diff(commit1_objects, commit2_objects):
    diff_report = []

    for obj in compare_speckle_commits(commit1_objects, commit2_objects):
        if getattr(obj[0], "id", None) == getattr(obj[1], "id", None):
            continue  # Skip unchanged elements

        # Removed elements have no new object, so report the old one instead
        source = obj[1] if obj[1] is not None else obj[0]
        object_report = {"applicationId": source.applicationId, "data": source.to_dict()}

        if obj[0] is not None and obj[1] is not None:
            object_report["verb"] = "UPDATE"
        elif obj[0] is None and obj[1] is not None:
            object_report["verb"] = "INSERT"
        elif obj[0] is not None and obj[1] is None:
            object_report["verb"] = "DELETE"

        diff_report.append(object_report)

    return diff_report

That would probably suffice for any follow-up actions you might want to take using those database instructions. But for storing this report in Speckle itself for posterity, we can convert the report into a Speckle object and commit it to the stream.

In [32]:
# convert the diff report to a speckle base object

from specklepy.objects import Base
from specklepy.objects.other import Collection

def diff_report_to_speckle_base(diff_report):
    commit = Collection(collectionType="diff_report", name="diff report", elements=[])

    deleted = Collection(collectionType="deleted objects", name="deleted objects", elements=[])
    updated = Collection(collectionType="updated objects", name="updated objects", elements=[])
    inserted = Collection(collectionType="inserted objects", name="inserted objects", elements=[])

    for obj in diff_report:
        target_collection = None
        if obj["verb"] == "DELETE":
            target_collection = deleted
        elif obj["verb"] == "UPDATE":
            target_collection = updated
        elif obj["verb"] == "INSERT":
            target_collection = inserted
        
        target_collection.elements.append(Base.from_dict(obj["data"]))

    commit.elements.extend([deleted, updated, inserted])

    return commit

For this example I'll deploy this webhook handler as a Google Cloud Function (other methods exist), so we'll import the `functions_framework` package and load the secrets from environment variables.

In [33]:
#GCP SDK for cloud functions
import functions_framework
import os

GCP_API_USER = os.getenv('GCP_API_USER')
GCP_API_KEY = os.getenv('GCP_API_KEY')
GCP_PROJECT = os.getenv('GCP_PROJECT')

`@functions_framework.http` is the function decorator that will allow us to deploy this as a Google Cloud Function that is triggered by an HTTP request.

**NB**
To be defensive, we should check that the event that triggered the webhook is the one we expect, in this case `commit_create`. We also want the handler to respond only to commits on the `main` branch, which we can do by checking the `branchName` property of the `commit` object. The handler could be protected further by verifying a signature derived from the webhook `secret`. That is not part of the handler here, but it is a good idea if you are deploying a webhook handler to a public endpoint. The secret can also be used to distinguish between different workloads triggered by the same webhook event at different endpoints.
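
Although signature verification is left out of the handler, the general idea can be sketched with the standard library. This assumes the server sends a hex-encoded HMAC-SHA256 signature computed with the webhook `secret`. The header that carries the signature and exactly which bytes are signed are assumptions to confirm against the Speckle webhook documentation. `signature_is_valid` is a hypothetical helper name.

```python
import hashlib
import hmac


def signature_is_valid(signed_bytes: bytes, secret: str, received_signature: str) -> bool:
    """Check a hex-encoded HMAC-SHA256 signature against the shared secret.

    `signed_bytes` must be exactly the bytes the server signed (an assumption
    to confirm in the webhook documentation).
    """
    expected = hmac.new(secret.encode("utf-8"), signed_bytes, hashlib.sha256).hexdigest()
    # compare_digest runs in constant time, which avoids leaking timing information
    return hmac.compare_digest(expected.encode("utf-8"), received_signature.encode("utf-8"))
```

A handler would call this before doing any other work and return a 401 or 403 response when it is `False`.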

My webhook handler looks like this:

In [34]:
import json

from specklepy.api import operations

@functions_framework.http
def diffing_handler(request):
    # Retrieve the webhook payload from the request body. It sits under the
    # top-level "payload" key and may arrive as a JSON-encoded string.
    body = request.get_json(silent=True) or {}
    payload = body.get('payload', body)
    if isinstance(payload, str):
        payload = json.loads(payload)

    # Check if the event is a commit creation event
    event = payload.get('event')
    if not event or event.get('event_name') != 'commit_create':
        return 'Not a commit created event', 204

    # Retrieve branch name from the event data
    branch_name = event['data']['commit'].get('branchName')
    if branch_name != 'main':
        return 'Commit not on the main branch', 400

    # Retrieve stream ID and commit ID from the payload
    stream_id = payload['stream']['id']
    commit_id = event['data']['id']

    # Get a list of recent commits from the main branch
    commits = client.branch.get(stream_id=stream_id, name=branch_name,
                                commits_limit=10).commits.items

    # Find the triggering commit and the one immediately before it
    consecutive_commits = [commits[i:i+2] for i in range(len(commits)-1)
                           if commits[i].id == commit_id]
    if consecutive_commits:
        latest_commit, previous_commit = consecutive_commits[0]
    else:
        return 'Commit not found', 404

    # Get the referenced objects from the consecutive commits
    latest = latest_commit.referencedObject
    previous = previous_commit.referencedObject

    # Receive the object data from Speckle
    latest_data = operations.receive(obj_id=latest,
                                     remote_transport=transport)
    previous_data = operations.receive(obj_id=previous,
                                       remote_transport=transport)

    # Flatten the object data lists
    latest_objects = list(flatten(latest_data))
    previous_objects = list(flatten(previous_data))

    # Generate the diff report between the previous and latest objects
    diff_report = report_speckle_commit_diff(previous_objects,
                                             latest_objects)

    # Convert the diff report to a Speckle base object
    diff_commit = diff_report_to_speckle_base(diff_report)

    # Send the diff report object to the Speckle server
    obj_id = operations.send(base=diff_commit, transports=[transport])

    # Create a new commit with the diff report
    diff_commit_id = client.commit.create(
        stream_id=stream_id,
        branch_name="diffs",
        message=f"diff report for commit {commit_id}",
        source_application="diffing",
        object_id=obj_id,
    )

    # Return success response with the diff commit ID
    return f"diff report created with id {diff_commit_id}", 200


In an ideal world we'd separate the webhook handler from the deployment code, but for the sake of simplicity I'll leave it here. Separating the two would allow us to deploy the webhook handler to other platforms and also to allow for testing of the webhook handler without having to deploy it.
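
As a small illustration of that separation, the commit-pair lookup could live in a plain function that needs neither a Speckle client nor a deployed endpoint to test. This is only a sketch: `find_commit_pair` and `FakeCommit` are hypothetical names, and `FakeCommit` stands in for the real commit objects by exposing only the two attributes used here.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class FakeCommit:
    """Hypothetical stand-in for a Speckle commit, for testing only."""
    id: str
    referencedObject: str


def find_commit_pair(commits: List, commit_id: str) -> Optional[Tuple]:
    """Return (triggering commit, its predecessor), or None if not found.

    Assumes `commits` is ordered newest first, as in the branch listing
    shown earlier.
    """
    for newer, older in zip(commits, commits[1:]):
        if newer.id == commit_id:
            return newer, older
    return None


# The two commits from the main branch listing shown earlier
commits = [
    FakeCommit("1ebb510278", "e44034f90b817573270cc3ec2534c74f"),
    FakeCommit("2bf8b491ce", "9e18b762dd52788a457de6a7b961d428"),
]

pair = find_commit_pair(commits, "1ebb510278")
```

The oldest commit on a branch has no predecessor, so looking it up returns `None`, which the handler can translate into its "Commit not found" response.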

I won't go into detail about how to deploy a Google Cloud Function, but the code is available in the GCP docs. The important thing to note is that the function is deployed to a specific endpoint. In this case, it is `https://{{CLOUD_REGION_GCP_PROJECT}}.cloudfunctions.net/diffing_handler`. This is the endpoint that we will use when creating the webhook.