Skip to content

Demo that uses Step Functions to process Alfresco events sent to Firehose.

License

Notifications You must be signed in to change notification settings

gavincornwell/firehose-step-functions-demo

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

8 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Welcome

This demo builds upon the Firehose Rekognition demo where events emitted from Alfresco are sent to AWS Kinesis Firehose.

Rather than using one large Lambda function to process uploaded images this demo orchestrates several smaller Lambda functions using Step Functions.

Use Case

The demo uses a fictional use case around an insurance company. As images or text files are uploaded into the system they are sent asynchronously to Rekognition or Comprehend for analysis, the content type of the node is changed to an appropriate custom type (acme:insuranceClaimImage or acme:insuranceClaimReport) and a unique ID generated.

If an image is detected Rekognition is used to look for a Cars, Motorcycles, Boats, Electronics, Jewelry, Wristwatches, Clocks, Bicycles, Sport Equipment and Furniture. If detected, the acme:claimType property is set appropriately otherwise the default value of "Unknown" is used.

If a text file is detected Comprehend is used to look for the claim adjusters name and the date and location of the visit within the text. The sentiment of the text is also analysed to determine whether a follow up visit is likely to be required. The extracted values are set on the acme:claimAdjuster, acme:visitDate, acme:visitLocation and acme:visitFollowUp properties, respectively.

The architecture for this is shown in the diagram below:

Overview

Prerequisites

To run this demo some familiarity with Alfresco and AWS is presumed.

The AWS CLI needs to be present, configured with a valid access ID/key and configured to use either the North Virginia (us-east-1), Oregon (us-west-2) or Ireland (eu-west-1) region.

An S3 bucket and an EC2 KeyPair created in the same region in which the demo will be run.

Access to the AWS console.

Package & Deploy

Run the deploy script passing the name of an S3 bucket to upload the code to, the name of the stack to create and the name of the key pair to use, for example:

deploy demo-code-deployments my-stack-name my-key-pair

After a short while you'll see the stack appear in the AWS CloudFormation Console, you can track progress of the stack creation there.

Once the stack is complete select the "Outputs" tab (shown in the screenshot below) to see all the information you'll need for accessing the system.

Outputs

NOTE: It will take about 10 minutes for the Alfresco Repository to be ready for use.

Demo

Click on the link for the "ShareUrl" key shown in the CloudFormation "Outputs" tab. Login using the values of the "RepoUserName" and "RepoPassword" outputs and create a site.

Handling Images

Upload a few pictures, in the example shown below I've chosen an image that contains a car, one that contains a bicycle and the Alfresco logo.

Uploaded Images

It will take a couple of minutes for the events to make their way through the Kinesis Firehose stream (it buffers data, the minimum interval is 1 minute), get processed by the state machine and prompt the update of the images metadata.

Visit the Step Functions console and click on the state machine, you should see a list of successful executions. Clicking on an execution result should show something similar to the screenshot below:

State Machine Image Result

The "Execution Details" on the right side of the console shows information on the execution including the input and output (shown below), this can be really useful for monitoring debugging when things go wrong.

State Machine Image Output

The bottom half of the console shows the steps the state machine took and links to the logs of the Lambda functions called during the execution, clicking the CloudWatch Logs link of the ProcessImage function shows an output similar to the one shown below:

Rekognition Output

The last step in the state machine is to update the metadata by calling the Alfresco REST API.

Go back to Share, navigate to the folder where you uploaded the images and click on the image containing the car. Examine the properties of the image and you'll see it's type has been changed to acme:insuranceClaimImage as the custom properties are present as shown in the screenshot below:

Car Properties

A unique ID has been generated for the acme:imageId property and the acme:claimType property has been set appropriately.

A similar thing has happened to the image of the bicycle and the logo, the acme:claimType property will be set to "Bicycle" and "Unknown", respectively.

Handling Text

Create a new text file named report.txt in the site created earlier with the following content (see screenshot below):

My name is Joe Bloggs​, I visited 123 Acme Street, London on December 16, 2017. I'm pleased to confirm that this is a valid claim.

New Text File

Once the new file has been processed by Kinesis Firehose the same Step Function will have been called, go back to the console. Clicking on the latest execution result should show something similar to the screenshot below:

State Machine Text Result

As before, look at the "Execution Details" on the right side of the console to see the input and output (shown below):

State Machine Text Output

As a text file was detected by the Step Function the ProcessText Lambda function was called this time, clicking on the CloudWatch Logs link in the lower part of the console shows what Comprehend returned:

Rekognition Output

The last step in the state machine is to update the metadata by calling the Alfresco REST API, this can be verified by going back to Share and examining the properties of the text file created earlier. The type should now be acme:insuranceClaimReport and the properties set as shown below:

Text Properties

Troubleshooting

The AMI used by the CloudFormation stack is a 5.2 Enterprise Server, as such a trial license will be generated. Once your stack is over 30 days old the repository will go into read-only mode. If this happens either apply a valid license or re-create the stack.

If you need to SSH to the EC2 instance use centos@<public-ip>. You can get the public-ip from any of the URLs output by the CloudFormation template. Also, remember to use the SSH key selected when creating the stack!

The log files for the Repository and Share are located in /var/log/tomcat-alfresco and /var/log/tomcat-share, respectively.

To see the events being processed on the repository add the following debug statements to /usr/share/tomcat/shared/classes/alfresco/log4j.properties:

log4j.logger.org.alfresco.messaging.camel.routes.KinesisFirehoseRouteBuilder=debug
log4j.logger.org.apache.camel.component.aws=debug
log4j.logger.com.amazonaws.request=debug

Further configuration (including the name of the target Firehose stream) can be made in /usr/share/tomcat/shared/classes/alfresco-global.properties.

If you make any configuration changes you'll need to restart the Repository or Share Tomcat service, using service tomcat-alfresco restart or service tomcat-share restart, respectively. Note: you'll need to sudo su first.

To check that events are being emitted you can also examine the ActiveMQ admin console using the ActiveMQUrl output by the CloudFormation template. You should see the number highlighted in the screenshot below increasing after activity in Share.

Events

If you're still having problems feel free to raise an issue.

Cleanup

When you're finished with the stack (you will be charged a small amount for the resources it uses) navigate to the CloudFormation console, select the stack you created and choose "Delete Stack" from the "Actions" menu.

About

Demo that uses Step Functions to process Alfresco events sent to Firehose.

Resources

License

Stars

Watchers

Forks

Packages