Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

141 update failure notification #144

Closed
wants to merge 41 commits into from
Closed

141 update failure notification #144

wants to merge 41 commits into from

Conversation

WallysFerreira
Copy link

Description

Change result message of email and slack notification in errors

@WallysFerreira
Copy link
Author

Should I add constants with new key strings?

@namitad
Copy link
Collaborator

namitad commented Oct 1, 2023

@WallysFerreira yes - lets move them to constants

result[constants.ERROR] = event['taskname'] + 'Error'
result[constants.CAUSE] = event['status']
result['failed step'] = event['taskname'] + 'Error'
result['cause of failure'] = event['status']
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lets use constants for these

Copy link
Collaborator

@namitad namitad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

have added some comments. please address

@namitad
Copy link
Collaborator

namitad commented Oct 1, 2023

@WallysFerreira as part of the PR description, could you also add a screenshot of the new email and slack notification post your changes?

@WallysFerreira
Copy link
Author

@WallysFerreira as part of the PR description, could you also add a screenshot of the new email and slack notification post your changes?

Yes I can do that

@WallysFerreira
Copy link
Author

Maybe I jumped some step on the tutorial, but when it tries to deploy to AWS it returns

Error: Template file not found at /home/wf/Trapheus/deploy.yaml

@namitad
Copy link
Collaborator

namitad commented Oct 5, 2023

@WallysFerreira do reverify your setup and a give it a try again. If it doesnt work, let me know - i can try running your changes locally to verify the output.

@WallysFerreira
Copy link
Author

I've tried running on a codespace and it's still returning an error, even though there is a template.yaml and deploy.yaml on the directory.
If you could run the changes locally I'd really appreciate it

FileNotFoundError: [Errno 2] No such file or directory: 'sam package --template-file template.yaml --output-template-file deploy.yaml --s3-bucket teste-741284363623 --region us-east-2'

@namitad
Copy link
Collaborator

namitad commented Oct 5, 2023

Ok.. give me a day or two. Will try this locally and get back to you.

@namitad
Copy link
Collaborator

namitad commented Oct 8, 2023

@WallysFerreira i ran your changes locally. but i get the following error:

{
  "errorMessage": "'Identifier'",
  "errorType": "KeyError",
  "requestId": "9c87e534-7933-4044-9295-d8a6834c23c8",
  "stackTrace": [
    "  File \"/var/task/email_function.py\", line 18, in lambda_handler\n    result[constants.DB_ID] = event['Identifier']\n"
  ]
}
Screenshot 2023-10-08 at 7 46 49 PM

can we check why identifier is not part of the input and if any changes need to be made in the template.yaml to cascade this input?

also, i faced the same issue as you did in trying to run trapheus locally - some recent changes have broken the install.py file.
Could you use this version of install.py locally while i am check the fix for the same?

@WallysFerreira
Copy link
Author

Yeah sure I'll look into it and use the new install.py. Thanks

@namitad
Copy link
Collaborator

namitad commented Oct 9, 2023

@WallysFerreira i have merged the fix for install.py. can you pull the latest on your branch and try - no need to use a separate version of the file

@WallysFerreira
Copy link
Author

I think the error was caused by the capital 'i' in identifier, but can't actually check because I'm getting another error when running install.py

Error: Unable to upload artifact src/common/common.zip referenced by ContentUri parameter of CommonLambdaLayer resource.
Unable to locate credentials

@namitad
Copy link
Collaborator

namitad commented Oct 11, 2023

@WallysFerreira looks like your temporary aws credentials may have expired. could you refresh and try again?

@WallysFerreira
Copy link
Author

I tried with new aws credentials and still got the same error :(
Could you please try running it locally?

@stationeros
Copy link
Contributor

stationeros commented Oct 15, 2023

@namitad / @WallysFerreira there is a couple of issues in the way this is implemented. The way we are getting the identifier is a bit off. The event that gets generated is this

{'Error': 'SnapshotCreationException', 'Cause': '{"errorMessage": "Identifier:database-3 \\nAn error occurred (DBClusterNotFoundFault) when calling the CreateDBClusterSnapshot operation: DBCluster not found: database-3", "errorType": "SnapshotCreationException", "requestId": "6568ebed-be45-4962-bdc3-35bf080c4322", "stackTrace": ["  File \\"/var/task/cluster_snapshot_function.py\\", line 25, in lambda_create_cluster_snapshot\\n    util.eval_snapshot_exception(error, cluster_id, rds)\\n", "  File \\"/opt/python/utility.py\\", line 72, in eval_snapshot_exception\\n    raise custom_exceptions.SnapshotCreationException(error_message)\\n"]}'} | event is {'Error': 'SnapshotCreationException', 'Cause': '{"errorMessage": "Identifier:database-3 \\nAn error occurred (DBClusterNotFoundFault) when calling the CreateDBClusterSnapshot operation: DBCluster not found: database-3", "errorType": "SnapshotCreationException", "requestId": "6568ebed-be45-4962-bdc3-35bf080c4322", "stackTrace": [" File \\"/var/task/cluster_snapshot_function.py\\", line 25, in lambda_create_cluster_snapshot\\n util.eval_snapshot_exception(error, cluster_id, rds)\\n", " File \\"/opt/python/utility.py\\", line 72, in eval_snapshot_exception\\n raise custom_exceptions.SnapshotCreationException(error_message)\\n"]}'}

As you can see there is no direct way of getting the Identifier as a key. We have a utilty function defined that needs to be utilised https://github.com/intuit/Trapheus/blob/master/src/common/python/utility.py#L33 to get the identifier from the event.

Secondly, the way TASK_NAME is getting extracted also is a bit odd. Its not present directly in the event but its in the context object of the lambda runtime. You can simply do context.function_name to get that.

With the above changes, you guys should be able to get that out. Let me know if you need help

Screenshot 2023-10-15 at 3 32 47 PM

@namitad
Copy link
Collaborator

namitad commented Oct 17, 2023

@WallysFerreira it would still run into a similar issue (as mentioned by @stationeros above) with the "identifier" change. we need to explicitly pass the identifier name to the state where alerts are being sent. Can we look at using a pass state https://docs.aws.amazon.com/step-functions/latest/dg/amazon-states-language-pass-state.html to add this to the state machine template.yaml?

This can also be used in the future if we need to add additional values - such as execution id.

@namitad
Copy link
Collaborator

namitad commented Oct 17, 2023

@stationeros yes - taskname is being cascaded because in the context object - it would either be the email alert/slack notification lambda name. even if set from each step, it would expose the lambda function name to the user instead of the flow. we can revisit if this change needs to be done and maybe address that as a separate issue.

@WallysFerreira
Copy link
Author

@stationeros that's super helpful, thanks. I'll make those changes asap.

@namitad I'll try to look into using a pass state afterwards, but it would still work without it if I make the changes @stationeros said, right? Just so I can do one thing at a time

@stationeros
Copy link
Contributor

@WallysFerreira Thats right? You can possibly fix the instance identifier and the task name chaining can be done later. I can break this into subtasks for this to be managed better

@stationeros
Copy link
Contributor

@WallysFerreira Can you make the code change for instance identifier , you just need to call this inside the lambda , we can then merge this commit

@stationeros
Copy link
Contributor

@WallysFerreira Closing this PR due to inactivity, feel free to re-open and submit if you are still working on this.

@stationeros stationeros closed this Nov 8, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants