
support for DynamicPartitioningConfiguration in firehose resource #20769

Merged
merged 14 commits into from
Nov 18, 2021

Conversation

bozerkins

@bozerkins bozerkins commented Sep 2, 2021

Closes #20763

Added support for dynamic partitioning configuration. The syntax is very straightforward:

dynamic_partitioning_configuration {
  enabled = true
  retry_options {
    duration_in_seconds = 300
  }
}
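For context, this block is nested inside the delivery stream's extended_s3_configuration. A minimal sketch using the syntax proposed in this PR (the resource names, role, and bucket references are illustrative, not from this PR):

```hcl
resource "aws_kinesis_firehose_delivery_stream" "example" {
  name        = "example-stream"
  destination = "extended_s3"

  extended_s3_configuration {
    # role_arn and bucket_arn are placeholders for your own IAM role and S3 bucket
    role_arn   = aws_iam_role.firehose.arn
    bucket_arn = aws_s3_bucket.bucket.arn

    dynamic_partitioning_configuration {
      enabled = true
      retry_options {
        duration_in_seconds = 300
      }
    }
  }
}
```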

Community Note

  • Please vote on this pull request by adding a 👍 reaction to the original pull request comment to help the community and maintainers prioritize this request
  • Please do not leave "+1" or other comments that do not add relevant new information or questions, they generate extra noise for pull request followers and do not help prioritize the request

@github-actions github-actions bot added needs-triage Waiting for first response or review from a maintainer. service/firehose Issues and PRs that pertain to the firehose service. size/M Managed by automation to categorize the size of a PR. and removed service/firehose Issues and PRs that pertain to the firehose service. labels Sep 2, 2021
@github-actions github-actions bot left a comment

Welcome @bozerkins 👋

It looks like this is your first Pull Request submission to the Terraform AWS Provider! If you haven't already done so, please make sure you have checked out our CONTRIBUTING guide and FAQ to make sure your contribution adheres to best practice and has all the necessary elements in place for a successful approval.

Also take a look at our FAQ which details how we prioritize Pull Requests for inclusion.

Thanks again, and welcome to the community! 😃

@bozerkins
Author

bozerkins commented Sep 2, 2021

My first PR here. Any feedback would be appreciated. I'm interested in seeing this functionality in a release as soon as possible.

@breathingdust breathingdust added service/kinesis Issues and PRs that pertain to the kinesis service. enhancement Requests to existing resources that expand the functionality or scope. and removed needs-triage Waiting for first response or review from a maintainer. labels Sep 2, 2021
@esvm

esvm commented Sep 7, 2021

I'm interested in this feature. How could we define whether it's inline parsing or from an AWS Lambda, and the options to define the query or the Lambda ARN?

@bozerkins
Author

How dynamic partitioning works is defined by 1) the processors that you specify in the configuration and 2) the prefix key.

For example, dynamic partitioning using the jq query language would look like this:

dynamic_partitioning_configuration {
    enabled = true
    retry_options {
        duration_in_seconds = 300
    }
}

prefix  = "custom-prefix/customerId=!{partitionKeyFromQuery:customerId}/year=!{timestamp:yyyy}/month=!{timestamp:MM}/day=!{timestamp:dd}/hour=!{timestamp:HH}/"

processing_configuration {
    enabled = true
    # processor for partitioning
    processors {
        type = "MetadataExtraction"

        parameters {
            parameter_name  = "MetadataExtractionQuery"
            parameter_value = "{customerId:.customer_id}"
        }
        parameters {
            parameter_name  = "JsonParsingEngine"
            parameter_value = "JQ-1.6"
        }
    }
    # record file unification
    processors {
        type = "RecordDeAggregation"

        parameters {
            parameter_name  = "SubRecordType"
            parameter_value = "JSON"
        }
    }
    # record delimited unification
    processors {
        type = "AppendDelimiterToRecord"
    }
}

When you want to enable processing via Lambda, the configuration would be similar to this:

dynamic_partitioning_configuration {
    enabled = true
    retry_options {
        duration_in_seconds = 300
    }
}
prefix  = "custom-prefix/customerId=!{partitionKeyFromLambda:customerId}/year=!{timestamp:yyyy}/month=!{timestamp:MM}/day=!{timestamp:dd}/hour=!{timestamp:HH}/"

processing_configuration {
    enabled = true
    # lambda function for processing and partitioning
    processors {
        type = "Lambda"

        parameters {
            parameter_name  = "LambdaArn"
            parameter_value = "${aws_lambda_function.processor.arn}:$LATEST"
        }
    }

    # record file unification
    processors {
        type = "RecordDeAggregation"

        parameters {
            parameter_name  = "SubRecordType"
            parameter_value = "JSON"
        }
    }
}

@esvm

esvm commented Sep 14, 2021

Hey @bozerkins thanks for the response, I appreciate it! Do those processors exist today? I looked into the docs and the only processor that exists is the Lambda processor.

@bozerkins
Author

Luckily, processors in the provider are generated automatically from the aws-sdk-go client, and have been available since this commit
a321b3b

Here are the release notes
https://github.com/aws/aws-sdk-go/releases/tag/v1.40.33

@avegao

avegao commented Sep 30, 2021

Any news about this?

@bozerkins
Author

Curious myself. Would really like to get feedback on this.

On the bright side - I've deployed this functionality to production and it's working brilliantly. With dynamic partitioning I managed to remove a decent chunk of the data partitioning logic (previously implemented with Lambdas and S3 event notifications) and move it into Firehose.

@RonPenton

Any updates on this? Would be great to have!

@tonyf

tonyf commented Oct 8, 2021

Any updates here?

@zhelding
Contributor

Pull request #21306 has significantly refactored the AWS Provider codebase. As a result, most PRs opened prior to the refactor now have merge conflicts that must be resolved before proceeding.

Specifically, PR #21306 relocated the code for all AWS resources and data sources from a single aws directory to a large number of separate directories in internal/service, each corresponding to a particular AWS service. This separation of code has also allowed for us to simplify the names of underlying functions -- while still avoiding namespace collisions.

We recognize that many pull requests have been open for some time without yet being addressed by our maintainers. Therefore, we want to make it clear that resolving these conflicts in no way affects the prioritization of a particular pull request. Once a pull request has been prioritized for review, the necessary changes will be made by a maintainer -- either directly or in collaboration with the pull request author.

For a more complete description of this refactor, including examples of how old filepaths and function names correspond to their new counterparts: please refer to issue #20000.

For a quick guide on how to amend your pull request to resolve the merge conflicts resulting from this refactor and bring it in line with our new code patterns: please refer to our Service Package Refactor Pull Request Guide.

@PG-Daniel-Andrews

Any updates on this? It's great functionality, but we can't deploy new pipelines until it's available in Terraform.

@alexnovak

I've slightly modified the code in this diff on a personal branch to get things working here: https://github.com/alexnovak/terraform-provider-aws/tree/adds-dynamic-configuration
This is compliant with the refactor mentioned in the comment above, and I'm currently using it in my local builds to interact with firehose.
I don't want to step on the toes of the original author, but given that this has been hanging for a little while, I'd love to help in any way I can to get some version of this pushed through. Do you still think you'll be able to get around to this, @bozerkins? If so, is there a way we can prioritize giving this diff some feedback @zhelding ? (Apologies to you both for the @'s)

@bozerkins
Author

Thanks for the input! I'll try to get around to this in the coming days :) Would appreciate some help with additional testing after the changes.

@YakDriver YakDriver self-assigned this Nov 4, 2021
@YakDriver
Member

YakDriver commented Nov 4, 2021

@bozerkins Thank you for your contribution!! I would like to help out with getting this up to speed post-refactor. I may force push to your branch. If so, please pull before pushing any changes. What we really need are acceptance tests.

@github-actions github-actions bot added service/firehose Issues and PRs that pertain to the firehose service. and removed service/kinesis Issues and PRs that pertain to the kinesis service. labels Nov 4, 2021
@YakDriver
Member

@bozerkins I've updated your branch so it should be working. Are you willing/able to add acceptance tests?

@alexnovak I looked at your changes. I like how you simplified but I think it's best if we stick with the structure as given by AWS. If they make changes later, it's easier if we're already aligned. Also, if we're aligned, the AWS Go SDK docs are like our docs and vice versa.

@YakDriver YakDriver added the waiting-response Maintainers are waiting on response from community or contributor. label Nov 4, 2021
@alexnovak

Thanks for your feedback @YakDriver! I went forward with that change for two reasons:

  1. Using the diff as-is, when planning / applying after the resource is created, Terraform will incorrectly report an update if you don't specify retry_options in your code. We cannot (to my knowledge! Let me know if I'm wrong) specify a default for a schema-type object like retry_options, meaning it either has to be marked as required to circumvent the problem, or we simply permit the "unnecessary update" pattern.
  2. I noticed a number of other objects within this file require a RetryOptions block (e.g. the redshift configuration). In all of those examples, the author chose to bring retry_duration up to the parent instead of creating a separate retry_options block. It seemed more consistent with the patterns established elsewhere in the file.

I'm totally up for whatever pattern you feel is more appropriate! Just want to give some color as to why I thought those changes were necessary.

@github-actions github-actions bot removed the waiting-response Maintainers are waiting on response from community or contributor. label Nov 4, 2021
@alexnovak

Lol sorry - I just got back from vacation but looks like @YakDriver 's got this.

@github-actions github-actions bot added tests PRs: expanded test coverage. Issues: expanded coverage, enhancements to test infrastructure. size/L Managed by automation to categorize the size of a PR. and removed size/M Managed by automation to categorize the size of a PR. tests PRs: expanded test coverage. Issues: expanded coverage, enhancements to test infrastructure. labels Nov 17, 2021
@github-actions github-actions bot added tests PRs: expanded test coverage. Issues: expanded coverage, enhancements to test infrastructure. size/XL Managed by automation to categorize the size of a PR. and removed size/L Managed by automation to categorize the size of a PR. tests PRs: expanded test coverage. Issues: expanded coverage, enhancements to test infrastructure. labels Nov 18, 2021
@github-actions github-actions bot added the tests PRs: expanded test coverage. Issues: expanded coverage, enhancements to test infrastructure. label Nov 18, 2021
@YakDriver
Member

YakDriver commented Nov 18, 2021

@adutane PR #7026 added error_output_prefix to the extended_s3_configuration. In testing, dynamic partitioning appears to work correctly using it. s3_configuration does not have error_output_prefix but should not need it in order for dynamic partitioning to work.
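As a sketch, the error_output_prefix mentioned above sits alongside prefix in the extended_s3_configuration; the values here are illustrative, not taken from this PR (the `!{firehose:error-output-type}` namespace is the Firehose placeholder for the error type):

```hcl
extended_s3_configuration {
  # role_arn and bucket_arn omitted for brevity; prefixes below are illustrative
  prefix              = "data/customerId=!{partitionKeyFromQuery:customerId}/"
  error_output_prefix = "errors/!{firehose:error-output-type}/"
}
```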

Member

@YakDriver YakDriver left a comment

Awesome job! 🎉

Output from acceptance tests (us-west-2):

% make testacc TESTS='TestAccFirehoseDeliveryStream' PKG=firehose
==> Checking that code complies with gofmt requirements...
TF_ACC=1 go test ./internal/service/firehose/... -v -count 1 -parallel 20 -run='TestAccFirehoseDeliveryStream' -timeout 180m
--- PASS: TestAccFirehoseDeliveryStream_ExtendedS3_kinesisStreamSource (81.54s)
--- PASS: TestAccFirehoseDeliveryStream_ExtendedS3DataFormatConversionOpenXJSONSerDe_empty (106.58s)
--- PASS: TestAccFirehoseDeliveryStream_missingProcessing (109.65s)
--- PASS: TestAccFirehoseDeliveryStream_ExtendedS3Processing_empty (110.35s)
--- PASS: TestAccFirehoseDeliveryStream_ExtendedS3DataFormatConversionParquetSerDe_empty (117.16s)
--- PASS: TestAccFirehoseDeliveryStream_HTTPEndpoint_retryDuration (120.47s)
--- PASS: TestAccFirehoseDeliveryStreamDataSource_basic (138.24s)
--- PASS: TestAccFirehoseDeliveryStream_ExtendedS3DataFormatConversionDeserializer_update (140.39s)
--- PASS: TestAccFirehoseDeliveryStream_extendedS3KMSKeyARN (140.95s)
--- PASS: TestAccFirehoseDeliveryStream_ExtendedS3DataFormatConversionOrcSerDe_empty (141.61s)
--- PASS: TestAccFirehoseDeliveryStream_ExtendedS3_errorOutputPrefix (143.41s)
--- PASS: TestAccFirehoseDeliveryStream_ExtendedS3DataFormatConversionSerializer_update (147.56s)
--- PASS: TestAccFirehoseDeliveryStream_splunkUpdates (149.52s)
--- PASS: TestAccFirehoseDeliveryStream_extendedS3Updates (172.49s)
--- PASS: TestAccFirehoseDeliveryStream_httpEndpoint (176.82s)
--- PASS: TestAccFirehoseDeliveryStream_extendedS3DynamicPartitioning (186.68s)
--- PASS: TestAccFirehoseDeliveryStream_s3KinesisStreamSource (72.53s)
--- PASS: TestAccFirehoseDeliveryStream_ExtendedS3DataFormatConversionHiveJSONSerDe_empty (130.30s)
--- PASS: TestAccFirehoseDeliveryStream_s3WithCloudWatchLogging (97.64s)
--- PASS: TestAccFirehoseDeliveryStream_basic (90.19s)
--- PASS: TestAccFirehoseDeliveryStream_ExtendedS3_externalUpdate (108.36s)
--- PASS: TestAccFirehoseDeliveryStream_s3Updates (140.55s)
--- PASS: TestAccFirehoseDeliveryStream_disappears (116.98s)
--- PASS: TestAccFirehoseDeliveryStream_s3basicWithTags (152.89s)
--- PASS: TestAccFirehoseDeliveryStream_ExtendedS3DataFormatConversion_enabled (150.63s)
--- PASS: TestAccFirehoseDeliveryStream_s3basic (119.51s)
--- PASS: TestAccFirehoseDeliveryStream_extendedS3basic (197.18s)
--- PASS: TestAccFirehoseDeliveryStream_redshiftUpdates (379.67s)
--- PASS: TestAccFirehoseDeliveryStream_s3basicWithSSEAndKeyType (271.87s)
--- PASS: TestAccFirehoseDeliveryStream_s3basicWithSSEAndKeyARN (272.37s)
--- PASS: TestAccFirehoseDeliveryStream_s3basicWithSSE (299.28s)
--- PASS: TestAccFirehoseDeliveryStream_elasticSearchUpdates (1099.87s)
--- PASS: TestAccFirehoseDeliveryStream_elasticSearchEndpointUpdates (1176.48s)
--- PASS: TestAccFirehoseDeliveryStream_elasticSearchWithVPCUpdates (1744.71s)
PASS
ok  	github.com/hashicorp/terraform-provider-aws/internal/service/firehose	1745.955s

Output from acceptance tests (GovCloud):

% make testacc TESTS=TestAccFirehoseDeliveryStream PKG=firehose 
==> Checking that code complies with gofmt requirements...
TF_ACC=1 go test ./internal/service/firehose/... -v -count 1 -parallel 20 -run='TestAccFirehoseDeliveryStream' -timeout 180m
--- PASS: TestAccFirehoseDeliveryStream_missingProcessing (81.86s)
--- PASS: TestAccFirehoseDeliveryStream_ExtendedS3Processing_empty (82.81s)
--- PASS: TestAccFirehoseDeliveryStreamDataSource_basic (84.57s)
--- PASS: TestAccFirehoseDeliveryStream_extendedS3basic (95.13s)
--- PASS: TestAccFirehoseDeliveryStream_extendedS3KMSKeyARN (99.66s)
--- PASS: TestAccFirehoseDeliveryStream_ExtendedS3DataFormatConversionOpenXJSONSerDe_empty (100.03s)
--- PASS: TestAccFirehoseDeliveryStream_s3WithCloudWatchLogging (102.67s)
--- PASS: TestAccFirehoseDeliveryStream_ExtendedS3DataFormatConversionParquetSerDe_empty (103.74s)
--- PASS: TestAccFirehoseDeliveryStream_ExtendedS3DataFormatConversionHiveJSONSerDe_empty (104.65s)
--- PASS: TestAccFirehoseDeliveryStream_HTTPEndpoint_retryDuration (111.84s)
--- PASS: TestAccFirehoseDeliveryStream_ExtendedS3DataFormatConversionOrcSerDe_empty (112.50s)
--- PASS: TestAccFirehoseDeliveryStream_ExtendedS3_errorOutputPrefix (120.76s)
--- PASS: TestAccFirehoseDeliveryStream_ExtendedS3DataFormatConversionDeserializer_update (125.89s)
--- PASS: TestAccFirehoseDeliveryStream_ExtendedS3_externalUpdate (126.74s)
--- PASS: TestAccFirehoseDeliveryStream_s3Updates (128.97s)
--- PASS: TestAccFirehoseDeliveryStream_extendedS3DynamicPartitioning (135.72s)
--- PASS: TestAccFirehoseDeliveryStream_ExtendedS3DataFormatConversion_enabled (138.36s)
--- PASS: TestAccFirehoseDeliveryStream_ExtendedS3DataFormatConversionSerializer_update (141.30s)
--- PASS: TestAccFirehoseDeliveryStream_s3KinesisStreamSource (76.67s)
--- PASS: TestAccFirehoseDeliveryStream_s3basic (64.86s)
--- PASS: TestAccFirehoseDeliveryStream_ExtendedS3_kinesisStreamSource (75.70s)
--- PASS: TestAccFirehoseDeliveryStream_s3basicWithSSEAndKeyType (197.20s)
--- PASS: TestAccFirehoseDeliveryStream_disappears (97.52s)
--- PASS: TestAccFirehoseDeliveryStream_s3basicWithTags (123.60s)
--- PASS: TestAccFirehoseDeliveryStream_splunkUpdates (123.37s)
--- PASS: TestAccFirehoseDeliveryStream_basic (104.76s)
--- PASS: TestAccFirehoseDeliveryStream_httpEndpoint (131.76s)
--- PASS: TestAccFirehoseDeliveryStream_extendedS3Updates (127.34s)
--- PASS: TestAccFirehoseDeliveryStream_s3basicWithSSEAndKeyARN (193.12s)
--- PASS: TestAccFirehoseDeliveryStream_s3basicWithSSE (225.62s)
--- PASS: TestAccFirehoseDeliveryStream_redshiftUpdates (315.93s)
--- PASS: TestAccFirehoseDeliveryStream_elasticSearchEndpointUpdates (953.68s)
--- PASS: TestAccFirehoseDeliveryStream_elasticSearchUpdates (1032.05s)
--- PASS: TestAccFirehoseDeliveryStream_elasticSearchWithVPCUpdates (3633.01s)
PASS
ok  	github.com/hashicorp/terraform-provider-aws/internal/service/firehose	3634.619s

@YakDriver YakDriver added this to the v3.66.0 milestone Nov 18, 2021
@YakDriver YakDriver merged commit 889a927 into hashicorp:main Nov 18, 2021
@YakDriver
Member

@Olaktal This should be part of v3.66.0 which should be released around Nov. 18-19.

@bozerkins bozerkins deleted the f-aws_kinesis_firehouse_delivery_stream-dynamic_paritioning branch November 18, 2021 14:59
@github-actions

This functionality has been released in v3.66.0 of the Terraform AWS Provider. Please see the Terraform documentation on provider versioning or reach out if you need any assistance upgrading.

For further feature requests or bug reports with this functionality, please create a new GitHub issue following the template. Thank you!

@github-actions

github-actions bot commented Jun 8, 2022

I'm going to lock this pull request because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems related to this change, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jun 8, 2022
Labels
enhancement Requests to existing resources that expand the functionality or scope. service/firehose Issues and PRs that pertain to the firehose service. size/XL Managed by automation to categorize the size of a PR. tests PRs: expanded test coverage. Issues: expanded coverage, enhancements to test infrastructure.

Successfully merging this pull request may close these issues.

aws_kinesis_firehose_delivery_stream to support Dynamic Partitioning