Fix UTF-8 encoding in sync-members.py #105

Merged
Alexanderamiri merged 1 commit into main from fix/sync-utf8-encoding on Mar 26, 2026

Conversation

@Alexanderamiri
Member

Norwegian characters were corrupted to ? in the generated members.yaml. This PR sets an explicit encoding='utf-8' on file I/O.

Explicitly set encoding="utf-8" on file read/write to prevent
Norwegian characters (ø, å, æ) from being corrupted on CI runners
where the system locale may not default to UTF-8.
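
A minimal sketch of the pattern described above, for illustration only. The actual contents of sync-members.py are not shown on this page, so the helper names (`write_members`, `read_members`, `lossy_ascii`) and the sample names are hypothetical. The last helper shows one way a `?` can end up in output when text is forced through a non-UTF-8 encoding; whether that is the exact mechanism on the CI runners is an assumption.

```python
import tempfile
from pathlib import Path


# Hypothetical helpers; the real sync-members.py may be structured differently.
def write_members(path: Path, names: list[str]) -> None:
    # Explicit encoding: do not depend on the runner's locale default.
    with open(path, "w", encoding="utf-8") as f:
        for name in names:
            f.write(f"- name: {name}\n")


def read_members(path: Path) -> str:
    # Read back with the same explicit encoding used for writing.
    with open(path, "r", encoding="utf-8") as f:
        return f.read()


def lossy_ascii(text: str) -> str:
    # Illustration only: encoding to ASCII with errors="replace"
    # substitutes "?" for every character ASCII cannot represent.
    return text.encode("ascii", errors="replace").decode("ascii")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        members_file = Path(tmp) / "members.yaml"
        # Made-up sample names containing ø, å and æ.
        write_members(members_file, ["Bjørn Åsen", "Kjærsti Næss"])
        print(read_members(members_file))
        print(lossy_ascii("Bjørn Åsen"))
```

With the encoding passed explicitly, the round trip gives the same text regardless of the locale the process runs under.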
Alexanderamiri requested a review from a team as a code owner on March 26, 2026, 22:08
@github-actions

Terraform Plan

🚧 Changes detected — Plan: 5 to add, 0 to change, 0 to destroy.

Plan output

Terraform used the selected providers to generate the following execution
plan. Resource actions are indicated with the following symbols:
  + create

Terraform will perform the following actions:

  # module.cost_analytics.aws_bcmdataexports_export.cur will be created
  + resource "aws_bcmdataexports_export" "cur" {
      + id       = (known after apply)
      + tags_all = {
          + "environment" = "production"
          + "managed-by"  = "terraform"
          + "repo"        = "javaBin/platform"
          + "service"     = "platform"
          + "team"        = "platform"
        }

      + export {
          + export_arn = (known after apply)
          + name       = "javabin-cur"

          + data_query {
              + query_statement      = "SELECT identity_line_item_id, identity_time_interval, bill_invoice_id, bill_invoicing_entity, bill_billing_entity, bill_bill_type, bill_payer_account_id, bill_billing_period_start_date, bill_billing_period_end_date, line_item_usage_account_id, line_item_line_item_type, line_item_usage_start_date, line_item_usage_end_date, line_item_product_code, line_item_usage_type, line_item_operation, line_item_availability_zone, line_item_resource_id, line_item_usage_amount, line_item_normalization_factor, line_item_normalized_usage_amount, line_item_currency_code, line_item_unblended_rate, line_item_unblended_cost, line_item_blended_rate, line_item_blended_cost, line_item_line_item_description, product_product_name, product_region, pricing_unit, pricing_public_on_demand_cost, pricing_public_on_demand_rate, pricing_term, pricing_offering_class, resource_tags_user_team, resource_tags_user_service, resource_tags_user_environment, resource_tags_user_repo, resource_tags_user_managed_by FROM COST_AND_USAGE_REPORT"
              + table_configurations = {
                  + "COST_AND_USAGE_REPORT" = {
                      + "INCLUDE_MANUAL_DISCOUNT_COMPATIBILITY" = "FALSE"
                      + "INCLUDE_RESOURCES"                     = "TRUE"
                      + "INCLUDE_SPLIT_COST_ALLOCATION_DATA"    = "FALSE"
                      + "TIME_GRANULARITY"                      = "DAILY"
                    }
                }
            }

          + destination_configurations {
              + s3_destination {
                  + s3_bucket = "javabin-cur-553637109631"
                  + s3_prefix = "cur"
                  + s3_region = "eu-central-1"

                  + s3_output_configurations {
                      + compression = "PARQUET"
                      + format      = "PARQUET"
                      + output_type = "CUSTOM"
                      + overwrite   = "OVERWRITE_REPORT"
                    }
                }
            }

          + refresh_cadence {
              + frequency = "SYNCHRONOUS"
            }
        }
    }

  # module.cost_analytics.aws_glue_crawler.cur will be created
  + resource "aws_glue_crawler" "cur" {
      + arn           = (known after apply)
      + configuration = jsonencode(
            {
              + Grouping = {
                  + TableGroupingPolicy = "CombineCompatibleSchemas"
                }
              + Version  = 1
            }
        )
      + database_name = "javabin_cur"
      + id            = (known after apply)
      + name          = "javabin-cur-crawler"
      + role          = (known after apply)
      + schedule      = "cron(0 6 * * ? *)"
      + tags_all      = {
          + "environment" = "production"
          + "managed-by"  = "terraform"
          + "repo"        = "javaBin/platform"
          + "service"     = "platform"
          + "team"        = "platform"
        }

      + s3_target {
          + path = "s3://javabin-cur-553637109631/cur/"
        }

      + schema_change_policy {
          + delete_behavior = "DELETE_FROM_DATABASE"
          + update_behavior = "UPDATE_IN_DATABASE"
        }
    }

  # module.cost_analytics.aws_iam_role.cur_crawler will be created
  + resource "aws_iam_role" "cur_crawler" {
      + arn                   = (known after apply)
      + assume_role_policy    = jsonencode(
            {
              + Statement = [
                  + {
                      + Action    = "sts:AssumeRole"
                      + Effect    = "Allow"
                      + Principal = {
                          + Service = "glue.amazonaws.com"
                        }
                    },
                ]
              + Version   = "2012-10-17"
            }
        )
      + create_date           = (known after apply)
      + force_detach_policies = false
      + id                    = (known after apply)
      + managed_policy_arns   = (known after apply)
      + max_session_duration  = 3600
      + name                  = "javabin-cur-crawler"
      + name_prefix           = (known after apply)
      + path                  = "/"
      + tags_all              = {
          + "environment" = "production"
          + "managed-by"  = "terraform"
          + "repo"        = "javaBin/platform"
          + "service"     = "platform"
          + "team"        = "platform"
        }
      + unique_id             = (known after apply)
    }

  # module.cost_analytics.aws_iam_role_policy.cur_crawler_s3 will be created
  + resource "aws_iam_role_policy" "cur_crawler_s3" {
      + id          = (known after apply)
      + name        = "javabin-cur-crawler-s3"
      + name_prefix = (known after apply)
      + policy      = jsonencode(
            {
              + Statement = [
                  + {
                      + Action   = [
                          + "s3:GetObject",
                          + "s3:ListBucket",
                        ]
                      + Effect   = "Allow"
                      + Resource = [
                          + "arn:aws:s3:::javabin-cur-553637109631",
                          + "arn:aws:s3:::javabin-cur-553637109631/*",
                        ]
                      + Sid      = "ReadCURData"
                    },
                ]
              + Version   = "2012-10-17"
            }
        )
      + role        = (known after apply)
    }

  # module.cost_analytics.aws_iam_role_policy_attachment.cur_crawler_glue will be created
  + resource "aws_iam_role_policy_attachment" "cur_crawler_glue" {
      + id         = (known after apply)
      + policy_arn = "arn:aws:iam::aws:policy/service-role/AWSGlueServiceRole"
      + role       = "javabin-cur-crawler"
    }

Plan: 5 to add, 0 to change, 0 to destroy.

─────────────────────────────────────────────────────────────────────────────

Saved the plan to: tfplan

To perform exactly these actions, run the following command to apply:
    terraform apply "tfplan"

LLM Review

Risk: 🟢 LOW

Adding AWS Glue crawler and BCM data export infrastructure for cost analytics reporting with no destructive changes.

  • [routine] Creating AWS Glue crawler (javabin-cur-crawler) to automatically catalog CUR data from S3 daily at 6 AM UTC - standard cost analytics infrastructure
  • [routine] Adding BCM Data Exports resource to replace legacy CUR setup with new AWS native cost export format (PARQUET) - modernization of cost data pipeline
  • [routine] Creating IAM role and policies for Glue crawler with minimal permissions (S3 read-only on CUR bucket + standard AWSGlueServiceRole) - properly scoped
  • 💰 [cost] New AWS Glue crawler will incur minimal costs (~$0.44/DPU-hour, runs daily). The BCM Data Exports export is not expected to add a separate charge; the exported Parquet files incur standard S3 storage costs - low cost impact for cost analytics
  • [routine] All resources properly tagged with environment, team, service, repo, and managed-by tags - consistent with existing infrastructure standards
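
To put the crawler cost bullet in numbers, here is a rough back-of-the-envelope sketch. Only the $0.44/DPU-hour rate and the daily schedule come from the review above; the DPU count, the run duration, and the 10-minute billing minimum per run are assumptions to check against current AWS Glue pricing, and `monthly_crawler_cost` is a hypothetical helper.

```python
GLUE_RATE_PER_DPU_HOUR = 0.44  # rate quoted in the review comment


def monthly_crawler_cost(
    dpus: float,
    minutes_per_run: float,
    runs_per_month: int = 30,      # daily schedule, cron(0 6 * * ? *)
    min_billed_minutes: float = 10,  # assumed per-run billing minimum
    rate: float = GLUE_RATE_PER_DPU_HOUR,
) -> float:
    # Each run is billed for at least the assumed minimum duration.
    billed_minutes = max(minutes_per_run, min_billed_minutes)
    return dpus * (billed_minutes / 60) * rate * runs_per_month


if __name__ == "__main__":
    # Assumed inputs: 2 DPUs, 5-minute runs (billed as 10 minutes).
    print(f"${monthly_crawler_cost(dpus=2, minutes_per_run=5):.2f} per month")
```

Under those assumed inputs the estimate comes to a few dollars per month, which is consistent with the "minimal costs" assessment.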

Alexanderamiri merged commit 55ca350 into main on Mar 26, 2026
3 checks passed
Alexanderamiri deleted the fix/sync-utf8-encoding branch on March 26, 2026, 22:09
Alexanderamiri added a commit that referenced this pull request on May 9, 2026
Norwegian characters were corrupted to ? in the generated members.yaml.
Explicit encoding='utf-8' on file I/O.