SpeedRun #1: Simple Extract-Load Pipeline #55

Open · 18 of 21 tasks
aaronsteers opened this issue Feb 28, 2020 · 4 comments

aaronsteers commented Feb 28, 2020

As a training tool, as a test for ease-of-use, and as proof of value, we're creating a "speed run" video that demonstrates how to get up and running quickly with the Infrastructure Catalog and a basic DataOps pipeline. This will uncover usability issues and bugs which we'll need to resolve before we can promote the platform broadly.


Stop Point:

  • All infrastructure is deployed, including secrets uploaded to AWS Secrets Manager
  • Data extraction backfill is in progress
  • Daily extracts are scheduled
  • Basic logging is configured via CloudWatch

Start Point:

  • one-time setup:
    • Installed software:

      • choco install vscode python3 docker awscli github-desktop
      • choco install git.install --params "/GitOnlyOnPath /SChannel /NoAutoCrlf /WindowsTerminal"
    • Access to Linux Academy, which will be used to create a new 4-hour limited AWS account

  • environment setup (each time):
    • Hourglass app open with timer title: SpeedRun Goal: 12 Minutes
    • Open browser tabs:

Speed Target: 12 minutes


Other Details:

  • Must stay on each screen for at least 3 seconds
  • Scale browser windows and VS Code to: 150%
  • Pardot and AWS Creds will be blurred in 'post' using Camtasia

Blockers:

  • None!

Steps:

Create Repo and AWS Account (0:00-2:00, approx. 2m):

  • Create new repo from Slalom DataOps template, clone repo locally and open in VS Code (60s)
  • Get AWS credentials from Linux Academy (target: 30s)
  • Use the Linux Academy link to log in to AWS in the web browser (30s)

Configure Creds (2:00-3:30, approx. 1.5m):

  • Rename .secrets/credentials.template to .secrets/credentials, copy-paste credentials into file (30s)
  • Rename aws-secrets-manager-secrets.yml.template to aws-secrets-manager-secrets.yml, copy-paste Pardot credentials into new file (30s)
  • Rename .secrets/tap-sample-config.json.template to tap-pardot-config.json, copy-paste Pardot credentials into the file (30s; example file layouts below)
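
For reference, the renamed files end up looking roughly like this. Values are placeholders, and the Pardot key names (email, password, user_key) are assumed from the secrets mapping shown later in this issue rather than confirmed against the template.

.secrets/credentials (standard AWS CLI credentials-file format):

    [default]
    aws_access_key_id = AKIAXXXXXXXXXXXXXXXX
    aws_secret_access_key = xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

.secrets/tap-pardot-config.json:

    {
      "email": "user@example.com",
      "password": "<pardot-password>",
      "user_key": "<pardot-user-key>"
    }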

Configure Project (3:30-4:00, approx 0.5m):

  • Rename infra-config-template.yml to infra-config.yml - update email address and project name: SpeedRun003-n (30s; sketch below)
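
A minimal sketch of what infra-config.yml might contain, assuming only the two fields this step touches (the real field names come from the template itself):

    # infra-config.yml (hypothetical field names)
    project_shortname: SpeedRun003-1
    email: user@example.com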

Configure Extracts (4:00-6:00, approx. 2m):

  • Copy Pardot credentials file (tap-pardot-config.json) to the data/taps/.secrets folder (15s)
  • Open data/taps/data.select, delete Salesforce refs, update rules to include all columns on Pardot accounts and opportunities (45s; see the sketch after this list)
  • Install dataops tools: pip3 install slalom.dataops (15s)
  • Run s-tap plan pardot to update Pardot extract plan (15s)
  • Review the updated plan file (30s)
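
For illustration, the updated data.select might end up with rules along these lines. The exact rule syntax is defined by the template; this sketch assumes a glob-style tap.table.column format:

    # data/taps/data.select (hypothetical rules)
    tap-pardot.accounts.*
    tap-pardot.opportunities.*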

Configure and Deploy Terraform (6:00-10:30, approx. 4.5m):

  • Open the infra folder, review and update each file (90s)
  • Run terraform init and terraform apply, type 'yes' (60s; commands below)
  • Wait for terraform apply to complete (2m)
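
The Terraform step is the standard workflow, run from the infra folder:

    cd infra
    terraform init     # download providers and modules
    terraform apply    # review the plan, then type 'yes' to confirm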

Run a Sync Test (10:30-14:30, approx. 4m):

  • Switch to git tab, browse through all changes (30s)
  • Copy-paste and run the provided AWS User Switch command so aws-cli can locate our credentials (15s)
  • Copy-paste and run the provided Sync command to execute the Pardot sync in ECS (60s)
  • Click on the provided Logging URL link to open Cloudwatch logs in a browser (15s)
  • Wait for the ECS task to start (1-2m)
  • Stop the timer once rows are extracted (DONE!)
aaronsteers self-assigned this Feb 28, 2020
aaronsteers commented Feb 28, 2020

Speed-run test log:

  • Run #1 on 2/24: 5m, blocked on steps 3 and 4
  • Run #2 on 2/24: 20m, blocked on step 7

aaronsteers mentioned this issue Feb 28, 2020
aaronsteers added this to the DE Training Capstone milestone Feb 28, 2020
aaronsteers commented Feb 28, 2020

Results for Run #3 on 2/28 - 36 minutes!

  • Got through all the steps but the job failed at the final validation step due to a secrets access issue
  • Some step ordering was awkward so I'm going to review the flow.
  • Terraform init, plan, and apply all took longer than expected
  • Git didn't load in VS Code at first, for some bizarre reason
  • I forgot that I needed to create the Pardot creds/config file locally in order to run s-tap plan pardot (need to add a task)
  • Font was too large in VS Code, eventually scaled down by one tick.
  • When an ECS task fails to start, it doesn't emit anything to CloudWatch.
    • Is there a way to add logging to the cluster itself so that there is still a log created somewhere in CloudWatch?
  • It would be more efficient if we could watch ECS execution (CloudWatch logs) from the command line (see note below)
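
On that last point: AWS CLI v2 can tail a CloudWatch log group from the terminal, which might cover this; the log group name here is hypothetical:

    aws logs tail /ecs/pardot-sync --follow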

Update: debugging the failure:

  • It appears the failure was due to the secrets mapping, which isn't really a surprise given the attempted nesting of secrets into a single object. The fix is simply to declare each secret separately in the yaml file (sketch below). I'm also realizing that if I add support for json-encoded secrets files, the same file can be used by Secrets Manager as is used by the tap locally.
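
For illustration, the before/after in aws-secrets-manager-secrets.yml would look roughly like this (key names mirror the tap config; the real schema comes from the template):

    # before (failed): secrets nested into a single object
    pardot:
      email: "..."
      password: "..."
      user_key: "..."

    # after (works): each secret declared separately
    pardot_email: "..."
    pardot_password: "..."
    pardot_user_key: "..."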

aaronsteers commented Mar 3, 2020

Results for Run #4 on 3/2 - 22 minutes!

  • 4 or 5 minutes at the end were spent waiting for terraform plan and apply (a minute or two of that on the NAT gateway).
  • Another couple minutes were spent fiddling with secrets and creds files.

Other learnings:

  • It would be really nice to not need the secrets-manager catalog entry, and more specifically, to not have to map pointers to pointers. Instead, the singer-taps catalog module could natively upload secrets from a local file, as in the example below.

Example:

module "singer-taps" {
  # ...
  taps = [{
    id = "pardot"
    settings = {}
    secrets = {
      email    = "file://${secrets_filepath}:email"
      password = "file://${secrets_filepath}:password"
      user_key = "file://${secrets_filepath}:user_key"
    }
  }]
  # ...
