
WIP: Lambda calling Lambda using bert-etl #4

Merged (8 commits into mustaric:master on Nov 20, 2019)

Conversation

@pllim (Collaborator) commented Sep 4, 2019

Lambda-calling-Lambda solution using the third-party bert-etl package by @jbcurtin.

TODO

  • Write in full logic into app_all_lambdas.py.
  • Test locally somehow. (S3 download costs!)
  • Deploy to AWS.
  • Obtain cost estimate with full run of one TESS ID.
    • Full run of one TIC ID across all 13 sectors (800-1000 CSVs produced per sector) cost roughly $0.20 in Lambda charges (the remaining charges did not appear until the next day).
    • Wall time is hard to measure precisely, but judging from the "last entry" timestamps in CloudWatch, the full run took roughly 1.5 hours, which was faster than expected.
  • Investigate why not all full-frame images in a sector produced a CSV. (Unclear from the logs; maybe try a smaller run and see.) Is the worker silently timing out? See "Missing data points with bert-etl" #5.
  • Is caching behavior for URI listing correct? Do we also need to sort it by TIC ID? Need to understand how cubes are organized. (Nope, correct as-is, according to Clara Brasseur.)
  • Implement science logic from Susan.
  • Implement one level above main function to farm out multiple TIC IDs from CSV file in S3.
  • Implement bottling with bert-etl when it is available.
  • Do another full run and estimate the cost and time again. Remember to clean up previous outputs before next deployment. And then wait 48 hours for actual billing report.
  • Benchmark again with bottling fixed!
  • Update flow chart in Box.
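The "farm out multiple TIC IDs from a CSV file in S3" item above could be sketched as follows. This is a minimal sketch, not the PR's implementation: the worker Lambda name and the CSV layout (TIC IDs in the first column) are assumptions, and the `invoke` callable stands in for boto3's `lambda_client.invoke(...)` so the fan-out logic can be exercised without AWS.

```python
import csv
import io
import json

# Hypothetical worker name; the real function name is not stated in this PR.
WORKER_LAMBDA = "bert-etl-tic-worker"


def read_tic_ids(csv_text):
    """Parse TIC IDs from the first column of CSV text (e.g. fetched from S3)."""
    reader = csv.reader(io.StringIO(csv_text))
    return [row[0] for row in reader if row and row[0].strip()]


def farm_out(csv_text, invoke):
    """Invoke the worker Lambda once per TIC ID.

    `invoke(function_name, payload_json)` abstracts the real
    boto3 call so this can be unit-tested locally.
    """
    payloads = []
    for tic_id in read_tic_ids(csv_text):
        payload = {"tic_id": tic_id}
        invoke(WORKER_LAMBDA, json.dumps(payload))
        payloads.append(payload)
    return payloads
```

In production, `invoke` would be something like `lambda name, body: client.invoke(FunctionName=name, InvocationType="Event", Payload=body)` with an asynchronous invocation type, so the dispatcher returns without waiting on each worker.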

cc @mustaric

ref: https://sourceforge.net/p/greataunttess/code/HEAD/tree/trunk/extractlc.py

products, productSubGroupDescription="FFIC", mrp_only=False)

# Use AWS S3 bucket to pull data from.
Observations.enable_cloud_dataset(verbose=False)
@pllim (Collaborator, Author) commented:

Note to self: Do not download/upload from/to S3 when testing locally. Add logic for that.
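The note above asks for logic to skip S3 transfers during local testing. A minimal sketch, assuming an environment-variable flag (`TESS_LOCAL_TEST` is a name invented here, not part of this PR):

```python
import os


def use_s3():
    """Return True when running against real S3.

    TESS_LOCAL_TEST is a hypothetical opt-out flag for local runs.
    """
    return os.environ.get("TESS_LOCAL_TEST", "0") != "1"


def load_ffi(uri, local_path="sample_ffi.fits"):
    """Fetch an FFI from S3 in production; use a local sample file otherwise."""
    if use_s3():
        # Real code would download `uri` from S3 here (e.g. via boto3/astroquery),
        # which is exactly the cost the note warns about during local testing.
        raise NotImplementedError("S3 download goes here")
    return local_path
```

The same flag could guard uploads of output CSVs, so a local test run never touches the bucket.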

@pllim force-pushed the lambda-bert-etl branch 2 times, most recently from f4e4846 to f1bb41b on September 25, 2019 03:05
@pllim force-pushed the lambda-bert-etl branch 3 times, most recently from 4aef4ea to ebe7e56 on October 2, 2019 20:56
sandbox/aws_bucket_stats.py (review thread outdated, resolved)
dec = float(event['dec'])

# TODO: Cache WCS for a given sector/camera/ccd combo.
# TODO: How to make sure WCS is good? 10% of FFIs might have bad WCS.
@pllim (Collaborator, Author) commented:

NOTE: If WCSAXES=2, the WCS is good (source: MIT and C. Brasseur). astropy.wcs.WCS can be used for FITS WCS I/O.
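The two TODOs in the excerpt above (cache a WCS per sector/camera/ccd combo; verify the WCS is good) could be sketched like this. The `WCSAXES == 2` check follows the note above; `_load_header` is a hypothetical stand-in for reading the FITS header, and real code would construct `astropy.wcs.WCS(header)` instead of returning the raw header.

```python
import functools


def wcs_is_good(header):
    """Per the note above, a WCS is considered good when WCSAXES == 2.

    `header` is any dict-like FITS header.
    """
    return header.get("WCSAXES") == 2


@functools.lru_cache(maxsize=None)
def get_wcs(sector, camera, ccd):
    """Cache one WCS per (sector, camera, ccd) combo so repeated events
    for the same cube do not re-read and re-validate the header."""
    header = _load_header(sector, camera, ccd)
    if not wcs_is_good(header):
        raise ValueError(
            f"Bad WCS for sector={sector} camera={camera} ccd={ccd}")
    return header  # real code: return astropy.wcs.WCS(header)


def _load_header(sector, camera, ccd):
    # Placeholder loader: pretend every header is good.
    return {"WCSAXES": 2}
```

Because `functools.lru_cache` keys on the positional arguments, two events for the same sector/camera/ccd get the identical cached object back.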

pllim and others added 7 commits November 12, 2019 12:37
Setup app_all_lambdas to be deployable.
Docker commands.

Split workers into 2 flows; add caching.

Implement science workflow from @mustaric .

More caching and logging. Enable full run.
It takes an input list from @mustaric.
Use new input list.
@pllim pllim merged commit c0a2041 into mustaric:master Nov 20, 2019
@pllim pllim deleted the lambda-bert-etl branch November 20, 2019 19:48