Unimpliment coalesce from spark script and make necessary changes to spark runner #454

Sami1309 · 2022-09-13T22:40:06Z

Description

Removes the .coalesce call made when saving resources in the Spark script. Resources are now saved as multiple part files instead of being merged into a single part. This requires changing how iterators read through files and how a resource is defined in the offline store: a resource is now the parent directory containing all the parts, not a single Parquet file.
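To illustrate the new resource definition, here is a minimal sketch (not the code in this PR) of treating a resource as a directory of part files. It assumes JSON-lines part files as a stand-in for Parquet so that it runs with the standard library only; `list_part_files`, `iter_rows`, `count_rows`, and `write_demo_resource` are hypothetical names. Spark writes a `_SUCCESS` marker and may write hidden checksum files next to the parts, so the sketch skips names starting with `_` or `.`.

```python
import json
import os
import tempfile


def list_part_files(resource_dir):
    """Return the data files under resource_dir, searched recursively and sorted."""
    parts = []
    for root, _dirs, files in os.walk(resource_dir):
        for name in files:
            # Skip marker and checksum files such as _SUCCESS and .part-0.crc.
            if name.startswith(("_", ".")):
                continue
            parts.append(os.path.join(root, name))
    return sorted(parts)


def iter_rows(resource_dir):
    """Yield every row of the resource, one part file after another."""
    for path in list_part_files(resource_dir):
        with open(path, encoding="utf-8") as f:
            for line in f:
                if line.strip():
                    yield json.loads(line)


def count_rows(resource_dir):
    """The row count of a resource is the sum over all of its parts."""
    return sum(1 for _ in iter_rows(resource_dir))


def write_demo_resource(resource_dir):
    """Create a fake three-part resource (assumed layout, for the demo only)."""
    os.makedirs(resource_dir, exist_ok=True)
    rows = [{"id": i} for i in range(5)]
    chunks = [rows[0:2], rows[2:4], rows[4:5]]
    for n, chunk in enumerate(chunks):
        part_path = os.path.join(resource_dir, f"part-{n:05d}.json")
        with open(part_path, "w", encoding="utf-8") as f:
            for row in chunk:
                f.write(json.dumps(row) + "\n")
    # Spark-style success marker; it must not be read as data.
    open(os.path.join(resource_dir, "_SUCCESS"), "w").close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        resource = os.path.join(tmp, "resource")
        write_demo_resource(resource)
        print(len(list_part_files(resource)), count_rows(resource))
```

The point of the sketch is that callers pass the directory path everywhere a single file path was used before, and both iteration and row counting fan out over the parts.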

Type of change

Does this correspond to an open issue?

Select type(s) of change

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to not work as expected)
Documentation update

Checklist:

I have performed a self-review of my code
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation
My changes generate no new warnings
I have added tests that prove my fix is effective or that my feature works
New and existing unit tests pass locally with my changes
I have fixed any merge conflicts

codecov · 2022-09-13T22:46:40Z

Codecov Report

Merging #454 (dab2858) into main (a017707) will decrease coverage by 0.60%.
The diff coverage is 0.00%.

@@            Coverage Diff             @@
##             main     #454      +/-   ##
==========================================
- Coverage   61.58%   60.97%   -0.61%     
==========================================
  Files          31       31              
  Lines        9188     9279      +91     
==========================================
  Hits         5658     5658              
- Misses       3035     3126      +91     
  Partials      495      495

| Impacted Files | Coverage Δ |
|---|---|
| coordinator/coordinator.go | 40.54% <0.00%> (+0.09%) ⬆️ |
| provider/spark.go | 0.15% <0.00%> (-0.02%) ⬇️ |

provider/spark.go

sdreyer · 2022-09-16T21:50:46Z

@Sami1309 can you run this with the tests @ahmadnazeri wrote and verify those still work?

…spark runner (#454)

* make changes to avoid needing coalesce
* extend resource row count to work with multipart resources
* add recursive file lookup to pyspark script
* fix iterator issues and add test
* fix issues with SQL transformation testing
* comment out unecessary tests
* fix file path issue
* fix row count issue distinguishing on transformation or primary
* change row skip for iteration to use built in function/skip files

make changes to avoid needing coalesce

b794bd4

Sami1309 changed the title ~~make changes to avoid needing coalesce~~ Unimpliment coalesce from spark script and make necessary changes to spark runner Sep 13, 2022

Sami1309 had a problem deploying to Integration testing September 13, 2022 22:43 Error

Sami1309 had a problem deploying to Integration testing September 13, 2022 22:43 Failure

Sami1309 had a problem deploying to Integration testing September 13, 2022 22:43 Error

Merge branch 'main' into feature/remove-coalesce/sam

e60a799

Sami1309 had a problem deploying to Integration testing September 13, 2022 22:51 Failure

extend resource row count to work with multipart resources

b71581d

Sami1309 had a problem deploying to Integration testing September 13, 2022 23:08 Failure

Sami1309 had a problem deploying to Integration testing September 13, 2022 23:09 Failure

add recursive file lookup to pyspark script

3f4b898

Sami1309 had a problem deploying to Integration testing September 14, 2022 00:35 Failure

fix iterator issues and add test

a3348ac

Sami1309 temporarily deployed to Integration testing September 15, 2022 18:36 Inactive

Sam Inloes added 2 commits September 15, 2022 12:21

fix issues with SQL transformation testing

a5b99a7

comment out unecessary tests

ae3bab7

Sami1309 temporarily deployed to Integration testing September 15, 2022 19:26 Inactive

fix row count issue distinguishing on transformation or primary

eed5354

Sami1309 marked this pull request as ready for review September 15, 2022 21:43

Sami1309 requested review from ahmadnazeri and sdreyer and removed request for ahmadnazeri September 15, 2022 21:44

Sami1309 temporarily deployed to Integration testing September 15, 2022 21:46 Inactive

sdreyer reviewed Sep 15, 2022

change row skip for iteration to use built in function/skip files

dab2858
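The commit message above mentions skipping rows with a built-in function and skipping whole files. The Go change itself is not shown in this conversation, so the following is only a sketch of that idea in Python, under the assumption that each part's row count is known up front. `iter_from` and the `(row_count, open_part)` pair layout are hypothetical.

```python
from itertools import islice


def iter_from(parts, start):
    """Yield rows of a multi-part resource, beginning at row index `start`.

    `parts` is a list of (row_count, open_part) pairs, where open_part()
    returns an iterator over that part's rows.
    """
    remaining = start
    for row_count, open_part in parts:
        if remaining >= row_count:
            # The whole part precedes the offset: skip it without opening it.
            remaining -= row_count
            continue
        # Skip the leftover rows inside this part with the built-in islice.
        yield from islice(open_part(), remaining, None)
        remaining = 0


def make_demo_parts(chunks, opened):
    """Wrap in-memory chunks as parts and record which ones get opened."""
    def opener(index, chunk):
        def open_part():
            opened.append(index)
            return iter(chunk)
        return open_part
    return [(len(chunk), opener(i, chunk)) for i, chunk in enumerate(chunks)]


if __name__ == "__main__":
    opened = []
    parts = make_demo_parts([[0, 1], [2, 3], [4]], opened)
    print(list(iter_from(parts, 3)), opened)
```

Skipping by part first avoids reading files that lie entirely before the requested offset; only the first relevant part needs an in-file skip.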

Sami1309 temporarily deployed to Integration testing September 15, 2022 23:57 Inactive

Sami1309 temporarily deployed to Deployment September 16, 2022 23:20 Inactive

sdreyer approved these changes Sep 17, 2022

Sami1309 merged commit 1a202b4 into main Sep 17, 2022

Sami1309 deleted the feature/remove-coalesce/sam branch September 17, 2022 01:30

anthonylasso mentioned this pull request Dec 7, 2023

Truncate long form errors #1205

Merged

11 tasks
