
Unimpliment coalesce from spark script and make necessary changes to spark runner #454

Merged
Sami1309 merged 10 commits into main from feature/remove-coalesce/sam on Sep 17, 2022


Sami1309 (Contributor) commented Sep 13, 2022

Description

Removes the .coalesce() call used when saving resources in the Spark script, so each resource is now saved as multiple part files instead of being merged into a single part. Supporting this required two other changes. Iterators now read through every part file. A resource in the offline store is now defined as the parent directory that contains all of its parts, not a single Parquet file.
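The sketch below shows what "a resource is the parent directory" can mean for code that reads the resource back. It is plain Python for illustration, not the project's actual iterator. list_resource_parts is a hypothetical helper. The sketch assumes Spark's usual output layout: part-*.parquet files next to bookkeeping files such as _SUCCESS and .crc checksums.

```python
import os
from typing import List


def list_resource_parts(resource_dir: str) -> List[str]:
    """Return the data files that make up one multipart resource.

    Hypothetical helper: assumes the resource directory was written by Spark
    without .coalesce(), so it holds several part-*.parquet files plus
    bookkeeping files such as _SUCCESS and .part-*.crc checksums.
    """
    parts = []
    # Sort so parts come back in a stable order (part-00000, part-00001, ...).
    for name in sorted(os.listdir(resource_dir)):
        # Skip Spark/Hadoop bookkeeping files (_SUCCESS, .part-*.crc, ...).
        if name.startswith(("_", ".")):
            continue
        path = os.path.join(resource_dir, name)
        # Keep only regular Parquet files; ignore subdirectories and others.
        if os.path.isfile(path) and name.endswith(".parquet"):
            parts.append(path)
    return parts
```

Because the names are sorted, an iterator can read the parts one after another as if they were a single file.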

Type of change

Does this correspond to an open issue?

Select type(s) of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update

Checklist:

  • I have performed a self-review of my code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • I have fixed any merge conflicts

Sami1309 changed the title from "make changes to avoid needing coalesce" to "Unimpliment coalesce from spark script and make necessary changes to spark runner" Sep 13, 2022
codecov bot commented Sep 13, 2022

Codecov Report

Merging #454 (dab2858) into main (a017707) will decrease coverage by 0.60%.
The diff coverage is 0.00%.

@@            Coverage Diff             @@
##             main     #454      +/-   ##
==========================================
- Coverage   61.58%   60.97%   -0.61%     
==========================================
  Files          31       31              
  Lines        9188     9279      +91     
==========================================
  Hits         5658     5658              
- Misses       3035     3126      +91     
  Partials      495      495              
Impacted Files               Coverage Δ
coordinator/coordinator.go   40.54% <0.00%> (+0.09%) ⬆️
provider/spark.go            0.15% <0.00%> (-0.02%) ⬇️
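The percentages above can be reproduced from the hit, miss and partial counts. The sketch assumes Codecov computes coverage as hits / (hits + misses + partials) and applies its default precision of two decimals, rounded down. coverage_pct and floor2 are hypothetical helper names.

```python
import math


def floor2(value: float) -> float:
    # Assumption: Codecov's default precision (2 decimals, rounded down).
    return math.floor(value * 100) / 100


def coverage_pct(hits: int, misses: int, partials: int) -> float:
    # Coverage = hits / (hits + misses + partials), as a percentage.
    lines = hits + misses + partials
    return hits / lines * 100


# Counts taken from the Coverage Diff above.
before = coverage_pct(5658, 3035, 495)  # 9188 lines on main
after = coverage_pct(5658, 3126, 495)   # 9279 lines with #454

print(floor2(before), floor2(after))             # 61.58 60.97
print(floor2(before - after))                    # 0.6  (summary line: 0.60%)
print(round(floor2(before) - floor2(after), 2))  # 0.61 (table: -0.61%)

# Diff coverage: none of the 91 added lines are hit.
print(0 / 91 * 100)                              # 0.0
```

Two numbers in the report differ for this reason. The summary line's 0.60% comes from rounding down the unrounded difference (about 0.604 points). The table's -0.61% comes from subtracting the already-rounded percentages.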


Sami1309 temporarily deployed to Integration testing September 15, 2022 18:36 (4 deployments, inactive)
Sami1309 temporarily deployed to Integration testing September 15, 2022 19:26 (inactive)
Sami1309 marked this pull request as ready for review September 15, 2022 21:43
Sami1309 requested reviews from ahmadnazeri and sdreyer, then removed the review request for ahmadnazeri, September 15, 2022 21:44
Sami1309 temporarily deployed to Integration testing September 15, 2022 21:46 (4 deployments, inactive)
Six review threads on provider/spark.go were resolved, four of them on code that is now outdated.
Sami1309 temporarily deployed to Integration testing September 15, 2022 23:57 (4 deployments, inactive)
sdreyer (Collaborator) commented Sep 16, 2022

@Sami1309 can you run this with the tests @ahmadnazeri wrote and verify those still work?

Sami1309 temporarily deployed to Deployment September 16, 2022 23:20 (8 deployments, inactive)
Sami1309 merged commit 1a202b4 into main Sep 17, 2022
Sami1309 deleted the feature/remove-coalesce/sam branch September 17, 2022 01:30
sdreyer pushed a commit that referenced this pull request Sep 17, 2022
…spark runner (#454)

* make changes to avoid needing coalesce

* extend resource row count to work with multipart resources

* add recursive file lookup to pyspark script

* fix iterator issues and add test

* fix issues with SQL transformation testing

* comment out unnecessary tests

* fix file path issue

* fix row count issue distinguishing on transformation or primary

* change row skip for iteration to use built in function/skip files
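Two of the commits above extend row counting to multipart resources and change row skipping to skip whole files. They could work along the lines of this sketch, which is not the code from spark.go. total_rows and locate_row are hypothetical helpers. The sketch assumes each part's row count is known up front; Parquet files record their row count in the footer metadata.

```python
from typing import List, Tuple


def total_rows(part_row_counts: List[int]) -> int:
    # A multipart resource's row count is the sum over all of its part files.
    return sum(part_row_counts)


def locate_row(part_row_counts: List[int], skip: int) -> Tuple[int, int]:
    """Return (part_index, offset_in_part) of the first row after skipping `skip` rows.

    Parts whose rows all fall inside the skipped range are passed over using
    only their counts, so they never have to be opened and read row by row.
    """
    if skip < 0:
        raise ValueError("skip must be non-negative")
    for i, count in enumerate(part_row_counts):
        if skip < count:
            return i, skip
        skip -= count  # this whole part is skipped
    # Skipped past the last row: point one past the final part.
    return len(part_row_counts), 0


counts = [3, 0, 5, 2]          # hypothetical per-part row counts
print(total_rows(counts))      # 10
print(locate_row(counts, 7))   # (2, 4): parts 0 and 1 are skipped without reading
```

Skipping whole files avoids decoding rows that would be thrown away. Only the part that contains the first requested row is opened at an offset.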
anthonylasso mentioned this pull request Dec 7, 2023
Labels: None yet
Projects: None yet
Development: no linked issues will be closed by merging this pull request
2 participants