Update index.md #664

Merged
merged 2 commits into from
Feb 13, 2024

abarciauskas-bgse
Contributor

No description provided.

- The data collection that Apache Beam transforms operates on is a
- [`PCollection`](https://beam.apache.org/documentation/programming-guide/#pcollections).
+ The data Apache Beam transforms operate on are
+ [`PCollections`](https://beam.apache.org/documentation/programming-guide/#pcollections).
Contributor Author

So do the iterables of a `FilePattern` become `PCollection`s? This is a bit unclear to me.

Member

@cisaacstern cisaacstern Jan 10, 2024

Thanks for digging into this, @abarciauskas-bgse!

Yes, this is correct. A `FilePattern` is iterated over (via `.items`, not `__iter__`), and this is converted into a `PCollection`:

```python
pattern = FilePattern(...)

recipe = beam.Create(pattern.items()) | ...
```
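
To make the `.items` versus `__iter__` distinction concrete, here is a minimal sketch using only the standard library. `ToyPattern` is a hypothetical stand-in written for illustration, not the real `FilePattern` class, and the file names are made up. It assumes that plain iteration yields keys while `.items()` yields (key, file) pairs, as the comment above describes.

```python
from typing import Iterator, Tuple


class ToyPattern:
    """Hypothetical stand-in for a FilePattern (illustration only)."""

    def __init__(self, urls):
        self._urls = list(urls)

    def __iter__(self) -> Iterator[int]:
        # Plain iteration yields only the keys (here, list positions).
        return iter(range(len(self._urls)))

    def __getitem__(self, key: int) -> str:
        # Look up the file that a key points to.
        return self._urls[key]

    def items(self) -> Iterator[Tuple[int, str]]:
        # .items() pairs each key with its file.
        for key in self:
            yield key, self[key]


pattern = ToyPattern(["a.nc", "b.nc"])  # made-up file names

keys = list(pattern)              # what __iter__ produces
elements = list(pattern.items())  # what .items() produces

# In a real pipeline, an iterable like `elements` is what would be
# handed to beam.Create(...), so each (key, file) pair becomes one
# element of the resulting PCollection.
```

Under these assumptions, each element flowing into downstream transforms carries both the key and the file it refers to, rather than the key alone.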

Member

@abarciauskas-bgse did you want to include any further detail as part of this PR? Or is the current change sufficient for now?

@moradology
Contributor

This looks fine; merging to clear the backlog, though I think we'll want to revisit the docs after dealing with the question of recipe laziness.

@moradology moradology merged commit 4e1d103 into main Feb 13, 2024
3 checks passed
@moradology moradology deleted the ab/docs-edit branch February 13, 2024 16:03
3 participants