
[CoE Starter Kit - Feature]: CoE BYODL Dataflow re-factor to only process recent files #6969

@manuelap-msft

Description


Is your feature request related to a problem? Please describe.

We need to update the dataflows so that they no longer process all files every day, because this is causing throughput issues for larger tenants.

I think this approach would work:

  • bring in the latest files and get the environments from them
  • merge with the Dataverse table, filtered to only those environments that were modified (this is important: don't merge all of them)
  • add a flag/logic so that anything that is in Dataverse (DV) but not in the files is marked as deleted
  • the maker dataflow doesn't need to process all files every day either, since it only picks up new makers, so we can reduce it to read only the most recently modified files; we may then have to do the orphan logic outside of the dataflow (still investigating)
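The first three steps can be illustrated with a minimal Python sketch. The real dataflows are built in Power Query, so this only models the logic; the record shapes (`InventoryFile`, `DataverseRow`), the `refresh` function, and the `since` cutoff for "recent" are hypothetical names, and it assumes each exported file describes one environment and lists its resource IDs.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class InventoryFile:
    # Hypothetical shape of one exported file: the environment it describes,
    # when it was last modified, and the resource IDs it lists.
    environment_id: str
    modified: datetime
    resource_ids: frozenset


@dataclass
class DataverseRow:
    # Hypothetical shape of an existing inventory row in Dataverse.
    environment_id: str
    resource_id: str
    deleted: bool = False


def refresh(files, dv_rows, since):
    # Step 1: keep only files modified since the cutoff; if one environment
    # has several recent files, keep the newest one.
    latest = {}
    for f in files:
        if f.modified < since:
            continue
        current = latest.get(f.environment_id)
        if current is None or f.modified > current.modified:
            latest[f.environment_id] = f

    # Step 2: only Dataverse rows for the modified environments join the merge.
    touched = [r for r in dv_rows if r.environment_id in latest]
    known = {(r.environment_id, r.resource_id) for r in touched}

    # Step 3: a row in Dataverse but missing from its environment's latest
    # file is flagged as deleted (and un-flagged if it reappears).
    for r in touched:
        r.deleted = r.resource_id not in latest[r.environment_id].resource_ids

    # Resources in the files but not yet in Dataverse become new rows.
    new_rows = [
        DataverseRow(env_id, rid)
        for env_id, f in latest.items()
        for rid in sorted(f.resource_ids)
        if (env_id, rid) not in known
    ]
    return touched, new_rows
```

Rows for environments without a recent file are never read or rewritten, which is where the throughput saving would come from.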

Describe the solution you'd like

Only process the most recent files in the dataflows.
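For the maker dataflow, which only needs to pick up new makers, a sketch of reading just the recent files could look like the following. It is only an illustration under assumptions: `new_makers` is a hypothetical name, each file is modeled as a `(modified, maker_ids)` pair, and `since` stands in for whatever defines "recent".

```python
from datetime import datetime


def new_makers(files, known_maker_ids, since):
    """Return maker IDs from recently modified files that are not yet known.

    `files` is an iterable of (modified, maker_ids) pairs (an assumed shape).
    """
    found = set()
    for modified, maker_ids in files:
        if modified >= since:  # skip files unchanged since the cutoff
            found.update(maker_ids)
    return sorted(found - set(known_maker_ids))
```

Because only recent files are read, this cannot tell whether an existing maker has left, which is why the orphan logic may have to live outside the dataflow.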

Describe alternatives you've considered

No response

Additional context?

No response

AB#448

Metadata


Type

No type

Projects

Status

Done ✅

Relationships

None yet

Development

No branches or pull requests
