Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Import TRANSCODING_SEPARATOR to pipeline to make Kedro backwards compatible with kedro-viz on python 3.8 #3822

Merged
merged 3 commits into from
Apr 19, 2024

Conversation

ankatiyar
Copy link
Contributor

@ankatiyar ankatiyar commented Apr 18, 2024

Description

Flagged by e2e test on kedro-docker this morning (but not related to kedro-docker) - https://github.com/kedro-org/kedro-plugins/actions/runs/8730295535/job/23953858295#step:8:1

Context

The problem

For python versions > 3.8, everything is okay ✅

For python version == 3.8:
If a user has a project that uses both Kedro and Viz, the project becomes completely unusable -

  • The last version of viz that supports python 3.8 is 7.1.0
  • Kedro still supports python 3.8, so if a user creates a new project with 0.19.4 or upgrades the project to 0.19.4 it becomes a problem:
ImportError: cannot import name 'TRANSCODING_SEPARATOR' from 
      'kedro.pipeline.pipeline' 
      (/usr/local/lib/python3.8/site-packages/kedro/pipeline/pipeline.py)
  • User is then restricted to kedro versions < 0.19.4

Development notes

Proposed Solution 1

  • Do nothing, it's a very small subset of users (python == 3.8, kedro == 0.19.4 and using kedro-viz for their projects)
  • Issue a statement that you can't update to kedro 0.19.4 if you're on python 3.8 AND want to use kedro-viz (Uninstalling viz solves the problem)
  • Additionally -
    • We can fix this in 0.19.5 but don't need to do an immediate patch release

Proposed Solution 2

  • Add an import statement for TRANSCODE_SEPARATOR to kedro.pipeline.pipeline
  • Do a patch release to make Kedro backwards compatible with kedro-viz

Other considerations (Bigger discussion)

  • arrive at a consensus about which python versions should be supported by all Kedro/Viz/Plugins

Developer Certificate of Origin

We need all contributions to comply with the Developer Certificate of Origin (DCO). All commits must be signed off by including a Signed-off-by line in the commit message. See our wiki for guidance.

If your PR is blocked due to unsigned commits, then you must follow the instructions under "Rebase the branch" on the GitHub Checks page for your PR. This will retroactively add the sign-off to all unsigned commits and allow the DCO check to pass.

Checklist

  • Read the contributing guidelines
  • Signed off each commit with a Developer Certificate of Origin (DCO)
  • Opened this PR as a 'Draft Pull Request' if it is work-in-progress
  • Updated the documentation to reflect the code changes
  • Added a description of this change in the RELEASE.md file
  • Added tests to cover my changes
  • Checked if this change will affect Kedro-Viz, and if so, communicated that with the Viz team

Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com>
@deepyaman
Copy link
Member

Proposed Solution 2

  • Add an import statement for TRANSCODE_SEPARATOR to kedro.pipeline.pipeline
  • Do a patch release to make Kedro backwards compatible with kedro-viz

How about this, but just get rid of the _transcoding module and put the stuff back into pipeline. TBH I think adding the _transcoding module wasn't really necessary, and it introduced a new convention that doesn't exist anywhere else in the codebase (modules beginning with _).

But generally in favor of doing a patch release; easy fix that will avoid more questions, and show responsiveness.

@deepyaman
Copy link
Member

deepyaman commented Apr 18, 2024

Proposed Solution 2

  • Add an import statement for TRANSCODE_SEPARATOR to kedro.pipeline.pipeline
  • Do a patch release to make Kedro backwards compatible with kedro-viz

How about this, but just get rid of the _transcoding module and put the stuff back into pipeline. TBH I think adding the _transcoding module wasn't really necessary, and it introduced a new convention that doesn't exist anywhere else in the codebase (modules beginning with _).

But generally in favor of doing a patch release; easy fix that will avoid more questions, and show responsiveness.

To add some context to my suggestion—what was the original reason for moving stuff to the _transcoding module? The only somewhat functional thing I can see this does is make TRANSCODING_SEPARATOR not public (not totally sure why this is done; I have leveraged it in plugin code previously). However, if it was intentional to make this "private", then just importing it in pipeline undoes this, too.

@noklam
Copy link
Contributor

noklam commented Apr 18, 2024

we recently moved TRANSCODING_SEPARATOR to kedro.pipeline._transcoding.py from kedro.pipeline.pipeline.py

This is a breaking change and shouldn't be done in non-breaking release, we can do this but add a pointer to import from the new module instead. Agree with @deepyaman as I am not sure why this refactoring is necessary. It's not a big problem so a patch fix will do.

@merelcht
Copy link
Member

To add some context to my suggestion—what was the original reason for moving stuff to the _transcoding module? The only somewhat functional thing I can see this does is make TRANSCODING_SEPARATOR not public (not totally sure why this is done; I have leveraged it in plugin code previously). However, if it was intentional to make this "private", then just importing it in pipeline undoes this, too.

It was done as part of #3812 to fix the issue with node dependencies uncovered by using the built-in toposort functionality. When reviewing it I didn't realise the transcoding separator was used outside of the framework and didn't think it was such a consequential refactor 😕

@ElenaKhaustova
Copy link
Contributor

We missed that when we applied #3812 😔

Given the fact that the starters' tests will keep failing until we fix that, I would go with option 2 if it's relatively fast.

kedro/pipeline/pipeline.py Outdated Show resolved Hide resolved
Copy link
Contributor

@noklam noklam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved with a small comment to explains why this is needed and remind ourselves this can be removed in 0.20

@deepyaman
Copy link
Member

Approved with a small comment to explains why this is needed and remind ourselves this can be removed in 0.20

I think this actually shouldn't be removed in 0.20? By adding this, we're once again signaling that TRANSCODING_SEPARATOR is a public attribute that people can depend upon. Furthermore, my opinion is, TRANSCODING_SEPARATOR is something that should be exposed (it's not an internal-only implementation detail, and plugins use it, e.g. for catalog parsing).

Copy link
Member

@deepyaman deepyaman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still pushing to just get rid of the _transcoding module. IMO the refactor was unnecessary and inconsistent (introduced a new convention that doesn't exist anywhere else in the codebase), on top of having the breaking effect.

As mentioned in a comment just now, I don't think TRANSCODING_SEPARATOR should be removed from public access even in 0.20.

As such, I see this is a good time to just move the stuff back into the pipeline module, rather that keeping it in _transcoding and importing most of the stuff, anyway.

@ankatiyar ankatiyar requested a review from merelcht April 19, 2024 08:56
Copy link
Member

@merelcht merelcht left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @ankatiyar I vote to go forward with this solution.

We didn't anticipate the move to _transcoding to have such an impact, but nevertheless I think that was a good refactor and even though it's a new convention in Kedro it's not new for the world of Python to indicate private API in this way.

@merelcht merelcht dismissed deepyaman’s stale review April 19, 2024 10:33

Thanks for weighing in Deepyaman. I disagree that introducing _transcoding was a bad refactor idea and think we should move forward. Juanlu and I discussed to have some sort of post-mortem on this issue so we can talk more.

ankatiyar and others added 2 commits April 19, 2024 11:34
Co-authored-by: Nok Lam Chan <nok.lam.chan@quantumblack.com>
Signed-off-by: Ankita Katiyar <110245118+ankatiyar@users.noreply.github.com>
Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com>
@ankatiyar ankatiyar marked this pull request as ready for review April 19, 2024 10:36
@ankatiyar ankatiyar enabled auto-merge (squash) April 19, 2024 10:43
@ankatiyar ankatiyar merged commit 39080a9 into main Apr 19, 2024
34 checks passed
@ankatiyar ankatiyar deleted the patch-fix branch April 19, 2024 10:54
@idanov
Copy link
Member

idanov commented Apr 19, 2024

@deepyaman The addition of the module is actually necessary, because it resolves a circular dependency between node.py and pipeline.py. So not really just random move, but a necessity. The convention of private modules is not unheard of, that's the same general convention for private elements in Python, but applied for modules, which is simply not as common, but still part of the convention.

@deepyaman
Copy link
Member

@deepyaman The addition of the module is actually necessary, because it resolves a circular dependency between node.py and pipeline.py. So not really just random move, but a necessity.

There are other ways to remove circular dependencies, like not importing at the top level. It's not necessarily required. But that's fine.

The convention of private modules is not unheard of, that's the same general convention for private elements in Python, but applied for modules, which is simply not as common, but still part of the convention.

Yes, there are a lot of libraries that use it very heavily. My point is that Kedro does not use it anywhere—I searched before making that comment—so it's introducing a new convention in the codebase. Even naming it transcoding instead of _transcoding would be more consistent IMO.

@deepyaman
Copy link
Member

@deepyaman The addition of the module is actually necessary, because it resolves a circular dependency between node.py and pipeline.py. So not really just random move, but a necessity.

There are other ways to remove circular dependencies, like not importing at the top level. It's not necessarily required.

Created a simple PR to illustrate: #3826

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants