kedro-datasets release 1.5.0 doesn't reflect SparkDataSet well #290

Closed
Tracked by #2919
ElenaMironovaQB opened this issue Aug 2, 2023 · 2 comments · Fixed by #302
Labels: bug (Something isn't working)

Comments

@ElenaMironovaQB

Description

Short description of the problem here.

After yesterday's release of kedro-datasets==1.5.0, our CI started failing during system tests that do a kedro run for a pipeline with Spark (see the screenshot). As far as I can see, SparkDataSet is still defined with the same name as before. With kedro-datasets==1.4.2 the same tests ran smoothly. I also couldn't find anything specific in the release notes.

Screenshot 2023-08-01 at 15 24 22

Context

How has this bug affected you? What were you trying to accomplish?

Our system tests which run kedro on pipelines with spark stopped running.
More discussion on slack: https://kedro-org.slack.com/archives/C03RKP2LW64/p1690896281915309

Steps to Reproduce

  1. Run a pipeline that uses kedro-datasets[spark.SparkDataSet]

Expected Result

Tell us what should happen.

The pipeline should run successfully till the end

Actual Result

Tell us what happens instead.

![Screenshot 2023-08-01 at 15 24 22](https://github.com/kedro-org/kedro-plugins/assets/64854268/a721559d-4687-42ac-bb9f-f83b351b6001)

-- Separate them if you have more than one.

Your Environment

Include as many relevant details about the environment in which you experienced the bug:

  • Kedro version used (pip show kedro or kedro -V): 0.18.11
  • Kedro plugin and kedro plugin version used (pip show kedro-airflow): kedro-datasets==1.5.0
  • Python version used (python -V): 3.8
  • Operating system and version: ubuntu-2004:202201-02
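
One way to narrow this down is to compare which extras the installed kedro-datasets distribution actually advertises under 1.4.2 and 1.5.0. The snippet below is only a diagnostic sketch: `provided_extras` is a hypothetical helper name, and it assumes the failure is related to how the `spark.SparkDataSet` extra is declared, which the report above does not confirm.

```python
from importlib import metadata
from typing import List


def provided_extras(dist_name: str) -> List[str]:
    """Return the extras an installed distribution declares (Provides-Extra)."""
    try:
        dist_metadata = metadata.metadata(dist_name)
    except metadata.PackageNotFoundError:
        # The distribution is not installed in this environment.
        return []
    return sorted(dist_metadata.get_all("Provides-Extra") or [])


if __name__ == "__main__":
    # Run once per environment (1.4.2 and 1.5.0) and compare the output.
    for extra in provided_extras("kedro-datasets"):
        print(extra)
```

If the `spark.SparkDataSet` extra appears under a different spelling in one of the two environments, that would point at the packaging metadata rather than at the dataset class itself.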
@ElenaMironovaQB changed the title from "<Title> kedro-datasets release 1.5.0 doesn't reflect SparkDataSet well" to "kedro-datasets release 1.5.0 doesn't reflect SparkDataSet well" Aug 2, 2023
@merelcht merelcht added the bug Something isn't working label Aug 2, 2023
@DimedS DimedS self-assigned this Aug 9, 2023
@noklam
Contributor

noklam commented Aug 10, 2023

I suspect #263 is the root cause.

@noklam
Contributor

noklam commented Aug 10, 2023

@sbrugman is pip install kedro-datasets[pandas.CSVDataSet] still possible? I think this is an undesired side effect. I did a quick search, and it seems that a standard pyproject.toml doesn't support pip install kedro-datasets[pandas.CSVDataSet], only pip install kedro-datasets[pandas].

At this point I don't think we want to bring in a more advanced tool like poetry just for this.
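
For context on why a dotted extra might stop resolving: PEP 685 describes normalizing extra names by lowercasing them and collapsing runs of `-`, `_`, and `.` into a single `-`. The sketch below applies that rule to the names mentioned in this thread. `normalize_extra` is a hypothetical helper, not part of pip or Kedro, and whether the 1.5.0 build actually publishes normalized extra names is an assumption rather than something confirmed here.

```python
import re


def normalize_extra(name: str) -> str:
    """Normalize an extra name following the rule described in PEP 685."""
    # Collapse any run of '-', '_' or '.' into one '-', then lowercase.
    return re.sub(r"[-_.]+", "-", name).lower()


if __name__ == "__main__":
    for extra in ("pandas", "pandas.CSVDataSet", "spark.SparkDataSet"):
        print(f"{extra} -> {normalize_extra(extra)}")
```

If the published metadata lists only the normalized form, an installer that compares extra names without normalizing them would not match `spark.SparkDataSet`, which would be consistent with the behaviour reported in this issue.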
