Rebuild Kedro/Databricks workflow recommendations #2185

Closed
yetudada opened this issue Jan 9, 2023 · 3 comments
Labels
Component: Documentation 📄 (Issue/PR for markdown and API documentation) · Issue: Feature Request (New feature or improvement to existing feature) · Type: Parent Issue

Comments

@yetudada
Contributor

yetudada commented Jan 9, 2023

Description

We concluded a research item on how Kedro is being used on Databricks (#2105). This task recommends improvements to our "Deployment to a Databricks cluster" documentation.

Context

We will work on a Kedro-Databricks plugin at a later stage, but first we'll overhaul the documentation, because the research showed how heavily our users rely on it to get their work done. For now, we'll recommend dbx and Databricks Repos as the way to use Kedro on Databricks.

Possible Implementation

Our "Deployment to a Databricks cluster" documentation needs improvement in the following ways:

  • High-priority
    • Include an introduction about why you would choose to use Kedro on Databricks
    • Recommend a workflow for syncing the latest version of code written in an IDE to the Databricks workspace; we should recommend Databricks Repos and dbx sync for this
    • Recommend a workflow for running pipelines on Databricks; we should recommend either the IPython extension (used through a Databricks notebook) or dbx deploy
    • Recommend a workflow for visualising pipelines through a Databricks notebook (this section is already written; it just needs to be made more prominent)
    • Additionally, walk users through configuring dbx and Databricks Repos so that they can use this functionality
  • Medium-priority
    • Provide recommendations specific to Azure; our documentation is heavily based on AWS
@jmholzer
Contributor

This parent issue needs to be broken down further:

  1. Define a new workflow with Databricks Repos, dbx and Kedro
  2. Document the new workflow and update the existing documentation
  3. Document recommendations for using Azure Databricks (medium priority)

@astrojuanlu
Member

I guess only the Azure Databricks part is missing?

@merelcht
Member

merelcht commented Jul 6, 2023

All subtasks have now been completed. The remaining work consists of blog posts and has been moved to kedro-devrel.

@merelcht merelcht closed this as completed Jul 6, 2023
5 participants