Software Engineering for Data Scientists #2089

Closed
yetudada opened this issue Dec 5, 2022 · 3 comments

Comments

@yetudada
Contributor

yetudada commented Dec 5, 2022

Why do we care about this problem?

A secondary benefit of Kedro is that it builds your software engineering expertise, because these principles are embedded in the product. Unfortunately, this benefit has a downside. We have long received feedback about Kedro’s steep learning curve. The real learning curve is the set of software engineering fundamentals we expect you to know before you start using Kedro.

This is reflected in our adoption. Profiles like Machine Learning Engineers and Data Engineers (emphasis on engineering) have been the fastest adopters of Kedro. In contrast, Data Scientists (who are typically neither software engineers nor advanced Python users) have been slower adopters.

We have been exposing these users to a “double lift”: onboarding to Python best practices (the mindset shift) and to Kedro at the same time. Therefore, if we want Kedro to be a well-adopted framework, we must help our users build this software engineering foundation.

What evidence do we have that this is a problem?

Quantitative evidence

We have unpacked this learning curve and have the following insights from 211 surveyed users (this survey assumes exposure to Kedro):

  • Overall, Kedro is perceived to have a “Standard learning curve, similar to learning any other Python framework.”
  • Most users, including Machine-Learning Engineers, Data Engineers and Data Scientists with a skew toward “Expert Python usage”, marked that Kedro has a “Standard learning curve, similar to learning any other Python framework.”
  • Most Data Scientists marked that Kedro had a learning curve that was “Challenging, it requires more attention and time than other frameworks” or “Difficult, the learning curve is very steep.”

[Figure: kedro_learning_curve]

In summary, Data Scientists and users with beginner or intermediate Python expertise encounter a steeper learning curve when they attempt Kedro.

Qualitative evidence

  • “Part of the context is that I may fall slightly on the less technical side, so I have been mainly working out of notebooks and relying on data engineers to productionalize code.”
  • “For a person with a non-software engineering background, it is complicated to understand the structure at the beginning. Also, it requires advanced Python skills as it requires turning everything into functions.”
  • “You have to become familiar with the new concepts; I think you are expected to know about software engineering.”
  • “So I’m coming from a data scientist perspective. Most data scientists are not taught how to code software, like code properly. We’re not taught how to code like software engineers in school; we build a script and explore the data. So when starting with Kedro, you’re faced with all of these directories... just initially looking at that is confusing. You have to read the docs to really understand. That’s the main thing, it’s moving from this scripting Notebook approach to this more directory, software engineering approach... just changing the way that we think about how to run great data science code, I think that’s tough.”

How have our users tried to solve this problem for themselves?

Most users who lacked this software engineering skill set relied on being taught the fundamentals through structured learning programs, such as the Clean Code Workshop (piloted by users at McKinsey) or software engineering bootcamps drafted and taught by people with software engineering backgrounds. Others hosted longer-form, day-long workshops to teach Kedro, like the team at GetInData.

Users also depended on someone who could pair-program with them. This affected how they learnt Kedro too: they needed team members experienced in Kedro to learn from.

Qualitative evidence

  • “The adoption is challenging if you are on your own and don’t have anyone experienced to ask to.”
  • “The nuances of how Kedro works take time to get used to. I learnt Kedro mostly through colleagues rather than reading through the documents, which helped me grasp it faster. Otherwise, I would have struggled.”

What are we going to do about this problem?

We will take two approaches to address this problem:

  • Primary priority: Develop learning material to address the knowledge gaps of people who do not have software engineering expertise. This initiative will fall under a larger objective known as Kedro Academy and will be delivered as a bootcamp or a series of training modules.
  • Secondary and ongoing priority: Make Kedro less intimidating for new users by simplifying the start-up journey and removing software engineering jargon from our documentation.

What would success look like?

We would explore quantitative and qualitative measures of success:

  • Quantitative:
    • Number of people that go through our training modules
    • Conversion into Kedro users
    • Ratings of the course material
  • Qualitative:
    • Quoted impact on workflows
@yetudada added the Issue: Feature Request and Type: Parent Issue labels and removed the Issue: Feature Request label Dec 5, 2022
@yetudada
Contributor Author

yetudada commented Dec 9, 2022

We have a survey running to design the Minimum Learnable Curriculum (lol 😂) for this course. It needs to be completed by the 14th of December, and it's well on track to be our most filled survey to date: https://www.surveys.online/jfe/form/SV_37WxRUy5xkC0xkW

@astrojuanlu
Member

Is this ready to be closed?

@astrojuanlu
Member

This is now an ongoing internal training backed by public materials (https://github.com/kedro-org/kedro-academy/tree/main/iswe4dx). Closing.

Projects
Status: Shipped 🚀