Software Engineering for Data Scientists #2089

yetudada · 2022-12-05T10:16:19Z

Why do we care about this problem?

A secondary benefit of Kedro is that it increases your software engineering expertise, because software engineering principles are embedded in the product. Unfortunately, this benefit also has a downside. For a long time, we have received feedback that Kedro has a steep learning curve. Most of that learning curve is actually the software engineering fundamentals we expect you to know before you start using Kedro.

This is reflected in our adoption. Profiles like Machine Learning Engineers and Data Engineers (emphasis on engineering) have been the fastest adopters of Kedro. In contrast, Data Scientists (who are not software engineers and do not have advanced Python knowledge) have been slower adopters.

We have been exposing these users to a “double lift”: onboarding to Python best practices (the mindset shift) and to Kedro at the same time. Therefore, if we want Kedro to be a well-adopted framework, we must help our users with this software engineering foundation.

What evidence do we have that this is a problem?

Quantitative evidence

We have unpacked this learning curve and have the following insights from 211 surveyed users (this survey assumes exposure to Kedro):

- Overall, Kedro is perceived to have a “Standard learning curve, similar to learning any other Python framework.”
- Most users, including Machine-Learning Engineers, Data Engineers and Data Scientists with a skew toward “Expert Python usage”, marked that Kedro has a “Standard learning curve, similar to learning any other Python framework.”
- Most Data Scientists marked that Kedro had a learning curve that was “Challenging, it requires more attention and time than other frameworks” or “Difficult, the learning curve is very steep.”

In summary, Data Scientists or users with beginner or intermediate Python expertise encounter a steeper learning curve when they start using Kedro.

Qualitative evidence

“Part of the context is that I may fall slightly on the less technical side, so I have been mainly working out of notebooks and relying on data engineers to productionalize code.”
“For a person with a non-software engineering background, it is complicated to understand the structure at the beginning. Also, it requires advanced Python skills as it requires turning everything into functions.”
“You have to become familiar with the new concepts; I think you are expected to know about software engineering.”
“So I’m coming from a data scientist perspective. Most data scientists are not taught how to code software, like code properly. We’re not taught how to code like software engineers in school; we build a script and explore the data. So when starting with Kedro, you’re faced with all of these directories... just initially looking at that is confusing. You have to read the docs to really understand. That’s the main thing, it’s moving from this scripting Notebook approach to this more directory, software engineering approach... just changing the way that we think about how to run great data science code, I think that’s tough.”
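
The shift these quotes describe, from a notebook-style script to "turning everything into functions", can be sketched in plain Python. This is only an illustration under stated assumptions, not code from the Kedro documentation or from any surveyed user: the data and the names `RAW_ORDERS`, `drop_missing`, `add_total` and `run_pipeline` are hypothetical, and no Kedro API is used.

```python
# Hypothetical example data; not taken from any real project.
RAW_ORDERS = [
    {"price": 10.0, "quantity": 2},
    {"price": None, "quantity": 1},
    {"price": 4.5, "quantity": 4},
]

# A notebook-style version would typically filter and modify one shared
# variable cell by cell. Below, each step is a small function instead.


def drop_missing(rows):
    """Keep only rows where every value is present."""
    return [row for row in rows if all(v is not None for v in row.values())]


def add_total(rows):
    """Return new rows with a 'total' field; the input rows are not mutated."""
    return [{**row, "total": row["price"] * row["quantity"]} for row in rows]


def run_pipeline(rows):
    """Chain the steps so that each function's output feeds the next."""
    return add_total(drop_missing(rows))


if __name__ == "__main__":
    for row in run_pipeline(RAW_ORDERS):
        print(row)
```

Each step takes inputs and returns outputs without relying on shared notebook state, so it can be tested on its own. That habit, rather than any framework-specific syntax, appears to be the mindset shift the quotes point at.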

How have our users tried to solve this problem for themselves?

Most users who lacked this software engineering skill set relied on being taught the fundamentals, either through structured learning programs like the Clean Code Workshop (piloted by users at McKinsey) or through software engineering bootcamps drafted and taught by people with software engineering backgrounds. Others, like the team at GetInData, hosted longer-form, day-long workshops to teach Kedro.

Users also depended on someone who could pair-program with them. This affected how they learnt Kedro, too: they needed other team members experienced in Kedro in order to learn it.

Qualitative evidence

“The adoption is challenging if you are on your own and don’t have anyone experienced to ask to.”
“The nuances of how Kedro works take time to get used to. I learnt Kedro mostly through colleagues rather than reading through the documents, which helped me grasp it faster. Otherwise, I would have struggled.”

What are we going to do about this problem?

We will take two approaches to address this problem:

- Primary priority: Developing learning material to address the knowledge gaps of people who do not have software engineering expertise. This initiative will fall into a larger objective known as Kedro Academy. It will be delivered as a bootcamp or a series of training modules.
- Secondary and ongoing priority: Making Kedro less intimidating for new users by simplifying the start-up journey and removing software engineering jargon from our documentation.

What would success look like?

We would explore quantitative and qualitative measures of success:

Quantitative:
- Number of people that go through our training modules
- Conversion into Kedro users
- Ratings of the course material
Qualitative:
- Quoted impact on workflows

yetudada · 2022-12-09T10:13:39Z

We have a survey running to design the Minimum Learnable Curriculum (lol 😂) for this course. It needs to be completed by the 14th of December, and it's well on track to be our most-filled survey to date: https://www.surveys.online/jfe/form/SV_37WxRUy5xkC0xkW

astrojuanlu · 2023-05-22T20:13:33Z

Is this ready to be closed?

astrojuanlu · 2023-10-23T18:01:41Z

This is now an ongoing internal training backed by public materials (https://github.com/kedro-org/kedro-academy/tree/main/iswe4dx). Closing.

yetudada added the "Issue: Feature Request" and "Type: Parent Issue" labels and removed the "Issue: Feature Request" label on Dec 5, 2022

yetudada mentioned this issue Dec 5, 2022

Usability Testing for Spaceflights Tutorial #2091

Closed

yetudada mentioned this issue Mar 3, 2023

Scaling introductory Kedro training kedro-org/kedro-devrel#44

Closed

astrojuanlu closed this as completed Oct 23, 2023
