This repository is made to organize my sources for a multivocal literature review on the modern data stack. Basically a list of articles on this topic.

szukiadam/modern-data-stack-MLR

MODERN DATA STACK - A Multivocal Literature Review

Table of contents

  1. Multivocal Literature Review
  2. Resources
    1. 2021 MAD
    2. Analytics Engineering Roundup with Benn Stancil

Multivocal Literature Review

Guidelines for conducting MLRs

“there are no systematic guidelines for conducting MLRs in computer science” [57] and “There is no explicit guideline for collecting ML [multivocal literature]” [55]. We are addressing that need in this paper.

Grey literature (GL) = literature that is not formally published in sources such as books or journal articles.

Tiers of GL:

  1. 1st Tier - High outlet control/high credibility (e.g. books, magazines, white papers)
  2. 2nd Tier - Moderate outlet control/moderate credibility (e.g. news articles, presentations, videos, Q/A sites, wiki articles)
  3. 3rd Tier - Low outlet control/low credibility (e.g. blogs, emails, tweets)

White literature          Grey literature      Black literature
Journal papers            Technical reports    Ideas
Conference proceedings    Lectures             Concepts
Books                     Blogs                Thoughts
                          AV media
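
The tier scheme above could be turned into a small lookup for tagging collected sources. The following is only a minimal sketch: the names `GLTier`, `TIER_BY_SOURCE_TYPE` and `classify` are made up for illustration, and the mapping contains nothing beyond the example source types listed under each tier.

```python
from enum import IntEnum
from typing import Optional


class GLTier(IntEnum):
    """Grey literature tiers, as listed above (hypothetical naming)."""
    FIRST = 1   # high outlet control / high credibility
    SECOND = 2  # moderate outlet control / moderate credibility
    THIRD = 3   # low outlet control / low credibility


# Only the example source types named in the tier list; not exhaustive.
TIER_BY_SOURCE_TYPE = {
    "book": GLTier.FIRST,
    "magazine": GLTier.FIRST,
    "white paper": GLTier.FIRST,
    "news article": GLTier.SECOND,
    "presentation": GLTier.SECOND,
    "video": GLTier.SECOND,
    "q/a site": GLTier.SECOND,
    "wiki article": GLTier.SECOND,
    "blog": GLTier.THIRD,
    "email": GLTier.THIRD,
    "tweet": GLTier.THIRD,
}


def classify(source_type: str) -> Optional[GLTier]:
    """Return the tier for a source type, or None if it is not in the list."""
    return TIER_BY_SOURCE_TYPE.get(source_type.strip().lower())
```

Returning `None` for unlisted types (rather than guessing a tier) keeps the decision with the reviewer, since the list only gives examples for each tier.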

closing the gap between academic research and professional practice

While extensive GL is available in the field of SE and the volume of GL in SE is clearly expanding at a very rapid pace (e.g., in blogs and free online books), little effort has been made to utilize such knowledge in SE research.

We want to highlight that, according to [81], GL is important when context has a large effect on the implementation and the outcome, which is typically the case in SE [84, 85].

Another guideline paper [17] suggests including GL in reviews when relevant knowledge is not reported adequately in academic articles, for validating scientific outcomes with practical experience, and for challenging assumptions in practice using academic research. This is my main reason to use GL.

It is increasingly discussed in the SE community that “contextual” information (e.g., what approach works for whom, where, when, and why?) [86-89] is critical for most SE research topics and shall be carefully considered. Since GL sometimes provides contextual information, including it and conducting an MLR would be important.

Table 4 - Questions to decide whether to include the GL in software engineering reviews

RESOURCES

The 2021 Machine Learning, AI and Data (MAD) Landscape
The Analytics Engineering Roundup Ep.9 with Benn Stancil

2021-MAD-Landscape

Analytics Engineering Roundup with Benn Stancil

What is your take on the modern data stack? Is there a definition or is it just marketing buzz?

It's probably mostly marketing buzz. ... the modern data stack is data tools that were launched on Product Hunt. I think there is this sense of like the modern data stack are the tools that are a little bit more immediately understandable.

What makes a great analyst?

It's not to be a "benngineer". ... you have to be curious enough to look and then you have to be observant enough to see it, and then you'd have to be like analytical enough to connect the dots.

Looking 10 years out, what do you hope to be true for the data industry?

I hope that we basically spend a lot more time on the detective work and a lot less time searching for clues. ... the thing that I think makes data valuable is you have it when you need it ... We spent too much time on things like the collection and figuring out like all of the inputs to that, and too little time on, "All right, we have all this information. What decision do we make ..."

Ideas / things to include in the review

  1. Analysis of companies using MDS / parts of the MDS
  2. Reasons for their migration
  3. Positive and negative experiences of the process

Talk with Rúben

  1. Start early
  2. Reporting should be the same as for SLRs
  3. Look at other sample MLRs and follow the different parts from there
