Should at least some of the inflammation data pass the `detect_problems` function? #170
So we have a bunch of CSV files with inflammation data. In 05-cond and 06-func, we develop some tests to tell whether or not our data is suspicious: one test checks the maxima and the other checks the minima. But when we run
I just taught this lesson and some of the students stopped me to ask "Wait, so none of our data is okay?" I think it was demotivating for some of them (as it would be in the real world if you realized that all of your data were bad).
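For anyone following along, here is a rough sketch of the two checks as I remember them from the lesson. The lesson version calls `numpy.loadtxt` on a filename and uses NumPy's column-wise max/min. Taking a plain list of rows (patients × days) instead, and the exact day indices 0 and 20, are assumptions here, so please check them against 06-func:

```python
def detect_problems(data):
    """Classify a table of inflammation readings (rows = patients, columns = days).

    Hypothetical list-based sketch of the lesson's two checks; assumes every
    row has the same length and at least 21 days of readings.
    """
    n_days = len(data[0])
    # Column-wise maximum and minimum across all patients for each day.
    max_by_day = [max(row[day] for row in data) for day in range(n_days)]
    min_by_day = [min(row[day] for row in data) for day in range(n_days)]

    # Max check: maxima of exactly 0 on day 0 and 20 on day 20 look too regular.
    if max_by_day[0] == 0 and max_by_day[20] == 20:
        return "Suspicious looking maxima!"
    # Min check: the daily minima summing to zero is treated as suspicious.
    elif sum(min_by_day) == 0:
        return "Minima add up to zero!"
    else:
        return "Seems OK!"
```

Assuming non-negative readings, a file only gets "Seems OK!" if it avoids both patterns: the day-0 and day-20 maxima must not be exactly 0 and 20, and at least one day's minimum must be above zero.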
Should we consider changing either the datasets or the conditional tests in
I definitely agree that this outcome is pretty sad depending on how you present it. I find that I often motivate the Python lesson by saying that it's a pretty common thing for a colleague to give you data and for you to be poking through it, so I'm often tempted to present this as the situation in which you get these csv files. However, since we're kind of trying to find some fraud here, it makes more sense to present it more adversarially, which in itself is kind of a bummer.
Personally, I find it difficult to discuss this data in general, since it just seems like numbers that I can't really attach to the real world. The Lessons subcommittee has been having some meetings (which you're welcome to attend!) talking about an overhaul of this lesson, and one of the issues we've discussed is the data set. We're looking at other data sets like the GapMinder data, but I think we're pretty unanimous in not using the inflammation data moving forward.
Of course, that's a long-term answer, and it'll be a while before those overhauled lessons are ready. So what should we do right now? I think that it's relatively difficult to cook up some fake data, and I can't think of anything less interesting than typing numbers into a spreadsheet to make sure they pass some checks. If someone wants to do that, I would tip my hat to them, but I think a more important short-term solution would be to add a callout explaining the data set in
Do you think this would alleviate your concerns?