The CSV example can give novices the wrong impression of how to structure CSV files #886

Open
bast opened this issue Nov 1, 2020 · 1 comment
Labels
type:discussion Discussion or feedback about the lesson

Comments

@bast
Contributor

bast commented Nov 1, 2020

The CSV data used in this lesson (https://github.com/swcarpentry/python-novice-inflammation/tree/gh-pages/data) has two problems that I consider significant:

  1. The files contain no header line:
  • Looking at the data alone, we have no idea what it represents. I think this is not good practice from a documentation/reproducibility perspective.
  • Many CSV readers use the header line to create a dictionary or dataframe, but this line is missing here.
  • I understand that the motivation for omitting it is perhaps so that the files can be read with numpy.loadtxt, but this is not how I would read CSV data. My impression is that the data has been adapted to the solution rather than the solution to the data.
  2. The data is not in "tidy" format (https://en.wikipedia.org/wiki/Tidy_data):
  • When teaching data visualization (a different course), I emphasize arranging data in tidy format (columns are variables, rows are measurements) so that the data can be extended with more measurements without modifying the analysis/plotting scripts.
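To make the two points concrete, here is a minimal sketch of how headerless wide data could be turned into a tidy CSV with a header line, using only the standard library. The sample values, the column names (patient, day, inflammation), the helper name wide_to_tidy, and the reading of the layout as one row per patient and one column per day are all assumptions for illustration, not taken from the lesson files.

```python
import csv
import io

# Hypothetical headerless wide data: assumed to be one row per patient,
# one column per day. The values are made up.
wide_text = "0,0,1\n0,1,2\n"


def wide_to_tidy(text):
    """Turn headerless wide CSV text into a list of tidy records.

    Each record holds one measurement; the column names are illustrative.
    """
    rows = []
    for patient, line in enumerate(csv.reader(io.StringIO(text)), start=1):
        for day, value in enumerate(line, start=1):
            rows.append({"patient": patient, "day": day, "inflammation": int(value)})
    return rows


tidy_rows = wide_to_tidy(wide_text)

# Write the tidy records out with a header line describing each column.
out = io.StringIO()
writer = csv.DictWriter(
    out, fieldnames=["patient", "day", "inflammation"], lineterminator="\n"
)
writer.writeheader()
writer.writerows(tidy_rows)
tidy_csv = out.getvalue()

# With a header present, csv.DictReader can key every value by column name.
records = list(csv.DictReader(io.StringIO(tidy_csv)))
```

In this layout, adding another day or patient only appends rows, so code that selects by column name should not need to change.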

I find it very important to show good examples, particularly to novices, because novices often take what they see, assume that this is the way to do it, and use it in their own work. For me this is not a good example of how to store data for analysis/plotting, and novices may not recognize that.

I don't want to only point out problems; I am also offering to contribute to fixing this. Before doing that, though, I wanted to start a discussion and get some feedback. It might be just me who has a problem with this.

@annefou

annefou commented Nov 1, 2020

numpy.loadtxt can now handle headers (for instance, we can use skiprows) and we can also add comments. My guess is that the header is missing mainly for historical reasons.
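As a rough sketch of that suggestion, the snippet below reads CSV text with a header line and a comment line using numpy.loadtxt. The file contents and column names are made up for illustration. It relies on skiprows to skip the header and on the default comments="#" setting to ignore the comment line.

```python
import io

import numpy as np

# Hypothetical file contents: a header line, a comment line, then numeric rows.
text = (
    "day_1,day_2,day_3\n"
    "# made-up values for illustration\n"
    "0,0,1\n"
    "0,1,2\n"
)

# skiprows=1 skips the header; lines starting with "#" are treated as
# comments by default and ignored.
data = np.loadtxt(io.StringIO(text), delimiter=",", skiprows=1)
```

Note that the header is only skipped here, not used, so the column names are still lost once the array is loaded.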

@ldko ldko added the type:discussion Discussion or feedback about the lesson label Nov 1, 2020