Autoload csv files from data directory #2761

Floppy · 2014-08-16T13:54:51Z

Sometimes it's simplest to store data in CSV format. This PR autoloads these files as well, just like JSON or YAML.
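As a rough sketch of what loading by extension could look like outside Jekyll: the helper name `read_data_file`, the sample files, and the way keys are derived from file names are all assumptions for illustration, and the standard library's YAML stands in for whatever loader Jekyll uses.

```ruby
require 'csv'
require 'yaml'
require 'tmpdir'

# Hypothetical helper (not Jekyll's API): pick a parser from the extension.
def read_data_file(path)
  case File.extname(path).downcase
  when '.csv'
    # One hash per row, keyed by the header row.
    CSV.read(path, :headers => true).map(&:to_hash)
  else
    # Stand-in for the YAML/JSON path; stdlib YAML is an assumption here.
    YAML.safe_load(File.read(path))
  end
end

data = {}
Dir.mktmpdir do |dir|
  # Invented sample files, written to a temporary directory.
  File.write(File.join(dir, 'members.csv'), "name,github\nAlice,alice\n")
  File.write(File.join(dir, 'settings.yml'), "title: Demo\n")

  Dir.glob(File.join(dir, '*.{csv,yml}')).each do |path|
    key = File.basename(path, '.*') # file name without extension
    data[key] = read_data_file(path)
  end
end
```

After this runs, `data['members']` holds an array with one hash per CSV row, and `data['settings']` holds the parsed YAML hash.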

parkr · 2014-08-17T00:48:10Z

Oh goodness, I thought I did this! Thanks for the PR. Looks pretty good to me.

parkr · 2014-08-17T00:49:08Z

lib/jekyll/site.rb

-          data[key] = SafeYAML.load_file(path)
+          case File.extname(path).downcase
+          when '.csv'
+            data[key] = CSV.read(path, headers: true).map(&:to_hash)


We follow the GitHub Ruby Style Guide, which dictates we use hash rockets:

data[key] = CSV.read(path, :headers => true).map(&:to_hash)

Additionally, what happens if no header is specified? /cc @benbalter

Hashrocket added.

As for headers, if you didn't have headers in the CSV, there would be no way to do things like site.members.name (as there wouldn't be anything to say it was a name), so I think it's OK for Jekyll to support a very precise definition of CSV, i.e. comma-separated and including a header row. That's what most people will want to use anyway. If there wasn't a header row, you'd get junk data, but there's currently no simple way to be sure if a CSV has a header or not, so we can't really throw an error.
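To make the "junk data" case concrete, here is a small sketch using the same `:headers => true` option on in-memory strings. The sample rows and the `load_rows` helper are invented for illustration.

```ruby
require 'csv'

# Invented sample data: the same two people, with and without a header row.
with_header    = "name,github\nAlice,alice\nBob,bob\n"
without_header = "Alice,alice\nBob,bob\n"

# Hypothetical helper: same options as the PR, but parsing a string
# instead of reading a file.
def load_rows(text)
  CSV.parse(text, :headers => true).map(&:to_hash)
end

good = load_rows(with_header)
bad  = load_rows(without_header)

# good contains two hashes keyed by "name" and "github".
# bad contains a single hash: the first data row was consumed as the
# header, so Bob's values end up keyed by "Alice" and "alice".
```

No exception is raised in the headerless case; the first record silently becomes the set of keys, which is why the problem has to be handled by documentation or by a separate check.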

parkr · 2014-08-17T18:52:11Z

I agree that we should enforce headers. I would really like a way to show some sort of error if no headers exist. Otherwise, we should add a huuuge warning to the docs, and the release notes should say that we support reading CSVs with headers in _data. How can we be clear about this?

Floppy · 2014-08-17T20:02:56Z

Paging @ldodds and @pezholio. Do you guys think there's any reasonable way to detect a header row in a CSV? It seems it would always be very brittle, to me.

parkr · 2014-08-17T20:03:56Z

@benbalter may also have an idea. He works with this kind of data quite often.

Floppy · 2014-08-17T20:06:28Z

We've been building http://csvlint.io recently for CSV validation, and I'm 99% sure we don't have a reliable way to autodetect headers, so I expect it'll have to be a documentation thing. Anyway, we'll see what the others say first!

paulfitz · 2014-08-18T03:47:40Z

I agree with @Floppy that detecting whether a CSV file has a header is unreliable in the general case. It works great on big juicy files with cells stuffed with numbers, dates, and the like, but it breaks your heart on important edge cases, including tables with few rows, or a table full of short strings.

I think it'd definitely be reasonable to treat the following cases as errors:

Blank cells in the alleged header.
Repeated cells in the alleged header.
Numeric-looking cells (integer, float) in the alleged header (this one is a bit less reasonable than the first two, but would catch a lot more headerless CSV files).

For anything that tries to be much smarter than that, it'd be great to have a configuration switch to turn it off when predictability is important.
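The three checks listed above could be sketched roughly as follows. The helper `header_problems` is hypothetical (it is not part of Jekyll or csvlint), and the regular expression is only one possible reading of "numeric-looking".

```ruby
require 'csv'

# Hypothetical helper: returns the reasons the first row of a CSV string
# looks suspicious as a header row. An empty array means no check fired,
# which does not prove that a header is present.
def header_problems(text)
  header = CSV.parse_line(text) || []
  problems = []

  # 1. Blank cells in the alleged header (empty fields parse as nil).
  problems << "blank cell" if header.any? { |cell| cell.nil? || cell.strip.empty? }

  # 2. Repeated cells in the alleged header.
  problems << "repeated cell" if header.uniq.length != header.length

  # 3. Numeric-looking cells (integer or float) in the alleged header.
  numeric = /\A[-+]?\d+(\.\d+)?\z/
  problems << "numeric-looking cell" if header.any? { |cell| cell.to_s.strip =~ numeric }

  problems
end

header_problems("name,age\nAlice,30\n")  # no problems reported
header_problems("Alice,30\nBob,25\n")    # flags the numeric-looking cell
header_problems("Alice,alice\nBob,bob\n") # headerless, but nothing fires
```

The last call shows the edge case mentioned above: a headerless table full of short strings passes all three checks, so these heuristics reduce the problem without eliminating it.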

Very happy user of the _data directory, thanks for including it, and CSV support of any kind would be total icing on the cake!

parkr · 2014-08-18T05:38:58Z

Great set of criteria. Thinking more about it now, this kind of validation would better serve the jekyll doctor command. We can print CSV files that violate any of the above. What do you think?

Floppy · 2014-08-18T08:34:24Z

That could work. The core of csvlint.io is in a gem, https://github.com/theodi/csvlint.rb/. We could add the heuristic @paulfitz suggests to that, and integrate that check into jekyll doctor perhaps. It would then catch a whole bunch of CSV errors, which might be useful.

ghost · 2014-11-05T05:12:12Z

Thank you for shipping it with 2.4.0!

Love it <3

Floppy added 3 commits August 16, 2014 14:54

Autoload csv files from data directory

687176e

link issue number

3a89923

map with proc for CSV loading

866935d

parkr reviewed Aug 17, 2014

paulfitz mentioned this pull request Aug 17, 2014

Edit github CSVs rufuspollock-okfn/dataexplorer#155

Closed

hashrockets in CSV loading

cccfda7

parkr added the Feature label Aug 17, 2014

parkr added a commit that referenced this pull request Aug 18, 2014

Merge pull request #2761 from theodi/csv-data

c4a2ac2

parkr merged commit c4a2ac2 into jekyll:master Aug 18, 2014

parkr added a commit that referenced this pull request Aug 18, 2014

Update history to reflect merge of #2761 [ci skip]

c54fb1a

Floppy mentioned this pull request Aug 18, 2014

Header error detection Data-Liberation-Front/csvlint.rb#96

Open

Floppy deleted the csv-data branch August 18, 2014 16:06

parkr added a commit that referenced this pull request Aug 26, 2014

Add note to datafiles docs around CSV's.

568464b

Ref: #2761

jekyll locked and limited conversation to collaborators Feb 27, 2017

jekyllbot added the frozen-due-to-age label Feb 27, 2017
