support data sources #1003

Merged
merged 1 commit into from Oct 1, 2013

@liufengyun
Contributor

liufengyun commented Apr 27, 2013

Data Source enables you to load data, serialized as YAML, from a directory in Jekyll's source: _data.

Here's an example to illustrate:

In my _data directory, I have members.yml, which contains the following:

- name: Ben Balter
  company: GitHub
  location: Washington D.C.
- name: Parker Moore
  location: "Ithaca, NY"

Jekyll loads this data (in this case, an Array of Hashes) into site.data.members (note that the namespace is based on the filename) which can be used in Liquid thusly:

{% for member in site.data.members %}
  <p>
    {{ member.name }} lives in {{ member.location }}
    {% if member.company %} and works for {{ member.company }}{% endif %}
  </p>
{% endfor %}

This allows us to input arbitrary data into our templates without the use of _config.yml and it is re-read each time a file changes when running jekyll with --watch.
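
The loading step described above can be sketched in plain Ruby. This is only an illustration under stated assumptions, not the code from this pull request: `load_data_dir` is a hypothetical helper, and it uses the standard library's `YAML.safe_load` against a throwaway temporary directory standing in for `_data`.

```ruby
require "yaml"
require "tmpdir"

# Hypothetical helper (not Jekyll's API): read every .yml/.yaml file in `dir`
# and key the parsed contents by the file's basename, mirroring the
# site.data.<filename> idea.
def load_data_dir(dir)
  data = {}
  Dir.glob(File.join(dir, "*.{yml,yaml}")).sort.each do |path|
    key = File.basename(path, ".*")
    data[key] = YAML.safe_load(File.read(path))
  end
  data
end

members_yaml = <<~EOS
  - name: Ben Balter
    company: GitHub
    location: Washington D.C.
  - name: Parker Moore
    location: "Ithaca, NY"
EOS

# The temporary directory plays the role of _data for this sketch.
site_data = Dir.mktmpdir do |dir|
  File.write(File.join(dir, "members.yml"), members_yaml)
  load_data_dir(dir)
end

# Same logic as the Liquid loop above, written in Ruby.
site_data["members"].each do |member|
  line = "#{member["name"]} lives in #{member["location"]}"
  line += " and works for #{member["company"]}" if member["company"]
  puts line
end
```

Keying by basename is what makes the namespace follow the filename: `members.yml` becomes `site_data["members"]`.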

  • Implement
  • Tests
  • Figure out best namespace: site.data, or data?
@swanson

Contributor

swanson commented May 6, 2013

Would love to not have to shove stuff into _config.yml to achieve this.

Here's an example use case: https://github.com/IndyStartupLab/indystartuplab.org/blob/gh-pages/_config.yml - we feed the data into include blocks to avoid copy-pasting every time we add a member/project.

@RohitRox

RohitRox commented Jun 6, 2013

👍 desperately waiting for this

@bumpux

bumpux commented Jul 15, 2013

+1 This will get me off my current solution and living in Jekyll

@parkr

Member

parkr commented Jul 16, 2013

This is a pretty cool feature. I'm not sure about the security implications for this, however. @benbalter, perhaps you could elaborate?

@benbalter

Contributor

benbalter commented Jul 17, 2013

This functionality would be a great plugin, but I don't know that the use-case is widespread enough to warrant inclusion in core, at least not at the outset. Would 80% of users use this?

The ability to get global data from something other than _config.yml would be cool. (e.g., with choosealicense's _config.yml).

I think the more Jekyll way to do that would be more transparent. Perhaps a _data directory, that automatically parses any .yml file, and exposes it as site.[filename], without the need to clutter _config.yml with all sorts of needless settings that we could just as easily detect.

From a security standpoint, would love it to be limited to local files within the repo root, at least if safe mode is on. Also, I'd stick with YAML. That's Jekyll's input language. We use it for front matter, we use it for config. We should stick with it.

Can you give some examples of use cases where external files would be needed? If I needed external data, I'd personally much rather have a build script that pulls in the datafile and vendors it to the repo so that I can version it, have a backup if the datasource goes down, etc.

@swanson

Contributor

swanson commented Jul 17, 2013

+1 on "The ability to get global data from something other than _config.yml would be cool. (e.g., with choosealicense's _config.yml)" - more examples: https://github.com/sep/letsworkhappier.com/blob/gh-pages/_config.yml https://github.com/plusjade/jekyll-bootstrap/blob/master/_config.yml

I agree on the remote data/YAML - I think that use case is beyond the 80%.

@liufengyun

Contributor

liufengyun commented Jul 17, 2013

@benbalter The idea of a _data directory and auto parsing any .yml sounds great. That's much cleaner without tedious settings in _config.yml.

I believe local yaml data files will be welcomed by most jekyll users, while remote data or non-yaml files may be very special use cases.

Though remote data & non-YAML files may be beyond the 80%, I'd like Jekyll to support them at least in unsafe mode, so that Jekyll is not closed, but open to many interesting possibilities. For example, using data directly from a database to generate the site. And it takes only about 15 lines of code to support the feature.

@parkr

Member

parkr commented Jul 17, 2013

I like the idea of a _data dir very much. Instead of directly on site, I might move it to site.data instead or a new variable called data altogether, point being that we should namespace this stuff.

I'd like to start first with just local data. An additional concern is where to put references to remote data. In the YAML files? In _config.yml? If we start with just local data, we can add in that complexity later.

@swanson

Contributor

swanson commented Jul 17, 2013

I haven't looked at the internals - but one annoying bit about putting stuff in _config.yml is that changes are not picked up with the watch flag and you have to manually restart the server. If possible, this should be avoided in a new _data directory setup.

@liufengyun

Contributor

liufengyun commented Jul 18, 2013

@parkr I agree local yaml files in _data directory would be a good start.

Regarding the namespace issue, a new global variable data doesn't seem good. site.data is acceptable. But I think it's better to leave it to end users to avoid conflicts with reserved config vars. Because in most use cases a site has only one or two YAML files (several at most), it's easy to avoid naming conflicts.

Regarding remote data sources, I think they should be defined in _config.yml. For remote data sources (e.g. a database), it's impossible to watch for changes without restarting the server.

@parkr parkr referenced this pull request Jul 26, 2013

Open

I18N #68

@liufengyun

Contributor

liufengyun commented Aug 30, 2013

@parkr @benbalter I've updated the pull request to only autoload yaml files under _data directory.

The Jekyll engine will autoload all YAML files (ending with .yml or .yaml) under _data. If there's a file members.yml in the directory, the user can access its contents through site.members.

@parkr

lib/jekyll/site.rb Outdated
def read_data(dir)
base = File.join(self.source, dir)
return [] unless File.exists?(base)
entries = Dir.chdir(base) { Dir['*.{yaml, yml}'] }

@parkr

parkr Aug 30, 2013

Member

Is the space allowed there? Can you add a test for both yaml and yml?
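
The question about the space can be checked with a small experiment. This is a sketch in a temporary directory with made-up file contents; as far as I know, Ruby treats everything between the braces literally, so the space becomes part of the second alternative and `.yml` files stop matching.

```ruby
require "tmpdir"

with_space, without_space = Dir.mktmpdir do |dir|
  File.write(File.join(dir, "members.yml"), "- jack\n")
  File.write(File.join(dir, "languages.yaml"), "[java, ruby]\n")
  Dir.chdir(dir) do
    # The first pattern has a space after the comma, as in the diff above;
    # the second has none.
    [Dir["*.{yaml, yml}"].sort, Dir["*.{yaml,yml}"].sort]
  end
end

p with_space
p without_space
```

Comparing the two printed arrays shows why a test covering both extensions is worth having.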

@parkr

lib/jekyll/site.rb Outdated
# Returns nothing
def read_data(dir)
base = File.join(self.source, dir)
return [] unless File.exists?(base)

@parkr

parkr Aug 30, 2013

Member

I'd probably check to make sure it's a directory too:

return [] unless File.directory?(base)
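
A quick sketch of why the directory check is stricter than the existence check, assuming someone accidentally creates a regular file named `_data` (the temporary directory here stands in for the site source). Note also that `File.exists?` was later deprecated and removed in Ruby 3.2, so the sketch uses `File.exist?`.

```ruby
require "tmpdir"

exists, is_dir = Dir.mktmpdir do |source|
  base = File.join(source, "_data")
  # Create a regular file where a directory is expected.
  File.write(base, "oops, a file rather than a directory\n")
  [File.exist?(base), File.directory?(base)]
end

puts "exist?: #{exists}, directory?: #{is_dir}"
```

With only the existence check, the method would carry on and try to list entries inside a plain file.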
@parkr

lib/jekyll/site.rb Outdated
entries.each do |entry|
path = File.join(self.source, dir, entry)
key = File.basename(entry, '.*')
@data[key] = YAML.safe_load_file(path)

@parkr

parkr Aug 30, 2013

Member

We tend to use self.data[key] for accessing attributes on the instance.

@parkr

site/docs/structure.md Outdated
<p>
Well-formatted site data should be placed here. The jekyll engine will
autoload all yaml files(ends with <code>.yml</code> or <code>.yaml</code>)

@parkr

parkr Aug 30, 2013

Member

Whoops! Looks like there is a space missing between files and (

@parkr

test/test_site.rb Outdated
site = Site.new(Jekyll.configuration)
site.process
assert_equal site.data['members'].size, 2

@parkr

parkr Aug 30, 2013

Member

Can you also check to make sure what is read in is proper?

It'd also be good to make sure that site.members from site_payload is right.

@parkr

Member

parkr commented Aug 30, 2013

I desperately want this feature. I've been using it (ostensibly) in theclassnotes.github.io and wrote a short rake task to join them into the _config.yml. It'd be amazing to have it built-in :)

Additionally, we should make sure they're re-read when the contents change.

@liufengyun

Contributor

liufengyun commented Aug 31, 2013

@parkr I've just refined the code according to review.

I think reloading will work without any problem: if any file in the source directory changes, site.process is called, which calls site.read, which in turn calls site.read_data.

@parkr

features/data.feature Outdated
Scenario: read YAML files in _data directory
Given I have a _data directory
And I have a "_data/languages.yaml" file that contains "[java, ruby]"

@parkr

parkr Aug 31, 2013

Member

Would you mind also adding a yml/yaml thing here? Maybe have a second file in this scenario?

@liufengyun

liufengyun Aug 31, 2013

Contributor

I've just updated this feature.

@parkr

Member

parkr commented Aug 31, 2013

Could we support subdirectories? Should we support subdirectories?

@liufengyun

Contributor

liufengyun commented Aug 31, 2013

I think it can satisfy 80% of the requirements without the complication of subdirectories.

Later, if there are concrete scenarios for subdirectories, we can add that support as well.

@parkr

Member

parkr commented Aug 31, 2013

Agreed! Let's skip subdirectories for now and just read in the YAML files in _data.

This PR LGTM. @mattr-?

@parkr

lib/jekyll/site.rb Outdated
@@ -266,7 +284,7 @@ def post_attr_hash(post_attr)
# "tags" - The Hash of tag values and Posts.
# See Site#post_attr_hash for type info.
def site_payload
-{"site" => self.config.merge({
+{"site" => self.data.merge(self.config).merge({

@parkr

parkr Aug 31, 2013

Member

We should probably use deep_merge here. Maybe we can setup a new method which collects the data and configs?

@liufengyun

liufengyun Aug 31, 2013

Contributor

What's the point of deep_merge here? In my mind, a collision of keys is abnormal usage.

@parkr

parkr Aug 31, 2013

Member

If I have a _config.yml that contains:

members:
- name: Ben
  username: benbalter
- name: Parker
  username: parkr

I'd want to deep-merge it with a _data/members.yml with the following contents:

- name: Ben Balter
- name: Parker Moore

To get the output:

site.data['members']
# => [
#  {"name" => "Ben Balter", "username" => "benbalter"},
#  {"name" => "Parker Moore", "username" => "parkr"}
# ]
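
For readers unfamiliar with the difference between a shallow and a deep merge, here is a minimal sketch. The `deep_merge` below is a hypothetical standalone function written for illustration, not the helper Jekyll ships, and the sample keys are made up. Under this sketch nested hashes are merged key by key, while arrays are replaced wholesale, so the array-of-hashes case above would need extra handling.

```ruby
# Hypothetical recursive merge: values from `overrides` win, except that two
# hashes under the same key are merged recursively instead of replaced.
def deep_merge(base, overrides)
  base.merge(overrides) do |_key, old, new|
    old.is_a?(Hash) && new.is_a?(Hash) ? deep_merge(old, new) : new
  end
end

data   = { "team" => { "name" => "Core", "size" => 2 } }
config = { "team" => { "size" => 3 }, "title" => "My Site" }

shallow = data.merge(config)       # "team" is replaced entirely
deep    = deep_merge(data, config) # "team" keeps "name" and takes the new "size"

p shallow
p deep
```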

@liufengyun

liufengyun Sep 1, 2013

Contributor

I doubt if there's real-world usage of the case above. Why define a single piece of data in two different places?

I think it's up to the end user to guarantee that keys in _data will not conflict with keys in _config.yml.

@parkr

parkr Sep 1, 2013

Member

I think we should enforce best-practices to some degree, but I really think helping the user out (maybe he or she is tired or just not well-focused that day) does a world of good.

@liufengyun

liufengyun Sep 2, 2013

Contributor

OK, I've changed it to deep_merge.

@parkr

features/data.feature Outdated
Given I have a _data directory
And I have a "_data/members.yml" file with content:
"""
- jack

@parkr

parkr Aug 31, 2013

Member

It'd be cool to make sure hashes work as well instead of just arrays. And arrays of hashes. (Mostly that liquid exposes them properly)

@liufengyun

liufengyun Sep 1, 2013

Contributor

OK, I've refined the feature to cover arrays, hashes and arrays of hashes.
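
The three shapes mentioned here can be illustrated with the standard library's YAML parser. The sample values are made up for this sketch and are not taken from the feature file.

```ruby
require "yaml"

# A plain array, a hash, and an array of hashes, as they might appear in
# three separate data files.
array           = YAML.safe_load("- jack\n- leon\n")
hash            = YAML.safe_load("name: Jekyll\nlicense: MIT\n")
array_of_hashes = YAML.safe_load("- name: Jack\n  age: 28\n- name: Leon\n  age: 34\n")

p array
p hash
p array_of_hashes
```

Each shape parses into ordinary Ruby arrays and hashes with string keys; checking that Liquid exposes them properly is the part the feature file needs to cover.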

@parkr

parkr Sep 1, 2013

Member

Thank you!

@parkr

parkr Sep 1, 2013

Member

Just want to be thorough :)

@parkr

lib/jekyll/site.rb Outdated
# Returns nothing
def read_data(dir)
base = File.join(self.source, dir)
return [] unless File.directory?(base)

@parkr

parkr Sep 1, 2013

Member

It'd be great to print out a warning message if someone specified a file instead of a directory:

unless File.directory?(base)
  Jekyll.logger.warn "The data directive specified in the configuration does not exist or is not an accessible directory."
  return Array.new
end

@liufengyun

liufengyun Sep 2, 2013

Contributor

Good point, I've added the warning.

@parkr

lib/jekyll/site.rb Outdated
entries.each do |entry|
path = File.join(self.source, dir, entry)
key = File.basename(entry, '.*')

@parkr

parkr Sep 1, 2013

Member

We should probably do some more sanitation here. If I have the file hello dolly.yml then it should come in as hello_dolly.

def sanitize_filename(name)
  name.gsub(/[^\w\s_-]+/, '')
      .gsub(/(^|\b\s)\s+($|\s?\b)/, '\\1\\2')
      .gsub(/\s+/, '_')
end

Then use it:

key = sanitize_filename(File.basename(entry, '.*'))
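
A short usage sketch of the suggested helper, repeating the function as proposed above so the snippet runs on its own. The second input is a made-up name chosen to show that punctuation is dropped while hyphens are kept.

```ruby
# The helper exactly as proposed in the review comment above.
def sanitize_filename(name)
  name.gsub(/[^\w\s_-]+/, '')
      .gsub(/(^|\b\s)\s+($|\s?\b)/, '\\1\\2')
      .gsub(/\s+/, '_')
end

spaced      = sanitize_filename("hello dolly") # whitespace becomes an underscore
punctuated  = sanitize_filename("my-data!")    # "!" is stripped, "-" survives

puts spaced
puts punctuated
```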

@liufengyun

liufengyun Sep 2, 2013

Contributor

Thanks for the regex code, it saves me time:-)

@mattr-

jekyll.gemspec Outdated
@@ -6,7 +6,7 @@ Gem::Specification.new do |s|
s.name = 'jekyll'
s.version = '1.1.2'
s.license = 'MIT'
-s.date = '2013-07-25'
+s.date = '2013-08-30'

@mattr-

mattr- Sep 1, 2013

Member

No need to revise this part of the gemspec. We don't update the spec (outside of the file lists) until release time.

@liufengyun

liufengyun Sep 2, 2013

Contributor

I've reverted the date.

@mattr-

lib/jekyll/site.rb Outdated
# Returns nothing
def read_data(dir)
base = File.join(self.source, dir)
unless File.directory?(base)

@mattr-

mattr- Sep 3, 2013

Member

This gives me a warning even if I don't have a _data directory. I don't want to see a warning if the directory doesn't exist.

@parkr

parkr Sep 3, 2013

Member

Ah, good point. That was my suggestion! You can remove it - my b.

@liufengyun

Contributor

liufengyun commented Oct 1, 2013

@mattr- I've rebased to jekyll/master locally, but the master is failing https://travis-ci.org/mojombo/jekyll. Does it matter?

@liufengyun

Contributor

liufengyun commented Oct 1, 2013

@mattr- I got it, you mean squash all my commits to a single commit? I'll do it right away.

@mattr-

Member

mattr- commented Oct 1, 2013

The test failures were already there from before so don't worry about those. As far as squashing goes, I don't have a preference one way or the other. Whatever you feel like doing. 😺

Autoload yaml files under _data directory
The jekyll engine will autoload all yaml files(ends with .yml or .yaml)
under _data. If there's a file members.yml under the directory, then user
can access contents of the file through site.members.
@liufengyun

Contributor

liufengyun commented Oct 1, 2013

It's done, @mattr-

mattr- added a commit that referenced this pull request Oct 1, 2013

@mattr- mattr- merged commit cb4d155 into jekyll:master Oct 1, 2013

1 check failed

default The Travis CI build failed

mattr- added a commit that referenced this pull request Oct 1, 2013

@mattr-

Member

mattr- commented Oct 1, 2013

🎉 Awesome! 🎉

Thank you so much for your work on this!

So happy to finally 🚢 this.

@liufengyun

Contributor

liufengyun commented Oct 1, 2013

Thank you all for helping to review and improve the pull request.

㊗️ Jekyll has finally entered the data era!

@parkr

Member

parkr commented Oct 1, 2013

I may or may not be 😢 with happiness. Great work @liufengyun!

@benbalter

Contributor

benbalter commented Oct 1, 2013

This is a game changer. Great stuff. 🍻 🍸 🍷 🌴 🎉 🎆

@cobyism

Member

cobyism commented Oct 8, 2013

💖

@localheinz

Contributor

localheinz commented Nov 3, 2013

@liufengyun

Has support for sub-directories been added yet?

@liufengyun

Contributor

liufengyun commented Nov 4, 2013

@localheinz , sub-directories are not supported yet. Currently I think _data/ without sub-directory can satisfy most of the data requirements in site generation.

@localheinz

Contributor

localheinz commented Nov 4, 2013

@liufengyun

Was hoping now was already later.

@liufengyun

Contributor

liufengyun commented Nov 5, 2013

@localheinz The _data feature has only just been officially released. Let's wait and see how it's received and used. If there's strong demand for sub-directory support in real-world usage, I think a pull request will be welcomed.

@benbalter

Contributor

benbalter commented Nov 5, 2013

A big 👍 for sub-folder support in the next point release if it's a light lift (and there's community adoption of the feature). I'd call it a core use case for any site with more than one data type.

If I have a site with just one data type, e.g., cars, it's fine. {% for car in site.data %}. I can safely assume anything in the _data folder is a car.

Now imagine I have cars and trucks, which I place in the _data folder. If sub-foldered, I could do {% for truck in site.data.trucks %} to iterate through trucks. Without sub-folders, it's likely more like {% for vehicle in site.data %}{% if vehicle.type == "truck" %}... (which also requires storing the type value in each vehicle, whereas before it was simply foldered).

Alternatively, could I name my yaml file trucks.pickup.yml now to have it parsed into site.data.trucks.pickup?

@liufengyun

Contributor

liufengyun commented Nov 5, 2013

@benbalter I think there's a misunderstanding about the feature.

If you have a file trucks.yaml under _data/, then you can access it with {% for truck in site.data.trucks %}. No sub-directory is required in this case. So you can have members.yaml, projects.yaml, and products.yaml under _data, and access them as site.data.members, site.data.projects, and site.data.products respectively.

Currently, if you put a file named trucks.pickup.yml under _data, then it's hooked to site.data.truckspickupyml. Dots and whitespace are removed.

@routelastresort

routelastresort commented Nov 7, 2013

@benbalter: When will gh-pages support this feature? I just made a new site with 1.3, then realized that 1.2 (what the github-pages gem is at) had issues (rbenv shim version points to mine, bundle exec jekyll serve uses the gh-pages version, and production github.io as well). Obviously, my site renders fine with 1.3, but will I wait days/weeks/months for Github's version to catch up? Thanks, btw, hehe 👍

@swanson

Contributor

swanson commented Nov 7, 2013

@parkr

Member

parkr commented Nov 7, 2013

@routelastresort I was told by @benbalter that it's been pushed to the production servers.

@routelastresort

routelastresort commented Nov 7, 2013

@parkr, @swanson, @benbalter - thanks! 👍 I noticed it when I pushed/visited today. You guys rock!

@localheinz

Contributor

localheinz commented Nov 8, 2013

❤️

@semireg

semireg commented Nov 15, 2013

Thank you. This has allowed me to design two-tier navigation without static front-matter. One step closer to dynamic front-matter.

https://gist.github.com/caylanlarson/7493380

semireg industries - labelscope

@Wolfr

Wolfr commented Dec 10, 2013

Just made a repo to illustrate some cases: https://github.com/Wolfr/jekyll-data-test

Feel free to fork or PR. I feel more concrete examples will help people new to Jekyll and/or YAML and/or Liquid make sense of it.

@TuckerWhitehouse TuckerWhitehouse referenced this pull request Oct 19, 2014

Closed

Remote Data #3015

@jekyll jekyll locked and limited conversation to collaborators Feb 27, 2017
