Multiple webpages for multiple datasets in the same repo? #51

Open
tiernanmartin opened this issue May 25, 2018 · 8 comments

@tiernanmartin

Exciting new package 🎉

If I have multiple datasets in the same repository, is it possible to build a webpage for each dataset? Or is this package only set up for the 1:1 dataset-to-repo use case?

@amoeba
Collaborator

amoeba commented May 26, 2018

The sky's the limit, I suppose. Right now, dataspice is one repo <-> one page, where your repo can have one or more CSVs in it (more formats in the future). Figuring out what a "dataset" is is very hard, so dataspice just ignores that and generates metadata for the files (it's much easier to determine what a file is).

Would you be willing to describe what your use case looks like a bit? If you have an example repo with a few datasets that'd be super helpful. It's certainly common enough to expect that others have this use case, so thanks!

@tiernanmartin
Author

tiernanmartin commented May 26, 2018

Sure!

I was inspired by fivethirtyeight/data to create a repo with the datasets for many unrelated projects. The motivation for doing this is my desire to make it easier for one dataset to be used in multiple unrelated projects. dataspice seems like a great way to generate metadata for these data files, but I would need the ability to create a separate page for each file.

Here's an illustration of the repo architecture that I'd like to create (using two made up datasets):

    .
    ├── my-twitter-mentions/
    │   ├── data/
    │   │   ├── metadata/
    │   │   └── my-twitter-mentions-2017.csv 
    │   ├── docs/
    │   │   └── index.html
    │   └── README.md
    ├── product-user-survey-2018/
    │   ├── data/
    │   │   ├── metadata/
    │   │   └── product-user-survey-2018-results.csv 
    │   ├── docs/
    │   │   └── index.html
    │   └── README.md
    └── README.md
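
For readers who want to experiment with this layout, here is a minimal sketch of how a script might discover the per-dataset folders. It is not part of dataspice; the function name `find_dataset_dirs` is hypothetical, and it assumes each dataset lives in a top-level folder with its CSVs directly under `data/`, as in the tree above.

```python
from pathlib import Path


def find_dataset_dirs(repo_root):
    """Map each top-level folder that has a data/ subfolder to its CSV files.

    Hypothetical helper: assumes the <dataset>/data/*.csv layout sketched above.
    """
    datasets = {}
    for child in sorted(Path(repo_root).iterdir()):
        data_dir = child / "data"
        if child.is_dir() and data_dir.is_dir():
            # Only CSVs directly under data/ are listed; data/metadata/ is not searched.
            datasets[child.name] = sorted(p.name for p in data_dir.glob("*.csv"))
    return datasets
```

Each key of the returned dictionary would then correspond to one page to build.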

@amoeba
Collaborator

amoeba commented May 26, 2018

That's nice and clean. Thanks.

We're currently using a convention of "your data are in ./data" and your example is only a minor tweak to that so I think it'd fit well.

We were discussing how to do the HTML pages over in #46 and Blogdown/Hugo came up as an option for supporting more complex repos (like your example) with indexes of datasets, search, tags, etc.

I'm wondering what a good API would be for getting the metadata from the user for each of the folders of data. Our current API is nice and simple because we're only supporting one folder. If we smushed all of them into a single set of metadata template CSVs, I guess that'd be workable both for the Shiny apps and for editing the CSVs manually. Not quite sure what the R function API would be, but it's definitely possible.
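
To make the "single set of metadata template CSVs" idea concrete, here is a rough sketch of merging per-folder attribute tables into one table with an extra column identifying the folder. This is an illustration only, not a proposed dataspice API: the function name `combine_attributes` and the `dataset` column are hypothetical, and the column names `fileName`, `variableName`, `description`, and `unitText` in the usage below are assumed to match the attributes template.

```python
import csv
import io


def combine_attributes(per_dataset_csv):
    """Merge per-dataset attribute tables into one list of rows.

    `per_dataset_csv` maps a dataset folder name to the text of its
    attributes CSV. Each output row gains a hypothetical "dataset" column.
    """
    rows = []
    for dataset, text in per_dataset_csv.items():
        for row in csv.DictReader(io.StringIO(text)):
            rows.append({"dataset": dataset, **row})
    return rows
```

For example:

```python
mentions = "fileName,variableName,description,unitText\nmentions.csv,user,Account name,\n"
survey = "fileName,variableName,description,unitText\nsurvey.csv,score,Rating,points\n"
rows = combine_attributes({"my-twitter-mentions": mentions,
                           "product-user-survey-2018": survey})
```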

I'll keep chewing on this. Thanks!

@cboettig
Member

cboettig commented May 30, 2018

Exploring the look and strategy for this a bit with Hugo. My current sketch supports the following strategies:

  • Create a separate landing page for each dataset in the content/ directory by pasting the dataspice.json into the header block (since JSON is valid YAML), then add the shortcode:
{{% cards2.html %}}

to create a rich-card style layout. (I'll add some more layouts as alternate shortcodes).

This will create a landing page for each dataset, and the homepage will create a list view with preview cards for all the datasets on the website. See: https://cboettig.github.io/dataspice-web/ex2/ (source at: http://github.com/cboettig/dataspice-web/).
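
The landing-page strategy described above could be scripted roughly as follows. This is a sketch under assumptions, not the code behind the linked site: the function name `landing_page` is hypothetical, and it relies on the claim above that JSON pasted between the `---` front-matter delimiters is read as YAML.

```python
import json


def landing_page(metadata, shortcode="cards2.html"):
    """Return the text of a Hugo content file with the metadata as front matter.

    Hypothetical helper; `metadata` is the parsed contents of a dataspice.json.
    """
    front_matter = json.dumps(metadata, indent=2)
    # JSON is placed between YAML delimiters, relying on JSON being valid YAML.
    return f"---\n{front_matter}\n---\n\n{{{{% {shortcode} %}}}}\n"
```

Writing the returned string to a file under content/ (one file per dataset) would give each dataset its own page.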

Alternatively, for a more one-off approach, Hugo can also read JSON from the data/ dir, so one could drop the dataspice.json there and use some different shortcodes to embed the metadata cards onto any page.

I think I can turn this into a Hugo theme pretty easily; I mostly need to figure out how much CSS styling gets hardwired in and how much I can make easily customizable. Thoughts? Ideas? Suggestions for other layout shortcodes? (I'll mock one up that looks more like the dataspice build_site() template, where there's no submenu navigation on the cards.)

@rubenarslan

I think the term "dataset" is a bit ambiguous. You mean multiple tables, right? But any given study may have, e.g., one table to describe locations, one to describe animals, and one to describe time series of behaviour of those animals. Therefore, I thought it would be good to give people the freedom to put several codebooks on the same page and intersperse descriptions of how they relate to one another. Using the rmd partial approach, they can also easily choose to put them on separate pages where that makes more sense.

@cboettig
Member

@rubenarslan Thanks for the comments! Yes, I think dataset is somewhat intentionally ambiguous.
Google's page laying out the Schema.org/Dataset vocabulary on which dataspice is based puts it like this:

Here are some examples of what can qualify as a dataset:

  • A table or a CSV file with some data
  • An organized collection of tables
  • A file in a proprietary format that contains data
  • A collection of files that together constitute some meaningful dataset
  • A structured object with data in some other format that you might want to load into a special tool for processing
  • Images capturing data
  • Files relating to machine learning, such as trained parameters or neural network structure definitions
  • Anything that looks like a dataset to you

So yes, it could be multiple tables, and also things that are not tables. Note that the goal of dataspice is primarily to provide a convenient way of creating and managing a basic schema.org/Dataset description of data, rather than to create websites. Once you have these metadata descriptions in this convenient and standard JSON format, it's relatively easy to create websites with various layouts. dataspice includes some layouts out of the box, but I don't think it's our goal to have a one-size-fits-all-needs-equally-well layout; but rather a framework that could be easily adapted to "whatever looks like a dataset to you".
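
As an illustration of what such a basic schema.org/Dataset description can look like, here is a small sketch that builds one as JSON-LD. It is not dataspice's output format; the function name `dataset_description` and the example values are made up, while the property names (`distribution`, `DataDownload`, `contentUrl`, `encodingFormat`, `variableMeasured`, `PropertyValue`) come from the schema.org vocabulary.

```python
import json


def dataset_description(name, description, csv_url, variables):
    """Build a minimal schema.org/Dataset description as a JSON-LD dict.

    Hypothetical helper for illustration only.
    """
    return {
        "@context": "https://schema.org",
        "@type": "Dataset",
        "name": name,
        "description": description,
        # One downloadable file; a dataset with several tables would list several.
        "distribution": [
            {"@type": "DataDownload", "contentUrl": csv_url, "encodingFormat": "text/csv"}
        ],
        "variableMeasured": [{"@type": "PropertyValue", "name": v} for v in variables],
    }


# Example with made-up values, serialized the way a JSON metadata file would be.
example = dataset_description(
    "my-twitter-mentions",
    "Made-up example dataset.",
    "data/my-twitter-mentions-2017.csv",
    ["user", "timestamp"],
)
example_json = json.dumps(example, indent=2)
```

A dataset made of several tables would simply carry more than one entry in `distribution`, which is one way the same description can cover "whatever looks like a dataset to you".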

@rubenarslan

Do you describe the dataspice high-level idea somewhere where I can read it?

@cboettig
Member

@rubenarslan Great question! I think we're still consolidating our ideas to some extent; a blog post in the works now should give a bit more detail soon.

Meanwhile, there's a little blurb at the top of the README that has the basic gist I think:

The goal of dataspice is to make it easier for researchers to create basic, lightweight and concise metadata files for their datasets. These basic files can then be used to:

  • make useful information available during analysis.
  • create a helpful dataset README webpage.
  • produce more complex metadata formats to aid dataset discovery.

Metadata fields are based on schema.org and other metadata standards.

Not sure if that helps. I think we're still figuring out how best to communicate these ideas to different audiences, so feedback is definitely most helpful!

@amoeba amoeba added this to the v1.1 milestone Jul 21, 2020