Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CSV and TOML loading global functions #379

Merged
merged 21 commits into from Oct 18, 2018
Merged

CSV and TOML loading global functions #379

merged 21 commits into from Oct 18, 2018

Conversation

kellpossible
Copy link
Contributor

@kellpossible kellpossible commented Aug 22, 2018

This pull request adds the ability to load TOML files and CSV files into templates using these global functions: load_csv and load_toml.

The personal use case I have in mind for these is image galleries, I have folders containing images and an index CSV file with each image listed and information about each image. These features could be used for many many other purposes I'm sure.

TOML

TOML is loaded in exactly as the file is found in the same structure:

[section]
key = "value"

loaded as:

{
  "section": {
    "key": "value",
  },
}

CSV

CSV is loaded in as an object:

header1,header2
row1col1,row1col2
row2col1,row2col2

loaded as:

{
  "headers": ["header1","header2"],
  "records":[["row1col1","row1col2"],["row2col1","row2col2"]],
}

Future Ideas

In the future it would also be cool to add:

  • More options for the CSV parser
  • Markdown table shortcode/filter/function which can load from CSV file.

Feedback

I have a feeling a number of the match statements for converting the error types could be done in a less verbose manner, but I'm little new to Rust and still trying to figure out the error handling story. Feedback much appreciated!

@Freaky
Copy link
Contributor

Freaky commented Aug 22, 2018

    if !content_path.join(source.clone()).exists() {

Don't check if a file exists before trying to open/read it, just try to open/read it. That's why they can return errors :)

   let path = content_path.join(source.clone());

Lend it a reference rather than allocating a new string. join(&source).

        Err(_) => {
          return Err(format!("`load_toml`: Unable to open file {}", source.clone()).into())

Don't throw away errors. Unable to open file? Great, why?

Matching on Err, reformatting and then returning it, or unwrapping, is best done using map_err to translate the error, and using the question-mark operator.

   match file.read_to_string(&mut content_string) {

Instead of opening the file then calling read_to_string, you can just use std::fs::read_to_string.

These half-dozen or so lines can be replaced with something like:

let path = content_path.join(&source);
let content_string = read_to_string(path)
    .map_err(|e| format!("'load_toml': {} - {}", source, e))?;
        let mut headers_array: Vec<Value> = Vec::new();
        for h in hdrs {
            let value = Value::String(String::from(h));
            headers_array.push(value);
        }

map/collect are more idiomatic for building collections like this:

let headers_array = hdrs.iter().map(|v| Value::String(v.to_string())).collect();

@kellpossible
Copy link
Contributor Author

@Freaky thanks a lot for the feedback! I'll look into it tonight, and hopefully update my code :)

@kellpossible
Copy link
Contributor Author

@Freaky I've updated the code based on your feedback! :)

@Freaky
Copy link
Contributor

Freaky commented Aug 23, 2018

Much better :)

@Keats
Copy link
Collaborator

Keats commented Aug 24, 2018

That looks good thanks.
I was thinking of a function to load json data but didn't think of toml or csv.

What do you think of something more generic like:

# Will check the extension for the type
{{ load_data(path="something.toml") }}
# ... but we can force it
{{ load_data(path="something.toml", kind="toml") }}
{{ load_data(path="something.csv"") }}
{{ load_data(path="something.json"") }}

# AND
{{ load_data(url="https://api.twitter.com/1/statuses/oembed.json?url=https://twitter.com/20100Prouillet/status/1032189593322508288", kind="json") }}

The main idea being that we are able to query local file but also APIs (mostly JSON for those though), allowing for example to have a Twitter shortcode that works without having to load JS for it, tweets could be embedded at compile time.

@kellpossible
Copy link
Contributor Author

@Keats this sounds really good, however, will there be some confusion do you think due to the different resulting data structures for csv vs toml/json?

Maybe I'll start with making a local load_data method, and then expand it to pull from urls in a future pull request?

Also is there any security concern for the paths that the user can enter here, or are we assuming that if they are using this feature in their templates only (not available in markdown) then it should be okay?

@Keats
Copy link
Collaborator

Keats commented Aug 28, 2018

will there be some confusion do you think due to the different resulting data structures for csv vs toml/json?

I don't know, in the end the function will json-ify it through serde. Not a big deal for me but it can be reconsidered if people find it confusing.

Maybe I'll start with making a local load_data method, and then expand it to pull from urls in a future pull request?

Sounds good!

Also is there any security concern for the paths that the user can enter here, or are we assuming that if they are using this feature in their templates only (not available in markdown) then it should be okay?

I guess that depends where users would put these files:

  • in the templates folder?
  • in the content folder?
  • in the static folder?
  • in a random folder?

I think we should limit the paths to be from the directory containing the config.toml and below, so a user can't do load_data(path="~/.ssh/id_rsa") but they can still put data files where they want. The Sitehas abase_path` we can use for that.

Something else to consider is that Hugo has the concept of data folder (https://gohugo.io/templates/data-templates/#the-data-folder) where you can drop files in and Hugo automatically makes them available in the template, without having to load them explicitly. Does it sound like it would fix your need of loading toml/csv files? You would have to keep the CSVs in a separate folder than the images though, which could be an issue.

@kellpossible
Copy link
Contributor Author

I guess that depends where users would put these files:

  • in the templates folder?
  • in the content folder?
  • in the static folder?
  • in a random folder?

I'm thinking that for now these files would be either within the content folder or within the static folder.

For my personal use cases, I would like to put these files in the content directory, but I can also see myself perhaps putting something in the static directory also.

I think we should limit the paths to be from the directory containing the config.toml and below, so a user can't do load_data(path="~/.ssh/id_rsa") but they can still put data files where they want. The site has a base_path we can use for that.

I agree. Do the existing functions implement this limitation? (Resize image, etc). I'll do some research to try and find the best way to implement this.

I guess if the user wants to do something fancy and have access to files outside the project directory they can always use a symbolic link or a script to copy it in.

@Keats
Copy link
Collaborator

Keats commented Aug 30, 2018

Do the existing functions implement this limitation? (Resize image, etc). I'll do some research to try and find the best way to implement this.

I don't think so, now that you mention it...

What are you thoughts on a data folder (last paragraph of my previous post)?

@kellpossible
Copy link
Contributor Author

What are you thoughts on a data folder (last paragraph of my previous post)?

Ooops sorry yes I did forget to reply to that.

Something else to consider is that Hugo has the concept of data folder (https://gohugo.io/templates/data-templates/#the-data-folder) where you can drop files in and Hugo automatically makes them available in the template, without having to load them explicitly. Does it sound like it would fix your need of loading toml/csv files? You would have to keep the CSVs in a separate folder than the images though, which could be an issue.

Like you say, I'd like to keep the CSV in the same folder as the images. I think often people will have data associated with articles, like say a csv table they want to include with a shortcode, etc. For me personally I struggle to see the use of the data folder, but I guess if it's popular someone will raise an issue to request it?

On another note, I was thinking of also adding support for JSON loading. TOML, JSON and CSV covers most cases I can think of. Perhaps also support for pulling these from urls using http might be good? But I guess that will slow the build down considerably...

@Keats
Copy link
Collaborator

Keats commented Sep 2, 2018

Like you say, I'd like to keep the CSV in the same folder as the images. I think often people will have data associated with articles, like say a csv table they want to include with a shortcode, etc. For me personally I struggle to see the use of the data folder, but I guess if it's popular someone will raise an issue to request it?

I haven't use data folders myself so I was wondering the use. Maybe if you generate content from some JSON files and you can just drop them in the folder and you don't have to worry about looking for it in your templates?
I don't think it is really related to #365 but let's ping @paulcmal and @piedoom in case it intersects.

On another note, I was thinking of also adding support for JSON loading. TOML, JSON and CSV covers most cases I can think of. Perhaps also support for pulling these from urls using http might be good? But I guess that will slow the build down considerably...

JSON, CSV and TOML should be enough for everybody!

I think pulling them from http also make senses: in my example above {{ load_data(url="https://api.twitter.com/1/statuses/oembed.json?url=https://twitter.com/20100Prouillet/status/1032189593322508288", kind="json") }} is actually the way to load a tweet data from JSON. If you want to embed a tweet otherwise, you will need to insert the twitter JS SDK in the HTML and that's slowing down every pageview.
I think I would use this function more with http than with local files personally

@kellpossible
Copy link
Contributor Author

kellpossible commented Sep 12, 2018

Okay I've implemented this as a generic load_data function but I still need to:

  • limit the path to be within the content folder
  • implement json loading

http loading I think deserves its own issue/pull request, I'll make one when I'm finished

@kellpossible
Copy link
Contributor Author

@Keats it seems to me that restricting paths is actually a difficult problem. For my personal use case I sometimes symbolic link folders into my git repo which contain large files which I don't want to store there. Handling path restrictions with symbolic links is a difficult because all the standard library methods canonicalize the path and resolve the symbolic link, which makes it fall outside the directory.

example:
big files kept at: /home/kellpossible/files/blogfiles/
blog git repo at: /home/kellpossible/blog/

symbolic link to the blogfiles at: /home/kellpossible/blog/content/blogfiles/ -> /home/kellpossible/files/blogfiles/

Let's say I put a path to a file within blogfiles in load_data:
load_data("blogfiles/a_folder/../a_file.json")

I guess you could throw an exception on the use of any .. path elements to avoid needing to canonicalize the path, perhaps this would be a solution? Otherwise any attempt to canonicalize this path will result in a path which falls outside the content directory. There is this crate which may come in handy, but I haven't looked into it too much: https://docs.rs/path_abs/0.4.1/path_abs/

Either way, I think perhaps this change falls outside the scope of this PR, and could be implemented in the future for all user facing file accesses if we want, and if there is demand for this security feature.

If there's some agreement on that I'll create a new issue to track the idea

@kellpossible
Copy link
Contributor Author

I think this one's ready to merge

@kellpossible
Copy link
Contributor Author

There is some problem with the Cargo.lock file that got messed up in a merge, I've yet to work out 😢

Copy link
Collaborator

@Keats Keats left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That looks good to me! Don't worry about the folder restriction for now

components/templates/src/global_fns.rs Show resolved Hide resolved
components/templates/src/global_fns.rs Show resolved Hide resolved
let toml_content: toml::Value = toml::from_str(&content_string)
.map_err(|e| format!("'load_data': {} - {}", toml_path.to_str().unwrap(), e))?;

to_value(toml_content).map_err(|err| err.into())
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does it work with dates in the toml? I remember having issues with that

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is that not a problem with upstream toml support?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

okay I added a date to the test, see latest version. It seems that dates are parsed as:

"date": {
    "$__toml_private_datetime": "1979-05-27T07:32:00Z"
 }

Is it worth trying to extract the string, and remove the object for all dates matching the $__toml_private_datetime pattern?

/// tera value
fn load_json(json_path: &PathBuf) -> Result<Value> {
let content_string: String = read_file(json_path)
.map_err(|err: errors::Error| {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that can be changed to

.chain_err(|| format!("`load_data`: error loading json file"."))

I think
Right now it seems to not display the original error

/// tera Value
fn load_toml(toml_path: &PathBuf) -> Result<Value> {
let content_string: String = read_file(toml_path)
.map_err(|err: errors::Error| {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as above

@kellpossible
Copy link
Contributor Author

No worries, I'll work on these changes soon then

@Keats Keats mentioned this pull request Oct 7, 2018
@Keats
Copy link
Collaborator

Keats commented Oct 10, 2018

@kellpossible sorry you will have more conflicts to fix :/

Maybe once this is in @RealOrangeOne can you do the remote fetching aspect?

@RealOrangeOne
Copy link
Contributor

RealOrangeOne commented Oct 10, 2018

Looking at the implementation, it shouldn't be too difficult to bolt on the remote-fetching aspect, likely using the API defined in #379 (comment). I think we should get the local-only implementation out first, then work on the remote section working too. It seems unnecessary to spend more time working on a second, somewhat unrelated, feature which happens to hit the same codepaths, when we could ship what we have, and work on it later.

(By ship, I mean merge into next, I see no reason why once this is stable, I can't start working on the remote straight away. I would do it now, but don't want to step on anyone's toes!)

I've created #476 to track the remote component of this PR, so this can remain simply as loading remote data files.

@kellpossible
Copy link
Contributor Author

hey everyone, sorry I ran into a few issues resolving the remaining issues on this one, Couldn't get it to compile, more error chaining confusion. I'll push the code that I've got at the moment and see if you guys have any ideas.

@kellpossible
Copy link
Contributor Author

kellpossible commented Oct 11, 2018

So, this is the error I was having:

error[E0599]: no method named `chain_err` found for type `std::result::Result<std::string::String, errors::Error>` in the current scope
   --> components\templates\src\global_fns.rs:299:10
    |
299 |         .chain_err(|| format!("`load_data`: error loading json file {}", json_path.to_str().unwrap().into()))?;
    |          ^^^^^^^^^
    |
    = help: items from traits can only be used if the trait is in scope
    = note: the following traits are implemented but not in scope, perhaps add a `use` for one of them:
            candidate #1: `use tera::errors::ResultExt;`
            candidate #2: `use global_fns::error_chain::ChainedError;`
            candidate #3: `use global_fns::error_chain::example_generated::inner::ResultExt;`
            candidate #4: `use global_fns::error_chain::example_generated::ResultExt;`
            and 2 others

error: aborting due to previous error

For more information about this error, try `rustc --explain E0599`.
error: Could not compile `templates`.

To learn more, run the command again with --verbose.

@kellpossible
Copy link
Contributor Author

kellpossible commented Oct 12, 2018

I resolved it with using map_err instead:

 let content_string: String = read_file(toml_path)
        .map_err(|e| format!("`load_data`: error {} loading json file {}", json_path.to_str().unwrap(), e))?;

Hopefully acceptable?

@Keats
Copy link
Collaborator

Keats commented Oct 12, 2018

@kellpossible the issue is that chain_err comes from a trait, ResultExt. The first candidate in the list is the right answer:

candidate #1: `use tera::errors::ResultExt;`

@kellpossible
Copy link
Contributor Author

@Keats I did try that and it didn't work for some reason, but I'll try again 🙂

@Keats
Copy link
Collaborator

Keats commented Oct 18, 2018

I'll merge this but something to consider is caching local files, if the function call happens in some template that is inherited many times, it will read + serialize the files many time which is a bit of a waste.

@Keats Keats merged commit 1baa775 into getzola:next Oct 18, 2018
@kellpossible
Copy link
Contributor Author

@Keats @Freaky thanks for all the feedback and help getting this through!

@RealOrangeOne good luck with the remote file loading, I'll keep an eye on your PR 🙂

@RealOrangeOne RealOrangeOne mentioned this pull request Oct 20, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants