Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

YAML.safe_load fails when a string contains a non-existent date #262

Open
Fjan opened this issue Dec 23, 2015 · 22 comments
Open

YAML.safe_load fails when a string contains a non-existent date #262

Fjan opened this issue Dec 23, 2015 · 22 comments

Comments

@Fjan
Copy link

Fjan commented Dec 23, 2015

YAML.safe_load will raise an exception when you try to load text that happens to contain a sequence of numbers that looks like a date but is not:

s="2016-02-31"
YAML.safe_load(s.to_yaml)
# =>  Psych::DisallowedClass: Tried to load unspecified class: Date

Using YAML.load instead of safe_load works fine and text that contains a correct date works fine too. But this can be used to raise an exception on any application that uses YAML.safe_load on user provided text (accidentally or otherwise)

@tenderlove
Copy link
Member

I guess this is because Psych leans on the Date class to determine what is valid date or not. I'm not really keen on writing my own date validation logic. Do you have a suggestion for how to fix this?

@Fjan
Copy link
Author

Fjan commented Dec 23, 2015

I took a brief look at the code but I don't understand yet why a valid date will not raise and exception. So it's doing validation somewhere anyway.

A workaround is to simply allow the Date class. Another option would be use modified tokeniser for safe_load so it only recognises allowed classes and turns the rest into strings.

@tenderlove
Copy link
Member

I took a brief look at the code but I don't understand yet why a valid date will not raise and exception. So it's doing validation somewhere anyway.

I don't follow. A valid date will raise an exception with safe_load:

irb(main):003:0> Psych.safe_load '2013-01-01'
Psych::DisallowedClass: Tried to load unspecified class: Date
    from /Users/aaron/.rbenv/versions/ruby-trunk/lib/ruby/2.3.0/psych/class_loader.rb:97:in `find'
    from /Users/aaron/.rbenv/versions/ruby-trunk/lib/ruby/2.3.0/psych/class_loader.rb:28:in `load'
    from /Users/aaron/.rbenv/versions/ruby-trunk/lib/ruby/2.3.0/psych/class_loader.rb:39:in `block (2 levels) in <class:ClassLoader>'
    from /Users/aaron/.rbenv/versions/ruby-trunk/lib/ruby/2.3.0/psych/scalar_scanner.rb:70:in `tokenize'
    from /Users/aaron/.rbenv/versions/ruby-trunk/lib/ruby/2.3.0/psych/visitors/to_ruby.rb:60:in `deserialize'
    from /Users/aaron/.rbenv/versions/ruby-trunk/lib/ruby/2.3.0/psych/visitors/to_ruby.rb:123:in `visit_Psych_Nodes_Scalar'
    from /Users/aaron/.rbenv/versions/ruby-trunk/lib/ruby/2.3.0/psych/visitors/visitor.rb:16:in `visit'
    from /Users/aaron/.rbenv/versions/ruby-trunk/lib/ruby/2.3.0/psych/visitors/visitor.rb:6:in `accept'
    from /Users/aaron/.rbenv/versions/ruby-trunk/lib/ruby/2.3.0/psych/visitors/to_ruby.rb:32:in `accept'
    from /Users/aaron/.rbenv/versions/ruby-trunk/lib/ruby/2.3.0/psych/visitors/to_ruby.rb:311:in `visit_Psych_Nodes_Document'
    from /Users/aaron/.rbenv/versions/ruby-trunk/lib/ruby/2.3.0/psych/visitors/visitor.rb:16:in `visit'
    from /Users/aaron/.rbenv/versions/ruby-trunk/lib/ruby/2.3.0/psych/visitors/visitor.rb:6:in `accept'
    from /Users/aaron/.rbenv/versions/ruby-trunk/lib/ruby/2.3.0/psych/visitors/to_ruby.rb:32:in `accept'
    from /Users/aaron/.rbenv/versions/ruby-trunk/lib/ruby/2.3.0/psych.rb:302:in `safe_load'
    from (irb):3
    from /Users/aaron/.rbenv/versions/ruby-trunk/bin/irb:11:in `<main>'
irb(main):004:0>

A workaround is to simply allow the Date class. Another option would be use modified tokeniser for safe_load so it only recognises allowed classes and turns the rest into strings.

Both of these are major changes. I wouldn't be willing to do that in a bugfix release. I think returning strings would be fine, but I'm definitely not in favor of allowing Dates by default.

@Fjan
Copy link
Author

Fjan commented Dec 23, 2015

I mean this:

Psych.safe_load('2013-01-01'.to_yaml)  # => "2013-01-01" 
Psych.safe_load('2013-02-31'.to_yaml)  # Psych::DisallowedClass

So some date parsing is apparently already happening somewhere, I just haven't been able to find it in the code, perhaps we can fix it there. I agree that allowing Date as an extra class in a bug fix release would not be appropriate, but this is a good workaround for people who have a problem and want a quick fix:

Psych.safe_load('2013-02-31'.to_yaml,[Date])   # => "2013-02-31"

@tenderlove
Copy link
Member

When you do '2013-01-01'.to_yaml, Psych looks at the string and realizes that it is ambiguous, so the resulting YAML is quoted:

irb(main):002:0> require 'psych'
=> true
irb(main):003:0> '2013-01-01'.to_yaml
=> "--- '2013-01-01'\n"
irb(main):004:0>

Note the single quotes around the date string in the resulting YAML.

Since '2013-02-31' isn't a valid date, it considers this to not be an ambiguous value, so it doesn't add the quotes around it:

irb(main):004:0> '2013-02-31'.to_yaml
=> "--- 2013-02-31\n...\n"
irb(main):005:0>

The single quotes tell the parser "this is absolutely a string, do not check for other values". The second value isn't ambiguous when dumping the YAML, but is ambiguous when parsing. Maybe that helps?

@Fjan
Copy link
Author

Fjan commented Dec 23, 2015

Yes, I just realised that. So another avenue would be to fix .to_yaml to always quote strings that look like a date, that's probably much easier to do.

Perhaps we should have a .to_safe_yaml that can be used on user supplied input and can be relied upon to only create scalars. Sigh. But it's probably better to move to JSON (which I was hoping to avoid).

@lucascaton
Copy link

Any news on this?

@akostadinov
Copy link

[2] pry(main)> str = "tt: 2012-09-24T13:00:00"
=> "tt: 2012-09-24T13:00:00"
[7] pry(main)> YAML.safe_load(str)
Psych::DisallowedClass: Tried to load unspecified class: Time
from /usr/share/ruby/psych/class_loader.rb:97:in `find'

What is the problem here?

What I'm really after is stop Psych from parsing the date and then dumping back to YAML in a different format. I want to read the YAML, change some values and dump back to YAML. The problem is that once the date is parsed, then dumping back to YAML results in

---
tt: 2012-09-24 16:00:00.000000000 +03:00

This is why I tried whether safe_load will help. It is same problem as described here http://stackoverflow.com/questions/32730275

@najamelan
Copy link

I just got hit by this after the hashie library updated and they now call safe_load. I would really like to request that this method just returns strings for anything that is deemed unsafe to be parsed.

The difference is that on the end of the chain I can work with sanitized data. If I really want a Date object I can do the conversion myself. Or I can use it as a string, but if reading in a user supplied configuration file leads to a backtrace, than that's it. Game over. Epic fail. Given that a library like hashi is calling this means I have no control over the call to load vs safe_load, and even if I did, I would prefer to be safe and decide how I validate date strings before converting them, but I don't want it to become impossible to read in a perfectly valid yaml file.

My program doesn't need to have values automagically converted, but asking every user of my library for the eternity to never ever put an unquoted date in a yaml file is impossible.

@Soleone
Copy link

Soleone commented May 30, 2019

We are running into this issue as well. A user provided String that is attempted to be serialized and put into the database is 0000-00-00:

require 'psych'

original_valid_content = "---\n- >-\n  test\n- >-\n  1976-01-02"
original_invalid_content = "---\n- >-\n  test\n- >-\n  0000-00-00"

# Works
deserialized_content = Psych.safe_load(original_valid_content)
new_serialized_content = Psych.dump(deserialized_content)
# => "---\n- test\n- '1976-01-02'\n"
new_deserialized_content = Psych.safe_load(new_serialized_content)

# Breaks:
deserialized_content = Psych.safe_load(original_invalid_content)
new_serialized_content = Psych.dump(deserialized_content)
# => "---\n- test\n- 0000-00-00\n"
new_deserialized_content = Psych.safe_load(new_serialized_content)
# => Psych::DisallowedClass: Tried to load unspecified class: Date

@Soleone
Copy link

Soleone commented May 30, 2019

Looking at tenderlove's comment here maybe it might make most sense to be more aggressive with quoting on dump?

Ensure that a string that is not a valid date like 0000-00-00 gets quoted and treat it as "ambiguous". What would be a drawback of that type of solution?

@tenderlove
Copy link
Member

Ensure that a string that is not a valid date like 0000-00-00 gets quoted and treat it as "ambiguous". What would be a drawback of that type of solution?

Honestly, I can't think of any drawback. I'd merge a commit that does that.

On a different subject, maybe in the future we should add a safe_dump. I want to guarantee that objects can round trip through dump and load, but I'm not sure if we can make the same guarantee about dump -> safe_load. This case seems like we can though.

@stiller-leser
Copy link

Is there any way around the issue? A date notation like 20190101 works fine, however I do need 2019-01-01...

@danielrehner
Copy link

If you want to make sure a YAML value is always treated like a string, you can cast it explicitly by prepending !!str .
Example:

pry(main)> YAML.safe_load("!!str 2019-01-01")
=> "2019-01-01"

@dometto
Copy link

dometto commented Jun 11, 2020

Can anyone explain what is the problem with @najamelan's suggestion?

I would really like to request that this method just returns strings for anything that is deemed unsafe to be parsed.

Users can put quotes around dates themselves, but in many cases that may prevent interoperability between Ruby/Psych-based apps (in my case gollum) and other applications that parse YAML, when the latter do support the parsing of dates. If Psych just returned a string for anything that looks like a date, the user wouldn't need to worry about how to specify their data.

@MatzFan
Copy link

MatzFan commented Feb 5, 2021

ping

@ronaldtse
Copy link

ronaldtse commented Jan 13, 2022

Users can put quotes around dates themselves, but in many cases that may prevent interoperability between Ruby/Psych-based apps

Fully agree with @dometto.

The YAML spec does support "date" natively: https://yaml.org/spec/1.2.2/

Example 2.22 Timestamps

date: 2002-12-14

Example 2.23 Various Explicit Tags

not-date: !!str 2002-04-28

It is clear that quoting or escaping the date is not the correct solution (according to the spec).

Is there a plan to resolve this issue? @najamelan's suggestion of "just returns strings for anything that is deemed unsafe to be parsed." seems reasonable as it allows for the user to handle any "unsafe" cases.

Thanks!

@perlpunk
Copy link

Just a comment about the 1.2.2 spec and the timestamp example: None of the recommended schemas in YAML 1.2 provide a timestamp tag anymore, and the example 2.22 is a leftover that should probably have been removed.
I created an issue: yaml/yaml-spec#268

@ronaldtse
Copy link

Thanks @perlpunk for clarifying the spec -- is the YAML way for handling a date/time now a string?

@perlpunk
Copy link

@ronaldtse yes, in YAML 1.2 it would be loaded as a string, and the app can turn it into a date.
Or the specific YAML loader can add a custom tag resolver the same way it was working in 1.1.
In 1.1 there was no standard set of tags to be implemented, so some implement merge, date, binary, and some didn't.
In 1.2 there is a recommended Core schema that every YAML library should implement. That will hopefully increase interoperability. But it has less tags than the complete 1.1 tag repository.

nahteb pushed a commit to co-cddo/api-catalogue that referenced this issue May 5, 2022
This updates both the Dockerfile and .ruby-version to Ruby 3. The only breaking
change for us is an issue with Middleman, where dates in the frontmatter cause
the following error: `YAML Exception parsing
api-catalogue/source/Borders/index.html.md: Tried to load unspecified class:
Date`

This is an issue with an underlying library:
ruby/psych#262

It's fixed on Middleman's main branch, but there is no release for it yet and
it's not clear if there will be one. We can work around this and use strings
for dates.

---
updated-dependencies:
- dependency-name: ruby
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
nahteb pushed a commit to co-cddo/api-catalogue that referenced this issue May 5, 2022
This updates both the Dockerfile and .ruby-version to Ruby 3. The only breaking
change for us is an issue with Middleman, where dates in the frontmatter cause
the following error: `YAML Exception parsing
api-catalogue/source/Borders/index.html.md: Tried to load unspecified class:
Date`

This is an issue with an underlying library:
ruby/psych#262

It's fixed on Middleman's main branch, but there is no release for it yet and
it's not clear if there will be one. We can work around this and use strings
for dates.

---
updated-dependencies:
- dependency-name: ruby
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
@josephholsten
Copy link

Was dealing with a similar date parsing issue in js-yaml: nodeca/js-yaml#477, and I'm sorry to disagree with @ronaldtse, but the spec is definitely impossibly ambiguous about those timestamp examples and we can't simply use RFC 3339 (or even a faithfully minimal extension of it) to support those examples.

To quote myself:

Just found this and I've got to say, the spec is way to ambiguous on its definition of timestamps. YAML 1.2 §10.1. Failsafe Schema doesn't actually provide any guidance about what to do with Example 2.22 Timestamps even though it's implied that it shouldn't need an explicit tag. As an editor of other specs, I'd call this a spec bug, not just an implementation one. So congrats, you're both right!

I'm inclined to say we should defer to RFC 3339 Date and Time on the Internet: Timestamps §5.6 Internet Date/Time Format, which covers the canonical, iso8601 and date examples, but I really don't know what to do about spaced: 2001-12-14 21:59:43.10 -5. If it were me, I'd say allowing space would go from RFC 3339's:

   time-numoffset  = ("+" / "-") time-hour ":" time-minute
   time-offset     = "Z" / time-numoffset

   partial-time    = time-hour ":" time-minute ":" time-second
                     [time-secfrac]
   full-date       = date-fullyear "-" date-month "-" date-mday
   full-time       = partial-time time-offset

   date-time       = full-date "T" full-time

to instead have:

   full-time       = partial-time [" "] time-offset
   date-time       = full-date ("T" / "t" / " ") full-time

But then we don't either allow 2019-03-13 20:18:42 +0000 (no colon) or 2001-12-14 21:59:43.10 -5 (not even remotely close to a time-offset).

I think that a well specified standard would require not just that the date-time match the ABNF, but that it be a valid date. If we deferred to RFC3339, it says:

    date-mday       = 2DIGIT  ; 01-28, 01-29, 01-30, 01-31 based on
                             ; month/year

Which pretty clearly implies 2016-02-31 is not an RFC3339 date.

Personally, I think the horses have left the barn and the spec should suggest a reasonable middle ground. Most YAML parsers have date parsing, it's acting differently everywhere, both inbound and outbound. We create standards to avoid this (well, in theory ;-).

@anke1460
Copy link

anke1460 commented Dec 6, 2022

Psych::ClassLoader::ALLOWED_PSYCH_CLASSES = [ Time ]

module Psych
  class ClassLoader
    ALLOWED_PSYCH_CLASSES = [] unless defined? ALLOWED_PSYCH_CLASSES
    class Restricted < ClassLoader
      def initialize classes, symbols
        @classes = classes + Psych::ClassLoader::ALLOWED_PSYCH_CLASSES.map(&:to_s)
        @symbols = symbols
        super()
      end
    end
  end
end 

It works for me http://sundivenetworks.com/archive/2021/tried-to-load-unspecified-class-time-psych-disallowedclass.html

reggieb pushed a commit to co-cddo/api-catalogue that referenced this issue Apr 11, 2023
This updates both the Dockerfile and .ruby-version to Ruby 3. The only breaking
change for us is an issue with Middleman, where dates in the frontmatter cause
the following error: `YAML Exception parsing
api-catalogue/source/Borders/index.html.md: Tried to load unspecified class:
Date`

This is an issue with an underlying library:
ruby/psych#262

It's fixed on Middleman's main branch, but there is no release for it yet and
it's not clear if there will be one. We can work around this and use strings
for dates.

---
updated-dependencies:
- dependency-name: ruby
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

No branches or pull requests