Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Protection against CSV injection attacks #793

Closed
wcerfgba opened this issue May 11, 2020 · 15 comments
Closed

Protection against CSV injection attacks #793

wcerfgba opened this issue May 11, 2020 · 15 comments

Comments

@wcerfgba
Copy link
Contributor

When importing a CSV file, Microsoft Excel and LibreOffice Calc will both interpret cells beginning with a = as formulae, which can lead to attacks that can result in data exfiltration or arbitrary command execution. [1] This is easily remedied by prefixing cells that begin with =, +, - or @ with a ' in order to suppress automatic interpretation of formulae by these softwares. [2]

I would like to propose an option escapeFormulae for Papa.unparse to provide this prefixing behaviour.

Thanks!

[1] https://owasp.org/www-community/attacks/CSV_Injection
[2] https://www.contextis.com/en/blog/comma-separated-vulnerabilities

@pokoli
Copy link
Collaborator

pokoli commented May 12, 2020

That make sense for me.

I'm wondering if we should activate this option by default or not.

@wcerfgba
Copy link
Contributor Author

I'm wondering if we should activate this option by default or not.

I guess this is dependent on if you have users who currently rely on formulae interpretation and for whom this would break on an upgrade. My intuition is that this would be a minority of users and that for most use cases, it would be better to be safe by default, and have escapeFormulae default to true.

@pokoli
Copy link
Collaborator

pokoli commented May 14, 2020

Having a safer default sounds good to me. Then we just need to have a proper documentation to explain why the should should be desactivated and include a warning about the it's safety risks.

So we just need an implmentation for it.

@dylanlingelbach
Copy link

Having a safer default sounds good to me

I'm sorry but this doesn't make sense to me. papaparse is a CSV parser. By default modifying the contents of a CSV file that may or may not ever get processed by Excel/LibreOffice seems like very odd behavior.

This bug is reported as a high severity vulnerability which also doesn't make sense to me - by that logic every text editor has the same vulnerability

@pokoli
Copy link
Collaborator

pokoli commented May 15, 2020

@dylanlingelbach not all contents are modified but only the ones that are suceptible of having a formula, just the ones that start with = which from my understanding should not me the bast majority.

But I agree that this should not be treated as high severty (at least on Papaparse) because:

  • The file may not be openend by a SpreadSheet software, so this users are not afected.
  • There is no evidence how the attacker can gain access to the input data to include the formulas and how to let the user execute it.

P.S: My main usage of papaparse is to generate files to be opened by a SpreadSheet software, so probably I'm a little bit biaged here. I will like to have more opinion on this.

@mholt @dboskovic what do you think it should be the defalut value of this new parameter?

@asafbiton
Copy link

asafbiton commented May 15, 2020

Hi there! 👋

My name is Asaf and I'm part of the Snyk Security Team. We have been tracking this issue for a few days now, and an advisory has been mistakenly published. I tend to agree with all the above arguments and do not believe there is a vuln within the context of papaparse.

I have therefore revoked this advisory from our database. I apologize for any inconvenience caused by this.

For further inquiries please don't hesitate to contact us at report@snyk.io or using the vulnerability disclosure form.

@pokoli
Copy link
Collaborator

pokoli commented May 15, 2020

@asafbiton Thank you so much!

@dylanlingelbach
Copy link

@asafbiton thank you!

@pokoli I think if any contents of the CSV file are modified by default that is odd (CSV quoting/escaping excluded)

@wcerfgba
Copy link
Contributor Author

Sorry I have been away from the thread for a few days. Would it be fair to say the consensus is that this option is a good idea, but it should be disabled by default? I am happy to write a patch some point this week if we are aligned on this. :)

@shawn-eary
Copy link

@dylanlingelbach

I'm sorry but this doesn't make sense to me. papaparse is a CSV parser. By default modifying the contents of a CSV file that may or may not ever get processed by Excel/LibreOffice seems like very odd behavior.

I agree many people don't give a rip if a file processed by papaparse contains Macro, Command or SQL injection attacks, but some users might appreciate a PapaParse "plugin" that will alert them if the parsed file seems suspicious. If a probable attack is found, it might be useful for the PapaParse "plugin" to state the line and/or char number where the suspect text is.

@dylanlingelbach
Copy link

some users might appreciate a PapaParse "plugin" that will alert them if the parsed file seems suspicious

Agreed there! Modifying the data just shouldn't be default behavior for a CSV parser

@shawn-eary
Copy link

@dylanlingelbach

Agreed there! Modifying the data just shouldn't be default behavior for a CSV parser

How about overloading the Parse function to accept the following optional arguments?:

  1. User defined function that runs on each field and returns true if the user thinks the field contains suspicious patterns
  2. User defined function that runs a transform on each field

PapaParse could then provide standard functions that the user could chose from to pass in for Item 1 and Item 2 above. If the user passes in nothing, PapaParse would behave the say way it has for years.

NOTE: About Item 1 above, if a function is passed in for this parameter and that function returns true, PapaParse would then alert the user with an appropriate warning so she/he can begin mitigation.

@mholt
Copy link
Owner

mholt commented Jan 14, 2022

I think interpretation of the data is out of scope for a CSV parser and lends itself to more security problems, not less.

@shawn-eary
Copy link

I apologize, PapaParse already supports a transform function. I'm sorry I didn't see this in the manual... The already supported transform function not only allows you to remove risky values from CSV cells, but it also allows you to identify them if you don't mind using "global" variable style programming:

  var globalRisksFound = []; 
  function isRisky(v, cName) {
    // Grossly over conservative
    var isAlphaNumeric = v.match(/^ *[A-Za-z0-9 ]+ *$/)
    if (!isAlphaNumeric) {
      globalRisksFound.push(
        {
          col: cName, 
          val: v
        }
      );
    }
    // Don't do a transform. Just report the error
    return v;
  }

  let resultingJSON = PapaParse.parse(
    uploadedData,
    {delimiter: ",", header: true, transform: isRisky}
  );
  
  // The suspect values are now in the globalRisksFound array

I think a cool enhancement someday might be for the transform function to pass the record number via a third parameter.

@bean5
Copy link

bean5 commented Aug 10, 2022

Chiming in a bit late. It seems that an interpreter such as Excel, upon loading from a file specifically named CSV but which contains potential formula should be the one prompting about macros. Does it not already do this?

Having the ability to prevent such alerts is still a good idea. I like the fact that escapeFormulae is provided as an option and set to true by default. That maintains backwards-compatibility while providing the feature. Win-win.

Congrats to Snyk team for owning up to the false alarm (given the context), too. That being said, replacing the vuln listing with a 404 rather than a retraction is a bit confusing. Even still, they are doing good work.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

7 participants