trim config option? #241

Closed
amit777 opened this issue Jul 26, 2015 · 17 comments

Comments

@amit777

amit777 commented Jul 26, 2015

Hi, is there a simple way to trim whitespace from the ends of the fieldname as well as data values? This library has everything that I want so I'm guessing I'm just missing something simple.

thanks!

@mholt
Owner

mholt commented Jul 26, 2015

No, just do it yourself. :) str.trim()

@mholt mholt closed this as completed Jul 26, 2015
@bthorben

Doing it yourself is actually pretty awkward. You have a neat option to output the data as dicts with field: value. If the field name always contains a leading space (e.g. " password" or " username"), that is not so easy to correct.

I am writing this because I was about to use Papa to parse files that look like:

username, email, password
test-1, 1@example.com, Password1
test-2, 2@example.com, Password1
test-3, 3@example.com, Password1
test-4, 4@example.com, Password1
test-5, 5@example.com, Password1
test-6, 6@example.com, Password1
test-7, 7@example.com, Password1
test-8, 8@example.com, Password1
test-9, 9@example.com, Password1

@mholt
Owner

mholt commented Sep 21, 2015

Why is it not so easy to correct? Instead of results.data[i].password you do results.data[i].password.trim()

But you have to assume that the password doesn't have spaces on the edge. Could be a dangerous assumption. That's why I leave it up to the user to do. I'm not gonna go there.
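
One way to address the concern about values whose edge spaces are meaningful is to trim selectively. The following is only a sketch under assumptions: `makeSelectiveTrimmer` and its `preserve` list are hypothetical names, not part of Papa Parse. It builds a function with the `(value, field)` shape that the `transform` config option accepts, and leaves the listed columns untouched.

```javascript
// Hypothetical helper: builds a value transformer that trims every column
// except the ones listed in `preserve`.
function makeSelectiveTrimmer(preserve) {
  const keep = new Set(preserve);
  return function (value, field) {
    // `field` is the header name (or column index). Normalise it so that
    // " password" and "password" are treated as the same column.
    const name = String(field).trim();
    if (keep.has(name) || typeof value !== 'string') {
      return value; // leave protected columns and non-strings as they are
    }
    return value.trim();
  };
}

const trimExceptPassword = makeSelectiveTrimmer(['password']);
console.log(JSON.stringify(trimExceptPassword(' 1@example.com', ' email')));
console.log(JSON.stringify(trimExceptPassword(' Password1 ', ' password')));
```

With Papa Parse, a function like this could presumably be passed as `transform: trimExceptPassword` together with `header: true`, so the decision about which columns are safe to trim stays with the caller.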

@bluej100
Contributor

I think he's saying he would have to do results.data[i][" password"], which is a little gross.
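
To illustrate the problem with keys such as " password", here is a minimal sketch that normalises row objects after parsing. `trimRow` is a hypothetical helper, and the sample row is an assumption about what `header: true` would produce for the file shown earlier in the thread.

```javascript
// Hypothetical helper: returns a copy of a parsed row with whitespace trimmed
// from every key and from every string value.
function trimRow(row) {
  const cleaned = {};
  for (const [key, value] of Object.entries(row)) {
    cleaned[key.trim()] = typeof value === 'string' ? value.trim() : value;
  }
  return cleaned;
}

// A row shaped like the result of parsing "username, email, password"
// with header: true (assumed shape, for illustration only).
const parsedRows = [
  { username: 'test-1', ' email': ' 1@example.com', ' password': ' Password1' },
];

const cleanedRows = parsedRows.map(trimRow);
console.log(Object.keys(cleanedRows[0])); // keys no longer carry a leading space
console.log(cleanedRows[0].password);
```

Note that this trims every value, including the password, so it only fits data where edge whitespace is never significant.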

@mholt
Owner

mholt commented Sep 21, 2015

Oh, I see.

Unfortunately, the CSV spec specifically says: "Spaces are considered part of a field and should not be ignored." - if your CSV files are created with spaces after the commas, then the spaces are errors in the input and the generator needs to be fixed.

@bthorben

Well, true, the spec says that, but you get all kinds of malformed CSV files all the time.

@KamalAman

While I think there should be an option in PapaParse to enable trimming on the input, this problem is not too difficult to solve on your own:

Just pre-process your data with the following regex:

"a ,b, c cc , d dd".replace(/\s*,\s*/g, ',')
//a,b,c cc,d dd
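
Two side effects of this regex are worth checking before relying on it. The sketch below wraps it in a function (`stripAroundCommas` is a name chosen here for illustration) and also shows a narrower variant that is an assumption of this example, not something proposed in the thread.

```javascript
// The regex from the comment above, wrapped in a function for testing.
const stripAroundCommas = (text) => text.replace(/\s*,\s*/g, ',');

// 1. Whitespace around a comma inside a quoted field is removed as well,
//    so the field's content changes.
console.log(stripAroundCommas('a,"x , y",b'));

// 2. \s also matches newlines, so a row that ends in an empty field is
//    merged with the row that follows it.
console.log(JSON.stringify(stripAroundCommas('1,2,\n3,4,')));

// Narrower variant (an assumption of this sketch): only spaces and tabs are
// removed. This avoids the second problem but not the first.
const stripSpacesAroundCommas = (text) => text.replace(/[ \t]*,[ \t]*/g, ',');
console.log(JSON.stringify(stripSpacesAroundCommas('1, 2,\n3, 4,')));
```

Neither variant is aware of quoting, so both are only safe for input that has no commas inside quoted fields.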

@aendra-rininsland

I honestly think this should be reconsidered...

a. CSV files coming out of Excel quite often have superfluous spaces everywhere. Yes, that's valid for the format, but these are generally unintentional and break things further downstream.

b. The "preprocess with regex" approach suggested by @KamalAman modifies the input data, which is bad because it makes troubleshooting downstream errors more difficult.

c. Having to trim() every string coming out of PapaParse can require a lot of defensive programming.

I'm currently trying to use the step callback to do this, but all of my rows are now coming back as null for reasons I can't quite figure out...

@lrossy

lrossy commented Sep 15, 2016

I used this guy's trimObj() to solve this issue in my completeFn. Worked perfectly.

https://stackoverflow.com/questions/33510625/trim-white-spaces-in-both-object-key-and-value-recursively/33511005#33511005

@rsand27

rsand27 commented Sep 15, 2017

I'm using Papa Parse (well Baby Parse for Node) to read local files from an upload folder. I had an issue with a space in front of a field that threw my app off. I get the data in Node using:

file = await BabyParse.parseFiles(`${ appDir }/${ req.file.path }`, {
  header: true,
  skipEmptyLines: true
});

To trim the white space and delete empty fields from each row object, I use this:

// Clean up the data
file.data.forEach(row => {
  for (let prop in row) {
    // Trim spaces from front and back 
    row[prop] = row[prop].trim();
    // Delete any empty fields
    if (row[prop] === '') {
      delete row[prop];
    }
  }
});

This returns the desired results for me before processing the data and saving it to MongoDB.

@pokoli
Collaborator

pokoli commented Sep 18, 2017

Hi @rsand27, the latest PapaParse version can also be run on Node, so I would recommend using PapaParse instead of BabyParse on Node.

If this does not work, please open a new issue.

@larryboymi

What if you want to trim the parsed header? I just had a prepended \uFEFF sneak through in a header name that I could trim out with access to the header parsing function.

@ttfreeman

@amit777 if you set {dynamicTyping: true}, you shouldn't need to trim() whitespace. Papa Parse will do it for you.

@mtmacdonald

In case it helps anyone else, the 4.x version does have a trimHeaders option (undocumented), and the 5.x version has transformHeader.
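
As a rough sketch of how transformHeader could cover both the leading-space and the \uFEFF cases raised earlier: `cleanHeader` below is a hypothetical function name, and the Papa Parse call is shown only as a comment so the block runs on its own.

```javascript
// Hypothetical header cleaner: removes a leading byte order mark (U+FEFF)
// and any surrounding whitespace from a header name.
function cleanHeader(header) {
  return header.replace(/^\uFEFF/, '').trim();
}

console.log(JSON.stringify(cleanHeader('\uFEFFusername')));
console.log(JSON.stringify(cleanHeader(' password')));

// With Papa Parse 5.x this could be wired up roughly as:
//   Papa.parse(csvText, { header: true, transformHeader: cleanHeader });
```

This only affects header names. Data values would still need their own handling.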

@ataft

ataft commented Jan 29, 2021

Has anybody been able to get this to work for the data values, not just the header? When I have spaces after the commas, the values have a space and double quotes. I'm using version 5.3.0.

For example, even with dynamicTyping, this CSV:
"Country","Alpha-2 code","Alpha-3 code","Numeric code","Latitude (average)","Longitude (average)"

"Australia", "AU", "AUS", "36", "-27", "133"

Gives me these values:

Alpha-2 code: " "AF""
Alpha-3 code: " "AFG""
Country: "Afghanistan"
Latitude (average): " "33""
Longitude (average): " "65""
Numeric code: " "4""
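
These values look consistent with a quote only being treated as a field delimiter when it is the first character after the comma. With a space in front, the quotes end up as part of the data. One possible workaround, offered as a sketch only, is to remove the spaces between a comma and an opening quote before parsing. `tightenQuotedFields` is a hypothetical helper, not a Papa Parse feature.

```javascript
// Hypothetical pre-processing step: drop spaces and tabs that sit between
// a comma and the double quote that follows it.
function tightenQuotedFields(text) {
  return text.replace(/,[ \t]+"/g, ',"');
}

const sampleLine = '"Australia", "AU", "AUS", "36", "-27", "133"';
console.log(tightenQuotedFields(sampleLine));
```

The regex is not quote-aware, so it would also rewrite a comma followed by a space and a quote inside a quoted field. It is only suitable when the data is known not to contain that sequence.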

@BilalIftikhar

BilalIftikhar commented Sep 14, 2022

Here is the code:

beforeFirstChunk: function(chunk) {
  var rows = chunk.trim().replace(/\s*,\s*/g, ',');
  return rows;
},

@mislavmiocevic

mislavmiocevic commented Dec 5, 2022

I do not know if this is relevant anymore, but there is a 'transform' function that you can use in the config, which is executed on every value.
https://www.papaparse.com/docs#config - transform

import { parse } from 'papaparse';

const { data } = parse('A, B\n1, 2', {
    transform: (value) => value.trim()
});

console.log(data);

P.S. I do not know how it will handle larger datasets or whether it will be slow.
