Fix for readCSV mismatch between Browser and Node #443

Merged
merged 2 commits into javascriptdata:dev on Apr 20, 2022

dcrescim
Contributor

There is a mismatch between the readCSV results in the browser and in Node. I have a CSV located at https://scikitjs.org/data/boston.csv, which has 506 rows (it's the well-known Boston housing dataset). When I read it with danfojs-node using the following code

let dfd = require("danfojs-node")
dfd.readCSV("https://scikitjs.org/data/boston.csv").then((df) => console.log(df.shape))

it correctly reports 506 rows and 14 columns. But when I do the same thing with the browser build, using the following example index.html file,

<!DOCTYPE html>
<html>
  <body>
    Open console to see result.
    <div>We print the shape of the Boston csv located <a href="https://scikitjs.org/data/boston.csv">here</a></div>
    <br />
    <div>
      This CSV has 506 rows and 14 columns, but the web version thinks that it has 507 rows because it thinks there's an empty line
      at the end of the file and adds that to the dataframe. The node version doesn't do that. An easy solution is to just pass <pre>{skipEmptyLines: "greedy"}</pre> 
      to Papaparse and it will filter this erroneous row out.
    </div>    
    <script src="https://cdn.jsdelivr.net/npm/danfojs@1.1.0/lib/bundle.js"></script>
    <script>
      dfd.readCSV("https://scikitjs.org/data/boston.csv").then((df) => {
        console.log(`The shape of the fetched boston dataset is [${df.shape}] but it should be [506,14]`)
      })
    </script>
  </body>
</html>

then it reports 507 rows. After some digging, it looks like the browser dfd.readCSV adds an extra "empty" row at the end of the dataframe. An easy fix is to add one more default to the Papaparse config options: skipEmptyLines: "greedy". The docs for that option are at https://www.papaparse.com/docs, and I think it would be a good addition to the default options that we pass to Papaparse.
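To illustrate the off-by-one, here is a minimal sketch in plain JavaScript. It is not Papaparse's or danfojs's actual implementation; `parseRows` is a hypothetical helper, and the sketch assumes the CSV ends with a trailing newline. It only mimics the documented behaviour of the option: with `true`, completely empty lines are skipped, and with `"greedy"`, lines containing only whitespace are skipped as well.

```javascript
// Hypothetical helper, for illustration only: a naive CSV row splitter.
// skipEmptyLines: false     -> keep every line
//                 true      -> drop zero-length lines
//                 "greedy"  -> also drop whitespace-only lines
function parseRows(text, skipEmptyLines = false) {
  const lines = text.split(/\r?\n/);
  const kept = lines.filter((line) => {
    if (skipEmptyLines === "greedy") return line.trim() !== "";
    if (skipEmptyLines === true) return line !== "";
    return true;
  });
  // Split each remaining line into cells (no quoting support in this sketch).
  return kept.map((line) => line.split(","));
}

// A tiny CSV with a header, two data rows, and a trailing newline.
const csv = "a,b\n1,2\n3,4\n";

const naive = parseRows(csv);            // trailing "" becomes an extra row
const greedy = parseRows(csv, "greedy"); // the extra row is filtered out

console.log(naive.length, greedy.length);
```

The naive split returns one more row than the file really has, which matches the 507 vs 506 difference described above if the trailing newline is indeed the cause.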

…o deal with irregularities between the node and web csv read
…erstand that skipEmptyLines is indeed an option
@risenW risenW self-requested a review April 20, 2022 20:04
@risenW risenW merged commit 39c5103 into javascriptdata:dev Apr 20, 2022