Fix for readCSV mismatch between Browser and Node #443

dcrescim · 2022-04-19T06:36:07Z

There is a mismatch between the readCSV calls for the browser and Node builds. I have a CSV located at https://scikitjs.org/data/boston.csv, which has 506 rows (it's the well-known Boston housing dataset). When I read it with danfojs-node using the following code

let dfd = require("danfojs-node")
dfd.readCSV("https://scikitjs.org/data/boston.csv").then((df) => console.log(df.shape))

it correctly reports 506 rows and 14 columns. But when I do the same thing with the browser build, using the following example index.html file,

<!DOCTYPE html>
<html>
  <body>
    Open console to see result.
    <div>We print the shape of the Boston csv located <a href="https://scikitjs.org/data/boston.csv">here</a></div>
    <br />
    <div>
      This CSV has 506 rows and 14 columns, but the web version thinks that it has 507 rows because it thinks there's an empty line
      at the end of the file and adds that to the dataframe. The node version doesn't do that. An easy solution is to just pass <pre>{skipEmptyLines: "greedy"}</pre> 
      to Papaparse and it will filter this erroneous row out.
    </div>    
    <script src="https://cdn.jsdelivr.net/npm/danfojs@1.1.0/lib/bundle.js"></script>
    <script>
      dfd.readCSV("https://scikitjs.org/data/boston.csv").then((df) => {
        console.log(`The shape of the fetched boston dataset is [${df.shape}] but it should be [506,14]`)
      })
    </script>
  </body>
</html>

then it says there are 507 rows. After some digging, it looks like the browser dfd.readCSV adds an extra "empty" row at the end of the dataframe. An easy fix is to add one more default to the Papaparse config options, skipEmptyLines: "greedy". The docs for that option are at https://www.papaparse.com/docs, and I think it would be a good addition to the default options that we pass to Papaparse.
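To illustrate why a trailing newline produces the extra row, here is a minimal sketch in plain JavaScript. The countRows helper is hypothetical and is not how Papaparse or danfojs implement parsing; it only mimics the documented behaviour of the skipEmptyLines option (true skips completely empty lines, "greedy" also skips lines containing only whitespace).

```javascript
// Hypothetical helper for illustration only; not Papaparse's implementation.
// Splits CSV text into lines and optionally drops empty ones, mimicking the
// documented meaning of the skipEmptyLines option.
function countRows(csvText, { skipEmptyLines = false } = {}) {
  let lines = csvText.split(/\r?\n/);
  if (skipEmptyLines === "greedy") {
    // Drop lines that are empty or contain only whitespace.
    lines = lines.filter((line) => line.trim() !== "");
  } else if (skipEmptyLines === true) {
    // Drop only lines that are exactly the empty string.
    lines = lines.filter((line) => line !== "");
  }
  return lines.length;
}

// Two data rows followed by a trailing newline, as many CSV files end.
const sample = "1,2\n3,4\n";

console.log(countRows(sample)); // 3 (the trailing newline yields an empty row)
console.log(countRows(sample, { skipEmptyLines: "greedy" })); // 2
```

With the real library, the equivalent change is passing skipEmptyLines: "greedy" in the Papaparse config, which is what this PR adds to the defaults.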

dcrescim added 2 commits April 18, 2022 23:26

feat: added skipEmptyLines: greedy to the papaparse default options to deal with irregularities between the node and web csv read

f3f59e5

feat: added any because I couldn't get the typescript compiler to understand that skipEmptyLines is indeed an option

f6afd8d

risenW self-requested a review April 20, 2022 20:04

risenW approved these changes Apr 20, 2022

risenW merged commit 39c5103 into javascriptdata:dev Apr 20, 2022
