Use scratch-flash CSV import instead of PapaParse #4582

towerofnix · 2019-02-23T03:35:02Z

Resolves

Resolves #4329.

Proposed Changes

Ports scratch-flash's CSV import code to scratch-gui.

Reason for Changes

Compatibility with Scratch 2.0 projects which export list data or expect to import data in a particular format.

Also, the code is Cleaner.™

Test Coverage

Not tested yet. While I've tested this with some basic file importing, I would greatly prefer to implement some unit tests for this, since the code is designed to be testable and this is the sort of utility function where confirming a particular input gives a particular output (i.e, tests, obviously) would be particularly useful.

paulkaplan

So part of the reason to use a module for CSV parsing is that it can get pretty complicated, with lots of edge cases. If we are going to do CSV parsing on our own, we need to add a lot of unit tests. Could you add unit tests covering some different CSV situations? Including the ones that were the reason for changing this in the first place? You can find examples of how to do it in the test/unit directory

towerofnix · 2019-03-07T21:21:08Z

@paulkaplan I've added unit tests that confirm that confirm all the behavior of the code in this PR. A couple of the tests, particularly "does not do anything with quotation marks", might seem sort of controversial (since it means we aren't following CSV specification), but they're there so that 3.0 behavior matches 2.0, so we can be sure that no related 2.0 projects are broken in 3.0. Happy to add more tests if there are other edge cases you'd like to be certain about.

paulkaplan · 2019-03-12T18:29:47Z

@towerofnix thanks for the patience on this. I brought this issue up with the whole team, and I think we should maintain the use of an official CSV parsing library, and make sure lists exported from scratch can be re-imported into scratch correctly. Here is the reasoning:

If people are getting data from spreadsheet apps like google sheets/excel, or other data sources that call themselves CSVs, they will use the real CSV encoding. If we have our own parser that doesn't implement the CSV spec, we will break all the other data sources out there.
We are really concerned about maintainability. We already have experience trying to maintain our own parsers for ill-defined specs (I'm looking at you, SVG files 👀) and it can be a big problem to maintain that over time. It is better to use whatever the most accepted standard is, which in this case is the well-defined CSV spec.
There is definitely an issue with the fact that you cannot export a list containing a quote and re-import that list into scratch. This is something you could actually help us with, could we use the CSV parser to encode a CSV file correctly when exporting (link to how Papaparse does this).

In summary, instead of changing this direction, lets export CSV files correctly. How does that sound?

towerofnix · 2019-03-12T19:11:04Z

@paulkaplan No worries, and thanks for the discussion and detailed response. As a whole, I agree - just some aside thoughts here.

I can only think of one situation where using the old CSV parser (i.e. the code in this PR) would be better, and that's for importing data exported from a program that targets 2.0's CSV parser, e.g. a tool a user wrote for their 2.0 project or Scratch 2.0 itself. (The same reasoning goes the other way around, e.g. exporting data which a user-created external tool expects to be in the non-standard / simple pseudo-CSV format.) But I think either of these cases is very rare, and we gain much more from exporting proper CSV - i.e, the ability to consistently import with papaparse and compatibility with external tools that expect standards-compliant CSV (which there are a lot of).

I actually didn't consider that exporting data properly would solve the problem just as well, and better, than changing the way data is imported, setting aside the case I described above. Definitely agreed that it'll serve a lot more use to keep papaparse and change exporting.

It looks like we could use Papa.unparse to do the conversion here; then we wouldn't have to depend on "guessing" the CSV spec for exporting either.

paulkaplan · 2019-03-13T14:32:21Z

@towerofnix thanks again for putting in the thinking on this issue. I see what you mean now when you said "compatibility issue", it is true that if we change the importing we will break the import/export cycle if lists were exported previously (I suspect this is the origin of the original issue, since the person is trying to import a file that looks like it was generated by scratch). This is definitely pointing out all the more reason to use standard file types and be as strict/universal as possible about generating them.

One other possible solution is that we could use the csv parser only for files called .csv, and use a simple line breaker for .txt files. We actually do not pretend to export csv files, we specifically call them .txt files (probably for exactly this reason, they are not csv files!) This would kind of solve all the problems. @kchadha what do you think of that?

towerofnix · 2019-03-13T16:54:25Z

One other possible solution is that we could use the csv parser only for files called .csv, and use a simple line breaker for .txt files.

In this regard, should we just use the code that this PR has, when importing .txt files? After all it's already implemented and unit-tested to maintain compatibility with 2.0 projects. It's a sizable chunk of code for a relatively obscure feature, but since it's all already written (and objectively the most compliant way to go, being the same code 2.0 uses), I'd think we may as well use it.

t-karcher · 2019-03-27T22:20:09Z

Will this change also fix #4694 ?

towerofnix · 2019-03-28T02:38:35Z

@t-karcher actually while working on this a few days ago (my code's not pushed yet) I was wondering if we should implement that. I agree that it would be useful and it wouldn't be too hard (most code is already present on my local repo).

@paulkaplan Thoughts on making entering zero (and(?) any invalid/out-of-range value) make it import the whole CSV file as though it were a plain text file, importing every line unmodified? That was the behavior in Scratch 2.0. (Also already implemented here for txt import, and on my local repo this is easy to add to CSV import too.)

kchadha · 2019-03-29T14:51:43Z

One other possible solution is that we could use the csv parser only for files called .csv, and use a simple line breaker for .txt files. We actually do not pretend to export csv files, we specifically call them .txt files (probably for exactly this reason, they are not csv files!) This would kind of solve all the problems. @kchadha what do you think of that?

Sorry for the late response! @paulkaplan, I think I'm okay with that solution. I think we generally try to avoid handling how we parse things using only the file extension, but I think in this case, 1) there isn't a better way to differentiate the two formats, and 2) using "the wrong" parser because the file extension is inaccurate shouldn't cause any huge problems.

towerofnix · 2019-03-29T17:10:57Z

@kchadha @paulkaplan I've pushed changes to make non-txt files import with Papaparse instead. (This also implements @t-karcher's suggestion of making a zero/invalid column index return the full original file, as 2.0 behaved, even with non-txt files.) The individual commit diff probably isn't too useful - you might be better off reviewing the full PR changes.

Exporting files to always save as proper CSV (with Papa.unparse) hasn't been done yet - that's still todo!

…-compat

towerofnix · 2019-03-29T17:38:33Z

Alright, pushed Papa.unparse.

thisandagain · 2019-04-17T18:38:19Z

@paulkaplan @kchadha Bump

picklesrus · 2021-01-11T20:52:14Z

We took a look at this today and decided the problem we want to solve is that Scratch is not being consistent with itself in importing and exporting of lists. We think solving that problem would probably take a different approach that starts on the export side I'm closing this in favor of solving #6565 instead.

Use scratch-flash CSV import instead of PapaParse

a9ce852

thisandagain assigned paulkaplan Feb 25, 2019

thisandagain added the pr - needs review label Feb 25, 2019

thisandagain added this to the March 2019 milestone Feb 25, 2019

paulkaplan suggested changes Mar 7, 2019

View reviewed changes

paulkaplan added pr - changes requested and removed pr - needs review labels Mar 7, 2019

Compatibility CSV import unit tests

4db4bb4

Use Papaparse to parse non-txt files

03292bb

towerofnix added 2 commits March 29, 2019 14:32

Use Papa.unparse to export CSV files

349b164

Merge branch 'develop' of https://github.com/LLK/scratch-gui into csv…

ce40563

…-compat

thisandagain added pr - needs review and removed pr - changes requested labels Apr 17, 2019

thisandagain modified the milestones: March 2019, April 2019 Apr 17, 2019

towerofnix mentioned this pull request Apr 27, 2019

Single character list export and import bug scratchfoundation/scratch-vm#2155

Closed

chrisgarrity modified the milestones: April 2019, Schedule Review Jun 8, 2020

chrisgarrity unassigned paulkaplan Jun 8, 2020

chrisgarrity removed this from the Schedule Review milestone Jun 25, 2020

chrisgarrity added priority 4 and removed pr - needs review labels Jul 24, 2020

chrisgarrity added this to the September 2020 milestone Jul 24, 2020

chrisgarrity assigned paulkaplan Jul 24, 2020

chrisgarrity unassigned paulkaplan Nov 6, 2020

picklesrus mentioned this pull request Jan 11, 2021

Exporting a list and re-importing it doesn't always result in the same list #6565

Open

picklesrus closed this Jan 11, 2021

Use scratch-flash CSV import instead of PapaParse #4582

Use scratch-flash CSV import instead of PapaParse #4582

Uh oh!

Conversation

towerofnix commented Feb 23, 2019

Resolves

Proposed Changes

Reason for Changes

Test Coverage

Uh oh!

paulkaplan left a comment

Choose a reason for hiding this comment

Uh oh!

towerofnix commented Mar 7, 2019

Uh oh!

paulkaplan commented Mar 12, 2019

Uh oh!

towerofnix commented Mar 12, 2019

Uh oh!

paulkaplan commented Mar 13, 2019

Uh oh!

towerofnix commented Mar 13, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

t-karcher commented Mar 27, 2019

Uh oh!

towerofnix commented Mar 28, 2019

Uh oh!

kchadha commented Mar 29, 2019

Uh oh!

towerofnix commented Mar 29, 2019

Uh oh!

towerofnix commented Mar 29, 2019

Uh oh!

thisandagain commented Apr 17, 2019

Uh oh!

picklesrus commented Jan 11, 2021

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

7 participants

towerofnix commented Mar 13, 2019 •

edited

Loading