-
Notifications
You must be signed in to change notification settings - Fork 4k
Use scratch-flash CSV import instead of PapaParse #4582
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
paulkaplan
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So part of the reason to use a module for CSV parsing is that it can get pretty complicated, with lots of edge cases. If we are going to do CSV parsing on our own, we need to add a lot of unit tests. Could you add unit tests covering some different CSV situations? Including the ones that were the reason for changing this in the first place? You can find examples of how to do it in the test/unit directory
|
@paulkaplan I've added unit tests that confirm that confirm all the behavior of the code in this PR. A couple of the tests, particularly "does not do anything with quotation marks", might seem sort of controversial (since it means we aren't following CSV specification), but they're there so that 3.0 behavior matches 2.0, so we can be sure that no related 2.0 projects are broken in 3.0. Happy to add more tests if there are other edge cases you'd like to be certain about. |
|
@towerofnix thanks for the patience on this. I brought this issue up with the whole team, and I think we should maintain the use of an official CSV parsing library, and make sure lists exported from scratch can be re-imported into scratch correctly. Here is the reasoning:
In summary, instead of changing this direction, lets export CSV files correctly. How does that sound? |
|
@paulkaplan No worries, and thanks for the discussion and detailed response. As a whole, I agree - just some aside thoughts here. I can only think of one situation where using the old CSV parser (i.e. the code in this PR) would be better, and that's for importing data exported from a program that targets 2.0's CSV parser, e.g. a tool a user wrote for their 2.0 project or Scratch 2.0 itself. (The same reasoning goes the other way around, e.g. exporting data which a user-created external tool expects to be in the non-standard / simple pseudo-CSV format.) But I think either of these cases is very rare, and we gain much more from exporting proper CSV - i.e, the ability to consistently import with papaparse and compatibility with external tools that expect standards-compliant CSV (which there are a lot of). I actually didn't consider that exporting data properly would solve the problem just as well, and better, than changing the way data is imported, setting aside the case I described above. Definitely agreed that it'll serve a lot more use to keep papaparse and change exporting. It looks like we could use |
|
@towerofnix thanks again for putting in the thinking on this issue. I see what you mean now when you said "compatibility issue", it is true that if we change the importing we will break the import/export cycle if lists were exported previously (I suspect this is the origin of the original issue, since the person is trying to import a file that looks like it was generated by scratch). This is definitely pointing out all the more reason to use standard file types and be as strict/universal as possible about generating them. One other possible solution is that we could use the csv parser only for files called |
In this regard, should we just use the code that this PR has, when importing .txt files? After all it's already implemented and unit-tested to maintain compatibility with 2.0 projects. It's a sizable chunk of code for a relatively obscure feature, but since it's all already written (and objectively the most compliant way to go, being the same code 2.0 uses), I'd think we may as well use it. |
|
Will this change also fix #4694 ? |
|
@t-karcher actually while working on this a few days ago (my code's not pushed yet) I was wondering if we should implement that. I agree that it would be useful and it wouldn't be too hard (most code is already present on my local repo). @paulkaplan Thoughts on making entering zero (and(?) any invalid/out-of-range value) make it import the whole CSV file as though it were a plain text file, importing every line unmodified? That was the behavior in Scratch 2.0. (Also already implemented here for txt import, and on my local repo this is easy to add to CSV import too.) |
Sorry for the late response! @paulkaplan, I think I'm okay with that solution. I think we generally try to avoid handling how we parse things using only the file extension, but I think in this case, 1) there isn't a better way to differentiate the two formats, and 2) using "the wrong" parser because the file extension is inaccurate shouldn't cause any huge problems. |
|
@kchadha @paulkaplan I've pushed changes to make non-txt files import with Papaparse instead. (This also implements @t-karcher's suggestion of making a zero/invalid column index return the full original file, as 2.0 behaved, even with non-txt files.) The individual commit diff probably isn't too useful - you might be better off reviewing the full PR changes. Exporting files to always save as proper CSV (with |
|
Alright, pushed Papa.unparse. |
|
@paulkaplan @kchadha Bump |
|
We took a look at this today and decided the problem we want to solve is that Scratch is not being consistent with itself in importing and exporting of lists. We think solving that problem would probably take a different approach that starts on the export side I'm closing this in favor of solving #6565 instead. |
Resolves
Resolves #4329.
Proposed Changes
Ports scratch-flash's CSV import code to scratch-gui.
Reason for Changes
Compatibility with Scratch 2.0 projects which export list data or expect to import data in a particular format.
Also, the code is Cleaner.™
Test Coverage
Not tested yet. While I've tested this with some basic file importing, I would greatly prefer to implement some unit tests for this, since the code is designed to be testable and this is the sort of utility function where confirming a particular input gives a particular output (i.e, tests, obviously) would be particularly useful.