[BUG] handle } in line delimited json #14391
joshowen changed the title from "[BUG] fix for quoted special characters" to "[BUG] fix for quoted special characters in line delimited json", Oct 10, 2016
joshowen changed the title from "[BUG] fix for quoted special characters in line delimited json" to "[BUG] handle } in line delimited json", Oct 10, 2016
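The bug is about splitting a JSON array of records into one record per line when a string value contains `}` or `"`. A minimal sketch of a quote-aware splitter, assuming the writer first produces a JSON array and then turns it into lines: only commas at nesting depth zero and outside string literals separate records. The function name `array_to_lines` is hypothetical, and this is not the code that was merged into pandas.

```python
import json


def array_to_lines(s: str) -> str:
    """Convert a JSON array of objects into line-delimited JSON.

    Splits only on top-level commas; braces, brackets and commas inside
    string literals (including escaped quotes) are ignored.
    """
    # Only an array can be turned into lines; return anything else unchanged.
    if not (s.startswith("[") and s.endswith("]")):
        return s
    body = s[1:-1]
    out = []
    depth = 0          # nesting depth of {...} and [...] outside strings
    in_string = False  # currently inside a "..." literal?
    escaped = False    # previous character was a backslash inside a string?
    for ch in body:
        if in_string:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch in "{[":
            depth += 1
        elif ch in "}]":
            depth -= 1
        elif ch == "," and depth == 0:
            out.append("\n")  # top-level separator between two records
            continue
        out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    src = '[{"a":"foo}","b":"bar"},{"a":"foo\\"","b":"bar"}]'
    for line in array_to_lines(src).splitlines():
        print(json.loads(line))
```

A naive brace counter that ignores quoting would treat the `}` in `"foo}"` as closing the record and split in the wrong place; tracking `in_string` and `escaped` is what makes the `foo}` and `foo"` values safe.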
codecov-io commented, Oct 11, 2016:
Current coverage is 85.26% (diff: 100%)

@@            master   #14391   diff  @@
==========================================
  Files            140      140
  Lines          50634    50639     +5
  Methods            0        0
  Messages           0        0
  Branches           0        0
==========================================
+ Hits           43173    43178     +5
  Misses          7461     7461
  Partials           0        0
jreback added the Bug and IO JSON labels, Oct 11, 2016
can you add a benchmark for
Not sure why that test failed; the only change was a line of whitespace.
joshowen added some commits, Oct 12, 2016
+ self.f = '__test__.msg'
+ self.N = 100000
+ self.C = 5
+ self.index = date_range('20000101', periods=self.N, freq='H')
joshowen (Contributor), Oct 12, 2016:
Looks like self.N/self.C are repeated in most of these classes. Want me to clean them all up?
jreback (Contributor), Oct 12, 2016:
Sure, that would be great (you can also make one or more common base classes if that helps).
jorisvandenbossche (Owner), Oct 12, 2016:
@joshowen you can leave it as is here. I cleaned this up in another PR (@jreback yes I know, I should merge that ...)
jreback (Contributor), Oct 12, 2016:
OK, that's fine (though @joshowen, make sure your example doesn't have duplicates, as this is new code).
jorisvandenbossche (Owner), Oct 12, 2016:
Ah, yes, it is of course OK to remove the lines in this added code that you do not need for this benchmark.
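Following that advice, a trimmed benchmark would keep only what the `lines=True` case needs. Because `orient="records"` does not write the index, a `date_range` index like the one in the diff above adds nothing here. The sketch assumes asv's convention of calling `setup()` before timing each `time_*` method; the class name, method name and column contents are hypothetical, not the benchmark that was merged.

```python
import pandas as pd


class ToJSONLines:
    """Hypothetical asv benchmark for DataFrame.to_json(lines=True).

    asv runs setup() before timing each time_* method, so building the
    frame is kept outside the timed region.
    """

    def setup(self):
        n = 100000
        # No index or unused attributes: orient="records" ignores the index.
        # The string column contains "}" and '"', the characters from this PR.
        self.df = pd.DataFrame({
            "floats": [i * 0.5 for i in range(n)],
            "ints": list(range(n)),
            "strings": ["foo}" if i % 2 else 'bar"' for i in range(n)],
        })

    def time_to_json_lines(self):
        self.df.to_json(orient="records", lines=True)
```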
+ df = DataFrame([["foo}", "bar"], ['foo"', "bar"]], columns=['a', 'b'])
+ result = df.to_json(orient="records", lines=True)
+ expected = '{"a":"foo}","b":"bar"}\n{"a":"foo\\"","b":"bar"}'
+ self.assertEqual(result, expected)
jreback (Contributor), Oct 12, 2016:
can you also round-trip it and use assert_frame_equal on the result (in addition to the above test)?
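A round-trip check of the kind requested here might look like the following sketch, which reuses the frame from the test above. It uses the public `pandas.testing.assert_frame_equal` and wraps the JSON in `io.StringIO`, since recent pandas versions deprecate passing literal JSON strings to `read_json`. It is an illustration, not the test that was committed.

```python
from io import StringIO

import pandas as pd
from pandas.testing import assert_frame_equal

df = pd.DataFrame([["foo}", "bar"], ['foo"', "bar"]], columns=["a", "b"])
result = df.to_json(orient="records", lines=True)

# Parse the line-delimited output back into a frame; if "}" or '"' had
# broken the line splitting, this would fail or produce different rows.
roundtrip = pd.read_json(StringIO(result), orient="records", lines=True)
assert_frame_equal(roundtrip, df)
```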
jorisvandenbossche added this to the 0.19.1 milestone, Oct 12, 2016
@joshowen can you also post a run for this benchmark (versus previous)? You can also do it with %timeit. Just checking for any perf issues.
@jreback is there an easy way to do that? Or should I port the asv test to master and run/compare?
you can run asv if you want, otherwise just do it in IPython (before and after), e.g. something like
and looking at this, we have a BIG perf hit when
cc @aterrel
@joshowen can you open another issue about the perf? Something odd is going on here.
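For the before/after comparison discussed here, a plain-Python alternative to `%timeit` is the standard-library `timeit` module. This sketch times `lines=False` and `lines=True` on the same frame, so the `lines=False` figure acts as a baseline for the extra cost of the line-delimited path. The helper name `time_to_json` and the frame contents are hypothetical, and no particular timings are implied.

```python
import timeit

import pandas as pd


def time_to_json(df, lines, number=3, repeat=3):
    """Best-of-`repeat` seconds per call of df.to_json(orient="records", lines=lines)."""
    timings = timeit.repeat(
        lambda: df.to_json(orient="records", lines=lines),
        number=number,
        repeat=repeat,
    )
    return min(timings) / number


n = 100000
df = pd.DataFrame({"a": ["foo}"] * n, "b": list(range(n))})

# Run this once against the old build and once against the new one and
# compare; within one build, lines=False is the baseline for lines=True.
for lines in (False, True):
    print(f"lines={lines}: {time_to_json(df, lines) * 1e3:.1f} ms per call")
```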
joshowen referenced this pull request, Oct 12, 2016:
PERF: to_json very slow with lines=True #14408 (closed)
LGTM. @jorisvandenbossche
jreback referenced this pull request, Oct 14, 2016:
PERF: improved perf in .to_json when lines=True #14429 (closed)
jreback closed this in 286b9b9, Oct 15, 2016
Thanks @joshowen!
tworec added a commit to RTBHOUSE/pandas that referenced this pull request, Oct 21, 2016:
04023d2 (joshowen + tworec)
joshowen commented, Oct 10, 2016 (edited):
git diff upstream/master | flake8 --diff