ENH: Add JSON export option for DataFrame #631 #1226

Closed · wants to merge 114 commits

Contributor

Komnomnomnom commented May 11, 2012

No description provided.

@Komnomnomnom Komnomnomnom ENH: Add JSON export option for DataFrame #631
Bundle custom ujson lib for DataFrame and Series JSON export & import.
cb7c6ae
Contributor

takluyver commented May 11, 2012

I don't think we should be bundling a JSON encoder. Python has shipped a json module in the standard library since 2.6, and it's simple enough to install other implementations if the user needs e.g. more speed. Let's just have a little shim module that tries to import JSON APIs in order of preference.
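The shim takluyver describes could look something like this minimal sketch (the preference order and module names here are illustrative, not from the pull request):

```python
# Minimal sketch of an import-preference shim: try faster JSON
# implementations first and fall back to the stdlib json module.
try:
    import ujson as json_impl
except ImportError:
    try:
        import simplejson as json_impl
    except ImportError:
        import json as json_impl

dumps = json_impl.dumps
loads = json_impl.loads
```

Callers would then use `dumps`/`loads` without caring which backend was found.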

Contributor

Komnomnomnom commented May 11, 2012

@takluyver there's a bit of a discussion already at #631, not sure if you're aware of it. I should have added more info in the description though, sorry. The main motivation for including this fork of ujson in pandas is that it works with pandas datatypes at a very low level (it is pure C), so it wouldn't be of any benefit to non-pandas users. If a user wants to use their own favourite JSON decoder they would obviously still be free to do so.

However, I'll admit that high-performance JSON serialisation is probably a minor requirement for most people, so I'm happy either way.

Contributor

takluyver commented May 11, 2012

Thanks, I wasn't aware of that. I'm still not wild about the approach - it seems like it will make for a heavier library and a bigger codebase to maintain. But Wes seems to be happy with the idea, so you don't have to worry about my objections ;-)

A couple of practical questions:

Your README has a lot of benchmarks, but I haven't taken the time to work out what they all mean. Can you summarise: what sort of improvement do we see from forking ujson, versus the best we could do with a stock build?

What sort of workloads do we envisage - is the bottleneck when you have one huge dataframe, or thousands of smaller ones?

Assuming ujson is still actively developed, how important and how easy will it be to get updates from upstream in the future?

Contributor

Komnomnomnom commented May 11, 2012

When working with numpy types:

  • encoding : no real advantage, sometimes even a disadvantage; the numpy-to-list conversion is very efficient.
  • decoding : about 1.5 to 2x the speed (when working with numeric types).

DataFrames:

  • encoding : depending on the desired format and the nature (shape, size) of the input, a speedup of about 2x to 10x, although there are cases where it's about 20x (e.g. 200x4 zeros).
  • decoding : again depending on the encoded format, a speedup of about 2x to 3x is typical, but it can be up to 20x.
  • for time series data, encoding & decoding is usually better than, or on a par with, encoding the corresponding Python basic type (i.e. a dict). For time series data with datetime indices I'm seeing about a 7x speedup for encoding DataFrames and about 3x for decoding. In the best case, where a transpose would otherwise be necessary, the speedup is about 15 to 20x.

And this is on top of ujson already being one of the speediest JSON libraries.
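Numbers in this ballpark can be sanity-checked with a rough microbenchmark along these lines (this uses the modern built-in DataFrame.to_json; the sizes and the measured ratio are illustrative and depend heavily on shape and dtype):

```python
# Rough microbenchmark: pandas' C-level JSON encoder vs round-tripping
# through the stdlib json module via a plain dict.
import json
import timeit

import numpy as np
import pandas as pd

df = pd.DataFrame(np.zeros((200, 4)))  # the 200x4 zeros case mentioned above

c_encoder = timeit.timeit(lambda: df.to_json(), number=200)
stdlib = timeit.timeit(lambda: json.dumps(df.to_dict()), number=200)
print(f"to_json: {c_encoder:.4f}s  json.dumps(to_dict): {stdlib:.4f}s")
```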

My specific use case is the need to share lots of DataFrames between Python processes (and other languages), with a mix of sizes. JSON was the natural choice for us because of portability, and we wanted to get the best performance out of it.

ujson is a relatively small and stable library. There have only been a few minor patches in the last few months and the author seems pretty open to pull requests etc. I'll be merging any applicable upstream changes to my fork and I'd be happy to do the same for pandas if it ends up being integrated. I'm pretty familiar with the ujson code now (it's really only four files) and I'd likewise be happy to deal with any bugs / enhancements coming from pandas usage too. It's worth noting that the library is split into two parts, one being the language-agnostic JSON encoder / decoder and the other being the Python bindings. I managed to keep the bulk of my changes limited to the Python bindings, and even then they are new functions / new code rather than changes to existing functions. My point being that upstream changes should be easy enough to merge.

Contributor

takluyver commented May 11, 2012

Thanks, that all sounds pretty reasonable, and I'm satisfied that this is worth doing.

Owner

wesm commented May 12, 2012

This is really excellent work, thanks so much for doing this. Yeah, I was initially a bit hesitant to bundle ujson, but given that more and more people want to do JS<->pandas integration, getting the best possible encoding/decoding performance and being able to access the NumPy arrays directly in the C encoder makes a lot of sense. We'll have to periodically pull in upstream changes from ujson, I guess.

just curious, how would this handle nested JSON? i.e.

j = {'person': {'first_name': 'Albert',
                'last_name': 'Einstein',
                'occupation': {'job_title': 'Theoretical Physicist',
                               'institution': 'Princeton University',
                               'accomplishments': ['Brownian motion',
                                                   'Special Relativity',
                                                   'General Relativity']}}}

df = pandas.DataFrame(j)

df = ?

Contributor

Komnomnomnom commented May 12, 2012

From a performance standpoint not very well I'm afraid, the numpy with labels handling bombs out if it detects more than two levels of nesting. It probably could be tweaked to deal with this better but when decoding with complex types (i.e. objects and strings) a Python list is needed as an intermediary anyway, so I'm not sure there'd be any advantage.

The good news is the methods in DataFrame and Series fall back to standard decoding if the numpy version fails so it should still work as expected, albeit without the performance improvements.
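The fallback described above amounts to something like this sketch (the fast/slow decoders here are stand-ins for the pandas internals, not the actual bundled ujson API):

```python
# Sketch of the fallback pattern: try the fast numpy-aware decoder
# first, and fall back to standard decoding if it cannot handle the
# input (e.g. more than two levels of nesting, as described above).
import json

def fast_numpy_decode(s):
    # Stand-in for the ujson numpy path: reject deep nesting.
    obj = json.loads(s)

    def depth(o):
        if isinstance(o, dict):
            return 1 + max((depth(v) for v in o.values()), default=0)
        return 0

    if depth(obj) > 2:
        raise ValueError("nested too deeply for the numpy path")
    return obj

def decode_with_fallback(s):
    try:
        return fast_numpy_decode(s)
    except ValueError:
        return json.loads(s)  # standard decoding still works, just slower
```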

Just tested it out to make sure

In [1]: from pandas import DataFrame
In [2]: j = {'person' : {'first_name' : 'Albert', 'last_name' : 'Einstein', 'occupation': {'job_title': 'Theoretical Physicist', 'institution' : 'Princeton University', 'accomplishments':['Brownian motion', 'Special Relativity', 'General Relativity']}}}

In [3]: df = DataFrame(j)

In [4]: df
Out[4]: 
<class 'pandas.core.frame.DataFrame'>
Index: 3 entries, first_name to occupation
Data columns:
person    3  non-null values
dtypes: object(1)

In [5]: df['person']['occupation']
Out[5]: 
{'accomplishments': ['Brownian motion',
  'Special Relativity',
  'General Relativity'],
 'institution': 'Princeton University',
 'job_title': 'Theoretical Physicist'}

In [6]: df.to_json()
Out[6]: '{"person":{"first_name":"Albert","last_name":"Einstein","occupation":{"accomplishments":["Brownian motion","Special Relativity","General Relativity"],"institution":"Princeton University","job_title":"Theoretical Physicist"}}}'

In [7]: json = df.to_json()

In [8]: DataFrame.from_json(json)
Out[8]: 
<class 'pandas.core.frame.DataFrame'>
Index: 3 entries, first_name to occupation
Data columns:
person    3  non-null values
dtypes: object(1)

In [9]: DataFrame.from_json(json)['person']['occupation']
Out[9]: 
{u'accomplishments': [u'Brownian motion',
  u'Special Relativity',
  u'General Relativity'],
 u'institution': u'Princeton University',
 u'job_title': u'Theoretical Physicist'}

Edit: I should have mentioned the comments above are related to decoding only. Encoding does not suffer the same issues and the performance improvements still apply.

Owner

wesm commented May 19, 2012

Hey @Komnomnomnom I started to see if I can merge this and am getting a segfault on my system (Python 2.7.2, NumPy 1.6.1, 64-bit Ubuntu).

The object returned by series.to_json(orient='columns') in _check_orient(series, "columns", dtype=dtype) from test_series.py, line 341, appears to be NULL (the gdb backtrace showed the segfault in from_json, but the data returned by to_json is malformed):

test_from_json_to_json (__main__.TestSeries) ... > /home/wesm/code/pandas/pandas/tests/test_series.py(324)_check_orient()
-> foo
(Pdb) type(series.to_json(orient=orient))
Segmentation fault

I can probably track down the problem, but I figure since you wrote the C code that you'd be more able if you can reproduce the error.

Contributor

Komnomnomnom commented May 19, 2012

Hi Wes, I just tried with my local clone of my fork and had no segmentation fault (all tests passed when I made my commit / pull request). I'll merge in the latest from pandas master and see what happens.

For the record I'm using Python 2.7.2, numpy 1.6.1 on 64-bit OS X.

Owner

wesm commented May 19, 2012

I put in print statements

  printf("%s\n", ret);
  printf("length: %zu\n", strlen(ret));

and here's the output

{"2F4SMHsw4I":-1.4303216796,"nMi4KBCmg7":-1.32552412,"Molf5Ue3kF":-1.2705465829,"9kkHHlfXPA":-0.8877964843,"6E3ma1UHv7":-0.850191537,"2F5JdoFIqQ":-0.8013936673,"VzJclGGLsr":-0.7985248155,"cI4bkkV9MH":-0.7000873004,"TxS6mJ8UuP":-0.6864885751,"2jGSZe0rmF":-0.6708315768,"oHooxHeHqu":-0.6482430589,"HuqOm1mf57":-0.624890804,"bEWcPipOk9":-0.5669391204,"zpy7FQCGgp":-0.3383151716,"nYIL8VPVT3":-0.2663003599,"x0YmXOvJ49":-0.1767082308,"bJm3Pbjx14":-0.1510545428,"E51nrgW9Yt":0.0101299091,"QycwIANnTx":0.1575097137,"8wVdQ8RIdQ":0.2073634038,"90c5KPKyeS":0.2539122603,"eERFnAAd8k":0.3728367,"tZLEG6seKV":0.4332938883,"ehdTUcPK7A":0.457039038,"biYpVDeFiz":0.5021518808,"JlVXVA62Zz":0.5918523437,"2UTfjHGMEy":0.6413052158,"5VOyIV1TYs":0.6828158342,"WyNfVlEOK3":1.1809723971,"YrW1NS7fCX":1.3862224711}
length: 790

pandas/src/ujson/python/objToJSON.c: MARK(1490)
Segmentation fault

Somehow the result of PyString_FromString is malformed, it seems like maybe ret is not null-terminated? I suspect this is a red herring, though

Owner

wesm commented May 19, 2012

It looks like something is getting corrupted:

14:09 ~/code/pandas  (json-export)$ python pandas/tests/test_ujson.py 
nose.config: INFO: Ignoring files matching ['^\\.', '^_', '^setup\\.py$']
testArrayNumpyExcept (__main__.NumpyJSONTests) ... ok
testArrayNumpyLabelled (__main__.NumpyJSONTests) ... ok
testArrays (__main__.NumpyJSONTests) ... ok
testBool (__main__.NumpyJSONTests) ... ok
testBoolArray (__main__.NumpyJSONTests) ... ok
testFloat (__main__.NumpyJSONTests) ... ok
testFloatArray (__main__.NumpyJSONTests) ... ok
testFloatMax (__main__.NumpyJSONTests) ... ok
testInt (__main__.NumpyJSONTests) ... ok
testIntArray (__main__.NumpyJSONTests) ... ok
testIntMax (__main__.NumpyJSONTests) ... ok
testDataFrame (__main__.PandasJSONTests) ... > /home/wesm/code/pandas/pandas/tests/test_ujson.py(943)testDataFrame()
-> foo
(Pdb) u
> /home/wesm/epd/lib/python2.7/unittest/case.py(327)run()
-> testMethod()
(Pdb) d
> /home/wesm/code/pandas/pandas/tests/test_ujson.py(943)testDataFrame()
-> foo
(Pdb) l
938     class PandasJSONTests(TestCase):
939     
940         def testDataFrame(self):
941             df = DataFrame([[1,2,3], [4,5,6]], index=['a', 'b'], columns=['x', 'y', 'z'])
942     
943  ->         foo
944             # column indexed
945             outp = DataFrame(ujson.decode(ujson.encode(df)))
946             self.assertTrue((df == outp).values.all())
947             assert_array_equal(df.columns, outp.columns)
948             assert_array_equal(df.index, outp.index)
(Pdb) ujson.encode(df)
'{"x":{"a":1,"b":4},"y":{"a":2,"b":5},"z":{"a":3,"b":6}}'
(Pdb) print df
Segmentation fault
Owner

wesm commented May 19, 2012

It looks like the culprit must be NpyArr_encodeLabels. I'm not enough of a C guru to see what might be going wrong; everything works here except encoding Series/DataFrame, and inside there is plenty of twiddling of bytes. Let me know if you manage to figure it out =/

Contributor

Komnomnomnom commented May 19, 2012

Hmm, I've merged in the latest from pandas master; I'm seeing some failed tests but still no segmentation faults, no corruption, and those print statements work fine. I'm going to try in an Ubuntu VM and see if I can get to the bottom of it.

wesm and others added some commits May 10, 2012

@wesm @Komnomnomnom wesm REF: working toward #1150, broke apart Cython module into generated _…
…algos extension
3af585e
@wesm @Komnomnomnom wesm REF: have got things mostly working for #1150 11f2c0d
@wesm @Komnomnomnom wesm BUG: more bug fixes, have to fix intraday frequencies still e9dee69
@wesm @Komnomnomnom wesm BUG: more intraday unit fixes 69d0baa
@wesm @Komnomnomnom wesm BUG: test suite passes, though negative ordinals broken 5485c2d
@wesm @Komnomnomnom wesm BUG: weekly and business daily unit support #1150 879779d
@wesm @Komnomnomnom wesm REF: remove period multipliers, close #1199 85fcd69
@mwiebe @Komnomnomnom mwiebe Remove dependencies on details of experimental numpy datetime64 ABI
Pandas was using some of the enums and structures exposed by its headers.
By creating its own local copies of these, it is possible to allow the
numpy ABI to be improved while in its experimental state.
b457ff8
@wesm @Komnomnomnom wesm ENH: move _ensure_{dtype} functions to Cython for speedup, close #1221 075f05e
@wesm @Komnomnomnom wesm DOC: doc fixes ee73df1
@wesm @Komnomnomnom wesm ENH: handle dict return values and vbench, close #823 9e88e0c
@wesm @Komnomnomnom wesm ENH: add is_full method to PeriodIndex close #1114 a31ed38
@adamklein @Komnomnomnom adamklein ENH: #1020 implementation. needs tests and adding to API b98e4e0
@mwiebe @Komnomnomnom mwiebe Use datetime64 with a 'us' unit explicitly, for 1.6 and 1.7 compatibi…
…lity
3d83387
@mwiebe @Komnomnomnom mwiebe Use an explicit unit for the 1.7 datetime64 scalar constructor c53e093
@mwiebe @Komnomnomnom mwiebe Use assert_equal instead of assert, to see the actual values 89bd898
@mwiebe @Komnomnomnom mwiebe Microseconds (us) not milliseconds (ms) 4e6720f
@wesm @Komnomnomnom wesm TST: use NaT value a7bccd8
@wesm @Komnomnomnom wesm ENH: add docs and add match function to API, close #502 1ecb5c4
@wesm @Komnomnomnom wesm ENH: add Cython nth/last functions, vbenchmarks. close #1043 4ac9abb
@wesm @Komnomnomnom wesm BUG: fix improper quarter parsing for frequencies other than Q-DEC, c…
…lose #1228
b246ae1
@wesm @Komnomnomnom wesm BUG: implement Series.repeat to get expected results, close #1229 4d052f9
@wesm @Komnomnomnom wesm ENH: anchor resampling frequencies like 5minute that evenly subdivide…
… one day in resampling to always get regular intervals. a bit more testing needed, but close #1165
74a6be0
@Komnomnomnom Kelsey Jordahl ENH: Allow different number of rows & columns in a histogram plot 0cf9e3d
@wesm @Komnomnomnom wesm BUG: support resampling of period data to, e.g. 5minute thoguh with t…
…imestamped result, close #1231
e043862
@wesm @Komnomnomnom wesm BUG: remove restriction in lib.Reducer that index by object dtype. close 996b964
@wesm @Komnomnomnom wesm TST: vbenchmark for #561, push more work til 0.9 7baa84c
@wesm @Komnomnomnom wesm BUG: don't print exception in reducer 8b972a1
@wesm @Komnomnomnom wesm BUG: rogue foo 93b5221
@wesm @Komnomnomnom wesm ENH: reimplment groupby_indices using better algorithmic tricks, asso…
…ciated vbenchmark. close #609
eb460c0
@wesm @Komnomnomnom wesm BLD: fix npy_* -> pandas_*, compiler warnings 197a7f6
@wesm @Komnomnomnom wesm TST: remove one skip test aca4c43
@wesm @Komnomnomnom wesm ENH: store pytz time zones as zone strings in HDFStore, close #1232 c1260e3
@ruidc @Komnomnomnom ruidc treat XLRD.XL_CELL_ERROR as NaN 8d27185
@ruidc @Komnomnomnom ruidc replace tabs with spaces 1e6aea5
@Komnomnomnom Chang She ENH: convert multiple text file columns to a single date column #1186 349bccb
@Komnomnomnom Chang She Stop storing class reference in HDFStore #1235 4c32ab8
@Komnomnomnom Chang She removed extraneous IntIndex instance test e057ad5
@wesm @Komnomnomnom wesm BUG: fix rebase conflict from #1236 0cdfe75
@wesm @Komnomnomnom wesm RLS: release note 63952a8
@Komnomnomnom Chang She Merged extra keyword with parse_dates 52492dd
@Komnomnomnom Chang She TST: VB for multiple date columns 9c01e77
@Komnomnomnom Chang She A few related bug fixes 1febe66
@wesm @Komnomnomnom wesm TST: test with headers 3fdf18a
@lbeltrame @Komnomnomnom lbeltrame ENH: Add support for converting DataFrames to R data.frames and
matrices, close #350
c9af5c5
@lbeltrame @Komnomnomnom lbeltrame BUG: Properly handle the case of matrices d17f1d5
@Komnomnomnom Chang She ENH: maybe upcast masked arrays passed to DataFrame constructor a89e7b9
@wesm @Komnomnomnom wesm RLS: release notes ea7f4e1
@wesm @Komnomnomnom wesm ENH: optimize join/merge on integer keys, close #682 4c1eb1b
@wesm @Komnomnomnom wesm RLS: release notes for #1081 8572d54
@wesm @Komnomnomnom wesm ENH: efficiently box datetime64 -> Timestamp inside Series.__getitem__.
close #1058
8ecb31b
@wesm @Komnomnomnom wesm BLD: add modified numpy Cython header 4b56332
@wesm @Komnomnomnom wesm BLD: fix datetime.pxd d2b947b
@wesm @Komnomnomnom wesm ENH: can pass multiple columns to GroupBy.__getitem__, close #383 67a98ff
@tkf @Komnomnomnom tkf ENH: treat complex number in maybe_convert_objects 48a073a
@tkf @Komnomnomnom tkf ENH: treat complex number in maybe_convert_objects a3e538f
@wesm @Komnomnomnom wesm ENH: accept list of tuples, preserving function order in SeriesGroupB…
…y.aggregate
2e9de0e
@wesm @Komnomnomnom wesm ENH: more flexible multiple function application in DataFrameGroupBy, c…
…lose #642
92d050b
@wesm @Komnomnomnom wesm DOC: release notes b07f097
@tkf @Komnomnomnom tkf TST: Add complex number in test_constructor_scalar_inference ca6558c
@tkf @Komnomnomnom tkf ENH: treat complex number in internals.form_blocks 3f3b900
@tkf @Komnomnomnom tkf ENH: add internals.ComplexBlock dc43a1e
@tkf @Komnomnomnom tkf BUG: fix max recursion error in test_reindex_items
It looks like sorting by dtype itself does not work.
To see that, try this snippet:

>>> from numpy import dtype
>>> sorted([dtype('bool'), dtype('float64'), dtype('complex64'),
...         dtype('float64'), dtype('object')])
[dtype('bool'),
 dtype('float64'),
 dtype('complex64'),
 dtype('float64'),
 dtype('object')]
c280d22
@wesm @Komnomnomnom wesm BLD: fix platform int issues a7698da
@wesm @Komnomnomnom wesm TST: verify consistently set group name, close #184 0782990
@wesm @Komnomnomnom wesm ENH: don't populate hash table in index engine if > 1e6 elements, to …
…save memory and speed. close #1160
d66ac45
@wesm @Komnomnomnom wesm ENH: support different 'bases' when resampling regular intervals like…
… 5 minute, close #1119
be5b5a4
@Komnomnomnom Chang She VB: more convenience auto-updates 8d581c8
@Komnomnomnom Chang She VB: get from and to email addresses from config file 6e09dda
@Komnomnomnom Chang She VB: removing cruft; getting config from user folders 31fefba
@wesm @Komnomnomnom wesm BUG: floor division for Python 3 d5b6b93
@Komnomnomnom Chang She DOC: function for auto docs build e275d76
@Komnomnomnom Chang She DOC: removed lingering sourceforge references 18d9a13
@Komnomnomnom Chang She DOC: removed lingering timeRule keyword use 545e917
@wesm @Komnomnomnom wesm ENH: very basic ordered_merge with forward filling, not with multiple…
… groups yet
40d9a3b
@wesm @Komnomnomnom wesm ENH: add group-wise merge capability to ordered_merge, unit tests, close 69229e7
@wesm @Komnomnomnom wesm BUG: ensure_platform_int actually makes lots of copies 9e2142b
@wesm @Komnomnomnom wesm RLS: release notes, close #1239 5891ad5
@wesm @Komnomnomnom wesm BLD: 32-bit compat fixes per #1242 42d1c90
@wesm @Komnomnomnom wesm ENH: add keys() method to DataFrame, close #1240 f1c6c89
@wesm @Komnomnomnom wesm DOC: release notes 6e8bbed
@Komnomnomnom Chang She TST: test cases for replace method. #929 e50c7d8
@Komnomnomnom Chang She ENH: Series.replace #929 b0e13c1
@Komnomnomnom Chang She ENH: DataFrame.replace and cython replace. Only works for floats and …
…ints. Need to generate datetime64 and object versions.
b7546b2
@Komnomnomnom Chang She ENH: finishing up DataFrame.replace need to revisit 45773c9
@Komnomnomnom Chang She removed bottleneck calls from replace 2f5319d
@Komnomnomnom Chang She moved mask_missing to common 245c126
@Komnomnomnom Chang She TST: extra test case for Series.replace 35220b4
@Komnomnomnom Chang She removed remaining references to replace code generation 40a0cb1
@wesm @Komnomnomnom wesm DOC: release note re: #929 76355d0
@invisibleroads @Komnomnomnom invisibleroads Removed erroneous reference to iterating over a Series, which iterate…
…s over values and not keys
927d370
@Komnomnomnom Chang She TST: rephrased .keys call for py3compat 49ad7e2
@invisibleroads @Komnomnomnom invisibleroads Fixed a few typos b60c0d3
@wesm @Komnomnomnom wesm REF: microsecond -> nanosecond migration, most of the way there #1238 d4407a9
@wesm @Komnomnomnom wesm BUG: more nano fixes 4f15d54
@Komnomnomnom Chang She DOC: put back doc regarding inplace in rename in anticipation of feature 421f5d3
@Komnomnomnom Chang She DOC: reworded description for MultiIndex 181f945
@Komnomnomnom Chang She DOC: started on timeseries.rst for 0.8 fb1e662
@wesm @Komnomnomnom wesm REF: more nanosecond support fixes, test suite passes #1238 9bc3814
@wesm @Komnomnomnom wesm ENH: more nanosecond support #1238 b026566
@orbitfold @Komnomnomnom orbitfold Changes to plotting scatter matrix diagonals c360391
@orbitfold @Komnomnomnom orbitfold Changed xtick, ytick labels cf74512
@orbitfold @Komnomnomnom orbitfold Added simple test cases d7d6a0f
@orbitfold @Komnomnomnom orbitfold Updated plotting.py scatter_matrix docstring to describe all the para…
…meters
cd8222c
@orbitfold @Komnomnomnom orbitfold Added scatter_matrix examples to visualization.rst 8e2f3f9
@wesm @Komnomnomnom wesm DOC: release notes da1b234
@Komnomnomnom Chang She BUG: DataFrame.drop_duplicates with NA values a6e32b8
@Komnomnomnom Chang She use fast zip with a placeholder value just for np.nan 2a6fc11
@Komnomnomnom Chang She TST: vbench for drop_duplicate with skipna set to False d95a254
@Komnomnomnom Chang She optimized a little bit for speed 7953ae8
@Komnomnomnom Chang She ENH: inplace option to DataFrame.drop_duplicates #805 with vbench 916be1d
@tkf @Komnomnomnom tkf BUG: replace complex64 with complex128
As mentioned in #1098.
ba6a9c8
@wesm @Komnomnomnom wesm ENH: add KDE plot from #1059 1cacb6c
Contributor

Komnomnomnom commented May 20, 2012

Ugh, I did not know merging into my fork would flood this pull request. It might be best to delete my current fork and submit a new pull request once this issue is sorted.

The good news is after a bit of setup I was able to reproduce the memory corruption you are seeing in my Ubuntu VM. It appears to happen even when NpyArr_encodeLabels is not involved. There is also some weirdness with timestamp conversion but I think that is a separate issue.

Contributor

Komnomnomnom commented May 20, 2012

I believe I've found the problem. The reference count of the object being encoded was mistakenly being decremented twice. I presume it was just chance that the memory layout or garbage-collection schedule on my laptop meant the object wasn't actually being deleted.
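The failure mode described here can be illustrated with a toy model of CPython reference counting (the class below is purely illustrative; the real bug was an extra Py_DECREF in the C encoder):

```python
# Toy model of reference counting, showing why a double decrement frees
# an object while its caller still holds a reference, so whether anything
# crashes depends on memory layout and timing.

class Obj:
    def __init__(self):
        self.refcnt = 1      # the caller's reference
        self.freed = False

    def incref(self):
        self.refcnt += 1

    def decref(self):
        self.refcnt -= 1
        if self.refcnt == 0:
            self.freed = True  # in CPython the memory would be released here

o = Obj()
o.incref()      # the encoder's iterator takes its own reference
o.decref()      # iterEnd releases it: correct
o.decref()      # the extra, mistaken decref
print(o.freed)  # True: freed while the caller still "owns" it -> use-after-free
```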

There are a few more things I've noticed (like "build clean" deleting the C files, and datetime conversion now not working) which I'll fix before submitting a new pull request. I'll close this one for now and create a feature branch on a new fork to avoid this mess happening again.

Owner

wesm commented May 20, 2012

That will teach you not to develop in master ;) BTW, you don't need to re-fork: you can git reset --hard upstream/master and force-push that to GitHub. Just make sure you make a branch of your current master with the JSON work first.

Contributor

Komnomnomnom commented May 20, 2012

Oops, too late, I re-forked a few minutes ago... hope this doesn't cause further problems :-/

BTW, if you want to test the fix on your machine, the offending line was 278 in NpyArr_iterEnd:
cb7c6ae#L6R279
(That line should be removed.)

Also, I'm still noticing some timestamp weirdness. I'm guessing there were changes recently in master regarding datetime64? Is this work still ongoing?

Owner

wesm commented May 20, 2012

Yes, the work is still ongoing. Test failures in JSON encoding/decoding or elsewhere (pydata/master test suite passes cleanly for me)? I should be able to fix them myself

Contributor

jreback commented Jun 11, 2013

implemented via #3804
