Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Construct 1d array from listlike #18769

Merged
merged 7 commits into from
Dec 19, 2017

Conversation

toobaz
Copy link
Member

@toobaz toobaz commented Dec 13, 2017

  • tests passed
  • passes git diff upstream/master -u -- "*.py" | flake8 --diff

#18626 isn't converging, will need more work and I won't have much time soon, so I took away the asv and refactoring that should be straighforward to merge.

asv says

       before           after         ratio
     [96439fb1]       [e72437c5]
-        38.6±1ms       35.1±0.5ms     0.91  panel_ctor.Constructors3.time_panel_from_dict_same_index
-      24.2±0.1μs       21.8±0.2μs     0.90  ctors.Constructors.time_index_from_array_string
-        39.7±1ms      34.9±0.02ms     0.88  panel_ctor.Constructors2.time_panel_from_dict_equiv_indexes
-     14.1±0.04μs      12.3±0.08μs     0.87  ctors.Constructors.time_series_from_ndarray
-      12.2±0.6μs      10.1±0.05μs     0.82  ctors.Constructors.time_dtindex_from_series

SOME BENCHMARKS HAVE CHANGED SIGNIFICANTLY.

... but they are borderline, and don't always come up.

@toobaz
Copy link
Member Author

toobaz commented Dec 14, 2017

@jreback Any hint about why all 2.7 fail with "ImportError: C extension: lib not built. If you want to import pandas from the source directory, you may need to run 'python setup.py build_ext --inplace --force' to build the C extensions first."?

@toobaz toobaz force-pushed the construct_1d_array_from_listlike branch from e72437c to dcf0136 Compare December 14, 2017 08:17
@codecov
Copy link

codecov bot commented Dec 14, 2017

Codecov Report

Merging #18769 into master will decrease coverage by 0.04%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #18769      +/-   ##
==========================================
- Coverage   91.64%   91.59%   -0.05%     
==========================================
  Files         154      154              
  Lines       51401    51430      +29     
==========================================
+ Hits        47106    47108       +2     
- Misses       4295     4322      +27
Flag Coverage Δ
#multiple 89.46% <100%> (-0.03%) ⬇️
#single 40.83% <80%> (-0.13%) ⬇️
Impacted Files Coverage Δ
pandas/core/dtypes/cast.py 88.6% <100%> (+0.07%) ⬆️
pandas/core/ops.py 91.77% <100%> (ø) ⬆️
pandas/core/common.py 92.92% <100%> (-0.09%) ⬇️
pandas/core/algorithms.py 94.14% <100%> (ø) ⬆️
pandas/io/gbq.py 25% <0%> (-58.34%) ⬇️
pandas/plotting/_converter.py 64.78% <0%> (-1.74%) ⬇️
pandas/util/testing.py 82.71% <0%> (-0.44%) ⬇️
pandas/core/frame.py 97.68% <0%> (-0.11%) ⬇️
pandas/io/parsers.py 95.55% <0%> (+0.05%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 7a0ee19...e5b3cd5. Read the comment docs.

# we have a list-of-list
result[:] = [tuple(x) for x in values]
if hasattr(values, '__array__'):
values = [tuple(x) for x in values]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand this one (can you add some comments?)
The original comment said "we have a list-of-list", which is actually not possible because it was already checked before if it was a list.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that comment was wrong, and confusing. Will add a comment.

@toobaz
Copy link
Member Author

toobaz commented Dec 14, 2017

Any hint about why all 2.7 fail with

Answering myself... it was a circular dependency

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add some tests for construct_1d_object_array_from_listlike in test_cast.py (where similar things are tested).

@@ -33,3 +34,21 @@ def time_dtindex_from_series(self):

def time_dtindex_from_index_with_series(self):
Index(self.s)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you use params for this

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

example?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lots of examples in the other asvs


Returns
-------
1-dimensional numpy array of dtype "dtype"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would rename this to
construct_1d_object_array_from_listlike and drop the dtype (always make it object).

you can also just add another function for this as well (e.g. have 1 that accepts the dtype as a required parameter ). I also don't think we are actually using dtype anywhere?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The array we return is of dtype dtype. So this will raise an error if e.g. dtype=int and the input contains lists, but this is good because it allows cleaner code (and less checks) in those methods where the actual dtype is unknown.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we are not using this, so its very confusing. I would just rather return an object dtyped array here (or have 2 methods).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we are not using this, so its very confusing

Non sequitur.

have 2 methods

... with almost perfect duplication of code, one of which perfectly generalizes the other, and without any performance gain?! I thought you liked to reduce code :-)

This method builds a 1d array of type dtype with given data. If it is impossible, it raises an error. This is pretty straightforward, and whenever np.array will have an ndmax parameter, all calls to this method will be replaced with a simple call to np.array(..., ndmax=1).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess I will explain again.

We are not using the dtype argument. So just eliminate it. I don't see utility in having it.

@jreback jreback added Dtype Conversions Unexpected or buggy dtype conversions Reshaping Concat, Merge/Join, Stack/Unstack, Explode labels Dec 14, 2017
@toobaz toobaz force-pushed the construct_1d_array_from_listlike branch from dcf0136 to c88f05f Compare December 14, 2017 14:04
@pep8speaks
Copy link

pep8speaks commented Dec 14, 2017

Hello @toobaz! Thanks for updating the PR.

Cheers ! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on December 19, 2017 at 13:39 Hours UTC

@toobaz toobaz force-pushed the construct_1d_array_from_listlike branch from c88f05f to c7fbfb0 Compare December 14, 2017 14:27
@@ -33,3 +34,21 @@ def time_dtindex_from_series(self):

def time_dtindex_from_index_with_series(self):
Index(self.s)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lots of examples in the other asvs


Returns
-------
1-dimensional numpy array of dtype "dtype"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we are not using this, so its very confusing. I would just rather return an object dtyped array here (or have 2 methods).

try:
problem = (dtype is not object and
np.array(data, dtype=dtype).ndim != 1) and ValueError
except (ValueError, TypeError) as exc:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you make 2 tests, 1 of which all succeed, 1 of which tests the ones that error

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can, but then I wouldn't parametrize (that would be a nightmare, since the cartesian product of parameters cannot be cleanly split). Let me try to add some comments to make it more clear instead.

@toobaz toobaz force-pushed the construct_1d_array_from_listlike branch 3 times, most recently from 61ebefb to 159427a Compare December 15, 2017 18:03
[False, True]]

def setup(self, data_fmt, with_index):
N = 10**2
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

make this 10**4, this is too small

param_names = ["data_fmt", "with_index"]
params = [[lambda x: x,
list,
lambda arr: dict(zip(range(len(arr)), arr)),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add something with tuples here
add strings



class FromNDArray(object):

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are there already list-like constructor asv's here? and from dict?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no idea (unrelated to this PR)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and the answer is?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no idea, as above

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

well since you are adding this one, should be very easy to .parametrize and add similar to the Series case, at least a few simple ones.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, some simple ones are present in frame_ctor.py, above the code I moved from ctor.py. I can open an issue to remind adding more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

except ValueError:
# we have a list-of-list
result[:] = [tuple(x) for x in values]
# Avoid building an array of arrays:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there an asv that hits this case?

Copy link
Member Author

@toobaz toobaz Dec 18, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't even think there is any valid code path that hits this case... which indeed should be suppressed in #18626

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if there is not valid code, then let's remove it. or make a new issue. inside a PR doesn't help.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add the issue number here with a TODO

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(done)


Returns
-------
1-dimensional numpy array of dtype "dtype"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess I will explain again.

We are not using the dtype argument. So just eliminate it. I don't see utility in having it.

def test_cast_1d_array(self, dtype, datum1, datum2):
data = [datum1, datum2]
try:
# Conversion to 1d array is possible if requested dtype is object
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please separate into 2 tests. This is so confusing as to what cases are what. If you need to skip a few tests on specific cases in the 'good' case, that is ok, but you need to clearly have a separate tests for the error cases. catching errors IN a test is a big no-no. it is impossible to tell what should succeed here.

@toobaz
Copy link
Member Author

toobaz commented Dec 18, 2017

We are not using the dtype argument. So just eliminate it. I don't see utility in having it.

Yeah, but this is hardly an argument - you're not even seeing that the change you are firmly pretending in the code is in contrast with the change you are firmly pretending in the test.

But I guess I'll have to explain you in some other PR, now I don't have enough time.

An updated version of this one is on its way.

@toobaz toobaz force-pushed the construct_1d_array_from_listlike branch 2 times, most recently from 905046a to 8848344 Compare December 18, 2017 10:16
Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

small comments.

list,
lambda arr: list(arr.astype(str)),
lambda arr: dict(zip(range(len(arr)), arr)),
lambda arr: [(i, -i) for i in arr],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some args are duplicated here

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry, was reading the ( as [



class FromNDArray(object):

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and the answer is?

except ValueError:
# we have a list-of-list
result[:] = [tuple(x) for x in values]
# Avoid building an array of arrays:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if there is not valid code, then let's remove it. or make a new issue. inside a PR doesn't help.


@pytest.mark.parametrize('datum1', [1, 2., "3", (4, 5), [6, 7], None])
@pytest.mark.parametrize('datum2', [8, 9., "10", (11, 12), [13, 14], None])
def test_cast_1d_array(self, datum1, datum2):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are there fail cases? iow where this routine raises?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not once you remove support for non-object dtypes (which is what you just asked me to do).

(well, as long as data has a len() and is iterable...)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

right so can you add a test that fails for scalars then

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(done)

def test_cast_1d_array(self, datum1, datum2):
data = [datum1, datum2]
result = construct_1d_object_array_from_listlike(data)
# Direct comparison fails: https://github.com/numpy/numpy/issues/10218
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

blank line

list,
lambda arr: list(arr.astype(str)),
lambda arr: dict(zip(range(len(arr)), arr)),
lambda arr: [(i, -i) for i in arr],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry, was reading the ( as [



class FromNDArray(object):

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

well since you are adding this one, should be very easy to .parametrize and add similar to the Series case, at least a few simple ones.

except ValueError:
# we have a list-of-list
result[:] = [tuple(x) for x in values]
# Avoid building an array of arrays:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add the issue number here with a TODO


@pytest.mark.parametrize('datum1', [1, 2., "3", (4, 5), [6, 7], None])
@pytest.mark.parametrize('datum2', [8, 9., "10", (11, 12), [13, 14], None])
def test_cast_1d_array(self, datum1, datum2):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

right so can you add a test that fails for scalars then

Returns
-------
1-dimensional numpy array of dtype object
"""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would not object to an
assert is_iterable(values) with a nice error message

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The thing is: I can't think of any code path which could be hitting it. Scalar input to a Series() is (considered valid and) recasted to a 1-d before calling this. Similarly, an operation such as Series([1,2]) + 3 transforms 3 before hitting this. So I don't know what the error message could actually say.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not what i am asking

this is a completely internal
routine
it should fail with invalid input

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it should fail with invalid input

Sure it does, TypeError: object of type 'int' has no len(). Which is pretty clear, considering the docstring, and precisely in light of the fact that this is an internal routine. That said, feel free to suggest an error message which is worth the cost of the additional assert.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add a raises section to the doc-string

@toobaz toobaz force-pushed the construct_1d_array_from_listlike branch from 8848344 to e5b3cd5 Compare December 18, 2017 14:43
@codecov
Copy link

codecov bot commented Dec 18, 2017

Codecov Report

❗ No coverage uploaded for pull request base (master@b32dd63). Click here to learn what that means.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff            @@
##             master   #18769   +/-   ##
=========================================
  Coverage          ?   91.62%           
=========================================
  Files             ?      154           
  Lines             ?    51401           
  Branches          ?        0           
=========================================
  Hits              ?    47097           
  Misses            ?     4304           
  Partials          ?        0
Flag Coverage Δ
#multiple 89.49% <100%> (?)
#single 40.84% <80%> (?)
Impacted Files Coverage Δ
pandas/core/common.py 92.92% <100%> (ø)
pandas/core/algorithms.py 94.14% <100%> (ø)
pandas/core/dtypes/cast.py 88.6% <100%> (ø)
pandas/core/ops.py 91.77% <100%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b32dd63...d0a6e48. Read the comment docs.

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

small comments. ping on green.

Returns
-------
1-dimensional numpy array of dtype object
"""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add a raises section to the doc-string

@toobaz toobaz force-pushed the construct_1d_array_from_listlike branch from e5b3cd5 to d0a6e48 Compare December 19, 2017 13:39
@jreback jreback added this to the 0.22.0 milestone Dec 19, 2017
@jreback
Copy link
Contributor

jreback commented Dec 19, 2017

this will fail because of a conda issue, that I think is now fixed in master. I will merge after this is resolved (no need to repush).

@jreback jreback merged commit 04db779 into pandas-dev:master Dec 19, 2017
@jreback
Copy link
Contributor

jreback commented Dec 19, 2017

thanks!

@toobaz toobaz deleted the construct_1d_array_from_listlike branch December 19, 2017 15:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Dtype Conversions Unexpected or buggy dtype conversions Reshaping Concat, Merge/Join, Stack/Unstack, Explode
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants