np.char.split returns an array of lists, not a 2d array #14014

Closed
Msegade opened this issue Jul 15, 2019 · 4 comments

Msegade commented Jul 15, 2019

np.char.split returns an object array of lists, which has to be processed further to produce a normal 2D NumPy array.

Example

>>> a = np.array([['0.1 0.2 0.3'], ['0.3 0.4 0.5'], ['0.5 0.6 0.7']])
>>> np.char.split(a)
array([[list(['0.1', '0.2', '0.3'])],
       [list(['0.3', '0.4', '0.5'])],
       [list(['0.5', '0.6', '0.7'])]], dtype=object)

Solution

To make the result easier to process, it would be better for the function to return

array([['0.1', '0.2', '0.3'],
       ['0.3', '0.4', '0.5'],
       ['0.5', '0.6', '0.7']], dtype='<U3')
@eric-wieser
Member

What would you have the following produce?

>>> a = np.array([['0.1 0.2'], ['0.3 0.4 0.5 0.6']])
>>> np.char.split(a)

@Msegade
Author

Msegade commented Jul 15, 2019

Ok, I see the issue. Maybe a function argument that generates the 2d array if the resulting lists have the same length, and otherwise raises an exception?

np.char.split(a, homogeneous_array=True)
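
A minimal sketch of what that behaviour could look like as a standalone function, using only existing NumPy calls. The name `split_homogeneous` is hypothetical and is not part of NumPy; `homogeneous_array` above is likewise only a proposal.

```python
import numpy as np


def split_homogeneous(a, sep=None):
    """Hypothetical helper: split each string and put the pieces on a new last axis.

    Raises ValueError if the strings do not all split into the same number of pieces.
    """
    parts = np.char.split(a, sep)            # object array of lists, same shape as `a`
    lengths = {len(p) for p in parts.flat}   # distinct piece counts across all elements
    if len(lengths) > 1:
        raise ValueError(f"split results have differing lengths: {sorted(lengths)}")
    return np.array(parts.tolist())          # lists are now uniform, so this stacks cleanly


a = np.array([['0.1 0.2 0.3'], ['0.3 0.4 0.5'], ['0.5 0.6 0.7']])
out = split_homogeneous(a)   # string array of shape (3, 1, 3)
```

Checking the lengths explicitly keeps the failure mode independent of how `np.array` treats ragged nested lists.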

@amjangde

First convert the array (i.e. the array of lists returned by np.char.split) to a list with tolist(). That gives a list of lists. Then pass that list of lists to np.array() to get a 2D array.

Please refer to this: convert numpy array of arrays to 2d array

example:

import numpy as np

date_cols = ['2012-12-12', '2011-11-11', '2010-10-10']

date_cols = np.char.split(date_cols, '-')

# here we get an array of lists
# date_cols will look like -> [list(['2012', '12', '12']) list(['2011', '11', '11']) list(['2010', '10', '10'])]

date_cols = date_cols.tolist()
date_cols = np.array(date_cols)

# date_cols will look like this -> [['2012' '12' '12'] ['2011' '11' '11'] ['2010' '10' '10']] which is a 2D array
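
Applied to the 2D input from the original report, the same two steps put the pieces on a new last axis, so one extra reshape is needed to get the requested layout. This is a sketch that assumes every string splits into the same number of pieces; with ragged input, `np.array` either raises or builds an object array, depending on the NumPy version.

```python
import numpy as np

a = np.array([['0.1 0.2 0.3'], ['0.3 0.4 0.5'], ['0.5 0.6 0.7']])

nested = np.char.split(a).tolist()    # nested lists, one level deeper than `a`
stacked = np.array(nested)            # shape (3, 1, 3): pieces land on a new last axis
table = stacked.reshape(len(a), -1)   # shape (3, 3), the layout asked for in the issue
values = table.astype(float)          # optional: parse the pieces as numbers
```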

@eric-wieser
Member

eric-wieser commented Sep 11, 2019

A way to solve this is to use:

def array_of_lists_to_array(arr):
    return np.apply_along_axis(lambda a: np.array(a[0]), -1, arr[..., None])

Obviously this spelling is less than ideal, but it does correctly throw an exception if something is amiss.
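
For illustration, here is how that helper behaves on the two inputs discussed in this thread. The helper is repeated so the snippet runs on its own; the variable names are only for the example.

```python
import numpy as np


def array_of_lists_to_array(arr):
    # Helper from the comment above, repeated so the example is self-contained.
    return np.apply_along_axis(lambda a: np.array(a[0]), -1, arr[..., None])


# Uniform case: every string splits into three pieces.
a = np.array([['0.1 0.2 0.3'], ['0.3 0.4 0.5'], ['0.5 0.6 0.7']])
result = array_of_lists_to_array(np.char.split(a))   # shape (3, 1, 3)

# Ragged case: the second row has more pieces than the first.
ragged = np.array([['0.1 0.2'], ['0.3 0.4 0.5 0.6']])
try:
    array_of_lists_to_array(np.char.split(ragged))
    ragged_error = None
except ValueError as exc:
    ragged_error = exc   # the rows cannot be written into a common output shape
```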

I'm going to close this issue, since I doubt we are in favor of adding special cases to char.split. I've opened #14478 to track adding a generic helper to solve this kind of problem.
