
SQL like join on Column id for 2D arrays #9526

Open
miskolc opened this issue Aug 7, 2017 · 1 comment

Comments

@miskolc

miskolc commented Aug 7, 2017

I've been searching the docs and Google for a way to do an SQL JOIN between 2D NumPy arrays. So far the best I've found is the join_by function in numpy.lib.recfunctions, but even this one seems to require the key to be a string (a field name):

key : {string, sequence}
        A string or a sequence of strings corresponding to the fields used
        for comparison.

The alternative is to use pandas' merge, but this requires a type conversion between NumPy arrays and pandas DataFrames, which I would like to avoid. I would like to have something like:

numpy.merge(left_array, right_array, how='inner', on=None, left_on=None, right_on=None, left_index=False, right_index=False, sort=False, suffixes=('_x', '_y'), copy=True, indicator=False)

where left_on and right_on are either integers or lists of integers (column indices).
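NumPy has no numpy.merge. To illustrate what the requested how='inner' case with integer left_on/right_on would compute, here is a minimal sketch in plain NumPy. inner_join is a hypothetical helper, not a NumPy API; it drops the right-hand key columns from the output and builds an n_left x n_right comparison matrix, so it assumes small inputs.

```python
import numpy as np


def inner_join(left, right, left_on, right_on):
    """Hypothetical SQL-style INNER JOIN of two 2D arrays on integer column indices.

    left_on / right_on may be a single int or a list of ints. The key columns
    of `right` are dropped from the output (like SQL's USING clause).
    """
    left_on = np.atleast_1d(left_on)
    right_on = np.atleast_1d(right_on)
    # Compare every left key row with every right key row -> shape (n_left, n_right).
    # This needs O(n_left * n_right) memory, so it only suits small inputs.
    matches = (left[:, left_on][:, None, :] == right[:, right_on][None, :, :]).all(axis=-1)
    # Row indices of every matching (left, right) pair, ordered by left row.
    li, ri = np.nonzero(matches)
    # Matched left rows, followed by the non-key columns of the matched right rows.
    return np.hstack([left[li], np.delete(right[ri], right_on, axis=1)])


left = np.array([[1, 10], [2, 20], [3, 30]])
right = np.array([[3, 300], [1, 100], [1, 101]])
print(inner_join(left, right, left_on=0, right_on=0))
```

Because every matching pair is emitted, duplicate keys produce one output row per match, as an SQL inner join would.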

@eric-wieser
Member

eric-wieser commented Aug 7, 2017

Why is requiring the key to be a string a problem? Attaching names to your columns via structured dtypes is essentially free.

For example:

import numpy as np

d = np.arange(12).reshape(4, 3)  # sample data

# data with column names: a 1D structured view of the same memory, with fields 'a', 'b', 'c'
df = d.view([('a', d.dtype), ('b', d.dtype), ('c', d.dtype)]).squeeze(axis=-1)

And then you can use join_by on df.
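Putting these pieces together, a minimal end-to-end sketch: two plain 2D arrays whose column 0 is assumed to hold the join key, structured views that name the columns, an inner join via numpy.lib.recfunctions.join_by, and a conversion back to an ordinary 2D array with structured_to_unstructured (available in NumPy 1.16 and later). The field names 'id', 'x' and 'y' are illustrative choices. The join_by docstring warns that neither input should contain duplicate keys.

```python
import numpy as np
from numpy.lib import recfunctions as rfn

# Plain 2D integer arrays; column 0 is the join key in both.
left = np.array([[1, 10], [2, 20], [3, 30]])
right = np.array([[3, 300], [1, 100], [4, 400]])

# Attach column names with a structured view; no data is copied.
left_s = left.view([('id', left.dtype), ('x', left.dtype)]).squeeze(axis=-1)
right_s = right.view([('id', right.dtype), ('y', right.dtype)]).squeeze(axis=-1)
assert np.shares_memory(left, left_s)

# Inner join on the 'id' field; usemask=False returns a plain ndarray sorted by key.
joined = rfn.join_by('id', left_s, right_s, jointype='inner', usemask=False)
print(joined)

# Convert the structured result back to an ordinary 2D array.
print(rfn.structured_to_unstructured(joined))
```

The view step is what makes the names "essentially free": only the dtype interpretation changes, while the join itself does copy data into a new array.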
