Expand type specializations and multiple "by" parameters in merge_asof() #13936
Comments
|
Can you show some specific examples that fail and/or that you want to work (in the top section)? This will provide some examples / test cases.
jreback
added Enhancement Reshaping Dtypes
labels
Aug 8, 2016
jreback
added this to the
Next Major Release
milestone
Aug 8, 2016
jreback
added Difficulty Intermediate Effort Medium
labels
Aug 8, 2016
Performance
Using
Runtime of the 64-bit case:
Now cast to 32-bit:
And re-run it:
The added overhead comes from converting the
Multi-By
Let's say I have these DataFrames:
I want to join by both the ticker symbol and the stock exchange:
I.e., the expected result is:
For the
These two requests are of low priority right now, so I personally won't be able to get to them right away.
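The multi-by request above can be sketched as follows. The fix referenced later in this thread (#14783) is described as letting merge_asof() take multiple 'by' parameters, so this assumes a pandas version that includes it. The frames, column names (time, ticker, exch, price, bid), and values are hypothetical and are not the data from the original comment.

```python
import pandas as pd

# Hypothetical trades and quotes, each already sorted by "time".
trades = pd.DataFrame({
    "time": pd.to_datetime([
        "2016-08-08 09:30:00.030",
        "2016-08-08 09:30:00.040",
        "2016-08-08 09:30:00.050",
    ]),
    "ticker": ["MSFT", "MSFT", "AAPL"],
    "exch": ["NASDAQ", "ARCA", "NASDAQ"],
    "price": [51.95, 51.96, 98.00],
})
quotes = pd.DataFrame({
    "time": pd.to_datetime([
        "2016-08-08 09:30:00.010",
        "2016-08-08 09:30:00.020",
        "2016-08-08 09:30:00.025",
        "2016-08-08 09:30:00.045",
    ]),
    "ticker": ["MSFT", "MSFT", "AAPL", "MSFT"],
    "exch": ["NASDAQ", "ARCA", "NASDAQ", "ARCA"],
    "bid": [51.90, 51.91, 97.90, 51.99],
})

# Each trade picks up the latest earlier quote that has the same
# ticker AND the same exchange.
merged = pd.merge_asof(trades, quotes, on="time", by=["ticker", "exch"])
print(merged)
```

Note that the second trade (MSFT on ARCA) ignores the ARCA quote at .045 because that quote arrives after the trade.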
|
Does the 'by' case work (and is just somewhat slower)?
|
@jreback No, the |
|
Yes, it can take tuples. Normally that is not a good idea, but it might work here. Mainly I would like it to raise if something is nonsensical (ATM). If it works but is slow, that is fine as well.
|
It raises an error now, yes.
|
|
@chrisaycock Perfect. Then good to go for now.
chrisaycock
referenced
this issue
Dec 1, 2016
Closed
ENH: merge_asof() has type specializations and can take multiple 'by' parameters (#13936) #14783
jreback
modified the milestone: 0.19.2, Next Major Release
Dec 15, 2016
jreback
closed this
in e7df751
Dec 16, 2016
ischurov
added a commit
to ischurov/pandas
that referenced
this issue
Dec 19, 2016
|
|
+ ischurov |
61bd74a
|
jorisvandenbossche
added a commit
to jorisvandenbossche/pandas
that referenced
this issue
Dec 24, 2016
|
|
+ jorisvandenbossche |
c520b25
|
yarikoptic
added a commit
to neurodebian/pandas
that referenced
this issue
Jan 3, 2017
|
|
yarikoptic |
5f6a820
|
yarikoptic
added a commit
to neurodebian/pandas
that referenced
this issue
Jan 3, 2017
|
|
yarikoptic |
f3a4da0
|
chrisaycock commented Aug 8, 2016 (edited)
pd.merge_asof() can take an integer or floating-point number in the 'on' parameter, and it can take an integer or an object in the 'by' parameter. Specifically, the user's types are promoted to int64_t, double, or object as needed. That means, for example, that an int32_t is permitted, but we'll have to create a copy of the user's column to promote to int64_t.
This raises the question of whether we should add type specializations for every integer and floating-point type for better performance.
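To make the promotion cost concrete, here is a minimal sketch with hypothetical data (the column names t, left_val, and right_val are made up for illustration). It only shows that converting an int32 key to int64 allocates a new buffer, and that the join result is the same for either width; it does not show how pandas performs the promotion internally.

```python
import numpy as np
import pandas as pd

# Hypothetical frames whose sorted join key is stored as int32.
left = pd.DataFrame({
    "t": np.array([1, 5, 10], dtype=np.int32),
    "left_val": ["a", "b", "c"],
})
right = pd.DataFrame({
    "t": np.array([1, 2, 3, 6, 7], dtype=np.int32),
    "right_val": [1, 2, 3, 6, 7],
})

# Widening int32 to int64 cannot reuse the original buffer,
# so the conversion produces a separate copy of the column.
original = left["t"].to_numpy()
promoted = original.astype(np.int64)
made_copy = not np.shares_memory(original, promoted)

# The as-of join gives the same matches for either key width.
result32 = pd.merge_asof(left, right, on="t")
result64 = pd.merge_asof(
    left.astype({"t": np.int64}),
    right.astype({"t": np.int64}),
    on="t",
)
print(made_copy)
print(result32)
```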
A second issue to consider is that only one column is permitted in the 'by' parameter. But the user may wish to match on both ticker symbol and stock exchange, for example. To allow for this, the implementation logic would need to allow arrays of objects/integers for our grouping.
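One way to picture grouping on several columns is to treat the tuple of 'by' values as a single composite key. The pure-Python sketch below illustrates that idea only; it is not the pandas implementation. The function name asof_join and the row layout (lists of dicts sorted by the 'on' field) are assumptions made for the example.

```python
from bisect import bisect_right
from collections import defaultdict


def asof_join(left, right, on, by):
    """Backward as-of join of two lists of dicts, each sorted by `on`.

    Rows are grouped by the tuple of their `by` values, so any number
    of grouping columns is handled the same way as one.
    """
    # Bucket the right-hand rows by composite key; sort order is kept.
    groups = defaultdict(lambda: ([], []))
    for row in right:
        key = tuple(row[col] for col in by)
        keys, rows = groups[key]
        keys.append(row[on])
        rows.append(row)

    joined = []
    for row in left:
        key = tuple(row[col] for col in by)
        keys, rows = groups.get(key, ([], []))
        # Last right-hand row in this group with on-value <= the left one.
        pos = bisect_right(keys, row[on])
        match = rows[pos - 1] if pos else {}
        # Left values win for shared columns such as `on` and `by`.
        joined.append({**match, **row})
    return joined


# Hypothetical rows keyed by ticker symbol and stock exchange.
left = [
    {"t": 3, "ticker": "MSFT", "exch": "ARCA", "price": 1.0},
    {"t": 5, "ticker": "MSFT", "exch": "NASDAQ", "price": 2.0},
]
right = [
    {"t": 1, "ticker": "MSFT", "exch": "ARCA", "bid": 0.9},
    {"t": 2, "ticker": "MSFT", "exch": "NASDAQ", "bid": 0.8},
    {"t": 4, "ticker": "MSFT", "exch": "ARCA", "bid": 0.7},
]
joined = asof_join(left, right, on="t", by=["ticker", "exch"])
print(joined)
```

Tuples keep the sketch short, which is in line with the remark in this thread that tuples can work but are normally not a good idea for performance.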