ENH: column label filtering via regexes to work for numeric names #10384

Closed
wants to merge 18 commits into
from

2 participants
Contributor

cyrusmaher commented Jun 18, 2015

Simple fix to allow regex filtering to work for numeric column labels, e.g. df.filter(regex="[12][34]")

closes #10506
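
To illustrate the idea behind the fix, here is a minimal sketch in plain Python, not the actual pandas code. The helper name `filter_labels` is hypothetical; the only point is that converting each label with `str()` before calling `search` lets integer labels take part in the match.

```python
import re


def filter_labels(labels, regex):
    """Return the labels whose string form matches ``regex``.

    Hypothetical helper for illustration only; not the pandas implementation.
    """
    matcher = re.compile(regex)
    # str(label) lets an integer label such as 13 be searched like "13".
    return [label for label in labels if matcher.search(str(label))]


labels = [13, 24, 5, "A1"]
selected = filter_labels(labels, "[12][34]")
print(selected)
```

Only 13 and 24 have a first digit in `[12]` followed by a digit in `[34]`, so those are the two labels kept.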

@cyrusmaher cyrusmaher Update generic.py
Simple fix to allow regex filtering to work for numeric column labels, e.g. df.filter(regex="[12][34]")
ac58777
Contributor

jreback commented Jun 18, 2015

can you add some tests?

jreback added the Indexing label Jun 18, 2015

jreback changed the title from Update generic.py to ENH: column label filtering via regexes to work for numeric names Jun 18, 2015

Contributor

cyrusmaher commented Jun 18, 2015

For search(x) -> search(str(x))?

Contributor

cyrusmaher commented Jul 3, 2015

Any advice on what to add or where? I don't see any existing tests for this function...

Contributor

jreback commented Jul 3, 2015

look in pandas/tests/test_frame for test_filter

jreback added this to the 0.17.0 milestone Jul 3, 2015

jreback added the API Design label Jul 3, 2015

Contributor

cyrusmaher commented Jul 3, 2015

Thanks Jeff! Added the test. Let me know what you think...

@jreback jreback commented on an outdated diff Jul 3, 2015

pandas/tests/test_frame.py
@@ -10755,7 +10755,11 @@ def test_filter(self):
df = DataFrame(0., index=[0, 1, 2], columns=[0, 1, '_A', '_B'])
filtered = df.filter(like='_')
self.assertEqual(len(filtered.columns), 2)
-
+
+ # regex with ints in column names
+ df = DataFrame(0., index=[0, 1, 2], columns=[0, 1, 'A1', 'B'])
@jreback

jreback Jul 3, 2015

Contributor

add the issue number as a comment (this PR number since no associated issue)

@jreback jreback commented on an outdated diff Jul 3, 2015

pandas/tests/test_frame.py
@@ -10755,7 +10755,11 @@ def test_filter(self):
df = DataFrame(0., index=[0, 1, 2], columns=[0, 1, '_A', '_B'])
filtered = df.filter(like='_')
self.assertEqual(len(filtered.columns), 2)
-
+
+ # regex with ints in column names
+ df = DataFrame(0., index=[0, 1, 2], columns=[0, 1, 'A1', 'B'])
+ filtered = df.filter(regex='^[0-9]+$')
+ self.assertEqual(len(filtered.columns), 2)
@jreback

jreback Jul 3, 2015

Contributor

do the test again with all number columns that are strings, e.g. ['0','1'...] (I think the results should be the same)

on the comparison do

expected = DataFrame(.....)
assert_frame_equal(filtered, expected)

IOW construct the expected manually
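
The expectation that integer labels and their string equivalents select the same columns can be sketched without pandas. The helper `matching_positions` is hypothetical and only mirrors the `search(str(x))` idea from this PR; the expected positions are written out by hand, in the spirit of constructing the expected result manually.

```python
import re


def matching_positions(labels, regex):
    """Return positions of labels whose string form matches ``regex``.

    Hypothetical helper for illustration only.
    """
    matcher = re.compile(regex)
    return [i for i, label in enumerate(labels) if matcher.search(str(label))]


int_labels = [0, 1, "A1", "B"]      # numeric labels stored as ints
str_labels = ["0", "1", "A1", "B"]  # the same labels stored as strings

# Expected positions constructed by hand rather than derived from the code.
expected = [0, 1]

from_ints = matching_positions(int_labels, "^[0-9]+$")
from_strs = matching_positions(str_labels, "^[0-9]+$")
assert from_ints == expected
assert from_strs == expected
```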

Contributor

jreback commented Jul 3, 2015

add a note in whatsnew/0.17.0. Put it in the Other Enhancements section

What would this do in 0.16.2 (if you passed the regex): not filter anything, or raise?

Contributor

cyrusmaher commented Jul 3, 2015

Done! In 0.16.2 re.search will raise if a column name is numeric...
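
The underlying behaviour can be reproduced with the standard library alone. This is a sketch of what `re` does with a non-string argument in current CPython, where the exception is a `TypeError`; it does not show the exact error surfaced by pandas 0.16.2, which is not confirmed here.

```python
import re

pattern = re.compile("^[0-9]+$")

try:
    # An integer label passed to search() without conversion.
    pattern.search(0)
except TypeError as exc:
    error = exc
else:
    error = None

print(type(error).__name__)

# Converting the label to a string first makes the search succeed.
converted = pattern.search(str(0))
print(converted.group())
```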

@jreback jreback commented on an outdated diff Jul 3, 2015

doc/source/whatsnew/v0.17.0.txt
@@ -26,7 +26,8 @@ New features
Other enhancements
^^^^^^^^^^^^^^^^^^
-
+- `regex` argument to DataFrame.filter now handles numeric column names instead of raising an exception.
@jreback

jreback Jul 3, 2015

Contributor

use double backticks here (and around DataFrame.filter)

@jreback

jreback Jul 3, 2015

Contributor

add the issue number (this PR number) onto the end (see how the other issues are done)

@jreback

jreback Jul 3, 2015

Contributor

say instead of raising ValueError

@jreback jreback commented on an outdated diff Jul 3, 2015

pandas/tests/test_frame.py
@@ -10755,6 +10755,16 @@ def test_filter(self):
df = DataFrame(0., index=[0, 1, 2], columns=[0, 1, '_A', '_B'])
filtered = df.filter(like='_')
self.assertEqual(len(filtered.columns), 2)
+
+ # regex with ints in column names
+ # from PR #10384
+ df = DataFrame(0., index=[0, 1, 2], columns=[0, 1, 'A1', 'B'])
+ filtered = df.filter(regex='^[0-9]+$')
+ self.assertEqual(len(filtered.columns), 2)
@jreback

jreback Jul 3, 2015

Contributor

use an assert_frame_equal here as well

you will need to explicitly construct the expected, e.g. something like

expected = DataFrame(0,index=[0,1,2],columns=['A1','B'])

also change the test a bit to put the numerics not all at the beginning (e.g. put one in the middle or end)
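
A sketch of what such a test might look like as a standalone script, assuming a recent pandas where `assert_frame_equal` is available from `pandas.testing`. The column layout and the `dtype=object` on the expected columns are assumptions made here (mixed int/str labels produce object-dtype columns, and the comparison checks the column dtype); this is not necessarily the test that was merged.

```python
import pandas as pd
from pandas.testing import assert_frame_equal

# Numeric labels are placed in the middle and at the end, not all up front.
df = pd.DataFrame(0.0, index=[0, 1, 2], columns=["A1", 1, "B", 2])

filtered = df.filter(regex="^[0-9]+$")

# Expected frame constructed explicitly. The columns are object dtype
# because they are taken from a mixed int/str column index (assumption
# stated in the lead-in).
expected = pd.DataFrame(
    0.0, index=[0, 1, 2], columns=pd.Index([1, 2], dtype=object)
)

assert_frame_equal(filtered, expected)
```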

Contributor

jreback commented Jul 3, 2015

when you are all done, pls rebase/squash see contributing docs here

cyrusmaher added some commits Jun 18, 2015

@cyrusmaher @cyrusmaher cyrusmaher Fix regex filter for numeric columns
Simple fix to allow regex filtering to work for numeric column labels, e.g. df.filter(regex="[12][34]")

Add test for regex filter on numeric column names

Add release note

Add second regex test
12d79e7
@cyrusmaher cyrusmaher Merge branch 'patch-1' of https://github.com/cyrusmaher/pandas into p…
…atch-1
ccc7490
@cyrusmaher cyrusmaher Update docs, test 009422c
@cyrusmaher cyrusmaher Fix merge conflict b46133f
Contributor

cyrusmaher commented Jul 3, 2015

I'm having trouble with squashing the commits. I don't have a ton of experience with git, so I'm not sure what to do next. Below is the message. Seems to have to do with a merge conflict in test_frame? Any advice?

error: could not apply ac90352... Add test for regex filter on numeric column names

When you have resolved this problem, run "git rebase --continue".
If you prefer to skip this patch, run "git rebase --skip" instead.
To check out the original branch and stop rebasing, run "git rebase --abort".
Contributor

jreback commented Jul 3, 2015

contributing docs are here: http://pandas.pydata.org/pandas-docs/stable/contributing.html

you have a conflict and need to fix it

cyrusmaher added some commits Jun 18, 2015

@cyrusmaher @cyrusmaher cyrusmaher # This is a combination of 2 commits.
# The first commit's message is:

Fix regex filter for numeric columns

Simple fix to allow regex filtering to work for numeric column labels, e.g. df.filter(regex="[12][34]")

Add test for regex filter on numeric column names

Add release note

Add second regex test

# This is the 2nd commit message:

Update generic.py

Simple fix to allow regex filtering to work for numeric column labels, e.g. df.filter(regex="[12][34]")
5bd0a4b
@cyrusmaher cyrusmaher Fix merge conflict 88a8e3e
@cyrusmaher cyrusmaher Fix merge conflict? 49b607f
@cyrusmaher @cyrusmaher cyrusmaher Update generic.py
Simple fix to allow regex filtering to work for numeric column labels, e.g. df.filter(regex="[12][34]")
2a9ddd1
@cyrusmaher @cyrusmaher cyrusmaher Add test for regex filter on numeric column names 0d3af4c
@cyrusmaher cyrusmaher Add release note 94626cc
@cyrusmaher cyrusmaher Add second regex test 3bb6d05
@cyrusmaher cyrusmaher Update docs, test 86d523a
@cyrusmaher cyrusmaher Fix merge conflict f562f7f
@cyrusmaher cyrusmaher Maybe this merge fix worked d9c4523
Contributor

cyrusmaher commented Jul 3, 2015

Hmm, when I rebase it detects conflicts, then I resolve them using git mergetool, and commit. Doesn't seem to change anything. When I run git merge master I get that everything is up-to-date. I'm probably missing something simple?

Contributor

jreback commented Jul 5, 2015

FYI, you don't normally need to add an issue if you just create a PR (like you did), but no biggie.

Contributor

jreback commented Jul 5, 2015

I rebased you: https://travis-ci.org/jreback/pandas/builds/69631109

FYI, don't use git merge master. This is not standard practice in pandas, and it makes rebasing much more difficult.

Contributor

jreback commented Jul 6, 2015

merged via bfe5a7f

thanks!

jreback closed this Jul 6, 2015

cyrusmaher deleted the cyrusmaher:patch-1 branch Jul 7, 2015
