ENH: Treat 'Inf' as infinity in text parser #4220

jtratner · 2013-07-12T00:54:27Z

Makes 'inf' checks case insensitive using strcasecmp
Fixes #4219.

jreback · 2013-07-12T00:57:56Z

was using strcasecmp slower? (prob not a big deal either way)

jtratner · 2013-07-12T01:25:52Z

No. I'm doing that instead. Just found a few other things that needed to be fixed (plus means test cases need to be larger).

Is TestPythonParser supposed to call the parser functions in parsers.pyx?

jreback · 2013-07-12T01:29:04Z

some of python parser functions are shared

but not sure (your tests are run thru both parsers though so u will know if they work or not)

jtratner · 2013-07-12T01:30:31Z

Okay, I guess I thought engine='python' would set it to pure python only. Not a big deal to me, just wanted to make sure (thought it could have been a bug or something)

jtratner · 2013-07-12T01:34:25Z

And here's a perf test between various options: http://nbviewer.ipython.org/5980552

jreback · 2013-07-12T01:37:49Z

@jtratner you are very thorough!

and you said u were busy this week :)

jtratner · 2013-07-12T01:41:08Z

haha ... didn't take long to do so I figured I'd crank it out.

This means that any text that converts to 'inf' when lower cased is considered infinity [though only when attempting to convert column to float/double]. Is that what we want to do?

(e.g., "-iNF", "inf", "inF" all would become versions of +/- infinity)

jreback · 2013-07-12T01:45:28Z

yes I think any of those variations are fine

it might be easier/faster to make a hash table of all of the combinations and just do a lookup
(this is how na values r done) - but only if u r interested in trying it out - prob not necessary

jtratner · 2013-07-12T01:59:58Z

hmm...I'll think about it. Could probably just reuse the function that creates the hashtable for na_values then. But given that right now it's a 25ish character change in the actual code (import strcasecmp, strcmp-->strcasecmp), this is pretty simple. [just realized I left in the extra (and now unnecessary inf values)]

jreback · 2013-07-12T02:13:19Z

yep looks good to go

jtratner · 2013-07-12T09:21:39Z

If you want doc update on this, I can add Monday.

On Thu, Jul 11, 2013 at 10:13 PM, jreback notifications@github.com wrote:

yep looks good to go

—
Reply to this email directly or view it on GitHubhttps://github.com//pull/4220#issuecomment-20854752
.

jreback · 2013-07-12T12:50:36Z

hold off on release notes till after 0.12...have to add the new file

jtratner · 2013-07-12T12:57:49Z

Will do.

jtratner · 2013-07-12T12:58:26Z

Btw - when are you pulling the trigger on 0.12? :)

jreback · 2013-07-12T13:00:04Z

just a couple minor issues with 0.12rc1...then should be out

Instead of just 'inf' or '-inf', can test for 'Inf', '-Inf', 'INF', etc. Uses strcasecmp under the hood. (also, small fix to assert_almost_equal to make string comparisons clearer)

jtratner · 2013-07-26T02:26:42Z

@jreback This should be ready to go (just waitin' on Travis).

jreback · 2013-07-26T02:28:46Z

should 'infinity' be there too?

jtratner · 2013-07-26T02:36:37Z

I'd say no, because that's bordering more on things that could be string-like (not 100% sure: are inf checks are done before dtype decision?). If we did that, would need to switch to hashtable format, which I could do. your call on how you want to do it.

jreback · 2013-07-26T02:41:07Z

right. so forget that
looks good what u have

can u just do a quick perf test
just run vs 0.12 vbenchs
you can use -r regex to only run some
I some in the io_brench

just to make sure it doesn't slow it down much (I think it'll be fine though)

jtratner · 2013-07-26T03:06:36Z

@jreback is there a way to collect vbench results into a csv?

I find the my results can vary a lot and being able to combine multiple would help smooth it out.

jtratner · 2013-07-26T03:19:29Z

Here's perf results - ran it twice (wasn't sure what regex to use):
https://gist.github.com/jtratner/cb114164d3bdce2e4dfa

That said, reading-related items don't seem to be particularly affected (or at least are not negative).

cpcloud · 2013-07-26T04:18:09Z

Make sure you're not doing literally anything else when you run test perf. I sometimes forget that and I get "bad" results. Ratio > 1.15 is usually something sketchy but only if you weren't running anything else.

cpcloud · 2013-07-26T04:19:08Z

Well anything like building docs or compiling stuff. Surfing the web is OK.

jtratner · 2013-07-26T10:01:20Z

Here are just the "read_*" benches:

*** Processing results...
***
Invoked with :
--ncalls: 3
--repeats: 3


-------------------------------------------------------------------------------
Test name                                    | head[ms] | base[ms] |  ratio   |
-------------------------------------------------------------------------------
read_store_table                             |   3.3030 |   4.4127 |   0.7485 |
read_csv_vb                                  |  29.5163 |  37.5853 |   0.7853 |
read_csv_comment2                            |  29.8669 |  31.6617 |   0.9433 |
read_table_multiple_date                     | 295.3293 | 298.5720 |   0.9891 |
read_store                                   |   2.6910 |   2.6627 |   1.0107 |
read_table_multiple_date_baseline            | 137.8644 | 135.1160 |   1.0203 |
read_store_table_mixed                       |   6.5330 |   6.3693 |   1.0257 |
read_parse_dates_iso8601                     |   1.7890 |   1.7393 |   1.0286 |
read_store_mixed                             |   5.8227 |   5.6427 |   1.0319 |
read_csv_thou_vb                             |  43.4330 |  41.5463 |   1.0454 |
read_store_table_panel                       |  42.0457 |  40.0033 |   1.0511 |
read_store_table_wide                        |  21.0853 |  19.6953 |   1.0706 |
read_csv_standard                            |  18.4487 |  15.5730 |   1.1847 |
-------------------------------------------------------------------------------
Test name                                    | head[ms] | base[ms] |  ratio   |
-------------------------------------------------------------------------------

Ratio < 1.0 means the target commit is faster then the baseline.
Seed used: 1234

Target [3080721] : ENH: Use case-insensitive checks for 'inf' in parser.
Base   [6c8fca9] : Merge pull request #4282 from cpcloud/add-clipboard-test

jtratner · 2013-07-26T10:10:41Z

I just ran the bench against a bogus commit (literally adding whitespace to ez_setup.py) and I get about the same differences just by random chance, so I'm not convinced that these results are particularly meaningful.

jtratner · 2013-07-26T10:11:20Z

e.g. here's against a basically blank commit:

-------------------------------------------------------------------------------
Test name                                    | head[ms] | base[ms] |  ratio   |
-------------------------------------------------------------------------------
read_store_table_mixed                       |   6.5234 |   6.8897 |   0.9468 |
read_store_table_panel                       |  40.2050 |  41.2770 |   0.9740 |
read_store                                   |   2.7483 |   2.7854 |   0.9867 |
read_csv_standard                            |  15.7667 |  15.9566 |   0.9881 |
read_csv_comment2                            |  32.8847 |  33.2080 |   0.9903 |
read_store_table                             |   3.2610 |   3.2636 |   0.9992 |
read_store_mixed                             |   5.7760 |   5.6917 |   1.0148 |
read_csv_thou_vb                             |  43.5576 |  42.8537 |   1.0164 |
read_parse_dates_iso8601                     |   1.7897 |   1.7587 |   1.0177 |
read_store_table_wide                        |  20.0947 |  19.7203 |   1.0190 |
read_table_multiple_date_baseline            | 141.1477 | 137.1057 |   1.0295 |
read_table_multiple_date                     | 317.5010 | 296.3533 |   1.0714 |
read_csv_vb                                  |  34.2953 |  29.2070 |   1.1742 |
-------------------------------------------------------------------------------
Test name                                    | head[ms] | base[ms] |  ratio   |
-------------------------------------------------------------------------------

jreback · 2013-07-26T10:43:49Z

perf testing is actually quite tricky as many things can affect it from one run to another - just looking to make sure that the change which we think is a non event really is

so looks good then

ENH: Treat 'Inf' as infinity in text parser

jreback · 2013-07-26T12:03:44Z

thank you sir!

ENH: Use case-insensitive checks for 'inf' in parser.

3080721

Instead of just 'inf' or '-inf', can test for 'Inf', '-Inf', 'INF', etc. Uses strcasecmp under the hood. (also, small fix to assert_almost_equal to make string comparisons clearer)

jreback added a commit that referenced this pull request Jul 26, 2013

Merge pull request #4220 from jtratner/expand-inf-comparisons

562b4bd

ENH: Treat 'Inf' as infinity in text parser

jreback merged commit 562b4bd into pandas-dev:master Jul 26, 2013

jreback mentioned this pull request Jul 26, 2013

BLD: fix inf processing in read_csv #4373

Closed

jtratner deleted the expand-inf-comparisons branch September 21, 2013 20:10

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ENH: Treat 'Inf' as infinity in text parser #4220

ENH: Treat 'Inf' as infinity in text parser #4220

jtratner commented Jul 12, 2013

jreback commented Jul 12, 2013

jtratner commented Jul 12, 2013

jreback commented Jul 12, 2013

jtratner commented Jul 12, 2013

jtratner commented Jul 12, 2013

jreback commented Jul 12, 2013

jtratner commented Jul 12, 2013

jreback commented Jul 12, 2013

jtratner commented Jul 12, 2013

jreback commented Jul 12, 2013

jtratner commented Jul 12, 2013

jreback commented Jul 12, 2013

jtratner commented Jul 12, 2013

jtratner commented Jul 12, 2013

jreback commented Jul 12, 2013

jtratner commented Jul 26, 2013

jreback commented Jul 26, 2013

jtratner commented Jul 26, 2013

jreback commented Jul 26, 2013

jtratner commented Jul 26, 2013

jtratner commented Jul 26, 2013

cpcloud commented Jul 26, 2013

cpcloud commented Jul 26, 2013

jtratner commented Jul 26, 2013

jtratner commented Jul 26, 2013

jtratner commented Jul 26, 2013

jreback commented Jul 26, 2013

jreback commented Jul 26, 2013

ENH: Treat 'Inf' as infinity in text parser #4220

ENH: Treat 'Inf' as infinity in text parser #4220

Conversation

jtratner commented Jul 12, 2013

jreback commented Jul 12, 2013

jtratner commented Jul 12, 2013

jreback commented Jul 12, 2013

jtratner commented Jul 12, 2013

jtratner commented Jul 12, 2013

jreback commented Jul 12, 2013

jtratner commented Jul 12, 2013

jreback commented Jul 12, 2013

jtratner commented Jul 12, 2013

jreback commented Jul 12, 2013

jtratner commented Jul 12, 2013

jreback commented Jul 12, 2013

jtratner commented Jul 12, 2013

jtratner commented Jul 12, 2013

jreback commented Jul 12, 2013

jtratner commented Jul 26, 2013

jreback commented Jul 26, 2013

jtratner commented Jul 26, 2013

jreback commented Jul 26, 2013

jtratner commented Jul 26, 2013

jtratner commented Jul 26, 2013

cpcloud commented Jul 26, 2013

cpcloud commented Jul 26, 2013

jtratner commented Jul 26, 2013

jtratner commented Jul 26, 2013

jtratner commented Jul 26, 2013

jreback commented Jul 26, 2013

jreback commented Jul 26, 2013