PERF: Removed the GIL from parts of the TextReader class #11272

Merged
jreback merged 1 commit into pandas-dev:master from jdeschenes:nogil_csv on Nov 4, 2015

@jdeschenes
Contributor

jdeschenes commented Oct 9, 2015

The GIL is now released around the tokenizer functions and the conversion function (_string_convert excluded).

Benchmark:

Data Generation:

import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randn(1000000,10))
df.to_csv('test.csv')

Benchmark Code:

import pandas as pd
from pandas.util.testing import test_parallel

def f():
    for i in range(4):
        pd.read_csv('test.csv', index_col=0)

@test_parallel(4)
def g():
    pd.read_csv('test.csv', index_col=0)
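
For readers without access to `test_parallel`, the following is a minimal sketch of what such a decorator does: it runs the decorated function once in each of N threads and waits for all of them to finish. The name `run_in_threads` is hypothetical and this is not pandas' implementation; it only illustrates why `g()` above does the same total work as `f()` but can overlap it across threads once the GIL is released.

```python
import threading
from functools import wraps


def run_in_threads(num_threads):
    """Hypothetical stand-in for a test_parallel-style decorator.

    Each call to the decorated function starts `num_threads` threads,
    runs the function once in each, and blocks until all have finished.
    """
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            threads = [
                threading.Thread(target=func, args=args, kwargs=kwargs)
                for _ in range(num_threads)
            ]
            for t in threads:
                t.start()
            for t in threads:
                t.join()  # wait for every worker before returning
        return wrapper
    return decorator
```

Under this sketch, decorating a function that calls `pd.read_csv` with `@run_in_threads(4)` would issue four concurrent reads, which only run in parallel for the portions of the parser that release the GIL.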

Before:

In [4]: %timeit pd.read_csv('test.csv', index_col=0)
1 loops, best of 3: 2.3 s per loop

In [7]: %timeit f()
1 loops, best of 3: 9.15 s per loop

In [8]: %timeit g()
1 loops, best of 3: 9.25 s per loop

After:

In [6]: %timeit pd.read_csv('test.csv', index_col=0)
1 loops, best of 3: 2.35 s per loop

In [9]: %timeit f()
1 loops, best of 3: 9.55 s per loop

In [10]: %timeit g()
1 loops, best of 3: 4.38 s per loop
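
As a rough reading of these numbers, the sketch below computes the ratios implied by the timings above. The dictionary keys and the `speedup` helper are illustrative names, not part of the PR; the values are copied from the `%timeit` output.

```python
# Timings in seconds, copied from the %timeit output above.
before = {"single": 2.3, "serial_x4": 9.15, "threads_x4": 9.25}
after = {"single": 2.35, "serial_x4": 9.55, "threads_x4": 4.38}


def speedup(baseline, improved):
    """Return how many times faster `improved` is than `baseline`."""
    return baseline / improved


# Four threaded reads, before vs. after the change.
threaded_gain = speedup(before["threads_x4"], after["threads_x4"])

# After the change: four threaded reads vs. four serial reads.
vs_serial = speedup(after["serial_x4"], after["threads_x4"])

# Single-threaded cost of the change (values above 1.0 mean slower).
single_ratio = after["single"] / before["single"]

print(f"threaded before/after: {threaded_gain:.2f}x")
print(f"serial/threaded (after): {vs_serial:.2f}x")
print(f"single read after/before: {single_ratio:.3f}")
```

On these figures the four-thread case is roughly twice as fast as before, while a single read is within a few percent of its previous time.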

@jreback jreback added this to the 0.17.1 milestone Oct 9, 2015

Contributor

jreback commented Oct 9, 2015

Some Cython compilation errors on Windows.

Odd that they don't show up for you. What platform are you testing on?

[pandas3.5] C:\Users\Jeff Reback\pandas3.5>git checkout -b jdeschenes-nogil_csv master
Switched to a new branch 'jdeschenes-nogil_csv'

[pandas3.5] C:\Users\Jeff Reback\pandas3.5>git pull https://github.com/jdeschenes/pandas.git nogil_csv
remote: Counting objects: 7, done.
remote: Compressing objects: 100% (7/7), done.
remote: Total 7 (delta 0), reused 0 (delta 0), pack-reused 0
Unpacking objects: 100% (7/7), done.
From https://github.com/jdeschenes/pandas
 * branch            nogil_csv  -> FETCH_HEAD
Updating 26db172..2da353e
Fast-forward
 asv_bench/benchmarks/gil.py |  23 ++++
 pandas/parser.pyx           | 269 +++++++++++++++++++++++++++++++-------------
 2 files changed, 213 insertions(+), 79 deletions(-)

[pandas3.5] C:\Users\Jeff Reback\pandas3.5>make
python setup.py build_ext --inplace
make: python: Command not found
Makefile:2: recipe for target `tseries' failed
make: *** [tseries] Error 127

[pandas3.5] C:\Users\Jeff Reback\pandas3.5>python setup.py build_ext --inplace
running build_ext
skipping 'pandas\index.c' Cython extension (up-to-date)
skipping 'pandas\tslib.c' Cython extension (up-to-date)
skipping 'pandas\hashtable.c' Cython extension (up-to-date)
skipping 'pandas\algos.c' Cython extension (up-to-date)
cythoning pandas\parser.pyx to pandas\parser.c

Error compiling Cython file:
------------------------------------------------------------
...
            # in the hash table
            if k != na_hashset.n_buckets:
                na_count[0] += 1
                data[0] = NA
            else:
                data[0] = parser.converter(word, &p_end, parser.decimal, parser.sci,
                                         ^
------------------------------------------------------------

pandas\parser.pyx:1538:42: Calling gil-requiring function not allowed without gil

Error compiling Cython file:
------------------------------------------------------------
...
                        data[0] = NA
            data += 1
    else:
        for i in range(lines):
            COLITER_NEXT(it, word)
            data[0] = parser.converter(word, &p_end, parser.decimal, parser.sci,
                                     ^
------------------------------------------------------------

pandas\parser.pyx:1557:38: Calling gil-requiring function not allowed without gil

Error compiling Cython file:
------------------------------------------------------------
...
                na_count[0] += 1
                data[0] = NA
                data += 1
                continue

            error = to_boolean(word, data)
                             ^
------------------------------------------------------------

pandas\parser.pyx:1680:30: Calling gil-requiring function not allowed without gil

Error compiling Cython file:
------------------------------------------------------------
...
            data += 1
    else:
        for i in range(lines):
            COLITER_NEXT(it, word)

            error = to_boolean(word, data)
                             ^
------------------------------------------------------------

pandas\parser.pyx:1688:30: Calling gil-requiring function not allowed without gil

Error compiling Cython file:
------------------------------------------------------------
...
            if k != false_hashset.n_buckets:
                data[0] = 0
                data += 1
                continue

            error = to_boolean(word, data)
                             ^
------------------------------------------------------------

pandas\parser.pyx:1758:30: Calling gil-requiring function not allowed without gil

Error compiling Cython file:
------------------------------------------------------------
...
            if k != false_hashset.n_buckets:
                data[0] = 0
                data += 1
                continue

            error = to_boolean(word, data)
                             ^
------------------------------------------------------------

pandas\parser.pyx:1778:30: Calling gil-requiring function not allowed without gil
building 'pandas.parser' extension
C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\BIN\amd64\cl.exe /c /nologo /Ox /W3 /GL /DNDEBUG /MD -Ipandas/src/klib -Ipandas/src -IC:\Miniconda\envs\pandas3.5\lib\site-packages\numpy\core\include -IC:\Miniconda\envs\pandas3.5\include -IC:\Miniconda\envs\pandas3.5\include "-IC:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\INCLUDE" "-IC:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\ATLMFC\INCLUDE" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.10240.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\NETFXSDK\4.6\include\um" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.10240.0\shared" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.10240.0\um" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.10240.0\winrt" /Tcpandas\parser.c /Fobuild\temp.win-amd64-3.5\Release\pandas\parser.obj
parser.c
pandas\parser.c(1): fatal error C1189: #error:  Do not use this file, it is the result of a failed Cython compilation.
error: command 'C:\\Program Files (x86)\\Microsoft Visual Studio 14.0\\VC\\BIN\\amd64\\cl.exe' failed with exit status 2

[pandas3.5] C:\Users\Jeff Reback\pandas3.5>

Contributor

jdeschenes commented Oct 9, 2015

I fixed a few issues with the build

Contributor

jreback commented Oct 9, 2015

It builds, but there are a couple of errors on Windows.

[pandas3.5] C:\Users\Jeff Reback\pandas3.5>nosetests -A "not network" pandas\io\tests\test_parsers.py
...........S.........................................................................F......................................S....................S..........................S...........................
................................F........S..................................................................................................................................S...........................
..............................................
======================================================================
FAIL: test_parse_bools (pandas.io.tests.test_parsers.TestCParserHighMemory)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "C:\Users\Jeff Reback\pandas3.5\pandas\io\tests\test_parsers.py", line 1177, in test_parse_bools
    self.assertEqual(data['A'].dtype, np.bool_)
AssertionError: dtype('O') != <class 'numpy.bool_'>

======================================================================
FAIL: test_parse_bools (pandas.io.tests.test_parsers.TestCParserLowMemory)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "C:\Users\Jeff Reback\pandas3.5\pandas\io\tests\test_parsers.py", line 1177, in test_parse_bools
    self.assertEqual(data['A'].dtype, np.bool_)
AssertionError: dtype('O') != <class 'numpy.bool_'>

----------------------------------------------------------------------
Ran 446 tests in 12.620s

FAILED (SKIP=6, failures=2)

Contributor

jdeschenes commented Oct 9, 2015

This was an issue with Python 3 and not limited to Windows.

Contributor

jreback commented Oct 13, 2015

@jdeschenes

  • couple of questions for you above.
  • can you post the asv results here.
  • pls add a note in the performance section (use this PR number)
  • squash

Contributor

jdeschenes commented Oct 13, 2015

@jreback

Where do I need to add information in the performance section? Is it in the what's new file in the documentation?

Contributor

jreback commented Oct 13, 2015

whatsnew/v0.17.1 (Performance section)

Contributor

jreback commented Oct 25, 2015

@jdeschenes looks good. can you

  • add a note in whatsnew (performance section)
  • rebase
  • squash

Contributor

jreback commented Nov 2, 2015

can you update according to comments

Contributor

mrocklin commented Nov 3, 2015

What's the status on this @jdeschenes? I'd like to include this work in a talk happening tomorrow. It'd be awesome to be able to say that this was in master rather than in a branch.

Contributor

jreback commented Nov 3, 2015

you can say slated for 0.17.1 :)

Contributor

jdeschenes commented Nov 3, 2015

I will get the final changes tomorrow.

Contributor

jdeschenes commented Nov 4, 2015

@jreback, The changes have been implemented. Let me know if there is anything else that needs to be done.

Contributor

jreback commented Nov 4, 2015

@jdeschenes thanks, just some small comments. ping when pushed. pls also post a short benchmark in the top of the PR (you can just run before/after in ipython via timeit if you want), mainly for posterity.

PERF: Released the GIL from parts of the TextReader class
The GIL was released around the tokenizer functions and the conversion function (_string_convert excluded).

Contributor

jdeschenes commented Nov 4, 2015

@jreback: Added the benchmarks.

Contributor

jreback commented Nov 4, 2015

looks good

ping when green

Contributor

mrocklin commented Nov 4, 2015

Ping

jreback added a commit that referenced this pull request Nov 4, 2015

Merge pull request #11272 from jdeschenes/nogil_csv
PERF: Removed the GIL from parts of the TextReader class

@jreback jreback merged commit 774411c into pandas-dev:master Nov 4, 2015

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed

Contributor

jreback commented Nov 4, 2015

thanks @jdeschenes

and @mrocklin for the pings!

@jdeschenes jdeschenes deleted the jdeschenes:nogil_csv branch Nov 4, 2015

khs26 added a commit to khs26/pandas that referenced this pull request Nov 6, 2015

Merge pull request #1 from pydata/master
Merge pull request pandas-dev#11272 from jdeschenes/nogil_csv