PERF: Removed the GIL from parts of the TextReader class #11272
jreback added the Performance and CSV labels on Oct 9, 2015
jreback added this to the 0.17.1 milestone on Oct 9, 2015
jreback
commented on an outdated diff
Oct 9, 2015
@@ -320,3 +325,21 @@ def time_nogil_kth_smallest(self):
         def run(arr):
             algos.kth_smallest(arr, self.k)
         run()
+
+
+class nogil_read_csv(object):
+    number = 1
+    repeat = 5
+
+    def setup(self):
+        if (not have_real_test_parallel):
+            raise NotImplementedError
+        # Using the values
+        self.data = '0.1213700904466425978256438611,0.0525708283766902484401839501,0.4174092731488769913994474336\n 0.4096341697147408700274695547,0.1587830198973579909349496119,0.1292545832485494372576795285\n 0.8323255650024565799327547210,0.9694902427379478160318626578,0.6295047811546814475747169126\n 0.4679375305798131323697930383,0.2963942381834381301075609371,0.5268936082160610157032465394\n 0.6685382761849776311890991564,0.6721207066140679753374342908,0.6519975277021627935170045020\n '
jreback
Contributor
|
jreback
commented on an outdated diff
Oct 9, 2015
some windows cythoning errors. odd they don't show up for you, what platform are you testing on?

I fixed a few issues with the build.

builds, couple of errors on windows

This was an issue with python3 and not limited to windows.
jreback
and 1 other
commented on an outdated diff
Oct 9, 2015
+        self.df.to_csv('__test__.csv')
+
+        self.rng = date_range('1/1/2000', periods=10000)
+        self.df_date_time = DataFrame(np.random.randn(10000, 10), index=self.rng)
+        self.df_date_time.to_csv('__test_datetime__.csv')
+
+        self.df_object = DataFrame('foo', index=self.df.index, columns=self.create_cols('object'))
+        self.df_object.to_csv('__test_object__.csv')
+
+    def create_cols(self, name):
+        return [('%s%03d' % (name, i)) for i in range(5)]
+
+    def time_nogil_read_csv(self):
+        @test_parallel(num_threads=2)
+        def run(arr):
+            read_csv('__test__.csv', sep=',', header=None, float_precision=None)
jreback
Contributor
|
jreback
and 1 other
commented on an outdated diff
Oct 9, 2015
@@ -1452,6 +1450,20 @@ cdef _to_fw_string(parser_t *parser, int col, int line_start,
     result = np.empty(line_end - line_start, dtype='|S%d' % width)
     data = <char*> result.data
+    with nogil:
+        error = _to_fw_string_internal(parser, col, line_start, line_end, width, data)
+    if error != 0:
+        raise CParserError("Unknown error raised")
jdeschenes
Contributor
Where do I need to add information in the performance section? Is it in the what's new file in the documentation?

whatsnew/v0.17.1 (Performance section)

@jdeschenes looks good. can you

can you update according to comments
mrocklin
commented
Nov 3, 2015
What's the status on this @jdeschenes? I'd like to include this work in a talk happening tomorrow. It'd be awesome to be able to say that this was in master rather than in a branch.

you can say slated for 0.17.1 :)

I will get the final changes tomorrow.

@jreback, The changes have been implemented. Let me know if there is anything else that needs to be done.
jreback
commented on an outdated diff
Nov 4, 2015
jreback
commented on an outdated diff
Nov 4, 2015
@@ -60,6 +60,8 @@ Performance Improvements
 - Release the GIL on most datetime field operations (e.g. ``DatetimeIndex.year``, ``Series.dt.year``), normalization, and conversion to and from ``Period``, ``DatetimeIndex.to_period`` and ``PeriodIndex.to_timestamp`` (:issue:`11263`)
 - Release the GIL on some srolling algos (``rolling_median``, ``rolling_mean``, ``rolling_max``, ``rolling_min``, ``rolling_var``, ``rolling_kurt``, `rolling_skew`` (:issue:`11450`)
+- Release the GIL when reading a file using the TextReader class(``read_csv``, ``read_table``).
+  The GIL is now released around the tokenizer functions and the conversion functions(_string_convert excluded).
jreback
Contributor
|
jreback
commented on an outdated diff
Nov 4, 2015
     lines = line_end - line_start
     result = np.empty(lines, dtype=np.float64)
     data = <double *> result.data
+    na_fset = kset_float64_from_list(na_flist)
+    with nogil:
+        error = _try_double_internal(parser, col, line_start, line_end,
+                                     na_filter, na_hashset, use_na_flist, na_fset, NA, data, &na_count)
@jdeschenes thanks, just some small comments. ping when pushed. pls also post a short benchmark in the top of the PR (you can just run before/after in ipython via timeit if you want), mainly for posterity.

@jreback: Added the benchmarks.

looks good ping when green
mrocklin
commented
Nov 4, 2015
Ping
jreback added a commit (774411c) that referenced this pull request on Nov 4, 2015
jreback merged commit 774411c into pandas-dev:master on Nov 4, 2015 (1 check passed)
thanks @jdeschenes and @mrocklin for the pings!
jdeschenes deleted the jdeschenes:nogil_csv branch on Nov 4, 2015
khs26 added a commit (fd4fc5f) to khs26/pandas that referenced this pull request on Nov 6, 2015
jdeschenes commented Oct 9, 2015
The GIL is now released around the tokenizer functions and the conversion functions (_string_convert excluded).
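The benchmark added in this PR calls `read_csv` from two threads through a `test_parallel(num_threads=2)` decorator. As a rough illustration of that pattern only, here is a minimal standard-library sketch. `run_in_threads` and `parse` are hypothetical names invented for this example; they are not the pandas helper or parser, and the real decorator may behave differently.

```python
import threading
from functools import wraps


def run_in_threads(num_threads=2):
    """Hypothetical stand-in for a test_parallel-style decorator.

    Calls the wrapped function once per thread with the same arguments
    and returns the per-thread results after all threads have finished.
    Exceptions raised inside a worker thread are not re-raised here.
    """
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            results = [None] * num_threads

            def worker(index):
                results[index] = func(*args, **kwargs)

            threads = [threading.Thread(target=worker, args=(i,))
                       for i in range(num_threads)]
            for thread in threads:
                thread.start()
            for thread in threads:
                thread.join()
            return results
        return wrapper
    return decorator


@run_in_threads(num_threads=2)
def parse(text):
    # Stand-in workload (assumption): split CSV text into rows of floats.
    # Pure-Python work like this still holds the GIL; only code that
    # releases it, such as the nogil sections in this PR, can overlap.
    return [[float(x) for x in line.split(',')]
            for line in text.splitlines()]


rows_per_thread = parse("0.1,0.2,0.3\n0.4,0.5,0.6\n")
```

Timing the same call with one thread and with two, before and after the change, is one way to see whether the released sections actually overlap.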
Benchmark:
Data Generation:
Benchmark Code:
Before:
After: