C extension with SWIG (counting bytes, not characters) #7

Merged
merged 1 commit into from Nov 17, 2017

Owner

rochacbruno commented Nov 17, 2017

by @martinxyz

Note: the comparison is not really fair because the C extension is comparing
bytes, while Python and Rust are comparing UTF-8 characters.

------------------------------------------------------------------------------------- benchmark: 7 tests ------------------------------------------------------------------------------------
Name (time in us)                 Min                    Max                   Mean              StdDev                 Median                 IQR            Outliers(*)  Rounds  Iterations
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_c_bytes_once            962.7170 (1.0)       1,820.2346 (1.0)         989.9444 (1.0)       51.8415 (1.0)         982.5421 (1.0)       11.8364 (1.18)           15;43     721           1
test_rust_once             1,079.2040 (1.12)      1,883.8202 (1.03)      1,101.9418 (1.11)      85.7826 (1.65)      1,087.5929 (1.11)       9.9953 (1.0)            18;67     879           1
test_rust                  2,806.8721 (2.92)      6,881.0866 (3.78)      2,964.6178 (2.99)     360.4035 (6.95)      2,872.8019 (2.92)     183.9128 (18.40)          13;13     339           1
test_regex                28,782.4213 (29.90)    32,900.5020 (18.07)    29,226.6153 (29.52)    705.3509 (13.61)    29,092.1649 (29.61)    194.7829 (19.49)            2;3      34           1
test_pure_python_once     41,820.9783 (43.44)    46,459.0848 (25.52)    42,251.5536 (42.68)    948.0384 (18.29)    42,014.0354 (42.76)    299.1488 (29.93)            1;2      23           1
test_pure_python          56,672.6997 (58.87)    58,843.5191 (32.33)    57,210.8221 (57.79)    588.5420 (11.35)    56,984.3869 (58.00)    250.2394 (25.04)            3;4      18           1
test_itertools            61,560.3020 (63.94)    62,581.4409 (34.38)    61,776.0723 (62.40)    285.4289 (5.51)     61,637.5552 (62.73)    315.8825 (31.60)            2;1      17           1
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
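
To illustrate why comparing bytes and comparing characters are not the same amount of work, here is a minimal Rust sketch. The helper name `units` is hypothetical and not part of this PR; it only shows that a UTF-8 string can contain more bytes than characters, so a byte-wise loop and a character-wise loop walk different sequences.

```rust
/// Hypothetical helper (not from the PR): returns
/// (number of bytes, number of Unicode scalar values) for a string.
fn units(s: &str) -> (usize, usize) {
    // `len()` is the UTF-8 byte length; `chars()` decodes UTF-8 on the fly.
    (s.len(), s.chars().count())
}
```

For pure ASCII input both numbers agree, for example `units("hello")` gives `(5, 5)`. With a non-ASCII character they differ: `units("héllo")` gives `(6, 5)`, because `é` occupies two bytes in UTF-8.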

rochacbruno commented on f8e36ab Nov 17, 2017

Hey! Nice! Can you send a pull request, @martinxyz? I would like to add your C implementation to the repo and the article.

rochacbruno replied Nov 17, 2017

Hi, I opened the PR myself and will adjust it and add it to the repo. Thanks!

rochacbruno#7

Owner

martinxyz replied Nov 17, 2017

Ah I didn't expect you'd want to integrate it. Looks all done now, so thanks :-)

Contributor

cuviper commented Nov 17, 2017

To do rust_bytes_once, just change its .chars() to .bytes(). I get:

---------------------------------------------------------------------------------------------
Name (time in us)                 Min                    Max                   Mean          
---------------------------------------------------------------------------------------------
test_rust_bytes_once         318.9880 (1.0)         539.6140 (1.0)         319.9789 (1.0)    
test_c_bytes_once            551.8680 (1.73)        891.1050 (1.65)        553.9931 (1.73)   
test_rust_once               694.0350 (2.18)        883.5360 (1.64)        696.2701 (2.18)   
test_rust                  1,806.4350 (5.66)      2,686.6140 (4.98)      1,821.0970 (5.69)   
test_regex                14,234.3260 (44.62)    14,489.9930 (26.85)    14,271.0314 (44.60)  
test_pure_python_once     24,460.7070 (76.68)    32,115.6160 (59.52)    25,180.8352 (78.70)  
test_pure_python          30,797.6000 (96.55)    31,115.7370 (57.66)    30,959.0431 (96.75)  
test_itertools            34,062.9200 (106.78)   35,351.9700 (65.51)    34,241.0481 (107.01) 
---------------------------------------------------------------------------------------------

😄
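
The `.chars()` to `.bytes()` change can be sketched as follows. This is only an illustration under assumptions: the thread does not show the benchmarked function, so the workload here (counting adjacent equal pairs) and both function names are hypothetical. The two versions differ only in the iterator they call.

```rust
/// Hypothetical example: counts adjacent equal pairs, comparing
/// Unicode scalar values (UTF-8 is decoded while iterating).
fn count_doubles_chars(s: &str) -> usize {
    s.chars()
        .zip(s.chars().skip(1))
        .filter(|(a, b)| a == b)
        .count()
}

/// Same loop over raw bytes: no UTF-8 decoding, but the result
/// can differ from the character version on non-ASCII input.
fn count_doubles_bytes(s: &str) -> usize {
    s.bytes()
        .zip(s.bytes().skip(1))
        .filter(|(a, b)| a == b)
        .count()
}
```

On ASCII input the two agree (`"aabbc"` gives 2 in both). On `"éé"` the character version finds 1 pair while the byte version finds 0, since the four bytes `C3 A9 C3 A9` contain no two equal neighbours. That is the sense in which the byte-based numbers are not directly comparable to the character-based ones.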

Owner

rochacbruno commented Nov 17, 2017

Thanks @martinxyz and @cuviper

I am going to commit the new implementations and see the results

[Image: screenshot_2017-11-17_16-18-27]

@rochacbruno rochacbruno merged commit f8e36ab into rochacbruno:master Nov 17, 2017

Contributor

martinxyz commented Nov 17, 2017

Nice update! By the way, I implemented the byte comparison not because it was faster, but because I had no clue at all how to compare UTF-8 characters correctly in C. (And I'm not going to try if I can use Rust instead.)
