Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

remove string.c from the package. #655

Merged
merged 9 commits into from
May 11, 2020
Merged

remove string.c from the package. #655

merged 9 commits into from
May 11, 2020

Conversation

Niraj-Kamdar
Copy link
Contributor

Removed string.c from the package and also removed usage of it from setup script and string module.

@codecov-io
Copy link

codecov-io commented May 9, 2020

Codecov Report

Merging #655 into master will increase coverage by 0.14%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #655      +/-   ##
==========================================
+ Coverage   78.50%   78.64%   +0.14%     
==========================================
  Files          63       63              
  Lines        2549     2538      -11     
  Branches      498      500       +2     
==========================================
- Hits         2001     1996       -5     
+ Misses        406      402       -4     
+ Partials      142      140       -2     
Flag Coverage Δ
#longtests 78.64% <100.00%> (+0.14%) ⬆️
Impacted Files Coverage Δ
cve_bin_tool/strings.py 100.00% <100.00%> (+20.00%) ⬆️
test/test_strings.py 88.00% <100.00%> (+1.88%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update d4fdc16...36fac32. Read the comment docs.

@Niraj-Kamdar
Copy link
Contributor Author

Surprisingly, new string parser is as fast as string command during first run and in subsequent run new python parser is faster than strings command.

test_binutils
strings command: 0.0014863650000052075
string.parse: 0.0014164240000127393
#########

test_curl
strings command: 0.0018305610000197703
string.parse: 0.0010337909999407202
#########

test_kerberos
strings command: 0.0015554930000689637
string.parse: 0.0011313310000105048
#########

Here is the script I used for benchmarking

    def _parse_test(self, filename):
        """Helper function to parse a binary file and check whether
        the given string is in the parsed result"""
        self.strings.filename = os.path.join(BINARIES_PATH, filename)
        with tempfile.TemporaryFile() as f:
            t1 = perf_counter()
            subprocess.call(["strings", self.strings.filename], stdout=f)
            binutils_strings = f.readlines()
            t2 = perf_counter()
            print(f"strings command: {t2 - t1}")
            t1 = perf_counter()
            ours = self.strings.parse().split("\n")
            t2 = perf_counter()
            print(f"string.parse: {t2 - t1}")
            for theirs in binutils_strings:
                self.assertIn(theirs.decode("utf-8"), ours)

Should we remove subprocess?

Copy link
Member

@pdxjohnny pdxjohnny left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dist/cve_bin_tool-1.0-py3.7.egg should be removed from this.

Looking good!

test/test_strings.py Outdated Show resolved Hide resolved
@Niraj-Kamdar
Copy link
Contributor Author

subprocess is slow because of the overhead of system call. current python solution is very fast due to various improvement like set to lookup printable character, use of join and list instead of string concatenation.

Copy link
Member

@pdxjohnny pdxjohnny left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dist/cve_bin_tool-1.0-py3.7.egg is still here. Also let's make sure we have performance numbers for the final implementation before we hit merge on this.

cve_bin_tool/strings.py Show resolved Hide resolved
@Niraj-Kamdar
Copy link
Contributor Author

python implementation is taking half the time of subprocess string

subprocess string: 0.004477346000385296
python string: 0.002734930999849894

subprocess string: 0.005265122999844607
python string: 0.002693021000141016

@pdxjohnny
Copy link
Member

Okay looks good. It is faster than pstring but I'm not sure if it's really faster than strings. When benchmarking it's always good to run something for multiple iterations, rather than just one pass.

For example here are the numbers I saw with 10,000 runs.

subprocess string: 18.777919821441174
pstring string: 62.875923331826925
python string: 22.99539073370397

Let's back out the benchmarking code and then we'll merge this. Thanks!

@pdxjohnny pdxjohnny merged commit e0aec09 into intel:master May 11, 2020
@pdxjohnny
Copy link
Member

Thanks! Nice work!

@Niraj-Kamdar Niraj-Kamdar deleted the py2 branch May 12, 2020 09:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants