Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Module name matching is too fuzzy #63

Closed
taviso opened this issue Mar 22, 2019 · 4 comments
Closed

Module name matching is too fuzzy #63

taviso opened this issue Mar 22, 2019 · 4 comments
Assignees
Labels
Milestone

Comments

@taviso
Copy link

taviso commented Mar 22, 2019

I've been trying out lighthouse for coverage browsing, it's very polished, thanks!

I ran into one issue, the module name matching is too relaxed. If I have coverage for module "foo", but a module is listed earlier in the list called "foobar", then it matches and lighthouse claims there is no coverage.

I believe this is because you search for modules with fuzzy matching by default like this:

            # attempt lookup using case-insensitive filename
            for module in self.modules:
                if module_name.lower() in module.filename.lower():
                    return module

I manually edited my coverage file so that unrelated modules didn't contain my module as a substring, and then it worked perfectly.

>>> "foo".lower() in "foobar".lower()
True

I think fuzzy should be the fallback, not the default, WDYT?

@gaasedelen
Copy link
Owner

Hi Tavis! Thanks for the kind words.

I agree that this is an issue. The fuzzy lookup flow is admittedly... pretty cheap. I am kind of surprised it has lasted this long without someone raising your example.

I have been meaning to improve this code, but also add a bailout dialog to list the modules for human selection. A lot of the coverage loading code is starting to get retooled on the develop branch, so this is probably a good time to address it.

In the meantime, I would recommend you tweak the fuzzy function logic to serve your needs. I'll close this issue once my changes make their way onto dev, maybe in the next week or two.

@gaasedelen gaasedelen self-assigned this Mar 22, 2019
@gaasedelen gaasedelen added the bug label Mar 22, 2019
@gaasedelen gaasedelen added this to the v0.9.0 milestone Mar 22, 2019
@cregnec
Copy link

cregnec commented May 14, 2019

Hi, I'm trying to use lighthouse with dynamorio. The result is pretty cool.

But I had some problem with the get_module too.
when there are several segments, dynamorio generates multiple modules with the same name.

Columns: id, containing_id, start, end, entry, offset, path

2,   2, 0x00007f5d4b26b000, 0x00007f5d4b278000, 0x00007f5d4b278170, 0000000000000000, module
3,   2, 0x00007f5d4b278000, 0x00007f5d4b29b000, 0x00007f5d4b278170, 000000000000d000, module
4,   2, 0x00007f5d4b29b000, 0x00007f5d4b2a8000, 0x00007f5d4b278170, 0000000000030000, module
5,   2, 0x00007f5d4b2a8000, 0x00007f5d4b2ea000, 0x00007f5d4b278170, 000000000003c8a8, module

currently, only one module is returned:

module = self.get_module(module_name)

It should probably return all matching modules and then merge their IDs.

@gaasedelen
Copy link
Owner

Hi, I'm trying to use lighthouse with dynamorio. The result is pretty cool.

But I had some problem with the get_module too.
when there are several segments, dynamorio generates multiple modules with the same name.

Columns: id, containing_id, start, end, entry, offset, path

2,   2, 0x00007f5d4b26b000, 0x00007f5d4b278000, 0x00007f5d4b278170, 0000000000000000, module
3,   2, 0x00007f5d4b278000, 0x00007f5d4b29b000, 0x00007f5d4b278170, 000000000000d000, module
4,   2, 0x00007f5d4b29b000, 0x00007f5d4b2a8000, 0x00007f5d4b278170, 0000000000030000, module
5,   2, 0x00007f5d4b2a8000, 0x00007f5d4b2ea000, 0x00007f5d4b278170, 000000000003c8a8, module

currently, only one module is returned:

module = self.get_module(module_name)

It should probably return all matching modules and then merge their IDs.

Sorry for the late reply, I am only now catching up on lighthouse maintenance 😬.

The drcov module splitting is unrelated to the main issue reported by taviso, but I just pushed a fix for it in commit f6902ba. The lighthouse development branch has a bunch of other updates & fixes, I would recommend updating to it.

Regarding the main issue around fuzzy name matching, it is the next item on my agenda. I'll have something on the dev branch in the next 24 hours.

gaasedelen added a commit that referenced this issue Apr 2, 2020
@gaasedelen
Copy link
Owner

gaasedelen commented Apr 2, 2020

Long overdue, but there is now a fallback dialog that will pop-up if lighthouse cannot automatically match your database/binary name to the ones in a coverage file:

image

This is available on the development branch right now.

I think the fuzzy-name lookup was tweaked to be stricter when this issue first came in, but I will double check and make sure before the upcoming release.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants