Getting extremely broad search results when searching on username field #243

Open
ianfitzpatrick opened this issue Apr 25, 2018 · 5 comments

@ianfitzpatrick

I am running into a weird issue: when searching on a username (like bob@example.com), for certain users I get extremely broad results, including users that definitely do not have that phrase in their title, description, or content fields. In one case I get 7000+ results in my queryset, even though the email in question is definitely only associated with one entry in my index.

To make things more confusing, some searches return as expected. If I search for "kara@example.com", for instance, I get exactly one result, as expected, since username is a unique field.

Here is my app config:

from django.apps import AppConfig
from watson import search as watson


class UsersAppConfig(AppConfig):
    """
    Automatically import standalone signals file once app is ready.

    Get around a circular import error otherwise facing.
    """

    name = 'users'

    def ready(self):
        import signals  # imported only for its side effects
        from django.contrib.auth.models import User
        watson.register(
            User, CaseInsensitiveSearchAdapter, fields=(
                'first_name',
                'last_name',
                'username'
            )
        )

And the custom adapter I created based on some code you posted:

class CaseInsensitiveSearchAdapter(watson.SearchAdapter):

    def get_title(self, obj):
        return super(
            CaseInsensitiveSearchAdapter, self
        ).get_title(obj).lower()

    def get_description(self, obj):
        return super(
            CaseInsensitiveSearchAdapter, self
        ).get_description(obj).lower()

    def get_content(self, obj):
        return super(
            CaseInsensitiveSearchAdapter, self
        ).get_content(obj).lower()

I am using MySQL as my database. When I manually inspect the data in the index, I don't see any duplication of data. And if I do a normal contains query for "bob@example.com" I only get one result.

Sorry this is not the best issue as I don't know how to provide a reduced case here. Maybe there is a forehead thunker here that sticks out though?

Thanks so much for your work on this project, it's really awesome. I'm in the process of replacing haystack + solr with this, and if I can just get this weird case figured out it will greatly reduce the moving pieces in my system.

@ianfitzpatrick
Author

One idea I had was, could this be some weird interaction between the @ symbol and the query used in the MySQL backend? Just a WAG, but thought I'd throw it out there.

@ianfitzpatrick
Author

Okay I think I'm on the right track with my @ symbol theory. If I change:

backends.py
RE_MYSQL_ESCAPE_CHARS = re.compile(r'["()><~*+-]', re.UNICODE)

to (add an @)
RE_MYSQL_ESCAPE_CHARS = re.compile(r'["()><~*+-]@', re.UNICODE)

And then enclose my actual search query text in " " I get the result I am expecting, exactly one result for "bob@example.com".
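
A quick way to check what each of the two patterns actually matches, using only the standard `re` module. One thing this seems to show: in the modified pattern the `@` sits after the closing bracket, so it matches "a special character followed by `@`" rather than `@` itself, which would mean the surrounding quotes are no longer caught by it. `INSIDE_CLASS` is a hypothetical third variant added for comparison and is not from backends.py; how the backend uses the pattern afterwards is not shown here.

```python
import re

# The two patterns quoted above, copied verbatim.
ORIGINAL = re.compile(r'["()><~*+-]', re.UNICODE)
MODIFIED = re.compile(r'["()><~*+-]@', re.UNICODE)

# Hypothetical variant (not from backends.py): "@" placed inside the class.
# It goes before the trailing "-" so that "+-@" is not read as a range.
INSIDE_CLASS = re.compile(r'["()><~*+@-]', re.UNICODE)

query = '"bob@example.com"'

original_hits = ORIGINAL.findall(query)      # characters the stock pattern catches
modified_hits = MODIFIED.findall(query)      # only "<special char>@" pairs
inside_hits = INSIDE_CLASS.findall(query)    # quotes and the "@" itself

print(original_hits, modified_hits, inside_hits)
```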

According to the MySQL docs, I believe this is an exact phrase match. Relevant SO answer: https://stackoverflow.com/questions/8961148/mysql-match-against-when-searching-e-mail-addresses

I'm in a situation where I want flexibility: users can search on name or email. For email I want an exact match, but I want broader results when searching on name.

I still don't get why only some particular usernames (emails) trigger these very broad search results, whereas others do not. But I can live with that if I can just work around the issue.

So I think I just need to do some pre-processing on my search text: if I detect something email-like in it, auto-enclose it in quotes (my users will not have the savvy to do this themselves).
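
A minimal sketch of that pre-processing step, assuming a deliberately loose definition of "email-like" (something@something.something). The names `EMAIL_LIKE` and `quote_emails` are made up for illustration and are not part of watson. It also drops any quotes the user typed, which is a simplification, and it only helps if the quotes reach MySQL rather than being stripped by the backend's escaping.

```python
import re

# Hypothetical, intentionally loose pattern: local part, "@", domain with a dot.
EMAIL_LIKE = re.compile(r'[^\s"@]+@[^\s"@]+\.[^\s"@]+')


def quote_emails(search_text):
    """Wrap each email-like word in double quotes; leave other words alone."""
    words = []
    # Drop user-typed quotes first so an already quoted email is not quoted twice.
    for word in search_text.replace('"', ' ').split():
        if EMAIL_LIKE.fullmatch(word):
            word = '"{}"'.format(word)
        words.append(word)
    return ' '.join(words)


print(quote_emails('bob@example.com'))
print(quote_emails('Bob Smith'))
```

The result would then be passed on as the search text in place of the raw user input, so name searches stay broad while emails are treated as exact phrases.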

@etianen
Owner

etianen commented May 17, 2018 via email

@etianen
Owner

etianen commented May 17, 2018 via email

@ianfitzpatrick
Author

Sure thing, I'll try and get something to you next week.
