Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] first pass at adding in inoas's code #1150

Draft
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

Ch4s3
Copy link
Contributor

@Ch4s3 Ch4s3 commented Jul 6, 2022

This is a first pass at integrating the parser and sanitizer. I still need to turn the SQL query into ecto and maybe add some DB migrations.

end
end

# TODO: Do not call String.graphemes/1 String.reverse/1 multiple times but work on list of char and benchmark with https://hexdocs.pm/benchee/readme.html

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO found

|> String.reverse()
end

# TODO Same

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO found

end
end

# TODO: Do not call String.graphemes/1 String.reverse/1 multiple times but work on list of char and benchmark with https://hexdocs.pm/benchee/readme.html

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO found

end
end

# TODO: Do not call String.graphemes/1 String.reverse/1 multiple times but work on list of char and benchmark with https://hexdocs.pm/benchee/readme.html

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO found

|> String.trim()
end

# TODO ask inoas what this is for and how to use it

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO found

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caveat: I am out of this for quite some time and very loaded on work, so it will be slow from my side, sorry about that.

take_graphemes_at_max_bytes was as far as I know meant to limit search string length.
There must be basically 2 or 3 limits:

  1. limit on the hole string that will hit SQL
  2. limit on the number of and/or operations that will hit SQL
  3. limit on the length of each search-stringlet, aka foo bar | quux = 3, all or'ed foo & bar & quux= 3, all and'ed - so each of those foo, bar and quux should not contain more than x graphemes (where each grapheme can AFAIR be 1 to 4 bytes because of UTF8).

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea was to make sure to not overload the database with too long strings and/or too many boolean operations, etc - so let's say a total of 10 and/or and total of 10 strings each being max 100 graphemes (or 400 bytes which would already be 4000 bytes of max string search size in this example).

Copy link

@inoas inoas Jul 6, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I took another look: https://gist.github.com/inoas/19eedcdd1d0da03cd180d1c4ba29be34w
So take_graphemes_at_max_bytes was to limit the search query to X bytes totally.

However that could still result into a & b & c & ... 8 & 9 = about 200/2 = 100 logical operations at max graphemes of 200 - all in ts_query (so some weak/fuzzy search that might hit postgres at 100 of them onto 4 columns).

@sourcelevel-bot
Copy link

SourceLevel has finished reviewing this Pull Request and has found:

  • 5 possible new issues (including those that may have been commented here).

See more details about this review.

|> String.trim()
end

# TODO ask inoas what this is for and how to use it
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caveat: I am out of this for quite some time and very loaded on work, so it will be slow from my side, sorry about that.

take_graphemes_at_max_bytes was as far as I know meant to limit search string length.
There must be basically 2 or 3 limits:

  1. limit on the hole string that will hit SQL
  2. limit on the number of and/or operations that will hit SQL
  3. limit on the length of each search-stringlet, aka foo bar | quux = 3, all or'ed foo & bar & quux= 3, all and'ed - so each of those foo, bar and quux should not contain more than x graphemes (where each grapheme can AFAIR be 1 to 4 bytes because of UTF8).

|> String.trim()
end

# TODO ask inoas what this is for and how to use it
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea was to make sure to not overload the database with too long strings and/or too many boolean operations, etc - so let's say a total of 10 and/or and total of 10 strings each being max 100 graphemes (or 400 bytes which would already be 4000 bytes of max string search size in this example).

@@ -9,6 +9,10 @@ defmodule HexpmWeb.PackageController do
def index(conn, params) do
letter = Hexpm.Utils.parse_search(params["letter"])
search = Hexpm.Utils.parse_search(params["search"])
sanitized_search = Hexpm.Search.Sanitizer.sanitize(params["search"])
IO.inspect(sanitized_search, label: "sanitized_search")
parsed_result = Hexpm.Search.Parser.parse_sanitized_user_input(sanitized_search)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As for this part the idea I had was to be nice to the user:

  1. the sanitizer kicks in, it tries to guess best what the user might have meant, adding additional parentheses where it is clear that they are lacking or removing extra ones.
  2. the parser kicks in, it's result is transformed to both: SQL and a new search string, so that the user will see a sanitizer and parsed (aka correct) version, much like a code formatter.

Copy link

@inoas inoas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In both my original sanitizer.exs and parser.exs where some inline pseudo-tests (just input's and output to stdout), that should be carried over and created as real unit tests so we can verify that both work on their own

@ericmj
Copy link
Member

ericmj commented Apr 16, 2024

Hi @Ch4s3 and @inoas! Are either of you interested in picking up this work? Otherwise I will close since it's becoming stale.

@inoas
Copy link

inoas commented Apr 16, 2024

@ericmj let's see if I can find some time during the next 4 weeks, sick today. If you close it please let the branch live so I can pull/clone it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants