[WIP] first pass at adding in inoas's code #1150

Ch4s3 · 2022-07-06T01:35:04Z

This is a first pass at integrating the parser and sanitizer. I still need to turn the SQL query into an Ecto query and maybe add some DB migrations.

sourcelevel-bot · 2022-07-06T01:35:35Z

lib/hexpm/search/sanitizer.ex

+      end
+    end
+
+    # TODO: Do not call String.graphemes/1 String.reverse/1 multiple times but work on list of char and benchmark with https://hexdocs.pm/benchee/readme.html


sourcelevel-bot · 2022-07-06T01:35:36Z

lib/hexpm/search/sanitizer.ex

+    |> String.reverse()
+  end
+
+  # TODO Same


sourcelevel-bot · 2022-07-06T01:35:37Z

lib/hexpm/search/sanitizer.ex

+      end
+    end
+
+    # TODO: Do not call String.graphemes/1 String.reverse/1 multiple times but work on list of char and benchmark with https://hexdocs.pm/benchee/readme.html


sourcelevel-bot · 2022-07-06T01:35:37Z

lib/hexpm/search/sanitizer.ex

+      end
+    end
+
+    # TODO: Do not call String.graphemes/1 String.reverse/1 multiple times but work on list of char and benchmark with https://hexdocs.pm/benchee/readme.html


sourcelevel-bot · 2022-07-06T01:35:38Z

lib/hexpm/search/sanitizer.ex

+    |> String.trim()
+  end
+
+  # TODO ask inoas what this is for and how to use it


inoas · 2022-07-06 (edited)

I took another look: https://gist.github.com/inoas/19eedcdd1d0da03cd180d1c4ba29be34w
So take_graphemes_at_max_bytes was meant to limit the whole search query to X bytes in total.

However, that could still allow a & b & c & ... 8 & 9, i.e. about 200/2 = 100 logical operations at a maximum of 200 graphemes, all in ts_query (so a weak/fuzzy search might hit Postgres with 100 of them across 4 columns).
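A minimal sketch of what a byte-budgeted truncation like this could look like, for discussion only. It is not the code from the gist or the PR: the module name `ByteLimitSketch` is hypothetical, and only the function name is borrowed from the thread. It walks the string once with `String.next_grapheme/1`, accumulates into a list, and reverses once at the end, which is roughly what the TODO about avoiding repeated `String.graphemes/1` and `String.reverse/1` calls asks for.

```elixir
defmodule ByteLimitSketch do
  # Hypothetical stand-in for take_graphemes_at_max_bytes; not the PR's code.
  # Keeps whole graphemes from the front of `string` while the running total
  # stays within `max_bytes`, so a multi-byte grapheme is never cut in half.
  def take_graphemes_at_max_bytes(string, max_bytes)
      when is_binary(string) and is_integer(max_bytes) and max_bytes >= 0 do
    take(string, max_bytes, [])
  end

  defp take(rest, budget, acc) do
    case String.next_grapheme(rest) do
      # The next grapheme still fits into the remaining byte budget.
      {grapheme, tail} when byte_size(grapheme) <= budget ->
        take(tail, budget - byte_size(grapheme), [grapheme | acc])

      # Either the string is exhausted (nil) or the next grapheme is too big:
      # stop here and rebuild the binary with a single reverse.
      _end_or_too_large ->
        acc |> Enum.reverse() |> IO.iodata_to_binary()
    end
  end
end
```

Stopping at the first grapheme that does not fit (rather than skipping it and continuing) keeps the result a strict prefix of the input, which seems like the least surprising behaviour for a search box.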

sourcelevel-bot · 2022-07-06T01:35:40Z

SourceLevel has finished reviewing this Pull Request and has found:

5 possible new issues (including those that may have been commented here).

See more details about this review.

inoas · 2022-07-06T14:25:37Z

lib/hexpm/search/sanitizer.ex

+    |> String.trim()
+  end
+
+  # TODO ask inoas what this is for and how to use it


Caveat: I have been out of this for quite some time and am very loaded with work, so it will be slow from my side, sorry about that.

take_graphemes_at_max_bytes was, as far as I know, meant to limit the search string length.
There must be basically 2 or 3 limits:

- a limit on the whole string that will hit SQL
- a limit on the number of and/or operations that will hit SQL
- a limit on the length of each search-stringlet, aka foo bar | quux = 3, all or'ed; foo & bar & quux = 3, all and'ed. So each of foo, bar and quux should not contain more than x graphemes (where each code point is 1 to 4 bytes in UTF-8, and a grapheme can span more than one code point).

inoas · 2022-07-06T14:27:21Z

lib/hexpm/search/sanitizer.ex

+    |> String.trim()
+  end
+
+  # TODO ask inoas what this is for and how to use it


The idea was to make sure we do not overload the database with overly long strings and/or too many boolean operations. So let's say a total of 10 and/or operations and a total of 10 strings, each at most 100 graphemes (or 400 bytes assuming up to 4 bytes per grapheme, which would already be 4000 bytes of maximum search string size in this example).
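To make those numbers concrete, here is a rough sketch of such a limit check. Everything in it is an assumption for illustration: the module name `SearchLimitsSketch`, the error atoms, and the simplification that operators arrive as separate whitespace-delimited `&` / `|` tokens, which the real sanitizer may not guarantee. The limits are just the example values from the comment above.

```elixir
defmodule SearchLimitsSketch do
  # Example values from the comment above; all names here are hypothetical.
  @max_operators 10
  @max_terms 10
  @max_graphemes_per_term 100

  # Assumes operators are written as standalone "&" / "|" tokens separated
  # by whitespace. Everything else counts as a search term.
  def check(query) when is_binary(query) do
    tokens = String.split(query)
    {operators, terms} = Enum.split_with(tokens, fn token -> token in ["&", "|"] end)

    cond do
      length(operators) > @max_operators ->
        {:error, :too_many_operators}

      length(terms) > @max_terms ->
        {:error, :too_many_terms}

      # String.length/1 counts graphemes, not bytes.
      Enum.any?(terms, fn term -> String.length(term) > @max_graphemes_per_term end) ->
        {:error, :term_too_long}

      true ->
        :ok
    end
  end
end
```

For example, `SearchLimitsSketch.check("foo & bar | quux")` passes, while a single 101-grapheme term is rejected. Whether to reject or silently truncate is a product decision this sketch does not try to settle.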

inoas · 2022-07-06T14:28:47Z

lib/hexpm_web/controllers/package_controller.ex

@@ -9,6 +9,10 @@ defmodule HexpmWeb.PackageController do
  def index(conn, params) do
    letter = Hexpm.Utils.parse_search(params["letter"])
    search = Hexpm.Utils.parse_search(params["search"])
+    sanitized_search = Hexpm.Search.Sanitizer.sanitize(params["search"])
+    IO.inspect(sanitized_search, label: "sanitized_search")
+    parsed_result = Hexpm.Search.Parser.parse_sanitized_user_input(sanitized_search)


As for this part, the idea I had was to be nice to the user:

- The sanitizer kicks in: it tries to guess what the user most likely meant, adding parentheses where they are clearly missing or removing extra ones.
- The parser kicks in: its result is transformed into both SQL and a new search string, so that the user sees a sanitized and parsed (aka correct) version, much like a code formatter.
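As a toy illustration of the "add missing, remove extra parentheses" step, something like the following could work. This is a hypothetical sketch (`ParenBalanceSketch.balance/1` is an invented name), not what `Hexpm.Search.Sanitizer.sanitize/1` actually does, and it uses one simple policy: drop any `)` with no matching `(`, and append a `)` for every `(` still open at the end.

```elixir
defmodule ParenBalanceSketch do
  # Hypothetical helper, not part of the PR. Makes parentheses balanced by
  # dropping unmatched ")" and closing any "(" left open at the end.
  def balance(input) when is_binary(input) do
    {kept, open} =
      input
      |> String.graphemes()
      |> Enum.reduce({[], 0}, fn
        "(", {acc, open} -> {["(" | acc], open + 1}
        # A closing paren with nothing open is an extra one: drop it.
        ")", {acc, 0} -> {acc, 0}
        ")", {acc, open} -> {[")" | acc], open - 1}
        other, {acc, open} -> {[other | acc], open}
      end)

    # `kept` was built in reverse; append one ")" per still-open "(".
    IO.iodata_to_binary([Enum.reverse(kept), String.duplicate(")", open)])
  end
end
```

With this policy, `"(foo & bar"` becomes `"(foo & bar)"` and `"foo) | bar"` becomes `"foo | bar"`. The normalized string is what could be echoed back to the user, formatter-style, before or alongside the SQL generation.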

inoas

Both my original sanitizer.exs and parser.exs contained some inline pseudo-tests (just inputs and outputs printed to stdout). These should be carried over and turned into real unit tests so we can verify that both work on their own.

ericmj · 2024-04-16T17:42:54Z

Hi @Ch4s3 and @inoas! Are either of you interested in picking up this work? Otherwise I will close since it's becoming stale.

inoas · 2024-04-16T19:31:25Z

@ericmj let's see if I can find some time during the next 4 weeks, sick today. If you close it please let the branch live so I can pull/clone it.

first pass at adding in inoas's code

d2effd8

Ch4s3 marked this pull request as draft July 8, 2022 19:47

ericmj force-pushed the main branch from eba0fa2 to aad402b January 29, 2023 23:22
