Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

search: ranking results based on package status #4510

Closed
wants to merge 6 commits into from

Conversation

sorawee
Copy link
Collaborator

@sorawee sorawee commented Dec 2, 2022

This is a first cut for search result ranking.
Within a group of search results of the same score, search results associated with the base package will come first, followed by those in main-distribution, and then everything else.

Search results also have an additional indicator on the right border, which can be hovered to read the description.

Unfortunately, it slows down raco setup scribblings/main by about 10 seconds. The command is invoked quite often, as it is run as a part of raco pkg install and raco pkg remove.
It is unclear if the performance hit is a deal breaker.

Fixes #4083 and issues raised in
https://racket.discourse.group/t/whats-official-racket-and-whats-not/1447

This is a first cut for search result ranking.
Within a group of search results of the same score,
search results associated with the `base` package will come first,
followed by those in `main-distribution`, and then everything else.

Search results also have an additional indicator on the right border,
which can be hovered to read the description.

Unfortunately, it slows down `raco setup scribblings/main` by about 10
seconds. The command is invoked quite often, as it is run as a part of
`raco pkg install` and `raco pkg remove`.
It is unclear if the performance hit is a deal breaker.

Fixes racket#4083 and issues raised in
https://racket.discourse.group/t/whats-official-racket-and-whats-not/1447
@sorawee
Copy link
Collaborator Author

sorawee commented Dec 2, 2022

Current search:

Screenshot 2022-12-02 at 11 33 32 PM

This PR:

Screenshot 2022-12-02 at 11 31 41 PM

@sorawee
Copy link
Collaborator Author

sorawee commented Dec 2, 2022

Here's a self-contained program to optimize, if anyone wish to help speeding up the PR.

#lang racket/base

(require scribble/xref
         setup/xref
         setup/main-collects
         racket/match
         pkg/path
         scribble/manual-struct)

(define pkg-cache-for-path->pkg (make-hash))

(define (module-path->path module-path)
  (with-handlers ([exn:missing-module? (lambda (exn) #f)])
    (resolved-module-path-name
     (module-path-index-resolve
      (module-path-index-join module-path #f)))))

(define (->pkg path)
  (cond
    [(path->pkg path #:cache pkg-cache-for-path->pkg)]
    [else (and (path->main-collects-relative path) "base")]))

(define (->path tag)
  (match tag
    [(or (list 'part (list path-string _ ...))
         (list 'idx (list 'gentag _ path-string _ ...))
         (list 'tech (list path-string _ ...)))

     (match path-string
       [(pregexp #px"^\\(lib (.*)\\)$" (list _ path))
        (module-path->path (list 'lib path))])]

    [(or (list 'def (list path _ ...))
         (list 'form (list path _ ...))
         (list 'meth (list (list path _ ...) _))
         (list 'sig-val (list path _ ...)))
     (module-path->path path)]

    [(list 'mod-path path-string)
     (module-path->path (read (open-input-string path-string)))]

    ;; HACK: xrepl weirdly forges their own tag.
    ;; Since it is a part of the main distribution, we add a hack to handle it
    [(list 'xrepl _) 'xrepl]

    [_ #f]))

(define (tag->pkg tag)
  (let loop ([path (->path tag)])
    (match path
      [(? path? path) (->pkg path)]
      ['xrepl "xrepl"]
      ['#%kernel "base"]
      ['#%foreign "base"]
      [(cons path _) (loop path)]
      [_ #f])))

(time
 (for ([e (in-list (xref-index (load-collections-xref)))]
       #:do [(match-define (entry words what tag desc) e)]
       #:unless (constructor-index-desc? desc))
   (tag->pkg tag)))

@racket-discourse-github-bot

This pull request has been mentioned on Racket Discussions. There might be relevant details there:

https://racket.discourse.group/t/whats-official-racket-and-whats-not/1447/32

@benknoble
Copy link
Contributor

If you haven't already, you might want to check the red/green being used for color-blind-friendly contrast.

@sorawee
Copy link
Collaborator Author

sorawee commented Dec 2, 2022

I'm happy to adjust the colors. Do you have any recommendation for the color hex codes or color names?

@benknoble
Copy link
Contributor

A quick search turned up https://davidmathlogic.com/colorblind/#%23D81B60-%231E88E5-%23FFC107-%23004D40 which I liked

@sorawee
Copy link
Collaborator Author

sorawee commented Dec 3, 2022

The blue-yellow ones look pretty decent across four (or five) columns. I pushed a change to use them. Thanks for the feedback!

@sorawee
Copy link
Collaborator Author

sorawee commented Dec 3, 2022

Alright, adding even more caching fixes the performance issue. Now it runs in 2 seconds.

@sorawee
Copy link
Collaborator Author

sorawee commented Dec 3, 2022

Here's one complication:

For bindings, the associated package is accurate.

For sections and tech terms, the associated package is a doc package. So we have a weird result where syntax/parse is in base (correct), but "Syntax: Meta-Programming Helpers" is in main-distribution, even though "Syntax: Meta-Programming Helpers" contains syntax/parse.

Should this be a concern? And if so, is there a solution?

@mflatt
Copy link
Member

mflatt commented Dec 6, 2022

I'm in favor of the goal here. I have some minor quibbles if we stick to this implementation:

  • the distinctions to make in sorting should be configurable, probably via "config.rktd", instead of being hardwired to "base", "main-distribution", and everything else;
  • red means "warning" or "error" in many contexts (although it is a perfectly good color in many contexts <insert University of Utah "U" here>), so I'd suggest something like gold and blue as the colors.

But I'm more concerned with the way the categories are currently derived. The package inference here works well, and I have no better idea given the documentation as-is. But it seems fragile and the long way around for information that should be declared. Given that the immediate goal is to treat main-distribution packages specially, it's within our reach to modify main-distribution documentation to declare the right information. Maybe documentation should start with an explicit declaration of the package that it's describing, for example, and have that propagated to index entries.

The issue here seems related to the problem of reporting bindings with syntactic categories as needed for Rhombus. Maybe it's worth looking at extending index information for these goals (and, of course, trying to imagine future needs) at the same time.

@sorawee
Copy link
Collaborator Author

sorawee commented Dec 6, 2022

Gold and blue don't work well, because highlighted entries are colored light yellow, which doesn't contrast well with gold. What about blue and brown?

Indeed, this is related to a couple of my pending PRs on the "kind information". In the PRs, I created a new exported-index-desc* which extends exported-index-desc, and then deprecate other struct types that extends exported-index-desc (like procedure-index-desc, form-index-desc, etc.).

However, the change that you suggest here would be far more involved. I think the problem stems from how struct is not easily extensible (without creating new struct types, which is not a viable approach). It would be better if an index entry description is a hash table. But changing the representation means we deprecate scribble/manual-struct entirely. Is that OK with you?

@mflatt
Copy link
Member

mflatt commented Dec 6, 2022

Brown and blue sounds like it could work.

Yes, I'm ok with deprecating the old index-desc structure types and switching to one that's based on a hash table.

@mflatt
Copy link
Member

mflatt commented Dec 8, 2022

I accidentally pushed this, because I forgot I had merged it locally to try out (and got distracted by a long message for a new commit).

Although I'd really like to see a different implementation, this doesn't bother me enough to revert the commits. If it gets in the way of a different change, though, I can revert the merge.

@sorawee
Copy link
Collaborator Author

sorawee commented Dec 8, 2022

Got it. I am working on a different implementation, so I will submit further PRs to amend this PR.

@shhyou
Copy link
Collaborator

shhyou commented Dec 21, 2022

@sorawee I really appreciate this feature!! It has been confusing to find identically-named bindings from other packages rather than the ones from main distribution.

While searching for namespace, I found this result. Do you know why @deftech{namespace} does not come before others?

(I don't know what is qcr. I saw it in the current search result so I installed it locally to test the ranking algorithm.)

With search ranking (HEAD):
Searching for namespace, with ranking

Without search ranking (stable):
Searching for namespace, without ranking

@sorawee
Copy link
Collaborator Author

sorawee commented Dec 21, 2022

This is because "only bindings can be used for rexact matches" (where rexact is a special rank that comes before exact)

// only bindings can be used for rexact matches

which is introduced in ecc43ff

I do not know the justification for this, but it seems to be working as intended.

@shhyou
Copy link
Collaborator

shhyou commented Dec 28, 2022

This is because "only bindings can be used for rexact matches" (where rexact is a special rank that comes before exact)

// only bindings can be used for rexact matches

which is introduced in ecc43ff

I do not know the justification for this, but it seems to be working as intended.

@sorawee okay, thanks for the pointer! I'll play with it to see if it's possible to also include deftech for r-exact matches. In its current form, the sorting result of namespace is a bit unsatisfactory as those from guide and reference would be more preferable than the ones from other packages.

@samth
Copy link
Sponsor Member

samth commented Jan 3, 2023

This PR should probably be closed as merged.

Unfortunately, I think it resulted in some regressions in package installation. See eg https://plt.cs.northwestern.edu/pkg-build/server/built/fail/sketching.txt which fails with:


standard-module-name-resolver: contract violation
  expected: module-path?
  given: #f
  context...:
   /home/root/racket/share/pkgs/racket-index/scribblings/main/private/pkg.rkt:68:5
   /home/root/racket/collects/racket/private/more-scheme.rkt:377:2: hash-ref!
   /home/root/racket/share/pkgs/racket-index/scribblings/main/private/pkg.rkt:109:0: tag->pkg
   /home/root/racket/share/pkgs/racket-index/scribblings/main/private/make-search.rkt:67:0: make-script
   /home/root/racket/share/pkgs/scribble-lib/scribble/base-render.rkt:974:4: render-content method in render%
   /home/root/racket/share/pkgs/scribble-lib/scribble/html-render.rkt:1561:4: render-plain-content method in render-mixin
   /home/root/racket/share/pkgs/scribble-lib/scribble/base-render.rkt:974:4: render-content method in render%
   /home/root/racket/share/pkgs/scribble-lib/scribble/html-render.rkt:1218:4: do-render-paragraph method in render-mixin
   /home/root/racket/share/pkgs/scribble-lib/scribble/html-render.rkt:1246:4: render-intrapara-block method in render-mixin
   /home/root/racket/share/pkgs/scribble-lib/scribble/base-render.rkt:921:20: loop
   /home/root/racket/share/pkgs/scribble-lib/scribble/base-render.rkt:920:4: render-compound-paragraph method in render%
   /home/root/racket/share/pkgs/scribble-lib/scribble/html-render.rkt:1802:4: render-compound-paragraph method in render-mixin
   /home/root/racket/share/pkgs/scribble-lib/scribble/html-render.rkt:1205:6: loop
   [repeats 2 more times]
   /home/root/racket/share/pkgs/scribble-lib/scribble/html-render.rkt:1118:4: render-part-content method in render-mixin
   /home/root/racket/share/pkgs/scribble-lib/scribble/html-render.rkt:821:4: render-one-part method in render-mixin
   ...

@sorawee
Copy link
Collaborator Author

sorawee commented Jan 3, 2023

Will take a look, thanks! If it's difficult to fix, let's just revert it. Though if it's easy to fix, I'd prefer to make a quick fix. And in any case, let's close this PR.

@sorawee sorawee closed this Jan 3, 2023
@sorawee
Copy link
Collaborator Author

sorawee commented Jan 4, 2023

Initial investigation: this is due to how in Sketching, there's a @defidentifier[#'delay] where delay is from racket/promise. This produces a tag '(def (#f delay)) which is ... malformed?

Another observation: currently, https://docs.racket-lang.org/search/index.html?q=delay links to Sketching as the first result. However, the definition of delay in the doc is also a link to delay from racket/promise?!?

From my understanding, Sketching doesn't define delay. The doc says that delay is the name that is used in the original Sketching, but it's renamed to nap in the Racket implementation. If that's the case, I think defidentifier is not appropriate, and stuff like racketidfont should probably be used instead.

CC: @soegaard

That being said, the make-search function should also have a better error handling. I will fix it.

@samth
Copy link
Sponsor Member

samth commented Jan 4, 2023

There are several other packages that fail because of related errors, so I don't think it's about sketching in particular.

@samth
Copy link
Sponsor Member

samth commented Jan 4, 2023

@sorawee
Copy link
Collaborator Author

sorawee commented Jan 4, 2023

It's a closely related problem, though not quite the same. My planned fix for Sketching does fix the problem with dropbox.

I still want to better understand how these tags are generated before submitting a PR to fix the problem.

In any case, it would also be nice to fix the dropbox documentation, which has many failed links, the underlying reason for the failure.

CC: @stchang

sorawee added a commit to sorawee/racket that referenced this pull request Jan 6, 2023
mflatt pushed a commit that referenced this pull request Jan 6, 2023
@mflatt
Copy link
Member

mflatt commented Jan 25, 2023

Matthias ran into problems rendering documentation for a PLaneT package:

> raco setup: error: during building docs for <pkgs>/racket-index/scribblings/main/user/search.scrbl
> raco setup:   match: no matching clause for "(planet describe.scrbl (williams describe.plt 1 5))"
> raco setup:     location...:
> raco setup:      pkgs/racket-index/scribblings/main/private/pkg.rkt:95:5
> raco setup:     context...:
> raco setup:      /Users/matthias/plt/pkgs/racket-index/scribblings/main/private/pkg.rkt:89:0: ->path

Since planet module paths have become rare, I imagine that we just hadn't tried them before.

@sorawee
Copy link
Collaborator Author

sorawee commented Jan 25, 2023

@shhyou: I agree with you though that this ranking could be unsatisfactory. The "hash code" tech term, for example, is the last entry (on my local machine), even though it's the exact match.

So my plan, now that the PR is reverted, is to also revisit the ranking algorithm. Perhaps @elibarzilay could help us understand the motivation for "only bindings can be used for rexact matches".

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Feature Request] prioritize main distribution in search results
6 participants