add migrations for ML generated fields #1153

Open
wants to merge 3 commits into
base: master
@@ -0,0 +1,8 @@

DROP TABLE vulnerability.code_snippet;

-- Remove the generated summary from the reference content
ALTER TABLE vulnerability.reference_content DROP COLUMN summary;

ALTER TABLE package.package DROP COLUMN readme_text;
ALTER TABLE package.package DROP COLUMN use_case_summary;
@@ -0,0 +1,22 @@

CREATE TABLE vulnerability.code_snippet
(
    id            uuid        DEFAULT public.gen_random_uuid() NOT NULL PRIMARY KEY,
    created_at    TIMESTAMPTZ DEFAULT CURRENT_TIMESTAMP NOT NULL,
    -- Reference may be null because we may have pulled code from a non-web source such as vuln-db
    reference_id  uuid NULL references vulnerability.reference,
    -- Include url since reference might be null, but it's still nice to be able to point to a source like a vuln-db link for non-scraped content
    source_url    text NOT NULL,
    vulnerability uuid NOT NULL references vulnerability.vulnerability,
    code          text NOT NULL,
    score         integer NOT NULL,
    summary       text NOT NULL,
    type          text NOT NULL,
    language      text NOT NULL
);

-- Add a generated summary to the reference to make it easier for the LLM to choose what to read
ALTER TABLE vulnerability.reference_content ADD COLUMN summary text NULL;
Contributor Author

We also discussed this the other day: this is a short one- to two-sentence description of the content, so that we can display it in a list for the LLM to choose what to read.
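
As a rough illustration of the list described above, the sketch below renders one line per reference from its short summary. The function name `format_reference_menu` and the `(url, summary)` tuple shape are assumptions made for this example; they are not taken from the PR.

```python
from typing import Optional, Sequence, Tuple


def format_reference_menu(refs: Sequence[Tuple[str, Optional[str]]]) -> str:
    """Render a numbered list of references for an LLM to pick from.

    Each item is (url, summary). The summary column is nullable, so
    references that have not been summarized yet get a placeholder.
    """
    lines = []
    for index, (url, summary) in enumerate(refs, start=1):
        text = summary.strip() if summary and summary.strip() else "(no summary yet)"
        lines.append(f"{index}. {url}: {text}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical URLs and summaries, invented for illustration.
    print(format_reference_menu([
        ("https://example.com/advisory", "Describes the affected versions and the fix."),
        ("https://example.com/blog", None),
    ]))
```

Keeping each entry to a single line is what makes the one- to two-sentence limit useful: the whole menu stays small enough to fit in a prompt alongside the question.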


ALTER TABLE package.package ADD COLUMN readme_text text NULL;
Contributor

A package already has package.description, which is oftentimes the readme. We probably don't need to add another column to do the search.

Contributor Author
@factoidforrest, Mar 19, 2023

Why are we jamming the readme into that column? That isn't what I would expect; I'd expect it to have a short description. Also, often? When does it not? Depending on ecosystem?

Let's sort this out in standup because I'm interested!

ALTER TABLE package.package ADD COLUMN use_case_summary text NULL;
Contributor

What will this column contain that is different than description?

Contributor Author

We discussed this a few times, so I think you're familiar. By telling the LLM to summarize the use case specifically, and to avoid all other descriptions, we get much better vector proximity. It's not a general description; it's just "what is this for".
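
To make the "vector proximity" point concrete, here is a toy sketch. It assumes a bag-of-words count vector as a stand-in for a real embedding model (the PR does not say which model is used), and all three texts are invented for the example. It only illustrates the general idea that text restricted to the use case sits closer to a use-case query than text padded with license or author details.

```python
import math
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical texts, invented for this example.
query = "web framework for building http servers"
use_case_summary = "A web framework for building HTTP servers and APIs."
general_description = (
    "MIT licensed. Created by Jane Doe. "
    "See the changelog for the release history and how to install."
)

focused = cosine_similarity(bag_of_words(query), bag_of_words(use_case_summary))
unfocused = cosine_similarity(bag_of_words(query), bag_of_words(general_description))

if __name__ == "__main__":
    print(f"use-case summary: {focused:.3f}")
    print(f"general description: {unfocused:.3f}")
```

A real embedding model captures meaning rather than word overlap, so the actual gap will differ; the sketch only shows why unrelated details dilute the match.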

@@ -18,6 +18,7 @@
An explanation of the Fastify framework, for instance,
would be identical, because the use case of the two libraries is the same. Don't mention the license, the creator's name, or anything that isn't relevant to what the library is used for.

If you can't tell because the readme is useless and you're not familiar with the library from prior knowledge, return nothing at all, just empty, no words or explanation.
---- BEGIN README ----
{text}
---- END README ----
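
One way the template above might be filled in and its result stored is sketched below. Only the README delimiters and the `{text}` placeholder come from the diff; `PROMPT_TAIL`, `build_prompt`, and `normalize_summary` are hypothetical names, and mapping an empty reply to `None` is an assumption based on `use_case_summary` being a nullable column.

```python
from typing import Optional

# Only the delimiters and the {text} placeholder are taken from the diff;
# the instruction text that precedes them is left out of this sketch.
PROMPT_TAIL = (
    "---- BEGIN README ----\n"
    "{text}\n"
    "---- END README ----\n"
)


def build_prompt(readme: str) -> str:
    """Insert the readme into the template.

    str.format only interprets braces in the template itself, so braces
    inside the readme text are passed through unchanged.
    """
    return PROMPT_TAIL.format(text=readme)


def normalize_summary(raw: str) -> Optional[str]:
    """Map an empty or whitespace-only model reply to None (SQL NULL)."""
    stripped = raw.strip()
    return stripped or None
```

Treating the empty reply as NULL keeps "the model could not tell" distinct from a real summary when the column is later used for search.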
2 changes: 1 addition & 1 deletion lunatrace/bsl/ml/python/scrape_utils/summarize_scraped.py
@@ -88,7 +88,7 @@ def main(args):
print(results)

def add_subparser(subparsers):
-    subparser = subparsers.add_parser('summarize-scraped', help="takes any page content and a command to extract some information from it, as desired. Useful when you have a specific question to ask of an advisory")
+    subparser = subparsers.add_parser('summarize-scraped', help="takes any page content and a command to extract some information from it, as desired. Useful when you have a specific question to ask of an advisory. Not used for advisory ingestion, instead used by the chat-bot in real time.")

    subparser.add_argument("contents", nargs = 1, type = str, help = "a string of page contents")
    subparser.add_argument("query", nargs = 1, type = str, help = "query string that the scraper will try to focus on. can be phrased as a question or command, both are fine")
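
For readers unfamiliar with `nargs = 1`, the standalone sketch below wires a similar subcommand into a parser and shows how the values come back. The program name `ml-cli`, the `dest="command"` setting, the shortened help strings, and the sample arguments are assumptions for illustration; the real CLI entry point is not shown in this diff.

```python
import argparse


def add_subparser(subparsers):
    # Help strings are shortened here; see the diff for the real text.
    subparser = subparsers.add_parser(
        "summarize-scraped", help="summarize page contents for a query"
    )
    subparser.add_argument("contents", nargs=1, type=str, help="a string of page contents")
    subparser.add_argument("query", nargs=1, type=str, help="query string to focus on")
    return subparser


parser = argparse.ArgumentParser(prog="ml-cli")  # hypothetical program name
add_subparser(parser.add_subparsers(dest="command"))

args = parser.parse_args(
    ["summarize-scraped", "some page text", "which versions are affected?"]
)

# With nargs=1, argparse stores each positional as a one-element list,
# so the value has to be unwrapped with [0].
contents = args.contents[0]
query = args.query[0]
```

Omitting `nargs` would store plain strings instead of one-element lists, which is worth keeping in mind wherever the handler reads `args.contents` and `args.query`.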
2 changes: 2 additions & 0 deletions lunatrace/gogen/sqlgen/lunatrace/package/model/package.go

Some generated files are not rendered by default.

10 changes: 8 additions & 2 deletions lunatrace/gogen/sqlgen/lunatrace/package/table/package.go


102 changes: 102 additions & 0 deletions lunatrace/gogen/sqlgen/lunatrace/vulnerability/table/code_snippet.go
