Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Upgraded DB schema doesn't match clean new schema (affecting full-text search) #3028

Open
Jakdaw opened this issue Oct 30, 2018 · 3 comments
Milestone

Comments

@Jakdaw
Copy link
Contributor

Jakdaw commented Oct 30, 2018

This is a diff of the PostgreSQL schema between an instance that was upgraded from v3 to v5.0.2 verses a clean empty schema created with v5.0.2 using manage.py database create_tables:

--- schema1     2018-10-29 22:38:34.752535928 +0000
+++ schema2     2018-10-29 22:38:28.479611730 +0000
@@ -37,7 +37,7 @@
     LANGUAGE plpgsql
     AS $$
             BEGIN
-                NEW.search_vector = ((to_tsvector('pg_catalog.simple', regexp_replace(coalesce(CAST(NEW.id AS TEXT), ''), '[-@.]', ' ', 'g')) || to_tsvector('pg_catalog.simple', regexp_replace(coalesce(NEW.name, ''), '[-@.]', ' ', 'g'))) || to_tsvector('pg_catalog.simple', regexp_replace(coalesce(NEW.description, ''), '[-@.]', ' ', 'g'))) || to_tsvector('pg_catalog.simple', regexp_replace(coalesce(NEW.query, ''), '[-@.]', ' ', 'g'));
+                NEW.search_vector = ((setweight(to_tsvector('pg_catalog.simple', regexp_replace(coalesce(CAST(NEW.id AS TEXT), ''), '[-@.]', ' ', 'g')), 'B') || setweight(to_tsvector('pg_catalog.simple', regexp_replace(coalesce(NEW.name, ''), '[-@.]', ' ', 'g')), 'A')) || setweight(to_tsvector('pg_catalog.simple', regexp_replace(coalesce(NEW.description, ''), '[-@.]', ' ', 'g')), 'C')) || setweight(to_tsvector('pg_catalog.simple', regexp_replace(coalesce(NEW.query, ''), '[-@.]', ' ', 'g')), 'D');
                 RETURN NEW;
             END
             $$;
@@ -418,10 +418,10 @@
     updated_at timestamp with time zone NOT NULL,
     created_at timestamp with time zone NOT NULL,
     id integer NOT NULL,
+    org_id integer NOT NULL,
     object_type character varying(255) NOT NULL,
     object_id integer NOT NULL,
-    user_id integer NOT NULL,
-    org_id integer NOT NULL
+    user_id integer NOT NULL
 );
 
 
@@ -700,11 +700,11 @@
     id integer NOT NULL,
     org_id integer NOT NULL,
     name character varying(320) NOT NULL,
-    email character varying(320) NOT NULL,
+    email character varying(255) NOT NULL,
+    profile_image_url character varying(320),
     password_hash character varying(128),
     groups integer[],
     api_key character varying(40) NOT NULL,
-    profile_image_url character varying,
     disabled_at timestamp with time zone
 );

So, importantly, upgraded schemas don't have the weighting of the search terms in the Postgres full text index; and there are a couple of other minor fields whose maximum lengths don't match.

@arikfr
Copy link
Member

arikfr commented Oct 31, 2018

@Jakdaw did you try changing the weights and see if it returns better search results?

@Jakdaw
Copy link
Contributor Author

Jakdaw commented Oct 31, 2018

I've not yet deployed a redash that includes the changes you mentioned in #2798 - but I can confirm that manual queries against postgres using the full text index work much better with the weights in place!

@jezdez
Copy link
Member

jezdez commented Nov 1, 2018

Oh, that is odd, seems like a missing step in the data migration for sure.

@arikfr arikfr added this to the Next milestone Mar 10, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants