Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Duplicate records due to .group() changes for gitpod mysql fixes #8177

Closed
jywarren opened this issue Jul 21, 2020 · 14 comments · Fixed by #8979
Closed

Duplicate records due to .group() changes for gitpod mysql fixes #8177

jywarren opened this issue Jul 21, 2020 · 14 comments · Fixed by #8979
Labels
bug the issue is regarding one of our programs which faces problems when a certain task is executed high-priority

Comments

@jywarren
Copy link
Member

I'm concerned that the expansion of .group() calls in #8152 has created some incorrect outcomes.

Responding to errors such as:

Mysql2::Error: Expression #1 of SELECT list is not in GROUP BY clause and contains nonaggregated column 'plots.node.nid' which is not functionally dependent on columns in GROUP BY clause; this is incompatible with sql_mode=only_full_group_by: SELECT node.nid, node.status, term_data., community_tags. FROM term_data INNER JOIN community_tags ON community_tags.tid = term_data.tid INNER JOIN community_tags node_tags_term_data_join ON node_tags_term_data_join.tid = term_data.tid INNER JOIN node ON node.nid = node_tags_term_data_join.nid WHERE (node.status = 1) AND (community_tags.date > 1592278988) AND (name NOT LIKE '%:%') GROUP BY term_data.name ORDER BY count DESC LIMIT 24 OFFSET 0

we expanded the .group() terms by a lot:

https://github.com/publiclab/plots2/pull/8152/files#diff-4e05ad0d64e6100656b63ad1e78f32c5R141

When doing this in #8048 i realized that doing something similar there (.group('term_data.name, node.nid, term_data.tid, community_tags.nid, community_tags.uid, community_tags.date, community_tags.created_at, community_tags.updated_at') in each instance of .group() in tag_controller.rb) caused tags to show up multiple times (duh!):

image

@jywarren jywarren added bug the issue is regarding one of our programs which faces problems when a certain task is executed high-priority labels Jul 21, 2020
@jywarren
Copy link
Member Author

.group('node_revisions.nid, node_revisions.vid') in wiki controller should be OK because the vid doesn't seem to be making the nodes appear twice.

@jywarren
Copy link
Member Author

Darn, the users_controller changes do cause duplicate entries:

image

@jywarren
Copy link
Member Author

Changes to node.rb (.group('community_tags.tid, community_tags.uid, community_tags.date, community_tags.created_at, community_tags.updated_at') don't seem to have caused duplicate tag display on pages like this: https://stable.publiclab.org/wiki/barnraising (but that may not be a thorough search)

@jywarren
Copy link
Member Author

Changes to Tag.trending don't seem to have caused trending tags to be messed up on the dashboard:

https://stable.publiclab.org/dashboard

So, I'm going to try reverting the /people page one. We may have to rework that query.

Also noting rails c doesn't work because it tries to start too early, oh well:

- command: rails c

@jywarren
Copy link
Member Author

And noting we still don't have a solution for these:

https://github.com/urvashigupta7/plots2/blob/e706f2e7418bc27b040b407f19c2320512e82cbb/app/controllers/tag_controller.rb#L21

Because changing them to .group('term_data.name, node.nid, term_data.tid, community_tags.nid, community_tags.uid, community_tags.date, community_tags.created_at, community_tags.updated_at') causes duplicate records to appear.

jywarren added a commit that referenced this issue Jul 21, 2020
@jywarren
Copy link
Member Author

OK, now that #8178 is done, /people is broken again in GitPod. But let's try to solve it without causing duplicates!

Mysql2::Error: Expression #28 of SELECT list is not in GROUP BY clause and contains nonaggregated column 'plots.node.nid' which is not functionally dependent on columns in GROUP BY clause; this is incompatible with sql_mode=only_full_group_by: SELECT  *, rusers.status, MAX(node_revisions.timestamp) AS last_updated FROM `rusers` INNER JOIN `node` ON `node`.`uid` = `rusers`.`id` INNER JOIN `node_revisions` ON `node_revisions`.`nid` = `node`.`nid` WHERE (node_revisions.status = 1) AND (rusers.status = 1) GROUP BY rusers.id ORDER BY last_updated DESC LIMIT 30 OFFSET 0

@jywarren
Copy link
Member Author

jywarren commented Jul 28, 2020

Very strangely, I found that, in GitPod, in the rails console, I was able to run the whole Rails statement and it completed without an error:

[14] pry(main)> User.select('*, rusers.status, MAX(node_revisions.timestamp) AS last_updated').joins(:revisions).where("node_revisions.status = 1").group('rusers.id').order('last_updated').where('rusers.status = 1').limit(30).offset(0)
  User Load (1.4ms)  SELECT  *, rusers.status, MAX(node_revisions.timestamp) AS last_updated FROM `rusers` INNER JOIN `node` ON `node`.`uid` = `rusers`.`id` INNER JOIN `node_revisions` ON `node_revisions`.`nid` = `node`.`nid` WHERE (node_revisions.status = 1) AND (rusers.status = 1) GROUP BY rusers.id ORDER BY last_updated LIMIT 30 OFFSET 0
  User Load (0.6ms)  SELECT  *, rusers.status, MAX(node_revisions.timestamp) AS last_updated FROM `rusers` INNER JOIN `node` ON `node`.`uid` = `rusers`.`id` INNER JOIN `node_revisions` ON `node_revisions`.`nid` = `node`.`nid` WHERE (node_revisions.status = 1) AND (rusers.status = 1) GROUP BY rusers.id ORDER BY last_updated LIMIT 11 OFFSET 0
=> #<User::ActiveRecord_Relation:0x2b28a2df3f2c>

Very strange! When you run a diff on the generated SQL from what I'm running in the console vs. what is rendering on the Rails app page at the /people route, which runs the SAME Rails statement, the SQL from both is essentially identical!

Mysql2::Error: Expression #28 of SELECT list is not in GROUP BY clause and contains nonaggregated column 'plots.node.nid' which is not functionally dependent on columns in GROUP BY clause; this is incompatible with sql_mode=only_full_group_by: SELECT  *, rusers.status, MAX(node_revisions.timestamp) AS last_updated FROM `rusers` INNER JOIN `node` ON `node`.`uid` = `rusers`.`id` INNER JOIN `node_revisions` ON `node_revisions`.`nid` = `node`.`nid` WHERE (node_revisions.status = 1) AND (rusers.status = 1) GROUP BY rusers.id ORDER BY last_updated DESC LIMIT 30 OFFSET 0

And yet, the /people route shows this error while the console line does not!

@icarito suggested maybe the environment with sql mode may vary between the console env and the running app, even within GitPod. So I tried closing the app and restarting it from the same shell that I'd run the console from. But the running app still shows the error while the console does not! And both are in development mode with the same database.yml.

🤯 😭

You can test this yourself by opening https://gitpod.io/#https://github.com/publiclab/plots2/ -- the query itself is on line 142 of /app/controllers/users_controller.rb -- and then going to /people in the running app!

@icarito
Copy link
Member

icarito commented Aug 4, 2020

Hi,
I believe I know that sql_mode needs to be changed to either: TRADITIONAL,
or if that doesn't work, to NO_AUTO_CREATE_USER,STRICT_TRANS_TABLES,NO_ENGINE_SUBSTITUTION (which is the default minus the problematic ONLY_FULL_GROUP_BY which is what is giving us problems. I tried changing this by adding:

  set_session:
    sql_mode: ...

in database.yml.gitpod and also directly in database.yml, and restarted the workspace, but I still see from the logs that something is issueing this command:
(0.5ms) SET NAMES utf8, @@SESSION.sql_mode = CONCAT(REPLACE(REPLACE(REPLACE(@@sql_mode, 'STRICT_TRANS_TABLES', ''), 'STRICT_ALL_TABLES', ''), 'TRADITIONAL', ''), ',NO_AUTO_VALUE_ON_ZERO'), @@SESSION.sql_auto_is_null = 0, @@SESSION.wait_timeout = 2147483

Which is seems to be resetting the value. Do you know where this could be happening?

dms-yondy pushed a commit to dms-yondy/plots2 that referenced this issue Aug 7, 2020
nadimakhtar97 pushed a commit to nadimakhtar97/plots2 that referenced this issue Sep 21, 2020
@jywarren
Copy link
Member Author

jywarren commented Oct 8, 2020

Just a thought -- we could just make all the .group() clauses conditional on development mode maybe, as a short term fix for Outreachy folks?

jywarren added a commit that referenced this issue Oct 8, 2020
Re: #8177 as temp solution:

```
@tags.group(:name) unless Rails.env.development? # for GitPod compatibility; #8117
```
shubhangikori pushed a commit to shubhangikori/plots2 that referenced this issue Oct 12, 2020
alvesitalo pushed a commit to alvesitalo/plots2 that referenced this issue Oct 14, 2020
piyushswain pushed a commit to piyushswain/plots2 that referenced this issue Oct 22, 2020
@jywarren
Copy link
Member Author

jywarren commented Jan 8, 2021

I thought these would now be resolved in GitPod due to #8856, but /people is still showing:

Mysql2::Error: Expression #28 of SELECT list is not in GROUP BY clause and contains nonaggregated column 'plots.node.nid' which is not functionally dependent on columns in GROUP BY clause; this is incompatible with sql_mode=only_full_group_by: SELECT  *, rusers.status, MAX(node_revisions.timestamp) AS last_updated FROM `rusers` INNER JOIN `node` ON `node`.`uid` = `rusers`.`id` INNER JOIN `node_revisions` ON `node_revisions`.`nid` = `node`.`nid` WHERE (node_revisions.status = 1) AND (rusers.status = 1) GROUP BY rusers.id ORDER BY last_updated DESC LIMIT 30 OFFSET 0

SELECT list is not in GROUP BY clause and contains nonaggregated column 'plots.node.nid' which is not functionally dependent on columns in GROUP BY clause; this is incompatible with sql_mode=only_full_group_by

@jywarren
Copy link
Member Author

jywarren commented Jan 8, 2021

I tried following @icarito's suggestion for TRADITIONAL or NO_AUTO_CREATE_USER,STRICT_TRANS_TABLES,NO_ENGINE_SUBSTITUTION above, but the line we've figure out how to change GitPod values with only splices out ONLY_FULL_GROUP_BY rather than sets the full value:

mysql -e "SET @@global.sql_mode=(SELECT REPLACE(@@global.sql_mode, 'TRADITIONAL', ''));"

So that didn't work -- so i tried setting the full string but I got Variable 'sql_mode' can't be set to the value of 'NO_AUTO_CREATE_USER'

$ mysql -e "SET @@global.sql_mode='NO_AUTO_CREATE_USER,STRICT_TRANS_TABLES,NO_ENGINE_SUBSTITUTION';"
ERROR 1231 (42000) at line 1: Variable 'sql_mode' can't be set to the value of 'NO_AUTO_CREATE_USER'

@icarito i think we can try one of your suggestions by changing this line in GitPod now:

mysql -e "SET @@global.sql_mode=(SELECT REPLACE(@@global.sql_mode, 'ONLY_FULL_GROUP_BY', ''));" &&

Can you suggest a different version of this line? Thank you!!

Update: just thought of this -- the above line is test-able within GitPod, actually so it's hopefully faster to iterate on to try out than other ways of setting sql_mode --

@jywarren
Copy link
Member Author

jywarren commented Jan 8, 2021

Otherwise we can make a new attempt to turn off grouping in development mode, or just in GitPod maybe... following the example in https://github.com/publiclab/plots2/pull/8499/files but with our new pagy pagination...

ok that didn't work -- #8979

@jywarren
Copy link
Member Author

Another option is to switch to MariaDB in GitPod -- like how we did here: #8874

@jywarren
Copy link
Member Author

We ultimately solved this by fixing the syntax of the core SQL query itself! There may be some leftover queries to fix but this method worked. And now we should be more compatible with mysql/mariadb versions!

jywarren added a commit that referenced this issue Feb 9, 2021
* Change ANY_VALUE aggregate function to MariaDB compatible MAX

https://stackoverflow.com/a/54173805 post #8499 #9000 #8177 #8251

* Update users_controller.rb
manchere pushed a commit to manchere/plots2 that referenced this issue Feb 13, 2021
manchere pushed a commit to manchere/plots2 that referenced this issue Feb 13, 2021
lagunasmel pushed a commit to lagunasmel/plots2 that referenced this issue Mar 2, 2021
lagunasmel pushed a commit to lagunasmel/plots2 that referenced this issue Mar 2, 2021
reginaalyssa pushed a commit to reginaalyssa/plots2 that referenced this issue Oct 16, 2021
reginaalyssa pushed a commit to reginaalyssa/plots2 that referenced this issue Oct 16, 2021
ampwang pushed a commit to ampwang/plots2 that referenced this issue Oct 26, 2021
billymoroney1 pushed a commit to billymoroney1/plots2 that referenced this issue Dec 28, 2021
billymoroney1 pushed a commit to billymoroney1/plots2 that referenced this issue Dec 28, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug the issue is regarding one of our programs which faces problems when a certain task is executed high-priority
Projects
None yet
2 participants