select rows based on simple criteria #8

chimeno · 2018-10-03T18:24:41Z

Would it be possible to select rows based on very simple criteria?

I mean, I have a database with a very large table, and I would like to
select all rows from every table except the data table, where I want only 1000000 rows ordered by timestamp DESC.

Thanks for the library.

mla · 2018-10-05T06:08:47Z

I could see that. What do you think the command syntax should look like for that?

chimeno · 2018-10-05T08:31:58Z

I'm currently using:
./pg_sample --limit="*=*, nodes_data=1000000"

I guess something like:

./pg_sample --limit="*=*, nodes_data=1000000;order by timestamp DESC"

or

./pg_sample --limit="*=*, nodes_data=1000000(order by timestamp DESC)"

Either form should be easy to parse, and it is extensible in case other criteria are added.
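The semicolon form could be parsed roughly as follows. This is a minimal Python sketch of the *proposed* syntax only: `LimitRule` and `parse_limit_spec` are hypothetical names, not part of pg_sample, and because rules are split on commas, a multi-column ORDER BY (e.g. `order by a, b`) would need a smarter tokenizer.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class LimitRule:
    table: str            # table name, or "*" for every table
    limit: str            # "*" (all rows) or a row count such as "1000000"
    order: Optional[str]  # optional ORDER BY text, e.g. "timestamp DESC"


def parse_limit_spec(spec: str) -> List[LimitRule]:
    """Parse the proposed "tbl=N;order by col DESC" syntax (hypothetical).

    Limitation: rules are split on commas, so "order by a, b" would break.
    """
    rules = []
    for raw in spec.split(","):
        raw = raw.strip()
        if not raw:
            continue
        table, sep, value = raw.partition("=")
        if not sep:
            raise ValueError(f"rule without '=': {raw!r}")
        # Everything after the first ";" is an extra criterion.
        limit, _, extra = value.partition(";")
        extra = extra.strip()
        order = None
        if extra.lower().startswith("order by"):
            order = extra[len("order by"):].strip()
        elif extra:
            raise ValueError(f"unsupported criterion: {extra!r}")
        rules.append(LimitRule(table.strip(), limit.strip(), order))
    return rules


if __name__ == "__main__":
    for rule in parse_limit_spec("*=*, nodes_data=1000000;order by timestamp DESC"):
        print(rule)
```

Raising on unknown criteria, rather than ignoring them, leaves room to add new keywords after the `;` later without silently misreading old specs.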

mla · 2021-10-04T05:32:40Z

You should be able to specify a WHERE condition in parentheses after the =, e.g.,

--limit="users=(user_id < 10)"
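To make those rule values concrete, here is a hypothetical Python helper showing how a single `--limit` value could map onto a SELECT: `*` means all rows, a bare number becomes a LIMIT, and a parenthesized condition becomes a WHERE clause. It is an illustration under those assumptions, not pg_sample's actual query builder; `build_select` is an invented name, and the table name is assumed to be a trusted identifier that is already quoted if needed.

```python
def build_select(table: str, value: str, order_by: str = "") -> str:
    """Hypothetical: turn one --limit value into a SELECT statement.

    "*"      -> all rows
    "1000"   -> LIMIT 1000
    "(cond)" -> WHERE cond   (cond is raw SQL supplied by the user)
    """
    value = value.strip()
    where = limit = ""
    if value == "*":
        pass  # no restriction
    elif value.isdigit():
        limit = f"LIMIT {int(value)}"
    elif value.startswith("(") and value.endswith(")"):
        where = f"WHERE {value[1:-1].strip()}"
    else:
        raise ValueError(f"unrecognised limit value: {value!r}")
    order = f"ORDER BY {order_by}" if order_by else ""
    # SQL clause order: WHERE, then ORDER BY, then LIMIT.
    parts = [f"SELECT * FROM {table}", where, order, limit]
    return " ".join(p for p in parts if p)
```

For example, `build_select("users", "(user_id < 10)")` returns `SELECT * FROM users WHERE user_id < 10`, and passing `order_by="timestamp DESC"` with a numeric value covers the original request.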

lustickd · 2022-05-17T17:10:21Z

@mla I have a similar question: is it possible to select rows from EVERY table in DESC order? I think all (or most) tables in Rails, for example, have a "created_at" column, so it would be nice to sample rows with ORDER BY created_at DESC by default, since the early rows in a big database are usually mostly inactive. I'm trying --random, but it might be too slow for my purposes.

mla · 2022-05-22T17:14:19Z

Hey @lustickd. Sorry for the delay in responding.

You can try this patch, which should just force that ORDER BY for every table.

diff --git a/pg_sample b/pg_sample
index a73af39..a1b5ec8 100755
--- a/pg_sample
+++ b/pg_sample
@@ -630,6 +630,7 @@ while (my $row = lower_keys($sth->fetchrow_hashref)) {
       notice "No candidate key found for '$table'; ignoring --ordered";
     }
   }
+  $order = 'ORDER BY created_at DESC';

We'd have to work out how to express that for general use. Rails doesn't automatically create an index on every created_at column, does it? That would be my worry with really large tables: without an index, ORDER BY created_at DESC has to scan the whole table to find the newest rows.

mla · 2022-05-22T18:27:11Z

You might try this:

--- a/pg_sample
+++ b/pg_sample
@@ -624,7 +624,11 @@ while (my $row = lower_keys($sth->fetchrow_hashref)) {
   } elsif ($opt{ordered}) {
     my @cols = find_candidate_key($table);
     if (@cols) {
-      my $cols = join ', ', map { $dbh->quote_identifier($_) } @cols;
+      my $cols = join ', ',
+        map { "$_ DESC" }
+        map { $dbh->quote_identifier($_) }
+        @cols
+      ;
       $order = "ORDER BY $cols";
     } else {
       notice "No candidate key found for '$table'; ignoring --ordered";

Then pass the --ordered option. We order by the first candidate key we find. Rails tables usually have an "id" column, which I would expect to roughly track created_at. The patch above just adds DESC to those columns, which seems like a reasonable default for that option anyway.
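The chained `map` in the patch runs bottom-up: each column is quoted first, then " DESC" is appended, so DESC stays outside the quotes. A rough Python equivalent, assuming the standard PostgreSQL quoting rule (wrap in double quotes, double any embedded double quote) as a stand-in for DBI's `quote_identifier`; `quote_ident` and `desc_order_clause` are hypothetical names:

```python
from typing import Sequence


def quote_ident(name: str) -> str:
    """Quote a PostgreSQL identifier: wrap in double quotes, double embedded ones."""
    return '"' + name.replace('"', '""') + '"'


def desc_order_clause(cols: Sequence[str]) -> str:
    """Build 'ORDER BY "a" DESC, "b" DESC' from candidate-key columns.

    Same order of operations as the patch: quote, append DESC, join with ", ".
    """
    if not cols:
        raise ValueError("no candidate key columns")
    return "ORDER BY " + ", ".join(f"{quote_ident(c)} DESC" for c in cols)
```

Appending DESC before quoting would instead produce `"id DESC"`, which PostgreSQL would read as a single column name.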

lustickd · 2022-05-23T23:37:00Z

Ah, that makes sense, thanks. Yeah, I think created_at doesn't have an index, so I'll go with the id method 👍

I did mess around a little with tsm_system_rows for random sampling, and it's significantly faster than ORDER BY random() on a table with 40 million rows: about 300 milliseconds per table instead of 30 seconds. As far as I can tell, ORDER BY random() has to scan the entire table and compute random() for every row before it can pick any, which makes it extremely slow.
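For comparison, the two sampling queries look roughly like this. The helper names are hypothetical and this says nothing about what pg_sample itself generates; `TABLESAMPLE SYSTEM_ROWS(n)` requires the `tsm_system_rows` contrib extension (PostgreSQL 9.5 and later) to be installed with `CREATE EXTENSION tsm_system_rows`. Table names are assumed to be trusted identifiers.

```python
def random_sample_sql(table: str, n: int) -> str:
    """Uniform random sample: visits every row and evaluates random() for each."""
    return f"SELECT * FROM {table} ORDER BY random() LIMIT {int(n)}"


def system_rows_sample_sql(table: str, n: int) -> str:
    """Block-level sample via tsm_system_rows.

    Requires: CREATE EXTENSION tsm_system_rows;
    Stops after reading enough blocks to return n rows.
    """
    return f"SELECT * FROM {table} TABLESAMPLE SYSTEM_ROWS({int(n)})"
```

The trade-off, as the PostgreSQL documentation describes it: because SYSTEM_ROWS samples whole blocks, the result is not completely random and can show clustering effects, especially when only a small number of rows is requested.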
