Skip to content

Latest commit

 

History

History
162 lines (127 loc) · 5.37 KB

SQL39-某乎问答最大连续回答问题天数大于等于3天的用户及其等级.md

File metadata and controls

162 lines (127 loc) · 5.37 KB

SQL39 某乎问答最大连续回答问题天数大于等于3天的用户及其等级

较难  通过率:51.47%  时间限制:1秒  空间限制:256M

描述

现有某乎问答创作者信息表author_tb如下(其中author_id表示创作者编号、author_level表示创作者级别,共1-6六个级别、sex表示创作者性别):

author_id author_level sex
101 6 m
102 1 f
103 1 m
104 3 m
105 4 f
106 2 f
107 2 m
108 5 f
109 6 f
110 5 m

创作者回答情况表answer_tb如下(其中answer_date表示创作日期、author_id指创作者编号、issue_id指回答问题编号、char_len表示回答字数):

answer_date author_id issue_id char_len
2021-11-01 101 E001 150
2021-11-01 101 E002 200
2021-11-01 102 C003 50
2021-11-01 103 P001 35
2021-11-01 104 C003 120
2021-11-01 105 P001 125
2021-11-01 102 P002 105
2021-11-02 101 P001 201
2021-11-02 110 C002 200
2021-11-02 110 C001 225
2021-11-02 110 C002 220
2021-11-03 101 C002 180
2021-11-04 109 E003 130
2021-11-04 109 E001 123
2021-11-05 108 C001 160
2021-11-05 108 C002 120
2021-11-05 110 P001 180
2021-11-05 106 P002 45
2021-11-05 107 E003 56

请你统计最大连续回答问题的天数大于等于3天的用户及其等级(若有多条符合条件的数据,按author_id升序排序),以上例子的输出结果如下:

author_id author_level days_cnt
101 6 3

答案

select F.author_id, author_level, days_cnt
from
### table F
(select author_id, sum(ddiff) + 1 as days_cnt
from
(select author_id, datediff(date_lead, answer_date) as ddiff
from
(select author_id, answer_date, lead(answer_date) over (partition by author_id order by answer_date) as date_lead
from
(select distinct author_id, answer_date from answer_tb order by author_id) as x) as y having ddiff=1) as z
group by author_id
having days_cnt >= 3
 ) as F 
### table F
join author_tb as auth
on F.author_id = auth.author_id

思路

本题的难点在于计算每个用户的「最大连续回答天数」。

可以看到,最后的答案像套娃一样😂 套了5个select。那么我们就从最里面开始,一层一层的分解。

首先要明确的是,answer_tb里面有用的column是answer_dateauthor_id,其他的都没用。author_tb只是用来最后贴标签的,除此之外也没有用。

以下步骤用用户101进行示意。


1️⃣ 把要计算的两个column answer_dateauthor_idanswer_tb里面提取出来。

select distinct author_id, answer_date from answer_tb
order by author_id) as x

用了distinct,就可以去重一个用户在同一天回答多个问题的情况。

当然可以在最后输出答案的时候再order by,这里为了看得更清楚直观,先按author_id排序。

author_id answer_date
101 2021-11-01
101 2021-11-02
101 2021-11-03
102 2011-11-01
103 ...
... ...

2️⃣ 我们用lead这个window function,得到一个新的column,值是把每个用户的answer_date这个column上移一格。

(select author_id, answer_date, 
lead(answer_date) over (partition by author_id order by answer_date) as date_lead
from x) as y
author_id answer_date date_lead
101 2021-11-01 2021-11-02
101 2021-11-02 2021-11-03
101 2021-11-03 null
102 2021-11-01 null
... ... ...
110 2021-11-02 2021-11-05
... ... ...

3️⃣ 用datediff这个date function,把两个日期column相减,得到ddiff这个column,值为1的才起码是连续的。

(select author_id, datediff(date_lead, answer_date) as ddiff
from y having ddiff=1) as z
author_id ddiff
101 1
101 1

4️⃣把每个用户的ddiff加起来(再加1)即是最大连续回答天数。

(select author_id, sum(ddiff) + 1 as days_cnt
from z
group by author_id
having days_cnt >= 3
 ) as F 
author_id days_cnt
101 3

5️⃣最后再按题目要求贴上author_level的标签即可。

select F.author_id, author_level, days_cnt
from F
join author_tb as auth
on F.author_id = auth.author_id
author_id author_level days_cnt
101 6 3