Implement data collection for the approval ratio graphs #2484

flodolo · 2022-04-20T09:07:47Z

Reference: specs

We need to collect data about user contributions: approval ratio, self-approval ratio. See specs for more details.

eemeli · 2022-04-20T12:04:09Z

I'm a bit concerned that these ratio graphs would end up hiding valuable absolute-value data. It should be relatively straightforward to use actual data to validate assumptions about, e.g., how month-to-month variance in the number of submitted suggestions affects the resulting ratios.

As a counter-proposal, my gut feeling is that a single stacked bar graph showing approved/self-approved/rejected/unreviewed strings for each month would be more informative.

mathjazz · 2022-08-10T11:06:22Z

@flodolo What are your thoughts about the alternative chart?

flodolo · 2022-08-10T11:30:10Z

This issue is about data collection, not graphs (that would be #2487). Is there anything here that would change based on the graph we end up using?

mathjazz · 2022-08-10T11:45:50Z

Yeah, we'd need to store different (absolute numbers instead of ratios) and additional (unreviewed) data.

flodolo · 2022-08-10T12:32:08Z

I would prefer to stick with the proposal in the specs.

The original goal for this part was to quickly get a sense of the quality of contributions (for a manager: is someone else reviewing their translations? For a new contributor: how many are rejected?) and how this changes over time.

With a stacked graph, it would be impossible to see how these values evolve (e.g. approval-ratio if I want to promote someone to translator).

To get a sense of the size of the contribution, there's already the Contribution graph part. The number of unreviewed belongs to that.

In terms of data, I guess nothing prevents us from storing absolute data + ratio (or calculating the ratio on the fly) if we want?

mathjazz · 2022-08-10T12:33:37Z

> In terms of data, I guess nothing prevents us from storing absolute data + ratio (or calculating the ratio on the fly) if we want?

Correct.
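As a rough illustration of the "absolute data + ratio on the fly" idea, here is a minimal sketch. The `MonthlyCounts` class and its field names are hypothetical and not part of Pontoon, and the ratio definitions (approvals over peer-reviewed items, self-approvals over all approvals) are an assumption about what the specs intend.

```python
from dataclasses import dataclass


@dataclass
class MonthlyCounts:
    """Absolute review counts for one month (hypothetical structure)."""

    peer_approved: int = 0
    peer_rejected: int = 0
    self_approved: int = 0

    @property
    def approval_ratio(self) -> float:
        # Assumed definition: share of peer-reviewed translations approved.
        reviewed = self.peer_approved + self.peer_rejected
        return self.peer_approved / reviewed if reviewed else 0.0

    @property
    def self_approval_ratio(self) -> float:
        # Assumed definition: share of all approvals done by the author.
        approved = self.peer_approved + self.self_approved
        return self.self_approved / approved if approved else 0.0


month = MonthlyCounts(peer_approved=30, peer_rejected=10, self_approved=10)
print(month.approval_ratio)       # 0.75
print(month.self_approval_ratio)  # 0.25
```

Keeping only the counts means either chart (ratio lines or stacked bars) could be drawn from the same data.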

mathjazz · 2022-08-18T00:22:51Z

It turns out we can collect the relevant data on each page load fast enough.

No need for cron jobs and storing data in the database.

I'll paste the script I used locally and use it in #2486.

I'll close the issue.

import datetime

from dateutil.relativedelta import relativedelta

from django.contrib.auth.models import User
from django.db.models import Count, F, Q
from django.db.models.functions import TruncMonth
from django.utils import timezone

from pontoon.actionlog.models import ActionLog
from pontoon.base.utils import convert_to_unix_time

today = timezone.now().date()

dates = sorted(
    [
        convert_to_unix_time(
            datetime.date(today.year, today.month, 1) - relativedelta(months=n)
        )
        for n in range(25)
    ]
)

def extract_data(qs):
    values = [0] * 25
    for item in (
        qs.annotate(created_month=TruncMonth("created_at"))
        .values("created_month")
        .annotate(count=Count("id"))
        .values("created_month", "count")
    ):
        date = convert_to_unix_time(item["created_month"])
        index = dates.index(date)
        values[index] = item["count"]
    return values

u = User.objects.get(email="flod+pontoon@mozilla.com")
actions = ActionLog.objects.filter(
    created_at__gte=timezone.now() - relativedelta(years=2),
    translation__user=u,
)

peer_actions = actions.exclude(performed_by=u)
peer_approvals = extract_data(
    peer_actions.filter(action_type=ActionLog.ActionType.TRANSLATION_APPROVED)
)
peer_rejections = extract_data(
    peer_actions.filter(action_type=ActionLog.ActionType.TRANSLATION_REJECTED)
)

self_actions = actions.filter(performed_by=u)
self_approvals = extract_data(
    self_actions.filter(
        # self-approved after submitting suggestions
        Q(action_type=ActionLog.ActionType.TRANSLATION_APPROVED)
        # submitted directly as translations
        | Q(
            action_type=ActionLog.ActionType.TRANSLATION_CREATED,
            translation__date=F("translation__approved_date"),
        )
    )
)

approval_ratio = [
    0 if sum(pair) == 0 else (pair[0] / sum(pair))
    for pair in zip(peer_approvals, peer_rejections)
]

# self-approvals as a share of all approvals (self + peer)
self_approval_ratio = [
    0 if sum(pair) == 0 else (pair[0] / sum(pair))
    for pair in zip(self_approvals, peer_approvals)
]
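The month-bucketing step in `extract_data` can be tried without Django. The sketch below uses only the standard library; `month_starts` and `fill_buckets` are hypothetical helper names, and the per-month counts are passed in as a plain dict rather than coming from a queryset.

```python
import datetime


def month_starts(today, months=25):
    """Return the first day of each of the last `months` months, oldest first."""
    starts = []
    year, month = today.year, today.month
    for _ in range(months):
        starts.append(datetime.date(year, month, 1))
        # Step back one month, wrapping from January to December.
        month -= 1
        if month == 0:
            year, month = year - 1, 12
    return sorted(starts)


def fill_buckets(starts, counts_by_month):
    """Place per-month counts into a list aligned with `starts`; gaps stay 0."""
    index = {start: i for i, start in enumerate(starts)}
    values = [0] * len(starts)
    for month_start, count in counts_by_month.items():
        if month_start in index:  # ignore months outside the window
            values[index[month_start]] = count
    return values


starts = month_starts(datetime.date(2022, 8, 18))
values = fill_buckets(
    starts,
    {datetime.date(2022, 8, 1): 7, datetime.date(2020, 8, 1): 3},
)
print(len(values), values[0], values[-1])  # 25 3 7
```

Unlike `dates.index(date)` in the script, the dict lookup here skips months that fall outside the window instead of raising `ValueError`.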

flodolo added this to To do in Profile Page Redesign via automation Apr 20, 2022

This was referenced Apr 20, 2022

Implement new design layout (without contribution graph) #2486

Closed

Implement contribution graph #2487

Closed

mathjazz added the profile Profile page redesign label Apr 20, 2022

mathjazz changed the title ~~[profile] Implement data collection for new profile graphs~~ Implement data collection for new profile graphs Apr 20, 2022

mathjazz changed the title ~~Implement data collection for new profile graphs~~ Implement data collection for the approval ratio graphs Apr 21, 2022

mathjazz added the P2 We want to ship it soon, possibly in the current quarter label Apr 28, 2022

mathjazz added the enhancement label Aug 4, 2022

mathjazz self-assigned this Aug 4, 2022

mathjazz closed this as completed Aug 18, 2022
