
Implement data collection for the approval ratio graphs #2484

Closed
flodolo opened this issue Apr 20, 2022 · 7 comments
Assignees
Labels
enhancement, P2 (We want to ship it soon, possibly in the current quarter), profile (Profile page redesign)

Comments

@flodolo
Collaborator

flodolo commented Apr 20, 2022

Reference: specs

We need to collect data about user contributions: approval ratio and self-approval ratio. See the specs for more details.

flodolo added this to To do in Profile Page Redesign via automation Apr 20, 2022
mathjazz added the profile (Profile page redesign) label Apr 20, 2022
mathjazz changed the title from "[profile] Implement data collection for new profile graphs" to "Implement data collection for new profile graphs" Apr 20, 2022
@eemeli
Member

eemeli commented Apr 20, 2022

I'm a bit concerned that these ratio graphs would end up hiding valuable absolute-value data. It should be relatively straightforward to use actual data to validate our assumptions about, e.g., how month-to-month variance in the number of submitted suggestions affects the resulting ratios.

As a counter-proposal, my gut feeling is that a single stacked bar graph showing approved/self-approved/rejected/unreviewed strings for each month would be more informative.
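To make the concern concrete, here is a toy Python comparison using made-up monthly counts (not real Pontoon data). The helper `approval_ratio` is hypothetical; it divides approvals by all peer reviews, which is how the script later in this thread computes the approval ratio.

```python
# Made-up peer-review counts for one contributor (illustration only).
monthly_counts = {
    "2022-01": {"approved": 1, "rejected": 0},
    "2022-02": {"approved": 90, "rejected": 10},
    "2022-03": {"approved": 100, "rejected": 0},
}


def approval_ratio(approved: int, rejected: int) -> float:
    """Approved share of reviewed strings; 0.0 if nothing was reviewed."""
    reviewed = approved + rejected
    return approved / reviewed if reviewed else 0.0


for month, counts in monthly_counts.items():
    ratio = approval_ratio(counts["approved"], counts["rejected"])
    reviewed = counts["approved"] + counts["rejected"]
    # The ratio alone cannot distinguish 1 reviewed string from 100.
    print(f"{month}: ratio={ratio:.2f}, reviewed={reviewed}")
```

January and March both print `ratio=1.00`, even though one month has a single reviewed string and the other has 100; a stacked bar of absolute counts would make that difference visible.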

mathjazz changed the title from "Implement data collection for new profile graphs" to "Implement data collection for the approval ratio graphs" Apr 21, 2022
mathjazz added the P2 (We want to ship it soon, possibly in the current quarter) label Apr 28, 2022
mathjazz self-assigned this Aug 4, 2022
@mathjazz
Collaborator

@flodolo What are your thoughts about the alternative chart?

@flodolo
Collaborator Author

flodolo commented Aug 10, 2022

This issue is about data collection, not graphs (that would be #2487). Is there anything here that would change based on the graph we end up using?

@mathjazz
Collaborator

Yeah, we'd need to store different data (absolute numbers instead of ratios) and additional data (unreviewed strings).

@flodolo
Collaborator Author

flodolo commented Aug 10, 2022

I would prefer to stick with the proposal in the specs.

The original goal for this part was to quickly get a sense of the quality of contributions (for a manager: is someone else reviewing their translations? For a new contributor: how many of their suggestions get rejected?) and of how this changes over time.

With a stacked graph, it would be impossible to see how these values evolve (e.g. the approval ratio, if I want to promote someone to translator).

To get a sense of the size of the contribution, there's already the Contribution graph part. The number of unreviewed suggestions belongs there.

In terms of data, I guess nothing prevents us from storing absolute data + ratio (or calculating the ratio on the fly) if we want?
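One way to store absolute numbers and derive the ratios on the fly is sketched below. `MonthlyStats` and its field names are hypothetical, not Pontoon models. The approval ratio follows the script later in this thread; treating the self-approval ratio as self-approved / (self-approved + peer-approved) is an assumption about what the specs mean.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MonthlyStats:
    """Absolute counts for one contributor in one month (hypothetical structure)."""

    peer_approved: int
    peer_rejected: int
    self_approved: int

    @property
    def approval_ratio(self) -> float:
        # Approved share of the strings reviewed by someone else.
        reviewed = self.peer_approved + self.peer_rejected
        return self.peer_approved / reviewed if reviewed else 0.0

    @property
    def self_approval_ratio(self) -> float:
        # Self-approved share of all approved strings (assumed definition).
        approved = self.self_approved + self.peer_approved
        return self.self_approved / approved if approved else 0.0


stats = MonthlyStats(peer_approved=30, peer_rejected=10, self_approved=10)
print(stats.approval_ratio)       # 0.75
print(stats.self_approval_ratio)  # 0.25
```

Because the counts are the source of truth, the same records could also feed a stacked bar of absolute values without storing anything extra.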

@mathjazz
Collaborator

In terms of data, I guess nothing prevents us from storing absolute data + ratio (or calculating the ratio on the fly) if we want?

Correct.

@mathjazz
Collaborator

It turns out we can collect the relevant data on each page load fast enough.

No need for cron jobs or for storing data in the database.

I'll paste the script I used locally and use it in #2486.

I'll close the issue.

import datetime

from dateutil.relativedelta import relativedelta

from django.contrib.auth.models import User
from django.db.models import Count, F, Q
from django.db.models.functions import TruncMonth
from django.utils import timezone

from pontoon.actionlog.models import ActionLog
from pontoon.base.utils import convert_to_unix_time

today = timezone.now().date()

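# Timestamps of the first day of each of the last 25 months (current month
# included), oldest first; each one is a monthly bucket in the graphs.
dates = sorted(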
    [
        convert_to_unix_time(
            datetime.date(today.year, today.month, 1) - relativedelta(months=n)
        )
        for n in range(25)
    ]
)

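def extract_data(qs):
    # Count the actions in qs per calendar month and put each count in the
    # slot of `dates` for that month; months without actions stay at 0.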
    values = [0] * 25
    for item in (
        qs.annotate(created_month=TruncMonth("created_at"))
        .values("created_month")
        .annotate(count=Count("id"))
        .values("created_month", "count")
    ):
        date = convert_to_unix_time(item["created_month"])
        index = dates.index(date)
        values[index] = item["count"]
    return values

u = User.objects.get(email="flod+pontoon@mozilla.com")
# All actions on the user's translations over the last two years
actions = ActionLog.objects.filter(
    created_at__gte=timezone.now() - relativedelta(years=2),
    translation__user=u,
)

# Actions on the user's translations performed by other users
peer_actions = actions.exclude(performed_by=u)
peer_approvals = extract_data(
    peer_actions.filter(action_type=ActionLog.ActionType.TRANSLATION_APPROVED)
)
peer_rejections = extract_data(
    peer_actions.filter(action_type=ActionLog.ActionType.TRANSLATION_REJECTED)
)

# Actions the user performed on their own translations
self_actions = actions.filter(performed_by=u)
self_approvals = extract_data(
    self_actions.filter(
        # self-approved after submitting suggestions
        Q(action_type=ActionLog.ActionType.TRANSLATION_APPROVED)
        # submitted directly as translations
        | Q(
            action_type=ActionLog.ActionType.TRANSLATION_CREATED,
            translation__date=F("translation__approved_date"),
        )
    )
)

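# Approval ratio per month: peer_approvals / (peer_approvals + peer_rejections)
approval_ratio = [
    0 if sum(pair) == 0 else (pair[0] / sum(pair))
    for pair in zip(peer_approvals, peer_rejections)
]

# Self-approval ratio per month: self_approvals / (self_approvals + peer_approvals).
# self_approvals must come first in the zip so that pair[0] is the numerator.
self_approval_ratio = [
    0 if sum(pair) == 0 else (pair[0] / sum(pair))
    for pair in zip(self_approvals, peer_approvals)
]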
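For readers without a Django setup, the month bucketing that `dates` and `extract_data` perform above can be sketched in plain Python. This is an illustrative equivalent, not the code used in Pontoon: `month_start` and `bucket_by_month` are hypothetical helpers, timestamps are assumed to already be in the desired timezone, and buckets are compared as `date` objects instead of converted Unix times. Unlike the script, where `dates.index` would raise `ValueError`, timestamps outside the window are silently ignored.

```python
import datetime
from collections import Counter


def month_start(day: datetime.date, months_back: int) -> datetime.date:
    """First day of the month that is `months_back` months before `day`'s month."""
    total = day.year * 12 + (day.month - 1) - months_back
    return datetime.date(total // 12, total % 12 + 1, 1)


def bucket_by_month(timestamps, today: datetime.date, months: int = 25) -> list:
    """Count timestamps per calendar month over the last `months` months.

    The result is ordered oldest month first, like `dates` in the script.
    Timestamps outside the window are ignored.
    """
    buckets = [month_start(today, n) for n in reversed(range(months))]
    position = {bucket: i for i, bucket in enumerate(buckets)}
    counts = [0] * months
    # Truncate every timestamp to the first day of its month, then count.
    per_month = Counter(datetime.date(t.year, t.month, 1) for t in timestamps)
    for month, count in per_month.items():
        if month in position:
            counts[position[month]] = count
    return counts


# Made-up events for illustration.
today = datetime.date(2022, 8, 15)
events = [
    datetime.datetime(2020, 8, 20, 12, 0),  # oldest bucket
    datetime.datetime(2022, 7, 31, 23, 0),
    datetime.datetime(2022, 8, 1, 9, 0),
    datetime.datetime(2022, 8, 3, 18, 30),
    datetime.datetime(2019, 1, 1),  # outside the window, ignored
]
counts = bucket_by_month(events, today)
```

Using a dict from bucket to position keeps each lookup constant-time, whereas `list.index` scans the list on every call; with only 25 buckets either choice is fast.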

3 participants