
Inconsistency in meal price outlier classifier #489

Open
sergiomario opened this issue Jul 15, 2019 · 3 comments

@sergiomario
Collaborator

What is the problem?
In some tweets published by Rosie's Twitter account, the value flagged as suspicious is within the standard deviation range established by the classifier.

This can be verified in the following suspicion:
Suspicion tweet
Jarbas document
In this case the value is only 34.50 BRL.

How can this be addressed?
I think it is necessary to adjust the classifier rules or improve the training set.

@cuducos
Collaborator

cuducos commented Jul 15, 2019

Arguably, this example does not show what you claim it does.

1. Average value for this venue is R$ 13

Using Jarbas's shell_plus, we can see that claims for this venue are low in terms of total value: on average, each claim sums to R$ 13.22:

In [1]: import statistics

In [2]: values = tuple(r.total_net_value for r in Reimbursement.objects.filter(cnpj_cpf='05467695000130'))

In [3]: sum(values) / len(values)
Out[3]: Decimal('13.22262773722627737226277372')

2. Standard deviation is R$ 6

Also, the standard deviation is around R$ 6.66 (hey devil 😈):

In [4]: statistics.stdev(values)
Out[4]: Decimal('6.655636628195527505758211117')
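For reference, the two quantities above are the arithmetic mean and the sample standard deviation. Python's statistics.stdev uses the n − 1 (Bessel-corrected) denominator; statistics.pstdev would divide by n instead. With n claims of values x_1, …, x_n:

```latex
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i,
\qquad
s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2}
```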

3. Thus R$ 34 happens to be above the threshold

Thus the threshold, the mean plus three standard deviations, is R$ 33.19, below the example value of R$ 34.05:

In [5]: (sum(values) / len(values)) + (3 * statistics.stdev(values))
Out[5]: Decimal('33.18953762181285988953740707')
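The same check can be sketched as a small standalone function, assuming the rule is "flag a claim when its value exceeds the venue's mean plus three sample standard deviations", as computed in In [5]. This is an illustration of that rule, not necessarily Rosie's exact implementation; the names threshold and is_above_threshold and the sample values are hypothetical.

```python
from decimal import Decimal
import statistics


def threshold(values, n_std=3):
    """Return mean + n_std * sample standard deviation of `values`."""
    return statistics.mean(values) + n_std * statistics.stdev(values)


def is_above_threshold(value, values, n_std=3):
    """True when `value` is strictly greater than the venue's threshold."""
    return value > threshold(values, n_std)


# Hypothetical venue history: mean is 12, sample stdev is 2, threshold is 18
history = [Decimal("10.00"), Decimal("12.00"), Decimal("14.00")]
print(threshold(history) == Decimal("18"))             # True
print(is_above_threshold(Decimal("18.01"), history))   # True
print(is_above_threshold(Decimal("18.00"), history))   # False
```

Working with Decimal keeps the arithmetic consistent with the Decimal values returned by the Django shell above.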

@cuducos
Collaborator

cuducos commented Jul 15, 2019

We can also check these (arguably) low values in Jarbas: https://jarbas.serenata.ai/dashboard/chamber_of_deputies/reimbursement/?q=05467695000130

@cuducos
Collaborator

cuducos commented Jul 15, 2019

What I mean is… Rosie has good accuracy, but 100% is impossible. This example looks more like a false positive than a bug. Sure, we can learn from this example and improve the classifier, for instance by having it consider only venues whose average is above a certain minimum ;)
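The suggestion above could look roughly like the following sketch. It assumes the mean-plus-three-standard-deviations rule and adds a guard that skips venues whose average claim is at or below a minimum. The names should_flag and min_average, and the example minimums of R$ 20 and R$ 5, are hypothetical placeholders, not parameters from Rosie.

```python
from decimal import Decimal
import statistics


def should_flag(value, venue_values, min_average, n_std=3):
    """Flag `value` only if the venue's average is above `min_average`
    and `value` exceeds mean + n_std * sample standard deviation.

    `min_average` is a hypothetical tuning parameter.
    """
    if len(venue_values) < 2:
        # statistics.stdev needs at least two data points
        return False
    mean = statistics.mean(venue_values)
    if mean <= min_average:
        # Low-priced venues are skipped entirely (the proposed change)
        return False
    return value > mean + n_std * statistics.stdev(venue_values)


# Hypothetical cheap venue: mean 12, sample stdev 2, threshold 18
cheap = [Decimal("10.00"), Decimal("12.00"), Decimal("14.00")]
print(should_flag(Decimal("19.00"), cheap, min_average=Decimal("20")))  # False
print(should_flag(Decimal("19.00"), cheap, min_average=Decimal("5")))   # True
```

The trade-off is that a high enough minimum silences false positives like this one, but also hides genuinely inflated claims at inexpensive venues.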

machadowisck added a commit to machadowisck/serenata-de-amor that referenced this issue Nov 1, 2019
Does the suggestion made in the issue's discussion, about considering only values greater than a certain minimum, look like this tiny change?

I'm afraid I'm missing the big picture.