Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add test_id to Achilles Heel generation #27

Closed
chrisknoll opened this issue Oct 6, 2014 · 4 comments
Closed

Add test_id to Achilles Heel generation #27

chrisknoll opened this issue Oct 6, 2014 · 4 comments
Assignees

Comments

@chrisknoll
Copy link
Contributor

Currently, there is no way to uniquely identify an error for an analysis. For example, you may receive 2 errors for the same Analysis_Id, and while they have different text messages, you can't tell if they came from different tests. You also can not compare results between 2 different heel reports over time to determine if the same error occurred or was fixed.

We propose the following change to all Achilles heel queries:

First, add a column to Achilles_heel_results

CREATE TABLE ACHILLES_HEEL_results (
  test_id INT,
  analysis_id INT,
  ACHILLES_HEEL_warning VARCHAR(255)
);

Next, for each Achilles heel test:

--check for non-zero counts from checks of improper data (invalid ids, out-of-bound data, inconsistent dates)
INSERT INTO ACHILLES_HEEL_results (
        test_id, -- this is new
    analysis_id,
    ACHILLES_HEEL_warning
    )
SELECT DISTINCT 1 as test_id, -- hard code test ID for this test
    or1.analysis_id,
    'ERROR: ' + cast(or1.analysis_id as VARCHAR) + '-' + oa1.analysis_name + '; count (n=' + cast(or1.count_value as VARCHAR) + ') should not be > 0' AS ACHILLES_HEEL_warning
FROM ACHILLES_results or1
INNER JOIN ACHILLES_analysis oa1
    ON or1.analysis_id = oa1.analysis_id
WHERE or1.analysis_id IN (
        7,
        8,
        9,
        114,
        115,
        207,
        208,
        209,
        210,
        302,
        409,
        410,
        411,
        412,
        413,
        509,
        510,
        609,
        610,
        612,
        613,
        709,
        710,
        711,
        712,
        713,
        809,
        810,
        812,
        813,
        814,
        908,
        909,
        910,
        1008,
        1009,
        1010,
        1415,
        1500,
        1501,
        1600,
        1601,
        1701
        ) --all explicit counts of data anamolies
    AND or1.count_value > 0;

With a test_id, we can then determine if a test was resolved between heel reports.

@chrisknoll
Copy link
Contributor Author

Vojtech,
I'm assigning this issue to you since the recent updates to achilles will address this issue. We have a separate column called 'rule_id' (which replaces the proposed test_id above), plus a new column to store the record_count to understand the prevalence.

Please make the remaining changes to the AchillesHeel (v4 and v5) script that will populate the rule_id and record counts. Additionally, i you mentioned that it would be good to have a rule lookup table to find the name for each rule_id. This is also something nice to have, but please update the heel scripts to populate the rule_id and record_count as soon as possible, since in it's current form only a few Achilles heel rows have these values populated.

@vojtechhuser
Copy link
Contributor

This is now 99% completed.

@chrisknoll
Copy link
Contributor Author

So, what is the last 1%?

@vojtechhuser
Copy link
Contributor

Well, is is more like 99.9999%.
We still see errors in v4 Heel. Let's close the issue though.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants