Skip to content

SAS table profiling. Validation and Verification of Database Tables Validation and Verification of Database Tables. Profile anay SAS dataset. Cardinality, outliers, top n frequencies, bottom n frequencies, one to many, many to many, associations between numerical and character variables abd duplicates on key. Keywords: sas sql join merge big dat…

License

rogerjdeangelis/voodoo

Repository files navigation

voodoo

Validation and Verification of Database Tables. Profile any SAS dataset. Cardinality, outliers, top n frequencies, bottom n frequencies, one to many, many to many, associations between numerical and character variables and duplicates on key

/* %let pgm=oto_voodoo;

  • use this to get locations of macros for easy editiong;
  • once program is solid you may want to move macros to autocall library; data null; infile "c:\utl\oto_voodoo.sas" lrecl=300 recfm=v; input; if index(lowcase(infile),'%macro')>0 then do; macro=scan(left(infile ),1,'('); put @5 n @15 macro $32.; end; run;quit;

What this macro does

see the tail of this program for same usage You should be able to run this as is.

The macro invocation is at the end of this file

1. Dataset level summary -- ie number of obs, variable types, static data                                                                 
2. Cardinality page (primary keys, codes/decodes number of unique values for every variable - EG fails here)                              
3. Complete frequency for all variables numeric and character with less than 200 levels                                                   
4. For variables with over 200 levels top 100 most frequent and bottom 100 least frequent Least frequent are the more interesting cases.  
5. Proc means on all numeric variables                                                                                                    
6. Proc univariate on all numeric variables                                                                                               
7. Special datetime analysis                                                                                                              
8. Histograms on all variables with less than 200 levels                                                                                  
9. Proc contents                                                                                                                          
10. Frequency of all numeric variables Missing, negative, zero and positive                                                               
11. Duplicates on single or compound key. Output printed vertically for easy comparison                                                   
12. Cross tabs of every variable with every other variable top 16 levels (if selectd)                                                     
13. You can also select one variable to cross tab with all other variables max top 16 levels                                              
14. Maximum and minimum lengths to hold all numeric and character variables exactly (optimize)                                            
15. Correlation of all pairs of numeric variables sorted by largest correlation to lowest.                                                
16. Nice display of max and mins for numeric and character in one table                                                                   
17. List of identical columns ie date and date1 have equal values on all observations                                                     
18. One to Many, Many to One, One to Many and Many to Many                                                                                
19. Cochran-Mantel-Haenszel Statistics (Cramer relationship amoung catagorical variables)                                                 
20. Finds missing patterns                                                                                                                
21. Printout of first 20, middle 20 and last 20 observations.                                                                             
22. Missing Pattern amalysis                                                                                                              
23. Missing populated in a single table                                                                                                   
24. Missing Pattern Analysis  

SAMPLE RUN

%inc "c:/oto/oto_voodoo.sas";

%utlvdoc
(
libname        = sashelp         /* libname of input dataset */
,data          = zipcode      /* name of input dataset */
,key           = 0            /* 0 or variable */
,ExtrmVal      = 10           /* display top and bottom 30 frequencies */
,UniPlot       = 0            /* 0 or univariate plots    */
,UniVar        = 0            /* 0 or univariate analysis */
,chart         = 0            /* 0 or proc chart horizontal histograme */
,misspat       = 0            /* 0 or 1 missing patterns */
,taball        = 0            /* 0 crosstabs of all pairwise combinations of vriables */
,tabone        = 0            /* 0 or all pairwise cross tabs with limits */
,mispop        = 0            /* 0 0 negative positive or missing on each variable */
,mispoptbl     = 0            /* 0 missing populated table */
,dupcol        = 0            /* 0 do two columns have the same values in all rows */
,unqtwo        = 0            /* 0 only use to find primary key unique leveels of compund keys */
,vdocor        = 0            /* 0 or all pairwise parametric and non parametric collolations */
,oneone        = 0            /* 0 or 1:1  1:many many:many */
,cramer        = 1            /* 0 or cramer V variable crossed with all others */
,optlength     = 0            /* 0 optimum length for character and numeric variables */
,maxmin        = 0            /* 0 or max min for every varuiable */
,unichr        = 0            /* 0 univariate analysis of character variiables */
,outlier       = 0            /* 0 robust regression determination of outliers */
,rsquare       = zip x y msa state  /* 0 robust regression determination of outliers */
,printto       = c:\txt\vdo\&data..txt  /* save the voluminous output */
,Cleanup       = 0
);

About

SAS table profiling. Validation and Verification of Database Tables Validation and Verification of Database Tables. Profile anay SAS dataset. Cardinality, outliers, top n frequencies, bottom n frequencies, one to many, many to many, associations between numerical and character variables abd duplicates on key. Keywords: sas sql join merge big dat…

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages