# SAS - Flowing Through Time: My Family of Four's Monthly Water Usage (Gallons) Compared to the Town of Cary's Average

## Download and preview the xlsx data

In [3]:
/* Reference the data from GitHub */
filename xlfile URL 'https://github.com/pestyld/data_projects/raw/master/water_usage_analysis/data/AMI_METER_READS-METER_INFO_HOURLY.xlsx';

/* Change column names to valid SAS values */
options validvarname=v7;

/* Download and import the XLSX file */
proc import datafile=xlfile
            dbms=xlsx 
            out=work.water_usage 
            replace;
run;


/* Preview the data */
proc print data=water_usage(obs=5);
run;

/* View column metadata */
ods select variables;
proc contents data=water_usage;
run;

Obs,Service,Read_Date_Time,Usage__in_Gallons
1,Water,11/15/23 12:00 AM,0
2,Water,11/14/23 11:00 PM,0
3,Water,11/14/23 10:00 PM,0
4,Water,11/14/23 9:00 PM,0
5,Water,11/14/23 8:00 PM,0

Alphabetic List of Variables and Attributes
#,Variable,Type,Len,Format,Informat,Label
2,Read_Date_Time,Char,17,$17.,$17.,Read Date/Time
1,Service,Char,5,$5.,$5.,Service
3,Usage__in_Gallons,Char,6,$6.,$6.,Usage in Gallons


## Prepare Data - Create Final Hourly Data
- Convert the **Read_Date_Time** character column to a numeric datetime value
- Rename the character **Usage__in_Gallons** column, then convert it to numeric
- Create **Date** column
- Create **Time** column
- Create **Month** column
- Create **Year** column
- Create **MonthYear** column
- Format all columns accordingly
- Add labels
- Drop unnecessary columns

In [4]:
data water_clean;
    set water_usage (rename=(Usage__in_Gallons = Usage_in_Gallons_char)); /* Rename usage column to char to replace later */
    
    /* Convert usage_in_gallons to numeric */
    usage_in_gallons = input(Usage_in_Gallons_char, 8.);
    
    /* Convert Read_Date_Time to a numeric datetime value */
    read_date = input(Read_Date_Time, mdyampm23.);
    
    /* Create date columns */
    Date = datepart(read_date);
    Time = timepart(read_date);
    Month = Date;
    Year = year(Date);
    MonthYear = Date;
    
    /* Format columns */
    format 
        read_date mdyampm23.
        Date date9.
        Time timeampm.
        Month monname.
        MonthYear monyy7.
        usage_in_gallons comma15.;
    
    /* Labels */
    label
        read_date = 'Read Date'
        usage_in_gallons = 'Total Gallons';
    
    /* Drop columns */
    drop 
        Service 
        Read_Date_Time
        Usage_in_Gallons_char;
run;

/* Preview the new table */
proc print data=water_clean(obs=5);
run;


/* View column metadata */
ods select Position;
proc contents data=water_clean varnum;
run;

Obs,usage_in_gallons,read_date,Date,Time,Month,Year,MonthYear
1,0,11/15/2023 12:00 AM,15NOV2023,12:00:00 AM,November,2023,NOV2023
2,0,11/14/2023 11:00 PM,14NOV2023,11:00:00 PM,November,2023,NOV2023
3,0,11/14/2023 10:00 PM,14NOV2023,10:00:00 PM,November,2023,NOV2023
4,0,11/14/2023 9:00 PM,14NOV2023,9:00:00 PM,November,2023,NOV2023
5,0,11/14/2023 8:00 PM,14NOV2023,8:00:00 PM,November,2023,NOV2023

Variables in Creation Order
#,Variable,Type,Len,Format,Label
1,usage_in_gallons,Num,8,COMMA15.,Total Gallons
2,read_date,Num,8,MDYAMPM23.,Read Date
3,Date,Num,8,DATE9.,
4,Time,Num,8,TIMEAMPM.,
5,Month,Num,8,MONNAME.,
6,Year,Num,8,,
7,MonthYear,Num,8,MONYY7.,
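
As a quick sanity check on the conversion above, a single raw value can be run through the same informat and functions. This is a minimal sketch and not part of the original notebook: the variable names `raw`, `dt`, `d`, and `t` are illustrative, and the 2-digit year is interpreted according to the session's YEARCUTOFF option.

```sas
/* Hypothetical one-row check: convert a sample string the same way water_clean does */
data _null_;
    raw = '11/14/23 9:00 PM';          /* sample value copied from the raw preview */
    dt  = input(raw, mdyampm23.);      /* character -> numeric datetime */
    d   = datepart(dt);                /* date portion (days since 01JAN1960) */
    t   = timepart(dt);                /* time portion (seconds since midnight) */
    put dt= datetime20. d= date9. t= timeampm11.;  /* write the results to the log */
run;
```

If the log shows a missing value for `dt`, the informat did not match the text, which would be worth resolving before building the full table.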


## Explore Data
View the max, mean, and min of the hourly usage to see the high and low values, along with the first and last dates in the data.

### Overall min, mean and max of hourly data

In [5]:
proc means data=water_clean noprint;
    var usage_in_gallons date;
    output out=data_summary(drop=_TYPE_) 
        max(usage_in_gallons)=MaxGal mean(usage_in_gallons)=MeanGal min(usage_in_gallons)=MinGal
        max(date)=MaxDate min(date)=MinDate
        ;
run;

title "Total obs, Max, Mean and Min usage and date by Hour";
proc print data=data_summary;
run;
title;

Obs,_FREQ_,MaxGal,MeanGal,MinGal,MaxDate,MinDate
1,9796,290,5,0,15NOV2023,01OCT2022
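
Because the usage and datetime columns were converted from character values with INPUT, any text that could not be read would become a missing value. The following is a small sketch, added as an assumption about what is worth checking rather than a step from the original analysis, that counts nonmissing and missing values in the two converted columns.

```sas
/* Count nonmissing (N) and missing (NMISS) values in the converted columns */
proc means data=water_clean n nmiss;
    var usage_in_gallons read_date;
run;
```

A nonzero NMISS for either column would suggest that some raw values did not match the informat used in the DATA step.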


## Analyzing Water Usage Monthly

In [6]:
proc print data=water_clean(obs=5);
run;

Obs,usage_in_gallons,read_date,Date,Time,Month,Year,MonthYear
1,0,11/15/2023 12:00 AM,15NOV2023,12:00:00 AM,November,2023,NOV2023
2,0,11/14/2023 11:00 PM,14NOV2023,11:00:00 PM,November,2023,NOV2023
3,0,11/14/2023 10:00 PM,14NOV2023,10:00:00 PM,November,2023,NOV2023
4,0,11/14/2023 9:00 PM,14NOV2023,9:00:00 PM,November,2023,NOV2023
5,0,11/14/2023 8:00 PM,14NOV2023,8:00:00 PM,November,2023,NOV2023


## Data Preparation for Visualization
- Use MEANS to summarize the data by month and year
- Create two macro variables
    - **total_family_members** - Store the number of members of my family.
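    - **AVGPP_Cary_2022** - Daily average gallons per person in the Town of Cary in 2022
- Use the DATA step to:
    - **MeterStatus** - Flag each month as Working or Broken; the meter broke on 19AUG2023, so months from August 2023 onward are flagged Broken
    - **num_days_in_month** - Find the number of days in each month
    - **avg_gallons_per_month** - Calculate the average gallons used per day in each month (monthly total divided by the number of days)
    - **total_water_usage_pp_day_cary** - Town of Cary's average per person daily water usage in 2022 (48) times the number of people in my home (4)
    - **total_water_usage_pp_month_cary** - Multiply the Town of Cary's daily average for a family of 4 by the number of days in the month for the monthly total
    - **usage_in_gallons_broken** - Monthly total gallons, populated only for broken-meter months
    - **usage_in_gallons_avg_broken** - Average gallons per day, populated only for broken-meter months
    - **usage_in_gallons_sum_labels** - Monthly total gallons, populated only for working-meter months (used as data labels)
    - **usage_in_gallons_avg_labels** - Average gallons per day, populated only for working-meter months (used as data labels)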

In [7]:
/* Create MonthYear summary table */
ods output Summary=monthly_summary;
proc means data=water_clean n sum;
    var usage_in_gallons;
    class MonthYear;
run;


/* Average gallons per person in Cary, NC in 2022 : https://data.townofcary.org/pages/water-use-per-capita/ */
%let total_family_members = 4;
%let AVGPP_Cary_2022 = 48;

data monthly_summary;
    length MeterStatus $7;
    set monthly_summary;
    
    /* Identify broken meter months */
    if MonthYear < '01AUG2023'd then MeterStatus = 'Working';
    else MeterStatus = 'Broken';
    
    /* Avg water usage per day in a month */
    num_days_in_month = day(intnx('month',MonthYear, 0,'end'));
    avg_gallons_per_month = round(usage_in_gallons_Sum / num_days_in_month);
    
    /* Avg and total water usage per month in Cary, NC */
    total_water_usage_pp_day_cary = &total_family_members * &AVGPP_Cary_2022;  
    total_water_usage_pp_month_cary = total_water_usage_pp_day_cary * num_days_in_month;
    
    /* Split totals and averages into broken-meter and working-meter columns */
    if MeterStatus='Broken' then do;
        usage_in_gallons_broken = usage_in_gallons_Sum;
        usage_in_gallons_avg_broken = avg_gallons_per_month;
    end;
    else do;
        usage_in_gallons_sum_labels = usage_in_gallons_Sum;
        usage_in_gallons_avg_labels = avg_gallons_per_month;
    end;
    
    /* Format the columns */
    format usage_in_gallons_Sum usage_in_gallons_sum_labels comma16.
           MonthYear monyy7.;
           
    /* Drop unnecessary columns */
    drop usage_in_gallons_N NObs;
run;

proc print data=monthly_summary;
run;

Analysis Variable : usage_in_gallons Total Gallons
MonthYear,N Obs,N,Sum
OCT22,743,743,5720.0
NOV22,720,720,5000.0
DEC22,744,744,3840.0
JAN23,744,744,4550.0
FEB23,672,672,3890.0
MAR23,743,743,4050.0
APR23,720,720,5280.0
MAY23,744,744,3820.0
JUN23,720,720,3420.0
JUL23,744,744,4650.0

Obs,MeterStatus,MonthYear,usage_in_gallons_Sum,num_days_in_month,avg_gallons_per_month,total_water_usage_pp_day_cary,total_water_usage_pp_month_cary,usage_in_gallons_broken,usage_in_gallons_avg_broken,usage_in_gallons_sum_labels,usage_in_gallons_avg_labels
1,Working,OCT2022,5720,31,185,192,5952,.,.,5720,185
2,Working,NOV2022,5000,30,167,192,5760,.,.,5000,167
3,Working,DEC2022,3840,31,124,192,5952,.,.,3840,124
4,Working,JAN2023,4550,31,147,192,5952,.,.,4550,147
5,Working,FEB2023,3890,28,139,192,5376,.,.,3890,139
6,Working,MAR2023,4050,31,131,192,5952,.,.,4050,131
7,Working,APR2023,5280,30,176,192,5760,.,.,5280,176
8,Working,MAY2023,3820,31,123,192,5952,.,.,3820,123
9,Working,JUN2023,3420,30,114,192,5760,.,.,3420,114
10,Working,JUL2023,4650,31,150,192,5952,.,.,4650,150
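
The Cary comparison values in the table can be reproduced by hand for a single month. The sketch below works through February 2023 using the same INTNX logic as the DATA step; the variable names `month_start`, `num_days`, `daily`, and `monthly` are illustrative and do not appear in the original code.

```sas
/* Worked example for one month: February 2023 */
data _null_;
    month_start = '01FEB2023'd;
    num_days = day(intnx('month', month_start, 0, 'end'));  /* last day of the month = days in month */
    daily    = 4 * 48;             /* family members * Cary per-person daily average */
    monthly  = daily * num_days;   /* family-of-four total for the month */
    put num_days= daily= monthly=;
    /* Log: num_days=28 daily=192 monthly=5376 */
run;
```

These values match the FEB2023 row above (28 days, 192 gallons per day, 5,376 gallons for the month).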


#### Default Visual

In [8]:
title "DEFAULT GRAPH: Monthly Water Usage from October 2022 to November 2023";
proc sgplot data=monthly_summary;
    vline MonthYear / response=usage_in_gallons_Sum;
    vline MonthYear / response=avg_gallons_per_month;
run;
title;

#### Final Visual

Create macro variables for specific settings

In [9]:
/* Set the path to your folder (REQUIRED) */
%let outpath = &path\water_usage_analysis;

/* Set default visualization colors and settings */
%let textColor = CX3D444F;
%let myBlue = CX0766D1;
%let myRed = CXF24949;
%let myLightRed = CXFF9999;
%let lightGray = CXC1C7C9;
%let labelSize = 12pt;
%let curveLabelSize = 10pt;
%let townCaryLinesColor = gray;


/* Create the max y value for the graph by increasing the max value by 25% and rounding to the nearest 1,000 */
proc sql noprint;
    select round(max(usage_in_gallons_Sum) * 1.25, 1000)
        into :maxYValue trimmed
        from monthly_summary;
quit;

/* Create a macro variable for the position of water usage on Aug2023 for the annotation line */
proc sql noprint;
    select usage_in_gallons_Sum format=8.
    into :aug2023_total trimmed
    from monthly_summary
    where MonthYear='01Aug2023'd;
quit;

/* View macro variable values */
%put &=maxYValue;
%put &=aug2023_total;



Create my annotation table to add annotations to the visual.

In [10]:
/* Import the annotation macro programs */
%SGANNO

/* Create annotation data set for the graph */
data myAnno;
    /* 2022 and 2023 labels */
    %sgtext(drawspace='datavalue',x1='01Oct2022'd, y1=2, label="2022", width = 10, 
            justify="left", textcolor = "&lightGray", textSize=16, anchor='bottomleft', discreteoffset=-.5);
    %sgtext(drawspace='datavalue',x1='01Jan2023'd, y1=2, label="2023", width = 10, 
            justify="left", textcolor = "&lightGray", textSize=16, anchor='bottomleft', discreteoffset=0);
    
    /* Bad water meter text and shading */
    %sgline(drawspace="datavalue", linepattern='shortdash', lineColor="&myRed",
            x1='01Aug2023'd, x2='01Sep2023'd, y1=&aug2023_total, y2=&aug2023_total);
    %sgline(drawspace="datavalue",
            x1='01Sep2023'd, x2='01Sep2023'd, y1=&aug2023_total, y2=4000);
    %sgtext(drawspace='datavalue', x1='01Aug2023'd, y1=&maxYValue-1800, label="Our home water meter broke on August 19, 2023, and has not been repaired.", 
            width = 25, justify="center", 
            textcolor = "white", textSize=11, anchor='topleft', discreteoffset=+.15,
            fillColor="&myRed", textweight="bold", reset='all');
    %sgrectangle(drawspace='datavalue', 
                 x1='01Aug2023'd , widthunit='data', width='01Oct2023'd,
                 y1=0, heightunit='data', height=&maxYValue,
                 display = 'fill', filltransparency=.9, fillcolor="&myRed", anchor='bottomleft',reset='all');
run;

/* View the data */
proc print data=myAnno;
run;

Obs,ANCHOR,DISPLAY,DRAWSPACE,FILLCOLOR,FUNCTION,HEIGHTUNIT,JUSTIFY,LABEL,LINECOLOR,LINEPATTERN,TEXTCOLOR,TEXTWEIGHT,WIDTHUNIT,DISCRETEOFFSET,TEXTSIZE,WIDTH,X1,Y1,X2,Y2,FILLTRANSPARENCY,HEIGHT
1,bottomleft,,datavalue,,TEXT,,left,2022,,,CXC1C7C9,,,-0.50,16,10,22919,2,.,.,.,.
2,bottomleft,,datavalue,,TEXT,,left,2023,,,CXC1C7C9,,,0.00,16,10,23011,2,.,.,.,.
3,bottomleft,,datavalue,,LINE,,left,2023,CXF24949,shortdash,CXC1C7C9,,,0.00,16,10,23223,2550,23254,2550,.,.
4,bottomleft,,datavalue,,LINE,,left,2023,CXF24949,shortdash,CXC1C7C9,,,0.00,16,10,23254,2550,23254,4000,.,.
5,topleft,,datavalue,CXF24949,TEXT,,center,"Our home water meter broke on August 19, 2023, and has not been repaired.",,,white,bold,,0.15,11,25,23223,5200,.,.,.,.
6,bottomleft,fill,datavalue,CXF24949,RECTANGLE,data,,,,,,,data,.,.,23284,23223,0,.,.,0.9,7000


Create final visualization

In [11]:
/* Save the visual as a PNG file and set the size and DPI */
ods listing gpath = "&outpath" image_dpi = 150;
ods graphics / imagename = "Water_Usage_MonthlyFinal" imagefmt = png width = 10in height = 5in;

/* Add titles and footnotes */
title justify = left color = &textColor height=14pt "Flowing Through Time: My Family of Four's Monthly Water Usage (Gallons) Compared to the Town of Cary's Average";
title2 justify = left color = &textColor height=12pt  "October 2022 - November 2023";

footnote1 justify = left color = &textColor height=8pt italic "To obtain Town of Cary water use per capita data, visit https://data.townofcary.org/pages/water-use-per-capita/";
footnote2 justify = left color = &textColor height=8pt italic "The daily average is determined by multiplying the per capita usage in 2022 by four. Monthly totals are calculated by multiplying the average usage for a family of four by the number of days in the month.";
footnote3 justify = left color = &textColor height=8pt italic "Created on November 11, 2023";

/* Visualization */
proc sgplot data = monthly_summary
            sganno = myAnno
            noborder nocycleattrs nowall;
            
    /* 1. REFLINE FOR THE NEW YEAR */
    refline 'Jan2023' / 
        axis = x 
        labelpos = min 
        labelloc = inside 
        lineattrs = (color = lightgray);
    
    /* 2. TOWN OF CARY TOTAL AND AVERAGE LINES */
    vline MonthYear / 
        name = 'Cary_Monthly_Total'
        response = total_water_usage_pp_month_cary
        lineattrs = (pattern = MediumDash color = &townCaryLinesColor);
    vline MonthYear / 
        name = 'Cary_Daily_Average'
        response = total_water_usage_pp_day_cary
        y2axis
        lineattrs = (pattern = ShortDash color = &townCaryLinesColor);
    
    /* 3. TOTAL GALLONS LINE (working and broken) */
    /* a. Working total line */
    vline MonthYear / 
        response=usage_in_gallons_Sum
        lineattrs = (thickness = 3 color = &myBlue)
        dataskin = none 
        markers markerattrs = (symbol = CircleFilled size = 10 color = &myBlue)
        datalabel = usage_in_gallons_sum_labels datalabelattrs = (color = &myBlue)
        curvelabel = 'Total Gallons' curvelabelpos = min curvelabelattrs = (color = &myBlue size = &curveLabelSize);
    /* b. Broken total line */
    vline MonthYear /
        response=usage_in_gallons_broken
        lineattrs = (thickness = 3 color = &myRed)
        markers markerattrs = (symbol = CircleFilled size = 10.5 color = &myRed)
        dataskin = none;
    
    /* 4. AVG GALLONS A MONTH LINES (working and broken) */
    /* a. Working average line */
    vline MonthYear / 
        response = avg_gallons_per_month 
        y2axis
        lineattrs = (color = &myBlue)
        datalabel = usage_in_gallons_avg_labels datalabelattrs = (color = &myBlue)
        dataskin = none
        markers markerattrs = (color = &myBlue symbol = CircleFilled size = 6)
        curvelabel = 'Daily Average' curvelabelpos = min curvelabelattrs = (color = &myBlue size = &curveLabelSize);
    /* b. Broken average line */
    vline MonthYear /
        response = usage_in_gallons_avg_broken
        y2axis
        lineattrs = (color = &myRed)
        dataskin = none
        markers markerattrs = (color = &myRed symbol = CircleFilled size = 6.5)
    ;
    
    /* 5. AXIS ATTRIBUTES */
    xaxis display = (NOLABEL NOTICKS)
          valueattrs = (color = gray size = 9pt);
    yaxis display = NONE 
          offsetmin = 0
          max = &maxYValue;
    y2axis display = NONE 
          offsetmin = 0
          max = 900;
          
    /* 6. MODIFY THE LEGEND */
    keylegend 'Cary_Monthly_Total' 'Cary_Daily_Average' /
        noborder
        location = inside
        position = topleft
        across = 1
        valueattrs = (color = &textColor);
    label total_water_usage_pp_day_cary = 'Cary Daily Average Per Month' 
          total_water_usage_pp_month_cary = 'Cary Total Average Gallons Per Month';
run;
title;

/* Clear all */
ods graphics / reset;
title;
footnote;