Eric Hedberg

# Introduction

The purpose of the `tablefill` series of `Stata` `ado` files is to compute survey statistics such as means, totals, and proportions and to populate simple cross tables with the results. The program files are stored in the `tablefill` directory.

The key to making it work is that the data in memory must have variable and value labels that match the row and column labels of the table shell named in `using`. The shell can contain returns and other special characters that will not be found in the variable and value labels. Apart from those, each label must match its text in the shell, and, since matching is case insensitive, the labels must be unique (ignoring case) across all the variables used.
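To make the matching rule concrete, here is a minimal Python sketch of case-insensitive label matching. It is not `tablefill`'s code. The helpers `normalize` and `build_lookup` are hypothetical. The choice to drop every character that is not a letter or digit, standing in for returns and other special characters in the shell, is also an assumption. Only value labels are checked here.

```python
import re


def normalize(label: str) -> str:
    """Lowercase and drop everything except letters and digits.

    Assumption: approximates how shell text is compared with Stata labels;
    tablefill's real rules may differ.
    """
    return re.sub(r"[^0-9a-z]", "", label.lower())


def build_lookup(labels: dict[str, list[str]]) -> dict[str, tuple[str, str]]:
    """Map each normalized value label to (variable, label); reject clashes."""
    lookup: dict[str, tuple[str, str]] = {}
    for var, values in labels.items():
        for text in values:
            key = normalize(text)
            if key in lookup:
                raise ValueError(f"label {text!r} in {var} clashes with {lookup[key]}")
            lookup[key] = (var, text)
    return lookup


# Hypothetical value labels, loosely modeled on the example variables
labels = {
    "all": ["Total"],
    "T0356": ["Male", "Female"],
}
lookup = build_lookup(labels)
print(lookup[normalize("FEMALE\n")])  # ('T0356', 'Female')
```

Raising an error on a clash mirrors the requirement that labels be unique across all the variables used. A shell cell containing a trailing return still finds its label.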

# Syntax 

The syntax of the `tablefill` command is as follows:

`tablefill using `_excelfile_` [if], sheet(`_sheetname_`) statistics(`_statspec_`) domainvars(`_varlist_`) savefolder(`_path_`) titlecell(`_titlecell_`) title(`_titlestring_`) [raw]`

where

- _excelfile_ is the Excel table shell
- `if` selects which observations to use in the analysis
- _sheetname_ is the sheet in the table shell file to populate
- _varlist_ lists all the variables associated with rows or columns in the table shell
- _path_ is the folder in which to save the estimation results (the populated table is saved there as well)
- _titlecell_ is the Excel cell in which to put the title, e.g., `A1`
- _titlestring_ is the table title to put in _titlecell_
- specify `raw` to skip the suppression routines, which suppress results based on cell counts or high standard errors (a good idea for descriptive statistics)

The _statspec_ is a series of stat commands separated by a single `|`. A single stat command has the following syntax:

`{total|mean|proportion} [`_var_`], [row|col] point(`_pointcols_`) [se(`_secols_`)] [note(`_notecols_`)] factor(`_factorexpr_`) bformat(`_bfmt_`) [seformat(`_sefmt_`)]`

Here you request either the total or the mean of a single variable _var_, or a proportion. For proportions, `row` or `col` sets the direction of the percentages, and `col` gives the column percentages in the example below. The remaining options set the Excel columns that receive the point estimates (_pointcols_), the standard errors (_secols_), and the notes flagging unreliable estimates (_notecols_). The estimates and standard errors can be rescaled by a _factorexpr_ such as `*100`, which changes proportions to percents, or `*0.001`, which changes raw counts to thousands. After the factor is applied, the results are formatted into strings using Stata display formats: _bfmt_ for the point estimates and _sefmt_ for the standard errors.
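The parsing, factor, and formatting steps described above can be sketched in Python. This is a hypothetical illustration, not `tablefill`'s implementation. The helpers `parse_statspec`, `apply_factor`, and `stata_format` are made up for this sketch. Factors are assumed to be a single `*` or `/` followed by a number, and only Stata's fixed formats `%w.df` and `%w.dfc` are translated.

```python
import re

# An option looks like name(args), e.g. point(B D F) or factor(*0.001)
OPTION = re.compile(r"(\w+)\(([^)]*)\)")


def parse_statspec(spec):
    """Split a statspec on '|' into statistic, variable, options, and flags."""
    commands = []
    for part in spec.split("|"):
        head, _, opts = part.partition(",")
        words = head.split()
        commands.append({
            "stat": words[0],                             # total, mean, proportion
            "var": words[1] if len(words) > 1 else None,  # no variable for proportion
            "options": dict(OPTION.findall(opts)),        # point(), factor(), ...
            "flags": OPTION.sub(" ", opts).split(),       # bare words such as col
        })
    return commands


def apply_factor(value, factor):
    """Rescale an estimate with a factor expression such as '*100' or '*0.001'."""
    op, number = factor[0], float(factor[1:])
    if op == "*":
        return value * number
    if op == "/":
        return value / number
    raise ValueError(f"unsupported factor expression {factor!r}")


def stata_format(value, fmt):
    """Render a value with a Stata %w.df or %w.dfc format, right-aligned."""
    m = re.fullmatch(r"%(\d+)\.(\d+)f(c?)", fmt)
    if m is None:
        raise ValueError(f"unsupported format {fmt!r}")
    width, decimals = m.group(1), m.group(2)
    sep = "," if m.group(3) else ""  # the trailing 'c' adds thousands separators
    return f"{value:>{width}{sep}.{decimals}f}"


spec = ("total cons, point(B D F) factor(*0.001) bformat(%6.0fc) "
        "| proportion, col p(H J L) factor(*100) bformat(%3.0f)")
for cmd in parse_statspec(spec):
    print(cmd)
print(repr(stata_format(apply_factor(38394, "*0.001"), "%6.0fc")))  # '    38'
print(repr(stata_format(apply_factor(0.2345, "*100"), "%3.0f")))    # ' 23'
```

For instance, `*0.001` turns a total of 38,394 into 38.394, which `%6.0fc` then shows as `38`, right-aligned in six characters.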

# Example

Let's run totals and column percentages to populate this table, saved in `tabn209_21_SASS_simple.xlsx`:

![alt text](ex1.png "Example Excel File")


The data have been cleaned and are stored in `input_data.dta`.

## Step 1 Load the program

First, we need to load the `tablefill` program. Until it is ready and submitted to SSC, you need to point `Stata` to the right directory. Here, the program folder is saved in the working directory, so we use `adopath` to add that folder to the ado search path:


```stata
adopath + "./tablefill"
```

```
  [1]  (BASE)      "/Applications/Stata/ado/base/"
  [2]  (SITE)      "/Applications/Stata/ado/site/"
  [3]              "."
  [4]  (PERSONAL)  "/Users/erichedberg/Documents/Stata/ado/personal/"
  [5]  (PLUS)      "/Users/erichedberg/Library/Application Support/Stata/ado/plus/"
  [6]  (OLDPLACE)  "~/ado/"
  [7]              "/Users/erichedberg/anaconda3/lib/python3.11/site-packages/stata_kernel/ado"
  [8]              "./tablefill"
```


## Step 2 Load the data

Here we load the data into memory and create `cons`, a constant equal to 1 for every observation, which is used to compute the totals:

```stata
use "input_data.dta", clear
gen cons = 1    // constant whose total gives the counts
```

Note that these data also have an `all` variable, which is also 1 for every observation but is labeled "Total" so that it matches the "Total" row and column in the shell:
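Why a constant? The total of a variable equal to 1 within a domain is the number of observations in that domain. If the estimate is weighted, it is instead the sum of the weights. Because `all` has a single category, its domain gives the overall "Total". Here is a minimal Python sketch of that arithmetic. The helper `domain_totals` and the toy records are hypothetical; `sex` and `wt` are illustrative stand-ins, not variables in `input_data.dta`.

```python
from collections import defaultdict


def domain_totals(rows, domain, weight=None):
    """Total of the constant `cons` within each category of a domain variable.

    Unweighted, this is a count of observations; with a weight it is the
    sum of the weights, i.e. an estimated population count.
    """
    totals = defaultdict(float)
    for row in rows:
        w = row[weight] if weight else 1.0
        totals[row[domain]] += row["cons"] * w
    return dict(totals)


# Hypothetical toy records: 'sex' and 'wt' are illustrative, not real variables
rows = [
    {"cons": 1, "all": "Total", "sex": "Male", "wt": 10.0},
    {"cons": 1, "all": "Total", "sex": "Female", "wt": 12.5},
    {"cons": 1, "all": "Total", "sex": "Female", "wt": 7.5},
]
print(domain_totals(rows, "all"))        # {'Total': 3.0}
print(domain_totals(rows, "sex", "wt"))  # {'Male': 10.0, 'Female': 20.0}
```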

```stata
codebook all
```

```
--------------------------------------------------------------------------------
all                                                                        Total
--------------------------------------------------------------------------------

                  Type: Numeric (double)
                 Label: tot

                 Range: [1,1]                         Units: 1
         Unique values: 1                         Missing .: 0/38,394

            Tabulation: Freq.   Numeric  Label
                       38,394         1  Total
```


Here are the other variables we will use:

```stata
set more off
codebook T0356 RACETH_T AGE_T Highest_degree ///
        TOTEXPER_rc T0104_rc URBANIC TEALEV2 ///
        secondary elementary S0285_S0287 REGION S0256
```

```
--------------------------------------------------------------------------------
T0356                                                                        Sex
--------------------------------------------------------------------------------

                  Type: Numeric (byte)
                 Label: sex

                 Range: [1,2]                         Units: 1
         Unique values: 2                         Missing .: 0/38,394

            Tabulation: Freq.   Numeric  Label
                       12,503         1  Male
                       25,891         2  Female

--------------------------------------------------------------------------------
RACETH_T                                                          Race/ethnicity
--------------------------------------------------------------------------------

                  Type: Numeric (byte)
                 Label: raceeth

                 Range: [1,5]                         Units: 1
         Unique values: 5
```

## Step 3 Run the table

Here is the command. Its `domainvars()` option lists all the variables associated with the rows and columns. The variable names can be the typical ugly survey names; all that matters is the labels. Be sure to have a "Results" folder ready, too (for example, create it with `capture mkdir "Results"`).

```stata
tablefill using "tabn209_21_SASS_simple.xlsx", /// the shell
    sheet("Digest 2000 Table 209.21") /// the sheet
    statistics( /// describe what statistics to estimate and columns
    total cons, point(B D F) ///
        factor(*0.001) bformat(%6.0fc)  ///
        | /// pipe for another statistic
    proportion, col p(H J L) ///
        factor(*100) bformat(%3.0f) ///
    ) ///
    domainvars( ///
        all ///
        T0356 RACETH_T AGE_T Highest_degree ///
        TOTEXPER_rc T0104_rc URBANIC TEALEV2 ///
        secondary elementary S0285_S0287 REGION S0256 ///
    ) ///
    savefolder("Results")   ///
    raw /// don't suppress results based on cell counts or high SEs
    titlecell(A1) ///
    title("Table 209.21. Number and percentage distribution of teachers in traditional public elementary and secondary schools, by instructional level and selected teacher and school characteristics: School year 1999-2000")
```


```
Running total commands

Total estimation                        Number of obs = 38,394

--------------------------------------------------------------
             |      Total   Std. err.     [95% conf. interval]
-------------+------------------------------------------------
  c.cons@all |
      Total  |      38394          0             .           .
--------------------------------------------------------------
file Results/est_total_cons_by_all_tabn209_21_SASS_simplexlsx.ster saved

Total estimation                                 Number of obs = 13,143

-----------------------------------------------------------------------
                      |      Total   Std. err.     [95% conf. interval]
----------------------+------------------------------------------------
c.cons@all#elementary |
    Total#Elementary  |      13143          0             .           .
-----------------------------------------------------------------------
file Results/est_total_cons_by_all_elementary_tabn
```

The populated Excel file is saved in the Results folder:

![alt text](ex2.png "Populated Excel File")
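The percentage columns (H, J, L) come from `proportion, col` with `factor(*100)`. Reading `col` as column percentages, the row categories within each column of the table sum to 100. The Python sketch below shows that arithmetic. The helper `column_percentages` and the toy records are hypothetical, and `sex` by `level` stands in for any row and column domain. The sketch is unweighted; a weighted estimate would sum weights instead of counting rows.

```python
from collections import Counter


def column_percentages(rows, row_var, col_var):
    """Percent of each row category within each column category (unweighted)."""
    col_sizes = Counter(r[col_var] for r in rows)
    cells = Counter((r[row_var], r[col_var]) for r in rows)
    return {
        (row_cat, col_cat): 100 * n / col_sizes[col_cat]
        for (row_cat, col_cat), n in cells.items()
    }


# Hypothetical toy records: sex (rows) by instructional level (columns)
rows = [
    {"sex": "Male", "level": "Elementary"},
    {"sex": "Female", "level": "Elementary"},
    {"sex": "Female", "level": "Elementary"},
    {"sex": "Female", "level": "Elementary"},
    {"sex": "Male", "level": "Secondary"},
    {"sex": "Female", "level": "Secondary"},
]
pct = column_percentages(rows, "sex", "level")
print(pct[("Female", "Elementary")])  # 75.0
print(pct[("Male", "Secondary")])     # 50.0
```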