# Week 13. Scale Construction

## As sociologists, we often struggle with how to measure "concepts." 

#### For instance, how would you measure "success"? What are the components of your "success"?

(Your answer here)

## In your research, the process of operationalization should be valid and reliable. 

## 1. Validity - whether your measurement is a reasonable measure of the concept.

This is a theoretical issue, so we have to review as many prior studies as possible. 

## 2. Reliability - whether indicators of the concept measure the same thing 

### * We often group similar questions on one issue because multiple-item scales tend to be more reliable than single items. 

### * Cronbach's alpha is an estimate of internal consistency across multiple items.
#### It typically lies between 0 and 1 (it can be negative when items are negatively correlated), and higher numbers indicate higher correlation between the items (higher reliability).

### * If appropriate, we can construct a scale by combining multiple variables into conceptual measures. Factor analysis encompasses the identification and testing of the "underlying construct."
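
For reference, one common way to write the alpha described above is sketched here. The symbols are introduced only for illustration and are not part of the original notes: $k$ is the number of items, $\sigma_i^2$ is the variance of item $i$, and $\sigma_X^2$ is the variance of the summed scale.

$$
\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k}\sigma_i^2}{\sigma_X^2}\right)
$$

When the items covary strongly, the variance of the sum is much larger than the sum of the item variances, so the ratio shrinks and alpha moves toward 1.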

# For today, using data from the GSS 2018, we will practice fundamentals of scale construction.

In [1]:
quietly {
  set more off
  cd "C:\Users\Hyunsu\Desktop\UCM\10. Spring 2020\(SOC 211) Graduate Statistics 2 (TA)\Week 13"
  use age sex race educ comfort makefrnd religcon religint using "GSS2018w13.dta", clear
  order age sex race educ comfort makefrnd religcon religint
}

### We use four items: 1) comfort, 2) makefrnd, 3) religcon, and 4) religint. Each asks about respondents' general attitude toward religion.

In [2]:
codebook comfort makefrnd religcon religint


--------------------------------------------------------------------------------
comfort     religion helps people to gain comfort in times of trouble and sorrow
--------------------------------------------------------------------------------

                  type:  numeric (byte)
                 label:  COMFORT

                 range:  [1,5]                        units:  1
         unique values:  5                        missing .:  0/2348
       unique mv codes:  3                       missing .*:  1186/2348

            tabulation:  Freq.   Numeric  Label
                           484         1  strongly agree
                           602         2  agree
                            55         3  neither agree nor disagree
                            17         4  disagree
                             4         5  strongly disagree
                             2        .d  DONT KNOW
                          1173        .i  IAP
                            11        .n  NA

#### The codebook shows the meaning of each variable. Both religcon and religint express a negative view of religion, while comfort and makefrnd focus on the positive functions of religion. 

### Before running factor analysis, make sure all items are coded in the same direction. Here, we reverse-code these items so that higher numbers represent stronger agreement with each statement.

In [3]:
foreach x of var comfort makefrnd religcon religint {
    recode `x' (1=5) (2=4) (3=3) (4=2) (5=1), gen(`x'2)
    tab `x' `x'2
}


(1107 differences between comfort and comfort2)

religion helps people |
   to gain comfort in |  RECODE of comfort (religion helps people to gain comfort in
 times of trouble and |                times of trouble and sorrow)
               sorrow |         1          2          3          4          5 |     Total
----------------------+-------------------------------------------------------+----------
       strongly agree |         0          0          0          0        484 |       484 
                agree |         0          0          0        602          0 |       602 
neither agree nor dis |         0          0         55          0          0 |        55 
             disagree |         0         17          0          0          0 |        17 
    strongly disagree |         4          0          0          0          0 |         4 
----------------------+-------------------------------------------------------+----------
                Total |         4         17         55        602        484 |     1,162 

### First, run correlation analysis to see relationships between variables

In [4]:
pwcorr comfort2-religint2, sig


             | comfort2 makefr~2 religc~2 religi~2
-------------+------------------------------------
    comfort2 |   1.0000 
             |
             |
   makefrnd2 |   0.4399   1.0000 
             |   0.0000
             |
   religcon2 |  -0.1005  -0.0263   1.0000 
             |   0.0007   0.3759
             |
   religint2 |  -0.0557  -0.0319   0.4860   1.0000 
             |   0.0589   0.2806   0.0000
             |


### It seems comfort2 and makefrnd2 have a moderate positive correlation (r = .440, p < .001). There is also a linear relationship of similar size between religcon2 and religint2 (r = .486, p < .001). Correlations across the two pairs are much weaker.

### In many cases, correlation analysis like this provides clues for constructing scales.

## Let's run factor analysis

In [7]:
factor comfort2-religint2

(obs=1126)

Factor analysis/correlation                        Number of obs    =     1126
    Method: principal factors                      Retained factors =        2
    Rotation: (unrotated)                          Number of params =        6

    --------------------------------------------------------------------------
         Factor  |   Eigenvalue   Difference        Proportion   Cumulative
    -------------+------------------------------------------------------------
        Factor1  |      0.80196      0.23433            0.9157       0.9157
        Factor2  |      0.56763      0.80150            0.6481       1.5638
        Factor3  |     -0.23386      0.02605           -0.2670       1.2968
        Factor4  |     -0.25992            .           -0.2968       1.0000
    --------------------------------------------------------------------------
    LR test: independent vs. saturated:  chi2(6)  =  558.79 Prob>chi2 = 0.0000

Factor loadings (pattern matrix) and unique variances

### Factor loadings suggest that there are potentially two factors. 

### Which variables are included in which factors? 

(Your answer here)

## If we retain more than one factor, rotation helps find the best fit of each factor within the multidimensional variable space and makes the loadings easier to interpret. 

## There are two types of rotation
### 1. Orthogonal rotation: the varimax criterion
 #### * After rotation, the resulting loadings on a factor should be either large or small relative to the original loadings.
 #### * Preferred
### 2. Oblique rotation: the promax criterion
 #### * An oblique solution allows for correlations among the factors. 

In [8]:
quietly factor comfort2-religint2
rotate, varimax




Factor analysis/correlation                        Number of obs    =     1126
    Method: principal factors                      Retained factors =        2
    Rotation: orthogonal varimax (Kaiser off)      Number of params =        6

    --------------------------------------------------------------------------
         Factor  |     Variance   Difference        Proportion   Cumulative
    -------------+------------------------------------------------------------
        Factor1  |      0.72569      0.08179            0.8286       0.8286
        Factor2  |      0.64390            .            0.7352       1.5638
    --------------------------------------------------------------------------
    LR test: independent vs. saturated:  chi2(6)  =  558.79 Prob>chi2 = 0.0000

Rotated factor loadings (pattern matrix) and unique variances

    -------------------------------------------------
        Variable |  Factor1   Factor2 |   Uniqueness 
    -------------+--------------------+----

### Rotated factor loadings also suggest that we have two factors. The first includes "religcon2" and "religint2." The second contains "comfort2" and "makefrnd2."
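
The oblique alternative listed above can be tried on the same solution. This is a minimal sketch, assuming the recoded items from the earlier cells are still in memory; it is not part of the original exercise, and no output is shown because it was not run here.

```stata
* Sketch: oblique (promax) rotation of the same two-factor solution
quietly factor comfort2-religint2
rotate, promax

* Show the correlation matrix of the rotated common factors
estat common
```

If the factor correlation reported by `estat common` is close to zero, the orthogonal and oblique solutions should lead to much the same reading of the loadings.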

## Based on the results, let's make two scales, "positive" and "negative," by summing the variables that load on each factor. 
### Sometimes, this process requires standardization, so that each item is converted to its standard score (z-score) before summing. 

In [9]:
gen positive = comfort2 + makefrnd2
gen negative = religcon2 + religint2
sum positive negative, d


(1192 missing values generated)

(1212 missing values generated)


                          positive
-------------------------------------------------------------
      Percentiles      Smallest
 1%            4              2
 5%            6              3
10%            6              3       Obs                1156
25%            8              4       Sum of Wgt.        1156

50%            8                      Mean           8.153114
                        Largest       Std. Dev.      1.327661
75%            9             10
90%           10             10       Variance       1.762683
95%           10             10       Skewness      -.7129747
99%           10             10       Kurtosis       3.896702

                          negative
-------------------------------------------------------------
      Percentiles      Smallest
 1%            2              2
 5%            4              2
10%            4              2       Obs                1136
25%            6

### Now, the new scale "positive" indicates respondents' positive attitude toward religion, ranging from 2 to 10. "Negative" signifies respondents' negative attitude toward religion, also ranging from 2 to 10. 
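
The standardization step mentioned earlier is not needed here, because all four items share the same 1-5 range, but it could be sketched as follows. The variable names `z_*`, `positive_z`, and `negative_z` are hypothetical and introduced only for this illustration.

```stata
* Sketch: convert each item to a z-score (mean 0, sd 1) before summing
foreach x of varlist comfort2 makefrnd2 religcon2 religint2 {
    egen z_`x' = std(`x')
}

* Sum the standardized items that load on each factor
gen positive_z = z_comfort2 + z_makefrnd2
gen negative_z = z_religcon2 + z_religint2
sum positive_z negative_z
```

Standardizing matters most when items are measured on different scales, since otherwise the item with the largest variance dominates the sum.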

## To check the internal consistency of the created scales, we then compute Cronbach's alpha for the items of each scale. 

In [10]:
alpha comfort2 makefrnd2
alpha religcon2 religint2



Test scale = mean(unstandardized items)

Average interitem covariance:     .2616756
Number of items in the scale:            2
Scale reliability coefficient:      0.5937


Test scale = mean(unstandardized items)

Average interitem covariance:     .5524183
Number of items in the scale:            2
Scale reliability coefficient:      0.6514
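
As a rough check on these coefficients, the standardized version of alpha depends only on the number of items $k$ and the average inter-item correlation $\bar{r}$ (notation introduced here for illustration). Plugging in the pairwise correlations reported by `pwcorr` above gives values close to, but not identical with, the unstandardized coefficients that `alpha` reports by default:

$$
\begin{aligned}
\alpha_{\text{std}} &= \frac{k\,\bar{r}}{1 + (k-1)\,\bar{r}} \\
\text{positive:}\quad \alpha_{\text{std}} &= \frac{2(0.4399)}{1 + 0.4399} \approx 0.611 \\
\text{negative:}\quad \alpha_{\text{std}} &= \frac{2(0.4860)}{1 + 0.4860} \approx 0.654
\end{aligned}
$$

The small gaps from 0.5937 and 0.6514 are expected, since the default output uses covariances rather than correlations and the estimation samples may differ slightly.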


### Do you think each scale is reliable? Why or why not?

(Your answer here)

### Using the constructed scales, you may want to see differences in attitudes toward religion across groups, for example by age group. 

In [21]:
quietly recode age (min/29=1 "Less than 30") (30/39=2 "30-39") (40/49=3 "40-49") (50/59=4 "50-59") ///
  (60/max=5 "60 or more"), gen(age2)

tabstat positive negative, statistics(mean sd) by(age2)  




Summary statistics: mean, sd
  by categories of: age2 (RECODE of age (age of respondent))

        age2 |  positive  negative
-------------+--------------------
Less than 30 |  7.928205   6.84456
             |  1.364111   1.73406
-------------+--------------------
       30-39 |  8.122727  6.916667
             |  1.458226  1.878272
-------------+--------------------
       40-49 |  8.153061   6.93299
             |  1.271801   1.88274
-------------+--------------------
       50-59 |  8.111675  6.882051
             |  1.236237  1.813874
-------------+--------------------
  60 or more |  8.322674  7.095808
             |  1.292706  1.867161
-------------+--------------------
       Total |  8.152778  6.954064
             |  1.329708   1.84012
----------------------------------
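
To ask whether the small differences in the table above are statistically distinguishable, one option is a one-way ANOVA on each scale. This is only a sketch, assuming `age2` from the previous cell exists and ignoring survey weights; it was not part of the original exercise.

```stata
* Sketch: one-way ANOVA of each scale across the age groups created above
oneway positive age2, tabulate
oneway negative age2, tabulate
```

The `tabulate` option reprints the group means and standard deviations alongside the F test, so the result can be read against the `tabstat` table.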
