# DS103 Metrics and Data Processing: Lesson Five Companion Notebook

### Table of Contents <a class="anchor" id="DS103L5_toc"></a>

* [Table of Contents](#DS103L5_toc)
    * [Page 1 - Introduction](#DS103L5_page_1)
    * [Page 2 - Types of Testing](#DS103L5_page_2)
    * [Page 3 - Measurement Error](#DS103L5_page_3)
    * [Page 4 - Measurement Error in Manufacturing](#DS103L5_page_4)
    * [Page 5 - Reliability and Validity](#DS103L5_page_5)
    * [Page 6 - Factor Analysis](#DS103L5_page_6)
    * [Page 7 - Factor Analysis Setup in R](#DS103L5_page_7)
    * [Page 8 - Factor Analysis in R](#DS103L5_page_8)
    * [Page 9 - Calculating Reliability](#DS103L5_page_9)
    * [Page 10 - Calculating Reliability in R](#DS103L5_page_10)
    * [Page 11 - Key Terms](#DS103L5_page_11)
    

<hr style="height:10px;border-width:0;color:gray;background-color:gray">

# Page 1 - Introduction<a class="anchor" id="DS103L5_page_1"></a>

[Back to Top](#DS103L5_toc)

<hr style="height:10px;border-width:0;color:gray;background-color:gray">

In [1]:
from IPython.display import VimeoVideo
# Tutorial Video Name: Reliability and Validity
VimeoVideo('236613319', width=720, height=480)

# Introduction

There are a lot of factors that feed into the variation present when monitoring a metric. Some sources of variation can be controlled, some of them can be isolated, and some of them you just have to live with.

Some of the sources of variation are due to true variation in the product, but a common source of variation is found in the actual measurement device. In this lesson, you will learn about the measurement variation that results from the measurement system alone, and how to quantify it. You will also learn about how to validate surveys to ensure they are accurate measurement tools. 

<div class="panel panel-success">
    <div class="panel-heading">
        <h3 class="panel-title">Additional Info!</h3>
    </div>
    <div class="panel-body">
        <p>You may want to watch this <a href="https://vimeo.com/451698272"> recorded live workshop </a> that goes over the theory and interpretation of factor analysis. </p>
    </div>
</div>


In [2]:
from IPython.display import VimeoVideo
# Recorded live workshop: theory and interpretation of factor analysis
VimeoVideo('451698272', width=720, height=480)

<hr style="height:10px;border-width:0;color:gray;background-color:gray">

# Page 2 - Types of Testing<a class="anchor" id="DS103L5_page_2"></a>

[Back to Top](#DS103L5_toc)

<hr style="height:10px;border-width:0;color:gray;background-color:gray">


# Types of Testing

---

## Destructive Tests

There was once a young man who was involved in the Boy Scouts. One week, he was preparing for an upcoming camp out. He needed to bring some matches, and he found an old box of stick matches while digging through the cabinets in the kitchen. He wasn't really sure if the matches still worked, and there was no point in bringing the matches if they weren't going to help him start his evening campfire.

![A lit match after being struck.](Media/L07-05.png)

The only thing he could think of to do to test his matches was to simply try and light one. He picked a match, and struck it. After a couple of strikes, it burst into flames.

He was satisfied that the match did in fact work, but then realized the test used rendered the match useless. The irony of the situation started to dawn on him - the only way to be absolutely sure that the match worked was to use it, after which time it wouldn't work anymore.

The story above is what you call a *destructive test*. There are lots of examples of destructive tests. They range from test firing rockets to doing quality assurance on food items. If you test fire a rocket to see if it fires properly with the correct thrust, balanced force, etc. you have consumed the rocket, and it is no good anymore.

If you work for a food manufacturer and are testing the nutritional value of a prepared food, you are probably involved in sampling the finished product line, and tearing open the package. Then the food goes into a blender, and several chemical composition tests are run on the "homogenized" food, looking for vitamin content, calorie content, fiber, and so forth. Clearly, the food cannot then be sold.

---

## Disruptive Tests

There are other means of measuring things that aren't destructive, but the method of completing the measurement actually alters that which is being measured. These are called *disruptive tests*. The best example of this is a hand-held circuit tester.

Suppose you are testing the resistance in an electrical circuit. The only practical way to do this is to use an ohmmeter with test leads to measure the voltage drop across a device. However, the act of hooking up the meter actually changes the voltage drop across the device, so the resistance you are measuring is inaccurate by nature.

Here is another example of a disruptive measurement. When you stick a thermometer into something to measure the temperature, you are actually changing the temperature. If you are cooking a turkey for Thanksgiving, the act of sticking a room temperature thermometer into a hot bird will actually decrease the temperature of the bird. The change will be very slight, but it changes nonetheless.

![A turkey with a meat thermometer in it being roasted in a pan.](Media/L07-06.png)

---

<hr style="height:10px;border-width:0;color:gray;background-color:gray">

# Page 3 - Measurement Error<a class="anchor" id="DS103L5_page_3"></a>

[Back to Top](#DS103L5_toc)

<hr style="height:10px;border-width:0;color:gray;background-color:gray">


# Measurement Error

There are also means of measuring things that are neither destructive nor disruptive, though they are typically not as accurate. Have you ever gone to the doctor and had to weigh yourself for your checkup, only to find that the scale at the doctor's office reads about 4 lbs. heavier than your scale at home? Which scale is correct? Is it possible that neither is correct?

When things are measured with non-destructive and non-disruptive tests, there is still **always** some error in the measurement. You can also use the words *variance* or *uncertainty* to describe the error. Among everything that can add to the variance in the measurement, the actual measurement device can also add to the variation.

The accuracy of the measuring device might be off, and the ability for the device to give the same results for the same measurement might be off, too.

**Bias** occurs when you have built-in error in measurement. There will always be built-in measurement error; your goal is to reduce it as much as possible. Suppose you were to take multiple measures of an object. For instance, suppose you have an experiment where you are adding two chemicals together, and waiting for the reaction to go to completion. You know the reaction is complete, because the solution changes color.

You want to run the reaction several times, and you are using a hand-held timer to measure the time. You run the reaction 8 times, and each measurement is between 48.2 seconds and 49.4 seconds. You are pleased with the precision of the measurements, and proceed to write up your results.

What you didn't know is that the timer is broken. Because of an electronic glitch, the timer runs about five percent fast. So for a time duration of 40 seconds, the timer measures about 42 seconds. In other words, the reaction that was being measured actually went to completion in about 46.5 seconds or so.

![A hand holding a stopwatch.](Media/L07-07.png)

This is an example of measurement bias. Bias is a systematic error that makes all measurements wrong by a certain amount. The most dangerous part about bias is that it is very difficult to detect. If the accuracy of your measurement isn't that critical, bias is no big deal. Everyone has used a bathroom scale that was a couple of pounds heavy - you just live with it.
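
The timer arithmetic above can be checked with a few lines of R. This is only a sketch: it assumes the timer reads exactly five percent fast and uses the lowest and highest displayed times from the story, and the variable names are made up for illustration.

```{r}
timerFactor <- 1.05              # assumption: the timer runs exactly five percent fast
40 * timerFactor                 # a true 40 seconds is displayed as about 42

measured <- c(48.2, 49.4)        # lowest and highest displayed times, in seconds
measured / timerFactor           # the same two readings converted back to true time
mean(measured) / timerFactor     # about 46.5 seconds of true time
```

Notice that the correction shifts every reading by the same proportion. The readings stay just as tightly grouped, which is why precise-looking data can hide a bias.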

But if you are in an environment that requires a very high degree of accuracy, then lots of effort is taken to check and recheck calibration frequently. You can't think about it too much, because it can make your head hurt. The key to calibration is that eventually you have to have some sort of gold standard that you trust to be exactly accurate, and exactly precise.

---

## Accuracy versus Precision

Imagine that you're going to do some target practice.  *Accuracy* is the ability to hit what you are aiming for. *Precision* is being able to hit the same place over and over again. However, these two measures are independent of each other. As you can see in the diagram, a group of shots can be:

* **Precise, but not accurate:** You hit in the same place every time, but it's not centered on the bulls eye - you're not hitting what you want to.  This may be an indication of not sighting in your target correctly.

* **Accurate, but not precise:** You are grouping around the bulls eye, but you are not hitting the same place every time. So you're sighting in ok, but something in the way you shoot (using both eyes, not exhaling as you make the shot, flinching) or in external factors (the wind, rest slippage) may be throwing your aim off. 

* **Neither accurate nor precise:** Your buddies better watch out, because you are firing wild! You're not hitting anywhere near the bulls eye and you're never hitting the same place twice!

* **Both accurate and precise:** Expert marksmanship! You are hitting that bulls eye every single time. Way to go!

![Precision versus accuracy. Four targets. On the first target, shots are clustered together but far from the bulls eye. Precision yes, accuracy no. On the second target, shots surround the bulls eye but are far from one another. Precision no, accuracy yes. On the third target, shots are neither close together nor close to the bulls eye. Precision no, accuracy no. On the fourth target, shots are clustered together directly on the bulls eye. Precision yes, accuracy yes.](Media/L07-02.png)
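
If it helps to see the difference in numbers, here is a minimal sketch in R. The shot positions are made up, and it assumes you summarize accuracy as the distance between the average shot and the target, and precision as the standard deviation of the shots.

```{r}
target <- 0                                          # bulls eye position (made-up units)
preciseNotAccurate <- c(4.9, 5.1, 5.0, 5.2, 4.8)     # tight cluster, off target
accurateNotPrecise <- c(-3.0, 2.5, 0.5, -1.5, 1.5)   # centered on target, spread out

# Accuracy: how far the average shot lands from the target
mean(preciseNotAccurate) - target
mean(accurateNotPrecise) - target

# Precision: how tightly the shots cluster, regardless of where they land
sd(preciseNotAccurate)
sd(accurateNotPrecise)
```

The first set has a small spread but a large offset, and the second set has the reverse, which is why you need both numbers to describe a measurement system.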

---

## Review

Below is a quiz to review the recently covered material. Quizzes are _not_ graded.

```c-lms
start-activity: Measurement Error Quiz
```


<hr style="height:10px;border-width:0;color:gray;background-color:gray">

# Page 4 - Measurement Error in Manufacturing<a class="anchor" id="DS103L5_page_4"></a>

[Back to Top](#DS103L5_toc)

<hr style="height:10px;border-width:0;color:gray;background-color:gray">




# Measurement Error in Manufacturing

Data varies randomly.  Period.  But some of that variation is due to individual differences, and some is due to measurement error. In manufacturing, a *gage repeatability and reproducibility (R&R)* study looks at this variability. These studies can help keep operators from making costly measurement errors.

Gage R&R will help you determine how large your variation is and the source of that variation. There are three main variation sources: 

* **Part-to-part variation:** The normal range over which measurements are made - the part of your data you actually want to measure, that is not a mistake of any kind.
* **Repeatability:** The variation because of the *gage*, or measurement tool, itself.
* **Reproducibility:** The variation from different people using the gage.

Together, repeatability and reproducibility make up *measurement error* or *noise*, and are together *gage R&R*. This noise is a nuisance that adds uncertainty to your data. 
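
As a rough sketch of how the three sources fit together, the R code below splits a total variance into its parts. The numbers are made up for illustration, and it assumes the three sources simply add up to the total variance.

```{r}
# Hypothetical variance components (made-up numbers for illustration only)
components <- c(partToPart = 0.90, repeatability = 0.06, reproducibility = 0.04)

noise <- components["repeatability"] + components["reproducibility"]
total <- sum(components)

round(100 * components / total, 1)   # share of total variance from each source
unname(100 * noise / total)          # measurement noise as a percentage of total variance
```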

---

## Total Variation

Gage R&R determines the ratio of noise to total variation, which is called the *percentage of total variation* or *%TV*.  It also measures the size of the noise relative to the specification range, which is called the *percentage of tolerance*. Gage R&R also separates the variability into its sources: part-to-part variation, repeatability and reproducibility.

Here are some benchmarks for the accuracy of a measuring system:

* **Good:** Very low noise, preferably less than 1% of the total variability in your data, indicated as a gage R&R of less than 10%.
* **Questionable:** Noise between 1% and 9% of the total variability, or a gage R&R between 10% and 30%.
* **Poor:** Noise greater than 9% of the total variation, or a gage R&R greater than 30%.
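
The two sets of numbers in these benchmarks appear to be the same cutoffs on two different scales. As a sketch, assuming the gage R&R percentage is a ratio of standard deviations and the noise percentage is a ratio of variances, squaring one gives the other:

```{r}
grrPercent <- c(10, 30)                        # gage R&R cutoffs, on the standard deviation scale
varianceShare <- (grrPercent / 100)^2 * 100    # square the ratio to move to the variance scale
varianceShare                                  # the matching noise cutoffs, 1 and 9 percent
```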

The data from a gage R&R study can tell you where the system is broken and how to fix it.  If the repeatability variation is high but the reproducibility variation is low, the operators are measuring the same way as one another, but the gage itself is not giving consistent readings.  If the reproducibility variation is high but the repeatability variation is low, you need better operator training in the use of the gage, since the measurement tool is consistent, but the people doing the measuring are not.

---

## Gage Performance Curves

A gage performance curve shows the probability of accepting a part compared to its standards. In the image below, the red lines indicate the probability of measuring a part within standards, while the blue lines are the specification limits. 

The graph on the left shows a good system, with a gage R&R value of 7%.  You can see that there is little chance of rejecting a good part or accepting a bad one except at the very edges of the specification limits. The graph in the middle shows a questionable system, with a gage R&R value of 14%.  Error is now much closer to the specification limits. Lastly, the graph on the right shows a poor system, with a gage R&R value of 32%. You'll note that errors are more common.

![Three gage performance curves. For each, the x axis is labeled reference value for measured parts. The y axis is labeled probability of acceptance. The curve on the left has a gage R and R value of seven percent. The curve in the middle has a gage R and R value of fourteen percent. The curve on the right has a gage R and R value of thirty two percent.](Media/L07-08.png)

Gage R&R determines whether a measurement system is working, and can tell you what's broken if it's not.

---

## Setting up a Gage R&R Study

Many manufacturers are required to do gage repeatability and reproducibility studies for auditing purposes. A typical study uses 10 parts, three operators, and three trials for a total of 90 measurements, though you can use smaller numbers within reason. The collected data might look something like this:

![A collection of data for a gage R and R study showing ten parts, three operators, and three trials.](Media/L07-11.png)

The three operators may be part of the production line, quality inspectors, lab technicians, or many others. It doesn’t matter who collects the data; however, it does matter that the operators never know which part they are measuring. This is called a *blinded study*. Giving the parts numbers or letters to keep track of them, while keeping their true identity a secret from the operators, is important because operators can influence measurement with their prior knowledge and expectations. Even if an operator is consciously trying not to bias things, it can still happen.
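
Here is one way you might lay out such a study in R before collecting any data. It is a sketch only: the operator names, the blind letter codes, and the random seed are all made up, and the shuffled run order is just one option for keeping operators from guessing which part they are holding.

```{r}
# Every combination of 10 parts, three operators, and three trials
design <- expand.grid(part = 1:10, operator = c("A", "B", "C"), trial = 1:3)
nrow(design)                           # 10 x 3 x 3 = 90 measurements

set.seed(1)                            # arbitrary seed so the shuffle is repeatable
blindLabels <- sample(LETTERS[1:10])   # hypothetical blind codes, one per part
design$label <- blindLabels[design$part]

runOrder <- design[sample(nrow(design)), ]   # shuffle the order of measurement
head(runOrder)
```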

---

<hr style="height:10px;border-width:0;color:gray;background-color:gray">

# Page 5 - Reliability and Validity<a class="anchor" id="DS103L5_page_5"></a>

[Back to Top](#DS103L5_toc)

<hr style="height:10px;border-width:0;color:gray;background-color:gray">





The concepts of accuracy and precision carry on to surveys as well, and surveys are subject to just as much measurement bias as any other measuring tool.  While they are certainly not destructive, at times they can be disruptive. Asking people to think about their thinking can cause them to think differently! So meta. 

---

## Reliability

When referring to a survey, precision is known as *reliability*.  You are measuring the same concept over and over again.  There are several types of survey reliability.  *Inter-rater reliability* is the idea that different raters scoring the same thing should score it similarly; the related idea that the same person taking the survey again should score similarly is usually called *test-retest reliability*. *Inter-item reliability* is the idea that all items on your survey scale are measuring the same thing.  You will soon learn how to calculate reliability in R, and examine closely a metric called *Cronbach's alpha*.

---

## Validity

Similarly, accuracy gets a new name as well when dealing with surveys: *validity*.  Validity is the idea that you are actually accurately measuring what you are supposed to be measuring.  Measuring the validity of a survey is a little trickier than measuring the reliability, as conceptually, you are trying to figure out whether you are measuring the thing you think you are! There are a few ways to go about it.  First, you can utilize comparisons to validate. For instance, if you have a survey on chronic pain, and you give it to those with chronic pain and pain-free controls, you would expect that those with chronic pain would score very highly on your survey, and those without any pain at all should score very, very low.  At gut-instinct level, if this is not the case, you know you have made a grave error in your survey design.  

Second, you can compare your survey results to other surveys measuring similar things.  In the chronic pain survey example, you may want to compare your newly completed survey with others that have been out there for a long time, to see if the survey responses correlate well.  If they do, chances are you've got yourself a validated survey.  If they don't, it's time to go back to the drawing board, buddy!
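
Both of these checks can be sketched in a few lines of R. All of the scores below are made up for illustration, and the variable names are hypothetical; with real data you would want many more respondents.

```{r}
# Made-up scores for illustration only
painGroup    <- c(42, 38, 45, 40, 44)   # new survey, people with chronic pain
controlGroup <- c(8, 12, 5, 10, 7)      # new survey, pain-free controls
mean(painGroup) - mean(controlGroup)    # a large gap is what you would hope to see

newSurvey         <- c(painGroup, controlGroup)
establishedSurvey <- c(80, 71, 88, 76, 85, 20, 25, 14, 22, 18)  # hypothetical established scale
cor(newSurvey, establishedSurvey)       # a value near 1 means the two surveys agree
```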

The third way to test for validity is through factor analysis. Factor analysis will determine how well your items hang together and whether there are any odd items out.  It can also help you determine if your items fall into subgroups.

Over the next few pages, you will learn how to conduct both reliability and validity tests for surveys.

---


<hr style="height:10px;border-width:0;color:gray;background-color:gray">

# Page 6 - Factor Analysis<a class="anchor" id="DS103L5_page_6"></a>

[Back to Top](#DS103L5_toc)

<hr style="height:10px;border-width:0;color:gray;background-color:gray">



# Factor Analysis

Another important thing to think about when analyzing surveys is how well the items "hang together" and whether you are measuring more than one concept in your survey.  You may ask several questions about a similar broad topic, but is that all one topic, or does it really have some subtopics in it? *Factor analysis* has the answer to these questions and more! The basic goal of factor analysis is to see how items fall together and to see if they group in any particular patterns that make sense logically.

## Types of Factor Analysis

There are two broad types of factor analysis: *exploratory factor analysis* and *confirmatory factor analysis*. Exploratory factor analysis, abbreviated *EFA*, is used when you don't really have an inkling of what your data will yield.  You are intrepid explorers, traversing unknown survey data worlds! Confirmatory factor analysis, abbreviated *CFA* (so original!), is either for after you have completed EFA or when you are so confident about what your data holds you feel you can skip the EFA and just want a validation check. You are confirming your thoughts about the data with CFA. An example of when you might proceed straight to CFA is when you have already used a validated, previously studied set of survey items, and just want to make sure that your data is behaving the same way as it did for others.  

The most common type of factor analysis is definitely EFA, and it's a good thing, because it's easier, too! Conducting a CFA is actually a form of structural equation modeling (SEM), and you won't get into that here. However, you will learn how to rock the heck out of an EFA, and that knowledge will take you a long way!

---

## Assumptions of EFA

There are only three assumptions for EFA - yes you heard that right - three! Let the party commence! 

---

### Sample Size 

Although there are many different opinions about sample size for EFA, the safest rule you can follow is to have at least 300 data points. However, you may be able to get away with as few as 150 data points if you have a small number of survey questions you're examining and those survey questions are moderately correlated with each other. 

---

### Absence of Multicollinearity

*Multicollinearity*, or having a lot of overlap between variables, is a problem, because it will make sorting your survey items into distinct groups quite difficult.  Chances are that if your survey items all have really high multicollinearity, then you should have asked fewer survey questions, because they are all getting at the same concept! You can test for multicollinearity by running a correlation matrix on all your survey items. If anything correlates with anything else at .9 or higher, then it's got to go, and you'll want to eliminate it from your analysis.  Though that's a good guideline, you may run into situations where lower correlations also cause problems.  You'll be able to catch this by looking at the *determinants*. You can think of determinants as another measure of how well survey items are correlated. When you run a determinant test, you are looking for a value of greater than .00001.
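
To make the two checks concrete, here is a small sketch using a toy correlation matrix for three made-up survey items (this is not the lesson data). It scans one half of the matrix for correlations of .9 or higher and then computes the determinant.

```{r}
# Toy correlation matrix for three made-up survey items
toyCor <- matrix(c(1.00, 0.95, 0.40,
                   0.95, 1.00, 0.45,
                   0.40, 0.45, 1.00),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(c("q1", "q2", "q3"), c("q1", "q2", "q3")))

# Look only above the diagonal, since the two halves mirror each other
tooHigh <- which(toyCor >= 0.9 & upper.tri(toyCor), arr.ind = TRUE)
tooHigh                      # row and column positions of any pair at .9 or higher

det(toyCor)                  # determinant of the correlation matrix
det(toyCor) > 0.00001        # TRUE means it clears the guideline
```

In this toy example the scan flags the `q1` and `q2` pair even though the determinant clears the cutoff, which is one reason to run both checks rather than relying on a single one.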

---

### Some Relationship between Survey Items

Although multicollinearity is to be avoided, it's important that there is some relationship between your survey items. Otherwise, they probably shouldn't be grouped together at all! So you'll also want to scan your correlation matrix for any variable that has multiple correlations with other items of .3 or lower, which is a good indication it's not going to play nicely with the others and should be removed. You can also run a catch-all test to make sure that there is some relation between all the variables - this is *Bartlett's test*, which you will want to be significant, since it tests against an *identity matrix*, or a matrix that assumes no relationship between all variables (correlations of 0 for everything).

---

## Factor Rotation

The other big thing you need to know about EFA before diving in is *factor rotation*.  In order to better see the relationships between your different survey items, you will want to rotate the factor axes.  You can rotate them while keeping the axes at 90 degrees to one another, which is called *orthogonal rotation* and is really meant for when you theoretically don't think your factors are related, or you can rotate them with *oblique rotation*, which does not maintain right angles between the axes.  Oblique rotation is for when you theoretically believe your factors should be related. The most common types of orthogonal rotation are *varimax* and *quartimax*.  The most common types of oblique rotation are *oblimin* and *promax*. You don't need to know the mathematical differences between them, and chances are, you will use a process of trial and error in which you'll try at least two different rotation types for each data set.

In the image below, you'll see that Figure A shows off the raw data, which is scattered all over the place.  Figure B, in the middle, shows a type of orthogonal rotation, in which the axes are now turned 90 degrees from where they once were.  And Figure C shows a type of oblique rotation, which also rotates the data, just not at 90 degrees.  In this example, the data remain spread apart (probably because there are only three data points), but in most cases, as you rotate, the data will start to clump together, forming factors.

![Three figures. Figure A, factor rotation, shows raw data, which is scattered all over the place. Figure B, varimax orthogonal rotation, the axes are now turned 90 degrees from where they once were. Figure C, oblimin oblique rotation, also rotates the data but not at 90 degrees](Media/factor1.png)
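
If you are curious what a rotation does to the numbers, here is a minimal sketch of an orthogonal rotation in base R. The loadings and the 45 degree angle are made up for illustration; real rotation methods such as varimax choose the angle for you.

```{r}
# Made-up unrotated loadings for four items on two factors
loadings <- matrix(c(0.7,  0.5,
                     0.6,  0.6,
                     0.6, -0.5,
                     0.7, -0.4),
                   ncol = 2, byrow = TRUE)

theta <- pi / 4                                   # arbitrary rotation angle (45 degrees)
rotation <- matrix(c(cos(theta), -sin(theta),
                     sin(theta),  cos(theta)),
                   ncol = 2, byrow = TRUE)

rotated <- loadings %*% rotation                  # turn both axes together
round(rotated, 2)

# The axes stay at right angles: this product is the identity matrix
round(t(rotation) %*% rotation, 10)

# Each item's total squared loading is unchanged by an orthogonal rotation
rowSums(loadings^2)
rowSums(rotated^2)
```

After the rotation, the first two items load mainly on one axis and the last two mainly on the other, which is the kind of clumping that makes factors easier to read. An oblique rotation would relax the right-angle requirement and is not shown here.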

---

<hr style="height:10px;border-width:0;color:gray;background-color:gray">

# Page 7 - Factor Analysis Setup in R<a class="anchor" id="DS103L5_page_7"></a>

[Back to Top](#DS103L5_toc)

<hr style="height:10px;border-width:0;color:gray;background-color:gray">



# Factor Analysis Setup

Now that you understand the basics of factor analysis, you will run one of your own in R!

---

## Load Libraries

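You will need to install and load several libraries in order to complete factor analysis in R. ```psych``` provides the factor analysis functions used in this lesson, such as ```principal()``` and ```cortest.bartlett()```, and will help you with interpreting the factor loadings. ```GPArotation``` supplies rotation methods, ```corpcor``` offers correlation tools, and ```IDPmisc``` can be used to remove missing data.

```{r}
# install.packages(c("corpcor", "GPArotation", "psych", "IDPmisc"))  # run once if not yet installed
library("corpcor")      # correlation tools
library("GPArotation")  # rotation methods such as oblimin
library("psych")        # principal() and cortest.bartlett()
library("IDPmisc")      # NaRV.omit() for removing missing data
```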

---

## Load in Data

For this walkthrough, you will be using **[data from a survey on financial wellbeing](https://repo.exeterlms.com/documents/V2/DataScience/Metrics-Data-Processing/financialWB.zip)**. The codebook is located **[here](https://s3.amazonaws.com/files.consumerfinance.gov/f/documents/cfpb_nfwbs-puf-codebook.pdf)**. Check out the variable list starting on page 5 if you'd like to know what all the survey items are (or at least the ones you'll be working with).

---

## Question Setup

With the data above, you will be determining how a set of questions from the financial wellbeing survey hang together and whether there are any subscales. To do this, you will perform factor analysis.  In factor analysis, there are no x or y variables - you are simply seeing how variables fit together.

---

## Data Wrangling

Before you begin, there is one data wrangling item that needs to take place - you will subset your data.  The function you'll use in R for factor analysis does not allow you to specify variables, so you'll need to trim your data to only the variables you are interested in looking at to begin with. In order to subset, take a look at the data and identify the columns you want to keep. In this case, you want the items that start with ```FWB```. They are contained in columns numbered 8-17.  With the below code, you will only have those columns in your new dataset to use:

```{r}
financialWB1 <- financialWB[, 8:17]
```
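
If you would rather not count column positions by hand, you can also pick the columns by name. The sketch below uses a tiny made-up data frame, not the real survey data, and relies only on the fact that the items of interest start with ```FWB```.

```{r}
# Tiny made-up data frame standing in for the survey data
toy <- data.frame(id     = 1:3,
                  FWB1_1 = c(3, 4, 2),
                  FWB1_2 = c(1, 5, 3),
                  other  = c(9, 9, 9))

fwbCols <- grep("^FWB", names(toy))   # positions of columns whose names start with FWB
fwbCols
toyFWB <- toy[, fwbCols]              # keep only those columns
names(toyFWB)
```

Selecting by name keeps working even if the column order changes, whereas a range such as ```8:17``` depends on the columns staying exactly where they are.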

---

## Test Assumptions

Now that you have the columns you'll be examining in the factor analysis, you'll need to test the assumptions for them! You will be looking at sample size and how well the variables relate to each other.

---

### Sample Size

Sample size should ideally be 300 or more. Luckily, there are 6,394 rows here, so you have met this assumption!

---

### Absence of Multicollinearity

Next, you will test for the absence of multicollinearity. The first way to do this is with a correlation matrix.  You can use the function ```cor()``` to do that: 

```{r}
financialWBmatrix <- cor(financialWB1)
```

And then to view it, you can use the ```View()``` function, which makes it easier to read in R than printing it, and you can make use of the ```round()``` function so that you are only seeing two decimal places, which makes things easier to sort through.  The ```2``` in the code below indicates the number of decimal places you would like to see.

```{r}
View(round(financialWBmatrix, 2))
```

Here is the output: 

![A ten column, ten row correlation matrix](Media/factor2.png)

In it, you want to look at only half the matrix (remember that the top and bottom halves along the diagonal are mirror images of each other). As you go down the columns, starting to look only after the 1.0 on the diagonal, look for any correlations that are .9 or higher. This would indicate really high multicollinearity, and if there's an item that has a correlation of .9 or higher, you will most likely want to remove that item. A quick scan indicates that there is nothing at or above .9 here and you are good to go.

---

### Some Relationship between Survey Items

You will also want to look at the correlation matrix to ensure that correlations aren't too low, since factor analysis requires some relationship between the variables.  Look for any variable whose correlations with more than one other variable are lower than .3 in size (ignore the sign, since a strong negative correlation is still a relationship). Again, this doesn't seem to be a problem in the matrix above, so you are good to go!
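
Scanning by eye works for a small matrix, but you can also count the weak correlations for each item. This is a sketch on a toy matrix of four made-up items, and it assumes that the size of a correlation is what matters, so it takes absolute values before comparing to .3.

```{r}
# Toy correlation matrix for four made-up items (not the lesson data)
toyCor <- matrix(c( 1.00,  0.60, -0.55, 0.10,
                    0.60,  1.00, -0.50, 0.20,
                   -0.55, -0.50,  1.00, 0.15,
                    0.10,  0.20,  0.15, 1.00),
                 nrow = 4, byrow = TRUE,
                 dimnames = list(paste0("q", 1:4), paste0("q", 1:4)))

weak <- abs(toyCor) < 0.3          # ignore the sign, keep only the size
lowCounts <- rowSums(weak)         # number of weak correlations for each item
lowCounts
names(lowCounts)[lowCounts > 1]    # items that are weak with more than one other item
```

Here only `q4` is weakly related to more than one other item, so it would be the candidate for removal.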

---

#### Bartlett's Test

To double check your findings from the correlation matrix, you can also run Bartlett's test with this simple line: 

```{r}
cortest.bartlett(financialWB1)
```

Here is the output:

```text
R was not square, finding R from data
$chisq
[1] 35500.63

$p.value
[1] 0

$df
[1] 45
```

First, you will get a warning in red that ```R was not square, finding R from data```.  That can be ignored; it is just acknowledging that you fed in raw data instead of a matrix, which is perfectly fine.

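Next, you will see a Chi-Square value (```chisq```) and a *p* value. You want this test to be significant; here the *p* value prints as ```0```, which means it is too small to display. A significant result means that your correlation matrix is not an identity matrix, so there is enough relationship between your variables to proceed with a factor analysis.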

---

#### Check your Determinants

If you want further proof that you are good to proceed forward, you can also check the determinant of the correlation matrix, which is basically another measure of how variables relate to each other. You'll do this by using the function ```det()```: 

```{r}
det(financialWBmatrix)
```

It takes the argument of the correlation matrix you had created a few steps earlier, and produces this output:

```text
[1] 0.003861618
```

If this value is greater than .00001 (yes, that's 4 zeros), then multicollinearity is not a problem, and you can proceed with a factor analysis. Here, 0.0039 is comfortably above that cutoff.  With all three ways to test - correlation matrix, Bartlett's test, and determinants - you are good to go! You'll do the actual factor analysis work on the next page.
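
Bartlett's test and the determinant are more closely tied than they might look. As a sketch, assuming the usual textbook form of Bartlett's statistic and that all 6,394 rows were used, you can rebuild the test from the determinant printed above:

```{r}
n <- 6394                 # rows in the data, from the sample size check
p <- 10                   # number of survey items
detR <- 0.003861618       # determinant printed by det(financialWBmatrix)

chisq <- -(n - 1 - (2 * p + 5) / 6) * log(detR)
df <- p * (p - 1) / 2

chisq                     # lands close to the $chisq reported by cortest.bartlett()
df                        # 45, matching $df
pchisq(chisq, df, lower.tail = FALSE)   # the p value, far too small to tell apart from 0
```

A determinant close to 1 would give a Chi-Square near zero, which is what you would expect if the items were unrelated.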

---

<hr style="height:10px;border-width:0;color:gray;background-color:gray">

# Page 8 - Factor Analysis in R<a class="anchor" id="DS103L5_page_8"></a>

[Back to Top](#DS103L5_toc)

<hr style="height:10px;border-width:0;color:gray;background-color:gray">


# Factor Analysis in R

Now that you know your data has met the assumptions for factor analysis, you can hop right into the good stuff! 

---

## Initial Pass to Determine Approximate Number of Factors


The first thing you will do is to run a basic principal components analysis (a close relative of factor analysis that is commonly used for this first pass) with as many factors as you have survey items, and without any rotation. You'll use the function ```principal()```, with the trimmed dataset as the first argument, the argument ```nfactors=``` for the number of factors you want to use, and the argument ```rotate="none"``` to indicate that you are not rotating your factors yet.

```{r}
pcModel1 <- principal(financialWB1, nfactors = 10, rotate = "none")
pcModel1
```

Calling the name of the model, ```pcModel1```, will let you see the results:

```text
Principal Components Analysis
Call: principal(r = financialWB1, nfactors = 10, rotate = "none")
Standardized loadings (pattern matrix) based upon correlation matrix
         PC1  PC2   PC3   PC4   PC5   PC6   PC7   PC8   PC9  PC10 h2       u2 com
FWB1_1 -0.79 0.38  0.02  0.10  0.09  0.06 -0.05 -0.21  0.37  0.17  1  2.2e-16 2.3
FWB1_2 -0.76 0.44  0.14 -0.06  0.07 -0.17 -0.03 -0.12 -0.35  0.19  1 -2.2e-15 2.6
FWB1_3  0.77 0.29 -0.22  0.15  0.15  0.15 -0.44 -0.07 -0.09 -0.11  1  3.3e-16 2.5
FWB1_4 -0.76 0.44  0.08 -0.06  0.06 -0.19  0.00  0.19  0.08 -0.36  1 -2.0e-15 2.5
FWB1_5  0.64 0.37 -0.29 -0.58 -0.13  0.05  0.07 -0.04  0.05  0.03  1  4.4e-16 3.2
FWB1_6  0.71 0.29 -0.39  0.34  0.17 -0.13  0.32  0.04 -0.02  0.04  1  2.2e-16 3.3
FWB2_1  0.81 0.17  0.26 -0.03  0.06 -0.14 -0.14  0.36  0.11  0.24  1 -6.7e-16 2.2
FWB2_2 -0.79 0.25 -0.12  0.11 -0.16  0.42  0.06  0.27 -0.09  0.07  1 -2.0e-15 2.4
FWB2_3  0.73 0.18  0.46 -0.06  0.28  0.28  0.20 -0.08 -0.03 -0.09  1 -2.2e-16 2.9
FWB2_4  0.72 0.32  0.24  0.24 -0.49 -0.05  0.02 -0.12  0.01 -0.06  1 -2.2e-16 2.9

                       PC1  PC2  PC3  PC4  PC5  PC6  PC7  PC8  PC9 PC10
SS loadings           5.61 1.06 0.66 0.57 0.43 0.39 0.37 0.33 0.30 0.28
Proportion Var        0.56 0.11 0.07 0.06 0.04 0.04 0.04 0.03 0.03 0.03
Cumulative Var        0.56 0.67 0.73 0.79 0.83 0.87 0.91 0.94 0.97 1.00
Proportion Explained  0.56 0.11 0.07 0.06 0.04 0.04 0.04 0.03 0.03 0.03
Cumulative Proportion 0.56 0.67 0.73 0.79 0.83 0.87 0.91 0.94 0.97 1.00

Mean item complexity =  2.7
Test of the hypothesis that 10 components are sufficient.

The root mean square of the residuals (RMSR) is  0 
 with the empirical chi square  0  with prob <  NA 

Fit based upon off diagonal values = 1
```

Because you asked for ten factors, R has tried to break down your data into ten factors.  You see them along the top matrix labeled as ```PC1```, ```PC2```, etc., and you can also see them in the bottom matrix.  What you are really looking at on this first pass is the ```SS loadings``` row of the bottom matrix, which contains something called *eigenvalues*. The larger the eigenvalue, the more variance the factor explains and the more likely it is to be important. Typically there is a cutoff of 1, so if you see a factor with an eigenvalue greater than 1, chances are it's something real to examine. Here, only ```PC1``` (5.61) and ```PC2``` (1.06) clear that cutoff.
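If you'd like to see where those eigenvalues come from, here is a minimal base R sketch. It uses simulated stand-in data (the data frame ```toyItems``` and its columns are made up for illustration, not taken from the survey) and assumes that the unrotated ```SS loadings``` match the eigenvalues of the items' correlation matrix.

```{r}
# Simulated stand-in data (NOT the financial well-being survey):
# six items driven by two hypothetical latent traits
set.seed(42)
n <- 500
trait1 <- rnorm(n)
trait2 <- rnorm(n)
toyItems <- data.frame(
  a1 = trait1 + rnorm(n, sd = 0.6),
  a2 = trait1 + rnorm(n, sd = 0.6),
  a3 = trait1 + rnorm(n, sd = 0.6),
  b1 = trait2 + rnorm(n, sd = 0.6),
  b2 = trait2 + rnorm(n, sd = 0.6),
  b3 = trait2 + rnorm(n, sd = 0.6)
)

# Eigenvalues of the correlation matrix: one per possible component
eigenvalues <- eigen(cor(toyItems))$values

# Each eigenvalue divided by the number of items is the share of
# total variance that the component accounts for
propVar <- eigenvalues / ncol(toyItems)

round(eigenvalues, 2)
round(cumsum(propVar), 2)

# Number of components passing the "eigenvalue greater than 1" cutoff
sum(eigenvalues > 1)
```

Dividing each eigenvalue by the number of items is also how the ```Proportion Var``` row relates to ```SS loadings``` in the output above (for example, 5.61 / 10 is about 0.56).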

---

### Examine the Scree Plot

For those of you who are visual, you can also check out the scree plot, accessible with the command below.  You are plotting the eigenvalues generated by your model, and the argument ```type="b"``` shows both a line and points on the same graph.

```{r}
plot(pcModel1$values, type="b")
```

Here is the graph that generates:

![A graph generated by a scree plot. The x axis is labeled index and lists values from two to ten. The y axis is labeled P C model one dollar sign values and lists values from one to five . From left to right, the data plotted starts high on the y axis, at above five, sharply drops to one, and then continues to slowly drop as it moves rightward to ten.](Media/factor3.png)

What you are looking for on this graph is where the plot seems to break or shear off.  The name "scree plot" comes from the geology world, where scree is the loose rock that piles up at the base of a cliff. In this case, you can see that there is quite a jump down between 1 and 2, and another not-quite-so-large jump down between 2 and 3. The rest really seem to trail off after that. This info, along with the eigenvalues, suggests that there are probably two factors.  So you can now test that assumption by checking whether a model with two factors fits the data well.

---

## Second Pass to Test the Suspected Number of Factors

Now, you'll run similar code again, but this time, you will change ```nfactors=``` to 2 instead of 10: 

```{r}
pcModel2 <- principal(financialWB1, nfactors = 2, rotate = "none")
```

---

### Examining Residuals to Determine Model Fit

You don't even need to examine the output right now, but you will use the new model, ```pcModel2```, to examine model fit through the residuals. The basic idea behind this test is that the model fits your data well if there is very little difference between the observed correlation matrix and the correlations reproduced from your model's loadings.  Each of those differences is known as a *residual*. A general rule of thumb is that you have good model fit if fewer than 50% of the residuals are large (over .05 in absolute value).  To make all this easier, you will go through a series of steps.  The first line creates your residuals using the ```factor.residuals()``` function.  The arguments it takes are your correlation matrix and the loadings from your most recent factor analysis model.

```{r}
residuals <- factor.residuals(financialWBmatrix, pcModel2$loadings)
```

The second line formats the residuals as a matrix using the function ```as.matrix()```, and keeps only the top half (remember, the top triangle and bottom triangle mirror each other), using the function ```upper.tri()```. 

```{r}
residuals <- as.matrix(residuals[upper.tri(residuals)])
```

The next line will find only the large residuals values and put them in a new variable named ```largeResid```.  This uses the ```abs()``` function to take the absolute value, and qualifies a large residual as > .05:

```{r}
largeResid <- abs(residuals) > .05
```

Then you can find the number of residuals that are large by using the ```sum()``` function:

```{r}
sum(largeResid)
```

And lastly, you can get the proportion of residuals that are large by dividing the ```sum()``` of ```largeResid``` by the total number of residuals, which you get from the number of rows in the residuals you generated, using the ```nrow()``` function.

```{r}
sum(largeResid) / nrow(residuals)
```

When you go through this, you find that the final output is: 

```text
[1] 0.3777778
```

Meaning that about 38% of the residuals are large.  This is under 50%, so having only two factors is a pretty good model fit for the data.
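To make the idea of a residual more concrete, here is a rough base R sketch of the same check done by hand. It runs on simulated stand-in data (```toyItems``` is made up, not the survey) and assumes that the residuals are simply the observed correlations minus the correlations reproduced by multiplying the loadings by their own transpose.

```{r}
# Simulated stand-in data (NOT the survey): six items, two hypothetical traits
set.seed(42)
n <- 500
trait1 <- rnorm(n)
trait2 <- rnorm(n)
toyItems <- data.frame(
  a1 = trait1 + rnorm(n, sd = 0.6),
  a2 = trait1 + rnorm(n, sd = 0.6),
  a3 = trait1 + rnorm(n, sd = 0.6),
  b1 = trait2 + rnorm(n, sd = 0.6),
  b2 = trait2 + rnorm(n, sd = 0.6),
  b3 = trait2 + rnorm(n, sd = 0.6)
)

observed <- cor(toyItems)

# Unrotated loadings for the first two components:
# each eigenvector scaled by the square root of its eigenvalue
decomp <- eigen(observed)
loadings2 <- decomp$vectors[, 1:2] %*% diag(sqrt(decomp$values[1:2]))

# Correlations that the two-component model would reproduce
reproduced <- loadings2 %*% t(loadings2)

# Residual = observed minus reproduced correlation; keep only the
# upper triangle so that each pair of items is counted once
toyResid <- (observed - reproduced)[upper.tri(observed)]

# Proportion of residuals larger than .05 in absolute value
mean(abs(toyResid) > .05)
```

Taking the ```mean()``` of a logical vector gives the same answer as dividing the count of ```TRUE``` values by the total, so it is a handy shortcut for the last two steps.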

---

## Rotate the Factors to Determine Where Each Survey Item Fits

You will now play around with rotating the factors, to see where each survey item fits.  You will most likely try multiple rotations, and you may even try them with different numbers of factors as well if you think the results above may point to a few different numbers.

---

### Oblique Rotation

You will now run your model a third time, but using the argument ```rotate="oblimin"```.  Oblimin is the most commonly used type of oblique rotation, and you'll start with it because it's assumed that these survey items are conceptually related to each other.  They are, after all, all about financial well-being. Keep the number of factors as two (```nfactors=2```), because an examination of residuals showed that it was a good fit for the data.

```{r}
pcModel3 <- principal(financialWB1, nfactors = 2, rotate = "oblimin")
pcModel3
```

Here are the results you receive: 

```text
Principal Components Analysis
Call: principal(r = financialWB1, nfactors = 2, rotate = "oblimin")
Standardized loadings (pattern matrix) based upon correlation matrix
         TC1   TC2   h2   u2 com
FWB1_1 -0.05  0.84 0.76 0.24 1.0
FWB1_2  0.04  0.90 0.76 0.24 1.0
FWB1_3  0.80 -0.03 0.67 0.33 1.0
FWB1_4  0.04  0.90 0.78 0.22 1.0
FWB1_5  0.81  0.13 0.55 0.45 1.0
FWB1_6  0.77  0.00 0.58 0.42 1.0
FWB2_1  0.69 -0.20 0.69 0.31 1.2
FWB2_2 -0.19  0.69 0.68 0.32 1.2
FWB2_3  0.65 -0.14 0.57 0.43 1.1
FWB2_4  0.81  0.03 0.62 0.38 1.0

                       TC1  TC2
SS loadings           3.64 3.03
Proportion Var        0.36 0.30
Cumulative Var        0.36 0.67
Proportion Explained  0.55 0.45
Cumulative Proportion 0.55 1.00

 With component correlations of 
      TC1   TC2
TC1  1.00 -0.64
TC2 -0.64  1.00

Mean item complexity =  1
Test of the hypothesis that 2 components are sufficient.

The root mean square of the residuals (RMSR) is  0.06 
 with the empirical chi square  2155.11  with prob <  0 

Fit based upon off diagonal values = 0.99
```

You now have access to what is called a *pattern matrix*, and the values in the columns ```TC1``` and ```TC2``` tell you how well each item fits into that factor.  Loadings generally fall between -1 and 1; the sign shows the direction of the relationship, and the larger the absolute value, the better the fit.  If you want to examine the output all at once, it can be handy to put it into a spreadsheet or print it out and highlight the values that load highly.  What is meant by loading highly? Your exact cutoff will determine how your data looks, but generally anything with an absolute value above about .3 to .4 loads acceptably on that factor.  Take a look at the image below: 

![A pattern matrix listing ten items and how well each item fits into one of two factors. Each items fit for each factor is listed in column T C one and T C two.](Media/factor4.png)

The first factor has high loadings for items ```FWB1_3```, ```FWB1_5```, ```FWB1_6```, ```FWB2_1```, ```FWB2_3```, and ```FWB2_4```.
The second factor has high loadings for items ```FWB1_1```, ```FWB1_2```, ```FWB1_4```, and ```FWB2_2```. 

Luckily this is pretty cut and dried in this example, but sometimes you will find that things load relatively highly on more than one factor and you have to make the best decision you can. There is a lot of subjective judgement in factor analysis. 
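The highlighting exercise above can also be sketched in a few lines of base R. The loadings below are typed in by hand from the oblimin output shown earlier; the object names (```loadings```, ```cutoff```, ```assigned```, ```crossLoaded```) and the choice of .3 as the cutoff are only illustrative.

```{r}
# Loadings copied from the oblimin pattern matrix printed above
loadings <- matrix(
  c(-0.05,  0.84,
     0.04,  0.90,
     0.80, -0.03,
     0.04,  0.90,
     0.81,  0.13,
     0.77,  0.00,
     0.69, -0.20,
    -0.19,  0.69,
     0.65, -0.14,
     0.81,  0.03),
  ncol = 2, byrow = TRUE,
  dimnames = list(
    c("FWB1_1", "FWB1_2", "FWB1_3", "FWB1_4", "FWB1_5",
      "FWB1_6", "FWB2_1", "FWB2_2", "FWB2_3", "FWB2_4"),
    c("TC1", "TC2")
  )
)

cutoff <- 0.3  # illustrative; the text suggests somewhere from .3 to .4

# Factor with the largest absolute loading for each item
assigned <- colnames(loadings)[apply(abs(loadings), 1, which.max)]
names(assigned) <- rownames(loadings)

# Items that clear the cutoff on more than one factor (cross-loadings)
crossLoaded <- rownames(loadings)[rowSums(abs(loadings) >= cutoff) > 1]

split(names(assigned), assigned)
crossLoaded
```

An empty ```crossLoaded``` result is what "cut and dried" looks like in code: every item clears the cutoff on exactly one factor.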

When you examine the actual items and what the questions are asking you find that factor 1, ```TC1```, is all about the negative side of financial well being - not having enough money or not being in control of your finances.  The second factor, ```TC2```, looks like it is the opposite - about being good with money management. 

Now that process wasn't too bad with a small number of factors and a small number of survey items, but what if you had more? Then it would become a much larger task and a bit of a pain.  Happily, the ```psych``` package swoops in to save the day and make your life easier! The line of code below will print out only the loadings that are higher than .3, and sorts them from largest to smallest.

```{r}
print.psych(pcModel3, cut = .3, sort=TRUE)
```

Here is the output:

```text
Principal Components Analysis
Call: principal(r = financialWB1, nfactors = 2, rotate = "oblimin")
Standardized loadings (pattern matrix) based upon correlation matrix
       item   TC1   TC2   h2   u2 com
FWB1_5    5  0.81       0.55 0.45 1.0
FWB2_4   10  0.81       0.62 0.38 1.0
FWB1_3    3  0.80       0.67 0.33 1.0
FWB1_6    6  0.77       0.58 0.42 1.0
FWB2_1    7  0.69       0.69 0.31 1.2
FWB2_3    9  0.65       0.57 0.43 1.1
FWB1_4    4        0.90 0.78 0.22 1.0
FWB1_2    2        0.90 0.76 0.24 1.0
FWB1_1    1        0.84 0.76 0.24 1.0
FWB2_2    8        0.69 0.68 0.32 1.2

                       TC1  TC2
SS loadings           3.64 3.03
Proportion Var        0.36 0.30
Cumulative Var        0.36 0.67
Proportion Explained  0.55 0.45
Cumulative Proportion 0.55 1.00

 With component correlations of 
      TC1   TC2
TC1  1.00 -0.64
TC2 -0.64  1.00

Mean item complexity =  1
Test of the hypothesis that 2 components are sufficient.

The root mean square of the residuals (RMSR) is  0.06 
 with the empirical chi square  2155.11  with prob <  0 

Fit based upon off diagonal values = 0.99
```

From this, you can easily see that your suspicions were right, and you basically get the exact same results.

---

### Orthogonal Rotation

You can also try this out with a type of orthogonal rotation.  Varimax rotation is the most common orthogonal rotation, so give it a shake. Although it is theoretically less likely that orthogonal rotation will provide the best fit, because it assumes your factors are not related to each other, it's still not a bad thing to try out:

```{r}
pcModel4 <- principal(financialWB1, nfactors = 2, rotate = "varimax")
print.psych(pcModel4, cut=.3, sort=TRUE)
```

When you run the ```psych``` print function that makes things easier to view, you can see that the results are very similar in terms of where items fit on the factors, but that some items now load on both factors, though they still have one dominant factor. For instance, take a look at ```FWB1_3``` below: it loads better on ```RC1```, with a value of .76, but also loads onto ```RC2``` with a value of -.31.   Obviously, you'd want to keep it on ```RC1``` because it loads better there, but there is the possibility of other options. The same thing has happened with several of the other items as well. 

```text
Principal Components Analysis
Call: principal(r = financialWB1, nfactors = 2, rotate = "varimax")
Standardized loadings (pattern matrix) based upon correlation matrix
       item   RC1   RC2   h2   u2 com
FWB1_3    3  0.76 -0.31 0.67 0.33 1.3
FWB2_4   10  0.75       0.62 0.38 1.2
FWB1_5    5  0.72       0.55 0.45 1.1
FWB1_6    6  0.72       0.58 0.42 1.3
FWB2_1    7  0.71 -0.42 0.69 0.31 1.6
FWB2_3    9  0.66 -0.36 0.57 0.43 1.6
FWB1_4    4        0.84 0.78 0.22 1.2
FWB1_2    2        0.84 0.76 0.24 1.2
FWB1_1    1 -0.33  0.81 0.76 0.24 1.3
FWB2_2    8 -0.41  0.71 0.68 0.32 1.6

                       RC1  RC2
SS loadings           3.54 3.13
Proportion Var        0.35 0.31
Cumulative Var        0.35 0.67
Proportion Explained  0.53 0.47
Cumulative Proportion 0.53 1.00

Mean item complexity =  1.3
Test of the hypothesis that 2 components are sufficient.

The root mean square of the residuals (RMSR) is  0.06 
 with the empirical chi square  2155.11  with prob <  0 

Fit based upon off diagonal values = 0.99
```

If examining the factor loadings doesn't give you enough information about the model fit for this second rotation, you can go through the process of checking the residuals again. In this case, the fact that items load on more than one factor suggests that the oblique oblimin rotation worked better than the orthogonal varimax rotation.

---

## What's This All Mean?

Your take-away here is that the financial well-being survey items have two subscales - you have one about high levels of financial well-being, and one about low levels of financial well-being. That can be valuable to know, because if you are giving out a survey in the future and can only have a very small number of items, you can either choose which subscale you're more interested in (high or low levels of financial well-being) or you can pick an item or two off each scale and leave it at that.

---


<hr style="height:10px;border-width:0;color:gray;background-color:gray">

# Page 9 - Calculating Reliability<a class="anchor" id="DS103L5_page_9"></a>

[Back to Top](#DS103L5_toc)

<hr style="height:10px;border-width:0;color:gray;background-color:gray">


# Calculating Reliability

Now that you have a basic understanding of the concept of reliability, you'll learn how to calculate both types in R! 

---

## Load Libraries

The only library you will need to calculate reliability in R is ```psych```. This contains the function ```alpha()``` which is used for all the things! 

```{r}
library("psych")
```

<div class="panel panel-info">
    <div class="panel-heading">
        <h3 class="panel-title">Tip!</h3>
    </div>
    <div class="panel-body">
        <p>There is also a function called "alpha" in the ggplot2 package! To keep R from getting confused, it's advised that you NOT have ggplot2 loaded at the same time.  You can uncheck the box in the packages tab to remove ggplot2 if you have already loaded it, or call the function as <code>psych::alpha()</code> to make it explicit which one you want.</p>
    </div>
</div>

---

## Load in Data

You will use the same data as you did for factor analysis - **[the survey on financial wellbeing](https://repo.exeterlms.com/documents/V2/DataScience/Metrics-Data-Processing/financialWB.zip)**.  The codebook is located **[here](https://s3.amazonaws.com/files.consumerfinance.gov/f/documents/cfpb_nfwbs-puf-codebook.pdf)**. Check out the variable list starting on page 5 if you'd like to know what all the survey items are (or at least the ones you'll be working with).

---

## Question Setup

Really, the main question you are answering here is: 

> Is my survey reliable? Does it measure the same thing every time?

You are hoping that the answer will be "yes" to avoid many tears.

---

## Data Wrangling

There are two data wrangling tasks you will need to perform.   First, you'll need to subset your data.  You will want one dataframe for each scale you unearthed in your factor analysis.  You may also need to *reverse code* your data if you have any items that are "negative." For instance, in this survey, there are items that are the opposite of financial well being - things like "Because of my money situation...I will never have the things I want in life." It's important that when you look at the reliability of a scale, all the items are facing the same direction (all trending positively or all trending negatively).  Since you will often have "reversed" items in a survey to try and catch your respondents unaware and give away fewer clues as to what you are measuring, you'll just need to do some wrangling afterwards to resolve this issue.

---

### Subsetting your Data

Your first task will be to make a dataframe for each factor you discovered in factor analysis.  You found two - one for "good" financial well being, and a second for "bad" financial well being. To make it easy, you can subset each of these out of ```financialWB1``` that you used for factor analysis by feeding in a vector of column numbers you want to keep for each: 

```{r}
goodFWB <- financialWB1[, c(1,2,4,8)]
badFWB <- financialWB1[, c(3,5,6,7,9,10)]
```

Remember that keeping the format ```[,``` in front means that you are retaining all rows of your data.
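Column numbers work fine here, but they silently pick the wrong items if the column order ever changes. As an alternative sketch (not part of the original walkthrough), you could subset by column name instead. The tiny data frame ```toyWB``` below is made up purely so the example runs on its own.

```{r}
# Hypothetical miniature stand-in for financialWB1 (made-up responses)
toyWB <- data.frame(
  FWB1_1 = c(4, 3, 5),
  FWB1_2 = c(4, 2, 5),
  FWB1_3 = c(2, 4, 1),
  FWB1_4 = c(5, 3, 4)
)

# Selecting by name keeps working even if the column order changes
goodItems <- c("FWB1_1", "FWB1_2", "FWB1_4")
toyGood <- toyWB[, goodItems]

names(toyGood)
```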

---

### Reverse Coding Items

Next, you'll reverse code all the items that are "negative." In this case, discerning them is easy - they are all on the ```badFWB``` scale.  The reverse coding process is very similar to any other recoding you've done in R, but it specifies that you are changing the values to be a mirror image of each other.  So, if the respondent reported a ```1``` for the question, it should be changed to a ```5```, if they reported a ```2``` it should change to a ```4```, and so on. Take a look at the code for the rest:

```{r}
financialWB1$FWB1_3r <- NA
financialWB1$FWB1_3r[financialWB1$FWB1_3 == 1] <- 5
financialWB1$FWB1_3r[financialWB1$FWB1_3 == 2] <- 4
financialWB1$FWB1_3r[financialWB1$FWB1_3 == 3] <- 3
financialWB1$FWB1_3r[financialWB1$FWB1_3 == 4] <- 2
financialWB1$FWB1_3r[financialWB1$FWB1_3 == 5] <- 1

financialWB1$FWB1_5r <- NA
financialWB1$FWB1_5r[financialWB1$FWB1_5 == 1] <- 5
financialWB1$FWB1_5r[financialWB1$FWB1_5 == 2] <- 4
financialWB1$FWB1_5r[financialWB1$FWB1_5 == 3] <- 3
financialWB1$FWB1_5r[financialWB1$FWB1_5 == 4] <- 2
financialWB1$FWB1_5r[financialWB1$FWB1_5 == 5] <- 1

financialWB1$FWB1_6r <- NA
financialWB1$FWB1_6r[financialWB1$FWB1_6 == 1] <- 5
financialWB1$FWB1_6r[financialWB1$FWB1_6 == 2] <- 4
financialWB1$FWB1_6r[financialWB1$FWB1_6 == 3] <- 3
financialWB1$FWB1_6r[financialWB1$FWB1_6 == 4] <- 2
financialWB1$FWB1_6r[financialWB1$FWB1_6 == 5] <- 1

financialWB1$FWB2_1r <- NA
financialWB1$FWB2_1r[financialWB1$FWB2_1 == 1] <- 5
financialWB1$FWB2_1r[financialWB1$FWB2_1 == 2] <- 4
financialWB1$FWB2_1r[financialWB1$FWB2_1 == 3] <- 3
financialWB1$FWB2_1r[financialWB1$FWB2_1 == 4] <- 2
financialWB1$FWB2_1r[financialWB1$FWB2_1 == 5] <- 1

financialWB1$FWB2_3r <- NA
financialWB1$FWB2_3r[financialWB1$FWB2_3 == 1] <- 5
financialWB1$FWB2_3r[financialWB1$FWB2_3 == 2] <- 4
financialWB1$FWB2_3r[financialWB1$FWB2_3 == 3] <- 3
financialWB1$FWB2_3r[financialWB1$FWB2_3 == 4] <- 2
financialWB1$FWB2_3r[financialWB1$FWB2_3 == 5] <- 1

financialWB1$FWB2_4r <- NA
financialWB1$FWB2_4r[financialWB1$FWB2_4 == 1] <- 5
financialWB1$FWB2_4r[financialWB1$FWB2_4 == 2] <- 4
financialWB1$FWB2_4r[financialWB1$FWB2_4 == 3] <- 3
financialWB1$FWB2_4r[financialWB1$FWB2_4 == 4] <- 2
financialWB1$FWB2_4r[financialWB1$FWB2_4 == 5] <- 1
```

As you can see, you need to do this for every item that is reversed. It can be a little tedious, but once you've got the first one going, it can save time and increase efficiency if you copy and paste and then change the item numbers. Luckily, the column names in this dataset are ideally set up for just that!
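If the copy-and-paste approach feels error-prone, the same mirroring can be sketched with a small helper and a loop. This is an alternative under stated assumptions, not the lesson's method: the helper name ```reverse5``` and the toy data frame are made up, and values outside 1 to 5 (here -1 and -4) are turned into ```NA```, which mirrors what the longhand recode above does.

```{r}
# Hypothetical toy data (made-up responses); -1 and -4 stand in for
# responses that are not on the 1-5 scale
toy <- data.frame(
  FWB1_3 = c(1, 2, 3, 4, 5, -1),
  FWB1_5 = c(5, 4, 3, 2, 1, -4)
)

# Mirror a 1-5 response around the midpoint (1 <-> 5, 2 <-> 4);
# anything that is not 1-5 becomes NA
reverse5 <- function(x) ifelse(x %in% 1:5, 6 - x, NA)

# Add an "r"-suffixed reversed copy of each negative item
for (item in c("FWB1_3", "FWB1_5")) {
  toy[[paste0(item, "r")]] <- reverse5(toy[[item]])
}

toy
```

Subtracting from 6 works because the scale runs from 1 to 5; for a scale with a different range you would subtract from the minimum plus the maximum instead.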

---

### Dropping the Old (Non-Recoded) Items

Once you have completed all your reverse coding, you'll want to drop the old items that were negative and not reverse-coded.  You can use the same subsetting technique as before, and just keep the original positive data and your newly reverse-coded negative data:

```{r}
financialWB2 <- financialWB1[, c(1,2,4,8,11,12,13,14,15,16)]
```

And with that, you are now ready to get your reliability on!

---

## Test Assumptions

Made you sweat! Guess what? There are no assumptions for testing reliability in R! Breathe a sigh of relief and proceed to the next step!

---


<hr style="height:10px;border-width:0;color:gray;background-color:gray">

# Page 10 - Calculating Reliability in R<a class="anchor" id="DS103L5_page_10"></a>

[Back to Top](#DS103L5_toc)

<hr style="height:10px;border-width:0;color:gray;background-color:gray">

```c-lms
topic: Calculating Reliability in R
```

# Calculating Reliability in R

Now that you've done all the prep work, running the actual reliability function is blindingly easy.  Just make use of ```alpha()``` and place as the sole argument the data frame that contains your data! You will want to do this for each subscale and for your data as a whole: 

```{r}
alpha(goodFWB)
alpha(badFWB)
alpha(financialWB2)
```

But to make this easy, start by looking at just the first one.  Here's the output for ```goodFWB```:

```text
Reliability analysis   
Call: alpha(x = goodFWB)

  raw_alpha std.alpha G6(smc) average_r S/N    ase mean sd median_r
      0.88      0.88    0.85      0.66 7.7 0.0024  3.2  1     0.67

 lower alpha upper     95% confidence boundaries
0.88 0.88 0.89 

 Reliability if an item is dropped:
       raw_alpha std.alpha G6(smc) average_r S/N alpha se   var.r med.r
FWB1_1      0.84      0.84    0.79      0.64 5.4   0.0035 0.00293  0.62
FWB1_2      0.85      0.85    0.79      0.65 5.6   0.0033 0.00102  0.65
FWB1_4      0.84      0.85    0.79      0.65 5.5   0.0034 0.00159  0.65
FWB2_2      0.87      0.87    0.82      0.69 6.7   0.0029 0.00015  0.69

 Item statistics 
          n raw.r std.r r.cor r.drop mean  sd
FWB1_1 6394  0.88  0.88  0.82   0.77  3.0 1.2
FWB1_2 6394  0.86  0.87  0.81   0.75  3.2 1.1
FWB1_4 6394  0.86  0.87  0.82   0.77  3.3 1.1
FWB2_2 6394  0.84  0.83  0.74   0.70  3.4 1.3

Non missing response frequency for each item
       -4 -1    1    2    3    4    5 miss
FWB1_1  0  0 0.14 0.15 0.33 0.24 0.13    0
FWB1_2  0  0 0.08 0.15 0.36 0.28 0.12    0
FWB1_4  0  0 0.06 0.14 0.38 0.30 0.12    0
FWB2_2  0  0 0.08 0.16 0.28 0.23 0.25    0
```

The output above covers both types of reliability - inter-rater and inter-item! You'll learn which output pertains to each next.

---

## Interpret Output for Inter-Rater Reliability

The first thing you'll need to look at for inter-rater reliability is Cronbach's alpha, which shows up on your R output as ```raw_alpha```.  Lucky for you, you don't actually have to collect the same survey from people twice to get a sense for inter-rater reliability (though it is an option, if you want to be thorough!). Cronbach's alpha builds on the idea of *split-half reliability*: split the items into two halves, score each half for every person, and correlate the two sets of scores. Alpha works out to the average of that result over every possible way of splitting the items, which is what R reports in ```raw_alpha```.  Typically, you want a Cronbach's alpha of approximately .8 to have good reliability. However, alpha tends to rise as the number of items in the scale or subscale goes up, so interpreting it can be a little bit subjective.  Only have three items in your subscale? Then maybe a Cronbach's alpha of about .7 is ok.  Have 30 items in your scale? Then .8 may not be looking so hot. The usual rule of thumb, however, is that anything under .7 is probably not good reliability.

You will notice that the ```raw_alpha``` score for ```goodFWB``` is alpha = .88, so the scale at first blush has pretty good reliability.
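If you're curious what goes into that number, here is a minimal sketch of Cronbach's alpha computed from its textbook formula in base R. It uses simulated stand-in data rather than the survey, the helper name ```cronbachAlpha``` is made up, and it assumes complete data with no missing values.

```{r}
# Simulated stand-in data (NOT the survey): four items sharing one
# hypothetical trait, rounded and clipped onto a 1-5 scale
set.seed(42)
n <- 500
trait <- rnorm(n)
makeItem <- function() pmin(pmax(round(3 + trait + rnorm(n, sd = 0.8)), 1), 5)
toyScale <- data.frame(q1 = makeItem(), q2 = makeItem(),
                       q3 = makeItem(), q4 = makeItem())

# Textbook formula:
# (k / (k - 1)) * (1 - sum of item variances / variance of total score)
cronbachAlpha <- function(items) {
  k <- ncol(items)
  itemVars <- apply(items, 2, var)
  totalVar <- var(rowSums(items))
  (k / (k - 1)) * (1 - sum(itemVars) / totalVar)
}

cronbachAlpha(toyScale)
```

The more the items move together, the larger the variance of the total score is relative to the individual item variances, and the closer alpha gets to 1.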

---

### Increasing your Reliability

You may be able to increase your scale's reliability by dropping one or more items that don't play nicely with the others.  This information is found in the ```Reliability if an item is dropped``` table in your R output. If the ```raw_alpha``` score in that table for any item is higher than the overall Cronbach's alpha you saw above, then you may want to try removing that item and see how the factor analysis plays out as well. For ```goodFWB```, none of the four values (.84 to .87) beats the overall .88, so no item is dragging the scale down.
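The logic behind that table can be sketched by recomputing alpha with each item left out in turn. The example below is illustrative only: it uses simulated data, a made-up helper named ```cronbachAlpha```, and a deliberately unrelated item (```q5```) so that there is something worth dropping.

```{r}
# Simulated stand-in data (NOT the survey): four related items plus
# one unrelated "noise" item, all on a 1-5 scale
set.seed(42)
n <- 500
trait <- rnorm(n)
makeItem <- function() pmin(pmax(round(3 + trait + rnorm(n, sd = 0.8)), 1), 5)
toyScale <- data.frame(q1 = makeItem(), q2 = makeItem(),
                       q3 = makeItem(), q4 = makeItem(),
                       q5 = sample(1:5, n, replace = TRUE))

# Cronbach's alpha from item variances and total-score variance
cronbachAlpha <- function(items) {
  k <- ncol(items)
  (k / (k - 1)) * (1 - sum(apply(items, 2, var)) / var(rowSums(items)))
}

overall <- cronbachAlpha(toyScale)

# Alpha recomputed with each item removed in turn
alphaIfDropped <- sapply(names(toyScale), function(item) {
  cronbachAlpha(toyScale[, names(toyScale) != item])
})

round(c(overall = overall, alphaIfDropped), 2)

# Items whose removal would raise alpha above the overall value
names(alphaIfDropped)[alphaIfDropped > overall]
```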

---

## Interpret Output for Inter-Item Reliability

Determining how nicely each item plays with each other is the realm of inter-item reliability, and this information can be found both in the ```Item statistics``` and the ```Non missing response frequency for each item``` sections.  Under ```Item statistics```, you see first ```raw.r```.  This is the correlation of the item with the scale as a whole.  But there's one big problem with ```raw.r```: it correlates the item with the scale in its entirety, *including* the item itself.  So often correlations are auto-inflated, and you don't really want to use this statistic.  The solution is shown in the ```r.drop``` column.  Here, it has removed the item in question from the scale as a whole, to avoid increasing the correlation value unduly.  This ```r.drop``` column contains what is called the ```corrected item-total correlation``` or sometimes called the ```item-rest correlation```, because it is the correlation of the item with "the rest" of the scale. You are looking for all of these ```r.drop``` values to be above .3, and if you find one that isn't, you may want to play around with removing that item, both for reliability and validity analysis.

In the ```Non missing response frequency``` table, you will find the percentage of people who gave each response option for each item.  The idea here is just to make sure that everyone is not answering all one way for any particular item.  It should be relatively evenly distributed, or at the very least, not clumped up! There's a pretty good chance that something is wrong with your question if absolutely everyone is answering the question the same way!
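Both checks described above can be sketched by hand in base R. The example uses simulated stand-in data, and the names ```rawR``` and ```dropR``` are made up to echo the ```raw.r``` and ```r.drop``` columns; treat it as an illustration of the idea rather than a reproduction of how ```alpha()``` computes its output.

```{r}
# Simulated stand-in data (NOT the survey): four items sharing one
# hypothetical trait, rounded and clipped onto a 1-5 scale
set.seed(42)
n <- 500
trait <- rnorm(n)
makeItem <- function() pmin(pmax(round(3 + trait + rnorm(n, sd = 0.8)), 1), 5)
toyScale <- data.frame(q1 = makeItem(), q2 = makeItem(),
                       q3 = makeItem(), q4 = makeItem())

item <- "q1"
total <- rowSums(toyScale)          # score on the whole scale
rest <- total - toyScale[[item]]    # score on every item except this one

rawR <- cor(toyScale[[item]], total)   # inflated: the item is in the total
dropR <- cor(toyScale[[item]], rest)   # corrected item-total correlation

c(raw.r = rawR, r.drop = dropR)

# Proportion of respondents choosing each response option for the item
round(prop.table(table(toyScale[[item]])), 2)
```

The gap between the two correlations shows how much of ```raw.r``` comes from the item being correlated with itself.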

---

## Interpret Output for the Other 2 Sets of Data

You've now walked through how to interpret the ```alpha()``` results for the subscale ```goodFWB```.  Without walking through every step, you'll now go through the other two.  

---

### Interpreting the Bad Financial Well Being Subscale

Here's the output for ```badFWB```:

```text
Reliability analysis   
Call: alpha(x = badFWB)

  raw_alpha std.alpha G6(smc) average_r S/N    ase mean   sd median_r
      0.87      0.87    0.86      0.53 6.7 0.0025  2.6 0.92     0.52

 lower alpha upper     95% confidence boundaries
0.86 0.87 0.87 

 Reliability if an item is dropped:
       raw_alpha std.alpha G6(smc) average_r S/N alpha se  var.r med.r
FWB1_3      0.84      0.84    0.81      0.51 5.2   0.0032 0.0052  0.51
FWB1_5      0.86      0.86    0.84      0.56 6.3   0.0027 0.0040  0.55
FWB1_6      0.85      0.85    0.83      0.54 5.8   0.0029 0.0047  0.54
FWB2_1      0.84      0.84    0.81      0.51 5.1   0.0032 0.0036  0.51
FWB2_3      0.85      0.85    0.83      0.54 5.8   0.0029 0.0036  0.52
FWB2_4      0.85      0.85    0.83      0.53 5.6   0.0030 0.0053  0.52

 Item statistics 
          n raw.r std.r r.cor r.drop mean  sd
FWB1_3 6394  0.82  0.82  0.77   0.72  2.5 1.2
FWB1_5 6394  0.73  0.72  0.63   0.59  2.8 1.3
FWB1_6 6394  0.76  0.76  0.69   0.65  3.1 1.2
FWB2_1 6394  0.83  0.83  0.80   0.74  2.3 1.2
FWB2_3 6394  0.76  0.77  0.70   0.65  2.0 1.1
FWB2_4 6394  0.78  0.78  0.73   0.68  2.7 1.1

Non missing response frequency for each item
       -4 -1    1    2    3    4    5 miss
FWB1_3  0  0 0.22 0.31 0.28 0.11 0.09    0
FWB1_5  0  0 0.19 0.22 0.31 0.15 0.12    0
FWB1_6  0  0 0.09 0.21 0.38 0.16 0.15    0
FWB2_1  0  0 0.28 0.33 0.23 0.09 0.07    0
FWB2_3  0  0 0.39 0.33 0.17 0.06 0.04    0
FWB2_4  0  0 0.15 0.31 0.32 0.15 0.08    0
```

The overall Cronbach's alpha is .87, which is good and suggests that this scale is reliable.  Looking at the ```Reliability if an item is dropped``` table, it looks like there is no one item that, if dropped, would improve the overall reliability. Inter-item reliability is also good, with corrected item-total correlations of approximately .6 or greater.  So the items go together well.  And no one item is having respondents all answer the same way, as evidenced by the ```Non missing response frequency``` table. Overall, this subscale has good inter-rater and inter-item reliability!

---

### Interpreting the Financial Well Being Scale as a Whole

Although some folks think that you can skip this step, it is generally ideal to examine the reliability of your scale as a whole, not just look at the reliability of the subscales. Looking at the scale as a whole can provide one more great "gut check" on your data.  You expect that, whether broken up into subgroups or not, the items on your scale should all measure similar things. 

Here are the results of running ```alpha()``` on the entirety of the financial well being scale:

```text
Reliability analysis   
Call: alpha(x = financialWB2)

  raw_alpha std.alpha G6(smc) average_r S/N    ase mean   sd median_r
      0.92      0.92    0.92      0.52  11 0.0016  3.3 0.89     0.51

 lower alpha upper     95% confidence boundaries
0.91 0.92 0.92 

 Reliability if an item is dropped:
        raw_alpha std.alpha G6(smc) average_r  S/N alpha se  var.r med.r
FWB1_1       0.90      0.90    0.90      0.51  9.5   0.0018 0.0059  0.51
FWB1_2       0.91      0.91    0.91      0.52  9.7   0.0018 0.0055  0.51
FWB1_4       0.91      0.91    0.90      0.52  9.6   0.0018 0.0055  0.51
FWB2_2       0.90      0.90    0.91      0.51  9.5   0.0018 0.0068  0.50
FWB1_3r      0.91      0.91    0.91      0.52  9.6   0.0018 0.0080  0.50
FWB1_5r      0.91      0.91    0.91      0.54 10.6   0.0016 0.0059  0.52
FWB1_6r      0.91      0.91    0.91      0.53 10.1   0.0017 0.0074  0.51
FWB2_1r      0.90      0.90    0.90      0.51  9.3   0.0018 0.0070  0.50
FWB2_3r      0.91      0.91    0.91      0.52  9.9   0.0017 0.0073  0.51
FWB2_4r      0.91      0.91    0.91      0.53 10.0   0.0017 0.0075  0.51

 Item statistics 
           n raw.r std.r r.cor r.drop mean  sd
FWB1_1  6394  0.80  0.79  0.77   0.73  3.0 1.2
FWB1_2  6394  0.76  0.76  0.74   0.69  3.2 1.1
FWB1_4  6394  0.77  0.77  0.75   0.71  3.3 1.1
FWB2_2  6394  0.80  0.79  0.77   0.73  3.4 1.3
FWB1_3r 6385  0.77  0.77  0.74   0.71  3.5 1.2
FWB1_5r 6381  0.67  0.66  0.60   0.58  3.2 1.3
FWB1_6r 6385  0.71  0.71  0.67   0.64  2.9 1.2
FWB2_1r 6384  0.82  0.81  0.80   0.76  3.7 1.2
FWB2_3r 6383  0.73  0.74  0.70   0.67  4.0 1.1
FWB2_4r 6382  0.73  0.73  0.69   0.66  3.3 1.1

Non missing response frequency for each item
        -4 -1    1    2    3    4    5 miss
FWB1_1   0  0 0.14 0.15 0.33 0.24 0.13    0
FWB1_2   0  0 0.08 0.15 0.36 0.28 0.12    0
FWB1_4   0  0 0.06 0.14 0.38 0.30 0.12    0
FWB2_2   0  0 0.08 0.16 0.28 0.23 0.25    0
FWB1_3r  0  0 0.09 0.11 0.28 0.31 0.22    0
FWB1_5r  0  0 0.12 0.15 0.31 0.23 0.19    0
FWB1_6r  0  0 0.15 0.16 0.38 0.21 0.09    0
FWB2_1r  0  0 0.07 0.09 0.24 0.33 0.28    0
FWB2_3r  0  0 0.04 0.06 0.17 0.33 0.39    0
FWB2_4r  0  0 0.08 0.15 0.32 0.31 0.15    0
```

Overall Cronbach's alpha is excellent - a ```raw_alpha``` score of .92! Further, removing any single item would not improve that reliability, which is great.  All items seem to correlate well with each other, as the corrected item-total correlations shown in ```r.drop``` are all approximately .6 or higher.  Further, the response frequencies seem to be decently spread out for each item, which also lends reliability to your data.

---

## Draw Conclusions about Your Scale

After examining the financial wellbeing scale as a whole and examining its subscales, your conclusion is that this is a very reliable survey! The person who created it should be proud that they are measuring the same thing over and over again!

---

## Summary

* Accuracy and Precision are two different ways to describe data with respect to a target.
* Bias happens when a systematic error is introduced into measurement, whether it is known or unknown.
* Repeatability is the portion of variation that comes from the same operator repeating a measurement using the same gauge. It is a measure of precision.
* Reproducibility is the variation in results obtained by different individuals. It measures the ability of multiple people to get the same results.

---

<hr style="height:10px;border-width:0;color:gray;background-color:gray">

# Page 11 - Key Terms<a class="anchor" id="DS103L5_page_11"></a>

[Back to Top](#DS103L5_toc)

<hr style="height:10px;border-width:0;color:gray;background-color:gray">

# Key Terms

Below is a list and short description of the important keywords learned in this lesson. Please read through and go back and review any concepts you do not fully understand. Great Work!

<table class="table table-striped">
    <tr>
        <th>Keyword</th>
        <th>Description</th>
    </tr>
    <tr>
        <td style="font-weight: bold;" nowrap>Destructive Test</td>
        <td>When you have to destroy something to accurately measure it.</td>
    </tr>
    <tr>
        <td style="font-weight: bold;" nowrap>Disruptive Test</td>
        <td>When you have to disturb or change a process to accurately measure it.</td>
    </tr>
    <tr>
        <td style="font-weight: bold;" nowrap>Bias</td>
        <td>Built-in error that comes with measuring something.</td>
    </tr>
    <tr>
        <td style="font-weight: bold;" nowrap>Accuracy</td>
        <td>Measuring something correctly.</td>
    </tr>
    <tr>
        <td style="font-weight: bold;" nowrap>Precision</td>
        <td>Measuring something the same way each time.</td>
    </tr>
    <tr>
        <td style="font-weight: bold;" nowrap>Part-to-Part Variation</td>
        <td>A normal and expected error. There will naturally be small differences in parts or processes.</td>
    </tr>
    <tr>
        <td style="font-weight: bold;" nowrap>Repeatability</td>
        <td>Variation due to the measurement tool.</td>
    </tr>
    <tr>
        <td style="font-weight: bold;" nowrap>Reproducibility</td>
        <td>Variation due to the people doing the measuring.</td>
    </tr>
    <tr>
        <td style="font-weight: bold;" nowrap>Gage R&R</td>
        <td>A study that separates total variation into part-to-part variation, repeatability, and reproducibility.</td>
    </tr>
    <tr>
        <td style="font-weight: bold;" nowrap>Measurement Error</td>
        <td>A combination of repeatability and reproducibility.</td>
    </tr>
    <tr>
        <td style="font-weight: bold;" nowrap>Percentage of Total Variation</td>
        <td>The amount of variation that is part-to-part versus measurement error.</td>
    </tr>
    <tr>
        <td style="font-weight: bold;" nowrap>Percentage of Tolerance</td>
        <td>The amount of measurement error compared to your specification range.</td>
    </tr>
    <tr>
        <td style="font-weight: bold;" nowrap>Reliability</td>
        <td>Precision as it relates to survey measurement.</td>
    </tr>
    <tr>
        <td style="font-weight: bold;" nowrap>Validity</td>
        <td>Accuracy as it relates to survey measurement.</td>
    </tr>
    <tr>
        <td style="font-weight: bold;" nowrap>Inter-Rater Reliability</td>
        <td>The extent to which different raters scoring the same thing arrive at the same score.</td>
    </tr>
    <tr>
        <td style="font-weight: bold;" nowrap>Inter-Item Reliability</td>
        <td>All the items on your survey measure the same thing.</td>
    </tr>
    <tr>
        <td style="font-weight: bold;" nowrap>Factor Analysis</td>
        <td>A statistical technique that helps you validate surveys by determining how well items hang together.</td>
    </tr>
    <tr>
        <td style="font-weight: bold;" nowrap>Exploratory Factor Analysis (EFA)</td>
        <td>Used when you don't have a good idea of the structure of your data.</td>
    </tr>
    <tr>
        <td style="font-weight: bold;" nowrap>Confirmatory Factor Analysis (CFA)</td>
        <td>Used when you already have an idea of what to expect in terms of data structure.</td>
    </tr>
    <tr>
        <td style="font-weight: bold;" nowrap>Determinants</td>
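        <td>The determinant of the correlation matrix, another check on how strongly items are related. A value extremely close to zero suggests the items are too highly correlated.</td>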
    </tr>
    <tr>
        <td style="font-weight: bold;" nowrap>Bartlett's Test</td>
        <td>Tests your data for a relationship between items. It should be significant to meet the assumption for factor analysis.</td>
    </tr>
    <tr>
        <td style="font-weight: bold;" nowrap>Identity Matrix</td>
        <td>A matrix with ones on the diagonal and zeros everywhere else. As a correlation matrix, it represents no relationship between variables (all correlations zero).</td>
    </tr>
    <tr>
        <td style="font-weight: bold;" nowrap>Factor Rotation</td>
        <td>The shifting of variables to extract the most meaning from them.</td>
    </tr>
    <tr>
        <td style="font-weight: bold;" nowrap>Orthogonal Rotation</td>
        <td>Rotation that keeps the factors at 90 degrees to each other, so they stay uncorrelated. The most common types are varimax and quartimax.</td>
    </tr>
    <tr>
        <td style="font-weight: bold;" nowrap>Oblique Rotation</td>
        <td>Rotation that does not keep the factors at 90 degrees, so they are allowed to correlate. The most common types are oblimin and promax.</td>
    </tr>
    <tr>
        <td style="font-weight: bold;" nowrap>Reverse Coding</td>
        <td>Reversing the scores on negatively worded survey items (e.g., a 5 becomes a 1) so that all items point in the same direction.</td>
    </tr>
    <tr>
        <td style="font-weight: bold;" nowrap>Cronbach's alpha</td>
        <td>A measure of inter-item reliability (internal consistency).</td>
    </tr>
    <tr>
        <td style="font-weight: bold;" nowrap>Corrected Item Total</td>
        <td>Computes a correlation of the item with the scale minus itself.</td>
    </tr>
</table>
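
As a concrete illustration of the Reverse Coding entry above, here is a small base R sketch. It assumes a 1-to-5 response scale; the function name ```reverse_code``` and the sample responses are hypothetical.

```r
# Reverse-code responses on an assumed 1-5 scale: 1 <-> 5, 2 <-> 4, 3 stays 3
reverse_code <- function(x, scale_min = 1, scale_max = 5) {
  (scale_min + scale_max) - x
}

responses <- c(1, 2, 3, 4, 5)   # made-up answers to a negatively worded item
reversed <- reverse_code(responses)
reversed                        # 5 4 3 2 1
```

Subtracting from the sum of the scale endpoints works for any scale range, which is why the endpoints are arguments rather than hard-coded.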

---

## Key R Libraries

<table class="table table-striped">
    <tr>
        <th>Library</th>
        <th>Description</th>
    </tr>
    <tr>
        <td style="font-weight: bold;" nowrap>corpcor</td>
        <td>For estimating correlation and partial correlation matrices.</td>
    </tr>
    <tr>
        <td style="font-weight: bold;" nowrap>GPArotation</td>
        <td>For rotating factors.</td>
    </tr>
    <tr>
        <td style="font-weight: bold;" nowrap>psych</td>
        <td>For easier factor interpretation and for reliability analysis.</td>
    </tr>
</table>

---

## Key R Code

<table class="table table-striped">
    <tr>
        <th>Function or Argument</th>
        <th>Description</th>
    </tr>
    <tr>
        <td style="font-weight: bold;" nowrap>cor()</td>
        <td>Produces a correlation matrix.</td>
    </tr>
    <tr>
        <td style="font-weight: bold;" nowrap>round()</td>
        <td>Rounds information to the specified number of decimal places.</td>
    </tr>
    <tr>
        <td style="font-weight: bold;" nowrap>cortest.bartlett()</td>
        <td>Computes Bartlett's test.</td>
    </tr>
    <tr>
        <td style="font-weight: bold;" nowrap>det()</td>
        <td>Computes the determinant of a matrix.</td>
    </tr>
    <tr>
        <td style="font-weight: bold;" nowrap>principal()</td>
        <td>Computes an exploratory factor analysis.</td>
    </tr>
    <tr>
        <td style="font-weight: bold;" nowrap>nfactors=</td>
        <td>An argument to principal() where you can specify the number of factors.</td>
    </tr>
    <tr>
        <td style="font-weight: bold;" nowrap>rotate=</td>
        <td>An argument to principal() where you can specify the type of rotation to use.</td>
    </tr>
    <tr>
        <td style="font-weight: bold;" nowrap>factor.residuals()</td>
        <td>Provides residuals for your factor analysis.</td>
    </tr>
    <tr>
        <td style="font-weight: bold;" nowrap>abs()</td>
        <td>Provides the absolute value of something.</td>
    </tr>
    <tr>
        <td style="font-weight: bold;" nowrap>print.psych()</td>
        <td>Prints the factor loadings in an easy-to-read fashion.</td>
    </tr>
    <tr>
        <td style="font-weight: bold;" nowrap>cut=</td>
        <td>An argument to print.psych() that hides factor loadings smaller than the cutoff you specify.</td>
    </tr>
    <tr>
        <td style="font-weight: bold;" nowrap>alpha()</td>
        <td>Runs a reliability analysis, including Cronbach's alpha.</td>
    </tr>
</table>
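
To show how the functions in the table fit together, here is a rough end-to-end sketch. It assumes the ```psych``` package is installed and uses simulated data rather than the lesson's survey; the variable names and the .05 residual cutoff are illustrative choices, not requirements.

```r
library(psych)  # assumes the psych package is installed

# Simulated survey data (not the lesson's dataset): 6 items driven by one trait
set.seed(42)
n <- 200
trait <- rnorm(n)
survey <- as.data.frame(sapply(1:6, function(i) trait + rnorm(n)))
names(survey) <- paste0("item", 1:6)

# Correlation matrix, rounded for readability
survey_cor <- cor(survey)
round(survey_cor, 2)

# Assumption checks: Bartlett's test and the determinant
bartlett <- cortest.bartlett(survey_cor, n = n)
cor_det <- det(survey_cor)

# One-component solution; rotation makes no difference with a single factor
pc <- principal(survey, nfactors = 1, rotate = "none")
print(pc, cut = 0.3)  # print() hands psych objects to print.psych()

# Residuals: observed correlations minus those reproduced by the loadings
resids <- factor.residuals(survey_cor, pc$loadings)
large_resid_share <- mean(abs(resids[upper.tri(resids)]) > 0.05)

# Reliability; the psych:: prefix avoids a clash with ggplot2::alpha()
rel <- psych::alpha(survey)
rel$total$raw_alpha
```

With real data you would usually request more than one factor and choose a rotation through ```rotate=```; a single unrotated component keeps this sketch short.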
