## R Programming For SPSS Users
From 2003 to 2009, I taught software training courses for SPSS Inc.  The contents of this notebook are partially inspired by the course titled "Statistical Analysis Using SPSS" that I frequently taught during that time.  Another important source is the book "R for SAS and SPSS Users" by Robert Muenchen.  If you already know how to write SPSS Statistics syntax (code) or SAS code and want to learn how to do similar things in R, Bob's book is highly recommended.  The web site for Bob's book is: https://r4stats.com/


### Procedures covered
- head(): returns the first six rows in the data frame   
- summary(): frequencies & descriptive statistics   


### Useful R Packages to consider

Frank Harrell’s Hmisc package  
https://cran.r-project.org/web/packages/Hmisc/  
https://hbiostat.org/R/Hmisc/  
contains functions that produce more informative output than standard R functions  


In [1]:
# R Programming For SPSS Users
# Last updated 2/5/2023

version # show the version of R used in Anaconda

               _                           
platform       x86_64-w64-mingw32          
arch           x86_64                      
os             mingw32                     
system         x86_64, mingw32             
status                                     
major          3                           
minor          6.1                         
year           2019                        
month          07                          
day            05                          
svn rev        76782                       
language       R                           
version.string R version 3.6.1 (2019-07-05)
nickname       Action of the Toes          

### Useful R programming tips

#### attach, detach, search
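If you are doing a lot of work on a single data frame, you can make it the "active" data frame with attach(), which lets you refer to its columns by name alone, and release it later with detach(). search() lists everything currently attached.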
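
As a minimal sketch of the attach/detach/search cycle described above, the following uses the built-in mtcars data frame. The variable names (`mean_mpg`, `attached`, and so on) are illustrative only, and `with()` is shown as one commonly used alternative that leaves the search path untouched.

```r
# attach() puts the columns of mtcars on the search path
attach(mtcars)
mean_mpg <- mean(mpg)                      # mpg is found without the mtcars$ prefix
attached <- "mtcars" %in% search()         # TRUE while the data frame is attached

# detach() removes it again
detach(mtcars)
still_attached <- "mtcars" %in% search()   # FALSE after detach()

# with() evaluates an expression inside a data frame without attaching it
mean_mpg_with <- with(mtcars, mean(mpg))

print(c(attached = attached, still_attached = still_attached))
print(mean_mpg)
```

Note that an attached data frame is a copy: changes made to mtcars after attach() are not visible through the bare column names until it is detached and attached again.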


### Working with datasets included in R Base

In [8]:
# Displays a list of the datasets contained in the installed version of R Base
library(help = "datasets")
# favorite datasets include: mtcars, iris, Titanic (names are case-sensitive)

# head(mtcars)
# summary(mtcars)

# head(UCBAdmissions)

In [3]:
# Convert cyl, gear, am, and carb from numeric to factor using as.factor()
# Equivalent left-assignment syntax: mtcars$gear <- as.factor(mtcars$gear)
as.factor(mtcars$cyl) -> mtcars$cyl
as.factor(mtcars$gear) -> mtcars$gear
as.factor(mtcars$am) -> mtcars$am
as.factor(mtcars$carb) -> mtcars$carb

# To convert a factor back to its original numbers, go through character first;
# as.numeric() alone returns the internal level codes (1, 2, 3, ...):
# as.numeric(as.character(mtcars$carb)) -> mtcars$carb

summary(mtcars)


      mpg        cyl         disp             hp             drat      
 Min.   :10.40   4:11   Min.   : 71.1   Min.   : 52.0   Min.   :2.760  
 1st Qu.:15.43   6: 7   1st Qu.:120.8   1st Qu.: 96.5   1st Qu.:3.080  
 Median :19.20   8:14   Median :196.3   Median :123.0   Median :3.695  
 Mean   :20.09          Mean   :230.7   Mean   :146.7   Mean   :3.597  
 3rd Qu.:22.80          3rd Qu.:326.0   3rd Qu.:180.0   3rd Qu.:3.920  
 Max.   :33.90          Max.   :472.0   Max.   :335.0   Max.   :4.930  
       wt             qsec             vs         am     gear   carb  
 Min.   :1.513   Min.   :14.50   Min.   :0.0000   0:19   3:15   1: 7  
 1st Qu.:2.581   1st Qu.:16.89   1st Qu.:0.0000   1:13   4:12   2:10  
 Median :3.325   Median :17.71   Median :0.0000          5: 5   3: 3  
 Mean   :3.217   Mean   :17.85   Mean   :0.4375                 4:10  
 3rd Qu.:3.610   3rd Qu.:18.90   3rd Qu.:1.0000                 6: 1  
 Max.   :5.424   Max.   :22.90   Max.   :1.0000                 8: 1  
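
Converting a factor back to a number is a common stumbling block, so here is a small sketch of the difference between the two approaches. The vector `f` is a made-up example, not part of mtcars.

```r
# A factor whose labels look like numbers (hypothetical example values)
f <- factor(c("4", "6", "8", "6"))

# as.numeric() on a factor returns the internal level codes
codes <- as.numeric(f)

# Going through character first recovers the original values
values <- as.numeric(as.character(f))

print(codes)
print(values)
```

Here `codes` holds the positions of each value among the sorted levels, while `values` holds the numbers the labels spell out.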

In [4]:
# R Base: Summary function
# produces summary statistics of both continuous and categorical variables  
head(iris, n=10)   # By default, head() returns the first 6 rows; n overrides that
summary(iris)  # Species is a categorical variable; all others are continuous


Sepal.Length,Sepal.Width,Petal.Length,Petal.Width,Species
5.1,3.5,1.4,0.2,setosa
4.9,3.0,1.4,0.2,setosa
4.7,3.2,1.3,0.2,setosa
4.6,3.1,1.5,0.2,setosa
5.0,3.6,1.4,0.2,setosa
5.4,3.9,1.7,0.4,setosa
4.6,3.4,1.4,0.3,setosa
5.0,3.4,1.5,0.2,setosa
4.4,2.9,1.4,0.2,setosa
4.9,3.1,1.5,0.1,setosa


  Sepal.Length    Sepal.Width     Petal.Length    Petal.Width   
 Min.   :4.300   Min.   :2.000   Min.   :1.000   Min.   :0.100  
 1st Qu.:5.100   1st Qu.:2.800   1st Qu.:1.600   1st Qu.:0.300  
 Median :5.800   Median :3.000   Median :4.350   Median :1.300  
 Mean   :5.843   Mean   :3.057   Mean   :3.758   Mean   :1.199  
 3rd Qu.:6.400   3rd Qu.:3.300   3rd Qu.:5.100   3rd Qu.:1.800  
 Max.   :7.900   Max.   :4.400   Max.   :6.900   Max.   :2.500  
       Species  
 setosa    :50  
 versicolor:50  
 virginica :50  
                
                
                

In [9]:
# Stanford course CS109: A Titanic Probability
# https://web.stanford.edu/class/archive/cs/cs109/cs109.1166/problem12.html
# reads in a copy of the Titanic survival data from a URL
X <- read.csv(url("https://web.stanford.edu/class/archive/cs/cs109/cs109.1166/stuff/titanic.csv"))

# convert Survived and Pclass to factors
X$Survived <- as.factor(X$Survived)
X$Pclass <- as.factor(X$Pclass)

# Creates a new dichotomous variable for Adult/Child
X$Child_cat[X$Age < 18] <- "Child"
X$Child_cat[X$Age >=18] <- "Adult"

head(X,10)
summary(X)

Survived,Pclass,Name,Sex,Age,Siblings.Spouses.Aboard,Parents.Children.Aboard,Fare,Child_cat
0,3,Mr. Owen Harris Braund,male,22,1,0,7.25,Adult
1,1,Mrs. John Bradley (Florence Briggs Thayer) Cumings,female,38,1,0,71.2833,Adult
1,3,Miss. Laina Heikkinen,female,26,0,0,7.925,Adult
1,1,Mrs. Jacques Heath (Lily May Peel) Futrelle,female,35,1,0,53.1,Adult
0,3,Mr. William Henry Allen,male,35,0,0,8.05,Adult
0,3,Mr. James Moran,male,27,0,0,8.4583,Adult
0,1,Mr. Timothy J McCarthy,male,54,0,0,51.8625,Adult
0,3,Master. Gosta Leonard Palsson,male,2,3,1,21.075,Child
1,3,Mrs. Oscar W (Elisabeth Vilhelmina Berg) Johnson,female,27,0,2,11.1333,Adult
1,2,Mrs. Nicholas (Adele Achem) Nasser,female,14,1,0,30.0708,Child


 Survived Pclass                                  Name         Sex     
 0:545    1:216   Capt. Edward Gifford Crosby       :  1   female:314  
 1:342    2:184   Col. John Weir                    :  1   male  :573  
          3:487   Col. Oberst Alfons Simonius-Blumer:  1               
                  Don. Manuel E Uruchurtu           :  1               
                  Dr. Alfred Pain                   :  1               
                  Dr. Alice (Farnham) Leader        :  1               
                  (Other)                           :881               
      Age        Siblings.Spouses.Aboard Parents.Children.Aboard
 Min.   : 0.42   Min.   :0.0000          Min.   :0.0000         
 1st Qu.:20.25   1st Qu.:0.0000          1st Qu.:0.0000         
 Median :28.00   Median :0.0000          Median :0.0000         
 Mean   :29.47   Mean   :0.5254          Mean   :0.3833         
 3rd Qu.:38.00   3rd Qu.:1.0000          3rd Qu.:0.0000         
 Max.   :80.00   Max.   :8.0000   
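
The Adult/Child recode above can also be written in a single step with ifelse(), which is closer in spirit to an SPSS RECODE. The sketch below runs on a small hypothetical data frame (`toy`, with made-up ages) rather than the downloaded CSV, and includes a missing age to show how it is carried through.

```r
# Hypothetical toy data standing in for the Titanic CSV (ages are made up)
toy <- data.frame(Age = c(2, 14, 18, 35, NA))

# One-step recode; a missing Age stays missing in the new variable
toy$Child_cat <- factor(ifelse(toy$Age < 18, "Child", "Adult"),
                        levels = c("Child", "Adult"))

# Frequency table including the missing category
print(table(toy$Child_cat, useNA = "ifany"))
```

Storing the result as a factor means summary() reports it as counts per category rather than as a character column.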

In [7]:
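# install.packages("gmodels")   # run once to install the package
# library(gmodels)              # CrossTable() cannot be found until gmodels is loaded
# CrossTable(workshop, gender, chisq = TRUE, format = "SAS")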



  There is a binary version available but the source version is later:
        binary   source needs_compilation
gmodels 2.18.1 2.18.1.1             FALSE



installing the source package 'gmodels'



ERROR: Error in CrossTable(workshop, gender, chisq = TRUE, format = "SAS"): could not find function "CrossTable"
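
The error above arises because CrossTable() lives in the gmodels package, which has to be loaded with library(gmodels) before the function can be found. As a hedged alternative that needs no extra package, a similar crosstab can be sketched with base R's table(), prop.table(), and chisq.test(). The example uses mtcars (transmission by cylinders) as a stand-in, since the workshop and gender variables are not defined in this notebook.

```r
# Counts of transmission type (am: 0 = automatic, 1 = manual) by cylinders
tab <- table(am = mtcars$am, cyl = mtcars$cyl)
print(tab)

# Row percentages, comparable to row percents in an SPSS crosstab
print(round(100 * prop.table(tab, margin = 1), 1))

# Chi-square test of independence; some expected counts are small here,
# so R would normally warn that the approximation may be inaccurate
chi <- suppressWarnings(chisq.test(tab))
print(chi)
```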


In [12]:
# simple crosstab table of counts
# Higher survival rates in children?
apply(Titanic, c(3, 4), sum)

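# NOTE: The Titanic dataset in R Base is not a passenger-level data frame. It is a
# 4-way contingency table (Class, Sex, Age, Survived) covering all 2,201 people
# aboard, crew included, so its counts differ from the 887-row CSV used above.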

Age,No,Yes
Child,52,57
Adult,1438,654
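
To answer the survival-rate question directly, the counts can be turned into row proportions with prop.table(). This is a minimal sketch on the same built-in Titanic table; the names `age_by_survived` and `rates` are illustrative.

```r
# Collapse the 4-way table to Age x Survived counts
age_by_survived <- apply(Titanic, c(3, 4), sum)

# Proportion surviving within each age group (rows sum to 1)
rates <- prop.table(age_by_survived, margin = 1)
print(round(rates, 3))
```

In this table, roughly 52% of children survived compared with roughly 31% of adults.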


In [None]:
## Grouped summaries of continuous variables  
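
The cell above was left empty. As one possible sketch of grouped summaries in base R, comparable to the SPSS MEANS procedure, aggregate() and tapply() compute a statistic of a continuous variable within each level of a grouping variable. The examples use the built-in mtcars and iris data; `by_cyl` and `by_species` are illustrative names.

```r
# Mean miles per gallon within each cylinder group (formula interface)
by_cyl <- aggregate(mpg ~ cyl, data = mtcars, FUN = mean)
print(by_cyl)

# Mean sepal length within each species, returned as a named vector
by_species <- tapply(iris$Sepal.Length, iris$Species, mean)
print(by_species)
```

aggregate() returns a data frame, which is convenient for further processing, while tapply() returns a compact named vector.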