# R Basics

These are my notes on the basics of R, a language well suited to data analysis. The examples use the following sample data:

* stateData.csv: information about all 50 US states.

## Table of Contents

* [Working Directory](#working-directory)
* [Vector](#vector)
* [DataFrame](#dataframe)

## Working Directory

Before starting a project, set the working directory so that relative file paths resolve correctly. `getwd()` returns the current working directory and `setwd()` changes it.

In [1]:
getwd()

In [2]:
setwd('..')
getwd()
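
A common pattern is to remember the old directory before changing it, so it can be restored later. The following is a minimal sketch of that idea; `tempdir()` is used only as an assumption-free stand-in for a directory that is known to exist, and the variable names are illustrative.

```r
# Remember the current directory so it can be restored afterwards.
old_dir <- getwd()

# tempdir() is used here only because it is guaranteed to exist.
setwd(tempdir())
new_dir <- getwd()

# Go back to where we started and confirm the location.
setwd(old_dir)
restored_dir <- getwd()
```

After this runs, `restored_dir` should match `old_dir`, so the rest of the session is unaffected by the temporary change.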

## Vector

A vector is one of the basic data structures in R. It resembles a Python list, but every element must have the same type, such as character (string), logical (`TRUE` or `FALSE`), or numeric.

`c()` is a generic function that combines arguments to form a vector.

In [10]:
number <- c(1:10)
number

You can also append values to a vector by combining it with the new values.

In [12]:
number <- c(number, 11:20)
number
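
Because a vector holds a single type, `c()` converts mixed arguments to the most general type among them. The sketch below illustrates that coercion and also shows `append()`, which can place new values at a chosen position; the variable names are made up for this example.

```r
# Mixing types in c() coerces every element to the most general type.
mixed <- c(1, "two", TRUE)
class(mixed)              # "character"

# Logical values become numbers when combined with numerics.
flags_and_numbers <- c(TRUE, FALSE, 3)
flags_and_numbers         # 1 0 3

# append() can insert values at a position other than the end.
number <- 1:10
with_zero <- append(number, 0L, after = 0)
length(with_zero)         # 11
```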

You can select elements of a vector with a logical condition, similar to boolean indexing on a pandas Series in Python.

In [13]:
name <- c('Amanda', 'Bob', 'Chris')

In [15]:
name_length <- nchar(name) # Number of characters in each name
name_length

In [16]:
name[name_length == 3]
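
It may help to look at the intermediate logical vector that drives the selection. This sketch reuses the `name` vector from above; `is_short`, `short_names`, `short_positions` and `middle_names` are names introduced only for illustration.

```r
name <- c('Amanda', 'Bob', 'Chris')
name_length <- nchar(name)

# The comparison yields one TRUE/FALSE per element.
is_short <- name_length <= 5
is_short                      # FALSE TRUE TRUE

# Indexing with the logical vector keeps only the TRUE positions.
short_names <- name[is_short]

# which() converts the logical vector into integer positions.
short_positions <- which(is_short)

# Conditions combine element-wise with & (and) and | (or).
middle_names <- name[name_length > 3 & name_length < 6]
```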

## DataFrame

A data frame is R's tabular data structure. It has rows and columns, much like a spreadsheet. An easy way to get a data frame is to read a .csv file into R. The notes below use the 'stateData.csv' file.

In [33]:
state_info = read.csv('stateData.csv')
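
To try `read.csv()` without the file on disk, the `text` argument accepts CSV content as a string. The sketch below uses three rows and four columns taken from the state data in this note. `csv_text` and `states` are illustrative names. `stringsAsFactors = FALSE` is set explicitly because the default differs between R versions (it became `FALSE` in R 4.0.0).

```r
# A few rows of the state data stand in for stateData.csv.
csv_text <- "X,state.abb,population,income
Alabama,AL,3615,3624
Alaska,AK,365,6315
Arizona,AZ,2212,4530"

# text= lets read.csv() parse a string instead of a file.
states <- read.csv(text = csv_text, stringsAsFactors = FALSE)

class(states$X)                      # "character"
states$income                        # one column, returned as a vector
rich <- states[states$income > 4000, "X"]   # rows chosen by a condition
```

With `stringsAsFactors = TRUE` the text columns would instead be read as factors, which is what the `str()` output later in this note shows.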

### Overview of DataFrame

In [34]:
state_info

X,state.abb,state.area,state.region,population,income,illiteracy,life.exp,murder,highSchoolGrad,frost,area
Alabama,AL,51609,2,3615,3624,2.1,69.05,15.1,41.3,20,50708
Alaska,AK,589757,4,365,6315,1.5,69.31,11.3,66.7,152,566432
Arizona,AZ,113909,4,2212,4530,1.8,70.55,7.8,58.1,15,113417
Arkansas,AR,53104,2,2110,3378,1.9,70.66,10.1,39.9,65,51945
California,CA,158693,4,21198,5114,1.1,71.71,10.3,62.6,20,156361
Colorado,CO,104247,4,2541,4884,0.7,72.06,6.8,63.9,166,103766
Connecticut,CT,5009,1,3100,5348,1.1,72.48,3.1,56.0,139,4862
Delaware,DE,2057,2,579,4809,0.9,70.06,6.2,54.6,103,1982
Florida,FL,58560,2,8277,4815,1.3,70.66,10.7,52.6,11,54090
Georgia,GA,58876,2,4931,4091,2.0,68.54,13.9,40.6,60,58073


In [35]:
head(state_info, 10) # Default is 6

X,state.abb,state.area,state.region,population,income,illiteracy,life.exp,murder,highSchoolGrad,frost,area
Alabama,AL,51609,2,3615,3624,2.1,69.05,15.1,41.3,20,50708
Alaska,AK,589757,4,365,6315,1.5,69.31,11.3,66.7,152,566432
Arizona,AZ,113909,4,2212,4530,1.8,70.55,7.8,58.1,15,113417
Arkansas,AR,53104,2,2110,3378,1.9,70.66,10.1,39.9,65,51945
California,CA,158693,4,21198,5114,1.1,71.71,10.3,62.6,20,156361
Colorado,CO,104247,4,2541,4884,0.7,72.06,6.8,63.9,166,103766
Connecticut,CT,5009,1,3100,5348,1.1,72.48,3.1,56.0,139,4862
Delaware,DE,2057,2,579,4809,0.9,70.06,6.2,54.6,103,1982
Florida,FL,58560,2,8277,4815,1.3,70.66,10.7,52.6,11,54090
Georgia,GA,58876,2,4931,4091,2.0,68.54,13.9,40.6,60,58073


In [36]:
tail(state_info) # Default is 6

Unnamed: 0,X,state.abb,state.area,state.region,population,income,illiteracy,life.exp,murder,highSchoolGrad,frost,area
45,Vermont,VT,9609,1,472,3907,0.6,71.64,5.5,57.1,168,9267
46,Virginia,VA,40815,2,4981,4701,1.4,70.08,9.5,47.8,85,39780
47,Washington,WA,68192,4,3559,4864,0.6,71.72,4.3,63.5,32,66570
48,West Virginia,WV,24181,2,1799,3617,1.4,69.48,6.7,41.6,100,24070
49,Wisconsin,WI,56154,3,4589,4468,0.7,72.48,3.0,54.5,149,54464
50,Wyoming,WY,97914,4,376,4566,0.6,70.29,6.9,62.9,173,97203


In [37]:
names(state_info)

In [38]:
str(state_info) # Structure

'data.frame':	50 obs. of  12 variables:
 $ X             : Factor w/ 50 levels "Alabama","Alaska",..: 1 2 3 4 5 6 7 8 9 10 ...
 $ state.abb     : Factor w/ 50 levels "AK","AL","AR",..: 2 1 4 3 5 6 7 8 9 10 ...
 $ state.area    : int  51609 589757 113909 53104 158693 104247 5009 2057 58560 58876 ...
 $ state.region  : int  2 4 4 2 4 4 1 2 2 2 ...
 $ population    : int  3615 365 2212 2110 21198 2541 3100 579 8277 4931 ...
 $ income        : int  3624 6315 4530 3378 5114 4884 5348 4809 4815 4091 ...
 $ illiteracy    : num  2.1 1.5 1.8 1.9 1.1 0.7 1.1 0.9 1.3 2 ...
 $ life.exp      : num  69 69.3 70.5 70.7 71.7 ...
 $ murder        : num  15.1 11.3 7.8 10.1 10.3 6.8 3.1 6.2 10.7 13.9 ...
 $ highSchoolGrad: num  41.3 66.7 58.1 39.9 62.6 63.9 56 54.6 52.6 40.6 ...
 $ frost         : int  20 152 15 65 20 166 139 103 11 60 ...
 $ area          : int  50708 566432 113417 51945 156361 103766 4862 1982 54090 58073 ...


In [39]:
dim(state_info) # Dimension

In [40]:
row.names(state_info)

In [None]:
row.names(state_info) <- NULL # Reset row names to the default 1, 2, 3, ...
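
The effect of resetting row names is easier to see on a small data frame. This is a sketch with a hypothetical `df` that borrows three income values from the state data; it is not part of the original notebook.

```r
# A small data frame with custom row names.
df <- data.frame(income = c(3624, 6315, 4530),
                 row.names = c("Alabama", "Alaska", "Arizona"))
row.names(df)                 # "Alabama" "Alaska" "Arizona"

# Assigning NULL drops the custom names; R falls back to 1, 2, 3, ...
row.names(df) <- NULL
row.names(df)                 # "1" "2" "3"
```

Note that `c()` with no arguments evaluates to `NULL`, so assigning `c()` has the same effect.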