<h1 align=center>Import Data into R</h1>

Data can exist in many formats, and R provides a dedicated function, with its own arguments, for each of them. This tutorial explains how to import data into R.

1. Read CSV
2. Read Excel files
3. readxl_example()
4. read_excel()
5. excel_sheets()
6. Import data from other Statistical software
7. Read sas
8. Read STATA
9. Read SPSS

#### Read CSV

One of the most widely used data storage formats is the .csv (comma-separated values) file. R loads a set of packages at start-up, including the utils package, which provides the read.csv() function for opening csv files. The basic syntax is `read.csv(file, header = TRUE, sep = ",")`.


### Arguments:

1. **file**: path to the file
2. **header**: whether the first line of the file holds the column names; defaults to TRUE
3. **sep**: the character that separates the values on each line; defaults to `,`
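
To make the effect of `header` and `sep` concrete, here is a minimal sketch that writes a small semicolon-separated file to a temporary location and reads it back. The file contents and the names `tmp`, `scores` and `raw` are made up for illustration and are not part of the tutorial's data.

```r
# Write a tiny semicolon-separated file (hypothetical data) to a temp location
tmp <- tempfile(fileext = ".csv")
writeLines(c("name;score", "a;1.5", "b;2.5"), tmp)

# header = TRUE: the first line supplies the column names
# sep = ";": values are split on semicolons instead of commas
scores <- read.csv(tmp, header = TRUE, sep = ";")
str(scores)

# header = FALSE: the first line is read as data and the
# columns receive the default names V1, V2
raw <- read.csv(tmp, header = FALSE, sep = ";")
names(raw)
```

With `header = FALSE` the data frame has one extra row, because the line holding the column names is kept as data.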

In [7]:
PATH <- 'Data/mtcars.csv'
df <- read.csv(PATH, header =  TRUE, sep = ',')
head(df)

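| X | mpg | cyl | disp | hp | drat | wt | qsec | vs | am | gear | carb |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Mazda RX4 | 21.0 | 6 | 160 | 110 | 3.9 | 2.62 | 16.46 | 0 | 1 | 4 | 4 |
| Mazda RX4 Wag | 21.0 | 6 | 160 | 110 | 3.9 | 2.875 | 17.02 | 0 | 1 | 4 | 4 |
| Datsun 710 | 22.8 | 4 | 108 | 93 | 3.85 | 2.32 | 18.61 | 1 | 1 | 4 | 1 |
| Hornet 4 Drive | 21.4 | 6 | 258 | 110 | 3.08 | 3.215 | 19.44 | 1 | 0 | 3 | 1 |
| Hornet Sportabout | 18.7 | 8 | 360 | 175 | 3.15 | 3.44 | 17.02 | 0 | 0 | 3 | 2 |
| Valiant | 18.1 | 6 | 225 | 105 | 2.76 | 3.46 | 20.22 | 1 | 0 | 3 | 1 |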


In [9]:
tail(df)

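|  | X | mpg | cyl | disp | hp | drat | wt | qsec | vs | am | gear | carb |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 27 | Porsche 914-2 | 26.0 | 4 | 120.3 | 91 | 4.43 | 2.14 | 16.7 | 0 | 1 | 5 | 2 |
| 28 | Lotus Europa | 30.4 | 4 | 95.1 | 113 | 3.77 | 1.513 | 16.9 | 1 | 1 | 5 | 2 |
| 29 | Ford Pantera L | 15.8 | 8 | 351.0 | 264 | 4.22 | 3.17 | 14.5 | 0 | 1 | 5 | 4 |
| 30 | Ferrari Dino | 19.7 | 6 | 145.0 | 175 | 3.62 | 2.77 | 15.5 | 0 | 1 | 5 | 6 |
| 31 | Maserati Bora | 15.0 | 8 | 301.0 | 335 | 3.54 | 3.57 | 14.6 | 0 | 1 | 5 | 8 |
| 32 | Volvo 142E | 21.4 | 4 | 121.0 | 109 | 4.11 | 2.78 | 18.6 | 1 | 1 | 4 | 2 |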


In [11]:
length(df)

In [13]:
class(df$X)
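
The result of `class(df$X)` depends on how read.csv() treats text columns. Since R 4.0.0, strings stay as character vectors by default, while earlier versions converted them to factors. The sketch below sets `stringsAsFactors` explicitly so the behaviour does not depend on the R version. It uses a two-row temporary file built from values in the table above, and the names `tmp`, `as_text` and `as_factor` are illustrative.

```r
# Two rows copied from the mtcars output above, written to a temp file
tmp <- tempfile(fileext = ".csv")
writeLines(c("X,mpg", "Mazda RX4,21.0", "Datsun 710,22.8"), tmp)

# Keep text columns as character vectors
as_text <- read.csv(tmp, stringsAsFactors = FALSE)
class(as_text$X)

# Convert text columns to factors
as_factor <- read.csv(tmp, stringsAsFactors = TRUE)
class(as_factor$X)

# length() of a data frame counts its columns, not its rows
length(as_text)
```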

#### Read Excel files
Excel files are very popular among data analysts because spreadsheets are flexible and easy to work with. The readxl package imports Excel spreadsheets into R.

In [14]:
require(readxl)

Loading required package: readxl
"package 'readxl' was built under R version 3.5.3"

The require(readxl) call above checks whether readxl is installed on your machine. If you installed R with r-conda-essential, the library is already installed. Loading it with library() should print the following in the command window:

In [4]:
library(readxl)

"package 'readxl' was built under R version 3.5.3"

If the package does not exist, you can install it with conda by running `conda install -c mittner r-readxl` in the terminal.
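
As an alternative to conda, the package can also be installed from within R. The following is a minimal sketch, assuming a CRAN mirror is reachable from your machine. It installs readxl only when it is missing and then loads it.

```r
# Install readxl from CRAN only if it is not already available
if (!requireNamespace("readxl", quietly = TRUE)) {
  install.packages("readxl")
}

# Attach the package so its functions can be called directly
library(readxl)
```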

Once the package is installed, `library(readxl)` loads it so that its functions for importing Excel files are available.

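#### readxl_example()
This tutorial uses the example files bundled with the readxl package. Calling readxl_example() with no argument lists the bundled files, and passing a file name returns the full path to that file.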

In [5]:
readxl_example()

In [6]:
readxl_example("geometry.xls")
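
The two calls above can be combined into a quick check before importing. This is a small sketch, with `all_examples` and `path` as illustrative variable names, that confirms a bundled file is present and that the returned path points to a real file.

```r
library(readxl)

# No argument: character vector with the names of the bundled example files
all_examples <- readxl_example()
"geometry.xls" %in% all_examples

# With a file name: full path to that file inside the installed package
path <- readxl_example("geometry.xls")
file.exists(path)
basename(path)
```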

#### read_excel()
The function read_excel() opens files with either the xls or the xlsx extension.

In [7]:
# Store the path of `datasets.xlsx`
example <- readxl_example("datasets.xlsx")
# Import the spreadsheet
df <- read_excel(example)
# Count the number of columns
length(df)

#### excel_sheets()
An Excel workbook can contain several sheets. excel_sheets() returns their names, and the sheet argument of read_excel() selects one of them either by name or by position.

In [9]:
example <- readxl_example("datasets.xlsx")
excel_sheets(example)

In [11]:
example <- readxl_example("datasets.xlsx")
quake <- read_excel(example, sheet = "quakes")
quake_1 <-read_excel(example, sheet = 4)
identical(quake, quake_1)
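
excel_sheets() and the sheet argument can also be combined to import every sheet of a workbook in one pass. The sketch below is one possible way to do this with lapply(). The names `sheet_names` and `all_sheets` are illustrative.

```r
library(readxl)

example <- readxl_example("datasets.xlsx")

# Names of all sheets in the workbook
sheet_names <- excel_sheets(example)

# Read each sheet by name and collect the results in a named list
all_sheets <- lapply(sheet_names, function(s) read_excel(example, sheet = s))
names(all_sheets) <- sheet_names

# Number of rows imported from each sheet
sapply(all_sheets, nrow)
```

A single sheet is then available by name, for example `all_sheets[["quakes"]]`.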

We can control which cells to read in two ways:

1. Use the n_max argument to return at most n rows
2. Use the range argument, optionally combined with cell_rows() or cell_cols()

For example, we set n_max equal to 5 to import the first five rows.

In [19]:
# Read the first five rows: with header
iris <- read_excel(example, n_max = 5, col_names = TRUE)

If we change col_names to FALSE, the first row of the sheet is read as data and the column names are generated automatically.

In [20]:
# Read the first five rows: without header
iris_no_header <- read_excel(example, n_max = 5, col_names = FALSE)

New names:
* `` -> ...1
* `` -> ...2
* `` -> ...3
* `` -> ...4
* `` -> ...5
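
To see what happens to the header line when col_names is FALSE, the following sketch reads the same sheet both ways and compares the results. The names `with_header` and `no_header` are illustrative.

```r
library(readxl)

example <- readxl_example("datasets.xlsx")

# First row of the sheet becomes the column names
with_header <- read_excel(example, n_max = 5, col_names = TRUE)

# First row of the sheet is kept as data; names are generated (...1, ...2, ...)
no_header <- read_excel(example, n_max = 5, col_names = FALSE)

# The first data row of no_header holds the original column names
first_row <- as.character(unlist(no_header[1, ]))
identical(first_row, names(with_header))
```

Because the header text ends up in the data, the columns of `no_header` are imported as text rather than numbers.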


We can also use the range argument to select rows and columns in the spreadsheet. In the code below, we use Excel-style notation to select the range A1 to B5.

In [18]:
# Read cells A1 to B5
example_1 <- read_excel(example, range = "A1:B5", col_names = TRUE)
dim(example_1)

In the second example, we use the function cell_rows(), which controls the range of rows to return. To import rows 1 to 5, we set cell_rows(1:5). Note that cell_rows(1:5) returns the same output as cell_rows(5:1).

In [22]:
# Read rows 1 to 5
example_2 <- read_excel(example, range = cell_rows(1:5), col_names = TRUE)
dim(example_2)
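
The claim that the order of the row numbers does not matter can be checked directly. This is a short sketch, with `forward` and `backward` as illustrative names, that reads the same rows in both orders and compares the results.

```r
library(readxl)

example <- readxl_example("datasets.xlsx")

# Rows 1 to 5, given in ascending and in descending order
forward  <- read_excel(example, range = cell_rows(1:5))
backward <- read_excel(example, range = cell_rows(5:1))

# Both calls select the same block of rows
identical(forward, backward)
dim(forward)
```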

In [24]:
iris_row_with_header <- read_excel(example, range = cell_rows(2:3), col_names = TRUE)
iris_row_no_header <- read_excel(example, range = cell_rows(2:3), col_names = FALSE)

New names:
* `` -> ...1
* `` -> ...2
* `` -> ...3
* `` -> ...4
* `` -> ...5


In [26]:
# We can select columns by letter, as in Excel.
# Select columns A and B
col <- read_excel(example, range = cell_cols("A:B"))
dim(col)
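
cell_cols() also accepts column numbers instead of letters. The sketch below selects columns A and B both ways and checks that the two imports match. The names `by_letter` and `by_number` are illustrative.

```r
library(readxl)

example <- readxl_example("datasets.xlsx")

# Columns A and B selected with Excel-style letters
by_letter <- read_excel(example, range = cell_cols("A:B"))

# The same two columns selected by position
by_number <- read_excel(example, range = cell_cols(1:2))

identical(by_letter, by_number)
ncol(by_letter)
```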