<img src="materials/images/introduction-to-r-programming-cover.png"/>


# 👋 Welcome, before you start
<br>

### 📚 Module overview

R is a statistical programming language that is very effective for computation and high-level graphics. It is commonly used for data analytics and data science.

We are going through five lessons in this module:

- [**Lesson 1: R Basic Data Types**](Lesson_1_R_Basic_Data_Types.ipynb)

- [**Lesson 2: R Data Structures**](Lesson_2_R_Data_Structures.ipynb)

- <font color=#E98300>**Lesson 3: Importing Data**</font>    `📍You are here.`

- [**Lesson 4: Conditionals and Loops**](Lesson_4_Conditionals_and_Loops.ipynb)

- [**Lesson 5: Functions**](Lesson_5_Functions.ipynb)

<br>

### ✅ Exercises
We encourage you to try the exercise questions in this module, and use the [**solutions to the exercises**](Exercise_solutions.ipynb) to help you study.

<br>

<div class="alert alert-block alert-info">
<h3>⌨️ Keyboard shortcuts</h3>

These common shortcuts can save you time as you work through this notebook:
- Run the current cell: **`Shift + Enter`**.
- Add a cell above the current cell: Press **`A`**.
- Add a cell below the current cell: Press **`B`**.
- Change a code cell to a markdown cell: Select the cell, and then press **`M`**.
- Delete a cell: Press **`D`** twice.

Need more help with keyboard shortcuts? Press **`H`** to look them up.
</div>

---

# Lesson 3: Importing Data

Data can be stored in many file formats, so R provides parser functions that read a file and import its contents as a data frame.


`🕒 This lesson should take about 10 minutes to complete.`

`✍️ This notebook is written using R.`

### CSV file
Comma-separated values (CSV) is a common format for tabular data. To import a CSV file, pass the file path to read.csv(). The dataset is imported as a data frame.

For example:

In [2]:
df <- read.csv("data/heart_disease.csv")

<div class="alert alert-block alert-success">
<b>Note:</b> Comma Separated Values (CSV) is one of the most common file formats for storing data, and CSV files are supported by nearly all data upload interfaces.
</div>
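
If `data/heart_disease.csv` is not at hand, the same call can be tried on a small file created on the fly. The sketch below is only an illustration: the file contents and the names `csv_path` and `toy_df` are made up for this example and are not part of the lesson's dataset.

```r
# Hypothetical toy data written to a temporary file so the example runs anywhere.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("age,sex,chol",
             "63,Female,233",
             "37,Male,250",
             "41,Male,204"),
           csv_path)

# read.csv() treats the first line as column names by default (header = TRUE).
toy_df <- read.csv(csv_path)

class(toy_df)   # "data.frame"
dim(toy_df)     # 3 rows and 3 columns
names(toy_df)   # "age" "sex" "chol"
str(toy_df)     # shows the type R inferred for each column
```

Checking `dim()` and `str()` right after an import is a quick way to confirm that the file was parsed as expected.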

#### head()
The head() function displays the first six rows (by default):

In [3]:
head(df)

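| | age | sex | chest_pain | rest_bp | chol | max_hr | st_depr | target |
|---|---|---|---|---|---|---|---|---|
| | `<int>` | `<chr>` | `<int>` | `<int>` | `<int>` | `<int>` | `<dbl>` | `<chr>` |
| 1 | 63 | Female | 3 | 145 | 233 | 150 | 2.3 | Yes |
| 2 | 37 | Female | 2 | 130 | 250 | 187 | 3.5 | Yes |
| 3 | 41 | Male | 1 | 130 | 204 | 172 | 1.4 | Yes |
| 4 | 56 | Female | 1 | 120 | 236 | 178 | 0.8 | Yes |
| 5 | 57 | Male | 0 | 120 | 354 | 163 | 0.6 | Yes |
| 6 | 57 | Female | 0 | 140 | 192 | 148 | 0.4 | Yes |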


In [5]:
# Display the first 15 rows:

head(df, 15)

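| | age | sex | chest_pain | rest_bp | chol | max_hr | st_depr | target |
|---|---|---|---|---|---|---|---|---|
| | `<int>` | `<chr>` | `<int>` | `<int>` | `<int>` | `<int>` | `<dbl>` | `<chr>` |
| 1 | 63 | Female | 3 | 145 | 233 | 150 | 2.3 | Yes |
| 2 | 37 | Female | 2 | 130 | 250 | 187 | 3.5 | Yes |
| 3 | 41 | Male | 1 | 130 | 204 | 172 | 1.4 | Yes |
| 4 | 56 | Female | 1 | 120 | 236 | 178 | 0.8 | Yes |
| 5 | 57 | Male | 0 | 120 | 354 | 163 | 0.6 | Yes |
| 6 | 57 | Female | 0 | 140 | 192 | 148 | 0.4 | Yes |
| 7 | 56 | Male | 1 | 140 | 294 | 153 | 1.3 | Yes |
| 8 | 44 | Female | 1 | 120 | 263 | 173 | 0.0 | Yes |
| 9 | 52 | Female | 2 | 172 | 199 | 162 | 0.5 | Yes |
| 10 | 57 | Female | 2 | 150 | 168 | 174 | 1.6 | Yes |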


#### tail()
The tail() function displays the last six rows (by default):

In [4]:
tail(df)

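| | age | sex | chest_pain | rest_bp | chol | max_hr | st_depr | target |
|---|---|---|---|---|---|---|---|---|
| | `<int>` | `<chr>` | `<int>` | `<int>` | `<int>` | `<int>` | `<dbl>` | `<chr>` |
| 298 | 59 | Female | 0 | 164 | 176 | 90 | 1.0 | No |
| 299 | 57 | Male | 0 | 140 | 241 | 123 | 0.2 | No |
| 300 | 45 | Female | 3 | 110 | 264 | 132 | 1.2 | No |
| 301 | 68 | Female | 0 | 144 | 193 | 141 | 3.4 | No |
| 302 | 57 | Female | 0 | 130 | 131 | 115 | 1.2 | No |
| 303 | 57 | Male | 1 | 130 | 236 | 174 | 0.0 | No |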
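
Like head(), tail() accepts a second argument that sets the number of rows. The following is a minimal sketch on a made-up data frame (`toy_df` is hypothetical, not the heart disease data), which also shows that a negative count drops rows from the start instead.

```r
# Hypothetical toy data frame with 10 rows.
toy_df <- data.frame(id = 1:10, value = (1:10)^2)

# Last three rows; the original row numbers (8, 9, 10) are kept.
last_three <- tail(toy_df, 3)
last_three

# A negative n means "all but the first |n| rows": here, rows 9 and 10.
tail(toy_df, -8)
```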


#### unique()
The unique() function will display the unique values within a column:

In [6]:
unique(df$sex)
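
To see the behaviour in isolation, here is a small sketch on a hand-made character vector (the vector `sex` below is illustrative and not taken from the dataset). Values are returned in the order in which they first appear.

```r
# Hypothetical vector with repeated values.
sex <- c("Female", "Male", "Female", "Female", "Male")

unique(sex)          # "Female" "Male"
length(unique(sex))  # 2, the number of distinct values
```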

#### table()
The table() function will display the count of each unique value within a column:

In [7]:
# Count the frequency of each category in a column/vector:

table(df$sex)


Female   Male 
   207     96 

<div class="alert alert-block alert-success">
<b>Keep in mind:</b> Each column extracted with "$" is a vector, so any function that can be applied to a vector can also be applied to a column (e.g., table(), length(), mean(), sort()).
</div>
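
As a rough illustration of the note above, the sketch below applies several vector functions to columns of a small made-up data frame (`toy_df` and `ages` are hypothetical names used only here).

```r
# Hypothetical toy data frame.
toy_df <- data.frame(
  age = c(63, 37, 41, 56),
  sex = c("Female", "Female", "Male", "Female")
)

ages <- toy_df$age   # "$" extracts the column as a plain vector
is.vector(ages)      # TRUE
length(ages)         # 4
mean(ages)           # 49.25
sort(ages)           # 37 41 56 63
table(toy_df$sex)    # Female: 3, Male: 1
```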

### Table format

Another way to import data is read.table(), which reads a file in table format and creates a data frame from it. Its `sep` parameter specifies how the values are separated (e.g., sep="," for comma-separated values). Setting the `header` parameter to TRUE indicates that the first row contains the names of the columns.

For example:

In [8]:
df_table <- read.table("data/heart_disease.csv", sep=",", header=TRUE)

In [9]:
# Preview the data imported via read.table():

head(df_table)

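| | age | sex | chest_pain | rest_bp | chol | max_hr | st_depr | target |
|---|---|---|---|---|---|---|---|---|
| | `<int>` | `<chr>` | `<int>` | `<int>` | `<int>` | `<int>` | `<dbl>` | `<chr>` |
| 1 | 63 | Female | 3 | 145 | 233 | 150 | 2.3 | Yes |
| 2 | 37 | Female | 2 | 130 | 250 | 187 | 3.5 | Yes |
| 3 | 41 | Male | 1 | 130 | 204 | 172 | 1.4 | Yes |
| 4 | 56 | Female | 1 | 120 | 236 | 178 | 0.8 | Yes |
| 5 | 57 | Male | 0 | 120 | 354 | 163 | 0.6 | Yes |
| 6 | 57 | Female | 0 | 140 | 192 | 148 | 0.4 | Yes |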
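
The `sep` and `header` parameters matter most when a file is not comma separated. The sketch below assumes a small tab-separated file created just for this example (the path `tsv_path` and its contents are hypothetical) and compares reading it with and without `header=TRUE`.

```r
# Hypothetical tab-separated file written to a temporary location.
tsv_path <- tempfile(fileext = ".tsv")
writeLines(c("age\tsex\tchol",
             "63\tFemale\t233",
             "37\tMale\t250"),
           tsv_path)

# sep = "\t" splits on tabs; header = TRUE uses the first line as column names.
toy_tsv <- read.table(tsv_path, sep = "\t", header = TRUE)
names(toy_tsv)    # "age" "sex" "chol"
nrow(toy_tsv)     # 2

# With header = FALSE the first line is read as data,
# and the columns receive the default names V1, V2, V3.
no_header <- read.table(tsv_path, sep = "\t", header = FALSE)
names(no_header)  # "V1" "V2" "V3"
nrow(no_header)   # 3
```

Forgetting `header=TRUE` is easy to spot this way: the column names appear as the first data row, and every column is read as text.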


---

# 🌟 Ready for the next one?
<br>

- [**Lesson 4: Conditionals and Loops**](Lesson_4_Conditionals_and_Loops.ipynb)

- [**Lesson 5: Functions**](Lesson_5_Functions.ipynb)

---

# Contributions & acknowledgment

Thanks to Antony Ross for contributing the content for this notebook.

---

Copyright (c) 2022 Stanford Data Ocean (SDO)

All rights reserved.