# Getting Started with R

In our first 'real' week of the course we provide an introduction to R, structured as follows:

- <a href='#What is R'>What is R</a>
- <a href='#Why R'>Why R?</a>
- <a href='#Downloading and installing R'>Downloading and installing R</a>
- <a href='#Setting up your environment'>Setting up your environment</a>
- <a href='#Elementary Commands'>Elementary Commands</a>
- <a href='#Object assignment'>Object assignment</a>
- <a href='#Data types'>Data types</a>
- <a href='#Data structures'>Data structures</a>
- <a href='#Loops'>Loops</a>
- <a href='#Importing files'>Importing files</a>
- <a href='#Importing packages'>Importing packages</a>
- <a href='#Graphing and visualisation'>Graphing and visualisation</a>




### What is R <a id='What is R'></a>

R is a programming language created by **R**obert Gentleman and **R**oss Ihaka (get it? R?) at the Department of Statistics at the University of Auckland in New Zealand. Work on it began in the early 1990s and it became free software in 1995. R is a statistical language in that it was designed with statistical applications in mind, and as a result it has a myriad of built-in statistical features that make statistical analysis easy and intuitive (next term you will see that Python is even better on this front). It is freely available and is one of the most popular statistical languages in the world.

### Why R <a id='Why R'></a>



It's an important question! It would probably have been pretty easy to offer this course entirely in Python, a reasonable choice considering that Python is also free, even more intuitive than R and currently the world's 3rd most popular programming language (R is 20th). R's huge advantage for applications like hypothesis testing, plotting statistical distributions, regression, matrix algebra and many other mathematical tasks is that it is **purpose-built** to deal with them. As we will soon see, R's built-in functions mean that as soon as you load your environment (we'll get to what an environment is soon, don't worry), R is ready to go for statistical analysis. In Python, by contrast, we have to download and install several extra packages before we can do statistics, and once we have, the commands (the code we need to write) are longer because we have to refer to these packages every time we want to use them.
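
To give a flavour of what 'ready to go' means, here is a minimal sketch that runs in a fresh R session without loading any packages by hand. The numbers are made up purely for illustration and the names `x` and `test` are ours; don't worry about the syntax yet.

```r
# Hypothetical sample of five measurements (made-up data)
x <- c(2.1, 3.4, 1.9, 5.0, 4.2)

mean(x)    # sample mean: 3.32
sd(x)      # sample standard deviation
median(x)  # middle value: 3.4

# One-sample t-test of the hypothesis that the true mean is 3
test <- t.test(x, mu = 3)
test$p.value
```

Everything used here (`mean()`, `sd()`, `median()` and `t.test()`) ships with R itself.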

As for other statistical languages like SAS, SPSS and Stata: **R is free**! Of course, being at LSE means we get access to lots of paid software, and most workplaces do too. However, the best thing about R being free is that, while the paid packages you have access to at any point in your working or academic life are liable to change, R will always be an option! That's not to discourage you from learning Stata or any other paid language; it's just that R is a pretty natural choice for an open-source course about using code to do statistics, maths, economics and metrics.

### Downloading and installing R <a id='Downloading and installing R'></a>



R is available for download at The R Project's website: https://www.r-project.org. It must be downloaded through 'mirrors', which are sites hosted by a range of universities. The mirror you choose doesn't matter; each will give you the same version of R. Once you download the relevant version for your operating system you'll be prompted to install it as you'd install any other program on your computer.

Once you've installed R, open the terminal application (pre-installed as Terminal on Mac and on most Linux distributions such as Ubuntu, and as Command Prompt, cmd.exe, on Windows) and type the capital letter `R` to check that R has installed correctly. On Windows, if the terminal doesn't recognise the command, you may need to open R from the Start menu instead. If it has installed correctly, you should see a short preamble about the version of R you're using and then the command prompt '>', which is where you can type your first command:

In [1]:
print('Hello World')

[1] "Hello World"


It's a bit of a cheesy first command, but it's essentially a computer science rite of passage that 'Hello World' is the first one you ever write. Another good starting command to try is:

In [2]:
1+1
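
R works as a calculator for the other basic operations too. A few more lines to try at the prompt, with the values they should return noted in the comments:

```r
7 - 2        # subtraction: 5
3 * 4        # multiplication: 12
10 / 4       # division: 2.5
2^3          # powers: 8
(1 + 2) * 3  # brackets work as in ordinary maths: 9
```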

If you want to try a 'cool' one then try: 

In [3]:
for (i in 1:1000) print(i)

[1] 1
[1] 2
[1] 3
...
[1] 1000

This prints the numbers 1, 2, 3, ..., 1000 in a fraction of a second! It is called a 'for loop', which we cover in more detail in <a href='#Loops'>Loops</a>.
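
The same kind of loop can also be written across several lines with curly braces, which is the layout you will see most often. A minimal sketch, using a shorter range of 1 to 5 (chosen just to keep the output small) and doubling each number before printing it:

```r
# i takes the values 1, 2, 3, 4, 5 in turn
for (i in 1:5) {
  # everything between the curly braces runs once for each value of i
  print(i * 2)
}
```

This should print 2, 4, 6, 8 and 10, each on its own line.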

That's all we'll cover in the way of commands for now, but in the rest of this notebook we'll give you lots more to try out. The important thing is that you can write commands in your terminal and that these give the expected output. Installation can sometimes be a bit of a hassle, as can the next section, <a href='#Setting up your environment'>Setting up your environment</a>, but once you get through these two stages there is huge potential for what we can do next. If your installation hasn't worked, check out the relevant online guides for installing R and running it in the terminal.

### Setting up your environment <a id='Setting up your environment'></a>



If you're looking at your first few commands and their output (admit it, you tried to make R print the numbers between 1 and 100,000,000,000,000 and it broke) then you may notice that R is kind of annoying and ugly. Or at least, it's not like the R we're seeing here. For example, it's probably all black and white, you can't click on any part of your code to edit it, and those little '>' prompts aren't particularly appealing. Don't worry, that's because we haven't set up our environment yet. What we need is the Jupyter Notebook, the development environment we're writing this notebook in.

We've used the word 'environment' a lot without explaining what it means. Essentially, the environment we code in is just the program(s) we use to run and edit our code. In the previous section your environment was just the terminal. If you had the idea to write your code in a text file and then run it in the terminal, then the text editor you used would also be part of your environment. Jupyter notebooks are web applications that make for a really user-friendly environment. They run in your web browser (Safari, Explorer, Firefox) and, as mentioned, they are the environment that this entire course was written in! Oh, and you're reading one right now. Jupyter is so called because it began as a way to support the languages Julia, Python and R on the same platform. It can now support many more languages, including Ruby, Java, C and more.

Before you ask: web application doesn't mean we will always need an internet connection. Once we've downloaded the Jupyter Notebook software we can run it in our browser even without an internet connection. It might seem strange at first to be on Safari while everyone else in your flat/house/halls is complaining the wifi is broken, but that's just one of the many cool things about being able to code.

To install the Jupyter Notebook software we also need to have Python installed. Python often comes pre-installed on Mac and Linux (both 'Unix-like' systems, btw) but needs to be downloaded and installed on Windows. You can do this at https://www.python.org/downloads/ just like you did for R, but once again, if you're having trouble then please consult the internet's many troubleshooting guides. Once Python is installed, head to https://jupyter.org/install to install Jupyter Notebooks. The easiest way to do this is to download 'Anaconda', which is another environment, from https://www.anaconda.com. All you do then is click the 'Jupyter' app on the Anaconda welcome screen and you will be taken straight to Jupyter, open in your default browser! There are other ways to install Jupyter, but we'll leave those to you if you've got your own preferences.
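
One extra step may be needed before R shows up as an option in Jupyter: Jupyter talks to R through an R package called IRkernel. The sketch below assumes R is already installed and only checks whether IRkernel is present, then tells you what to run next; the variable name `has_irkernel` is ours. Run it at the R prompt in your terminal.

```r
# Check whether the IRkernel package (the R kernel for Jupyter) is installed.
# requireNamespace() returns TRUE or FALSE without changing anything.
has_irkernel <- requireNamespace("IRkernel", quietly = TRUE)

if (has_irkernel) {
  message("IRkernel found: run IRkernel::installspec() once to register R with Jupyter.")
} else {
  message("IRkernel not found: run install.packages('IRkernel'), then IRkernel::installspec().")
}
```

If your Jupyter installation already offers an R notebook, you can skip this step.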

Once you've opened Jupyter you'll be shown your 'User' folder, which will itself be a collection of folders. Now is a good time to create your own 'Hands-on Econ' folder, which you can do using the 'New' button at the top right corner. Once you're in your new folder, click 'New' again and create an R notebook. You should see an essentially blank webpage containing just a single empty cell, labelled `In [ ]:`.

If so then congrats! You're in! This is the environment we'll use for the rest of this course, both for the rest of this term and for all of next term too. Try out the commands from before to make sure everything works as it did in the terminal; if it does then well done, you're ready to learn how to code. The last important thing is to name the file you're using by clicking the 'Untitled' text at the top of your browser and naming the file 'MT Week 2'.

### Data types<a id='Data types'></a>

We've already seen the `print()` and `+` commands, but what about `'Hello World'` and `1`? These are examples of 2 of R's 4 main data types: character, numeric, logical and integer. To find out the type of an object we use the `class()` function:


In [4]:
class('Hello World')

In [5]:
class(1)

Character is the simplest: it is just anything surrounded by quotation marks. So `'Hello World'`, `'Hello Worlds'`, `'1'` and `'qfln1f19p4hr17ofi'` are all characters. Numerics are also pretty simple: they are just any number, positive or negative, whole or not. Logical may sound strange but it is just the values `TRUE` and `FALSE`:

In [6]:
class(TRUE)

It's important to note that `TRUE` and `FALSE` *don't* have quotation marks around them. If they did they'd be characters like anything else:

In [7]:
class('TRUE')
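
The three types covered so far can be compared side by side. This is a minimal sketch using only functions that come with R; besides `class()` it uses the `is.character()`, `is.numeric()` and `is.logical()` helpers, which answer `TRUE` or `FALSE`.

```r
# Character: anything in quotation marks, even if it looks like a number
class('1')             # "character"
is.character('TRUE')   # TRUE

# Numeric: any number, whole or not
class(-2.5)            # "numeric"
is.numeric(1)          # TRUE

# Logical: TRUE and FALSE, which also come out of comparisons
class(3 > 2)           # "logical"
is.logical('TRUE')     # FALSE, because the quotation marks make it a character
```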

The last (main) data type is 'integer', which may sound simple (the integers are just the positive and negative whole numbers, so -32, 4 and 0 are all integers) but when we check:

In [8]:
class(-32)
class(4)
class(0)

We get numeric, not integer. The key is that we have to *tell* R that these are integers by adding `L` to the end, so:

In [9]:
class(-32L)
class(4L)
class(0L)

Are all integers.
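
A few further checks show how integers and numerics interact. This is a sketch using only functions that come with R; `as.integer()` and `is.integer()` are not used elsewhere in this notebook.

```r
class(4L)              # "integer"
class(as.integer(4))   # "integer": as.integer() converts a numeric to an integer
is.integer(4)          # FALSE: without the L, 4 is stored as a numeric
is.numeric(4L)         # TRUE: integers still count as numbers

class(4L + 1L)         # "integer": integer plus integer stays an integer
class(4L + 1)          # "numeric": mixing in a numeric gives a numeric
class(4L / 2L)         # "numeric": division always gives a numeric
```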

### Object assignment<a id='Object assignment'></a>



In [10]:
print('TRUE')

[1] "TRUE"


### Elementary Commands<a id='Elementary Commands'></a>



`print()` will simply display whatever we type between the brackets. This might sound quite intuitive, but be careful: if we want to print text exactly as written then we need to surround it with quotation marks, either `''` or `""`. If we tried to use the command:

```r
print(Hello World)
```

then we would get an error because R doesn't know what `Hello World` means. When we put quotation marks around it we make `Hello World` into the 'character' `'Hello World'`.
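
The difference can be seen safely with `try()`, which catches an error instead of stopping. This is a minimal sketch: the name `attempt` is ours, it assumes no object called `Hello` exists in your session, and it uses the single word `Hello` because the two-word version fails even earlier, before R gets as far as looking anything up.

```r
print('Hello World')   # single quotes
print("Hello World")   # double quotes give exactly the same result

# Without quotes R looks for an object called Hello and cannot find one.
# try() catches the error so that the rest of the code keeps running.
attempt <- try(print(Hello), silent = TRUE)
class(attempt)         # "try-error"
```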

### Data structures <a id='Data structures'></a>



### Loops <a id='Loops'></a>



### Importing files <a id='Importing files'></a>



### Importing packages <a id='Importing packages'></a>

### Graphing and visualisation <a id='Graphing and visualisation'></a>