In [None]:
options(jupyter.rich_display = FALSE)

# Let's harvest some numbers of our own!
**by Serhat Çevikel**

For basic simulation, exercise and visualization purposes, it helps to have ways to create our own data series from scratch!

## Sequences with seq()

The colon operator `:` creates simple sequences that run from "from" to "to" in increments of 1. Suppose we want more complex sequences, for example with a different increment, with a given length, or with non-integer values. That is what `seq()` is for.

In [None]:
?seq
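A few variations on `seq()`, as a minimal sketch. The variable names are illustrative only and are not used elsewhere in this notebook:

```r
# Step by a fixed, non-integer increment
by_quarter <- seq(0, 1, by = 0.25)          # 0.00 0.25 0.50 0.75 1.00

# Ask for a number of evenly spaced values instead of a step size
five_points <- seq(0, 10, length.out = 5)   # 0.0 2.5 5.0 7.5 10.0

# A negative increment gives a decreasing sequence
countdown <- seq(10, 1, by = -3)            # 10 7 4 1

by_quarter
five_points
countdown
```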

Get the Fahrenheit equivalents of the Celsius values 0, 1, ..., 100 as a sequence (0 °C is 32 °F and 100 °C is 212 °F, so we need 101 evenly spaced values):

In [None]:
seq(32, 212, length.out = 101)
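To see why 101 evenly spaced values between 32 and 212 are the right answer, we can compare against the usual conversion formula F = C × 9/5 + 32. This is only a cross-check; `celsius` and `fahrenheit` are names introduced here for illustration:

```r
celsius <- seq(0, 100)                # 0, 1, ..., 100
fahrenheit <- celsius * 9 / 5 + 32    # standard Celsius-to-Fahrenheit formula

# Compare with the sequence built directly from the end points
all.equal(fahrenheit, seq(32, 212, length.out = 101))
```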

Get values from 1 to 11 in increments of 3:

In [None]:
seq(1, 11, 3)

**EXERCISE 1:**

Get the leap years in the 21st century. Two questions:

- Is 2000 a part of 21st century?
- Is 2100 a leap year?

**SOLUTION 1:**

In [None]:
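# 2000 belongs to the 20th century, and 2100 is not a leap year
# (divisible by 100 but not by 400), so the last leap year is 2096
seq(2004, 2096, 4)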
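The two questions in the exercise can also be answered by applying the Gregorian leap-year rule directly: a year is a leap year if it is divisible by 4 but not by 100, or if it is divisible by 400. The sketch below assumes the 21st century runs from 2001 to 2100; the variable names are illustrative:

```r
years <- 2001:2100   # assumption: the 21st century is 2001-2100

# Gregorian rule: divisible by 4 and not by 100, or divisible by 400
is_leap <- (years %% 4 == 0 & years %% 100 != 0) | (years %% 400 == 0)
leap_years <- years[is_leap]

range(leap_years)    # first and last leap year of the century
length(leap_years)   # how many there are
```

Under this rule 2100 is filtered out, because it is divisible by 100 but not by 400.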

## Uniform numeric values within a range

In [None]:
runif(10)

In [None]:
runif(20, -1, 1)

## Random values from a normal distribution

By default the mean is 0 and the sd is 1. Here we ask for 10 values with mean 10 and sd 10:

In [None]:
rnorm(10, 10, 10)
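With only 10 values it is hard to see what the mean and sd arguments do. As a rough check, we can draw many more values and look at the sample mean and sample standard deviation. The seed and the sample size below are arbitrary choices made for this sketch:

```r
set.seed(42)   # arbitrary seed, only to make the sketch repeatable

standard_draws <- rnorm(10000)                      # defaults: mean = 0, sd = 1
shifted_draws  <- rnorm(10000, mean = 10, sd = 10)  # same call as above, named arguments

# Sample statistics should land close to the requested parameters
c(mean(standard_draws), sd(standard_draws))
c(mean(shifted_draws), sd(shifted_draws))
```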

## Random selection from a set

In [None]:
# 10 draws from {1, 2}; values may repeat because replace = TRUE
sample(1:2, 10, replace = TRUE)

In [None]:
# 10 draws from 1..10 without replacement: a random permutation
sample(1:10, 10, replace = FALSE)
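Drawing all 10 values without replacement simply shuffles the set. A small sketch to confirm that nothing is repeated or lost (`shuffled` is a name introduced here):

```r
shuffled <- sample(1:10, 10, replace = FALSE)

# Sorting the shuffled values should give back 1..10 exactly
identical(sort(shuffled), 1:10)

# anyDuplicated() returns 0 when there are no repeated values
anyDuplicated(shuffled)
```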

## Deterministic randomness

The random numbers we create are not truly random but pseudo-random: they are produced by a deterministic algorithm (the Mersenne Twister by default in R) that starts from a seed value. If we provide the starting "seed", the sequence is fully reproducible!

In [None]:
set.seed(1000)
runif(10)

In [None]:
set.seed(1000)
runif(10)

In [None]:
set.seed(1000)
runif(10)
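Rather than comparing the printed numbers by eye, we can let R do the comparison. This sketch reuses the seed 1000 from the cells above; the variable names are illustrative:

```r
set.seed(1000)
first_draw <- runif(10)

set.seed(1000)
second_draw <- runif(10)

# Same seed, same numbers
identical(first_draw, second_draw)

# Drawing again WITHOUT resetting the seed continues the stream,
# so these values differ from the first ten
third_draw <- runif(10)
identical(first_draw, third_draw)
```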

# Let's visualize

Now that we can create our own datasets, let's visualize the data in a simple way.

In [None]:
set.seed(1000)
series_1 <- runif(100, -10, 10)
series_1

In [None]:
set.seed(1200)
series_2 <- runif(100, -10, 10)
series_2

In [None]:
series_3 <- series_1 + series_2

In [None]:
plot(series_1)

Now make a scatterplot between **series_1** and **series_3**:

In [None]:
plot(series_1, series_3)

We may change the axis labels, the plot title, and the color and type of the markers:

In [None]:
plot(series_1, series_3, pch=5, col="blue", xlab="x observations", ylab="y observations")
title("Weight vs. height")

## Line plots

Let's generate our own stock price sequence

First log returns:

In [None]:
logret <- rnorm(100, 0, 0.01)
logret

Exponentiate the log returns ($e^x$) to get gross returns:

In [None]:
logexp <- exp(logret)
logexp

Get cumulative products:

In [None]:
logcum <- cumprod(logexp)
logcum

And let's plot:

In [None]:
plot(1:100, logcum, type = "l")
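The three steps above rely on the identity $e^{r_1} \cdot e^{r_2} \cdots e^{r_t} = e^{r_1 + r_2 + \dots + r_t}$: the cumulative product of the exponentiated log returns equals the exponential of their cumulative sum. A minimal sketch that checks this numerically; the seed and the starting price of 100 are assumptions made only for this example:

```r
set.seed(1)                                # arbitrary seed for repeatability
log_returns <- rnorm(100, 0, 0.01)

path_by_product <- cumprod(exp(log_returns))   # the route taken above
path_by_sum     <- exp(cumsum(log_returns))    # equivalent shortcut

all.equal(path_by_product, path_by_sum)

# Scale by a hypothetical starting price to get a price series
start_price <- 100
prices <- start_price * path_by_sum
head(prices)
```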

Seems to walk not so randomly!

## Histograms

Create normally distributed numbers:

In [None]:
set.seed(2000)
vec_norm <- rnorm(100, 10, 2)

Create a histogram:

In [None]:
hist(vec_norm)

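For this data, the default breaks run from 6 to 14 in whole numbers.

We may ask for fewer or more bins by passing a bin count. R treats this number as a suggestion and picks "pretty" break points close to it: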

In [None]:
hist(vec_norm, 5)

In [None]:
hist(vec_norm, 20)

Or explicitly specify the cut points of the bins:

Note that the breaks must span the whole range of the data. If you change the seed to 1000 and generate **vec_norm** again, the resulting vector should include values greater than 15, and `hist` will then stop with an error:

>  <span style="color:red">Error in hist.default(vec_norm, seq(5, 15, by = 0.5)): some 'x' not counted; maybe 'breaks' do not span range of 'x'</span>

Read the documentation with `?hist` for further information.

In [None]:
hist(vec_norm, seq(5, 15, by = 0.5))
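One way to avoid the "breaks do not span range" error is to derive the break points from the data instead of hard-coding 5 and 15. The following is only a sketch of that idea, with illustrative variable names; `plot = FALSE` makes `hist` return the bin counts without drawing:

```r
set.seed(2000)
values <- rnorm(100, 10, 2)

# Round the data range outwards so every value falls inside the breaks
lower <- floor(min(values))
upper <- ceiling(max(values))
safe_breaks <- seq(lower, upper, by = 0.5)

h <- hist(values, breaks = safe_breaks, plot = FALSE)
sum(h$counts)   # every observation is counted
```

Dropping `plot = FALSE` draws the histogram with the same breaks.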

# Exercises

**EXERCISE 2:**

Write an R expression that simulates the outcome of the 6/49 Lottery (Sayısal Loto), where one draws 6 numbers from 1, 2, ..., 49. Note that the same number cannot appear twice in one drawing.

**SOLUTION 2:**

In [None]:
# sample(49, ...) draws from 1:49; replace = FALSE forbids repeated numbers
sample(49, 6, replace = FALSE)

**EXERCISE 3:**

Generate 1000 random numbers, drawn from the normal distribution with standard deviation 2, and another 1000 with standard deviation 0.5.

Plot the histogram for each set of numbers. What can you say about the effect of the standard deviation?

**SOLUTION 3:**

In [None]:
dist1 <- rnorm(1000, sd = 2)
dist2 <- rnorm(1000, sd = 0.5)

hist(dist1)
hist(dist2)

**EXERCISE 4:**

Throw 10 coins and count the number of heads.

Repeat this experiment ten times, and find the mean of the number of heads.

**SOLUTION 4:**

If we take `TRUE` as **Heads** and `FALSE` as **Tails**:

|Explanation|Code|
|:---|---:|
|Toss 10 coins|`sample(c(TRUE, FALSE), 10, replace = TRUE)`|
|Count the heads|`sum(sample(c(TRUE, FALSE), 10, replace = TRUE))`|
|Repeat this 10 times|`replicate(10, sum(sample(c(TRUE, FALSE), 10, replace = TRUE)))`|
|Get the mean|`mean(replicate(10, sum(sample(c(TRUE, FALSE), 10, replace = TRUE))))`|

In [None]:
mean(replicate(10, sum(sample(c(TRUE, FALSE), 10, replace = TRUE))))
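The number of heads in 10 fair coin tosses follows a binomial distribution, so the same experiment can also be written with `rbinom`. This is an alternative sketch rather than the solution above, and `heads_per_experiment` is a name introduced here:

```r
# 10 experiments, each counting heads in 10 tosses of a fair coin
heads_per_experiment <- rbinom(10, size = 10, prob = 0.5)

heads_per_experiment
mean(heads_per_experiment)   # should be somewhere near 10 * 0.5 = 5
```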

**EXERCISE 5:**

Throw 3 dice 10000 times. Plot the histogram of the outcomes (outcomes should be between 3 and 18).

Hint: You can use `sample`, `cumsum`, `seq` and Exercise 5 from CMPE_140_02_PS!

**SOLUTION 5:**

|Explanation|Code|
|:---|---:|
|Roll three dice and sum them|`sum(sample(6, 3, replace = TRUE))`|
|Repeat the experiment 10000 times|`replicate(10000, sum(sample(6, 3, replace = TRUE)))`|
|Plot the histogram|`hist(replicate(10000, sum(sample(6, 3, replace = TRUE))))`|

In [None]:
hist(replicate(10000, sum(sample(6, 3, replace = TRUE))))
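To judge whether the simulated histogram looks right, we can compare it with the exact distribution obtained by enumerating all 6 × 6 × 6 = 216 equally likely outcomes. A minimal sketch; the seed and the variable names are choices made for this example only:

```r
set.seed(123)   # arbitrary seed for repeatability

# Simulated relative frequencies of each sum
rolls <- replicate(10000, sum(sample(6, 3, replace = TRUE)))
simulated <- table(rolls) / length(rolls)

# Exact probabilities: enumerate every combination of three dice
all_sums <- rowSums(expand.grid(1:6, 1:6, 1:6))
exact <- table(all_sums) / 216

round(simulated, 3)
round(exact, 3)
```

The two tables should agree closely, with the sums 10 and 11 being the most likely outcomes.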