In [None]:
options(jupyter.rich_display = F)

# Let's harvest some numbers of our own!
**by Serhat Çevikel**

For basic simulation, exercises, and visualization, it helps to have ways of creating our own data series from scratch.

## Sequences with seq()

The colon operator `:` creates simple sequences from a start value to an end value in increments of 1. `seq()` handles more complex cases, such as a different increment, a target length, or non-integer values:

In [None]:
?seq

Get the Fahrenheit equivalents of the Celsius values from 0 to 100 as a sequence:

In [None]:
seq(32, 212, length.out = 101)
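
As a cross-check, the same values can be derived from the conversion formula $F = C \cdot 9/5 + 32$. This is a minimal sketch, assuming the Celsius values run from 0 to 100 in steps of 1; the names `celsius` and `fahrenheit` are only illustrative.

```r
# Celsius values 0, 1, ..., 100 (assumed range, giving 101 points)
celsius <- seq(0, 100, by = 1)

# Standard Celsius-to-Fahrenheit conversion
fahrenheit <- celsius * 9 / 5 + 32

# Compare with the sequence built directly on the Fahrenheit scale
all.equal(fahrenheit, seq(32, 212, length.out = 101))
```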

Get values from 1 to 11 in increments of 3:

In [None]:
seq(1, 11, 3)
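
The other variants mentioned at the top of this section (non-integer steps, a given length) can be sketched as follows. The variable names are only illustrative.

```r
# Non-integer increment
quarters <- seq(0, 1, by = 0.25)             # 0.00 0.25 0.50 0.75 1.00

# Negative increment counts downwards
countdown <- seq(10, 1, by = -3)             # 10 7 4 1

# Fixed number of points; R works out the increment itself
five_points <- seq(0, 100, length.out = 5)   # 0 25 50 75 100

# If "to" cannot be reached exactly, the sequence stops before passing it
short_of_ten <- seq(1, 10, by = 4)           # 1 5 9

quarters
countdown
five_points
short_of_ten
```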

**EXERCISE 1:**

Get the leap years in the 21st century. Two questions:

- Is 2000 a part of 21st century?
- Is 2100 a leap year?

**SOLUTION 1:**

In [None]:
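# 2000 belongs to the 20th century, and 2100 is not a leap year
# (divisible by 100 but not by 400), so the sequence runs from 2004 to 2096
seq(2004, 2096, 4)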
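
The two questions can also be answered in code. The sketch below applies the Gregorian leap year rule to the years 2001 to 2100, which is the span of the 21st century. The helper `is_leap()` is a hypothetical function written for this illustration, not part of base R.

```r
# Gregorian rule: divisible by 4, except century years not divisible by 400
# (is_leap is a hypothetical helper defined here for illustration)
is_leap <- function(year) {
  (year %% 4 == 0) & ((year %% 100 != 0) | (year %% 400 == 0))
}

# The 21st century runs from 2001 to 2100
years <- 2001:2100
leap_years <- years[is_leap(years)]
leap_years

# 2000 is a leap year but lies outside the century; 2100 is not a leap year
is_leap(c(2000, 2100))   # TRUE FALSE
```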

## Uniform numeric values within a range

In [None]:
runif(10)

In [None]:
runif(20, -1, 1)

## Random numbers from a normal distribution

By default the mean is 0 and the sd is 1. Here we draw 10 values with mean 10 and sd 10:

In [None]:
rnorm(10, 10, 10)
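
With only 10 draws it is hard to see what the arguments do. The sketch below uses a larger sample to compare the defaults with explicit `mean` and `sd` values. The seed and the sample size of 10000 are arbitrary choices made for this illustration.

```r
set.seed(42)   # arbitrary seed, only so that the sketch is reproducible

# Defaults: mean = 0, sd = 1
z <- rnorm(10000)

# Explicit mean and sd
x <- rnorm(10000, mean = 10, sd = 10)

c(mean(z), sd(z))   # close to 0 and 1
c(mean(x), sd(x))   # close to 10 and 10
```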

## Randomly select out of a set

In [None]:
sample(1:2, 10, replace = T)

In [None]:
sample(1:10, 10, replace = F)
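
The `replace` argument decides whether an item can be drawn more than once. A small sketch of the consequences, with illustrative variable names and an arbitrary seed:

```r
set.seed(42)   # arbitrary seed for reproducibility

# With replacement: values may repeat, so 10 draws from 2 items is fine
coin <- sample(1:2, 10, replace = TRUE)

# Without replacement, drawing the whole set gives a random permutation
perm <- sample(1:10, 10, replace = FALSE)
identical(sort(perm), 1:10)   # TRUE

# Without replacement we cannot draw more items than the set contains;
# tryCatch() captures the error message instead of stopping
res <- tryCatch(
  sample(1:2, 10, replace = FALSE),
  error = function(e) conditionMessage(e)
)
res
```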

## Deterministic randomness

The random numbers we create are not truly random but pseudo-random: they are produced by a deterministic algorithm (Mersenne-Twister by default in R) that starts from a seed value. If we set the starting seed ourselves, the sequence is fully reproducible!

In [None]:
set.seed(1000)
runif(10)

In [None]:
set.seed(1000)
runif(10)

In [None]:
set.seed(1000)
runif(10)
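
Instead of comparing the printed numbers by eye, we can let R do the comparison. In this sketch the names `first`, `second` and `again` are only illustrative.

```r
set.seed(1000)
first  <- runif(10)
second <- runif(10)   # continues the same stream, so the values differ

set.seed(1000)        # resetting the seed restarts the stream
again  <- runif(10)

identical(first, again)    # TRUE
identical(first, second)   # FALSE
```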

# Let's visualize

Now that we can create our own datasets, let's visualize the data in a simple way:

In [None]:
set.seed(1000)
series_1 <- runif(100, -10, 10)
series_1

In [None]:
set.seed(1200)
series_2 <- runif(100, -10, 10)
series_2

In [None]:
series_3 <- series_1 + series_2

In [None]:
plot(series_1)

Now make a scatterplot between **series_1** and **series_3**:

In [None]:
plot(series_1, series_3)
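
The scatterplot slopes upward because **series_3** contains **series_1** as one of its terms. For two independent series with equal variance, the theoretical correlation between the first series and their sum is $1/\sqrt{2} \approx 0.71$. The sketch below rebuilds the series with the same seeds and computes the sample correlations; `r13` and `r12` are illustrative names.

```r
set.seed(1000)
series_1 <- runif(100, -10, 10)
set.seed(1200)
series_2 <- runif(100, -10, 10)
series_3 <- series_1 + series_2

# Sample correlation between a series and a sum that contains it
r13 <- cor(series_1, series_3)
# The two building blocks themselves should be roughly uncorrelated
r12 <- cor(series_1, series_2)

c(r13 = r13, theoretical = 1 / sqrt(2), r12 = r12)
```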

We may change the axis labels, the plot title, and the color and type of the markers:

In [None]:
plot(series_1, series_3, pch=5, col="blue", xlab="x observations", ylab="y observations")
title("Weight vs. height")

## Line plots

Let's generate our own stock price sequence

First log returns:

In [None]:
logret <- rnorm(100, 0, 0.01)
logret

Convert to $e^x$:

In [None]:
logexp <- exp(logret)
logexp

Get cumulative products:

In [None]:
logcum <- cumprod(logexp)
logcum

And let's plot:

In [None]:
plot(1:100, logcum, type = "l")
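
The three steps above can also be collapsed into one: since $e^{a} \cdot e^{b} = e^{a+b}$, the cumulative product of `exp(logret)` equals `exp()` of the cumulative sum. This is a sketch with an arbitrary seed; `start_price` is a hypothetical starting value added only for illustration.

```r
set.seed(1)   # arbitrary seed for reproducibility
logret <- rnorm(100, 0, 0.01)

# Route used above: exponentiate, then multiply cumulatively
price_a <- cumprod(exp(logret))

# Equivalent route: add the log returns, then exponentiate once
price_b <- exp(cumsum(logret))

all.equal(price_a, price_b)   # TRUE

# Scale the path by a hypothetical starting price
start_price <- 100
price_path <- start_price * price_b
```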

Seems to walk not so randomly!

## Histograms

Create normally distributed numbers:

In [None]:
set.seed(2000)
vec_norm <- rnorm(100, 10, 2)

Create a histogram:

In [None]:
hist(vec_norm)

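With this seed, the default breaks run from 6 to 14 in whole numbers.

We may ask for fewer or more bins by giving a bin count. R treats this number as a suggestion and picks nearby "pretty" break points: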

In [None]:
hist(vec_norm, 5)

In [None]:
hist(vec_norm, 20)

Or explicitly give the cut points of the bins:

Note that the breaks must span the whole range of the data. If you change the seed to 1000 and generate **vec_norm** again, the resulting vector should include numbers bigger than 15, which will result in an error from the `hist` function:

>  <span style="color:red">Error in hist.default(vec_norm, seq(5, 15, by = 0.5)): some 'x' not counted; maybe 'breaks' do not span range of 'x'</span>

Read documentation with `?hist` for further information

In [None]:
hist(vec_norm, seq(5, 15, by = 0.5))
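
One way to avoid the error is to derive the breaks from the data instead of hard-coding 5 and 15. The sketch below is one possible approach, not the only one. It uses `plot = FALSE` so that `hist()` returns the bin information without drawing; drop that argument to see the plot.

```r
set.seed(2000)
vec_norm <- rnorm(100, 10, 2)

# Inspect the breaks R chooses by default, without drawing anything
h_default <- hist(vec_norm, plot = FALSE)
h_default$breaks

# Build a 0.5-wide grid that is guaranteed to cover the data range
breaks <- seq(floor(min(vec_norm)), ceiling(max(vec_norm)), by = 0.5)
h_safe <- hist(vec_norm, breaks = breaks, plot = FALSE)

# Every observation falls into some bin
sum(h_safe$counts) == length(vec_norm)
```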

# Exercises

**EXERCISE 2:**

Write an R expression that simulates the outcome of the 6/49 Lottery (Sayısal Loto), where one draws 6 numbers from 1, 2, ..., 49. Note that the same number cannot appear twice in one drawing.

**SOLUTION 2:**

In [None]:
sample(49, 6, replace = F)

**EXERCISE 3:**

Generate 1000 random numbers, drawn from the normal distribution with standard deviation 2, and another 1000 with standard deviation 0.5.

Plot the histogram for each set of numbers. What can you say about the effect of the standard deviation?

**SOLUTION 3:**

In [None]:
dist1 <- rnorm(1000, sd = 2)
dist2 <- rnorm(1000, sd = 0.5)

hist(dist1)
hist(dist2)

**EXERCISE 4:**

Throw 10 coins and count the number of heads.

Repeat this experiment ten times, and find the mean of the number of heads.

**SOLUTION 4:**

If we take T as **Heads** and F as **Tails**:

|Explanation|Code|
|:---|---:|
|Tossing 10 coins|`sample(c(T, F), 10, replace = T)`|
|Head count|`sum(sample(c(T, F), 10, replace = T))`|
|Repeating this 10 times|`replicate(10, sum(sample(c(T, F), 10, replace = T)))`|
|Getting the mean|`mean(replicate(10, sum(sample(c(T, F), 10, replace = T))))`|

In [None]:
mean(replicate(10, sum(sample(c(T, F), 10, replace = T))))
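
The number of heads in 10 fair coin tosses follows a binomial distribution, so `rbinom()` offers an alternative to the `sample()` approach. Note that this swaps in a different technique from the solution above. The seed and the 100000 repetitions in the last step are arbitrary choices for this sketch.

```r
set.seed(42)   # arbitrary seed for reproducibility

# Approach from the solution: toss, count, repeat
heads_sample <- replicate(10, sum(sample(c(TRUE, FALSE), 10, replace = TRUE)))

# Alternative: draw the head counts directly from a binomial distribution
heads_binom <- rbinom(10, size = 10, prob = 0.5)

mean(heads_sample)
mean(heads_binom)

# The expected number of heads is 10 * 0.5 = 5;
# with many more repetitions the mean settles close to it
big_mean <- mean(rbinom(100000, size = 10, prob = 0.5))
big_mean
```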

**EXERCISE 5:**

Throw 3 dice 10000 times. Plot the histogram of the outcomes (outcomes should be between 3 and 18).

**SOLUTION 5:**

|Explanation|Code|
|:---|---:|
|Rolling three dice experiment|`sum(sample(6, 3, replace = T))`|
|Repeat experiment 10000 times|`replicate(10000, sum(sample(6, 3, replace = T)))`|
|Getting histogram|`hist(replicate(10000, sum(sample(6, 3, replace = T))))`|

In [None]:
hist(replicate(10000, sum(sample(6, 3, replace = T))))
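
Because the outcomes are whole numbers, it can help to give each outcome its own bin. `hist()` bins are closed on the right by default, so if the breaks fall on integers the first bin can end up holding two outcomes. The sketch below centres one bin on each value from 3 to 18; the seed is arbitrary, and `plot = FALSE` is used only to inspect the counts (remove it to draw the plot).

```r
set.seed(42)   # arbitrary seed for reproducibility
rolls <- replicate(10000, sum(sample(6, 3, replace = TRUE)))

range(rolls)   # stays within 3 and 18
mean(rolls)    # close to the expected value 3 * 3.5 = 10.5

# One bin per outcome: breaks at 2.5, 3.5, ..., 18.5
h <- hist(rolls, breaks = seq(2.5, 18.5, by = 1), plot = FALSE)

# The bin counts match a direct tally of the outcomes 3 to 18
tally <- tabulate(rolls, nbins = 18)[3:18]
all(h$counts == tally)
```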

**EXERCISE 6:**

Create a sample of uniformly generated numbers between 0 and 1 with size 8! = 40320. Assign it to **run1**.

Now create vectors **run2** to **run8**, such that:
- **run2** is a vector of 8! / 2 items, each the sum of 2 uniformly generated numbers
- **run3** is a vector of 8! / 3 items, each the sum of 3 uniformly generated numbers
- ...

and continue the pattern until **run8**.

Now:

1) Plot the histograms of **run1** to **run8**. Do you see a pattern? What is it?

2) Get the five-number summaries of **run1** to **run8** using the `summary()` function. What pattern do you see?

**SOLUTION 6: (LONG WAY)**

In [None]:
run1 <- runif(factorial(8))
runcount <- length(run1)

run2 <- run1[1:runcount %% 2 == 0] + run1[1:runcount %% 2 == 1]

run3 <- run1[1:runcount %% 3 == 0] +
run1[1:runcount %% 3 == 1] +
run1[1:runcount %% 3 == 2]

run4 <- run1[1:runcount %% 4 == 0] +
run1[1:runcount %% 4 == 1] +
run1[1:runcount %% 4 == 2] +
run1[1:runcount %% 4 == 3]

run5 <- run1[1:runcount %% 5 == 0] +
run1[1:runcount %% 5 == 1] +
run1[1:runcount %% 5 == 2] +
run1[1:runcount %% 5 == 3] +
run1[1:runcount %% 5 == 4]

run6 <- run1[1:runcount %% 6 == 0] +
run1[1:runcount %% 6 == 1] +
run1[1:runcount %% 6 == 2] +
run1[1:runcount %% 6 == 3] +
run1[1:runcount %% 6 == 4] +
run1[1:runcount %% 6 == 5]

run7 <- run1[1:runcount %% 7 == 0] +
run1[1:runcount %% 7 == 1] +
run1[1:runcount %% 7 == 2] +
run1[1:runcount %% 7 == 3] +
run1[1:runcount %% 7 == 4] +
run1[1:runcount %% 7 == 5] +
run1[1:runcount %% 7 == 6]

run8 <- run1[1:runcount %% 8 == 0] +
run1[1:runcount %% 8 == 1] +
run1[1:runcount %% 8 == 2] +
run1[1:runcount %% 8 == 3] +
run1[1:runcount %% 8 == 4] +
run1[1:runcount %% 8 == 5] +
run1[1:runcount %% 8 == 6] +
run1[1:runcount %% 8 == 7]

hist(run1)
hist(run2)
hist(run3)
hist(run4)
hist(run5)
hist(run6)
hist(run7)
hist(run8)

summary(run1)
summary(run2)
summary(run3)
summary(run4)
summary(run5)
summary(run6)
summary(run7)
summary(run8)

**SOLUTION 6: (SHORT WAY)**

In [None]:
runcount <- factorial(8)
run1 <- runif(runcount)
run2 <- replicate(runcount / 2, sum(runif(2)))
run3 <- replicate(runcount / 3, sum(runif(3)))
run4 <- replicate(runcount / 4, sum(runif(4)))
run5 <- replicate(runcount / 5, sum(runif(5)))
run6 <- replicate(runcount / 6, sum(runif(6)))
run7 <- replicate(runcount / 7, sum(runif(7)))
run8 <- replicate(runcount / 8, sum(runif(8)))

hist(run1)
hist(run2)
hist(run3)
hist(run4)
hist(run5)
hist(run6)
hist(run7)
hist(run8)

summary(run1)
summary(run2)
summary(run3)
summary(run4)
summary(run5)
summary(run6)
summary(run7)
summary(run8)
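
The repetition in both solutions can be folded into a loop over k = 1, ..., 8. This is a sketch of one possible approach: it draws fresh uniform numbers for each k, reshapes them into a matrix with k columns and sums each row. The list name `runs` and the seed are illustrative choices. The pattern to look for is that the sums become more bell-shaped as k grows (the central limit theorem), with mean k/2 and variance k/12.

```r
set.seed(42)   # arbitrary seed for reproducibility
runcount <- factorial(8)

# runs[[k]] holds runcount / k values, each the sum of k uniform draws
runs <- lapply(1:8, function(k) rowSums(matrix(runif(runcount), ncol = k)))

lengths(runs)   # 40320, 20160, 13440, ..., 5040

# Sample mean and variance next to the theoretical k/2 and k/12
run_means <- sapply(runs, mean)
run_vars  <- sapply(runs, var)
data.frame(k = 1:8,
           mean = run_means, expected_mean = (1:8) / 2,
           var = run_vars, expected_var = (1:8) / 12)

# Histograms and summaries for all eight vectors:
# for (r in runs) hist(r)
# lapply(runs, summary)
```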

**EXERCISE 7:**

Randomly assign a number from 1 to 5 to each of 1000 cases. Each number can be regarded as a category of observations.

Now create 1000 normally distributed numbers, one per case:
- with a standard deviation equal to one third of the category value (1/3 for category 1 and so on)
- and a mean equal to half of the category value (0.5 for category 1 and so on)

**Hint:** First create a sample of standard normally distributed values.

- Now first plot the histogram of the overall sample. How does it look?
- Then plot the histogram of the values for each category. How do they look?

**SOLUTION 7:**

In [None]:
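# 1000 cases, as stated in the exercise
categories <- sample(5, 1000, replace = T)
zscores <- rnorm(1000)
# Scale each z-score by the category's sd and shift by its mean
values <- categories/2 + zscores * categories/3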

hist(values)

hist(values[categories == 1])
hist(values[categories == 2])
hist(values[categories == 3])
hist(values[categories == 4])
hist(values[categories == 5])
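
Since `rnorm()` is vectorised over its `mean` and `sd` arguments, the values can also be drawn in one call without the intermediate z-scores. This is an alternative sketch, not the route suggested by the hint; the seed and the names `cat_means` and `cat_sds` are illustrative. `tapply()` then gives the per-category statistics to check against the targets.

```r
set.seed(42)   # arbitrary seed for reproducibility
n <- 1000
categories <- sample(5, n, replace = TRUE)

# Each case gets the mean and sd that belong to its own category
values <- rnorm(n, mean = categories / 2, sd = categories / 3)

# Per-category mean and sd; targets are (1:5) / 2 and (1:5) / 3
cat_means <- tapply(values, categories, mean)
cat_sds   <- tapply(values, categories, sd)
rbind(mean = cat_means, target_mean = (1:5) / 2,
      sd = cat_sds, target_sd = (1:5) / 3)
```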

**EXERCISE 8:**

Suppose a stock price falls from 10 to 5 and bounces back. Geometric returns are:

```r
5/10 - 1
10/5 - 1

[1] -0.5
[1] 1
```

Now let's calculate the logarithmic returns with the same values:

```r
log(5/10)
log(10/5)

[1] -0.6931472
[1] 0.6931472
```

Note the symmetry now. Since `exp()` is the inverse of `log()`, applying both returns takes the price back to its starting value:

```r
10 * exp(log(5 / 10)) * exp(log(10 / 5))
[1] 10
```
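
The same round trip can be written with a small price vector. The sketch below shows that logarithmic returns add up to zero when the price ends where it started, while geometric returns do not, and that the two kinds of return are linked by `exp()` and `log()`. The names `round_trip`, `geo_ret` and `log_ret` are illustrative.

```r
round_trip <- c(10, 5, 10)

# Geometric returns: price ratio minus 1
geo_ret <- round_trip[-1] / round_trip[-length(round_trip)] - 1
# Logarithmic returns: differences of log prices
log_ret <- diff(log(round_trip))

sum(geo_ret)   # 0.5, although the price is back at 10
sum(log_ret)   # 0

# Converting between the two kinds of return
all.equal(geo_ret, exp(log_ret) - 1)
all.equal(log_ret, log(1 + geo_ret))
```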

First create a sample of 5000 normally distributed values with a mean of 0.05 and an sd of 0.5. These may be regarded as **LOGARITHMIC** stock returns in a highly volatile stock market. Plot the histogram of the sample.

Now calculate the price series, starting with 1.

Then calculate the geometric returns of this synthetic series and plot their histogram.

Compare the histograms and five-number summaries of both return series and comment on the results.

**SOLUTION 8:**

In [None]:
returns_ln <- rnorm(5000, mean = 0.05, sd = 0.5)
hist(returns_ln)

prices <- c(1, cumprod(exp(returns_ln)))
returns_geo <- prices[-1] / prices[-length(prices)] - 1
hist(returns_geo)

summary(returns_ln)
summary(returns_geo)
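
As a hint for the comparison: each geometric return equals `exp()` of the corresponding logarithmic return minus 1, so the geometric returns can never fall below -1 and have a long right tail. The sketch below checks this with an arbitrary seed; `expm1(x)` is base R's numerically careful version of `exp(x) - 1`.

```r
set.seed(42)   # arbitrary seed for reproducibility
returns_ln <- rnorm(5000, mean = 0.05, sd = 0.5)
prices <- c(1, cumprod(exp(returns_ln)))
returns_geo <- prices[-1] / prices[-length(prices)] - 1

# Geometric returns are a direct transformation of the log returns
all.equal(returns_geo, expm1(returns_ln))   # TRUE

# A price cannot lose more than 100 percent
min(returns_geo) > -1

# Symmetric log returns: mean and median nearly coincide.
# Right-skewed geometric returns: the mean is pulled above the median.
c(mean = mean(returns_ln), median = median(returns_ln))
c(mean = mean(returns_geo), median = median(returns_geo))
```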