In [2]:
using Random, Statistics, Distributions, DataFrames

## Draw Random Numbers from Uniform Distributions

#### Write code to draw 10,000 random numbers that are uniformly distributed on $(-2,3)$.
 - Hint: Stretch and shift $U(0,1)$ to fit the bounds of $U(-2,3)$.
 
#### What are the theoretical mean and standard deviation of the distribution $U(-2,3)$ ?
 
#### Show the mean and the standard deviation of the set of random numbers you've drawn. 

*劉浩揚*
##### Suppose $x \sim U(0,1)$; then $(5x-2) \sim U(-2,3)$.

*魏上傑*

$ \text{The theoretical mean of } U(a,b) \text{ is }\frac{a+b}{2}. $  
$ \text{The theoretical standard deviation of } U(a,b) \text{ is } 
\sqrt{\frac{(b-a)^2}{12}}. $  
$ \text{Hence, the theoretical mean of } U(-2,3) \text{ is }
\frac{-2+3}{2}=0.5, 
\text{ and the standard deviation of } U(-2,3) 
\text{ is } \sqrt{\frac{(3-(-2))^2}{12}} = \frac{5}{\sqrt{12}} \approx 1.44. $
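
As a rough sketch of the stretch-and-shift idea for a general interval, the snippet below maps $U(0,1)$ draws to $U(a,b)$ and compares the sample moments with the formulas above. The helper name `draw_uniform` and the fixed seed are assumptions made for illustration only.

```julia
using Random, Statistics

# Hypothetical helper: map U(0,1) draws to U(a,b) via a + (b - a) * u.
draw_uniform(rng, a, b, n) = a .+ (b - a) .* rand(rng, n)

rng = Xoshiro(2024)                      # seed chosen arbitrarily for reproducibility
a, b = -2, 3
u = draw_uniform(rng, a, b, 10_000)

theoretical_mean = (a + b) / 2           # (a + b) / 2
theoretical_std  = (b - a) / sqrt(12)    # sqrt((b - a)^2 / 12)

println("sample mean = ", round(mean(u); digits = 2), " (theory: ", theoretical_mean, ")")
println("sample std  = ", round(std(u); digits = 2), " (theory: ", round(theoretical_std; digits = 4), ")")
```

With 10,000 draws the sample moments should typically land within a few hundredths of the theoretical values.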

In [68]:
x_1 = -2 .+ 5 .* rand(10000)

println("The sample mean is " , round(mean(x_1) ; digits = 2 ) )
println("The sample standard deviation is " , round(std(x_1) ; digits = 2 ) )

The sample mean is 0.51
The sample standard deviation is 1.44


In [10]:
d = Uniform(-2, 3);
println("theoretical mean = ", mean(d));
println("theoretical standard deviation = ", sqrt(var(d)));

a1 = rand(d, 1000);
println("mean = ", mean(a1));
println("standard deviation = ", sqrt(var(a1)));

theoretical mean = 0.5
theoretical standard deviation = 1.4433756729740645
mean = 0.5865395243573632
standard deviation = 1.4203939785081834


## Draw Random Numbers from Normal Distributions

### Use `randn()` to draw a $1000\times 2$ matrix of random numbers from $N(2,3)$ (a normal distribution with mean=2 and variance=3). Use `mean()` and `var()` to show that the empirical mean and variance are close to the theoretical values.
- Hint: `randn()` generates N(0,1) random variables; you have to scale it to the appropriate mean and variance.
- Hint: For constants `a` and `b`: if $x \sim N(\mu, \sigma^2)$, then $a x \sim N(a\mu, \ a^2 \sigma^2)$ and $x+b \sim N(\mu+b, \ \sigma^2)$.

### Use `rand()` to draw a $1000\times 2$ matrix of random numbers from $N(2,3)$. Use `mean()` and `var()` to show that the empirical mean and variance are close to the theoretical values.
- Hint: `rand()` can take a distribution as an argument, as shown in class.

In [11]:
m1 = randn(1000, 2) * sqrt(3) .+ 2;
println("m1 mean(empirical) = ", mean(m1));
println("m1 variance(empirical) = ", var(m1));

d = Normal(2, sqrt(3));
m2 = rand(d, (1000, 2));
println("m2 mean(empirical) = ", mean(m2));
println("m2 variance(empirical) = ", var(m2));

m1 mean(empirical) = 2.0479601096093774
m1 variance(empirical) = 2.971758986826394
m2 mean(empirical) = 2.020465885597819
m2 variance(empirical) = 2.9590113329909977
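
The calls above pool all 2,000 entries of each matrix. Since the exercise asks for a $1000\times 2$ matrix, one might also check each column separately. The sketch below does so with the `dims` keyword of `mean` and `var`; the variable names and the seed are illustrative assumptions.

```julia
using Random, Statistics

rng = Xoshiro(1)                          # arbitrary seed
μ, σ² = 2.0, 3.0
m = μ .+ sqrt(σ²) .* randn(rng, 1000, 2)  # scale N(0,1) draws to N(2,3)

col_means = mean(m; dims = 1)   # 1×2 matrix: one mean per column
col_vars  = var(m; dims = 1)    # 1×2 matrix: one variance per column

println("column means     = ", round.(col_means; digits = 2))
println("column variances = ", round.(col_vars; digits = 2))
```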


## Draw Regression Data: Cross-Sectional Model

Suppose you write your own routine to do fancy estimation on cross-sectional and panel data models. You want to conduct a Monte Carlo analysis to see whether the routine works as expected and returns the correct answer. The first thing you have to do is to generate data with pre-specified parameter values, so that you can apply your estimation routine to the data and check whether the estimated parameter values match the pre-specified (*true*) values.

Let's start with the cross-sectional model. The model is:
\begin{aligned} 
  y_i & = \alpha + \beta' x_i + \epsilon_i,\qquad i=1,\ldots,N,\\
  \epsilon_i & \sim N(0, \sigma^2).
\end{aligned}   

There could be more than one $x_i$ variable in the model; let's denote the number of $x_i$ variables as $\textrm{nofX}$. Write a function to generate data $\{y_i, x_i\}$. The function should allow users to choose the values of $\{\alpha, \beta, \sigma^2, \textrm{nofX}, N\}$.
  - Hint: The $x_i$s are assumed (in econometrics) to be fixed and exogenous, and therefore the distribution from which they are generated is inconsequential. (It is fine if the previous sentence is unclear; the important part is the next one:) You may assume that they are generated from normal distributions.

In [69]:
# 吳天冷

# Build the cross-sectional model: y is N×1, α is N×1 (the same value for every observation),
# β is M×1, x is N×M, and ϵ is N×1. Mismatched dimensions raise an error.
function CSM(; α = 0, β = [1], σ² = 1, M = 1, N = 1) 
    x = rand(Normal(0, 1), N, M)                     # regressors drawn from N(0, 1)
    y = α .+ x * β + rand(Normal(0, sqrt(σ²)), N)    # add N(0, σ²) noise
    # Build a DataFrame from the matrix [i y x] and name the columns i, y, x_1, ..., x_M
    matrix = hcat(1:N, y, x)
    df = DataFrame(matrix, ["i"; "y"; ["x_$j" for j in 1:M]])
    return df
end

CSM(; α = 1, β = [2, 3, 4], σ² = 3, M = 3, N = 6)  # example

| Row | i | y | x_1 | x_2 | x_3 |
| --- | --- | --- | --- | --- | --- |
|  | Float64 | Float64 | Float64 | Float64 | Float64 |
| 1 | 1.0 | 1.82067 | 0.0584865 | 0.721556 | 0.126647 |
| 2 | 2.0 | 7.50526 | -0.932962 | 0.415234 | 2.12783 |
| 3 | 3.0 | 8.01874 | -1.27465 | 1.21139 | 1.24602 |
| 4 | 4.0 | 7.88548 | 0.424825 | 1.32257 | 1.08737 |
| 5 | 5.0 | -1.22088 | -0.454669 | -1.23507 | 0.254427 |
| 6 | 6.0 | 5.87114 | 1.12323 | -0.783457 | 1.00569 |


In [None]:
# 曾勁松

rng = MersenneTwister(1234)
A=Normal(2,sqrt(3))
function f(α,β,σ2,nofX,N)
    ϵ=rand(Normal(0,sqrt(σ2)),(N,1))
    x_i=rand(rng,A,(N,nofX)) # draw x from N(2,3)
    α=fill(α,(N,1))
    β=fill(β,(nofX,1))
    y_i=α+x_i*β+ϵ
    dy=DataFrame(y=vec(y_i), x1=x_i[:,1])
    dx=DataFrame(x_i,:auto)
    return innerjoin(dy,dx,on=:x1)
end
f(2,3,1,7,3)
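
To connect the generators above with the Monte Carlo motivation, here is a minimal sketch that draws one data set and checks whether ordinary least squares recovers the true parameters. The function name `draw_cross_section`, the seed, and the use of the backslash operator as the "estimation routine" are all assumptions for illustration; they are not part of the exercise.

```julia
using Random, LinearAlgebra

# Hypothetical generator: y = α + x*β + ϵ with ϵ ~ N(0, σ²) and x ~ N(0, 1).
function draw_cross_section(rng; α, β, σ², N)
    x = randn(rng, N, length(β))                 # N × nofX regressor matrix
    y = α .+ x * β .+ sqrt(σ²) .* randn(rng, N)  # add N(0, σ²) noise
    return y, x
end

rng = Xoshiro(42)                                # arbitrary seed
α_true, β_true = 1.0, [2.0, 3.0, 4.0]
y, x = draw_cross_section(rng; α = α_true, β = β_true, σ² = 3.0, N = 5_000)

# OLS via least squares on the design matrix [1 x]
θ_hat = [ones(length(y)) x] \ y

println("estimated α = ", round(θ_hat[1]; digits = 2))
println("estimated β = ", round.(θ_hat[2:end]; digits = 2))
```

If the generator is correct, the estimates should be close to `α_true` and `β_true`, and should get closer as `N` grows.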

## Draw Regression Data: Panel Model

Suppose you also want to generate panel data to test your routine. The model is
\begin{aligned}
    y_{it} & = \alpha_i + \beta x_{it} + \epsilon_{it},\qquad i=1,\ldots,N,\ t=1,\ldots,T,\\
   \epsilon_{it} & \sim N(0, \sigma^2).
\end{aligned}   

Here, $i$ is the individual index and $t$ is the time index. For instance, $w_{13}$ denotes the value of $w$ for the 1st individual in the 3rd time period. Assume the above model is the random-effects (RE) panel data model, where $\alpha_i \sim N(0,\sigma_a^2)$ is a random variable distributed independently of $x_{it}$. Write a function to generate data $\{y_{it}, x_{it}\}$ with the options $\{\beta, \sigma^2, \sigma_a^2, \textrm{nofX}, N, T\}$. 

  - Hint: Draw $\alpha_i$ and expand it (`repeat()`) to fill the time periods. Generate $x_{it}$ and $\epsilon_{it}$. Then combine these elements according to the equation to create $y_{it}$.
  - Hint: You don't really need to understand what an RE model is to generate the data. Just follow the notation and it should be OK.
  - Hint: `repeat()` would be useful here.
  - Hint: The structure of the dataset should look like the following. Note that $\alpha_i$ is constant within a given $i$ but changes across different $i$'s. 


|	i	|	t	|	y_it	|	alpha_i	|	x_it	|
| ---	| ---	|	--- 	| ---		|	---     |
|	1	|	1	|	0.173 	|	0.12	|	0.183 	|
|	1	|	2	|	0.372 	|	0.12	|	0.804 	|
|	1	|	3	|	0.239 	|	0.12	|	0.072 	|
|	1	|	4	|	0.791 	|	0.12	|	0.272 	|
|	2	|	1	|	0.443 	|	-0.45	|	0.705 	|
|	2	|	2	|	0.825 	|	-0.45	|	0.619 	|
|	2	|	3	|	0.681 	|	-0.45	|	0.769 	|
|	2	|	4	|	0.694 	|	-0.45	|	0.575 	|
|	3	|	1	|	0.192 	|	1.29	|	0.067 	|
|	3	|	2	|	0.072 	|	1.29	|	0.553 	|
|	3	|	3	|	0.522 	|	1.29	|	0.280 	|
|	3	|	4	|	0.021 	|	1.29	|	0.306 	|
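
The index layout in the table can be reproduced with the `inner` and `outer` keywords of `repeat()`. The sketch below uses the table's dimensions ($N=3$, $T=4$) and copies its three $\alpha_i$ values purely for illustration.

```julia
N, T = 3, 4

i = repeat(1:N, inner = T)   # each individual index repeated T times in a row
t = repeat(1:T, outer = N)   # the whole time sequence repeated N times

a = [0.12, -0.45, 1.29]            # illustrative αᵢ values, copied from the table
alpha_i = repeat(a, inner = T)     # constant within i, varies across i

println(i)   # [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3]
println(t)   # [1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4]
```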







In [4]:
# Last year's method

struct data_panel
    fid  # firm id
    tid  # time id
    y
    x
end


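# Note: the keyword α is accepted but never used below; the individual effects αᵢ are drawn from N(0, σₐ²).
function DGP_panel(; N::Int=10, T::Int=2, nofx::Int=1, α::Float64=0.5, 
                   β::Vector=[0.5], σₐ²::Float64=1.0, σₑ²::Float64=1.0, seed::Int=33668324) 
    
    length(β) == nofx || throw(ArgumentError("The length of β should equal nofx."))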

    if seed == 33668324              # meaning no user-supplied seed
        rng1 = Random.default_rng()  # inherit from global RNG
    else     
        rng1 = Xoshiro(seed)    # user-supplied RNG
    end
    
    
    firmid = repeat(1:N, inner=T)
    timeid = repeat(1:T, outer=N)
    αᵢ = repeat(randn(rng1, N)*sqrt(σₐ²), inner=T)  # or, repeat(rand(rng1, Normal(0, sqrt(σₐ²)), N), inner=T)
    
    x = rand(rng1, Normal(0,1), N*T, nofx)     # or, randn(N*T, nofx)
    e = rand(rng1, Normal(0,sqrt(σₑ²)), N*T, 1)
    y = αᵢ .+ x*β .+ e

    return data_panel(firmid, timeid, y, x)
end


# example
mybeta = ones(4)*0.6  # a vector of [0.6, 0.6, 0.6, 0.6]
data2 = DGP_panel(N=3, T=2, nofx=4, β=mybeta)

[data2.fid data2.tid data2.y] |> display
data2.x |> display


6×3 Matrix{Float64}:
 1.0  1.0  -0.879908
 1.0  2.0  -0.162918
 2.0  1.0  -1.82062
 2.0  2.0  -2.83872
 3.0  1.0  -1.86913
 3.0  2.0  -1.50121

6×4 Matrix{Float64}:
 0.0779672   1.0548     0.366233  -1.42159
 0.940961   -2.59507    0.367747   0.446513
 1.17443    -0.299261  -1.64253   -1.24851
 0.0150571  -0.123509   0.574206  -1.33367
 1.60237    -0.602435   0.824673  -0.161014
 0.30057     1.50224    1.0848    -1.56717

In [3]:
# 曾勁松

rng = MersenneTwister(1234)
A=Normal(2,sqrt(3))
function f(β,σ2,σ2_α,nofX,N,T)
    ϵ_it=rand(Normal(0,sqrt(σ2)),(N*T,1))
    α_i=repeat(rand(Normal(0,sqrt(σ2_α)),(N,1)), inner=(T,1))
    x_it=rand(rng,A,(N*T,nofX)) # draw x from N(2,3)
    β=fill(β,(nofX,1))
    y_it=α_i+x_it*β+ϵ_it
    i=repeat(1:N, inner=(T,1))
    t=repeat(1:T,N)
    dy=DataFrame(y=vec(y_it), x1=x_it[:,1])
    dx=DataFrame(x_it,:auto)
    dit=DataFrame(i=vec(i),t=vec(t),α=vec(α_i), x1=x_it[:,1])
    return innerjoin(dit,dx,dy,on=:x1)
end
df = f(2,3,1,3,3,3)

| Row | i | t | α | x1 | x2 | x3 | y |
| --- | --- | --- | --- | --- | --- | --- | --- |
|  | Int64 | Int64 | Float64 | Float64 | Float64 | Float64 | Float64 |
| 1 | 1 | 1 | 1.17278 | 3.50229 | 1.10456 | 2.12491 | 14.2757 |
| 2 | 1 | 2 | 1.17278 | 0.438134 | 1.02918 | -0.604016 | 3.12878 |
| 3 | 1 | 3 | 1.17278 | 1.14354 | 1.96659 | 4.70922 | 17.2432 |
| 4 | 2 | 1 | 0.556965 | 0.436107 | 2.22181 | -0.419217 | 7.15047 |
| 5 | 2 | 2 | 0.556965 | 3.49719 | 5.20911 | 3.91478 | 25.8002 |
| 6 | 2 | 3 | 0.556965 | 5.83108 | 0.566272 | 0.0830876 | 14.5937 |
| 7 | 3 | 1 | -1.99704 | 2.92286 | 2.19069 | -3.56224 | 0.874304 |
| 8 | 3 | 2 | -1.99704 | 1.52934 | 1.56495 | 1.8718 | 6.79833 |
| 9 | 3 | 3 | -1.99704 | 2.87007 | 2.64036 | 2.2615 | 13.0043 |
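
The cell above assembles its table by joining on the `x1` column. As an alternative sketch, the columns can be collected directly, which avoids matching rows on floating-point values. The function name `draw_panel`, the $N(0,1)$ regressors, and the seed are assumptions for illustration; the parameter values mirror the call above.

```julia
using Random, LinearAlgebra

# Hypothetical generator for the RE panel model; returns a named tuple of columns.
function draw_panel(rng; β, σ², σₐ², N, T)
    i = repeat(1:N, inner = T)                          # individual index
    t = repeat(1:T, outer = N)                          # time index
    α = repeat(sqrt(σₐ²) .* randn(rng, N), inner = T)   # αᵢ, constant within i
    x = randn(rng, N * T, length(β))                    # x_it ~ N(0, 1) (an assumption)
    y = α .+ x * β .+ sqrt(σ²) .* randn(rng, N * T)     # y_it = αᵢ + x_it'β + ϵ_it
    xcols = (; (Symbol("x", k) => x[:, k] for k in axes(x, 2))...)
    return (; i, t, y, α, xcols...)
end

panel = draw_panel(Xoshiro(7); β = [2.0, 2.0, 2.0], σ² = 3.0, σₐ² = 1.0, N = 3, T = 3)
println(keys(panel))
```

Because the result is a named tuple of equal-length vectors, passing it to `DataFrame(panel)` from DataFrames.jl should give the same tabular view as the output above.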


Control Flow: https://docs.julialang.org/en/v1/manual/control-flow/ 
 - In the expression `a && b`, the subexpression `b` is only evaluated if `a` evaluates to `true`.
 - In the expression `a || b`, the subexpression `b` is only evaluated if `a` evaluates to `false`.
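
The two rules above are what make the one-line argument checks in these cells work. Here is a small sketch; the function `check_beta` and its messages are hypothetical and only serve to show which side gets evaluated.

```julia
# Hypothetical validator illustrating short-circuit evaluation.
function check_beta(β, nofX)
    # `||`: the right-hand side runs only when the left-hand side is false.
    length(β) == nofX || throw(ArgumentError("length(β) must equal nofX"))
    # `&&`: the right-hand side runs only when the left-hand side is true.
    nofX > 1 && println("model has ", nofX, " regressors")
    return true
end

check_beta([0.5, 0.5], 2)        # passes the check and prints a message
try
    check_beta([0.5], 2)         # length mismatch, so the throw is evaluated
catch err
    println("caught: ", err.msg)
end
```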

In [54]:
# 曾政夫

struct pair
    x
    y
end

function f(beta::Vector,var,var_a,nofX,N,T)
    
    length(beta) == nofX || throw("The length of beta should equal nofX.")
    typeof(beta) == Vector{Float64} || throw("The type of beta should be Vector{Float64}")
    
    epsilon=rand(Normal(0,sqrt(var)),(N*T,1))
    x=randn(N*T,nofX)
    a=rand(Normal(0,sqrt(var_a)),N)
    alpha=repeat(a ,inner=T)
    y=x*beta.+alpha.+epsilon
    return pair(x, y)
end

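β = [2]            # a Vector{Int64}: passing this to f would fail the Vector{Float64} check
β = [2.0, 3.0]     # a Vector{Float64}: this is the value actually used below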
data = f(β, 4, 5, length(β), 3, 4)
data.x |> display
data.y |> display

12×2 Matrix{Float64}:
 -0.445132   -0.640003
 -0.160366   -0.400925
 -0.0438909   0.3287
  0.118579   -0.653552
  1.16653     0.0509771
  0.676893    0.00435148
  2.33208     0.259782
  0.317683    1.79712
 -0.624953   -1.91713
  0.372931   -0.424095
  0.278377    0.688705
 -0.121375   -1.29128

12×1 Matrix{Float64}:
 -3.1261535339024156
 -0.27588132136541776
  3.5495468116783453
  0.5938085549447303
 -0.1893032161343995
  1.5838936437356477
  6.123655439918991
  2.796525467366466
 -9.604030421235173
 -1.1080666549001703
  2.2673847330490524
 -1.4940480000357397

In [None]:
# Note: beta must be a vector, not a Real, with length(beta) == nofX.
# (In the stacked layout used in the other cells, X would be an (N*T) × nofX matrix.)
function Panel_Data(; beta::AbstractVector{<:Real}, sigma_squared::Real, sigma_squared_alpha::Real, nofX::Int, N::Int, T::Int)
    length(beta) == nofX || throw(ArgumentError("The length of beta should equal nofX."))
    sigma = sqrt(sigma_squared)
    sigma_alpha = sqrt(sigma_squared_alpha)

    alpha = rand(Normal(0, sigma_alpha), N)  # or alpha = randn(N) .* sigma_alpha

    alpha_tiled = repeat(alpha, outer=[1, T])  # yields an N × T matrix

    X = randn(N, T, nofX)  # N × T × nofX array

    errors = randn(N, T) .* sigma

    # Sum over the regressors so that Y stays N × T.
    # A scalar `beta .* X` would instead broadcast Y to N × T × nofX.
    Y = alpha_tiled .+ sum(X[:, :, k] * beta[k] for k in axes(X, 3)) .+ errors

    return Y, X
end
Y, X = Panel_Data(beta=[1.0, 1.0, 1.0], sigma_squared=1.0, sigma_squared_alpha=0.5, nofX=3, N=10, T=5);
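
The shape issue flagged in this cell can be seen in isolation. In the sketch below (array contents are placeholders chosen for illustration), broadcasting an $N \times T$ matrix against an $N \times T \times \textrm{nofX}$ array yields a three-dimensional result, whereas weighting each regressor slice by its own coefficient and summing keeps the result $N \times T$.

```julia
N, T, nofX = 10, 5, 3
alpha_tiled = zeros(N, T)        # placeholder for the tiled individual effects
X = ones(N, T, nofX)             # placeholder regressors
beta = [1.0, 2.0, 3.0]           # one coefficient per regressor

# Scalar coefficient: broadcasting N×T against N×T×nofX gives a 3-D array.
Y_scalar = alpha_tiled .+ 1.0 .* X
println(size(Y_scalar))          # (10, 5, 3)

# One coefficient per regressor, summed over the third dimension: N×T.
Y_vector = alpha_tiled .+ sum(X[:, :, k] * beta[k] for k in axes(X, 3))
println(size(Y_vector))          # (10, 5)
```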

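### Julia syntax
- Wrap your code in functions whenever possible.
- Give functions and variables descriptive names (e.g., `cdf(x)`, `draw_data(x)` rather than `f(x)`, `g(x)`).
- Name structs in CamelCase, with words separated by capital letters: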
```julia
struct PanelModel
    data
    index
end
```