# Homework 2

## Draw Random Numbers from Uniform Distributions


### Write code to draw a set of 10,000 random numbers that are uniformly distributed on (-2,3).
 - Hint: Stretch $U(0,1)$ to fit the bound of $U(-2,3)$.

In [1]:
using Random

Random.seed!(123)

vec1= rand(10000)  # draw 10000 random numbers from U(0,1)

vec2= vec1*5       # U(0,5)

vec3= vec2 .- 2    # U(-2,3)


10000-element Vector{Float64}:
  2.531498193987405
  0.21746866229802286
  1.7283669056969702
  0.5604152001830718
 -0.7307545552924521
 -0.3292423180904058
  0.1366394043679957
  2.33773600127979
 -1.5043319257819792
 -1.3735629615422482
  1.4610433102736953
 -1.3172426243127133
 -1.8395166332362638
  ⋮
 -1.2926166268116366
  2.4532212744055286
  2.534988425824621
  0.6848462508464195
  0.012016257922934592
  0.7141451488643304
  0.047574664567274105
  0.6425886323990513
  0.5171579955244585
 -1.2438671341528074
  2.9380322389357154
  2.2763546577676426

### What are the theoretical mean and standard deviation of the distribution $U(-2,3)$ ?

$ \text{The theoretical mean of } U(a,b) \text{ is }\frac{a+b}{2}. $  
$ \text{The theoretical standard deviation of } U(a,b) \text{ is } \sqrt{\frac{(b-a)^2}{12}}. $  
$ \text{Hence, the theoretical mean of } U(-2,3) \text{ is } \frac{-2+3}{2}=0.5 \text{ and the standard deviation of } U(-2,3) \text{ is } \sqrt{\frac{(3-(-2))^2}{12}} \approx 1.443.$
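
The two formulas above can be evaluated numerically as a quick sanity check. This is only a minimal sketch; the names `a`, `b`, `theo_mean`, and `theo_std` are introduced here for illustration and do not appear elsewhere in the notebook.

```julia
# Theoretical moments of U(a, b), evaluated at a = -2, b = 3
a, b = -2, 3

theo_mean = (a + b) / 2            # mean of U(a, b): 0.5
theo_std  = sqrt((b - a)^2 / 12)   # standard deviation of U(a, b): about 1.443

@show theo_mean
@show theo_std
```

These values can then be compared directly with `mean()` and `std()` of the simulated draws.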


### Show the mean and the standard deviation of the set of random numbers you've drawn. 

In [2]:
using Statistics

@show mean(vec3)   
@show std(vec3)

mean(vec3) = 0.5052948303444671
std(vec3) = 1.440600975309783


1.440600975309783

## Draw Random Numbers from Normal Distributions


### Use `randn()` to draw a $1000\times 2$ matrix of random numbers from $N(2,3)$ (a normal distribution with mean=2 and variance=3). Use `mean()` and `var()` to show that the empirical mean and variance are close to the theoretical values.
- Hint: `randn()` generates N(0,1) random variables; you have to scale it to the appropriate mean and variance.
- Hint: For constants `a` and `b`: if $x \sim N(\mu, \sigma^2)$, then $a*x \sim N(a*\mu, \ a^2 \sigma^2)$ and $x+b \sim N(\mu+b, \ \sigma^2)$.

In [9]:
using Random, Statistics
Random.seed!(123)

vec1= randn(1000,2)

vec2= vec1 * sqrt(3)

vec3= vec2 .+ 2

@show mean(vec3)
@show var(vec3)

mean(vec3) = 1.9549979184134927
var(vec3) = 2.973121950912073


2.973121950912073

### Use `rand()` to draw a $1000\times 2$ matrix of random numbers from $N(2,3)$. Use `mean()` and `var()` to show that the empirical mean and variance are close to the theoretical values.
- Hint: `rand()` can take a distribution as an argument, as shown in class.

In [6]:
using Distributions

d= Normal(2, sqrt(3))



Normal{Float64}(μ=2.0, σ=1.7320508075688772)

In [8]:
using Random

mat = rand(d, 1000, 2)  # 1000x2

@show mean(mat)
@show var(mat)

mean(mat) = 1.965122969076173
var(mat) = 2.968564501198857


2.968564501198857
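
`mean()` and `var()` applied to a matrix pool all 2,000 entries. As a possible extra check, the moments can also be computed column by column with the `dims` keyword. The sketch below rebuilds the draws with `randn()` only, so it does not depend on `Distributions`; the names `col_means` and `col_vars` are illustrative.

```julia
using Random, Statistics

Random.seed!(123)

# N(2, 3) draws via scale and shift of N(0, 1): multiply by sqrt(3), then add 2
mat = 2 .+ sqrt(3) .* randn(1000, 2)

col_means = mean(mat, dims=1)   # 1×2 matrix: one mean per column
col_vars  = var(mat, dims=1)    # 1×2 matrix: one variance per column

@show col_means
@show col_vars
```

Each column should give a mean near 2 and a variance near 3, up to sampling noise.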

## Draw Regression Data: Cross-Sectional Model

Suppose you write your own routine to do fancy estimation on cross-sectional and panel data models. You want to conduct a Monte Carlo analysis to see if the routine works as expected and returns the correct answer. The first thing you have to do is generate data with pre-specified parameter values, so that you can apply your estimation routine to the data and check whether the estimated parameter values match the pre-specified (*true*) values.

Let's start with the cross-sectional model. The model is:
\begin{aligned} 
  y_i & = \alpha + \beta' x_i + \epsilon_i,\qquad i=1,\ldots,N,\\
  \epsilon_i & \sim N(0, \sigma^2).
\end{aligned}   

There could be more than one $x_i$ variable in the model; let's denote the number of $x_i$ variables as $\textrm{nofX}$. Write a function to generate data $\{y_i, x_i\}$. The function should allow users to choose the values of $\{\alpha, \beta, \sigma^2, \textrm{nofX}, N\}$.
  - Hint: The $x_i$s are assumed (in econometrics) to be fixed and exogenous, and therefore the distribution from which they are generated is inconsequential. (It is fine if the previous sentence is unclear; the important part is the next one.) You may assume that they are generated from normal distributions.

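$ \text{In matrix form, the model is } Y=X\beta+ \epsilon.$  

$Y$ is an $N \times 1$ vector, where $N$ is the number of observations.  
$X$ is an $N \times (\textrm{nofX}+1)$ matrix, where nofX is the number of independent variables; the extra column of ones accounts for the intercept.  
$\beta$ is a $(\textrm{nofX}+1) \times 1$ vector (the intercept $\alpha$ stacked on top of the slope coefficients).  
$\epsilon$ is an $N \times 1$ vector.  
The function below adds $\alpha$ separately instead, so the `X` it returns has only nofX columns.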
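
The matrix form and the scalar form describe the same model. A small sketch of that equivalence follows, assuming $\sigma^2 = 1$ and small illustrative dimensions; the names `X_full`, `coefs`, `Y_matrix`, and `Y_scalar` are made up for this example.

```julia
using Random, LinearAlgebra

Random.seed!(123)

N, nofX = 5, 2
alpha, beta = 1.0, [2.0, 3.0]

X      = randn(N, nofX)       # regressors only: N × nofX
X_full = hcat(ones(N), X)     # prepend a column of ones: N × (nofX + 1)
coefs  = vcat(alpha, beta)    # intercept stacked on the slopes: (nofX + 1) × 1
errors = randn(N)             # epsilon with sigma^2 = 1 (assumed for illustration)

Y_matrix = X_full * coefs + errors        # Y = Xβ + ε, intercept inside X
Y_scalar = alpha .+ X * beta .+ errors    # same model, α added separately

@show maximum(abs.(Y_matrix .- Y_scalar))   # zero up to floating-point rounding
```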
 

In [25]:
using Random, Distributions

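function Regression_Data(; alpha::Real, beta::Vector, sigma_squared::Real, nofX::Int, N::Int)
    # beta must hold one coefficient per regressor
    length(beta) == nofX || throw(DimensionMismatch("length(beta) must equal nofX"))

    sigma = sqrt(sigma_squared)          # Normal() takes the standard deviation
    X = randn(N, nofX)                   # N x nofX matrix of regressors (no intercept column)
    errors = rand(Normal(0, sigma), N)   # epsilon_i ~ N(0, sigma^2)

    Y = alpha .+ X*beta .+ errors        # y_i = alpha + beta' x_i + epsilon_i

    return Y, X
end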

Regression_Data (generic function with 1 method)

In [26]:
Y, X=Regression_Data(alpha=1,beta=[2,3,4], sigma_squared=1, nofX=3, N=100)


([6.953153023183896, -3.6070280198786517, -5.272297467650319, 1.3155458901269552, -2.2308705616729787, 3.406181151450773, -2.917292531825376, 0.3663025189244533, 2.755629773939812, 0.16423261073184414  …  -3.530772274732671, -8.723874463124904, 1.8812353695133266, 9.397518233956394, -2.242611329627276, 2.5388480412233703, 0.31266930626090617, 7.553207859959147, -5.593196304860311, 1.7875849291584642], [-0.18706368588768155 1.5070202282526264 0.037660303141450205; -1.3259652642501223 -0.2389057665035784 -0.27724441542203515; … ; -0.5116699658900369 -1.3549358359671198 -0.31908842821495287; 0.6380873241077928 -0.8919591420699102 0.4656869218881949])
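
One way to see the Monte Carlo idea in action is to run ordinary least squares on simulated data and compare the estimates with the true values. The sketch below is not part of the assignment: `simulate_cross_section` is a hypothetical helper that mirrors the data-generating process above using `randn()` only, and OLS stands in for whatever estimation routine is being tested.

```julia
using Random, LinearAlgebra

# Hypothetical helper mirroring the data-generating process above,
# written with randn() only so the sketch has no package dependencies.
function simulate_cross_section(alpha, beta, sigma_squared, N)
    X = randn(N, length(beta))
    Y = alpha .+ X * beta .+ sqrt(sigma_squared) .* randn(N)
    return Y, X
end

Random.seed!(123)
Y, X = simulate_cross_section(1.0, [2.0, 3.0, 4.0], 1.0, 10_000)

# OLS by least squares: regress Y on a constant and X
coef_hat = hcat(ones(length(Y)), X) \ Y

@show coef_hat   # should be close to the true values [1, 2, 3, 4]
```

With a large `N`, the estimates should land close to the pre-specified parameters; a routine that fails this check on data it should fit well is likely buggy.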

## Draw Regression Data: Panel Model

Suppose you also want to generate panel data to test your routine. The model is
\begin{aligned}
    y_{it} & = \alpha_i + \beta x_{it} + \epsilon_{it},\qquad i=1,\ldots,N,\ t=1,\ldots,T,\\
   \epsilon_{it} & \sim N(0, \sigma^2).
\end{aligned}   

Here, $i$ is the individual index and $t$ is the time index. For instance, $w_{13}$ means the value of $w$ for the 1st individual in the 3rd time period. Assume the above model is the random-effects (RE) panel data model, where $\alpha_i \sim N(0,\sigma_a^2)$ is a random variable that is distributed independently of $x_{it}$. Write a function to generate data $\{y_{it}, x_{it}\}$ with the options $\{\beta, \sigma^2, \sigma_a^2, \textrm{nofX}, N, T\}$.

  - Hint: Draw $\alpha_i$ and expand it (`repeat()`) to fill the time periods. Generate $x_{it}$ and $\epsilon_{it}$. Then combine these elements according to the equation to create $y_{it}$.
  - Hint: You don't really need to understand what an RE model is to generate the data. Just follow the notation and it should be ok.
  - Hint: `repeat()` would be useful here.
  - Hint: The structure of the dataset should look like the following. Note that $\alpha_i$ is constant within a given $i$ but changes across different $i$'s.


|	i	|	t	|	y_it	|	alpha_i	|	x_it	|
| ---	| ---	|	--- 	| ---		|	---     |
|	1	|	1	|	0.173 	|	0.12	|	0.183 	|
|	1	|	2	|	0.372 	|	0.12	|	0.804 	|
|	1	|	3	|	0.239 	|	0.12	|	0.072 	|
|	1	|	4	|	0.791 	|	0.12	|	0.272 	|
|	2	|	1	|	0.443 	|	-0.45	|	0.705 	|
|	2	|	2	|	0.825 	|	-0.45	|	0.619 	|
|	2	|	3	|	0.681 	|	-0.45	|	0.769 	|
|	2	|	4	|	0.694 	|	-0.45	|	0.575 	|
|	3	|	1	|	0.192 	|	1.29	|	0.067 	|
|	3	|	2	|	0.072 	|	1.29	|	0.553 	|
|	3	|	3	|	0.522 	|	1.29	|	0.280 	|
|	3	|	4	|	0.021 	|	1.29	|	0.306 	|
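
The long-format layout in the table can be reproduced with `repeat()`. The following is a minimal sketch for $N=3$ and $T=4$, reusing the `alpha_i` values from the table; the names `i_index`, `t_index`, `alpha_long`, and `layout` are illustrative.

```julia
N, T = 3, 4

alpha = [0.12, -0.45, 1.29]           # illustrative alpha_i values from the table above

i_index    = repeat(1:N, inner=T)     # 1,1,1,1,2,2,2,2,3,3,3,3
t_index    = repeat(1:T, outer=N)     # 1,2,3,4,1,2,3,4,1,2,3,4
alpha_long = repeat(alpha, inner=T)   # each alpha_i repeated T times

# Stack into an (N*T) × 3 matrix: columns are i, t, alpha_i
layout = hcat(i_index, t_index, alpha_long)
```

`inner=T` keeps the rows of one individual together, while `outer=N` cycles through the time periods once per individual.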







In [100]:
alpha=[1,2,3]
repeat(alpha,3)

9-element Vector{Int64}:
 1
 2
 3
 1
 2
 3
 1
 2
 3

In [101]:
repeat(alpha, inner=3)

9-element Vector{Int64}:
 1
 1
 1
 2
 2
 2
 3
 3
 3

In [102]:
repeat(alpha, outer=[1,3]) # outer=[row, col]: repeat 1 time along rows and 3 times along columns

3×3 Matrix{Int64}:
 1  1  1
 2  2  2
 3  3  3

In [108]:
repeat(alpha, outer=[2,3]) # repeat 2 times along rows and 3 times along columns

6×3 Matrix{Int64}:
 1  1  1
 2  2  2
 3  3  3
 1  1  1
 2  2  2
 3  3  3

In [22]:
using Random, Distributions



function Panel_Data(; beta::Vector, sigma_squared::Real, sigma_squared_alpha::Real, nofX::Int, N::Int, T::Int)
    sigma = sqrt(sigma_squared)
    sigma_alpha = sqrt(sigma_squared_alpha)

    alpha = rand(Normal(0, sigma_alpha), N)   # one alpha_i per individual; or randn(N) .* sigma_alpha
    alpha_tiled = repeat(alpha, inner=T)      # repeat each alpha_i T times: length N*T

    X = randn(N*T, nofX)                      # NT x nofX matrix of regressors
    errors = randn(N*T) .* sigma              # epsilon_it ~ N(0, sigma^2)

    Y = alpha_tiled .+ X*beta .+ errors       # y_it = alpha_i + beta' x_it + epsilon_it

    return Y, X
end



Panel_Data (generic function with 1 method)

In [23]:
beta=[1,2]
Panel_Data(beta=beta, sigma_squared=1.0, sigma_squared_alpha=0.5, nofX=length(beta), N=3, T=4)

([-3.142348803733829, -2.8792339189708467, -4.17257687596944, 3.90615371638058, -3.2085122891928686, 1.9065822767642786, -1.0016124906171315, 4.635482372071168, 3.8050669242985204, 3.3187819275175863, 1.6358431544614145, 3.059907343135486], [-0.6513083078833882 -0.696970340868155; 0.25420445585623486 -1.2025644570738383; … ; 0.11970663933230476 0.8683073105941759; 0.2076172841891845 0.8579112719186806])
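
As a rough check of the generated panel, pooled OLS can be applied to a larger simulated sample. Because $\alpha_i$ is independent of $x_{it}$ in the RE model, pooled OLS should recover $\beta$, and the residual variance should be near $\sigma_a^2 + \sigma^2$. This is only a sketch: `simulate_panel` is a hypothetical helper mirroring the process above with `randn()` only, and the sample sizes are chosen arbitrarily.

```julia
using Random, Statistics, LinearAlgebra

# Hypothetical helper mirroring the random-effects process above (randn() only).
function simulate_panel(beta, sigma_squared, sigma_squared_alpha, N, T)
    alpha_long = repeat(sqrt(sigma_squared_alpha) .* randn(N), inner=T)
    X = randn(N * T, length(beta))
    Y = alpha_long .+ X * beta .+ sqrt(sigma_squared) .* randn(N * T)
    return Y, X
end

Random.seed!(123)
N, T = 2_000, 5
Y, X = simulate_panel([1.0, 2.0], 1.0, 0.5, N, T)

# Pooled OLS with a constant term
X_full = hcat(ones(N * T), X)
coef_hat = X_full \ Y

# Composite error alpha_i + epsilon_it has variance 0.5 + 1.0 = 1.5
resid_var = var(Y .- X_full * coef_hat)

@show coef_hat    # should be close to [0, 1, 2]
@show resid_var   # should be close to 1.5
```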