This is just to test the basic functions of the Julia language. I will start a project where I practice basic linear algebra and then doing maximization techniques and doing some basic econometrics. I will practice a little bit with the dataset. The idea is the following: 
# OLS from Scratch in Julia

## Problem Setup

We want to simulate data from the following linear model:

\[
y_i = 1 + 0.5x_{1i} - 0.3x_{2i} + \varepsilon_i, \quad i = 1, \dots, 100
\]

where:

- \(x_{1i}, x_{2i} \sim N(0,1)\) (independent standard normal draws)  
- \(\varepsilon_i \sim N(0, 0.2^2)\) (random noise)  

## Tasks

1. **Simulate data**  
   - Generate \(x_{1i}\) and \(x_{2i}\).  
   - Construct the design matrix \(X\) with a column of ones (for the intercept) and the two regressors.  
   - Generate \(y\) using the true model.  

2. **Estimate parameters manually**  
   - Compute the OLS estimator:  
     \[
     \hat{\beta} = (X'X)^{-1} X'y
     \]  

3. **Compare with truth**  
   - Compare your estimates \(\hat{\beta}\) with the true coefficients \([1, 0.5, -0.3]\).  

4. **Visualization**  
   - Plot predicted values \(\hat{y}\) against the actual values \(y\).  
   - Add a 45° line for reference.  

---


In [9]:
#First we simply generate the random variables that we will need for our study. We also generate the true dgp. 

n,k= 100, 3 ;  #define the sample size and the number of parameters that we have. 

X=[ones(n) rand(n,k-1)] ; #define the main variables. But this should be a matrix n*k
epsilon= rand(n)*0.2 ; #n*1 vector

true_beta=[1, 0.5, 0.3] ; #k*1 vector

y_i=  X*true_beta+ epsilon ; #true dgp n*1



In [10]:
#Now we generate the OLS coefficients in a simple and elegant way. 



beta_hat= inv(X'X)*X'y_i

3-element Vector{Float64}:
 1.0915610303674592
 0.4943292001115156
 0.3277593780762066

In [11]:
using DataFrames, GLM, Plots
df = DataFrame(X1 = X[:,2], X2 = X[:,3], y = y_i)
ols_model = lm(@formula(y ~ X1 + X2), df)

[33m[1m│ [22m[39mThis may mean DataStructures [864edb3b-99cc-5e75-8d2d-829cb0a9cfe8] does not support precompilation but is imported by a module that does.
[33m[1m└ [22m[39m[90m@ Base loading.jl:2541[39m
[33m[1m│ [22m[39mThis may mean DataStructures [864edb3b-99cc-5e75-8d2d-829cb0a9cfe8] does not support precompilation but is imported by a module that does.
[33m[1m└ [22m[39m[90m@ Base loading.jl:2541[39m
[33m[1m│ [22m[39mThis may mean DataStructures [864edb3b-99cc-5e75-8d2d-829cb0a9cfe8] does not support precompilation but is imported by a module that does.
[33m[1m└ [22m[39m[90m@ Base loading.jl:2541[39m
[33m[1m│ [22m[39mThis may mean DataStructures [864edb3b-99cc-5e75-8d2d-829cb0a9cfe8] does not support precompilation but is imported by a module that does.
[33m[1m└ [22m[39m[90m@ Base loading.jl:2541[39m
[33m[1m│ [22m[39mThis may mean StatsBase [2913bbd2-ae8a-5f71-8c99-4fb6c76f3a91] does not support precompilation but is imported by a module 

StatsModels.TableRegressionModel{LinearModel{GLM.LmResp{Vector{Float64}}, GLM.DensePredChol{Float64, CholeskyPivoted{Float64, Matrix{Float64}, Vector{Int64}}}}, Matrix{Float64}}

y ~ 1 + X1 + X2

Coefficients:
────────────────────────────────────────────────────────────────────────
                Coef.  Std. Error      t  Pr(>|t|)  Lower 95%  Upper 95%
────────────────────────────────────────────────────────────────────────
(Intercept)  1.09156    0.0154326  70.73    <1e-84   1.06093    1.12219
X1           0.494329   0.0208574  23.70    <1e-41   0.452933   0.535725
X2           0.327759   0.0193155  16.97    <1e-30   0.289423   0.366095
────────────────────────────────────────────────────────────────────────

In [12]:
#now suppose we want to compute the standard errors of this simple model. 
using LinearAlgebra

epsilon_hat= y_i - X*beta_hat
sigma_sqrt= (epsilon_hat'*epsilon_hat)/(n-k)
as_covar= sigma_sqrt*inv(X'X) 
se= sqrt.(diag(as_covar))

#assuming heteroskedasticity

3-element Vector{Float64}:
 0.015432617116023863
 0.020857415406009237
 0.019315514672206074

Here are the notes from the Julia class volume 2. This is using the Gauss Quadrature package.

In [None]:
using FastGaussQuadrature

n=2 

z,w= gausslegendre(n)
f=0.5*exp.((0.5)*(1.0.+z)) ##
sum(w.*f)

##we approximate quite well the function. 

1.7182810043725218

In [3]:
exp(1)-1

1.718281828459045

In [None]:
##Add a package for plots 
using FastGaussQuadrature 

#change of variables!! 
n=20
p,w=gausslegendre(n)


D(p)= (10-(p/2))*(p<=20) ;
S(p)=sum((D.(0.5*p*(1.0.+z))*p/2).*w)

using PyPlot
P=range(0.30, 100)
Plot(P,S.(P))

In [3]:
##Example 3
using FastGaussQuadrature 

n=11
z,w=gausshermite(n)
μ=0.1
σ=0.5

[sum(w.*exp.(sqrt(2)*σ*z.+μ)/sqrt(n))-exp(μ+σ^2/2)]


1-element Vector{Float64}:
 -0.5830627422130561

*Montecarlo approximations* 

More flexible and allows to approximate less smooth functions...(more intense computationally though...)


In [2]:
N=100
x=rand(N)

[sum((exp.(x)+exp.(1.0.-x)))/(2*N), 
    exp(1)-1, 
    sum((exp.(x)+exp.(1.0.-x)))/(2*N)-(exp(1)-1) ]

##We need a lot of points to actually approximate the function we need. This is super inefficient and takes a lot of time. This is done when there is no alternative (i.e compute the posterior distribution in Bayesian econometrics). 

3-element Vector{Float64}:
  1.7177801991658965
  1.718281828459045
 -0.0005016292931485644

Try quasimontecarlo with the previous case inefficiency. Pending

*4 _Solving Nonlinear equations_*


**Fixed Point** Problems of divergence $\rightarrow$ impose restrictions (check theorem) 

In [None]:
##Code by myself the iteration!

In [11]:
f(x)= exp((x-2)^2)-1
x0=0.95

crit=1
epsilon=1e-6
lambda=0.95

while crit>epsilon
    x1=f(x0)
    crit= abs((x1-x0/x0))
    x0=lambda*x0+ (1-lambda)*x1
    println([x0 crit])
end 

#it does not converge


[1.0030842917426683 1.011685834853365]
[1.03800963669833 0.7015911908559018]
[1.0622562268835287 0.5229414404023056]
[1.0796116844181414 0.4093653775757815]
[1.0922763944468838 0.3329058849929889]
[1.1016382612444189 0.2795137303975861]
[1.1086210970439427 0.2412949772348969]
[1.1138629809383254 0.21345877493159726]
[1.1178163612075809 0.1929305863234383]
[1.120808223766382 0.177653612383597]
[1.1230782132150379 0.16620801273950025]
[1.124803798464021 0.1575899181947018]
[1.1261174310322637 0.1510764498288757]
[1.1271185449527412 0.1461397094418131]
[1.1278821183782406 0.1423900134627294]
[1.1284648783944944 0.13953731870331954]
[1.128909853082485 0.13736437215430852]
[1.1292497429607844 0.1357076506484729]
[1.129509436486906 0.13444361348321587]
[1.129707897653282 0.1334786598144233]
[1.1298595886553973 0.13274171769559207]
[1.129975545784729 0.13217873124203106]
[1.1300641951945007 0.1317485339801654]
[1.1301319726694994 0.13141974469447248]
[1.1301837952129812 0.13116842353913194]
[

Excessive output truncated after 524297 bytes.

]
[1.1303521899850584 0.13035218998506082]
[1.1303521899850584 0.13035218998506082]
[1.1303521899850584 0.13035218998506082]
[1.1303521899850584 0.13035218998506082]
[1.1303521899850584 0.13035218998506082]
[1.1303521899850584 0.13035218998506082]
[1.1303521899850584 0.13035218998506082]
[1.1303521899850584 0.13035218998506082]
[1.1303521899850584 0.13035218998506082]
[1.1303521899850584 0.13035218998506082]
[1.1303521899850584 0.13035218998506082]
[1.1303521899850584 0.13035218998506082]
[1.1303521899850584 0.13035218998506082]
[1.1303521899850584 0.13035218998506082]
[1.1303521899850584 0.13035218998506082]
[1.1303521899850584 0.13035218998506082]
[1.1303521899850584 0.13035218998506082]
[1.1303521899850584 0.13035218998506082]
[1.1303521899850584 0.13035218998506082]
[1.1303521899850584 0.13035218998506082]
[1.1303521899850584 0.13035218998506082]
[1.1303521899850584 0.13035218998506082]
[1.1303521899850584 0.13035218998506082]
[1.1303521899850584 0.13035218998506082]
[1.13035218998

InterruptException: InterruptException:

In [9]:
using PyPlot
x=range(0,4,100)
plot(x, f.(x))
plot(x,x)

1-element Vector{PyCall.PyObject}:
 PyObject <matplotlib.lines.Line2D object at 0x153a5bbc0>