Here, I use three methods to estimate $\beta$ in the following fixed effects model
$$
y_{it} = \beta x_{it} + c_i + u_{it}\quad i = 1, \cdots, N; t = 1, \cdots, T,
$$
where $X_i = (x_{i1}, \cdots, x_{iT})'$ and the $c_i$ are unobserved (omitted) individual effects with
$$
E(c_i|X_i) = h(X_i) \equiv \alpha_i.
$$

The first method estimates the least squares dummy variable (LSDV) model
$$
y_{it} = \beta x_{it} + \alpha_i + \varepsilon_{it},
$$
where
$$
\varepsilon_{it} = u_{it} + v_i \equiv u_{it} + (c_i - \alpha_i).
$$
In matrix form, it can be written as
$$
y = Z \begin{pmatrix} \beta\\ \alpha \end{pmatrix} + \varepsilon \equiv
\begin{pmatrix} X & D \end{pmatrix}\begin{pmatrix} \beta\\ \alpha \end{pmatrix} + \varepsilon,
$$
where
$$
D = I_N \otimes 1_T = \begin{pmatrix} 1_T & 0 & \cdots & 0\\
0 & 1_T & \cdots & 0\\
\vdots & \vdots & \ddots & \vdots\\
0 & 0 & \cdots & 1_T \end{pmatrix}.
$$
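A quick numerical check of this construction, as a minimal sketch with small illustrative sizes ($N = 3$, $T = 4$): `np.kron` builds $D$, its columns are entity indicators so that $D'D = T I_N$, and $D'x/T$ returns the entity means $\bar{x}_{i\cdot}$ when the data are stacked entity by entity.

```python
import numpy as np

N, T = 3, 4                                   # small illustrative sizes
D = np.kron(np.eye(N), np.ones((T, 1)))       # I_N ⊗ 1_T, shape (N*T, N)
assert D.shape == (N * T, N)

# Columns are entity indicators, so D'D = T * I_N
assert np.allclose(D.T @ D, T * np.eye(N))

# D'x sums each entity's T observations; dividing by T gives xbar_i
x = np.arange(N * T, dtype=float)             # stacked as x_11..x_1T, ..., x_N1..x_NT
xbar = D.T @ x / T
print(xbar)                                   # [1.5 5.5 9.5]
```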
So, the estimator of $\beta$, $\hat{\beta}^{dummy}$, is the first element of
$$
(Z'Z)^{-1}Z'y.
$$

The second method is the entity-demeaned (within) approach, which yields exactly the same estimate of $\beta$ as the dummy variable approach and is given by
$$
(y_{it} - \bar{y}_{i\cdot}) = \beta (x_{it} - \bar{x}_{i\cdot}) + (u_{it} - \bar{u}_{i\cdot}),
$$
or in matrix form,
$$
(My) = (MX) \beta + (Mu),
$$
where
$$
M = I_{NT} - D(D'D)^{-1}D'.
$$
So, the estimator of $\beta$ is
$$
\hat{\beta}^{demeaned} = (X'MX)^{-1}(X'My).
$$
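Because $D'D = T I_N$, multiplying by $M$ simply subtracts each entity's mean, so the $NT \times NT$ matrix $M$ never has to be formed in practice. A minimal sketch, assuming a small simulated panel stored as an $(N, T)$ array (sizes, seed, and data are illustrative): it computes $\hat{\beta}^{demeaned}$ by row-wise demeaning, checks that this matches $My$ built explicitly, and compares it with the LSDV coefficient, which the Frisch-Waugh-Lovell theorem says must be identical.

```python
import numpy as np

rng = np.random.default_rng(0)                    # illustrative seed
N, T, beta = 20, 5, 3.0
x = 2.0 * rng.normal(size=(N, T))                 # x_it, one row per entity
c = x.mean(axis=1, keepdims=True) + rng.normal(size=(N, 1))   # c_i = xbar_i + v_i
y = beta * x + c + rng.normal(size=(N, T))        # y_it = beta x_it + c_i + u_it

# Entity-demeaned estimator: M y is y minus its entity mean
x_w = x - x.mean(axis=1, keepdims=True)
y_w = y - y.mean(axis=1, keepdims=True)
beta_demeaned = (x_w * y_w).sum() / (x_w ** 2).sum()

# Check against M built explicitly: M = I - D (D'D)^{-1} D' with D'D = T I_N
D = np.kron(np.eye(N), np.ones((T, 1)))
M = np.eye(N * T) - D @ D.T / T
assert np.allclose(M @ y.ravel(), y_w.ravel())

# LSDV: first coefficient of OLS of y on Z = [x, D]
Z = np.column_stack([x.ravel(), D])
beta_dummy = np.linalg.lstsq(Z, y.ravel(), rcond=None)[0][0]

print(beta_demeaned, beta_dummy)                  # equal up to rounding error
```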

The third method is to regress the following first-difference equation by OLS
$$
(y_{it} - y_{i,t-1}) = \beta (x_{it} - x_{i,t-1}) + (u_{it} - u_{i,t-1}),\quad i = 1, \cdots, N; t = 2, \cdots, T,
$$
or in matrix form,
$$
\Delta y = \Delta X\beta + \Delta u.
$$
So, the estimator of $\beta$ is
$$
\hat{\beta}^{diff} = (\Delta X'\Delta X)^{-1}(\Delta X'\Delta y).
$$
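A minimal sketch of the first-difference estimator for a panel stored as an $(N, T)$ array, using `np.diff` along the time axis so that differences are never taken across entities. It also checks a standard fact: when $T = 2$ the first-difference and entity-demeaned estimators are numerically identical, so any efficiency gap can only appear for $T > 2$. The helper names `beta_fd` and `beta_within` and the simulated data are illustrative.

```python
import numpy as np

def beta_fd(x, y):
    """First-difference OLS slope; x and y are (N, T) arrays with one row per entity."""
    dx = np.diff(x, axis=1)          # x_it - x_i,t-1 for t = 2, ..., T
    dy = np.diff(y, axis=1)
    return (dx * dy).sum() / (dx ** 2).sum()

def beta_within(x, y):
    """Entity-demeaned (within) OLS slope for the same (N, T) layout."""
    xw = x - x.mean(axis=1, keepdims=True)
    yw = y - y.mean(axis=1, keepdims=True)
    return (xw * yw).sum() / (xw ** 2).sum()

rng = np.random.default_rng(1)       # illustrative seed
N, beta = 50, 3.0
for T in (2, 5):
    x = 2.0 * rng.normal(size=(N, T))
    c = x.mean(axis=1, keepdims=True) + rng.normal(size=(N, 1))
    y = beta * x + c + rng.normal(size=(N, T))
    # identical when T = 2; generally different when T > 2
    print(T, beta_fd(x, y), beta_within(x, y))
```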

Under strict exogeneity, $E(u_{it}\mid X_i, c_i) = 0$, together with the usual rank conditions, it can be shown that
$$
\underset{N\to\infty}{\rm{plim}}\hat{\beta}^{dummy} =
\underset{N\to\infty}{\rm{plim}}\hat{\beta}^{demeaned} =
\underset{N\to\infty}{\rm{plim}}\hat{\beta}^{diff} = \beta.
$$
Furthermore, when the $u_{it}$ are homoskedastic and serially uncorrelated (as in the simulation below), we expect
$$
{\rm{Var}}\hat{\beta}^{dummy} = {\rm{Var}}\hat{\beta}^{demeaned} \le {\rm{Var}}\hat{\beta}^{diff},
$$
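where the equality holds because the dummy variable and entity-demeaned estimators are numerically identical (by the Frisch-Waugh-Lovell theorem), and the inequality holds because first differencing makes the errors $\Delta u_{it}$ negatively serially correlated (correlation $-1/2$), so OLS on the differenced equation is not efficient; for $T = 2$ the two estimators coincide.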
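The size of this gap can be sketched explicitly under assumptions matching the design below: $x_{it}$ iid with variance $\sigma_x^2$, $u_{it}$ iid with variance $\sigma_u^2$, and $X$ independent of $U$. Per entity, $E\sum_t (x_{it}-\bar{x}_{i\cdot})^2 = (T-1)\sigma_x^2$ and $E\big[\big(\sum_t (x_{it}-\bar{x}_{i\cdot})u_{it}\big)^2\big] = (T-1)\sigma_x^2\sigma_u^2$, while $E\sum_{t\ge 2}(\Delta x_{it})^2 = 2(T-1)\sigma_x^2$ and, because $\Delta x_{it}$ and $\Delta u_{it}$ have first-order autocovariances $-\sigma_x^2$ and $-\sigma_u^2$, $E\big[\big(\sum_{t\ge 2}\Delta x_{it}\Delta u_{it}\big)^2\big] = [4(T-1) + 2(T-2)]\sigma_x^2\sigma_u^2$. The large-$N$ sandwich formula then gives

$$
\begin{aligned}
{\rm{Avar}}\big(\sqrt{N}(\hat{\beta}^{demeaned}-\beta)\big) &= \frac{(T-1)\sigma_x^2\sigma_u^2}{\big((T-1)\sigma_x^2\big)^2} = \frac{\sigma_u^2}{(T-1)\sigma_x^2},\\
{\rm{Avar}}\big(\sqrt{N}(\hat{\beta}^{diff}-\beta)\big) &= \frac{(6T-8)\sigma_x^2\sigma_u^2}{\big(2(T-1)\sigma_x^2\big)^2} = \frac{(3T-4)\sigma_u^2}{2(T-1)^2\sigma_x^2},
\end{aligned}
$$

so the variance ratio is $(3T-4)/\big(2(T-1)\big)$: it equals $1$ when $T = 2$, $11/8$ when $T = 5$, and approaches $3/2$ as $T$ grows. With $\sigma_x^2 = 4$ and $\sigma_u^2 = 1$, the two asymptotic variances for $T = 5$ are $1/16$ and $11/128$.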

To examine this, I conduct the following Monte Carlo study.

First, I draw $2NT + N$ independent standard normal variates, denoted by $(r_1, \cdots, r_{2NT + N})$.

Next, I let
$$
\begin{cases}
x_{11} = 2 r_1,\\
\vdots\\
x_{1T} = 2 r_T,\\
\vdots\\
x_{N1} = 2 r_{(N-1)\times T+1},\\
\vdots\\
x_{NT} = 2 r_{N\times T},\\
u_{11} = r_{N\times T + 1},\\
\vdots\\
u_{1T} = r_{N\times T + T},\\
\vdots\\
u_{N1} = r_{N\times T + (N-1)\times T+1},\\
\vdots\\
u_{NT} = r_{N\times T + N\times T},\\
v_{1} = r_{N\times T + N\times T + 1},\\
\vdots\\
v_{N} = r_{N\times T + N\times T + N},
\end{cases}
$$
so that
$$
\begin{cases}
x_{11}, \cdots, x_{1T}, \cdots, x_{N1}, \cdots, x_{NT}\overset{iid}{\sim}\mathscr{N}(0,4),\\
u_{11}, \cdots, u_{1T}, \cdots, u_{N1}, \cdots, u_{NT}\overset{iid}{\sim}\mathscr{N}(0,1),\\
v_{1}, \cdots, v_{N}\overset{iid}{\sim}\mathscr{N}(0,1),\\
X, U, V\text{ are mutually independent.}
\end{cases}
$$
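The index bookkeeping above amounts to reading the draws in row-major (entity-major) order. A minimal sketch with illustrative sizes and seed: reshaping the first $NT$ draws to an $(N, T)$ array places $x_{it} = 2r_{(i-1)T+t}$ in row $i$, column $t$ (1-based in the text, 0-based in NumPy), and likewise for $u_{it}$.

```python
import numpy as np

rng = np.random.RandomState(0)          # illustrative seed
N, T = 4, 3
r = rng.normal(size=N*T + N*T + N)      # r_1, ..., r_{2NT+N} (0-based in NumPy)
x = 2 * r[:N*T].reshape(N, T)           # x[i-1, t-1] = x_it
u = r[N*T:2*N*T].reshape(N, T)          # u[i-1, t-1] = u_it
v = r[2*N*T:]                           # v[i-1] = v_i

# Spot-check the mapping x_it = 2 r_{(i-1)T + t} and u_it = r_{NT + (i-1)T + t}
for i in range(1, N + 1):
    for t in range(1, T + 1):
        assert x[i-1, t-1] == 2 * r[(i-1)*T + t - 1]
        assert u[i-1, t-1] == r[N*T + (i-1)*T + t - 1]
assert v.shape == (N,)
```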

For simplicity, I let $h(X_i) = \bar{x}_{i\cdot}$ and $\beta = 3$,
and thus
$$
\begin{cases}
\alpha = \begin{pmatrix} \bar{x}_{1\cdot}\\\vdots\\\bar{x}_{N\cdot}\end{pmatrix},\\
c = \alpha + V,\\
y = X\beta + Dc + U.
\end{cases}
$$

The Python code and the corresponding results are given as follows.

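```python
import numpy as np
import pandas as pd
from IPython.display import display  # built in inside Jupyter; imported for use as a script

random = np.random.RandomState(2202566)  # fixed seed for reproducibility
```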

```python
def compare(N, T=5, beta=3, n=200):
    """Monte Carlo comparison of the dummy, demeaned, and first-difference estimators."""
    types = ['dummy', 'demeaned', 'diff']
    betas = {regtype: [] for regtype in types}
    # Design matrices depend only on N and T, so build them once
    D = np.kron(np.eye(N), np.ones(T)).T         # D = I_N ⊗ 1_T, shape (N*T, N)
    M = np.eye(N*T) - D.dot(D.T) / T             # annihilator of D, since D'D = T I_N
    for _ in range(n):
        R = random.normal(size=N*T + N*T + N)
        X = 2 * R[:N*T]
        U = R[N*T:2*N*T]
        V = R[2*N*T:]
        C = D.T.dot(X) / T + V                   # c_i = xbar_i + v_i
        Y = D.dot(C) + beta * X + U
        # dummy variable approach: first coefficient of OLS on Z = [X, D]
        Z = np.column_stack([X, D])
        betas['dummy'].append(np.linalg.solve(Z.T.dot(Z), Z.T.dot(Y))[0])
        # entity-demeaned approach
        betas['demeaned'].append(X.dot(M).dot(Y) / X.dot(M).dot(X))
        # first-difference approach: differences within each entity, t = 2..T
        # (uses T instead of a hard-coded 5, so other panel lengths work too)
        dX = np.diff(X.reshape(N, T), axis=1).ravel()
        dY = np.diff(Y.reshape(N, T), axis=1).ravel()
        betas['diff'].append(dX.dot(dY) / dX.dot(dX))
    rows = {}
    for regtype in types:
        b = np.array(betas[regtype])
        rows[regtype] = {'mean': b.mean(),
                         'se': b.std(ddof=1) / np.sqrt(n),
                         'bias_median': np.median(b - beta),
                         'rmse': np.sqrt(((b - beta) ** 2).mean())}
    result = pd.DataFrame.from_dict(rows, orient='index')
    print('N = {0}, T = {1}, beta = {2}, n = {3}'.format(N, T, beta, n))
    display(result)
```

```python
compare(10)
```

```
N = 10, T = 5, beta = 3, n = 200
```

| method | mean | se | bias_median | rmse |
|---|---|---|---|---|
| dummy | 3.003532 | 0.005174 | 0.001966 | 0.073072 |
| demeaned | 3.003532 | 0.005174 | 0.001966 | 0.073072 |
| diff | 3.010612 | 0.006218 | 0.008483 | 0.088354 |


```python
compare(50)
```

```
N = 50, T = 5, beta = 3, n = 200
```

| method | mean | se | bias_median | rmse |
|---|---|---|---|---|
| dummy | 2.998543 | 0.002268 | -0.002575 | 0.032024 |
| demeaned | 2.998543 | 0.002268 | -0.002575 | 0.032024 |
| diff | 3.000123 | 0.002542 | -0.000586 | 0.035862 |


```python
compare(100)
```

```
N = 100, T = 5, beta = 3, n = 200
```

| method | mean | se | bias_median | rmse |
|---|---|---|---|---|
| dummy | 3.000071 | 0.001827 | 0.00292 | 0.025769 |
| demeaned | 3.000071 | 0.001827 | 0.00292 | 0.025769 |
| diff | 2.999335 | 0.001972 | 0.002338 | 0.027821 |


```python
compare(500)
```

```
N = 500, T = 5, beta = 3, n = 200
```

| method | mean | se | bias_median | rmse |
|---|---|---|---|---|
| dummy | 3.002037 | 0.000856 | 0.001905 | 0.012252 |
| demeaned | 3.002037 | 0.000856 | 0.001905 | 0.012252 |
| diff | 3.001232 | 0.001018 | 0.001507 | 0.014409 |


```python
compare(1000)
```

```
N = 1000, T = 5, beta = 3, n = 200
```

| method | mean | se | bias_median | rmse |
|---|---|---|---|---|
| dummy | 2.999709 | 0.000564 | -0.000263 | 0.007963 |
| demeaned | 2.999709 | 0.000564 | -0.000263 | 0.007963 |
| diff | 3.000644 | 0.000662 | 0.000628 | 0.009359 |


Here, the simulation is run $n = 200$ times, and in each replication all three methods are applied to the same simulated panel.

For each method, let $\hat{\beta}_1, \cdots, \hat{\beta}_n$ denote the $n$ simulated estimates.

To evaluate the three methods, I report four statistics, defined as follows:

mean of estimates (mean),
$$
\bar{\hat{\beta}} = \frac1n\sum_{i=1}^n\hat{\beta}_i,
$$
Monte Carlo standard error of the mean estimate (se),
$$
{\rm{se}}\left(\hat{\beta}\right) = \sqrt{\frac1{n(n-1)}\sum_{i=1}^n\left(\hat{\beta}_i-\bar{\hat{\beta}}\right)^2},
$$
median of the differences between estimate and estimand (bias_median),
$$
{\rm{Median}}\left(\hat{\beta}_i-\beta\right),
$$
and root mean square error of estimates (rmse),
$$
{\rm{RMSE}}\left(\hat{\beta}\right) = \sqrt{\frac1n\sum_{i=1}^n\left(\hat{\beta}_i-\beta\right)^2}.
$$
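For reference, a minimal sketch of the four statistics as a stand-alone function (the name `summarize` is illustrative), applied to a toy array of estimates rather than to simulation output.

```python
import numpy as np

def summarize(estimates, beta):
    """Return mean, se, bias_median and rmse of an array of simulated estimates."""
    b = np.asarray(estimates, dtype=float)
    n = b.size
    return {
        "mean": b.mean(),
        "se": b.std(ddof=1) / np.sqrt(n),         # sqrt(sum (b_i - bbar)^2 / (n(n-1)))
        "bias_median": np.median(b - beta),
        "rmse": np.sqrt(np.mean((b - beta) ** 2)),
    }

print(summarize([2.9, 3.0, 3.1, 3.2], beta=3.0))
```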

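In every case the mean estimate differs from $\beta = 3$ by at most about 0.01, and the RMSE falls roughly in proportion to $1/\sqrt{N}$, consistent with all three estimators being consistent as $N\to\infty$. The dummy variable and entity-demeaned results agree to every reported digit, as the Frisch-Waugh-Lovell theorem implies, and their se and RMSE are smaller than those of the difference approach for every $N$ (the difference approach's RMSE is about 8% to 21% larger), confirming the efficiency ranking expected above.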
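As a rough benchmark, assuming the iid design above ($\sigma_x^2 = 4$, $\sigma_u^2 = 1$, $X$ and $U$ independent), standard large-$N$ calculations give ${\rm{Var}}\,\hat{\beta}^{dummy} \approx \sigma_u^2/\big((T-1)\sigma_x^2 N\big)$ and ${\rm{Var}}\,\hat{\beta}^{diff} \approx (3T-4)\sigma_u^2/\big(2(T-1)^2\sigma_x^2 N\big)$. The sketch below (the function name `asymptotic_sd` is illustrative) evaluates the implied standard deviations for the sample sizes used above; since the median bias is negligible, they can be compared with the rmse column.

```python
import numpy as np

def asymptotic_sd(N, T=5, sigma_x2=4.0, sigma_u2=1.0):
    """Large-N standard deviations of the within (dummy/demeaned) and FD estimators,
    assuming iid x_it and u_it with X independent of U (the simulation design)."""
    var_within = sigma_u2 / ((T - 1) * sigma_x2 * N)
    var_fd = (3 * T - 4) * sigma_u2 / (2 * (T - 1) ** 2 * sigma_x2 * N)
    return np.sqrt(var_within), np.sqrt(var_fd)

for N in (10, 50, 100, 500, 1000):
    sd_w, sd_fd = asymptotic_sd(N)
    print(f"N = {N:4d}: within {sd_w:.4f}, diff {sd_fd:.4f}")
```

For $N = 1000$ this gives about 0.0079 and 0.0093, close to the simulated RMSEs of 0.007963 and 0.009359.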