## Reading a LaTeX Table into a Pandas DataFrame

This notebook shows how to read a table stored in LaTeX format into a Pandas DataFrame. Astropy parses the LaTeX file, and Pandas handles the data manipulation that follows.

This is useful when your tabular data exists only as LaTeX source and you want to analyze or manipulate it with Pandas, a popular data analysis library for Python.


In [20]:
import pandas as pd
from astropy.table import Table
tab = Table.read('table.tex').to_pandas()
tab

Unnamed: 0,Edge Number M1,Edge Number M2,Mean Degree M1,Mean Degree M2,Median Degree M1,Median Degree M2,Avg CC M1,Avg CC M2,Avg PL M1,Avg PL M2
0,4183,4448,9.69,10.31,9.0,10.0,0.46,0.44,3.44,3.36
1,4366,4428,10.15,10.3,9.0,9.0,0.45,0.44,3.38,3.36
2,4255,4517,9.82,10.42,9.0,9.0,0.46,0.44,3.43,3.36
3,4014,4370,9.48,10.32,8.0,9.0,0.46,0.44,3.46,3.35
4,4281,4714,9.82,10.81,9.0,10.0,0.46,0.44,3.43,3.3
5,4473,4606,10.12,10.42,9.0,9.5,0.46,0.45,3.41,3.35
6,4341,4520,10.01,10.43,9.0,9.0,0.46,0.44,3.4,3.34
7,4180,4572,9.58,10.47,9.0,10.0,0.46,0.44,3.48,3.34
8,4308,4525,9.9,10.4,9.0,10.0,0.46,0.44,3.42,3.35
9,4295,4416,10.0,10.28,9.0,9.0,0.46,0.43,3.39,3.34


### Creating Summary Statistics
The `describe` method generates basic summary statistics (count, mean, standard deviation, minimum, quartiles, and maximum) for every numeric column in your DataFrame.

In [21]:
tab.describe()

Unnamed: 0,Edge Number M1,Edge Number M2,Mean Degree M1,Mean Degree M2,Median Degree M1,Median Degree M2,Avg CC M1,Avg CC M2,Avg PL M1,Avg PL M2
count,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0
mean,4269.6,4511.6,9.857,10.416,8.9,9.45,0.459,0.44,3.424,3.345
std,124.313403,102.213502,0.223659,0.15313,0.316228,0.497214,0.003162,0.004714,0.030984,0.017795
min,4014.0,4370.0,9.48,10.28,8.0,9.0,0.45,0.43,3.38,3.3
25%,4201.0,4433.0,9.7225,10.3125,9.0,9.0,0.46,0.44,3.4025,3.34
50%,4288.0,4518.5,9.86,10.41,9.0,9.25,0.46,0.44,3.425,3.35
75%,4332.75,4560.25,10.0075,10.4275,9.0,10.0,0.46,0.44,3.4375,3.3575
max,4473.0,4714.0,10.15,10.81,9.0,10.0,0.46,0.45,3.48,3.36
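As a quick check on what `describe` reports, the sketch below recomputes a few statistics for the `Edge Number M1` column, using the ten values displayed earlier. The variable names `edges_m1` and `summary` are illustrative. Note that Pandas reports the sample standard deviation (`ddof=1`) and linearly interpolates the quartiles between data points.

```python
import pandas as pd

# The ten "Edge Number M1" values from the table above.
edges_m1 = pd.Series(
    [4183, 4366, 4255, 4014, 4281, 4473, 4341, 4180, 4308, 4295],
    name="Edge Number M1",
)

summary = edges_m1.describe()

# Each entry of describe() matches the corresponding individual reduction.
print(summary["mean"], edges_m1.mean())         # arithmetic mean
print(summary["std"], edges_m1.std(ddof=1))     # sample standard deviation
print(summary["25%"], edges_m1.quantile(0.25))  # linearly interpolated quartile
```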


### Transposing the Output for a Standard Scientific Presentation


In [27]:
tab.describe(percentiles=[0.5]).T

# Alternative: round to two decimals before transposing (requires `import numpy as np`)
# np.round(tab.describe(percentiles=[0.5]), 2).transpose()

Unnamed: 0,count,mean,std,min,50%,max
Edge Number M1,10.0,4269.6,124.313403,4014.0,4288.0,4473.0
Edge Number M2,10.0,4511.6,102.213502,4370.0,4518.5,4714.0
Mean Degree M1,10.0,9.857,0.223659,9.48,9.86,10.15
Mean Degree M2,10.0,10.416,0.15313,10.28,10.41,10.81
Median Degree M1,10.0,8.9,0.316228,8.0,9.0,9.0
Median Degree M2,10.0,9.45,0.497214,9.0,9.25,10.0
Avg CC M1,10.0,0.459,0.003162,0.45,0.46,0.46
Avg CC M2,10.0,0.44,0.004714,0.43,0.44,0.45
Avg PL M1,10.0,3.424,0.030984,3.38,3.425,3.48
Avg PL M2,10.0,3.345,0.017795,3.3,3.35,3.36


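We won't typically want every column of this output in a scientific publication. The `count` column, for instance, is identical for every variable here. We can select only the columns we want and round them to two decimals; the result can then be exported, for example to CSV.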

In [30]:
import numpy as np

np.round(tab.describe(percentiles=[0.5]), 2).T[['mean', 'std', 'min', '50%', 'max']]


Unnamed: 0,mean,std,min,50%,max
Edge Number M1,4269.6,124.31,4014.0,4288.0,4473.0
Edge Number M2,4511.6,102.21,4370.0,4518.5,4714.0
Mean Degree M1,9.86,0.22,9.48,9.86,10.15
Mean Degree M2,10.42,0.15,10.28,10.41,10.81
Median Degree M1,8.9,0.32,8.0,9.0,9.0
Median Degree M2,9.45,0.5,9.0,9.25,10.0
Avg CC M1,0.46,0.0,0.45,0.46,0.46
Avg CC M2,0.44,0.0,0.43,0.44,0.45
Avg PL M1,3.42,0.03,3.38,3.42,3.48
Avg PL M2,3.34,0.02,3.3,3.35,3.36
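The selection above can be written to CSV in one more step. The sketch below assumes a hypothetical output file `summary_stats.csv` and rebuilds only two of the columns shown earlier so that it runs on its own. It uses the `DataFrame.round` method in place of `np.round`, which avoids the NumPy import.

```python
import pandas as pd

# Two columns from the table above, enough to illustrate the workflow.
tab = pd.DataFrame({
    "Edge Number M1": [4183, 4366, 4255, 4014, 4281, 4473, 4341, 4180, 4308, 4295],
    "Avg PL M2": [3.36, 3.36, 3.36, 3.35, 3.30, 3.35, 3.34, 3.34, 3.35, 3.34],
})

columns = ["mean", "std", "min", "50%", "max"]
stats = tab.describe(percentiles=[0.5]).T[columns].round(2)

# index_label names the first CSV column, which would otherwise be blank.
stats.to_csv("summary_stats.csv", index_label="variable")

# Read the file back to confirm the round trip.
print(pd.read_csv("summary_stats.csv", index_col="variable"))
```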


### Exporting to LaTeX

In scientific fields, LaTeX is often preferred over Word for document preparation. Pandas can write a DataFrame directly as a LaTeX table. The first of the following two cells exports the summary statistics to a `.tex` file, while the second prints the LaTeX source so that its structure can be inspected.

In [40]:
tab_stats = np.round(tab.describe(percentiles=[0.5]), 2).T[['mean', 'std', 'min', '50%', 'max']]
tab_stats.to_latex('summary_stats.tex')




In [35]:
from tabulate import tabulate

statistics = np.round(tab.describe(percentiles=[0.5]), 2).T[['mean', 'std', 'min', '50%', 'max']]
# Format statistics table as LaTeX
statistics_latex = tabulate(statistics, headers='keys', tablefmt='latex')

# Print the LaTeX table
print(statistics_latex)

\begin{tabular}{lrrrrr}
\hline
                   &    mean &    std &     min &     50\% &     max \\
\hline
 Edge Number M1    & 4269.6  & 124.31 & 4014    & 4288    & 4473    \\
 Edge Number M2    & 4511.6  & 102.21 & 4370    & 4518.5  & 4714    \\
 Mean  Degree M1   &    9.86 &   0.22 &    9.48 &    9.86 &   10.15 \\
 Mean Degree M2    &   10.42 &   0.15 &   10.28 &   10.41 &   10.81 \\
 Median Degree M1  &    8.9  &   0.32 &    8    &    9    &    9    \\
 Median  Degree M2 &    9.45 &   0.5  &    9    &    9.25 &   10    \\
 Avg CC M1         &    0.46 &   0    &    0.45 &    0.46 &    0.46 \\
 Avg CC M2         &    0.44 &   0    &    0.43 &    0.44 &    0.45 \\
 Avg PL M1         &    3.42 &   0.03 &    3.38 &    3.42 &    3.48 \\
 Avg PL M2         &    3.34 &   0.02 &    3.3  &    3.35 &    3.36 \\
\hline
\end{tabular}
