# Documentation Tooling Evaluation

In the [Hexatomic research project](https://hexatomic.github.io), we evaluated several software documentation tools for their suitability for documenting software sustainably (see <https://hexatomic.github.io/documentation/tooling/evaluation.html>).

We present the evaluation data here.

In [None]:
# Prepare the raw data
import pandas as pd

# The evaluation scores on a scale from 1-5
data = [
    [5, 3, 4, 3, 4, 3, 4, 4], # Scores for Sphinx (rST)
    [3, 5, 1, 3, 2, 3, 4, 4], # Scores for Sphinx (CM)
    [3, 4, 1, 4, 3, 3, 3, 3], # Scores for Asciidoctor
    [3, 5, 1, 3, 4, 3, 2, 1], # Scores for mkDocs
    [4, 5, 1, 3, 4, 4, 5, 3], # Scores for mdBook
    [4, 5, 1, 3, 4, 2, 2, 2] # Scores for Jekyll
]

# Row labels: the tool names
idx = ['Sphinx (rST)', 'Sphinx (CM)', 'Asciidoctor', 'mkDocs', 'mdBook', 'Jekyll']

# Column labels: the evaluation categories
cols = ['1', '3a', '3b', '3c', '3d', '3e', '3f', '4']

# Create a data frame and print it
df = pd.DataFrame(data, index=idx, columns=cols)
df

In [None]:
# Calculate the mean of the sub-categories in category 3 (usability)
cat_cols = df.loc[:, '3a':'3f']

# Add the means to a new column in the data frame
df['mean(3)'] = cat_cols.mean(axis=1)

# Change the position of the mean column to go before the '4' column
new_order = [0, 1, 2, 3, 4, 5, 6, 8, 7]
df = df[df.columns[new_order]]

# Print the data frame
df

## Evaluation 1

The evaluation categories `1`, `mean(3)` and `4` are weighted equally.
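
Equal weighting means that each of the three categories contributes one third to the score, which is the same as taking the plain row mean. The following minimal sketch illustrates this for two tools; the names `sub` and `weights` are hypothetical and are not used elsewhere in this notebook, and the values are copied from the data frame above.

```python
import pandas as pd

# Category values for two tools, copied from the data frame above
# (mean(3) is the mean of the six usability sub-scores)
sub = pd.DataFrame(
    {'1': [5, 4], 'mean(3)': [21 / 6, 22 / 6], '4': [4, 3]},
    index=['Sphinx (rST)', 'mdBook'],
)

# Hypothetical explicit weights: equal weighting means one third each
weights = pd.Series({'1': 1 / 3, 'mean(3)': 1 / 3, '4': 1 / 3})

# Weighted sum per tool: multiply each column by its weight, then add up the row
weighted_score = sub.mul(weights, axis=1).sum(axis=1)

# The plain row mean gives the same result when all weights are equal
simple_mean = sub.mean(axis=1)
```

Writing the weights out explicitly makes it easier to experiment with unequal weightings, which this evaluation does not use.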

In [None]:
# Create a copy of the original data frame for this evaluation
df1 = df.copy()

# Create a sub-data frame
sum_cols1 = df1[['1', 'mean(3)', '4']]

# Print the sub-data frame
sum_cols1

In [None]:
# Calculate the row-wise mean of sum_cols1
simple_avg1 = sum_cols1.mean(axis=1)

# Add a 'score' column with the simple averages to the data frame copy
df1['score'] = simple_avg1

# Order the data frame by score
scored1 = df1.sort_values('score', ascending=False)

# Print the data frame
scored1

## Evaluation 2

This evaluation ignores the JavaDoc integration category (`3b`); `mean(3)` is re-calculated from the remaining usability sub-categories.
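
To see how much leaving out `3b` shifts the usability mean for each tool, the two variants can be compared side by side. This is a minimal sketch; the names `usability` and `comparison` are hypothetical, and the sub-scores are copied from the raw data at the top of this notebook.

```python
import pandas as pd

# Usability sub-scores (3a to 3f), copied from the raw data above
usability = pd.DataFrame(
    [
        [3, 4, 3, 4, 3, 4],  # Sphinx (rST)
        [5, 1, 3, 2, 3, 4],  # Sphinx (CM)
        [4, 1, 4, 3, 3, 3],  # Asciidoctor
        [5, 1, 3, 4, 3, 2],  # mkDocs
        [5, 1, 3, 4, 4, 5],  # mdBook
        [5, 1, 3, 4, 2, 2],  # Jekyll
    ],
    index=['Sphinx (rST)', 'Sphinx (CM)', 'Asciidoctor', 'mkDocs', 'mdBook', 'Jekyll'],
    columns=['3a', '3b', '3c', '3d', '3e', '3f'],
)

# Usability mean with and without the '3b' sub-category
comparison = pd.DataFrame({
    'mean(3) with 3b': usability.mean(axis=1),
    'mean(3) without 3b': usability.drop(columns='3b').mean(axis=1),
})

# Positive values mean the tool gains from ignoring '3b'
comparison['difference'] = (
    comparison['mean(3) without 3b'] - comparison['mean(3) with 3b']
)
```

With these scores, Sphinx (rST) is the only tool whose usability mean decreases, because its `3b` score is above its own average; every other tool scored 1 in `3b` and therefore gains.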

In [None]:
# Create a copy of the original data frame for this evaluation
df2 = df.copy()

# Remove the '3b' column
df2.drop('3b', axis=1, inplace=True)

# Re-calculate mean(3)
cat_cols2 = df2.loc[:, '3a':'3f']
df2['mean(3)'] = cat_cols2.mean(axis=1)

# Print df2
df2

In [None]:
# Create a sub-data frame
sum_cols2 = df2[['1', 'mean(3)', '4']]

# Print the sub-data frame
sum_cols2

In [None]:
# Calculate the row-wise mean of sum_cols2
simple_avg2 = sum_cols2.mean(axis=1)

# Add a 'score' column with the simple averages to the data frame copy
df2['score'] = simple_avg2

# Order the data frame by score
scored2 = df2.sort_values('score', ascending=False)

# Print the data frame
scored2

## Evaluation 3

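This evaluation ignores the exportability category (`4`). As in Evaluation 2, the code below also drops the JavaDoc integration category (`3b`).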

In [None]:
# Create a copy of the original data frame for this evaluation
df3 = df.copy()

# Remove the '3b' and '4' columns
df3.drop(columns=['3b', '4'], inplace=True)

# Re-calculate mean(3)
cat_cols3 = df3.loc[:, '3a':'3f']
df3['mean(3)'] = cat_cols3.mean(axis=1)

# Print df3
df3

In [None]:
# Create a sub-data frame
sum_cols3 = df3[['1', 'mean(3)']]

# Print the sub-data frame
sum_cols3

In [None]:
# Calculate the row-wise mean of sum_cols3
simple_avg3 = sum_cols3.mean(axis=1)

# Add a 'score' column with the simple averages to the data frame copy
df3['score'] = simple_avg3

# Order the data frame by score
scored3 = df3.sort_values('score', ascending=False)

# Print the data frame
scored3

## Averaging evaluation scores

Finally, the scores from all evaluations are averaged to produce a final score.
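
The whole procedure, from raw scores to the final ranking, can also be condensed into a single helper. The sketch below is one possible restatement and not the code used in this notebook; the function `evaluate` and the names `scores`, `all_scores` and `ranking` are hypothetical. It assumes that Evaluation 3 leaves out both `3b` and `4`, as the code cell for that evaluation does.

```python
import pandas as pd

# Raw evaluation scores, copied from the first code cell of this notebook
scores = pd.DataFrame(
    [
        [5, 3, 4, 3, 4, 3, 4, 4],
        [3, 5, 1, 3, 2, 3, 4, 4],
        [3, 4, 1, 4, 3, 3, 3, 3],
        [3, 5, 1, 3, 4, 3, 2, 1],
        [4, 5, 1, 3, 4, 4, 5, 3],
        [4, 5, 1, 3, 4, 2, 2, 2],
    ],
    index=['Sphinx (rST)', 'Sphinx (CM)', 'Asciidoctor', 'mkDocs', 'mdBook', 'Jekyll'],
    columns=['1', '3a', '3b', '3c', '3d', '3e', '3f', '4'],
)


def evaluate(frame, ignored=()):
    """Hypothetical helper: score each tool, leaving out the ignored categories."""
    kept = frame.drop(columns=list(ignored))
    # Mean of the remaining usability sub-categories (3a to 3f)
    usability = kept.filter(regex=r'^3[a-f]$').mean(axis=1).rename('mean(3)')
    # Top-level categories that are still present
    top_level = kept[[c for c in ('1', '4') if c in kept.columns]]
    # Equal-weight mean over the top-level categories and mean(3)
    return pd.concat([top_level, usability], axis=1).mean(axis=1)


all_scores = pd.DataFrame({
    'score 1': evaluate(scores),
    'score 2': evaluate(scores, ignored=['3b']),
    'score 3': evaluate(scores, ignored=['3b', '4']),
})

# Final score: mean of the three evaluation scores
all_scores['avg score'] = all_scores[['score 1', 'score 2', 'score 3']].mean(axis=1)
ranking = all_scores.sort_values('avg score', ascending=False)
```

Keeping the ignored categories as a parameter makes it straightforward to try further variants without copying the data frame for each one.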

In [None]:
# Concatenate the score columns from all 3 evaluation data frames in a new data frame
df_avg = pd.concat([df1['score'], df2['score'], df3['score']], axis=1, keys=['score 1', 'score 2', 'score 3'])

# Print the new data frame
df_avg

In [None]:
# Calculate the mean of the three evaluation scores for each tool
final_avg = df_avg.mean(axis=1)

# Add an 'avg score' column with the averaged scores to the data frame
df_avg['avg score'] = final_avg

# Order the data frame by score
scored_avg = df_avg.sort_values('avg score', ascending=False)

# Print the data frame
scored_avg