## Data collection methodology

The information we collected is a by-product of a larger systematic review of graph layout algorithms, which included 206 papers. The core of that review consists of the last 7 years of Graph Drawing proceedings, from which we filtered out papers without computational evaluations.
We further expanded our collection of graph layout algorithm papers with papers from TVCG and CGF, found by searching the IEEE, ACM, and Wiley digital libraries.
For every paper, we recorded which graph features the presented layout algorithm handles and which datasets were used in its evaluation.
To locate each dataset, we first checked the paper's supplemental material, then searched the web for the name of the dataset or of the paper, and, if this still yielded nothing, emailed the authors.
When a dataset's redistribution policy was unclear, we emailed its owners or authors to ask.
If approval was explicitly required prior to redistribution, we did not redistribute the dataset. If we received no answer, or if no policy was explicitly stated, we collected the dataset and stored it on our own storage to preserve it.
If the owner or author of a dataset finds their work linked in our collection and disagrees with this choice, we kindly ask them to contact us, and we will promptly remove the entry. We also emphasize that we do not claim any ownership rights over the listed datasets.
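The retrieval order and the hosting rule described above can be summarised in a small sketch. The names `find_dataset`, `Policy`, and `host_copy` are hypothetical, the lookup steps are stand-in callables rather than real search or email code, and we assume that an explicit prior-approval requirement is the only case in which no copy is hosted.

```python
from enum import Enum, auto
from typing import Callable, Optional, Sequence

# A lookup step takes a paper title and returns a dataset location, or None if nothing was found
Lookup = Callable[[str], Optional[str]]


class Policy(Enum):
    """Redistribution terms as we learned them (hypothetical labels)."""
    APPROVAL_REQUIRED = auto()  # explicitly requires approval before redistribution
    NOT_STATED = auto()         # no explicit statement about redistribution
    NO_REPLY = auto()           # owners or authors were emailed but did not answer


def find_dataset(title: str, steps: Sequence[Lookup]) -> Optional[str]:
    """Try each lookup step in order and stop at the first one that finds the dataset."""
    for step in steps:
        location = step(title)
        if location is not None:
            return location
    return None


def host_copy(policy: Policy) -> bool:
    """Host our own copy unless redistribution explicitly requires prior approval."""
    return policy is not Policy.APPROVAL_REQUIRED


# Stand-ins for: supplemental material, web search, emailing the authors
steps = [lambda t: None, lambda t: "https://example.org/dataset.zip", lambda t: None]
print(find_dataset("Some paper", steps))  # https://example.org/dataset.zip
print(host_copy(Policy.NO_REPLY))         # True
```

Ordering the steps as a list keeps the cheapest source (supplemental material) first and the slowest (emailing authors) last, and makes it easy to add a step without touching the loop.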

- Describe how the graph feature labels were collected.
- Justify our approach: it facilitates comparisons and replication of results.
- Link each dataset to the specific papers that use it, which we have not seen in other network repositories.
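Linking each dataset to the papers that evaluate on it amounts to inverting the per-paper records into a dataset-to-papers index. A minimal sketch, assuming a hypothetical `PaperRecord` schema that holds the two attributes collected for every paper (handled graph features and evaluation datasets); the titles and dataset names in the usage lines are made up.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class PaperRecord:
    """One surveyed paper (hypothetical schema)."""
    title: str
    features: set[str] = field(default_factory=set)  # graph features the layout algorithm handles
    datasets: set[str] = field(default_factory=set)  # datasets used in the evaluation


def papers_by_dataset(records: list[PaperRecord]) -> dict[str, list[str]]:
    """Invert paper -> datasets into dataset -> papers that evaluate on it."""
    index: defaultdict[str, list[str]] = defaultdict(list)
    for record in records:
        for name in sorted(record.datasets):
            index[name].append(record.title)
    return dict(index)


records = [
    PaperRecord("Paper A", {"directed edges"}, {"dataset_x", "dataset_y"}),
    PaperRecord("Paper B", {"node clusters"}, {"dataset_x"}),
]
print(papers_by_dataset(records))  # {'dataset_x': ['Paper A', 'Paper B'], 'dataset_y': ['Paper A']}
```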


In [None]:
#| output: false
#| echo: false
# show literature data column names
literature_data.columns

In [None]:
#| echo: false
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.ticker import MaxNLocator

# Parse publication years; unparseable entries become NaN and those rows are dropped
# (dropping in place also removes them from literature_data for later cells)
literature_data['year'] = pd.to_numeric(literature_data['year'], errors='coerce')
literature_data.dropna(subset=['year'], inplace=True)
literature_data['year'] = literature_data['year'].astype(int)

# Count the number of papers per year
year_counts = literature_data['year'].value_counts().sort_index()

# Ensure all years in the range are represented, even if their count is 0
all_years = range(year_counts.index.min(), year_counts.index.max() + 1)
year_counts = year_counts.reindex(all_years, fill_value=0)

# Horizontal bar chart with one bar per publication year
fig, ax = plt.subplots(figsize=(8, 8))
bars = ax.barh(year_counts.index, year_counts.values, color='skyblue')
ax.set_title('Number of Papers Collected Per Year of Publication', color='gray')

# Label each non-empty bar with its count, just past the end of the bar
for bar in bars:
    if bar.get_width() > 0:
        ax.text(bar.get_width() + 0.1, bar.get_y() + bar.get_height() / 2,
                str(int(bar.get_width())), va='center', color='gray', fontsize=8)

# Years are integers, so only place ticks on whole numbers along the y-axis
ax.yaxis.set_major_locator(MaxNLocator(integer=True))
ax.tick_params(axis='y', colors='gray')
ax.spines['left'].set_color('lightgray')

# The counts are printed on the bars, so hide the x-axis and the top, right, and bottom spines
ax.xaxis.set_visible(False)
for side in ('top', 'right', 'bottom'):
    ax.spines[side].set_visible(False)

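The counting and gap-filling step above is easiest to see on a toy input. The sketch below uses made-up years rather than our collected data: `value_counts` alone omits a year with no papers, and reindexing over the full range of years reinserts it with a count of 0, so the chart shows an empty slot instead of silently skipping the year.

```python
import pandas as pd

# Toy publication years (illustrative only, not the surveyed papers)
years = pd.Series([2018, 2018, 2020, 2021, 2021, 2021])

# Count papers per year; 2019 is absent because no toy paper has that year
counts = years.value_counts().sort_index()

# Reindex over every year in the observed range so empty years appear with 0
full_range = range(counts.index.min(), counts.index.max() + 1)
counts = counts.reindex(full_range, fill_value=0)

print(counts.to_dict())  # {2018: 2, 2019: 0, 2020: 1, 2021: 3}
```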
In [None]:
#| echo: false
import matplotlib.pyplot as plt
import seaborn as sns

# Abbreviate long venue names so the bar labels stay short
venue_abbreviations = {
    'Computer Graphics Forum': 'CGF',
    'Information Visualization': 'IV',
    'Graphics Interface': 'GI',
}
literature_data['Conference'] = literature_data['Conference'].replace(venue_abbreviations)

# Keep only venues with more than one paper
conference_counts = literature_data['Conference'].value_counts()
conference_counts = conference_counts[conference_counts > 1]

# barh draws the first entry at the bottom, so sort ascending to put the largest venue on top
conference_counts = conference_counts.sort_values(ascending=True)

# Horizontal bar chart with one pastel color per venue
fig, ax = plt.subplots(figsize=(6, 8))
bars = ax.barh(conference_counts.index, conference_counts.values,
               color=sns.color_palette('pastel', len(conference_counts)))
ax.set_title('Number of Papers Collected Per Venue (More Than 1 Entry)', color='gray')

# Label each bar with its count in gray
for bar in bars:
    ax.text(bar.get_width() + 0.5, bar.get_y() + bar.get_height() / 2,
            str(int(bar.get_width())), va='center', color='gray', fontsize=10)

# Hide the x-axis and the top, right, and bottom spines; keep a light-gray left spine
ax.xaxis.set_visible(False)
ax.spines['left'].set_color('lightgray')
for side in ('top', 'right', 'bottom'):
    ax.spines[side].set_visible(False)
ax.tick_params(axis='y', colors='gray')

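The venue clean-up above can likewise be checked on a toy column. The venue names and counts below are made up for illustration; the sketch shows the dictionary form of `Series.replace` and the `counts > 1` filter that drops single-paper venues before sorting.

```python
import pandas as pd

# Toy venue column (illustrative only)
venues = pd.Series(['Computer Graphics Forum', 'TVCG', 'Computer Graphics Forum',
                    'Graphics Interface', 'TVCG', 'TVCG'])

# Map long venue names to short labels in one call
abbreviations = {'Computer Graphics Forum': 'CGF',
                 'Information Visualization': 'IV',
                 'Graphics Interface': 'GI'}
venues = venues.replace(abbreviations)

# Keep only venues that contribute more than one paper, smallest first
counts = venues.value_counts()
counts = counts[counts > 1].sort_values(ascending=True)

print(counts.to_dict())  # {'CGF': 2, 'TVCG': 3}
```

Here 'GI' has a single paper and is dropped, which mirrors how the chart above omits venues with only one entry.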
In [None]:
#| echo: false
# Count the total number of papers (rows) in literature_data;
# counting the 'Conference' column would skip papers with a missing venue
total_papers = len(literature_data)
print(f"Total papers: {total_papers}")