bibnets

bibnets is an R package for constructing bibliometric networks from scholarly metadata. It reads common export formats, converts multi-valued fields such as authors, references, keywords, countries, and affiliations into sparse incidence matrices, and returns edge lists for co-authorship, co-citation, bibliographic coupling, keyword co-occurrence, direct citation, historiograph, and custom co-occurrence analyses.

The package is designed to keep the computational core small and inspectable: the only hard imports are Matrix (a recommended package distributed with R) and the base packages stats and utils. Optional graph and bibliometric packages are used only when the user explicitly requests their formats or runs equivalence tests.

Main Features

8 dedicated network builders plus one generic builder: author_network(), document_network(), reference_network(), keyword_network(), source_network(), institution_network(), country_network(), historiograph(), and conetwork().
Counting methods for full, fractional, paper-level, strength, and position-aware authorship weighting, including harmonic, arithmetic, geometric, adaptive geometric, golden-ratio, first-author, last-author, first-last, and custom position-weighted schemes.
Attention-style position weights through attention = "lead", "last", "proximity", or "circular" for author, keyword, country, and institution networks.
6 similarity options: none (raw weights), association strength, cosine, Jaccard, inclusion, and equivalence.
Readers for scholarly exports: Scopus, Web of Science, OpenAlex nested data, OpenAlex flat CSV, BibTeX, RIS, Lens.org, Dimensions, Crossref, and generic CSV files.
Network reduction and export: backbone(), prune(), filter_top(), to_gephi(), to_graphml(), to_igraph(), to_tbl_graph(), to_matrix(), and to_cograph().
Temporal and historical analysis: temporal_network(), local_citations(), and historiograph().
Standard output: all network builders return a bibnets_network edge list with from, to, weight, and count columns.

Install

Once accepted on CRAN:

install.packages("bibnets")

Development version from GitHub:

# install.packages("remotes")
remotes::install_github("mohsaqr/bibnets")

Quick Start

library(bibnets)

# Read a file or folder. The reader is detected from file content.
data <- read_biblio("export.csv")

# Common networks
authors <- author_network(data, type = "collaboration")
refs    <- reference_network(data, type = "co_citation", min_occur = 2)
docs    <- document_network(data, type = "coupling", similarity = "cosine")
keys    <- keyword_network(data, similarity = "association")

# Inspect the standard edge-list schema
head(authors)
summary(authors)

The edge list separates two quantities:

count: the raw co-occurrence count for the pair, with each co-occurrence counted as 1 regardless of the counting method.
weight: the analysis weight after the chosen counting method and any similarity normalization have been applied.

With similarity = "none" and counting = "full", weight and count are usually the same. Once fractional counting or similarity normalization is used, they intentionally diverge.
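
The difference between count and weight can be illustrated without the package. The following base R sketch uses a made-up papers-by-authors matrix and cosine normalization; it is not the bibnets implementation, and the object names are illustrative only.

```r
# Toy papers x authors incidence matrix (1 = author appears on the paper).
A <- matrix(
  c(1, 1, 0,
    1, 1, 1,
    0, 1, 1),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("p1", "p2", "p3"), c("ANN", "BOB", "CAT"))
)

# Raw co-occurrence counts: number of papers each pair shares.
count <- crossprod(A)

# Cosine-normalized weight: count / sqrt(n_i * n_j), n = papers per author.
n <- diag(count)
weight <- count / sqrt(outer(n, n))

# Collect the upper triangle into a from/to/weight/count edge list.
idx <- which(upper.tri(count) & count > 0, arr.ind = TRUE)
edges <- data.frame(
  from   = rownames(count)[idx[, "row"]],
  to     = colnames(count)[idx[, "col"]],
  weight = weight[idx],
  count  = count[idx],
  stringsAsFactors = FALSE
)
print(edges)
```

Here ANN and BOB share two papers (count = 2), while their cosine weight is 2 / sqrt(2 * 3), so the two columns no longer agree once normalization is applied.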

Reading Data

Use read_biblio() when you have a file, a vector of files, or a directory:

data <- read_biblio("scopus_export.csv")
data <- read_biblio(c("wos_1.txt", "wos_2.txt"))
data <- read_biblio("exports/")

Use a format-specific reader when the source is already known:

scopus <- read_scopus("scopus.csv")
wos    <- read_wos("savedrecs.txt")
oa     <- read_openalex_csv("openalex_works.csv")
dim    <- read_dimensions("dimensions.csv")
lens   <- read_lens("lens.csv")

For custom CSV files, tell read_biblio() which columns should be split into list-columns:

data <- read_biblio(
  "my_data.csv",
  format = "generic",
  id = "paper_id",
  actors = c("Authors", "Keywords"),
  sep = ";"
)

All readers try to return the same core columns: id, title, year, journal, doi, cited_by_count, abstract, type, authors, references, and keywords. Source-specific extras such as countries, affiliations, index_keywords, and keywords_plus are preserved when available.
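
As a rough illustration of what a list-column is, the base R sketch below splits a delimited field into one character vector per paper. The helper split_field() and the sample data are hypothetical and only approximate what a reader might do internally.

```r
# Hypothetical raw table with a semicolon-delimited multi-valued field.
raw <- data.frame(
  paper_id = c("p1", "p2"),
  Authors  = c("Smith, J.; Lee, K.", "Lee, K. ;Garcia, M.;  "),
  stringsAsFactors = FALSE
)

# Split on the separator, trim whitespace, and drop empty entries.
split_field <- function(x, sep = ";") {
  parts <- strsplit(x, sep, fixed = TRUE)
  lapply(parts, function(p) {
    p <- trimws(p)
    p[nzchar(p)]
  })
}

# Store the result as a list-column: one character vector per row.
raw$authors <- split_field(raw$Authors)
str(raw$authors)
```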

Network Builders

Co-authorship

edges <- author_network(data, type = "collaboration")

Two authors are linked when they appear on the same paper. Use counting to change how each paper contributes:

author_network(data, "collaboration", counting = "full")
author_network(data, "collaboration", counting = "fractional")
author_network(data, "collaboration", counting = "harmonic")
author_network(data, "collaboration", counting = "first_last")
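
To make the effect of counting concrete, here is a minimal base R sketch comparing full counting with one common definition of fractional counting, in which a paper with n authors adds 1 / (n - 1) to each author pair (following Perianes-Rodríguez et al., 2016). Whether bibnets uses exactly this formula is an assumption; the data are invented.

```r
# Toy papers x authors incidence matrix: p1 has two authors, p2 has three.
A <- matrix(
  c(1, 1, 0,
    1, 1, 1),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("p1", "p2"), c("ANN", "BOB", "CAT"))
)

# Full counting: every shared paper adds 1 to the pair.
full <- crossprod(A)
diag(full) <- 0

# Fractional counting (one common definition): a paper with n authors adds
# 1 / (n - 1) to each pair; single-author papers add nothing.
n_authors <- rowSums(A)
paper_weight <- ifelse(n_authors > 1, 1 / (n_authors - 1), 0)
frac <- crossprod(A * paper_weight, A)
diag(frac) <- 0

full
frac
```

Under this definition each author's links from a single paper sum to 1, so the three-author paper p2 does not contribute more per author than the two-author paper p1.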

Use attention instead of counting when the goal is position-based weighting independent of the standard counting families. The same option is available on keyword_network(), country_network(), and institution_network():

author_network(data, attention = "lead")       # first author dominant
author_network(data, attention = "last")       # last author dominant
author_network(data, attention = "proximity")  # middle authors weighted most
author_network(data, attention = "circular")   # first and last upweighted

attention and counting are mutually exclusive: when attention is non-NULL, the network is built directly from positional weights and the type/counting arguments are ignored.

Reference Co-citation

refs <- reference_network(data, type = "co_citation", min_occur = 2)

Two references are linked when they are cited together by the same paper. This is a column-mode projection of the papers × references matrix.
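
The column-mode projection can be sketched in base R as follows. The reference strings are placeholders and the code only illustrates the matrix algebra, not the package internals.

```r
# Hypothetical reference lists for three citing papers.
refs <- list(
  p1 = c("SMALL 1973", "HAGEN 2008"),
  p2 = c("SMALL 1973", "HAGEN 2008", "VAN ECK 2009"),
  p3 = c("VAN ECK 2009")
)

# Build the papers x references incidence matrix.
all_refs <- sort(unique(unlist(refs)))
A <- t(vapply(refs, function(r) as.numeric(all_refs %in% r),
              numeric(length(all_refs))))
colnames(A) <- all_refs

# Column-mode projection: entry [i, j] counts papers citing both i and j.
cocit <- crossprod(A)
diag(cocit) <- 0
cocit
```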

Bibliographic Coupling and Direct Citation

coupling <- document_network(data, type = "coupling", similarity = "cosine")
direct   <- document_network(data, type = "citation")

Coupling links two papers when they cite the same references. Direct citation returns directed within-corpus citation edges from citing paper to cited paper.
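
Both ideas can be sketched on a toy corpus in base R. The ids are invented, and "x1" and "x2" stand for references outside the corpus; this is an illustration of the definitions rather than the package code.

```r
# Hypothetical corpus: each paper lists the ids it cites.
cites <- list(
  p1 = c("x1", "x2"),
  p2 = c("p1", "x1", "x2"),
  p3 = c("p1", "p2", "x2")
)

# Bibliographic coupling: row-mode projection of papers x references.
all_refs <- sort(unique(unlist(cites)))
A <- t(vapply(cites, function(r) as.numeric(all_refs %in% r),
              numeric(length(all_refs))))
colnames(A) <- all_refs
coupling <- tcrossprod(A)  # entry [i, j] = number of shared references
diag(coupling) <- 0

# Direct citation: keep only references that are themselves corpus papers,
# giving directed edges from citing paper to cited paper.
direct <- do.call(rbind, lapply(names(cites), function(p) {
  cited <- intersect(cites[[p]], names(cites))
  data.frame(from = rep(p, length(cited)), to = cited,
             stringsAsFactors = FALSE)
}))

coupling
direct
```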

Keywords, Sources, Countries, and Institutions

keyword_network(data, field = "keywords")
source_network(data, type = "coupling", min_occur = 2)
country_network(data, type = "collaboration", counting = "fractional")
institution_network(data, type = "collaboration", counting = "fractional")

Entity labels are trimmed and uppercased before matrix construction so that minor casing differences do not create separate nodes.
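
The effect of this normalization is easy to reproduce in base R. The labels below are made up, and the one-line cleanup is only an approximation of what the package does.

```r
# Four raw labels that differ only in case and surrounding whitespace.
labels <- c("machine learning", " Machine Learning",
            "MACHINE LEARNING ", "Bibliometrics")

# Trim and uppercase so that variants collapse into a single node label.
normalized <- toupper(trimws(labels))

unique(normalized)
table(normalized)
```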

Generic Co-networks

conetwork() is useful when a dedicated wrapper is not needed:

conetwork(data, "keywords")
conetwork(data, "authors", by = "keywords")
conetwork(data, "journal", by = "references", similarity = "cosine")

With one field, entities are linked when they co-occur in the same paper. With by, entities are linked through shared values of another field.
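
One plausible reading of linking "through shared values of another field" is a two-step projection, sketched below for authors linked by shared keywords. The binarization step and all object names are assumptions made for illustration; conetwork() may weight the intermediate matrix differently.

```r
papers <- c("p1", "p2", "p3")

# papers x authors and papers x keywords incidence matrices (toy data).
PA <- matrix(c(1, 0, 0,
               0, 1, 0,
               1, 0, 1),
             nrow = 3, byrow = TRUE,
             dimnames = list(papers, c("ANN", "BOB", "CAT")))
PK <- matrix(c(1, 1, 0,
               0, 1, 1,
               0, 0, 1),
             nrow = 3, byrow = TRUE,
             dimnames = list(papers, c("K1", "K2", "K3")))

# Step 1: authors x keywords, recording which keywords each author has used.
AK <- crossprod(PA, PK)
AK_bin <- (AK > 0) * 1

# Step 2: authors x authors, counting keywords two authors have in common.
shared <- tcrossprod(AK_bin)
diag(shared) <- 0
shared
```

In this toy example ANN and BOB never co-author a paper, yet they end up linked because both have used K2 and K3.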

Counting and Normalization

Counting controls how each paper contributes to edge weights. Similarity normalization controls how pair-level totals are rescaled after projection.

| Method | Main use | Interpretation |
| --- | --- | --- |
| `"full"` | all networks | Each observed co-occurrence contributes 1. |
| `"fractional"` | all networks | Contribution is scaled by list size so large teams or long reference lists do not dominate. |
| `"paper"` | co-occurrence networks | Each paper contributes a fixed total amount spread across pairs. |
| `"strength"` | coupling networks | Uses reference-frequency weighting and row-list-size normalization for Perianes-Rodríguez-style coupling strength. |
| `"harmonic"` | author collaboration | Authorship credit decreases by harmonic rank and sums to 1 per paper. |
| `"arithmetic"` | author collaboration | Authorship credit decreases linearly by position. |
| `"geometric"` | author collaboration | Authorship credit decays geometrically by position. |
| `"adaptive_geometric"` | author collaboration | Geometric decay adapts to the number of authors. |
| `"golden"` | author collaboration | Geometric decay based on the golden ratio. |
| `"first"` / `"last"` | author collaboration | Only the first or last author receives credit. |
| `"first_last"` | author collaboration | First and last authors are upweighted. |
| `"position_weighted"` | author collaboration | User-supplied position weights. |
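
For the position-aware schemes, the per-author credit vectors can be written down directly. The sketch below uses the textbook forms of harmonic (Hagen, 2008), arithmetic, and geometric credit, each normalized to sum to 1 per paper. The function names are hypothetical, and how bibnets turns these credits into edge weights is not shown here.

```r
# Credit for authors in byline positions 1..n, normalized to sum to 1.
harmonic_credit <- function(n) {
  r <- 1 / seq_len(n)        # 1, 1/2, 1/3, ...
  r / sum(r)
}
arithmetic_credit <- function(n) {
  r <- rev(seq_len(n))       # n, n - 1, ..., 1
  r / sum(r)
}
geometric_credit <- function(n) {
  r <- 2^(n - seq_len(n))    # each position gets half the previous one
  r / sum(r)
}

# Compare the three schemes for a four-author paper.
round(rbind(
  harmonic   = harmonic_credit(4),
  arithmetic = arithmetic_credit(4),
  geometric  = geometric_credit(4)
), 3)
```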

Similarity options:

keyword_network(data, similarity = "association")
keyword_network(data, similarity = "cosine")
keyword_network(data, similarity = "jaccard")
keyword_network(data, similarity = "inclusion")
keyword_network(data, similarity = "equivalence")

Association strength is often useful for co-occurrence data because it compares the observed co-occurrence against the product of the two marginal frequencies. Cosine normalization is often easier to interpret for coupling because it scales shared references by the geometric mean of the two reference-list lengths.
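
The five normalizations can be compared on a small co-occurrence matrix. The formulas below follow the standard definitions discussed by van Eck and Waltman (2009), with c the pair count and s the occurrence count of each item; any constant scaling factor the package may apply is not reproduced, and the data are invented.

```r
k <- c("K1", "K2", "K3")

# Toy co-occurrence matrix; the diagonal holds each keyword's occurrences.
C <- matrix(c(4, 2, 1,
              2, 3, 0,
              1, 0, 2),
            nrow = 3, byrow = TRUE, dimnames = list(k, k))
s <- diag(C)
prod_s <- outer(s, s)

association <- C / prod_s                   # c / (s_i * s_j)
cosine      <- C / sqrt(prod_s)             # c / sqrt(s_i * s_j)
jaccard     <- C / (outer(s, s, "+") - C)   # c / (s_i + s_j - c)
inclusion   <- C / outer(s, s, pmin)        # c / min(s_i, s_j)
equivalence <- C^2 / prod_s                 # c^2 / (s_i * s_j)

# Values for the K1-K2 pair under each measure.
c(association = association["K1", "K2"],
  cosine      = cosine["K1", "K2"],
  jaccard     = jaccard["K1", "K2"],
  inclusion   = inclusion["K1", "K2"],
  equivalence = equivalence["K1", "K2"])
```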

Network Reduction

edges <- author_network(data, "collaboration")

strong_edges <- prune(edges, threshold = 3)
local_top    <- prune(edges, top_n = 5)
top_nodes    <- filter_top(edges, n = 50)
backbone_net <- backbone(edges, alpha = 0.05)

Use:

prune(threshold = x) for an absolute edge-weight cutoff.
prune(top_n = k) to retain locally strong edges for each node.
filter_top(n = k) to keep only the most connected nodes.
backbone(alpha = x) to apply the Serrano-Boguñá-Vespignani disparity filter for multiscale weighted networks.
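
For readers unfamiliar with the disparity filter, the following base R sketch computes the significance score of Serrano et al. (2009) for an undirected edge list and keeps an edge when it is significant for at least one endpoint. The helper disparity_alpha() is hypothetical and is not the backbone() implementation.

```r
# Hypothetical weighted edge list: a hub H with one strong and three weak ties.
edges <- data.frame(
  from   = c("H", "H", "H", "H"),
  to     = c("A", "B", "C", "D"),
  weight = c(10, 1, 1, 1),
  stringsAsFactors = FALSE
)

disparity_alpha <- function(edges) {
  ends <- c(edges$from, edges$to)
  strength <- tapply(rep(edges$weight, 2), ends, sum)  # total weight per node
  degree   <- tapply(rep(1, length(ends)), ends, sum)  # number of ties per node
  # Score seen from one endpoint: (1 - w_ij / s_i)^(k_i - 1).
  side <- function(node, w) {
    as.numeric((1 - w / strength[node])^(degree[node] - 1))
  }
  # An undirected edge is judged by its more significant endpoint.
  pmin(side(edges$from, edges$weight), side(edges$to, edges$weight))
}

edges$alpha <- disparity_alpha(edges)
kept <- edges[edges$alpha < 0.05, ]
kept
```

Only the H-A tie survives at the 0.05 level: it carries 10 of the hub's 13 units of weight, whereas the three weak ties are compatible with a uniform split of the hub's strength.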

Temporal Networks and Historiographs

temporal_network(data, keyword_network, window = 3)
temporal_network(data, author_network, "collaboration",
                 window = 2, strategy = "sliding")

lcs <- local_citations(data)
h   <- historiograph(data, n = 30)

temporal_network() supports fixed, sliding, and cumulative windows. If a window cannot be built, the function issues a warning that names the window label rather than silently skipping it.
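
The three strategies differ only in how the year ranges are laid out. The sketch below shows one common interpretation in base R; make_windows() is a hypothetical helper, and the exact boundaries and labels used by temporal_network() may differ.

```r
# Build year ranges for a given window width and strategy.
make_windows <- function(years, window,
                         strategy = c("fixed", "sliding", "cumulative")) {
  strategy <- match.arg(strategy)
  first <- min(years)
  last  <- max(years)
  starts <- switch(strategy,
    fixed      = seq(first, last, by = window),             # non-overlapping
    sliding    = seq(first, max(first, last - window + 1)), # step of one year
    cumulative = seq(first, last, by = window)              # growing ranges
  )
  ends <- pmin(starts + window - 1, last)
  # Cumulative windows always begin at the first year of the corpus.
  if (strategy == "cumulative") starts <- rep(first, length(starts))
  data.frame(label = paste(starts, ends, sep = "-"),
             start = starts, end = ends, stringsAsFactors = FALSE)
}

make_windows(2015:2020, window = 3, strategy = "fixed")
make_windows(2015:2020, window = 3, strategy = "sliding")
make_windows(2015:2020, window = 3, strategy = "cumulative")
```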

local_citations() counts within-corpus citations. historiograph() then builds a directed network among the top locally cited documents.
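
The two steps can be mimicked on a toy corpus in base R. The ids are invented ("x9" is a reference outside the corpus), and the code is a sketch of the idea rather than the package implementation.

```r
# Hypothetical corpus: each paper lists the ids it cites.
cites <- list(
  p1 = character(0),
  p2 = "p1",
  p3 = c("p1", "p2"),
  p4 = c("p1", "p2", "p3", "x9")
)
ids <- names(cites)

# Local citation score: how often each corpus paper is cited within the corpus.
cited <- unlist(cites, use.names = FALSE)
lcs <- table(factor(cited[cited %in% ids], levels = ids))

# Keep the three most locally cited papers.
top <- names(sort(lcs, decreasing = TRUE))[1:3]

# Directed citing -> cited edges restricted to the selected papers.
pairs <- data.frame(from = rep(ids, lengths(cites)), to = cited,
                    stringsAsFactors = FALSE)
histo <- pairs[pairs$from %in% top & pairs$to %in% top, ]

lcs
histo
```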

Export

to_matrix(edges)
to_gephi(edges)
to_graphml(edges, file = "network.graphml")

if (requireNamespace("igraph", quietly = TRUE)) {
  g <- to_igraph(edges)
}

Optional converters are guarded by requireNamespace(), so packages such as igraph, tidygraph, and cograph are not required unless their output formats are requested.
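
When no graph package is installed, an edge list can still be turned into a dense adjacency matrix with base R alone. The sketch below assumes an undirected edge list with from, to, and weight columns and is not the to_matrix() implementation.

```r
# Hypothetical undirected edge list in the from/to/weight layout.
edges <- data.frame(
  from   = c("ANN", "ANN", "BOB"),
  to     = c("BOB", "CAT", "CAT"),
  weight = c(1.5, 0.5, 0.5),
  stringsAsFactors = FALSE
)

# Square matrix indexed by node label, filled from the edge list.
nodes <- sort(unique(c(edges$from, edges$to)))
adj <- matrix(0, length(nodes), length(nodes), dimnames = list(nodes, nodes))
adj[cbind(edges$from, edges$to)] <- edges$weight
adj[cbind(edges$to, edges$from)] <- edges$weight  # mirror for undirected edges
adj
```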

Important References

Batagelj, V., & Cerinšek, M. (2013). On bibliographic networks. Scientometrics, 96(3), 845-864. doi:10.1007/s11192-012-0940-1.
Campos, R., Mangaravite, V., Pasquali, A., Jorge, A. M., Nunes, C., & Jatowt, A. (2020). YAKE! Keyword extraction from single documents using multiple local features. Information Sciences, 509, 257-289. doi:10.1016/j.ins.2019.09.013.
Hagen, N. T. (2008). Harmonic allocation of authorship credit: Source-level correction of bibliometric bias assures accurate publication and citation analysis. PLOS ONE, 3(12), e4021. doi:10.1371/journal.pone.0004021.
Liu, X. Z., & Fang, H. (2023). A geometric counting method adaptive to the author number. Journal of Informetrics, 17(2), 101404. doi:10.1016/j.joi.2023.101404.
Perianes-Rodríguez, A., Waltman, L., & van Eck, N. J. (2016). Constructing bibliometric networks: A comparison between full and fractional counting. Journal of Informetrics, 10(4), 1178-1195. doi:10.1016/j.joi.2016.10.006.
Serrano, M. Á., Boguñá, M., & Vespignani, A. (2009). Extracting the multiscale backbone of complex weighted networks. Proceedings of the National Academy of Sciences, 106(16), 6483-6488. doi:10.1073/pnas.0808904106.
Small, H. (1973). Co-citation in the scientific literature: A new measure of the relationship between two documents. Journal of the American Society for Information Science, 24(4), 265-269. doi:10.1002/asi.4630240406.
van Eck, N. J., & Waltman, L. (2009). How to normalize co-occurrence data? An analysis of some well-known similarity measures. Journal of the American Society for Information Science and Technology, 60(8), 1635-1651. doi:10.1002/asi.21075.

Vignettes

Two vignettes ship with the package:

vignette("bibnets") — end-to-end workflow: builders, counting and similarity options, attention weighting, network reduction, temporal windows, historiograph, and exports.
vignette("reading-data", package = "bibnets") — readers for each supported source (Scopus, Web of Science, OpenAlex JSON and flat CSV, BibTeX, RIS, Lens, Dimensions, Crossref), the standard schema, generic CSV input, and multi-source merging.

License

MIT
