# Introduction to CitationCompass

CitationCompass is a lightweight package for annotating and extracting citable portions of scientific code from Python modules. It is meant for codebase authors who want to make citable blocks of code easier to discover; it is not designed to analyze unannotated code. In this tutorial we examine the different ways authors can inject citation information.

In [None]:
import citation_compass as cc

## Extracting Citations

Before we demonstrate approaches to annotate the citable blocks of code, let's look at how the user will access them. CitationCompass provides two mechanisms: retrieving everything that has been annotated (`get_all_citations()`) and retrieving annotated items that have been used (`get_used_citations()`). Both of these functions return a list of strings.

Throughout the notebook we use the corresponding wrapper functions `print_all_citations()` and `print_used_citations()`, which display the same information in a human-readable format.

We start by adding an annotated function (we will cover the details of how this works later in the notebook) so we have something to retrieve.

In [None]:
@cc.cite_function
def my_function():
    """This is my function.

    Citation: CitationCompass 2025
    """
    return 0

We can then look at the list of all citations and the list of used citations:

In [None]:
def print_citations_state():
    """Helper function to print the state of the citations."""
    print("\nALL\n---\n")
    cc.print_all_citations()
    print("\nUSED\n----\n")
    cc.print_used_citations()


print_citations_state()

As we can see, the function is included in the list of all citations but not in the list of used citations. This changes once we call the function, which marks it as used.

In [None]:
_ = my_function()
print_citations_state()

## Citation Formats

Each citation is displayed as the thing to cite, including module and name information, followed by the citation string. By default CitationCompass parses the object's docstring. The extractor looks for sections denoted by the keywords 'citation', 'citations', 'reference', or 'references'. These citation sections can be written in either NumPy or Google docstring format.

Underline-delimited sections are detected by section headers of the form "keyword\n-------", with at least 2 dashes making up the underline. The citation section includes all text until the end of the string or the next section header.
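The two docstring layouts described above can be illustrated with a small regex-based extractor. This is only a sketch of the idea, not CitationCompass's actual parser: the function name `extract_citation`, the regular expressions, and the choice to join section lines with newlines are all assumptions made for the example.

```python
import re

# Keywords named in the text above; matching here is case-insensitive (an assumption).
_KEYWORDS = r"(?:citations?|references?)"

# Single-line form: "Citation: some text".
_INLINE = re.compile(
    rf"^[ \t]*{_KEYWORDS}[ \t]*:[ \t]*(?P<text>.+?)[ \t]*$",
    re.IGNORECASE | re.MULTILINE,
)

# Underlined form: keyword on its own line, then a line of at least 2 dashes.
_HEADER = re.compile(
    rf"^[ \t]*{_KEYWORDS}[ \t]*\n[ \t]*-{{2,}}[ \t]*$",
    re.IGNORECASE | re.MULTILINE,
)

# Any underlined header; used to find where the citation section ends.
_ANY_HEADER = re.compile(r"^[ \t]*\S.*\n[ \t]*-{2,}[ \t]*$", re.MULTILINE)


def extract_citation(docstring):
    """Return the citation text in a docstring, or None if none is found."""
    if not docstring:
        return None
    header = _HEADER.search(docstring)
    if header:
        # The section runs to the next underlined header or the end of the string.
        next_header = _ANY_HEADER.search(docstring, header.end())
        end = next_header.start() if next_header else len(docstring)
        lines = [line.strip() for line in docstring[header.end():end].splitlines()]
        return "\n".join(line for line in lines if line)
    inline = _INLINE.search(docstring)
    return inline.group("text") if inline else None


single_line_doc = """This is my function.

    Citation: CitationCompass 2025
    """

underlined_doc = """This is my second function.

    Citation
    --------
    CitationCompass
    February 2025

    Parameters
    ----------
    x : int
        This is an integer
    """

single_line_result = extract_citation(single_line_doc)
underlined_result = extract_citation(underlined_doc)
```

In this sketch the underlined form is checked first, so a docstring that contains both layouts yields the underlined section.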

## Citing Functions and Methods

As we saw above, an author can annotate a function using the `@cite_function` decorator. Cited functions will be included on the all citations list when they are defined (imported) and the used citation list when they are first called. Despite the name containing the word "function", this decorator also works with class methods.

We can add another function (with a more complex docstring) and see how that impacts the citations.

In [None]:
@cc.cite_function
def my_function_2(x):
    """This is my second function.

    Citation
    --------
    CitationCompass
    February 2025

    Parameters
    ----------
    x : int
        This is an integer

    Returns
    -------
    int
        The same integer
    """
    return x


print_citations_state()

Note that both functions are listed in the "all" list, but only the first appears in the "used" list.

Since wrapping a function (in order to track when it is used) adds a little bit of overhead, users may wish to switch off this capability for functions that are called many times. Users can do this by adding a `track_used=False` parameter to the decorator. This will automatically add the function to the "used" list (whether or not it is actually used), so the citation does not get missed.

In [None]:
@cc.cite_function(track_used=False)
def my_function_4(x):
    """This is a function that is not tracked on use.
    Citation: CitationCompass
    """
    return x


print_citations_state()
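The behavior described in this section (register on definition, mark as used on the first call, optionally skip the wrapper) can be sketched with a plain decorator. This is a minimal illustration under stated assumptions, not CitationCompass's implementation: `cite_function_sketch`, `ALL_CITATIONS`, and `USED_CITATIONS` are hypothetical names invented for the example.

```python
import functools

# Hypothetical stand-ins for the library's internal registries.
ALL_CITATIONS = {}
USED_CITATIONS = set()


def cite_function_sketch(func=None, *, track_used=True):
    """Register a function when it is defined; mark it used on its first call."""

    def decorator(fn):
        key = f"{fn.__module__}.{fn.__qualname__}"
        ALL_CITATIONS[key] = fn.__doc__ or ""

        if not track_used:
            # No wrapper means no per-call overhead, so the function is
            # counted as used up front to avoid missing the citation.
            USED_CITATIONS.add(key)
            return fn

        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            USED_CITATIONS.add(key)
            return fn(*args, **kwargs)

        return wrapper

    # Support both @cite_function_sketch and @cite_function_sketch(track_used=False).
    return decorator(func) if func is not None else decorator


@cite_function_sketch
def slow_path(x):
    """Citation: Example 2025"""
    return x


@cite_function_sketch(track_used=False)
def hot_loop(x):
    """Citation: Example 2025"""
    return x
```

Immediately after these definitions, both functions are in `ALL_CITATIONS`, but only `hot_loop` is in `USED_CITATIONS`; `slow_path` joins it the first time it is called.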

## Citing Classes

An author can annotate a class by inheriting from CitationCompass's `CiteClass` class. Cited classes will be included on the all citations list when they are defined and the used citation list when the first object is instantiated.

In [None]:
class ExampleClass(cc.CiteClass):
    """My Example class.

    Citation: Citation here
    """

    def __init__(self):
        self.x = 0


print_citations_state()

We add the class to the "used" list as soon as we create the first instance.

In [None]:
obj = ExampleClass()
print_citations_state()
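One way a base class could provide this "registered on definition, used on first instantiation" behavior is with `__init_subclass__` and `__new__`. The sketch below is an assumption about how such a mechanism might look, not a description of how `CiteClass` is written; `CiteClassSketch`, `CLASS_CITATIONS`, and `USED_CLASSES` are hypothetical names.

```python
# Hypothetical registries for the sketch.
CLASS_CITATIONS = {}
USED_CLASSES = set()


class CiteClassSketch:
    """Base class that records subclasses and their first instantiation."""

    def __init_subclass__(cls, **kwargs):
        # Runs once, when a subclass is defined.
        super().__init_subclass__(**kwargs)
        key = f"{cls.__module__}.{cls.__qualname__}"
        CLASS_CITATIONS[key] = cls.__doc__ or ""

    def __new__(cls, *args, **kwargs):
        # Runs on every instantiation; adding to a set is idempotent.
        USED_CLASSES.add(f"{cls.__module__}.{cls.__qualname__}")
        return super().__new__(cls)


class SketchExample(CiteClassSketch):
    """My example class.

    Citation: Citation here
    """

    def __init__(self):
        self.x = 0
```

Defining `SketchExample` fills `CLASS_CITATIONS`; `USED_CLASSES` stays empty until `SketchExample()` is called.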

## Citing Modules

An author can annotate a module (or submodule) by adding a `cite_module(__name__)` call to the module's file. This automatically determines the name of the current (sub)module and marks it for citation. The module is added to **both** the "all" and "used" lists when `cite_module` is called, which can depend on how the end user imports the code. If the user imports individual functions, the `cite_module` call may not be evaluated. As such, the author may want to include `cite_module(__name__)` in the module's `__init__.py`.

In [None]:
cc.cite_module("citation_compass", "CitationCompass, 2025")
print_citations_state()

Authors can also mark imported modules for citation by passing in a string with the name of that module, such as `cite_module("astropy")`. This allows authors to call out modules they use and know can be cited.

## Adding Citations Inline

If the block of code that needs to be cited does not naturally fit into one of the use cases above (class, function, etc.), an author can manually insert a citation using `cite_inline(name, citation_text)`. The name must be a unique tag for this citation and the citation text can be anything the author wants to display to the end user.

In [None]:
cc.cite_inline("my_citation", "This is my custom citation.")
print_citations_state()

Manual citations are marked as used as soon as they are inserted (the line of code is executed).

## Citing Objects

An author can cite an instantiated object using the `cite_object(obj)` function. We do not expect this to be a typical use case; most authors will want a class-level citation instead. However, citing an object is useful for objects from external packages. Cited objects are referenced by the object's class information and are added to both the all citations and used citations lists as soon as `cite_object` is called.

For example, we could cite an instance of Python's `list`. Since the `list` docstring does not contain any of our citation keywords, the docstring itself is used as the citation text.

In [None]:
my_list = [1, 2, 3]
cc.cite_object(my_list)
print_citations_state()

## Citation Contexts

Users might sometimes want only the citations for elements of code used within a given block of code.  CitationCompass provides a context manager that can perform tracking within a subset of code.

In [None]:
with cc.CitationContext("sub_context") as context:
    print(f"At the context start: {context.get_citations()}")
    _ = my_function()
    print(f"At the context end: {context.get_citations()}")

The context-specific list is automatically cleaned up after the `with` block.

Note that only some citations are tracked at time of use. For example, citations to modules or functions with `track_used=False` will not show up within this context. Therefore care needs to be taken when using a context manager.

In [None]:
with cc.CitationContext("sub_context") as context:
    print(f"At the context start: {context.get_citations()}")
    _ = my_function_4(4)
    print(f"At the context end: {context.get_citations()}")

Citation context managers can be assigned arbitrary (unique) labels and can be nested.

## Duplicate Citation Keys

Ideally, the key for every citable object should be unique. CitationCompass tries to ensure this for most automatically generated citations by including package information in the key for classes and functions. However, there can still be collisions, especially when citations are added manually. When a key is reused with different citation text, CitationCompass appends the new text to the existing entry so that all of it shows up.

Let's start by looking at what happens if we try to re-add the exact same (key, text) combination.

In [None]:
with cc.CitationContext("sub_context2") as context:
    for i in range(5):
        cc.cite_inline("my_repeated_citation", "This is my custom citation.")
        print(f"After {i + 1} cite_inline: {context.get_citations()}")

Nothing changes with the repeated calls. But if we add different citation text, we see an expanded entry.

In [None]:
with cc.CitationContext("sub_context3") as context:
    for i in range(5):
        cc.cite_inline("my_numbered_citation", f"citation text {i}")
        print(f"After {i + 1} cite_inline: {context.get_citations()}")

    cc.cite_inline("my_numbered_citation", "citation text 0")
    print(f"After re-adding call 0 cite_inline: {context.get_citations()}")

We continue to expand the citation string to include the new information when it is not a repeat of anything that has been previously added.
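The merge rule described in this section (ignore exact repeats, append anything new) can be sketched in a few lines. The helper `add_citation`, the dictionary-of-lists registry, and the newline separator are assumptions made for illustration; the library's actual storage and output format may differ.

```python
def add_citation(registry, key, text):
    """Add text under key, ignoring exact repeats; return the combined entry."""
    texts = registry.setdefault(key, [])
    if text not in texts:
        texts.append(text)
    return "\n".join(texts)


registry = {}
for i in range(3):
    combined = add_citation(registry, "my_numbered_citation", f"citation text {i}")

# Re-adding text that is already present leaves the entry unchanged.
combined_after_repeat = add_citation(registry, "my_numbered_citation", "citation text 0")
```

Keeping the individual texts in a list makes the repeat check an exact comparison against each earlier entry instead of a substring search on the combined string.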

## Conclusion

CitationCompass is designed for package authors to annotate their code and to build in functionality for retrieving the annotated citations. For example, a command-line tool may include a flag `--show_citations` that displays the citations (all or used) at the end of the run.
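A command-line flag of this kind could be wired up with `argparse` roughly as follows. The tool, the `main` function, and the `"all"`/`"used"` choices are hypothetical; the printing functions are passed in as arguments so the sketch runs on its own. In a real tool one would presumably pass `cc.print_all_citations` and `cc.print_used_citations`, the helpers used throughout this notebook.

```python
import argparse


def main(argv=None, print_all=None, print_used=None):
    """Run a hypothetical tool and optionally print citations at the end."""
    parser = argparse.ArgumentParser(description="Hypothetical analysis tool.")
    parser.add_argument(
        "--show_citations",
        choices=["all", "used"],
        default=None,
        help="Print the citations (all or used) at the end of the run.",
    )
    args = parser.parse_args(argv)

    # ... the tool's real work would run here ...

    # Report citations last, so the "used" list reflects the whole run.
    if args.show_citations == "all" and print_all is not None:
        print_all()
    elif args.show_citations == "used" and print_used is not None:
        print_used()
    return args.show_citations
```

Printing at the very end matters for the "used" option, since a citation only becomes "used" after the annotated code has run.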

While CitationCompass will also pull in annotations from imported packages that use CitationCompass themselves, we recommend that code authors err on the side of being comprehensive in case the sub-package changes its approach.