# Astrophysics Data System (ADS) API Wrapper Tutorial

This notebook demonstrates how to use the improved ADS API wrapper to query the NASA Astrophysics Data System. The wrapper provides a simplified interface for searching papers, retrieving citations, and analyzing academic publications in astronomy and astrophysics.

## Setup

First, you'll need to obtain an API key from the ADS service:
1. Go to https://ui.adsabs.harvard.edu/
2. Create an account or sign in
3. Go to 'Account' → 'Settings' → 'API Token' and generate a token

Let's start by importing the necessary modules and setting up our API key.

In [None]:
import os
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
from IPython.display import display, HTML

# Import our ADS wrapper module
from myads import ADSQueryWrapper

# Set up API key (replace with your own or use environment variable)
# For security, it's better to use an environment variable
ADS_API_TOKEN = os.environ.get('ADS_API_TOKEN', 'your_api_token')

# Create the wrapper instance
ads = ADSQueryWrapper(ADS_API_TOKEN)
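
The wrapper hides the HTTP layer. For reference, the ADS API expects the token as a bearer token in the `Authorization` header. Searches go to `https://api.adsabs.harvard.edu/v1/search/query` with `q`, `fl`, `sort`, and `rows` parameters. The sketch below builds such a request with the standard library but does not send it. It assumes `ADSQueryWrapper` does something similar internally, which this notebook does not show, and `build_search_request` is a hypothetical helper.

```python
import os
from urllib.parse import urlencode
from urllib.request import Request

# ADS v1 search endpoint
ADS_SEARCH_URL = "https://api.adsabs.harvard.edu/v1/search/query"


def build_search_request(token, q, fl, sort=None, rows=10):
    """Hypothetical helper: build (but do not send) an ADS search request."""
    params = {"q": q, "fl": fl, "rows": rows}
    if sort:
        params["sort"] = sort
    url = f"{ADS_SEARCH_URL}?{urlencode(params)}"
    # ADS expects the API token as a bearer token
    return Request(url, headers={"Authorization": f"Bearer {token}"})


req = build_search_request(
    os.environ.get("ADS_API_TOKEN", "your_api_token"),
    q="title:exoplanet AND year:2023",
    fl="title,author,bibcode",
    sort="citation_count desc",
)
print(req.full_url)  # the token travels in the header, not the URL
```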

## Basic Search

Let's start with a basic search for papers. We'll search for papers published in 2023 with "exoplanet" in the title, sorted by citation count.

In [None]:
# Define search parameters
query = "title:exoplanet AND year:2023"
fields = "title,author,bibcode,citation_count,read_count,pubdate"

# Execute the search
result = ads.get(query, fields, sort="citation_count desc", rows=10)

# Display metadata about the search
print(f"Query execution time: {result.query_time} ms")
print(f"Total papers found: {result.num_found}")
print(f"Retrieved papers: {len(result.papers_df)}")

Now let's examine the results using the DataFrame that was created:

In [None]:
# Display the DataFrame with the search results
result.papers_df.head()
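
The `query_time`, `num_found`, and `papers_df` attributes presumably map onto the raw ADS JSON response. That response carries `responseHeader.QTime` (milliseconds), `response.numFound` (total matches), and `response.docs` (the returned records). Below is a minimal sketch of that mapping, assuming the wrapper does roughly this. `parse_search_response` is hypothetical, and the payload is a toy example with placeholder values.

```python
import pandas as pd


def parse_search_response(payload):
    """Hypothetical helper: split an ADS search payload into metadata and a DataFrame."""
    header = payload.get("responseHeader", {})
    body = payload.get("response", {})
    return {
        "query_time": header.get("QTime"),       # milliseconds
        "num_found": body.get("numFound", 0),    # total matches, not just this page
        "papers_df": pd.DataFrame(body.get("docs", [])),
    }


# Toy payload with placeholder values, shaped like an ADS search response
toy = {
    "responseHeader": {"status": 0, "QTime": 7},
    "response": {"numFound": 2, "start": 0, "docs": [
        {"bibcode": "BIBCODE_A", "title": ["First title"], "citation_count": 5},
        {"bibcode": "BIBCODE_B", "title": ["Second title"], "citation_count": 3},
    ]},
}
parsed = parse_search_response(toy)
print(parsed["query_time"], parsed["num_found"], len(parsed["papers_df"]))
```

In the raw API, some fields, such as `title`, come back as lists of strings. Whether the wrapper flattens them is not shown here.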

## Working with Paper Objects

The wrapper also provides a convenient `papers` property that returns `ADSPaper` objects. Let's examine some of these papers.

In [None]:
from itertools import islice

# Show the first 3 papers (islice works whether `papers` is a list or a generator)
for i, paper in enumerate(islice(result.papers, 3), start=1):
    authors = paper.author if isinstance(paper.author, list) else [paper.author]
    print(f"Paper {i}:")
    print(f"  Title: {paper.title}")
    print(f"  First author: {authors[0] if authors else 'n/a'}")
    print(f"  Citations: {paper.citation_count}")
    print(f"  ADS Link: {paper.ads_link}")
    print()
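
The `ads_link` property most likely points at the paper's abstract page in the ADS web interface, which follows the pattern `https://ui.adsabs.harvard.edu/abs/<bibcode>/abstract`. The hypothetical stand-in class below illustrates that pattern and a first-author accessor that is safe on empty lists. The real `ADSPaper` may be implemented differently.

```python
from dataclasses import dataclass, field
from typing import List, Optional
from urllib.parse import quote


@dataclass
class PaperSketch:
    """Hypothetical stand-in for ADSPaper; the real class may differ."""
    bibcode: str
    title: Optional[str] = None
    author: List[str] = field(default_factory=list)
    citation_count: int = 0

    @property
    def first_author(self) -> Optional[str]:
        # None instead of an IndexError when the author list is empty
        return self.author[0] if self.author else None

    @property
    def ads_link(self) -> str:
        # Abstract-page URL pattern of the ADS web UI; quote() escapes '&' in bibcodes
        return f"https://ui.adsabs.harvard.edu/abs/{quote(self.bibcode)}/abstract"


p = PaperSketch(bibcode="BIBCODE_A", title="Example", author=["Doe, J.", "Roe, R."])
print(p.first_author, p.ads_link)
```

URL-quoting matters because some bibcodes contain `&`, as in the journal abbreviation `A&A`.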

## Finding Citations

One common use case is finding papers that cite a specific publication. Let's take one of the papers we found and see who has cited it.

In [None]:
# Get the bibcode of the most cited paper from our search
top_paper_bibcode = result.papers_df.iloc[0]['bibcode']
top_paper_title = result.papers_df.iloc[0]['title']

print(f"Finding citations for: {top_paper_title}")
print(f"Bibcode: {top_paper_bibcode}")

# Query for citations
citations = ads.citations(
    top_paper_bibcode,
    fl="title,author,bibcode,pubdate,citation_count",
    rows=20
)

print(f"\nFound {citations.num_found} papers that cite this work")
citations.papers_df[['title', 'pubdate', 'citation_count']].head(5)

## Finding References

Similarly, we can find papers that are referenced by a specific publication.

In [None]:
# Query for references
references = ads.references(
    top_paper_bibcode,
    fl="title,author,bibcode,pubdate,citation_count",
    rows=20
)

print(f"Found {references.num_found} papers referenced by this work")

# Show the referenced papers
references.papers_df[['title', 'pubdate', 'citation_count']].head(5)
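
ADS exposes citation and reference lookups as second-order query operators, `citations()` and `references()`. The two methods above can therefore presumably be expressed as ordinary searches. The wrapper's internals are not shown, so treat this as an assumption. Both helper names below are hypothetical.

```python
def citations_query(bibcode: str) -> str:
    """ADS query string for papers that cite `bibcode`."""
    return f'citations(bibcode:"{bibcode}")'


def references_query(bibcode: str) -> str:
    """ADS query string for papers that `bibcode` cites."""
    return f'references(bibcode:"{bibcode}")'


print(citations_query("2023XXXX..999..1A"))
print(references_query("2023XXXX..999..1A"))
```

If the assumption holds, `ads.get(citations_query(top_paper_bibcode), "title,bibcode", rows=20)` should return the same papers as `ads.citations(...)`.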

## Author Search

Let's search for papers by a specific author and analyze their publication history.

In [None]:
# Search for papers by a well-known astronomer
author = "Seager, S."
author_papers = ads.search_author(
    author,
    fl="title,bibcode,author,citation_count,pubdate,read_count",
    sort="citation_count desc",
    rows=100
)

print(f"Found {author_papers.num_found} papers by {author}")
print(f"Retrieved {len(author_papers.papers_df)} papers")

# Show the most cited papers
author_papers.papers_df[['title', 'citation_count', 'years_since_pub']].head(5)
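
`years_since_pub` and `citation_count_per_year` are not ADS fields. The wrapper appears to derive them from `pubdate`. The sketch below shows one plausible definition. The wrapper's actual formula, for instance how it treats papers from the current year, is an assumption, and `add_age_metrics` is a hypothetical helper.

```python
import pandas as pd


def add_age_metrics(df: pd.DataFrame, current_year: int) -> pd.DataFrame:
    """Hypothetical reconstruction of the wrapper's derived columns."""
    out = df.copy()
    # ADS pubdate strings look like "YYYY-MM-00"; keep only the year (NaN if unparsable)
    year = pd.to_numeric(out["pubdate"].astype(str).str[:4], errors="coerce")
    # Whole years elapsed since publication (0 for papers from current_year)
    out["years_since_pub"] = (current_year - year).clip(lower=0)
    # Add 1 so papers from the current year do not divide by zero (an assumption)
    out["citation_count_per_year"] = out["citation_count"] / (out["years_since_pub"] + 1)
    return out


demo = pd.DataFrame({
    "pubdate": ["2020-05-00", "2023-01-00", None],
    "citation_count": [10, 4, 1],
})
out = add_age_metrics(demo, current_year=2024)
print(out[["years_since_pub", "citation_count_per_year"]])
```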

## Visualization: Publication History

Let's visualize the author's publication history.

In [None]:
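# Extract publication years (ADS pubdate strings look like "YYYY-MM-00");
# the nullable Int64 dtype keeps axis labels as 2010 rather than 2010.0
author_papers.papers_df['pub_year'] = pd.to_numeric(
    author_papers.papers_df['pubdate'].astype(str).str[:4], errors='coerce'
).astype('Int64')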

# Count publications per year
pub_counts = author_papers.papers_df['pub_year'].value_counts().sort_index()

# Plot
plt.figure(figsize=(12, 6))
pub_counts.plot(kind='bar', color='skyblue')
plt.title(f'Publications by {author} per Year')
plt.xlabel('Year')
plt.ylabel('Number of Publications')
plt.grid(axis='y', linestyle='--', alpha=0.7)
plt.tight_layout()
plt.show()
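
`value_counts()` only reports years with at least one paper, so gaps in a career disappear from the bar chart. The small sketch below reindexes over the full year range and fills missing years with zero. `counts_with_gaps` is a hypothetical helper.

```python
import pandas as pd


def counts_with_gaps(years: pd.Series) -> pd.Series:
    """Count papers per year, including years with zero papers."""
    years = years.dropna().astype(int)
    if years.empty:
        return pd.Series(dtype=int)
    counts = years.value_counts().sort_index()
    # Every year from the first to the last publication, zero where nothing appeared
    return counts.reindex(range(years.min(), years.max() + 1), fill_value=0)


print(counts_with_gaps(pd.Series([2001, 2001, 2004, None])).to_dict())
# → {2001: 2, 2002: 0, 2003: 0, 2004: 1}
```

For example, `counts_with_gaps(author_papers.papers_df['pub_year']).plot(kind='bar')` would draw the same chart with empty years shown.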

## Visualization: Citation Impact

Now let's analyze the citation impact of the author's papers.

In [None]:
# Create a scatter plot of citations vs. years since publication
plt.figure(figsize=(12, 8))
plt.scatter(
    author_papers.papers_df['years_since_pub'],
    author_papers.papers_df['citation_count'],
    alpha=0.7,
    c=author_papers.papers_df['citation_count_per_year'],
    cmap='viridis',
    s=100
)

plt.colorbar(label='Citations per Year')
plt.title(f'Citation Impact of Papers by {author}')
plt.xlabel('Years Since Publication')
plt.ylabel('Total Citations')
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
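
A single-number summary often paired with plots like this is the h-index: the largest h such that h papers each have at least h citations. The sketch below computes it from a list of citation counts. Two caveats apply. The search above retrieved only the 100 most-cited papers, so the result is exact only if it comes out below 100. An author query such as "Seager, S." may also match other researchers with the same surname and initial.

```python
import math


def h_index(citation_counts) -> int:
    """Largest h such that h papers have at least h citations each."""
    # Drop missing values (None or NaN), then sort most-cited first
    counts = sorted(
        (int(c) for c in citation_counts
         if c is not None and not (isinstance(c, float) and math.isnan(c))),
        reverse=True,
    )
    h = 0
    for rank, cites in enumerate(counts, start=1):
        if cites < rank:
            break
        h = rank
    return h


print(h_index([10, 8, 5, 4, 3]))  # → 4
```

Applied here, that would be `h_index(author_papers.papers_df['citation_count'])`.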

## Using Pagination to Get All Results

The ADS API returns at most 2000 rows per request, so larger result sets have to be fetched in pages. The wrapper's `get_all_results` method handles this pagination for us.

In [None]:
# Search for a broader topic that might have many papers
query = "title:galaxy AND title:evolution"
fields = "title,bibcode,author,citation_count,pubdate"

# Get multiple pages of results (capped at 3000 papers for this example)
all_results = ads.get_all_results(query, fields, sort="citation_count desc", max_results=3000)

print(f"Retrieved {len(all_results)} pages of results")

# Combine all pages into one DataFrame (an empty DataFrame if nothing came back)
combined_df = pd.concat([page.papers_df for page in all_results], ignore_index=True) if all_results else pd.DataFrame()

print(f"Total papers retrieved: {len(combined_df)}")
combined_df.head()
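
Pagination in the ADS API works through the `start` (offset) and `rows` parameters. The sketch below shows the offset arithmetic and a loop that stops early if a page comes back empty. `fetch_page` stands in for a real API call and, like the other helper names, is hypothetical. How `get_all_results` actually pages is an assumption.

```python
def page_offsets(max_results: int, page_size: int = 2000):
    """(start, rows) pairs that together cover the first max_results hits."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    return [(start, min(page_size, max_results - start))
            for start in range(0, max_results, page_size)]


def fetch_all(fetch_page, num_found: int, max_results: int, page_size: int = 2000):
    """Collect records page by page; fetch_page(start, rows) is a hypothetical callable."""
    docs = []
    # Never ask for more than the search actually matched
    for start, rows in page_offsets(min(num_found, max_results), page_size):
        batch = fetch_page(start, rows)
        if not batch:  # stop early if the server returns an empty page
            break
        docs.extend(batch)
    return docs


# Fake fetcher standing in for real API calls: returns record numbers
def fake_fetch(start, rows):
    return list(range(start, start + rows))


print(page_offsets(3000))  # → [(0, 2000), (2000, 1000)]
docs = fetch_all(fake_fetch, num_found=2500, max_results=3000)
print(len(docs))  # → 2500
```

With `max_results=3000` and 2000-row pages, this issues two requests: rows 0 to 1999, then rows 2000 to 2999.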

## Advanced Analysis: Finding Collaboration Networks

Let's do a more advanced analysis to find frequent collaborators of our author.

In [None]:
# Extract all co-authors
from collections import Counter

# ADS often stores full names ("Seager, Sara"), so an exact match against
# "Seager, S." can fail; compare on the "Last, F" prefix instead.
# Note: this also skips any co-author who shares that surname and initial.
main_author_prefix = author.lower().rstrip('.')
co_authors = Counter()

for paper in author_papers.papers:
    if isinstance(getattr(paper, 'author', None), list):
        for co_author in paper.author:
            if not co_author.lower().startswith(main_author_prefix):  # Skip the main author
                co_authors[co_author] += 1

# Create DataFrame of collaborators, most frequent first
collaborators_df = pd.DataFrame(co_authors.most_common(), columns=['co_author', 'papers_together'])

# Display top collaborators
print(f"Top collaborators with {author}:")
collaborators_df.head(10)
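
Author strings in ADS vary between papers ("Seager, S." vs. "Seager, Sara"), and accented names may be spelled with or without diacritics. Exact string matching therefore undercounts collaborations and can leave the main author in the list. One common approach is to compare a normalized "surname, first initial" key. Below is a minimal sketch. `name_key` is a hypothetical helper, and its key will merge distinct people who share a surname and initial.

```python
import unicodedata


def name_key(name: str) -> str:
    """Reduce 'Last, First M.' to 'last, f' for loose matching (hypothetical helper)."""
    # Decompose accented letters and drop the combining marks, so 'Müller' -> 'Muller'
    ascii_name = unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode("ascii")
    last, _, first = ascii_name.partition(",")
    initial = first.strip()[:1].lower()
    return f"{last.strip().lower()}, {initial}" if initial else last.strip().lower()


print(name_key("Seager, Sara"), "|", name_key("Seager, S."))  # → seager, s | seager, s
```

For example, a co-author could be skipped when `name_key(co_author) == name_key(author)`, and counts could be keyed on `name_key(co_author)` to merge spelling variants.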

## Visualizing Top Collaborators

In [None]:
# Plot top 15 collaborators
plt.figure(figsize=(12, 8))
top_n = 15
top_collaborators = collaborators_df.head(top_n)

# Create horizontal bar plot
sns.barplot(data=top_collaborators, y='co_author', x='papers_together', palette='viridis')
plt.title(f'Top {top_n} Collaborators with {author}')
plt.xlabel('Number of Papers Together')
plt.ylabel('Co-author')
plt.tight_layout()
plt.show()

## Research Topic Analysis

Let's analyze the research topics by looking at paper titles.

In [None]:
# Simple word count analysis from titles
from collections import Counter
import re

# Common words to exclude
stop_words = {'the', 'a', 'an', 'and', 'in', 'of', 'to', 'for', 'on', 'with', 'from'}

# Extract words from titles
all_words = []
for title in author_papers.papers_df['title']:
    if isinstance(title, str):
        words = re.findall(r'\b[a-zA-Z]{3,}\b', title.lower())
        all_words.extend([w for w in words if w not in stop_words])

# Count words
word_counts = Counter(all_words)
top_words = pd.DataFrame({
    'word': list(word_counts.keys()),
    'count': list(word_counts.values())
}).sort_values('count', ascending=False)

# Display top words
print(f"Most common words in paper titles by {author}:")
top_words.head(15)

## Creating a Word Cloud

In [None]:
# Create a word cloud
try:
    from wordcloud import WordCloud
    
    # Generate word cloud
    wordcloud = WordCloud(width=800, height=400, background_color='white', 
                          max_words=100, contour_width=3, contour_color='steelblue')
    
    # Generate from frequencies
    wordcloud.generate_from_frequencies(word_counts)
    
    # Display
    plt.figure(figsize=(16, 8))
    plt.imshow(wordcloud, interpolation='bilinear')
    plt.axis('off')
    plt.title(f'Common Research Topics in Papers by {author}', fontsize=16)
    plt.tight_layout()
    plt.show()
except ImportError:
    print("WordCloud package not installed. Install with: pip install wordcloud")

## Comparative Analysis

Let's compare the publication patterns of two researchers in the same field.

In [None]:
# Define a second author to compare with
author2 = "Marcy, G."

# Get papers for the second author
author2_papers = ads.search_author(
    author2,
    fl="title,bibcode,author,citation_count,pubdate,read_count",
    sort="citation_count desc",
    rows=100
)

print(f"Found {author2_papers.num_found} papers by {author2}")

# Extract publication years for the second author (the first author's were computed earlier)
author2_papers.papers_df['pub_year'] = pd.to_numeric(
    author2_papers.papers_df['pubdate'].astype(str).str[:4], errors='coerce'
).astype('Int64')

# Count publications per year for second author
pub_counts2 = author2_papers.papers_df['pub_year'].value_counts().sort_index()

# Combine data for comparison
comparison_df = pd.DataFrame({
    f"{author}": pub_counts,
    f"{author2}": pub_counts2
}).fillna(0)

# Plot comparison (DataFrame.plot creates its own figure, so no plt.figure() call is needed)
comparison_df.plot(kind='bar', figsize=(14, 8))
plt.title(f'Publication Comparison: {author} vs {author2}')
plt.xlabel('Year')
plt.ylabel('Number of Publications')
plt.grid(axis='y', linestyle='--', alpha=0.7)
plt.legend()
plt.tight_layout()
plt.show()

## Analyzing Citation Impact over Time

Let's compare the citation impact of both authors.

In [None]:
# Calculate average citations per paper by year
citation_by_year1 = author_papers.papers_df.groupby('pub_year')['citation_count'].mean()
citation_by_year2 = author2_papers.papers_df.groupby('pub_year')['citation_count'].mean()

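# Combine data; years in which an author has no papers stay NaN (gaps in the line)
# instead of being reported as an average of 0 citations
citation_comparison = pd.DataFrame({
    f"{author}": citation_by_year1,
    f"{author2}": citation_by_year2
})

# Plot (DataFrame.plot creates its own figure, so no plt.figure() call is needed)
citation_comparison.plot(kind='line', marker='o', figsize=(14, 8))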
plt.title('Average Citations per Paper by Publication Year')
plt.xlabel('Publication Year')
plt.ylabel('Average Citations')
plt.grid(True, alpha=0.3)
plt.legend()
plt.tight_layout()
plt.show()

## Finding Related Papers

Let's explore papers related to a specific topic and analyze their connections.

In [None]:
# Search for papers on a specific topic
topic = "exoplanet atmospheres"
topic_papers = ads.get(
    f"abs:\"{topic}\"",
    fl="title,abstract,bibcode,author,citation_count,pubdate",
    sort="citation_count desc",
    rows=50
)

print(f"Found {topic_papers.num_found} papers about {topic}")

# Display the most influential papers
topic_papers.papers_df[['title', 'citation_count', 'pubdate']].head(5)

## Conclusion

In this tutorial, we've explored the various capabilities of the ADS API wrapper:

1. Basic paper searches
2. Citation analysis
3. Author-specific queries
4. Publication patterns and metrics
5. Collaboration networks
6. Research topic analysis
7. Comparative bibliometrics

The wrapper makes it easy to perform complex queries and analyze publication data for astronomical research. You can extend these examples to build more sophisticated analyses for your specific research needs.