# Notebook #4 for ARC/SSAD Project

The final task for this project is to create an ADS Library of the ARC/SSAD matches, curate the remaining missing items into ADS, and then create a separate library for the newly added items.

## Task 4: Curate missing items and create ADS Libraries

Outline:
- Step 1: Curate records of missing items
- Step 2: Create ADS Library of matched items
- Step 3: Create ADS Library of newly added items

### Step 2: Create ADS Library of Matched Items

```python
import pandas as pd
import numpy as np

# Open my Excel sheet as a data frame
df = pd.read_excel("AHED/ADS-Matched-with-notes.xlsx")

# Take the BIBCODE column, treat the "indexed"/"Indexed" notes as missing,
# then drop nulls and duplicates
bib_series = df['BIBCODE'].replace(['indexed', 'Indexed'], np.nan)
bib_series = bib_series.dropna()
bib_series = bib_series.drop_duplicates(keep="last")

# Format bibcodes as a list
bibs = bib_series.to_list()
print("Number of bibcodes:", len(bibs))
```
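The cleanup can be checked without the spreadsheet. A minimal sketch, assuming pandas and NumPy are installed, that wraps the same three steps in a hypothetical helper `clean_bibcodes` and runs it on a few made-up bibcodes:

```python
import numpy as np
import pandas as pd


def clean_bibcodes(bibcodes: pd.Series) -> list:
    """Return unique bibcodes, dropping blanks and 'indexed'/'Indexed' notes.

    Same steps as the cell above: placeholder notes become NaN, NaNs are
    dropped, and duplicates are removed keeping the last occurrence.
    """
    cleaned = bibcodes.replace(["indexed", "Indexed"], np.nan)
    cleaned = cleaned.dropna()
    cleaned = cleaned.drop_duplicates(keep="last")
    return cleaned.to_list()


# Hypothetical bibcodes, for illustration only
example = pd.Series([
    "2021ApJ...900....1A",
    "indexed",
    None,
    "2020AJ....160...10B",
    "2021ApJ...900....1A",
])
print(clean_bibcodes(example))
```

Because of `keep="last"`, the repeated bibcode survives at its later position, so it is listed after `2020AJ....160...10B`. Order does not matter to an ADS library, but it can be surprising when comparing against the spreadsheet.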

```python
import os
import requests

# --- API REQUEST ---
# Read the ADS API token from an environment variable instead of hard-coding it
# (the variable name ADS_API_TOKEN is my choice; use whichever name you set)
token = os.environ["ADS_API_TOKEN"]
url = "https://api.adsabs.harvard.edu/v1/biblib/libraries"

data = {
    "name": "ARC/SSAD Library",
    "description": "Library of records ADS matches of ARC/SSAD holdings",
    "public": True,
    "bibcode": bibs
}
headers = {'Authorization': 'Bearer ' + token}
# json= serializes the payload and sets the JSON Content-Type header
response = requests.post(url, json=data, headers=headers)

print(response.status_code)
```
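To inspect exactly what gets sent before creating a library, the same POST can be assembled without sending it. A sketch using only the standard library's `urllib.request` in place of `requests`; the helper `build_library_request` and the `MY_TOKEN` placeholder are hypothetical:

```python
import json
import urllib.request

ADS_LIBRARIES_URL = "https://api.adsabs.harvard.edu/v1/biblib/libraries"


def build_library_request(name, description, bibcodes, token, public=True):
    """Build (but do not send) the POST request that creates an ADS library."""
    payload = {
        "name": name,
        "description": description,
        "public": public,
        "bibcode": list(bibcodes),
    }
    return urllib.request.Request(
        ADS_LIBRARIES_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": "Bearer " + token,
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_library_request(
    "ARC/SSAD Library", "Test library", ["2021ApJ...900....1A"], token="MY_TOKEN"
)
print(req.get_method(), req.full_url)
print(json.loads(req.data))
# Sending it would be: urllib.request.urlopen(req)
```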

### Step 3: Create ADS Library of Added/Curated Items

```python
import pandas as pd

# The tagged-format .txt files of records I made
paths = [
    '/Users/sao/Documents/Curation/2021-10_AHED Missing.txt',
    '/Users/sao/Documents/Curation/2021-10_CCTP Book.txt',
    '/Users/sao/Documents/Curation/2021-10_SRML Book.txt',
]

# Read every line of every file, in order. Reading the files directly keeps
# each file's first line; pd.read_csv's default header=0 would consume it as
# column names. (UTF-8 encoding is an assumption about these files.)
lines = []
for path in paths:
    with open(path, encoding='utf-8') as f:
        lines.extend(f.read().splitlines())

# Join all the lines together in one column
bib_list = pd.DataFrame({'data': lines})
bib_list
```

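```python
# Isolate the bibcodes by keeping only the rows that start with "%R"
# (.copy() avoids pandas' SettingWithCopyWarning on the assignment below)
rslt = bib_list[bib_list['data'].str.startswith('%R')].copy()

# Remove the "%R" tag and surrounding whitespace. str.lstrip('%R ') would strip
# any run of leading '%', 'R' or space characters, not just the prefix.
rslt['data'] = rslt['data'].str.replace(r'^%R\s*', '', regex=True).str.strip()
print('Number of bibcodes:', len(rslt), '\n', rslt)
```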
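The same `%R` filtering can be written in plain Python, which makes it easy to test on a handful of lines. A sketch, assuming each bibcode line is `%R`, then whitespace, then the bibcode; the helper `extract_bibcodes` and the sample lines are hypothetical:

```python
import re

# "%R" at the very start of the line, at least one whitespace character,
# then the bibcode itself (a run of non-whitespace characters)
BIBCODE_LINE = re.compile(r"^%R\s+(\S+)")


def extract_bibcodes(lines):
    """Return the bibcodes found on '%R' lines, in file order."""
    bibcodes = []
    for line in lines:
        match = BIBCODE_LINE.match(line)
        if match:
            bibcodes.append(match.group(1))
    return bibcodes


sample = [
    "%R 2021hypo.book....1A",
    "%T A hypothetical title",
    "%A Author, A.",
    "",
    "%R 2021hypo.book....2B",
]
print(extract_bibcodes(sample))
```

Requiring whitespace after `%R` is slightly stricter than `startswith('%R')`: if the files ever contained a longer tag beginning with `%R`, it would not be picked up as a bibcode.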

```python
# Convert to a list
library = rslt['data'].to_list()
library
```
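Unlike the Step 2 list, `library` is not deduplicated before upload. A minimal sketch of a pre-upload check, assuming standard 19-character bibcodes; `dedupe_and_check` is a hypothetical helper, and a flagged entry only deserves a second look rather than being necessarily wrong:

```python
def dedupe_and_check(bibcodes):
    """Drop repeats (keeping the first occurrence) and flag non-19-character entries."""
    seen = set()
    unique, suspicious = [], []
    for code in bibcodes:
        code = code.strip()
        if code in seen:
            continue
        seen.add(code)
        unique.append(code)
        if len(code) != 19:
            suspicious.append(code)
    return unique, suspicious


# Hypothetical input: one duplicate and one truncated entry
codes = ["2021hypo.book....1A", "2021hypo.book....1A", "2021hypo"]
unique, suspicious = dedupe_and_check(codes)
print(unique)
print(suspicious)
```

In the notebook this would be applied as `library, suspicious = dedupe_and_check(library)` before building the request.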

```python
import os
import requests

# --- API REQUEST ---
# Read the ADS API token from an environment variable instead of hard-coding it
# (the variable name ADS_API_TOKEN is my choice; use whichever name you set)
token = os.environ["ADS_API_TOKEN"]
url = "https://api.adsabs.harvard.edu/v1/biblib/libraries"

data = {
    "name": "ARC/SSAD Library 2",
    "description": "Library of records ADS added/curated from references of ARC/SSAD holdings",
    "public": True,
    "bibcode": library
}
headers = {'Authorization': 'Bearer ' + token}
# json= serializes the payload and sets the JSON Content-Type header
response = requests.post(url, json=data, headers=headers)

print(response.status_code)
```