# ImpactDB Paper Upload Demo
This notebook demonstrates how papers can be added to [ImpactDB](https://impact-database.com) using an API.


Users must authenticate with a token before they can upload data to the database. To find your token, sign in to ImpactDB and navigate to your user information page: [ImpactDB User Info Page](https://impact-database.com)

Note: the API can be tested against this local URL: http://localhost:5001/impact-db/us-central1/uploadPaper
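
As a minimal sketch of the token step, the request headers can be built from a token kept outside the notebook. The environment variable name `IMPACTDB_TOKEN` and the helper `build_headers` are assumptions made for illustration; they are not part of ImpactDB.

```python
import os


def build_headers(token):
    """Build request headers carrying the token as a Bearer credential.

    Hypothetical helper: rejects an empty token or the unedited placeholder.
    """
    if not token or token == "YOUR_JWT_TOKEN":
        raise ValueError("Set a real token from your ImpactDB user info page")
    return {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {token}",
    }


# IMPACTDB_TOKEN is an assumed variable name, chosen only for this example
token = os.environ.get("IMPACTDB_TOKEN", "")
if token:
    headers = build_headers(token)
    print("Headers ready:", sorted(headers))
else:
    print("IMPACTDB_TOKEN is not set; no headers built")
```

Reading the token from the environment keeps it out of the saved notebook, which matters if the notebook is shared.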


### Load Imports

In [1]:
import json
import requests
import pandas as pd
import datetime

from IPython.display import HTML, display

custom_css = """
<style>
    div.cell_output {
        max-width: 1920px;
        overflow-x: auto;
        display: block;
    }
</style>
"""

display(HTML(custom_css))



### Define a function to handle date validation

In [2]:
# Helper: return a YYYY-MM-DD string for a valid date string or an integer year, otherwise None
def validate_or_convert_date(date_value):
    if isinstance(date_value, str):
        try:
            datetime.datetime.strptime(date_value, '%Y-%m-%d')
            return date_value  # It's a valid date string
        except ValueError:
            return None  # It's not a valid date string
    elif isinstance(date_value, int):
        return f"{date_value}-01-01"  # Convert year to YYYY-MM-DD
    else:
        return None  # Not a string or integer, invalid format
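
The behaviour described above can be checked on a few sample values. The sketch below uses a hypothetical variant, `normalize_date`, which is not the notebook's function: it assumes that years may also arrive as other integer types (for example NumPy integers, which register as `numbers.Integral`) and that booleans should be rejected.

```python
import datetime
import numbers


def normalize_date(date_value):
    """Hypothetical variant of validate_or_convert_date.

    Returns a YYYY-MM-DD string, or None if the value cannot be used.
    """
    if isinstance(date_value, bool):
        return None  # bool is an Integral subtype, but it is not a year
    if isinstance(date_value, str):
        try:
            datetime.datetime.strptime(date_value, "%Y-%m-%d")
        except ValueError:
            return None  # not a parseable date string
        return date_value
    if isinstance(date_value, numbers.Integral):
        return f"{int(date_value):04d}-01-01"  # treat the integer as a year
    return None  # floats, None, and anything else are rejected


for sample in ["2005-04-12", "April 2005", 2005, 2005.0, None]:
    print(repr(sample), "->", normalize_date(sample))
```

Note that a float year such as `2005.0` is rejected here; whether it should instead be converted depends on how the source CSV encodes missing years.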

### 1. Upload paper data
These papers are from the E. coli database

In [3]:
e_coli_papers = pd.read_csv('../data/ecoli_robust_cleaned.csv')

# Convert the first DataFrame row to a dictionary
first_paper = e_coli_papers.iloc[0].to_dict()

# Normalize the date to YYYY-MM-DD (None if it cannot be validated)
first_paper['date'] = validate_or_convert_date(first_paper['date'])

first_paper

{'title': 'Growing E. coli to high cell density—a historical perspective on method development',
 'authors': 'J Shiloach, R Fass',
 'journal': 'Biotechnology advances',
 'date': '2005-01-01',
 'doi': 'https://doi.org/10.1016/j.biotechadv.2005.04.004',
 'abstract': "E. coli\xa0is the major bacterial platform for expressing simple heterologous proteins. Growing\xa0E. coli\xa0to high densities has been the subject of numerous studies since the early 1970s, exploring the limits of\xa0bacterial culture\xa0density in order to achieve maximum productivity. Research strategies were focused on improving the cultivation techniques, manipulating the bacteria's physiology or both. As a result, batch, fed batch and dialysis\xa0fermentation techniques\xa0had been developed. These growth strategies, together with optimization of media composition and the application of molecular biology methods, made it possible to grow\xa0E. coli\xa0to cell densities of up to 190 g/l (dry weight), while avoi

### Upload one paper

In [4]:
# URL of the API endpoint (commented out) or the local testing endpoint (active)
# url = 'https://us-central1-impact-db.cloudfunctions.net/uploadExperimentalData'
url = 'http://localhost:5001/impact-db/us-central1/uploadPaper'

# replace YOUR_JWT_TOKEN with the token you get from your user info page: https://impact-database.com/userinfo
your_jwt_token = "YOUR_JWT_TOKEN"

# headers for the request
headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {your_jwt_token}",
}

# data to be uploaded
data = {
    'species': 'escherichia',
    'paper': first_paper,
}

# send the POST request to the API
response = requests.post(url, data=json.dumps(data), headers=headers)

print(response.text)

Error: Invalid JWT. Please include a valid JWT from https://impact-database.com/userinfo.
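
One thing to watch when serializing a row taken from a DataFrame: missing values arrive as float `NaN`, and `json.dumps` writes them as the bare token `NaN`, which is not valid JSON. The sketch below shows one way to guard against that. The helper `drop_nan` and the sample values are made up for illustration, and whether the API accepts `null` for a missing field is an assumption.

```python
import json
import math


def drop_nan(paper):
    """Return a copy of the paper dict with float NaN values replaced by None."""
    return {
        key: (None if isinstance(value, float) and math.isnan(value) else value)
        for key, value in paper.items()
    }


# Made-up sample row with a missing journal
paper = {"title": "Example title", "journal": float("nan"), "date": "2005-01-01"}

payload = {"species": "escherichia", "paper": drop_nan(paper)}

# allow_nan=False makes json.dumps raise instead of silently emitting NaN
body = json.dumps(payload, allow_nan=False)
print(body)
```

The resulting `body` string can be passed as `data=body` in the same `requests.post` call used above.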


### An example loop to upload many papers

In [5]:
for _, row in e_coli_papers[:3].iterrows():
    paper = row.to_dict()

    paper['date'] = validate_or_convert_date(paper['date'])

    data = {
        'species': 'escherichia',
        'paper': paper,
    }
    response = requests.post(url, data=json.dumps(data), headers=headers)

    print(response.text)


Error: Invalid JWT. Please include a valid JWT from https://impact-database.com/userinfo.
Error: Invalid JWT. Please include a valid JWT from https://impact-database.com/userinfo.
Error: Invalid JWT. Please include a valid JWT from https://impact-database.com/userinfo.
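
When uploading many papers it can help to collect the outcome of each request rather than only printing it. The following is a sketch under stated assumptions: `upload_all` is a hypothetical helper, `post` is any callable with the same call shape as `requests.post`, and the 30 second timeout is an arbitrary choice. It also assumes the API signals failure through the HTTP status code, which the notebook does not confirm.

```python
import json


def upload_all(papers, post, url, headers, species="escherichia"):
    """Post each paper and return a list of (title, ok, text) tuples.

    `papers` is an iterable of dicts; `post` is a callable such as requests.post.
    """
    results = []
    for paper in papers:
        body = json.dumps({"species": species, "paper": paper})
        response = post(url, data=body, headers=headers, timeout=30)
        # response.ok is True for status codes below 400
        results.append((paper.get("title"), response.ok, response.text))
    return results
```

In this notebook it could be called as `upload_all(papers, requests.post, url, headers)`, where `papers` is a list of row dictionaries whose dates have already been normalized. Filtering the result for entries where `ok` is `False` gives the papers that need to be retried.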
