# Common edge cases and errors in sample submission

In [None]:
!lndb init --storage "testsample" --schema "bionty,wetlab"

In [None]:
import pandas as pd
import numpy as np
import lamindb as ln
import lamindb.schema as lns
import re
from lnschema_wetlab.dev import parse_and_insert_df, datasets
import pytest

Let's create simple biosample and techsample examples.

In [None]:
biosample = datasets.biosample()
techsample = datasets.techsample()

In [None]:
biosample

In [None]:
techsample

## Case #1: the same user submits duplicate entries

LaminDB's sample submission does not duplicate an existing entry.

When an existing entry is submitted, LaminDB simply returns the existing entry in the return object without raising a warning or an error.
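
The behavior described above resembles a get-or-create pattern. The following is a minimal, self-contained sketch of that idea using an in-memory list. The names `get_or_create` and `store` and the example row are hypothetical and not part of LaminDB's API, whose internals may well differ.

```python
def get_or_create(store, entry):
    """Return the stored entry equal to `entry`, inserting it first if absent.

    Hypothetical helper for illustration only; not LaminDB's implementation.
    """
    for existing in store:
        if existing == entry:
            return existing  # already present: return it silently, no warning
    store.append(entry)
    return entry


store = []  # stands in for a database table
row = {"Name": "sample_a", "Species": "human"}  # made-up example row

first = get_or_create(store, row)
second = get_or_create(store, dict(row))  # identical values, submitted again
```

After both calls, `store` holds a single entry and `second` is the very object that was stored by the first call.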

In [None]:
# Submit samplesheets
res1 = parse_and_insert_df(biosample, "biosample")
res2 = parse_and_insert_df(biosample, "biosample")
res3 = parse_and_insert_df(techsample, "techsample")
res4 = parse_and_insert_df(techsample, "techsample")

# Fetch database entries
species = ln.select(lns.bionty.Species).all()
biosamples = ln.select(lns.wetlab.Biosample).all()
techsamples = ln.select(lns.wetlab.Techsample).all()

# Check for non-duplication
assert len(species) == (
    len(biosample["Species"].unique()) - int(None in biosample["Species"].unique())
)
assert len(biosamples) == len(biosample)
assert len(techsamples) == len(techsample)

## Case #2: different users submit duplicate entries

LaminDB does not duplicate an existing entry, even if it is resubmitted by a different user than the one who first submitted it.

The behavior of `parse_and_insert_df` is the same as in Case #1: the existing entry is returned and no warnings or errors are raised.

In [None]:
!lndb login testuser2@lamin.ai --password goeoNJKE61ygbz1vhaCVynGERaRrlviPBVQsjkhz

In [None]:
# Submit samplesheets
res1 = parse_and_insert_df(biosample, "biosample")
res2 = parse_and_insert_df(techsample, "techsample")

# Fetch database entries
species = ln.select(lns.bionty.Species).all()
biosamples = ln.select(lns.wetlab.Biosample).all()
techsamples = ln.select(lns.wetlab.Techsample).all()

# Check for non-duplication
assert len(species) == len(biosample["Species"].unique())
assert len(biosamples) == len(biosample)
assert len(techsamples) == len(techsample)

## Case #3: user submits an existing table with two additional rows or with small modifications to the values of a column

For any samplesheet, LaminDB will always fetch existing entries and add new entries.

The return object will always contain every entry from the samplesheet (whether it has been fetched or added).

If an existing entry is modified even slightly, LaminDB adds the modified entry as a new entry, provided that the modified field in the samplesheet is a field in the target table. The exception is when a unique constraint exists on a field that has not also been modified: in that case, LaminDB raises an error.

Let's extend our biosample table with two new entries:
* An existing entry named "hc_hea_021" with a single modification on the "Experiment" field.
* An entirely new entry named "cv_con_075".

Both entries will be added to the database since there are no unique constraints in the `Biosample` schema.
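
To illustrate the unique-constraint rule in general terms, here is a small sketch that uses Python's built-in `sqlite3` module rather than LaminDB. The table names, column names, and values are made up for illustration, and the exact error LaminDB raises may differ from the `sqlite3.IntegrityError` shown here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical tables: one without and one with a unique constraint on `name`.
conn.execute("CREATE TABLE sample_free (name TEXT, experiment TEXT)")
conn.execute("CREATE TABLE sample_unique (name TEXT UNIQUE, experiment TEXT)")

original = ("sample_a", "001")  # made-up values
modified = ("sample_a", "002")  # same name, different experiment

# Without a unique constraint, the modified row is stored as a new entry.
conn.execute("INSERT INTO sample_free VALUES (?, ?)", original)
conn.execute("INSERT INTO sample_free VALUES (?, ?)", modified)
n_free = conn.execute("SELECT COUNT(*) FROM sample_free").fetchone()[0]

# With a unique constraint on the unmodified `name` field, the insert fails.
conn.execute("INSERT INTO sample_unique VALUES (?, ?)", original)
try:
    conn.execute("INSERT INTO sample_unique VALUES (?, ?)", modified)
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
```

Here `n_free` counts both rows in the unconstrained table, while `rejected` records that the constrained table refused the modified row.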

In [None]:
new_entries = {
    "Name": ["hc_hea_021", "cv_con_075"],
    "Species": ["human", "human"],
    "Cell Type": ["CD4", "CD8+T"],
    "Experiment": ["001", "007"],
    "Donor": ["021", "027"],
    "Disease": [None, "U07.1"],
    "Custom 1": ["healthy", "convalescent"],
    "Custom 2": ["control", "covid-19"],
    "Custom 3": [12.11, 7.83],
}
biosample_mod = pd.concat([biosample, pd.DataFrame(new_entries)])

# Submit samplesheets
res = parse_and_insert_df(biosample_mod, "biosample")

# Fetch database entries
species = ln.select(lns.bionty.Species).all()
biosamples = ln.select(lns.wetlab.Biosample).all()

# Check for successful addition
assert len(species) == (
    len(biosample_mod["Species"].unique())
    - int(None in biosample_mod["Species"].unique())
)
assert len(biosample) != len(biosample_mod)
assert len(biosamples) == len(biosample_mod)

## Case #4: user submits a sheet with a typo (e.g. misspelled 'human' species as 'humen')

Knowledge-based checks (biology and ontology related) are not yet implemented, so values containing typos are added to the database as they are.
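
Until such checks exist, a submitter could screen a column for likely typos before submission. The sketch below uses the standard library's `difflib` for this. The vocabulary `KNOWN_SPECIES` and the helper `suggest_species` are assumptions made for illustration and are not provided by LaminDB.

```python
import difflib

KNOWN_SPECIES = ["human", "mouse", "rat"]  # assumed vocabulary, for illustration


def suggest_species(value, vocabulary=KNOWN_SPECIES):
    """Return `value` if it is in the vocabulary, else the closest match or None."""
    if value in vocabulary:
        return value
    # get_close_matches returns the best candidates above a similarity cutoff
    matches = difflib.get_close_matches(value, vocabulary, n=1)
    return matches[0] if matches else None


suggestion = suggest_species("humen")
```

A value with no sufficiently similar vocabulary term yields `None`, which a caller could treat as a signal to review the entry by hand.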

## Case #5: user submits entries with an invalid table name (nonexistent table)

LaminDB raises an error when the table name passed to `parse_and_insert_df` does not match an existing schema table. The match must be exact but is case-insensitive.
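
As a rough sketch of what case-insensitive table matching could look like, consider the hypothetical helper below. The function `resolve_table` and the list of table names are assumptions for illustration and do not reflect LaminDB's actual implementation. The error message follows the pattern checked in this notebook.

```python
def resolve_table(name, known_tables):
    """Return the known table whose name equals `name`, ignoring case.

    Hypothetical helper for illustration; not LaminDB's implementation.
    """
    lookup = {table.lower(): table for table in known_tables}
    try:
        return lookup[name.lower()]
    except KeyError:
        # No exact (case-insensitive) match: report the name as given.
        raise ValueError(f"Table {name} does not exist.") from None


tables = ["Biosample", "Techsample"]  # assumed table names
matched = resolve_table("BIOSAMPLE", tables)

try:
    resolve_table("biosample_inexistent", tables)
    error_message = None
except ValueError as err:
    error_message = str(err)
```

Partial matches are rejected on purpose in this sketch: `"biosample_inexistent"` contains `"biosample"` but is not equal to it, so it raises.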

In [None]:
NO_TABLE_MATCH_ERROR = r"Table [a-zA-Z0-9_]* does not exist\."
# An unknown table name must raise a ValueError with the expected message
with pytest.raises(ValueError, match=NO_TABLE_MATCH_ERROR) as excinfo:
    res = parse_and_insert_df(biosample, "biosample_inexistent")
excinfo.exconly()