# DeepCCS
This notebook contains all the code needed to reproduce the results of the "CCS prediction" section of the DeepCCS paper.
It is also an example of how to use the DeepCCS Python module.
For most users, we recommend the command line interface, which is much easier to use if you have no prior knowledge of Python and/or machine learning.

In [1]:
import numpy as np
import h5py as h5
import datetime
from DeepCCS.utils import *
from DeepCCS.model.DeepCCS import DeepCCSModel
from DeepCCS.model.encoders import AdductToOneHotEncoder, SmilesToOneHotEncoder
from DeepCCS.model.splitter import SMILESsplitter
from keras.callbacks import ModelCheckpoint
from keras import backend as K
from sklearn.metrics import r2_score, mean_absolute_error, median_absolute_error
from sklearn.model_selection import train_test_split
import seaborn as sns
from matplotlib import pyplot
from plotly.offline import init_notebook_mode, iplot, plot
from plotly import graph_objs as go
init_notebook_mode()

Using TensorFlow backend.


In [2]:
# PATHs and variables
DATE = datetime.datetime.now().strftime("%Y-%m-%d-")  # "YYYY-MM-DD-" prefix for saved files
datafile = "DATASETS.h5"
models_path = "ModelsAndTestSets/"

## Read data
We first read the data from the HDF5 file supplied with DeepCCS. This file was compiled from several sources (see the references in the README); isomeric SMILES were retrieved from PubChem through its web API.

In [3]:
datasets_names = ["MetCCS_train_pos", "MetCCS_train_neg", "MetCCS_test_pos", "MetCCS_test_neg", 
                  "Astarita_pos", "Astarita_neg","Baker", "McLean", "CBM"] 

datasets = [read_dataset(datafile, d_name) for d_name in datasets_names] # Read
datasets = [filter_data(d_set) for d_set in datasets] # Filter

# Data exploration
This step will help in removing unwanted data from the dataset and getting to know the data.

In [None]:
#Look at SMILES length using a violin plot
#Exceptionally uses seaborn because plotly does not provide an easy way to create a violin plot.

#SMILES = [d["SMILES"] for d in datasets]
#smiles_splitter = SMILESsplitter()
#l_split_smiles = []
#sources = []
#for j, d in enumerate(SMILES):
#    l = [len(smiles_splitter.split(i)) for i in d]
#    l_split_smiles += l
#    sources += [datasets_names[j]] * len(l)
    


#sns.set(rc={'figure.figsize':(11.7,8.27)})
#sns.set_style("whitegrid")
#ax = sns.violinplot(y=l_split_smiles, x=sources)
#pyplot.show()

# Model training
We will train models in two ways:
 * Single dataset split --> ten models
 * 10 different splits --> one model per split

In both cases, the datasets are split using the partitioning from the paper.

## Single split
We first split the datasets into training, validation and test sets. Then we train 10 models whose weights are initialized randomly.

In [4]:
np.random.seed(777)
save_test_sets_data = {}
pooled_set = []
test_sets = []
train_set = []
validation_set = []
test_sets_names = []
for i, dset in enumerate(datasets_names):
    if dset in ["MetCCS_train_pos", "MetCCS_train_neg"]:
        pooled_set.append([np.array(datasets[i]["SMILES"]),
                           np.array(datasets[i]["Adducts"]),
                           np.array(datasets[i]["CCS"])])
    elif dset in ["MetCCS_test_pos", "MetCCS_test_neg", "Astarita_pos", "Astarita_neg"]:
        test_sets.append([np.array(datasets[i]["SMILES"]),
                          np.array(datasets[i]["Adducts"]),
                          np.array(datasets[i]["CCS"])])
        test_sets_names.append(dset)
    elif dset in ["Baker", "McLean", "CBM"]:
        smiles = np.array(datasets[i]["SMILES"])
        ccs = np.array(datasets[i]["CCS"])
        adducts = np.array(datasets[i]["Adducts"])
        
        # We use binary masks to split the datasets between pooled and test
        mask_pooled = np.zeros(len(smiles), dtype=int)
        mask_pooled[:int(len(smiles) * 0.8)] = 1  # The remaining 20% goes in the test set.
        np.random.shuffle(mask_pooled)
        mask_test = 1 - mask_pooled
        mask_pooled = mask_pooled.astype(bool)
        mask_test = mask_test.astype(bool)
        
        pooled_set.append([smiles[mask_pooled], adducts[mask_pooled], ccs[mask_pooled]])
        test_sets.append([smiles[mask_test], adducts[mask_test], ccs[mask_test]])
        test_sets_names.append(dset)
# Split pooled between train (90%) and validation (10%)
smiles_pooled = np.concatenate([i[0] for i in pooled_set])
adducts_pooled = np.concatenate([i[1] for i in pooled_set])
ccs_pooled = np.concatenate([i[2] for i in pooled_set])

mask_train = np.zeros(len(smiles_pooled), dtype=int)
mask_train[:int(len(smiles_pooled) * 0.9)] = 1  # The remaining 10% goes in the validation set.
np.random.shuffle(mask_train)
mask_valid = 1 - mask_train
mask_train = mask_train.astype(bool)
mask_valid = mask_valid.astype(bool)

train_set = [smiles_pooled[mask_train], adducts_pooled[mask_train], ccs_pooled[mask_train]]
validation_set = [smiles_pooled[mask_valid], adducts_pooled[mask_valid], ccs_pooled[mask_valid]]

The test sets are kept separate, one per source dataset, so that metrics can be computed for each one independently.
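The mask-based split used in the cell above can be factored into a small helper. The following is only a minimal sketch: the function name `split_with_mask` and the toy arrays are hypothetical and not part of DeepCCS. It applies one shuffled boolean mask to several parallel arrays so that SMILES, adducts and CCS values stay aligned.

```python
import numpy as np

def split_with_mask(arrays, fraction, rng):
    """Split parallel arrays into two parts using one shuffled boolean mask."""
    n = len(arrays[0])
    mask = np.zeros(n, dtype=bool)
    mask[:int(n * fraction)] = True  # mark the first `fraction` of the entries...
    rng.shuffle(mask)                # ...then scatter the marks at random positions
    selected = [a[mask] for a in arrays]
    remaining = [a[~mask] for a in arrays]
    return selected, remaining

# Toy data (hypothetical values, for illustration only)
rng = np.random.RandomState(777)
smiles = np.array(["CCO", "CCN", "CCC", "C=O", "CO", "CN", "CC", "C#N", "OCC", "NCC"])
ccs = np.arange(100.0, 110.0)

(pooled_smiles, pooled_ccs), (test_smiles, test_ccs) = split_with_mask([smiles, ccs], 0.8, rng)
print(len(pooled_smiles), len(test_smiles))  # 8 2
```

Because the same mask indexes every array, each SMILES keeps its own CCS value on whichever side of the split it lands.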

Next, we encode the SMILES and adducts as one-hot vectors so that they can be fed to the network. We use the encoders already implemented in DeepCCS: the SMILES encoder automatically pads each SMILES to a specified length, and both encoders offer load/save functionality.
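To make the idea of padded one-hot encoding concrete, here is a minimal sketch. It is not the DeepCCS implementation: the helper `one_hot_pad` is hypothetical, and it tokenizes SMILES character by character for simplicity, whereas DeepCCS uses its own `SMILESsplitter`. Each sequence becomes a `(max_len, vocabulary size)` matrix, with all-zero rows as padding.

```python
import numpy as np

def one_hot_pad(tokens_list, max_len):
    """One-hot encode token sequences and zero-pad them to max_len rows."""
    vocab = sorted({tok for tokens in tokens_list for tok in tokens})
    index = {tok: k for k, tok in enumerate(vocab)}
    encoded = np.zeros((len(tokens_list), max_len, len(vocab)), dtype=np.float32)
    for row, tokens in enumerate(tokens_list):
        if len(tokens) > max_len:
            raise ValueError("sequence longer than max_len")
        for pos, tok in enumerate(tokens):
            encoded[row, pos, index[tok]] = 1.0  # one active column per position
    return encoded, vocab

# Character-level tokens are an assumption made for this example
smiles = ["CCO", "C=O"]
X, vocab = one_hot_pad([list(s) for s in smiles], max_len=5)
print(vocab)    # ['=', 'C', 'O']
print(X.shape)  # (2, 5, 3)
```

The resulting array has the layout `(samples, padded length, number of symbols)`, which matches the shape of the `smile` input layer, `(None, 250, 37)`, in the model summary further down.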

In [5]:
smiles_encoder = SmilesToOneHotEncoder()
smiles_encoder.fit(np.concatenate([dset["SMILES"] for dset in datasets]))
train_set[0] = smiles_encoder.transform(train_set[0])
validation_set[0] = smiles_encoder.transform(validation_set[0])

adducts_encoder = AdductToOneHotEncoder()
adducts_encoder.fit(np.concatenate([dset["Adducts"] for dset in datasets]))
train_set[1] = adducts_encoder.transform(train_set[1])
validation_set[1] = adducts_encoder.transform(validation_set[1])

In [6]:
smiles_encoder.save_encoder("SMILES_encoder.json")
adducts_encoder.save_encoder("Adducts_encoder.json")

## Train the neural network
The data is ready. We can now create our model and train it. Since the results depend on the weight initialization, we train the network 10 times to get a good idea of the reproducibility. Each model is saved to a file so that it can be reloaded.

Let's just look at the network structure before we go further...

In [7]:
model = DeepCCSModel()
model.adduct_encoder = adducts_encoder
model.smiles_encoder = smiles_encoder
model.create_model()
model.model.summary()

__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
smile (InputLayer)              (None, 250, 37)      0                                            
__________________________________________________________________________________________________
conv1d_1 (Conv1D)               (None, 247, 64)      9536        smile[0][0]                      
__________________________________________________________________________________________________
conv1d_2 (Conv1D)               (None, 244, 64)      16448       conv1d_1[0][0]                   
__________________________________________________________________________________________________
max_pooling1d_1 (MaxPooling1D)  (None, 243, 64)      0           conv1d_2[0][0]                   
__________________________________________________________________________________________________
conv1d_3 (


Update your `Model` call to the Keras 2 API: `Model(outputs=Tensor("de..., inputs=[<tf.Tenso...)`



In [None]:
#Save test sets to file for further use
f_out = h5.File(models_path+DATE+"SingleSplit_TestSets.h5", 'w')
dt = h5.special_dtype(vlen=str)
for j, name in enumerate(test_sets_names):
    f_out.create_dataset(name, data=np.array(test_sets[j]), dtype=dt)
f_out.close()

In [None]:
# This can take a while... We will only train the networks in a first loop.
# We will evaluate the models once the training is completed
for i in range(10):
    # Model training
    model = DeepCCSModel()
    model.adduct_encoder = adducts_encoder
    model.smiles_encoder = smiles_encoder
    model.create_model()
    m_checkpoint = ModelCheckpoint(models_path+DATE+"SingleSplit_ModelWeigth_"+str(i)+".h5", save_best_only=True, save_weights_only=True)
    model.train_model(X1_train= train_set[0], X2_train=train_set[1], Y_train=train_set[2],
                X1_valid=validation_set[0], X2_valid=validation_set[1], Y_valid=validation_set[2],
                model_checkpoint=m_checkpoint, nbr_epochs=150, verbose=1)
    K.clear_session()

# Multi-split
We re-split the datasets randomly before training each model. This lets us measure the impact of dataset splitting on model performance.
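The sketch below illustrates how seeding the generator once and then drawing a new mask at every iteration gives a different partition per split, while the whole sequence of partitions stays reproducible. The helper `draw_masks` is hypothetical. It uses a local `np.random.RandomState` rather than the global `np.random.seed` call of the notebook, so the exact masks are not those of the experiment.

```python
import numpy as np

def draw_masks(seed, n_splits, n_samples, fraction=0.8):
    """Draw one shuffled boolean mask per split from a single seeded generator."""
    rng = np.random.RandomState(seed)
    masks = []
    for _ in range(n_splits):
        mask = np.zeros(n_samples, dtype=bool)
        mask[:int(n_samples * fraction)] = True
        rng.shuffle(mask)  # the generator state advances, so each split is drawn anew
        masks.append(mask)
    return masks

masks = draw_masks(seed=666, n_splits=3, n_samples=20)
print([int(m.sum()) for m in masks])  # [16, 16, 16]

# Re-running with the same seed reproduces the same sequence of splits
again = draw_masks(seed=666, n_splits=3, n_samples=20)
print(all(np.array_equal(a, b) for a, b in zip(masks, again)))  # True
```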

In [None]:
np.random.seed(666)
r2 = []
mean_relative = []
median_relative = []

datasets = [read_dataset(datafile, d_name) for d_name in datasets_names] # Read
datasets = [filter_data(d_set) for d_set in datasets] # Filter

for xi in range(10):
    pooled_set = []
    test_sets = []
    train_set = []
    validation_set = []
    test_sets_names = []
    for i, dset in enumerate(datasets_names):
        if dset in ["MetCCS_train_pos", "MetCCS_train_neg"]:
            pooled_set.append([np.array(datasets[i]["SMILES"]),
                               np.array(datasets[i]["Adducts"]),
                               np.array(datasets[i]["CCS"])])
        elif dset in ["MetCCS_test_pos", "MetCCS_test_neg", "Astarita_pos", "Astarita_neg"]:
            test_sets.append([np.array(datasets[i]["SMILES"]),
                              np.array(datasets[i]["Adducts"]),
                              np.array(datasets[i]["CCS"])])
            test_sets_names.append(dset)
        elif dset in ["Baker", "McLean", "CBM"]:
            smiles = np.array(datasets[i]["SMILES"])
            ccs = np.array(datasets[i]["CCS"])
            adducts = np.array(datasets[i]["Adducts"])

            # We use binary masks to split the datasets between pooled and test
            mask_pooled = np.zeros(len(smiles), dtype=int)
            mask_pooled[:int(len(smiles) * 0.8)] = 1  # The remaining 20% goes in the test set.
            np.random.shuffle(mask_pooled)
            mask_test = 1 - mask_pooled
            mask_pooled = mask_pooled.astype(bool)
            mask_test = mask_test.astype(bool)

            pooled_set.append([smiles[mask_pooled], adducts[mask_pooled], ccs[mask_pooled]])
            test_sets.append([smiles[mask_test], adducts[mask_test], ccs[mask_test]])
            test_sets_names.append(dset)
    # Split pooled between train (90%) and validation (10%)
    smiles_pooled = np.concatenate([i[0] for i in pooled_set])
    adducts_pooled = np.concatenate([i[1] for i in pooled_set])
    ccs_pooled = np.concatenate([i[2] for i in pooled_set])

    mask_train = np.zeros(len(smiles_pooled), dtype=int)
    mask_train[:int(len(smiles_pooled) * 0.9)] = 1  # The remaining 10% goes in the validation set.
    np.random.shuffle(mask_train)
    mask_valid = 1 - mask_train
    mask_train = mask_train.astype(bool)
    mask_valid = mask_valid.astype(bool)

    train_set = [smiles_pooled[mask_train], adducts_pooled[mask_train], ccs_pooled[mask_train]]
    validation_set = [smiles_pooled[mask_valid], adducts_pooled[mask_valid], ccs_pooled[mask_valid]]

    # One hot vector Encoding
    train_set[0] = smiles_encoder.transform(train_set[0])
    validation_set[0] = smiles_encoder.transform(validation_set[0])
    
    train_set[1] = adducts_encoder.transform(train_set[1])
    validation_set[1] = adducts_encoder.transform(validation_set[1])

    #Model training
    model = DeepCCSModel()
    model.adduct_encoder = adducts_encoder
    model.smiles_encoder = smiles_encoder
    model.create_model()
    m_checkpoint = ModelCheckpoint(models_path+DATE+"MultiSplit_ModelWeigth_"+str(xi)+".h5", save_best_only=True, save_weights_only=True)
    model.train_model(X1_train= train_set[0], X2_train=train_set[1], Y_train=train_set[2],
                X1_valid=validation_set[0], X2_valid=validation_set[1], Y_valid=validation_set[2],
                model_checkpoint=m_checkpoint, nbr_epochs=150, verbose=1)
    
    #Save test data in case kernel dies.
    f_out = h5.File(models_path+DATE+"MultiSplit_" + str(xi) + "_TestSet.h5", 'w')
    dt = h5.special_dtype(vlen=str)
    for j, name in enumerate(test_sets_names):
        f_out.create_dataset(name, data=np.array(test_sets[j]), dtype=dt)
    f_out.close()
    K.clear_session()


# Evaluate model performance
## Single split experiment

In [9]:
#Load single split results

test_sets_file = h5.File(models_path+DATE+"SingleSplit_TestSets.h5", 'r')
test_sets_names = []
test_sets = []
for dset in test_sets_file:
    test_sets_names.append(dset)
    print(dset)
    test_sets.append(np.array(test_sets_file[dset]))
    print(len(np.array(test_sets_file[dset])[0]))

#Convert back CCS from string to float
for i,j in enumerate(test_sets):
    test_sets[i][2] = np.array(test_sets[i][2], dtype=float)
    
#Create a global test set that contains everything.
test_smiles_global = np.concatenate([t[0] for t in test_sets])
test_adducts_global = np.concatenate([t[1] for t in test_sets])
test_ccs_global = np.concatenate([t[2] for t in test_sets])
test_sets.append([test_smiles_global, test_adducts_global, test_ccs_global])
test_sets_names = test_sets_names + ["global"]

print("There are {} test sets.".format(len(test_sets)))
print("They are: {}".format(test_sets_names))

ss_r2 = []
ss_mean_relative = []
ss_median_relative = []

adducts_encoder = AdductToOneHotEncoder()
adducts_encoder.load_encoder("Adducts_encoder.json")
smiles_encoder = SmilesToOneHotEncoder()
smiles_encoder.load_encoder("SMILES_encoder.json")

for i in range(10):
    model = DeepCCSModel()
    model.adduct_encoder = adducts_encoder
    model.smiles_encoder = smiles_encoder
    model.create_model()
    model.model.load_weights(models_path+DATE+"SingleSplit_ModelWeigth_"+str(i)+".h5")
    model._is_fit = True
    for t in test_sets:
        predictions = model.predict(t[0], t[1]).flatten()
        ss_r2.append(r2_score(y_true=t[2], y_pred=predictions))
        ss_mean_relative.append(relative_mean(Y_true=t[2], Y_pred=predictions))
        ss_median_relative.append(relative_median(Y_true=t[2], Y_pred=predictions))

#Reshape for table creation        
ss_r2 = np.reshape(ss_r2, (10,8))
ss_mean_relative = np.reshape(ss_mean_relative, (10,8))
ss_median_relative = np.reshape(ss_median_relative, (10,8))

Agilent_neg
57
Agilent_pos
74
CBM
72
McLean
52
PNL
172
Waters_neg
113
Waters_pos
92
There are 8 test sets.
They are: [u'Agilent_neg', u'Agilent_pos', u'CBM', u'McLean', u'PNL', u'Waters_neg', u'Waters_pos', 'global']
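The functions `relative_mean` and `relative_median` used above come from `DeepCCS.utils` and are not shown in this notebook. As an assumption about what they compute, the sketch below defines the mean and median of the absolute relative error, expressed in percent. The function names here are hypothetical stand-ins, not the library's API.

```python
import numpy as np

def relative_errors(y_true, y_pred):
    """Absolute relative error of each prediction, in percent."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.abs(y_pred - y_true) / y_true * 100.0

def mean_relative_error(y_true, y_pred):
    return float(np.mean(relative_errors(y_true, y_pred)))

def median_relative_error(y_true, y_pred):
    return float(np.median(relative_errors(y_true, y_pred)))

# Toy CCS values (hypothetical): the individual errors are 2%, 5% and 0%
y_true = [100.0, 200.0, 400.0]
y_pred = [102.0, 190.0, 400.0]
print(round(mean_relative_error(y_true, y_pred), 3))    # 2.333
print(round(median_relative_error(y_true, y_pred), 3))  # 2.0
```

The median is less sensitive than the mean to a few badly predicted molecules, which is one reason to report both.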


In [10]:
#Results table
test_sets_names = [""] + test_sets_names
metrics = [ss_r2,  ss_median_relative, ss_mean_relative]
averages = [["R2", "median_relative_error", "mean_relative_error"]]
stds = [["R2", "median_relative_error", "mean_relative_error"]]
for i, dset in enumerate(test_sets):
    t_averages = []
    t_stds = []
    for metric in metrics:
        t_averages.append(np.round(np.mean([d[i] for d in metric]), decimals=3))
        t_stds.append(np.round(np.std([d[i] for d in metric]), decimals=3))
    averages.append(t_averages)
    stds.append(t_stds)
    
table_average = go.Table(
    header=dict(values=test_sets_names),
    cells=dict(values=averages))

table_stds = go.Table(
    header=dict(values=test_sets_names),
    cells=dict(values=stds))

iplot([table_average])
iplot([table_stds])
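The per-test-set averages and standard deviations computed by the loop above can also be obtained with axis-wise reductions on the `(10, 8)` metric arrays. The sketch below uses synthetic scores, not results from the paper, to show that both approaches agree.

```python
import numpy as np

rng = np.random.RandomState(0)
scores = rng.uniform(0.9, 1.0, size=(10, 8))  # 10 models x 8 test sets, synthetic values

# Loop form, as in the notebook: collect column i across all models
loop_means = [np.mean([row[i] for row in scores]) for i in range(scores.shape[1])]
loop_stds = [np.std([row[i] for row in scores]) for i in range(scores.shape[1])]

# Vectorized form: reduce over the model axis
axis_means = scores.mean(axis=0)
axis_stds = scores.std(axis=0)  # population std (ddof=0), the same default as np.std

print(np.allclose(loop_means, axis_means), np.allclose(loop_stds, axis_stds))  # True True
print(axis_means.shape)  # (8,)
```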

In [11]:
print(test_sets_names)
print(ss_r2[1])
print(ss_median_relative[1])

['', u'Agilent_neg', u'Agilent_pos', u'CBM', u'McLean', u'PNL', u'Waters_neg', u'Waters_pos', 'global']
[ 0.97778957  0.96852226  0.93652493  0.9940649   0.94818697  0.9659747
  0.9266496   0.97705955]
[ 2.30536868  1.63401977  2.21375592  1.77327552  2.52938721  2.21003961
  4.22518478  2.28376343]


In [12]:
#Extract the results for the global test set for each model independently in order to find the best model.

metrics = [ss_r2,  ss_median_relative, ss_mean_relative]

values_to_table = [[1,2,3,4,5,6,7,8,9,10],
                   [m[-1] for m in ss_r2], 
                   [m[-1] for m in ss_median_relative], 
                   [m[-1] for m in ss_mean_relative]]

table_search_best_model = go.Table(
    header=dict(values=["Model","R2", "Median relative", "mean relative"]),
    cells=dict(values=np.round(values_to_table, decimals=3)))

iplot([table_search_best_model])

## Multi-split results

In [13]:
#Load multi-split results
ms_r2 = []
ms_mean_relative = []
ms_median_relative = []

for xi in range(10):
    model = DeepCCSModel()
    model.adduct_encoder = adducts_encoder
    model.smiles_encoder = smiles_encoder
    model.create_model()
    model.model.load_weights(models_path+DATE+"MultiSplit_ModelWeigth_"+str(xi)+".h5")
    model._is_fit = True
    
    test_sets_file = h5.File(models_path+DATE+"MultiSplit_"+str(xi)+"_TestSet.h5", 'r')  # name must match the loop below
    test_sets_names = []
    test_sets = []
    for dset in test_sets_file:
        test_sets_names.append(dset)
        test_sets.append(np.array(test_sets_file[dset]))

    #Convert back CCS from string to float
    for i,j in enumerate(test_sets):
        test_sets[i][2] = np.array(test_sets[i][2], dtype=float)

    #Create a global test set that contains everything.
    test_smiles_global = np.concatenate([t[0] for t in test_sets])
    test_adducts_global = np.concatenate([t[1] for t in test_sets])
    test_ccs_global = np.concatenate([t[2] for t in test_sets])
    test_sets.append([test_smiles_global, test_adducts_global, test_ccs_global])
    test_sets_names = test_sets_names + ["global"]
    
    print("There are {} test sets.".format(len(test_sets)))
    print("They are: {}".format(test_sets_names))
    
    for t in test_sets:
        predictions = model.predict(t[0], t[1]).flatten()
        ms_r2.append(r2_score(y_true=t[2], y_pred=predictions))
        ms_mean_relative.append(relative_mean(Y_true=t[2], Y_pred=predictions))
        ms_median_relative.append(relative_median(Y_true=t[2], Y_pred=predictions))
        
ms_r2 = np.reshape(ms_r2, (10,8))
ms_mean_relative = np.reshape(ms_mean_relative, (10,8))
ms_median_relative = np.reshape(ms_median_relative, (10,8))

There are 8 test sets.
They are: [u'Agilent_neg', u'Agilent_pos', u'CBM', u'McLean', u'PNL', u'Waters_neg', u'Waters_pos', 'global']
There are 8 test sets.
They are: [u'Agilent_neg', u'Agilent_pos', u'CBM', u'McLean', u'PNL', u'Waters_neg', u'Waters_pos', 'global']
There are 8 test sets.
They are: [u'Agilent_neg', u'Agilent_pos', u'CBM', u'McLean', u'PNL', u'Waters_neg', u'Waters_pos', 'global']
There are 8 test sets.
They are: [u'Agilent_neg', u'Agilent_pos', u'CBM', u'McLean', u'PNL', u'Waters_neg', u'Waters_pos', 'global']
There are 8 test sets.
They are: [u'Agilent_neg', u'Agilent_pos', u'CBM', u'McLean', u'PNL', u'Waters_neg', u'Waters_pos', 'global']
There are 8 test sets.
They are: [u'Agilent_neg', u'Agilent_pos', u'CBM', u'McLean', u'PNL', u'Waters_neg', u'Waters_pos', 'global']
There are 8 test sets.
They are: [u'Agilent_neg', u'Agilent_pos', u'CBM', u'McLean', u'PNL', u'Waters_neg', u'Waters_pos', 'global']
There are 8 test sets.
They are: [u'Agilent_neg', u'Agilent_pos', u'C

In [14]:
test_sets_names = [""] + test_sets_names
metrics = [ms_r2,  ms_median_relative, ms_mean_relative]
averages = [["R2", "median_relative_error", "mean_relative_error"]]
stds = [["R2", "median_relative_error", "mean_relative_error"]]
for i, dset in enumerate(test_sets):
    t_averages = []
    t_stds = []
    for metric in metrics:
        t_averages.append(np.round(np.mean([d[i] for d in metric]), decimals=3))
        t_stds.append(np.round(np.std([d[i] for d in metric]), decimals=3))
    averages.append(t_averages)
    stds.append(t_stds)
    
table_average = go.Table(
    header=dict(values=test_sets_names),
    cells=dict(values=averages))

table_stds = go.Table(
    header=dict(values=test_sets_names),
    cells=dict(values=stds))

iplot([table_average])
iplot([table_stds])

In [15]:
#Extract the results for the global test set for each model independently in order to find the best model.
values_to_table = [[1,2,3,4,5,6,7,8,9,10],
                   [m[-1] for m in ms_r2], 
                   [m[-1] for m in ms_median_relative], 
                   [m[-1] for m in ms_mean_relative]]

table_search_best_model = go.Table(
    header=dict(values=["Model","R2", "Median relative", "mean relative"]),
    cells=dict(values=np.round(values_to_table, decimals=3)))

iplot([table_search_best_model])




# Dataset comparison
Since predictions for the Waters datasets (Astarita et al. 2014) are less accurate than those for the other datasets, we investigate the overall variability of CCS measurements across this multi-source dataset.
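For two datasets that share $n$ (SMILES, adduct) pairs, the quantity computed in the next cell is the mean signed relative difference, in percent, taken with respect to the average of the two measurements:

$$
\mathrm{diff} = \frac{100}{n} \sum_{k=1}^{n} \frac{\mathrm{CCS}^{(1)}_k - \mathrm{CCS}^{(2)}_k}{\left(\mathrm{CCS}^{(1)}_k + \mathrm{CCS}^{(2)}_k\right)/2}
$$

As an illustration with made-up values, a single shared pair measured at 102 in the first dataset and 98 in the second gives $(102 - 98)/100 \times 100 = 4\%$. The measure is signed, so swapping the two datasets flips the sign. Because positive and negative differences can cancel in the average, a value near zero does not by itself imply that individual measurements agree.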

In [16]:
import pandas as pd  # needed for the DataFrame merge below

def compare_two_CCS_datasets(data1, data2):
    df1 = pd.DataFrame({"SMILES": data1["SMILES"],
                        "Adducts": data1["Adducts"],
                        "CCS1": data1["CCS"]})
    
    df2 = pd.DataFrame({"SMILES": data2["SMILES"],
                        "Adducts": data2["Adducts"],
                        "CCS2": data2["CCS"]})

    merged_df = pd.merge(left=df1, right=df2, on=["SMILES", "Adducts"], how='inner')
    n = len(merged_df["SMILES"])
    if n == 0:
        diff = 0
    else:
        # Signed relative difference in percent: (CCS1-CCS2)/((CCS1+CCS2)/2)*100
        ccs1 = np.array(merged_df["CCS1"])
        ccs2 = np.array(merged_df["CCS2"])
        diff = np.average((ccs1-ccs2)/((ccs1+ccs2)/2)*100)
    return n,diff

for i, d1 in enumerate(datasets_names):
    for j, d2 in enumerate(datasets_names):
        n_identical, diff_identical = compare_two_CCS_datasets(datasets[i], datasets[j])
        print("{}, {}, {}, {}".format(d1,d2,n_identical, diff_identical))

MetCCS_train_pos, MetCCS_train_pos, 342, 0.0
MetCCS_train_pos, MetCCS_train_neg, 0, 0
MetCCS_train_pos, MetCCS_test_pos, 1, 0.36862140121
MetCCS_train_pos, MetCCS_test_neg, 0, 0
MetCCS_train_pos, Astarita_pos, 1, 4.36169844559
MetCCS_train_pos, Astarita_neg, 0, 0
MetCCS_train_pos, Baker, 79, -1.77408501016
MetCCS_train_pos, McLean, 7, -0.0337819971416
MetCCS_train_pos, CBM, 4, 1.13823824605
MetCCS_train_neg, MetCCS_train_pos, 0, 0
MetCCS_train_neg, MetCCS_train_neg, 356, 0.0
MetCCS_train_neg, MetCCS_test_pos, 0, 0
MetCCS_train_neg, MetCCS_test_neg, 0, 0
MetCCS_train_neg, Astarita_pos, 0, 0
MetCCS_train_neg, Astarita_neg, 0, 0
MetCCS_train_neg, Baker, 71, -3.73742097786
MetCCS_train_neg, McLean, 13, -1.91652438081
MetCCS_train_neg, CBM, 0, 0
MetCCS_test_pos, MetCCS_train_pos, 1, -0.36862140121
MetCCS_test_pos, MetCCS_train_neg, 0, 0
MetCCS_test_pos, MetCCS_test_pos, 74, 0.0
MetCCS_test_pos, MetCCS_test_neg, 0, 0
MetCCS_test_pos, Astarita_pos, 57, 2.91256147508
MetCCS_test_pos, Astarita_

As we suspected, the CCS measurements from the Astarita paper diverge more from the training data than those of the other test sets. Although the specific reason remains unknown, we note that the calibration of TWIMS instruments affects CCS measurements and should be carried out carefully.