# Get fitting data
We are going to fit to the solvation energies from [Ward et al.](https://pubs.acs.org/doi/abs/10.1021/acs.jpca.1c01960), which are available on the Materials Data Facility.

In [1]:
from foundry import Foundry
from ase import units 
import pandas as pd


## Download the dataset
The data we want is on Foundry, so we can load it directly with the Foundry client.

In [2]:
client = Foundry()
client.load('10.18126/jos5-wj65', globus=False);

In [3]:
data = client.load_data()

Combine the pieces of the training split into a single DataFrame.

In [4]:
data = pd.concat(data['train'], axis=1)
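`pd.concat(..., axis=1)` joins DataFrames side by side, matching rows by their index labels. A minimal sketch with two small hypothetical frames (the column names `smiles` and `sol_water` and all values are arbitrary, not taken from the dataset) shows what the concatenation does:

```python
import pandas as pd

# Hypothetical stand-ins for the pieces of a split (e.g. inputs and targets)
inputs = pd.DataFrame({"smiles": ["C", "CO", "CCO"]})
targets = pd.DataFrame({"sol_water": [0.1, 0.2, 0.3]})  # arbitrary values

# axis=1 places the frames next to each other, aligning rows on the index
combined = pd.concat((inputs, targets), axis=1)
print(combined)
```

Because rows are matched by index label, the pieces need to share the same index; where labels differ, pandas fills the missing cells with NaN.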

## Convert solvation energies to eV
They are currently in kcal/mol, and we want them in eV so we can compare them directly to CP2K results.

In [5]:
sol_cols = [c for c in data.columns if 'sol' in c]
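The `'sol' in c` test is a substring match, so it would also select any other column whose name merely contains "sol". Here the five selected columns all start with `sol_`, so a prefix check is a slightly stricter alternative. A sketch with hypothetical column names (`solute_name` is invented purely to show the difference):

```python
import pandas as pd

# Hypothetical columns: "solute_name" contains "sol" but is not an energy
df = pd.DataFrame({
    "smiles": ["C", "CO"],
    "solute_name": ["methane", "methanol"],
    "sol_water": [1.0, 2.0],
    "sol_dmso": [3.0, 4.0],
})

substring = [c for c in df.columns if "sol" in c]      # substring match
prefix = [c for c in df.columns if c.startswith("sol_")]  # prefix match
print(substring)  # ['solute_name', 'sol_water', 'sol_dmso']
print(prefix)     # ['sol_water', 'sol_dmso']
```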

In [6]:
data[sol_cols] *= units.kcal / units.mol  # kcal/mol -> eV (ASE's energy unit is eV)
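The factor `units.kcal / units.mol` is the energy of one kcal/mol expressed per molecule, in ASE's native unit of eV. A minimal sketch of the same number computed from SI constants, assuming the thermochemical calorie (4.184 J) and the exact CODATA 2018 values for Avogadro's number and the elementary charge; ASE may use a different CODATA release, so its factor can differ in the trailing digits:

```python
# Exact CODATA 2018 SI values (assumed; not necessarily the ones ASE uses)
AVOGADRO = 6.02214076e23             # molecules per mol
ELEMENTARY_CHARGE = 1.602176634e-19  # joules per eV
JOULES_PER_KCAL = 4184.0             # thermochemical kilocalorie

# Energy of 1 kcal/mol per molecule, in eV
KCAL_PER_MOL_TO_EV = JOULES_PER_KCAL / (AVOGADRO * ELEMENTARY_CHARGE)
print(f"1 kcal/mol = {KCAL_PER_MOL_TO_EV:.6f} eV")

# Dividing by the factor goes back to kcal/mol, e.g. for the mean acetone value
mean_acetone_ev = -0.383766
mean_acetone_kcal = mean_acetone_ev / KCAL_PER_MOL_TO_EV
print(f"{mean_acetone_ev} eV = {mean_acetone_kcal:.2f} kcal/mol")
```

The conversion is a single multiplicative factor of roughly 0.0434, which is why the in-place `*=` on the selected columns is all the notebook needs.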

In [7]:
data[sol_cols].describe()

All values except `count` are in eV.

|  | sol_acetone | sol_acn | sol_dmso | sol_ethanol | sol_water |
|---|---|---|---|---|---|
| count | 130258.0 | 130258.0 | 130258.0 | 130258.0 | 130258.0 |
| mean | -0.383766 | -0.372061 | -0.321112 | -0.388667 | -0.324313 |
| std | 0.113302 | 0.116708 | 0.12581 | 0.124607 | 0.18791 |
| min | -3.16293 | -3.237863 | -3.174669 | -3.595782 | -3.657593 |
| 25% | -0.436546 | -0.426281 | -0.38333 | -0.454199 | -0.438054 |
| 50% | -0.372613 | -0.360252 | -0.309325 | -0.380756 | -0.316671 |
| 75% | -0.314307 | -0.300804 | -0.241648 | -0.312391 | -0.199511 |
| max | 0.132352 | 0.137165 | 0.182151 | 0.159237 | 0.199236 |


## Save to disk
Save the combined dataset as a gzip-compressed CSV so we can use it later.

In [8]:
data.to_csv('reference-data.csv.gz', index=False)
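Because the file name ends in `.gz`, pandas infers gzip compression when writing, and `pd.read_csv` infers it again when reading. A minimal sketch of that round trip on a small made-up frame, written to a temporary directory rather than to the real `reference-data.csv.gz`:

```python
import os
import tempfile

import pandas as pd

# Small hypothetical frame standing in for the reference data
df = pd.DataFrame({"sol_water": [-0.5, -0.25], "sol_dmso": [-0.125, 0.0]})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "reference-data.csv.gz")
    df.to_csv(path, index=False)  # gzip compression inferred from ".gz"

    with open(path, "rb") as fp:
        is_gzip = fp.read(2) == b"\x1f\x8b"  # gzip magic bytes

    reloaded = pd.read_csv(path)  # decompression inferred as well

print(is_gzip)
print(reloaded.equals(df))
```

Passing `index=False` keeps the row index out of the file; without it, reading the file back would produce an extra `Unnamed: 0` column.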