# Accessing the data in bulk

This short tutorial explains how to retrieve full tables from the database into [pandas DataFrames](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.html).

## The following tables are available from ``mendeleev``

* elements
* ionicradii
* ionizationenergies
* oxidationstates
* groups
* series
* isotopes

``mendeleev`` provides a convenient function, `get_table`, for this task. It can be imported directly from `mendeleev`:

In [1]:
from mendeleev import get_table

To retrieve a table, call ``get_table`` with the table name as the argument. Here we fetch what is probably the most important table, ``elements``, which holds basic data on each element:

In [2]:
ptable = get_table('elements')

Now we can use [pandas'](http://pandas.pydata.org) capabilities to work with the data. 

In [3]:
ptable.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 118 entries, 0 to 117
Data columns (total 70 columns):
 #   Column                         Non-Null Count  Dtype  
---  ------                         --------------  -----  
 0   annotation                     118 non-null    object 
 1   atomic_number                  118 non-null    int64  
 2   atomic_radius                  90 non-null     float64
 3   atomic_volume                  91 non-null     float64
 4   block                          118 non-null    object 
 5   boiling_point                  96 non-null     float64
 6   density                        95 non-null     float64
 7   description                    109 non-null    object 
 8   dipole_polarizability          117 non-null    float64
 9   electron_affinity              77 non-null     float64
 10  electronic_configuration       118 non-null    object 
 11  evaporation_heat               88 non-null     float64
 12  fusion_heat                    75 non-null     float64
 ...

For clarity, let's take only a subset of the columns:

In [4]:
cols = ['atomic_number', 'symbol', 'atomic_radius', 'en_pauling', 'block', 'vdw_radius_mm3']

In [5]:
ptable[cols].head()

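|   | atomic_number | symbol | atomic_radius | en_pauling | block | vdw_radius_mm3 |
|---|---|---|---|---|---|---|
| 0 | 1 | H | 25.0 | 2.2 | s | 162.0 |
| 1 | 2 | He | 120.0 |  | s | 153.0 |
| 2 | 3 | Li | 145.0 | 0.98 | s | 255.0 |
| 3 | 4 | Be | 105.0 | 1.57 | s | 223.0 |
| 4 | 5 | B | 85.0 | 2.04 | p | 215.0 |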
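
Column selection combines naturally with boolean row filters. The following is a minimal sketch that rebuilds the five rows shown above by hand, so it runs without the database, and keeps only the s-block elements that have a Pauling electronegativity. The names `sample`, `mask` and `s_block` are illustrative and not part of ``mendeleev``.

```python
import pandas as pd

# Values copied by hand from the output above; with the real data
# you would start from ptable[cols] instead.
sample = pd.DataFrame({
    "atomic_number": [1, 2, 3, 4, 5],
    "symbol": ["H", "He", "Li", "Be", "B"],
    "atomic_radius": [25.0, 120.0, 145.0, 105.0, 85.0],
    "en_pauling": [2.2, None, 0.98, 1.57, 2.04],  # None becomes NaN
    "block": ["s", "s", "s", "s", "p"],
    "vdw_radius_mm3": [162.0, 153.0, 255.0, 223.0, 215.0],
})

# Keep s-block rows whose electronegativity is not missing,
# then sort from least to most electronegative.
mask = (sample["block"] == "s") & sample["en_pauling"].notna()
s_block = sample.loc[mask, ["symbol", "en_pauling"]].sort_values("en_pauling")

print(s_block)
```

The same pattern should apply unchanged to the full ``elements`` table.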


It is quite easy now to get descriptive statistics on the data.

In [6]:
ptable[cols].describe()

|   | atomic_number | atomic_radius | en_pauling | vdw_radius_mm3 |
|---|---|---|---|---|
| count | 118.0 | 90.0 | 85.0 | 94.0 |
| mean | 59.5 | 149.844444 | 1.748588 | 248.468085 |
| std | 34.207699 | 40.07911 | 0.634442 | 36.017828 |
| min | 1.0 | 25.0 | 0.7 | 153.0 |
| 25% | 30.25 | 135.0 | 1.24 | 229.0 |
| 50% | 59.5 | 145.0 | 1.7 | 244.0 |
| 75% | 88.75 | 178.75 | 2.16 | 269.25 |
| max | 118.0 | 260.0 | 3.98 | 364.0 |
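
The `count` row above varies between columns because `describe()` skips missing values: only 85 of the 118 elements have a value for `en_pauling`, for example. As a small sketch of two related summaries, the snippet below counts missing values per column and averages the atomic radius per block. It uses a hand-built five-row `sample` frame (an assumption made so the example runs on its own) rather than the full table.

```python
import pandas as pd

# First five elements, copied by hand from the tutorial output.
sample = pd.DataFrame({
    "symbol": ["H", "He", "Li", "Be", "B"],
    "atomic_radius": [25.0, 120.0, 145.0, 105.0, 85.0],
    "en_pauling": [2.2, None, 0.98, 1.57, 2.04],
    "block": ["s", "s", "s", "s", "p"],
})

# Number of missing values in each column.
missing = sample.isna().sum()

# Mean atomic radius for each block of the periodic table.
mean_radius = sample.groupby("block")["atomic_radius"].mean()

print(missing)
print(mean_radius)
```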


## Isotopes table

Let's retrieve another table, namely ``isotopes``:

In [7]:
isotopes = get_table('isotopes', index_col='id')

In [8]:
isotopes.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 406 entries, 1 to 406
Data columns (total 11 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   atomic_number      406 non-null    int64  
 1   mass               377 non-null    float64
 2   abundance          288 non-null    float64
 3   mass_number        406 non-null    int64  
 4   mass_uncertainty   377 non-null    float64
 5   is_radioactive     406 non-null    bool   
 6   half_life          121 non-null    float64
 7   half_life_unit     85 non-null     object 
 8   spin               323 non-null    float64
 9   g_factor           323 non-null    float64
 10  quadrupole_moment  320 non-null    float64
dtypes: bool(1), float64(7), int64(2), object(1)
memory usage: 35.3+ KB


### Merge the elements table with the isotopes

We can now perform an SQL-like merge of the two ``DataFrame``s to produce an [outer](http://pandas.pydata.org/pandas-docs/stable/merging.html#database-style-dataframe-joining-merging) join:

In [9]:
import pandas as pd

In [10]:
merged = pd.merge(ptable[cols], isotopes, how='outer', on='atomic_number')

Now the ``merged`` ``DataFrame`` has the following columns:

In [11]:
merged.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 406 entries, 0 to 405
Data columns (total 16 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   atomic_number      406 non-null    int64  
 1   symbol             406 non-null    object 
 2   atomic_radius      328 non-null    float64
 3   en_pauling         313 non-null    float64
 4   block              406 non-null    object 
 5   vdw_radius_mm3     350 non-null    float64
 6   mass               377 non-null    float64
 7   abundance          288 non-null    float64
 8   mass_number        406 non-null    int64  
 9   mass_uncertainty   377 non-null    float64
 10  is_radioactive     406 non-null    bool   
 11  half_life          121 non-null    float64
 12  half_life_unit     85 non-null     object 
 13  spin               323 non-null    float64
 14  g_factor           323 non-null    float64
 15  quadrupole_moment  320 non-null    float64
dtypes: bool(1), float64(10), int64(2), object(3)

In [12]:
merged.head()

|   | atomic_number | symbol | atomic_radius | en_pauling | block | vdw_radius_mm3 | mass | abundance | mass_number | mass_uncertainty | is_radioactive | half_life | half_life_unit | spin | g_factor | quadrupole_moment |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | H | 25.0 | 2.2 | s | 162.0 | 1.007825 | 0.99972 | 1 | 6e-10 | False |  |  | 0.5 | 5.585695 | 0.0 |
| 1 | 1 | H | 25.0 | 2.2 | s | 162.0 | 2.014102 | 0.00028 | 2 | 8e-10 | False |  |  | 1.0 | 0.857438 | 0.00286 |
| 2 | 1 | H | 25.0 | 2.2 | s | 162.0 |  |  | 3 |  | True |  |  | 0.5 | 5.957994 | 0.0 |
| 3 | 2 | He | 120.0 |  | s | 153.0 | 3.016029 | 2e-06 | 3 | 2e-08 | False |  |  | 0.5 | -4.254995 | 0.0 |
| 4 | 2 | He | 120.0 |  | s | 153.0 | 4.002603 | 0.999998 | 4 | 4e-10 | False |  |  | 0.0 | 0.0 | 0.0 |
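
To see what the `how` argument changes, the sketch below merges two toy frames that stand in for the elements and isotopes tables. The frames and their contents are illustrative assumptions, chosen so that each side has one key the other lacks. Passing `indicator=True` adds a `_merge` column that records where each row came from.

```python
import pandas as pd

# Toy stand-ins: element 3 has no isotope row, isotope key 4 has no element row.
elements = pd.DataFrame({
    "atomic_number": [1, 2, 3],
    "symbol": ["H", "He", "Li"],
})
isotopes = pd.DataFrame({
    "atomic_number": [1, 1, 2, 4],
    "mass_number": [1, 2, 4, 9],
})

# Outer join keeps unmatched rows from both sides and fills the gaps with NaN.
outer = pd.merge(elements, isotopes, how="outer", on="atomic_number", indicator=True)

# Inner join keeps only keys present in both frames.
inner = pd.merge(elements, isotopes, how="inner", on="atomic_number")

print(outer)
print(inner)
```

In the tutorial's own merge, `symbol` and `mass_number` are both non-null in all 406 rows, which suggests that every row found a match there and an inner join would have returned the same rows.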


To display all the isotopes of silicon:

In [13]:
merged[merged['symbol'] == 'Si']

|   | atomic_number | symbol | atomic_radius | en_pauling | block | vdw_radius_mm3 | mass | abundance | mass_number | mass_uncertainty | is_radioactive | half_life | half_life_unit | spin | g_factor | quadrupole_moment |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 28 | 14 | Si | 110.0 | 1.9 | p | 229.0 | 27.976927 | 0.92191 | 28 | 3e-09 | False |  |  | 0.0 | 0.0 | 0.0 |
| 29 | 14 | Si | 110.0 | 1.9 | p | 229.0 | 28.976495 | 0.04699 | 29 | 3e-09 | False |  |  | 0.5 | -1.11058 | 0.0 |
| 30 | 14 | Si | 110.0 | 1.9 | p | 229.0 | 29.97377 | 0.0311 | 30 | 2e-08 | False |  |  | 0.0 | 0.0 | 0.0 |
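
As one example of what the merged data allows, the sketch below computes the abundance-weighted average mass of silicon from the three rows above. The values are copied by hand so the snippet runs on its own, and the names `si`, `weights` and `average_mass` are illustrative. The result should land close to silicon's standard atomic weight of about 28.09.

```python
import pandas as pd

# Silicon isotopes, copied by hand from the output above.
si = pd.DataFrame({
    "mass_number": [28, 29, 30],
    "mass": [27.976927, 28.976495, 29.97377],
    "abundance": [0.92191, 0.04699, 0.0311],
})

# Normalise the abundances so that they sum to exactly one,
# then take the weighted sum of the isotope masses.
weights = si["abundance"] / si["abundance"].sum()
average_mass = (si["mass"] * weights).sum()

print(round(average_mass, 3))
```

With the full table, the same calculation could start from `merged[merged['symbol'] == 'Si']`, after dropping rows whose `abundance` is missing.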
