# Tutorial 1: Database setup

In this tutorial you are going to learn how to:

<div class="alert alert-block alert-warning">
    
**[Initialize a local database for materials data](#Initialization)**
   
**[Select an API for a remote data source](#Select-an-API)**
   
**[Select a backend for data storage](#Select-a-Backend)**
   
**[Download data from the NOMAD Archive](#Download-data)**
   
**[Interpret log files](#Interpret-log-files)**

**[Analyze data](#Analyzing-data)**

</div>

Let's get started!

First, we import the `MaterialsDatabase` class:

In [1]:
from simdatframe import MaterialsDatabase

## Initialization

You can create a new database using:

```python
database = MaterialsDatabase(
    filename='tutorial1.db', # name of the database file
    filepath='data', # relative path where the database and logs are stored
    rootpath='.', # root path to which the filepath will be evaluated
    key_name='mid', # name of the unique keys used in the database
    api=None, # APIClass object that connects to an external database
              # default (None) connects to the NOMAD Archive
    backend='ase', # file backend to store data
    silent_logging=False, # Set `True` to not write log messages to screen
)
```

All keyword arguments are optional. If you omit `filename`, the database file is named `materials_database.db`.

Let's look at them one by one:

**filename**

This is the name of the database file. It is passed to the `backend`, which creates the file or, if it already exists, loads the data from it.

**filepath**

The path where the database file should be created. If this path does not exist, the `backend` will create it. This path is relative to the _rootpath_.

**rootpath**

The path to an existing directory, which acts as the root for the creation of the file structure.
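To illustrate how `filename`, `filepath`, and `rootpath` fit together, here is a minimal sketch. The helper `resolve_database_path` is hypothetical and not part of `simdatframe`; it only assumes that the three arguments are joined in the order described above.

```python
from pathlib import Path


def resolve_database_path(filename, filepath, rootpath):
    """Hypothetical helper, not part of simdatframe."""
    # Assumption: `filepath` is interpreted relative to `rootpath`,
    # and the database file is placed inside the resulting directory.
    directory = Path(rootpath) / filepath
    return directory / filename


db_path = resolve_database_path(
    filename="tutorial1.db", filepath="data", rootpath="."
)
print(db_path.as_posix())
```

With the values used in this tutorial, the database file would be expected at `data/tutorial1.db`, relative to the directory in which the notebook runs.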

**key_name**

This is the name of the unique database key. Each entry in the database must have this key, and it is used consistently throughout the whole package. This makes it possible to connect the materials, their descriptors, and their similarity relations. The key itself is created by the `API`. If you read from an existing database file, make sure that the correct `key_name` is used.

**api**

This argument defines the external source from which the data is downloaded. The default is the NOMAD Archive, which is accessed through the [NOMAD Python client](https://nomad-lab.eu/prod/v1/docs/index.html). You can create your own `API` to download data from any source; for details, please refer to the documentation. The `API` object is passed as a keyword argument. More details are in [the next section](#Select-an-API).

**backend**

The `backend` is used to handle the file storage for the database files. By default the [ASE Database](https://wiki.fysik.dtu.dk/ase/ase/db/db.html) is used. It stores the atomic structures and properties in a lightweight SQL database. It is possible to create your own backends. For more details please refer to the documentation, and read the [section below](#Select-a-Backend).

**silent_logging**

When downloading large quantities of data from external sources, web requests can fail or the data format can differ from what is expected. For that reason, the `MaterialsDatabase` continues operation even if some of the requested data is not available or cannot be processed. However, to keep track of the provenance, most error messages (including those from the API) are written to a log file. We will see how to access these logs [later](#Interpret-log-files). At the same time, they are written to the screen. If you want to suppress the output to the screen, set `silent_logging=True`.
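The behavior described for `silent_logging` can be pictured with Python's standard `logging` module. The following is only a sketch of the general pattern, not the actual implementation in `simdatframe`; the names `make_logger`, `download_all`, and `fake_fetch` are hypothetical.

```python
import logging
import os
import tempfile


def make_logger(logfile, silent_logging=False):
    """Hypothetical helper: always log to a file, optionally also to the screen."""
    logger = logging.getLogger("tutorial_sketch")
    logger.setLevel(logging.INFO)
    logger.propagate = False
    logger.handlers.clear()
    logger.addHandler(logging.FileHandler(logfile))
    if not silent_logging:
        logger.addHandler(logging.StreamHandler())
    return logger


def download_all(ids, fetch, logger):
    """Keep going when a single entry fails, but record the failure."""
    results = []
    for mid in ids:
        try:
            results.append(fetch(mid))
        except Exception as error:
            logger.error("Could not process %s: %s", mid, error)
    return results


def fake_fetch(mid):
    """Stand-in for a real download; fails for one made-up ID."""
    if mid == "bad-id":
        raise ValueError("entry not available")
    return {"mid": mid}


logfile = os.path.join(tempfile.mkdtemp(), "sketch.log")
logger = make_logger(logfile, silent_logging=True)
results = download_all(["id-1", "bad-id", "id-2"], fake_fetch, logger)

with open(logfile) as handle:
    log_text = handle.read()
```

In this sketch the failed entry is skipped, the two remaining entries are returned, and the error message ends up in the log file only, because `silent_logging=True` leaves out the screen handler.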

## Select an API

To download data from different sources, you can use a custom-defined API object. In this tutorial, we use the default API, but we set it explicitly. First, we import the API code:

In [2]:
from simdatframe.apis.NOMAD_client_API import API

Next, we create the API object:

In [3]:
api = API(logger=None)

Because the NOMAD client downloads data asynchronously, a little setup is required when running from a Jupyter notebook:

In [4]:
import nest_asyncio 
nest_asyncio.apply()
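The reason for this setup is that Jupyter already runs an event loop, and plain `asyncio` refuses to start a second one inside it. The sketch below reproduces that situation with the standard library only; it is meant to be run as a plain Python script, and the function names `inner` and `outer` are chosen for illustration.

```python
import asyncio


async def inner():
    return "done"


async def outer():
    # We are now inside a running event loop, similar to a notebook cell.
    coroutine = inner()
    try:
        # Starting a nested event loop is rejected by plain asyncio.
        asyncio.run(coroutine)
    except RuntimeError as error:
        coroutine.close()  # avoid a "never awaited" warning
        return f"RuntimeError: {error}"
    return "no error"


message = asyncio.run(outer())
print(message)
```

`nest_asyncio.apply()` patches `asyncio` so that such nested calls are allowed, which is what lets the asynchronous client run from within the notebook.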

This specific API does not require any further configuration, but for other APIs this step can be used, e.g., to set the _url_ of a website. The logger is set to `None` here; however, once we pass the API to the `MaterialsDatabase`, the logger is set so that all error messages are sent to the log file. We can already use the `API` object to download data:

In [5]:
material = api.get_calculation("AW45kWwD6Qq3wgdy6CpMgjX3wjOh")

Fetching remote uploads...
1 entries are qualified and added to the download list.
Downloading required data...


In [6]:
print(material)

Material(mid = AW45kWwD6Qq3wgdy6CpMgjX3wjOh, data = {'mid', 'energy_total', 'atoms'}, properties = set())


As you can see from the output, we downloaded [a material from NOMAD](https://nomad-lab.eu/entry/id/AW45kWwD6Qq3wgdy6CpMgjX3wjOh). The downloaded data include the total energy and the atomic structure, which are available via:

In [7]:
print(material.data["energy_total"])

-956939.7336141159


In [8]:
print(material.atoms)

Atoms(symbols='PbI2', pbc=True, cell=[[4.557999762860691, 0.0, 0.0], [-2.2789998764296207, 3.947343586727941, 0.0], [0.0, 0.0, 6.985999558607339]])


Both are accessible through the `Material` object. You can read more about it in the documentation.
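As a quick sanity check of the printed structure, the lattice parameters can be computed directly from the cell vectors. The sketch below copies the numbers from the output above and uses only the standard library; the helper functions `norm` and `angle_deg` are written for this illustration.

```python
import math

# Cell vectors copied from the printed Atoms object above.
# ASE reports lengths in Angstrom.
cell = [
    [4.557999762860691, 0.0, 0.0],
    [-2.2789998764296207, 3.947343586727941, 0.0],
    [0.0, 0.0, 6.985999558607339],
]


def norm(vector):
    """Euclidean length of a 3D vector."""
    return math.sqrt(sum(x * x for x in vector))


def angle_deg(u, v):
    """Angle between two vectors in degrees."""
    cos_angle = sum(x * y for x, y in zip(u, v)) / (norm(u) * norm(v))
    return math.degrees(math.acos(cos_angle))


a, b, c = (norm(vector) for vector in cell)
gamma = angle_deg(cell[0], cell[1])
print(f"a = {a:.3f}, b = {b:.3f}, c = {c:.3f}, gamma = {gamma:.1f}")
```

The first two vectors have the same length of about 4.558 Å and enclose an angle of 120°, so the downloaded PbI2 structure has a hexagonal cell.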

In the next step, we create a backend to store our data.

## Select a Backend

## Download data

## Interpret log files

## Analyzing data