# `PointsDirectory` - A class used to encapsulate all calculations for many geometries (of a dataset)

The `ichor.core.files.PointsDirectory` class makes it easy to work with the thousands of files that are generated when running Gaussian, AIMAll, and similar calculations for many geometries.

The general structure of a `PointsDirectory`-like directory is like so:

```
.
|--- SYSTEM0001
|   |--- SYSTEM0001_atomicfiles
|   |   |--- h2.int
|   |   |--- h3.int
|   |   |--- o1.int
|   |--- SYSTEM0001.gjf
|   |--- SYSTEM0001.wfn
|--- SYSTEM0002
|   |--- SYSTEM0002_atomicfiles
|   |   |--- h2.int
|   |   |--- h3.int
|   |   |--- o1.int
|   |--- SYSTEM0002.gjf
|   |--- SYSTEM0002.wfn
...
...
...
```

Essentially, `PointsDirectory` is a class that parses a directory containing many sub-directories. Each sub-directory (e.g. `SYSTEM0001`, `SYSTEM0002`) contains all relevant calculations for **one** geometry. Each of the *sub-directories* can be individually read in as an `ichor.core.files.PointDirectory` instance (note that there is no *s* in this case).

This class makes it easy to access calculations for many geometries. For example, it can be used to get the `.wfn` energy of all geometries.
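To make the layout above concrete, here is a minimal sketch that walks such a directory using only the standard library. It is not how `ichor` parses the directory; the helper names `summarize_points_directory` and `build_demo_layout` are hypothetical, and the demo files are empty placeholders that only mirror the tree shown above.

```python
from pathlib import Path
import tempfile


def summarize_points_directory(root: Path) -> dict:
    """Map each point sub-directory name to the sorted relative paths of its files."""
    summary = {}
    for point_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        files = sorted(
            f.relative_to(point_dir).as_posix()
            for f in point_dir.rglob("*")
            if f.is_file()
        )
        summary[point_dir.name] = files
    return summary


def build_demo_layout(root: Path) -> None:
    """Create empty placeholder files mirroring the tree above (hypothetical demo data)."""
    for name in ("SYSTEM0001", "SYSTEM0002"):
        atomic_dir = root / name / f"{name}_atomicfiles"
        atomic_dir.mkdir(parents=True)
        for atom in ("h2", "h3", "o1"):
            (atomic_dir / f"{atom}.int").touch()
        (root / name / f"{name}.gjf").touch()
        (root / name / f"{name}.wfn").touch()


with tempfile.TemporaryDirectory() as tmp:
    build_demo_layout(Path(tmp))
    demo_summary = summarize_points_directory(Path(tmp))

for point_name, files in demo_summary.items():
    print(point_name, files)
```

Each key of the resulting dictionary corresponds to what a single `PointDirectory` would represent, and the collection as a whole corresponds to the `PointsDirectory`.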

## `PointDirectory` structure

The `PointDirectory` class encapsulates a directory containing all relevant calculations for **one** geometry. It subclasses `ichor.core.files.directory.AnnotatedDirectory`, which gives us the ability to define *class* variables that are of specific file types. The `AnnotatedDirectory._parse` method then parses all files in the directory. The extension of each file determines its file type, and thus the class that is going to be used to parse the file.
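The general pattern (class-level annotations that declare file types, plus a parse step that matches files by extension) can be illustrated with a small self-contained sketch. This is not the `ichor` implementation: `TextFile`, `GjfFile`, `WfnFile`, `SketchDirectory` and `SketchPointDirectory` are hypothetical stand-ins written only to show the idea, and the real `AnnotatedDirectory` may differ in every detail.

```python
from pathlib import Path
import tempfile
from typing import get_type_hints


class TextFile:
    """Hypothetical base file type: stores its path and reads the file on demand."""

    extension = ""

    def __init__(self, path: Path):
        self.path = path

    def read(self) -> str:
        return self.path.read_text()


class GjfFile(TextFile):
    extension = ".gjf"


class WfnFile(TextFile):
    extension = ".wfn"


class SketchDirectory:
    """Hypothetical base class: fills annotated attributes from matching files."""

    def __init__(self, path):
        self.path = Path(path)
        self._parse()

    def _parse(self) -> None:
        # The class-level annotations declare which file types may be present.
        for attr_name, file_type in get_type_hints(type(self)).items():
            matches = [
                f for f in self.path.iterdir() if f.suffix == file_type.extension
            ]
            # Attributes without a matching file are set to None.
            setattr(self, attr_name, file_type(matches[0]) if matches else None)


class SketchPointDirectory(SketchDirectory):
    gjf: GjfFile
    wfn: WfnFile


with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "SYSTEM0001.gjf").write_text("placeholder input")
    demo_point = SketchPointDirectory(tmp)
    gjf_text = demo_point.gjf.read()
    wfn_missing = demo_point.wfn is None

print(gjf_text, wfn_missing)
```

The design choice worth noting is that the subclass only declares *what* it expects to find; the base class decides *how* files are discovered and which class wraps each one.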

## Obtaining Gaussian results

### Obtaining total energy from `PointsDirectory`

```python
from ichor.core.files import PointsDirectory

# PointsDirectory("path_to_directory_with_wfn_and_int_files")
points_dir = PointsDirectory("../../../example_files/urea_example_points_directory")

for point_directory in points_dir:
    print(point_directory.name, point_directory.wfn.total_energy)
```

Output:

```
urea0000 -225.17620443033
urea0001 -225.081384781131
```
urea0001 -225.081384781131


### Accessing IQA energy

```python
for point_directory in points_dir:
    print(point_directory.ints.iqa)

# note that this is for A A'
```

Output:

```
{'C1': -37.857821914, 'O2': -75.361149077, 'N3': -54.974866964, 'N4': -54.940748886, 'H5': -0.51125155556, 'H6': -0.52313765036, 'H7': -0.498359498, 'H8': -0.50876638824}
{'C1': -37.82774214, 'O2': -75.372219569, 'N3': -54.930291248, 'N4': -54.950141186, 'H5': -0.49270810195, 'H6': -0.5019590637, 'H7': -0.51138493187, 'H8': -0.49501200059}
```
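As a quick sanity check on these values, the atomic IQA energies of one geometry can be summed and compared with the `.wfn` total energy; for a full IQA partitioning the two should agree up to a small recovery error. The sketch below hard-codes the numbers printed above and assumes that the first dictionary belongs to `urea0000`. The variable names are illustrative and are not part of `ichor`.

```python
# Atomic IQA energies (Hartree) copied from the first dictionary printed above.
# Assumption: this dictionary corresponds to the geometry named urea0000.
iqa_urea0000 = {
    "C1": -37.857821914,
    "O2": -75.361149077,
    "N3": -54.974866964,
    "N4": -54.940748886,
    "H5": -0.51125155556,
    "H6": -0.52313765036,
    "H7": -0.498359498,
    "H8": -0.50876638824,
}

# Total energy (Hartree) printed for urea0000 in the Gaussian section.
wfn_energy_urea0000 = -225.17620443033

HARTREE_TO_KJ_PER_MOL = 2625.4996

iqa_sum = sum(iqa_urea0000.values())
recovery_error_hartree = iqa_sum - wfn_energy_urea0000
recovery_error_kj_per_mol = recovery_error_hartree * HARTREE_TO_KJ_PER_MOL

print(f"sum of IQA energies: {iqa_sum:.8f} Ha")
print(f"recovery error:      {recovery_error_kj_per_mol:.3f} kJ/mol")
```

With an actual `PointsDirectory`, the same comparison could be made inside the loop using `sum(point_directory.ints.iqa.values())` and `point_directory.wfn.total_energy`, both of which appear in the cells above.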


### Accessing Multipole Moments

```python
for point_directory in points_dir:
    print(point_directory.ints.global_spherical_multipoles)

# note these are not rotated
```

Output (truncated):

```
{'C1': {'q00': 2.0566949812, 'q10': 0.43547872781, 'q11c': -0.11072742898, 'q11s': -0.058793978143, 'q20': 0.26199432667, 'q21c': -0.19584871924, 'q21s': 0.060993929677, 'q22c': -0.34915275413, 'q22s': 0.7005351433, 'q30': -0.46399151978, 'q31c': -0.076106962971, 'q31s': -0.63067578604, 'q32c': 0.13837424126, 'q32s': -0.19269925576, 'q33c': -0.32411444133, 'q33s': 0.086714013101, 'q40': -0.08844184554, 'q41c': 0.68304098201, 'q41s': 0.28844244849, 'q42c': 0.38965399325, 'q42s': -0.9370677379, 'q43c': 0.01477248443, 'q43s': 0.56492211508, 'q44c': 0.86785954459, 'q44s': 1.2164272089, 'q50': 0.64442652355, 'q51c': -1.0766687502, 'q51s': 1.0621360834, 'q52c': -1.6117534507, 'q52s': 0.71110515289, 'q53c': -2.0172675763, 'q53s': -0.29786028162, 'q54c': 0.087737831647, 'q54s': -0.1413840078, 'q55c': -0.42342204159, 'q55s': 1.7161766165}, 'O2': {'q00': -1.3289438603, 'q10': 0.386303531, 'q11c': -0.17380989349, 'q11s': -0.23541024678, 'q20': 0.094856993798, 'q21c': -0.12768108116, 'q21s': -0.12
```

## Converting to SQLite3 database

Reading thousands of files every time is very time-consuming (especially on hard drives), so it is much more efficient to read the data once and store it in a database. `ichor` has SQLite3 support implemented, meaning a `PointsDirectory` can be readily converted to an SQLite3 database. **NOTE: ONLY RAW DATA FROM CALCULATIONS IS STORED IN THE DATABASE. NO POSTPROCESSING IS DONE. ANY POSTPROCESSING MUST BE DONE AT A LATER STEP (e.g. rotating multipole moments).**

Code snippet to produce the database:

```python

from ichor.core.files import PointsDirectory

pd = PointsDirectory("points_directory_path")
pd.write_to_sqlite3_database()
```

**Note 1: It takes a while to read all files, so this should be submitted as a job on a compute node.**

**Note 2: If the dataset is large and split into many `PointsDirectory`-like directories, then you can do**

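```python
from ichor.core.files import PointsDirectory
from pathlib import Path

parent_dir = Path("parent_dir")

# each entry in parent_dir is expected to be a PointsDirectory-like directory
for d in parent_dir.iterdir():

    # read the current sub-directory (not a fixed path)
    pd = PointsDirectory(d)
    # the same database file is passed on every iteration
    pd.write_to_sqlite3_database("large_database.db")
```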

where all the information will be stored into one database.

## SQLite Database Schema Diagram

The following shows what the database schema currently looks like. The image was made with DbVisualizer. Note that not **all** of these fields may be populated in the database; that depends on the raw data present in the `PointsDirectory`. For example, if only Gaussian calculations are run, then the AIMAll-related data will be missing from the database.

![SQLite3 database schema diagram](../../../example_files/sql_database_schema.svg "SQLite3 Schema")
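Because the populated tables and columns depend on the raw data, it can be useful to inspect a given database directly. The following is a minimal sketch using Python's built-in `sqlite3` module; it makes no assumption about the table names `ichor` creates. The helper `list_tables_and_columns` is hypothetical, and the `demo_points` table exists only to make the example runnable.

```python
import sqlite3
import tempfile
from contextlib import closing
from pathlib import Path


def list_tables_and_columns(db_path: str) -> dict:
    """Return {table_name: [column names]} for every table in the database."""
    with closing(sqlite3.connect(db_path)) as connection:
        tables = [
            row[0]
            for row in connection.execute(
                "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
            )
        ]
        # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk) rows.
        return {
            table: [
                column[1]
                for column in connection.execute(f'PRAGMA table_info("{table}")')
            ]
            for table in tables
        }


with tempfile.TemporaryDirectory() as tmp:
    demo_db = str(Path(tmp) / "demo.db")

    # Hypothetical table, created only so that there is something to inspect.
    with closing(sqlite3.connect(demo_db)) as connection:
        connection.execute("CREATE TABLE demo_points (id INTEGER PRIMARY KEY, name TEXT)")
        connection.commit()

    demo_schema = list_tables_and_columns(demo_db)

print(demo_schema)
```

Pointing `list_tables_and_columns` at a database written by `write_to_sqlite3_database` should list whatever tables are actually present, which can then be compared against the diagram above.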

## Generating CSV files with Features from SQLite3 Database

CSV files containing (ALF) features and relevant outputs can be generated from an SQLite3 database like so:

```python
from ichor.core.sql.query_database import (
    get_alf_from_first_db_geometry,
    write_processed_data_for_atoms_parallel,
    write_processed_data_for_atoms
)

db_path = "DATABASE_PATH"

# note that you can also define an ALF manually as well
# or get it from some other molecular geometry
# that contains the same atom sequencing as in the database
alf = get_alf_from_first_db_geometry(db_path)

# note that this will write files out in parallel
# use write_processed_data_for_atoms for serial

write_processed_data_for_atoms_parallel(
    db_path,
    alf,
    ncores=4,
    calc_multipoles=True, # rotates multipoles using C matrix
    calc_forces=False, # rotates forces using C matrix, note that this takes a while
)
```