# Using Databases in pyXMIP

![a](https://img.shields.io/badge/Subject:-Databases-blue)
![b](https://img.shields.io/badge/Difficulty:-Easy-green)
![c](https://img.shields.io/badge/Author:-Eliza_Diggins-green)

---

In this example guide, we'll walk through how to use and interact with ``pyXMIP``'s database classes.



# Accessing Databases

---

In ``pyXMIP``, databases are represented by classes in the ``pyXMIP.structures.databases`` module. There are several types of database, ranging from local databases (local catalogs) to remote databases like NED and SIMBAD. The purpose of the ``databases`` module is to provide an easy, intuitive link between the user and the database-specific querying and management tasks that are necessary to interact with the relevant data.

## Local Databases

Local databases cover cases where a user loads a catalog of their own to cross-reference against. To demonstrate, we'll use the eROSITA hard-band catalog from the eRASS 1 survey as our database. The first thing to do is to load the data into ``pyXMIP``. If you're trying to follow along, you can find the eRASS 1 data [here](https://erosita.mpe.mpg.de/dr1/AllSkySurveyData_dr1/Catalogues_dr1/MerloniA_DR1/eRASS1_Hard.tar.gz). Once it's been extracted, you should have a file ``eRASS1_Hard.v1.0.fits``. Let's go ahead and load the file into a database!


In [1]:
import pyXMIP as pyxm
from pyXMIP.utilities.logging import mainlog

mainlog.verbosity = 2

# -- read the table into memory as a SourceTable -- #
catalog = pyxm.SourceTable.read("data/eRASS1_Hard.v1.0.fits")

print(f"The catalog has length {len(catalog)}.")

We can now load the table as a database using the ``pyxm.LocalDatabase`` class:

In [2]:
database = pyxm.LocalDatabase(catalog, "example_database")

### Basic Properties of Local Databases

Congrats, you've just loaded a table as a ``LocalDatabase``! Let's start exploring the database.

The second parameter we passed above is the ``name`` of the ``LocalDatabase`` instance. We can access it using ``database.name``.

In [3]:
print(database.name)

These local databases operate much like normal ``SourceTable`` objects; you can access the raw data using ``database.table``. What makes these objects useful is that you can immediately perform all of ``pyXMIP``'s core functionality for cross-matching. 

The first thing to demonstrate is **querying** the database. This allows us to pull all of the sources within a given radius of a position. Let's try pulling all of the matches within 1 degree of the galactic center:

In [4]:
from astropy.coordinates import SkyCoord
import astropy.units as u

query_data = database.query_radius(
    SkyCoord(0, 0, unit="deg", frame="galactic"), 1 * u.deg
)

In [5]:
query_data
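
Conceptually, a radius query like the one above is a cone search: it keeps every source whose angular separation from the target position is no larger than the search radius. The sketch below is a minimal pure-Python illustration of that idea, not ``pyXMIP``'s implementation. The names ``angular_separation_deg`` and ``cone_search`` and the toy catalog are hypothetical and exist only for this example.

```python
import math


def angular_separation_deg(lon1, lat1, lon2, lat2):
    """Great-circle separation in degrees between two points given in degrees."""
    lon1, lat1, lon2, lat2 = map(math.radians, (lon1, lat1, lon2, lat2))
    # Haversine formula: numerically stable for small separations.
    h = (
        math.sin((lat2 - lat1) / 2) ** 2
        + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2
    )
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))


def cone_search(sources, center, radius_deg):
    """Return the sources whose position lies within radius_deg of center."""
    lon0, lat0 = center
    return [
        src
        for src in sources
        if angular_separation_deg(src["lon"], src["lat"], lon0, lat0) <= radius_deg
    ]


# Hypothetical toy catalog in galactic coordinates (degrees).
toy_catalog = [
    {"name": "A", "lon": 0.2, "lat": 0.1},
    {"name": "B", "lon": 359.5, "lat": -0.3},  # just across the lon = 0 wrap
    {"name": "C", "lon": 5.0, "lat": 2.0},
]

matches = cone_search(toy_catalog, center=(0.0, 0.0), radius_deg=1.0)
print([src["name"] for src in matches])  # ['A', 'B']
```

Source ``B`` is matched even though its longitude is 359.5 degrees, because the separation is computed on the sphere rather than by subtracting coordinates directly.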

---

## Remote Databases

---

As you might expect, the transition from the ``LocalDatabase`` class to the ``RemoteDatabase`` class includes a moderate increase in complexity. Just like local databases, remote databases are used for querying, cross matching, and creating Poisson atlases (all of these topics are covered in other guides). The only major differences are:

- Querying remote databases requires sending and receiving data over HTTP, which can take much longer than a local lookup.
- Different online databases may have many settings and configuration options for retrieving their data.
- Remote databases are generally much larger and therefore cannot reasonably be converted to local databases.

For these reasons, there are a few differences in the implementation of remote databases versus the local database.

- Many of the common remote databases are **built-in** to the ``pyXMIP`` infrastructure.
  - Unlike local databases, which you can load directly from a table, a custom remote database requires writing your own ``RemoteDatabase`` class. Doing so is beyond the scope of this brief example; however, details can be found elsewhere in the documentation.
- Each remote database has a ``.query_config`` attribute (which can be set as a kwarg when initializing the database).
  - This is a ``dict`` containing various settings to configure for your particular need. The exact details may vary from database to database.
  - **Example:** In the NED database, by default, the cross-matching table will only return the match name, RA, DEC, and object type. This can be changed by setting the ``kept_columns`` value in the ``query_config``.
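
To make the ``query_config`` idea concrete, here is a minimal sketch of how a class might merge user-supplied settings into a set of defaults. It is an illustration under stated assumptions, not ``pyXMIP``'s code: ``ToyRemoteDatabase``, ``default_query_config``, the column names, and the ``timeout_seconds`` key are all hypothetical. Only the ``kept_columns`` key and the ``query_config`` kwarg come from the description above.

```python
import copy


class ToyRemoteDatabase:
    """Hypothetical stand-in for a remote database class (not pyXMIP's API)."""

    # Class-level defaults. The values and the timeout key are illustrative.
    default_query_config = {
        "kept_columns": ["name", "RA", "DEC", "type"],
        "timeout_seconds": 30,
    }

    def __init__(self, name, query_config=None):
        self.name = name
        # Start from a deep copy of the defaults so instances never share
        # mutable state, then apply any user overrides on top.
        self.query_config = copy.deepcopy(self.default_query_config)
        self.query_config.update(query_config or {})


standard = ToyRemoteDatabase("toy")
custom = ToyRemoteDatabase("toy_custom", query_config={"kept_columns": ["RA", "DEC"]})

print(standard.query_config["kept_columns"])  # ['name', 'RA', 'DEC', 'type']
print(custom.query_config["kept_columns"])    # ['RA', 'DEC']
print(custom.query_config["timeout_seconds"])  # 30
```

Only the keys passed in are overridden; everything else falls back to the default.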

Nonetheless, remote databases operate in much the same way that ``LocalDatabases`` do. 


In [6]:
from pyXMIP.structures.databases import NED

In [7]:
from astropy.coordinates import SkyCoord
import astropy.units as u

database = NED()
query_data = database.query_radius(
    SkyCoord(0, 0, unit="deg", frame="galactic"), 1 * u.arcmin
)

In [8]:
query_data[:5]

Notice that these remote databases are **much larger** and therefore may take a long time to respond for a given search radius.

Unlike ``LocalDatabases``, which read a schema directly from the source table or are provided a schema by the user, remote databases **must** be provided with a schema.

- For the various built-in databases (NED, SIMBAD, etc.) there are "standard" schemas stored in the ``database_class.default_query_schema`` attribute.
  - Generally, these don't need to be overridden, but if you need to, a custom schema can be supplied to the ``__init__`` method when creating the instance.
 
As an example:

In [12]:
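import copy

# Initialize a "standard" NED instance.
ned_database_standard = NED()

# Make and edit a copy of the default schema. Deep-copying keeps the edit
# from also changing the class-level default used by the standard instance.
default_ned_schema = copy.deepcopy(NED.default_query_schema)
default_ned_schema.column_map["TYPE"] = "something new"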

# Initialize a custom NED instance.
ned_database_custom = NED("NED_custom", query_schema=default_ned_schema)

# Compare the schema
print(ned_database_standard.name, ned_database_custom.name)
print(
    ned_database_standard.query_schema.column_map,
    ned_database_custom.query_schema.column_map,
)

As you can see, the two schemas are now different. In general, editing the query schema is rarely useful, but other settings (particularly ``query_config``) work the same way and may need to be overridden in some applications.
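
One caveat when customizing class-level defaults: in Python, assigning an object to a new name does not copy it, so editing a shared default in place can silently change it for every instance. Whether ``pyXMIP``'s ``default_query_schema`` hands back a shared object or a fresh one is not stated here, so the sketch below uses hypothetical ``ToySchema`` and ``ToyDatabase`` classes purely to show the difference between aliasing and copying.

```python
import copy


class ToySchema:
    """Hypothetical schema holding a column map (not pyXMIP's schema class)."""

    def __init__(self, column_map):
        self.column_map = column_map


class ToyDatabase:
    # Shared by every instance unless a custom schema is supplied.
    default_query_schema = ToySchema({"TYPE": "Type", "RA": "RA"})


# Aliasing: both names refer to the same object, so any in-place edit
# made through ``alias`` would also change the class default.
alias = ToyDatabase.default_query_schema
alias_is_default = alias is ToyDatabase.default_query_schema

# Copying: the deep copy can be edited without touching the default.
custom = copy.deepcopy(ToyDatabase.default_query_schema)
custom.column_map["TYPE"] = "something new"

print(alias_is_default)  # True
print(ToyDatabase.default_query_schema.column_map["TYPE"])  # Type
print(custom.column_map["TYPE"])  # something new
```

Taking a deep copy before editing is a cheap way to be sure that a "standard" instance and a customized one stay independent.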

# Beyond Simple Queries

---

Now that you're acquainted with the two basic types of database and how to query them, it's worth looking at some of the more advanced methods available on these classes.

## What is a database?

---

Databases live in the ``pyXMIP.structures.databases`` module and form the backbone for interacting with both local and external sources of catalog information. Under the hood, they are simply wrappers around the case-specific querying behavior of individual types and instances of databases.

All databases, regardless of type, descend from the abstract ``databases.SourceDatabase`` class.
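
The general pattern of an abstract base class with type-specific subclasses can be sketched as follows. This is an illustration only: ``ToySourceDatabase``, ``ToyLocal``, and ``ToyRemote`` are hypothetical names, the flat-sky distance is a simplification, and none of this reflects the actual ``SourceDatabase`` interface.

```python
from abc import ABC, abstractmethod


class ToySourceDatabase(ABC):
    """Hypothetical abstract base: defines the interface all databases share."""

    def __init__(self, name):
        self.name = name

    @abstractmethod
    def query_radius(self, position, radius):
        """Return all sources within ``radius`` of ``position``."""


class ToyLocal(ToySourceDatabase):
    """Answers queries from an in-memory table of (x, y) positions."""

    def __init__(self, table, name):
        super().__init__(name)
        self.table = table

    def query_radius(self, position, radius):
        # Flat-sky distance on (x, y) pairs, purely for illustration.
        x0, y0 = position
        return [
            (x, y)
            for x, y in self.table
            if (x - x0) ** 2 + (y - y0) ** 2 <= radius**2
        ]


class ToyRemote(ToySourceDatabase):
    """Would forward queries to a remote service."""

    def query_radius(self, position, radius):
        # A real implementation would send a request to a remote service here.
        raise NotImplementedError("network access is omitted from this sketch")


local = ToyLocal([(0.0, 0.5), (3.0, 4.0)], "toy_local")
print(local.query_radius((0.0, 0.0), 1.0))  # [(0.0, 0.5)]
```

Because both subclasses expose the same ``query_radius`` signature, code written against the base class works with either kind of database.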