# Working with AsteroCat
This notebook goes through a few tools for interacting with AsteroCat, mainly for adding entries to the basket. The data are stored in an accessible format, so reading and using data from the basket is straightforward; we won't go into much detail on that here, since it's likely to be very user-specific.

## The basket structure
The main idea behind the AsteroCat catalog is to list targets that have been shown to exhibit solar-like oscillations. Non-detections can also be listed, but stars that have not yet been analyzed should not be.

For each star, or catalog item, there can be several entries, one for each observation of oscillations that has been made. For example, if someone detects oscillations in a given star with *Kepler* and someone else later sees them with TESS, there should be two separate entries for that target, each giving some basic information about the observations.
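As a rough illustration of this one-to-many layout, the sketch below builds two placeholder entries that share a catalog identifier (`catID`, the column name used by the code further down) and collects them per target with pandas. The detection sources are made-up labels, not real publications.

```python
import pandas as pd

# Placeholder entries: one star (catID 1) with oscillations seen by two missions.
# 'source A' and 'source B' are hypothetical labels, not real publications.
entries = pd.DataFrame({
    'catID': [1, 1],
    'Detection source': ['source A', 'source B'],
    'Data source': ['Kepler', 'TESS'],
})

# Each catalog item (catID) collects however many entries exist for it.
per_target = entries.groupby('catID')['Data source'].apply(list)
print(per_target.loc[1])  # ['Kepler', 'TESS']
```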

## An entry contains...
An entry should probably be divided into three sets of columns: one detailing the detection, one for the data, and one for additional information on the target.

- **Entry date:** When the entry is made.

- **Detection probability:** Preferably 0-100. If a source doesn't provide this, 100 is fine, as long as the detection can be verified visually. The same applies if the detection was made by visual inspection only.
- **Detection source:** The publication in which the detection is made. Where possible, use the original publication of the detection. If the detection is not from a publication, just use initials or some other useful identifier.
- **Detection method:** The name of the pipeline, or visual if the detection was made by eye. If a detection from a particular method required visual verification, it should be labeled as a visual detection, not attributed to the method.
- **Detection date:** When the detection is made. This is probably redundant with the source, so it could just be the year.
- **Detection numax:** Optional; numax of the target, if the detection method provides it.
- **Detection dnu:** Optional; dnu of the target, if the detection method provides it.
- **Detection multiplicity:** The number of identifiable oscillators in the spectrum. Usually 1, but it should be >1 even if the additional oscillator is just a background star. True binarity can always be verified later.

- **Data source:** The mission or instrument the data come from.
- **Data type:** RV or photometry (other?).
- **Data length:** Length of the time series.
- **Data cadence:** Sampling interval of the time series. Just use the median interval between consecutive points.
- **Data duty cycle:** 0-1. The number of observed cadences divided by the number of cadences that fit in the time series, i.e., the time series length divided by the median cadence.

- **Target sky coords:** Not sure, but could be useful for cross-matching purposes.
- **Target SIMBAD colors:** Not sure, but could be useful for cross-matching purposes.
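The data length, cadence and duty cycle can all be derived from the time stamps of the series. Below is a minimal sketch, assuming sorted time stamps in a single unit (e.g. days); `summarize_time_series` is a hypothetical helper, not part of AsteroCat. It counts the expected number of cadences as length/cadence + 1, so that a gapless series gets a duty cycle of exactly 1.

```python
import numpy as np


def summarize_time_series(time):
    """Return (length, cadence, duty_cycle) for an array of time stamps.

    Hypothetical helper: assumes `time` is sorted and in one unit (e.g. days).
    """
    t = np.asarray(time, dtype=float)
    if t.size < 2:
        raise ValueError("Need at least two time stamps.")
    length = t[-1] - t[0]
    # Median spacing between consecutive samples is robust to a few large gaps.
    cadence = np.median(np.diff(t))
    # Number of cadences a gapless series of this length would contain;
    # the +1 makes a gapless series come out at exactly 1.
    expected = length / cadence + 1
    # Clip at 1 in case irregular sampling packs in extra points.
    duty_cycle = min(t.size / expected, 1.0)
    return length, cadence, duty_cycle


# Two 5-day chunks at 0.5-day cadence with a 5-day gap in between.
t = np.concatenate([np.arange(0, 5, 0.5), np.arange(10, 15, 0.5)])
print(summarize_time_series(t))  # length 14.5, cadence 0.5, duty cycle 20/30
```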

## Once an entry has been created...
When the user adds a name for the target to the entry, we need to initiate a crossmatch to see if the target is already in the catalog. Each unique target that has been entered has a catalog identifier, so if the target already exists, the entry should be added to that catalog item. If not, the entry should be added under a new catalog item identifier created for it. This identifier is for internal use only, and will have a number of other target names associated with it.
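The crossmatch could look something like the sketch below. It works on the catalog index as a pandas DataFrame with a `catID` column plus one column per identifier catalog (e.g. `KIC`, `TIC`). `find_or_assign_cat_id` is a hypothetical helper: it takes the target's identifiers as a plain dict instead of querying SIMBAD, which the `CatAdder` class below does through `asterosearch`.

```python
import pandas as pd


def find_or_assign_cat_id(cat_index, target_ids):
    """Return (catID, updated_index) for a target.

    Hypothetical helper. `target_ids` maps identifier catalogs (e.g. 'KIC',
    'TIC') to identifier strings such as 'KIC 8006161'.
    """
    # Flag index rows sharing at least one non-empty identifier with the target.
    match = pd.Series(False, index=cat_index.index)
    for key, value in target_ids.items():
        if key in cat_index.columns and value:
            match |= cat_index[key] == value

    found = cat_index.loc[match, 'catID'].unique()
    if len(found) > 1:
        raise ValueError(f'Target matches several catIDs: {list(found)}')
    if len(found) == 1:
        # Known target: reuse its catalog identifier, index unchanged.
        return int(found[0]), cat_index

    # New target: next free catID, and a new row in the index.
    new_id = 1 if cat_index.empty else int(cat_index['catID'].max()) + 1
    new_row = pd.DataFrame([{**target_ids, 'catID': new_id}])
    return new_id, pd.concat([cat_index, new_row], ignore_index=True)
```

Matching on any shared identifier means a target entered by its KIC number can later be found by its TIC number, provided both are in the index row.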

At the top level, these identifiers should be listed in a catalog index file, which shows the catalog name of the star and a few other names for the target, probably Gaia, 2MASS, Bayer, HD, HR, HIP, Kepler, EPIC and TIC?

Users can cross-reference their target name, or a catalog of names, with this file to find the AsteroCat identifier, which in turn gives access to the data storage file.
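On the user side, that lookup amounts to matching a name against every identifier column of the index and then selecting the entries carrying the resulting `catID`. A sketch, assuming both files are already loaded as DataFrames that share a `catID` column; `entries_for_name` is a hypothetical helper and does exact string matching only, so names must be formatted as they appear in the index (e.g. `HD 173701`).

```python
import pandas as pd


def entries_for_name(cat_index, cat, name):
    """Return all catalog entries for the target known as `name`.

    Hypothetical helper: exact string match against every identifier column.
    """
    id_columns = [c for c in cat_index.columns if c != 'catID']
    # True for index rows where any identifier column equals the name.
    hit = cat_index[id_columns].eq(name).any(axis=1)
    cat_ids = cat_index.loc[hit, 'catID'].unique()
    # One target can have several entries (e.g. Kepler and TESS detections).
    return cat[cat['catID'].isin(cat_ids)]
```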

In [42]:
import pandas as pd
import os
from asterosearch import search
import numpy as np

class CatAdder():
    def __init__(self, path=None):

        self.catPath = self.setPath(path, 'AsteroCat.csv')
        
        self.catIndexPath = self.setPath(path, 'catIndex.csv')
        
        self.cat = pd.read_csv(self.catPath)
        
        self.catIndex = pd.read_csv(self.catIndexPath)
        
        self.catKeys = list(self.cat.keys())
        
        # Assumes AsteroCat.csv has 2 integer, then 6 string, then 14 float columns, in that order
        _dtypes = ['Int64']*2 + ['str']*6 + ['float']*14
        
        self.blankCatEntry = pd.DataFrame({key: pd.Series(dtype=_dtypes[i]) for i, key in enumerate(self.catKeys)})
        
        self._defaultValues = {}
        
        self.cats_avail = search.cats_avail
                
        # Do you want to set some default values for this session? 
        # If yes, cycle through columns to set default values
    
    def getBlank(self):
        
        return self.blankCatEntry.copy()
    
    def setPath(self, path, defaultBaseName):
        
        if path is None:
            fullpath = os.path.join(os.getcwd(), defaultBaseName)
            
#         elif os.path.isfile(path):
#             fullpath = path
        
        elif os.path.isdir(path):
            fullpath = os.path.join(path, defaultBaseName)
        
#         elif path == os.path.basename(path):
#             fullpath = os.path.join(*[os.getcwd(), path])
            
        else:
            raise ValueError('Hmm... donno where that cat is. Please provide the full path to a directory containing the catalog files.')

        return fullpath
        #self.catPath = fullpath
    
    
    def getCatID(self, inputID):
        
        # Look up all known identifiers for the input target (via SIMBAD)
        s = search(inputID)

        s.query_simbad()

        # Flag every catIndex row that shares at least one identifier with the target
        idx = np.zeros(len(self.catIndex), dtype=bool)

        for key in self.cats_avail:

            _idx = (self.catIndex[key].values == s.IDs[key].values) & (self.catIndex[key].values != '')
            
            if any(_idx):
                print(key, self.catIndex[key].values, s.IDs[key].values)  # debug output
            idx |= _idx
        print(self.catIndex.loc[idx, 'catID'])  # debug output
        
        # A target may match on several identifiers, but they should all point to one catID
        matches = self.catIndex.loc[idx, 'catID'].unique()
        N = len(matches)
        
        if N == 1:
            catID = matches[0]
            print('1')
        elif N == 0:
            print('0')
            # New target: assign the next free catID and add it to the index
            if len(self.catIndex) == 0:
                maxCat = 0
            else:
                maxCat = self.catIndex['catID'].max()
                
            catID = maxCat + 1
            
            s.IDs['catID'] = catID
            
            self.catIndex = pd.concat([self.catIndex, s.IDs], ignore_index=True)
            
        else:
            raise ValueError('Found multiple catIDs for this target. Something is wrong with the cat!')
        
        #self.catIndex.to_csv(self.catIndexPath, index=False)
                             
        return catID
        
    def appendEntry(self):
        pass
        # Take one entry and append it to cat
    
        
    def addOne(self, inputID, entry=None, catID=None):
                
        newRow = self.blankCatEntry.copy()

        if entry is None:
            pass
    
        elif isinstance(entry, dict):
            
            for key in self.catKeys:
                if key in entry.keys():
                    newRow.at[0, key] = entry[key]
            
        elif isinstance(entry, pd.DataFrame):
            # Only the first row of the DataFrame is used
            for key in self.catKeys:
                if key in entry.columns:
                    newRow.at[0, key] = entry[key].iloc[0]
      
        else:
            raise TypeError("Donno what kind of input you're trying to pass here, bub. Expected a dict or a pandas DataFrame.")
        
        
        # TODO: check for blank columns and fill them with default values where necessary.

        newRow.at[0, 'catID'] = self.getCatID(inputID)
        
        self.cat = pd.concat([self.cat, newRow], ignore_index=True)
    
    def addMany(self):
        pass
        # Inputs types: pandas df, dictionary

In [43]:
CA = CatAdder()

In [44]:
CA.catIndex

Unnamed: 0,catID,SPOCS,KIC,TIC,EPIC,Gaia DR1,Gaia DR2,Gaia DR3,HD,HIP,...,AG,GSC,Ci,PLX,SKY#,WISEA,WISE,PSO,ALLWISE,input


In [45]:
CA.cat

Unnamed: 0,catID,Detection multiplicity,Entry date,Detection source,Detection method,Detection date,Data source,Data type,Data length,Data cadence,...,Target dec,Target G magnitude,Target U magnitude,Target B magnitude,Target V magnitude,Target R magnitude,Target I magnitude,Detection probability,Detection numax,Detection dnu


In [46]:
CA.addOne('KIC8006161',  entry={'Detection multiplicity': 1, 'Entry date': 'tomorrow'})

Series([], Name: catID, dtype: object)
0


In [47]:
CA.addOne('KIC4448777',  entry={'Detection multiplicity': 2, 'Entry date': 'today'})

Series([], Name: catID, dtype: object)
0


In [48]:
CA.catIndex

Unnamed: 0,catID,SPOCS,KIC,TIC,EPIC,Gaia DR1,Gaia DR2,Gaia DR3,HD,HIP,...,AG,GSC,Ci,PLX,SKY#,WISEA,WISE,PSO,ALLWISE,input
0,1,SPOCS 2146,KIC 8006161,TIC 123234496,,Gaia DR1 2117278998434010240,Gaia DR2 2117279002732678656,Gaia DR3 2117279002732678656,HD 173701,HIP 91949,...,AG+43 1559,GSC 03130-00722,Ci 18 2455,PLX 4329,SKY# 34420,,,,,KIC 8006161
0,2,,KIC 4448777,TIC 120893801,,,Gaia DR2 2100458330254478464,Gaia DR3 2100458330254478464,,,...,,,,,,,,,,KIC 4448777


In [49]:
CA.cat

Unnamed: 0,catID,Detection multiplicity,Entry date,Detection source,Detection method,Detection date,Data source,Data type,Data length,Data cadence,...,Target dec,Target G magnitude,Target U magnitude,Target B magnitude,Target V magnitude,Target R magnitude,Target I magnitude,Detection probability,Detection numax,Detection dnu
0,1,1,tomorrow,,,,,,,,...,,,,,,,,,,
0,2,2,today,,,,,,,,...,,,,,,,,,,
