## Exploring the MPC's Isolated Tracklet File
#### Matthew J. Holman
##### Matthew J. Payne


20 November 2017

Slavishly copied from the ITF.ipynb 

* Just using this to create some input files that are equivalent to the ITF
* These will be from the Numbered & Unnumbered MPC files
* They will be cut down (in size)
* Will probably over-sample the rare objects (NEOs, Trojans, Centaurs, TNOs)
* Will probably make a file that is ostensibly for training/exploration
* Will probably make a file that is like the training file, but in which we pretend it's unknown ITF-like tracklets (so need to renumber/rename the tracklets)

### The NOVAS package

First, get the USNO's python NOVAS package.  We'll need that.

http://aa.usno.navy.mil/software/novas/novas_py/novaspy_intro.php

Just type 

pip install novas

pip install novas_de405

Here's the reference:

Barron, E. G., Kaplan, G. H., Bangert, J., Bartlett, J. L., Puatua, W., Harris, W., & Barrett, P. (2011) “Naval Observatory Vector Astrometry Software (NOVAS) Version 3.1, Introducing a Python Edition,” Bull. AAS, 43, 2011.

### The kepcart library

You will need to make sure you have a copy of the kepcart library.  There is a copy of it on the MPC bitbucket site, with some instructions.

In [21]:
%matplotlib inline
import numpy as np
import scipy.interpolate
import matplotlib.gridspec as gridspec
import matplotlib as mpl
import matplotlib.cm as cm
import matplotlib.pyplot as plt
import pandas as pd
pd.set_option('display.width', 500)
pd.set_option('display.max_columns', 200)
pd.set_option('display.notebook_repr_html', True)
import math
import kepcart as kc
import healpy as hp
import collections
import astropy
import sys,os

In [2]:
import MPC_library

In [3]:
Observatories = MPC_library.Observatories

In [4]:
ObservatoryXYZ = Observatories.ObservatoryXYZ

In [5]:
nside=32
for i in range(hp.nside2npix(nside)):
    print(i, hp.pix2vec(nside, i, nest=True))

hp.query_disc(nside, (1.0, 0.0, 0.0), 0.000, inclusive=True, nest=True)

hp.vec2pix(nside, 1.0, 0.0, 0.0, nest=True)

0 (0.70695331253988136, 0.70695331253988125, 0.020833333333333332)
1 (0.68894172521921881, 0.72361812314290153, 0.041666666666666664)
2 (0.72361812314290175, 0.6889417252192187, 0.041666666666666664)
3 (0.70572436191476351, 0.7057243619147634, 0.0625)
4 (0.67024603285840201, 0.73950253916911868, 0.0625)
5 (0.65090093052950704, 0.75457506862563206, 0.083333333333333329)
6 (0.68714213557550174, 0.72172795503035236, 0.083333333333333329)
7 (0.66790557688419916, 0.73692024393589628, 0.10416666666666666)
8 (0.7395025391691189, 0.67024603285840179, 0.0625)
9 (0.72172795503035259, 0.68714213557550163, 0.083333333333333329)
10 (0.75457506862563228, 0.65090093052950682, 0.083333333333333329)
11 (0.7369202439358965, 0.66790557688419894, 0.10416666666666666)
12 (0.70326001790076054, 0.70326001790076043, 0.10416666666666666)
13 (0.68413230010135762, 0.71856662597008081, 0.125)
14 (0.71856662597008103, 0.6841323001013575, 0.125)
15 (0.69954722459920071, 0.6995472245992006, 0.14583333333333331)
16 (

4522

In [6]:
ObservatoryXYZ['W14']

(0.046526560882400647, -0.82059606538689689, 0.567774)

### Reading the MPC data

Now let's look at some 80-character MPC records.

In [8]:
N = 3
with open('itf_new.txt', 'r') as f:
    lines = [next(f) for x in range(N)]
    for line in lines:
        print(line.rstrip('\n'))

     /07K07O* C1997 09 28.39921 01 17 21.49 +14 49 44.2          21.4 Vi     691
     /07K07O  C1997 09 28.42094 01 17 20.46 +14 49 38.3          21.1 Vi     691
     /07K07O  C1997 09 28.44271 01 17 19.44 +14 49 31.8          20.9 Vi     691


### Reading the MPC observation files

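Processing files line by line in Python is slow.

The itf.txt, NumObs.txt, and UnnObs.txt files mix 1-line and 2-line record formats. Two-line records (satellite, radar, and roving-observer observations) are flagged by S, R, or V in column 15 of their first line.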


In [9]:
# This routine checks the 80-character input line to see if it contains a special character (S, R, or V) that indicates a 2-line 
# record.
def is_two_line(line):
    note2 = line[14]
    return note2=='S' or note2=='R' or note2=='V'

In [10]:
# This routine opens and reads filename, separating the records into those in the 1-line and 2-line formats.
# The 2-line format lines are merged into single 160-character records for processing line-by-line.
def split_MPC_file(filename):
    # os.path.splitext removes only the extension; str.rstrip('.txt') strips any
    # trailing '.', 't', or 'x' characters and can eat into the base name.
    base = os.path.splitext(filename)[0]
    filename_1_line = base + "_1_line.txt"
    filename_2_line = base + "_2_line.txt"
    with open(filename_1_line, 'w') as f1_out, open(filename_2_line, 'w') as f2_out:
        line1 = None
        with open(filename, 'r') as f:
            for line in f:
                if is_two_line(line):
                    # First half of a 2-line record: hold it until the next line arrives
                    line1 = line
                    continue
                if line1 is not None:
                    # Second half: merge with the held line into one 160-character record
                    merged_lines = line1.rstrip('\n') + line
                    f2_out.write(merged_lines)
                    line1 = None
                else:
                    f1_out.write(line)
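
As a small in-memory check of the merge logic above, here is a sketch that pairs records without touching the disk. The `merge_two_line_records` and `toy` helpers and the toy records are hypothetical (they are not part of MPC_library); only the note2 column is meaningful in the toy data.

```python
def is_two_line(line):
    # Index 14 (column 15) holds note2; S, R, or V marks the first line of a 2-line record.
    return len(line) > 14 and line[14] in ('S', 'R', 'V')

def merge_two_line_records(lines):
    """Split records into 1-line records and merged 2-line records (hypothetical helper)."""
    one_line, two_line = [], []
    pending = None
    for line in lines:
        if is_two_line(line):
            pending = line              # hold the first half until its partner arrives
            continue
        if pending is not None:
            two_line.append(pending.rstrip('\n') + line)
            pending = None
        else:
            one_line.append(line)
    return one_line, two_line

def toy(note2, tag):
    # Build an 80-character toy record with the given note2 flag (not real MPC data).
    return (' ' * 14 + note2 + tag).ljust(80) + '\n'

records = [toy('C', 'ccd'), toy('S', 'sat-1'), toy('s', 'sat-2'), toy('C', 'ccd')]
one_line, two_line = merge_two_line_records(records)
print(len(one_line), len(two_line), len(two_line[0].rstrip('\n')))
```

The two CCD records pass straight through, while the `S` line and the line that follows it become one 160-character record.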


# First significant insert by MJP (Nov 20th)
### There are NumObs_1_line.txt & UnnObs_1_line.txt files in 
### ~/Dropbox/MPC/
#### I have no idea how old they are, but I'm going to use them anyway

Below are routines that read the files after they have been split into their respective formats.  

In [11]:
def readMPC_1_line(filename='NumObs_1_line.txt', nrows=1000000):
    colspecs = [(0, 5), (5, 12), (12, 13), (13, 14), (14, 15), (15, 32), (32, 44), (44, 56), (65, 71), (77, 80)]
    colnames = ['objName', 'provDesig', 'disAst', 'note1', 'note2', 'dateObs', 'RA', 'Dec', 'MagFilter', 'obsCode']
    df = pd.read_fwf(filename, colspecs=colspecs, names=colnames, header=None, nrows=nrows)
    return df

In [12]:
def convertObs80(line):
    objName   = line[0:5]
    provDesig = line[5:12]
    disAst    = line[12:13]
    note1     = line[13:14]
    note2     = line[14:15]
    dateObs   = line[15:32]
    RA        = line[32:44]
    Dec       = line[44:56]
    mag       = line[65:70]
    filt      = line[70:71]
    obsCode   = line[77:80]
    return objName, provDesig, disAst, note1, note2, dateObs, RA, Dec, mag, filt, obsCode
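
To make the column slices concrete, the sketch below applies the same slices to the first sample record shown earlier. The record is rebuilt piece by piece so that the padding is explicit; `slice_obs80` is a hypothetical stand-in that returns a dictionary rather than a tuple.

```python
def slice_obs80(line):
    # Same 0-based, end-exclusive slices as convertObs80, returned as a dict.
    return {
        'objName':   line[0:5],
        'provDesig': line[5:12],
        'disAst':    line[12:13],
        'note1':     line[13:14],
        'note2':     line[14:15],
        'dateObs':   line[15:32],
        'RA':        line[32:44],
        'Dec':       line[44:56],
        'mag':       line[65:70],
        'filt':      line[70:71],
        'obsCode':   line[77:80],
    }

# First sample record from itf_new.txt, assembled field by field (80 characters).
record = ("     " + "/07K07O" + "*" + " " + "C"
          + "1997 09 28.39921 " + "01 17 21.49 " + "+14 49 44.2 "
          + " " * 9 + "21.4 " + "V" + "i" + " " * 5 + "691")

fields = slice_obs80(record)
for name, value in fields.items():
    print("%-10s %r" % (name, value))
```

Note that `dateObs`, `RA`, and `Dec` keep their trailing blank, and that the second character of the band field (`i` here) sits at index 71, outside the `filt` slice.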


In [13]:
def splitMagFilter(magFilter):
    pieces = magFilter.split()
    if len(pieces)==0:
        return None, None
    elif len(pieces)==1:
        return pieces[0], None
    else:
        return pieces[0], pieces[1]

        

# Second significant insert by MJP (Nov 20th)
### Going to develop against UnnObs_1_line.txt, as it is more manageable in size, memory, time, etc.


In [17]:
### Take the opportunity to do conversions & also stuff everything into a dictionary #################
dict__   = {}
DATA_DIR = "../MPC/"
with open(DATA_DIR + "UnnObs_1_line.txt", 'r') as f:
    for line in f:
        # Extract the data 
        objName, provDesig, disAst, note1, note2, dateObs, RA, Dec, mag, filt, obsCode = convertObs80(line)

        # Create a list by designation (setdefault avoids a bare try/except)
        dict__.setdefault(provDesig, []).append(line)

                
print("Unique provDesigs = ",len(list(dict__.keys())))

Unique provDesigs =  509382
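
The grouping step can also be written with `collections.defaultdict`. This is only a sketch on toy lines: `group_by_designation` is a hypothetical name, and the toy strings carry nothing but a designation in columns 6 to 12.

```python
from collections import defaultdict

def group_by_designation(lines):
    # Key each line on its provisional designation (index 5 to 11 inclusive).
    groups = defaultdict(list)
    for line in lines:
        groups[line[5:12]].append(line)
    return groups

toy_lines = [
    '     K09U01M obs-a',
    '     K14B37N obs-b',
    '     K09U01M obs-c',
]
groups = group_by_designation(toy_lines)
print({desig: len(obs) for desig, obs in groups.items()})
```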


# Define a little function to parse an input string from the .txt files, and do the required calculations

In [31]:
def parseline(line):
    try:
        objName, provDesig, disAst, note1, note2, dateObs, RA, Dec, mag, filt, obsCode = convertObs80(line)
        jd_utc = MPC_library.date2JD(dateObs)
        jd_tdb  = MPC_library.EOP.jdTDB(jd_utc)
        raDeg, decDeg = MPC_library.RA2degRA(RA), MPC_library.Dec2degDec(Dec)
        x = np.cos(decDeg*np.pi/180.)*np.cos(raDeg*np.pi/180.)
        y = np.cos(decDeg*np.pi/180.)*np.sin(raDeg*np.pi/180.)  
        z = np.sin(decDeg*np.pi/180.)
        if filt.isspace():
            filt = '-'
        if mag.isspace():
            mag = '----'
        xh, yh, zh = Observatories.getObservatoryPosition(obsCode, jd_utc)
        outstring = "%11s %s %4s %5s %s %13.6lf %12.7lf %12.7lf %12.7lf %12.6lf %12.7lf %12.7lf\n"% \
              (provDesig, dateObs, obsCode, mag, filt, jd_tdb, x, y, z, xh, yh, zh)
    except:
        outstring = "EXCEPTION"
    return outstring 
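
The conversion from sexagesimal RA/Dec to a unit vector is the core of `parseline`. Since `MPC_library.RA2degRA` and `MPC_library.Dec2degDec` are not shown here, the sketch below uses hypothetical stand-ins (`ra_to_deg`, `dec_to_deg`) that handle only the plain `HH MM SS.ss` and `sDD MM SS.s` layouts, then applies the same x, y, z formulae.

```python
import math

def ra_to_deg(ra):
    # 'HH MM SS.ss' -> degrees (15 degrees per hour). Hypothetical stand-in.
    h, m, s = (float(p) for p in ra.split())
    return 15.0 * (h + m / 60.0 + s / 3600.0)

def dec_to_deg(dec):
    # 'sDD MM SS.s' -> degrees. The sign is read from the first character
    # so that declinations such as '-00 30 00.0' keep their sign.
    dec = dec.strip()
    sign = -1.0 if dec.startswith('-') else 1.0
    d, m, s = (float(p) for p in dec.lstrip('+-').split())
    return sign * (d + m / 60.0 + s / 3600.0)

def unit_vector(ra_deg, dec_deg):
    # Equatorial unit vector pointing at (RA, Dec).
    ra, dec = math.radians(ra_deg), math.radians(dec_deg)
    return (math.cos(dec) * math.cos(ra),
            math.cos(dec) * math.sin(ra),
            math.sin(dec))

ra_deg = ra_to_deg('01 17 21.49 ')
dec_deg = dec_to_deg('+14 49 44.2 ')
x, y, z = unit_vector(ra_deg, dec_deg)
print("RA = %.5f deg, Dec = %.5f deg" % (ra_deg, dec_deg))
print("x, y, z = %.7f %.7f %.7f" % (x, y, z))
```

Older records that give minutes with a decimal fraction instead of seconds would need extra handling, which this sketch does not attempt.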


# For now I am going to limit the selection to:
##### Objects that have date & RA/Dec formats that can be easily read (old data can be malformed & Holman hasn't coded the common edge cases)
##### Objects with more than 20 detections

# Later on may want to also restrict to:
##### Objects that are observed recently (but perhaps also with long time spans)
##### Objects of certain types (i.e. to match against some pre-constructed list of designations) 


In [33]:
pDs_of_interest_ = set(dict__.keys()) # e.g. {'J89C07T','K14B37N','K16G17B'}; a set keeps the membership test fast

In [34]:
Nobs_min = 20
dict_str__ = {} 
for k,v in dict__.items():
    if len(v) > Nobs_min and k in pDs_of_interest_:
        for line in v:
            outstring = parseline(line)
            # Skip any record that parseline could not handle
            if outstring != "EXCEPTION":
                dict_str__.setdefault(k, []).append(outstring)
print("Unique Desigs Passing Criteria= ",len(list(dict_str__.keys())))

Unique Desigs Passing Criteria=  137481


# Need to be able to split an object into its component tracklets 

In [69]:
def obj2tracklets(list_strings):
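    # Critical separation in time, in days
    t_sepn = 0.5
    # Walk through the observations (assumed to be in time order), reading the JD from each
    N=0 ; JDprev = -1 ; dict__ = {} 
    for s in list_strings:
        jd_tdb = float(s.split()[7])   # field 7 of the parseline() output is jd_tdb
        ### print(jd_tdb, JDprev, jd_tdb - JDprev > t_sepn)
        # If the gap since the previous observation exceeds t_sepn (half a day), start a new tracklet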
        if jd_tdb - JDprev > t_sepn:
            N += 1 
            dict__[N] = [s]
        else:
            dict__[N].append(s)
        # Shuffle JDs ready for next loop 
        JDprev = jd_tdb
    return dict__ 
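
The gap-based split is easiest to see on bare Julian dates. The sketch below is an assumption-laden variant, not the routine above: `split_into_tracklets` is a hypothetical name, it works on floats rather than formatted strings, and it sorts its input first, which `obj2tracklets` does not do.

```python
def split_into_tracklets(jds, t_sepn=0.5):
    # Group Julian dates into tracklets: a gap of more than t_sepn days
    # between consecutive observations starts a new tracklet.
    tracklets = []
    prev = None
    for jd in sorted(jds):
        if prev is None or jd - prev > t_sepn:
            tracklets.append([jd])
        else:
            tracklets[-1].append(jd)
        prev = jd
    return tracklets

# Three observations on one night, two on the next, one a week later.
jds = [2455121.77, 2455121.78, 2455121.82,
       2455122.80, 2455122.81,
       2455130.75]
tracklets = split_into_tracklets(jds)
print([len(t) for t in tracklets])   # -> [3, 2, 1]
```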
        

# For this version (training) I want to segment the designation into tracklets ...
# ... i.e. just adding a suffix in this simple case

In [70]:

def assign_trackletIDs( objectDict__  ):
    newObjectDict__ = {} 
    for local_k,v in objectDict__.items():
        origDesig  = v[0][:12].strip()
        newDesig= "%s_%03d " % (origDesig, local_k)
        newObjectDict__[local_k] = [newDesig + V[12:] for V in v ]
    return newObjectDict__
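
The suffix scheme round-trips as long as the tracklet number fits in three digits. A short sketch with hypothetical helper names shows the labelling and why splitting on the last underscore is safer than a fixed-width slice such as `[:-4]` when recovering the original designation.

```python
def add_tracklet_suffix(desig, n):
    # 'K09U01M', 1 -> 'K09U01M_001'
    return "%s_%03d" % (desig, n)

def strip_tracklet_suffix(tracklet_id):
    # Split on the LAST underscore, so numbers above 999 are handled too.
    base, _, _ = tracklet_id.rpartition('_')
    return base

tid = add_tracklet_suffix('K09U01M', 1)
print(tid, strip_tracklet_suffix(tid))

wide = add_tracklet_suffix('K09U01M', 1234)
print(wide, wide[:-4], strip_tracklet_suffix(wide))
```

With four-digit tracklet numbers the `[:-4]` slice leaves a trailing underscore on the designation, while `rpartition` still returns `K09U01M`.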

In [71]:
relabelled_str_dict__ = { k : assign_trackletIDs( obj2tracklets(v)) for k,v in dict_str__.items() }
print( len(list(relabelled_str_dict__.keys())) )
print(list(relabelled_str_dict__.keys())[0])
print(relabelled_str_dict__[list(relabelled_str_dict__.keys())[0]])

137481
K09U01M
{1: ['K09U01M_001 2009 10 17.26918   G96 20.9  V 2455121.769946    0.9360065    0.3519631   -0.0037035     0.911275    0.3702929    0.1605510\n', 'K09U01M_001 2009 10 17.27622   G96 21.0  V 2455121.776986    0.9360705    0.3517914   -0.0038615     0.911224    0.3703957    0.1605949\n', 'K09U01M_001 2009 10 17.28329   G96 21.4  V 2455121.784056    0.9361343    0.3516196   -0.0040205     0.911172    0.3704989    0.1606390\n', 'K09U01M_001 2009 10 17.29036   G96 20.9  V 2455121.791126    0.9361981    0.3514478   -0.0041801     0.911120    0.3706021    0.1606831\n', 'K09U01M_001 2009 10 17.31642   G96 21.4  V 2455121.817186    0.9364319    0.3508171   -0.0047662     0.910929    0.3709821    0.1608455\n', 'K09U01M_001 2009 10 17.31715   G96 20.8  V 2455121.817916    0.9364392    0.3507973   -0.0047861     0.910923    0.3709927    0.1608500\n', 'K09U01M_001 2009 10 17.31789   G96 20.9  V 2455121.818656    0.9364448    0.3507823   -0.0047987     0.910918    0.3710035    0.16085

# Now let's cycle through the data and produce a file that's ~ 1,000,000 (1e6) observations long
### Given the >20 observation criterion above (at least 21 lines per object), this should be <=50,000 objects

In [81]:
DATA_DIR = "../MPC/"
ITF_DIR = "../ITF/"
Nmax = 1000000
Nout = 0 
Nobjects = 0
with open(ITF_DIR + "UnnObs_Training_1_line_A.mpc", 'w') as f:
    for obj,objDict in relabelled_str_dict__.items():
        if Nout < Nmax:
            for nTrcklt, lstObs in objDict.items():
                for Obs in lstObs:
                    f.write(Obs)
                    Nout+=1
        else:
            break
        Nobjects+=1
print(ITF_DIR + "UnnObs_Training_1_line_A.mpc", " should have %d objects and %d lines in it" % (Nobjects,Nout))

../ITF/UnnObs_Training_1_line_A.mpc  should have 21359 objects and 1000054 lines in it


# Now let's do something similar to the above, but use a RANDOMLY GENERATED tracklet name
## -- Will want to keep track of the mapping from each random name back to the original designation ...

In [132]:
import string
import random
UC = string.ascii_uppercase

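def generate_randomID(checkList):
    # Mimic the look of packed designations such as K06R55S , J99C09G, /97P02Q, 4F40F1F
    # NB: range(9) draws 0-8 and range(99) draws 0-98, so 9 and 99 never appear.
    # NB: the uniqueness check against checkList is commented out, so two
    #     different tracklets can be given the same random ID.
    #a_ = list(range(17))
    #a_.extend(list(range(90,99)))
    n1 = "%01d" % random.choice(range(9))
    n2 = "%02d" % random.choice(range(99))
    n3 = "%02d" % random.choice(range(99))
    #while True:
    # Pad to 12 characters so the ID replaces the first 12 columns of each output line
    r = "    " + n1 +random.choice(UC) + n2 + random.choice(UC) + n3 + " " 
    #if r not in checkList:
    #checkList.append(r)
    #break
    return r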


def assign_randomtrackletIDs( objectDict__ , checkList ):
    newObjectDict__ = {} 
    mapping = [] 
    for local_k,v in objectDict__.items():
        origDesig  = v[0][:12].strip()
        newDesig= generate_randomID(checkList)
        newObjectDict__[local_k] = [newDesig + V[12:] for V in v ]
        mapping.append( (origDesig, newDesig) )
    return newObjectDict__, mapping


checkList = []
randomized_str_dict__ = {}
Mapping_ = []
for i,k in enumerate(dict_str__.keys()):
    v = dict_str__[k]
    newObjectDict__, mapping = assign_randomtrackletIDs( obj2tracklets(v), checkList)
    randomized_str_dict__[k] = newObjectDict__
    Mapping_.extend(mapping)
    
    if i%10000 == 0 : print(i)
    
#randomized_str_dict__ = { k : assign_randomtrackletIDs( obj2tracklets(v), checkList) for k,v in dict_str__.items() }
#print( len(list(randomized_str_dict__.keys())) )
#print(list(randomized_str_dict__.keys())[0])
#print(randomized_str_dict__[list(randomized_str_dict__.keys())[0]])
#print(len(checkList))
#print(checkList)
#print(mapping)

0
10000
20000
30000
40000
50000
60000
70000
80000
90000
100000
110000
120000
130000
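
Because the uniqueness check in `generate_randomID` is commented out, collisions between random IDs are possible. One way to enforce uniqueness cheaply is to track used IDs in a set. The sketch below is an illustration under stated assumptions: `random_id` and `unique_random_id` are hypothetical names, the digits are drawn with `randrange(10)` and `randrange(100)` (a slightly larger range than above), and the IDs are returned without the column padding.

```python
import random
import string

UC = string.ascii_uppercase

def random_id(rng):
    # digit, letter, two digits, letter, two digits (7 characters)
    return "%d%s%02d%s%02d" % (rng.randrange(10), rng.choice(UC),
                               rng.randrange(100), rng.choice(UC),
                               rng.randrange(100))

def unique_random_id(used, rng, max_tries=1000):
    # Draw until an unused ID is found; set lookups are O(1) on average.
    for _ in range(max_tries):
        candidate = random_id(rng)
        if candidate not in used:
            used.add(candidate)
            return candidate
    raise RuntimeError("could not find an unused ID")

rng = random.Random(42)      # seeded only to make the sketch repeatable
used = set()
ids = [unique_random_id(used, rng) for _ in range(10000)]
print(len(ids), len(set(ids)))   # -> 10000 10000
```

A set is used instead of a list because a `not in` test on a list grows linearly with the number of IDs already issued.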


# Now let's cycle through the data and produce a file that's ~ 1,000,000 (1e6) observations long
### Given the >20 observation criterion above (at least 21 lines per object), this should be <=50,000 objects
### To make sure we get different data, skip the first 2e6 observations

In [149]:
DATA_DIR = "../MPC/"
ITF_DIR = "../ITF/"
Nmin = 2000000
Nmax = 3000000
Nout = 0 
Nobjects = 0
d__ = {} 
with open(ITF_DIR + "UnnObs_Test_1_line_B.mpc", 'w') as f:
    for obj,objDict in randomized_str_dict__.items():
        if Nout < Nmax:
            for nTrcklt, lstObs in objDict.items():
                for Obs in lstObs:
                    if Nout > Nmin: 
                        f.write(Obs)
                        d__[Obs[:12].strip()] = Obs
                    Nout+=1
        else:
            break
        Nobjects+=1

print(list(d__.keys())[1])
with open(ITF_DIR + "UnnObs_Test_1_line_B_designation_mapping.txt", 'w') as f:
    for line in Mapping_:
        #print(line[1].strip() , list(d__.keys())[1])
        #sys.exit(0)
        if line[1].strip() in d__:
            f.write("%s  %s\n" % (line[0].strip(),line[1].strip()))


8O02D85


# Now let's try and make some interesting subsets of the data 
## Need to grab some names/IDs of different types of object
## Do this using the code developed in orbit_cat/Categorize.ipynb


In [150]:
Files_ = ['CometEls.txt','Distant.txt','NEA.txt','PHA.txt','Unusual.txt']

def read_files(Files_):
    data = {}
    for f in Files_:
        with open('orbit_cat/'+f,'r') as fh:
            data[f]=fh.readlines()
    return data

def parse_files(d__):
    Desigs__ = {} 
    AEI__ = {} 
    for f,data in d__.items():
        Desigs__[f] = [ line.split()[0] for line in data]
        AEI__[f]    = [ (float(line.split()[10]),float(line.split()[8]),float(line.split()[7])) for line in data]
    return Desigs__, AEI__

def create_DesigLists(Desigs__, AEI__):
    NEO_       = NEOs(Desigs__)
    TNO_, CEN_ = DIST(Desigs__, AEI__)
    COM_       = COMETs(Desigs__)
    return NEO_, TNO_, CEN_, COM_

def NEOs(D__):
    D_ = []
    for k,v in D__.items():
        if k in ['NEA.txt','PHA.txt']:
            D_.extend(v)
    return D_

def DIST(D__, AEI__):
    TNO_ = []
    CEN_ = []
    for k,v in D__.items():
        if k in ['Distant.txt']:
            AEI = AEI__[k]
            for D,aei in zip(v, AEI):
                a,e,i = aei
                if a > 30 and a*(1.-e) > 30:
                    TNO_.append(D)
                elif a*(1.-e) > 5 and a*(1.-e) < 30 :
                    CEN_.append(D)
                else:
                    pass
    return TNO_, CEN_

def COMETs(D__):
    return D__['CometEls.txt']
            
d__             = read_files(Files_)
Desigs__ , AEI__ = parse_files(d__)
NEO_, TNO_, CEN_, COM_ = create_DesigLists(Desigs__, AEI__)
for item in [NEO_, TNO_, CEN_, COM_]:
    print (len(item))

18818
1602
431
949
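
The TNO/Centaur cut in `DIST` depends only on the semi-major axis a and the perihelion distance q = a(1 - e), both in au. A minimal sketch of the same cut, applied to made-up orbital elements (`classify_distant` is a hypothetical name):

```python
def classify_distant(a, e):
    # Same thresholds as DIST: a and q in au.
    q = a * (1.0 - e)            # perihelion distance
    if a > 30 and q > 30:
        return 'TNO'
    if 5 < q < 30:
        return 'CEN'
    return None                  # falls in neither list

# Made-up (a, e) pairs, chosen to hit each branch.
examples = [(44.0, 0.05), (20.0, 0.4), (50.0, 0.6), (6.0, 0.5)]
labels = [classify_distant(a, e) for a, e in examples]
for (a, e), label in zip(examples, labels):
    print("a=%5.1f e=%4.2f q=%5.1f -> %s" % (a, e, a * (1.0 - e), label))
```

The third example (a = 50, q = 20) shows that an object beyond 30 au in semi-major axis but with perihelion inside 30 au is counted as a Centaur by this cut.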


In [151]:
print(CEN_[:10])

['02060', '05145', '07066', '08405', '10199', '10370', '15788', '15820', '15875', '19299']


# Want to go through each of the lists of designations, and use them to produce subsets of the Unnumbered file

In [159]:
DATA_DIR = "../MPC/"
ITF_DIR = "../ITF/"

for D_,Name in zip( [TNO_, CEN_, COM_, NEO_ ],['TNO','CEN','COM', 'NEO',]):
#for D_,Name in zip( [CEN_],['CEN']):
    Nout = 0 
    with open(ITF_DIR + "UnnObs_Training_%s.mpc" % Name, 'w') as f:
        for obj,objDict in relabelled_str_dict__.items():
            for nTrcklt, lstObs in objDict.items():
                if lstObs[0].split()[0][:-4] in D_:
                    for Obs in lstObs:
                        #print(Obs)
                        #sys.exit(0)
                        f.write(Obs)
                        Nout+=1
    print(Name, Nout)



TNO 12309
CEN 4695
COM 0
NEO 720956


In [165]:
DATA_DIR = "../MPC/"
ITF_DIR = "../ITF/"

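# Build a NEW list rather than aliasing: `D_ = TNO_` followed by D_.extend(...)
# grows TNO_ itself in place on every run of this cell. The recorded count below,
# 102592 = 1602 + 5 * (431 + 949 + 18818), is consistent with five such runs;
# a single pass over the four lists gives 21800.
D_ = TNO_ + CEN_ + COM_ + NEO_
print(len(D_), D_[:3])



102592 ['15760', '15789', '15807']
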


In [168]:
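Nout = 0 
D_set = set(D_)  # set membership is much faster than scanning the list for every tracklet
# Everything not in the TNO/CEN/COM/NEO lists is written out under the 'MBA' label
with open(ITF_DIR + "UnnObs_Training_%s.mpc" % 'MBA', 'w') as f:
    for obj,objDict in relabelled_str_dict__.items():
        for nTrcklt, lstObs in objDict.items():
            if lstObs[0].split()[0][:-4] not in D_set:
                for Obs in lstObs:
                    f.write(Obs)
                    Nout+=1
print('MBA', Nout)  # label explicitly; `Name` is left over from the earlier loop


MBA 5697163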
