# Task

For the SIMULATE/ESTIMATE/INFER stuff: Please make a simple Python interface for these 3 capabilities --- as a separate `generator_emulator.py` interface file --- using the abstract base class module in Python, with method-level documentation. Include a constructor that takes a type specification as input (a dict mapping variable names to type specs, where a type spec is either "numerical", "numerical with a range", or "closed-set categorical with a specific # of outcomes").

Demonstrate it works by providing 4 implementations, + the test cases that show it's working:

- a Naive Bayes implementation in pure Python (that enforces numerical range constraints as a post-processing/rejection step)
- a Bayesian naive Bayes implementation in Venture (that uses separate programs to model each column)
- a heuristic mixture-model based method that fits the mixture by using k-means clustering (mapping discrete values to numbers, or using your favorite hybrid discrete/continuous distance metric), choosing k via cross-validation-based model selection
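For the third item, a minimal sketch of the k-means-plus-cross-validation idea, restricted to a single numerical column for brevity. The function names, the hard assignment of points to clusters, the variance floor `min_sd`, and the use of held-out log-likelihood as the selection score are assumptions, not requirements of the task.

```python
import math
import random

def kmeans_1d(xs, k, rng, n_iter=25):
    """Lloyd's algorithm on scalars; returns one list of points per non-empty cluster."""
    centers = rng.sample(xs, k)
    for _ in range(n_iter):
        clusters = [[] for _ in range(k)]
        for x in xs:
            nearest = min(range(k), key=lambda j: abs(x - centers[j]))
            clusters[nearest].append(x)
        # Keep the old center when a cluster ends up empty.
        centers = [sum(c) / len(c) if c else centers[j] for j, c in enumerate(clusters)]
    return [c for c in clusters if c]

def fit_mixture(xs, k, rng, min_sd=1e-3):
    """Turn each k-means cluster into a (weight, mean, sd) Gaussian component."""
    comps = []
    for c in kmeans_1d(xs, k, rng):
        mean = sum(c) / len(c)
        var = sum((x - mean) ** 2 for x in c) / len(c)
        comps.append((len(c) / float(len(xs)), mean, max(math.sqrt(var), min_sd)))
    return comps

def log_likelihood(xs, comps):
    """Log-likelihood of xs under the mixture, floored to avoid log(0)."""
    total = 0.0
    for x in xs:
        p = sum(w * math.exp(-0.5 * ((x - m) / s) ** 2) / (s * math.sqrt(2.0 * math.pi))
                for w, m, s in comps)
        total += math.log(max(p, 1e-300))
    return total

def choose_k(xs, candidates, n_folds=5, seed=0):
    """Pick the k with the highest summed held-out log-likelihood across folds."""
    rng = random.Random(seed)
    xs = list(xs)
    rng.shuffle(xs)
    folds = [xs[i::n_folds] for i in range(n_folds)]
    scores = {}
    for k in candidates:
        score = 0.0
        for i in range(n_folds):
            train = [x for j, fold in enumerate(folds) if j != i for x in fold]
            score += log_likelihood(folds[i], fit_mixture(train, k, rng))
        scores[k] = score
    return max(scores, key=scores.get), scores

# Synthetic two-component data, so a single Gaussian is a poor fit.
data_rng = random.Random(1)
data = ([data_rng.gauss(0.0, 1.0) for _ in range(150)] +
        [data_rng.gauss(10.0, 1.0) for _ in range(150)])
best_k, cv_scores = choose_k(data, candidates=[1, 2, 3])
```

Extending this to mixed columns would mean replacing `abs(x - centers[j])` with a hybrid distance and fitting a per-cluster categorical distribution for discrete variables.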

For INFER, do something simple and heuristic for continuous values: fit a mixture of a very-broad-variance Gaussian and a narrow-variance Gaussian (heuristically if you want, or via a Bayesian fit in Venture), and test if the weight on the narrow-variance component is above the given confidence threshold.
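A minimal sketch of this heuristic, under several assumptions that the task leaves open: the input is a list of draws from the predictive distribution of one missing cell, both Gaussians are centered on the sample median, their standard deviations (`narrow_sd`, `broad_sd`) are supplied by the caller, and only the mixing weight is fitted, by EM. All function names are hypothetical.

```python
import math

def normal_pdf(x, mu, sd):
    """Density of a Gaussian with mean mu and standard deviation sd."""
    z = (x - mu) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def narrow_weight(samples, narrow_sd, broad_sd, n_iter=50):
    """EM over the mixing weight only; both components stay centered on the
    sample median with their standard deviations held fixed."""
    xs = sorted(samples)
    center = xs[len(xs) // 2]
    w = 0.5
    for _ in range(n_iter):
        resp_sum = 0.0
        for x in xs:
            p_narrow = w * normal_pdf(x, center, narrow_sd)
            p_broad = (1.0 - w) * normal_pdf(x, center, broad_sd)
            total = p_narrow + p_broad
            resp_sum += p_narrow / total if total > 0.0 else 0.0
        # Clamp so that neither component is ever switched off completely.
        w = min(max(resp_sum / len(xs), 1e-6), 1.0 - 1e-6)
    return center, w

def infer_value(samples, threshold, narrow_sd, broad_sd):
    """Return (fill value or None, narrow-component weight)."""
    center, w = narrow_weight(samples, narrow_sd, broad_sd)
    return (center if w >= threshold else None), w

# Tightly concentrated draws versus draws spread over a wide interval.
tight = [5.0 + 0.01 * (i - 50) for i in range(101)]
diffuse = [float(i) for i in range(-50, 51)]
tight_fill, tight_w = infer_value(tight, 0.9, narrow_sd=0.2, broad_sd=10.0)
diffuse_fill, diffuse_w = infer_value(diffuse, 0.9, narrow_sd=0.2, broad_sd=10.0)
```

With the concentrated draws the narrow component absorbs almost all of the weight and the median is returned; with the diffuse draws the weight collapses toward zero and the cell is left unfilled.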

Provide test cases that show, graphically, that the two naive Bayes implementations work on a couple of representative type signatures (when the true generator is realizable given those hypothesis classes), and another test that shows that if the true generator is realizable under the mixture but not under naive Bayes (i.e., it has a couple of components), the mixture works better given enough data.

### Questions

- No structure learning here?

- How does Naive Bayes work in this scenario?

Naive Bayes is one of the models implemented in BayesDB, using a different conjugate model for each data type:

1. Dirichlet-multinomial model for categorical data
2. Normal-Inverse-Gamma model for numerical data
3. Normal-Inverse-Gamma model with a rejection step for constrained numerical data
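The three conjugate models listed above can be sketched in pure Python as follows. This uses the textbook conjugate updates, not BayesDB's code; the function names and the default hyperparameters (`alpha`, `m0`, `k0`, `a0`, `b0`) are assumptions chosen for illustration.

```python
import math
import random

def dirichlet_multinomial_predictive(observed, n_outcomes, alpha=1.0):
    """Posterior predictive P(x = k) under a symmetric Dirichlet(alpha) prior.
    `observed` holds integer outcome codes in range(n_outcomes)."""
    counts = [0] * n_outcomes
    for x in observed:
        counts[x] += 1
    total = len(observed) + alpha * n_outcomes
    return [(c + alpha) / total for c in counts]

def nig_posterior(xs, m0=0.0, k0=1.0, a0=1.0, b0=1.0):
    """Normal-Inverse-Gamma update: mu | s2 ~ N(m0, s2 / k0), s2 ~ InvGamma(a0, b0)."""
    n = len(xs)
    mean = sum(xs) / float(n)
    ss = sum((x - mean) ** 2 for x in xs)
    kn = k0 + n
    mn = (k0 * m0 + n * mean) / kn
    an = a0 + n / 2.0
    bn = b0 + 0.5 * ss + k0 * n * (mean - m0) ** 2 / (2.0 * kn)
    return mn, kn, an, bn

def sample_predictive(posterior, rng, bounds=None, max_tries=10000):
    """Draw one value from the NIG posterior predictive. When `bounds` is given,
    out-of-range draws are rejected and redrawn (the rejection step)."""
    mn, kn, an, bn = posterior
    for _ in range(max_tries):
        sigma2 = bn / rng.gammavariate(an, 1.0)  # sigma2 ~ InvGamma(an, bn)
        mu = rng.gauss(mn, math.sqrt(sigma2 / kn))
        x = rng.gauss(mu, math.sqrt(sigma2))
        if bounds is None or bounds[0] <= x <= bounds[1]:
            return x
    raise RuntimeError("no draw fell inside the bounds")

rng = random.Random(0)
probs = dirichlet_multinomial_predictive([0, 0, 1], n_outcomes=3)
posterior = nig_posterior([1.0, 2.0, 3.0])
draws = [sample_predictive(posterior, rng, bounds=(0.0, 4.0)) for _ in range(100)]
```

Under the naive Bayes assumption each column gets its own model of this kind, so simulating a row amounts to sampling every column independently.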
    
- How to implement it in pure Python?    
    
- How would the Bayesian Naive Bayes work?

- What is a type signature, i.e., what does it mean for the true generator to be realizable? 

In [21]:
from abc import ABCMeta, abstractmethod
import pandas as pd
# from venture.shortcuts import *

class BayesDataset(object):
    """Abstract base class for the Simulate/Estimate/Infer functionality."""
    # Python 2 metaclass syntax; under Python 3, declare the class as
    # `class BayesDataset(metaclass=ABCMeta)` instead.
    __metaclass__ = ABCMeta

    @abstractmethod
    def __init__(self, dataset, typeSpec):
        """
        dataset:  path to an Excel table (.xls or .xlsx), loaded with pandas.
        typeSpec: dict mapping variable names to type specs.
        """
        self.typeSpec = typeSpec
        self.dataset = pd.read_excel(dataset)

    @abstractmethod
    def Simulate(self, y, queries):
        """
        Generates samples for variable y from the conditional predictive
        distribution, conditioned on queries.
        """
        pass

    @abstractmethod
    def Estimate(self):
        """
        Estimates the probability that each pair of variables in the dataset
        is dependent. Returns a symmetric matrix whose off-diagonal entries
        are the dependence probabilities.
        """
        pass

    @abstractmethod
    def Infer(self, sigma, estimator):
        """
        Fills in missing values of the dataset with a point estimate taken
        from the predictive distribution.

        estimator: estimator to use, such as the mean or the mode
                   (type not yet fixed).
        sigma:     confidence threshold below which a missing value is
                   left unfilled.
        """
        pass

In [22]:
class NaiveBDS(BayesDataset):
    """A naive Bayes implementation in pure Python
    (that enforces numerical range constraints as a post-processing/rejection step)."""
    
    def __init__ (self, dataset, typeSpec):
        super(NaiveBDS,self).__init__(dataset,typeSpec)
        
    def Simulate(self, y, queries):
        pass
    
    def Estimate(self):
        pass
    
    def Infer(self, sigma, estimator):
        pass        
            
class BayesNaiveBDS(BayesDataset):
    """a Bayesian naive Bayes implementation in Venture 
    (that uses separate programs to model each column)"""
    pass

class MixtureBDS(BayesDataset):
    """a heuristic mixture-model based method that fits the mixture by using k means clustering
    (mapping discrete values to numbers, or using your favorite hybrid discrete/continuous distance metric),
    choosing k via crossvalidation-based model selection"""
    pass
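The stub classes above still inherit all of the abstract methods, so a properly declared abstract base class refuses to instantiate them until every method is overridden. A minimal sketch of that behavior, written in Python 3 syntax and using a hypothetical stand-in `Emulator` (without the Excel loading) so that it runs on its own:

```python
from abc import ABC, abstractmethod

class Emulator(ABC):
    """Hypothetical stand-in for BayesDataset, without the file loading."""

    def __init__(self, type_spec):
        self.type_spec = type_spec

    @abstractmethod
    def simulate(self, y, queries):
        pass

    @abstractmethod
    def estimate(self):
        pass

    @abstractmethod
    def infer(self, sigma, estimator):
        pass

class Stub(Emulator):
    """Overrides nothing, like the placeholder classes in the cell above."""
    pass

class Constant(Emulator):
    """Trivial but complete implementation, for illustration only."""

    def simulate(self, y, queries):
        return [0.0]

    def estimate(self):
        return {}

    def infer(self, sigma, estimator):
        return None

def can_instantiate(cls):
    """True if cls({}) succeeds; False if the ABC machinery raises TypeError."""
    try:
        cls({})
        return True
    except TypeError:
        return False

stub_ok = can_instantiate(Stub)
constant_ok = can_instantiate(Constant)
```

Here `stub_ok` is `False` and `constant_ok` is `True`, which is the check a test suite could apply to each of the implementations once they are filled in.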

In [26]:
x = NaiveBDS('satellite.xlsx','all')
x.dataset

Unnamed: 0,"Name of Satellite, Alternate Names",Country/Org of UN Registry,Country of Operator/Owner,Operator/Owner,Users,Purpose,Detailed Purpose,Class of Orbit,Type of Orbit,Longitude of GEO (degrees),...,Unnamed: 55,Unnamed: 56,Unnamed: 57,Unnamed: 58,Unnamed: 59,Unnamed: 60,Unnamed: 61,Unnamed: 62,Unnamed: 63,Unnamed: 64
0,AAUSat-3,NR,Denmark,Aalborg University,Civil,Technology Development,,LEO,,0.00,...,,,,,,,,,,
1,"ABS-1A (Koreasat 2, Mugunghwa 2)",Korea,Multinational,Asia Broadcast Satellite Ltd.,Commercial,Communications,,GEO,,116.54,...,,,,,,,,,,
2,"ABS-2 (Koreasat-8, ST-3)",NR,Multinational,Asia Broadcast Satellite Ltd.,Commercial,Communications,,GEO,,75.00,...,,,,,,,,,,
3,"ABS-3 (Agila 2, Mabuhay 1)",Philippines,Multinational,Asia Broadcast Satellite Ltd.,Commercial,Communications,,GEO,,146.06,...,,,,,,,,,,
4,ABS-3A,NR,Multinational,Asia Broadcast Satellite Ltd.,Commercial,Communications,,GEO,,-3.00,...,,,,,,,,,,
5,"ABS-4 (ABS-2i, MBSat, Mobile Broadcasting Sate...",NR,Multinational,Asia Broadcast Satellite Ltd.,Commercial,Communications,,GEO,,75.00,...,,,,,,,,,,
6,"ABS-6 (ABS-1, LMI-1, Lockheed Martin-Intersput...",NR,Multinational,Asia Broadcast Satellite Ltd.,Commercial,Communications,,GEO,,159.00,...,,,,,,,,,,
7,"ABS-7 (Koreasat 3, Mugungwha 3)",Korea,Multinational,Asia Broadcast Satellite Ltd.,Commercial,Communications,,GEO,,116.18,...,,,,,,,,,,
8,"Advanced Orion 2 (NROL 6, USA 139)",USA,USA,National Reconnaissance Office (NRO),Military,Earth Observation,Electronic Intelligence,GEO,,-14.50,...,,,,,,,,,,
9,"Advanced Orion 3 (NROL 19, USA 171)",USA,USA,National Reconnaissance Office (NRO),Military,Earth Observation,Electronic Intelligence,GEO,,95.40,...,,,,,,,,,,


In [20]:
pd.read_excel('satellite.xlsx')

