AB Testing with elfi, bernoulli+lognorm #255

Closed
Jakedismo opened this issue Feb 12, 2018 · 4 comments

Jakedismo commented Feb 12, 2018

Summary:

I'm trying to replicate with ELFI a Stan model I've been using. I've run into some trouble with ELFI, and probably with Python.

Description:

The log-normal and Bernoulli distances are implemented in Python (using scipy.stats functions), the prior is defined (a single lognorm with parameters 2, 10), a Simulator object is defined (the function has been tested and outputs data similar to the ELFI examples), and an ABC rejection sampler is defined, but sampling from the rejection sampler throws an error.

Reproducible Steps:

I have defined the simulator like this:

```python
import numpy as np
import scipy.stats as ss

def legacy_updated(data, batch_size=1, random_state=None):
    data= np.atleast_1d(data)
    global result_array
    rows = len(data)
    cols = len(data[0])
    result_array = np.array([])
    for row in range(rows):
        if(1 > row):
            for col in range(cols):
                x = data[row][col]*bernoulli_pmf(bernoulli_theta(data[None][row]))
                y = lognormal_pdf(frozen_lognormalmean(data[row][None],lognormal_sigma(data[row][None])))
                result_array = np.append(result_array,[x+y, x])
            return result_array.reshape(1,-1)
        else:
            for col in range(cols):
                y = data[row][col]*bernoulli_pmf((bernoulli_theta(data[None][row])))
                result_array = np.append(result_array,[y,])
            return result_array.reshape(1,-1)

def lognormal_pdf(y):
    s=0.945
    x = np.linspace(ss.lognorm.ppf(0.01, s),ss.lognorm.ppf(0.99, s))
    return ss.lognorm.pdf(x, s)

def bernoulli_pmf(y):
    p = 0.3
    x = bernoulli_theta(y)
    return ss.bernoulli.pmf(x,p)

def log_mean(y):
    return np.mean(np.log(y),axis=0)

def log_std(y):
    return np.std(np.log(y),axis=0)

def frozen_lognormalmean(y,s):
    rv=ss.lognorm(s)
    return rv.pdf(np.mean(y))

def lognormal_sigma(y):
    s=0.945
    x = np.linspace(ss.lognorm.ppf(0.01, s),ss.lognorm.ppf(0.99, s))
    return x

def bernoulli_theta(y):
    p = 0.3
    x = np.arange(ss.bernoulli.ppf(0.01, p), ss.bernoulli.ppf(9.99,p))
    return ss.bernoulli.rvs(1,p)
```

Current Output:

The simulator function outputs data in the same format as in the ELFI example, but when I try to generate data with the simulator node I get the following error from within the loop.
```
TypeError                                 Traceback (most recent call last)
C:\Anaconda\envs\DataScienceEnv\lib\site-packages\elfi\executor.py in execute(cls, G)
     69             try:
---> 70                 G.node[node] = cls._run(op, node, G)
     71             except Exception as exc:

C:\Anaconda\envs\DataScienceEnv\lib\site-packages\elfi\executor.py in _run(fn, node, G)
    153
--> 154         output_dict = {'output': fn(*args, **kwargs)}
    155         return output_dict

in legacy_updated(data, batch_size, random_state)
      9     rows = len(data)
---> 10     cols = len(data[0])
     11     result_array = np.array([])

TypeError: object of type 'numpy.float64' has no len()
```

Expected Output:

I didn't expect the error, since my function outputs a NumPy array. My Python skills probably come into play here, and this really doesn't seem like a big error.

ELFI Version:

0.3.1

Python Version:

3.6.5

Operating System:

windows 10

@vuolleko
Member

Hi,

Without complete code I'm unable to reproduce this. Most crucially your handling of batch_size remains unclear.

However, based on the error message the problem is that the data argument given to legacy_updated is 1D, but you're trying to use it as 2D. Assuming data is an elfi.Prior, this is the default behaviour, and data.shape[0] = batch_size. You can use the size keyword, e.g. elfi.Prior('lognorm', 2, 10, size=3), to change this.
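
The shape mismatch described above can be reproduced with NumPy alone. This is a minimal sketch, assuming the prior draw reaches the simulator as a plain array of shape `(batch_size,)`; the `lognormal` draws are only a stand-in for the prior (not ELFI's sampling code), and `n_cols` is a hypothetical helper that mirrors the failing line.

```python
import numpy as np

rng = np.random.RandomState(0)
batch_size = 5

# Stand-in for a default prior draw: one value per batch entry, shape (batch_size,).
data_1d = rng.lognormal(size=batch_size)

# Stand-in for a prior declared with size=3: shape (batch_size, 3).
data_2d = rng.lognormal(size=(batch_size, 3))


def n_cols(data):
    """Hypothetical helper mirroring `cols = len(data[0])` from the simulator."""
    return len(data[0])


try:
    n_cols(data_1d)
    error_1d = None
except TypeError as exc:
    # data_1d[0] is a numpy.float64 scalar, which has no len().
    error_1d = str(exc)

print(data_1d.shape, error_1d)
print(data_2d.shape, n_cols(data_2d))
```

With the 2D input the same indexing works, which is the effect the `size` keyword is meant to achieve.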

Also, if you intend to use var et al. as elfi.Summary, note that ELFI uses the first dimension for internal batches, so you probably should replace axis=0 with axis=1 everywhere.
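
To illustrate the axis remark, here is a sketch of the two summary functions from the issue with `axis=1`, assuming the simulated data has shape `(batch_size, n_obs)`. The array `y` and its dimensions are made up for the example.

```python
import numpy as np

batch_size, n_obs = 4, 100
rng = np.random.RandomState(1)

# Simulated data in batch layout: one row per batch entry.
y = rng.lognormal(mean=0.0, sigma=0.945, size=(batch_size, n_obs))


def log_mean(y):
    """Mean of log(y) over the observations of each batch entry."""
    return np.mean(np.log(y), axis=1)


def log_std(y):
    """Standard deviation of log(y) over the observations of each batch entry."""
    return np.std(np.log(y), axis=1)


per_batch = log_mean(y)                    # one summary per batch entry
per_column = np.mean(np.log(y), axis=0)    # axis=0 collapses the batch dimension

print(per_batch.shape, log_std(y).shape, per_column.shape)
```

With `axis=0` the result has length `n_obs` rather than `batch_size`, so it no longer lines up with the other batch-shaped arrays.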

That said, all this assumes that you make use of ELFI's internal batching. It is certainly possible to circumvent this (by essentially forcing batch_size to 1), but I do not recommend it. :)

@Jakedismo
Author

Jakedismo commented Feb 12, 2018

Thx for the help, I got the simulator node working but ran into trouble with the distance node.
I'm not really using batch_size atm, since I just want to get the basic concept working first.
I think this is due to the summaries being 1D arrays instead of 2D:
`In executing node 'd': all the input array dimensions except for the concatenation axis must match exactly.`
That being said, I think I can get this working (most of the issues I'm facing are due to Stan and C++, and I feel like I'm going to a tree with my ass up when doing this in Python...)
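
For reference, NumPy raises a message with this wording when arrays of different lengths are stacked as columns. The sketch below only reproduces the message; that ELFI stacks the summaries this way internally is an assumption, and the array names are hypothetical.

```python
import numpy as np

batch_size = 3
summary_a = np.zeros(batch_size)   # shape (batch_size,)
summary_b = np.zeros(5)            # length does not match batch_size

try:
    np.column_stack([summary_a, summary_b])
    message = None
except ValueError as exc:
    # Columns must all have the same number of rows.
    message = str(exc)

# Summaries that share the batch dimension stack without complaint.
stacked = np.column_stack([summary_a, np.ones(batch_size)])

print(message)
print(stacked.shape)
```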

@vuolleko
Member

vuolleko commented Feb 12, 2018

ELFI always uses batches, and batch_size is always the length of the first dimension (unless you hard-code stuff otherwise, which I do not recommend). This is true even if you use batch_size=1, in which case the first dimension of all arrays has length 1.

The summaries are typically 1d arrays of length batch_size (but can be 2d as well).
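
The batch convention can be sketched end to end with NumPy. This is an illustration under assumptions, not the model from the issue: `simulator`, `log_mean` and `distance` are hypothetical functions, the normal draws stand in for prior samples, and only the shapes matter.

```python
import numpy as np


def simulator(theta, n_obs=50, batch_size=1, random_state=None):
    """Hypothetical batch-aware simulator: returns shape (batch_size, n_obs)."""
    rng = random_state or np.random
    theta = np.asarray(theta).reshape(batch_size, 1)
    # One row of log-normal draws per parameter value in the batch.
    return rng.lognormal(mean=theta, sigma=0.945, size=(batch_size, n_obs))


def log_mean(y):
    """Summary: one value per batch entry, shape (batch_size,)."""
    return np.mean(np.log(y), axis=1)


def distance(sim_summary, obs_summary):
    """Hypothetical distance: absolute difference, one value per batch entry."""
    return np.abs(sim_summary - obs_summary)


rng = np.random.RandomState(2)
batch_size = 4

theta = rng.normal(size=batch_size)  # stand-in for prior draws, shape (batch_size,)
sim = simulator(theta, batch_size=batch_size, random_state=rng)
observed = simulator([0.5], batch_size=1, random_state=rng)  # a batch of one

d = distance(log_mean(sim), log_mean(observed))
print(sim.shape, observed.shape, d.shape)
```

Even with `batch_size=1` the first dimension is kept (length 1), so the same functions work for the observed data and for larger simulated batches.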

@Jakedismo
Author

I'll probably just refurbish the whole model into a more usable and robust syntax. I'll be in touch once I get something worthwhile going :D
