Autoencoder support via API + Caffe from CSV data #161

Merged
merged 6 commits into master on Sep 9, 2016

Conversation

@beniz
Collaborator

beniz commented Jul 28, 2016

This is an early release of basic autoencoder support with a few options.

This PR contains:

  • basic autoencoder via SigmoidCrossEntropyLoss
  • support for an autoencoder boolean API parameter in both mllib and the CSV input connector
  • training + prediction via API
  • ability to control layer initialization via init and init_std (e.g. gaussian and xavier), with applications beyond autoencoders
  • a full working example in the comments below for now
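For reference, the two initialization schemes selectable via init can be sketched in NumPy. This is a sketch only: the gaussian case draws N(0, init_std), and the xavier case follows Caffe's XavierFiller convention (uniform in [-sqrt(3/fan_in), sqrt(3/fan_in)]), which is an assumption of this illustration rather than a statement about DD internals.

```python
import numpy as np

def init_weights(fan_in, fan_out, init="gaussian", init_std=0.1, rng=None):
    """Sketch of the two layer initializations exposed by the API.

    'gaussian' draws from N(0, init_std); 'xavier' here mirrors Caffe's
    XavierFiller convention, uniform in [-sqrt(3/fan_in), sqrt(3/fan_in)].
    """
    rng = rng or np.random.default_rng(0)
    if init == "gaussian":
        return rng.normal(0.0, init_std, size=(fan_in, fan_out))
    if init == "xavier":
        limit = np.sqrt(3.0 / fan_in)
        return rng.uniform(-limit, limit, size=(fan_in, fan_out))
    raise ValueError("unknown init: %s" % init)
```

With init_std=0.1 and the 784-input first layer of the example below, the gaussian variant produces weights with standard deviation close to 0.1, while xavier bounds each weight by sqrt(3/784).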

Known caveats:

  • the autoencoder trains as a supervised method, via the supervised API parameter. This choice is open to discussion, but most practitioners would agree that autoencoders are at best semi-supervised techniques
  • although the autoencoder trains as supervised, accessing predictions at the layer level (e.g. the array of features that should match the inputs) requires creating the service as unsupervised. This is because in supervised prediction mode the output is the loss, not the produced features. See the example below
  • MSE as an alternative to SigmoidCrossEntropyLoss is not yet implemented
  • convergence can be monitored via eucll, i.e. the Euclidean distance, listed in the measure parameter array of the train settings
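As a rough sketch, eucll corresponds to the Euclidean distance between each input and its reconstruction; averaging over the batch is an assumption of this illustration, as the exact server-side normalization may differ:

```python
import numpy as np

def eucll(inputs, reconstructions):
    """Per-sample Euclidean distance between inputs and their
    reconstructions, averaged over the batch (a sketch of the
    quantity the 'eucll' measure tracks)."""
    inputs = np.asarray(inputs, dtype=float)
    reconstructions = np.asarray(reconstructions, dtype=float)
    per_sample = np.linalg.norm(inputs - reconstructions, axis=1)
    return per_sample.mean()
```

A perfectly reconstructed batch yields 0, and the value decreases as training converges.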

Forthcoming:

  • built-in SigmoidCrossEntropyLoss at prediction (at the moment only eucll is available)
  • possibly denoising autoencoder setup via API
  • MSE as target
  • autoencoder on raw images instead of the CSV input connector
  • more examples
@beniz

beniz commented Jul 28, 2016

Here is a fully reproducible example of MNIST digit reconstruction via an autoencoder with DD. Follow the steps:

  • create an autoencoder service via API:

curl -X PUT 'http://localhost:8080/services/mnist' -d '{"mllib":"caffe","description":"mnist autoencoder","type":"supervised","parameters":{"input":{"connector":"csv"},"mllib":{"template":"mlp","db":true,"layers":[500,250,30,250,500],"activation":"sigmoid","dropout":0.0,"init":"gaussian","init_std":0.1,"autoencoder":true}},"model":{"templates":"../templates/caffe/","repository":"/path/to/models/autoenc"}}'

  • train the autoencoder via API:
curl -X POST 'http://localhost:8080/train' -d '{"service":"mnist","async":true,"parameters":{"mllib":{"gpu":true,"solver":{"iterations":65000,"test_interval":5000,"base_lr":0.01,"solver_type":"NESTEROV","weight_decay":0.0005},"net":{"batch_size":256,"test_batch_size":256}},"input":{"db":true,"autoencoder":true,"ignore":["label"],"separator":",","scale":true},"output":{"measure":["eucll"]}},"data":["/path/to/mnist_train.csv","/path/to/mnist_test.csv"]}'
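For readers driving DD from Python instead of curl, the same train payload can be built as a plain dict and serialized before POSTing it to /train (the /path/to entries are placeholders, as above; the HTTP call itself is omitted here):

```python
import json

# Same payload as the curl train call above, expressed as a Python dict.
train_payload = {
    "service": "mnist",
    "async": True,
    "parameters": {
        "mllib": {
            "gpu": True,
            "solver": {"iterations": 65000, "test_interval": 5000,
                       "base_lr": 0.01, "solver_type": "NESTEROV",
                       "weight_decay": 0.0005},
            "net": {"batch_size": 256, "test_batch_size": 256},
        },
        "input": {"db": True, "autoencoder": True, "ignore": ["label"],
                  "separator": ",", "scale": True},
        "output": {"measure": ["eucll"]},
    },
    "data": ["/path/to/mnist_train.csv", "/path/to/mnist_test.csv"],
}
body = json.dumps(train_payload)  # ready to POST to http://localhost:8080/train
```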
  • test the autoencoder by reconstructing some MNIST handwritten digits and visualizing them.

First generate a smaller version of mnist_test.csv:

head -n 11 mnist_test.csv > mnist_test_10.csv

Recreate the service as unsupervised:

curl -X DELETE 'http://localhost:8080/services/mnist'
curl -X PUT 'http://localhost:8080/services/mnist' -d '{"mllib":"caffe","description":"mnist autoencoder","type":"unsupervised","parameters":{"input":{"connector":"csv"},"mllib":{"autoencoder":true}},"model":{"repository":"/path/to/models/autoenc"}}'

Next we use a Python script (you need dd_client.py from deepdetect/clients/python in the same directory):

from dd_client import DD
import numpy as np
import matplotlib.pyplot as plt
import csv

host = 'localhost'
sname = 'mnist'

dd = DD(host)
dd.set_return_format(dd.RETURN_PYTHON)

# MNIST pixel range, used to scale inputs to [0,1] and back
scale_min = [0]*784
scale_max = [255]*784

parameters_input = {'separator':',','scale':True,'min_vals':scale_min,'max_vals':scale_max,'ignore':['label']}
parameters_mllib = {'extract_layer':'sig'}
parameters_output = {}
test_csv = '/path/to/mnist_test_10.csv'

# read the first n test images, skipping the header row and the label column
n = 10
test_imgs = []
c = -1
with open(test_csv,'r') as csvfile:
    csvreader = csv.reader(csvfile,delimiter=',')
    for row in csvreader:
        if c == -1:  # header row
            c = 0
            continue
        row.pop(0)  # drop the label
        test_imgs.append(np.array(row).astype(int))
        c = c + 1
        if c == n:
            break

# reconstruct the digits through the autoencoder service
data = [test_csv]
pred = dd.post_predict(sname,data,parameters_input,parameters_mllib,parameters_output)
decoded_imgs = []
for i in range(n):
    predi = pred['body']['predictions'][i]['vals']
    decoded_imgs.append(np.multiply(predi,scale_max))  # back to [0,255]

# top row: inputs, bottom row: reconstructions
plt.figure(figsize=(20,4))
for i in range(n):
    ax = plt.subplot(2, n, i + 1)
    plt.imshow(test_imgs[i].reshape(28, 28))
    plt.gray()
    ax.get_xaxis().set_visible(False)
    ax.get_yaxis().set_visible(False)

    # display reconstruction
    ax = plt.subplot(2, n, i + 1 + n)
    plt.imshow(decoded_imgs[i].reshape(28, 28))
    plt.gray()
    ax.get_xaxis().set_visible(False)
    ax.get_yaxis().set_visible(False)

plt.show()

This should yield:

autoenc_mnist_dd

where the first row is the input and the second row the reconstruction through the autoencoder.
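A minimal sketch of the scaling round-trip the script relies on: with 'scale': True and min_vals/max_vals set as above, the input connector maps pixel values to [0, 1], and the np.multiply(predi, scale_max) step maps the reconstructed values back to pixel intensities (the perfect-reconstruction step below is a stand-in for the network output):

```python
import numpy as np

max_val = 255.0
pixels = np.array([0.0, 128.0, 255.0])
scaled = pixels / max_val     # what the input connector feeds the network
decoded = scaled              # stand-in: pretend the autoencoder reconstructs perfectly
restored = decoded * max_val  # the np.multiply(predi, scale_max) step in the script
```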

@beniz

beniz commented Aug 1, 2016

Added ability to use an autoencoder in predict mode:

  • in supervised settings, thus returning the loss associated with each predicted sample. This is based on a custom addition to Caffe, see jolibrain/caffe@a0d4889
  • in unsupervised settings, as described above in the issue, thus returning the decoded values

@beniz beniz merged commit 6cd343a into master Sep 9, 2016

@beniz beniz deleted the autoencoder branch Feb 5, 2017

@ahuynh227

ahuynh227 commented Feb 8, 2018

I followed the instructions step by step, trying to train and predict on MNIST, but the training service keeps failing. After the sigmoid layer is created, it gets stuck here:

INFO - 01:20:01 - Processed 50000 records
[01:20:03] /opt/deepdetect/src/caffelib.cc:1964: user batch_size=256 / inputc batch_size=59999
[01:20:03] /opt/deepdetect/src/caffelib.cc:2008: batch_size=1 / test_batch_size=101 / test_iter=99
INFO - 01:20:03 - Initializing solver from parameters:
INFO - 01:20:03 - Creating training net specified in net_param.
INFO - 01:20:03 - The NetState phase (0) differed from the phase (1) specified by a rule in layer inputl
INFO - 01:20:03 - The NetState phase (0) differed from the phase (1) specified by a rule in layer probt
INFO - 01:20:03 - Initializing net from parameters:

INFO - 01:20:03 - Creating layer / name=inputl / type=Data
INFO - 01:20:03 - Creating Layer inputl
INFO - 01:20:03 - inputl -> data
INFO - 01:20:03 - inputl -> label
INFO - 01:20:03 - Opened lmdb ../../models/autoenc/train.lmdb
INFO - 01:20:03 - output data size: 256,785,1,1
INFO - 01:20:03 - Setting up inputl
INFO - 01:20:03 - Top shape: 256 785 1 1 (200960)
INFO - 01:20:03 - Top shape: 256 (256)
INFO - 01:20:03 - Memory required for data: 804864
INFO - 01:20:03 - Creating layer / name=fc_data / type=InnerProduct
INFO - 01:20:03 - Creating Layer fc_data
INFO - 01:20:03 - fc_data <- data
INFO - 01:20:03 - fc_data -> ip0
INFO - 01:20:03 - Setting up fc_data
INFO - 01:20:03 - Top shape: 256 500 (128000)
INFO - 01:20:03 - Memory required for data: 1316864
INFO - 01:20:03 - Creating layer / name=act_Sigmoid_ip0 / type=Sigmoid
INFO - 01:20:03 - Creating Layer act_Sigmoid_ip0
INFO - 01:20:03 - act_Sigmoid_ip0 <- ip0
INFO - 01:20:03 - act_Sigmoid_ip0 -> ip0 (in-place)
INFO - 01:20:03 - Setting up act_Sigmoid_ip0
INFO - 01:20:03 - Top shape: 256 500 (128000)
INFO - 01:20:03 - Memory required for data: 1828864
INFO - 01:20:03 - Creating layer / name=fc_ip0 / type=InnerProduct
INFO - 01:20:03 - Creating Layer fc_ip0
INFO - 01:20:03 - fc_ip0 <- ip0
INFO - 01:20:03 - fc_ip0 -> ip1
INFO - 01:20:03 - Setting up fc_ip0
INFO - 01:20:03 - Top shape: 256 250 (64000)
INFO - 01:20:03 - Memory required for data: 2084864
INFO - 01:20:03 - Creating layer / name=act_Sigmoid_ip1 / type=Sigmoid
INFO - 01:20:03 - Creating Layer act_Sigmoid_ip1
INFO - 01:20:03 - act_Sigmoid_ip1 <- ip1
INFO - 01:20:03 - act_Sigmoid_ip1 -> ip1 (in-place)
INFO - 01:20:03 - Setting up act_Sigmoid_ip1
INFO - 01:20:03 - Top shape: 256 250 (64000)
INFO - 01:20:03 - Memory required for data: 2340864
INFO - 01:20:03 - Creating layer / name=fc_ip1 / type=InnerProduct
INFO - 01:20:03 - Creating Layer fc_ip1
INFO - 01:20:03 - fc_ip1 <- ip1
INFO - 01:20:03 - fc_ip1 -> ip2
INFO - 01:20:03 - Setting up fc_ip1
INFO - 01:20:03 - Top shape: 256 30 (7680)
INFO - 01:20:03 - Memory required for data: 2371584
INFO - 01:20:03 - Creating layer / name=act_Sigmoid_ip2 / type=Sigmoid
INFO - 01:20:03 - Creating Layer act_Sigmoid_ip2
INFO - 01:20:03 - act_Sigmoid_ip2 <- ip2
INFO - 01:20:03 - act_Sigmoid_ip2 -> ip2 (in-place)
INFO - 01:20:03 - Setting up act_Sigmoid_ip2
INFO - 01:20:03 - Top shape: 256 30 (7680)
INFO - 01:20:03 - Memory required for data: 2402304
INFO - 01:20:03 - Creating layer / name=fc_ip2 / type=InnerProduct
INFO - 01:20:03 - Creating Layer fc_ip2
INFO - 01:20:03 - fc_ip2 <- ip2
INFO - 01:20:03 - fc_ip2 -> ip3
INFO - 01:20:03 - Setting up fc_ip2
INFO - 01:20:03 - Top shape: 256 250 (64000)
INFO - 01:20:03 - Memory required for data: 2658304
INFO - 01:20:03 - Creating layer / name=act_Sigmoid_ip3 / type=Sigmoid
INFO - 01:20:03 - Creating Layer act_Sigmoid_ip3
INFO - 01:20:03 - act_Sigmoid_ip3 <- ip3
INFO - 01:20:03 - act_Sigmoid_ip3 -> ip3 (in-place)
INFO - 01:20:03 - Setting up act_Sigmoid_ip3
INFO - 01:20:03 - Top shape: 256 250 (64000)
INFO - 01:20:03 - Memory required for data: 2914304
INFO - 01:20:03 - Creating layer / name=fc_ip3 / type=InnerProduct
INFO - 01:20:03 - Creating Layer fc_ip3
INFO - 01:20:03 - fc_ip3 <- ip3
INFO - 01:20:03 - fc_ip3 -> ip4
INFO - 01:20:03 - Setting up fc_ip3
INFO - 01:20:03 - Top shape: 256 500 (128000)
INFO - 01:20:03 - Memory required for data: 3426304
INFO - 01:20:03 - Creating layer / name=act_Sigmoid_ip4 / type=Sigmoid
INFO - 01:20:03 - Creating Layer act_Sigmoid_ip4
INFO - 01:20:03 - act_Sigmoid_ip4 <- ip4
INFO - 01:20:03 - act_Sigmoid_ip4 -> ip4 (in-place)
INFO - 01:20:03 - Setting up act_Sigmoid_ip4
INFO - 01:20:03 - Top shape: 256 500 (128000)
INFO - 01:20:03 - Memory required for data: 3938304
INFO - 01:20:03 - Creating layer / name=fc_ip4 / type=InnerProduct
INFO - 01:20:03 - Creating Layer fc_ip4
INFO - 01:20:03 - fc_ip4 <- ip4

When I make an API call to check on the status of the training job, I get:

ERROR - 01:20:51 - service mnist training status call failed
ERROR - 01:20:51 - {"code":500,"msg":"InternalError"}

I have tried training with DD running on the host (Ubuntu 14.04, GPU disabled) and in a Docker container, with no luck so far. Do you know what could be wrong here?

@beniz

beniz commented Feb 8, 2018

Hi, you are correct that this old example doesn't seem to work as-is anymore. We'll look into it and provide a proper reproducible demo.

@beniz

beniz commented Feb 8, 2018

@ahuynh227 commit d26f00e fixes the reported issue.

See the updated steps above; they have been tested and work fine on our side.
