jiasenlu/CDSSM

Note: Currently, only the Torch C versions of the batch-input sparse linear and sparse convolution layers are available.

## Implementation of (C)DSSM in Torch

## Dependencies

  • torch
  • tds (DSSM dense)

## Data Preprocessing

### Related Functions

  • Batch.lua
  • WordHash.lua
  • ComputelogPD.lua
  • Preprocess.lua

## Training

1. Generate data from the dataset. The data format follows the C# implementation: each query and its document are on the same line, separated by a tab.
2. Generate the vocabulary for questions and answers using WordHash.Pair2Voc(). You should get output like this:

'''
Creating Voc file form ...
srcVoc contains vocabulary: 5584
tgtVoc contains vocabulary: 10876
'''

3. Create the Pair2Seq features and save them to a txt file using WordHash.Pair2SeqFea().
4. Convert the sequence features to a binary file using WordHash.SeqFea2Bin(). The batch size is set at this step and cannot be changed after the model is trained; for the original data, the batch size is 1024. See the comments in the function for more details.
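The preprocessing steps above can be illustrated with a small standalone sketch. It is written in Python for illustration only and does not call the repository's Lua code. It assumes the letter-trigram word hashing described in the DSSM paper; the actual behaviour of WordHash.Pair2Voc() and WordHash.Pair2SeqFea() may differ. All function names here (`parse_pair`, `letter_trigrams`, `build_vocab`, `to_sequence`, `batches`) are hypothetical, and the on-disk binary format written by WordHash.SeqFea2Bin() is not reproduced.

```python
from collections import Counter


def parse_pair(line):
    """Split one 'query<TAB>document' line into its two text fields."""
    query, doc = line.rstrip("\n").split("\t", 1)
    return query, doc


def letter_trigrams(word):
    """Return the letter trigrams of a word padded with '#' boundary marks."""
    padded = "#" + word.lower() + "#"
    return [padded[i:i + 3] for i in range(len(padded) - 2)]


def build_vocab(texts):
    """Map each distinct letter trigram to an integer id.

    Ids start at 0 and follow sorted trigram order; both choices are
    assumptions made for this sketch.
    """
    counts = Counter(
        tri for text in texts for word in text.split() for tri in letter_trigrams(word)
    )
    return {tri: idx for idx, tri in enumerate(sorted(counts))}


def to_sequence(text, vocab):
    """Encode text as one list of trigram ids per word, skipping unknown trigrams."""
    return [[vocab[t] for t in letter_trigrams(w) if t in vocab] for w in text.split()]


def batches(items, batch_size):
    """Group items into consecutive batches; the last batch may be smaller."""
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


# Toy data in the 'query<TAB>document' layout described above.
lines = ["good deal\tbest deals online\n", "cat food\tfood for cats\n"]
pairs = [parse_pair(line) for line in lines]

src_vocab = build_vocab(query for query, _ in pairs)
tgt_vocab = build_vocab(doc for _, doc in pairs)
print("srcVoc size:", len(src_vocab), "tgtVoc size:", len(tgt_vocab))

src_sequences = [to_sequence(query, src_vocab) for query, _ in pairs]
print(batches(src_sequences, 1))
```

Keeping separate source and target vocabularies mirrors the srcVoc and tgtVoc counts in the sample output. The batch size is a parameter of the final grouping step, which is consistent with it being fixed once the binary file has been written.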

### Related Functions

  • (Data Provider): BatchSample.lua, SequenceInputStream.lua, PairInputStream.lua
  • (Model): DSSM_Train, DSSM_MMI_Criterion.lua
  • (Training): th train.lua
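As a rough illustration of what a DSSM training criterion computes, the sketch below implements the objective from the DSSM paper: a softmax over scaled cosine similarities between a query and one clicked document plus several unclicked ones, with the negative log-likelihood of the clicked document as the loss. It is an assumption that DSSM_MMI_Criterion.lua follows this form. The function names and the `gamma` default are illustrative, and the code is plain Python rather than the repository's Lua.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def dssm_loss(query, pos_doc, neg_docs, gamma=10.0):
    """Negative log-probability of the positive document under a softmax
    over gamma-scaled cosine similarities.

    gamma is the smoothing factor; the value 10.0 is an illustrative default.
    """
    scaled = [gamma * cosine(query, pos_doc)]
    scaled += [gamma * cosine(query, doc) for doc in neg_docs]
    # Log-sum-exp with the maximum subtracted for numerical stability.
    peak = max(scaled)
    log_z = peak + math.log(sum(math.exp(s - peak) for s in scaled))
    return log_z - scaled[0]


query = [1.0, 0.0]
print(dssm_loss(query, [1.0, 0.0], [[0.0, 1.0]]))  # small: positive is the closest
print(dssm_loss(query, [0.0, 1.0], [[1.0, 0.0]]))  # large: a negative is closer
```

The loss shrinks as the positive document's embedding moves closer to the query than the negatives do, which is the behaviour training is meant to encourage.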

## Predicting

1. Generate the feature file; refer to PreProcess.lua for details.

  • Preprocessing:

  • (Predict): th predict.lua
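At prediction time, a DSSM-style model typically scores a query against candidate documents by the cosine similarity of their embeddings. The sketch below shows that ranking step in plain Python. It assumes the embedding vectors have already been produced by the trained model; whether predict.lua outputs scores in exactly this way is not confirmed by this README, and `rank_documents` is a hypothetical helper.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_documents(query_vec, doc_vecs):
    """Return document names ordered from most to least similar to the query."""
    scored = [(cosine(query_vec, vec), name) for name, vec in doc_vecs.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in scored]


# Toy embeddings standing in for model outputs.
docs = {"a": [0.0, 1.0], "b": [1.0, 0.0], "c": [1.0, 1.0]}
print(rank_documents([1.0, 0.0], docs))
```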

## To-do List

  • Test the cu implementation of sparse linear.
  • Write the cu implementation of sparse convolution.
