This is a framework for building auto-reply chatbots for online communities. It includes a text classification module, a text generation module, and a deployment script. To build your own community auto-reply chatbot, download this repo, set up the environment on your machine, import your own datasets, and fill in your community's API.

Workflow of the framework:

1. Fetch the latest posts through the community API.
2. Classify the type of each post (the type is the basis for deciding whether to reply in the next step).
3. Generate comments.
4. Reply automatically in the community through the community API.

Paper: CASS: Towards Building a Social-Support Chatbot for Online Health Community

BibTeX-formatted citation:

@misc{wang2021cass,
      title={CASS: Towards Building a Social-Support Chatbot for Online Health Community},
      author={Liuping Wang and Dakuo Wang and Feng Tian and Zhenhui Peng and Xiangmin Fan and Zhan Zhang and Shuai Ma and Mo Yu and Xiaojuan Ma and Hongan Wang},
      year={2021},
      booktitle = {Conference Companion Publication of the 2019 on Computer Supported Cooperative Work and Social Computing},
      numpages = {31},
      keywords = {chatbot, bot, pregnancy, healthcare, AI deployment, online community, social support, peer support, emotional support, machine learning, neural network,
      system building, conversational agent, human AI collaboration, human AI interaction, explainable AI, trustworthy AI},
      series = {CSCW ’21}
}

References

OpenNMT: https://github.com/OpenNMT/OpenNMT-py

CNN classifier: https://github.com/gaussic/text-classification-cnn-rnn

Step 1: Setup

Requirements:

Python >= 3.5
Torch == 1.0.0
Torchvision == 0.2.1
Torchtext == 0.4.0

Install the onmt package by running setup.py in the OpenNMT/ directory:

python setup.py install
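
Because the requirements pin exact versions, it can help to check the environment before training. The following is a minimal sketch, not part of the repo: the helper names `find_mismatches` and `installed_versions` are hypothetical, and it assumes each package exposes a `__version__` attribute.

```python
import importlib

# Pinned versions copied from the requirements list above.
REQUIRED = {"torch": "1.0.0", "torchvision": "0.2.1", "torchtext": "0.4.0"}


def find_mismatches(installed, required=REQUIRED):
    """Return {package: (installed, required)} for every package whose version differs."""
    problems = {}
    for name, wanted in required.items():
        found = installed.get(name)
        # Local build suffixes such as "+cu100" are ignored in the comparison.
        if found is None or found.split("+")[0] != wanted:
            problems[name] = (found, wanted)
    return problems


def installed_versions(names):
    """Import each package and read its __version__; unavailable packages map to None."""
    versions = {}
    for name in names:
        try:
            versions[name] = getattr(importlib.import_module(name), "__version__", None)
        except Exception:  # missing or broken installation
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, (found, wanted) in find_mismatches(installed_versions(REQUIRED)).items():
        print("{}: found {}, expected {}".format(name, found, wanted))
```

Running the script prints one line per package that is missing or has a different version, and prints nothing when the environment matches.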

Step 2: Prepare Your Dataset

1. For the text classification model

Prepare four files in Classifier/data/cnews/:

data.train.txt
data.val.txt
data.test.txt
data.pred.txt
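
The file format is not described here. The sketch below assumes one example per line in the form label, tab, text, which is the layout used by the referenced CNN classifier repository; check it against the data loader in Classifier/ before relying on it. The function names and the label values in the usage note are hypothetical.

```python
import os
import random


def split_examples(examples, val_ratio=0.1, test_ratio=0.1, seed=0):
    """Shuffle (label, text) pairs and split them into train/val/test lists."""
    rows = list(examples)
    random.Random(seed).shuffle(rows)
    n_val = int(len(rows) * val_ratio)
    n_test = int(len(rows) * test_ratio)
    return {
        "train": rows[n_val + n_test:],
        "val": rows[:n_val],
        "test": rows[n_val:n_val + n_test],
    }


def write_split(out_dir, name, rows):
    """Write one 'label<TAB>text' line per example to data.<name>.txt (assumed format)."""
    path = os.path.join(out_dir, "data.{}.txt".format(name))
    with open(path, "w", encoding="utf-8") as f:
        for label, text in rows:
            # Tabs and newlines inside the text would break the line format.
            clean = " ".join(text.split())
            f.write("{}\t{}\n".format(label, clean))
    return path
```

For example, `split_examples([("reply", "some post"), ("skip", "another post"), ...])` returns a dictionary whose "train", "val", and "test" lists can each be passed to `write_split` with Classifier/data/cnews/ as the output directory.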

2. For the text generation model

Prepare the following files in OpenNMT/data/:

src-train.txt
src-val.txt
src-test.txt
tgt-train.txt
tgt-val.txt
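
OpenNMT reads parallel text files: line i of a src file is paired with line i of the matching tgt file, with tokens separated by spaces. The sketch below shows one way to produce such aligned lines from (post, comment) pairs. The function names are hypothetical, and the sketch assumes the text is already tokenized (text in languages without spaces between words needs a word segmenter first).

```python
def to_parallel_lines(pairs):
    """Turn (post, comment) pairs into aligned source and target lines."""
    src_lines, tgt_lines = [], []
    for post, comment in pairs:
        # Collapse whitespace so every example occupies exactly one line.
        src = " ".join(post.split())
        tgt = " ".join(comment.split())
        if src and tgt:  # drop pairs with an empty side to keep the files aligned
            src_lines.append(src)
            tgt_lines.append(tgt)
    return src_lines, tgt_lines


def write_lines(path, lines):
    """Write one line of text per list element."""
    with open(path, "w", encoding="utf-8") as f:
        f.write("".join(line + "\n" for line in lines))
```

Writing the two returned lists with `write_lines` to, say, data/src-train.txt and data/tgt-train.txt keeps both files at the same number of lines.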

Step 3: Train the Classification Model

1. CNN parameters in Classifier/cnn_model.py

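class TCNNConfig(object):

    embedding_dim = 64  # word embedding dimension
    seq_length = 600  # sequence length
    num_classes = 2  # number of classes
    num_filters = 256  # number of convolution filters
    kernel_size = 5  # convolution kernel size
    vocab_size = 5000  # vocabulary size

    hidden_dim = 128  # units in the fully connected layer

    dropout_keep_prob = 0.5  # dropout keep probability
    learning_rate = 1e-3  # learning rate

    batch_size = 64  # examples per training batch
    num_epochs = 100  # number of training epochs

    print_per_batch = 10  # print results every this many batches
    save_per_batch = 10  # save interval, in batches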
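
To get a feel for how these values interact, the sketch below computes the convolution output length and approximate parameter counts. It assumes a layout of embedding, one 1D convolution (stride 1, no padding), global max pooling, one fully connected layer, and an output layer, as in the referenced CNN classifier; the model in cnn_model.py may differ, and `cnn_shape_summary` is a hypothetical helper.

```python
def cnn_shape_summary(embedding_dim=64, seq_length=600, num_classes=2,
                      num_filters=256, kernel_size=5, vocab_size=5000,
                      hidden_dim=128):
    """Return (conv output length, per-layer parameter counts) for the assumed layout."""
    # A stride-1 convolution without padding shortens the sequence by kernel_size - 1.
    conv_out_len = seq_length - kernel_size + 1
    params = {
        "embedding": vocab_size * embedding_dim,
        "conv": kernel_size * embedding_dim * num_filters + num_filters,
        "hidden": num_filters * hidden_dim + hidden_dim,
        "output": hidden_dim * num_classes + num_classes,
    }
    return conv_out_len, params


if __name__ == "__main__":
    length, params = cnn_shape_summary()
    print("conv output length:", length)
    print("total parameters:", sum(params.values()))
```

Under these assumptions the embedding table dominates the parameter count, so vocab_size and embedding_dim are the first values to revisit when the model is too large for the dataset.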

2. Train the model

In the Classifier/ directory, run python run_cnn.py train to start training.

After training, the following file is generated in Classifier/data/cnews/:

data.vocab.txt

3. Test the model

In the Classifier/ directory, run python run_cnn.py test to evaluate the model on data.test.txt.

4. Predict

Classifier/predict.py provides the prediction function of the CNN model. Run predict.py to classify the sentences in Classifer/data/cnews/data.predict.txt. The predictions are written to Classifier/predict.txt.

Step 4: Train the Generation Model

1. Preprocess the data

Run OpenNMT/preprocess.py:

python preprocess.py -train_src data/src-train.txt -train_tgt data/tgt-train.txt -valid_src data/src-val.txt -valid_tgt data/tgt-val.txt -save_data data/demo

Validation files are required; they are used to evaluate the convergence of training. They usually contain no more than 5000 sentences.

After running the preprocessing, the following files are generated in OpenNMT/data/:

demo.train.pt: serialized PyTorch file containing training data
demo.valid.pt: serialized PyTorch file containing validation data
demo.vocab.pt: serialized PyTorch file containing vocabulary data

The vocabulary file maps each word to an integer index. Internally, the system never touches the words themselves but uses these indices.
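
The idea of working on indices instead of words can be illustrated with a small, self-contained sketch. This is not OpenNMT's implementation: `build_vocab`, `encode`, and the special-token names are assumptions made for illustration.

```python
from collections import Counter

UNK, PAD = "<unk>", "<blank>"  # special-token names are assumptions of this sketch


def build_vocab(sentences, max_size=50000):
    """Map the most frequent tokens to integer indices (special tokens come first)."""
    counts = Counter(tok for s in sentences for tok in s.split())
    itos = [UNK, PAD] + [tok for tok, _ in counts.most_common(max_size)]
    stoi = {tok: i for i, tok in enumerate(itos)}
    return itos, stoi


def encode(sentence, stoi):
    """Replace every token by its index; unknown tokens map to the UNK index."""
    return [stoi.get(tok, stoi[UNK]) for tok in sentence.split()]


if __name__ == "__main__":
    itos, stoi = build_vocab(["a b a", "b c a"])
    print(encode("a c z", stoi))
```

Tokens that never appeared in the training data share the single unknown index, which is why the -replace_unk option exists at translation time.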

2. Train the model

Run OpenNMT/train.py:

python train.py -data data/demo -save_model demo-model

The main train command is quite simple. Minimally, it takes a data file and a save file. This runs the default model, which consists of a 2-layer LSTM with 500 hidden units in both the encoder and the decoder. To train on GPU, set, for example, CUDA_VISIBLE_DEVICES=1,3 and pass -world_size 2 -gpu_ranks 0 1 to use (say) GPUs 1 and 3 on this node only. To learn more about distributed training on single or multiple nodes, read the FAQ section of the OpenNMT-py documentation.
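
The GPU settings follow a simple pattern: CUDA_VISIBLE_DEVICES lists the physical GPU ids, -world_size is the number of GPUs, and -gpu_ranks counts from 0 up to that number. The sketch below builds the command line from a list of GPU ids; `build_train_command` is a hypothetical helper, and only the flags already shown above are used.

```python
import os


def build_train_command(data="data/demo", save_model="demo-model", gpus=()):
    """Return (argv, env) for launching train.py; gpus lists physical GPU ids."""
    argv = ["python", "train.py", "-data", data, "-save_model", save_model]
    env = dict(os.environ)
    if gpus:
        # Expose only the chosen devices; the ranks then run from 0 to len(gpus) - 1.
        env["CUDA_VISIBLE_DEVICES"] = ",".join(str(g) for g in gpus)
        argv += ["-world_size", str(len(gpus))]
        argv += ["-gpu_ranks"] + [str(rank) for rank in range(len(gpus))]
    return argv, env


if __name__ == "__main__":
    argv, env = build_train_command(gpus=[1, 3])
    print("CUDA_VISIBLE_DEVICES=" + env["CUDA_VISIBLE_DEVICES"], " ".join(argv))
```

The returned pair can be handed to `subprocess.run(argv, env=env, cwd="OpenNMT")` to start training from a script.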

3. Translate

Run OpenNMT/translate_original.py:

python translate_original.py -model demo-model_acc_XX.XX_ppl_XXX.XX_eX.pt -src data/src-test.txt -output pred.txt -replace_unk -verbose

Now you have a model that you can use to predict on new data. Prediction runs beam search and writes the output to pred.txt.
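
Beam search keeps several partial outputs alive at each step instead of committing to the single most likely next word. The toy sketch below illustrates the idea on a hand-written probability table; it is not OpenNMT's implementation (which also handles batching, length penalties, and more), and `beam_search`, `toy_step`, and the table are made up for illustration.

```python
import math


def beam_search(step_fn, bos, eos, beam_size=3, max_len=10):
    """Return the highest-scoring token sequence found by beam search.

    step_fn(prefix) must return {token: probability} for the next token.
    """
    beams = [(0.0, [bos])]  # (cumulative log-probability, tokens so far)
    finished = []
    for _ in range(max_len):
        candidates = []
        for score, seq in beams:
            for tok, prob in step_fn(seq).items():
                if prob > 0:
                    candidates.append((score + math.log(prob), seq + [tok]))
        if not candidates:
            break
        # Keep only the beam_size best hypotheses; completed ones leave the beam.
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = []
        for score, seq in candidates[:beam_size]:
            (finished if seq[-1] == eos else beams).append((score, seq))
        if not beams:
            break
    return max(finished or beams, key=lambda c: c[0])[1]


# Toy next-token table: "a" looks best at first, but "b z" is the better sequence.
TOY = {
    "<s>": {"a": 0.6, "b": 0.4},
    "a": {"x": 0.55, "y": 0.45},
    "b": {"z": 0.9, "w": 0.1},
}


def toy_step(prefix):
    # Every token not listed in TOY is followed by the end-of-sequence token.
    return TOY.get(prefix[-1], {"</s>": 1.0})
```

With `beam_size=1` the search is greedy and follows "a"; with `beam_size=2` it also keeps "b" alive and ends up with the sequence that has the higher overall probability (0.4 × 0.9 = 0.36 versus 0.6 × 0.55 = 0.33).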

Step 5: Run Deployment Script

1. Set API and parameters

In the OpenNMT/Deployment.py file, fill in your own URLs, parameters, and simulated user information:

#################################################################################################
##############You should fill in your community API and simulated user information###############
#########################and modify the time parameters if you want##############################

THRESHOLD = 10  # threshold for deciding whether the chatbot needs to respond to an overlooked post
STUDY_TIME = 60 * 24 * 7  # the whole deployment period
OBSERVE_INTERVAL = 9  # interval between fetches of the latest posts
COMMENT_INTERVAL = 2  # interval between checks of whether observed posts have been replied to

Community_getLatestPost_Url = ""
Community_toComment_Url = ""
Community_getPostDetail_Url = ""


AI_auth_list = [["username1", "<authorization1>"],
                ["username2", "<authorization2>"],
                ["username3", "<authorization3>"]]
# e.g.
# username = "saltone"
# authorization = "XDS 7.fIC1Fkcg6-Qa6--o9qUP-FyrhLkyLLZOMN6r7Jxxx"

#################################################################################################
##############You should fill in your community API and simulated user information###############
#########################and modify the time parameters if you want##############################
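
How THRESHOLD is applied is defined in Deployment.py and is not spelled out here. The sketch below shows one plausible reading, stated as an assumption: a post counts as overlooked when it has received no reply for at least THRESHOLD minutes. The unit (minutes), the post fields `id`, `posted_at`, and `replies`, and both function names are hypothetical.

```python
THRESHOLD = 10  # assumed here to mean minutes without any reply


def is_overlooked(minutes_since_posted, reply_count, threshold=THRESHOLD):
    """True when a post has waited at least `threshold` minutes and has no replies."""
    return reply_count == 0 and minutes_since_posted >= threshold


def posts_to_answer(posts, now_minutes, threshold=THRESHOLD):
    """Return ids of overlooked posts; each post is a dict with hypothetical
    keys 'id', 'posted_at' (in minutes) and 'replies'."""
    return [
        p["id"]
        for p in posts
        if is_overlooked(now_minutes - p["posted_at"], p["replies"], threshold)
    ]
```

Under this reading, a lower THRESHOLD makes the chatbot step in sooner, while a higher one gives community members more time to reply first.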

2. Run deployment script

Run OpenNMT/Deployment.py.

If you press Enter without typing anything, the console shows the following log:

content: This is a post example
comment: This is a comment example
Do you agree to Comment? Input nothing to confirm or input an appropriate sentence:
 Agree to comment
chatbot will comment on this sentence: This is a comment example

If you enter a new sentence, the chatbot posts that sentence instead of the generated comment:

content: This is a post example
comment: This is a comment example
Do you agree to Comment? Input nothing to confirm or input an appropriate sentence: Fighting!!!
chatbot will comment on this sentence: Fighting!!!
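
The two logs above amount to a confirm-or-override step. A minimal sketch of that behavior follows; `choose_comment` is a hypothetical name, and stripping whitespace from the input is an assumption rather than something taken from Deployment.py.

```python
def choose_comment(generated, operator_input):
    """Return the text to post: the operator's sentence if given, else the generated one."""
    override = operator_input.strip()
    return override if override else generated


if __name__ == "__main__":
    generated = "This is a comment example"
    typed = input("Do you agree to Comment? Input nothing to confirm "
                  "or input an appropriate sentence: ")
    print("chatbot will comment on this sentence:", choose_comment(generated, typed))
```

Keeping the decision in a small pure function makes it easy to test without touching the community API.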

Note

1. Different online communities have different APIs and require different parameters, so this part must be adapted to your community.

2. OpenNMT has been updated to version 1.7, which is not compatible with the version (1.0.0) used in this repo.

3. If you have any questions, please contact me by email: wangliuping17@mails.ucas.ac.cn
