# RoBERTa-MPU for English sentence classification

## This is a demo for the ChatGPT detector in [MPU](http://arxiv.org/abs/2305.18149).

### download pretrained weights and install dependencies:

In [None]:
# install dependencies
!pip install git+https://github.com/mindspore-lab/mindone.git
!pip install git+https://github.com/mindspore-lab/mindnlp.git

# download some pretrained weights
!wget https://download.mindspore.cn/toolkits/mindone/detect_chatgpt/roberta_18plus.ckpt
!wget https://download.mindspore.cn/toolkits/mindone/detect_chatgpt/tokenizer.json
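# make sure the target directory exists before moving the files into it
!mkdir -p examples/detect_chatgpt
!mv roberta_18plus.ckpt examples/detect_chatgpt/
!mv tokenizer.json examples/detect_chatgpt/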

### use our pretrained classifier

In [None]:
from mindone.pipelines.text_classifiers import BertMPUSequenceClassificationPipeline

pipeline = BertMPUSequenceClassificationPipeline(
    model_name='roberta_base',
    config_path='examples/detect_chatgpt/config.json',
    tokenizer_path='examples/detect_chatgpt/tokenizer.json',
)
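# the checkpoint was moved into examples/detect_chatgpt/ in the download step
pipeline.load_from_pretrained('examples/detect_chatgpt/roberta_18plus.ckpt')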
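
Before building the pipeline it can help to confirm that every file it reads is actually on disk. The snippet below is a minimal sketch of such a check. `missing_files` is a hypothetical helper, not part of mindone, and the list of paths assumes the notebook is run from a directory that already contains `examples/detect_chatgpt/config.json` (the download cell above does not fetch that file).

```python
from pathlib import Path


def missing_files(paths):
    """Return the entries of `paths` that are not existing regular files."""
    return [str(p) for p in map(Path, paths) if not p.is_file()]


# Paths taken from the cells above; config.json is assumed to be present already.
required = [
    'examples/detect_chatgpt/config.json',
    'examples/detect_chatgpt/tokenizer.json',
    'examples/detect_chatgpt/roberta_18plus.ckpt',
]

missing = missing_files(required)
if missing:
    print('missing files:', missing)
```

An empty result means all three paths resolve; anything printed points at a download or move step that did not complete.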

### get some test inputs

In [None]:
test_sentences = [
    "They are not colored . Just as white paint is usually made from minerals found in clay . The crystals in white paint reflects all light equally making it appear white . Just liek snow . Primarily the eye color is based on the density and distribution of melanin in the eye . It just looks a certain color when light illuminates the eye . It reflects light unqually .",
    "Piracy and copyright law can be contentious issues on the internet because they involve complex questions about how to balance the rights of creators and the interests of consumers. Some people argue that artists should have the right to control how their works are distributed and to charge what they feel is appropriate, while others believe that the free exchange of information is important and that artists should not be able to control how their works are used. It's important to remember that copyright law exists to protect the rights of creators and to encourage the creation of new works by ensuring that artists can earn a fair income from their creations. When someone pirates (unauthorized copying) or uses a copyrighted work without permission, they are taking something that belongs to someone else and using it for their own benefit, without paying the person who created it. This can be seen as unfair to the creator and can discourage them from creating new works in the future. At the same time, it's also important to recognize that not everyone has the same access to information and that copyright laws can sometimes make it difficult or impossible for people to access the works they want to use. This is why it's important to have a balance between protecting the rights of creators and ensuring that everyone has access to the information and works they need."
]
test_labels = [0, 1] # 0 is human, 1 is gpt
label_to_meaning = ['human written', 'machine generated']
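
Since `test_sentences` and `test_labels` are meant to line up one to one, a quick length check is a cheap safeguard. The sketch below uses a hypothetical helper, `check_aligned`, and also shows why the check matters: Python joins adjacent string literals, so a missing comma between two list items silently produces one long string.

```python
def check_aligned(sentences, labels):
    """Raise ValueError if the number of sentences and labels differ."""
    if len(sentences) != len(labels):
        raise ValueError(
            f'{len(sentences)} sentences but {len(labels)} labels'
        )


# Adjacent string literals are concatenated, so the first list has one item.
merged = ["first text" "second text"]
separate = ["first text", "second text"]
print(len(merged), len(separate))  # 1 2

check_aligned(separate, [0, 1])  # passes silently
```

Calling `check_aligned(test_sentences, test_labels)` right after defining the test inputs would surface such a mismatch before any prediction is run.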

### run detection!

In [None]:
for i, text in enumerate(test_sentences):
    probs = pipeline.predict(text)
    print(f'text {i} result:', probs)
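
The loop above prints the raw output only. To turn it into a readable verdict, one option is to pick the highest-scoring class and look it up in `label_to_meaning`. The sketch below assumes that `pipeline.predict` returns a flat sequence of two class probabilities ordered `[human, machine]`; the notebook does not state the return format, so check it before relying on this. `interpret` is a hypothetical helper, not a mindone function.

```python
def interpret(probs, labels=('human written', 'machine generated')):
    """Return (label, probability) for the highest-probability class.

    Assumes `probs` is a flat sequence with one score per entry in `labels`.
    """
    probs = [float(p) for p in probs]
    if len(probs) != len(labels):
        raise ValueError(f'expected {len(labels)} scores, got {len(probs)}')
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]


# Stand-in scores for illustration only, not real model output.
print(interpret([0.93, 0.07]))  # ('human written', 0.93)
```

Inside the detection loop this could be used as `label, score = interpret(probs, label_to_meaning)` and compared against `label_to_meaning[test_labels[i]]`.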

# BERT-MPU for Chinese sentence classification

## This is a demo for the ChatGPT detector in [MPU](http://arxiv.org/abs/2305.18149).

### download pretrained weights and install dependencies:

In [None]:
# install dependencies
!pip install git+https://github.com/mindspore-lab/mindone.git
!pip install git+https://github.com/mindspore-lab/mindnlp.git

# download some pretrained weights
!wget https://download.mindspore.cn/toolkits/mindone/detect_chatgpt/bert_18plus.ckpt
!wget https://download.mindspore.cn/toolkits/mindone/detect_chatgpt/tokenizer_zh.json
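# make sure the target directory exists before moving the files into it
!mkdir -p examples/detect_chatgpt
!mv bert_18plus.ckpt examples/detect_chatgpt/
!mv tokenizer_zh.json examples/detect_chatgpt/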

### use our pretrained classifier

In [None]:
from mindone.pipelines.text_classifiers import BertMPUSequenceClassificationPipeline

pipeline = BertMPUSequenceClassificationPipeline(
    model_name='bert_base',
    config_path='examples/detect_chatgpt/config_zh.json',
    tokenizer_path='examples/detect_chatgpt/tokenizer_zh.json',
)
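# the checkpoint was moved into examples/detect_chatgpt/ in the download step
pipeline.load_from_pretrained('examples/detect_chatgpt/bert_18plus.ckpt')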

### some tests

In [None]:
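test_sentences = [
    # English gloss: "A program flowchart, also called a program block diagram,
    # is a graphical representation that uses standardized symbols to describe
    # the concrete steps of a program's execution."
    "程序流程图又称程序框图，是用统一规定的标准符号描述程序运行具体步骤的图形表示。"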
]
test_labels = [0, 1] # 0 is human, 1 is gpt
label_to_meaning = ['人类', '机器']  # 'human', 'machine'

### run

In [None]:
for i, text in enumerate(test_sentences):
    probs = pipeline.predict(text)
    print(f'text {i} result:', probs)