<center>
<img src="http://corpuslg.org/lael_english/wp-content/uploads/2020/04/lael_50_years_narrow_white.png.400px_300dpi.png" width="300" alt="LAEL 50 years logo">
<h3>APPLIED LINGUISTICS GRADUATE PROGRAMME (LAEL)</h3>
</center>
<hr>

# Using ChatGPT to improve the writing in English for Research Publication Purposes (ERPP)
Execute the cells in sequence with 'Shift+Enter'.

## Preparing the input file
1. Save your input as a plain text (.txt) file and copy the passages from the original manuscript into it, one passage per line;
2. Exclude titles, subtitles, captions and any other portions of text that do not need improvement;
3. Avoid joining multiple paragraphs into a larger one, because ChatGPT produces a synthesised response from the whole passage and you might lose details.
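
The preparation steps above can be partly automated. The following is a minimal sketch, assuming that paragraphs in the raw manuscript text are separated by blank lines. The helper name `prepare_passages` is hypothetical and not part of the original notebook.

```python
import re

def prepare_passages(raw_text: str) -> list[str]:
    """Split raw text into passages, one per blank-line-separated paragraph.

    Assumption: paragraphs in the raw text are separated by one or more
    blank lines. Line breaks inside a paragraph are collapsed to spaces so
    that each passage fits on a single line of the input file.
    """
    paragraphs = re.split(r'\n\s*\n', raw_text.strip())
    passages = [' '.join(p.split()) for p in paragraphs]
    # Drop anything that ended up empty after whitespace was collapsed
    return [p for p in passages if p]

raw = 'First paragraph,\nwrapped over two lines.\n\nSecond paragraph.\n'
passages = prepare_passages(raw)
```

Writing `'\n'.join(passages)` to a `.txt` file would then give one passage per line. Titles and captions still need to be removed by hand.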

## Importing the required libraries

In [1]:
import openai
import pandas as pd
from os import environ
import datetime as dt
from google.cloud import translate
from IPython.display import clear_output

## Importing the required programme variables from the environment

In [2]:
openai.api_key = environ.get('OPENAI_API_KEY', '')
assert openai.api_key
PROJECT_ID = environ.get('PROJECT_ID', '')
assert PROJECT_ID
PROJECT_ID = str(PROJECT_ID)
PARENT = f'projects/{PROJECT_ID}'
#print(openai.api_key)
#print(PROJECT_ID)
#print(PARENT)
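
A bare `assert` stops the notebook without saying which variable is missing. As a possible alternative, the sketch below raises an error that names the variable. The helper `require_env` and the value `my-project` are hypothetical, and a plain dictionary stands in for the real environment.

```python
from os import environ

def require_env(name: str, env=environ) -> str:
    """Return the value of an environment variable or raise a clear error."""
    value = env.get(name, '')
    if not value:
        raise RuntimeError(f'Environment variable {name} is not set.')
    return value

# Example with a plain dictionary standing in for the real environment
fake_env = {'PROJECT_ID': 'my-project'}  # hypothetical value
parent = f"projects/{require_env('PROJECT_ID', fake_env)}"
```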

## Defining a function to translate passages with Google Cloud Translation API

In [3]:
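def translate_text(text: str, target_language_code: str) -> translate.Translation:
    """Translate one passage and return its translation."""
    client = translate.TranslationServiceClient()
    # No source language is given, so the API detects it
    response = client.translate_text(
        parent = PARENT,
        contents = [text],
        target_language_code = target_language_code
    )
    # One passage was sent, so the first translation is the one wanted
    return response.translations[0]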

## Defining a function to query ChatGPT

In [4]:
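def get_completion(prompt, model = 'gpt-3.5-turbo'):
    """Send a single user message to the model and return the reply text.

    Uses the legacy 'openai.ChatCompletion' interface (openai < 1.0).
    """
    messages = [{'role': 'user', 'content': prompt}]
    response = openai.ChatCompletion.create(
        model = model,
        messages = messages,
        temperature = 0  # Favour the most likely wording for more repeatable output
    )
    return response.choices[0].message['content']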
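
Each call to `get_completion` sends one user message, and the Chat Completions API keeps no memory between calls, so an instruction sent in one call is not visible to later ones. A minimal sketch of one way to carry the instruction along with every passage follows. The helper `build_messages` and the wording of the instruction are hypothetical, and no request is sent.

```python
def build_messages(instruction: str, passage: str) -> list[dict]:
    """Pair a fixed instruction with one passage for a single request.

    Assumption: the instruction is sent as a 'system' message; sending it
    as part of the 'user' message is an equally valid design.
    """
    return [
        {'role': 'system', 'content': instruction},
        {'role': 'user', 'content': passage},
    ]

instruction = ('Improve the writing of the following passage according to '
               'the generally accepted standards of English for Academic '
               'Purposes.')  # hypothetical wording
messages = build_messages(instruction, 'This are a example passage.')
```

The resulting list could be passed as the `messages` argument inside `get_completion` in place of the single user message.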

## Collecting input

In [5]:
end = False
while not end:
    filename = input('Enter the input full filename: ')
    if filename != '':
        try:
            # Open the file only to confirm that it exists
            with open(filename, 'r', encoding = 'utf8') as responses:
                print('The file exists.')
            input_file = filename
            # Derive the names of the output files from the input filename
            output_file = input_file + '.out.txt'
            output_file_json = input_file + '.out.json'
            output_file_excel = input_file + '.out.xlsx'
            end = True
            clear_output()
        except FileNotFoundError:
            print('No such file.')
# Read the input file with one passage per line
df_text = pd.read_table(input_file, sep = '\\n', header = None, engine = 'python')
df_text = df_text.rename(columns = {0: 'text'})
print(str(len(df_text)) + ' passages to process.')

63 passages to process.


## Prompting ChatGPT
Note: If you get an error message indicating '**RemoteDisconnected**', just re-execute the cell with 'Shift+Enter'.
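
Re-executing the cell by hand could also be replaced by an automatic retry. The sketch below assumes the error surfaces as `http.client.RemoteDisconnected`; the OpenAI library may wrap it in an exception type of its own, in which case that type would need to be added to `retry_on`. The helper `with_retries` and the stand-in function `flaky` are hypothetical.

```python
import time
from http.client import RemoteDisconnected

def with_retries(func, *args, attempts=3, delay=1.0, retry_on=(RemoteDisconnected,)):
    """Call func(*args), retrying on the given exceptions.

    Waits `delay` seconds after the first failure and doubles the wait
    after each further failure. Re-raises the last error if all attempts fail.
    """
    for attempt in range(1, attempts + 1):
        try:
            return func(*args)
        except retry_on:
            if attempt == attempts:
                raise
            time.sleep(delay)
            delay *= 2

# Demonstration with a stand-in function that fails twice before succeeding
calls = {'count': 0}

def flaky(text):
    calls['count'] += 1
    if calls['count'] < 3:
        raise RemoteDisconnected('Remote end closed connection without response')
    return text.upper()

result = with_retries(flaky, 'ok', delay=0.01)
```

In the notebook, a call such as `with_retries(get_completion, prompt)` would take the place of `get_completion(prompt)`.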

### ChatGPT prompts
- This is the prompt currently in use:
> 'Dear ChatGPT, would it be possible for you to improve the writing of certain passages of a research article considering the generally accepted standards of English for Academic Purposes? I am going to provide you with a passage at a time. OK?'
- This alternative prompt explicitly asks ChatGPT to keep each improved passage within a single paragraph, but ChatGPT does not comply:
> 'Dear ChatGPT, would it be possible for you to improve the writing of certain passages of a research article considering the generally accepted standards of English for Academic Purposes? Please keep each improved passage within a single paragraph - do not split it into new paragraphs. I am going to provide you with a paragraph at a time. OK?'

In [6]:
with open(output_file, 'a', encoding = 'utf8') as responses:
    responses.write('ChatGPT revision of writing in ERPP' + '\n\n')
    prompt = 'Dear ChatGPT, would it be possible for you to improve the writing of certain passages of a research article considering the generally accepted standards of English for Academic Purposes? I am going to provide you with a passage at a time. OK?'
    responses.write(prompt + '\n\n')
    query = get_completion(prompt)
    responses.write(query + '\n\n')
with open(output_file, 'r', encoding = 'utf8') as responses:
    print(responses.read())

ChatGPT revision of writing in ERPP

Dear ChatGPT, would it be possible for you to improve the writing of certain passages of a research article considering the generally accepted standards of English for Academic Purposes? I am going to provide you with a passage at a time. OK?

Of course! I'd be happy to help you improve the writing of your research article passages. Please provide me with the passage you'd like me to work on, and I'll do my best to enhance it according to the accepted standards of English for Academic Purposes.




## Getting improved passages from ChatGPT
1. The programme will display each paragraph being processed;
2. Move the output files to a safe location.

Note: Specifying 'utf8' encoding when opening the output file prevents the code from breaking (**UnicodeEncodeError**) when a passage contains non-printable Unicode characters, such as the zero-width spaces (U+200B) in the fragment 'variáveis ​​e também'. If an error still occurs, inspect the passage with a website such as [View non-printable unicode characters](https://www.soscisurvey.de/tools/view-chars.php).
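
Such characters can also be located without leaving the notebook. The following is a minimal sketch using the standard `unicodedata` module. The helper `find_invisible` is hypothetical, and the choice of Unicode categories to report is an assumption.

```python
import unicodedata

def find_invisible(text: str) -> list[tuple[int, str, str]]:
    """List (position, code point, name) for format and control characters.

    Assumption: characters in Unicode categories 'Cf' (format) and 'Cc'
    (control), other than tab and newline, are the ones worth inspecting.
    """
    found = []
    for position, char in enumerate(text):
        if char in '\t\n':
            continue
        if unicodedata.category(char) in ('Cf', 'Cc'):
            name = unicodedata.name(char, 'UNKNOWN')
            found.append((position, f'U+{ord(char):04X}', name))
    return found

# The fragment quoted above, written with escapes so the hidden characters are explicit
sample = 'vari\u00e1veis \u200b\u200be tamb\u00e9m'
hits = find_invisible(sample)
```

Each entry in `hits` gives the position of a hidden character in the passage, which makes it easier to find and delete in the input file.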

### Using Google Cloud Translation API before handing over to ChatGPT
Use this option if you wrote your manuscript in a language other than English.

In [7]:
target_language = 'en'
with open(output_file, 'a', encoding = 'utf8') as responses:
    responses.write('Start time: ' + str(dt.datetime.now()) + '\n\n')
    for index, row in df_text.iterrows():
        responses.write('Passage ' + str(index) + ':\n' + row['text'] + '\n\n')
        print('Passage ' + str(index) + ':\n' + row['text'])
        translation = translate_text(row['text'], target_language)
        source_language = translation.detected_language_code
        translated_passage = translation.translated_text
        df_text.at[index, 'text_translated'] = translated_passage
        responses.write('Translated passage ' + str(index) + ' from (' + source_language + ')' + ':\n' + translated_passage + '\n\n')
        print('\nTranslated passage ' + str(index) + ' from (' + source_language + ')' + ':\n' + translated_passage)
        query = get_completion(translated_passage)
        df_text.at[index, 'text_improved'] = query
        responses.write('Improved passage ' + str(index) + ':\n' + query + '\n\n')
        print('\nImproved passage ' + str(index) + ':\n' + query)
        clear_output(wait = True)
    responses.write('End time: ' + str(dt.datetime.now()) + '\n\n')
print('Job completed!')
df_text.to_json(output_file_json)
writer = pd.ExcelWriter(output_file_excel)
df_text.to_excel(writer, sheet_name = 'text')
writer.close()
df_text.head()

Job completed!


| | text | text_translated | text_improved |
|---|---|---|---|
| 0 | O recente advento de uma nova geração de ferra... | The recent advent of a new generation of Artif... | The recent emergence of AI-based tools like Ch... |
| 1 | A adoção de tecnologias de Inteligência Artifi... | The adoption of Artificial Intelligence (AI) t... | These language models, such as OpenAI's GPT-3,... |
| 2 | Considerando a questão de pesquisa ‘Qual é a r... | Considering the research question &#39;What is... | The purpose of this study is to investigate th... |
| 3 | Os achados produzidos pelo projeto de pesquisa... | The findings produced by the proposed research... | The proposed research project aims to investig... |
| 4 | O projeto contribuirá para o desenvolvimento d... | The project will contribute to the development... | The individual involved in the project will pl... |
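
The `&#39;` in the translated text of passage 2 is an HTML character reference for an apostrophe. A minimal sketch of decoding such references with the standard `html` module before a passage is handed to ChatGPT follows. The helper `decode_entities` is hypothetical; requesting plain-text output from the Translation API may be another route and is not shown here.

```python
import html

def decode_entities(text: str) -> str:
    """Convert HTML character references such as &#39; back to characters."""
    return html.unescape(text)

# Visible start of the translated passage 2 from the output above
translated = 'Considering the research question &#39;What is'
decoded = decode_entities(translated)
```

In the translation loop, the decoded text would be the value stored in `text_translated` and passed to `get_completion`.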


### Without Google Cloud Translation API
Use this option if you wrote your manuscript in English.

In [None]:
with open(output_file, 'a', encoding = 'utf8') as responses:
    responses.write('Start time: ' + str(dt.datetime.now()) + '\n\n')
    for index, row in df_text.iterrows():
        responses.write('Passage ' + str(index) + ':\n' + row['text'] + '\n\n')
        print('Passage ' + str(index) + ':\n' + row['text'])
        query = get_completion(row['text'])
        df_text.at[index, 'text_improved'] = query
        responses.write('Improved passage ' + str(index) + ':\n' + query + '\n\n')
        print('\nImproved passage ' + str(index) + ':\n' + query)
        clear_output(wait = True)
    responses.write('End time: ' + str(dt.datetime.now()) + '\n\n')
print('Job completed!')
df_text.to_json(output_file_json)
writer = pd.ExcelWriter(output_file_excel)
df_text.to_excel(writer, sheet_name = 'text')
writer.close()
df_text.head()