# Sample code for using the HKUST Azure OpenAI API with Python

This page contains the completed sample code. For a step-by-step walkthrough with thorough explanations, we recommend reading our article below:

👉 https://digitalhumanities.hkust.edu.hk/tutorials/how-to-use-hkust-azure-openai-api-key-with-python-with-sample-code-and-use-case-examples/

The article provides comprehensive explanations and instructions that will help you grasp the underlying concepts.

## Use your OpenAI API key in Python

On 6 Nov 2023, [a new version of OpenAI](https://openai.com/blog/new-models-and-developer-products-announced-at-devday) was released. The upgrade of the OpenAI Python API library from version 0.28.1 to version 1.x is a breaking change: code written for one version does not run on the other. Sample code for both version 0.28.1 and version 1.x is given below.

While the newly released GPT-4 Turbo model offers enhanced capabilities, we use the gpt-35-turbo model for the demonstration here. Although it is a slightly older model, it performs well enough for our demonstration needs.

We encourage you to explore the potential of GPT-4 Turbo yourself once Azure OpenAI supports it.

+ Details of different models: https://platform.openai.com/docs/models
+ The pricing details for different models per 1,000 tokens can be found here (default: USD$):
https://azure.microsoft.com/en-us/pricing/details/cognitive-services/openai-service/
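
Because pricing is quoted per 1,000 tokens, the cost of a single call can be estimated from the prompt and completion token counts that the API reports. The sketch below is only illustrative: the function name `estimate_cost` and both prices are hypothetical placeholders, not actual Azure rates, so please check the pricing page above for current figures.

```python
def estimate_cost(prompt_tokens, completion_tokens,
                  prompt_price_per_1k, completion_price_per_1k):
    """Estimate the cost of one call from token counts and per-1,000-token prices."""
    prompt_cost = prompt_tokens / 1000 * prompt_price_per_1k
    completion_cost = completion_tokens / 1000 * completion_price_per_1k
    return prompt_cost + completion_cost


# Hypothetical prices in USD per 1,000 tokens (placeholders, NOT real Azure rates)
EXAMPLE_PROMPT_PRICE = 0.0015
EXAMPLE_COMPLETION_PRICE = 0.002

# Example token counts for one short question and answer
cost = estimate_cost(26, 39, EXAMPLE_PROMPT_PRICE, EXAMPLE_COMPLETION_PRICE)
print(f"Estimated cost: USD {cost:.6f}")
```

Prompt and completion tokens are kept as separate arguments because they are often priced differently.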

### OpenAI version 0.28.1

In [None]:
!pip install openai==0.28.1

In [2]:
import openai

# Parameters
openai.api_type = "azure"
openai.api_base = "https://hkust.azure-api.net"
openai.api_version = "2023-05-15"
openai.api_key = "<your openai api key>" #put your api key here

It is recommended to put your API key in a separate configuration file like `.env` for security reasons. This practice can separate your sensitive credentials from your codebase and ensure that they are not exposed if the code is shared or made public.

In [None]:
# You may use the following code if you would like to store your api key in the .env file
# Content in .env file ---->  OPENAI_API_KEY=<your openai api key>

# If you are using JupyterLab and the key in the .env file cannot be located, here is an alternative way: https://gargankush.medium.com/storing-api-keys-as-environmental-variable-for-windows-linux-and-mac-and-accessing-it-through-974ba7c5109f

'''

from dotenv import load_dotenv
import os
load_dotenv('.env')
openai.api_key = os.environ['OPENAI_API_KEY']

'''
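
If the key is already available as an environment variable, a small helper can fail early with a readable message instead of a bare `KeyError`. This is a minimal sketch: the helper name `load_api_key` and its `environ` parameter are illustrative assumptions, while the variable name `OPENAI_API_KEY` follows the `.env` example above.

```python
import os


def load_api_key(var_name="OPENAI_API_KEY", environ=None):
    """Return the API key stored in an environment variable, or raise a clear error."""
    # `environ` can be any dict-like object; it defaults to the real environment
    environ = os.environ if environ is None else environ
    key = environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(
            f"{var_name} is not set. Add it to your .env file or export it in your shell."
        )
    return key


# Usage (assumes the variable has been set beforehand):
# openai.api_key = load_api_key()
```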

In [3]:
# Function
def get_response(message, instruction):
    response = openai.ChatCompletion.create(
        engine = 'gpt-35-turbo',
        temperature = 1,
        messages = [
            {"role": "system", "content": instruction},
            {"role": "user", "content": message}
        ]
    )
    
    # print token usage
    print(response.usage)
    # return the response
    return response.choices[0]["message"]["content"]

In [4]:
get_response("Who are you?", "You are an assistant that speaks like Shakespeare.")

{
  "prompt_tokens": 26,
  "completion_tokens": 39,
  "total_tokens": 65
}


'Greetings! I am but a humble assistant, at your service. My tongue doth wag like the great bard Shakespeare, so speaketh thy commands and I shall bend my efforts to fulfill them.'

### OpenAI version 1.X

Please restart the notebook if you have previously installed another version of openai. Run the openai install command in a new environment.

In [None]:
!pip install openai --upgrade
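
Since the 0.28.1 and 1.x code is not interchangeable, it may help to confirm which major version the notebook actually sees after installing. The following is a minimal sketch using the standard-library `importlib.metadata`; the helper names `major_version` and `installed_openai_major` are illustrative only.

```python
from importlib.metadata import PackageNotFoundError, version


def major_version(version_string):
    """Return the leading (major) number of a version string such as '0.28.1'."""
    return int(version_string.split(".")[0])


def installed_openai_major():
    """Return the major version of the installed openai package, or None if absent."""
    try:
        return major_version(version("openai"))
    except PackageNotFoundError:
        return None


print("Installed openai major version:", installed_openai_major())
```

A result of `0` suggests the 0.28.1-style code applies, while `1` or higher suggests the `AzureOpenAI` client code in this section applies.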

In [2]:
from openai import AzureOpenAI

# Parameters
client = AzureOpenAI(
  azure_endpoint = "https://hkust.azure-api.net",
  api_version = "2023-05-15",
  api_key = "<your openai api key>" #put your api key here
)

In [3]:
# Function
def get_response(message, instruction):
    response = client.chat.completions.create(
        model = 'gpt-35-turbo',
        temperature = 1,
        messages = [
            {"role": "system", "content": instruction},
            {"role": "user", "content": message}
        ]
    )
    
    # print token usage
    print(response.usage)
    # return the response
    return response.choices[0].message.content

In [4]:
get_response("Who are you?", "You are an assistant that speaks like Shakespeare.")

CompletionUsage(completion_tokens=26, prompt_tokens=26, total_tokens=52)


'I am but a humble assistant, good sir or lady, at your service. What may I assist thee with this fine day?'

---

## Possible use cases

In [None]:
!pip install openai --upgrade
from openai import AzureOpenAI

# Parameters
client = AzureOpenAI(
  azure_endpoint = "https://hkust.azure-api.net",
  api_version = "2023-05-15",
  api_key = "<your openai api key>" #put your api key here
)
# Function
def get_response(message, instruction):
    response = client.chat.completions.create(
        model = 'gpt-35-turbo',
        temperature = 1,
        messages = [
            {"role": "system", "content": instruction},
            {"role": "user", "content": message}
        ]
    )
    
    # print the response
    print(response.choices[0].message.content)

    # return the response
    return response.choices[0].message.content

### Named Entity Recognition (NER)

In [17]:
# Text copied from https://library.hkust.edu.hk/events/conferences/ai-scholarly-commu-2023/

Text = """
14 November 2023
2:30 pm – 4:00 pm
Search Engine and Large Language Models – Can they truly change the game?
REGISTER
Abstract
Academic search engines are racing to incorporate the latest advancements brought about by Large Language models (LLMs) in terms of their ability to understand queries, extract information and directly generate answers. The first movers in this space were startup and challengers such as Elicit, Consensus.AI, Scite assistant, Scispace but they have recently been joined by established academic search engine provider like Elsevier’s Scopus and Digital Science’s Dimensions joining the fray with more to come.

Using techniques like RAG (Retrieval augmented generation), this first wave of academic search engines hopes to combine search technology with generative AI by grounding the answers generated by LLMs using information context found by search engines, with the hope of reducing hallucinations. But is this enough?

Join Aaron as he shares his experience testing and using these tools and his best guess on how these tools might develop in the future and their impact on research writing in the future.

About the Speaker

Aaron TayMr. Aaron Tay has been an academic librarian for over 10 years in Singapore and has worked in a variety of areas including library discovery, research support & bibliometrics. He is current Lead Data Services at the Singapore Management University Libraries and has been honoured for his contributions to the profession with a few awards including Library Association of Singapore (LAS) Professional Service Award, Congress of Southeast Asian Libraries (CONSAL) award (Silver) and Pacific Rim Research Library Alliance (PRRLA) , Karl Lo award.

A past contributor to NMC horizon report (library edition), as well as a founding member of the Initiative for Open Abstracts, he has blended his interest in discovery and the evolving Scholarly ecosystem and has given talks on how AI/ML might change Scholarly communication. More recently, he has contributed to panels and given keynotes on the impact of AI and in particular large language models on academic libraries and institutions at conferences like CILIP, IATUL and more. He has been blogging at MusingsAboutLibrarianship.blogpost.com since 2009.

 

15 November 2023
4:00 pm – 5:30 pm
Saving Time and Sanity: Using active learning for systematic reviews and meta-analyses
REGISTER
Abstract
Screening thousands of research papers for a systematic review or meta-analysis can be overwhelming. The reality is that there simply isn’t enough time to read every single article.

Join Prof. Dr. Rens van de Schoot as he introduces ASReview, a powerful free and open-source software for systematic reviewing, developed by his research team from Utrecht University. Rens will explain how active learning, a machine learning technique, can accelerate the step of manual screening process by saving up to 95% (!) of screening time. ASReview is more than just a tool; it’s a vibrant community of researchers, users, and developers worldwide, contributing to its open-source mission, and Rens will explain how you can join the movement towards fast, open, and transparent systematic reviews.

About the Speaker

prof rens van de schoot profile photoProf. Dr. Rens van de Schoot works as a full professor for ‘Statistics for Small Data Sets’ at Utrecht University in the Netherlands and as an extra-ordinary professor at North-West University in South Africa. He is also the program director of the research master ‘Methodology and Statistics for the Behavioural, Biomedical, and Social Sciences’. He is known for his many tutorials, checklists, and online (free) course materials in the areas of SEM and Bayesian statistics. Currently, his main research project is the community-driven and fully open-source project ASReview: AI-aided systematic reviewing using Active Learning. 

 

22 November 2023
10:30 am – 12:00 pm
Generative AI for Translational Scholarly Communication
REGISTER
Abstract
Many valuable insights embedded in scientific publications are siloed and rarely translated into results that can directly benefit humans. These research-to-practice gaps impede the diffusion of innovation, undermine evidence-based decision making, and contribute to the disconnect between science and the public. Generative AI systems trained on decades of digitized scholarly publications and other human-produced texts are now capable of generating (mostly) high-quality and (sometimes) trustworthy text, images, and media. Applied in the context of scholarly communication, Generative AI can quickly summarize research findings, generate visual diagrams of scientific content, and simplify technical jargon. In essence, Generative AI has the potential to help tailor language, format, tone, and examples to make research more accessible, understandable, engaging, and useful for different audiences.  

In this talk, I’ll discuss some uses of Generative AI in these contexts as well as challenges towards realizing the potential of these models, e.g., how to effectively design generated translational science communication artifacts, incorporate human feedback in the process, and mitigate the generation of harmful, misleading, or false information. Scholarly communication is undergoing a major transformation with the emergence of these new tools. By using them safely, we can help bridge the research-to-practice gap and maximize the impacts of scientific discovery. 

About the Speaker

Dr Lucy Lu Wang profile photoLucy Lu Wang is an Assistant Professor at the University of Washington Information School. Her research focuses on how to build better AI and NLP systems for extracting and understanding information from scientific texts; for example, can we create systems that leverage up-to-date literature to help us make better and more data-driven healthcare decisions, or design document understanding models that can improve the readability of scientific texts for people who are blind and low vision. Lucy’s work on supplement interaction detection, gender trends in academic publishing, COVID-19 datasets, and document understanding has been featured in Geekwire, Boing Boing, Axios, VentureBeat, and the New York Times. Prior to joining the UW, she was a Young Investigator at the Allen Institute for AI, and she received her PhD in Biomedical Informatics and Medical Education from the University of Washington.
"""

In [18]:
instruction = "You are a helpful assistant."

get_response(f"""
             Please extract each of the speaker name, their corresponding university, their specialist (within 20 words), and presentation title. 
             Text:{Text}
            """, 
            instruction)

Speaker 1:
Name: Aaron Tay
University: Singapore Management University
Specialist: Academic librarian with a focus on library discovery, research support, and bibliometrics.
Presentation Title: Search Engine and Large Language Models – Can they truly change the game?

Speaker 2:
Name: Prof. Dr. Rens van de Schoot
University: Utrecht University (Netherlands) and North-West University (South Africa)
Specialist: Full Professor for ‘Statistics for Small Data Sets’ and Program Director for the research master ‘Methodology and Statistics for the Behavioural, Biomedical, and Social Sciences’
Presentation Title: Saving Time and Sanity: Using active learning for systematic reviews and meta-analyses

Speaker 3:
Name: Dr. Lucy Lu Wang
University: University of Washington Information School
Specialist: Assistant Professor in AI and NLP, with a focus on extracting and understanding information from scientific texts
Presentation Title: Generative AI for Translational Scholarly Communication


'Speaker 1:\nName: Aaron Tay\nUniversity: Singapore Management University\nSpecialist: Academic librarian with a focus on library discovery, research support, and bibliometrics.\nPresentation Title: Search Engine and Large Language Models – Can they truly change the game?\n\nSpeaker 2:\nName: Prof. Dr. Rens van de Schoot\nUniversity: Utrecht University (Netherlands) and North-West University (South Africa)\nSpecialist: Full Professor for ‘Statistics for Small Data Sets’ and Program Director for the research master ‘Methodology and Statistics for the Behavioural, Biomedical, and Social Sciences’\nPresentation Title: Saving Time and Sanity: Using active learning for systematic reviews and meta-analyses\n\nSpeaker 3:\nName: Dr. Lucy Lu Wang\nUniversity: University of Washington Information School\nSpecialist: Assistant Professor in AI and NLP, with a focus on extracting and understanding information from scientific texts\nPresentation Title: Generative AI for Translational Scholarly Commu

In [32]:
instruction = "You are a helpful assistant."

csv_result = get_response(f"""
             Please extract each of the speaker name, their corresponding university, their specialist (within 20 words), and presentation title. Please return the results in csv format.
             Text:{Text}
            """, 
            instruction)

Speaker Name,University,Specialist,Presentation Title
Aaron Tay,Singapore Management University Libraries,Academic search engines and bibliometrics,"Search Engine and Large Language Models – Can they truly change the game?"
Prof. Dr. Rens van de Schoot,Utrecht University,Statistics for Small Data Sets,"Saving Time and Sanity: Using active learning for systematic reviews and meta-analyses"
Dr. Lucy Lu Wang,University of Washington Information School,Artificial Intelligence and Natural Language Processing,"Generative AI for Translational Scholarly Communication"


In [33]:
csv_result

'Speaker Name,University,Specialist,Presentation Title\nAaron Tay,Singapore Management University Libraries,Academic search engines and bibliometrics,"Search Engine and Large Language Models – Can they truly change the game?"\nProf. Dr. Rens van de Schoot,Utrecht University,Statistics for Small Data Sets,"Saving Time and Sanity: Using active learning for systematic reviews and meta-analyses"\nDr. Lucy Lu Wang,University of Washington Information School,Artificial Intelligence and Natural Language Processing,"Generative AI for Translational Scholarly Communication"'

In [34]:
# Save the result to a CSV file
with open('outputfile_NER-example.csv', 'w', encoding='UTF8') as f:
    f.write(csv_result)
    print("A csv file is created and saved in the same folder of this notebook.")

A csv file is created and saved in the same folder of this notebook.
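
Because the model returns the CSV as free text, it can be worth parsing the string before relying on it, to confirm that the header and quoted titles are read as intended. The sketch below uses the standard-library `csv` module; the helper name `parse_csv_text` and the shortened `sample` string are illustrative stand-ins for `csv_result`.

```python
import csv
import io


def parse_csv_text(csv_text):
    """Parse CSV text into a list of dictionaries keyed by the header row."""
    reader = csv.DictReader(io.StringIO(csv_text.strip()))
    return list(reader)


# Shortened sample in the same shape as `csv_result`
sample = (
    "Speaker Name,University,Specialist,Presentation Title\n"
    "Aaron Tay,Singapore Management University Libraries,"
    "Academic search engines and bibliometrics,"
    '"Search Engine and Large Language Models – Can they truly change the game?"\n'
)

rows = parse_csv_text(sample)
for row in rows:
    print(row["Speaker Name"], "|", row["Presentation Title"])
```

Quoted fields matter here: a title containing a comma would otherwise be split into two columns.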


### Sentiment Classification

In [40]:
import pandas as pd

#### Classified by sentiment: Positive, Negative, Neutral

In [48]:
# Create a sample DataFrame
sampledata = {
    'Text': ["I really enjoyed the movie. It was fantastic!", 
             "The service at the restaurant was terrible.", 
             "The customer support was very helpful and responsive.",
             "The shop was okay.",
             "The product I bought was of poor quality."]
}

df = pd.DataFrame(data=sampledata)
df

| | Text |
|---|---|
| 0 | I really enjoyed the movie. It was fantastic! |
| 1 | The service at the restaurant was terrible. |
| 2 | The customer support was very helpful and resp... |
| 3 | The shop was okay. |
| 4 | The product I bought was of poor quality. |


In [49]:
# Define the instruction for sentiment analysis
instruction = "Please analyze the sentiment of the following text. Only use the exact wording 'positive', 'negative', or 'neutral' in your response. Do not say any other irrelevant things, no punctuation."

# Create a new column for sentiment
df['Sentiment'] = ""

# Iterate over each row in the DataFrame
for index, row in df.iterrows():
    # Get the text from the 'Text' column
    text = row['Text']
    
    # Get the sentiment response using the get_response function
    sentiment = get_response(text, instruction)
    
    # Store the sentiment result in the 'Sentiment' column
    df.at[index, 'Sentiment'] = sentiment

# display the updated DataFrame
df

positive
negative
positive
neutral
negative


| | Text | Sentiment |
|---|---|---|
| 0 | I really enjoyed the movie. It was fantastic! | positive |
| 1 | The service at the restaurant was terrible. | negative |
| 2 | The customer support was very helpful and resp... | positive |
| 3 | The shop was okay. | neutral |
| 4 | The product I bought was of poor quality. | negative |
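
The model does not always follow formatting instructions exactly, so it may be safer to normalize each reply before storing it in the DataFrame. This is a minimal sketch under that assumption: the helper name `normalize_sentiment` and the fallback label `"unknown"` are hypothetical and not part of the original notebook.

```python
ALLOWED_SENTIMENTS = {"positive", "negative", "neutral"}


def normalize_sentiment(raw):
    """Lowercase the reply, strip whitespace and surrounding punctuation, then validate it."""
    label = raw.strip().strip(".!\"'").lower()
    # Anything outside the three expected labels is flagged for manual review
    return label if label in ALLOWED_SENTIMENTS else "unknown"


print(normalize_sentiment("Positive."))
print(normalize_sentiment("The text sounds fairly happy"))
```

In the loop above, this could be applied as `df.at[index, 'Sentiment'] = normalize_sentiment(sentiment)`.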


#### Classified by categories: Services, Facilities, Resources, Activities

In [55]:
# Use the comments that the library received in LibQUAL 2019 as sample data https://library.hkust.edu.hk/about-us/user-engagement/libqual2019/
sampledata = {
    'Text': ["The best place to stay in UST.",
            "Our library is the best!",
            "I love the ground floor of the library, i think especially in the morning it is so peaceful and calm.",
            "Our library provide information and resources for us, including materials and many classes about how to use them. I really thanks staffs of library. And further I hope that our library could provide more chance and resource of 3D printer and I really love it.",
            "服務良好、環境非常舒適，可以讓我專心溫習、 資源豐富、職員亦很友善，樂意詳細回答我的問 題，活動種類繁多，可以讓我找到興趣，又能學習 新的東西。",
            "图书馆有很好的环境，也会提供各种活动和讲座帮助我们熟悉使用图书馆的资源，喜欢我们的图书馆。"]
            }

libqual2019 = pd.DataFrame(data=sampledata)
libqual2019

| | Text |
|---|---|
| 0 | The best place to stay in UST. |
| 1 | Our library is the best! |
| 2 | I love the ground floor of the library, i thin... |
| 3 | Our library provide information and resources ... |
| 4 | 服務良好、環境非常舒適，可以讓我專心溫習、 資源豐富、職員亦很友善，樂意詳細回答我的問 題，... |
| 5 | 图书馆有很好的环境，也会提供各种活动和讲座帮助我们熟悉使用图书馆的资源，喜欢我们的图书馆。 |


In [66]:
# Define the instruction for sentiment analysis
instruction1 = "Please analyze the sentiment of the following text. Only use the exact wording 'positive', 'negative', or 'neutral' in your response. All lowercase. Do not say any other irrelevant things. Do not include full stop in your response."

# Define the instruction for category classification
instruction2 = """
Please classify the following text into these categories (exact wordings) 'Services', 'Facilities', 'Resources', 'Activities' or 'Cannot be classified' in your response. Do not say any other words. Remember only use these 4 categories in your response: 'Services', 'Facilities', 'Resources', 'Activities', 'Cannot be classified'. Use 'Cannot be classified' only when no categories can be assigned to the text.
"""

# Create new columns for sentiment and category
libqual2019['Sentiment'] = ""
libqual2019['Category'] = ""

# Iterate over each row in the DataFrame
for index, row in libqual2019.iterrows():
    # Get the text from the 'Text' column
    text = row['Text']
    
    # Get the response using the get_response function
    sentiment = get_response(text, instruction1)
    category = get_response(text, instruction2)
    
    # Store the result in the column
    libqual2019.at[index, 'Sentiment'] = sentiment
    libqual2019.at[index, 'Category'] = category

# display the updated DataFrame
libqual2019

positive
Cannot be classified.
positive
Cannot be classified.
positive
Facilities
positive
Services, Resources.
positive
Services, Facilities, Resources, Activities
positive
Services, Activities.


| | Text | Sentiment | Category |
|---|---|---|---|
| 0 | The best place to stay in UST. | positive | Cannot be classified. |
| 1 | Our library is the best! | positive | Cannot be classified. |
| 2 | I love the ground floor of the library, i thin... | positive | Facilities |
| 3 | Our library provide information and resources ... | positive | Services, Resources. |
| 4 | 服務良好、環境非常舒適，可以讓我專心溫習、 資源豐富、職員亦很友善，樂意詳細回答我的問 題，... | positive | Services, Facilities, Resources, Activities |
| 5 | 图书馆有很好的环境，也会提供各种活动和讲座帮助我们熟悉使用图书馆的资源，喜欢我们的图书馆。 | positive | Services, Activities. |
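
The replies above sometimes contain several categories in one string and sometimes end with a full stop, which makes them awkward to count or filter. One possible way to tidy them is sketched below; the helper name `parse_categories` is an assumption for illustration, while the category names are taken from `instruction2`.

```python
ALLOWED_CATEGORIES = ["Services", "Facilities", "Resources", "Activities"]
FALLBACK = "Cannot be classified"


def parse_categories(raw):
    """Split a reply such as 'Services, Resources.' into a list of recognized categories."""
    # Split on commas, then drop surrounding whitespace and a trailing full stop
    parts = {part.strip().rstrip(".").lower() for part in raw.split(",")}
    found = [category for category in ALLOWED_CATEGORIES if category.lower() in parts]
    return found if found else [FALLBACK]


print(parse_categories("Services, Resources."))
print(parse_categories("Cannot be classified."))
```

Returning a list keeps multi-label replies intact, so each comment can later be counted under every category it mentions.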


### Language Translation
Example: classical Chinese (文言文) to vernacular Chinese (白話文)

In [71]:
instruction = "You are an expert in Mathematics and proficient in classical Chinese."
Text = """
今有田廣十五步，從十六步。問為田幾何？		
答曰：一畝。
方田：又有田廣十二步，從十四步。問為田幾何？		
答曰：一百六十八步。		
方田術曰：廣從步數相乘得積步。以畝法二百四十步除之，即畝數。百畝為一頃。
"""

get_response(f"""
             請把以下文言文翻譯成白話文：
             文言文:{Text}
            """, 
            instruction)

白話文：
現在有一塊土地，長十五步，寬十六步，問這塊土地面積是多少？回答是一畝。

還有一塊土地，長十二步，寬十四步，問這塊土地面積是多少？回答是一百六十八步。

求土地面積的方法叫做「方田術」。可以用長和寬的步數相乘，得到土地的面積。如果要換算成畝，要用每畝二百四十步的面積去除，得到的數字就是畝數。一百畝等於一頃。


In [76]:
instruction = "You are an expert in Mathematics and proficient in classical Chinese."
Text = """
今有田廣十五步，從十六步。問為田幾何？		
答曰：一畝。
方田：又有田廣十二步，從十四步。問為田幾何？		
答曰：一百六十八步。		
方田術曰：廣從步數相乘得積步。以畝法二百四十步除之，即畝數。百畝為一頃。
"""

get_response(f"""
             請以數學公式逐步解釋以下文言文的內容:
             文言文:{Text}
            """, 
            instruction)

首先，將題目翻譯成現代中文：現在有一塊田，長15步，寬16步，問這塊田是多大面積？ 答案是一畝。接著，又有一塊長12步，寬14步的田，問這塊田是多大面積？答案是一百六十八步。

根據方田術，田的面積可以通過廣（長）和從（寬）的步數相乘而得。因此，第一塊田的面積為：

長 × 寬 = 15 × 16 = 240 步

根據一畝等於二百四十步的標準，這塊田的面積為一畝。

同樣地，第二塊田的面積為：

長 × 寬 = 12 × 14 = 168 步

因此，第二塊田的面積為一百六十八步。

最後，方田術還提到，將田地的面積除以二百四十步，就可以得到田的畝數。一頃等於一百畝。
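
The arithmetic in the explanation above can be checked with a few lines of Python. This is a small sketch of the rule quoted in the text (multiply 廣, the width, by 從, the length, then divide by 240 to convert to 畝); the function and constant names are illustrative.

```python
MU_IN_SQUARE_BU = 240  # 畝法: one 畝 equals 240 square 步, as stated in the text


def field_area(width_bu, length_bu):
    """Return the area in square 步 (積步) and the same area expressed in 畝."""
    square_bu = width_bu * length_bu
    return square_bu, square_bu / MU_IN_SQUARE_BU


print(field_area(15, 16))  # first field: 240 square 步, i.e. one 畝
print(field_area(12, 14))  # second field: 168 square 步, less than one 畝
```

The second field is smaller than one 畝, which is consistent with the source text stating its answer in 步 rather than 畝.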
