<a href="https://colab.research.google.com/github/hasse-h/python-NLP/blob/master/child_psych_question_classifier_v1.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## **Instructions** 🤔


To start, select `Runtime -> Run all` from the menu above.

Once the notebook has run, enter a query in the query cell.

Then select the query cell and choose `Runtime -> Run after`.

If the model encounters an error, select `Runtime -> Run all` to restart.


### **Model operating principles**
GPT-4-turbo first decides whether the question is ***closed*** or ***open-ended***, and then classifies it into one of that type's subcategories.

Closed questions can be ***option posing***, ***multiple choice*** or ***too complicated***.

Open-ended questions can be ***directives***, ***invitations*** or ***facilitators***.
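
The two-level scheme above can be written down as a small lookup table. This is only a sketch: the names `TAXONOMY`, `subtypes_for` and `is_valid_pair` are hypothetical and do not appear in the notebook, and the label spellings are assumed to match the ones used in the schema classes further down.

```python
# Two-level taxonomy from the description above (label spellings are assumed).
TAXONOMY = {
    "Closed": ("Option Posing", "Multiple Choice", "Too Complicated"),
    "Open": ("Directive", "Invitation", "Facilitator"),
}


def subtypes_for(basic_type: str) -> tuple:
    """Return the subcategories that are valid for a first-stage label."""
    if basic_type not in TAXONOMY:
        raise ValueError(f"unknown basic type: {basic_type!r}")
    return TAXONOMY[basic_type]


def is_valid_pair(basic_type: str, sub_type: str) -> bool:
    """Check that a subcategory belongs to the given basic type."""
    return sub_type in TAXONOMY.get(basic_type, ())


print(subtypes_for("Open"))
print(is_valid_pair("Closed", "Invitation"))
```

A table like this can be used to sanity-check that the second-stage answer belongs to the branch chosen in the first stage.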


## Set-up:

In [292]:
# @title
!pip install langchain_openai



In [293]:
# @title
from langchain_core.pydantic_v1 import BaseModel, Field, constr

In [294]:
# @title
from langchain_core.prompts import PromptTemplate

In [295]:
# @title
from langchain_core.output_parsers import JsonOutputParser

In [296]:
# @title
!pip install openai



In [297]:
# @title
from langchain_openai import ChatOpenAI
import os

## Enter your API key here:

In [298]:
os.environ["OPENAI_API_KEY"] = "YOUR_OPENAI_API_KEY"  # paste your own key here; never commit a real key
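
One way to avoid writing the key into a cell is to prompt for it at run time. The sketch below assumes this is acceptable for the notebook; `ensure_api_key` is a hypothetical helper, not something the notebook defines, and the demo uses a fake environment and a fake prompt so that it runs without user input.

```python
import os
from getpass import getpass


def ensure_api_key(env=os.environ, prompt_fn=getpass, name="OPENAI_API_KEY"):
    """Reuse a key already present in `env`, otherwise ask for one."""
    if not env.get(name):
        env[name] = prompt_fn(f"Enter {name}: ")
    return env[name]


# Demo with a stand-in environment and prompt (no real key involved).
demo_env = {}
ensure_api_key(env=demo_env, prompt_fn=lambda _: "sk-example")
print(sorted(demo_env))
```

In the notebook itself, calling `ensure_api_key()` with no arguments would read or set `os.environ` and show a hidden input prompt.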

## **Enter a query here:** ❓

In [449]:
query_to_a_child = "could you tell about it"

## Category definitions

In [450]:
closed = """start with modal verbs or phrases such as Can you...,
Could you..., Would you... Do you..., Have you..., etc."""

In [451]:
option_posing = """questions that ask the user to choose whether they could
or would do something to confirm or deny a presented fact (for example,
it was last week, wasn’t it? / was it last week?)"""

In [452]:
multiple_choice = """question has a list of options that one is asked
to choose from. A list of two or more options that have an “or” in between
(such as was it a car, house or boat? Is it tall or short?)"""

In [453]:
other = """more than one sentence per question, or something
other than option posing or multiple choice"""

In [454]:
open = """either a statement or typically beginning with
"What...", "How...", "Tell me more...", "Tell me about..." """

In [455]:
directive = """questions that direct the answerer, where only
filler words or discourse markers come before the question word
(for example: “So, who was there with you?”, “ok, and what is it that he was doing?”)."""

In [456]:
invitation = """questions that ask the answerer to tell more, such as
 tell me more about that, tell me more about x, tell me more about it, tell me about x,
 then what happened? What happened then?
what happened after that/before that/first, last/next, and then? """

In [457]:
facilitator = """short utterances that encourage the answerer to
continue talking without actually asking anything.
These are things like: go on, alright, ok, I see, I understand etc.
Also anything that does not fall into other categories."""
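
The Closed and Open definitions are mostly about how a text begins, so a rough rule-based check can serve as a sanity baseline next to the model. This is not the notebook's method, only a sketch: the patterns cover just the openers listed explicitly in the definitions (not the "etc." cases or plain statements), and `heuristic_basic_type` is a hypothetical name.

```python
import re

# Openers taken from the Closed and Open definitions above (partial list).
CLOSED_OPENERS = re.compile(r"^(can|could|would|do|have)\s+you\b", re.IGNORECASE)
OPEN_OPENERS = re.compile(r"^(what|how|tell me more|tell me about)\b", re.IGNORECASE)


def heuristic_basic_type(text: str):
    """Return 'Closed', 'Open', or None when no listed opener matches."""
    stripped = text.strip()
    if CLOSED_OPENERS.match(stripped):
        return "Closed"
    if OPEN_OPENERS.match(stripped):
        return "Open"
    return None


for sample in ["could you tell about it", "Tell me more about that", "ok"]:
    print(sample, "->", heuristic_basic_type(sample))
```

Cases where the heuristic returns `None`, or disagrees with the model, are the ones worth inspecting by hand.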

## Main logic:

In [458]:
# @title
llm = ChatOpenAI()

In [459]:
# @title
llm.model_name = 'gpt-4-turbo'

In [460]:
# @title
class QuestionClassificationBasic(BaseModel):
    basic_type: constr(regex='^(Closed|Open)$') = Field(description="Choose 'Closed' or 'Open' according to the definition.")
    justification: str = Field(description="Justify your choice in 10 to 15 words")

In [461]:
class QuestionClassificationOpen(BaseModel):
    sub_type: constr(regex='^(Directive|Invitation|Facilitator)$') = Field(description="Is the text a directive, invitation or facilitator? Choose between 'Directive', 'Invitation' or 'Facilitator'.")
    justification: str = Field(description="Justify your choice in 10 to 15 words")

In [462]:
class QuestionClassificationClosed(BaseModel):
    sub_type: constr(regex='^(Option Posing|Multiple Choice|Too Complicated)$') = Field(description="Is the text option posing, multiple choice or too complicated? Choose between 'Option Posing', 'Multiple Choice' or 'Too Complicated'.")
    justification: str = Field(description="Justify your choice in 10 to 15 words")
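
The regex constraints in these classes describe the allowed labels, but as far as I know a JSON output parser returns a plain `dict` and may not enforce them, so this is worth verifying against the installed LangChain version. A minimal sketch of an explicit check follows, using only the standard library; `check_label` is a hypothetical helper and the patterns are copied by hand from the classes above.

```python
import re


def check_label(result: dict, field: str, pattern: str) -> str:
    """Return result[field] if it fully matches `pattern`, else raise ValueError."""
    value = result.get(field)
    if not isinstance(value, str) or re.fullmatch(pattern, value) is None:
        raise ValueError(f"{field}={value!r} does not match {pattern!r}")
    return value


# Example dict shaped like the parser output shown in this notebook.
example = {"basic_type": "Closed", "justification": "Starts with 'Could you'."}
print(check_label(example, "basic_type", r"(Closed|Open)"))
```

Running the check right after each `invoke` call would turn a misspelled or unexpected label into an immediate error instead of a silent wrong branch.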

In [463]:
# @title
basic_output_parser = JsonOutputParser(pydantic_object=QuestionClassificationBasic)

In [464]:
# @title
prompt_basic = PromptTemplate(
    template=
     """You are a world-leading forensic psychologist.
     Your task is to classify whether an input text is 'Closed' or 'Open' according to our definitions below.
     Do not rely on your intuition; follow the definitions instead.
     The text does not have to be a question; it can be anything, even one word such as 'ok' or 'yes', so you must
     be prepared to classify any text accordingly. Never refuse to classify.
     These two categories are defined as follows:\n"""

     # Label each definition and end it with a newline so the pieces do not run together.
     "Closed: "
     f"{closed}\n"

     "Open: "
     f"{open}\n"

    """Think through your decision carefully, step by step, focusing on features of the text.
    \n{format_instructions}\n{query}\n""",
    input_variables=["query"],
    partial_variables={"format_instructions": basic_output_parser.get_format_instructions()},
)
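
The template above mixes f-strings with plain string literals. A small standard-library sketch of why that works: adjacent literals are concatenated, only the f-string pieces are filled in immediately, and braces in the plain pieces survive as placeholders for a later formatting step. `closed_demo` is a stand-in for the notebook's `closed` variable, and `str.format` stands in for the prompt template's own substitution.

```python
closed_demo = "starts with Can you..., Could you..."  # stand-in definition text

template = (
    "Definitions:\n"
    "Closed: " f"{closed_demo}\n"          # f-string piece: filled in right away
    "{format_instructions}\n{query}\n"     # plain piece: braces are kept as-is
)
print(template)

# Later substitution, analogous to what the prompt template does at invoke time.
rendered = template.format(format_instructions="<schema>", query="could you tell")
print(rendered)
```

One caveat under this scheme: if a definition string ever contained literal `{` or `}`, the later substitution step would try to treat them as variables, so such characters would need escaping.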

In [465]:
basic_chain = prompt_basic | llm | basic_output_parser

## Model basic classification

Only the first three words of the query are passed to the model here, because the Closed and Open definitions depend on how the text begins.

In [466]:
basic_type_classifier = basic_chain.invoke({"query": ' '.join(query_to_a_child.split()[:3])})
print(basic_type_classifier)

{'basic_type': 'Closed', 'justification': "Starts with 'Could you', matching Closed category."}
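
The truncation used in the cell above can be looked at in isolation. The sketch below wraps the same `split` and `join` logic in a hypothetical helper, `first_words`, to show exactly what text reaches the first-stage model.

```python
def first_words(text: str, n: int = 3) -> str:
    """Keep only the first n whitespace-separated words of the text."""
    return " ".join(text.split()[:n])


print(first_words("could you tell about it"))       # could you tell
print(first_words("it was last week, wasn't it?"))  # it was last
print(first_words("ok"))                            # ok
```

Note that the second example, taken from the option-posing definition, loses its closing tag ("wasn't it?") after truncation, so the first stage sees what looks like a plain statement. Whether that matters for the final label is something to check on real data.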


## More logic:

In [467]:
question_classification = QuestionClassificationOpen if basic_type_classifier['basic_type'] == 'Open' else QuestionClassificationClosed

In [468]:
final_output_parser = JsonOutputParser(pydantic_object=question_classification)

In [469]:
# @title
prompt_open = PromptTemplate(
    template=
     """You are a world-leading forensic psychologist.
     Your task is to classify whether an input text is
     a 'Directive', 'Invitation' or 'Facilitator'.
     The input text can be anything, even one word such as 'ok' or 'yes', so you must
     be prepared to classify any text. Never refuse to classify.
     These three categories are defined as follows:\n"""

    # Label each definition so the model can tell which text defines which category.
    "Directive: "
    f"{directive}\n"

    "Invitation: "
    f"{invitation}\n"

    "Facilitator: "
    f"{facilitator}\n"

    """Think through your decision carefully, step by step, focusing on features of the text.
    \n{format_instructions}\n{query}\n""",
    input_variables=["query"],
    partial_variables={"format_instructions": final_output_parser.get_format_instructions()},
)

In [470]:
# @title
prompt_closed = PromptTemplate(
    template=
     """You are a world-leading forensic psychologist.
     Your task is to classify whether an input text is
     'Option Posing', 'Multiple Choice' or 'Too Complicated'.
     The input text can be anything, even one word such as 'ok' or 'yes', so you must
     be prepared to classify any text. Never refuse to classify.
     These three categories are defined as follows:\n"""

    # Labels match the values allowed by QuestionClassificationClosed.
    "Option Posing: "
    f"{option_posing}\n"

    "Multiple Choice: "
    f"{multiple_choice}\n"

    "Too Complicated: "
    f"{other}\n"

    """Think through your decision carefully, step by step, focusing on features of the text.
    \n{format_instructions}\n{query}\n""",
    input_variables=["query"],
    partial_variables={"format_instructions": final_output_parser.get_format_instructions()},
)

In [471]:
branch = prompt_open if basic_type_classifier['basic_type'] == 'Open' else prompt_closed

In [472]:
final_chain = branch | llm | final_output_parser

## Model final classification

This takes the entire question into account.

In [473]:
final_classifier = final_chain.invoke({"query": query_to_a_child})
print(final_classifier)

{'sub_type': 'Option Posing', 'justification': 'The text poses a question confirming a time.'}
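
The two stages can also be combined into a single function, which makes it easier to classify many queries in a loop. The sketch below takes the model calls as plain callables, so it runs here with stub functions; `classify` and the `fake_*` stubs are hypothetical names, and the stubs return fixed labels rather than real model output.

```python
def classify(query, basic_fn, open_fn, closed_fn):
    """Run the first stage on the first three words, then the matching second stage."""
    basic = basic_fn(" ".join(query.split()[:3]))
    sub_fn = open_fn if basic["basic_type"] == "Open" else closed_fn
    sub = sub_fn(query)  # second stage sees the whole query
    return {"basic_type": basic["basic_type"], "sub_type": sub["sub_type"]}


# Stubs standing in for the model-backed chains.
def fake_basic(text):
    return {"basic_type": "Open" if text.lower().startswith("tell") else "Closed"}


def fake_open(text):
    return {"sub_type": "Invitation"}


def fake_closed(text):
    return {"sub_type": "Option Posing"}


result = classify("tell me more about it", fake_basic, fake_open, fake_closed)
print(result)
```

In the notebook, the stubs could be replaced by small wrappers around the chains, for example `lambda q: basic_chain.invoke({"query": q})`, assuming a separate chain is built for each of the open and closed prompts.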
