# Language Knowledge (Vocabulary)
Duration: 30 minutes
Content: This section tests your knowledge of Japanese vocabulary, including kanji readings, orthography, word formation, contextually defined expressions, paraphrases, and usage.
It mainly comprises the following five categories:
- `Reading Kana` (Pronunciation Questions): Given a kanji word, choose the correct kana reading.
- `Writing Kanji` (Writing Questions): Given a word written in kana, choose the correct kanji representation.
- `Word Meaning Selection` (Vocabulary Understanding): Choose the most suitable word to fill in the sentence from four options.
- `Synonym Replacement`: Select a word that has the same or similar meaning as the underlined word.
- `Vocabulary Usage`: Assess how words are used in context by choosing the most appropriate usage, including common Japanese expressions and fixed phrases.

In [25]:
import pandas as pd
import json
import os
import pickle
import re
import uuid
from typing import *
from langchain_openai import AzureOpenAI, AzureChatOpenAI
from langchain_anthropic import ChatAnthropic
from langchain_core.prompts import ChatPromptTemplate
from dotenv import load_dotenv
from langchain_aws import ChatBedrock
from langchain.embeddings.base import Embeddings
from langchain.text_splitter import CharacterTextSplitter
from langchain_community.document_loaders import PyPDFLoader
from langchain_community.embeddings import HuggingFaceEmbeddings
# from langchain_community.embeddings import XinferenceEmbeddings
from langchain_community.vectorstores import Chroma
from langchain_openai import OpenAIEmbeddings
from typing import Annotated, Literal, Sequence
from typing_extensions import TypedDict
from Libs.LLMs import *
from langchain import hub
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import PromptTemplate
from langchain_openai import ChatOpenAI
from langchain_core.messages import BaseMessage, RemoveMessage, HumanMessage, AIMessage, ToolMessage
from langgraph.graph.message import add_messages
from langgraph.graph import END, StateGraph, START
from langgraph.prebuilt import ToolNode
from langgraph.prebuilt import tools_condition
from langgraph.checkpoint.memory import MemorySaver
from langchain.output_parsers import PydanticOutputParser
from pydantic import BaseModel, Field, validator
from typing import List, Optional

load_dotenv()

True

In [26]:
# Load the N3 vocabulary list
file_path = '../Vocab/n3.csv'
data = pd.read_csv(file_path)
# Keep the first two columns (expression, reading) and shuffle the rows
words = data.iloc[:, :2].sample(frac=1).reset_index(drop=True)
# Map each expression to its reading, then serialize the mapping to compact JSON
vocab_dict = words.set_index(words.columns[0])[words.columns[1]].to_dict()
vocab_dict = json.dumps(vocab_dict, ensure_ascii=False, separators=(',', ':'))
# Preview the first five rows
words.head()

|   | expression | reading |
|---|---|---|
| 0 | 配達 | はいたつ |
| 1 | 泊める | とめる |
| 2 | 行き | いき |
| 3 | 乗せる | のせる |
| 4 | 国籍 | こくせき |
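
To make the serialization step concrete, here is a minimal sketch of what the two `json.dumps` options do, using only the five sample rows from the preview above. The variable names `sample_vocab`, `compact`, and `default` are illustrative and not part of the notebook.

```python
import json

# Five sample rows taken from the words.head() preview above.
sample_vocab = {
    "配達": "はいたつ",
    "泊める": "とめる",
    "行き": "いき",
    "乗せる": "のせる",
    "国籍": "こくせき",
}

# ensure_ascii=False keeps kanji and kana readable instead of \uXXXX escapes;
# separators=(",", ":") drops the default space after each comma and colon.
compact = json.dumps(sample_vocab, ensure_ascii=False, separators=(",", ":"))
default = json.dumps(sample_vocab)

print(compact)
print(len(compact), "characters vs", len(default), "with the defaults")
```

The compact form is shorter in characters, which is presumably why the notebook builds it before handing vocabulary to a model; the actual token savings depend on the tokenizer.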


#### Load Models

#### Exam Paper Outline
### A. Thinking through the overall structure of an exam
1. Distribution of difficulty
2. Topics
3. Reasoning

In [27]:
from typing import List, Optional

from langchain_core.prompts import ChatPromptTemplate

from pydantic import BaseModel, Field

instruction = """
Section 1: Vocabulary and Grammar
Randomly pick some of the words from the `Vocabulary`.
- Kanji reading: 8 questions
- Write Chinese characters (choose Chinese characters): 6 questions
- Word Meaning Selection (Vocabulary Understanding): 11 questions
- Synonyms substitution: 5 questions
- word usage: 5 questions
- Grammar fill in the blank (single choice): 13 questions
- Sentence sorting: 5 questions
- Grammar structure selection (cloze test): 4-5 questions

Section 2: Reading Comprehension
Randomly pick some of the topics from the topic list that the user provided.
- Short passages: 4 articles, 1 question per article
- Mid-size passages: 2 articles, 3 questions per article
- Long passages: 1 article, 4 questions per article
- Information retrieval: 1 article, 2 questions per article

Section 3: Listening Comprehension
Randomly pick some of the topics from the topic list that the user provided.
- Topic understanding: 6 questions
- Key understanding: 6 questions
- Summary understanding: 3 questions
- Speak up (actively express): 4 questions
- Immediate acknowledgment: 9 questions
"""

direct_gen_outline_prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            # Adjacent string literals are concatenated as-is, so each sentence ends with a space.
            # `instruction` and `words` are filled in by the f-string on the last line, before
            # LangChain parses the template; only {topic} remains a template variable.
            # Note: `words` is a DataFrame, so this inserts its printed repr, which pandas
            # truncates to the first and last rows for long frames.
            "You are a Japanese teacher. Your job is to write an outline for a JLPT N3 level exam paper. The JLPT exam paper includes a mix of easy, moderate, and difficult questions to accurately assess the test-taker's proficiency across different aspects of the language. "
            "First, you should abide by the following exam instructions and decide the content and number of questions in each subsection. "
            "Second, the vocabulary should be restricted to N3 level; you can refer to the vocabulary in the word list. "
            "Finally, write the outline of the examination paper and provide question topics according to the instructions. "
            f"Instruction: {instruction}, Vocabulary: {words}",
        ),
        ("user", "topic list: {topic}"),
    ]
)
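
As a sanity check on the instruction text, the question counts per section can be totalled with a small parser. This is a sketch under stated assumptions: `count_questions` and `section_totals` are hypothetical helpers, not part of the notebook, and the bullets below are adapted from the `instruction` string (introductory sentences omitted).

```python
import re
from typing import Dict, Tuple

instruction_text = """
Section 1: Vocabulary and Grammar
- Kanji reading: 8 questions
- Write Chinese characters (choose Chinese characters): 6 questions
- Word Meaning Selection (Vocabulary Understanding): 11 questions
- Synonyms substitution: 5 questions
- word usage: 5 questions
- Grammar fill in the blank (single choice): 13 questions
- Sentence sorting: 5 questions
- Grammar structure selection (cloze test): 4-5 questions

Section 2: Reading Comprehension
- Short passages: 4 articles, 1 question per article
- Mid-size passages: 2 articles, 3 questions per article
- Long passages: 1 article, 4 questions per article
- Information retrieval: 1 article, 2 questions per article

Section 3: Listening Comprehension
- Topic understanding: 6 questions
- Key understanding: 6 questions
- Summary understanding: 3 questions
- Speak up (actively express): 4 questions
- Immediate acknowledgment: 9 questions
"""


def count_questions(line: str) -> Tuple[int, int]:
    """Return the (min, max) number of questions described by one bullet."""
    # "N articles, M question(s) per article" means N * M questions.
    per_article = re.search(r"(\d+) articles?, (\d+) questions? per article", line)
    if per_article:
        total = int(per_article.group(1)) * int(per_article.group(2))
        return total, total
    # "A-B questions" is a range; "N questions" is a fixed count.
    flat = re.search(r"(\d+)(?:-(\d+))? questions?", line)
    if flat:
        low = int(flat.group(1))
        high = int(flat.group(2) or low)
        return low, high
    return 0, 0


def section_totals(text: str) -> Dict[str, Tuple[int, int]]:
    """Sum the (min, max) question counts under each 'Section ...' header."""
    totals: Dict[str, Tuple[int, int]] = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Section"):
            current = line
            totals[current] = (0, 0)
        elif line.startswith("-") and current is not None:
            low, high = count_questions(line)
            totals[current] = (totals[current][0] + low, totals[current][1] + high)
    return totals


totals = section_totals(instruction_text)
for name, (low, high) in totals.items():
    print(name, low if low == high else f"{low}-{high}")
```

With these bullets the totals come to 57-58 questions for Section 1, 16 for Section 2, and 28 for Section 3, which gives a reference figure to compare a generated outline against.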


## Data Structure

In [28]:
class QuestionTopic(BaseModel):
    question: str = Field(..., title="keyword of the question")
    
    
class Subsection(BaseModel):
    subsection_title: str = Field(..., title="Topic of the subsection")
    description: str = Field(..., title="giving the number of questions")
    question_topics: Optional[List[QuestionTopic]] = Field(
        default_factory=list,
        title="a number of questions according to the section requirements",
    )
    
    @property
    def as_str(self) -> str:
        # question_topics is Optional, so fall back to an empty list when it is None
        question_topics_str = "\n".join(
            f"- **{qt.question}**" for qt in self.question_topics or []
        )
        return f"### {self.subsection_title}\n\n{self.description}\n\n{question_topics_str}".strip()

class Section(BaseModel):
    section_title: str = Field(..., title="Title of the section")
    # description: str = Field(..., title="Ideas of this section")
    subsections: Optional[List[Subsection]] = Field(
        default_factory=list,
        title="Titles and reason for each subsection of the JLPT exam page.",
    )

    @property
    def as_str(self) -> str:
        subsections = "\n\n".join(
            subsection.as_str for subsection in self.subsections or []
        )
        return f"## {self.section_title}\n\n{subsections}".strip()


class Outline(BaseModel):
    page_title: str = Field(..., title="Title of the JLPT exam page")
    sections: List[Section] = Field(
        default_factory=list,
        title="Titles and descriptions for each section of the JLPT exam paper.",
    )

    @property
    def as_str(self) -> str:
        sections = "\n\n".join(section.as_str for section in self.sections)
        return f"# {self.page_title}\n\n{sections}".strip()


In [29]:
example_topic = """
Section 1 - Vocabulary and Grammar: 
店で価格を尋ねる | 購入したい商品の説明 | 割引交渉 | レストランで食べ物を注文する | 食事の好みについて話す | 料理を褒める | 道を尋ねる | 交通手段について話す | 交通状況について話す | タクシーを予約する | 電車の切符を買う | バスの時刻表を尋ねる | 通勤について説明する | 天気の状況について話す | 週末の予定について話す | おすすめを尋ねる | ショッピング体験を説明する | 支払い方法について話す | 領収書を求める | お気に入りのレストランについて話す | 趣味について話す | 仕事のプロジェクトについて話す | 家族について話す | 旅行の計画について話す | 最近の映画について話す | 本について話す | スポーツについて話す | 健康とフィットネスについて話す | 技術について話す | 時事問題について話す | 音楽について話す | 芸術と文化について話す | 教育について話す | キャリア目標について話す | 個人的な成果について話す | 課題と解決策について話す | 将来の抱負について話す | お気に入りのテレビ番組について話す | ペットについて話す | ガーデニングについて話す | 家の改善について話す | ファッションとスタイルについて話す | 環境問題について話す | ボランティア活動について話す | 地域のイベントについて話す
"""

generate_outline_direct = direct_gen_outline_prompt | azure_llm.with_structured_output(Outline)
initial_outline = generate_outline_direct.invoke({"topic": example_topic})
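
The prompt leaves the random choice of topics to the model. If that selection should instead happen in Python before the call, one possible approach is sketched below. `parse_topics` and `sample_topics` are hypothetical helpers, and the short `example` string is a cut-down stand-in for `example_topic`.

```python
import random
from typing import List, Optional


def parse_topics(topic_text: str) -> List[str]:
    """Split a 'header: a | b | c' block into a clean list of topics."""
    # Drop the leading "Section ...:" label when one is present.
    _header, sep, body = topic_text.partition(":")
    if not sep:
        body = topic_text
    return [t.strip() for t in body.split("|") if t.strip()]


def sample_topics(topics: List[str], k: int, seed: Optional[int] = None) -> List[str]:
    """Pick k distinct topics; pass a seed to make the choice reproducible."""
    rng = random.Random(seed)
    return rng.sample(topics, k)


example = """
Section 1 - Vocabulary and Grammar: 
店で価格を尋ねる | 購入したい商品の説明 | 割引交渉 | レストランで食べ物を注文する
"""

topics = parse_topics(example)
print(topics)
print(sample_topics(topics, 2, seed=0))
```

The sampled list could then be joined with ` | ` and passed as the `topic` value, so the model only sees the topics that were already chosen.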

In [30]:
from IPython.display import display, Markdown
display(Markdown(initial_outline.as_str))

# JLPT N3 Level Exam Paper

## Vocabulary and Grammar

### Kanji Reading

8 questions on reading Kanji from the given vocabulary list.

- **Kanji readings for 配達, 泊める, 行き, 乗せる, 国籍, 招く, 乾燥, そのまま**

### Write Chinese Characters

6 questions on writing Chinese characters based on given readings.

- **Write Kanji for はいたつ, とめる, いき, のせる, こくせき, まねく**

### Word Meaning Selection

11 questions on selecting the correct meaning of words.

- **Choose the meaning for 配達, 泊める, 行き, 乗せる, 国籍, 招く, 乾燥, そのまま, 大部分, 諦める**

### Synonyms Substitution

5 questions on substituting words with synonyms.

- **Find synonyms for 配達, 泊める, 行き, 乗せる, 国籍**

### Word Usage

5 questions on using words in sentences.

- **Use 配達, 泊める, 行き, 乗せる, 国籍 in sentences.**

### Grammar Fill in the Blank

13 questions on filling in blanks with correct grammar.

- **Fill in the blanks with appropriate grammar structures.**

### Sentence Sorting

5 questions on sorting sentences into correct order.

- **Sort sentences related to 店で価格を尋ねる, 購入したい商品の説明, 割引交渉, レストランで食べ物を注文する, 食事の好みについて話す**

### Grammar Structure Selection

4-5 questions on selecting appropriate grammar structures (cloze test).

- **Select correct grammar structures for sentences related to 趣味について話す, 仕事のプロジェクトについて話す, 家族について話す, 旅行の計画について話す**

## Reading Comprehension

### Short Passages

4 articles with 1 question per article.

- **Short passages about 店で価格を尋ねる, 購入したい商品の説明, 割引交渉, レストランで食べ物を注文する**

### Mid-size Passages

2 articles with 3 questions per article.

- **Mid-size passages on 趣味について話す, 仕事のプロジェクトについて話す**

### Long Passages

1 article with 4 questions.

- **Long passage about 旅行の計画について話す**

### Information Retrieval

1 article with 2 questions.

- **Information retrieval from a passage about 家族について話す**

## Listening Comprehension

### Topic Understanding

6 questions on understanding topics.

- **Listening topics: 技術について話す, 時事問題について話す, 音楽について話す, 芸術と文化について話す, 教育について話す, キャリア目標について話す**

### Key Understanding

6 questions on understanding key points.

- **Key points from ペットについて話す, ガーデニングについて話す, 家の改善について話す, ファッションとスタイルについて話す, 環境問題について話す, ボランティア活動について話す**

### Summary Understanding

3 questions on summarizing information.

- **Summaries of 地域のイベントについて話す, 将来の抱負について話す, 課題と解決策について話す**

### Speak Up

4 questions on active expression.

- **Speak about お気に入りのテレビ番組について話す, 最近の映画について話す, 本について話す, スポーツについて話す**

### Immediate Acknowledgment

9 questions on immediate response to listening material.

- **Immediate responses for 天気の状況について話す, 週末の予定について話す, おすすめを尋ねる, ショッピング体験を説明する, 支払い方法について話す, 領収書を求める, お気に入りのレストランについて話す**
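
In the rendered outline, each subsection carries a single bullet even where its description asks for several questions. A check along the following lines could flag that before the outline is used downstream. This is only a sketch: `expected_count` and `find_count_mismatches` are hypothetical helpers, and the dict is assumed to have the shape that `Outline.model_dump()` returns under pydantic v2.

```python
import re
from typing import List, Optional


def expected_count(description: str) -> Optional[int]:
    """Return N from a flat 'N questions' description, or None if unclear."""
    # Per-article descriptions ("2 articles with 3 questions per article")
    # need a multiplication that this sketch deliberately does not attempt.
    if "article" in description:
        return None
    match = re.search(r"(\d+)(?:-\d+)? questions", description)
    return int(match.group(1)) if match else None


def find_count_mismatches(outline: dict) -> List[str]:
    """List subsections whose number of question topics differs from the description."""
    problems = []
    for section in outline.get("sections", []):
        for sub in section.get("subsections") or []:
            wanted = expected_count(sub["description"])
            got = len(sub.get("question_topics") or [])
            if wanted is not None and got != wanted:
                problems.append(
                    f"{sub['subsection_title']}: expected {wanted}, got {got}"
                )
    return problems


outline = {
    "page_title": "JLPT N3 Level Exam Paper",
    "sections": [
        {
            "section_title": "Vocabulary and Grammar",
            "subsections": [
                {
                    # Copied from the rendered outline: one bullet for 8 questions.
                    "subsection_title": "Kanji Reading",
                    "description": "8 questions on reading Kanji from the given vocabulary list.",
                    "question_topics": [
                        {"question": "Kanji readings for 配達, 泊める, 行き, 乗せる, 国籍, 招く, 乾燥, そのまま"}
                    ],
                },
                {
                    # Hypothetical reshaping: the same five words, one topic each.
                    "subsection_title": "Synonyms Substitution",
                    "description": "5 questions on substituting words with synonyms.",
                    "question_topics": [
                        {"question": w} for w in ["配達", "泊める", "行き", "乗せる", "国籍"]
                    ],
                },
            ],
        }
    ],
}

print(find_count_mismatches(outline))
```

Only the first subsection is reported, since the reshaped second one has five topics for five questions. For ranged descriptions such as "4-5 questions" the sketch compares against the lower bound.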