# Language Knowledge (Vocabulary)
Duration: 30 minutes
Content: This section tests your knowledge of Japanese vocabulary, including kanji readings, orthography, word formation, contextually defined expressions, paraphrases, and usage.
It consists mainly of the following five categories:
- `Reading Kana` (Pronunciation Questions): Given a kanji word, choose the correct kana reading.
- `Writing Kanji` (Writing Questions): Given a word written in kana, choose the correct kanji representation.
- `Word Meaning Selection` (Vocabulary Understanding): Choose the most suitable of four options to fill the blank in a sentence.
- `Synonym Replacement`: Select the word that has the same or a similar meaning as the underlined word.
- `Vocabulary Usage`: Choose the most appropriate usage of a word in an actual context, including common Japanese expressions and fixed phrases.
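
The five categories can be kept in a small lookup table so that a prompt can be parameterized by question type. This is only an illustrative sketch: `CATEGORY_TASKS` and `task_line` are hypothetical names that do not appear in the notebook, and the task descriptions are paraphrased from the list above.

```python
# Hypothetical lookup table: category name -> task description (paraphrased from the list above).
CATEGORY_TASKS = {
    "Reading Kana": "Given a kanji word, choose the correct kana reading.",
    "Writing Kanji": "Given a word written in kana, choose the correct kanji representation.",
    "Word Meaning Selection": "Choose the most suitable of four options to fill the blank in a sentence.",
    "Synonym Replacement": "Select the word with the same or a similar meaning as the underlined word.",
    "Vocabulary Usage": "Choose the most appropriate usage of a word in context.",
}


def task_line(category: str) -> str:
    """Return a one-line task description for a prompt; raise KeyError for unknown categories."""
    if category not in CATEGORY_TASKS:
        raise KeyError(f"Unknown category: {category!r}")
    return f"Question type: {category}. Task: {CATEGORY_TASKS[category]}"


print(task_line("Synonym Replacement"))
```

A line produced this way could be prepended to a generation prompt, so that one node function serves several question types instead of hard-coding a single one.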

In [1]:
# Standard library
import json
import os
import pickle
import re
import uuid
from typing import Annotated, List, Literal, Optional, Sequence

# Third-party utilities
import pandas as pd
from dotenv import load_dotenv
from IPython.display import display, Markdown, Latex
from pydantic import BaseModel, Field, validator
from typing_extensions import TypedDict

# LangChain: models, prompts, parsers
from langchain import hub
from langchain.output_parsers import PydanticOutputParser
from langchain_anthropic import ChatAnthropic
from langchain_aws import ChatBedrock
from langchain_core.messages import BaseMessage, RemoveMessage, HumanMessage, AIMessage, ToolMessage
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate, PromptTemplate
from langchain_openai import AzureOpenAI, AzureChatOpenAI, ChatOpenAI, OpenAIEmbeddings

# LangChain: documents, embeddings, vector stores
from langchain.embeddings.base import Embeddings
from langchain.text_splitter import CharacterTextSplitter
from langchain_community.document_loaders import PyPDFLoader
from langchain_community.embeddings import HuggingFaceEmbeddings
# from langchain_community.embeddings import XinferenceEmbeddings
from langchain_community.vectorstores import Chroma

# LangGraph
from langgraph.checkpoint.memory import MemorySaver
from langgraph.graph import END, StateGraph, START
from langgraph.graph.message import add_messages
from langgraph.prebuilt import ToolNode, tools_condition

load_dotenv()

True

In [2]:
# Load the N3 vocabulary list
file_path = '../Vocab/n3.csv'
# Read the CSV file
data = pd.read_csv(file_path)
# Keep the first two columns (expression, reading) and shuffle the rows
words = data.iloc[:, :2].sample(frac=1).reset_index(drop=True)
# Preview the first five rows
words.head()

Unnamed: 0,expression,reading
0,順調,じゅんちょう
1,末,すえ
2,見送り,みおくり
3,粋,いき
4,全身,ぜんしん
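
Interpolating a DataFrame into an f-string inserts its printed form, which pandas abbreviates for long frames. One way to control exactly which words reach a prompt is to sample a fixed number of rows and render them explicitly. The sketch below works on plain `(expression, reading)` tuples so that it runs without pandas; `format_word_list`, `k`, and `seed` are hypothetical names introduced only for illustration.

```python
import random
from typing import Iterable, Optional, Tuple


def format_word_list(pairs: Iterable[Tuple[str, str]], k: int = 50,
                     seed: Optional[int] = None) -> str:
    """Sample up to k (expression, reading) pairs and render one pair per line."""
    pool = list(pairs)
    rng = random.Random(seed)  # a fixed seed makes the sample reproducible
    chosen = rng.sample(pool, min(k, len(pool)))
    return "\n".join(f"{expression}（{reading}）" for expression, reading in chosen)


# The five rows previewed above, used here as sample input.
sample_pairs = [("順調", "じゅんちょう"), ("末", "すえ"), ("見送り", "みおくり"),
                ("粋", "いき"), ("全身", "ぜんしん")]
text = format_word_list(sample_pairs, k=3, seed=0)
print(text)
```

With the notebook's DataFrame, `format_word_list(words.itertuples(index=False, name=None))` should produce the same kind of text, since `itertuples(index=False, name=None)` yields plain tuples of the row values.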


#### Load Models

In [3]:
# azure_llm = AzureChatOpenAI(
#     azure_endpoint="https://tooldev-openai.openai.azure.com/openai/deployments/gpt-4o/chat/completions?api-version=2025-01-01-preview",
#     api_key=os.environ["AZURE_API_KEY"],
#     model_name="gpt-4o",
#     api_version="2025-01-01-preview",
#     temperature=0.5,
# )

In [4]:
aws_llm = ChatBedrock(
    model_id="us.anthropic.claude-3-7-sonnet-20250219-v1:0",
    # model_id="us.anthropic.claude-3-5-haiku-20241022-v1:0",
    model_kwargs=dict(temperature=0.5),
    region="us-east-2",
    aws_access_key_id=os.environ["AWS_ACCESS_KEY_ID"],
    aws_secret_access_key=os.environ["AWS_SECRET_ACCESS_KEY"],
)

## Question Exploration

# Kanji: Reading Kana (Pronunciation Questions)

In [5]:
from typing_extensions import TypedDict
from langgraph.graph import StateGraph, START, END
from IPython.display import Image, display


# Graph state
class QuestionState(TypedDict):
    topic: str
    question: str
    improved_question: str
    final_question: str


kanji_example = """
26. さん、避難してください。
	1.	ならんで
	2.	入って
	3.	にげて
	4.	急いで

27. 来週、ここで企業の説明会があります。
	1.	旅行
	2.	会社
	3.	大学
	4.	建物

28. ちょっとバックしてください。
	1.	前に進んで
	2.	後ろに下がって
	3.	横に動いて
	4.	そこで止まって

29. このやり方がベストだ。
	1.	最もよい
	2.	最もよくない
	3.	最も難しい
	4.	最も難しくない

30. 田中さんがようやく来てくれました。
	1.	笑に
	2.	すぐに
	3.	やっと
	4.	初めて
"""

print(words)

# Nodes
def generate_question(state: QuestionState):
    """First LLM call to generate the initial set of questions."""

    # Each instruction ends with "\n" so that adjacent sentences do not run together.
    # Note: {words} inserts the DataFrame's printed form, which pandas abbreviates for
    # long frames under its default display options (see the print(words) output).
    msg = aws_llm.invoke("""You are a Japanese teacher. Your job is to write 5 synonym questions for candidates to identify the most appropriate word with a similar meaning in a JLPT N3 level exam paper. The question format is as follows:

Each question presents a word in kanji or katakana within a sentence, and candidates must choose the closest synonym from four options. The options should include one correct synonym and three distractors that are plausible but incorrect. The JLPT exam paper includes a mix of easy, moderate, and difficult questions to accurately assess the test-taker’s proficiency across different aspects of the language.
"""
            "The vocabulary should be restricted to N3 level. You can refer to the vocabulary in the word list, choosing random words for the questions.\n"
            "Follow the question examples taken from the formal exam paper.\n"
            "After each question, append the correct answer and an explanation, in Chinese, of the main challenges and of why a teacher would ask candidates this question.\n"
            "Finally, format the output as clean Markdown.\n"
            f"topic list: {state['topic']}\n"
            f"word list: {words}\n"
            f"formal exam paper: {kanji_example}")

    return {"question": msg.content}


# Build workflow
kanji_workflow = StateGraph(QuestionState)

# Add nodes
kanji_workflow.add_node("generate_question", generate_question)

# Add edges to connect nodes
kanji_workflow.add_edge(START, "generate_question")
kanji_workflow.add_edge("generate_question", END)

# Compile
kanji_graph = kanji_workflow.compile()


     expression reading
0            順調  じゅんちょう
1             末      すえ
2           見送り    みおくり
3             粋      いき
4            全身    ぜんしん
...         ...     ...
2134          芽       め
2135         撒く      まく
2136       がっかり    がっかり
2137        寄せる     よせる
2138        世の中    よのなか

[2139 rows x 2 columns]
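
Once the graph is invoked, the returned Markdown can be checked mechanically before it is shown to learners, for example to confirm that each question has four options and that the stated answer matches one of them. The sketch below assumes a layout of `## 問題 N` headings, numbered options, and a `**正解: N. text**` answer line. That layout depends on what the model happens to produce, so the regular expressions are assumptions, and `check_questions` is a hypothetical helper rather than part of the notebook.

```python
import re
from typing import List

# Assumed, model-dependent layout: "## 問題 N", options "1. ...", answer "**正解: N. text**".
QUESTION_SPLIT = re.compile(r"^## 問題\s*\d+\s*$", re.MULTILINE)
OPTION = re.compile(r"^(\d)\.\s*(.+?)\s*$", re.MULTILINE)
ANSWER = re.compile(r"\*\*正解[:：]\s*(\d)\.\s*(.+?)\*\*")


def check_questions(markdown: str) -> List[str]:
    """Return human-readable problems; an empty list means every check passed."""
    problems = []
    blocks = QUESTION_SPLIT.split(markdown)[1:]  # drop any text before the first question
    for number, block in enumerate(blocks, start=1):
        answer = ANSWER.search(block)
        # Only text before the answer line is searched for options.
        body = block[:answer.start()] if answer else block
        options = dict(OPTION.findall(body))
        if len(options) != 4:
            problems.append(f"Question {number}: expected 4 options, found {len(options)}")
        if answer is None:
            problems.append(f"Question {number}: no answer line found")
        elif options.get(answer.group(1)) != answer.group(2).strip():
            problems.append(f"Question {number}: answer does not match option {answer.group(1)}")
    return problems


sample = """# JLPT N3 同義語問題

## 問題 1
彼は**順調に**回復していて、来週には退院できるでしょう。
1. ゆっくり
2. 早く
3. 順番に
4. 問題なく

**正解: 4. 問題なく**

## 問題 2
**末**の息子は大学生になりました。
1. 最初の
2. 一番小さい
3. 一番大きい
4. 真ん中の

**正解: 2. 一番小さい**
"""
print(check_questions(sample))
```

A check like this only validates structure. Whether a distractor is plausible or the keyed answer is a true synonym still needs a human reviewer or a second model pass.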


In [6]:
# Show workflow
#display(Image(kanji_graph.get_graph().draw_png()))

In [7]:
# Invoke
kanji = kanji_graph.invoke({"topic": "店で商品の特徴や価格を尋ねる | 電車やバスの乗り方を尋ねる | 今日の天気や気温について話す| 映画やドラマの感想を話す | 仕事のスケジュールや業務内容を話す"})
display(Markdown(kanji["question"]))

# JLPT N3 同義語問題 (Synonym Questions)

## 問題 1
彼は**順調に**回復していて、来週には退院できるでしょう。
1. ゆっくり
2. 早く
3. 順番に
4. 問題なく

**正解: 4. 問題なく**

*解析（中文）：「順調」意为"顺利、没有问题地进行"。这个词在描述恢复过程、工作进展等情况时很常用。考察学生是否理解「順調」表示事情进展顺利的含义，而不是速度快慢或按顺序进行。这是N3水平的常见商务和日常用语，特别在谈论工作进度或健康状况时经常使用。*

## 問題 2
**末**の息子は大学生になりました。
1. 最初の
2. 一番小さい
3. 一番大きい
4. 真ん中の

**正解: 2. 一番小さい**

*解析（中文）：「末」在这里指"最小的（孩子）"。这个词在家庭关系描述中很常见，但多义性可能会让学习者混淆。考察学生是否能理解「末っ子」（最小的孩子）的概念，这对于理解日本家庭称谓和社会结构很重要。此题检验学生对日本文化背景知识的掌握程度。*

## 問題 3
友達を駅まで**見送り**に行きました。
1. 迎えに
2. 案内に
3. 送別に
4. 探しに

**正解: 3. 送別に**

*解析（中文）：「見送り」意为"送别、目送"。这个词在日常生活场景中经常使用，特别是在车站或机场送别亲友时。考察学生是否能区分"迎接"和"送别"这两个相反的概念，以及是否理解日本人重视送别礼仪的文化背景。这个问题测试学生在交通和社交场景中的词汇应用能力。*

## 問題 4
この映画は**がっかり**しました。期待していたほど面白くなかったです。
1. 感動
2. 失望
3. 驚き
4. 興奮

**正解: 2. 失望**

*解析（中文）：「がっかり」表示"失望、沮丧"的情感。这个词在表达对电影、餐厅等体验的负面评价时常用。考察学生是否能准确理解和表达情感词汇，特别是在评价娱乐活动时。这个问题与"映画やドラマの感想を話す"（谈论电影或电视剧的感想）的话题直接相关，测试学生在社交场合表达个人意见的能力。*

## 問題 5
店員さんに商品の**特徴**について質問しました。
1. 価格
2. 大きさ
3. 性質
4. 色

**正解: 3. 性質**

*解析（中文）：「特徴」指"特点、特性"。在购物场景中询问商品特点是很常见的交流。考察学生是否能理解在店铺中询问商品信息的常用词汇，这与"店で商品の特徴や価格を尋ねる"（在商店询问商品特点和价格