# Language Knowledge (Vocabulary)
Duration: 30 minutes
Content: This section tests your knowledge of Japanese vocabulary, including kanji readings, orthography, word formation, contextually defined expressions, paraphrases, and usage.
It consists mainly of the following five categories:
- `Reading Kana` (Pronunciation Questions): Given a kanji word, choose the correct kana reading.
- `Writing Kanji` (Writing Questions): Given a word written in kana, choose the correct kanji representation.
- `Word Meaning Selection` (Vocabulary Understanding): Choose the most suitable word to fill in the sentence from four options.
- `Synonym Replacement`: Select a word that has the same or similar meaning as the underlined word.
- `Vocabulary Usage`: Choose the sentence that uses a given word most appropriately in context, including common Japanese expressions and fixed phrases.

In [1]:
import pandas as pd
import json
import random
import os
import pickle
import re
import uuid
from typing import *
from langchain_openai import AzureOpenAI,AzureChatOpenAI
from langchain_anthropic import ChatAnthropic
from langchain_core.prompts import ChatPromptTemplate
from dotenv import load_dotenv
from langchain_aws import ChatBedrock
from langchain.embeddings.base import Embeddings
from langchain.text_splitter import CharacterTextSplitter
from langchain_community.document_loaders import PyPDFLoader
from langchain_community.embeddings import HuggingFaceEmbeddings
# from langchain_community.embeddings import XinferenceEmbeddings
from langchain_community.vectorstores import Chroma
from langchain_openai import OpenAIEmbeddings
from typing import Annotated, Literal, Sequence
from typing_extensions import TypedDict
from libs.LLMs import *
from langchain import hub
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import PromptTemplate
from langchain_openai import ChatOpenAI
from typing import Annotated, Sequence
from typing_extensions import TypedDict
from langchain_core.messages import BaseMessage,RemoveMessage,HumanMessage,AIMessage,ToolMessage
from langgraph.graph.message import add_messages
from pydantic import BaseModel, Field
from langgraph.graph import END, StateGraph, START
from langgraph.prebuilt import ToolNode
from langgraph.prebuilt import tools_condition
from langgraph.checkpoint.memory import MemorySaver
from langchain.output_parsers import PydanticOutputParser
from pydantic import BaseModel, Field, validator
from typing import List, Optional

load_dotenv()

True

In [2]:
# Import N3 vocabulary
file_path = '../Vocab/n3.csv'
# Read the CSV file
data = pd.read_csv(file_path)
# Keep the first two columns (word, reading) and shuffle the rows
words = data.iloc[:, :2].sample(frac=1).reset_index(drop=True)
# Build a word -> reading mapping, then serialize it as a compact JSON string for the prompt
vocab_dict = words.set_index(words.columns[0])[words.columns[1]].to_dict()
vocab_dict = json.dumps(vocab_dict, ensure_ascii=False, separators=(',', ':'))
vocab_dict

'{"学":"がく","裸":"はだか","覚ます":"さます","予防":"よぼう","代理":"だいり","濠":"ほり","幸運":"こううん","核":"かく","そっくり":"そっくり","さて":"さて","バランス":"バランス","周囲":"しゅうい","推薦":"すいせん","工場":"こうば","批判":"ひはん","重視":"じゅうし","玉":"たま","暮れ":"くれ","様子":"ようす","等しい":"ひとしい","平ら":"たいら","広がる":"ひろがる","丘":"おか","エンジン":"エンジン","掴む":"つかむ","磁器":"じき","塵":"ごみ","記事":"きじ","費用":"ひよう","生理":"せいり","たびたび":"たびたび","図る":"はかる","主要":"しゅよう","デモ":"デモ","酌む":"くむ","破る":"やぶる","以来":"いらい","課程":"かてい","勤め":"つとめ","有無":"うむ","課題":"かだい","羽根":"はね","掲示":"けいじ","与える":"あたえる","賛成":"さんせい","深める":"ふかめる","含む":"ふくむ","アップ":"アップ","雇う":"やとう","家具":"かぐ","診察":"しんさつ","スキー":"スキー","ドレス":"ドレス","物語":"ものがたり","向く":"むく","おめでとう":"おめでとう","出版":"しゅっぱん","越える":"こえる","可愛らしい":"かわいらしい","活用":"かつよう","当然":"とうぜん","微笑む":"ほほえむ","未来":"みらい","無":"む","捲る":"めくる","級":"きゅう","単純":"たんじゅん","揃う":"そろう","詩":"し","黒板":"こくばん","溜める":"ためる","自身":"じしん","息":"いき","正":"せい","デート":"デート","創造":"そうぞう","女子":"じょし","渋滞":"じゅうたい","在る; 有る":"ある","有利":"ゆうり","時刻":"じこく","意地悪":"いじわる","単語":"たんご","影響":"えいきょう","記憶":"きおく","象":"ぞう","動詞":"どうし","飛ばす":"とばす"
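The cell above serializes the whole shuffled vocabulary into the prompt. If the prompt grows too large, one option is to send only a random subset. The following is a minimal sketch of that idea, not part of the original notebook: `sample_vocab_json`, its parameters, and the four sample pairs are hypothetical names chosen for illustration.

```python
import json
import random


def sample_vocab_json(pairs, k, seed=None):
    """Return a compact JSON object string with k randomly chosen (word, reading) pairs.

    Hypothetical helper: `pairs` is any iterable of (word, reading) tuples.
    """
    rng = random.Random(seed)          # local RNG so a seed gives repeatable samples
    unique = dict(pairs)               # a later duplicate word overwrites an earlier one
    items = sorted(unique.items())     # fixed order so the seed fully determines the sample
    chosen = rng.sample(items, min(k, len(items)))
    return json.dumps(dict(chosen), ensure_ascii=False, separators=(",", ":"))


# Illustrative data taken from the vocabulary output shown above.
pairs = [("予防", "よぼう"), ("代理", "だいり"), ("幸運", "こううん"), ("周囲", "しゅうい")]
subset = sample_vocab_json(pairs, k=2, seed=0)
print(subset)
```

Passing a seed makes the sample repeatable, which can help when comparing outlines generated from the same vocabulary subset.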

#### Load Models

#### Exam Paper Outline
### A. Overall thinking about the structure of an exam
1. Distribution of difficulty
2. Topics
3. Reasoning

In [3]:
from typing import List, Optional

from langchain_core.prompts import ChatPromptTemplate

from pydantic import BaseModel, Field

instruction = """
Section 1: Vocabulary and Grammar
- Kanji reading: 8 questions
- Kanji writing (choose the correct kanji): 6 questions
- Word meaning selection (vocabulary understanding): 11 questions
- Synonym replacement: 5 questions
- Word usage: 5 questions
- Grammar fill-in-the-blank (single choice): 13 questions
- Sentence ordering: 5 questions
- Grammar structure selection (cloze test): 4-5 questions

Section 2: Reading Comprehension
- Short passages: 4 articles
- Mid-size passages: 2 articles
- Long passages: 1 article
- Information retrieval: 1 article

Section 3: Listening Comprehension
- Topic understanding: 6 questions
- Key-point understanding: 6 questions
- Summary understanding: 3 questions
- Utterance expression (actively express): 4 questions
- Immediate response: 9 questions
"""

direct_gen_outline_prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            # Each fragment ends with a space so the concatenated sentences do not run together.
            "You are a Japanese teacher. Your job is to write an outline for a JLPT (Japanese-Language Proficiency Test) N3 level exam paper. The complexity should be restricted to N3 level and respect Japanese culture. The JLPT exam paper includes a mix of easy, moderate, and difficult questions to accurately assess the test-taker's proficiency across different aspects of the language. "
            "First, randomly pick words from 'Vocabulary' for the questions in the Vocabulary and Grammar section, and randomly choose topics from 'TopicList' for the questions in the Reading Comprehension and Listening Comprehension sections. Do not choose the same word or topic twice. "
            "Second, abide by the provided exam instruction when deciding the number of questions and the content of each section. "
            "Finally, write the outline of the examination paper in Japanese and provide question topics according to the instructions. "
            f"Instruction: {instruction}",
        ),
        ("user", "TopicList: {topic_list}, Vocabulary: {vocab_dict}"),
    ]
)
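The prompt asks the model to pick words at random without repeating any. An alternative is to do the assignment in code before calling the model, which guarantees there are no repeats. The sketch below illustrates that alternative under stated assumptions: `COUNTS` and `assign_words` are hypothetical names, the subsection labels are shortened for illustration, and only the question counts come from the instruction text.

```python
import random

# Question counts for the five vocabulary subsections, as listed in the instruction.
# The labels are shortened for illustration.
COUNTS = {
    "Kanji reading": 8,
    "Kanji writing": 6,
    "Word meaning selection": 11,
    "Synonym replacement": 5,
    "Word usage": 5,
}


def assign_words(words, counts, seed=None):
    """Split a random sample of distinct words across subsections (hypothetical helper)."""
    rng = random.Random(seed)
    unique_words = list(dict.fromkeys(words))   # drop duplicates, keep first-seen order
    needed = sum(counts.values())
    if len(unique_words) < needed:
        raise ValueError(f"need {needed} distinct words, got {len(unique_words)}")
    picked = rng.sample(unique_words, needed)   # sampling without replacement
    assignment, start = {}, 0
    for name, n in counts.items():
        assignment[name] = picked[start:start + n]
        start += n
    return assignment


# Placeholder words; in the notebook these would be the keys of the vocabulary mapping.
demo_words = [f"word{i}" for i in range(40)]
assignment = assign_words(demo_words, COUNTS, seed=42)
for name, chosen in assignment.items():
    print(name, len(chosen))
```

Because the sample is drawn without replacement and then sliced, no word can appear in two subsections.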


## Data Structure

In [4]:
class QuestionTopic(BaseModel):
    question: str = Field(..., title="a vocabulary or topic hint for a question")
    
    
class Subsection(BaseModel):
    subsection_title: str = Field(..., title="Topic of the subsection")
    description: str = Field(..., title="giving the number of questions and requirements")
    question_topics: Optional[List[QuestionTopic]] = Field(
        default_factory=list
    )
    
    @property
    def as_str(self) -> str:
        # question_topics is Optional, so guard against None before iterating
        question_topics_str = "\n".join(
            f"- **{qt.question}**" for qt in self.question_topics or []
        )
        return f"### {self.subsection_title}\n\n{self.description}\n\n{question_topics_str}".strip()

class Section(BaseModel):
    section_title: str = Field(..., title="Title of the section")
    # description: str = Field(..., title="Ideas of this section")
    subsections: Optional[List[Subsection]] = Field(
        default_factory=list,
        title="Titles and reason for each subsection of the JLPT exam page.",
    )

    @property
    def as_str(self) -> str:
        subsections = "\n\n".join(
            subsection.as_str for subsection in self.subsections or []
        )
        return f"## {self.section_title}\n\n{subsections}".strip()


class Outline(BaseModel):
    page_title: str = Field(..., title="Title of the JLPT exam page")
    sections: List[Section] = Field(
        default_factory=list,
        title="Titles and descriptions for each section of the JLPT exam paper.",
    )

    @property
    def as_str(self) -> str:
        sections = "\n\n".join(section.as_str for section in self.sections)
        return f"# {self.page_title}\n\n{sections}".strip()


In [5]:
# Read the topics from a file and return them in random order
def process_topics(file_path):
    try:
        # Read the file
        with open(file_path, 'r', encoding='utf-8') as file:
            topics = file.readlines()

        # Remove surrounding whitespace and drop empty lines
        topics = [topic.strip() for topic in topics if topic.strip()]

        # Shuffle the topics randomly
        random.shuffle(topics)
        return topics

    except FileNotFoundError:
        print("The file was not found. Please check the file path.")
    except Exception as e:
        print("An error occurred:", str(e))

    # Fall back to an empty list so callers always receive a list
    return []

In [6]:
topic_list = process_topics("../Vocab/topics.txt")
generate_outline_direct = direct_gen_outline_prompt | azure_llm.with_structured_output(Outline)
initial_outline = generate_outline_direct.invoke({"topic_list": topic_list, "vocab_dict": vocab_dict})

In [7]:
initial_outline

Outline(page_title='JLPT N3 試験問題構成', sections=[Section(section_title='語彙・文法', subsections=[Subsection(subsection_title='漢字読み取り', description='8問: 漢字の読みを選ぶ問題', question_topics=[QuestionTopic(question='学'), QuestionTopic(question='裸'), QuestionTopic(question='覚ます'), QuestionTopic(question='予防'), QuestionTopic(question='代理'), QuestionTopic(question='濠'), QuestionTopic(question='幸運'), QuestionTopic(question='核')]), Subsection(subsection_title='漢字書き取り', description='6問: 漢字を書き取る問題', question_topics=[QuestionTopic(question='そっくり'), QuestionTopic(question='さて'), QuestionTopic(question='バランス'), QuestionTopic(question='周囲'), QuestionTopic(question='推薦'), QuestionTopic(question='工場')]), Subsection(subsection_title='語彙意味選択', description='11問: 語彙の意味を選ぶ問題', question_topics=[QuestionTopic(question='批判'), QuestionTopic(question='重視'), QuestionTopic(question='玉'), QuestionTopic(question='暮れ'), QuestionTopic(question='様子'), QuestionTopic(question='等しい'), QuestionTopic(question='平ら'), QuestionTopic(quest

In [8]:
from IPython.display import display, Markdown, Latex
display(Markdown(initial_outline.as_str))

# JLPT N3 試験問題構成

## 語彙・文法

### 漢字読み取り

8問: 漢字の読みを選ぶ問題

- **学**
- **裸**
- **覚ます**
- **予防**
- **代理**
- **濠**
- **幸運**
- **核**

### 漢字書き取り

6問: 漢字を書き取る問題

- **そっくり**
- **さて**
- **バランス**
- **周囲**
- **推薦**
- **工場**

### 語彙意味選択

11問: 語彙の意味を選ぶ問題

- **批判**
- **重視**
- **玉**
- **暮れ**
- **様子**
- **等しい**
- **平ら**
- **広がる**
- **丘**
- **エンジン**
- **掴む**

### 類義語置き換え

5問: 類義語を選ぶ問題

- **磁器**
- **塵**
- **記事**
- **費用**
- **生理**

### 語彙使用

5問: 語彙を使用する問題

- **たびたび**
- **図る**
- **主要**
- **デモ**
- **酌む**

### 文法穴埋め

13問: 文法の穴埋め問題

- **破る**
- **以来**
- **課程**
- **勤め**
- **有無**
- **課題**
- **羽根**
- **掲示**
- **与える**
- **賛成**
- **深める**
- **含む**
- **アップ**

### 文の並び替え

5問: 文の並び替え問題

- **雇う**
- **家具**
- **診察**
- **スキー**
- **ドレス**

### 文法構造選択

4-5問: 文法構造を選ぶ問題

- **物語**
- **向く**
- **おめでとう**
- **出版**
- **越える**

## 読解

### 短文読解

4記事: 短文を読んで質問に答える

- **日常生活**
- **買い物**
- **旅行**
- **健康**

### 中文読解

2記事: 中文を読んで質問に答える

- **仕事**
- **教育**

### 長文読解

1記事: 長文を読んで質問に答える

- **環境問題**

### 情報検索

1記事: 情報を検索する問題

- **イベント案内**

## 聴解

### 話題理解

6問: 話題を理解する問題

- **ニュース**
- **会話**

### 要点理解

6問: 要点を理解する問題

- **講義**
- **説明**

### 概要理解

3問: 概要を理解する問題

- **旅行ガイド**

### 発表

4問: 発表を聞いて答える問題

- **意見交換**

### 即時応答

9問: 即時応答する問題

- **日常会話**
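
In the rendered outline above, some subsections list fewer topics than the question count stated in their description (for example, 話題理解 states 6問 but lists two topics). A quick check can flag these cases. The following is a minimal sketch, not part of the original notebook: `count_mismatches` is a hypothetical helper that works on a plain nested dict shaped like the `Outline` model. Assuming pydantic v2, `initial_outline.model_dump()` should produce such a dict.

```python
import re


def count_mismatches(outline):
    """Return (subsection_title, listed, low, high) for subsections whose number of
    listed topics falls outside the count stated at the start of the description.

    Hypothetical helper; `outline` is a nested dict shaped like the Outline model.
    """
    problems = []
    for section in outline.get("sections") or []:
        for sub in section.get("subsections") or []:
            # Descriptions start with a count such as "8問" or a range such as "4-5問".
            m = re.match(r"\s*(\d+)(?:\s*-\s*(\d+))?", sub.get("description", ""))
            if not m:
                continue  # no leading number, nothing to compare against
            low = int(m.group(1))
            high = int(m.group(2)) if m.group(2) else low
            listed = len(sub.get("question_topics") or [])
            if not low <= listed <= high:
                problems.append((sub["subsection_title"], listed, low, high))
    return problems


# Two subsections copied from the rendered outline above.
sample = {
    "sections": [
        {
            "section_title": "聴解",
            "subsections": [
                {
                    "subsection_title": "話題理解",
                    "description": "6問: 話題を理解する問題",
                    "question_topics": [{"question": "ニュース"}, {"question": "会話"}],
                },
                {
                    "subsection_title": "文法構造選択",
                    "description": "4-5問: 文法構造を選ぶ問題",
                    "question_topics": [
                        {"question": w}
                        for w in ["物語", "向く", "おめでとう", "出版", "越える"]
                    ],
                },
            ],
        }
    ]
}

print(count_mismatches(sample))  # → [('話題理解', 2, 6, 6)]
```

A non-empty result could be used to re-run the outline generation or to ask the model to fill in the missing topics.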