# Language Knowledge (Vocabulary)
Duration: 30 minutes
Content: This section tests your knowledge of Japanese vocabulary, including kanji readings, orthography, word formation, contextually defined expressions, paraphrases, and usage.
It consists mainly of the following five categories:
- `Reading Kana` (Pronunciation Questions): Given a kanji word, choose the correct kana reading.
- `Writing Kanji` (Writing Questions): Given a word written in kana, choose the correct kanji representation.
- `Word Meaning Selection` (Vocabulary Understanding): Choose the most suitable word to fill in the sentence from four options.
- `Synonym Replacement`: Select the word that has the same or a similar meaning to the underlined word.
- `Vocabulary Usage`: Assesses how words are used in real contexts: choose the most appropriate usage of a word, including common Japanese expressions and fixed phrases.

In [390]:
import pandas as pd
import json
import random
import os
import pickle
import re
import uuid
from typing import *
from langchain_openai import AzureOpenAI,AzureChatOpenAI
from langchain_anthropic import ChatAnthropic
from langchain_core.prompts import ChatPromptTemplate
from dotenv import load_dotenv
from langchain_aws import ChatBedrock
from langchain.embeddings.base import Embeddings
from langchain.text_splitter import CharacterTextSplitter
from langchain_community.document_loaders import PyPDFLoader
from langchain_community.embeddings import HuggingFaceEmbeddings
# from langchain_community.embeddings import XinferenceEmbeddings
from langchain_community.vectorstores import Chroma
from langchain_openai import OpenAIEmbeddings
from typing import Annotated, Literal, Sequence
from typing_extensions import TypedDict
from Libs.LLMs import *
from langchain import hub
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import PromptTemplate
from langchain_openai import ChatOpenAI
from langchain_core.messages import BaseMessage,RemoveMessage,HumanMessage,AIMessage,ToolMessage
from langgraph.graph.message import add_messages
from pydantic import BaseModel, Field
from langgraph.graph import END, StateGraph, START
from langgraph.prebuilt import ToolNode
from langgraph.prebuilt import tools_condition
from langgraph.checkpoint.memory import MemorySaver
from langchain.output_parsers import PydanticOutputParser
from pydantic import BaseModel, Field, validator
from typing import List, Optional

load_dotenv()

True

In [391]:
# Import the N3 vocabulary list
file_path = '../Vocab/n3.csv'
# Read the CSV file
data = pd.read_csv(file_path)
# Keep the first two columns (word, reading) and shuffle the rows
words = data.iloc[:, :2].sample(frac=1).reset_index(drop=True)
# Map each word to its reading, then serialize to compact JSON for the prompt
vocab_dict = words.set_index(words.columns[0])[words.columns[1]].to_dict()
vocab_dict = json.dumps(vocab_dict, ensure_ascii=False, separators=(',', ':'))
vocab_dict

'{"相変わらず":"あいかわらず","課程":"かてい","選挙":"せんきょ","健康":"けんこう","椀":"わん","下り":"くだり","～観":"かん","デザート":"デザート","暮らし":"くらし","届く":"とどく","今にも":"いまにも","動揺":"どうよう","無料":"むりょう","先日":"せんじつ","スタンド":"スタンド","演説":"えんぜつ","舌":"した","グラス":"グラス","ハンサム":"ハンサム","限界":"げんかい","いらいら":"いらいら","割る":"わる","署名":"しょめい","食糧":"しょくりょう","例外":"れいがい","とんでもない":"とんでもない","公正":"こうせい","末":"まつ","例え":"たとえ","ダウン":"ダウン","革":"かわ","針":"はり","筆":"ふで","審判":"しんぱん","独立":"どくりつ","外交":"がいこう","知恵":"ちえ","拭く":"ふく","高める":"たかめる","具体":"ぐたい","伝統":"でんとう","膝":"ひざ","さっぱり":"さっぱり","考え":"かんがえ","鋭い":"するどい","故郷":"こきょう","次々":"つぎつぎ","汚染":"おせん","助かる":"たすかる","ミルク":"ミルク","刷る":"する","布":"ぬの","好み":"このみ","校舎":"こうしゃ","わがまま":"わがまま","同時":"どうじ","協議":"きょうぎ","発行":"はっこう","生年月日":"せいねんがっぴ","金銭":"きんせん","馬":"うま","勇気":"ゆうき","症状":"しょうじょう","故人":"こじん","刺激":"しげき","会員":"かいいん","トン":"トン","缶":"かん","いつのまにか":"いつのまにか","演技":"えんぎ","協調":"きょうちょう","植物":"しょくぶつ","物質":"ぶっしつ","影響":"えいきょう","大家":"おおや","反省":"はんせい","負う":"おう","はさみ":"はさみ","程度":"ていど","常識":"じょうしき","風景":"ふうけい","エンジン":"エンジン","東洋":"とうよう","犯罪":"はんざい",
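The cell above serializes the whole shuffled vocabulary into one JSON string, which is then passed to the prompt in full. If prompt size becomes a concern, one option is to pass only a random subset. The sketch below is an assumption, not part of the original notebook: `sample_vocab`, `k`, and `seed` are hypothetical names, and the demo dictionary stands in for the real CSV data. Note that Section 1 of the exam instruction asks for 57 to 58 words in total, so `k` should be at least that large.

```python
import json
import random


def sample_vocab(vocab: dict, k: int, seed=None) -> str:
    """Return a compact JSON string holding at most k random entries of vocab."""
    rng = random.Random(seed)          # a local generator keeps the global state untouched
    k = min(k, len(vocab))             # never ask for more entries than exist
    chosen = rng.sample(list(vocab), k)
    subset = {word: vocab[word] for word in chosen}
    # Same serialization settings as the notebook cell above
    return json.dumps(subset, ensure_ascii=False, separators=(",", ":"))


# Hypothetical stand-in for the dictionary built from n3.csv
demo_vocab = {"健康": "けんこう", "選挙": "せんきょ", "無料": "むりょう", "舌": "した"}
compact = sample_vocab(demo_vocab, k=2, seed=0)
print(compact)
```

Passing a `seed` makes the selection repeatable, which helps when comparing outlines generated from the same word subset.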

#### Load Models

#### Exam Paper Outline
### A. Overall approach to structuring an exam
1. Distribution of difficulty
2. Topics
3. Reasoning

In [392]:
from typing import List, Optional

from langchain_core.prompts import ChatPromptTemplate

from pydantic import BaseModel, Field

instruction = """
Section 1: Vocabulary and Grammar
- Kanji reading: 8 questions
- Kanji writing (choose the correct kanji): 6 questions
- Word Meaning Selection (Vocabulary Understanding): 11 questions
- Synonym substitution: 5 questions
- Word usage: 5 questions
- Grammar fill in the blank (single choice): 13 questions
- Sentence sorting: 5 questions
- Grammar structure selection (cloze test): 4-5 questions

Section 2: Reading Comprehension
- Short passages: 4 articles
- Mid-size passages: 2 articles
- Long passages: 1 article
- Information retrieval: 1 article

Section 3: Listening Comprehension
- Topic understanding: 6 questions
- Key understanding: 6 questions
- Summary understanding: 3 questions
- Verbal expressions (choose what to say in a given situation): 4 questions
- Quick response: 9 questions
"""

direct_gen_outline_prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            # Adjacent string literals are concatenated, so each one ends with a space.
            "You are a Japanese teacher. Your job is to write an outline for a JLPT (Japanese-Language Proficiency Test) N3 level exam paper. The complexity should be restricted to N3 level and respect Japanese culture. The exam paper includes a mix of easy, moderate, and difficult questions to accurately assess the test-taker's proficiency across different aspects of the language. "
            "First, randomly pick words from 'Vocabulary' for the questions in the Vocabulary and Grammar section, and randomly choose topics from 'TopicList' for the Reading Comprehension and Listening Comprehension sections. Do not choose the same word or topic more than once. "
            "Second, follow the provided exam instruction when deciding the number of questions and the content of each section. "
            "Finally, write the outline of the examination paper in Japanese and provide question topics according to the instruction. "
            f"Instruction: {instruction}",
        ),
        ("user", "TopicList: {topic_list}, Vocabulary: {vocab_dict}"),
    ]
)
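The instruction string fixes how many items each subsection should contain, but nothing in the prompt enforces those numbers. As a hedged sketch, the counts can be parsed out of the instruction text so they are available for later checks. `parse_question_counts` is a hypothetical helper, and the sample below is a short excerpt of the instruction rather than the full string.

```python
import re


def parse_question_counts(instruction: str) -> dict:
    """Map each 'Section ...' line to {subsection label: (min_count, max_count)}."""
    counts = {}
    section = None
    for raw in instruction.splitlines():
        line = raw.strip()
        if line.startswith("Section"):
            section = line
            counts[section] = {}
        elif line.startswith("-") and section is not None:
            # Matches "- Label: 8 questions" as well as ranges such as "4-5 questions"
            m = re.match(r"-\s*(.+?):\s*(\d+)(?:-(\d+))?\s+\w+", line)
            if m:
                low = int(m.group(2))
                high = int(m.group(3) or low)
                counts[section][m.group(1)] = (low, high)
    return counts


sample_instruction = """
Section 1: Vocabulary and Grammar
- Kanji reading: 8 questions
- Grammar structure selection (cloze test): 4-5 questions

Section 2: Reading Comprehension
- Short passages: 4 articles
"""
counts = parse_question_counts(sample_instruction)
for section_name, items in counts.items():
    print(section_name, items)
```

A single count is stored as a `(low, high)` pair with equal bounds, so ranges and fixed counts can be compared the same way.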


## Data Structure

In [393]:
class QuestionTopic(BaseModel):
    question: str = Field(..., title="a vocabulary or topic hint for a question")
    
    
class Subsection(BaseModel):
    subsection_title: str = Field(..., title="Topic of the subsection")
    description: str = Field(..., title="giving the number of questions and requirements")
    question_topics: Optional[List[QuestionTopic]] = Field(
        default_factory=list
    )
    
    @property
    def as_str(self) -> str:
        # question_topics is Optional, so guard against None before iterating
        question_topics_str = "\n".join(
            f"- **{qt.question}**" for qt in self.question_topics or []
        )
        return f"### {self.subsection_title}\n\n{self.description}\n\n{question_topics_str}".strip()

class Section(BaseModel):
    section_title: str = Field(..., title="Title of the section")
    # description: str = Field(..., title="Ideas of this section")
    subsections: Optional[List[Subsection]] = Field(
        default_factory=list,
        title="Titles and reason for each subsection of the JLPT exam page.",
    )

    @property
    def as_str(self) -> str:
        subsections = "\n\n".join(
            subsection.as_str for subsection in self.subsections or []
        )
        return f"## {self.section_title}\n\n{subsections}".strip()


class Outline(BaseModel):
    page_title: str = Field(..., title="Title of the JLPT exam page")
    sections: List[Section] = Field(
        default_factory=list,
        title="Titles and descriptions for each section of the JLPT exam paper.",
    )

    @property
    def as_str(self) -> str:
        sections = "\n\n".join(section.as_str for section in self.sections)
        return f"# {self.page_title}\n\n{sections}".strip()
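These models describe the shape of the outline but do not check that a subsection holds the requested number of topics. The sketch below shows one possible check, written against a plain dict with the same field names as the models (for example, what `model_dump()` returns under Pydantic v2). `check_subsection_counts` and the `expected` mapping are hypothetical, and the demo data is made up for illustration.

```python
def check_subsection_counts(outline: dict, expected: dict) -> list:
    """Return a message for every subsection whose topic count is out of range.

    expected maps a subsection title to an inclusive (low, high) range.
    Subsections that are not listed in expected are skipped.
    """
    problems = []
    for section in outline.get("sections", []):
        for sub in section.get("subsections") or []:
            title = sub["subsection_title"]
            if title not in expected:
                continue
            low, high = expected[title]
            n = len(sub.get("question_topics") or [])
            if not low <= n <= high:
                problems.append(f"{title}: expected {low}-{high} topics, got {n}")
    return problems


# Made-up outline data using the same field names as the models above
demo_outline = {
    "page_title": "JLPT N3 Examination Paper",
    "sections": [
        {
            "section_title": "語彙と文法",
            "subsections": [
                {
                    "subsection_title": "漢字の読み",
                    "description": "8問。",
                    "question_topics": [{"question": "健康"}, {"question": "選挙"}],
                },
                {
                    "subsection_title": "文法構造選択",
                    "description": "4-5問。",
                    "question_topics": [{"question": w} for w in ["校舎", "わがまま", "同時", "協議"]],
                },
            ],
        }
    ],
}
problems = check_subsection_counts(
    demo_outline, {"漢字の読み": (8, 8), "文法構造選択": (4, 5)}
)
print(problems)
```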


In [394]:
# Read the topics from a file, shuffle them, and return the shuffled list
def process_topics(file_path):
    try:
        # Read the file
        with open(file_path, 'r', encoding='utf-8') as file:
            topics = file.readlines()

        # Remove any extra whitespace or newline characters
        topics = [topic.strip() for topic in topics if topic.strip()]

        # Shuffle the topics randomly
        random.shuffle(topics)
        return topics

    except FileNotFoundError:
        print("The file was not found. Please check the file path.")
    except Exception as e:
        print("An error occurred:", str(e))

    # Fall back to an empty list so callers always receive a list
    return []
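The next cell assigns the loader's result to `topic_list`, so the loader has to hand back a list. A minimal sketch of a variant that returns the shuffled topics and accepts an optional seed for repeatable runs is shown below. `load_topics` and `seed` are hypothetical names, and the temporary file only stands in for `../Vocab/topics.txt`.

```python
import random
import tempfile
from pathlib import Path


def load_topics(path, seed=None) -> list:
    """Read one topic per line, drop blank lines, and return them shuffled."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    topics = [line.strip() for line in lines if line.strip()]
    # A seeded local generator gives the same order on every run
    random.Random(seed).shuffle(topics)
    return topics


# Demo with a throwaway file instead of the real topics list
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "topics.txt"
    demo_path.write_text("買い物\n\n旅行\n仕事\n", encoding="utf-8")
    first = load_topics(demo_path, seed=42)
    second = load_topics(demo_path, seed=42)

print(first)
```

Leaving `seed` as `None` keeps the random behaviour of the original cell.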

In [395]:
topic_list = process_topics("../Vocab/topics.txt")
generate_outline_direct = direct_gen_outline_prompt | azure_llm.with_structured_output(Outline)
initial_outline = generate_outline_direct.invoke({"topic_list": topic_list, "vocab_dict": vocab_dict})

In [396]:
initial_outline

Outline(page_title='JLPT N3 Examination Paper', sections=[Section(section_title='語彙と文法', subsections=[Subsection(subsection_title='漢字の読み', description='8問。与えられた漢字の読み方を選択する問題。', question_topics=[QuestionTopic(question='相変わらず'), QuestionTopic(question='課程'), QuestionTopic(question='選挙'), QuestionTopic(question='健康'), QuestionTopic(question='椀'), QuestionTopic(question='下り'), QuestionTopic(question='～観'), QuestionTopic(question='デザート')]), Subsection(subsection_title='漢字を書く', description='6問。与えられた言葉の漢字表記を記入する問題。', question_topics=[QuestionTopic(question='暮らし'), QuestionTopic(question='届く'), QuestionTopic(question='今にも'), QuestionTopic(question='動揺'), QuestionTopic(question='無料'), QuestionTopic(question='先日')]), Subsection(subsection_title='語彙の意味選択', description='11問。与えられた単語の意味を選択する問題。', question_topics=[QuestionTopic(question='スタンド'), QuestionTopic(question='演説'), QuestionTopic(question='舌'), QuestionTopic(question='グラス'), QuestionTopic(question='ハンサム'), QuestionTopic(question='限界'), Quest

In [397]:
from IPython.display import display, Markdown, Latex
display(Markdown(initial_outline.as_str))

# JLPT N3 Examination Paper

## 語彙と文法

### 漢字の読み

8問。与えられた漢字の読み方を選択する問題。

- **相変わらず**
- **課程**
- **選挙**
- **健康**
- **椀**
- **下り**
- **～観**
- **デザート**

### 漢字を書く

6問。与えられた言葉の漢字表記を記入する問題。

- **暮らし**
- **届く**
- **今にも**
- **動揺**
- **無料**
- **先日**

### 語彙の意味選択

11問。与えられた単語の意味を選択する問題。

- **スタンド**
- **演説**
- **舌**
- **グラス**
- **ハンサム**
- **限界**
- **いらいら**
- **割る**
- **署名**
- **食糧**
- **例外**

### 類義語の置き換え

5問。与えられた単語を類義語に置き換える問題。

- **とんでもない**
- **公正**
- **末**
- **例え**
- **ダウン**

### 語の使い方

5問。与えられた語を適切に使う問題。

- **革**
- **針**
- **筆**
- **審判**
- **独立**

### 文法穴埋め

13問。文中の空欄に適切な文法を選択する問題。

- **外交**
- **知恵**
- **拭く**
- **高める**
- **具体**
- **伝統**
- **膝**
- **さっぱり**
- **考え**
- **鋭い**
- **故郷**
- **次々**
- **汚染**

### 文の並び替え

5問。文の語順を正しく並び替える問題。

- **助かる**
- **ミルク**
- **刷る**
- **布**
- **好み**

### 文法構造選択

4-5問。文中の適切な文法構造を選択する問題。

- **校舎**
- **わがまま**
- **同時**
- **協議**

## 読解

### 短文

4つの短い文章を読んで内容を理解する問題。

- **健康と生活**
- **日本の伝統**
- **自然環境**
- **交通と移動**

### 中文

2つの中程度の長さの文章を読んで内容を理解する問題。

- **教育と学習**
- **文化と交流**

### 長文

1つの長い文章を読んで内容を理解する問題。

- **経済と社会**

### 情報検索

1つの文章から指定された情報を検索する問題。

- **観光情報**

## 聴解

### 話題理解

6問。短い会話や話題を聞いて内容を理解する問題。

- **買い物**
- **食事**
- **趣味**
- **旅行**
- **仕事**
- **家族**

### 要点理解

6問。会話や説明を聞いて要点を理解する問題。

- **教育**
- **健康**
- **文化**
- **自然**
- **交通**
- **社会**

### 概要理解

3問。長めの会話や説明を聞いて概要を理解する問題。

- **経済**
- **環境**
- **生活**

### 発言

4問。会話の中で適切に発言する問題。

- **意見交換**
- **質問回答**
- **提案**
- **問題解決**

### 即時応答

9問。短い会話の中で即時に応答する問題。

- **挨拶**
- **感謝**
- **謝罪**
- **依頼**
- **確認**
- **説明**
- **提案**
- **応答**
- **意見**
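
In the rendered outline above, 健康 and 提案 each appear under two subsections, although the prompt asks the model not to choose the same word or topic twice. A small post-check can surface such repeats. The sketch below is an assumption about how that could be done: `find_repeated_topics` is a hypothetical helper that works on a plain dict with the same field names as the `Outline` model, and the demo data is a reduced excerpt.

```python
from collections import defaultdict


def find_repeated_topics(outline: dict) -> dict:
    """Return {topic: [subsection titles]} for topics used more than once."""
    seen = defaultdict(list)
    for section in outline.get("sections", []):
        for sub in section.get("subsections") or []:
            for qt in sub.get("question_topics") or []:
                seen[qt["question"]].append(sub["subsection_title"])
    return {topic: titles for topic, titles in seen.items() if len(titles) > 1}


# Reduced excerpt of the outline, written as a plain dict
demo_outline = {
    "page_title": "JLPT N3 Examination Paper",
    "sections": [
        {
            "section_title": "語彙と文法",
            "subsections": [
                {
                    "subsection_title": "漢字の読み",
                    "description": "",
                    "question_topics": [{"question": "健康"}, {"question": "選挙"}],
                }
            ],
        },
        {
            "section_title": "聴解",
            "subsections": [
                {
                    "subsection_title": "要点理解",
                    "description": "",
                    "question_topics": [{"question": "教育"}, {"question": "健康"}],
                }
            ],
        },
    ],
}
repeated = find_repeated_topics(demo_outline)
print(repeated)
```

If the result is non-empty, the outline could be regenerated or the repeated entries swapped for unused words before questions are written.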