# Language Knowledge (Vocabulary)
Duration: 30 minutes
Content: This section tests your knowledge of Japanese vocabulary, including kanji readings, orthography, word formation, contextually defined expressions, paraphrases, and usage.
It mainly comprises the following five categories:
- `Reading Kana` (Pronunciation Questions): Given a kanji word, choose the correct kana reading.
- `Writing Kanji` (Writing Questions): Given a word written in kana, choose the correct kanji representation.
- `Word Meaning` Selection (Vocabulary Understanding): Choose the most suitable word to fill in the sentence from four options.
- `Synonym Replacement`: Select a word that has the same or similar meaning as the underlined word.
- `Vocabulary Usage`: Assess the usage of words in actual contexts, choosing the most appropriate word usage, including some common Japanese expressions or fixed phrases.
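
As a rough illustration, the five categories could be modeled as an enum plus a small multiple-choice container. The names `VocabCategory` and `VocabQuestion` are hypothetical and are not part of this notebook's code; the enum values reuse the identifiers that appear in the exam instruction further down, and the four-option limit is an assumption based on the standard multiple-choice format. The distractor readings are made up for the example.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class VocabCategory(Enum):
    # Values reuse the identifiers from the exam instruction in this notebook.
    KANJI_READING = "kanji_reading"
    WRITE_KANJI = "write_chinese"
    WORD_MEANING = "word_meaning"
    SYNONYM_SUBSTITUTION = "synonym_substitution"
    WORD_USAGE = "word_usage"


@dataclass
class VocabQuestion:
    """Hypothetical container for one multiple-choice vocabulary question."""

    category: VocabCategory
    prompt: str
    options: List[str]
    answer_index: int

    def __post_init__(self) -> None:
        # Assumption: every question offers exactly four options.
        if len(self.options) != 4:
            raise ValueError("expected exactly four options")
        if not 0 <= self.answer_index < len(self.options):
            raise ValueError("answer_index out of range")

    @property
    def answer(self) -> str:
        return self.options[self.answer_index]


example = VocabQuestion(
    category=VocabCategory.KANJI_READING,
    prompt="郵便",
    options=["ゆうびん", "ゆうべん", "ようびん", "ゆびん"],
    answer_index=0,
)
print(example.category.value, example.answer)
```

Keeping the category as an enum whose value matches the handler method name means the same object can drive both display and dispatch.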

In [28]:
import pandas as pd
import json
import random
import os
import pickle
import re
import uuid
import threading
import asyncio
from typing import *
from tqdm import tqdm
import time
import yaml
import sys
from tqdm.asyncio import tqdm_asyncio
from graphs.common.utils import collect_vocabulary
from langchain_openai import AzureOpenAI,AzureChatOpenAI
from langchain_anthropic import ChatAnthropic
from langchain_core.prompts import ChatPromptTemplate
from dotenv import load_dotenv
from langchain_aws import ChatBedrock
from langchain.embeddings.base import Embeddings
from langchain.text_splitter import CharacterTextSplitter
from langchain_community.document_loaders import PyPDFLoader
from langchain_community.embeddings import HuggingFaceEmbeddings
# from langchain_community.embeddings import XinferenceEmbeddings
from langchain_community.vectorstores import Chroma
from langchain_openai import OpenAIEmbeddings
from typing import Annotated, Literal, Sequence
from typing_extensions import TypedDict
from libs.LLMs import *
from langchain import hub
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import PromptTemplate
from langchain_openai import ChatOpenAI
from langchain_core.messages import BaseMessage,RemoveMessage,HumanMessage,AIMessage,ToolMessage
from langgraph.graph.message import add_messages
from pydantic import BaseModel, Field
from langgraph.graph import END, StateGraph, START
from langgraph.prebuilt import ToolNode
from langgraph.prebuilt import tools_condition
from langgraph.checkpoint.memory import MemorySaver
from langchain.output_parsers import PydanticOutputParser
from pydantic import validator

if sys.platform.startswith("win"):
    asyncio.set_event_loop_policy(asyncio.WindowsSelectorEventLoopPolicy())

from ExamTaskHandler import ExamTaskHandler
load_dotenv()

True

In [29]:
# Import N3 vocabulary
file_path = '../../Vocab/n3.csv'
# Load the vocabulary entries from the CSV file
vocab_dict = collect_vocabulary(file_path)
with open("../../Vocab/topics.txt", "r", encoding="utf-8") as file:
    topics_list = [line.strip() for line in file]

#### Load Models

#### Exam Paper Outline
### A. Thinking through the overall structure of an exam
1. distribution of difficulty
2. topics
3. reasoning
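
To make item 1 more concrete, a target difficulty mix can be turned into per-level question counts. The sketch below uses the largest-remainder method; the 3:5:2 weighting and the function name `allocate_difficulty` are assumptions for illustration only, since the notebook does not specify a ratio.

```python
from typing import Dict


def allocate_difficulty(total: int, weights: Dict[str, int]) -> Dict[str, int]:
    """Split `total` questions across difficulty levels in proportion to `weights`."""
    weight_sum = sum(weights.values())
    exact = {level: total * w / weight_sum for level, w in weights.items()}
    # Start from the floor of each exact share.
    counts = {level: int(share) for level, share in exact.items()}
    leftover = total - sum(counts.values())
    # Hand the remaining questions to the levels with the largest fractional parts.
    by_remainder = sorted(
        weights, key=lambda level: exact[level] - counts[level], reverse=True
    )
    for level in by_remainder[:leftover]:
        counts[level] += 1
    return counts


# Hypothetical mix for a subsection with 8 questions.
mix = allocate_difficulty(8, {"easy": 3, "moderate": 5, "difficult": 2})
print(mix)
```

The counts always sum to the requested total, which keeps the allocation consistent with the per-subsection numbers in the exam instruction.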

In [30]:
from typing import List, Optional

from langchain_core.prompts import ChatPromptTemplate

from pydantic import BaseModel, Field

# instruction = """
# Section 1: Vocabulary and Grammar
# - Kanji reading (kanji_reading): 8 questions
# - Write Chinese characters (write_chinese): 6 questions
# - Word Meaning Selection (word_meaning): 11 questions
# - Synonyms substitution (synonym_substitution): 5 questions
# - word usage (word_usage): 5 questions
# - Grammar fill in the blank (sentence_grammar): 13 questions
# - Sentence sorting (sentence_sort): 5 questions
# - Grammar structure selection (sentence_structure): 4-5 questions
# 
# Section 2: Reading Comprehension
# - Short passages (short_passage_read): 4 articles
# - Mid-size passages (midsize_passage_read): 2 articles
# - Long passages (long_passage_read): 1 articles
# - Information retrieval (info_retrieval): 1 articles
# 
# Section 3: Listening Comprehension
# - Topic understanding (topic_understanding): 6 questions
# - Key understanding (keypoint_understanding): 6 questions
# - Summary understanding (summary_understanding): 3 questions
# - Active expression (active_expression): 4 questions
# - Immediate acknowledgment (immediate_ack): 9 questions
# """

instruction = """
Section 1: Vocabulary and Grammar
- Kanji reading (kanji_reading): 1 question
- Write Chinese characters (write_chinese): 1 question
- Word Meaning Selection (word_meaning): 1 question
- Synonyms substitution (synonym_substitution): 1 question
- word usage (word_usage): 1 question
- Grammar fill in the blank (sentence_grammar): 1 question
- Sentence sorting (sentence_sort): 1 question
- Grammar structure selection (sentence_structure): 1 question

Section 2: Reading Comprehension
- Short passages (short_passage_read): 1 article
- Mid-size passages (midsize_passage_read): 1 article
- Long passages (long_passage_read): 1 article
- Information retrieval (info_retrieval): 1 article

Section 3: Listening Comprehension
- Topic understanding (topic_understanding): 1 question
- Key understanding (keypoint_understanding): 1 question
- Summary understanding (summary_understanding): 1 question
- Active expression (active_expression): 1 question
- Immediate acknowledgment (immediate_ack): 1 question
"""


direct_gen_outline_prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            "You are a Japanese teacher. Your job is to write an outline for a JLPT (Japanese-Language Proficiency Test) N3 level exam paper. The complexity should be restricted to N3 level and the content should respect Japanese culture. The exam paper includes a mix of easy, moderate, and difficult questions to accurately assess the test-taker's proficiency across different aspects of the language. "
            "First, randomly pick words from 'Vocabulary' for the questions in the Vocabulary and Grammar section. For Sections 2 and 3, randomly choose topics from 'TopicList' for the Reading Comprehension and Listening Comprehension sections. Do not choose the same word or topic more than once. "
            "Second, abide by the provided exam instruction when deciding the number of questions and the content of each section. "
            "Finally, write the outline of the examination paper in Japanese and provide question topics according to the instruction. "
            f"Instruction: {instruction}",
        ),
        ("user", "TopicList: {topic_list}, Vocabulary: {vocab_dict}"),
    ]
)
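
The prompt asks the model to honor the question counts and to avoid repeating words or topics. A minimal sketch of how those two constraints could be prepared or checked in plain Python, outside the LLM call, is shown below. `parse_counts` and `pick_unique` are hypothetical helpers, `SAMPLE_INSTRUCTION` is a shortened stand-in for the `instruction` string, and the sample word list stands in for entries of `vocab_dict`, whose exact structure is not shown in this notebook.

```python
import random
import re
from typing import Dict, List, Sequence

# Shortened stand-in for the `instruction` string defined above.
SAMPLE_INSTRUCTION = """
Section 1: Vocabulary and Grammar
- Kanji reading (kanji_reading): 1 question
- Grammar structure selection (sentence_structure): 4-5 questions

Section 2: Reading Comprehension
- Short passages (short_passage_read): 4 articles
"""

# Matches lines such as "- Kanji reading (kanji_reading): 8 questions".
LINE_RE = re.compile(r"^-\s*.+?\((\w+)\):\s*(\d+)(?:-(\d+))?\s+\w+", re.MULTILINE)


def parse_counts(instruction: str) -> Dict[str, int]:
    """Map each subsection identifier to its question count (upper bound of a range)."""
    counts: Dict[str, int] = {}
    for name, low, high in LINE_RE.findall(instruction):
        counts[name] = int(high or low)
    return counts


def pick_unique(pool: Sequence[str], k: int, seed: int = 0) -> List[str]:
    """Draw k distinct items; raises ValueError if the pool is too small."""
    return random.Random(seed).sample(list(pool), k)


counts = parse_counts(SAMPLE_INSTRUCTION)
sample_words = ["郵便", "以前", "泳ぎ", "語学", "愛する", "語る"]
picked = pick_unique(sample_words, counts["short_passage_read"])
print(counts)
print(picked)
```

Sampling in code guarantees uniqueness, whereas the prompt alone can only request it; the parsed counts can also be compared against the generated outline as a sanity check.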


## Data Structure

In [31]:
class QuestionTopic(BaseModel):
    topic: str = Field(..., title="a vocabulary or topic hint for a question")
    
    
class Subsection(BaseModel):
    subsection_title: str = Field(..., title="subsection English word in () only from the instruction")
    description: str = Field(..., title="giving the number of questions and requirements")
    question_topics: Optional[List[QuestionTopic]] = Field(
        default_factory=list
    )
    
    @property
    def as_str(self) -> str:
        # question_topics is Optional, so fall back to an empty list when it is None.
        question_topics_str = "\n".join(
            f"- **{qt.topic}**" for qt in self.question_topics or []
        )
        return f"### {self.subsection_title}\n\n{self.description}\n\n{question_topics_str}".strip()

class Section(BaseModel):
    section_title: str = Field(..., title="Title of the section")
    subsections: Optional[List[Subsection]] = Field(
        default_factory=list,
        title="Titles and reason for each subsection of the JLPT exam page.",
    )

    @property
    def as_str(self) -> str:
        subsections = "\n\n".join(
            subsection.as_str for subsection in self.subsections or []
        )
        return f"## {self.section_title}\n\n{subsections}".strip()


class Outline(BaseModel):
    page_title: str = Field(..., title="Title of the JLPT exam page")
    sections: List[Section] = Field(
        default_factory=list,
        title="Titles and descriptions for each section of the JLPT exam paper.",
    )

    @property
    def as_str(self) -> str:
        sections = "\n\n".join(section.as_str for section in self.sections)
        return f"# {self.page_title}\n\n{sections}".strip()
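
To see what the nested `as_str` properties produce, the sketch below mirrors them on plain dictionaries shaped like the models above, using only the standard library. `render_outline` is a hypothetical helper, not part of the notebook, and it only approximates the models' behavior for empty fields.

```python
from typing import Any, Dict


def render_outline(outline: Dict[str, Any]) -> str:
    """Render a dict shaped like Outline/Section/Subsection as Markdown."""
    parts = [f"# {outline['page_title']}"]
    for section in outline.get("sections") or []:
        parts.append(f"## {section['section_title']}")
        for sub in section.get("subsections") or []:
            parts.append(f"### {sub['subsection_title']}")
            parts.append(sub["description"])
            topics = "\n".join(
                f"- **{qt['topic']}**" for qt in sub.get("question_topics") or []
            )
            if topics:
                parts.append(topics)
    # Blocks are separated by blank lines, as in the as_str properties.
    return "\n\n".join(parts)


sample_outline = {
    "page_title": "JLPT N3 レベル試験問題",
    "sections": [
        {
            "section_title": "語彙と文法",
            "subsections": [
                {
                    "subsection_title": "Kanji reading (kanji_reading)",
                    "description": "1問。与えられた漢字の読みを選択する。",
                    "question_topics": [{"topic": "郵便"}],
                }
            ],
        }
    ],
}
rendered = render_outline(sample_outline)
print(rendered)
```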


In [32]:
# Read the topics from a file, drop blank lines, and return them in random order
def process_topics(file_path):
    try:
        # Read the file
        with open(file_path, 'r', encoding='utf-8') as file:
            topics = file.readlines()

        # Remove any extra whitespace or newline characters
        topics = [topic.strip() for topic in topics if topic.strip()]

        # Shuffle the topics randomly (in place)
        random.shuffle(topics)
        return topics

    except FileNotFoundError:
        print("The file was not found. Please check the file path.")
    except Exception as e:
        print("An error occurred:", str(e))

    # Reached only when reading the file failed
    return []

In [33]:
# Preload all topics from the file
with open("../../Vocab/topics.txt", "r", encoding="utf-8") as file:
    topics_list = [line.strip() for line in file]

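# `azure_llm` is not defined in this notebook; it is assumed to come from `from libs.LLMs import *`.
generate_outline_direct = direct_gen_outline_prompt | azure_llm.with_structured_output(Outline)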
initial_outline = generate_outline_direct.invoke({"topic_list": topics_list, "vocab_dict": vocab_dict})

In [34]:
from IPython.display import display, Markdown, Latex
display(Markdown(initial_outline.as_str))

# JLPT N3 レベル試験問題

## 語彙と文法

### Kanji reading (kanji_reading)

1問。与えられた漢字の読みを選択する。

- **郵便**

### Write Chinese characters (write_chinese)

1問。与えられた言葉の漢字表記を書く。

- **以前**

### Word Meaning Selection (word_meaning)

1問。与えられた言葉の意味を選択する。

- **泳ぎ**

### Synonyms substitution (synonym_substitution)

1問。与えられた言葉と同じ意味の言葉を選択する。

- **語学**

### word usage (word_usage)

1問。与えられた言葉を正しい文脈で使用する。

- **愛する**

### Grammar fill in the blank (sentence_grammar)

1問。与えられた文の空欄を正しい文法で埋める。

- **語る**

### Sentence sorting (sentence_sort)

1問。与えられた文の順序を正しい形に並べる。

- **率**

### Grammar structure selection (sentence_structure)

1問。与えられた文に適切な文法構造を選択する。

- **拭く**

## 読解

### Short passages (short_passage_read)

1問。短い文章を読んで質問に答える。

- **店で価格を尋ねる**

### Mid-size passages (midsize_passage_read)

1問。中程度の長さの文章を読んで質問に答える。

- **購入したい商品の説明**

### Long passages (long_passage_read)

1問。長い文章を読んで質問に答える。

- **割引交渉**

### Information retrieval (info_retrieval)

1問。文章から情報を見つけ出す。

- **レストランで食べ物を注文する**

## 聴解

### Topic understanding (topic_understanding)

1問。短い会話を聞いてその話題を理解する。

- **食事の好みについて話す**

### Key understanding (keypoint_understanding)

1問。会話を聞いて重要な情報を理解する。

- **料理を褒める**

### Summary understanding (summary_understanding)

1問。会話の要約を理解する。

- **道を尋ねる**

### Active expression (active_expression)

1問。会話の中で適切な応答を選択する。

- **交通手段について話す**

### Immediate acknowledgment (immediate_ack)

1問。会話を聞いて即座に理解する。

- **交通状況について話す**

In [None]:
# Round-trip through JSON to get a plain dict representation of the outline
outliner_json = initial_outline.model_dump_json()
data = json.loads(outliner_json)

start_time = time.time()

output_data = {'sections': []}

for section in data['sections']:
    output_section = {'section_title': section['section_title'], 'subsections': []}

    for subsection in tqdm(section['subsections'], desc=f"Processing {section['section_title']}"):
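        # The handler method name is the identifier inside the trailing parentheses,
        # e.g. "Kanji reading (kanji_reading)" -> "kanji_reading".
        match = re.search(r"\(([^()]+)\)\s*$", subsection['subsection_title'])
        function_name = match.group(1) if match else ''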
        questions = subsection['question_topics']

        for question in tqdm(questions, desc=f"Processing {subsection['subsection_title']}"):
            handler = ExamTaskHandler(vocab=vocab_dict)
            func = getattr(handler, function_name, None)

            if func:
                max_attempts = 2
                original_topic = question['topic']  # Optional: Track original topic
                for attempt in range(max_attempts):
                    try:
                        result = json.loads(func(question['topic']))
                        question['result'] = result
                        break  # Exit on success
                    except Exception as e:
                        if attempt < max_attempts - 1:
                            # Replace topic and retry
                            question['topic'] = random.choice(topics_list)
                        else:
                            question['result'] = f"Error after {max_attempts} attempts: {str(e)}"
            else:
                question['result'] = f"Method {function_name} not found"

        output_subsection = {
            'subsection_title': subsection['subsection_title'],
            'description': subsection['description'],
            'question_topics': questions
        }
        output_section['subsections'].append(output_subsection)

    output_data['sections'].append(output_section)

# End the timer
end_time = time.time()

# Calculate the total execution time
execution_time = end_time - start_time

print(f"Total execution time: {execution_time:.2f} seconds")

Processing 語彙と文法:   0%|          | 0/8 [00:00<?, ?it/s]
Processing Kanji reading (kanji_reading):   0%|          | 0/1 [00:00<?, ?it/s]

---WEB SEARCH---
---Generator----
---REVISOR---
--- AI Reviser feels Good Enough ---
--- Formatter ---

Processing Kanji reading (kanji_reading): 100%|██████████| 1/1 [00:17<00:00, 17.42s/it]
Processing 語彙と文法:  12%|█▎        | 1/8 [00:17<02:01, 17.42s/it]
Processing Write Chinese characters (write_chinese):   0%|          | 0/1 [00:00<?, ?it/s]

---WEB SEARCH---
---Generator----
---REVISOR---
--- AI Reviser feels Good Enough ---
--- Formatter ---

Processing Write Chinese characters (write_chinese): 100%|██████████| 1/1 [00:14<00:00, 14.39s/it]
Processing 語彙と文法:  25%|██▌       | 2/8 [00:31<01:33, 15.64s/it]
Processing Word Meaning Selection (word_meaning):   0%|          | 0/1 [00:00<?, ?it/s]

---WEB SEARCH---
---Generator----
---REVISOR---
--- AI Reviser feels Good Enough ---
--- Formatter ---

Processing Word Meaning Selection (word_meaning): 100%|██████████| 1/1 [00:46<00:00, 46.79s/it]
Processing 語彙と文法:  38%|███▊      | 3/8 [01:18<02:29, 29.86s/it]
Processing Synonyms substitution (synonym_substitution):   0%|          | 0/1 [00:00<?, ?it/s]

---WEB SEARCH---
---Generator----
---REVISOR---
--- AI Reviser feels Good Enough ---
--- Formatter ---

Processing Synonyms substitution (synonym_substitution): 100%|██████████| 1/1 [00:16<00:00, 16.45s/it]
Processing 語彙と文法:  50%|█████     | 4/8 [01:35<01:38, 24.57s/it]
Processing word usage (word_usage):   0%|          | 0/1 [00:00<?, ?it/s]

---WEB SEARCH---
---Generator----
---REVISOR---
--- AI Reviser feels Good Enough ---
--- Formatter ---

Processing word usage (word_usage): 100%|██████████| 1/1 [00:52<00:00, 52.98s/it]
Processing 語彙と文法:  62%|██████▎   | 5/8 [02:28<01:44, 34.81s/it]
Processing Grammar fill in the blank (sentence_grammar):   0%|          | 0/1 [00:00<?, ?it/s]

---WEB SEARCH---
---Generator----
---REVISOR---
--- AI Reviser feels Good Enough ---
--- Formatter ---

Processing Grammar fill in the blank (sentence_grammar): 100%|██████████| 1/1 [00:16<00:00, 16.80s/it]
Processing 語彙と文法:  75%|███████▌  | 6/8 [02:44<00:57, 28.69s/it]
Processing Sentence sorting (sentence_sort):   0%|          | 0/1 [00:00<?, ?it/s]

---WEB SEARCH---

Adding a node to a graph that has already been compiled. This will not be reflected in the compiled graph.

Processing Sentence sorting (sentence_sort): 100%|██████████| 1/1 [00:01<00:00,  1.69s/it]
Processing 語彙と文法:  88%|████████▊ | 7/8 [02:46<00:19, 19.87s/it]
Processing Grammar structure selection (sentence_structure):   0%|          | 0/1 [00:00<?, ?it/s]

---WEB SEARCH---
---Generator----
---REVISOR---
--- AI Reviser feels Good Enough ---
--- Formatter ---

Processing Grammar structure selection (sentence_structure): 100%|██████████| 1/1 [00:27<00:00, 27.50s/it]
Processing 語彙と文法: 100%|██████████| 8/8 [03:14<00:00, 24.26s/it]
Processing 読解:   0%|          | 0/4 [00:00<?, ?it/s]
Processing Short passages (short_passage_read):   0%|          | 0/1 [00:00<?, ?it/s]

---WEB SEARCH---
---Generator----
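
The generation loop in the cell above combines three ideas: extracting a method name from the subsection title, dispatching to it with `getattr`, and retrying with a replacement topic when a call fails. The sketch below isolates them as small helpers. `method_name_from_title`, `run_with_retry`, and `DummyHandler` are hypothetical; `DummyHandler` is only a stand-in and does not reflect how `ExamTaskHandler` is implemented.

```python
import json
import random
import re
from typing import Any, Callable, List, Optional


def method_name_from_title(title: str) -> Optional[str]:
    """Return the identifier inside the trailing parentheses, or None."""
    match = re.search(r"\(([^()]+)\)\s*$", title)
    return match.group(1) if match else None


def run_with_retry(
    func: Callable[[str], str],
    topic: str,
    fallback_topics: List[str],
    max_attempts: int = 2,
    rng: Optional[random.Random] = None,
) -> Any:
    """Call func(topic) and parse its JSON; swap in a fallback topic on failure."""
    rng = rng or random.Random()
    last_error: Optional[Exception] = None
    for attempt in range(max_attempts):
        try:
            return json.loads(func(topic))
        except Exception as exc:  # broad on purpose, mirroring the loop above
            last_error = exc
            if attempt < max_attempts - 1:
                topic = rng.choice(fallback_topics)
    return f"Error after {max_attempts} attempts: {last_error}"


class DummyHandler:
    """Stand-in handler: fails for the topic 'bad', succeeds otherwise."""

    def kanji_reading(self, topic: str) -> str:
        if topic == "bad":
            raise ValueError("cannot build a question for this topic")
        return json.dumps({"topic": topic})


name = method_name_from_title("Kanji reading (kanji_reading)")
func = getattr(DummyHandler(), name, None)
recovered = run_with_retry(func, "bad", ["郵便"])
failed = run_with_retry(func, "bad", ["bad"])
print(name, recovered, failed)
```

Returning the error as a string keeps one failed question from aborting the whole paper, at the cost of mixing result types; a caller that needs to tell the cases apart could return a small status object instead.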


In [None]:
yaml_str = yaml.dump(output_data, allow_unicode=True, sort_keys=False, default_flow_style=False)
# Display the YAML inside a fenced block so Markdown does not reinterpret its indentation
display(Markdown(f"```yaml\n{yaml_str}\n```"))