### Japanese Language Flashcard Generation

#### Overview
This notebook automates the creation of Japanese language Anki flashcards from textbook images. It combines OCR technology with large language model processing to extract, verify, and format vocabulary into ready-to-import flashcards.

#### Features
- Extract Japanese text from textbook images using the LLMWhisperer OCR API
- Cross-reference extracted text with original images for accuracy
- Generate structured CSV flashcards with proper formatting
- Support for contextual vocabulary notes and usage examples

#### Workflow
1. **Image Upload**: Upload Japanese textbook page images
2. **Text Extraction**: Use the LLMWhisperer OCR API to extract text from the images
3. **LLM Processing**: Send both the extracted text and original image to an LLM
4. **Flashcard Generation**: Generate structured CSV data using specialized prompts from `LLM_Prompts.py`
5. **Export**: Save the resulting flashcards in Anki-compatible CSV format
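
Step 3 can be sketched as bundling the OCR text and the page image into one payload before it goes to the LLM. This is a minimal illustration only: `build_llm_inputs`, the dictionary keys, and the placeholder prompt are hypothetical, and the real request shape depends on whichever LLM API the notebook calls.

```python
import base64


def build_llm_inputs(ocr_text: str, image_bytes: bytes, prompt: str) -> dict:
    """Bundle the prompt, OCR text, and base64-encoded page image (hypothetical layout)."""
    return {
        "prompt": prompt,
        "ocr_text": ocr_text,
        # Base64 is a common way to embed binary image data in a text-based request.
        "image_base64": base64.b64encode(image_bytes).decode("ascii"),
    }


payload = build_llm_inputs(
    ocr_text="先輩  せんぱい  senior (student, colleague, etc.)",
    image_bytes=b"\xff\xd8\xff",  # placeholder bytes standing in for a real JPEG file
    prompt="Compare the OCR text with the image and return CSV flashcards.",  # placeholder, not the prompt in LLM_Prompts.py
)
print(sorted(payload))
```

Sending the image alongside the text is what lets the model correct OCR mistakes instead of copying them into the flashcards.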

#### Flashcard Format
The generated flashcards follow a specific CSV structure:
- **Kanji column**: Contains the word in Kanji (or Hiragana/Katakana if no Kanji exists)
- **Furigana column**: Contains the phonetic reading of the word in Hiragana
- **English_Translation_and_Notes column**: Contains both the English translation and any usage or contextual notes

Example output:
```
"迷う [道に～]","まよう [みちに～]","lose one's way (e.g., get lost on the road)"
"先輩","せんぱい","senior (student, colleague, etc.)"
```
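
The fully quoted layout above can be reproduced with Python's standard `csv` module. The sketch below assumes the rows are already available as `(kanji, furigana, english)` tuples; the helper name `rows_to_csv` is hypothetical and is not taken from the notebook.

```python
import csv
import io

# The two example rows shown above, in Kanji / Furigana / English order.
rows = [
    ("迷う [道に～]", "まよう [みちに～]", "lose one's way (e.g., get lost on the road)"),
    ("先輩", "せんぱい", "senior (student, colleague, etc.)"),
]


def rows_to_csv(rows):
    """Serialize three-column rows to CSV text with every field quoted."""
    buffer = io.StringIO()
    # QUOTE_ALL wraps every field in double quotes, so commas inside the notes are safe.
    writer = csv.writer(buffer, quoting=csv.QUOTE_ALL, lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()


csv_text = rows_to_csv(rows)
print(csv_text, end="")
```

When writing to disk instead of a string buffer, opening the file with `encoding="utf-8"` and `newline=""` keeps the Japanese text and line endings intact for import.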

#### Benefits
- **Accuracy**: Cross-references OCR text with the original image to fix errors
- **Context-Aware**: Preserves usage examples and contextual information
- **Time-Saving**: Automates the tedious process of manual flashcard creation
- **Customizable**: Prompts can be adjusted for different textbook formats

#### Applications
- Creating comprehensive JLPT study materials
- Building personal vocabulary decks from textbooks
- Supplementing classroom learning with digital flashcards
- Archiving vocabulary from various Japanese learning resources

In [5]:
# Importing the required libraries
from unstract.llmwhisperer import LLMWhispererClientV2
import os
from dotenv import load_dotenv

# Defining variables
load_dotenv()
unstract_api_url = os.getenv("LLMWHISPERER_BASE_URL_V2")
unstract_api_key = os.getenv("LLMWHISPERER_API_KEY")
image_path = r"C:\Users\Rounak\OneDrive\Documents\Japanese Study\Flashcards - Textbook Photos and Text Files\MNN Intermediate 1\Chatpter 1\01page_1.jpg"

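# Check that the image path exists; raise instead of calling exit(), which is unreliable inside a notebook
if not os.path.exists(image_path):
    raise FileNotFoundError(f"Image path does not exist: {image_path}")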

# Extract text from the image using LLMWhisperer
client = LLMWhispererClientV2(
    base_url=unstract_api_url,
    api_key=unstract_api_key,
    logging_level="ERROR",  # log errors only; choose a more verbose level while debugging
)
result = client.whisper(file_path=image_path, wait_for_completion=True)
print(result["extraction"]["result_text"])



                             Lesson          1 

ど の よ う に                            how 

 迷 う [ 道 に ~]    ま よ う [ み ち に ~]    lose [one's way] 

先 輩              せ ん ぱ い             senior (student, colleague, etc.) 

 ま る で                               just (as in X is just like Y') 

 明 る い           あ か る い             cheerful [personality] 

   [ 性 格 が ~]      [ せ い か く が ~] 
                                             は は お や 
 父 親             ち ち お や             father (cf. 母 親 :mother) 

 湖               み ず う み             lake 

 目 指 す           め ざ す               aim at, have one's eye on 
 命               い の ち               life 

 お せ ち 料 理       お せ ち り ょ う り       traditional Japanese food for the New Year 

 初 詣 で           は つ も う で           traditional practice of visiting a shrine or 

                                      temple during the New Year to pray for 

                                      happiness 
 畳               た た み               tatami ma