### Japanese Language Flashcard Generation

#### Overview
This notebook automates the creation of Japanese language Anki flashcards from textbook images. It combines OCR technology with large language model processing to extract, verify, and format vocabulary into ready-to-import flashcards.

#### Features
- Extract Japanese text from textbook images using an OCR API
- Cross-reference extracted text with original images for accuracy
- Generate structured CSV flashcards with proper formatting
- Support for contextual vocabulary notes and usage examples

#### Workflow
1. **Image Upload**: Upload Japanese textbook page images  
2. **Text Extraction**: Use an OCR API to extract text from the images (currently the LLMWhisperer API)  
3. **LLM Processing**: Send both the extracted text and the original image to an LLM (currently the Gemini API)  
4. **Flashcard Generation**: Generate structured CSV data using specialized prompts from `LLM_Prompts.py`  
5. **Export**: Save the resulting flashcards in Anki-compatible CSV format  
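Step 4 fills a prompt template with the OCR output. The actual prompt text in `LLM_Prompts.py` is not reproduced here, so the template below is a hypothetical stand-in, and `build_user_prompt` is an illustrative name. The sketch only shows how `str.format` behaves with such a template: `{extracted_text}` is substituted, literal braces in the prompt must be doubled as `{{` and `}}`, and braces inside the substituted OCR text are inserted verbatim.

```python
def build_user_prompt(template: str, extracted_text: str) -> str:
    """Fill the {extracted_text} placeholder of a prompt template."""
    return template.format(extracted_text=extracted_text)

# Hypothetical template, NOT the real prompt from LLM_Prompts.py.
# Doubled braces {{ }} become literal { } after formatting.
TEMPLATE = (
    "Correct any OCR errors in the text below using the image.\n"
    "Output one row per word as {{Kanji}},{{Furigana}},{{English_Translation_and_Notes}}.\n"
    "Extracted text:\n{extracted_text}"
)

prompt = build_user_prompt(TEMPLATE, "先輩 せんぱい senior")
print(prompt)

# Braces in the OCR text are inserted as-is; format() does not re-parse substituted values.
print(build_user_prompt(TEMPLATE, "{not a placeholder}"))

# A single-brace field with no matching keyword argument raises KeyError.
try:
    "Columns: {Kanji}\n{extracted_text}".format(extracted_text="x")
except KeyError as err:
    print("Unescaped placeholder:", err)
```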

#### Flashcard Format
The generated flashcards follow a specific CSV structure:
- **Kanji column**: Contains the word in Kanji (or Hiragana/Katakana if no Kanji exists)  
- **Furigana column**: Contains the phonetic reading of the word in Hiragana  
- **English_Translation_and_Notes column**: Contains both the English translation and any usage or contextual notes  

Example output:
```
"迷う [道に～]","まよう [みちに～]","lose one's way (e.g., get lost on the road)"  
"先輩","せんぱい","senior (student, colleague, etc.)"  
```
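The three-column layout above, with every field wrapped in double quotes, is what Python's standard `csv` module writes with `quoting=csv.QUOTE_ALL`. The notebook itself writes the model's reply straight to disk; the sketch below (with illustrative names `rows_to_csv` and `csv_to_rows`) only shows how rows could be serialized and read back so that commas inside the notes column never split a field.

```python
import csv
import io
from typing import Iterable, List, Sequence

def rows_to_csv(rows: Iterable[Sequence[str]]) -> str:
    """Serialize (Kanji, Furigana, English_Translation_and_Notes) rows with every field quoted."""
    buffer = io.StringIO()
    # QUOTE_ALL wraps each field in double quotes, so commas inside the notes stay in one field
    writer = csv.writer(buffer, quoting=csv.QUOTE_ALL, lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()

def csv_to_rows(text: str) -> List[List[str]]:
    """Parse the quoted CSV back into lists of fields, skipping blank lines."""
    return [row for row in csv.reader(io.StringIO(text)) if row]

rows = [
    ("迷う [道に～]", "まよう [みちに～]", "lose one's way (e.g., get lost on the road)"),
    ("先輩", "せんぱい", "senior (student, colleague, etc.)"),
]
csv_text = rows_to_csv(rows)
print(csv_text, end="")
```

To write a file instead of a string, open it with `open(path, "w", encoding="utf-8", newline="")`; the `csv` documentation recommends `newline=""` so the writer controls line endings.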

#### Benefits
- **Accuracy**: Cross-references OCR text with the original image to fix errors  
- **Context-Aware**: Preserves usage examples and contextual information  
- **Time-Saving**: Automates the tedious process of manual flashcard creation  
- **Customizable**: Prompts can be adjusted for different textbook formats  
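Adapting the pipeline to a different textbook mostly means changing the prompts and the worked examples. In the flashcard cell further down, Gemini receives one flat list: the system prompt, then an image, a user prompt and the expected answer for each example, then the target page and its prompt. The sketch below generalizes that ordering to any number of examples; `FewShotExample` and `build_flashcard_contents` are hypothetical names, and plain strings stand in for the PIL images so it runs without any files.

```python
from dataclasses import dataclass
from typing import Any, List, Sequence

@dataclass
class FewShotExample:
    image: Any        # a PIL.Image.Image in the notebook; any object in this sketch
    user_prompt: str
    answer: str

def build_flashcard_contents(system_prompt: str,
                             examples: Sequence[FewShotExample],
                             actual_image: Any,
                             actual_prompt: str) -> List[Any]:
    """Interleave the system prompt, worked examples and the target page in one list."""
    contents: List[Any] = [system_prompt]
    for example in examples:
        # Each example contributes its image, the request, then the expected answer
        contents.extend([example.image, example.user_prompt, example.answer])
    contents.extend([actual_image, actual_prompt])
    return contents

contents = build_flashcard_contents(
    "system prompt",
    [FewShotExample("image 1", "prompt 1", "answer 1"),
     FewShotExample("image 2", "prompt 2", "answer 2")],
    actual_image="target image",
    actual_prompt="target prompt",
)
print(len(contents))  # 9
```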

#### Applications
- Creating comprehensive JLPT study materials  
- Building personal vocabulary decks from textbooks  
- Supplementing classroom learning with digital flashcards  
- Archiving vocabulary from various Japanese learning resources

In [None]:
# Saving the example images (which are used with the prompts in 'LLM_Prompts.py') as base64 strings in a JSON file
# This only has to be done once to generate the JSON file (or any time the example images are changed)
# Importing the required libraries
import base64
import json

# Defining variables
image_path_example_1 = r"C:\Users\Rounak\OneDrive\Documents\Japanese Study\Flashcards - Textbook Photos and Text Files\MNN Beginner 1 & 2 - Useful Words & Information\Chapter 12.jpg"
image_path_example_2 = r"C:\Users\Rounak\OneDrive\Documents\Japanese Study\Flashcards - Textbook Photos and Text Files\MNN Intermediate 1\Chatpter 1\01page_1.jpg"

# Function to encode an image to base64
# image_path: Path to the image file
# Returns: Base64 encoded string of the image
def image_to_base64(image_path):
    try:
        with open(image_path, "rb") as image_file:
            encoded_string = base64.b64encode(image_file.read()).decode('utf-8') # Decode to string for text storage
            return encoded_string
    except FileNotFoundError:
        print(f"Error: Image file not found at: {image_path}")
        return None
    except Exception as e:
        print(f"Error encoding image {image_path} to base64: {e}")
        return None

# Encode the example images to base64 strings
base64_string_example_1 = image_to_base64(image_path_example_1)
base64_string_example_2 = image_to_base64(image_path_example_2)

# Check if encoding was successful for both images
if base64_string_example_1 and base64_string_example_2: # Only proceed if encoding was successful for both
    # Create a dictionary to store the base64 strings
    base64_image_data = {
        "flashcard_image_example_1": base64_string_example_1,
        "flashcard_image_example_2": base64_string_example_2
    }

    # Write the dictionary to a JSON file
    json_filepath = "base64_example_images.json"
    try:
        with open(json_filepath, "w") as json_file:
            json.dump(base64_image_data, json_file, indent=4) # indent=4 for pretty formatting (optional)
        print(f"Base64 image strings written to: {json_filepath}")
    except Exception as e:
        print(f"Error writing to JSON file {json_filepath}: {e}")
else:
    print("Error: Could not encode one or more images to base64. JSON file not created.")
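Base64 turns arbitrary bytes into ASCII text, which is what lets binary JPEG data live inside a JSON file (JSON has no byte type). The encoded string is about a third larger than the file, because every 3 input bytes become 4 characters. The hypothetical `roundtrip_check` below repeats the encode, store and decode path of this cell and the next on dummy bytes to confirm nothing is lost; it does not need a real image.

```python
import base64
import json
import os
import tempfile

def roundtrip_check(raw_bytes: bytes, key: str = "flashcard_image_example_1") -> bool:
    """Encode bytes to base64, store them in a JSON file, read them back and compare."""
    encoded = base64.b64encode(raw_bytes).decode("utf-8")  # same encoding as image_to_base64
    with tempfile.TemporaryDirectory() as tmp_dir:
        json_path = os.path.join(tmp_dir, "base64_example_images.json")
        with open(json_path, "w", encoding="utf-8") as json_file:
            json.dump({key: encoded}, json_file)
        with open(json_path, "r", encoding="utf-8") as json_file:
            loaded = json.load(json_file)
    return base64.b64decode(loaded[key]) == raw_bytes

# Dummy bytes that start and end like a JPEG; no real image is needed
fake_jpeg = b"\xff\xd8\xff\xe0" + bytes(range(256)) + b"\xff\xd9"
print(roundtrip_check(fake_jpeg))  # True

# Size overhead: every 3 input bytes become 4 base64 characters
print(len(base64.b64encode(bytes(300))))  # 400
```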

In [None]:
# Loading base64 images from a JSON file into PIL Image objects
# Importing the required libraries
import json
import base64
import PIL.Image
from io import BytesIO

# Function to load base64 images from a JSON file
# filepath: Path to the JSON file containing base64 image strings
# Returns: Dictionary with image names as keys and base64-encoded strings as values (empty on error)
def load_base64_images_from_json(filepath="base64_example_images.json"):
    try:
        with open(filepath, 'r') as f:
            base64_images = json.load(f)
        return base64_images
    except FileNotFoundError:
        print(f"Error: File not found: {filepath}")
        return {}
    except json.JSONDecodeError:
        print(f"Error: Invalid JSON format in {filepath}")
        return {}

# Function to load a base64 image string into a PIL Image object
# base64_string: Base64 encoded image string
# Returns: PIL Image object, or None if decoding fails
def load_image_from_base64(base64_string):
    try:
        image_bytes = base64.b64decode(base64_string)
        image = PIL.Image.open(BytesIO(image_bytes))
        return image
    except Exception as e:
        print(f"Error loading image from base64: {e}")
        return None

base64_example_image_dict = load_base64_images_from_json()
if base64_example_image_dict:
    image_example_1 = load_image_from_base64(base64_example_image_dict["flashcard_image_example_1"])
    image_example_2 = load_image_from_base64(base64_example_image_dict["flashcard_image_example_2"])
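When the JSON file is missing, `load_base64_images_from_json` returns an empty dictionary and this cell silently leaves `image_example_1` and `image_example_2` undefined; when the file exists but lacks one of the expected keys, indexing it raises a bare `KeyError`. A hypothetical `require_keys` helper (not part of the notebook) makes both failures explicit and names the missing entries.

```python
from typing import Dict, Iterable

REQUIRED_KEYS = ("flashcard_image_example_1", "flashcard_image_example_2")

def require_keys(data: Dict[str, str], keys: Iterable[str] = REQUIRED_KEYS) -> Dict[str, str]:
    """Return only the required entries, raising a KeyError that lists any missing ones."""
    keys = tuple(keys)  # allow any iterable, including one-shot generators
    missing = [key for key in keys if key not in data]
    if missing:
        raise KeyError(f"Missing example images in JSON: {', '.join(missing)}")
    return {key: data[key] for key in keys}

checked = require_keys({
    "flashcard_image_example_1": "aGVsbG8=",
    "flashcard_image_example_2": "d29ybGQ=",
    "unused_entry": "",
})
print(sorted(checked))
```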

In [None]:
# Extracting text from an image using the LLMWhisperer API
# LLMWhisperer (from Unstract) is an OCR API; here it extracts the text from the textbook page image.
# Importing the required libraries
from unstract.llmwhisperer import LLMWhispererClientV2
import os
from io import BytesIO
from dotenv import load_dotenv

# Defining variables
load_dotenv()
unstract_api_url = os.getenv("LLMWHISPERER_BASE_URL_V2")
unstract_api_key = os.getenv("LLMWHISPERER_API_KEY")
image_actual_path = r"C:\Users\Rounak\OneDrive\Documents\Japanese Study\Flashcards - Textbook Photos and Text Files\MNN Intermediate 1\Chatpter 1\01page_1.jpg"

# Check that image_actual_path exists (raising stops the cell with a clear error message)
if not os.path.exists(image_actual_path):
    raise FileNotFoundError(f"Image path does not exist: {image_actual_path}")

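# Reading the image into an in-memory byte stream (IO[bytes] object) for use with the LLMWhisperer API
# The with-block closes the file handle once the bytes have been read
with open(image_actual_path, "rb") as image_file:
    image_bytes = BytesIO(image_file.read())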

# Extract text from the image using LLMWhisperer
# logging_level="ERROR" keeps the client quiet; switch to "DEBUG" when troubleshooting
client = LLMWhispererClientV2(base_url=unstract_api_url, api_key=unstract_api_key, logging_level="ERROR")
result = client.whisper(stream=image_bytes, wait_for_completion=True)
image_actual_extracted_text = result["extraction"]["result_text"]
print(image_actual_extracted_text)
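Reading the image and pulling the text out of the whisper result can be factored into two small helpers, which makes them easy to check without calling the API. `read_image_stream` and `get_result_text` are hypothetical names, and the result dictionary in the demo is a hand-written stand-in that only mimics the `result["extraction"]["result_text"]` path used above, not a full LLMWhisperer response.

```python
import os
import tempfile
from io import BytesIO
from typing import Any, Mapping

def read_image_stream(path: str) -> BytesIO:
    """Read an image file into memory; the with-block closes the file handle."""
    with open(path, "rb") as image_file:
        return BytesIO(image_file.read())

def get_result_text(result: Mapping[str, Any]) -> str:
    """Return result["extraction"]["result_text"], with a clearer error if it is missing."""
    try:
        return result["extraction"]["result_text"]
    except (KeyError, TypeError) as err:
        raise ValueError(f"Unexpected LLMWhisperer result shape: {err!r}") from err

# Demo with a throwaway file and a stand-in result dict (not a real API response)
with tempfile.TemporaryDirectory() as tmp_dir:
    fake_path = os.path.join(tmp_dir, "page.jpg")
    with open(fake_path, "wb") as fake_file:
        fake_file.write(b"\xff\xd8 fake jpeg bytes \xff\xd9")
    stream = read_image_stream(fake_path)  # bytes now live in memory

fake_result = {"extraction": {"result_text": "先輩　せんぱい　senior"}}
print(get_result_text(fake_result))
```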

In [None]:
# Generating the flashcards from the supplied image (specified by 'image_actual_path') and the text extracted from it by LLMWhisperer (stored in 'image_actual_extracted_text')
# Importing the required libraries
import google.generativeai as genai
import PIL.Image
import json
from unstract.llmwhisperer import LLMWhispererClientV2
import os
from io import BytesIO
from dotenv import load_dotenv
import base64

# Importing all variables from LLM_Prompts.py
from LLM_Prompts import *

# Defining Functions
# Function to load base64 images from a JSON file
# filepath: Path to the JSON file containing base64 image strings
# Returns: Dictionary with image names as keys and base64-encoded strings as values (empty on error)
def load_base64_images_from_json(filepath="base64_example_images.json"):
    try:
        with open(filepath, 'r') as f:
            base64_images = json.load(f)
        return base64_images
    except FileNotFoundError:
        print(f"Error: File not found: {filepath}")
        return {}
    except json.JSONDecodeError:
        print(f"Error: Invalid JSON format in {filepath}")
        return {}

# Function to load a base64 image string into a PIL Image object
# base64_string: Base64 encoded image string
# Returns: PIL Image object, or None if decoding fails
def load_image_from_base64(base64_string):
    try:
        image_bytes = base64.b64decode(base64_string)
        image = PIL.Image.open(BytesIO(image_bytes))
        return image
    except Exception as e:
        print(f"Error loading image from base64: {e}")
        return None

# Defining the variables
load_dotenv()
gemini_api_key = os.getenv("GOOGLE_GEMINI_API_KEY")
unstract_api_url = os.getenv("LLMWHISPERER_BASE_URL_V2")
unstract_api_key = os.getenv("LLMWHISPERER_API_KEY")

genai.configure(api_key=gemini_api_key)
model = genai.GenerativeModel("gemini-2.0-flash")
# model = genai.GenerativeModel("gemini-2.0-flash-thinking-exp-01-21")

image_actual_path = r"C:\Users\Rounak\OneDrive\Documents\Japanese Study\Flashcards - Textbook Photos and Text Files\MNN Intermediate 1\Chatpter 4\01page_1.jpg" # User uploaded image for Streamlit

# Check that the image path exists (raising stops the cell with a clear error message)
if not os.path.exists(image_actual_path):
    raise FileNotFoundError(f"Image path does not exist: {image_actual_path}")

image_actual = PIL.Image.open(image_actual_path)

# Determining if the supplied image is suitable for flashcard generation
# Creating the content object for the suitability assessment (that will be passed to the Gemini API)
content_suitability = [
    suitability_system_prompt,
    image_actual,
    suitability_user_prompt,
]

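# Retrieving the suitability assessment for the image from the API
is_suitable = None  # Stays None if the API call or JSON parsing fails, so no flashcards are generated
try:
    response_suitability = model.generate_content(content_suitability)
    response_suitability.resolve() # Ensures the full response has arrived; errors such as a blocked prompt surface when .text is accessed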

    # Extracting the suitability assessment from the API response and loading it as a JSON object
    json_string = response_suitability.text.replace("```json", "").replace("```", "") # Removing the code block markdown
    json_output = json.loads(json_string)

    is_suitable = json_output.get("is_suitable")
    reason = json_output.get("reason")

    if is_suitable != "Yes":
        print("Image is NOT suitable for flashcard generation.")
        print(f"Reason: {reason}")
except Exception as e:
    print(f"Error generating suitability assessment: {e}")
    if 'response_suitability' in locals(): # The API response is only printed if it exists
        print(f"API Response: {response_suitability.prompt_feedback}")

# Generating the flashcards if the image is marked as suitable for flashcard generation
flashcards = None  # Stays None unless flashcard generation succeeds
if is_suitable == "Yes":
    # Loading the few-shot example images from the JSON file into PIL Image objects
    base64_example_image_dict = load_base64_images_from_json()
    if not base64_example_image_dict:
        raise RuntimeError("Could not load the example images from base64_example_images.json")
    image_example_1 = load_image_from_base64(base64_example_image_dict["flashcard_image_example_1"])
    image_example_2 = load_image_from_base64(base64_example_image_dict["flashcard_image_example_2"])

    # Extracting text from the image using the LLMWhisperer OCR API
    # Reading the image into an in-memory byte stream (IO[bytes] object) for the LLMWhisperer client
    with open(image_actual_path, "rb") as image_file:
        image_bytes = BytesIO(image_file.read())

    # Extract text from the image using LLMWhisperer
    # logging_level="ERROR" keeps the client quiet; switch to "DEBUG" when troubleshooting
    client = LLMWhispererClientV2(base_url=unstract_api_url, api_key=unstract_api_key, logging_level="ERROR")
    result = client.whisper(stream=image_bytes, wait_for_completion=True)
    image_actual_extracted_text = result["extraction"]["result_text"]

    # Creating the content object for the flashcard generation (that will be passed to the Gemini API)
    content_flashcards = [
        flashcard_system_prompt,

        image_example_1,
        flashcard_user_prompt_example_1,
        flashcard_answer_example_1,

        image_example_2,
        flashcard_user_prompt_example_2,
        flashcard_answer_example_2,

        image_actual,
        flashcard_user_prompt_actual.format(extracted_text=image_actual_extracted_text),
    ]

    # Retrieving the flashcards from the API
    try:
        response_flashcards = model.generate_content(content_flashcards)
        response_flashcards.resolve() # Ensures the full response has arrived; errors such as a blocked prompt surface when .text is accessed

        # Extracting the flashcards from the API response
        flashcards = response_flashcards.text.replace("```html", "").replace("```csv", "").replace("```", "") # Removing the code block markdown
        print(flashcards)
    except Exception as e:
        print(f"Error generating flashcards: {e}")
        if 'response_flashcards' in locals(): # The API response is only printed if it exists
            print(f"API Response: {response_flashcards.prompt_feedback}")

# Saving the flashcards to a .txt file (skipped if no flashcards were generated)
if flashcards:
    with open("generated_flashcards.txt", "w", encoding="utf-8") as f:
        f.write(flashcards)

"検査する","けんさする","examine, inspect"
"明日","あす","tomorrow"
"能力","のうりょく","capability"
"バザー","バザー","bazaar"
"マスク","マスク","nose and mouth mask"
"スーツケース","スーツケース","suitcase"
"目が覚める","めがさめる","wake up, realize"
"朝礼","ちょうれい","morning meeting"
"校歌","こうか","school song"
"敬語","けいご","honorific language"
"感想文","かんそうぶん","review (e.g., of a book one has read)"
"運動場","うんどうじょう","sports ground"
"いたずら","いたずら","prank"
"美しい","うつくしい","beautiful"
"世紀","せいき","century"
"平和 [な]","へいわ [な]","peaceful"
"人々","ひとびと","people"
"願う","ねがう","desire, wish for, hope for"
"文","ぶん","sentence, style"
"書き換える","かきかえる","rewrite"
"合わせる","あわせる","combine"
"もともと","もともと","originally"
"若者","わかもの","young person"
"~湖","~こ","lake~"
"深い","ふかい","deep"
"さまざま [な]","さまざま [な]","various"
"苦しい [生活が～]","くるしい [せいかつが～]","hard"
"性格","せいかく","character"
"人気者","にんきもの","popular person"
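The notebook removes Markdown fences with chained `str.replace` calls and then uses the reply as-is. A slightly stricter sketch, assuming the suitability reply is JSON with `is_suitable` ("Yes"/"No") and `reason` keys (as the suitability code reads it) and that flashcard replies follow the three-column format described at the top: remove whole fence lines with a regular expression, parse the JSON, and check that every CSV row has exactly three fields, so a row whose notes contain an unquoted comma is caught before import. The function names are illustrative, not part of the project.

```python
import csv
import io
import json
import re
from typing import List, Tuple

# Matches a whole fence line such as ```json, ```csv or a bare ```
FENCE_LINE = re.compile(r"^```[A-Za-z]*[ \t]*$", re.MULTILINE)

def strip_code_fences(reply: str) -> str:
    """Drop Markdown fence lines from an LLM reply and trim surrounding whitespace."""
    return FENCE_LINE.sub("", reply).strip()

def parse_suitability(reply: str) -> Tuple[bool, str]:
    """Return (is_suitable, reason); assumes keys 'is_suitable' ("Yes"/"No") and 'reason'."""
    data = json.loads(strip_code_fences(reply))
    return data.get("is_suitable") == "Yes", data.get("reason", "")

def parse_flashcards(reply: str, expected_fields: int = 3) -> List[List[str]]:
    """Parse quoted CSV flashcard rows, rejecting any row without exactly three fields."""
    rows: List[List[str]] = []
    reader = csv.reader(io.StringIO(strip_code_fences(reply)))
    for row in reader:
        if not row:
            continue  # csv.reader yields [] for blank lines
        if len(row) != expected_fields:
            raise ValueError(f"Line {reader.line_num}: expected {expected_fields} fields, got {row}")
        rows.append(row)
    return rows

suitability_reply = '```json\n{"is_suitable": "Yes", "reason": "Vocabulary list"}\n```'
print(parse_suitability(suitability_reply))  # (True, 'Vocabulary list')

flashcard_reply = '```csv\n"明日","あす","tomorrow"\n\n"平和 [な]","へいわ [な]","peaceful"\n```'
cards = parse_flashcards(flashcard_reply)
print(len(cards))  # 2
```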
