# Place hard images in the benchmark folder

This notebook runs only on the images in benchmark/.

Images are placed in this folder because they are difficult to translate or give inconsistent results.

Once an image produces a good result, it may be added to examples.txt.

We can remove any such images from this folder after finding a good solution.
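
As a rough illustration of what "only the images in benchmark/" means, the sketch below lists the image files in a folder. It does not use ctpy; the function name `list_benchmark_images` and the set of extensions are assumptions made for this example.

```python
from pathlib import Path

# Hypothetical set of extensions; adjust to whatever the benchmark folder holds.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".webp"}

def list_benchmark_images(folder="benchmark"):
    """Return image paths in `folder`, sorted by name, ignoring other files."""
    root = Path(folder)
    if not root.is_dir():
        # A missing folder simply means there is nothing to run on.
        return []
    return sorted(
        p for p in root.iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )
```

Calling `list_benchmark_images()` before running the notebook is a quick way to confirm which pages will be processed.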

In [1]:
%matplotlib widget
import ctpy as ct
from dotenv import load_dotenv
import os

# Load environment variables from .env file
load_dotenv()

# Use the API key from .env; os.getenv returns None if it is not set
api_key = os.getenv('API_KEY')

gemini_key = os.getenv('VERTEX')

# name will determine the name used in raw_chapters and final_chapters
# url is used only for scrape()
test = ct.issue(
    name='benchmark',
    url = 'benchmark', # pull images from benchmark folder
    api_key = api_key,
    gemini_key=gemini_key,
    )

# this will delete everything in raw_chapters/benchmark
# comment it out to keep saved GPT responses across restarts
test.clear_directories()

# you can rerun this every time
test.scrape()

'''
Increasing scale_factor can improve results,
BUT it also makes each request more expensive.

Play around with it if you want to.
'''
test.downsample(scale_factor=1.0)

! gcloud config set project comictranslator-407423
! gcloud auth login

Updated property [core/project].
Your browser has been opened to visit:

    https://accounts.google.com/o/oauth2/auth?response_type=code&client_id=32555940559.apps.googleusercontent.com&redirect_uri=http%3A%2F%2Flocalhost%3A8085%2F&scope=openid+https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fuserinfo.email+https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fcloud-platform+https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fappengine.admin+https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fsqlservice.login+https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fcompute+https%3A%2F%2Fwww.googleapis.com%2Fauth%2Faccounts.reauth&state=zG9sSSLNUowPrcIsC2J3PBmmtqE1e4&access_type=offline&code_challenge=GUwo0a7djWn9rSUb1ZKLl8PwLouLBk3xMoZ27nbwqPY&code_challenge_method=S256


You are now logged in as [zwilson171@gmail.com].
Your current project is [comictranslator-407423].  You can change this setting by running:
  $ gcloud config set project PROJECT_ID
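
To get a feel for the scale_factor trade-off mentioned in the cell above, the sketch below computes the resized dimensions and how the pixel count grows. It assumes cost roughly tracks the number of pixels sent, which may not match actual API pricing, and it is not how `test.downsample` is implemented; `scaled_size` and `relative_pixel_count` are hypothetical helpers.

```python
def scaled_size(width, height, scale_factor):
    """Return (width, height) after scaling, rounded to whole pixels (min 1)."""
    if scale_factor <= 0:
        raise ValueError("scale_factor must be positive")
    return (max(1, round(width * scale_factor)),
            max(1, round(height * scale_factor)))

def relative_pixel_count(scale_factor):
    """Pixel count relative to the original image; grows with the square."""
    return scale_factor ** 2

# Compare a few settings for an 800x1200 page.
for s in (0.5, 1.0, 2.0):
    print(s, scaled_size(800, 1200, s), relative_pixel_count(s))
```

Because the pixel count grows with the square of the factor, doubling scale_factor quadruples the amount of image data.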


In [2]:
# pages can be combined, but we will leave them separate in the test

#test.combine_pages(1,2)

In [3]:
'''
We need to draw boxes around all text in the images.
I tried to automate this, but pytesseract did not perform well.

Use "next" to go to the next image.

At the end things will lag because it is
running OCR (optical character recognition) on each image.
'''
drawer = ct.BoxDrawer(test)
if False:
    drawer.draw()
    drawer.save(name='benchmark')
else:
    drawer.load(name='benchmark')
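
The save/load toggle above lets hand-drawn boxes be reused across runs. As a minimal sketch of that idea, assuming boxes are stored as `[x0, y0, x1, y1]` lists keyed by image name in a JSON file (a format invented for this example, not necessarily what `ct.BoxDrawer` writes):

```python
import json
from pathlib import Path

def save_boxes(boxes, path):
    """Write {image_name: [[x0, y0, x1, y1], ...]} to a JSON file."""
    Path(path).write_text(json.dumps(boxes, indent=2), encoding="utf-8")

def load_boxes(path):
    """Read boxes back; return an empty dict if nothing was saved yet."""
    p = Path(path)
    if not p.exists():
        return {}
    return json.loads(p.read_text(encoding="utf-8"))
```

Keeping the boxes in a plain file means the slow manual drawing step only has to happen once per benchmark image.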

In [4]:
#test.add_boxes_to_images()

In [5]:
test.perform_ocr_on_all_images()

In [6]:
'''
This method combines the full-color image with the image used for OCR.

This also makes it more expensive.
It might not be needed, but I do think it helps a lot.
'''
test.tile_page_for_gpt()

In [7]:
test.resize_gpt(scale=2.0)

In [8]:
'''
Pipe those images to the AI.
The prompt will be saved in raw_chapters/benchmark/prompts
The response will be saved in raw_chapters/benchmark/response and in raw_chapters/benchmark/text

To redo just one of the pages, delete its file:

raw_chapters/benchmark/text/page_num_#.txt

'''
test.translate(model='gemini')

Gemini failure on  page_num_2.jpeg
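
The "delete the text file to redo a page" behaviour described above amounts to a file-based cache. A minimal sketch of that check, assuming each page image is a .jpeg and its result is a .txt with the same stem (the helper name `pages_needing_translation` is hypothetical and this is not ctpy's actual code):

```python
from pathlib import Path

def pages_needing_translation(image_dir, text_dir):
    """Return image stems that have no matching .txt file in text_dir."""
    done = {p.stem for p in Path(text_dir).glob("*.txt")}
    return sorted(
        p.stem for p in Path(image_dir).glob("*.jpeg")
        if p.stem not in done
    )
```

With this layout, a page that failed (such as page_num_2 above) would show up in the returned list until its text file exists.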


In [9]:
bbbbbb

NameError: name 'bbbbbb' is not defined

In [None]:
# The PDF is formatted automatically.
# The layout can look odd if your images are too big or too small.
test.make_pdf()

# This test may report 'failed response'.
# Check the response; sometimes GPT simply refuses.
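
Since a model sometimes refuses or fails outright, one possible way to handle it is to retry a few times and check each reply before accepting it. The sketch below is generic and does not reflect how ctpy handles failures; `call_with_retries`, `request`, and `is_valid` are names made up for this example.

```python
import time

def call_with_retries(request, is_valid, attempts=3, delay_seconds=0.0):
    """Call request() until is_valid(result) is true or attempts run out.

    Returns (result, ok). Exceptions from request() count as failed attempts.
    """
    result = None
    for attempt in range(attempts):
        try:
            result = request()
        except Exception:
            # Treat any error from the request as a failed attempt.
            result = None
        if result is not None and is_valid(result):
            return result, True
        if attempt < attempts - 1:
            time.sleep(delay_seconds)
    return result, False
```

A validity check could be as simple as `lambda text: "BEGIN_TEXT" in text`, so that a plain refusal triggers another attempt instead of being saved as the page's result.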


In [None]:
gemini_key = os.getenv('VERTEX')
! gcloud config set project comictranslator-407423
! gcloud auth login

In [None]:
import textwrap

import google.generativeai as genai
import PIL.Image as Image
from IPython.display import Markdown

def to_markdown(text):
  """Render model output as a Markdown blockquote, turning '•' bullets into list items."""
  text = text.replace('•', '  *')
  return Markdown(textwrap.indent(text, '> ', predicate=lambda _: True))

genai.configure(api_key=gemini_key)

# Page to re-check, and the issue it belongs to
fname = 'page_num_3'
issuename = 'benchmark'

# Subfolder of raw_chapters/<issuename>/ that holds the page image
image = 'images'

# Load the page image with PIL
image1 = Image.open(f'raw_chapters/{issuename}/{image}/{fname}.jpeg')

if False:
  # Disabled alternative: ask the model to review a previous result
  with open(f'raw_chapters/{issuename}/text/{fname}.txt', 'r') as f:
    output=f.read()
  text = ''' I will provide the previous result of an analysis to transcribe and translate a comic I wrote.
    I will also provide the image itself. The base panel, rotated 90 degrees to the left, is at the top.
    Below the panel are zooms of all relevant text on the panel.

    You will make sure that no mistakes have been made when transcribing the Korean text. You will correct any mistakes.
    You will make sure the Korean has been translated correctly, and you will correct it if it is incorrect.
    You will verify that the English sounds natural, and you will change it if it does not.

    Here is the previous result:

    '''
  text+= output
elif True:
  # Reuse the prompt previously saved for this page
  with open(f'raw_chapters/{issuename}/prompt/{fname}.txt', 'r') as f:
    text=f.read()
else:
  # Disabled alternative: a generic transcribe-and-translate prompt
  text = '''
  Can you help me transcribe the Korean text that appears on this comic page?
  After you transcribe it, can you translate it to English for me?

  Please respond with this format

  IMAGE_DESCRIPTION
  describe the image
  BEGIN_TEXT
  KOREAN_TEXT
  text here
  ENGLISH_TEXT
  text here
  END_TEXT

  Thank you for your help!
  '''
model = genai.GenerativeModel('gemini-pro-vision')
responses = model.generate_content(
  [image1, text],
  generation_config={
      "max_output_tokens": 2048,
      "temperature": 0.5,
      #"top_p": 1.0,
      #"top_k": 32,
  },
  )
  
to_markdown(responses.text)
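
The generic prompt in the cell above asks for a fixed layout (IMAGE_DESCRIPTION, then BEGIN_TEXT / KOREAN_TEXT / ENGLISH_TEXT / END_TEXT). A reply in that layout could be split into fields with a small parser like the sketch below. It assumes each marker sits on its own line and that a reply may contain several BEGIN_TEXT blocks; `parse_response` is a hypothetical helper, not part of ctpy.

```python
def parse_response(text):
    """Split a reply in the prompt's layout into a description and text pairs."""
    description, blocks = [], []
    current, field = None, None
    for raw in text.splitlines():
        line = raw.strip()
        if line == "IMAGE_DESCRIPTION":
            field = "description"
        elif line == "BEGIN_TEXT":
            current, field = {"korean": [], "english": []}, None
        elif line == "KOREAN_TEXT" and current is not None:
            field = "korean"
        elif line == "ENGLISH_TEXT" and current is not None:
            field = "english"
        elif line == "END_TEXT" and current is not None:
            # Close the block and join the collected lines for each field.
            blocks.append({k: "\n".join(v).strip() for k, v in current.items()})
            current, field = None, None
        elif field == "description" and current is None:
            description.append(line)
        elif field in ("korean", "english") and current is not None:
            current[field].append(line)
    return "\n".join(description).strip(), blocks
```

For example, `parse_response(responses.text)` would return the description plus a list of `{"korean": ..., "english": ...}` dicts. An empty list suggests the model did not follow the layout or declined to answer.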


In [None]:
with open('test.txt', 'w') as f:
    f.write(str(vars(responses)))

In [None]:
# Show the raw text of the reply
responses.text

In [None]:
with open(f'raw_chapters/{issuename}/text/{fname}.txt', 'r') as f:
    prev=f.read()

to_markdown(prev)