import code in images (OCR) #837

Merged
hatemhosny merged 10 commits into develop from import-image
Jun 5, 2025

@hatemhosny
Collaborator

@hatemhosny hatemhosny commented May 29, 2025

What type of PR is this? (check all applicable)

  • ✨ Feature
  • 🐛 Bug Fix
  • 📝 Documentation Update
  • 🎨 Style
  • ♻️ Code Refactor
  • 🔥 Performance Improvements
  • ✅ Test
  • 🤖 Build
  • 🔁 CI
  • 📦 Chore (Release)
  • ⏩ Revert
  • 🌐 Internationalization / Translation

Description

This PR allows importing code in images (local or via URL).

Tesseract.js is used for client-side OCR.

Language detection is performed using highlight.js.

Best results are obtained when the image is generated using LiveCodes "Code to Image" feature, with a share URL added to the image.

Added tests?

  • 👍 yes
  • 🙅 no, because they aren't needed
  • 🙋 no, because I need help

Added to documentation?

  • 📓 docs (./docs)
  • 📕 storybook (./storybook)
  • 📜 README.md
  • 🙅 no documentation needed

Demo

https://import-image.livecodes.pages.dev/?screen=import

code_to_image - 2025-05-30T021000 450

I would appreciate opinions, suggestions and code review.
@BassemHalim @Seth0x41 @zyf722

@netlify

netlify Bot commented May 29, 2025

Deploy Preview for livecodes ready!

Name Link
🔨 Latest commit b192195
🔍 Latest deploy log https://app.netlify.com/projects/livecodes/deploys/683c690da957710008ead34b
😎 Deploy Preview https://deploy-preview-837--livecodes.netlify.app

@cloudflare-workers-and-pages

cloudflare-workers-and-pages Bot commented May 29, 2025

Deploying livecodes with Cloudflare Pages

Latest commit: b192195
Status: ✅  Deploy successful!
Preview URL: https://0f0a5dbf.livecodes.pages.dev
Branch Preview URL: https://import-image.livecodes.pages.dev

@BassemHalim
Contributor

Hi Dr @hatemhosny

That's a cool feature!
It works great when there are no line numbers in the image, but when the image has line numbers, the numbers are copied and all text is left-aligned, as seen below.

test url: https://external-content.duckduckgo.com/iu/?u=https%3A%2F%2Fwww.amincharoliya.com%2F_next%2Fimage%3Furl%3D%252Fimages%252Farticles%252Fjs-class.png%26w%3D3840%26q%3D75&f=1&nofb=1&ipt=e43e570eab9f3b1625e7053470800c5456ac9a52b5fa7fa3f8fc7c7d1cc162b5

image

I think removing numbers from the start of each line and reformatting would be enough

@hatemhosny
Collaborator Author

Thank you, @BassemHalim
That's a nice suggestion 👍
I will try that isA.

@hatemhosny
Collaborator Author

hatemhosny commented May 30, 2025

@BassemHalim

I think removing numbers from the start of each line and reformatting would be enough

Line numbers are now removed.

image

However, formatting would be difficult!
Prettier and most formatters will not format if there are any syntax errors.
So any error with OCR will prevent formatting.

Biome is a formatter that is compatible with Prettier. It formats blocks that have no errors and ignores those that do, but it still lags behind in language support.
Maybe we can plan to move to Biome later.
For now, I think we should just let the user fix the errors and then trigger formatting manually.

@github-actions
Contributor

github-actions Bot commented May 30, 2025

Size Change: +12 kB (+1.19%)

Total Size: 1.02 MB

Filename Size Change
./build/livecodes/app.js 110 kB +924 B (+0.84%)
./build/livecodes/import-src.js 18.3 kB +829 B (+4.76%) 🔍
./build/livecodes/import.js 16.9 kB +10.1 kB (+146.92%) 🆘
ℹ️ View Unchanged
Filename Size Change
./build/404.html 1 kB 0 B
./build/app.html 250 B 0 B
./build/index.html 2.47 kB +1 B (+0.04%)
./build/livecodes/app.css 22.5 kB +44 B (+0.2%)
./build/livecodes/assets.js 8.61 kB +25 B (+0.29%)
./build/livecodes/assets/noop.js 18 B 0 B
./build/livecodes/assets/templates/diagrams-starter.html 2.19 kB 0 B
./build/livecodes/backup.js 3.69 kB +3 B (+0.08%)
./build/livecodes/blockly.js 15.6 kB -6 B (-0.04%)
./build/livecodes/broadcast.js 1.18 kB -1 B (-0.08%)
./build/livecodes/bundle-types.js 4.35 kB 0 B
./build/livecodes/code-to-image.js 9.09 kB +120 B (+1.34%)
./build/livecodes/codejar.js 17.5 kB 0 B
./build/livecodes/codemirror.js 6.29 kB -1 B (-0.02%)
./build/livecodes/compile.page.js 2.36 kB -12 B (-0.51%)
./build/livecodes/compile.worker.js 14.1 kB +16 B (+0.11%)
./build/livecodes/compiler-utils.js 3.14 kB +3 B (+0.1%)
./build/livecodes/custom-editor-utils.js 198 B 0 B
./build/livecodes/deploy.js 6.86 kB +1 B (+0.01%)
./build/livecodes/editor-settings.js 17.5 kB -5 B (-0.03%)
./build/livecodes/embed-ui.js 5.51 kB -9 B (-0.16%)
./build/livecodes/embed.js 89 kB +94 B (+0.11%)
./build/livecodes/export.js 3.87 kB +1 B (+0.03%)
./build/livecodes/firebase.js 22.6 kB 0 B
./build/livecodes/format.worker.js 13.3 kB -2 B (-0.02%)
./build/livecodes/google-fonts.js 7.12 kB 0 B
./build/livecodes/headless.js 77.7 kB +35 B (+0.05%)
./build/livecodes/i18n-ar-language-info.json 5.05 kB 0 B
./build/livecodes/i18n-ar-translation.json 9.29 kB 0 B
./build/livecodes/i18n-de-language-info.json 5.08 kB 0 B
./build/livecodes/i18n-de-translation.json 9.41 kB 0 B
./build/livecodes/i18n-en-language-info.json 4.5 kB 0 B
./build/livecodes/i18n-en-translation.json 8.02 kB +21 B (+0.26%)
./build/livecodes/i18n-es-language-info.json 4.81 kB 0 B
./build/livecodes/i18n-es-translation.json 9.15 kB 0 B
./build/livecodes/i18n-fr-language-info.json 5.01 kB 0 B
./build/livecodes/i18n-fr-translation.json 9.39 kB 0 B
./build/livecodes/i18n-hi-language-info.json 5.48 kB 0 B
./build/livecodes/i18n-hi-translation.json 9.92 kB 0 B
./build/livecodes/i18n-it-language-info.json 4.86 kB 0 B
./build/livecodes/i18n-it-translation.json 9.22 kB 0 B
./build/livecodes/i18n-ja-language-info.json 5.32 kB 0 B
./build/livecodes/i18n-ja-translation.json 9.57 kB 0 B
./build/livecodes/i18n-pt-language-info.json 4.85 kB 0 B
./build/livecodes/i18n-pt-translation.json 9.33 kB 0 B
./build/livecodes/i18n-ru-language-info.json 5.4 kB 0 B
./build/livecodes/i18n-ru-translation.json 10.3 kB 0 B
./build/livecodes/i18n-ur-language-info.json 5.54 kB 0 B
./build/livecodes/i18n-ur-translation.json 9.75 kB 0 B
./build/livecodes/i18n-zh-CN-language-info.json 4.75 kB 0 B
./build/livecodes/i18n-zh-CN-translation.json 8.61 kB 0 B
./build/livecodes/i18n.js 20.2 kB +25 B (+0.12%)
./build/livecodes/index.js 5.33 kB +4 B (+0.08%)
./build/livecodes/lang-art-template-compiler.js 1.63 kB 0 B
./build/livecodes/lang-assemblyscript-compiler.js 290 B 0 B
./build/livecodes/lang-assemblyscript-script.js 386 B 0 B
./build/livecodes/lang-astro-compiler.js 2.32 kB +1 B (+0.04%)
./build/livecodes/lang-clio-compiler.js 1.53 kB 0 B
./build/livecodes/lang-commonlisp-script.js 123 B 0 B
./build/livecodes/lang-cpp-script.js 1.73 kB 0 B
./build/livecodes/lang-cpp-wasm-script.js 2.82 kB 0 B
./build/livecodes/lang-csharp-wasm-script.js 2.16 kB 0 B
./build/livecodes/lang-diagrams-compiler-esm.js 5.08 kB 0 B
./build/livecodes/lang-dot-compiler.js 1.64 kB 0 B
./build/livecodes/lang-ejs-compiler.js 1.61 kB 0 B
./build/livecodes/lang-eta-compiler.js 1.63 kB 0 B
./build/livecodes/lang-fennel-compiler.js 1.59 kB 0 B
./build/livecodes/lang-gleam-compiler.js 11.4 kB -12 B (-0.11%)
./build/livecodes/lang-haml-compiler.js 1.62 kB 0 B
./build/livecodes/lang-handlebars-compiler.js 1.9 kB 0 B
./build/livecodes/lang-imba-compiler.js 147 B 0 B
./build/livecodes/lang-java-script.js 4.03 kB 0 B
./build/livecodes/lang-jinja-compiler.js 1.63 kB 0 B
./build/livecodes/lang-julia-script.js 3.27 kB 0 B
./build/livecodes/lang-liquid-compiler.js 1.65 kB 0 B
./build/livecodes/lang-lua-wasm-script.js 205 B 0 B
./build/livecodes/lang-malina-compiler.js 13.5 kB -28 B (-0.21%)
./build/livecodes/lang-mustache-compiler.js 1.63 kB 0 B
./build/livecodes/lang-nunjucks-compiler.js 1.91 kB 0 B
./build/livecodes/lang-perl-script.js 268 B 0 B
./build/livecodes/lang-php-wasm-script.js 347 B 0 B
./build/livecodes/lang-postgresql-compiler-esm.js 1.71 kB 0 B
./build/livecodes/lang-prolog-script.js 204 B 0 B
./build/livecodes/lang-pug-compiler.js 371 B 0 B
./build/livecodes/lang-python-wasm-script.js 1.9 kB 0 B
./build/livecodes/lang-r-script-esm.js 2.41 kB 0 B
./build/livecodes/lang-rescript-compiler-esm.js 2.13 kB 0 B
./build/livecodes/lang-rescript-formatter.js 1.49 kB 0 B
./build/livecodes/lang-riot-compiler.js 13.4 kB -19 B (-0.14%)
./build/livecodes/lang-ruby-wasm-script.js 1.68 kB +1 B (+0.06%)
./build/livecodes/lang-scss-compiler.js 1.69 kB 0 B
./build/livecodes/lang-solid-compiler.js 263 B 0 B
./build/livecodes/lang-sql-compiler.js 1.62 kB 0 B
./build/livecodes/lang-sql-script.js 1.93 kB 0 B
./build/livecodes/lang-svelte-compiler.js 15 kB +8 B (+0.05%)
./build/livecodes/lang-tcl-script.js 1.8 kB 0 B
./build/livecodes/lang-teal-compiler.js 1.69 kB 0 B
./build/livecodes/lang-twig-compiler.js 1.62 kB 0 B
./build/livecodes/lang-vento-compiler.js 1.65 kB 0 B
./build/livecodes/lang-vue-compiler.js 16.4 kB +20 B (+0.12%)
./build/livecodes/lang-vue2-compiler.js 14 kB -20 B (-0.14%)
./build/livecodes/lang-wat-compiler.js 348 B 0 B
./build/livecodes/lang-wat-script.js 1.55 kB 0 B
./build/livecodes/language-info.js 7.65 kB -4 B (-0.05%)
./build/livecodes/monaco-lang-astro.js 947 B 0 B
./build/livecodes/monaco-lang-clio.js 639 B 0 B
./build/livecodes/monaco-lang-imba.js 7.35 kB 0 B
./build/livecodes/monaco-lang-wat.js 2.46 kB 0 B
./build/livecodes/monaco.js 10.1 kB 0 B
./build/livecodes/open.js 6.17 kB -6 B (-0.1%)
./build/livecodes/processor-lightningcss-compiler.js 1.85 kB 0 B
./build/livecodes/processor-postcss-compiler.js 10.4 kB -50 B (-0.48%)
./build/livecodes/processor-tailwindcss-compiler.js 11.3 kB -19 B (-0.17%)
./build/livecodes/processor-unocss-compiler.js 355 B 0 B
./build/livecodes/processor-windicss-compiler.js 450 B 0 B
./build/livecodes/quill.js 5.72 kB 0 B
./build/livecodes/quill.css 697 B 0 B
./build/livecodes/resources.js 3.43 kB +3 B (+0.09%)
./build/livecodes/result-utils.js 1.17 kB 0 B
./build/livecodes/share.js 3.78 kB 0 B
./build/livecodes/snippets.js 16.8 kB -33 B (-0.2%)
./build/livecodes/sync-ui.js 3.23 kB +1 B (+0.03%)
./build/livecodes/sync.js 3.52 kB -1 B (-0.03%)
./build/livecodes/sync.worker.js 29.7 kB 0 B
./build/livecodes/templates.js 24.6 kB 0 B
./build/sdk/livecodes.js 3.91 kB 0 B
./build/sdk/livecodes.umd.js 3.98 kB 0 B
./build/sdk/package.json 293 B 0 B
./build/sdk/react.js 4.22 kB 0 B
./build/sdk/vue.js 4.3 kB 0 B

compressed-size-action

@BassemHalim
Contributor

Hi @hatemhosny

I tested it some more, and it now works with most images that have line numbers, but sometimes it still leaves the line numbers in. I'm not sure why; I will double-check the regex.

I tested with this photo and it didn't remove the numbers.
image

However, formatting would be difficult!
Prettier and most formatters will not format if there are any syntax errors.
So any error with OCR will prevent formatting.

That makes sense, I agree we can leave it as is and let the user fix it. We can't expect perfect results from OCR.

@hatemhosny
Collaborator Author

hatemhosny commented May 30, 2025

Thank you, @BassemHalim

I tested with this photo and it didn't remove the numbers.

To check for line numbers, I look for numbers followed by a space.
I used to check that most lines (> 70%) contain these, so that we do not accidentally remove numbers in code.

The problem was that OCR inserted empty lines, every other line. So that check failed.
I decreased the threshold to 30%. See f79b013

Please have a look now.
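
The check described above can be sketched roughly as follows. This is a minimal illustration and not the code from the referenced commit: the function name `stripLineNumbers`, the exact regular expression, and the way the threshold is passed are all assumptions. It only strips leading numbers when the share of lines that start with a number followed by whitespace is above the threshold, so empty lines inserted by OCR lower the ratio without defeating the check at 30%.

```typescript
// Hypothetical helper (not the PR's actual code): remove leading line numbers
// only when enough lines look numbered, to avoid stripping numbers in real code.
function stripLineNumbers(text: string, threshold = 0.3): string {
  const lines = text.split('\n');
  // A "numbered" line starts with optional whitespace, digits, then whitespace.
  const numbered = /^\s*\d+\s/;
  const matches = lines.filter((line) => numbered.test(line)).length;
  // Below the threshold we assume the digits belong to the code itself.
  if (matches / lines.length <= threshold) return text;
  return lines.map((line) => line.replace(numbered, '')).join('\n');
}

// OCR output with an empty line after every numbered line (3 of 5 lines match).
const ocrOutput = '1 const a = 1;\n\n2 const b = 2;\n\n3 console.log(a + b);';
const cleaned = stripLineNumbers(ocrOutput);
```

With a 70% threshold the same input would be left untouched, because only 60% of its lines are numbered.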

@hatemhosny
Collaborator Author

./build/livecodes/import.js | 16.7 kB | +9.81 kB (+143.41%) | 🆘

The size check shows a significant increase in the import chunk bundle size.

This is related to this issue: #840
I will just keep this as it is for now and then fix it with that planned refactor.

@zyf722
Contributor

zyf722 commented May 31, 2025

Hi @hatemhosny,

Thanks for the opportunity to review this PR - looks like another great feature!

I have a few comments and questions:

  1. When a user first tries to import from code, a .traineddata file is downloaded, but there's no visual feedback in the UI while that happens. It would be nice to display a progress bar or some other indication that something is in progress in the background.

  2. It seems that the detection quality for images with light theme is not as good as for dark theme. I tested this with the same code snippet rendered using both light and dark themes and had different results. For example:

  3. I noticed the current implementation relies on detecting a LiveCodes share URL pattern in the last lines of the OCR'd text to extract the project ID.

    Since PNG images are being generated, would it be feasible to embed the project ID directly into the PNG metadata (e.g., using a custom text chunk)? This ID could then be read back during import. This approach could potentially make the import process more robust by removing the dependency on OCR accuracy for the URL pattern. Not sure about the implementation effort, but it might be an interesting alternative to consider.

const lastLines = lines.slice(-2).join('\n');

// detect images created by LiveCodes "Code to Image" with share URL
const shareUrlPattern = /\?x=(id\/\S{11,20})/g;
Contributor

To make the detection more accurate, would it be beneficial to check for a base URL or similar prefix here?

Collaborator Author

@hatemhosny hatemhosny May 31, 2025

I'm not sure.
I want to support:

  • hosted app: livecodes.io
  • permanent URLs: v46.livecodes.io
  • preview URLs: import-image.livecodes.pages.dev
  • self-hosted instances: live-codes.github.io/livecodes (or any self-hosted URL)
  • localhost:8080
  • an app should be able to import images generated by other apps

Do you have a better suggestion?

Collaborator Author

Anyway, this should be a lot less relevant after using PNG metadata.
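
To illustrate the host-agnostic matching discussed in this thread, here is a small sketch built around the pattern from the snippet under review. The helper name `findShareId` is hypothetical, and the `g` flag is dropped on purpose so that repeated calls do not depend on the regex's `lastIndex` state. Because the pattern anchors only on the `?x=id/...` query, it matches the hosted app, preview URLs, self-hosted instances, and localhost alike.

```typescript
// Pattern copied from the reviewed snippet (without the `g` flag).
const shareUrlPattern = /\?x=(id\/\S{11,20})/;

// Hypothetical helper: look for a share URL in the last two lines of OCR text
// and return the captured project ID (e.g. "id/abcdefghijk"), if any.
function findShareId(ocrText: string): string | undefined {
  const lines = ocrText.trim().split('\n');
  const lastLines = lines.slice(-2).join('\n');
  return shareUrlPattern.exec(lastLines)?.[1];
}

// The same ID is found regardless of which host produced the image.
const fromHosted = findShareId('console.log(1);\nhttps://livecodes.io/?x=id/abcdefghijk');
const fromLocal = findShareId('console.log(1);\nhttp://localhost:8080/?x=id/abcdefghijk');
```

The trade-off raised in the review still applies: without a base URL check, any text that happens to contain `?x=id/` followed by 11 to 20 non-space characters in the last two lines would match.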

@hatemhosny
Collaborator Author

hey @zyf722
Great suggestions as usual!
Thank you.

When a user first tries to import from code, a .traineddata file is downloaded, but there's no visual feedback in the UI while that happens. It would be nice to display a progress bar or some other indication that something is in progress in the background.

You are right. I added a loading notification.

It seems that the detection quality for images with light theme is not as good as for dark theme. I tested this with the same code snippet rendered using both light and dark themes and had different results.

This is the OCR quality of tesseract.js. I'm not sure we can improve that.
This is the best OCR library I found that works client-side.

Since PNG images are being generated, would it be feasible to embed the project ID directly into the PNG metadata (e.g., using a custom text chunk)? This ID could then be read back during import. This approach could potentially make the import process more robust by removing the dependency on OCR accuracy for the URL pattern. Not sure about the implementation effort, but it might be an interesting alternative to consider.

That's such a nice suggestion. I found meta-png, a tiny library that reads and writes PNG metadata.
I implemented that and it had the following benefits:

  • significant performance improvement.
  • much more accurate.
  • works even when share URL is not added to the image.

I looked for something similar for JPG. I only found exiftool-vendored, which is a much larger library, and I'm not even sure I'd be able to get it to work in the browser.
So, I'll just do that for PNG (which is the default format).
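
For readers unfamiliar with PNG text chunks, the idea can be sketched without any library. The code below is not meta-png's API and not the code in this PR; it is a hand-rolled illustration of writing and reading a `tEXt` chunk as laid out in the PNG specification (length, type, data, CRC-32 over type and data). The function names and the `livecodes-id` keyword are assumptions made for the example.

```typescript
const PNG_SIGNATURE = [137, 80, 78, 71, 13, 10, 26, 10];

// CRC-32 as used by PNG chunks (reflected polynomial 0xEDB88320).
function crc32(bytes: Uint8Array): number {
  let crc = 0xffffffff;
  for (let i = 0; i < bytes.length; i++) {
    crc ^= bytes[i];
    for (let bit = 0; bit < 8; bit++) {
      crc = crc & 1 ? (crc >>> 1) ^ 0xedb88320 : crc >>> 1;
    }
  }
  return (crc ^ 0xffffffff) >>> 0;
}

function latin1Encode(text: string): Uint8Array {
  const out = new Uint8Array(text.length);
  for (let i = 0; i < text.length; i++) out[i] = text.charCodeAt(i) & 0xff;
  return out;
}

function latin1Decode(bytes: Uint8Array): string {
  let out = '';
  for (let i = 0; i < bytes.length; i++) out += String.fromCharCode(bytes[i]);
  return out;
}

// Chunk layout: 4-byte big-endian length, 4-byte type, data, 4-byte CRC.
function makeChunk(type: string, data: Uint8Array): Uint8Array {
  const chunk = new Uint8Array(12 + data.length);
  const view = new DataView(chunk.buffer);
  view.setUint32(0, data.length);
  chunk.set(latin1Encode(type), 4);
  chunk.set(data, 8);
  view.setUint32(8 + data.length, crc32(chunk.subarray(4, 8 + data.length)));
  return chunk;
}

// Insert a tEXt chunk ("keyword\0text") right after IHDR, the first chunk.
function addTextChunk(png: Uint8Array, key: string, value: string): Uint8Array {
  const chunk = makeChunk('tEXt', latin1Encode(key + '\0' + value));
  const view = new DataView(png.buffer, png.byteOffset, png.byteLength);
  const ihdrEnd = 8 + 12 + view.getUint32(8);
  const out = new Uint8Array(png.length + chunk.length);
  out.set(png.subarray(0, ihdrEnd), 0);
  out.set(chunk, ihdrEnd);
  out.set(png.subarray(ihdrEnd), ihdrEnd + chunk.length);
  return out;
}

// Walk the chunks and return the text stored under the given keyword.
function readTextChunk(png: Uint8Array, key: string): string | undefined {
  const view = new DataView(png.buffer, png.byteOffset, png.byteLength);
  let offset = 8;
  while (offset + 12 <= png.length) {
    const length = view.getUint32(offset);
    const type = latin1Decode(png.subarray(offset + 4, offset + 8));
    if (type === 'tEXt') {
      const data = png.subarray(offset + 8, offset + 8 + length);
      const sep = data.indexOf(0);
      if (sep >= 0 && latin1Decode(data.subarray(0, sep)) === key) {
        return latin1Decode(data.subarray(sep + 1));
      }
    }
    offset += 12 + length;
  }
  return undefined;
}
```

On import, a reader like `readTextChunk(bytes, 'livecodes-id')` would return the stored ID directly, with no OCR pass over the share URL.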

So you can now generate images using the "Code-to-Image" tool and then try importing them.

Thanks again.

P.S. Importing from CodePen now works again (only pens of PRO accounts):
https://import-image.livecodes.pages.dev/?x=https://codepen.io/jkantner/pen/KKYZrRv

@zyf722
Contributor

zyf722 commented Jun 1, 2025

Hi @hatemhosny,

This is the OCR quality of tesseract.js. I'm not sure we can improve that.
This is the best OCR library I found that works client-side.

Got it. In that case, leaving it as is and letting users manually fix the text after importing sounds like a good plan for now.

So you can now generate images using the "Code-to-Image" tool and then try importing them.

Just tried it, and it is much better! Appreciate the fix.

So, I'll just do that for PNG (which is the default format).

Ah, I forgot there were three target formats for code-to-image. Then perhaps we can do the same for SVGs using the <metadata> tag, as browsers support it natively and no additional libraries are required?

Also, what do you think about adding a clickable tooltip for PNG (and maybe SVG, if we go that route) to explain this embedding benefit to users?

Thanks!

@sonarqubecloud

sonarqubecloud Bot commented Jun 1, 2025

@hatemhosny
Collaborator Author

hatemhosny commented Jun 1, 2025

Ah, I forgot there were three target formats for code-to-image. Then perhaps we can do the same for SVGs using the <metadata> tag, as browsers support it natively and no additional libraries are required?

I don't think we should deal with SVG as images in this sense.
If a user tries to import an SVG, I expect they want to edit the code.
This is how VS Code and other editors handle it. I have personally edited SVG code in LiveCodes many times before.

Also note, the text can be easily copied from SVG images (in case the user wants to copy the code from the SVG screenshot)

I added a commit to handle importing SVG: it opens in the HTML editor.
I also added a fallback: if an image has no text, it is added as an image tag in the HTML (src is a base64 data URL).
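
The routing described here could look roughly like the sketch below. It is only an illustration of the decision order, not the code in the commit: the `ImageImport` shape, the `toEditorContent` name, and the `target` values are all hypothetical.

```typescript
// Hypothetical input shape for an imported image.
interface ImageImport {
  mimeType: string; // e.g. "image/png" or "image/svg+xml"
  source: string; // SVG markup for SVG files, otherwise a base64 data URL
  ocrText: string; // text recognized by OCR ('' when nothing was found)
}

// Hypothetical result: either content for the HTML editor or OCR'd code.
interface EditorContent {
  target: 'html' | 'ocr-code';
  content: string;
}

function toEditorContent(image: ImageImport): EditorContent {
  // SVG is treated as code to edit, so its markup goes to the HTML editor.
  if (image.mimeType === 'image/svg+xml') {
    return { target: 'html', content: image.source };
  }
  // No recognizable text: fall back to embedding the image itself.
  if (image.ocrText.trim() === '') {
    return { target: 'html', content: `<img src="${image.source}" alt="" />` };
  }
  // Otherwise use the OCR result as the imported code.
  return { target: 'ocr-code', content: image.ocrText };
}
```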

@hatemhosny hatemhosny merged commit aef91c2 into develop Jun 5, 2025
21 checks passed
@livecodes-ci
Contributor

livecodes-ci Bot commented Jun 5, 2025

i18n Actions

Source PR has been merged into the default branch.

Maintainers can comment .i18n-update-push to trigger the i18n update workflow and push the changes to Lokalise.

@hatemhosny
Collaborator Author

.i18n-update-push

@livecodes-ci
Contributor

livecodes-ci Bot commented Jun 5, 2025

i18n Actions: .i18n-update-push

Localization updated and pushed to Lokalise.

Name Description
New Branch for i18n i18n/live-codes/import-image
Last Commit SHA aef91c2

Maintainers can comment .i18n-update-pull after translation is done to trigger the i18n pull workflow and pull the changes back to GitHub.

@hatemhosny
Collaborator Author

.i18n-update-pull

@livecodes-ci
Contributor

livecodes-ci Bot commented Jun 6, 2025

i18n Actions: .i18n-update-pull

Localization pulled from Lokalise.

Name Description
i18n Branch i18n/live-codes/import-image
Last Commit SHA 60ae1cb
i18n PR #844

@livecodes-ci livecodes-ci Bot mentioned this pull request Jun 6, 2025
12 tasks
@hatemhosny hatemhosny deleted the import-image branch June 8, 2025 13:41