Demo website down #18

Closed
sjscotti opened this issue May 22, 2022 · 7 comments

Comments

@sjscotti

Hi!
I read your paper and watched your video with interest, and I would like to explore using your code for my application: layout segmentation of ~100-year-old newspapers. I downloaded the repo, but while setting up the Anaconda environment I discovered that a number of the dependencies are Linux-specific and not available for Windows. If no Windows versions are available, I can set up Windows Subsystem for Linux (WSL) and use the code that way. But I would really like to see how your code handles some example images of newspaper pages before I go to the trouble of setting up WSL. So I went to your demo website, https://enherit.paris.inria.fr/, to see if I could use it for this evaluation, but it is down. Could you please set up a new demo website so I can evaluate your repo?
Thanks!

@monniert
Owner

Hi @sjscotti, thanks for raising the issue!

Yes, our demo website is down and we don't have the resources to make it work again yet... Nonetheless, I can try to run a couple of extractions for you when I have time; you can send me 10 images in jpg or png format by email at tom.monnier@enpc.fr and I will forward you the raw results.

Thanks, Tom

@sjscotti
Author

Thanks Tom!
Could you work with .jp2 files (jpeg2000)? My images are in this format. If not, I'll find a way to convert them to jpeg or png.
Regards
-Steve

@monniert
Owner

Yes, I think it can work; otherwise I will convert them.

Tom

@sjscotti
Author

sjscotti commented May 24, 2022 via email

@sjscotti
Author

Hi Tom
To get around the issue of setting up a Linux capability on my Windows machine, I got the idea of using your demo.ipynb notebook on Google Colab. I got it to run and tried out your code on one of the images I emailed you. Besides the images being in .jp2 format, I found that I needed to convert them from grayscale to RGB for them to run correctly. So I converted one in GIMP and exported it to .png format to do a test. I did get results, but they were not very good because my images have a long dimension of 6720 pixels, which is scaled down to 1280 in the code (5.25x smaller!). When I cut out a section of the image that was about 1280x1280 and ran it through the code, it detected lines of text nicely. So I was encouraged by that. Is there a way that your code can easily be modified to handle much larger images without downscaling them?
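
For anyone hitting the same problem, the grayscale-to-RGB conversion described above can be scripted instead of done by hand in GIMP. The following is only a minimal sketch, not part of the repository. It assumes Pillow is installed and was built with OpenJPEG support so that it can read `.jp2` files, and that the scans are 8-bit grayscale. The function name `convert_to_rgb_png` is hypothetical.

```python
from PIL import Image


def convert_to_rgb_png(src_path, dst_path):
    """Open an image (e.g. a grayscale .jp2), convert it to RGB, save it as PNG.

    Assumption: 8-bit grayscale input; 16-bit scans would need rescaling first.
    """
    with Image.open(src_path) as img:
        # For a grayscale image, convert("RGB") copies the single gray
        # channel into the red, green and blue channels.
        rgb = img.convert("RGB")
        rgb.save(dst_path, format="PNG")
    return dst_path
```

A call such as `convert_to_rgb_png("page.jp2", "page.png")` (file names are placeholders) would then produce a PNG that can be uploaded to the notebook.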

@monniert
Owner

Hi Steve, nice workaround running it through Colab! This is indeed a limitation: the neural network was trained on images of size 1280 (roughly), so it cannot handle content at a very different scale. Depending on your application, you either want to downscale the image globally (this is what the current pipeline does, and it works in most cases), or apply the extraction on crops of your original images (this would typically be the case if your HD image has a lot of small and compact content). The second option is quite easy: preprocess your data into overlapping crops and gather them in a folder, apply the extraction on the resulting crops, then merge the results. The last step would require a little work, but I think it can easily be done.
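
The crop-and-merge idea above can be sketched in plain Python. This is an illustrative outline under stated assumptions, not code from the repository: the helper names (`tile_starts`, `crop_boxes`, `to_page_coords`), the 128-pixel overlap, and the idea that each crop's results can be expressed as bounding boxes in crop coordinates are all assumptions made for the example.

```python
def tile_starts(length, tile, overlap):
    """Start offsets along one axis so that tiles of size `tile` cover [0, length)."""
    if not 0 <= overlap < tile:
        raise ValueError("overlap must satisfy 0 <= overlap < tile")
    if length <= tile:
        return [0]  # the whole axis fits in a single tile
    stride = tile - overlap
    starts = list(range(0, length - tile, stride))
    starts.append(length - tile)  # last tile sits flush with the far edge
    return starts


def crop_boxes(width, height, tile=1280, overlap=128):
    """Overlapping (left, upper, right, lower) boxes covering a width x height page."""
    return [
        (x, y, min(x + tile, width), min(y + tile, height))
        for y in tile_starts(height, tile, overlap)
        for x in tile_starts(width, tile, overlap)
    ]


def to_page_coords(box, crop_box):
    """Shift a box found inside a crop back into full-page coordinates."""
    dx, dy = crop_box[0], crop_box[1]
    x0, y0, x1, y1 = box
    return (x0 + dx, y0 + dy, x1 + dx, y1 + dy)


# Example: a page whose long side is 6720 pixels, as in the comment above.
boxes = crop_boxes(6720, 5000)
print(len(boxes), boxes[0], boxes[-1])
```

Each box uses the (left, upper, right, lower) convention accepted by Pillow's `Image.crop`, so the crops could be written to a folder and passed to the extraction. The merge step is only partly covered here: `to_page_coords` moves results back onto the full page, but detections that appear twice in an overlap region would still need to be deduplicated.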

As you succeeded in running the extraction, I suppose that doing the extraction on my side would not provide you additional insights. Let me know if you need more help!

@monniert
Owner

@sjscotti Since you seem to have figured out a solution, I am closing the issue for now; let me know if you need more help
