Transcribing Handwritten Text with Python and Microsoft Azure Computer Vision #511

Closed
giuliataurino opened this issue Oct 6, 2022 · 51 comments

@giuliataurino
Contributor

giuliataurino commented Oct 6, 2022

The Programming Historian has received the following tutorial on 'Transcribe Handwritten Text with Python and Microsoft Azure Computer Vision' by @jeffblackadar. This lesson is now under review and can be read at:

http://programminghistorian.github.io/ph-submissions/en/drafts/originals/transcribing-handwritten-text-with-python-and-azure

Please feel free to use the line numbers provided on the preview if that helps with anchoring your comments, although you can structure your review as you see fit.

I will act as editor for the review process. My role is to solicit two reviews from the community and to manage the discussions, which should be held here on this forum. I have already read through the lesson and provided feedback, to which the author has responded.

Members of the wider community are also invited to offer constructive feedback, which should be posted to this message thread, but they are asked to first read our Reviewer Guidelines (http://programminghistorian.org/reviewer-guidelines) and to adhere to our anti-harassment policy (below). We ask that all reviews stop after the second formal review has been submitted so that the author can focus on any revisions. I will make an announcement on this thread when that has occurred.

I will endeavor to keep the conversation open here on Github. If anyone feels the need to discuss anything privately, you are welcome to email me.

Our dedicated Ombudsperson is Ian Milligan (http://programminghistorian.org/en/project-team). Please feel free to contact him at any time if you have concerns that you would like addressed by an impartial observer. Contacting the ombudsperson will have no impact on the outcome of any peer review.

Anti-Harassment Policy

This is a statement of the Programming Historian's principles and sets expectations for the tone and style of all correspondence between reviewers, authors, editors, and contributors to our public forums.

The Programming Historian is dedicated to providing an open scholarly environment that offers community participants the freedom to thoroughly scrutinize ideas, to ask questions, make suggestions, or request clarification, but also provides a harassment-free space for all contributors to the project, regardless of gender, gender identity and expression, sexual orientation, disability, physical appearance, body size, race, age or religion, or technical experience. We do not tolerate harassment or ad hominem attacks of community participants in any form. Participants violating these rules may be expelled from the community at the discretion of the editorial board. Thank you for helping us to create a safe space.

@anisa-hawes
Contributor

anisa-hawes commented Oct 7, 2022

Hello @giuliataurino and @jeffblackadar,

It is great to see this Issue opened, and the process of review beginning!

A reminder that we host the lesson Markdown file and any related assets (images or data) here on our ph-submissions repo.

I've uploaded the Markdown file to /en/drafts/originals/transcribing-handwritten-text-with-python-and-azure.md, where we encourage you to make direct edits going forwards (no need to use the PR system).

I've uploaded the lesson's images to /images/transcribing-handwritten-text-with-python-and-azure.

@jeffblackadar, you will note that I have removed two images which were in the /images folder on your repo but are not referenced in the lesson Markdown file.

You'll also note that I've adjusted the syntax to display images. We use liquid and we require this format (example):
{% include figure.html filename="file-name.png" alt="Visual description of figure image" caption="Caption text to display" %}. I've filled in the minimum needed, and we can return to add alt-text during the review/revision process.

Jeff, I'd like to ask if you could take another look at the footnotes. I notice that [^3] appears twice, and overall the placement seems a little odd. We can work together to confirm these details if you have any questions.

@giuliataurino, you will notice that I've added in our YAML header, which is required to generate an online preview of the lesson. It is now ready for you to read: http://programminghistorian.github.io/ph-submissions/en/drafts/originals/transcribing-handwritten-text-with-python-and-azure. (I've updated the link you posted in your initial comment).

@jeffblackadar
Collaborator

jeffblackadar commented Oct 7, 2022 via email

@anisa-hawes
Contributor

Apologies, @jeffblackadar. I've just sent you an invitation to be a collaborator on ph-submissions. Please let me know if you've received this? It will enable you to make direct changes to your lesson.

In the meantime, I've merged in your changes.

@jeffblackadar
Collaborator

jeffblackadar commented Oct 8, 2022 via email

@giuliataurino
Contributor Author

Thank you Anisa for helping with the process.

And @jeffblackadar, thank you for the submission, I'll review it and get back to you in the following weeks.

Giulia

@giuliataurino
Contributor Author

giuliataurino commented Dec 14, 2022

Hi @jeffblackadar,



Thank you for this tutorial, I found it very clear and useful. The lesson is overall straightforward and accessible to an entry-level audience, as it provides a detailed description of the tools needed, technical dependencies, and commentary on the code. The walk-through for setting up Azure and transcribing handwriting from images was helpful for making the workflow smooth and easy. The code worked without errors on my end. There is no major revision I envisage, but I do have a few minor suggestions. You can read more detailed feedback and comments below.

General suggestions:

Introduction

  • Section 1: The tutorial includes references and links to previously published lessons on PH English that help the reader deepen the subject and acquire contextual skills. It might be useful to mention other OCR tutorials used for similar tasks but on typewritten documents, as found here: https://programminghistorian.org/en/lessons/working-with-batches-of-pdf-files; https://programminghistorian.org/en/lessons/cleaning-ocrd-text-with-regular-expressions; https://programminghistorian.org/en/lessons/generating-an-ordered-data-set-from-an-OCR-text-file.

  • Section 5: While the objectives and possible challenges of existing OCR models for handwriting transcription are well presented in the introduction, I suggest giving a more explicit example of the cases in which the model is likely not going to perform well (e.g. under-documented languages; certain file formats; poor resolution or unclear handwriting).

  • Section 6: You might want to expand on the choice of focusing on Azure, as opposed to other software, for this lesson - maybe in relation to existing case studies or for usability reasons, or else in connection with technical accessibility (e.g. was it already used in other digital humanities projects? is the software well documented? does it need some amount of coding, hence the tutorial?). Additionally, you can add a brief section with a description of the goals of this tutorial and its usability for further applications.

Prerequisites

  • You will want to make sure that the prerequisite skills are correctly listed. In this respect, what are the platforms supported (e.g. Windows, SaaS/Web)? Is prior knowledge of Python required or recommended, given that the tutorial provides the full code? 


Procedure

  • Section 17-18: It would be useful to add context and guidelines for the choice of this specific document, both in terms of content (e.g. language) and format (e.g. image quality, file, etc.).



Summary

  • Finally, the lesson might benefit from adding further commentary in the summary or conclusion on the performance of the model in this specific instance, possible applications in research, limits of this service and alternative tools available. A few questions that you can answer (although feel free to expand on other questions if you think they are more insightful) include: Can the model be re-trained on a different set of data? If the reader does have knowledge of Python, can this code be adapted to perform more advanced tasks (e.g. loop through multiple files)? Which part of the code - if any - is likely to return an error?

Specific comments:
(@anisa-hawes, feel free to step in if I’m missing some aspects regarding the editorial guidelines)

Thank you again for your time and work.

@anisa-hawes
Contributor

Thank you, @giuliataurino.

I've corrected the link in the introductory paragraph – there was a rogue bracket causing a problem!

I've also made some interventions where I notice that numbering sequences are interrupted by a figure or a code block and then re-start from 1. This problem is fixed in Markdown by adding a backslash after the number like this: 1\. 2\. but I'm not 100% sure I've caught all of them... @jeffblackadar I wonder if you could double check this?

I'd like to recommend reducing/removing the numbered sequences within the sections. Within such a short lesson, I think the sub-section numbering could become more confusing than clarifying. For example, in Section 4. Install Azure Computer Vision on your machine, there are two numbered sub-steps. I think these could simply be sentences:

Create a new cell in your notebook [...]

Create another new cell [...]

Let me know what you both think.

@jeffblackadar
Collaborator

jeffblackadar commented Jan 8, 2023 via email

@anisa-hawes
Contributor

Happy New Year, @jeffblackadar.

Thank you for your responses and updates. I'm still not 100% sure about the numbering. I am going to have another look, but my initial instinct is that the sub-sections numbered 1-6 within Images to Transcribe may be better as simple sub-titles. Sub-sub-section 3.B. Create a notebook looks a little odd in the table of contents.

I'm tagging @giuliataurino here + above to ensure a notification.

@jeffblackadar
Collaborator

jeffblackadar commented Feb 11, 2023 via email

@anisa-hawes
Contributor

Dear @jeffblackadar,

Thank you for your message. There's nothing further we need from you at this stage. @giuliataurino is coordinating the peer-review process and will be in touch to let you know who will be contributing.

@jeffblackadar
Collaborator

jeffblackadar commented Mar 8, 2023 via email

@giuliataurino
Contributor Author

Dear @jeffblackadar,

Apologies for the delay. I'm glad to announce that @mdermentzi will be reviewing your submission!
She will be posting her review as a comment to this GitHub issue in the following month or so.

Thank you for your patience as we move forward towards publication.

Best,

Giulia

@jeffblackadar
Collaborator

jeffblackadar commented Apr 21, 2023 via email

@mdermentzi

Hi @jeffblackadar,

Thank you for this very useful tutorial. You’ve made it accessible enough so that someone without prior experience with Python or handwritten text recognition can follow along and start transcribing handwritten documents with minimal effort. My view is that it will be valuable to historians as well as archivists who need to perform such work for research or cataloging purposes. Once it’s published, I will definitely start using it and recommend it to the historians and archivists with whom I collaborate.

I didn’t notice any serious issues with this tutorial. Most of my suggestions seek to simplify the structure or anticipate questions that beginners might have while following the steps. Mind you, I’m reviewing this lesson having just a week ago delivered a how-to workshop to historians with varying tech skills using Google Colab; seeing what they struggled with, my comments are aimed at ensuring that even complete beginners will find this tutorial as easy as possible. I didn’t focus on copy editing or compliance with the author guidelines, leaving these for the editors to check.

Overall, my main suggestion would be to make it clear early on that you recommend users follow this tutorial using Google Colab and prioritise this type of users throughout your instructions. This will make the prerequisites section more straightforward and the tutorial easier to follow. If you choose to do this, adding more screenshots of the Colab environment is also important.

Here’s my detailed feedback:

At the beginning of the tutorial, the reader might benefit from a short and concise Learning Objectives or learning goals section similar to other Programming Historian tutorials.

Par 1, final sentence – add OCR abbreviation in this paragraph and from here onward use OCR

Paragraphs 2, 3 & 4: It is quite possible that I might have missed something but, having tried to find out what model is powering the Azure computer vision service showcased in this tutorial, my understanding is that Microsoft does not clarify what model architectures (or what datasets as the tutorial rightly points out) they have used to train the models powering their APIs. For this reason, a focus on CNNs might be a bit redundant; it is perhaps giving the idea that this is what is powering the Azure service, which may or may not be true. For example (and this is not my area of expertise–I only did a quick search for this so I could be wrong), in recent years, transformers have also been used for OCR. My suggestion, therefore, would be to remove direct mentions of CNNs, as they might additionally be alienating to beginners.

If, however, the purpose of referring to the CNNs is to provide more context about the progress of this field, my suggestion would be to add a disclaimer clarifying that we don’t know how the Azure service, which is showcased later in the tutorial, works. Additionally, if this part is kept, I would suggest expanding on it a bit more, citing even key papers that led to relevant breakthroughs. From my experience working with historians in Europe, many of them have tried such tools before (granted, without having trained custom models) and are sceptical about their success, so recent advances in AI might encourage them to give handwriting recognition another go.

More detailed suggestions per paragraph:

Par 2:
I’d recommend starting this paragraph with “Digitally transcribing [...]”
It’d be helpful to remove the parentheses and better integrate the PH references in one or more sentences starting with something like “Previous programming historian tutorials that have demonstrated typed text recognition include: ”
Consider adding one more reference to the latest PH OCR lesson that uses the Google Vision API (https://programminghistorian.org/en/lessons/ocr-with-google-vision-and-tesseract) either here or in another paragraph.
And then you could continue the paragraph by adding the first sentence but with small changes, such as:
Recent advances in artificial intelligence offer the ability for historians to automatically transcribe handwritten documents [...]
In the bit where it says “within limits of types of letters used, language and legibility.”, the expression “types of letters” might read better and be more inclusive if changed to “writing systems”
Final sentence: Remove mention of CNN and add another disclaimer here to make it clear that this is only true for certain writing systems and languages so that readers won’t get disappointed if they get bad results when trying this with images including texts written in lower-resource languages and writing systems.

I would cut paragraph 3 and keep paragraph 4 but remove the CNN bit towards the end. To make up for removing these parts, you could add another sentence somewhere explaining that these models are only as good as the data on which they were trained and advising historians to keep in mind that their results will reflect their training data, with all the biases stemming from how and by whom the training dataset was put together.

Par 4:
“as long as these documents are recognizable to the service. ” – expand on what recognizable means in this context. For example, recognizable in terms of the writing system used, language, file type, etc.
final two sentences, fix “is not” to are not.
Might be best to cut from “I assume [...] property”.
Final sentence, it’d be interesting to know what this assumption is based on. Is it based on personal experience using some of these services or is it based on how similar models of which we know the details work?

Par 6:
In this paragraph, I would strongly recommend adding what languages are currently supported.

Par 7: It would be interesting to read what scripts and languages you’ve tried it with.

Prerequisites section:
First requirement: I’d suggest changing to something like “Knowledge of Python is not required since all of the code is provided in the tutorial. That said, basic Python knowledge would be useful for users who wish to understand the code or to tweak it for their purposes.”

Second requirement: I’d suggest changing to “Google Colab, a web-based virtual Python programming platform, was used to write this lesson. If you choose to use Google Colab to program Python (recommended), a Google account is required. If you choose to run the code in this tutorial locally on your own machine, Python and pip need to be installed.”
Also, it would be good to check if there is a specific version of Python required (I think it's 3+) and, if so, add this to the text. Perhaps add another footnote here to point to the python-sdk quickstart guide found later within the text.
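
As an illustration of the kind of check that could be added, something like this (runnable in a Colab cell or a local Python 3 interpreter) shows which Python version the environment is running:

```python
import sys

# Print the version of Python running in the current environment (Colab or local).
print(sys.version)
```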

Fourth requirement: Change to “Credit card or debit card” so that those with no access to credit are not discouraged.

Consider whether you want to address users who are already familiar with Google Colab or not. If familiarity with Google Colab is not listed in this section, there could be more screenshots and explanations about how to create new cells and run cells in Google Colab after paragraph 39. My recommendation would be to simply add more screenshots and instructions, because this tutorial could be easily followed by beginners as long as they’re not getting confused by simple steps that might, instead, come intuitively to more experienced users.

Procedure section
The procedure section (par 8) followed by the separate Images to transcribe section (par 9) is making the structure of this tutorial slightly confusing. I would suggest either flipping the two sections or making steps 5 and 6 of the procedure section subsections of a parent section called “Transcribe handwriting”, which would start with the “Images to transcribe section” subsection.

So perhaps a better structure would be:
Contents
(Learning Objectives)
Introduction
Prerequisites
Procedure

  • Register for a Microsoft account.

  • Create a “Computer Vision” Resource in Azure to perform transcription.

  • Store a secret Key and Endpoint to access Computer Vision from your machine.

  • Install Azure Computer Vision on your machine.

  • Transcribe handwriting

    • Image requirements
    • Transcribe handwriting in an image found online.
    • Transcribe handwriting in an image stored on your machine.

Summary
Bibliography
Footnotes

Par 9: In the Images to transcribe section, I would start by saying that “Microsoft’s Azure Cognitive Services require that images used [...]”.

Par 10-16: Create a “Computer Vision” Resource in Azure to perform transcription.
When following this process, I didn’t get a “start with an Azure free trial” message. Instead, I got a “Checking on your subscription” message and then Azure asked me to upgrade my account. Apparently, I was not eligible for an Azure free account, and so I had to sign up for Azure with the pay-as-you-go pricing. This didn’t imply that I actually had to pay anything, but it felt unclear and intimidating. Therefore, it might be useful to update the text of the tutorial so as to include this as a potential scenario for those who don’t see the Azure free trial prompt and clarify that they won’t actually get charged, because there are free quotas available in the pay-as-you-go subscription (unless they have already spent them).

Par 20:
Instead of Azure subscription 1, there was a second option “Free trial”, which is the one that I selected. I can see that there have been many months since this tutorial was first submitted, so it might be worth going over the process once again to check if these instructions are up to date. The rest of the instructions including Par 22 were correct. (Pricing tier to Free F0 etc)

Par 28-29: Here, beginners would benefit from more information on what an endpoint and keys are. Consider adding a footnote to offer some context.
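
For instance, a minimal sketch could accompany such a footnote. Assuming the lesson builds its client with the azure-cognitiveservices-vision-computervision SDK (the variable names and placeholder values below are mine, not the lesson's code), it would show the role each value plays:

```python
from azure.cognitiveservices.vision.computervision import ComputerVisionClient
from msrest.authentication import CognitiveServicesCredentials

# The endpoint tells the client *where* your Computer Vision resource lives;
# the key proves you are allowed to call it. Both come from the Azure portal.
endpoint = "https://YOUR-RESOURCE-NAME.cognitiveservices.azure.com/"  # placeholder
key = "YOUR-KEY-1"  # keep secret; never commit it to a repository

computervision_client = ComputerVisionClient(endpoint, CognitiveServicesCredentials(key))
```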

Par 30: This paragraph might be confusing to users who’ve been following the tutorial using Colab and have not created any folders. Also, perhaps it’d be more straightforward to start the paragraph with the sentence that is currently last in this paragraph and make the distinction of what users need to do depending on whether they’re using Github or not. In any case, make it clear that these keys are not meant to be shared with anyone under any circumstances. Also, consider integrating this paragraph later in the text into par 34, where users are asked to copy KEY 1.

Par 36 Make Colab link clickable. Par ends with duplicate closing parentheses.

Par 38 onward: I’d recommend prepending “Colab” before every instance of the word “notebook” in the remainder of this text to avoid confusion and make it clear that the instructions are tailored mainly to Colab users.
Par 39 Keep in mind that users might lack familiarity with Google Colab. Statements that might be intuitive to some, such as “Create a new cell” or “run a cell”, might not be as obvious to the uninitiated. Within the body of the text in this instruction, specify that, after copying the code, readers must also change the currently existing endpoint in the code to their own endpoint that they’ve previously copied from the Azure environment and make sure that it will be enclosed in quotation marks. Perhaps a screenshot from the Google Colab environment would be helpful here.
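
To make the endpoint substitution concrete, here is a sketch of what such a cell might look like, assuming (as the later paragraphs suggest) that the lesson stores the endpoint in an environment variable via the os module; the variable name here is illustrative:

```python
import os

# Replace the placeholder with the Endpoint copied from the Azure portal,
# keeping it enclosed in quotation marks.
os.environ["COMPUTER_VISION_ENDPOINT"] = "https://YOUR-RESOURCE-NAME.cognitiveservices.azure.com/"

# Basic validation: warn if the placeholder was left unchanged.
if "YOUR-RESOURCE-NAME" in os.environ["COMPUTER_VISION_ENDPOINT"]:
    print("Please paste in your own endpoint before continuing.")
```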

Par 40: Explain how one might run the cell. Also, specify that after running this cell they will get prompted for their secret computer vision key (KEY 1), which they need to paste inside the input box, and that they’re expected to hit Enter.
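
A short sketch could illustrate this step. The use of getpass below is my illustration of an interactive prompt, not necessarily the lesson's exact code:

```python
from getpass import getpass
import os

# Running this cell opens an input box: paste KEY 1 from the Azure portal and press Enter.
# Storing the key in an environment variable keeps it out of the visible notebook text.
os.environ["COMPUTER_VISION_SUBSCRIPTION_KEY"] = getpass("Paste your Computer Vision key (KEY 1): ")
```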

Par 41 Perhaps add another screenshot here. Also, it might be helpful to explain what they should do if they get an error. Should they rerun the cell? If so, add it to the text or to the error message in the code.

Par 42 I would say prioritize users running this on Google Colab as the preferred way to follow this tutorial and consider removing “on your machine” from the title to avoid any confusion that readers might actually need to install something on their devices. Flip the order of the two final sentences. Consider adding a footnote on what a session is (although not important). Also, for users who run this locally, consider flagging that if the pip install line is not run through a notebook but rather on the command line, then they should remove the exclamation mark. The previous comment about how to create a new cell is also applicable here.
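
To illustrate the exclamation-mark point (assuming the lesson installs the azure-cognitiveservices-vision-computervision package):

```python
# In a Colab or Jupyter notebook cell, the leading "!" sends the command to the shell:
!pip install --upgrade azure-cognitiveservices-vision-computervision

# On a local command line, run the same command without the exclamation mark:
# pip install --upgrade azure-cognitiveservices-vision-computervision
```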

Par 43 The previous comment about how to create a new cell is also applicable here.

Par 44 Is this a public domain image and is it OK copyright-wise for others to use it while following the tutorial? If so, it’d be a good idea to mention it here so that readers know it’s safe to use it. Perhaps coordinate with the PH team to save it under their domain to ensure greater chances of sustainability for this tutorial and don’t forget to update the links. Also, what happens to the images that are getting processed by the Azure Computer Vision API? In certain cases, researchers might not be permitted to transfer their data to third parties. Therefore, it might be a good idea to add a disclaimer here or a link detailing how Azure is processing data sent to them through this kind of APIs.

Par 46 In this paragraph, I would suggest adding one more sentence to explicitly say that if readers want to try this method with their other images stored online they should replace the existing link after the comment “# Get an image with text. Set the URL of the image to transcribe.” with the link to the image that they’ve found online (and are permitted to use) in quotes. Alternatively, the same note could be added at the end of par 48.
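
Something along these lines (the URL and variable name below are hypothetical, only meant to show where the substitution happens) might make it explicit:

```python
# Get an image with text. Set the URL of the image to transcribe.
# Replace the URL below with a link to an image you are permitted to use,
# keeping the quotation marks around it.
read_image_url = "https://example.org/path/to/your-handwritten-page.jpg"
```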

Par 48: Consider expanding on what “Call Azure using computervision_client with the URL.” means. Beginners might not be familiar with API calls. Consider adding a screenshot of the result and commenting on it. This will not only help users know what to expect but will also give them a sense of how accurate this method can be.
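
A brief sketch of what the call involves could sit alongside that explanation. The pattern below follows Microsoft's published quickstart for this SDK rather than the lesson's exact code: the client submits the image URL and then polls Azure until the transcription job has finished.

```python
import time

# computervision_client and read_image_url are assumed to have been set up in earlier cells.
# Submit the image URL to the Read API; raw=True exposes the HTTP response headers.
read_response = computervision_client.read(read_image_url, raw=True)

# Azure replies with an Operation-Location header whose last segment is the job ID.
operation_id = read_response.headers["Operation-Location"].split("/")[-1]

# Poll until Azure has finished processing the image.
while True:
    read_result = computervision_client.get_read_result(operation_id)
    if read_result.status not in ["notStarted", "running"]:
        break
    time.sleep(1)
```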

Par 49 Same note as above regarding permission to use the image.

Par 50 Consider adding a screenshot to show how one might do this in Colab. Sometimes the vertical bar on the left can easily go unnoticed.

Par 54 Add a note that Colab users need not change this.

Par 54-56 There seems to be something wrong with markdown here. Make sure the post appears as intended.

At this point, as a more experienced user, I would be interested to know whether there are any parameters that I can tweak when making the API calls (such as the language that I’m interested in transcribing) to get more accurate results. Consider adding a link that will point more advanced users to further documentation.
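
For example, recent versions of the SDK's read() call accept an optional language hint, which could be worth pointing advanced users towards (whether and how the lesson exposes this is for the author to confirm):

```python
# An optional language hint (here French, as an example) can improve accuracy
# for supported languages when submitting the image.
read_response = computervision_client.read(read_image_url, language="fr", raw=True)
```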

Finally, I’d recommend this lesson to be aimed at beginners (provided that they reproduce it using the Google Colab route).

This tutorial is an important and enjoyable read; congratulations, and thanks a lot to the editors for giving me the chance to review it.

Kind regards,
Maria

@giuliataurino

PS: Feel free to let me know if you have any questions or need any clarifications when it comes to my feedback.

@giuliataurino
Contributor Author

Thank you for your review @mdermentzi!

@jeffblackadar, as I am still looking for a second reviewer I was wondering if you were able to take a look at this first review. Let me know if you have any questions!

Best,

Giulia

@jeffblackadar
Collaborator

jeffblackadar commented Jun 29, 2023 via email

@mkane968
Collaborator

Hi @jeffblackadar and @giuliataurino,

Thank you for the opportunity to read this tutorial as a second reviewer! As someone with limited familiarity with handwriting transcription, I found the directions easy to follow and the code simple and efficient to use. I think it will become a valuable resource for historians. Like @mdermentzi, most of my comments are related to structure, or in anticipation of questions a beginner-level audience might have about Microsoft Azure and Google Colab.

P1-Your introduction clearly sets up the need for digital handwriting transcription. Though the historians you’re speaking to might not require much convincing, I’d love to see a tangible example of a handwritten document which would be beneficial to transcribe. I’d also recommend cutting off that paragraph at the second-to-last sentence and shifting your focus to digitization in the next, given that you spend some time coming back to this in p2. Along those lines, I’d echo @mdermentzi's comments to hone the focus of the first 2-3 paragraphs with your endpoint of working with Microsoft Azure in mind. I am also unfamiliar with the software, but from a quick search I found these descriptions on Microsoft’s website related to how their Computer Vision works:

“The image is then sent to an interpreting device. The interpreting device uses pattern recognition to break the image down, compare the patterns in the image against its library of known patterns, and determine if any of the content in the image is a match.”

“With deep learning, a computer vision application runs on a type of algorithm called a neural network, which allows it deliver even more accurate analyses of images."

(Source: https://azure.microsoft.com/en-us/resources/cloud-computing-dictionary/what-is-computer-vision/)

To me, this seems like a cross between the traditional OCR approaches you are describing in p2 and the CNNs in p3–though the term “interpreting device” is pretty nebulous! That said, you might do well to position your discussion in p2-3 as a general one, related to the various approaches of OCR, perhaps emphasizing how machine learning methods generally perform better than and/or enhance dictionary-based methods given the complexities of handwriting analysis (if that’s indeed the case). I’d recommend clarifying the relationship between CNNs and OCR too (CNNs are used for OCR, right? They’re not two separate technologies?) In any case, your contextual discussion could feed into a note at the start of p4 that acknowledges how commercial services use a combination of these methods, however transparent (or not) they make their approaches.

In paragraphs 4-6, you acknowledge a lot of important limitations to handwriting transcription. Speaking again as someone outside this field, it did seem like the emphasis was more on the limits than the benefits of this type of analysis. Perhaps this focus on critique is just realistic, and/or perhaps the benefits are implied, but could you expand any further on why, specifically, Microsoft Azure is a viable option for this type of work? I personally don’t have a frame of reference for Google/AWS accuracy and would be curious to hear more about it–what are the benchmarks beginners should look for when evaluating this type of service, and how does Azure measure up? This could be something you integrate later in the tutorial below–see my comments about discussing the sample image output.

P6-7-You introduce the tool as “Microsoft Azure Cognitive Services,” and I think a little more information about what the platform is could be helpful. Having mostly worked with Abbyy before, I was envisioning a desktop app from Microsoft, but obviously that’s not what it is. More context about Azure (and “Computer Vision” as a resource within it) might also make the last line of p7 clearer–you’re saying there’s documentation around it, but not around the coding aspect of it? And/or are there other coding platforms that its use has been documented for, but you’re contributing with a Python tutorial? Leaning into a focus on Colab users, I think, would help here, as you’d be positioning this tutorial as a simple-to-use pipeline that doesn’t require any purchases or local software downloads.

P7/Prerequisites–access to an internet connection seems to be implied, given the other prerequisites listed. Clarify that you don’t need to install Python on your machine if you’re using Google Colab, and point toward Colab tutorials here. You might also want to give context for the telephone number, like the credit/debit card, since it’s somewhat unusual. You might also want to clarify in parentheses that “Though there is a free tier of service for Microsoft, you are required to put a credit card on file.”

In Procedures (P8), perhaps clarify what you mean by “install Azure on your machine.” This too made me think Azure was a desktop platform like Abbyy, but it’s actually something to be installed in a coding environment, and doesn’t even need to be installed locally if you’re using Colab. Same with “access Computer Vision from your machine”. Especially since you are using “stored on your machine” to reference a locally stored file in step 6, just tweak the use of these terms above.

I second @mdermentzi's comments to nest the images to transcribe within your procedures. This section could be more readable/skimmable if you structured it in a bulleted list, like as follows:

Image Requirements:

  • Acceptable Formats: JPEG, PNG, GIF, BMP
  • Min Size: 50 x 50 px (how many GB/MB?)
  • Max size: 4 MB (how many px?)

I’m not sure you need all the sentences about conversion (as you acknowledge, it’s outside your scope and would assumedly be an implied step) and you could still put the caveats about experimentation below the list.

P9-It might be smoother to have a line before you start the numbered directions saying, “If you already have a personal Microsoft account, skip this section.” Along those lines, perhaps clarify here that you need a PERSONAL account, rather than a school/organizational one. As noted below, I ran into trouble trying to use a school account for this because I could not change my access to the feature and input a credit card.

P9–it might be more straightforward to direct users to the general Microsoft login in page (https://account.microsoft.com/account/Account) to register for an account, especially since the first step of step 2 is to again go to portal.azure.com.

P10 to 16-When I tried to sign in with my personal Microsoft account, the steps you outlined worked perfectly. However, when I tried with my school account, I got an error message stating the feature was disabled through my school’s subscription. Not sure how common of an issue this would be, but perhaps put a disclaimer to make sure to use a personal account, rather than a school/organization account for this process.

P22–The plus signs breaking up your sentences make it a little choppy to read, so perhaps just remove the plus signs and say “Select a region, name the instance”.

P24–Before I clicked review, I saw that I also had the option to set the network, identity and tags for the project. I understand these are set by default, but maybe talk through the options/why they’re important/why you can just leave them alone?

P28–Clarify what you mean by “access this service through the computer” here as above.

P30–I would also like more clarity on the function of the key and the endpoint. Why does it give you 2 keys, and why do you only need one of them? Why is it important to keep the key a secret?

P30–I’m not quite sure what you mean by “check your code into a repository.” Is this just another way to say upload? And what do you mean by “avoid checking the file”--just erasing the key before you upload it to the repository?

P35–List this as a regular paragraph, rather than the 4th step in this process–I nearly regenerated my key and endpoint after copying/pasting/saving them and would have had to do the whole process again.

P36-I’d recommend making 3B a completely separate step (4) since we are really switching topics to working in a Python environment now

P36–Say “Create a Google Colab (or Python) notebook” in the P36 header and beyond.

P39–Consider giving a little more context about what an “environment variable” is, why to import os package, and what “basic validation” means.

P40–Clarify that you have to run the cell and then copy/paste your key into it before the output is generated.

P41–Why is it important to delete the text of your key?

P42–Yes, again clarify that you’re installing it in a Python environment, and that it’s not a local install when using Google Colab.

P43–Perhaps say “the code below” instead of “this code”; I thought you were referring to the code above, as there’s a bulleted list between the instruction and the code. You might give some more info on what libraries and authentication processes are for people not familiar with them.

P44–If possible, I’d be interested in learning more about the sample image, the challenges of transcribing it manually, and why you think computer vision would be valuable–in a sense, a tie-in to the demands you address at the beginning of this tutorial. If you’re bringing the image paragraph down here, you could also note why (or why not) this image is an ideal one to work with

P46–”Create” rather than “open” a new cell.

P46–It was helpful to see the bulleted list of outcomes in P43, before you ran that code cell. Here you tell users to run it, and then go back and describe each line below, but I’m wondering if reversing those things would be more instructive.

P48–Clarify that you have to change the URL to the image you are transcribing

P48–Could you include screenshots or code for the last two steps (read the results line by line and print the text if successful)? I think it would be particularly helpful to see what you mean by “the coordinates of a rectangle” and why that’s valuable information to have.
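
For instance, a sketch of those two steps, following the quickstart pattern for this SDK (read_result is assumed to come from the earlier polling cell), would show both the transcribed text and the rectangle coordinates:

```python
from azure.cognitiveservices.vision.computervision.models import OperationStatusCodes

# If the operation succeeded, walk through the results page by page and line by line.
if read_result.status == OperationStatusCodes.succeeded:
    for page in read_result.analyze_result.read_results:
        for line in page.lines:
            # line.text is the transcribed text; line.bounding_box holds the eight
            # numbers (four x,y corner points) of the rectangle around the line,
            # useful for locating the text on the original image.
            print(line.text)
            print(line.bounding_box)
```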

P48–Not sure how complicated this would be to add, but I do think an additional step where you discuss how to export the results would be useful. Even if it’s just a note to copy/paste the text into a txt file, or if there’s a more readable format you could generate that also stores the lines separate from the coordinates (for readability).
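
Even a minimal sketch along these lines (mine, not the author's code; it reuses the read_result object from the earlier cell) might be enough:

```python
# Save the transcribed lines to a plain-text file, one line per row,
# leaving out the coordinates so the text stays readable.
with open("transcription.txt", "w", encoding="utf-8") as output_file:
    for page in read_result.analyze_result.read_results:
        for line in page.lines:
            output_file.write(line.text + "\n")
```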

P48–It might also be helpful to put the output and your file side-by-side here, to see how the output compares to the text, and reflect briefly on its accuracy and value. Especially as you discussed critiques of the process in the beginning, it could be helpful to model your own process for discerning what is useful and what is limiting about this tool.

P50–Consider providing more detailed guidance (and screenshots) for new Colab users who may not know how to upload a file to a directory.

P55–I think this is supposed to be code, next line is supposed to be text, then code again (just adjust formatting of blocks)

P56–Same as above, share code/screenshots for last two bullet points and consider a brief model of evaluating output.

P57– I was left wondering how to go from your code to the next steps you described (processing multiple images, storing transcribed text in a file/database). I know this is a beginner tutorial, and I’m not sure how complicated it would be to add any of these steps, but even saving output in a file seems like a valuable addition. Additionally, it might be interesting to share a different sample image when you walk through the process of transcribing a local image, perhaps a map or a spreadsheet like you are describing above, so you have another type of file to show output of. Just a suggestion, and it shows that your tutorial is piquing my interest about the capabilities of this tool.
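
As a rough illustration of what such a next step could look like, here is a sketch of a batch loop over local images using the SDK's read_in_stream method for files on disk (the folder name, output format, and overall structure are hypothetical, not taken from the lesson):

```python
import os
import time

image_folder = "images_to_transcribe"  # hypothetical folder of local images

for file_name in sorted(os.listdir(image_folder)):
    image_path = os.path.join(image_folder, file_name)

    # read_in_stream is the local-file counterpart of read(); it accepts an open binary stream.
    with open(image_path, "rb") as image_stream:
        response = computervision_client.read_in_stream(image_stream, raw=True)
    operation_id = response.headers["Operation-Location"].split("/")[-1]

    # Wait for Azure to finish each image before moving on to the next.
    while True:
        result = computervision_client.get_read_result(operation_id)
        if result.status not in ["notStarted", "running"]:
            break
        time.sleep(1)

    # Write each image's transcription to its own text file.
    with open(file_name + ".txt", "w", encoding="utf-8") as output_file:
        for page in result.analyze_result.read_results:
            for line in page.lines:
                output_file.write(line.text + "\n")
```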

P58–What do you mean by “customize the training” and why isn’t it possible at this time? Also, this is purely subjective, but consider ending on an even stronger note! Your tutorial makes me intrigued about the possibilities of this type of analysis and I think you could say more on this here–for example, you could circle back to specific use cases or reflect on how exactly it could continue to grow (if this is something you’re excited about).

Overall, I really enjoyed reading this tutorial and learned a lot about the possibilities of digital handwriting transcription. Thanks to the editors for the chance to read it! If you have any questions about my comments or need any clarifications, don’t hesitate to reach out.

Best,

Megan

@jeffblackadar
Collaborator

jeffblackadar commented Aug 15, 2023

Answers to questions/replies to feedback below from Jeff:

This lesson is now under review and can be read at:
https://github.com/jeffblackadar/mre/blob/main/docs/en/lessons/transcribe_handwriting_2.md

Thank you very much @mdermentzi and @mkane968 for this thoughtful and detailed feedback. I have made some replies below.
-Jeff

Feedback from Maria

Hi @jeffblackadar,
Thank you for this very useful tutorial. You’ve made it accessible enough so that someone without prior experience with Python or handwritten text recognition can follow along and start transcribing handwritten documents with minimal effort. My view is that it will be valuable to historians as well as archivists who need to perform such work for research or cataloging purposes. Once it’s published, I will definitely start using it and recommend it to the historians and archivists with whom I collaborate.
I didn’t notice any serious issues with this tutorial. Most of my suggestions seek to simplify the structure or anticipate questions that beginners might have while following the steps. Mind you, I’m reviewing this lesson having just a week ago delivered a how-to workshop to historians with varying tech skills using Google Colab; seeing what they struggled with, my comments are aimed at ensuring that even complete beginners will find this tutorial as easy as possible. I didn’t focus on copy editing or compliance with the author guidelines, leaving these for the editors to check.

Overall, my main suggestion would be to make it clear early on that you recommend users follow this tutorial using Google Colab and prioritise this type of users throughout your instructions. This will make the prerequisites section more straightforward and the tutorial easier to follow. If you choose to do this, adding more screenshots of the Colab environment is also important.

JB: Great feedback.

Here’s my detailed feedback:

At the beginning of the tutorial, the reader might benefit from a short and concise Learning Objectives or learning goals section similar to other Programming Historian tutorials.

JB: I have added a short lesson objective.

Par 1, final sentence – add OCR abbreviation in this paragraph and from here onward use OCR

JB: Added (OCR), OCR is now used onward.

Paragraphs 2, 3 & 4: It is quite possible that I might have missed something but, having tried to find out what model is powering the Azure computer vision service showcased in this tutorial, my understanding is that Microsoft does not clarify what model architectures (or what datasets as the tutorial rightly points out) they have used to train the models powering their APIs. For this reason, a focus on CNNs might be a bit redundant; it is perhaps giving the idea that this is what is powering the Azure service, which may or may not be true. For example (and this is not my area of expertise–I only did a quick search for this so I could be wrong), in recent years, transformers have also been used for OCR. My suggestion, therefore, would be to remove direct mentions of CNNs, as they might additionally be alienating to beginners.

JB: That’s a great point.

If, however, the purpose of referring to the CNNs is to provide more context about the progress of this field, my suggestion would be to add a disclaimer clarifying that we don’t know how the Azure service, which is showcased later in the tutorial, works. Additionally, if this part is kept, I would suggest expanding on it a bit more, citing even key papers that led to relevant breakthroughs. From my experience working with historians in Europe, many of them have tried such tools before (granted, without having trained custom models) and are skeptical about their success, so recent advances in AI might encourage them to give handwriting recognition another go.

More detailed suggestions per paragraph:

Par 2:
I’d recommend starting this paragraph with “Digitally transcribing [...]”
It’d be helpful to remove the parentheses and better integrate the PH references in one or more sentences starting with something like “Previous programming historian tutorials that have demonstrated typed text recognition include: ”

Consider adding one more reference to the latest PH OCR lesson that uses the Google Vision API (https://programminghistorian.org/en/lessons/ocr-with-google-vision-and-tesseract) either here or in another paragraph.

And then you could continue the paragraph by adding the first sentence but with small changes, such as:
Recent advances in artificial intelligence offer the ability for historians to automatically transcribe handwritten documents [...]
In the bit where it says “within limits of types of letters used, language and legibility.”, the expression “types of letters” might read better and be more inclusive if changed to “writing systems”

JB: Done

Final sentence: Remove mention of CNN and add another disclaimer here to make it clear that this is only true for certain writing systems and languages so that readers won’t get disappointed if they get bad results when trying this with images including texts written in lower-resource languages and writing systems.

I would cut paragraph 3 and keep paragraph 4 but remove the CNN bit towards the end. To make up for removing these parts, you could add another sentence somewhere explaining that these models are only as good as the data on which they were trained and advising historians to keep in mind that their results will reflect their training data, with all the biases stemming from how and by whom the training dataset was put together.

JB: I had some earlier feedback to include this for context. This was to differentiate this approach from OCR and also provide a rationale for using this versus training a model from scratch.

Par 4:

“as long as these documents are recognizable to the service. ” – expand on what recognizable means in this context. For example, recognizable in terms of the writing system used, language, file type, etc.
final two sentences, fix “is not” to are not.
Might be best to cut from “I assume [...] property”.
Final sentence, it’d be interesting to know what this assumption is based on. Is it based on personal experience using some of these services or is it based on how similar models of which we know the details work?

JB: It's based on personal experience. I haven't seen details of how these models are trained, but have read a general description. I've used the models from the three companies and made comparisons, but don't have statistics. I've run a bunch of different documents through the services and seen where they have worked well or failed. Some notes are here.

Par 6:
In this paragraph, I would strongly recommend adding what languages are currently supported.

Par 7: It would be interesting to read what scripts and languages you’ve tried it with.

JB Answer: For Microsoft, Google and AWS I’ve only used English and French. With Google I’ve used it for Persian and Arabic – but I did not want to distract the reader by talking about Google’s service.

**JB: I have added links to the supported languages.**

Prerequisites section:

First requirement: I’d suggest changing to something like “Knowledge of Python is not required since all of the code is provided in the tutorial. That said, basic Python knowledge would be useful for users who wish to understand the code or to tweak it for their purposes.”

JB: Done

Second requirement: I’d suggest changing to “Google Colab, a web-based virtual Python programming platform, was used to write this lesson. If you choose to use Google Colab to program Python (recommended), a Google account is required. If you choose to run the code in this tutorial locally on your own machine, Python and pip need to be installed.”
Also, it would be good to check if there is a specific version of Python required (I think it's 3+) and, if so, add this to the text. Perhaps add another footnote here to point to the python-sdk quickstart guide found later within the text.
Fourth requirement: Change to “Credit card or debit card” so that those with no access to credit are not discouraged.
Consider whether you want to address users who are already familiar with Google Colab or not. If familiarity with Google Colab is not listed in this section, there could be more screenshots and explanations about how to create new cells and run cells in Google Colab after paragraph 39. My recommendation would be to simply add more screenshots and instructions, because this tutorial could be easily followed by beginners as long as they’re not getting confused by simple steps that might, instead, come intuitively to more experienced users.

Procedure section

The procedure section (par 8) followed by the separate Images to transcribe section (par 9) is making the structure of this tutorial slightly confusing. I would suggest either flipping the two sections or making steps 5 and 6 of the procedure section subsections of a parent section called “Transcribe handwriting”, which would start with the “Images to transcribe section” subsection.

So perhaps a better structure would be:

  1. Contents
  2. (Learning Objectives)
  3. Introduction
  4. Prerequisites
  5. Procedure
    1. Register for a Microsoft account.
    2. Create a “Computer Vision” Resource in Azure to perform transcription.
    3. Store a secret Key and Endpoint to access Computer Vision from your machine.
    4. Install Azure Computer Vision on your machine.
    5. Transcribe handwriting
      1. Image requirements
      2. Transcribe handwriting in an image found online.
      3. Transcribe handwriting in an image stored on your machine.
  6. Summary
  7. Bibliography
  8. Footnotes

JB: Thanks for this, done.

Par 9: In the Images to transcribe section, I would start by saying that “Microsoft’s Azure Cognitive Services require that images used [...]”.

JB: Done.

Par 10-16: Create a “Computer Vision” Resource in Azure to perform transcription.
When following this process, I didn’t get a “start with an Azure free trial” message. Instead, I got a “Checking on your subscription” message and then Azure asked me to upgrade my account. Apparently, I was not eligible for an Azure free account, and so I had to sign up for Azure with the pay-as-you-go pricing. This didn’t imply that I actually had to pay anything, but it felt unclear and intimidating. Therefore, it might be useful to update the text of the tutorial so as to include this as a potential scenario for those who don’t see the Azure free trial prompt and clarify that they won’t actually get charged, because there are free quotas available in the pay-as-you-go subscription (unless they have already spent them).

Par 20:
Instead of Azure subscription 1, there was a second option “Free trial”, which is the one that I selected. I can see that there have been many months since this tutorial was first submitted, so it might be worth going over the process once again to check if these instructions are up to date. The rest of the instructions including Par 22 were correct. (Pricing tier to Free F0 etc)

JB: Edits made.

Par 28-29: Here, beginners would benefit from more information on what an endpoint and keys are. Consider adding a footnote to offer some context.

JB: Done.

Par 30: This paragraph might be confusing to users who’ve been following the tutorial using Colab and have not created any folders. Also, perhaps it’d be more straightforward to start the paragraph with the sentence that is currently last in this paragraph and make the distinction of what users need to do depending on whether they’re using Github or not. In any case, make it clear that these keys are not meant to be shared with anyone under any circumstances. Also, consider integrating this paragraph later in the text into par 34, where users are asked to copy KEY 1.

**JB: I agree, I have edited this.**

Par 36 Make Colab link clickable. Par ends with duplicate closing parentheses.

JB: Done

Par 38 onward: I’d recommend prepending “Colab” before every instance of the word “notebook” in the remainder of this text to avoid confusion and make it clear that the instructions are tailored mainly to Colab users.
Par 39 Keep in mind that users might lack familiarity with Google Colab. Statements that might be intuitive to some, such as “Create a new cell” or “run a cell”, might not be as obvious to the uninitiated. Within the body of the text in this instruction, specify that, after copying the code, readers must also change the currently existing endpoint in the code to their own endpoint that they’ve previously copied from the Azure environment and make sure that it will be enclosed in quotation marks. Perhaps a screenshot from the Google Colab environment would be helpful here.

JB: Done

Par 40: Explain how one might run the cell. Also, specify that after running this cell they will get prompted for their secret computer vision key (KEY 1), which they need to paste inside the input box, and that they’re expected to hit Enter.

JB: Done

Par 41 Perhaps add another screenshot here. Also, it might be helpful to explain what they should do if they get an error. Should they rerun the cell? If so, add it to the text or to the error message in the code.

JB: Done

Par 42 I would say prioritize users running this on Google Colab as the preferred way to follow this tutorial and consider removing “on your machine” from the title to avoid any confusion that readers might actually need to install something on their devices. Flip the order of the two final sentences. Consider adding a footnote on what a session is (although not important). Also, for users who run this locally, consider flagging that if the pip install line is not run through a notebook but rather on the command line, then they should remove the exclamation mark. The previous comment about how to create a new cell is also applicable here.

Par 43 The previous comment about how to create a new cell is also applicable here.
JB: should be fixed now.

Par 44 Is this a public domain image and is it OK copyright-wise for others to use it while following the tutorial?

JB Answer: I took the photograph - I will ask to save it with PH.

If so, it’d be a good idea to mention it here so that readers know it’s safe to use it. Perhaps coordinate with the PH team to save it under their domain to ensure greater chances of sustainability for this tutorial and don’t forget to update the links. Also, what happens to the images that are getting processed by the Azure Computer Vision API? In certain cases, researchers might not be permitted to transfer their data to third parties. Therefore, it might be a good idea to add a disclaimer here or a link detailing how Azure is processing data sent to them through this kind of APIs.

JB: I've added a note under Image requirements

Par 46 In this paragraph, I would suggest adding one more sentence to explicitly say that if readers want to try this method with their other images stored online they should replace the existing link after the comment “# Get an image with text. Set the URL of the image to transcribe.” with the link to the image that they’ve found online (and are permitted to use) in quotes. Alternatively, the same note could be added at the end of par 48.

JB: Added a note

Par 48: Consider expanding on what “Call Azure using computervision_client with the URL.” means. Beginners might not be familiar with API calls. Consider adding a screenshot of the result and commenting on it. This will not only help users know what to expect but will also give them a sense of how accurate this method can be.
Par 49 Same note as above regarding permission to use the image.
JB: Added a note

Par 50 Consider adding a screenshot to show how one might do this in Colab. Sometimes the vertical bar on the left can easily go unnoticed.
Par 54 Add a note that Colab users need not change this.
Par 54-56 There seems to be something wrong with markdown here. Make sure the post appears as intended.
**JB: Fixed**

At this point, as a more experienced user, I would be interested to know whether there are any parameters that I can tweak when making the API calls (such as the language that I’m interested in transcribing) to get more accurate results. Consider adding a link that will point more advanced users to further documentation.

Finally, I’d recommend this lesson to be aimed at beginners (provided that they reproduce it using the Google Colab route).
This tutorial is an important and enjoyable read; congratulations, and thanks a lot to the editors for giving me the chance to review it.
Kind regards,
Maria
@giuliataurino
PS: Feel free to let me know if you have any questions or need any clarifications when it comes to my feedback.

Thanks for this Maria!
Jeff

Feedback from Megan https://github.com/mkane968

Hi @jeffblackadar and @giuliataurino,
Thank you for the opportunity to read this tutorial as a second reviewer! As someone with limited familiarity with handwriting transcription, I found the directions easy to follow and the code simple and efficient to use. I think it will become a valuable resource for historians. Like @mdermentzi, most of my comments are related to structure, or in anticipation of questions a beginner-level audience might have about Microsoft Azure and Google Colab.

P1-Your introduction clearly sets up the need for digital handwriting transcription. Though the historians you’re speaking to might not require much convincing, I’d love to see a tangible example of a handwritten document which would be beneficial to transcribe.

JB: Added Sources such as diaries, letters, logbooks and reports

I’d also recommend cutting off that paragraph at the second-to-last sentence and shifting your focus to digitization in the next, given that you spend some time coming back to this in p2. Along those lines, I’d echo @mdermentzi's comments to hone the focus of the first 2-3 paragraphs with your endpoint of working with Microsoft Azure in mind. I am also unfamiliar with the software, but from a quick search I found these descriptions on Microsoft’s website related to how their Computer Vision works:
“The image is then sent to an interpreting device. The interpreting device uses pattern recognition to break the image down, compare the patterns in the image against its library of known patterns, and determine if any of the content in the image is a match.”
“With deep learning, a computer vision application runs on a type of algorithm called a neural network, which allows it to deliver even more accurate analyses of images."

(Source: https://azure.microsoft.com/en-us/resources/cloud-computing-dictionary/what-is-computer-vision/)

To me, this seems like a cross between the traditional OCR approaches you are describing in p2 and the CNNs in p3–though the term “interpreting device” is pretty nebulous! That said, you might do well to position your discussion in p2-3 as a general one, related to the various approaches of OCR, perhaps emphasizing how machine learning methods generally perform better than and/or enhance dictionary-based methods given the complexities of handwriting analysis (if that’s indeed the case). I’d recommend clarifying the relationship between CNNs and OCR too (CNNs are used for OCR, right? They’re not two separate technologies?) In any case, your contextual discussion could feed into a note at the start of p4 that acknowledges how commercial services use a combination of these methods, however transparent (or not) they make their approaches.

JB: Q: CNNs are used for OCR, right? They’re not two separate technologies?

JB Answer: OCR is based on different technology that predates CNNs and Transformers. Traditional OCR recognizes printed letters based on patterns humans have coded, whereas a CNN learns on its own to recognize patterns present in the images it is trained on.

JB: Thanks for the feedback above, I have changed the language to use the terms deep learning and computer vision. This may make this more approachable. I wanted to set the context that this is different from OCR, and that's why it's worth considering as a tool.

In paragraphs 4-6, you acknowledge a lot of important limitations to handwriting transcription. Speaking again as someone outside this field, it did seem like the emphasis was more on the limits than the benefits of this type of analysis. Perhaps this focus on critique is just realistic, and/or perhaps the benefits are implied, but could you expand any further on why, specifically, Microsoft Azure is a viable option for this type of work? I personally don’t have a frame of reference for Google/AWS accuracy and would be curious to hear more about it–what are the benchmarks beginners should look for when evaluating this type of service, and how does Azure measure up? This could be something you integrate later in the tutorial below–see my comments about discussing the sample image output.

JB: I was being too realistic. I want to convey that it works well for handwritten documents but not on everything (to avoid setting expectations too high). Unfortunately, I don't have product comparison benchmarks, though I looked for them. I saw one other website that did a comparison, and I did some comparison work myself, but without statistics.

P6-7-You introduce the tool as “Microsoft Azure Cognitive Services,” and I think a little more information about what the platform is could be helpful. Having mostly worked with Abbyy before, I was envisioning a desktop app from Microsoft, but obviously that’s not what it is. More context about Azure (and “Computer Vision” as a resource within it) might also make the last line of p7 clearer–you’re saying there’s documentation around it, but not around the coding aspect of it?

JB Answer: There is a tutorial from Microsoft about how to write a Python program to access this. The PH tutorial is meant to be a more accessible tutorial.

And/or are there other coding platforms that its use has been documented for, but you’re contributing with a Python tutorial?

JB Answer: Other languages can be used, I’ve selected Python here since it’s fairly popular on PH.

Leaning into a focus on Colab users, I think, would help here, as you’d be positioning this tutorial as a simple-to-use pipeline that doesn’t require any purchases or local software downloads.

P7/Prerequisites–access to an internet connection seems to be implied, given the other prerequisites listed. Clarify that you don’t need to install Python on your machine if you’re using Google Colab, and point toward Colab tutorials here. You might also want to give context for the telephone number, like the credit/debit card, since it’s somewhat unusual. You might also want to clarify in parentheses that “Though there is a free tier of service for Microsoft, you are required to put a credit card on file.”

JB: done

In Procedures (P8), perhaps clarify what you mean by “install Azure on your machine.” This too made me think Azure was a desktop platform like Abbyy, but it’s actually something to be installed in a coding environment, and doesn’t even need to be installed locally if you’re using Colab. Same with “access Computer Vision from your machine”. Especially since you are using “stored on your machine” to reference a locally stored file in step 6, just tweak the use of these terms above.
I second @mdermentzi's comments to nest the images to transcribe within your procedures. This section could be more readable/skimmable if you structured it in a bulleted list, as follows:
Image Requirements:
• Acceptable Formats: JPEG, PNG, GIF, BMP
• Min Size: 50 x 50 px (how many GB/MB?)
• Max size: 4 MB (how many px?)

JB: I've made most changes above. With different image formats, specifying the file size of a 50x50 image or the number of pixels of a 4 MB image is variable. I’ll stay away from this rather than be wrong most of the time.

I’m not sure you need all the sentences about conversion (as you acknowledge, it’s outside your scope and would assumedly be an implied step) and you could still put the caveats about experimentation below the list.

JB: Will do.

P9-It might be smoother to have a line before you start the numbered directions saying, “If you already have a personal Microsoft account, skip this section.” Along those lines, perhaps clarify here that you need a PERSONAL account, rather than a school/organizational one. As noted below, I ran into trouble trying to use a school account for this because I could not change my access to the feature and input a credit card.

P9–it might be more straightforward to direct users to the general Microsoft login in page (https://account.microsoft.com/account/Account) to register for an account, especially since the first step of step 2 is to again go to portal.azure.com.

JB: I got different behavior when I tested this. I made a new account with the above link, but http://portal.azure.com/ didn't recognize it afterwards. I'm going to stick with the original link since that works in testing.

P10 to 16-When I tried to sign in with my personal Microsoft account, the steps you outlined worked perfectly. However, when I tried with my school account, I got an error message stating the feature was disabled through my school’s subscription. Not sure how common of an issue this would be, but perhaps put a disclaimer to make sure to use a personal account, rather than a school/organization account for this process.

JB: I’ve made a note about this

P22–The plus signs breaking up your sentences make it a little choppy to read, so perhaps just remove the plus signs and say "Select a region, name the instance".

JB: done

P24–Before I clicked review, I saw that I also had the option to set the network, identity and tags for the project. I understand these are set by default, but maybe talk through the options/why they’re important/why you can just leave them alone?

JB I added: The "Identity" and "Tags" tabs can be left with default values. They are relevant only if you are using this in combination with other Microsoft Azure services.

P28–Clarify what you mean by “access this service through the computer” here as above.

** JB: changed to "your Python environment." **

P30–I would also like more clarity on the function of the key and the endpoint. Why does it give you 2 keys, and why do you only need one of them? Why is it important to keep the key a secret?

P30–I’m not quite sure what you mean by “check your code into a repository.” Is this just another way to say upload? And what do you mean by “avoid checking the file”--just erasing the key before you upload it to the repository?

**JB: I've made mistakes where I had keys in my code and then checked the code into GitHub. Then my keys were there for the world to see and use. I want to be careful to have people avoid that.**
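
One way to reduce that risk, sketched here as a hypothetical alternative to hard-coding the key in a cell, is to prompt for it at runtime so the secret never appears in a file that might be checked into a repository:

```python
import getpass

# Prompt for the key when the cell runs, instead of writing it into the notebook.
# The prompt text and variable name are illustrative.
subscription_key = getpass.getpass("Paste your Computer Vision key: ")
```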

P35–List this as a regular paragraph, rather than the 4th step in this process–I nearly regenerated my key and endpoint after copying/pasting/saving them and would have had to do the whole process again.

JB: Thank you

P36-I’d recommend making 3B a completely separate step (4) since we are really switching topics to working in a Python environment now

JB:Done

P36–Say "Create a Google Colab (or Python) notebook" in the P36 header and beyond
P39–Consider giving a little more context about what an “environment variable” is, why to import os package, and what “basic validation” means.

** JB: Noted the use of os and environment variables. I did not want to go too deep on that though. **
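
For readers who want a little more detail than the lesson gives, here is a minimal sketch of storing the key and endpoint as environment variables with some basic validation. The variable names and the endpoint URL are assumptions, not necessarily those used in the lesson:

```python
import os

# Store the key and endpoint as environment variables so later cells can read them.
os.environ["COMPUTER_VISION_SUBSCRIPTION_KEY"] = input("Paste your Computer Vision key: ")
os.environ["COMPUTER_VISION_ENDPOINT"] = "https://your-instance-name.cognitiveservices.azure.com/"

# "Basic validation" here just means checking that the values look plausible.
if len(os.environ["COMPUTER_VISION_SUBSCRIPTION_KEY"]) < 10:
    print("The key looks too short - please check it.")
if not os.environ["COMPUTER_VISION_ENDPOINT"].startswith("https://"):
    print("The endpoint should be an https URL.")
```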

P40–Clarify that you have to run the cell and then copy/paste your key into it before the output is generated.

** JB: Done**

P41–Why is it important to delete the text of your key?

JB: It's just another way to prevent someone else from copying it from a printout

P42–Yes, again clarify that you’re installing it in a Python environment, and that it’s not a local install when using Google Colab.
P43–Perhaps say “the code below” instead of “this code”; I thought you were referring to the code above, as there’s a bulleted list between the instruction and the code. You might give some more info on what libraries and authentication processes are for people not familiar with them.

** JB: Changing to the code below**
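
For context on what the libraries and authentication step amount to, here is a minimal sketch based on Microsoft's Python SDK quickstart (the lesson's exact code and variable names may differ): the imports bring in the Azure client classes, and the credentials object authenticates the client with your key so that later calls are accepted by the service.

```python
import os
from azure.cognitiveservices.vision.computervision import ComputerVisionClient
from msrest.authentication import CognitiveServicesCredentials

# Read the key and endpoint saved earlier (assumed environment variable names)
# and create an authenticated client that later cells use to call the service.
key = os.environ["COMPUTER_VISION_SUBSCRIPTION_KEY"]
endpoint = os.environ["COMPUTER_VISION_ENDPOINT"]
computervision_client = ComputerVisionClient(endpoint, CognitiveServicesCredentials(key))
```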

P44–If possible, I’d be interested in learning more about the sample image, the challenges of transcribing it manually, and why you think computer vision would be valuable–in a sense, a tie-in to the demands you address at the beginning of this tutorial. If you’re bringing the image paragraph down here, you could also note why (or why not) this image is an ideal one to work with

P46–”Create” rather than “open” a new cell.

** JB: Done**

P46–It was helpful to see the bulleted list of outcomes in P43, before you ran that code cell. Here you tell users to run it, and then go back and describe each line below, but I’m wondering if reversing those things would be more instructive.

** JB: I've changed this**

P48–Clarify that you have to change the URL to the image you are transcribing

** JB: Done**

P48–Could you include screenshots or code for the last two steps (read the results line by line and print the text if successful)? I think it would be particularly helpful to see what you mean by “the coordinates of a rectangle” and why that’s valuable information to have.

JB: I've added the sample output
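
For readers curious about those last two steps, a minimal sketch of reading the results line by line is shown below, continuing from an earlier read call (read_result is assumed to be the completed operation). Each line's bounding_box gives the coordinates of the rectangle around that line on the page image, which is useful for relating transcribed text back to its position on the page:

```python
from azure.cognitiveservices.vision.computervision.models import OperationStatusCodes

# If the read operation succeeded, print each transcribed line and the
# coordinates of the rectangle that surrounds it on the page image.
if read_result.status == OperationStatusCodes.succeeded:
    for page in read_result.analyze_result.read_results:
        for line in page.lines:
            print(line.text)
            print(line.bounding_box)
```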

P48–Not sure how complicated this would be to add, but I do think an additional step where you discuss how to export the results would be useful. Even if it’s just a note to copy/paste the text into a txt file, or if there’s a more readable format you could generate that also stores the lines separate from the coordinates (for readability).

JB: I've added two more steps to export the data 6.iv and 6.v
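
As a rough illustration of what such an export step can look like (the filename is arbitrary, and read_result is assumed to be the completed operation from earlier):

```python
# Write the transcribed lines to a plain-text file for use outside the notebook.
with open("transcription.txt", "w", encoding="utf-8") as output_file:
    for page in read_result.analyze_result.read_results:
        for line in page.lines:
            output_file.write(line.text + "\n")
```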

P48–It might also be helpful to put the output and your file side-by-side here, to see how the output compares to the text, and reflect briefly on its accuracy and value. Especially as you discussed critiques of the process in the beginning, it could be helpful to model your own process for discerning what is useful and what is limiting about this tool.

** JB: That's a good idea. **

P50–Consider providing more detailed guidance (and screenshots) for new Colab users who may not know how to upload file to a directory.

** JB: Added a screenshot**

P55–I think this is supposed to be code, next line is supposed to be text, then code again (just adjust formatting of blocks)

** JB: Fixed **

P56–Same as above, share code/screenshots for last two bullet points and consider a brief model of evaluating output.

** JB: Fixed **

P57– I was left wondering how to go from your code to the next steps you described (processing multiple images, storing transcribed text in a file/database).

** JB: I've added an export to a file - I hope this fits **
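
For readers who want to go further, here is a hedged sketch of processing a whole folder of local images and saving one text file per image. The folder name and file extensions are assumptions, and computervision_client is the authenticated client created earlier:

```python
import os
import time
from azure.cognitiveservices.vision.computervision.models import OperationStatusCodes

image_folder = "images_to_transcribe"  # assumed folder of page images

for filename in sorted(os.listdir(image_folder)):
    if not filename.lower().endswith((".jpg", ".jpeg", ".png")):
        continue
    # Send the local image to Azure as a stream.
    with open(os.path.join(image_folder, filename), "rb") as image_file:
        read_response = computervision_client.read_in_stream(image_file, raw=True)
    operation_id = read_response.headers["Operation-Location"].split("/")[-1]
    # Poll until the asynchronous read operation finishes.
    while True:
        read_result = computervision_client.get_read_result(operation_id)
        if read_result.status not in [OperationStatusCodes.not_started, OperationStatusCodes.running]:
            break
        time.sleep(1)
    # Save the transcription of this image to its own text file.
    if read_result.status == OperationStatusCodes.succeeded:
        with open(filename + ".txt", "w", encoding="utf-8") as output_file:
            for page in read_result.analyze_result.read_results:
                for line in page.lines:
                    output_file.write(line.text + "\n")
```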

I know this is a beginner tutorial, and I’m not sure how complicated it would be to add any of these steps, but even saving output in a file seems like a valuable addition. Additionally, it might be interesting to share a different sample image when you walk through the process of transcribing a local image, perhaps a map or a spreadsheet like you are describing above, so you have another type of file to show output of. Just a suggestion, and it shows that your tutorial is piquing my interest about the capabilities of this tool.

** JB: Giving this thought for getting a usable, authorized image**

P58–What do you mean by “customize the training” and why isn’t it possible at this time?

** JB: Answer: I meant that I can't customize the training on handwriting styles. Great call, I've edited this**

Also, this is purely subjective, but consider ending on an even stronger note! Your tutorial makes me intrigued about the possibilities of this type of analysis and I think you could say more on this here–for example, you could circle back to specific use cases or reflect on how exactly it could continue to grow (if this is something you’re excited about).

** JB: I will look to edit this a bit more, it is pretty amazing. **

Overall, I really enjoyed reading this tutorial and learned a lot about the possibilities of digital handwriting transcription. Thanks to the editors for the chance to read it! If you have any questions about my comments or need any clarifications, don’t hesitate to reach out.
Best,
Megan

--
Thanks for this Megan!
Jeff

@jeffblackadar

Hi, @giuliataurino, @mdermentzi and @mkane968, - Thank you for the feedback! I have made revisions and I will look for remaining errors. I believe I have addressed the items in the feedback. I have reorganized the sections, added a bit more about Google Colab, edited the introduction to remove CNN, added the ability to save results to a file and added a link to more documentation. Thanks again for your reviews, they have made this a better organized and more complete tutorial.

A question I have is, is there a place I should put the page images I've made for people to download and is there something I need to do to designate them as open access?

Thanks
Jeff

@anisa-hawes

Hello @jeffblackadar.

We can save any downloads which accompany your lesson in our assets repository where readers will be able to access them. Could you email them to me? admin@programminghistorian.org.

Thank you.

@charlottejmc

charlottejmc commented Oct 12, 2023

Hello @jeffblackadar and @hawc2 (in place of Giulia for now). I've prepared the copyedits for this lesson to commit in PR #591. I'd be grateful if you could review the adjustments and confirm that you are happy for me to merge these. You can see the details of my edits here! Sorry if it appears slightly confusing – one of the changes involved changing the lesson title and slug to "transcribing-handwritten-text-with-python-and-azure".

You can respond to any of my suggestions via the comments, or click to Resolve conversation if you're happy with them like this:

[Screenshot: the "Resolve conversation" button on a suggested change]

If you want to make edits, please work here, accessing the edit facility by clicking the three dots at upper right of the file named transcribing-handwritten-text-with-python-and-azure:

[Screenshot: the edit facility, accessed via the three dots at the upper right of the file view]

@hawc2

hawc2 commented Oct 12, 2023

These look good to me @charlottejmc. Once these changes have been approved, I have some suggestions for further revision, @jeffblackadar, which I can delineate for you before we publish.

@anisa-hawes

Hello @jeffblackadar and @hawc2,

Thank you for reviewing Charlotte's copyedit PR. I've merged this in so that you can read the updated lesson in the web preview:

http://programminghistorian.github.io/ph-submissions/en/drafts/originals/transcribing-handwritten-text-with-python-and-azure

--

When you are both happy, Charlotte and I will take this lesson through its next steps:

  • Typesetting + final checks of metadata
  • Generating archival hyperlinks

Very best, Anisa

@anisa-hawes anisa-hawes changed the title Transcribe Handwritten Text with Python and Microsoft Azure Computer Vision Transcribing Handwritten Text with Python and Microsoft Azure Computer Vision Oct 20, 2023
@jeffblackadar

jeffblackadar commented Oct 22, 2023 via email

@charlottejmc

Hi Jeff,

Thank you for flagging these up!

I have made these changes in the file. I did as you suggested and redirected the link to download to show td_00044_b2.jpg instead of td_00040_b2.jpeg.

I also created a new zip file in the assets directory containing all the images, and redirected the link to download at P68 straight to that.

The lesson is now ready for typesetting, which I will be working on in the next two days. I hope to have it ready for you by the end of Friday.

Best,
Charlotte

@charlottejmc

charlottejmc commented Oct 26, 2023

Hello @hawc2 ,

This lesson's sustainability + accessibility checks are in progress.

  • Preview:

EN: https://programminghistorian.github.io/ph-submissions/en/drafts/originals/transcribing-handwritten-text-with-python-and-azure

Publisher's sustainability + accessibility actions:

  • Copyediting
  • Typesetting
  • Addition of Perma.cc links
  • Check/resize images
  • Check/adjust image filenames
  • Receipt of author(s) copyright agreement – hi @jeffblackadar, here is the form which you can fill out and send to my email address publishing.assistant [@] programminghistorian.org.
  • Added doi

Authorial / editorial input to YAML:

  • Define difficulty:, based on the criteria set out here
  • Define the research activity: this lesson supports (acquiring, transforming, analysing, presenting, or sustaining) – I've suggested transforming but let me know what you think.
  • Define the lesson's topics: (apis, python, data-management, data-manipulation, distant-reading, set-up, linked-open-data, mapping, network-analysis, web-scraping, digital-publishing, r, or machine-learning) Choose one or more. Let us know if you'd like us to add a new topic
  • Provide alt-text for all figures – Figures 3 and 7 are missing their alt-text description!
  • Provide a short abstract: for the lesson
  • Agree an avatar (thumbnail image) to accompany the lesson

The image must be:

  • copyright-free
  • non-offensive
  • an illustration (not a photograph)
  • at least 200 pixels width and height
    Image collections of the British Library, Internet Archive Book Images, Library of Congress Maps or the Virtual Manuscript Library of Switzerland are useful places to search
  • Provide avatar_alt: (visual description of that thumbnail image)
  • Provide author(s) bio for ph_authors.yml using this template:
- name: Jeff Blackadar
  orcid: 0000-0002-8160-0942
  team: false
  bio:
    en: |
      Jeff Blackadar has a Master of Arts in History with a specialization in Data Science from Carleton University.

Files to prepare for transfer to Jekyll:

EN:

Promotion:

  • Prepare announcement post (using template)
  • Prepare x2 posts for future promotion via our social media channels

@jeffblackadar

jeffblackadar commented Oct 26, 2023 via email

@charlottejmc

Hi @jeffblackadar,

I had a little look around for an avatar and came across this one which could look like this, cropped and greyscaled:
[Proposed avatar image: transcribing-handwritten-text-with-python-and-azure]

What do you think? Please also feel free to find one on your side, if you have a different idea!

@jeffblackadar

jeffblackadar commented Oct 27, 2023 via email

@hawc2

hawc2 commented Nov 9, 2023

Hi @jeffblackadar, I've had a chance to read through your tutorial and make some line edits before we move forward with publication. I have a series of minor revision requests that I think will help flesh this tutorial out a little more and make it more in keeping with most of our Programming Historian lessons.

The introduction was rather long, so I made a few attempts at breaking it up into sections. Feel free to change those if they don't represent what you intended. There are a couple of places in the introduction where it could be useful if you gave more information - for instance, you bring up related PH lessons, like the one on Google Vision, but you don't really say how they are connected. That one seems especially relevant to compare in some way with your tutorial; just a couple of sentences explaining how the reader can see the two in tandem would help. Note I moved that section down to Prerequisites, where it makes more sense.

One concern we have about this lesson is that it depends on "commercial" software. Perhaps you could say a little more early on about any free trials or ways people might demo this software without a cost (for instance, when you first mention that it is commercial software). I see under Prerequisites you mention the free tier in relation to the credit card being required, but it might be worth stating that earlier as well.

It is also worth explaining, when you bring up the range of commercial options, why this particular process is so popular for commercial purposes, and what proprietary commercial transcription software offers over and above open-source tools. After explaining the advantages of commercial software, it would be helpful to point readers to free open-source options before digging into this commercial one. One problem with commercial software and tutorials like this is that they are often unsustainable, since the companies change them so often and there's no open-source option to point to. Anything you can say about that in the tutorial would also be worth bringing up in relationship to the 'commercial' service.

We typically prefer for Programming Historian lessons to explore some kind of research question. It might help if you say a little more about how Historians can use transcription to pursue actual research projects, giving concrete examples. With your sample datasets/projects, just make a few nods throughout to the research questions this transcription process can make easier to answer.

For example, in the case of "Working with an image found online," is there anything about the particular image you can say represents a specific historical period or style that the transcription process could help the historian explore? Just a sentence or two would be nice to link it to the broader point of the lesson. I see you put an Endnote here where you say: "This is an image from the 1917 wartime diary of Captain William Andrew White photographed by the author during research." I wonder why this is a footnote? Seems like a relevant historical detail to talk about in terms of how this transcription process relates to historical research.

As an example, this lesson is much more step by step than most we publish, so there are lots of opportunities at the beginning of each section where you could add a little commentary, providing broader context for the purposes of each step. Instead of each section being only an alphabetized list of steps, you could add a few comments to introduce each section in paragraph form. This is what you do for Step 3, for example. But the section "Installing Azure Computer Vision in your Python environment" begins with only a link to a Microsoft resource; you could take a moment to spell out for the reader here something like: "In this next section, we will install the Azure libraries in the Python environment we've created." You could also take a second to explain some of the rudimentary methodological lessons relevant here, relating to Python package management and virtual environments.
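
To make the suggestion concrete, the installation step being discussed amounts to something like the sketch below; the package name is the one Microsoft publishes on PyPI, while the notebook cell and local commands are illustrative rather than the lesson's exact wording:

```python
# In a Google Colab (or Jupyter) notebook cell, the "!" runs a shell command
# inside the notebook's own environment:
!pip install azure-cognitiveservices-vision-computervision

# On a local machine you would typically create a virtual environment first,
# for example from the command line:
#   python -m venv azure-env
#   source azure-env/bin/activate      (on Windows: azure-env\Scripts\activate)
#   pip install azure-cognitiveservices-vision-computervision
```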

I don't think this should require a lot of time-consuming revision. Mostly I think you could just weave a few more moments of commentary and discussion throughout the piece, and these commentaries can also serve as context and signposting for a reader to get oriented around a series of tutorial steps. It is helpful to remind the reader how this relates to research questions, but it is most important that there is some gesture towards the purposes of transcription in the Introduction and Conclusion. Right now, the ending, "Summary," doesn't really provide any conclusive claims about what you've taught and how it's useful to historical research. Feel free to take a little more time to explain the implications to the reader!

As you make these last revisions, be mindful that the most important lessons for the reader are methodological, relating to transcription and historical research. For example, when you are teaching about using "Keys," that's a good opportunity to generalize about this essential aspect of using cloud services for historical research. Secondarily, it is most important that through this lesson you teach the reader about Python more so than Microsoft products. If there are ways you can add commentary about the Python steps you are teaching, and the related concepts, that would be welcome. For instance, a lot of your code has commented-out sections with information about what the code does. In some of those cases, you could add commentary before or after the code chunk where you explain what that section of code does, and which coding concepts are important to be aware of.

Once you make these minor additions, we can move forward with publishing this lesson. Thanks for your work finalizing it!

@charlottejmc

charlottejmc commented Nov 10, 2023

Hi @hawc2 (and @jeffblackadar), thank you for this detailed comment! Just quickly jumping in to note that the footnote [^1] ("This is an image from the 1917 wartime diary of Captain William Andrew White photographed by the author during research.") was a choice made in our copyedit. You can see this change was made with this link at lines 219 (green) and 539 (green). This was part of an effort to tidy up the Bibliography and Endnotes, keeping source links out of the main text.

@anisa-hawes

Thank you, @charlottejmc. That makes good sense.

I agree that if this endnote was reinstated as a line within the lesson reading "This is an image from the 1917 wartime diary of Captain William Andrew White", it would be good to extend the sentence (and/or add one or two) to surface that taking photographs of handwritten documents during archival research is a good example of a scenario where you might want to perform automatic transcription to save time. I think this is an interesting piece of contextual information about your experience of doing research, @jeffblackadar.

@jeffblackadar

jeffblackadar commented Nov 19, 2023 via email

@charlottejmc

charlottejmc commented Nov 22, 2023

Thank you very much @jeffblackadar for all those changes – we appreciate your work on this! I have gone through and applied some copyedits, which you can see by clicking on this link to read the "rich diff" in detail.

I'll take this opportunity to reiterate the points we still need from the checklist in my comment above:

  • Receipt of author(s) copyright agreement – here is the form which you can fill out and send to my email address publishing.assistant [@] programminghistorian.org.
  • Define the research activity: this lesson supports (acquiring, transforming, analysing, presenting, or sustaining) -
    I've suggested transforming but let me know what you think.
  • Define the lesson's topics: (apis, python, data-management, data-manipulation, distant-reading, set-up, linked-open-data, mapping, network-analysis, web-scraping, digital-publishing, r, or machine-learning) - Choose one or more. Let us know if you'd like us to add a new topic
  • Provide alt-text for all figures – Figures 3 and 7 are missing their alt-text description!
  • Provide a short abstract: for the lesson
  • Provide avatar_alt: (visual description of that thumbnail image)
  • Prepare x2 posts for future promotion via our social media channels – this is usually in the hands of @hawc2

Thanks again!

@jeffblackadar

jeffblackadar commented Nov 24, 2023 via email

@charlottejmc

charlottejmc commented Nov 24, 2023

Hello @jeffblackadar, thank you very much for getting back to me on those.

I've received the copyright agreement and added [python, apis, data-manipulation] as the topics.

I think the alt-text you saw was (perhaps confusingly!) simply placeholder text (alt="Visual description of figure image"). I made some changes to the file to suggest instead:

  • Figure 3 = "Screen capture of the Keys and Endpoint tab in the Azure Portal"
  • Figure 7 = "Picture of a handwritten diary entry"

I've also suggested the avatar_alt (visual description of lesson avatar) as "Drawing showing the design for the Youths progressive recorder, a mechanical handwriting copying machine."

If you're happy with those three descriptions, we can just keep them in!

Thanks again,
Charlotte

@jeffblackadar

jeffblackadar commented Nov 24, 2023 via email

@charlottejmc

charlottejmc commented Nov 29, 2023

Hello @hawc2 ,

This lesson's sustainability + accessibility checks are now complete.

  • author(s) bio for ph_authors.yml
- name: Jeff Blackadar
  orcid: 0000-0002-8160-0942
  team: false
  bio:
    en: |
      Jeff Blackadar has a Master of Arts in History with a specialization in Data Science from Carleton University.

Promotion:

  • Template announcement posts
  • Prepare x2 posts for future promotion via our social media channels

@hawc2

hawc2 commented Nov 29, 2023

Thanks @jeffblackadar for your thorough edits!

@charlottejmc we should be ready to move forward with publication.

@jeffblackadar

jeffblackadar commented Nov 29, 2023 via email

@jeffblackadar

jeffblackadar commented Dec 8, 2023 via email

@anisa-hawes

Hello @jeffblackadar.

We are publishing your lesson today!
I'll update you as soon as the DOI is live.

Very best, Anisa

@anisa-hawes

anisa-hawes commented Dec 8, 2023

Transcribing Handwritten Text with Python and Microsoft Azure Computer Vision is published! 🎉

Congratulations @jeffblackadar!
Thank you all for your contributions


Our suggested citation for this lesson is:

Jeff Blackadar, "Transcribing Handwritten Text with Python and Microsoft Azure Computer Vision," Programming Historian 12 (2023), https://doi.org/10.46430/phen0114.

We appreciate your help to circulate our social media announcements about this lesson among your networks:
Twitter/X: https://twitter.com/ProgHist/status/1733069313219699067
Mastodon: https://hcommons.social/@proghist/111544298504241295


I'd be also grateful if you might consider supporting our efforts to grow Programming Historian's community of Institutional Partners. This is a network of organisations across Europe, Canada, North America and Latin America who have invested in our success by contributing an annual membership fee in lieu of subscription.

Institutional Partnerships enable us to keep developing our model of sustainable, open-access publishing, and empower us to continue creating peer-reviewed, multilingual lessons for digital humanists around the globe.

If you think that supporting Diamond Open Access initiatives may be among the strategic priorities of the university or library where you work, please let me know.

You can email me <admin [@] programminghistorian.org>, and I can send you an information pack to share with your colleagues. Alternatively, feel free to put me in touch with the person or department you think would be best-placed to discuss this opportunity.

Sincere thanks,
Anisa

@jeffblackadar

jeffblackadar commented Dec 8, 2023 via email

@anisa-hawes

Thank you for these kind words, @jeffblackadar. It makes us very proud to hear feedback like this.

We are grateful for your participation. Your second Programming Historian lesson! The first has also been translated into Portuguese – so these resources are reaching and benefiting a broad community of learners.

@hawc2

hawc2 commented Dec 10, 2023

Thanks @jeffblackadar for your kind words and for authoring this great lesson for Programming Historian. We really appreciate your efforts to revise and polish it. We're accruing a bunch of transcription-focused lessons in our wheelhouse and this one will pair well with the others while providing much-needed guidance for scholars looking to do this work with new cloud services. Congrats on publishing it!

@hawc2 hawc2 closed this as completed Dec 10, 2023