
Lesson proposal: Clustering and Visualising Documents using Word Embeddings (PH/JISC/TNA) #415

Closed
tiagosousagarcia opened this issue Nov 2, 2021 · 91 comments
Labels: 7. Publication · 2021/22-JiscTNA (Articles submitted in answer to the PH/JISC/TNA call for papers) · English · Original

tiagosousagarcia commented Nov 2, 2021

The Programming Historian has received the following proposal for a lesson on 'Clustering and Visualising Documents using Word Embeddings' by @jreades and @jenniewilliams. The proposed learning outcomes of the lesson are:

  • The ability to generate word embeddings from a large corpus.
  • The ability to use dimensionality reduction and clustering techniques for visualisation and analysis purposes.
  • The ability to use these steps to find and explore groups of similar documents within a large data set.
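
For readers skimming this thread, a minimal sketch of what these outcomes look like in code -- training word embeddings with gensim and averaging them into document vectors. The corpus here is a tiny stand-in, not the lesson's data:

```python
# Minimal sketch of the proposed pipeline on a stand-in corpus
# (gensim 4.x API); this is not the authors' code.
from gensim.models import Word2Vec
import numpy as np

docs = [
    ["clustering", "documents", "with", "word", "embeddings"],
    ["dimensionality", "reduction", "for", "visualisation"],
]  # stand-in tokenised corpus

model = Word2Vec(sentences=docs, vector_size=100, window=5, min_count=1)

def doc_vector(tokens, model):
    """Average a document's word vectors (one common approach)."""
    vecs = [model.wv[t] for t in tokens if t in model.wv]
    return np.mean(vecs, axis=0)

doc_vecs = np.array([doc_vector(d, model) for d in docs])
```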

In order to promote speedy publication of this important topic, we have agreed to a submission date of no later than April 2022. The author(s) agree to contact the editor in advance if they need to revise the deadline.

If the lesson is not submitted by April 2022, the editor will attempt to contact the author(s). If no update is received, this ticket will be closed. The ticket can be reopened at a future date at the request of the author(s).

The main editorial contact for this lesson is @tiagosousagarcia.

Our dedicated Ombudsperson is Ian Milligan (http://programminghistorian.org/en/project-team). Please feel free to contact him at any time if you have concerns that you would like addressed by an impartial observer. Contacting the ombudsperson will have no impact on the outcome of any peer review.

tiagosousagarcia added the labels English, 0. Proposal, and 2021/22-JiscTNA (Articles submitted in answer to the PH/JISC/TNA call for papers) on Nov 2, 2021
tiagosousagarcia self-assigned this on Nov 2, 2021
svmelton commented Feb 2, 2022

@hawc2 has offered to edit this piece.

hawc2 commented Feb 18, 2022

Hi @jreades and @jenniewilliams, I look forward to reading your submission. Please let me know if you have any questions in the meantime. Feel free to email me or post questions on this ticket.

jreades commented Mar 9, 2022

Hi, sorry -- between strikes, childcare, and general... aaaaaaaargh... I'm behind where I'd hoped to be with this! I do have a perfectly serviceable draft of the core explanatory part (what word embeddings are, etc.), and I have separate code that I've already used in other analyses of the same data, so I know the path to completion...

However, is it helpful for me to share this early draft, or would you prefer to see only a full submission? Having open review creates more opportunities to shape the work as it develops rather than afterwards, but it could also be confusing/unhelpful. Let me know!

If helpful, I can: 1) share access to the GitHub repo where I'm writing the draft (so that we don't pollute this 'timeline'); 2) attach a draft to this thread; or 3) submit a draft on the understanding that it will need to be versioned later (via a pull request or similar).

Best,

Jon

hawc2 commented Mar 9, 2022 via email

jreades commented Mar 10, 2022 via email

jreades commented Apr 6, 2022

I assume the first draft is submitted as an attachment to this issue... so here goes!

The article is a README from our private repo (we will make a public version prior to publication):
README.md

Images are here:
EThOS
UMAP_Output
DDC_Plot
Dendogram-euclidean-100
DDC_Cloud-c4-ddcBiology-tfidf
DDC_Cloud-c4-ddcEconomics-tfidf
DDC_Cloud-c4-ddcPhysics-tfidf
DDC_Cloud-c4-ddcSocial sciences-tfidf
Word_Cloud-c15-tfidf

hawc2 commented Apr 6, 2022

@jreades, I'll try to get the lesson set up, and I'll email you with more specific questions/issues with the files. More soon!

tiagosousagarcia commented:

@hawc2 -- I can set up the lesson later today, if you haven't had a chance.

jreades commented Apr 19, 2022 via email

tiagosousagarcia commented:

No worries @jreades, it's all part of the process. I hadn't seen any development here, which is why I was asking if there was something I could do -- but if @hawc2 has the matter in hand, then we're all good (though the offer to help if needed still stands).

hawc2 commented Apr 21, 2022

It's looking good now. Here is a preview link to the lesson: https://programminghistorian.github.io/ph-submissions/en/drafts/originals/clustering-visualizing-word-embeddings

@jreades can you let me know if you see anything basic in the markdown rendering that might be incorrect?

I'll follow up with some preliminary feedback on the lesson itself in the coming week. Once you finish that round of edits, I'll work on sending this out for peer review.

Thanks also, @jreades, for putting together a GitHub repo that will be linked in the lesson. The repo will include the Python code in a Jupyter notebook, runnable in Google Colab for testing purposes.

jreades commented Apr 21, 2022 via email

hawc2 commented Apr 21, 2022

I'm giving you write access now. Can you also post the GitHub repo and Colab notebook here for reference?

jreades commented Apr 22, 2022 via email

hawc2 commented Apr 26, 2022

This is looking like a very solid first draft. My main feedback is pretty general, so I’ll hold off from giving you specific line edits, and just ask for some broad revisions before we send out for review.

My main observation is that this is quite a difficult lesson, and more work will be required to translate terminology for beginner audiences, signpost where the lesson is going, and onboard the reader to each phase of the methodology. It will be helpful for you to do some basic revisions in this direction before I send it out to reviewers, so they don't need to worry as much about how this lesson caters to its audience.

My only other concern is that this lesson is very long. Lessons usually don't go over 8,000 words. I'd rather not see it bulge into a two-part lesson, although that is a possible solution. For now, I'd encourage you to focus on the difficult task of editing this draft for both clarity and length, ideally making it more concise and more concrete at the same time.

As an example of clarifying your language for introductory steps: in your first Learning Outcome, you say, "we use a selection of nearly 50,000 records relating to U.K. PhD completions." Right off the bat, you should use language that more clearly identifies what kind of data your tutorial works with. What kind of records are these? As an American, I'm not sure what "records relating to U.K. PhD completions" would look like, nor why someone would do word-embedding analysis on this type of data. I would've expected "a corpus of doctoral dissertations" as the main dataset. In this vein, in Paragraph 9, where you introduce this dataset in more detail, it's still not clear what "textual data" you will be analyzing within the "metadata" about dissertations. I have to admit that the section on the Case Study gets so technical and detailed about the metadata that I lost the main thread: what is the text you are going to model?

The part where you explain word embeddings and compare them to other text-mining algorithms also requires more revision. In the learning outcomes, the tutorial jumps right into 'dimensionality reduction' and 'hierarchical clustering', but maybe a preliminary learning outcome should be something about teaching the reader why these methods are appropriate next steps once you've created a word embedding model, in order to pursue a research question about the dataset. Putting it in these less technical terms will help readers understand how the algorithmic processes relate to broader scholarly work.

The subsequent paragraphs do a good job of distinguishing PCA, LDA, and TF-IDF from WEs, but they assume that the reader knows something about what all of these have in common. In these opening paragraphs, try to find more ways to spell this out, in terms of approaches like predictive modeling and latent meaning. For example, this clause doesn't really clarify what TF-IDF is, so its comparison with WEs remains a bit vague: "The benefit of simple frequency-based analyses such as TF/IDF is that they are readily intelligible and fairly easy to calculate . . ." What seems essential to highlight here is the type of meaning WEs give us insight into that the other approaches overlook. There's some explanation in the Word Embedding section (beginning at Paragraph 39) that helpfully explains why dimensionality reduction is necessary; a brief version of this could be included early in the tutorial to explain why it leads the reader through this specific series of steps. Similarly, under Prerequisites, you explain how this lesson differs from the Scikit-Learn Clustering lesson, but you don't first explain what the two lessons have in common. A lot of these comparisons are useful for clarifying what your lesson on word embeddings does, but ideally they would all occur in one section, and focus mostly on clarifying what word embedding analysis can show about the text.
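
To make the TF-IDF comparison concrete for readers of this thread, a generic scikit-learn illustration (not code from the lesson; the two documents are invented):

```python
# Generic TF-IDF illustration, not from the lesson: the weights are
# easy to compute and to interpret directly, unlike dense embeddings.
from sklearn.feature_extraction.text import TfidfVectorizer

corpus = [
    "a thesis on economic history",
    "a thesis on particle physics",
]  # stand-in documents

vec = TfidfVectorizer()
X = vec.fit_transform(corpus)  # sparse (n_docs, n_terms) matrix
print(dict(zip(vec.get_feature_names_out(), X.toarray()[0].round(2))))
```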

In this context, the Word Embedding section, in particular paragraphs 40-44, jumps very quickly from the mathematical to the semantic. Could you spend more time here explaining the analogical nature of word embedding models and vector relationships?

The Sample Output section similarly jumps right into the weeds. Could you have a little more introductory info here about the outputs, and how this is a useful sample for elucidating some key points?

A couple of your Tables take up a lot of real estate. Could they be condensed? Table 3 for example, could just be one row? Table 5 is also long.

Paragraph 57 - I agree this is a good break point. You can remove the signpost you put here for review. I think the next section, on Words to Documents, is very useful, and could be contextualized a bit in terms of Word2Vec and Doc2Vec, or how this method differs from those. I see why it's useful to get into Manifold Learning, but if t-SNE/UMAP is the main point, you should get to that sooner; I got a bit lost in this section. Generally, I think you go into too much behind-the-scenes detail about alternative options, and give not enough information on the specific thing you are teaching. Try to offload some of the secondary comparisons with other methods to footnotes.
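
For reference, the Word2Vec/Doc2Vec contrast requested here is roughly: average per-word vectors into a document vector (the approach this lesson takes), or learn a document vector directly, as gensim's Doc2Vec does. A hypothetical sketch of the latter, not the lesson's code:

```python
# Hypothetical Doc2Vec sketch (gensim 4.x): learns one vector per
# document directly, rather than averaging its word vectors.
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

docs = [
    TaggedDocument(words=["clustering", "documents"], tags=["doc0"]),
    TaggedDocument(words=["word", "embeddings"], tags=["doc1"]),
]
model = Doc2Vec(docs, vector_size=50, min_count=1, epochs=20)
vec0 = model.dv["doc0"]  # the learned vector for the first document
```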

The Visualization section seems like a good place to conclude. Right now that figure isn't rendering in Markdown. Ideally, though, visualizing and clustering the data would have been foreshadowed earlier in the lesson. The current versions of these sections could be condensed to conclude the lesson with some first steps in these visualization directions. What about this section is really essential to this lesson? Are the validation and related steps all necessary, or could they be included as supplemental material in your GitHub repo and Colab notebook for more advanced users? There's a fair amount, like the Confusion Matrix, that seems so dense and complicated that you'd have to do a lot more work to justify its inclusion in this proof-of-concept word-embedding methodology. Since that would take up more space, I'm inclined to think a bunch of it can be removed.

If you can take a shot at edits along these lines in the next couple of weeks, then after one round of revision it'll be ready to send out for review. Let me know if you have any questions.

jreades commented May 3, 2022

I've nearly finished -- I just need to review the final bits of analysis in light of the edits above, but I have been able to prune the tutorial down to about 9,800 words. I've fixed issues with maths rendering (GitHub doesn't actually render maths directly in Markdown) and tried to tidy up generally.

jreades commented May 3, 2022

Done. I've gone the whole way through and yanked as much as I think we can while preserving the overall intention of the submission. I'm sure there's more that could be done, but I'm not able to see it at this point. The commit is in. The only thing I wasn't sure about is the images: I can see that they eventually go into an images/<tutorial_name>/ folder, but I figured you'd want to do this yourselves.

Let me know if you need anything else or have any further comments/ideas before sending out for review. As you can see, your initial comments prompted a major rethink and I hope you'll think we've done a good job acting on them.

jreades commented May 4, 2022

Quick note to self: clarify that Euclidean distance works well with UMAP in this case because the abstracts don't vary enormously in length; this means that the magnitude of the averaged document vector isn't an issue. Cosine would probably be a better choice where there was significant variation in the length of the documents.
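
A stand-in illustration of this note (umap-learn API; the vectors are random placeholders, not the lesson's data) -- the metric is a single parameter, so Euclidean and cosine are easy to compare:

```python
# Stand-in illustration of the metric choice noted above: euclidean is
# fine when averaged document vectors have similar magnitudes; cosine
# ignores magnitude, so it suits documents of very different lengths.
import numpy as np
import umap

doc_vecs = np.random.rand(500, 100)  # placeholder averaged document vectors

emb_euclidean = umap.UMAP(metric="euclidean").fit_transform(doc_vecs)
emb_cosine = umap.UMAP(metric="cosine").fit_transform(doc_vecs)
```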

jreades commented May 6, 2022

> Quick note to self: clarify that Euclidean distance works well with UMAP in this case because the abstracts don't vary enormously in length; this means that the magnitude of the averaged document vector isn't an issue. Cosine would probably be a better choice where there was significant variation in the length of the documents.

I've now fixed this. This is ready for a review... I hope!

hawc2 commented May 6, 2022

@jreades regarding the images, let's make sure those are all rendering correctly. I put them all in the directory: https://github.com/programminghistorian/ph-submissions/tree/gh-pages/images/clustering-visualizing-word-embeddings

Can you make sure your markdown file has each image embedded in the appropriate place, with alt-text? You can find information on naming the image files and inserting them into the markdown here: https://programminghistorian.org/en/author-guidelines
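
(For thread readers: the author guidelines' figure syntax is a Liquid include along the lines of `{% include figure.html filename="clustering-visualizing-word-embeddings-1.png" alt="Description of the figure" caption="Figure 1. Caption" %}` -- the filename shown here is illustrative, not one of the lesson's actual files.)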

Once the lesson is rendering correctly, I'll do one last skim and send it out for peer review. Thanks so much for your thorough edits.

jreades commented May 6, 2022 via email

hawc2 commented May 6, 2022

So images don't need the whole directory link, just the name of the image file. You should be able to use this preview to check that everything looks right: https://programminghistorian.github.io/ph-submissions/en/drafts/originals/clustering-visualizing-word-embeddings

I edited the final image in your lesson to show you what it should look like; the last image now renders correctly. I'd rather you finalize the rest, since you know how they should look. Once you think everything looks right, I'll send this out for review. I don't have any other immediate feedback, but after we get reviewer feedback, I'll synthesize it and add any remaining thoughts I have for further revision.

jreades commented May 9, 2022 via email

tiagosousagarcia commented:

> I'm definitely missing something here: I've removed the full path and followed the convention used in the other tutorials that I peeked at (image name only, no other path info), but I still can't get the images to display, even though they appear to me to be in the right place for the includes to work. I don't know if the GitHub pages are only rebuilt intermittently or, more likely, if I'm still mucking up something in the placement of the images/code... but I'm stuck. I'm sure the images are 'fine' in the sense that, if you can get them working, we can sort out any issues they might present during the review stage. I'm not too worried about minor look-and-feel issues, since the reviewers will presumably comment on these if they notice anything wrong. -- Jon

@jreades and @hawc2, there's apparently something wrong with preview in the submissions repo which means that the relative paths don't work, we need the full path to the image for the preview to display it, according to what @anisa-hawes told me here


I'll go through the file and correct the paths, give me 5 mins

jreades commented Mar 13, 2023 via email

jreades commented Mar 15, 2023 via email

hawc2 commented Mar 20, 2023

The Colab notebook worked well for me, thank you @jreades.

@quinnanya can you take a look and share any last thoughts on this lesson?

quinnanya commented:

The notebook worked well for me, too -- admittedly I was on Colab Pro, but the completion times were quite short.

Running through the code, though, leaves me with one major concern: how can people run this on any other dataset they might have? There's a somewhat breezy note about Parquet, but I can think of exactly one person I know who regularly uses that as a data format. If everything is going to depend on Parquet files, then for this to have a prayer of being reusable it needs, at the very minimum, a pointer to some external tutorial on how one might go about converting, say, a CSV to a Parquet file.

jreades commented Mar 21, 2023 via email

quinnanya commented:

Hi Jon,

Got it! Yup, I think a quick "if you've got a CSV you can convert it easily with [code]" insert there would take care of it, thanks!

~Quinn
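
The sort of one-line conversion being suggested, with hypothetical filenames (pandas needs the pyarrow or fastparquet engine installed):

```python
# Hypothetical CSV-to-Parquet conversion of the kind suggested above;
# requires pyarrow or fastparquet as the Parquet engine.
import pandas as pd

df = pd.read_csv("my_data.csv")    # your own dataset
df.to_parquet("my_data.parquet")   # same data, in Parquet format
```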

BarbaraMcG commented:

I too have checked the code, and it runs quickly with no issues. I agree that the input format needs some clarification, and that the code in the initial part of the notebook needs some more comments; it gets more verbose later on, but the beginning is a little dense.

hawc2 commented Mar 23, 2023

Thanks @BarbaraMcG and @quinnanya for this useful and precise feedback. @jreades, it sounds like the lesson is ready for copyediting; @anisa-hawes can start working on that next phase.

Thanks again to our reviewers for taking a second look at this lesson, and giving it such a careful eye. It's going to be an excellent lesson, and I look forward to seeing how @quinnanya's in-progress lesson provides an introduction/background useful for this lesson.

Separately, I can work with you, @jreades, on revising and finalizing the accompanying Google Colab notebook. What you described doing sounds good to me. Since we don't want to replicate commentary available in the Programming Historian lesson, you can focus on minimal commentary and headings in the Colab notebook that make it easy to follow along with, and explain any technical divergences from the PH lesson. The Parquet pandas implementation is very elegant!

jreades commented Mar 23, 2023 via email

anisa-hawes commented:

Thank you @jreades. This lesson /en/drafts/originals/clustering-visualizing-word-embeddings is now being copyedited.

anisa-hawes commented Apr 5, 2023

Hello @jreades,

I hope you are well.

Our copyeditor Iphgenia has prepared the edits for this lesson. I've staged these edits in Pull Request #554. You can review the changes she's made in the rich diff by navigating to the "Files changed" tab.

Please let me know if you're happy with the adjustments. You'll notice that I have left some small comments/queries, and indicated where a few additions are needed.

With many thanks,
Anisa

cc. @hawc2

hawc2 commented Apr 17, 2023

@jreades, did you see Pull Request #554, awaiting your approval of the copyedits on your lesson?

jreades commented Apr 19, 2023

I had -- what I wasn't sure about was whether I was supposed to use the inline commenting function or approve the pull request. I've now done the latter, and then added in the requested revisions.

I think this means we're there? 🤞

anisa-hawes commented Apr 19, 2023

Thank you, @jreades!

--

Hello @hawc2,

This lesson is almost ready for your final review.

Sustainability + accessibility actions status:

  • Copyediting
  • Typesetting
  • Addition of Perma.cc links
  • Addition of alt-text for all figures (thank you, Jon)
  • Hello Jon @jreades and Jennie @jenniewilliams. May I ask one of you to download and complete this authorial copyright declaration form? For co-authored lessons, we only require one lead author to complete the form. Please email your completed form to me at admin[@]programminghistorian.org
  • @jreades and @jenniewilliams, could you also confirm whether you have ORCIDs that you'd like us to add to your bios? (below)
```yaml
- name: Jon Reades
  orcid: 0000-0002-1443-9263
  team: false
  bio:
    en: |
      Jon Reades is Associate Professor at the Centre for Advanced Spatial Analysis, University College London.
- name: Jennie Williams
  orcid: 0000-0000-0000-0000
  team: false
  bio:
    en: |
      Jennie Williams is a PhD Student at the Centre for Advanced Spatial Analysis, University College London.
```

Next steps @hawc2:

  • Define the lesson's difficulty: level, based on the criteria set out here
  • Define the lesson's activity:
  • Define the lesson's topics:
  • Liaise with the authors to provide a short abstract: for the lesson : Jon has provided this
  • Select + upload an image for the lesson (incl. avatar_alt:) : I've done this
  • Prepare x2 posts for our Twitter/Mastodon Bot (Let me know if you'd like me to do this? Happy to once we have an abstract)

hawc2 commented Apr 28, 2023

Thanks @anisa-hawes for getting everything ready. Yes, we can host these assets either in GitHub Large File Storage or in our Zenodo repository.

@jreades, can you add the link to the GitHub repo where the code is hosted, as Anisa mentioned on line 731?

jreades commented Apr 28, 2023 via email

anisa-hawes commented:

Thank you, @jreades 🙂

hawc2 commented May 3, 2023

@jreades I did a final line edit of the lesson, standardized some of the wording, and tried to clarify a few points. The lesson overall is looking really solid; it's impressive, and while difficult, illuminating. It provides a lighthouse around which future PH lessons on embeddings can situate themselves, and it'll be interesting to see how we publish more lessons that go into the weeds of emerging machine learning methods while trying to stay sustainable. I also appreciate how you engage with the Scikit-Learn Clustering lesson, and more generally situate the lesson in the context of other PH lessons.

In terms of the publishing timeline, Anisa is working on finalizing how we'll store all the assets for this lesson, and I'm preparing other elements for publication. I'm hoping to publish the lesson in the next couple weeks.

So this would be your last chance to make any edits to it. I had a couple lingering questions I was hoping you could clarify, and two suggestions for additional minor edits.

  • Under Configuring the Context, what does this line mean?: "A mixture of experimentation and reading indicated that Euclidean distance with Ward's quality measure is best" -- does "reading" here mean "research"? Is there an article to cite for this decision? (A generic sketch of the Euclidean-plus-Ward combination appears after this list.)

  • I also wasn't sure what this sentence meant: "Indeed, the assumptions about the theses being swapped between History DDCs are probably more robust, since the number of misclassified records is substantial enough for the differences to be relatively more robust." Can we use a word other than "robust" here?

  • One thing I should've flagged earlier is that I'm not a fan of commentary embedded in code blocks, especially when it breaks up functions. In this case, however, your lesson is so complex, and the code blocks so detailed, that I think it works in many places. I tried to condense how many lines some of these commentary sections take up in disrupting the code, but I'd also encourage you to take one last look at this element of the lesson. In a few instances, some of the inline commentary in the code could be taken out and added as a paragraph before or after the code chunk. Ideally you are explaining in the tutorial prose what each code chunk is about to do or has just done. This might be especially helpful where functions are broken up by commentary. I'll defer to you in the end on how you prefer the inline commentary to appear, case by case, but I just wanted to flag it as something you could alter.

  • It's fine for you to leave that sort of commentary in the Google Colab notebook. The notebook itself works great, and my only ask for edits on it would be for you to incorporate more of the section headings from the PH lesson into the Colab notebook itself. Ideally a reader could switch between the lesson and the Colab notebook and use the outlines to figure out where in the lesson the Colab code fits. It doesn't have to be perfect, but adding some more signposts might help a reader juggle everything.
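
As flagged in the first bullet above, a generic SciPy sketch of hierarchical clustering with Euclidean distance and Ward's criterion, on stand-in data rather than the lesson's:

```python
# Generic Euclidean + Ward illustration (SciPy), on stand-in vectors;
# Ward's linkage criterion is defined over euclidean distances.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

doc_vecs = np.random.rand(200, 100)              # stand-in document vectors
Z = linkage(doc_vecs, method="ward")             # implies euclidean metric
labels = fcluster(Z, t=4, criterion="maxclust")  # cut the tree into 4 clusters
```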

jreades commented May 9, 2023 via email

hawc2 commented May 9, 2023

@jreades thanks for making these edits and additions, including the references.

For the Parquet file you mentioned, is that currently in the directory of assets you link to? I think @anisa-hawes is planning to store all of those files in our Zenodo repository and relink to that location in the lesson.

jreades commented Jun 8, 2023

Just chasing this: I can certainly put the Parquet file in GitHub so that you can access it via the 'assets' directory, but if you're then going to move it elsewhere there's not much point, as it will bulk up your repo with a 26MB file that's never actually used.

Let me know where you want it to go and I can do it, or feel free to download using the URL in the code and move it wherever you like.

Is there anything else you need from me?

anisa-hawes commented:

Dear @jreades. Thank you for following up.

Apologies for the delay. I am working through a few questions about how we handle lessons that integrate code notebooks, and also about how to host large data assets. Both are important to ensuring we can manage and sustain this lesson into the future.

I have downloaded your code and created a .zip file which combines all the data assets, and uploaded it to our PH Zenodo repository. However, I've expressed to Alex that I am a bit unsure about the data .zip having its own DOI. This appears to be automatically assigned by Zenodo unless we supply one (the lesson's own DOI isn't activated until shortly after publication). I've contacted the library that coordinates and registers our DOIs with Crossref to ask for advice here.

I'm also uncertain about how the download would work within the code. At line 245 of the Markdown, a block of Python specifies `df = pd.read_parquet`. Would this work to download and save a .zip? Sorry for all the questions and doubts here.

Anisa
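
(For thread readers: a hedged sketch of the distinction in question, with placeholder URLs throughout. pd.read_parquet can read a single Parquet file straight from a URL, but a multi-asset .zip would need to be downloaded and unpacked first.)

```python
# Placeholder URLs. Reading one Parquet file directly from the web
# works, assuming the pyarrow/fsspec stack is available:
import pandas as pd

df = pd.read_parquet("https://example.org/assets/theses.parquet")

# A .zip bundle of assets, by contrast, must be fetched and unpacked
# before any file inside it can be read:
import io
import zipfile
from urllib.request import urlopen

with urlopen("https://example.org/assets/lesson-data.zip") as resp:
    bundle = zipfile.ZipFile(io.BytesIO(resp.read()))
bundle.extractall("data")
df = pd.read_parquet("data/theses.parquet")  # hypothetical member file
```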

hawc2 commented Jul 24, 2023

@anisa-hawes Can we move this lesson forward to publish in the next couple weeks?

anisa-hawes commented:

Hello @hawc2,

Thank you for your extended patience. All the sustainability + accessibility actions are complete:

  • Copyediting
  • Typesetting
  • Addition of Perma.cc links
  • Addition of alt-text for all figures
  • Receipt of authorial copyright agreement
  • Select + upload an image for the lesson (incl. avatar_alt:)
  • Liaise with authors to prepare bios for ph_authors.yml

As you are in the unusual position of being both Managing Editor and Editor of this lesson, I have prepared the files in Jekyll to help you.

Everything is ready for your review here: programminghistorian/jekyll#2987

(The first thing after publication will be for us to prepare x1 announcement + x2 future posts for our social media channels.)

hawc2 commented Aug 14, 2023

Huge congrats @jreades and @jenniewilliams on the publication of this amazing new lesson on word embeddings: https://programminghistorian.org/en/lessons/clustering-visualizing-word-embeddings

It's been my pleasure editing this piece, and I'm grateful to @quinnanya and @BarbaraMcG for their careful review of the lesson. Big thanks to @anisa-hawes too for helping prepare this lesson for publication and developing a new way for us to manage Jupyter and Colab Notebooks going forward.

@jreades and @jenniewilliams, we'll be promoting the published lesson on social media, and we encourage you to share it around as well. I look forward to recommending students read it, and I'm sure I'll make use of it in my own research in the future as well. Thanks for all your work and time on this lesson, and again congratulations!

hawc2 closed this as completed on Aug 14, 2023