explore_xml_for_textmining

This repo documents my explorations of using the {xml2} package to process multiple XML documents.

My main insights are in 03_looking_for_paragraphs.qmd and 04_iterate.qmd.

summary

This project explores the techniques in Jockers and Thalken's Text Analysis with R @jockers2020. One major challenge is composing the proper XPath expression. There are many useful YouTube tutorials on XPath, and on XPath with TEI. Getting the XPath right may be key to a good analysis. I have taken a stab at it with an eye toward functionality, but I have stopped short of worrying about the precision of my XPath expressions. Researchers should pay special attention to their XPath so they are clear about what their corpus actually contains.
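To make the point about XPath and corpus boundaries concrete, here is a minimal sketch using {xml2} on a small, made-up TEI-style document (it is not taken from this repo's data or .qmd files). It assumes the usual TEI default namespace, and it relies on {xml2} labelling a default namespace with the prefix `d1`. It shows two common pitfalls: an unprefixed XPath silently matches nothing, and a broad `//p` pulls header paragraphs into the "text" alongside the body.

```r
library(xml2)

# A tiny, made-up TEI-style document (TEI files declare a default namespace)
tei <- read_xml('<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt><title>Sample</title></titleStmt>
      <publicationStmt><p>Header note, not part of the text.</p></publicationStmt>
      <sourceDesc><p>Transcribed from a print edition.</p></sourceDesc>
    </fileDesc>
  </teiHeader>
  <text><body>
    <p>First paragraph of the work.</p>
    <p>Second paragraph of the work.</p>
  </body></text>
</TEI>')

# Unprefixed names only match elements with no namespace, so this finds nothing
no_ns <- xml_find_all(tei, "//p")

# xml2 names the default namespace "d1"; use that prefix in the XPath
ns <- xml_ns(tei)
all_p  <- xml_find_all(tei, "//d1:p", ns)                   # header + body paragraphs
body_p <- xml_find_all(tei, "//d1:text/d1:body//d1:p", ns)  # body paragraphs only

length(no_ns)    # 0
length(all_p)    # 4
xml_text(body_p) # the two body paragraphs
```

Which of these expressions is "correct" depends on what the researcher wants in the corpus; the body-only path is just one illustrative choice.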

Beyond Jockers and Thalken, I highly recommend Silge and Robinson's Text Mining with R and the helpful {tidytext} package. Because Silge and Robinson's data-wrangling techniques are tidyverse-inspired, I think they are more accessible to most people. Employing Jockers and Thalken's algorithms may still be necessary on a strategic basis, but at least the researcher will not be slowed down by base-R techniques. Of course, this recommendation is only relevant if a person prefers the grammar and assumptions of the tidyverse. Base-R users are free to continue their approach without any aspersions from me.

Lastly, although I have not read it, Hvitfeldt and Silge's Supervised Machine Learning for Text Analysis in R is the text I would read if I wanted to deepen my understanding of text analysis.
