Repository for research and drafts of my next book on library search algorithms

Sabbatical Leave Request

You must address the following criteria in your Sabbatical Leave Request. Type on this form, save as a PDF document, and upload in the electronic sabbatical leave request system under the proposal section. The proposal shall not exceed ten (10) pages, excluding references and other supporting documents.

1. Descriptive Title for the Project

Stabilizers of Trust: A critical study of the algorithms behind library discovery systems

2. Goals and Objectives

Proposals for sabbatical leave must have a clear conceptual focus. Be certain that the conditions and criteria for sabbatical leave as stated in the Board of Trustees Policies, chapter 4, section 2.25.4 and 5, have been addressed. A sabbatical proposal must be explicit about the desired results or outcomes of the project.

The goal of my sabbatical will be to complete a publishable manuscript on the practical and ethical implications of software algorithms on research and learning in libraries. In the past decade, the tools of research have changed dramatically. The rise of Google and its integration into nearly every aspect of our lives has pushed libraries to adopt similar “Google-like” tools to search their collections. These unified search tools, called discovery systems, search across hundreds of third-party research databases, presenting scholarly articles, books, archival items, and other library holdings in a single results list, removing the need for the user to search individual databases, one at a time.

Because these tools are provided by libraries and search subscription databases of scholarly materials rather than the open web, we often assume they are more “accurate” or “reliable” than their general-purpose peers like Google or Bing. Although the content may be more academic, library discovery systems are still software written by people with prejudices and biases. Discovery systems are subject to strong commercial pressures, which are hidden from users (and many in libraries) behind diffuse collection-development contracts and layers of administration.

Additionally, they struggle to integrate content from thousands of different vendors and their respective disregard for consistent metadata. As I will show in my sabbatical project, library discovery systems struggle with accuracy, relevance, and human biases, and these shortcomings have the potential to shape the academic research of the students and faculty who rely on them.

Human bias, commercial interests, and problematic metadata have long affected researchers’ access to information. What is new with algorithms in library discovery systems is the scale of the effect and the widespread belief that these obstacles to “objective truth” have been largely erased from our library tools.

My sabbatical project will meet Objectives 1 and 2:

  • Objective 1: Promise of a significant contribution to a new or existing subject under study or problem undertaken. This project will extend the scholarly discussion surrounding critical algorithmic studies to commercial library discovery tools. It will contribute to the critical examination of how library collections are searched, as well as how contemporary academic research is done.
  • Objective 2: Expansion of skills that deepens or extends the applicant’s professional capabilities related to professional effectiveness, research, or creative activity. This research will help me better understand the critical tools that I help maintain at the University Libraries by closely examining how the underlying algorithms affect search results. It will also help me develop techniques for evaluating the algorithms, and provide methods for evaluating new commercial software platforms for the library in the future. In my past work, these types of findings have also been of interest to the software vendors, who have made changes to benefit all libraries that license their products.

3. Background and Significance of Project

Describe the background and significance of the project for non-specialists. Depending on the standards of your discipline, this section may take the form of a literature review, a comparison to similar projects, a description of how this fits within the broader dialogue or artistic tradition, or how it will improve your professional competence.

An algorithm is “a finite series of steps used to solve a problem” (Christian & Griffiths, 2017, p. 3). Historically an algorithm could be any formulaic process, from using a pattern to knit a scarf to cooking spaghetti carbonara from a recipe. Today, we understand algorithms to be part of the software that we interact with in our everyday lives on the web, in apps on various devices, and increasingly in places where software has never been much of a concern, from watches and televisions to light bulbs and speakers.

The promise of algorithms is that they relieve some of the burdens of our daily work, such as gathering options (“Other books you might enjoy”), making choices for us based on opaque criteria (“Fastest route to your destination”), and many more. Software algorithms have become so pervasive that the writer Adam Greenfield refers to their spread as “the colonization of everyday life by information technology” (2017, p. 286).

Little is understood about the effects of our reliance on algorithms. But systems designed to make decisions on our behalf or control our access to information by necessity will affect the quality and trajectory of our projects and lives. When the systems work as designed, this takes the form of including or prioritizing certain items from our results and excluding or demoting others without our intervention. Yet this is precisely what Google was fined $2.7 billion for in promoting its own services above competitors, and what commercial library discovery tools do as a matter of course, by prioritizing the inherently more reliable links to content aggregated by the tool’s parent company (Scott, 2017).

When these tools fail, they affect us in different ways, often surfacing political or social biases in unexpected ways. In 2015, Google Photos began categorizing users’ photos based on the content of the digital images. The algorithm routinely labeled Black people as “gorillas,” which Google treated solely as an engineering failure rather than a reflection of systemic bias surfacing in a human-made product (Curtis, 2015). The resulting controversy continues to shape public discussion about algorithmic tools. But what does that say to us, when even the tools we normally think of as objective “truth tellers” begin to exhibit the same biases we struggle with in human-to-human interaction? Greenfield (2017), hearkening back to Churchill’s famous quip about architecture’s power, says that “now we make [computer] networks, and they shape us every bit as much as any building ever did, or could” (p. 28).

Faith and Trust

Perhaps no algorithms affect our daily lives as much as search algorithms. Google has dominated commercial search for the past 15 years, in part by presenting search as a simple process, which its early competitors failed to do. While still a VP at Google, Marissa Mayer told the search giant’s users to leave the hard parts of search to the engineers: Google’s users need only “to understand that they can just go to a box, type what they want, and get answers” (qtd. in Vaidhyanathan, 2011, p. 54). And users, according to Tim Sherratt (2016), have “faith that search will just work.” They take Google and other search tools at what Sherry Turkle (1997) calls “interface value.” If presented with a simple interface, then the process must be simple. The importance for a user, argues Greenfield (2017), “isn’t so much what a system can actually do, but what we believe it can do” (p. 254). Yet even if we wanted to know more, the logic behind these algorithms is not shared by the companies that build them.

Legal scholar Frank Pasquale (2015) reminds us that commercial algorithms are “black boxes.” We can see some of the inputs and outputs, but not the inner workings of the machine. Commercial algorithms are the primary intellectual property asset of the search provider, and if the details of the algorithm became public, competitors could copy the tool and the search provider would lose a competitive advantage, and thus its revenue stream. But Greenfield (2017) notes that algorithms are also subject to Goodhart’s Law, which states that “when a measure becomes a target, it ceases to be a measure” (p. 247). If we knew the formula Google uses to rank results, every web page would exploit all possible measures, making ranked results on these criteria meaningless.

Critical Algorithmic Studies

Because of this inherent trust in computing systems, studying the workings and effects of search algorithms is becoming more important as these tools continue to “colonize” our lives. Mowshowitz and Kawaguchi (2002) argued 15 years ago for the importance of studying search algorithms for their role as gateways to information with “an absence of mechanism to insure fairness” (p. 143). Many of the studies that have come in the intervening years have focused on commercial search engines like Google, Bing, and Yahoo! because of their market share and because the general-purpose nature of their indexes ensures that they will become an integrated part of everyday life. Eslami et al. (2015) note that “researchers have paid particular attention to algorithms when the outputs are unexpected or when the risk exists that the algorithm might promote anti-social, political, economic, geographic, racial, or other discrimination” (p. 154).

Researchers have also focused on the social aspects of our reliance on these algorithms (cf. Vaidhyanathan, 2011; Dormehl, 2014; Pasquale, 2015) as well as the impacts of their use. In commercial search, former head of the Harvard Data Privacy Lab Latanya Sweeney (2013) showed that searches for “Black-identifying names” like DeShawn and Trevon on Google and other search services were 25% more likely to be shown an advertisement suggesting the person had been arrested than names associated with whites, like Jill or Emma. Safiya Noble’s work (2012; 2016) on Google results highlights how users understand search results to reflect the truth, following along with Google’s goals as articulated by Mayer. Noble shows how searches for “black girls” return mostly pornography websites, while searches for the word “beautiful” return images of white women, despite not including “women” in the search. What are African-American girls to think about their identity when these “truths” are presented to them? And who should be held accountable for these kinds of biases? Google, for its part, lays the blame on its algorithms, implying that the cold logic of the computer has done the selecting free from human influence (Dormehl, 2014, p. 226)^[The footer of Google News, for instance, carries the disclaimer: “The selection and placement of stories on this page were determined automatically by a computer program.”]. But as the legal scholar Danielle Citron told the author Luke Dormehl (2014), “humans craft these algorithms and can embed in them all sorts of biases and perspectives” (p. 150). Despite their reputation as being, in Tarleton Gillespie’s (2014) words, “stabilizers of trust,” algorithms are created by people who often are not aware of their own biases and prejudices (p. 179).

What’s more, as Ian Bogost (2015) reminds us, algorithms are already abstractions that “capture some of [a] system’s logic and discard others.” Whether these biases are encoded into algorithms intentionally is, Greenfield (2017) claims, irrelevant, because “whatever values and priorities are inscribed in [an algorithmic system] will be incorporated by reference into everything it touches” (p. 275). The designers of algorithms may have intended for their tools to behave and be used in a certain way, but the British cyberneticist Stafford Beer (2002) argued that only results, and not intentions, matter: “the purpose of a system is what it does” (p. 217). If a search engine reinforces systemic racism, then it doesn’t matter if this was a design goal or not. Its purpose is made clear by the values and perspectives that it enforces upon us.

It is here that we need to approach our search tools, examining what they do, rather than on the pages of the marketing brochures or in design documents. Evaluating how a search tool performs is the best way to begin to understand what effects it has on its users. To date critical algorithm scholars have focused on commercial search tools and social media platforms, but there has been no study of the influence of algorithms in academic search tools. In 2016, I conducted a small analysis of one relevance algorithm in ProQuest’s Summon^[Due to a corporate merger, Summon is now under the Ex Libris umbrella, but at the time of my study it was still a ProQuest product. Before that, as seen in marketing materials from 2013, it was a Serials Solutions product.], the commercial library discovery system GVSU has licensed since 2009. Summon offers a “Topic Explorer,” like Google’s Knowledge Graph, which shows reference articles for broad searches to, in the words of ProQuest’s marketing team (2013), “provide users with valuable contextual information to improve research outcomes.” In my study (Reidsma, 2016a) I analyzed 8,000 Summon searches that returned a Topic Explorer result, and saw that the algorithm was (more or less) accurate an impressive 93% of the time. But of the incorrect results, 54 (.68% of the 8,000) exhibited bias against women, black people, Muslims, the LGBT community, and the mentally ill, mostly through what Mowshowitz and Kawaguchi (2002) call “indexical bias” (p. 143) where the juxtaposition of items was enough to suggest bias. “Stress” in the workplace was equated with working women, any search on mental illness implied that this was a “myth,” and a search for information about rape in the United States suggested learning more about “hearsay evidence” (Reidsma, 2016a). This research was just an initial, localized foray into examining the role commercial academic search tools have on how students and scholars access and understand information. 
Far from “neutral,” these systems must be examined to uncover the claims, beliefs, and prejudices they perpetuate that challenge the values and ideals of libraries.
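
The proportions reported above can be checked with a few lines of arithmetic. The sketch below is illustrative only: the 8,000 searches and 54 biased results come from the study, while the count of incorrect results is an estimate derived from the approximate 93% accuracy figure, not a number reported in the study. The constant and function names are my own.

```python
from fractions import Fraction

TOTAL_SEARCHES = 8000               # Topic Explorer searches analyzed (from the text)
BIASED_RESULTS = 54                 # incorrect results that exhibited bias (from the text)
ACCURACY_RATE = Fraction(93, 100)   # reported only approximately ("more or less")


def percent(part, whole):
    """Return part/whole as an exact percentage (a Fraction)."""
    return Fraction(part, whole) * 100


# Share of all 8,000 searches that showed bias (reported as .68% after rounding).
biased_share = percent(BIASED_RESULTS, TOTAL_SEARCHES)

# Estimated number of incorrect results, assuming exactly 93% accuracy.
incorrect_estimate = TOTAL_SEARCHES * (1 - ACCURACY_RATE)

# Share of the estimated incorrect results that showed bias.
biased_among_incorrect = percent(BIASED_RESULTS, incorrect_estimate)

print(f"Biased share of all searches: {float(biased_share):.3f}%")
print(f"Estimated incorrect results: {int(incorrect_estimate)}")
print(f"Biased share of incorrect results: {float(biased_among_incorrect):.1f}%")
```

Exact fractions are used here so that the comparison with the rounded figure in the text does not depend on floating-point rounding.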

4. Relevant Preparation

Describe your scholarly and/or creative preparation for this project. Indicate any additional training, if necessary, you plan to acquire either prior to or as a part of the sabbatical project. If a book is being written, append an outline or table of contents to demonstrate that groundwork has been laid.

I’ve been designing and building web-based tools and search algorithms for nearly 15 years, and have spent the past seven testing and analyzing commercial library discovery systems for the University Libraries. I have the technical understanding and coding ability to undertake this project. In addition, as an adjunct I have taught Ethics and Logic in the Philosophy Department at GVSU. This built on work I did during my Master’s at Harvard Divinity School on the intersection of phenomenology, or the philosophy of experience, and ethics. I started my academic career by examining the ethical aspects of human-to-human communication and experience. Given the technological developments of the past two decades, looking closer at the ethics behind human-to-computer interaction seems a natural next step.

In 2015, I began to systematically study the relevance and “experience” algorithms in discovery systems (autosuggest, autocomplete, query expansion, synonym suggestion, and spelling correction, for example). The following year, I uncovered apparent bias in one such algorithm, which I wrote up as “Algorithmic Bias in Library Discovery Systems” (Reidsma, 2016a). In June of this year, I gave the opening keynote at the User Experience in Libraries (UX Libs) Conference in Glasgow, Scotland, on “Ethical UX,” which included additional research I have done into the algorithms behind library discovery systems (Reidsma, 2017).

In addition to my subject expertise, I am no stranger to scholarly publishing and managing large writing projects. Since 2013 I have written and published two books on web and experience design for libraries:

  • Reidsma, M. (2014). Responsive Web Design for Libraries: A LITA Guide. Chicago: ALA TechSource, an imprint of American Library Association.
  • Reidsma, M. (2016). Customizing vendor systems for better user experiences: the innovative librarian’s guide. Santa Barbara, CA: Libraries Unlimited.

I also co-founded and serve as editor-in-chief of Weave: Journal of Library User Experience, an open-access, peer-reviewed journal for library UX professionals published by Michigan Publishing (http://weaveux.org). Our seventh issue will be published in October.

Attached is a draft Table of Contents for the proposed book. I have begun contacting publishers about this project to gauge interest (e.g. MIT Press, University of Michigan Press, ACRL Publishing), and have detailed the timeline for creation and submission of the book proposal in my Project Plan (Section 5).

Since this project does not involve human subjects, it will not require IRB approval.

5. Project Plan

Describe the sabbatical project. Show how this plan relates to the goals and objectives outlined in Criteria 2.

In preparation for the sabbatical, I will work to achieve three goals: First, I will collect search results for an examination of various relevance and “experience” algorithms, using an existing data set of 8,000 search terms I examined for my Topic Explorer analysis (Reidsma, 2016b). These search terms were collected between November 2015 and February 2016, and represent broad one- or two-word search phrases that remain common searches. I will record the results of these searches in each of the major commercial discovery systems: Ex Libris’ Summon and Primo, EBSCO’s EDS, and OCLC’s WorldShare,^[If possible, I will use several different instances of each search tool to examine the results, which may vary depending on the library’s holdings. I will use publicly-accessible instances of each tool at Comprehensive and R1 institutions.] capturing the ‘most relevant’ results as chosen by the tool’s main algorithm, as well as any supporting content generated by “experience” algorithms, such as autocomplete suggestions, spelling suggestions, query expansions, topic explorer results, and recommended databases and resources.^[These last results will need to be analyzed broadly, since each individual client can adjust the databases and other resources that are recommended to users based on keyword searches.] I will then examine the results, looking for patterns and anomalies to inform my understanding of how libraries present search results to our users. Since I will be the one entering the search terms into these systems, no user data will be collected in this project.
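
As a minimal sketch of how the captured results described above might be organized, the following shows one possible record layout and a CSV export. It is an illustration under stated assumptions, not the tooling I will necessarily use: the `CapturedResult` fields, the `capture` and `write_csv` helpers, the `example-library` label, and the `fake_fetch` stand-in are all hypothetical, and no vendor API is assumed.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass
class CapturedResult:
    system: str      # e.g. "Summon", "Primo", "EDS", "WorldShare"
    instance: str    # hypothetical label for the library instance searched
    query: str       # search term entered by the researcher
    algorithm: str   # "relevance" or an "experience" feature such as "autocomplete"
    rank: int        # 1-based position within that algorithm's output
    text: str        # title or suggestion text as displayed


def capture(queries, targets, fetch):
    """Run every query against every (system, instance) pair and collect rows.

    `fetch` is a hypothetical callable returning (algorithm, text) pairs
    in display order; it stands in for however results are gathered.
    """
    rows = []
    for system, instance in targets:
        for query in queries:
            position = {}  # running rank per algorithm for this query
            for algorithm, text in fetch(system, instance, query):
                position[algorithm] = position.get(algorithm, 0) + 1
                rows.append(CapturedResult(system, instance, query,
                                           algorithm, position[algorithm], text))
    return rows


def write_csv(rows, handle):
    """Write captured rows to an open text handle as CSV with a header."""
    writer = csv.DictWriter(handle, fieldnames=[f.name for f in fields(CapturedResult)])
    writer.writeheader()
    for row in rows:
        writer.writerow(asdict(row))


def fake_fetch(system, instance, query):
    """Stand-in returning canned data; replace with real captured output."""
    return [("relevance", f"{query} result A"),
            ("relevance", f"{query} result B"),
            ("autocomplete", f"{query} studies")]


rows = capture(["stress"], [("Summon", "example-library")], fake_fetch)
buffer = io.StringIO()
write_csv(rows, buffer)
```

Keeping one row per displayed item, tagged by system, instance, and algorithm, would let the same search term be compared across tools and across libraries with different holdings.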

The second pre-sabbatical goal is to work through research and reading of relevant literature. This is a topic I have been actively researching for the past few years, but there is still much I have not read, especially as interest in the impact of algorithms grows and more works are published on the subject. A few of the topics I propose to explore in this project, including the pressures of the market economy and the monetization of collected academic user data and their effects on algorithm design, are outside my current areas of expertise. I will focus on developing a more thorough understanding of these areas as I prepare to write the book.

Finally, before my sabbatical begins I will draft a book proposal for academic presses, which I plan to begin sharing in Winter of 2018. While the proposal will be tailored for the individual presses based on their specific submission guidelines, this general proposal will help shape the topic, demonstrate the rigor, and give a feel for the tone of the book.

During the sabbatical (Fall 2018) my plan will be the same as the writing process I’ve used for years. I will alternate among research, reading, writing, and editing. I also have informal agreements in place to share my writing in progress with colleagues for feedback and critique, including Courtney Greene McDonald of Indiana University Libraries, Donna Lanclos of University of North Carolina Charlotte Libraries, and Hugh Rundle of Brimbank Libraries, Victoria, Australia.

After returning from sabbatical, I will continue to edit and revise chapters throughout Winter 2019 with the goal of having the manuscript ready to submit to a publisher by Spring/Summer term. During this time, I also hope to begin sharing my research more broadly with colleagues both within the University Libraries and at professional conferences. I find this to be a fruitful time to begin discussions around scholarship, where there is time to revise and rethink but after the arguments have had time to congeal around the research. While I have authored two books without the benefit of a sabbatical, this book would be impossible to write alongside the daily obligations of my position. My previous books were more “how-to” manuals, rather than research examining the social and ethical issues around academic search. This book will require periods of uninterrupted thought and writing, something I am not accustomed to as the primary technical support contact for over a dozen public-facing online tools in use by a significant portion of our faculty, students, and staff.

6. Timeline

Indicate estimated dates for each of the significant steps in the project plan. Be as specific as possible. Include an explanation showing whether the project can be completed in the time available. If the sabbatical leave is being used to begin a longer term project, state when you expect the whole activity to be completed.

Pre-Sabbatical

Fall 2017

  • Preliminary reading and research
  • Record results from relevance and “experience” algorithms
  • Draft general book proposal

Winter 2018

  • Continue reading and research
  • Continue recording results
  • Tailor and submit book proposal to potential publishers
  • Draft Introduction

Spring/Summer 2018

  • Examine search results
  • Draft chapter on relevance algorithms
  • Edit Introduction

Sabbatical

Sept. 2018

  • Draft chapter on experience algorithms
  • Edit chapter on relevance algorithms

Oct. 2018

  • Draft chapter on algorithmic bias
  • Edit chapter on experience algorithms

Nov. 2018

  • Draft chapter on content as data
  • Edit chapter on algorithmic bias

Dec. 2018

  • Draft chapter on the commercialization of library user data
  • Edit chapter on content as data

Post-Sabbatical

Winter 2019

  • Edit chapter on the commercialization of library user data
  • Revise all chapters again, with formal feedback from writing partners, peers, and editor
  • Present work to peers at University Libraries and conferences (e.g. Library Technology Conference, Code4Lib)

Spring/Summer 2019

  • Submit manuscript to publisher

7. Benefit to one's own or other units

A clear relationship between the proposed sabbatical leave and a proposer's academic unit shall be demonstrated. If your project is to benefit a unit other than your home unit, describe that situation. Attach signed, written verification of that benefit from the head of that other unit.

Over the past two decades, libraries have been transformed from repositories of physical collections to collections mediated by software. But we have not yet come to terms with how algorithms shape the way that information is made accessible and findable to library users. The simplistic model is that we’ve simply moved many of the manual, paper-based processes online, but this overlooks the huge gaps in our knowledge about how these software tools shape the very nature of scholarly discourse by algorithmically controlling what is found and what is not.

At GVSU, as at other universities, the library is the gateway to research and scholarly information for our students and faculty, both in a practical sense (where the books, journals, and links to databases are located) and the budgetary sense (our collection budget pays for these materials). My research will help the library better understand how these commercial tools affect how and what information is accessed by our scholarly community. This knowledge will inform decision-making in collection development and in the software and technology we both purchase and develop, and will suggest new avenues of research and collaboration with colleagues in library instruction.

References

Beer, S. (2002). “What is Cybernetics?” Kybernetes, 31(2), pp. 209–19.

Bogost, I. (2015). The cathedral of computation. The Atlantic. Retrieved from http://www.theatlantic.com/technology/archive/2015/01/the-cathedral-of-computation/384300/

Christian, B., & Griffiths, T. (2017). Algorithms to live by: the computer science of human decisions. New York: Henry Holt.

Curtis, S. (2015, July 1). Google Photos labels black people as ‘gorillas.’ The Telegraph. Retrieved from http://www.telegraph.co.uk/technology/google/11710136/Google-Photos-assigns-gorilla-tag-to-photos-of-black-people.html

Dormehl, L. (2014). The formula: how algorithms solve all our problems–and create more. New York: Penguin.

Eslami, M., Rickman, A., Vaccaro, K., Aleyasen, A., Vuong, A., Karahalios, K., Hamilton, K., & Sandvig, C. (2015). “I always assumed that I wasn’t really that close to [her]”: Reasoning about Invisible Algorithms in News Feeds. 33rd Annual ACM Conference on Human Factors in Computing Systems. pp. 153–162. Retrieved from http://www-personal.umich.edu/~csandvig/research/Eslami_Algorithms_CHI15.pdf

Gillespie, T. (2014). The Relevance of Algorithms. In T. Gillespie, P. Boczkowski & K. Foot (Eds.), Media Technologies: Essays on Communication, Materiality, and Society (pp. 167–194). Cambridge, MA: MIT Press.

Greenfield, A. (2017). Radical technologies: the design of everyday life. London: Verso.

Mowshowitz, A., & Kawaguchi, A. (2002). Assessing bias in search engines. Information Processing and Management, 38(1). pp. 141–156.

Noble, S. (2012). Missed connections: What search engines say about women. Bitch, 1(54). pp. 36–41. Retrieved from https://safiyaunoble.files.wordpress.com/2012/03/54_search_engines.pdf

Noble, S. (2016, March). The politics of online information: algorithmic ethics and big data(bases) in libraries. Paper presented at the Library Technology Conference 2016. Retrieved from http://ustream.tv/recorded/84532162

Pasquale, F. (2015). The black box society: the secret algorithms behind money and information. Cambridge, MA: Harvard University Press.

ProQuest. (2013). Serials Solutions Advances Library Discovery with Summon 2.0. Retrieved from http://www.proquest.com/about/news/2013/Serials-Solutions-Advances-Library-Discovery-with-Summon–2–0.html

Reidsma, M. (2016a). Algorithmic bias in library discovery systems. Retrieved from https://matthew.reidsrow.com/articles/173

Reidsma, M. (2016b). Summon Topic Explorer Results by Search Query [Data set]. Available from Zenodo: http://doi.org/10.5281/zenodo.47723

Reidsma, M. (2017, June). Ethical UX. Paper presented at the User Experience in Libraries III Conference. Retrieved from https://matthew.reidsrow.com/talks/198

Scott, M. (2017, June 27). Google fined record $2.7 billion in E.U. antitrust ruling. New York Times. Retrieved from https://www.nytimes.com/2017/06/27/technology/eu-google-fine.html

Sherratt, T. (2016). Seams and edges: Dreams of aggregation, access & discovery in a broken world. Retrieved from http://discontents.com.au/seams-and-edges-dreams-of-aggregation-access-discovery-in-a-broken-world/

Sweeney, L. (2013). Discrimination in online ad delivery. Communications of the ACM, 56(5). pp. 44–54. Retrieved from https://cacm.acm.org/magazines/2013/5/163753-discrimination-in-online-ad-delivery/fulltext

Turkle, S. (1997). Life on the screen: identity in the age of the internet. New York: Simon & Schuster.

Vaidhyanathan, S. (2011). The Googlization of everything (and why we should worry). Berkeley: University of California Press.