Syllabus: Critical Perspectives in Cultural Data Analysis

University of Texas at Austin School of Information

Fall 2017, Mondays 3–6 p.m.

Instructor: Tanya Clement

TA: Steve McLaughlin

Office hours: Mondays 1–3 p.m., UTA 5.558

Course Schedule

Week 1 | Week 2 | Week 3 | Week 4 | Week 5 | Week 6 | Week 7 | Week 8 | Week 9 | Week 10 | Week 11 | Week 12 | Week 13 | Week 14

Course Objectives

Prerequisites: advanced-level undergraduate or graduate coursework in the humanities; little or no programming experience preferred.

In the data, information, knowledge, wisdom (DIKW) hierarchy that circulates through Knowledge Management (KM) and Information Science (IS) discussions, data appears at the base of a pyramid of which wisdom is the pinnacle. In this schematic, data is “raw” and lacking in meaning, while information, the next higher level of the pyramid—just below knowledge and then wisdom—represents the presence of added links and relationships; information is higher up on the wisdom chain because it is data made meaningful. In the humanities, students are taught that data is not found in the “raw” but has rather been cooked all along, taken and constructed and seasoned according to our situated contexts including access issues (Where is the data?); media, format, and technology constraints (How is the data?); and perspectives (What is the data? Who is involved in and impacted by its creation and use?).

Learning to think critically about data as information means rejecting common illusions about data more generally, including its objectivity, impersonality, atemporality, and authorlessness. To teach students to think about information from this more critical perspective means first understanding how a culture tends to understand what is informative.

Towards these ends, this course takes on “data wrangling” in the context of humanist perspectives.

Learning goals:

  • Exploration of cultural implications of large-scale preservation of cultural materials.

  • Writing using perspectives in critical data studies.

  • Familiarity with scripting-style programming in Python and Unix-like systems, emphasizing literacy in finding and using free and open source software; with techniques for collecting, transforming, and analyzing media and metadata available on the Web; with commonly used data models and their standard formats, including CSV, JSON, and XML; with text analysis techniques such as natural language processing (NLP), sentiment analysis, and machine learning classification; and with tools for analyzing cultural data via visualization and statistical tests, emphasizing critical reflection on the limitations of these approaches.
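As a rough illustration of the data formats named in the learning goals, the sketch below reads one metadata record written as CSV, JSON, and XML using only Python's standard library. The record itself (a title and a year) and the helper names `from_csv`, `from_json`, and `from_xml` are invented for this example and are not part of any course material.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET

# One hypothetical metadata record, written in three formats.
CSV_TEXT = "title,year\nExample Poem,1923\n"
JSON_TEXT = '{"title": "Example Poem", "year": "1923"}'
XML_TEXT = "<record><title>Example Poem</title><year>1923</year></record>"


def from_csv(text):
    """Read the first data row of a CSV string as a dict keyed by column name."""
    reader = csv.DictReader(io.StringIO(text))
    return dict(next(reader))


def from_json(text):
    """Parse a JSON object string into a dict."""
    return json.loads(text)


def from_xml(text):
    """Turn each child element of the root into a key/value pair."""
    root = ET.fromstring(text)
    return {child.tag: child.text for child in root}


records = [from_csv(CSV_TEXT), from_json(JSON_TEXT), from_xml(XML_TEXT)]
# All three formats carry the same information once parsed.
print(records[0] == records[1] == records[2])
```

Note that each format makes different choices about structure: CSV is flat and tabular, while JSON and XML can nest records, which is one reason the choice of data model is itself an interpretive decision.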

Course Principles

  • Writing critically about data requires knowledge of data and data wrangling as well as knowledge of thinking and writing from the critical perspectives learned in cultural studies. While this course does not teach cultural studies, an understanding of and experience in humanities theory and research and the principles of cultural studies are essential.

  • Imitating and modifying others’ code is essential in learning to program. You can find many examples and explanations on Stack Exchange and similar online forums. Taking one or two lines without attribution is OK; if you use a longer chunk of code found online, add a #comment with the source’s URL.

  • Begin assignments early. If you realize what you had in mind is more difficult than expected, talk to the instructor about choosing an alternative.

  • We’ll be focusing on a scripting approach to programming. This course is not oriented toward developing large, complex programs or writing perfectly optimized code.

  • Learning to code takes trial and error. Work through weekly programming tutorials before class and continue polishing in-class coding assignments at home.
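The attribution principle above might look something like the following in practice. This is only a sketch: the function `chunk` is a made-up example of borrowed code, and the URL in the comment is a placeholder, not a real source.

```python
# Adapted from an online forum answer. The URL below is a placeholder
# standing in for the real source you would cite:
# https://example.com/questions/12345
def chunk(items, size):
    """Split a list into consecutive pieces of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


print(chunk(["a", "b", "c", "d", "e"], 2))
```

Placing the comment directly above the borrowed code makes it clear which lines came from elsewhere and where a reader can find the original explanation.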

Assignments

Final Project: Critical Data Analysis (50%)

For your final project, you will use a dataset drawn from online sources and analyze those data in a critical essay. You may either present an argument about the data (e.g., describing bias in the way the data were chosen and arranged) or you may use your dataset as the basis for an argument about culture (e.g., tracing a stylistic shift in a literary community). You should conceive and execute your project with a specific audience in mind, such as literary scholars, newspaper readers, or policy advocates.

Your dataset should comprise at least 200 texts or other media files, or at least 2000 metadata records. The size of your collection should be appropriate to your technical skills and the complexity of each record. Rather than using an entire pre-existing dataset, you may choose to extend or limit the dataset in some way. This might mean curating material from multiple sources, mashing up two or more datasets, augmenting records using machine learning or natural language processing, or using a creative technique to organize messy data.

Your final project will include the following elements:

  • Proposal (7%)

  • Proposal Peer Review (3%)

  • In-class presentation (week 14) (10%)

  • 12-page critical essay, with an appendix of 3–4 data visualizations (30%)

Weekly Assignments (WA) (50%)

Except when indicated, there will be required readings each week. The required readings will either be available online and linked below or posted on Canvas, so there are no books to buy or papers to acquire for the class.

Assignments should be posted on Canvas by midnight the day before class.


Week 1 (9/11): Introductions & Command Line Basics

Readings

Canvas

  • Nick Montfort (2016) “Why Program?” In Exploratory Programming for the Arts and Humanities, 267–77. Cambridge, MA: The MIT Press.

  • danah boyd & Kate Crawford (2012) "Critical Questions for Big Data," Information, Communication & Society, 15:5, 662-679.

To start for next week:

▸ In-class outline

Week 2 (9/18): The Operating System in Context

Readings

Readings in Canvas

Optional

Read pages 1–28 of Shieber’s Python tutorial and work through the code examples.

Work through Chris Albon’s tutorial on Python string operations.

  • Albon, Chris. “String Operations.” http://chrisalbon.com/python/string_operations.html

  • Neff, Gina, Anissa Tanweer, Brittany Fiore-Gartland, and Laura Osburn. “Critique and Contribute: A Practice-Based Framework for Improving Critical Data Studies and Data Science.” Big Data 5, no. 2 (2017).

Assignment

WA #1

▸ In-class outline

Week 3 (9/25): Collections as Data: Meaning making

Readings

Canvas

Optional

Assignment

WA #2

▸ In-class outline

Week 4 (10/2): Collections as Data: Data Models

Readings

Canvas

Optional Readings

Assignment

WA #3

▸ In-class outline

Week 5 (10/9): An Algorithmic Criticism: Word-Level Text Analysis

<! -- Note: assign Text II this week -- have them turn in their Jupyter notebook. -->

Readings

Canvas

  • Burrows, John. “Textual Analysis.” In Companion to Digital Humanities, edited by Susan Schreibman, Ray Siemens, and John Unsworth. Link.

  • Ramsay, Stephen. “Chapter 1: An Algorithmic Criticism.” In Reading Machines: Toward an Algorithmic Criticism, 1–17. Topics in the Digital Humanities. Urbana: University of Illinois Press, 2011.

  • Montfort, Nick. “Text III.” In Exploratory Programming for the Arts and Humanities, 185–213. Cambridge, MA: The MIT Press, 2016.

  • Fellbaum, Christiane. “Wordnet(s).” In The Encyclopedia of Language & Linguistics, edited by E. K. Brown and Anne Anderson, 2nd ed., 14:665–79. Amsterdam ; Boston: Elsevier, 2005.

  • “Alphabetical list of part-of-speech tags used in the Penn Treebank Project.” https://www.ling.upenn.edu/courses/Fall_2003/ling001/penn_treebank_pos.html

Assignment

WA #4

▸ In-class outline

Week 6 (10/16): The Rise of Free Culture: Web Scraping & APIs

Readings

Canvas

  • Pomerantz, Jeffrey. “The Future of Metadata.” In Metadata. The MIT Press Essential Knowledge Series. Cambridge, MA ; London, England: The MIT Press, 2015.

  • Peters, Justin. The Idealist: Aaron Swartz and the Rise of Free Culture on the Internet, Chapters 7 and 8. New York: Scribner, 2016.

  • Swartz, Aaron. “Building a Platform: Providing APIs.” In Aaron Swartz’s ‘A Programmable Web’: An Unfinished Work, 31–39. San Rafael, CA: Morgan & Claypool Publishers, 2013.

  • Kelly, Chelsea Emelie. “Beyond Digital: Open Collections & Cultural Institutions,” 2014. https://artmuseumteaching.com/2014/11/06/beyond-digital-open-collections-cultural-institutions

Optional Readings

Assignment

WA #5

▸ In-class outline

Week 7 (10/23): The Politics of Open Data

Readings

Canvas

  • Christen, Kim. “Does Information Really Want to be Free? Indigenous Knowledge Systems and the Question of Openness.” International Journal of Communication 6 (2012), 2870–2893.

  • Greenwald, Glenn. “Chapter 1: Contact.” In No Place to Hide: Edward Snowden, the NSA, and the U.S. Surveillance State, 2015.

  • Hitchcock, Tim. “Digital Searching and the Re-formulation of Historical Knowledge.” In The Virtual Representation of the Past, edited by Mark Greengrass and Lorna Hughes, 81–90. Ashgate, 2008.

  • Freelon, Deen Goodwin, Charlton D. McIlwain, and Meredith D. Clark. “Beyond the Hashtags: #Ferguson, #Blacklivesmatter, and the Online Struggle for Offline Justice,” 2016. http://cmsimpact.org/wp-content/uploads/2016/03/beyond_the_hashtags_2016.pdf

  • American Civil Liberties Union. "First Amendment Lawsuit Brought on Behalf of Academic Researchers and Journalists Who Fear Prosecution Under the Computer Fraud and Abuse Act." https://www.aclu.org/news/aclu-challenges-law-preventing-studies-big-data-discrimination

Optional Readings

  • Day, Ronald E. “Governing Expression: Social Big Data and Neoliberalism.” In Indexing It All: The Subject in the Age of Documentation, Information, and Data, 123–44. History and Foundations of Information Science. Cambridge, Massachusetts: The MIT Press, 2014.

Assignment

WA #6

▸ In-class outline

Week 8 (10/30): Statistics and Visualization

Readings

Canvas

  • Montfort, Nick. “Statistics and Visualization.” In Exploratory Programming for the Arts and Humanities, 215–40. Cambridge, MA: The MIT Press, 2016.

  • Krumme, Coco. “What Data Doesn’t Do.” In Beautiful Data: The Stories behind Elegant Data Solutions, edited by Toby Segaran and Jeff Hammerbacher, 1st ed. Beijing ; Sebastopol, CA: O’Reilly, 2009.

  • McCandless, David. Information is Beautiful. http://www.informationisbeautiful.net

Optional Readings

Assignment

▸ In-class outline

Week 9 (11/6): Your Data, Your Culture

No Readings

Assignment

Due: Proposal

▸ In-class outline

Week 10 (11/13): Machine Learning

Readings

Canvas

  • Berendt, Bettina, and Soren Preibusch. “Toward Accountable Discrimination-Aware Data Mining: The Importance of Keeping the Human in the Loop—and Under the Looking Glass.” Big Data 5, no. 2 (2017).

  • Brew, Chris. “Language Processing: Statistical Methods.” In Encyclopedia of Language & Linguistics, edited by Keith Brown, 2nd ed., 12:597–604. Elsevier, 2006.

  • Julia Angwin, Jeff Larson, Surya Mattu and Lauren Kirchner, ProPublica. “Machine Bias.” ProPublica. May 23, 2016. https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing

  • Revisit: Montfort, Nick. “Text III.” In Exploratory Programming for the Arts and Humanities, 185–213. Cambridge, MA: The MIT Press, 2016.

  • Geitgey, Adam. “Machine Learning is Fun!” Medium. https://medium.com/@ageitgey/machine-learning-is-fun-80ea3ec3c471

Optional Readings

Assignment

WA #7

▸ In-class outline

Week 11 (11/20): Critical Text Analysis

Readings

Canvas

  • Hall, Gary. “Toward a Postdigital Humanities: Cultural Analytics and the Computational Turn to Data-Driven Scholarship.” American Literature 85, no. 4 (January 1, 2013): 781–809.

  • Hammond, Adam. "The double bind of validation: distant reading and the digital humanities' 'trough of disillusionment.'" Literature Compass 14, no. 8 (August 1, 2017): no. pg.

  • Jockers, Matthew Lee. “Chapter 8: Theme.” In Macroanalysis: Digital Methods and Literary History, 118–53. Topics in the Digital Humanities. Urbana: University of Illinois Press, 2013.

  • Marche, Stephen. “Literature Is not Data: Against Digital Humanities.” Los Angeles Review of Books, October 28th, 2012. https://lareviewofbooks.org/essay/literature-is-not-data-against-digital-humanities

Optional Reading

  • Ramsay, Stephen. “Chapter 3: Potential Readings.” In Reading Machines: Toward an Algorithmic Criticism, 33–57. Topics in the Digital Humanities. Urbana: University of Illinois Press, 2011.

Assignment

WA #8

▸ In-class outline

Week 12 (11/27): Peer Production & Crowdsourcing

Readings

Canvas

  • Benkler, Yochai, and Helen Nissenbaum. “Commons-Based Peer Production and Virtue.” Journal of Political Philosophy 14, no. 4 (2006): 394–419. https://www.nyu.edu/projects/nissenbaum/papers/jopp_235.pdf.

  • Bodó, Balázs. “Set the Fox to Watch the Geese: Voluntary IP Regimes in Piratical File-sharing Communities.” In Piracy: Leakages from Modernity, edited by James Arvanitakis and Martin Fredriksson, 241–63. Sacramento, CA: Litwin Books, 2014.

  • Kreiss, D., M. Finn, and F. Turner. “The Limits of Peer Production: Some Reminders from Max Weber for the Network Society.” New Media & Society 13, no. 2 (March 1, 2011): 243–59.

  • Manzo, Christina, Geoff Kaufman, Sukdith Punjasthitkul, and Mary Flanagan. “‘By the People, For the People’: Assessing the Value of Crowdsourced, User-Generated Metadata.” Digital Humanities Quarterly 9, no. 1 (2015). http://www.digitalhumanities.org/dhq/vol/9/1/000204/000204.html

Optional Readings

  • Benkler, Yochai. “Peer Production and Sharing.” In The Wealth of Networks: How Social Production Transforms Markets and Freedom, 59–90. New Haven [Conn.]: Yale University Press, 2006.

Assignment

WA #9

▸ In-class outline

Week 13 (12/4): Copyright and the Information Commons

Readings

Canvas

Optional Readings

Assignment

WA #10

▸ In-class outline

Week 14 (12/11): Final Presentations

Final Presentation due

12/18: Final Project due


Additional resources:

  • Installation Tutorials

  • Janssens, Jeroen. Seven Command Line Tools for Data Science (2013).

  • workbench

  • Juola, P. and Ramsay, S. Six Septembers: Mathematics for the Humanist. Zea E-Books.

  • Seaver, Nick. “Algorithms as culture: Some tactics for the ethnography of algorithmic systems.” Big Data and Society. 9 Nov. 2017.
