Notes for Jane Zhang's Digital Curation class at Catholic University. March 5, 2014.
- How many people use social media, have a website, run a webserver?
- Why do we love the Web? Why do we hate the Web?
- The Web needs to be cared for, and it needs archivists.
- What is the archive, to archive, an archive?
- Archivy: select, appraise, arrange, describe, preserve, make available
- My background
- NDF Talk: Web as a Preservation Medium
- The Internet Archive and Library of Congress have got this covered, right?
- Supreme Court Opinions Clicks That Lead Nowhere
- UK Conservative Party deletes links
- Not a solved problem.
- IA: 366 billion
- IIPC: 75 billion
- Google: 1T URLs
- generous guesstimate: 44%
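
The 44% figure appears to come from adding the two archive counts above and dividing by Google's URL count. A quick check of that arithmetic, assuming (generously) that the two archive counts do not overlap and are directly comparable to Google's number:

```python
# Counts copied from the notes above (2014 figures).
ia_count = 366_000_000_000        # Internet Archive
iipc_count = 75_000_000_000       # IIPC
google_urls = 1_000_000_000_000   # URLs known to Google

# Assumption: no overlap between the two archive counts, and both are
# comparable to Google's URL count. Real coverage is likely lower.
archived_share = (ia_count + iipc_count) / google_urls
print(f"{archived_share:.0%}")
```

Any overlap between the archives, or growth in the size of the Web, pushes the real share below this number, which is why the guess is a generous one.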
Even if archivists in a particular country were to preserve every record generated throughout the land, they would still have only a sliver of a window into that country’s experience. But of course in practice, this record universum is substantially reduced through deliberate and inadvertent destruction by records creators and managers, leaving a sliver of a sliver from which archivists select what they will preserve. And they do not preserve much.
The archival record is best understood as a sliver of a sliver of a sliver of a window into process. It is a fragile thing, an enchanted thing, defined not by its connection to “reality”, but by its open-ended layerings of construction and reconstruction.
-- Verne Harris - The Archival Sliver
- team of 6 + Internet Archive
- selection
- notification
- seed lists, scoping
- quality control
- embargo period
- access!
- Internet Archive
- IIPC
- perma.cc
- Archive-It
- Hanzo
- ArchiveTeam
- ArchiveSocial
- CommonCrawl
- Social Feed Manager
- 1/10 Americans think HTML is an STD
- 25 years old: HTTP, HTML, URL
- robots.txt
- OpenWayback
- Heritrix
- wget
- pywb
- WARC: ISO 28500:2009
- WARC capture with wget: wget -H -Dwww.cua.edu -r -l 2 --warc-file=cua --convert-links="on" http://lis.cua.edu/courses/index.cfm
- Memento RFC 7089
- Demo Facebook and Twitter "archive" packages.
- Packaging on the Web
- ResourceSync
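
Memento (RFC 7089) lets a client ask for a past version of a page by sending an Accept-Datetime header to a TimeGate. A minimal sketch of building such a request with the Python standard library. The TimeGate prefix and the helper name memento_request are assumptions for illustration, and nothing is sent over the network here:

```python
from datetime import datetime, timezone
from email.utils import format_datetime
from urllib.request import Request

# Assumption: the Wayback Machine answers as a Memento TimeGate at this prefix.
TIMEGATE_PREFIX = "http://web.archive.org/web/"


def memento_request(original_url, when):
    """Build (but do not send) a TimeGate request for the capture nearest `when`.

    Hypothetical helper, not part of any library.
    """
    # Accept-Datetime carries an HTTP date expressed in GMT.
    http_date = format_datetime(when.astimezone(timezone.utc), usegmt=True)
    return Request(TIMEGATE_PREFIX + original_url,
                   headers={"Accept-Datetime": http_date})


req = memento_request("http://lis.cua.edu/courses/index.cfm",
                      datetime(2014, 3, 5, tzinfo=timezone.utc))
print(req.full_url)
# urllib stores header names capitalized, hence the lowercase "d" in the lookup.
print(req.get_header("Accept-datetime"))
```

Passing req to urllib.request.urlopen would send it. A TimeGate that follows the RFC should lead to an archived copy whose response includes a Memento-Datetime header saying when the capture was made.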
- scoping (backlinks)
- streaming video / audio
- dynamic content / ghost
- funding (sustainable)
- copyright
- storage space
- format migration
- digital preservation significant characteristics?
- collection development: seedlists, inventory
- single point of failure (IA)
- Big data is great, but start with small data:
- your organization's web presence
- local blogs
- local government
- local arts scene / businesses
- Website owners:
- Permalinks/Cool URIs
- robots.txt
- sitemaps
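
To make the robots.txt and sitemaps points concrete, here is a small sketch of how a well-behaved crawler might read a site's robots.txt, using Python's standard urllib.robotparser. The robots.txt content, the example.org URLs and the bot name are all made up for illustration:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: keeps crawlers out of /search, allows everything
# else, and advertises a sitemap.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search
Sitemap: http://example.org/sitemap.xml
"""

# Hypothetical crawler name, used only for this example.
BOT = "example-archive-bot"

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

about_ok = parser.can_fetch(BOT, "http://example.org/about.html")
search_ok = parser.can_fetch(BOT, "http://example.org/search?q=x")
sitemaps = parser.site_maps()  # requires Python 3.8 or later

print(about_ok, search_ok, sitemaps)
```

The takeaway for website owners: an overly broad Disallow rule can keep archiving crawlers away from pages worth preserving, while a Sitemap line helps them find everything that should be captured.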
- Personal Digital Archiving
- outreach with your community
- best practices / guidance
- Keep an open mind.
- Have a whole class about web archiving!
- Archive-It webinars for libraries, archives, classes.
- More Podcast Less Process - Episode #7 on Web Archiving - Alex Thurman (Columbia) and Lily Pregill (New York Art Resources Consortium)
- Web Curators Mailing List