@edgi-govdata-archiving

Tools for Government Data Archiving

This project seeks to develop tools for preserving environmental data that is endangered by the incoming US anti-science administration.

Pinned repositories

  1. overview

    The key place for both new members and the tracking of progress by veteran members.

    64 5

  2. archivers.space

    Public repo for issues & feature-requests relating to archivers.space

    1

  3. harvesting-tools

    A collection of code snippets designed to be dropped into the data harvesting process directly after generating the zip starter kit

    Python 11 19

  4. eot-nomination-tool

    📚 Chrome extension to nominate government data that needs to be preserved

    HTML 18 11

  5. web-monitoring

    Project to enable analysts to quickly assess changes to monitored government websites

    HTML 5 6

  6. DataRescueTEMPLATE

    Start here for the [MMMM DD, YYYY] event

    7 6

  • A Ruby script that scrapes Versionista's web interface to generate a CSV summarizing which websites and pages have had recent changes.

    Ruby 1 Updated Feb 25, 2017
  • A collection of code snippets designed to be dropped into the data harvesting process directly after generating the zip starter kit

    Python 11 18 Updated Feb 24, 2017
  • Technical guides for how to preserve and hold data

    1 1 Updated Feb 24, 2017
  • A more automated version of page monitoring with Versionista (just a proof of concept for now)

    Ruby Updated Feb 24, 2017
  • Public repo for issues & feature-requests relating to archivers.space

    1 Updated Feb 22, 2017
  • The key place for both new members and the tracking of progress by veteran members.

    64 5 Updated Feb 22, 2017
  • The UI for viewing the sites archive and diffs

    JavaScript 1 2 Updated Feb 21, 2017
  • Start here for the [MMMM DD, YYYY] event

    7 5 Updated Feb 20, 2017
  • 📚 Chrome extension to nominate government data that needs to be preserved

    HTML 18 11 Updated Feb 18, 2017
  • 1 Updated Feb 17, 2017
  • Project to enable analysts to quickly assess changes to monitored government websites

    HTML 5 6 Updated Feb 15, 2017
  • 1 Updated Feb 14, 2017
  • Automate zip folder creation for Harvesting during pipeline

    Go 1 Updated Feb 13, 2017
  • ๐Ÿ” Diffing service for the website monitoring project

    HTML 2 Updated Feb 11, 2017
  • Upload Datasets to S3 from the browser

    Go 1 Updated Feb 10, 2017
  • DataRefuge workflow for DataRescue events

    9 27 Updated Feb 7, 2017
  • for versioning stuff

    JavaScript 2 1 Updated Feb 7, 2017
  • Docker app to crawl URLs and generate WARCs

    Python 6 4 Updated Feb 6, 2017
  • ARCHIVED--Scraper to archive all GIS data ZIP files on EPA's Geoportal, http://gis.epa.ie/GetData/Download

    JavaScript 4 1 Updated Feb 2, 2017
  • ARCHIVED--Organizing documents for tech group at our Guerilla Archiving Dec. 17, 2016 archiv-a-thon

    6 15 Updated Jan 24, 2017
  • ARCHIVED--Jupyter Notebooks investigating EPA quantitative dataset downloading

    Jupyter Notebook 2 6 Updated Jan 24, 2017
  • Tools and services to create XML, CSV, and JSON sitemaps of websites

    Python 3 3 Updated Jan 13, 2017
  • ARCHIVED--Tool to generate sitemap of EPA, see https://github.com/edgi-govdata-archiving/sitemapper for current development

    Python 3 Updated Jan 12, 2017
  • ARCHIVED--Identify, scrape, and archive as WARC the Environmental Impact Statements from the EPA, see also https://github.com/edgi-govdata-archiving/eis-WARC-archiver

    Python 2 1 Updated Jan 9, 2017
  • Save the data

    PHP 1 6 Updated Dec 23, 2016
  • Mini website crawler to make sitemap from a website.

    Python 1 23 Updated Dec 22, 2016
  • ARCHIVED--EPA site search scraper

    Go 1 3 Updated Dec 21, 2016
  • ARCHIVED--Scraper for the EPA Enforcement & Compliance History archives at http://echo.epa.gov for the End of Term Web Archive

    Ruby 1 1 Updated Dec 19, 2016
  • WARC writing MITM HTTP/S proxy

    Python 2 19 Updated Nov 21, 2016
  • The archivist's web crawler: WARC output, dashboard for all crawls, dynamic ignore patterns

    Python 2 20 Updated Sep 13, 2016