The current TODO list for PANDA is below. Much of the 1.0 list will not happen -- we've got a deadline coming fast!

(If there are features that you're dying to see, but look like they're not going to happen on our watch, you're welcome to contribute and we'll try to integrate your code!)

Allllllmost 1.0 (feature freeze)

We've put these features in rough order of priority, made rough estimates, and then totaled the estimates. There are 6 iterations after beta 2 until we hit feature freeze, so the trailing half of this list will likely not be completed before launch.

Feature | Estimate (in iterations) | Countdown (running total) | Done
"Welcome to PANDA" upon admin setup (time zone, etc) | 1 | 1 | Yep!
First login user welcome screen | 0.5 | 1.5 | Yep!
Delete data uploads and associated data | 0.5 | 2 | Yep!
User page: my notifications | 0.5 | 2.5 | Yep!
Sysadmin notifications (you're running low on disk space, yo!) | 0.1 | 2.6 | Yep!
Related files metadata/description | 0.2 | 2.8 | Yep!
Search text within just one column | 0.2 | 3 | Yep!
Links to exports on user pages | 0.1 | 3.1 | Yep!
Search for data across all datasets within a category | 1 | 4.1 | Yep!
Fuzzy matching for names | 1 | 5.1 | Yep!
Bulk create users | 0.5 | 5.6 | Yep!
Dataset metadata -- related stories | 0.2 | 5.8 | Yep!
API logging messages | 0.5 | 6.3 |
Make the login cookie last longer (#639) | 0 | 6.3 | Yep!
Translation / i18n (CHANGED!) | 2 | 8.3 |
Standard metadata (customizable? public source? updated periodically? safe for publication? verified?) (or tags?) | 1 | 9.3 |
Google Refine reconciliation endpoint | 4 | 13.3 |
Saved searches (personalization) | 0 | 13.3 |
Edit(able) column headers | 0.5 | 13.8 |
Search datasets - use a stemmer (#562) | 0.2 | 14 |
Admin-editable PANDA home page: news/links/searches | 0.1 | 14.1 |
User favoriting for datasets | 0.25 | 14.35 |
Preloaded data in PANDA | 0.2 | 14.55 |
Number import i18n | 1 | 15.55 |
Sync w/ Google Docs | 1 | 16.55 |
Universal upgrade script | 0.5 | 17.05 |
backup_volumes.py progress meter | 0.1 | 17.15 |
Dataset metadata: timeframe, provenance, etc. | 0 | 17.15 |
Non-arbitrary ordering for datasets across search results | 0 | 17.15 |
Sort by type-indexed columns (#537) | 0 | 17.15 |
Column filter type as "factor" or "enumerated" (#541) | 0 | 17.15 |
Web-configurable Time Zone (#619) | 0 | 17.15 | Yep!
Grouped/hierarchical categories | 0 | 17.15 |

1.0

  • DONE 1.0 -- The PANDA cookbook

Storming to beta 2

  • DONE 0.2.0 -- Enforce good metadata (better title/description/etc) in the upload process
  • DONE 0.2.0 -- Verify available disk space before upload/import
  • DONE 0.2.0 -- Ability to abort a long-running task from the admin UI
  • DONE Squash critical bugs
  • DONE Tidy up the dashboard a bit
  • DONE 0.2.0 -- Notifications (subscribed searches via email, etc)

The features archive (look above for the currently-maintained list)

User requests

  • user-created tags for datasets (Tom M)
  • search only a certain tag (Tom M)
  • view related files so you don't have to download them (Tom M)
  • add or edit fields without downloading the data (Tom M)
  • edit the headers after upload (Tom M)
  • metadata/descriptions for related files (Tom M and Andy B)
  • metadata for a dataset: the timeframe this set applies to (Andy B) (can't this just be in the description/title?)
  • DONE B1.4 -- list of data sets uploaded by individual users (Tom M and Andy B)

Beta-phase insights

  • UP -- revisit priority of no-friction upload vs. enforcing good metadata (possibly move title/description editing to before the upload begins, or require it before moving from preview to the indexing-in-progress state)
  • UP -- badge/metadata to indicate verified/CQed/gardener-approved datasets
  • standard metadata or checkboxes: "was this from a public source or was this created by our people?"; "is this data updated periodically?", "is this safe for publication?", etc
  • admin-editable PANDA homepage for news, links to other resources, maybe even search boxes (e.g. http://boundaries.tribapps.com)
  • PANDA scheduled email reminders to write a follow-up FOIA or otherwise obtain updates for existing datasets
  • upload-page guidance on cleaning up datasets when spreadsheets have extraneous formatting, plus tips on reordering XLS worksheets if the first tab holds instructions rather than data, etc. (Tom M also asked for this)
  • DONE B1.4 -- click on an uploader's name to go to a list of all their datasets (wrinkle: add'l data file added by someone other than original dataset creator)

Still up for grabs, priority unknown

  • DOWN -- Permissions/set-level security (like Doc Cloud or LDAP? Another suggestion: project teams, less like a hierarchy and more like a circle or ad-hoc group) (Tom M requested this too, for read-only users)
  • DOWN -- Sharing between organizations (not sharing the whole PANDA, just parts)
  • DOWN -- Edit data in PANDA, delete rows, add new columns, etc., read-only lock on a set?
  • Address normalization (solvable with fuzzy search instead?)
  • DOWN -- S, M, L sizing, or something like it
  • DOWN -- Faceted search
  • DOWN -- Fancy query builder like Doc Cloud
  • UP -- Search data within categories (#473)
  • Search datasets within a set or intersection of categories (#472)
  • DOWN -- Export PANDA data to a SQL database (#468)
  • DOWN -- RSS activity feeds for integration with CMSes and other systems (#469)
  • Duplicate detection during data import (#467)

Must-have

  • DOWN -- Import w/ arbitrary delimiters (not just commas)
  • DOWN -- Import from fixed-width files
  • DOWN -- Comments on a dataset (#116) (Tom M requested this too)
  • DOWN -- Meta type columns
    • Address (and address-like stuff)
  • DONE A1 -- Store the original file
  • DONE A1 -- Data set metadata (source, provenance)
  • DONE A1 -- Import from CSV
  • DONE A1 -- Async data import (queuing)
  • DONE A1 -- Full-text search on a dataset
  • DONE A2 -- Taxonomy for datasets (categories, tags?)
  • DONE A2 -- Search dataset metadata (help me find a dataset)
  • DONE A2 -- Login/users
  • DONE A3 -- Cumulative data sets via write API
  • DONE A3 -- Cumulative data sets via write API demo
  • DONE A3 -- Cumulative data sets via scraperwiki (??)
  • DONE A3 -- Import from Excel (maybe by telling people to use CSV, maybe by parsing)
  • DONE A4 -- Cumulative data sets via additional file uploads (maybe this is solved with versioning?)
  • DONE A4 -- Encrypted communications (SSL)
  • DONE A4 -- Export a dataset (to csv, xls? etc)
  • DONE A4 -- Browser compatibility w/ recent versions of modern browsers: FF/Chrome/Safari/IE 9 Beta
  • DONE A4 -- Documents related to the dataset
  • DONE B1 -- A plan for scaling (how to grow your PANDA)
  • DONE B1 -- Import wizard/walk through UI
  • DONE B1 -- Async data export
  • DONE B1 -- Amazon Machine Image
  • DONE B1.1 -- Primitive column types (int, varchar, date, etc.)
  • DONE B1.3 -- In-system metrics. A dashboard for the admins of the PANDA instance, so that they can measure how well it's working inside their organization. (sneaky new feature inserted by Brian as the result of an interesting conversation with some of the folks that Knight asks that I speak with)
  • DONE B1.4 -- Profile stuff (create users, change my password, etc) (#150)

Want

  • Related stories on a dataset (searchable?) (Tom M requested this too)
  • I18n/L10n
  • Initial demo data
  • Iterative updates to a dataset (quarterly updates, etc.; keep the old list)
  • Version tracking for datasets
  • Export a subset of a dataset (fewer columns from a wide set, filtered rows, etc)
  • UP -- Google Refine reconciliation endpoint
  • PANDA-hosted Google Refine
  • Import localized number formats (1.000, 1 000, 1,000)
  • IE7 support
  • UP -- Fuzzy name search (Abbreviations, Bill/William) (#476)
  • Other datasets related to this one (grouping?)
  • Row-level comments
  • Meta type columns
    • Birthdate
    • Phone number
  • UP -- Notifications (email? RSS?) for new data sets, new data in sets, etc. (changeset subscriptions?)
  • UP -- Welcome to PANDA (optional registration, set up your admin user, links to getting-started docs)
  • DONE B1 -- Document our advanced query language for end users (solr-style)
  • DONE B1.1 -- Date range search
  • DONE B1.4 -- Export search results (to csv, etc)

Gravy

  • Meta type columns
    • Location (lat/lng)
    • URL
    • SSN
    • Money
    • Organization (name, DUNS, etc)
    • User-extensible (make your own, like Illinois school codes)
    • Foreign address
  • Geographic search by shapefiles
  • Geographic search by any drawn shape
  • Geographic search by distance
  • Map the data
  • Geocode addresses
  • Canned/saved searches
  • Import from MDB/Access
  • Import from shapefile
  • Import from DBF (#466)
  • Import from Google Refine, carry the audit trail into PANDA
  • Import/export to/from Google Docs
  • Export to Google Fusion Tables
  • Column statistics (std. dev., sum, etc)
  • Sysadmin notifications (you're running out of disk! etc.)
  • Single-click deployment
  • Automatic upgrades (like WordPress)
  • Search by taxonomy
  • De-normalize data / dataset merge (connect a table to its lookup table on import)
  • Fixtures to import (from the IRE data library, etc)
  • P13n, store queries that I like to run, etc
  • DONE B1.1 -- Number range search

Meh

  • Encrypt all the data
  • Entity relationships (John Smith in dataset A = John Smith in dataset B, for neat stuff like social network analysis)
  • RDF, linked data endpoint
  • Deploy as a hosted service (somebody else can do that once we've written the regular version)
  • Automated server/resource scaling
  • Join datasets at runtime (reinvent SQL)
  • Non-tabular stuff (PDFs, emails, Doc Cloud and Overview Project)