
%%%%%%%%% DATA MANAGEMENT PLAN -- 2 pages
\required{Data Management Plan}
% Include this supplementary document for your plans for data management
% and sharing of the products of research.
% Describe how this proposal will conform to NSF policy on the 
% dissemination and sharing of research results.
% This may include
% 1. the types of data, samples, physical collections, software, 
% curriculum materials, and other materials to be produced in the course of the project;
% 2. the standards to be used for data and metadata format 
% and content (where existing standards are absent or deemed inadequate, 
% this should be documented along with any proposed solutions or remedies);
% 3. policies for access and sharing including provisions for appropriate 
% protection of privacy, confidentiality, security, intellectual property, 
% or other rights or requirements;
% 4. policies and provisions for re-use, re-distribution, 
% and the production of derivatives; and
% 5. plans for archiving data, samples, and other research products, 
% and for preservation of access to them.
% A valid Data Management Plan may include only the statement 
% that no detailed plan is needed, as long as the statement is 
% accompanied by a clear justification.

Multiple filtered streams of Twitter data relevant to the diseases studied will be acquired using Twitter's public streaming API. These data include tweets related to pre-exposure prophylaxis, Truvada, HIV, and AIDS, as well as other infectious diseases and commonly used prescription medications. In addition to Twitter data, we will acquire other social media data, such as data from Reddit or Facebook, either from an open data repository or, in the case of Reddit, through its public API. Direct medical data will be acquired from Oak Ridge National Laboratory and other sources and will be used in accordance with institutional policies and national laws and regulations (including HIPAA) governing appropriate use of medical data.

Data will be analyzed on researchers' local desktop machines and on research server clusters, including the Quinn research group cluster, the Georgia Advanced Computing Resource Center, and Oak Ridge's institutional computing resources. Code developed for the analyses will be made openly available on GitHub under open source licenses.

Most of the primary data acquired through Twitter's API, as well as the direct medical data, cannot be shared directly because of Twitter's terms of service and federal regulations such as HIPAA. However, anonymized summary information can and will be disseminated through research article publications, and possibly also through blog articles and other non-technical media.