
Greene Laboratory Onboarding Information

Mission Statement

We view our core purpose as the development of methodological advances and integrative systems that make analysis of big data, particularly gene expression data, as routine in wet-bench biology labs as PCR. To accomplish this, we will write good code, perform solid and reproducible analyses, and disseminate our results widely through approachable publications and webservers. We recognize that trust, both in the process and in our results, is of primary importance to the biologists that use our methods and webservers. Therefore, we strive to make our source code as open and accessible as possible. When we submit papers, we expect that the analytical code behind those papers will be something that we can be proud of. To these ends, we will provide reviewers and the scientific community with all source code required to generate figures in the paper that result from computational analyses.

Expectations

Your role: We expect that you will take primary responsibility for the success of your research project and career development. As a member of the lab, you are expected to participate fully in the team. When disagreements about methodological approaches arise, you recognize that these should be resolved through a solid and reproducible analysis of available data. In general, lab members are expected to be present from 9:30AM to 4:00PM on weekdays to facilitate discussion within the group. If you aren’t sure — ask.

Casey's role: Casey’s goal is to facilitate your success as well as that of your project. Within your project, Casey will serve as a sounding board for ideas, will help you plan your project, and will help to devise experiments to test your hypotheses. To facilitate your success, Casey will help you to plan your training, to devise a career plan that can take you to where you want to go, to advise you on your project-risk portfolio, and to provide guidance on other elements of career and project development as needed.

Deadlines: Our lab has worked hard to develop a reputation for high-quality science that is well presented. We all benefit from this reputation, but we must also work to maintain it. To maintain the quality of our lab's output, we've established deadlines for various outputs. Each deadline refers to sharing, in the #general channel of the Greene Lab Slack, a complete version that the author deems ready for submission.

The specific deadlines for various types of outputs are:

  • Manuscripts must be shared two weeks before any deadlines.
  • Posters must be shared one week before the deadline for printing.
  • Scientific talks based on a submitted abstract must be shared one week before the presentation.
  • Meeting abstracts must be shared one week before the deadline for submission.

Lab members are given a two-day period to provide feedback on the document. We expect that authors will then revise the document to incorporate feedback provided within the initial two-day period. Authors are encouraged but not required to address feedback received after the initial two days, as it may not always be practicable.

This does not eliminate the need for all coauthors to approve a document. Coauthors are not required to provide their feedback within the two-day window. Coauthors can hold submission of a document until they approve; however, according to ICMJE guidelines, extreme holds may result in a change from authorship to acknowledgement.

In the case that all feedback received within the two-day period has been addressed and all coauthors approve, the submission can proceed.

Failure to abide by these guidelines will result in missing the opportunity in question.
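The lead times and feedback window above amount to simple date arithmetic. The following is a minimal sketch in Python, assuming calendar days rather than business days (the text does not say which). The names `LEAD_TIME`, `FEEDBACK_WINDOW`, `share_by`, and `feedback_closes` are hypothetical and are not part of any lab tooling.

```python
from datetime import date, timedelta

# Lead times copied from the list above; the keys are illustrative labels.
LEAD_TIME = {
    "manuscript": timedelta(weeks=2),  # before any deadline
    "poster": timedelta(weeks=1),      # before the printing deadline
    "talk": timedelta(weeks=1),        # before the presentation
    "abstract": timedelta(weeks=1),    # before the submission deadline
}

# Assumption: the two-day feedback period is counted in calendar days.
FEEDBACK_WINDOW = timedelta(days=2)


def share_by(kind: str, deadline: date) -> date:
    """Latest date on which a complete version should be shared."""
    return deadline - LEAD_TIME[kind]


def feedback_closes(shared_on: date) -> date:
    """Date on which the initial feedback period ends."""
    return shared_on + FEEDBACK_WINDOW
```

For example, `share_by("manuscript", date(2024, 3, 15))` gives the latest sharing date for a manuscript due on 15 March 2024, and `feedback_closes` applied to that date gives the end of the initial feedback period.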

Code of Conduct: All members of the lab, along with visitors, are expected to agree with this code of conduct. We will enforce this code. We expect cooperation from all members to help ensure a safe environment for everybody. The lab is dedicated to providing a harassment-free experience for everyone, regardless of gender, gender identity and expression, age, sexual orientation, disability, physical appearance, body size, race, or religion (or lack thereof). We do not tolerate harassment of lab members in any form. Sexual language and imagery are generally not appropriate for any lab venue, including lab meetings, presentations, or discussions. However, do note that we work on biological matters, so work-related discussions of e.g. animal reproduction are appropriate. Harassment includes offensive verbal comments related to gender, gender identity and expression, age, sexual orientation, disability, physical appearance, body size, race, or religion; sexual images in public spaces; deliberate intimidation; stalking; following; harassing photography or recording; sustained disruption of talks or other events; inappropriate physical contact; and unwelcome sexual attention. Members asked to stop any harassing behavior are expected to comply immediately.

If you are being harassed, notice that someone else is being harassed, or have any other concerns, please contact Casey Greene immediately. If Casey is the cause of your concern, Dr. Deborah Hogan (Deborah.A.Hogan@dartmouth.edu) is a good informal point of contact; she does not work for Casey and has agreed to mediate. For official concerns, please see the University of Pennsylvania ombuds office. The code of conduct section is licensed under a Creative Commons Attribution 3.0 Unported License. Original source and credit: http://2012.jsconf.us/#/about & The Ada Initiative. Please help by translating or improving: http://github.com/leftlogic/confcodeofconduct.com.

We expect members to follow these guidelines at any lab-related event.

Authorship: Our lab follows the Perelman School of Medicine Authorship Policy. These guidelines are derived from ICMJE's Uniform Requirements for Manuscripts Submitted to Biomedical Journals.

Ethics: We expect lab members to be honest in scientific communications both within and outside the lab. We expect that lab members will design experiments in a manner that minimizes both bias and self-deception. We expect that lab members will keep agreements, be careful, and share their code and results openly with the scientific community. We expect that credit will be given where credit is due, including in scientific writing. Plagiarism is not tolerated. While a full enumeration of ethical considerations is outside the scope of this document, Penn provides a handbook that we recommend. In addition, please don't hesitate to raise any questions or concerns that you have at any point with Casey.

Communication

General

Slack: We use Slack for rapid communication within the lab. If you would otherwise send an e-mail to someone within the lab, send a Slack message instead. This keeps communications in one place. Casey commits to responding to Slack messages, though not necessarily immediately; the same guarantee is not made for e-mail.

HeyTaco: We recognize that people regularly go above and beyond lab expectations. We wanted a way to recognize each other when this happens. We now use HeyTaco. This allows lab members to send a quick virtual thank you note and/or pat on the back. If someone’s paper gets accepted or someone helps you out with a programming question, congratulate or thank them. Post a message that mentions any user in the #wins Slack channel, and they'll get a HeyTaco point. When one member accumulates enough points, they take the lab out to lunch (Casey pays).

Social Media: Lab members are encouraged to communicate through public social media, and if you choose to do so then you are expected to follow our code of conduct.

Projects: By the nature of our research, lab members will often have the opportunity to participate in projects managed via private or publicly accessible source code repositories. In these cases, lab members are expected to: follow the code of conduct; work on the assumption that private repositories will become world accessible; and communicate via the project-specific medium (e.g. if Rene reported an issue on a project on GitHub, it would not be appropriate for Casey to reply "I'll drop by your desk and show you how to solve that.").

IP/Openness: This is handled in accordance with the instructions from our research sponsors and university guidance. Lab members must follow the Penn Participation Agreement and the agreements with our sponsors. These often allow, encourage, or require openness. If you have concerns at any point, set up a meeting with Casey to discuss these concerns.

Calendars: There are two Google Calendars for the lab: Greene Lab Core Events (webview, Calendar ID h1eia9g7qu1udm079vsav7qlq0) and Greene Lab Attendance (webview, Calendar ID dk2vdln8ci4mh1m723df6rcb3s). The Attendance calendar is for noting individual availability (i.e. whether you'll be out of office). It should be used, for example, to note vacations, conference travel, and other workday conflicts. All other events should go in the Core Events calendar. In general, the Core Events calendar is for events that could involve 3 or more lab members. Mandatory events such as lab meetings, scrums, and group deadlines go on Core Events.

Accounts: Lab members are expected to have accounts for the following and be members of the specified (organizations) if applicable:

  • GitHub (greenelab)
  • Google Calendar (Shared Calendar)
  • Slack (GreeneLab)
  • Dropbox (permanent members)

Meetings

Scrum: Our team's scrum process involves three components:

1. An opening scrum meeting at 10:00 AM Monday morning where individuals will lay out their goals for the week, in the form of post-it notes, on the whiteboard in the Greene Lab.
2. A closing scrum meeting at 1:40 PM Friday afternoon where team members will reflect on the progress they made on the goals they stated the previous Monday.
3. A daily virtual scrum update.

The daily virtual scrum update should include an update to the scrum repository. An issue is automatically created for each day the office is open. These issues can be found here. The update should include the following:

1. What specific item(s) you accomplished yesterday.
2. What specific item(s) you plan to accomplish today.
3. Who, if anyone, is blocking you.
4. Who, if anyone, you are blocking.
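The four items above can be assembled into a consistent comment for the daily issue. This is a minimal sketch in Python; the function `format_scrum_update`, its section titles, and the "None" placeholder are hypothetical choices for illustration, not a format the lab prescribes.

```python
def format_scrum_update(accomplished, planned, blocked_by=(), blocking=()):
    """Return the four-part daily update as plain text.

    Each argument is a sequence of short strings. Empty sections are
    written out explicitly as "None" so that no item is silently skipped.
    """
    def bullets(items):
        if not items:
            return "  - None"
        return "\n".join(f"  - {item}" for item in items)

    sections = [
        ("Accomplished yesterday", accomplished),
        ("Planned for today", planned),
        ("Blocked by", blocked_by),
        ("Blocking", blocking),
    ]
    return "\n".join(f"{title}:\n{bullets(items)}" for title, items in sections)
```

The returned text can be pasted into the issue for the day; keeping all four headings visible makes it easy to scan a thread for blockers.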

Lab Meeting: Lab meeting is held weekly at a location at Penn and also via the Google Hangouts link used for scrum. Scheduling is managed via a Google spreadsheet; see the link in the pinned items of the #general Slack channel. Lab meeting consists of three components described below.

  • Journal Club
  • Braintrust
  • Applied Imagination

Journal Club: We have a 15-minute journal club to start each lab meeting. For journal club, prepare a presentation of 4 papers. All but one should have been published since your last journal club presentation. The content you discuss, specifically your summary of the papers, should be the product of thoughtful analysis. The presentation itself should be simple. During the discussion, please share why you picked each paper, its implications for your research, and any potential implications for other research that is ongoing in the lab. For each paper, the presentation should consist of:

  1. A title slide
  2. An overview slide (usually a flow-chart of some sort from the paper, could also be an initial result that sets context).
  3. The results figure that convinced you to pick this paper.

Braintrust: This is an opportunity to share anything that you wish to talk about with the group. This could be a confounding result, an interesting result, an analysis that isn’t working, a demo of a cool new technology etc. This is your chance to have the group focus on and help you solve a challenge that you’re facing or to share something interesting that you've discovered with the group. Scheduling is voluntary, but each member of the lab is expected to share at least once every three months.

Applied Imagination: One hour per month, lab meetings will be dedicated to big ideas, brainstorming, extended discussion outside the scope of weekly lab meeting, and other team endeavors. Topics can be big questions like “How do we get rid of dark pools of gene expression data?” or the time can be used to discuss new methods and how they fit in with the lab mission (e.g., adversarial networks). Individual lab members are expected to do some brief preparation before the meeting (e.g., read provided papers/materials, come with a few ideas on the topic). The monthly meeting itself consists of group brainstorming and/or discussion and wraps up with a list of action items for follow up.

Tech Team Meeting: Members of the tech team are expected to attend a tech team meeting each week rather than the Lab Meeting, although they are welcome to attend both. The Tech Team Meeting will include an update from Deepa regarding information gleaned from users or prospective users. Kurt will give a review of project statuses and progress. The final half hour will be a presentation from varying members relating to cool new tech, designs of new projects, or anything else that is relevant to the tech team.

Individual Meetings: We schedule weekly individual meetings. Once you join the lab, contact Casey (or Kurt if you're on the tech team) to set up a time. These are set up for a term to accommodate class schedules. We don’t reschedule these meetings by default if one of the parties (Casey, Kurt, or you) is out of town, so if you want to meet in a week when travel conflicts with the usual time, contact Casey or Kurt to reschedule. The goal of the weekly meeting is to:

  1. Discuss challenges.
  2. Plan strategy (project related, personal career, etc).

Source Code, Data, and Reproducibility

Pride: We expect lab members to sign their code. To quote from The Pragmatic Programmer:

Craftsmen of an earlier age were proud to sign their work. You should be, too… People should see your name on a piece of code and expect it to be solid, well written, tested, and documented.

While some code will be proof-of-concept code, it should be of a form that inspires confidence.

Language: We write code for our analyses in Python or R, which allows everyone in the lab to know two languages and understand analytical code. Code for visualization can be Python, R, or JavaScript. Webserver interface code uses JavaScript.

Licensing: We expect code that we produce to be licensed under a 3-clause BSD license. Unless a funding agency requires something different, we'll use this. If you have questions or concerns about licensing, feel free to raise them in Slack.

Version Control Services: Our primary version control service is GitHub, and we have a greenelab account there. We expect that lab members will maintain their code in repositories under this team account. However, lab members should not commit to the branch that is shown as default on GitHub for any of these repositories. Instead, commits happen as described below to facilitate code review.

Creating a Greenelab Repository:

  1. Create a repository under the team account.
  2. Immediately fork this repository into one that your user account owns.
  3. Make commits to your own repository, and move code back to the greenelab repository as described below.

Getting Code into Greenelab Repositories: Code moves from user repositories to greenelab repositories through a process of code review. Code review is handled through pull requests. The process is described briefly below. Feel free to ask for guidance if you are uncomfortable with the process. We will revoke write access for failing to adhere to these rules.

  1. Make changes to your code and commit them in your own repository first.
  2. Create a pull request into the repository owned by Greenelab.
  3. Name potential reviewers for your pull request.
  4. Once at least one lab member has approved your pull request, you or a reviewer may merge your pull request. The only exception to this policy is this repository ("onboarding") where, in addition to the above rules, Casey must also approve the pull request.
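The merge rule in step 4 can be written down as a small predicate. This is a sketch under stated assumptions, not a description of how GitHub or the lab enforces the rule: the `PullRequest` class, the `can_merge` function, and the username `"casey"` are hypothetical, and the sketch assumes that an author's own approval does not count as review and that Casey's approval can double as the one required lab-member approval.

```python
from dataclasses import dataclass, field

ONBOARDING_REPO = "onboarding"  # the exception named in step 4
LAB_HEAD = "casey"              # hypothetical username for Casey


@dataclass
class PullRequest:
    repo: str
    author: str
    approvals: set = field(default_factory=set)  # usernames that approved


def can_merge(pr: PullRequest) -> bool:
    """Return True when the pull request satisfies the review rule."""
    # Assumption: the author's own approval does not count as review.
    reviewers = pr.approvals - {pr.author}
    if not reviewers:
        return False
    # The onboarding repository additionally requires Casey's approval.
    if pr.repo == ONBOARDING_REPO:
        return LAB_HEAD in pr.approvals
    return True
```

For instance, a pull request to a hypothetical `analysis` repository with one approval from another lab member passes, while a pull request to `onboarding` with the same approval does not pass until Casey also approves.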

Composition of Pull Requests: Each pull request may contain one or more changesets. In keeping with good source control practice, each changeset or commit should contain all changes necessary for a particular fix or update. In addition, each pull request should relate to no more than one functional area in the code base you are updating. Keeping the pull request focused to one area makes it easier for your reviewers to provide thoughtful feedback.

Reviewing Pull Requests: We expect that all lab members will participate in review of pull requests. If you get named by the submitter, it's courteous to review the request. We have created a checklist to facilitate review. As a reviewer, you are responsible for making sure that all checklist guidelines are followed.

Projects that didn't work: We expect that repositories will contain failures (e.g. proof-of-concepts that didn't work). This is ideal. Being able to find them will make sure we don't repeat the same failure.

Non-Code Versioning: Non-code documents should be kept in a place that maintains version history (e.g. Dropbox for Word documents). We maintain a Dropbox for Business account for these purposes.

Data Management: For publicly available data, scripts used to download and process these data should be preserved, as should the versions of items used in processing (e.g. probe to gene mappings). These items should be version controlled. Where possible, intermediate files of reasonable size can be stored to facilitate re-use, but the process to regenerate these files from publicly available data should be preserved. When we generate data, they should be stored in a location where they are replicated and uploaded to the relevant database as soon as possible (e.g. GEO for gene expression, SRA for sequencing).
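One way to preserve how public data were obtained is to keep the download script under version control together with a manifest of what was retrieved. The following is a minimal sketch in Python using only the standard library. The function names, the manifest layout, and the default file name `manifest.json` are hypothetical, and any URL passed in is the caller's choice; nothing here reflects a specific lab convention.

```python
import hashlib
import json
import urllib.request
from datetime import datetime, timezone
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def record_provenance(path, source_url, manifest_path):
    """Append the file's source, checksum, and retrieval time to a JSON manifest."""
    manifest_path = Path(manifest_path)
    entries = json.loads(manifest_path.read_text()) if manifest_path.exists() else []
    entries.append({
        "file": Path(path).name,
        "source_url": source_url,
        "sha256": sha256_of(path),
        "retrieved": datetime.now(timezone.utc).isoformat(),
    })
    manifest_path.write_text(json.dumps(entries, indent=2) + "\n")
    return entries[-1]


def download(source_url, dest, manifest_path="manifest.json"):
    """Download a public file and record where it came from."""
    urllib.request.urlretrieve(source_url, dest)
    return record_provenance(dest, source_url, manifest_path)
```

Committing the script and the manifest (but not necessarily the large downloaded files) lets a later reader confirm, via the checksum, that a regenerated file matches the one originally used.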

Reproducibility: We expect all lab members to maintain code that performs reproducible analyses. This can be in the form of makefiles, shell scripts, or other automation approaches that allow analyses to be automatically performed. We expect that these scripts, including those to generate figures in papers generated as a consequence of such analyses, will be included in source control repositories (see "Getting Code into Greenelab Repositories") and made publicly available before or concurrent with the submission of a preprint (if one is submitted) or manuscript. Combined with the review guidelines, this means that all code must have been reviewed for these documents to be submitted.
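As an illustration of an analysis that can be rerun automatically, the sketch below fixes the random seed and writes its result to a file, so that two runs produce byte-identical output. It is a toy example under stated assumptions: the seed value, the function names, and the simulated "scores" are hypothetical stand-ins for a real analysis step.

```python
import json
import random
from pathlib import Path

SEED = 42  # hypothetical fixed seed; recorded in the output for traceability


def simulate_scores(n, seed=SEED):
    """Stand-in for an analysis step that involves randomness."""
    rng = random.Random(seed)  # local generator, independent of global state
    return [round(rng.gauss(0, 1), 6) for _ in range(n)]


def run_analysis(out_path, n=5, seed=SEED):
    """Run the whole analysis from scratch and write the result to out_path."""
    scores = simulate_scores(n, seed)
    result = {"seed": seed, "n": n, "mean": round(sum(scores) / n, 6)}
    # sort_keys keeps the file layout stable from run to run.
    Path(out_path).write_text(json.dumps(result, sort_keys=True, indent=2) + "\n")
    return result
```

Calling `run_analysis("results.json")` from a makefile target or shell script gives a single entry point that a reviewer can execute to regenerate the result.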

How to Modify this Document

This is a living document. The repository is on GitHub. To make changes, fork the repository, edit the files you wish to change, and create a pull request. The pull request process is handled as described in the Getting Code into Greenelab Repositories section of coding_and_software.

Additional Resources