
Greene Laboratory Onboarding Information

Mission Statement

We view our core purpose as the development of methodological advances and integrative systems that make analysis of big data, particularly gene expression data, as routine in wet-bench biology labs as PCR. To accomplish this, we will write good code, perform solid and reproducible analyses, and disseminate our results widely through approachable publications and webservers. We recognize that trust, both in the process and in our results, is of primary importance to the biologists that use our methods and webservers. Therefore, we strive to make our source code as open and accessible as possible. When we submit papers, we expect that the analytical code behind those papers will be something that we can be proud of. To these ends, we will provide reviewers and the scientific community with all source code required to generate figures in the paper that result from computational analyses.

Expectations

Your role: We expect that you will take primary responsibility for the success of your research project and career development. As a member of the lab, you are expected to participate fully in the team. When disagreements about methodological approaches arise, you recognize that these should be resolved through a solid and reproducible analysis of available data. In general, lab members are expected to be present from 9:30AM to 4:00PM on weekdays to facilitate discussion within the group. If you aren’t sure — ask.

Casey's role: Casey’s goal is to facilitate your success as well as that of your project. Within your project, Casey will serve as a sounding board for ideas, will help you plan your project, and will help to devise experiments to test your hypotheses. To facilitate your success, Casey will help you to plan your training, to devise a career plan that can take you to where you want to go, to advise you on your project-risk portfolio, and to provide guidance on other elements of career and project development as needed.

Deadlines: Our lab has worked hard to develop a reputation for high-quality science that is well presented. We all benefit from this reputation, but we must also work to maintain it. Abstracts for meetings must be shared with all co-authors, including Casey, at least one week prior to the deadline for submission. Failure to abide by this guideline will result in missing the opportunity in question.

Trainees in the lab will often receive opportunities to present their work at scientific conferences. These presentations reflect on the entire lab. Oral presentations must be presented to the research lab during a braintrust meeting prior to the conference, and lab members are expected to address feedback that is provided. Poster presentations should be shared in the #general slack channel at least a week before printing.

Code of Conduct: All members of the lab, along with visitors, are expected to agree with this code of conduct. We will enforce this code. We expect cooperation from all members to help ensure a safe environment for everybody. The lab is dedicated to providing a harassment-free experience for everyone, regardless of gender, gender identity and expression, age, sexual orientation, disability, physical appearance, body size, race, or religion (or lack thereof). We do not tolerate harassment of lab members in any form. Sexual language and imagery are generally not appropriate for any lab venue, including lab meetings, presentations, or discussions. However, do note that we work on biological matters, so work-related discussions of e.g. animal reproduction are appropriate. Harassment includes offensive verbal comments related to gender, gender identity and expression, age, sexual orientation, disability, physical appearance, body size, race, religion, sexual images in public spaces, deliberate intimidation, stalking, following, harassing photography or recording, sustained disruption of talks or other events, inappropriate physical contact, and unwelcome sexual attention. Members asked to stop any harassing behavior are expected to comply immediately.

If you are being harassed, notice that someone else is being harassed, or have any other concerns, please contact Casey Greene immediately. If Casey is the cause of your concern, Dr. Deborah Hogan (Deborah.A.Hogan@dartmouth.edu) is a good informal point of contact; she does not work for Casey and has agreed to mediate. For official concerns, please see the University of Pennsylvania ombuds office. The code of conduct section is licensed under a Creative Commons Attribution 3.0 Unported License. http://2012.jsconf.us/#/about & The Ada Initiative. Please help by translating or improving: http://github.com/leftlogic/confcodeofconduct.com.

We expect members to follow these guidelines at any lab-related event.

Authorship: Our lab follows the Perelman School of Medicine Authorship Policy. These guidelines are derived from ICMJE's Uniform Requirements for Manuscripts Submitted to Biomedical Journals.

Ethics: We expect lab members to be honest in scientific communications both within and outside the lab. We expect that lab members will design experiments in a manner that minimizes both bias and self deception. We expect that lab members will keep agreements, be careful, and share their code and results openly with the scientific community. We expect that credit will be given where credit is due, including in scientific writing. Plagiarism is not tolerated. While a full enumeration of ethical considerations is outside of the scope of this document, Penn provides a handbook that we recommend. In addition, please don't hesitate to raise any questions or concerns that you have at any point with Casey.

PhD Student Committees: PhD students will interact with their qualifier and thesis committees. Students should correspond with the coordinator for their graduate program to understand the expectations that exist around communication with committee members. Questions around what document(s) their committee will expect to see and when they should be sent to the committee should be resolved with the coordinator at least a month in advance of a scheduled meeting. Students in the Greene lab are not to provide food or drinks for committee members. If the students are in a graduate program where a culture of providing food and drinks to committee members has developed, the students can include the information that no food or drink will be provided on an email in advance of the meeting and cite this policy.

Conference Travel: We try to make sure that each member of the lab can travel to one conference per year of their choice outside of the Philadelphia region. The conference should be within the continental United States or cost-competitive with similar conferences in the continental US. Lab members who travel to such a conference should submit an abstract for an oral presentation and poster and should present in whatever form is accepted at the meeting. The conference should be topical for the lab member's research projects and the purpose must align with the grant(s) that support the lab member. Lab members should first clear such travel with Casey. Lab members who are invited to conferences or other presentation opportunities with their costs covered by the organization inviting them, e.g., as an invited speaker or keynote, are welcome to accept such invitations. In all cases conference travel should be noted on the lab attendance calendar.

Communication

General

Slack: We use Slack for rapid communication within the lab. If you plan on sending an e-mail to someone within the lab, try a Slack message instead. This helps to keep communications in one place, and Casey commits to responding to Slack messages (not necessarily immediately, but the same guarantee is not made for e-mail). There are many channels on our lab's Slack; however, it is recommended that newcomers join the following channels: #general, #lab-meeting, #lab-supplies, #journalclub, #random, #wins.

HeyTaco: We recognize that people regularly go above and beyond lab expectations. We wanted a way to recognize each other when this happens. We now use HeyTaco. This allows lab members to send a quick virtual thank you note and/or pat on the back. If someone’s paper gets accepted or someone helps you out with a programming question, congratulate or thank them. Post a message that mentions any user in the #wins Slack channel, and they'll get a HeyTaco point. When one member accumulates enough points, they take the lab out to lunch (Casey pays).

Social Media: Lab members are encouraged to communicate through public social media, and if you choose to do so then you are expected to follow our code of conduct.

Projects: By the nature of our research, lab members will often have the opportunity to participate in projects managed via private or publicly accessible source code repositories. In these cases, lab members are expected to: follow the code of conduct; expect that private repositories will be world accessible; and communicate via the project-specific medium (e.g. if Rene reported an issue on a project on GitHub, it would not be appropriate for Casey to reply "I'll drop by your desk and show you how to solve that.").

IP/Openness: This is handled in accordance with the instructions from our research sponsors and university guidance. Lab members must follow the Penn Participation Agreement and the agreements with our sponsors. These often allow, encourage, or require openness. If you have concerns at any point, set up a meeting with Casey to discuss these concerns.

Calendars: There are two Google Calendars for the lab: Greene Lab Core Events (webview, Calendar ID h1eia9g7qu1udm079vsav7qlq0) and Greene Lab Attendance (webview, Calendar ID dk2vdln8ci4mh1m723df6rcb3s). The Attendance calendar is for noting individual availability (i.e. whether you'll be out of office). It should be used, for example, to note vacations, conference travel, and other workday conflicts. All other events should go in the Core Events calendar. In general, this calendar is for events that could possibly involve 3 or more lab members. Mandatory events such as lab meetings, scrums, and group deadlines go on Core Events.

Accounts: Lab members are expected to have accounts for the following and be members of the specified (organizations) if applicable:

  • GitHub (greenelab)
  • Google Calendar (Shared Calendar)
  • Slack (GreeneLab)

Meetings

Scrum: Our team's scrum process involves three components:

1. An opening scrum meeting at 9:45 AM Monday morning where individuals will lay out their goals for the week, in the form of post-it notes, on the whiteboard in the Greene Lab.
2. A closing scrum meeting at 1:40 PM Friday afternoon where team members will reflect on the progress they made on the goals they stated the previous Monday.
3. A daily virtual scrum update.

The daily virtual scrum update should include an update to the scrum repository. An issue is automatically created for each day the office is open. These issues can be found here. The update should include the following:

1. What specific item(s) you accomplished yesterday.
2. What specific item(s) you plan to accomplish today.
3. Who, if anyone, is blocking you.
4. Who, if anyone, you are blocking.
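
As an illustration only, the four items above could be assembled into a plain-text comment for the day's scrum issue with a small helper. The function name `format_scrum_update`, its parameters, and the section titles are hypothetical and are not part of any lab tooling.

```python
def format_scrum_update(done, planned, blocked_by=(), blocking=()):
    """Render the four scrum items as one plain-text comment (hypothetical helper)."""

    def bullets(items):
        # Fall back to "None" so every one of the four items is always answered.
        return "\n".join(f"- {item}" for item in items) or "- None"

    sections = [
        ("Accomplished yesterday", done),
        ("Planned for today", planned),
        ("Blocked by", blocked_by),
        ("Blocking", blocking),
    ]
    return "\n\n".join(f"{title}:\n{bullets(items)}" for title, items in sections)


if __name__ == "__main__":
    print(format_scrum_update(["Fixed the data loader"], ["Write unit tests"]))
```

The resulting text can be pasted into that day's issue in the scrum repository.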

Lab Meeting: Lab meeting is held weekly at a location at Penn and also virtually via uberconference. Scheduling is managed via a google spreadsheet. See the #general slack channel's pinned items link. Lab meeting consists of two components described below.

  • Journal Club
  • Braintrust

Journal Club: This is an opportunity for lab members to strengthen their ability to comprehend and critically analyze research that has already been conducted. For journal club, everyone will be randomly split off into groups of two (or three if there is an odd number of lab members). Presenters in these groups are expected to present on four papers. Three papers will be in the form of lightning talks, where presenters spend 5 to 10 minutes discussing the following: the overall goal of the work, any results presenters found interesting, and the paper's implications for the presenter's own research or for other ongoing research in the lab. The final paper will be a more in-depth discussion where presenters will dive deeper into individual experiments and mention how these experiments tie into the paper's global scheme. All selected papers except for one should have a publication date that is later than the most recent paper from your last journal club presentation. If this is your very first journal club, you are free to pick papers from any publication date to start; however, current members within the lab should abide by the date rule above. An example outline for presentations is provided below and an example template can be found here. For each lightning talk paper, the presentation should consist of:

  1. A title slide
  2. An overview slide (usually a flow-chart of some sort from the paper, could also be an initial result that sets context).
  3. The results figure that convinced you to pick this paper.

For the in-depth discussion, the presentation should consist of:

  1. A title slide
  2. Paper's global hypothesis: What do the author(s) want to know (motivation)? What are the key questions they want answered? Why do they think these question(s) are important?
  3. Overall Experimental Design slide(s) (usually a flow-chart of some sort from the paper). What data are they using? Why was it done that way? (context within the field)
  4. Results from each experiment and how it ties to the global hypothesis. What is the design of the experiment? What are the controls? Evaluation metric? How does this figure relate back to the broad question of the paper?
  5. Take-home messages from the paper and how the paper relates to research in the lab.
  6. (Optional) Brief discussion on possible follow up experiments. Are the findings convincing?
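
The random split into journal club groups of two (or three when the number of lab members is odd) could be sketched as follows. This is a minimal sketch, not the lab's actual scheduling method; `make_groups` and its `seed` parameter are hypothetical names introduced here.

```python
import random


def make_groups(members, seed=None):
    """Split members into random pairs; an odd count yields one group of three."""
    if len(members) < 2:
        raise ValueError("need at least two members to form a group")
    shuffled = list(members)
    # A seeded generator makes a given split repeatable if that is wanted.
    random.Random(seed).shuffle(shuffled)
    # Pair off members; with an odd count the last member is left over.
    groups = [shuffled[i:i + 2] for i in range(0, len(shuffled) - 1, 2)]
    if len(shuffled) % 2:
        # Attach the leftover member to the final pair to make a group of three.
        groups[-1].append(shuffled[-1])
    return groups


if __name__ == "__main__":
    print(make_groups(["A", "B", "C", "D", "E"], seed=0))
```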

Braintrust: This is an opportunity to share anything that you wish to talk about with the group. This could be a confounding result, an interesting result, an analysis that isn’t working, a demo of a cool new technology etc. This is your chance to have the group focus on and help you solve a challenge that you’re facing or to share something interesting that you've discovered with the group. Scheduling is voluntary, but each member of the lab is expected to share at least once every three months.

Individual Meetings: We schedule weekly individual meetings. Once you join the lab, contact Casey (or Kurt if you're on the tech team) to set up a time. These are set up for a term to accommodate class schedules. We don’t reschedule these meetings by default if one of the parties (Casey, Kurt, or you) is out of town, so if you want to meet during a week with a travel conflict, contact Casey or Kurt to reschedule. The goal of the weekly meeting is to:

  1. Discuss challenges.
  2. Plan strategy (project related, personal career, etc).

Triannual Self Reflection: Every four months, students and staff will individually meet with Casey to discuss their existing goals and current progress and to set goals for the next interval. To prepare for these meetings, students and staff are required to create an activity report that contains any of the following information (if applicable):

  • publications: submitted/accepted/published
  • grants/fellowships/scholarships: applied/awarded
  • presentations delivered
  • posters presented
  • meeting abstracts: submitted/accepted
  • software releases
  • other honors
  • goals for next session: What would you like to accomplish by the end of next cycle?
  • self-reflection: What do you regard as your strengths and as areas where you need improvement?

The report should be in the form of a plain text file, markdown file, or PDF and the file should be called lastname-reflection-yearmonth (e.g. Greene-reflection-201908.txt). Submit the report in a direct message to Casey via slack. During the summer, graduate students are required to complete Penn's individual development plan (IDP). Post-docs must complete an IDP prior to their annual contract renewal. This document covers more in-depth content than the regular triannual self reflection; therefore, the IDP can be used as a replacement annual report for that cycle. Because much of the material is overlapping, trainees will benefit from preserving their self reflection materials in a format that supports copying and pasting to the IDP form.
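
For illustration, the lastname-reflection-yearmonth naming convention can be generated rather than typed by hand. The helper below is a hypothetical sketch; the `.md` and `.pdf` extensions are an assumption based on the accepted formats listed above (the source only shows a `.txt` example).

```python
from datetime import date

# Assumed extensions for plain text, markdown, and PDF reports.
ALLOWED_EXTENSIONS = {"txt", "md", "pdf"}


def reflection_filename(last_name, when, extension="txt"):
    """Build a name of the form lastname-reflection-yearmonth.ext."""
    if extension not in ALLOWED_EXTENSIONS:
        raise ValueError(f"unsupported extension: {extension}")
    # %Y%m gives a four-digit year followed by a two-digit month, e.g. 201908.
    return f"{last_name}-reflection-{when:%Y%m}.{extension}"


if __name__ == "__main__":
    print(reflection_filename("Greene", date(2019, 8, 1)))
```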

Source Code, Data, and Reproducibility

Pride: We expect lab members to sign their code. To quote from The Pragmatic Programmer:

Craftsmen of an earlier age were proud to sign their work. You should be, too… People should see your name on a piece of code and expect it to be solid, well written, tested, and documented.

While some code will be proof-of-concept code, it should be of a form that inspires confidence.

Language: We write code for our analyses in Python or R, which allows everyone in the lab to know two languages and understand analytical code. Code for visualization can be Python, R, or javascript. Webserver interface code uses javascript.

Licensing: We release as many research outputs as possible under permissive open licenses. This ensures lab research is reusable and reproducible, with minimal legal barriers. The default license for software that should be applied to new lab-related repositories is the BSD-2-Clause Plus Patent License. This license is OSI-approved and rated highly for its simplicity, compatibility, and effectiveness.

In certain cases, a funding agency requires a different license or upstream restrictions require certain licensing. In these cases, the lab may apply a different license. If you have questions or concerns about licensing, feel free to raise them in Slack.

Version Control Services: Our primary version control service is GitHub, and we have a greenelab account there. We expect that lab members will maintain their code in repositories under this team account. However, lab members should not commit to the branch that is shown as default on GitHub for any of these repositories. Instead, commits happen as described below to facilitate code review.

Creating a Greenelab Repository:

  1. Create a repository under the team account.
  2. Immediately fork this repository into one that your user account owns.
  3. Make commits to your own repository, and move code back to the greenelab repository as described below.

Getting Code into Greenelab Repositories: Code moves from user repositories to greenelab repositories through a process of code review. Code review is handled through pull requests. The process is described briefly below. Feel free to ask for guidance if you are uncomfortable with the process. We will revoke write access for failing to adhere to these rules.

  1. Make changes to your code and commit them in your own repository first.
  2. Create a pull request into the repository owned by Greenelab.
  3. Name potential reviewers for your pull request.
  4. Once at least one lab member has approved your pull request, you or a reviewer may merge your pull request. The only exception to this policy is this repository ("onboarding") where, in addition to the above rules, Casey must also approve the pull request.
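
The merge rule in the last step can be expressed as a small check. This is only a sketch of the policy as written, not something GitHub or the lab enforces with this code. `CASEY_USERNAME` is a placeholder rather than a real account name, and the sketch assumes that on the onboarding repository Casey's approval alone is enough, which the text above does not specify.

```python
# Placeholder account name, for illustration only.
CASEY_USERNAME = "casey-placeholder"


def can_merge(repo_name, approvers):
    """Return True when a pull request has the approvals the policy asks for."""
    approvers = set(approvers)
    if not approvers:
        # Every repository needs at least one approving lab member.
        return False
    if repo_name == "onboarding":
        # The onboarding repository additionally requires Casey's approval.
        return CASEY_USERNAME in approvers
    return True


if __name__ == "__main__":
    print(can_merge("some-analysis", ["reviewer1"]))
    print(can_merge("onboarding", ["reviewer1"]))
```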

Composition of Pull Requests: Each pull request may contain one or more changesets. In keeping with good source control practice, each changeset or commit should contain all changes necessary for a particular fix or update. In addition, each pull request should relate to no more than one functional area in the code base you are updating. Keeping the pull request focused to one area makes it easier for your reviewers to provide thoughtful feedback.

Reviewing Pull Requests: We expect that all lab members will participate in review of pull requests. If you get named by the submitter, it's courteous to review the request. We have created a checklist to facilitate review. As a reviewer, you are responsible for making sure that all checklist guidelines are followed.

Projects that didn't work: We expect that repositories will contain failures (e.g. proof-of-concepts that didn't work). This is ideal. Being able to find them will make sure we don't make the same failure twice.

Non-Code Versioning: Non-code documents should be kept in a place that maintains version history. Penn provides Box for these purposes.

Data Management: For publicly available data, scripts used to download and process these data should be preserved, as should the versions of items used in processing (e.g. probe to gene mappings). These items should be version controlled. Where possible, intermediate files of reasonable size can be stored to facilitate re-use, but the process to regenerate these files from publicly available data should be preserved. When we generate data, they should be stored in a location where they are replicated and uploaded to the relevant database as soon as possible (e.g. GEO for gene expression, SRA for sequencing).
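
One possible way to preserve "the versions of items used in processing" is to write a small manifest next to the processing scripts and keep it under version control. The sketch below is an assumption about how this could be done, not the lab's established practice. The manifest layout, the function names, and the file names in the example are all hypothetical.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(files, versions, manifest_path):
    """Record checksums of downloaded files and versions of processing inputs."""
    manifest = {
        "files": {Path(f).name: sha256_of(f) for f in files},
        "versions": dict(versions),
    }
    # Sorted keys keep the file stable so version control diffs stay small.
    text = json.dumps(manifest, indent=2, sort_keys=True) + "\n"
    Path(manifest_path).write_text(text)
    return manifest


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        data = Path(tmp) / "expression.tsv"  # hypothetical downloaded file
        data.write_text("gene\tvalue\n")
        result = write_manifest(
            [data], {"probe_to_gene_mapping": "v1"}, Path(tmp) / "manifest.json"
        )
        print(json.dumps(result, indent=2, sort_keys=True))
```

Committing the manifest alongside the download scripts lets a later reader check that regenerated files match the ones originally used.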

Reproducibility: We expect all lab members to maintain code that performs reproducible analyses. This can be in the form of makefiles, shell scripts, or other automation approaches that allow analyses to be automatically performed. We expect that these scripts, including those to generate figures in papers generated as a consequence of such analyses, will be included in source control repositories (see "Getting Code into Greenelab Repositories") and made publicly available before or concurrent with the submission of preprint (if submitted) or manuscripts. Combined with the review guidelines, this means that all code must have been reviewed for these documents to be submitted.
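
As one example of the "other automation approaches" mentioned above, a short Python driver can rerun only the analysis steps whose inputs have changed, in the spirit of a makefile. This is a minimal sketch under that assumption. The names `is_stale` and `run_pipeline` and the step format are hypothetical.

```python
from pathlib import Path


def is_stale(target, sources):
    """A target needs rebuilding if it is missing or older than any source."""
    target = Path(target)
    if not target.exists():
        return True
    target_mtime = target.stat().st_mtime
    return any(Path(src).stat().st_mtime > target_mtime for src in sources)


def run_pipeline(steps):
    """Run (target, sources, action) steps in order, skipping fresh targets.

    Returns the list of targets that were rebuilt.
    """
    rebuilt = []
    for target, sources, action in steps:
        if is_stale(target, sources):
            action()
            rebuilt.append(str(target))
    return rebuilt


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        raw = Path(tmp) / "raw.tsv"  # hypothetical input
        raw.write_text("1\n2\n3\n")
        summary = Path(tmp) / "summary.txt"  # hypothetical output

        def summarize():
            total = sum(int(line) for line in raw.read_text().split())
            summary.write_text(f"total={total}\n")

        steps = [(summary, [raw], summarize)]
        print(run_pipeline(steps))  # first run builds the summary
        print(run_pipeline(steps))  # second run finds nothing to rebuild
```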

How to Modify this Document

This is a living document. The repository is at GitHub. To make changes, fork, edit the files you wish, and create a pull request. The pull request process is handled as described in the Getting Code into Greenelab Repositories section of coding_and_software.

Additional Resources
