Biostatistics 753 and 754
Class GitHub repository for 753 and 754, doctoral classes in the Department of Biostatistics at Johns Hopkins. You may use the material according to the Creative Commons Share-Alike 2.0 (CC-SA 2.0) license.
Instructor: Jeff Leek (ask me for my email if you don't know it)
TA: Amanda Mejia (ask her for her email, office hour TBA)
Old 753and4 Webpages:
Room: W2009 (3rd term), W4007 (4th term)
Time: TuTh 10:30am-11:50am
This course is the third term of the intensive introduction to methods for applied statistics. The goal of this sequence is to develop Ph.D.-level biostatisticians who can both perform applied data analysis and develop the next generation of statistical methods. Both data analysis and methods development require substantial hands-on experience, so the class will focus on hands-on data analysis.
Upon completion of this course students will be able to:
- Obtain, clean, transform and process raw data into usable formats
- Formulate quantitative models to address scientific questions
- Organize and perform a complete data analysis, from exploration to analysis, synthesis, and communication
- Understand and apply a range of statistical methods for inference and prediction
- Develop ideas for new statistical methods, tools, and analyses
Students will also be encouraged to independently read and apply statistical methods from texts and the scientific literature that are not covered in the course. They will also be encouraged to think of improvements or variations on existing methods to address specific scientific questions.
Evaluation and feedback
- 35% = Data analysis (peer graded/instructor summarized)
- 20% = Bi-weekly problems (graded by TA)
- 10% = Data analysis review (completion)
- 25% = Final Project (graded by instructor)
You will receive grades on the methods problems, feedback from your peers, and brief (< 1 paragraph + grade) feedback from me within a week of submitting your assignment. If you would like further feedback on your assignments, please schedule time to meet with me. I will try to keep Fridays from 10am-3pm open in 20-minute slots. You may book up to 3 slots at a time:
I believe the purpose of the Ph.D. is to train you to think for yourself and to initiate and complete your own projects. I am super excited to talk to you about ideas, work out solutions with you, and help you figure out statistical methods and/or data analysis. I don't think that graduate school grades are important for this purpose, so I don't care very much about them.
That being said, the purpose of this course is to prepare you for the qualifying exam. I will therefore assign grades on a three-level scale:
- A - Excellent
- B - Passing
- C - Needs improvement
If you receive A's and B's and perform similarly on the qualifying exam, I anticipate that you will pass. If you receive C's, that is my way of letting you know that your work would not pass on the qualifying exam. I don't feel comfortable assigning percentages to data analyses, but to calculate grades at the end of the course I will use the following percentages: A = 100%, B = 85%, C = 75% of available points.
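To make the arithmetic concrete, here is a minimal sketch of how the letter-to-percentage mapping combines with the component weights under "Evaluation and feedback". It assumes that all letter grades within a component are averaged equally and that a completed review is recorded as an "A" (full credit); both are assumptions, not stated policy, and the student record is hypothetical. The listed weights (35 + 20 + 10 + 25) total 90%, so the sketch reports points out of the listed 90 rather than guessing how any remaining weight is assigned.

```python
# Letter-grade-to-percentage mapping stated in the syllabus.
LETTER_TO_PERCENT = {"A": 100.0, "B": 85.0, "C": 75.0}

# Component weights (percent of the final grade) as listed in the syllabus.
WEIGHTS = {
    "data_analysis": 35,
    "biweekly_problems": 20,
    "analysis_review": 10,
    "final_project": 25,
}


def component_percent(letters):
    """Average the percentages of all letter grades received in one component."""
    if not letters:
        raise ValueError("each component needs at least one letter grade")
    return sum(LETTER_TO_PERCENT[g] for g in letters) / len(letters)


def weighted_points(grades):
    """Sum weight * (component percentage / 100) over the listed components."""
    return sum(
        WEIGHTS[name] * component_percent(letters) / 100.0
        for name, letters in grades.items()
    )


# Hypothetical student record; a completed review is recorded as "A" here (assumption).
grades = {
    "data_analysis": ["A", "B"],
    "biweekly_problems": ["A", "A", "B"],
    "analysis_review": ["A", "A"],
    "final_project": ["B"],
}

print(weighted_points(grades), "of", sum(WEIGHTS.values()))  # 82.625 of 90
```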
Data analysis assignments
(For more on my project philosophy see: http://bit.ly/wQT5uI)
Each student will be required to perform two data analysis projects during the course of the class. Students will be given 2 weeks to perform each analysis. The project assignments will consist of a scientific description of the problem. Students are responsible for all stages of each data analysis from obtaining the data to the final report. At the conclusion of each analysis each student must turn in:
- A write-up of their data analysis in a synthesized format, with numbered figures and references. (You may also include supplementary material for detailed additional calculations/analyses)
- A reproducible Rmd file that produces all of the numbers, figures and results in your write-up.
All documents should be submitted electronically. Each data analysis will be evaluated on the following criteria:
- Did you answer the scientific question? (30%)
- Did you use appropriate statistical methods? (40%)
- Was your write-up simple, clear, and precise? (20%)
- Was your code reproducible? (10%)
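The four criteria above carry fixed weights that sum to 100%. As a rough illustration only, the sketch below assumes each criterion receives a fraction of its weight between 0 and 1 and sums the results; the syllabus does not say how partial credit per criterion is awarded or how the total maps onto the A/B/C scale, and the example scores are hypothetical.

```python
# Rubric weights (percent of the data analysis score) from the syllabus.
RUBRIC = {
    "answered_scientific_question": 30,
    "appropriate_methods": 40,
    "clear_writeup": 20,
    "reproducible_code": 10,
}


def rubric_score(fractions):
    """Combine per-criterion fractions in [0, 1] into a percentage (assumed scheme)."""
    if set(fractions) != set(RUBRIC):
        raise ValueError("score every rubric criterion exactly once")
    for name, f in fractions.items():
        if not 0.0 <= f <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1")
    return sum(RUBRIC[name] * f for name, f in fractions.items())


# Hypothetical scores for one submission.
example = {
    "answered_scientific_question": 1.0,
    "appropriate_methods": 0.75,
    "clear_writeup": 1.0,
    "reproducible_code": 0.5,
}
print(rubric_score(example))  # 85.0
```

Note that appropriate methods carry the largest weight, so a partial score there moves the total more than the same fraction lost on any other criterion.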
Keep in mind that this is a methods class; even so, in some cases standard methodology will be sufficient to answer the question of interest. You may speak to your fellow students about specific statistical questions related to the projects, but the overall idea, analysis, and write-up should be your own individual work. You should cite any help you get from fellow students or TAs in your report in standard citation format.
Data Analysis Reviews
After each data analysis is turned in, it will be randomly assigned to another student for review. Your review will be due one week after it is assigned. Your comments should follow the format of a typical peer review: a summary of the analyses and conclusions in the project you are reviewing, followed by any major revisions and any minor revisions. I will also evaluate each data analysis independently to assign a grade. Synthesized comments will be made available for each project.
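Random assignment only works if nobody draws their own analysis. One simple way to guarantee that, sketched below as an illustration rather than the procedure actually used in the course, is to shuffle the class and have each student review the next person in the shuffled order, wrapping around at the end. The function name and student names are hypothetical.

```python
import random


def assign_reviewers(students, seed=None):
    """Return {author: reviewer} such that no student reviews their own analysis."""
    order = list(students)
    if len(order) < 2:
        raise ValueError("need at least two students")
    if len(set(order)) != len(order):
        raise ValueError("student names must be unique")
    rng = random.Random(seed)
    rng.shuffle(order)
    # Student order[i] reviews order[i + 1], wrapping around at the end.
    # This forms a single cycle, so nobody is paired with themself.
    n = len(order)
    return {order[(i + 1) % n]: order[i] for i in range(n)}


assignments = assign_reviewers(["ana", "ben", "cho", "dev"], seed=753)
for author, reviewer in sorted(assignments.items()):
    print(f"{reviewer} reviews {author}")
```

Each student writes exactly one review and receives exactly one. The single-cycle construction does not sample uniformly from every valid assignment (for example, two students never review each other when the class has more than two people); repeatedly shuffling until no one is matched with themself would give a uniform draw at the cost of a loop.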
Bi-weekly problems
Every two weeks you will be assigned one or more mathematical, directed problems focused on a statistical method we have covered. These problems may have multiple parts. The solutions should be submitted as PDF files.
Final project
The final project will have the same format as the data analyses. It will be slightly longer than the regular data analyses and more in-depth in terms of analysis. For 753, the final project will be assigned to you. For 754, you will be able to select your own data set and project.
For 754, the choice of final project is up to you. The project should involve data and code that you can obtain, process, analyze, and synthesize yourself. Keep in mind that real scientists make their own data. You may use any of the methods you learn during the course, or any other methods you know or look up.
Grading for the final project will be weighted by the difficulty of the project you undertake: the more difficult the project, the greater the multiplier applied to your score. The maximum possible score will still be 100%.
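The difficulty weighting amounts to a multiply-then-cap rule. The sketch below shows that rule under the assumption that the raw score is a percentage and the multiplier is a positive number; the syllabus does not give actual multiplier values, so the 1.25 used here is purely hypothetical.

```python
def final_project_score(raw_percent, difficulty_multiplier):
    """Scale a raw final-project percentage by difficulty, capped at 100%."""
    if not 0.0 <= raw_percent <= 100.0:
        raise ValueError("raw_percent must be between 0 and 100")
    if difficulty_multiplier <= 0.0:
        raise ValueError("difficulty_multiplier must be positive")
    return min(100.0, raw_percent * difficulty_multiplier)


# Hypothetical multiplier of 1.25 for a demanding project.
print(final_project_score(75.0, 1.25))  # 93.75
print(final_project_score(85.0, 1.25))  # 100.0 (106.25 before the cap)
```

The cap means an ambitious project can lift a weaker raw score substantially, but cannot push the final score past the stated maximum.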
Structure of Class Time
Class will consist of both lectures on statistical methodology and hands-on practice. The hands-on practice problems will be assigned in advance of each lecture, giving you time to look them over and come up with questions. In class, students will work on the problem and ask questions, followed by the instructor or a chosen student presenting their solution.
Tentative syllabus (753 and 754)
- Obtaining data and data processing
- Exploratory data analysis
- Regression and generalizations
- High dimensional analysis
- Simulation studies