
h4sci/h4sci-course


About this Course


The vast majority of the world's data has been created within the last decade. As a result, more and more fields of research have started to embrace programming to process and analyse data. This course teaches applied programming with data and leverages the open source tech stack to deal with this new wealth and complexity of data.

The idea behind Hacking for Sciences is to build a solid understanding of core technologies and concepts that helps researchers develop a data processing strategy and expands their possibilities when working with data. The course's approach is to single out those concepts from software development that are easy to adopt and useful to social scientists. The course has three major learning objectives:

  • Be able to evaluate the role of focal components in a data science tech toolbox and pick tools suitable for the problem at hand. Learn how technologies such as R, Python, git version control, Docker or cloud computing can work together in your research project.
  • Learn how to manage and version control source code. Hacking for Sciences teaches how to use git version control to collaborate professionally, make your research reproducible and your code base persistent.
  • Applied data sourcing and data transformation: Learn how to communicate with SQL databases. Learn how to consume data from different sources using machine-to-machine communication interfaces (APIs) such as the OpenStreetMap geocoding API / Routing Engine or the KOF data API for macroeconomic time series.

Non-Goals: Hacking for Sciences is not a Statistics, Econometrics or Machine Learning course. Experience in these fields helps, inasmuch as students will find it easier to motivate their investment in programming and to come up with their own application examples, but profound methodological knowledge is not a prerequisite.

Hacking for Sciences is a course taught within the PhD program of ETH Zurich's D-MTEC Department. It was first taught in the fall semester of 2020.

Resources

Read Online

Note: the linked course material is currently being updated to this year's edition of the course and will change before the course starts. Nevertheless, feel free to take a glimpse at the preview.

Source Code

Server & Community

Schedule

This course grew out of the concepts and tricks I wish I had known when I started my own PhD back in 2012. While some of these concepts have only become more indispensable since, others may no longer apply today. Hence, it is up to you to bring your questions and problems and help create the most useful course experience. The course will always be centered around the open source data science stack, but its blocks will be adapted according to popular demand.

REQUIRED BEFORE THE START OF THE COURSE: Make sure you have checked out the course's RStudio Server. Participants with prior R and git experience are encouraged to install git and R locally before the course. Please also make sure you have a free, working github.com account before the course starts.

Block 1: General Overview, How to Git

September 29, 10:00 a.m. - 1:00 p.m. CEST (online): The Big Picture

September 30, 10:00 a.m. - 2:00 p.m. CEST (online): Git & Workflows

Block 2

October 20, 10:00 a.m. - 1:00 p.m. CEST (online): R Programming Crash Course

October 21, 10:00 a.m. - 2:00 p.m. CEST (online): Programming with Data

Block 3

November 17, 10:00 a.m. - 1:00 p.m. CET (online): Infrastructure

November 18, 10:00 a.m. - 2:00 p.m. CET (online): Infrastructure

Block 4

December 1, 10:00 a.m. - 1:00 p.m. CET (online): Semester Projects

December 2, 10:00 a.m. - 2:00 p.m. CET (online): Semester Projects

Format

The four blocks of the course contain:

  • short live sessions
  • interactive questionnaires / apps
  • pre-recorded videos
  • own reading / research
  • programming or setup tasks
  • coaching

Each block will contain most, if not all, of these elements.

Exam (Leistungsnachweis)

The course is assessed through ungraded programming tasks and active participation in class. The final programming task is to build a production-ready CI/CD pipeline in a group. This could be a regular data update (ETL process) or an automated build/test of a package. Group size will depend on the number of course registrations.

License

This work by Dr. Matthias Bannert is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. This includes all illustrations unless stated otherwise. Logos of software products or companies are only used to refer to these companies and products and are not shared under a CC license.

Contact

Find me on Twitter @whatsgoodio, send an e-mail to my ETH address or use the course community's Slack workspace.

Best

Matt