
kaylielau/dsi-python


DSI Workshop for Python

   

Contents:

  1. Description
  2. Learning Outcomes
  3. Course Contacts
  4. Pre-Course Work
  5. Design
  6. Schedule
  7. Prerequisites
  8. Expectations
  9. Policies
  10. Folder Structure
  11. Key Texts
  12. Acknowledgements  

Description

The course was created by the University of Toronto's Data Sciences Institute. It is designed for learners who hold a degree in a field other than Computer Science or Statistics and want to strengthen their data science skills for their careers.  

The first half of the course focuses on the essentials of coding in Python and the ethical considerations of using algorithms. You will learn how to design functions, repeat code using loops, store data in lists, test and debug your code, and manipulate data using data analysis and visualization tools such as numpy, pandas, matplotlib, seaborn, and plotly. You will also discuss the Tuskegee experiment, its long-term effects, and the trustworthiness of AI applications in disparate social systems.  

The second half of the course will develop the professional skills necessary to be a data scientist with a focus on machine learning. You will go through an industry overview, explore the job interview process, including potential technical questions, and receive additional resources.   

Learning Outcomes

After successfully completing the course, the students will:

  1. Understand various Python data types and their role in coding. This includes being able to differentiate and evaluate expressions using numeric types (integers and floating-point numbers), Booleans, strings, and lists. This will be assessed in Assignment 1.
  2. Be able to reduce code duplication by creating functions in Python that follow the Function Design Recipe. This will be assessed in Assignment 1.
  3. Be able to use numpy and pandas to analyze a dataset, specifically to manipulate numerical and tabular data in Python. This will be assessed in Assignments 1 and 2.
  4. Know how to interact with databases via Python and how to visualize data using libraries such as matplotlib, seaborn, and plotly. This will be assessed in Assignment 2.
  5. Know how to debug and test Python code. Students will learn to troubleshoot errors and to select test cases that check code for correctness, reliability, and robustness. This will be assessed in Assignments 1 and 2.
  6. Understand the ethical issues with software and be aware of case studies in which software failure resulted in catastrophe.
  7. Be able to answer job interview questions with confidence.
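As an illustration of outcomes 2 and 5, here is a minimal sketch of a function written in the style of the Function Design Recipe from Practical Programming (see Key Texts): example calls, a header with type annotations, a description, a body, and tests. The function `fahrenheit_to_celsius` is a hypothetical example and is not taken from the course materials.

```python
import doctest


def fahrenheit_to_celsius(deg_f: float) -> float:
    """Return the temperature deg_f, given in degrees Fahrenheit,
    converted to degrees Celsius.

    >>> fahrenheit_to_celsius(32)
    0.0
    >>> fahrenheit_to_celsius(212)
    100.0
    """
    # Body: subtract the freezing point offset, then rescale.
    return (deg_f - 32) * 5 / 9


if __name__ == "__main__":
    # Run the examples in the docstring as tests; prints nothing if all pass.
    doctest.testmod()
```

Because the examples are written first and kept in the docstring, they double as test cases that `doctest` can check automatically.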

Course Contacts

Pre-Course Work

Prior to the first class please:

  1. Create a Google account that can use Google Colab:
    1. Go to https://colab.research.google.com/. In the upper left corner, click File, then New notebook.
    2. Enter !python --version in the code cell, then press Ctrl+Enter to run the cell and confirm that your Python version is 3.6 or above.
    • If you have trouble with the setup, the TA will be available to help on Monday 28 November from 5pm-6pm.
  2. Complete the pre-course survey: https://forms.gle/rcVCTfZasarXAGQg9  
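If you would rather check the version from Python code than with the `!python --version` shell command, a minimal alternative (a sketch, not part of the official setup instructions) is to read `sys.version_info` in a code cell. This reports the interpreter running the notebook itself. The helper name `python_is_recent_enough` is hypothetical.

```python
import sys

# Minimum version required for the course.
MIN_VERSION = (3, 6)


def python_is_recent_enough(minimum=MIN_VERSION):
    """Return True if the running interpreter's (major, minor) is at least `minimum`."""
    return sys.version_info[:2] >= minimum


# Print the full version string and the result of the check.
print(sys.version)
print("Python 3.6 or above:", python_is_recent_enough())
```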

Design

The course runs synchronously over Zoom. It consists of three classes a week for three weeks, or nine classes in total. Classes run 6 PM - 8 PM ET on Mondays and Thursdays, and 9 AM - 12 PM ET on Saturdays. To limit online fatigue, each class includes one or two breaks during which students are encouraged to stretch, grab a drink or snack, or ask additional questions.  

Tutorial sessions with a TA will also be offered over Zoom. These take place from 5 PM - 6 PM ET on Mondays and Thursdays, and from 8:30 AM - 9 AM ET and 12 PM - 12:30 PM ET on Saturdays.  

Schedule

The schedule is tentative and may be modified as needed. Learners will be notified of any changes.

  • Day 1 (Monday 28 November, 6pm-8pm): Getting Started I (Introduction; Python fundamentals)
  • Day 2 (Thursday 1 December, 6pm-8pm): Getting Started II (Python fundamentals)
  • Day 3 (Saturday 3 December, 9am-noon): Dealing with Reality (Control flow using conditionals and loops; Lists, tuples, sets, and dictionaries)
  • Day 4 (Monday 5 December, 6pm-8pm): In/Out (Modules; Working with files; Object-oriented programming)
  • Day 5 (Thursday 8 December, 6pm-8pm): Doing More with Data I (numpy)
  • Day 6 (Saturday 10 December, 9am-noon): Doing More with Data II (pandas)
  • Day 7 (Monday 12 December, 6pm-8pm): Visualizing Data (matplotlib; seaborn; plotly)
  • Day 8 (Thursday 15 December, 6pm-8pm): Professional skills: Industry case study - Hareem Naveed
  • Day 9 (Saturday 17 December, 9am-noon): Review and Ethics

Prerequisites

Learners are expected to know how to operate a computer and are also expected to be familiar with the parts of a data table or spreadsheet, summary statistics, and basic data visualizations. No prior programming knowledge is required.

Expectations

The course is a live-coding class. Learners are expected to follow along with the coding in their own Python notebooks, participate actively, and ask questions throughout. Slides will be available, but they are meant to be reviewed before or after class, since class time is dedicated to coding with the instructor.  

Technology requirements

  • Learners must have an internet connection and a computer to participate in online activities
  • Learners must have a Google account that can use Google Colab  

Policies

  • Accessibility: We want to provide an accessible learning environment for all. If there is something we can do to make this course more accessible to you, please let us know.
  • Course communications: Communications take place over email. Please include "DSI-Python" or similar in the subject line, e.g. "DSI-Python: pandas question"
  • Camera: Keeping your camera on is optional.
  • Microphone: Please keep your microphone muted unless you need to speak. Please state your name before speaking, as some Zoom configurations make it hard to tell who is talking!
  • Assessment: Homework is not graded but is highly recommended. There will be three graded assignments.  

Folder Structure

  • 01-slides: Course slides as interactive Google Colab notebooks (.ipynb files)
  • 02-html-slides: Course slides as HTML files that can be downloaded and viewed in a web browser
  • 03-pdf-slides: Course slides as PDFs
  • 04-homework: Optional homework to practice concepts covered in class
  • 05-assignments: Graded assignments
  • 06-live-code: Notebooks from class live coding sessions
  • data: Datasets used in the course
  • README: This file!
  • LICENSE: Copyright information for these materials
  • .gitignore: Files that Git should exclude from version control in this repository, as specified by the instructor

Slides

Key Texts

  • Gries, Campbell, and Montojo, 2017, Practical Programming: An Introduction to Computer Science Using Python 3.6.
  • Adhikari, DeNero, and Wagner, Computational and Inferential Thinking: The Foundations of Data Science.

Acknowledgements

Course materials were originally developed by Asel Kushkeyeva under the supervision of Rohan Alexander, University of Toronto. Materials have been modified by A. Mahfouz and Kaylie Lau for 2022.
