Author: Claire Le Goues & Christian Kaestner
Title: MLiP: Motivation, Syllabus, and Introductions
Semester: Spring 2024
Footer: Machine Learning in Production/AI Engineering • Claire Le Goues & Christian Kaestner, Carnegie Mellon University • Spring 2024
License: Creative Commons Attribution 4.0 International (CC BY 4.0)

Machine Learning in Production

Motivation, Syllabus, and Introductions


Slack

We use Slack for this course, including during lectures

See signup link on Canvas

Set up the ability to read/post on Slack during lecture


Catastrophic Success

Crowd


The Waitlist Situation

¯\_(ツ)_/¯

About 120 students waitlisted

Best guess: 40 more people will get in, but it may take a few days

For those joining late:

  • Ask us on Slack for recordings of missed lectures
  • Post introduction on Slack (#intro) when joining
  • See Canvas for automatic extensions and makeup opportunities for quizzes, labs, and homeworks
  • Automatically excused for participation in missed lectures

Learning Goals

  • Understand how ML components are parts of larger systems
  • Illustrate the challenges in engineering an ML-enabled system beyond accuracy
  • Explain the role of specifications, their absence in machine learning, and the relationship to deductive and inductive reasoning
  • Summarize the respective goals and challenges of software engineers vs data scientists
  • Explain the concept and relevance of "T-shaped people"

Agenda Today

  1. Preliminaries (just done)
  2. Case Study
  3. Syllabus
  4. Introductions

Case Study: A Transcription Service Startup


competitor


Transcription services

Take audio or video files and produce text.

  • Used by academics to analyze interview text
  • Podcast show notes
  • Subtitles for videos

State of the art a few years ago: manual transcription, often via Mechanical Turk ($1.50/min)

Recently: many ML models for transcription (e.g., in YouTube, Alexa, Siri, Zoom)


The startup idea

PhD research on domain-specific speech recognition that can detect technical jargon

DNN trained on public PBS interviews + transfer learning on smaller manually annotated domain-specific corpus

Research has shown amazing accuracy for talks in medicine, poverty and inequality research, and talks at Ruby programming conferences; published at top conferences

Idea: Let's commercialize the software and sell to academics and conference organizers


Breakout: Likely challenges in building a commercial product?

As a group, think about challenges the team will likely face when turning their research into a product:

  • One machine-learning challenge
  • One engineering challenge in building the product
  • One challenge from operating and updating the product
  • One team or management challenge
  • One business challenge
  • One safety or ethics challenge

Post your answer to #lecture on Slack and tag all group members (skip if nobody in the group has Slack set up yet)


What qualities are important for a good commercial transcription product?


ML in a Production System

Architecture diagram of transcription service; many components, not just ML


ML in a Production System

Architecture diagram of transcription service; many components, not just ML


Screenshot of Temi transcription service

Notes: Highlights challenging fragments. Can see what users fix in place to correct. Star rating for feedback.


Data Scientists, Software Engineers,
and Data engineers + Domain specialists + Operators + Business team + Project managers + Designers, UI Experts + Safety, security specialists + Lawyers + Social scientists + ...

Data scientist

  • Often fixed dataset for training and evaluation (e.g., PBS interviews)
  • Focused on accuracy
  • Prototyping, often in Jupyter notebooks or similar (a minimal sketch follows this list)
  • Expert in modeling techniques and feature engineering
  • Model size, updateability, and implementation stability typically do not matter
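
To make the contrast concrete, here is a minimal, hypothetical sketch of this notebook-style workflow: a fixed dataset, a quick model, and accuracy as the headline number. The dataset and model are illustrative stand-ins, not anything from the transcription case study.

```python
# Hypothetical notebook-style prototype: fixed dataset, accuracy-focused.
# The dataset and model are illustrative stand-ins, not the startup's actual data.
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)          # fixed, static dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

# The headline metric; deployment cost, latency, and updateability are out of scope here.
print("accuracy:", accuracy_score(y_test, model.predict(X_test)))
```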

Software engineer

  • Builds a product
  • Concerned about cost, performance, stability, release time
  • Identify quality through customer satisfaction
  • Must scale solution, handle large amounts of data
  • Detect and handle mistakes, preferably automatically
  • Maintain, evolve, and extend the product over long periods
  • Consider requirements for security, safety, fairness
----

Likely collaboration challenges?


What might Software Engineers and Data Scientists Focus on?

Screenshot of Temi transcription service


Unicorns

By Steven Geringer, via Ryan Orban. Bridging the Gap Between Data Science & Engineering: Building High-Performance Teams. 2016


T-Shaped People

Broad-range generalist + Deep expertise

T-Shaped

Figure: Jason Yip. Why T-shaped people? 2018


T-Shaped People

Broad-range generalist + Deep expertise

Example:

  • Basic skills of software engineering, business, distributed computing, and communication
  • Deep skills in deep neural networks (technique) and medical systems (domain)

Latest Buzzword: π-Shaped People

π

Examples for discussion

  • What does correctness or accuracy really mean? What accuracy do customers care about?
  • How can we see how well we are doing in practice? How much feedback are customers going to give us before they leave?
  • Can we estimate how good our transcriptions are? How are we doing for different customers or different topics?
  • How to present results to the customers (including confidence)?
  • When customers complain about poor transcriptions, how to prioritize and what to do?
  • What are unacceptable mistakes and how can they be avoided? Is there a safety risk?
  • Can we cope with an influx of customers?
  • Will transcribing the same audio twice produce the same result? Does it matter?
  • How can we debug and fix problems? How quickly?
----

Examples for discussion 2

  • With more customers, transcriptions are taking longer and longer -- what can we do?
  • Transcriptions sometimes crash. What to do?
  • How do we achieve high availability?
  • How can we see that everything is going fine and page somebody if it is not?
  • We improve our entity detection model but somehow system behavior degrades... Why?
  • TensorFlow update; does our infrastructure still work?
  • Once somewhat successful, how to handle large amounts of data per day?
  • Buy more machines or move to the cloud?
  • Models are continuously improved. When to deploy? Can we roll back?
  • Can we offer live transcription as an app? As a web service?
  • Can we get better the longer a person talks? Should we then go back and reanalyze the beginning? Will this benefit the next upload as well?
----

Examples for discussion 3

  • How many domains can be supported? Do we have the server capacity?
  • How specific should domains be? Medical vs "International Conference on Allergy & Immunology"?
  • How to make it easy to support new domains?
  • Can we handle accents?
  • Better recognition of male than female speakers?
  • Can and should we learn from customer data?
  • How can we debug problems on audio files we are not allowed to see?
  • Any chance we might leak private customer data?
  • Can competitors or bad actors attack our system?

Syllabus and Class Structure

17-445/17-645/17-745/11-695, Spring 2024, 12 units

Monday/Wednesdays 2-3:20pm

Recitation Fridays 9:30am, 11am, and 2pm


Communication

  • Email us or ping us on Slack (invite link on Canvas)
  • All announcements through Slack #announcements
  • Weekly office hours, starting next week, schedule on Canvas
  • Post questions on Slack
    • Please use #general or #assignments and post publicly if possible; your classmates will benefit from your Q&A!
  • All course materials (slides, assignments, old midterms) available on GitHub and course website: https://mlip-cmu.github.io/s2024/
    • Pull requests encouraged!

Class with software engineering flavor

Focused on engineering judgment

Arguments, tradeoffs, and justification, rather than single correct answer

Practical engagement, building systems, testing, automation

Strong teamwork component

Both text-based and code-based homework assignments

It depends sticker


Prerequisites

Some machine-learning experience required

  • Basic understanding of data science process, incl. data cleaning, feature engineering, using ML libraries
  • High-level understanding of machine-learning approaches
    • supervised learning
    • regression, decision trees, neural networks
    • accuracy, recall, precision, ROC curve
  • Ideally, some experience with notebooks, sklearn or other frameworks

Basic programming and command-line skills will be needed

No further software-engineering knowledge required

  • Teamwork experience in product team is useful but not required
  • No required exposure to requirements, software testing, software design, continuous integration, containers, process management, etc
    • If you are familiar with these, there will be some redundancy -- sorry!

First Homework Assignment I1

"Coding warmup assignment"

Out now, due Monday Jan 29

Enhance a simple web application with ML-based features: image search and automated captioning

Open-ended coding assignment: change existing code, learn new APIs

Screenshot of Albumy
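
To give a flavor of the kind of ML-based feature involved, here is a minimal, hypothetical sketch of an automated-captioning helper. The library, model name, and helper function are illustrative assumptions, not the assignment's required APIs; see the actual handout for details.

```python
# Hypothetical sketch of an automated-captioning helper for a web app.
# Library and model choices are illustrative, not prescribed by the assignment.
from transformers import pipeline

# Load a pretrained image-captioning model once, e.g., at application startup.
captioner = pipeline("image-to-text", model="nlpconnect/vit-gpt2-image-captioning")

def caption_image(path: str) -> str:
    """Return a short caption for the image at `path`, e.g., to use as alt text."""
    results = captioner(path)
    return results[0]["generated_text"]

if __name__ == "__main__":
    print(caption_image("photo.jpg"))  # assumes a local image file exists
```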


Active lecture

Case study driven

Discussions highly encouraged

Regular in-class activities, breakouts

Contribute your own experience!

Discussions over definitions

Screenshot of Temi


Recordings and Attendance

Try to attend lecture -- discussions are important to learning

Participation is part of your grade

No lecture recordings, textbook and slides available

Contact us for accommodations (illness, interview travel, unforeseen events) or have your advisor reach out. We try to be flexible


Participation

Participation != Attendance

Grading:

  • 100%: Participates actively at least once in most lectures by (1) asking or responding to questions or (2) contributing to breakout discussions
  • 90%: Participates actively at least once in two thirds of the lectures
  • 75%: Participates actively at least once in over half of the lectures
  • 50%: Participates actively at least once in one quarter of the lectures
  • 20%: Participates actively at least once in at least 3 lectures.

Class Overview


Reading Assignments & Quizzes

Building Intelligent Systems by Geoff Hulten

https://www.buildingintelligentsystems.com/

Most chapters assigned at some point in the semester

Supplemented with research articles, blog posts, videos, podcasts, ...

Electronic version in the library

Building intelligent systems book


Reading Quizzes

Short essay questions on readings, due before start of lecture (Canvas quiz)

Plan for about 30-45 min of reading and 15 min for discussing and answering the quiz


Book for the Class

"Machine Learning in Production: From Models to Products"

Mostly similar coverage to lecture

Not required, use as supplementary reading

Published online (and in book form next year)


Assignments

Most assignments available on GitHub now

Series of 4 small to medium-sized individual assignments:

  • Engage with practical challenges
  • Analyze risks, fairness
  • Reason about tradeoffs and justify your decisions
  • Mostly written reports, a little modeling, some coding

Large team project with 4 milestones:

  • Build and deploy a prediction (movie recommendation) service
  • Testing in production, monitoring
  • Final presentation

Usually due Monday night; see schedule

----

17-745 PhD Research Project

Research project instead of individual assignments I3 and I4

Design your own research project and write a report

  • A case study, empirical study, literature survey, etc.

Very open ended: Align with own research interests and existing projects

See the project requirements and talk to us

First hard milestone: initial description due Feb 27


Labs

Introducing various tools, e.g., FastAPI (serving), Kafka (stream processing), Jenkins (continuous integration), MLflow (experiment tracking), Docker & Kubernetes (containers), Prometheus & Grafana (monitoring), SHAP (explainability)...

Hands on exercises, bring a laptop

Often introducing tools useful for assignments

About 1h of work, graded pass/fail, low stakes; show your work to a TA

First lab this Friday: Calling, securing, and creating APIs
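
As a taste of that first lab's topic, here is a minimal, hypothetical FastAPI sketch of a prediction endpoint. The route, request schema, and placeholder "model" are made up for illustration and are not the lab solution.

```python
# Minimal, hypothetical model-serving API with FastAPI; names and the "model"
# are placeholders for illustration, not the actual lab solution.
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class TranscriptionRequest(BaseModel):
    audio_url: str  # in a real service this would point to an uploaded file

@app.post("/transcribe")
def transcribe(request: TranscriptionRequest) -> dict:
    # Placeholder for calling an actual speech-recognition model.
    text = f"(transcript of {request.audio_url} would go here)"
    return {"transcript": text, "confidence": 0.0}
```

Run locally with something like `uvicorn main:app --reload` (assuming the file is named main.py) and send a POST request with a JSON body to /transcribe.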


Lab grading and collaboration

We recommend starting the lab before the recitation, but it can be completed during it

Graded pass/fail by TA on the spot, can retry

Relaxed collaboration policy: you can work with others before and during the recitation, but you have to present/explain your solution to the TA individually

(Think of recitations as mandatory office hours)


Grading

  • 35% individual assignments
  • 30% group project with final presentation
  • 10% midterm
  • 10% participation
  • 10% reading quizzes
  • 5% labs
  • No final exam (final presentations will take place in that timeslot)

Expected grade cutoffs in syllabus (>82% B, >94% A-, >96% A, >99% A+)


Grading Philosophy

Specification grading, based in adult learning theory

Giving you choices in what to work on or how to prioritize your work

We are making every effort to be clear about expectations (specifications) and will clarify if you have questions

Assignments broken down into expectations with point values, each graded pass/fail

Opportunities to resubmit work until last day of class

[Example]


Token System for Flexibility

8 individual tokens per student:

  • Submit individual assignment 1 day late for 1 token (after running out of tokens, 15% penalty per late day)
  • Redo individual assignment for 3 tokens
  • Resubmit or submit reading quiz late for 1 token
  • Redo or complete a lab late for 1 token (show in office hours)
  • Remaining tokens count toward participation

8 team tokens per team:

  • Submit milestone 1 day late for 1 token (no late submissions accepted when out of tokens)
  • Redo milestone for 3 tokens

How to use tokens

  • No need to tell us if you plan to submit very late. We will assign 0 and you can resubmit
  • Instructions and Google form for resubmission on Canvas
  • We will automatically use remaining tokens toward participation and quizzes at the end
  • Remaining individual tokens are reflected on Canvas; for remaining team tokens, ask your team mentor.

Group project

Instructor-assigned teams

Teams stay together for project throughout semester, starting Feb 5

Fill out the CATME team survey before Feb 5 (3pt)

Some advice in lectures; we'll help with debugging team issues

TA assigned to each team as mentor; mandatory debriefing with mentor and peer grading on all milestones (based on citizenship on team)

Bonus points for social interaction in project teams


Academic honesty

See web page

In a nutshell: do not copy from other students, do not lie, do not share or publicly release your solutions

In group work, be honest about contributions of team members, do not cover for others

Collaboration okay on labs, but not quizzes, individual assignments, or exams

If you feel overwhelmed or stressed, please come and talk to us (see syllabus for other support opportunities)


Thoughts on Generative AI for Homework?

GPT-4, ChatGPT, Copilot...? Reading quizzes, homework submissions, ...?


Our Position on Generative AI for Homework

This is a course on responsible building of ML products. This includes questions of how to build generative AI tools responsibly and discussing what use is ethical.

Feel free to use them and explore whether they are useful. You are welcome to share insights/feedback.

Warning: Be aware of hallucinations. Checking answers requires understanding. We test these tools ourselves, and they often generate bad/wrong answers for reading quizzes.

You are responsible for the correctness of what you submit!


What makes software with ML challenging?


ML Models Make Mistakes

ML image captioning mistakes

Note: Source: https://www.aiweirdness.com/do-neural-nets-dream-of-electric-18-03-02/


Lack of Specifications

/**
  Return the text spoken within the audio file
  ????
*/
String transcribe(File audioFile);
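
Since there is no precise specification, whatever checking we can do is inductive rather than deductive: we evaluate the model against labeled examples and decide whether the observed accuracy is acceptable. Below is a minimal, hypothetical sketch of that idea; the placeholder transcribe() and the tiny evaluation set are made up for illustration.

```python
# Hypothetical sketch: with no precise specification, transcribe() can only be
# evaluated inductively against labeled examples, not proven correct.
def transcribe(audio_file: str) -> str:
    """Stand-in for the ML-based transcription model."""
    return "hello world"  # placeholder prediction

# A tiny, made-up labeled evaluation set takes the place of a specification.
labeled_examples = [
    ("greeting.wav", "hello world"),
    ("talk.wav", "welcome to the conference"),
]

correct = sum(transcribe(audio) == expected for audio, expected in labeled_examples)
print(f"accuracy: {correct / len(labeled_examples):.2f}")
# In practice the "spec" becomes a statistical acceptance threshold
# (e.g., word error rate below some value), not a logical guarantee.
```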

Data Focused and Scalable

The ML Flywheel


Interaction with the environment

Architecture diagram of transcription service; many components, not just ML


It's not all new

We routinely build:

  • Safe software with unreliable components
  • Cyberphysical systems
  • Non-ML big data systems, cloud systems
  • "Good enough" and "fit for purpose" not "correct"

ML intensifies our challenges


Complexity

Complexity prediction


Complexity

Complexity prediction


Introductions

Before the next lecture, introduce yourself in Slack channel #social:

  • Your (preferred) name
  • In 1-2 sentences, your data science background and goals (e.g., coursework, internships, work experience)
  • In 1-2 sentences, your software engineering background, if any, and goals (e.g., coursework, internships, work experience)
  • One topic you are particularly interested in learning about during this course
  • A hobby or a favorite activity outside school

Summary

Machine learning components are part of larger systems

Data scientists and software engineers have different goals and focuses

  • Building systems requires both
  • Various qualities are relevant, beyond just accuracy

Machine learning brings new challenges and intensifies old ones