# Machine Learning Strategy

## Introduction to Machine Learning Strategy

### Why ML Strategy?
- Introduction
  - Welcome to the course on how to structure your machine learning project
  - Objective: Learn to quickly and efficiently make machine learning systems work
- Understanding Machine Learning Strategy
  - Motivating example: Working on a cat classification system
  - Current system accuracy: 90%, not sufficient for the application
  - Various ideas to improve the system:
    - Collect more training data
    - Increase diversity in training set (cat images with different poses, diverse negative examples)
    - Train the algorithm longer with gradient descent
    - Try different optimization algorithms (e.g., Adam optimization)
    - Experiment with network size (bigger/smaller), dropout, L2 regularization
    - Modify network architecture (activation functions, hidden unit count, etc.)
- Challenges in Improving Deep Learning Systems
  - Abundance of ideas to try
  - Risk of wasting time on ineffective approaches
  - Example: Spending six months collecting more data with minimal improvement
- Importance of Effective Strategies
  - Limited time availability for problem-solving
  - Need for quick and reliable ways to identify promising ideas
- Course Objectives
  - Teach strategies to analyze machine learning problems effectively
  - Share lessons learned from building and shipping deep learning products
  - Unique insights not commonly taught in university deep learning courses
- Evolution of Machine Learning Strategy in the Deep Learning Era
  - Deep learning algorithms offer new possibilities
  - Strategies differ from previous generation machine learning algorithms
- Conclusion
  - Application of the course's ideas to improve deep learning systems
  - Goal: Enhance effectiveness in achieving successful outcomes

  

### Orthogonalization

Orthogonalization - TV Example
- Old school televisions had multiple knobs to adjust various aspects of the picture.
- The TV designers ensured that each knob had a specific function, making it easier to adjust the picture.
- Orthogonalization refers to designing knobs that perform distinct functions, allowing precise tuning of the TV image.

Orthogonalization - Car Example
- Cars have separate controls for steering, acceleration, and braking, making it easy to understand the effects of each action.
- If controls were combined, it would be harder to achieve desired steering and speed.
- Orthogonal controls aligned with specific actions make it easier to tune the car.

Orthogonalization in Machine Learning
- To achieve good performance in supervised learning, four criteria must be met: training set performance, dev set performance, test set performance, and real-world performance.
- Each criterion requires specific tuning knobs to address potential issues.
- For example, a bigger network or better optimization algorithm can be used to improve training set performance.
- Regularization techniques can be applied to enhance dev set performance.
- Getting a bigger dev set can help if the algorithm performs well on the dev set but not the test set, since this usually means it has overfit the dev set.
- If the algorithm performs well on the test set but not in the real world, the dev set or cost function may need to be modified.
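
The chain of assumptions above can be sketched as a lookup from each stage to its knobs. This is a minimal illustration, not a diagnostic tool: the `CHAIN_OF_ASSUMPTIONS` table and the `first_failing_stage` helper are hypothetical, and whether each stage "performs well" is a judgement you supply.

```python
# Hypothetical mapping from each link in the chain of assumptions to the
# knobs these notes associate with it (illustrative, not exhaustive).
CHAIN_OF_ASSUMPTIONS = [
    ("training set", ["bigger network", "better optimization algorithm"]),
    ("dev set", ["regularization"]),
    ("test set", ["bigger dev set"]),
    ("real world", ["change the dev set", "change the cost function"]),
]


def first_failing_stage(performs_well):
    """Return (stage, knobs) for the first stage in the chain that is not good enough.

    performs_well maps each stage name to True/False, i.e. your own judgement of
    whether performance at that stage is acceptable. Returns None if all are.
    """
    for stage, knobs in CHAIN_OF_ASSUMPTIONS:
        if not performs_well[stage]:
            return stage, knobs
    return None


# Example: the model fits the training set but not the dev set.
status = {"training set": True, "dev set": False, "test set": False, "real world": False}
print(first_failing_stage(status))  # ('dev set', ['regularization'])
```

Walking the chain in order mirrors orthogonalization: you only reach for a later stage's knobs once the earlier stages are in good shape.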

Orthogonalization and Knobs in Machine Learning
- Orthogonalized controls allow for clear identification and adjustment of specific issues in a machine learning system.
- Early stopping, although widely used, affects both training set and dev set performance simultaneously, making it less orthogonalized.
- Using more orthogonalized controls simplifies the tuning process of neural networks.
- Orthogonalization helps diagnose performance bottlenecks and identify the specific set of knobs to improve system performance.

Diagnosing Performance Bottlenecks and Tuning
- The process involves identifying the limitations of the machine learning system's performance.
- Understanding which aspect (training set, dev set, test set, or real-world performance) needs improvement.
- Determining the specific set of knobs or adjustments to address the identified problem.
- Detailed explanation of this process will be provided in the following week's discussions.

## Setting Up your Goal

### Single Number Evaluation Metric

Introduction - Importance of a Single Real Number Evaluation Metric
- Tuning hyperparameters, trying out different learning algorithms, and exploring various options for building machine learning systems are common tasks.
- Progress is faster when there is a single real number evaluation metric that quickly indicates if a new approach is better or worse than the previous one.
- Setting up a single real number evaluation metric is recommended at the beginning of a machine learning project.
- Example: Evaluating classifiers using precision and recall.

Precision and Recall
- Precision: Of the examples the classifier recognizes as cats, the percentage that actually are cats.
- Recall: Of all actual cat images, the percentage the classifier correctly recognizes as cats.
- Tradeoff exists between precision and recall, and both metrics are important.
- Precision and recall alone make it difficult to determine the better classifier.

Combining Precision and Recall - F1 Score
- F1 score is a standard way to combine precision and recall.
- The F1 score is the harmonic mean of precision and recall.
- F1 score is defined as $F_1 = \frac{2}{\frac{1}{P} + \frac{1}{R}} = \frac{2PR}{P + R}$.
- Classifier A has a better F1 score in the example, making it the preferable choice.
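
A minimal Python sketch of precision, recall, and the F1 score, assuming binary labels with 1 meaning "cat". The helper names and the precision/recall values for classifiers A and B are illustrative assumptions, not results from a real system.

```python
def precision_recall(y_true, y_pred):
    """Precision and recall for a binary classifier (1 = cat, 0 = not cat)."""
    true_pos = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    predicted_pos = sum(y_pred)  # examples the classifier called "cat"
    actual_pos = sum(y_true)     # examples that really are cats
    precision = true_pos / predicted_pos if predicted_pos else 0.0
    recall = true_pos / actual_pos if actual_pos else 0.0
    return precision, recall


def f1_score(precision, recall):
    """Harmonic mean of precision and recall: 2 / (1/P + 1/R)."""
    if precision == 0 or recall == 0:
        return 0.0
    return 2 / (1 / precision + 1 / recall)


# Tiny hypothetical example: 2 of 3 predicted cats are real, 2 of 3 real cats found.
print(precision_recall([1, 1, 1, 0, 0], [1, 1, 0, 1, 0]))

# Illustrative (assumed) precision/recall values for two classifiers.
classifiers = {"A": (0.95, 0.90), "B": (0.98, 0.85)}
scores = {name: f1_score(p, r) for name, (p, r) in classifiers.items()}
best = max(scores, key=scores.get)
print({name: round(s, 4) for name, s in scores.items()}, best)
# {'A': 0.9243, 'B': 0.9104} A
```

Because the harmonic mean is pulled toward the smaller of the two values, a classifier cannot score well on F1 by excelling at only one of precision or recall.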

Benefits of a Single Number Evaluation Metric
- Having a well-defined dev set and a single number evaluation metric accelerates the iteration process.
- A single number evaluation metric helps quickly determine the superior classifier (e.g., classifier A or B).
- It improves the efficiency of improving machine learning algorithms.

Evaluating Performance in Different Geographies
- Scenario: Building a cat app for cat lovers in four major geographies (US, China, India, Rest of the World).
- Classifiers achieve different errors in each geography.
- Tracking four numbers for each geography makes it difficult to determine the superior algorithm.
- Computing the average performance provides a single real number evaluation metric.
- Algorithm C has the lowest average error, making it a potential choice for further iteration.
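
The averaging step can be sketched as follows. The per-geography error rates are hypothetical values chosen for illustration, and the plain unweighted mean is an assumption; you might instead weight each geography by its share of users.

```python
# Hypothetical dev-set error (%) for each algorithm in each geography.
errors = {
    "A": {"US": 3, "China": 7, "India": 5, "Rest of World": 9},
    "B": {"US": 5, "China": 6, "India": 5, "Rest of World": 10},
    "C": {"US": 2, "China": 3, "India": 4, "Rest of World": 5},
}

# Collapse the four per-geography numbers into one: the (unweighted) average.
average_error = {
    name: sum(per_geo.values()) / len(per_geo) for name, per_geo in errors.items()
}
best = min(average_error, key=average_error.get)
print(average_error, best)  # {'A': 6.0, 'B': 6.5, 'C': 3.5} C
```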

Conclusion
- A single number evaluation metric enhances decision-making efficiency in machine learning.
- It helps determine if an idea or approach is effective.
- The discussion will continue in the next video, focusing on setting up optimizing and satisficing metrics.

Setting Up Optimizing and Satisficing Metrics
- The next video will cover how to effectively set up optimizing and satisficing metrics.
- Optimizing metrics focus on maximizing a specific performance measure (e.g., accuracy, F1 score).
- Satisficing metrics only need to meet a certain criterion or threshold (e.g., error rate below a specified value).
- Choosing the appropriate metric depends on the problem and project requirements.

Importance of Clear Evaluation Metrics
- Clear evaluation metrics provide guidance and clarity in decision-making.
- They enable teams to assess the effectiveness of different approaches objectively.
- Well-defined evaluation metrics facilitate communication and alignment among team members.

Iterative Process in Machine Learning
- The workflow in machine learning often involves generating ideas, implementing them, and evaluating their impact.
- A single number evaluation metric helps track the progress and effectiveness of each iteration.
- Iterating based on the evaluation metric improves the algorithm and enhances overall performance.

Summary
- Having a single number evaluation metric is crucial for efficiently evaluating machine learning models.
- Precision and recall are common metrics, but a tradeoff exists between them.
- The F1 score combines precision and recall, providing a balanced evaluation.
- Computing averages can help compare performance in different categories or geographies.
- Optimizing and satisficing metrics play a role in measuring specific goals and criteria.
- Clear evaluation metrics improve decision-making and facilitate communication within teams.

### Satisficing and Optimizing Metric

Combining Optimizing and Satisficing Metrics
- In some cases, it's challenging to combine all the evaluation metrics you care about into a single real number.
- Setting up both optimizing and satisficing metrics can be useful to address this challenge.
- Example: Suppose you care about the classification accuracy and running time of a cat classifier.
- Combining accuracy and running time using a linear weighted sum may seem artificial.
- An alternative approach is to choose a classifier that maximizes accuracy while keeping the running time below a specified threshold.
- Accuracy is an optimizing metric, as you aim to maximize its value.
- Running time becomes a satisficing metric, where it only needs to be good enough (e.g., less than 100 milliseconds).
- Defining both metrics allows for a trade-off between accuracy and running time.
- Users may not significantly differentiate between running times below the threshold.
- The approach enables the selection of the "best classifier" based on the criteria.
- In general, when there are N metrics, it's reasonable to choose one as optimizing and the remaining N-1 as satisficing.
- The satisficing metrics need to reach a specific threshold but don't require further improvement.

Example: Wake Word Detection System
- Building a wake word detection system (e.g., "Alexa," "Hey Siri") involves accuracy and false positives.
- Maximizing accuracy when recognizing trigger words is crucial.
- Setting a threshold for false positives per 24 hours (e.g., at most one) becomes the satisficing metric.
- By combining these metrics, the system aims to achieve high accuracy while limiting false positives.

Summary of Combining Metrics
- Combining optimizing and satisficing metrics allows for a comprehensive evaluation.
- Choose one metric as optimizing to maximize its performance.
- The remaining metrics become satisficing, with thresholds that need to be met.
- This approach facilitates comparing and selecting the best option based on the combined metrics.
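
A sketch of selecting a model with one optimizing metric and any number of satisficing thresholds. The `pick_best` helper, the metric names, and all accuracy, running time, and false-positive numbers are hypothetical; thresholds are treated as inclusive upper bounds.

```python
def pick_best(candidates, optimizing, max_allowed):
    """Pick the candidate with the highest `optimizing` metric among those that
    meet every satisficing threshold in `max_allowed` (metric name -> upper bound).
    Returns None if no candidate satisfies all thresholds."""
    feasible = [
        c for c in candidates
        if all(c[metric] <= bound for metric, bound in max_allowed.items())
    ]
    if not feasible:
        return None
    return max(feasible, key=lambda c: c[optimizing])


# Hypothetical cat classifiers: maximize accuracy, running time at most 100 ms.
cat_classifiers = [
    {"name": "A", "accuracy": 0.90, "runtime_ms": 80},
    {"name": "B", "accuracy": 0.92, "runtime_ms": 95},
    {"name": "C", "accuracy": 0.95, "runtime_ms": 1500},
]
print(pick_best(cat_classifiers, "accuracy", {"runtime_ms": 100})["name"])  # B

# Hypothetical wake word detectors: maximize accuracy, at most 1 false positive per 24 hours.
detectors = [
    {"name": "X", "accuracy": 0.97, "false_pos_per_24h": 2},
    {"name": "Y", "accuracy": 0.95, "false_pos_per_24h": 1},
]
print(pick_best(detectors, "accuracy", {"false_pos_per_24h": 1})["name"])  # Y
```

Classifier C has the best accuracy but fails the running time constraint, so it is never compared on accuracy at all; that is the sense in which a satisficing metric only needs to be "good enough".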

Training, Development, and Test Sets
- Evaluation metrics are calculated using training, development (dev), or test sets.
- Proper setup of these sets is crucial for reliable performance evaluation.
- The next video will provide guidelines on how to set up training, dev, and test sets.

### Train/Dev/Test Distributions

Importance of Setting up Training Dev and Test Sets
- Proper setup of training, dev (development), and test sets significantly impacts the progress and efficiency of machine learning teams.
- Poorly designed data sets can hinder progress instead of facilitating it.
- The focus in this video is on setting up dev and test sets.

Dev Set and Its Role in Machine Learning Workflow
- The dev set, also known as the development set or hold-out cross-validation set, plays a crucial role in the machine learning workflow.
- Machine learning teams generate various ideas and train different models on the training set.
- The dev set is then used to evaluate these ideas and select the most promising one.
- Continuous innovation and experimentation are conducted to improve the performance of the selected model on the dev set.
- The goal is to achieve satisfactory results on the dev set before evaluating the model on the test set.

Setting up Dev and Test Sets for a Cat Classifier Example
- Suppose you are building a cat classifier for different regions, such as the U.S., U.K., Europe, South America, India, China, other Asian countries, and Australia.
- Setting up dev and test sets in a specific way is crucial.
- In the example provided, four regions are randomly chosen for the dev set, while the other four regions are chosen for the test set.
- However, this approach is flawed because the dev and test sets come from different distributions.

Importance of Dev and Test Sets Having the Same Distribution
- It is recommended to ensure that the dev and test sets have the same distribution.
- Treating the dev set as a target allows the team to quickly innovate, experiment, and evaluate different models.
- Teams excel at aiming for a target, making improvements, and getting closer to hitting the bullseye on the dev set.
- Having dev and test sets from different distributions can lead to unexpected performance discrepancies.
- Months of optimization based on the dev set might not translate well when evaluated on the test set.
- It is frustrating for the team to realize that the target has been shifted to a different location after significant efforts.
- To avoid this, include randomly shuffled data from all regions in both the dev and test sets.
- This ensures that both sets come from the same distribution, representing the mixed data of all regions.
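
One way to build dev and test sets from the same distribution is to pool the data from all regions, shuffle it, and only then split it. In this sketch the region list follows the example above, while the `make_dev_test` helper, the 100 examples per region, and the 50/50 dev/test split are assumptions for illustration.

```python
import random


def make_dev_test(examples_by_region, seed=0):
    """Pool examples from every region, shuffle, and split them 50/50 into
    dev and test sets so both come from the same distribution."""
    pooled = [ex for region_examples in examples_by_region.values() for ex in region_examples]
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    rng.shuffle(pooled)
    half = len(pooled) // 2
    return pooled[:half], pooled[half:]


# Hypothetical data: 100 (region, image_id) pairs for each of the eight regions.
regions = ["US", "UK", "Europe", "South America", "India", "China", "Other Asia", "Australia"]
data = {r: [(r, i) for i in range(100)] for r in regions}

dev, test = make_dev_test(data)
print(len(dev), len(test))  # 400 400
# Number of regions represented in each set.
print(len({r for r, _ in dev}), len({r for r, _ in test}))
```

Contrast this with assigning whole regions to the dev set or the test set: that gives each set a different distribution, which is the flawed setup described above.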

Example of Mismatched Dev and Test Sets
- A true story example involves a machine learning team optimizing a model for loan approvals in medium income zip codes.
- After several months of work, they decided to test the model on data from low income zip codes.
- The distribution of medium and low income zip codes is significantly different, leading to poor performance on the test set.
- The team wasted three months of work and had to redo substantial portions.
- The team aimed for one target for three months and then faced frustration when asked to hit a different target.

Recommendations for Dev and Test Set Setup
- Choose dev and test sets that reflect the data you expect to get in the future and consider important to do well on.
- Ensure that both sets come from the same distribution to accurately represent the desired target.
- Aligning the sets with future data expectations allows the team to aim at the desired target efficiently.
- The training set setup will be discussed in a separate video.
- Setting up the dev set and evaluation metric defines the target for the machine learning team.
- The size of the dev and test sets, along with considerations in the era of deep learning, will be discussed in the next video.

### Size of the Dev and Test Sets

Setting Up Dev and Test Sets in the Deep Learning Era

- Historical Guidelines for Dev and Test Sets
  - Traditional rule of thumb: 70/30 train/test split, or 60/20/20 train/dev/test split
  - Reasonable when dealing with smaller dataset sizes
    - E.g., 100 examples: 70/30 or 60/20/20 split
    - E.g., 1,000 examples or 10,000 examples: similar splits still reasonable

- Changes in the Modern Machine Learning Era
  - Working with larger dataset sizes
  - Example: Having a million training examples
    - 98% for training set, 1% for dev set, 1% for test set
    - Dev and test sets can be smaller due to the abundance of training data

- Test Set Size Considerations
  - Purpose of the test set: Evaluate the final system's performance
  - Set the test set size to provide high confidence in overall system performance
  - Large test sets not always necessary
    - Confidence gained from 10,000 examples may suffice for application-specific performance evaluation
  - Test set size depends on the available data
    - May be much less than 30% of the overall dataset

- Train and Dev Sets without a Test Set
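  - Some applications may not require high confidence in the overall system performance
  - In that case, teams sometimes use only a train set and a dev set, omitting the test set
  - Such teams often call this a "train/test" split, but because they tune against that "test" set, it is really functioning as a dev set
  - Without a separate test set, the dev set error is no longer an unbiased estimate of final performance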
  - Not recommended to omit the test set, but it can be acceptable in specific cases

- Summary of Setting Up Dev and Test Sets
  - Old rule of thumb (70/30 split) no longer applies in the era of big data
  - More data allocated for training, less for dev and test sets, especially with large datasets
  - Dev set should be set sufficiently large for its purpose, allowing evaluation and idea comparison
  - Test set should be sized adequately for evaluating the final model's performance
    - Can be much smaller than 30% of the data
  - Guidelines provided for setting up dev and test sets in the Deep Learning era
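
The arithmetic behind these splits can be sketched with a hypothetical `split_sizes` helper that assigns the requested fractions to dev and test and gives the remainder to training.

```python
def split_sizes(n_examples, dev_fraction, test_fraction):
    """Return (n_train, n_dev, n_test) for the given dev/test fractions;
    whatever is left over goes to training."""
    n_dev = round(n_examples * dev_fraction)
    n_test = round(n_examples * test_fraction)
    return n_examples - n_dev - n_test, n_dev, n_test


print(split_sizes(100, 0.2, 0.2))          # (60, 20, 20): traditional 60/20/20
print(split_sizes(1_000_000, 0.01, 0.01))  # (980000, 10000, 10000): 98/1/1
```

With a million examples, 1% still leaves 10,000 examples each for dev and test, which is often enough to compare ideas and to report final performance.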

Next: Changing Evaluation Metric or Dev and Test Sets Midway in a Machine Learning Problem

### When to Change Dev/Test Sets and Metrics?

- Example of Misleading Evaluation Metric
  - Building a cat classifier for cat-loving users
  - Initial metric: Classification error
  - Algorithm A: 3% error, Algorithm B: 5% error
  - Algorithm A lets pornographic images through (misclassifying them as cats), Algorithm B doesn't
  - Algorithm B considered better due to the absence of pornographic images, despite higher error rate

- Significance of Evaluation Metric and Dev Set
  - Evaluation metric acts as a target for the team to aim at
  - Dev set helps rank algorithms based on the metric
  - In the example, the evaluation metric misrepresents Algorithm A as better

- Need for Metric and Set Adjustment
  - When evaluation metric no longer accurately ranks algorithm preferences
  - Consider changing the evaluation metric, dev set, or test set

- Modifying the Evaluation Metric
  - Issue with the misclassification error metric in the example
  - Pornographic and non-pornographic images treated equally
  - Desire to avoid mislabeling pornographic images as cat images
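  - Suggestion: Introduce a weight term $w^{(i)}$ to differentiate between pornographic and non-pornographic images
    - $w^{(i)} = 1$ for non-pornographic images and a higher weight for pornographic images (e.g., $w^{(i)} = 10$)
    - Weighted error: $\frac{1}{\sum_i w^{(i)}} \sum_i w^{(i)} \, \mathbb{1}\{\hat{y}^{(i)} \neq y^{(i)}\}$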
  - Implementation involves labeling pornographic images in the dev and test sets

- Importance of Defining a New Evaluation Metric
  - If the current metric fails to rank algorithms correctly
  - Goal: Accurately determine the better algorithm for the application
  - Different ways to define a new evaluation metric
  - Don't hesitate to redefine the metric if unsatisfied with the current one

- Expanding the Scope of Evaluation Metrics
  - Focus on defining metrics for evaluating classifiers
  - Metrics should reflect preferences and goals of selecting a better algorithm
  - Not limited to the example of detecting pornographic images

- Breaking Down the Problem
  - Example of orthogonalization in machine learning
  - Divide the machine learning task into distinct steps
  - Step 1: Define the metric that captures the objective
  - Step 2: Focus on achieving high performance on the defined metric

- Placing the Target
  - Analogy of placing the target in target shooting
  - Separate step from aiming and shooting
  - Placing the target is analogous to defining the metric
  - Consider it as a separate knob to tune for algorithm performance

- Modifying the Cost Function
  - Adjusting the cost function in the learning algorithm
  - Incorporating weights to differentiate between different examples
  - Normalizing the cost by $\frac{1}{\sum_i w^{(i)}}$ instead of $\frac{1}{m}$, so it remains a weighted average over examples

- Importance of Orthogonalization
  - Philosophy of separating the steps of placing the target and shooting
  - Define the metric first, then optimize for it
  - Encouragement to think of defining the metric as one step
  - Modify the approach (e.g., cost function) to excel at the defined metric

- Example of Metric and Dev/Test Set Mismatch
  - Scenario: Two cat classifiers, A and B
  - Dev set evaluation: A has 3% error, B has 5% error
  - Deployment scenario differs from the evaluation set
  - Users upload lower quality, less well-framed images
  - Algorithm B performs better in the deployment scenario

- Adjusting Metric and Dev/Test Set
  - If the current metric and dev/test set don't reflect the desired application performance
  - Evaluation on high-quality images doesn't predict real-world performance
  - Change the metric and/or the dev/test set to match the application needs

- Speeding Up Iteration with Metrics and Dev Set
  - Benefits of having an evaluation metric and dev set
  - Enables faster decision-making and iteration
  - Set up a preliminary metric and dev set quickly
  - Continuously improve and refine them over time

- Recommendation and Efficiency
  - Suggestion to set up an evaluation metric and dev set early on
  - Iteration efficiency and team performance are improved
  - It's acceptable to change the metric and dev/test set later if needed
  - Running for a long time without any evaluation metric or dev set is discouraged, since it slows down iteration

- Conclusion
  - Guidelines for changing the evaluation metric and dev/test sets
  - Setting up a well-defined target for efficient iteration and performance improvement
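
A sketch of the weighted evaluation metric and a matching weighted training cost, assuming labels of 1 for cat and 0 for non-cat, a hypothetical `is_porn` flag per image, and logistic (cross-entropy) loss as the per-example loss. The helper names and the four-image dev set are made up for illustration.

```python
import math


def make_weights(is_porn, porn_weight=10.0):
    """w(i) = 1 for ordinary images and porn_weight (e.g., 10) for pornographic ones."""
    return [porn_weight if flag else 1.0 for flag in is_porn]


def weighted_error(y_true, y_pred, w):
    """Weighted misclassification error, normalized by the sum of the weights."""
    mistakes = sum(wi for yt, yp, wi in zip(y_true, y_pred, w) if yt != yp)
    return mistakes / sum(w)


def weighted_log_loss(y_true, y_prob, w, eps=1e-12):
    """Weighted logistic (cross-entropy) cost, also normalized by sum(w)."""
    total = 0.0
    for yt, p, wi in zip(y_true, y_prob, w):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += wi * -(yt * math.log(p) + (1 - yt) * math.log(1 - p))
    return total / sum(w)


# Hypothetical dev set: two cats, one pornographic image, one ordinary non-cat.
y_true = [1, 1, 0, 0]
w = make_weights([False, False, True, False])  # [1.0, 1.0, 10.0, 1.0]

pred_a = [1, 1, 1, 0]  # Algorithm A: lets the pornographic image through as "cat"
pred_b = [1, 0, 0, 0]  # Algorithm B: misses one ordinary cat instead

print(weighted_error(y_true, pred_a, w))  # 10/13, about 0.769
print(weighted_error(y_true, pred_b, w))  # 1/13, about 0.077
```

With uniform weights both algorithms make one mistake out of four, so plain error cannot separate them; the weighted metric ranks B ahead of A, matching the preference described above. The weighted cost applies the same weights during training, so the algorithm is pushed toward doing well on the redefined metric.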

## Comparing Human-level Performance

### Why Human-Level Performance?

- Comparing Machine Learning to Human-Level Performance
  - Recently, there has been a growing interest in comparing machine learning systems to human-level performance.
  - Two main reasons for this trend:
    - Advances in deep learning have significantly improved the performance of machine learning algorithms, making them competitive with humans in many application areas.
    - Designing and building machine learning systems becomes more efficient when targeting tasks that humans can perform.

- Progress Towards Human-Level Performance
  - Progress in machine learning tasks over time:
    - Rapid improvement as the algorithm approaches human-level performance.
    - After surpassing human-level performance, progress and accuracy tend to slow down.
    - Performance then approaches, but never surpasses, a theoretical optimum level (the Bayes optimal error).

- Bayes Optimal Error
  - Bayes optimal error is the best possible error in mapping from input (x) to output (y).
  - The perfect level of accuracy may not be 100% due to factors like noise in audio or blurriness in images.
  - Bayes optimal error represents the theoretical best performance that cannot be surpassed.

- Reasons for Slowing Down After Surpassing Human-Level Performance
  - Human-level performance is often not far from Bayes optimal error, leaving less room for improvement.
  - Certain tools for performance improvement, such as using labeled data from humans and manual error analysis, are more effective when the algorithm is worse than humans.
  - Once the algorithm surpasses human-level performance, these tactics become harder to apply.

- Importance of Comparing to Human-Level Performance
  - Comparing to human-level performance helps in tasks that humans excel at.
  - Humans can provide labeled data and insights into algorithmic errors, improving the algorithm's performance.
  - Comparing to human performance aids in analyzing bias and variance.
  - Understanding human capabilities helps determine the focus on reducing bias and variance in machine learning algorithms.

### Avoidable Bias

- The Importance of Human Level Performance
  - Knowing human level performance helps determine the desired level of performance for the learning algorithm on the training set.
  - It guides the decision of how well, but not too well, the algorithm should perform on the training set.

- Influence of Human Level Error on Bias and Variance
  - The comparison of algorithm performance to human level performance reveals insights about bias and variance.
  - Scenario 1: Human level error is 1%
    - If the algorithm achieves 8% training error and 10% dev error, it indicates a significant gap between algorithm and human performance.
    - Focus on reducing bias (avoidable bias of 7% vs. variance of 2%), e.g., by training a bigger neural network or training longer, to improve performance on the training set.
  - Scenario 2: Human level error is 7.5%
    - With the same training error (8%) and dev error (10%), the algorithm is already close to human level on the training set (avoidable bias of 0.5%), while variance is 2%.
    - Focus on reducing variance by employing regularization or acquiring more training data to reduce the gap between training error and dev error.

- Human Level Error as a Proxy for Bayes Error
  - Human level error serves as an estimate or proxy for Bayes error, which represents the best achievable performance.
  - Human level error is typically close to Bayes error in computer vision tasks where humans excel.
  - The difference between training error and human level error (as the Bayes error estimate) is termed "avoidable bias": the part of the training error that could, in principle, still be eliminated. Bayes error itself is the level you cannot get below without overfitting.
  - The difference between training error and dev error is an indicator of the algorithm's variance problem.

- Tailoring Tactics Based on Human Level Performance
  - Understanding human level error helps determine the appropriate tactics to focus on.
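  - If avoidable bias (training error minus human level error) is larger than variance (dev error minus training error), focus on bias reduction tactics.
  - If variance is larger, focus on variance reduction tactics; for the same training and dev errors, a lower human level error shifts attention toward bias, and a higher one toward variance.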
  - The avoidable bias and variance measure the potential for improvement in the algorithm's performance.

- Considerations for Decision Making
  - Human level performance has a nuanced impact on decision making and focus areas.
  - Factors like understanding the estimate of Bayes error and the level of avoidable bias influence the choice of tactics.
  - Different scenarios require different approaches to improve the algorithm's performance.

This video emphasizes the significance of human level performance in guiding the optimization of machine learning algorithms. By comparing algorithm performance to human level performance, decisions regarding bias and variance reduction tactics can be tailored accordingly. Understanding the relationship between human level error, avoidable bias, and variance helps in optimizing the algorithm's performance on training and development sets.
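
The comparison in the two scenarios can be written as a small helper. `diagnose` is hypothetical; it compares avoidable bias with variance and, as an arbitrary assumption, breaks ties toward variance. The suggested tactics are the ones listed in the scenarios.

```python
def diagnose(human_error, train_error, dev_error):
    """Compare avoidable bias and variance (all errors in %), using human-level
    error as the proxy for Bayes error, and suggest where to focus."""
    avoidable_bias = train_error - human_error
    variance = dev_error - train_error
    if avoidable_bias > variance:
        focus = "bias: bigger network, train longer"
    else:
        focus = "variance: regularization, more training data"
    return avoidable_bias, variance, focus


print(diagnose(1.0, 8.0, 10.0))  # (7.0, 2.0, 'bias: bigger network, train longer')
print(diagnose(7.5, 8.0, 10.0))  # (0.5, 2.0, 'variance: regularization, more training data')
```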

### Understanding Human-level Performance

- Definition of Human-Level Performance
  - The term "human-level performance" is often used loosely in research articles.
  - It is important to define it more precisely for driving progress in machine learning projects.
  - The definition that is most useful is the one related to estimating Bayes error.

- Estimating Bayes Error
  - Human-level error is used as a way to estimate Bayes error.
  - Bayes error represents the best possible error any function could achieve.
  - Example: Medical image classification
    - Different levels of human performance: untrained human (3% error), typical doctor (1% error), experienced doctor (0.7% error), team of experienced doctors (0.5% error).
    - Defining human-level error: It is a proxy or estimate for Bayes error.
    - Bayes error is therefore at most 0.5%, the best performance achieved by humans.

- Different Definitions of Human-Level Error
  - Purpose-driven definitions: For research papers or system deployment.
    - Definition 1: Surpassing a typical doctor's performance.
    - Definition 2: Surpassing a single radiologist's performance.

- Importance of Clear Definition
  - Define human-level error based on the intended purpose.
  - If the goal is to estimate Bayes error, use the performance achieved by a team of human doctors.
  - An error analysis example: Training error (5%), dev error (6%).
    - Measure of avoidable bias: The gap between Bayes error and training error.
    - Measure of variance problem: The difference between training error and dev error.

- Impact of Definitions on Bias and Variance
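  - Example 1: Training error (5%), dev error (6%)
    - Measure of avoidable bias: 4% to 4.5%, depending on whether human-level error is taken as 1%, 0.7%, or 0.5%.
    - Measure of variance problem: 1%.
    - Bias reduction techniques should be the focus.
  - Example 2: Training error (1%), dev error (5%)
    - Measure of avoidable bias: Around 0% to 0.5% (depending on the definition of human-level error).
    - Measure of variance problem: 4%.
    - Variance reduction techniques should be the focus.
  - Example 3: Training error (0.7%), dev error (0.8%)
    - Measure of avoidable bias: 0.2%, using the team of experienced doctors (0.5%) as the Bayes error estimate.
    - Measure of variance problem: 0.1%.
    - Both bias and variance matter, but avoidable bias is about twice the variance; using a typical doctor (1%) as the estimate would hide this bias problem.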

- Difficulty of Progress at Human-Level Performance
  - Teasing out bias and variance effects becomes harder.
  - Estimating Bayes error accurately is crucial.
  - Progress in machine learning projects becomes more challenging.

- Importance of Human-Level Performance Estimate
  - Human-level error serves as a proxy or approximation for Bayes error.
  - Difference between Bayes error estimate and training error indicates avoidable bias.
  - Difference between training error and dev error indicates variance.

- Nuanced Analysis for Non-Zero Bayes Error
  - In some cases, Bayes error is non-zero and can't be expected to reach 0%.
  - Previous analysis using training error compared to 0% is insufficient.
  - Better estimates for Bayes error help understand avoidable bias and variance.

- Advantages of Human-Level Performance Estimate
  - Helps make decisions on bias reduction or variance reduction tactics.
  - Works well until surpassing human-level performance.
  - Estimating Bayes error accurately becomes challenging beyond human-level performance.

- Surpassing Human-Level Performance
  - Deep learning has enabled surpassing human-level performance in many tasks.
  - Further discussion on the process of surpassing human-level performance in the next video.
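
The medical imaging numbers can be worked through in code to show how the choice of human-level error changes the diagnosis. The `HUMAN_LEVEL` table restates the example's error rates; `bias_variance` is a hypothetical helper that rounds to two decimals to hide floating-point noise.

```python
# Human-level error (%) at each level of expertise in the medical imaging example.
HUMAN_LEVEL = {
    "untrained human": 3.0,
    "typical doctor": 1.0,
    "experienced doctor": 0.7,
    "team of experienced doctors": 0.5,
}


def bias_variance(bayes_proxy, train_error, dev_error):
    """Return (avoidable bias, variance) in percentage points, rounded to 2 decimals."""
    return round(train_error - bayes_proxy, 2), round(dev_error - train_error, 2)


# Best human performance is the tightest available estimate of Bayes error.
bayes_estimate = min(HUMAN_LEVEL.values())  # 0.5

for train, dev in [(5.0, 6.0), (1.0, 5.0), (0.7, 0.8)]:
    print(train, dev, bias_variance(bayes_estimate, train, dev))
# 5.0 6.0 (4.5, 1.0)
# 1.0 5.0 (0.5, 4.0)
# 0.7 0.8 (0.2, 0.1)

# Using the typical doctor (1%) instead hides the remaining bias in Example 3:
print(bias_variance(HUMAN_LEVEL["typical doctor"], 0.7, 0.8))  # (-0.3, 0.1)
```

A negative avoidable bias is a signal that the chosen proxy is not a valid estimate of Bayes error, since training error is already below it.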

### Surpassing Human-level Performance

- Excitement in Surpassing Human-Level Performance:
  - Many teams find it exciting to exceed human-level performance on a specific recognition or classification task.
  - The difficulty of machine learning progress increases as it approaches or surpasses human-level performance.

- Evaluating Avoidable Bias and Variance - Example 1:
  - Problem scenario: A team of humans achieves 0.5% error, a single human has 1% error, and the algorithm has 0.6% training error and 0.8% dev error.
  - Estimate of Bayes' error: 0.5%.
  - Avoidable bias calculation: The difference between human-level error (0.5%) and the algorithm's training error (0.6%) suggests an avoidable bias of at least 0.1%.
  - Variance calculation: The difference between training error (0.6%) and dev error (0.8%) indicates a variance of approximately 0.2%.
  - Possible focus on reducing variance over avoidable bias in this case.

- Challenges in Evaluating Avoidable Bias - Example 2:
  - Problem scenario: A team of humans and a single human perform the same as before, but the algorithm has 0.3% training error and 0.4% dev error.
  - It becomes more difficult to determine the avoidable bias in this scenario.
  - Uncertainty arises regarding whether the training error (0.3%) indicates overfitting by 0.2%, or if Bayes' error is actually 0.1%, 0.2%, or 0.3%.
  - Insufficient information is available to decide whether to focus on reducing bias or variance in the algorithm.
  - Lack of clarity hinders progress efficiency.

- Implications of Surpassing Human-Level Performance:
  - Surpassing the 0.5% threshold makes it challenging to rely on human intuition for further algorithm improvement.
  - Machine learning significantly surpasses human-level performance in various areas, including online advertising, product recommendations, logistics, and loan prediction.
  - Notable observation: These examples involve structured data rather than natural perception tasks like computer vision or speech recognition.
  - Humans excel in natural perception tasks, making it harder for computers to surpass human-level performance in those areas.

- Role of Data Availability in Surpassing Human-Level Performance:
  - Problems where machine learning excels typically involve teams with access to extensive data.
  - Examples include online advertising, product recommendations, logistics, and loan prediction.
  - Deep learning systems benefit from examining large amounts of data to discover statistical patterns beyond the capabilities of the human mind.

- Surpassing Human-Level Performance in Speech Recognition and Computer Vision:
  - Speech recognition systems have achieved performance surpassing humans.
  - Some computer vision and image recognition tasks have also seen computers surpassing human-level performance.
  - Natural perception tasks are challenging for computers due to humans' inherent strength in these areas.

- Advances in Medical Tasks:
  - Certain medical tasks, such as reading ECGs, diagnosing skin cancer, and specific radiology tasks, show computers achieving performance beyond a single human.
  - Recent developments in deep learning have enabled surpassing human-level performance in some medical applications, despite the difficulty posed by natural perception tasks.

- Surpassing Human-Level Performance with Sufficient Data:
  - Surpassing human-level performance is challenging but achievable through deep learning and the availability of ample data.
  - Deep learning systems have achieved human-level performance and beyond in numerous applications.
  - Success relies on training deep learning systems with enough data for a specific supervised learning problem.

- Encouragement for Future Deep Learning System Success:
  - The hope is that, with continued advances, deep learning systems will surpass human-level performance in more applications.
  - Surpassing human-level performance is not easy, but significant progress is possible.
  - The examples above are meant to motivate those working on deep learning systems to aim for surpassing human-level performance in their own applications.
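
Once training error drops below the best human-level error, the human proxy stops being informative about avoidable bias. This hypothetical helper makes that explicit by returning `None` in that case, using the two examples above with a team of humans (0.5%) as the proxy.

```python
def avoidable_bias_estimate(best_human_error, train_error):
    """Avoidable bias (in %) using the best human-level error as the Bayes proxy.

    Returns None once training error is below that proxy: human-level error then
    no longer tells us how far Bayes error is from the training error.
    """
    if train_error < best_human_error:
        return None
    return round(train_error - best_human_error, 2)


print(avoidable_bias_estimate(0.5, 0.6))  # 0.1  (Example 1: at least 0.1%)
print(avoidable_bias_estimate(0.5, 0.3))  # None (Example 2: unclear)
```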

### Improving your Model Performance


- Fundamental Assumptions for a Supervised Learning Algorithm:
    - The algorithm can fit the training set well, indicating low avoidable bias.
    - Performance on the training set generalizes effectively to the dev set or test set, implying manageable variance.
- Orthogonalization Approach:
    - Separate knobs for addressing avoidable bias issues and variance problems.
- Evaluating Performance Improvement:
    - Calculate the difference between training error and proxy for Bayes error to estimate avoidable bias.
    - Determine the difference between dev error and training error as an indication of the magnitude of the variance problem.
- Reducing Avoidable Bias:
    - Tactics for reducing avoidable bias:
        - Training a larger model to achieve better performance on the training set.
        - Extending training duration.
        - Utilizing better optimization algorithms (e.g., momentum, RMSprop, Adam).
        - Exploring alternative neural network architectures and hyperparameters.
- Addressing Variance Problems:
    - Techniques to mitigate variance problems:
        - Acquiring more training data to enhance generalization to unseen dev set data.
        - Employing regularization methods such as L2 regularization, dropout, and data augmentation.
        - Conducting neural network architecture and hyperparameter search for better model fit.
- Mastery of Bias and Variance:
    - Understanding the concepts of bias and variance is crucial but challenging to master.
    - Systematically applying the learned concepts enables more efficient and strategic improvement of machine learning system performance.
- Homework and Conclusion:
    - Homework assignment allows practice and application of the discussed concepts.
    - Wishes good luck with the assignment and anticipation of upcoming videos.