Skip to content

Commit

Permalink
Merge pull request #3 from brettkoonce/minor-sp
Browse files Browse the repository at this point in the history
minor spelling tweaks
  • Loading branch information
jph00 committed Feb 29, 2020
2 parents 32f5baa + 69ac4f9 commit 5ca8bf2
Show file tree
Hide file tree
Showing 17 changed files with 52 additions and 52 deletions.
10 changes: 5 additions & 5 deletions 01_intro.ipynb
Original file line number Diff line number Diff line change
Expand Up @@ -97,7 +97,7 @@
"\n",
"- NLP: answering questions; speech recognition; summarizing documents; classifying documents; finding names, dates, etc in documents; searching for articles mentioning a concept\n",
"- Computer vision: satellite and drone imagery interpretation (e.g. for disaster resilience); face recognition; image captioning; reading traffic signs; locating pedestrians and vehicles in autonomous vehicles\n",
"- Medicine: Finding anomolies in radiology images, including CT, MRI, and x-ray; counting features in pathology slides; measuring features in ultrasounds; diagnosing diabetic retinopathy\n",
"- Medicine: Finding anomalies in radiology images, including CT, MRI, and x-ray; counting features in pathology slides; measuring features in ultrasounds; diagnosing diabetic retinopathy\n",
"- Biology: folding proteins; classifying proteins; many genomics tasks, such as tumor-normal sequencing and classifying clinically actionable genetic mutations; cell classification; analyzing protein/protein interactions\n",
"- Image generation: Colorizing images; increasing image resolution; removing noise from images; Converting images to art in the style of famous artists\n",
"- Recommendation systems: web search; product recommendations; home page layout\n",
Expand Down Expand Up @@ -1464,7 +1464,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"**Overfitting is the most important and challenging single issue** when training for all machine learning practitioners, and all algorithms. As we will see, it is very easy to create a model that does a great job at making predictions on the exact data which it has been trained on, but it is much harder to make predictions on data that it has never seen before. And of course this is the data that will actually matter in practice. For instance, if you create and hand-written digit classifier (as we will very soon!) and use it to recognise numbers written on cheques, then you are never going to see any of the numbers that the model was trained on -- every cheque will have slightly different variations of writing to deal with. We will learn many methods to avoid overfitting in this book. However, you should only use those methods after you have confirmed that overfitting is actually occuring (i.e. you have actually observed the validation accuracy getting worse during training). We often see practitioners using over-fitting avoidance techniques even when they have enough data that they didn't need to do so, ending up with a model that could be less accurate than what they could have gotten."
"**Overfitting is the most important and challenging single issue** when training for all machine learning practitioners, and all algorithms. As we will see, it is very easy to create a model that does a great job at making predictions on the exact data which it has been trained on, but it is much harder to make predictions on data that it has never seen before. And of course this is the data that will actually matter in practice. For instance, if you create and hand-written digit classifier (as we will very soon!) and use it to recognise numbers written on cheques, then you are never going to see any of the numbers that the model was trained on -- every cheque will have slightly different variations of writing to deal with. We will learn many methods to avoid overfitting in this book. However, you should only use those methods after you have confirmed that overfitting is actually occurring (i.e. you have actually observed the validation accuracy getting worse during training). We often see practitioners using over-fitting avoidance techniques even when they have enough data that they didn't need to do so, ending up with a model that could be less accurate than what they could have gotten."
]
},
{
Expand Down Expand Up @@ -2089,7 +2089,7 @@
"\n",
"The outputs themselves can be deceiving: they have the results of the last time the cell was executed, but if you change the code inside a cell without executing it, you will keep them.\n",
"\n",
"Except when we mention it explicitely, the notebooks provided on the book website are meant to be run in order, from top to bottom. In general, when experimenting, you will find yourself executing cells in any order to go fast (which is a super neat featur of Jupyter Notebooks) but once you have explored and arrive at the final version of your code, make sure you can run the cells of your notebooks in order (your future self won't necessarily remember the convoluted path you took otherwise!). \n",
"Except when we mention it explicitly, the notebooks provided on the book website are meant to be run in order, from top to bottom. In general, when experimenting, you will find yourself executing cells in any order to go fast (which is a super neat feature of Jupyter Notebooks) but once you have explored and arrive at the final version of your code, make sure you can run the cells of your notebooks in order (your future self won't necessarily remember the convoluted path you took otherwise!). \n",
"\n",
"In edit mode, pressing `0` twice will restart the *kernel* (which is the engine powering your notebook). This will wipe your state clean and make it as if you had just started in the notebook. Clean then on the \"Cell\" menu and then on \"Run All Above\" to run all the cells above the point you are. We have found this to be very useful when developing the fastai library."
]
Expand Down Expand Up @@ -2581,7 +2581,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"To do a good job of defining a validation set (and possibly a test set), you will sometimes want to do more than just randomly grab a fraction of your original datset. Remember: a key property of the validation and test sets is that they must be representative of the new data you will see in the future. This may sound like an impossible order! By definition, you haven’t seen this data yet. But you usually still do know some things.\n",
"To do a good job of defining a validation set (and possibly a test set), you will sometimes want to do more than just randomly grab a fraction of your original dataset. Remember: a key property of the validation and test sets is that they must be representative of the new data you will see in the future. This may sound like an impossible order! By definition, you haven’t seen this data yet. But you usually still do know some things.\n",
"\n",
"It's instructive to look at a few example cases. Many of these examples come from predictive modeling competitions on the *Kaggle* platform, which is a good representation of problems and methods you would see in practice.\n",
"\n",
Expand Down Expand Up @@ -2721,7 +2721,7 @@
"1. What is the name of the theorem that a neural network can solve any mathematical problem to any level of accuracy?\n",
"1. What do you need in order to train a model?\n",
"1. How could a feedback loop impact the rollout of a predictive policing model?\n",
"1. Do we always have to use 224x224 pixel images with the cat recogition model?\n",
"1. Do we always have to use 224x224 pixel images with the cat recognition model?\n",
"1. What is the difference between classification and regression?\n",
"1. What is a validation set? What is a test set? Why do we need them?\n",
"1. What will fastai do if you don't provide a validation set?\n",
Expand Down
2 changes: 1 addition & 1 deletion 02_production.ipynb
Original file line number Diff line number Diff line change
Expand Up @@ -77,7 +77,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"**Text (natural language processing)**: just like in computer vision, computers are very good at categorising both short and long documents based on categories such as spam, sentiment, author, source website, and so forth. We are not aware of any rigourous work done in this area to compare to human performance, but anecdotally it seems to us that deep learning performance is similar to human performance here. Deep learning is also very good at generating context-appropriate text, such as generating replies to social media posts, and imitating a particular author's style. It is also good at making this content compelling to humans, and has been shown to be even more compelling than human-generated text. However, deep learning is currently not good at generating *correct* responses! We don't currently have a reliable way to, for instance, combine a knowledge base of medical information, along with a deep learning model for generating medically correct natural language responses. This is very dangerous, because it is so easy to create content which appears to a layman to be compelling, but actually is entirely incorrect.\n",
"**Text (natural language processing)**: just like in computer vision, computers are very good at categorising both short and long documents based on categories such as spam, sentiment, author, source website, and so forth. We are not aware of any rigorous work done in this area to compare to human performance, but anecdotally it seems to us that deep learning performance is similar to human performance here. Deep learning is also very good at generating context-appropriate text, such as generating replies to social media posts, and imitating a particular author's style. It is also good at making this content compelling to humans, and has been shown to be even more compelling than human-generated text. However, deep learning is currently not good at generating *correct* responses! We don't currently have a reliable way to, for instance, combine a knowledge base of medical information, along with a deep learning model for generating medically correct natural language responses. This is very dangerous, because it is so easy to create content which appears to a layman to be compelling, but actually is entirely incorrect.\n",
"\n",
"Another concern is that context-appropriate, highly compelling responses on social media can be used at massive scale — thousands of times greater than any troll farm previously seen — to spread disinformation, create unrest, and encourage conflict. As a rule of thumb, text generation will always be technologically a bit ahead of the ability of models to recognize automatically generated text. For instance, as we will see in this book, it is possible to use a model that can recognize artificially generated content to actually improve the generator that creates that content, until the classification model is no longer able to complete its task.\n",
"\n",
Expand Down
14 changes: 7 additions & 7 deletions 03_ethics.ipynb
Original file line number Diff line number Diff line change
Expand Up @@ -173,7 +173,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"<img src=\"images/ethics/image2.png\" id=\"meeting\" caption=\"IBM CEO Tom Watson Sr. metting with Adolf Hitler\" alt=\"A picture of IBM CEO Tom Watson Sr. metting with Adolf Hitler\" width=\"400\">"
"<img src=\"images/ethics/image2.png\" id=\"meeting\" caption=\"IBM CEO Tom Watson Sr. meeting with Adolf Hitler\" alt=\"A picture of IBM CEO Tom Watson Sr. meeting with Adolf Hitler\" width=\"400\">"
]
},
{
Expand Down Expand Up @@ -328,7 +328,7 @@
"source": [
"Russia Today's coverage of the Mueller report was an extreme outlier in how many channels were recommending it. This suggests the possibility that Russia Today, a state-owned Russia media outlet, has been successful in gaming YouTube's recommendation algorithm. The lack of transparency of systems like this make it hard to uncover the kinds of problems that we're discussing.\n",
"\n",
"Another example of a feedbakc loop is a predictive policing algorithm that predicts more crime in certain neighborhoods, causing more police officers to be sent to those neighborhoods, which can result in more crime being recorded in those neighborhoods, and so on. University of Utah computer science processor Suresh Venkatasubramanian says about this: \"Predictive policing is aptly named: it is predicting future policing, not future crime.”\n",
"Another example of a feedback loop is a predictive policing algorithm that predicts more crime in certain neighborhoods, causing more police officers to be sent to those neighborhoods, which can result in more crime being recorded in those neighborhoods, and so on. University of Utah computer science processor Suresh Venkatasubramanian says about this: \"Predictive policing is aptly named: it is predicting future policing, not future crime.”\n",
"\n",
"There are positive examples of people and organizations attempting to combat these problems. Evan Estola, lead machine learning engineer at Meetup, [discussed the example](https://www.youtube.com/watch?v=MqoRzNhrTnQ) of men expressing more interest than women in tech meetups. Meetup’s algorithm could recommend fewer tech meetups to women, and as a result, fewer women would find out about and attend tech meetups, which could cause the algorithm to suggest even fewer tech meetups to women, and so on in a self-reinforcing feedback loop. Evan and his team made the ethical decision for their recommendation algorithm to not create such a feedback loop, but explicitly not using gender for that part of their model. It is encouraging to see a company not just unthinkingly optimize a metric, but to consider their impact. \"You need to decide which feature not to use in your algorithm… the most optimal algorithm is perhaps not the best one to launch into production\", he said.\n",
"\n",
Expand Down Expand Up @@ -420,7 +420,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Yes, that is showing what you think it is: Google Photos classified a Black user's photo with their friend as \"gorrilas\"! This algorithmic mis-step got a lot of attention in the media. “We’re appalled and genuinely sorry that this happened,” a company spokeswoman said. “There is still clearly a lot of work to do with automatic image labeling, and we’re looking at how we can prevent these types of mistakes from happening in the future.”\n",
"Yes, that is showing what you think it is: Google Photos classified a Black user's photo with their friend as \"gorillas\"! This algorithmic mis-step got a lot of attention in the media. “We’re appalled and genuinely sorry that this happened,” a company spokeswoman said. “There is still clearly a lot of work to do with automatic image labeling, and we’re looking at how we can prevent these types of mistakes from happening in the future.”\n",
"\n",
"Unfortunately, fixing problems in machine learning systems when the input data has problems is hard. Google's first attempt didn't inspire confidence, as covered by The Guardian:"
]
Expand All @@ -429,7 +429,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"<img src=\"images/ethics/image8.png\" id=\"gorilla-ban\" caption=\"Google first response to the problem\" alt=\"Pictures of a headlines form the Guardian, whoing Google removed gorialls and other moneys fomr the possible labels of its algorithm\" width=\"500\">"
"<img src=\"images/ethics/image8.png\" id=\"gorilla-ban\" caption=\"Google first response to the problem\" alt=\"Pictures of a headlines from the Guardian, whoing Google removed gorillas and other moneys from the possible labels of its algorithm\" width=\"500\">"
]
},
{
Expand Down Expand Up @@ -795,14 +795,14 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"> : The ethical implications of algorithmic systems have been much discussed in both HCI and the broader community of those interested in technology design, development and policy. In this paper, we explore the application of one prominent ethical framework - Fairness, Accountability, and Transparency - to a proposed algorithm that resolves various societal issues around food security and population ageing. Using various standardised forms of algorithmic audit and evaluation, we drastically increase the algorithm's adherence to the FAT framework, resulting in a more ethical and beneficent system. We discuss how this might serve as a guide to other researchers or practitioners looking to ensure better ethical outcomes from algorithmic systems in their line of work."
"> : The ethical implications of algorithmic systems have been much discussed in both HCI and the broader community of those interested in technology design, development and policy. In this paper, we explore the application of one prominent ethical framework - Fairness, Accountability, and Transparency - to a proposed algorithm that resolves various societal issues around food security and population aging. Using various standardised forms of algorithmic audit and evaluation, we drastically increase the algorithm's adherence to the FAT framework, resulting in a more ethical and beneficent system. We discuss how this might serve as a guide to other researchers or practitioners looking to ensure better ethical outcomes from algorithmic systems in their line of work."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In this paper, the rather contraversial proposal (\"Turning the Elderly into High-Nutrient Slurry\") and the results (\"drastically increase the algorithm's adherence to the FAT framework, resulting in a more ethical and beneficent system\") are at odds... to say the least!\n",
"In this paper, the rather controversial proposal (\"Turning the Elderly into High-Nutrient Slurry\") and the results (\"drastically increase the algorithm's adherence to the FAT framework, resulting in a more ethical and beneficent system\") are at odds... to say the least!\n",
"\n",
"In philosophy, and especially philosophy of ethics, this is one of the most effective tools: first, come up with a process, definition, set of questions, etc, which is designed to resolve some problem. Then try to come up with an example where that apparent solution results in a proposal that no-one would consider acceptable. This can then lead to a further refinement of the solution."
]
Expand All @@ -828,7 +828,7 @@
"\n",
"Ethical behavior in industry is necessary as well, since:\n",
"- Law will not always keep up\n",
"- Edge cases will arise in which pracitioners must use their best judgement."
"- Edge cases will arise in which practitioners must use their best judgement."
]
},
{
Expand Down
Loading

0 comments on commit 5ca8bf2

Please sign in to comment.