Tweak text for clarity in ai-garbage post

matt-dray · Mar 18, 2024 · 5156233
1 parent e9e3c47
commit 5156233

Showing 7 changed files with 78 additions and 54 deletions.
diff --git a/_freeze/posts/2024-03-15-ai-garbage/index/execute-results/html.json b/_freeze/posts/2024-03-15-ai-garbage/index/execute-results/html.json
@@ -1,7 +1,7 @@
 {
-  "hash": "310ddb35b63827043e86c9ada466d940",
+  "hash": "f9561631a622b961857246824d01f602",
   "result": {
-    "markdown": "---\ntitle: Sh\\*tty R help from sh\\*tty AI\ndate: 2024-03-15\nslug: \"ai-garbage\"\nimage: resources/beckhams.png\ncategories:\n  - ai\n  - r\n---\n\n\n![](resources/beckhams.png){fig.alt=\"Beckhams meme. Panels with Victoria then David. She says 'I wrote an R help website' and he responds 'be honest'. She says 'I am being honest' and he says 'really?'. She says 'well, I used AI to write an R help website and am shamelessly making money from innocent suckers as a result' and he says 'thank you'.\" width='50%'}\n\n## tl;dr\n\nThe rise of R 'help' websites written by AI is predatory and shameless. Things will only get worse.\n\n## Help\n\nRegular readers know this for sure: I'm not an R expert. I don't just 'know' stuff. I'm DuckGoGoing 'how to do x in r' every 10 minutes.\n\nIn doing this, I've noticed a trend that I want to complain about: I've found a few suspicious 'help websites' for R appearing high up the search rankings.\n\nWhy 'suspicious'? They're clearly written with an AI tool. And they're garbage. In content and ethics.\n\nI'm absolutely not going to name websites here because I do not want to send any traffic there. \n\n## Taking you for a sucker\n\nHere's where they excel. They:\n\n* use your gullibility to make money\n* are brazen\n* have excellent SEO ([search-engine optimisation](https://en.wikipedia.org/wiki/Search_engine_optimization))\n\n### Gullibility\n\nYou might be thinking 'okay, but maybe this is an efficient way of helping people'. To which the obvious retort is 'okay, no, this is an efficient way to make money by exploiting the clicks of vulnerable learners'.\n\nHow? At least one of these sites suggests it has 'partners', which are clearly just affiliate links. They will make a commission if their visitors sign up for a course at the affiliate link. The site's KPI is conversions, not 'people helped'.\n\n### Brazenness\n\nThese sites seem to have tens (hundreds?) 
of help pages published on the same day without any attribution to a particular human. Either they have some very efficient staff or they assume no-one will check.\n\nOne website includes a 'package guide' for every package on CRAN. Wow! But you guessed it: these pages exist only to pad out the site. In this case, each of the 20,000 'guides' was the same AI-generated content, but with the name of the package changed each time. Of course, there are affiliate links at the bottom of each one.\n\nPerhaps most brazen is the poor attention to detail. At least one of these sites retains the sentence:\n\n> Certainly! Here are the two sections for adding <affiliate> and <affiliate> to your webpage:\n\nClearly someone has asked a chatbot for some text and it has obliged. And then they forgot to delete this telltale line from the output before pasting it into their website, lol.\n\n### SEO\n\nIn some cases I found links to these sites as the top search result for fairly generic R queries.\n\nAs is well known, people will just click the top links willy-nilly. There's an expectation that these must be the best sites if they're top of the search rankings, right?\n\nBut no. Google is gameable as heck and easily manipulated for clicks. \n\nThese sites haven't 'earned' their ranking by producing high-quality advice. They're not there because other people are linking to them as a mark of endorsement. \n\n## Why this sucks\n\nI mean it's kind of obvious that this garbage is harmful, but for the benefit of the doubt, my concerns are as follows.\n\n1. Who are they stealing from?\n2. How much of the code is hallucinatory?\n3. Is this ruining learners' understanding?\n\nIt's pretty common knowledge that many AIs are trained on data without the consent of original creators. How much content on these pages is stolen from people without their consent? 
Maybe it slurped up some of _your_ material against your will.\n\nThese sites have code where the examples literally cannot be run; the syntax cannot be evaluated if copy-pasted into an R terminal. At least one of these sites was offering advice for {ggplot2} without ever showing an example plot.\n\nI'm pretty seasoned at searching for things on the internet, particularly R. I can separate the wheat from the chaff, I reckon. But not everyone can. How can a beginner user know what's wrong if they copy and paste trash from a shameless website like this?\n\n## Suck it up\n\nBottom line: this is scummy.\n\nI'm asking that you take two seconds to think 'could this be a fake help website?' Consider the telltale signs:\n\n* suspicious wording and accidentally-undeleted verbiage copied from the output of an LLM (large language model) query\n* obvious links to affiliate sites\n* code that doesn't run when you copy it to your machine\n* examples of code, but no output\n* crappy [Corporate-Memphis-style](https://en.wikipedia.org/wiki/Corporate_Memphis) AI-generated images of a generic white-guy in his 30s at a computer who is probably called Matt[^matt]\n\nMaybe I don't need to warn you about this. It's 2024. Times have changed. You're smart. I grew up with floppies and CD ROMs. \n\nDon't patronise these sites by clicking affiliate links; patronise them with condescension. 
It's all we can do.\n\n### Environment {.appendix}\n\n<details><summary>Session info</summary>\n\n::: {.cell}\n::: {.cell-output .cell-output-stdout}\n```\nLast rendered: 2024-03-16 21:00:07 GMT\n```\n:::\n\n::: {.cell-output .cell-output-stdout}\n```\nR version 4.3.1 (2023-06-16)\nPlatform: aarch64-apple-darwin20 (64-bit)\nRunning under: macOS Ventura 13.2.1\n\nMatrix products: default\nBLAS:   /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRblas.0.dylib \nLAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.11.0\n\nlocale:\n[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8\n\ntime zone: Europe/London\ntzcode source: internal\n\nattached base packages:\n[1] stats     graphics  grDevices utils     datasets  methods   base     \n\nloaded via a namespace (and not attached):\n [1] htmlwidgets_1.6.2 compiler_4.3.1    fastmap_1.1.1     cli_3.6.2        \n [5] tools_4.3.1       htmltools_0.5.6.1 rstudioapi_0.15.0 yaml_2.3.8       \n [9] rmarkdown_2.25    knitr_1.45        jsonlite_1.8.7    xfun_0.41        \n[13] digest_0.6.33     rlang_1.1.3       evaluate_0.23    \n```\n:::\n:::\n\n</details>\n\n[^matt]: I can say this because it's my exact description.",
+    "markdown": "---\ntitle: Sh\\*tty R help from sh\\*tty AI\ndate: 2024-03-15\nslug: \"ai-garbage\"\nimage: resources/beckhams.png\ncategories:\n  - ai\n  - r\n---\n\n\n![](resources/beckhams.png){fig.alt=\"Beckhams meme. Panels with Victoria then David. She says 'I wrote an R help website' and he responds 'be honest'. She says 'I am being honest' and he says 'really?'. She says 'well, I used AI to write an R help website and am shamelessly making money from innocent suckers as a result' and he says 'thank you'.\" width='50%'}\n\n## tl;dr\n\nThe rise of R 'help' websites written by AI is predatory and shameless. Things will only get worse.\n\n## Chatbottom of the barrel\n\nRegular readers know this for sure: I'm not an R expert. I don't just 'know' stuff. I'm DuckGoGoing 'how to do x in r' every 10 minutes.\n\nIn doing this, I've noticed a trend that I want to complain about: I've found a few suspicious 'help websites' for R appearing high up the search rankings.\n\nWhy 'suspicious'? They're clearly written with an AI tool. And they're garbage. In content and ethics.\n\nI'm absolutely not going to name websites here because I do not want to send any traffic there. \n\n## Taking you for a sucker\n\nHere's where they excel. They:\n\n* use your gullibility to make money\n* are brazen\n* have excellent SEO ([search-engine optimisation](https://en.wikipedia.org/wiki/Search_engine_optimization))\n\n### Gullibility\n\nYou might be thinking 'okay, but maybe this is an efficient way of helping people'. To which the obvious retort is 'okay, no, this is an efficient way to make money by exploiting the clicks of vulnerable learners with low-quality, harmful content'.\n\nHow? At least one of these sites suggests it has 'partners', which are clearly just affiliate links. They will make a commission if their visitors sign up for a course at the affiliate link. \n\nReputable companies know they need good product to help drive sales. 
But the KPI for these fake sites is purely conversions, not 'people helped'.\n\n### Brazenness\n\nThese sites seem to have tens (hundreds?) of help pages published on the same day without any attribution to a particular human. Either they have some very efficient staff or they assume no-one will check.\n\nOne website includes a 'package guide' for every package on CRAN. Wow! But you guessed it: these pages exist only to pad out the site. In this case, each of the 20,000 'guides' was the same AI-generated content, but with the name of the package changed each time. Of course, there are affiliate links at the bottom of each one.\n\nPerhaps most brazen is the poor attention to detail. At least one of these sites retains the sentence:\n\n> Certainly! Here are the two sections for adding <affiliate> and <affiliate> to your webpage:\n\nClearly someone has asked a chatbot for some text and it has obliged. And then they forgot to delete this telltale line from the output before pasting it into their website, lol.\n\n### SEO\n\nIn some cases I found links to these sites as the top search result for fairly generic R queries. Naturally, people will just click the top search results willy-nilly. These must be the best sites if they're top of the rankings, right?\n\nBut no. Google is, of course, gameable as heck and you can be manipulated for clicks with [search-engine optimisation](https://en.wikipedia.org/wiki/Search_engine_optimization) (SEO) hacks.\n\nThese sites haven't 'earned' their ranking by producing high-quality advice. They're not there because other people are linking to them as a mark of endorsement.\n\n## Why this sucks\n\nFeels obvious that this garbage is harmful, but for the benefit of the doubt, my concerns are as follows:\n\n1. Who are they stealing from?\n2. How much of the code is hallucinatory?\n3. Is this ruining learners' understanding?\n\nIt's pretty common knowledge that many AIs are trained on data without the consent of original creators. 
How much content on these pages is stolen from people without their consent? Maybe it slurped up some of _your_ material against your will.\n\nThese sites also have code where the examples literally cannot be run; the syntax cannot be evaluated if copy-pasted into an R terminal. At least one of these sites was offering advice for {ggplot2} without ever showing an example plot.\n\nI've been searching the internet for R-related stuff[^r] for many years and can separate the wheat from the chaff, I reckon. But not everyone can. How can a beginner user know what's wrong if they copy and paste trash from a shameless website like this?\n\n## Suck it up\n\nBottom line: this is scummy.\n\nI'm asking that you take two seconds to think 'could this be a fake website?' Consider the telltale signs:\n\n* suspicious wording and accidentally-undeleted verbiage copied from the output of an LLM (large language model) query\n* obvious links to affiliate sites\n* code that doesn't run when you copy it to your machine\n* examples of code, but no output\n* crappy AI-generated images that fill space (probably [Corporate-Memphis-style](https://en.wikipedia.org/wiki/Corporate_Memphis) abominations showing a generic 30-something white guy at a computer who is probably called Matt[^matt])\n\nMaybe I don't need to warn you about this. It's 2024. I grew up in a slower-paced learning environment of floppies and CD ROMs. Times have changed. You're smart. \n\nDon't patronise these sites by clicking affiliate links; patronise them with condescension. 
It's all we can do.\n\n### Environment {.appendix}\n\n<details><summary>Session info</summary>\n\n::: {.cell}\n::: {.cell-output .cell-output-stdout}\n```\nLast rendered: 2024-03-18 10:16:41 GMT\n```\n:::\n\n::: {.cell-output .cell-output-stdout}\n```\nR version 4.3.1 (2023-06-16)\nPlatform: aarch64-apple-darwin20 (64-bit)\nRunning under: macOS Ventura 13.2.1\n\nMatrix products: default\nBLAS:   /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRblas.0.dylib \nLAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.11.0\n\nlocale:\n[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8\n\ntime zone: Europe/London\ntzcode source: internal\n\nattached base packages:\n[1] stats     graphics  grDevices utils     datasets  methods   base     \n\nloaded via a namespace (and not attached):\n [1] htmlwidgets_1.6.2 compiler_4.3.1    fastmap_1.1.1     cli_3.6.2        \n [5] tools_4.3.1       htmltools_0.5.6.1 rstudioapi_0.15.0 yaml_2.3.8       \n [9] rmarkdown_2.25    knitr_1.45        jsonlite_1.8.7    xfun_0.41        \n[13] digest_0.6.33     rlang_1.1.3       evaluate_0.23    \n```\n:::\n:::\n\n</details>\n\n[^matt]: I can say this because it's my exact description.\n[^r]: Although putting 'r' in your search terms often ends up with links to subreddits. Do try to avoid being sucked down an r/ProgrammerHumor black hole.",
     "supporting": [],
     "filters": [
       "rmarkdown/pagebreak.lua"

diff --git a/_site/index.html b/_site/index.html
@@ -162,7 +162,7 @@
 
 <div class="quarto-listing quarto-listing-container-grid" id="listing-listing">
 <div class="list grid quarto-listing-cols-3">
-<div class="g-col-1" data-index="0" data-categories="ai,r" data-listing-date-sort="1710460800000" data-listing-file-modified-sort="1710622805924" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="5">
+<div class="g-col-1" data-index="0" data-categories="ai,r" data-listing-date-sort="1710460800000" data-listing-file-modified-sort="1710756998985" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="5">
 <a href="./posts/2024-03-15-ai-garbage/index.html" class="quarto-grid-link">
 <div class="quarto-grid-item card h-100 card-left">
 <div class="card-body post-contents">