Pulumi AI is poisoning Google search results with AI answers #79
Comments
Hey @petetnt, we've taken this feedback from you and others, and we've taken steps to remove more than half (almost two thirds) of the AI Answers. We plan to continue to ensure that these AI answers are complementary to our existing documentation. We are also taking steps to:
Worth mentioning that this list was submitted to Google this morning, so it could be a bit before these pages are removed from search results. We expect this to happen fairly soon, though.
Thank you @AaronFriel and @cnunciato for the prompt (sic) and solid response 🫡
This was trending on my Twitter feed today, so it's pretty safe to assume that the situation is still dire: https://twitter.com/ProgrammerDude/status/1784833971731223033
It honestly makes using Pulumi itself very challenging... it's hard to find valid answers on how to do something because the Pulumi AI-generated ones crowd the results, and if you try them they don't actually work. And for some time (at least as of 2-3 weeks ago), the links to Pulumi's site for these generated results were 404'ing. Appreciate there's a GTM benefit to this SEO work, but at least for me it cut the opposite direction. Wanted to use Pulumi, but this was such a pain point I just stuck with Terraform.
It doesn't sound like robots.txt was changed. Removing answers isn't going to fix the issue if LLM-generated answers are still available in search results. However, for people who don't like Google, this issue will probably help alternatives to Google gain market share.
You ABSOLUTELY MUST add a report button on these pages at the very least, ASAP! If somebody asks a question about stuff that doesn't exist, the LLM will hallucinate it, it'll rank high in searches (as no one else will have written about the solution that isn't possible), and it'll confuse the hell out of whoever finds it! I'm pretty knowledgeable about GCP (I actually have the GCP Professional Cloud Architect certification), but I was chasing down the wrong idea that it would be possible to pre-create a Cloud Function with Pulumi and then use https://www.pulumi.com/ai/answers/7Kzx1a8vhPuAX6yYjEpeG3/deciphering-google-cloud-artifact-registry-and-cloud-functions-v2-integration
@daaain Thank you for pointing this out! I actually thought we were doing this already. I just opened a PR to add the same feedback widget we use elsewhere in Pulumi AI. |
The PR's been merged and the site's been updated. Thanks again for the report!
@mbomb007 We actually did update our robots.txt file to point to a sitemap we built to tell Google about unpublished pages, all of which return HTTP 410 with
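For readers unfamiliar with the mechanics: robots.txt can advertise a sitemap to crawlers, and serving HTTP 410 (Gone) tells them a URL has been permanently removed. A minimal sketch of the robots.txt side — note that the sitemap filename below is a hypothetical placeholder, not Pulumi's actual file:

```text
# Sketch only: advertise a sitemap listing the withdrawn answer pages.
# "sitemap-unpublished.xml" is a made-up filename used here for illustration.
User-agent: *
Allow: /

Sitemap: https://www.pulumi.com/sitemap-unpublished.xml
```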
Amazing turnaround time, thanks a lot for your hard work on this today! I'll make sure to flag nonsense generated code when I find more (I did on the 2 pages I linked above, with explanations).
This use of AI is monstrously stupid. Let us all pray to our respective gods that someone at Pulumi is intelligent enough to end this.
You could maybe speed up reindexing using Google Search Console?
@mbomb007 we did that as well, submitting the "unpublished" sitemap, and the console reported that it scanned (IIRC, it did not say "crawled") those pages. Our last resort has been using a tool that allows us to remove up to 1,000 URLs per day from Google's index, but it is fairly manual.
Why not add
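For reference, the standard robots meta tag for keeping an individual page out of search indexes looks like this (a generic sketch, not Pulumi's exact markup):

```html
<!-- Generic sketch: asks search engines not to index this page -->
<meta name="robots" content="noindex">
```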
@petetnt @mbomb007 Thanks, we've already taken some of those steps, and we've added the meta tag for the pages we want to remove that didn't meet our quality bar and were cluttering search results (affecting roughly 2/3 of the pages we published). I think the point you're getting at is: why publish AI Answers at all? In short: we've gotten very positive feedback from users when the pages show up appropriately and don't clutter the first page of results in a search engine. We don't want to throw out the good with the bad, and we've marked those pages as

I'll speak to @tobytteh's comment here, which I think captures the frustration folks have and the underlying question of why Pulumi feels comfortable generating code examples with AI:
Code generation is Pulumi's bread and butter; it is a core competency of our engineering org. Every one of our providers has a rich schema describing the SDK (example: Docker provider schema.json). Those schemas are then used to generate the SDKs for each language (source code in github.com/pulumi/pulumi/pkg/codegen). Pulumi AI combines this with retrieval-augmented generation and type checking of generated programs to be more than ten times more likely to generate valid, working code for many questions than ChatGPT (GPT-4) on its own.

Generating code ten times better than state-of-the-art language models is itself a feat, but we aren't resting on our laurels, and we're continuing to work on setting an even higher bar for ourselves to ensure that every program we publish would work from copy-and-paste to

That said, we certainly didn't expect the result of publishing as many pages as we did, and that's why we've taken drastic steps to withdraw a significant number (around 2/3) of the AI Answers. We'll continue to raise the bar on quality and prune pages that do not meet our standards.
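To make the claim concrete, the programs in question are ordinary Pulumi programs. Here is a minimal TypeScript sketch of the shape of output being described; it was written for illustration and is not copied from any published AI Answer:

```typescript
// Minimal Pulumi program: declares one S3 bucket and exports its name.
// Illustrative sketch only; not taken from a published AI Answer.
import * as aws from "@pulumi/aws";

// "example-bucket" is an arbitrary logical name; Pulumi appends a random suffix.
const bucket = new aws.s3.Bucket("example-bucket", {
    acl: "private",
});

// Exported so the value shows up in the stack's outputs after deployment.
export const bucketName = bucket.id;
```

A program like this type-checks against the generated @pulumi/aws SDK, which is the kind of validation the comment above refers to.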
Personally I don't think the (alleged) 2/3rds is nearly a good enough ratio to spam the internet full of absolutely wrong answers. Not to mention the page mentioned in the first post is still up and indexed, for example, which makes me think that you are willing to risk it for a piece of the much-obsessed-over AI pie. For example, publishing an index of valid answers while keeping the actual index out of the results would probably satisfy those looking for AI answers too.
In the interest of transparency, I'm happy to set up a call to chat and prove that 2/3 figure. Email me at my last name at pulumi.com. That said, there are three issues here:
While I see the pros and cons of 1, the issues we want to solve are 2 and 3. If you have examples where the answer is "absolutely wrong", that falls under 2, so please create issues or use the feedback buttons to let us know.
Thanks everyone for your feedback. In February when we saw the impact Pulumi AI Answers had on search result quality, we started work on solutions and we’re now seeing dramatic improvements from the work we’ve done:
The good news is this has been effective! We're pleased to see search engines use these signals to place our authoritative, expert-written docs content first.

Pulumi AI is still providing a ton of value to users - we're seeing thousands of questions asked and answered every day, helping devs build faster on any cloud. With quality checks in place and search results cleaned up, we've made Pulumi AI a better resource that is more correctly ranked relative to our other docs, such as our Registry API Docs. And we will keep iterating on these improvements to documentation, code generation, and verification of AI-generated content.

We'll close this issue as resolved, and thanks again for pushing us to make Pulumi better.
Sadly for me the original issue persists, with the example in the OP still being one of the many answers that provide me with negative value, so I guess I'll just consider this more a "wontfix" than completed.
Bizarrely, this is one of the only queries that seems to still rank so highly. It honestly baffles me why Google ranks this page above everything else. I've tried numerous others, and we've validated that most of the traffic Google is sending to these pages has died down considerably. We will keep monitoring and iterating. That said, for what it's worth, the example on this page works! Is there a particular reason it isn't perceived as a reasonable page to have on the Internet? I'm not an AWS Lightsail expert, so apologies if I'm missing something obvious.
What happened?
Today I was googling various infrastructure-related searches and noticed a worrying trend of Pulumi AI answers getting indexed and ranking high in Google results, regardless of the quality of the AI answer itself or whether the question involved Pulumi in the first place. This happened with multiple searches and will probably get even worse as time goes on.
Example
For example, the search AWS Lightsail xray brings up this AI Answer from Pulumi as the top result.

Link to the AI Answer: https://www.pulumi.com/ai/answers/bLHAi4DutXJvbJyNngGRvS/optimizing-aws-lightsail-and-x-ray-deployment.
While this might seem like a good thing for someone, spamming high-ranked results that are at best misleading and at worst destructive does not seem like something I would want to associate with Pulumi as a brand. There's already tons of generated, false content available on the internet, and adding even more noise to the search results is not a good idea.
I would highly recommend Disallow:ing robots from scraping https://www.pulumi.com/ai/answers via robots.txt or similar functionality.
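A minimal sketch of what that robots.txt rule could look like, assuming the goal is to block crawling of the whole /ai/answers path:

```text
# Sketch: disallow all crawlers from the AI Answers section
User-agent: *
Disallow: /ai/answers
```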
Additional context
Adding -inurl:pulumi.com/ai to your query will remove Pulumi AI answers from the search results, but it's cumbersome.
Contributing
Vote on this issue by adding a 👍 reaction.