Reindex site results with search.gov #2991
While we're reindexing the site, I'd like to think about adding a sitemap.xml. Are there places on the site that people are having trouble finding, or places we'd like them to find more easily? It may be good to add those to a general sitemap so search engines can find them more readily.
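For reference, here is a minimal sketch of what a generated sitemap.xml could look like, following the public sitemaps.org protocol (a `<urlset>` of `<url><loc>…</loc></url>` entries). It assumes we already have a plain list of page URLs to include; the `build_sitemap` helper and the example URLs are hypothetical and are not part of fec-cms. Wagtail also ships a `wagtail.contrib.sitemaps` module, which may be a better fit than hand-rolling this.

```python
from xml.etree import ElementTree as ET

# Namespace required by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a sitemap.xml document (as a string) listing each URL once.

    Hypothetical helper: `urls` is assumed to be an iterable of absolute URLs.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    seen = set()
    for url in urls:
        if url in seen:  # skip duplicates, keep first-seen order
            continue
        seen.add(url)
        url_el = ET.SubElement(urlset, "url")
        ET.SubElement(url_el, "loc").text = url  # ElementTree escapes &, <, >
    body = ET.tostring(urlset, encoding="unicode")
    # Written by hand so the declared encoding does not depend on the locale.
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    print(build_sitemap([
        "https://www.fec.gov/introduction-campaign-finance/",
        "https://www.fec.gov/help-candidates-and-committees/",
    ]))
```

Deduplicating here keeps the sitemap clean if the same page is listed under more than one section.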
I'm trying to regain access to the rebranded search.gov system and emailed the system owner yesterday. I'm moving this to blocked, since it's important that we get back into the system before we start re-indexing. We also need to verify that the API key from their system is still valid.
I've heard back from the search.gov team and have access to their system again. Heading back to researching this ticket.
I found some great documentation on our website's search indexing here: https://github.com/fecgov/fec-cms/blob/develop/fec/search/management/instructions.md
I was able to follow the documentation to re-index the Wagtail pages and data app pages. Next we'll need to re-index the transition pages.
@dorothyyeager The pages for transition are scraped based on the pages that are defined in this JSON file: https://github.com/fecgov/fec-cms/blob/develop/fec/search/management/data/transition_pages.json. Is there a more up-to-date list of transition pages you would like me to scrape? It doesn't have to be every page on transition, just the pages we think are important to have in the search.
Thanks @dorothyyeager! These have been removed from the site search. At some point, I'd like the content team to decide which pages on the transition site should be added to the site index. I made a new ticket for that here: #3279
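One way to keep the transition scrape list in sync when pages are retired is sketched below. It assumes, which is not confirmed here, that `transition_pages.json` is a flat JSON array of URL strings; the real file may use a different schema. `prune_pages` and `prune_file` are hypothetical helpers, not existing fec-cms management commands.

```python
import json
from pathlib import Path


def normalize(url):
    """Treat 'https://host/page' and 'https://host/page/' as the same page."""
    return url.rstrip("/")


def prune_pages(pages, retired):
    """Return `pages` in their original order, minus any retired URL."""
    retired_keys = {normalize(u) for u in retired}
    return [p for p in pages if normalize(p) not in retired_keys]


def prune_file(path, retired):
    """Rewrite a JSON list of URLs in place; return how many were dropped.

    Assumes the file holds a JSON array of URL strings (hypothetical schema).
    """
    path = Path(path)
    pages = json.loads(path.read_text())
    kept = prune_pages(pages, retired)
    path.write_text(json.dumps(kept, indent=2) + "\n")
    return len(pages) - len(kept)
```

Normalizing trailing slashes avoids leaving a retired page in the list just because it was recorded with or without a final `/`.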
Thanks for doing this! It will be awesome to have all the content we've been putting up show up in the searches. Much, much appreciated! Will start thinking about the new ticket.
Thank you @rfultz for suggesting this! I wrote up a ticket that explains steps that should be taken to accomplish this goal: #3280. It may even help us with automating search indexing from search.gov.
@dorothyyeager I noticed that your example in this issue is still not showing up in the site search. I think we're not indexing the children or descendants of this section: /introduction-campaign-finance/. Other pages may be missing too. I created this new issue to figure out why: #3281
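To check how many pages under /introduction-campaign-finance/ are missing, a simple set comparison would do, sketched below. It assumes we can get two plain lists of URLs: the pages we expect to exist (for example, exported from the Wagtail page tree) and the pages the search index currently reports. How either list is obtained is left open here; `under_section` and `missing_from_index` are hypothetical helpers, and no search.gov API is assumed.

```python
from urllib.parse import urlparse


def under_section(url, section):
    """True if the URL's path is `section` itself or one of its descendants.

    `section` is expected to start and end with '/', e.g.
    '/introduction-campaign-finance/'.
    """
    path = urlparse(url).path
    if not path.endswith("/"):
        path += "/"
    return path.startswith(section)


def missing_from_index(expected, indexed, section):
    """Expected URLs under `section` whose path does not appear in `indexed`.

    Paths are compared without trailing slashes, so '/a' and '/a/' match.
    """
    indexed_paths = {urlparse(u).path.rstrip("/") for u in indexed}
    return sorted(
        u for u in expected
        if under_section(u, section)
        and urlparse(u).path.rstrip("/") not in indexed_paths
    )
```

Comparing paths rather than full URLs sidesteps differences in scheme or host (for example `http` vs `https`) between the two lists.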
@dorothyyeager FYI, solved the issue about the
Thanks @patphongs! Good idea. I'll think on it.
Summary
When searching fec.gov for "other pages" (content pages), the search generally returns badly outdated results that point to transition pages that no longer exist.
For example, searching fec.gov's search box for "Guideline good order public funding" should return the public funding page (https://www.fec.gov/introduction-campaign-finance/understanding-ways-support-federal-candidates/presidential-elections/public-funding-presidential-elections/) as the top result. Instead, the correct page does not appear in the results at all.
(The correct page is actually the top result on Google for the same search.)
Expected Behavior
The search should return up-to-date results, so that the latest version of current pages appears. I don't think this is an SEO issue: as noted above, fec.gov pages generally show up in Google's results quickly.
Actual Behavior
fec.gov content pages that have been taken down from transition are still returned in the search results. Pages that have been up for quite a while, and that appear in Google's results for the same search term, are not returned.
Frequency
How to Reproduce
List any steps you took for this issue to happen. Be sure to include the URL and what you clicked or entered.
URL: https://www.fec.gov/…
Screenshots
The same search on Google, yielding the correct page as the top result:
Misc
This is actually happening a lot with various searches for content pages, but this is the most egregious error yet, since the public funding page has been up for over a year.
Completion criteria: