Search fails due to search cache populated with non-existent models after plugin removal ('NoneType' object has no attribute 'objects') #13507

Closed
pv2b opened this issue Aug 18, 2023 · 9 comments · Fixed by #13546
Assignees
Labels
severity: medium (Results in substantial degraded or broken functionality for specific workflows)
status: accepted (This issue has been accepted for implementation)
type: bug (A confirmed report of unexpected behavior in the application)

Comments

@pv2b
Contributor

pv2b commented Aug 18, 2023

NetBox version

v3.5.8

Python version

Steps to Reproduce

Note: The full and exact repro steps aren't verified right now, but I agreed with @kkthxbye-code that I should file this as a bug anyway, as per our discussion on Slack.

  1. Install a plugin, such as netbox-topology-views
  2. Create a device, and make sure that somehow a "coordinate" is saved for that device. (Don't have exact steps for this right now.)
  3. Uninstall the plugin.
  4. Search for the name of the device that was created.

Expected Behavior

I expect to get a page of search results including that object.

Observed Behavior

I get an error screen with the Python exception 'NoneType' object has no attribute 'objects'.

@pv2b pv2b added the type: bug A confirmed report of unexpected behavior in the application label Aug 18, 2023
@pv2b
Contributor Author

pv2b commented Aug 18, 2023

Here's some analysis based on my conversation with @kkthxbye-code on Slack and a workaround that got me out of this condition:

The precondition seems to be that a plugin (in my case, [netbox-topology-views](https://github.com/mattieserver/netbox-topology-views)) has been installed and has populated some data into the database. Specifically, in my case, there were some "coordinates" (i.e. positions on a topology graph, not lat/lng) that were set on a device.

After the plugin was uninstalled, a search on the name of the object with coordinates associated with it would error out with this message:

'NoneType' object has no attribute 'objects'

By running this with DEBUG on, I was able to determine that the crash happens on this line: https://github.com/netbox-community/netbox/blob/v3.5.8/netbox/utilities/fields.py#L117

The exception seems to be hit because ct.model_class() returns None, and therefore you can't get .objects on it. Looking closer at what ct is, it seems to be <ContentType: coordinate>, which I believe is a content type added by netbox-topology-views. When the plugin is uninstalled, the model class no longer exists (ct.model_class() returns None), and that causes the problem.
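
To illustrate the failure mode described above, here is a minimal sketch that does not use Django at all. `StubContentType`, `INSTALLED_MODELS` and `fetch_objects` are hypothetical stand-ins invented for this example; the only behaviour borrowed from Django is that `model_class()` returns None when the model cannot be found.

```python
from dataclasses import dataclass

# Hypothetical registry of models that are still installed. "coordinate" is
# absent, as if the plugin that defined it had been uninstalled.
INSTALLED_MODELS = {"device": type("Device", (), {"objects": ["dev1"]})}


@dataclass
class StubContentType:
    """Stand-in for a content type row that has outlived its model."""
    model: str

    def model_class(self):
        # Mirrors the relevant behaviour: None when the model is missing.
        return INSTALLED_MODELS.get(self.model)


def fetch_objects(ct):
    # Unguarded access, analogous to the pattern at the crashing line.
    return ct.model_class().objects


error_message = None
try:
    fetch_objects(StubContentType("coordinate"))
except AttributeError as exc:
    error_message = str(exc)

print(error_message)
```

For a content type whose model is still installed (`"device"` here), the same call returns the manager stand-in as usual, so the problem only shows up for leftover content types.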

In trying to fix this problem on my system, I first tried just dropping all the tables relating to netbox_topology_views from Postgres, but that wasn't enough. With the help of @kkthxbye-code, we were able to identify that the problem was that, in addition to stale data still existing in the database (which I had previously manually removed), cached data related to this was still present in the extras_cachedvalue table.

We expected that rebuilding the search index using python netbox/manage.py reindex would fix this, but that didn't happen. As a workaround, I truncated the entire extras_cachedvalue table and then ran reindex to build a search index from scratch, and that seems to have got rid of the issue completely.

From my analysis, there are at least a few places where this could be addressed. My initial thoughts on how:

  1. At the point where the exception was thrown, maybe a check for ct.model_class() being None before referencing it would help steer the search code into some less crashy behaviour when this condition is encountered.
  2. The python netbox/manage.py reindex should clear out cached values for models that no longer exist.
  3. Maybe some kind of housekeeping task that checks for cached data with non-present models in the index? Maybe as a self-check on startup?
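
The ideas in the list above can be sketched roughly as follows. This is not NetBox code: `CachedRow`, `INSTALLED_MODELS`, `search` and `prune_stale` are hypothetical names, and the search cache is modelled as a plain list purely for illustration. `search` shows the defensive check from point 1, and `prune_stale` shows the kind of cleanup that points 2 and 3 ask for.

```python
from dataclasses import dataclass

# Hypothetical: names of models that are still installed.
INSTALLED_MODELS = {"device", "site"}


@dataclass(frozen=True)
class CachedRow:
    """Stand-in for one row of the search cache (not NetBox's real model)."""
    object_type: str
    object_id: int
    value: str


def model_exists(object_type):
    return object_type in INSTALLED_MODELS


def search(rows, query):
    # Point 1: ignore cached rows whose model can no longer be resolved
    # instead of crashing on them.
    return [r for r in rows if query in r.value and model_exists(r.object_type)]


def prune_stale(rows):
    # Points 2 and 3: drop cached rows for models that no longer exist and
    # report how many were removed.
    kept = [r for r in rows if model_exists(r.object_type)]
    return kept, len(rows) - len(kept)


rows = [
    CachedRow("device", 1, "sw01"),
    CachedRow("coordinate", 7, "sw01"),  # left behind by a removed plugin
]
hits = search(rows, "sw01")
kept, removed = prune_stale(rows)
print(hits, removed)
```

Doing both is deliberate in this sketch: the guard keeps search usable while stale rows exist, and the pruning step removes them so the guard is rarely exercised.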

@pv2b
Contributor Author

pv2b commented Aug 18, 2023

Oh by the way, I've saved a copy of the database that is corrupted in this state, so in case the proposed repro doesn't quite work out, that data will be available for more investigation. However I'm not comfortable sharing this SQL dump because it contains confidential information.

@pv2b
Contributor Author

pv2b commented Aug 18, 2023

For anyone else running into this: until there's a code fix, here's how you can completely rebuild the search index from scratch, which should clear the issue up. However, as with any time you're working in nbshell at this level, and especially when running commands posted by some random guy on the Internet, make sure you have a full backup of your system in case something goes off the rails.

  1. Issue the following commands in nbshell to completely clear the search index. (This will completely break search until a reindex is run, which is the next step.)

         from netbox.search.backends import search_backend
         search_backend.clear()
         exit()

  2. Issue the following command to rebuild the search index. This might take some time if you have a lot of data and/or a slow computer.

         python manage.py reindex

This has worked for me and for one other person who's had this issue.

@pv2b
Contributor Author

pv2b commented Aug 18, 2023

FYI: I've written up a fix for the missing "None" check (and I found a second one as well) but due to project rules I won't be able to open a real PR until this has been accepted, so I've opened it as a draft PR.
#13508

@kkthxbye-code kkthxbye-code added status: needs owner This issue is tentatively accepted pending a volunteer committed to its implementation severity: medium Results in substantial degraded or broken functionality for specific workflows labels Aug 21, 2023
@kkthxbye-code
Contributor

kkthxbye-code commented Aug 21, 2023

Replicates fine, thank you for the thorough report. My steps:

  • pip install netbox-topology-views==3.6.1
  • Add netbox-topology-views to PLUGINS and PLUGINS_CONFIG
  • Start netbox, go to Coordinates and just add one.
  • Search for the group name.
  • Remove plugin from PLUGINS
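
For reference, the configuration step would look roughly like this in configuration.py. The module name `netbox_topology_views` (with underscores) and the empty settings dict are assumptions on my part, so check the plugin's README for the exact values for your version.

```python
# configuration.py (fragment); values are assumptions, see the plugin README.
PLUGINS = ["netbox_topology_views"]

PLUGINS_CONFIG = {
    # Empty dict means "use the plugin's defaults" in this sketch.
    "netbox_topology_views": {},
}
```

The last step of the repro then amounts to deleting that entry from `PLUGINS` again and restarting NetBox.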

This issue is similar to #11335 - fixed by #11709 and #13117

I'm not sure your draft PR fixes the underlying issue. A quick test shows it doesn't fix it, but it does move the error to another method. Maybe @jeremystretch or @abhi1693 can have a look; I'm not really that familiar with the custom code we have for gfk lookups.

@pv2b
Contributor Author

pv2b commented Aug 21, 2023

> Replicates fine, thank you for the thorough report. My steps:
>
> * `pip install netbox-topology-views==3.6.1`
> * Add `netbox-topology-views` to `PLUGINS` and `PLUGINS_CONFIG`
> * Start netbox, go to `Coordinates` and just add one.
> * Search for the group name.
>
> This issue is similar to #11335 - fixed by #11709 and #13117
>
> I'm not sure your draft PR fixes the underlying issue and trying it out quick doesn't fix it, but does move the issue to another method. Maybe @jeremystretch or @abhi1693 can have a look, I'm not really that familiar with the custom code we have for gfk lookups.

It depends on what you mean by the underlying issue. In my opinion there are at least two underlying issues:

  1. There's old data left over in the search cache for content types that no longer exist.
  2. There's no error checking to handle the case where a content type no longer exists.

Cleaning up the search cache will address 1, and adding checks for "None" will address 2. It's possible your repro is slightly different from the one I ran into myself, because my fix "works for me". But I did a quick code search earlier and found a lot of cases of the pattern of just calling get_model() on a content type and expecting it to work, so it's possible that I missed a code path.

In my opinion it should be fixed on both ends.

@pv2b
Contributor Author

pv2b commented Aug 21, 2023

> Replicates fine, thank you for the thorough report. My steps:
>
> * `pip install netbox-topology-views==3.6.1`
> * Add `netbox-topology-views` to `PLUGINS` and `PLUGINS_CONFIG`
> * Start netbox, go to `Coordinates` and just add one.
> * Search for the group name.

Are you missing a repro step here, did you not need to uninstall the plugin?

@kkthxbye-code
Contributor

> Are you missing a repro step here, did you not need to uninstall the plugin?

Just missed the last step, added it now, thanks.

@bitcollector1

This fix worked for me but I've not had the time to restore the old DB and try to narrow down the culprit with DEBUG. That said it looks like you fine folks have a good grasp on this issue.

I just wanted to say thanks again to @pv2b for the fix as it was annoying me for months!

@abhi1693 abhi1693 self-assigned this Aug 24, 2023
@abhi1693 abhi1693 added status: accepted This issue has been accepted for implementation and removed status: needs owner This issue is tentatively accepted pending a volunteer committed to its implementation labels Aug 24, 2023
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Nov 23, 2023
4 participants