PACER documents can belong to multiple cases #765

Closed
mlissner opened this issue Nov 16, 2017 · 3 comments
Comments

@mlissner
Member

(Split off from #2185)

This is going to be a rough one to fix properly, but it's worth figuring out. The basic problem, as stated in the parent ticket, is:

• In CMECF, there is a many-to-one mapping between docket numbers and documents. A single document can belong to multiple docket numbers, as when an order is filed in two related cases.

I'm seeing this right now with pacer_doc_id 12707472047, which occurs as an attachment in both:

https://www.courtlistener.com/docket/4343877/in-re-state-street-bank-and-trust-co-fixed-income-funds-investment/?page=2#entry-105

And:

https://www.courtlistener.com/docket/4345781/yu-v-state-street-corp/#entry-58

Both involve State Street Bank. The issue as it's hitting me today is that I can't add the document because I have a unique constraint on pacer_doc_id, and sure enough, both attachments have the same value.

The solution here (as discussed in depth in the RECAP channel on Slack today) is to remove the unique constraint and just let the document exist in our system twice. We'll have to go through a fair bit of code to make sure this doesn't cause problems, but it's probably the right way forward.
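To make the collision concrete, here is a minimal sketch using Python's built-in sqlite3 module rather than the real Django models. The table name `recap_document`, its columns, and the helper `insert_docs` are assumptions for illustration only; the docket IDs and the pacer_doc_id are the ones from the two URLs above.

```python
import sqlite3


def insert_docs(unique: bool) -> int:
    """Insert the same pacer_doc_id under two dockets; return how many rows were stored."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical, simplified stand-in for the RECAPDocument table.
    constraint = "UNIQUE" if unique else ""
    conn.execute(
        "CREATE TABLE recap_document ("
        " id INTEGER PRIMARY KEY,"
        " docket_id INTEGER NOT NULL,"
        f" pacer_doc_id TEXT NOT NULL {constraint})"
    )
    stored = 0
    for docket_id in (4343877, 4345781):
        try:
            conn.execute(
                "INSERT INTO recap_document (docket_id, pacer_doc_id) VALUES (?, ?)",
                (docket_id, "12707472047"),
            )
            stored += 1
        except sqlite3.IntegrityError:
            # With the UNIQUE constraint, the second docket's copy is rejected.
            pass
    conn.close()
    return stored


with_constraint = insert_docs(unique=True)
without_constraint = insert_docs(unique=False)
```

With the constraint in place only the first docket's copy is stored; without it, the document simply exists once per docket, which is the behavior proposed here.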

Other solutions

The other ways forward are either:

  1. Adding an alias field that joins the RECAPDocument table to itself to handle this case. That could work, but it's kind of a mess. It's really not ideal.

  2. Remodeling the DB to pull the document data itself apart from any PACER metadata. This could work, but it adds a fourth level of joins to the model, it adds complexity, and it's a huge change.

@johnhawkinson
Contributor

BTW, freelawproject/recap#174 (comment) is an extreme example, with a single order belonging to 5 civil cases.

@mlissner
Member Author

mlissner commented Dec 14, 2017

Ok, so making some headway here. Things that need to happen for this to go off as a success:

  • I need to migrate the DB to remove the unique property of pacer_doc_id.
  • I need to populate any pacer_doc_id value that was formerly None with '' since we're no longer worried about unique collisions on the blank field and that'll conform with Django standards.
  • I need to search for anywhere that we're catching an IntegrityError, and figure out if it's still needed. (Note that these can also be thrown by the unique_together constraint on RECAPDocument.)
  • I need to go through anywhere that we do a RECAPDocument.objects.get() that relies on the uniqueness of pacer_doc_id to only get one result. These need to be rewritten as filters or otherwise narrowed so they still return only one result.
  • I need to go through the rest of the code where it mentions pacer_doc_id and see if it's still good code.

(I'll be editing and updating this comment as I identify more things to do.)
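Two of the items above, backfilling None with '' and replacing single-row lookups, can be sketched outside Django with sqlite3. Everything named here (`recap_document`, `backfill_blank_ids`, `find_documents`) is hypothetical and only illustrates the idea. In the Django code the analogous change would be swapping `.get()`, which raises MultipleObjectsReturned when more than one row matches, for `.filter()`, possibly narrowed by docket.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, simplified table: pacer_doc_id is nullable and no longer UNIQUE.
conn.execute(
    "CREATE TABLE recap_document ("
    " id INTEGER PRIMARY KEY,"
    " docket_id INTEGER NOT NULL,"
    " pacer_doc_id TEXT)"
)
conn.executemany(
    "INSERT INTO recap_document (docket_id, pacer_doc_id) VALUES (?, ?)",
    [
        (4343877, "12707472047"),  # same document filed in two dockets
        (4345781, "12707472047"),
        (4345781, None),           # a row that never had a pacer_doc_id
    ],
)


def backfill_blank_ids(conn):
    """Replace NULL pacer_doc_id values with '' and return the number of rows changed."""
    cur = conn.execute(
        "UPDATE recap_document SET pacer_doc_id = '' WHERE pacer_doc_id IS NULL"
    )
    return cur.rowcount


def find_documents(conn, pacer_doc_id, docket_id=None):
    """Return every (id, docket_id) match instead of assuming exactly one row."""
    sql = "SELECT id, docket_id FROM recap_document WHERE pacer_doc_id = ?"
    params = [pacer_doc_id]
    if docket_id is not None:
        # Narrowing by docket restores a single result where callers need one.
        sql += " AND docket_id = ?"
        params.append(docket_id)
    return conn.execute(sql + " ORDER BY id", params).fetchall()


updated = backfill_blank_ids(conn)
all_copies = find_documents(conn, "12707472047")
one_copy = find_documents(conn, "12707472047", docket_id=4345781)
```

Callers that previously assumed a single match can either loop over `all_copies` or pass the docket to get back down to one row.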

@mlissner
Member Author

I went through the failed documents that had this issue and reprocessed them all. Of 219, all but ten were processed successfully. This will have an even bigger effect on dockets, but I'm just going to let that take effect going forward, rather than reprocessing all the dockets we've already received (that would be a pretty big job for me and the server).
