Documents belong to multiple cases; multiple cases belong to one docket (the doppelganger bug) #2185
Comments
footnote: it's also possible to run "combined docket reports" for multiple cases. For instance, in Cadden, checking subcases Even more frightfully from the RECAP perspective, this combined report feature is not limited to subcases. You can enter arbitrary unrelated caseids separated by semicolons in the URL parameter string for Although probably nobody does this, so it's not a big worry. But the data model should accommodate it. And I think it screws up receipts (that is, they are not reliable indicators of the case to which the document belongs, if indeed the document belongs to a single case in the combined report). |
Looks like at least three issues here. The first issue here is that documents can belong to multiple cases. I've split that off into its own ticket: #765 |
This is a better description of the "doppelganger cases" issue described in freelawproject/recap#36 and freelawproject/recap#146 [Editor's note: both now closed as dups]. |
For the record, subcases need not have consecutive caseids. See US v. Murgio in SDNY: <request number="15cr769">
<case number="1:15-cr-769" id="449632" title="1:15-cr-00769-AJN USA v. Murgio et al" defendant="0" sortable="1:2015-cr-00769-AJN"/>
<case number="1:15-cr-769-1" id="449633" title="1:15-cr-00769-AJN-1 Anthony R. Murgio (closed 10/25/2017)" defendant="1" sortable="1:2015-cr-00769"/>
<case number="1:15-cr-769-2" id="450676" title="1:15-cr-00769-AJN-2 Yuri Lebedev (closed 11/01/2017)" defendant="2" sortable="1:2015-cr-00769"/>
<case number="1:15-cr-769-3" id="454366" title="1:15-cr-00769-AJN-3 Trevon Gross (closed 11/16/2017)" defendant="3" sortable="1:2015-cr-00769"/>
<case number="1:15-cr-769-4" id="456495" title="1:15-cr-00769-AJN-4 Michael J. Murgio (closed 01/30/2017)" defendant="4" sortable="1:2015-cr-00769"/>
<case number="1:15-cr-769-5" id="464041" title="1:15-cr-00769-AJN-5 Jose M Freundt" defendant="5" sortable="1:2015-cr-00769"/>
<case number="1:15-cr-769-6" id="467688" title="1:15-cr-00769-AJN-6 Ricardo Hill" defendant="6" sortable="1:2015-cr-00769"/>
</request> |
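The gap between ids is easy to see once the response is parsed. A minimal sketch, assuming the XML shape quoted above (abbreviated here to three cases); the helper name `group_caseids` and the suffix-stripping regex are illustrative assumptions, not anything PACER or CL provides:

```python
import re
import xml.etree.ElementTree as ET
from collections import defaultdict

# Abbreviated copy of the possible_case_numbers.pl response quoted above.
SAMPLE = """<request number="15cr769">
<case number="1:15-cr-769" id="449632" defendant="0"/>
<case number="1:15-cr-769-1" id="449633" defendant="1"/>
<case number="1:15-cr-769-2" id="450676" defendant="2"/>
</request>"""


def group_caseids(xml_text):
    """Map base docket number -> sorted list of (defendant, caseid)."""
    groups = defaultdict(list)
    for case in ET.fromstring(xml_text).iter("case"):
        # Strip a trailing "-N" defendant suffix: 1:15-cr-769-2 -> 1:15-cr-769
        base = re.sub(r"^(\d+:\d+-[a-z]+-\d+)-\d+$", r"\1", case.get("number"))
        groups[base].append((int(case.get("defendant")), int(case.get("id"))))
    return {base: sorted(cases) for base, cases in groups.items()}


groups = group_caseids(SAMPLE)
ids = [caseid for _, caseid in groups["1:15-cr-769"]]
# Differences between successive caseids; anything other than 1 is a gap.
gaps = [b - a for a, b in zip(ids, ids[1:])]
```

Here `gaps` comes out as `[1, 1043]`, so any approach that assumes subcase ids are adjacent would miss defendant 2 in this case.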
Grr, automated commit message thing. This is not fixed. |
I confess I'm still not sure how to proceed here. When we have multiple pacer case IDs, are those IDs just a different view into the same docket or are they actually different dockets altogether? In some form, we need to link all these dockets together under one umbrella, like PACER does, but I don't understand what PACER is accomplishing with these well enough to understand how to do it in our UI. |
@mlissner I see the dilemma and yes, I think this is what I was running into with freelawproject/recap#267 ... I've been thinking about it and, ultimately, I think @johnhawkinson hit on probably the best solution -- CL needs an additional layer that knows about the sub-dockets as that seems to be at the core. As noted, it seems that documents can belong to multiple sub-dockets simultaneously but those sub-dockets might also have their own unique items. Right now, as described, CL treats these individual sub-dockets as separate cases in the database (e.g. they get their own CL docket ID number because the pacer docket ID is different). This results in the behavior we're seeing here. Instead, if there's a "related dockets" table, this data could be broken out and then separated. Consider 2 situations:
Simple Docket
Use existing systems, no sub/related cases. Things function as normal.
Complex Docket
Sub-Dockets or related cases - Need to maintain information that allows for correlation of related dockets.
High Level Overview
• New DB table that indicates the relationships between the dockets and, if needed, storage of any "master" information.
• Extend existing docket table to include a column indicating the "master" docket entry. (If null or 0, no master docket - or something similar).
• Keep maintaining separate CL entries for each sub-docket (because that matches how PACER works) but if there are multiple related entries, display a "master" page indicating all related dockets.
Correlating the cases could be accomplished in multiple ways - some more accurate/reliable than others:
UI Ideas
I'm not 100% sure how best to implement searching -- but for related dockets, display 1 entry in the search results leading to the master docket page that then shows the sub-related dockets. (Similar to how the PACER query looks that @johnhawkinson posted). On each sub-docket page, at a minimum, display a link back to the parent master docket - preferably a sidebar listing of related dockets.
Closing Thoughts / Caveats
I have not spent nearly enough time to fully understand the existing database structure and the inner workings (gotta pay the bills, we all know how that one goes) - there might be something I'm missing here but it seems that the only way to fix this is to give CL/RECAP the ability to know that dockets could be related to each other. I think that probably adding an additional DB layer to store that relationship information would enable this to be resolved once and for all.
Consider this code: courtlistener/cl/recap/tasks.py Lines 1355 to 1360 in 19b215c
If I'm reading this correctly, (a) the current method for dealing with duplicate dockets is to update the oldest and (b) the system differentiates between CL dockets on the basis of the PACER case ID/docket number... which can be different for related cases. |
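To make the "master" column idea concrete, here is a rough sketch in plain Python rather than Django. Everything in it is hypothetical: the `Docket` fields, the `parent_id` name, and the ids are invented for illustration and do not reflect CL's actual schema.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Docket:
    cl_id: int                       # CL's own docket id (invented values below)
    pacer_case_id: int               # PACER internal caseid (invented values below)
    docket_number: str
    parent_id: Optional[int] = None  # None means this docket has no parent


def family(dockets: Dict[int, Docket], cl_id: int) -> List[Docket]:
    """Return the parent docket followed by its sub-dockets, for any member."""
    docket = dockets[cl_id]
    root = docket.parent_id if docket.parent_id is not None else docket.cl_id
    children = sorted(
        (d for d in dockets.values() if d.parent_id == root),
        key=lambda d: d.pacer_case_id,
    )
    return [dockets[root]] + children


dockets = {
    1: Docket(1, 100, "1:14-cr-10363"),
    2: Docket(2, 101, "1:14-cr-10363", parent_id=1),
    3: Docket(3, 102, "1:14-cr-10363", parent_id=1),
}
sidebar = [d.cl_id for d in family(dockets, 3)]
```

Calling `family` from any sub-docket yields the same list, which is what a sidebar of sibling dockets or a single search result per family would need.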
I thought that was undisputed
Well, normally speaking all the case numbers in this situation are consecutive, so that's a huge win.
Well. Please don't use the term "related" for multiple subdockets of the same master criminal docket. We use the term "related" to refer to a different kind of relationship between cases, like where I file a civil action a year after you did while yours is still pending and they address common issues of law but joinder may not be appropriate, so I mark my case as related to yours and they are typically assigned to the same judge for reasons of judicial economy (varies district-to-district). Or similarly in an MDL context. This usage is important because: Court staff have the ability to file a CMECF document in multiple cases, and those cases will all refer to the same docket number. The cases need not have a subcase relationship. It is typically the case that this happens in related cases, though, using the "related" meaning that I have explained above.
I'm not entirely sure what you mean by this.
See above. They are likely to be related (but possibly not; say a judge gets sick and the chief judge dockets a stay/postponement order in all of his active cases with calendar dates in the next week), but not necessarily with a subdocket relationship. |
The data model at https://www.courtlistener.com/api/rest-info/ can have this change to begin with: the RECAPDocument table cannot have "docket_entry". More than one case (and therefore more than one docket) can refer to the same document. This is common not only in criminal cases, but in any case. Therefore, the DocketEntry table must keep the reference to the document instead. |
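A small sketch of that inversion, with the reference held on the entry side so one document can appear in several dockets. The class and field names are hypothetical stand-ins, not the real RECAPDocument or DocketEntry models:

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Set


@dataclass
class DocketEntry:
    docket_id: int
    entry_number: int
    # The entry points at documents, not the other way around.
    document_ids: List[str] = field(default_factory=list)


def dockets_for_document(entries: Iterable[DocketEntry], doc_id: str) -> Set[int]:
    """Every docket whose entries reference the given document."""
    return {e.docket_id for e in entries if doc_id in e.document_ids}


# Invented data: one order filed in two cases under different entry numbers.
entries = [
    DocketEntry(docket_id=1, entry_number=12, document_ids=["doc-A"]),
    DocketEntry(docket_id=2, entry_number=40, document_ids=["doc-A"]),
    DocketEntry(docket_id=2, entry_number=41, document_ids=["doc-B"]),
]
shared = dockets_for_document(entries, "doc-A")
```

In a relational schema this would amount to a many-to-many link between entries and documents; the list field above is only the simplest way to show the direction of the reference.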
I'm not sure if there's a separate issue on this but: The problem of (what appears to be) a single docket in PACER turning into multiple dockets/cases on RECAP is still a major issue. See: https://www.courtlistener.com/?type=r&q=&type=r&order_by=score%20desc&docket_number=2%3A18-cr-00422&court=azd I need to do more digging but it appears that all 8 of these RECAP dockets will lead to the same PACER docket report (when using the "View on PACER" blue header button). More interestingly/concerningly, documents are being uploaded and associated, but not always with the same RECAP docket. Further, the RECAP extension appears to be able to find the document availability in RECAP without an issue... (when viewing the "do you want to buy this document" page in PACER) I remember there being a discussion about how a (supposedly single) PACER docket could somehow turn into multiple RECAP dockets. Regardless, this is becoming a bigger and bigger issue. I need to look a bit more at the 8 different RECAP dockets in the search link above but it does appear that there are documents that are associated with only one of the RECAP dockets. (In other words, there are unique documents in each RECAP docket.) From a data accuracy/integrity standpoint, this is kinda messy. Perhaps solving the creation of multiple dockets in RECAP is unnecessary - perhaps the solution is to make the links work in every RECAP docket? (assuming there's something in the database that would associate the multiple RECAP dockets) |
I believe this is the proper issue, @danieldjewell. The case you cite, USA v. Lacey, is expected to have 8 RECAP dockets, since there are 7 criminal subcases plus the master case:
I do think it's a correct observation that the CourtListener docket report should stop searching by case number and document ID and merely search by document ID, and that would remove some of the pain, at least where the docket report had been run. But this problem calls out for more serious attention than it has gotten, since basically "RECAP is unusable for criminal cases" is where it shakes out, and that just sucks. |
I raised the issue #2181, cited above. I’m wondering about how this problem can be fixed. Would it be possible to merge identical dockets? Or for example to make a request to do so? |
FYI this issue was referenced on the Law SE site: Why are there two case numbers for United States v. Trump? |
+1 For United States v. Assange there also seem to be two similar entries. https://www.courtlistener.com/docket/14488925/united-states-v-assange/ https://www.courtlistener.com/docket/68881226/united-states-v-assange/ |
I'm not going to break any new ground here @mlissner with this comment but criminal is a mess and I think may have a lot to do with why criminal is not as hot a topic as civil. Because we do not properly link and process criminal cases we are creating numerous problems for our users and no doubt confusion. Pacer creates a parent docket and child dockets for each criminal case. Every child docket goes onto the parent docket. We need to update our model to mimic this pattern and point our users to the parent docket, while also allowing someone to find the child docket if they so choose. To do this we need to
We may have some difficulty always identifying the parent pacer case id in single defendant cases - because the numbers are not sequential (but normally are). And it's going to be nerve-racking decoupling docket entries. But right now our users could subscribe to a criminal case - that case may plead out and miss the remaining 15 years of updates. They could also buy a document and not realize it was added to a child docket or the parent docket. They could also buy the same docket multiple times across dockets. Yikes. Once we do this though we will be able to look at one docket - reduce our search queries in criminal cases a lot and reduce the number of documents stored because of duplicates. |
Some investigation about how to fix this...
So I think there are a few big pieces of this:
So, summarizing, what I think we need to do is:
That's a solid start and leaves the documents part. I'm not sure what to do there yet. |
|
Thanks. Good point about IDB. Bummer about that. Memorializing the conversation from our 1-on-1:
That should get us pretty far towards a solution. Other goals:
I think that'll get us close. |
Overview
This is a long-standing issue but lately it comes up more and more for me.
• In CMECF, there is a many-to-one mapping between docket numbers and documents. A single document can belong to multiple docket numbers, as when an order is filed in two related cases.
• In CMECF, there is a many-to-one mapping between docket numbers and internal caseids (de_caseid). This is extremely common in criminal cases, where the numbers are generally contiguous. This is so when there are multiple defendants who each get a sub-case, but it is also so when there is a single defendant: there is a main case and a single subcase.
This throws a wrench in RECAP because different people will get to the same docket number via different caseid paths. Depending on what one searches for in PACER's iquery.pl and whether you choose All Defendants or single defendant or a combination thereof, you may get different (or multiple) caseids.
For instance, take 1:14-cr-10363-RGS USA v. Cadden et al in ecf.mad:
Or in XML form, query https://ecf.mad.uscourts.gov/cgi-bin/possible_case_numbers.pl?1410363 (free) to get:
All caseids from 166116-166130 refer to the same docket number. Many (most?) documents in the case belong to multiple (all?) subcases.
But RECAP and CL treat them like different dockets with identical docket numbers, and don't show the subcase suffix number either.
For instance the main case is
https://www.courtlistener.com/docket/4275782/united-states-v-cadden/
which has docket entries through DE514 (Jan. 2016), and was last updated 2 months ago.
But the -1 case is https://www.courtlistener.com/docket/5135835/united-states-v-cadden/ which has entries through DE1260 (Oct. 24, 2017), and was last updated 12 days ago.
But the -2 case is https://www.courtlistener.com/docket/6145187/united-states-v-cadden/ which has entries through DE1281, but was also updated 12 days ago.
Although the -2 case is more recent, it doesn't actually have the PDF for DE1260.
So this is like a huge mess.
Single-defendant criminal cases, too
The problem even occurs for single defendant criminal cases, although the path to pain is less obvious.
Let's take our friend George Papadopoulos, in ecf.dcd. He's the sole defendant and it looks like there's only one case:
https://ecf.dcd.uscourts.gov/cgi-bin/possible_case_numbers.pl?17182
So it looks like it's just 189898.
But, surprise:
https://ecf.dcd.uscourts.gov/cgi-bin/DktRpt.pl?189897 1:17-cr-00182-RDM USA v. PAPADOPOULOS
https://ecf.dcd.uscourts.gov/cgi-bin/DktRpt.pl?189898 1:17-cr-00182-RDM-1 - PAPADOPOULOS, GEORGE
The '898 is easily found in the PACER UI, but unfortunately we can't ignore the '897, because it appears in other places. For instance, the email NEF sent to parties and "interested party" ECF users yesterday:
Of course, this problem is more likely to affect people who use NEFs, which means lawyers and journalists, and not too many members of the general public. But those are important RECAP constituencies.
Upshot
CL needs to track the docket number and caseid for each document independently, recognizing there can be more than one of each. For sanity's sake, CL docket pages should make the caseid visible somewhere (IA docket pages had it in the URL), even if it's small and at the bottom. Makes debugging your brain much simpler.
CL should acknowledge the concept of subdockets. I'm not sure all of what this entails. This is a nice-to-have, but not critical. If all the searches for 1:14-cr-10363 returned an amalgamation of the main docket and 14 subdockets, that would not be so bad.
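As a rough illustration of what such an amalgamation could mean, the sketch below unions entries from several subdockets by entry number and records whether any copy has the PDF. The input shape and the caseids (1, 2, 3) are invented; only the entry numbers echo the Cadden example above.

```python
from collections import defaultdict


def amalgamate(subdockets):
    """Merge {caseid: {entry_number: has_pdf}} into one view keyed by entry number."""
    merged = defaultdict(lambda: {"caseids": [], "has_pdf": False})
    for caseid, entries in sorted(subdockets.items()):
        for number, has_pdf in entries.items():
            merged[number]["caseids"].append(caseid)
            # A PDF held under any caseid counts for the combined view.
            merged[number]["has_pdf"] = merged[number]["has_pdf"] or has_pdf
    return {number: merged[number] for number in sorted(merged)}


subdockets = {
    1: {514: True},                 # main case, stale
    2: {514: True, 1260: True},     # -1 case has the PDF for DE1260
    3: {1260: False, 1281: True},   # -2 case is newer but lacks that PDF
}
combined = amalgamate(subdockets)
```

In the combined view DE1260 shows as available because one subdocket has it, even though the most recently updated subdocket does not.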
Maybe
Perhaps the RECAP extension should query docket number against possible_case_numbers.pl, and report to the server associated caseids. I think this is a bad idea, because it means the extension is no longer passive, it can be identified (and blocked) by the courts, and it is using a nonpublic API. Furthermore, it would not return the second caseid in the case of a single-defendant case.
Perhaps the RECAP extension should query adjacent caseids against DktRpt.pl until it runs into a different docket number on either side. Again, for the same reasons as above, I think that's bad. Also it could be many queries. I ran into a 60-defendant case last night.
Perhaps the CL server should do these queries, maybe on a one-time basis.
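For what it's worth, the adjacent-caseid walk is simple to sketch. The version below makes no network requests: `lookup` is a hypothetical callable standing in for whatever would resolve a caseid to its docket number (suffix stripped), and the neighbouring docket numbers in the table are invented.

```python
from typing import Callable, List, Optional


def adjacent_caseids(
    start: int,
    lookup: Callable[[int], Optional[str]],
    limit: int = 200,
) -> List[int]:
    """Walk outward from `start` while neighbours share its docket number."""
    target = lookup(start)
    if target is None:
        return []
    found = [start]
    for step in (-1, 1):
        caseid = start + step
        # `limit` caps the number of queries in each direction.
        while abs(caseid - start) <= limit and lookup(caseid) == target:
            found.append(caseid)
            caseid += step
    return sorted(found)


# 189897 and 189898 are the Papadopoulos caseids above; the rest is made up.
table = {
    189896: "1:17-cr-00181",
    189897: "1:17-cr-00182",
    189898: "1:17-cr-00182",
    189899: "1:17-cr-00183",
}
siblings = adjacent_caseids(189898, table.get)
```

The walk stops at the first mismatch, so it would miss subcases whose ids are not consecutive, and the `limit` cap is only a guess at a sensible bound for very large cases.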
Mitigation
It should be straightforward to identify, in the CL database, where there are multiple caseids for a given docket number, and then take some action to combine them. This is separate from, but related to, what the server and extension should do about this going forward.
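The identification step might look something like the query below, shown against an in-memory SQLite table so it can be run as-is. The table and column names are assumptions for the sake of the example and are not CL's real schema; the three Cadden rows use the CL docket ids linked above with invented caseids, and the fourth row is invented entirely.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE docket ("
    "id INTEGER PRIMARY KEY, court TEXT, docket_number TEXT, pacer_case_id INTEGER)"
)
conn.executemany(
    "INSERT INTO docket VALUES (?, ?, ?, ?)",
    [
        (4275782, "mad", "1:14-cr-10363", 1),  # caseids here are placeholders
        (5135835, "mad", "1:14-cr-10363", 2),
        (6145187, "mad", "1:14-cr-10363", 3),
        (9999999, "mad", "1:16-cv-00001", 4),  # unrelated single-caseid docket
    ],
)

# Docket numbers within a court that map to more than one PACER caseid.
doppelgangers = conn.execute(
    """
    SELECT court, docket_number, COUNT(DISTINCT pacer_case_id)
    FROM docket
    GROUP BY court, docket_number
    HAVING COUNT(DISTINCT pacer_case_id) > 1
    """
).fetchall()
```

Grouping by court as well as docket number matters because the same docket number can exist in different districts.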
Discuss!