├── dedupe_types.json # Document type deduplication mappings (generated)
├── analyses.json # AI document analyses (generated)
├── src/ # 11ty source files for website
├── .eleventy.js # Static site generator configuration
@@ -133,6 +135,37 @@ This will:
}
```
**Deduplicate Document Types:**

The LLM may also extract document types with inconsistent formatting (e.g., "deposition", "Deposition", "DEPOSITION TRANSCRIPT"). Run the type deduplication script:
```bash
python deduplicate_types.py
```
This will:
- Collect all document types from `./results/`
- Use AI to merge similar types into canonical forms
- Create a `dedupe_types.json` mapping file
- The website build will automatically use this mapping

**Example dedupe_types.json:**
```json
{
  "stats": {
    "original_types": 45,
    "canonical_types": 12,
    "reduction_percentage": 73.3
  },
  "mappings": {
    "deposition": "Deposition",
    "DEPOSITION": "Deposition",
    "deposition transcript": "Deposition",
    "court filing": "Court Filing"
  }
}
```
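As a rough illustration of how a build step might consume this file, the sketch below applies the `mappings` table to raw type strings and recomputes the `stats` block from its two counts. The helper names `canonicalize` and `reduction_stats`, and the choice to pass unmapped types through unchanged, are assumptions made for this example; they are not taken from `deduplicate_types.py` or the website build.

```python
import json

# Inline copy of the example file above; a real run would read dedupe_types.json from disk.
EXAMPLE = json.loads("""
{
  "stats": {"original_types": 45, "canonical_types": 12, "reduction_percentage": 73.3},
  "mappings": {
    "deposition": "Deposition",
    "DEPOSITION": "Deposition",
    "deposition transcript": "Deposition",
    "court filing": "Court Filing"
  }
}
""")


def canonicalize(doc_type, mappings):
    """Return the canonical form of doc_type, or doc_type unchanged if unmapped (assumed fallback)."""
    return mappings.get(doc_type, doc_type)


def reduction_stats(original_count, canonical_count):
    """Recompute the stats block from the two counts (hypothetical helper)."""
    pct = round(100 * (original_count - canonical_count) / original_count, 1)
    return {
        "original_types": original_count,
        "canonical_types": canonical_count,
        "reduction_percentage": pct,
    }


if __name__ == "__main__":
    mappings = EXAMPLE["mappings"]
    for raw in ["deposition", "DEPOSITION", "court filing", "Affidavit"]:
        print(f"{raw!r} -> {canonicalize(raw, mappings)!r}")
    print(reduction_stats(45, 12))
```

The lookup here is case-sensitive, which matches the example file: it lists "deposition" and "DEPOSITION" as separate keys rather than normalizing case first.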
### 5. Analyze Documents (Optional but Recommended)
Generate AI summaries and insights for each document:
@@ -209,32 +242,18 @@ This is an open archive project. Contributions welcome:
- Add additional document sources
- Improve entity extraction

-## Support This Project
-
-If you find this archive useful, consider supporting its maintenance and hosting:
The site will be available at: `https://epstein-docs.github.io/`
## Future: Relationship Graphs
@@ -278,3 +297,17 @@ The deduplication step is essential for accurate relationship mapping - without
## Disclaimer
This is an independent archival project. Documents are sourced from public releases. The maintainers make no representations about completeness or accuracy of the archive.
## License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

The code in this repository is open source and free to use. The documents themselves are public records.