Feedback from a procurement-trace investigation using tango-python

This started from Renée DiResta's Lawfare piece — ["Fewer Bots, More
Ads: The Pentagon's Evolving Online Influence Campaigns"](https://www.lawfaremedia.org/article/fewer-bots--more-ads--the-pentagon-s-evolving-online-influence-campaigns)
— which identifies GDIT as a verified advertiser running a network of
covert-attribution websites (the "gc_" prefix sites) but doesn't
identify the federal contract the work is running under. I wanted to
see what additional context the public procurement record could add.

Roughly 150 API calls across the SWMS IDV drilldown, OTA sweep,
subaward chains, and IT Dashboard sweeps, on `tango-python` v0.5.0,
medium tier.

## What I was trying to find

The Lawfare piece names a prime contractor (GDIT) and a class of
activity (covert ad-buys for a network of "gc_" sites targeting
foreign-language audiences in CENTCOM / SOUTHCOM / AFRICOM / INDOPACOM
AORs) but does not name the specific federal contract paying for the
work. So the question was: **given a known prime and a known class of
work but no contract name, can the procurement record narrow down
the vehicle?** That turned into six concrete data needs.

For each need I tried USASpending and SAM.gov first, then Tango. The
specific things Tango returned that the others didn't:

### 1. Find a named IDIQ family that's not in USASpending's keyword index

The trace's original §4 needed to identify what the GBPS → SWMS
"bridge contracts" of 2016-17 were bridging *to*. SWMS is a real
SOCOM IDIQ family, but it's invisible to USASpending and SAM.gov
keyword search:

- USASpending `spending_by_award` `keywords=["SWMS"]` → **0 results**
- SAM.gov phrase search for "SWMS" → 0 results
- Tango `list_idvs(search="SWMS")` → **25 results**, including the
  three Group A IDIQs, the sixteen Group B IDIQs, and a Group C
  IDV (`H9222217D0004`, St. Michael's, $150M ceiling).

That's the headline source-coverage gap.

### 2. Parent-IDV → task-order traversal as one query pattern

- USASpending has the IDV detail page (e.g.
  `/award/CONT_IDV_H9222216D0040_9700/`) that lists child orders in
  a paginated UI, but `spending_by_award` doesn't accept an
  `idv_piid` filter — you can't grab every TO under an IDV via API
  in one call.
- SAM.gov doesn't have this at all (solicitation index, not awards).
- Tango: `list_idvs(piid=...)` → `list_idv_awards(key=..., limit=100,
  cursor=...)`. Walked 19 of the 20 SWMS IDVs and pulled 123 child
  task orders in one script, ~40 API calls. (The 20th was the Group C
  St. Michael's IDV, which I missed because I didn't have its PIID
  at the time. Tango's `list_idvs(search="SWMS")` from need #1 above
  would have caught it; lesson learned for me.)
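
A minimal sketch of that two-step traversal. The method names and the `piid`, `key`, `limit`, and `cursor` parameters come from the calls quoted above; the response attributes (`results`, `cursor`) and the `key` attribute on each IDV record are assumptions about the response shape, not confirmed SDK behavior, so adjust them to whatever your SDK version returns.

```python
from typing import Any, Iterator


def iter_idv_task_orders(client: Any, piid: str, page_size: int = 100) -> Iterator[Any]:
    """Yield every child award under the IDV(s) matching `piid`.

    Assumed (not confirmed): each list call returns an object with a
    `results` list and a `cursor` attribute that is None on the last
    page, and each IDV record exposes a `key` attribute.
    """
    idv_page = client.list_idvs(piid=piid)
    for idv in idv_page.results:
        cursor = None
        while True:
            page = client.list_idv_awards(key=idv.key, limit=page_size, cursor=cursor)
            yield from page.results
            cursor = page.cursor
            if not cursor:
                break  # no further pages for this IDV
```

Looping this over a list of PIIDs gives the "one script" pattern described above; the generator keeps memory flat if a family has many task orders.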

### 3. Subaward chains per prime contract, retrievable by PIID

- USASpending has `spending_by_subaward`, but I didn't do a
  systematic per-PIID cross-check between USASpending and Tango on
  every GDIT prime in this trace, so I can't precisely characterize
  the relative completeness here. Anecdotally, USASpending's
  subaward feed is sparse for many older / smaller contracts; both
  sources show zero subs on the same GBPS-era contracts.
- SAM.gov: no subaward data.
- Tango: `list_subawards(award_key=...)` returned a usable vendor
  stack on the two flagship GDIT primes — 61 subaward rows on WSP
  (`47QFCA19F0035`) and 43 on TSS (`47QFCA24F0007`) — which is what
  let me characterize each prime by capability composition.

### 4. OTA / OTIDV index, with keyword search

- USASpending: `spending_by_award` returns zero matches for
  `W15QKN1791011`, a $48M federal OTA whose description explicitly
  says "support the US Government's shaping and influence
  efforts." It's there in the underlying federal record but not
  indexed in the contract-keyword endpoint.
- SAM.gov: solicitation-side only, doesn't help for awarded OTAs.
- Tango: `list_otas(search="influence")` surfaces it (along with
  ~50 other relevant OTAs / OTIDVs).

### 5. IT Dashboard cross-reference

- GSA does have a public "IT Collect" API behind itdashboard.gov
  ([docs](https://open.gsa.gov/api/itcollect/),
  [schema](https://gsa.github.io/ITDB-schema/)), API-key gated.
  I didn't try it directly — for this investigation Tango's
  `list_itdashboard_investments(search=...)` made it a one-line
  lookup with the same shape syntax as the rest of the SDK.
  Confirmed `TRWI`, `AWIP`, `CCMD`, `WebOps`, `PSYOP` all return
  zero DoD title matches — informative negative.

### 6. Cross-source comparison

- This isn't a Tango feature per se, but a workflow: for each
  candidate finding, cross-check the same PIID on USASpending.
  Tango's stable response shape made the comparison clean —
  `key`, `piid`, `recipient.uei` always present and consistent,
  so the diff "Tango sees this, USASpending doesn't" was easy to
  detect programmatically.
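
The diff step can be sketched as a plain set comparison. This assumes the Tango records have already been converted to dicts with a `piid` key, and that the USASpending PIIDs were collected separately by whatever lookup you use; the function name is illustrative.

```python
from typing import Iterable, Mapping


def piids_missing_from(
    tango_records: Iterable[Mapping[str, object]],
    usaspending_piids: Iterable[str],
) -> list[str]:
    """Return PIIDs present in the Tango records but absent from USASpending."""
    # Normalize case and whitespace so formatting differences don't create false diffs.
    seen_elsewhere = {p.strip().upper() for p in usaspending_piids}
    missing: list[str] = []
    for record in tango_records:
        piid = str(record["piid"]).strip().upper()
        if piid not in seen_elsewhere and piid not in missing:
            missing.append(piid)
    return missing
```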

Repo layout (anonymized for sharing):

```
scripts/
  01_swms_drilldown.py    # need #2: list_idvs(piid=) → list_idv_awards(key=)
  02_socom_otas.py        # need #4: list_otas / list_otidvs sweeps
  03_subawards.py         # need #3: list_subawards per award_key
  04_itdashboard.py       # need #5: list_itdashboard_investments keyword sweep
data/                     # raw JSON output, committed for reproducibility
```

## What I'd want preserved in future versions

A few things that worked and would be worth not breaking in future
SDK versions:

- `list_idvs(piid=...)` → `list_idv_awards(key=...)` traversal — the
  cursor pagination is consistent across both endpoints and the
  shape syntax carries through.
- Nested-field shape syntax (`recipient(display_name,uei)`,
  `awarding_office(*)`) on contract / IDV endpoints.
- `TangoRateLimitError` as a separate exception class — made the
  retry helper trivial. (`TangoNotFoundError` and `TangoAPIError`
  are also useful.)
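
For reference, a retry helper along these lines is roughly what the dedicated exception class enables. This is a generic sketch, not the helper from the investigation: the exception type is passed in rather than imported, and the backoff schedule and parameter names are my own choices.

```python
import time
from typing import Callable, Type, TypeVar

T = TypeVar("T")


def call_with_retry(
    call: Callable[[], T],
    retry_on: Type[BaseException],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `call`, retrying with exponential backoff when `retry_on` is raised."""
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: surface the original error
            sleep(base_delay * 2 ** (attempt - 1))  # 1s, 2s, 4s, ...
    raise ValueError("max_attempts must be at least 1")
```

In use, `call` would be something like `lambda: client.list_otas(search="influence")` with `TangoRateLimitError` passed as `retry_on`. Any other exception propagates immediately, which is the point of having a separate class.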

## Friction we hit (ranked by impact)

### 1. `ContractOrIDVCompetition` is referenced as a nested model but isn't in `EXPLICIT_SCHEMAS`

**Repro:**

```python
shape = "key,piid,competition(extent_competed,number_of_offers_received)"
client.list_idv_awards(key=KEY, shape=shape)
```

**Expected:** request succeeds with the named subfields populated.

**Actual:**

```
tango.exceptions.ShapeValidationError: Field 'extent_competed' does not exist in
ContractOrIDVCompetition. No valid fields found in model schema.
```

**Root cause:** in `tango/shapes/explicit_schemas.py`, both
`CONTRACT_SCHEMA["competition"]` and `IDV_SCHEMA["competition"]` set
`nested_model="ContractOrIDVCompetition"`, but `EXPLICIT_SCHEMAS` only
registers `"Competition": COMPETITION_SCHEMA`. The
`ContractOrIDVCompetition` dataclass exists in `models.py` with the
right fields (its docstring even says "alias for Competition"), but
the explicit-schema parser doesn't know about it.

**Workaround:** use `competition(*)` — returns the same data without
field-level selection.

**Possible fix:** either (a) add `"ContractOrIDVCompetition": COMPETITION_SCHEMA`
to `EXPLICIT_SCHEMAS`, since the two are described as aliases, or (b)
point the `nested_model` references at `"Competition"` to match the
registered key.
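
A registry consistency check would catch this class of bug in CI. The sketch below uses simplified stand-in dictionaries, not the SDK's actual schema objects (I haven't mirrored their real structure), so treat it as an illustration of the idea only.

```python
def unregistered_nested_models(schemas: dict[str, dict[str, dict]]) -> set[str]:
    """Return nested-model names that are referenced but not registered."""
    referenced = {
        field["nested_model"]
        for schema in schemas.values()
        for field in schema.values()
        if field.get("nested_model")
    }
    return referenced - set(schemas)


# Simplified stand-in for the registry described above (hypothetical layout).
TOY_SCHEMAS = {
    "Contract": {"competition": {"nested_model": "ContractOrIDVCompetition"}},
    "IDV": {"competition": {"nested_model": "ContractOrIDVCompetition"}},
    "Competition": {"extent_competed": {}, "number_of_offers_received": {}},
}

print(unregistered_nested_models(TOY_SCHEMAS))
```

Applying fix (a) in the toy registry, i.e. registering the alias under the second name, makes the check come back empty.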

### 2. Server-side shape-validation errors don't name the bad field

When the client-side parser catches a bad field (issue #1), the error
is great — it names the field, names the model, even suggests
alternatives. When the *server-side* validator catches one (e.g. a
field that doesn't exist on the resource at all), the error is just:

```
TangoValidationError: Invalid request parameters: Invalid shape
```

No field name. We had to bisect the shape to find which field was
rejected. Would be great if the server's error matched the client
parser's quality.
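
Until then, the bisection can be automated. A sketch, assuming a shape is rejected exactly when it contains at least one bad top-level field; `is_accepted` is a caller-supplied probe (hypothetical here) that issues a small request with the candidate shape and returns `False` when `TangoValidationError` is raised.

```python
from typing import Callable


def split_top_level(shape: str) -> list[str]:
    """Split a shape string on commas that are not inside parentheses."""
    fields: list[str] = []
    current: list[str] = []
    depth = 0
    for ch in shape:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == "," and depth == 0:
            fields.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    if current:
        fields.append("".join(current).strip())
    return [f for f in fields if f]


def find_rejected_fields(shape: str, is_accepted: Callable[[str], bool]) -> list[str]:
    """Return the top-level fields that cause `is_accepted` to fail.

    Halves the field list recursively, so one bad field among n costs
    on the order of log2(n) probe requests instead of n.
    """
    def search(fields: list[str]) -> list[str]:
        if not fields or is_accepted(",".join(fields)):
            return []
        if len(fields) == 1:
            return fields
        mid = len(fields) // 2
        return search(fields[:mid]) + search(fields[mid:])

    return search(split_top_level(shape))
```

This only localizes top-level fields; a bad subfield inside `recipient(...)` is reported as the whole `recipient(...)` entry and needs a second pass on its contents.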

### 3. OTA / OTIDV response shapes are much thinner than Contract / IDV

`OTA_SCHEMA` and `OTIDV_SCHEMA` expose only `key, piid, award_date,
description, total_contract_value, obligated, recipient`. No
`awarding_office`, no `place_of_performance`, no `psc_code` /
`naics_code`, no `competition`.

Meanwhile `list_otas()` *accepts* `awarding_agency`, `funding_agency`,
`psc`, `naics`, `pop_*`, `awarding_office` etc. as filter parameters —
so the API clearly has those fields. They're just not in the response
shape.

This made triaging OTA keyword-search results awkward — we wanted to
filter the 50 keyword-matched OTAs to "awarded by H92… (SOCOM)" but
had to fall back to PIID-prefix matching because there was no
`awarding_office` field on the record.

Bringing OTA / OTIDV response shapes up to parity with Contract / IDV
(`awarding_office`, `place_of_performance`, `psc_code`, `naics_code`,
optional `competition`) would close this gap.
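
For anyone hitting the same gap, the PIID-prefix fallback mentioned in this section looks roughly like this. It assumes records have been converted to dicts with a `piid` key; the function name is illustrative.

```python
from typing import Iterable, Mapping


def filter_by_piid_prefix(
    records: Iterable[Mapping[str, object]],
    prefixes: tuple[str, ...],
) -> list[Mapping[str, object]]:
    """Keep records whose PIID starts with any of the given prefixes."""
    normalized = tuple(p.upper() for p in prefixes)
    return [
        r for r in records
        # Treat a missing or null PIID as an empty string so it never matches.
        if str(r.get("piid") or "").upper().startswith(normalized)
    ]
```

Calling it with `("H92",)` keeps only the SOCOM-prefixed records. It is a proxy, not a substitute for a real `awarding_office` field.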

### 4. `list_subawards` shape rejects `amount` and `award_date`

Per the comment in `tango/models.py:721`:

```python
# Note: API does not accept "id" or "amount" in shape (unknown_field).
# Use only accepted fields.
```

This is a real friction point for any quantitative subaward analysis:
to attach a dollar value or date to each subaward row you have to
follow the parent `award_key` back to `list_contracts()`. Two API
calls when one should do.

If `amount` and `award_date` are columns on the underlying subaward
table, exposing them on the shape would dramatically reduce
round-trips. (And if they aren't — i.e. subaward dollars aren't
tracked in Tango — that's worth saying in the SDK docstring so users
don't waste time trying.)

### 5. `list_subawards` pagination silently caps at ~5000 rows

`list_subawards` uses page/limit pagination (every other list endpoint
we touched uses cursor pagination) and hits a ceiling around 50 pages
× 100 rows = 5,000 rows per query. On a large prime UEI like Booz
Allen's `JCBMLGPE6Z71`, that truncates before the full list is
delivered. **No indication in the response that there are more rows
you didn't get.**

Either:
- Switch to keyset (cursor) pagination, which `list_contracts` /
  `list_idvs` / `list_otas` already use, or
- Add a truncation flag / warning header to the response when the cap
  is hit, or
- Document the cap prominently in the SDK docstring so users know to
  scope their queries before pulling.
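
In the meantime, a client-side guard can at least flag likely truncation. A sketch, assuming 1-based page numbers and the 50 × 100 ceiling observed above; `fetch_page(page, limit)` is a hypothetical caller-supplied wrapper around the real `list_subawards` call that returns that page's rows.

```python
from typing import Any, Callable, Sequence


def fetch_pages_with_cap_check(
    fetch_page: Callable[[int, int], Sequence[Any]],
    limit: int = 100,
    max_pages: int = 50,
) -> tuple[list[Any], bool]:
    """Collect rows page by page and report whether the cap was probably hit.

    The second return value is True when every page up to `max_pages`
    came back full, meaning more rows may exist beyond the ceiling.
    """
    rows: list[Any] = []
    for page in range(1, max_pages + 1):
        batch = list(fetch_page(page, limit))
        rows.extend(batch)
        if len(batch) < limit:
            return rows, False  # short page: this was the real end
    return rows, True  # stopped at the ceiling with a full last page
```

A result set of exactly `limit * max_pages` rows also trips the flag, so treat `True` as "re-scope and check", not as proof of truncation.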

The current behavior fails silently, which made one of our analyses
inflate "inter-entity link counts" until we noticed and re-scoped (see
"methodology lesson" below).

### 6. 43 of 50 keyword-search OTAs return `recipient: null` and `award_date: null`

For most OTAs surfaced via `list_otas(search=...)`, both `recipient`
and `award_date` came back null. Calling `get_ota(key)` on the same
keys returned the same nulls — so this isn't a list-vs-detail
truncation; it's a data gap in the source.

Notable example: `OT_AWD_W15QKN1791011_9700_-NONE-_-NONE-` is a $48M
OTA whose description explicitly says "support the US Government's
shaping and influence efforts." No recipient, no award date. Same
PIID returns zero matches on USASpending's `spending_by_award`
endpoint, so Tango is uniquely surfacing it but can't attribute it.

For comparison, OTAs returned by `awarding_agency="USSOCOM"` all had
recipient + date populated. So the data-density gap seems specific to
the search-indexed records.

May not be Tango's fault directly, but a note in the OTA docs about
which records are likely to have null attribution fields (and why)
would help users know whether to retry / cross-reference vs accept
the null.
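
A quick way to quantify the gap on any result set, assuming the records have been converted to dicts (the function name and default field list are illustrative):

```python
from typing import Iterable, Mapping


def null_field_counts(
    records: Iterable[Mapping[str, object]],
    fields: tuple[str, ...] = ("recipient", "award_date"),
) -> dict[str, int]:
    """Count, per field, how many records have it missing or null."""
    counts = {name: 0 for name in fields}
    for record in records:
        for name in fields:
            if record.get(name) is None:
                counts[name] += 1
    return counts
```

Running the same count on a keyword-search result set and on an `awarding_agency`-filtered one makes the density difference described above easy to reproduce.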

### 7. `awarding_agency` parameter accepts forms inconsistently

```python
list_otas(awarding_agency="USSOCOM")                       # → 6 results
list_otas(awarding_agency="SOCOM")                         # → same 6
list_otas(awarding_agency="U.S. Special Operations Command")  # → 0
list_otas(awarding_agency="SPECIAL OPERATIONS COMMAND")    # → 0
```

There's no documented canonical form per endpoint. A
`get_agency_choices()` style helper, or even just a documented list of
accepted strings, would help. (CourtListener has a similar
`get_choices` MCP tool, FWIW — works really well for discovering
accepted enum values.)

### 8. Exposing awarding-office filter would help (related to #3)

The PIID prefix is the best available proxy for awarding office (e.g.
`H92240` = HQ USSOCOM, `H92277` = direct USSOCOM contracting at
MacDill, `HR0011` = DARPA), but having to filter by PIID prefix in
post-processing is much slower than a server-side filter. Exposing
`awarding_office_code` as a filter parameter on `list_contracts` /
`list_otas` / `list_subawards` would be a real win.

## Wishlist

- **A schema discovery helper.** Something like
  `client.get_resource_schema("Contract")` that returns the field
  list and nested-model field sets. Would save a lot of grepping
  through `explicit_schemas.py`.
- **`list_idv_awards` accepting PIID directly.** Currently requires
  `list_idvs(piid=...)` to resolve PIID → key first. PIIDs are stable
  identifiers; this would save the lookup.
- **`list_award_subs(piid=...)` convenience method.** Resolves PIID
  → contract key → subawards in one call. Doubles as guidance away
  from the bulk-by-UEI scope trap (see methodology lesson below).
- **More examples in the docs of cross-endpoint patterns.** "Find
  every subaward under a given IDIQ family" is a powerful use case
  but you have to assemble it from `list_idvs` → `list_idv_awards`
  → per-TO `list_subawards`. A documented recipe would help.

## Methodology lesson (not a bug, but worth documenting)

We initially bulk-pulled all subawards for Booz Allen and GDIT by
prime UEI. On a large integrator UEI, that returned 5,000+ rows of
unrelated work (hitting the page cap from #5) and made cross-entity
link counts look stronger than they actually were: 21 "GDIT → BAH"
rows collapsed to **1 unique contract** after de-duping by
`award_key`, and 8 "BAH → Hoplite" rows collapsed to **1 unique
contract** on an unrelated GSA TO, not the SOCCENT contract the
inference targeted.

The right pattern was per-PIID `list_subawards(award_key=...)` on the
specific contracts the investigation cared about. For small specialty
firms, bulk-by-UEI is fine; for large integrators it's a trap.
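
The de-duplication check that exposed the inflation is simple enough to run on every bulk pull. A sketch, assuming each subaward row has been flattened to a dict; `award_key` is the field named above, while the `prime` and `sub` keys are hypothetical names for whatever identifies the two parties in your rows.

```python
from collections import defaultdict
from typing import Iterable, Mapping


def link_counts(
    rows: Iterable[Mapping[str, str]],
) -> dict[tuple[str, str], tuple[int, int]]:
    """Map (prime, sub) -> (raw row count, unique contract count)."""
    raw: dict[tuple[str, str], int] = defaultdict(int)
    contracts: dict[tuple[str, str], set[str]] = defaultdict(set)
    for row in rows:
        pair = (row["prime"], row["sub"])
        raw[pair] += 1
        contracts[pair].add(row["award_key"])
    return {pair: (raw[pair], len(contracts[pair])) for pair in raw}
```

A large gap between the two numbers for a pair is the signal that the rows are repeated reports against one contract, not many independent relationships.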

Two small docs / SDK changes would prevent this for future users:

- The `list_subawards` docstring could mention that for large primes,
  per-contract queries (`award_key`) are usually what you want — not
  per-UEI.
- The `list_award_subs(piid=...)` convenience method (wishlist above)
  would make per-PIID the default mental model.

Thanks for building this. The SWMS-via-Tango find was the only reason
this investigation was tractable.

