
Feedback from a procurement-trace investigation using tango-python #29

@abigailhaddad

This started from Renée DiResta's Lawfare piece, "Fewer Bots, More
Ads: The Pentagon's Evolving Online Influence Campaigns", which
identifies GDIT as a verified advertiser running a network of
covert-attribution websites (the "gc_" prefix sites) but doesn't
identify the federal contract the work is running under. I wanted to
see what additional context the public procurement record could add.

The trace took roughly 150 API calls across the SWMS IDV drilldown,
OTA sweep, subaward chains, and IT Dashboard sweeps, on tango-python
v0.5.0, medium tier.

What I was trying to find

The Lawfare piece names a prime contractor (GDIT) and a class of
activity (covert ad-buys for a network of "gc_" sites targeting
foreign-language audiences in CENTCOM / SOUTHCOM / AFRICOM / INDOPACOM
AORs) but does not name the specific federal contract paying for the
work. So the question was: given a known prime and a known class of
work but no contract name, can the procurement record narrow down
the vehicle?
That turned into six concrete data needs.

For each need I tried USASpending and SAM.gov first, then Tango. The
specific things Tango returned that the others didn't:

1. Find a named IDIQ family that's not in USASpending's keyword index

The trace's original §4 needed to identify what the GBPS → SWMS
"bridge contracts" of 2016-17 were bridging to. SWMS is a real
SOCOM IDIQ family, but it's invisible to USASpending and SAM.gov
keyword search:

  • USASpending spending_by_award keywords=["SWMS"] → 0 results
  • SAM.gov phrase search for "SWMS" → 0 results
  • Tango list_idvs(search="SWMS") → 25 results, including the
    three Group A IDIQs, the sixteen Group B IDIQs, and a Group C
    IDV (H9222217D0004, St. Michael's, $150M ceiling).

That's the headline source-coverage gap.

2. Parent-IDV → task-order traversal as one query pattern

  • USASpending has the IDV detail page (e.g.
    /award/CONT_IDV_H9222216D0040_9700/) that lists child orders in
    a paginated UI, but spending_by_award doesn't accept an
    idv_piid filter — you can't grab every TO under an IDV via API
    in one call.
  • SAM.gov doesn't have this at all (solicitation index, not awards).
  • Tango: list_idvs(piid=...) → list_idv_awards(key=..., limit=100,
    cursor=...). Walked 19 of the 20 SWMS IDV PIIDs and pulled
    123 child task orders in one script, ~40 API calls. (The 20th
    was the Group C St. Michael's IDV, which I'd missed because I
    didn't have its PIID. Tango's list_idvs(search="SWMS") from
    need #1 above would have caught it; lesson learned for me.)
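A minimal sketch of the cursor walk described in need #2, written against a generic page-fetcher rather than the SDK itself. The helper names (iter_cursor_pages, collect_task_orders) and the (rows, next_cursor) return convention are assumptions made for illustration; the issue does not show how tango-python exposes rows or the next cursor on its response objects, so the fetcher is where that adaptation would live.

```python
from typing import Callable, Iterator, Optional, Tuple

# Hypothetical convention: a page fetcher takes a cursor (None for the
# first page) and returns (rows, next_cursor). next_cursor is None when
# there are no more pages.
PageFetcher = Callable[[Optional[str]], Tuple[list, Optional[str]]]


def iter_cursor_pages(fetch_page: PageFetcher, max_pages: int = 1000) -> Iterator[dict]:
    """Yield every row from a cursor-paginated endpoint."""
    cursor = None
    for _ in range(max_pages):
        rows, cursor = fetch_page(cursor)
        yield from rows
        if cursor is None:
            return
    # Guard against a fetcher that never stops returning cursors.
    raise RuntimeError(f"stopped after {max_pages} pages")


def collect_task_orders(idv_keys, make_fetcher) -> dict:
    """Map each IDV key to the list of its child task orders."""
    return {key: list(iter_cursor_pages(make_fetcher(key))) for key in idv_keys}
```

With the SDK, make_fetcher(key) would return a function that calls client.list_idv_awards(key=key, limit=100, cursor=cursor) and unpacks the rows and next cursor from the response.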

3. Subaward chains per prime contract, retrievable by PIID

  • USASpending has spending_by_subaward, but I didn't do a
    systematic per-PIID cross-check between USASpending and Tango on
    every GDIT prime in this trace, so I can't precisely characterize
    the relative completeness here. Anecdotally, USASpending's
    subaward feed is sparse for many older / smaller contracts; both
    sources show zero subs on the same GBPS-era contracts.
  • SAM.gov: no subaward data.
  • Tango: list_subawards(award_key=...) returned a usable vendor
    stack on the two flagship GDIT primes — 61 subaward rows on WSP
    (47QFCA19F0035) and 43 on TSS (47QFCA24F0007) — which is what
    let me characterize each prime by capability composition.

4. OTA / OTIDV index, with keyword search

  • USASpending: spending_by_award returns zero matches for
    W15QKN1791011, a $48M federal OTA whose description explicitly
    says "support the US Government's shaping and influence
    efforts." It's there in the underlying federal record but not
    indexed in the contract-keyword endpoint.
  • SAM.gov: solicitation-side only, doesn't help for awarded OTAs.
  • Tango: list_otas(search="influence") surfaces it (along with
    ~50 other relevant OTAs / OTIDVs).

5. IT Dashboard cross-reference

  • GSA does have a public "IT Collect" API behind itdashboard.gov
    (docs, schema), API-key gated. I didn't try it directly; for this
    investigation Tango's list_itdashboard_investments(search=...)
    made it a one-line lookup with the same shape syntax as the rest
    of the SDK. It confirmed that TRWI, AWIP, CCMD, WebOps, and PSYOP
    all return zero DoD title matches, an informative negative.

6. Cross-source comparison

  • This isn't a Tango feature per se, but a workflow: for each
    candidate finding, cross-check the same PIID on USASpending.
    Tango's stable response shape made the comparison clean —
    key, piid, recipient.uei always present and consistent,
    so the diff "Tango sees this, USASpending doesn't" was easy to
    detect programmatically.
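The cross-source diff can be sketched in a few lines. This assumes the Tango rows have already been converted to plain dicts with a "piid" key and that the USASpending side has been reduced to a list of PIID strings; both function names are hypothetical and neither is part of the SDK.

```python
def normalize_piid(piid: str) -> str:
    """Upper-case a PIID and drop hyphens, spaces, and other separators."""
    return "".join(ch for ch in piid.upper() if ch.isalnum())


def diff_by_piid(tango_rows, usaspending_piids):
    """Split Tango rows into (seen in both sources, seen only in Tango)."""
    known = {normalize_piid(p) for p in usaspending_piids}
    both, tango_only = [], []
    for row in tango_rows:
        target = both if normalize_piid(row["piid"]) in known else tango_only
        target.append(row)
    return both, tango_only
```

The tango_only list is the "Tango sees this, USASpending doesn't" set; normalizing first avoids false gaps caused by formatting differences between the two sources.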

Repo layout (anonymized for sharing):

scripts/
  01_swms_drilldown.py    # need #2: list_idvs(piid=) → list_idv_awards(key=)
  02_socom_otas.py        # need #4: list_otas / list_otidvs sweeps
  03_subawards.py         # need #3: list_subawards per award_key
  04_itdashboard.py       # need #5: list_itdashboard_investments keyword sweep
data/                     # raw JSON output, committed for reproducibility

What I'd want preserved in future versions

A few things that worked and would be worth not breaking in future
SDK versions:

  • list_idvs(piid=...) → list_idv_awards(key=...) traversal: the
    cursor pagination is consistent across both endpoints and the
    shape syntax carries through.
  • Nested-field shape syntax (recipient(display_name,uei),
    awarding_office(*)) on contract / IDV endpoints.
  • TangoRateLimitError as a separate exception class — made the
    retry helper trivial. (TangoNotFoundError and TangoAPIError
    are also useful.)
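For reference, a retry helper of the kind a dedicated rate-limit exception makes possible might look like the sketch below. It is not the helper from the investigation's repo; call_with_retry and its parameters are hypothetical, and the exception class is passed in so the block does not depend on the SDK's import path.

```python
import time


def call_with_retry(func, retry_on, attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call func(), retrying with exponential backoff when retry_on is raised."""
    for attempt in range(attempts):
        try:
            return func()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of retries: surface the original error
            # Wait 1x, 2x, 4x, ... base_delay between attempts.
            sleep(base_delay * 2 ** attempt)
```

Usage would be along the lines of call_with_retry(lambda: client.list_idvs(piid=piid), retry_on=TangoRateLimitError). Because only the rate-limit class is caught, TangoNotFoundError and other failures still propagate immediately.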

Friction we hit (ranked by impact)

1. ContractOrIDVCompetition is referenced as a nested model but isn't in EXPLICIT_SCHEMAS

Repro:

shape = "key,piid,competition(extent_competed,number_of_offers_received)"
client.list_idv_awards(key=KEY, shape=shape)

Expected: request succeeds with the named subfields populated.

Actual:

tango.exceptions.ShapeValidationError: Field 'extent_competed' does not exist in
ContractOrIDVCompetition. No valid fields found in model schema.

Root cause: in tango/shapes/explicit_schemas.py, both
CONTRACT_SCHEMA["competition"] and IDV_SCHEMA["competition"] set
nested_model="ContractOrIDVCompetition", but EXPLICIT_SCHEMAS only
registers "Competition": COMPETITION_SCHEMA. The
ContractOrIDVCompetition dataclass exists in models.py with the
right fields (its docstring even says "alias for Competition"), but
the explicit-schema parser doesn't know about it.

Workaround: use competition(*) — returns the same data without
field-level selection.

Possible fix: either (a) add "ContractOrIDVCompetition": COMPETITION_SCHEMA
to EXPLICIT_SCHEMAS, since the two are described as aliases, or (b)
point the nested_model references at "Competition" to match the
registered key.

2. Server-side shape-validation errors don't name the bad field

When the client-side parser catches a bad field (as in friction item 1
above), the error is great: it names the field, names the model, even
suggests alternatives. When the server-side validator catches one (e.g.
a field that doesn't exist on the resource at all), the error is just:

TangoValidationError: Invalid request parameters: Invalid shape

No field name. We had to bisect the shape to find which field was
rejected. Would be great if the server's error matched the client
parser's quality.
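Until the server names the field, the search can be automated. The sketch below splits a shape string on top-level commas and tests each field on its own, one request per field rather than a true bisection. Both function names are hypothetical, and is_accepted is a caller-supplied probe; in practice it would issue a small request with shape=field and return False when TangoValidationError is raised.

```python
def split_shape(shape: str) -> list:
    """Split a shape string on top-level commas, keeping nested(...) groups whole."""
    fields, depth, current = [], 0, []
    for ch in shape:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == "," and depth == 0:
            fields.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    if current:
        fields.append("".join(current).strip())
    return [f for f in fields if f]


def find_rejected_fields(shape: str, is_accepted) -> list:
    """Return the top-level fields the validator rejects when sent alone."""
    return [field for field in split_shape(shape) if not is_accepted(field)]
```

This only isolates top-level fields; a rejected subfield inside recipient(...) would surface as the whole recipient(...) group, which then needs a second pass.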

3. OTA / OTIDV response shapes are much thinner than Contract / IDV

OTA_SCHEMA and OTIDV_SCHEMA expose only key, piid, award_date, description, total_contract_value, obligated, recipient. No
awarding_office, no place_of_performance, no psc_code /
naics_code, no competition.

Meanwhile list_otas() accepts awarding_agency, funding_agency,
psc, naics, pop_*, awarding_office etc. as filter parameters —
so the API clearly has those fields. They're just not in the response
shape.

This made triaging OTA keyword-search results awkward — we wanted to
filter the 50 keyword-matched OTAs to "awarded by H92… (SOCOM)" but
had to fall back to PIID-prefix matching because there was no
awarding_office field on the record.

Bringing OTA / OTIDV response shapes up to parity with Contract / IDV
(awarding_office, place_of_performance, psc_code, naics_code,
optional competition) would close this gap.

4. list_subawards shape rejects amount and award_date

Per the comment in tango/models.py:721:

# Note: API does not accept "id" or "amount" in shape (unknown_field).
# Use only accepted fields.

This is a real friction point for any quantitative subaward analysis:
to attach a dollar value or date to each subaward row you have to
follow the parent award_key back to list_contracts(). Two API
calls when one should do.

If amount and award_date are columns on the underlying subaward
table, exposing them on the shape would dramatically reduce
round-trips. (And if they aren't — i.e. subaward dollars aren't
tracked in Tango — that's worth saying in the SDK docstring so users
don't waste time trying.)

5. list_subawards pagination silently caps at ~5000 rows

list_subawards uses page/limit pagination (every other list endpoint
we touched uses cursor pagination) and hits a ceiling around 50 pages
× 100 rows = 5,000 rows per query. On a large prime UEI like Booz
Allen's JCBMLGPE6Z71, that truncates before the full list is
delivered. No indication in the response that there are more rows
you didn't get.

Either:

  • Switch pagination to keyset, as list_contracts / list_idvs /
    list_otas already use, or
  • Add a truncation flag / warning header to the response when the cap
    is hit, or
  • Document the cap prominently in the SDK docstring so users know to
    scope their queries before pulling.
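In the meantime, a client-side guard can at least flag a pull that stopped at the cap. This is a sketch under stated assumptions: fetch_all_pages is a hypothetical helper, fetch_page(page, limit) is a caller-supplied wrapper that returns one page of rows as a list, and the 50 × 100 defaults are the ceiling observed above, not a documented limit.

```python
import warnings


def fetch_all_pages(fetch_page, limit=100, max_pages=50):
    """Read page/limit results, warning if every page up to the cap was full."""
    rows = []
    for page in range(1, max_pages + 1):
        batch = fetch_page(page, limit)
        rows.extend(batch)
        if len(batch) < limit:
            return rows  # short page: the listing is complete
    # Every page was full, so more rows may exist beyond the cap.
    warnings.warn(
        f"read {max_pages} full pages ({len(rows)} rows); results may be truncated",
        stacklevel=2,
    )
    return rows
```

A result set that happens to contain exactly limit × max_pages rows will also trigger the warning, so it is a prompt to re-scope the query rather than proof of truncation.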

The current behavior fails silently, which made one of our analyses
inflate "inter-entity link counts" until we noticed and re-scoped (see
"methodology lesson" below).

6. 43 of 50 keyword-search OTAs return recipient: null and award_date: null

For most OTAs surfaced via list_otas(search=...), both recipient
and award_date came back null. Calling get_ota(key) on the same
keys returned the same nulls — so this isn't a list-vs-detail
truncation; it's a data gap in the source.

Notable example: OT_AWD_W15QKN1791011_9700_-NONE-_-NONE- is a $48M
OTA whose description explicitly says "support the US Government's
shaping and influence efforts." No recipient, no award date. Same
PIID returns zero matches on USASpending's spending_by_award
endpoint, so Tango is uniquely surfacing it but can't attribute it.

For comparison, OTAs returned by awarding_agency="USSOCOM" all had
recipient + date populated. So the data-density gap seems specific to
the search-indexed records.

May not be Tango's fault directly, but a note in the OTA docs about
which records are likely to have null attribution fields (and why)
would help users know whether to retry / cross-reference vs accept
the null.

7. awarding_agency parameter accepts forms inconsistently

list_otas(awarding_agency="USSOCOM")                       # → 6 results
list_otas(awarding_agency="SOCOM")                         # → same 6
list_otas(awarding_agency="U.S. Special Operations Command")  # → 0
list_otas(awarding_agency="SPECIAL OPERATIONS COMMAND")    # → 0

There's no documented canonical form per endpoint. A
get_agency_choices() style helper, or even just a documented list of
accepted strings, would help. (CourtListener has a similar
get_choices MCP tool, FWIW — works really well for discovering
accepted enum values.)

8. Exposing awarding-office filter would help (related to #3)

The PIID prefix is the best available proxy for awarding office (e.g.
H92240 = HQ USSOCOM, H92277 = direct USSOCOM contracting at
MacDill, HR0011 = DARPA), but having to filter by PIID prefix in
post-processing is much slower than a server-side filter. Exposing
awarding_office_code as a filter parameter on list_contracts /
list_otas / list_subawards would be a real win.
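The post-processing workaround amounts to something like the sketch below. The prefix table contains only the three examples given above and is not a complete mapping; the function names are hypothetical, and rows are assumed to be dicts with a "piid" key.

```python
# Prefix -> office labels copied from the examples in this issue; not a complete list.
PIID_PREFIX_OFFICES = {
    "H92240": "HQ USSOCOM",
    "H92277": "USSOCOM direct contracting (MacDill)",
    "HR0011": "DARPA",
}


def office_from_piid(piid, prefixes=PIID_PREFIX_OFFICES):
    """Return the office label for the longest matching prefix, or None."""
    piid = piid.upper()
    for prefix in sorted(prefixes, key=len, reverse=True):
        if piid.startswith(prefix):
            return prefixes[prefix]
    return None


def filter_by_office(rows, office, prefixes=PIID_PREFIX_OFFICES):
    """Keep only the rows whose PIID prefix maps to the given office label."""
    return [r for r in rows if office_from_piid(r["piid"], prefixes) == office]
```

Every row still has to be downloaded before it can be discarded, which is the cost a server-side awarding_office_code filter would remove.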

Wishlist

  • A schema discovery helper. Something like
    client.get_resource_schema("Contract") that returns the field
    list and nested-model field sets. Would save a lot of grepping
    through explicit_schemas.py.
  • list_idv_awards accepting PIID directly. Currently requires
    list_idvs(piid=...) to resolve PIID → key first. PIIDs are stable
    identifiers; this would save the lookup.
  • list_award_subs(piid=...) convenience method. Resolves PIID
    → contract key → subawards in one call. Doubles as guidance away
    from the bulk-by-UEI scope trap (see methodology lesson below).
  • More examples in the docs of cross-endpoint patterns. "Find
    every subaward under a given IDIQ family" is a powerful use case
    but you have to assemble it from list_idvs → list_idv_awards
    → per-TO list_subawards. A documented recipe would help.

Methodology lesson (not a bug, but worth documenting)

We initially bulk-pulled all subawards for Booz Allen and GDIT by
prime UEI. On a large integrator UEI, that returned 5,000+ rows of
unrelated work (hitting the page cap from #5) and made cross-entity
link counts look stronger than they actually were: 21 "GDIT → BAH"
rows collapsed to 1 unique contract after de-duping by award_key,
and 8 "BAH → Hoplite" rows collapsed to 1 unique contract on an
unrelated GSA TO, not the SOCCENT contract the inference targeted.

The right pattern was per-PIID list_subawards(award_key=...) on the
specific contracts the investigation cared about. For small specialty
firms, bulk-by-UEI is fine; for large integrators it's a trap.
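The de-duplication check that exposed the inflated counts can be sketched as follows. The "prime" and "sub" keys are hypothetical names for whatever fields identify the two parties on a subaward row; only award_key comes from the SDK as described above, and rows are assumed to be plain dicts.

```python
from collections import defaultdict


def unique_contract_links(rows):
    """Count raw rows and distinct award_keys for each (prime, sub) pair."""
    keys_by_pair = defaultdict(set)
    rows_by_pair = defaultdict(int)
    for row in rows:
        pair = (row["prime"], row["sub"])
        rows_by_pair[pair] += 1
        keys_by_pair[pair].add(row["award_key"])
    return {
        pair: {"rows": rows_by_pair[pair], "contracts": len(keys)}
        for pair, keys in keys_by_pair.items()
    }
```

A pair with many rows but a single contract is one relationship reported many times, not many relationships, and the remaining award_key still has to be checked against the contract the inference is actually about.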

Two small docs / SDK changes would prevent this for future users:

  • The list_subawards docstring could mention that for large primes,
    per-contract queries (award_key) are usually what you want — not
    per-UEI.
  • The list_award_subs(piid=...) convenience method (wishlist above)
    would make per-PIID the default mental model.

Thanks for building this. The SWMS-via-Tango find was the only reason
this investigation was tractable.
