OpenSanctions / FollowTheMoney search support #353

Open
adamdecaf opened this issue Jun 18, 2021 · 5 comments
Labels
enhancement (New feature or request), help wanted (Extra attention is needed)

@adamdecaf
Member

adamdecaf commented Jun 18, 2021

@pudo was hosting Happy Hour this week and mentioned that Watchman could support searching FollowTheMoney schemas using the OpenSanctions consolidated lists, which would greatly increase the breadth of data available for searching.

Initially I'm thinking of an endpoint like the following:

GET /opensanctions/<dataset-name>/search?objects=<schema-list>&prop1=<string>&prop2=<string>

Each OpenSanctions data source has a name property, for example:

"name": "interpol_red_notices",

Source: https://data.opensanctions.org/datasets/latest/index.json

We can use that name to download, parse, and index the list for search (assuming it uses FollowTheMoney's schemas).
Schemas: https://followthemoney.readthedocs.io/en/latest/model.html
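A minimal Python sketch of pulling dataset names out of index.json. It assumes the file is a JSON object with a top-level "datasets" array whose entries each carry a "name" field; that layout, and the helper names fetch_index and list_dataset_names, are assumptions for illustration, so check the live file before relying on them.

```python
import json
import urllib.request
from typing import Any

INDEX_URL = "https://data.opensanctions.org/datasets/latest/index.json"


def fetch_index(url: str = INDEX_URL) -> dict[str, Any]:
    """Download and parse the dataset index (hypothetical helper)."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def list_dataset_names(index: dict[str, Any]) -> list[str]:
    """Return sorted dataset names from a parsed index.

    Assumed layout (hypothetical): {"datasets": [{"name": ...}, ...]}
    Entries without a usable name are skipped.
    """
    names = []
    for dataset in index.get("datasets", []):
        name = dataset.get("name")
        if isinstance(name, str) and name:
            names.append(name)
    return sorted(names)


# Usage with a small hand-written index (not real data):
sample = json.loads(
    '{"datasets": [{"name": "interpol_red_notices"}, {"name": "everypolitician"}]}'
)
print(list_dataset_names(sample))  # ['everypolitician', 'interpol_red_notices']
```

Each returned name could then become the <dataset-name> path segment in the proposed endpoint.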

A few examples:

GET /opensanctions/everypolitician/search?objects=person&surname=Bazar
GET /opensanctions/worldbank_debarred/search?objects=company,legalentity,organization&name=Acme+Corp
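A rough sketch of how the proposed URL could be split into a dataset name, a schema list, and property filters. The function parse_search_request is hypothetical; it treats every query parameter other than objects as a property filter, matching the prop1/prop2 placeholders in the proposal.

```python
from urllib.parse import parse_qs, urlsplit


def parse_search_request(url: str) -> tuple[str, list[str], dict[str, list[str]]]:
    """Split /opensanctions/<dataset>/search?objects=...&prop=... into parts.

    Returns (dataset_name, schemas, property_filters).
    """
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    if len(segments) != 3 or segments[0] != "opensanctions" or segments[2] != "search":
        raise ValueError(f"not a search path: {parts.path}")
    dataset = segments[1]

    query = parse_qs(parts.query)  # "+" is decoded to a space
    schemas = []
    for value in query.pop("objects", []):
        # objects is a comma-separated list of schema names
        schemas.extend(s.strip() for s in value.split(",") if s.strip())
    return dataset, schemas, query


print(parse_search_request(
    "/opensanctions/worldbank_debarred/search"
    "?objects=company,legalentity,organization&name=Acme+Corp"
))
```

Schema names come back exactly as typed (lowercase in these examples); mapping them onto FollowTheMoney's capitalized names such as Person or LegalEntity would be a separate step.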

We can also initiate downloads for each list and offer endpoints that list the fields/schemas available in each list.

POST /opensanctions/interpol_yellow_notices/download
GET /opensanctions/address/schema

{ 
        "label": "Address",
        "plural": "Addresses",
        "properties": {
          "city": {
            "description": "City, town, village or other locality",
            "label": "City",
            "name": "city",
            "qname": "Address:city",
            "type": "string"
          },
...
}
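The schema endpoint could reduce a schema document to the fields a caller can filter on. A minimal sketch that relies only on the keys visible in the Address example above (properties, name, type); schema_fields is a hypothetical helper name.

```python
from typing import Optional


def schema_fields(schema: dict) -> dict[str, Optional[str]]:
    """Return {property name: property type} for a FollowTheMoney schema document.

    Falls back to the properties key when an entry has no "name".
    """
    fields = {}
    for key, prop in schema.get("properties", {}).items():
        fields[prop.get("name", key)] = prop.get("type")
    return fields


# Only the "city" entry from the example is reproduced here.
address_schema = {
    "label": "Address",
    "plural": "Addresses",
    "properties": {
        "city": {
            "description": "City, town, village or other locality",
            "label": "City",
            "name": "city",
            "qname": "Address:city",
            "type": "string",
        },
    },
}
print(schema_fields(address_schema))  # {'city': 'string'}
```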
adamdecaf added the enhancement and help wanted labels Jun 18, 2021
@pudo

pudo commented Sep 28, 2021

Hey @adamdecaf - sorry for the long radio silence on my end. I think I finally have OpenSanctions in a place where it could be used in Watchman and produce great results. Here's what we're offering now:

  • There's a combined dataset called sanctions (https://opensanctions.org/datasets/sanctions/). It contains targets from 13 different sanctions lists, including the US OFAC/BIS/CSL, EU, GB, etc.
  • The data is offered in a nested JSON format (available for download) where each line represents a sanctioned entity. Each entity carries an array of the datasets in which it is found (we're still de-duplicating; this should hopefully be fully merged in October), plus nested data describing all the sanctions linked to it, its addresses, and linked people and companies.
  • Full documentation of the data format is at https://opensanctions.org/reference/ and https://opensanctions.org/docs/usage/
  • If you were to pull in that nested targets file, it should be reasonably easy to flatten it into a format that can serve the existing Watchman APIs, e.g. by pulling out all the linked names, countries and address fragments.
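A possible way to flatten one line of the nested file into the kind of fields Watchman already searches (names, countries, addresses). This is a sketch under assumptions: each line is an entity with id, schema, datasets, and a properties map of value lists; nested entities appear as JSON objects inside those lists; and the property names in NAME_PROPS and COUNTRY_PROPS (plus full on Address) are guesses based on common FollowTheMoney properties. FlatRecord and flatten_entity are hypothetical names.

```python
import json
from collections.abc import Iterator
from dataclasses import dataclass, field

# Assumed property names, based on common FollowTheMoney properties; the
# OpenSanctions reference docs list the exact properties for each schema.
NAME_PROPS = {"name", "alias"}
COUNTRY_PROPS = {"country", "nationality", "jurisdiction"}


@dataclass
class FlatRecord:
    """Hypothetical flat shape that could feed Watchman's existing search."""
    entity_id: str
    schema: str
    datasets: list[str]
    names: list[str] = field(default_factory=list)
    countries: list[str] = field(default_factory=list)
    addresses: list[str] = field(default_factory=list)


def _dedupe(values: list[str]) -> list[str]:
    return list(dict.fromkeys(values))  # drops repeats, keeps first-seen order


def flatten_entity(line: str) -> FlatRecord:
    """Flatten one line of the nested file into names/countries/addresses."""
    entity = json.loads(line)
    record = FlatRecord(
        entity_id=entity["id"],
        schema=entity["schema"],
        datasets=list(entity.get("datasets", [])),
    )
    for prop, values in entity.get("properties", {}).items():
        for value in values:
            if isinstance(value, dict):
                # Nested entity; this sketch only flattens linked addresses.
                if value.get("schema") == "Address":
                    addr = value.get("properties", {})
                    record.addresses.extend(addr.get("full", []))
                    record.countries.extend(addr.get("country", []))
            elif prop in NAME_PROPS:
                record.names.append(value)
            elif prop in COUNTRY_PROPS:
                record.countries.append(value)
            elif prop == "address":
                record.addresses.append(value)
    record.names = _dedupe(record.names)
    record.countries = _dedupe(record.countries)
    record.addresses = _dedupe(record.addresses)
    return record


def flatten_file(path: str) -> Iterator[FlatRecord]:
    """Stream records from a newline-delimited JSON file."""
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            if line.strip():
                yield flatten_entity(line)


# Hand-written example line (not real data):
sample_line = json.dumps({
    "id": "ent-1",
    "schema": "Person",
    "datasets": ["example_list"],
    "properties": {
        "name": ["Ivan Petrov"],
        "alias": ["I. Petrov"],
        "nationality": ["ru"],
        "addressEntity": [{
            "id": "addr-1",
            "schema": "Address",
            "properties": {"full": ["1 Main St, Moscow"], "country": ["ru"]},
        }],
    },
})
print(flatten_entity(sample_line))
```

Only linked addresses are pulled in here; whether names of linked people and companies should also become searchable is left open, since it can widen matches beyond the sanctioned entity itself.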

I'm curious whether this is useful yet, or if you have suggestions for further pre-processing or export formats we should consider.

@adamdecaf
Member Author

That's awesome @pudo! I'll be OOO for a bit soon, but I'll look over what's been added here.

Do you think it makes sense to expose OpenSanctions in the Watchman URLs? I'm wondering whether masking it as a data source makes more sense.

@pudo

pudo commented Sep 29, 2021

One thing you might consider is API support for OpenSanctions collections (i.e. generating an endpoint per collection). I mentioned sanctions above, but there's also peps, crime, and default (which combines the other three). These are different subsets of entities from different sources that a user might want to use for their checks.

@adamdecaf
Member Author

adamdecaf commented Nov 2, 2021

Just to clarify, are you referring to endpoints like the following? We could include them in the root GET /search, but that endpoint has been growing and can suffer from performance issues.

  • GET /opensanctions/everypolitician/search?objects=person&surname=Bazar
  • GET /opensanctions/peps/search?objects=person&surname=Bazar
  • GET /opensanctions/sanctions/search?objects=person&surname=Bazar

I like the idea of offering a fairly generic endpoint that provides Watchman's precompute, indexing, and search over the OpenSanctions lists.

Do you see problems with this?

GET /opensanctions/<dataset-name>/search?objects=<schema-list>&prop1=<string>&prop2=<string>

objects=<schema-list> would be a comma-separated list of schemas from OpenSanctions.
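To illustrate how objects=<schema-list> scopes a query, here is a small in-memory sketch that filters FollowTheMoney-shaped entities by schema and by property values. It is not Watchman's search: Watchman scores names fuzzily (Jaro-Winkler based), while this uses normalized substring matching to stay short. scoped_search and normalize are hypothetical names, and schema matching is case-insensitive on exact names only.

```python
import re
import unicodedata
from collections.abc import Iterable


def normalize(text: str) -> str:
    """Precompute step: strip accents and punctuation, lowercase, collapse spaces."""
    decomposed = unicodedata.normalize("NFKD", text)
    no_marks = "".join(c for c in decomposed if not unicodedata.combining(c))
    cleaned = re.sub(r"[^\w\s]", " ", no_marks.lower())
    return " ".join(cleaned.split())


def scoped_search(entities: Iterable[dict], schemas: list[str],
                  filters: dict[str, str]) -> list[str]:
    """Return ids of entities in the requested schemas matching every filter.

    An empty schema list means "any schema"; each filter must match at least
    one value of that property (normalized substring match).
    """
    wanted = {s.lower() for s in schemas}
    needles = {prop: normalize(v) for prop, v in filters.items()}
    hits = []
    for entity in entities:
        if wanted and entity.get("schema", "").lower() not in wanted:
            continue  # outside the query scope
        props = entity.get("properties", {})
        if all(
            any(needle in normalize(v) for v in props.get(prop, []) if isinstance(v, str))
            for prop, needle in needles.items()
        ):
            hits.append(entity["id"])
    return hits


# Hand-written entities (not real data):
entities = [
    {"id": "p1", "schema": "Person", "properties": {"name": ["Ana Bazár"]}},
    {"id": "c1", "schema": "Company", "properties": {"name": ["Acme Corp."]}},
    {"id": "o1", "schema": "Organization", "properties": {"name": ["Acme Foundation"]}},
]
print(scoped_search(entities, ["company", "legalentity", "organization"], {"name": "Acme Corp"}))
```

Because matching is on exact schema names, objects=legalentity here would not match entities whose schema is Company or Organization; honoring FollowTheMoney's schema inheritance would need the schema model itself.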

@pudo

pudo commented Nov 3, 2021

That's really cool! This way people could easily pick and choose from the endpoints that are available, and picking the schema means being able to define a query scope. Nice! I'm most curious how you find the data format to work with. I hope it's pretty easy, but maybe you have ideas for additional export formats we should publish!

Projects
None yet
Development

No branches or pull requests

2 participants