## Choose data source, collections and schema

*Use this section to choose the data source, collections and schema to query.*

### Run once
*Run the cells in this section only when you first choose the data source, collections and schema, or when you want to change any of them.*

#### Generate a list of schemas and their selected collections

In [None]:
%%capture collections

# Get a list of schemas that contain the `selected_collections` table

list_schemas = """

SELECT
	schemaname
FROM
	pg_tables
WHERE
	tablename = 'selected_collections'

"""

schemas = %sql {list_schemas}

# Get the selected collections from each schema and store the results in a DataFrame

template = """

SELECT
  '{schema}' as schema_name,
  array_agg(id) as collections
FROM
  {schema}.selected_collections

  """

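frames = []

for schema in schemas['schemaname'].to_list():
  statement = template.format(schema=schema)
  # Use a separate name so the loop does not shadow the `%%capture collections` variable
  result = %sql {statement}
  # Skip schemas whose query produced no result (e.g. inaccessible schemas)
  if result is not None:
    frames.append(result)

# `DataFrame.append` was removed in pandas 2.0, so combine the results with `pd.concat`
collections_list = pd.concat(frames, ignore_index=True) if frames else pd.DataFrame()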


Log errors:

In [None]:
# Some schemas listed in `pg_tables` (and `information_schema.views`) are not accessible. Log those errors and warn the user.

if len(collections.stdout) > 0:
  print('`selected_collections` is not accessible for some schemas. See collections.log for details')
  %store collections.stdout > collections.log
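
The capture-then-log pattern above relies on IPython magics. As a rough illustration of the same idea in plain Python, the sketch below captures printed errors with `contextlib.redirect_stdout` and writes them to a log file only if something was printed. The `query_schema` function, the schema names, and the log file name are hypothetical stand-ins, not part of this notebook.

```python
import contextlib
import io
import tempfile
from pathlib import Path


def query_schema(schema):
    # Hypothetical stand-in for a query that prints an error instead of raising
    if schema == "restricted":
        print(f"permission denied for schema {schema}")
        return None
    return {"schema_name": schema, "collections": [2119, 2120]}


# Capture everything printed while the queries run
buffer = io.StringIO()
with contextlib.redirect_stdout(buffer):
    results = [query_schema(s) for s in ["public", "restricted"]]

captured = buffer.getvalue()
log_path = Path(tempfile.gettempdir()) / "collections_example.log"

# Only write a log and warn the user if something was printed
if len(captured) > 0:
    log_path.write_text(captured)
    print(f"Some schemas were not accessible. See {log_path} for details")
```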

#### Find the source

Run this cell and use the 'Filter' button to find the `source_id`:

In [None]:
list_source_ids()

#### Find the collections

Update the `source_id`, run this cell, and use the 'Filter' button to find the `collection_id`(s):

In [None]:
source_id = 'paraguay_dncp_records'

list_collections(source_id)

#### Find the schema

Update the `collection_ids`, run this cell, and use the 'Filter' button to find the schema:

In [None]:
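collection_ids = [2119, 2120]  # list of collection_ids

# Match whole IDs only, so that 2119 does not also match 21190
pattern = r'\b(?:' + '|'.join(str(collection_id) for collection_id in collection_ids) + r')\b'

collections_list = collections_list.astype({'collections': str})
collections_list[collections_list['collections'].str.contains(pattern)]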
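
Matching IDs against a stringified array is sensitive to how the pattern is anchored: a plain alternation such as `2119|2120` also matches longer IDs that merely contain those digits. The sketch below compares a loose pattern with one wrapped in word boundaries, using only the standard `re` module. The schema names and array contents are made up for illustration, and the strings assume the arrays look like `[2119, 2120]` after conversion to `str`.

```python
import re

collection_ids = [2119, 2120]

# Illustrative stringified arrays (assumed shape after conversion to str)
rows = {
    "schema_a": "[2119, 2120]",
    "schema_b": "[21190, 3001]",
    "schema_c": "[12, 2120]",
}

alternation = "|".join(str(i) for i in collection_ids)

# Loose: matches any string that contains the digits anywhere
loose = re.compile(alternation)
# Strict: matches whole numbers only, thanks to the word boundaries
strict = re.compile(r"\b(?:" + alternation + r")\b")

loose_matches = [name for name, text in rows.items() if loose.search(text)]
strict_matches = [name for name, text in rows.items() if strict.search(text)]

print(loose_matches)
print(strict_matches)
```

Here the loose pattern selects `schema_b` because `21190` contains `2119`, while the strict pattern does not.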

### Run each time
*You must run the cells in this section each time you connect to a new runtime.*

#### Set collections and schema

Update the `collection_ids` and `schema_name`, then run this cell:

In [None]:
collection_ids = [2119, 2120]  # list of collection_ids
schema_name = 'view_data_paraguay_covid'

collection_ids = tuple(collection_ids)  # convert list to tuple for use in sql queries
set_search_path(schema_name)  # see https://github.com/open-contracting/kingfisher-colab/issues/39
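
The list is converted to a tuple presumably because a tuple's text form uses parentheses, which is the shape SQL expects after `IN`. The sketch below shows how a tuple renders inside a formatted string, and one caveat: a single-element tuple renders with a trailing comma, which is not valid SQL. The `sql_in_list` helper and the `collection_id` column in the clause are hypothetical, added only to illustrate the point.

```python
def sql_in_list(ids):
    """Hypothetical helper: render integer ids as a SQL IN list."""
    ids = [int(i) for i in ids]  # coerce to int so only numbers are interpolated
    if not ids:
        raise ValueError("at least one id is required")
    return "(" + ", ".join(str(i) for i in ids) + ")"


collection_ids = tuple([2119, 2120])
single_id = tuple([2119])

# A multi-element tuple renders as a valid IN list
naive_clause = f"collection_id IN {collection_ids}"

# A single-element tuple keeps Python's trailing comma
single_clause = f"collection_id IN {single_id}"

# The helper avoids the trailing comma
safe_clause = f"collection_id IN {sql_in_list(single_id)}"

print(naive_clause)
print(single_clause)
print(safe_clause)
```

If you only ever select one collection, check how the tuple is rendered before using it in a query.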