# How to delete butler collections. STAFF ONLY

Users should be encouraged to take up space rather than try to delete things from the butler.

In [1]:
import os
import getpass
from lsst.daf.butler import Butler, DatasetType, CollectionType, Datastore

In [2]:
config = 'dp02'
butler = Butler(config)

## Identify collections to delete

In [3]:
my_outputCollection = 'u/melissagraham/coadd_recreation_nb'

List all existing collections whose names contain that string.

These were all made in draft_Create_Custom_Coadds.ipynb

The collection 'u/melissagraham/coadd_recreation_nb' was a CHAINED collection that was made, but has already been deleted so it does not appear below.

In [4]:
for c in sorted(butler.registry.queryCollections()):
    if c.find(my_outputCollection) > -1:
        print(c)

u/melissagraham/coadd_recreation_nb/20220610T182343Z
u/melissagraham/coadd_recreation_nb/20220610T184028Z
u/melissagraham/coadd_recreation_nb/20220610T185057Z
u/melissagraham/coadd_recreation_nb/20220610T190249Z
u/melissagraham/coadd_recreation_nb/20220610T190623Z
u/melissagraham/coadd_recreation_nb/20220615T205142Z
u/melissagraham/coadd_recreation_nb/20220615T212442Z
u/melissagraham/coadd_recreation_nb/20220622T190819Z
u/melissagraham/coadd_recreation_nb/20220622T193047Z
u/melissagraham/coadd_recreation_nb/20220622T194232Z
u/melissagraham/coadd_recreation_nb/20220622T194815Z
u/melissagraham/coadd_recreation_nb/20220622T195818Z
u/melissagraham/coadd_recreation_nb/20220622T235340Z
u/melissagraham/coadd_recreation_nb/20220623T023518Z
u/melissagraham/coadd_recreation_nb/20220623T024126Z
u/melissagraham/coadd_recreation_nb/20220715T214157Z
u/melissagraham/coadd_recreation_nb/20220715T220907Z
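
Note that `c.find(my_outputCollection) > -1` matches the string anywhere in a collection name, not only at the start. Before deleting anything, a stricter prefix match may be safer. The following is a minimal sketch, not part of the original notebook: `matching_collections` is a hypothetical helper, and the two extra names in `example_names` are made up to show what a substring match would wrongly include.

```python
def matching_collections(names, prefix):
    """Return the sorted names equal to `prefix` or nested beneath it."""
    nested = prefix.rstrip("/") + "/"
    return sorted(n for n in names if n == prefix or n.startswith(nested))


# Stand-in names; in the notebook these would come from
# butler.registry.queryCollections().
example_names = [
    "u/melissagraham/coadd_recreation_nb/20220610T182343Z",
    "u/melissagraham/coadd_recreation_nb_v2/20220801T000000Z",  # hypothetical
    "u/someone_else/copy_of/u/melissagraham/coadd_recreation_nb",  # hypothetical
]

for name in matching_collections(example_names, "u/melissagraham/coadd_recreation_nb"):
    print(name)
```

Only the first name is printed; the other two contain the string but are not nested under the prefix.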


I want to delete all of those old mistakes.

I will need a butler with write permissions.

Instantiate it with only `my_outputCollection`, for safety.

In [5]:
del butler

In [6]:
tmpButler = Butler(config, collections=my_outputCollection, writeable=True)

`tmpButler.registry.removeCollection` works for removing a CHAINED collection, which is what `u/melissagraham/coadd_recreation_nb` was.

The following cells are commented out because the chained collection had already been removed.

In [7]:
# tmpButler.registry.removeCollection('u/melissagraham/coadd_recreation_nb')

Check:

In [8]:
# for c in sorted(tmpButler.registry.queryCollections()):
#     if c.find(my_outputCollection) > -1:
#         print(c)

But `removeCollection` does not work for a RUN collection, which is what all the timestamped collections are.

This does not work: <br>
`tmpButler.registry.removeCollection('u/melissagraham/coadd_recreation_nb/20220610T171249Z')`
<br>

Instead, do this:

In [9]:
tmpButler.removeRuns(['u/melissagraham/coadd_recreation_nb/20220610T182343Z'])

See?

In [10]:
for c in sorted(tmpButler.registry.queryCollections()):
    if c.find(my_outputCollection) > -1:
        print(c)

u/melissagraham/coadd_recreation_nb/20220610T184028Z
u/melissagraham/coadd_recreation_nb/20220610T185057Z
u/melissagraham/coadd_recreation_nb/20220610T190249Z
u/melissagraham/coadd_recreation_nb/20220610T190623Z
u/melissagraham/coadd_recreation_nb/20220615T205142Z
u/melissagraham/coadd_recreation_nb/20220615T212442Z
u/melissagraham/coadd_recreation_nb/20220622T190819Z
u/melissagraham/coadd_recreation_nb/20220622T193047Z
u/melissagraham/coadd_recreation_nb/20220622T194232Z
u/melissagraham/coadd_recreation_nb/20220622T194815Z
u/melissagraham/coadd_recreation_nb/20220622T195818Z
u/melissagraham/coadd_recreation_nb/20220622T235340Z
u/melissagraham/coadd_recreation_nb/20220623T023518Z
u/melissagraham/coadd_recreation_nb/20220623T024126Z
u/melissagraham/coadd_recreation_nb/20220715T214157Z
u/melissagraham/coadd_recreation_nb/20220715T220907Z
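
The two removal paths used so far (`registry.removeCollection` for the CHAINED collection, `removeRuns` for a RUN collection) could be wrapped in one function that checks the type first. This is only a sketch: `remove_collection_by_type` is a hypothetical helper, it assumes `registry.getCollectionType` returns an enum member whose `.name` is `"RUN"` or `"CHAINED"`, and it refuses any other type because this notebook did not test them.

```python
def remove_collection_by_type(butler, name):
    """Remove `name` using the call that matches its collection type.

    `butler` is expected to be a writeable Butler (or any object with the
    same registry.getCollectionType, registry.removeCollection and
    removeRuns methods). Returns the type name that was handled.
    """
    kind = butler.registry.getCollectionType(name).name
    if kind == "RUN":
        # RUN collections hold datasets, so use removeRuns.
        butler.removeRuns([name])
    elif kind == "CHAINED":
        # CHAINED collections only point at other collections.
        butler.registry.removeCollection(name)
    else:
        # Other types were not explored here, so do nothing destructive.
        raise ValueError(f"Refusing to remove {kind} collection {name!r}")
    return kind
```

With the butler from this notebook, the call would look like `remove_collection_by_type(tmpButler, 'u/melissagraham/coadd_recreation_nb/20220610T184028Z')`.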


REMOVE THEM ALL:

In [11]:
for c in sorted(tmpButler.registry.queryCollections()):
    if c.find(my_outputCollection) > -1:
        print('Removing: ', c)
        tmpButler.removeRuns([c])

Removing:  u/melissagraham/coadd_recreation_nb/20220610T184028Z
Removing:  u/melissagraham/coadd_recreation_nb/20220610T185057Z
Removing:  u/melissagraham/coadd_recreation_nb/20220610T190249Z
Removing:  u/melissagraham/coadd_recreation_nb/20220610T190623Z
Removing:  u/melissagraham/coadd_recreation_nb/20220615T205142Z
Removing:  u/melissagraham/coadd_recreation_nb/20220615T212442Z
Removing:  u/melissagraham/coadd_recreation_nb/20220622T190819Z
Removing:  u/melissagraham/coadd_recreation_nb/20220622T193047Z
Removing:  u/melissagraham/coadd_recreation_nb/20220622T194232Z
Removing:  u/melissagraham/coadd_recreation_nb/20220622T194815Z
Removing:  u/melissagraham/coadd_recreation_nb/20220622T195818Z
Removing:  u/melissagraham/coadd_recreation_nb/20220622T235340Z
Removing:  u/melissagraham/coadd_recreation_nb/20220623T023518Z
Removing:  u/melissagraham/coadd_recreation_nb/20220623T024126Z
Removing:  u/melissagraham/coadd_recreation_nb/20220715T214157Z
Removing:  u/melissagraham/coadd_recreat
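
Because removal cannot be undone, it may be worth previewing the loop before running it for real. The sketch below is an assumption-laden illustration, not what was run above: `remove_runs_one_by_one` is a hypothetical helper that takes the removal function as an argument (for example `tmpButler.removeRuns`) and only calls it when `dry_run` is switched off.

```python
def remove_runs_one_by_one(run_names, remove_runs, dry_run=True):
    """Call `remove_runs([name])` for each name; with dry_run, only report.

    Returns the list of names that were (or would have been) removed.
    """
    handled = []
    for name in sorted(run_names):
        print("Would remove:" if dry_run else "Removing:", name)
        if not dry_run:
            remove_runs([name])
        handled.append(name)
    return handled


# Dry run against stand-in names with a do-nothing removal function.
remove_runs_one_by_one(
    ["u/example/run_b", "u/example/run_a"],  # hypothetical names
    remove_runs=lambda names: None,
)
```

In this notebook the real call would be along the lines of `remove_runs_one_by_one(names, tmpButler.removeRuns, dry_run=False)`, where `names` is the filtered list of RUN collections.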

See? All gone:

In [17]:
for c in sorted(tmpButler.registry.queryCollections()):
    if c.find(my_outputCollection) > -1:
        print(c)

In [18]:
del tmpButler

<br>
<br>
<br>

## No, none of this was the way.

The explorations below did not lead to a working approach; they are kept here just in case.

From the sqlalche.me link, an "Integrity Error" is "Exception raised when the relational integrity of the database is affected, e.g. a foreign key check fails."

Let's explore more to try and figure out why I can't delete that collection.

Double check the type of collection it is.

In [None]:
# tmpButler.registry.getCollectionType('u/melissagraham/coadd_recreation_nb/20220610T171249Z')

Since this is a RUN collection, it should be removable according to the `Registry.removeCollection` documentation: https://pipelines.lsst.io/py-api/lsst.daf.butler.Registry.html#lsst.daf.butler.Registry.removeCollection

That documentation also specifies:
_"If this is a RUN collection, all datasets and quanta in it are also fully removed. This requires that those datasets be removed (or at least trashed) from any datastores that hold them first."_

OK, so we must first remove the datasets from the datastore: https://pipelines.lsst.io/modules/lsst.daf.butler/datastores.html

The above says the default configuration values can be inspected at `$DAF_BUTLER_DIR/python/lsst/daf/butler/configs` 

In [None]:
# os.system('ls $DAF_BUTLER_DIR/python/lsst/daf/butler/configs')

In [None]:
# filename_datastore_yaml = '$DAF_BUTLER_DIR/python/lsst/daf/butler/configs/datastore.yaml'

In [None]:
# os.system('more '+filename_datastore_yaml)
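
As an alternative to shelling out with `os.system`, the same directory can be inspected from Python. This is a minimal sketch under the assumption that `$DAF_BUTLER_DIR` is set in the environment and that the path quoted above exists; `list_default_configs` is a hypothetical helper and returns an empty list if either assumption fails.

```python
import os
from pathlib import Path


def list_default_configs(env_var="DAF_BUTLER_DIR"):
    """Return sorted YAML file names in the default butler config directory.

    Returns an empty list if the variable is unset or the directory is missing.
    """
    root = os.environ.get(env_var)
    if not root:
        return []
    config_dir = Path(root) / "python" / "lsst" / "daf" / "butler" / "configs"
    if not config_dir.is_dir():
        return []
    return sorted(p.name for p in config_dir.glob("*.yaml"))


print(list_default_configs())
```

Any file in the returned list can then be read with `Path.read_text()` instead of `more`.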

There is a `lsst.daf.butler.Datastore.remove` method, which will _"Indicate to the Datastore that a Dataset can be removed"_. https://pipelines.lsst.io/py-api/lsst.daf.butler.Datastore.html#lsst.daf.butler.Datastore.remove

The next step is to figure out how to use `Datastore.remove` on the datasets in my collection.

> **STOP HERE:** not sure that messing with the datastores is the way to go, check with Clare... 

In [None]:
# tmpDatastore = Datastore( ??? )

In [None]:
# tmpButler.datastore.remove( ??? )

In [None]:
# del tmpButler