Add DID data type to transfer and deletion events #4557

Closed
dchristidis opened this issue Apr 20, 2021 · 7 comments

Comments

@dchristidis
Contributor

Motivation

This will improve our monitoring capabilities, allowing for better understanding of data flows.

Modification

Enrich the transfer and deletion events with the DID data type. It’s fine if it’s unset or some communities do not use it at all. We should carefully evaluate the performance cost.

@cserf
Contributor

cserf commented Nov 16, 2021

🤔
I'm not sure I understand this issue. Deletions and transfers always apply to files. Can you elaborate on what is needed?

@dchristidis
Contributor Author

This came up as a request from the sites; they would like to be able to identify what kind of files are being deleted. On the DDM Dashboard, it would be useful to be able to group by data type. For example, mc15_pPb8TeV:EVNT.18626196._000899.pool.root.1 should appear as EVNT, and mc16_13TeV:HITS.26797763._039273.pool.root.1 should appear as HITS (in short, the datatype from the file’s metadata).

@bari12
Member

bari12 commented Dec 10, 2021

I think it would be much easier if this is added to the monitoring/dashboard pipeline. Adding the datatype hardcoded to the events is not very elegant for Rucio, since that field has no meaning outside ATLAS/CMS. So we would have to do something configurable. I wonder if it is just easier to address this in the monitoring pipeline.

@rcarpa
Contributor

rcarpa commented Mar 1, 2022

What was the decision about this one?

@bari12
Member

bari12 commented Mar 2, 2022

I don't remember discussing this anymore. @dchristidis @cserf any info?
Adding it to the deletion message is not that problematic, I guess.

@maany
Member

maany commented May 25, 2022

I support the idea of adding this to the monitoring pipeline. @dchristidis, do you prefer to get this info on some dashboards or from Rucio directly?

@bari12
Member

bari12 commented May 25, 2022

Cedric and I just discussed this: it should be the `datatype` field from the `dids` table. Deriving it from the file name alone is not consistent.
It should probably also be added for the submitter.

Let's evaluate where we can most efficiently get the did information pulled in. Join in an existing query?
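One way to read "join in an existing query" is sketched below: the query that already fetches the replicas to delete does a LEFT OUTER JOIN on the `dids` table, so the datatype arrives with each work item at no extra round trip. This is a minimal sketch assuming a heavily simplified, hypothetical schema (the real Rucio tables have many more columns and use RSE ids rather than names); the function name `deletion_candidates_with_datatype`, the RSE `SITE_A_DATADISK` and the DID `user.jdoe:ntuple.root` are made up for illustration. SQLite stands in for the real database.

```python
import sqlite3
from typing import Dict, List, Optional

# Hypothetical, heavily simplified tables: only the columns the join needs.
SCHEMA = """
CREATE TABLE dids (
    scope TEXT NOT NULL,
    name TEXT NOT NULL,
    datatype TEXT,  -- may be NULL: not every community sets it
    PRIMARY KEY (scope, name)
);
CREATE TABLE replicas (
    scope TEXT NOT NULL,
    name TEXT NOT NULL,
    rse TEXT NOT NULL,
    PRIMARY KEY (scope, name, rse)
);
"""


def deletion_candidates_with_datatype(
    conn: sqlite3.Connection, rse: str
) -> List[Dict[str, Optional[str]]]:
    """Fetch replicas on `rse` together with their DID datatype in one query.

    The LEFT OUTER JOIN keeps replicas whose DID has no datatype (or no
    matching dids row); those come back with datatype = None.
    """
    rows = conn.execute(
        """
        SELECT r.scope, r.name, d.datatype
        FROM replicas AS r
        LEFT OUTER JOIN dids AS d ON d.scope = r.scope AND d.name = r.name
        WHERE r.rse = ?
        ORDER BY r.scope, r.name
        """,
        (rse,),
    ).fetchall()
    return [{"scope": s, "name": n, "datatype": dt} for s, n, dt in rows]


# Small in-memory demo with one typed and one untyped DID.
conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany(
    "INSERT INTO dids VALUES (?, ?, ?)",
    [
        ("mc16_13TeV", "HITS.26797763._039273.pool.root.1", "HITS"),
        ("user.jdoe", "ntuple.root", None),  # hypothetical DID without a datatype
    ],
)
conn.executemany(
    "INSERT INTO replicas VALUES (?, ?, ?)",
    [
        ("mc16_13TeV", "HITS.26797763._039273.pool.root.1", "SITE_A_DATADISK"),
        ("user.jdoe", "ntuple.root", "SITE_A_DATADISK"),
    ],
)
events = deletion_candidates_with_datatype(conn, "SITE_A_DATADISK")
for event in events:
    print(event)
```

The outer join matters here because, as noted in the issue description, the datatype may be unset or unused by some communities; an inner join would silently drop those replicas from the deletion work.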

rcarpa added a commit to rcarpa/rucio that referenced this issue Jun 14, 2022
The commit takes quite a dirty approach of retrieving the `datatype`
directly via a SQL query when preparing the message. There is
no easy way to improve that:
- In all code paths, there is a lot of logic between the moment
when we retrieve the work queue from the database and the moment
when we send the message. Forwarding the datatype through the entire
call stack would make the code more complicated.
- We cannot import from core.dids here, because it creates a circular
import problem, so avoiding a raw database call by reusing the existing
get_metadata calls is not easily achievable.
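The approach described in this commit message (look up the `datatype` with a direct query at the moment the message is built) can be sketched roughly as follows. This is not the code from the commit: the helper names `lookup_datatype` and `build_deletion_event`, the event name `deletion-done` and the payload keys are assumptions for illustration, and SQLite with a one-table schema stands in for the real `dids` table.

```python
import json
import sqlite3
from typing import Any, Dict, Optional


def lookup_datatype(conn: sqlite3.Connection, scope: str, name: str) -> Optional[str]:
    """Return the DID's datatype, or None if it is unset or the DID is unknown."""
    row = conn.execute(
        "SELECT datatype FROM dids WHERE scope = ? AND name = ?",
        (scope, name),
    ).fetchone()
    return row[0] if row is not None else None


def build_deletion_event(
    conn: sqlite3.Connection, scope: str, name: str, rse: str, nbytes: int
) -> Dict[str, Any]:
    """Assemble a deletion event, looking up the datatype just before sending.

    Costs one extra primary-key lookup per event, but nothing upstream in the
    call stack has to carry the datatype along.
    """
    return {
        "event_type": "deletion-done",  # illustrative event name (assumption)
        "payload": {
            "scope": scope,
            "name": name,
            "rse": rse,
            "bytes": nbytes,
            "datatype": lookup_datatype(conn, scope, name),
        },
    }


# In-memory demo: one DID with a datatype, one lookup for a DID that is absent.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dids (scope TEXT, name TEXT, datatype TEXT, PRIMARY KEY (scope, name))"
)
conn.execute(
    "INSERT INTO dids VALUES (?, ?, ?)",
    ("mc15_pPb8TeV", "EVNT.18626196._000899.pool.root.1", "EVNT"),
)

known = build_deletion_event(
    conn, "mc15_pPb8TeV", "EVNT.18626196._000899.pool.root.1", "SITE_A_DATADISK", 1024
)
unknown = build_deletion_event(conn, "mc15_pPb8TeV", "missing.file", "SITE_A_DATADISK", 0)
print(json.dumps(known["payload"]))
```

The trade-off is the one raised in the issue description: this keeps the change local to message preparation, at the price of one additional query per event, whereas joining the `dids` table into the query that fetches the work queue would avoid the per-event cost but touch more code paths.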
rcarpa added a commit to rcarpa/rucio that referenced this issue Jul 1, 2022
bari12 closed this as completed in efb9624 Jul 1, 2022
bari12 pushed a commit that referenced this issue Jul 1, 2022
bari12 added this to the 1.28.7 milestone Jul 1, 2022