
custom metadata value being displayed instead of title for file #156

Open
Nithyavasudevan opened this issue Aug 31, 2014 · 22 comments

Comments

@Nithyavasudevan

A value given for a custom metadata key is being displayed as the title of a file. This file was created directly through the API. The title is stored correctly in the Title field, and "name" is stored as a custom metadata key with the value "bond".

Below is a screenshot of the data created using the test tool directly on the platform:
[screenshot]

Note that the file is listed incorrectly within the dataset:

[screenshot]

Note that no custom metadata is listed. I checked all these values in the platform database, and they appear to be stored correctly.

@adamamyl
Contributor

adamamyl commented Sep 1, 2014

Difficulties with CTPEC using Name/Title/Description for names/labels are probably involved here.

@joetsoi
Contributor

joetsoi commented Sep 4, 2014

It looks like we just have some bad test data here; the dataset wasn't cleared out before this new one was collected.

If you look at the dataset name (in the second screenshot), it ends in ba92, which would be the final four characters of the dataset ID.

In the first screenshot, 'ba92' does not appear anywhere in the dataset ID, which means the two screenshots are completely unrelated. Now that we have repointed to UAT, it will be worth retesting to see whether this is still a problem.

@adamamyl
Contributor

adamamyl commented Sep 9, 2014

@Nithyavasudevan, can you re-test this once we have clean data (post MS deploy, post our reset)?

@Nithyavasudevan
Author

Will do. What was the actual fix, please? That way I understand what to test and also the impact.

@adamamyl
Contributor

adamamyl commented Sep 9, 2014

@Nithyavasudevan
Author

We will discuss this on the call.

@Nithyavasudevan
Author

Note: this is not bad data. Note the file name in the first and second screenshots:
[screenshot]
[screenshot]

@amercader
Member

OK, this was more complex than expected, and the potential fix might not be entirely satisfying. Because of the way CKAN handles custom fields in resources, any field not part of the main schema will be added to the main object before creating / updating, e.g.:

{
    "Title": "Some file",
    "Quality": 1,
    "Metadata": {
        "custom_field": "custom value"
    }
   ...
}

Gets translated into this CKAN equivalent:

{
    "name": "Some file",
    "quality": 1,
    "custom_field": "custom value",
    ...
}

Note that the platform field Title is called name in CKAN. If a custom field uses this key, it will incorrectly replace the Title value:

{
    "Title": "Some file",
    "Quality": 1,
    "Metadata": {
        "name": "custom value"
    }
   ...
}

will turn into:

{
    "name": "custom value",
    "quality": 1,
    ...
}
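To make the collision concrete, here is a minimal Python sketch of the naive translation described above. The function `to_ckan_naive` and the `FIELD_MAP` dict are hypothetical illustrations, not CKAN's or the platform's actual code; only the Title → name and Quality → quality mappings come from the examples in this thread, and lower-casing any other core key is an assumption of the sketch.

```python
# Hypothetical mapping from platform field names to CKAN field names.
# Only these two pairs are taken from the examples in this thread.
FIELD_MAP = {"Title": "name", "Quality": "quality"}


def to_ckan_naive(platform_obj):
    """Naively translate a platform object into a flat CKAN-style dict.

    Core fields are renamed via FIELD_MAP (other keys are lower-cased,
    an assumption for this sketch), then every custom field under
    "Metadata" is copied to the top level of the same dict.
    """
    ckan = {}
    for key, value in platform_obj.items():
        if key == "Metadata":
            continue
        ckan[FIELD_MAP.get(key, key.lower())] = value
    for key, value in platform_obj.get("Metadata", {}).items():
        # No clash check: a custom "name" silently overwrites the Title.
        ckan[key] = value
    return ckan


ok = to_ckan_naive({"Title": "Some file", "Quality": 1,
                    "Metadata": {"custom_field": "custom value"}})
clash = to_ckan_naive({"Title": "Some file", "Quality": 1,
                       "Metadata": {"name": "custom value"}})
print(ok)     # {'name': 'Some file', 'quality': 1, 'custom_field': 'custom value'}
print(clash)  # {'name': 'custom value', 'quality': 1}
```

The second call shows the bug reported in this issue: the Title value "Some file" is gone because the custom "name" key was written over it.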

We can prevent custom fields with reserved keywords from being imported (e.g. ignore any custom field with a name key). This is a good idea anyway, to prevent fields like url, type, etc. from being changed via custom fields, but it means that in some edge cases the same dataset will have different custom fields on the platform and on CKAN. This is the only compromise we can apply at this point.
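The proposed compromise (ignore any custom field whose key is reserved) can be sketched in Python as follows. `RESERVED_KEYS` is an illustrative subset built from the fields named in this thread (name, quality, url, type), not the agreed list, and `merge_custom_fields` is a hypothetical helper rather than the actual harvester code.

```python
# Illustrative subset only; not the full agreed list of reserved keys.
RESERVED_KEYS = frozenset({"name", "quality", "url", "type"})


def merge_custom_fields(core, custom, reserved=RESERVED_KEYS):
    """Copy custom fields into a copy of the core dict, skipping reserved keys.

    Returns the merged dict and the list of custom keys that were ignored.
    """
    merged = dict(core)
    ignored = []
    for key, value in custom.items():
        if key in reserved:
            # Reserved key: drop it instead of overwriting the core field.
            ignored.append(key)
            continue
        merged[key] = value
    return merged, ignored


merged, ignored = merge_custom_fields(
    {"name": "Some file", "quality": 1},
    {"name": "custom value", "custom_field": "x"},
)
print(merged)   # {'name': 'Some file', 'quality': 1, 'custom_field': 'x'}
print(ignored)  # ['name']
```

The keys returned in `ignored` correspond to the edge cases where the platform and CKAN end up with different custom fields.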

@Nithyavasudevan
Author

@amercader when you say prevent it from being imported, do you mean stopping users from creating these on CKAN?

@stevenlivz

So this happens at ALL levels for custom metadata. Everything in the hierarchy (as it may be a serialized lightweight object rather than a single key/value pair) is normalized to a scalar key/value dictionary list??

Even with a basic approach to using namespaces "custommetadata.name" which is a 5 minute job, it seems this will not work as everything is normalized to a single key/value list. Very limiting indeed if the case. You'd need to convert to "level.custommetadata.name" as the key to get near to this working well.

@stevenlivz

My question: you can't use "name" in the custom metadata because it clashes with the core metadata, which I get.

However, within the metadata, if i use the key "testkey" for two different objects serialized in the JSON, will i only get the value for one of them? e.g.

{
    "title": "steven",
    "Metadata": {
        "source": {
            "email": "a@b.com",
            "phone": "999"
        },
        "approver": {
            "email": "c@d.com",
            "phone": "111"
        }
    }
}

@amercader
Member

> So this happens at ALL levels for custom metadata. Everything in the hierarchy (as it may be a serialized lightweight object rather than a single key/value pair) is normalized to a scalar key/value dictionary list??

> However, within the metadata, if i use the key "testkey" for two different objects serialized in the JSON, will i only get the value for one of them? e.g.

I'm afraid I don't understand what you mean in either of your comments.

Custom field values are serialized as a string, so if you input a JSON object, it will get serialized like this:

"title": "steven",
"approver": "{\"email\":\"c@d.com\", \"phone\":\"111\"}",
"source": "{\"email\":\"a@b.com\", \"phone\":\"999\"}",
...

See eg:

https://ckanfrontend.cloudapp.net/dataset/dataset-uat-9-9

https://ckanfrontend.cloudapp.net/dataset/dataset-uat-9-9/resource/82e0f207-3dc0-4600-8bdf-33c9157fcf79

> Even with a basic approach to using namespaces "custommetadata.name" which is a 5 minute job, it seems this will not work as everything is normalized to a single key/value list. Very limiting indeed if the case. You'd need to convert to "level.custommetadata.name" as the key to get near to this working well.

As I mentioned before, this can be done, but it is not a 5-minute job. I don't think we would need the level.custommetadata.name approach at all because, as I mentioned, all values get serialized as strings.
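A minimal Python sketch of the string serialization described above, assuming custom values that are objects or lists are stored via `json.dumps` (the exact spacing CKAN produces may differ). `serialize_custom_value` is a hypothetical helper. Because each nested object becomes a single string value, the `email` and `phone` keys inside `source` and `approver` cannot collide with each other or with top-level keys.

```python
import json


def serialize_custom_value(value):
    """Return the string stored for a custom field value (assumed behaviour).

    Strings are kept as-is; dicts, lists and other JSON types are dumped
    to JSON text so that nested keys stay inside the value.
    """
    if isinstance(value, str):
        return value
    return json.dumps(value)


metadata = {
    "source": {"email": "a@b.com", "phone": "999"},
    "approver": {"email": "c@d.com", "phone": "111"},
}
extras = {key: serialize_custom_value(value) for key, value in metadata.items()}

print(extras["source"])    # {"email": "a@b.com", "phone": "999"}
print(extras["approver"])  # {"email": "c@d.com", "phone": "111"}
# The nested values can be recovered unchanged.
print(json.loads(extras["approver"])["email"])  # c@d.com
```

Only the top-level keys of the custom metadata ("source", "approver") can ever clash with core fields; everything below them travels inside the string.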

@stevenlivz

Glad to hear it is serialized as a string - this is what I was hoping.

I was actually going by your example, where you said "any field not part of the main schema will be added to the main object before creating / updating", which doesn't sound technically accurate given your statement above that "Custom field values are serialized as a string".

You said:
{
    "Title": "Some file",
    "Quality": 1,
    "Metadata": {
        "name": "custom value"
    }
    ...
}

goes to

{
    "name": "custom value",
    "quality": 1,
    ...
}

By extension, if everything is stored in a single dictionary, that could mean that:

{
    "Title": "Some file",
    "Quality": 1,
    "Metadata": {
        "name": "custom value",
        "moredata": {
            "name": "other custom value"
        }
    }
    ...
}

would go to something like:

{
    "name": "custom value",  <- or "other custom value" ??
    "quality": 1,
    ...
}

@amercader
Member

On your example:

{
    "Title": "Some file",
    "Quality": 1,
    "Metadata": {
        "name": "custom value",
        "moredata": {
            "name": "other custom value"
        }
    }
    ...
}

will get transformed into:

{
    "name": "custom value",
    "quality": 1,
    "moredata": "{\"name\": \"other custom value\"}",
    ...
}

We obviously need to prevent "name": "custom value" (which is what this issue was originally about), but name keys inside nested dicts in values should not be a problem.

@stevenlivz

Lovely, that looks great, Adria. Thanks. @Nithyavasudevan, we will just need to check this scenario, but it is what I was hoping for 👍

amercader added a commit that referenced this issue Sep 10, 2014
This is for resources. For datasets it's implemented in CKAN core (2.2.1, see ckan/ckan#1894).
amercader added a commit that referenced this issue Sep 10, 2014
This is for datasets and resources. This should not happen if they were created via CKAN, as there is validation in place, but for datasets and files created elsewhere, if the custom field has a key which exists in the CKAN schema, it will be ignored.
@amercader
Member

@Nithyavasudevan, @stevenlivz As agreed, these two features are ready for testing:

  1. On the dataset and resource forms, a validation error is shown if one of the custom field keys exists in the core schema (e.g. name or quality); see the image below.
  2. When importing datasets and resources via changelog harvesting, any custom field which has a key that exists in the core schema is ignored.

[screenshot]

@adamamyl
Contributor

🌟

@stevenlivz

The reserved words are listed here: https://gist.github.com/amercader/ef1d24fc63ad4e308277

@Nithyavasudevan
Author

Based on our agreement that this list is OK, we can close this. The list will need to be documented in TFS for future reference.

@Nithyavasudevan
Author

The error messages still need to be checked.
I created a dataset with the custom metadata fields save, before, _before and private, but the error message I got was "Extras: There is a schema field with the same name".
[screenshot]

[screenshot]

@mcginns

mcginns commented Sep 17, 2014

Can we tidy up the error message presented to the user in this situation, please?
Something like:

Sorry you cannot use key for Custom Field. Please choose an alternative.

...or whatever fits in with the standard wording used for other user errors in CKAN, as long as it's a bit more meaningful than the current error.
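A sketch of a friendlier validation message along the lines suggested above, in Python. The wording follows the suggestion in this thread with the offending key substituted in; `custom_field_errors` and `RESERVED_KEYS` are hypothetical, and the key set is only an illustrative subset of core fields (the agreed list is the one in the reserved-words gist).

```python
# Illustrative subset of core schema fields; not the agreed list.
RESERVED_KEYS = frozenset({"name", "quality", "url", "type", "private"})


def custom_field_errors(custom_keys, reserved=RESERVED_KEYS):
    """Return one user-facing message per custom key that clashes with a core field."""
    return [
        f'Sorry, you cannot use "{key}" for a Custom Field. '
        "Please choose an alternative."
        for key in custom_keys
        if key in reserved
    ]


for message in custom_field_errors(["private", "colour", "name"]):
    print(message)
# Sorry, you cannot use "private" for a Custom Field. Please choose an alternative.
# Sorry, you cannot use "name" for a Custom Field. Please choose an alternative.
```

Naming the offending key in each message tells the user exactly which field to rename, which the generic "There is a schema field with the same name" text does not.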

@mcginns

mcginns commented Sep 25, 2014

Is anyone looking at tidying up the error message on this to make it a bit more user-friendly?


6 participants