New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DM-38824: use string not char so Postgress DDL is not char(1) #177
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you fix up a few development workflow issues with this PR?
- Update the title so that it starts with "DM-38824:" and is a bit less cryptic
- Write a short description of what was done (Why was the change made? How does it relate to the Jira ticket?)
- Add the sdm_schemas component on the Jira issue
This will help with the project bookkeeping.
The VOTable documentation says the following:
I generated the TAP schema records for this schema from your ticket branch and there is no difference between putting "string" or "char" for the datatype in the YAML file. Either way, the column datatype is "char" in the Columns table within the TAP Schema database. I think what you really need to do is something like:
Even if you use If you don't provide a length for these fields, then there is an error when generating the DDL. The "string" type does not actually map to an unbounded type in Felis - in MySQL, it uses VARCHAR, for instance. We would ideally use datatype of "text" for arbitrary length strings (though this is actually bugged right now in Felis and something I plan to fix in the future). |
when i generate posgress DDL from felis for this schema with string it comes out as varchar instead of char - which is what I wanted. I did not generate TAP schema - but if there is no change there and i get the correct DDL for postgress that seems like the correct solution ... |
I did some local testing on this branch and can confirm that Felis generates valid DDL for PG and creates the tables successfully. It fails in MySQL since VARCHAR requires a length in that database flavor whereas it is optional in PG. I'd still suggest putting length values on these fields so that it works in MySQL, too. This has been the general practice in other schemas, e.g., a length is still provided even when the VOTable arraysize is "*" to indicate unbounded. |
@womullan I'm having trouble figuring out what the end result you want in the database is: do you want |
i believe varchar is the easiest - we do not know the values most are not fixed |
These fields should have a length to follow the current practices in SDM Schemas. There aren't any cases in the existing YAML files where strings are defined without a length, and Felis does not handle this consistently right now. You can just put "length: 128" or "255," based on what you know about the maximum length of the data in these fields. This is something we plan to improve in the future so that variable length strings are handled properly across database flavors. |
So, if you want We are still working on some Felis issues associated with how string lengths are handled, so we may have to come back and clean something up in 2024. I would try to see if it works for you in Postgres without specifying
so that on output it doesn't appear to clients as a fixed-length string. This is one of the things we will clean up in 2024; we should be able to provide better defaults with a little work on Felis. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems acceptable now. In the future, the lengths can be taken out if "VARCHAR" with no length is the preferred data type definition in Postgres. (For now including the lengths makes this work with MySQL as well, which is preferable.)
This was already there for all the fields affected by this PR. I checked locally that the generated TAP schema in Postgres reflects this. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Three things:
-
Unless the standard is mandating the values of the TAP_SCHEMA
principal
flag that are in the file, I don't think they are set appropriately. If @womullan can confirm that they aren't being imposed externally, I will be happy to recommend -- in fact, even to make a commit with, an updated set. -
It looks like there's a Unicode character in the
category
column'sdescription
? Unfortunately the whole IVOA toolchain is not currently UTF-8-clean, and it's not trivial to clean it up (or the IVOA already would have - everyone is well aware of the problem) so I would strongly recommend not using anything other than 7-bit ASCII for now. -
It's "lost knowledge of the ancients" to some extent, as to why @brianv0 originally recommended prefixing
#
to all the@id
values. Because some of what Felis is doing in the YAML processing through JSON-LD is not well-understood to the current generation yet, I would recommend not taking any chances with this for now, and just put the#
in every@id
-- also for consistency/predictability with our usage elsewhere.
|
340640c
to
95b7715
Compare
95b7715
to
03c96f3
Compare
03c96f3
to
8726b03
Compare
Rebasing to sdm_schemas after the CI failures in other files were corrected. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. CI failures fixed.
per agreement - the schema for obsplan is put in sdm_schema