DM-38824: use string not char so Postgress DDL is not char(1) #177

womullan · 2023-12-15T15:33:53Z

per agreement - the schema for obsplan is put in sdm_schema

JeremyMcCormick

@womullan

Can you fix up a few development workflow issues with this PR?

Update the title so that it starts with "DM-38824:" and is a bit less cryptic
Write a short description of what was done (Why was the change made? How does it relate to the Jira ticket?)
Add the sdm_schemas component on the Jira issue

This will help with the project bookkeeping.

JeremyMcCormick · 2023-12-18T21:44:34Z

@womullan

The VOTable documentation says the following:

Strings, which are defined as a set of characters, can therefore be represented in VOTable as a fixed- or variable-length array of characters

I generated the TAP schema records for this schema from your ticket branch and there is no difference between putting "string" or "char" for the datatype in the YAML file. Either way, the column datatype is "char" in the Columns table within the TAP Schema database.

I think what you really need to do is something like:

datatype: char
length: 8

Even if you use votable:arraysize: "*" within the schema, a valid length still needs to be provided in the YAML file. (This is what the other schemas do.)

If you don't provide a length for these fields, then there is an error when generating the DDL. The "string" type does not actually map to an unbounded type in Felis - in MySQL, it uses VARCHAR, for instance. We would ideally use datatype of "text" for arbitrary length strings (though this is actually bugged right now in Felis and something I plan to fix in the future).

womullan · 2023-12-18T21:54:02Z

when i generate posgress DDL from felis for this schema with string it comes out as varchar instead of char - which is what I wanted. I did not generate TAP schema - but if there is no change there and i get the correct DDL for postgress that seems like the correct solution ...

JeremyMcCormick · 2023-12-18T22:45:03Z

when i generate posgress DDL from felis for this schema with string it comes out as varchar instead of char - which is what I wanted. I did not generate TAP schema - but if there is no change there and i get the correct DDL for postgress that seems like the correct solution ...

I did some local testing on this branch and can confirm that Felis generates valid DDL for PG and creates the tables successfully. It fails in MySQL since VARCHAR requires a length in that database flavor whereas it is optional in PG.

I'd still suggest putting length values on these fields so that it works in MySQL, too. This has been the general practice in other schemas, e.g., a length is still provided even when the VOTable arraysize is "*" to indicate unbounded.

gpdf · 2023-12-19T00:40:57Z

@womullan I'm having trouble figuring out what the end result you want in the database is: do you want CHAR or do you want VARCHAR?

womullan · 2023-12-19T15:14:49Z

i believe varchar is the easiest - we do not know the values most are not fixed

JeremyMcCormick · 2023-12-19T22:24:19Z

i believe varchar is the easiest - we do not know the values most are not fixed

These fields should have a length to follow the current practices in SDM Schemas. There aren't any cases in the existing YAML files where strings are defined without a length, and Felis does not handle this consistently right now. You can just put "length: 128" or "255," based on what you know about the maximum length of the data in these fields.

This is something we plan to improve in the future so that variable length strings are handled properly across database flavors.

gpdf · 2023-12-19T22:44:01Z

So, if you want VARCHAR in the database, then use string as the Felis type. As long as you are working with Postgres as the back end, that should "just work".

We are still working on some Felis issues associated with how string lengths are handled, so we may have to come back and clean something up in 2024. I would try to see if it works for you in Postgres without specifying length at all. If you do find you need to specify a length, it has the semantics of "maximum length", not "exact length", so just try to keep it under 255 unless it really needs to be longer. If you do have to specify a length then for now you should also include

votable:arraysize: "*"

so that on output it doesn't appear to clients as a fixed-length string. This is one of the things we will clean up in 2024; we should be able to provide better defaults with a little work on Felis.

JeremyMcCormick

This seems acceptable now. In the future, the lengths can be taken out if "VARCHAR" with no length is the preferred data type definition in Postgres. (For now including the lengths makes this work with MySQL as well, which is preferable.)

JeremyMcCormick · 2023-12-19T23:36:17Z

If you do have to specify a length then for now you should also include

votable:arraysize: "*"

This was already there for all the fields affected by this PR. I checked locally that the generated TAP schema in Postgres reflects this.

gpdf

Three things:

Unless the standard is mandating the values of the TAP_SCHEMA principal flag that are in the file, I don't think they are set appropriately. If @womullan can confirm that they aren't being imposed externally, I will be happy to recommend -- in fact, even to make a commit with, an updated set.
It looks like there's a Unicode character in the category column's description? Unfortunately the whole IVOA toolchain is not currently UTF-8-clean, and it's not trivial to clean it up (or the IVOA already would have - everyone is well aware of the problem) so I would strongly recommend not using anything other than 7-bit ASCII for now.
It's "lost knowledge of the ancients" to some extent, as to why @brianv0 originally recommended prefixing # to all the @id values. Because some of what Felis is doing in the YAML processing through JSON-LD is not well-understood to the current generation yet, I would recommend not taking any chances with this for now, and just put the # in every @id -- also for consistency/predictability with our usage elsewhere.

womullan · 2023-12-20T21:45:50Z

please go ahead - i did not do that very conciously
unicode char removed - it was some quote or such copied from the spec not needed.
ugly but ok done

gpdf · 2024-02-26T20:06:11Z

Rebasing to sdm_schemas after the CI failures in other files were corrected.

gpdf

Looks good. CI failures fixed.

JeremyMcCormick self-requested a review December 18, 2023 20:05

JeremyMcCormick requested changes Dec 18, 2023

View reviewed changes

womullan changed the title ~~string not char~~ "DM-38824: use string not char so Postgress DDL is not char(1) Dec 19, 2023

JeremyMcCormick changed the title ~~"DM-38824: use string not char so Postgress DDL is not char(1)~~ DM-38824: use string not char so Postgress DDL is not char(1) Dec 19, 2023

JeremyMcCormick self-requested a review December 19, 2023 22:04

JeremyMcCormick approved these changes Dec 19, 2023

View reviewed changes

gpdf requested changes Dec 20, 2023

View reviewed changes

womullan force-pushed the tickets/DM-38824 branch from 340640c to 95b7715 Compare December 21, 2023 00:12

womullan force-pushed the tickets/DM-38824 branch from 95b7715 to 03c96f3 Compare February 26, 2024 13:11

womullan added 3 commits February 26, 2024 12:05

string not char

c50ef1f

add length so MySql DDL still funcitons

e038737

GPDF comments

8726b03

gpdf force-pushed the tickets/DM-38824 branch from 03c96f3 to 8726b03 Compare February 26, 2024 20:05

gpdf approved these changes Feb 26, 2024

View reviewed changes

womullan merged commit 3511464 into main Feb 26, 2024
4 checks passed

womullan deleted the tickets/DM-38824 branch February 26, 2024 20:15

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

DM-38824: use string not char so Postgress DDL is not char(1) #177

DM-38824: use string not char so Postgress DDL is not char(1) #177

womullan commented Dec 15, 2023 •

edited

JeremyMcCormick left a comment •

edited by womullan

JeremyMcCormick commented Dec 18, 2023 •

edited

womullan commented Dec 18, 2023

JeremyMcCormick commented Dec 18, 2023 •

edited

gpdf commented Dec 19, 2023

womullan commented Dec 19, 2023

JeremyMcCormick commented Dec 19, 2023 •

edited

gpdf commented Dec 19, 2023

JeremyMcCormick left a comment

JeremyMcCormick commented Dec 19, 2023

gpdf left a comment •

edited

womullan commented Dec 20, 2023

gpdf commented Feb 26, 2024

gpdf left a comment

DM-38824: use string not char so Postgress DDL is not char(1) #177

DM-38824: use string not char so Postgress DDL is not char(1) #177

Conversation

womullan commented Dec 15, 2023 • edited

JeremyMcCormick left a comment • edited by womullan

Choose a reason for hiding this comment

JeremyMcCormick commented Dec 18, 2023 • edited

womullan commented Dec 18, 2023

JeremyMcCormick commented Dec 18, 2023 • edited

gpdf commented Dec 19, 2023

womullan commented Dec 19, 2023

JeremyMcCormick commented Dec 19, 2023 • edited

gpdf commented Dec 19, 2023

JeremyMcCormick left a comment

Choose a reason for hiding this comment

JeremyMcCormick commented Dec 19, 2023

gpdf left a comment • edited

Choose a reason for hiding this comment

womullan commented Dec 20, 2023

gpdf commented Feb 26, 2024

gpdf left a comment

Choose a reason for hiding this comment

womullan commented Dec 15, 2023 •

edited

JeremyMcCormick left a comment •

edited by womullan

JeremyMcCormick commented Dec 18, 2023 •

edited

JeremyMcCormick commented Dec 18, 2023 •

edited

JeremyMcCormick commented Dec 19, 2023 •

edited

gpdf left a comment •

edited