Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DM-38824: use string not char so Postgress DDL is not char(1) #177

Merged
merged 3 commits into from Feb 26, 2024

Conversation

womullan
Copy link
Member

@womullan womullan commented Dec 15, 2023

per agreement - the schema for obsplan is put in sdm_schema

Copy link
Collaborator

@JeremyMcCormick JeremyMcCormick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@womullan

Can you fix up a few development workflow issues with this PR?

  • Update the title so that it starts with "DM-38824:" and is a bit less cryptic
  • Write a short description of what was done (Why was the change made? How does it relate to the Jira ticket?)
  • Add the sdm_schemas component on the Jira issue

This will help with the project bookkeeping.

@JeremyMcCormick
Copy link
Collaborator

JeremyMcCormick commented Dec 18, 2023

@womullan

The VOTable documentation says the following:

Strings, which are defined as a set of characters, can therefore be represented in VOTable as a fixed- or variable-length array of characters

I generated the TAP schema records for this schema from your ticket branch and there is no difference between putting "string" or "char" for the datatype in the YAML file. Either way, the column datatype is "char" in the Columns table within the TAP Schema database.

I think what you really need to do is something like:

datatype: char
length: 8

Even if you use votable:arraysize: "*" within the schema, a valid length still needs to be provided in the YAML file. (This is what the other schemas do.)

If you don't provide a length for these fields, then there is an error when generating the DDL. The "string" type does not actually map to an unbounded type in Felis - in MySQL, it uses VARCHAR, for instance. We would ideally use datatype of "text" for arbitrary length strings (though this is actually bugged right now in Felis and something I plan to fix in the future).

@womullan
Copy link
Member Author

when i generate posgress DDL from felis for this schema with string it comes out as varchar instead of char - which is what I wanted. I did not generate TAP schema - but if there is no change there and i get the correct DDL for postgress that seems like the correct solution ...

@JeremyMcCormick
Copy link
Collaborator

JeremyMcCormick commented Dec 18, 2023

when i generate posgress DDL from felis for this schema with string it comes out as varchar instead of char - which is what I wanted. I did not generate TAP schema - but if there is no change there and i get the correct DDL for postgress that seems like the correct solution ...

I did some local testing on this branch and can confirm that Felis generates valid DDL for PG and creates the tables successfully. It fails in MySQL since VARCHAR requires a length in that database flavor whereas it is optional in PG.

I'd still suggest putting length values on these fields so that it works in MySQL, too. This has been the general practice in other schemas, e.g., a length is still provided even when the VOTable arraysize is "*" to indicate unbounded.

@gpdf
Copy link
Collaborator

gpdf commented Dec 19, 2023

@womullan I'm having trouble figuring out what the end result you want in the database is: do you want CHAR or do you want VARCHAR?

@womullan
Copy link
Member Author

i believe varchar is the easiest - we do not know the values most are not fixed

@womullan womullan changed the title string not char "DM-38824: use string not char so Postgress DDL is not char(1) Dec 19, 2023
@JeremyMcCormick JeremyMcCormick changed the title "DM-38824: use string not char so Postgress DDL is not char(1) DM-38824: use string not char so Postgress DDL is not char(1) Dec 19, 2023
@JeremyMcCormick
Copy link
Collaborator

JeremyMcCormick commented Dec 19, 2023

i believe varchar is the easiest - we do not know the values most are not fixed

These fields should have a length to follow the current practices in SDM Schemas. There aren't any cases in the existing YAML files where strings are defined without a length, and Felis does not handle this consistently right now. You can just put "length: 128" or "255," based on what you know about the maximum length of the data in these fields.

This is something we plan to improve in the future so that variable length strings are handled properly across database flavors.

@gpdf
Copy link
Collaborator

gpdf commented Dec 19, 2023

So, if you want VARCHAR in the database, then use string as the Felis type. As long as you are working with Postgres as the back end, that should "just work".

We are still working on some Felis issues associated with how string lengths are handled, so we may have to come back and clean something up in 2024. I would try to see if it works for you in Postgres without specifying length at all. If you do find you need to specify a length, it has the semantics of "maximum length", not "exact length", so just try to keep it under 255 unless it really needs to be longer. If you do have to specify a length then for now you should also include

votable:arraysize: "*"

so that on output it doesn't appear to clients as a fixed-length string. This is one of the things we will clean up in 2024; we should be able to provide better defaults with a little work on Felis.

Copy link
Collaborator

@JeremyMcCormick JeremyMcCormick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems acceptable now. In the future, the lengths can be taken out if "VARCHAR" with no length is the preferred data type definition in Postgres. (For now including the lengths makes this work with MySQL as well, which is preferable.)

@JeremyMcCormick
Copy link
Collaborator

If you do have to specify a length then for now you should also include

votable:arraysize: "*"

This was already there for all the fields affected by this PR. I checked locally that the generated TAP schema in Postgres reflects this.

Copy link
Collaborator

@gpdf gpdf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Three things:

  1. Unless the standard is mandating the values of the TAP_SCHEMA principal flag that are in the file, I don't think they are set appropriately. If @womullan can confirm that they aren't being imposed externally, I will be happy to recommend -- in fact, even to make a commit with, an updated set.

  2. It looks like there's a Unicode character in the category column's description? Unfortunately the whole IVOA toolchain is not currently UTF-8-clean, and it's not trivial to clean it up (or the IVOA already would have - everyone is well aware of the problem) so I would strongly recommend not using anything other than 7-bit ASCII for now.

  3. It's "lost knowledge of the ancients" to some extent, as to why @brianv0 originally recommended prefixing # to all the @id values. Because some of what Felis is doing in the YAML processing through JSON-LD is not well-understood to the current generation yet, I would recommend not taking any chances with this for now, and just put the # in every @id -- also for consistency/predictability with our usage elsewhere.

@womullan
Copy link
Member Author

  1. please go ahead - i did not do that very conciously
  2. unicode char removed - it was some quote or such copied from the spec not needed.
  3. ugly but ok done

@gpdf
Copy link
Collaborator

gpdf commented Feb 26, 2024

Rebasing to sdm_schemas after the CI failures in other files were corrected.

Copy link
Collaborator

@gpdf gpdf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. CI failures fixed.

@womullan womullan merged commit 3511464 into main Feb 26, 2024
4 checks passed
@womullan womullan deleted the tickets/DM-38824 branch February 26, 2024 20:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants