Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Better types for table columns (VARCHAR instead of TEXT) #824

Closed
smacker opened this issue May 22, 2019 · 4 comments · Fixed by #916
Closed

Better types for table columns (VARCHAR instead of TEXT) #824

smacker opened this issue May 22, 2019 · 4 comments · Fixed by #916
Assignees
Labels
enhancement New feature or request

Comments

@smacker
Copy link

smacker commented May 22, 2019

Currently query

describe table commits;

produces

+---------------------+----------+
| name                | type     |
+---------------------+----------+
| repository_id       | TEXT     |
| commit_hash         | TEXT     |
| commit_author_name  | TEXT     |
| commit_author_email | TEXT     |
| commit_author_when  | DATETIME |
| committer_name      | TEXT     |
| committer_email     | TEXT     |
| committer_when      | DATETIME |
| commit_message      | TEXT     |
| tree_hash           | TEXT     |
| commit_parents      | JSON     |
+---------------------+----------+

Which is technically correct but columns like repository_id, commit_hash, commit_author_name, ... better be VARCHAR

By semantic TEXT mean huge chunk of text and due to it unique. While VARCHAR holds reasonable amount of text and often repeats.
No wonder people rely on it. For example Superset makes columns gropable and filterable when they are VARCHAR but doesn't do it for TEXT.

Source{d} product has a bug because of it. And it might be useful in some other use cases too.

@smola smola changed the title Better types for table columns Better types for table columns (VARCHAR instead of TEXT) May 27, 2019
@smola smola added the enhancement New feature or request label May 27, 2019
@juanjux juanjux self-assigned this Jun 3, 2019
@smacker
Copy link
Author

smacker commented Jun 27, 2019

any progress on it? Looks like the same issue produces lots of tracebacks during bootstrapping of srcd-ce (which don't break anything but annoying).

@juanjux
Copy link
Contributor

juanjux commented Jun 27, 2019

Some, but I paused it when transferred to solutions, since it's giving problems I'll retake it tomorrow.

@smacker
Copy link
Author

smacker commented Jun 27, 2019

Looks like not all of them produced by TEXT fields but the most anyway.
The full log of errors on bootstrapping: https://gist.github.com/smacker/855a224e46d123b426638c341fa681bf

@juanjux
Copy link
Contributor

juanjux commented Jun 28, 2019

Grepping, sorting, and uniq-ing the types on that log for easier reading:

blobs.blob_content
blobs.blob_hash
blobs.blob_size
blobs.repository_id
commit_blobs.blob_hash
commit_blobs.commit_hash
commit_blobs.repository_id
commit_files.blob_hash
commit_files.commit_hash
commit_files.file_path
commit_files.repository_id
commit_files.tree_hash
commits.commit_author_email
commits.commit_author_name
commits.commit_author_when
commits.commit_hash
commits.commit_message
commits.commit_parents
commits.committer_email
commits.committer_name
commits.committer_when
commits.repository_id
commits.tree_hash
commit_trees.commit_hash
commit_trees.repository_id
commit_trees.tree_hash
files.blob_content
files.blob_hash
files.blob_size
files.file_path
files.repository_id
files.tree_entry_mode
files.tree_hash
ref_commits.commit_hash
ref_commits.history_index
ref_commits.ref_name
ref_commits.repository_id
refs.commit_hash
refs.ref_name
refs.repository_id
remotes.remote_fetch_refspec
remotes.remote_fetch_url
remotes.remote_name
remotes.remote_push_refspec
remotes.remote_push_url
remotes.repository_id
repositories.repository_id
tree_entries.blob_hash
tree_entries.repository_id
tree_entries.tree_entry_mode
tree_entries.tree_entry_name
tree_entries.tree_hash

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants