Setting indexes in the sqlmodel tables automatically #105

nsheff · 2023-10-27T11:21:02Z

In databio/bbconf#19 I raised a point about how we were selecting based on a non-identifier column, and how this therefore needed an index.

I had to add the index manually on the Postgres server, which isn't ideal. It would be better if any necessary indexes could be specified in the JSON schema.

I'd recommend adding an index: true parameter option, and then when we read the schema, automatically setting the indexes on those columns.
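As a rough illustration of this proposal, the sketch below reads a schema in which some columns are marked `index: true` and creates the matching indexes. It uses Python's built-in `sqlite3` instead of Postgres so that it runs standalone. The schema layout, the table name `results`, the type mapping, and both helper functions are assumptions made for illustration, not pipestat's actual format or API.

```python
import sqlite3

# Hypothetical schema: the "index" key is the proposed option, not an existing one.
SCHEMA = {
    "sample_name": {"type": "string", "index": True},
    "score": {"type": "number"},
    "genome": {"type": "string", "index": True},
}

# Hypothetical mapping from schema type names to SQLite column types.
SQL_TYPES = {"string": "TEXT", "number": "REAL", "integer": "INTEGER"}


def indexed_columns(schema):
    """Return the names of columns whose definition sets index to True."""
    return [name for name, spec in schema.items() if spec.get("index") is True]


def create_table_with_indexes(conn, table, schema):
    """Create the table, then one index per column flagged in the schema."""
    cols = ", ".join(f'"{name}" {SQL_TYPES[spec["type"]]}' for name, spec in schema.items())
    conn.execute(f'CREATE TABLE "{table}" (id INTEGER PRIMARY KEY, {cols})')
    for name in indexed_columns(schema):
        conn.execute(f'CREATE INDEX "ix_{table}_{name}" ON "{table}" ("{name}")')


conn = sqlite3.connect(":memory:")
create_table_with_indexes(conn, "results", SCHEMA)

# List the indexes that now exist on the table.
index_names = sorted(
    row[0]
    for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'index' AND tbl_name = 'results'"
    )
)
print(index_names)
```

Only a literal `True` enables an index here, so a mistyped value such as the string `"true"` is ignored, not treated as a request for an index.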


donaldcampbelljr · 2023-10-27T13:26:33Z

Currently, id is the primary key:

 Field(default=None, primary_key=True),

For indexing other definitions, I believe the syntax is:

 Field(default=None, index=True)

So parsed_schema should be able to add this if it is detected in the output schema. This should happen here:

pipestat/pipestat/backends/db_backend/db_parsed_schema.py

Lines 135 to 164 in 6301b86

    def _make_field_definitions(self, data: Dict[str, Any], require_type: bool):
        # TODO: default to string if no type key?
        # TODO: parse "required" ?
        defs = {}
        for name, subdata in data.items():
            try:
                typename = subdata[SCHEMA_TYPE_KEY]
            except KeyError:
                if require_type:
                    _LOGGER.error(f"'{SCHEMA_TYPE_KEY}' is required for each schema element")
                    raise
                else:
                    data_type = str
            else:
                data_type = self._get_data_type(typename)
            if data_type == CLASSES_BY_TYPE["object"] or data_type == CLASSES_BY_TYPE["array"]:
                defs[name] = (
                    data_type,
                    Field(sa_column=Column(JSONB), default=null()),
                )
            else:
                defs[name] = (
                    # Optional[subdata[SCHEMA_TYPE_KEY]],
                    # subdata[SCHEMA_TYPE_KEY],
                    # Optional[str],
                    # CLASSES_BY_TYPE[subdata[SCHEMA_TYPE_KEY]],
                    data_type,
                    Field(default=subdata.get("default")),
                )
        return defs
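One way the final `else` branch could pick up the flag is to build the keyword arguments for `Field` from the schema element first. The sketch below shows only that step in plain Python so that it runs without sqlmodel installed. The helper name `field_kwargs` and the `"index"` schema key are hypothetical, and this is not necessarily how the change was eventually implemented.

```python
from typing import Any, Dict


def field_kwargs(subdata: Dict[str, Any]) -> Dict[str, Any]:
    """Build keyword arguments for Field from one schema element.

    The "index" key is an assumed schema option. Only a literal True
    enables indexing, so strings such as "true" are ignored.
    """
    kwargs: Dict[str, Any] = {"default": subdata.get("default")}
    if subdata.get("index") is True:
        kwargs["index"] = True
    return kwargs


# In _make_field_definitions the result would be passed on as
# Field(**field_kwargs(subdata)) in place of Field(default=subdata.get("default")).
print(field_kwargs({"type": "string", "index": True}))
print(field_kwargs({"type": "string", "index": "true"}))
```

Leaving `index` out of the arguments when it is not requested keeps the existing behaviour for schemas that never mention it.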

donaldcampbelljr · 2024-02-01T16:12:46Z

This functionality was added in v0.8.0. Closing this issue.

nsheff added the enhancement (New feature or request) and priority-low labels Oct 27, 2023

nsheff modified the milestone: v0.7.0 Oct 27, 2023

donaldcampbelljr added a commit that referenced this issue Jan 24, 2024

Solution and Test for #105

7311502

donaldcampbelljr mentioned this issue Jan 24, 2024

Set index if specified in output schema #141

Merged

donaldcampbelljr added this to the 0.8.0 milestone Jan 24, 2024

donaldcampbelljr added a commit that referenced this issue Jan 24, 2024

add docs and defensive check for bool #105

9bf6bca

donaldcampbelljr mentioned this issue Jan 24, 2024

v0.8.0 release #142

Merged

donaldcampbelljr closed this as completed Feb 1, 2024
