Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cannot handle indexed columns with uppercase characters #25

Open
RodrigoCR opened this issue Aug 2, 2017 · 4 comments
Open

Cannot handle indexed columns with uppercase characters #25

RodrigoCR opened this issue Aug 2, 2017 · 4 comments

Comments

@RodrigoCR
Copy link

RodrigoCR commented Aug 2, 2017

I'm trying to copy data from a MS SQL server to a MYSQL server.

I'm following the standard usage example, just adding the included_tables parameter (to 1 table so I can test how much time and resources I will need to copy the remaining tables).

Everything seems to run fine (reading the schema, loading indexes and building query) but almost at the end of the process I get the following error:

ETLAlchemySource (INFO) - Unique columns are '[Column('IdCliente', INTEGER(display_width=11), table=<Cliente>, primary_key=True, nullable=False)]'
ETLAlchemySource (INFO) - Creating 'upsert' statements for '100000' rows, and dumping to 'Cliente.sql'.
Traceback (most recent call last):
  File "copy_with_etl.py", line 11, in <module>
    target.migrate()
  File "/usr/local/lib/python2.7/site-packages/etlalchemy/ETLAlchemyTarget.py", line 86, in migrate
    migrate_data=migrate_data)
  File "/usr/local/lib/python2.7/site-packages/etlalchemy/ETLAlchemySource.py", line 1140, in migrate
    pks, Session)
  File "/usr/local/lib/python2.7/site-packages/etlalchemy/ETLAlchemySource.py", line 840, in dump_data
    uid += str(row[self.current_ordered_table_columns.index(pk)])
ValueError: u'idcliente' is not in list

Why is it happening that the unique columns list is not empty but when reading, it tries to find a string and not an object?


Want to back this issue? Post a bounty on it! We accept bounties via Bountysource.

@simnim
Copy link

simnim commented Aug 4, 2017

The tool breaks when there are upper case characters in column names.

I started debugging the tool today for the same reason. The tool lower cases all the pk column names here ETLAlchemySource.py#L1034 but only has the original case column names in the self.current_ordered_table_columns. So naturally the index(pk) fails.

@RodrigoCR
Copy link
Author

@simnim so removing .lower() should work? Cause I can't change the origin table column names.

@seanharr11 seanharr11 changed the title Primary key not found in self.current_ordered_table_columns.index(pk) - ValueError: u'pkey' is not in list Cannot handle columns with uppercase characters Aug 28, 2017
@seanharr11
Copy link
Owner

The lower() calls were a quick-bandaid-fix related to case-sensitivity across databases.

For instance, take the following example when Oracle (Target) has case insensitive column names, and SQL Server (Source) has case sensitive column names.

Image what happens when you move the column IdCliente from SQL Server to Oracle, given your suggestion to remove lower() above. As far as ETLAlchemy knows, the column remains IdCliente in all the references throughout the tool. But....

Once the schema gets moved over, the column becomes just idcliente in the source's MetaData() when reflected via SQLAlchemy.

Now, when you try to move the data between the sources, there is no automagical mapping from source -> target (b/c the columns have different names!). The tool will try to find IdCliente in the table, but it does not exist with an uppercase "I" and "C". This was the reason for coercing columns to lowercase.

**Solution 1: ** The easy solution would be to either fix the description of this issue, and ensure that the self.current_ordered_table_columns list is a list of lowercase columns.

**Solution 2: ** The harder, and better solution is to write a wrapper around the various calls to SQLAlchemy's MetaData.tables['table_name'].columns.get('columnName') function to do a *case-insensitive search for a column in the MetaData. This way, we isolate the DRY out logic to one place in the code, while still supporting case sensitivity between 2 databases that are both case-sensitive (like MySQL and SQL Server).

Make sense? Thoughts?

@seanharr11 seanharr11 changed the title Cannot handle columns with uppercase characters Cannot handle indexed columns with uppercase characters Aug 28, 2017
@original-heart
Copy link

@seanharr11 how to set the table name with uppercase?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants