Cannot specify multiple column families for a new HBase table that shc creates on the fly #121
Comments
The fix is in #114, which has been merged into branch 2.1. Could you please share the stack trace?
@weiqingy : Here is the stack trace. Thanks.
It was supposed to work. I think your case is similar to the test cases here, but those test cases pass. Is there any difference between your catalog definition and the one in the test case?
@weiqingy: The catalog definition used all primitive types, so nothing different, I don't think. I re-pulled from the remote origin and retried (last evening), and this time it appeared to work OK. After my first pull earlier in the week, I had noticed that I didn't see #114 in the git history via tig; after this re-pull, I do see it. I'm not sure why it wouldn't have come down in the first pull, since it's timestamped April 2nd, but that appears to be the case. Something only made it appear in the master of 2.1 sometime between my first pull and yesterday. Thanks.
@khampson Great! Then let's close this issue. Thanks. :)
When writing out a table to HBase from a dataframe with the following code:
```scala
df.write
  .options(Map(
    HBaseTableCatalog.tableCatalog -> HbaseCommon.catalogMitLicResults(WriteTableName),
    HBaseTableCatalog.newTable -> HbaseNumRegions))
  .format("org.apache.spark.sql.execution.datasources.hbase")
  .save()
```
It works OK if I define all columns in the catalog in the same column family, say `i`. But I decided I wanted to put a couple of columns in a different column family, as they were larger fields that weren't necessarily needed all the time, so I wanted to avoid pulling them in on a regular scan unless specifically requested.

However, when running with this catalog definition in place (all columns but two in cf `i`, and two columns in another cf `m`), the Spark job failed with:

```
org.apache.hadoop.hbase.regionserver.NoSuchColumnFamilyException: Column family m does not exist in region
```

So it would appear that shc is not carrying through the column family definition that is outlined in the catalog. Thoughts?
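For reference, the kind of catalog I'm describing looks roughly like the sketch below. This is illustrative only: the table, column, and field names are made up, and the real catalog comes from our `HbaseCommon.catalogMitLicResults` helper. The point is simply that `cf` differs per column (`i` for most columns, `m` for the larger ones), while `rowkey` is the special family shc reserves for the row key.

```scala
// Hypothetical catalog string for shc with two column families.
// All names here are placeholders; only the JSON shape matters.
val catalog =
  s"""{
     |  "table":  {"namespace": "default", "name": "mit_lic_results"},
     |  "rowkey": "key",
     |  "columns": {
     |    "key":     {"cf": "rowkey", "col": "key",     "type": "string"},
     |    "name":    {"cf": "i",      "col": "name",    "type": "string"},
     |    "count":   {"cf": "i",      "col": "count",   "type": "int"},
     |    "bigText": {"cf": "m",      "col": "bigText", "type": "string"}
     |  }
     |}""".stripMargin
```

With `HBaseTableCatalog.newTable` set, I'd expect shc to create the table with both the `i` and `m` families; one way to check what actually got created is `describe 'mit_lic_results'` in the HBase shell.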
Thanks!