Timestamp formatting in index name breaks Cascading and Pig integrations #985
As described in the documentation:
What this boils down to is that you can use a pattern like
Cascading and Pig break when this formatting is applied in the index part of the name. A lot of logic in both Cascading and Pig expects the Tap or Load function to refer to a resource that can be resolved on HDFS. Generally this is not a problem, since the paths are processed and do not contain any host address. When the pattern is specified, however, the colon trips up the path parsing and causes the code to throw an exception.
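To make the failure mode concrete, here is a minimal sketch using only `java.net.URI`. It is not the actual Cascading, Pig, or Hadoop code; it assumes a parser that, like many URI-style path parsers, treats any text before a colon that precedes the first slash as a scheme. The class name, method names, and the resource strings are hypothetical and chosen purely for illustration.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper, not part of any library mentioned in this issue.
class ResourcePathSketch {

    // Assumed rule: text before a colon that comes ahead of the first
    // slash is interpreted as a URI scheme.
    static String detectScheme(String resource) {
        int colon = resource.indexOf(':');
        int slash = resource.indexOf('/');
        if (colon != -1 && (slash == -1 || colon < slash)) {
            return resource.substring(0, colon);
        }
        return null;
    }

    // Returns true if the resource survives being rebuilt as a URI
    // from the scheme and path detected above.
    static boolean parsesAsPath(String resource) {
        String scheme = detectScheme(resource);
        String path = (scheme == null)
                ? resource
                : resource.substring(scheme.length() + 1);
        try {
            new URI(scheme, null, path, null, null);
            return true;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Illustrative resource names only.
        String formatted = "logs-{@timestamp:yyyy.MM.dd}/event";
        String plain = "logs/event";

        System.out.println("scheme seen in formatted resource: " + detectScheme(formatted));
        System.out.println("formatted parses: " + parsesAsPath(formatted));
        System.out.println("plain parses: " + parsesAsPath(plain));
    }
}
```

Under that assumed rule, everything up to the colon in the formatted resource is mistaken for a scheme, the remainder is a relative path, and URI construction is rejected. The plain resource has no colon, so no scheme is detected and it parses normally.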
There's really no way to change this behavior, since much of the code lives on Cascading's and Pig's side, and when the input classes are queried they must respond with an absolute path to the resource. In Cascading's case this is low impact, since we could just return a placeholder and be on our way. Pig, however, requires the resource to be real, because it later passes the resource to the load function for loading, and modifying it in any way would prevent the load function from receiving the index and type correctly.
To get around this we will need to change the separator in the parsing of formatted patterns. In 6.0 only the pipe character "