HPCC-19204 Spark-HPCC class(es) to store data on an RDD and a Dataframe to THOR #28
Conversation
@jpmcmu looks good. a few comments.
public class HpccFileWriter implements Serializable {
    static private final long serialVersionUID = 1L;
    static private final int DefaultExpiryTimeSecs = 300;
If this is the same as the fileaccess expiry, we should consider sharing the default.
HPCCFile will have static private final int DEFAULT_ACCESS_EXPIRY_SECONDS = 120;
Perhaps it belongs in wsclient?
I am not sure; we may want different expiry times for write vs read. I would imagine reads might benefit from longer expiration times. Maybe this is something we can review when we look into access token renewal?
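If the defaults do end up shared, one option is a small constants holder (perhaps in wsclient, per the suggestion above) with separate read and write values, as the reply proposes. This is a sketch only; the class and field names below are illustrative, not from this PR.

```java
// Hypothetical home for shared file-access expiry defaults, keeping
// separate read and write values as discussed in the review thread.
public final class FileAccessDefaults {
    // Reads may benefit from a different (e.g. longer) expiry than writes.
    public static final int DEFAULT_READ_ACCESS_EXPIRY_SECONDS = 120;
    public static final int DEFAULT_WRITE_ACCESS_EXPIRY_SECONDS = 300;

    private FileAccessDefaults() {} // constants only; no instances
}
```

Both HPCCFile and HpccFileWriter could then reference one definition instead of duplicating the literals.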
}

public enum HpccCluster {
    THOR("thor");
HPCCWsClient's getAvailableClusterNames might provide this set of clusters.
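As a sketch of the dynamic alternative, the target cluster could be validated against whatever list the wsclient reports at runtime instead of a fixed enum. The fetch below is stubbed with a literal list, since getAvailableClusterNames's actual signature is not shown in this PR; only the validation helper is illustrated.

```java
import java.util.Arrays;
import java.util.List;

// Sketch: validate a requested cluster name against a runtime-supplied list
// (in practice, the result of the wsclient's getAvailableClusterNames call).
public class ClusterNameCheck {
    static boolean isValidCluster(String name, List<String> available) {
        // Cluster names are compared case-insensitively here (an assumption).
        return available.stream().anyMatch(c -> c.equalsIgnoreCase(name));
    }

    public static void main(String[] args) {
        // Stand-in for the wsclient call; real deployments vary.
        List<String> clusters = Arrays.asList("thor", "roxie");
        System.out.println(isValidCluster("THOR", clusters)); // prints "true"
    }
}
```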
/**
 * HpccFileWriter Constructor
 * Attempts to open a connection to the specified HPCC cluster and validate the user.
Let's mention that the target host and port are the ECL Watch host and port.
 * HpccFileWriter Constructor
 * Attempts to open a connection to the specified HPCC cluster and validate the user.
 * @param connectionString of format {http|https}://{HOST}:{PORT}
 * @param user a valid account
ECL Watch account and password
 */
public HpccFileWriter(String connectionString, String user, String pass) throws Exception {
    // Verify connection & password
    Pattern connectionRegex = Pattern.compile("(http|https)://([^:]+):([0-9]+)", Pattern.CASE_INSENSITIVE);
Does it make sense to compile the regex once for the class? final static?
Updated this to make it final, but left it in the constructor because it is not used elsewhere in the class at the moment.
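For reference, the hoisted form the comment suggests would look roughly like this, compiling the pattern once per class load rather than on every constructor call (class and method names here are illustrative):

```java
import java.util.regex.Pattern;

// Sketch: connection-string pattern hoisted to a static final field so it is
// compiled once, with matches() requiring the whole string to conform.
public class ConnectionCheck {
    private static final Pattern CONNECTION_REGEX =
        Pattern.compile("(http|https)://([^:]+):([0-9]+)", Pattern.CASE_INSENSITIVE);

    static boolean isValidConnectionString(String cs) {
        return CONNECTION_REGEX.matcher(cs).matches();
    }
}
```

Keeping the pattern local is harmless while it has one call site; the static form mainly pays off if other methods start validating connection strings too.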
DFUFileCopyWrapper[] filePartCopies = fileParts[i].getCopies();
if (filePartCopies.length == 0) {
    abortFileCreation();
    throw new Exception("File creation error: File part: " + i + " does not have any file copies associated with it. Aborting write.");
i + 1?
Function2<Integer, Iterator<Row>, Iterator<FilePartWriteResults>> writeFunc =
    (Integer partitionIndex, Iterator<Row> it) -> {

InetAddress localAddress = getLocalAddress();
Is it possible localAddress.getHostAddress() could be equal to the 2nd NIC's address?
If so, perhaps we need an isLocalAddress(String ip) that iterates over all available NICs?
That would mean the system NIC order changed between getting the partition map and starting the write. If that were to happen, I think it would be better to bail.
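For the record, a minimal sketch of the proposed isLocalAddress(String ip) using java.net.NetworkInterface to walk every NIC and its addresses; shown as a standalone helper with error handling kept to a bare throws clause:

```java
import java.net.InetAddress;
import java.net.NetworkInterface;
import java.util.Collections;

// Sketch: check whether an IP string matches any address on any local NIC,
// rather than comparing against a single getLocalAddress() result.
public class NicCheck {
    static boolean isLocalAddress(String ip) throws Exception {
        for (NetworkInterface nic : Collections.list(NetworkInterface.getNetworkInterfaces())) {
            for (InetAddress addr : Collections.list(nic.getInetAddresses())) {
                if (addr.getHostAddress().equals(ip)) {
                    return true;
                }
            }
        }
        return false;
    }
}
```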
FilePartWriteResults result = new FilePartWriteResults();

// Make the parent dir if it does not exist
String fileWriteDir = filePartPath.substring(0, filePartPath.lastIndexOf('/'));
Consider using String separator = System.getProperty("path.separator");
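One caveat on that suggestion: in Java, "path.separator" is the PATH-list delimiter (':' on Unix, ';' on Windows), while the directory separator is "file.separator", exposed as java.io.File.separator. A portable sketch (helper name is illustrative) can sidestep the substring math entirely via File.getParent():

```java
import java.io.File;

// Sketch: compute the parent directory of a file-part path portably.
// File.getParent() uses the platform's file separator internally, so no
// hard-coded '/' or manual lastIndexOf arithmetic is needed.
public class SeparatorDemo {
    static String parentDir(String filePartPath) {
        String parent = new File(filePartPath).getParent();
        return parent == null ? "" : parent; // no parent component present
    }
}
```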
// calculated size includes the data size field & padding byte if present
// We want only the data size of the child array
dataSetSize -= 4;
let's create a final static int paddingsize = 4
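A sketch of the named-constant version (the constant name and helper are illustrative, not from the PR): the magic 4 becomes self-documenting at its use site.

```java
// Sketch: replace the literal 4 with a descriptive constant so the
// size adjustment explains itself.
public class ChildDatasetSize {
    // Bytes occupied by the size field prefixing a child dataset;
    // the calculated size includes it, the data-only size must not.
    static final int DATASET_SIZE_FIELD_LEN = 4;

    static long childDataSize(long calculatedSize) {
        return calculatedSize - DATASET_SIZE_FIELD_LEN;
    }
}
```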
        this.buffer.clear();
    }
}
Minor: this might need a newline at the end of the file.
@jpmcmu looks good.
@jpmcmu Please squash
…me to THOR - Created HpccFileWriter class that allows RDDs to be written to HPCC Signed-off-by: James McMullan <James.McMullan@lexisnexis.com>
@richardkchapman squashed & ready for merge
@rpastrana please review