Spark sentry, revisited #1400
Conversation
…makes the code a little more future-proof for some anticipated BU improvements.
Looks pretty good to me; I just found a few places where we didn't roll back the method name properly. I'll make these changes myself.
listenbrainz_spark/ftp/download.py (Outdated)

     mapping_file_name = self.get_latest_mapping(mapping)

     t0 = time.monotonic()
-    current_app.logger.info('Downloading {} from FTP...'.format(mapping_file_name))
+    logging.info('Downloading {} from FTP...'.format(mapping_file_name))
log instances in this file haven't been changed from logging -> logger
listenbrainz_spark/ftp/download.py (Outdated)

@@ -145,9 +147,9 @@ def download_listens(self, directory, listens_dump_id=None, dump_type=FULL):
         listens_file_name (str): name of downloaded listens dump.
         dump_id (int): Unique indentifier of downloaded listens dump.
     """
-    ftp_cwd = current_app.config['FTP_LISTENS_DIR'] + 'fullexport/'
+    ftp_cwd = config.FTP_LISTENS_DIR + 'fullexport/'
should use os.path.join to prevent issues with forgetting to add a / in the config file
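For illustration, `os.path.join` inserts a separator only when one is missing, so the concatenation above becomes robust to a trailing slash in the config value (the paths below are hypothetical stand-ins for the config entries):

```python
import os.path

# Hypothetical config values, with and without a trailing slash:
with_slash = 'ftp/listens/'
without_slash = 'ftp/listens'

# os.path.join adds a separator only when one is missing, so both
# spellings of the config value produce the same path:
print(os.path.join(with_slash, 'fullexport'))     # ftp/listens/fullexport
print(os.path.join(without_slash, 'fullexport'))  # ftp/listens/fullexport
```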
-    current_app.logger.info('Some ratings are less than -1 \nMin rating: {}'.format(min_rating))
+    logger.info('Some ratings are less than -1 \nMin rating: {}'.format(min_rating))

     return (max_rating > 1.0, min_rating < -1.0)
add the return type to the docstring
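A sketch of what the updated docstring could look like; the function name, argument names, and wording are assumptions modeled on the `return (max_rating > 1.0, min_rating < -1.0)` line above, not the project's actual code:

```python
def ratings_out_of_range(max_rating, min_rating):
    """Check whether ratings fall outside the expected [-1.0, 1.0] range.

    Args:
        max_rating (float): largest rating observed.
        min_rating (float): smallest rating observed.

    Returns:
        tuple(bool, bool): (True if max_rating exceeds 1.0,
                            True if min_rating is below -1.0).
    """
    return (max_rating > 1.0, min_rating < -1.0)
```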
@@ -93,15 +91,15 @@ def push_to_result_queue(self, messages):
         avg_size_of_message //= num_of_messages
     except ZeroDivisionError:
         avg_size_of_message = 0
-        current_app.logger.warn("No messages calculated", exc_info=True)
+        logging.warn("No messages calculated", exc_info=True)
warn is deprecated, replace with warning
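For reference, `Logger.warn` is a deprecated alias of `Logger.warning` in the standard library. A minimal sketch of the preferred call, using a throwaway logger writing to an in-memory stream (the logger name and message are placeholders, not the project's configuration):

```python
import io
import logging

# Throwaway logger for demonstration; names are placeholders.
stream = io.StringIO()
logger = logging.getLogger("demo")
logger.addHandler(logging.StreamHandler(stream))
logger.setLevel(logging.WARNING)

# Preferred spelling: warning(). warn() still works but is a
# deprecated alias that may emit a DeprecationWarning.
logger.warning("No messages calculated")
print(stream.getvalue().strip())  # No messages calculated
```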
this uses logging, replace with logger
@@ -14,5 +14,7 @@ time ./run.sh /usr/local/spark/bin/spark-submit \
     --conf "spark.driver.memoryOverhead"=$DRIVER_MEMORY_OVERHEAD \
     --conf "spark.executor.memoryOverhead"=$EXECUTOR_MEMORY_OVERHEAD \
     --conf "spark.driver.maxResultSize"=$DRIVER_MAX_RESULT_SIZE \
+    --conf "spark.python.use.daemon"=true \
+    --conf "spark.python.daemon.module"=sentry_daemon \
indentation in this file is all over the place, we can fix it up
Fine by me!
This PR replaces #1289 by adding an intermediate logging object that should make future BU improvements easier to implement. This PR also fixes merge conflicts.
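A minimal sketch of what such an intermediate logging object could look like: a module-level logger that call sites import instead of reaching for `current_app.logger` directly, so the backend can be swapped later without editing every call site (module and function names here are assumptions, not the PR's actual code):

```python
import logging

# Hypothetical shared logger: call sites do `from <this module> import logger`
# rather than touching current_app.logger, decoupling them from Flask.
logger = logging.getLogger("listenbrainz_spark")

def init_loggers(debug=False):
    """Configure the shared logger once at process startup."""
    level = logging.DEBUG if debug else logging.INFO
    logging.basicConfig(
        format="%(asctime)s %(name)s %(levelname)s %(message)s",
        level=level,
    )
    logger.setLevel(level)
```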