Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

Already on GitHub? Sign in to your account

Basic implementation of backup media store #2538

Merged
merged 29 commits into from Oct 13, 2017

Conversation

Projects
None yet
3 participants
Owner

erikjohnston commented Oct 12, 2017

No description provided.

erikjohnston added some commits Oct 12, 2017

Is the intention for this to have some kind of repo failover?

Owner

erikjohnston commented Oct 12, 2017

Yup, just having all files backed up somewhere else so that it can be manually failed over if necessary

Would it be possible to have that be automatic in a way, printing a plethora of warnings? Probably doesn't need to be all that complex: try reading from primary, if that fails, try the backup, continue on with whatever result.

erikjohnston added some commits Oct 12, 2017

Owner

erikjohnston commented Oct 12, 2017

@turt2live its possible, though I'm afraid its not something I'll be doing now, though it'd be neat

file_id[0:2], file_id[2:4], file_id[4:],
file_name
)
+ remote_media_thumbnail = _wrap_in_base_path(remote_media_thumbnail_rel)
+
def remote_media_thumbnail_dir(self, server_name, file_id):
@richvdh

richvdh Oct 12, 2017

Member

should this get a _rel for consistency?

@erikjohnston

erikjohnston Oct 13, 2017

Owner

This actually returns absolute paths (now..)

media_id[0:2], media_id[2:4], media_id[4:],
)
+ url_cache_filepath = _wrap_in_base_path(url_cache_filepath_rel)
+
def url_cache_filepath_dirs_to_delete(self, media_id):
@richvdh

richvdh Oct 12, 2017

Member

_rel? or at least clarify it in the comments?

@erikjohnston

erikjohnston Oct 13, 2017

Owner

This actually returns absolute paths (now..)

media_id[0:2], media_id[2:4], media_id[4:],
file_name
)
+ url_cache_thumbnail = _wrap_in_base_path(url_cache_thumbnail_rel)
+
def url_cache_thumbnail_directory(self, media_id):
@erikjohnston

erikjohnston Oct 13, 2017

Owner

This actually returns absolute paths (now..)

+ self.filepaths = MediaFilePaths(self.primary_base_path)
+
+ self.backup_base_path = None
+ if hs.config.backup_media_store_path:
@richvdh

richvdh Oct 12, 2017

Member

I'm missing why this is needed?

@@ -87,18 +96,77 @@ def _makedirs(filepath):
if not os.path.exists(dirname):
os.makedirs(dirname)
+ @staticmethod
+ def _write_file_synchronously(source, fname, close_source=False):
@richvdh

richvdh Oct 12, 2017

Member

please doc the arg types

+ shutil.copyfileobj(source, f)
+
+ if close_source:
+ source.close()
@richvdh

richvdh Oct 12, 2017

Member

this needs to be in a finally block imho

@richvdh

richvdh Oct 12, 2017

Member

Though I have to say that having the close happen in the same function as the open generally seems less confusing to me

@erikjohnston

erikjohnston Oct 13, 2017

Owner

Yes, it is very sad, but I don't know how else to do it since the writing to the backup can happen asynchronously, and so may last longer than the context which calls it.

@erikjohnston

erikjohnston Oct 13, 2017

Owner

(The other option is to rely on the GC closing the file, but that doesn't sound great either)

@erikjohnston

erikjohnston Oct 13, 2017

Owner

I guess we could copy from the file we wrote into the primary media store, but thats a bit sad making in terms of having to reload that file back into memory.

+ source.close()
+
+ @defer.inlineCallbacks
+ def write_to_file(self, source, path):
@richvdh

richvdh Oct 12, 2017

Member

can you call this something like write_to_file_and_backup? I kept confusing myself over whether it did the backup or not.

+
+ Args:
+ source: A file like object that should be written
+ path: Relative path to write file to
@richvdh

richvdh Oct 12, 2017

Member

path (str): ...

+ path: Relative path to write file to
+
+ Returns:
+ string: the file path written to in the primary media store
@richvdh

richvdh Oct 12, 2017

Member

Deferred[str]

+ string: the file path written to in the primary media store
+ """
+ fname = os.path.join(self.primary_base_path, path)
+ self._makedirs(fname)
@richvdh

richvdh Oct 12, 2017

Member

shouldn't this be done in _write_file_synchronously in case it blocks?

+ self._makedirs(fname)
+
+ # Write to the main repository
+ yield preserve_context_over_fn(
@richvdh

richvdh Oct 12, 2017

Member

preserve_context_over_fn is broken. I think you want make_deferred_yieldable.

Can I suggest that you write a write_file_asynchronously which factors out this and the other instance below?

+ def copy_to_backup(self, source, path):
+ """Copy file like object source to the backup media store, if configured.
+
+ Will close source after its done.
@richvdh

richvdh Oct 12, 2017

Member

doc the args please

+ # We can either wait for successful writing to the backup repository
+ # or write in the background and immediately return
+ if self.synchronous_backup_media_store:
+ yield preserve_context_over_fn(
@richvdh

richvdh Oct 12, 2017

Member

broken, as above

+ )
+ else:
+ source.close()
+
@defer.inlineCallbacks
def create_content(self, media_type, upload_name, content, content_length,
@richvdh

richvdh Oct 12, 2017

Member

if we're changing the type of content, let's take the opportunity to doc it.

Especially if we're going to close the stream (which, as above, I am unconvinced about).

+
+ t_len = os.path.getsize(output_path)
+
+ yield self.store.store_local_thumbnail_rel(
@richvdh

richvdh Oct 12, 2017

Member

_rel looks incorrect here, and makes me worry that this hasn't been well tested.

- if r_method == "scale":
- t_width, t_height = thumbnailer.aspect(r_width, r_height)
- scales.add((
- min(m_width, t_width), min(m_height, t_height), r_type,
@richvdh

richvdh Oct 12, 2017

Member

isn't the deduplication here important?

- t_len = thumbnailer.crop(t_path, t_width, t_height, t_type)
- local_thumbnails.append((
- media_id, t_width, t_height, t_type, t_method, t_len
+ r_width, r_height, r_method, r_type, t_byte_source
@richvdh

richvdh Oct 12, 2017

Member

Isn't this going to result in us having all the thumbnails in memory at once? can't we do the write_to_file here (and for that matter, the store_local_thumbnail)?

- if r_method == "scale":
- t_width, t_height = thumbnailer.aspect(r_width, r_height)
- scales.add((
- min(m_width, t_width), min(m_height, t_height), r_type,
@richvdh

richvdh Oct 12, 2017

Member

again, dedup?

- ])
+
+ remote_thumbnails.append((
+ r_width, r_height, r_method, r_type, t_byte_source
@richvdh

richvdh Oct 12, 2017

Member

again, memory usage

erikjohnston added some commits Oct 13, 2017

@erikjohnston erikjohnston assigned erikjohnston and unassigned richvdh Oct 13, 2017

"""Write `source` to the on disk media store, and also the backup store
if configured.
Will close source once finished.
Args:
source: A file like object that should be written
- path: Relative path to write file to
+ path(str): Relative path to write file to
@richvdh

richvdh Oct 13, 2017

Member

for future ref, I prefer a space between path and (str), but no biggie

if t_byte_source:
- output_path = yield self.write_to_file(
+ t_width, t_height = t_byte_source.dimensions
@richvdh

richvdh Oct 13, 2017

Member

I'm confused.

  • t_byte_source is the result from _generate_thumbnail
  • _generate_thumbnail doesn't document its return type (grr) but apparently it's the result from thumbnailer.crop or thumbnailer.scale
  • crop and scale return an ordinary BytesIO (at least according to their docs)
  • BytesIO doesn't have a dimensions attribute

either something's lying, or you (still) haven't tested this. Or I'm being a total crank.

@richvdh

richvdh Oct 13, 2017

Member

why is this change happening, anyway?

@erikjohnston

erikjohnston Oct 13, 2017

Owner

Yes, this was a spurious change, sowwy.

In other news, I've been looking at getting sytest to test both dynamic and static config, and it turns out that this dynamic code path doesn't work at all on master....

I think we should look into fixing/removing this functionality, but that should probably wait for a separate PR.

@erikjohnston

erikjohnston Oct 13, 2017

Owner

In other news, I've been looking at getting sytest to test both dynamic and static config, and it turns out that this dynamic code path doesn't work at all on master....

After massaging sytest a bit I've exercised that code path in a way that returns 200, so it seems no more broken than before, at least.

if t_byte_source:
- output_path = yield self.write_to_file(
+ t_width, t_height = t_byte_source.dimensions
+ t_width, t_height = thumbnailer.aspect(r_width, r_height)
+ t_width = min(m_width, t_width)
+ t_height = min(m_height, t_height)
+ thumbnails[(t_width, t_height)] = (r_method, r_type)
@richvdh

richvdh Oct 13, 2017

Member

shouldn't r_type be in the key rather than the value?

erikjohnston added some commits Oct 13, 2017

+ Args:
+ source: A file like object to be written
+ fname (str): Path to write to
+ close_source (bool): Whether to close source after writing
+ ))
+
+ if t_byte_source:
+ output_path = yield self.write_to_file_and_backup(
@richvdh

richvdh Oct 13, 2017

Member

looks like t_byte_source isn't being closed any more. Needs more with or finally.

+ ))
+
+ if t_byte_source:
+ output_path = yield self.write_to_file_and_backup(

erikjohnston added some commits Oct 13, 2017

lgtm

@erikjohnston erikjohnston merged commit db3d84f into develop Oct 13, 2017

6 of 8 checks passed

Sytest Dendron (Commit) Build #2790 origin/erikj/media_backup failed in 3 min 58 sec
Details
Sytest Dendron (Merged PR) Build finished.
Details
Sytest Postgres (Commit) Build #3626 origin/erikj/media_backup succeeded in 3 min 23 sec
Details
Sytest Postgres (Merged PR) Build finished.
Details
Sytest SQLite (Commit) Build #3727 origin/erikj/media_backup succeeded in 3 min 16 sec
Details
Sytest SQLite (Merged PR) Build finished.
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
continuous-integration/travis-ci/push The Travis CI build passed
Details

@erikjohnston erikjohnston deleted the erikj/media_backup branch Oct 26, 2017

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment