This repository has been archived by the owner on May 16, 2022. It is now read-only.

Extract modules: Configuration, Statistics, Link Cache, Storage #49

Closed


StormyDragon

Hi. I apologize if this seems sudden and out of the blue, or like a lot of work. I was hoping I might offer my help with a bit of an uplift.

In preparation for containerization, a little cleanup is required.

  • The configuration module is extracted. It keeps the previous configuration behavior and adds the option to configure via environment variables.
  • The statistics module is extracted to separate the non-functional JSON variant from the current MongoDB one.
  • The link cache module is extracted, providing the previous JSON behavior as well as MongoDB, selectable through the configuration.
  • The storage module is extracted to handle the current media download folder. In addition, the storage location is now configurable.

Next steps:

  • Use a Python packaging tool for virtual environments; I suggest Poetry.
  • Create a Dockerfile and emulate the existing deployment via a docker-compose file.
  • Implement alternative cache, storage and statistics modules.
  • Create a deployment script that allows serverless deployment using the Docker container.
    • I've focused on the area I know best, which is the Google Cloud Platform.
    • Cloud Run handles the service.
    • Firestore as the database.
    • Cloud Storage as the media store (with the possibility of automatic expiry).

(jk. I've got all of the above ready.)

Future:

  • This should scale up quite significantly without much overhead; it might even run fine under Google's free tier.
  • There is potential for hang-ups during media downloads. One proposal is to use Task Queue to run a download-to-storage module while letting the request either complete immediately or await completion.
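The Task Queue proposal can be prototyped locally before wiring up a managed queue. The sketch below uses Python's `concurrent.futures` as a stand-in for the queue (an assumption; this is not Google's Task Queue API), and `download_to_storage` and `handle_request` are hypothetical names. The handler enqueues the download and then either returns at once or waits up to a timeout, so a slow download cannot hang the request indefinitely.

```python
import concurrent.futures
import time

# Stand-in for a managed task queue: a small local worker pool.
# This is NOT the Google Task Queue API; it only illustrates the pattern.
_executor = concurrent.futures.ThreadPoolExecutor(max_workers=4)


def download_to_storage(url: str) -> str:
    """Hypothetical job: fetch media and write it to storage, returning its location."""
    time.sleep(0.1)  # placeholder for the actual network download
    return "media/" + url.rsplit("/", 1)[-1]


def handle_request(url: str, wait: bool, timeout: float = 5.0) -> dict:
    """Enqueue the download, then either return immediately or await completion."""
    future = _executor.submit(download_to_storage, url)
    if not wait:
        # Let the request complete now; the download continues in the background.
        return {"status": "queued"}
    try:
        return {"status": "done", "location": future.result(timeout=timeout)}
    except concurrent.futures.TimeoutError:
        # Still running: report that instead of blocking the request further.
        return {"status": "pending"}
```

With a real queue, the "pending" case would typically be resolved by the client polling or by a later request finding the file already in storage.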

@daisyUniverse
Owner

This is an insane amount of work, and it's all very impressive. I'm sorry I only just now saw it (I have been taking a break from TwitFix and from working on it for a while).

This would require a huge amount of testing and poking and prodding before I could start to consider merging it, as it makes a huge number of changes. Since I'm taking a break, I can't promise that process will start any time soon.

I really appreciate the amount of work that you have put into it, and for now I would actually suggest forking this off into its own project, as there is a LOT to go over here.

@StormyDragon
Author

Hey, no worries. I understand that it might seem daunting, and I'm not going to rush a review. Take care of yourself first; that is the most important thing. I will, however, express my disappointment at not being equipped to make pull requests against that codebase.


I'll keep the rest of the changes on my fork for now. Maybe I'll extract the Poetry change into a separate pull request, since that is easy enough.

If it's of use, I can write some testing code too, as well as a GitHub Action that verifies that things still work.

This pull request

I realize that even this first pull request does a lot, and if it would make things easier I can extract each individual part into a separate pull request. My decisions here were based on an eventual future in which each system (links, file cache, statistics) can be handled by a different external system; untangling these from the main source is what gives rise to the separate modules.

The intent of this first pull request was to find the places in the code where a choice is made between using the MongoDB database or a JSON file to store links and stats:

if link_cache_system == "db":
    ... # mongo stuff
elif link_cache_system == "json":
    ... # json stuff
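A minimal sketch of the idea behind the extraction, assuming a plain dict config and hypothetical class names (`JsonLinkCache`, `MongoLinkCache`) rather than the pull request's actual code: the `if/elif` choice is made once, in a factory, and the rest of the application only talks to the object it returns.

```python
from typing import Callable, Dict, Optional


class JsonLinkCache:
    """Hypothetical JSON-backed link cache (storage details omitted)."""

    def get_link(self, url: str) -> Optional[dict]:
        ...


class MongoLinkCache:
    """Hypothetical MongoDB-backed link cache (storage details omitted)."""

    def get_link(self, url: str) -> Optional[dict]:
        ...


# Each config value maps to a backend constructor, replacing scattered if/elif checks.
LINK_CACHE_BACKENDS: Dict[str, Callable[[], object]] = {
    "json": JsonLinkCache,
    "db": MongoLinkCache,
}


def make_link_cache(config: dict):
    name = config["link_cache"]
    backend = LINK_CACHE_BACKENDS.get(name)
    if backend is None:
        raise ValueError(f"unknown link_cache backend: {name!r}")
    return backend()
```

Adding an alternative backend (for example, one for Firestore) would then mean registering one more entry instead of editing every call site.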

Link Cache

The link cache was the first to be pulled out; it is selected with config.link_cache.
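For the JSON variant, a link cache can be as small as a dict persisted to a file. This is a sketch under assumptions, not TwitFix's actual implementation: the `links.json` file name, the `get_link`/`add_link` method names and the stored fields are hypothetical.

```python
import json
from pathlib import Path
from typing import Optional


class JsonLinkCache:
    """Hypothetical link cache that persists a {url: data} mapping as JSON."""

    def __init__(self, path: str = "links.json"):
        self.path = Path(path)
        # Load existing entries, or start empty if the file does not exist yet.
        if self.path.exists():
            self.links = json.loads(self.path.read_text())
        else:
            self.links = {}

    def get_link(self, url: str) -> Optional[dict]:
        return self.links.get(url)

    def add_link(self, url: str, data: dict) -> None:
        self.links[url] = data
        # Rewrites the whole file on every insert: simple, but not safe for concurrent writers.
        self.path.write_text(json.dumps(self.links))
```

The whole-file rewrite is one reason a database-backed cache scales better once several containers share the same links.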

Stat Cache

I noticed that statistics were intertwined with the link cache, and separated these calls as well.
The variable config.link_cache is reused because the stat calls previously just used the Mongo client.
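Because the stat calls key off the same `config.link_cache` value, the statistics side could be sketched as follows. Assumptions: `NullStats` mirrors the non-functional JSON variant by doing nothing, `MongoStats` receives an injected pymongo-style collection and uses only `update_one(filter, update, upsert=True)`, and the `"stats"` document id and counter names are hypothetical.

```python
from typing import Protocol


class Stats(Protocol):
    def increment(self, field: str) -> None: ...


class NullStats:
    """Stand-in for the non-functional JSON variant: stat calls are no-ops."""

    def increment(self, field: str) -> None:
        pass


class MongoStats:
    """Keeps counters in one document of a pymongo-style collection."""

    def __init__(self, collection):
        self.collection = collection

    def increment(self, field: str) -> None:
        # $inc with upsert=True creates the counter document on first use.
        self.collection.update_one({"_id": "stats"}, {"$inc": {field: 1}}, upsert=True)


def make_stats(config: dict, collection=None) -> Stats:
    # Reuses the link_cache setting, since stats previously rode on the Mongo client.
    if config["link_cache"] == "db":
        return MongoStats(collection)
    return NullStats()
```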

Storage Module

I also noticed the download cache for files and decided to separate it as well.
It is configured with config.storage_module, plus config.download_base when the former is set to local.
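A sketch of the storage selection, assuming `storage_module` takes the value `"local"` for the existing media folder and `download_base` names that folder; the `LocalStorage` class, its `save` method and `make_storage` are hypothetical.

```python
from pathlib import Path


class LocalStorage:
    """Hypothetical storage backend that writes media files under download_base."""

    def __init__(self, download_base: str):
        self.base = Path(download_base)
        self.base.mkdir(parents=True, exist_ok=True)

    def save(self, name: str, content: bytes) -> str:
        # Keep only the final path component so a name cannot escape the base folder.
        target = self.base / Path(name).name
        target.write_bytes(content)
        return str(target)


def make_storage(config: dict):
    if config["storage_module"] == "local":
        # download_base is only consulted for the local backend.
        return LocalStorage(config["download_base"])
    raise ValueError(f"unsupported storage_module: {config['storage_module']!r}")
```

A Cloud Storage backend would expose the same `save` method, so callers would not need to know where the media ends up.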

Configuration

I got a little carried away and pulled configuration out into a module as well; this should really have been its own pull request. It behaves as before, opening the JSON config or writing a default one, unless the environment contains TWITFIX_CONFIG_FROM=environment, in which case it reads the entire configuration from the environment and ignores the config file.
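A minimal sketch of that loading order. Only `TWITFIX_CONFIG_FROM=environment` comes from the description; the `config.json` file name, the `DEFAULT_CONFIG` keys and values, the `TWITFIX_<KEY>` naming for individual settings, and the fallback to defaults for unset variables are assumptions for illustration.

```python
import json
import os
from pathlib import Path

# Hypothetical defaults; the real configuration has more keys.
DEFAULT_CONFIG = {
    "link_cache": "json",
    "storage_module": "local",
    "download_base": "./static",
}


def load_config(path: str = "config.json", environ=None) -> dict:
    env = os.environ if environ is None else environ
    if env.get("TWITFIX_CONFIG_FROM") == "environment":
        # Read every setting from the environment and ignore the file entirely.
        # Assumption: unset variables fall back to the defaults.
        return {key: env.get(f"TWITFIX_{key.upper()}", default)
                for key, default in DEFAULT_CONFIG.items()}
    config_file = Path(path)
    if not config_file.exists():
        # Previous behavior: write a default config file when none exists.
        config_file.write_text(json.dumps(DEFAULT_CONFIG, indent=2))
        return dict(DEFAULT_CONFIG)
    return json.loads(config_file.read_text())
```

Reading everything from the environment suits containers, where settings are usually injected at deploy time rather than baked into a file.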

StormyDragon deleted the feature/feature-cleanup branch May 20, 2022 20:50
2 participants