Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use a Class To Pass Around Repository Host Data #3338

Merged
merged 12 commits into from Mar 27, 2024

Conversation

mikeyoung85
Copy link
Contributor

This PR changes the RepositoryHost classes to map their upstream data to a defined Class and pass that Class around instead of unstructured Hash objects. This will help clear up what data is available in the different spots where we are sending upstream repository data around the app. I would like to follow up on this PR with some more changes and add some more util classes that would be responsible for assigning Repository column data to reduce the number of spots where that is happening right now and also split out some of the logic around managing duplicate Repositories that currently exists in the RepositoryHost and Repository classes.

Copy link
Contributor

@mpace965 mpace965 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This pattern is a nice improvement, had one idea about the implementation

Comment on lines 8 to 55
class RepositoryHost::RepositoryHostData
attr_reader :archived, :default_branch, :description, :fork, :full_name, :has_issues, :has_wiki, :homepage, :host_type,
:keywords, :language, :license, :name, :owner, :parent, :is_private, :repository_uuid, :scm, :size

def initialize(
archived:,
default_branch:,
description:,
fork:,
full_name:,
has_issues:,
has_wiki:,
homepage:,
host_type:,
keywords:,
language:,
license:,
name:,
owner:,
parent:,
is_private:,
repository_uuid:,
scm:,
size:,
**kwargs
)
@archived = archived
@default_branch = default_branch
@description = description
@fork = fork
@full_name = full_name
@has_issues = has_issues
@has_wiki = has_wiki
@homepage = homepage
@host_type = host_type
@keywords = keywords
@language = language
@license = license
@name = name
@owner = owner
@parent = parent
@private = is_private
@repository_uuid = repository_uuid
@scm = scm
@size = size

raise "Unexpected arguments sent: #{kwargs.keys.join(', ')}" unless kwargs.keys.empty?
end
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
class RepositoryHost::RepositoryHostData
attr_reader :archived, :default_branch, :description, :fork, :full_name, :has_issues, :has_wiki, :homepage, :host_type,
:keywords, :language, :license, :name, :owner, :parent, :is_private, :repository_uuid, :scm, :size
def initialize(
archived:,
default_branch:,
description:,
fork:,
full_name:,
has_issues:,
has_wiki:,
homepage:,
host_type:,
keywords:,
language:,
license:,
name:,
owner:,
parent:,
is_private:,
repository_uuid:,
scm:,
size:,
**kwargs
)
@archived = archived
@default_branch = default_branch
@description = description
@fork = fork
@full_name = full_name
@has_issues = has_issues
@has_wiki = has_wiki
@homepage = homepage
@host_type = host_type
@keywords = keywords
@language = language
@license = license
@name = name
@owner = owner
@parent = parent
@private = is_private
@repository_uuid = repository_uuid
@scm = scm
@size = size
raise "Unexpected arguments sent: #{kwargs.keys.join(', ')}" unless kwargs.keys.empty?
end
RepositoryHost::RepositoryHostData = Struct.new(
:archived, :default_branch, :description, :fork, :full_name, :has_issues, :has_wiki, :homepage, :host_type,
:keywords, :language, :license, :name, :owner, :parent, :is_private, :repository_uuid, :scm, :size,
keyword_init: true
) do

This could be DRYed up by making RepositoryHostData a Struct with keyword_init: true.

That approach would mean you couldn't use the **kwargs parameter, but I'm not sure you need it. Could the factories pass the keyword arguments directly instead of creating a hash that is later splatted out? This also has the benefit of making key mismatch errors for callers catchable via static analysis, whereas this approach defers that error to runtime.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wasn't sure what direction this class was going to end up going but I think a Struct makes sense here, actually a Data would be better but we are stuck on an older Ruby version for now. The splat was me being lazy and copying code around, but I can switch it over to keywords directly now that I'm more confident on what keys are expected to exist. I'll make these changes today.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎱 I'm wondering if it makes sense to have the callers just pass in the raw data hash, and then have this object (or maybe a controller/interaction) figure out how to convert all the inputs into the appropriate data models to be persisted, vs having a bunch of factories/temp objects floating around. Mostly thinking that the factory classes aren't really doing a ton, or represent objects that will be reused a lot.

I'm thinking something like:

RawUpstreamDataToDatabaseModelConverter

def convert_gitlab_repository_raw_data(data_hash)
  repo_hash = {
      repository_uuid: data_hash.api_project.id,
      host_type: "GitLab",
      fork: data_hash.api_project.try(:forked_from_project).present?,
      scm: "git",
      is_private: api_project.visibility != "public",
    }

   create_the_thing(repo_hash)
end

def convert_github_repository_raw_data(data_hash);

def create_the_thing(data_hash_to_be_persisted)
      Repository.create!(repo_hash) # replace the Repository.create_from_hash
end

end

# some caller that gets the api data
github_api_data = Curl.get("https://www.github.com/foo/bar")

foo = RawUpstreamDataToDatabaseModelConverter.convert_gitlab_repository_raw_data(github_api_data) 
# you can play around with how much of that data should go into some sort of initializer in the converter, the caller, or the converter method

return foo

I think this has the advantage of keeping all the conversion logic in one place (easier to DRY up), no need to have intermediary temporary/facade objects, and a clean & consistent interface/pattern for the upstream to use (e.g. with error handling on bad input data). The single responsibility here could be something like "takes in garbage upstream stuff and turns it into our own objects."

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the single-factory idea from @jhan217 , additionally because it makes it easy to see the differences among all the sources in one place too.

app/models/repository_host/repository_host_data.rb Outdated Show resolved Hide resolved
@mikeyoung85 mikeyoung85 force-pushed the my/use-class-for-repo-host-data branch from 2c35f1e to 71337ed Compare March 27, 2024 13:57
app/models/repository_host/base.rb Outdated Show resolved Hide resolved
app/models/repository_host/repository_host_data.rb Outdated Show resolved Hide resolved
Comment on lines 8 to 55
class RepositoryHost::RepositoryHostData
attr_reader :archived, :default_branch, :description, :fork, :full_name, :has_issues, :has_wiki, :homepage, :host_type,
:keywords, :language, :license, :name, :owner, :parent, :is_private, :repository_uuid, :scm, :size

def initialize(
archived:,
default_branch:,
description:,
fork:,
full_name:,
has_issues:,
has_wiki:,
homepage:,
host_type:,
keywords:,
language:,
license:,
name:,
owner:,
parent:,
is_private:,
repository_uuid:,
scm:,
size:,
**kwargs
)
@archived = archived
@default_branch = default_branch
@description = description
@fork = fork
@full_name = full_name
@has_issues = has_issues
@has_wiki = has_wiki
@homepage = homepage
@host_type = host_type
@keywords = keywords
@language = language
@license = license
@name = name
@owner = owner
@parent = parent
@private = is_private
@repository_uuid = repository_uuid
@scm = scm
@size = size

raise "Unexpected arguments sent: #{kwargs.keys.join(', ')}" unless kwargs.keys.empty?
end
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎱 I'm wondering if it makes sense to have the callers just pass in the raw data hash, and then have this object (or maybe a controller/interaction) figure out how to convert all the inputs into the appropriate data models to be persisted, vs having a bunch of factories/temp objects floating around. Mostly thinking that the factory classes aren't really doing a ton, or represent objects that will be reused a lot.

I'm thinking something like:

RawUpstreamDataToDatabaseModelConverter

def convert_gitlab_repository_raw_data(data_hash)
  repo_hash = {
      repository_uuid: data_hash.api_project.id,
      host_type: "GitLab",
      fork: data_hash.api_project.try(:forked_from_project).present?,
      scm: "git",
      is_private: api_project.visibility != "public",
    }

   create_the_thing(repo_hash)
end

def convert_github_repository_raw_data(data_hash);

def create_the_thing(data_hash_to_be_persisted)
      Repository.create!(repo_hash) # replace the Repository.create_from_hash
end

end

# some caller that gets the api data
github_api_data = Curl.get("https://www.github.com/foo/bar")

foo = RawUpstreamDataToDatabaseModelConverter.convert_gitlab_repository_raw_data(github_api_data) 
# you can play around with how much of that data should go into some sort of initializer in the converter, the caller, or the converter method

return foo

I think this has the advantage of keeping all the conversion logic in one place (easier to DRY up), no need to have intermediary temporary/facade objects, and a clean & consistent interface/pattern for the upstream to use (e.g. with error handling on bad input data). The single responsibility here could be something like "takes in garbage upstream stuff and turns it into our own objects."

app/models/repository.rb Outdated Show resolved Hide resolved
Comment on lines 8 to 55
class RepositoryHost::RepositoryHostData
attr_reader :archived, :default_branch, :description, :fork, :full_name, :has_issues, :has_wiki, :homepage, :host_type,
:keywords, :language, :license, :name, :owner, :parent, :is_private, :repository_uuid, :scm, :size

def initialize(
archived:,
default_branch:,
description:,
fork:,
full_name:,
has_issues:,
has_wiki:,
homepage:,
host_type:,
keywords:,
language:,
license:,
name:,
owner:,
parent:,
is_private:,
repository_uuid:,
scm:,
size:,
**kwargs
)
@archived = archived
@default_branch = default_branch
@description = description
@fork = fork
@full_name = full_name
@has_issues = has_issues
@has_wiki = has_wiki
@homepage = homepage
@host_type = host_type
@keywords = keywords
@language = language
@license = license
@name = name
@owner = owner
@parent = parent
@private = is_private
@repository_uuid = repository_uuid
@scm = scm
@size = size

raise "Unexpected arguments sent: #{kwargs.keys.join(', ')}" unless kwargs.keys.empty?
end
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the single-factory idea from @jhan217 , additionally because it makes it easy to see the differences among all the sources in one place too.

@mikeyoung85
Copy link
Contributor Author

mikeyoung85 commented Mar 27, 2024

  • changed from class to Struct with some appended method
  • updated name to RepositoryHost::RawUpstreamData
  • moved the different "factories" into a single converter
  • updated method name to create_from_data and updated some callers I didn't see before
  • added convenience method for full_name.downcase

it "can fetch repository data" do
VCR.use_cassette("gitlab/ase") do
repository_data = described_class.fetch_repo(repository.id_or_name, api_token)
expect(repository_data).not_to be_nil
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should these spex do more? e.g. expect the repository_data object to have successfully extracted specific elements / keys with the correct values, and that it's not just getting all the keys and populating them with nils or data from the wrong fields?

app/models/concerns/github_identity.rb Outdated Show resolved Hide resolved
app/models/concerns/github_identity.rb Outdated Show resolved Hide resolved
# frozen_string_literal: true

class RepositoryHost::RawUpstreamDataConverter
def self.convert_from_github_api(upstream_repository_data_hash)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should the error handling get standardized in here too? Saw a few rescue *self::IGNORABLE_EXCEPTIONS nil in the code.

Alternatively, now that this is written out, it looks like there's some pretty tight coupling w/ the api return objects, so the convert_ methods could be moved to their respective methods in the repository_host/.rb files, as there doesn't seem to be a ton that's similar between the convert methods besides the object that's tying everything together, unless stuff like error handling is getting standardized. I was thinking there'd be more similarities between the objects being converted that there could be more extracted out and shared during the conversion process, like defaulting values/data validations/logging/etc.

@mikeyoung85 mikeyoung85 merged commit 030365d into main Mar 27, 2024
3 checks passed
@mikeyoung85 mikeyoung85 deleted the my/use-class-for-repo-host-data branch March 27, 2024 17:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants