New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use a Class To Pass Around Repository Host Data #3338
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This pattern is a nice improvement, had one idea about the implementation
class RepositoryHost::RepositoryHostData | ||
attr_reader :archived, :default_branch, :description, :fork, :full_name, :has_issues, :has_wiki, :homepage, :host_type, | ||
:keywords, :language, :license, :name, :owner, :parent, :is_private, :repository_uuid, :scm, :size | ||
|
||
def initialize( | ||
archived:, | ||
default_branch:, | ||
description:, | ||
fork:, | ||
full_name:, | ||
has_issues:, | ||
has_wiki:, | ||
homepage:, | ||
host_type:, | ||
keywords:, | ||
language:, | ||
license:, | ||
name:, | ||
owner:, | ||
parent:, | ||
is_private:, | ||
repository_uuid:, | ||
scm:, | ||
size:, | ||
**kwargs | ||
) | ||
@archived = archived | ||
@default_branch = default_branch | ||
@description = description | ||
@fork = fork | ||
@full_name = full_name | ||
@has_issues = has_issues | ||
@has_wiki = has_wiki | ||
@homepage = homepage | ||
@host_type = host_type | ||
@keywords = keywords | ||
@language = language | ||
@license = license | ||
@name = name | ||
@owner = owner | ||
@parent = parent | ||
@private = is_private | ||
@repository_uuid = repository_uuid | ||
@scm = scm | ||
@size = size | ||
|
||
raise "Unexpected arguments sent: #{kwargs.keys.join(', ')}" unless kwargs.keys.empty? | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
class RepositoryHost::RepositoryHostData | |
attr_reader :archived, :default_branch, :description, :fork, :full_name, :has_issues, :has_wiki, :homepage, :host_type, | |
:keywords, :language, :license, :name, :owner, :parent, :is_private, :repository_uuid, :scm, :size | |
def initialize( | |
archived:, | |
default_branch:, | |
description:, | |
fork:, | |
full_name:, | |
has_issues:, | |
has_wiki:, | |
homepage:, | |
host_type:, | |
keywords:, | |
language:, | |
license:, | |
name:, | |
owner:, | |
parent:, | |
is_private:, | |
repository_uuid:, | |
scm:, | |
size:, | |
**kwargs | |
) | |
@archived = archived | |
@default_branch = default_branch | |
@description = description | |
@fork = fork | |
@full_name = full_name | |
@has_issues = has_issues | |
@has_wiki = has_wiki | |
@homepage = homepage | |
@host_type = host_type | |
@keywords = keywords | |
@language = language | |
@license = license | |
@name = name | |
@owner = owner | |
@parent = parent | |
@private = is_private | |
@repository_uuid = repository_uuid | |
@scm = scm | |
@size = size | |
raise "Unexpected arguments sent: #{kwargs.keys.join(', ')}" unless kwargs.keys.empty? | |
end | |
RepositoryHost::RepositoryHostData = Struct.new( | |
:archived, :default_branch, :description, :fork, :full_name, :has_issues, :has_wiki, :homepage, :host_type, | |
:keywords, :language, :license, :name, :owner, :parent, :is_private, :repository_uuid, :scm, :size, | |
keyword_init: true | |
) do |
This could be DRYed up by making RepositoryHostData
a Struct
with keyword_init: true
.
That approach would mean you couldn't use the **kwargs
parameter, but I'm not sure you need it. Could the factories pass the keyword arguments directly instead of creating a hash that is later splatted out? This also has the benefit of making key mismatch errors for callers catchable via static analysis, whereas this approach defers that error to runtime.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wasn't sure what direction this class was going to end up going but I think a Struct
makes sense here, actually a Data
would be better but we are stuck on an older Ruby version for now. The splat was me being lazy and copying code around, but I can switch it over to keywords directly now that I'm more confident on what keys are expected to exist. I'll make these changes today.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🎱 I'm wondering if it makes sense to have the callers just pass in the raw data hash, and then have this object (or maybe a controller/interaction) figure out how to convert all the inputs into the appropriate data models to be persisted, vs having a bunch of factories/temp objects floating around. Mostly thinking that the factory classes aren't really doing a ton, or represent objects that will be reused a lot.
I'm thinking something like:
RawUpstreamDataToDatabaseModelConverter
def convert_gitlab_repository_raw_data(data_hash)
repo_hash = {
repository_uuid: data_hash.api_project.id,
host_type: "GitLab",
fork: data_hash.api_project.try(:forked_from_project).present?,
scm: "git",
is_private: api_project.visibility != "public",
}
create_the_thing(repo_hash)
end
def convert_github_repository_raw_data(data_hash);
def create_the_thing(data_hash_to_be_persisted)
Repository.create!(repo_hash) # replace the Repository.create_from_hash
end
end
# some caller that gets the api data
github_api_data = Curl.get("https://www.github.com/foo/bar")
foo = RawUpstreamDataToDatabaseModelConverter.convert_gitlab_repository_raw_data(github_api_data)
# you can play around with how much of that data should go into some sort of initializer in the converter, the caller, or the converter method
return foo
I think this has the advantage of keeping all the conversion logic in one place (easier to DRY up), no need to have intermediary temporary/facade objects, and a clean & consistent interface/pattern for the upstream to use (e.g. with error handling on bad input data). The single responsibility here could be something like "takes in garbage upstream stuff and turns it into our own objects."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the single-factory idea from @jhan217 , additionally because it makes it easy to see the differences among all the sources in one place too.
2c35f1e
to
71337ed
Compare
class RepositoryHost::RepositoryHostData | ||
attr_reader :archived, :default_branch, :description, :fork, :full_name, :has_issues, :has_wiki, :homepage, :host_type, | ||
:keywords, :language, :license, :name, :owner, :parent, :is_private, :repository_uuid, :scm, :size | ||
|
||
def initialize( | ||
archived:, | ||
default_branch:, | ||
description:, | ||
fork:, | ||
full_name:, | ||
has_issues:, | ||
has_wiki:, | ||
homepage:, | ||
host_type:, | ||
keywords:, | ||
language:, | ||
license:, | ||
name:, | ||
owner:, | ||
parent:, | ||
is_private:, | ||
repository_uuid:, | ||
scm:, | ||
size:, | ||
**kwargs | ||
) | ||
@archived = archived | ||
@default_branch = default_branch | ||
@description = description | ||
@fork = fork | ||
@full_name = full_name | ||
@has_issues = has_issues | ||
@has_wiki = has_wiki | ||
@homepage = homepage | ||
@host_type = host_type | ||
@keywords = keywords | ||
@language = language | ||
@license = license | ||
@name = name | ||
@owner = owner | ||
@parent = parent | ||
@private = is_private | ||
@repository_uuid = repository_uuid | ||
@scm = scm | ||
@size = size | ||
|
||
raise "Unexpected arguments sent: #{kwargs.keys.join(', ')}" unless kwargs.keys.empty? | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🎱 I'm wondering if it makes sense to have the callers just pass in the raw data hash, and then have this object (or maybe a controller/interaction) figure out how to convert all the inputs into the appropriate data models to be persisted, vs having a bunch of factories/temp objects floating around. Mostly thinking that the factory classes aren't really doing a ton, or represent objects that will be reused a lot.
I'm thinking something like:
RawUpstreamDataToDatabaseModelConverter
def convert_gitlab_repository_raw_data(data_hash)
repo_hash = {
repository_uuid: data_hash.api_project.id,
host_type: "GitLab",
fork: data_hash.api_project.try(:forked_from_project).present?,
scm: "git",
is_private: api_project.visibility != "public",
}
create_the_thing(repo_hash)
end
def convert_github_repository_raw_data(data_hash);
def create_the_thing(data_hash_to_be_persisted)
Repository.create!(repo_hash) # replace the Repository.create_from_hash
end
end
# some caller that gets the api data
github_api_data = Curl.get("https://www.github.com/foo/bar")
foo = RawUpstreamDataToDatabaseModelConverter.convert_gitlab_repository_raw_data(github_api_data)
# you can play around with how much of that data should go into some sort of initializer in the converter, the caller, or the converter method
return foo
I think this has the advantage of keeping all the conversion logic in one place (easier to DRY up), no need to have intermediary temporary/facade objects, and a clean & consistent interface/pattern for the upstream to use (e.g. with error handling on bad input data). The single responsibility here could be something like "takes in garbage upstream stuff and turns it into our own objects."
class RepositoryHost::RepositoryHostData | ||
attr_reader :archived, :default_branch, :description, :fork, :full_name, :has_issues, :has_wiki, :homepage, :host_type, | ||
:keywords, :language, :license, :name, :owner, :parent, :is_private, :repository_uuid, :scm, :size | ||
|
||
def initialize( | ||
archived:, | ||
default_branch:, | ||
description:, | ||
fork:, | ||
full_name:, | ||
has_issues:, | ||
has_wiki:, | ||
homepage:, | ||
host_type:, | ||
keywords:, | ||
language:, | ||
license:, | ||
name:, | ||
owner:, | ||
parent:, | ||
is_private:, | ||
repository_uuid:, | ||
scm:, | ||
size:, | ||
**kwargs | ||
) | ||
@archived = archived | ||
@default_branch = default_branch | ||
@description = description | ||
@fork = fork | ||
@full_name = full_name | ||
@has_issues = has_issues | ||
@has_wiki = has_wiki | ||
@homepage = homepage | ||
@host_type = host_type | ||
@keywords = keywords | ||
@language = language | ||
@license = license | ||
@name = name | ||
@owner = owner | ||
@parent = parent | ||
@private = is_private | ||
@repository_uuid = repository_uuid | ||
@scm = scm | ||
@size = size | ||
|
||
raise "Unexpected arguments sent: #{kwargs.keys.join(', ')}" unless kwargs.keys.empty? | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the single-factory idea from @jhan217 , additionally because it makes it easy to see the differences among all the sources in one place too.
|
it "can fetch repository data" do | ||
VCR.use_cassette("gitlab/ase") do | ||
repository_data = described_class.fetch_repo(repository.id_or_name, api_token) | ||
expect(repository_data).not_to be_nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should these spex do more? e.g. expect the repository_data object to have successfully extracted specific elements / keys with the correct values, and that it's not just getting all the keys and populating them with nils or data from the wrong fields?
# frozen_string_literal: true | ||
|
||
class RepositoryHost::RawUpstreamDataConverter | ||
def self.convert_from_github_api(upstream_repository_data_hash) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should the error handling get standardized in here too? Saw a few rescue *self::IGNORABLE_EXCEPTIONS nil
in the code.
Alternatively, now that this is written out, it looks like there's some pretty tight coupling w/ the api return objects, so the convert_ methods could be moved to their respective methods in the repository_host/.rb files, as there doesn't seem to be a ton that's similar between the convert methods besides the object that's tying everything together, unless stuff like error handling is getting standardized. I was thinking there'd be more similarities between the objects being converted that there could be more extracted out and shared during the conversion process, like defaulting values/data validations/logging/etc.
This PR changes the RepositoryHost classes to map their upstream data to a defined Class and pass that Class around instead of unstructured Hash objects. This will help clear up what data is available in the different spots where we are sending upstream repository data around the app. I would like to follow up on this PR with some more changes and add some more util classes that would be responsible for assigning Repository column data to reduce the number of spots where that is happening right now and also split out some of the logic around managing duplicate Repositories that currently exists in the RepositoryHost and Repository classes.