New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

Sign up for GitHub

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Jump to bottom

Use a Class To Pass Around Repository Host Data #3338

Merged

mikeyoung85 merged 12 commits into main from my/use-class-for-repo-host-data

Mar 27, 2024

Contributor

mikeyoung85 commented Mar 25, 2024

This PR changes the RepositoryHost classes to map their upstream data to a defined Class and pass that Class around instead of unstructured Hash objects. This will help clear up what data is available in the different spots where we are sending upstream repository data around the app. I would like to follow up on this PR with some more changes and add some more util classes that would be responsible for assigning Repository column data to reduce the number of spots where that is happening right now and also split out some of the logic around managing duplicate Repositories that currently exists in the RepositoryHost and Repository classes.

mikeyoung85 requested review from tiegz, mpace965, jhan217 and johnbintz-tidelift

March 25, 2024 15:08

mpace965 reviewed

View reviewed changes

Contributor

mpace965 left a comment

This pattern is a nice improvement, had one idea about the implementation

app/models/repository_host/repository_host_data.rb Outdated

Comment on lines 8 to 55

+              class RepositoryHost::RepositoryHostData
+                attr_reader :archived, :default_branch, :description, :fork, :full_name, :has_issues, :has_wiki, :homepage, :host_type,
+                            :keywords, :language, :license, :name, :owner, :parent, :is_private, :repository_uuid, :scm, :size
+                def initialize(
+                  archived:,
+                  default_branch:,
+                  description:,
+                  fork:,
+                  full_name:,
+                  has_issues:,
+                  has_wiki:,
+                  homepage:,
+                  host_type:,
+                  keywords:,
+                  language:,
+                  license:,
+                  name:,
+                  owner:,
+                  parent:,
+                  is_private:,
+                  repository_uuid:,
+                  scm:,
+                  size:,
+                  **kwargs
+                )
+                  @archived = archived
+                  @default_branch = default_branch
+                  @description = description
+                  @fork = fork
+                  @full_name = full_name
+                  @has_issues = has_issues
+                  @has_wiki = has_wiki
+                  @homepage = homepage
+                  @host_type = host_type
+                  @keywords = keywords
+                  @language = language
+                  @license = license
+                  @name = name
+                  @owner = owner
+                  @parent = parent
+                  @private = is_private
+                  @repository_uuid = repository_uuid
+                  @scm = scm
+                  @size = size
+                  raise "Unexpected arguments sent: #{kwargs.keys.join(', ')}" unless kwargs.keys.empty?
+                end

Contributor

mpace965 Mar 26, 2024

Suggested change

      
            class RepositoryHost::RepositoryHostData
          
              attr_reader :archived, :default_branch, :description, :fork, :full_name, :has_issues, :has_wiki, :homepage, :host_type,
          
                          :keywords, :language, :license, :name, :owner, :parent, :is_private, :repository_uuid, :scm, :size
          
              def initialize(
          
                archived:,
          
                default_branch:,
          
                description:,
          
                fork:,
          
                full_name:,
          
                has_issues:,
          
                has_wiki:,
          
                homepage:,
          
                host_type:,
          
                keywords:,
          
                language:,
          
                license:,
          
                name:,
          
                owner:,
          
                parent:,
          
                is_private:,
          
                repository_uuid:,
          
                scm:,
          
                size:,
          
                **kwargs
          
              )
          
                @archived = archived
          
                @default_branch = default_branch
          
                @description = description
          
                @fork = fork
          
                @full_name = full_name
          
                @has_issues = has_issues
          
                @has_wiki = has_wiki
          
                @homepage = homepage
          
                @host_type = host_type
          
                @keywords = keywords
          
                @language = language
          
                @license = license
          
                @name = name
          
                @owner = owner
          
                @parent = parent
          
                @private = is_private
          
                @repository_uuid = repository_uuid
          
                @scm = scm
          
                @size = size
          
                raise "Unexpected arguments sent: #{kwargs.keys.join(', ')}" unless kwargs.keys.empty?
          
              end
          
            RepositoryHost::RepositoryHostData = Struct.new(
          
              :archived, :default_branch, :description, :fork, :full_name, :has_issues, :has_wiki, :homepage, :host_type,
          
              :keywords, :language, :license, :name, :owner, :parent, :is_private, :repository_uuid, :scm, :size,
          
              keyword_init: true
          
            ) do

This could be DRYed up by making RepositoryHostData a Struct with keyword_init: true.

That approach would mean you couldn't use the **kwargs parameter, but I'm not sure you need it. Could the factories pass the keyword arguments directly instead of creating a hash that is later splatted out? This also has the benefit of making key mismatch errors for callers catchable via static analysis, whereas this approach defers that error to runtime.

Contributor Author

mikeyoung85 Mar 27, 2024

I wasn't sure what direction this class was going to end up going but I think a Struct makes sense here, actually a Data would be better but we are stuck on an older Ruby version for now. The splat was me being lazy and copying code around, but I can switch it over to keywords directly now that I'm more confident on what keys are expected to exist. I'll make these changes today.

Contributor

jhan217 Mar 27, 2024

🎱 I'm wondering if it makes sense to have the callers just pass in the raw data hash, and then have this object (or maybe a controller/interaction) figure out how to convert all the inputs into the appropriate data models to be persisted, vs having a bunch of factories/temp objects floating around. Mostly thinking that the factory classes aren't really doing a ton, or represent objects that will be reused a lot.

I'm thinking something like:

RawUpstreamDataToDatabaseModelConverter

def convert_gitlab_repository_raw_data(data_hash)
  repo_hash = {
      repository_uuid: data_hash.api_project.id,
      host_type: "GitLab",
      fork: data_hash.api_project.try(:forked_from_project).present?,
      scm: "git",
      is_private: api_project.visibility != "public",
    }

   create_the_thing(repo_hash)
end

def convert_github_repository_raw_data(data_hash);

def create_the_thing(data_hash_to_be_persisted)
      Repository.create!(repo_hash) # replace the Repository.create_from_hash
end

end

# some caller that gets the api data
github_api_data = Curl.get("https://www.github.com/foo/bar")

foo = RawUpstreamDataToDatabaseModelConverter.convert_gitlab_repository_raw_data(github_api_data) 
# you can play around with how much of that data should go into some sort of initializer in the converter, the caller, or the converter method

return foo

I think this has the advantage of keeping all the conversion logic in one place (easier to DRY up), no need to have intermediary temporary/facade objects, and a clean & consistent interface/pattern for the upstream to use (e.g. with error handling on bad input data). The single responsibility here could be something like "takes in garbage upstream stuff and turns it into our own objects."

Contributor

tiegz Mar 27, 2024

I like the single-factory idea from @jhan217 , additionally because it makes it easy to see the differences among all the sources in one place too.

app/models/repository_host/repository_host_data.rb Outdated Show resolved Hide resolved

mikeyoung85 added 4 commits

March 27, 2024 09:57


          use a class to pass around repository host data

a05f64d


          don't use private as var name

9cfed08


          remove some commented code

5d4dedc


          add spec for github api call

71337ed

mikeyoung85 force-pushed the my/use-class-for-repo-host-data branch from 2c35f1e to 71337ed Compare

March 27, 2024 13:57

jhan217 reviewed

View reviewed changes

app/models/repository_host/base.rb Outdated Show resolved Hide resolved

app/models/repository_host/repository_host_data.rb Outdated Show resolved Hide resolved

app/models/repository_host/repository_host_data.rb Outdated

Comment on lines 8 to 55

+              class RepositoryHost::RepositoryHostData
+                attr_reader :archived, :default_branch, :description, :fork, :full_name, :has_issues, :has_wiki, :homepage, :host_type,
+                            :keywords, :language, :license, :name, :owner, :parent, :is_private, :repository_uuid, :scm, :size
+                def initialize(
+                  archived:,
+                  default_branch:,
+                  description:,
+                  fork:,
+                  full_name:,
+                  has_issues:,
+                  has_wiki:,
+                  homepage:,
+                  host_type:,
+                  keywords:,
+                  language:,
+                  license:,
+                  name:,
+                  owner:,
+                  parent:,
+                  is_private:,
+                  repository_uuid:,
+                  scm:,
+                  size:,
+                  **kwargs
+                )
+                  @archived = archived
+                  @default_branch = default_branch
+                  @description = description
+                  @fork = fork
+                  @full_name = full_name
+                  @has_issues = has_issues
+                  @has_wiki = has_wiki
+                  @homepage = homepage
+                  @host_type = host_type
+                  @keywords = keywords
+                  @language = language
+                  @license = license
+                  @name = name
+                  @owner = owner
+                  @parent = parent
+                  @private = is_private
+                  @repository_uuid = repository_uuid
+                  @scm = scm
+                  @size = size
+                  raise "Unexpected arguments sent: #{kwargs.keys.join(', ')}" unless kwargs.keys.empty?
+                end

Contributor

jhan217 Mar 27, 2024

🎱 I'm wondering if it makes sense to have the callers just pass in the raw data hash, and then have this object (or maybe a controller/interaction) figure out how to convert all the inputs into the appropriate data models to be persisted, vs having a bunch of factories/temp objects floating around. Mostly thinking that the factory classes aren't really doing a ton, or represent objects that will be reused a lot.

I'm thinking something like:

RawUpstreamDataToDatabaseModelConverter

def convert_gitlab_repository_raw_data(data_hash)
  repo_hash = {
      repository_uuid: data_hash.api_project.id,
      host_type: "GitLab",
      fork: data_hash.api_project.try(:forked_from_project).present?,
      scm: "git",
      is_private: api_project.visibility != "public",
    }

   create_the_thing(repo_hash)
end

def convert_github_repository_raw_data(data_hash);

def create_the_thing(data_hash_to_be_persisted)
      Repository.create!(repo_hash) # replace the Repository.create_from_hash
end

end

# some caller that gets the api data
github_api_data = Curl.get("https://www.github.com/foo/bar")

foo = RawUpstreamDataToDatabaseModelConverter.convert_gitlab_repository_raw_data(github_api_data) 
# you can play around with how much of that data should go into some sort of initializer in the converter, the caller, or the converter method

return foo

I think this has the advantage of keeping all the conversion logic in one place (easier to DRY up), no need to have intermediary temporary/facade objects, and a clean & consistent interface/pattern for the upstream to use (e.g. with error handling on bad input data). The single responsibility here could be something like "takes in garbage upstream stuff and turns it into our own objects."

tiegz reviewed

View reviewed changes

app/models/repository.rb Outdated Show resolved Hide resolved

app/models/repository_host/repository_host_data.rb Outdated

Comment on lines 8 to 55

+              class RepositoryHost::RepositoryHostData
+                attr_reader :archived, :default_branch, :description, :fork, :full_name, :has_issues, :has_wiki, :homepage, :host_type,
+                            :keywords, :language, :license, :name, :owner, :parent, :is_private, :repository_uuid, :scm, :size
+                def initialize(
+                  archived:,
+                  default_branch:,
+                  description:,
+                  fork:,
+                  full_name:,
+                  has_issues:,
+                  has_wiki:,
+                  homepage:,
+                  host_type:,
+                  keywords:,
+                  language:,
+                  license:,
+                  name:,
+                  owner:,
+                  parent:,
+                  is_private:,
+                  repository_uuid:,
+                  scm:,
+                  size:,
+                  **kwargs
+                )
+                  @archived = archived
+                  @default_branch = default_branch
+                  @description = description
+                  @fork = fork
+                  @full_name = full_name
+                  @has_issues = has_issues
+                  @has_wiki = has_wiki
+                  @homepage = homepage
+                  @host_type = host_type
+                  @keywords = keywords
+                  @language = language
+                  @license = license
+                  @name = name
+                  @owner = owner
+                  @parent = parent
+                  @private = is_private
+                  @repository_uuid = repository_uuid
+                  @scm = scm
+                  @size = size
+                  raise "Unexpected arguments sent: #{kwargs.keys.join(', ')}" unless kwargs.keys.empty?
+                end

Contributor

tiegz Mar 27, 2024

I like the single-factory idea from @jhan217 , additionally because it makes it easy to see the differences among all the sources in one place too.

mikeyoung85 added 5 commits

March 27, 2024 11:07


          change to struct and some fixes

6c312c2


          rename to create_from_data

b58405d


          rename to RawUpstreamData

6218ca5


          move conversions into single factory

5f85a01


          add method to standardize full_name.downcase

79ea005

Contributor Author

mikeyoung85 commented Mar 27, 2024 •

edited

changed from class to Struct with some appended method
updated name to RepositoryHost::RawUpstreamData
moved the different "factories" into a single converter
updated method name to create_from_data and updated some callers I didn't see before
added convenience method for full_name.downcase

mikeyoung85 requested review from tiegz, mpace965 and jhan217

March 27, 2024 15:34

mikeyoung85 added 2 commits

March 27, 2024 11:44


          fix one more rename

f37b865


          call method for var instead treating like hash

7a6f86b

tiegz approved these changes

View reviewed changes

jhan217 reviewed

View reviewed changes

spec/models/repository_host/gitlab_spec.rb

+                it "can fetch repository data" do
+                  VCR.use_cassette("gitlab/ase") do
+                    repository_data = described_class.fetch_repo(repository.id_or_name, api_token)
+                    expect(repository_data).not_to be_nil

Contributor

jhan217 Mar 27, 2024

Should these spex do more? e.g. expect the repository_data object to have successfully extracted specific elements / keys with the correct values, and that it's not just getting all the keys and populating them with nils or data from the wrong fields?

app/models/concerns/github_identity.rb Outdated Show resolved Hide resolved

app/models/concerns/github_identity.rb Outdated Show resolved Hide resolved

app/models/repository_host/raw_upstream_data_converter.rb

+              # frozen_string_literal: true
+              class RepositoryHost::RawUpstreamDataConverter
+                def self.convert_from_github_api(upstream_repository_data_hash)

Contributor

jhan217 Mar 27, 2024

Should the error handling get standardized in here too? Saw a few rescue *self::IGNORABLE_EXCEPTIONS nil in the code.

Alternatively, now that this is written out, it looks like there's some pretty tight coupling w/ the api return objects, so the convert_ methods could be moved to their respective methods in the repository_host/.rb files, as there doesn't seem to be a ton that's similar between the convert methods besides the object that's tying everything together, unless stuff like error handling is getting standardized. I was thinking there'd be more similarities between the objects being converted that there could be more extracted out and shared during the conversion process, like defaulting values/data validations/logging/etc.


          use lower name and fix class rename

d60d572

mikeyoung85 merged commit 030365d into main

3 checks passed

mikeyoung85 deleted the my/use-class-for-repo-host-data branch

March 27, 2024 17:36

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment