Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Mark submodules as vendored by default #164

Merged
merged 3 commits into from
Sep 5, 2023

Conversation

spenserblack
Copy link
Owner

Resolves #163

@spenserblack spenserblack linked an issue Sep 5, 2023 that may be closed by this pull request
@spenserblack
Copy link
Owner Author

@Byron FYI

@spenserblack
Copy link
Owner Author

Yuck, I actually don't like this 😆 It really makes Gengo single-use (although, realistically, who would run Gengo::analyze multiple times on the same repo?). Although the code becomes a bit redundant, perhaps this should be moved to the builder instead, to initialize the additional vendored globs.

@codecov
Copy link

codecov bot commented Sep 5, 2023

Codecov Report

Merging #164 (14f43ec) into main (48db652) will increase coverage by 0.03%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##             main     #164      +/-   ##
==========================================
+ Coverage   82.33%   82.36%   +0.03%     
==========================================
  Files          14       14              
  Lines         566      567       +1     
==========================================
+ Hits          466      467       +1     
  Misses        100      100              
Flag Coverage Δ
82.36% <100.00%> (+0.03%) ⬆️
macOS-latest 82.30% <100.00%> (+0.03%) ⬆️
ubuntu-latest 79.58% <100.00%> (+0.03%) ⬆️
windows-latest ?

Flags with carried forward coverage won't be shown. Click here to find out more.

Files Changed Coverage Δ
gengo/src/lib.rs 81.29% <100.00%> (+0.13%) ⬆️

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@spenserblack
Copy link
Owner Author

@Byron We discussed it a bit (#157 (comment)), but could you explain how attributes would get resolved for files in submodules again?

Running on my dotfiles repo (non-bare clone, after running git submodule update --init), gengo excludes cowfiles/ (Perl code) by default, and includes it with gengo --all. However, cowfiles/pug.cow gengo-detectable in the root .gitattributes doesn't seem to work. *.cow gengo-detectable in the submodule's .gitattributes works, however. I thought #157 (comment) meant that the root .gitattributes could affect submodules, or did I misinterpret that comment?

@spenserblack spenserblack enabled auto-merge (squash) September 5, 2023 14:29
@Byron
Copy link
Contributor

Byron commented Sep 5, 2023

I thought #157 (comment) meant that the root .gitattributes could affect submodules, or did I misinterpret that comment?

That's great, because it's working as expected right now. The comment meant that if you wanted superproject attributes to affect submodule paths, you could make it happen. All that needs to be done is to query attributes on the complete path using the superproject's repo, instead of the submodule-relative path in the submodule-repo. However, I don't think how useful that would really be.

@spenserblack
Copy link
Owner Author

spenserblack commented Sep 5, 2023

Oh, OK, got it, that makes sense. Sorry, misinterpreted the comment and thought that it was already builtin.

You make a good point about having an "escape hatch," but submodules are a special case and I don't think .gitattributes are the way to go. As far as gengo-bin goes, I'm fine with submodules effectively being "vendored no matter what," since IMO a submodule is its own repo with its own distribution. And it's kind of similar to linguist this way, which AFAIK would never include submodules in its distribution.

We can discuss what this "escape hatch" could be. I think the best way would be to see how Onefetch could use this escape hatch. Just some ideas off the top of my head:

  • Entry could have a submodule field to go with vendored
  • vendored could be an enum (No, Submodule, Yes)
  • analyze_with(opts), with one of the options being vendor_submodules: true|false

Edit: I'm leaning towards vendored being an enum

@spenserblack spenserblack merged commit 748cd7e into main Sep 5, 2023
11 checks passed
@spenserblack spenserblack deleted the chore/163/vendor-submodules branch September 5, 2023 14:44
@Byron
Copy link
Contributor

Byron commented Sep 5, 2023

I think my explanation makes it sounds more complicated as it has to be. What I meant with 'using .gitattributesis that in the superproject, there ispath/to/submodule` - this path can be run against the superproject's attributes to get additional configuration. This could lead to a quick-path where all blobs in the submodule are considered vendorered, without actually decoding them, or it could lead to the path that was originally implemented where submodules where just like any other repo, with its own attributes and all.

So if the intention is to vendor all information in a submodule, I really think there should be a fast-path that sums everything up using Repository::find_header().

@spenserblack
Copy link
Owner Author

🤔 I'll have to look into that, but if by "decoding" you mean getting the file contents, we still will need to do that -- the first line is needed to check shebangs, and more for potential heuristic checks. Even if the files are vendored, gengo tries to detect their language.

@Byron
Copy link
Contributor

Byron commented Sep 5, 2023

Ah, then please ignore the comment. No fast-path possible :D.

I definitely wasn't aware it counts vendored code in addition.

Something I don't understand though is how this works at all:

❯ time gengo
  0.04% 34104   GitHub Workflow
 97.17% 73649526 Rust
  0.47% 354914  JavaScript
  0.50% 381896  Shell
  0.36% 273951  Python
  0.20% 150819  C++
  0.10% 78427   Docker
  0.06% 41934   C
  0.09% 67818   CSS
  0.11% 86982   PowerShell
  0.35% 262194  HTML
  0.26% 200420  TypeScript
  0.00% 389     Assembly
  0.27% 207385  Makefile
  0.00% 640     CMake
gengo  6.69s user 0.39s system 786% cpu 0.899 total

The above is the Rust repo, it has a lot of submodules and by the looks of it, gengo now only counts the toplevel repository. However, it takes as long as it did when it was gathering details on all submodules, even though they don't show up at all.

For comparison, this is how it looked like when submodules where just counted as part of the codebase:

❯ gengo
  0.01% 111768  GitHub Workflow
 11.11% 93530217 Rust
  0.07% 610736  JavaScript
  0.08% 684455  Shell
  1.39% 11724344 Python
 44.93% 378243234 C++
  0.01% 111795  Docker
 27.07% 227890032 C
  0.02% 175182  CSS
  0.01% 86982   PowerShell
  1.46% 12303662 HTML
  0.03% 232882  TypeScript
 12.45% 104784462 Assembly
  0.03% 293374  Makefile
  0.38% 3210646 CMake
  0.01% 60485   Emacs Lisp
  0.84% 7082722 Fortran Modern
  0.00% 30811   C#
  0.03% 241110  Perl
  0.00% 9815    FORTRAN Legacy
  0.01% 49676   Julia
  0.00% 11933   Lua
  0.00% 334     D
  0.04% 327689  OCaml
  0.00% 41237   Jupyter Notebook
  0.00% 20883   Vim Script

@spenserblack
Copy link
Owner Author

spenserblack commented Sep 5, 2023

Using gengo --all should add submodules back in. There could perhaps be some additional optimization by skipping submodules when unless --all is used, but I'd like to save that for later. But the library will analyze all files, and gengo (without --all) will compile that into a summary, excluding non-detectable files.

Edit: typo 🤦

Excuse me if I don't make the most sense today 😆

@Byron
Copy link
Contributor

Byron commented Sep 5, 2023

That works, thanks for the hint! And I am definitely looking forward to it not actually processing submodules unless --all is provided as well, and agree that it can wait as onefetch probably won't be using that either.

@spenserblack spenserblack mentioned this pull request Sep 5, 2023
2 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Handle submodules
2 participants