-
Notifications
You must be signed in to change notification settings - Fork 52
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
./pants
does not work out of the box - no requests-sessions
#1
Comments
./pants
does not work out of the box - no requests-sessions
Hello, John, how's things : ) Sooo...I had this whole thing where I was asking you what OS you were running and all that....but I see the issue. We thank you in our bootstrap script because that script is basically identical to your pants setup script you made. Since you already have ran that script for 0065, you already have a 0065 virtualenv in your home directory so it never installs the FOURSQUARE_REQUIREMENTS. I will push an update to the bootstrap scripts soon but the fix is generally to just differentiate upstream pants virtualenv from the fsqio one. A random contributor to Fsqio (patrick) feels pretty strongly we should keep the virtualenv under the repo root, I guess this is an argument in his favor : ) On the plus side, you inspired me to go out and test this on some Linux images. I will update the README but you will need |
I think we have the fix for this pushed...thanks for the report! |
This was a colloboration with @abe-winter-4s, who debugged some tricky failure case that ended up being due to the caching process starting before the copy to results_dir had completed. This commit finishes the migration of all our custom codegen back onto the upstream base class, and leaves the artifact cache and synthetic target construction to be identical to every target. This is has been a long-standing problem for us, and the effected areas are discussed below, but there is a cost for the correctness: Regenerating all of our ~500 targets jumps from 30s to about 3 mins That is super rarely required. In most cases, folks will be rarely require more than double digit thrift targets at once. This should safely be a perf win across the board, when incremental compile and invalidation correctness are factored in. But fyi. Ivy and javac correctness/speed ------------------------------- This has always been an issue and has bitten us regularly. But now that we are experimenting with Ivy resolver, this has the potential to be very disruptive, because we are currently using the cache_key to keep the produce artifacts of each resolver separate from the other. During Pants upgrades we see things like missing analysis files and ClassNotFound errors, which are generally rooted in the nonexistant spindle namespacing. As we move forwad, I intend to more thoroughly convert to the Ivy backend in order to leverage incremental compile support. One of the biggest wins on the table there is allowing CI-constructed targets to serve as the basis of incremental compilation. But that cannot be relied upon when this munging of output is live, because the things that the incremental compiler is at risk of doing poorly is in the same scope and therefore cannot be fairly evaluated with this unfixed. The speed of Ivy-resolved compiles + warm incremental cache has been oberved as on the order of 4x what we see in an equivalent pom-resolve and no incremental run. So this is not academic benefit - getting a warm cache from our CIs would be one of the biggest wins available. Pants and Codegen correctness/speed ----------------------------------- TLDR: This class was coupled to the conventions of Pants 0.40 instead of modern day Pants. This meant that things that were supposed to safely create their own namespaces (due to Task version, cache_key value, changes in Pants/Zinc javac analysis files, JDK version ,etc, etc, xyz) were not respected. This included the artifacts being served by the cache. We could clean and bump the cache_key, but compile nondeterminism would return because the cache_key of the generated scala stayed the same. This is a bit of a compromise, in that this is not a pure implementer of SimpleCodegen. As Pants evolved, it became clear that the need for isolation and its correctness trumped the need for speed in almost every case. Before that was fully accepted, codegen tasks were often 1-many as a perf hack. As folks wearied of non-determinism, almost every codegen task has moved away from the "1 -> many" model and onto "1 -> 1". My first pass at this change enforced that on Spindle - but the perf hit was severe to the point of near-hilarity. Spindle was just not written with isolation in mind, and to that matter, neither was Thrift. Passing a single target to Spindle still requires gathering all the scala_record_library targets it depends on, and the code generator will have to parse and regenerate all of them, no matter how recently it just did identical work. Spindle also maintains its own cache, a scalate directory that holds class files generated from the ssps and internal state. So isolating the ~500 spindle targets meant that those files were regenerated ~500 times. The compromise here is to treat the isolation of each Pants invocation as a sizable enough win. Each invocation that requires executing Spindle creates a scalate and output directory in a temp dir. Each target is generated from that location, and a heuristic is used to gather any files that correspond to the declared namespaces found in the target's direct source files. Intellij Stubs ------------- This refactor now allows the stubs to be included in the artifact cache. It now only regenerates the files that have actually changed, and will repopulate the output path should any expected files go missing. We could consider turning this on for global runs now - it would be served by the cache and silently update to whereever you are checked out to. The only reason not to turn it on universally is that people who regularly change thrift files would have to wait through those same changes to regenerate the spindle and the stubs. Fully regenerating both sets takes about 4.5 mins, if there are no cache hits. Before this change, each invocation of ide-gen took about 30s, no matter the file state and with no cache support. After this lands, it will generally noop. If a thrift file changes, it will only regenerate the required file's worth - a single thrift file takes about 5s, but the first target is the slowest. A full rebuild of every stub now takes 60s or so, which is the pathological case but the one we are worried about. For now, I have turned on stub generation for the CIs, so generally speaking users should be able to download them with a manual Pants run. If you run Pants from inside Intellij, you should add the --no-ide-gen-spindle-stubs-skip to your invocation, which will always keep those stubs up to date. To continue to generate on demand, run: ./pants ide-gen --no-skip src/thrift:: and checkout your scripts/Intellij/spindle-stubs/scala_record directory. Closes: #1 * Fix the --skip option. I have removed the custom product, so this task will now be scheduled everytime regular SpindleGen is run. But it is set to short circuit by default, it runs only when passing ./pants ide-gen --no-skip <targets> or ./pants <goal> --no-ide-gen-spindle-stubs-skip <targets> I did it this way so that the CI would populate the cache and so that Intellij users could add that to their IDE run command and get the stubs populated on as as-needed basis in the background. The shortcircuit doesn't happen until after invalidation so that the CI's tiny artifacts still get pulled down * fixlint: print path and line * spindle size checks & PrintWriter.checkError * unicode fix for linter * bump spindlegen version to trigger build issues on CI * bump SpindleGen version again to force rebuild * bump spindle version to force build * experiment: disable caching in CI to fix gen-spindle * restore jenkins caching * bump version again to inspect jenkins local cache * bump version (again) to inspect .pants.d/gen in CI * reorder copy2 & vt.update to solve incomplete artifacts * remove size_checks (not needed w/out nondeterminism) * bump version (last time) to lucky number 13 * rm extra imports * Bump the global cache key and final polish on spindle changes. * add buildgen (sapling split of dbb275e02839447ce1796ab9828dcc2ef50f9628)
Looks like so on a fresh clone:
So the build script either needs to bootstrap a venv with
3rdparty/python/requirements.txt
or else instructions need to be added to the README to do some out of band manual setup.The text was updated successfully, but these errors were encountered: